Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. It is obvious that the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art from the embodiments given herein without creative effort shall fall within the protection scope of the present application.
In various fields, for actions to be performed by a user, such as transactions, investments and ratings, the user uploads an image containing a target object, which is generally a certificate, and the identity, authority, capability and the like of the user are identified from it.
To accurately acquire the content of the certificate, the target object in the image needs to be located and classified before its content is recognized, so as to determine whether the target object in the image can be accurately identified. At present, some schemes can locate the target object in an image, and other schemes can classify the target object in an image; however, obtaining the positioning and the classification of the target object separately in these two ways is time-consuming, occupies a large storage space, is easily limited by the available storage space, and is particularly difficult to apply on terminal devices.
For the above scenario, the embodiments of the present application acquire the positioning information and the classification information of the target object in an image in parallel based on a multi-task learning model, and process the image according to the positioning information and/or the classification information, so that the image can be accurately identified.
The technical solutions of the embodiments of the present application can be applied to various electronic devices. The electronic device may be a terminal device, such as a mobile phone, a tablet computer (Pad), a computer, a Virtual Reality (VR) terminal device, an Augmented Reality (AR) terminal device, a terminal device in industrial control, a terminal device in self-driving, a terminal device in telemedicine, a terminal device in a smart city, a terminal device in a smart home, and the like. The terminal device in the embodiments of the present application may also be a wearable device, also called a wearable smart device, which is a general term for devices designed and developed with wearable technology so that they can be worn daily, such as glasses, gloves, watches, clothing and shoes. A wearable device is a portable device that is worn directly on the body or integrated into the clothing or accessories of the user. The terminal device may be fixed or mobile.
For example, the electronic device in the embodiments of the present application may also be a server. When the electronic device is a server, it may receive an image acquired by a terminal device and perform image processing on that image.
Fig. 1a and fig. 1b are schematic structural diagrams of an electronic device 100 according to an embodiment of the present disclosure. As shown in fig. 1a, the electronic device 100 comprises an image acquisition unit 101, a preprocessing unit 102, an image recognition unit 103 and a post-processing unit 104, where the image acquisition unit 101 is connected in sequence with the preprocessing unit 102, the image recognition unit 103 and the post-processing unit 104.
The image acquisition unit 101 is configured to acquire an image to be recognized, which contains, for example, a certificate-class object. For example, it may receive an image of the target object captured by an image acquisition device, an image containing the target object transmitted by another device, or an image containing the target object input by the user, which is not limited in the embodiments of the present application.
The preprocessing unit 102 receives the image to be recognized from the image acquisition unit 101 and performs a preprocessing operation on it so that it meets the input requirements of the image recognition unit 103, for example, resizing the image to a preset size such as 224 × 224 pixels.
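As one possible form of this preprocessing, sketched here under the assumption of an aspect-preserving letterbox resize (the embodiment only requires that the image meet a preset size such as 224 × 224), the scale factor and padding can be computed as follows:

```python
def letterbox_params(width, height, target=224):
    """Compute the scale factor and padding needed to fit a width x height
    image into a target x target square while preserving aspect ratio.
    This is one plausible preprocessing scheme, not the only one."""
    scale = target / max(width, height)
    new_w, new_h = round(width * scale), round(height * scale)
    # Center the resized image in the square with symmetric padding.
    pad_x = (target - new_w) // 2
    pad_y = (target - new_h) // 2
    return scale, (new_w, new_h), (pad_x, pad_y)

# A 640 x 480 photo scales by 0.35 to 224 x 168, padded 28 px top and bottom.
print(letterbox_params(640, 480))  # (0.35, (224, 168), (0, 28))
```

The actual resizing policy (stretch, crop or letterbox) is not specified by the embodiment; only the resulting preset size matters to the image recognition unit.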
The image recognition unit 103 receives the image containing the target object sent by the preprocessing unit 102, and recognizes the image to obtain the positioning information and the classification information of the target object.
Illustratively, as shown in fig. 1b, the image recognition unit 103 includes a feature extractor 1031, a classifier 1032 and a regressor 1033; optionally, the image recognition unit 103 may be an image recognition model trained based on a multi-task learning model. The feature extractor 1031 is connected to the classifier 1032 and the regressor 1033, respectively. Exemplarily, the classifier 1032 and the regressor 1033 are both fully connected layers, that is, each node in the first layer of the classifier 1032 or the regressor 1033 is connected to all nodes in the last layer of the feature extractor 1031. The feature extractor 1031 is configured to perform feature extraction on the image to obtain a feature image; the classifier 1032 and the regressor 1033 both receive the feature image from the feature extractor 1031 and process it in parallel. The classifier 1032 is configured to output the classification information of the target object according to the feature image, and the regressor 1033 is configured to output the localization information of the target object according to the feature image.
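The shared-feature, two-head arrangement above can be sketched in miniature as follows. The weights and dimensions are toy stand-ins for trained parameters, not values from the embodiment; the point is only that both heads consume the same feature vector:

```python
def linear(weights, bias, x):
    """y = Wx + b for a fully connected layer, with W as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]

def recognize(feature):
    # Toy weights standing in for the trained classifier (3 classes) and
    # regressor (8 corner coordinates); real values come from training.
    cls_w = [[0.1] * len(feature) for _ in range(3)]
    cls_b = [0.0, 0.0, 0.0]
    reg_w = [[0.05] * len(feature) for _ in range(8)]
    reg_b = [0.0] * 8
    # Both heads read the SAME shared feature vector; this weight sharing
    # is what lets classification and localization run in parallel.
    return linear(cls_w, cls_b, feature), linear(reg_w, reg_b, feature)

scores, coords = recognize([1.0, 2.0, 3.0, 4.0])
print(len(scores), len(coords))  # 3 8
```

In practice each head would be a trained fully connected layer on top of the feature extractor's output, as fig. 1b describes.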
The post-processing unit 104 receives the classification information and the positioning information sent by the image recognition unit 103, and performs a corresponding image processing operation based on the classification information and/or the positioning information.
The present application is specifically illustrated by the following examples.
Fig. 2 is a flowchart illustrating an image processing method 200 according to an embodiment of the present disclosure.
To improve the efficiency of image recognition and save storage resources, the embodiments of the present application determine the positioning information and the classification information of the target object based on a pre-trained image recognition model, and perform a corresponding image processing operation based on the positioning information and/or the classification information, so that the image can be accurately recognized.
As shown in fig. 2, the image processing method provided in the embodiment of the present application includes:
s201: and inputting the image containing the target object into the image recognition model to obtain a recognition result of the target object.
The identification result comprises positioning information and classification information.
The positioning information includes the coordinates of a plurality of corner points of the target object. When the target object is a certificate-class object, it generally has four corner points; optionally, the coordinates of each corner point are two-dimensional, that is, an x value and a y value, so the positioning information is 8-dimensional.
The classification information includes the probability of the target object in each preset category and is used to characterize the state of the target object or of the image. Optionally, the preset categories include at least one of: the front side is presented in the image, the back side is presented in the image, an error object, a remake, a screenshot, a rotation angle of 0°, a rotation angle of 90°, a rotation angle of 180°, and a rotation angle of 270°.
Depending on the application scenario, the classification information may include more or less content than the above examples. Illustratively, the classification information includes at least one of: the front side is presented in the image, the back side is presented in the image, and an error object.
The image recognition model is obtained by training based on a pre-established multi-task learning model. It should be understood that a multi-task learning model is a network model built on a multi-task learning mechanism: it has shared weights and can run a plurality of branches in parallel.
In this step, the pre-trained image recognition model can determine the positioning information and the classification information of the target object in parallel; in other words, inputting the image containing the target object into the image recognition model yields both the positioning information and the classification information output by the model.
S202: based on the recognition result, a corresponding image processing operation is performed.
For example, a corresponding image processing operation may be performed based on the positioning information in the recognition result; for instance, the target object may be extracted from the image based on the positioning information, in other words, the background not containing the target object may be removed based on the positioning information.
For example, if the state of the target object is determined to be a normal state based on the classification information, the next recognition operation is continued on the image, or the image is stored, or no processing is performed. If the state is determined to be an abnormal state, indication information is generated to indicate that the image containing the target object should be acquired again. If the state is determined to be a state to be corrected, the image is corrected.
Both of the above examples may be performed in parallel.
In another example, the image processing operation may be determined based on both the positioning information and the classification information in the recognition result; for example, the state of the target object is determined based on the classification information, and when the state of the target object is a normal state, the target object is extracted from the image based on the positioning information.
Optionally, the abnormal state includes: the front side or the back side is presented in the image but does not match a preset orientation, for example, the preset orientation is the front side of the target object, that is, the front side needs to be acquired, but the image presents the back side; the target object is an error object, for example, the image does not contain the required target object; the image containing the target object is a remake, for example, the acquired target object comes from another image rather than from the physical target object; the image containing the target object is obtained from a screenshot; and so on.
Optionally, the normal state includes: the front side or the back side is presented in the image in accordance with the preset orientation; and the rotation angle of the target object is 0°.
Optionally, the state to be corrected includes: the rotation angle of the target object is 90°, 180° or 270°.
In the embodiments of the present application, the positioning information and the classification information of the input image are recognized in parallel by the image recognition model to obtain the recognition result, which improves processing efficiency; the weight-sharing multi-task image recognition model saves storage space; and performing the corresponding image processing operation on the image according to the recognition result ensures that the target object in the image can be accurately recognized.
In a specific implementation, feature extraction is performed on the image by the feature extractor of the image recognition model to obtain a feature image of the target object; the feature image is then input into the classifier of the image recognition model to obtain the classification information, and into the regressor of the image recognition model to obtain the positioning information.
Fig. 3 is a schematic flowchart of an image recognition model according to an embodiment of the present disclosure. As shown in fig. 3, an image 300 containing a target object 301 is input into the feature extractor 310, a feature image is obtained by feature extraction, and the feature image is input into the classifier 320 and the regressor 330, respectively. The classifier 320 outputs the classification information of the target object or the image based on the feature image, for example a 3-dimensional probability over the preset categories: error object (non_idcard) 0.01, front side in the image (front_idcard) 0.02, and back side in the image (back_idcard) 0.97. The regressor 330 outputs the positioning information based on the feature image, for example 8-dimensional coordinate information [x1, y1, x2, y2, x3, y3, x4, y4], where (x1, y1), (x2, y2), (x3, y3) and (x4, y4) are the coordinates of the 4 corner points of the target object in the image, respectively.
Fig. 4 is a schematic diagram of the feature extraction flow of a feature extraction network according to an embodiment of the present application. For example, the feature extractor may be a feature extraction network that sequentially performs channel separation, convolution, concatenation and channel shuffle on the input image to obtain and output a feature image, as shown in fig. 4. Optionally, the convolution processing includes 5 convolution stages, whose output channel dimensions are set to 24, 36, 72, 144 and 512 in sequence. It should be appreciated that the architecture of the feature extraction network is similar to that of the lightweight convolutional neural network ShuffleNet V2.
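The channel shuffle step can be illustrated with a minimal sketch, assuming ShuffleNet-style group shuffling (reshape into groups, transpose, flatten); the group count below is illustrative, not specified by the embodiment:

```python
def channel_shuffle(channels, groups):
    """Reorder a flat list of channel maps the way a ShuffleNet-style
    channel shuffle does: view as (groups, n), transpose, flatten.
    `channels` here are placeholders for per-channel feature maps."""
    n = len(channels) // groups
    assert n * groups == len(channels), "channel count must divide by groups"
    # Element i of group g (index g*n + i) moves to position i*groups + g.
    return [channels[g * n + i] for i in range(n) for g in range(groups)]

# 6 channels in 2 groups: [0, 1, 2 | 3, 4, 5] -> [0, 3, 1, 4, 2, 5]
print(channel_shuffle([0, 1, 2, 3, 4, 5], 2))  # [0, 3, 1, 4, 2, 5]
```

This interleaving lets information flow between the channel groups produced by the channel separation step, which is the purpose of the shuffle in fig. 4.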
The feature image is input into the classifier and the regressor of the image recognition model, respectively; the classifier outputs the classification information of the image, and the regressor outputs the positioning information of the target object in the image.
For example, the classifier can be regarded as a mapping function G whose input is the above feature image F, so the classification output of the classifier is G(F). The classifier is highly extensible: its output dimension can be expanded to n dimensions as required to meet an n-class classification requirement.
Illustratively, the regressor is represented by a function P; due to the multi-task learning mechanism, the input of P is also the feature image F, so the regression output of the regressor is P(F). Since a certificate-class target object is generally rectangular with four corner points, the output dimension of the fully connected layer is 8, and since its input is the shared feature image extracted by the feature extractor, this point-regression approach can greatly reduce the model size: the resulting model is generally within 1 megabyte, which meets the needs of electronic devices with small storage space, such as mobile phones and smart wearables. If the positioning problem were not converted into a corner-point regression problem but solved with a traditional object detection or segmentation method, the model size would typically range from dozens to hundreds of megabytes and could not fit within the storage constraints of mobile terminal devices.
On the basis of any of the above embodiments, the embodiments of the present application propose the following three possible implementations as to how to execute the corresponding image processing operation based on the recognition result:
the first method is as follows: and extracting the target object from the image based on the positioning information in the identification result.
It should be understood that when an image acquisition device captures an image of a certificate-class target object, a tilt angle inevitably exists between the acquisition device and the target object. As shown in fig. 5a, the outline of the target object then appears as a trapezoid in the image, and to accurately recognize the target object, its appearance must first be corrected so that it becomes a regular quadrilateral as shown in fig. 5b.
Fig. 6 is a schematic flowchart of an image processing method according to an embodiment of the present application. As shown in fig. 6, the method includes:
s601: and performing affine transformation on the image based on the coordinates of the plurality of corner points to obtain a plurality of transformed corner point coordinates.
For example, a rotation matrix may be determined based on the coordinates of the plurality of corner points: the geometric center of the image is obtained, the rotation matrix is derived from the geometric center and the corner point coordinates of the target object, and the image is affine-transformed based on the rotation matrix to obtain the transformed corner point coordinates.
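The corner-point transform in S601 can be sketched as a plain 2-D rotation about a center point. This is a simplified stand-in for the full affine transform, which would also translate, scale and resample pixels:

```python
import math

def rotate_points(points, center, angle_deg):
    """Apply a 2-D rotation matrix about `center` to each corner point,
    the core of the corner-coordinate transform in S601 (simplified:
    translation and scaling components of the affine map are omitted)."""
    a = math.radians(angle_deg)
    cos_a, sin_a = math.cos(a), math.sin(a)
    cx, cy = center
    out = []
    for x, y in points:
        dx, dy = x - cx, y - cy
        out.append((cx + dx * cos_a - dy * sin_a,
                    cy + dx * sin_a + dy * cos_a))
    return out

# Rotate the four corners of a 80 x 50 rectangle 90 degrees about its center.
corners = [(10, 10), (90, 10), (90, 60), (10, 60)]
rotated = rotate_points(corners, (50, 35), 90)
print([(round(x), round(y)) for x, y in rotated])
# [(75, -5), (75, 75), (25, 75), (25, -5)]
```

Applying the same matrix to every pixel of the image (with interpolation) would give the transformed image from which S602 then extracts the target object.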
S602: and extracting the target object from the transformed image based on the transformed plurality of corner point coordinates.
In this step, extracting the target object may be understood as segmenting the image to obtain a foreground image including the target object and a background image not including the target object.
Further, image recognition may be performed on the foreground image including the target object, for example, to recognize content such as characters and images in the target object.
The second way: determine the state of the target object based on the classification information in the recognition result, and perform a corresponding image processing operation based on that state.
Fig. 7 is a flowchart illustrating an image processing method according to an embodiment of the present application. As shown in fig. 7, the method includes:
s701: and determining at least one category to which the target object belongs based on the probability of the target object on each preset category and a preset threshold value.
S702: and determining the state of the target object to be a normal state, an abnormal state or a state to be corrected based on the at least one category.
For example, for each preset category, if the probability of the target object in that category is greater than the preset threshold, the category is determined to be one to which the target object belongs; if the probability is less than or equal to the threshold, the target object is determined not to belong to that category. It should be understood that the target object may belong to one or more categories.
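The thresholding rule of S701 can be sketched as follows; the category names and the 0.5 threshold are illustrative assumptions, since the embodiment leaves the threshold value open:

```python
def categories_above_threshold(probs, threshold=0.5):
    """Return every preset category whose probability exceeds the
    threshold (S701); a target object may belong to several at once."""
    return [name for name, p in probs.items() if p > threshold]

# Illustrative probabilities for a back-side certificate that is upright.
probs = {"non_idcard": 0.01, "front_idcard": 0.02, "back_idcard": 0.97,
         "rotation_0": 0.91}
print(categories_above_threshold(probs))  # ['back_idcard', 'rotation_0']
```

Because each category is tested independently against the threshold, orientation and rotation categories can both be assigned to the same image, which is exactly what S702 relies on.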
Assume the preset categories include error object, front side in the image, and back side in the image, and the probabilities of the target object are 0.01 for error object, 0.02 for front side and 0.97 for back side; then the at least one category to which the target object belongs is the back side being presented in the image. If the preset orientation is the back side, that is, the back side of the target object needs to be acquired, the state of the target object is a normal state; if the preset orientation is the front side, that is, the front side of the target object needs to be acquired, the state of the target object is an abnormal state.
Illustratively, when the preset categories include one or more of the front side presented in the image, the back side presented in the image, an error object, a remake, a screenshot, a rotation angle of 0°, a rotation angle of 90°, a rotation angle of 180°, and a rotation angle of 270°, the state of the target object may be determined as follows:
if the at least one category includes presenting a front side on the image or presenting a back side on the image, determining whether an orientation of the target object on the image is consistent with a preset orientation.
And if at least one category comprises an error object, a reproduction or a screenshot, or the orientation of the target object on the image is not consistent with the preset orientation, the target object is in an abnormal state.
If the at least one category does not comprise the error object, the copying and the screenshot, and the orientation of the target object on the image is consistent with the preset orientation, determining a rotation angle category which is included by the at least one category; the rotation angle category is one of a rotation angle of 0 °, a rotation angle of 90 °, a rotation angle of 180 °, or a rotation angle of 270 °.
Further, if the rotation angle category is a rotation angle of 90 °, a rotation angle of 180 °, or a rotation angle of 270 °, the target object is in a state to be corrected; if the rotation angle type is a rotation angle of 0 °, the state of the target object is a normal state.
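The decision order above can be condensed into a small function; the category labels are hypothetical identifiers introduced for this sketch, not names from the embodiment:

```python
# Hypothetical labels for the preset categories described in the text.
ABNORMAL = {"non_idcard", "remake", "screenshot"}
TO_CORRECT = {"rotation_90", "rotation_180", "rotation_270"}

def target_state(categories, orientation_ok=True):
    """Map the detected category set to normal / abnormal / to_correct,
    following the decision order of the embodiment: abnormal categories
    (or a wrong orientation) take priority, then rotation correction."""
    if (set(categories) & ABNORMAL) or not orientation_ok:
        return "abnormal"
    if set(categories) & TO_CORRECT:
        return "to_correct"
    return "normal"

print(target_state(["back_idcard", "rotation_180"]))          # to_correct
print(target_state(["back_idcard"], orientation_ok=False))    # abnormal
```

The orientation check is passed in as a boolean here because comparing the detected side against the preset orientation is application-specific.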
Illustratively, on the basis of the above embodiment, performing the corresponding image processing operation based on the state of the target object includes: if the state is an abnormal state, generating indication information for indicating that the image containing the target object should be reacquired; if the state is the state to be corrected, rotating the image so that the rotation angle of the target object becomes 0°; and if the state is a normal state, continuing with the next recognition operation on the image, storing the image, or performing no processing.
The third way: perform a corresponding image processing operation based on both the classification information and the positioning information in the recognition result.
Illustratively, the state of the target object is determined based on the classification information in the recognition result, and the corresponding image processing operation is performed based on the state of the target object.
The specific process of determining the state of the target object based on the classification information in the recognition result is similar to the above embodiment and is not repeated here.
Illustratively, if the state of the target object is an abnormal state, indication information is generated for indicating that the image containing the target object should be reacquired; if the state is the state to be corrected, the image is rotated so that the rotation angle of the target object becomes 0°; and if the state is a normal state, the target object is extracted from the image based on the positioning information.
The process of extracting the target object from the image based on the positioning information is similar to the above embodiment, and is not described herein again.
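The third implementation's dispatch on the recognized state can be sketched as follows, with the three image operations left as placeholder callbacks, since their concrete form (UI prompt, rotation routine, crop) depends on the application:

```python
def process(state, image, corners,
            extract, rotate_upright, request_recapture):
    """Dispatch on the recognized state (third implementation): abnormal
    images trigger reacquisition, rotated ones are corrected, and normal
    ones are cropped to the target object using the corner coordinates.
    The three callbacks are placeholders for the operations in the text."""
    if state == "abnormal":
        return request_recapture()
    if state == "to_correct":
        return rotate_upright(image)
    return extract(image, corners)

# Illustrative usage with stub callbacks.
result = process("normal", "img", [(0, 0), (1, 0), (1, 1), (0, 1)],
                 extract=lambda img, c: ("cropped", len(c)),
                 rotate_upright=lambda img: "rotated",
                 request_recapture=lambda: "recapture")
print(result)  # ('cropped', 4)
```

Note that only the normal branch consumes the positioning information, matching the embodiment's description that extraction happens once the state is known to be normal.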
Fig. 8 is a schematic structural diagram of an electronic device 800 according to an embodiment of the present application, and as shown in fig. 8, the electronic device 800 includes:
the processing unit 810 is configured to input an image including a target object into an image recognition model to obtain positioning information and classification information, where the positioning information includes coordinates of a plurality of corner points of the target object, the classification information includes a probability of the target object in each preset category, and the image recognition model is obtained by training based on a pre-established multi-task learning model;
the processing unit 810 is further configured to perform a corresponding image processing operation based on the positioning information or the classification information.
The electronic device 800 provided by this embodiment includes a processing unit 810, which recognizes the positioning information and the classification information of an input image in parallel through the image recognition model to obtain a recognition result, improving processing efficiency; the weight-sharing multi-task image recognition model saves storage space, and performing the corresponding image processing operation according to the recognition result ensures that the target object in the image can be accurately recognized.
In one possible design, the processing unit 810 is specifically configured to:
performing feature extraction on the image through a feature extractor of the image recognition model to obtain a feature image of the target object;
and inputting the characteristic image into a classifier of the image recognition model to obtain classification information, and inputting the characteristic image into a regressor of the image recognition model to obtain positioning information.
Optionally, the feature extractor includes 5 convolution stages, and the output channel dimensions of the 5 convolution stages are 24, 36, 72, 144, and 512 in sequence.
Optionally, the classifier and the regressor are both fully connected layers connected to the feature extractor.
In one possible design, the processing unit 810 is specifically configured to:
extracting a target object from the image based on the positioning information in the recognition result;
or,
determining the state of the target object based on the classification information in the recognition result; based on the state of the target object, a corresponding image processing operation is performed.
In one possible design, the processing unit 810 is specifically configured to:
performing affine transformation on the image based on the coordinates of the multiple corner points to obtain transformed coordinates of the multiple corner points;
and extracting the target object from the image based on the transformed corner point coordinates.
In one possible design, the processing unit 810 is specifically configured to:
determining at least one category to which the target object belongs based on the probability of the target object on each preset category and a preset threshold;
and determining the state of the target object to be a normal state, an abnormal state or a state to be corrected based on the at least one category.
In one possible design, the processing unit 810 is specifically configured to:
if the at least one category includes the front side or the back side being presented in the image, determining whether the orientation of the target object in the image is consistent with a preset orientation;
if the at least one category includes an error object, a remake or a screenshot, or the orientation of the target object in the image is inconsistent with the preset orientation, the target object is in an abnormal state;
if the at least one category includes none of error object, remake and screenshot, and the orientation of the target object in the image is consistent with the preset orientation, determining the rotation angle category included in the at least one category, where the rotation angle category is one of a rotation angle of 0°, 90°, 180° or 270°;
if the rotation angle category is 90°, 180° or 270°, the target object is in the state to be corrected;
if the rotation angle category is 0°, the target object is in the normal state.
In one possible design, the processing unit 810 is specifically configured to:
if the state of the target object is an abnormal state, generating indication information, wherein the indication information is used for indicating to obtain the image containing the target object again;
if the state of the target object is the state to be corrected, rotating the image to enable the rotation angle of the target object to be 0 degrees;
and if the state of the target object is a normal state, extracting the target object from the image based on the positioning information.
The electronic device provided in this embodiment can be used to implement the method in any of the above embodiments, and the implementation effect is similar to that of the method embodiment, and is not described here again.
Fig. 9 is a schematic hardware structure diagram of an electronic device 900 according to an embodiment of the present application. As shown in fig. 9, in general, the electronic apparatus 900 includes: a processor 910 and a memory 920.
The processor 910 may include one or more processing cores, such as a 4-core or 8-core processor. The processor 910 may be implemented in at least one hardware form of a DSP (Digital Signal Processor), an FPGA (Field-Programmable Gate Array), and a PLA (Programmable Logic Array). The processor 910 may also include a main processor and a coprocessor: the main processor, also called a Central Processing Unit (CPU), processes data in the awake state, while the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 910 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 910 may further include an AI (Artificial Intelligence) processor for handling computing operations related to machine learning.
Memory 920 may include one or more computer-readable storage media, which may be non-transitory. Memory 920 may also include high speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 920 is used to store at least one instruction for execution by processor 910 to implement the methods provided by the method embodiments herein.
Optionally, as shown in fig. 9, the electronic device 900 may further include a transceiver 930, and the processor 910 may control the transceiver 930 to communicate with other devices, and in particular, may transmit information or data to the other devices or receive information or data transmitted by the other devices.
The transceiver 930 may include a transmitter and a receiver. The transceiver 930 may further include one or more antennas.
Optionally, the electronic device 900 may implement corresponding processes in the methods of the embodiments of the present application, and for brevity, details are not described here again.
Those skilled in the art will appreciate that the configuration shown in fig. 9 does not constitute a limitation of the electronic device 900, and may include more or fewer components than those shown, or combine certain components, or employ a different arrangement of components.
Embodiments of the present application also provide a non-transitory computer-readable storage medium, where instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the method provided by the above embodiments.
The computer-readable storage medium in this embodiment may be any available medium that can be accessed by a computer or a data storage device such as a server, a data center, etc. that is integrated with one or more available media, and the available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), or semiconductor media (e.g., SSDs), etc.
Those of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be completed by a program instructing relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments; the aforementioned storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk or an optical disk.
The embodiment of the present application also provides a computer program product containing instructions, which when run on a computer, causes the computer to execute the method provided by the above embodiment.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.