CN112651395A - Image processing method and device


Info

Publication number: CN112651395A
Application number: CN202110032170.8A
Authority: CN (China)
Prior art keywords: image, target object, state, rotation angle, category
Other languages: Chinese (zh)
Inventors: 杨航 (Yang Hang), 杨青 (Yang Qing)
Current assignee: Du Xiaoman Technology (Beijing) Co., Ltd.
Original assignee: Shanghai Youyang New Media Information Technology Co., Ltd.
Priority and filing date: 2021-01-11
Publication date: 2021-04-13
Legal status: Pending

Classifications

    • G06V 10/22 Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide the detection or recognition
    • G06F 18/24 Classification techniques (pattern recognition; analysing)
    • G06T 3/02 Affine transformations (geometric image transformations in the plane of the image)
    • G06T 7/194 Segmentation; edge detection involving foreground-background segmentation (image analysis)
    • G06V 10/462 Salient features, e.g. scale invariant feature transforms [SIFT] (extraction of image or video features)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

An image containing a target object is input into an image recognition model to obtain a recognition result for the target object. The recognition result comprises positioning information and classification information: the positioning information represents the position of the target object in the image, and the classification information represents the probability of the target object belonging to each preset category. The image recognition model is trained from a pre-established multi-task learning model. A corresponding image processing operation is then performed based on the recognition result. The method improves image recognition efficiency and saves storage resources.

Description

Image processing method and device
Technical Field
The present application relates to the field of image processing technologies, and in particular, to a method and an apparatus for processing an image.
Background
At present, certificates uploaded by users are collected and identified over the internet in order to determine identity and authority. To achieve accurate identification, the position of the certificate in the image, and whether the certificate in the image satisfies the conditions for identification, need to be determined before recognition. Although the prior art can locate a certificate or identify abnormal states of a certificate, higher processing efficiency and a smaller storage footprint are still needed.
Disclosure of Invention
The application provides an image processing method and device, which can accurately identify an image while occupying only a small amount of storage space.
In a first aspect, an embodiment of the present application provides an image processing method, including:
inputting an image containing a target object into an image recognition model to obtain a recognition result for the target object, wherein the recognition result comprises positioning information and classification information, the positioning information represents the position of the target object in the image, the classification information represents the probability of the target object belonging to each preset category, and the image recognition model is trained from a pre-established multi-task learning model;
based on the recognition result, a corresponding image processing operation is performed.
In a second aspect, an embodiment of the present application provides an apparatus for processing an image, including:
the image recognition system comprises a processing unit, a processing unit and a processing unit, wherein the processing unit is used for inputting an image containing a target object into an image recognition model to obtain positioning information and classification information, the positioning information comprises a plurality of corner point coordinates of the target object, the classification information comprises the probability of the target object on each preset class, and the image recognition model is obtained by training based on a pre-established multi-task learning model;
the processing unit is further configured to perform a corresponding image processing operation based on the positioning information or the classification information.
In a third aspect, an embodiment of the present application provides an electronic device, including: a memory and a processor;
the memory stores computer-executable instructions;
the processor executes the computer-executable instructions stored by the memory, causing the processor to perform the method of the first aspect or embodiments thereof.
In a fourth aspect, an embodiment of the present application provides a storage medium, including: a readable storage medium and a computer program for implementing the method of the first aspect or implementations thereof.
According to the image processing method and device of the present application, the positioning information and the classification information of an input image are recognized in parallel by the image recognition model to obtain a recognition result, which improves processing efficiency; the image recognition model, built on a shared-weight multi-task learning mechanism, saves storage space; and a corresponding image processing operation is performed on the image according to the recognition result, ensuring that the target object in the image can be recognized accurately.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed to be used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to these drawings without inventive exercise.
Fig. 1a is a schematic structural diagram of an electronic device 100 according to an embodiment of the present disclosure;
fig. 1b is a schematic structural diagram of an electronic device 100 according to an embodiment of the present disclosure;
fig. 2 is a flowchart illustrating an image processing method 200 according to an embodiment of the present disclosure;
fig. 3 is a schematic flowchart of an image recognition model according to an embodiment of the present application;
fig. 4 is a schematic diagram of a feature extraction flow of a feature extraction network according to an embodiment of the present application;
fig. 5a and fig. 5b are schematic diagrams illustrating an effect of an affine transformation provided in the embodiment of the present application;
fig. 6 is a schematic flowchart of an image processing method according to an embodiment of the present application;
fig. 7 is a schematic flowchart of an image processing method according to an embodiment of the present application;
fig. 8 is a schematic structural diagram of an electronic device 800 according to an embodiment of the present disclosure;
fig. 9 is a schematic hardware structure diagram of an electronic device 900 according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In various fields, for actions a user wishes to perform, such as transactions, investments and ratings, the user uploads an image containing a target object, generally a certificate, from which the user's identity, authority, capability and the like are identified.
In order to accurately acquire the content of the certificate, the target object in the image needs to be located and classified before recognition, so as to determine whether the target object in the image can be accurately identified. At present, some schemes can locate the target object in an image, and other schemes can classify it; however, obtaining the positioning and the classification of the target object separately in these two ways is time-consuming, occupies a large amount of storage space, and is easily limited by the available storage space, which makes it particularly difficult to apply on terminal devices.
For the above scenarios, the embodiments of the present application acquire the positioning information and the classification information of the target object in the image in parallel based on a multi-task learning model, and process the image according to the positioning information and/or the classification information, so that the image can be accurately recognized.
The technical solutions of the embodiments of the present application can be applied to various electronic devices. The electronic device may be a terminal device, such as a mobile phone, a tablet computer (Pad), a computer, a Virtual Reality (VR) terminal device, an Augmented Reality (AR) terminal device, a terminal device in industrial control, a terminal device in unmanned (self-driving) vehicles, a terminal device in remote medical treatment, a terminal device in a smart city, a terminal device in a smart home, and the like. The terminal device in the embodiments of the present application may also be a wearable device, also called a wearable smart device: a general term for devices designed and developed with wearable technology for daily wear, such as glasses, gloves, watches, clothing and shoes. A wearable device is a portable device that is worn directly on the body or integrated into the user's clothing or accessories. The terminal device may be fixed or mobile.
For example, the electronic device in the embodiment of the present application may also be a server, and when the electronic device is the server, the electronic device may receive an image acquired by the terminal device and perform image processing on the image.
Fig. 1a and fig. 1b are schematic structural diagrams of an electronic device 100 according to an embodiment of the present disclosure. As shown in fig. 1a, the electronic device 100 comprises: an image acquisition unit 101, a preprocessing unit 102, an image recognition unit 103 and a post-processing unit 104, connected in sequence.
The image acquisition unit 101 is configured to acquire an image to be recognized, containing, for example, a certificate-class object. For example, it may receive an image of the target object captured by an image acquisition device, an image containing the target object transmitted by another device, or an image containing the target object input by the user, which is not limited in the embodiments of the present application.
The preprocessing unit 102 receives the image to be recognized sent by the image acquisition unit 101 and performs preprocessing operations on it so that it meets the input requirements of the image recognition unit 103, for example, adjusting the image to a preset size such as 224 × 224 pixels.
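By way of illustration only, a minimal preprocessing sketch in Python follows; it assumes OpenCV and a [0, 1] scaling scheme, neither of which is mandated by this embodiment, and uses the 224 × 224 example size above.

```python
import cv2
import numpy as np

def preprocess(image: np.ndarray, size: int = 224) -> np.ndarray:
    """Adjust an image to the preset input size of the image recognition unit 103."""
    resized = cv2.resize(image, (size, size))  # e.g. 224 x 224 pixels, as above
    return resized.astype(np.float32) / 255.0  # assumed normalization; not specified here
```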
The image recognition unit 103 receives the image containing the target object sent by the preprocessing unit 102, and recognizes the image to obtain the positioning information and the classification information of the target object.
Illustratively, as shown in fig. 1b, the image recognition unit 103 includes a feature extractor 1031, a classifier 1032 and a regressor 1033; optionally, the image recognition unit 103 may be an image recognition model trained from a multi-task learning model. The feature extractor 1031 is connected to the classifier 1032 and the regressor 1033 respectively; illustratively, the classifier 1032 and the regressor 1033 are both fully connected layers, that is, each node in the top layer of the classifier 1032 or the regressor 1033 is connected to all nodes in the bottom layer of the feature extractor 1031. The feature extractor 1031 is configured to perform feature extraction on the image to obtain a feature image; the classifier 1032 and the regressor 1033 both acquire the feature image from the feature extractor 1031 and process it in parallel. The classifier 1032 outputs the classification information of the target object according to the feature image, and the regressor 1033 outputs the positioning information of the target object according to the feature image.
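The embodiment discloses no source code; the following PyTorch sketch merely mirrors the structure described above (a shared feature extractor 1031 feeding two parallel fully connected heads), with illustrative class names and dimensions.

```python
import torch
import torch.nn as nn

class MultiTaskRecognizer(nn.Module):
    """Sketch of fig. 1b: shared feature extractor, parallel classifier and regressor heads."""
    def __init__(self, backbone: nn.Module, feat_dim: int = 512,
                 num_classes: int = 3, num_corners: int = 4):
        super().__init__()
        self.backbone = backbone                               # feature extractor 1031 (shared weights)
        self.classifier = nn.Linear(feat_dim, num_classes)     # classifier 1032, fully connected
        self.regressor = nn.Linear(feat_dim, num_corners * 2)  # regressor 1033, 8-dim corner output

    def forward(self, x: torch.Tensor):
        feature = self.backbone(x).flatten(1)                   # shared feature image F, assumed pooled
        probs = torch.softmax(self.classifier(feature), dim=1)  # classification information
        corners = self.regressor(feature)                       # positioning information [x1, y1, ..., x4, y4]
        return probs, corners
```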
The post-processing unit 104 receives the classification information and the positioning information sent by the image recognition unit 103, and performs a corresponding image processing operation based on the classification information and/or the positioning information.
The present application is specifically illustrated by the following examples.
Fig. 2 is a flowchart illustrating an image processing method 200 according to an embodiment of the present disclosure.
In order to improve the efficiency of image recognition and save storage resources, the embodiment of the application determines the positioning information and the classification information of the target object based on the image recognition model obtained by pre-training, and executes corresponding image processing operation based on the positioning information and/or the classification information, so that the image can be accurately recognized.
As shown in fig. 2, the image processing method provided in the embodiment of the present application includes:
s201: and inputting the image containing the target object into the image recognition model to obtain a recognition result of the target object.
The identification result comprises positioning information and classification information.
The positioning information includes coordinates of a plurality of corner points of the target object. When the target object is a certificate-class object, it generally has four corner points; optionally, the coordinates of each corner point are two-dimensional, i.e. an x value and a y value, so the positioning information is 8-dimensional.
The classification information includes the probability of the target object on each preset category and is used to characterize the state of the target object or of the image. Optionally, the preset categories include at least one of: a front side presented in the image, a back side presented in the image, an erroneous object, a reproduction, a screenshot, a rotation angle of 0°, a rotation angle of 90°, a rotation angle of 180°, and a rotation angle of 270°.
Depending on the application scenario, the classification information may include more or fewer categories than the above examples. Illustratively, the classification information includes at least one of: a front side presented in the image, a back side presented in the image, and an erroneous object.
The image recognition model is trained from a pre-established multi-task learning model. It should be understood that the multi-task learning model is a network model built on a multi-task learning mechanism: it has shared weights and can run a plurality of branches in parallel.
In this step, the pre-trained image recognition model determines the positioning information and the classification information in parallel; in other words, by inputting the image containing the target object into the image recognition model, both the positioning information and the classification information output by the model are obtained.
S202: based on the recognition result, a corresponding image processing operation is performed.
For example, a corresponding image processing operation may be performed based on the positioning information in the recognition result: the target object may be extracted from the image based on the positioning information, in other words, the background region not containing the target object may be removed based on the positioning information.
For another example, if the state of the target object is determined to be a normal state based on the classification information, the next recognition operation is performed on the image, or the image is stored, or no processing is performed; if the state is determined to be an abnormal state, indication information is generated to indicate that an image containing the target object should be reacquired; and if the state is determined to be a state to be corrected, the image is corrected.
Both of the above examples may be performed in parallel.
In another example, the image processing operation may be based on both the positioning information and the classification information in the recognition result: for example, the state of the target object is determined from the classification information, and when the state of the target object is a normal state, the target object is extracted from the image based on the positioning information.
Optionally, the abnormal state includes: the front side or the back side presented in the image is inconsistent with a preset orientation, for example, the preset orientation requires the front side of the target object but the image presents its back side; the target object is an erroneous object, for example, the image does not contain the required target object; the image containing the target object is a reproduction, for example, the captured target object is a target object in another image rather than the physical target object itself; or the image containing the target object is obtained from a screenshot; and so on.
Optionally, the normal state includes: the front side or the back side presented in the image is consistent with the preset orientation, and the rotation angle of the target object is 0°.
Optionally, the state to be corrected includes: the rotation angle of the target object is 90°, 180° or 270°.
In the embodiments of the present application, the positioning information and the classification information of the input image are recognized in parallel by the image recognition model to obtain the recognition result, which improves processing efficiency; the image recognition model, with its shared-weight multi-task learning mechanism, saves storage space; and performing the corresponding image processing operation on the image according to the recognition result ensures that the target object in the image can be recognized accurately.
In a specific implementation, feature extraction is performed on the image by the feature extractor of the image recognition model to obtain a feature image of the target object; the feature image is then input into the classifier of the image recognition model to obtain the classification information, and into the regressor of the image recognition model to obtain the positioning information.
Fig. 3 is a schematic flowchart of an image recognition model according to an embodiment of the present disclosure. As shown in fig. 3, an image 300 containing a target object 301 is input into the feature extractor 310, a feature image is obtained by feature extraction, and the feature image is input into the classifier 320 and the regressor 330 respectively. The classifier 320 outputs the classification information of the target object or the image based on the feature image, for example a 3-dimensional probability over the preset categories: erroneous object (non_idcard) 0.01, front side in the image (front_idcard) 0.02, back side in the image (back_idcard) 0.97. The regressor 330 outputs the positioning information based on the feature image, for example 8-dimensional coordinate information [x1, y1, x2, y2, x3, y3, x4, y4], where (x1, y1), (x2, y2), (x3, y3), (x4, y4) are the coordinates of the 4 corner points of the target object in the image.
Fig. 4 is a schematic diagram of the feature extraction flow of a feature extraction network according to an embodiment of the present application. For example, the feature extractor may be a feature extraction network that, as shown in fig. 4, sequentially performs channel-separation processing, convolution processing, concatenation processing and channel-shuffle processing on the input image to obtain and output a feature image. Optionally, the convolution processing includes 5 convolution stages, whose output channel dimensions are set by parameters to 24, 36, 72, 144 and 512 in sequence. It should be appreciated that the architecture of the feature extraction network is similar to the lightweight convolutional neural network ShuffleNetV2.
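The channel-shuffle step is the characteristic ShuffleNetV2 operation; a minimal sketch follows, assuming the channel count is divisible by the number of groups.

```python
import torch

def channel_shuffle(x: torch.Tensor, groups: int = 2) -> torch.Tensor:
    """Interleave channels across groups after the split/convolve/concatenate steps."""
    n, c, h, w = x.shape
    x = x.view(n, groups, c // groups, h, w)  # separate channels into groups
    x = x.transpose(1, 2).contiguous()        # swap the group and per-group channel axes
    return x.view(n, c, h, w)                 # channels of different branches are now mixed
```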
The feature image is input into the classifier and the regressor of the image recognition model respectively; the classifier outputs the classification information of the image, and the regressor outputs the positioning information of the target object in the image.
For example, the classifier can be regarded as a mapping function G whose input is the above feature image F, so the classification output of the classifier is G(F). The classifier is highly extensible: its output dimension can be expanded to n dimensions as required, to meet an n-class classification requirement.
Illustratively, the regressor is represented by a function P; owing to the multi-task learning mechanism, the input of P is also the feature image F, so the regression output of the regressor is P(F). Certificate-class target objects are generally rectangular with four corner points, so the output dimension of the fully connected layer is 8, and its input is the shared feature image extracted by the feature extractor. The point-regression approach therefore greatly reduces the model size: the final model is generally within about 1 MB, which suits electronic devices with little storage space, such as mobile phones and smart wearables. If the positioning problem were not converted into a corner-point regression problem but solved with a conventional object detection or segmentation method, the model size would typically range from tens to hundreds of megabytes and could not fit within the storage constraints of mobile terminal devices.
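To see why point regression keeps the heads small, the parameters of the two fully connected layers can be counted directly, assuming the 512-channel features above are globally pooled into a 512-dimensional vector (the backbone itself is not counted):

```python
feat_dim = 512                  # output channel dimension of the last convolution stage
cls_params = feat_dim * 3 + 3   # classifier weights + biases for the 3-dimensional example output
reg_params = feat_dim * 8 + 8   # regressor weights + biases for the 8-dimensional corner output
print(cls_params, reg_params)   # 1539 and 4104 parameters, a few kilobytes in float32
```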
On the basis of any of the above embodiments, the embodiments of the present application propose the following three possible implementations as to how to execute the corresponding image processing operation based on the recognition result:
the first method is as follows: and extracting the target object from the image based on the positioning information in the identification result.
It should be understood that when an image acquisition device captures an image of a certificate-class target object, a tilt angle inevitably exists between the device and the target object; as shown in fig. 5a, the outline of the target object then appears as a trapezoid in the image. In order to accurately recognize the target object, its appearance must first be corrected so that it is converted into a regular quadrangle as shown in fig. 5b.
Fig. 6 is a schematic flowchart of an image processing method according to an embodiment of the present application. As shown in fig. 6, the method includes:
s601: and performing affine transformation on the image based on the coordinates of the plurality of corner points to obtain a plurality of transformed corner point coordinates.
For example, a rotation matrix may be determined based on the coordinates of the plurality of corner points: the geometric center of the image is obtained, the rotation matrix is computed from the geometric center and the corner-point coordinates of the target object, and an affine transformation is applied to the image based on the rotation matrix to obtain the transformed corner-point coordinates.
S602: and extracting the target object from the transformed image based on the transformed plurality of corner point coordinates.
In this step, extracting the target object may be understood as segmenting the image into a foreground image containing the target object and a background image not containing the target object.
Further, image recognition may be performed on the foreground image including the target object, for example, to recognize content such as characters and images in the target object.
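As a hedged illustration of mode one: the embodiment describes an affine transformation driven by a rotation matrix, whereas the sketch below uses OpenCV's perspective warp, a common stand-in when four arbitrary corner points must map to an upright rectangle; the corner ordering and output size are assumptions, not part of this embodiment.

```python
import cv2
import numpy as np

def extract_target(image: np.ndarray, corners: np.ndarray,
                   out_w: int = 400, out_h: int = 252) -> np.ndarray:
    """Warp the quadrilateral spanned by the four corner points to an upright rectangle.

    corners: (4, 2) float array ordered top-left, top-right, bottom-right, bottom-left.
    """
    dst = np.array([[0, 0], [out_w - 1, 0],
                    [out_w - 1, out_h - 1], [0, out_h - 1]], dtype=np.float32)
    matrix = cv2.getPerspectiveTransform(corners.astype(np.float32), dst)
    # The warped result is the foreground containing the target object;
    # the background not containing the target object is cropped away.
    return cv2.warpPerspective(image, matrix, (out_w, out_h))
```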
The second method is as follows: determining the state of the target object based on the classification information in the recognition result, and performing the corresponding image processing operation based on the state of the target object.
Fig. 7 is a schematic flowchart of an image processing method according to an embodiment of the present application. As shown in fig. 7, the method includes:
s701: and determining at least one category to which the target object belongs based on the probability of the target object on each preset category and a preset threshold value.
S702: and determining the state of the target object to be a normal state, an abnormal state or a state to be corrected based on the at least one category.
For example, for each preset category, if the probability of the target object in that preset category is greater than a preset threshold, the preset category is determined to be a category to which the target object belongs; if the probability is less than or equal to the preset threshold, the target object is determined not to belong to that preset category. It should be understood that the target object may belong to one or more categories.
Assume the preset categories comprise an erroneous object, a front side presented in the image, and a back side presented in the image, and that the probabilities of the target object are 0.01 for the erroneous object, 0.02 for the front side and 0.97 for the back side; then the category to which the target object belongs is the back side presented in the image. If the preset orientation is the back side, i.e. the back of the target object needs to be captured, the state of the target object is a normal state; if the preset orientation is the front side, i.e. the front of the target object needs to be captured, the state of the target object is an abnormal state.
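A minimal sketch of the thresholding step S701, under an assumed threshold of 0.5 (the embodiment does not fix a value) and reusing the example probabilities above:

```python
def categories_above_threshold(probs: dict, threshold: float = 0.5) -> list:
    """Return every preset category whose probability exceeds the preset threshold."""
    return [name for name, p in probs.items() if p > threshold]

# Example from the text: only the back side exceeds the threshold.
print(categories_above_threshold(
    {"non_idcard": 0.01, "front_idcard": 0.02, "back_idcard": 0.97}))  # ['back_idcard']
```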
Illustratively, when the preset categories include one or more of: a front side presented in the image, a back side presented in the image, an erroneous object, a reproduction, a screenshot, a rotation angle of 0°, a rotation angle of 90°, a rotation angle of 180°, and a rotation angle of 270°, the state of the target object may be determined as follows:
If the at least one category includes a front side presented in the image or a back side presented in the image, determine whether the orientation of the target object in the image is consistent with the preset orientation.
If the at least one category includes an erroneous object, a reproduction or a screenshot, or the orientation of the target object in the image is inconsistent with the preset orientation, the state of the target object is an abnormal state.
If the at least one category includes none of erroneous object, reproduction and screenshot, and the orientation of the target object in the image is consistent with the preset orientation, determine the rotation angle category included in the at least one category; the rotation angle category is one of a rotation angle of 0°, 90°, 180° or 270°.
Further, if the rotation angle category is 90°, 180° or 270°, the target object is in a state to be corrected; if the rotation angle category is 0°, the state of the target object is a normal state.
Illustratively, on the basis of the above embodiments, performing the corresponding image processing operation based on the state of the target object includes: if the state of the target object is an abnormal state, generating indication information for indicating that an image containing the target object should be reacquired; if the state of the target object is a state to be corrected, rotating the image so that the rotation angle of the target object becomes 0°; and if the state of the target object is a normal state, continuing with the next recognition operation on the image, storing the image, or performing no processing.
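Putting the rules above together, a sketch of S702 and the subsequent dispatch; all category names and the clockwise rotation convention are illustrative assumptions, not disclosed by this embodiment.

```python
import numpy as np

def determine_state(categories: set, preset_orientation: str = "front") -> str:
    """Map the categories the target object belongs to onto one of the three states."""
    if categories & {"non_idcard", "recapture", "screenshot"}:
        return "abnormal"
    side = "front" if "front_idcard" in categories else "back"
    if side != preset_orientation:
        return "abnormal"                       # orientation inconsistent with the preset one
    if categories & {"rot_90", "rot_180", "rot_270"}:
        return "to_be_corrected"
    return "normal"                             # rotation angle of 0 degrees

def correct_rotation(image: np.ndarray, categories: set) -> np.ndarray:
    """Rotate the image so the target object's rotation angle becomes 0 degrees.

    Assumes the predicted angle is measured clockwise; np.rot90 turns counter-clockwise.
    """
    turns = {"rot_90": 1, "rot_180": 2, "rot_270": 3}
    for category, k in turns.items():
        if category in categories:
            return np.rot90(image, k)
    return image
```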
The third method is as follows: performing the corresponding image processing operation based on both the classification information and the positioning information in the recognition result.
Illustratively, the state of the target object is determined based on the classification information in the recognition result, and the corresponding image processing operation is performed based on the state of the target object.
The specific implementation process of determining the state of the target object based on the classification information in the recognition result is similar to that in the above embodiment, and is not described here again.
Illustratively, if the state of the target object is an abnormal state, generating indication information, wherein the indication information is used for indicating to reacquire an image containing the target object; if the state of the target object is the state to be corrected, rotating the image to enable the rotation angle of the target object to be 0 degrees; and if the state of the target object is a normal state, extracting the target object from the image based on the positioning information.
The process of extracting the target object from the image based on the positioning information is similar to the above embodiment, and is not described herein again.
Fig. 8 is a schematic structural diagram of an electronic device 800 according to an embodiment of the present application, and as shown in fig. 8, the electronic device 800 includes:
the processing unit 810 is configured to input an image including a target object into an image recognition model to obtain positioning information and classification information, where the positioning information includes coordinates of a plurality of corner points of the target object, the classification information includes a probability of the target object in each preset category, and the image recognition model is obtained by training based on a pre-established multi-task learning model;
the processing unit 810 is further configured to perform a corresponding image processing operation based on the positioning information or the classification information.
The electronic device 800 provided by this embodiment includes the processing unit 810, which recognizes the positioning information and the classification information of an input image in parallel through the image recognition model to obtain a recognition result, improving processing efficiency; the image recognition model, with its shared-weight multi-task learning mechanism, saves storage space, and the corresponding image processing operation performed according to the recognition result ensures that the target object in the image can be recognized accurately.
In one possible design, the processing unit 810 is specifically configured to:
performing feature extraction on the image through a feature extractor of the image recognition model to obtain a feature image of the target object;
and inputting the characteristic image into a classifier of the image recognition model to obtain classification information, and inputting the characteristic image into a regressor of the image recognition model to obtain positioning information.
Optionally, the feature extractor includes 5 convolution stages, and the output channel dimensions of the 5 convolution stages are 24, 36, 72, 144, and 512 in sequence.
Optionally, the classifier and the regressor are both fully connected layers connected to the feature extractor.
In one possible design, the processing unit 810 is specifically configured to:
extracting a target object from the image based on the positioning information in the recognition result;
or,
determining the state of the target object based on the classification information in the recognition result; based on the state of the target object, a corresponding image processing operation is performed.
In one possible design, the processing unit 810 is specifically configured to:
performing affine transformation on the image based on the coordinates of the multiple corner points to obtain transformed coordinates of the multiple corner points;
and extracting the target object from the image based on the transformed corner point coordinates.
In one possible design, the processing unit 810 is specifically configured to:
determining at least one category to which the target object belongs based on the probability of the target object on each preset category and a preset threshold;
and determining the state of the target object to be a normal state, an abnormal state or a state to be corrected based on the at least one category.
In one possible design, the processing unit 810 is specifically configured to:
if the at least one category includes presenting a front side on the image or presenting a back side on the image, determining whether an orientation of the target object on the image is consistent with a preset orientation;
if at least one category comprises an error object, a reproduction or a screenshot, or the orientation of the target object on the image is inconsistent with a preset orientation, the target object is in an abnormal state;
if the at least one category does not comprise the error object, the copying and the screenshot, and the orientation of the target object on the image is consistent with the preset orientation, determining a rotation angle category which is included by the at least one category, wherein the rotation angle category is one of a rotation angle of 0 degrees, a rotation angle of 90 degrees, a rotation angle of 180 degrees or a rotation angle of 270 degrees;
if the rotation angle type is a rotation angle of 90 degrees, a rotation angle of 180 degrees or a rotation angle of 270 degrees, the target object is in a state to be corrected;
if the rotation angle type is a rotation angle of 0 °, the state of the target object is a normal state.
In one possible design, the processing unit 810 is specifically configured to:
if the state of the target object is an abnormal state, generating indication information, wherein the indication information is used for indicating to obtain the image containing the target object again;
if the state of the target object is the state to be corrected, rotating the image to enable the rotation angle of the target object to be 0 degrees;
and if the state of the target object is a normal state, extracting the target object from the image based on the positioning information.
The electronic device provided in this embodiment can be used to implement the method in any of the above embodiments, and the implementation effect is similar to that of the method embodiment, and is not described here again.
Fig. 9 is a schematic hardware structure diagram of an electronic device 900 according to an embodiment of the present application. As shown in fig. 9, in general, the electronic apparatus 900 includes: a processor 910 and a memory 920.
The processor 910 may include one or more processing cores, such as a 4-core or an 8-core processor. The processor 910 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array) and PLA (Programmable Logic Array). The processor 910 may also include a main processor and a coprocessor: the main processor, also called a Central Processing Unit (CPU), processes data in the awake state, while the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 910 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 910 may further include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
Memory 920 may include one or more computer-readable storage media, which may be non-transitory. Memory 920 may also include high speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 920 is used to store at least one instruction for execution by processor 910 to implement the methods provided by the method embodiments herein.
Optionally, as shown in fig. 9, the electronic device 900 may further include a transceiver 930, and the processor 910 may control the transceiver 930 to communicate with other devices, and in particular, may transmit information or data to the other devices or receive information or data transmitted by the other devices.
The transceiver 930 may include a transmitter and a receiver, among others. The transceiver 930 may further include one or more antennas.
Optionally, the electronic device 900 may implement corresponding processes in the methods of the embodiments of the present application, and for brevity, details are not described here again.
Those skilled in the art will appreciate that the configuration shown in fig. 9 does not constitute a limitation of the electronic device 900, and may include more or fewer components than those shown, or combine certain components, or employ a different arrangement of components.
Embodiments of the present application also provide a non-transitory computer-readable storage medium, where instructions in the storage medium, when executed by a processor of an electronic device, enable the electronic device to perform the method provided by the above embodiments.
The computer-readable storage medium in this embodiment may be any available medium that can be accessed by a computer or a data storage device such as a server, a data center, etc. that is integrated with one or more available media, and the available media may be magnetic media (e.g., floppy disks, hard disks, magnetic tapes), optical media (e.g., DVDs), or semiconductor media (e.g., SSDs), etc.
Those of ordinary skill in the art will understand that: all or a portion of the steps of implementing the above-described method embodiments may be performed by hardware associated with program instructions. The program may be stored in a computer-readable storage medium. When executed, the program performs steps comprising the method embodiments described above; and the aforementioned storage medium includes: various media that can store program codes, such as ROM, RAM, magnetic or optical disks.
The embodiment of the present application also provides a computer program product containing instructions, which when run on a computer, causes the computer to execute the method provided by the above embodiment.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above description is only exemplary of the present application and should not be taken as limiting the present application, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present application should be included in the protection scope of the present application.

Claims (10)

1. A method of processing an image, the method comprising:
inputting an image containing a target object into an image recognition model to obtain a recognition result of the target object, wherein the recognition result comprises positioning information and classification information, the positioning information is used for representing the position of the target object in the image, the classification information is used for representing the probability of the target object in each preset category, and the image recognition model is obtained based on pre-established multi-task learning model training;
and executing corresponding image processing operation based on the identification result.
2. The method of claim 1, wherein inputting the image containing the target object into the image recognition model to obtain the positioning information and the classification information comprises:
extracting the features of the image through a feature extractor of the image recognition model to obtain a feature image of the target object;
and inputting the characteristic image into a classifier of the image recognition model to obtain the classification information, and inputting the characteristic image into a regressor of the image recognition model to obtain the positioning information.
3. The method of claim 1 or 2, wherein the feature extractor comprises 5 convolution stages, and the output channel dimensions of the 5 convolution stages are 24, 36, 72, 144, 512 in sequence.
4. The method of claim 1 or 2, wherein the classifier and the regressor are both fully connected layers connected to the feature extractor.
5. The method according to claim 1 or 2, wherein said performing, based on said recognition result, a corresponding image processing operation comprises:
extracting the target object from the image based on the positioning information in the identification result;
or,
determining the state of the target object based on the classification information in the identification result; based on the state of the target object, a corresponding image processing operation is performed.
6. The method of claim 5, wherein the positioning information comprises coordinates of a plurality of corner points of the target object, and wherein extracting the target object from the image based on the positioning information in the recognition result comprises:
performing affine transformation on the image based on the coordinates of the plurality of corner points to obtain a plurality of transformed corner point coordinates;
and extracting the target object from the image based on the transformed corner point coordinates.
7. The method of claim 5, wherein determining the state of the target object based on the classification information in the recognition result comprises:
determining at least one category to which the target object belongs based on the probability of the target object on each preset category and a preset threshold;
and determining the state of the target object to be a normal state, an abnormal state or a state to be corrected based on the at least one category.
8. The method of claim 5, wherein the determining the state of the target object as a normal state, an abnormal state, or a to-be-corrected state based on the at least one category comprises:
if the at least one category includes presenting a front side on the image or presenting a back side on the image, determining whether an orientation of the target object on the image is consistent with a preset orientation;
if the at least one category comprises an error object, a reproduction or a screenshot, or the orientation of the target object on the image is inconsistent with a preset orientation, the target object is in an abnormal state;
if the at least one category does not comprise an error object, a copy and a screenshot, and the orientation of the target object on the image is consistent with a preset orientation, determining a rotation angle category which the at least one category comprises, wherein the rotation angle category is one of a rotation angle of 0 degrees, a rotation angle of 90 degrees, a rotation angle of 180 degrees or a rotation angle of 270 degrees;
if the rotation angle type is a rotation angle of 90 degrees, a rotation angle of 180 degrees or a rotation angle of 270 degrees, the target object is in a state to be corrected;
and if the rotation angle type is the rotation angle of 0 degrees, the state of the target object is a normal state.
9. The method of claim 8, wherein performing the corresponding image processing operation based on the state of the target object comprises:
if the state of the target object is an abnormal state, generating indication information, wherein the indication information is used for indicating to reacquire an image containing the target object;
if the state of the target object is a state to be corrected, rotating the image to enable the rotation angle of the target object to be 0 degree;
and if the state of the target object is a normal state, extracting the target object from the image based on the positioning information.
10. An apparatus for processing an image, comprising:
the image recognition system comprises a processing unit, a processing unit and a processing unit, wherein the processing unit is used for inputting an image containing a target object into an image recognition model to obtain positioning information and classification information, the positioning information comprises a plurality of corner point coordinates of the target object, the classification information comprises the probability of the target object on each preset class, and the image recognition model is obtained by training based on a pre-established multi-task learning model;
the processing unit is further configured to perform a corresponding image processing operation based on the positioning information or the classification information.
Priority Application (1)

CN202110032170.8A, priority and filing date 2021-01-11: Image processing method and device

Publication (1)

CN112651395A, published 2021-04-13

Family

ID: 75367871
Family application: CN202110032170.8A (CN), filed 2021-01-11, status pending

Patent Citations (5)

* Cited by examiner, † Cited by third party

CN109657673A * (阿里巴巴集团控股有限公司 / Alibaba Group Holding Ltd.; priority 2017-10-11, published 2019-04-19): Image recognition method and terminal
CN109344727A * (苏州创旅天下信息技术有限公司 / Suzhou Chuanglv Tianxia Information Technology Co., Ltd.; priority 2018-09-07, published 2019-02-15): Identity card text information detection method and device, readable storage medium and terminal
CN110163193A * (腾讯科技(深圳)有限公司 / Tencent Technology (Shenzhen) Co., Ltd.; priority 2019-03-25, published 2019-08-23): Image processing method, device, computer-readable storage medium and computer equipment
CN110020651A * (福州大学 / Fuzhou University; priority 2019-04-19, published 2019-07-16): License plate detection and localization method based on deep learning network
CN110659646A * (北京三快在线科技有限公司 / Beijing Sankuai Online Technology Co., Ltd.; priority 2019-08-21, published 2020-01-07): Automatic multi-task certificate image processing method, device, equipment and readable storage medium


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
CB02: Change of applicant information
    Address after: 401121 b7-7-2, Yuxing Plaza, No.5 Huangyang Road, Yubei District, Chongqing
    Applicant after: Chongqing Duxiaoman Youyang Technology Co., Ltd.
    Address before: Room 3075, building 815, Jiayuan district, Shanghai
    Applicant before: Shanghai Youyang New Media Information Technology Co., Ltd.
TA01: Transfer of patent application right (effective date of registration: 2021-12-20)
    Address after: Room 606, 6/F, building 4, courtyard 10, Xibeiwang Road, Haidian District, Beijing 100085
    Applicant after: Du Xiaoman Technology (Beijing) Co., Ltd.
    Address before: 401121 b7-7-2, Yuxing Plaza, No.5 Huangyang Road, Yubei District, Chongqing
    Applicant before: Chongqing Duxiaoman Youyang Technology Co., Ltd.
RJ01: Rejection of invention patent application after publication (application publication date: 2021-04-13)