Image processing method, image processing device, electronic equipment and storage medium
Technical Field
The present invention relates to the field of machine vision technologies, and in particular, to an image processing method and apparatus, an electronic device, and a storage medium.
Background
In recent years, with the rapid development of machine vision, deep learning, big data and computer hardware, human body detection technology and human face detection technology have advanced considerably. However, existing detection technology mostly treats human body detection and human face detection as two independent tasks, and rarely considers combining the human body and the human face belonging to the same person in an image and outputting them in pairs. Therefore, how to match and output the human body and the human face belonging to the same person in an image has become a problem to be solved by those skilled in the art.
Disclosure of Invention
The embodiment of the invention provides an image processing method, an image processing device, electronic equipment and a storage medium, and aims to solve the technical problem that in the prior art, a human body and a human face belonging to the same person in an image cannot be matched and output.
According to a first aspect of the present invention, there is disclosed an image processing method, the method comprising:
receiving a target image;
inputting the target image into a preset first detection model for processing to obtain a first detection result; the first detection result comprises first face information and first human body information which are detected by a face anchor frame and have a matching relationship, and second human body information and second face information which are detected by a human body anchor frame and have a matching relationship, wherein the first human body information and the second human body information both comprise a human body probability and a human body position, the first face information and the second face information both comprise a face probability, and the first face information further comprises a face position;
determining candidate first face information and candidate first human body information matched with the candidate first face information according to the first face information in the first detection result; determining candidate second human body information according to the second human body information in the first detection result;
determining candidate first human body information and candidate second human body information with corresponding relation according to the human body position in the candidate first human body information and the human body position in the candidate second human body information;
determining candidate second human body information and candidate first human face information with a matching relationship according to the corresponding relationship between the candidate first human body information and the candidate second human body information and the matching relationship between the candidate first human body information and the candidate first human face information;
and matching and outputting the human body position in the candidate second human body information and the human face position in the candidate first human face information with the matching relationship.
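The first-aspect steps above can be sketched in code. This is a minimal illustration, not the claimed implementation: the probability thresholds, the use of intersection-over-union (IOU) as the correspondence criterion between the two branches, and the dictionary field names are all assumptions made for the sketch; `iou_fn` is any IOU function.

```python
def bind_faces_to_bodies(face_branch, body_branch, iou_fn,
                         face_thresh=0.5, body_thresh=0.5, match_iou=0.5):
    """Bind face detections from the face-anchor branch to body detections
    from the body-anchor branch.

    `face_branch`: list of dicts with "face_prob", "face_box" and "body_box"
    (the matching body predicted by the same face anchor).
    `body_branch`: list of dicts with "body_prob" and "body_box".
    """
    # step 1: keep candidate first face info (with its matched first body info)
    candidates_face = [d for d in face_branch if d["face_prob"] >= face_thresh]
    # step 2: keep candidate second body info
    candidates_body = [d for d in body_branch if d["body_prob"] >= body_thresh]
    pairs = []
    for f in candidates_face:
        # step 3: find the corresponding candidate second body info by the
        # overlap of the two predicted body positions (assumed criterion)
        best, best_v = None, 0.0
        for b in candidates_body:
            v = iou_fn(f["body_box"], b["body_box"])
            if v > best_v:
                best, best_v = b, v
        if best is not None and best_v >= match_iou:
            # step 4: output the body position from the body-anchor branch
            # paired with the face position from the face-anchor branch
            pairs.append((best["body_box"], f["face_box"]))
    return pairs
```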
According to a second aspect of the present invention, there is disclosed an image processing apparatus comprising:
a receiving module for receiving a target image;
the first processing module is used for inputting the target image into a preset first detection model for processing to obtain a first detection result; the first detection result comprises first face information and first human body information which are detected by a face anchor frame and have a matching relationship, and second human body information and second face information which are detected by a human body anchor frame and have a matching relationship, wherein the first human body information and the second human body information both comprise a human body probability and a human body position, the first face information and the second face information both comprise a face probability, and the first face information further comprises a face position;
the first determining module is used for determining candidate first face information and candidate first human body information matched with the candidate first face information according to the first face information in the first detection result; determining candidate second human body information according to the second human body information in the first detection result;
a second determining module, configured to determine candidate first human body information and candidate second human body information having a correspondence relationship according to the human body position in the candidate first human body information and the human body position in the candidate second human body information;
a third determining module, configured to determine candidate second human body information and candidate first human face information having a matching relationship according to a corresponding relationship between the candidate first human body information and the candidate second human body information and a matching relationship between the candidate first human body information and the candidate first human face information;
and the first output module is used for matching and outputting the human body position in the candidate second human body information and the human face position in the candidate first human face information that have the matching relationship.
According to a third aspect of the present invention, there is disclosed an electronic device comprising: a memory, a processor and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps in the image processing method as described in the first aspect.
According to a fourth aspect of the present invention, a computer-readable storage medium is disclosed, having stored thereon a computer program which, when executed by a processor, implements the steps in the image processing method described in the first aspect.
In the embodiment of the invention, for a target image to be processed, the face anchor frame in the first detection model can detect the matching relationship between a face and a human body belonging to the same person in the target image, and the human body anchor frame in the first detection model can likewise detect the matching relationship between a human body and a face belonging to the same person in the target image. The face detected by the face anchor frame and the human body detected by the human body anchor frame are then bound into a pair based on the matching relationships detected by the face anchor frame and the human body anchor frame. Binding of the face and the human body is thus realized by detecting the face and the human body simultaneously with the same anchor frame, which fully exploits the advantage that the face anchor frame detects the face position in the target image more accurately while the human body anchor frame detects the human body position in the target image more accurately, so that an accurate human body position and an accurate face position can be matched and output in pairs.
Drawings
FIG. 1 is an exemplary diagram of an anchor frame in an image provided by the present invention;
FIG. 2 is a flow diagram of an image processing method of one embodiment of the invention;
FIG. 3 is a flow chart of a first detection model training step of one embodiment of the present invention;
FIG. 4 is a diagram of an example network structure for a first detection model in accordance with one embodiment of the present invention;
FIG. 5 is a flow diagram of an image processing method of another embodiment of the invention;
FIG. 6 is a flow chart of a second detection model training step of one embodiment of the present invention;
FIG. 7 is a flow diagram for one implementation of step 508 of one embodiment of the present invention;
FIG. 8 is an exemplary diagram of an image processing method of one embodiment of the present invention;
FIG. 9 is a block diagram of an image processing apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combinations of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are preferred embodiments, and the acts involved are not necessarily required by the present invention.
The face-to-person binding technique is a technique of locating a face and a human body in an image and binding the face and the human body belonging to the same person into a pair. The face-to-person binding technology has wide application prospects in fields such as automatic driving, security, new retail, and virtual/augmented reality. In recent years, with the rapid development of deep learning, big data and computer hardware, human body detection technology and human face detection technology have advanced considerably. However, in the prior art, face detection and human body detection are mostly treated as two independent tasks, and combining the face and the human body belonging to the same person and outputting them in pairs is rarely considered. Therefore, how to match and output the human body and the human face belonging to the same person in an image has become a problem to be solved by those skilled in the art.
In order to solve the above technical problem, embodiments of the present invention provide an image processing method, an image processing apparatus, an electronic device, and a storage medium.
For ease of understanding, the concepts involved in the embodiments of the present invention will be described first.
An anchor frame: a target detection algorithm usually samples a large number of regions in the input image, determines whether the regions contain targets of interest to the user, and adjusts the edges of the regions to predict the real bounding boxes (ground-truth bounding boxes) of the targets more accurately. Different models may use different region sampling methods, and the anchor frame is one such method: it generates a number of bounding boxes of different sizes and aspect ratios centered on each pixel, and these bounding boxes are called anchor frames. In general, to ensure the accuracy of detecting an object with anchor frames, the aspect ratio (ratio) and size (scale) distribution of the object may be considered when designing the anchor frames, and a plurality of anchor frames having different sizes and aspect ratios may be designed. When detecting faces and human bodies, a face anchor frame suited to the aspect ratio and size of a face and a human body anchor frame suited to the aspect ratio and size of a human body can be generated. For example, the input image shown in fig. 1 includes a plurality of anchor frames, and a plurality of anchor frames with different sizes can be generated centered on one pixel point. It should be noted that, as an example, fig. 1 only shows the anchor frames of two pixel points; in practical application, a plurality of anchor frames with different sizes can be generated at each pixel point.
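As a hedged sketch of how such anchor frames might be generated, the following uses the common `scales`/`ratios` parameterisation; this convention is an assumption for illustration, not necessarily the one used in the embodiments.

```python
def anchors_at_pixel(cx, cy, scales, ratios):
    """Generate (x1, y1, x2, y2) anchor boxes centred on pixel (cx, cy).

    `ratios` are height/width aspect ratios; each value in `scales` is the
    square root of the anchor area, so that for scale s and ratio r the box
    has width s/sqrt(r) and height s*sqrt(r).
    """
    boxes = []
    for s in scales:
        for r in ratios:
            w = s / r ** 0.5
            h = s * r ** 0.5
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes
```

With a ratio near 1 this yields roughly square boxes suited to faces; with a ratio of 3 it yields boxes three times as tall as they are wide, closer to human body shapes.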
Intersection-over-union of two bounding boxes (IOU): the ratio of the intersection area to the union area of the two bounding boxes. The IOU ranges from 0 to 1, where 0 means the two bounding boxes share no pixels and 1 means the two bounding boxes coincide exactly.
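The definition above can be written directly as a small function; the `(x1, y1, x2, y2)` corner-coordinate convention for boxes is an assumption made for the sketch.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    # corners of the intersection rectangle
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    # clamp to zero when the boxes do not overlap
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```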
Marking the anchor frames of the sample images in the training set: to train a target detection model, two types of labels need to be marked for each anchor frame: the category of the target contained in the anchor frame, called the category for short; and the offset of the real bounding box relative to the anchor frame, called the offset for short. In the training set for target detection, each sample image has been labeled with the position of the real bounding box and the category of the contained target. After the anchor frames are generated, a real bounding box similar to an anchor frame can be assigned to that anchor frame according to the intersection-over-union of the anchor frame and the real bounding box, and the anchor frame is then marked according to the position and category information of that real bounding box. When detecting targets, a plurality of anchor frames are first generated, a category and an offset are then predicted for each anchor frame, the positions of the anchor frames are adjusted according to the predicted offsets to obtain predicted bounding boxes, and finally the predicted bounding boxes to be output are screened.
In the prior art, the process of training an anchor-frame-based face detection model comprises an annotation stage and a training stage, wherein the annotation stage comprises the following steps:
in the prior art, by labeling an anchor frame in a sample image, the labeled data includes a category of a target detected by the anchor frame (for example, whether a human face or a human body is detected, or whether any target is not detected); and the offset of the real boundary frame of the target detected by the anchor frame relative to the anchor frame, so as to obtain the anchor frame marking data of each anchor frame in the sample image.
Wherein the training phase comprises the following steps: after the anchor frame marking data of each anchor frame in a sample image are obtained, the sample image is input into an initial model constructed based on a neural network and a prediction result is output. The prediction result is compared with the anchor frame marking data of each anchor frame in the sample image to obtain a comparison result, and each parameter in the model is adjusted through the comparison result and a loss function. After parameter adjustment is completed, the sample image is input into the adjusted model again, and the process is repeated until the model converges (namely, the difference between the prediction result and the anchor frame marking data no longer decreases with training), at which point the model is determined as the final model. When the final model is used for prediction, its input is an image to be processed and its output comprises two branches: the output data of one branch is used to determine the face prediction bounding boxes in the image to be processed, and the output data of the other branch is used to determine the human body prediction bounding boxes in the image to be processed.
The process of prediction based on models trained in the prior art: inputting an image to be processed into a final model obtained by training, wherein the final model can generate a plurality of anchor frames (including a human face anchor frame and a human body anchor frame) for the image to be processed, one branch of the model outputs the prediction category and the offset of the human face anchor frame, and the other branch of the model outputs the prediction category and the offset of the human body anchor frame. And then, obtaining a human face prediction boundary frame in the image to be processed according to the human face anchor frame, the prediction category and the offset of the human face anchor frame, and obtaining a human body prediction boundary frame in the image to be processed according to the human body anchor frame, the prediction category and the offset of the human body anchor frame.
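The step of turning an anchor frame plus a predicted offset into a prediction bounding box can be sketched as follows. The centre-shift / log-size offset parameterisation is an assumption: the text only speaks of a coordinate offset relative to the anchor frame, without fixing the encoding.

```python
import math

def decode_box(anchor, offset):
    """Turn an anchor box plus a predicted offset into a predicted box.

    Boxes are (x1, y1, x2, y2); `offset` = (dx, dy, dw, dh) follows the
    common centre-shift / log-size encoding (an assumed convention).
    """
    # anchor centre and size
    ax, ay = (anchor[0] + anchor[2]) / 2, (anchor[1] + anchor[3]) / 2
    aw, ah = anchor[2] - anchor[0], anchor[3] - anchor[1]
    # shift the centre proportionally to the anchor size
    cx, cy = ax + offset[0] * aw, ay + offset[1] * ah
    # rescale the size exponentially
    w, h = aw * math.exp(offset[2]), ah * math.exp(offset[3])
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)
```

A zero offset returns the anchor frame itself, which is why a well-placed anchor needs only a small predicted offset.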
Analysis of the model training process and the model-based prediction process in the prior art reveals that, although a model trained in the prior art can recognize the face and the human body in an image, it cannot establish the matching relationship between the human body and the face; that is, face detection and human body detection are still treated as two independent tasks.
Next, an image processing method according to an embodiment of the present invention will be described.
It should be noted that the method provided by the embodiment of the present invention is applicable to an electronic device. In practical application, the electronic device may include mobile terminals such as smart phones, tablet computers and personal digital assistants, and may also include computer devices such as notebook computers, desktop computers and servers, which is not limited in the embodiments of the present invention.
Fig. 2 is a flow chart of an image processing method according to an embodiment of the present invention, which may include the following steps, as shown in fig. 2: step 201, step 202, step 203, step 204, step 205 and step 206, wherein,
in step 201, a target image is received.
In the embodiment of the present invention, the target image is an image to be processed, that is, an image in which the human body and the human face need to be matched, where the target image may come from user input or from the input of another electronic device.
In step 202, a target image is input into a preset first detection model for processing to obtain a first detection result; the first detection result comprises first face information and first human body information which are detected by a face anchor frame and have a matching relationship, and second human body information and second face information which are detected by a human body anchor frame and have a matching relationship, the first human body information and the second human body information both comprise a human body probability and a human body position, the first face information and the second face information both comprise a face probability, and the first face information further comprises a face position.
In the embodiment of the invention, the first face information and the first human body information which are detected by the face anchor frame and have matching relation refer to the information of the face and the information of the human body which belong to the same person and are predicted by the face anchor frame.
The first face information comprises face probability and face position, the face probability in the first face information refers to the probability that a face predicted by a face anchor frame belongs to the face, the face position in the first face information refers to the position of the face predicted by the face anchor frame, and the face position in the first face information can be the coordinate offset of the face predicted by the face anchor frame relative to the face anchor frame.
The first human body information comprises a human body probability and a human body position, the human body probability in the first human body information refers to the probability that a human body predicted by the human face anchor frame belongs to the human body, the human body position in the first human body information refers to the position where the human body predicted by the human face anchor frame is located, and the human body position in the first human body information can be the coordinate offset of the human body predicted by the human face anchor frame relative to the human face anchor frame.
In the embodiment of the invention, the second human body information and the second face information which are detected by the human body anchor frame and have a matching relationship refer to the information of the human body and the information of the face belonging to the same person predicted by the human body anchor frame.
The second face information comprises face probability, and the face probability in the second face information refers to the probability that the face predicted by the human body anchor frame belongs to the face.
The second human body information comprises a human body probability and a human body position, the human body probability in the second human body information refers to the probability that the human body predicted by the human body anchor frame belongs to the human body, the human body position in the second human body information refers to the position where the human body predicted by the human body anchor frame is located, and the human body position in the second human body information can be the coordinate offset of the human body predicted by the human body anchor frame relative to the human body anchor frame.
In the embodiment of the invention, in order to ensure the human body detection precision of the human body anchor frame, the height-to-width ratio (ratio) and the size (scale) distribution of the human body can be considered when the human body anchor frame is designed; in addition, in order to ensure the accuracy of the face detection by the face anchor frame, the ratio and scale distribution of the face can be considered when the face anchor frame is designed.
Based on the above design concept of the anchor frame, in the embodiment of the invention, the human body anchor frame is designed according to the aspect ratio and the size of the human body, and the human face anchor frame is designed according to the aspect ratio and the size of the human face.
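For illustration only, face and body anchor designs might look like the following; the concrete ratio and scale values are assumptions, since the embodiments only require that the anchors fit the aspect-ratio and size statistics of each target class.

```python
# Illustrative anchor designs (the concrete numbers are assumptions):
FACE_RATIOS, FACE_SCALES = [1.0, 1.25], [16, 32, 64]        # roughly square, small
BODY_RATIOS, BODY_SCALES = [2.0, 2.5, 3.0], [64, 128, 256]  # tall (h/w > 1), large

def anchors_per_pixel(ratios, scales):
    """Number of distinct anchor frames generated at each pixel point:
    one anchor per (ratio, scale) combination."""
    return len(ratios) * len(scales)
```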
For convenience of understanding, a training process of the first detection model is first described, as shown in fig. 3, fig. 3 is a flowchart of a training step of the first detection model according to an embodiment of the present invention, and the training step may specifically include the following steps: step 301, step 302, step 303, step 304 and step 305, wherein,
in step 301, a first training set is obtained, wherein the first training set comprises sample images.
In this embodiment of the present invention, the first training set may include one or more sample images, and preferably, the first training set may include a large number of sample images, considering that the larger the number of samples is, the more accurate the detection result of the trained model is.
In step 302, a real face bounding box and a real body bounding box in the sample image are labeled.
In the embodiment of the invention, for each sample image in the first training set, a real face bounding box and a real human body bounding box in the sample image are labeled, wherein the real face bounding box is an area where a real face in the sample image is located, and the real human body bounding box is an area where a real human body in the sample image is located.
In step 303, a human face anchor frame and a human body anchor frame are generated in the sample image according to a preset anchor frame generation rule.
In the embodiment of the present invention, in order to ensure the detection accuracy of the human face and the human body, the preset anchor frame generation rule may include: designing the size of the human body anchor frame according to the aspect ratio and the size of the human body; and designing the size of the human face anchor frame according to the aspect ratio and the size of the human face.
In step 304, based on the preset anchor frame labeling rule, the real face bounding box and the real human body bounding box in the sample image, the anchor frame in the sample image is subjected to class labeling and offset labeling, and anchor frame labeling data of the sample image is obtained.
In the embodiment of the invention, the real face bounding box and the real human body bounding box belonging to the same person are associated by belonging to the same anchor frame.
In the embodiment of the present invention, the anchor frame marking data may specifically include: the labeling data of the face anchor frame matched with the real face boundary frame, the labeling data of the human body anchor frame matched with the real human body boundary frame, the labeling data of the face anchor frame not matched with the real face boundary frame and the labeling data of the human body anchor frame not matched with the real human body boundary frame; wherein, the labeling data of the face anchor frame matched with the real face boundary frame comprises: the class and offset of the face anchor frame relative to the real face bounding box, and the class and offset of the face anchor frame relative to the real human body bounding box belonging to the same person as the real face bounding box; the labeling data of the human body anchor frame matched to the real human body boundary frame comprises: the category and offset of the human body anchor frame relative to the real human body boundary frame, and the category and offset of the human body anchor frame relative to the real human face boundary frame belonging to the same person as the real human body boundary frame.
Compared with the anchor frame technology (the human face anchor frame is only related to the human face image but not the human body image, and the human body anchor frame is only related to the human body image but not the human face image) adopted in the model training process in the prior art, the embodiment of the invention establishes the association relationship between the human face image and the human body image belonging to the same person through the human face anchor frame, and establishes the association relationship between the human body image and the human face image belonging to the same person through the human body anchor frame.
That is, in the embodiment of the present invention, when performing anchor frame labeling, for a face anchor frame, anchor frame labeling data of the face anchor frame includes: the class and offset of the face anchor box relative to the real face bounding box (the anchor box labeling data in the prior art also includes this part of content), further includes: the class and offset of the face anchor frame relative to a real human body bounding box belonging to the same person as the real face bounding box.
For the human body anchor frame, the anchor frame marking data of the human body anchor frame comprises the following components: the category and offset of the human anchor frame with respect to the real human body bounding box (the anchor frame labeling data in the prior art also includes this part of contents), further including: the class and offset of the human anchor frame relative to a real human face bounding box belonging to the same person as the real human body bounding box.
In the embodiment of the invention, the anchor frame marking data of the face anchor frames and the anchor frame marking data of the human body anchor frames may exist in the form of data tables. In practical application, the anchor frame marking data of the face anchor frames may exist as one data table or as two data tables. When two data tables are used, one data table records whether the face anchor frame detects a face-category target and the offset of the face anchor frame relative to the real face bounding box, and the other data table records whether the face anchor frame detects a human-body-category target and the offset of the face anchor frame relative to the real human body bounding box belonging to the same person as the real face bounding box;
similarly, the anchor frame marking data of the human body anchor frames may exist as one data table or as two data tables. When two data tables are used, one data table records whether the human body anchor frame detects a human-body-category target and the offset of the human body anchor frame relative to the real human body bounding box, and the other data table records whether the human body anchor frame detects a face-category target and the offset of the human body anchor frame relative to the real face bounding box belonging to the same person as the real human body bounding box.
In the embodiment of the present invention, whether the anchor frame in the anchor frame marking data detects a certain type of target may be represented by 0 and 1, where 0 represents that the anchor frame does not detect a certain type of target, and 1 represents that the anchor frame detects a certain type of target. Alternatively, other numbers may be used to indicate which type of object is detected by the anchor frame, which is not limited by the embodiment of the present invention.
In the embodiment of the present invention, the preset anchor frame marking rule may include the following. For each face anchor frame, the IOU value between the face anchor frame and each real face bounding box in the sample image is calculated. If the IOU value between the face anchor frame and a real face bounding box is the largest and reaches a certain threshold, the real face bounding box and the real human body bounding box corresponding to it are attributed to the face anchor frame; that is, the flag indicating whether the face anchor frame detects a face-category target is set to 1 (a face-category target is detected), and the offset of the face anchor frame relative to the real face bounding box is set to the relative position of the face anchor frame and the real face bounding box; the flag indicating whether the face anchor frame detects a human-body-category target is set to 1 (a human-body-category target is detected), and the offset of the face anchor frame relative to the real human body bounding box of the same person to which the real face bounding box belongs is set to the relative position of the face anchor frame and that real human body bounding box. If no IOU value between the face anchor frame and any real face bounding box reaches the threshold, the face flag of the face anchor frame is set to 0 (no face-category target is detected), the human body flag of the face anchor frame is set to 0 (no human-body-category target is detected), and the offsets of the face anchor frame relative to the real face bounding box and the real human body bounding box are set to a preset first default value;
for each human body anchor frame, the IOU value between the human body anchor frame and each real human body bounding box in the sample image is calculated. If the IOU value between the human body anchor frame and a real human body bounding box is the largest and reaches a certain threshold, the real human body bounding box and the real face bounding box corresponding to it are attributed to the human body anchor frame; that is, the flag indicating whether the human body anchor frame detects a human-body-category target is set to 1 (a human-body-category target is detected), and the offset of the human body anchor frame relative to the real human body bounding box is set to the relative position of the human body anchor frame and the real human body bounding box; the flag indicating whether the human body anchor frame detects a face-category target is set to 1 (a face-category target is detected), and the offset of the human body anchor frame relative to the real face bounding box of the same person to which the real human body bounding box belongs is set to the relative position of the human body anchor frame and that real face bounding box. If no IOU value between the human body anchor frame and any real human body bounding box reaches the threshold, the human body flag of the human body anchor frame is set to 0 (no human-body-category target is detected), the face flag of the human body anchor frame is set to 0 (no face-category target is detected), and the offsets of the human body anchor frame relative to the real face bounding box and the real human body bounding box are set to a preset second default value.
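The face-anchor branch of the marking rule above can be sketched as follows (the human body anchor branch is symmetric). The 0.5 threshold and the dictionary layout are assumptions made for the sketch; `iou_fn` is any intersection-over-union function.

```python
def label_face_anchor(face_anchor, gt_pairs, iou_fn, thresh=0.5):
    """Label one face anchor against ground-truth (face_box, body_box) pairs.

    The anchor takes the pair whose face box it overlaps most, provided the
    IOU reaches `thresh`, and is then labelled with BOTH the real face box
    and the real body box of the same person.
    """
    best_iou, best_pair = 0.0, None
    for face_box, body_box in gt_pairs:
        v = iou_fn(face_anchor, face_box)
        if v > best_iou:
            best_iou, best_pair = v, (face_box, body_box)
    if best_pair is not None and best_iou >= thresh:
        face_box, body_box = best_pair
        return {"has_face": 1, "face_box": face_box,
                "has_body": 1, "body_box": body_box}
    # no match: both category flags are 0; offsets stay at a default value
    return {"has_face": 0, "face_box": None, "has_body": 0, "body_box": None}
```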
In step 305, a sample image is used as an input, anchor frame annotation data of the sample image is used as an output target, a preset first initial model is trained, and a model obtained through training is determined as a first detection model.
In this embodiment of the present invention, the preset first initial model may be a model created based on RetinaNet, where RetinaNet is essentially a network composed of a Resnet + FPN backbone plus two FCN sub-networks.
In the embodiment of the invention, during model training, a sample image is input into the preset first initial model and a prediction result is output. The prediction result is compared with the anchor frame annotation data of each anchor frame in the sample image to obtain a comparison result, and each parameter in the first initial model is adjusted according to the comparison result and a loss function. After the parameters are adjusted, the sample image is input into the adjusted model again, and the process is repeated until the model converges; the model obtained at that point is determined as the first detection model.
Therefore, in the embodiment of the invention, when the anchor frames in the sample image are labeled, the real face bounding box and the real human body bounding box belonging to the same person are assigned to the same anchor frame and thereby associated with each other, so that the first detection model obtained by training can detect the human body and the human face and output them in matched pairs.
Illustratively, the network structure of the first detection model may be as follows:
the input of the first detection model is an image to be recognized, and the output of the first detection model comprises: a first prediction branch and a second prediction branch, wherein the output result of the first prediction branch comprises: the first face information and the first human body information which are detected by the face anchor frame and have a matching relationship; and the output result of the second prediction branch comprises: the second human body information and the second human face information which are detected by the human body anchor frame and have a matching relationship.
For ease of understanding and description, the first prediction branch of the first detection model is referred to as the "face2person branch" and the second prediction branch as the "person2face branch". As shown in fig. 4, the face2person branch includes: a face2person_face branch and a face2person_person branch, wherein the face2person_face branch is the master branch and is used for outputting the first face information detected by the face anchor frame, and the face2person_person branch is the slave branch and is used for outputting the first human body information detected by the face anchor frame; the first face information and the first human body information have a matching relationship;
the person2face branch includes: a person2face_person branch and a person2face_face branch, wherein the person2face_person branch is the master branch and is used for outputting the second human body information detected by the human body anchor frame, and the person2face_face branch is the slave branch and is used for outputting the second human face information detected by the human body anchor frame; the second human body information and the second human face information have a matching relationship;
each of the face2person_face, face2person_person, and person2face_person branches includes a cls branch (not shown) for outputting the corresponding predicted classification confidence (i.e., probability) and a bbox branch for outputting the corresponding predicted offset (i.e., position).
It should be noted that, in order to improve the detection effect of the model and eliminate the negative influence of one branch on the detection effect of the other branches, the bbox branch is removed from the person2face_face sub-branch and only the cls branch is retained. Accordingly, in the embodiment of the present invention, the second face information only includes the face probability and does not include the face position.
In step 203, determining candidate first face information and candidate first human body information matched with the candidate first face information according to the first face information in the first detection result; and determining candidate second human body information according to the second human body information in the first detection result.
In the embodiment of the present invention, when there is no overlap between the face positions in each first face information (for example, when the face anchor frame is sparse, such a situation may occur), the first face information in the first detection result may be directly determined as candidate first face information, and the first person information corresponding to the candidate first face information in the first detection result may be determined as candidate first person information; when the face positions in each first face information overlap (for example, when the face anchor frames are dense and one face may be detected by a plurality of face anchor frames), the first face information in the first detection result needs to be deduplicated to obtain candidate first face information, and the first person information corresponding to the candidate first face information in the first detection result is determined as the candidate first person information.
In the embodiment of the invention, when the human body positions in the second human body information do not overlap (for example, when the human body anchor frames are sparse, this situation may occur), the second human body information in the first detection result can be directly determined as candidate second human body information; when the human body positions in the second human body information overlap (for example, when the human body anchor frames are dense and one human body may be detected by a plurality of human body anchor frames), the second human body information in the first detection result needs to be deduplicated to obtain the candidate second human body information.
When the first human face information and the second human body information need to be deduplicated, in an embodiment of the present invention, the step 203 may specifically include the following steps:
performing non-maximum suppression (NMS) processing on the face positions in the first face information whose face probability in the first detection result is higher than a preset second threshold, to obtain candidate first face information and candidate first human body information matched with the candidate first face information; and performing NMS processing on the human body positions in the second human body information whose human body probability in the first detection result is higher than a preset third threshold, to obtain candidate second human body information.
For ease of understanding, taking the network structure shown in fig. 4 as an example, in the embodiment of the present invention, for the face2person branch, according to the output of the face2person_face branch, the face positions whose cls value on the face2person_face branch is higher than the preset second threshold are retained and processed by NMS to obtain the candidate first face information; meanwhile, the first human body information at the corresponding indexes on the face2person_person branch is retained as the candidate first human body information. For the person2face branch, according to the output of the person2face_person branch, the human body positions whose cls value on the person2face_person branch is higher than the preset third threshold are retained and processed by NMS to obtain the candidate second human body information.
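The threshold-then-NMS step for the face2person branch can be sketched as follows. This is an illustrative sketch under assumed names (`nms`, `dedup_face2person`) and assumed threshold values; the point is that the body box at the same anchor index is carried along with each surviving face box.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression; returns indices of kept boxes."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        i = order.pop(0)
        keep.append(i)
        order = [j for j in order if iou(boxes[i], boxes[j]) < iou_thresh]
    return keep

def dedup_face2person(face_boxes, face_probs, body_boxes, second_thresh=0.3):
    """Keep faces above the (assumed) second threshold, NMS them, and keep
    the matched body box at the same index for each surviving face."""
    idx = [i for i in range(len(face_boxes)) if face_probs[i] > second_thresh]
    keep = nms([face_boxes[i] for i in idx], [face_probs[i] for i in idx])
    kept = [idx[k] for k in keep]
    return ([face_boxes[i] for i in kept], [body_boxes[i] for i in kept])
```

The person2face branch would run the same `nms` call on the second human body positions scored by their cls values.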
When the first face information and the second human body information need to be deduplicated, in another embodiment provided by the present invention, the step 203 may specifically include the following steps:
determining first face information with the largest face probability in a certain area in the first detection result as candidate first face information, and determining first person information corresponding to the candidate first face information in the first detection result as candidate first person information; and determining second human body information with the maximum human body probability in a certain region in the first detection result as candidate second human body information.
In step 204, candidate first human body information and candidate second human body information with a corresponding relationship are determined according to the human body position in the candidate first human body information and the human body position in the candidate second human body information.
In view of the fact that the human face anchor frame has a better effect on human face detection and the human body anchor frame has a better effect on human body detection, in the embodiment of the invention, the human body information detected by the human body anchor frame is used for replacing the human body information detected by the human face anchor frame, namely, the candidate second human body information is used for replacing the candidate first human body information.
Before the replacement, it is required to determine a corresponding relationship between the candidate first human body information and the candidate second human body information, and in an embodiment provided by the present invention, the step 204 may specifically include the following steps (not shown in the figure): step 2041 and step 2042, wherein,
in step 2041, the human body positions in the candidate second human body information are sorted in descending order of the face probability in the second face information corresponding to the candidate second human body information, and a sorting result {P1, P2, …, PN} is obtained, where Pi is the human body position in the candidate second human body information ranked at the i-th position after the descending sort, N is the total number of candidate second human body information, and 1 ≤ i ≤ N;
in step 2042, for each Pi, the IOU value between Pi and the human body position in each piece of candidate first human body information is calculated, and if the maximum of the calculated IOU values is larger than a preset fourth threshold, a correspondence is established between the candidate second human body information corresponding to Pi and the candidate first human body information corresponding to that maximum value.
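Steps 2041–2042 can be sketched as a greedy matching pass. A minimal illustration under assumptions: the `match_bodies` name and box format are hypothetical, the fourth threshold is taken as 0.5, and each candidate first body is assumed to be matched at most once (the patent text does not state this explicitly).

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def match_bodies(second_boxes, second_face_probs, first_boxes, fourth_thresh=0.5):
    """Sort candidate second body boxes by the face probability of their
    matched second face info (step 2041), then pair each with the
    best-overlapping candidate first body box (step 2042)."""
    pairs = {}   # index into second_boxes -> index into first_boxes
    used = set()
    order = sorted(range(len(second_boxes)),
                   key=lambda i: second_face_probs[i], reverse=True)
    for i in order:
        best_j, best_iou = -1, 0.0
        for j in range(len(first_boxes)):
            if j in used:
                continue
            v = iou(second_boxes[i], first_boxes[j])
            if v > best_iou:
                best_iou, best_j = v, j
        if best_j >= 0 and best_iou > fourth_thresh:
            pairs[i] = best_j
            used.add(best_j)
    return pairs
```

Steps 2043–2044 are the mirror image, iterating over the candidate first body positions instead.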
Before the replacement, it is required to determine a corresponding relationship between the candidate first human body information and the candidate second human body information, and in another embodiment provided by the present invention, the step 204 may specifically include the following steps (not shown in the figure): step 2043 and step 2044, wherein,
in step 2043, the human body positions in the candidate first human body information are sorted in descending order of the face probability in the first face information corresponding to the candidate first human body information, and a sorting result {Q1, Q2, …, QM} is obtained, where Qj is the human body position in the candidate first human body information ranked at the j-th position after the descending sort, M is the total number of candidate first human body information, and 1 ≤ j ≤ M;
in step 2044, for each Qj, the IOU value between Qj and the human body position in each piece of candidate second human body information is calculated, and if the maximum of the calculated IOU values is larger than a preset fourth threshold, a correspondence is established between the candidate second human body information corresponding to that maximum value and the candidate first human body information corresponding to Qj.
In step 205, the candidate second human body information and the candidate first human face information having a matching relationship are determined according to the correspondence relationship between the candidate first human body information and the candidate second human body information and the matching relationship between the candidate first human body information and the candidate first human face information.
In step 206, the human body position in the candidate second human body information with the matching relationship and the human face position in the candidate first human face information are matched and output.
In the embodiment of the invention, the image area where the human body position in the candidate second human body information is located and the image area where the human face position in the candidate first human face information is located can be bound into a pair and then output, so that the human body and the human face belonging to the same person in the target image are obtained.
As can be seen from the above embodiment, in this embodiment, for a target image to be processed, the face anchor frames in the first detection model detect the matching relationship between the human face and the human body belonging to the same person, and the human body anchor frames in the first detection model likewise detect the human body and the human face belonging to the same person. The face detected by the face anchor frame and the human body detected by the human body anchor frame are then bound into pairs based on the matching relationships detected by the two kinds of anchor frames. Detecting the face and the body simultaneously with the same anchor frame realizes the binding of the face and the body, while fully exploiting the advantages that the face anchor frame locates faces more accurately and the human body anchor frame locates bodies more accurately; therefore, the accurate human body and human face belonging to the same person in the target image are matched and output.
In addition, the existing face detection technology has a poor detection effect on difficult faces (such as small faces, blurred faces, occluded faces, and the like), and generally requires a large amount of additional computation to improve the detection effect on such faces.
Fig. 5 is a flowchart of an image processing method according to another embodiment of the present invention. In the embodiment of the present invention, for candidate second human body information that is not matched to any candidate first human body information, if the face probability in the second face information corresponding to the candidate second human body information is higher than a preset first threshold, it indicates that the human body position of the candidate second human body information contains a difficult face (e.g., a small, blurred, or occluded face). This part of the candidate second human body information is then further refined: the difficult faces therein are detected and bound into pairs. As shown in fig. 5, the method may include the following steps: step 501, step 502, step 503, step 504, step 505, step 506, step 507, step 508 and step 509, wherein,
in step 501, a target image is received.
In step 502, the target image is input into a preset first detection model for processing to obtain a first detection result; the first detection result comprises first human face information and first human body information which are detected by a human face anchor frame and have a matching relationship, and second human body information and second human face information which are detected by a human body anchor frame and have a matching relationship; the first human body information and the second human body information both comprise human body probabilities and human body positions, the first human face information and the second human face information both comprise human face probabilities, and the first human face information further comprises human face positions.
In step 503, determining candidate first face information and candidate first human body information matched with the candidate first face information according to the first face information in the first detection result; and determining candidate second human body information according to the second human body information in the first detection result.
In step 504, candidate first human body information and candidate second human body information with a corresponding relationship are determined according to the human body position in the candidate first human body information and the human body position in the candidate second human body information.
In step 505, the candidate second human body information and the candidate first human face information having a matching relationship are determined according to the correspondence between the candidate first human body information and the candidate second human body information and the matching relationship between the candidate first human body information and the candidate first human face information.
In step 506, the human body position in the candidate second human body information with the matching relationship and the human face position in the candidate first human face information are matched and output.
Steps 501 to 506 in the embodiment of the present invention are similar to steps 201 to 206 in the embodiment shown in fig. 2, and are not described herein again, for details, please refer to the contents in the embodiment shown in fig. 2.
In step 507, if there is candidate second human body information that is not matched with the candidate first human body information and the probability of a human face in the second human face information corresponding to the candidate second human body information is higher than a preset first threshold, an image region where a human image is located is extracted according to the position of the human body in the candidate second human body information.
In the embodiment of the present invention, if there is candidate second human body information that is not matched with any candidate first human body information and the face probability in the second face information corresponding to it is higher than a preset first threshold, the image region where the human body position of the candidate second human body information is located very likely contains a human face, but that face is generally difficult to detect (e.g., blurred or relatively small) and is therefore not contained in the candidate first face information. At this time, the image region where the human image is located is extracted according to the human body position in the candidate second human body information, and face detection is performed on it, so as to improve the face binding effect.
In the embodiment of the present invention, the size of the image region extracted according to the human body position in the candidate second human body information may be equal to or larger than the size of the region where the human body position is located.
In step 508, the image area is input into a preset second detection model for processing, so as to obtain a second detection result, where the second detection result includes third face information, and the third face information includes a face position.
For convenience of understanding, a training process of the second detection model is first described, as shown in fig. 6, fig. 6 is a flowchart of a training step of the second detection model according to an embodiment of the present invention, where the training step is used for training the second detection model by using a target detection algorithm based on keypoint detection, and specifically includes the following steps: step 601, step 602, and step 603, wherein,
in step 601, a second training set is obtained, where the second training set includes a sample human body image, and the sample human body image includes a human face image and a human body image belonging to the same person.
In this embodiment of the present invention, the second training set may include one or more sample human body images; considering that the greater the number of samples, the more accurate the detection result of the trained model, the second training set preferably includes a large number of sample human body images.
In step 602, generating annotation data of the sample human body image; wherein, the labeling data of each sample human body image comprises: the thermodynamic diagram marked with the position of the upper left corner of the area where the face image in the sample human body image is located, the thermodynamic diagram marked with the position of the lower right corner of the area where the face image in the sample human body image is located, and the thermodynamic diagram marked with the position of the center point of the area where the face image in the sample human body image is located.
In the embodiment of the invention, in order to reduce the training difficulty and improve the robustness of the network, the color value at the position of the upper left corner, the lower right corner, or the center point of the region where the face image is located in the sample human body image is 255 in the corresponding thermodynamic diagram, the color values of the adjacent pixels around that position decrease progressively to 0 (i.e., the pixel values are adjusted in a Gaussian-blur manner), and the color values of the remaining pixels are 0.
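Such a heatmap label can be sketched as follows: the peak pixel gets 255 and neighboring values fall off with a Gaussian toward 0. The `make_heatmap` name, the sigma value, and the cut-off below 1.0 are illustrative assumptions, not the patent's exact procedure.

```python
import math

def make_heatmap(height, width, cx, cy, sigma=2.0):
    """Heatmap with a 255 peak at (cx, cy) and a Gaussian fall-off to 0."""
    hm = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            d2 = (x - cx) ** 2 + (y - cy) ** 2
            v = 255.0 * math.exp(-d2 / (2.0 * sigma ** 2))
            # Far from the peak the value decays to (exactly) 0.
            hm[y][x] = v if v >= 1.0 else 0.0
    return hm
```

One such heatmap would be generated per annotated keypoint (upper-left corner, lower-right corner, center point), typically at 1/4 of the image resolution as the text notes below.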
In the embodiment of the present invention, in order to reduce the workload of image processing, the sizes of the thermodynamic diagrams in the annotation data may be all one fourth of the size of the sample human body image.
In step 603, the sample human body image is used as an input, the labeled data of the sample human body image is used as an output target, a preset second initial model is trained, and the trained model is determined as a second detection model.
In this embodiment of the present invention, the preset second initial model may be a model created based on the FPN.
In the embodiment of the invention, when model training is carried out, a sample human body image is input into a preset second initial model, a prediction result is output, the prediction result is compared with labeled data of the sample human body image to obtain a comparison result, each parameter in the second initial model is adjusted through the comparison result and a loss function, after parameter adjustment is finished, the sample human body image is input into the model after parameter adjustment again, the process is repeated until the model converges, and at the moment, the model obtained by training is determined as a second detection model.
Therefore, in the embodiment of the invention, a target detection algorithm based on key point detection and a thermodynamic diagram mode can be adopted for model training to obtain a second detection model for detecting a difficult face.
In an embodiment provided by the present invention, the face detection is performed based on the second detection model obtained by training in fig. 6, as shown in fig. 7, the step 508 may specifically include the following steps: step 5081, step 5082, step 5083 and step 5084, wherein,
in step 5081, the image area is input into a preset second detection model for processing to obtain a first thermodynamic diagram, a second thermodynamic diagram and a third thermodynamic diagram; the first thermodynamic diagram is used for representing the predicted position of the upper left corner of the region where the face image is located in the image area, the second thermodynamic diagram is used for representing the predicted position of the lower right corner of the region where the face image is located in the image area, and the third thermodynamic diagram is used for representing the predicted position of the center point of the region where the face image is located in the image area.
In the embodiment of the present invention, to improve the detection effect, the image area may be first adjusted to a fixed size, for example, 384 × 128, and then input into the second detection model for processing.
In step 5082, the position of the pixel point with the largest color value in the first thermodynamic diagram, the second thermodynamic diagram and the third thermodynamic diagram and the color value at the position are determined respectively.
In step 5083, if the color values of the positions of the pixels with the maximum color values in the first thermodynamic diagram, the second thermodynamic diagram and the third thermodynamic diagram are all greater than the corresponding preset fifth threshold, the predicted face candidate frame is generated based on the positions of the pixels with the maximum color values in the first thermodynamic diagram and the positions of the pixels with the maximum color values in the second thermodynamic diagram.
In step 5084, if the position of the pixel point with the largest color value in the third thermodynamic diagram is within the predicted face candidate frame, the position in the image area corresponding to the predicted face candidate frame is determined as the face position of the third face information.
In the embodiment of the invention, the output of the second detection model comprises three channels, which respectively represent the thermodynamic diagrams of the upper left corner, the lower right corner and the center point of the region where the predicted face image is located. The maximum color value and its position are taken from each of the three channels: the positions of the maxima of the upper-left-corner and lower-right-corner channels form the predicted face candidate frame, and the position of the maximum of the center-point channel is the candidate center point of the predicted face frame. If the three maxima are each larger than the corresponding preset threshold and the predicted face candidate frame contains the candidate center point, the position corresponding to the predicted face candidate frame in the image area is determined as the face position of the third face information.
In the embodiment of the present invention, if the size of the thermodynamic diagram was one fourth of the size of the sample human body image when the second detection model was trained, the height and the width of the thermodynamic diagrams output by the second detection model are also one fourth of those of the input image area; that is, the predicted face frame obtained on the thermodynamic diagram is 1/4 of the real face frame, and there is a position error caused by down-sampling, so the coordinates of the upper left corner and the lower right corner need to be transformed and mapped back to the original image. For example, assuming that the upper-left-corner coordinates on the thermodynamic diagram are (x1, y1), the transformed coordinates are (4x1 + 2, 4y1 + 2), and the transformed lower-right-corner coordinates can be obtained in the same way.
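The decoding of steps 5081–5084, including the (4x + 2, 4y + 2) mapping above, can be sketched as below. The function names and the fifth-threshold value are illustrative assumptions; real heatmaps would come from the second detection model, not be hand-built.

```python
def argmax2d(hm):
    """Return the (x, y) position of the largest value in a 2-D heatmap."""
    best = (0, 0)
    for y, row in enumerate(hm):
        for x, v in enumerate(row):
            if v > hm[best[1]][best[0]]:
                best = (x, y)
    return best

def to_input_coords(x, y):
    """Map 1/4-resolution heatmap coordinates back to the input image."""
    return 4 * x + 2, 4 * y + 2

def decode_face(hm_tl, hm_br, hm_ct, fifth_thresh=128.0):
    """Form a face box from the top-left/bottom-right peaks, keep it only if
    all three peaks pass the threshold and the center peak lies inside it."""
    (x1, y1), (x2, y2), (xc, yc) = argmax2d(hm_tl), argmax2d(hm_br), argmax2d(hm_ct)
    peaks = (hm_tl[y1][x1], hm_br[y2][x2], hm_ct[yc][xc])
    if all(p > fifth_thresh for p in peaks) and x1 <= xc <= x2 and y1 <= yc <= y2:
        return to_input_coords(x1, y1) + to_input_coords(x2, y2)
    return None  # no face detected in this image area
```

A `None` result corresponds to the case where the second detection model detects no face, handled later by the invalid-face-position output.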
Therefore, the embodiment of the invention provides a difficult face detection method based on key point detection, which is characterized in that the thermodynamic diagram mode is used for simultaneously detecting the upper left corner, the central point and the lower right corner of the face frame, and the face frame is filtered according to the central point, so that the detection effect of the difficult face (such as a fuzzy face or a small face) is effectively improved.
In step 509, the human body position in the candidate second human body information corresponding to the third face information and the face position in the third face information are matched and output.
In the embodiment of the invention, the image area where the human body position in the candidate second human body information corresponding to the third face information is located can be bound in a pair with the image area where the face position in the third face information is located and then output, so as to obtain the human body and the human face belonging to the same person in the target image. That is, in the embodiment of the present invention, the body region of a human body that was not bound to a face by the first detection model but contains a face with high probability is input into the second detection model, i.e., the difficult-face model, for further face detection, so as to obtain the human body and the difficult face belonging to the same person in the target image, thereby improving the recall rate of faces.
For convenience of understanding, the technical solution of the embodiment of the present invention is described with reference to the exemplary diagram shown in fig. 8, as shown in fig. 8, the target image includes three persons, which are A, B and C respectively, where the face of C is blurred, the probability of the face in the first face information of C detected by the first detection model is smaller than or equal to a preset second threshold, and the probability of the face in the second face information of C detected by the first detection model is higher than the preset first threshold. When the target image needs to be processed, the target image is firstly input into the first detection model to be processed, so that a face image (including the face image of A and the face image of B) matched with a human body, a human body image (including the human body image of A and the human body image of B) matched with the human face and a human body image (namely the human body image of C) containing a difficult human face are obtained, and then the human body image of C is input into the second detection model to be processed, so that the face image (namely the face image of C) contained in the human body image of C is obtained.
It can be seen from the above embodiment that, in this embodiment, the first detection model is used to detect the face image that is easy to detect, and the second detection model is used to detect the face image that is not easy to detect (i.e. the difficult face), so that the matching output of the human body and the face belonging to the same person in the image can be realized, and the balance between the speed and the detection effect can be realized.
In another embodiment provided by the present invention, the image processing method may further include the steps of:
and for candidate second human body information of which the face information is not detected by the second detection model, matching and outputting the human body position in the candidate second human body information with a predefined invalid face position.
In another embodiment provided by the present invention, the image processing method may further include the steps of:
and if candidate second human body information which is not matched with the candidate first human body information exists and the human face probability in the second human face information corresponding to the candidate second human body information does not reach a preset first threshold value, matching and outputting the human body position in the candidate second human body information and a predefined invalid human face position.
In the embodiment of the present invention, the predefined invalid face position may include: the position of an empty face anchor frame, or a face position whose face probability is lower than a preset value.
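As a minimal sketch of how such a placeholder might be represented, the all-zero box encoding below is an illustrative assumption, not a value prescribed by the embodiment; it merely keeps every output record in the same (body, face) shape:

```python
# Assumed encoding of an empty face anchor frame position.
INVALID_FACE_POSITION = (0, 0, 0, 0)

def pair_with_invalid_face(body_position):
    """Return a (body, face) output record whose face slot is the
    predefined invalid face position."""
    return {"body": body_position, "face": INVALID_FACE_POSITION}
```

Downstream consumers can then detect unmatched bodies by comparing the face slot against the placeholder.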
Fig. 9 is a block diagram of an image processing apparatus according to an embodiment of the present invention. As shown in fig. 9, the image processing apparatus 900 may include: a receiving module 901, a first processing module 902, a first determining module 903, a second determining module 904, a third determining module 905 and a first output module 906. The receiving module 901 is configured to receive a target image;
a first processing module 902, configured to input the target image into a preset first detection model for processing, so as to obtain a first detection result; the first detection result comprises first human face information and first human body information which are detected by a human face anchor frame and have a matching relation, and second human face information and second human body information which are detected by a human body anchor frame and have a matching relation, wherein the first human body information and the second human body information both comprise a human body probability and a human body position, the first human face information and the second human face information both comprise a human face probability, and the first human face information further comprises a human face position;
a first determining module 903, configured to determine candidate first human face information and candidate first human body information matched with the candidate first human face information according to the first human face information in the first detection result; and to determine candidate second human body information according to the second human body information in the first detection result;
a second determining module 904, configured to determine candidate first human body information and candidate second human body information having a corresponding relationship according to the human body position in the candidate first human body information and the human body position in the candidate second human body information;
a third determining module 905, configured to determine candidate second human body information and candidate first human face information having a matching relationship according to a corresponding relationship between the candidate first human body information and the candidate second human body information and a matching relationship between the candidate first human body information and the candidate first human face information;
and a first output module 906, configured to match and output the human body position in the candidate second human body information and the human face position in the candidate first human face information, which have a matching relationship.
As can be seen from the above, in this embodiment, for a target image to be processed, the human face anchor frame in the first detection model detects the matching relation between the human face and the human body belonging to the same person in the target image, and the human body anchor frame in the first detection model likewise detects the matching relation between the human body and the human face belonging to the same person. Based on the matching relations detected by the two kinds of anchor frames, the human face detected by the human face anchor frame and the human body detected by the human body anchor frame can be bound into a pair, so that the human body and the human face belonging to the same person in the target image are matched and output.
Optionally, as an embodiment, the image processing apparatus 900 may further include:
the extraction module is used for extracting an image area where the portrait is located according to the position of the human body in the candidate second human body information if the candidate second human body information which is not matched with the candidate first human body information exists and the probability of the human face in the second human face information corresponding to the candidate second human body information is higher than a preset first threshold value;
the second processing module is used for inputting the image area into a preset second detection model for processing to obtain a second detection result, wherein the second detection result comprises third face information, and the third face information comprises a face position;
and the second output module is used for matching and outputting the human body position in the candidate second human body information corresponding to the third face information with the human face position in the third face information.
Optionally, as an embodiment, the image processing apparatus 900 may further include:
and the third output module is configured to, for candidate second human body information for which the second detection model detects no face information, match and output the human body position in the candidate second human body information with the predefined invalid face position.
Optionally, as an embodiment, the image processing apparatus 900 may further include:
and the fourth output module is used for matching and outputting the human body position in the candidate second human body information and the predefined invalid human face position if the candidate second human body information which is not matched with the candidate first human body information exists and the human face probability in the second human face information corresponding to the candidate second human body information does not reach the preset first threshold value.
Optionally, as an embodiment, the first determining module 903 may include:
the first processing submodule is configured to perform non-maximum suppression (NMS) processing on the face positions in the first face information whose face probability in the first detection result is higher than a preset second threshold, so as to obtain candidate first face information and candidate first human body information matched with the candidate first face information; and to perform NMS processing on the human body positions in the second human body information whose human body probability in the first detection result is higher than a preset third threshold, so as to obtain candidate second human body information.
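The NMS step described above can be sketched as follows. This is a generic, minimal implementation of IoU-based non-maximum suppression under a probability threshold; the box format and the default IoU threshold are assumptions for illustration:

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def nms(detections, prob_threshold, iou_threshold=0.5):
    """detections: list of (box, probability). Discard detections at or
    below prob_threshold, then keep each remaining box only if it does not
    overlap an already-kept, higher-probability box too strongly."""
    candidates = sorted((d for d in detections if d[1] > prob_threshold),
                        key=lambda d: d[1], reverse=True)
    kept = []
    for box, prob in candidates:
        if all(iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append((box, prob))
    return kept
```

The same routine serves both branches: applied to face positions it yields the candidate first face information, and applied to body positions it yields the candidate second human body information.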
Optionally, as an embodiment, the second determining module 904 may include:
a sorting submodule, configured to sort the human body positions in the candidate second human body information in descending order of the face probability in the second face information corresponding to the candidate second human body information, so as to obtain a sorting result {P1, P2, …, PN}, where Pi is the human body position in the candidate second human body information ranked at the i-th position after sorting in descending order, N is the total number of items of candidate second human body information, and 1 ≤ i ≤ N;
a relationship establishing submodule, configured to, for each Pi, calculate the intersection over union (IOU) between Pi and the human body position in each item of candidate first human body information, and, if the maximum of the calculated IOU values is greater than a preset fourth threshold, establish the corresponding relation between the candidate second human body information to which Pi belongs and the candidate first human body information corresponding to the maximum IOU value.
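The sorting and matching performed by these two submodules can be sketched as below. The data shapes are assumptions: each candidate second item is taken as a (body_box, face_probability) pair and each candidate first item as a body box, and boxes use (x1, y1, x2, y2) coordinates:

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union else 0.0

def match_bodies(candidate_second, candidate_first, fourth_threshold):
    """Return {index into candidate_second: index into candidate_first}
    for every P_i whose best IOU exceeds the fourth threshold."""
    # Sort P_i in descending order of the corresponding face probability.
    order = sorted(range(len(candidate_second)),
                   key=lambda i: candidate_second[i][1], reverse=True)
    matches = {}
    for i in order:
        p_i = candidate_second[i][0]
        ious = [iou(p_i, box) for box in candidate_first]
        if ious and max(ious) > fourth_threshold:
            matches[i] = max(range(len(ious)), key=ious.__getitem__)
    return matches
```

High face probability is processed first so that, under any tie-breaking extension, the most confident bodies claim their counterparts before the rest.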
Optionally, as an embodiment, the second processing module may include:
the second processing submodule is configured to input the image area into a preset second detection model for processing to obtain a first heat map, a second heat map and a third heat map, wherein the first heat map represents the position of the upper left corner of the area where the predicted face image is located in the image area, the second heat map represents the position of the upper right corner of the area where the predicted face image is located in the image area, and the third heat map represents the position of the center point of the area where the predicted face image is located in the image area;
the first determining submodule is configured to determine, in each of the first heat map, the second heat map and the third heat map, the position of the pixel point with the maximum color value and the color value at that position;
the generating submodule is configured to generate a predicted face candidate frame based on the position of the pixel point with the maximum color value in the first heat map and the position of the pixel point with the maximum color value in the second heat map, if the color values at the positions of the pixel points with the maximum color values in the first heat map, the second heat map and the third heat map are all greater than the corresponding preset fifth thresholds;
and the second determining submodule is configured to determine the position corresponding to the predicted face candidate frame in the image area as the face position in the third face information, if the position of the pixel point with the maximum color value in the third heat map is within the predicted face candidate frame.
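The heat-map decoding chain above can be sketched as follows. For simplicity this sketch treats the two corner peaks as diagonally opposite corners of the candidate frame and uses a single fifth threshold for all three maps; both choices, and the plain list-of-rows heat-map format, are illustrative assumptions:

```python
def decode_face(h_corner1, h_corner2, h_center, fifth_threshold):
    """Each heat map is a list of rows of floats. Returns the candidate
    face box (x1, y1, x2, y2), or None when the peaks fail the checks."""
    def argmax2d(h):
        # Position and value of the pixel with the maximum color value.
        best, pos = -1.0, (0, 0)
        for y, row in enumerate(h):
            for x, v in enumerate(row):
                if v > best:
                    best, pos = v, (x, y)
        return pos, best

    (c1, v1), (c2, v2), (cc, vc) = (argmax2d(h)
                                    for h in (h_corner1, h_corner2, h_center))
    # All three peak values must exceed the threshold.
    if min(v1, v2, vc) <= fifth_threshold:
        return None
    # Build the predicted face candidate frame from the two corner peaks.
    x1, x2 = min(c1[0], c2[0]), max(c1[0], c2[0])
    y1, y2 = min(c1[1], c2[1]), max(c1[1], c2[1])
    # Accept only if the center-point peak falls inside the candidate frame.
    if x1 <= cc[0] <= x2 and y1 <= cc[1] <= y2:
        return (x1, y1, x2, y2)
    return None
```

The center-point check acts as a consistency filter: spurious corner peaks that do not enclose the center response are rejected rather than emitted as a face position.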
Optionally, as an embodiment, the second face information only includes a face probability.
Optionally, as an embodiment, the human body anchor frame is designed according to the aspect ratio and the size of the human body; and the human face anchor frame is designed according to the aspect ratio and the size of the human face.
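The design principle above can be illustrated with a small sketch. The concrete base size, aspect ratios and scales below are purely illustrative assumptions; the embodiment only states that bodies and faces get anchors shaped to their typical proportions:

```python
def make_anchors(base_size, aspect_ratios, scales):
    """Return (w, h) anchor shapes for every aspect-ratio/scale pair,
    keeping the anchor area equal to (base_size * scale) squared."""
    anchors = []
    for s in scales:
        for r in aspect_ratios:          # r = width / height
            area = (base_size * s) ** 2
            h = (area / r) ** 0.5
            w = r * h
            anchors.append((round(w, 1), round(h, 1)))
    return anchors

# Faces are roughly square; bodies are tall and narrow (assumed ratios).
face_anchors = make_anchors(16, aspect_ratios=[1.0], scales=[1, 2])
body_anchors = make_anchors(16, aspect_ratios=[0.4], scales=[4, 8])
```

Shaping the two anchor sets differently lets each branch spend its anchors where its target class actually occurs, instead of sharing one generic grid.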
For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
According to still another embodiment of the present invention, there is also provided an electronic apparatus including: a memory, a processor and a computer program stored on the memory and executable on the processor, the computer program, when executed by the processor, implementing the steps in the image processing method according to any of the embodiments described above.
According to still another embodiment of the present invention, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps in the image processing method according to any one of the above-mentioned embodiments.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The image processing method, the image processing apparatus, the electronic device and the storage medium provided by the present invention are described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is only intended to help understand the method and its core idea. Meanwhile, for a person skilled in the art, the specific embodiments and the application scope may vary according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.