WO2021204037A1

WO2021204037A1 - Detection method and apparatus for facial key point, and storage medium and electronic device

Info

Publication number: WO2021204037A1
Application number: PCT/CN2021/084220
Authority: WO
Inventors: Cai Zhongyin (蔡中印); Zhao Xiaohui (赵晓辉); Chen Bin (陈斌); Song Chen (宋晨)
Original assignee: Ping An Technology (Shenzhen) Co., Ltd. (平安科技（深圳）有限公司)
Priority date: 2020-11-12
Filing date: 2021-03-31
Publication date: 2021-10-14
Also published as: CN112380981A

Abstract

A detection method and apparatus for facial key points, a storage medium, and an electronic device, belonging to the technical field of face recognition. The method comprises: acquiring an image to be annotated that contains a face (S110); inputting the image into a pre-trained key point annotation model, so that the model outputs a heat map and predicted key point coordinates corresponding to the image, together with an occlusion confidence for each point in the image (S120); determining target key point coordinates of the image according to the heat map, the predicted key point coordinates and the occlusion confidence (S130); and annotating key points on the image according to the target key point coordinates (S140). The method improves the efficiency of facial key point recognition while ensuring its accuracy.

Description

Method, device, storage medium and electronic device for detecting facial key points

This application claims priority to Chinese patent application No. 202011264438.2, filed with the Chinese Patent Office on November 12, 2020 and entitled "Methods, devices, storage media and electronic equipment for detecting key points of human faces", the entire contents of which are incorporated herein by reference.

Technical field

This application relates to the technical field of face recognition, and specifically, to a method for detecting key points of a face, a device for detecting key points of a face, a computer-readable storage medium, and an electronic device.

Background

Face key point detection is a technology for locating key points on a face, such as the eyes, nose, and facial contour, in a face image. It can be used in scenarios such as locating facial parts, recognizing facial expressions, judging intelligent driving tests, and assisted driving. In current technical solutions, each image is annotated multiple times and the average of the annotations is taken to reduce errors in the facial key point labels. However, the inventors found that repeated annotation is time-consuming and costly. Therefore, how to improve the efficiency of facial key point recognition while ensuring its accuracy has become an urgent technical problem.

It should be noted that the information disclosed in the above Background section is only intended to enhance understanding of the background of the application, and may therefore contain information that does not constitute prior art already known to a person of ordinary skill in the art.

Technical problem

One objective of the embodiments of the present application is to provide a method for detecting key points of a human face, a device for detecting key points of a human face, a computer-readable storage medium, and an electronic device, so as to solve the problem that repeated annotation is time-consuming and costly, thereby improving the efficiency of facial key point recognition while ensuring its accuracy.

Technical solutions

In order to solve the above technical problems, the technical solutions adopted in the embodiments of this application are:

The first aspect of the embodiments of the present application provides a method for detecting key points of a human face, which includes:

Obtain an image to be annotated containing a human face;

Input the image to be annotated into the pre-trained key point annotation model, so that the key point annotation model outputs the heat map and the predicted key point coordinates corresponding to the image to be annotated, and the occlusion confidence of each point in the image to be annotated;

Determine the target key point coordinates of the image to be annotated according to the heat map, the predicted key point coordinates, and the occlusion confidence;

Annotate key points on the image to be annotated according to the target key point coordinates.

The second aspect of the embodiments of the present application provides an apparatus for detecting key points of a human face, which includes:

An acquisition module, configured to acquire an image to be annotated containing a human face;

A processing module, configured to input the image to be annotated into the pre-trained key point annotation model, so that the key point annotation model outputs the heat map and the predicted key point coordinates corresponding to the image to be annotated, and the occlusion confidence of each point in the image to be annotated;

A determining module, configured to determine the target key point coordinates of the image to be annotated according to the heat map, the predicted key point coordinates, and the occlusion confidence;

A marking module, configured to annotate key points on the image to be annotated according to the target key point coordinates.

A third aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the following steps:

Obtain an image to be annotated containing a human face;

The fourth aspect of the embodiments of the present application provides an electronic device, which includes:

A processor; and

A memory on which a computer program is stored;

Wherein the processor is configured to implement the following steps by executing the computer program:

Obtain an image to be annotated containing a human face;

Beneficial effects

The beneficial effects of this application are:

In the embodiments of the present application, an image to be annotated containing a face is acquired and input into the pre-trained key point annotation model, which outputs the heat map and predicted key point coordinates corresponding to the image, together with the occlusion confidence of each point in the image. The target key point coordinates are then determined from the heat map, the predicted key point coordinates, and the occlusion confidence, and key points are annotated on the image accordingly. Determining the target key point coordinates from these three outputs ensures their accuracy, and because the image no longer needs to be annotated multiple times, the efficiency of facial key point recognition is improved.

Description of the drawings

The drawings herein are incorporated into the specification and constitute a part of the specification, show embodiments that conform to the application, and are used together with the specification to explain the principle of the application. Obviously, the drawings in the following description are only some embodiments of the application. For those of ordinary skill in the art, other drawings can be obtained based on these drawings without creative work.

Fig. 1 shows a schematic flowchart of a method for detecting key points of a human face according to an embodiment of the present application.

FIG. 2 shows a schematic flowchart of step S130 in the method for detecting key points of a human face in FIG. 1 according to an embodiment of the present application.

Fig. 3 shows a schematic flow chart of determining an occlusion threshold further included in the method for detecting key points of a human face according to an embodiment of the present application.

FIG. 4 shows a schematic flowchart of step S330 in the method for detecting key points of a human face in FIG. 3 according to an embodiment of the present application.

FIG. 5 shows a schematic flowchart of training a key point annotation model further included in the method for detecting key points of a face according to an embodiment of the present application.

FIG. 6 shows a schematic flowchart of step S540 in the method for detecting key points of a human face in FIG. 5 according to an embodiment of the present application.

Fig. 7 shows a schematic composition block diagram of a device for detecting key points of a human face according to an embodiment of the present application.

Fig. 8 shows a schematic block diagram of an electronic device according to an embodiment of the present application.

Fig. 9 shows a schematic diagram of a computer-readable storage medium according to an embodiment of the present application.

Embodiments of the present invention

Example embodiments will now be described more fully with reference to the accompanying drawings. However, the example embodiments can be implemented in various forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this application will be thorough and complete and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics can be combined in one or more embodiments in any suitable manner. In the following description, many specific details are provided to give a thorough understanding of the embodiments of the present application. However, those skilled in the art will realize that the technical solutions of the present application can be practiced without one or more of these specific details, or with other methods, components, devices, steps, and so on. In other cases, well-known technical solutions are not shown or described in detail to avoid obscuring aspects of the present application.

In addition, the drawings are only schematic illustrations of the application and are not necessarily drawn to scale. The same reference numerals in the figures denote the same or similar parts, and thus their repeated description will be omitted. Some of the block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in the form of software, or implemented in one or more hardware modules or integrated circuits, or implemented in different networks and/or processor devices and/or microcontroller devices.

Fig. 1 shows a schematic flowchart of a method for detecting key points of a human face according to an embodiment of the present application. The method can be applied to terminal devices such as smartphones, tablets, or portable computers. In other embodiments, it can also be applied to a server; this is not specifically limited.

Referring to FIG. 1, the method for detecting key points of a human face at least includes steps S110 to S140, which are described in detail as follows:

In step S110, an image to be labeled containing a human face is acquired.

In an embodiment of the present application, the terminal device may obtain the image to be annotated from a local storage location, where the image to be annotated includes an unannotated face. It should be noted that one or more images to be annotated may be acquired; this application does not specifically limit the number.

In an embodiment of the present application, when the terminal device receives an annotation instruction, it can turn on a configured photographing device such as a camera, and the user can aim the photographing device at the object to be annotated to obtain the image to be annotated.

In step S120, the image to be annotated is input to the pre-trained key point annotation model, so that the key point annotation model outputs the heat map and the predicted key point coordinates corresponding to the image to be annotated, and the occlusion confidence of each point in the image to be annotated.

In this embodiment, the key point annotation model can be trained based on a convolutional neural network. After pre-training, the model can output, for a given input, the corresponding heat map, the predicted key point coordinates, and the occlusion confidence of each point in the input.

It should be noted that the heat map uses the brightness of each point to characterize the likelihood that the point is a facial key point: the higher the brightness, the more likely the point corresponds to a key point; conversely, the lower the brightness, the less likely it does.
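The text does not state how a highlighted position is read out of the heat map. As a minimal sketch, assuming one heat-map channel per key point stored as a nested list indexed `[y][x]`, the brightest pixel of each channel can be taken as that key point's position. The names `heatmap_peak` and `heatmap_keypoints` are hypothetical.

```python
from typing import List, Tuple

Heatmap = List[List[float]]  # one channel, indexed [y][x]; brighter = more likely a key point


def heatmap_peak(heatmap: Heatmap) -> Tuple[int, int, float]:
    """Return (x, y, brightness) of the brightest point in one heat-map channel."""
    if not heatmap or not heatmap[0]:
        raise ValueError("heat map must be non-empty")
    best = (0, 0, float("-inf"))
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best[2]:
                best = (x, y, value)
    return best


def heatmap_keypoints(heatmaps: List[Heatmap]) -> List[Tuple[int, int]]:
    """Assuming one channel per key point, return each channel's peak (x, y)."""
    return [heatmap_peak(h)[:2] for h in heatmaps]
```

Practical systems often refine such a peak to sub-pixel accuracy; that refinement is outside what the patent describes.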

However, the heat map performs poorly on images in which the face is partially occluded, and its reliability at the occluded parts is low. Therefore, a set of fully connected layers is trained to output the predicted key point coordinates of the key point positions in the image to be annotated. The key point annotation model is further trained so that it can also output the occlusion confidence of each point in the image to be annotated.

It should be noted that the occlusion confidence describes how likely each point in the image to be annotated is to be unoccluded. In practice, the occlusion confidence can be a value between 0 and 1: the larger the value, the more likely the corresponding point is unoccluded; the smaller the value, the more likely it is occluded.

In step S130, the target key point coordinates of the image to be marked are determined according to the heat map, the predicted key point coordinates, and the occlusion confidence.

In this step, by considering the occlusion confidence, the target key point coordinates are identified from the coordinates of the highlighted position of the heat map and the predicted key point coordinates, thereby ensuring the accuracy of the target key point coordinates.

In an embodiment of the present application, the coordinates of each highlighted position in the heat map can be compared against the occlusion confidence of the corresponding point in the image to be annotated. If the occlusion confidence at a highlighted position is low and falls within the occluded value range, the highlighted position is likely to be occluded, so its brightness can be reduced to the low-brightness range.

If the occlusion confidence at a highlighted position is high and falls within the unoccluded value range, the highlighted position is likely to be unoccluded and therefore reliable, so it can be left unchanged. After all highlighted positions have been compared, an updated heat map is obtained.

The coordinates of the highlighted positions in the updated heat map are then combined with the predicted key point coordinates output by the key point annotation model, and duplicates are removed to obtain the target key point coordinates. In this way, the occlusion confidence is used to screen the highlighted positions in the heat map, and the result is combined with the predicted key point coordinates, thereby ensuring the accuracy of the target key point coordinates.
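The embodiment above leaves the data layout and the deduplication rule open. The sketch below assumes a single-channel heat map and a per-pixel occlusion confidence map of the same size, a brightness cut-off `highlight_level` that defines a highlighted position, and a distance `radius` within which two coordinates count as duplicates. All of these, and the function names, are hypothetical.

```python
import math
from typing import List, Tuple

Grid = List[List[float]]  # indexed [y][x]
Point = Tuple[int, int]   # (x, y)


def suppress_occluded_highlights(heatmap: Grid, occlusion_conf: Grid,
                                 highlight_level: float, occlusion_threshold: float,
                                 low_value: float = 0.0) -> Grid:
    """Return an updated heat map in which likely-occluded highlights are dimmed."""
    updated = [row[:] for row in heatmap]  # leave the input heat map untouched
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value >= highlight_level and occlusion_conf[y][x] < occlusion_threshold:
                updated[y][x] = low_value  # move into the low-brightness range
    return updated


def highlighted_points(heatmap: Grid, highlight_level: float) -> List[Point]:
    """Coordinates of all pixels at or above the highlight cut-off."""
    return [(x, y) for y, row in enumerate(heatmap)
            for x, value in enumerate(row) if value >= highlight_level]


def merge_and_deduplicate(highlights: List[Point], predicted: List[Point],
                          radius: float = 1.5) -> List[Point]:
    """Keep all highlights, then add each predicted point not within `radius` of a kept one."""
    merged = list(highlights)
    for p in predicted:
        if all(math.dist(p, q) > radius for q in merged):
            merged.append(p)
    return merged
```

A real heat map usually has a blob of bright pixels around each key point, so in practice each blob would first be reduced to its peak before merging.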

In step S140, the key points of the image to be marked are marked according to the coordinates of the target key points.

In this step, key points are annotated on the image to be annotated based on the determined target key point coordinates. In one example, the annotation may highlight the target key point coordinates in the image, for example by displaying them in a predetermined color such as red or yellow. In another example, the annotation may take the form of callouts: one end of a callout box points to the position of a target key point coordinate, and the other end contains the region information of that key point, for example, lip key point, eye key point, or nose key point.
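As an illustration of the two annotation styles, the sketch below turns target key point coordinates into records that hold a display color and a region label for the callout text. The `KeypointAnnotation` record and the index-to-region mapping passed in as `region_of` are hypothetical; the patent does not fix a key point numbering scheme.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class KeypointAnnotation:
    x: float
    y: float
    color: str   # display color for highlight-style marking
    region: str  # text shown at the far end of a callout box


def build_annotations(target_coords: List[Tuple[float, float]],
                      region_of: Dict[int, str],
                      color: str = "red") -> List[KeypointAnnotation]:
    """Turn target key point coordinates into drawable annotation records."""
    return [KeypointAnnotation(x, y, color, region_of.get(i, "unknown"))
            for i, (x, y) in enumerate(target_coords)]
```

Rendering the records onto the image is left to whatever drawing library the terminal device uses.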

In the embodiment shown in FIG. 1, the image to be annotated is input into the key point annotation model, which outputs the heat map and predicted key point coordinates corresponding to the image, and the occlusion confidence of each point in the image. The target key point coordinates are then determined from the heat map, the predicted key point coordinates, and the occlusion confidence. This ensures the accuracy of the target key point coordinates without repeated annotation, thereby improving annotation efficiency.

Based on the embodiment shown in FIG. 1, FIG. 2 shows a schematic flowchart of step S130 in the method for detecting key points of a human face in FIG. 1 according to an embodiment of the present application. Referring to FIG. 2, step S130 includes at least step S210 to step S240, which are described in detail as follows:

In step S210, the occlusion confidence of each point in the to-be-labeled image is compared with a preset occlusion threshold to determine the point to be processed whose occlusion confidence is less than the occlusion threshold.

Here, the occlusion threshold is used to determine whether a point is occluded. If the occlusion confidence of a point is less than the occlusion threshold, the point is likely to be occluded; if it is greater than or equal to the occlusion threshold, the point is likely to be unoccluded.

In this embodiment, the occlusion confidence of each point in the image to be annotated is compared with the preset occlusion threshold, and the points likely to be occluded, that is, those whose occlusion confidence is less than the occlusion threshold, are identified as points to be processed.

In step S220, according to the points to be processed, the predicted key point coordinates corresponding to the points to be processed are selected from the predicted key point coordinates as the first key point coordinates.

In this embodiment, because the points to be processed are likely to be occluded, the predicted key point coordinates corresponding to them are more reliable than the highlighted positions corresponding to them in the heat map. Therefore, the predicted key point coordinates corresponding to the points to be processed are taken as the first key point coordinates.

In step S230, according to the points to be processed, the coordinates of the highlighted points in the heat map at positions other than those corresponding to the points to be processed are selected as the second key point coordinates.

In this embodiment, because the points to be processed are likely to be occluded, the highlighted points in the heat map at the remaining positions are likely to be unoccluded and therefore highly reliable, so they can be used as the second key point coordinates.

In step S240, the first key point coordinates and the second key point coordinates are integrated to determine the target key point coordinates of the image to be annotated.

In this embodiment, the identified first key point coordinates and second key point coordinates are combined, and the combined set is used as the target key point coordinates.

In the embodiment shown in FIG. 2, the occlusion confidence is used to select the first key point coordinates from the predicted key point coordinates and the second key point coordinates from the heat map, which together form the target key point coordinates. In this way, the more reliable coordinates from each source are selected, thereby ensuring the accuracy of the target key point coordinates.
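Steps S210 to S240 can be sketched as follows, assuming the occlusion confidence, the heat-map highlight, and the predicted coordinate are each available once per key point index (the text speaks of "points" without fixing a layout). `fuse_keypoints` is a hypothetical name.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def fuse_keypoints(heatmap_coords: Sequence[Point],
                   predicted_coords: Sequence[Point],
                   occlusion_conf: Sequence[float],
                   occlusion_threshold: float) -> List[Point]:
    """Select, per key point, the heat-map or the predicted coordinate (steps S210-S240)."""
    n = len(heatmap_coords)
    if len(predicted_coords) != n or len(occlusion_conf) != n:
        raise ValueError("all inputs need one entry per key point")
    # S210: key points whose confidence is below the threshold are points to be processed.
    to_process = {i for i, c in enumerate(occlusion_conf) if c < occlusion_threshold}
    # S220: first key point coordinates come from the predicted (regressed) coordinates.
    first = {i: predicted_coords[i] for i in to_process}
    # S230: second key point coordinates are heat-map highlights at the remaining points.
    second = {i: heatmap_coords[i] for i in range(n) if i not in to_process}
    # S240: integrate both sets, keeping the original key point order.
    merged = {**first, **second}
    return [merged[i] for i in range(n)]
```

A confidence exactly equal to the threshold counts as unoccluded, matching the "greater than or equal to" rule in step S210.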

Based on the embodiment shown in FIG. 1, FIG. 3 shows a schematic flowchart of determining the occlusion threshold further included in the method for detecting key points of a human face according to an embodiment of the present application. Referring to FIG. 3, determining the occlusion threshold includes at least step S310 to step S330, which are described in detail as follows:

In step S310, an occlusion training sample set is obtained. The occlusion training sample set includes a plurality of occlusion sample images, and the face in each occlusion sample image is partially occluded.

In this embodiment, the occlusion training sample set may contain multiple occlusion sample images in which the face is partially occluded by common occluders such as the background, a hat, a mask, bangs, glasses, a mustache, fingers, a pen, or a microphone. For each occlusion sample image, the coordinates of its occluded positions can be stored for subsequent comparison.

In step S320, the occlusion sample images in the occlusion training sample set are input into the key point annotation model, so that the key point annotation model outputs the occlusion confidence of each point in the occlusion sample image.

In this embodiment, the key point annotation model is constructed so that the key point annotation model can output the occlusion confidence of each point corresponding to its input. The occlusion sample images in the occlusion training sample set are input to the key point annotation model to be trained, so that the key point annotation model can output the occlusion confidence of each point in the occlusion sample image.

In step S330, an occlusion threshold is determined according to the occlusion confidence of each point in the multiple occlusion sample images.

In an embodiment of the present application, the occlusion confidence of each point output by the key point annotation model for each occlusion sample image is compared with the stored coordinates of the occluded positions in that image, so as to obtain the occlusion confidence of the unoccluded points, that is, the points outside the occluded positions. The occlusion threshold is then determined from the occlusion confidence of these unoccluded points for subsequent judgment.

In an example of the present application, the minimum of the occlusion confidences of the unoccluded points across all occlusion sample images can be selected as the occlusion threshold, so that in subsequent comparisons the occlusion threshold identifies as many occluded positions in the image to be annotated as possible.
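A minimal sketch of this example, assuming each occlusion sample is stored as a mapping from point coordinates to the model's occlusion confidence, together with the set of occluded coordinates stored in step S310. The data layout and the function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Point = Tuple[int, int]
# One occlusion sample: model confidence per point, plus the stored occluded coordinates.
OcclusionSample = Tuple[Dict[Point, float], Set[Point]]


def unoccluded_confidences(samples: List[OcclusionSample]) -> List[float]:
    """Collect the occlusion confidence of every point outside the occluded positions."""
    return [conf
            for confidences, occluded in samples
            for point, conf in confidences.items()
            if point not in occluded]


def min_unoccluded_threshold(samples: List[OcclusionSample]) -> float:
    """Use the lowest confidence seen at an unoccluded point as the occlusion threshold."""
    values = unoccluded_confidences(samples)
    if not values:
        raise ValueError("no unoccluded points in the occlusion training sample set")
    return min(values)
```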

In the embodiment shown in FIG. 3, by setting the occlusion training sample set as the input of the key point labeling model, the key point labeling model can output the occlusion confidence of each point in the occlusion sample image in the training sample set. Therefore, the occlusion threshold is determined according to the occlusion confidence, which ensures the validity of the occlusion threshold setting, so that the occlusion threshold has reference value.

Based on the embodiments shown in FIG. 1 and FIG. 3, FIG. 4 shows a schematic flowchart of step S330 in the method for detecting key points of a human face in FIG. 3 according to an embodiment of the present application. Referring to FIG. 4, step S330 includes at least step S410 to step S420, which are described in detail as follows:

In step S410, from the occlusion confidence of each point in the multiple occlusion sample images, the occlusion confidence of the point corresponding to the unoccluded position in the occlusion sample image is selected as the confidence to be selected.

In this embodiment, the occlusion confidence of each point in each occlusion sample image output by the key point annotation model is matched against the coordinates of the occluded positions in that image, giving the occlusion confidence of the points outside the occluded positions, that is, of the points at unoccluded positions. These occlusion confidences are used as the candidate confidences, from which one is selected as the occlusion threshold.

In step S420, the candidate confidences are arranged in descending order, and the occlusion confidence at the position given by a predetermined ratio is selected as the occlusion threshold.

In this embodiment, the predetermined ratio may be preset by a person skilled in the art to determine the occlusion threshold, for example 98%, 99%, or 99.5%. For example, if there are 1000 candidate confidences and the predetermined ratio is 99.5%, the candidate confidence ranked 995th (i.e., 1000 × 99.5%) in descending order is selected as the occlusion threshold.
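The rank arithmetic of step S420 can be written as below. Rounding `n × ratio` to the nearest integer is an assumption for cases where the product is not a whole number; the patent's own example (1000 × 99.5% = 995) is exact. `ratio_threshold` is a hypothetical name.

```python
from typing import Sequence


def ratio_threshold(candidates: Sequence[float], ratio: float) -> float:
    """Return the candidate ranked round(n * ratio) (1-based) in descending order."""
    if not candidates:
        raise ValueError("need at least one candidate confidence")
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    ordered = sorted(candidates, reverse=True)
    rank = max(1, round(len(ordered) * ratio))  # e.g. round(1000 * 0.995) = 995
    return ordered[rank - 1]
```

With 1000 candidates and a ratio of 99.5%, the five lowest unoccluded confidences (0.5%) fall below the threshold, which is how the ratio absorbs the confidence error discussed in the following paragraph. A ratio of 100% reduces to the minimum-value rule of step S330.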

It should be noted that because the key point annotation model has a certain recognition error, the occlusion confidences of occluded and unoccluded positions also contain errors, and their value ranges overlap. Therefore, those skilled in the art can set the predetermined ratio based on prior experience to absorb this error, thereby ensuring the effectiveness of the occlusion threshold and avoiding subsequent misidentification.

Based on the embodiment shown in FIG. 1, FIG. 5 shows a schematic flowchart of training the key point annotation model further included in the method for detecting key points of a face according to an embodiment of the present application. Referring to FIG. 5, training the key point annotation model includes at least steps S510 to S540, which are described in detail as follows:

In step S510, a training sample set is obtained. The training sample set includes a plurality of sample images including human faces, and the sample images include key point information.

In this embodiment, the training sample set is a sample set used to train the key point annotation model. It may include multiple sample images containing human faces, and each sample image may carry its own key point information, which may be the pre-calibrated key point coordinates of that sample image.

In an example of the present application, the training sample set can be obtained from a local storage location such as an image database. Specifically, when a training request for the key point annotation model is received, a predetermined number of sample images can be randomly selected from the image database and arranged in random order to obtain the training sample set. In other examples, the training sample set can also be obtained from a third-party organization over the network; this application does not specifically limit this.
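Random selection and random ordering can be done in one call with Python's `random.Random.sample`, which draws without replacement and returns the picks in random selection order. The function name `build_training_set` and the optional `seed` parameter (for reproducibility) are assumptions for illustration.

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def build_training_set(database: Sequence[T], count: int,
                       seed: Optional[int] = None) -> List[T]:
    """Randomly pick `count` distinct sample images and return them in random order."""
    if not 0 < count <= len(database):
        raise ValueError("count must be between 1 and the database size")
    rng = random.Random(seed)
    # sample() draws without replacement; its result is already in random selection order.
    return rng.sample(list(database), count)
```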

In step S520, the sample image is input into the key point annotation model to be trained, so that the key point annotation model outputs the heat map and predicted key point coordinates corresponding to the sample image, and the occlusion confidence of each point in the sample image.

In this embodiment, the key point annotation model is constructed with three branches. The first is the heat map branch, which outputs the heat map corresponding to the sample image; the highlights of the heat map represent the key point positions of the sample image. The second is the predicted key point coordinate branch, which analyzes the sample image and outputs predicted coordinates for its key points. The third is the occlusion confidence branch, which outputs the occlusion confidence of each point in the sample image.

Each sample image in the training sample set is therefore input to the key point annotation model, so that its three branches output, respectively, the heat map, the predicted key point coordinates, and the occlusion confidence of each point for that sample image.
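The three-branch output can be pinned down as a small data structure. This sketch assumes K key points, a K×H×W stack of heat maps, one regressed (x, y) pair per key point, and one occlusion confidence per key point. Squashing the third branch's raw score with a sigmoid is a common choice assumed here to keep values in [0, 1], not something the patent specifies; `KeypointModelOutput` is a hypothetical name.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class KeypointModelOutput:
    heatmaps: List[List[List[float]]]            # branch 1: K channels, each H x W
    predicted_coords: List[Tuple[float, float]]  # branch 2: K regressed (x, y) pairs
    occlusion_conf: List[float]                  # branch 3: K values; higher = more likely unoccluded

    def validate(self) -> None:
        """Check that every branch describes the same K key points."""
        k = len(self.heatmaps)
        if len(self.predicted_coords) != k or len(self.occlusion_conf) != k:
            raise ValueError("each branch must produce one entry per key point")
        if any(not 0.0 <= c <= 1.0 for c in self.occlusion_conf):
            raise ValueError("occlusion confidences must lie in [0, 1]")


def sigmoid(score: float) -> float:
    """Map a raw branch-3 score into [0, 1] without overflowing for large |score|."""
    if score >= 0:
        return 1.0 / (1.0 + math.exp(-score))
    z = math.exp(score)
    return z / (1.0 + z)
```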

In step S530, the target key point coordinates in the sample image are determined according to the heat map corresponding to the sample image, the predicted key point coordinates, and the occlusion confidence of each point in the sample image.

In this embodiment, the target key point coordinates of each sample image can be determined from its heat map, predicted key point coordinates, and per-point occlusion confidence using the selection method described in the above embodiments, which will not be repeated here.

In step S540, the parameters in the key point labeling model to be trained are adjusted so that the target key point coordinates in the sample image match the key point information.

In this embodiment, the determined target key point coordinates of each sample image are compared with its pre-calibrated key point information to determine whether they match. A mismatch indicates that the key point annotation model has recognized the key points incorrectly. Therefore, the parameters of the model can be adjusted until the target key point coordinates determined from its output match the key point information of the sample image, thereby ensuring the recognition accuracy of the key point annotation model.
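The patent does not define when target coordinates "match" the key point information. One plausible criterion, assumed here, is that every target coordinate lies within a pixel tolerance of its calibrated label, with the mean Euclidean error as the quantity that parameter adjustment would drive down. `tolerance_px` and both function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def mean_point_error(target: Sequence[Point], labels: Sequence[Point]) -> float:
    """Average Euclidean distance between target coordinates and calibrated labels."""
    if len(target) != len(labels) or not labels:
        raise ValueError("need one target coordinate per labelled key point")
    return sum(math.dist(t, l) for t, l in zip(target, labels)) / len(labels)


def coordinates_match(target: Sequence[Point], labels: Sequence[Point],
                      tolerance_px: float = 2.0) -> bool:
    """True when every target coordinate lies within `tolerance_px` of its label."""
    return len(target) == len(labels) and all(
        math.dist(t, l) <= tolerance_px for t, l in zip(target, labels))
```

The parameter update itself depends on the training framework and is not shown.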

Based on the embodiments shown in FIGS. 1 and 5, FIG. 6 shows a schematic flowchart of step S540 in the method for detecting key points of a human face in FIG. 5 according to an embodiment of the present application. Referring to FIG. 6, step S540 includes at least step S610 to step S640, which are described in detail as follows:

In step S610, the training sample set is input to key point annotation models to be trained with different learning rates, so that each model outputs its own training data. The training data includes the heat map and predicted key point coordinates corresponding to each sample image, and the occlusion confidence of each point in the sample image.

In this embodiment, inputting the training sample set into the key point annotation models with different learning rates yields multiple sets of training data, each including the heat map and predicted key point coordinates corresponding to each sample image and the occlusion confidence of each point in the sample image.

In an embodiment of the present application, a relatively large learning rate can be set for the key point labeling model at the start of training. After 100 rounds of training on the training sample set, the learning rate is reduced by a factor of 10, and training continues at the updated learning rate until the key point labeling model approaches convergence and the loss function no longer decreases. The multiple sets of training data output by the key point labeling model under the different learning rates are stored.
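The schedule described above is a step decay. The following is a minimal sketch, assuming a hypothetical callback `train_one_pass(lr)` that runs one pass over the training set and returns its loss; the stopping rule (`min_improvement`) and the `max_stages` cap are assumptions, and the recorded losses stand in for the heat maps, coordinates, and occlusion confidences a real pipeline would store at each learning rate.

```python
from typing import Callable, Dict, List


def train_with_step_decay(
    train_one_pass: Callable[[float], float],  # hypothetical: one pass, returns loss
    initial_lr: float = 1e-2,
    passes_per_stage: int = 100,
    decay_factor: float = 10.0,
    min_improvement: float = 1e-4,
    max_stages: int = 6,
) -> Dict[float, List[float]]:
    """Train in stages, dividing the learning rate by `decay_factor` after every
    `passes_per_stage` passes, until a stage no longer lowers the loss."""
    history: Dict[float, List[float]] = {}
    lr = initial_lr
    best_loss = float("inf")
    for _ in range(max_stages):
        # Run a fixed number of passes over the training set at this rate.
        losses = [train_one_pass(lr) for _ in range(passes_per_stage)]
        history[lr] = losses  # stand-in for storing this stage's outputs
        stage_best = min(losses)
        # Stop once a whole stage no longer lowers the loss noticeably.
        if best_loss - stage_best < min_improvement:
            break
        best_loss = stage_best
        lr /= decay_factor  # e.g. 1e-2 -> 1e-3 -> 1e-4
    return history
```

Frameworks offer the same behaviour as a built-in scheduler (for example, a step-decay scheduler with `gamma=0.1`); the explicit loop simply makes the stage boundaries visible.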

In step S620, statistics are computed over the multiple sets of training data, and target training data is identified from them.

In an embodiment of the present application, the target key point coordinates of each sample image can be determined from each set of training data, which yields multiple sets of target key point coordinate data per sample image. Among these sets, a target key point coordinate whose number of occurrences exceeds a predetermined count is taken as a true key point coordinate of the sample image. For example, if, across the sets of target key point coordinate data for one sample image, coordinate A appears 50 times and coordinate B appears 10 times, and the predetermined count is 40, then coordinate A is taken as the true key point coordinate of the sample image and coordinate B is discarded, and so on. It should be noted that these figures are merely illustrative, and this application does not specifically limit them.

It should be understood that, among the multiple sets of target key point coordinate data corresponding to each sample image, the more frequently a target key point coordinate occurs, the more likely it is to be a true key point coordinate of the sample image. Therefore, the target key point coordinates whose number of occurrences exceeds the predetermined count can be used as the true key point coordinates of the sample image and combined to obtain the target training data corresponding to that sample image.
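The counting rule above amounts to a per-key-point vote across training runs. A minimal sketch, assuming each run yields one list of integer (x, y) coordinates per sample image so that identical coordinates compare equal (real outputs would need rounding or binning first); `vote_keypoints` and `min_count` are hypothetical names. When more than one coordinate exceeds the count, this sketch keeps the most frequent one, which is a design choice the application does not specify.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

Point = Tuple[int, int]  # integer pixel coordinates (x, y)


def vote_keypoints(runs: Sequence[Sequence[Point]],
                   min_count: int) -> List[Optional[Point]]:
    """For each key point index, return the most frequent coordinate across runs
    if it occurs more than `min_count` times, else None (no reliable consensus)."""
    if not runs:
        return []
    num_points = len(runs[0])
    if any(len(run) != num_points for run in runs):
        raise ValueError("every run must contain the same number of key points")
    consensus: List[Optional[Point]] = []
    for k in range(num_points):
        counts = Counter(run[k] for run in runs)
        coord, n = counts.most_common(1)[0]
        consensus.append(coord if n > min_count else None)
    return consensus
```

Mirroring the example in the text: with 50 runs voting for coordinate A, 10 runs voting for coordinate B, and `min_count=40`, coordinate A is kept for that key point and B is discarded.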

In step S630, the key point information contained in the sample image is updated according to the target training data to obtain updated key point information of the sample image.

In this embodiment, the key point information originally contained in the sample image is replaced according to the obtained target training data, so as to obtain the updated key point information of the sample image.
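Replacing the original key point information can be sketched as a per-point substitution. This assumes the consensus is given per key point, with `None` where no coordinate reached the predetermined count; keeping the original label in that case is an assumption, since the application does not describe the fallback. The function name `update_labels` is hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, float]


def update_labels(original: Sequence[Point],
                  consensus: Sequence[Optional[Point]]) -> List[Point]:
    """Replace each pre-calibrated key point with its consensus coordinate,
    keeping the original label wherever no consensus was reached (None)."""
    if len(original) != len(consensus):
        raise ValueError("label and consensus lengths differ")
    return [c if c is not None else o for o, c in zip(original, consensus)]
```

For example, `update_labels([(1, 1), (2, 2)], [(5, 5), None])` returns `[(5, 5), (2, 2)]`.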

In step S640, the parameters in the key point labeling model to be trained are adjusted to make the target key point coordinates match the updated key point information.

In this embodiment, during subsequent training of the key point annotation model, its parameters are adjusted so that its output matches the updated key point information of each sample image. The updated key point information thus guides the training and eliminates errors in the originally pre-calibrated key point information, which improves the training effect and ensures the accuracy of the key point annotation model's output.

This application also provides a device for detecting key points of a human face. Referring to FIG. 7, the device may include:

The obtaining module 710 is configured to obtain an image to be annotated containing a human face;

The processing module 720 is configured to input the image to be annotated into the pre-trained key point annotation model, so that the key point annotation model outputs the heat map corresponding to the image to be annotated, the predicted key point coordinates, and the occlusion confidence of each point in the image to be annotated;

The determining module 730 is configured to determine the target key point coordinates of the image to be annotated according to the heat map, the predicted key point coordinates, and the occlusion confidence level;

The marking module 740 is configured to mark key points on the image to be annotated according to the target key point coordinates.

In an embodiment of the present application, the determining module 730 includes:

A determining unit, configured to compare the occlusion confidence of each point in the to-be-labeled image with a preset occlusion threshold, and determine the point to be processed whose occlusion confidence is less than the occlusion threshold;

The first selection unit is configured to select, from the predicted key point coordinates, the predicted key point coordinates corresponding to the point to be processed as the first key point coordinates according to the point to be processed;

The second selection unit is configured to select from the heat map, according to the point to be processed, the coordinates of the highlight points at positions other than the position corresponding to the point to be processed as the second key point coordinates;

The integration unit is configured to integrate the first key point coordinates and the second key point coordinates to determine the target key point coordinates of the image to be annotated.
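The four units above can be sketched as a single fusion step. This minimal sketch assumes one heat map per key point at the same resolution as the image (real heat maps are often downsampled and would need rescaling), takes the "highlight point" to be the brightest pixel of that key point's heat map, and uses hypothetical names throughout (`heatmap_peak`, `fuse_keypoints`).

```python
from typing import List, Sequence, Tuple

Coord = Tuple[float, float]  # (x, y)


def heatmap_peak(heatmap: Sequence[Sequence[float]]) -> Tuple[int, int]:
    """Return (x, y) of the brightest pixel in one key point's heat map."""
    best_val, best_xy = float("-inf"), (0, 0)
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best_val:
                best_val, best_xy = value, (x, y)
    return best_xy


def fuse_keypoints(heatmaps: Sequence[Sequence[Sequence[float]]],
                   regressed: Sequence[Coord],
                   occlusion_conf: Sequence[float],
                   threshold: float) -> List[Coord]:
    """Points whose occlusion confidence is below `threshold` (points to be
    processed) take the regressed coordinate (first key point coordinates);
    all other points take the heat-map peak (second key point coordinates)."""
    if not len(heatmaps) == len(regressed) == len(occlusion_conf):
        raise ValueError("one heat map, coordinate, and confidence per key point")
    fused: List[Coord] = []
    for hm, reg, conf in zip(heatmaps, regressed, occlusion_conf):
        fused.append(reg if conf < threshold else heatmap_peak(hm))
    return fused
```

With two key points whose confidences are 0.9 and 0.2 and a threshold of 0.5, the first point takes its heat-map peak and the second keeps its regressed coordinate, reflecting the idea that heat-map peaks are less reliable for likely occluded points.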

The specific details of each module in the above-mentioned face key point detection device have been described in detail in the corresponding face key point detection method, so it will not be repeated here.

It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, this division is not mandatory. In fact, according to the embodiments of the present application, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of a module or unit described above can be further divided into multiple modules or units to be embodied.

In addition, although the various steps of the method in the present application are described in a specific order in the drawings, this does not require or imply that these steps must be performed in the specific order, or that all the steps shown must be performed to achieve the desired result. Additionally or alternatively, some steps may be omitted, multiple steps may be combined into one step for execution, and/or one step may be decomposed into multiple steps for execution, etc.

Through the description of the above embodiments, those skilled in the art can easily understand that the example embodiments described here can be implemented by software, or by software combined with the necessary hardware. Therefore, the technical solution according to the embodiments of the present application can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (such as a CD-ROM, USB flash drive, or removable hard disk) or on a network, and which includes several instructions that cause a computing device (such as a personal computer, a server, a mobile terminal, or a network device) to execute the method according to the embodiments of the present application.

In an exemplary embodiment of the present application, an electronic device capable of implementing the above method is also provided.

Those skilled in the art can understand that various aspects of the present application can be implemented as a system, a method, or a program product. Therefore, each aspect of the present application can be specifically implemented in the following forms, namely: complete hardware implementation, complete software implementation (including firmware, microcode, etc.), or a combination of hardware and software implementations, which can be collectively referred to herein as "Circuit", "Module" or "System".

The electronic device 500 according to this embodiment of the present application is described below with reference to FIG. 8. The electronic device 500 shown in FIG. 8 is only an example and should not impose any limitation on the functions or scope of use of the embodiments of the present application.

As shown in FIG. 8, the electronic device 500 is represented in the form of a general-purpose computing device. The components of the electronic device 500 may include, but are not limited to: the aforementioned at least one processing unit 510, the aforementioned at least one storage unit 520, and a bus 530 connecting different system components (including the storage unit 520 and the processing unit 510).

The storage unit stores program code that can be executed by the processing unit 510, so that the processing unit 510 performs the steps of the various exemplary embodiments described in the “Exemplary Method” section of this specification. For example, the processing unit 510 may perform, as shown in FIG. 1: step S110, obtaining an image to be annotated containing a human face; step S120, inputting the image to be annotated into the pre-trained key point annotation model, so that the key point annotation model outputs the heat map corresponding to the image to be annotated, the predicted key point coordinates, and the occlusion confidence of each point in the image to be annotated; step S130, determining the target key point coordinates of the image to be annotated according to the heat map, the predicted key point coordinates, and the occlusion confidence; and step S140, marking key points on the image to be annotated according to the target key point coordinates.

The storage unit 520 may include a readable medium in the form of a volatile storage unit, such as a random access storage unit (RAM) 5201 and/or a cache storage unit 5202, and may further include a read-only storage unit (ROM) 5203.

The storage unit 520 may also include a program/utility 5204 having a set of (at least one) program modules 5205. Such program modules 5205 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination of them, may include an implementation of a network environment.

The bus 530 may represent one or more of several types of bus structures, including a storage unit bus or storage unit controller, a peripheral bus, an accelerated graphics port, a processing unit bus, or a local bus using any of a variety of bus structures.

The electronic device 500 may also communicate with one or more external devices 700 (such as keyboards, pointing devices, and Bluetooth devices), with one or more devices that enable a user to interact with the electronic device 500, and/or with any device (such as a router or modem) that enables the electronic device 500 to communicate with one or more other computing devices. This communication can be performed through an input/output (I/O) interface 550. In addition, the electronic device 500 may communicate with one or more networks (for example, a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through the network adapter 560. As shown in the figure, the network adapter 560 communicates with the other modules of the electronic device 500 through the bus 530. It should be understood that, although not shown in the figure, other hardware and/or software modules can be used in conjunction with the electronic device 500, including but not limited to microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems.


In an exemplary embodiment of the present application, a computer-readable storage medium is also provided, on which a program product capable of implementing the above method of this specification is stored. In some possible implementations, various aspects of the present application can also be implemented in the form of a program product that includes program code. When the program product runs on a terminal device, the program code causes the terminal device to execute the steps according to the various exemplary embodiments of the present application described in the “Exemplary Method” section of this specification.

Referring to FIG. 9, a program product 600 for implementing the above method according to an embodiment of the present application is described. It may take the form of a portable compact disc read-only memory (CD-ROM) containing program code and may run on a terminal device, such as a personal computer. However, the program product of this application is not limited to this. In this document, a readable storage medium may be any tangible medium that contains or stores a program for use by or in connection with an instruction execution system, apparatus, or device. The readable storage medium may be non-volatile or volatile.

The program product can use any combination of one or more readable media. A readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.

The computer-readable signal medium may include a data signal propagated in baseband or as a part of a carrier wave, and readable program code is carried therein. This propagated data signal can take many forms, including but not limited to electromagnetic signals, optical signals, or any suitable combination of the foregoing. The readable signal medium may also be any readable medium other than a readable storage medium, and the readable medium may send, propagate, or transmit a program for use by or in combination with the instruction execution system, apparatus, or device.

The program code contained on the readable medium can be transmitted by any suitable medium, including but not limited to wireless, wired, optical cable, RF, etc., or any suitable combination of the foregoing.

The program code for performing the operations of the present application can be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code can be executed entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, it can be connected to the user's computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it can be connected to an external computing device (for example, through the Internet using an Internet service provider).

In addition, the above-mentioned drawings are merely schematic illustrations of the processing included in the method according to the exemplary embodiments of the present application, and are not intended for limitation. It is easy to understand that the processing shown in the above drawings does not indicate or limit the time sequence of these processings. In addition, it is easy to understand that these processes can be executed synchronously or asynchronously in multiple modules, for example.

Other embodiments of the present application will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of this application that follow its general principles and include common knowledge or customary technical means in the technical field not disclosed in this application. The description and the embodiments are to be regarded as exemplary only, and the true scope and spirit of the application are indicated by the claims.

Claims

A method for detecting key points of a human face, which includes:

Obtain an image to be annotated containing a human face;

Input the image to be annotated into the pre-trained key point annotation model, so that the key point annotation model outputs the heat map corresponding to the image to be annotated, the predicted key point coordinates, and the occlusion confidence of each point in the image to be annotated;

Determining the target key point coordinates of the image to be annotated according to the heat map, the predicted key point coordinates, and the occlusion confidence;

Mark key points on the image to be annotated according to the target key point coordinates.
The detection method according to claim 1, wherein, according to the heat map, the predicted key point coordinates, and the occlusion confidence, determining the target key point coordinates of the image to be annotated comprises:

Comparing the occlusion confidence of each point in the image to be labeled with a preset occlusion threshold, and determining the points to be processed whose occlusion confidence is less than the occlusion threshold;

According to the point to be processed, the predicted key point coordinate corresponding to the point to be processed is selected from the predicted key point coordinate as the first key point coordinate;

According to the point to be processed, the coordinates of the highlight point other than the position corresponding to the point to be processed are selected from the heat map as the second key point coordinate;

The first key point coordinates and the second key point coordinates are integrated to determine the target key point coordinates of the image to be annotated.
The detection method according to claim 2, wherein the detection method further comprises:

Acquiring an occlusion training sample set, the occlusion training sample set includes a plurality of occlusion sample images, and the face in the occlusion sample image includes partial occlusion;

Inputting the occlusion sample images in the occlusion training sample set into the key point annotation model, so that the key point annotation model outputs the occlusion confidence of each point in the occlusion sample image;

The occlusion threshold is determined according to the occlusion confidence of each point in the multiple occlusion sample images.
The detection method according to claim 3, wherein determining the occlusion threshold according to the occlusion confidence of each point in the plurality of occlusion sample images includes:

From the occlusion confidences of the points in the multiple occlusion sample images, selecting the occlusion confidences of the points at unoccluded positions in the occlusion sample images as candidate confidences;

Sorting the candidate confidences in descending order and selecting the occlusion confidence at a predetermined ratio position as the occlusion threshold.
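The threshold selection in the preceding claim can be sketched as picking a value from the sorted candidate confidences. This sketch assumes that the "predetermined ratio position" means the position `ceil(ratio * n)` in the descending list (1-based), so that roughly `ratio` of the unoccluded points have a confidence at or above the threshold; that reading, the function name `occlusion_threshold`, and its parameters are assumptions.

```python
import math
from typing import Sequence


def occlusion_threshold(visible_confidences: Sequence[float], ratio: float) -> float:
    """Sort the occlusion confidences of unoccluded points in descending order and
    return the value at the predetermined ratio position (assumed ceil(ratio * n))."""
    if not visible_confidences:
        raise ValueError("need at least one candidate confidence")
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    ranked = sorted(visible_confidences, reverse=True)
    index = math.ceil(ratio * len(ranked)) - 1  # convert 1-based position to index
    return ranked[index]
```

For example, with candidate confidences `[0.2, 0.9, 0.5, 0.7]` and `ratio=0.5`, the sorted list is `[0.9, 0.7, 0.5, 0.2]` and the threshold is `0.7`, so half of the unoccluded points would not be treated as points to be processed.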
The detection method according to claim 1, wherein the detection method further comprises:

Acquiring a training sample set, where the training sample set includes a plurality of sample images including human faces, and the sample images include key point information;

The sample image is input into the key point annotation model to be trained, so that the key point annotation model outputs the heat map corresponding to the sample image, the predicted key point coordinates, and the occlusion confidence of each point in the sample image;

Determine the target key point coordinates in the sample image according to the heat map corresponding to the sample image, the predicted key point coordinates, and the occlusion confidence of each point in the sample image;

Adjust the parameters in the key point labeling model to be trained so that the target key point coordinates in the sample image match the key point information.
The detection method according to claim 5, wherein adjusting the parameters in the key point labeling model to be trained so that the target key point coordinates in the sample image match the key point information comprises:

The training sample set is input into the key point labeling models to be trained with different learning rates, so that each key point labeling model outputs its own training data, and the training data includes the heat map corresponding to each sample image, the predicted key point coordinates, and the occlusion confidence of each point in the sample image;

Perform statistics based on multiple sets of the training data, and identify target training data from the multiple sets of training data;

Updating the key point information contained in the sample image according to the target training data to obtain updated key point information of the sample image;

The parameters in the key point labeling model to be trained are adjusted to make the target key point coordinates match the updated key point information.
A detection device for key points of a human face, which includes:

The acquisition module is used to acquire the to-be-labeled image containing the human face;

The processing module is used to input the image to be annotated into the pre-trained key point annotation model, so that the key point annotation model outputs the heat map corresponding to the image to be annotated, the predicted key point coordinates, and the occlusion confidence of each point in the image to be annotated;

A determining module, configured to determine the target key point coordinates of the image to be annotated according to the heat map, the predicted key point coordinates, and the occlusion confidence;

The marking module is used to mark the key points of the image to be marked according to the coordinates of the target key points.
The detection device according to claim 7, wherein the determining module comprises:

A determining unit, configured to compare the occlusion confidence of each point in the to-be-labeled image with a preset occlusion threshold, and determine the point to be processed whose occlusion confidence is less than the occlusion threshold;

The first selection unit is configured to select, from the predicted key point coordinates, the predicted key point coordinates corresponding to the point to be processed as the first key point coordinates according to the point to be processed;

The second selection unit is configured to select, from the heat map, the coordinates of the highlight point except for the position corresponding to the point to be processed as the second key point coordinate according to the point to be processed;

The integration unit is configured to integrate the first key point coordinates and the second key point coordinates to determine the target key point coordinates of the image to be annotated.
A computer-readable storage medium having a computer program stored thereon, wherein the computer program, when executed by a processor, implements the following steps:

Obtain an image to be annotated containing a human face;

Input the image to be annotated into the pre-trained key point annotation model, so that the key point annotation model outputs the heat map corresponding to the image to be annotated, the predicted key point coordinates, and the occlusion confidence of each point in the image to be annotated;

Determining the target key point coordinates of the image to be annotated according to the heat map, the predicted key point coordinates, and the occlusion confidence;

Mark key points on the image to be annotated according to the target key point coordinates.
The computer-readable storage medium according to claim 9, wherein the steps implemented when the computer program is executed by the processor further comprise:

Comparing the occlusion confidence of each point in the image to be labeled with a preset occlusion threshold, and determining the points to be processed whose occlusion confidence is less than the occlusion threshold;

According to the point to be processed, the predicted key point coordinate corresponding to the point to be processed is selected from the predicted key point coordinate as the first key point coordinate;

According to the point to be processed, the coordinates of the highlight point other than the position corresponding to the point to be processed are selected from the heat map as the second key point coordinate;

The first key point coordinates and the second key point coordinates are integrated to determine the target key point coordinates of the image to be annotated.
The computer-readable storage medium according to claim 10, wherein the steps implemented when the computer program is executed by the processor further comprise:

Acquiring an occlusion training sample set, the occlusion training sample set includes a plurality of occlusion sample images, and the face in the occlusion sample image includes partial occlusion;

Inputting the occlusion sample images in the occlusion training sample set into the key point annotation model, so that the key point annotation model outputs the occlusion confidence of each point in the occlusion sample image;

The occlusion threshold is determined according to the occlusion confidence of each point in the multiple occlusion sample images.
The computer-readable storage medium according to claim 11, wherein the steps implemented when the computer program is executed by the processor further comprise:

From the occlusion confidences of the points in the multiple occlusion sample images, selecting the occlusion confidences of the points at unoccluded positions in the occlusion sample images as candidate confidences;

Sorting the candidate confidences in descending order and selecting the occlusion confidence at a predetermined ratio position as the occlusion threshold.
The computer-readable storage medium according to claim 9, wherein the steps implemented when the computer program is executed by the processor further comprise:

Acquiring a training sample set, where the training sample set includes a plurality of sample images including human faces, and the sample images include key point information;

The sample image is input into the key point annotation model to be trained, so that the key point annotation model outputs the heat map corresponding to the sample image, the predicted key point coordinates, and the occlusion confidence of each point in the sample image;

Determine the target key point coordinates in the sample image according to the heat map corresponding to the sample image, the predicted key point coordinates, and the occlusion confidence of each point in the sample image;

Adjust the parameters in the key point labeling model to be trained so that the target key point coordinates in the sample image match the key point information.
The computer-readable storage medium according to claim 13, wherein the steps implemented when the computer program is executed by the processor further comprise:

The training sample set is input into the key point labeling models to be trained with different learning rates, so that each key point labeling model outputs its own training data, and the training data includes the heat map corresponding to each sample image, the predicted key point coordinates, and the occlusion confidence of each point in the sample image;

Perform statistics based on multiple sets of the training data, and identify target training data from the multiple sets of training data;

Updating the key point information contained in the sample image according to the target training data to obtain updated key point information of the sample image;

The parameters in the key point labeling model to be trained are adjusted to make the target key point coordinates match the updated key point information.
An electronic device, including:

Processor; and

A memory on which a computer program is stored;

wherein the processor is configured to implement the following steps by executing the computer program:

Obtain an image to be annotated containing a human face;

Input the image to be annotated into the pre-trained key point annotation model, so that the key point annotation model outputs the heat map corresponding to the image to be annotated, the predicted key point coordinates, and the occlusion confidence of each point in the image to be annotated;

Determining the target key point coordinates of the image to be annotated according to the heat map, the predicted key point coordinates, and the occlusion confidence;

Mark key points on the image to be annotated according to the target key point coordinates.
The electronic device according to claim 15, wherein the steps that the processor is configured to implement by executing the computer program further comprise:

Comparing the occlusion confidence of each point in the to-be-annotated image with a preset occlusion threshold, and determining the points to be processed whose occlusion confidence is less than the occlusion threshold;

According to the point to be processed, the predicted key point coordinate corresponding to the point to be processed is selected from the predicted key point coordinate as the first key point coordinate;

According to the point to be processed, the coordinates of the highlight point other than the position corresponding to the point to be processed are selected from the heat map as the second key point coordinate;

The first key point coordinates and the second key point coordinates are integrated to determine the target key point coordinates of the image to be annotated.
The electronic device according to claim 16, wherein the steps that the processor is configured to implement by executing the computer program further comprise:

Acquiring an occlusion training sample set, wherein the occlusion training sample set includes a plurality of occlusion sample images, and the face in each occlusion sample image is partially occluded;

Inputting the occlusion sample images in the occlusion training sample set into the key point annotation model, so that the key point annotation model outputs the occlusion confidence of each point in each occlusion sample image;

Determining the occlusion threshold according to the occlusion confidence of each point in the plurality of occlusion sample images.
The electronic device according to claim 17, wherein the steps that the processor is configured to implement by executing the computer program further comprise:

Selecting, from the occlusion confidences of the points in the plurality of occlusion sample images, the occlusion confidences of the points corresponding to unoccluded positions in the occlusion sample images as candidate confidences;

Sorting the candidate confidences in descending order, and selecting the occlusion confidence located at a predetermined ratio position as the occlusion threshold.
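A minimal sketch of the threshold selection, assuming each collected sample is a (confidence, is_occluded) pair and that the "predetermined ratio position" is the entry at index ceil(ratio * n) - 1 of the descending list. With that reading, roughly a fraction `ratio` of unoccluded points score at or above the threshold. The function name and input layout are hypothetical.

```python
import math
from typing import Iterable, Tuple


def occlusion_threshold(samples: Iterable[Tuple[float, bool]], ratio: float) -> float:
    """samples: (occlusion confidence, is_occluded) pairs from occlusion sample images."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio must be in (0, 1]")
    # Candidate confidences: only points at unoccluded positions, sorted descending.
    candidates = sorted((conf for conf, occluded in samples if not occluded), reverse=True)
    if not candidates:
        raise ValueError("no unoccluded points to derive a threshold from")
    # Entry sitting at the predetermined ratio position of the descending list.
    index = min(len(candidates) - 1, max(0, math.ceil(ratio * len(candidates)) - 1))
    return candidates[index]


visible = [0.99, 0.97, 0.95, 0.9, 0.85, 0.8, 0.75, 0.7, 0.6, 0.5]
samples = [(c, False) for c in visible] + [(0.1, True), (0.3, True)]
print(occlusion_threshold(samples, ratio=0.5))  # 0.85
```

Occluded points are excluded on purpose: the threshold is calibrated so that most genuinely visible points pass it, and anything scoring lower is treated as occluded.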
The electronic device according to claim 15, wherein the steps that the processor is configured to implement by executing the computer program further comprise:

Acquiring a training sample set, wherein the training sample set includes a plurality of sample images containing human faces, and each sample image contains key point information;

Inputting the sample image into the key point annotation model to be trained, so that the key point annotation model outputs the heat map and the predicted key point coordinates corresponding to the sample image, and the occlusion confidence of each point in the sample image;

Determining the target key point coordinates in the sample image according to the heat map corresponding to the sample image, the predicted key point coordinates, and the occlusion confidence of each point in the sample image;

Adjusting the parameters of the key point annotation model to be trained so that the target key point coordinates in the sample image match the key point information.
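The claim does not name the measure of "match" that drives the parameter adjustment. One plausible choice, assumed here, is the mean squared Euclidean distance between the target key point coordinates and the labeled key points, which a training loop would minimize. The function name `keypoint_mse` is hypothetical.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float]


def keypoint_mse(target: Sequence[Point], labels: Sequence[Point]) -> float:
    """Mean squared Euclidean distance between target key points and labeled key points."""
    if len(target) != len(labels) or not target:
        raise ValueError("need equally sized, non-empty key point lists")
    total = 0.0
    for (tx, ty), (lx, ly) in zip(target, labels):
        total += (tx - lx) ** 2 + (ty - ly) ** 2
    return total / len(target)


print(keypoint_mse([(1.0, 2.0), (3.0, 4.0)], [(1.0, 2.0), (6.0, 8.0)]))  # 12.5
```

A loss of zero means every target coordinate coincides with its label. Real training would typically also supervise the heat map and the occlusion confidence directly, but the claim does not specify those terms.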
The electronic device according to claim 19, wherein the steps that the processor is configured to implement by executing the computer program further comprise:

Inputting the training sample set into a plurality of key point annotation models to be trained that use different learning rates, so that each key point annotation model outputs its own training data, wherein the training data includes the heat map and the predicted key point coordinates corresponding to each sample image, and the occlusion confidence of each point in the sample image;

Performing statistics on the multiple sets of training data, and determining target training data from the multiple sets of training data;

Updating the key point information contained in the sample image according to the target training data to obtain updated key point information of the sample image;

Adjusting the parameters of the key point annotation model to be trained so that the target key point coordinates match the updated key point information.
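The claim leaves the "statistics" unspecified. The sketch below uses a hypothetical consensus rule for a single sample image. If the models trained with different learning rates agree closely on a key point (spread within `agreement_tol`) but their consensus lies farther than `correction_tol` from the existing label, the label is treated as noisy and replaced by the consensus mean. The rule, the function name, and both tolerances are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def refine_labels(
    labels: Sequence[Point],
    model_predictions: Sequence[Sequence[Point]],
    agreement_tol: float,
    correction_tol: float,
) -> List[Point]:
    """Update one sample's key point labels from the predictions of several models
    (one prediction list per model), using a hypothetical consensus rule."""
    if not model_predictions:
        raise ValueError("need predictions from at least one model")
    updated = []
    for k, label in enumerate(labels):
        preds = [model[k] for model in model_predictions]
        mean_x = sum(p[0] for p in preds) / len(preds)
        mean_y = sum(p[1] for p in preds) / len(preds)
        # Spread: distance of the least consistent model from the consensus.
        spread = max(math.hypot(px - mean_x, py - mean_y) for px, py in preds)
        label_error = math.hypot(label[0] - mean_x, label[1] - mean_y)
        if spread <= agreement_tol and label_error > correction_tol:
            updated.append((mean_x, mean_y))  # models agree with each other, not with the label
        else:
            updated.append(label)  # keep the original annotation
    return updated


labels = [(10.0, 10.0), (50.0, 50.0)]
preds = [[(20.0, 20.0), (50.0, 50.0)], [(20.0, 20.0), (60.0, 50.0)]]
print(refine_labels(labels, preds, agreement_tol=2.0, correction_tol=5.0))
# [(20.0, 20.0), (50.0, 50.0)]
```

Requiring agreement across differently trained models is meant to keep a single model's systematic error from overwriting a correct label. Only points where independent runs converge away from the annotation are corrected.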