Detailed Description
The present application will be described in further detail below with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit it. It should also be noted that, for ease of description, only the portions related to the invention are shown in the drawings.
It should be noted that the embodiments in the present application and the features of the embodiments may be combined with each other in the absence of conflict. The present application will be described in detail below with reference to the embodiments and the accompanying drawings.
Fig. 1 illustrates an exemplary system architecture 100 to which the method for generating a face keypoint detection model or the apparatus for generating a face keypoint detection model of the present application may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired links, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may be installed with various communication client applications, such as an image processing application, an information browsing application, a video recording application, a video playing application, a voice interaction application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 101, 102, and 103 may be hardware or software. When the terminal devices 101, 102, 103 are hardware, they may be various electronic devices with display screens, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like. When the terminal devices 101, 102, 103 are software, they may be installed in the electronic devices listed above, and may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services) or as a single piece of software or software module, which is not specifically limited herein.
When the terminal devices 101, 102, 103 are hardware, an image capture device may be mounted thereon. The image capture device may be any device capable of capturing images, such as a camera or a sensor. The user may use the image capture devices on the terminal devices 101, 102, 103 to capture face images.
The server 105 may be a server that provides various services, such as a database server. The database server may store a sample set or obtain a sample set from other devices. The sample set may contain a plurality of samples, where each sample may comprise a face image. In addition, the database server may further store a pre-trained first face keypoint detection model. This model may be obtained by training a complex network; it has a large number of parameters and a large size, and requires substantial computing resources (such as memory and a GPU (Graphics Processing Unit)).
The server 105 may train a second convolutional neural network with a simpler structure by using a machine learning method based on the sample set and the first face keypoint detection model, and send a training result (e.g., the generated lightweight second face keypoint detection model) to the terminal devices 101, 102, and 103. In this way, the terminal devices 101, 102, 103 may apply the second face keypoint detection model for face keypoint detection.
The server 105 may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., to provide distributed services), or as a single piece of software or software module, which is not specifically limited herein.
It should be noted that the method for generating the face keypoint detection model provided by the embodiment of the present application is generally executed by the server 105, and accordingly, the apparatus for generating the face keypoint detection model is generally disposed in the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to FIG. 2, a flow 200 of one embodiment of a method for generating a face keypoint detection model according to the present application is shown. The method for generating the face key point detection model comprises the following steps:
Step 201, a sample set is obtained.
In this embodiment, the execution subject of the method for generating the face keypoint detection model (e.g., the server 105 shown in fig. 1) may acquire the sample set in various ways. For example, the execution subject may obtain an existing sample set from another server used for storing samples (e.g., a database server) through a wired or wireless connection. As another example, a user may collect samples via a terminal device (e.g., the terminal devices 101, 102, 103 shown in fig. 1); the execution subject may then receive the samples collected by the terminal and store them locally, thereby generating the sample set. It should be noted that the wireless connection means may include, but is not limited to, a 3G/4G connection, a WiFi connection, a Bluetooth connection, a WiMAX connection, a ZigBee connection, a UWB (Ultra Wideband) connection, and other wireless connection means now known or developed in the future.
Here, the sample set may include a plurality of samples, where each sample may comprise a face image. The face image may include an image cropped from a larger image after face detection. For example, after face detection is performed on an image from the Internet, position information indicating the area where the face object is located (i.e., the position of the face detection box) is obtained. A screenshot of the area where the face object is located is then taken to obtain the face image. In addition, the face image may also include an image of a human face photographed directly by the user.
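By way of a brief, non-limiting sketch of this cropping step (the Haar-cascade detector and its parameters are illustrative assumptions, not part of the present application), a face area indicated by a detection box may be cut out of a source image as follows:

```python
import cv2

def crop_faces(image_path):
    """Detect faces in an image and return the cropped face regions."""
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
    # Any face detector may be used; a Haar cascade is shown for brevity.
    detector = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    boxes = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    # Each box gives the position of a face object; crop that area.
    return [image[y:y + h, x:x + w] for (x, y, w, h) in boxes]
```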
Step 202, inputting the face images in the sample set to a pre-trained first face key point detection model to obtain a face key point detection result of the input face images.
In this embodiment, the executing entity may extract samples from the sample set obtained in step 201, and input the face images in the extracted samples to a pre-trained first face keypoint detection model to obtain face keypoint detection results for the input face images. Here, the first face keypoint detection model may be used to detect the positions (which may be expressed by coordinates) of the face keypoints in a face image, and the face keypoint detection result may be position information of the face keypoints (e.g., their coordinates). In practice, the face keypoints may be key points in the face (e.g., points with semantic information, or points that affect the facial contour or the shape of the facial features). For example, face keypoints may include, but are not limited to, corners of the eyes, corners of the mouth, points on the contour, and the like.
In this embodiment, the first face keypoint detection model may be obtained by performing supervised training on an existing model using a machine learning method. Here, various existing models capable of extracting image features may be used for training, for example, convolutional neural networks, deep neural networks, and the like. In practice, a Convolutional Neural Network (CNN) is a feed-forward neural network whose artificial neurons can respond to surrounding units within a portion of their receptive field; CNNs perform excellently at image processing and can therefore be used to extract features from sample images. A convolutional neural network may include convolutional layers, pooling layers, fully-connected layers, and the like, where convolutional layers may be used to extract image features and pooling layers may be used to down-sample the incoming information.
In some optional implementations of this embodiment, the first face keypoint detection model may be obtained by training through the following steps: taking the face images in a target sample set as the input of a first convolutional neural network, taking the annotation information of the face keypoints in the input face images as the desired output of the first convolutional neural network, training the first convolutional neural network using a machine learning method, and determining the trained first convolutional neural network as the first face keypoint detection model.
Here, the target sample set used for training the first face keypoint detection model may be the sample set obtained in step 201, or may be another sample set, which is not limited herein. The target sample set may include a large number of samples, and each sample may comprise a face image and face keypoint position annotations for that face image.
Here, the first convolutional neural network used for training the first face keypoint detection model may be built on various existing structures (e.g., DenseBox, VGGNet, ResNet, SegNet, etc.). Moreover, the first convolutional neural network may use a relatively complex network structure, for example, a plurality of convolutional layers (e.g., 6 or 10 layers), a plurality of pooling layers, a plurality of fully-connected layers, and the like, where each convolutional layer may be provided with a plurality of convolution kernels (filters). It should be noted that the convolutional neural network used for training the first face keypoint detection model may further include other layers as needed, which is not limited herein.
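As a minimal sketch of what such a first convolutional neural network and its supervised training might look like (the framework (PyTorch), the 128x128 input size, 68 keypoints, the layer counts, and the hyperparameters are all illustrative assumptions rather than requirements of the application):

```python
import torch
import torch.nn as nn

class FirstKeypointNet(nn.Module):
    """A relatively complex 'first' network; layer counts are illustrative."""
    def __init__(self, num_keypoints=68):
        super().__init__()
        layers, in_ch = [], 3
        for out_ch in (32, 64, 128, 256, 256, 512):   # e.g., 6 convolutional layers
            layers += [nn.Conv2d(in_ch, out_ch, 3, padding=1),  # many kernels per layer
                       nn.ReLU(inplace=True),
                       nn.MaxPool2d(2)]                         # pooling (down-sampling)
            in_ch = out_ch
        self.features = nn.Sequential(*layers)
        self.head = nn.Sequential(                    # fully-connected layers
            nn.Flatten(),
            nn.Linear(512 * 2 * 2, 1024), nn.ReLU(inplace=True),
            nn.Linear(1024, num_keypoints * 2))       # (x, y) per keypoint

    def forward(self, x):                             # x: (N, 3, 128, 128)
        return self.head(self.features(x))

def train_first_model(model, loader, epochs=10):
    """Supervised training: annotated keypoint coordinates are the targets."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    criterion = nn.MSELoss()
    for _ in range(epochs):
        for images, keypoints in loader:              # keypoints: (N, K * 2)
            loss = criterion(model(images), keypoints)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```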
In some optional implementations of this embodiment, the first face keypoint detection model may be obtained through adversarial training. The execution subject may input the face images in the target sample set to a pre-trained initial face keypoint detection model, input the face keypoint detection results generated by the initial face keypoint detection model to a pre-established discrimination model, and perform adversarial training on the discrimination model and the initial face keypoint detection model to obtain the first face keypoint detection model.
Specifically, in the process of performing adversarial training on the discrimination model and the initial face keypoint detection model, the two models may be trained separately in alternating iterations. For example, the parameters of the initial face keypoint detection model may first be fixed while the discrimination model is trained for the first time; then, the parameters of the discrimination model after the first training are fixed while the initial face keypoint detection model is trained for the first time; next, the parameters of the initial face keypoint detection model after the first training are fixed while the discrimination model is trained a second time, and so on. The initial face keypoint detection model obtained after the final training round is taken as the final first face keypoint detection model.
Here, the initial face keypoint detection model may be used to perform preliminary face keypoint detection on a face image. Various existing face keypoint detection models trained using face images as samples may serve as the initial face keypoint detection model. For example, the initial face keypoint detection model may be obtained by performing supervised training on an existing convolutional neural network structure (e.g., DenseBox, VGGNet, ResNet, SegNet, etc.) on face images using a machine learning method.
Here, the discrimination model may be used to determine whether a face keypoint detection result input to it is taken from a face image. In practice, if the discrimination model determines that the face keypoint detection result input to it is taken from a face image, it may output a preset value (e.g., 1); if it determines that the face keypoint detection result input to it is not taken from a face image, it may output another preset value (e.g., 0). It should be noted that the discrimination model may be any of various existing models capable of implementing a classification function (e.g., a Naive Bayes Model (NBM), a Support Vector Machine (SVM), or a neural network including fully-connected layers (FCs)).
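A minimal sketch of one alternating round of this adversarial training, assuming PyTorch, a binary cross-entropy objective, and a discrimination model that maps a detection result to a single logit (all illustrative choices rather than requirements of the application):

```python
import torch
import torch.nn as nn

def adversarial_round(detector, discriminator, images, annotated_keypoints):
    """One alternating round: train the discrimination model with the detector
    fixed, then train the detector with the discrimination model fixed."""
    bce = nn.BCEWithLogitsLoss()
    d_opt = torch.optim.Adam(discriminator.parameters(), lr=1e-4)
    g_opt = torch.optim.Adam(detector.parameters(), lr=1e-4)
    ones = torch.ones(images.size(0), 1)      # "taken from a face image"
    zeros = torch.zeros(images.size(0), 1)    # "not taken from a face image"

    # 1) Fix the detector's parameters and train the discrimination model.
    fake = detector(images).detach()          # detach keeps the detector fixed
    d_loss = bce(discriminator(annotated_keypoints), ones) + \
             bce(discriminator(fake), zeros)
    d_opt.zero_grad(); d_loss.backward(); d_opt.step()

    # 2) Fix the discrimination model and train the detector to fool it.
    g_loss = bce(discriminator(detector(images)), ones)
    g_opt.zero_grad(); g_loss.backward(); g_opt.step()
```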
Step 203, using a machine learning method, taking the face images in the sample set as input and the face keypoint detection results of the input face images as output, and training to obtain a second face keypoint detection model.
In this embodiment, the executing entity may, using a machine learning method, take the face images in the sample set as input and the face keypoint detection results of the input face images as the desired output, and train to obtain the second face keypoint detection model. Here, various existing models (which may be referred to as initial models) may be used for training, for example, convolutional neural networks, deep neural networks, and the like.
Here, the execution subject may extract samples from the sample set one by one for training, updating the initial model after each training pass. The updated initial model is then trained with the next sample until the initial model achieves the expected effect (for example, its predicted results are the same as or similar to the results output by the first face keypoint detection model). At this point, the final initial model may be determined as the second face keypoint detection model.
Alternatively, the execution subject may extract a plurality of samples from the sample set at a time and perform training using the extracted samples, updating the initial model after each training pass. A further plurality of samples is then extracted to train the updated initial model, until the model achieves the expected effect. A minimal sketch of one such training update follows.
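The sketch referred to above, assuming PyTorch and a mean-squared-error loss (both illustrative choices); the first model's detection results serve as the labels:

```python
import torch
import torch.nn.functional as F

def distillation_step(initial_model, first_model, images, optimizer):
    """One training pass: the face images are the input, and the first face
    keypoint detection model's results are the desired output."""
    with torch.no_grad():
        targets = first_model(images)        # face keypoint detection results
    loss = F.mse_loss(initial_model(images), targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()                         # update the initial model
    return loss.item()
```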
In some optional implementations of this embodiment, the complexity of the model structure of the second face keypoint detection model may be less than that of the first face keypoint detection model. Here, complexity may be characterized by the number of layers of the model, the number of parameters, and the like; for example, the fewer the layers or parameters, the lower the complexity.
In some optional implementations of the present embodiment, the samples in the sample set further include annotation information for the face keypoints in the face image. The annotation information may include the coordinates of each face keypoint. After obtaining the face keypoint detection result of each face image output by the first face keypoint detection model, the execution subject may further perform the following operations:
First, for each sample in the sample set, the executing entity may perform a similarity calculation between the face keypoint detection result of the face image in the sample and the annotation information in the sample. Here, various similarity calculation methods may be used, such as the Euclidean distance, cosine similarity, or the Jaccard similarity measure.
Specifically, for each face key point in each sample, a distance between the coordinates of the face key point in the face key point detection result and the coordinates of the face key point in the annotation information may be calculated first. Then, the sum or the average of the distances of the face key points can be determined as the similarity calculation result of the sample.
Second, the execution subject may delete from the sample set the samples whose similarity calculation result is smaller than a preset value, so as to update the sample set. Here, the preset value may be a numerical value set in advance by a technician based on extensive data statistics and experiments.
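The two operations above may be sketched as follows (a hedged illustration: the sample structure and the use of NumPy arrays of shape (K, 2) for K keypoints are assumptions, the mean is chosen over the sum arbitrarily, and the deletion condition follows the text exactly as stated):

```python
import numpy as np

def update_sample_set(samples, detection_results, preset_value):
    """Keep only the samples whose similarity calculation result is not
    smaller than the preset value; the rest are deleted from the set."""
    updated = []
    for sample, detected in zip(samples, detection_results):
        annotated = sample["keypoints"]              # annotation information
        # Distance between detected and annotated coordinates, per keypoint.
        distances = np.linalg.norm(detected - annotated, axis=1)
        result = distances.mean()                    # the sum could also be used
        if result >= preset_value:                   # delete when result < preset value
            updated.append(sample)
    return updated
```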
In some optional implementations of this embodiment, the executing entity may, on the basis of the above implementation, train with the updated sample set using a machine learning method to obtain the second face keypoint detection model.
With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for generating a face keypoint detection model according to the present embodiment. In the application scenario of fig. 3, a terminal device 301 used by a user (e.g., a technician) may have a model training application installed thereon. After the user opens the application and uploads a sample set or a storage path of the sample set, the server 302 providing background support for the application may run the method for generating a face keypoint detection model, including:
First, a sample set may be obtained, where the samples in the sample set may comprise face images 303. Thereafter, the face images 303 in the sample set may be input to the first face keypoint detection model 304 trained in advance to obtain the face keypoint detection results 305 of the input face images. Then, a machine learning method may be used to train the second face keypoint detection model 306, taking the face images in the sample set as input and the face keypoint detection results 305 of the input face images as output.
The method provided by the above embodiment of the present application obtains a sample set, from which samples may be extracted to train the second convolutional neural network. The samples in the sample set comprise face images. Inputting the face images in the sample set to the pre-trained first face keypoint detection model yields the face keypoint detection results of the input face images. Then, using a machine learning method with the face images in the sample set as input and the face keypoint detection results of the input face images as output, the second face keypoint detection model can be obtained through training. A model for detecting face keypoints is thus obtained, and this approach enriches the ways in which such models can be generated.
In addition, the face keypoint detection results output by the trained first face keypoint detection model are used as labels to train the second face keypoint detection model, so that during training the second face keypoint detection model can learn from the trained, well-performing first face keypoint detection model. Therefore, compared with a model obtained only through supervised learning on the sample set, the trained second face keypoint detection model improves the accuracy of face keypoint detection. Moreover, the trained second face keypoint detection model can have a lightweight structure and can be suitable for a mobile terminal.
With further reference to fig. 4, a flow 400 of yet another embodiment of a method for generating a face keypoint detection model is shown. The process 400 of the method for generating a face keypoint detection model comprises the following steps:
Step 401, a sample set is obtained.
In this embodiment, the execution subject of the method for generating a face keypoint detection model (e.g., the server 105 shown in fig. 1) may obtain a sample set. Here, the sample set may include a plurality of samples, and each sample may include a face image and annotation information for the face keypoints in that face image. The annotation information may be used to indicate the positions of the face keypoints in the face image, and may include the coordinates of each face keypoint.
Step 402, inputting the face images in the sample set into a pre-trained first face key point detection model to obtain a face key point detection result of the input face images.
In this embodiment, the executing entity may extract samples from the sample set obtained in step 401, and input the face images in the extracted samples to a first face key point detection model trained in advance, so as to obtain a face key point detection result of the input face images.
In this embodiment, the first face keypoint detection model may be obtained by training through the following steps: taking the face images in a target sample set as the input of a first convolutional neural network, taking the annotation information of the face keypoints in the input face images as the desired output of the first convolutional neural network, training the first convolutional neural network using a machine learning method, and determining the trained first convolutional neural network as the first face keypoint detection model.
Here, the target sample set used for training the first face keypoint detection model may be the sample set obtained in step 401, or may be another sample set, which is not limited herein. The target sample set may include a large number of samples, and each sample may comprise a face image and face keypoint position annotations for that face image.
Here, the first convolutional neural network used for training the first face keypoint detection model may be built on various existing structures (e.g., DenseBox, VGGNet, ResNet, SegNet, etc.). Moreover, the first convolutional neural network may have a relatively complex network structure, for example, multiple convolutional layers (e.g., 6 or 10 layers), multiple pooling layers, multiple fully-connected layers, and the like, where each convolutional layer may be provided with a plurality of convolution kernels. It should be noted that the convolutional neural network used for training the first face keypoint detection model may further include other layers as needed, which is not limited herein.
Step 403, for each sample in the sample set, performing a similarity calculation between the face keypoint detection result of the face image in the sample and the annotation information in the sample.
In this embodiment, for each sample in the sample set, the executing entity may perform a similarity calculation between the face keypoint detection result of the face image in the sample and the annotation information in the sample. Here, various similarity calculation methods may be used, such as the Euclidean distance, cosine similarity, or the Jaccard similarity measure. Specifically, for each face keypoint in each sample, the distance between the coordinates of that keypoint in the face keypoint detection result and its coordinates in the annotation information may first be calculated. Then, the sum or the average of the distances over all face keypoints may be determined as the similarity calculation result for the sample.
Step 404, deleting from the sample set the samples whose similarity calculation result is smaller than a preset value, so as to update the sample set.
In this embodiment, the execution subject may delete from the sample set the samples whose similarity calculation result is smaller than a preset value, so as to update the sample set. Here, the preset value may be a numerical value set in advance by a technician based on extensive data statistics and experiments.
Step 405, samples are extracted from the updated sample set.
In this embodiment, the execution subject may extract samples from the updated sample set. Here, neither the manner of extracting samples nor the number of samples extracted is limited in the present application. For example, at least one sample may be extracted randomly, or samples whose face images have better definition (i.e., higher resolution) may be extracted. Then, the training steps of steps 406 to 409 may be performed as follows.
Step 406, inputting the face image in the extracted sample to a second convolutional neural network, so as to obtain information output by the second convolutional neural network.
In this embodiment, the executing entity may input the face image in the extracted sample to the second convolutional neural network, and obtain information output by the second convolutional neural network (for example, coordinates of key points of the face predicted by the second convolutional neural network).
Here, the second convolutional neural network may be a convolutional neural network using any of various existing structures (e.g., DenseBox, VGGNet, ResNet, SegNet, etc.). Moreover, the network structure of the second convolutional neural network may be relatively simple, for example, including a small number of convolutional layers (e.g., one or two), a small number of pooling layers (e.g., one or two), and a fully-connected layer, as in the sketch below. It should be noted that the convolutional neural network used for training the second face keypoint detection model may further include other layers as needed, which is not limited herein.
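The sketch referred to above, under the same illustrative assumptions as before (PyTorch, 128x128 inputs, 68 keypoints):

```python
import torch.nn as nn

class SecondKeypointNet(nn.Module):
    """A lightweight 'second' network: two convolutional layers, two pooling
    layers, and one fully-connected layer (counts are illustrative)."""
    def __init__(self, num_keypoints=68):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                          # first pooling layer
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2))                          # second pooling layer
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(32 * 32 * 32, num_keypoints * 2))  # fully-connected layer

    def forward(self, x):                             # x: (N, 3, 128, 128)
        return self.head(self.features(x))
```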
It should be noted that, in general, after training with the same data and method, a more complex model structure may deliver better performance (e.g., higher accuracy of face keypoint detection) than a simpler one. At the same time, a model trained from a more complex structure has more parameters and a more involved computation process, so it occupies more computing resources and computes more slowly; it is therefore not suitable for deployment on a mobile terminal. Conversely, a model trained from a simpler structure has fewer parameters and a simple computation process, occupies fewer computing resources, processes faster, and can be deployed on a mobile terminal. However, if such a model is trained directly on the sample set alone, it usually performs poorly (e.g., the accuracy of face keypoint detection is low).
It can be understood that, for the first convolutional neural network used to train the first face keypoint detection model, the number of convolutional layers, pooling layers, fully-connected layers, convolution kernels, parameters, and the like may be set according to actual requirements, and is not limited herein. The chosen configuration may allow the trained first face keypoint detection model to achieve a desired performance (e.g., a recognition accuracy reaching a set value). Meanwhile, the numbers of convolutional layers, pooling layers, fully-connected layers, convolution kernels, and parameters in the second convolutional neural network may be set according to factors such as the computing resources available at the mobile terminal, which is likewise not limited herein.
Step 407, inputting the information output by the second convolutional neural network and the face key point detection result of the input face image into a pre-established loss function to obtain the loss value of the extracted sample.
In this embodiment, the executing entity may input the information output by the second convolutional neural network and the face keypoint detection result of the input face image (i.e., the face keypoint detection result output by the first face keypoint detection model for that face image) to a pre-established loss function to obtain the loss value of the extracted sample. Here, the value of the loss function (i.e., the loss value) may be used to characterize the degree of difference between the information output by the second convolutional neural network (e.g., the coordinates of face keypoints) and the detection result serving as its label. The loss function is a non-negative real-valued function; in general, the smaller the loss value, the better the robustness of the model. The loss function may be chosen according to actual requirements.
Step 408, determining whether the second convolutional neural network is trained based on the comparison of the loss value and the target value.
In this embodiment, the executing entity may determine whether the training of the second convolutional neural network is complete based on a comparison of the loss value with a target value. Here, if a plurality of (at least two) samples were extracted in step 405, the execution subject may compare the loss value of each sample with the target value separately, so that it can be determined whether the loss value of each sample is less than or equal to the target value. As an example, when multiple samples were extracted, the execution subject may determine that training of the second convolutional neural network is complete if the loss value of every sample is less than or equal to the target value. As another example, the execution subject may count the proportion of extracted samples whose loss value is less than or equal to the target value, and determine that training is complete when that proportion reaches a preset sample proportion (e.g., 95%). It should be noted that the target value generally represents an acceptable degree of inconsistency between the predicted value and the true value; that is, when the loss value is less than or equal to the target value, the predicted value may be considered close to the true value. The target value may be set according to actual requirements.
It is noted that, in response to determining that the second convolutional neural network has been trained, the following step 409 may be performed. In response to determining that the second convolutional neural network is not yet trained, the parameters of the second convolutional neural network may be updated based on the determined loss values, samples may be re-extracted from the sample set, and the training step may be continued using the second convolutional neural network with the updated parameters. It should be noted that the manner of extraction is not limited in this application; for example, when the sample set contains a large number of samples, the execution subject may extract samples that have not previously been extracted.
Here, the gradients of the loss value with respect to the model parameters may be found using a back propagation algorithm, and the model parameters may then be updated based on those gradients using a gradient descent algorithm. It should be noted that the back propagation algorithm, the gradient descent algorithm, and the machine learning method are well-known technologies that are widely researched and applied at present, and are not described herein again.
Step 409, in response to determining that the training of the second convolutional neural network is complete, determining the trained second convolutional neural network as the second face keypoint detection model.
In this embodiment, in response to determining that the training of the second convolutional neural network is completed, the executing entity may determine the trained second convolutional neural network as the second face keypoint detection model.
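Putting steps 405 through 409 together, a hedged sketch of the training loop (the extract_batch helper is hypothetical, and the mean-squared-error loss and Adam optimizer are illustrative choices for the loss function and the gradient descent update):

```python
import torch
import torch.nn.functional as F

def train_second_model(second_cnn, first_model, sample_set, target_value,
                       lr=1e-3, max_rounds=10000):
    """Repeat steps 405-408 until the loss value is no greater than the
    target value, then return the trained network (step 409)."""
    optimizer = torch.optim.Adam(second_cnn.parameters(), lr=lr)
    for _ in range(max_rounds):
        images = extract_batch(sample_set)     # step 405 (hypothetical helper)
        with torch.no_grad():
            targets = first_model(images)      # first model's detection results
        loss = F.mse_loss(second_cnn(images), targets)   # step 407: loss value
        if loss.item() <= target_value:        # step 408: comparison
            return second_cnn                  # step 409: training is complete
        optimizer.zero_grad()
        loss.backward()                        # back propagation
        optimizer.step()                       # gradient descent update
    return second_cnn
```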
In this embodiment, since the structure of the second convolutional neural network used for training the second face keypoint detection model is simpler than the structure of the first convolutional neural network used for training the first face keypoint detection model, the complexity of the model structure of the second face keypoint detection model is less than the complexity of the model structure of the first face keypoint detection model.
As can be seen from fig. 4, compared with the embodiment corresponding to fig. 2, the process 400 of the method for generating a face keypoint detection model in this embodiment adds a step of deleting samples from the sample set and a step of training the second convolutional neural network to obtain the second face keypoint detection model. Because the structure of the second convolutional neural network used to train the second face keypoint detection model is simpler than that of the first convolutional neural network used to train the first face keypoint detection model, using the second face keypoint detection model to detect face keypoints can, compared with the structurally complex first face keypoint detection model, improve detection efficiency and reduce the occupancy of computing resources, making it suitable for deployment on a mobile terminal.
With further reference to fig. 5, as an implementation of the method shown in the above figures, the present application provides an embodiment of an apparatus for generating a face keypoint detection model, where the apparatus embodiment corresponds to the method embodiment shown in fig. 2, and the apparatus may be applied to various electronic devices in particular.
As shown in fig. 5, the apparatus 500 for generating a face keypoint detection model according to the present embodiment includes: an obtaining unit 501 configured to obtain a sample set, where samples in the sample set include a face image; an input unit 502 configured to input the face images in the sample set to a first face key point detection model trained in advance, and obtain a face key point detection result of the input face images; the training unit 503 is configured to train the face images in the sample set as input and the face key point detection results of the input face images as output by using a machine learning method to obtain a second face key point detection model.
In some optional implementation manners of this embodiment, the samples in the sample set may further include annotation information of the face key points in the face image. And the apparatus may further include a calculation unit and a deletion unit (not shown in the figure). The calculating unit may be configured to perform similarity calculation on a face key point detection result of a face image in a sample of the sample set and annotation information in the sample, for the sample. The deleting unit may be configured to delete the samples of which the similarity calculation result is smaller than a preset value from the sample set, so as to update the sample set.
In some optional implementations of this embodiment, the training unit 503 may be further configured to: extracting samples from the updated sample set, and executing the following training steps: inputting the face image in the extracted sample into a second convolutional neural network to obtain information output by the second convolutional neural network; inputting information output by the second convolutional neural network and a face key point detection result of the input face image into a pre-established loss function to obtain a loss value of the extracted sample; determining whether the second convolutional neural network is trained based on the comparison between the loss value and the target value; and in response to determining that the training of the second convolutional neural network is completed, determining the trained second convolutional neural network as a second face key point detection model.
In some optional implementations of this embodiment, the training unit 503 may be further configured to: in response to determining that the second convolutional neural network is not yet trained, update the parameters of the second convolutional neural network based on the loss values, re-extract samples from the sample set, and continue to perform the training step using the second convolutional neural network with the updated parameters.
In some optional implementations of this embodiment, the first face keypoint detection model may be obtained by training through the following steps: taking the face images in a target sample set as the input of a first convolutional neural network, taking the annotation information of the face keypoints in the input face images as the desired output of the first convolutional neural network, training the first convolutional neural network using a machine learning method, and determining the trained first convolutional neural network as the first face keypoint detection model.
In some optional implementations of this embodiment, the complexity of the model structure of the second face keypoint detection model is less than the complexity of the model structure of the first face keypoint detection model.
The apparatus provided by the above embodiment of the present application obtains a sample set through the obtaining unit 501, from which samples may be extracted to train the second convolutional neural network. The samples in the sample set comprise face images. The input unit 502 inputs the face images in the sample set to the pre-trained first face keypoint detection model to obtain the face keypoint detection results of the input face images. The training unit 503 then uses a machine learning method, taking the face images in the sample set as input and the face keypoint detection results of the input face images as output, to train the second face keypoint detection model. A model for detecting face keypoints is thus obtained, enriching the ways in which such models can be generated.
Referring to fig. 6, a flowchart 600 of an embodiment of a method for detecting face keypoints provided by the present application is shown. The method for detecting the key points of the human face can comprise the following steps:
Step 601, obtaining a face image to be detected.
In the present embodiment, the execution subject of the method for detecting face keypoints (for example, the terminal devices 101, 102, 103 shown in fig. 1) may acquire a face image to be detected. The image to be detected may be captured by an image capture device such as a camera mounted on the execution subject, or may be obtained by the execution subject from the Internet or from other electronic devices. Here, the source from which the image to be detected is acquired is not limited.
Step 602, inputting the face image to be detected into the second face key point detection model, and generating a face key point detection result of the face image to be detected.
In this embodiment, the execution subject may store a second face keypoint detection model, and may input the image to be detected obtained in step 601 into the second face keypoint detection model to obtain the face keypoint detection result of the face image to be detected (i.e., the position information of the face keypoints in the face image to be detected).
In this embodiment, the second face keypoint detection model may be generated by the method described in the embodiment of fig. 2. For a specific generation process, reference may be made to the related description of the embodiment in fig. 2, which is not described herein again.
It should be noted that the method for detecting face keypoints according to the present embodiment may be used to test the second face keypoint detection models generated by the foregoing embodiments, and the second face keypoint detection model may then be further optimized according to the test results. The method may also be a practical application of the second face keypoint detection model generated by the above embodiments. Using the second face keypoint detection model generated by these embodiments to detect face keypoints helps improve the performance of face keypoint detection. Moreover, since the second face keypoint detection model is a lightweight model, it can be deployed on a mobile terminal.
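A minimal sketch of steps 601 and 602 (PyTorch tensors and a (K, 2) output layout are assumptions):

```python
import torch

def detect_face_keypoints(second_model, face_image):
    """Feed a face image to the second face keypoint detection model and
    return the predicted keypoint coordinates as a (K, 2) tensor."""
    second_model.eval()
    with torch.no_grad():
        coords = second_model(face_image.unsqueeze(0))   # add a batch dimension
    return coords.view(-1, 2)
```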
With continuing reference to fig. 7, as an implementation of the method illustrated in fig. 6 described above, the present application provides an embodiment of an apparatus for detecting key points of a human face. The embodiment of the device corresponds to the embodiment of the method shown in fig. 6, and the device can be applied to various electronic devices.
As shown in fig. 7, the apparatus 700 for detecting key points of a human face according to this embodiment includes: an acquisition unit 701 configured to acquire a face image to be detected; the generating unit 702 is configured to input the face image to be detected into the second face keypoint detection model generated by using the method described in the embodiment of fig. 2, and generate a face keypoint detection result of the face image to be detected.
It will be understood that the elements described in the apparatus 700 correspond to various steps in the method described with reference to fig. 6. Thus, the operations, features and resulting advantages described above with respect to the method are also applicable to the apparatus 700 and the units included therein, and will not be described herein again.
Referring now to FIG. 8, shown is a block diagram of a computer system 800 suitable for use in implementing the electronic device of an embodiment of the present application. The electronic device shown in fig. 8 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 8, the computer system 800 includes a Central Processing Unit (CPU) 801, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 802 or a program loaded from a storage section 808 into a Random Access Memory (RAM) 803. In the RAM 803, various programs and data necessary for the operation of the system 800 are also stored. The CPU 801, the ROM 802, and the RAM 803 are connected to each other via a bus 804. An input/output (I/O) interface 805 is also connected to the bus 804.
The following components are connected to the I/O interface 805: an input portion 806 including a touch screen, a touch pad, and the like; an output section 807 including a display such as a Liquid Crystal Display (LCD) and a speaker; a storage portion 808 including a hard disk and the like; and a communication section 809 including a network interface card such as a LAN card, a modem, or the like. The communication section 809 performs communication processing via a network such as the internet. A drive 810 is also connected to the I/O interface 805 as necessary. A removable medium 811 such as a semiconductor memory or the like is mounted on the drive 810 as necessary, so that a computer program read out therefrom is mounted into the storage portion 808 as necessary.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flowchart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 809 and/or installed from the removable medium 811. When executed by the Central Processing Unit (CPU) 801, the computer program performs the above-described functions defined in the method of the present application.

It should be noted that the computer readable medium described herein may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium, by contrast, may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present application may be implemented by software or hardware. The described units may also be provided in a processor, which may be described as: a processor comprising an acquisition unit, an input unit, and a training unit. The names of these units do not, in some cases, constitute a limitation of the units themselves; for example, the acquisition unit may also be described as "a unit that acquires a sample set".
As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments; or may be present separately and not assembled into the device. The computer readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: obtaining a sample set; inputting the face images in the sample set into a pre-trained first face key point detection model to obtain a face key point detection result of the input face images; and training to obtain a second face key point detection model by using a machine learning method and taking the face images in the sample set as input and the face key point detection results of the input face images as output.
The above description is only a preferred embodiment of the present application and an illustration of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention disclosed herein is not limited to technical solutions formed by the particular combination of features described above, but also encompasses other technical solutions formed by any combination of the above features or their equivalents without departing from the inventive concept. For example, the above features may be replaced with (but are not limited to) features having similar functions disclosed in the present application.