CN109858445B

CN109858445B - Method and apparatus for generating a model

Info

Publication number: CN109858445B
Application number: CN201910099403.9A
Authority: CN
Inventors: 邓启力
Original assignee: Beijing ByteDance Network Technology Co Ltd
Current assignee: Douyin Vision Co Ltd; Douyin Vision Beijing Co Ltd
Priority date: 2019-01-31
Filing date: 2019-01-31
Publication date: 2021-06-25
Anticipated expiration: 2039-01-31
Also published as: CN109858445A

Abstract

Embodiments of the present disclosure disclose methods and apparatus for generating models. One embodiment of the method comprises: acquiring a training sample set; selecting training samples from the training sample set, and executing the following training steps: inputting a sample face image in a training sample into a feature extraction layer of an initial neural network to obtain image features; inputting the image characteristics into a first sub-network of an initial neural network to generate face key point information; inputting the face key point information and the image characteristics into a second sub-network of the initial neural network to obtain the deviation of the face key point information; determining expected deviation of the face key point information based on the face key point information and sample face key point information in the training sample; determining whether the initial neural network is trained based on the deviation and the expected deviation; in response to determining that the training is complete, determining the trained initial neural network as a face keypoint recognition model. The embodiment is beneficial to realizing more accurate face key point detection.

Description

Method and apparatus for generating a model

Technical Field

Embodiments of the present disclosure relate to the field of computer technologies, and in particular, to a method and an apparatus for generating a model.

Background

With the development of face recognition technology, face key point detection technology is also widely applied, for example, for special effect addition, face three-dimensional model construction, and the like.

The face key points refer to points with obvious semantic discrimination in the face. At present, the process of detecting key points of a human face generally includes inputting a human face image to be detected into a pre-trained human face key point detection model to obtain a detection result.

Disclosure of Invention

The embodiment of the disclosure provides a method and a device for generating a model and a method and a device for processing a face image.

In a first aspect, an embodiment of the present disclosure provides a method for generating a model, the method including: acquiring a training sample set, wherein the training sample comprises a sample face image and sample face key point information labeled in advance aiming at the sample face image; selecting training samples from a set of training samples, and performing the following training steps: inputting a sample face image in the selected training sample into a feature extraction layer of an initial neural network to obtain image features; inputting the obtained image features into a first sub-network of an initial neural network to generate face key point information of a sample face image; inputting the generated face key point information and the image characteristics into a second sub-network of the initial neural network to obtain the deviation corresponding to the face key point information; determining expected deviation corresponding to the generated face key point information based on the face key point information and sample face key point information in the training sample; determining whether the training of the initial neural network is finished or not based on the deviation corresponding to the face key point information and the expected deviation; in response to determining that the training is complete, determining the trained initial neural network as a face keypoint recognition model.

In some embodiments, the second subnetwork comprises a first generation layer and a second generation layer; inputting the generated face key point information and the image characteristics into a second sub-network of the initial neural network to obtain the deviation corresponding to the face key point information, wherein the deviation comprises the following steps: inputting the generated face key point information into a first generated layer of a second sub-network, and obtaining a hotspot graph corresponding to the face key point information, wherein an image area of the hotspot graph comprises a numerical value set, and for numerical values in the numerical value set, the numerical values are used for representing the probability that the face key points are located at the positions of the numerical values; and inputting the obtained hotspot graph and the image characteristics into a second generation layer of a second sub-network to obtain the deviation corresponding to the face key point information.

In some embodiments, the method further comprises: in response to determining that the initial neural network is untrained, adjusting relevant parameters in the initial neural network, selecting training samples from the set of training samples that were not selected, and continuing the training step using the most recently adjusted initial neural network and the most recently selected training samples.

In a second aspect, an embodiment of the present disclosure provides a method for processing a face image, the method including: acquiring a target face image; inputting a target face image into a face key point recognition model generated by adopting the method of any embodiment in the method described in the first aspect, and generating face key point information corresponding to the target face image and a deviation corresponding to the face key point information; and generating result face key point information corresponding to the target face image based on the generated face key point information and the deviation.

In a third aspect, an embodiment of the present disclosure provides an apparatus for generating a model, the apparatus including: the system comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is configured to acquire a training sample set, wherein the training sample comprises a sample face image and sample face key point information which is pre-labeled aiming at the sample face image; a training unit configured to select training samples from a set of training samples, and to perform the following training steps: inputting a sample face image in the selected training sample into a feature extraction layer of an initial neural network to obtain image features; inputting the obtained image features into a first sub-network of an initial neural network to generate face key point information of a sample face image; inputting the generated face key point information and the image characteristics into a second sub-network of the initial neural network to obtain the deviation corresponding to the face key point information; determining expected deviation corresponding to the generated face key point information based on the face key point information and sample face key point information in the training sample; determining whether the training of the initial neural network is finished or not based on the deviation corresponding to the face key point information and the expected deviation; in response to determining that the training is complete, determining the trained initial neural network as a face keypoint recognition model.

In some embodiments, the second subnetwork comprises a first generation layer and a second generation layer; and the training unit is further configured to: inputting the generated face key point information into a first generated layer of a second sub-network, and obtaining a hotspot graph corresponding to the face key point information, wherein an image area of the hotspot graph comprises a numerical value set, and for numerical values in the numerical value set, the numerical values are used for representing the probability that the face key points are located at the positions of the numerical values; and inputting the obtained hotspot graph and the image characteristics into a second generation layer of a second sub-network to obtain the deviation corresponding to the face key point information.

In some embodiments, the apparatus further comprises: an adjustment unit configured to adjust relevant parameters in the initial neural network in response to determining that the initial neural network is not trained, select a training sample from the set of training samples that was not selected, and continue to perform the training step using the most recently adjusted initial neural network and the most recently selected training sample.

In a fourth aspect, an embodiment of the present disclosure provides an apparatus for processing a face image, the apparatus including: an image acquisition unit configured to acquire a target face image; a first generating unit, configured to input a target face image into a face keypoint identification model generated by using the method of any one of the embodiments of the method described in the first aspect, and generate face keypoint information corresponding to the target face image and a deviation corresponding to the face keypoint information; and the second generation unit is configured to generate result face key point information corresponding to the target face image based on the generated face key point information and the deviation.

In a fifth aspect, an embodiment of the present disclosure provides an electronic device, including: one or more processors; a storage device having one or more programs stored thereon; when executed by one or more processors, cause the one or more processors to implement a method as in any one of the embodiments of the method described in the first and second aspects.

In a sixth aspect, embodiments of the present disclosure provide a computer-readable medium on which a computer program is stored, which program, when executed by a processor, implements the method of any of the embodiments of the methods described in the first and second aspects above.

According to the method and the device for generating the model, provided by the embodiment of the disclosure, a training sample set is obtained, wherein the training sample comprises a sample face image and sample face key point information labeled in advance aiming at the sample face image, then the training sample is selected from the training sample set, and the following training steps are performed: inputting a sample face image in the selected training sample into a feature extraction layer of an initial neural network to obtain image features; inputting the obtained image features into a first sub-network of an initial neural network to generate face key point information of a sample face image; inputting the generated face key point information and the image characteristics into a second sub-network of the initial neural network to obtain the deviation corresponding to the face key point information; determining expected deviation corresponding to the generated face key point information based on the face key point information and sample face key point information in the training sample; determining whether the training of the initial neural network is finished or not based on the deviation corresponding to the face key point information and the expected deviation; and in response to the fact that the training is determined to be completed, determining the trained initial neural network as a face key point recognition model, so that the generated face key point recognition model can simultaneously predict face key point information of a face image and the deviation of the predicted face key point information, and is beneficial to generating more accurate result face key point information by using the face key point information and the deviation predicted by the model, and realizing more accurate face key point detection.

Drawings

Other features, objects and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:

FIG. 1 is an exemplary system architecture diagram in which one embodiment of the present disclosure may be applied;

FIG. 2 is a flow diagram of one embodiment of a method for generating a model according to the present disclosure;

FIG. 3 is a schematic illustration of one application scenario of a method for generating a model according to an embodiment of the present disclosure;

FIG. 4 is a flow diagram of yet another embodiment of a method for processing images of human faces according to the present disclosure;

FIG. 5 is a schematic diagram of an embodiment of an apparatus for generating models according to the present disclosure;

FIG. 6 is a schematic block diagram of one embodiment of an apparatus for processing face images according to the present disclosure;

FIG. 7 is a schematic block diagram of a computer system suitable for use with an electronic device implementing embodiments of the present disclosure.

Detailed Description

The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.

It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.

Fig. 1 shows an exemplary system architecture 100 to which embodiments of the disclosed method for generating a model or apparatus for generating a model, and method for processing a face image or apparatus for processing a face image may be applied.

As shown in fig. 1, the system architecture 100 may include

terminal devices

101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the

terminal devices

101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.

The user may use the

terminal devices

101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The

terminal devices

101, 102, 103 may have installed thereon various communication client applications, such as image processing-type software, web browser applications, search-type applications, instant messaging tools, social platform software, and the like.

The

terminal apparatuses

101, 102, and 103 may be hardware or software. When the

terminal devices

101, 102, 103 are hardware, they may be various electronic devices, including but not limited to smart phones, tablet computers, electronic book readers, MP3 players (Moving Picture Experts Group Audio Layer III, mpeg Audio Layer 3), MP4 players (Moving Picture Experts Group Audio Layer IV, mpeg Audio Layer 4), laptop portable computers, desktop computers, and the like. When the

terminal apparatuses

101, 102, 103 are software, they can be installed in the electronic apparatuses listed above. It may be implemented as multiple pieces of software or software modules (e.g., multiple pieces of software or software modules to provide distributed services) or as a single piece of software or software module. And is not particularly limited herein.

The server 105 may be a server providing various services, and may be a model training server performing model training using a training sample set uploaded by the

terminal devices

101, 102, 103, for example. The model training server can perform model training by using the obtained training sample set to generate a face key point identification model. In addition, after the face key point recognition model is obtained through training, the server may send the face key point recognition model to the

terminal devices

101, 102, and 103, or may perform face key point recognition on the face image by using the face key point recognition model.

It should be noted that the method for generating the model provided by the embodiment of the present disclosure may be executed by the server 105, or may be executed by the

terminal devices

101, 102, and 103, and accordingly, the apparatus for generating the model may be disposed in the server 105, or may be disposed in the

terminal devices

101, 102, and 103. In addition, the method for processing the face image provided by the embodiment of the present disclosure may be executed by the server 105, and may also be executed by the

terminal devices

101, 102, and 103, and accordingly, the apparatus for processing the face image may be disposed in the server 105, and may also be disposed in the

terminal devices

101, 102, and 103.

The server may be hardware or software. When the server is hardware, it may be implemented as a distributed server cluster formed by multiple servers, or may be implemented as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (e.g., multiple pieces of software or software modules used to provide distributed services), or as a single piece of software or software module. And is not particularly limited herein.

It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. In the case that the training sample set required for training the model is not required to be acquired from a remote place, or the target face image to be processed is not required to be acquired from a remote place, the system architecture may not include a network, and only a server or a terminal device is required.

With continued reference to FIG. 2, a flow 200 of one embodiment of a method for generating a model according to the present disclosure is shown. The method for generating the model comprises the following steps:

step 201, a training sample set is obtained.

In this embodiment, an executing subject (e.g., a server shown in fig. 1) of the method for generating a model may obtain the training sample set through a wired connection manner or a wireless connection manner. The training sample comprises a sample face image and sample face key point information which is labeled in advance aiming at the sample face image. The sample face image may be an image obtained by photographing a sample face. The sample face key point information is used to characterize the position of the sample face key point in the sample face image, and may include, but is not limited to, at least one of the following: numbers, words, symbols, images. For example, the sample face keypoint information may be keypoint coordinates used to characterize the position of the sample face keypoints in the sample face image. Here, the key point coordinates may be coordinates in a coordinate system established in advance based on the sample face image.

In practice, the face key points may be key points in the face, and specifically, may be points that affect the face contour or the shape of five sense organs. As an example, the face key points may be points corresponding to the tip of the nose, points corresponding to the eyes, and the like.

Specifically, the executing entity may obtain a training sample set stored locally in advance, or may obtain a training sample set transmitted by an electronic device (e.g., the terminal device shown in fig. 1) connected in communication.

Step 202, selecting training samples from the training sample set, and executing the following training steps: inputting a sample face image in the selected training sample into a feature extraction layer of an initial neural network to obtain image features; inputting the obtained image features into a first sub-network of an initial neural network to generate face key point information of a sample face image; inputting the generated face key point information and the image characteristics into a second sub-network of the initial neural network to obtain the deviation corresponding to the face key point information; determining expected deviation corresponding to the generated face key point information based on the face key point information and sample face key point information in the training sample; determining whether the training of the initial neural network is finished or not based on the deviation corresponding to the face key point information and the expected deviation; in response to determining that the training is complete, determining the trained initial neural network as a face keypoint recognition model.

In this embodiment, based on the training sample set obtained in step 201, the executing entity may select a training sample from the training sample set, and execute the following training steps (steps 2021 to 2026):

step 2021, inputting the face image of the selected sample in the training sample into the feature extraction layer of the initial neural network, and obtaining the image features.

The image features may be the color, shape, and other features of the image. The initial neural network is a variety of predetermined neural networks (e.g., convolutional neural networks) used to generate the face keypoint recognition model. The face key point identification model can be used for identifying face key points corresponding to the face image. Here, the initial neural network may be an untrained neural network, or may be a trained but untrained neural network. Specifically, the initial neural network includes a feature extraction layer. The feature extraction layer is used for extracting image features of the input face image. Specifically, the feature extraction layer includes a structure (e.g., a convolutional layer) capable of extracting features of the image, and may also include other structures (e.g., a pooling layer), which is not limited herein.

Step 2022, inputting the obtained image features into the first sub-network of the initial neural network, and generating face key point information of the sample face image.

In this embodiment, the initial neural network further comprises a first sub-network. The first sub-network is connected with the feature extraction layer and used for generating the key point information of the human face based on the image features output by the feature extraction layer. The generated face key point information is the face key point information predicted by the first sub-network and corresponding to the sample face image.

It can be understood that in practice, errors often exist between the measured values and the real values, so that the face key point information predicted by using the initial neural network is usually different from the actual face key point information.

It should be noted that, here, the first sub-network may include a structure (e.g., a classifier, a full connection layer) for generating a result (face key point information), and may further include a structure (e.g., an output layer) for outputting the result (face key point information).

Step 2023, inputting the generated face key point information and image features into a second sub-network of the initial neural network, and obtaining a deviation corresponding to the face key point information.

In this embodiment, the initial neural network further includes a second sub-network, and the second sub-network is connected to the first sub-network and the feature extraction layer, respectively, and is configured to determine, based on the image features output by the feature extraction layer, a deviation corresponding to the face keypoint information output by the first sub-network. The deviation corresponding to the face key point information is used for representing the difference of the generated face key point information relative to the actual face key point information of the sample face image. Here, the variance generated by the second sub-network is a variance predicted based on the image feature.

It should be noted that, here, the second sub-network may include a structure (e.g., a classifier, a full connection layer) for generating a result (a deviation corresponding to the face keypoint information), and may further include a structure (e.g., an output layer) for outputting the result (a deviation corresponding to the face keypoint information).

In some alternative implementations of this embodiment, the second subnetwork includes a first generation layer and a second generation layer; the executing body can obtain the deviation corresponding to the face key point information through the following steps:

firstly, inputting the generated face key point information into a first generation layer of a second sub-network, and obtaining a hot spot map corresponding to the face key point information.

The first generation layer is connected with the first sub-network and used for generating a hot spot graph corresponding to the face key point information based on the face key point information output by the first sub-network. Here, the image area of the hotspot graph includes a set of numerical values. For the values in the value set, the values are used for representing the probability that the face key point is located at the position of the value. The shape and size of the hot spot image and the sample face image are the same. Furthermore, the positions of the numerical values in the hotspot graph may correspond to the positions in the sample face image, and therefore, the hotspot graph may be used to indicate the positions of the face key points in the sample face image.

It should be noted that the hotspot graph may include at least one value set, where each value set in the at least one value set may correspond to one piece of face keypoint information.

Specifically, the value at the position in the hot spot map corresponding to the position in the sample face image characterized by the face key point information may be 1. According to the distance between each position in the heat point diagram and the position corresponding to the numerical value 1, the numerical value corresponding to each position can be gradually reduced. I.e. the farther from the position corresponding to the numerical value 1, the smaller the corresponding numerical value.

It should be noted that the position of the value in the hotspot graph can be determined by the smallest rectangle used to surround the value. Specifically, the center position of the minimum rectangle may be determined as the position where the numerical value is located, or the end point position of the minimum rectangle may be determined as the position where the numerical value is located.

Then, the obtained hotspot graph and the image features are input into a second generation layer of a second sub-network, and the deviation corresponding to the face key point information is obtained.

The second generation layer is connected with the first generation layer and the feature extraction layer respectively and used for determining the deviation corresponding to the face key point information input into the first generation layer based on the image features output by the feature extraction layer and the hot-spot graph output by the first generation layer.

In the implementation mode, the hot spot map can indicate the position range of the key points of the face, and compared with the key point information of the face, the position characteristics of the key points of the face can be more accurately and simply regressed by using the hot spot map, so that the deviation corresponding to the key point information of the face can be more quickly and accurately generated.

Step 2024, determining an expected deviation corresponding to the generated face keypoint information based on the face keypoint information and the sample face keypoint information in the training sample.

Here, the execution principal may determine a difference between the face keypoint information generated by the first sub-network and the pre-labeled sample face keypoint information, and then determine the determined difference as the expected deviation.

As an example, the face keypoint information generated by the first subnetwork is the coordinates "(10, 19)". The sample face key point information is coordinates "(11, 18)". The expected deviation may be (-1,1), where-1-10-11, to characterize the difference in abscissa; 1-19-18, for characterizing the difference in ordinate.

Step 2025, determining whether the initial neural network is trained based on the deviation corresponding to the face key point information and the expected deviation.

Here, the execution subject may calculate a difference between a deviation corresponding to the obtained face key point information and an expected deviation by using a preset loss function, further determine whether the calculated difference is less than or equal to a preset difference threshold, and determine that the initial neural network training is completed in response to determining that the calculated difference is less than or equal to the preset difference threshold. The preset loss function may be various loss functions, such as a norm of L2 or a euclidean distance.

Step 2026, in response to determining that the training is completed, determining the trained initial neural network as the face keypoint recognition model.

In this embodiment, the executing entity may determine the trained initial neural network as the face keypoint recognition model in response to determining that the training of the initial neural network is completed.

In some optional implementations of this embodiment, the executing entity may further adjust relevant parameters in the initial neural network in response to determining that the initial neural network is not trained, select a training sample from the unselected training samples in the training sample set, and continue to execute the training steps (steps 2021-2026) using the most recently adjusted initial neural network and the most recently selected training sample.

In particular, the execution subject may adjust the relevant parameters of the initial neural network based on the calculated difference in response to determining that the initial neural network is not trained. Here, the relevant parameters of the initial neural network may be adjusted based on the calculated difference in various ways. For example, a BP (Back Propagation) algorithm or an SGD (Stochastic Gradient Descent) algorithm may be used to adjust the relevant parameters of the initial neural network.

With continued reference to fig. 3, fig. 3 is a schematic diagram of an application scenario of the method for generating a model according to the present embodiment. In the application scenario of fig. 3, the server 301 may first obtain a training sample set 302, where the training sample includes a sample face image and sample face key point information pre-labeled for the sample face image. The sample face key point information is used for representing the positions of sample face key points in the sample face image. For example, the sample face keypoint information may be coordinates of sample face keypoints.

The server 301 may then select training samples 3021 from the training sample set 302 and perform the following training steps: inputting a sample face image 30211 in the selected training sample 3021 into a feature extraction layer 3031 of the initial neural network 303, to obtain an image feature 304; inputting the obtained image features 304 into a first sub-network 3032 of the initial neural network 303 to generate face keypoint information 305 of the sample face image 30211; inputting the generated face key point information 305 and the image features 304 into a second sub-network 3033 of the initial neural network 303 to obtain a deviation 306 corresponding to the face key point information 305; determining an expected deviation 307 corresponding to the generated face key point information 305 based on the face key point information 305 and the sample face key point information 30212 in the training sample 3021; determining whether the initial neural network 303 is trained completely based on the deviation 306 corresponding to the face key point information 305 and the expected deviation 307; in response to determining that the training is complete, the trained initial neural network 303 is determined to be the face keypoint recognition model 308.

The face key point identification model generated by the method provided by the embodiment of the disclosure can simultaneously predict the face key point information of the face image and the deviation of the predicted face key point information, and is beneficial to generating more accurate result face key point information by using the face key point information and the deviation predicted by the model, so as to realize more accurate face key point detection.

With further reference to fig. 4, a flow 400 of yet another embodiment of a method for processing a face image is shown. The process 400 of the method for processing a face image comprises the following steps:

step 401, a target face image is obtained.

In this embodiment, an execution subject (for example, a server shown in fig. 1) of the method for processing a face image may acquire a target face image by a wired connection manner or a wireless connection manner. The target face image may be a face image to be subjected to face key point recognition. The key points of the human face are key points in the human face, and specifically, the key points can be points which influence the facial contour or the shape of five sense organs.

In this embodiment, the executing subject may adopt various methods to acquire the target face image. Specifically, the executing entity may obtain a target face image stored locally in advance, or the executing entity may obtain a target face image sent by an electronic device (for example, the terminal device shown in fig. 1) connected in communication.

Step 402, inputting the target face image into a pre-trained face key point recognition model, and generating face key point information corresponding to the target face image and a deviation corresponding to the face key point information.

In this embodiment, based on the target face image obtained in step 401, the execution subject may input the target face image into a pre-trained face key point recognition model, and generate face key point information corresponding to the target face image and a deviation corresponding to the face key point information. The face key point information is used for characterizing the position of the face key point in the target face image, and may include but is not limited to at least one of the following: characters, numbers, symbols, images. The deviation corresponding to the face key point information can be used for representing a prediction error when the face key point information corresponding to the target face image is predicted by using the face key point recognition model.

In this embodiment, the face keypoint identification model is generated according to the method described in the embodiment corresponding to fig. 2, and is not described herein again.

Step 403, generating result face key point information corresponding to the target face image based on the generated face key point information and the deviation.

In this embodiment, based on the face key point information and the deviation generated in step 402, the execution subject may generate the result face key point information corresponding to the target face image. And the result human face key point information is the human face key point information after error compensation is carried out on the human face key point information output by the human face key point identification model.

In practice, prediction errors usually exist when the model is used for information prediction, and error compensation is performed on a prediction result, so that the compensated result is closer to a real result, and the accuracy of information prediction is improved.

As an example, the face key point information output by the face key point recognition model may be coordinates (10,19) of the face key points. The deviation of the face key point recognition model output can be (0.2, -0.5), wherein "0.2" can be used for representing the error of the abscissa in the coordinates of the face key point output by the face key point recognition model; "-0.5" may be used to characterize the error of the ordinate in the coordinates of the face keypoints output by the face keypoint recognition model. Furthermore, the execution subject may obtain a difference between an abscissa "10" of the coordinates of the key points of the face and an error "0.2" of the abscissa, and obtain an error-compensated abscissa "9.8"; the ordinate "19" and the error "-0.5" of the ordinate in the coordinates of the face key point are subtracted to obtain the error-compensated ordinate "19.5", and finally, the execution subject may compose the result face key point information "(9.8, 19.5)" (i.e., the error-compensated coordinates of the face key point) using the error-compensated abscissa "9.8" and the ordinate "19.5".

According to the method provided by the embodiment of the disclosure, the target face image is obtained, then the target face image is input into the pre-trained face key point recognition model, the face key point information corresponding to the target face image and the deviation corresponding to the face key point information are generated, and finally the result face key point information corresponding to the target face image is generated based on the generated face key point information and the deviation, so that more accurate result face key point information can be generated based on the face key point information and the deviation output by the face key point recognition model, and the accuracy of face key point recognition is improved.

With further reference to fig. 5, as an implementation of the method shown in fig. 2, the present disclosure provides an embodiment of an apparatus for generating a model, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 2, and the apparatus may be applied to various electronic devices.

As shown in fig. 5, the apparatus 500 for generating a model of the present embodiment includes: an acquisition unit 501 and a training unit 502. The obtaining unit 501 is configured to obtain a training sample set, where a training sample includes a sample face image and sample face key point information pre-labeled for the sample face image; the training unit 502 is configured to select training samples from a set of training samples and to perform the following training steps: inputting a sample face image in the selected training sample into a feature extraction layer of an initial neural network to obtain image features; inputting the obtained image features into a first sub-network of an initial neural network to generate face key point information of a sample face image; inputting the generated face key point information and the image characteristics into a second sub-network of the initial neural network to obtain the deviation corresponding to the face key point information; determining expected deviation corresponding to the generated face key point information based on the face key point information and sample face key point information in the training sample; determining whether the training of the initial neural network is finished or not based on the deviation corresponding to the face key point information and the expected deviation; in response to determining that the training is complete, determining the trained initial neural network as a face keypoint recognition model.

In this embodiment, the obtaining unit 501 of the apparatus 500 for generating a model may obtain the training sample set through a wired connection manner or a wireless connection manner. The training sample comprises a sample face image and sample face key point information which is labeled in advance aiming at the sample face image. The sample face image may be an image obtained by photographing a sample face. The sample face key point information is used to characterize the position of the sample face key point in the sample face image, and may include, but is not limited to, at least one of the following: numbers, words, symbols, images.

In practice, the face key points may be key points in the face, and specifically, may be points that affect the face contour or the shape of five sense organs.

In this embodiment, based on the training sample set obtained by the obtaining unit 501, the training unit 502 may select training samples from the training sample set, and perform the following training steps (steps 5021-5026):

step 5021, inputting the sample face images in the selected training samples into a feature extraction layer of an initial neural network to obtain image features.

The image features may be the color, shape, and other features of the image. The initial neural network is a variety of predetermined neural networks (e.g., convolutional neural networks) used to generate the face keypoint recognition model. The face key point identification model can be used for identifying face key points corresponding to the face image. Here, the initial neural network may be an untrained neural network, or may be a trained but untrained neural network. Specifically, the initial neural network includes a feature extraction layer. The feature extraction layer is used for extracting image features of the input face image.

Step 5022, inputting the obtained image features into a first sub-network of the initial neural network, and generating face key point information of the sample face image.

Step 5023, the generated face key point information and the image characteristics are input into a second sub-network of the initial neural network, and the deviation corresponding to the face key point information is obtained.

Step 5024, based on the face key point information and the sample face key point information in the training sample, the expected deviation corresponding to the generated face key point information is determined.

Here, the training unit 502 may determine a difference between the face keypoint information generated by the first sub-network and the pre-labeled sample face keypoint information, and then determine the determined difference as the expected deviation.

Step 5025, whether the initial neural network is trained or not is determined based on the deviation corresponding to the face key point information and the expected deviation.

Here, the training unit 502 may calculate a difference between a deviation corresponding to the obtained face key point information and an expected deviation by using a preset loss function, further determine whether the calculated difference is less than or equal to a preset difference threshold, and determine that the initial neural network training is completed in response to determining that the calculated difference is less than or equal to the preset difference threshold.

Step 5026, in response to the fact that the training is completed, the trained initial neural network is determined as the face key point recognition model.

In this embodiment, the training unit 502 may determine the trained initial neural network as the face keypoint recognition model in response to determining that the training of the initial neural network is completed.

In some alternative implementations of this embodiment, the second subnetwork includes a first generation layer and a second generation layer; and training unit 502 may be further configured to: inputting the generated face key point information into a first generated layer of a second sub-network, and obtaining a hotspot graph corresponding to the face key point information, wherein an image area of the hotspot graph comprises a numerical value set, and for numerical values in the numerical value set, the numerical values are used for representing the probability that the face key points are located at the positions of the numerical values; and inputting the obtained hotspot graph and the image characteristics into a second generation layer of a second sub-network to obtain the deviation corresponding to the face key point information.

In some optional implementations of this embodiment, the apparatus 500 may further include: an adjusting unit (not shown in the figures) configured to adjust relevant parameters in the initial neural network in response to determining that the initial neural network is not trained, to select training samples from the set of training samples that were not selected, and to continue to perform the training step using the most recently adjusted initial neural network and the most recently selected training samples.

It will be understood that the elements described in the apparatus 500 correspond to various steps in the method described with reference to fig. 2. Thus, the operations, features and resulting advantages described above with respect to the method are also applicable to the apparatus 500 and the units included therein, and are not described herein again.

The face key point recognition model generated by the device 500 provided by the above embodiment of the present disclosure can predict the face key point information of the face image and the deviation of the predicted face key point information at the same time, which is helpful for generating more accurate result face key point information by using the face key point information and the deviation predicted by the model, and realizing more accurate face key point detection.

With further reference to fig. 6, as an implementation of the method shown in fig. 4, the present disclosure provides an embodiment of an apparatus for processing a face image, where the embodiment of the apparatus corresponds to the embodiment of the method shown in fig. 4, and the apparatus may be applied to various electronic devices.

As shown in fig. 6, the apparatus 600 for processing a face image of the present embodiment includes: an image acquisition unit 601, a first generation unit 602, and a second generation unit 603. Wherein the image acquisition unit 601 is configured to acquire a target face image; the first generating unit 602 is configured to input a target face image into a face key point recognition model generated by the method described in the embodiment corresponding to fig. 2, and generate face key point information corresponding to the target face image and a deviation corresponding to the face key point information; the second generating unit 603 is configured to generate the result face keypoint information corresponding to the target face image based on the generated face keypoint information and the deviation.

In this embodiment, the image acquiring unit 601 of the apparatus 600 for processing a face image may acquire the target face image by a wired connection or a wireless connection. The target face image may be a face image to be subjected to face key point recognition. The key points of the human face are key points in the human face, and specifically, the key points can be points which influence the facial contour or the shape of five sense organs.

In this embodiment, based on the target face image obtained by the image obtaining unit 601, the first generating unit 602 may input the target face image into a pre-trained face keypoint recognition model, and generate the face keypoint information corresponding to the target face image and the deviation corresponding to the face keypoint information. The face key point information is used for characterizing the position of the face key point in the target face image, and may include but is not limited to at least one of the following: characters, numbers, symbols, images. The deviation corresponding to the face key point information can be used for representing a prediction error when the face key point information corresponding to the target face image is predicted by using the face key point recognition model.

In this embodiment, the second generating unit 603 may generate the result face keypoint information corresponding to the target face image based on the face keypoint information and the deviation generated by the first generating unit 602. And the result human face key point information is the human face key point information after error compensation is carried out on the human face key point information output by the human face key point identification model.

It will be understood that the elements described in the apparatus 600 correspond to various steps in the method described with reference to fig. 4. Thus, the operations, features and resulting advantages described above with respect to the method are also applicable to the apparatus 600 and the units included therein, and are not described herein again.

The apparatus 600 provided in the above embodiment of the present disclosure generates the deviation corresponding to the face key point information and the face key point information corresponding to the target face image by acquiring the target face image, inputting the target face image into the pre-trained face key point recognition model, and finally generates the result face key point information corresponding to the target face image based on the generated face key point information and deviation, so that more accurate result face key point information can be generated based on the face key point information and deviation output by the face key point recognition model, and the accuracy of face key point recognition is improved.

Referring now to fig. 7, a schematic diagram of an electronic device (e.g., the server or terminal device of fig. 1) 700 suitable for use in implementing embodiments of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a fixed terminal such as a digital TV, a desktop computer, and the like. The terminal device or the server shown in fig. 7 is only an example, and should not bring any limitation to the functions and the use range of the embodiments of the present disclosure.

As shown in fig. 7, electronic device 700 may include a processing means (e.g., central processing unit, graphics processor, etc.) 701 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM)702 or a program loaded from storage 708 into a Random Access Memory (RAM) 703. In the RAM703, various programs and data necessary for the operation of the electronic apparatus 700 are also stored. The processing device 701, the ROM 702, and the RAM703 are connected to each other by a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.

Generally, the following devices may be connected to the I/O interface 705: input devices 706 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 707 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 708 including, for example, magnetic tape, hard disk, etc.; and a communication device 709. The communication means 709 may allow the electronic device 700 to communicate wirelessly or by wire with other devices to exchange data. While fig. 7 illustrates an electronic device 700 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 7 may represent one device or may represent multiple devices as desired.

In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such embodiments, the computer program may be downloaded and installed from a network via the communication means 709, or may be installed from the storage means 708, or may be installed from the ROM 702. The computer program, when executed by the processing device 701, performs the above-described functions defined in the methods of embodiments of the present disclosure. It should be noted that the computer readable medium described in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In embodiments of the present disclosure, however, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.

The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring a training sample set, wherein the training sample comprises a sample face image and sample face key point information labeled in advance aiming at the sample face image; selecting training samples from a set of training samples, and performing the following training steps: inputting a sample face image in the selected training sample into a feature extraction layer of an initial neural network to obtain image features; inputting the obtained image features into a first sub-network of an initial neural network to generate face key point information of a sample face image; inputting the generated face key point information and the image characteristics into a second sub-network of the initial neural network to obtain the deviation corresponding to the face key point information; determining expected deviation corresponding to the generated face key point information based on the face key point information and sample face key point information in the training sample; determining whether the training of the initial neural network is finished or not based on the deviation corresponding to the face key point information and the expected deviation; in response to determining that the training is complete, determining the trained initial neural network as a face keypoint recognition model.

Further, the one or more programs, when executed by the electronic device, may further cause the electronic device to: acquiring a target face image; inputting a target face image into a pre-trained face key point recognition model to generate face key point information corresponding to the target face image and a deviation corresponding to the face key point information; and generating result face key point information corresponding to the target face image based on the generated face key point information and the deviation.

Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C + +, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The units described in the embodiments of the present disclosure may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes an acquisition unit and a training unit. Where the names of these units do not in some cases constitute a limitation of the unit itself, for example, the acquisition unit may also be described as a "unit that acquires a training sample set".

The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the embodiments of the present disclosure is not limited to the specific combination of the above-mentioned features, but also encompasses other embodiments in which any combination of the above-mentioned features or their equivalents is made without departing from the inventive concept as defined above. For example, the above features and (but not limited to) technical features with similar functions disclosed in the embodiments of the present disclosure are mutually replaced to form the technical solution.

Claims

1. A method for generating a model, comprising:

acquiring a training sample set, wherein the training sample comprises a sample face image and sample face key point information labeled in advance aiming at the sample face image;

selecting training samples from the set of training samples, and performing the following training steps: inputting a sample face image in the selected training sample into a feature extraction layer of an initial neural network to obtain image features; inputting the obtained image features into a first sub-network of an initial neural network to generate face key point information of a sample face image; inputting the generated face key point information and the image characteristics into a second sub-network of the initial neural network to obtain the deviation corresponding to the face key point information; determining expected deviation corresponding to the generated face key point information based on the face key point information and sample face key point information in the training sample; determining whether the training of the initial neural network is finished or not based on the deviation corresponding to the face key point information and the expected deviation; in response to determining that the training is complete, determining the trained initial neural network as a face keypoint recognition model.

2. The method of claim 1, wherein the second subnetwork comprises a first generation layer and a second generation layer; and

the step of inputting the generated face key point information and the image features into a second sub-network of the initial neural network to obtain the deviation corresponding to the face key point information includes:

inputting the generated face key point information into a first generated layer of a second sub-network, and obtaining a hotspot graph corresponding to the face key point information, wherein an image area of the hotspot graph comprises a numerical value set, and for numerical values in the numerical value set, the numerical values are used for representing the probability that the face key points are located at the positions of the numerical values;

and inputting the obtained hotspot graph and the image characteristics into a second generation layer of a second sub-network to obtain the deviation corresponding to the face key point information.

3. The method of claim 1, wherein the method further comprises:

in response to determining that the initial neural network is not trained, adjusting relevant parameters in the initial neural network, selecting training samples from the set of training samples that were not selected, and continuing the training step using the most recently adjusted initial neural network and the most recently selected training samples.

4. A method for processing a face image, comprising:

acquiring a target face image;

inputting the target face image into a face key point recognition model generated by adopting the method as claimed in one of claims 1 to 3, and generating face key point information corresponding to the target face image and a deviation corresponding to the face key point information;

and generating result face key point information corresponding to the target face image based on the generated face key point information and the deviation.

5. An apparatus for generating a model, comprising:

the system comprises an acquisition unit, a processing unit and a processing unit, wherein the acquisition unit is configured to acquire a training sample set, wherein the training sample comprises a sample face image and sample face key point information which is pre-labeled aiming at the sample face image;

a training unit configured to select training samples from the set of training samples and to perform the following training steps: inputting a sample face image in the selected training sample into a feature extraction layer of an initial neural network to obtain image features; inputting the obtained image features into a first sub-network of an initial neural network to generate face key point information of a sample face image; inputting the generated face key point information and the image characteristics into a second sub-network of the initial neural network to obtain the deviation corresponding to the face key point information; determining expected deviation corresponding to the generated face key point information based on the face key point information and sample face key point information in the training sample; determining whether the training of the initial neural network is finished or not based on the deviation corresponding to the face key point information and the expected deviation; in response to determining that the training is complete, determining the trained initial neural network as a face keypoint recognition model.

6. The apparatus of claim 5, wherein the second subnetwork comprises a first generation layer and a second generation layer; and

the training unit is further configured to:

7. The apparatus of claim 5, wherein the apparatus further comprises:

an adjustment unit configured to adjust relevant parameters in the initial neural network in response to determining that the initial neural network is not trained, select training samples from the set of training samples that were not selected, and continue performing the training step using the most recently adjusted initial neural network and the most recently selected training samples.

8. An apparatus for processing a face image, comprising:

an image acquisition unit configured to acquire a target face image;

a first generating unit, configured to input the target face image into the face key point recognition model generated by the method according to any one of claims 1 to 3, and generate face key point information corresponding to the target face image and a deviation corresponding to the face key point information;

and the second generation unit is configured to generate result face key point information corresponding to the target face image based on the generated face key point information and the deviation.

9. An electronic device, comprising:

one or more processors;

a storage device having one or more programs stored thereon,

when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-4.

10. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-4.