CN107239758B - Method and device for positioning key points of human face - Google Patents

Method and device for positioning key points of human face

Info

Publication number: CN107239758B
Application number: CN201710373996.4A
Authority: CN (China)
Prior art keywords: dimensional, face, image, alpha, determining
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Other versions: CN107239758A (en)
Inventor: 杨松
Current assignee: Beijing Xiaomi Mobile Software Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original assignee: Beijing Xiaomi Mobile Software Co Ltd
Application filed by Beijing Xiaomi Mobile Software Co Ltd, with priority to CN201710373996.4A; published as CN107239758A, granted and published as CN107239758B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161: Detection; Localisation; Normalisation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks

Abstract

The disclosure relates to a method and a device for locating face key points, which improve the accuracy of face key point positioning. The method comprises the following steps: determining a first projection matrix T and a first face shape component coefficient set α of a three-dimensional image by using a first convolutional neural network, where the three-dimensional image is a three-dimensional deformable face image fitted to the two-dimensional face image to be recognized; and determining a first face key point P of the two-dimensional face image to be recognized according to the first T and the first α.

Description

Method and device for positioning key points of human face
Technical Field
The present disclosure relates to the field of face recognition technologies, and in particular, to a method and an apparatus for locating key points of a face.
Background
Face key point positioning means that, after a face is detected, the key points of the face, including the eyes, eyebrows, nose, mouth, face contour, and the like, are further determined. The technology is widely applied, for example in automatic face recognition, expression recognition, and automatic face animation synthesis.
In the related art, face key points are detected using approaches such as deformable templates, point distribution models, graph models, and cascaded shape regression. These related arts are all methods based on two-dimensional face images.
However, under the influence of factors such as pose change, expression, illumination, and occlusion of the face image, the positioning accuracy of key point positioning methods based on two-dimensional face images suffers greatly.
Disclosure of Invention
In order to overcome the problems in the related art, the present disclosure provides a method and an apparatus for locating key points of a human face.
According to a first aspect of the embodiments of the present disclosure, a method for locating a face key point is provided, which includes:
determining a first projection matrix T and a first face shape component coefficient set alpha of a three-dimensional image by adopting a first convolution neural network, wherein the three-dimensional image is a three-dimensional deformable face image fitted with a two-dimensional face image to be recognized;
and determining a first face key point P of the two-dimensional face image to be recognized according to the first T and the first alpha.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects: the accuracy of determining the face key points based on the three-dimensional deformable face model is high, the projection matrix and the face shape component coefficient set are pre-estimated by utilizing the first convolution neural network, so that more accurate face key points are obtained, and the accuracy of positioning the face key points is improved.
In an embodiment, after determining the first face key point P of the two-dimensional face image to be recognized according to the first T and the first α, the method further includes:
determining a second T by adopting a second convolution neural network according to the image of the adjacent area of the first P;
and determining a second P of the two-dimensional face image to be recognized according to the second T and the first alpha.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects: and further optimizing the projection matrix through a second convolutional neural network, further determining the key points of the human face again, and further improving the accuracy of positioning the key points of the human face.
In an embodiment, after determining the first face key point P of the two-dimensional face image to be recognized according to the first T and the first α, the method further includes:
determining a second T by adopting a second convolution neural network according to the image of the adjacent area of the first P;
determining a second P of the two-dimensional face image to be recognized according to the second T and the first alpha;
determining a second alpha by adopting a third convolutional neural network according to the image of the adjacent area of the second P;
and determining a third P of the two-dimensional face image to be recognized according to the second T and the second alpha.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects: and further, the projection matrix and the face shape component coefficient set are optimized through a second convolutional neural network and a third convolutional neural network respectively, so that face key points are determined again, and the accuracy of face key point positioning is further improved.
In an embodiment, after determining the first face key point P of the two-dimensional face image to be recognized according to the first T and the first α, the method further includes:
determining a third alpha by adopting a third convolutional neural network according to the image of the adjacent area of the first P;
and determining a fourth P of the two-dimensional face image to be recognized according to the first T and the third alpha.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects: and further optimizing the face shape component coefficient set through a third convolutional neural network, further determining the face key points again, and further improving the accuracy of positioning the face key points.
In an embodiment, the determining the first projection matrix T and the first face shape component coefficient α of the three-dimensional image using the first convolution neural network includes:
performing regression calculation on the three-dimensional image by adopting a first convolution neural network to obtain the first T and the first alpha of the three-dimensional image;
the determining the face key point P of the two-dimensional face image to be recognized according to the first T and the first α includes:
using, based on the first T and the first α, the formula

P = T(m_index + Σ_{i=1}^{n} α_i·ω_index^i)    (1)

to obtain a first P of the two-dimensional face image, where T is the projection matrix from the three-dimensional image to the two-dimensional face image, m_index is the shape vector of the average face key point P in the three-dimensional image, ω_index^i is the i-th face shape component among the face key points in the three-dimensional image, α_i is the i-th shape component coefficient in the first α, and n is the number of pixels of the face key point P in the three-dimensional image.
According to a second aspect of the embodiments of the present disclosure, there is provided an apparatus for locating a face key point, including:
the first determining module is used for determining a first projection matrix T and a first human face shape component coefficient set alpha of a three-dimensional image by adopting a first convolution neural network, wherein the three-dimensional image is a three-dimensional deformable human face image fitted by a two-dimensional human face image to be recognized;
and the first identification module is used for determining a first face key point P of the two-dimensional face image to be identified according to the first T and the first alpha.
In one embodiment, the apparatus further comprises:
a second determining module, configured to determine a second T by using a second convolutional neural network according to the image of the adjacent region of the first P;
and the second identification module is used for determining a second P of the two-dimensional face image to be identified according to the second T and the first alpha.
In one embodiment, the apparatus further comprises:
a second determining module, configured to determine a second T by using a second convolutional neural network according to the image of the adjacent region of the first P;
the second identification module is used for determining a second P of the two-dimensional face image to be identified according to the second T and the first alpha;
a third determining module, configured to determine a second α using a third convolutional neural network according to the image of the adjacent area of the second P;
and the third identification module is used for determining a third P of the two-dimensional face image to be identified according to the second T and the second alpha.
In one embodiment, the apparatus further comprises:
a fourth determining module, configured to determine a third α by using a third convolutional neural network according to the image of the adjacent area of the first P;
and the fourth identification module is used for determining a fourth P of the two-dimensional face image to be identified according to the first T and the third alpha.
In one embodiment, the first determining module comprises:
the first determining submodule is used for performing regression calculation on the three-dimensional image by adopting a first convolution neural network to obtain the first T and the first alpha of the three-dimensional image;
the first identification module comprises: a first identification submodule for utilizing a formula based on said first T and said first α
Figure BDA0001303523260000041
Obtaining a first P, T of the two-dimensional face image as a projection matrix from the three-dimensional image to the two-dimensional face image, mindexIs the shape vector of the average face keypoint P in the three-dimensional image,
Figure BDA0001303523260000042
for the ith individual face shape component, alpha, in the key points of the face in the three-dimensional imageiThe ith shape component coefficient is the first alpha, and n is the number of pixels of the key point of the human face in the three-dimensional image.
According to a third aspect of the embodiments of the present disclosure, there is provided a device for locating face key points, including:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
determining a first projection matrix T and a first face shape component coefficient set alpha of a three-dimensional image by adopting a first convolution neural network, wherein the three-dimensional image is a three-dimensional deformable face image fitted with a two-dimensional face image to be recognized;
and determining a first face key point P of the two-dimensional face image to be recognized according to the first T and the first alpha.
According to a fourth aspect of embodiments of the present disclosure, there is provided a computer-readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the steps of:
determining a first projection matrix T and a first face shape component coefficient set alpha of a three-dimensional image by adopting a first convolution neural network, wherein the three-dimensional image is a three-dimensional deformable face image fitted with a two-dimensional face image to be recognized;
and determining a first face key point P of the two-dimensional face image to be recognized according to the first T and the first alpha.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
Fig. 1 is a flowchart illustrating a method for face keypoint localization according to an exemplary embodiment.
Fig. 2 is a flowchart illustrating a method for face keypoint localization according to an exemplary embodiment.
Fig. 3 is a flowchart illustrating a method for face keypoint localization according to an exemplary embodiment.
Fig. 4 is a flowchart illustrating a method for face keypoint localization according to an exemplary embodiment.
Fig. 5 is a block diagram illustrating an apparatus for face keypoint localization according to an exemplary embodiment.
Fig. 6 is a block diagram illustrating an apparatus for face keypoint localization according to an exemplary embodiment.
Fig. 7 is a block diagram illustrating an apparatus for face keypoint localization according to an exemplary embodiment.
Fig. 8 is a block diagram illustrating an apparatus for face keypoint localization according to an exemplary embodiment.
Fig. 9 is a block diagram illustrating an apparatus for face keypoint localization according to an exemplary embodiment.
Fig. 10 is a block diagram illustrating an apparatus for face keypoint localization according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the exemplary embodiments below are not intended to represent all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present disclosure, as detailed in the appended claims.
Due to the influence of factors such as face pose change, expression change, illumination, and face occlusion, methods that locate face key points on a two-dimensional face image, such as deformable templates, point distribution models, graph models, and cascaded shape regression, give unsatisfactory positioning results with low accuracy. The present disclosure provides a method for locating face key points that accurately positions the key points using a three-dimensional deformable face model: the strong image feature extraction capability of a convolutional neural network (CNN) is used to obtain the projection matrix and the face shape component coefficient set of a three-dimensional image, and the face key points of the two-dimensional face image are then determined from that projection matrix and coefficient set, effectively improving the positioning accuracy of the face key points.
The three-dimensional deformable face model can be represented by formula (2):

A = m + Σ_{i=1}^{I} α_i·ω_i    (2)

where A represents the three-dimensional face image, m represents the average shape of the three-dimensional face, α_i represents the i-th face shape component coefficient in the face shape component coefficient set α, ω_i represents the i-th face shape component of the three-dimensional face image, and I represents the number of pixel points of the three-dimensional face image.
The three-dimensional average face shape m is a known quantity, an average face shape obtained through learning and training. ω_i, the i-th face shape component, is likewise known from learning and training. α_i, the i-th face shape component coefficient, differs from person to person and is an unknown quantity. Generally, the number I of pixel points of the three-dimensional face image is 10000, and the position on the face represented by each point is fixed.
After the three-dimensional face image is determined, the three-dimensional face image needs to be converted into a two-dimensional face image whenever the face key points of the two-dimensional face image are to be calculated. Formula (3) expresses this conversion:

B = TA    (3)

where B represents the two-dimensional face image and T represents the projection matrix that projects the three-dimensional image onto the two-dimensional image.
Substituting formula (2) into formula (3) yields formula (4):

B = T(m + Σ_{i=1}^{I} α_i·ω_i)    (4)
Since m, ω_i, and I are all known quantities, obtaining the two-dimensional face image only requires solving for the projection matrix T and the I face shape component coefficients α_i. Generally, because the positions on the face represented by the I pixel points of the three-dimensional face image are fixed, the computation can be limited to the pixel points included in the face key points of the three-dimensional face image in order to reduce the amount of calculation; this is expressed by formula (1):

P = T(m_index + Σ_{i=1}^{n} α_i·ω_index^i)    (1)

where P represents the face key points of the two-dimensional face image, m_index represents the average face shape of the pixel points included in the face key points of the three-dimensional face image, ω_index^i is the i-th face shape component among the face key points of the three-dimensional face image, α_i is the i-th shape component coefficient of the face key points in the three-dimensional image, and n is the number of pixels of the face key points in the three-dimensional face image.
In a preferred embodiment, the number n of pixels of the key points of the face in the three-dimensional face image is 68.
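As a concrete illustration of formula (1), the following Python sketch computes the two-dimensional key points P from a projection matrix T and a coefficient set α. All array names and shapes are assumptions made for illustration (68 key points, an affine 2×3 projection); the patent does not prescribe an implementation.

```python
import numpy as np

def project_keypoints(T, alpha, m_index, w_index):
    """Evaluate P = T * (m_index + sum_i alpha_i * w_index_i), i.e. formula (1).

    T       : (2, 3) projection matrix from the 3D image to the 2D image (assumed affine)
    alpha   : (n,) face shape component coefficients
    m_index : (3, 68) average 3D shape restricted to the key point vertices
    w_index : (n, 3, 68) face shape components restricted to the key point vertices
    Returns : (2, 68) array of 2D key point coordinates
    """
    # Deform the average keypoint shape with the weighted shape components.
    shape_3d = m_index + np.tensordot(alpha, w_index, axes=1)  # (3, 68)
    # Project the deformed 3D key points into the 2D image plane.
    return T @ shape_3d                                        # (2, 68)
```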
Based on the above explanation, when the face key point P of the two-dimensional face image is located using the three-dimensional face image, it is necessary to calculate the projection matrix T that projects the three-dimensional face image onto the two-dimensional face image, as well as the n face shape component coefficients α_i in the face shape component coefficient set α of the face key points in the three-dimensional image. Specifically, as shown in fig. 1, a flowchart of a method for locating face key points is provided; the method may be implemented by an electronic device and includes the following steps:
s101, determining a first projection matrix T and a first human face shape component coefficient set alpha of a three-dimensional image by adopting a first convolution neural network, wherein the three-dimensional image is a three-dimensional deformable human face image fitted by a two-dimensional human face image to be recognized.
In an embodiment, a first convolution neural network is used to perform regression calculation on the three-dimensional image, so as to obtain the first T and the first α of the three-dimensional image.
The first convolutional neural network is trained by inputting a large number of two-dimensional face image samples for learning; it takes a two-dimensional face image as input and outputs the projection matrix T between the two-dimensional face image and the corresponding three-dimensional face image, together with the face shape component coefficient set α of the corresponding three-dimensional face image.
S102, determining a first face key point P of the two-dimensional face image to be recognized according to the first T and the first alpha.
Based on formula (1) and the strong image feature extraction capability of convolutional neural networks, the first convolutional neural network estimates the first projection matrix T and the first face shape component coefficient set α of the three-dimensional deformable face image fitted to the two-dimensional face image to be recognized, and the face key point P of the two-dimensional face image to be recognized is then estimated from the estimated T and α using formula (1).
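A minimal PyTorch sketch of what such a first network could look like is given below. The layer sizes, the three-channel 114 × 114 input, and the use of a flattened 2 × 3 affine T are assumptions made for illustration; the patent fixes only the inputs (a face image) and the outputs (T and α).

```python
import torch
import torch.nn as nn

class FirstCNN(nn.Module):
    """Regresses a flattened projection matrix T and the coefficient set alpha."""

    def __init__(self, n_components: int = 68):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.AdaptiveAvgPool2d(4),
        )
        # 6 outputs for a flattened 2x3 affine T, plus n_components outputs for alpha.
        self.head = nn.Linear(64 * 4 * 4, 6 + n_components)

    def forward(self, image: torch.Tensor):
        x = self.features(image).flatten(1)
        out = self.head(x)
        T = out[:, :6].reshape(-1, 2, 3)   # first projection matrix T
        alpha = out[:, 6:]                 # first shape component coefficient set
        return T, alpha
```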
In order to meet the requirement of the convolutional neural network on image calculation and improve the calculation efficiency, before the first convolutional neural network is adopted to determine the first projection matrix T and the first human face shape component coefficient set α of the three-dimensional image, the method further comprises the following steps:
and acquiring the two-dimensional face image to be recognized.
And zooming the pixels of the two-dimensional face image to be recognized to the image with preset pixel points.
And carrying out normalization processing on the image of the preset pixel point.
In an exemplary embodiment, the preset pixel size is 114 × 114.
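The preprocessing described above could be sketched as follows. The 114 × 114 size follows the exemplary embodiment; the normalization scheme (scaling to [0, 1], then zero mean and unit variance) is an assumption, since the patent only states that the scaled image is normalized.

```python
import cv2
import numpy as np

def preprocess(face_image: np.ndarray, size: int = 114) -> np.ndarray:
    """Scale a detected face crop to the preset pixel size and normalize it."""
    resized = cv2.resize(face_image, (size, size))           # scale to preset pixels
    scaled = resized.astype(np.float32) / 255.0              # map intensities to [0, 1]
    return (scaled - scaled.mean()) / (scaled.std() + 1e-6)  # normalize (assumed scheme)
```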
The technical scheme provided by the embodiment of the disclosure is high in accuracy of determining the face key points based on the three-dimensional deformable face model, and the projection matrix and the face shape component coefficient set are pre-estimated by using the first convolution neural network, so that more accurate face key points are obtained, and the accuracy of positioning the face key points is improved.
Since the accuracy of the first projection matrix T and the first face shape component coefficient set α estimated through the first convolutional neural network may not be high enough, leaving the face key point positioning effect less than ideal, the projection matrix can be optimized again through a second convolutional neural network to further improve the accuracy of face key point positioning. Specifically, as shown in fig. 2, a method for locating face key points is provided, which includes:
s201, determining a first projection matrix T and a first human face shape component coefficient set alpha of a three-dimensional image by using a first convolution neural network, wherein the three-dimensional image is a three-dimensional deformable human face image fitted with a two-dimensional human face image to be recognized.
S202, determining a first face key point P of the two-dimensional face image to be recognized according to the first T and the first alpha.
S203, determining a second T by adopting a second convolution neural network according to the image of the adjacent area of the first P.
The second convolutional neural network is trained by inputting a large number of image samples taken from a preset range around each pixel point of the face key points of two-dimensional face images; it takes as input the images in the preset range around each pixel point of the face key points of a two-dimensional face image and outputs the projection matrix T from the three-dimensional face image to the two-dimensional face image.
S204, determining a second P of the two-dimensional face image to be recognized according to the second T and the first alpha.
Based on formula (1) and the strong image feature extraction capability of convolutional neural networks, the first convolutional neural network first estimates the first projection matrix T and the first face shape component coefficient set α of the three-dimensional deformable face image fitted to the two-dimensional face image to be recognized, and the first face key point P of the two-dimensional face image to be recognized is then estimated from the first T and the first α using formula (1). Based on the estimated first face key point P, the projection matrix is optimized through the second convolutional neural network to obtain a second T, and the face key points of the two-dimensional face image to be recognized are estimated again from the second T and the first α to obtain a second P.
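The "image of the adjacent area" of each key point can be realized by cropping a small patch around each estimated key point and feeding the stacked patches to the second network. The patch size and the stacking layout below are illustrative assumptions, not fixed by the patent.

```python
import numpy as np

def crop_patches(image: np.ndarray, keypoints: np.ndarray, half: int = 8) -> np.ndarray:
    """image: (H, W, C) array; keypoints: (2, 68) pixel coordinates of the current P."""
    h, w = image.shape[:2]
    patches = []
    for x, y in keypoints.T.astype(int):
        # Clamp so the patch stays inside the image bounds.
        x = int(np.clip(x, half, w - half - 1))
        y = int(np.clip(y, half, h - half - 1))
        patches.append(image[y - half:y + half, x - half:x + half])
    return np.stack(patches)  # (68, 16, 16, C), fed jointly to the second CNN
```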
The technical scheme provided by the embodiment of the disclosure further optimizes the projection matrix through the second convolutional neural network, so that the face key points are determined again, and the accuracy of face key point positioning is further improved.
Since the accuracy of the first projection matrix T and the first face shape component coefficient set α estimated through the first convolutional neural network may be low, leaving the face key point positioning effect less than ideal, the face shape component coefficient set can be optimized again through a third convolutional neural network to further improve the accuracy of face key point positioning. Specifically, as shown in fig. 3, a method for locating face key points is provided, which includes:
s301, determining a first projection matrix T and a first human face shape component coefficient set alpha of a three-dimensional image by using a first convolution neural network, wherein the three-dimensional image is a three-dimensional deformable human face image fitted with a two-dimensional human face image to be recognized.
S302, determining a first face key point P of the two-dimensional face image to be recognized according to the first T and the first alpha.
S303, determining a third alpha by adopting a third convolutional neural network according to the image of the adjacent area of the first P.
The third convolutional neural network is trained by inputting a large number of image samples taken from a preset range around each pixel point of the face key points of two-dimensional face images; it takes as input the images in the preset range around each pixel point of the face key points of a two-dimensional face image and outputs the face shape component coefficient set α of the three-dimensional face image corresponding to the two-dimensional face image.
S304, determining a fourth P of the two-dimensional face image to be recognized according to the first T and the third alpha.
Based on formula (1) and the strong image feature extraction capability of convolutional neural networks, the first projection matrix T and the first face shape component coefficient set α of the three-dimensional deformable face image fitted to the two-dimensional face image to be recognized are estimated through the first convolutional neural network, and the first face key point P of the two-dimensional face image to be recognized is then estimated from them using formula (1). Based on the estimated first face key point P, the face shape component coefficient set is optimized through the third convolutional neural network to obtain a third α, and the face key points are computed from the first projection matrix T and the optimized third α to obtain a fourth P.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects: and further optimizing the face shape component coefficient set through a third convolutional neural network, further determining the face key points again, and further improving the accuracy of positioning the face key points.
Since the accuracy of the first projection matrix T and the first face shape component coefficient set α estimated through the first convolutional neural network may be low, leaving the face key point positioning effect less than ideal, the projection matrix and the face shape component coefficient set can be optimized again through the second and third convolutional neural networks, respectively, to further improve the accuracy of face key point positioning. Specifically, as shown in fig. 4, a method for locating face key points is provided, which includes:
s401, determining a first projection matrix T and a first human face shape component coefficient set alpha of a three-dimensional image by using a first convolution neural network, wherein the three-dimensional image is a three-dimensional deformable human face image fitted with a two-dimensional human face image to be recognized.
S402, determining a first face key point P of the two-dimensional face image to be recognized according to the first T and the first alpha.
S403, determining a second T by adopting a second convolutional neural network according to the image of the adjacent area of the first P.
S404, determining a second P of the two-dimensional face image to be recognized according to the second T and the first alpha.
S405, determining a second alpha by adopting a third convolutional neural network according to the image of the adjacent area of the second P.
S406, determining a third P of the two-dimensional face image to be recognized according to the second T and the second alpha.
Based on formula (1) and the strong image feature extraction capability of convolutional neural networks, the face key points of the two-dimensional face image to be recognized are determined using three trained convolutional neural networks: the first, the second, and the third. First, the first projection matrix T and the first face shape component coefficient set α of the three-dimensional deformable face image fitted to the two-dimensional face image to be recognized are estimated through the first convolutional neural network, and the first face key point P of the two-dimensional face image to be recognized is estimated from them using formula (1). Based on the estimated first face key point P, the projection matrix is optimized through the second convolutional neural network to obtain a second T, and a second P of the face key points of the two-dimensional face image to be recognized is estimated from the second T and the first face shape component coefficient set α. Finally, the face shape component coefficient set α is optimized through the third convolutional neural network to obtain a second α, and the face key points are computed from the optimized second T and second α to obtain a third P.
The steps of optimizing the projection matrix and optimizing the face shape component coefficient set α are not fixed in order: the projection matrix may be optimized first by the second convolutional neural network and the face shape component coefficient set α optimized afterwards by the third convolutional neural network, or the coefficient set α may be optimized first by the third convolutional neural network and the projection matrix afterwards by the second convolutional neural network. Specifically, the steps of optimizing α first and then T include:
after determining a first face key point P of the two-dimensional face image to be recognized according to the first T and the first α, the method further includes:
determining a third alpha by adopting a third convolutional neural network according to the image of the adjacent area of the first P;
determining a fifth P of the two-dimensional face image to be recognized according to the first T and the third alpha;
determining a third T by adopting a second convolutional neural network according to the image of the adjacent area of the fifth P;
and determining a sixth P of the two-dimensional face image to be recognized according to the third T and the third alpha.
In an exemplary embodiment, in order to further improve face key point positioning accuracy, the projection matrix and the face shape component coefficient set can be optimized through multiple loop iterations. Specifically, the scheme comprises the following steps:
step a, determining a first projection matrix T and a first human face shape component coefficient set alpha of a three-dimensional image by adopting a first convolution neural network, wherein the three-dimensional image is a three-dimensional deformable human face image fitted by a two-dimensional human face image to be recognized.
And b, determining a first face key point P of the two-dimensional face image to be recognized according to the first T and the first alpha.
Step c, determining T_k by using the second convolutional neural network according to the P_k.
Step d, determining P_{k+1} of the two-dimensional face image to be recognized according to the T_k and the α_k.
Step e, determining α_{k+1} by using the third convolutional neural network according to the P_{k+1}.
Step f, determining P_{k+2} of the two-dimensional face image to be recognized according to the T_k and the α_{k+1}.
Wherein 1 ≤ k ≤ Z, k and Z are positive integers, and when k = 1, P_k is the first P and α_k is the first α. Steps c to f can be executed in a loop Z times to obtain a more accurate face key point P_{Z+2}.
In an exemplary embodiment, while maintaining the positioning accuracy of the face key points, and to avoid the excessive computation and reduced calculation efficiency caused by too many loop iterations, the number of loop iteration optimizations may be 3 to 5; that is, Z takes a value in [3, 5].
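Putting steps a to f together with the helpers sketched earlier gives the following outline of the iterative refinement. The three networks are assumed to be already trained and callable as plain functions, and Z = 3 follows the preferred range [3, 5]; all names are illustrative.

```python
def locate_keypoints(image, first_cnn, second_cnn, third_cnn, m_index, w_index, Z=3):
    T, alpha = first_cnn(image)                            # step a: first T, first alpha
    P = project_keypoints(T, alpha, m_index, w_index)      # step b: first P
    for _ in range(Z):                                     # steps c to f, looped Z times
        T = second_cnn(crop_patches(image, P))             # step c: refine T
        P = project_keypoints(T, alpha, m_index, w_index)  # step d: new P
        alpha = third_cnn(crop_patches(image, P))          # step e: refine alpha
        P = project_keypoints(T, alpha, m_index, w_index)  # step f: new P
    return P                                               # approximately P_{Z+2}
```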
The technical scheme provided by the embodiment of the disclosure further optimizes the projection matrix and the face shape component coefficient set through the second convolutional neural network and the third convolutional neural network respectively, so as to determine the face key points again, and further improve the accuracy of face key point positioning.
Fig. 5 is a diagram illustrating an apparatus for locating face key points according to an exemplary embodiment. Referring to fig. 5, the apparatus 50 includes:
a first determining module 51, configured to determine a first projection matrix T and a first set of facial shape component coefficients α of a three-dimensional image by using a first convolutional neural network, where the three-dimensional image is a three-dimensional deformable facial image fitted to a two-dimensional facial image to be recognized;
and the first identification module 52 is configured to determine a first face key point P of the two-dimensional face image to be identified according to the first T and the first α.
In one embodiment, as shown in fig. 6, the apparatus 60 further comprises:
a second determining module 53, configured to determine a second T by using a second convolutional neural network according to the image of the adjacent area of the first P;
and the second recognition module 54 is configured to determine a second P of the two-dimensional face image to be recognized according to the second T and the first α.
In one embodiment, as shown in fig. 7, the apparatus 70 further comprises:
a second determining module 53, configured to determine a second T by using a second convolutional neural network according to the image of the adjacent area of the first P;
a second recognition module 54, configured to determine a second P of the two-dimensional face image to be recognized according to the second T and the first α;
a third determining module 55, configured to determine a second α by using a third convolutional neural network according to the image of the neighboring area of the second P;
and the third recognition module 56 is configured to determine a third P of the two-dimensional face image to be recognized according to the second T and the second α.
In one embodiment, as shown in fig. 8, the apparatus 80 further comprises:
a fourth determining module 57, configured to determine a third α by using a third convolutional neural network according to the image of the adjacent area of the first P;
and a fourth identification module 58, configured to determine a fourth P of the two-dimensional face image to be identified according to the first T and the third α.
In one embodiment, such as the apparatus 90 shown in fig. 9, the first determining module 51 comprises:
the first determining submodule 511 is configured to perform regression calculation on the three-dimensional image by using a first convolution neural network to obtain the first T and the first α of the three-dimensional image;
the first identification module 52 includes: a first identification submodule 521 for utilizing a formula based on said first T and said first α
Figure BDA0001303523260000141
Obtaining a first P, T of the two-dimensional face image as a projection matrix from the three-dimensional image to the two-dimensional face image, mindexIs the shape vector of the average face keypoint P in the three-dimensional image,
Figure BDA0001303523260000142
for the ith individual face shape component, alpha, in the key points of the face in the three-dimensional imageiThe ith shape component coefficient is the first alpha, and n is the number of pixels of the face key point P in the three-dimensional image.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Fig. 10 is a block diagram illustrating an apparatus 1000 for locating face key points according to an exemplary embodiment. For example, the apparatus 1000 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, an exercise device, a personal digital assistant, and the like.
Referring to fig. 10, the apparatus 1000 may include one or more of the following components: processing component 1002, memory 1004, power component 1006, multimedia component 1008, audio component 1010, input/output (I/O) interface 1012, sensor component 1014, and communications component 1016.
The processing component 1002 generally controls the overall operation of the device 1000, such as operations associated with display, telephone calls, data communications, camera operations, and recording operations. The processing components 1002 may include one or more processors 1020 to execute instructions to perform all or a portion of the steps of the methods described above. Further, processing component 1002 may include one or more modules that facilitate interaction between processing component 1002 and other components. For example, the processing component 1002 may include a multimedia module to facilitate interaction between the multimedia component 1008 and the processing component 1002.
The memory 1004 is configured to store various types of data to support operations at the apparatus 1000. Examples of such data include instructions for any application or method operating on device 1000, contact data, phonebook data, messages, pictures, videos, and so forth. The memory 1004 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
The power supply component 1006 provides power to the various components of the device 1000. The power components 1006 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power supplies for the device 1000.
The multimedia component 1008 includes a screen that provides an output interface between the device 1000 and a user. In some embodiments, the screen may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation. In some embodiments, the multimedia component 1008 includes a front facing camera and/or a rear facing camera. The front camera and/or the rear camera may receive external multimedia data when the device 1000 is in an operating mode, such as a shooting mode or a video mode. Each front camera and rear camera may be a fixed optical lens system or have a focal length and optical zoom capability.
The audio component 1010 is configured to output and/or input audio signals. For example, audio component 1010 includes a Microphone (MIC) configured to receive external audio signals when apparatus 1000 is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in the memory 1004 or transmitted via the communication component 1016. In some embodiments, audio component 1010 also includes a speaker for outputting audio signals.
I/O interface 1012 provides an interface between processing component 1002 and peripheral interface modules, which may be keyboards, click wheels, buttons, etc. These buttons may include, but are not limited to: a home button, a volume button, a start button, and a lock button.
The sensor assembly 1014 includes one or more sensors for providing various aspects of status assessment for the device 1000. For example, sensor assembly 1014 may detect an open/closed state of device 1000, the relative positioning of components, such as a display and keypad of device 1000, the change in position of device 1000 or a component of device 1000, the presence or absence of user contact with device 1000, the orientation or acceleration/deceleration of device 1000, and the change in temperature of device 1000. The sensor assembly 1014 may include a proximity sensor configured to detect the presence of a nearby object without any physical contact. The sensor assembly 1014 may also include a light sensor, such as a CMOS or CCD image sensor, for use in imaging applications. In some embodiments, the sensor assembly 1014 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
The communication component 1016 is configured to facilitate communications between the apparatus 1000 and other devices in a wired or wireless manner. The device 1000 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof. In an exemplary embodiment, the communication component 1016 receives a broadcast signal or broadcast related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communications component 1016 further includes a Near Field Communication (NFC) module to facilitate short-range communications. For example, the NFC module may be implemented based on Radio Frequency Identification (RFID) technology, infrared data association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and other technologies.
In an exemplary embodiment, the apparatus 1000 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, micro-controllers, microprocessors or other electronic components for performing the above-described methods.
In an exemplary embodiment, a non-transitory computer readable storage medium comprising instructions, such as the memory 1004 comprising instructions, executable by the processor 1020 of the device 1000 to perform the above-described method is also provided. For example, the non-transitory computer readable storage medium may be a ROM, a Random Access Memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.
An apparatus for locating face key points, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
determining a first projection matrix T and a first face shape component coefficient set alpha of a three-dimensional image by adopting a first convolution neural network, wherein the three-dimensional image is a three-dimensional deformable face image fitted with a two-dimensional face image to be recognized;
and determining a first face key point P of the two-dimensional face image to be recognized according to the first T and the first alpha.
The processor may be further configured to:
after determining a first face key point P of the two-dimensional face image to be recognized according to the first T and the first α, the method further includes:
determining a second T by adopting a second convolution neural network according to the image of the adjacent area of the first P;
and determining a second P of the two-dimensional face image to be recognized according to the second T and the first α.
The processor may be further configured to:
after determining a first face key point P of the two-dimensional face image to be recognized according to the first T and the first α, the method further includes:
determining a second T by adopting a second convolution neural network according to the image of the adjacent area of the first P;
determining a second P of the two-dimensional face image to be recognized according to the second T and the first alpha;
determining a second alpha by adopting a third convolutional neural network according to the image of the adjacent area of the second P;
and determining a third P of the two-dimensional face image to be recognized according to the second T and the second alpha.
The processor may be further configured to:
after determining a first face key point P of the two-dimensional face image to be recognized according to the first T and the first α, the method further includes:
determining a third alpha by adopting a third convolutional neural network according to the image of the adjacent area of the first P;
and determining a fourth P of the two-dimensional face image to be recognized according to the first T and the third alpha.
The processor may be further configured to:
the determining the first projection matrix T and the first face shape component coefficient alpha of the three-dimensional image by using the first convolution neural network comprises:
performing regression calculation on the three-dimensional image by adopting a first convolution neural network to obtain the first T and the first alpha of the three-dimensional image;
the determining the face key point P of the two-dimensional face image to be recognized according to the first T and the first α includes:
using, based on the first T and the first α, the formula

P = T(m_index + Σ_{i=1}^{n} α_i·ω_index^i)    (1)

to obtain a first P of the two-dimensional face image, where T is the projection matrix from the three-dimensional image to the two-dimensional face image, m_index is the shape vector of the average face key point P in the three-dimensional image, ω_index^i is the i-th face shape component among the face key points in the three-dimensional image, α_i is the i-th shape component coefficient in the first α, and n is the number of pixels of the face key points in the three-dimensional image.
A computer-readable storage medium, on which a computer program is stored, which program, when executed by a processor, carries out the steps of:
determining a first projection matrix T and a first face shape component coefficient set alpha of a three-dimensional image by adopting a first convolution neural network, wherein the three-dimensional image is a three-dimensional deformable face image fitted with a two-dimensional face image to be recognized;
and determining a first face key point P of the two-dimensional face image to be recognized according to the first T and the first alpha.
The instructions in the storage medium may further include:
after determining a first face key point P of the two-dimensional face image to be recognized according to the first T and the first α, the method further includes:
determining a second T by adopting a second convolution neural network according to the image of the adjacent area of the first P;
and determining a second P of the two-dimensional face image to be recognized according to the second T and the first α.
The instructions in the storage medium may further include:
after determining a first face key point P of the two-dimensional face image to be recognized according to the first T and the first α, the method further includes:
determining a second T by adopting a second convolution neural network according to the image of the adjacent area of the first P;
determining a second P of the two-dimensional face image to be recognized according to the second T and the first alpha;
determining a second alpha by adopting a third convolutional neural network according to the image of the adjacent area of the second P;
and determining a third P of the two-dimensional face image to be recognized according to the second T and the second alpha.
The instructions in the storage medium may further include:
after determining a first face key point P of the two-dimensional face image to be recognized according to the first T and the first α, the method further includes:
determining a third alpha by adopting a third convolutional neural network according to the image of the adjacent area of the first P;
and determining a fourth P of the two-dimensional face image to be recognized according to the first T and the third alpha.
The instructions in the storage medium may further include:
the determining the first projection matrix T and the first face shape component coefficient alpha of the three-dimensional image by using the first convolution neural network comprises:
performing regression calculation on the three-dimensional image by adopting a first convolution neural network to obtain the first T and the first alpha of the three-dimensional image;
the determining the face key point P of the two-dimensional face image to be recognized according to the first T and the first α includes:
using, based on the first T and the first α, the formula

P = T(m_index + Σ_{i=1}^{n} α_i·ω_index^i)    (1)

to obtain a first P of the two-dimensional face image, where T is the projection matrix from the three-dimensional image to the two-dimensional face image, m_index is the shape vector of the average face key point P in the three-dimensional image, ω_index^i is the i-th face shape component among the face key points in the three-dimensional image, α_i is the i-th shape component coefficient in the first α, and n is the number of pixels of the face key points in the three-dimensional image.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (10)

1. A method for locating face key points is characterized by comprising the following steps:
determining a first projection matrix T and a first face shape component coefficient set alpha of a three-dimensional image by adopting a first convolution neural network, wherein the three-dimensional image is a three-dimensional deformable face image fitted with a two-dimensional face image to be recognized; the first convolution neural network is a convolution neural network which inputs a large number of two-dimensional face image samples for learning training, establishes two-dimensional face images as input, outputs a projection matrix T of the two-dimensional face images and corresponding three-dimensional face images and a face shape component coefficient set alpha of the corresponding three-dimensional face images;
determining a first face key point P of the two-dimensional face image to be recognized according to the first T and the first alpha;
the determining a first face key point P of the two-dimensional face image to be recognized according to the first T and the first α includes:
using, based on the first T and the first α, the formula

P = T(m_index + Σ_{i=1}^{n} α_i·ω_index^i)    (1)

to obtain a first P of the two-dimensional face image, wherein T is the projection matrix from the three-dimensional image to the two-dimensional face image, m_index is the shape vector of the average face key point P in the three-dimensional image, ω_index^i is the i-th face shape component among the face key points in the three-dimensional image, α_i is the i-th shape component coefficient in the first α, and n is the number of pixels of the face key points in the three-dimensional image;
after determining a first face key point P of the two-dimensional face image to be recognized according to the first T and the first α, the method further includes:
determining a second T by adopting a second convolution neural network according to the image of the adjacent area of the first P; the second convolutional neural network is a convolutional neural network which is used for learning and training an image sample in a preset range around each pixel point of a face key point of a large number of input two-dimensional face images, establishing input of the image in the preset range around each pixel point of the face key point of the two-dimensional face images, and outputting a projection matrix T from the three-dimensional face images to the two-dimensional face images;
and determining a second P of the two-dimensional face image to be recognized according to the second T and the first alpha.
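Claim 1's refinement step feeds the second convolutional neural network with the image inside a preset range around each first key point. A minimal sketch of that patch extraction, assuming a fixed square window and zero padding at the image border (the window size and function name are illustrative assumptions, not from the patent):

```python
import numpy as np

def extract_patches(image, keypoints, half=16):
    """Crop a (2*half) x (2*half) patch around each predicted key point.

    image     : (H, W) or (H, W, C) array.
    keypoints : (k, 2) array of (x, y) key point positions, e.g. the first P.
    Returns   : list of k patches, zero-padded at the image border.
    """
    pad = ((half, half), (half, half)) + ((0, 0),) * (image.ndim - 2)
    padded = np.pad(image, pad, mode="constant")
    patches = []
    for x, y in np.round(np.asarray(keypoints)).astype(int):
        # The +half shift from the padding keeps every crop inside bounds.
        patches.append(padded[y : y + 2 * half, x : x + 2 * half])
    return patches
```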
2. The method according to claim 1, wherein after determining the first face key point P of the two-dimensional face image to be recognized according to the first T and the first alpha, the method further comprises:
determining a second alpha by adopting a third convolutional neural network according to the image of the adjacent area of the second P;
and determining a third P of the two-dimensional face image to be recognized according to the second T and the second alpha.
3. The method according to claim 1, wherein after determining the first face key point P of the two-dimensional face image to be recognized according to the first T and the first alpha, the method further comprises:
determining a third alpha by adopting a third convolutional neural network according to the image of the adjacent area of the first P;
and determining a fourth P of the two-dimensional face image to be recognized according to the first T and the third alpha.
4. The method of claim 1, wherein the determining the first projection matrix T and the first face shape component coefficient set alpha of the three-dimensional image using the first convolutional neural network comprises:
and performing regression calculation on the three-dimensional image by adopting a first convolution neural network to obtain the first T and the first alpha of the three-dimensional image.
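Read together, claims 1 to 4 describe a coarse-to-fine loop: the first network regresses T and alpha, the formula produces the key points, and the later networks re-estimate T (and, in claims 2 and 3, alpha) from patches around the current key points. A sketch of that flow, reusing the hypothetical project_keypoints and extract_patches helpers above and assuming the three networks are callables returning the quantities named in the claims:

```python
def locate_keypoints(image, net1, net2, net3, m_index, S_index):
    """Coarse-to-fine face key point localization in the spirit of claims 1-4."""
    # Stage 1: regress the first projection matrix T and coefficient set alpha.
    T1, alpha1 = net1(image)
    P1 = project_keypoints(T1, alpha1, m_index, S_index)    # first key points

    # Stage 2 (claim 1): refine T from patches around the first key points.
    T2 = net2(extract_patches(image, P1))
    P2 = project_keypoints(T2, alpha1, m_index, S_index)    # second key points

    # Stage 3 (claim 2): refine alpha from patches around the second key points.
    alpha2 = net3(extract_patches(image, P2))
    P3 = project_keypoints(T2, alpha2, m_index, S_index)    # third key points
    return P3
```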
5. An apparatus for locating key points of a human face, comprising:
the first determining module is used for determining a first projection matrix T and a first face shape component coefficient set alpha of a three-dimensional image by adopting a first convolutional neural network, wherein the three-dimensional image is a three-dimensional deformable face image fitted to the two-dimensional face image to be recognized; the first convolutional neural network is a convolutional neural network obtained by learning training on a large number of input two-dimensional face image samples, which takes a two-dimensional face image as input and outputs the projection matrix T between the two-dimensional face image and the corresponding three-dimensional face image, together with the face shape component coefficient set alpha of the corresponding three-dimensional face image;
the first identification module is used for determining a first face key point P of the two-dimensional face image to be identified according to the first T and the first alpha;
the first identification module comprises: a first identification submodule, configured to, based on the first T and the first alpha, use the formula

$$P = T\left(m_{index} + \sum_{i=1}^{n} \alpha_i\, s_{i,index}\right)$$

to obtain the first P of the two-dimensional face image, wherein T is the projection matrix from the three-dimensional image to the two-dimensional face image, $m_{index}$ is the shape vector of the average face key points in the three-dimensional image, $s_{i,index}$ is the ith face shape component among the face key points in the three-dimensional image, $\alpha_i$ is the ith shape component coefficient in the first alpha, and n is the number of face key point pixels in the three-dimensional image;
the device further comprises:
a second determining module, configured to determine a second T by adopting a second convolutional neural network according to the image of the adjacent region of the first P; the second convolutional neural network is a convolutional neural network obtained by learning training on image samples within a preset range around each face key point pixel of a large number of input two-dimensional face images, which takes the image within the preset range around each face key point pixel of a two-dimensional face image as input and outputs the projection matrix T from the three-dimensional face image to the two-dimensional face image;
and the second identification module is used for determining a second P of the two-dimensional face image to be identified according to the second T and the first alpha.
6. The apparatus of claim 5, further comprising:
a third determining module, configured to determine a second α using a third convolutional neural network according to the image of the adjacent area of the second P;
and the third identification module is used for determining a third P of the two-dimensional face image to be identified according to the second T and the second alpha.
7. The apparatus of claim 6, further comprising:
a fourth determining module, configured to determine a third α using a third convolutional neural network according to the image of the adjacent area of the first P;
and the fourth identification module is used for determining a fourth P of the two-dimensional face image to be identified according to the first T and the third alpha.
8. The apparatus of claim 6, wherein the first determining module comprises:
the first determining submodule is used for performing regression calculation on the three-dimensional image by adopting a first convolution neural network to obtain the first T and the first alpha of the three-dimensional image.
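The regression mentioned in claims 4 and 8 maps an input face image to T and alpha in a single forward pass. A PyTorch-style sketch of such a regression head follows; the backbone, layer sizes, and the 8-parameter (2×4 affine) encoding of T are assumptions chosen for illustration, not details taken from the patent:

```python
import torch.nn as nn

class PoseShapeRegressor(nn.Module):
    """CNN regressing the projection matrix T and the coefficient set alpha."""

    def __init__(self, n_components=50):
        super().__init__()
        self.backbone = nn.Sequential(   # tiny illustrative feature extractor
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(4), nn.Flatten(),
        )
        self.head = nn.Linear(32 * 4 * 4, 8 + n_components)

    def forward(self, x):
        out = self.head(self.backbone(x))
        T = out[:, :8].reshape(-1, 2, 4)   # assumed affine projection matrix
        alpha = out[:, 8:]                 # face shape component coefficients
        return T, alpha
```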
9. An apparatus for locating key points of a human face, comprising:
a processor;
a memory for storing processor-executable instructions;
wherein the processor is configured to:
determining a first projection matrix T and a first face shape component coefficient set alpha of a three-dimensional image by adopting a first convolutional neural network, wherein the three-dimensional image is a three-dimensional deformable face image fitted to the two-dimensional face image to be recognized; the first convolutional neural network is a convolutional neural network obtained by learning training on a large number of input two-dimensional face image samples, which takes a two-dimensional face image as input and outputs the projection matrix T between the two-dimensional face image and the corresponding three-dimensional face image, together with the face shape component coefficient set alpha of the corresponding three-dimensional face image;
determining a first face key point P of the two-dimensional face image to be recognized according to the first T and the first alpha;
the determining the first face key point P of the two-dimensional face image to be recognized according to the first T and the first alpha comprises:
based on the first T and the first alpha, using the formula

$$P = T\left(m_{index} + \sum_{i=1}^{n} \alpha_i\, s_{i,index}\right)$$

to obtain the first P of the two-dimensional face image, wherein T is the projection matrix from the three-dimensional image to the two-dimensional face image, $m_{index}$ is the shape vector of the average face key points in the three-dimensional image, $s_{i,index}$ is the ith face shape component among the face key points in the three-dimensional image, $\alpha_i$ is the ith shape component coefficient in the first alpha, and n is the number of face key point pixels in the three-dimensional image;
after determining a first face key point P of the two-dimensional face image to be recognized according to the first T and the first alpha, the method further comprises the following steps:
determining a second T by adopting a second convolutional neural network according to the image of the adjacent area of the first P; the second convolutional neural network is a convolutional neural network obtained by learning training on image samples within a preset range around each face key point pixel of a large number of input two-dimensional face images, which takes the image within the preset range around each face key point pixel of a two-dimensional face image as input and outputs the projection matrix T from the three-dimensional face image to the two-dimensional face image;
and determining a second P of the two-dimensional face image to be recognized according to the second T and the first alpha.
10. A computer-readable storage medium, on which a computer program is stored, which program, when executed by a processor, carries out the steps of:
determining a first projection matrix T and a first face shape component coefficient set alpha of a three-dimensional image by adopting a first convolutional neural network, wherein the three-dimensional image is a three-dimensional deformable face image fitted to the two-dimensional face image to be recognized; the first convolutional neural network is a convolutional neural network obtained by learning training on a large number of input two-dimensional face image samples, which takes a two-dimensional face image as input and outputs the projection matrix T between the two-dimensional face image and the corresponding three-dimensional face image, together with the face shape component coefficient set alpha of the corresponding three-dimensional face image;
determining a first face key point P of the two-dimensional face image to be recognized according to the first T and the first alpha;
the determining the first face key point P of the two-dimensional face image to be recognized according to the first T and the first alpha comprises:
based on the first T and the first alpha, using the formula

$$P = T\left(m_{index} + \sum_{i=1}^{n} \alpha_i\, s_{i,index}\right)$$

to obtain the first P of the two-dimensional face image, wherein T is the projection matrix from the three-dimensional image to the two-dimensional face image, $m_{index}$ is the shape vector of the average face key points in the three-dimensional image, $s_{i,index}$ is the ith face shape component among the face key points in the three-dimensional image, $\alpha_i$ is the ith shape component coefficient in the first alpha, and n is the number of face key point pixels in the three-dimensional image;
after determining a first face key point P of the two-dimensional face image to be recognized according to the first T and the first alpha, the method further comprises the following steps:
determining a second T by adopting a second convolutional neural network according to the image of the adjacent area of the first P; the second convolutional neural network is a convolutional neural network obtained by learning training on image samples within a preset range around each face key point pixel of a large number of input two-dimensional face images, which takes the image within the preset range around each face key point pixel of a two-dimensional face image as input and outputs the projection matrix T from the three-dimensional face image to the two-dimensional face image;
and determining a second P of the two-dimensional face image to be recognized according to the second T and the first alpha.
CN201710373996.4A 2017-05-24 2017-05-24 Method and device for positioning key points of human face Active CN107239758B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710373996.4A CN107239758B (en) 2017-05-24 2017-05-24 Method and device for positioning key points of human face

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710373996.4A CN107239758B (en) 2017-05-24 2017-05-24 Method and device for positioning key points of human face

Publications (2)

Publication Number Publication Date
CN107239758A CN107239758A (en) 2017-10-10
CN107239758B true CN107239758B (en) 2022-03-08

Family

ID=59985857

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710373996.4A Active CN107239758B (en) 2017-05-24 2017-05-24 Method and device for positioning key points of human face

Country Status (1)

Country Link
CN (1) CN107239758B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107944367B (en) * 2017-11-16 2021-06-01 北京小米移动软件有限公司 Face key point detection method and device
CN113688737A (en) * 2017-12-15 2021-11-23 北京市商汤科技开发有限公司 Face image processing method, face image processing device, electronic apparatus, storage medium, and program
CN113505717B (en) * 2021-07-17 2022-05-31 桂林理工大学 Online passing system based on face and facial feature recognition technology

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101751689B (en) * 2009-09-28 2012-02-22 中国科学院自动化研究所 Three-dimensional facial reconstruction method
US8923392B2 (en) * 2011-09-09 2014-12-30 Adobe Systems Incorporated Methods and apparatus for face fitting and editing applications
CN102999942B (en) * 2012-12-13 2015-07-15 清华大学 Three-dimensional face reconstruction method
US20160070952A1 (en) * 2014-09-05 2016-03-10 Samsung Electronics Co., Ltd. Method and apparatus for facial recognition
US9552510B2 (en) * 2015-03-18 2017-01-24 Adobe Systems Incorporated Facial expression capture for character animation
CN104899563B (en) * 2015-05-29 2020-01-07 深圳大学 Two-dimensional face key feature point positioning method and system
CN105678220B (en) * 2015-12-29 2019-06-11 小米科技有限责任公司 Face key point location processing method and device
CN106203376B (en) * 2016-07-19 2020-04-10 北京旷视科技有限公司 Face key point positioning method and device
CN106339680B (en) * 2016-08-25 2019-07-23 北京小米移动软件有限公司 Face key independent positioning method and device

Also Published As

Publication number Publication date
CN107239758A (en) 2017-10-10


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant