CN114581973A - Face pose estimation method and device, storage medium and computer equipment

Face pose estimation method and device, storage medium and computer equipment

Info

Publication number
CN114581973A
CN114581973A
Authority
CN
China
Prior art keywords
face
key point
coordinates
keypoint
nose
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210130312.9A
Other languages
Chinese (zh)
Inventor
孟娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Gree Electric Appliances Inc of Zhuhai
Zhuhai Zero Boundary Integrated Circuit Co Ltd
Original Assignee
Gree Electric Appliances Inc of Zhuhai
Zhuhai Zero Boundary Integrated Circuit Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Gree Electric Appliances Inc of Zhuhai, Zhuhai Zero Boundary Integrated Circuit Co Ltd filed Critical Gree Electric Appliances Inc of Zhuhai
Priority to CN202210130312.9A priority Critical patent/CN114581973A/en
Publication of CN114581973A publication Critical patent/CN114581973A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The embodiment of the application discloses a face pose estimation method and apparatus, a storage medium and computer equipment, and relates to the field of image processing. The method comprises the following steps: normalizing an input picture to obtain a normalized image, wherein the input picture contains a human face; detecting the normalized image with a pre-trained simplified network model to obtain position information of a face prediction frame and position information of a plurality of face key points; and calculating the pose angles of the face according to the position information of the plurality of face key points. The convolutional neural network has a simple structure and can be deployed on embedded devices, so that face pose estimation with high precision, high real-time performance and low computation cost can be realized.

Description

Face pose estimation method and device, storage medium and computer equipment
Technical Field
The present application relates to the field of image processing, and in particular, to a method and an apparatus for estimating a face pose, a storage medium, and a computer device.
Background
With the rapid development of artificial intelligence technology, related industries and applications have spread into many fields, and a wide variety of intelligent products greatly facilitate people's lives. Face pose estimation, as an entry point and an important branch of the artificial intelligence field, is already used in many areas of production and daily life, such as posture detection in intelligent desk lamps, driver fatigue detection, and virtual reality.
Face pose estimation methods are generally divided into model-based methods, face-appearance-based methods and classification-based methods. In a model-based method, 2D feature points (such as the mouth corners, eye corners and nose) are extracted from the face region of a two-dimensional image and put into correspondence with the 3D feature points of a three-dimensional face model, and the pose of the face is estimated with geometric and similar methods; provided the feature points are detected accurately, this way of estimating the face pose is simple, efficient, fast and computationally light. An appearance-based method assumes a correspondence between the three-dimensional face pose and certain features of the face in the two-dimensional image (gray scale, image gradients and the like), and recovers this relation by constructing a mathematical model and training it on a large number of face images with known poses, thereby determining the face pose; however, establishing the actual correspondence requires a large number of training images for verification, interpolation is needed when processing the images, the amount of computation is large, and the final pose estimation result is poor.
Disclosure of Invention
The embodiment of the application provides a face pose estimation method, a face pose estimation device, a storage medium and computer equipment, and can solve the problems of large calculation amount, low accuracy and low real-time performance of face pose estimation in the prior art. The technical scheme is as follows:
in a first aspect, an embodiment of the present application provides a method for estimating a face pose, where the method includes:
carrying out normalization processing on an input picture to obtain a normalized image; wherein the input picture comprises a human face;
detecting the normalized image by using a pre-trained network model to obtain position information of a face prediction frame and position information of a plurality of face key points;
and calculating the attitude angle of the human face according to the position information of the plurality of human face key points.
In a second aspect, an embodiment of the present application provides a face pose estimation apparatus, including:
the normalization unit is used for normalizing the input picture to obtain a normalized image; wherein the input picture comprises a human face;
the detection unit is used for detecting the normalized image by utilizing a pre-trained network model to obtain the position information of a face prediction frame and the position information of a plurality of face key points;
and the calculation unit is used for calculating the attitude angle of the human face according to the position information of the plurality of human face key points.
In a third aspect, embodiments of the present application provide a computer storage medium storing a plurality of instructions adapted to be loaded by a processor and to perform the above-mentioned method steps.
In a fourth aspect, an embodiment of the present application provides a computer device, which may include: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the above-mentioned method steps.
The beneficial effects brought by the technical scheme provided by some embodiments of the application at least comprise:
the method comprises the steps of outputting position information of a face prediction frame and position information of face key points by using a pre-trained network model, and then calculating attitude angles of a face by using the position information of the face key points.
Drawings
In order to explain the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application, and that those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a face pose estimation method provided in an embodiment of the present application;
FIG. 2 is a schematic diagram of a face prediction box and an anchor box provided in an embodiment of the present application;
FIG. 3 is a schematic diagram of a world coordinate system provided by an embodiment of the present application;
fig. 4 is a schematic structural diagram of a human face pose estimation apparatus provided in the present application;
fig. 5 is a schematic structural diagram of a computer device provided in the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
It should be noted that, the face pose estimation method provided by the present application is generally executed by a computer device, and accordingly, the face pose estimation apparatus is generally disposed in the computer device.
Computer devices include, but are not limited to, smartphones, tablets, laptop computers, desktop computers and the like. The computer device may also be provided with a display device and a camera; the display device may be any device capable of realizing a display function, and the camera is used to collect video streams. For example, the display device may be a cathode ray tube (CRT) display, a light-emitting diode (LED) display, an electronic ink screen, a liquid crystal display (LCD), a plasma display panel (PDP) or the like. The user can use the display device on the computer device to view displayed information such as text, pictures and videos.
The following describes in detail the face pose estimation method provided by the embodiment of the present application with reference to fig. 1. The face pose estimation apparatus in the embodiment of the present application may be the computer device shown in fig. 5.
Referring to fig. 1, a schematic flow chart of a face pose estimation method provided in an embodiment of the present application is shown. As shown in fig. 1, the method of the embodiment of the present application may include the following steps:
and S101, carrying out normalization processing on the original image to obtain a normalized image.
The original image comprises one or more human faces. The image normalization means that the original image to be processed is converted into a corresponding unique standard form through a series of transformation. For example: the normalization process includes: and dividing the value of each pixel included in the original image by 255 to obtain a normalized image, wherein the value of each pixel included in the normalized image is between [0 and 1 ].
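By way of illustration, a minimal Python sketch of this normalization step, assuming an 8-bit image read with OpenCV (the function name normalize_image and the use of OpenCV are illustrative, not part of the disclosure):

```python
import cv2
import numpy as np

def normalize_image(path: str) -> np.ndarray:
    """Read an 8-bit image and scale every pixel value into [0, 1]."""
    img = cv2.imread(path)                 # uint8 array, values in [0, 255]
    if img is None:
        raise FileNotFoundError(path)
    return img.astype(np.float32) / 255.0  # float32 array, values in [0, 1]
```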
S102: detect the normalized image with a pre-trained network model to obtain the position information of a face prediction frame and the position information of a plurality of face key points.
The pre-trained network model is a convolutional neural network obtained by training with a machine learning or deep learning algorithm, for example a yolov3 (you only look once v3) network model; the following embodiments take the yolov3 network model as an example. The computer device is deployed with a pre-trained yolov3 network model. The yolov3 network model is trained with a data set comprising a training set, a validation set and a test set; the data set contains a plurality of face images, and the type of the data set is not limited in this application. The proportions of the training set, validation set and test set within the data set can be determined according to actual requirements and are likewise not limited. Each face image in the data set carries annotation information, which may be annotated manually by a user with an annotation tool or automatically by the computer device. The annotation information represents the position of the face and the positions of a plurality of face key points in each face image; for example, the true position of the face is marked with a rectangular frame in the face image (this rectangular frame is the face labeling frame), and the true positions of the key points on the face are marked with annotation points.
It should be noted that the yolov3 network model of the present application is a simplified model obtained by pruning the original yolov3 network model. The pruning process includes deleting layers, changing the number of channels, changing convolution kernel sizes and so on; the number of layers of the yolov3 network model is reduced to about 20 and the model size is about 9.4 MB, so that the model matches the computing power of embedded devices.
In one or more possible embodiments, the training process of the network model includes:
clustering the data set by using a clustering algorithm to obtain anchor frames with various scales;
determining a face detection loss function and a face key point detection loss function;
and training on the training set with the anchor frames of multiple scales, the face detection loss function and the face key point detection loss function to obtain the network model.
Specifically, the sizes of the face labeling frames in all images of the annotated data set are acquired, and these sizes are clustered with the K-means clustering algorithm to obtain anchor frames (anchor boxes) of multiple scales. The face detection loss function is used to calculate the difference between the face prediction frame and the face labeling frame, and the face key point detection loss function is used to calculate the difference between the true and predicted positions of the face key points. In the present application, the face detection loss function may be an L1 loss function or an L2 loss function, and the face key point detection loss function may be expressed as
loss(x) = w · ln(1 + |x| / a),   if |x| < w
loss(x) = |x| − C,               otherwise
where x is the difference between the true value and the predicted value, and w, a and C are constants whose values can be determined according to actual requirements; the application is not limited in this respect. The gradient of the logarithmic function in the loss function is 1/x, and the optimal step size is x², which balances error terms of different magnitudes, alleviates the outlier problem of face key points, and improves the accuracy of face key point detection. During training, the faces in a face image are searched with the anchor frames of each scale, and a plurality of candidate face prediction frames are obtained when the search is finished; finally, the confidence of each candidate face prediction frame is calculated with a non-maximum suppression algorithm, and the face prediction frame with the highest confidence is taken as the final face prediction frame.
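By way of illustration, a minimal NumPy sketch of a key point loss of this piecewise logarithmic form; the default values w = 10.0 and a = 2.0, and the choice C = w − w·ln(1 + w/a) that joins the two pieces continuously, are assumptions rather than values from the disclosure:

```python
import numpy as np

def keypoint_loss(pred: np.ndarray, target: np.ndarray,
                  w: float = 10.0, a: float = 2.0) -> float:
    """Piecewise loss: logarithmic near zero, linear (L1-like) for large errors."""
    x = np.abs(pred - target)
    # Assumed: C chosen so the two pieces join continuously at |x| = w.
    C = w - w * np.log(1.0 + w / a)
    loss = np.where(x < w, w * np.log(1.0 + x / a), x - C)
    return float(loss.mean())
```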
In the embodiment of the present application, the parameter values output by the network model include t_x, t_y, t_w and t_h, where t_x and t_y denote the coordinate offsets and t_w and t_h are used for scale scaling: t_x = G_x − c_x and t_y = G_y − c_y, where G_x and G_y are the 2D coordinates of the center point of the face labeling frame on the corresponding feature map (see FIG. 2), and c_x and c_y are the 2D coordinates of the upper left corner of the feature grid cell in which the center point is located.
Then, the position information of the face prediction frame is calculated according to the following formula:
b_x = σ(t_x) + c_x
b_y = σ(t_y) + c_y
b_w = p_w · e^(t_w)
b_h = p_h · e^(t_h)

wherein b_x and b_y are the 2D coordinates of the center point of the face prediction frame, b_w is the width of the face prediction frame, b_h is the height of the face prediction frame, σ(·) is a sigmoid function, p_w and p_h are the width and height of the corresponding anchor frame, and c_x and c_y are the 2D coordinates of the upper left corner of the grid cell on the feature map corresponding to the normalized image.
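A minimal sketch of this decoding step for a single grid cell; the stride that maps feature-map units back to input-image pixels is an assumption (the formulas above are stated at feature-map scale):

```python
import numpy as np

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + np.exp(-z))

def decode_box(tx, ty, tw, th, cx, cy, pw, ph, stride=32.0):
    """Decode network outputs into a face prediction frame (center, width, height)."""
    bx = (sigmoid(tx) + cx) * stride  # center x in input-image pixels
    by = (sigmoid(ty) + cy) * stride  # center y in input-image pixels
    bw = pw * np.exp(tw)              # anchor width scaled by e^(tw)
    bh = ph * np.exp(th)              # anchor height scaled by e^(th)
    return bx, by, bw, bh
```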
Further, the parameter values output by the network model of the application comprise the offsets between the 2D coordinates of the plurality of face key points and the center point of the face prediction frame; each offset comprises a horizontal 2D coordinate offset and a vertical 2D coordinate offset. The number of face key points is 5, and the offsets of the 5 face key points are (t_x1, t_y1), (t_x2, t_y2), (t_x3, t_y3), (t_x4, t_y4) and (t_x5, t_y5), representing the left eye key point offset, the right eye key point offset, the nose key point offset, the left mouth corner key point offset and the right mouth corner key point offset, respectively.
Then, the 2D coordinates of the 5 face key points are calculated according to the following formula:
Leye_x = σ(t_x1) + c_x    Leye_y = σ(t_y1) + c_y
Reye_x = σ(t_x2) + c_x    Reye_y = σ(t_y2) + c_y
Nose_x = σ(t_x3) + c_x    Nose_y = σ(t_y3) + c_y
Lmouth_x = σ(t_x4) + c_x    Lmouth_y = σ(t_y4) + c_y
Rmouth_x = σ(t_x5) + c_x    Rmouth_y = σ(t_y5) + c_y

wherein Leye_x and Leye_y represent the 2D coordinates of the left eye key point, Reye_x and Reye_y the 2D coordinates of the right eye key point, Nose_x and Nose_y the 2D coordinates of the nose key point, Lmouth_x and Lmouth_y the 2D coordinates of the left mouth corner key point, and Rmouth_x and Rmouth_y the 2D coordinates of the right mouth corner key point.
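The five key points can be decoded in the same way; a self-contained sketch under the same stride assumption, with an assumed ordering of the offset pairs:

```python
import numpy as np

def decode_keypoints(offsets, cx, cy, stride=32.0):
    """offsets: [(tx1, ty1), ..., (tx5, ty5)] for the left eye, right eye, nose,
    left mouth corner and right mouth corner, in that order."""
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    names = ["left_eye", "right_eye", "nose", "left_mouth", "right_mouth"]
    return {n: ((sig(tx) + cx) * stride, (sig(ty) + cy) * stride)
            for n, (tx, ty) in zip(names, offsets)}
```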
S103: calculate the pose angles of the human face according to the position information of the plurality of face key points.
After the 2D coordinates of the plurality of face key points are obtained, three pose angles of the face in a world coordinate system are estimated from them: the pitch angle (pitch), the yaw angle (yaw) and the roll angle (roll). For example, the pose angles of the face are calculated with the EPnP (Efficient Perspective-n-Point) algorithm; the world coordinate system is shown in FIG. 3.
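For reference, OpenCV ships an EPnP solver that can recover these three angles from the five 2D key points and a 3D face model; a sketch in which the 3D model point values and the camera intrinsics are placeholder assumptions, not values from the disclosure:

```python
import cv2
import numpy as np

# Generic 3D reference points (left eye, right eye, nose, left/right mouth
# corners) in model coordinates -- illustrative values only.
MODEL_3D = np.array([[-30.0,  35.0, -30.0], [ 30.0,  35.0, -30.0],
                     [  0.0,   0.0,   0.0], [-25.0, -35.0, -30.0],
                     [ 25.0, -35.0, -30.0]], dtype=np.float64)

def estimate_pose(keypoints_2d, fx, fy, u0, v0):
    """Return (pitch, yaw, roll) in degrees from five 2D key points (5, 2)."""
    K = np.array([[fx, 0, u0], [0, fy, v0], [0, 0, 1]], dtype=np.float64)
    ok, rvec, tvec = cv2.solvePnP(MODEL_3D, np.asarray(keypoints_2d, np.float64),
                                  K, None, flags=cv2.SOLVEPNP_EPNP)
    R, _ = cv2.Rodrigues(rvec)  # rotation vector -> 3x3 rotation matrix
    pitch = np.degrees(np.arctan2(R[2, 1], R[2, 2]))
    yaw = np.degrees(np.arctan2(-R[2, 0], np.hypot(R[2, 1], R[2, 2])))
    roll = np.degrees(np.arctan2(R[1, 0], R[0, 0]))
    return pitch, yaw, roll
```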
For example, the pose angles of the face in the world coordinate system are calculated from the 2D coordinates of the 5 face key points obtained in S102. In the following, coordinates in the world coordinate system and the camera coordinate system are denoted with superscripts w and c, respectively. The coordinates of the 5 3D reference points of the face in the world coordinate system are p_i^w, i = 1, ..., 5, and the corresponding 2D key points in the camera coordinate system are p_i^c, i = 1, ..., 5. The coordinates of the 4 control points in the world coordinate system are c_j^w, j = 1, ..., 4, and the coordinates of the 4 virtual control points in the camera coordinate system are c_j^c, j = 1, ..., 4. All of these are non-homogeneous coordinates.
Wherein, the process of estimating the attitude angle comprises:
(1) Select the centroid of the 5 3D reference points in a standard 3D face model, together with the unit vectors along the directions of its three principal axes, as the four control points c_j^w in the world coordinate system (FIG. 3 is a schematic diagram of the world coordinate system). The centroid is taken as the first control point, and the calculation formula is:

c_1^w = (1/n) · Σ_{i=1..n} p_i^w

where n denotes the number of reference points. The four control points in the world coordinate system are thus obtained.
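A sketch of this control point selection, taking the centroid plus unit vectors along the principal axes obtained from an SVD of the centered reference points; whether the axes should additionally be scaled, as some EPnP implementations do, is left open here:

```python
import numpy as np

def select_control_points(ref_pts_w: np.ndarray) -> np.ndarray:
    """ref_pts_w: (n, 3) reference points in world coordinates.
    Returns (4, 3): the centroid, then centroid + unit principal axes."""
    c1 = ref_pts_w.mean(axis=0)            # first control point: the centroid
    centered = ref_pts_w - c1
    # Rows of vt are the unit principal-axis directions of the point cloud.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return np.vstack([c1, c1 + vt[:3]])
```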
(2) Express the coordinates of the 5 3D reference points in the world coordinate system as a weighted sum of the control points. Because this representation is invariant under Euclidean transformation, the same weighting parameters also apply in the camera coordinate system to the 2D key points calculated in step S102:

p_i^w = Σ_{j=1..4} α_ij · c_j^w,   p_i^c = Σ_{j=1..4} α_ij · c_j^c,   Σ_{j=1..4} α_ij = 1

where α_ij are the weighting parameters of the four control points.

(3) The camera coordinates and the control point coordinates in the world coordinate system can therefore be linked through the camera intrinsic parameters and the weighting parameters, and two linear equations are obtained for each 3D reference point:

Σ_{j=1..4} (α_ij · f_u · x_j^c + α_ij · (u_c − u_i) · z_j^c) = 0
Σ_{j=1..4} (α_ij · f_v · y_j^c + α_ij · (v_c − v_i) · z_j^c) = 0
wherein the virtual control point coordinates are c_j^c = [x_j^c, y_j^c, z_j^c]^T, j = 1, ..., 4; [u_i, v_i, 1]^T is the projection of the i-th point on the pixel plane; and (f_u, f_v) and (u_c, v_c) are the focal lengths and the optical center of the camera, obtained by camera calibration. Concatenating the equations of the 5 face key points and collecting the corresponding coefficients into a matrix M yields the equation Mx = 0, where

x = [(c_1^c)^T, (c_2^c)^T, (c_3^c)^T, (c_4^c)^T]^T

The virtual control point coordinates of the control points in the camera coordinate system can be obtained by solving this equation.
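In the simplest case, x can be taken as the right singular vector of M associated with its smallest singular value; a sketch (the full EPnP algorithm also handles null spaces of dimension greater than one and fixes the overall scale from the inter-control-point distances, which is omitted here):

```python
import numpy as np

def solve_control_points(M: np.ndarray) -> np.ndarray:
    """Solve Mx = 0 for the stacked camera-frame control points (up to scale)."""
    _, _, vt = np.linalg.svd(M)
    x = vt[-1]                   # kernel vector for the smallest singular value
    ctrl_c = x.reshape(4, 3)     # four virtual control points in the camera frame
    if ctrl_c[:, 2].mean() < 0:  # keep the points in front of the camera
        ctrl_c = -ctrl_c
    return ctrl_c
```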
(4) In the above calculation steps, the control points computed from the reference points do not change, so the virtual control points computed from the reference points are taken as the initial control points, while the virtual control point coordinates computed from the 2D coordinates change constantly; the pose of the human face can therefore be obtained by computing only the rotation and translation of the virtual control points relative to the initial virtual control points. The three-dimensional-to-two-dimensional PnP problem is thus converted into the classical three-dimensional-to-three-dimensional rigid motion problem, which is solved with the ICP algorithm as follows:
the calculation center:
Figure BDA0003502340610000056
removing the center:
Figure BDA0003502340610000057
the objective function to be minimized is:
Figure BDA0003502340610000058
defining a matrix H:
Figure BDA0003502340610000059
SVD decomposition of H can obtain: h ═ U ∑ VTU and V unitary matrices, Σ is a semi-positive diagonal matrix.
And (3) combining the coordinate of the removal center to mathematically deform the target function and then substituting H into the target function to obtain: r ═ VUT,
Figure BDA00035023406100000510
Wherein, the value of N is 5,
Figure BDA00035023406100000511
respectively the center coordinates of the 2D keypoint and the 3D reference point coordinates,
Figure BDA00035023406100000512
and respectively removing the coordinates of the central point for the 2D key point and the coordinates of the central point for the 3D key point, wherein R and t are rotation and translation matrixes in the motion process of the human face.
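A sketch of this SVD-based closed-form alignment, with the usual determinant check added so that a reflection is not returned as R:

```python
import numpy as np

def rigid_transform(pts_w: np.ndarray, pts_c: np.ndarray):
    """Find R, t minimizing sum ||p_c - (R p_w + t)||^2 for (N, 3) point sets."""
    cw, cc = pts_w.mean(axis=0), pts_c.mean(axis=0)  # compute the centers
    qw, qc = pts_w - cw, pts_c - cc                  # remove the centers
    H = qw.T @ qc                                    # H = sum over i of q_w q_c^T
    U, _, Vt = np.linalg.svd(H)
    R = Vt.T @ U.T                                   # R = V U^T
    if np.linalg.det(R) < 0:                         # avoid returning a reflection
        Vt[-1] *= -1.0
        R = Vt.T @ U.T
    t = cc - R @ cw
    return R, t
```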
According to the embodiment of the application, the position information of the face prediction frame and the position information of the face key points are output with a pre-trained network model, and the pose angles of the face are then calculated from the position information of the face key points. Face detection and face key point detection are both completed by the yolov3 network; the network model has a simple structure and can be deployed on embedded devices to realize high-precision, high-real-time face pose estimation.
The following are embodiments of the apparatus of the present application that may be used to perform embodiments of the method of the present application. For details which are not disclosed in the embodiments of the apparatus of the present application, reference is made to the embodiments of the method of the present application.
Referring to fig. 4, a schematic structural diagram of a face pose estimation apparatus provided in an exemplary embodiment of the present application is shown, hereinafter referred to as the apparatus 4. The apparatus 4 may be implemented as all or part of a computer device in software, hardware or a combination of both. The apparatus 4 comprises: a normalization unit 401, a detection unit 402 and a calculation unit 403.
The normalization unit is used for normalizing the input picture to obtain a normalized image; wherein the input picture comprises a human face;
the detection unit is used for detecting the normalized image by utilizing a pre-trained network model to obtain the position information of a face prediction frame and the position information of a plurality of face key points;
and the calculation unit is used for calculating the attitude angle of the human face according to the position information of the plurality of human face key points.
In one or more possible embodiments, the plurality of face key points are: a left eye key point, a right eye key point, a nose key point, a left mouth corner key point and a right mouth corner key point; the parameter values output by the network model comprise: a left eye key point offset (t_x1, t_y1), a right eye key point offset (t_x2, t_y2), a nose key point offset (t_x3, t_y3), a left mouth corner key point offset (t_x4, t_y4) and a right mouth corner key point offset (t_x5, t_y5).

The 2D coordinates of each key point are calculated according to the following formulas:

Leye_x = σ(t_x1) + c_x    Leye_y = σ(t_y1) + c_y
Reye_x = σ(t_x2) + c_x    Reye_y = σ(t_y2) + c_y
Nose_x = σ(t_x3) + c_x    Nose_y = σ(t_y3) + c_y
Lmouth_x = σ(t_x4) + c_x    Lmouth_y = σ(t_y4) + c_y
Rmouth_x = σ(t_x5) + c_x    Rmouth_y = σ(t_y5) + c_y

wherein Leye_x and Leye_y represent the 2D coordinates of the left eye key point, Reye_x and Reye_y the 2D coordinates of the right eye key point, Nose_x and Nose_y the 2D coordinates of the nose key point, Lmouth_x and Lmouth_y the 2D coordinates of the left mouth corner key point, and Rmouth_x and Rmouth_y the 2D coordinates of the right mouth corner key point; c_x and c_y are the 2D coordinates of the upper left corner of the feature grid cell in which the center point of the face prediction frame is located; and σ(·) is a sigmoid function.
In one or more possible embodiments, the apparatus further comprises:
a training unit, configured to cluster the data set with a clustering algorithm to obtain anchor frames of multiple scales;
determine a face detection loss function and a face key point detection loss function;
and train on the training set with the anchor frames of multiple scales, the face detection loss function and the face key point detection loss function to obtain the network model.
In one or more possible embodiments, the face key point loss function is:

loss(x) = w · ln(1 + |x| / a),   if |x| < w
loss(x) = |x| − C,               otherwise

wherein w, a and C are constants.
In one or more possible embodiments, the normalization process includes:
the value of each pixel included in the original image is divided by 255.
It should be noted that when the apparatus 4 provided in the foregoing embodiment executes the face pose estimation method, the division into the above functional modules is only an example; in practical applications, the above functions may be distributed among different functional modules as needed, that is, the internal structure of the apparatus may be divided into different functional modules to complete all or part of the functions described above. In addition, the face pose estimation apparatus and the face pose estimation method provided by the above embodiments belong to the same concept; for the detailed implementation process, refer to the method embodiments, which are not repeated here.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
An embodiment of the present application further provides a computer storage medium, where multiple instructions may be stored in the computer storage medium, where the instructions are suitable for being loaded by a processor and for executing the method steps in the embodiment shown in fig. 1, and a specific execution process may refer to a specific description of the embodiment shown in fig. 1, which is not described herein again.
The present application further provides a computer program product storing at least one instruction, which is loaded and executed by the processor to implement the method for estimating a face pose as described in the above embodiments.
Referring to fig. 5, a schematic structural diagram of a computer device is provided in an embodiment of the present application. As shown in fig. 5, the computer device 500 may include: at least one processor 501, at least one network interface 504, a user interface 503, memory 505, at least one communication bus 502.
Wherein a communication bus 502 is used to enable connective communication between these components.
The user interface 503 may include a display screen (Display) and a camera (Camera); optionally, the user interface 503 may also include a standard wired interface and a wireless interface.
The network interface 504 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), among others.
Processor 501 may include one or more processing cores. The processor 501 connects various components throughout the computer device 500 using various interfaces and lines, and performs the various functions of the computer device 500 and processes data by running or executing the instructions, programs, code sets or instruction sets stored in the memory 505 and by invoking the data stored in the memory 505. Optionally, the processor 501 may be implemented in at least one hardware form among digital signal processing (DSP), field-programmable gate array (FPGA) and programmable logic array (PLA). The processor 501 may integrate one or a combination of a central processing unit (CPU), a graphics processing unit (GPU), a modem and the like. The CPU mainly handles the operating system, the user interface, application programs and so on; the GPU is responsible for rendering and drawing the content to be displayed on the display screen; and the modem handles wireless communication. It is understood that the modem may also not be integrated into the processor 501 and instead be implemented by a single chip.
The memory 505 may include a random access memory (RAM) or a read-only memory (ROM). Optionally, the memory 505 includes a non-transitory computer-readable medium. The memory 505 may be used to store instructions, programs, code sets or instruction sets. The memory 505 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function or an image playing function), instructions for implementing the various method embodiments described above, and the like; the data storage area may store the data referred to in the above method embodiments. The memory 505 may optionally also be at least one storage device located remotely from the processor 501. As shown in fig. 5, the memory 505, as a computer storage medium, may include an operating system, a network communication module, a user interface module and an application program.
In the computer device 500 shown in fig. 5, the user interface 503 is mainly used as an interface for providing input for a user, and acquiring data input by the user; the processor 501 may be configured to call the application program stored in the memory 505 and specifically execute the method shown in fig. 1, and the specific process may refer to fig. 1 and is not described herein again.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a read-only memory or a random access memory.
The above disclosure only describes preferred embodiments of the present application and is not intended to limit the scope of the present application; the present application is therefore not limited thereto, and equivalent variations and modifications remain within the scope of the present application.

Claims (10)

1. A face pose estimation method is characterized by comprising the following steps:
carrying out normalization processing on an input picture to obtain a normalized image; wherein the input picture comprises a human face;
detecting the normalized image by using a pre-trained network model to obtain position information of a face prediction frame and position information of a plurality of face key points;
and calculating the attitude angle of the face according to the position information of the plurality of face key points.
2. The method of claim 1, wherein the plurality of face key points are: a left eye key point, a right eye key point, a nose key point, a left mouth corner key point and a right mouth corner key point; the parameter values output by the network model comprise: a left eye key point offset (t_x1, t_y1), a right eye key point offset (t_x2, t_y2), a nose key point offset (t_x3, t_y3), a left mouth corner key point offset (t_x4, t_y4) and a right mouth corner key point offset (t_x5, t_y5);
the 2D coordinates of each key point are calculated according to the following formulas:

Leye_x = σ(t_x1) + c_x    Leye_y = σ(t_y1) + c_y
Reye_x = σ(t_x2) + c_x    Reye_y = σ(t_y2) + c_y
Nose_x = σ(t_x3) + c_x    Nose_y = σ(t_y3) + c_y
Lmouth_x = σ(t_x4) + c_x    Lmouth_y = σ(t_y4) + c_y
Rmouth_x = σ(t_x5) + c_x    Rmouth_y = σ(t_y5) + c_y

wherein Leye_x and Leye_y represent the 2D coordinates of the left eye key point, Reye_x and Reye_y the 2D coordinates of the right eye key point, Nose_x and Nose_y the 2D coordinates of the nose key point, Lmouth_x and Lmouth_y the 2D coordinates of the left mouth corner key point, and Rmouth_x and Rmouth_y the 2D coordinates of the right mouth corner key point; c_x and c_y are the 2D coordinates of the upper left corner of the feature grid cell in which the center point of the face prediction frame is located; and σ(·) is a sigmoid function.
3. The method according to claim 1 or 2, wherein before normalizing the input picture to obtain the normalized image, the method further comprises:
clustering the data set by using a clustering algorithm to obtain anchor frames with various scales;
determining a face detection loss function and a face key point detection loss function;
and training on the training set with the anchor frames of multiple scales, the face detection loss function and the face key point detection loss function to obtain the network model.
4. The method of claim 3, wherein the face key point loss function is:

loss(x) = w · ln(1 + |x| / a),   if |x| < w
loss(x) = |x| − C,               otherwise

wherein w, a and C are constants.
5. The method according to claim 1, 2 or 4, wherein the normalization process comprises:
the value of each pixel included in the original image is divided by 255.
6. A face pose estimation apparatus, comprising:
the normalization unit is used for normalizing the input picture to obtain a normalized image; wherein the input picture comprises a human face;
the detection unit is used for detecting the normalized image by utilizing a pre-trained network model to obtain the position information of a face prediction frame and the position information of a plurality of face key points;
and the calculation unit is used for calculating the attitude angle of the human face according to the position information of the plurality of human face key points.
7. The apparatus of claim 6, wherein the plurality of face key points are: a left eye key point, a right eye key point, a nose key point, a left mouth corner key point and a right mouth corner key point; the parameter values output by the network model comprise: a left eye key point offset (t_x1, t_y1), a right eye key point offset (t_x2, t_y2), a nose key point offset (t_x3, t_y3), a left mouth corner key point offset (t_x4, t_y4) and a right mouth corner key point offset (t_x5, t_y5);
the 2D coordinates of each key point are calculated according to the following formulas:

Leye_x = σ(t_x1) + c_x    Leye_y = σ(t_y1) + c_y
Reye_x = σ(t_x2) + c_x    Reye_y = σ(t_y2) + c_y
Nose_x = σ(t_x3) + c_x    Nose_y = σ(t_y3) + c_y
Lmouth_x = σ(t_x4) + c_x    Lmouth_y = σ(t_y4) + c_y
Rmouth_x = σ(t_x5) + c_x    Rmouth_y = σ(t_y5) + c_y

wherein Leye_x and Leye_y represent the 2D coordinates of the left eye key point, Reye_x and Reye_y the 2D coordinates of the right eye key point, Nose_x and Nose_y the 2D coordinates of the nose key point, Lmouth_x and Lmouth_y the 2D coordinates of the left mouth corner key point, and Rmouth_x and Rmouth_y the 2D coordinates of the right mouth corner key point; c_x and c_y are the 2D coordinates of the upper left corner of the feature grid cell in which the center point of the face prediction frame is located; and σ(·) is a sigmoid function.
8. The apparatus of claim 6 or 7, further comprising:
the training unit is used for clustering the data set by utilizing a clustering algorithm to obtain anchor frames with various scales;
determining a face detection loss function and a face key point detection loss function;
and training on the training set with the anchor frames of multiple scales, the face detection loss function and the face key point detection loss function to obtain the network model.
9. A computer storage medium, characterized in that it stores a plurality of instructions adapted to be loaded by a processor and to perform the method steps according to any one of claims 1 to 5.
10. A computer device, comprising: a processor and a memory; wherein the memory stores a computer program adapted to be loaded by the processor and to perform the method steps of any of claims 1 to 5.
CN202210130312.9A 2022-02-11 2022-02-11 Face pose estimation method and device, storage medium and computer equipment Pending CN114581973A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210130312.9A CN114581973A (en) 2022-02-11 2022-02-11 Face pose estimation method and device, storage medium and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210130312.9A CN114581973A (en) 2022-02-11 2022-02-11 Face pose estimation method and device, storage medium and computer equipment

Publications (1)

Publication Number Publication Date
CN114581973A 2022-06-03

Family

ID=81770208

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210130312.9A Pending CN114581973A (en) 2022-02-11 2022-02-11 Face pose estimation method and device, storage medium and computer equipment

Country Status (1)

Country Link
CN (1) CN114581973A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination