CN111695438B - Head pose estimation method and device - Google Patents


Info

Publication number
CN111695438B
CN111695438B (application CN202010431119.XA; also published as CN111695438A)
Authority
CN
China
Prior art keywords
dimensional
head
angle
dimension
posture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010431119.XA
Other languages
Chinese (zh)
Other versions
CN111695438A (en)
Inventor
户磊
石芳
刘其开
朱海涛
陈智超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei Dilusense Technology Co Ltd
Original Assignee
Hefei Dilusense Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei Dilusense Technology Co Ltd
Priority to CN202010431119.XA
Publication of CN111695438A
Application granted
Publication of CN111695438B
Legal status: Active


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161: Detection; Localisation; Normalisation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00: Scenes; Scene-specific elements
    • G06V20/60: Type of objects
    • G06V20/64: Three-dimensional objects
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172: Classification, e.g. identification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20: Movements or behaviour, e.g. gesture recognition
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T: CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00: Road transport of goods or passengers
    • Y02T10/10: Internal combustion engine [ICE] based vehicles
    • Y02T10/40: Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Social Psychology (AREA)
  • Psychiatry (AREA)
  • Image Analysis (AREA)
  • Length Measuring Devices By Optical Means (AREA)

Abstract

An embodiment of the invention provides a head pose estimation method and device. The head pose estimation method comprises: acquiring depth image data; and inputting the depth image data into a head pose estimation model to obtain a head pose estimation result output by the head pose estimation model. The head pose estimation model is trained with depth image sample data as samples, and with pre-annotated or interval-partitioned comprehensive-dimensional large/small-angle binary classification labels, single-dimensional pose multi-classification labels, and single-dimensional regression labels in one-to-one correspondence with the sample data as sample labels. The comprehensive-dimensional large/small-angle binary classification label characterizes the spatial features of the head pose across multiple dimensions; the single-dimensional pose multi-classification label and the single-dimensional regression label both characterize the spatial features of the head pose in a single dimension. The head pose estimation method of the embodiment of the invention has higher accuracy and generalization capability, so the head pose estimation result is more robust.

Description

Head pose estimation method and device
Technical Field
The present invention relates to the field of image recognition technologies, and in particular, to a head pose estimation method and apparatus.
Background
With the development of computer vision technology, head pose estimation is required in many scenarios, such as face recognition, attention detection, human-computer interaction, and behavior analysis. Head pose estimation is the technique of estimating the rotation angles of the head in three-dimensional space from a face image, using methods such as computer vision and machine learning.
Head pose estimation methods in the prior art fall roughly into three categories:
(1) Template-matching-based methods, which comprise two-dimensional-image and three-dimensional-modeling approaches. The two-dimensional-image approach compares the input image one by one with the images in a template library (each sample carrying a pose label), so that the most similar view and its corresponding pose angle are obtained by matching. The three-dimensional-modeling approach reconstructs a three-dimensional face model of a person from one or more two-dimensional, single three-dimensional, or multi-modal face images, matches the model against a three-dimensional model of a standard face pose so that it coincides with the standard model after rotation correction, and computes the rotation matrix parameters to obtain the corresponding pose angle. The drawbacks of these methods are high computational complexity, long running time, and strong sensitivity of the matching process to face detection and image quality.
(2) Model-based methods, which construct a face structure with a geometric model or build a face model from facial key points, compute the mapping between face image features and the geometric or face model, and finally estimate the head pose. Their drawback is that they are easily affected by key-point detection accuracy, face image quality, and the scene environment, and they generalize poorly.
(3) Manifold-embedding methods, which map the high-dimensional spatial features of images to a low-dimensional space to model the continuous variation of head pose, and then perform template matching in the embedded space. Because this kind of dimensionality reduction is unsupervised, it is difficult to guarantee a high correlation between the low-dimensional principal-component features and the pose features.
Disclosure of Invention
Embodiments of the present invention provide a head pose estimation method and apparatus that overcome, or at least partially solve, the above problems.
In a first aspect, an embodiment of the present invention provides a head pose estimation method, comprising: acquiring depth image data; and inputting the depth image data into a head pose estimation model to obtain a head pose estimation result output by the head pose estimation model. The head pose estimation model is trained with depth image sample data as samples, and with predetermined comprehensive-dimensional large/small-angle binary classification labels, single-dimensional pose multi-classification labels, and single-dimensional regression labels corresponding to the depth image sample data as sample labels. The comprehensive-dimensional binary classification label characterizes the spatial features of the head pose across multiple dimensions; the single-dimensional pose multi-classification label and the single-dimensional regression label both characterize the spatial features of the head pose in a single dimension.
In some embodiments, the head pose estimation model comprises a feature extraction layer, a single-dimensional pose layer, and a comprehensive-dimensional pose layer. Inputting the depth image data into the head pose estimation model to obtain the head pose estimation result output by the head pose estimation model comprises: inputting the depth image data into the feature extraction layer to obtain a single-dimensional pose feature map and a multi-dimensional pose feature map; inputting the single-dimensional pose feature map into the single-dimensional pose layer to obtain single-dimensional pose multi-classification information and single-dimensional regression information; inputting the multi-dimensional pose feature map into the comprehensive-dimensional pose layer to obtain comprehensive-dimensional large/small-angle binary classification information; and determining the head pose estimation result based on the single-dimensional pose multi-classification information, the single-dimensional regression information, and the comprehensive-dimensional binary classification information. The determination process of the head pose estimation model comprises: training the single-dimensional pose layer with single-dimensional pose sample feature maps as samples and predetermined corresponding single-dimensional pose multi-classification labels and single-dimensional regression labels as sample labels; and training the comprehensive-dimensional pose layer with multi-dimensional pose sample feature maps as samples and predetermined corresponding comprehensive-dimensional large/small-angle binary classification labels as sample labels.
In some embodiments, the acquiring depth image data comprises: collecting an original depth image; and performing face detection on the original depth image, removing redundant background areas, and determining the depth image data.
In some embodiments, the determination process of the head pose estimation model further comprises: acquiring a three-dimensional pose angle label corresponding to each item of depth image sample data, the three-dimensional pose angle label comprising a Yaw angle, a Pitch angle, and a Roll angle; determining the comprehensive-dimensional large/small-angle binary classification label based on comparison of the three-dimensional pose angle label with a pose-angle threshold set for each dimension; determining the single-dimensional pose multi-classification labels based on interval partitioning of the three-dimensional pose angle label; and determining the single-dimensional regression labels based on normalization of the three-dimensional pose angle label.
In some embodiments, the head pose estimation model is trained using a total loss function that is determined based on a single-dimensional pose loss function of a single-dimensional pose layer of the head pose estimation model and a comprehensive dimensional pose loss function of a comprehensive dimensional pose layer of the head pose estimation model.
In some embodiments, determining the total loss function based on the single-dimensional pose loss functions of the single-dimensional pose layer of the head pose estimation model and the comprehensive-dimensional pose loss function of the comprehensive-dimensional pose layer of the head pose estimation model comprises applying the formula

L_total = L_yaw_total + L_pitch_total + L_roll_total + α·L_cls

to determine the total loss function, where L_total is the total loss function; L_yaw_total, L_pitch_total, and L_roll_total are the single-dimensional pose loss functions for the Yaw, Pitch, and Roll angles, respectively; L_cls is the comprehensive-dimensional pose loss function; and α is the weight of L_cls.
In some embodiments, the determination process of the head pose estimation model further comprises: extracting a validation set from the depth image sample data according to a preset sampling strategy; and dynamically adjusting the learning rate with an RMSProp optimizer, using the validation set to verify the generalization and accuracy of the head pose estimation model, thereby determining the head pose estimation model.
In a second aspect, an embodiment of the present invention provides a head pose estimation apparatus, comprising: an acquisition unit configured to acquire depth image data; and a processing unit configured to input the depth image data into a head pose estimation model and obtain a head pose estimation result output by the head pose estimation model. The head pose estimation model is trained with depth image sample data as samples, and with predetermined comprehensive-dimensional large/small-angle binary classification labels, single-dimensional pose multi-classification labels, and single-dimensional regression labels corresponding to the depth image sample data as sample labels. The comprehensive-dimensional binary classification label characterizes the spatial features of the head pose across multiple dimensions; the single-dimensional pose multi-classification label and the single-dimensional regression label both characterize the spatial features of the head pose in a single dimension.
In a third aspect, an embodiment of the present invention provides an electronic device, including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the head pose estimation method provided by any possible implementation of the first aspect when the program is executed.
In a fourth aspect, embodiments of the present invention provide a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the head pose estimation method provided by any possible implementation of the first aspect.
With the head pose estimation method, head pose estimation apparatus, electronic device, and non-transitory computer-readable storage medium of the embodiments of the present invention, a head pose estimation model is trained with a large amount of depth image sample data and the corresponding comprehensive-dimensional large/small-angle binary classification labels, single-dimensional pose multi-classification labels, and single-dimensional regression labels as sample labels, and the trained model is then used to obtain head pose estimation results. The mapping between face image features and head pose Euler angles is thereby fitted more accurately, so the method has higher accuracy and generalization capability and the head pose estimation result is more robust.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required in the embodiments or the description of the prior art will be briefly described, and it is obvious that the drawings in the following description are some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a head pose estimation method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a head pose estimation model according to an embodiment of the present invention;
FIG. 3 is a flow chart of a head pose estimation method according to an embodiment of the present invention;
FIG. 4 is a flow chart of a head pose estimation method according to an embodiment of the present invention;
FIG. 5 is a schematic view of a head pose estimation device according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present invention more apparent, the technical solutions of the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is apparent that the described embodiments are some embodiments of the present invention, but not all embodiments of the present invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
A head pose estimation method according to an embodiment of the present invention is described below with reference to fig. 1 to 4.
As shown in fig. 1, the head pose estimation method according to the embodiment of the present invention includes the following steps S100 to S200.
Step S100, obtaining depth image data.
A depth image, also called a range image, is an image whose pixel values are the distances (depths) from the image collector to the points in the scene; it directly reflects the geometry of the scene's visible surfaces. A depth image can be converted into point cloud data through coordinate transformation, and point cloud data that is regular and carries the necessary information can conversely be back-calculated into depth image data.
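To make this depth-image/point-cloud relationship concrete, the following is a minimal back-projection sketch assuming a pinhole camera model; the patent does not specify the camera, so the intrinsics fx, fy, cx, cy are illustrative assumptions.

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth map into an N x 3 point cloud.

    A minimal sketch assuming a pinhole camera; fx, fy, cx, cy are
    assumed intrinsics, not values from the patent.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx          # X = (u - cx) * Z / fx
    y = (v - cy) * depth / fy          # Y = (v - cy) * Z / fy
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]    # keep only pixels with a depth reading
```

The inverse direction (point cloud back to a depth image) follows by projecting each point with the same intrinsics and writing its Z value into the corresponding pixel.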
The depth image data in the embodiments of the present invention are human-head depth images used mainly for head pose estimation. A depth camera can be used to collect depth images with rich head poses (covering as many angles of each dimension as possible) in various scenes (unconstrained by factors such as distance, illumination, occlusion, blur, and accessories).
Step S200, inputting the depth image data into the head pose estimation model to obtain the head pose estimation result output by the head pose estimation model.
It will be appreciated that the depth image data may be processed using a head pose estimation model to obtain a corresponding head pose estimation result.
The head pose estimation model is trained with depth image sample data as samples, and with pre-annotated or interval-partitioned comprehensive-dimensional large/small-angle binary classification labels, single-dimensional pose multi-classification labels, and single-dimensional regression labels in one-to-one correspondence with the sample data as sample labels. The comprehensive-dimensional binary classification label characterizes the spatial features of the head pose across multiple dimensions; the single-dimensional pose multi-classification label and the single-dimensional regression label both characterize the spatial features of the head pose in a single dimension.
It can be understood that the head pose estimation model is trained on a large amount of depth image sample data together with the corresponding comprehensive-dimensional large/small-angle binary classification labels, single-dimensional pose multi-classification labels, and single-dimensional regression labels as sample labels.
It should be noted that head pose features are described in terms of three angles. The comprehensive-dimensional large/small-angle binary classification label describes the overall large/small-angle character of the head pose across the three angles; the single-dimensional pose multi-classification label describes the head pose features of each angle through multi-class partitioning; and the single-dimensional regression label describes the head pose features of each angle through regression within an interval.
By training the head pose estimation model on a large amount of depth image sample data with the corresponding comprehensive-dimensional large/small-angle binary classification labels, single-dimensional pose multi-classification labels, and single-dimensional regression labels as sample labels, and then using the model to obtain head pose estimation results, the embodiments of the present invention fit the mapping between face image features and head pose Euler angles more accurately, achieve higher accuracy and generalization capability, and make the head pose estimation result more robust.
As shown in fig. 2, in some embodiments, the head pose estimation model includes a feature extraction layer, a single-dimensional pose layer, and a comprehensive dimensional pose layer.
As shown in fig. 3, inputting the depth image data into the head pose estimation model to obtain the head pose estimation result output by the head pose estimation model includes the following steps S210 to S240.
Step S210, inputting the depth image data into the feature extraction layer to obtain a single-dimensional pose feature map and a multi-dimensional pose feature map.
Step S220, inputting the single-dimensional pose feature map into the single-dimensional pose layer to obtain single-dimensional pose multi-classification information and single-dimensional regression information.
Step S230, inputting the multi-dimensional pose feature map into the comprehensive-dimensional pose layer to obtain comprehensive-dimensional large/small-angle binary classification information.
Step S240, determining the head pose estimation result based on the single-dimensional pose multi-classification information, the single-dimensional regression information, and the comprehensive-dimensional binary classification information.
The determination process of the head pose estimation model includes the following processes.
The single-dimensional pose layer is trained with single-dimensional pose sample feature maps as samples, and with predetermined corresponding single-dimensional pose multi-classification labels and single-dimensional regression labels as sample labels.
The comprehensive-dimensional pose layer is trained with multi-dimensional pose sample feature maps as samples, and with predetermined corresponding comprehensive-dimensional large/small-angle binary classification labels as sample labels.
It is understood that the head pose estimation model may include a feature extraction layer, a single-dimensional pose layer, and a comprehensive-dimensional pose layer. The feature extraction layer mainly extracts feature maps based on ShuffleNet_Pose (a lightweight network for pose estimation) with shared weights. The single-dimensional pose layer mainly uses the regressed angle of each single-dimensional pose to assist the multi-class training of the corresponding dimension. The comprehensive-dimensional pose layer is designed mainly for the problem that extreme-angle samples are scarce in the single-dimensional pose training data, which makes extreme-angle estimation non-robust; its purpose is to supervise and constrain the extreme-angle estimates of the single-dimensional pose layer.
The embodiments of the present invention apply network pruning and a more streamlined parameter design to ShuffleNet (a lightweight network); the resulting head pose estimation model runs fast and is highly accurate, meeting the real-time and accuracy requirements of face recognition scenarios.
By dividing the head pose estimation model into a feature extraction layer, a single-dimensional pose layer, and a comprehensive-dimensional pose layer, the embodiments of the present invention process the depth image data from multiple dimensions and represent head pose features from those dimensions, making the head pose estimation results output by the model more accurate.
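As an illustration of this three-part layout, the following PyTorch sketch wires a stand-in backbone to three single-dimensional classification heads and one binary head. The internals of the ShuffleNet_Pose backbone are not disclosed in the patent, so a small convolutional stack stands in for it, and the layer sizes are assumptions; the bin counts 60/40/40 come from the label section later in the description.

```python
import torch
import torch.nn as nn

class HeadPoseNet(nn.Module):
    """Sketch: shared feature extraction layer, a single-dimensional pose
    layer (multi-class logits per angle; the regressed angle is decoded
    from their softmax, see the loss section), and a
    comprehensive-dimensional large/small-angle binary layer."""

    def __init__(self, yaw_bins=60, pitch_bins=40, roll_bins=40):
        super().__init__()
        self.backbone = nn.Sequential(           # stand-in for ShuffleNet_Pose
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.yaw_head = nn.Linear(64, yaw_bins)   # single-dimensional pose layer
        self.pitch_head = nn.Linear(64, pitch_bins)
        self.roll_head = nn.Linear(64, roll_bins)
        self.cls_head = nn.Linear(64, 1)          # comprehensive-dimensional layer

    def forward(self, depth):                     # depth: (B, 1, H, W)
        feat = self.backbone(depth)
        return (self.yaw_head(feat), self.pitch_head(feat),
                self.roll_head(feat), self.cls_head(feat))
```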
As shown in fig. 4, in some embodiments, acquiring depth image data includes the following steps S110-S120.
Step S110, acquiring an original depth image.
Step S120, face detection is carried out on the original depth image, redundant background areas are removed, and depth image data are determined.
It can be understood that face detection is performed on the original depth image, and the face box is enlarged to a suitable size according to the length of the longest edge of the detection box, ensuring that the whole face lies within the cropping range while no redundant background region is included; the depth image data are thereby determined. Their data format may be a binary depth point cloud of uniform size storing the depth information within the cropped region.
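A minimal sketch of this cropping step follows; the face detector itself is out of scope, so the detection box (x, y, w, h) is assumed given, and the square expansion with a margin is one plausible reading of "enlarged according to the longest edge".

```python
import numpy as np

def crop_head(depth, box, margin=0.2):
    """Expand the detected face box to a square based on its longest edge
    (plus an assumed margin) and crop the depth image; resampling to a
    fixed resolution is omitted here."""
    x, y, w, h = box
    side = int(max(w, h) * (1 + margin))      # longest edge plus margin
    cx, cy = x + w // 2, y + h // 2           # box center
    x0 = max(cx - side // 2, 0)
    y0 = max(cy - side // 2, 0)
    return depth[y0:y0 + side, x0:x0 + side]  # whole face, no extra background
```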
By preprocessing the original depth image in this way, the embodiments of the present invention remove the interference of irrelevant data, making the depth image data more accurate.
In some embodiments, the process of determining the head pose estimation model further includes the following.
A three-dimensional pose angle label corresponding to each item of depth image sample data is acquired; the three-dimensional pose angle label comprises a Yaw angle, a Pitch angle, and a Roll angle.
It will be appreciated that the three-dimensional pose angle label characterizes the depth image data at three angles: Yaw (left-right rotation), Pitch (up-down rotation), and Roll (in-plane tilt).
It can be understood that, in actual training of the head pose estimation model of the embodiments of the present invention, the comprehensive-dimensional large/small-angle binary classification label, the single-dimensional pose multi-classification labels, and the single-dimensional regression labels must be constructed simultaneously. Each angle (Yaw, Pitch, and Roll) corresponds to one multi-classification label and one regression label, giving a 7-dimensional label vector per sample in total.
The comprehensive-dimensional large/small-angle binary classification label is determined based on comparison of the three-dimensional pose angle label with the pose-angle threshold set for each dimension.
It can be understood that the comprehensive-dimensional large/small-angle binary classification label is determined by set thresholds. Because the spatial features of poses near the threshold angles are hard to distinguish, a buffer interval is set around the critical angles. The specific thresholds are set as follows:
(1) Large angle: abs(yaw) > 45 & abs(pitch) > 35 & abs(roll) > 40; the label is 1.
(2) Small angle: abs(yaw) < 40 or abs(pitch) < 30 or abs(roll) < 30; the label is 0.
(3) Critical angle region: between the large-angle and small-angle regions, the label is -1. Because the features of these data are ambiguous, they are only labeled and do not actually participate in the comprehensive-dimensional binary classification training.
The single-dimensional pose multi-classification labels are determined based on interval partitioning of the three-dimensional pose angle label.
It can be understood that the single-dimensional pose multi-classification labels are obtained by partitioning the pose angle of the corresponding dimension into 3-degree intervals. Taking the yaw angle as an example, its range can be divided into 60 subclasses, with labels taking discrete values in the interval [0, 59]; pitch and roll are each divided into 40 classes, with labels taking discrete values in the interval [0, 39].
The single-dimensional regression labels are determined based on normalization of the three-dimensional pose angle label.
It is understood that the single-dimensional regression labels are obtained by normalizing the pose angle of each corresponding dimension to the range [-1, 1].
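Putting the three label constructions together, the sketch below builds the 7-dimensional label vector for one sample. The thresholds, 3-degree bins, and [-1, 1] normalization follow the description above; the angle ranges (yaw in [-90, 90], pitch/roll in [-60, 60]) are inferred from the stated class counts, and the normalization divisors are assumptions.

```python
def bin_index(angle, offset, n_bins):
    """3-degree bins, clamped to the valid class range."""
    return min(max(int((angle + offset) // 3), 0), n_bins - 1)

def make_labels(yaw, pitch, roll):
    """Build the 7-dimensional label vector: 1 large/small-angle binary
    label + 3 per-dimension class labels + 3 per-dimension regression
    labels (angles in degrees)."""
    # comprehensive-dimensional binary label; -1 marks the critical band
    # excluded from binary training (operators as stated in the patent)
    if abs(yaw) > 45 and abs(pitch) > 35 and abs(roll) > 40:
        cls = 1                                  # large angle
    elif abs(yaw) < 40 or abs(pitch) < 30 or abs(roll) < 30:
        cls = 0                                  # small angle
    else:
        cls = -1                                 # critical band, ignored

    # single-dimensional multi-class labels: 3-degree bins
    yaw_cls = bin_index(yaw, 90, 60)             # 60 classes
    pitch_cls = bin_index(pitch, 60, 40)         # 40 classes
    roll_cls = bin_index(roll, 60, 40)           # 40 classes

    # single-dimensional regression labels, normalized to [-1, 1]
    yaw_reg, pitch_reg, roll_reg = yaw / 90.0, pitch / 60.0, roll / 60.0

    return [cls, yaw_cls, pitch_cls, roll_cls, yaw_reg, pitch_reg, roll_reg]
```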
By specifying the concrete generation of these three kinds of labels, the embodiments of the present invention make the training of the head pose estimation model more refined and the accuracy of the head pose estimation model higher.
In some embodiments, the head pose estimation model is trained using a total loss function that is determined based on a single-dimensional pose loss function of a single-dimensional pose layer of the head pose estimation model and a comprehensive dimensional pose loss function of a comprehensive dimensional pose layer of the head pose estimation model.
Determining the total loss function based on the single-dimensional pose loss functions of the single-dimensional pose layer of the head pose estimation model and the comprehensive-dimensional pose loss function of the comprehensive-dimensional pose layer of the head pose estimation model comprises applying the formula

L_total = L_yaw_total + L_pitch_total + L_roll_total + α·L_cls

to determine the total loss function, where L_total is the total loss function; L_yaw_total, L_pitch_total, and L_roll_total are the single-dimensional pose loss functions for the Yaw, Pitch, and Roll angles, respectively; L_cls is the comprehensive-dimensional pose loss function; and α is the weight of L_cls.
It should be noted that the main function of the comprehensive-dimensional pose loss function is the large/small-angle binary classification training; it is intended to constrain and correct the fine-grained classification training of the single-dimensional poses at extreme angles. It adopts the cross-entropy loss:

L_cls = -[y·log(p) + (1 - y)·log(1 - p)]

where L_cls denotes the comprehensive-dimensional pose loss function; y is the sample's large-angle label, 1 for a large angle and 0 for a small angle; and p is the probability that the sample is predicted to be large-angle.
The single-dimensional pose loss function is a multi-part loss combining classification and regression losses for each dimension's pose. It mainly uses the regressed angle of the single-dimensional pose to supervise the multi-class learning of the corresponding dimension, so that the classification results of the subclasses become more accurate and a more precise continuous pose angle is predicted. The single-dimensional pose loss is computed per dimension as follows, taking the Yaw angle as an example:

L_yaw_cls = H(x, x′);
L_yaw_total = L_yaw_cls + β·L_yaw_mse;

where L_yaw_cls is the multi-class cross-entropy loss of the Yaw angle; β is the weight parameter of the regression loss; and L_yaw_mse is the mean-squared error between Z and Z′, where Z is the single-dimensional regression label of Yaw and Z′ is the regressed angle value solved from the predicted sub-class interval probability map:

Z′ = Σ_k p_k · μ_k

where p_k is the probability that the prediction falls in the k-th subinterval and μ_k is the representative value of the k-th subinterval. The pose losses of the remaining dimensions are designed in the same way as L_yaw_total.
The total loss function is determined by the weighted summation of the aforementioned comprehensive-dimensional pose loss L_cls and the single-dimensional pose losses L_yaw_total, L_pitch_total, and L_roll_total.
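As a concrete reading of these formulas, the following PyTorch sketch implements the per-dimension loss with the expectation decoding Z′ = Σ_k p_k·μ_k over bin centers normalized to [-1, 1], plus the α-weighted binary term. The weights α and β, the bin-center layout, and the use of binary cross-entropy with logits are assumptions where the patent gives no concrete values.

```python
import torch
import torch.nn.functional as F

def single_dim_loss(logits, cls_target, reg_target, beta=1.0):
    """L_dim_total = cross entropy + beta * MSE(Z, Z'), where Z' is the
    expectation sum_k p_k * mu_k over bin centers normalized to [-1, 1];
    beta is an assumed weight."""
    ce = F.cross_entropy(logits, cls_target)
    probs = F.softmax(logits, dim=1)
    centers = torch.linspace(-1, 1, logits.shape[1], device=logits.device)
    z_pred = (probs * centers).sum(dim=1)          # Z' = sum_k p_k * mu_k
    return ce + beta * F.mse_loss(z_pred, reg_target)

def total_loss(yaw_out, pitch_out, roll_out, cls_logit, labels, alpha=0.5):
    """L_total = L_yaw_total + L_pitch_total + L_roll_total + alpha * L_cls,
    skipping the binary term for samples labeled -1 (critical band);
    alpha is an assumed weight."""
    loss = (single_dim_loss(yaw_out, labels["yaw_cls"], labels["yaw_reg"])
            + single_dim_loss(pitch_out, labels["pitch_cls"], labels["pitch_reg"])
            + single_dim_loss(roll_out, labels["roll_cls"], labels["roll_reg"]))
    valid = labels["cls"] >= 0                     # drop critical-band samples
    if valid.any():
        loss = loss + alpha * F.binary_cross_entropy_with_logits(
            cls_logit[valid].squeeze(-1), labels["cls"][valid].float())
    return loss
```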
Addressing the inaccuracy of current head pose estimation at larger angles, the embodiments of the present invention design the total loss function by combining the comprehensive pose dimension with the single pose dimensions to assist training, further improving the performance of the head pose estimation model.
In some embodiments, the process of determining the head pose estimation model further comprises the following process.
A validation set is extracted from the depth image sample data according to a preset sampling strategy.
It will be appreciated that, according to a preset sampling strategy, for example training set : validation set = 4 : 1, a training set and a validation set are extracted from the depth image sample data; the training set is used to train the head pose estimation model, and the validation set is used to verify network generalization and accuracy during training.
It should be noted that, because each item of depth image sample data corresponds to a 7-dimensional label, the embodiments of the present invention design the preset sampling strategy to keep the distribution of depth image sample data uniform in each sampled training round, and add data augmentation such as random cropping and point cloud jitter during training to enrich the training data.
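As one example of the point-cloud-jitter augmentation mentioned above (a sketch; the noise scale and clipping bound are assumed values, not ones from the patent):

```python
import numpy as np

def jitter_point_cloud(points, sigma=0.002, clip=0.01):
    """Point cloud jitter: add small, clipped Gaussian noise to every
    point of an N x 3 cloud; sigma and clip are assumed values."""
    noise = np.clip(sigma * np.random.randn(*points.shape), -clip, clip)
    return points + noise
```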
An RMSProp optimizer is used to dynamically adjust the learning rate, and the validation set is used to verify the generalization and accuracy of the head pose estimation model, thereby determining the head pose estimation model.
It should be noted that, for hyper-parameter configuration, the embodiments of the present invention use an RMSProp optimizer with an initial learning rate of 0.01; the learning rate is adjusted dynamically during training of the head pose estimation model, stepping down as the iteration count grows, which ensures that the optimizer does not oscillate strongly in late training because of an overly large learning rate. To ensure the loss keeps converging during training, the trends of the loss and of the accuracy metrics are observed in real time to adjust parameters, and the network parameter model is saved automatically at a set step interval for later use.
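A minimal sketch of this setup follows, reusing the HeadPoseNet and total_loss sketches above; the decay step, decay factor, epoch count, and checkpoint interval are assumptions (the patent fixes only RMSProp and the initial learning rate of 0.01).

```python
import torch

model = HeadPoseNet()                              # sketch model from above
optimizer = torch.optim.RMSprop(model.parameters(), lr=0.01)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=20, gamma=0.1)

for epoch in range(60):
    # ... iterate batches: forward, total_loss(...), backward, optimizer.step() ...
    scheduler.step()                               # learning rate steps down
    if epoch % 10 == 0:                            # save at a set step interval
        torch.save(model.state_dict(), f"headpose_epoch{epoch}.pth")
```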
According to the embodiments of the present invention, a validation set split from the depth image sample data is used to verify the generalization and accuracy of the head pose estimation model, so that model performance during training can be monitored in real time, further improving the accuracy and generalization capability of the model.
The head pose estimation apparatus provided by the embodiments of the present invention is described below; the head pose estimation apparatus described below and the head pose estimation method described above may be referred to in correspondence with each other.
As shown in fig. 5, an embodiment of the present invention provides a head pose estimation apparatus, including an acquisition unit 510 and a processing unit 520.
Wherein, the obtaining unit 510 is configured to obtain depth image data.
The processing unit 520 is configured to input the depth image data into the head pose estimation model, and obtain a head pose estimation result output by the head pose estimation model.
The head pose estimation model is trained with depth image sample data as samples, and with pre-annotated or interval-partitioned comprehensive-dimensional large/small-angle binary classification labels, single-dimensional pose multi-classification labels, and single-dimensional regression labels in one-to-one correspondence with the sample data as sample labels; the comprehensive-dimensional binary classification label characterizes the spatial features of the head pose across multiple dimensions; the single-dimensional pose multi-classification label and the single-dimensional regression label both characterize the spatial features of the head pose in a single dimension.
The head pose estimation apparatus provided by this embodiment of the present invention executes the above head pose estimation method; its specific implementation is consistent with that of the method and is not repeated here.
Fig. 6 illustrates a physical structure diagram of an electronic device. As shown in fig. 6, the electronic device may include a processor 610, a communication interface (Communications Interface) 620, a memory 630, and a communication bus 640, wherein the processor 610, the communication interface 620, and the memory 630 communicate with each other via the communication bus 640. The processor 610 may invoke logic instructions in the memory 630 to perform the head pose estimation method, comprising: acquiring depth image data; and inputting the depth image data into a head pose estimation model to obtain a head pose estimation result output by the head pose estimation model; wherein the head pose estimation model is trained with depth image sample data as samples, and with pre-annotated or interval-partitioned comprehensive-dimensional large/small-angle binary classification labels, single-dimensional pose multi-classification labels, and single-dimensional regression labels in one-to-one correspondence with the sample data as sample labels; the comprehensive-dimensional binary classification label characterizes the spatial features of the head pose across multiple dimensions; the single-dimensional pose multi-classification label and the single-dimensional regression label both characterize the spatial features of the head pose in a single dimension.
It should be noted that, in this embodiment, the electronic device may be a server, a PC, or other devices in the specific implementation, so long as the structure of the electronic device includes a processor 610, a communication interface 620, a memory 630, and a communication bus 640 as shown in fig. 6, where the processor 610, the communication interface 620, and the memory 630 complete communication with each other through the communication bus 640, and the processor 610 may call logic instructions in the memory 630 to execute the above method. The embodiment does not limit a specific implementation form of the electronic device.
Further, the logic instructions in the memory 630 may be implemented in the form of software functional units and, when sold or used as a stand-alone product, stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence or in the part contributing to the prior art, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Further, embodiments of the present invention disclose a computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions which, when executed by a computer, enable the computer to perform the head pose estimation method provided by the above method embodiments, the method comprising: acquiring depth image data; and inputting the depth image data into a head pose estimation model to obtain a head pose estimation result output by the head pose estimation model; wherein the head pose estimation model is trained with depth image sample data as samples, and with pre-annotated or interval-partitioned comprehensive-dimensional large/small-angle binary classification labels, single-dimensional pose multi-classification labels, and single-dimensional regression labels in one-to-one correspondence with the sample data as sample labels; the comprehensive-dimensional binary classification label characterizes the spatial features of the head pose across multiple dimensions; the single-dimensional pose multi-classification label and the single-dimensional regression label both characterize the spatial features of the head pose in a single dimension.
In another aspect, embodiments of the present invention further provide a non-transitory computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the head pose estimation method provided by the above embodiments, the method comprising: acquiring depth image data; and inputting the depth image data into a head pose estimation model to obtain a head pose estimation result output by the head pose estimation model; wherein the head pose estimation model is trained with depth image sample data as samples, and with pre-annotated or interval-partitioned comprehensive-dimensional large/small-angle binary classification labels, single-dimensional pose multi-classification labels, and single-dimensional regression labels in one-to-one correspondence with the sample data as sample labels; the comprehensive-dimensional binary classification label characterizes the spatial features of the head pose across multiple dimensions; the single-dimensional pose multi-classification label and the single-dimensional regression label both characterize the spatial features of the head pose in a single dimension.
The apparatus embodiments described above are merely illustrative; the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units, i.e., they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement this without inventive effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (9)

1. A head pose estimation method, comprising:
acquiring depth image data;
inputting the depth image data into a head posture estimation model to obtain a head posture estimation result output by the head posture estimation model;
wherein the head pose estimation model is trained with depth image sample data as samples, and with pre-annotated or interval-partitioned comprehensive-dimensional large/small-angle binary classification labels, single-dimensional pose multi-classification labels, and single-dimensional regression labels in one-to-one correspondence with the depth image sample data as sample labels; the comprehensive-dimensional binary classification label is used to characterize the spatial features of the head pose across multiple dimensions; the single-dimensional pose multi-classification labels and the single-dimensional regression labels are used to characterize the spatial features of the head pose in a single dimension;
the determination process of the head pose estimation model further comprises:
acquiring a three-dimensional pose angle label corresponding to each item of depth image sample data, the three-dimensional pose angle label comprising a Yaw angle, a Pitch angle, and a Roll angle;
determining the comprehensive-dimensional large/small-angle binary classification label based on comparison of the three-dimensional pose angle label with a pose-angle threshold set for each dimension;
determining the single-dimensional pose multi-classification labels based on interval partitioning of the three-dimensional pose angle label; and
determining the single-dimensional regression labels based on normalization of the three-dimensional pose angle label.
2. The head pose estimation method according to claim 1, wherein the head pose estimation model comprises a feature extraction layer, a single-dimensional pose layer, and a comprehensive-dimensional pose layer;
inputting the depth image data into the head pose estimation model to obtain the head pose estimation result output by the head pose estimation model comprises:
inputting the depth image data into the feature extraction layer to obtain a single-dimensional pose feature map and a multi-dimensional pose feature map;
inputting the single-dimensional pose feature map into the single-dimensional pose layer to obtain single-dimensional pose multi-classification information and single-dimensional regression information;
inputting the multi-dimensional pose feature map into the comprehensive-dimensional pose layer to obtain comprehensive-dimensional large/small-angle binary classification information; and
determining the head pose estimation result based on the single-dimensional pose multi-classification information, the single-dimensional regression information, and the comprehensive-dimensional binary classification information;
wherein the determination process of the head pose estimation model comprises:
training the single-dimensional pose layer with single-dimensional pose sample feature maps as samples and predetermined corresponding single-dimensional pose multi-classification labels and single-dimensional regression labels as sample labels; and
training the comprehensive-dimensional pose layer with multi-dimensional pose sample feature maps as samples and predetermined corresponding comprehensive-dimensional large/small-angle binary classification labels as sample labels.
3. The head pose estimation method according to claim 1, wherein the acquiring depth image data comprises:
collecting an original depth image;
and performing face detection on the original depth image, removing redundant background areas, and determining the depth image data.
4. The head pose estimation method according to claim 1, wherein the head pose estimation model is trained using a total loss function, the total loss function being determined based on a single-dimensional pose loss function of a single-dimensional pose layer of the head pose estimation model and a comprehensive dimensional pose loss function of a comprehensive dimensional pose layer of the head pose estimation model.
5. The head pose estimation method according to claim 4, wherein determining the total loss function based on the single-dimensional pose loss functions of the single-dimensional pose layer of the head pose estimation model and the comprehensive-dimensional pose loss function of the comprehensive-dimensional pose layer of the head pose estimation model comprises applying the formula

L_total = L_yaw_total + L_pitch_total + L_roll_total + α·L_cls

to determine the total loss function, where L_total is the total loss function; L_yaw_total, L_pitch_total, and L_roll_total are the single-dimensional pose loss functions for the Yaw, Pitch, and Roll angles, respectively; L_cls is the comprehensive-dimensional pose loss function; and α is the weight of L_cls.
6. The head pose estimation method according to any one of claims 1-5, wherein the process of determining the head pose estimation model further comprises:
extracting a validation set from the depth image sample data according to a preset sampling strategy; and
dynamically adjusting the learning rate with an RMSProp optimizer, and using the validation set to verify the generalization and accuracy of the head pose estimation model, thereby determining the head pose estimation model.
7. A head pose estimation device, comprising:
an acquisition unit configured to acquire depth image data;
the processing unit is used for inputting the depth image data into a head posture estimation model to obtain a head posture estimation result output by the head posture estimation model;
wherein the head pose estimation model is trained with depth image sample data as samples, and with predetermined comprehensive-dimensional large/small-angle binary classification labels, single-dimensional pose multi-classification labels, and single-dimensional regression labels corresponding to the depth image sample data as sample labels; the comprehensive-dimensional binary classification label is used to characterize the spatial features of the head pose across multiple dimensions; the single-dimensional pose multi-classification labels and the single-dimensional regression labels are used to characterize the spatial features of the head pose in a single dimension;
the determination process of the head pose estimation model further comprises:
acquiring a three-dimensional pose angle label corresponding to each item of depth image sample data, the three-dimensional pose angle label comprising a Yaw angle, a Pitch angle, and a Roll angle;
determining the comprehensive-dimensional large/small-angle binary classification label based on comparison of the three-dimensional pose angle label with a pose-angle threshold set for each dimension;
determining the single-dimensional pose multi-classification labels based on interval partitioning of the three-dimensional pose angle label; and
determining the single-dimensional regression labels based on normalization of the three-dimensional pose angle label.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the head pose estimation method according to any of claims 1 to 6 when the program is executed.
9. A non-transitory computer readable storage medium having stored thereon a computer program, which when executed by a processor, implements the steps of the head pose estimation method according to any of claims 1 to 6.
CN111695438B (application CN202010431119.XA): Head pose estimation method and device. Priority date 2020-05-20; filing date 2020-05-20. Status: Active.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010431119.XA  2020-05-20  2020-05-20  Head pose estimation method and device (CN111695438B)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010431119.XA  2020-05-20  2020-05-20  Head pose estimation method and device (CN111695438B)

Publications (2)

Publication Number Publication Date
CN111695438A (en)  2020-09-22
CN111695438B (en)  2023-08-04

Family

ID=72478042

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010431119.XA  Head pose estimation method and device  2020-05-20  2020-05-20  (Active; granted as CN111695438B)

Country Status (1)

Country Link
CN (1) CN111695438B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109034017A (en) * 2018-07-12 2018-12-18 北京华捷艾米科技有限公司 Head pose estimation method and machine readable storage medium
CN109977757A (en) * 2019-01-28 2019-07-05 电子科技大学 A kind of multi-modal head pose estimation method based on interacting depth Recurrent networks
CN110119148A (en) * 2019-05-14 2019-08-13 深圳大学 A kind of six-degree-of-freedom posture estimation method, device and computer readable storage medium
CN110197125A (en) * 2019-05-05 2019-09-03 上海资汇信息科技有限公司 Face identification method under unconfined condition
CN110427849A (en) * 2019-07-23 2019-11-08 深圳前海达闼云端智能科技有限公司 Face pose determination method and device, storage medium and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8687880B2 (en) * 2012-03-20 2014-04-01 Microsoft Corporation Real time head pose estimation

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109034017A (en) * 2018-07-12 2018-12-18 北京华捷艾米科技有限公司 Head pose estimation method and machine readable storage medium
CN109977757A (en) * 2019-01-28 2019-07-05 电子科技大学 A kind of multi-modal head pose estimation method based on interacting depth Recurrent networks
CN110197125A (en) * 2019-05-05 2019-09-03 上海资汇信息科技有限公司 Face identification method under unconfined condition
CN110119148A (en) * 2019-05-14 2019-08-13 深圳大学 A kind of six-degree-of-freedom posture estimation method, device and computer readable storage medium
CN110427849A (en) * 2019-07-23 2019-11-08 深圳前海达闼云端智能科技有限公司 Face pose determination method and device, storage medium and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Fuxun Gao et al., "Head Pose Estimation with Siamese Convolutional Neural Network", IEEE Xplore; full text *

Also Published As

Publication number Publication date
CN111695438A (en) 2020-09-22

Similar Documents

Publication Publication Date Title
CN109815826B (en) Method and device for generating face attribute model
CN106228185B (en) A kind of general image classifying and identifying system neural network based and method
EP3084682B1 (en) System and method for identifying faces in unconstrained media
CN108701234A (en) Licence plate recognition method and cloud system
CN109033978B (en) Error correction strategy-based CNN-SVM hybrid model gesture recognition method
WO2023151237A1 (en) Face pose estimation method and apparatus, electronic device, and storage medium
CN113095333B (en) Unsupervised feature point detection method and unsupervised feature point detection device
CN108124489B (en) Information processing method, apparatus, cloud processing device and computer program product
CN111274978B (en) Micro expression recognition method and device
CN109858454B (en) Adaptive kernel correlation filtering tracking method based on dual models
CN112258557B (en) Visual tracking method based on space attention feature aggregation
CN108492301A (en) A kind of Scene Segmentation, terminal and storage medium
CN112016454A (en) Face alignment detection method
JP2023545052A (en) Image processing model training method and device, image processing method and device, electronic equipment, and computer program
CN109740674A (en) A kind of image processing method, device, equipment and storage medium
CN114118303B (en) Face key point detection method and device based on prior constraint
CN110765843A (en) Face verification method and device, computer equipment and storage medium
CN112509154B (en) Training method of image generation model, image generation method and device
CN111695438B (en) Head pose estimation method and device
JP2017033556A (en) Image processing method and electronic apparatus
CN111860054A (en) Convolutional network training method and device
CN112053384B (en) Target tracking method based on bounding box regression model
CN115661618A (en) Training method of image quality evaluation model, image quality evaluation method and device
CN115620082A (en) Model training method, head posture estimation method, electronic device, and storage medium
CN109871867A (en) A kind of pattern fitting method of the data characterization based on preference statistics

Legal Events

Code  Description
PB01  Publication
SE01  Entry into force of request for substantive examination
TA01  Transfer of patent application right
      Effective date of registration: 2022-06-29
      Address after: Room 611-217, R & D center building, China (Hefei) international intelligent voice Industrial Park, 3333 Xiyou Road, high tech Zone, Hefei City, Anhui Province
      Applicant after: Hefei lushenshi Technology Co.,Ltd.
      Address before: Room 3032, gate 6, block B, 768 Creative Industry Park, 5 Xueyuan Road, Haidian District, Beijing 100083
      Applicant before: BEIJING DILUSENSE TECHNOLOGY CO.,LTD.
      Applicant before: Hefei lushenshi Technology Co.,Ltd.
GR01  Patent grant