CN111339941A - Head posture detection method

Head posture detection method

Info

Publication number
CN111339941A
Authority
CN
China
Prior art keywords
axis
head
detection method
deep learning
learning model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010119229.2A
Other languages
Chinese (zh)
Inventor
Lin Shiran (林士然)
Jiang Lei (蒋磊)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Lingtu Intelligent Technology Co ltd
Original Assignee
Suzhou Lingtu Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Lingtu Intelligent Technology Co ltd
Priority to CN202010119229.2A
Publication of CN111339941A

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation
    • G06V40/166 - Detection; Localisation; Normalisation using acquisition arrangements
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/24 - Aligning, centring, orientation detection or correction of the image
    • G06V10/242 - Aligning, centring, orientation detection or correction of the image by image rotation, e.g. by 90 degrees
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/26 - Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • G06V10/267 - Segmentation of patterns in the image field by performing operations on regions, e.g. growing, shrinking or watersheds

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a head posture detection method comprising the following steps: (a) selecting a data set; (b) preprocessing the face pictures in the data set and resizing them to obtain pictures of a set size; (c) constructing a deep learning model with MobileNetV2 as the backbone; (d) feeding the pictures of the set size into the neural network for classification; (e) applying softmax to the fully-connected layer outputs to map them to probability values; (f) mapping the probability values to a regression value and computing the regression loss with an MSE loss function; (g) computing a weighted sum of the losses and performing gradient descent on the final loss to complete the training of the deep learning model; (h) testing the child's head with the trained deep learning model. The method detects quickly and runs in real time.

Description

Head posture detection method
Technical Field
The invention relates to a head posture detection method, in particular to a method that uses a deep-learning model trained with computer-vision techniques to detect the head posture of children with mental disorders.
Background
Head pose conveys rich information: for example, people point with their head to indicate whom they are addressing and what they intend. In conversation, head direction is a non-verbal cue that signals to the listener when to take a turn and begin speaking; in such exchanges, head pose direction plays a role as important as gesture.
For children with autism, hyperactivity or tic disorders, head orientation reflects what the child is attending to in the current environment, which helps the therapist or doctor understand the child's thoughts. Several head-pose detection methods exist today. Early work used detector arrays: a large number of head detectors are trained, each tuned to a particular pose, discrete poses are assigned to the detectors, and the head pose is predicted accordingly. Later work used nonlinear regression or random-forest algorithms from machine learning. Some recent algorithms extract facial key points and predict head pose through deep-learning training.
However, the above methods share a drawback: they depend heavily on the environment. If the background changes substantially or the subjects' ages differ widely (people with mental disorders such as autism, hyperactivity or tic disorders are usually children, and detecting a child's head pose differs slightly from detecting an adult's), the detection results are prone to inaccuracy.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and to provide a head posture detection method suitable for children with mental disorders such as autism, hyperactivity or tic disorder.
To achieve this aim, the invention adopts the following technical scheme: a head posture detection method comprising the following steps:
(a) selecting a data set;
(b) preprocessing the face pictures in the data set, using a multi-task cascaded convolutional network to detect and crop the faces in the pictures, and then resizing the crops to obtain pictures of a set size;
(c) constructing a neural network for the deep learning model, with MobileNetV2 as the backbone connected to three fully-connected layers;
(d) feeding the pictures of the set size into the neural network for classification;
(e) applying softmax to the fully-connected layer outputs to map them to probability values;
(f) mapping the probability values to a regression value and computing the regression loss with the MSE loss function;
(g) computing a weighted sum of the losses and performing gradient descent on the final loss to complete the training of the deep learning model;
(h) taking the nose as the base point, the horizontal direction is set as the x-axis, the vertical direction as the y-axis, and the z-axis is perpendicular to the plane formed by the x-axis and the y-axis; the angles of clockwise rotation about the x-, y- and z-axes are defined as the offset angles of the head posture in the pitch, yaw and roll directions, and the deep learning model is used to test the child's head to obtain the posture of the child's head.
Preferably, in step (a), the data sets are the BIWI, 300W-LP and AFLW2000 data sets.
Preferably, in step (b), the preprocessing excludes unneeded background or other objects from the face pictures.
Further, in step (b), the multi-task cascaded convolutional network is composed of three cascaded lightweight CNNs: PNet, RNet and ONet.
Preferably, in step (d), the classification results are mapped into a set range.
Because of the above technical scheme, the invention has the following advantages over the prior art: the head posture detection method uses the deep learning model to compute a composite loss over the three angles, so detection is fast and runs in real time; it has a unified evaluation standard and high accuracy; it saves the time therapists or doctors spend observing the child, letting them devote more attention to other aspects of treatment; and the data can be stored and displayed in the form of video.
Drawings
FIG. 1 is a flow chart of an MSE loss function in the head pose detection method of the present invention;
FIG. 2 is a diagram illustrating a first effect of the head pose detection method according to the present invention;
FIG. 3 is a diagram illustrating a second effect of the head pose detection method according to the present invention.
Detailed Description
Preferred embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
The head posture detection method comprises the following steps:
(a) selecting a data set; in this embodiment, the data sets are primarily BIWI, 300W-LP and AFLW2000 (i.e., the model is primarily trained and tested on the BIWI, 300W-LP and AFLW2000 data sets). The BIWI corpus, released in 2010, contains over 1,000 sequences of high-quality dynamic 3D face scans acquired at 25 frames per second with a real-time 3D scanner, together with audio recorded by professional microphones. 300W-LP is a 3D data set synthesised from the 300W data set with the 3DMM model; it is the most widely used synthetic data set in the 3D field and includes labels for 68 key points, camera parameters and 3DMM model coefficients. AFLW is a large-scale face database covering multi-pose, multi-view faces, typically used to evaluate facial key-point detection; its pictures were crawled from Flickr, totalling 21,997 pictures and 25,993 faces, with 21 key points labelled per face (about 380,000 key points in total).
(b) preprocessing the face pictures in the data set: MTCNN, a multi-task cascaded convolutional network (a classical and fast face detection technique composed of three cascaded lightweight CNNs: PNet, RNet and ONet), is used to detect and crop the faces, excluding unneeded background and other objects so that no overfitting data appear during training; the crops are then resized to the set size (a preprocessing sketch is given after step (h) below);
(c) constructing a neural network for the deep learning model with MobileNetV2 as the backbone connected to three fully-connected layers (that is, the deep network uses MobileNetV2 as its basic backbone and attaches three fully-connected heads, each making an independent prediction; see the model sketch after step (h));
(d) the pictures of the set size are fed into the neural network for classification, and the classification results are then mapped into a set range, which greatly improves the accuracy of the method (this step yields the classification loss);
(e) applying softmax to the fully-connected layer outputs to map them to probability values;
(f) the probability values are mapped to obtain a regression value (that is, a continuous angle is recovered from the probability distribution), and the regression loss is computed with the MSE loss function (as shown in FIG. 1; MSE, the mean squared error commonly used as a loss function in machine learning, is the expected value of the squared difference between an estimate and the true value, MSE = (1/n) Σ (estimate − true value)²);
(g) the losses are combined by weighted summation and gradient descent is performed on the final loss to complete the training of the deep learning model (see the loss sketch after step (h));
(h) taking the nose as the base point, the horizontal direction is set as the x-axis, the vertical direction as the y-axis, and the z-axis is perpendicular to the plane formed by the x-axis and the y-axis; the angles of clockwise rotation about the x-, y- and z-axes are defined as the offset angles of the head posture in the pitch, yaw and roll directions, and the deep learning model is used to test the child's head to obtain the posture of the child's head (for the specific application see FIG. 2 and FIG. 3; an axis-projection sketch is given below).
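As an illustration of step (b), the following is a minimal preprocessing sketch. The patent names MTCNN (PNet/RNet/ONet) but no implementation; the facenet-pytorch MTCNN wrapper and the 224x224 target size used here are assumptions.

    # Preprocessing sketch for step (b): MTCNN detects the face, crops it,
    # and resizes the crop to a set size in one call. The library choice
    # (facenet-pytorch) and the 224x224 size are assumptions.
    from PIL import Image
    from facenet_pytorch import MTCNN

    mtcnn = MTCNN(image_size=224, margin=20)  # detector built from the PNet/RNet/ONet cascade

    def preprocess(path):
        img = Image.open(path).convert("RGB")
        face = mtcnn(img)  # tensor of shape (3, 224, 224), or None if no face is found
        return face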
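The network of step (c) might look as follows in PyTorch. The 66-bin discretisation (3-degree bins covering roughly plus or minus 99 degrees) is an assumption borrowed from common head-pose practice, not something the patent states.

    # Model sketch for step (c): a MobileNetV2 backbone feeding three
    # independent fully-connected heads, one per angle (yaw, pitch, roll).
    import torch
    import torch.nn as nn
    from torchvision import models

    class HeadPoseNet(nn.Module):
        def __init__(self, num_bins=66):
            super().__init__()
            backbone = models.mobilenet_v2(weights="DEFAULT")
            self.features = backbone.features          # MobileNetV2 feature extractor
            self.pool = nn.AdaptiveAvgPool2d(1)
            self.fc_yaw = nn.Linear(1280, num_bins)    # each head predicts independently
            self.fc_pitch = nn.Linear(1280, num_bins)
            self.fc_roll = nn.Linear(1280, num_bins)

        def forward(self, x):
            x = self.pool(self.features(x)).flatten(1)  # (N, 1280)
            return self.fc_yaw(x), self.fc_pitch(x), self.fc_roll(x)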
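Steps (d) to (g) describe a composite classification-plus-regression loss; below is a sketch under the same binning assumption as above. The bin width, angle range and the weight alpha are illustrative choices, not values from the patent.

    # Loss sketch for steps (d)-(g): cross-entropy over angle bins gives the
    # classification loss (d); softmax maps the head outputs to probabilities
    # (e); the expectation over bin indices recovers a continuous angle for
    # the MSE regression loss (f); the losses are summed with weights and
    # gradient descent is run on the total (g).
    import torch
    import torch.nn.functional as F

    def angle_loss(logits, bin_label, cont_label, alpha=0.5):
        cls_loss = F.cross_entropy(logits, bin_label)          # classification loss
        probs = F.softmax(logits, dim=1)                       # per-bin probabilities
        idx = torch.arange(logits.size(1), dtype=torch.float32,
                           device=logits.device)
        pred = torch.sum(probs * idx, dim=1) * 3 - 99          # bin index -> degrees
        reg_loss = F.mse_loss(pred, cont_label)                # MSE regression loss
        return cls_loss + alpha * reg_loss                     # weighted sum

    # One training step: sum the three angle losses and backpropagate.
    # yaw_logits, pitch_logits, roll_logits = model(images)
    # loss = (angle_loss(yaw_logits, yaw_bin, yaw_deg)
    #         + angle_loss(pitch_logits, pitch_bin, pitch_deg)
    #         + angle_loss(roll_logits, roll_bin, roll_deg))
    # loss.backward(); optimizer.step()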
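For step (h), the predicted pitch, yaw and roll offsets can be projected back onto the image, with the nose as base point, to draw the three pose axes as in the effect diagrams of FIG. 2 and FIG. 3. The projection below follows the usual Euler-angle convention; the axis length and sign handling are assumptions.

    # Visualisation sketch for step (h): endpoints of the x-, y- and z-axes
    # (nose as base point) after rotating by pitch, yaw and roll, projected
    # onto the image plane.
    import numpy as np

    def axis_endpoints(pitch, yaw, roll, nose_x, nose_y, size=100.0):
        p = np.radians(pitch)
        y = -np.radians(yaw)
        r = np.radians(roll)
        # x-axis (horizontal direction)
        x1 = size * (np.cos(y) * np.cos(r)) + nose_x
        y1 = size * (np.cos(p) * np.sin(r) + np.cos(r) * np.sin(p) * np.sin(y)) + nose_y
        # y-axis (vertical direction)
        x2 = size * (-np.cos(y) * np.sin(r)) + nose_x
        y2 = size * (np.cos(p) * np.cos(r) - np.sin(p) * np.sin(y) * np.sin(r)) + nose_y
        # z-axis (out of the image plane)
        x3 = size * np.sin(y) + nose_x
        y3 = size * (-np.cos(y) * np.sin(p)) + nose_y
        return (x1, y1), (x2, y2), (x3, y3)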
The above embodiments are merely illustrative of the technical ideas and features of the present invention, and the purpose thereof is to enable those skilled in the art to understand the contents of the present invention and implement the present invention, and not to limit the protection scope of the present invention. All equivalent changes and modifications made according to the spirit of the present invention should be covered within the protection scope of the present invention.

Claims (5)

1. A head posture detection method, characterized by comprising the following steps:
(a) selecting a data set;
(b) preprocessing the face pictures in the data set, using a multi-task cascaded convolutional network to detect and crop the faces in the pictures, and then resizing the crops to obtain pictures of a set size;
(c) constructing a neural network for the deep learning model, with MobileNetV2 as the backbone connected to three fully-connected layers;
(d) feeding the pictures of the set size into the neural network for classification;
(e) applying softmax to the fully-connected layer outputs to map them to probability values;
(f) mapping the probability values to a regression value and computing the regression loss with the MSE loss function;
(g) computing a weighted sum of the losses and performing gradient descent on the final loss to complete the training of the deep learning model;
(h) taking the nose as the base point, the horizontal direction is set as the x-axis, the vertical direction as the y-axis, and the z-axis is perpendicular to the plane formed by the x-axis and the y-axis; the angles of clockwise rotation about the x-, y- and z-axes are defined as the offset angles of the head posture in the pitch, yaw and roll directions, and the deep learning model is used to test the child's head to obtain the posture of the child's head.
2. The head posture detection method according to claim 1, characterized in that: in step (a), the data sets are the BIWI, 300W-LP and AFLW2000 data sets.
3. The head posture detection method according to claim 1, characterized in that: in step (b), the preprocessing excludes unneeded background or other objects from the face pictures.
4. The head posture detection method according to claim 1 or 3, characterized in that: in step (b), the multi-task cascaded convolutional network is composed of three cascaded lightweight CNNs: PNet, RNet and ONet.
5. The head posture detection method according to claim 1, characterized in that: in step (d), the classification results are mapped into a set range.
Application CN202010119229.2A, filed 2020-02-26 (priority date 2020-02-26): Head posture detection method. Published as CN111339941A; status pending.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010119229.2A CN111339941A (en) 2020-02-26 2020-02-26 Head posture detection method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010119229.2A CN111339941A (en) 2020-02-26 2020-02-26 Head posture detection method

Publications (1)

Publication Number Publication Date
CN111339941A true CN111339941A (en) 2020-06-26

Family

ID=71183659

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010119229.2A Pending CN111339941A (en) 2020-02-26 2020-02-26 Head posture detection method

Country Status (1)

Country Link
CN (1) CN111339941A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112241761A (en) * 2020-10-15 2021-01-19 北京字跳网络技术有限公司 Model training method and device and electronic equipment
CN112634363A (en) * 2020-12-10 2021-04-09 上海零眸智能科技有限公司 Shelf attitude estimation method

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107748858A (en) * 2017-06-15 2018-03-02 华南理工大学 A kind of multi-pose eye locating method based on concatenated convolutional neutral net

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107748858A (en) * 2017-06-15 2018-03-02 华南理工大学 A kind of multi-pose eye locating method based on concatenated convolutional neutral net

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XIAOJIE GUO et al.: "PFLD: A Practical Facial Landmark Detector" *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112241761A (en) * 2020-10-15 2021-01-19 北京字跳网络技术有限公司 Model training method and device and electronic equipment
CN112241761B (en) * 2020-10-15 2024-03-26 北京字跳网络技术有限公司 Model training method and device and electronic equipment
CN112634363A (en) * 2020-12-10 2021-04-09 上海零眸智能科技有限公司 Shelf attitude estimation method
CN112634363B (en) * 2020-12-10 2023-10-03 上海零眸智能科技有限公司 Goods shelf posture estimating method

Similar Documents

Publication Publication Date Title
CN109325437B (en) Image processing method, device and system
CN110287880A (en) A kind of attitude robust face identification method based on deep learning
CN111028319B (en) Three-dimensional non-photorealistic expression generation method based on facial motion unit
Li et al. Sign language recognition based on computer vision
CN110889343A (en) Crowd density estimation method and device based on attention type deep neural network
CN112818969A (en) Knowledge distillation-based face pose estimation method and system
CN112949622A (en) Bimodal character classification method and device fusing text and image
CN111339941A (en) Head posture detection method
Depuru et al. Convolutional neural network based human emotion recognition system: A deep learning approach
CN112906520A (en) Gesture coding-based action recognition method and device
Yanmin et al. Research on ear recognition based on SSD_MobileNet_v1 network
Zhao et al. Rapid offline detection and 3D annotation of assembly elements in the augmented assembly
CN111914595A (en) Human hand three-dimensional attitude estimation method and device based on color image
CN109753922A (en) Anthropomorphic robot expression recognition method based on dense convolutional neural networks
CN110490165B (en) Dynamic gesture tracking method based on convolutional neural network
Wang et al. Swimmer’s posture recognition and correction method based on embedded depth image skeleton tracking
CN116823983A (en) One-to-many style handwriting picture generation method based on style collection mechanism
CN115496859A (en) Three-dimensional scene motion trend estimation method based on scattered point cloud cross attention learning
Chen et al. Intelligent Recognition of Physical Education Teachers' Behaviors Using Kinect Sensors and Machine Learning.
CN113536926A (en) Human body action recognition method based on distance vector and multi-angle self-adaptive network
Zhou et al. Motion balance ability detection based on video analysis in virtual reality environment
Gai et al. Digital Art Creation and Visual Communication Design Driven by Internet of Things Algorithm
Liang et al. Interactive Experience Design of Traditional Dance in New Media Era Based on Action Detection
CN112989952B (en) Crowd density estimation method and device based on mask guidance
Xu et al. Research on computer graphics and image design and visual communication design

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20200626