CN114283448A - Child sitting posture reminding method and system based on head posture estimation - Google Patents

Child sitting posture reminding method and system based on head posture estimation

Info

Publication number
CN114283448A
CN114283448A (application CN202111551860.0A)
Authority
CN
China
Prior art keywords
head
posture
network
estimation
angle
Prior art date
Legal status
Pending
Application number
CN202111551860.0A
Other languages
Chinese (zh)
Inventor
宣琦
宋栩杰
周洁韵
翔云
邱君瀚
Current Assignee
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT
Priority to CN202111551860.0A
Publication of CN114283448A


Landscapes

  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a child sitting posture reminding method based on head posture estimation, which comprises the following steps: S1: acquiring an original image from the camera equipment, detecting whether it is a valid frame, and retaining valid frames; S2: extracting a head image from the original image using a target detection algorithm; S3: inputting the head image into a head pose estimation network to obtain probability distribution sequences and final estimated angles for the pitch, yaw and roll angles; S4: inputting the probability distribution sequences into a one-dimensional convolutional network to verify accuracy, judging the head posture according to the estimated angles, and issuing a voice reminder for incorrect postures. The invention further comprises a child sitting posture reminding system based on head posture estimation. Through the camera equipment, the invention can perform fine-grained recognition of children's head Euler angles in writing and reading scenes and remind children of incorrect postures, with a low false alarm rate.

Description

Child sitting posture reminding method and system based on head posture estimation
Technical Field
The invention belongs to the field of computer vision, and mainly relates to a child sitting posture reminding method and system based on head posture estimation.
Background
Domestic research results show that poor reading and writing posture is associated with the onset and progression of myopia, and that the rate of incorrect reading and writing posture among students is as high as 70%, or even more than 85%. Correcting children's reading and writing posture is therefore an important way to prevent the onset and progression of myopia.
At present, physical correction devices are common on the market; they standardize sitting posture by restricting the range of motion of the body and head. For example, the design patent with application No. CN202130311380.1, a sitting posture corrector for children, is fixed on a table top and keeps the chest at a distance from the table; similarly, the design patent with application No. CN202030635389.3, a children's sitting posture correction frame, is a correction frame fixed on a table top that keeps the head at a distance from the table top and prevents it from dropping too low. Such physical devices merely keep the body out of a certain area: they cannot ensure that the child's head posture is correct while writing, and they occupy desktop space, which affects reading and writing comfort.
As computer vision technology has matured, related applications have used vision techniques for human state recognition. For example, the invention patent with application No. 201711070372.1 discloses a gaze-based driver attention detection method: it obtains face key-point coordinates through a 2D face key-point detection algorithm, constructs a 3D head model to extract the driver's 3D face features, calculates the gaze direction in a 3D coordinate system from the 2D and 3D key-point coordinates combined with the spatial relation of the eyes, and finally takes the gaze direction as the attention direction. The core step of that scheme is establishing a mapping from the 2D coordinate system to a 3D head model; such geometric-deformation-model methods depend heavily on the accuracy of face key-point detection and are easily affected by the driver's appearance, lighting, and other factors in real application scenarios.
Currently, mainstream head pose estimation methods in academia can be divided into regression and classification approaches. Regression methods use or fit a mathematical model to predict poses directly from labeled training data; the regressor may be principal component analysis, a neural network, and so on. Classification methods instead predict poses from a discrete set of poses, so the prediction resolution is often low; the classifier may be a decision tree, a random forest, a neural network, and so on. For example, the invention patent with application No. 202011019897.4 discloses a head pose estimation method based on multi-level image feature refinement learning, together with an implementation system and storage medium. That scheme uses wavelet transforms to expand the image information in the data processing stage; in the pose estimation stage it classifies first and then regresses, using a coarse-grained classification network to estimate the approximate interval of the head pose and then a fine-grained regression network to obtain the specific angle value. The scheme performs data augmentation and refines the model into two stages, but its coarse-grained network divides the range into only 5 sub-intervals, which helps the fine-grained regression stage little while increasing the complexity of the whole pipeline.
Disclosure of Invention
The invention overcomes the defects of the prior art, provides a child sitting posture reminding method and system based on head posture estimation, and provides powerful support for solving the sitting posture problem of teenagers when reading and writing characters, thereby contributing to prevention and control of the occurrence and development of teenager myopia.
In order to achieve the purpose, the invention provides the following scheme: the invention provides a child sitting posture reminding method based on head posture estimation, which comprises the following steps of:
S1: acquiring an original image from the camera equipment and inputting it into the valid-frame discrimination network; if the probability that the image is a valid frame is less than the probability that it is an invalid frame, the original image is judged an invalid frame; otherwise it is a valid frame and is retained;
s2: performing head detection on the original image by using a target detection algorithm, and extracting a head image according to a detection result;
S3: inputting the head image into a head pose estimation network to obtain the probability distribution sequences Pp, Py, Pr over the three Euler-angle dimensions and the final estimated angles: pitch angle θp, yaw angle θy, and roll angle θr;
S4: inputting the probability distribution sequences into a one-dimensional convolutional network for accuracy evaluation; if the accuracy output by the network is less than 0.5, returning to step S1; otherwise, judging the head posture according to the estimated angles of step S3 and issuing a voice reminder for incorrect postures.
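The S1–S4 loop can be sketched end to end as follows. This is a minimal illustrative sketch only: the patent discloses no model weights, so all three networks are replaced by stand-in functions, and every function name and heuristic here is an assumption rather than the patent's implementation.

```python
import numpy as np

def is_valid_frame(frame):
    """S1 stand-in: treat a near-black frame as an invalid (empty) scene."""
    return float(frame.mean()) > 10.0

def detect_head(frame):
    """S2 stand-in: return a fixed bounding rectangle (xl, yl, xr, yr)."""
    return 40, 20, 200, 180

def estimate_pose(head_image):
    """S3 stand-in: return (pitch, yaw, roll) in degrees."""
    return -10.0, 5.0, 2.0

def process(frame, judge):
    """Run one frame through S1-S4; `judge` maps angles to a verdict."""
    if not is_valid_frame(frame):
        return None          # invalid frame: loop back to S1
    xl, yl, xr, yr = detect_head(frame)
    head = frame[yl:yr, xl:xr]          # crop the head region
    pitch, yaw, roll = estimate_pose(head)
    return judge(pitch, yaw, roll)      # S4: posture judgment / reminder
```

In a real deployment the stand-ins would be replaced by the discrimination, YOLOv5 and pose estimation networks, and `judge` would trigger the voice reminder.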
Preferably, step S1 specifically includes:
S1.1: resizing the original image into a square and normalizing each channel to obtain Image1;
S1.2: inputting Image1 into the valid-frame discrimination network, which outputs the probabilities P1(x) and P2(x) that the frame image is a valid frame or an invalid frame respectively; if P1(x) < P2(x), the frame image is an invalid frame; otherwise it is a valid frame.
Preferably, the valid-frame discrimination network in step S1.2 is a lightweight network composed of several basic network blocks; each block consists of a 1 × 1 ordinary convolution and a 3 × 3 depth-wise convolution, with channel split and channel shuffle operations performed at the head and tail of the block respectively.
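The channel split and channel shuffle operations at the head and tail of such a block can be sketched in NumPy as follows. This is an illustrative sketch in the ShuffleNet style; the (C, H, W) array layout and two-group shuffle are assumptions, not details stated in the patent.

```python
import numpy as np

def channel_split(x):
    """Head of the block: split the channels of a (C, H, W) tensor in half."""
    c = x.shape[0] // 2
    return x[:c], x[c:]

def channel_shuffle(x, groups=2):
    """Tail of the block: interleave channels across `groups` so that
    information mixes between the two branches on the next split."""
    c, h, w = x.shape
    return (x.reshape(groups, c // groups, h, w)
             .transpose(1, 0, 2, 3)
             .reshape(c, h, w))
```

For a 4-channel tensor and two groups, the shuffle reorders channels as 0, 2, 1, 3, so each half of the next split sees channels from both branches.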
Preferably, step S2 specifically includes:
S2.1: resizing the original image into a square and normalizing each channel to obtain Image2; Image2 is larger than Image1 so as to retain more image information;
S2.2: inputting Image2 into the YOLOv5 target detection network for head recognition, which outputs the coordinates of the head's bounding rectangle on the original image: xl, yl, xr, yr;
where xl, yl are the x and y coordinates of the top-left corner of the bounding rectangle, and xr, yr are the x and y coordinates of the bottom-right corner;
s2.3: and extracting the head image from the original image according to the coordinates of the head circumscribed rectangular frame.
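Step S2.3 amounts to slicing the original image with the bounding-rectangle coordinates. A minimal sketch (the clipping to image bounds is an added safeguard, not something the patent specifies):

```python
import numpy as np

def crop_head(image, xl, yl, xr, yr):
    """Cut the head region out of the original image given the bounding
    rectangle: (xl, yl) top-left, (xr, yr) bottom-right, clipped to bounds."""
    h, w = image.shape[:2]
    xl, xr = max(0, xl), min(w, xr)
    yl, yr = max(0, yl), min(h, yr)
    return image[yl:yr, xl:xr]
```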
Preferably, step S3 specifically includes:
S3.1: resizing the head image into a square and normalizing each channel to obtain Image3;
S3.2: inputting Image3 into the pose estimation network to obtain the probability distribution sequences and estimated angles for the pitch, yaw and roll angles;
The probability distribution sequences are respectively recorded as Pp = (p1, p2, …, pN), Py = (p1, p2, …, pN) and Pr = (p1, p2, …, pN), where N ∈ {66, 120, 66} for the three sequences respectively, and each sequence satisfies pi ∈ [0, 1] with Σi pi = 1;
the pitch angle range is [−99°, 99°], the yaw angle range is [−180°, 180°], and the roll angle range is [−99°, 99°].
Preferably, the pose estimation network of step S3.2 comprises two parts: a backbone network for feature extraction and a full-connection layer network for regression classification;
The core structure of the backbone network consists of 16 mobile inverted bottleneck convolution blocks. Each block first changes the number of channels through a 1 × 1 convolution and then performs a depth-wise convolution, followed by a squeeze-and-excitation module that makes the model attend more to important channel features; finally, the channel dimension is restored through a 1 × 1 convolution, and drop-connect and skip-connection operations increase the generalization and learning capability of the model;
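The squeeze-and-excitation step inside such a block can be sketched as follows. This is an illustrative NumPy sketch under assumed conventions: (C, H, W) layout, ReLU-then-sigmoid gating, and weight shapes for a reduction ratio r; the patent does not disclose these details.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def squeeze_excite(x, w1, w2):
    """x: feature map (C, H, W). Squeeze: global average pooling to (C,);
    excite: two small FC layers producing per-channel gates in (0, 1);
    scale: reweight the channels so the model attends to important ones.
    w1: (C // r, C), w2: (C, C // r) for reduction ratio r."""
    s = x.mean(axis=(1, 2))                        # squeeze
    gates = sigmoid(w2 @ np.maximum(w1 @ s, 0.0))  # excite
    return x * gates[:, None, None]                # channel-wise scaling
```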
the full-connection layer network comprises three branches in total, the three branches are used for calculating a pitch angle, a yaw angle and a roll angle respectively, and the output of each branch is a sequence with the length of 66 or 120;
Each branch comprises a classification branch and a regression branch. In the classification branch, each sequence unit outputs the probability of its class, and the probability distribution of the branch is obtained through a softmax function; in the regression branch, each sequence unit represents a rotation of 3°, and the expectation over the probability distribution is computed as the estimated angle for that dimension;
The expectation is calculated as:

θpred = 3 · Σ(i=1…N) i · pi − 3N/2    (1)

where θpred represents the estimated angle, pi is the probability of the i-th unit of the sequence, and N is the sequence length;
The head pose estimation network is obtained by training; its parameters are updated by minimizing a cost function Ltotal;
the cost function consists of a plurality of loss functions:
Ltotal=0.4*Lreg+0.4*Lcls+0.2*Lmd (2)
where Lreg represents the loss of the regression branch; to ensure that the angle penalty does not exceed 180°, the loss function is defined as:

Lreg = min(|θpred − θgt|, 360° − |θpred − θgt|)    (3)

where θgt is the ground-truth angle;
Lcls represents the loss of the classification branch, whose loss function uses binary cross-entropy;
Lmd represents the loss of the evaluation network formed by one-dimensional convolutions, which regularizes the probability distribution features output by the pose estimation network.
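The three loss terms and their weighted combination (equation 2) can be sketched as follows. The wrap-around regression loss is a reconstruction consistent with the stated 180° cap; the element-wise binary cross-entropy form is an assumption.

```python
import numpy as np

def wrapped_abs_error(theta_pred, theta_gt):
    """Regression loss capped at 180 degrees: take the shorter way
    around the circle between predicted and ground-truth angles."""
    d = abs(theta_pred - theta_gt) % 360.0
    return min(d, 360.0 - d)

def binary_cross_entropy(p, y, eps=1e-12):
    """Classification-branch loss over per-bin probabilities."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def total_loss(l_reg, l_cls, l_md):
    """Cost function: L_total = 0.4*L_reg + 0.4*L_cls + 0.2*L_md."""
    return 0.4 * l_reg + 0.4 * l_cls + 0.2 * l_md
```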
Preferably, step S4 specifically includes:
S4.1: inputting the probability distribution sequences output by the head pose estimation network into a one-dimensional convolutional network for accuracy evaluation. The accuracy index measures the correlation between the probability distribution features output by the head pose estimation network and the accuracy of the final estimated angle: the closer the probability distribution features are to the true distribution, the higher the accuracy and the more credible the estimation result. If the accuracy output by the one-dimensional convolutional network is less than 0.5, return to step S1; otherwise, proceed to step S4.2;
S4.2: judging the head posture according to the obtained θp, θy, θr;
s4.3: and if the head posture is judged to be abnormal in S4.2, voice reminding is carried out aiming at the abnormal posture.
Preferably, in step S4.3, the head posture is determined from the estimated angles of the three dimensions according to preset angle intervals, and the abnormal postures include: head too low, head tilt, and distraction (mind wandering);
when the pitch angle is within [−99°, −45°], the current posture is judged as the abnormal posture "head too low"; when the roll angle is not within [−30°, 30°], the current posture is judged as "head tilt"; and when the yaw angle is not within [−45°, 45°], the current posture is judged as "distraction".
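The interval rules above translate directly into a rule-based classifier. A minimal sketch; the order in which the three rules are checked is an assumption, since the patent does not state a priority among them:

```python
def judge_posture(pitch, yaw, roll):
    """Rule-based posture judgment using the preset angle intervals;
    returns the abnormal posture name, or None if the posture is normal."""
    if -99.0 <= pitch <= -45.0:
        return "head too low"
    if not (-30.0 <= roll <= 30.0):
        return "head tilt"
    if not (-45.0 <= yaw <= 45.0):
        return "distraction"
    return None
```

A returned non-None value would trigger the corresponding voice reminder in step S4.3.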
The invention also provides a child sitting posture reminding system based on head posture estimation, which comprises:
the device comprises an image input module, a head detection module, a posture estimation module and a sitting posture reminding module;
the image input module acquires an original image from the camera equipment and judges whether a child is in a read-write state in the current scene through an effective frame detection algorithm so as to determine whether to input the original image into a next module;
the head detection module is used for obtaining the position information of a circumscribed rectangular frame of the head in the original image by using the trained target detection model, the position information comprises x-axis coordinates and y-axis coordinates of the upper left corner and the lower right corner, and the head image is obtained by cutting according to the coordinates and is input into the next module;
the pose estimation module inputs the head image into a trained head pose estimation network to obtain the probability distribution sequences and estimated angles of the three Euler-angle dimensions; the probability distribution sequences carry the probability distribution features used to judge the accuracy of the estimated angles;
the sitting posture reminding module first inputs the probability distribution sequences into a one-dimensional convolutional network that evaluates the accuracy of the estimated angles; if the accuracy is less than 0.5, no head-posture judgment is made; otherwise, the module judges whether the head posture is abnormal according to the estimated pitch, yaw and roll angles, and issues a corresponding reminder if it is;
the image input module, the head detection module, the posture estimation module and the sitting posture reminding module are connected in sequence.
The technical conception of the invention is as follows:
after an original image is obtained from the camera equipment, effective frame detection is firstly carried out, and scenes where no person exists or a human body is too far away from the camera equipment are eliminated; after the fact that a person is in front of the camera equipment is confirmed, detecting the head of the person in the scene through a target detection network, recording position information of the person, acquiring a head image from an original image according to the position information, inputting the head image into a head posture estimation network, and obtaining a probability distribution sequence of three dimensions of the head Euler angle and a final estimation angle; in order to ensure the accuracy of the estimated angle, the probability distribution sequence is input into a one-dimensional convolution network for accuracy evaluation, if the accuracy is more than 0.5, the head posture is judged according to the estimated angle of the head posture estimation network, and the voice reminding is carried out on the wrong sitting posture.
The invention has the beneficial effects that:
1) the invention provides a head posture estimation algorithm, which combines the advantages of a regression and classification method, has higher detection precision, reduces the classification width of a single class to 3 degrees, and has wider detection range;
2) the valid-frame detection algorithm avoids further detection and recognition in invalid scenes, such as unoccupied scenes and scenes where a person is present but too far from the camera equipment, thereby saving computing resources and reducing power consumption;
3) the accuracy evaluation algorithm is realized through a light one-dimensional convolution network, the accuracy of the final estimation angle is obtained from the probability distribution characteristics of three dimensions of the Euler angle, and the possible false alarm problem in the practical application scene can be effectively prevented;
4) a complete child sitting posture reminding algorithm and system deployable on mobile terminals is provided; it can detect the head posture of a child during reading and writing in real time, issue voice reminders for incorrect sitting postures, and effectively help prevent and control the onset and progression of myopia.
Drawings
FIG. 1 is a system framework diagram of the method of the present invention;
FIG. 2 is a schematic overall flow chart of the method of the present invention;
FIG. 3 is a diagram of a pose estimation network architecture;
fig. 4 is a diagram illustrating the detection effect of four different head states.
Detailed Description
The following detailed description of embodiments of the invention is provided in connection with the accompanying drawings.
Referring to fig. 1 to 4, a child sitting posture reminding method based on head posture estimation includes the following steps:
S1: acquiring an original image from the camera equipment and inputting it into the valid-frame discrimination network; if the probability that the image is a valid frame is less than the probability that it is an invalid frame, the original image is judged an invalid frame; otherwise it is a valid frame and is retained;
s2: performing head detection on the original image by using a target detection algorithm, and extracting a head image according to a detection result;
S3: inputting the head image into a head pose estimation network to obtain the probability distribution sequences Pp, Py, Pr over the three Euler-angle dimensions and the final estimated angles: pitch angle θp, yaw angle θy, and roll angle θr;
S4: inputting the probability distribution sequences into a one-dimensional convolutional network for accuracy evaluation; if the accuracy output by the network is less than 0.5, returning to step S1; otherwise, judging the head posture according to the estimated angles of step S3 and issuing a voice reminder for incorrect postures.
Step S1 specifically includes:
S1.1: resizing the original image into a square (the preferred image size in this embodiment is 224 × 224) and normalizing each channel with mean = [0.5, 0.5, 0.5] and std = [0.5, 0.5, 0.5] to obtain Image1, where mean denotes the per-channel mean and std the standard deviation; the normalization can be expressed as:

output = (input − mean) / std

where input denotes a pixel value of the original image and output the normalized value;
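The resize-and-normalize preprocessing can be sketched as below. The nearest-neighbour resampling is a simplification chosen for brevity (the patent does not specify an interpolation method), and pixel values are assumed to start in [0, 255].

```python
import numpy as np

def preprocess(image, size=224, mean=0.5, std=0.5):
    """Resize to a size x size square via nearest-neighbour sampling,
    scale pixel values to [0, 1], then apply (input - mean) / std."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size   # nearest source row per output row
    cols = np.arange(size) * w // size   # nearest source column
    square = image[rows][:, cols]
    return (square / 255.0 - mean) / std
```

With mean = std = 0.5 the output lies in [−1, 1], which matches the stated normalization parameters.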
S1.2: inputting Image1 into the valid-frame discrimination network, which outputs the probabilities P1(x) and P2(x) that the frame image is a valid frame or an invalid frame respectively; if P1(x) < P2(x), the frame image is an invalid frame; otherwise it is a valid frame;
In step S1.2, the valid-frame discrimination network is a lightweight network composed of several basic network blocks; each block consists of a 1 × 1 ordinary convolution and a 3 × 3 depth-wise convolution, with channel split and channel shuffle operations performed at the head and tail of the block respectively.
Step S2 specifically includes:
S2.1: resizing the original image into a square and normalizing each channel to obtain Image2; Image2 is larger than Image1 so as to retain more image information;
S2.2: inputting Image2 into the YOLOv5 target detection network for head recognition, which outputs the coordinates of the head's bounding rectangle on the original image: xl, yl, xr, yr;
where xl, yl are the x and y coordinates of the top-left corner of the bounding rectangle, and xr, yr are the x and y coordinates of the bottom-right corner;
s2.3: and extracting a head image from the original image according to the coordinates of the head circumscribed rectangular frame.
Step S3 specifically includes:
S3.1: resizing the head image into a square (the preferred image size in this embodiment is 224 × 224) and normalizing each channel to obtain Image3;
S3.2: inputting Image3 into the pose estimation network to obtain the probability distribution sequences and estimated angles for the pitch, yaw and roll angles;
The probability distribution sequences are respectively recorded as Pp = (p1, p2, …, pN), Py = (p1, p2, …, pN) and Pr = (p1, p2, …, pN), where N ∈ {66, 120, 66} for the three sequences respectively, and each sequence satisfies pi ∈ [0, 1] with Σi pi = 1;
the pitch angle range is [−99°, 99°], the yaw angle range is [−180°, 180°], and the roll angle range is [−99°, 99°];
The pose estimation network of step S3.2 comprises two parts: a backbone network for feature extraction and a full-connection layer network for regression classification;
The core structure of the backbone network consists of 16 mobile inverted bottleneck convolution blocks. Each block first changes the number of channels through a 1 × 1 convolution and then performs a depth-wise convolution, followed by a squeeze-and-excitation module that makes the model attend more to important channel features; finally, the channel dimension is restored through a 1 × 1 convolution, and drop-connect and skip-connection operations increase the generalization and learning capability of the model;
the full-connection layer network comprises three branches in total, the three branches are used for calculating a pitch angle, a yaw angle and a roll angle respectively, and the output of each branch is a sequence with the length of 66 or 120;
Each branch comprises a classification branch and a regression branch. In the classification branch, each sequence unit outputs the probability of its class, and the probability distribution of the branch is obtained through a softmax function; in the regression branch, each sequence unit represents a rotation of 3°, and the expectation over the probability distribution is computed as the estimated angle for that dimension;
The expectation is calculated as:

θpred = 3 · Σ(i=1…N) i · pi − 3N/2    (1)

where θpred represents the estimated angle, pi is the probability of the i-th unit of the sequence, and N is the sequence length;
The head pose estimation network is obtained by training; its parameters are updated by minimizing a cost function Ltotal;
the cost function consists of a plurality of loss functions:
Ltotal=0.4*Lreg+0.4*Lcls+0.2*Lmd (2)
where Lreg represents the loss of the regression branch; to ensure that the angle penalty does not exceed 180°, the loss function is defined as:

Lreg = min(|θpred − θgt|, 360° − |θpred − θgt|)    (3)

where θgt is the ground-truth angle;
Lcls represents the loss of the classification branch, whose loss function uses binary cross-entropy;
Lmd represents the loss of the evaluation network formed by one-dimensional convolutions, which regularizes the probability distribution features output by the pose estimation network.
Step S4 includes:
s4.1: inputting the probability distribution sequence output by the head attitude estimation network into a one-dimensional convolution network for accuracy evaluation, if the accuracy output by the one-dimensional convolution network is less than 0.5, returning to the step S1, otherwise, entering the step S4.2;
S4.2: judging the head posture according to the obtained θp, θy, θr;
s4.3: and if the head posture is judged to be abnormal in S4.2, voice reminding is carried out aiming at the abnormal posture.
In step S4.3, the head posture is judged from the estimated angles of the three dimensions according to preset angle intervals, and the abnormal postures include: head too low, head tilt, and distraction (mind wandering);
when the pitch angle is within [−99°, −45°], the current posture is judged as the abnormal posture "head too low"; when the roll angle is not within [−30°, 30°], the current posture is judged as "head tilt"; and when the yaw angle is not within [−45°, 45°], the current posture is judged as "distraction".
The children sitting posture reminding system based on the head posture estimation comprises an image input module, a head detection module, a posture estimation module and a sitting posture reminding module;
the image input module acquires an original image from the camera equipment and judges whether a child is in a read-write state in a current scene through an effective frame detection algorithm so as to determine whether to input the original image into a next module, and the method specifically comprises the following steps:
S1.1: resizing the original image into a square (the preferred image size in this embodiment is 224 × 224) and normalizing each channel with mean = [0.5, 0.5, 0.5] and std = [0.5, 0.5, 0.5] to obtain Image1, where mean denotes the per-channel mean and std the standard deviation; the normalization can be expressed as:

output = (input − mean) / std

where input denotes a pixel value of the original image and output the normalized value;
S1.2: inputting Image1 into the valid-frame discrimination network, which outputs the probabilities P1(x) and P2(x) that the frame image is a valid frame or an invalid frame respectively; if P1(x) < P2(x), the frame image is an invalid frame; otherwise it is a valid frame;
In step S1.2, the valid-frame discrimination network is a lightweight network composed of several basic network blocks; each block consists of a 1 × 1 ordinary convolution and a 3 × 3 depth-wise convolution, with channel split and channel shuffle operations performed at the head and tail of the block respectively.
The head detection module uses the trained target detection model to obtain the position information of the head's bounding rectangle in the original image, comprising the x and y coordinates of the top-left and bottom-right corners; the head image is cropped according to these coordinates and input into the next module, and specifically comprises:
S2.1: resizing the original image into a square and normalizing each channel to obtain Image2; Image2 is larger than Image1 so as to retain more image information;
S2.2: inputting Image2 into the YOLOv5 target detection network for head recognition, which outputs the coordinates of the head's bounding rectangle on the original image: xl, yl, xr, yr;
where xl, yl are the x and y coordinates of the top-left corner of the bounding rectangle, and xr, yr are the x and y coordinates of the bottom-right corner;
s2.3: and extracting a head image from the original image according to the coordinates of the head circumscribed rectangular frame.
The attitude estimation module obtains a probability distribution sequence and an estimation angle of three dimensionalities of an Euler angle by inputting a head image into a trained head attitude estimation network, wherein the probability distribution sequence contains probability distribution characteristics of the probability distribution sequence and is used for judging the accuracy of the estimation angle, and the attitude estimation module specifically comprises:
S3.1: resizing the head image into a square (the preferred image size in this embodiment is 224 × 224) and normalizing each channel to obtain Image3;
S3.2: inputting Image3 into the pose estimation network to obtain the probability distribution sequences and estimated angles for the pitch, yaw and roll angles;
The probability distribution sequences are respectively recorded as Pp = (p1, p2, …, pN), Py = (p1, p2, …, pN) and Pr = (p1, p2, …, pN), where N ∈ {66, 120, 66} for the three sequences respectively, and each sequence satisfies pi ∈ [0, 1] with Σi pi = 1;
the pitch angle range is [−99°, 99°], the yaw angle range is [−180°, 180°], and the roll angle range is [−99°, 99°];
The pose estimation network of step S3.2 comprises two parts: a backbone network for feature extraction and a fully connected layer network for regression and classification;
the core structure of the backbone network is composed of 16 mobile inverted bottleneck convolution (MBConv) blocks. Each block first changes the number of channels through a 1 × 1 convolution, then performs a depthwise convolution, and is then connected to a squeeze-and-excitation module, which lets the model attend more to the important channel features; finally, the channel dimensionality is restored through a 1 × 1 convolution, and drop-connect and skip-connection operations increase the generalization and learning capability of the model;
the fully connected layer network comprises three branches in total, which calculate the pitch angle, the yaw angle and the roll angle respectively; the output of each branch is a sequence of length 66 or 120;
each branch comprises a classification sub-branch and a regression sub-branch. In the classification sub-branch, each sequence unit outputs the probability of its class, and the probability distribution of the branch is obtained through a softmax function; in the regression sub-branch, each sequence unit represents a rotation of 3 degrees, and the expectation over the probability distribution is computed as the estimated angle of that dimension;
the expectation is calculated as:

θpred = 3 · Σ_{i=1}^{N} (pi · i) − 3N/2    (1)
wherein θpred represents the estimated angle, pi is the probability of the i-th unit of the sequence, and N is the sequence length;
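The expectation described above (softmax probabilities over N bins of 3° each, centred so that N = 66 covers roughly [-99°, 99°] and N = 120 covers roughly [-180°, 180°]) can be sketched as follows; expected_angle is a hypothetical name, and the exact bin indexing in the patent's formula image may differ slightly:

```python
import numpy as np

def expected_angle(logits: np.ndarray) -> float:
    """Turn a length-N sequence of per-bin logits into an angle:
    softmax -> probability distribution -> expectation -> degrees.
    Each bin spans 3 degrees; subtracting 3*N/2 centres the range
    on zero."""
    n = len(logits)
    p = np.exp(logits - logits.max())
    p /= p.sum()                           # softmax over the N bins
    return float(3.0 * np.sum(p * np.arange(1, n + 1)) - 1.5 * n)

# All probability mass on the middle bin of a 66-bin head -> 0 degrees.
logits = np.full(66, -1e9)
logits[32] = 0.0                           # bin i = 33 (1-indexed)
print(expected_angle(logits))              # 0.0
```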
the head posture estimation network is obtained by training; its parameters are updated by minimizing a cost function Ltotal;
the cost function consists of a plurality of loss functions:
Ltotal = 0.4 · Lreg + 0.4 · Lcls + 0.2 · Lmd    (2)
wherein Lreg represents the loss of the regression branch; to ensure that the penalty on an angle never exceeds 180 degrees, the loss function is defined as:

Lreg = min(|θpred − θgt|, 360° − |θpred − θgt|)    (3)

where θgt denotes the ground-truth angle;
Lcls represents the loss of the classification branch, whose loss function uses binary cross entropy;
Lmd represents the loss of the evaluation network formed by one-dimensional convolutions, which is used to regularize the probability distribution characteristics output by the attitude estimation network.
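One plausible reading of the 180-degree bound on the regression penalty is a wrapped angular error; the sketch below combines it with binary cross entropy under the 0.4/0.4/0.2 weights of formula (2). The function names are hypothetical, and Lmd (the one-dimensional-convolution evaluation loss) is passed in as a plain value since its exact form is not given:

```python
import math

def reg_loss(theta_pred: float, theta_gt: float) -> float:
    """Wrapped angular error: the penalty can never exceed 180 degrees."""
    d = abs(theta_pred - theta_gt)
    return min(d, 360.0 - d)

def cls_loss(p: float, target: float, eps: float = 1e-12) -> float:
    """Binary cross entropy for one classification-branch unit."""
    p = min(max(p, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def total_loss(l_reg: float, l_cls: float, l_md: float) -> float:
    """L_total = 0.4*L_reg + 0.4*L_cls + 0.2*L_md, as in formula (2)."""
    return 0.4 * l_reg + 0.4 * l_cls + 0.2 * l_md

print(reg_loss(179.0, -179.0))  # 2.0, not 358.0
```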
The sitting posture reminding module first inputs the probability distribution sequence into a one-dimensional convolution network that evaluates the accuracy of the angle estimated by the head posture estimation network. If the accuracy is less than 0.5, the head posture is not judged; otherwise, whether the head posture is abnormal is judged according to the estimated values of the pitch, yaw and roll angles, and a corresponding reminder is given if it is abnormal. The module specifically comprises the following steps:
S4.1: inputting the probability distribution sequence output by the head posture estimation network into a one-dimensional convolution network for accuracy evaluation; if the accuracy output by the one-dimensional convolution network is less than 0.5, return to step S1; otherwise, proceed to step S4.2;
S4.2: judging the head posture according to the obtained θp, θy, θr;
S4.3: if the head posture is judged to be abnormal in S4.2, giving a voice reminder for the abnormal posture;
in step S4.3, the head posture is judged according to the estimated angles of the three dimensions and preset angle intervals, and the abnormal postures comprise: head too low, head tilt, and mind wandering (distraction);
when the pitch angle is within the interval [-99°, -45°], the current posture is judged to be the abnormal posture "head too low"; when the roll angle is not within the interval [-30°, 30°], the current posture is judged to be the abnormal posture "head tilt"; and when the yaw angle is not within the interval [-45°, 45°], the current posture is judged to be the abnormal posture "mind wandering".
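The interval rules above translate directly into a small decision function; this is an illustrative sketch with hypothetical names, returning at most one abnormal-posture label per frame:

```python
from typing import Optional

def judge_posture(pitch: float, yaw: float, roll: float) -> Optional[str]:
    """Map estimated Euler angles (degrees) to an abnormal-posture label,
    or None when the head posture is normal."""
    if -99.0 <= pitch <= -45.0:
        return "head too low"
    if not (-30.0 <= roll <= 30.0):
        return "head tilt"
    if not (-45.0 <= yaw <= 45.0):
        return "mind wandering"
    return None

print(judge_posture(-60.0, 0.0, 0.0))   # head too low
print(judge_posture(-10.0, 0.0, 40.0))  # head tilt
print(judge_posture(-10.0, 60.0, 0.0))  # mind wandering
print(judge_posture(-10.0, 0.0, 0.0))   # None
```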
The embodiments described in this specification are merely illustrative of implementations of the inventive concept. The scope of the present invention should not be considered limited to the specific forms set forth in the embodiments; it also covers the equivalents thereof that may occur to those skilled in the art upon consideration of the inventive concept.

Claims (9)

1. A child sitting posture reminding method based on head posture estimation is characterized in that: the method comprises the following steps:
S1: acquiring an original image from the camera equipment and inputting it into an effective frame discrimination network to detect effective frames: if the probability of the image being an effective frame is less than the probability of its being an ineffective frame, the original image is determined to be an ineffective frame, otherwise an effective frame; effective frames are retained;
S2: performing head detection on the original image by using a target detection algorithm, and extracting a head image according to the detection result;
S3: inputting the head image into a head posture estimation network to obtain the probability distribution sequences p, y, r ∈ R^N of the three Euler-angle dimensions and the final estimated angles: pitch angle θp, yaw angle θy and roll angle θr;
S4: inputting the probability distribution sequences into a one-dimensional convolution network for accuracy evaluation; returning to step S1 if the accuracy output by the one-dimensional convolution network is less than 0.5, otherwise judging the head posture according to the estimated angles of step S3 and giving a voice reminder for an erroneous posture.
2. The method for reminding the sitting posture of the child based on the head posture estimation as claimed in claim 1, wherein: the step S1 specifically includes:
S1.1: adjusting the original image into a square and normalizing each channel to obtain Image1;
S1.2: inputting Image1 into the effective frame discrimination network, which outputs the probabilities P1(x), P2(x) that the frame image is an effective frame and an ineffective frame respectively; if P1(x) < P2(x), the frame image is an ineffective frame, otherwise it is an effective frame.
3. The method for reminding the sitting posture of the child based on the head posture estimation as claimed in claim 2, wherein: in step S1.2, the effective frame discrimination network is a lightweight network composed of a plurality of basic network blocks, wherein each basic network block is composed of a 1 × 1 ordinary convolution and a 3 × 3 depthwise convolution, and channel split and channel shuffle operations are performed at the head and the tail of the network block, respectively.
4. The method for reminding the sitting posture of the child based on the head posture estimation as claimed in claim 1, wherein: the step S2 specifically includes:
S2.1: adjusting the size of the original image into a square and normalizing each channel to obtain Image2; the size of Image2 is larger than that of Image1;
S2.2: inputting Image2 into a YOLOv5 target detection network for head recognition, and outputting the coordinates of the head circumscribed rectangular frame on the original image: xl, yl, xr, yr;
wherein xl, yl are respectively the x and y coordinates of the upper left corner of the circumscribed rectangular frame, and xr, yr are respectively the x and y coordinates of the lower right corner of the circumscribed rectangular frame;
s2.3: and extracting a head image from the original image according to the coordinates of the head circumscribed rectangular frame.
5. The method for reminding the sitting posture of the child based on the head posture estimation as claimed in claim 1, wherein: the step S3 specifically includes:
S3.1: adjusting the size of the head image into a square and normalizing each channel to obtain Image3;
S3.2: inputting Image3 into the attitude estimation network to obtain the probability distribution sequences of the pitch angle, the yaw angle and the roll angle, together with the estimated angles;
the probability distribution sequences are denoted p, y, r ∈ R^N, where N ∈ {66, 120, 66}, and the estimated angles are denoted θp, θy, θr;
the pitch angle range is [-99°, 99°], the yaw angle range is [-180°, 180°], and the roll angle range is [-99°, 99°].
6. The method for reminding the sitting posture of the child based on the head posture estimation as claimed in claim 5, wherein: the pose estimation network of step S3.2 comprises two parts: a backbone network for feature extraction and a fully connected layer network for regression and classification;
the core structure of the backbone network is composed of 16 mobile inverted bottleneck convolution (MBConv) blocks. Each block first changes the number of channels through a 1 × 1 convolution, then performs a depthwise convolution, and is then connected to a squeeze-and-excitation module, which lets the model attend more to the important channel features; finally, the channel dimensionality is restored through a 1 × 1 convolution, and drop-connect and skip-connection operations increase the generalization and learning capability of the model;
the fully connected layer network comprises three branches in total, which calculate the pitch angle, the yaw angle and the roll angle respectively; the output of each branch is a sequence of length 66 or 120;
each branch comprises a classification sub-branch and a regression sub-branch. In the classification sub-branch, each sequence unit outputs the probability of its class, and the probability distribution of the branch is obtained through a softmax function; in the regression sub-branch, each sequence unit represents a rotation of 3 degrees, and the expectation over the probability distribution is computed as the estimated angle of that dimension;
the expectation is calculated as:

θpred = 3 · Σ_{i=1}^{N} (pi · i) − 3N/2    (1)
wherein θpred represents the estimated angle, pi is the probability of the i-th unit of the sequence, and N is the sequence length;
the head posture estimation network is obtained by training; its parameters are updated by minimizing a cost function Ltotal;
the cost function consists of a plurality of loss functions:
Ltotal = 0.4 · Lreg + 0.4 · Lcls + 0.2 · Lmd    (2)
wherein Lreg represents the loss of the regression branch; to ensure that the penalty on an angle never exceeds 180 degrees, the loss function is defined as:

Lreg = min(|θpred − θgt|, 360° − |θpred − θgt|)    (3)

where θgt denotes the ground-truth angle;
Lcls represents the loss of the classification branch, whose loss function uses binary cross entropy;
Lmd represents the loss of the evaluation network formed by one-dimensional convolutions, which is used to regularize the probability distribution characteristics output by the attitude estimation network.
7. The method for reminding the sitting posture of the child based on the head posture estimation as claimed in claim 1, wherein: the step S4 includes:
S4.1: inputting the probability distribution sequence output by the head posture estimation network into a one-dimensional convolution network for accuracy evaluation; if the accuracy output by the one-dimensional convolution network is less than 0.5, return to step S1; otherwise, proceed to step S4.2;
S4.2: judging the head posture according to the obtained θp, θy, θr;
S4.3: if the head posture is judged to be abnormal in S4.2, giving a voice reminder for the abnormal posture.
8. The method for reminding the sitting posture of the child based on the head posture estimation as claimed in claim 7, wherein: in step S4.3, the head posture is judged according to the estimated angles of the three dimensions and preset angle intervals, and the abnormal postures comprise: head too low, head tilt, and mind wandering (distraction);
when the pitch angle is within the interval [-99°, -45°], the current posture is judged to be the abnormal posture "head too low"; when the roll angle is not within the interval [-30°, 30°], the current posture is judged to be the abnormal posture "head tilt"; and when the yaw angle is not within the interval [-45°, 45°], the current posture is judged to be the abnormal posture "mind wandering".
9. A system for implementing the child sitting posture reminding method based on head posture estimation as claimed in claim 1, wherein: the system comprises an image input module, a head detection module, a posture estimation module and a sitting posture reminding module;
the image input module acquires an original image from the camera equipment and judges whether a child is in a read-write state in the current scene through an effective frame detection algorithm so as to determine whether the original image is input into the next module;
the head detection module is used for obtaining the position information of a circumscribed rectangular frame of the head in the original image by using the trained target detection model, the position information comprises x-axis coordinates and y-axis coordinates of the upper left corner and the lower right corner, and the head image is obtained by cutting according to the coordinates and is input into the next module;
the attitude estimation module inputs the head image into a trained head posture estimation network to obtain a probability distribution sequence and an estimated angle for each of the three Euler-angle dimensions, wherein the probability distribution sequence carries probability distribution characteristics used to judge the accuracy of the estimated angle;
the sitting posture reminding module first inputs the probability distribution sequence into a one-dimensional convolution network that evaluates the accuracy of the angle estimated by the head posture estimation network; if the accuracy is less than 0.5, the head posture is not judged; otherwise, whether the head posture is abnormal is judged according to the estimated values of the pitch angle, yaw angle and roll angle, and a corresponding reminder is given if it is abnormal;
the image input module, the head detection module, the posture estimation module and the sitting posture reminding module are sequentially connected.
CN202111551860.0A 2021-12-17 2021-12-17 Child sitting posture reminding method and system based on head posture estimation Pending CN114283448A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111551860.0A CN114283448A (en) 2021-12-17 2021-12-17 Child sitting posture reminding method and system based on head posture estimation

Publications (1)

Publication Number Publication Date
CN114283448A true CN114283448A (en) 2022-04-05


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117275069A (en) * 2023-09-26 2023-12-22 华中科技大学 End-to-end head gesture estimation method based on learnable vector and attention mechanism



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination