CN114283448A - Child sitting posture reminding method and system based on head posture estimation - Google Patents

Child sitting posture reminding method and system based on head posture estimation

Info

Publication number
CN114283448A
CN114283448A (application CN202111551860.0A)
Authority
CN
China
Prior art keywords
head
posture
network
estimation
angle
Prior art date
Legal status
Pending
Application number
CN202111551860.0A
Other languages
Chinese (zh)
Inventor
宣琦
宋栩杰
周洁韵
翔云
邱君瀚
Current Assignee
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT
Priority to CN202111551860.0A
Publication of CN114283448A


Landscapes

  • Image Analysis (AREA)
  • Image Processing (AREA)

Abstract

The invention discloses a child sitting posture reminding method based on head posture estimation, which comprises the following steps: S1: acquiring an original image from the camera equipment, detecting whether it is a valid frame, and retaining valid frames; S2: extracting a head image from the original image using a target detection algorithm; S3: inputting the head image into a head pose estimation network to obtain probability distribution sequences and final estimated angles for the pitch, yaw and roll angles; S4: inputting the probability distribution sequences into a one-dimensional convolutional network to verify accuracy, judging the head posture according to the estimated angles, and issuing a voice reminder for incorrect postures. The invention further comprises a child sitting posture reminding system based on head posture estimation. Through the camera equipment, the invention can perform fine-grained recognition of children's head Euler angles in writing and reading scenes and remind children of incorrect postures, with a low false alarm rate.

Description

Child sitting posture reminding method and system based on head posture estimation
Technical Field
The invention belongs to the field of computer vision, and mainly relates to a child sitting posture reminding method and system based on head posture estimation.
Background
Domestic research results show that poor reading and writing posture is associated with the onset and progression of myopia, and that the rate of incorrect reading and writing posture among students is as high as 70%, or even more than 85%. Correcting children's reading and writing posture is therefore an important way to prevent the onset and progression of myopia.
At present, physical correction devices are common on the market; they standardize sitting posture by restricting the range of motion of the body and head. For example, the design patent with application No. CN202130311380.1, a sitting posture corrector for children, is fixed on a table top and keeps the chest at a distance from the table; similarly, the design patent with application No. CN202030635389.3, a children's sitting posture correction frame, is a correction frame fixed on a table top that keeps the head at a distance from the table top and prevents it from dropping too low. Such physical devices merely keep the body out of a certain area: they cannot ensure that the child's head posture is correct while writing, and they occupy desktop space, which affects reading and writing comfort.
As computer vision technology has matured, related applications have used vision techniques for human state recognition. For example, the invention patent with application No. 201711070372.1 discloses a gaze-based driver attention detection method: it obtains face key-point coordinates through a 2D face key-point detection algorithm, constructs a 3D head model to extract the driver's 3D face features, calculates the gaze direction in a 3D coordinate system from the 2D and 3D key-point coordinates combined with the spatial relation of the eyes, and finally takes the gaze direction as the attention direction. The core step of that scheme is establishing a mapping from the 2D coordinate system to a 3D head model; such geometric-deformation-model methods depend heavily on the accuracy of face key-point detection and are easily affected by the driver's appearance, lighting, and other factors in real application scenarios.
Currently, mainstream head pose estimation methods in academia can be divided into regression and classification approaches. Regression methods use or fit a mathematical model to predict poses directly from labeled training data; the regressor may be principal component analysis, a neural network, and so on. Classification methods instead predict poses from a discrete set of poses, so the prediction resolution is often low; the classifier may be a decision tree, a random forest, a neural network, and so on. For example, the invention patent with application No. 202011019897.4 discloses a head pose estimation method based on multi-level image feature refinement learning, together with an implementation system and storage medium. That scheme uses wavelet transforms to expand the image information in the data processing stage; in the pose estimation stage it classifies first and then regresses, using a coarse-grained classification network to estimate the approximate interval of the head pose and then a fine-grained regression network to obtain the specific angle value. The scheme performs data augmentation and refines the model into two stages, but its coarse-grained network divides the range into only 5 sub-intervals, which helps the fine-grained regression stage little while increasing the complexity of the whole pipeline.
Disclosure of Invention
The invention overcomes the defects of the prior art, provides a child sitting posture reminding method and system based on head posture estimation, and provides powerful support for solving the sitting posture problem of teenagers when reading and writing characters, thereby contributing to prevention and control of the occurrence and development of teenager myopia.
In order to achieve the purpose, the invention provides the following scheme: the invention provides a child sitting posture reminding method based on head posture estimation, which comprises the following steps of:
S1: acquiring an original image from the camera equipment and inputting it into the valid-frame discrimination network; if the probability that the image is a valid frame is less than the probability that it is an invalid frame, the original image is judged an invalid frame; otherwise it is a valid frame and is retained;
s2: performing head detection on the original image by using a target detection algorithm, and extracting a head image according to a detection result;
S3: inputting the head image into a head pose estimation network to obtain the probability distribution sequences Pp, Py, Pr over the three Euler-angle dimensions and the final estimated angles: pitch angle θp, yaw angle θy, and roll angle θr;
S4: inputting the probability distribution sequences into a one-dimensional convolutional network for accuracy evaluation; if the accuracy output by the network is less than 0.5, returning to step S1; otherwise, judging the head posture according to the estimated angles of step S3 and issuing a voice reminder for incorrect postures.
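The S1–S4 loop can be sketched end to end as follows. This is a minimal illustrative sketch only: the patent discloses no model weights, so all three networks are replaced by stand-in functions, and every function name and heuristic here is an assumption rather than the patent's implementation.

```python
import numpy as np

def is_valid_frame(frame):
    """S1 stand-in: treat a near-black frame as an invalid (empty) scene."""
    return float(frame.mean()) > 10.0

def detect_head(frame):
    """S2 stand-in: return a fixed bounding rectangle (xl, yl, xr, yr)."""
    return 40, 20, 200, 180

def estimate_pose(head_image):
    """S3 stand-in: return (pitch, yaw, roll) in degrees."""
    return -10.0, 5.0, 2.0

def process(frame, judge):
    """Run one frame through S1-S4; `judge` maps angles to a verdict."""
    if not is_valid_frame(frame):
        return None          # invalid frame: loop back to S1
    xl, yl, xr, yr = detect_head(frame)
    head = frame[yl:yr, xl:xr]          # crop the head region
    pitch, yaw, roll = estimate_pose(head)
    return judge(pitch, yaw, roll)      # S4: posture judgment / reminder
```

In a real deployment the stand-ins would be replaced by the discrimination, YOLOv5 and pose estimation networks, and `judge` would trigger the voice reminder.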
Preferably, step S1 specifically includes:
S1.1: resizing the original image into a square and normalizing each channel to obtain Image1;
S1.2: inputting Image1 into the valid-frame discrimination network, which outputs the probabilities P1(x) and P2(x) that the frame image is a valid frame or an invalid frame respectively; if P1(x) < P2(x), the frame image is an invalid frame; otherwise it is a valid frame.
Preferably, the valid-frame discrimination network in step S1.2 is a lightweight network composed of several basic network blocks; each block consists of a 1 × 1 ordinary convolution and a 3 × 3 depth-wise convolution, with channel split and channel shuffle operations performed at the head and tail of the block respectively.
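The channel split and channel shuffle operations at the head and tail of such a block can be sketched in NumPy as follows. This is an illustrative sketch in the ShuffleNet style; the (C, H, W) array layout and two-group shuffle are assumptions, not details stated in the patent.

```python
import numpy as np

def channel_split(x):
    """Head of the block: split the channels of a (C, H, W) tensor in half."""
    c = x.shape[0] // 2
    return x[:c], x[c:]

def channel_shuffle(x, groups=2):
    """Tail of the block: interleave channels across `groups` so that
    information mixes between the two branches on the next split."""
    c, h, w = x.shape
    return (x.reshape(groups, c // groups, h, w)
             .transpose(1, 0, 2, 3)
             .reshape(c, h, w))
```

For a 4-channel tensor and two groups, the shuffle reorders channels as 0, 2, 1, 3, so each half of the next split sees channels from both branches.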
Preferably, step S2 specifically includes:
S2.1: resizing the original image into a square and normalizing each channel to obtain Image2; Image2 is larger than Image1 so as to retain more image information;
S2.2: inputting Image2 into the YOLOv5 target detection network for head recognition, which outputs the coordinates of the head's bounding rectangle on the original image: xl, yl, xr, yr;
where xl, yl are the x and y coordinates of the top-left corner of the bounding rectangle, and xr, yr are the x and y coordinates of the bottom-right corner;
s2.3: and extracting the head image from the original image according to the coordinates of the head circumscribed rectangular frame.
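Step S2.3 amounts to slicing the original image with the bounding-rectangle coordinates. A minimal sketch (the clipping to image bounds is an added safeguard, not something the patent specifies):

```python
import numpy as np

def crop_head(image, xl, yl, xr, yr):
    """Cut the head region out of the original image given the bounding
    rectangle: (xl, yl) top-left, (xr, yr) bottom-right, clipped to bounds."""
    h, w = image.shape[:2]
    xl, xr = max(0, xl), min(w, xr)
    yl, yr = max(0, yl), min(h, yr)
    return image[yl:yr, xl:xr]
```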
Preferably, step S3 specifically includes:
S3.1: resizing the head image into a square and normalizing each channel to obtain Image3;
S3.2: inputting Image3 into the pose estimation network to obtain the probability distribution sequences and estimated angles for the pitch, yaw and roll angles;
The probability distribution sequences are respectively recorded as Pp = (p1, p2, …, pN), Py = (p1, p2, …, pN) and Pr = (p1, p2, …, pN), where N ∈ {66, 120, 66} for the three sequences respectively, and each sequence satisfies pi ∈ [0, 1] with Σi pi = 1;
the pitch angle range is [−99°, 99°], the yaw angle range is [−180°, 180°], and the roll angle range is [−99°, 99°].
Preferably, the pose estimation network of step S3.2 comprises two parts: a backbone network for feature extraction and a full-connection layer network for regression classification;
The core structure of the backbone network consists of 16 mobile inverted bottleneck convolution blocks. Each block first changes the number of channels through a 1 × 1 convolution and then performs a depth-wise convolution, followed by a squeeze-and-excitation module that makes the model attend more to important channel features; finally, the channel dimension is restored through a 1 × 1 convolution, and drop-connect and skip-connection operations increase the generalization and learning capability of the model;
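The squeeze-and-excitation step inside such a block can be sketched as follows. This is an illustrative NumPy sketch under assumed conventions: (C, H, W) layout, ReLU-then-sigmoid gating, and weight shapes for a reduction ratio r; the patent does not disclose these details.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def squeeze_excite(x, w1, w2):
    """x: feature map (C, H, W). Squeeze: global average pooling to (C,);
    excite: two small FC layers producing per-channel gates in (0, 1);
    scale: reweight the channels so the model attends to important ones.
    w1: (C // r, C), w2: (C, C // r) for reduction ratio r."""
    s = x.mean(axis=(1, 2))                        # squeeze
    gates = sigmoid(w2 @ np.maximum(w1 @ s, 0.0))  # excite
    return x * gates[:, None, None]                # channel-wise scaling
```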
the full-connection layer network comprises three branches in total, the three branches are used for calculating a pitch angle, a yaw angle and a roll angle respectively, and the output of each branch is a sequence with the length of 66 or 120;
Each branch comprises a classification branch and a regression branch. In the classification branch, each sequence unit outputs the probability of its class, and the probability distribution of the branch is obtained through a softmax function; in the regression branch, each sequence unit represents a rotation of 3°, and the expectation over the probability distribution is computed as the estimated angle for that dimension;
The expectation is calculated as:

θpred = 3 · Σ(i=1…N) i · pi − 3N/2    (1)

where θpred represents the estimated angle, pi is the probability of the i-th unit of the sequence, and N is the sequence length;
The head pose estimation network is obtained by training; its parameters are updated by minimizing a cost function Ltotal;
the cost function consists of a plurality of loss functions:
Ltotal=0.4*Lreg+0.4*Lcls+0.2*Lmd (2)
where Lreg represents the loss of the regression branch; to ensure that the angle penalty does not exceed 180°, the loss function is defined as:

Lreg = min(|θpred − θgt|, 360° − |θpred − θgt|)    (3)

where θgt is the ground-truth angle;
Lcls represents the loss of the classification branch, whose loss function uses binary cross-entropy;
Lmd represents the loss of the evaluation network formed by one-dimensional convolutions, which regularizes the probability distribution features output by the pose estimation network.
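The three loss terms and their weighted combination (equation 2) can be sketched as follows. The wrap-around regression loss is a reconstruction consistent with the stated 180° cap; the element-wise binary cross-entropy form is an assumption.

```python
import numpy as np

def wrapped_abs_error(theta_pred, theta_gt):
    """Regression loss capped at 180 degrees: take the shorter way
    around the circle between predicted and ground-truth angles."""
    d = abs(theta_pred - theta_gt) % 360.0
    return min(d, 360.0 - d)

def binary_cross_entropy(p, y, eps=1e-12):
    """Classification-branch loss over per-bin probabilities."""
    p = np.clip(p, eps, 1.0 - eps)
    return float(-np.mean(y * np.log(p) + (1.0 - y) * np.log(1.0 - p)))

def total_loss(l_reg, l_cls, l_md):
    """Cost function: L_total = 0.4*L_reg + 0.4*L_cls + 0.2*L_md."""
    return 0.4 * l_reg + 0.4 * l_cls + 0.2 * l_md
```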
Preferably, step S4 specifically includes:
S4.1: inputting the probability distribution sequences output by the head pose estimation network into a one-dimensional convolutional network for accuracy evaluation. The accuracy index measures the correlation between the probability distribution features output by the head pose estimation network and the accuracy of the final estimated angle: the closer the probability distribution features are to the true distribution, the higher the accuracy and the more credible the estimation result. If the accuracy output by the one-dimensional convolutional network is less than 0.5, return to step S1; otherwise, proceed to step S4.2;
S4.2: judging the head posture according to the obtained θp, θy, θr;
s4.3: and if the head posture is judged to be abnormal in S4.2, voice reminding is carried out aiming at the abnormal posture.
Preferably, in step S4.3, the head posture is determined from the estimated angles of the three dimensions according to preset angle intervals, and the abnormal postures include: head too low, head tilt, and distraction (mind wandering);
when the pitch angle is within [−99°, −45°], the current posture is judged as the abnormal posture "head too low"; when the roll angle is not within [−30°, 30°], the current posture is judged as "head tilt"; and when the yaw angle is not within [−45°, 45°], the current posture is judged as "distraction".
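The interval rules above translate directly into a rule-based classifier. A minimal sketch; the order in which the three rules are checked is an assumption, since the patent does not state a priority among them:

```python
def judge_posture(pitch, yaw, roll):
    """Rule-based posture judgment using the preset angle intervals;
    returns the abnormal posture name, or None if the posture is normal."""
    if -99.0 <= pitch <= -45.0:
        return "head too low"
    if not (-30.0 <= roll <= 30.0):
        return "head tilt"
    if not (-45.0 <= yaw <= 45.0):
        return "distraction"
    return None
```

A returned non-None value would trigger the corresponding voice reminder in step S4.3.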
The invention also provides a child sitting posture reminding system based on head posture estimation, which comprises:
the device comprises an image input module, a head detection module, a posture estimation module and a sitting posture reminding module;
the image input module acquires an original image from the camera equipment and judges whether a child is in a read-write state in the current scene through an effective frame detection algorithm so as to determine whether to input the original image into a next module;
the head detection module is used for obtaining the position information of a circumscribed rectangular frame of the head in the original image by using the trained target detection model, the position information comprises x-axis coordinates and y-axis coordinates of the upper left corner and the lower right corner, and the head image is obtained by cutting according to the coordinates and is input into the next module;
the pose estimation module inputs the head image into a trained head pose estimation network to obtain the probability distribution sequences and estimated angles of the three Euler-angle dimensions; the probability distribution sequences carry the probability distribution features used to judge the accuracy of the estimated angles;
the sitting posture reminding module first inputs the probability distribution sequences into a one-dimensional convolutional network that evaluates the accuracy of the estimated angles; if the accuracy is less than 0.5, no head-posture judgment is made; otherwise, the module judges whether the head posture is abnormal according to the estimated pitch, yaw and roll angles, and issues a corresponding reminder if it is;
the image input module, the head detection module, the posture estimation module and the sitting posture reminding module are connected in sequence.
The technical conception of the invention is as follows:
after an original image is obtained from the camera equipment, effective frame detection is firstly carried out, and scenes where no person exists or a human body is too far away from the camera equipment are eliminated; after the fact that a person is in front of the camera equipment is confirmed, detecting the head of the person in the scene through a target detection network, recording position information of the person, acquiring a head image from an original image according to the position information, inputting the head image into a head posture estimation network, and obtaining a probability distribution sequence of three dimensions of the head Euler angle and a final estimation angle; in order to ensure the accuracy of the estimated angle, the probability distribution sequence is input into a one-dimensional convolution network for accuracy evaluation, if the accuracy is more than 0.5, the head posture is judged according to the estimated angle of the head posture estimation network, and the voice reminding is carried out on the wrong sitting posture.
The invention has the beneficial effects that:
1) the invention provides a head posture estimation algorithm, which combines the advantages of a regression and classification method, has higher detection precision, reduces the classification width of a single class to 3 degrees, and has wider detection range;
2) the valid-frame detection algorithm avoids further detection and recognition in invalid scenes, such as unoccupied scenes and scenes where a person is present but too far from the camera equipment, thereby saving computing resources and reducing power consumption;
3) the accuracy evaluation algorithm is realized through a light one-dimensional convolution network, the accuracy of the final estimation angle is obtained from the probability distribution characteristics of three dimensions of the Euler angle, and the possible false alarm problem in the practical application scene can be effectively prevented;
4) a complete child sitting posture reminding algorithm and system deployable on mobile terminals is provided; it can detect the head posture of a child during reading and writing in real time, issue voice reminders for incorrect sitting postures, and effectively help prevent and control the onset and progression of myopia.
Drawings
FIG. 1 is a system framework diagram of the method of the present invention;
FIG. 2 is a schematic overall flow chart of the method of the present invention;
FIG. 3 is a diagram of a pose estimation network architecture;
fig. 4 is a diagram illustrating the detection effect of four different head states.
Detailed Description
The following detailed description of embodiments of the invention is provided in connection with the accompanying drawings.
Referring to fig. 1 to 4, a child sitting posture reminding method based on head posture estimation includes the following steps:
S1: acquiring an original image from the camera equipment and inputting it into the valid-frame discrimination network; if the probability that the image is a valid frame is less than the probability that it is an invalid frame, the original image is judged an invalid frame; otherwise it is a valid frame and is retained;
s2: performing head detection on the original image by using a target detection algorithm, and extracting a head image according to a detection result;
S3: inputting the head image into a head pose estimation network to obtain the probability distribution sequences Pp, Py, Pr over the three Euler-angle dimensions and the final estimated angles: pitch angle θp, yaw angle θy, and roll angle θr;
S4: inputting the probability distribution sequences into a one-dimensional convolutional network for accuracy evaluation; if the accuracy output by the network is less than 0.5, returning to step S1; otherwise, judging the head posture according to the estimated angles of step S3 and issuing a voice reminder for incorrect postures.
Step S1 specifically includes:
S1.1: resizing the original image into a square (the preferred image size in this embodiment is 224 × 224) and normalizing each channel with mean = [0.5, 0.5, 0.5] and std = [0.5, 0.5, 0.5] to obtain Image1, where mean denotes the per-channel mean and std the standard deviation; the normalization can be expressed as:

output = (input − mean) / std

where input denotes a pixel value of the original image and output the normalized value;
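The resize-and-normalize preprocessing can be sketched as below. The nearest-neighbour resampling is a simplification chosen for brevity (the patent does not specify an interpolation method), and pixel values are assumed to start in [0, 255].

```python
import numpy as np

def preprocess(image, size=224, mean=0.5, std=0.5):
    """Resize to a size x size square via nearest-neighbour sampling,
    scale pixel values to [0, 1], then apply (input - mean) / std."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size   # nearest source row per output row
    cols = np.arange(size) * w // size   # nearest source column
    square = image[rows][:, cols]
    return (square / 255.0 - mean) / std
```

With mean = std = 0.5 the output lies in [−1, 1], which matches the stated normalization parameters.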
S1.2: inputting Image1 into the valid-frame discrimination network, which outputs the probabilities P1(x) and P2(x) that the frame image is a valid frame or an invalid frame respectively; if P1(x) < P2(x), the frame image is an invalid frame; otherwise it is a valid frame;
In step S1.2, the valid-frame discrimination network is a lightweight network composed of several basic network blocks; each block consists of a 1 × 1 ordinary convolution and a 3 × 3 depth-wise convolution, with channel split and channel shuffle operations performed at the head and tail of the block respectively.
Step S2 specifically includes:
S2.1: resizing the original image into a square and normalizing each channel to obtain Image2; Image2 is larger than Image1 so as to retain more image information;
S2.2: inputting Image2 into the YOLOv5 target detection network for head recognition, which outputs the coordinates of the head's bounding rectangle on the original image: xl, yl, xr, yr;
where xl, yl are the x and y coordinates of the top-left corner of the bounding rectangle, and xr, yr are the x and y coordinates of the bottom-right corner;
s2.3: and extracting a head image from the original image according to the coordinates of the head circumscribed rectangular frame.
Step S3 specifically includes:
S3.1: resizing the head image into a square (the preferred image size in this embodiment is 224 × 224) and normalizing each channel to obtain Image3;
S3.2: inputting Image3 into the pose estimation network to obtain the probability distribution sequences and estimated angles for the pitch, yaw and roll angles;
The probability distribution sequences are respectively recorded as Pp = (p1, p2, …, pN), Py = (p1, p2, …, pN) and Pr = (p1, p2, …, pN), where N ∈ {66, 120, 66} for the three sequences respectively, and each sequence satisfies pi ∈ [0, 1] with Σi pi = 1;
the pitch angle range is [−99°, 99°], the yaw angle range is [−180°, 180°], and the roll angle range is [−99°, 99°];
The pose estimation network of step S3.2 comprises two parts: a backbone network for feature extraction and a full-connection layer network for regression classification;
The core structure of the backbone network consists of 16 mobile inverted bottleneck convolution blocks. Each block first changes the number of channels through a 1 × 1 convolution and then performs a depth-wise convolution, followed by a squeeze-and-excitation module that makes the model attend more to important channel features; finally, the channel dimension is restored through a 1 × 1 convolution, and drop-connect and skip-connection operations increase the generalization and learning capability of the model;
the full-connection layer network comprises three branches in total, the three branches are used for calculating a pitch angle, a yaw angle and a roll angle respectively, and the output of each branch is a sequence with the length of 66 or 120;
Each branch comprises a classification branch and a regression branch. In the classification branch, each sequence unit outputs the probability of its class, and the probability distribution of the branch is obtained through a softmax function; in the regression branch, each sequence unit represents a rotation of 3°, and the expectation over the probability distribution is computed as the estimated angle for that dimension;
The expectation is calculated as:

θpred = 3 · Σ(i=1…N) i · pi − 3N/2    (1)

where θpred represents the estimated angle, pi is the probability of the i-th unit of the sequence, and N is the sequence length;
The head pose estimation network is obtained by training; its parameters are updated by minimizing a cost function Ltotal;
the cost function consists of a plurality of loss functions:
Ltotal=0.4*Lreg+0.4*Lcls+0.2*Lmd (2)
where Lreg represents the loss of the regression branch; to ensure that the angle penalty does not exceed 180°, the loss function is defined as:

Lreg = min(|θpred − θgt|, 360° − |θpred − θgt|)    (3)

where θgt is the ground-truth angle;
Lcls represents the loss of the classification branch, whose loss function uses binary cross-entropy;
Lmd represents the loss of the evaluation network formed by one-dimensional convolutions, which regularizes the probability distribution features output by the pose estimation network.
Step S4 includes:
s4.1: inputting the probability distribution sequence output by the head attitude estimation network into a one-dimensional convolution network for accuracy evaluation, if the accuracy output by the one-dimensional convolution network is less than 0.5, returning to the step S1, otherwise, entering the step S4.2;
S4.2: judging the head posture according to the obtained θp, θy, θr;
s4.3: and if the head posture is judged to be abnormal in S4.2, voice reminding is carried out aiming at the abnormal posture.
In step S4.3, the head posture is judged from the estimated angles of the three dimensions according to preset angle intervals, and the abnormal postures include: head too low, head tilt, and distraction (mind wandering);
when the pitch angle is within [−99°, −45°], the current posture is judged as the abnormal posture "head too low"; when the roll angle is not within [−30°, 30°], the current posture is judged as "head tilt"; and when the yaw angle is not within [−45°, 45°], the current posture is judged as "distraction".
The children sitting posture reminding system based on the head posture estimation comprises an image input module, a head detection module, a posture estimation module and a sitting posture reminding module;
the image input module acquires an original image from the camera equipment and judges whether a child is in a read-write state in a current scene through an effective frame detection algorithm so as to determine whether to input the original image into a next module, and the method specifically comprises the following steps:
S1.1: resizing the original image into a square (the preferred image size in this embodiment is 224 × 224) and normalizing each channel with mean = [0.5, 0.5, 0.5] and std = [0.5, 0.5, 0.5] to obtain Image1, where mean denotes the per-channel mean and std the standard deviation; the normalization can be expressed as:

output = (input − mean) / std

where input denotes a pixel value of the original image and output the normalized value;
S1.2: inputting Image1 into the valid-frame discrimination network, which outputs the probabilities P1(x) and P2(x) that the frame image is a valid frame or an invalid frame respectively; if P1(x) < P2(x), the frame image is an invalid frame; otherwise it is a valid frame;
In step S1.2, the valid-frame discrimination network is a lightweight network composed of several basic network blocks; each block consists of a 1 × 1 ordinary convolution and a 3 × 3 depth-wise convolution, with channel split and channel shuffle operations performed at the head and tail of the block respectively.
The head detection module uses the trained target detection model to obtain the position information of the head's bounding rectangle in the original image, comprising the x and y coordinates of the top-left and bottom-right corners; the head image is cropped according to these coordinates and input into the next module, and specifically comprises:
S2.1: resizing the original image into a square and normalizing each channel to obtain Image2; Image2 is larger than Image1 so as to retain more image information;
S2.2: inputting Image2 into the YOLOv5 target detection network for head recognition, which outputs the coordinates of the head's bounding rectangle on the original image: xl, yl, xr, yr;
where xl, yl are the x and y coordinates of the top-left corner of the bounding rectangle, and xr, yr are the x and y coordinates of the bottom-right corner;
s2.3: and extracting a head image from the original image according to the coordinates of the head circumscribed rectangular frame.
The attitude estimation module obtains a probability distribution sequence and an estimation angle of three dimensionalities of an Euler angle by inputting a head image into a trained head attitude estimation network, wherein the probability distribution sequence contains probability distribution characteristics of the probability distribution sequence and is used for judging the accuracy of the estimation angle, and the attitude estimation module specifically comprises:
S3.1: resizing the head image into a square (the preferred image size in this embodiment is 224 × 224) and normalizing each channel to obtain Image3;
S3.2: inputting Image3 into the pose estimation network to obtain the probability distribution sequences and estimated angles for the pitch, yaw and roll angles;
The probability distribution sequences are respectively recorded as Pp = (p1, p2, …, pN), Py = (p1, p2, …, pN) and Pr = (p1, p2, …, pN), where N ∈ {66, 120, 66} for the three sequences respectively, and each sequence satisfies pi ∈ [0, 1] with Σi pi = 1;
the pitch angle range is [−99°, 99°], the yaw angle range is [−180°, 180°], and the roll angle range is [−99°, 99°];
The pose estimation network of step S3.2 comprises two parts: a backbone network for feature extraction and a fully connected layer network for regression and classification;
the core structure of the backbone network is composed of 16 mobile inverted bottleneck convolution (MBConv) blocks. Each block first changes the number of channels through a 1 × 1 convolution, then performs a depthwise convolution, and is then connected to a squeeze-and-excitation module, which lets the model attend more to the important channel features; finally, the channel dimensionality is restored through a 1 × 1 convolution, and drop-connect and skip-connection operations increase the generalization and learning capability of the model;
the fully connected layer network comprises three branches in total, which calculate the pitch angle, the yaw angle and the roll angle respectively; the output of each branch is a sequence of length 66 or 120;
each branch comprises a classification sub-branch and a regression sub-branch. In the classification sub-branch, each sequence unit outputs the probability of its class, and the probability distribution of the branch is obtained through a softmax function; in the regression sub-branch, each sequence unit represents a rotation of 3 degrees, and the expectation over the probability distribution is computed as the estimated angle of that dimension;
the expectation is calculated as:

θpred = 3 · Σ_{i=1}^{N} (pi · i) − 3N/2    (1)
wherein θpred represents the estimated angle, pi is the probability of the i-th unit of the sequence, and N is the sequence length;
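The expectation described above (softmax probabilities over N bins of 3° each, centred so that N = 66 covers roughly [-99°, 99°] and N = 120 covers roughly [-180°, 180°]) can be sketched as follows; expected_angle is a hypothetical name, and the exact bin indexing in the patent's formula image may differ slightly:

```python
import numpy as np

def expected_angle(logits: np.ndarray) -> float:
    """Turn a length-N sequence of per-bin logits into an angle:
    softmax -> probability distribution -> expectation -> degrees.
    Each bin spans 3 degrees; subtracting 3*N/2 centres the range
    on zero."""
    n = len(logits)
    p = np.exp(logits - logits.max())
    p /= p.sum()                           # softmax over the N bins
    return float(3.0 * np.sum(p * np.arange(1, n + 1)) - 1.5 * n)

# All probability mass on the middle bin of a 66-bin head -> 0 degrees.
logits = np.full(66, -1e9)
logits[32] = 0.0                           # bin i = 33 (1-indexed)
print(expected_angle(logits))              # 0.0
```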
the head posture estimation network is obtained by training; its parameters are updated by minimizing a cost function Ltotal;
the cost function consists of a plurality of loss functions:
Ltotal = 0.4 · Lreg + 0.4 · Lcls + 0.2 · Lmd    (2)
wherein Lreg represents the loss of the regression branch; to ensure that the penalty on an angle never exceeds 180 degrees, the loss function is defined as:

Lreg = min(|θpred − θgt|, 360° − |θpred − θgt|)    (3)

where θgt denotes the ground-truth angle;
Lcls represents the loss of the classification branch, whose loss function uses binary cross entropy;
Lmd represents the loss of the evaluation network formed by one-dimensional convolutions, which is used to regularize the probability distribution characteristics output by the attitude estimation network.
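One plausible reading of the 180-degree bound on the regression penalty is a wrapped angular error; the sketch below combines it with binary cross entropy under the 0.4/0.4/0.2 weights of formula (2). The function names are hypothetical, and Lmd (the one-dimensional-convolution evaluation loss) is passed in as a plain value since its exact form is not given:

```python
import math

def reg_loss(theta_pred: float, theta_gt: float) -> float:
    """Wrapped angular error: the penalty can never exceed 180 degrees."""
    d = abs(theta_pred - theta_gt)
    return min(d, 360.0 - d)

def cls_loss(p: float, target: float, eps: float = 1e-12) -> float:
    """Binary cross entropy for one classification-branch unit."""
    p = min(max(p, eps), 1.0 - eps)
    return -(target * math.log(p) + (1.0 - target) * math.log(1.0 - p))

def total_loss(l_reg: float, l_cls: float, l_md: float) -> float:
    """L_total = 0.4*L_reg + 0.4*L_cls + 0.2*L_md, as in formula (2)."""
    return 0.4 * l_reg + 0.4 * l_cls + 0.2 * l_md

print(reg_loss(179.0, -179.0))  # 2.0, not 358.0
```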
The sitting posture reminding module first inputs the probability distribution sequence into a one-dimensional convolution network that evaluates the accuracy of the angle estimated by the head posture estimation network. If the accuracy is less than 0.5, the head posture is not judged; otherwise, whether the head posture is abnormal is judged according to the estimated values of the pitch, yaw and roll angles, and a corresponding reminder is given if it is abnormal. The module specifically comprises the following steps:
S4.1: inputting the probability distribution sequence output by the head posture estimation network into a one-dimensional convolution network for accuracy evaluation; if the accuracy output by the one-dimensional convolution network is less than 0.5, return to step S1; otherwise, proceed to step S4.2;
S4.2: judging the head posture according to the obtained θp, θy, θr;
S4.3: if the head posture is judged to be abnormal in S4.2, giving a voice reminder for the abnormal posture;
in step S4.3, the head posture is judged according to the estimated angles of the three dimensions and preset angle intervals, and the abnormal postures comprise: head too low, head tilt, and mind wandering (distraction);
when the pitch angle is within the interval [-99°, -45°], the current posture is judged to be the abnormal posture "head too low"; when the roll angle is not within the interval [-30°, 30°], the current posture is judged to be the abnormal posture "head tilt"; and when the yaw angle is not within the interval [-45°, 45°], the current posture is judged to be the abnormal posture "mind wandering".
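The interval rules above translate directly into a small decision function; this is an illustrative sketch with hypothetical names, returning at most one abnormal-posture label per frame:

```python
from typing import Optional

def judge_posture(pitch: float, yaw: float, roll: float) -> Optional[str]:
    """Map estimated Euler angles (degrees) to an abnormal-posture label,
    or None when the head posture is normal."""
    if -99.0 <= pitch <= -45.0:
        return "head too low"
    if not (-30.0 <= roll <= 30.0):
        return "head tilt"
    if not (-45.0 <= yaw <= 45.0):
        return "mind wandering"
    return None

print(judge_posture(-60.0, 0.0, 0.0))   # head too low
print(judge_posture(-10.0, 0.0, 40.0))  # head tilt
print(judge_posture(-10.0, 60.0, 0.0))  # mind wandering
print(judge_posture(-10.0, 0.0, 0.0))   # None
```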
The embodiments described in this specification are merely illustrative of implementations of the inventive concept. The scope of the present invention should not be considered limited to the specific forms set forth in the embodiments; it also covers the equivalents thereof that may occur to those skilled in the art upon consideration of the inventive concept.

Claims (9)

1. A child sitting posture reminding method based on head posture estimation is characterized in that: the method comprises the following steps:
S1: acquiring an original image from the camera equipment and inputting it into an effective frame discrimination network to detect effective frames: if the probability of the image being an effective frame is less than the probability of its being an ineffective frame, the original image is determined to be an ineffective frame, otherwise an effective frame; effective frames are retained;
S2: performing head detection on the original image by using a target detection algorithm, and extracting a head image according to the detection result;
S3: inputting the head image into a head posture estimation network to obtain the probability distribution sequences p, y, r ∈ R^N of the three Euler-angle dimensions and the final estimated angles: pitch angle θp, yaw angle θy and roll angle θr;
S4: inputting the probability distribution sequences into a one-dimensional convolution network for accuracy evaluation; returning to step S1 if the accuracy output by the one-dimensional convolution network is less than 0.5, otherwise judging the head posture according to the estimated angles of step S3 and giving a voice reminder for an erroneous posture.
2. The method for reminding the sitting posture of the child based on the head posture estimation as claimed in claim 1, wherein: the step S1 specifically includes:
S1.1: adjusting the original image into a square and normalizing each channel to obtain Image1;
S1.2: inputting Image1 into the effective frame discrimination network, which outputs the probabilities P1(x), P2(x) that the frame image is an effective frame and an ineffective frame respectively; if P1(x) < P2(x), the frame image is an ineffective frame, otherwise it is an effective frame.
3. The method for reminding the sitting posture of the child based on the head posture estimation as claimed in claim 2, wherein: in step S1.2, the effective frame discrimination network is a lightweight network composed of a plurality of basic network blocks, wherein each basic network block is composed of a 1 × 1 ordinary convolution and a 3 × 3 depthwise convolution, and channel split and channel shuffle operations are performed at the head and the tail of the network block, respectively.
4. The method for reminding the sitting posture of the child based on the head posture estimation as claimed in claim 1, wherein: the step S2 specifically includes:
S2.1: adjusting the size of the original image into a square and normalizing each channel to obtain Image2; the size of Image2 is larger than that of Image1;
S2.2: inputting Image2 into a YOLOv5 target detection network for head recognition, and outputting the coordinates of the head circumscribed rectangular frame on the original image: xl, yl, xr, yr;
wherein xl, yl are respectively the x and y coordinates of the upper left corner of the circumscribed rectangular frame, and xr, yr are respectively the x and y coordinates of the lower right corner of the circumscribed rectangular frame;
s2.3: and extracting a head image from the original image according to the coordinates of the head circumscribed rectangular frame.
5. The method for reminding the sitting posture of the child based on the head posture estimation as claimed in claim 1, wherein: the step S3 specifically includes:
S3.1: adjusting the size of the head image into a square and normalizing each channel to obtain Image3;
S3.2: inputting Image3 into the attitude estimation network to obtain the probability distribution sequences of the pitch angle, the yaw angle and the roll angle, together with the estimated angles;
the probability distribution sequences are denoted p, y, r ∈ R^N, where N ∈ {66, 120, 66}, and the estimated angles are denoted θp, θy, θr;
the pitch angle range is [-99°, 99°], the yaw angle range is [-180°, 180°], and the roll angle range is [-99°, 99°].
6. The method for reminding the sitting posture of the child based on the head posture estimation as claimed in claim 5, wherein: the pose estimation network of step S3.2 comprises two parts: a backbone network for feature extraction and a fully connected layer network for regression and classification;
the core structure of the backbone network is composed of 16 mobile inverted bottleneck convolution (MBConv) blocks. Each block first changes the number of channels through a 1 × 1 convolution, then performs a depthwise convolution, and is then connected to a squeeze-and-excitation module, which lets the model attend more to the important channel features; finally, the channel dimensionality is restored through a 1 × 1 convolution, and drop-connect and skip-connection operations increase the generalization and learning capability of the model;
the fully connected layer network comprises three branches in total, which calculate the pitch angle, the yaw angle and the roll angle respectively; the output of each branch is a sequence of length 66 or 120;
each branch comprises a classification sub-branch and a regression sub-branch. In the classification sub-branch, each sequence unit outputs the probability of its class, and the probability distribution of the branch is obtained through a softmax function; in the regression sub-branch, each sequence unit represents a rotation of 3 degrees, and the expectation over the probability distribution is computed as the estimated angle of that dimension;
the expectation is calculated as:

θpred = 3 · Σ_{i=1}^{N} (pi · i) − 3N/2    (1)
wherein θpred represents the estimated angle, pi is the probability of the i-th unit of the sequence, and N is the sequence length;
the head posture estimation network is obtained by training; its parameters are updated by minimizing a cost function Ltotal;
the cost function consists of a plurality of loss functions:
Ltotal = 0.4 · Lreg + 0.4 · Lcls + 0.2 · Lmd    (2)
wherein Lreg represents the loss of the regression branch; to ensure that the penalty on an angle never exceeds 180 degrees, the loss function is defined as:

Lreg = min(|θpred − θgt|, 360° − |θpred − θgt|)    (3)

where θgt denotes the ground-truth angle;
Lcls represents the loss of the classification branch, whose loss function uses binary cross entropy;
Lmd represents the loss of the evaluation network formed by one-dimensional convolutions, which is used to regularize the probability distribution characteristics output by the attitude estimation network.
7. The method for reminding the sitting posture of the child based on the head posture estimation as claimed in claim 1, wherein: the step S4 includes:
S4.1: inputting the probability distribution sequence output by the head posture estimation network into a one-dimensional convolution network for accuracy evaluation; if the accuracy output by the one-dimensional convolution network is less than 0.5, return to step S1; otherwise, proceed to step S4.2;
S4.2: judging the head posture according to the obtained θp, θy, θr;
S4.3: if the head posture is judged to be abnormal in S4.2, giving a voice reminder for the abnormal posture.
8. The method for reminding the sitting posture of the child based on the head posture estimation as claimed in claim 7, wherein: in step S4.3, the head posture is judged according to the estimated angles of the three dimensions and preset angle intervals, and the abnormal postures comprise: head too low, head tilt, and mind wandering (distraction);
when the pitch angle is within the interval [-99°, -45°], the current posture is judged to be the abnormal posture "head too low"; when the roll angle is not within the interval [-30°, 30°], the current posture is judged to be the abnormal posture "head tilt"; and when the yaw angle is not within the interval [-45°, 45°], the current posture is judged to be the abnormal posture "mind wandering".
9. A system for implementing the child sitting posture reminding method based on head posture estimation as claimed in claim 1, wherein: the system comprises an image input module, a head detection module, a posture estimation module and a sitting posture reminding module;
the image input module acquires an original image from the camera equipment and judges whether a child is in a read-write state in the current scene through an effective frame detection algorithm so as to determine whether the original image is input into the next module;
the head detection module is used for obtaining the position information of a circumscribed rectangular frame of the head in the original image by using the trained target detection model, the position information comprises x-axis coordinates and y-axis coordinates of the upper left corner and the lower right corner, and the head image is obtained by cutting according to the coordinates and is input into the next module;
the attitude estimation module inputs the head image into a trained head posture estimation network to obtain a probability distribution sequence and an estimated angle for each of the three Euler-angle dimensions, wherein the probability distribution sequence carries probability distribution characteristics used to judge the accuracy of the estimated angle;
the sitting posture reminding module first inputs the probability distribution sequence into a one-dimensional convolution network that evaluates the accuracy of the angle estimated by the head posture estimation network; if the accuracy is less than 0.5, the head posture is not judged; otherwise, whether the head posture is abnormal is judged according to the estimated values of the pitch angle, yaw angle and roll angle, and a corresponding reminder is given if it is abnormal;
the image input module, the head detection module, the posture estimation module and the sitting posture reminding module are sequentially connected.
CN202111551860.0A 2021-12-17 2021-12-17 Child sitting posture reminding method and system based on head posture estimation Pending CN114283448A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111551860.0A CN114283448A (en) 2021-12-17 2021-12-17 Child sitting posture reminding method and system based on head posture estimation

Publications (1)

Publication Number Publication Date
CN114283448A true CN114283448A (en) 2022-04-05


Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117275069A (en) * 2023-09-26 2023-12-22 华中科技大学 End-to-end head gesture estimation method based on learnable vector and attention mechanism



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination