CN112766185A - Head posture monitoring method, device and system based on deep learning - Google Patents

Head posture monitoring method, device and system based on deep learning

Info

Publication number
CN112766185A
CN112766185A (application CN202110090638.9A)
Authority
CN
China
Prior art keywords
face
neural network
head
image
bed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110090638.9A
Other languages
Chinese (zh)
Other versions
CN112766185B (en)
Inventor
金梅
李翔宇
张立国
李圆圆
李义辉
马子荐
杨曼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yanshan University
Original Assignee
Yanshan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yanshan University
Priority to CN202110090638.9A
Publication of CN112766185A
Application granted
Publication of CN112766185B
Legal status: Active
Anticipated expiration

Classifications

    • G06V 40/10: Recognition of human or animal bodies in image or video data, e.g. vehicle occupants or pedestrians; body parts, e.g. hands
    • G06N 3/048: Neural network activation functions
    • G06N 3/08: Neural network learning methods
    • G06V 10/44: Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; connectivity analysis, e.g. of connected components
    • G06V 20/597: Recognising the driver's state or behaviour, e.g. attention or drowsiness
    • G06V 40/168: Human faces: feature extraction; face representation
    • G06V 40/171: Human faces: local features and components; facial parts; occluding parts, e.g. glasses; geometrical relationships
    • G06V 40/172: Human faces: classification, e.g. identification

Abstract

The invention provides a method, a device and a system for monitoring head posture based on deep learning. The method comprises the following steps: S1, collecting historical image data; S2, training the neural networks; S3, acquiring a real-time image; S4, passing the preprocessed real-time image into a first neural network to obtain the face bounding box that needs to be monitored; S5, determining the angle of the face image with a second neural network; S6, returning to step S3. The system comprises a data acquisition module, an image processing module and an alarm module. The device comprises a bed, a camera, a camera mounting bracket, a computer and an alarm. The method adopts a first neural network and a second neural network: the first neural network frame-selects the faces appearing in the image using an improved YOLOv3 algorithm, solving the problem of multiple faces appearing in the monitoring range; the second neural network uses an improved VGG16 network to extract several features from the image and fuse them, so that the head posture can be monitored in real time.

Description

Head posture monitoring method, device and system based on deep learning
Technical Field
The invention relates to artificial intelligence technology, and in particular to a method, device and system for monitoring head posture based on deep learning.
Background
At present there are many methods and applications for detecting head posture, but they mainly target the head posture of people who are sitting or standing, for example recognizing a driver's head posture; the head posture of a person lying down is not recognized, although the need exists. For example, image detection can confirm that the head is turned to one side while lying flat, the so-called pillow-removed supine posture. For a patient this posture helps keep the respiratory tract unobstructed, prevents airway obstruction caused by the tongue falling back, prevents vomit from entering the trachea and causing asphyxia, prevents headache caused by reduced intracranial pressure, and avoids traction on a wound; for an infant it can prevent asphyxia caused by regurgitated milk or prone sleeping. Traditional monitoring relies on vital signs (such as body temperature and the electrocardiogram) and raises an alarm when the data become abnormal. The problem with vital-sign monitoring is that it is late: in a wrong sleep posture, vital signs often become abnormal only after several minutes, at which point only remedial measures can be taken and nothing can be prevented. Manual supervision, for its part, wastes a great deal of manpower and material resources.
Disclosure of Invention
In order to overcome the defects of the prior art, to recognize whether the head is in the pillow-removed supine posture (lying flat without a pillow, head turned to one side), and to solve the problem of multiple faces appearing in the monitoring range, the invention provides a head posture monitoring method based on deep learning, which comprises the following steps:
s1, generating a data set for training a neural network by using the collected historical image data;
the historical image data mainly comprises a face data set, face data sets with different angles and a bed data set;
s2, training the neural network by using the data set to obtain the weight values of the first neural network and the second neural network;
s21, using 80% of the face data set, the different-angle face data sets and the bed data set for training, called the training set, and using the remaining 20% for verification, called the verification set;
s22, inputting the face data set and the bed data set in the data set of the step S21 into a first neural network for training, and obtaining a parameter model;
s23, inputting the face data sets with different angles in the data set of the step S21 into a second neural network for training, and obtaining a parameter model;
s231, the training process of the second neural network is to input the face data sets of different angles into the second neural network, and the second neural network is an improved Vgg16 network;
s232, the specific structure of the second neural network is as follows: a convolutional layer, a pooling layer, a BN layer and a ReLU activation function are stacked in sequence, and the stack is repeated four times; the convolutional layers are, in order, C1, C2, C3 and C4. The output of each ReLU activation function is passed both to the next convolutional layer and to a convolutional layer C5; because of the fourfold repetition there are four ReLU activation functions, each connected to its own C5 layer, and the four C5 layers output the first, second, third and fourth features respectively. The output of the last ReLU activation function also passes through 3 fully connected layers, giving the output feature after FC3. The first, second, third and fourth features and the output feature after FC3 are input to the feature fusion layer, the feature fusion layer feeds the processing result to the softmax, and the head posture is judged from the probability values, realizing head posture monitoring;
s233, training by using data sets of different angles to obtain weights of a second neural network, and accelerating feature collection of different angles in real-time monitoring;
s3, acquiring a real-time image needing head posture monitoring, and preprocessing the real-time image to obtain a preprocessed image;
a camera captures the real-time picture, and the acquired real-time image is preprocessed to obtain the preprocessed image, the preprocessing comprising filtering and noise reduction of the real-time image;
s4, transmitting the preprocessed image into the first neural network to determine the face bounding box to be monitored in the preprocessed image, wherein the first neural network is divided into two parts: the first part identifies all face and bed bounding boxes, and the second part screens the face bounding boxes to obtain the one to be monitored; the steps specifically comprise:
s41, rapidly determining the face and bed bounding boxes using the parameter model of the first neural network trained in step S22;
s42, determining the face bounding box of the monitored person from the face and bed bounding boxes of S41 and the following limiting conditions;
s421, firstly, setting the face frame selection area as A and the bed frame selection area as B, which must satisfy
A ⊆ B
that is, the face frame selection area lies inside the bed frame selection area; if no face area can be detected, it is considered that no one is in the bed; if a face frame selection area is detected but is not inside the bed frame selection area, there is likewise no one in the bed; if no one is in the bed, return to step S3, otherwise execute step S422;
s422, if only one face frame selection area is inside the bed frame selection area, the face image of the monitored person is determined to be in that area and step S5 is executed; if several face bounding boxes are inside the bed bounding box, the interference of unrelated persons is eliminated and step S423 is executed;
in step S423, considering that the head posture is monitored during sleep and the head motion amplitude is therefore small, the face bounding boxes obtained in step S41 that satisfy
A ⊆ B
(A being a face frame selection area and B the bed frame selection area) are taken out; suppose there are k face bounding boxes; for the i-th face bounding box (i ≤ k), take 4 consecutive frames of images f_i1, f_i2, f_i3, f_i4 in the bounding box area and form the differences between adjacent frames, D_i1 = |f_i3 − f_i2| ∩ |f_i2 − f_i1| and D_i2 = |f_i4 − f_i3| ∩ |f_i3 − f_i2|; then take the union of the differences to obtain the difference image D_i = D_i1 ∪ D_i2; given a threshold T, binarize the image to obtain the binarized image R_i:
R_i(x, y) = 1 if D_i(x, y) > T, and R_i(x, y) = 0 otherwise
where 1 represents a motion point and 0 the background; calculate the percentage of 1-pixels among the total pixels in each bounding box; the face bounding box whose R_i proportion is smallest is the face bounding box of the monitored person;
s5, determining the angle of the face image by the second neural network;
and S6, returning to step S3 to continue monitoring the head posture; meanwhile, if the head stays in a non-pillow-removed posture for 250 consecutive frames, or for more than 600 frames within one minute, an alarm signal is sent to indicate that the monitored person's head is not in the pillow-removed supine posture.
Preferably, the different-angle face data sets collect image data of several persons at different head rotation angles; the angle regions are discrete, so the head postures can be represented discretely. Facing straight ahead is 0°; the head is labeled every 15° in the left-right (yaw) direction X, and the head pitch angle Y is likewise labeled every 15°, the head rotation being denoted by (X, Y). The maximum angle in each of the four directions (up, down, left, right) is 90°, so there are 13 angles in the up-down direction and 13 in the left-right direction, giving (X, Y) 169 deflection angles in total. Owing to the limitation of the body, the head can rotate left or right to −90° or 90° only when the pitch angle is 0°; at other pitch angles the head cannot reach ±90° of rotation, so 145 head postures are finally determined for the data set. Each head posture has a corresponding angle label. Since the head must lean to one side in the pillow-removed supine posture, the 145 head postures can be divided into two major classes, the pillow-removed supine posture and the non-pillow-removed posture: when 75° ≤ |X| ≤ 90° and 0° ≤ |Y| ≤ 30°, the head is regarded as being in the pillow-removed supine posture, and all other angles are regarded as non-pillow-removed postures.
Preferably, in step S22, the face data set and the bed data set in the data set of step S21 are input into the first neural network for training to obtain a parameter model, specifically comprising the following steps:
s221, predicting the coordinates (x, y) of the center point of the target bounding box, the width and height (w, h) of the bounding box, the bed, face and background classes, and the confidence using the YOLOv3 algorithm;
s222, correcting the bounding box by back-propagation while predicting the detected objects in the image: if the IoU between a predicted bounding box and the manually framed bounding box is less than 0.6, the error of the predicted target bounding box is considered too large, and the network keeps propagating the error backward to the preceding stages until the IoU between the predicted and manually framed bounding boxes is 0.6 or more;
and S223, training with the data set to obtain the weights of the first neural network, which are used to speed up the network's determination of the face and bed bounding boxes in real-time monitoring.
Preferably, in step S5, the second neural network determines an angle of the face image, and the specific steps are as follows:
s51, cutting the face bounding box area obtained in step S4 out of the preprocessed image and resizing the face area so that the face area image input to the multi-feature-fusion convolutional neural network module has a size of 224 × 224, and putting it into the second neural network to determine the head posture;
s52, feature fusion: each time the picture passes through a convolutional layer and a pooling layer, a 1 × 1 convolutional layer C5 is added before feature extraction; the features extracted are, in order: the first feature, the second feature, the third feature, the fourth feature, and the output feature after FC3;
s53, back-propagating the first, second, third and fourth features and the output feature after FC3, the total feature loss being
L = Σ_{i=1}^{n} L_i, with L_i = −Σ_k T(k) · log p_i,k
where T is the characteristic (indicator) function of the true class, n is the total number of features, L_i is the feature loss function of the i-th feature, and p_i,k is the probability that the i-th feature is predicted as the k-th class;
and S54, determining the angle of the face image and judging whether the head is in a non-pillow-removed posture.
The application also discloses a system for monitoring the head posture using the head posture monitoring method based on deep learning, which comprises: a data acquisition module, an image processing module and an alarm module;
the data acquisition module is used for executing step S3 and acquiring images of the patient bed at different angles;
the image processing module is used for executing steps S4 and S5 and processing the acquired image data; specifically, the parameter models of the two neural networks are obtained in advance, the real-time image is put into the first neural network to frame-select the face image, the detected face image is then put into the second neural network, and the head posture is determined through multi-feature fusion;
the alarm module is used for sending out an alarm signal if a non-pillow-removed posture persists for 250 consecutive frames or for more than 600 frames within one minute.
The application also discloses a device for monitoring the head posture using the head posture monitoring method based on deep learning, which comprises: a bed, a camera, a camera fixing bracket, a computer and an alarm,
the camera is mounted directly above the head of the bed by the camera fixing bracket, so that it can capture a picture directly facing the face;
the camera is connected to the computer through a network, and the computer is used for executing steps S4, S5 and S6, processing the acquired image data and judging from the pictures obtained from the camera whether the head is turned to one side;
the alarm is connected to the computer through a network and is used for giving the alarm.
Compared with the prior art, the invention has the following beneficial effects:
1. the method adopts a first neural network and a second neural network, wherein the first neural network frame-selects the faces appearing in the image using an improved YOLOv3 algorithm, solving the problem of multiple faces appearing in the monitoring range;
2. the second neural network extracts several features from the image using an improved VGG16 network and fuses them, so that the head posture can be monitored in real time and the device can give an alarm immediately when the head is not in the pillow-removed supine posture.
Drawings
FIG. 1 is a schematic structural diagram of a head posture monitoring device based on deep learning;
FIG. 2 is a schematic structural diagram of a head posture monitoring system based on deep learning;
FIG. 3 is a schematic diagram of steps of a head posture monitoring method based on deep learning;
FIG. 4 is a schematic diagram of a process for improving algorithm processing of an image by using YOLOv 3;
FIG. 5 is a schematic diagram of a second neural network architecture;
FIG. 6 is a schematic diagram of fusion of different feature layers of an image using a second neural network feature.
Reference numerals:
1. bed; 2. camera; 3. camera fixing bracket; 4. computer; 5. alarm.
Detailed Description
In order to better understand the technical solution of the present invention, the following detailed description is made with reference to the accompanying drawings and examples. In the drawings, like reference numbers can indicate functionally identical or similar elements. While the various aspects of the embodiments are presented in drawings, the drawings are not necessarily drawn to scale unless specifically indicated.
The invention discloses a head posture monitoring method based on deep learning, as shown in fig. 3. S1, generating a data set for training the neural networks from the collected historical image data; the historical image data mainly comprises a face data set, different-angle face data sets and a bed data set.
The different-angle face data sets collect image data of several persons at different head rotation angles; the angle regions are discrete, so the head postures can be represented discretely. Facing straight ahead is set to 0°; the head is labeled every 15° in the left-right (yaw) direction X, and likewise the head pitch angle Y is labeled every 15°, the range of head rotation being denoted by (X, Y). Since the maximum angle is 90° in all four directions, there are 13 angles in the up-down direction and 13 in the left-right direction, so (X, Y) has 169 deflection angles in total. Owing to the limitation of the body, only when the pitch angle is 0° can the head rotate left or right to −90° or 90°; at other pitch angles the head cannot reach ±90°, and finally 145 head postures are determined for the data set. Each head posture has a corresponding angle label. Since the head must lean to one side in the pillow-removed supine posture, the 145 head postures can be divided into two major classes, the pillow-removed supine posture and the non-pillow-removed posture: when 75° ≤ |X| ≤ 90° and 0° ≤ |Y| ≤ 30°, the head is regarded as being in the pillow-removed supine posture, and all other angles are regarded as non-pillow-removed postures.
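As an illustration only, the labeling and classification rule above can be written as a short Python sketch; the enumeration reproduces the 145-pose count, while the function names are ours rather than the patent's:

```python
# Hypothetical sketch of the angle-labeling rule described above.

def valid_poses():
    """Enumerate the discrete head poses: yaw X and pitch Y in 15-degree steps
    within [-90, 90], with a yaw of +/-90 degrees allowed only at pitch 0."""
    poses = []
    for x in range(-90, 91, 15):
        for y in range(-90, 91, 15):
            if abs(x) == 90 and y != 0:
                continue  # body limitation: full yaw is reachable only at zero pitch
            poses.append((x, y))
    return poses

def is_pillow_removed(x, y):
    """Pillow-removed supine posture: 75 <= |X| <= 90 and 0 <= |Y| <= 30."""
    return 75 <= abs(x) <= 90 and abs(y) <= 30

poses = valid_poses()
assert len(poses) == 145  # 169 grid points minus the 24 unreachable full-yaw poses
```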
Images in the different-angle face data sets that are particularly blurry, so that the face cannot be distinguished, are eliminated; the remaining data set is labeled by manually framing the face area in each image.
Similarly, images in the bed data set in which the bed cannot be distinguished are rejected; the remaining data set is labeled by manually framing the bed area in each image.
And S2, training the neural network by using the data set to obtain parameter models of the first neural network and the second neural network.
S21, 80% of the face data set, the different-angle face data sets and the bed data set are used for training, called the training set, and the remaining 20% are used for verification, called the verification set.
S22, inputting the face data set and the bed data set in the data set of the step S21 into a first neural network for training, and obtaining a parameter model.
S221, the coordinates (x, y) of the center point of the target bounding box, the width and height (w, h) of the bounding box, the bed, face and background classes, and the confidence are predicted using the YOLOv3 algorithm.
S222, the bounding box is corrected by back-propagation while the detected objects in the image are predicted: if the IoU (intersection over union) between a predicted bounding box and the manually framed bounding box is less than 0.6, the error of the predicted target bounding box is considered too large, and the network keeps propagating the error backward to the preceding stages until the IoU between the predicted and manually framed bounding boxes is 0.6 or more.
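For reference, the IoU criterion used in this step can be computed as in the following sketch; the corner-coordinate box format is our assumption:

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

# Per section S222: a predicted box with iou(pred, manual) < 0.6 is treated as
# too erroneous, and the error keeps being back-propagated.
```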
The existing YOLOv3 algorithm can locate and track all faces appearing in the video, but in this application only the face of the monitored person needs to be tracked. One camera covers exactly one bed, and the case of several people sleeping in one bed is not considered; however, owing to factors such as accompanying personnel, several faces may still appear in one image, so the face bounding box of the monitored person must be determined on the image.
S223, the weights of the first neural network are obtained by training with the data set and are used to speed up the network's determination of the face and bed bounding boxes during real-time monitoring.
And S23, inputting the different-angle face data sets in the data set of the step S21 into a second neural network for training, and obtaining a parameter model.
S231, the training process of the second neural network is to input the different-angle face data sets into it. The second neural network is an improved Vgg16 network: the Vgg16 network is sensitive to features of different kinds but not to objects of the same kind, so the improvement is made on the basis of the Vgg16 network.
S232, the second neural network is shown in FIG. 5 and has the following specific structure: a convolutional layer, a pooling layer, a BN layer and a ReLU activation function are stacked in sequence, and the stack is repeated four times. The output of each ReLU activation function is passed both to the next convolutional layer and to a convolutional layer C5; because of the fourfold repetition there are four ReLU activation functions, each connected to its own C5 layer, and the four C5 layers output the first, second, third and fourth features respectively. The output of the last ReLU activation function also passes through 3 fully connected layers, giving the output feature after FC3. The first, second, third and fourth features and the output feature after FC3 are input to the feature fusion layer, the feature fusion layer feeds the processing result to the softmax layer, and the head posture is judged from the probability values, thereby realizing head posture monitoring.
As shown in fig. 6, C denotes a convolutional layer, P a pooling layer, FC a fully connected layer, and S the softmax classifier; features are extracted from the image at each convolution-and-pooling stage. All convolutional layers use 3 × 3 kernels: two 3 × 3 kernels are stacked in C1 and C2, and three 3 × 3 kernels are stacked in C3 and C4. Stacked 3 × 3 kernels can replace a larger-scale kernel, reducing the number of operation parameters and increasing the operation speed. Each convolutional layer contains a BN layer (Batch Normalization) and a ReLU (Rectified Linear Unit) activation function: the BN layer is added after the stacked kernels and the ReLU activation function after the BN layer, which accelerates training and convergence and alleviates the problems of gradient explosion and gradient vanishing. All pooling layers perform max pooling with a 2 × 2 kernel and a stride of 2, and a 1 × 1 convolutional layer is added after each pooling layer, down-sampling first and then up-sampling, so that feature extraction is clearer.
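A minimal PyTorch sketch of our reading of this architecture follows; the channel widths, the 32-channel C5 branches, fusion by concatenation, and the 145-class output are assumptions that the patent does not fix:

```python
import torch
import torch.nn as nn

def conv_stage(in_ch, out_ch, n_convs):
    """One stage: stacked 3x3 convs, each followed by BN and ReLU,
    then 2x2 max pooling with stride 2."""
    layers = []
    for i in range(n_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, padding=1),
                   nn.BatchNorm2d(out_ch),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(2, 2))
    return nn.Sequential(*layers)

class MultiFeatureVgg(nn.Module):
    def __init__(self, num_classes=145):  # 145 discrete head poses (assumed output)
        super().__init__()
        widths, depths = [64, 128, 256, 512], [2, 2, 3, 3]  # C1..C4 per the description
        self.stages, self.c5 = nn.ModuleList(), nn.ModuleList()
        in_ch = 3
        for w, d in zip(widths, depths):
            self.stages.append(conv_stage(in_ch, w, d))
            self.c5.append(nn.Conv2d(w, 32, kernel_size=1))  # 1x1 C5 branch, width assumed
            in_ch = w
        self.squash = nn.AdaptiveAvgPool2d(1)  # collapse spatial dims before fusion (assumed)
        self.fc = nn.Sequential(               # three fully connected layers, FC1-FC3
            nn.Flatten(),
            nn.Linear(512 * 14 * 14, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, 4096), nn.ReLU(inplace=True),
            nn.Linear(4096, 256))
        self.fuse = nn.Linear(4 * 32 + 256, num_classes)  # feature fusion + classifier

    def forward(self, x):  # x: (N, 3, 224, 224) face crops
        branch_feats = []
        for stage, c5 in zip(self.stages, self.c5):
            x = stage(x)
            branch_feats.append(self.squash(c5(x)).flatten(1))  # first..fourth features
        fc_feat = self.fc(x)                                    # output feature after FC3
        fused = torch.cat(branch_feats + [fc_feat], dim=1)
        return self.fuse(fused)  # logits; softmax is applied inside the loss
```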
And S233, training the data sets at different angles to obtain weights of the second neural network, so as to accelerate feature collection at different angles in real-time monitoring.
And S3, acquiring a real-time image needing head posture monitoring, and preprocessing the real-time image to obtain a preprocessed image.
The camera captures the real-time picture, and the acquired real-time image is preprocessed to obtain the preprocessed image; the preprocessing comprises filtering and denoising the real-time image.
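The patent does not name the specific filter; as one plausible realization of this capture-and-preprocess step using OpenCV (the Gaussian blur is our choice):

```python
import cv2

cap = cv2.VideoCapture(0)  # camera mounted above the head of the bed

def preprocess(frame):
    """Filter and denoise the real-time image; a 5x5 Gaussian blur is assumed."""
    return cv2.GaussianBlur(frame, (5, 5), 0)

ok, frame = cap.read()
if ok:
    preprocessed = preprocess(frame)
```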
And S4, the preprocessed image is transmitted into the first neural network, and the face bounding box that needs to be monitored is determined in the preprocessed image. The first neural network is divided into two parts: the first part identifies all face and bed bounding boxes, and the second part screens the face bounding boxes to obtain the face bounding box to be monitored.
S41, the face and bed bounding boxes are rapidly determined using the parameter model of the first neural network trained in step S22.
S42, the face bounding box of the monitored person is determined from the face and bed bounding boxes of S41 and the following limiting conditions;
s421, since the face area of the monitored person can only be in the bed, first set the face frame selection area as A and the bed frame selection area as B, which must satisfy
A ⊆ B
that is, the face frame selection area lies inside the bed frame selection area. If no face area can be detected, it is considered that no one is in the bed. If a face frame selection area is detected but is not inside the bed frame selection area, there is likewise no one in the bed. If no one is in the bed, return to step S3; otherwise execute step S422.
S422, if only one face frame selection area is inside the bed frame selection area, the face image of the monitored person is determined to be in that area and step S5 is executed; if several face bounding boxes are inside the bed bounding box, the interference of unrelated persons must be eliminated, and step S423 is executed.
In step S423, considering that the head posture is monitored during sleep and the head motion amplitude is therefore small, the face bounding boxes obtained in step S41 that satisfy
A ⊆ B
(A being a face frame selection area and B the bed frame selection area) are taken out. Suppose there are k face bounding boxes; for the i-th face bounding box (i ≤ k), take 4 consecutive frames of images f_i1, f_i2, f_i3, f_i4 in the bounding box area and form the differences between adjacent frames, D_i1 = |f_i3 − f_i2| ∩ |f_i2 − f_i1| and D_i2 = |f_i4 − f_i3| ∩ |f_i3 − f_i2|; then take the union of the differences to obtain the difference image D_i = D_i1 ∪ D_i2. Given a threshold T, binarize the image to obtain the binarized image R_i:
R_i(x, y) = 1 if D_i(x, y) > T, and R_i(x, y) = 0 otherwise
where 1 represents a motion point and 0 the background. Calculate the percentage of 1-pixels among the total pixels in each bounding box; the face bounding box whose R_i proportion is smallest is the face bounding box of the monitored person.
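A NumPy sketch of this screening step follows; interpreting the intersection and union of difference images as pixel-wise minimum and maximum, and the threshold value of 25, are our assumptions:

```python
import numpy as np

def motion_ratio(f1, f2, f3, f4, thresh=25):
    """Four consecutive grayscale crops of one face bounding box; returns the
    fraction of pixels marked as motion points (the share of 1s in R_i)."""
    f1, f2, f3, f4 = (f.astype(np.int16) for f in (f1, f2, f3, f4))
    d1 = np.minimum(np.abs(f3 - f2), np.abs(f2 - f1))  # D_i1: intersection of diffs
    d2 = np.minimum(np.abs(f4 - f3), np.abs(f3 - f2))  # D_i2
    d = np.maximum(d1, d2)                             # D_i = D_i1 union D_i2
    r = d > thresh                                     # binarized image R_i
    return r.mean()

# The monitored person's box is the candidate with the smallest motion ratio:
# monitored = min(candidates, key=lambda c: motion_ratio(*c))
```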
And S5, determining the angle of the face image by the second neural network.
And S51, the face bounding box area obtained in step S4 is cut out of the preprocessed image, and the face area is resized (stretched or compressed) so that the face area image input to the multi-feature-fusion convolutional neural network module has a size of 224 × 224; it is then put into the second neural network.
S52, the features of the face area image can be obtained rapidly with the second neural network parameter model trained in step S23. The features are extracted in order: the first feature, the second feature, the third feature, the fourth feature, and the output feature after FC3.
S53, the first, second, third and fourth features and the output feature after FC3 are back-propagated; the total feature loss is
L = Σ_{i=1}^{n} L_i, with L_i = −Σ_k T(k) · log p_i,k
where T is the characteristic (indicator) function of the true class, n is the total number of features, L_i is the loss function of the i-th feature, and p_i,k is the probability that the i-th feature is predicted as the k-th class.
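Read as a sum of per-branch cross-entropies, with each feature branch given its own classifier head producing p_i,k (an arrangement the loss implies but the patent does not spell out), the total feature loss could be sketched as:

```python
import torch.nn.functional as F

def total_feature_loss(branch_logits, target):
    """branch_logits: five (N, num_classes) tensors, one per feature (the
    first..fourth features and the output feature after FC3).
    target: (N,) true pose labels. cross_entropy yields -log p_i,k for the
    true class k, i.e. L_i with T acting as the indicator of the true class."""
    return sum(F.cross_entropy(logits, target) for logits in branch_logits)
```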
From the first feature to the output feature after FC3, the features become increasingly abstract: the receptive field of the convolutional neural network gradually grows, and the learned features become finer and more concrete, focusing more and more on detail. Seen from the feature maps, the neural network learns mainly color and overall shape from the first feature, the contour of the figure from the second feature, and facial details such as hair, nose, eyes and mouth at different angles and in different shapes from the third and fourth features and the output feature after FC3.
And S54, determining the angle of the face image and judging whether the head is in a non-pillow-removed posture.
And S6, returning to step S3 to continue monitoring the head posture. Meanwhile, if the head stays in a non-pillow-removed posture for 250 consecutive frames, or for more than 600 frames within one minute, an alarm signal is sent to indicate that the monitored person's head is not in the pillow-removed supine posture.
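The alarm rule amounts to simple per-frame bookkeeping; the sketch below is our illustration, with the frame rate (and hence the one-minute window size) assumed:

```python
from collections import deque

class AlarmMonitor:
    """Alarm after 250 consecutive non-pillow-removed frames, or more than
    600 such frames within the last minute (25 fps assumed, so 1500 frames)."""
    def __init__(self, fps=25):
        self.consecutive = 0
        self.window = deque(maxlen=60 * fps)  # rolling one-minute window

    def update(self, pillow_removed):
        self.consecutive = 0 if pillow_removed else self.consecutive + 1
        self.window.append(0 if pillow_removed else 1)
        return self.consecutive >= 250 or sum(self.window) > 600
```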
The invention also discloses a head posture monitoring system based on deep learning, which specifically comprises a data acquisition module, an image processing module and an alarm module. The data acquisition module executes step S3 and acquires images of the sickbed at different angles; the image processing module executes steps S4 and S5 and processes the acquired image data; specifically, the parameter models of the two neural networks are obtained in advance, the real-time image is put into the first neural network to frame-select the face image, the detected face image is then put into the second neural network, and the head posture is determined through multi-feature fusion; the alarm module sends out a signal if a non-pillow-removed posture persists for 250 consecutive frames or for more than 600 frames within one minute.
The invention also discloses a head posture monitoring device based on deep learning, which specifically comprises a bed 1, a camera 2, a camera fixing bracket 3, a computer 4 and an alarm 5. The camera is mounted directly above the head of the bed by the camera fixing bracket so that it can capture a picture directly facing the face; the camera is connected to the computer through a network, and the computer executes steps S4, S5 and S6, processes the acquired image data and judges from the pictures obtained from the camera whether the head is turned to one side; the alarm is connected to the computer through a network and is used to give the alarm.
Finally, it should be noted that: the above-mentioned embodiments are only used for illustrating the technical solution of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; and the modifications or the substitutions do not make the essence of the corresponding technical solutions depart from the scope of the technical solutions of the embodiments of the present invention.

Claims (6)

1. A head posture monitoring method based on deep learning, characterized in that it comprises the following steps:
s1, generating a data set for training a neural network by using the collected historical image data;
the historical image data mainly comprises a face data set, face data sets with different angles and a bed data set;
s2, training the neural network by using the data set to obtain the weight values of the first neural network and the second neural network;
s21, using 80% of the face data set, the different-angle face data sets and the bed data set for training, called the training set, and using the remaining 20% for verification, called the verification set;
s22, inputting the face data set and the bed data set in the data set of the step S21 into a first neural network for training, and obtaining a parameter model;
s23, inputting the face data sets with different angles in the data set of the step S21 into a second neural network for training, and obtaining a parameter model;
s231, the training process of the second neural network is to input the face data sets of different angles into the second neural network, and the second neural network is an improved Vgg16 network;
s232, the specific structure of the second neural network is as follows: a convolutional layer, a pooling layer, a BN layer and a ReLU activation function are stacked in sequence, and the stack is repeated four times; the convolutional layers are, in order, C1, C2, C3 and C4. The output of each ReLU activation function is passed both to the next convolutional layer and to a convolutional layer C5; because of the fourfold repetition there are four ReLU activation functions, each connected to its own C5 layer, and the four C5 layers output the first, second, third and fourth features respectively. The output of the last ReLU activation function also passes through 3 fully connected layers, giving the output feature after FC3. The first, second, third and fourth features and the output feature after FC3 are input to the feature fusion layer, the feature fusion layer feeds the processing result to the softmax, and the head posture is judged from the probability values, realizing head posture monitoring;
s233, training by using data sets of different angles to obtain weights of a second neural network, and accelerating feature collection of different angles in real-time monitoring;
s3, acquiring a real-time image needing head posture monitoring, and preprocessing the real-time image to obtain a preprocessed image;
a camera captures the real-time picture, and the acquired real-time image is preprocessed to obtain the preprocessed image, the preprocessing comprising filtering and noise reduction of the real-time image;
s4, transmitting the preprocessed image into the first neural network to determine the face bounding box to be monitored in the preprocessed image, wherein the first neural network is divided into two parts: the first part identifies all face and bed bounding boxes, and the second part screens the face bounding boxes to obtain the one to be monitored; the steps specifically comprise:
s41, rapidly determining the face and bed bounding boxes using the parameter model of the first neural network trained in step S22;
s42, determining the face bounding box of the monitored person from the face and bed bounding boxes of S41 and the following limiting conditions;
s421, firstly, setting the face frame selection area as A and the bed frame selection area as B, which must satisfy
A ⊆ B
that is, the face frame selection area lies inside the bed frame selection area; if no face area can be detected, it is considered that no one is in the bed; if a face frame selection area is detected but is not inside the bed frame selection area, there is likewise no one in the bed; if no one is in the bed, return to step S3, otherwise execute step S422;
s422, if only one face frame selection area is inside the bed frame selection area, the face image of the monitored person is determined to be in that area and step S5 is executed; if several face bounding boxes are inside the bed bounding box, the interference of unrelated persons is eliminated and step S423 is executed;
in step S423, considering that the head posture is monitored during sleep and the head motion amplitude is therefore small, the face bounding boxes obtained in step S41 that satisfy A ⊆ B (A being a face frame selection area and B the bed frame selection area) are taken out; suppose there are k face bounding boxes; for the i-th face bounding box (i ≤ k), take 4 consecutive frames of images f_i1, f_i2, f_i3, f_i4 in the bounding box area and form the differences between adjacent frames, D_i1 = |f_i3 − f_i2| ∩ |f_i2 − f_i1| and D_i2 = |f_i4 − f_i3| ∩ |f_i3 − f_i2|; then take the union of the differences to obtain the difference image D_i = D_i1 ∪ D_i2; given a threshold T, binarize the image to obtain the binarized image R_i:
R_i(x, y) = 1 if D_i(x, y) > T, and R_i(x, y) = 0 otherwise
where 1 represents a motion point and 0 the background; calculate the percentage of 1-pixels among the total pixels in each bounding box; the face bounding box whose R_i proportion is smallest is the face bounding box of the monitored person;
s5, determining the angle of the face image by the second neural network;
s6, returning to step S3 to continue monitoring the head posture; meanwhile, if the head stays in a non-pillow-removed posture for 250 consecutive frames, or for more than 600 frames within one minute, an alarm signal is sent to indicate that the monitored person's head is not in the pillow-removed supine posture.
2. A deep learning based head pose monitoring method according to claim 1, wherein:
the different-angle face data sets collect image data of several persons at different head rotation angles; the angle regions are discrete, so the head postures can be represented discretely; facing straight ahead is 0°, the head is labeled every 15° in the left-right (yaw) direction X, the head pitch angle Y is likewise labeled every 15°, and the head rotation is denoted by (X, Y); the maximum angle in each of the four directions (up, down, left, right) is 90°, so there are 13 angles in the up-down direction and 13 in the left-right direction, giving (X, Y) 169 deflection angles in total; owing to the limitation of the body, the head can rotate left or right to −90° or 90° only when the pitch angle is 0°, and at other pitch angles the head cannot reach ±90°, so 145 head postures are determined; each head posture has a corresponding angle label; since the head must lean to one side in the pillow-removed supine posture, the 145 head postures can be divided into two major classes, the pillow-removed supine posture and the non-pillow-removed posture: when 75° ≤ |X| ≤ 90° and 0° ≤ |Y| ≤ 30°, the head is regarded as being in the pillow-removed supine posture, and all other angles are regarded as non-pillow-removed postures.
3. A deep learning based head pose monitoring method according to claim 1, wherein:
in step S22, the face data set and the bed data set in the data set of step S21 are input into the first neural network for training to obtain a parameter model, specifically comprising the following steps:
s221, predicting the coordinates (x, y) of the center point of the target bounding box, the width and height (w, h) of the bounding box, the bed, face and background classes, and the confidence using the YOLOv3 algorithm;
s222, correcting the bounding box by back-propagation while predicting the detected objects in the image: if the IoU between a predicted bounding box and the manually framed bounding box is less than 0.6, the error of the predicted target bounding box is considered too large, and the network keeps propagating the error backward to the preceding stages until the IoU between the predicted and manually framed bounding boxes is 0.6 or more;
and S223, training with the data set to obtain the weights of the first neural network, which are used to speed up the network's determination of the face and bed bounding boxes in real-time monitoring.
4. A deep learning based head pose monitoring method according to claim 1, wherein:
in the step S5, the second neural network determines the angle of the face image, and the specific steps are as follows:
s51, cutting the face bounding box area obtained in step S4 out of the preprocessed image and resizing the face area so that the face area image input to the multi-feature-fusion convolutional neural network module has a size of 224 × 224, and putting it into the second neural network to determine the head posture;
s52, feature fusion: each time the picture passes through a convolutional layer and a pooling layer, a 1 × 1 convolutional layer C5 is added before feature extraction; the features extracted are, in order: the first feature, the second feature, the third feature, the fourth feature, and the output feature after FC3;
s53, back-propagating the first, second, third and fourth features and the output feature after FC3, the total feature loss being
L = Σ_{i=1}^{n} L_i, with L_i = −Σ_k T(k) · log p_i,k
where T is the characteristic (indicator) function of the true class, n is the total number of features, L_i is the feature loss function of the i-th feature, and p_i,k is the probability that the i-th feature is predicted as the k-th class;
and S54, determining the angle of the face image and judging whether the head is in a non-pillow-removed posture.
5. A system for head posture monitoring using the deep learning based head posture monitoring method of claim 1, characterized in that it comprises: a data acquisition module, an image processing module and an alarm module;
the data acquisition module is used for executing step S3 and acquiring images of the patient bed at different angles;
the image processing module is used for executing steps S4 and S5 and processing the acquired image data; specifically, the parameter models of the two neural networks are obtained in advance, the real-time image is put into the first neural network to frame-select the face image, the detected face image is then put into the second neural network, and the head posture is determined through multi-feature fusion;
the alarm module is used for sending out an alarm signal if a non-pillow-removed posture persists for 250 consecutive frames or for more than 600 frames within one minute.
6. An apparatus for head posture monitoring using the deep learning based head posture monitoring method of claim 1, characterized in that it comprises: a bed, a camera, a camera fixing bracket, a computer and an alarm,
the camera is mounted directly above the head of the bed by the camera fixing bracket, so that it can capture a picture directly facing the face;
the camera is connected to the computer through a network, and the computer is used for executing steps S4, S5 and S6, processing the acquired image data and judging from the pictures obtained from the camera whether the head is turned to one side;
the alarm is connected to the computer through a network and is used for giving the alarm.
CN202110090638.9A 2021-01-22 2021-01-22 Head posture monitoring method, device and system based on deep learning Active CN112766185B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110090638.9A CN112766185B (en) 2021-01-22 2021-01-22 Head posture monitoring method, device and system based on deep learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110090638.9A CN112766185B (en) 2021-01-22 2021-01-22 Head posture monitoring method, device and system based on deep learning

Publications (2)

Publication Number Publication Date
CN112766185A true CN112766185A (en) 2021-05-07
CN112766185B CN112766185B (en) 2022-06-14

Family

ID=75706836

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110090638.9A Active CN112766185B (en) 2021-01-22 2021-01-22 Head posture monitoring method, device and system based on deep learning

Country Status (1)

Country Link
CN (1) CN112766185B (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101937605A (en) * 2010-09-08 2011-01-05 无锡中星微电子有限公司 Sleep monitoring system based on face detection
US20180137642A1 (en) * 2016-11-15 2018-05-17 Magic Leap, Inc. Deep learning system for cuboid detection
WO2019013105A1 (en) * 2017-07-14 2019-01-17 オムロン株式会社 Monitoring assistance system and control method thereof
CN107451568A (en) * 2017-08-03 2017-12-08 重庆邮电大学 Use the attitude detecting method and equipment of depth convolutional neural networks
CN108764123A (en) * 2018-05-25 2018-11-06 暨南大学 Intelligent recognition human body sleep posture method based on neural network algorithm
CN109145765A (en) * 2018-07-27 2019-01-04 华南理工大学 Method for detecting human face, device, computer equipment and storage medium
CN111046734A (en) * 2019-11-12 2020-04-21 重庆邮电大学 Multi-modal fusion sight line estimation method based on expansion convolution
CN111695522A (en) * 2020-06-15 2020-09-22 重庆邮电大学 In-plane rotation invariant face detection method and device and storage medium
CN112132058A (en) * 2020-09-25 2020-12-25 山东大学 Head posture estimation method based on multi-level image feature refining learning, implementation system and storage medium thereof

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
QIANQIAN BI et al.: "Research on Driver's Gaze Zone Estimation Based on Transfer Learning", 2020 IEEE International Conference on Information Technology, Big Data and Artificial Intelligence (ICIBA) *
TAIKI MATSUBARA et al.: "Proposal of an awaking detection system adopting Neural Network in hospital use", 2008 World Automation Congress *
MA ZHONGYU: "Research on Head Pose Estimation Methods Based on Deep Learning", China Masters' Theses Full-text Database, Information Science and Technology *
GAO XIN: "Target Searching of a Ward Inspection Robot", China Masters' Theses Full-text Database, Information Science and Technology *

Also Published As

Publication number Publication date
CN112766185B (en) 2022-06-14

Similar Documents

Publication Publication Date Title
CN110458101B (en) Criminal personnel sign monitoring method and equipment based on combination of video and equipment
CN109522853B (en) Face datection and searching method towards monitor video
US7912246B1 (en) Method and system for determining the age category of people based on facial images
CN108460356A (en) A kind of facial image automated processing system based on monitoring system
CN110147738B (en) Driver fatigue monitoring and early warning method and system
CN108596087B (en) Driving fatigue degree detection regression model based on double-network result
KR102184109B1 (en) The system and the method for recognizing driver's condition of multimodal learning
CN107766819A (en) A kind of video monitoring system and its real-time gait recognition methods
CN110705500A (en) Attention detection method and system for personnel working image based on deep learning
CN110827432B (en) Class attendance checking method and system based on face recognition
CN112541422A (en) Expression recognition method and device with robust illumination and head posture and storage medium
CN110188715A (en) A kind of video human face biopsy method of multi frame detection ballot
CN106881716A (en) Human body follower method and system based on 3D cameras robot
CN108446690A (en) A kind of human face in-vivo detection method based on various visual angles behavioral characteristics
CN114937242A (en) Sleep detection early warning method and device
CN110309693B (en) Multi-level state detection system and method
CN113392765A (en) Tumble detection method and system based on machine vision
CN113963237B (en) Model training method, mask wearing state detection method, electronic device and storage medium
Walizad et al. Driver drowsiness detection system using convolutional neural network
CN114299606A (en) Sleep detection method and device based on front-end camera
CN113627256B (en) False video inspection method and system based on blink synchronization and binocular movement detection
CN112766185B (en) Head posture monitoring method, device and system based on deep learning
JP2004303150A (en) Apparatus, method and program for face identification
CN113326781B (en) Non-contact anxiety recognition method and device based on face video
Sheikh Robust recognition of facial expressions on noise degraded facial images

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant