CN112381011A - Non-contact heart rate measurement method, system and device based on face image - Google Patents

Non-contact heart rate measurement method, system and device based on face image Download PDF

Info

Publication number
CN112381011A
CN112381011A (application CN202011295074.4A; granted publication CN112381011B)
Authority
CN
China
Prior art keywords
heart rate
video
image
interval
face
Prior art date
Legal status
Granted
Application number
CN202011295074.4A
Other languages
Chinese (zh)
Other versions
CN112381011B (en)
Inventor
李学恩
孙闻
张振山
王红星
Current Assignee
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN202011295074.4A
Publication of CN112381011A
Application granted
Publication of CN112381011B
Legal status: Active

Classifications

    • G06V40/172: Human faces; classification, e.g. identification
    • G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/2411: Classification techniques based on the proximity to a decision surface, e.g. support vector machines
    • G06N3/044: Recurrent networks, e.g. Hopfield networks
    • G06N3/045: Combinations of networks
    • G06N3/049: Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N3/08: Neural networks; learning methods
    • G06V10/30: Image preprocessing; noise filtering
    • G06V20/41: Higher-level, semantic clustering, classification or understanding of video scenes
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention belongs to the technical fields of computer vision, deep learning, and medicine, and relates to a non-contact heart rate measurement method, system, and device based on face images. It aims to solve the problems of existing image-based non-contact heart rate measurement methods: strong sensitivity to ROI (region of interest) selection and environmental factors, a high heart rate estimation error rate, and poor real-time performance. The method comprises: locating the face in a face video through facial key point detection, and extracting a local facial ROI frame by frame as the input of a network model; and, on the basis of a cascaded convolutional and temporal network model, dividing the heart rate range into interval categories, embedding the channel attention network SENet into the convolution module so that weights are learned according to channel importance, and finally obtaining the heart rate interval category corresponding to the input video. By combining CNN feature extraction with an LSTM long short-term memory network and embedding a channel attention network, the method achieves non-contact heart rate measurement with a low error rate and high efficiency.

Description

Non-contact heart rate measurement method, system and device based on face image
Technical Field
The invention belongs to the technical field of computer vision, deep learning and medicine, and particularly relates to a non-contact heart rate measuring method, system and device based on a face image.
Background
Heart rate measurement can be divided into contact and non-contact approaches. Contact measurement usually requires professional medical equipment in direct contact with the human body, which is inconvenient. Non-contact measurement obtains the photoplethysmography (PPG) signal of the human body through images, reflective photoelectric measurement, wireless electromagnetic fields, and similar means: periodic beating of the heart causes periodic contraction and relaxation of peripheral blood vessels, and the heart rate is obtained by analyzing the periodic variation of the reflected PPG signal. FIG. 6 is a schematic diagram of the PPG signal components, which include a pulsatile (AC, roughly 1 Hz) arterial blood signal and non-pulsatile (DC) components from arterial blood, venous blood, and skin, muscle, and bone tissue.
Image-based non-contact heart rate estimation mainly uses a camera to capture a face video and analyzes the frame-by-frame change of illumination intensity on the face, for example, the non-contact heart rate measurement method based on a visual camera [1]. However, ROI selection there requires the scene to be as static as possible, so heart rate recognition can hardly be performed under motion; moreover, the time-domain discrete signal of the ROI is extracted through a manually set filtering bandwidth, giving a high heart rate estimation error rate, high computational complexity, and poor real-time performance. Traditional heart rate estimation methods generally filter interference out of the raw heart rate signal with frequency-domain filtering, principal component analysis (PCA), and similar techniques, for example, the non-contact heart rate measurement method based on canonical correlation analysis [2]. There, the selected ROI includes non-face regions, which introduces interference and increases the computational load, and the method is easily disturbed by expression, lighting, and motion; moreover, the signal-to-noise ratios of the heart rate signal in the different RGB channels are not compared, the varying importance of the channels for heart rate signal extraction is not considered, and the final heart rate error rate is therefore high.
Remote Heart Rate Measurement from Face Videos under Realistic Situations [3] provides a remote heart rate measurement method, but its ROI includes parts such as the subject's lips, so it is easily affected by non-rigid motion such as expressions and speaking and introduces redundant interference; in addition, its manually designed illumination correction has limitations and cannot remain robust under large lighting changes.
As deep learning has made great breakthroughs in visual tasks such as segmentation, recognition, and detection, the corresponding algorithms outperform traditional hand-crafted-feature models in real scenes, support end-to-end training, simplify the algorithm pipeline, and are more robust. With the development of deep learning, applying it to non-contact heart rate monitoring from face video avoids the hand-designed features and complex filtering of traditional methods and can fully learn the relationship between facial color changes and the heart rate signal. Current research on face-video heart rate estimation mainly processes single-frame video data with a single convolutional neural network, such as Visual Heart Rate Estimation with a Convolutional Neural Network [4]. However, the input image of that method is the whole face plus background information, which introduces excessive interference, and the single convolutional network ignores the temporal information of the video data, so its heart rate error rate is high.
In general, in the prior art, improper ROI selection introduces excessive interference or leaves the method vulnerable to motion and lighting changes, and the temporal information of the video data is not exploited, so the heart rate estimation error rate is high and real-time performance is poor.
The following documents are background information related to the present invention:
[1] Chen Jiaxin, Lin Qingyu, Zhou Liang, Wei Xin, Cai Kai. A non-contact heart rate measurement method based on a visual camera. 20180829, CN109259749A.
[2] Yan, Shu Xie Yi. A non-contact heart rate measurement method based on canonical correlation analysis. 20150630, CN105046209A.
[3] Xiaobai Li, Jie Chen, Guoying Zhao, Matti Pietikainen. Remote Heart Rate Measurement from Face Videos under Realistic Situations. IEEE Conference on Computer Vision and Pattern Recognition, 2014.
[4] Špetlík R, Franc V. Visual Heart Rate Estimation with Convolutional Neural Network [J]. In BMVC, 2018.
Disclosure of Invention
In order to solve the problems of the prior art, namely that existing image-based non-contact heart rate measurement methods are strongly affected by ROI (region of interest) selection and environmental factors, have a high heart rate estimation error rate, and offer poor real-time performance, the invention provides a non-contact heart rate measurement method based on face images, comprising the following steps:
step S10, acquiring a face video containing a set frame image as a video to be processed;
step S20, for each frame of the video to be processed, locating the positions of the nasal alae and lips with a 68-point facial landmark detection method and cropping a region of set length and width to obtain the ROI sequence to be processed;
step S30, obtaining, through a trained heart rate category identification model, the predicted value that the ROI sequence to be processed belongs to each preset heart rate category interval;
step S40, taking the heart rate category interval with the largest predicted value as the heart rate category interval of the video to be processed;
the heart rate category identification model is a cascade model combining a CNN feature extraction network and an LSTM long-time memory neural network, and a channel attention network SEnet is embedded in the CNN feature extraction network.
In some preferred embodiments, there are 14 heart rate category intervals:
40-49,50-59,60-69,70-79,80-89,90-99,100-109,110-119,120-129,130-139,140-149,150-159,160-169 and 170-179.
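As a minimal sketch of this binning scheme (the helper names are mine, not the patent's), a heart rate in bpm maps to one of the 14 ten-bpm intervals as follows:

```python
# Hypothetical helpers (not from the patent text): map a measured heart
# rate in bpm to one of the 14 category intervals 40-49 ... 170-179.
def hr_to_class(bpm: float) -> int:
    """Return the 0-based index of the 10-bpm heart rate interval."""
    if not 40 <= bpm < 180:
        raise ValueError(f"heart rate {bpm} bpm is outside the 14 modeled intervals")
    return int(bpm - 40) // 10

def class_to_interval(idx: int) -> str:
    """Human-readable label for a class index, e.g. 3 -> '70-79'."""
    lo = 40 + 10 * idx
    return f"{lo}-{lo + 9}"
```

A resting heart rate of 72 bpm, for instance, falls into class index 3, i.e. the 70-79 interval.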
In some preferred embodiments, the heart rate category identification model is trained by:
step B10, acquiring face videos of a set duration under set acquisition parameters, together with the real heart rate label value corresponding to each frame;
step B20, assigning the real heart rate label values to the heart rate category intervals, and dividing the face video into L video sequences of N face image frames each;
step B30, for each frame of the L video sequences of N face images, locating the positions of the nasal alae and lips with the 68-point facial landmark detection method, cropping to the set length and width to obtain L ROI sequences of N frames, and generating L training sample sequences by attaching the real heart rate category interval corresponding to each video sequence;
step B40, selecting any training sample sequence, obtaining through the heart rate category identification model the predicted value that the sample sequence belongs to each preset heart rate category interval, and computing the cross entropy loss between the maximum predicted value and the corresponding real heart rate category interval;
and step B50, if the cross entropy loss value is not lower than a set threshold, adjusting the parameters of the heart rate category identification model and returning to step B40, iterating until the loss value falls below the threshold or the set number of training iterations is reached, thereby obtaining the trained heart rate category identification model.
In some preferred embodiments, the parameters of the face video of set duration are as follows:
set duration: 2 min;
video frame rate: 30 fps;
distance between camera and subject: 1 m;
video image size: 1920×1080.
In some preferred embodiments, the L training sample sequences are:
$$\tau_j = \{x_1^j, x_2^j, \ldots, x_N^j;\ f_j\}, \quad j = 1, 2, \ldots, L$$

where $\tau_j$ denotes the $j$-th sample sequence, $L$ is the total number of sample sequences, $x_n^j$ is the $n$-th ROI region image of the $j$-th sample sequence, $N$ is the total number of ROI frames per sequence, and $f_j$ is the heart rate category interval corresponding to $\tau_j$.
In some preferred embodiments, the cross entropy loss value is calculated by:
$$\mathrm{Loss} = -\sum_{i=1}^{14} t_i \log S_i$$

where $t_i$ is the one-hot true heart rate category label of the current training sample and $S_i$ is the predicted probability that the sample belongs to heart rate category interval $i$, computed with the softmax function:

$$S_i = \frac{e^{z_i}}{\sum_{k=1}^{14} e^{z_k}}$$

where $z_i$ is the exponent output of the last fully connected layer of the heart rate category identification model for the $i$-th category, and the denominator is the sum of the exponential outputs over all 14 categories.
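Numerically, the softmax and cross entropy above behave as in the following sketch (the variable names are mine; `z` plays the role of the last fully connected layer's outputs and `t` the one-hot label):

```python
import numpy as np

# Numeric sketch of the loss: z holds the 14 last-layer outputs,
# t is the one-hot true interval, and S is the softmax
# S_i = exp(z_i) / sum_k exp(z_k).
def softmax(z):
    e = np.exp(z - z.max())          # subtract max for numerical stability
    return e / e.sum()

def cross_entropy(t_onehot, S):
    return -float(np.sum(t_onehot * np.log(S)))

z = np.zeros(14); z[3] = 5.0         # network strongly favors interval 70-79
t = np.zeros(14); t[3] = 1.0         # true label is also interval 70-79
S = softmax(z)
loss = cross_entropy(t, S)           # small, since the prediction is correct
```

Because the label is one-hot, the sum collapses to a single term, so the loss is simply the negative log-probability assigned to the true interval.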
In some preferred embodiments, since the green channel of each frame in the face video carries a higher heart rate signal-to-noise ratio than the red and blue channels, channel weights are assigned according to the importance of each channel by the channel attention network SENet embedded in the CNN feature extraction network.
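The channel attention mechanism referenced here can be sketched as a standard squeeze-and-excitation (SE) block; this is a generic SENet sketch with assumed layer sizes, not the patent's exact configuration:

```python
import torch
import torch.nn as nn

# Generic squeeze-and-excitation (SE) channel-attention block of the kind
# the patent embeds in the CNN backbone: global average pooling "squeezes"
# each channel to a scalar, a two-layer bottleneck learns a per-channel
# weight, and the feature map is rescaled channel-wise.
class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                    # per-channel weight in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = x.mean(dim=(2, 3))               # squeeze: (B, C)
        w = self.fc(w).view(b, c, 1, 1)      # excitation: channel weights
        return x * w                         # reweight, e.g. boost green-derived maps

x = torch.randn(2, 64, 8, 8)
out = SEBlock(64)(x)                         # shape preserved: (2, 64, 8, 8)
```

Because the sigmoid weights lie in (0, 1), the block can only attenuate channels relative to one another, which is how low signal-to-noise channels get suppressed.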
On the other hand, the invention provides a non-contact heart rate measurement system based on a human face image, which comprises an input module, an ROI (region of interest) extraction module, a category prediction module and an output module;
the input module is configured to acquire a face video containing a set frame image as a video to be processed and input the face video;
the ROI extraction module is configured to locate the positions of the nasal alae and lips in each frame of the video to be processed with a 68-point facial landmark detection method and crop a region of set length and width to obtain the ROI sequence to be processed;
the category prediction module is configured to obtain a prediction value of the ROI area sequence to be processed, which belongs to each preset heart rate category interval, through a trained heart rate category identification model;
and the output module is configured to take the heart rate category interval with the largest predicted value as the heart rate category interval of the video to be processed and output the heart rate category interval.
In a third aspect of the present invention, a storage device is provided, in which a plurality of programs are stored, the programs being adapted to be loaded and executed by a processor to implement the above-mentioned non-contact heart rate measurement method based on facial images.
In a fourth aspect of the present invention, a processing apparatus is provided, which includes a processor, a storage device; the processor is suitable for executing various programs; the storage device is suitable for storing a plurality of programs; the program is suitable to be loaded and executed by a processor to implement the above-mentioned non-contact heart rate measurement method based on human face images.
The invention has the beneficial effects that:
(1) The non-contact heart rate measurement method based on face images rests on the principle of remote photoplethysmography (rPPG): periodic beating of the heart causes periodic contraction and relaxation of peripheral blood vessels, so the reflected PPG signal varies periodically, and a camera can capture the resulting chrominance changes of the image. The method extracts features from the captured face video frame by frame with a CNN, cascades an LSTM temporal network onto the single convolutional model to learn the temporal information between consecutive video frames from the extracted spatial features, encodes the spatial feature vectors into one hidden vector with the LSTM, feeds this vector into a two-layer fully connected network, computes the heart rate interval category cross-entropy loss with the softmax function, and finally obtains the heart rate interval classification, with a low heart rate error rate and high efficiency.
(2) On top of the cascaded ResNet residual network and LSTM model, the method embeds the channel attention structure SENet into the base network ResNet18, improving the spatial feature extraction of the convolutional model. Because the RGB channels of a face image have different signal-to-noise ratios, with the green channel's being high, different channels are given different weights according to their importance during convolution: higher weights are assigned to the high signal-to-noise green channel and to channels carrying high-level green-derived semantics, while channels with low signal-to-noise ratios are attenuated. This attention mechanism assigns weights to the feature maps and reshapes the dependencies between channels, so the temporal information of the video data is fully exploited and the heart rate error rate is further reduced.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a schematic diagram of a model application process of a non-contact heart rate measurement method based on a human face image;
FIG. 2 is a schematic view of video sequence acquisition according to an embodiment of the non-contact heart rate measurement method based on a human face image;
FIG. 3 is a schematic structural diagram of a channel attention network SEnet according to an embodiment of the non-contact heart rate measurement method based on a human face image;
FIG. 4 is a schematic diagram of a calculation process of a softmax function according to an embodiment of the non-contact heart rate measurement method based on a human face image;
FIG. 5 is a schematic structural diagram of a heart rate category identification model according to an embodiment of the non-contact heart rate measurement method based on a human face image;
FIG. 6 is a schematic diagram of the PPG signal components.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
The invention provides a non-contact heart rate measurement method based on face images, which uses a cascaded CNN and LSTM network to extract spatial image features from the video and to learn the temporal information between consecutive frames. Meanwhile, the lightweight channel attention network SENet is embedded into the CNN convolutional structure to boost the intermediate convolutional layers carrying high signal-to-noise green channel information, further improving prediction accuracy. The temporal model used by the invention is the LSTM long short-term memory network, a variant of the RNN with an added memory gating mechanism that can selectively retain historical information, which is why it is widely applied to time series prediction problems.
The invention discloses a non-contact heart rate measuring method based on a face image, which comprises the following steps:
step S10, acquiring a face video containing a set frame image as a video to be processed;
step S20, for each frame of the video to be processed, locating the positions of the nasal alae and lips with a 68-point facial landmark detection method and cropping a region of set length and width to obtain the ROI sequence to be processed;
step S30, obtaining, through a trained heart rate category identification model, the predicted value that the ROI sequence to be processed belongs to each preset heart rate category interval;
step S40, taking the heart rate category interval with the largest predicted value as the heart rate category interval of the video to be processed;
the heart rate category identification model is a cascade model combining a CNN feature extraction network and an LSTM long short-term memory network, with the channel attention network SENet embedded in the CNN feature extraction network.
In order to more clearly describe the non-contact heart rate measurement method based on the facial image of the present invention, the following will describe each step in the embodiment of the present invention in detail with reference to fig. 1.
The non-contact heart rate measuring method based on the face image in the first embodiment of the invention comprises the steps of S10-S40, and the steps are described in detail as follows:
step S10, a face video including a set frame image is acquired as a video to be processed.
FIG. 2 is a schematic view of video sequence acquisition according to an embodiment of the non-contact heart rate measurement method based on face images: Subject denotes the person being recorded, Built-in webcam the camera used to acquire the video sequence, Laptop the processing device (such as a notebook, tablet, or mobile phone) that performs video processing and heart rate category estimation, and Finger BVP Sensor the finger blood volume pulse sensor used to record the reference heart rate. In one embodiment of the invention, the face video of the measured subject is captured with a Logitech camera.
Step S20, locating the positions of the nasal alae and lips in each frame of the video to be processed with a 68-point facial landmark detection method, and cropping a region of set length and width to obtain the ROI sequence to be processed.
The human face detection technology such as deep learning and Haar feature detection method provides technical support for human face heart rate measurement. With the application of the deep learning method in the field of heart rate identification, the public data sets for non-contact heart rate detection promote the application of the deep learning method in the aspect of heart rate identification and detection problems.
For the video to be processed, the face is cropped frame by frame with a Haar feature detection method or an SSD (Single Shot Detector) method, the positions of the nasal alae and lips are located with the 68-point facial landmark detection method, the image is cropped to the set length and width, and the ROI of each frame is extracted, yielding the ROI sequence corresponding to the video. In one embodiment of the invention, the size of the extracted ROI of each frame is 410×500.
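The cropping step can be sketched as below. The function and variable names are mine; the landmark array is assumed to come from a 68-point detector (in the common 68-point scheme, the nose points are indices 27-35 and the mouth points 48-67), and a dummy array stands in for it here.

```python
import numpy as np

# Illustrative sketch: given 68-point landmarks for one frame, crop a
# fixed-size ROI centered on the nose / lip region. The detector that
# produces `landmarks` (e.g. a 68-point facial landmark model) is assumed.
def crop_roi(frame: np.ndarray, landmarks: np.ndarray,
             width: int = 410, height: int = 500) -> np.ndarray:
    pts = landmarks[27:60]                   # nose + outer-lip landmarks
    cx, cy = pts.mean(axis=0).astype(int)    # ROI center
    x0 = int(np.clip(cx - width // 2, 0, frame.shape[1] - width))
    y0 = int(np.clip(cy - height // 2, 0, frame.shape[0] - height))
    return frame[y0:y0 + height, x0:x0 + width]

frame = np.zeros((1080, 1920, 3), dtype=np.uint8)   # one 1920x1080 frame
landmarks = np.tile([960, 540], (68, 1))            # dummy landmark array
roi = crop_roi(frame, landmarks)                    # 500x410 ROI crop
```

Clipping the top-left corner keeps the fixed 410×500 window inside the frame even when the face sits near an image border.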
And step S30, obtaining a predicted value that the ROI sequence to be processed belongs to each preset heart rate category interval through the trained heart rate category identification model.
In one embodiment of the invention, heart rate category intervals are divided according to heart rate marking values of samples collected during model training, so that 14 category intervals are divided:
40-49,50-59,60-69,70-79,80-89,90-99,100-109,110-119,120-129,130-139,140-149,150-159,160-169 and 170-179.
Step S40, taking the heart rate category interval with the largest predicted value as the heart rate category interval of the video to be processed.
The heart rate category identification model is a cascade model combining a CNN feature extraction network and an LSTM long-time memory neural network, and a channel attention network SEnet is embedded in the CNN feature extraction network.
The heart rate category identification model is constructed as a cascade of a CNN feature extraction network and an LSTM long short-term memory network, with ResNet18 as the base network whose backbone extracts heart rate features. Most other methods use a single convolutional neural network to extract heart rate features, leaving the temporal dependency between adjacent frames of the face video unexploited. The method therefore proposes a cascade of a convolutional network and a temporal model, using the LSTM to extract the temporal evolution of the spatial features while still modeling the per-frame facial spatial features, thereby modeling the nonlinear relationship between facial color change and heart rate feature extraction.
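A structural sketch of such a cascade is given below. The patent's backbone is ResNet18 with embedded SE blocks; it is abbreviated here to a two-layer CNN so the sketch stays self-contained, and all layer sizes and names are assumptions.

```python
import torch
import torch.nn as nn

# Structural sketch of the CNN+LSTM cascade (sizes are assumptions; the
# patent's backbone is ResNet18 with SE blocks, abbreviated here).
class HeartRateNet(nn.Module):
    def __init__(self, feat_dim=64, hidden=128, num_classes=14):
        super().__init__()
        self.cnn = nn.Sequential(            # per-frame spatial features
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Sequential(           # two-layer FC head; softmax applied at loss time
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, num_classes))

    def forward(self, clip):                 # clip: (B, N, 3, H, W)
        b, n = clip.shape[:2]
        feats = self.cnn(clip.flatten(0, 1)).view(b, n, -1)
        _, (h, _) = self.lstm(feats)         # final hidden vector summarizes the N frames
        return self.head(h[-1])              # logits over the 14 intervals

logits = HeartRateNet()(torch.randn(2, 5, 3, 32, 32))   # logits shape: (2, 14)
```

Flattening the batch and time dimensions lets one CNN process every frame, after which the LSTM consumes the per-frame feature vectors in order and its last hidden state feeds the classification head.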
The training method of the heart rate category identification model comprises the following steps:
and step B10, acquiring the human face video with the set duration of the set parameters and the real heart rate label value corresponding to each frame of image.
The invention provides a brand-new deep learning method for non-contact heart rate recognition, and also provides a method of constructing a face video and heart rate annotation dataset for research use.
The acquisition dataset is described as follows:
set duration of each video: 2 min;
video frame rate: 30 fps;
distance between camera and subject: 1 m;
video image size: 1920×1080;
size of the cropped face image: 410×500;
acquisition device: Logitech camera.
When collecting the face videos, attention should be paid to the diversity of illumination and motion conditions so that model training is more robust. Heart rate data are collected simultaneously with the video to generate the corresponding heart rate data file.
And step B20, dividing the real heart rate labeling value into the heart rate category interval, and dividing the face video into L video sequences containing N frames of face images.
In one embodiment of the invention, the images of the face video are read frame by frame, and every 20 images are used as a video sequence.
Face images of uniform size are cropped from each picture with an SSD (Single Shot Detector) model or a Haar feature classifier, and the face video is finally divided into L video sequences of N face image frames each.
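The frame grouping of step B20 can be sketched as below (the helper name is mine; the embodiments above use N = 20 or N = 30 frames per sequence):

```python
# Simple sketch of step B20's frame grouping: split a list of frames into
# consecutive non-overlapping sequences of n frames each, dropping any
# incomplete tail.
def split_sequences(frames, n):
    return [frames[i:i + n] for i in range(0, len(frames) - n + 1, n)]

# A 2-minute video at 30 fps has 3600 frames; with n = 30 this yields
# L = 120 training sequences.
frames = list(range(3600))
seqs = split_sequences(frames, 30)
```

With the dataset parameters given earlier (2 min at 30 fps), each recorded video therefore contributes 120 sequences to the training set when N = 30.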
Step B30, for each frame of the L video sequences of N face images, locating the vessel-rich nasal alae and lip positions with the 68-point facial landmark detection method and cropping to the set size of 410×500 to obtain L ROI sequences of N frames, then generating L training sample sequences by attaching the real heart rate category interval corresponding to each video sequence, as shown in formula (1):
Figure BDA0002785139860000111
wherein ,τjJ is 1,2, …, L represents the j-th sample sequence, L represents a total of L sample sequences,
Figure BDA0002785139860000112
representing the N frame ROI area image of the j sample sequence, N representing the total N frames ROI area images in the sample sequence, fjRepresents the jth sample sequence taujThe corresponding heart rate category interval.
Such a training data set comprises L training sample sequences, each training sample sequence comprising a set of N ROI region images x ∈ χ and a corresponding heart rate label f ∈ F (the true heart rate category interval is given as a frequency band).
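The pairing in formula (1) of an ROI image set with one heart rate interval label can be mirrored by a small container type; `TrainingSample` and `build_samples` are illustrative names, not from the patent:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TrainingSample:
    """One training sample tau_j: N ROI images plus one interval label f_j."""
    rois: List   # the N ROI region images (stand-ins here; real data would be arrays)
    label: int   # index of the heart rate category interval

def build_samples(roi_sequences, interval_labels):
    """Zip L ROI sequences with their L interval labels into sample objects."""
    assert len(roi_sequences) == len(interval_labels)
    return [TrainingSample(list(r), f)
            for r, f in zip(roi_sequences, interval_labels)]
```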
In one embodiment of the invention, every 30 continuous images are used as a sample to mark a heart rate category interval, and each image is subjected to ResNet18 convolutional neural network to obtain a feature vector.
Determining the sequence time length T of each training sample sequence in the training data set, as shown in Table 1:

TABLE 1

Time/s   Frame rate/fps   Total frames   Label
1        30               30             60~65
2        30               60             70~75
3        30               90             65~80

In Table 1, Time represents the sequence time length in seconds, Frame rate represents the frame rate in fps, Total frames represents the total number of image frames, and Label represents the heart rate category interval.
Step B40, selecting any training sample sequence, obtaining the predicted values that the sample sequence belongs to each preset heart rate category interval through the heart rate category identification model, and calculating the cross entropy loss value between the maximum predicted value and the corresponding real heart rate category interval.
The invention adopts a cascade model combining a CNN feature extraction network and an LSTM long short-term memory neural network; the network structure is shown in Table 2:

TABLE 2

(The network structure table is given as an image in the original document.)

In Table 2, ResNet is the selected CNN feature extraction network for feature extraction, in which the channel attention network SENet is embedded; an FC fully connected layer is added between the CNN feature extraction network and the LSTM; Conv represents a convolutional layer, max pool represents a maximum pooling layer, average pool represents an average pooling layer, and fully connected represents a fully connected layer.
Each frame image within the sequence time length T is input into the heart rate category identification model, and features are extracted through the CNN feature extraction network. The CNN feature extraction network is a ResNet network pre-trained on the ImageNet data set; considering parameter count and computational cost, the last three fully connected layers are modified into two layers.
The feature channels c2 obtained from the convolutional layers, pooling layers and activation function layers are then input into SENet for channel-weight transformation, in the following specific way:

FIG. 3 is a schematic structural diagram of the channel attention network SENet according to an embodiment of the non-contact heart rate measurement method based on a face image. An h × w × c2 feature map is compressed (Squeeze) by average pooling into a 1 × 1 × c2 vector, so that each two-dimensional feature channel becomes a single real number, which can be regarded as having a global receptive field. The 1 × 1 × c2 part then passes through two fully connected layers and a sigmoid activation (Excitation) part to obtain the final weight output, and the normalized weights are applied to the features of each channel (multiply by channel weights) through a Scale operation.
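The Squeeze, Excitation and Scale steps can be sketched in NumPy as below; the two fully connected weight matrices are stand-ins for learned parameters, and the usual bottleneck reduction ratio is omitted for brevity:

```python
import numpy as np

def se_block(feature_map, w1, b1, w2, b2):
    """Squeeze-and-Excitation channel reweighting (inference-time sketch).

    feature_map: (h, w, c) array; (w1, b1) and (w2, b2) are the two
    fully connected layers of the excitation branch.
    """
    # Squeeze: global average pooling -> one real number per channel
    # (1 x 1 x c), each with a global receptive field.
    z = feature_map.mean(axis=(0, 1))                  # shape (c,)
    # Excitation: FC -> ReLU -> FC -> sigmoid gives a weight in (0, 1)
    # for each channel.
    s = np.maximum(z @ w1 + b1, 0.0)
    weights = 1.0 / (1.0 + np.exp(-(s @ w2 + b2)))     # shape (c,)
    # Scale: multiply each channel of the feature map by its weight.
    return feature_map * weights
```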
The features output by SENet undergo multi-layer convolution, pooling, activation and dropout operations, and the feature description vectors of each picture, {X: x_t, x_{t+1}, x_{t+2}, ..., x_{t+T}}, are output through a fully connected layer, wherein x_t is a feature vector of dimension m.

These vectors are input into the LSTM long short-term memory neural network, and the feature vectors {X: x_t, x_{t+1}, x_{t+2}, ..., x_{t+T}} of the T time steps are encoded into a hidden vector output x_hidden. In one embodiment of the invention, the 30-dimensional feature vectors are passed through the LSTM model to output an encoded hidden vector.
The LSTM long short-term memory neural network takes the output of the previous moment as the input of the next moment, and each repeating unit has four special gating structures: in the first step, the forget gate determines what information from the previous moment is discarded; the input gate determines how much of the input information is retained; the output gate then determines the output of the current moment together with the new cell state; and all gates pass through a sigmoid unit to determine the proportion of information that is output.
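One step of the gating mechanism described above can be sketched as follows; the stacked parameter layout (W, U, b holding all four gates at once) is an implementation convention assumed here, not taken from the patent:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W: (4m, d), U: (4m, m), b: (4m,) hold the
    forget, input, candidate and output gate parameters stacked."""
    m = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    f = sigmoid(z[0:m])            # forget gate: what to discard
    i = sigmoid(z[m:2 * m])        # input gate: how much input to keep
    g = np.tanh(z[2 * m:3 * m])    # candidate cell state
    o = sigmoid(z[3 * m:4 * m])    # output gate: current-moment output
    c = f * c_prev + i * g         # new cell state
    h = o * np.tanh(c)             # hidden output fed to the next step
    return h, c
```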
The output of the LSTM is used as the input of the next fully connected module, and the predicted value that the sample sequence belongs to each preset heart rate category interval is obtained through the softmax module.
Softmax is an important function in classification problems: it maps its inputs to real numbers between 0 and 1 whose sum is normalized to 1. For the output-layer vector V, V_i represents the i-th element of V, and the predicted value that an element belongs to each preset heart rate category interval is calculated as shown in formula (2):

S_i = e^{V_i} / Σ_{j=1}^{14} e^{V_j}    (2)

wherein e^{V_i} is the exponential output of the last layer of the fully connected network in the heart rate category identification model for the i-th category, and Σ_{j=1}^{14} e^{V_j} is the sum of the exponential outputs of the 14 categories.
FIG. 4 is a schematic diagram of the softmax calculation process of an embodiment of the non-contact heart rate measurement method based on facial images: for the inputs {y_1, y_2, ..., y_n}, the exponential e^{y_i} of each input is computed, then the proportion of each exponential in the sum of all exponentials, e^{y_i} / Σ_j e^{y_j}, is calculated; the maximum of these results represents the maximum predicted value over the preset heart rate category intervals.
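Formula (2) and the FIG. 4 process correspond directly to a few lines of NumPy; this is a sketch, and the max-subtraction is a standard numerical-stability trick not mentioned in the patent:

```python
import numpy as np

def softmax(v):
    """Map the output-layer vector V to probabilities that sum to 1:
    S_i = e^{V_i} / sum_j e^{V_j}."""
    e = np.exp(v - v.max())   # subtracting max(V) leaves S unchanged
    return e / e.sum()

scores = softmax(np.array([1.0, 2.0, 3.0]))
predicted_interval = int(scores.argmax())   # index of the maximum predicted value
```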
Therefore, the cross entropy loss value between the maximum predicted value of the sample sequence and the corresponding real heart rate category interval is calculated as shown in formula (3):

Loss = -Σ_i t_i · log(S_i)    (3)

wherein t_i is the true heart rate category interval label of the current training sample, S_i is the predicted value that the current training sample belongs to the i-th preset heart rate category interval, and i indexes the heart rate category intervals.
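With a one-hot target t, formula (3) reduces to the negative log of the probability assigned to the true interval; a minimal sketch:

```python
import numpy as np

def cross_entropy(probs, true_index):
    """Formula (3): Loss = -sum_i t_i * log(S_i). With a one-hot target
    the sum collapses to -log(S_true)."""
    return -np.log(probs[true_index])
```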
The output vector of the LSTM passes through the two fully connected layers and the softmax loss function to obtain the corresponding heart rate classification interval result. The corresponding heart rate category interval labels are: 40-49, 50-59, 60-69, 70-79, 80-89, 90-99, 100-109, 110-119, 120-129, 130-139, 140-149, 150-159, 160-169 and 170-179. Unlike other methods that directly obtain a specific heart rate value using the regression idea, the CNN and LSTM cascade model converts the regression problem into a classification problem, which alleviates the severe oscillation of the heart rate estimate relative to the true value caused by large interference from ambient illumination changes, movement and expression when a specific heart rate value is estimated directly.
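Assigning a ground-truth bpm reading to one of these 14 interval labels, as needed when labelling the data set in step B20, can be written as follows; `heart_rate_interval` is an illustrative helper, not from the patent:

```python
def heart_rate_interval(bpm):
    """Map a heart rate in beats per minute to one of the 14 category
    intervals 40-49, 50-59, ..., 170-179."""
    if not 40 <= bpm <= 179:
        raise ValueError("heart rate outside the 14 labelled intervals")
    index = (int(bpm) - 40) // 10     # interval index 0..13
    lo = 40 + 10 * index              # lower bound of the interval
    return index, f"{lo}-{lo + 9}"
```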
Step B50, if the cross entropy loss value is not lower than the set threshold, adjusting the parameters of the heart rate category identification model and jumping back to step B40 for iterative training, until the cross entropy loss value is lower than the set threshold or the set number of training iterations is reached, thereby obtaining the trained heart rate category identification model.
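The B40/B50 iteration can be expressed as a generic loop; `model_step` is a stand-in for one forward pass, loss computation and parameter update of the heart rate category identification model:

```python
def train(model_step, loss_threshold, max_iterations):
    """Repeat training steps until the cross entropy loss falls below
    the set threshold or the set number of iterations is reached.
    Returns the iteration count and the last loss value."""
    loss = float("inf")
    for iteration in range(1, max_iterations + 1):
        loss = model_step()               # B40: predict and compute the loss
        if loss < loss_threshold:         # B50: stopping condition met
            return iteration, loss
    return max_iterations, loss           # iteration budget exhausted
```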
FIG. 5 is a schematic structural diagram of the heart rate category identification model according to an embodiment of the non-contact heart rate measurement method based on facial images of the present invention: "Input sequence of length N" represents an input sample sequence with sequence length N; ResNet18 represents the CNN feature extraction network selected by the present invention, in which the channel attention network SENet is embedded; Conv represents a convolution operation, ReLU represents an activation operation, FC represents a fully connected operation, {x_t, x_{t+1}, ..., x_{t+T}} are the feature vectors of the T time steps, softmax represents the classification operation, and "Heart rate interval class" represents the finally obtained heart rate category interval.
As mentioned above, the invention provides a heart rate annotation data set in the implementation process. The face region is cropped frame by frame from the acquired video; based on the principle that the signal-to-noise ratio of the green channel is higher than that of the other channels, the attention mechanism module SENet is added, which reassigns corresponding weights to the different channels according to their importance and reshapes the dependency relationships among the channels of each layer. The feature descriptors output by the ResNet18 convolution module are input into the LSTM for encoding, and the temporal information of the inter-frame features is used as a partial basis for extracting the heart rate pattern. Therefore, the method has high industrial utilization value for non-contact heart rate extraction scenarios. A user can use various terminal devices, such as a computer, a mobile phone, a PC or a tablet, to acquire a face video image; after preprocessing, it is input into the deep learning model, which gives a corresponding heart rate interval prediction result that is output to the terminal display device, thereby realizing heart rate monitoring and health condition early warning of the user by the terminal.
The non-contact heart rate measurement system based on the face image comprises an input module, an ROI (region of interest) region extraction module, a category prediction module and an output module;
the input module is configured to acquire a face video containing a set frame image as a video to be processed and input the face video;
the ROI region extraction module is configured to acquire the positions of the nasal alae and lips in each frame of image of the video to be processed by the face 68-feature-point detection method, and crop them to the set length and width to acquire the ROI region sequence to be processed;
the category prediction module is configured to obtain a prediction value of the ROI area sequence to be processed, which belongs to each preset heart rate category interval, through a trained heart rate category identification model;
and the output module is configured to take the heart rate category interval with the largest predicted value as the heart rate category interval of the video to be processed and output the heart rate category interval.
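The four modules chain together as a simple pipeline; in the sketch below the callables are stand-ins for the ROI extraction module and the trained heart rate category identification model:

```python
def measure_heart_rate(video_frames, extract_roi, predict_scores, intervals):
    """Input -> ROI extraction -> category prediction -> output of the
    interval with the largest predicted value."""
    rois = [extract_roi(frame) for frame in video_frames]
    scores = predict_scores(rois)              # one score per interval
    best = max(range(len(scores)), key=scores.__getitem__)
    return intervals[best]
```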
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the system described above may refer to the corresponding process in the foregoing method embodiments, and will not be described herein again.
It should be noted that, the non-contact heart rate measurement system based on a face image provided in the foregoing embodiment is only illustrated by dividing the functional modules, and in practical applications, the functions may be allocated to different functional modules according to needs, that is, the modules or steps in the embodiment of the present invention are further decomposed or combined, for example, the modules in the foregoing embodiment may be combined into one module, or may be further split into multiple sub-modules, so as to complete all or part of the functions described above. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps, and are not to be construed as unduly limiting the present invention.
A storage device according to a third embodiment of the present invention stores a plurality of programs, and the programs are suitable for being loaded and executed by a processor to realize the above non-contact heart rate measurement method based on human face images.
A processing apparatus according to a fourth embodiment of the present invention includes a processor, a storage device; a processor adapted to execute various programs; a storage device adapted to store a plurality of programs; the program is suitable to be loaded and executed by a processor to implement the above-mentioned non-contact heart rate measurement method based on human face images.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Those of skill in the art would appreciate that the various illustrative modules, method steps, and modules described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that programs corresponding to the software modules, method steps may be located in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. To clearly illustrate this interchangeability of electronic hardware and software, various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (10)

1. A non-contact heart rate measurement method based on a face image is characterized by comprising the following steps:
step S10, acquiring a face video containing a set frame image as a video to be processed;
step S20, for each frame of image of the video to be processed, obtaining the positions of the nasal alae and lips by the face 68-feature-point detection method, and cropping them to the set length and width to obtain the ROI region sequence to be processed;
step S30, obtaining a predicted value of each preset heart rate category interval of the ROI area sequence to be processed through a trained heart rate category identification model;
step S40, taking the heart rate type interval with the maximum predicted value as the heart rate type interval of the video to be processed;
the heart rate category identification model is a cascade model combining a CNN feature extraction network and an LSTM long-time memory neural network, and a channel attention network SEnet is embedded in the CNN feature extraction network.
2. The non-contact heart rate measurement method based on the facial image as claimed in claim 1, wherein the heart rate category intervals comprise the following 14:
40-49,50-59,60-69,70-79,80-89,90-99,100-109,110-119,120-129,130-139,140-149,150-159,160-169 and 170-179.
3. The non-contact heart rate measurement method based on the facial image as claimed in claim 2, wherein the heart rate category recognition model is trained by:
step B10, acquiring the human face video with the set duration of the set parameters and the real heart rate label value corresponding to each frame of image;
step B20, dividing the real heart rate labeling value into the heart rate category interval, and dividing the face video into L video sequences containing N frames of face images;
step B30, for each frame of image of L video sequences containing N frames of face images, obtaining the positions of the nasal alar and lip by a face 68 characteristic point detection method, intercepting the positions into set length and width to obtain L N frames of ROI regional sequences, and generating L training sample sequences by combining the real heart rate category interval corresponding to the video sequences;
step B40, selecting any training sample sequence, obtaining the predicted value of each preset heart rate category interval of the sample sequence through a heart rate category identification model, and calculating the cross entropy loss value of the maximum predicted value and the corresponding real heart rate category interval;
and B50, if the cross entropy loss value is not lower than a set threshold, adjusting parameters of the heart rate type recognition model, skipping to the step B40, and performing iterative training until the cross entropy loss function value is lower than the set threshold or reaches a set training frequency to obtain the trained heart rate type recognition model.
4. The non-contact heart rate measurement method based on the human face image as claimed in claim 3, wherein the parameters of the human face video with the set duration of the set parameters are:
setting time length: 2 min;
video frame rate: 30 fps;
distance between camera and collected object: 1 m;
video image size: 1920*1080.
5. The non-contact heart rate measurement method based on the facial image according to claim 3, wherein the L training sample sequences are:
τ_j = ({x_1^j, x_2^j, ..., x_N^j}, f_j), j = 1, 2, ..., L

wherein τ_j represents the j-th sample sequence, L represents the total number of sample sequences, x_n^j represents the n-th frame ROI region image of the j-th sample sequence, N represents the total number of ROI region images in the sample sequence, and f_j represents the heart rate category interval corresponding to the j-th sample sequence τ_j.
6. The non-contact heart rate measurement method based on the facial image as claimed in claim 3, wherein the cross entropy loss value is calculated by:
Loss = -Σ_i t_i · log(S_i)

wherein t_i is the true heart rate category interval label of the current training sample, S_i is the predicted value that the current training sample belongs to the i-th preset heart rate category interval, and i indexes the heart rate category intervals;

S_i = e^{V_i} / Σ_{j=1}^{14} e^{V_j}

wherein e^{V_i} is the exponential output of the last layer of the fully connected network in the heart rate category identification model for the i-th category, and Σ_{j=1}^{14} e^{V_j} is the sum of the exponential outputs of the 14 categories.
7. The non-contact heart rate measurement method based on the human face image according to claim 1, wherein the heart rate signal-to-noise ratio carried by the green channel of each frame image in the human face video is higher than that of the red and blue channels, and weight distribution is performed according to the importance degree of each channel through a channel attention network SEnet embedded in a CNN feature extraction network.
8. A non-contact heart rate measurement system based on a face image is characterized by comprising an input module, an ROI (region of interest) region extraction module, a category prediction module and an output module;
the input module is configured to acquire a face video containing a set frame image as a video to be processed and input the face video;
the ROI region extraction module is configured to acquire the positions of the nasal alae and lips in each frame of image of the video to be processed by the face 68-feature-point detection method, and crop them to the set length and width to acquire the ROI region sequence to be processed;
the category prediction module is configured to obtain a prediction value of the ROI area sequence to be processed, which belongs to each preset heart rate category interval, through a trained heart rate category identification model;
and the output module is configured to take the heart rate category interval with the largest predicted value as the heart rate category interval of the video to be processed and output the heart rate category interval.
9. A storage device having stored thereon a plurality of programs, wherein the programs are adapted to be loaded and executed by a processor to implement the method for contactless heart rate measurement based on facial images according to any of claims 1-7.
10. A processing apparatus, comprising:
A processor adapted to execute various programs; and
a storage device adapted to store a plurality of programs;
wherein the program is adapted to be loaded and executed by a processor to perform:
the method for non-contact heart rate measurement based on facial images of any one of claims 1-7.
CN202011295074.4A 2020-11-18 2020-11-18 Non-contact heart rate measurement method, system and device based on face image Active CN112381011B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011295074.4A CN112381011B (en) 2020-11-18 2020-11-18 Non-contact heart rate measurement method, system and device based on face image


Publications (2)

Publication Number Publication Date
CN112381011A true CN112381011A (en) 2021-02-19
CN112381011B CN112381011B (en) 2023-08-22

Family

ID=74584155

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011295074.4A Active CN112381011B (en) 2020-11-18 2020-11-18 Non-contact heart rate measurement method, system and device based on face image

Country Status (1)

Country Link
CN (1) CN112381011B (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110163126A (en) * 2019-05-06 2019-08-23 北京华捷艾米科技有限公司 A kind of biopsy method based on face, device and equipment
WO2019202305A1 (en) * 2018-04-16 2019-10-24 Clinicco Ltd System for vital sign detection from a video stream
CN111345803A (en) * 2020-03-20 2020-06-30 浙江大学城市学院 Heart rate variability measuring method based on mobile device camera
CN111407245A (en) * 2020-03-19 2020-07-14 南京昊眼晶睛智能科技有限公司 Non-contact heart rate and body temperature measuring method based on camera
CN111626182A (en) * 2020-05-25 2020-09-04 浙江大学 Method and system for accurately detecting human heart rate and facial blood volume based on video


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XUAN ANH NGUYEN 等: "Surgical skill levels: Classification and analysis using deep neural network model and motion signals", 《COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE》 *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112580612A (en) * 2021-02-22 2021-03-30 中国科学院自动化研究所 Physiological signal prediction method
CN112580612B (en) * 2021-02-22 2021-06-08 中国科学院自动化研究所 Physiological signal prediction method
US11227161B1 (en) 2021-02-22 2022-01-18 Institute Of Automation, Chinese Academy Of Sciences Physiological signal prediction method
CN113080907A (en) * 2021-04-14 2021-07-09 贵州省人民医院 Pulse wave signal processing method and device
CN113420624A (en) * 2021-06-11 2021-09-21 华中师范大学 Non-contact fatigue detection method and system
CN113397516A (en) * 2021-06-22 2021-09-17 山东科技大学 Newborn-oriented visual heart rate estimation method, device and system
CN113397516B (en) * 2021-06-22 2022-03-25 山东科技大学 Newborn-oriented visual heart rate estimation method, device and system
CN113255585A (en) * 2021-06-23 2021-08-13 之江实验室 Face video heart rate estimation method based on color space learning
CN113688985A (en) * 2021-07-26 2021-11-23 浙江大华技术股份有限公司 Training method of heart rate estimation model, heart rate estimation method and device
CN113408508A (en) * 2021-08-20 2021-09-17 中国科学院自动化研究所 Transformer-based non-contact heart rate measurement method
CN113408508B (en) * 2021-08-20 2021-11-30 中国科学院自动化研究所 Transformer-based non-contact heart rate measurement method
CN114912487A (en) * 2022-05-10 2022-08-16 合肥中聚源智能科技有限公司 End-to-end remote heart rate detection method based on channel enhanced space-time attention network
CN114912487B (en) * 2022-05-10 2024-04-26 合肥中聚源智能科技有限公司 End-to-end remote heart rate detection method based on channel enhanced space-time attention network
CN116842330A (en) * 2023-08-31 2023-10-03 庆云县人民医院 Health care information processing method and device capable of comparing histories
CN116842330B (en) * 2023-08-31 2023-11-24 庆云县人民医院 Health care information processing method and device capable of comparing histories

Also Published As

Publication number Publication date
CN112381011B (en) 2023-08-22


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant