CN112381011A - Non-contact heart rate measurement method, system and device based on face image - Google Patents

Non-contact heart rate measurement method, system and device based on face image Download PDF

Info

Publication number
CN112381011A
CN112381011A (application CN202011295074.4A; granted publication CN112381011B)
Authority
CN
China
Prior art keywords
heart rate
video
image
interval
face
Prior art date
Legal status
Granted
Application number
CN202011295074.4A
Other languages
Chinese (zh)
Other versions
CN112381011B (en)
Inventor
李学恩
孙闻
张振山
王红星
Current Assignee
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN202011295074.4A
Publication of CN112381011A
Application granted
Publication of CN112381011B
Legal status: Active

Classifications

    • G06V40/172: Human faces; classification, e.g. identification
    • G06F18/214: Pattern recognition; generating training patterns; bootstrap methods, e.g. bagging or boosting
    • G06F18/2411: Classification techniques based on the proximity to a decision surface, e.g. support vector machines
    • G06N3/044: Recurrent networks, e.g. Hopfield networks
    • G06N3/045: Combinations of networks
    • G06N3/049: Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N3/08: Neural networks; learning methods
    • G06V10/30: Image preprocessing; noise filtering
    • G06V20/41: Higher-level, semantic clustering, classification or understanding of video scenes
    • Y02A90/10: Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The invention belongs to the technical fields of computer vision, deep learning, and medicine, and relates to a non-contact heart rate measurement method, system, and device based on face images. It aims to solve the problems of existing image-based non-contact heart rate measurement methods: strong sensitivity to ROI (region of interest) selection and environmental factors, a high heart rate estimation error rate, and poor real-time performance. The method comprises: locating the face in a face video through facial key point detection, and extracting a local facial ROI frame by frame as the input of a network model; and, on the basis of a cascaded convolutional and temporal network model, dividing the heart rate range into interval categories, embedding the channel attention network SENet into the convolution module so that weights are learned according to channel importance, and finally obtaining the heart rate interval category corresponding to the input video. By combining CNN feature extraction with an LSTM long short-term memory network and embedding a channel attention network, the method achieves non-contact heart rate measurement with a low error rate and high efficiency.

Description

Non-contact heart rate measurement method, system and device based on face image
Technical Field
The invention belongs to the technical field of computer vision, deep learning and medicine, and particularly relates to a non-contact heart rate measuring method, system and device based on a face image.
Background
Heart rate measurement can be divided into contact and non-contact approaches. Contact measurement usually requires professional medical equipment in direct contact with the human body, which is inconvenient. Non-contact measurement obtains the photoplethysmography (PPG) signal of the human body through images, reflective photoelectric measurement, wireless electromagnetic fields, and similar means: periodic beating of the heart causes periodic contraction and relaxation of peripheral blood vessels, and the heart rate is obtained by analyzing the periodic variation of the reflected PPG signal. FIG. 6 is a schematic diagram of the PPG signal components, which include a pulsatile (AC, roughly 1 Hz) arterial blood signal and non-pulsatile (DC) components from arterial blood, venous blood, and skin, muscle, and bone tissue.
Image-based non-contact heart rate estimation mainly uses a camera to capture a face video and analyzes the frame-by-frame change of illumination intensity on the face, for example, the non-contact heart rate measurement method based on a visual camera [1]. However, ROI selection there requires the scene to be as static as possible, so heart rate recognition can hardly be performed under motion; moreover, the time-domain discrete signal of the ROI is extracted through a manually set filtering bandwidth, giving a high heart rate estimation error rate, high computational complexity, and poor real-time performance. Traditional heart rate estimation methods generally filter interference out of the raw heart rate signal with frequency-domain filtering, principal component analysis (PCA), and similar techniques, for example, the non-contact heart rate measurement method based on canonical correlation analysis [2]. There, the selected ROI includes non-face regions, which introduces interference and increases the computational load, and the method is easily disturbed by expression, lighting, and motion; moreover, the signal-to-noise ratios of the heart rate signal in the different RGB channels are not compared, the varying importance of the channels for heart rate signal extraction is not considered, and the final heart rate error rate is therefore high.
Remote Heart Rate Measurement from Face Videos under Realistic Situations [3] provides a remote heart rate measurement method, but its ROI includes parts such as the subject's lips, so it is easily affected by non-rigid motion such as expressions and speaking and introduces redundant interference; in addition, its manually designed illumination correction has limitations and cannot remain robust under large lighting changes.
As deep learning has made great breakthroughs in visual tasks such as segmentation, recognition, and detection, the corresponding algorithms outperform traditional hand-crafted-feature models in real scenes, support end-to-end training, simplify the algorithm pipeline, and are more robust. With the development of deep learning, applying it to non-contact heart rate monitoring from face video avoids the hand-designed features and complex filtering of traditional methods and can fully learn the relationship between facial color changes and the heart rate signal. Current research on face-video heart rate estimation mainly processes single-frame video data with a single convolutional neural network, such as Visual Heart Rate Estimation with a Convolutional Neural Network [4]. However, the input image of that method is the whole face plus background information, which introduces excessive interference, and the single convolutional network ignores the temporal information of the video data, so its heart rate error rate is high.
In general, in the prior art, improper ROI selection introduces excessive interference or leaves the method vulnerable to motion and lighting changes, and the temporal information of the video data is not exploited, so the heart rate estimation error rate is high and real-time performance is poor.
The following documents are background information related to the present invention:
[1] Chen Jiaxin, Lin Qingyu, Zhou Liang, Wei Xin, Cai Kai. A non-contact heart rate measurement method based on a visual camera. 20180829, CN109259749A.
[2] Yan, Shu Xie Yi. A non-contact heart rate measurement method based on canonical correlation analysis. 20150630, CN105046209A.
[3] Xiaobai Li, Jie Chen, Guoying Zhao, Matti Pietikainen. Remote Heart Rate Measurement from Face Videos under Realistic Situations. IEEE Conference on Computer Vision and Pattern Recognition, 2014.
[4] Špetlík R, Franc V. Visual Heart Rate Estimation with Convolutional Neural Network [J]. In BMVC, 2018.
Disclosure of Invention
In order to solve the problems of the prior art, namely that existing image-based non-contact heart rate measurement methods are strongly affected by ROI (region of interest) selection and environmental factors, have a high heart rate estimation error rate, and offer poor real-time performance, the invention provides a non-contact heart rate measurement method based on face images, comprising the following steps:
step S10, acquiring a face video containing a set frame image as a video to be processed;
step S20, for each frame of the video to be processed, locating the positions of the nasal alae and lips with a 68-point facial landmark detection method and cropping a region of set length and width to obtain the ROI sequence to be processed;
step S30, obtaining, through a trained heart rate category identification model, the predicted value that the ROI sequence to be processed belongs to each preset heart rate category interval;
step S40, taking the heart rate category interval with the largest predicted value as the heart rate category interval of the video to be processed;
the heart rate category identification model is a cascade model combining a CNN feature extraction network and an LSTM long-time memory neural network, and a channel attention network SEnet is embedded in the CNN feature extraction network.
In some preferred embodiments, there are 14 heart rate category intervals:
40-49,50-59,60-69,70-79,80-89,90-99,100-109,110-119,120-129,130-139,140-149,150-159,160-169 and 170-179.
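As a minimal sketch of this binning scheme (the helper names are mine, not the patent's), a heart rate in bpm maps to one of the 14 ten-bpm intervals as follows:

```python
# Hypothetical helpers (not from the patent text): map a measured heart
# rate in bpm to one of the 14 category intervals 40-49 ... 170-179.
def hr_to_class(bpm: float) -> int:
    """Return the 0-based index of the 10-bpm heart rate interval."""
    if not 40 <= bpm < 180:
        raise ValueError(f"heart rate {bpm} bpm is outside the 14 modeled intervals")
    return int(bpm - 40) // 10

def class_to_interval(idx: int) -> str:
    """Human-readable label for a class index, e.g. 3 -> '70-79'."""
    lo = 40 + 10 * idx
    return f"{lo}-{lo + 9}"
```

A resting heart rate of 72 bpm, for instance, falls into class index 3, i.e. the 70-79 interval.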
In some preferred embodiments, the heart rate category identification model is trained by:
step B10, acquiring face videos of a set duration under set acquisition parameters, together with the real heart rate label value corresponding to each frame;
step B20, assigning the real heart rate label values to the heart rate category intervals, and dividing the face video into L video sequences of N face image frames each;
step B30, for each frame of the L video sequences of N face images, locating the positions of the nasal alae and lips with the 68-point facial landmark detection method, cropping to the set length and width to obtain L ROI sequences of N frames, and generating L training sample sequences by attaching the real heart rate category interval corresponding to each video sequence;
step B40, selecting any training sample sequence, obtaining through the heart rate category identification model the predicted value that the sample sequence belongs to each preset heart rate category interval, and computing the cross entropy loss between the maximum predicted value and the corresponding real heart rate category interval;
and step B50, if the cross entropy loss value is not lower than a set threshold, adjusting the parameters of the heart rate category identification model and returning to step B40, iterating until the loss value falls below the threshold or the set number of training iterations is reached, thereby obtaining the trained heart rate category identification model.
In some preferred embodiments, the parameters of the face video of set duration are as follows:
set duration: 2 min;
video frame rate: 30 fps;
distance between camera and subject: 1 m;
video image size: 1920×1080.
In some preferred embodiments, the L training sample sequences are:
$$\tau_j = \{x_1^j, x_2^j, \ldots, x_N^j;\ f_j\}, \quad j = 1, 2, \ldots, L$$

where $\tau_j$ denotes the $j$-th sample sequence, $L$ is the total number of sample sequences, $x_n^j$ is the $n$-th ROI region image of the $j$-th sample sequence, $N$ is the total number of ROI frames per sequence, and $f_j$ is the heart rate category interval corresponding to $\tau_j$.
In some preferred embodiments, the cross entropy loss value is calculated by:
$$\mathrm{Loss} = -\sum_{i=1}^{14} t_i \log S_i$$

where $t_i$ is the one-hot true heart rate category label of the current training sample and $S_i$ is the predicted probability that the sample belongs to heart rate category interval $i$, computed with the softmax function:

$$S_i = \frac{e^{z_i}}{\sum_{k=1}^{14} e^{z_k}}$$

where $z_i$ is the exponent output of the last fully connected layer of the heart rate category identification model for the $i$-th category, and the denominator is the sum of the exponential outputs over all 14 categories.
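Numerically, the softmax and cross entropy above behave as in the following sketch (the variable names are mine; `z` plays the role of the last fully connected layer's outputs and `t` the one-hot label):

```python
import numpy as np

# Numeric sketch of the loss: z holds the 14 last-layer outputs,
# t is the one-hot true interval, and S is the softmax
# S_i = exp(z_i) / sum_k exp(z_k).
def softmax(z):
    e = np.exp(z - z.max())          # subtract max for numerical stability
    return e / e.sum()

def cross_entropy(t_onehot, S):
    return -float(np.sum(t_onehot * np.log(S)))

z = np.zeros(14); z[3] = 5.0         # network strongly favors interval 70-79
t = np.zeros(14); t[3] = 1.0         # true label is also interval 70-79
S = softmax(z)
loss = cross_entropy(t, S)           # small, since the prediction is correct
```

Because the label is one-hot, the sum collapses to a single term, so the loss is simply the negative log-probability assigned to the true interval.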
In some preferred embodiments, since the green channel of each frame in the face video carries a higher heart rate signal-to-noise ratio than the red and blue channels, channel weights are assigned according to the importance of each channel by the channel attention network SENet embedded in the CNN feature extraction network.
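The channel attention mechanism referenced here can be sketched as a standard squeeze-and-excitation (SE) block; this is a generic SENet sketch with assumed layer sizes, not the patent's exact configuration:

```python
import torch
import torch.nn as nn

# Generic squeeze-and-excitation (SE) channel-attention block of the kind
# the patent embeds in the CNN backbone: global average pooling "squeezes"
# each channel to a scalar, a two-layer bottleneck learns a per-channel
# weight, and the feature map is rescaled channel-wise.
class SEBlock(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),                    # per-channel weight in (0, 1)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        w = x.mean(dim=(2, 3))               # squeeze: (B, C)
        w = self.fc(w).view(b, c, 1, 1)      # excitation: channel weights
        return x * w                         # reweight, e.g. boost green-derived maps

x = torch.randn(2, 64, 8, 8)
out = SEBlock(64)(x)                         # shape preserved: (2, 64, 8, 8)
```

Because the sigmoid weights lie in (0, 1), the block can only attenuate channels relative to one another, which is how low signal-to-noise channels get suppressed.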
On the other hand, the invention provides a non-contact heart rate measurement system based on a human face image, which comprises an input module, an ROI (region of interest) extraction module, a category prediction module and an output module;
the input module is configured to acquire a face video containing a set frame image as a video to be processed and input the face video;
the ROI extraction module is configured to locate the positions of the nasal alae and lips in each frame of the video to be processed with a 68-point facial landmark detection method and crop a region of set length and width to obtain the ROI sequence to be processed;
the category prediction module is configured to obtain a prediction value of the ROI area sequence to be processed, which belongs to each preset heart rate category interval, through a trained heart rate category identification model;
and the output module is configured to take the heart rate category interval with the largest predicted value as the heart rate category interval of the video to be processed and output the heart rate category interval.
In a third aspect of the present invention, a storage device is provided, in which a plurality of programs are stored, the programs being adapted to be loaded and executed by a processor to implement the above-mentioned non-contact heart rate measurement method based on facial images.
In a fourth aspect of the present invention, a processing apparatus is provided, which includes a processor, a storage device; the processor is suitable for executing various programs; the storage device is suitable for storing a plurality of programs; the program is suitable to be loaded and executed by a processor to implement the above-mentioned non-contact heart rate measurement method based on human face images.
The invention has the beneficial effects that:
(1) The non-contact heart rate measurement method based on face images rests on the principle of remote photoplethysmography (rPPG): periodic beating of the heart causes periodic contraction and relaxation of peripheral blood vessels, so the reflected PPG signal varies periodically, and a camera can capture the resulting chrominance changes of the image. The method extracts features from the captured face video frame by frame with a CNN, cascades an LSTM temporal network onto the single convolutional model to learn the temporal information between consecutive video frames from the extracted spatial features, encodes the spatial feature vectors into one hidden vector with the LSTM, feeds this vector into a two-layer fully connected network, computes the heart rate interval category cross-entropy loss with the softmax function, and finally obtains the heart rate interval classification, with a low heart rate error rate and high efficiency.
(2) On top of the cascaded ResNet residual network and LSTM model, the method embeds the channel attention structure SENet into the base network ResNet18, improving the spatial feature extraction of the convolutional model. Because the RGB channels of a face image have different signal-to-noise ratios, with the green channel's being high, different channels are given different weights according to their importance during convolution: higher weights are assigned to the high signal-to-noise green channel and to channels carrying high-level green-derived semantics, while channels with low signal-to-noise ratios are attenuated. This attention mechanism assigns weights to the feature maps and reshapes the dependencies between channels, so the temporal information of the video data is fully exploited and the heart rate error rate is further reduced.
Drawings
Other features, objects and advantages of the present application will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, made with reference to the accompanying drawings in which:
FIG. 1 is a schematic diagram of a model application process of a non-contact heart rate measurement method based on a human face image;
FIG. 2 is a schematic view of video sequence acquisition according to an embodiment of the non-contact heart rate measurement method based on a human face image;
FIG. 3 is a schematic structural diagram of a channel attention network SEnet according to an embodiment of the non-contact heart rate measurement method based on a human face image;
FIG. 4 is a schematic diagram of a calculation process of a softmax function according to an embodiment of the non-contact heart rate measurement method based on a human face image;
FIG. 5 is a schematic structural diagram of a heart rate category identification model according to an embodiment of the non-contact heart rate measurement method based on a human face image;
FIG. 6 is a schematic diagram of the PPG signal components.
Detailed Description
The present application will be described in further detail with reference to the following drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that the embodiments and features of the embodiments in the present application may be combined with each other without conflict. The present application will be described in detail below with reference to the embodiments with reference to the attached drawings.
The invention provides a non-contact heart rate measurement method based on face images, which uses a cascaded CNN and LSTM network to extract spatial image features from the video and to learn the temporal information between consecutive frames. Meanwhile, the lightweight channel attention network SENet is embedded into the CNN convolutional structure to boost the intermediate convolutional layers carrying high signal-to-noise green channel information, further improving prediction accuracy. The temporal model used by the invention is the LSTM long short-term memory network, a variant of the RNN with an added memory gating mechanism that can selectively retain historical information, which is why it is widely applied to time series prediction problems.
The invention discloses a non-contact heart rate measuring method based on a face image, which comprises the following steps:
step S10, acquiring a face video containing a set frame image as a video to be processed;
step S20, for each frame of the video to be processed, locating the positions of the nasal alae and lips with a 68-point facial landmark detection method and cropping a region of set length and width to obtain the ROI sequence to be processed;
step S30, obtaining, through a trained heart rate category identification model, the predicted value that the ROI sequence to be processed belongs to each preset heart rate category interval;
step S40, taking the heart rate category interval with the largest predicted value as the heart rate category interval of the video to be processed;
the heart rate category identification model is a cascade model combining a CNN feature extraction network and an LSTM long short-term memory network, with the channel attention network SENet embedded in the CNN feature extraction network.
In order to more clearly describe the non-contact heart rate measurement method based on the facial image of the present invention, the following will describe each step in the embodiment of the present invention in detail with reference to fig. 1.
The non-contact heart rate measuring method based on the face image in the first embodiment of the invention comprises the steps of S10-S40, and the steps are described in detail as follows:
step S10, a face video including a set frame image is acquired as a video to be processed.
FIG. 2 is a schematic view of video sequence acquisition according to an embodiment of the non-contact heart rate measurement method based on face images: Subject denotes the person being recorded, Built-in webcam the camera used to acquire the video sequence, Laptop the processing device (such as a notebook, tablet, or mobile phone) that performs video processing and heart rate category estimation, and Finger BVP Sensor the finger blood volume pulse sensor used to record the reference heart rate. In one embodiment of the invention, the face video of the measured subject is captured with a Logitech camera.
Step S20, locating the positions of the nasal alae and lips in each frame of the video to be processed with a 68-point facial landmark detection method, and cropping a region of set length and width to obtain the ROI sequence to be processed.
The human face detection technology such as deep learning and Haar feature detection method provides technical support for human face heart rate measurement. With the application of the deep learning method in the field of heart rate identification, the public data sets for non-contact heart rate detection promote the application of the deep learning method in the aspect of heart rate identification and detection problems.
For the video to be processed, the face is cropped frame by frame with a Haar feature detection method or an SSD (Single Shot Detector) method, the positions of the nasal alae and lips are located with the 68-point facial landmark detection method, the image is cropped to the set length and width, and the ROI of each frame is extracted, yielding the ROI sequence corresponding to the video. In one embodiment of the invention, the size of the extracted ROI of each frame is 410×500.
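The cropping step can be sketched as below. The function and variable names are mine; the landmark array is assumed to come from a 68-point detector (in the common 68-point scheme, the nose points are indices 27-35 and the mouth points 48-67), and a dummy array stands in for it here.

```python
import numpy as np

# Illustrative sketch: given 68-point landmarks for one frame, crop a
# fixed-size ROI centered on the nose / lip region. The detector that
# produces `landmarks` (e.g. a 68-point facial landmark model) is assumed.
def crop_roi(frame: np.ndarray, landmarks: np.ndarray,
             width: int = 410, height: int = 500) -> np.ndarray:
    pts = landmarks[27:60]                   # nose + outer-lip landmarks
    cx, cy = pts.mean(axis=0).astype(int)    # ROI center
    x0 = int(np.clip(cx - width // 2, 0, frame.shape[1] - width))
    y0 = int(np.clip(cy - height // 2, 0, frame.shape[0] - height))
    return frame[y0:y0 + height, x0:x0 + width]

frame = np.zeros((1080, 1920, 3), dtype=np.uint8)   # one 1920x1080 frame
landmarks = np.tile([960, 540], (68, 1))            # dummy landmark array
roi = crop_roi(frame, landmarks)                    # 500x410 ROI crop
```

Clipping the top-left corner keeps the fixed 410×500 window inside the frame even when the face sits near an image border.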
And step S30, obtaining a predicted value that the ROI sequence to be processed belongs to each preset heart rate category interval through the trained heart rate category identification model.
In one embodiment of the invention, heart rate category intervals are divided according to heart rate marking values of samples collected during model training, so that 14 category intervals are divided:
40-49,50-59,60-69,70-79,80-89,90-99,100-109,110-119,120-129,130-139,140-149,150-159,160-169 and 170-179.
Step S40, taking the heart rate category interval with the largest predicted value as the heart rate category interval of the video to be processed.
The heart rate category identification model is a cascade model combining a CNN feature extraction network and an LSTM long-time memory neural network, and a channel attention network SEnet is embedded in the CNN feature extraction network.
The heart rate category identification model is constructed as a cascade of a CNN feature extraction network and an LSTM long short-term memory network, with ResNet18 as the base network whose backbone extracts heart rate features. Most other methods use a single convolutional neural network to extract heart rate features, leaving the temporal dependency between adjacent frames of the face video unexploited. The method therefore proposes a cascade of a convolutional network and a temporal model, using the LSTM to extract the temporal evolution of the spatial features while still modeling the per-frame facial spatial features, thereby modeling the nonlinear relationship between facial color change and heart rate feature extraction.
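A structural sketch of such a cascade is given below. The patent's backbone is ResNet18 with embedded SE blocks; it is abbreviated here to a two-layer CNN so the sketch stays self-contained, and all layer sizes and names are assumptions.

```python
import torch
import torch.nn as nn

# Structural sketch of the CNN+LSTM cascade (sizes are assumptions; the
# patent's backbone is ResNet18 with SE blocks, abbreviated here).
class HeartRateNet(nn.Module):
    def __init__(self, feat_dim=64, hidden=128, num_classes=14):
        super().__init__()
        self.cnn = nn.Sequential(            # per-frame spatial features
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.lstm = nn.LSTM(feat_dim, hidden, batch_first=True)
        self.head = nn.Sequential(           # two-layer FC head; softmax applied at loss time
            nn.Linear(hidden, hidden), nn.ReLU(), nn.Linear(hidden, num_classes))

    def forward(self, clip):                 # clip: (B, N, 3, H, W)
        b, n = clip.shape[:2]
        feats = self.cnn(clip.flatten(0, 1)).view(b, n, -1)
        _, (h, _) = self.lstm(feats)         # final hidden vector summarizes the N frames
        return self.head(h[-1])              # logits over the 14 intervals

logits = HeartRateNet()(torch.randn(2, 5, 3, 32, 32))   # logits shape: (2, 14)
```

Flattening the batch and time dimensions lets one CNN process every frame, after which the LSTM consumes the per-frame feature vectors in order and its last hidden state feeds the classification head.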
The training method of the heart rate category identification model comprises the following steps:
and step B10, acquiring the human face video with the set duration of the set parameters and the real heart rate label value corresponding to each frame of image.
The invention provides a brand-new deep learning method for non-contact heart rate recognition, and also provides a method of constructing a face video and heart rate annotation dataset for research use.
The acquisition dataset is described as follows:
set duration of each video: 2 min;
video frame rate: 30 fps;
distance between camera and subject: 1 m;
video image size: 1920×1080;
size of the cropped face image: 410×500;
acquisition device: Logitech camera.
When collecting the face videos, attention should be paid to the diversity of illumination and motion conditions so that model training is more robust. Heart rate data are collected simultaneously with the video to generate the corresponding heart rate data file.
And step B20, dividing the real heart rate labeling value into the heart rate category interval, and dividing the face video into L video sequences containing N frames of face images.
In one embodiment of the invention, the images of the face video are read frame by frame, and every 20 images are used as a video sequence.
Face images of uniform size are cropped from each picture with an SSD (Single Shot Detector) model or a Haar feature classifier, and the face video is finally divided into L video sequences of N face image frames each.
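The frame grouping of step B20 can be sketched as below (the helper name is mine; the embodiments above use N = 20 or N = 30 frames per sequence):

```python
# Simple sketch of step B20's frame grouping: split a list of frames into
# consecutive non-overlapping sequences of n frames each, dropping any
# incomplete tail.
def split_sequences(frames, n):
    return [frames[i:i + n] for i in range(0, len(frames) - n + 1, n)]

# A 2-minute video at 30 fps has 3600 frames; with n = 30 this yields
# L = 120 training sequences.
frames = list(range(3600))
seqs = split_sequences(frames, 30)
```

With the dataset parameters given earlier (2 min at 30 fps), each recorded video therefore contributes 120 sequences to the training set when N = 30.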
Step B30, for each frame of the L video sequences of N face images, locating the vessel-rich nasal alae and lip positions with the 68-point facial landmark detection method and cropping to the set size of 410×500 to obtain L ROI sequences of N frames, then generating L training sample sequences by attaching the real heart rate category interval corresponding to each video sequence, as shown in formula (1):
Figure BDA0002785139860000111
wherein ,τjJ is 1,2, …, L represents the j-th sample sequence, L represents a total of L sample sequences,
Figure BDA0002785139860000112
representing the N frame ROI area image of the j sample sequence, N representing the total N frames ROI area images in the sample sequence, fjRepresents the jth sample sequence taujThe corresponding heart rate category interval.
Such a training data set comprises L training sample sequences, each training sample sequence comprising a set of N ROI region images x ∈ χ and a corresponding heart rate label f ∈ F (the true heart rate category interval is given as a frequency band).
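The pairing in formula (1) of an ROI image set with one heart rate interval label can be mirrored by a small container type; `TrainingSample` and `build_samples` are illustrative names, not from the patent:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TrainingSample:
    """One training sample tau_j: N ROI images plus one interval label f_j."""
    rois: List   # the N ROI region images (stand-ins here; real data would be arrays)
    label: int   # index of the heart rate category interval

def build_samples(roi_sequences, interval_labels):
    """Zip L ROI sequences with their L interval labels into sample objects."""
    assert len(roi_sequences) == len(interval_labels)
    return [TrainingSample(list(r), f)
            for r, f in zip(roi_sequences, interval_labels)]
```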
In one embodiment of the invention, every 30 continuous images are used as a sample to mark a heart rate category interval, and each image is subjected to ResNet18 convolutional neural network to obtain a feature vector.
Determining the sequence time length T of each training sample sequence in the training data set, as shown in Table 1:

TABLE 1

Time/s   Frame rate/fps   Total frames   Label
1        30               30             60~65
2        30               60             70~75
3        30               90             65~80

In Table 1, Time represents the sequence time length in seconds, Frame rate represents the frame rate in fps, Total frames represents the total number of image frames, and Label represents the heart rate category interval.
Step B40, selecting any training sample sequence, obtaining the predicted values that the sample sequence belongs to each preset heart rate category interval through the heart rate category identification model, and calculating the cross entropy loss value between the maximum predicted value and the corresponding real heart rate category interval.
The invention adopts a cascade model combining a CNN feature extraction network and an LSTM long short-term memory neural network; the network structure is shown in Table 2:

TABLE 2

(The network structure table is given as an image in the original document.)

In Table 2, ResNet is the selected CNN feature extraction network for feature extraction, in which the channel attention network SENet is embedded; an FC fully connected layer is added between the CNN feature extraction network and the LSTM; Conv represents a convolutional layer, max pool represents a maximum pooling layer, average pool represents an average pooling layer, and fully connected represents a fully connected layer.
Each frame image within the sequence time length T is input into the heart rate category identification model, and features are extracted through the CNN feature extraction network. The CNN feature extraction network is a ResNet network pre-trained on the ImageNet data set; considering parameter count and computational cost, the last three fully connected layers are modified into two layers.
The feature channels c2 obtained from the convolutional layers, pooling layers and activation function layers are then input into SENet for channel-weight transformation, in the following specific way:

FIG. 3 is a schematic structural diagram of the channel attention network SENet according to an embodiment of the non-contact heart rate measurement method based on a face image. An h × w × c2 feature map is compressed (Squeeze) by average pooling into a 1 × 1 × c2 vector, so that each two-dimensional feature channel becomes a single real number, which can be regarded as having a global receptive field. The 1 × 1 × c2 part then passes through two fully connected layers and a sigmoid activation (Excitation) part to obtain the final weight output, and the normalized weights are applied to the features of each channel (multiply by channel weights) through a Scale operation.
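The Squeeze, Excitation and Scale steps can be sketched in NumPy as below; the two fully connected weight matrices are stand-ins for learned parameters, and the usual bottleneck reduction ratio is omitted for brevity:

```python
import numpy as np

def se_block(feature_map, w1, b1, w2, b2):
    """Squeeze-and-Excitation channel reweighting (inference-time sketch).

    feature_map: (h, w, c) array; (w1, b1) and (w2, b2) are the two
    fully connected layers of the excitation branch.
    """
    # Squeeze: global average pooling -> one real number per channel
    # (1 x 1 x c), each with a global receptive field.
    z = feature_map.mean(axis=(0, 1))                  # shape (c,)
    # Excitation: FC -> ReLU -> FC -> sigmoid gives a weight in (0, 1)
    # for each channel.
    s = np.maximum(z @ w1 + b1, 0.0)
    weights = 1.0 / (1.0 + np.exp(-(s @ w2 + b2)))     # shape (c,)
    # Scale: multiply each channel of the feature map by its weight.
    return feature_map * weights
```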
The features output by SENet undergo multi-layer convolution, pooling, activation and dropout operations, and the feature description vectors of each picture, {X: x_t, x_{t+1}, x_{t+2}, ..., x_{t+T}}, are output through a fully connected layer, wherein x_t is a feature vector of dimension m.

These vectors are input into the LSTM long short-term memory neural network, and the feature vectors {X: x_t, x_{t+1}, x_{t+2}, ..., x_{t+T}} of the T time steps are encoded into a hidden vector output x_hidden. In one embodiment of the invention, the 30-dimensional feature vectors are passed through the LSTM model to output an encoded hidden vector.
The LSTM long short-term memory neural network takes the output of the previous moment as the input of the next moment, and each repeating unit has four special gating structures: in the first step, the forget gate determines what information from the previous moment is discarded; the input gate determines how much of the input information is retained; the output gate then determines the output of the current moment together with the new cell state; and all gates pass through a sigmoid unit to determine the proportion of information that is output.
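One step of the gating mechanism described above can be sketched as follows; the stacked parameter layout (W, U, b holding all four gates at once) is an implementation convention assumed here, not taken from the patent:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_cell(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W: (4m, d), U: (4m, m), b: (4m,) hold the
    forget, input, candidate and output gate parameters stacked."""
    m = h_prev.shape[0]
    z = W @ x + U @ h_prev + b
    f = sigmoid(z[0:m])            # forget gate: what to discard
    i = sigmoid(z[m:2 * m])        # input gate: how much input to keep
    g = np.tanh(z[2 * m:3 * m])    # candidate cell state
    o = sigmoid(z[3 * m:4 * m])    # output gate: current-moment output
    c = f * c_prev + i * g         # new cell state
    h = o * np.tanh(c)             # hidden output fed to the next step
    return h, c
```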
The output of the LSTM is used as the input of the next fully connected module, and the predicted value that the sample sequence belongs to each preset heart rate category interval is obtained through the softmax module.
Softmax is an important function in classification problems: it maps its inputs to real numbers between 0 and 1 whose sum is normalized to 1. For the output-layer vector V, V_i represents the i-th element of V, and the predicted value that an element belongs to each preset heart rate category interval is calculated as shown in formula (2):

S_i = e^{V_i} / Σ_{j=1}^{14} e^{V_j}    (2)

wherein e^{V_i} is the exponential output of the last layer of the fully connected network in the heart rate category identification model for the i-th category, and Σ_{j=1}^{14} e^{V_j} is the sum of the exponential outputs of the 14 categories.
FIG. 4 is a schematic diagram of the softmax calculation process of an embodiment of the non-contact heart rate measurement method based on facial images: for the inputs {y_1, y_2, ..., y_n}, the exponential e^{y_i} of each input is computed, then the proportion of each exponential in the sum of all exponentials, e^{y_i} / Σ_j e^{y_j}, is calculated; the maximum of these results represents the maximum predicted value over the preset heart rate category intervals.
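Formula (2) and the FIG. 4 process correspond directly to a few lines of NumPy; this is a sketch, and the max-subtraction is a standard numerical-stability trick not mentioned in the patent:

```python
import numpy as np

def softmax(v):
    """Map the output-layer vector V to probabilities that sum to 1:
    S_i = e^{V_i} / sum_j e^{V_j}."""
    e = np.exp(v - v.max())   # subtracting max(V) leaves S unchanged
    return e / e.sum()

scores = softmax(np.array([1.0, 2.0, 3.0]))
predicted_interval = int(scores.argmax())   # index of the maximum predicted value
```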
Therefore, the cross entropy loss value between the maximum predicted value of the sample sequence and the corresponding real heart rate category interval is calculated as shown in formula (3):

Loss = -Σ_i t_i · log(S_i)    (3)

wherein t_i is the true heart rate category interval label of the current training sample, S_i is the predicted value that the current training sample belongs to the i-th preset heart rate category interval, and i indexes the heart rate category intervals.
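With a one-hot target t, formula (3) reduces to the negative log of the probability assigned to the true interval; a minimal sketch:

```python
import numpy as np

def cross_entropy(probs, true_index):
    """Formula (3): Loss = -sum_i t_i * log(S_i). With a one-hot target
    the sum collapses to -log(S_true)."""
    return -np.log(probs[true_index])
```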
The output vector of the LSTM passes through the two fully connected layers and the softmax loss function to obtain the corresponding heart rate classification interval result. The corresponding heart rate category interval labels are: 40-49, 50-59, 60-69, 70-79, 80-89, 90-99, 100-109, 110-119, 120-129, 130-139, 140-149, 150-159, 160-169 and 170-179. Unlike other methods that directly obtain a specific heart rate value using the regression idea, the CNN and LSTM cascade model converts the regression problem into a classification problem, which alleviates the severe oscillation of the heart rate estimate relative to the true value caused by large interference from ambient illumination changes, movement and expression when a specific heart rate value is estimated directly.
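Assigning a ground-truth bpm reading to one of these 14 interval labels, as needed when labelling the data set in step B20, can be written as follows; `heart_rate_interval` is an illustrative helper, not from the patent:

```python
def heart_rate_interval(bpm):
    """Map a heart rate in beats per minute to one of the 14 category
    intervals 40-49, 50-59, ..., 170-179."""
    if not 40 <= bpm <= 179:
        raise ValueError("heart rate outside the 14 labelled intervals")
    index = (int(bpm) - 40) // 10     # interval index 0..13
    lo = 40 + 10 * index              # lower bound of the interval
    return index, f"{lo}-{lo + 9}"
```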
Step B50, if the cross entropy loss value is not lower than the set threshold, adjusting the parameters of the heart rate category identification model and jumping back to step B40 for iterative training, until the cross entropy loss value is lower than the set threshold or the set number of training iterations is reached, thereby obtaining the trained heart rate category identification model.
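The B40/B50 iteration can be expressed as a generic loop; `model_step` is a stand-in for one forward pass, loss computation and parameter update of the heart rate category identification model:

```python
def train(model_step, loss_threshold, max_iterations):
    """Repeat training steps until the cross entropy loss falls below
    the set threshold or the set number of iterations is reached.
    Returns the iteration count and the last loss value."""
    loss = float("inf")
    for iteration in range(1, max_iterations + 1):
        loss = model_step()               # B40: predict and compute the loss
        if loss < loss_threshold:         # B50: stopping condition met
            return iteration, loss
    return max_iterations, loss           # iteration budget exhausted
```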
FIG. 5 is a schematic structural diagram of the heart rate category identification model according to an embodiment of the non-contact heart rate measurement method based on facial images of the present invention: "Input sequence of length N" represents an input sample sequence with sequence length N; ResNet18 represents the CNN feature extraction network selected by the present invention, in which the channel attention network SENet is embedded; Conv represents a convolution operation, ReLU represents an activation operation, FC represents a fully connected operation, {x_t, x_{t+1}, ..., x_{t+T}} are the feature vectors of the T time steps, softmax represents the classification operation, and "Heart rate interval class" represents the finally obtained heart rate category interval.
As mentioned above, the invention provides a heart rate annotation data set in the implementation process. The face region is cropped frame by frame from the acquired video; based on the principle that the signal-to-noise ratio of the green channel is higher than that of the other channels, the attention mechanism module SENet is added, which reassigns corresponding weights to the different channels according to their importance and reshapes the dependency relationships among the channels of each layer. The feature descriptors output by the ResNet18 convolution module are input into the LSTM for encoding, and the temporal information of the inter-frame features is used as a partial basis for extracting the heart rate pattern. Therefore, the method has high industrial utilization value for non-contact heart rate extraction scenarios. A user can use various terminal devices, such as a computer, a mobile phone, a PC or a tablet, to acquire a face video image; after preprocessing, it is input into the deep learning model, which gives a corresponding heart rate interval prediction result that is output to the terminal display device, thereby realizing heart rate monitoring and health condition early warning of the user by the terminal.
The non-contact heart rate measurement system based on the face image comprises an input module, an ROI (region of interest) region extraction module, a category prediction module and an output module;
the input module is configured to acquire a face video containing a set frame image as a video to be processed and input the face video;
the ROI region extraction module is configured to acquire the positions of the nasal alae and lips in each frame of image of the video to be processed by the face 68-feature-point detection method, and crop them to the set length and width to acquire the ROI region sequence to be processed;
the category prediction module is configured to obtain a prediction value of the ROI area sequence to be processed, which belongs to each preset heart rate category interval, through a trained heart rate category identification model;
and the output module is configured to take the heart rate category interval with the largest predicted value as the heart rate category interval of the video to be processed and output the heart rate category interval.
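The four modules chain together as a simple pipeline; in the sketch below the callables are stand-ins for the ROI extraction module and the trained heart rate category identification model:

```python
def measure_heart_rate(video_frames, extract_roi, predict_scores, intervals):
    """Input -> ROI extraction -> category prediction -> output of the
    interval with the largest predicted value."""
    rois = [extract_roi(frame) for frame in video_frames]
    scores = predict_scores(rois)              # one score per interval
    best = max(range(len(scores)), key=scores.__getitem__)
    return intervals[best]
```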
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working process and related description of the system described above may refer to the corresponding process in the foregoing method embodiments, and will not be described herein again.
It should be noted that, the non-contact heart rate measurement system based on a face image provided in the foregoing embodiment is only illustrated by dividing the functional modules, and in practical applications, the functions may be allocated to different functional modules according to needs, that is, the modules or steps in the embodiment of the present invention are further decomposed or combined, for example, the modules in the foregoing embodiment may be combined into one module, or may be further split into multiple sub-modules, so as to complete all or part of the functions described above. The names of the modules and steps involved in the embodiments of the present invention are only for distinguishing the modules or steps, and are not to be construed as unduly limiting the present invention.
A storage device according to a third embodiment of the present invention stores a plurality of programs, and the programs are suitable for being loaded and executed by a processor to realize the above non-contact heart rate measurement method based on human face images.
A processing apparatus according to a fourth embodiment of the present invention includes a processor, a storage device; a processor adapted to execute various programs; a storage device adapted to store a plurality of programs; the program is suitable to be loaded and executed by a processor to implement the above-mentioned non-contact heart rate measurement method based on human face images.
It can be clearly understood by those skilled in the art that, for convenience and brevity of description, the specific working processes and related descriptions of the storage device and the processing device described above may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
Those of skill in the art would appreciate that the various illustrative modules, method steps, and modules described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that programs corresponding to the software modules, method steps may be located in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art. To clearly illustrate this interchangeability of electronic hardware and software, various illustrative components and steps have been described above generally in terms of their functionality. Whether such functionality is implemented as electronic hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
The terms "first," "second," and the like are used for distinguishing between similar elements and not necessarily for describing or implying a particular order or sequence.
The terms "comprises," "comprising," or any other similar term are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
So far, the technical solutions of the present invention have been described in connection with the preferred embodiments shown in the drawings, but it is easily understood by those skilled in the art that the scope of the present invention is obviously not limited to these specific embodiments. Equivalent changes or substitutions of related technical features can be made by those skilled in the art without departing from the principle of the invention, and the technical scheme after the changes or substitutions can fall into the protection scope of the invention.

Claims (10)

1. A non-contact heart rate measurement method based on a face image is characterized by comprising the following steps:
step S10, acquiring a face video containing a set frame image as a video to be processed;
step S20, for each frame of image of the video to be processed, obtaining the positions of the nasal alae and lips by the face 68-feature-point detection method, and cropping them to the set length and width to obtain the ROI region sequence to be processed;
step S30, obtaining a predicted value of each preset heart rate category interval of the ROI area sequence to be processed through a trained heart rate category identification model;
step S40, taking the heart rate type interval with the maximum predicted value as the heart rate type interval of the video to be processed;
the heart rate category identification model is a cascade model combining a CNN feature extraction network and an LSTM long-time memory neural network, and a channel attention network SEnet is embedded in the CNN feature extraction network.
2. The non-contact heart rate measurement method based on the facial image as claimed in claim 1, wherein the heart rate category intervals comprise the following 14:
40-49,50-59,60-69,70-79,80-89,90-99,100-109,110-119,120-129,130-139,140-149,150-159,160-169 and 170-179.
3. The non-contact heart rate measurement method based on the facial image as claimed in claim 2, wherein the heart rate category recognition model is trained by:
step B10, acquiring the human face video with the set duration of the set parameters and the real heart rate label value corresponding to each frame of image;
step B20, dividing the real heart rate labeling value into the heart rate category interval, and dividing the face video into L video sequences containing N frames of face images;
step B30, for each frame of image of L video sequences containing N frames of face images, obtaining the positions of the nasal alar and lip by a face 68 characteristic point detection method, intercepting the positions into set length and width to obtain L N frames of ROI regional sequences, and generating L training sample sequences by combining the real heart rate category interval corresponding to the video sequences;
step B40, selecting any training sample sequence, obtaining the predicted value of each preset heart rate category interval of the sample sequence through a heart rate category identification model, and calculating the cross entropy loss value of the maximum predicted value and the corresponding real heart rate category interval;
and B50, if the cross entropy loss value is not lower than a set threshold, adjusting parameters of the heart rate type recognition model, skipping to the step B40, and performing iterative training until the cross entropy loss function value is lower than the set threshold or reaches a set training frequency to obtain the trained heart rate type recognition model.
4. The non-contact heart rate measurement method based on the human face image as claimed in claim 3, wherein the parameters of the human face video with the set duration of the set parameters are:
setting time length: 2 min;
video frame rate: 30 fps;
distance between camera and collected object: 1 m;
video image size: 1920*1080.
5. The non-contact heart rate measurement method based on the facial image according to claim 3, wherein the L training sample sequences are:
τ_j = ({x_1^j, x_2^j, ..., x_N^j}, f_j), j = 1, 2, ..., L

wherein τ_j represents the j-th sample sequence, L represents the total number of sample sequences, x_n^j represents the n-th frame ROI region image of the j-th sample sequence, N represents the total number of ROI region images in the sample sequence, and f_j represents the heart rate category interval corresponding to the j-th sample sequence τ_j.
6. The non-contact heart rate measurement method based on the facial image as claimed in claim 3, wherein the cross entropy loss value is calculated by:
Loss = -Σ_i t_i · log(S_i)

wherein t_i is the true heart rate category interval label of the current training sample, S_i is the predicted value that the current training sample belongs to the i-th preset heart rate category interval, and i indexes the heart rate category intervals;

S_i = e^{V_i} / Σ_{j=1}^{14} e^{V_j}

wherein e^{V_i} is the exponential output of the last layer of the fully connected network in the heart rate category identification model for the i-th category, and Σ_{j=1}^{14} e^{V_j} is the sum of the exponential outputs of the 14 categories.
7. The non-contact heart rate measurement method based on the human face image according to claim 1, wherein the heart rate signal-to-noise ratio carried by the green channel of each frame image in the human face video is higher than that of the red and blue channels, and weight distribution is performed according to the importance degree of each channel through a channel attention network SEnet embedded in a CNN feature extraction network.
8. A non-contact heart rate measurement system based on a face image is characterized by comprising an input module, an ROI (region of interest) region extraction module, a category prediction module and an output module;
the input module is configured to acquire a face video containing a set frame image as a video to be processed and input the face video;
the ROI region extraction module is configured to acquire the positions of the nasal alae and lips in each frame of image of the video to be processed by the face 68-feature-point detection method, and crop them to the set length and width to acquire the ROI region sequence to be processed;
the category prediction module is configured to obtain a prediction value of the ROI area sequence to be processed, which belongs to each preset heart rate category interval, through a trained heart rate category identification model;
and the output module is configured to take the heart rate category interval with the largest predicted value as the heart rate category interval of the video to be processed and output the heart rate category interval.
9. A storage device having stored thereon a plurality of programs, wherein the programs are adapted to be loaded and executed by a processor to implement the method for contactless heart rate measurement based on facial images according to any of claims 1-7.
10. A processing apparatus, comprising:
A processor adapted to execute various programs; and
a storage device adapted to store a plurality of programs;
wherein the program is adapted to be loaded and executed by a processor to perform:
the method for non-contact heart rate measurement based on facial images of any one of claims 1-7.
CN202011295074.4A 2020-11-18 2020-11-18 Non-contact heart rate measurement method, system and device based on face image Active CN112381011B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011295074.4A CN112381011B (en) 2020-11-18 2020-11-18 Non-contact heart rate measurement method, system and device based on face image


Publications (2)

Publication Number Publication Date
CN112381011A true CN112381011A (en) 2021-02-19
CN112381011B CN112381011B (en) 2023-08-22

Family

ID=74584155

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011295074.4A Active CN112381011B (en) 2020-11-18 2020-11-18 Non-contact heart rate measurement method, system and device based on face image

Country Status (1)

Country Link
CN (1) CN112381011B (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110163126A (en) * 2019-05-06 2019-08-23 北京华捷艾米科技有限公司 A kind of biopsy method based on face, device and equipment
WO2019202305A1 (en) * 2018-04-16 2019-10-24 Clinicco Ltd System for vital sign detection from a video stream
CN111345803A (en) * 2020-03-20 2020-06-30 浙江大学城市学院 Heart rate variability measuring method based on mobile device camera
CN111407245A (en) * 2020-03-19 2020-07-14 南京昊眼晶睛智能科技有限公司 Non-contact heart rate and body temperature measuring method based on camera
CN111626182A (en) * 2020-05-25 2020-09-04 浙江大学 Method and system for accurately detecting human heart rate and facial blood volume based on video


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XUAN ANH NGUYEN 等: "Surgical skill levels: Classification and analysis using deep neural network model and motion signals", 《COMPUTER METHODS AND PROGRAMS IN BIOMEDICINE》 *

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112580612A (en) * 2021-02-22 2021-03-30 中国科学院自动化研究所 Physiological signal prediction method
CN112580612B (en) * 2021-02-22 2021-06-08 中国科学院自动化研究所 Physiological signal prediction method
US11227161B1 (en) 2021-02-22 2022-01-18 Institute Of Automation, Chinese Academy Of Sciences Physiological signal prediction method
CN113080907A (en) * 2021-04-14 2021-07-09 贵州省人民医院 Pulse wave signal processing method and device
CN113420624A (en) * 2021-06-11 2021-09-21 华中师范大学 Non-contact fatigue detection method and system
CN113397516A (en) * 2021-06-22 2021-09-17 山东科技大学 Newborn-oriented visual heart rate estimation method, device and system
CN113397516B (en) * 2021-06-22 2022-03-25 山东科技大学 Newborn-oriented visual heart rate estimation method, device and system
CN113255585A (en) * 2021-06-23 2021-08-13 之江实验室 Face video heart rate estimation method based on color space learning
CN113688985A (en) * 2021-07-26 2021-11-23 浙江大华技术股份有限公司 Training method of heart rate estimation model, heart rate estimation method and device
CN113408508A (en) * 2021-08-20 2021-09-17 中国科学院自动化研究所 Transformer-based non-contact heart rate measurement method
CN113408508B (en) * 2021-08-20 2021-11-30 中国科学院自动化研究所 Transformer-based non-contact heart rate measurement method
CN114912487A (en) * 2022-05-10 2022-08-16 合肥中聚源智能科技有限公司 End-to-end remote heart rate detection method based on channel enhanced space-time attention network
CN114912487B (en) * 2022-05-10 2024-04-26 合肥中聚源智能科技有限公司 End-to-end remote heart rate detection method based on channel enhanced space-time attention network
CN116842330A (en) * 2023-08-31 2023-10-03 庆云县人民医院 Health care information processing method and device capable of comparing histories
CN116842330B (en) * 2023-08-31 2023-11-24 庆云县人民医院 Health care information processing method and device capable of comparing histories

Also Published As

Publication number Publication date
CN112381011B (en) 2023-08-22


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant