CN117173742A - Remote large-range heart rate estimation method based on deep learning and face segmentation - Google Patents

Remote large-range heart rate estimation method based on deep learning and face segmentation

Info

Publication number
CN117173742A
CN117173742A (application CN202311013839.4A)
Authority
CN
China
Prior art keywords
heart rate
face
signals
facial
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311013839.4A
Other languages
Chinese (zh)
Inventor
缑水平
赵汉涛
郭璋
焦昶哲
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xi'an Yunying Yitong Technology Co ltd
Xidian University
Original Assignee
Xi'an Yunying Yitong Technology Co ltd
Xidian University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xi'an Yunying Yitong Technology Co ltd, Xidian University filed Critical Xi'an Yunying Yitong Technology Co ltd
Priority to CN202311013839.4A priority Critical patent/CN117173742A/en
Publication of CN117173742A publication Critical patent/CN117173742A/en
Pending legal-status Critical Current

Landscapes

  • Image Processing (AREA)

Abstract

The application discloses a remote large-range heart rate estimation method based on face segmentation, which comprises the following steps: acquiring a near infrared thermal imaging video and extracting 68 facial key points from each frame of image; dividing the face into 25 regions according to the key points; computing the average gray level of the forehead, left face, right face and chin regions to obtain four local facial heart rate signals, and computing a frequency-domain confidence for each; weighting and averaging the four signals according to their frequency-domain confidences to obtain a facial heart rate signal; performing binary classification of the facial heart rate signal with a recurrent neural network model to obtain a classification result of high heart rate or low heart rate; performing wavelet decomposition of the facial heart rate signal, removing the sub-band signals in the decomposition result that are inconsistent with the classification result, and recombining the remaining sub-bands into a facial heart rate signal; truncating the recombined facial heart rate signal several times in succession with a dynamic window and estimating a heart rate from each truncated signal; and determining the final heart rate estimation result from the successively estimated heart rate values.

Description

Remote large-range heart rate estimation method based on deep learning and face segmentation
Technical Field
The application belongs to the field of computer vision, and particularly relates to a remote large-range heart rate estimation method based on deep learning and face segmentation.
Background
Heart rate is one of the vital signs of the human body and the most important and fundamental physiological index of the cardiovascular system. The heart rate of a healthy person at rest is relatively stable, but heart rates differ across sexes and ages, and strenuous exercise or changes in emotional state can change the heart rate greatly. For example, the heart rate of a sleeping person is low, while it rises sharply during intense exercise or under emotions such as anger or excitement. People suffering from cardiovascular disease often show large heart rate fluctuations at the onset of an episode, and if the patient's heart rate is not known in time, the optimal window for diagnosis and treatment may be missed. Effective heart rate monitoring is therefore essential for people diagnosed with cardiovascular disease, as well as for those with potential health risks.
Compared with contact heart rate detection, non-contact video heart rate detection can measure heart rate remotely, requires no wearable sensing equipment, and is more convenient to apply. In the related art, the patent application with publication number CN114912487A discloses an end-to-end remote heart rate detection method based on a channel-enhanced spatiotemporal attention network, which directly maps the original face video to a specific heart rate value with an end-to-end deep learning network. However, this method places very high demands on the data set required to train the network: both the quality and the quantity of samples must reach a certain level, which limits its practicability.
Disclosure of Invention
In order to solve the problems in the prior art, the application provides a remote large-range heart rate estimation method based on deep learning and face segmentation.
The technical problems to be solved by the application are addressed by the following technical solution:
a remote large-range heart rate estimation method based on deep learning and face segmentation comprises the following steps:
acquiring a near infrared thermal imaging video containing a face image;
extracting 68 key points of a human face in each frame of image in the near infrared thermal imaging video; dividing the face in the image into 25 areas according to the 68 key points; calculating the average gray level of each region belonging to the forehead, the left face, the right face and the chin in the 25 regions;
concatenating, in temporal order, the forehead average gray level, the left face average gray level, the right face average gray level and the chin average gray level computed from each frame of image, to obtain four local facial heart rate signals;
calculating a frequency-domain confidence for each local facial heart rate signal;
computing a weighted average of the four local facial heart rate signals according to the calculated frequency-domain confidences to obtain a facial heart rate signal;
performing binary classification of the facial heart rate signal with a recurrent neural network model to obtain a classification result indicating whether the facial heart rate signal corresponds to a high heart rate or a low heart rate;
performing wavelet decomposition of the facial heart rate signal, removing the sub-band signals in the decomposition result that are inconsistent with the classification result, and recombining the remaining sub-bands into a facial heart rate signal;
truncating the recombined facial heart rate signal a plurality of times in succession with a dynamic window, and estimating a heart rate from each truncated signal;
and determining the final heart rate estimation result from the successively estimated heart rate values.
In one embodiment, calculating the frequency-domain confidence of each local facial heart rate signal includes:
detrending each local facial heart rate signal, and band-pass filtering each local facial heart rate signal according to a heart rate interval;
for each detrended and band-pass-filtered local facial heart rate signal, transforming the signal to the frequency domain, and computing the ratio of the dominant-frequency magnitude to the mean frequency-domain magnitude of the transform result as the frequency-domain confidence of that local facial heart rate signal.
In one embodiment, truncating the recombined facial heart rate signal a plurality of times in succession with a dynamic window, and estimating a heart rate from each truncated signal, includes:
A. setting an initial window size of a Hamming window;
B. truncating the recombined facial heart rate signal with the current Hamming window, and zero-padding the truncated data to obtain a windowed facial heart rate signal;
C. applying a Hilbert transform to the windowed facial heart rate signal of step B, and obtaining the envelope of the windowed facial heart rate signal from the transform result;
D. converting the envelope of step C to the frequency domain, and multiplying the dominant frequency of the frequency-domain result by 60 to obtain the heart rate estimate of this iteration;
E. judging whether the window size of the current Hamming window has reached its upper limit; if not, increasing the window size of the Hamming window by a preset step and returning to step B; if so, continuing with the step of determining the final heart rate estimation result from the successively estimated heart rate values.
In one embodiment, determining the final heart rate estimation result from the successively estimated heart rate values comprises:
when the successively estimated heart rate values all remain at a stable value, outputting that stable value as the final heart rate estimation result;
and when the successively estimated heart rate values show the same rising trend or the same falling trend, outputting the successively estimated heart rate values as the final heart rate estimation result.
In one embodiment, the method further comprises: when segmenting the face regions of the face in the near infrared thermal imaging video, applying convolution sharpening to each frame of the near infrared thermal imaging video, so that face region segmentation is performed on the sharpened images.
In one embodiment, the recurrent neural network model is trained on a plurality of sample heart rate signals and the classes corresponding to those signals; sample heart rate signals with a heart rate below 90 bpm are assigned the class "low heart rate", and sample heart rate signals with a heart rate of at least 90 bpm are assigned the class "high heart rate".
The remote large-range heart rate estimation method based on deep learning and face segmentation provided by the application achieves remote, non-contact heart rate estimation without depending on a large, high-quality data set, and therefore has high practicability.
The remote large-range heart rate estimation method based on deep learning and face segmentation provided by the application further has the following beneficial effects:
(1) The face is accurately segmented into regions with physical meaning, so the local facial heart rate signals extracted from these regions are more informative for heart rate estimation, making the estimation result more accurate.
(2) A frequency-domain confidence is evaluated for each local facial heart rate signal, so the contribution of high-confidence local signals to the heart rate estimate is emphasized while the influence of low-confidence local signals is suppressed.
(3) A dynamic window is used for several successive heart rate estimates, preserving the accuracy of the result while still producing a value quickly, which solves the difficulty of balancing fast readout and accuracy in heart rate detection from remote video signals.
(4) The facial heart rate signal is classified by the recurrent neural network model, and the harmonic (sub-band) signals inconsistent with the classification are removed according to the classification result, which raises the signal-to-noise ratio of the facial heart rate signal and thus the accuracy of the heart rate estimate.
The present application will be described in further detail with reference to the accompanying drawings.
Drawings
FIG. 1 is a flow chart of a remote large-range heart rate estimation method based on deep learning and face segmentation provided by an embodiment of the application;
fig. 2 is a schematic diagram of region segmentation of a face in an embodiment of the present application.
Detailed Description
The present application will be described in further detail with reference to specific examples, but embodiments of the present application are not limited thereto.
In order to solve the technical problems in the background art, the embodiment of the application provides a remote large-range heart rate estimation method based on deep learning and face segmentation, as shown in fig. 1, the method comprises the following steps:
s10: and acquiring a near infrared thermal imaging video containing the face image.
The near infrared thermal imaging video may be captured directly with a camera or imported from existing footage; the embodiment of the application is not limited in this respect.
It is worth mentioning that most non-contact video heart rate detection schemes to date use visible-light video, and most existing public data sets likewise consist of RGB images. This is unfavorable for heart rate detection in dim-light scenes, which is why the embodiment of the application performs heart rate estimation on near infrared thermal imaging video: effective heart rate estimation can then be achieved in both bright and dim scenes.
S20: extracting 68 key points of a human face in each frame of image in the near infrared thermal imaging video; dividing the face in the image into 25 areas according to the 68 key points; for each of the 25 regions belonging to the forehead, the left face, the right face and the chin, the average gray of the region is calculated.
As can be seen from fig. 2, regions 1 to 4 belong to the forehead, regions 11 and 12 belong to the left face, regions 13 and 14 belong to the right face, and regions 21 to 25 belong to the chin, and the average gray scales in these regions are calculated, respectively.
In practice, the extraction of the 68 key points and the 25-region partitioning of the face may be implemented with the Dlib face landmark detector (the Dlib library).
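By way of a non-limiting illustration, the key-point extraction and per-region gray averaging might be sketched roughly as follows in Python with the Dlib and OpenCV libraries; the predictor file name and the helper names are assumptions for illustration, and whether the standard visible-light landmark model transfers directly to thermal frames is not addressed here.

```python
# Illustrative sketch (not the claimed implementation): 68-landmark extraction per
# frame with Dlib, plus the mean gray level of one polygonal face region.
# "shape_predictor_68_face_landmarks.dat" is the standard Dlib model file (assumed).
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def extract_landmarks(frame_gray):
    """Return a (68, 2) array of facial key points for the first detected face, or None."""
    faces = detector(frame_gray, 1)
    if not faces:
        return None
    shape = predictor(frame_gray, faces[0])
    return np.array([[p.x, p.y] for p in shape.parts()])

def region_mean_gray(frame_gray, polygon_pts):
    """Mean gray level inside one of the 25 regions, given its polygon of key points."""
    mask = np.zeros(frame_gray.shape, dtype=np.uint8)
    cv2.fillPoly(mask, [polygon_pts.astype(np.int32)], 255)
    return cv2.mean(frame_gray, mask=mask)[0]
```

How the 68 landmarks are grouped into the 25 polygons of FIG. 2 is determined by the partition scheme of the embodiment and is not reproduced here.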
It will be appreciated that extracting the 68 key points and partitioning the face into 25 regions based on these key points yields an accurate, physically meaningful face segmentation, from which more accurate local facial heart rate signals can be computed. In the prior art, different facial areas are usually framed with simple rectangular boxes; such segmentation is coarse and is not conducive to accurate computation of the heart rate signal.
In addition, when segmenting the face regions of the face in the near infrared thermal imaging video, convolution sharpening may be applied to each frame of the video, so that face region segmentation is performed on the sharpened images, which facilitates detection of the facial key points.
S30: and integrating the forehead average gray level, the left face average gray level, the right face average gray level and the chin average gray level calculated according to each frame of image according to the time sequence respectively to obtain four paths of local face heart rate signals.
It is understood that the four local human face heart rate signals include a forehead heart rate signal, a left face heart rate signal, a right face heart rate signal and a chin heart rate signal, and the data lengths of the four heart rate signals are equal to the frame number of the near infrared thermal imaging video.
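Continuing the sketch above (and reusing the hypothetical region_mean_gray helper), the four local facial heart rate signals might be assembled per frame roughly as follows; the grouping of region polygons into forehead/left/right/chin parts is assumed to be supplied by the partition of FIG. 2.

```python
# Illustrative sketch: stack per-frame region averages into four local signals whose
# length equals the number of video frames.
import numpy as np

def build_local_signals(frames_gray, region_polys_per_frame):
    """frames_gray: iterable of grayscale frames; region_polys_per_frame: per-frame dict
    mapping 'forehead' / 'left' / 'right' / 'chin' to the polygons of its sub-regions."""
    signals = {k: [] for k in ("forehead", "left", "right", "chin")}
    for frame, polys in zip(frames_gray, region_polys_per_frame):
        for name in signals:
            vals = [region_mean_gray(frame, p) for p in polys[name]]  # per sub-region means
            signals[name].append(float(np.mean(vals)))                # average over the part
    return {k: np.asarray(v) for k, v in signals.items()}
```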
S40: and calculating the frequency domain confidence coefficient of each path of local face heart rate signal.
Specifically, trending treatment is carried out on each path of local face heart rate signals, and band-pass filtering treatment is carried out on each path of local face heart rate signals according to a heart rate interval; then, for each local face heart rate signal after trending and band-pass filtering processing, the local face heart rate signal is converted into a frequency domain, and the ratio of the main frequency signal value to the frequency domain average signal value in the frequency domain conversion result is calculated and used as the frequency domain confidence coefficient of the local face heart rate signal.
The trending processing can remove abnormal offset caused to the local face heart rate signal when video is shot, so that the original characteristics of the local face heart rate signal are better presented, and the specific implementation mode of the trending processing can be referred to the related prior art, and the embodiment of the application is not repeated.
The heart rate interval, i.e., passband, used in bandpass filtering may be set to 45-150 bpm, which is wide enough, but is not necessarily limited thereto.
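As a rough, non-authoritative sketch of this confidence measure, the detrending, band-pass filtering (45-150 bpm as suggested above) and dominant-to-mean magnitude ratio might be written as follows; the filter order and the use of SciPy are illustrative assumptions, and fs denotes the video frame rate.

```python
# Illustrative sketch of the frequency-domain confidence of one local signal.
import numpy as np
from scipy.signal import butter, detrend, filtfilt

def frequency_confidence(sig, fs, low_bpm=45.0, high_bpm=150.0):
    sig = detrend(sig)                                               # remove slow drift
    b, a = butter(4, [low_bpm / 60.0, high_bpm / 60.0], btype="band", fs=fs)
    sig = filtfilt(b, a, sig)                                        # keep the heart-rate band
    spec = np.abs(np.fft.rfft(sig))
    spec[0] = 0.0                                                    # ignore the DC bin
    return float(spec.max() / (spec.mean() + 1e-12))                 # dominant / mean magnitude
```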
Calculating a frequency-domain confidence for each local facial heart rate signal gives the embodiment of the application a notable advantage over the prior art: the quality of a heart rate signal is evaluated with a criterion different from comparison against a standard heart rate value, so a heart rate signal that deviates from the standard value but is genuine is not rejected as interference, while a pseudo heart rate signal that lies close to the standard value but is in fact interference is rejected. This improves the accuracy of the final heart rate detection result, particularly for non-conventional populations. Non-conventional populations here include non-stationary people in motion, sub-healthy people whose heart rates are higher or lower than usual, and the like.
S50: and carrying out weighted average on four paths of local face heart rate signals according to the calculated frequency domain confidence coefficient to obtain the face heart rate signals.
In addition, the frequency domain confidence coefficients of the four paths of local face heart rate signals can be added to be used as the confidence coefficient of the face heart rate signals. If the confidence value is low, then subsequent steps may be omitted and the method provided by embodiments of the present application may be re-performed starting with step S10.
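A minimal sketch of this fusion step, under the assumption that "low confidence" is decided by comparing the summed confidences with a threshold (the threshold value below is purely illustrative), might look as follows:

```python
# Illustrative sketch of step S50: confidence-weighted fusion of the four local signals.
import numpy as np

def fuse_signals(local_signals, confidences, min_total_conf=8.0):
    names = list(local_signals)
    w = np.array([confidences[n] for n in names], dtype=float)
    if w.sum() < min_total_conf:                 # overall confidence too low: redo from S10
        return None
    w = w / w.sum()
    stacked = np.vstack([local_signals[n] for n in names])
    return (w[:, None] * stacked).sum(axis=0)    # fused facial heart rate signal
```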
S60: and carrying out two classification on the facial heart rate signals by using the cyclic neural network model to obtain a classification result of the facial heart rate signals corresponding to the high heart rate or the low heart rate.
In the step, the facial heart rate signals are classified by the cyclic neural network model, harmonic signals (subband signals) which do not accord with classification are removed according to the classification result, and the signal-to-noise ratio of the facial heart rate signals is improved, so that the accuracy of the heart rate estimation result is improved.
The cyclic neural network model is obtained by training based on a plurality of sample heart rate signals and classifications corresponding to the sample heart rate signals; wherein, the sample heart rate signal with heart rate lower than 90 is classified as low heart rate; the sample heart rate signal with a heart rate not lower than 90 corresponds to a classification of high heart rate.
In practical applications, there may be a plurality of cyclic neural network models that can be used in the embodiments of the present application, for example, cyclic neural network models capable of processing long-sequence data, such as long-term memory network (LSTM), etc., may be applied in the embodiments of the present application.
The recurrent neural network model may include, for example, an input layer, a hidden layer, and an output layer, wherein the hidden layer includes an LSTM cell that uses the tanh function as an activation function. The output layer sometimes uses a sigmoid function as the activation function.
The cross entropy function may be used to calculate the model loss when training the recurrent neural network model, thereby yielding a trained recurrent neural network model when the cross entropy is less than a predetermined threshold. Wherein when the cross entropy does not meet the requirement of being less than the threshold, the network parameters can be updated using an optimizer provided by pytorch (e.g., adam) so that the loss function converges to a minimum as soon as possible.
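A minimal PyTorch sketch of such a binary classifier is given below; the hidden size, learning rate, number of epochs and the use of BCELoss (binary cross-entropy) are illustrative assumptions rather than the exact configuration of the embodiment.

```python
# Illustrative sketch: LSTM-based binary (high / low heart rate) classifier in PyTorch.
import torch
import torch.nn as nn

class HeartRateClassifier(nn.Module):
    def __init__(self, hidden_size=64):
        super().__init__()
        # LSTM cells use tanh internally; batch_first expects (batch, seq_len, features)
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, 1)

    def forward(self, x):                          # x: (batch, seq_len, 1) facial heart rate signal
        _, (h, _) = self.lstm(x)
        return torch.sigmoid(self.out(h[-1]))      # probability of "high heart rate"

def train(model, loader, epochs=50, lr=1e-3):
    opt = torch.optim.Adam(model.parameters(), lr=lr)   # Adam optimizer provided by PyTorch
    loss_fn = nn.BCELoss()                              # cross-entropy loss for two classes
    for _ in range(epochs):
        for signals, labels in loader:                  # labels: 1 = high (>= 90 bpm), 0 = low
            opt.zero_grad()
            loss = loss_fn(model(signals).squeeze(-1), labels.float())
            loss.backward()
            opt.step()
```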
In addition, after the recurrent neural network model has been trained, its classification performance can be tested with test samples that did not participate in training, and the model is put into practical use once its classification accuracy and generalization ability reach the evaluation criteria.
S70: and carrying out wavelet decomposition on the facial heart rate signals, removing sub-band signals which are not matched with the classification results in the wavelet decomposition results, and recombining the facial heart rate signals.
In this step, wavelet decomposition is used as a method of decomposing a signal into subbands of different frequencies, which can be used to perform a multi-layer decomposition of the approximate and detail components of the signal with wavelet basis functions to obtain a multi-scale representation of the signal. The specific method is that Wavelet Packet Transformation (WPT) is carried out on the facial heart rate signals to obtain sub-band signals with different frequency bands, and a wavelet packet tree is formed. Because the high heart rate and the low heart rate respectively correspond to different subband signals, if the classification result in the step S60 is the high heart rate, removing the subband signals which do not correspond to the high heart rate in the wavelet packet tree, and then recombining the rest subband signals into a facial heart rate signal; if the classification result in step S60 is low heart rate, the sub-band signals in the wavelet packet tree not corresponding to low heart rate are removed, and then the remaining sub-band signals are recombined into the facial heart rate signal.
In practical applications, the wavelet function dmey may be used to perform wavelet transform, and the maximum analysis layer number may be set to 3, that is, at most 8 sub-signals with different frequency bands may be obtained, which is not limited thereto.
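As a rough illustration of this step with the PyWavelets library, a 3-level wavelet packet decomposition with the dmey wavelet, removal of the sub-bands that do not match the classification result, and reconstruction might be written as follows; which frequency-ordered sub-band indices count as "high heart rate" or "low heart rate" is left as a placeholder argument, since that mapping is specific to the embodiment.

```python
# Illustrative sketch of step S70: wavelet packet purification of the facial signal.
import pywt

def purify_signal(sig, keep_idx, wavelet="dmey", level=3):
    wp = pywt.WaveletPacket(data=sig, wavelet=wavelet, maxlevel=level)
    nodes = wp.get_level(level, order="freq")             # 8 sub-bands, low to high frequency
    new_wp = pywt.WaveletPacket(data=None, wavelet=wavelet, maxlevel=level)
    for i, node in enumerate(nodes):
        if i in keep_idx:                                  # keep only the consistent sub-bands
            new_wp[node.path] = node.data
    return new_wp.reconstruct(update=False)                # recombined facial heart rate signal
```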
It is noted that most non-contact video heart rate detection schemes so far restrict the heart rate range to 55-99 bpm; this, however, is the ideal range for a person at rest, so these existing schemes have difficulty producing accurate estimates for people after exercise or for particular high-heart-rate populations.
In the embodiment of the application, the recurrent neural network model performs a binary classification of facial heart rate signals from a wide population, so the facial heart rate signal of each individual is purified in a targeted way according to the characteristics of that individual's heart rate signal, and heart rate estimation on the purified signal yields a more accurate result. If the heart rate were estimated directly from a spectrum covering an excessively wide heart rate range, harmonic (frequency-doubling) components that readily appear in the spectrum would make the estimate inaccurate; this is also the main reason why the prior art does not work well when estimating heart rate directly with deep learning.
In addition, by comparison, the remote heart rate detection method using an end-to-end deep learning network mentioned in the background needs to map the original facial near infrared thermal imaging video to specific heart rate values, so besides the high requirements on the training data set, the network structure is more complex and the algorithm is less robust. In the embodiment of the application only a binary classification of the facial heart rate signal is needed, so a training data set is easy to obtain and the classification network structure is simpler to realize; the difficulty is greatly reduced compared with the prior-art approach of mapping data directly to a specific heart rate, the algorithm is more robust, and the method is better suited to popularization and application.
S80: and intercepting the recombined facial heart rate signals continuously for a plurality of times based on the dynamic window, and estimating the heart rate by utilizing the intercepted signals each time.
Specifically, the step S80 includes:
A. setting the initial window size of a Hamming window;
B. intercepting the facial heart rate signal by using the current Hamming window, and carrying out zero padding on intercepted data to obtain the windowed facial heart rate signal;
C. b, performing Hilbert transformation on the windowed face heart rate signal in the step B, and obtaining the envelope of the windowed face heart rate signal according to a transformation result;
D. c, converting the envelope in the step C to a frequency domain, and multiplying the main frequency in the frequency domain conversion result by 60 to obtain a heart rate estimated value of the time;
E. judging whether the window size of the current Hamming window reaches the upper limit; if not, increasing the window size of the Hamming window according to the preset step, and returning to the step B; if so, the process continues to step S90.
For example, the initial window size of the Hamming window is set to 10 s: a 10-s segment of the facial heart rate signal is truncated and Hilbert-transformed to obtain its envelope, the envelope is transformed to the frequency domain to obtain the dominant frequency, and the dominant frequency is multiplied by 60 to obtain the first estimated heart rate value in bpm. Since the window size has not reached the set upper limit of 60 s, the Hamming window is then enlarged to 20 s using a step of 10 s, i.e. a 20-s segment of the facial heart rate signal is truncated, Hilbert-transformed to obtain its envelope, the envelope is transformed to the frequency domain to obtain the dominant frequency, and the dominant frequency is multiplied by 60 to obtain the second estimated heart rate value; and so on, until the window size of the Hamming window reaches 60 s and estimation stops, yielding 5 consecutive estimated heart rate values.
It will be appreciated that the Hamming window reduces the effect of spectral leakage when the signal is analysed in the frequency domain. Spectral leakage refers to the spectral distortion caused by truncating a signal, which makes it difficult to measure the true frequency and amplitude of the signal accurately. The Hamming window is a non-rectangular window characterized by strong side-lobe attenuation: the attenuation from the main-lobe peak to the first side-lobe peak can reach about 40 dB, so interference from adjacent frequencies is effectively suppressed and the resolution of the signal is improved. In addition, the windowed signal is zero-padded, i.e. zero values are appended to the time-domain signal so that its length becomes an integer power of 2, which is convenient for the fast Fourier transform (FFT) algorithm; zero padding also increases the number of sampling points of the frequency-domain data and improves the detail resolution of the frequency domain.
The benefit of applying the Hilbert transform to the windowed facial heart rate signal in step C is that it extends the signal into an analytic signal on the complex plane, from which the envelope of the signal is obtained; this improves the resolution of the signal and reduces clutter components other than the dominant frequency component.
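A non-authoritative sketch of one windowed estimate and of the dynamic-window loop (10 s to 60 s in 10-s steps, as in the example above) might look as follows; fs denotes the video frame rate, the zero padding is applied here at the FFT stage, and the choice of which segment of the signal to truncate is simplified to its first samples.

```python
# Illustrative sketch of step S80: Hamming-windowed, zero-padded, Hilbert-envelope
# heart rate estimation, repeated with a growing window.
import numpy as np
from scipy.signal import hilbert

def estimate_once(sig, fs, win_seconds):
    n = int(win_seconds * fs)
    seg = sig[:n] * np.hamming(n)                         # windowed facial heart rate signal
    n_fft = int(2 ** np.ceil(np.log2(n))) * 2             # zero-pad to a power of two
    env = np.abs(hilbert(seg))                            # envelope via the Hilbert transform
    spec = np.abs(np.fft.rfft(env - env.mean(), n=n_fft)) # FFT of the envelope
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / fs)
    dominant = freqs[np.argmax(spec[1:]) + 1]             # dominant frequency, skipping DC
    return dominant * 60.0                                 # heart rate estimate in bpm

def dynamic_window_estimates(sig, fs, start_s=10, stop_s=60, step_s=10):
    return [estimate_once(sig, fs, w) for w in range(start_s, stop_s + 1, step_s)]
```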
S90: and determining a final heart rate estimation result according to the heart rate values estimated continuously and repeatedly.
Specifically, when the heart rate values estimated for a plurality of times are all maintained at a stable value, outputting the stable value as a final heart rate estimation result; and when the heart rate values estimated for a plurality of times have the same ascending trend or the same descending trend, outputting the heart rate values estimated for a plurality of times as a final heart rate estimation result.
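A minimal sketch of this decision rule, assuming a small tolerance for "being maintained at a stable value" (the ±3 bpm value below is an assumption), might be:

```python
# Illustrative sketch of step S90: decide the final output from successive estimates.
import numpy as np

def final_result(estimates, tol_bpm=3.0):
    est = np.asarray(estimates, dtype=float)
    if est.max() - est.min() <= tol_bpm:           # stable: report the stable value
        return float(np.median(est))
    diffs = np.diff(est)
    if np.all(diffs >= 0) or np.all(diffs <= 0):   # consistent rise or fall: report the trend
        return est.tolist()
    return None                                     # otherwise undecided in this sketch
```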
The remote large-range heart rate estimation method based on deep learning and face segmentation provided by the embodiment of the application achieves remote, non-contact heart rate estimation without depending on a large, high-quality data set, and therefore has high practicability. The method accurately segments the face into regions with physical meaning, so the local facial heart rate signals extracted from these regions are more informative for heart rate estimation and the estimation result is more accurate. By evaluating a frequency-domain confidence for each local facial heart rate signal, the contribution of high-confidence signals to the heart rate estimate is emphasized while the influence of low-confidence signals is suppressed. By using a dynamic window for several successive heart rate estimates, the accuracy of the result is preserved while a value is still produced quickly, solving the difficulty of balancing fast readout and accuracy in heart rate detection from remote video signals.
In summary, the embodiment of the application addresses the excessive estimation error caused by harmonics, individual differences and the like when performing large-range heart rate estimation from remote video signals, while also balancing fast readout with accuracy, and is therefore well suited to remote medical monitoring in practice.
The method provided by the embodiment of the application can be applied to electronic equipment. Specifically, the electronic device may be: desktop computers, portable computers, intelligent mobile terminals, servers, etc. Any electronic device capable of implementing the present application is not limited herein, and falls within the scope of the present application.
It should be noted that the terms "first", "second" and the like are used to distinguish similar objects and do not necessarily describe a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described herein. The implementations described in the exemplary examples do not represent all implementations consistent with the present disclosure; rather, they are merely examples of apparatus and methods consistent with aspects of the present disclosure.
In the description of the present specification, reference to the terms "one embodiment", "some embodiments", "example", "specific example", "some examples" and the like means that a particular feature or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present application. In this specification, schematic representations of these terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features or characteristics described may be combined in any suitable manner in one or more embodiments or examples, and those skilled in the art may combine the different embodiments or examples described in this specification.
Although the application is described herein in connection with various embodiments, those skilled in the art can, from a study of the drawings and the disclosure, understand and effect other variations of the disclosed embodiments in practicing the claimed application. In the description of the present application, the word "comprising" does not exclude other elements or steps, "a" or "an" does not exclude a plurality, and "a plurality" means two or more, unless specifically defined otherwise. Moreover, the mere fact that certain measures are described in mutually different embodiments does not mean that these measures cannot be combined to advantage.
The foregoing is a further detailed description of the application in connection with the preferred embodiments, and it is not intended that the application be limited to the specific embodiments described. It will be apparent to those skilled in the art that several simple deductions or substitutions may be made without departing from the spirit of the application, and these should be considered to be within the scope of the application.

Claims (6)

1. A remote large-range heart rate estimation method based on deep learning and face segmentation, characterized by comprising the following steps:
acquiring a near infrared thermal imaging video containing a face image;
extracting 68 key points of a human face in each frame of image in the near infrared thermal imaging video; dividing the face in the image into 25 areas according to the 68 key points; calculating the average gray level of each region belonging to the forehead, the left face, the right face and the chin in the 25 regions;
concatenating, in temporal order, the forehead average gray level, the left face average gray level, the right face average gray level and the chin average gray level computed from each frame of image, to obtain four local facial heart rate signals;
calculating a frequency-domain confidence for each local facial heart rate signal;
computing a weighted average of the four local facial heart rate signals according to the calculated frequency-domain confidences to obtain a facial heart rate signal;
performing binary classification of the facial heart rate signal with a recurrent neural network model to obtain a classification result indicating whether the facial heart rate signal corresponds to a high heart rate or a low heart rate;
performing wavelet decomposition of the facial heart rate signal, removing the sub-band signals in the decomposition result that are inconsistent with the classification result, and recombining the remaining sub-bands into a facial heart rate signal;
truncating the recombined facial heart rate signal a plurality of times in succession with a dynamic window, and estimating a heart rate from each truncated signal;
and determining the final heart rate estimation result from the successively estimated heart rate values.
2. The remote large-range heart rate estimation method based on deep learning and face segmentation of claim 1, wherein calculating the frequency-domain confidence of each local facial heart rate signal comprises:
detrending each local facial heart rate signal, and band-pass filtering each local facial heart rate signal according to a heart rate interval;
for each detrended and band-pass-filtered local facial heart rate signal, transforming the signal to the frequency domain, and computing the ratio of the dominant-frequency magnitude to the mean frequency-domain magnitude of the transform result as the frequency-domain confidence of that local facial heart rate signal.
3. The remote large-range heart rate estimation method based on deep learning and face segmentation of claim 1, wherein truncating the recombined facial heart rate signal a plurality of times in succession with a dynamic window, and estimating a heart rate from each truncated signal, comprises:
A. setting an initial window size of a Hamming window;
B. truncating the recombined facial heart rate signal with the current Hamming window, and zero-padding the truncated data to obtain a windowed facial heart rate signal;
C. applying a Hilbert transform to the windowed facial heart rate signal of step B, and obtaining the envelope of the windowed facial heart rate signal from the transform result;
D. converting the envelope of step C to the frequency domain, and multiplying the dominant frequency of the frequency-domain result by 60 to obtain the heart rate estimate of this iteration;
E. judging whether the window size of the current Hamming window has reached its upper limit; if not, increasing the window size of the Hamming window by a preset step and returning to step B; if so, continuing with the step of determining the final heart rate estimation result from the successively estimated heart rate values.
4. The remote large-range heart rate estimation method based on deep learning and face segmentation of claim 1, wherein determining the final heart rate estimation result from the successively estimated heart rate values comprises:
when the successively estimated heart rate values all remain at a stable value, outputting that stable value as the final heart rate estimation result;
and when the successively estimated heart rate values show the same rising trend or the same falling trend, outputting the successively estimated heart rate values as the final heart rate estimation result.
5. The remote large-range heart rate estimation method based on deep learning and face segmentation of claim 1, further comprising: when segmenting the face regions of the face in the near infrared thermal imaging video, applying convolution sharpening to each frame of the near infrared thermal imaging video, so that face region segmentation is performed on the sharpened images.
6. The remote large-range heart rate estimation method based on deep learning and face segmentation of claim 1, wherein the recurrent neural network model is trained on a plurality of sample heart rate signals and the classes corresponding to those signals; sample heart rate signals with a heart rate below 90 bpm are assigned the class "low heart rate", and sample heart rate signals with a heart rate of at least 90 bpm are assigned the class "high heart rate".
CN202311013839.4A 2023-08-11 2023-08-11 Remote large-range heart rate estimation method based on deep learning and face segmentation Pending CN117173742A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311013839.4A CN117173742A (en) 2023-08-11 2023-08-11 Remote large-range heart rate estimation method based on deep learning and face segmentation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311013839.4A CN117173742A (en) 2023-08-11 2023-08-11 Remote large-range heart rate estimation method based on deep learning and face segmentation

Publications (1)

Publication Number Publication Date
CN117173742A true CN117173742A (en) 2023-12-05

Family

ID=88942161

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311013839.4A Pending CN117173742A (en) 2023-08-11 2023-08-11 Remote large-range heart rate estimation method based on deep learning and face segmentation

Country Status (1)

Country Link
CN (1) CN117173742A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination