CN114596963A

CN114596963A - Remote non-contact heart rate estimation method, system and equipment based on sparse structure representation

Info

Publication number: CN114596963A
Application number: CN202210319966.6A
Authority: CN
Inventors: 欧卫华; 陈龙保; 韩杰
Original assignee: Guizhou Education University
Current assignee: Guizhou Education University
Priority date: 2022-03-29
Filing date: 2022-03-29
Publication date: 2022-06-07

Abstract

The invention discloses a remote non-contact heart rate estimation method, system and equipment based on sparse structure representation. It belongs to the field of physiological sign research and application and addresses the inaccurate detection of heart rate signals by existing methods. The method comprises the following steps: step 1, inputting a face video and dividing the face into a plurality of sub-regions; step 2, constructing an average RGB pulse signal for each sub-region, performing motion compensation, calculating a chrominance signal S as a linear combination of the RGB channels, and calculating a signal-to-noise ratio; step 3, solving a coefficient matrix using a structural dictionary composed of cosine bases of different frequencies and wavelet bases of different scales, and reconstructing the heart rate signal from the structural dictionary and the coefficient matrix; step 4, averaging the reconstructed sub-region heart rate signals and performing power spectrum analysis to calculate the heart rate HR. The method finds the optimal dictionary-based sparse representation and reconstructs the heart rate signal from the dictionary and the sparse codes, representing a signal as close as possible to the true heart rate signal with as few dictionary atoms as possible, which improves the accuracy of the algorithm.

Description

Remote non-contact heart rate estimation method, system and equipment based on sparse structure representation

Technical Field

The invention relates to the technical field of biological sign research and application, in particular to the technical field of remote heart rate estimation based on videos.

Background

The heart rate signal is an important physiological signal and can be used to calculate physiological parameters such as heart rate, heart rate variability, respiratory rate and blood pressure. Conventional heart rate detection methods, such as the electrocardiogram, require skin contact, which can cause discomfort and hygiene problems. Remote heart rate estimation based on facial video is a non-contact measurement method and is convenient to use. The beating of the heart causes subtle changes in skin color, particularly on the face. On this basis, remote photoplethysmography (rPPG) was proposed, which extracts a clean heart rate signal from these weak color changes using computer vision and signal processing techniques. Remote non-contact heart rate estimation methods fall into three broad categories: blind source separation, color subspace projection, and deep learning.

Following the success of deep learning in computer vision, natural language processing and other fields, end-to-end deep networks have been introduced into remote heart rate estimation based on facial video. Spetlik et al. proposed the HR-CNN network, which extracts a pulse signal with a two-dimensional convolutional network under a signal-to-noise ratio constraint. To handle abnormal lighting and motion, Chen et al. introduced an attention mechanism and proposed DeepPhys, which constructs feature maps that learn spatiotemporal information. To better represent color variation, Niu et al. converted the RGB signal into YUV space, extracted the heart rate variation information, and converted it into a spatiotemporal map for heart rate detection. Deep learning methods show good results on benchmark data sets, but the models are complex and generalize poorly. They also require large labeled data sets for training, run slowly, have many parameters, and need a pre-trained network model.

Among traditional methods, blind source separation and color subspace decomposition offer better interpretability and faster running speed. Sparse representation is widely used in computer vision and has been applied to heart rate signal reconstruction with good results. Liu et al. proposed a sparse signal representation method based on adaptive-interference orthogonal matching pursuit, which constructs a discrete sine dictionary and uses the adaptive-interference orthogonal matching pursuit algorithm to recover heart rate signals corrupted by facial instability. The heart rate signal is a physiological signal with both periodic and pulsatile components. Although the method of Liu et al. achieves good results, it does not consider the pulsatility of the signal within each sub-region or the correlation between sub-regions, so the heart rate signals estimated from different sub-regions are inconsistent and the heart rate estimate is inaccurate.

Disclosure of Invention

The invention aims to: in order to solve the technical problem that the heart rate signal estimation is inaccurate due to the fact that the pulsatility of signals between sub-regions and the correlation between the sub-regions are not considered in the signal sparse representation method, the invention provides a remote non-contact heart rate estimation method, system and equipment based on sparse structure representation.

The invention specifically adopts the following technical scheme for realizing the purpose:

a remote non-contact heart rate estimation method based on sparse structure representation comprises the following steps:

step 1, inputting a face video, fixedly selecting the forehead and cheeks of the detected face as regions of interest (ROI), and partitioning them into a plurality of sub-regions. Remote video heart rate detection first requires face detection, and in different scenes the face may translate, rotate and otherwise move, so the face detection method must be fast and accurate. Considering the real-time performance and the localization and tracking quality of the detection method, the PCN deep-network face detection algorithm is used to detect and track the face and obtain its coordinate points and width. Because the forehead and cheek regions carry rich heart rate information, the forehead and cheeks located by the PCN algorithm are fixedly selected as the ROI and partitioned into a plurality of sub-regions;

step 2, constructing an average RGB pulse signal for each sub-region; performing motion compensation on the RGB pulse signal to eliminate abrupt pixel changes and produce a preliminarily denoised color signal; calculating a chrominance signal S as a linear combination of the RGB channels; and calculating the signal-to-noise ratio of each sub-region from the chrominance signal S. Occlusion of the face by foreign objects, uneven illumination, motion and similar factors degrade the chrominance signals of some sub-regions, so sub-regions rich in heart rate information are selected by their signal-to-noise ratio: a sub-region whose signal-to-noise ratio is greater than the total signal-to-noise ratio is taken as a high-quality sub-region, its chrominance signal is defined as a high-quality chrominance signal, and the high-quality sub-region chrominance signal M is generated;

step 3, performing sparse structure representation of the true heart rate signal $M_{pulse}$ and reconstructing the heart rate signal $X$.

The reconstructed heart rate signal is computed from the high-quality sub-region chrominance signal $M$, which consists of the chrominance signals of the sub-regions with a high signal-to-noise ratio: each column of $M$ is the chrominance signal of one selected high-quality sub-region, and the corresponding column of $X$ is the heart rate signal reconstructed for that sub-region.

The chrominance signal $M$, which is a linear combination of the average RGB pulse signals, is composed of the true heart rate signal $M_{pulse}$ and a noise signal $M_{noise}$, i.e. $M = M_{pulse} + M_{noise}$. The sparse structure representation reconstructs the heart rate signal $X$ as an estimate of $M_{pulse}$.

$X$ is obtained as follows: a mixed-structure dictionary $D$ is formed from cosine bases of different frequencies and wavelet bases of different scales (the cosine bases reconstruct the periodic characteristics of the heart rate signal and the wavelet bases reconstruct its fluctuation characteristics); a coefficient matrix $A$ is solved by using a sparse structure method to select suitable dictionary atoms from the mixed dictionary to represent the heart rate signal, with sparse coding performed by a greedy algorithm; and the heart rate signal is reconstructed as the product of the mixed-structure dictionary $D$ and the coefficient matrix $A$, i.e. $X = D \cdot A$;

step 4, averaging the reconstructed heart rate signals of all the sub-regions and then performing power spectrum analysis to obtain the heart rate HR.

In the technical scheme of the application, a video containing a face is input, the forehead and cheeks of the detected face are extracted as the region of interest (ROI), and the ROI is partitioned into a plurality of sub-regions. The raw color signal is extracted from each sub-region, and abrupt pixel changes caused by rigid facial motion and unstable lighting are removed by motion compensation. The chrominance signal and signal-to-noise ratio of each sub-region are calculated; a sub-region whose signal-to-noise ratio is greater than the total signal-to-noise ratio is taken as a high-quality sub-region, and its chrominance signal is defined as a high-quality chrominance signal. A structural dictionary composed of cosine bases of different frequencies and wavelet bases of different scales is used to efficiently reconstruct the period and amplitude of the heart rate signal. A sparse matrix is solved, with the sparse structure method selecting the most suitable dictionary atoms to represent the heart rate signal, and the heart rate signal is reconstructed as the product of the dictionary and the sparse matrix. The reconstructed heart rate signal matrix is averaged over the sub-regions, and power spectrum analysis of the averaged signal yields the final heart rate. In summary, the application divides the face into a plurality of sub-regions, extracts the high-quality chrominance signals of the sub-regions, constructs a mixed-structure dictionary, finds the optimal sparse representation based on that dictionary, and reconstructs the heart rate signal from the dictionary and the sparse codes, representing a signal as close as possible to the true heart rate signal with as few dictionary atoms as possible, which improves the accuracy of the algorithm.

Further, in step 2, the mean, the mean absolute deviation and the standard deviation of the RGB pulse signal of each sub-region are calculated (the standard deviation reflects how far the signal deviates from the mean), and then the deviation of each sample from the mean is calculated. A sample that exceeds the mean by more than the standard deviation is replaced by the mean plus the standard deviation; a sample that falls below the mean by more than the standard deviation is replaced by the mean minus the standard deviation. This motion compensation is appropriate because the color signal contains a lot of noise, the heart rate signal is very weak, and the heart rate does not change significantly over a short time.

Here $i$ is the channel index, with $i = 1, 2, 3$ denoting the R, G and B channels respectively; $l$ is the signal length; $C_i(t)$ is the signal value of the $i$-th channel at time $t$; $C_{i\_avg}$ is the signal mean of the $i$-th channel; $C_{i\_Me}$ is the signal mean deviation of the $i$-th channel; and $C_{i\_SD}$ is the signal standard deviation of the $i$-th channel.
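The motion compensation rule described above can be sketched as a clipping operation. This is a minimal illustration rather than the patented implementation: the function name `motion_compensate` and the sample trace are hypothetical, and the rule is read as "clip each sample to within one standard deviation of the channel mean". The mean absolute deviation is computed because the text lists it, but this sketch does not use it in the clipping step.

```python
import numpy as np

def motion_compensate(channel):
    """Clip samples lying more than one standard deviation from the mean."""
    c = np.asarray(channel, dtype=float)
    c_avg = c.mean()                     # C_avg: mean of the channel
    c_me = np.mean(np.abs(c - c_avg))    # C_Me: mean absolute deviation (reported only)
    c_sd = c.std()                       # C_SD: standard deviation
    # Samples above mean + SD become mean + SD; samples below mean - SD become mean - SD.
    compensated = np.clip(c, c_avg - c_sd, c_avg + c_sd)
    return compensated, (c_avg, c_me, c_sd)

# Hypothetical red-channel trace with one motion spike at index 4.
trace = np.array([100.0, 101.0, 99.0, 100.0, 140.0, 100.0])
compensated, (c_avg, c_me, c_sd) = motion_compensate(trace)
```

In this example only the spike is altered; samples already within one standard deviation of the mean pass through unchanged.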

Further, in step 2, the chrominance signal $S$ is calculated as

$$S = X_f - \alpha Y_f$$

where $\alpha = \sigma(X_f)/\sigma(Y_f)$ is the ratio of the standard deviations of the two signals $X_f$ and $Y_f$. $X_s$ and $Y_s$ are combinations of the normalized, motion-compensated RGB pulse signals:

$$X_s = 3R_n - 2G_n$$

$$Y_s = 1.5R_n + G_n - 1.5B_n$$

where $R_n$, $G_n$ and $B_n$ are the normalized color signals of each sub-region.

$X_f$ and $Y_f$ are the band-pass-filtered versions of $X_s$ and $Y_s$: $X_s$ and $Y_s$ are calculated first, then passed through a band-pass filter to obtain $X_f$ and $Y_f$, and finally the chrominance signal $S$ is calculated;
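The chrominance computation can be illustrated with a short sketch. Several details here are assumptions, since the text does not specify them: normalization is taken to mean dividing each channel by its temporal mean, the band-pass filter is a simple FFT mask over 0.6-4 Hz, and the names `bandpass_fft` and `chrominance_signal` are hypothetical.

```python
import numpy as np

def bandpass_fft(x, fs, f_lo=0.6, f_hi=4.0):
    """Brick-wall band-pass filter (assumed filter type) implemented with the FFT."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[(freqs < f_lo) | (freqs > f_hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))

def chrominance_signal(rgb, fs):
    """rgb: array of shape (samples, 3) holding the mean R, G, B of one sub-region."""
    rgb = np.asarray(rgb, dtype=float)
    r_n, g_n, b_n = (rgb / rgb.mean(axis=0)).T   # assumed normalization by temporal mean
    x_s = 3.0 * r_n - 2.0 * g_n
    y_s = 1.5 * r_n + g_n - 1.5 * b_n
    x_f = bandpass_fft(x_s, fs)
    y_f = bandpass_fft(y_s, fs)
    alpha = x_f.std() / y_f.std()
    return x_f - alpha * y_f

# Synthetic sub-region: a 1.2 Hz pulse of different strength in each channel.
fs = 30.0
t = np.arange(300) / fs
pulse = np.sin(2 * np.pi * 1.2 * t)
rgb = np.array([150.0, 120.0, 100.0]) + np.outer(pulse, [0.3, 0.8, 0.2])
s = chrominance_signal(rgb, fs)
peak_hz = np.fft.rfftfreq(len(s), d=1.0 / fs)[np.argmax(np.abs(np.fft.rfft(s)))]
```

On this synthetic input the dominant frequency of `s` is the 1.2 Hz pulse frequency, which is what the later power spectrum step relies on.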

the signal-to-noise ratio function for each sub-region is as follows:

where $S_f$ denotes the spectrum of the chrominance signal $S$ after Fourier transformation, $f$ denotes the frequency restricted to the heart rate band, with $f$ in the range $[0.6, 4]$ ($f = [0.4, 6]$), and $W_t$ denotes the size of the sliding window used in the calculation.

Further, in step 3, $X$ is obtained as follows. A mixed-structure dictionary $D$ is constructed from a cosine basis $D^c$ and a wavelet basis $D^w$. Any periodic discrete signal can be expressed as a linear combination of cosine signals of different frequencies, and heart rate signals have different periodicities and different amplitudes, so both a discrete-cosine sparse representation and a wavelet sparse representation are used. To reduce the size of the dictionary and improve the performance of the algorithm, the frequencies of the cosine atoms are restricted to the heart rate frequency range (0.6-4 Hz). On this basis, a mixed-structure dictionary containing cosine bases and wavelet bases is constructed, with the wavelet part given by

$$D^w = \mathrm{WaveDict}(\mathrm{Short3}, N_b, j, b)$$

where $k_i$ is the frequency of the $i$-th atom in the discrete cosine dictionary, and $k_i$ and $k_{i+1}$ differ by [expression not reproduced in the source text]; $F_r$ is the video frame rate, $N_b$ is the length of the generated signal, $j$ is the level vector, $b$ is the translation factor, and $f_L$ is the length of the signal sequence.

The mixed dictionary model is:

$$D = [D^c, D^w]$$
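A mixed dictionary of this kind can be sketched as follows. The sketch is illustrative only: the 0.1 Hz frequency step, the Ricker (Mexican hat) wavelet used as a stand-in for the unspecified `WaveDict(Short3, ...)` construction, the scales and shifts, and all function names are assumptions and are not taken from the patent.

```python
import numpy as np

def cosine_atoms(n_samples, frame_rate, f_lo=0.6, f_hi=4.0, step=0.1):
    """Unit-norm cosine atoms with frequencies restricted to the heart rate band."""
    t = np.arange(n_samples) / frame_rate
    freqs = np.arange(f_lo, f_hi + step / 2, step)
    atoms = np.cos(2 * np.pi * np.outer(t, freqs))
    return atoms / np.linalg.norm(atoms, axis=0), freqs

def ricker_atoms(n_samples, scales=(4, 8, 16), shift=16):
    """Unit-norm Ricker wavelets at several scales and translations (assumed wavelet)."""
    n = np.arange(n_samples)
    columns = []
    for a in scales:                          # scale (width in samples)
        for b in range(0, n_samples, shift):  # translation (centre in samples)
            u = (n - b) / a
            columns.append((1 - u ** 2) * np.exp(-u ** 2 / 2))
    atoms = np.stack(columns, axis=1)
    return atoms / np.linalg.norm(atoms, axis=0)

def mixed_dictionary(n_samples, frame_rate):
    """Concatenate cosine and wavelet atoms column-wise: D = [D^c, D^w]."""
    d_c, freqs = cosine_atoms(n_samples, frame_rate)
    d_w = ricker_atoms(n_samples)
    return np.hstack([d_c, d_w]), freqs

D, freqs = mixed_dictionary(n_samples=300, frame_rate=30.0)
```

Normalizing every atom to unit length keeps the correlation scores of cosine and wavelet atoms comparable when a greedy solver picks atoms from the combined dictionary.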

Furthermore, in step 3, the coefficient matrix is solved with the sparse structure method. The sparse structure representation method performs sparse structure representation of the true heart rate signal $M_{pulse}$ to obtain an estimate of the heart rate signal. Although each sub-region contains different heart rate information, the basic heart rate component should be the same and the heart rate signals of different sub-regions should be similar. Therefore, for the heart rate signals reconstructed in different sub-regions, the same dictionary atoms are selected from the dictionary by minimizing the $L_{1,2}$ norm of the coefficient matrix, i.e. $\min \|A\|_{1,2}$. In addition, the error between the reconstructed heart rate signal $X$ and the true heart rate signal $M_{pulse}$ should be small, which is enforced by minimizing the $L_2$ norm of the difference between the true heart rate signal and the reconstructed heart rate signal.

Based on this, the reconstruction of the heart rate signal is expressed as an optimization model: suitable dictionary atoms are selected from the mixed dictionary to represent the pulse signal, and sparse coding is performed with a greedy algorithm to find the most suitable dictionary atoms. Solving this model yields the sparse codes that form the coefficient matrix $A$, where $\lambda$ is the regularization parameter that balances the signal reconstruction accuracy against the penalty term $\|A\|_{1,2}$;
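One greedy way to obtain a coefficient matrix whose columns share the same atoms is simultaneous orthogonal matching pursuit. The sketch below uses that algorithm as an assumed stand-in, since the text only says "a greedy algorithm"; it targets the same goal as a model of the common form $\min_A \|M - DA\|_F^2 + \lambda \|A\|_{1,2}$ but fixes the number of atoms instead of tuning $\lambda$. The function name, the cosine-only dictionary and the synthetic data are hypothetical.

```python
import numpy as np

def simultaneous_omp(D, M, n_atoms):
    """Greedy row-sparse coding: all columns of M are coded with the same atoms of D."""
    residual = M.copy()
    support = []
    for _ in range(n_atoms):
        # Score each atom by its joint correlation with every sub-region residual.
        scores = np.linalg.norm(D.T @ residual, axis=1)
        if support:
            scores[support] = -np.inf     # never pick the same atom twice
        support.append(int(np.argmax(scores)))
        # Least-squares refit of all signals on the atoms selected so far.
        coef, *_ = np.linalg.lstsq(D[:, support], M, rcond=None)
        residual = M - D[:, support] @ coef
    A = np.zeros((D.shape[1], M.shape[1]))
    A[support, :] = coef
    return A, sorted(support)

# Cosine-only dictionary over 0.6-4 Hz and three noisy sub-region signals at 1.2 Hz.
fs, n = 30.0, 300
t = np.arange(n) / fs
freqs = np.arange(0.6, 4.05, 0.1)
D = np.cos(2 * np.pi * np.outer(t, freqs))
D /= np.linalg.norm(D, axis=0)
rng = np.random.default_rng(0)
M = np.outer(np.cos(2 * np.pi * 1.2 * t), [1.0, 0.8, 1.2])
M = M + 0.05 * rng.standard_normal((n, 3))
A, support = simultaneous_omp(D, M, n_atoms=1)
X = D @ A   # reconstructed heart rate signals, one column per sub-region
```

Because atoms are scored jointly across sub-regions, the selected atom here is the 1.2 Hz cosine shared by all three signals, and only one row of `A` is non-zero.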

further, in step 3, the heart rate signal is reconstructed as the product of the mixed dictionary $D$ and the coefficient matrix $A$:

$$X = D \cdot A$$

An ideal PPG signal contains both periodicity and fluctuation. The periodicity can be expressed as a linear combination of a small number of cosine bases, and the amplitude variation as a linear combination of a small number of wavelets. Therefore, the ideal PPG signal can be represented by a small number of atoms of the mixed-structure dictionary, i.e. $A$ is a sparse matrix. The heart rate signal arises from the heartbeat, whereas the color changes observed on the facial skin are also affected by illumination, motion and other noise.

Furthermore, the reconstructed heart rate signals of the sub-regions are averaged to obtain a single heart rate signal:

$$\bar{x} = \frac{1}{N}\sum_{n=1}^{N} x_n$$

where $N$ is the total number of selected sub-regions and $x_n$ is the reconstructed heart rate signal of the $n$-th selected sub-region.

Further, in step 4, power spectrum analysis is performed on the averaged heart rate signal, and the component with the highest peak is selected as the main component of the heart rate signal, i.e. the frequency $f_{HR}$ corresponding to the component with the highest power is taken. Finally, this frequency is converted into the heart rate HR in beats per minute: $HR = f_{HR} \times 60$.
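Averaging and power spectrum analysis can be sketched in a few lines. The function name `estimate_heart_rate` is hypothetical, the spectrum is a plain FFT periodogram, and restricting the peak search to 0.6-4 Hz is an assumption carried over from the heart rate band used elsewhere in the text.

```python
import numpy as np

def estimate_heart_rate(X, frame_rate, f_lo=0.6, f_hi=4.0):
    """X: (samples, sub-regions) reconstructed signals. Returns the heart rate in bpm."""
    x_mean = X.mean(axis=1)                       # average over sub-regions
    x_mean = x_mean - x_mean.mean()               # remove the DC component
    power = np.abs(np.fft.rfft(x_mean)) ** 2      # periodogram
    freqs = np.fft.rfftfreq(len(x_mean), d=1.0 / frame_rate)
    band = (freqs >= f_lo) & (freqs <= f_hi)      # assumed search band
    f_hr = freqs[band][np.argmax(power[band])]    # frequency of the highest peak
    return 60.0 * f_hr                            # HR = f_HR * 60

# Three synthetic sub-region signals pulsing at 1.5 Hz, sampled at 30 fps for 20 s.
fs = 30.0
t = np.arange(600) / fs
X = np.outer(np.cos(2 * np.pi * 1.5 * t), [1.0, 0.9, 1.1])
hr = estimate_heart_rate(X, fs)
```

The frequency resolution of this estimate is the frame rate divided by the number of frames, so longer segments give a finer heart rate grid, which is consistent with the better results reported for longer frame lengths.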

A system for remote non-contact heart rate estimation based on sparse structure representation, comprising:

the subregion acquisition module is used for inputting a face video, fixedly selecting forehead and cheek parts of a detected face as regions of interest, and partitioning the regions of interest to obtain a plurality of subregions;

the high-quality sub-region chrominance signal generation module is used for constructing an average RGB pulse signal for each sub-region, performing motion compensation on the RGB pulse signal, calculating a chrominance signal S as a linear combination of the RGB channels, calculating the signal-to-noise ratio of each sub-region from the chrominance signal S, taking a sub-region whose signal-to-noise ratio is greater than the total signal-to-noise ratio as a high-quality sub-region, defining its chrominance signal as a high-quality chrominance signal, and generating the high-quality sub-region chrominance signal $M$;

the heart rate signal reconstruction module is used for performing sparse structure representation of the true heart rate signal $M_{pulse}$ and reconstructing the heart rate signal $X$. The high-quality sub-region chrominance signal $M$ consists of the chrominance signals of the sub-regions with a high signal-to-noise ratio: each column of $M$ is the chrominance signal of one selected high-quality sub-region, and the corresponding column of $X$ is the heart rate signal reconstructed for that sub-region. $X$ is obtained as follows: a mixed-structure dictionary $D$ is formed from cosine bases of different frequencies and wavelet bases of different scales; a coefficient matrix $A$ is solved by using a sparse structure method to select suitable dictionary atoms from the mixed dictionary to represent the heart rate signal, with sparse coding performed by a greedy algorithm; and the heart rate signal is reconstructed as the product of the mixed-structure dictionary $D$ and the coefficient matrix $A$, i.e. $X = D \cdot A$;

and the heart rate obtaining module is used for averaging the reconstructed heart rate signals of all the sub-regions and then carrying out power spectrum analysis to obtain the heart rate HR.

A computer device comprising a memory storing a computer program and a processor implementing the steps of the sparse structure representation based remote contactless heart rate estimation method when executing the computer program.

The invention has the following beneficial effects:

1. the face is divided into a plurality of sub-regions, the high-quality chrominance signals of the sub-regions are extracted, a mixed-structure dictionary is constructed, the optimal sparse representation based on the mixed-structure dictionary is found, and the heart rate signal is reconstructed from the dictionary and the sparse codes, representing a signal as close as possible to the true heart rate signal with as few dictionary atoms as possible, which improves the accuracy of the algorithm;

2. the best results are obtained at a frame length of 1200: the MAE and RMSE are below 5 bpm, the accuracy of the detection method exceeds 96%, and the Pearson correlation coefficient between the detected and true heart rates exceeds 0.97;

3. across the steady, speaking, slow translation, slow rotation, medium rotation and fast translation states, the overall accuracy of the detection method exceeds 95% and the Pearson correlation coefficient between the detected heart rate and the true value exceeds 0.98. In the steady, speaking, slow translation, slow rotation and medium rotation states, the Pearson correlation coefficient exceeds 0.99, which shows that the detection method performs well in these states;

4. in the comparison with other methods on the UBFC data set, the MAE and RMSE of the method are below 5 bpm, the Acc exceeds 96% and the PCC exceeds 0.96, which shows that the method has strong applicability; compared with the best-performing POS method, the MAE is improved by 1.697 bpm, the RMSE by 6.075 bpm and the Pearson correlation coefficient by 11.9%, which shows that the method is more robust under stable illumination and varying heart rate;

5. for all states of the entire data set, the heart rate detected by the method agrees closely with the true HR.

Drawings

FIG. 1 is a block diagram of a sparse structure representation-based remote non-contact heart rate estimation method of the present invention;

FIG. 2 is a flow chart of a method of remote non-contact heart rate estimation based on sparse structure representation according to the present invention;

FIG. 3 is a flow chart of sparse structure representation of the present invention.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments.

All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

Example 1

As shown in fig. 1 to 3, the present embodiment provides a remote non-contact heart rate estimation method based on sparse structure representation, including the following steps:

The mean, the mean absolute deviation and the standard deviation of the RGB pulse signal of each sub-region are calculated (the standard deviation reflects how far the signal deviates from the mean), and then the deviation of each sample from the mean is calculated. A sample that exceeds the mean by more than the standard deviation is replaced by the mean plus the standard deviation; a sample that falls below the mean by more than the standard deviation is replaced by the mean minus the standard deviation. This motion compensation is appropriate because the color signal contains a lot of noise, the heart rate signal is very weak, and the heart rate does not change significantly over a short time.

Here $i$ is the channel index, with $i = 1, 2, 3$ denoting the R, G and B channels respectively; $l$ is the signal length; $C_i(t)$ is the signal value of the $i$-th channel at time $t$; $C_{i\_avg}$ is the signal mean of the $i$-th channel; $C_{i\_Me}$ is the signal mean deviation of the $i$-th channel; and $C_{i\_SD}$ is the signal standard deviation of the $i$-th channel;

the chrominance signal $S$ is calculated as

$$S = X_f - \alpha Y_f$$

where $\alpha = \sigma(X_f)/\sigma(Y_f)$ is the ratio of the standard deviations of the two signals $X_f$ and $Y_f$, which are the band-pass-filtered versions of $X_s$ and $Y_s$. $X_s$ and $Y_s$ are combinations of the normalized, motion-compensated RGB pulse signals:

$$X_s = 3R_n - 2G_n$$

$$Y_s = 1.5R_n + G_n - 1.5B_n$$

where $R_n$, $G_n$ and $B_n$ are the normalized color signals of each sub-region;

the signal-to-noise ratio function for each sub-region is as follows:

where $S_f$ denotes the spectrum of the chrominance signal $S$ after Fourier transformation, $f$ denotes the frequency restricted to the heart rate band, with $f$ in the range $[0.6, 4]$ ($f = [0.4, 6]$), and $W_t$ denotes the size of the sliding window used in the calculation;

step 3, performing sparse structure representation of the true heart rate signal $M_{pulse}$ and reconstructing the heart rate signal $X$ as an estimate of the heart rate signal.

$X$ is obtained as follows. A mixed-structure dictionary $D$ is constructed from a cosine basis $D^c$ and a wavelet basis $D^w$. Any periodic discrete signal can be expressed as a linear combination of cosine signals of different frequencies, so both a discrete-cosine sparse representation and a wavelet sparse representation are used. To reduce the size of the dictionary and improve the performance of the algorithm, the frequencies of the cosine atoms are restricted to the heart rate frequency range (0.6-4 Hz). On this basis, a mixed-structure dictionary containing cosine bases and wavelet bases is constructed, with the wavelet part given by

$$D^w = \mathrm{WaveDict}(\mathrm{Short3}, N_b, j, b)$$

The mixed dictionary model is:

$$D = [D^c, D^w]$$

The coefficient matrix is solved with the sparse structure method. The sparse structure representation performs sparse structure representation of the true heart rate signal $M_{pulse}$ to obtain an estimate of the heart rate signal. Although each sub-region contains different heart rate information, the basic heart rate component should be the same and the heart rate signals of different sub-regions should be similar. Therefore, for the heart rate signals reconstructed in different sub-regions, the same dictionary atoms are selected from the dictionary by minimizing the $L_{1,2}$ norm of the coefficient matrix, i.e. $\min \|A\|_{1,2}$. In addition, the error between the reconstructed heart rate signal $X$ and the true heart rate signal $M_{pulse}$ should be small, which is enforced by minimizing the $L_2$ norm of the difference between the true heart rate signal and the reconstructed heart rate signal.

Based on this, the reconstruction of the heart rate signal is expressed as an optimization model: suitable dictionary atoms are selected from the mixed dictionary to represent the pulse signal, and sparse coding is performed with a greedy algorithm to find the most suitable dictionary atoms. Solving this model yields the sparse codes that form the coefficient matrix $A$, where $\lambda$ is the regularization parameter that balances the signal reconstruction accuracy against the penalty term $\|A\|_{1,2}$;

the heart rate signal is reconstructed as the product of the mixed dictionary $D$ and the coefficient matrix $A$:

$$X = D \cdot A$$

An ideal PPG signal contains both periodicity and fluctuation. The periodicity can be expressed as a linear combination of a small number of cosine bases, and the amplitude variation as a linear combination of a small number of wavelets. Therefore, an ideal PPG signal can be represented by a small number of atoms of the mixed-structure dictionary, i.e. $A$ is a sparse matrix. The heart rate signal arises from the heartbeat, whereas the color changes observed on the facial skin are also affected by illumination, motion and other noise.

The reconstructed heart rate signals of the sub-regions are averaged to obtain a single heart rate signal:

$$\bar{x} = \frac{1}{N}\sum_{n=1}^{N} x_n$$

where $N$ is the total number of selected sub-regions and $x_n$ is the reconstructed heart rate signal of the $n$-th selected sub-region;

step 4, averaging the reconstructed heart rate signals of all the sub-regions and then performing power spectrum analysis to obtain the heart rate HR. Power spectrum analysis is performed on the averaged heart rate signal, and the component with the highest peak is selected as the main component of the heart rate signal, i.e. the frequency $f_{HR}$ corresponding to the component with the highest power is taken. Finally, this frequency is converted into the heart rate HR in beats per minute: $HR = f_{HR} \times 60$.

In fig. 2, I denotes face detection and ROI selection; II denotes chrominance signal extraction and motion compensation; III denotes the original RGB pulse signal; IV denotes the sparse structure representation; V denotes the averaged reconstructed heart rate signal; VI denotes the calculated heart rate. In fig. 3, a denotes the mixed-structure dictionary (D); b denotes the matching coefficient matrix (A); c denotes the reconstructed sub-region heart rate signals; d denotes the averaging of the reconstructed heart rate signals of the individual sub-regions.

To verify the advantages of the method for video-based remote heart rate estimation, experiments are conducted on two data sets, PURE and UBFC. To evaluate the overall performance of the algorithm, the mean absolute error (MAE), root mean square error (RMSE), accuracy (Acc) and Pearson correlation coefficient (PCC) of the heart rate are used as evaluation metrics.

The heart rate error of the $n$-th video sample is defined as the difference between the heart rate extracted from the video and the heart rate detected from the PPG (the ground truth), where $n$ denotes the $n$-th test sample.

The mean absolute error, the root mean square error and the accuracy of the heart rate are each computed over all test samples, where $N$ denotes the total number of test samples.

The Pearson correlation coefficient is computed between the estimated values and the true values, where $\mu$ denotes the mean of the estimated heart rates and the mean of the true heart rates is defined analogously.
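The evaluation metrics can be sketched as follows. MAE, RMSE and the Pearson correlation coefficient use their standard definitions; the accuracy formula (100% minus the mean relative absolute error) is an assumption, since the defining formula is not reproduced in this text. The function name and the sample values are hypothetical.

```python
import numpy as np

def heart_rate_metrics(hr_video, hr_gt):
    """Compare heart rates estimated from video with ground-truth PPG heart rates (bpm)."""
    hr_video = np.asarray(hr_video, dtype=float)
    hr_gt = np.asarray(hr_gt, dtype=float)
    err = hr_video - hr_gt                                 # per-sample heart rate error
    return {
        "MAE": np.mean(np.abs(err)),
        "RMSE": np.sqrt(np.mean(err ** 2)),
        # Assumed definition: 100% minus the mean relative absolute error.
        "Acc": 100.0 * (1.0 - np.mean(np.abs(err) / hr_gt)),
        "PCC": np.corrcoef(hr_video, hr_gt)[0, 1],
    }

# Three hypothetical test samples.
metrics = heart_rate_metrics([72.0, 80.0, 91.0], [70.0, 80.0, 90.0])
```

For these three samples the absolute errors are 2, 0 and 1 bpm, so the MAE is 1 bpm and the RMSE is the square root of 5/3.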

(1) Result analysis on UBFC datasets by the invention

TABLE 1 Performance at different video lengths

Frame length	MAE (bpm)	RMSE (bpm)	Acc (%)	PCC
300	8.756	12.515	91.315	0.782
450	7.091	10.764	92.979	0.824
600	5.843	8.949	94.199	0.869
750	4.017	5.711	95.925	0.940
900	4.217	6.011	95.790	0.939
1200	3.467	4.071	96.526	0.976

All video segments have a frame rate of 30 fps, and the videos are divided into segments of different lengths for heart rate signal recovery. The performance of the detection method at different video lengths is shown in Table 1. At 1200 frames the detection method obtains its best results: the MAE and RMSE are below 5 bpm, the accuracy exceeds 96%, and the Pearson correlation coefficient between the detected and true heart rates exceeds 0.97. Below a frame length of 450, the performance of the detection method drops sharply, because short videos do not provide sufficiently stable information for heart rate signal recovery.

(2) Analysis of the results of the present invention on the PURE data set

On the PURE data set, the performance of the present invention was tested under the six different states contained in the data set. The results are shown in Table 2. The accuracy of the detection method exceeds 95% in every state, and the Pearson correlation coefficient between the detected heart rate and the true value exceeds 0.98. In the steady, speaking, slow translation, slow rotation and medium-speed rotation states, the Pearson correlation coefficient exceeds 0.99, which shows that the detection method performs well in these states.

In the speaking state and the slow translation state, the error of the detection method fluctuates, because facial skin wrinkles when a person speaks and motion artifacts arise when the person slowly translates, and both factors interfere with the extraction of the chrominance signals. In the fast translation state, ROI selection may be inaccurate because face detection has difficulty tracking fast head movements. Fast movement also blurs the video frames, which further obscures the faint heart rate information.

Across all states of the data set, the heart rate detected by the method of the present invention agrees closely with the ground-truth HR.

TABLE 2 Performance of the algorithm under different states

Evaluation criteria	01	02	03	04	05	06
MAE(bpm)	2.577	3.046	2.678	2.738	2.246	2.396
RMSE(bpm)	3.282	3.621	3.748	4.163	2.904	2.949
Acc(%)	95.951	95.060	96.708	95.891	96.629	96.365
PCC	0.996	0.991	0.991	0.985	0.996	0.994

01, steady state; 02, speaking; 03, slow translation; 04, fast translation; 05, slow rotation; 06, medium-speed rotation.

(3) Experimental comparison of the invention

The present invention is compared with other methods on the UBFC data set. Table 3 shows the experimental results of six different estimation methods on the UBFC data set. The MAE and the RMSE of the present method are both below 5 bpm, the Acc is above 96% and the PCC is above 0.96, which shows that the method has stronger applicability. Compared with POS, the best of the other methods, the MAE is reduced by 1.697 bpm, the RMSE is reduced by 6.075 bpm, and the Pearson correlation coefficient is increased by 0.119. The results show that the method is more robust under stable illumination and varying heart rate.

DAOMP also uses a sparse representation, but because its dictionary is simple and the correlation between the heart rates of the sub-regions is not considered, there is a large error between the reconstructed heart rate signal and the original pulse signal.

POS also gives good results as a subspace decomposition method.

TABLE 3 Comparison of six methods on the UBFC data set

Methods	MAE(bpm)	RMSE(bpm)	Acc(%)	PCC
ICA	28.455	36.858	72.420	0.277
CHROM	7.627	15.693	92.542	0.714
POS	5.164	10.146	94.872	0.857
LGI	10.570	20.715	89.841	0.553
DAOMP	14.580	22.804	85.850	0.484
Ours	3.467	4.071	96.526	0.976
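The improvements over POS quoted in the comparison follow directly from the rows of Table 3. The short check below recomputes them from the tabulated values; the variable names are illustrative only.

```python
# Values copied from the POS and Ours rows of Table 3 (UBFC data set).
pos = {"MAE": 5.164, "RMSE": 10.146, "PCC": 0.857}
ours = {"MAE": 3.467, "RMSE": 4.071, "PCC": 0.976}

mae_reduction = round(pos["MAE"] - ours["MAE"], 3)     # bpm
rmse_reduction = round(pos["RMSE"] - ours["RMSE"], 3)  # bpm
pcc_increase = round(ours["PCC"] - pos["PCC"], 3)      # absolute PCC units

print(mae_reduction, rmse_reduction, pcc_increase)
```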

Example 2

A sparse structure representation-based remote non-contact heart rate estimation system, comprising:

a sub-region acquisition module, used for inputting a face video, selecting the forehead and cheek regions as fixed regions of interest of the detected face, and partitioning the regions of interest to obtain a plurality of sub-regions;

a high-quality sub-region chrominance signal generation module, used for constructing an average RGB pulse signal for each sub-region, performing motion compensation on the RGB pulse signals, calculating a chrominance signal S as a linear combination of the RGB channels, calculating the signal-to-noise ratio of each sub-region based on the chrominance signal S, taking the sub-regions whose signal-to-noise ratio is greater than the total signal-to-noise ratio as high-quality sub-regions, defining the corresponding chrominance signals as high-quality chrominance signals, and generating a high-quality sub-region chrominance signal M;

where M represents the chrominance signals of the selected high-quality sub-regions and X represents the reconstructed heart rate signals of the selected high-quality sub-regions. X is obtained as follows: a mixed structure dictionary D is formed from cosine bases of different frequencies and wavelet bases of different scales; a coefficient matrix A is solved by using a sparse structure method to select suitable dictionary atoms from the mixed dictionary to represent the heart rate signal, with sparse coding carried out by a greedy algorithm; and the heart rate signal is reconstructed by taking the inner product of the mixed structure dictionary D and the coefficient matrix A, namely

X = D·A;

In the high-quality sub-region chrominance signal generation module:

the mean, the mean absolute deviation and the standard deviation of the RGB pulse signals of each sub-region are calculated, and then the magnitude by which each atom deviates from the mean is calculated; for atoms that exceed the mean by more than the standard deviation, the mean plus the standard deviation is used; for atoms that fall below the mean by more than the standard deviation, the mean minus the standard deviation is used for motion compensation, as defined in detail below:

where i is the channel index, i = 1, 2, 3 respectively denoting the RGB channel signals, specifically i = 1 is the R channel, i = 2 is the G channel and i = 3 is the B channel; l is the signal length, C_i(t) is the signal of the i-th channel at time t, C_{i_avg} is the signal mean of the i-th channel, C_{i_Me} is the signal mean deviation of the i-th channel, and C_{i_SD} is the signal standard deviation of the i-th channel;
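The compensation rule described in words above can be read as a per-channel clipping of each sample to the interval [C_{i_avg} - C_{i_SD}, C_{i_avg} + C_{i_SD}]. The sketch below implements that reading only. The exact defining formula, including the role of the mean deviation C_{i_Me}, is not reproduced in the text, so this is an assumption, and the function name clip_motion_outliers is hypothetical.

```python
import numpy as np

def clip_motion_outliers(rgb):
    """Clip each channel of one sub-region to mean +/- one standard deviation.

    rgb: array of shape (3, l) holding the mean R, G, B traces over l frames.
    """
    rgb = np.asarray(rgb, dtype=float)
    avg = rgb.mean(axis=1, keepdims=True)  # C_{i_avg}, one value per channel
    sd = rgb.std(axis=1, keepdims=True)    # C_{i_SD}, one value per channel
    # Samples above avg + sd become avg + sd; samples below avg - sd become avg - sd.
    return np.clip(rgb, avg - sd, avg + sd)

traces = np.array([
    [1.0, 1.0, 1.0, 1.0, 10.0],  # R: one motion spike at the last frame
    [2.0, 2.0, 2.0, 2.0, 2.0],   # G: constant, left unchanged
    [0.0, 1.0, 2.0, 3.0, 4.0],   # B: slow drift, only the extremes are clipped
])
compensated = clip_motion_outliers(traces)
```

Samples inside the interval pass through unchanged, so the pulse component is preserved while large motion spikes are limited.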

the chrominance signal S is calculated as S = X_f - αY_f, where α is the ratio of the standard deviations of the two signals X_f and Y_f, α = σ(X_f)/σ(Y_f); X_s and Y_s are fusions of the normalized RGB pulse signals after motion compensation, and are calculated as follows:

X_s = 3R_n - 2G_n

Y_s = 1.5R_n + G_n - 1.5B_n

where R_n, G_n and B_n are the color signals of each sub-region;
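The chrominance computation can be sketched as follows. Two points are assumptions rather than statements of the text: the color signals are normalized by their temporal mean, and X_f and Y_f are taken to be band-pass filtered versions of X_s and Y_s in the 0.6 to 4 Hz heart rate band, as in the well-known CHROM method that shares these formulas. The FFT-mask filter is a simple stand-in for whatever filter is actually used, and the function names are hypothetical.

```python
import numpy as np

def bandpass_fft(x, fs, lo=0.6, hi=4.0):
    """Zero all DFT bins outside [lo, hi] Hz (a simple stand-in band-pass filter)."""
    spec = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spec[(freqs < lo) | (freqs > hi)] = 0.0
    return np.fft.irfft(spec, n=len(x))

def chrominance_signal(rgb, fs):
    """Compute S = X_f - alpha * Y_f for one sub-region.

    rgb: array of shape (3, l) with the motion-compensated R, G, B traces.
    """
    rgb = np.asarray(rgb, dtype=float)
    rn, gn, bn = rgb / rgb.mean(axis=1, keepdims=True)  # assumed normalization
    xs = 3.0 * rn - 2.0 * gn
    ys = 1.5 * rn + gn - 1.5 * bn
    xf = bandpass_fft(xs, fs)
    yf = bandpass_fft(ys, fs)
    alpha = xf.std() / yf.std()
    return xf - alpha * yf

# Synthetic example: a 1.2 Hz pulse with channel-dependent amplitude.
fs = 30.0
t = np.arange(300) / fs
pulse = np.sin(2 * np.pi * 1.2 * t)
rgb = np.stack([100 * (1 + 0.0033 * pulse),
                100 * (1 + 0.0077 * pulse),
                100 * (1 + 0.0053 * pulse)])
s = chrominance_signal(rgb, fs)
```

In this synthetic case the pulse enters X_s and Y_s with opposite signs, so the subtraction reinforces it instead of cancelling it.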

the signal-to-noise ratio function for each subregion is as follows:

where S_f represents the spectrum of the chrominance signal S after the Fourier transform, f represents the frequency restricted to the heart rate band, the range of f being [0.6, 4] Hz, and W_t represents the sliding window used in the calculation.
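Because the signal-to-noise ratio formula itself is not reproduced in the text, the following sketch uses one common spectral definition as a stand-in: the power inside a small window around the strongest in-band peak divided by the remaining power in the [0.6, 4] Hz band, in dB. Taking the mean of the per-region values as the "total signal-to-noise ratio" is likewise an assumption, and all function and parameter names are hypothetical.

```python
import numpy as np

def band_snr_db(s, fs, band=(0.6, 4.0), half_width_hz=0.15):
    """Peak-window power over remaining in-band power, in dB (assumed definition)."""
    power = np.abs(np.fft.rfft(s)) ** 2
    freqs = np.fft.rfftfreq(len(s), d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    f_peak = freqs[in_band][np.argmax(power[in_band])]
    window = in_band & (np.abs(freqs - f_peak) <= half_width_hz)
    signal = power[window].sum()
    noise = power[in_band & ~window].sum() + 1e-12  # guard against division by zero
    return 10.0 * np.log10(signal / noise)

def select_high_quality(S, fs):
    """Keep the columns of S (one per sub-region) whose SNR exceeds the mean SNR."""
    snrs = np.array([band_snr_db(S[:, k], fs) for k in range(S.shape[1])])
    keep = snrs > snrs.mean()
    return S[:, keep], keep, snrs

# Two synthetic sub-regions: one clean pulse, one dominated by noise.
rng = np.random.default_rng(0)
fs = 30.0
t = np.arange(300) / fs
pulse = np.sin(2 * np.pi * 1.2 * t)
S = np.stack([pulse + 0.1 * rng.standard_normal(300),
              0.2 * pulse + 1.0 * rng.standard_normal(300)], axis=1)
M, keep, snrs = select_high_quality(S, fs)
```

The retained columns play the role of the high-quality sub-region chrominance signal M.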

In the heart rate signal reconstruction module:

X is obtained as follows: a mixed dictionary D is constructed from the cosine basis D^c and the wavelet basis D^w,

D^w = WaveDict(Short3, N_b, j, b)

the mixed dictionary model is:

the coefficient matrix is solved: a sparse structure method is used to select suitable dictionary atoms from the mixed dictionary to represent the pulse signal, and sparse coding is carried out by a greedy algorithm; solving the coefficient matrix specifically means solving the following model:

to obtain the sparse codes, which form the coefficient matrix A,

where λ is the regularization parameter used to balance the signal reconstruction accuracy and the penalty term ||A||_{1,2};

the heart rate signal is reconstructed by taking the inner product of the mixed dictionary D and the coefficient matrix A, and the inner product model is as follows:

X = D·A;
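The reconstruction pipeline (build a mixed dictionary, greedily select shared atoms, form X = D·A) can be sketched as below. Several parts are assumptions: the cosine atoms are plain unit-norm cosines on a 0.1 Hz grid, Ricker wavelets stand in for WaveDict, whose definition is not given, and simultaneous orthogonal matching pursuit is used as a standard greedy solver that produces row-sparse codes, which may differ from the solver actually used. All function names are hypothetical.

```python
import numpy as np

def cosine_atoms(n, freqs_hz, fs):
    """Unit-norm cosine atoms, one column per frequency (assumed form of D^c)."""
    t = np.arange(n) / fs
    atoms = np.cos(2 * np.pi * np.outer(t, freqs_hz))
    return atoms / np.linalg.norm(atoms, axis=0)

def ricker_atoms(n, widths, centers):
    """Unit-norm Ricker wavelets at several scales and shifts (stand-in for D^w)."""
    t = np.arange(n)
    cols = []
    for a in widths:
        for c in centers:
            u = (t - c) / a
            cols.append((1.0 - u ** 2) * np.exp(-u ** 2 / 2.0))
    atoms = np.stack(cols, axis=1)
    return atoms / np.linalg.norm(atoms, axis=0)

def simultaneous_omp(D, M, n_atoms):
    """Greedy row-sparse coding: all columns of M share the same atoms (n_atoms >= 1)."""
    residual = M.copy()
    support = []
    for _ in range(n_atoms):
        # Score each atom by its joint correlation with every residual column.
        scores = np.linalg.norm(D.T @ residual, axis=1)
        if support:
            scores[support] = 0.0  # never pick the same atom twice
        support.append(int(np.argmax(scores)))
        # Least-squares refit on the selected atoms only.
        coef, *_ = np.linalg.lstsq(D[:, support], M, rcond=None)
        residual = M - D[:, support] @ coef
    A = np.zeros((D.shape[1], M.shape[1]))
    A[support, :] = coef
    return A

fs, n = 30.0, 300  # 10 s at 30 fps
D = np.hstack([cosine_atoms(n, np.arange(0.6, 4.01, 0.1), fs),
               ricker_atoms(n, widths=[4, 8], centers=range(0, n, 30))])
M = np.outer(D[:, 6], [1.0, 0.5, 2.0])  # three sub-regions sharing one cosine atom
A = simultaneous_omp(D, M, n_atoms=1)
X = D @ A                               # reconstruction X = D·A
```

Scoring atoms jointly across sub-regions is what makes the code row-sparse, which mirrors the purpose of the ||A||_{1,2} penalty: the sub-regions are encouraged to share the same few atoms.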

In the heart rate obtaining module:

power spectrum analysis is performed on the averaged heart rate signal, and the component with the highest peak is selected as the main component of the heart rate signal; that is, the frequency f_HR corresponding to the component with the highest power is taken and finally converted from the frequency domain to obtain the final heart rate HR, specifically: HR = f_HR*60.
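A minimal sketch of this final step is given below: the reconstructed sub-region signals are averaged, the power spectrum is computed with a real FFT, and the frequency of the highest peak is converted with HR = f_HR*60. Restricting the peak search to the [0.6, 4] Hz band is an assumption carried over from the signal-to-noise ratio definition, and the function name heart_rate_bpm is hypothetical.

```python
import numpy as np

def heart_rate_bpm(X, fs, band=(0.6, 4.0)):
    """Estimate HR in bpm from reconstructed signals X of shape (l, n_regions)."""
    x = np.asarray(X, dtype=float).mean(axis=1)  # average over sub-regions
    x = x - x.mean()                             # remove the DC component
    power = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    in_band = (freqs >= band[0]) & (freqs <= band[1])
    f_hr = freqs[in_band][np.argmax(power[in_band])]  # frequency of the highest peak, Hz
    return f_hr * 60.0

# Three sub-region signals at 1.5 Hz with slightly different phases.
fs = 30.0
t = np.arange(1200) / fs
X = np.stack([np.sin(2 * np.pi * 1.5 * t + ph) for ph in (0.0, 0.2, 0.4)], axis=1)
hr = heart_rate_bpm(X, fs)
```

For this synthetic input the spectral peak lies at 1.5 Hz, which corresponds to 90 bpm.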

Example 3

A computer device comprising a memory storing a computer program and a processor implementing the steps of the sparse structure representation based remote non-contact heart rate estimation method when executing the computer program.

Claims

1. A remote non-contact heart rate estimation method based on sparse structure representation, characterized by comprising the following steps:

step 1, inputting a face video, selecting the forehead and cheek regions as fixed regions of interest of the detected face, and partitioning the regions of interest to obtain a plurality of sub-regions;

step 2, constructing an average RGB pulse signal for each sub-region, performing motion compensation on the RGB pulse signals, calculating a chrominance signal S as a linear combination of the RGB channels, calculating the signal-to-noise ratio of each sub-region based on the chrominance signal S, taking the sub-regions whose signal-to-noise ratio is greater than the total signal-to-noise ratio as high-quality sub-regions, defining the corresponding chrominance signals as high-quality chrominance signals, and generating a high-quality sub-region chrominance signal M;

step 3, performing sparse structure representation on the real heart rate signal M_pulse to reconstruct the heart rate signal,

where M represents the chrominance signals of the selected high-quality sub-regions and X represents the reconstructed heart rate signals of the selected high-quality sub-regions; X is obtained as follows: a mixed structure dictionary D is formed from cosine bases of different frequencies and wavelet bases of different scales; a coefficient matrix A is solved by using a sparse structure method to select suitable dictionary atoms from the mixed dictionary to represent the heart rate signal, with sparse coding carried out by a greedy algorithm; and the heart rate signal is reconstructed by taking the inner product of the mixed structure dictionary D and the coefficient matrix A, namely X = D·A;

2. The sparse structure representation-based remote non-contact heart rate estimation method according to claim 1, wherein in step 2, the mean, the mean absolute deviation and the standard deviation of the RGB pulse signals of each sub-region are calculated, and then the magnitude by which each atom deviates from the mean is calculated; for atoms that exceed the mean by more than the standard deviation, the mean plus the standard deviation is used; for atoms that fall below the mean by more than the standard deviation, the mean minus the standard deviation is used for motion compensation, as defined in detail below:

where i is the channel index, i = 1, 2, 3 respectively denoting the RGB channel signals, specifically i = 1 is the R channel, i = 2 is the G channel and i = 3 is the B channel; l is the signal length, C_i(t) is the signal of the i-th channel at time t, C_{i_avg} is the signal mean of the i-th channel, C_{i_Me} is the signal mean deviation of the i-th channel, and C_{i_SD} is the signal standard deviation of the i-th channel.

3. The sparse structure representation-based remote non-contact heart rate estimation method according to claim 1, wherein in step 2, the chrominance signal S is calculated as S = X_f - αY_f, where α is the ratio of the standard deviations of the two signals X_f and Y_f, α = σ(X_f)/σ(Y_f); X_s and Y_s are fusions of the normalized RGB pulse signals after motion compensation, and are calculated as follows:

X_s = 3R_n - 2G_n

Y_s = 1.5R_n + G_n - 1.5B_n

where R_n, G_n and B_n are the color signals of each sub-region;

the signal-to-noise ratio function for each sub-region is as follows:

4. The sparse structure representation-based remote non-contact heart rate estimation method according to claim 1, wherein in step 3, X is obtained as follows: a mixed dictionary D is constructed from the cosine basis D^c and the wavelet basis D^w,

D^w = WaveDict(Short3, N_b, j, b)

where k_i is the frequency of the i-th atom in the discrete cosine dictionary, and the difference between k_i and k_{i+1} is

the mixed dictionary model is:

5. The sparse structure representation-based remote non-contact heart rate estimation method according to claim 4, wherein in step 3, the coefficient matrix is solved: a sparse structure method is used to select suitable dictionary atoms from the mixed dictionary to represent the pulse signal, and sparse coding is carried out by a greedy algorithm; solving the coefficient matrix specifically means solving the following model:

to obtain the sparse codes, which form the coefficient matrix A,

where λ is the regularization parameter used to balance the signal reconstruction accuracy and the penalty term ||A||_{1,2}.

6. The sparse structure representation-based remote non-contact heart rate estimation method according to claim 5, wherein in step 3, the heart rate signal is reconstructed by taking the inner product of the mixed dictionary D and the coefficient matrix A, and the inner product model is as follows:

X = D·A.

7. The sparse structure representation-based remote non-contact heart rate estimation method according to claim 6, wherein the reconstructed heart rate signals of the sub-regions are averaged to obtain the heart rate signal, specifically:

8. The sparse structure representation-based remote non-contact heart rate estimation method according to claim 1, wherein in step 4, power spectrum analysis is performed on the averaged heart rate signal, and the component with the highest peak is selected as the main component of the heart rate signal; that is, the frequency f_HR corresponding to the component with the highest power is taken and finally converted from the frequency domain to obtain the final heart rate HR, specifically: HR = f_HR*60.

9. A system for remote non-contact heart rate estimation based on sparse structure representation, comprising:

where M represents the chrominance signals of the selected high-quality sub-regions and X represents the reconstructed heart rate signals of the selected high-quality sub-regions; X is obtained as follows: a mixed structure dictionary D is formed from cosine bases of different frequencies and wavelet bases of different scales; a coefficient matrix A is solved by using a sparse structure method to select suitable dictionary atoms from the mixed dictionary to represent the heart rate signal, with sparse coding carried out by a greedy algorithm; and the heart rate signal is reconstructed by taking the inner product of the mixed structure dictionary D and the coefficient matrix A, namely X = D·A.

10. A computer device, characterized by comprising a memory storing a computer program and a processor implementing the steps of the sparse structure representation based remote contactless heart rate estimation method of any of claims 1-8 when executing the computer program.