WO2021184620A1

WO2021184620A1 - Camera-based non-contact heart rate and body temperature measurement method

Info

Publication number: WO2021184620A1
Application number: PCT/CN2020/103087
Authority: WO
Inventors: 谢世朋; 袁柱柱
Original assignee: 南京昊眼晶睛智能科技有限公司
Priority date: 2020-03-19
Filing date: 2020-07-20
Publication date: 2021-09-23
Also published as: CN111407245B; CN111407245A

Abstract

A camera-based non-contact heart rate and body temperature measurement method. According to the method, by performing color correction on original video images and eliminating interference caused by light, the impact of light intensity on measurement results during a measurement process is minimized. According to the method, an electrocardiogram curve is obtained using a depth learning method, there is no need to position key parts of a face, and the electrocardiogram curve can be obtained only by inputting a face contour into a constructed model. The whole measurement process is simple and convenient. In the case that the heat rate is measured, body temperature data of the subject can be further obtained by calculation using the heart rate value. The whole method not only greatly improves the measurement accuracy, but also is more complete in function, and can better meet the actual heart rate and body temperature measurement requirements.

Description

A camera-based non-contact heart rate and body temperature measurement method

Technical field

The invention relates to the technical field of non-contact physical sign monitoring and image processing, and more specifically to a camera-based non-contact heart rate and body temperature measurement method.

Background technique

At present, as the incidence of cardiovascular and cerebrovascular diseases continues to increase, people's health awareness has gradually increased, and their awareness of detecting physical parameters such as heart rate and body temperature has also been increasing. Heart rate is one of the important physiological parameters of human metabolism and functional activities. The most accurate method of traditional heart rate measurement is electrocardiography, but the electrocardiography method requires electrodes to be glued to the subject’s skin. This method is more complicated and less convenient to use, and this method needs to be in direct contact with the skin. Subject to limitations, such as measuring heart rate and body temperature of infants and athletes during exercise.

For this reason, image PPG (Photoplethysmography) technology came into being. PPG, or photoplethysmography, is a method that uses photoelectric means to non-invasively detect changes in blood volume in living tissues. Reflected light intensity, trace the blood volume pulse (BVP) signal and calculate the heart rate. Fu Mingzhe and others first proposed a non-contact heart rate detection method using an ordinary web camera. This method uses independent component analysis (ICA) to separate three average color traces into three base source signals, and analyzes the second base source signal. The power spectrum estimates the heart rate. The above methods all require the tester to be in a cooperative situation and need to measure under the condition of sufficient light. When the light is weak, this method is difficult to extract a clean BVP signal, which will contain excess noise, which will have a serious impact on the detection result. However, due to the harsh measurement conditions and the influence of measurement errors, the above methods have not been widely used.

Therefore, how to provide a non-contact heart rate and body temperature measurement method with strong practicability, high measurement accuracy, and stability and reliability is an urgent problem to be solved by those skilled in the art.

Summary of the invention

In view of this, the present invention provides a camera-based non-contact heart rate and body temperature measurement method. The method of measuring heart rate and body temperature is less affected by light, has low requirements for measurement conditions, and the measured results are more accurate. The existing non-contact heart rate measurement methods have to deal with the problem of severe measurement conditions and large errors in the measurement results.

In order to achieve the above objectives, the present invention adopts the following technical solutions:

A camera-based non-contact heart rate and body temperature measurement method, the method includes:

S1: Under ordinary visible light conditions, collect video images of the facial area of the subject through the camera, and perform color correction on the collected video images;

S2: Perform face recognition on each frame of video image after color correction, and intercept the face contour image from the recognized face area;

S3: Perform deep learning on the face contour images intercepted in a continuous video image, and obtain the ECG curve;

S4: Eliminate baseline drift and strengthen R wave processing on the obtained ECG curve, and calculate the heart rate value of the subject by calculating the number of R wave appearances per minute;

S5: Calculate the body temperature of the subject according to the relationship between the normal human heart rate benchmark and the obtained heart rate value.

The beneficial effect of the present invention is that the method uses color correction to the original video image to remove the interference caused by the light, and minimizes the influence of the light intensity on the measurement result during the measurement process. The method uses the deep learning method to obtain the heart Electrical curve, no need to locate the key parts of the face, just input the contour image of the face into the built model, you can get the ECG curve, the whole measurement process is simple and convenient, based on the measured heart rate, you can also Further using the heart rate value to calculate the body temperature data of the person to be tested, the whole method not only greatly improves the measurement accuracy, but also has more complete functions, which can better meet the actual heart rate and body temperature measurement needs.

Further, in step S1, performing color correction on the collected video image specifically includes:

S101: Establish an achromatic model, assuming that the average image is achromatic;

S102: Obtain the RGB value of each frame of video image, and substitute the RGB value of each frame of video image into the achromatic model to perform color correction.

Further, the achromatic model is:

in,

Is the corrected color component, k is the scale factor, and the value is:

Where, V=2 ^N -1, 0<N<225.

In order to avoid the influence of the illumination change in the measurement environment on the measurement result, the method of the present invention removes the influence of the illumination transformation by converting the RGB value of each pixel in the image.

Further, step S2 specifically includes:

S201: Build a SegNet semantic segmentation model, a U-net semantic segmentation model, and a semantic segmentation model coupled with Faster-RCNN and digital matting;

S202: Use the constructed SegNet semantic segmentation model, U-net semantic segmentation model, and the semantic segmentation model coupled with Faster-RCNN and digital matting to perform face recognition and semantic segmentation on each frame of video image after color correction. Obtain three sets of recognition results;

S203: Perform a weighted average of the obtained three sets of recognition results to obtain a final face contour image.

The beneficial effect of adopting the above technical solution is that the method of using the weighted average of the three segmentation models to obtain the face contour image, compared with the method of directly obtaining the face contour by directly using the edge detection, the face contour obtained by the present invention is closer to the actual person. Face shape.

Further, step S3 specifically includes:

S301: Construct a feature fusion residual network, select ECG images obtained by multiple testers wearing ECG acquisition equipment and face contour images obtained after processing at step S2 at the same time as the test set, and fuse the features The residual network is trained to obtain the ECG detection model;

S302: Input the face contour image in a segment of view image obtained in step S2 into the ECG detection model, and output the ECG curve.

The beneficial effect of adopting the above-mentioned further scheme is that the feature fusion residual network is trained through multiple sets of test data to obtain an ECG detection model. The input of the model is a continuous face contour image, and the output is an ECG curve. In the process of obtaining the electrical curve, there is no need to extract the key parts of the face contour, and the electrocardiogram curve can be obtained directly from the face contour image.

Further, step S5 specifically includes:

S501: Construct a deep learning network, select multiple groups of different experimenters' heart rate and body temperature corresponding data under the same conditions, train the deep learning network to obtain a heart rate and body temperature conversion model;

S502: Input the obtained heart rate value of the person to be measured into the heart rate and body temperature conversion model, and output the body temperature value of the person to be measured.

By constructing a deep learning network, multiple sets of heart rate and body temperature corresponding data are trained to obtain the heart rate and body temperature conversion relationship, and then the heart rate data of the subject is used as an input value into the model, and the corresponding body temperature value is output to realize the measurement of body temperature.

Description of the drawings

In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the following will briefly introduce the drawings that need to be used in the description of the embodiments or the prior art. Obviously, the drawings in the following description are only It is an embodiment of the present invention. For those of ordinary skill in the art, other drawings can be obtained based on the provided drawings without creative work.

Fig. 1 is a schematic flowchart of a camera-based non-contact heart rate and body temperature measurement method provided by the present invention;

Fig. 2 is a schematic diagram of a process flow of performing color correction on a captured video image in an embodiment of the present invention;

Fig. 3 is a schematic flowchart of a process of acquiring a face contour image in an embodiment of the present invention;

Figure 4 is a schematic diagram of a SegNet network structure in an embodiment of the present invention;

Figure 5 is a schematic diagram of the U-Net network structure in an embodiment of the present invention;

Fig. 6 is a schematic diagram of a process flow of obtaining an ECG curve in an embodiment of the present invention;

Fig. 7 is a schematic diagram of a feature fusion residual network structure in an embodiment of the present invention;

Fig. 8 is a schematic diagram of the network structure of EDSR and WDSR in an embodiment of the present invention;

9 is a schematic diagram of the size of the convolution kernel used in RSDB and the size of the convolution kernel used in WDSR in an embodiment of the present invention;

Fig. 10 is a schematic flow chart of the process of calculating the body temperature value of the test subject according to the relationship between the normal human heart rate reference and the obtained heart rate value in the embodiment of the present invention.

Detailed ways

The technical solutions in the embodiments of the present invention will be clearly and completely described below in conjunction with the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments are only a part of the embodiments of the present invention, rather than all the embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.

Referring to Figure 1, an embodiment of the present invention discloses a camera-based non-contact heart rate and body temperature measurement method, which includes:

S4: Eliminate baseline drift and strengthen R wave processing on the obtained ECG curve, and calculate the heart rate value of the subject by calculating the number of R wave appearances per minute; S5: According to the relationship between the body's normal heart rate reference and the obtained heart rate value , Calculate the body temperature of the person to be tested.

In a specific embodiment, referring to FIG. 2, in step S1, performing color correction on the collected video image specifically includes:

In a specific embodiment, the achromatic model is:

in,

Is the corrected color component, k is the scale factor, and the value is:

Where, V=2 ^N -1, 0<N<225.

In order to avoid the influence of the illumination change in the measurement environment on the measurement result, the method of this embodiment removes the influence of the illumination transformation by converting the RGB value of each pixel in the image.

In a specific embodiment, referring to FIG. 3, step S2 specifically includes:

The following is a corresponding description of the above three segmentation models, and the content is as follows:

(1) SegNet semantic segmentation model

SegNet is an image semantic segmentation deep network proposed by Cambridge, open source, based on the caffe framework. SegNet is a semantic segmentation network obtained by modifying the VGG-16 network based on FCN. The network structure is clear and easy to understand, and the training is fast and there are few pits. The SegNet network structure is an encoder-decoder structure. When SegNet performs semantic segmentation, a CRF module is usually added at the end for post-processing, aiming to further refine the edge segmentation results.

The novelty of SegNet lies in the way the decoder upsamples its lower resolution input feature maps. Specifically, the decoder uses the pooling index calculated in the maximum pooling step of the corresponding encoder to perform non-linear upsampling. This method eliminates the need for learning upsampling. The feature map after upsampling is sparse, so a trainable convolution kernel is then used to perform convolution operations to generate dense feature maps. SegNet uses de-pooling in the decoder to upsample the feature map and maintain the integrity of high-frequency details during segmentation. The encoder does not use a fully connected layer (convolution like FCN), so it is a lightweight network with fewer parameters. The index of each maximum pooling layer in the encoder is stored, and the stored indexes are used in the decoder to perform the depooling operation on the corresponding feature map later. This helps maintain the integrity of high-frequency information, but when depooling low-resolution feature maps, neighboring information is ignored. SegNet network structure diagram is shown as in Fig. 4.

The SegNet semantic segmentation model includes a convolutional layer, a batch normalization layer, an activation layer, a pooling layer, an upsampling layer, and a Softmax layer. The convolutional layer and the activation layer are the same as those in the patch-based CNN classification model. The pooling layer And the upsampling layer has done corresponding processing for information loss, using Soflmax function for classification.

The Batch Normalization (BN) operation speeds up the convergence of the model through transformation and reconstruction, greatly improves the speed of training, and improves the generalization ability of the network to suppress over-fitting. Before it can be used to activate the function, normalize the output data of the previous layer, so that the value of the different dimensions of the output has a mean value of 0 and a variance of 1.

The essence of pooling is sampling, which compresses the input feature map to reduce the feature map and simplify the calculation complexity of the network; it has better adaptation to a small range of pixel offsets, making the network more robust. Great. Commonly used pooling operations include Max Pooling, which searches for the maximum value in each area.

(2) Upsampling is the inverse process of the pooling operation. Through the index position recorded in the pooling layer, the data of the feature map can be returned to the corresponding position during the pooling operation, and 0 is added to other positions. value.

(2) U-net semantic segmentation

(3) The U-Net network structure is very simple. The first half is for feature extraction, and the second half is for upsampling. It can also be called an encoder-decoder structure. Since the overall structure of this network is similar to the uppercase English letter U, it is named U-Net. U-Net is very different from other common segmentation networks: U-Net uses a completely different feature fusion method: splicing. U-Net uses the feature to be spliced together in the channel dimension to form a thicker feature. The addition of corresponding points used in FCN fusion does not form a thicker feature.

(4) According to the structure of U-Net, it can combine the information of the bottom layer and the high layer. Bottom (deep) information: low-resolution information after multiple downsampling. It can provide the contextual semantic information of the segmentation target in the entire image, which can be understood as a feature that reflects the relationship between the target and its environment. This feature helps to determine the category of the object (so the classification problem usually only requires low resolution/deep information, and does not involve multi-scale fusion) high-level (shallow) information: directly passed from the encoder to the same height decoder through the concatenate operation High-resolution information. It can provide more refined features for segmentation, such as gradients. U-Net has many advantages. The biggest feature is that it can train a good model on a small data set. This advantage can shorten the process of labeling training samples for this project task. Moreover, U-Net is also very fast in training speed.

(5) U-Net's network structure diagram is shown in Figure 5. As can be seen in the figure, the original U-Net contains 18 3×3 convolutional layers, 1 1×1 convolutional layer, 4 2×2 down-sampling layers, and 4 2×2 up-sampling layers , Using ReLU as the activation function. Generally, the pooling operation will lose high-frequency components in the image, produce dull and blurred image blocks, and lose position information. In order to restore the structural features of the original image, U-Net uses a 4-hop connection method to connect the low-level and high-level feature maps. U-Net is actually a fully convolutional neural network, the input and output are images, and the fully connected layer is omitted. The shallower layer is used to solve the pixel positioning problem, and the deeper layer is used to solve the pixel classification problem.

(6) According to the standard convolutional neural network framework, the conversion is performed layer by layer. The last layer of the structure is a prediction output map of the same size as the original image, and each pixel in the output image is an integer value representing a category. Compared with the original U-Net structure, the network structure adopted in this embodiment has more convolutional layers, and batch standardization operations are performed before the convolutional layer and the deconvolutional layer. The maximum pooling is adopted, and the activation function is adopted It's ELU. The continuous operation of "batch normalization + convolution/deconvolution + ELU activation" in the network is called a "super convolution". The entire network is actually composed of a series of superconvolution, pooling, connection and finally pixel-level classification operations.

(7) In the convolution operation, the convolution filter size is 3×3×64, unit step size, zero padding; in the deconvolution operation, the filter size is 2×2×64, and the output size is the input size 2 times, the step size is 2, zero filling; in the pooling operation, the filter size is 2×2, and the step size is also 2. The weights of all filters are initialized with random values that obey the truncated Gaussian distribution, with zero mean and the variance set to 0.1. All offsets are initialized with 0.1m. It is worth noting that in the original U-Net, the filter depth is increased from 64 to 1024 layer by layer, while the network disclosed in this embodiment sets the filter depth to 64 uniformly. If you refer to the filter depth in the original U-Net, the network is not easy to converge, and the segmentation accuracy is low. The improved network of this embodiment has the following advantages:

① Both the number of categories and the number of features to be identified in the data set are small, and the information lost in the network pooling operation can be retrieved through "deconvolution" and "jump connection".

②Design a uniform number of filters, which can reduce time and space complexity.

③Using a deeper network can improve the accuracy of segmentation.

(3) A segmentation model based on the coupling of Faster RCNN and interactive digital matting

The model construction method is specifically as follows:

First, obtain the face image; then, the corresponding face label box position, picture and label file are divided into training set and test set in proportion; then, the processed picture set is sent to the convolutional neural network for training; In the process of feature extraction by the feature extraction module, the feature extraction network is based on the ZF network, and the region suggestion network is used to generate the region suggestion frame; at the same time, the Faster-RCNN network is used as the detection framework.

The RCNN method can be subdivided into four steps: candidate region generation, feature extraction, suggested region classification, and coordinate regression. Selective Search is used to generate candidate regions in RCNN, and then convolutional network is used for feature extraction. Finally, the extracted features are classified by SVM and the location is refined by regression network. In Fast RCNN, feature extraction, SVM, and regression network are combined into a convolutional neural network, which greatly improves the running speed. However, in Fast RCNN, convolution feature extraction is required for each candidate region, and there are a lot of repeated calculations. However, in Faster RCNN, the candidate region generation is also completed through a convolutional network, and the feature extraction network of the feature extraction part that generates the candidate region and the feature extraction network of the classification part are combined. In addition, Faster RCNN uses ROI pooling to map the position of the generated candidate region on the last feature layer, eliminating a lot of repeated calculations. From the perspective of network structure, Faster RCNN can be regarded as a combination of RPN network and Fast RCNN network.

In the process of detecting through the Faster-RCNN network, for the loss of the image, this project uses the loss function:

Among them, i is the index number of the suggestion box, and P _i is the probability that the suggestion box contains typical weather elements;

Calculated by manually marked labels. If the manually marked label contains typical face elements, it is 1, and if it does not contain it, it is 0; t _i is a four-dimensional vector representing the coordinates of the suggested frame, and

Is a four-dimensional vector representing the coordinates of the artificially labeled face elements (that is, the coordinates of the rectangular frame); the classification loss function is defined as:

Among them, the proposed box regression loss function L _{reg is} defined as:

Where R is the robust loss function smooth _{L1 is} defined as:

In order to obtain high-quality face matting, a cost function is introduced based on the smooth changes of foreground and background brightness, which demonstrates how to eliminate foreground and background brightness to obtain a quadratic cost function. The principle is as follows:

Assuming that the obtained face image is an image I composed of foreground brightness F and foreground brightness B, the image matting algorithm is used to process image I, that is, image I is input; the brightness of the i-th pixel is the corresponding foreground brightness and Combination of background brightness:

I _i =α _i F _i +(1-α _i )B _i

Among them, α _i is the opaque part of the pixel foreground.

In order to finally obtain a good cutout, the closed form scheme is used to extract the cutout from the face image in this embodiment; specifically, a cost function obtained from the local smoothing on the foreground brightness F and the background brightness B is:

I _i =α _i F _i +(1-α _i )B _i

From the expression of the result, the foreground brightness F and the background brightness B can be eliminated to generate a quadratic cost function of α; and the global optimal solution of the quadratic cost function can be obtained by solving the sparse linear equations; this embodiment Only need to directly calculate α without estimating foreground brightness F and background brightness B. At the same time, there is less user input, which can reduce the amount of calculation to a certain extent, and finally obtain high-quality cutouts; and use closed-form formulas for the obtained cutouts Check the eigenvector understanding of the sparse matrix and the characteristics of the prediction scheme.

Since this first derives a closed-form scheme of matting for grayscale images, and matting is a serious under-constrained problem, it is necessary to perform hypothetical operations on foreground brightness F, background brightness B and/or α.

Specifically, it is assumed that the foreground brightness F and the background brightness B are approximately constants on a small window near each pixel, and the foreground brightness F and the background brightness B are set to be locally smooth; in this embodiment, the foreground brightness F and the background brightness B are locally smoothed. Smoothing does not mean that the input image I is locally smooth, and α discontinuous means I is discontinuous; therefore, the formula (1-4) I _i = α _i F _i + (1-α _i ) B _i Rewrite to get α expressed as a linear function of image I:

in,

And W is the small image window.

Here, α, a, and b need to be solved. This project is solved by minimizing the cost function. The formula is as follows:

Among them, W _j is a small window near the pixel j. In addition, in order to ensure that the value obtained through the cost function is stable, this embodiment performs a regularization operation on a in the cost function.

Preferably, in this embodiment, a window of 3×3 pixels is used to implement the above operation. Specifically, a window is placed around each pixel, so that the windows W _j in the cost function overlap together to ensure the gap between adjacent pixels. The information overlaps to ensure that high-quality alpha matting is finally obtained; of course, the pixel window used is not limited and fixed in this embodiment, and the selection can be made according to the actual situation. In this way, since the cost function (1-6) is the quadratic function of α, a and b, in actual situations, for an image with N pixels, there are 3N unknowns in common; at this time, in order to obtain only N The unknown quadratic cost function, that is, the alpha value of the pixel, this embodiment eliminates a and b in the following manner.

In the embodiment, only the position of the face element can be located through the regional positioning based on deep learning. According to the above-mentioned positioning process using deep learning, it can be known that the background brightness B when α=0 and the foreground brightness when α=1 F; In this way, you can solve the following equation:

α=argminα ^T Lα, stα _i =s _i

s is the brush pixel set, s _i is the value pointed to by the brush to achieve the extraction operation of α. Specifically, a 3×3 window is used to define the Laplacian matting matrix. In other embodiments, the current scene brightness F When the background brightness B distribution is not very complicated, a wider window can be used; at the same time, in order to ensure that the calculation time of using a wider window is reduced, this embodiment adopts the linear coefficient of the α matte channel of the image I:

That is, the coefficient obtained at a finer resolution with a wider window is similar to that obtained from a smaller window on a coarse image, thereby calculating the linear coefficient of the coarse image; then the linear coefficient is interpolated, and the It is applied to a finer resolution image; the resulting alpha frosted channel is similar to that when a wider window is used to directly solve the matting system on a finer image, that is, the alpha value is obtained, and high-quality pictures are obtained.

The present invention uses three segmentation models, and the models also adopt different parameters for training and prediction, so that many prediction segmentation maps will be obtained, so that the recognition results of the three intelligent recognition models are the probability that each pixel is a face. Next, the method based on artificial intelligence determines the weight coefficients of the recognition results of these three recognition models. The method is: use artificial intelligence to learn and train the historical recognition accuracy of these three recognition models, obtain the weight of each recognition model through intelligent learning and training, and finally obtain the occurrence of a certain meteorological element on each pixel through weighted average. When the probability of occurrence is greater than a certain threshold (such as 80%), it is judged to be a face image, so as to obtain an accurate face contour image.

In a specific embodiment, referring to FIG. 6, step S3 specifically includes:

The following specifically describes the feature fusion residual network structure mentioned in this embodiment.

Feature Fusion Residual Network (FFRN) is based on the integration of super-resolution networks EDSR and WDSR. This network is suitable for sparse CT image reconstruction. The FFRN network architecture is shown in Figure 7. First of all, the proposal of EDSR and WDSR has made great progress in their related fields, and also provides important ideas for the direction of image reconstruction. Both use residual blocks. WDSR also improves the residual blocks to reduce network problems. Parameters, while increasing accuracy. However, neither of them makes full use of the feature information in the RB. Therefore, we propose to adopt RSDB as the shallow building module of the network. The local feature fusion layer is after the two-layer convolutional layer of the building module. RSDB skips to connect the local feature fusion layer in the two building modules, and the feature fusion result of the former module is used as the input of the latter module. Then stack the features after local feature fusion, and use residual learning to integrate feature information to form the basic network architecture.

It can be seen from Figure 8 that both EDSR and WDSR use up-sampling (pixel shuffl) at the end of the network. This method can reduce calculations without loss of model capacity and greatly increase the operating speed. The new up-sampling method (pixel shuffle) adopted by WDSR has little effect on network accuracy. The zooming operation of the image will not increase the information of the image, so the quality of the image will inevitably decrease, and the feature information will also be affected. The medical image correction task is to predict dense pixels, which is very sensitive to the amount of feature information. Therefore, the FFRN network chooses to abandon the up-sampling method, and the image size in the network remains unchanged for end-to-end learning.

When convolution is used to extract image features, the size of the convolution kernel will determine the receptive field of the convolution and will also affect the parameters of the model. In order to reduce computational overhead and parameters, WDSR-B chooses to increase the number of convolution kernels before the ReLU activation layer and reduce the number of convolution kernels after the ReLU activation layer. WDSR-A uses a 3×3 size convolution kernel before and after the activation layer, while WDSR-B uses a 1×1 size convolution kernel before and after the ReLU activation layer to further expand the number of channels before the activation layer to obtain a broader feature map, as shown in the figure 9 shown. When WDSR-B trains a deep neural network with RB as the network building module. After the network reaches a certain depth, the accuracy is not significantly improved. Its CT image artifact removal performance is not even as good as WDSR-A. Therefore, we all use 3×3 small convolution kernels in the RSDBs we proposed. This form can increase the convolution receptive field, while avoiding the use of large convolution kernels to extract too many meaningless features. Splitting the 3×3 convolution kernel into 3×1 and 1×3 convolution kernels has the same effect as the 3×3 convolution kernel and speeds up the operation. The last layer of the network is a fully connected layer, which is coupled with the output ECG curve.

In this embodiment, since the R wave is the most obvious among all the information bands of the ECG signal, the heart rate of the subject can be calculated by detecting the number of appearances of the R wave per minute.

In a specific embodiment, referring to FIG. 10, step S5 specifically includes:

S501: Build a deep learning network, select multiple groups of different experimenters' heart rate and body temperature corresponding data under the same conditions, train the deep learning network to obtain a heart rate and body temperature conversion model;

In some embodiments, the body temperature data can also be estimated based on the existing corresponding relationship between heart rate and body temperature. The method is as follows:

1) Calculate the difference between the obtained heart rate value of the test subject and the normal heart rate benchmark to obtain the heart rate difference;

2) Calculate the body temperature difference according to the obtained heart rate difference and the transformation relationship between heart rate and body temperature;

3) Calculate the sum of the obtained body temperature difference value and the normal body temperature reference to obtain the body temperature value of the test subject.

As the body temperature of the human body increases by 1℃, the heart rate will increase by 10 beats/min. The heart rate of a normal person in a calm state is generally between 60-90 beats/min. From this, the approximate conversion relationship between body temperature and heart rate can be obtained, and then it can be passed The heart rate value estimates the body temperature value.

In summary, compared with the prior art, the method provided by the embodiment of the present invention has the following advantages:

1. Remove the interference caused by light by performing color correction on the original video image;

2. This method uses the deep learning method to obtain the ECG curve, no need to locate the key parts of the face, just input the face contour image into the built model to get the ECG curve, the whole measurement process is simple and convenient ；

3. Based on the measured heart rate, the heart rate value can be further used to calculate the body temperature data of the person under test. The whole method not only greatly improves the measurement accuracy, but also has more complete functions, which can better meet the actual heart rate and body temperature measurement needs.

The various embodiments in this specification are described in a progressive manner. Each embodiment focuses on the differences from other embodiments, and the same or similar parts between the various embodiments can be referred to each other. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant parts can be referred to the description of the method part.

The above description of the disclosed embodiments enables those skilled in the art to implement or use the present invention. Various modifications to these embodiments will be obvious to those skilled in the art, and the general principles defined herein can be implemented in other embodiments without departing from the spirit or scope of the present invention. Therefore, the present invention will not be limited to the embodiments shown in this document, but should conform to the widest scope consistent with the principles and novel features disclosed in this document.

Claims

A camera-based non-contact heart rate and body temperature measurement method, which is characterized in that it includes:

S1: Under ordinary visible light conditions, collect video images of the facial area of the subject through the camera, and perform color correction on the collected video images;

S2: Perform face recognition on each frame of video image after color correction, and intercept the face contour image from the recognized face area;

S3: Perform deep learning on the face contour images intercepted in a continuous video image, and obtain the ECG curve;

S4: Eliminate baseline drift and strengthen R wave processing on the obtained ECG curve, and calculate the heart rate value of the subject by calculating the number of R wave appearances per minute;

S5: Calculate the body temperature of the subject according to the relationship between the normal human heart rate benchmark and the obtained heart rate value.
A camera-based non-contact heart rate and body temperature measurement method according to claim 1, wherein in step S1, performing color correction on the collected video image specifically includes:

S101: Establish an achromatic model, assuming that the average image is achromatic;

S102: Obtain the RGB value of each frame of video image, and substitute the RGB value of each frame of video image into the achromatic model to perform color correction.
A camera-based non-contact heart rate and body temperature measurement method according to claim 2, wherein the achromatic model is:

in,
Is the corrected color component, k is the scale factor, and the value is:

Where, V=2 N -1, 0<N<225.
A camera-based non-contact heart rate and body temperature measurement method according to claim 1, wherein step S2 specifically includes:

S201: Build a SegNet semantic segmentation model, a U-net semantic segmentation model, and a semantic segmentation model coupled with Faster-RCNN and digital matting;

S202: Use the constructed SegNet semantic segmentation model, U-net semantic segmentation model, and the semantic segmentation model coupled with Faster-RCNN and digital matting to perform face recognition and semantic segmentation on each frame of video image after color correction. Obtain three sets of recognition results;

S203: Perform a weighted average of the obtained three sets of recognition results to obtain a final face contour image.
A camera-based non-contact heart rate and body temperature measurement method according to claim 1, wherein step S3 specifically includes:

S301: Construct a feature fusion residual network, select ECG images obtained by multiple testers wearing ECG acquisition equipment and face contour images obtained after processing at step S2 at the same time as the test set, and fuse the features The residual network is trained to obtain the ECG detection model;

S302: Input the face contour image in a segment of view image obtained in step S2 into the ECG detection model, and output the ECG curve.
A camera-based non-contact heart rate and body temperature measurement method according to claim 1, wherein step S5 specifically includes:

S501: Construct a deep learning network, select multiple groups of different experimenters' heart rate and body temperature corresponding data under the same conditions, train the deep learning network to obtain a heart rate and body temperature conversion model;

S502: Input the obtained heart rate value of the person to be measured into the heart rate and body temperature conversion model, and output the body temperature value of the person to be measured.