WO2021238344A1 - A video-based method and system for accurate detection of human heart rate and facial blood volume - Google Patents

A video-based method and system for accurate detection of human heart rate and facial blood volume

Info

Publication number
WO2021238344A1
WO2021238344A1 (PCT/CN2021/080905, CN2021080905W)
Authority
WO
WIPO (PCT)
Prior art keywords
face
roi
heart rate
signal
sig
Prior art date
Application number
PCT/CN2021/080905
Other languages
English (en)
French (fr)
Inventor
鲍虎军
徐晓刚
王小龙
Original Assignee
浙江大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 浙江大学 (Zhejiang University)
Publication of WO2021238344A1 publication Critical patent/WO2021238344A1/zh
Priority to US17/696,909 priority Critical patent/US20220218218A1/en

Classifications

    • A61B 5/00: Measuring for diagnostic purposes; Identification of persons
    • A61B 5/02416: Detecting, measuring or recording pulse rate or heart rate using photoplethysmograph signals, e.g. generated by infrared radiation
    • A61B 5/0295: Measuring blood flow using plethysmography
    • A61B 5/7232: Signal processing involving compression of the physiological signal, e.g. to extend the signal recording period
    • A61B 5/725: Details of waveform analysis using specific filters therefor, e.g. Kalman or adaptive filters
    • A61B 5/7264: Classification of physiological signals or data, e.g. using neural networks, statistical classifiers, expert systems or fuzzy systems
    • G06F 18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 18/2415: Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks
    • G06N 3/045: Combinations of networks
    • G06N 3/049: Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N 3/08: Learning methods
    • G06T 7/0012: Biomedical image inspection
    • G06T 7/0016: Biomedical image inspection using an image reference approach involving temporal comparison
    • G06T 7/11: Region-based segmentation
    • G06T 7/73: Determining position or orientation of objects or cameras using feature-based methods
    • G06V 10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/62: Extraction of image or video features relating to a temporal dimension, e.g. time-based feature extraction; Pattern tracking
    • G06V 10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 10/806: Fusion of extracted features
    • G06V 10/82: Image or video recognition or understanding using neural networks
    • G06V 40/15: Biometric patterns based on physiological signals, e.g. heartbeat, blood flow
    • G06V 40/161: Human faces: Detection; Localisation; Normalisation
    • G06V 40/168: Human faces: Feature extraction; Face representation
    • G06V 40/171: Local features and components; Facial parts; Occluding parts, e.g. glasses; Geometrical relationships
    • G06V 40/172: Human faces: Classification, e.g. identification
    • G16H 30/40: ICT specially adapted for processing medical images, e.g. editing
    • G16H 50/20: ICT specially adapted for computer-aided diagnosis, e.g. based on medical expert systems
    • G06T 2207/10016: Video; Image sequence
    • G06T 2207/20081: Training; Learning
    • G06T 2207/20084: Artificial neural networks [ANN]
    • G06T 2207/30076: Plethysmography
    • G06T 2207/30088: Skin; Dermal
    • G06T 2207/30201: Face
    • G06V 2201/03: Recognition of patterns in medical or anatomical images

Definitions

  • the invention relates to the use of a camera to collect human facial video, based on image processing, deep learning and signal processing technology, to accurately detect human heart rate and facial blood volume distribution.
  • Human heart rate and facial blood volume distribution are important indicators to measure the physical health of the human body.
  • the main methods for measuring human heart rate are electrocardiographic signal detection, photoelectric signal detection and other methods.
  • the common feature of these detection methods is that the detection device needs to be close to the human skin, and the heart rate can be detected through the skin potential change signal or the blood volume signal.
  • the requirement that the subject wear a sensor limits the scope of application of these measurement methods.
  • remote detection of human physiological indicators through a camera has become a hot spot in current research. Owing to the complexity of the external environment, remote detection methods are easily disturbed. To eliminate such interference, signal decomposition methods such as wavelet decomposition, Independent Component Analysis (ICA), Principal Component Analysis (PCA), and the Hilbert-Huang Transform (HHT) are usually used, alone or in combination, to remove noise.
  • ICA: Independent Component Analysis
  • PCA: Principal Component Analysis
  • HHT: Hilbert-Huang Transform
  • the present invention adopts deep learning technology, spectrum analysis and related calculation methods to detect human heart rate and facial blood volume distribution, and adopts a Kalman filter algorithm to fuse the heart rate detection results, realizing accurate detection of the human heart rate.
  • the present invention proposes a new detection method and system for human heart rate and facial blood volume distribution. Based on face video collected by a camera, the facial data in the video are analyzed and processed, and the detection of human heart rate and facial blood volume distribution is achieved by means of model prediction and signal processing.
  • a video-based method for accurate detection of human heart rate and facial blood volume includes the following steps:
  • step (4) Fuse the heart rate results of step (2) and step (3) based on the Kalman filter method to obtain the fused heart rate detection result.
  • the invention also discloses a video-based accurate detection system for human heart rate and facial blood volume, which is characterized by including:
  • An image detection module which is used to detect the human face area in the video frame image, extract the face image sequence and the key position points of the face in the time dimension; extract the overall face signal and the face roi signal set based on the face image sequence;
  • Preprocessing module which preprocesses the overall facial signal and facial ROI signal extracted by the image detection module
  • Spectrum-based heart rate calculation module, which, based on the pre-processed facial roi signal set, uses linear weighting to calculate a reference signal, computes the reference-signal spectrum, obtains the heart rate value from the spectral peak, and calculates the facial blood volume distribution from the reference-signal spectrum and the facial roi signal spectra;
  • Multi-modal heart rate detection model which is constructed based on LSTM and residual convolutional neural network model, used to obtain the predicted heart rate value based on the probability of the heart rate distribution;
  • the fusion module obtains the fusion heart rate value detection result according to the heart rate value of the spectrum-based heart rate calculation module and the predicted heart rate value of the multi-modal heart rate detection model.
  • the present invention has the following advantages:
  • the integrated heart rate detection method improves the anti-interference ability and heart rate detection accuracy of the detection process.
  • the heart rate is detected based on the signal spectrum peak method, and global face detection is combined with face roi sub-block detection to improve the detection ability of the method. In practical applications, however, this method alone is not robust: factors such as face movement or changes in external light intensity strongly affect the detection results. Therefore, a multi-modal deep learning model is additionally used to predict the heart rate value of the tested subject.
  • This detection method is based on the principle of statistical learning and realizes heart rate estimation according to the time-frequency characteristics of the signal. On this basis, the Kalman filter method is used to integrate the above two measurements. As a result, the robustness and detection accuracy of heart rate detection are improved.
  • A method for estimating the facial blood volume distribution based on the heart rate value is proposed. Comparison of the estimated distribution with the actual facial blood volume distribution in experiments shows that the two are consistent.
  • a heart rate detection method based on multi-modal deep learning technology and facial video data is proposed.
  • a deep learning model based on CNN and LSTM structures is used to analyze the spatial structure characteristics and time series characteristics of the data to achieve rapid detection of the human heart rate.
  • techniques such as adding face-shake and illumination-change samples to the training sample set are used to improve the model's anti-interference ability.
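The shake and illumination augmentations mentioned above can be sketched as follows; the function names, parameter ranges and frame layout are illustrative, not from the patent:

```python
import numpy as np

def augment_brightness(frames, low=0.7, high=1.3, rng=None):
    """Scale pixel intensities by one random factor to mimic a lighting change."""
    rng = rng if rng is not None else np.random.default_rng()
    factor = rng.uniform(low, high)
    return np.clip(frames * factor, 0, 255)

def augment_shake(frames, max_shift=3, rng=None):
    """Shift each frame by a small random (dy, dx) offset to mimic head shake."""
    rng = rng if rng is not None else np.random.default_rng()
    out = np.empty_like(frames)
    for i, f in enumerate(frames):
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        out[i] = np.roll(np.roll(f, dy, axis=0), dx, axis=1)
    return out

# frames: (T, H, W) grayscale face sequence
frames = np.full((4, 8, 8), 100.0)
bright = augment_brightness(frames)
shaken = augment_shake(frames)
```

Applying such perturbations to copies of clean training sequences exposes the model to the disturbances it will meet at test time.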
  • FIG. 1 is a schematic diagram of the process of the present invention.
  • FIG. 2 is a structure diagram of the heart rate detection model.
  • FIG. 3 is a schematic diagram of the key areas of the face.
  • the present invention is implemented in the following specific steps:
  • the pre-processing method is not limited to band-pass filtering and other methods.
  • the convolutional network model is used to detect the face area and key points of the face in the video frame image, and to generate the face image sequence and the face key position point sequence in the time dimension, as shown in formula 1, where: MTCNN() is the convolutional network model, frame_i is the i-th frame image of the video, face_i is the face image extracted from the i-th frame image of the video, and critical_pos_i is the set of key position points corresponding to that face image.
  • the face image sequence is shown in formula 2, where: face_seq is the face image sequence, face i is the face image corresponding to the i-th frame of video, and T is the length of the video frame sequence.
  • the face image is divided into roi sub-blocks of R×R size to obtain the roi sub-block image sequences in the time dimension, as shown in formula 4, where: face_roi_i represents the i-th roi sub-block image sequence, and face_roi_seq is the set of all roi sub-block image sequences.
  • face_roi_seq = {face_roi_1, face_roi_2, ..., face_roi_i, ..., face_roi_{m×n}}  (4)
  • each sub-block image sequence is compressed, as shown in formula 5, where: face_roi_seq is the set of all roi sub-block image sequences, and PCompress() is the compression function used to calculate each roi image in the set
  • face_roi_sig is the signal set obtained after compression, and each element is the signal obtained by the compression of the roi sub-block image sequence.
  • face_roi_sig = PCompress(face_roi_seq)  (5)
  • face_roi_sig = {face_roi_sig_1, ..., face_roi_sig_i, ..., face_roi_sig_{m×n}}  (6)
  • face_roi_sig i is the compressed signal corresponding to the i-th roi sub-block image sequence, and m ⁇ n is the number of roi sub-blocks.
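PCompress() is not spelled out in this text; a common choice in remote-PPG work, assumed here, is to average the pixels of each ROI frame so that every R×R sub-block sequence collapses to a one-dimensional temporal signal:

```python
import numpy as np

def pcompress(face_roi_seq):
    """Compress each ROI image sequence of shape (T, R, R) into a 1-D
    temporal signal by spatially averaging the pixels of every frame."""
    return [roi.reshape(roi.shape[0], -1).mean(axis=1) for roi in face_roi_seq]

# toy set: 2 ROI sub-blocks, T = 5 frames, R = 4
face_roi_seq = [np.ones((5, 4, 4)) * k for k in (1.0, 2.0)]
face_roi_sig = pcompress(face_roi_seq)  # list of length-5 signals
```

Each element of `face_roi_sig` then plays the role of one compressed roi signal in formula 6.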
  • face_sig_r = sigprocess(face_sig)  (7)
  • face_sig_r = {face_sig_r_1, ..., face_sig_r_i, ..., face_sig_r_T}
  • roi_sig_r = {roi_sig_r_1, ..., roi_sig_r_i, ..., roi_sig_r_{m×n}}
  • T is the number of video frames
  • m ⁇ n is the number of roi sub-blocks.
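sigprocess() is only characterized as "not limited to band-pass filtering" (see the preprocessing note above); a minimal sketch of one plausible choice, an FFT-mask band-pass over the usual heart-rate band of 0.7-4 Hz (about 42-240 bpm), with the sampling rate fs an assumption:

```python
import numpy as np

def sigprocess(sig, fs=30.0, f_lo=0.7, f_hi=4.0):
    """Band-pass a signal to the plausible heart-rate band by zeroing
    FFT bins outside [f_lo, f_hi] Hz."""
    spec = np.fft.rfft(sig - sig.mean())
    freqs = np.fft.rfftfreq(len(sig), d=1.0 / fs)
    spec[(freqs < f_lo) | (freqs > f_hi)] = 0.0
    return np.fft.irfft(spec, n=len(sig))

fs = 30.0
t = np.arange(300) / fs
# 1.2 Hz "pulse" plus a slow 0.1 Hz drift; the drift should be removed
sig = np.sin(2 * np.pi * 1.2 * t) + 5 * np.sin(2 * np.pi * 0.1 * t)
clean = sigprocess(sig, fs)
```

The same filter can be applied to face_sig and to every element of face_roi_sig to obtain face_sig_r and roi_sig_r.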
  • step (2) Calculate the heart rate value and facial blood volume distribution. Based on the overall facial signal and the facial roi signal set obtained in step (1), the heart rate value is calculated, and on this basis the facial blood volume distribution is detected.
  • weight_set = {w_1, w_2, ..., w_i, ..., w_{m×n}}
  • weight_set is the weight set
  • m ⁇ n is the number of roi sub-blocks.
  • (2.2) Calculate the heart rate value based on the reference signal.
  • the calculation process is shown in formulas 11 and 12, where: sig_ref is the reference signal, sig_ref_sd is the reference signal spectrum, and heart_rate_ref is the heart rate value, which corresponds to the peak value of the spectrum.
  • the signal spectrum calculation is not limited to the lomb-scargle spectrum analysis method.
  • heart_rate_ref = max_freq(sig_ref_sd)  (12)
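Formulas 11-12 (linearly weighted reference signal, spectrum, heart rate from the spectral peak) can be sketched as follows. The text names Lomb-Scargle as one spectrum option; a plain FFT stands in here, and the sampling rate, band limits and synthetic signal are assumptions:

```python
import numpy as np

def heart_rate_from_spectrum(roi_sigs, weights, fs=30.0):
    """Weight the ROI signals into a reference signal (formula 11), take
    its spectrum, and read the heart rate in bpm off the peak (formula 12)."""
    sig_ref = np.average(roi_sigs, axis=0, weights=weights)
    spec = np.abs(np.fft.rfft(sig_ref - sig_ref.mean()))
    freqs = np.fft.rfftfreq(sig_ref.size, d=1.0 / fs)
    band = (freqs >= 0.7) & (freqs <= 4.0)       # plausible heart-rate band
    f_peak = freqs[band][np.argmax(spec[band])]  # dominant pulsatile frequency
    return 60.0 * f_peak

fs = 30.0
t = np.arange(600) / fs                     # 20 s of 30 fps video
rng = np.random.default_rng(0)
roi_sigs = [np.sin(2 * np.pi * 1.2 * t) + 0.1 * rng.standard_normal(600)
            for _ in range(4)]              # 1.2 Hz pulse + noise
hr = heart_rate_from_spectrum(roi_sigs, weights=[0.25] * 4, fs=fs)
```

With the 1.2 Hz synthetic pulse this recovers a rate near 72 bpm; the peak frequency times 60 is the heart rate value.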
  • Volume() is a function for calculating blood volume, and its specific form is shown in formula 14.
  • fs ref is the frequency spectrum of the reference signal
  • fs roi is the frequency spectrum of the face roi signal
  • the operator appearing in formula 14 denotes convolution
  • m and n are the maximum values of the number of roi sub-blocks on the abscissa and ordinate respectively.
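The exact form of Volume() (formulas 13-14), built from fs_ref, fs_roi and a convolution operator, is not fully recoverable from this text. As a loose sketch under the assumption that each ROI's blood volume score reflects how strongly its spectrum matches the reference spectrum, arranged on the m×n ROI grid:

```python
import numpy as np

def blood_volume_map(fs_ref, fs_rois, m, n):
    """Score each ROI by the normalized correlation of its spectrum with
    the reference spectrum; arrange the scores on the m x n ROI grid."""
    ref = fs_ref / (np.linalg.norm(fs_ref) + 1e-12)
    scores = [float(np.dot(ref, f / (np.linalg.norm(f) + 1e-12)))
              for f in fs_rois]
    return np.array(scores).reshape(m, n)

fs_ref = np.array([0.1, 1.0, 0.2])
fs_rois = [fs_ref.copy(), np.array([1.0, 0.1, 0.1])]  # one matching, one not
vol = blood_volume_map(fs_ref, fs_rois, 1, 2)
```

An ROI whose spectrum carries the pulsatile component of the reference signal scores high; occluded or poorly lit ROIs score low, matching the behavior reported for FIG. 4.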
  • step (3) Construct a heart rate detection model based on deep learning methods. Based on the key facial location points extracted in step (1.1), the image sequences containing the forehead and cheeks are used as training samples, and a multi-modal heart rate detection model is constructed based on the LSTM and residual convolutional neural network (Resnet) models.
  • Resnet residual convolutional neural network
  • step (3.1) Extraction of training samples. Based on step (1.1), extract the key facial position points and form a sequence of key points in the time dimension, as shown in formula 15, where: critical_pos_i is the set of key facial position points in the i-th video frame, and img_i is the i-th video frame.
  • the set form of critical_pos_i is shown in Equation 16, where k is the number of key points on the face.
  • sig_c = {sig_c_1, sig_c_2, ..., sig_c_i, ..., sig_c_T}
  • sig_c is the signal set obtained after image sequence compression
  • T is the video length
  • step (3.3) Construct a heart rate detection sub-model based on LSTM (Long Short-term Memory Network) architecture.
  • This sub-model mainly includes two network structures: 1D-CNN (1D Convolutional Neural Network) and LSTM.
  • 1D-CNN: one-dimensional convolutional neural network
  • LSTM: Long Short-Term Memory network
  • LSTM() is the heart rate detection model based on the LSTM architecture
  • sig_nor is the normalized signal obtained in step (3.2)
  • feature_lstm is the output feature vector of the sub-model.
  • (3.4) Construct a heart rate detection sub-model based on the Resnet architecture.
  • the sub-model is mainly based on the residual network model (Resnet) to extract the time-domain waveform characteristics of the signal, and the sig_nor signal is used as the input sample of the sub-model.
  • the output feature vector of the sub-model is shown in Equation 20, where: Resnet() is the heart rate detection model based on the Resnet architecture, sig_nor is the normalized signal obtained in step (3.2), and feature_resnet is the output feature vector of the sub-model.
  • heart_rate_pre is the predicted heart rate value
  • mean() is the mean value function
  • max_reg() is the heart rate range function corresponding to the maximum probability value.
  • heart_rate_pre = mean(max_reg(res_pro))  (22)
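Formula 22 reads the predicted heart rate off the model's probability output: max_reg() picks the heart-rate interval with the highest probability and mean() returns its midpoint. A sketch with hypothetical 10-bpm bins (the bin edges and probabilities are illustrative):

```python
import numpy as np

def predict_heart_rate(res_pro, bins):
    """Pick the heart-rate interval with the highest predicted probability
    (max_reg) and return its midpoint (mean) - formula 22."""
    i = int(np.argmax(res_pro))   # index of the most probable interval
    lo, hi = bins[i]
    return (lo + hi) / 2.0

# hypothetical 10-bpm bins and a model probability vector res_pro
bins = [(50, 60), (60, 70), (70, 80), (80, 90), (90, 100)]
res_pro = [0.05, 0.15, 0.55, 0.20, 0.05]
hr_pre = predict_heart_rate(res_pro, bins)   # -> 75.0
```

The winning probability (here 0.55) is also what step (4) reuses as the prediction-confidence input to the Kalman fusion.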
  • the Kalman filter model is shown in formulas 23 and 24, where: x_k and z_k are the predicted value and the measured value, respectively; A and B are the state matrix and the control matrix, respectively; H is the conversion matrix from the prediction space to the measurement space; and w_{k-1} and v_k are the prediction error and the measurement error, respectively.
  • x′_k is the fused heart rate result
  • x_k is the predicted heart rate value obtained in step (3)
  • z_k is the heart rate value obtained in step (2)
  • K is the fusion coefficient
  • H indicates the conversion matrix from the prediction space to the measurement space
  • H = 1 in the heart rate measurement task.
  • P_k is the prediction variance, which corresponds to the prediction probability value in step (3).
  • R_k is the measurement variance, which corresponds to the signal-to-noise ratio of the reference signal described in step (2.3).
  • the present invention discloses a video-based accurate detection system for human heart rate and facial blood volume, which is used to implement the method of the present invention, which includes:
  • An image detection module which is used to detect the human face area in the video frame image, extract the face image sequence and the key position points of the face in the time dimension; extract the overall face signal and the face roi signal set based on the face image sequence;
  • Preprocessing module which preprocesses the overall facial signal and facial ROI signal extracted by the image detection module
  • Spectrum-based heart rate calculation module, which, based on the pre-processed facial roi signal set, uses linear weighting to calculate a reference signal, computes the reference-signal spectrum, obtains the heart rate value from the spectral peak, and calculates the facial blood volume distribution from the reference-signal spectrum and the facial roi signal spectra;
  • Multi-modal heart rate detection model which is constructed based on LSTM and residual convolutional neural network model, used to obtain the predicted heart rate value based on the probability of the heart rate distribution;
  • the fusion module obtains the fusion heart rate value detection result according to the heart rate value of the spectrum-based heart rate calculation module and the predicted heart rate value of the multi-modal heart rate detection model.
  • FIG. 2 is the structure diagram of the multi-modal heart rate detection model of the present invention.
  • the left sub-figure (Resnet) is the CNN network part, which is responsible for extracting the spatial structure characteristics of the data
  • the right sub-figure (LSTM) is the LSTM network part, which is responsible for extracting the time-series characteristics of the data; the outputs of the two sub-networks are then fused.
  • Fig. 3 is a schematic diagram of the key areas of the face; the key areas of the face in the present invention refer to the forehead and cheeks in the figure.
  • FIG. 4 Facial blood volume detection result 1: the head is kept in a stable state. The results show that no blood volume is detected in the occluded part of the forehead or in the poorly illuminated part of the cheek, which is in line with the experimental expectations;
  • FIG. 5 Facial blood volume detection result 2: the subject lies down to stabilize the head and the lighting is kept even. The results show that the detected facial blood volume is uniform across the whole face, which meets the expectations of the experiment;


Abstract

A video-based method and system for accurately detecting human heart rate and facial blood volume. First, face detection is performed on video frames containing a human face, and a temporal face-image sequence together with facial key points is extracted, yielding an overall facial signal and a set of facial ROI signals along the time dimension. Second, a heart rate prediction model is built and trained: based on the extracted facial key points, the forehead and cheek regions are located and extracted to obtain a temporal image sequence of the key facial regions, which is compressed into a temporal facial signal used as the input sample of the heart rate prediction model. Third, the facial blood volume distribution is detected from the overall facial signal and the facial ROI signal set. Finally, the heart rate is detected both by the prediction model and by spectral analysis, and the two results are fused, giving the method strong robustness and making it suitable for relatively complex application scenarios.

Description

A video-based method and system for accurately detecting human heart rate and facial blood volume — Technical field
The present invention relates to accurately detecting human heart rate and facial blood volume distribution from facial video captured by a camera, based on image processing, deep learning, and signal processing techniques.
Background
Human heart rate and facial blood volume distribution are important indicators of physiological health. At present, heart rate is mainly measured through electrocardiographic or photoelectric signal detection. These methods share the requirement that the sensing device be in close contact with the skin, detecting heart rate from skin potential changes or blood volume signals; the need for the subject to wear a sensor limits the applicability of such measurements. Remote detection of physiological indicators through a camera has therefore become a current research hotspot. Because the complexity of the external environment easily interferes with remote detection, signal decomposition methods such as wavelet decomposition, independent component analysis (ICA), principal component analysis (PCA), and the Hilbert-Huang transform (HHT) are usually applied, alone or in combination, to remove noise. When external noise is strong, however, signal decomposition alone cannot remove its influence well, for two main reasons: 1) decomposition models are usually general-purpose algorithms that do not incorporate the prior physiological characteristics of human heart rate; 2) the choice among decomposition results relies on subjective judgment, i.e., selecting the component closest to heart rate characteristics, without an objective criterion. To improve the robustness and accuracy of detection, the present invention detects human heart rate and facial blood volume distribution using deep learning, spectral analysis, and related computational methods, and fuses the heart rate detection results with a Kalman filter to achieve accurate heart rate detection.
Summary of the invention
To improve the effectiveness of detecting human heart rate and facial blood volume distribution, the present invention proposes a new detection method and system. Based on facial video captured by a camera, the facial data in the video are analyzed and processed, and human heart rate and facial blood volume distribution are detected through model prediction and signal processing.
The present invention is realized by the following technical solution: a video-based method for accurately detecting human heart rate and facial blood volume, comprising the following steps:
(1) Detect the facial region in the video frames and extract the temporal face-image sequence and facial key points; extract the overall facial signal and the set of facial ROI signals from the face-image sequence; preprocess the signals.
(2) Compute the heart rate and the facial blood volume distribution from the preprocessed facial ROI signal set.
(3) Use a multi-modal heart rate detection model built on LSTM and residual convolutional neural network models to obtain a predicted heart rate based on the heart rate probability distribution.
(4) Fuse the heart rate results of steps (2) and (3) with a Kalman filter to obtain the fused heart rate detection result.
The present invention also discloses a video-based system for accurately detecting human heart rate and facial blood volume, characterized in that it comprises:
an image detection module that detects the facial region in video frames, extracts the temporal face-image sequence and facial key points, and extracts the overall facial signal and the set of facial ROI signals from the face-image sequence;
a preprocessing module that preprocesses the overall facial signal and the facial ROI signals extracted by the image detection module;
a spectrum-based heart rate computation module that computes a reference signal by linear weighting of the preprocessed facial ROI signal set, computes the spectrum of the reference signal, obtains the heart rate from the spectral peak, and computes the facial blood volume distribution from the reference-signal spectrum and the facial ROI signal spectra;
a multi-modal heart rate detection model, built on LSTM and residual convolutional neural network models, that produces a predicted heart rate based on the heart rate probability distribution;
a fusion module that obtains the fused heart rate detection result from the heart rate of the spectrum-based heart rate computation module and the predicted heart rate of the multi-modal heart rate detection model.
Compared with the prior art, the present invention has the following advantages:
1) The fusion approach improves the robustness and accuracy of heart rate measurement. A combined detection scheme improves the anti-interference capability and accuracy of the detection process. The heart rate is first detected from the spectral peak of the signal, combining whole-face detection with face ROI sub-block detection to strengthen this approach; used alone, however, this method is not robust, since facial movement or changes in external illumination strongly affect the result. Therefore, a multi-modal deep learning model is also used to predict the subject's heart rate; this approach is based on statistical learning and estimates the heart rate from the time-frequency characteristics of the signal. On this basis, a Kalman filter combines the two measurements, improving the robustness and accuracy of heart rate detection.
2) A heart-rate-based method for estimating facial blood volume. A method is proposed for estimating the facial blood volume distribution from the heart rate. Comparison of the actual facial blood volume distribution with the experimental results shows that the estimate matches the actual distribution over the face.
3) Fast heart rate detection based on machine learning. A heart rate detection method based on multi-modal deep learning and facial video data is proposed. A deep model combining CNN and LSTM structures analyzes the spatial structure and time-series characteristics of the data, enabling fast heart rate detection. In addition, augmenting the training set with samples containing head shaking and illumination changes improves the model's robustness to interference.
Brief description of the drawings
Fig. 1 is a flowchart of the method of the present invention;
Fig. 2 is a structure diagram of the heart rate detection model;
Fig. 3 is a schematic diagram of the key facial regions;
Fig. 4 shows facial blood volume detection result 1;
Fig. 5 shows facial blood volume detection result 2;
Fig. 6 shows facial blood volume detection result 3.
Detailed description of the embodiments
The present invention is further described in detail below with reference to the drawings and specific embodiments.
Fig. 1 is a schematic flowchart of the present invention. In one specific embodiment, the invention is implemented in the following steps:
(1) Data extraction and preprocessing. Based on a face detection model, extract the temporal face-image sequence and compress it; from this, extract the overall facial signal and the set of facial ROI (region of interest) signals, and preprocess the signals; the preprocessing method is not limited to band-pass filtering.
(1.1) Use a convolutional network model to detect the facial region and facial key points in the video frames, generating the temporal face-image sequence and the sequence of facial key points, as shown in formula 1, where MTCNN() is the convolutional network model, frame_i is the i-th video frame, face_i is the face image extracted from the i-th frame, and critical_pos_i is the set of key points of that face image.
face_i, critical_pos_i = MTCNN(frame_i)    (1)
The face-image sequence is given by formula 2, where face_seq is the face-image sequence, face_i is the face image of the i-th frame, and T is the length of the video frame sequence.
face_seq = {face_1, face_2, ..., face_i, ..., face_T}    (2)
(1.2) From the face-image sequence, extract the overall facial signal and the set of facial ROI (region of interest) signals. The overall facial signal is computed as in formula 3, where face_sig is the compressed signal, PCompress() is the compression function that computes the average pixel intensity of each face image in the sequence, and face_seq is the face-image sequence.
face_sig = PCompress(face_seq)    (3)
To facilitate analysis of the signal distribution, the face image is divided into ROI sub-blocks of size R×R, yielding temporal ROI sub-block image sequences, as in formula 4, where face_roi_i denotes the i-th ROI sub-block image sequence and face_roi_seq is the set of all ROI sub-block image sequences.
face_roi_seq = {face_roi_1, face_roi_2, ..., face_roi_i, ..., face_roi_{m×n}}    (4)
On this basis, each sub-block image sequence is compressed, as in formula 5, where face_roi_seq is the set of all ROI sub-block image sequences, PCompress() is the compression function that computes the temporal average pixel intensity signal of each ROI image sequence in the set, and face_roi_sig is the resulting set of signals, each element being the signal obtained by compressing one ROI sub-block image sequence.
face_roi_sig = PCompress(face_roi_seq)    (5)
where:
face_roi_sig = {face_roi_sig_1, ..., face_roi_sig_i, ..., face_roi_sig_{m×n}}    (6)
In formula 6, face_roi_sig_i is the compressed signal of the i-th ROI sub-block image sequence and m×n is the number of ROI sub-blocks.
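The compression step of formulas 3-6 can be sketched in a few lines of numpy. This is an illustrative reading, not the patent's implementation: `pcompress` and `roi_signals` are hypothetical names, and the sketch assumes the R×R grid tiles the face image exactly.

```python
import numpy as np

def pcompress(frames):
    """Compress an image sequence to a 1-D temporal signal:
    one sample per frame = mean pixel intensity of that frame."""
    return np.array([f.mean() for f in frames])

def roi_signals(face_seq, R):
    """Split each face image into R x R sub-blocks and compress each
    sub-block sequence into its own temporal signal (formulas 4-6)."""
    H, W = face_seq[0].shape[:2]
    m, n = H // R, W // R                      # sub-block grid size
    sigs = np.empty((m * n, len(face_seq)))
    for i in range(m):
        for j in range(n):
            block_seq = [f[i*R:(i+1)*R, j*R:(j+1)*R] for f in face_seq]
            sigs[i * n + j] = pcompress(block_seq)
    return sigs                                # shape: (m*n, T)

# toy example: 10 frames of a 64x64 "face"
rng = np.random.default_rng(0)
face_seq = [rng.random((64, 64)) for _ in range(10)]
face_sig = pcompress(face_seq)                 # overall signal, formula 3
roi_sig = roi_signals(face_seq, R=16)          # 4x4 grid -> 16 ROI signals
```

With a 64×64 face and R = 16 this yields m×n = 16 ROI signals of length T = 10; because the blocks tile the image exactly, the mean of the block signals equals the overall facial signal.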
(1.3) Signal preprocessing. Preprocess the overall facial signal and the facial ROI signal set; the method is not limited to band-pass filtering, as shown in formulas 7 and 8, where face_sig_r and roi_sig_r are the preprocessed signals and sigprocess() is the signal preprocessing function.
face_sig_r = sigprocess(face_sig)    (7)
roi_sig_r = sigprocess(face_roi_sig)    (8)
where:
face_sig_r = {face_sig_r_1, ..., face_sig_r_i, ..., face_sig_r_T}
roi_sig_r = {roi_sig_r_1, ..., roi_sig_r_i, ..., roi_sig_r_{m×n}}
In these formulas, T is the number of video frames and m×n is the number of ROI sub-blocks.
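Step (1.3) leaves the preprocessing method open ("not limited to band-pass filtering"). One minimal realization of sigprocess() is an FFT-mask band-pass that keeps only a physiologically plausible heart rate band; the 0.7-4.0 Hz cut-offs (about 42-240 bpm) are an assumption, not taken from the patent.

```python
import numpy as np

def sigprocess(sig, fs, lo=0.7, hi=4.0):
    """One possible sigprocess(): FFT-mask band-pass that keeps only
    the lo-hi Hz band where human heart rates plausibly lie."""
    spec = np.fft.rfft(sig - sig.mean())
    freqs = np.fft.rfftfreq(len(sig), d=1.0 / fs)
    spec[(freqs < lo) | (freqs > hi)] = 0.0    # zero out-of-band bins
    return np.fft.irfft(spec, n=len(sig))

# toy signal: a 1.2 Hz "pulse" plus slow 0.1 Hz illumination drift, 30 fps
fs = 30.0
t = np.arange(0, 10, 1 / fs)
sig = np.sin(2 * np.pi * 1.2 * t) + 3 * np.sin(2 * np.pi * 0.1 * t)
clean = sigprocess(sig, fs)    # drift removed, pulse component kept
```

The same function applies element-wise to every signal in the ROI set, as formula 8 indicates.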
(2) Compute the heart rate and the facial blood volume distribution. The overall facial signal and the facial ROI signal set computed in step (1) are used to detect the facial blood volume distribution.
(2.1) Compute the reference signal by linear weighting, as in formula 9, where sig_ref is the reference signal and roi_sig_r is the facial ROI signal set.
sig_ref = Σ_{i=1}^{m×n} w_i · roi_sig_r_i    (9)
weight_set = {w_1, w_2, ..., w_i, ..., w_{m×n}}    (10)
where weight_set is the weight set and m×n is the number of ROI sub-blocks.
(2.2) Compute the heart rate from the reference signal. The computation is given in formulas 11 and 12, where sig_ref is the reference signal, sig_ref_sd is the reference-signal spectrum, and heart_rate_ref is the heart rate, corresponding to the spectral peak. The spectrum computation is not limited to the Lomb-Scargle spectral analysis method.
sig_ref_sd = fft(sig_ref)    (11)
heart_rate_ref = max_freq(sig_ref_sd)    (12)
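Formulas 11-12 amount to picking the dominant in-band spectral peak. A minimal numpy sketch, using a plain FFT rather than Lomb-Scargle (which the text allows as one option among others); the in-band restriction to 0.7-4.0 Hz is an assumption:

```python
import numpy as np

def heart_rate_from_spectrum(sig_ref, fs, lo=0.7, hi=4.0):
    """Formulas 11-12 as an FFT peak pick: the heart rate (bpm) is
    60x the in-band frequency with the largest spectral magnitude."""
    spec = np.abs(np.fft.rfft(sig_ref - sig_ref.mean()))
    freqs = np.fft.rfftfreq(len(sig_ref), d=1.0 / fs)
    band = (freqs >= lo) & (freqs <= hi)       # plausible heart rates only
    peak_freq = freqs[band][spec[band].argmax()]
    return 60.0 * peak_freq

# toy reference signal: 72 bpm = 1.2 Hz pulse, 30 fps camera, 20 s clip
fs = 30.0
t = np.arange(0, 20, 1 / fs)
rng = np.random.default_rng(1)
sig_ref = np.sin(2 * np.pi * 1.2 * t) + 0.1 * rng.standard_normal(t.size)
bpm = heart_rate_from_spectrum(sig_ref, fs)
```

A 20 s window gives a frequency resolution of 0.05 Hz, i.e. 3 bpm, which is why longer clips yield finer heart rate estimates.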
(2.3) Compute the facial blood volume distribution, as in formula 13, where sig_ref_sd is the reference-signal spectrum and v is the computed blood volume distribution. The data used for the blood volume computation are not limited to the reference-signal spectrum.
v = Volume(sig_ref_sd)    (13)
Volume() is the blood volume function, whose specific form is given in formula 14:
v_{i,j} = fs_ref ⊗ fs_roi_{i,j},  i = 1, ..., m, j = 1, ..., n    (14)
In formula 14, fs_ref is the reference-signal spectrum, fs_roi is the facial ROI signal spectrum, ⊗ is the convolution operator, and m and n are the maximum numbers of ROI sub-blocks along the horizontal and vertical axes.
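Formula 14 is rendered only as an image in the source, so its exact form is uncertain. One plausible reading is convolving the reference spectrum with each ROI sub-block spectrum and scoring each block by the peak response, so that blocks whose pulsation matches the reference score high. The peak-response scoring and the final normalization below are assumptions for illustration.

```python
import numpy as np

def volume_map(sig_ref, roi_sigs, m, n):
    """A hedged reading of formulas 13-14: for each ROI sub-block,
    convolve its magnitude spectrum with the reference spectrum and
    take the peak response as that block's blood-volume score."""
    fs_ref = np.abs(np.fft.rfft(sig_ref - sig_ref.mean()))
    v = np.empty((m, n))
    for i in range(m):
        for j in range(n):
            s = np.asarray(roi_sigs[i * n + j])
            fs_roi = np.abs(np.fft.rfft(s - s.mean()))
            v[i, j] = np.convolve(fs_ref, fs_roi).max()
    return v / v.max()                          # normalize to [0, 1]

# toy check: an ROI carrying the pulse should score higher than a noise ROI
fs = 30.0
t = np.arange(0, 10, 1 / fs)
sig_ref = np.sin(2 * np.pi * 1.2 * t)
roi_sigs = [sig_ref.copy(),
            0.05 * np.random.default_rng(2).standard_normal(t.size)]
v = volume_map(sig_ref, roi_sigs, m=1, n=2)
```

Under this reading, occluded or poorly lit blocks (whose spectra do not resonate with the reference) receive low scores, matching the behavior described for Figs. 4-6.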
(3) Build the heart rate detection model with deep learning. Using the facial key points extracted in step (1.1), take the image sequences containing the forehead and cheeks as training samples, and build a multi-modal heart rate detection model based on LSTM and a residual convolutional neural network (Resnet).
(3.1) Training sample extraction. From the facial key points extracted in step (1.1), form the temporal key point sequence, as in formula 15, where critical_pos_i is the set of facial key points of the i-th video frame and img_i is the i-th video frame.
face_i, critical_pos_i = MTCNN(img_i)    (15)
The set form of critical_pos_i is given in formula 16, where k is the number of facial key points.
critical_pos_i = {pos_1, pos_2, ..., pos_k}    (16)
Based on the facial key points, select the temporal image sequences of the forehead and the left and right cheek regions, and compress the selected images in the spatial dimension to build the training samples, as in formula 17, where sig_c_i is the compressed result of the i-th frame, img_c_i is the i-th frame of the key-region image sequence, and PCompress() is the compression function.
sig_c_i = PCompress(img_c_i)    (17)
where:
sig_c = {sig_c_1, sig_c_2, ..., sig_c_i, ..., sig_c_T}
sig_c is the set of signals obtained by compressing the image sequence and T is the video length.
(3.2) Initialize the training sample data, as in formula 18, where sig_nor is the normalized signal, mean() is the mean function, and var() is the variance function; the sample data initialization is not limited to this method.
sig_nor = (sig_c - mean(sig_c)) / var(sig_c)    (18)
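A literal sketch of formula 18. Note that the patent text names var() rather than the standard deviation, so this divides by the variance as written (classic z-scoring would divide by std instead):

```python
import numpy as np

def normalize(sig_c):
    """Formula 18 as written: center by the mean, scale by the
    variance (the patent names var(), not the standard deviation)."""
    return (sig_c - sig_c.mean()) / sig_c.var()

sig_c = np.array([2.0, 4.0, 6.0, 8.0])   # mean 5.0, variance 5.0
sig_nor = normalize(sig_c)               # zero-mean, variance-scaled
```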
(3.3) Build the heart rate detection sub-model on the LSTM (long short-term memory) architecture. This sub-model mainly contains two network structures, a 1D-CNN (1-D convolutional neural network) and an LSTM. First, the sig_nor signal from step (3.2) is used as the training sample, and preliminary features are extracted from it with the 1D-CNN model; on this basis, an LSTM structure extracts the time-series features of the signal; finally, an attention mechanism fuses the feature vectors output at each stage of the LSTM, as in formula 19, where LSTM() is the heart rate detection model based on the LSTM architecture, sig_nor is the normalized signal from step (3.2), and feature_lstm is the output feature vector of this sub-model.
feature_lstm = LSTM(sig_nor)    (19)
(3.4) Build the heart rate detection sub-model on the Resnet architecture. This sub-model mainly extracts the time-domain waveform features of the signal with a residual network (Resnet), taking the sig_nor signal as input; its output feature vector is given in formula 20, where Resnet() is the heart rate detection model based on the Resnet architecture, sig_nor is the normalized signal from step (3.2), and feature_resnet is the output feature vector of this sub-model.
feature_resnet = Resnet(sig_nor)    (20)
(3.5) Fuse the sub-models of steps (3.3) and (3.4) to build the multi-modal heart rate detection model. Concatenate the output features of the two sub-models and predict the heart rate with a fully connected network (FCN). The basic prediction process is given in formula 21, where res_pro is the model's prediction vector, FCN() is the fully connected layer, and Concat() is the vector concatenation function.
res_pro = FCN(Concat(feature_lstm, feature_resnet))    (21)
The heart rate is then predicted from this vector; the basic extraction process is given in formula 22, where heart_rate_pre is the predicted heart rate, mean() is the mean function, and max_reg() returns the heart rate range corresponding to the maximum probability.
heart_rate_pre = mean(max_reg(res_pro))    (22)
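Formula 22 reads the heart rate off the model's probability vector: find the bin with the largest probability (max_reg) and return the midpoint of its range (mean). A sketch with hypothetical bin boundaries, since the patent does not specify the binning:

```python
import numpy as np

def heart_rate_from_probs(res_pro, bins):
    """Formula 22: take the heart-rate bin with the largest predicted
    probability and return the mean of its range."""
    lo, hi = bins[int(np.argmax(res_pro))]
    return (lo + hi) / 2.0

# illustrative 5-bin setup (the bin ranges are an assumption)
bins = [(40, 60), (60, 80), (80, 100), (100, 120), (120, 140)]
res_pro = np.array([0.05, 0.6, 0.25, 0.07, 0.03])  # softmax output
bpm = heart_rate_from_probs(res_pro, bins)          # midpoint of (60, 80)
```

The peak probability itself (here 0.6) is what step (4) reuses as a confidence proxy when setting the prediction variance for the Kalman fusion.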
(4) Fuse the heart rate detection results with a Kalman filter. Using the heart rates computed in steps (2) and (3), the signal quality evaluation and the deep learning model's estimate serve as the state variables that dynamically adjust the Kalman filter, which dynamically fuses the results of the two measurement approaches, yielding the best estimate of the heart rate and improving the robustness of heart rate detection.
The Kalman filter model is given in formulas 23 and 24, where x_k and z_k are the predicted and measured values, A and B are the state and control matrices, H is the transformation matrix from the prediction space to the measurement space, and w_{k-1} and v_k are the prediction and measurement errors.
x_k = A·x_{k-1} + B·u_k + w_{k-1}    (23)
z_k = H·x_k + v_k    (24)
The heart rates from the two measurement approaches are fused by formulas 25 and 26, where x'_k is the fused heart rate, x_k is the predicted heart rate from step (3), z_k is the heart rate from step (2), K is the fusion coefficient, and H is the transformation matrix from the prediction space to the measurement space, with H = 1 for heart rate measurement. P_k is the prediction variance, corresponding to the prediction probability in step (3). R_k is the measurement variance, corresponding to the signal-to-noise ratio of the reference signal described in step (2.3).
x'_k = x_k + K(z_k - H·x_k)    (25)
K = P_k·H^T / (H·P_k·H^T + R_k)    (26)
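With H = 1, the fusion in formulas 25-26 reduces to a scalar Kalman update: the gain K weights the spectral measurement by the relative variances and corrects the model's prediction. A minimal sketch (variable names are illustrative):

```python
def fuse_heart_rate(x_pred, z_meas, p_pred, r_meas, h=1.0):
    """Formulas 25-26 with scalar H: compute the Kalman gain from the
    prediction variance p_pred and measurement variance r_meas, then
    correct the predicted heart rate toward the measurement."""
    k = p_pred * h / (h * p_pred * h + r_meas)   # formula 26
    return x_pred + k * (z_meas - h * x_pred)    # formula 25

# model predicts 74 bpm (variance 4); spectral method measures 70 bpm
# (variance 1): the lower-variance measurement dominates the estimate
fused = fuse_heart_rate(74.0, 70.0, p_pred=4.0, r_meas=1.0)  # 70.8
```

When the face is steady and well lit, R_k is small and the spectral value dominates; under motion or lighting changes, R_k grows and the fused estimate leans on the model prediction, which is the robustness mechanism the summary claims.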
The present invention discloses a video-based system for accurately detecting human heart rate and facial blood volume, used to implement the method of the present invention, comprising:
an image detection module that detects the facial region in video frames, extracts the temporal face-image sequence and facial key points, and extracts the overall facial signal and the set of facial ROI signals from the face-image sequence;
a preprocessing module that preprocesses the overall facial signal and the facial ROI signals extracted by the image detection module;
a spectrum-based heart rate computation module that computes a reference signal by linear weighting of the preprocessed facial ROI signal set, computes the spectrum of the reference signal, obtains the heart rate from the spectral peak, and computes the facial blood volume distribution from the reference-signal spectrum and the facial ROI signal spectra;
a multi-modal heart rate detection model, built on LSTM and residual convolutional neural network models, that produces a predicted heart rate based on the heart rate probability distribution;
a fusion module that obtains the fused heart rate detection result from the heart rate of the spectrum-based heart rate computation module and the predicted heart rate of the multi-modal heart rate detection model.
Fig. 2 is the structure diagram of the multi-modal heart rate detection model of the present invention. The left sub-graph (Resnet) is the CNN network part, responsible for detecting the spatial structure features of the data; the right sub-graph (LSTM) is the LSTM network part, responsible for detecting the time-series features of the data. The output features of the two sub-networks are combined and the heart rate is predicted with a softmax layer.
Fig. 3 is a schematic diagram of the key facial regions; in the present invention these refer to the forehead and cheek areas shown in the figure.
Fig. 4 shows facial blood volume detection result 1: the head is held steady; the results show that no blood volume is detected in the occluded part of the forehead or in the poorly lit part of the cheek, which matches the experimental expectation.
Fig. 5 shows facial blood volume detection result 2: the head is stabilized in a lying position under even illumination; the results show a uniform blood volume over the whole face, which matches the experimental expectation.
Fig. 6 shows facial blood volume detection result 3: the head is deliberately shaken slightly; compared with the result in Fig. 5, the facial blood volume result contains noise, which matches the experimental expectation.
The above are only preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art may make further improvements and modifications without departing from the technical principles of the present invention, and such improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims (5)

  1. A video-based method for accurately detecting human heart rate and facial blood volume, characterized in that the method comprises the following steps:
    (1) detecting the facial region in video frames and extracting the temporal face-image sequence and facial key points; extracting the overall facial signal and the set of facial ROI signals from the face-image sequence; preprocessing the signals;
    wherein step (1) specifically comprises:
    (1.1) detecting the facial region and facial key points in the video frames with a convolutional network model, generating the temporal face-image sequence and the sequence of facial key points;
    (1.2) extracting the overall facial signal and the set of facial ROI signals from the face-image sequence; the overall facial signal is computed as in formula 3, where face_sig is the compressed signal, PCompress() is the compression function that computes the average pixel intensity of each face image in the sequence, and face_seq is the face-image sequence;
    face_sig = PCompress(face_seq)    (3)
    dividing the face image into ROI sub-blocks of size R×R to obtain temporal ROI sub-block image sequences, as in formula 4, where face_roi_i denotes the i-th ROI sub-block image sequence, face_roi_seq is the set of all ROI sub-block image sequences, and m×n is the number of ROI sub-blocks;
    face_roi_seq = {face_roi_1, face_roi_2, ..., face_roi_i, ..., face_roi_{m×n}}    (4)
    compressing each ROI sub-block image sequence, as in formula 5, where face_roi_seq is the set of all ROI sub-block image sequences, PCompress() is the compression function that computes the temporal average pixel intensity signal of each ROI sub-block image sequence in the set, and face_roi_sig is the resulting set of signals, i.e. the facial ROI sub-block signal set, each element being the signal obtained by compressing one ROI sub-block image sequence;
    face_roi_sig = PCompress(face_roi_seq)    (5)
    where:
    face_roi_sig = {face_roi_sig_1, ..., face_roi_sig_i, ..., face_roi_sig_{m×n}}    (6)
    in formula 6, face_roi_sig_i is the compressed signal of the i-th ROI sub-block image sequence and m×n is the number of ROI sub-blocks;
    (1.3) preprocessing the overall facial signal and the facial ROI signal set to remove noise outside a specified frequency range;
    (2) computing the heart rate and the facial blood volume distribution from the preprocessed facial ROI signal set;
    (3) using a multi-modal heart rate detection model built on LSTM and residual convolutional neural network models to obtain a predicted heart rate based on the heart rate probability distribution;
    (4) fusing the heart rate results of steps (2) and (3) with a Kalman filter to obtain the fused heart rate detection result.
  2. The video-based method for accurately detecting human heart rate and facial blood volume according to claim 1, characterized in that step (2) specifically comprises:
    (2.1) computing the reference signal by linear weighting, as in formula 9, where sig_ref is the reference signal, roi_sig_r is the preprocessed facial ROI signal set, and m×n is the number of ROI sub-blocks;
    sig_ref = Σ_{i=1}^{m×n} w_i · roi_sig_r_i    (9)
    weight_set = {w_1, w_2, ..., w_i, ..., w_{m×n}}
    roi_sig_r = sigprocess(face_roi_sig)    (8)
    where weight_set is the computed weight set and sigprocess() is the signal preprocessing function;
    (2.2) computing the spectrum of the reference signal with the Lomb-Scargle spectral analysis method and obtaining the heart rate from it, the heart rate corresponding to the spectral peak;
    (2.3) computing the facial blood volume distribution.
  3. The video-based method for accurately detecting human heart rate and facial blood volume according to claim 2, characterized in that step (2.3) specifically is:
    as shown in formula 13, sig_ref_sd is the reference-signal spectrum and v is the computed blood volume distribution;
    v = Volume(sig_ref_sd)    (13)
    where Volume() is the blood volume function, whose specific form is given in formula 14;
    v_{i,j} = fs_ref ⊗ fs_roi_{i,j},  i = 1, ..., m, j = 1, ..., n    (14)
    in formula 14, fs_ref is the reference-signal spectrum, fs_roi is the facial ROI signal spectrum, ⊗ is the convolution operator, and m and n are the maximum numbers of ROI sub-blocks along the horizontal and vertical axes.
  4. The video-based method for accurately detecting human heart rate and facial blood volume according to claim 1, characterized in that in step (3), the multi-modal heart rate detection model built on LSTM and residual convolutional neural network models is trained as follows:
    (3.1) training sample extraction:
    from the facial key points extracted in step (1), forming the temporal key point sequence; based on the facial key points, selecting the temporal image sequences of the forehead and the left and right cheek regions, and compressing the selected images in the spatial dimension to build the training samples; the training samples are the set of signals obtained by compressing the image sequences;
    (3.2) initializing the training sample data to obtain the normalized signal sig_nor;
    (3.3) building the heart rate detection sub-model on the LSTM architecture:
    this sub-model contains two network structures, a 1D-CNN and an LSTM; first, the sig_nor signal from step (3.2) is used as the training sample and preliminary features are extracted from it with the 1D-CNN model; on this basis, an LSTM structure extracts the time-series features of the signal; finally, an attention mechanism fuses the feature vectors output at each stage of the LSTM;
    (3.4) building the heart rate detection sub-model on the Resnet architecture:
    this sub-model extracts the time-domain waveform features of the signal with a residual network, taking the sig_nor signal as input and outputting the feature vector feature_resnet;
    (3.5) fusing the sub-models of steps (3.3) and (3.4) to build the multi-modal heart rate detection model:
    concatenating the output features of the sub-models of steps (3.3) and (3.4) and predicting the heart rate with a fully connected network;
    the basic prediction process is given in formula 21, where res_pro is the model's prediction vector, FCN() is the fully connected layer, and Concat() is the vector concatenation function;
    res_pro = FCN(Concat(feature_lstm, feature_resnet))    (21)
    the heart rate is then predicted, as in formula 22, where heart_rate_pre is the predicted heart rate, mean() is the mean function, and max_reg() returns the heart rate range corresponding to the maximum probability;
    heart_rate_pre = mean(max_reg(res_pro))    (22).
  5. The video-based method for accurately detecting human heart rate and facial blood volume according to claim 1, characterized in that step (4) specifically is:
    fusing the heart rates obtained by the two measurement approaches according to formulas 25 and 26,
    x'_k = x_k + K(z_k - H·x_k)    (25)
    K = P_k·H^T / (H·P_k·H^T + R_k)    (26)
    where x'_k is the fused heart rate, x_k is the predicted heart rate from step (3), z_k is the heart rate from step (2), K is the fusion coefficient, P_k is the prediction variance, R_k is the measurement variance, and H and H^T are the relation matrix between the predicted and true values and its transpose.
PCT/CN2021/080905 2020-05-25 2021-03-16 一种基于视频的人体心率及面部血容积精确检测方法和系统 WO2021238344A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/696,909 US20220218218A1 (en) 2020-05-25 2022-03-17 Video-based method and system for accurately estimating human body heart rate and facial blood volume distribution

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010448368.X 2020-05-25
CN202010448368.XA CN111626182B (zh) 2020-05-25 2020-05-25 一种基于视频的人体心率及面部血容积精确检测方法和系统

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/696,909 Continuation US20220218218A1 (en) 2020-05-25 2022-03-17 Video-based method and system for accurately estimating human body heart rate and facial blood volume distribution

Publications (1)

Publication Number Publication Date
WO2021238344A1 true WO2021238344A1 (zh) 2021-12-02

Family

ID=72257949

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/080905 WO2021238344A1 (zh) 2020-05-25 2021-03-16 一种基于视频的人体心率及面部血容积精确检测方法和系统

Country Status (3)

Country Link
US (1) US20220218218A1 (zh)
CN (1) CN111626182B (zh)
WO (1) WO2021238344A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111626182B (zh) * 2020-05-25 2021-03-26 浙江大学 一种基于视频的人体心率及面部血容积精确检测方法和系统
CN112237421B (zh) * 2020-09-23 2023-03-07 浙江大学山东工业技术研究院 一种基于视频的动态心率变异性分析模型
CN112381011B (zh) * 2020-11-18 2023-08-22 中国科学院自动化研究所 基于人脸图像的非接触式心率测量方法、系统及装置
CN113892930B (zh) * 2021-12-10 2022-04-22 之江实验室 一种基于多尺度心率信号的面部心率测量方法和装置
CN113963427B (zh) * 2021-12-22 2022-07-26 浙江工商大学 一种快速活体检测的方法与系统

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170367590A1 (en) * 2016-06-24 2017-12-28 Universita' degli Studi di Trento (University of Trento) Self-adaptive matrix completion for heart rate estimation from face videos under realistic conditions
CN107692997A (zh) * 2017-11-08 2018-02-16 清华大学 心率检测方法及装置
CN109602412A (zh) * 2018-12-05 2019-04-12 中国科学技术大学 利用面部视频实现心率检测的方法
CN109700450A (zh) * 2018-12-28 2019-05-03 联想(北京)有限公司 一种心率检测方法及电子设备
CN110321781A (zh) * 2019-05-06 2019-10-11 苏宁金融服务(上海)有限公司 一种用于无接触式测量的信号处理方法及装置
CN111626182A (zh) * 2020-05-25 2020-09-04 浙江大学 一种基于视频的人体心率及面部血容积精确检测方法和系统

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7756577B1 (en) * 2006-04-21 2010-07-13 Pacesetter, Inc. Multi-modal medical therapy system
WO2015084376A1 (en) * 2013-12-05 2015-06-11 Apple Inc. Wearable multi-modal physiological sensing sysem
CN106845395A (zh) * 2017-01-19 2017-06-13 北京飞搜科技有限公司 一种基于人脸识别进行活体检测的方法
CN109460737A (zh) * 2018-11-13 2019-03-12 四川大学 一种基于增强式残差神经网络的多模态语音情感识别方法
CN110458101B (zh) * 2019-08-12 2022-09-16 南京邮电大学 基于视频与设备结合的服刑人员体征监测方法及设备


Also Published As

Publication number Publication date
CN111626182A (zh) 2020-09-04
US20220218218A1 (en) 2022-07-14
CN111626182B (zh) 2021-03-26

Similar Documents

Publication Publication Date Title
WO2021238344A1 (zh) 一种基于视频的人体心率及面部血容积精确检测方法和系统
Hsu et al. Deep learning with time-frequency representation for pulse estimation from facial videos
CN105636505B (zh) 用于获得对象的生命体征的设备和方法
EP2960862B1 (en) A method for stabilizing vital sign measurements using parametric facial appearance models via remote sensors
CN109993068B (zh) 一种基于心率和面部特征的非接触式的人类情感识别方法
CN109086675B (zh) 一种基于光场成像技术的人脸识别及攻击检测方法及其装置
CN105147274A (zh) 一种从可见光谱段人脸视频信号中提取心率的方法
KR20080051956A (ko) 실시간 동영상의 실루엣 기반 대상체 행동 분석 시스템 및방법
JP2017093760A (ja) 心拍に連動する周期的変動の計測装置及び計測方法
CN108596087A (zh) 一种基于双网络结果的驾驶疲劳程度检测回归模型
CN109508648A (zh) 一种人脸抓拍方法及设备
Przybyło A deep learning approach for remote heart rate estimation
Szankin et al. Long distance vital signs monitoring with person identification for smart home solutions
KR102150635B1 (ko) 비전 기반 심박수 측정 방법
Wei et al. Remote photoplethysmography and heart rate estimation by dynamic region of interest tracking
CN114557685B (zh) 一种非接触式运动鲁棒心率测量方法及测量装置
CN110718301A (zh) 基于动态脑功能网络的阿尔茨海默病辅助诊断装置及方法
CN113456042A (zh) 一种基于3d cnn的无接触面部血压测量方法
CN112716468A (zh) 基于三维卷积网络的非接触心率测量方法及装置
CN110135357B (zh) 一种基于远程遥感的幸福感实时检测方法
Karmuse et al. A robust rPPG approach for continuous heart rate measurement based on face
Slapnicar et al. Contact-free monitoring of physiological parameters in people with profound intellectual and multiple disabilities
CN113693573B (zh) 一种基于视频的非接触式多生理参数监测的系统及方法
CN116350198A (zh) 一种远距离移动目标心率实时监测方法
Wang et al. Recognition oriented iris image quality assessment in the feature space

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21813422

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21813422

Country of ref document: EP

Kind code of ref document: A1