CN115624322B - Non-contact physiological signal detection method and system based on efficient space-time modeling - Google Patents
- Publication number
- CN115624322B CN202211451949.4A
- Authority
- CN
- China
- Prior art keywords
- neural network
- loss function
- image sequence
- space
- loss
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/02—Detecting, measuring or recording pulse, heart rate, blood pressure or blood flow; Combined pulse/heart-rate/blood pressure determination; Evaluating a cardiovascular condition not otherwise provided for, e.g. using combinations of techniques provided for in this group with electrocardiography or electroauscultation; Heart catheters for measuring blood pressure
- A61B5/024—Detecting, measuring or recording pulse rate or heart rate
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/08—Detecting, measuring or recording devices for evaluating the respiratory organs
- A61B5/0816—Measuring devices for examining respiratory frequency
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/0002—Inspection of images, e.g. flaw detection
- G06T7/0012—Biomedical image inspection
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10016—Video; Image sequence
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20092—Interactive image processing based on input by user
- G06T2207/20104—Interactive definition of region of interest [ROI]
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention provides a non-contact physiological signal detection method and system based on efficient space-time modeling, and relates to the technical field of intelligent data processing. The method comprises: obtaining an original video stream and preprocessing it to obtain a preprocessed image sequence; inputting the image sequence into a deep neural network based on a three-dimensional central difference convolution operator, and extracting space-time information in combination with the attention mask mechanism of the convolution layers; constructing a multi-task loss function based on the space-time information and the epsilon-insensitive Huber loss to optimize the deep neural network; and filtering the output of the optimized deep neural network with a second-order Butterworth filter to output the heart rate and the respiratory rate. The three-dimensional central difference convolution operator extracts pulse wave information by aggregating temporal difference information. Cross-database evaluation and ablation studies demonstrate the effectiveness and robustness of the method.
Description
Technical Field
The invention relates to the technical field of intelligent data processing, in particular to a non-contact physiological signal detection method and system based on efficient space-time modeling.
Background
With the continuous development of modern society and the steady improvement of living standards, the incidence of cardiovascular disease has also kept rising, probably because of increased work pressure and a faster pace of life. Monitoring human physiological indicators is important for assessing health. Early detection and treatment can effectively prevent and control cardiovascular disease and reduce sudden death caused by cardiovascular problems. Traditional heart rate measurement methods, such as electrocardiography, are contact measurements and have the following limitations. First, they are unsuitable for certain application scenarios, in particular those requiring long-term monitoring, such as neonates, patients with large-area burns, sleep monitoring and driver monitoring; contact measurement also requires the active cooperation of the subject. Second, when the contact position between the measuring instrument and the skin shifts, the measurement result is prone to large deviations. Third, although electrocardiographs measure accurately, they are relatively expensive, require specialized operation, and are not suited to routine physiological signal measurement. Following the initial success of prototype non-contact remote photoplethysmography methods, classical signal processing has demonstrated the feasibility of heart rate measurement based on remote photoplethysmography. However, these methods tend to degrade under interference such as motion, illumination changes and skin tone differences. In practice, methods based on signal separation have been found to address only specific types of interference and cannot effectively handle the coexistence of multiple interferences in real scenes.
For practical applications, recent research has begun to focus on deep-learning-based approaches because of their stronger representation ability. Several deep-learning-based methods have been successfully applied to remote photoplethysmography recovery with the pulse or respiration as the target signal, but these methods still have difficulty modeling spatiotemporal information efficiently. Some methods model spatiotemporal information by generating feature maps, but these require preprocessing steps including face detection, facial keypoint localization, face alignment, skin segmentation and color space transformation, which makes them quite complex. Furthermore, performance in cross-database evaluation and in practical applications may degrade because of the limitations of supervised learning. The learned spatiotemporal features remain susceptible to lighting conditions and motion, and they do not take full advantage of the wider temporal context.
Disclosure of Invention
To address the problems in the prior art that learned space-time features are still easily affected by illumination conditions and motion and cannot fully exploit wide temporal context information, the invention provides a non-contact physiological signal detection method and system based on efficient space-time modeling.
In order to solve the technical problems, the invention provides the following technical scheme:
In one aspect, a non-contact physiological signal detection method based on efficient space-time modeling is provided. The method is applied to an electronic device and includes the following steps:
S1: obtaining an original video stream and preprocessing it to obtain a preprocessed image sequence;
S2: inputting the image sequence into a deep neural network based on a three-dimensional central difference convolution operator, and extracting space-time information in combination with the attention mask mechanism of the convolution layers;
S3: constructing a multi-task loss function based on the space-time information and the epsilon-insensitive Huber loss to optimize the deep neural network;
S4: filtering the output of the optimized deep neural network with a second-order Butterworth filter and outputting the heart rate and the respiratory rate simultaneously, thereby completing non-contact physiological signal detection based on efficient space-time modeling.
Optionally, in step S1, obtaining an original video stream and preprocessing it to obtain a preprocessed image sequence includes:
applying time-domain normalized differencing and downsampling, respectively, to the original video stream to obtain the preprocessed image sequence;
wherein the time-domain normalized difference is calculated according to the following formula (1):
$$D_l(t) = \frac{C_l(t+\Delta t) - C_l(t)}{C_l(t+\Delta t) + C_l(t)} \tag{1}$$
where $C_l(t)$ denotes the RGB values of the $l$-th skin pixel at time $t$, and $\Delta t$ is the time increment.
Optionally, the image sequence comprises: a time domain normalized difference image sequence and a downsampled image sequence.
Optionally, in step S2, inputting the image sequence into a deep neural network based on a three-dimensional central difference convolution operator and extracting space-time information in combination with the attention mask mechanism of the convolution layers includes:
S21: taking the time-domain normalized difference image sequence as the motion branch and inputting it into the deep neural network based on the three-dimensional central difference convolution operator;
taking the downsampled image sequence as the appearance branch and inputting it into the deep neural network based on the three-dimensional central difference convolution operator;
S22: through the attention mask mechanism, modeling the skin region of interest from the appearance branch to assist the motion branch in extracting space-time information;
S23: repeating steps S21 to S22 to extract the space-time information, and passing it to the fully connected layer.
Optionally, step S21 includes:
obtaining the three-dimensional central difference convolution operator according to the following formula (2):
$$y(p_0) = \sum_{p_n \in \mathcal{C}} w(p_n)\, x(p_0 + p_n) + \theta \left( -x(p_0) \sum_{p_n \in \mathcal{R}'} w(p_n) \right) \tag{2}$$
where $y$ is the output feature map, $x$ is the input feature map, $\mathcal{C}$ denotes the local receptive field cube, $w$ is the learnable weight, $p_0$ denotes the current position on the feature map, $p_n$ enumerates the positions in the receptive field $\mathcal{C}$ and in the adjacent time steps $\mathcal{R}'$, and the hyperparameter $\theta$ balances the contributions of spatial intensity and gradient.
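As an illustration of how a temporal central difference convolution can be computed, the following NumPy sketch evaluates the operator on a single-channel clip. It assumes the commonly used form in which an ordinary 3D convolution over the receptive field cube (the intensity term) is combined with a gradient term that subtracts the center value multiplied by the sum of the kernel weights lying on the two adjacent time steps, scaled by a hyperparameter `theta`. The function name `cdc_t_3d`, the 3×3×3 kernel size, the absence of padding and the choice of `theta` are assumptions made for this example and are not taken from the patent.

```python
import numpy as np


def cdc_t_3d(x, w, theta):
    """Temporal central difference convolution on one channel (illustrative).

    x: array of shape (T, H, W), the input feature map.
    w: array of shape (3, 3, 3), the kernel ordered as (time, height, width).
    theta: balance between the intensity term and the gradient term.
    Returns the "valid" (unpadded) output of shape (T-2, H-2, W-2).
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    if w.shape != (3, 3, 3):
        raise ValueError("this sketch assumes a 3x3x3 kernel")
    t_out, h_out, w_out = (n - 2 for n in x.shape)

    # Intensity term: plain 3D cross-correlation over the receptive field cube.
    vanilla = np.zeros((t_out, h_out, w_out))
    for dt in range(3):
        for dy in range(3):
            for dx in range(3):
                patch = x[dt:dt + t_out, dy:dy + h_out, dx:dx + w_out]
                vanilla += w[dt, dy, dx] * patch

    # Gradient term: center value times the summed weights of the two
    # adjacent time steps (first and last temporal slices of the kernel).
    adjacent_weight_sum = w[0].sum() + w[2].sum()
    center = x[1:1 + t_out, 1:1 + h_out, 1:1 + w_out]
    return vanilla - theta * center * adjacent_weight_sum
```

With `theta = 0` the sketch reduces to ordinary 3D convolution. With `theta = 1` and a kernel whose middle time slice is zero, a clip that is constant in space and time produces zero output, so only differences relative to the center value survive.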
Optionally, step S22 includes:
obtaining the attention mask according to the following formula (3):
$$\mathbf{M}^j = \frac{H_j W_j \cdot \sigma\left(w^j x_a^j + b^j\right)}{2 \left\| \sigma\left(w^j x_a^j + b^j\right) \right\|_1}, \qquad x_m^j \odot \mathbf{M}^j \tag{3}$$
where $x_a^j$ is the feature map of the $j$-th convolution layer of the appearance branch; $x_m^j$ is the feature map of the $j$-th convolution layer of the motion branch; $H_j$ and $W_j$ are the height and width of the feature map of the $j$-th convolution layer; $\sigma(\cdot)$ denotes the sigmoid function; $w^j$ is the convolution kernel weight; $b^j$ is the convolution kernel bias; $\|\cdot\|_1$ is the L1 norm; and $\odot$ denotes the element-wise product.
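The attention mask can be sketched in NumPy as follows. The sketch assumes the usual soft-attention form: a 1×1 convolution over the channels of the appearance feature map, a sigmoid, and a normalization by the L1 norm scaled by half the number of spatial positions, after which the mask multiplies the motion feature map element by element. The function names, the single-time-step `(C, H, W)` layout and the scalar bias are assumptions for illustration only.

```python
import numpy as np


def soft_attention_mask(x_a, w, b):
    """Build a spatial mask from appearance features (illustrative).

    x_a: array of shape (C, H, W), appearance-branch feature map.
    w: array of shape (C,), weights of a 1x1 convolution over channels.
    b: scalar bias of that convolution.
    Returns a mask of shape (H, W) whose entries sum to H * W / 2.
    """
    x_a = np.asarray(x_a, dtype=float)
    # 1x1 convolution: weighted sum over the channel axis at each pixel.
    z = np.tensordot(w, x_a, axes=([0], [0])) + b
    s = 1.0 / (1.0 + np.exp(-z))  # sigmoid
    height, width = s.shape
    # L1 normalization keeps the overall scale of the masked features stable.
    return height * width * s / (2.0 * np.abs(s).sum())


def apply_mask(x_m, mask):
    """Multiply every channel of the motion feature map (C, H, W) by the mask."""
    return np.asarray(x_m, dtype=float) * mask[np.newaxis, :, :]
```

Because of the normalization, the mask entries always sum to half the number of spatial positions, so a spatially uniform appearance map yields a constant mask of 0.5 rather than suppressing or amplifying the motion features.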
Optionally, in step S3, constructing a multi-task loss function based on the space-time information and the epsilon-insensitive Huber loss to optimize the deep neural network includes:
calculating the intensity loss of the pulse wave and the respiratory wave with the epsilon-insensitive Huber loss function of equation (4),
where $y$ denotes the true value, $f(x)$ denotes the predicted value obtained by mapping the input $x$ through the function $f$, $\delta$ is the hyperparameter of the Huber loss, with a default value of 1, and $\varepsilon$ is the hyperparameter of the epsilon-insensitive loss, with a default value of 0.1;
optimizing the weights of the deep neural network by back-propagating the loss function value, and stopping the optimization once the loss function no longer decreases, that is, selecting the deep neural network with the lowest loss function value during training.
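To make the roles of the two hyperparameters concrete, here is a minimal NumPy sketch of one possible epsilon-insensitive Huber loss. The exact piecewise form is an assumption: the absolute error is first reduced by `eps` and clipped at zero (the insensitive zone), and the result is then passed through the standard Huber function with threshold `delta`. The function names, the averaging over samples and the task weights `w_pulse` and `w_resp` are likewise hypothetical; only the default values 1 and 0.1 come from the text.

```python
import numpy as np


def eps_insensitive_huber(y_true, y_pred, delta=1.0, eps=0.1):
    """Mean epsilon-insensitive Huber loss under the assumed form."""
    residual = np.abs(np.asarray(y_true, float) - np.asarray(y_pred, float))
    # Errors smaller than eps fall in the insensitive zone and cost nothing.
    r = np.maximum(residual - eps, 0.0)
    quadratic = 0.5 * r ** 2              # L2-like region for small errors
    linear = delta * (r - 0.5 * delta)    # L1-like region for large errors
    return float(np.mean(np.where(r <= delta, quadratic, linear)))


def multitask_loss(pulse_true, pulse_pred, resp_true, resp_pred,
                   w_pulse=1.0, w_resp=1.0):
    """Weighted sum of the pulse-wave and respiratory-wave losses."""
    return (w_pulse * eps_insensitive_huber(pulse_true, pulse_pred)
            + w_resp * eps_insensitive_huber(resp_true, resp_pred))
```

Under this assumed form, an error of 0.05 costs nothing, an error of 0.6 falls in the quadratic region (loss 0.125), and an error of 2.1 falls in the linear region (loss 1.5), which is how the loss combines L2-like behavior near the target with L1-like robustness to outliers.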
Optionally, in step S4, filtering the output of the optimized deep neural network with a second-order Butterworth filter and outputting the heart rate and the respiratory rate simultaneously, thereby completing non-contact physiological signal detection based on efficient space-time modeling, includes:
filtering the output of the deep neural network with a second-order Butterworth filter, and outputting the heart rate and the respiratory rate simultaneously;
wherein the cut-off frequencies for the heart rate are 0.75 Hz and 2.5 Hz, and the cut-off frequencies for the respiratory rate are 0.08 Hz and 0.5 Hz;
the position of the highest peak in the power spectrum of each filtered signal is selected as the heart rate output and the respiratory rate output, respectively, completing non-contact physiological signal detection based on efficient space-time modeling.
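The filtering and peak-picking step can be sketched with SciPy as follows. The cut-off frequencies are those stated above; everything else is an assumption made for the example: the function names, the zero-phase application through `filtfilt`, the use of a plain periodogram as the power spectrum, and the restriction of the peak search to the pass band. Note that in SciPy a band-pass design with `N = 2` yields a filter of overall order 4, so whether this matches the patent's "second-order" filter exactly is not certain.

```python
import numpy as np
from scipy.signal import butter, filtfilt, periodogram


def dominant_rate_bpm(signal, fs, low_hz, high_hz, order=2):
    """Band-pass a waveform and return its dominant frequency in cycles per minute."""
    b, a = butter(order, [low_hz, high_hz], btype="bandpass", fs=fs)
    filtered = filtfilt(b, a, np.asarray(signal, dtype=float))
    freqs, power = periodogram(filtered, fs=fs)
    # Search for the highest spectral peak inside the pass band only.
    in_band = (freqs >= low_hz) & (freqs <= high_hz)
    peak_hz = freqs[in_band][np.argmax(power[in_band])]
    return 60.0 * peak_hz


def heart_and_respiration_rate(pulse_wave, resp_wave, fs):
    """Estimate heart rate and respiratory rate from the two network outputs."""
    heart_rate = dominant_rate_bpm(pulse_wave, fs, 0.75, 2.5)
    respiratory_rate = dominant_rate_bpm(resp_wave, fs, 0.08, 0.5)
    return heart_rate, respiratory_rate
```

The frequency resolution of the periodogram is the reciprocal of the window duration, so a 40-second window at 30 frames per second resolves rates in steps of 1.5 cycles per minute; longer windows or zero-padding would give a finer grid.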
In one aspect, a non-contact physiological signal detection system based on efficient space-time modeling is provided. The system is applied to an electronic device and comprises:
a data acquisition module, configured to obtain an original video stream and preprocess it to obtain a preprocessed image sequence;
a space-time information extraction module, configured to input the image sequence into a deep neural network based on a three-dimensional central difference convolution operator, and to extract space-time information in combination with the attention mask mechanism of the convolution layers;
a model optimization module, configured to construct a multi-task loss function based on the space-time information and the epsilon-insensitive Huber loss to optimize the deep neural network;
a data output module, configured to filter the output of the optimized deep neural network with a second-order Butterworth filter and output the heart rate and the respiratory rate simultaneously, thereby completing non-contact physiological signal detection based on efficient space-time modeling.
Optionally, the data acquisition module is further configured to apply time-domain normalized differencing and downsampling, respectively, to the original video stream to obtain the preprocessed image sequence;
wherein the time-domain normalized difference is calculated according to the following formula (1):
$$D_l(t) = \frac{C_l(t+\Delta t) - C_l(t)}{C_l(t+\Delta t) + C_l(t)} \tag{1}$$
where $C_l(t)$ denotes the RGB values of the $l$-th skin pixel at time $t$, and $\Delta t$ is the time increment.
In one aspect, an electronic device is provided that includes a processor and a memory having at least one instruction stored therein that is loaded and executed by the processor to implement a non-contact physiological signal detection method based on efficient spatiotemporal modeling as described above.
In one aspect, a computer-readable storage medium having stored therein at least one instruction that is loaded and executed by a processor to implement a non-contact physiological signal detection method based on efficient spatiotemporal modeling as described above is provided.
The technical scheme provided by the embodiment of the invention has at least the following beneficial effects:
In the above scheme, an accurate non-contact physiological signal measurement method based on a three-dimensional central difference convolution attention network is provided for efficient space-time modeling; the three-dimensional central difference convolution operator it uses extracts pulse wave information by aggregating temporal difference information.
The epsilon-insensitive Huber loss is proposed as the loss function of the non-contact pulse wave measurement network because it focuses on the pulse wave intensity constraint; evaluating different loss functions and their combinations shows that the epsilon-insensitive Huber loss performs better.
A network for joint multi-task measurement of cardiac and respiratory motion is further provided. By sharing information between related physiological signals, it measures the heart rate and the respiratory rate simultaneously, which further improves accuracy and reduces computational cost. Extensive experiments show that the proposed method achieves excellent performance on public databases, and cross-database evaluation and ablation studies demonstrate its effectiveness and robustness.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a non-contact physiological signal detection method based on efficient space-time modeling provided by an embodiment of the invention;
FIG. 2 is a flow chart of a non-contact physiological signal detection method based on efficient spatiotemporal modeling provided by an embodiment of the invention;
FIG. 3 is a block diagram of a non-contact physiological signal detection system based on efficient spatiotemporal modeling provided by an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
To make the technical problems to be solved, the technical solutions and the advantages of the invention clearer, a detailed description is given below with reference to the accompanying drawings and specific embodiments.
The embodiment of the invention provides a non-contact physiological signal detection method based on efficient space-time modeling. The method can be implemented by an electronic device, which may be a terminal or a server. As shown in the flow chart of FIG. 1, the processing flow of the method may include the following steps:
S101: obtaining an original video stream and preprocessing it to obtain a preprocessed image sequence;
S102: inputting the image sequence into a deep neural network based on a three-dimensional central difference convolution operator, and extracting space-time information in combination with the attention mask mechanism of the convolution layers;
S103: constructing a multi-task loss function based on the space-time information and the epsilon-insensitive Huber loss to optimize the deep neural network;
S104: filtering the output of the optimized deep neural network with a second-order Butterworth filter and outputting the heart rate and the respiratory rate simultaneously, thereby completing non-contact physiological signal detection based on efficient space-time modeling.
Optionally, in step S101, obtaining an original video stream and preprocessing it to obtain a preprocessed image sequence includes:
applying time-domain normalized differencing and downsampling, respectively, to the original video stream to obtain the preprocessed image sequence;
wherein the time-domain normalized difference is calculated according to the following formula (1):
$$D_l(t) = \frac{C_l(t+\Delta t) - C_l(t)}{C_l(t+\Delta t) + C_l(t)} \tag{1}$$
where $C_l(t)$ denotes the RGB values of the $l$-th skin pixel at time $t$, and $\Delta t$ is the time increment.
Optionally, the image sequence comprises: a time domain normalized difference image sequence and a downsampled image sequence.
Optionally, in step S102, inputting the image sequence into a deep neural network based on a three-dimensional central difference convolution operator and extracting space-time information in combination with the attention mask mechanism of the convolution layers includes:
S121: taking the time-domain normalized difference image sequence as the motion branch and inputting it into the deep neural network based on the three-dimensional central difference convolution operator;
taking the downsampled image sequence as the appearance branch and inputting it into the deep neural network based on the three-dimensional central difference convolution operator;
S122: through the attention mask mechanism, modeling the skin region of interest from the appearance branch to assist the motion branch in extracting space-time information;
S123: repeating steps S121 to S122 to extract the space-time information, and passing it to the fully connected layer.
Optionally, step S121 includes:
obtaining the three-dimensional central difference convolution operator according to the following formula (2):
$$y(p_0) = \sum_{p_n \in \mathcal{C}} w(p_n)\, x(p_0 + p_n) + \theta \left( -x(p_0) \sum_{p_n \in \mathcal{R}'} w(p_n) \right) \tag{2}$$
where $y$ is the output feature map, $x$ is the input feature map, $\mathcal{C}$ denotes the local receptive field cube, $w$ is the learnable weight, $p_0$ denotes the current position on the feature map, $p_n$ enumerates the positions in the receptive field $\mathcal{C}$ and in the adjacent time steps $\mathcal{R}'$, and the hyperparameter $\theta$ balances the contributions of spatial intensity and gradient.
Optionally, step S122 includes:
obtaining the attention mask according to the following formula (3):
$$\mathbf{M}^j = \frac{H_j W_j \cdot \sigma\left(w^j x_a^j + b^j\right)}{2 \left\| \sigma\left(w^j x_a^j + b^j\right) \right\|_1}, \qquad x_m^j \odot \mathbf{M}^j \tag{3}$$
where $x_a^j$ is the feature map of the $j$-th convolution layer of the appearance branch; $x_m^j$ is the feature map of the $j$-th convolution layer of the motion branch; $H_j$ and $W_j$ are the height and width of the feature map of the $j$-th convolution layer; $\sigma(\cdot)$ denotes the sigmoid function; $w^j$ is the convolution kernel weight; $b^j$ is the convolution kernel bias; $\|\cdot\|_1$ is the L1 norm; and $\odot$ denotes the element-wise product.
Optionally, in step S103, constructing a multi-task loss function based on the space-time information and the epsilon-insensitive Huber loss to optimize the deep neural network includes:
calculating the intensity loss of the pulse wave and the respiratory wave with the epsilon-insensitive Huber loss function of equation (4),
where $y$ denotes the true value, $f(x)$ denotes the predicted value obtained by mapping the input $x$ through the function $f$, $\delta$ is the hyperparameter of the Huber loss, with a default value of 1, and $\varepsilon$ is the hyperparameter of the epsilon-insensitive loss, with a default value of 0.1;
optimizing the weights of the deep neural network by back-propagating the loss function value, stopping the optimization once the loss function no longer decreases, and selecting the deep neural network with the lowest loss function value during training.
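The stopping rule described above (stop when the loss no longer decreases, keep the network with the lowest loss) can be sketched in plain Python. The `patience` parameter, which defines how many epochs without improvement count as "no longer decreasing", is an assumption of this example, as is the function name; the text itself does not state such a threshold.

```python
def select_best_epoch(losses, patience=5):
    """Return (best_epoch, best_loss, stop_epoch) for a sequence of epoch losses.

    Training is considered finished at the first epoch that is `patience`
    epochs past the best one, or at the last epoch if that never happens.
    """
    if not losses:
        raise ValueError("losses must contain at least one value")
    best_epoch, best_loss = -1, float("inf")
    stop_epoch = len(losses) - 1
    for epoch, loss in enumerate(losses):
        if loss < best_loss:
            # New lowest loss: this is the checkpoint that would be kept.
            best_epoch, best_loss = epoch, loss
        elif epoch - best_epoch >= patience:
            stop_epoch = epoch
            break
    return best_epoch, best_loss, stop_epoch
```

For example, with `patience=3` and per-epoch losses `[1.0, 0.8, 0.6, 0.7, 0.65, 0.9, 0.5]`, the sketch stops at epoch 5 and keeps the weights from epoch 2. The later value 0.5 is never reached, which shows the trade-off that a small patience introduces.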
Optionally, in step S104, filtering the output of the optimized deep neural network with a second-order Butterworth filter and outputting the heart rate and the respiratory rate simultaneously, thereby completing non-contact physiological signal detection based on efficient space-time modeling, includes:
filtering the output of the deep neural network with a second-order Butterworth filter, and outputting the heart rate and the respiratory rate simultaneously;
wherein the cut-off frequencies for the heart rate are 0.75 Hz and 2.5 Hz, and the cut-off frequencies for the respiratory rate are 0.08 Hz and 0.5 Hz;
the position of the highest peak in the power spectrum of each filtered signal is selected as the heart rate output and the respiratory rate output, respectively, completing non-contact physiological signal detection based on efficient space-time modeling.
The embodiment of the invention provides a remote photoplethysmography recovery method for non-contact physiological signal measurement based on efficient space-time modeling. Efficient space-time modeling is achieved by combining the three-dimensional central difference convolution operator, a dual-branch motion and appearance structure, and a soft attention mask. The three-dimensional central difference convolution operator is well suited to describing the intrinsic pattern of the pulse wave because it combines gradient and intensity information. A deep neural network based on this operator can provide more reliable spatio-temporal modeling than conventional three-dimensional convolution.
In addition, this patent introduces, for the first time, the epsilon-insensitive Huber loss function into the remote photoplethysmography task. The loss combines the advantages of the L1 and L2 losses with epsilon-insensitivity, so that it ignores noisy samples that fall inside the insensitive zone, which increases robustness and yields better performance.
The embodiment of the invention provides a non-contact physiological signal detection method based on efficient space-time modeling, which can be realized by electronic equipment, wherein the electronic equipment can be a terminal or a server. The flow chart of the non-contact physiological signal detection method based on the efficient space-time modeling as shown in fig. 2, the processing flow of the method can comprise the following steps:
S201: acquiring an original video stream, preprocessing the original video stream, and obtaining a preprocessed image sequence;
in a possible implementation, time-domain normalized difference calculation and downsampling are applied to the original video stream to obtain the preprocessed image sequence, and the resolution after processing is 36×36×3;
wherein the time-domain normalized difference is calculated according to the following formula (1):
where $C_k(t)$ denotes the RGB values of the $k$-th skin pixel at time $t$, and $\Delta t$ is the time increment.
In a possible embodiment, the image sequence comprises: a time domain normalized difference image sequence and a downsampled image sequence. In this embodiment the image sequences are 10 frames each.
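The preprocessing of S201 can be sketched as follows. This is a minimal sketch under stated assumptions, not the patented implementation: it assumes the normalized difference takes the common form (C(t+Δt) − C(t)) / (C(t+Δt) + C(t)) with Δt equal to one frame, and that downsampling is done by block averaging. The function names, the small constant `eps`, and the 72×72 synthetic input are hypothetical.

```python
import numpy as np

def downsample(frames, size=36):
    """Block-average (T, H, W, C) frames to size x size (assumed downsampling method)."""
    t, h, w, c = frames.shape
    fh, fw = h // size, w // size
    # Group pixels into (fh x fw) blocks and average each block.
    blocks = frames[:, :size * fh, :size * fw].reshape(t, size, fh, size, fw, c)
    return blocks.mean(axis=(2, 4))

def normalized_difference(frames, eps=1e-7):
    """Assumed form of formula (1): (C(t+1) - C(t)) / (C(t+1) + C(t))."""
    nxt, cur = frames[1:], frames[:-1]
    return (nxt - cur) / (nxt + cur + eps)  # eps (hypothetical) avoids division by zero

def preprocess(video, size=36):
    """Return (motion_input, appearance_input) for the two branches."""
    small = downsample(video.astype(np.float64), size)
    motion = normalized_difference(small)   # T-1 difference frames
    appearance = small[:-1]                 # downsampled frames aligned with them
    return motion, appearance

rng = np.random.default_rng(0)
video = rng.uniform(1.0, 255.0, size=(11, 72, 72, 3))  # 11 synthetic RGB frames
motion, appearance = preprocess(video)
print(motion.shape, appearance.shape)  # (10, 36, 36, 3) (10, 36, 36, 3)
```

Eleven input frames give ten difference frames, which matches the ten-frame sequences of this embodiment.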
S202: taking the time-domain normalized difference image sequence as the motion branch and inputting it into a deep neural network based on the three-dimensional center differential convolution operator;
taking the downsampled image sequence as the appearance branch and inputting it into a deep neural network based on the three-dimensional center differential convolution operator;
S203: through the attention mask mechanism, the skin region of interest modeled by the appearance branch assists the motion branch in extracting space-time information;
S204: repeating steps S202 to S203 to extract the space-time information, and passing the space-time information to the fully connected layer.
In a possible embodiment, the three-dimensional central differential convolution operator is obtained according to the following formula (2):
where $x$ is the input feature map, $\mathcal{C}$ denotes the local receptive field cube, $w$ is the learnable weight, $p_0$ denotes the current position on the feature map, $p_n$ enumerates the positions in the receptive field $\mathcal{C}$ and in the adjacent time steps $\mathcal{R}''$, and the hyperparameter $\theta$ balances spatial intensity and gradient information.
In one possible implementation of this embodiment, the convolution kernel size is (3 × 3, 32) and the number of convolution layers is 2.
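To illustrate how a center differential convolution combines intensity and gradient information, the following single-channel NumPy sketch may help. It assumes a temporal variant of the operator of the form y(p0) = Σ w(pn)·x(p0+pn) − θ·x(p0)·Σ' w(pn), where the first sum runs over the whole receptive field cube and the second sum Σ' runs only over the kernel positions in the adjacent time steps. That form, the function names, and the value θ = 0.6 are assumptions for illustration and are not confirmed by the text above.

```python
import numpy as np

def conv3d_valid(x, w):
    """Plain 3D cross-correlation of a (T, H, W) volume with a (kt, kh, kw) kernel."""
    kt, kh, kw = w.shape
    # windows has shape (T-kt+1, H-kh+1, W-kw+1, kt, kh, kw)
    windows = np.lib.stride_tricks.sliding_window_view(x, (kt, kh, kw))
    return np.einsum("thwijk,ijk->thw", windows, w)

def cdc3d_t(x, w, theta):
    """Vanilla 3D convolution plus a theta-weighted temporal center difference term.

    Assumes odd kernel sizes so that each receptive field has a center.
    """
    vanilla = conv3d_valid(x, w)
    kt, kh, kw = w.shape
    # Sum of the weights lying on the adjacent time steps (first and last slices).
    w_adjacent = w[0].sum() + w[-1].sum()
    # Value at the center of each receptive field, x(p0).
    center = x[kt // 2 : x.shape[0] - kt // 2,
               kh // 2 : x.shape[1] - kh // 2,
               kw // 2 : x.shape[2] - kw // 2]
    return vanilla - theta * center * w_adjacent

rng = np.random.default_rng(0)
x = rng.normal(size=(10, 36, 36))   # one channel of a 10-frame, 36x36 clip
w = rng.normal(size=(3, 3, 3))      # one 3x3x3 kernel
y = cdc3d_t(x, w, theta=0.6)        # theta value is hypothetical
print(y.shape)  # (8, 34, 34)
```

With θ = 0 the sketch reduces to ordinary three-dimensional convolution, which is one way to see that the operator adds no parameters beyond the kernel weights.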
In a possible embodiment, the attention mask mechanism is computed according to the following formula (3):
where $X_a^k$ is the feature map of the $k$-th convolution layer of the appearance branch; $X_m^k$ is the feature map of the $k$-th convolution layer of the motion branch; $H_k$ and $W_k$ are the height and width of the feature map of the $k$-th convolution layer; $\sigma(\cdot)$ denotes the sigmoid function; $w^k$ is the convolution kernel weight; $b^k$ is the convolution kernel bias; $\|\cdot\|_1$ is the L1 norm; and $\odot$ denotes the element-wise product.
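A soft attention mask of this kind can be sketched in NumPy as below. The sketch assumes the mask is the sigmoid of a 1×1 convolution of the appearance features, normalized by its L1 norm and scaled by H·W/2, and that it gates the motion features by element-wise multiplication. This normalization, the function names, and the tensor shapes are assumptions chosen to agree with the quantities listed above (height, width, sigmoid, kernel weight and bias, L1 norm, element-wise product).

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def soft_attention_mask(x_a, w, b):
    """x_a: (C, H, W) appearance features; w: (C,) 1x1 kernel; b: scalar bias."""
    _, h, wd = x_a.shape
    # A 1x1 convolution is a weighted sum over channels at every pixel.
    s = sigmoid(np.tensordot(w, x_a, axes=1) + b)   # (H, W), values in (0, 1)
    # L1-normalize so that the mask sums to H*W/2.
    return h * wd * s / (2.0 * np.abs(s).sum())

def apply_mask(x_m, mask):
    """Element-wise product, broadcasting the (H, W) mask over the channels of x_m."""
    return x_m * mask[None, :, :]

rng = np.random.default_rng(0)
x_a = rng.normal(size=(32, 36, 36))   # appearance-branch features (hypothetical shape)
x_m = rng.normal(size=(32, 36, 36))   # motion-branch features (hypothetical shape)
mask = soft_attention_mask(x_a, rng.normal(size=32), 0.1)
gated = apply_mask(x_m, mask)
print(gated.shape, mask.sum())
```

Because of the L1 normalization, the total mask weight stays fixed regardless of how sharply the sigmoid saturates, so the mask redistributes emphasis across pixels rather than rescaling the motion features as a whole.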
In a possible embodiment, average pooling with a kernel size of (2 × 2, 32) is used for the pooling layer, with a Dropout probability of 0.5.
In a possible implementation, the fully connected layer has input and output dimensions of 128 and 10, with a Dropout probability of 0.5.
S205: constructing a multi-task loss function based on the space-time information and the ε-insensitive Huber loss function to optimize the deep neural network;
in a possible embodiment, the intensity losses of the pulse wave and the respiratory wave are calculated with the ε-insensitive Huber loss function according to the following equation (4):
where $y$ denotes the true value, $f(x)$ denotes the predicted value obtained by mapping the input $x$ through the function $f$, $\delta$ is the hyperparameter of the Huber loss with a default value of 1, and $\varepsilon$ is the hyperparameter of the ε-insensitive loss with a default value of 0.1;
the multi-task loss function $L_{Total}$ is constructed by combining these losses according to the following equation (5):
the weights of the deep neural network are optimized by back-propagating the loss function value, optimization stops once the loss function no longer decreases, and the deep neural network with the lowest loss function value during training is selected.
In one possible embodiment, the deep neural network model with the lowest loss function value is used as the final model.
In one example, α = β = 1. The neural network weights are optimized by back-propagating the loss function value, optimization stops once the loss function no longer decreases, and the model that best recovers the pulse wave and the respiratory wave is taken as the final model;
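The loss construction of S205 can be sketched as follows. This is one common way to combine ε-insensitivity with the Huber loss and is an assumption here: the residual is first shrunk by ε, r = max(|y − f(x)| − ε, 0), and the Huber loss with threshold δ is then applied to r. The weighted sum α·L_pulse + β·L_resp used for the multi-task loss is likewise an assumed form, chosen to agree with the weights α and β and the defaults δ = 1 and ε = 0.1 stated above. The function names are hypothetical.

```python
import numpy as np

def eps_insensitive_huber(y_true, y_pred, delta=1.0, eps=0.1):
    """Mean Huber loss applied to the epsilon-shrunk absolute residual (assumed form)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Residuals inside the insensitive zone contribute nothing.
    r = np.maximum(np.abs(y_true - y_pred) - eps, 0.0)
    quadratic = 0.5 * r ** 2               # L2-like region, r <= delta
    linear = delta * (r - 0.5 * delta)     # L1-like region, r > delta
    return np.where(r <= delta, quadratic, linear).mean()

def multitask_loss(pulse_true, pulse_pred, resp_true, resp_pred, alpha=1.0, beta=1.0):
    """Assumed form of the multi-task loss: alpha * L_pulse + beta * L_resp."""
    return (alpha * eps_insensitive_huber(pulse_true, pulse_pred)
            + beta * eps_insensitive_huber(resp_true, resp_pred))

pulse_true, pulse_pred = [0.0, 1.0, 0.0], [0.05, 0.4, 2.1]
resp_true, resp_pred = [0.0, 0.5], [0.02, 0.9]
print(multitask_loss(pulse_true, pulse_pred, resp_true, resp_pred))
```

Small errors inside the ε zone produce zero loss and zero gradient, moderate errors are penalized quadratically, and large errors only linearly, which is the mechanism behind the robustness to noisy samples described in this embodiment.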
S206: filtering the output of the optimized deep neural network with a second-order Butterworth filter and outputting the heart rate and the respiratory rate simultaneously, thereby completing non-contact physiological signal detection based on efficient space-time modeling.
In a possible implementation, a second-order Butterworth filter is used to further filter the pulse and respiration signals output by the network, wherein the cut-off frequencies for the heart rate are 0.75 Hz and 2.5 Hz, and the cut-off frequencies for the respiratory rate are 0.08 Hz and 0.5 Hz;
the position of the highest peak in the power spectrum of each filtered signal is selected as the heart rate and respiratory rate output, completing non-contact physiological signal detection based on efficient space-time modeling.
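The filtering and peak-picking of S206 can be sketched with SciPy as below. The sketch assumes a 30 Hz sampling rate, zero-phase filtering with `filtfilt`, and a periodogram as the power spectrum estimate; none of these choices is stated in the text. It also assumes that `butter(2, ..., btype="bandpass")` corresponds to the "second-order" filter described above. The function name and the synthetic test signal are hypothetical.

```python
import numpy as np
from scipy.signal import butter, filtfilt, periodogram

def estimate_rate_per_minute(signal, fs, low_hz, high_hz):
    """Band-pass filter the signal and return its dominant frequency in cycles per minute."""
    b, a = butter(2, [low_hz, high_hz], btype="bandpass", fs=fs)
    filtered = filtfilt(b, a, signal)              # zero-phase filtering (assumption)
    freqs, power = periodogram(filtered, fs=fs)    # power spectrum estimate (assumption)
    in_band = (freqs >= low_hz) & (freqs <= high_hz)
    peak_hz = freqs[in_band][np.argmax(power[in_band])]
    return 60.0 * peak_hz

fs = 30.0                                  # assumed camera frame rate in Hz
t = np.arange(1800) / fs                   # 60 s of synthetic signal
# Synthetic stand-in for the network output: 1.2 Hz pulse plus 0.25 Hz respiration.
wave = np.sin(2 * np.pi * 1.2 * t) + 0.5 * np.sin(2 * np.pi * 0.25 * t)

hr = estimate_rate_per_minute(wave, fs, 0.75, 2.5)   # heart-rate band
rr = estimate_rate_per_minute(wave, fs, 0.08, 0.5)   # respiratory band
print(hr, rr)
```

The two pass bands correspond to 45 to 150 beats per minute and roughly 5 to 30 breaths per minute, so restricting the peak search to each band keeps the two estimates from interfering with each other.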
In a possible embodiment, the present invention differs from previous methods in two main respects. The first is the spatio-temporal network. Conventional approaches employ three-dimensional convolution and temporal shift convolution to reduce the computational budget, but without any gain in precision. The three-dimensional center differential convolution operator used here can replace conventional convolution operations without additional parameters. Results improve because the enhanced spatio-temporal context modeling capability helps represent appearance and motion information. The center difference can also be regarded as a regularization term that alleviates overfitting. The second is the network architecture. Dual-GAN, for example, is a design based on a generative adversarial network architecture for signal decoupling, and it outperforms the method of the present invention on certain metrics (e.g., root mean square error on the UBFC dataset). However, Dual-GAN includes a preprocessing step called spatio-temporal map generation, which comprises face detection, facial keypoint localization, face alignment, skin segmentation, and color space transformation, and is relatively complex. The method of the invention, by contrast, requires only a simple subtraction between frames as the input to the motion branch.
Furthermore, the ε-insensitive Huber loss is adopted as the loss function. As the loss between the predicted signal and the real signal approaches its minimum, the gradient of the Huber loss decreases gradually, so the model is more robust in signal prediction. The multi-task network provided by the invention can model the internal correlation between tasks, has an accuracy advantage over the single-task version, and saves computing resources. Overall, the remote photoplethysmography measurement network based on efficient space-time modeling captures rich temporal context by aggregating the temporal difference information relevant to remote photoplethysmography, yielding more robust and accurate non-contact physiological signal measurements.
In the embodiment of the invention, an accurate non-contact physiological signal measurement method based on a three-dimensional center differential convolution attention network is provided. The method performs efficient space-time modeling and uses the three-dimensional center differential convolution operator to extract pulse wave information by aggregating temporal difference information.
The ε-insensitive Huber loss is proposed as the loss function of the non-contact pulse wave measurement network because it focuses on the pulse wave intensity constraint; evaluation of different loss functions and their combinations shows that the ε-insensitive Huber loss function performs better.
A joint multi-task network for measuring heart rate and respiratory motion is further provided. It shares information between related physiological signals, so that heart rate and respiratory rate can be measured simultaneously, which further improves accuracy and reduces computational cost. Extensive experiments show that the proposed method achieves excellent performance on public databases, and cross-database evaluation and ablation studies demonstrate its effectiveness and robustness.
FIG. 3 is a block diagram illustrating a non-contact physiological signal detection system based on efficient spatiotemporal modeling, according to an exemplary embodiment. Referring to fig. 3, the system 300 includes:
the data acquisition module 310 is configured to acquire an original video stream, perform preprocessing on the original video stream, and acquire a preprocessed image sequence;
the spatio-temporal information extraction module 320 is configured to acquire the image sequence, input the image sequence into a deep neural network based on the three-dimensional center differential convolution operator, and extract space-time information in combination with the attention mask mechanism of the convolution layers;
the model optimization module 330 is configured to construct a multi-task loss function based on the space-time information and the ε-insensitive Huber loss function to optimize the deep neural network;
the data output module 340 is configured to filter the output of the optimized deep neural network with a second-order Butterworth filter and output the heart rate and the respiratory rate simultaneously, so as to complete non-contact physiological signal detection based on efficient space-time modeling.
Optionally, the data acquisition module 310 is further configured to apply time-domain normalized difference calculation and downsampling to the original video stream, respectively, to obtain the preprocessed image sequence;
wherein the time-domain normalized difference is calculated according to the following formula (1):
where $C_k(t)$ denotes the RGB values of the $k$-th skin pixel at time $t$, and $\Delta t$ is the time increment.
Optionally, the image sequence comprises: a time domain normalized difference image sequence and a downsampled image sequence.
Optionally, the spatio-temporal information extraction module 320 is further configured to input the time domain normalized difference image sequence as a motion branch into a deep neural network based on a three-dimensional center differential convolution operator;
taking the downsampled image sequence as the appearance branch and inputting it into a deep neural network based on the three-dimensional center differential convolution operator;
through the attention mask mechanism, the skin region of interest modeled by the appearance branch assists the motion branch in extracting space-time information;
and repeatedly extracting the space-time information and passing it to the fully connected layer.
Optionally, the spatio-temporal information extraction module 320 is further configured to obtain the three-dimensional center differential convolution operator according to the following formula (2):
where $x$ is the input feature map, $\mathcal{C}$ denotes the local receptive field cube, $w$ is the learnable weight, $p_0$ denotes the current position on the feature map, $p_n$ enumerates the positions in the receptive field $\mathcal{C}$ and in the adjacent time steps $\mathcal{R}''$, and the hyperparameter $\theta$ balances spatial intensity and gradient information.
Optionally, the spatio-temporal information extraction module 320 is further configured to compute the attention mask mechanism according to the following formula (3):
where $X_a^k$ is the feature map of the $k$-th convolution layer of the appearance branch; $X_m^k$ is the feature map of the $k$-th convolution layer of the motion branch; $H_k$ and $W_k$ are the height and width of the feature map of the $k$-th convolution layer; $\sigma(\cdot)$ denotes the sigmoid function; $w^k$ is the convolution kernel weight; $b^k$ is the convolution kernel bias; $\|\cdot\|_1$ is the L1 norm; and $\odot$ denotes the element-wise product.
Optionally, the model optimization module 330 is further configured to calculate the intensity losses of the pulse wave and the respiratory wave with the ε-insensitive Huber loss function according to the following equation (4):
where $y$ denotes the true value, $f(x)$ denotes the predicted value obtained by mapping the input $x$ through the function $f$, $\delta$ is the hyperparameter of the Huber loss with a default value of 1, and $\varepsilon$ is the hyperparameter of the ε-insensitive loss with a default value of 0.1;
constructing the multi-task loss function $L_{Total}$ by combining these losses according to the following equation (5):
and optimizing the weights of the deep neural network by back-propagating the loss function value, stopping optimization once the loss function no longer decreases, and selecting the deep neural network with the lowest loss function value during training.
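The stopping and selection rule described here can be sketched in plain Python. The text does not say how "no longer decreases" is detected, so the `patience` parameter (the number of epochs without improvement that is tolerated) and the function name are hypothetical.

```python
def select_best_epoch(losses, patience=3):
    """Return (best_epoch, stop_epoch) for a sequence of per-epoch loss values."""
    best_epoch, best_loss = 0, float("inf")
    for epoch, loss in enumerate(losses):
        if loss < best_loss:
            best_epoch, best_loss = epoch, loss    # new lowest loss: keep this model
        elif epoch - best_epoch >= patience:
            return best_epoch, epoch               # loss has stopped decreasing
    return best_epoch, len(losses) - 1

losses = [0.90, 0.55, 0.40, 0.42, 0.41, 0.43, 0.39]
print(select_best_epoch(losses))  # (2, 5)
```

In this example training stops at epoch 5 and the weights saved at epoch 2 are kept, so the later value 0.39 is never reached; a larger `patience` trades extra training time for a lower chance of stopping too early.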
Optionally, the data output module 340 is configured to apply a second-order Butterworth filter to the output of the deep neural network and output the heart rate and the respiratory rate simultaneously;
wherein the cut-off frequencies for the heart rate are 0.75 Hz and 2.5 Hz, and the cut-off frequencies for the respiratory rate are 0.08 Hz and 0.5 Hz;
the position of the highest peak value in the power spectrum obtained by filtering the signals is selected as heart rate and respiratory rate output, and non-contact physiological signal detection based on efficient space-time modeling is completed.
In the embodiment of the invention, a remote photoplethysmography (rPPG) pulse wave recovery method for non-contact physiological signal measurement based on efficient space-time modeling is provided. Efficient space-time modeling is achieved by combining a three-dimensional center differential convolution operator, a dual-branch motion and appearance structure, and a soft attention mask. The three-dimensional center differential convolution operator describes the intrinsic pattern of the pulse wave well by combining gradient and intensity information. A deep neural network based on this operator can provide more reliable space-time modeling capability than conventional three-dimensional convolution. In addition, this patent is the first to introduce the ε-insensitive Huber loss function into the remote photoplethysmography task; combined with ε-insensitivity, the loss function ignores noisy samples that fall within the insensitive zone, which increases robustness and yields better performance.
Fig. 4 is a schematic structural diagram of an electronic device 400 according to an embodiment of the present invention. The electronic device 400 may vary considerably depending on its configuration or performance, and may include one or more processors (central processing units, CPU) 401 and one or more memories 402, where at least one instruction is stored in the memory 402 and is loaded and executed by the processor 401 to implement the following steps of the non-contact physiological signal detection method based on efficient space-time modeling:
S1: acquiring an original video stream, preprocessing the original video stream, and obtaining a preprocessed image sequence;
S2: acquiring the image sequence, inputting the image sequence into a deep neural network based on the three-dimensional center differential convolution operator, and extracting space-time information in combination with the attention mask mechanism of the convolution layers;
S3: constructing a multi-task loss function based on the space-time information and the ε-insensitive Huber loss function to optimize the deep neural network;
S4: filtering the output of the optimized deep neural network with a second-order Butterworth filter and outputting the heart rate and the respiratory rate simultaneously, thereby completing non-contact physiological signal detection based on efficient space-time modeling.
In an exemplary embodiment, a computer-readable storage medium is also provided, for example a memory comprising instructions that are executable by a processor in a terminal to perform the above non-contact physiological signal detection method based on efficient space-time modeling. For example, the computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, etc.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and scope of the invention are intended to be included within the scope of the invention.
Claims (2)
1. The non-contact physiological signal detection method based on the efficient space-time modeling is characterized by comprising the following steps of:
S1: acquiring an original video stream, preprocessing the original video stream, and obtaining a preprocessed image sequence;
S2: acquiring the image sequence, inputting the image sequence into a deep neural network based on the three-dimensional center differential convolution operator, and extracting space-time information in combination with the attention mask mechanism of the convolution layers;
S3: constructing a multi-task loss function based on the space-time information and the ε-insensitive Huber loss function to optimize the deep neural network;
in the step S3, constructing a multi-task loss function to optimize the deep neural network based on the spatio-temporal information and the epsilon-insensitive Huber loss loss function includes:
calculating the intensity losses of the pulse wave and the respiratory wave with the ε-insensitive Huber loss function according to the following equation (4):
where $y$ denotes the true value, $f(x)$ denotes the predicted value obtained by mapping the input $x$ through the function $f$, $\delta$ is the hyperparameter of the Huber loss with a default value of 1, and $\varepsilon$ is the hyperparameter of the ε-insensitive loss with a default value of 0.1;
constructing the multi-task loss function $L_{Total}$ by combining these losses according to the following equation (5):
optimizing the weight of the deep neural network through the back propagation loss function value, stopping optimizing after the loss function is not reduced any more, and selecting a model of the deep neural network with the lowest loss function value in the training process;
S4: filtering the output of the optimized deep neural network with a second-order Butterworth filter and outputting the heart rate and the respiratory rate simultaneously, thereby completing non-contact physiological signal detection based on efficient space-time modeling.
2. A non-contact physiological signal detection system based on efficient spatiotemporal modeling, characterized in that the system is adapted for use in the method of claim 1 above, the system comprising:
the data acquisition module is used for acquiring an original video stream, preprocessing the original video stream and acquiring a preprocessed image sequence;
the space-time information extraction module is used for acquiring the image sequence, inputting the image sequence into a deep neural network based on the three-dimensional center differential convolution operator, and extracting space-time information in combination with the attention mask mechanism of the convolution layers;
the model optimization module is used for constructing a multi-task loss function based on the space-time information and the ε-insensitive Huber loss function to optimize the deep neural network;
and the data output module is used for filtering the output of the optimized deep neural network with a second-order Butterworth filter and outputting the heart rate and the respiratory rate simultaneously, thereby completing non-contact physiological signal detection based on efficient space-time modeling.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211451949.4A CN115624322B (en) | 2022-11-17 | 2022-11-17 | Non-contact physiological signal detection method and system based on efficient space-time modeling |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211451949.4A CN115624322B (en) | 2022-11-17 | 2022-11-17 | Non-contact physiological signal detection method and system based on efficient space-time modeling |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115624322A (en) | 2023-01-20 |
CN115624322B (en) | 2023-04-25 |
Family
ID=84910095
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211451949.4A Active CN115624322B (en) | 2022-11-17 | 2022-11-17 | Non-contact physiological signal detection method and system based on efficient space-time modeling |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115624322B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115840890B (en) * | 2023-02-24 | 2023-05-19 | 北京科技大学 | Emotion recognition method and device based on non-contact physiological signals |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020047750A1 (en) * | 2018-09-04 | 2020-03-12 | 深圳先进技术研究院 | Arrhythmia detection method and apparatus, electronic device, and computer storage medium |
CN112998690A (en) * | 2021-03-29 | 2021-06-22 | 华南理工大学 | Pulse wave multi-feature fusion-based respiration rate extraction method |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113343821B (en) * | 2021-05-31 | 2022-08-30 | 合肥工业大学 | Non-contact heart rate measurement method based on space-time attention network and input optimization |
CN115024706A (en) * | 2022-05-16 | 2022-09-09 | 南京邮电大学 | Non-contact heart rate measurement method integrating ConvLSTM and CBAM attention mechanism |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||