CN115624322B

CN115624322B - Non-contact physiological signal detection method and system based on efficient space-time modeling

Info

Publication number: CN115624322B
Application number: CN202211451949.4A
Authority: CN
Inventors: 邹博超; 马惠敏
Original assignee: University of Science and Technology Beijing USTB
Current assignee: University of Science and Technology Beijing USTB
Priority date: 2022-11-17
Filing date: 2022-11-17
Publication date: 2023-04-25
Anticipated expiration: 2042-11-17
Also published as: CN115624322A

Abstract

The invention provides a non-contact physiological signal detection method and system based on efficient spatio-temporal modeling, and relates to the technical field of intelligent data processing. The method comprises: obtaining an original video stream and preprocessing it to obtain a preprocessed image sequence; inputting the image sequence into a deep neural network based on a three-dimensional central difference convolution operator and extracting spatio-temporal information in combination with an attention mask mechanism of the convolution layers; constructing a multi-task loss function, based on the spatio-temporal information and an ε-insensitive Huber loss function, to optimize the deep neural network; and filtering the output of the optimized deep neural network with a second-order Butterworth filter and outputting the heart rate and the respiration rate. The three-dimensional central difference convolution operator extracts pulse wave information by aggregating temporal difference information, and cross-database evaluation and ablation studies demonstrate the effectiveness and robustness of the method.

Description

A non-contact physiological signal detection method and system based on efficient spatio-temporal modeling

Technical Field

The invention relates to the technical field of intelligent data processing, and in particular to a non-contact physiological signal detection method and system based on efficient spatio-temporal modeling.

Background

As modern society develops and living standards rise, the incidence of cardiovascular disease is also increasing, possibly because of greater work pressure and a faster pace of life. Detecting human physiological indicators is very important for assessing a person's health. Early detection and treatment can effectively prevent and control cardiovascular disease and can avoid or reduce sudden death caused by cardiovascular problems. Traditional heart rate measurement methods, such as the electrocardiogram, are contact measurements and have the following limitations. First, they are unsuitable for certain application scenarios, such as those requiring long-term monitoring (newborns, patients with extensive burns, sleep monitoring, driver monitoring and the like), and contact measurement requires the active cooperation of the subject. Second, when the position at which the measuring instrument touches the skin shifts, the measurement result can easily deviate substantially. Furthermore, although an electrocardiograph measures accurately, it is relatively expensive, requires professional operation, and is not suitable for everyday physiological signal measurement. Following the initial success of prototype non-contact remote photoplethysmography methods, classical signal processing has demonstrated the feasibility of heart rate measurement based on remote photoplethysmography. However, these methods tend to degrade in the presence of noise sources such as motion, illumination changes, and skin tone. In practical applications, methods based on signal separation have been found to address only specific interference and cannot effectively handle the coexistence of multiple interferences in real scenes.

For practical applications, recent research has begun to focus on deep-learning-based methods because of their better representation capability. Several deep-learning-based methods have been successfully applied to remote photoplethysmography recovery tasks with pulse or respiration as the target signal, but these methods still struggle to model spatio-temporal information effectively. Methods exist that model spatio-temporal information by generating feature maps, but the feature maps require preprocessing, including face detection, facial keypoint localization, face alignment, skin segmentation, and color space transformation, all of which are fairly complex. In addition, because of the limitations of supervised learning, performance drops in cross-database evaluation and in real applications. The learned spatio-temporal features remain susceptible to illumination conditions and motion, and they cannot make full use of extensive temporal context information.

Summary of the Invention

To address the problem in the prior art that the learned spatio-temporal features remain susceptible to illumination conditions and motion and cannot make full use of extensive temporal context information, the present invention proposes a non-contact physiological signal detection method and system based on efficient spatio-temporal modeling.

To solve the above technical problem, the present invention provides the following technical solutions:

In one aspect, a non-contact physiological signal detection method based on efficient spatio-temporal modeling is provided. The method is applied to an electronic device and comprises the following steps:

S1: Obtain an original video stream and preprocess the original video stream to obtain a preprocessed image sequence;

S2: Obtain the image sequence, input the image sequence into a deep neural network based on a three-dimensional central difference convolution operator, and extract spatio-temporal information in combination with the attention mask mechanism of the convolution layers;

S3: Based on the spatio-temporal information and the ε-insensitive Huber loss function, construct a multi-task loss function to optimize the deep neural network;

S4: Filter the output of the optimized deep neural network with a second-order Butterworth filter and output the heart rate and the respiration rate simultaneously, completing the non-contact physiological signal detection based on efficient spatio-temporal modeling.

Optionally, in step S1, obtaining the original video stream and preprocessing it to obtain the preprocessed image sequence includes:

Performing time-domain normalized difference processing and down-sampling processing on the original video stream, respectively, to obtain the preprocessed image sequence;

wherein the time-domain normalized difference is calculated according to formula (1),

in which the terms denote the RGB value of an indexed skin pixel at a given time and the time increment over which the change is taken.
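The time-domain normalized difference can be illustrated with a short Python sketch. It assumes that formula (1) has the form commonly used in the remote photoplethysmography literature, (c(t+Δt) − c(t)) / (c(t+Δt) + c(t)), applied per pixel and per channel with Δt equal to one frame. The function name `normalized_difference`, the flat frame layout, and the small constant `eps` are hypothetical choices of this sketch, not taken from the patent.

```python
def normalized_difference(frames, eps=1e-7):
    """Per-value normalized difference between consecutive frames.

    frames: list of frames, each a flat list of channel values sharing one layout.
    Assumed form: (c(t+1) - c(t)) / (c(t+1) + c(t)).
    eps guards against division by zero and is an addition of this sketch.
    """
    diffs = []
    for prev, nxt in zip(frames, frames[1:]):
        diffs.append([(b - a) / (b + a + eps) for a, b in zip(prev, nxt)])
    return diffs


# Three tiny "frames" with two values each; the result has one entry per frame pair.
frames = [[100.0, 50.0], [110.0, 50.0], [99.0, 60.0]]
diffs = normalized_difference(frames)
```

Because each difference is divided by the sum of the two frames, a uniform change in brightness scales numerator and denominator together, which is the usual motivation for this normalization.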

Optionally, the image sequence includes a time-domain normalized difference image sequence and a down-sampled image sequence.

Optionally, in step S2, obtaining the image sequence, inputting it into the deep neural network based on the three-dimensional central difference convolution operator, and extracting spatio-temporal information in combination with the attention mask mechanism of the convolution layers includes:

S21: Taking the time-domain normalized difference image sequence as the motion branch and inputting it into the deep neural network based on the three-dimensional central difference convolution operator;

Taking the down-sampled image sequence as the appearance branch and inputting it into the deep neural network based on the three-dimensional central difference convolution operator;

S22: Through the attention mask mechanism, using the skin region of interest modeled by the appearance branch to assist the motion branch in extracting spatio-temporal information;

S23: Repeating steps S21-S22 to extract the spatio-temporal information, and passing the spatio-temporal information to the fully connected layer.

Optionally, step S21 includes:

Obtaining the three-dimensional central difference convolution operator according to formula (2),

in which the terms denote, respectively: the input feature map; the local receptive-field cube; the learnable weights; the current position on the feature map; an index enumerating the positions in the receptive field and in the adjacent time steps; and a hyperparameter that balances the spatial intensity and gradient terms.
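As a hedged illustration of how such an operator behaves, the sketch below evaluates a 3×3×3 temporal central difference convolution at a single position. It assumes that formula (2) follows the form used in the central difference convolution literature: an ordinary convolution over the cube, minus θ times the centre value times the sum of the kernel weights on the two adjacent time steps. The name `cdc3d_at` and the nested-list layout are hypothetical.

```python
def cdc3d_at(cube, weights, theta):
    """Output of a 3x3x3 temporal central difference convolution at one position.

    cube[t][h][w]    : input values in the receptive field (centre at [1][1][1])
    weights[t][h][w] : learnable kernel of the same shape
    theta            : balance between the intensity term and the gradient term
    """
    # Ordinary (vanilla) 3D convolution response over the whole cube.
    vanilla = sum(
        weights[t][h][w] * cube[t][h][w]
        for t in range(3) for h in range(3) for w in range(3)
    )
    centre = cube[1][1][1]
    # Sum of kernel weights on the two adjacent time steps (t-1 and t+1).
    adjacent_weight_sum = sum(
        weights[t][h][w]
        for t in (0, 2) for h in range(3) for w in range(3)
    )
    return vanilla - theta * centre * adjacent_weight_sum


# Constant input of 2.0 with an all-ones kernel.
ones = [[[1.0] * 3 for _ in range(3)] for _ in range(3)]
twos = [[[2.0] * 3 for _ in range(3)] for _ in range(3)]
plain = cdc3d_at(twos, ones, theta=0.0)       # reduces to ordinary convolution
difference = cdc3d_at(twos, ones, theta=1.0)  # temporal planes contribute only differences
```

Under this assumed form, θ = 0 recovers a standard 3D convolution, and the difference term reuses the existing kernel weights, so no extra parameters are introduced.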

Optionally, step S22 includes:

Obtaining the function of the attention mask mechanism according to formula (3),

in which the terms denote, respectively: the feature map of the indexed convolution layer of the appearance branch; the feature map of the indexed convolution layer of the motion branch; the height and width of the feature map of that convolution layer; the sigmoid function; the convolution kernel weight; the convolution kernel bias; the L1 norm; and the element-wise product.
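The listed terms match the soft attention mask used in earlier two-branch remote photoplethysmography networks. The sketch below assumes that form: the sigmoid of a 1×1 convolution of the appearance feature map, rescaled by H·W / (2·L1 norm), then multiplied element-wise with the motion feature map. The function names and the single-channel nested-list layout are hypothetical, and the input is taken to be the already computed values w·X + b.

```python
import math


def soft_attention_mask(appearance_logits):
    """Normalized soft mask from a single-channel H x W map of w * X_a + b values.

    Assumed form: sigmoid(logits) * (H * W) / (2 * L1 norm of the sigmoid map).
    """
    height, width = len(appearance_logits), len(appearance_logits[0])
    sig = [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in appearance_logits]
    l1 = sum(abs(v) for row in sig for v in row)
    scale = height * width / (2.0 * l1)
    return [[v * scale for v in row] for row in sig]


def apply_mask(motion_map, mask):
    """Element-wise product of a motion feature map with the mask."""
    return [[m * a for m, a in zip(mrow, arow)] for mrow, arow in zip(motion_map, mask)]


uniform_mask = soft_attention_mask([[0.0, 0.0], [0.0, 0.0]])
masked = apply_mask([[2.0, 4.0], [6.0, 8.0]], uniform_mask)
skewed_mask = soft_attention_mask([[1.0, -2.0], [0.5, 3.0]])
```

With this normalization the mask always sums to H·W/2, so the attention redistributes weight across positions without changing the overall scale of the motion features.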

Optionally, in step S3, constructing the multi-task loss function to optimize the deep neural network, based on the spatio-temporal information and the ε-insensitive Huber loss function, includes:

Calculating the intensity loss of the pulse wave and of the respiratory wave with the ε-insensitive Huber loss function of formula (4),

in which the terms denote, respectively: the ground-truth value; the predicted value obtained by mapping the input through the network function; the hyperparameter of the Huber loss, with a default value of 1; and the hyperparameter of the ε-insensitive loss, with a default value of 0.1;
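The exact piecewise form of formula (4) is not reproduced in this text, so the following Python sketch shows only one plausible combination and should be read as an assumption: residuals inside the ε tube cost nothing, and the excess beyond ε is passed through a standard Huber loss with threshold δ. The defaults δ = 1 and ε = 0.1 are the ones stated above; the function names are hypothetical.

```python
def eps_insensitive_huber(y_true, y_pred, delta=1.0, epsilon=0.1):
    """One plausible epsilon-insensitive Huber loss for a single sample.

    Residuals with magnitude <= epsilon are ignored; the remaining excess is
    penalized quadratically up to delta and linearly beyond it.
    """
    excess = max(0.0, abs(y_true - y_pred) - epsilon)
    if excess <= delta:
        return 0.5 * excess * excess
    return delta * (excess - 0.5 * delta)


def mean_loss(targets, predictions, delta=1.0, epsilon=0.1):
    """Average of the per-sample loss over a waveform."""
    pairs = list(zip(targets, predictions))
    return sum(eps_insensitive_huber(t, p, delta, epsilon) for t, p in pairs) / len(pairs)


inside_tube = eps_insensitive_huber(0.0, 0.05)   # residual below epsilon
quadratic = eps_insensitive_huber(0.0, 0.6)      # excess 0.5, quadratic region
linear = eps_insensitive_huber(0.0, 3.1)         # excess 3.0, linear region
```

The quadratic region behaves like an L2 loss for small errors, the linear region like an L1 loss for large ones, and the dead zone ignores small deviations that are likely to be label noise.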

Constructing the multi-task loss function according to formula (5),

in which the two coefficients are the weights of the respective loss terms;

Optimizing the weights of the deep neural network by backpropagating the loss function value, and stopping the optimization once the loss function no longer decreases, that is, selecting the deep neural network with the lowest loss function value observed during training.
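A minimal sketch of this step, assuming formula (5) is a weighted sum of the pulse-wave loss and the respiratory-wave loss. The stopping rule is written with a `patience` parameter, which is an assumption of this sketch: the text only says that optimization stops once the loss no longer decreases and that the network with the lowest loss is kept. All names are hypothetical.

```python
def multitask_loss(pulse_loss, resp_loss, alpha=1.0, beta=1.0):
    """Weighted sum of the two task losses (assumed form of the combined loss)."""
    return alpha * pulse_loss + beta * resp_loss


def best_epoch(losses, patience=3):
    """Index of the lowest loss seen before training is stopped.

    Training is considered stalled once `patience` consecutive epochs pass
    without a new minimum; `patience` is a choice of this sketch.
    """
    best_idx, best_val = 0, float("inf")
    for i, val in enumerate(losses):
        if val < best_val:
            best_idx, best_val = i, val
        elif i - best_idx >= patience:
            break
    return best_idx


history = [1.0, 0.8, 0.7, 0.75, 0.72, 0.71, 0.5]
stopped_early = best_epoch(history, patience=3)
ran_to_end = best_epoch(history, patience=10)
```

In practice the checkpoint saved at the returned epoch would be restored as the final model; the example history shows that a short patience can stop before a later, lower minimum is reached.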

Optionally, in step S4, filtering the output of the optimized deep neural network with the second-order Butterworth filter and outputting the heart rate and the respiration rate simultaneously, to complete the non-contact physiological signal detection based on efficient spatio-temporal modeling, includes:

Applying a second-order Butterworth filter to the output of the deep neural network and outputting the heart rate and the respiration rate simultaneously;

wherein the cut-off frequencies for the heart rate are 0.75 Hz and 2.5 Hz, and the cut-off frequencies for the respiration rate are 0.08 Hz and 0.5 Hz;

and wherein the position of the highest peak in the power spectrum of the filtered signal is selected as the heart rate and the respiration rate output, completing the non-contact physiological signal detection based on efficient spatio-temporal modeling.
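The post-processing can be sketched in plain Python as follows. This is an approximation under stated assumptions: the band limits are realized by cascading a second-order Butterworth high-pass and a second-order Butterworth low-pass section (bilinear-transform biquads), which is not necessarily the filter design used by the inventors, and the power spectrum is evaluated by a direct Fourier sum on a 0.01 Hz grid inside the band. The function names, the 30 Hz sampling rate, and the synthetic test signal are hypothetical.

```python
import math


def butter2_coeffs(cutoff_hz, fs, kind):
    """Second-order Butterworth low-pass or high-pass biquad via the bilinear transform."""
    k = math.tan(math.pi * cutoff_hz / fs)
    norm = 1.0 / (1.0 + math.sqrt(2.0) * k + k * k)
    if kind == "low":
        b = (k * k * norm, 2.0 * k * k * norm, k * k * norm)
    else:  # "high"
        b = (norm, -2.0 * norm, norm)
    a = (2.0 * (k * k - 1.0) * norm, (1.0 - math.sqrt(2.0) * k + k * k) * norm)
    return b, a


def biquad(signal, b, a):
    """Direct-form I filtering with zero initial state."""
    out = []
    x1 = x2 = y1 = y2 = 0.0
    for x in signal:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[0] * y1 - a[1] * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out


def band_limit(signal, low_hz, high_hz, fs):
    """High-pass at low_hz followed by low-pass at high_hz."""
    high_passed = biquad(signal, *butter2_coeffs(low_hz, fs, "high"))
    return biquad(high_passed, *butter2_coeffs(high_hz, fs, "low"))


def peak_rate_per_minute(signal, low_hz, high_hz, fs, step_hz=0.01):
    """Frequency of the largest spectral peak inside the band, in cycles per minute."""
    filtered = band_limit(signal, low_hz, high_hz, fs)
    best_f, best_p = low_hz, -1.0
    steps = int(round((high_hz - low_hz) / step_hz))
    for i in range(steps + 1):
        f = low_hz + i * step_hz
        re = sum(v * math.cos(2.0 * math.pi * f * n / fs) for n, v in enumerate(filtered))
        im = sum(v * math.sin(2.0 * math.pi * f * n / fs) for n, v in enumerate(filtered))
        power = re * re + im * im
        if power > best_p:
            best_f, best_p = f, power
    return 60.0 * best_f


# Synthetic 30-second waveform: 1.2 Hz "pulse" plus 0.25 Hz "respiration".
fs = 30.0
wave = [math.sin(2.0 * math.pi * 1.2 * n / fs) + 0.5 * math.sin(2.0 * math.pi * 0.25 * n / fs)
        for n in range(900)]
heart_rate = peak_rate_per_minute(wave, 0.75, 2.5, fs)        # beats per minute
respiration_rate = peak_rate_per_minute(wave, 0.08, 0.5, fs)  # breaths per minute
```

The two pass bands correspond to 45 to 150 beats per minute and roughly 5 to 30 breaths per minute, so the peak search is confined to physiologically plausible rates.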

In another aspect, a non-contact physiological signal detection system based on efficient spatio-temporal modeling is provided. The system is applied to an electronic device and includes:

a data acquisition module, configured to obtain an original video stream and preprocess the original video stream to obtain a preprocessed image sequence;

a spatio-temporal information extraction module, configured to obtain the image sequence, input the image sequence into a deep neural network based on a three-dimensional central difference convolution operator, and extract spatio-temporal information in combination with the attention mask mechanism of the convolution layers;

a model optimization module, configured to construct a multi-task loss function, based on the spatio-temporal information and the ε-insensitive Huber loss function, to optimize the deep neural network;

a data output module, configured to filter the output of the optimized deep neural network with a second-order Butterworth filter and output the heart rate and the respiration rate simultaneously, completing the non-contact physiological signal detection based on efficient spatio-temporal modeling.

Optionally, the data acquisition module is further configured to perform time-domain normalized difference processing and down-sampling processing on the original video stream, respectively, to obtain the preprocessed image sequence.

In another aspect, an electronic device is provided. The electronic device includes a processor and a memory, the memory stores at least one instruction, and the at least one instruction is loaded and executed by the processor to implement the above non-contact physiological signal detection method based on efficient spatio-temporal modeling.

In another aspect, a computer-readable storage medium is provided. The storage medium stores at least one instruction, and the at least one instruction is loaded and executed by a processor to implement the above non-contact physiological signal detection method based on efficient spatio-temporal modeling.

The above technical solutions of the embodiments of the present invention have at least the following beneficial effects:

In the above solution, an accurate non-contact physiological signal measurement method based on a three-dimensional central difference convolution attention network is proposed for efficient spatio-temporal modeling. The three-dimensional central difference convolution operator it uses can extract pulse wave information by aggregating temporal difference information.

The ε-insensitive Huber loss is proposed as the loss function of the non-contact pulse wave measurement network because it focuses on the pulse wave intensity constraint. An evaluation of different loss functions and their combinations shows that the ε-insensitive Huber loss function performs better.

A network for joint multi-task measurement of cardiac and respiratory motion is further proposed. It has the advantage of sharing information between related physiological signals, can measure the heart rate and the respiration rate simultaneously, and further improves accuracy while reducing computational cost. Extensive experiments show that the proposed method performs well on public databases. Cross-database evaluation and ablation studies were also carried out and demonstrate the effectiveness and robustness of the proposed method.

Description of the Drawings

To explain the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below are merely some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flowchart of a non-contact physiological signal detection method based on efficient spatio-temporal modeling provided by an embodiment of the present invention;

Fig. 2 is a flowchart of a non-contact physiological signal detection method based on efficient spatio-temporal modeling provided by an embodiment of the present invention;

Fig. 3 is a block diagram of a non-contact physiological signal detection system based on efficient spatio-temporal modeling provided by an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.

Detailed Description

To make the technical problem to be solved, the technical solutions, and the advantages of the present invention clearer, a detailed description is given below with reference to the drawings and specific embodiments.

An embodiment of the present invention provides a non-contact physiological signal detection method based on efficient spatio-temporal modeling. The method can be implemented by an electronic device, which may be a terminal or a server. As shown in the flowchart of Fig. 1, the processing flow of the method may include the following steps:

S101: Obtain an original video stream and preprocess the original video stream to obtain a preprocessed image sequence;

S102: Obtain the image sequence, input the image sequence into a deep neural network based on a three-dimensional central difference convolution operator, and extract spatio-temporal information in combination with the attention mask mechanism of the convolution layers;

S103: Based on the spatio-temporal information and the ε-insensitive Huber loss function, construct a multi-task loss function to optimize the deep neural network;

S104: Filter the output of the optimized deep neural network with a second-order Butterworth filter and output the heart rate and the respiration rate simultaneously, completing the non-contact physiological signal detection based on efficient spatio-temporal modeling.

Optionally, in step S101, obtaining the original video stream and preprocessing it to obtain the preprocessed image sequence includes:

Performing time-domain normalized difference processing and down-sampling processing on the original video stream, respectively, to obtain the preprocessed image sequence, the time-domain normalized difference being calculated according to formula (1).

Optionally, in step S102, obtaining the image sequence, inputting it into the deep neural network based on the three-dimensional central difference convolution operator, and extracting spatio-temporal information in combination with the attention mask mechanism of the convolution layers includes:

S121: Taking the time-domain normalized difference image sequence as the motion branch and inputting it into the deep neural network based on the three-dimensional central difference convolution operator;

S122: Through the attention mask mechanism, using the skin region of interest modeled by the appearance branch to assist the motion branch in extracting spatio-temporal information;

S123: Repeating steps S121-S122 to extract the spatio-temporal information, and passing the spatio-temporal information to the fully connected layer.

Optionally, step S121 includes obtaining the three-dimensional central difference convolution operator according to formula (2),

in which the terms denote, respectively: the input feature map; the local receptive-field cube; the learnable weights; the current position on the feature map; an index enumerating the positions in the receptive field and in the adjacent time steps; and a hyperparameter that balances the spatial intensity and gradient terms.

Optionally, step S122 includes obtaining the function of the attention mask mechanism according to formula (3).

Optionally, in step S103, constructing the multi-task loss function to optimize the deep neural network, based on the spatio-temporal information and the ε-insensitive Huber loss function, includes calculating the intensity loss according to formula (4),

in which the terms denote, respectively: the ground-truth value; the predicted value obtained by mapping the input through the network function; the hyperparameter of the Huber loss, with a default value of 1; and the hyperparameter of the ε-insensitive loss, with a default value of 0.1;

and constructing the multi-task loss function according to formula (5), in which the two coefficients are the weights of the respective loss terms.

The weights of the deep neural network are optimized by backpropagating the loss function value, the optimization stops once the loss function no longer decreases, and the deep neural network with the lowest loss function value observed during training is selected.

Optionally, in step S104, the output of the optimized deep neural network is filtered with the second-order Butterworth filter and the heart rate and the respiration rate are output simultaneously, in the manner described for step S4 above, completing the non-contact physiological signal detection based on efficient spatio-temporal modeling.

In the embodiment of the present invention, a remote photoplethysmography recovery method for non-contact measurement of physiological signals, based on efficient spatio-temporal modeling, is proposed. Its efficient spatio-temporal modeling is achieved by combining a three-dimensional central difference convolution operator, a dual-branch structure for motion and appearance, and a soft attention mask. The three-dimensional central difference convolution operator is well suited to describing the intrinsic pattern of the pulse wave through a combination of gradient and intensity information. A deep neural network based on the three-dimensional central difference convolution operator can provide more reliable spatio-temporal modeling capability than conventional three-dimensional convolution.

In addition, this patent introduces, for the first time, the ε-insensitive Huber loss function into the remote photoplethysmography task. It has the advantages of both the L1 and the L2 loss, and the ε-insensitive component allows the loss function to ignore noisy samples inside the insensitive zone, which increases robustness and yields better performance.

An embodiment of the present invention provides a non-contact physiological signal detection method based on efficient spatio-temporal modeling. The method can be implemented by an electronic device, which may be a terminal or a server. As shown in the flowchart of Fig. 2, the processing flow of the method may include the following steps:

S201: Obtain an original video stream and preprocess the original video stream to obtain a preprocessed image sequence;

In a feasible implementation, the original video stream is subjected to time-domain normalized difference processing and down-sampling processing, respectively, giving a resolution of 36×36×3 after processing, to obtain the preprocessed image sequence.

In a feasible implementation, the image sequence includes a time-domain normalized difference image sequence and a down-sampled image sequence. In this embodiment, each image sequence contains 10 frames.
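The shapes in this embodiment (36×36 spatial resolution, 10 frames per sequence) can be illustrated with the short sketch below. The patent does not state how the down-sampling is performed, so block averaging is an assumption here, as is the restriction to square single-channel images whose side is an integer multiple of 36; an RGB frame would be handled one channel at a time. The names `downsample` and `split_into_clips` are hypothetical.

```python
def downsample(channel, out_size=36):
    """Block-average a square single-channel image to out_size x out_size.

    Assumes the input side length is an integer multiple of out_size.
    """
    factor = len(channel) // out_size
    out = []
    for r in range(out_size):
        row = []
        for c in range(out_size):
            block = [
                channel[r * factor + i][c * factor + j]
                for i in range(factor) for j in range(factor)
            ]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def split_into_clips(frames, clip_len=10):
    """Non-overlapping clips of clip_len frames; a trailing remainder is dropped."""
    return [frames[i:i + clip_len] for i in range(0, len(frames) - clip_len + 1, clip_len)]


# A 72x72 ramp image reduced to 36x36, and 25 frame indices grouped into 10-frame clips.
image = [[float(r * 72 + c) for c in range(72)] for r in range(72)]
small = downsample(image)
clips = split_into_clips(list(range(25)))
```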

S202: Take the time-domain normalized difference image sequence as the motion branch and input it into the deep neural network based on the three-dimensional central difference convolution operator;

S203: Through the attention mask mechanism, use the skin region of interest modeled by the appearance branch to assist the motion branch in extracting spatio-temporal information.

S204: Repeat steps S202-S203 to extract the spatio-temporal information, and pass the spatio-temporal information to the fully connected layer.

In a feasible implementation, the three-dimensional central difference convolution operator is obtained according to formula (2).

In a feasible implementation, the convolution kernel size is (3×3×3, 32) and the number of convolution layers is 2.

In a feasible implementation, the function of the attention mask mechanism is obtained according to formula (3).

In a feasible implementation, the pooling layer uses average pooling with a kernel size of (2×2×2, 32), followed by Dropout with a probability of 0.5.

In a feasible implementation, the fully connected layer has input and output dimensions of 128 and 10, followed by Dropout with a probability of 0.5.

S205: Based on the spatio-temporal information and the ε-insensitive Huber loss function, construct a multi-task loss function to optimize the deep neural network;

In a feasible implementation, the intensity loss of the pulse wave and of the respiratory wave is calculated with the ε-insensitive Huber loss function of formula (4),

and formula (5) is used as the multi-task loss function,

in which the two coefficients are the weights of the respective loss terms;

In a feasible implementation, the deep neural network model with the lowest loss function value is used as the final model.

In a feasible implementation, α=β=1 is used in one example. The neural network weights are then optimized by backpropagating the loss function value, training stops once the loss function no longer decreases, and the best-performing model at that point is adopted as the final model for recovering the pulse wave and the respiratory wave;

S206: Filter the output of the optimized deep neural network with a second-order Butterworth filter and output the heart rate and the respiration rate simultaneously, completing the non-contact physiological signal detection based on efficient spatio-temporal modeling.

In a feasible implementation, a second-order Butterworth filter is used to further filter the network output to obtain the heart rate and the respiration rate, where the cut-off frequencies for the heart rate are 0.75 Hz and 2.5 Hz and the cut-off frequencies for the respiration rate are 0.08 Hz and 0.5 Hz;

The position of the highest peak in the power spectrum of the filtered signal is selected as the heart rate and the respiration rate output, completing the non-contact physiological signal detection based on efficient spatio-temporal modeling.

一种可行的实施方式中，本发明与以前的先前方法相比，主要有两个方面的区别：一个是时空网络。传统技术中采用了三维卷积和时间移位卷积，旨在减少计算预算但没有精度增益。本文使用的三维中心差分卷积算子可以在不需要额外参数的情况下取代传统的卷积运算。结果改善的是由于增强的时空上下文建模能力有助于外观和运动信息的表示。同时，中心差分可视为正则化项，缓解过度拟合。另一种是网络架构，如Dual-GAN，它是用于信号解耦的基于生成对抗网络架构的设计，在某些度量（如UBFC数据集上的均方根误差）的性能优于本发明的方法。但Dual GAN包含称为时空地图生成的预处理步骤，包括人脸检测、面部关键点定位、面部对齐、皮肤分割和颜色空间变换，这些预处理相对复杂。而本发明的方法只需要帧之间的简单减法运算作为运动分支的输入。In a feasible implementation mode, compared with the previous previous methods, the present invention mainly has two differences: one is the spatio-temporal network. Three-dimensional convolution and time-shifted convolution are used in traditional techniques, aiming to reduce the computational budget but without accuracy gain. The 3D central difference convolution operator used in this paper can replace the traditional convolution operation without additional parameters. The improved results are due to the enhanced spatio-temporal context modeling capability that facilitates the representation of appearance and motion information. At the same time, the central difference can be regarded as a regularization term to alleviate overfitting. The other is a network architecture, such as Dual-GAN, which is a design based on a generative confrontation network architecture for signal decoupling, and its performance in some metrics (such as the root mean square error on the UBFC dataset) is better than that of the present invention Methods. However, Dual GAN includes a preprocessing step called spatio-temporal map generation, including face detection, facial key point positioning, face alignment, skin segmentation, and color space transformation, which are relatively complex. Whereas the method of the present invention only requires a simple subtraction between frames as input to the motion branch.

In addition, regarding the loss function, with the adopted ε-insensitive Huber loss the gradient decreases gradually as the loss between the predicted signal and the ground-truth signal approaches its minimum, so the model is more robust in signal prediction. The multi-task network proposed by the present invention can model the intrinsic correlation between the tasks, is more accurate than the single-task version, and saves computing resources at the same time. Overall, the remote photoplethysmography measurement network based on efficient spatio-temporal modeling can capture rich temporal context by aggregating the temporal difference information relevant to remote photoplethysmography, and thereby obtains more robust and accurate non-contact physiological signal measurements.
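The behaviour described above can be sketched in a few lines. The formula below is an assumption: it combines the usual ε-insensitive dead zone with the usual Huber shape, using the default values given in the claims (Huber hyperparameter 1, ε-insensitive hyperparameter 0.1); the exact expression of formula (4) is not reproduced in this text. The function names and the weights `w_pulse` and `w_resp` are hypothetical.

```python
def eps_insensitive_huber(y_true, y_pred, delta=1.0, eps=0.1):
    """Assumed form of an epsilon-insensitive Huber loss for one sample.

    Errors smaller than `eps` cost nothing; beyond that the excess error is
    penalised quadratically up to `delta` and linearly afterwards.
    """
    excess = max(abs(y_true - y_pred) - eps, 0.0)
    if excess <= delta:
        return 0.5 * excess * excess
    return delta * (excess - 0.5 * delta)


def signal_loss(truth, prediction, delta=1.0, eps=0.1):
    """Mean per-sample loss over one waveform."""
    losses = [eps_insensitive_huber(y, p, delta, eps)
              for y, p in zip(truth, prediction)]
    return sum(losses) / len(losses)


def multitask_loss(pulse_true, pulse_pred, resp_true, resp_pred,
                   w_pulse=1.0, w_resp=1.0):
    """Weighted sum of the pulse-wave and respiratory-wave losses.
    The weighted-sum form and the weight values are assumptions."""
    return (w_pulse * signal_loss(pulse_true, pulse_pred)
            + w_resp * signal_loss(resp_true, resp_pred))


inside_dead_zone = eps_insensitive_huber(0.0, 0.05)   # error below eps
quadratic_zone = eps_insensitive_huber(0.0, 0.6)      # excess of 0.5
linear_zone = eps_insensitive_huber(0.0, 3.1)         # excess of 3.0
total = multitask_loss([0.0, 0.0], [0.05, 0.6], [0.0], [3.1])
```

Because the quadratic zone has a gradient proportional to the excess error, updates shrink smoothly near the minimum, while the dead zone keeps small noisy deviations from contributing at all.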

In the embodiment of the present invention, an accurate non-contact physiological signal measurement method based on a three-dimensional central difference convolution attention network is proposed for efficient spatio-temporal modeling; the three-dimensional central difference convolution operator it uses can extract pulse wave information by aggregating temporal difference information.

The ε-insensitive Huber loss is proposed as the loss function of the non-contact pulse wave measurement network because it can focus on the pulse wave intensity constraint; an evaluation of different loss functions and their combinations shows that the ε-insensitive Huber loss performs better.

A network for joint multi-task measurement of heart rate and respiratory motion is further proposed. It has the advantage of sharing information between related physiological signals, can measure heart rate and respiratory rate simultaneously, and further improves accuracy while reducing computational cost. Extensive experiments show that the proposed method performs well on public databases. Cross-database evaluation and ablation studies were also carried out and demonstrate the effectiveness and robustness of the proposed method.

Fig. 3 is a block diagram of a non-contact physiological signal detection system based on efficient spatio-temporal modeling according to an exemplary embodiment. Referring to Fig. 3, the system 300 includes:

a data acquisition module 310, configured to acquire an original video stream and preprocess the original video stream to obtain a preprocessed image sequence;

a spatio-temporal information extraction module 320, configured to acquire the image sequence, input the image sequence into a deep neural network based on a three-dimensional central difference convolution operator, and extract spatio-temporal information in combination with the attention mask mechanism of the convolutional layer;

a model optimization module 330, configured to construct a multi-task loss function, based on the spatio-temporal information and the ε-insensitive Huber loss function, to optimize the deep neural network;

a data output module 340, configured to filter the output of the optimized deep neural network with a second-order Butterworth filter and output the heart rate and respiratory rate simultaneously, completing the non-contact physiological signal detection based on efficient spatio-temporal modeling.

Optionally, the data acquisition module 310 is further configured to apply time-domain normalized difference processing and down-sampling processing, respectively, to the original video stream to obtain the preprocessed image sequences;

Optionally, the spatio-temporal information extraction module 320 is further configured to feed the time-domain normalized difference image sequence, as the motion branch, into the deep neural network based on the three-dimensional central difference convolution operator;

to model the skin region of interest with the appearance branch through the attention mask mechanism, thereby assisting the motion branch in extracting spatio-temporal information;

and to repeat the extraction of spatio-temporal information and pass the spatio-temporal information to the fully connected layer.

Optionally, the spatio-temporal information extraction module 320 is further configured to

Optionally, the model optimization module 330 is configured to calculate the intensity loss of the pulse wave and the respiratory wave according to the ε-insensitive Huber loss function of the following formula (4):

and to construct the multi-task loss function L^Total in combination with the following formula (5):

where the two coefficients of formula (5) are the weights of the loss terms;

Optionally, the data output module 340 is configured to filter the output of the deep neural network with a second-order Butterworth filter and to output the heart rate and respiratory rate simultaneously.
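The time-domain normalized difference applied by the data acquisition module can be sketched as below. This is an illustration under assumptions: it uses the common per-pixel form (next minus previous, divided by next plus previous), represents each frame as a flat list of pixel intensities, and adds a small hypothetical constant `tiny` to avoid division by zero. Down-sampling and any further standardisation are omitted.

```python
def normalized_difference(frames, tiny=1e-7):
    """Per-pixel normalized difference between consecutive frames.

    frames: list of frames, each a flat list of pixel intensities.
    Returns len(frames) - 1 difference frames, assumed to be the input
    of the motion branch.
    """
    diffs = []
    for prev, nxt in zip(frames, frames[1:]):
        diffs.append([(b - a) / (b + a + tiny) for a, b in zip(prev, nxt)])
    return diffs


# Three tiny two-pixel frames: the first pixel flickers, the second is static.
frames = [[100.0, 50.0], [110.0, 50.0], [99.0, 50.0]]
motion_input = normalized_difference(frames)
```

Dividing by the sum of the two frames makes the difference largely independent of the overall illumination level, which is why only a subtraction between frames is needed instead of a multi-stage face preprocessing chain.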

In the embodiment of the present invention, a remote photoplethysmography recovery method based on efficient spatio-temporal modeling is proposed for the non-contact measurement of physiological signals. Its efficient spatio-temporal modeling is achieved by combining a three-dimensional central difference convolution operator, a dual-branch structure for motion and appearance, and a soft attention mask. The three-dimensional central difference convolution operator is good at describing the intrinsic pattern of the pulse wave through a combination of gradient and intensity information. A deep neural network based on this operator can provide more reliable spatio-temporal information modeling than conventional three-dimensional convolution. In addition, this patent is the first to introduce the ε-insensitive Huber loss function into the remote photoplethysmography task; the ε-insensitive component allows the loss function to ignore noisy samples within the insensitive zone, which increases robustness and yields better performance.
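One possible reading of the soft attention mask is sketched below. The details are assumptions borrowed from common two-branch remote photoplethysmography designs rather than taken from this text: the appearance branch is assumed to produce one score per spatial position, the scores pass through a sigmoid, and the result is L1-normalized so that the mask always sums to half the number of positions. The function names are hypothetical.

```python
import math


def soft_attention_mask(scores):
    """Turn an H x W grid of appearance-branch scores into a soft mask.

    Assumed form: mask = H * W * sigmoid(score) / (2 * sum of sigmoids).
    """
    height, width = len(scores), len(scores[0])
    sig = [[1.0 / (1.0 + math.exp(-v)) for v in row] for row in scores]
    l1_norm = sum(sum(row) for row in sig)
    return [[height * width * s / (2.0 * l1_norm) for s in row] for row in sig]


def apply_mask(motion_features, mask):
    """Element-wise gating of one motion-branch feature map by the mask."""
    return [[m * a for m, a in zip(m_row, a_row)]
            for m_row, a_row in zip(motion_features, mask)]


# Hypothetical 2 x 2 score map: the top-left position looks most like skin.
scores = [[4.0, 0.0], [0.0, -4.0]]
mask = soft_attention_mask(scores)
gated = apply_mask([[1.0, 1.0], [1.0, 1.0]], mask)
mask_total = sum(sum(row) for row in mask)
```

The normalization keeps the overall scale of the gated features stable regardless of how many positions the appearance branch marks as skin.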

Fig. 4 is a schematic structural diagram of an electronic device 400 provided by an embodiment of the present invention. The electronic device 400 may vary considerably depending on configuration or performance, and may include one or more processors (central processing units, CPU) 401 and one or more memories 402, wherein at least one instruction is stored in the memory 402, and the at least one instruction is loaded and executed by the processor 401 to implement the steps of the following non-contact physiological signal detection method based on efficient spatio-temporal modeling:

S1: acquiring an original video stream and preprocessing the original video stream to obtain a preprocessed image sequence;

S2: acquiring the image sequence, inputting the image sequence into a deep neural network based on a three-dimensional central difference convolution operator, and extracting spatio-temporal information in combination with the attention mask mechanism of the convolutional layer;

S3: constructing a multi-task loss function, based on the spatio-temporal information and the ε-insensitive Huber loss function, to optimize the deep neural network;

In an exemplary embodiment, a computer-readable storage medium is also provided, for example a memory including instructions, which can be executed by a processor in a terminal to carry out the above non-contact physiological signal detection method based on efficient spatio-temporal modeling. For example, the computer-readable storage medium may be a ROM, a random access memory (RAM), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

Those of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be carried out by hardware, or by a program that instructs the relevant hardware. The program may be stored in a computer-readable storage medium, and the storage medium may be a read-only memory, a magnetic disk, an optical disk, or the like.

The above are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A non-contact physiological signal detection method based on efficient spatio-temporal modeling, characterized by comprising the following steps:

S1: acquiring an original video stream and preprocessing the original video stream to obtain a preprocessed image sequence;

S2: acquiring the image sequence, inputting the image sequence into a deep neural network based on a three-dimensional central difference convolution operator, and extracting spatio-temporal information in combination with the attention mask mechanism of the convolutional layer;

S3: constructing a multi-task loss function, based on the spatio-temporal information and the ε-insensitive Huber loss function, to optimize the deep neural network;

wherein, in step S3, constructing the multi-task loss function to optimize the deep neural network based on the spatio-temporal information and the ε-insensitive Huber loss function includes:

calculating the intensity loss of the pulse wave and the respiratory wave according to the ε-insensitive Huber loss function of the following equation (4):

(4)

wherein the symbols of equation (4) denote, in order: the true value; the predicted value obtained by mapping the input through a function; the hyperparameter of the Huber loss, whose default value is 1; and the hyperparameter of the ε-insensitive loss, whose default value is 0.1;

constructing the multi-task loss function L^Total in combination with the following equation (5):

(5)

wherein the two coefficients of equation (5) are the weights of the loss terms;

optimizing the weights of the deep neural network by back-propagating the loss function value, stopping the optimization once the loss function no longer decreases, and selecting the deep neural network model with the lowest loss function value during training;

S4: filtering the output of the optimized deep neural network with a second-order Butterworth filter and outputting the heart rate and respiratory rate simultaneously, thereby completing the non-contact physiological signal detection based on efficient spatio-temporal modeling.

2. A non-contact physiological signal detection system based on efficient spatio-temporal modeling, characterized in that the system is adapted for use in the method of claim 1, the system comprising:

a data acquisition module, configured to acquire an original video stream and preprocess the original video stream to obtain a preprocessed image sequence;

a spatio-temporal information extraction module, configured to acquire the image sequence, input the image sequence into a deep neural network based on a three-dimensional central difference convolution operator, and extract spatio-temporal information in combination with the attention mask mechanism of the convolutional layer;

a model optimization module, configured to construct a multi-task loss function, based on the spatio-temporal information and the ε-insensitive Huber loss function, to optimize the deep neural network;

and a data output module, configured to filter the output of the optimized deep neural network with a second-order Butterworth filter and output the heart rate and respiratory rate simultaneously, completing the non-contact physiological signal detection based on efficient spatio-temporal modeling.