CN116385837A - Self-supervised pre-training method for remote physiological measurement based on a masked autoencoder - Google Patents

Self-supervised pre-training method for remote physiological measurement based on a masked autoencoder

Info

Publication number
CN116385837A
CN116385837A (application CN202310445533.XA; granted as CN116385837B)
Authority
CN
China
Prior art keywords
space-time diagram, ViT, encoder, self-supervision
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310445533.XA
Other languages
Chinese (zh)
Other versions
CN116385837B (en)
Inventor
Liu Xin (刘鑫)
Zhang Yuting (张雨婷)
Yu Zitong (余梓彤)
Yue Huanjing (岳焕景)
Yang Jingyu (杨敬钰)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University
Priority to CN202310445533.XA
Publication of CN116385837A
Application granted
Publication of CN116385837B
Legal status: Active

Classifications

    • G06V 10/774 — Generating sets of training patterns; bootstrap methods, e.g. bagging or boosting
    • G06V 10/7753 — Incorporation of unlabelled data, e.g. multiple instance learning [MIL]
    • A61B 5/02416 — Detecting, measuring or recording pulse rate or heart rate using photoplethysmograph signals
    • G06N 3/0455 — Auto-encoder networks; encoder-decoder networks
    • G06N 3/088 — Non-supervised learning, e.g. competitive learning
    • G06N 3/0895 — Weakly supervised learning, e.g. semi-supervised or self-supervised learning
    • G06T 7/0012 — Biomedical image inspection
    • G06V 10/82 — Image or video recognition using neural networks
    • G06V 20/46 — Extracting features or characteristics from video content
    • G06V 40/165 — Face detection; localisation; normalisation using facial parts and geometric relationships
    • G06V 40/171 — Local features and components; facial parts
    • G06T 2207/20081 — Training; learning
    • G06T 2207/20084 — Artificial neural networks [ANN]
    • G06T 2207/30004 — Biomedical image processing
    • G06T 2207/30201 — Face
    • G06V 2201/03 — Recognition of patterns in medical or anatomical images
    • Y02A 90/10 — ICT supporting adaptation to climate change


Abstract

The invention discloses a self-supervised pre-training method for remote physiological measurement based on a masked autoencoder, belonging to the technical field of computer vision. The invention proposes rPPG-MAE, which takes the space-time diagram (ST-Map) as input and uses a masked autoencoder (MAE) for self-supervised ViT pre-training. To our knowledge, this is the first work to explore self-supervised learning with ST-Map input on challenging rPPG tasks, such as the less-constrained VIPL-HR dataset. The invention designs a new rPPG loss function to constrain the MAE pre-training task. The proposed rPPG loss is more suitable for pre-training than the pixel reconstruction loss employed in the original MAE, enabling ViT to learn the periodic information of the rPPG signal efficiently. In addition to the original ST-Map, the invention explores several rPPG task-related reconstruction targets. An ST-Map with band-pass filtering is proposed, which limits the frequency to the range of the heart-rate signal and helps the network learn useful periodic information.

Description

Self-supervised pre-training method for remote physiological measurement based on a masked autoencoder
Technical Field
The invention relates to the technical field of computer vision, and in particular to a self-supervised pre-training method for remote physiological measurement based on a masked autoencoder.
Background
Heart rate (HR), heart rate variability (HRV), and respiration frequency (RF) carry a number of important indicators of human vital information. In the past, these physiological signals were typically measured by electrocardiography (ECG) and photoplethysmography (PPG). However, these conventional methods require direct contact with the body, which limits real-time monitoring of human vital information in sensorless environments. Contactless remote heart rate monitoring (rPPG), which analyses skin colour changes in facial videos of the subject without additional sensors, has therefore become a hot research topic.
In the early stages, many methods explored various hand-crafted properties of rPPG. In recent years, a number of end-to-end supervised models using two-dimensional/three-dimensional convolutional neural networks (CNNs) have been designed to extract rPPG features. Meanwhile, some research has developed non-end-to-end fully supervised methods that capture the rPPG signal from the space-time diagram (ST-Map). However, supervised learning requires a large amount of labelled data, and in the rPPG field the cost of collecting accurate large-scale labelled data is high. Thus, several self-supervised methods have been proposed to cope with this limitation; for example, Gideon and Stent propose a method with weak priors on the frequency and temporal smoothness of the target signal, while Sun and Li use a 3D CNN model to generate multiple rPPG signals from each video at different spatio-temporal positions, taking facial video frames as input, obtaining an rPPG representation, and directly predicting the rPPG signal. However, these methods are end-to-end and may not be robust in challenging situations (e.g., severe head movements).
Since the rPPG signal is very subtle, it is easily swamped by noise (e.g., illumination, motion, camera noise), and it is difficult to extract periodic information from raw video data in its original structure. This is why many successful rPPG methods still construct the neural network input in a specific way rather than using raw data directly, such as the space-time diagram (ST-Map), a spatio-temporal representation in which temporal physiological signals extracted from different regions of interest (ROIs) of the face are arranged as the model input. On the one hand, the ST-Map contains abundant physiological information and has been applied successfully in supervised learning methods. On the other hand, acquiring PPG/ECG signals while collecting large-scale face video data is expensive.
In recent years, self-supervised learning has become a hotspot in computer vision, and many methods have been proposed, such as self-supervised algorithms based on auxiliary tasks and models based on contrastive learning. Today, the more versatile denoising autoencoder has enjoyed tremendous success in both natural language processing (e.g., the masked auto-encoding of BERT) and computer vision (e.g., the masked autoencoder, MAE). In particular, MAE has proven effective on image analysis tasks such as image classification and object segmentation. rPPG is a typical computer vision task, and how to use a masked autoencoder to reduce the information redundancy and noise of the ST-Map and realise efficient rPPG measurement naturally becomes a focus of research. However, in early studies the masked autoencoder was used only to pre-train on natural images, such as the ImageNet dataset. In addition, a large gap exists between natural semantic images and the ST-Map.
In order to solve the above problems, the invention provides a self-supervised pre-training method for remote physiological measurement based on a masked autoencoder.
Disclosure of Invention
The invention aims to provide a self-supervised pre-training method for remote physiological measurement based on a masked autoencoder, so as to solve the following problems in the prior art:
(1) the masked autoencoder has so far been used only for pre-training on natural images, giving it a narrow range of application;
(2) there is a large gap between natural semantic images and the ST-Map:
2.1) the physical information contained in a natural semantic image differs from that contained in the ST-Map;
2.2) identifying valid information from the ST-Map is more difficult than from natural semantic images;
2.3) the goal of the proposed self-supervised pre-training is quite different from that of existing work.
In order to achieve the above purpose, the present invention adopts the following technical scheme:
the self-supervision pre-training method for remote physiological measurement based on the mask self-encoder utilizes the advantages of the mask self-encoder on a space-time diagram and training ViT to design a novel shielding self-supervision rPPG measurement method, which comprises the following specific contents:
step 1, detecting faces in the video with the open-source face detection software SeetaFace, locating 81 facial key points, generating a face bounding box from the 81 key points, and using the bounding box to align the face region and remove the background region;
step 2, dividing each face video frame obtained in step 1 (with the background removed) into 25 regions of interest (ROIs) and computing the average pixel value of each colour channel (R, G, B) in each region; the average colour values of the same block across different frames are concatenated into sequences, sequences from the same colour channel are spliced into pictures, and a large space-time diagram (ST-Map) is thus generated from the face video;
step 3, cropping and resizing the large space-time diagram obtained in step 2 to obtain square space-time diagrams;
step 4, masking the square space-time diagram obtained in step 3 and computing the retained space-time diagram patches;
step 5, inputting the retained space-time diagram patches into the ViT encoder to generate encoded space-time diagram feature vectors;
step 6, inputting the space-time diagram feature vectors together with the mask tokens into the ViT decoder, which predicts the missing patches of the ST-Map;
step 7, computing the reconstruction loss between the pixel values of the predicted patches and the corresponding positions of the original space-time diagram, and training with the newly designed loss function;
step 8, taking the ViT encoder trained in step 7 and inputting an unmasked space-time diagram into it to generate complete space-time diagram feature vectors;
step 9, inputting the complete feature vectors into the rPPG predictor and outputting the predicted rPPG signal;
step 10, training the ViT encoder and the rPPG predictor based on steps 8 and 9;
and step 11, inputting a space-time diagram into the trained ViT encoder and rPPG predictor to obtain the prediction result.
Preferably, the cropping of the large space-time diagram in step 3 specifically includes the following: the large space-time diagram is cut into small space-time diagrams with a fixed overlapping stride (s = 5), the crop length is controlled to be 224, and each resulting rectangular (224 × 25) space-time diagram is resized into a 224 × 224 square space-time diagram $X \in \mathbb{R}^{224 \times 224 \times 3}$.
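To make the cropping concrete, the following minimal Python sketch implements step 3 under stated assumptions: the large ST-Map is a (25, T_all, 3) array, the stride and crop length take the values above (s = 5, length 224), and OpenCV's resize stands in for whatever interpolation the actual implementation uses; function and variable names are illustrative only.

    import numpy as np
    import cv2  # assumed dependency; any bilinear resize would do

    def crop_and_square(big_map: np.ndarray, length: int = 224, stride: int = 5) -> np.ndarray:
        """Cut a large (25, T_all, 3) ST-Map into overlapping clips of `length`
        frames, then resize each (25, 224, 3) clip into a (224, 224, 3) square."""
        n_rois, t_all, _ = big_map.shape
        squares = []
        for start in range(0, t_all - length + 1, stride):
            clip = big_map[:, start:start + length, :].astype(np.float32)
            squares.append(cv2.resize(clip, (length, length)))  # (224, 224, 3)
        return np.stack(squares)  # (number of clips, 224, 224, 3)

For example, a 1344-frame video yields (1344 − 224) / 5 + 1 = 225 square maps.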
Preferably, step 4 specifically includes the following:
step 4.1, dividing the space-time diagram into non-overlapping patches of size $P \times P$, with $P = 16$;
step 4.2, randomly shuffling the patches obtained in step 4.1;
step 4.3, keeping a fixed proportion of the shuffled patches in order, removing the remaining patches, and computing the number of retained patches by the formula:

$N_{keep} = (1 - R_m) \left( \frac{T}{P} \right)^2$

where $R_m$ denotes the mask ratio of the masking process, $R_m = 75\%$; $T$ denotes the side length of the (square) space-time diagram, $T = 224$; and $P$ denotes the side length of a (square) patch, $P = 16$.
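A minimal sketch of the masking in step 4, assuming the square map from step 3 and the patch side P = 16 used above (consistent with the 768-dimensional patch vectors appearing later); names are illustrative:

    import numpy as np

    def mask_st_map(square_map: np.ndarray, p: int = 16, mask_ratio: float = 0.75, seed: int = 0):
        """Steps 4.1-4.3: patchify, shuffle, keep (1 - R_m) * (T / P)^2 patches."""
        rng = np.random.default_rng(seed)
        t, _, c = square_map.shape                   # (224, 224, 3)
        n_side = t // p                              # 14 patches per side
        n_patches = n_side ** 2                      # 196
        n_keep = int((1 - mask_ratio) * n_patches)   # 49, matching the formula above
        patches = (square_map.reshape(n_side, p, n_side, p, c)
                   .transpose(0, 2, 1, 3, 4)
                   .reshape(n_patches, p * p * c))   # (196, 768)
        order = rng.permutation(n_patches)           # step 4.2: random shuffle
        keep_idx, mask_idx = order[:n_keep], order[n_keep:]
        return patches[keep_idx], keep_idx, mask_idx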
preferably, the ViT encoder in step 5 includes a linear mapping layer with position coding and a plurality of transducer modules; the invention selects ViT basic version, which comprises 12 transducer modules, and the output dimension is 768. The input at this stage is the reserved patch in step 4
Figure SMS_8
The output of the ViT encoder is:
Figure SMS_9
wherein ,
Figure SMS_10
L k andD e representing the length of the input ST-Map sequence and the ViT encoded dimension, respectively; />
Figure SMS_11
Representing input patch data,/->
Figure SMS_12
A ViT encoder is shown.
Preferably, the ViT decoder described in step 6 comprises 8 Transformer modules with an output dimension of 128. Owing to the added mask tokens, the output length after the ViT decoder equals the number of patches in the whole ST-Map; specifically:

$F_d = D_{ViT}(F_e), \quad F_d \in \mathbb{R}^{L_{all} \times D_d}$

where $L_{all}$ denotes the length of the entire ST-Map sequence; $D_d$ denotes the output dimension of the ViT decoder; $F_e$ is the output of the ViT encoder; and $D_{ViT}(\cdot)$ denotes the ViT decoder. The default dimension does not match the number of pixel values in a patch, so the last layer of the ViT decoder is designed as a linear projection, and the mask tokens are reshaped into patches to obtain the required reconstructed ST-Map.
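The tensor shapes through the encoder-decoder pair can be traced with a toy PyTorch sketch; here nn.TransformerEncoder stands in for the actual ViT blocks, and positional embeddings and token un-shuffling are omitted, so this is a dimensional sketch rather than the patented architecture:

    import torch
    import torch.nn as nn

    P, C, D_E, D_D = 16, 3, 768, 128   # patch side, channels, encoder/decoder dims
    L_ALL, L_K = 196, 49               # all patches vs. retained patches

    embed = nn.Linear(P * P * C, D_E)  # patch embedding
    encoder = nn.TransformerEncoder(   # 12 blocks, dimension 768 (ViT base)
        nn.TransformerEncoderLayer(D_E, nhead=12, batch_first=True), num_layers=12)
    enc_to_dec = nn.Linear(D_E, D_D)
    mask_token = nn.Parameter(torch.zeros(1, 1, D_D))
    decoder = nn.TransformerEncoder(   # 8 blocks, dimension 128
        nn.TransformerEncoderLayer(D_D, nhead=8, batch_first=True), num_layers=8)
    to_pixels = nn.Linear(D_D, P * P * C)  # final linear projection

    x_keep = torch.randn(1, L_K, P * P * C)  # retained patches from step 4
    f_e = encoder(embed(x_keep))             # F_e: (1, 49, 768)
    tokens = torch.cat([enc_to_dec(f_e), mask_token.expand(1, L_ALL - L_K, D_D)], dim=1)
    f_d = to_pixels(decoder(tokens))         # reconstructed patches: (1, 196, 768)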
Preferably, the output of the ViT decoder is a series of vectors whose dimension equals the number of pixels of a patch; the pixel loss function computes only the mean square error (MSE) between the reconstructed image and the original image over the masked pixels:

$\mathcal{L}_{pixel} = MSE(\hat{Y}_m, Y_m)$

where $\hat{Y}_m$ denotes the masked pixel values predicted by the ViT decoder; $Y_m$ denotes the true masked pixel values of the ST-Map; and $MSE(\cdot)$ denotes the mean square error.

The reconstruction loss described in step 7 specifically refers to: the ViT encoder is made to learn the periodic characteristics of the BVP signal by reconstructing a new ST-Map, the loss being expressed as:

$\mathcal{L}_{rppg} = \frac{1}{C \cdot N_{ROI}} \sum_{c=1}^{C} \sum_{i=1}^{N_{ROI}} \left( 1 - PC(\hat{x}_{c,i}, x_{c,i}) \right)$

where $\hat{x}_{c,i}$ and $x_{c,i}$ denote the pixel values of one row of the reconstructed ST-Map and of the real ST-Map, respectively; $PC(\cdot)$ denotes the Pearson correlation; $C$ and $N_{ROI}$ are the number of channels and the number of ROIs, respectively, where $N_{ROI} = T$.

In summary, the overall loss function of the reconstruction stage is:

$\mathcal{L}_{pre} = \mathcal{L}_{pixel} + \lambda \mathcal{L}_{rppg}$

where the hyperparameter $\lambda \in \{0, 1\}$.
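The two reconstruction-stage losses admit a short sketch; it assumes the reconstructed and true ST-Maps are given as (C, N_ROI, T) tensors with time along the last axis and the mask as a boolean (N_ROI, T) array, and it reads the rPPG loss as the mean of 1 − PC over all rows, which is one plausible realisation of the formula above:

    import torch

    def pearson(a: torch.Tensor, b: torch.Tensor, eps: float = 1e-8) -> torch.Tensor:
        """Pearson correlation along the last (time) axis."""
        a = a - a.mean(dim=-1, keepdim=True)
        b = b - b.mean(dim=-1, keepdim=True)
        return (a * b).sum(-1) / (a.norm(dim=-1) * b.norm(dim=-1) + eps)

    def pretrain_loss(pred_map, true_map, mask, lam: float = 1.0) -> torch.Tensor:
        """L_pre = L_pixel (MSE over masked pixels) + lambda * L_rppg (row-wise Pearson)."""
        l_pixel = ((pred_map - true_map) ** 2)[:, mask].mean()
        l_rppg = (1 - pearson(pred_map, true_map)).mean()  # averaged over C x N_ROI rows
        return l_pixel + lam * l_rppg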
Preferably, the input to the trained ViT encoder described in step 8 is the complete set of patches of the ST-Map $X$; the output of the trained ViT encoder is:

$F = E_{ViT}^{pre}(X), \quad F \in \mathbb{R}^{L_{all} \times D_e}$

where $L_{all}$ and $D_e$ denote the length of the entire ST-Map sequence and the dimension of the ViT encoder, respectively, and $E_{ViT}^{pre}(\cdot)$ denotes the pre-trained ViT encoder.
Preferably, the rPPG predictor described in step 9 consists of one simple linear layer (Linear) and layer normalization (LayerNorm).
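Concretely, the predictor can be as small as the following sketch; the 768-dimensional encoder features and the one-scalar-per-token read-out are assumptions about how the signal is extracted:

    import torch.nn as nn

    # One scalar per encoded token; the flattened outputs form the predicted rPPG sequence.
    rppg_predictor = nn.Sequential(nn.LayerNorm(768), nn.Linear(768, 1))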
Preferably, step 10 specifically includes the following:
step 10.1, supervising the predicted rPPG signal with the negative Pearson correlation loss computed between the predicted rPPG signal and the real BVP signal; specifically:

$\mathcal{L}_{time} = 1 - PC(S_{pr}, S_{gt})$

where $S_{pr}$ and $S_{gt}$ denote the predicted rPPG signal and the real BVP signal, respectively, and $PC(\cdot)$ denotes the Pearson correlation;

step 10.2, further constraining the prediction with a frequency-domain loss, computed as the cross-entropy error between the true heart rate and the spectral distribution of the estimated rPPG signal; specifically:

$\mathcal{L}_{fre} = CE(HR_{gt}, PSD(S_{pr}))$

where $PSD(\cdot)$ denotes the power spectral density of the predicted rPPG signal; $CE(\cdot)$ denotes the cross-entropy loss; $HR_{gt}$ refers to the true heart rate, expressed as a one-hot vector $HR = [0, \dots, 0, 1, 0, \dots]$ in which the "1" sits at the index corresponding to the true heart rate; and $S_{pr}$ denotes the predicted signal;

step 10.3, combining steps 10.1 and 10.2, the overall loss function of the rPPG prediction stage is:

$\mathcal{L}_{pred} = \mathcal{L}_{time} + \gamma \mathcal{L}_{fre}$

where the parameter $\gamma \in \{0, 1\}$ is adjusted between different datasets; in the present invention we set $\gamma = 0$ on the VIPL-HR dataset and $\gamma = 1$ on the PURE and UBFC-rPPG datasets.
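A minimal sketch of the two prediction-stage losses; treating the FFT power spectrum directly as logits for the cross-entropy is an assumption about how CE(HR_gt, PSD(S_pr)) is realised, and hr_bin is the index of the true heart-rate frequency bin:

    import torch
    import torch.nn.functional as F

    def negative_pearson_loss(s_pr: torch.Tensor, s_gt: torch.Tensor) -> torch.Tensor:
        """L_time = 1 - PC(S_pr, S_gt) for (B, T) signals."""
        a = s_pr - s_pr.mean(dim=-1, keepdim=True)
        b = s_gt - s_gt.mean(dim=-1, keepdim=True)
        pc = (a * b).sum(-1) / (a.norm(dim=-1) * b.norm(dim=-1) + 1e-8)
        return (1 - pc).mean()

    def frequency_loss(s_pr: torch.Tensor, hr_bin: torch.Tensor) -> torch.Tensor:
        """L_fre = cross-entropy between the true-HR bin and the signal's spectrum."""
        psd = torch.fft.rfft(s_pr, dim=-1).abs() ** 2  # (B, T // 2 + 1) frequency bins
        return F.cross_entropy(psd, hr_bin)            # PSD used as logits (assumption)

    def prediction_loss(s_pr, s_gt, hr_bin, gamma: float = 1.0) -> torch.Tensor:
        return negative_pearson_loss(s_pr, s_gt) + gamma * frequency_loss(s_pr, hr_bin)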
Compared with the prior art, the invention provides a self-supervised pre-training method for remote physiological measurement based on a masked autoencoder, with the following beneficial effects:
(1) The invention proposes rPPG-MAE, which takes the ST-Map as input and uses a masked autoencoder (MAE) for self-supervised ViT pre-training. This is the first work on the rPPG task to explore self-supervised learning with the ST-Map as input on the challenging VIPL-HR dataset.
(2) The invention designs a new rPPG loss function to constrain the MAE pre-training task. The proposed rPPG loss is more suitable for pre-training than the pixel reconstruction loss employed in the original MAE, enabling ViT to learn the periodic information of the rPPG signal efficiently.
(3) In addition to the original ST-Map, the invention explores several rPPG task-related reconstruction targets. An ST-Map with band-pass filtering is proposed, which limits the frequency to the BVP signal range and helps the network learn useful periodic information.
(4) The method is unsupervised and does not need expensive manual labelling of datasets; it is therefore more economical than other methods and worth popularising.
(5) The invention has a wide range of application and can be extended to other monitoring methods to further improve performance.
Drawings
FIG. 1 is a schematic diagram of space-time diagram (ST-Map) generation in the self-supervised pre-training method for remote physiological measurement based on a masked autoencoder;
FIG. 2 is a flow chart of the design framework of the self-supervised pre-training method for remote physiological measurement based on a masked autoencoder according to the invention;
FIG. 3 shows an input original diagram, a masking effect diagram, and a reconstruction effect diagram in embodiment 1 of the invention;
FIG. 4 is a graph comparing the predicted rPPG signal with the actual BVP signal in embodiment 1 of the invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments.
The invention provides a self-supervised pre-training method for remote physiological measurement based on a masked autoencoder, supported by National Natural Science Foundation of China project 62171309, "Human micro-gesture recognition and emotion analysis based on self-supervised learning", and mainly aims to solve the following problems in the prior art:
rPPG is a typical computer vision task, and how to use a masked autoencoder to reduce the information redundancy and noise of the ST-Map and realise efficient rPPG measurement naturally becomes a focus of research. However, in early studies the masked autoencoder was used only to pre-train on natural images, such as the ImageNet dataset. In addition, a large gap exists between natural semantic images and the ST-Map:
1) The two kinds of images contain different physical information. A natural image contains only spatial information, where a cluster of pixels represents one object, whereas the ST-Map is a representation of the physiological signal in both the spatial and temporal domains.
2) Identifying valid information from the ST-Map is more difficult than from natural images. Extracting the rPPG signal from the ST-Map is relatively difficult because of the presence of much uncorrelated noise and only subtle physiological signals in the ST-Map.
3) The goal of the proposed self-supervised pre-training is quite different from that of existing work. The main purpose of self-supervised pre-training in rPPG is not to predict the masked pixel values to reconstruct the image, but to predict an image containing periodic physiological information similar to that of the real ST-Map.
In response to the above problems, the invention proposes rPPG-MAE, which takes the ST-Map as input and uses a masked autoencoder (MAE) for self-supervised ViT pre-training. To our knowledge, this is the first work to explore self-supervised learning with ST-Map input on challenging rPPG tasks, such as the less-constrained VIPL-HR dataset. Meanwhile, the invention designs a new rPPG loss function to constrain the MAE pre-training task. The proposed rPPG loss is more suitable for pre-training than the pixel reconstruction loss employed in the original MAE, enabling ViT to learn the periodic information of the rPPG signal efficiently. In addition to the original ST-Map, the invention explores several rPPG task-related reconstruction targets. An ST-Map with band-pass filtering is proposed, which limits the frequency to the BVP signal range and helps the network learn useful periodic information. In addition, the method is unsupervised, does not require expensive manual labelling of datasets, and is cheaper than other methods. Furthermore, it can be extended to other monitoring methods to further improve performance.
Based on the above description, the self-supervised pre-training method for remote physiological measurement based on a masked autoencoder provided by the invention specifically comprises the following steps:
example 1:
The invention provides a self-supervised pre-training method for remote physiological measurement based on a masked autoencoder. Referring to FIG. 1, which contains the generation schematics of the four ST-Maps:
We first align the faces in different frames according to the detected key points, and then divide the face region into n ROI blocks R1, R2, ..., R25. An average colour value is computed for each colour channel in each block. The average colour values of the same block across different frames are concatenated into sequences, i.e., R1, G1, B1, R2, G2, B2, ..., R25, G25, B25. Sequences from the same colour channel are spliced into a map of size n × T (one per channel R, G, B), where n = 25. Further, the CHROM and POS signals are obtained with the CHROM and POS algorithms, and the filtered signal is obtained with a Butterworth band-pass filter with a pass band of [0.6, 3] Hz. Finally, the differently combined signals are spliced into four ST-Maps (CHROM-ST-Map, POS-ST-Map, Filter-ST-Map, ST-Map).
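The per-frame ROI averaging and the band-pass variant can be sketched as follows; a uniform 5 × 5 grid over aligned face crops stands in for the landmark-derived ROI blocks (the real pipeline uses the 81 key points), and scipy supplies the Butterworth filter:

    import numpy as np
    from scipy.signal import butter, filtfilt

    def st_map_from_faces(faces: np.ndarray, grid: int = 5) -> np.ndarray:
        """faces: (T, H, W, 3) aligned crops, H and W divisible by `grid`.
        Returns an (n_rois, T, 3) ST-Map of per-ROI channel means (n_rois = 25)."""
        t, h, w, c = faces.shape
        blocks = faces.reshape(t, grid, h // grid, grid, w // grid, c)
        means = blocks.mean(axis=(2, 4))               # (T, 5, 5, 3)
        return means.reshape(t, grid * grid, c).transpose(1, 0, 2)

    def bandpass_rows(st_map: np.ndarray, fs: float = 30.0,
                      lo: float = 0.6, hi: float = 3.0, order: int = 4) -> np.ndarray:
        """Butterworth band-pass in the heart-rate range [0.6, 3] Hz along time,
        as used to build the Filter-ST-Map."""
        b, a = butter(order, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        return filtfilt(b, a, st_map, axis=1)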
The overall design flow of the invention is shown in FIG. 2 and can be divided into three modules:
1) ST-Map generation module. We denote the i-th input ST-Map as $X_i \in \mathbb{R}^{N \times T \times C}$, where N represents the number of ROIs, T represents the number of frames of a video segment, and C represents the number of channels (C = 3, comprising R, G and B). As shown in FIG. 2, we first generate a large ST-Map from the entire video, and then cut it into segments of length T with an overlap of s frames, so that the number of ST-Maps obtained from a video of $T_{all}$ frames is $\lfloor (T_{all} - T)/s \rfloor + 1$. After that, we resize the original ST-Map to $T \times T \times C$, so that the number of ROIs increases from N to T.
2) ST-Map reconstruction module. The reconstruction module consists mainly of a ViT encoder and a ViT decoder. The ViT encoder comprises a linear mapping layer with position encoding and a number of Transformer modules. The invention selects the ViT base version, which contains 12 Transformer modules with an output dimension of 768; the input at this stage is the set of retained patches $X_{keep}$ from step 4, and the output of the ViT encoder is $F_e = E_{ViT}(X_{keep}) \in \mathbb{R}^{L_k \times D_e}$, where $L_k$ and $D_e$ denote the length of the input ST-Map sequence and the ViT encoding dimension, respectively. The ViT decoder comprises 8 Transformer modules with an output dimension of 128. Owing to the added mask tokens, the output length after the ViT decoder equals the number of patches in the entire ST-Map; the output of the ViT decoder is $F_d = D_{ViT}(F_e) \in \mathbb{R}^{L_{all} \times D_d}$, where $L_{all}$ denotes the length of the entire ST-Map sequence and $D_d$ denotes the output dimension of the ViT decoder. However, the default dimension does not match the number of pixel values in a patch, so a linear projection is designed at the last layer of the decoder. In this way the mask tokens are reshaped into patches, and we can then obtain the desired reconstructed ST-Map. The loss is computed between the reconstructed ST-Map and the true ST-Map. The output of the ViT decoder is a series of vectors whose dimension equals the number of pixels of a patch; the pixel loss function computes only the mean square error (MSE) between the reconstructed image and the original image over the masked pixels:

$\mathcal{L}_{pixel} = MSE(\hat{Y}_m, Y_m)$

where $\hat{Y}_m$ denotes the masked pixel values predicted by the ViT decoder, $Y_m$ denotes the true masked pixel values of the ST-Map, and $MSE(\cdot)$ is the mean square error.
In order for the ViT encoder to learn the periodicity of the BVP signal, the invention proposes a new loss function:

$\mathcal{L}_{rppg} = \frac{1}{C \cdot N_{ROI}} \sum_{c=1}^{C} \sum_{i=1}^{N_{ROI}} \left( 1 - PC(\hat{x}_{c,i}, x_{c,i}) \right)$

where $\hat{x}_{c,i}$ and $x_{c,i}$ denote the pixel values of one row of the reconstructed ST-Map and of the real ST-Map, respectively; $PC(\cdot)$ denotes the Pearson correlation; $C$ and $N_{ROI}$ are the number of channels and the number of ROIs, respectively, where $N_{ROI} = T$.

In summary, the overall loss function of the reconstruction stage can be written as:

$\mathcal{L}_{pre} = \mathcal{L}_{pixel} + \lambda \mathcal{L}_{rppg}$

where the hyperparameter $\lambda \in \{0, 1\}$.
3) rPPG prediction module: this module consists of a ViT encoder and an rPPG predictor. The initialization weights of the ViT encoder in this module come from the pre-training performed during ST-Map reconstruction. The rPPG predictor consists of a linear layer (Linear) and layer normalization (LayerNorm). The original ST-Map is fed into the ViT encoder and then into the rPPG predictor, whose output is the predicted rPPG signal. The loss is computed between the predicted rPPG signal and the true BVP signal:

The negative Pearson correlation loss computed between the predicted rPPG signal and the real BVP signal can be expressed as:

$\mathcal{L}_{time} = 1 - PC(S_{pr}, S_{gt})$

where $S_{pr}$ and $S_{gt}$ denote the predicted rPPG signal and the real BVP signal, respectively, and $PC(\cdot)$ denotes the Pearson correlation.

In addition, a frequency-domain loss is used for better prediction; the cross-entropy error between the true heart rate and the spectral distribution of the estimated rPPG signal is computed as:

$\mathcal{L}_{fre} = CE(HR_{gt}, PSD(S_{pr}))$

where $PSD(\cdot)$ denotes the power spectral density of the predicted rPPG signal and $CE(\cdot)$ denotes the cross-entropy loss; $HR_{gt}$, the true heart rate, is expressed as a one-hot vector $HR = [0, \dots, 0, 1, 0, \dots]$ in which the "1" sits at the index corresponding to the true heart rate; $S_{pr}$ denotes the predicted signal.

In general, the overall loss function of the rPPG prediction stage can be written as:

$\mathcal{L}_{pred} = \mathcal{L}_{time} + \gamma \mathcal{L}_{fre}$

where the parameter $\gamma \in \{0, 1\}$ is adjusted between different datasets. In the present invention we set $\gamma = 0$ on the VIPL-HR dataset and $\gamma = 1$ on the PURE and UBFC-rPPG datasets.
Overall, the method is divided into three major steps: 1) generating the ST-Map; 2) reconstructing the ST-Map; 3) predicting the rPPG signal. The first step prepares the input data for the two subsequent parts; reconstructing the ST-Map pre-trains the ViT encoder weight parameters used for predicting the rPPG signal; and the required rPPG signal is finally obtained by the rPPG prediction module.
FIG. 3 is a visualization of the ST-Map reconstruction step; the reconstructed ST-Map is as close as possible to the original ST-Map.
FIG. 4 is a visualization of the predicted rPPG signal, which can be observed to be very close to the real BVP signal.
The present invention is not limited to the above-mentioned embodiments, and any person skilled in the art, based on the technical solution of the present invention and the inventive concept thereof, can be replaced or changed within the scope of the present invention.

Claims (9)

1. A self-supervised pre-training method for remote physiological measurement based on a masked autoencoder, characterized in that a novel masked self-supervised rPPG measurement method is designed by exploiting the advantages of the masked autoencoder on the space-time diagram and by training ViT, specifically comprising the following steps:
step 1, detecting faces in the video with face detection software, locating facial key points in the video, generating a face bounding box from the key points, and using the bounding box to align the face region and remove the background region;
step 2, dividing each face video frame obtained in step 1, aligned and with the background removed, into a plurality of regions of interest, and computing the average pixel value of each colour channel in each region; the average colour values of the same block across different frames are concatenated into sequences, and sequences from the same colour channel are spliced into pictures, so that a large space-time diagram is generated from the face video;
step 3, cropping and resizing the large space-time diagram obtained in step 2 to obtain square space-time diagrams;
step 4, masking the square space-time diagram obtained in step 3 and computing the retained space-time diagram patches;
step 5, inputting the retained space-time diagram patches into the ViT encoder to generate encoded space-time diagram feature vectors;
step 6, inputting the space-time diagram feature vectors together with the mask tokens into the ViT decoder, which predicts the missing patches of the space-time diagram;
step 7, computing the reconstruction loss between the pixel values of the predicted patches and the corresponding positions of the original space-time diagram, and training with the newly designed loss function;
step 8, taking the ViT encoder trained in step 7 and inputting an unmasked space-time diagram into it to generate complete space-time diagram feature vectors;
step 9, inputting the complete feature vectors into the rPPG predictor and outputting the predicted rPPG signal;
step 10, training the ViT encoder and the rPPG predictor based on steps 8 and 9;
and step 11, inputting a space-time diagram into the trained ViT encoder and rPPG predictor to obtain the prediction result.
2. The self-supervised pre-training method for remote physiological measurement based on a masked autoencoder according to claim 1, characterized in that the cropping of the large space-time diagram in step 3 specifically comprises the following: the large space-time diagram is cut into small space-time diagrams with a fixed overlapping stride, the crop length is controlled to be 224, and the resulting rectangular space-time diagrams are resized into 224 × 224 square space-time diagrams.
3. The self-supervised pre-training method for remote physiological measurement based on a masked autoencoder according to claim 1, characterized in that step 4 specifically comprises the following:
step 4.1, dividing the space-time diagram into non-overlapping patches;
step 4.2, randomly shuffling the patches obtained in step 4.1;
step 4.3, keeping a fixed proportion of the patches in order, removing the remaining patches, and computing the number of retained patches by the formula:

$N_{keep} = (1 - R_m) \left( \frac{T}{P} \right)^2$

where $R_m$ denotes the mask ratio of the masking process, $T$ denotes the side length of the space-time diagram, and $P$ denotes the side length of a patch.
4. The self-supervised pre-training method for remote physiological measurement based on a masked autoencoder according to claim 1, characterized in that the ViT encoder in step 5 comprises a linear mapping layer with position encoding and a number of Transformer modules; the output of the ViT encoder is:

$F_e = E_{ViT}(X_{keep}), \quad F_e \in \mathbb{R}^{L_k \times D_e}$

where $L_k$ and $D_e$ denote the length of the input space-time diagram sequence and the ViT encoding dimension, respectively; $X_{keep}$ denotes the input patch data; and $E_{ViT}(\cdot)$ denotes the ViT encoder.
5. The self-supervised pre-training method for remote physiological measurement based on a masked autoencoder according to claim 1, characterized in that the output length after the ViT decoder in step 6 equals the number of patches in the whole space-time diagram; specifically:

$F_d = D_{ViT}(F_e), \quad F_d \in \mathbb{R}^{L_{all} \times D_d}$

where $L_{all}$ denotes the length of the entire space-time diagram sequence; $D_d$ denotes the output dimension of the ViT decoder; $F_e$ is the output of the ViT encoder; and $D_{ViT}(\cdot)$ denotes the ViT decoder; the final layer of the ViT decoder is designed as a linear projection, and the mask tokens are reshaped into patches, thereby obtaining the required reconstructed space-time diagram.
6. The self-supervised pre-training method for remote physiological measurement based on a masked autoencoder according to claim 1, characterized in that the output of the ViT decoder is a series of vectors whose dimension equals the number of pixels of a patch, and the pixel loss function computes only the mean square error between the reconstructed image and the original image over the masked pixels; specifically:

$\mathcal{L}_{pixel} = MSE(\hat{Y}_m, Y_m)$

where $\hat{Y}_m$ denotes the masked pixel values predicted by the ViT decoder; $Y_m$ denotes the true masked pixel values of the space-time diagram; and $MSE(\cdot)$ denotes the mean square error;

the reconstruction loss described in step 7 specifically refers to: the ViT encoder is made to learn the periodic characteristics of the BVP signal by reconstructing a new space-time diagram, the loss being expressed as:

$\mathcal{L}_{rppg} = \frac{1}{C \cdot N_{ROI}} \sum_{c=1}^{C} \sum_{i=1}^{N_{ROI}} \left( 1 - PC(\hat{x}_{c,i}, x_{c,i}) \right)$

where $\hat{x}_{c,i}$ and $x_{c,i}$ denote the pixel values of one row of the reconstructed space-time diagram and of the real space-time diagram, respectively; $PC(\cdot)$ denotes the Pearson correlation; $C$ and $N_{ROI}$ are the number of channels and the number of ROIs, respectively, where $N_{ROI} = T$;

in summary, the overall loss function of the reconstruction stage is:

$\mathcal{L}_{pre} = \mathcal{L}_{pixel} + \lambda \mathcal{L}_{rppg}$

where the hyperparameter $\lambda \in \{0, 1\}$.
7. The self-supervised pre-training method for remote physiological measurement based on a masked autoencoder according to claim 1, characterized in that the input to the trained ViT encoder in step 8 is the complete set of patches of a space-time diagram; the output of the trained ViT encoder is:

$F = E_{ViT}^{pre}(X), \quad F \in \mathbb{R}^{L_{all} \times D_e}$

where $L_{all}$ and $D_e$ denote the length of the entire ST-Map sequence and the dimension of the ViT encoder, respectively; $X$ denotes the complete input data; and $E_{ViT}^{pre}(\cdot)$ denotes the pre-trained ViT encoder.
8. The self-supervised pre-training method for remote physiological measurement based on a masked autoencoder according to claim 1, characterized in that the rPPG predictor in step 9 consists of one simple linear layer and layer normalization.
9. The self-supervised pre-training method for remote physiological measurement based on a masked autoencoder according to claim 1, characterized in that step 10 specifically comprises the following:
step 10.1, supervising the predicted rPPG signal with the negative Pearson correlation loss computed between the predicted rPPG signal and the real BVP signal; specifically:

$\mathcal{L}_{time} = 1 - PC(S_{pr}, S_{gt})$

where $S_{pr}$ and $S_{gt}$ denote the predicted rPPG signal and the real BVP signal, respectively, and $PC(\cdot)$ denotes the Pearson correlation;

step 10.2, further constraining the prediction with a frequency-domain loss, computed as the cross-entropy error between the true heart rate and the spectral distribution of the estimated rPPG signal; specifically:

$\mathcal{L}_{fre} = CE(HR_{gt}, PSD(S_{pr}))$

where $PSD(\cdot)$ denotes the power spectral density of the predicted rPPG signal; $CE(\cdot)$ denotes the cross-entropy loss; $HR_{gt}$ refers to the true heart rate, expressed as a one-hot vector $HR = [0, \dots, 0, 1, 0, \dots]$ in which the "1" sits at the index corresponding to the true heart rate; and $S_{pr}$ denotes the predicted signal;

step 10.3, combining steps 10.1 and 10.2, the overall loss function of the rPPG prediction stage is:

$\mathcal{L}_{pred} = \mathcal{L}_{time} + \gamma \mathcal{L}_{fre}$

where the parameter $\gamma \in \{0, 1\}$ is adjusted between different datasets.
CN202310445533.XA 2023-04-24 2023-04-24 Self-supervision pre-training method for remote physiological measurement based on mask self-encoder Active CN116385837B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310445533.XA CN116385837B (en) 2023-04-24 2023-04-24 Self-supervision pre-training method for remote physiological measurement based on mask self-encoder


Publications (2)

Publication Number Publication Date
CN116385837A (en) 2023-07-04
CN116385837B (en) 2023-09-08

Family

ID=86967482

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310445533.XA Active CN116385837B (en) 2023-04-24 2023-04-24 Self-supervision pre-training method for remote physiological measurement based on mask self-encoder

Country Status (1)

Country Link
CN (1) CN116385837B (en)


Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210224983A1 (en) * 2018-05-16 2021-07-22 Mitsubishi Electric Research Laboratories, Inc. System and Method for Remote Measurements of Vital Signs of a Person in a Volatile Environment
US11227161B1 (en) * 2021-02-22 2022-01-18 Institute Of Automation, Chinese Academy Of Sciences Physiological signal prediction method
CN112580612A (en) * 2021-02-22 2021-03-30 中国科学院自动化研究所 Physiological signal prediction method
CN113343821A (en) * 2021-05-31 2021-09-03 合肥工业大学 Non-contact heart rate measurement method based on space-time attention network and input optimization
CN115841143A (en) * 2021-09-20 2023-03-24 辉达公司 Joint estimation of heart rate and respiration rate using neural networks
CN114821439A (en) * 2022-05-10 2022-07-29 合肥中聚源智能科技有限公司 Token learning-based face video heart rate estimation system and method
CN114912487A (en) * 2022-05-10 2022-08-16 合肥中聚源智能科技有限公司 End-to-end remote heart rate detection method based on channel enhanced space-time attention network
CN115024706A (en) * 2022-05-16 2022-09-09 南京邮电大学 Non-contact heart rate measurement method integrating ConvLSTM and CBAM attention mechanism
CN115331073A (en) * 2022-07-26 2022-11-11 华中师范大学 Image self-supervision learning method based on TransUnnet architecture
CN115311728A (en) * 2022-09-06 2022-11-08 杭州登虹科技有限公司 ViT network-based multi-stage training method for face key point detection model
CN115590515A (en) * 2022-09-28 2023-01-13 上海零唯一思科技有限公司(Cn) Emotion recognition method and system based on generative self-supervision learning and electroencephalogram signals
CN115497143A (en) * 2022-10-10 2022-12-20 南京大学 Non-contact multi-modal physiological signal detection method based on self-supervision and lifelong learning
CN115813408A (en) * 2022-11-25 2023-03-21 华中科技大学 Self-supervision learning method of transform encoder for electroencephalogram signal classification task

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
He, Kaiming et al., "Masked Autoencoders Are Scalable Vision Learners", arXiv *
Jaiswal, Kokila Bharti et al., "Heart rate estimation network from facial videos using spatiotemporal feature image", Computers in Biology and Medicine *
Zhao, Changchen et al., "Face detection and tracking for remote photoplethysmography" (面向远程光体积描记的人脸检测与跟踪), Journal of Image and Graphics (中国图象图形学报) *

Also Published As

Publication number Publication date
CN116385837B (en) 2023-09-08


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant