WO2019203106A1 - Pulse rate estimation apparatus, pulse rate estimation method, and computer-readable storage medium - Google Patents
- Publication number
- WO2019203106A1 (PCT/JP2019/015742)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- pulse
- noise
- sub
- pulse rate
- roi
- Prior art date
Classifications
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/02—Detecting, measuring or recording pulse, heart rate, blood pressure or blood flow; Combined pulse/heart-rate/blood pressure determination; Evaluating a cardiovascular condition not otherwise provided for, e.g. using combinations of techniques provided for in this group with electrocardiography or electroauscultation; Heart catheters for measuring blood pressure
- A61B5/024—Detecting, measuring or recording pulse rate or heart rate
- A61B5/0245—Detecting, measuring or recording pulse rate or heart rate by using sensing means generating electric signals, i.e. ECG signals
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/0059—Measuring for diagnostic purposes; Identification of persons using light, e.g. diagnosis by transillumination, diascopy, fluorescence
- A61B5/0077—Devices for viewing the surface of the body, e.g. camera, magnifying lens
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/72—Signal processing specially adapted for physiological signals or for diagnostic purposes
- A61B5/7203—Signal processing specially adapted for physiological signals or for diagnostic purposes for noise prevention, reduction or removal
Definitions
- The present invention relates to an apparatus and a method for robust physiological pulse rate estimation from videos of a human face, and to a computer-readable storage medium storing a program for realizing these.
- Heart rate (HR) measurements are especially important because it has been shown that human psychological states such as stress, arousal, and sleepiness can be estimated from HR. HR is usually measured by contact-based means, especially electrocardiography; for the aforementioned applications, however, continuous and simpler measurement is necessary. To this end, HR measurement techniques that use videos captured with commonly used cameras have been proposed in recent years.
- The processes of a pulse rate measurement system can be broadly categorized into three steps. The first is to capture a video of a human face using a video capturing device. The second is to extract the required pulse signal from a region of interest (ROI) on the face; this signal essentially represents periodic fluctuations in the color of the face caused by physiological activity (e.g. the periodic beating of the heart), henceforth referred to as pulse fluctuations. The third is to estimate the pulse rate frequency by performing frequency analysis on the extracted pulse signal.
- Noise from background illumination changes, rigid head motion, and changes in facial expression often affects the performance of pulse extraction. Such noise can corrupt the pulse signal so that the expected pulse fluctuations cannot be captured effectively, which negatively affects the performance (e.g. accuracy) of pulse rate estimation methods.
- Prior art that deals with heart rate estimation on face videos is disclosed in Patent Literature 1 and is illustrated as a block diagram in Figure 7.
- This prior art uses self-adaptive matrix completion on chrominance features to extract fluctuations (due to cardiac activity) from a color video of a human face in the presence of head movement, facial expressions, and background illumination changes.
- The important steps in this prior art are: a) generate facial feature points by tracking the face in the video (602); b) inside the ROI selection and pulse extraction unit 603, illustrated in greater detail in Figure 8: i) use the facial feature points to select a region of interest on the face and divide it into sub-regions (702); ii) extract a pulse signal (chrominance) from each sub-region (703); iii) create a binary ROI mask to remove sub-regions with high local (temporal) variance of the pulse signal (704); iv) perform matrix completion to estimate a low-rank matrix that approximates the ROI-masked chrominance signal matrix, and combine linearly dependent rows to extract the cardiac pulse signal (705); c) perform frequency analysis (604) on the extracted pulse signal to estimate the heart rate (HR).
- Patent Literature 1 employs matrix completion as a method for replacing the removed noisy ROI sub-regions on the face and groups the linearly dependent rows together because the pulse fluctuations are supposed to be present in all sub-regions (corrupted by varying amounts of noise).
- The method for identifying noisy regions (the prior art considers regions with high pulse intensity variance to be noisy) and the subsequent removal of noise, however, is sub-optimal and contributes to the low HR estimation accuracy of the prior art.
- The main reason for this sub-optimality is that the prior art does not adapt its noise localization and noise removal processes to the cause of the noise, and hence performs poorly.
- Prior art Patent Literature 1 uses the local variance of pulse signal intensity as a measure of the noise present in each ROI sub-region and relies on chrominance value to extract pulse signal.
- The chrominance signal is a linear combination of the three color channels (with fixed coefficients) that is intended to be orthogonal to the direction of white light, but it suffers from the standardized-skin-tone assumption used to derive the fixed coefficients. Contrary to that assumption, not all people have a similar skin tone, so the chrominance signal is not truly orthogonal to the direction of white light for many skin colors, which is one reason it fails to extract the pulse signal effectively in such cases.
- the local variance approach fails on another account.
- the regions which contain bright reflection (glare) spots or dark (shadow) spots, due to a certain placement of the light source, are usually the ones with the lowest light intensity variance. This is because without face movement, the glare/shadow artifact has a nearly constant intensity value. Consequently, such regions also contain the lowest amount of pulse information.
- A low-local-variance approach, such as the one implemented in the prior art, is likely to select such ROI sub-regions in the absence of head motion, and this can be detrimental to HR estimation accuracy.
- For cases where facial expression (non-rigid noise) is involved, the local variance of pulse intensity is not a good measure of noise: the non-rigid deformation of the face during an expression introduces noise in most regions of the face, so most regions have a high variance due to an unpredictable amount of noise. Most of the non-rigid noise lies in the direction of white light, and the prior art’s pulse signal is not orthogonal to it.
- The chrominance signal is designed to cancel noise fluctuations when noise is dominant and to reinforce pulse fluctuations when they are dominant, but it fails to do either when the strengths of the noise and the cardiac fluctuations are comparable.
- Patent Literature 2 and Patent Literature 3 describe prior art that performs spectral peak tracking for pulse rate measurements made with wearable/contact-based sensors, as illustrated by the block diagram in Figure 8.
- A first problem of HR estimation in the prior art is the deterioration of estimation accuracy.
- The reason for the first problem is that several noise sources, mainly rigid and non-rigid motion performed by the person under observation, introduce complex corruption into the observed signal and make it difficult to estimate HR, yet the prior art does not differentiate between the noise sources.
- A second problem is that HR estimation accuracy deteriorates in the presence of head motion.
- The prior art uses local pulse intensity variance to identify noisy sub-regions, but fails to sufficiently correct for head motion noise because a displacement of the face introduces noise fluctuations into the pulse observed from all regions.
- A third problem is that HR estimation accuracy deteriorates in the absence of head motion.
- The reason for this problem is that the prior art uses local pulse intensity variance to identify noisy sub-regions and hence selects some glare/shadow spots, which show low local variance in light intensity but do not contain useful pulse information.
- A fourth problem is that HR estimation accuracy deteriorates in the presence of facial expression.
- The prior art uses local pulse intensity variance to identify noisy sub-regions, but here, since the face undergoes deformation and the area under observation changes, unpredictable noise fluctuations are introduced in most regions.
- The chrominance pulse is not orthogonal to the direction of white light, and the identification of noisy regions becomes difficult because noise is present in most of them.
- A fifth problem is that HR estimation accuracy deteriorates in the presence of subtle head motions or facial expressions.
- The reason for this problem is that, when the strength of the noise fluctuations is similar to that of the pulse fluctuations, the chrominance value is a poor representation of the pulse signal, because the linear combination can neither sufficiently suppress the noise nor sufficiently emphasize the pulse fluctuations.
- A sixth problem is that HR estimation accuracy deteriorates in the presence of severe head motions or facial expressions.
- The reason for this problem is that, when the noise is dominant over the pulse signal, the prior art is unable to distinguish between the noise frequency and the pulse rate because of the naive frequency analysis used to estimate the pulse rate from the extracted pulse signal.
- Patent Literature 2 and Patent Literature 3 have the problem that an additional sensor (e.g. an accelerometer) 804 is required to detect the presence and magnitude of noise.
- Such sensors also help the prior arts Patent Literature 2 and Patent Literature 3 to estimate the effect of noise on the observed pulse signal.
- One example of an object of the present invention is to provide a pulse rate estimation apparatus, pulse rate estimation method, and a computer-readable storage medium according to which the above-described problems are eliminated.
- The present invention uses a noise-source identification step to detect the presence of noise (both head motion and facial expression, which a wearable pulse sensor does not need to deal with) and performs spectral peak tracking without needing an additional motion sensor to predict the effect of noise on the observed pulse signal.
- A pulse rate estimation apparatus includes: a video capturing unit that captures a video of a human body part from which a pulse signal, which is the direct effect of a certain physiological process, is to be extracted; a body-part tracking unit that detects a specific human body part and generates feature points indicating important structural landmarks on the body part for each frame in which the body part is detected; a noise source detector that identifies the noise source and assigns labels to each video; an ROI selection and pulse signal extraction unit that selects an ROI and divides it into sub-regions, extracts a pulse signal from each sub-region, creates a label-dependent ROI filter to assign each ROI sub-region a weight proportional to the amount of useful pulse information present in it, and applies label-dependent noise estimation and correction to the pulse signals extracted from each sub-region; a pulse rate estimation unit that combines the extracted pulse signals and the ROI filter created in the previous step to form the final noise-suppressed pulse signal, performs frequency analysis on that noise-suppressed pulse signal, and generates pulse rate candidates; a spect
- A pulse rate estimation method includes: (a) a step of capturing a video of a human body part from which a pulse signal, which is the direct effect of a certain physiological process, is to be extracted; (b) a step of detecting a specific human body part and generating feature points indicating important structural landmarks on the body part for each frame in which the body part is detected; (c) a step of identifying the noise source and assigning labels to each video; (d) a step of selecting an ROI and dividing it into sub-regions, extracting a pulse signal from each sub-region, creating a label-dependent ROI filter to assign each ROI sub-region a weight proportional to the amount of useful pulse information present in it, and applying label-dependent noise estimation and correction to the pulse signals extracted from each sub-region; (e) a step of combining the extracted pulse signals and the ROI filter created in the previous step to form the final noise-suppressed pulse signal, performing frequency analysis on that noise-suppressed pulse signal, and generating pulse rate candidates; (a) a step of capturing a video of a human body part
- A computer-readable recording medium has recorded therein a program, and the program includes an instruction to cause the computer to execute: (a) a step of capturing a video of a human body part from which a pulse signal, which is the direct effect of a certain physiological process, is to be extracted; (b) a step of detecting a specific human body part and generating feature points indicating important structural landmarks on the body part for each frame in which the body part is detected; (c) a step of identifying the noise source and assigning labels to each video; (d) a step of selecting an ROI and dividing it into sub-regions, extracting a pulse signal from each sub-region, creating a label-dependent ROI filter to assign each ROI sub-region a weight proportional to the amount of useful pulse information present in it, and applying label-dependent noise estimation and correction to the pulse signals extracted from each sub-region; (e) a step of combining the extracted pulse signals and the ROI filter created in the previous step to form the final noise-suppressed pulse signal
- According to the present invention, it is possible to improve HR estimation accuracy in the presence of rigid and/or non-rigid motion noise.
- the identification of noise source makes it possible to improve HR estimation accuracy in the presence of rigid and non-rigid motion noise of varying degrees and this can be done without the need of an additional motion sensor.
- A block diagram illustrating an example of the primary embodiment of the present invention.
- A flowchart illustrating an example of the primary embodiment of the present invention.
- A block diagram illustrating the noise source detection step of the primary embodiment of the present invention.
- A block diagram detailing the ROI selection and pulse signal extraction step of the primary embodiment of the present invention.
- A block diagram illustrating the spectral peak tracking step of the primary embodiment of the present invention.
- A block diagram illustrating an example of a computer that realizes the pulse rate estimation apparatus according to an embodiment of the present invention.
- A block diagram of prior art PTL1: self-adaptive matrix completion and adaptive ROI selection for HR pulse extraction on a human face.
- A block diagram detailing the ROI selection and pulse signal extraction step of prior art PTL1.
- FIG. 1 is a block diagram schematically showing the configuration of the pulse rate estimation apparatus according to the embodiment of the present invention.
- the pulse rate estimation apparatus 100 includes a video capturing unit 101, a face tracker (a body-part tracking unit) 102, a noise source detector 103, a ROI selection and pulse signal extraction unit 104, a pulse rate estimation unit 105, and a spectral peak tracker 106.
- The pulse rate estimation apparatus 100 outputs an estimated pulse rate (HR) 1061.
- The face tracker 102 in the primary embodiment of the present invention tracks a human face, but pulse rate estimation can be performed on many other body parts where the skin is visible, such as a hand or an ear. In other embodiments of the method, unit 102 can therefore be a hand tracker, an ear tracker, or the like.
- The video capturing unit 101 captures a video of a human body part from which a pulse signal, which is the direct effect of a certain physiological process, is to be extracted.
- The body-part tracking unit 102 detects a specific human body part and generates feature points indicating important structural landmarks on the body part for each frame in which the body part is detected.
- The noise source detector 103 identifies the noise source and assigns labels to each video.
- The ROI selection and pulse signal extraction unit 104 selects an ROI and divides it into sub-regions, extracts a pulse signal from each sub-region, creates a label-dependent ROI filter to assign each ROI sub-region a weight proportional to the amount of useful pulse information present in it, and applies label-dependent noise estimation and correction to the pulse signals extracted from each sub-region.
- The pulse rate estimation unit 105 combines the extracted pulse signals and the ROI filter created in the previous step to form the final noise-suppressed pulse signal, performs frequency analysis on that noise-suppressed pulse signal, and generates pulse rate candidates.
- The spectral peak tracker 106 selects, for each video frame, a pulse rate value from the list of pulse rate candidates generated in the previous step that is consistent over time with the other pulse rate values.
- Figure 2 illustrates the flowchart of the first embodiment of the present invention. This Figure illustrates the process implemented to estimate heart rate from a video of a person’s face. Figure 1 will be referred to as needed in the following description.
- the pulse rate estimation method is carried out by allowing the pulse rate estimation apparatus 100 to operate. Accordingly, the description of the pulse rate estimation method of the present embodiment will be substituted with the following description of operations performed by the pulse rate estimation apparatus 100.
- the video capturing unit 101 captures video of a human face (Step A1).
- the face tracker 102 tracks the face of the person being observed to output face feature points for each video frame (Step A2).
- the noise source is identified by noise source detector 103 (in Figure 1).
- an ROI is selected and divided into several sub-regions for the localization of noise and pulse information (Step A3).
- a pulse signal is extracted from each ROI sub-region (Step A4).
- The noise source detector 103 identifies the noise source and assigns a label to each video frame (Step A5).
- the frame label assigned by noise source detector 103 is used to determine the dynamic ROI filtering process applied to the set of extracted sub-regional pulse signals.
- Steps A6-A8, A11-A13, A14-A16 are performed.
- the pulse rate estimation unit 105 generates a set of pulse rate estimate candidates for each video frame, which are classified as reliable or noisy estimates by the spectral peak tracker 106, depending on the frame label (Steps A9 and A17).
- the spectral peak tracker 106 then performs spectral peak tracking to select the correct pulse rate frequency from the set of noisy pulse rate estimate candidates (Step A18) and outputs the final pulse rate estimates for each video frame (Step A19). This flow process is discussed in greater detail below, starting from the top.
- FIG. 3 illustrates a block diagram for the noise source detection and frame label assignment step of the present embodiment.
- the face tracker 102 creates a stream of facial feature points 3011 for each video frame 3012 where the face of a person is observed.
- A number of facial feature points 3011 are generated on landmarks of the face such as, but not limited to, the nose, the eyes, the corners of the lips, and the boundary of the face. These facial feature points 3011 are generated for each frame in the video 3012 where a face is detected (with the help of face detection software).
- the facial feature points 3011 are then fed to the noise source detector 103, which analyzes the movement of individual feature points with respect to time, as well as with respect to other feature points in order to detect whether the person in the video is performing (voluntary) head motions or if there are any facial expressions present on the face.
- small involuntary vertical head motion has been shown to contain HR information in the past in the following reference.
- We are only concerned with large voluntary head motion, referred to simply as head motion hereafter, because such large head motion can corrupt the cardiac fluctuation pulse extracted by the pulse extractor.
- The position of the feature point on the nose is observed over time for head motion detection, because this feature point is fairly visible in all video frames that contain a face and is not severely affected by changes in facial expression. Since most facial expressions involve movement of the mouth/lips, facial expression detection observes the positions of the facial feature points on and around the lips relative to feature points that are least affected by expression changes, such as those on the nose and the forehead. If neither facial expressions nor head motion are observed in a sequence of frames, these frames are assigned the label “still” to signify the absence of any corruption of the pulse signal due to head motion or facial expressions. These frame labels are fed into the ROI selection and pulse extraction unit 104 and the spectral peak tracker 106, both of which are described in detail below.
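- The labeling rule described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function name `label_frames`, the pixel thresholds, the use of a single lip point, and the choice to let head motion take priority over expression are all hypothetical.

```python
from math import hypot

def label_frames(nose, lip, motion_thresh=5.0, expr_thresh=2.0):
    """Assign "motion", "expression" or "still" to each frame.

    nose, lip: per-frame (x, y) positions from a face tracker (hypothetical input).
    Thresholds are in pixels and are illustrative assumptions only.
    """
    labels = ["still"]  # the first frame has no previous frame to compare with
    for t in range(1, len(nose)):
        # Head motion: frame-to-frame displacement of the nose point.
        head = hypot(nose[t][0] - nose[t - 1][0], nose[t][1] - nose[t - 1][1])
        # Expression: change of the lip position measured relative to the nose.
        rel_now = (lip[t][0] - nose[t][0], lip[t][1] - nose[t][1])
        rel_prev = (lip[t - 1][0] - nose[t - 1][0], lip[t - 1][1] - nose[t - 1][1])
        expr = hypot(rel_now[0] - rel_prev[0], rel_now[1] - rel_prev[1])
        if head > motion_thresh:      # assumption: motion is checked first
            labels.append("motion")
        elif expr > expr_thresh:
            labels.append("expression")
        else:
            labels.append("still")
    return labels
```

Measuring the lip point relative to the nose means that a pure translation of the whole face changes `head` but leaves `expr` at zero, which mirrors the separation of rigid and non-rigid noise in the text.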
- FIG. 4 illustrates a detailed block diagram of the ROI selection and pulse signal extraction unit of the present invention. Facial feature points 4011 and video frames 4012 are fed into the ROI selection and pulse signal extraction unit 104 (in Figure 1) by the face tracker 102 (in Figure 1). As shown in FIG. 4, the ROI selection and pulse signal extraction unit 104 includes a fixed ROI selector 402, a pulse extractor 403, a dynamic ROI filter generation unit 404, and a noise correction unit 405.
- The facial feature points 4011 are used to select an ROI on the video frame 4012, and this ROI is divided into several parts.
- a pulse signal is extracted from each sub-region by the pulse extractor 403 and these pulse signals are fed into the dynamic ROI filter 404, which also takes as input, frame labels from the noise source detector 103 (in Figure 1).
- This noise source detector 103 uses the positions of the facial feature points 4011 to detect the kind of noise (head motion, facial expression, or neither) present in the video frames 4012 and assigns them a label (“motion”, “expression”, or “still”, respectively), which is fed into the dynamic ROI filter generation unit 404 and the noise correction unit 405.
- the dynamic ROI filter generation unit 404 then creates a noise source-specific ROI filter to emphasize ROI sub-regions with useful pulse information and suppress those with noise.
- the noise correction unit 405 combines the pulse signals from each ROI sub-region using the ROI filter generated in the previous step and then performs noise-source specific noise correction steps to generate a noise-free pulse signal 4051 which is fed into the pulse rate estimation unit 105 (in figure 1) for pulse rate estimation and subsequent spectral peak tracking steps.
- the position and size of the ROI generated by the Fixed ROI selector 402 is decided based on the facial feature point on the nose as well as those on the face boundary.
- The ROI is a rectangular block centered at the center of the nose (the nose feature point); its height and width are determined so that it lies between the eyes and the lips and covers the width of the face. This ROI is divided into many rectangular sub-regions of equal dimensions so that the face regions can be analyzed locally and HR information extracted from the areas least affected by noise.
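- The ROI layout described above can be sketched as follows. The function name `roi_grid`, its arguments, and the 3 x 4 grid size are assumptions made for illustration; the text does not fix the number of sub-regions or how the landmark coordinates are obtained.

```python
def roi_grid(nose, eye_y, lip_y, face_left, face_right, rows=3, cols=4):
    """Return (x0, y0, x1, y1) boxes of equal-sized sub-regions.

    The ROI is centred on the nose point and clipped so that it stays
    between the eye line and the lip line and within the face boundary.
    """
    nx, ny = nose
    half_h = min(ny - eye_y, lip_y - ny)            # stay between eyes and lips
    half_w = min(nx - face_left, face_right - nx)   # stay inside the face width
    x0, y0 = nx - half_w, ny - half_h
    cell_w, cell_h = 2 * half_w / cols, 2 * half_h / rows
    return [
        (x0 + c * cell_w, y0 + r * cell_h,
         x0 + (c + 1) * cell_w, y0 + (r + 1) * cell_h)
        for r in range(rows)
        for c in range(cols)
    ]
```

Boxes are returned row by row, so the first box is the top-left sub-region and the last is the bottom-right one.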
- The pulse extractor 403 calculates, for each ROI sub-region, the mean of the green channel values over all pixels of the sub-region; the series of these means over a moving window of a few seconds (4-10 seconds) represents the pulse signal of the sub-region.
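- A minimal sketch of this extraction step is given below. It assumes frames are nested lists of `(r, g, b)` pixels and that a sub-region is an integer pixel box; the names `green_mean` and `subregion_pulse` are hypothetical. The window length in frames would be the window duration in seconds multiplied by the frame rate.

```python
def green_mean(frame, box):
    """Mean green value over all pixels of one sub-region of one frame."""
    x0, y0, x1, y1 = box
    values = [frame[y][x][1] for y in range(y0, y1) for x in range(x0, x1)]
    return sum(values) / len(values)

def subregion_pulse(frames, box, window):
    """Pulse signal of a sub-region: green means over the last `window` frames."""
    return [green_mean(f, box) for f in frames[-window:]]
```

For example, a 5-second window on a 30 frames-per-second video corresponds to `window=150`.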
- Each sub-region in the ROI is assigned a weight by the ROI selector; the more useful the information present in the pulse extracted from a sub-region, the larger the weight assigned to it and the greater its contribution to pulse rate estimation.
- the usefulness of a sub-region increases with the amount of pulse information (physiological activity-related fluctuations) it contains, or alternatively, the usefulness decreases with the amount of head-motion/facial expression corruption present in the pulse signal of the sub-region. This is important because head motion and facial expression changes cause unpredictable fluctuations in the light reflected by various regions on the face.
- the head motion/facial expression corruption is present in different magnitudes in different parts of the face.
- The ROI sub-regions serve the purpose of dividing the face into several small parts in order to find the parts that have accumulated the least corruption.
- The corruptions introduced by head motion, which is a rigid motion, are different from those introduced by facial expressions, and hence the sub-regions affected by each are determined using a different algorithm. The subsequent method of pulse rate estimation therefore differs for frames with different labels.
- For frames labeled “still”, the dynamic ROI filter 404 assigns a weight to each sub-region which is inversely proportional to the local variance of the pulse signal in that region. This filtering emphasizes sub-regions with small changes in the pulse signal amplitude, because physiological activity-related fluctuations are small in magnitude compared with the corruptions caused by background illumination changes or other unaccounted factors.
- the noise correction unit 405 simply takes a ROI filter-weighted average of the sub-regional pulse signals and then performs glare/shadow removal to remove the effect of any glare or shadow artifact on the pulse rate estimation process.
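- The inverse-variance weighting and the filter-weighted average described above can be sketched as follows. The function names, the normalization of the weights to sum to one, and the small constant `eps` are assumptions; the glare/shadow removal step is not specified in enough detail here to be reproduced and is omitted.

```python
from statistics import pvariance

def inverse_variance_weights(signals, eps=1e-9):
    """Weight each sub-region by 1 / (temporal variance of its pulse signal)."""
    raw = [1.0 / (pvariance(s) + eps) for s in signals]  # eps avoids division by zero
    total = sum(raw)
    return [w / total for w in raw]                      # normalise to sum to 1

def combine(signals, weights):
    """ROI filter-weighted average of the sub-regional pulse signals."""
    length = len(signals[0])
    return [sum(w * s[t] for w, s in zip(weights, signals)) for t in range(length)]
```

Note that a sub-region whose signal is almost constant, such as a glare or shadow spot, receives a very large weight under this rule, which is consistent with the text following the weighting with a separate glare/shadow removal step.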
- For frames labeled “motion”, the dynamic ROI filter 404 likewise assigns a weight to each sub-region which is inversely proportional to the local variance of the pulse signal in that region.
- This step is similar to the region selection step used by prior art PTL1 to select useful ROI sub-regions. However, this step alone is not sufficient to remove head motion corruption, so the ROI filter-weighted average of the sub-regional pulse signals is subjected to a motion correction step by the noise correction unit 405. In this motion correction step, the projection of the horizontal head motion in the direction of the pulse signal is subtracted from the combined (ROI-filtered) pulse signal to obtain a noise-free pulse signal 4051.
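- One possible reading of this motion correction step is a least-squares projection: the component of the combined pulse signal that is explained by the horizontal head motion trace is estimated and subtracted. The sketch below follows that reading; the function name and the mean removal are assumptions, and the text does not confirm that this is the exact computation used.

```python
def remove_motion_component(pulse, motion):
    """Subtract from `pulse` its least-squares projection onto `motion`.

    pulse:  combined (ROI-filtered) pulse signal, one value per frame.
    motion: horizontal head position (e.g. nose x-coordinate) per frame.
    Both series are mean-removed first (an assumption of this sketch).
    """
    p_mean = sum(pulse) / len(pulse)
    m_mean = sum(motion) / len(motion)
    p = [v - p_mean for v in pulse]
    m = [v - m_mean for v in motion]
    energy = sum(v * v for v in m)
    if energy == 0:                       # no head motion observed
        return p
    gain = sum(a * b for a, b in zip(p, m)) / energy
    return [a - gain * b for a, b in zip(p, m)]
```

After the subtraction the result is orthogonal to the motion trace, so any fluctuation that is linearly explained by head displacement no longer appears in the output.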
- For frames labeled “expression”, the dynamic ROI filter unit assigns weights to each sub-region according to its estimated usefulness in the HR extraction process.
- A first method for emphasizing useful pulse fluctuations in the presence of facial expressions is to assign high weights to the sub-regions with the least local variance in the hue channel values. Since the light intensity values in most sub-regions contain large facial expression corruption, the hue value, which lies in the direction orthogonal to the light intensity, is chosen to estimate the usefulness of the ROI sub-regions. The hue value experiences less corruption from external factors than the light intensity value, and hence, in this method, the dynamic ROI filter 404 assigns weights inversely proportional to the local variance of the hue value in each sub-region.
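- The hue-based weighting can be sketched as follows. Because hue is an angle and skin hues sit close to the wrap-around point of the hue circle, this sketch swaps in the circular variance in place of the ordinary variance named in the text; that substitution, the function names, and the constant `eps` are assumptions of the example.

```python
import colorsys
from math import cos, sin, pi, hypot

def hue_circular_variance(rgb_series):
    """Circular variance of hue (0 means constant hue).

    rgb_series: mean (r, g, b) of one sub-region per frame, values in [0, 1].
    """
    hues = [colorsys.rgb_to_hsv(r, g, b)[0] for r, g, b in rgb_series]
    c = sum(cos(2 * pi * h) for h in hues) / len(hues)
    s = sum(sin(2 * pi * h) for h in hues) / len(hues)
    return 1.0 - hypot(c, s)

def hue_weights(subregion_rgb, eps=1e-6):
    """Weights inversely proportional to the hue variance of each sub-region."""
    raw = [1.0 / (hue_circular_variance(s) + eps) for s in subregion_rgb]
    total = sum(raw)
    return [w / total for w in raw]
```

A sub-region whose mean color keeps the same hue over the window receives nearly all of the weight, while one whose hue swings between frames is suppressed.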
- A second method for emphasizing useful pulse fluctuations in the presence of facial expressions is to assign high weights to the sub-regions with the lowest maximum value of the pulse signal. Since the light intensity values in most sub-regions contain large facial expression corruption, most sub-regions have a high local variance of the pulse signal. However, the pulse signal extracted from sub-regions with larger corruption undergoes larger fluctuations (deviations from the average observed value). So, the larger the facial expression corruption, the larger the maximum value of the pulse signal extracted from that sub-region. Hence, in this second method of identifying the useful sub-regions in the presence of facial expressions, the dynamic ROI filter 404 assigns weights inversely proportional to the local maximum of the pulse signal extracted from each sub-region.
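- A minimal sketch of the second method follows. It interprets the "maximum value of pulse signal" as the largest absolute deviation from the window mean, in line with the parenthetical remark about deviation from the average observed value; that interpretation, the function name, and `eps` are assumptions.

```python
def inverse_peak_weights(signals, eps=1e-9):
    """Weights inversely proportional to the peak deviation of each pulse signal."""
    raw = []
    for s in signals:
        mean = sum(s) / len(s)
        peak = max(abs(v - mean) for v in s)  # largest excursion in the window
        raw.append(1.0 / (peak + eps))
    total = sum(raw)
    return [w / total for w in raw]
```

Unlike a variance, the peak reacts to a single large excursion, so a sub-region that is disturbed only briefly during an expression is still down-weighted for the whole window.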
- One or both of these two aforementioned methods can be used to emphasize useful ROI sub-regions for “expression” labeled frames.
- the noise correction unit 405 corrects facial expression corruption based on the movement of facial feature points rather than the changes in color of light reflected from the face.
- each ROI sub-region is penalized (assigned a low weight) directly proportional to the amount of fluctuation (movement over time) observed in the position of the facial feature point(s) lying inside/nearest to that ROI sub-region.
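- The feature-point-based penalty can be sketched as follows. The text only states that the penalty grows with the fluctuation of the nearest feature point; the specific mapping `1 / (1 + fluctuation)`, the use of positional variance as the fluctuation measure, and the function names are assumptions of this example.

```python
from math import hypot
from statistics import pvariance

def nearest_point_index(center, points):
    """Index of the feature point closest to a sub-region centre."""
    return min(range(len(points)),
               key=lambda i: hypot(points[i][0] - center[0], points[i][1] - center[1]))

def expression_weights(centers, tracks):
    """Low weight for sub-regions whose nearest feature point moves a lot.

    centers: (x, y) centre of each sub-region.
    tracks:  for each feature point, its (x, y) positions over the window.
    """
    first_positions = [t[0] for t in tracks]   # nearest point chosen in first frame
    raw = []
    for c in centers:
        track = tracks[nearest_point_index(c, first_positions)]
        fluctuation = pvariance([p[0] for p in track]) + pvariance([p[1] for p in track])
        raw.append(1.0 / (1.0 + fluctuation))
    total = sum(raw)
    return [w / total for w in raw]
```

In practice the positions would probably be taken relative to a stable point such as the nose, so that rigid head motion is not counted as facial muscle movement; that refinement is left out here for brevity.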
- the noise correction unit 405 suppresses the sub-regions that have moving facial muscles in and around them.
- the regions which are least affected by a particular facial expression change contribute the most towards pulse extraction.
- the glare/shadow removal is performed in the absence of head motion.
- the sub-regional pulse signals are combined together using the weights assigned by ROI filter, the facial expression correction step and the glare/shadow correction step to obtain the pulse signal 4051.
- The pulse signal 4051 is a time series that represents the physiological activity-related fluctuations observed on the face and has undergone the head motion correction, facial expression correction, and glare/shadow correction steps described above. This pulse signal is fed into the pulse rate estimation unit 105 (in Figure 1) for pulse rate value estimation.
- the pulse rate estimation unit 105 receives the pulse signal 5041 extracted through the process described above and analyzes it in the frequency domain.
- a frequency analysis is required if the physiological activity (e.g. heartbeat, which is a cardiac activity) is a quasi-periodic activity. For instance, in the case of cardiac activity, the heart beats in regular intervals and the length of these regular intervals changes slowly over time. Hence a frequency analysis performed over a short time can tell us about the pulse rate during that short period as the pulse rate frequency is expected to be prominent in the frequency estimate.
- Prior art PTL1 chooses one of two methods to obtain an HR estimate from the frequency analysis of the HR pulse: one, take a fast Fourier transform (FFT) of the pulse signal over a short time (ranging from 1 second to 10 seconds) and choose the frequency with the highest peak in the FFT as the HR estimate; or two, take the power spectral density (PSD) of the pulse signal over a short time and choose the frequency with the highest energy in the PSD as the pulse rate (HR) estimate.
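- The frequency analysis step can be sketched as follows, using a plain discrete Fourier transform in place of an FFT so that the example needs only the standard library. The function names, the 0.7-4.0 Hz search band, and the number of returned candidates are assumptions; the first returned value corresponds to the highest spectral peak.

```python
import cmath
from math import pi

def spectrum(signal, fs):
    """(frequency in Hz, power) pairs from a DFT of the mean-removed signal."""
    n = len(signal)
    mean = sum(signal) / n
    x = [v - mean for v in signal]
    pairs = []
    for k in range(1, n // 2):
        coeff = sum(x[t] * cmath.exp(-2j * pi * k * t / n) for t in range(n))
        pairs.append((k * fs / n, abs(coeff) ** 2))
    return pairs

def rate_candidates(signal, fs, top=3, band=(0.7, 4.0)):
    """Strongest in-band spectral bins in beats per minute, strongest first."""
    in_band = [(f, p) for f, p in spectrum(signal, fs) if band[0] <= f <= band[1]]
    in_band.sort(key=lambda fp: fp[1], reverse=True)
    return [60.0 * f for f, _ in in_band[:top]]
```

This sketch ranks individual frequency bins rather than detecting local maxima, so neighboring bins of one broad peak can appear as separate candidates. The frequency resolution is `fs / n`, which for a 10-second window is 0.1 Hz, or 6 beats per minute.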
- This pulse rate estimate 5051 is directly declared as the final HR estimate (after the application of a moving average filter to remove outliers).
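The first prior-art option described above (take the spectrum of a short window and pick the highest peak) can be sketched as follows. This is only an illustration under stated assumptions: it uses a naive discrete Fourier transform in place of an FFT library, the 40 to 240 bpm search band is an assumption that does not come from the text, and the function name `dominant_pulse_rate_bpm` is hypothetical.

```python
import cmath
import math

def dominant_pulse_rate_bpm(signal, fs, min_bpm=40.0, max_bpm=240.0):
    """Return the frequency (in beats per minute) of the strongest
    spectral peak inside [min_bpm, max_bpm].

    signal: pulse samples from a short window; fs: sampling rate in Hz.
    The search band is an assumed plausibility range, not from the source.
    """
    n = len(signal)
    if n < 2:
        raise ValueError("need at least two samples")
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the DC component
    best_bpm, best_power = None, -1.0
    for k in range(1, n // 2 + 1):
        bpm = 60.0 * k * fs / n  # DFT bin k converted to beats per minute
        if bpm < min_bpm or bpm > max_bpm:
            continue
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(centered))
        power = abs(coeff) ** 2
        if power > best_power:
            best_bpm, best_power = bpm, power
    return best_bpm
```

The frequency resolution is `fs / n` Hz, so a 10-second window resolves the pulse rate in steps of 6 bpm; this is one reason why a single dominant noise peak can easily displace the true estimate.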
- pulse rate of a physiological process can’t undergo a large abrupt change and a spectral peak tracker that enforces such a constraint is required, especially in phases involving severe head motion and/or facial expression distortion.
- the spectral peak tracker 106 obtains frame labels 5021 from noise source detector 103 and pulse rate estimate candidates 5051 from the pulse rate estimation unit 105 and performs bi-directional label-specific spectral peak tracking on the pulse rate estimate candidates 5051. The effects of this procedure are especially prominent in cases where the motion and/or expression corruption is severe and the noise is more dominant than the pulse signal (despite noise-specific correction steps).
- the spectral peak tracker 106 performs a label-specific process which is described in detail below:
- the pulse rate estimate candidate 5051 corresponding to the highest peak in the FFT is taken as the final pulse rate estimate 5061 and is considered “reliable”, as the HR pulse signal 5041 is less likely to be corrupted by noise.
- the “reliable” pulse rate estimates 5061 are used as benchmark and when the transition is made from “still” labeled frames into frames which are not reliable, i.e. the frames labeled “motion” and “expression”, pulse rate estimate candidates 5051 which lie close to the nearest reliable HR peaks are tracked through the noisy periods. This peak tracking is performed in both directions to maintain the continuity of HR estimates from one “reliable” period to the other.
- the crucial step of identifying reliable HR estimates 5061 solves the problem of obtaining a string of erroneous estimates and outputs a string of consistent pulse rate estimates when noise becomes dominant.
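The bi-directional, label-specific tracking described above can be sketched as follows. This is a simplified illustration and not the claimed procedure: it assumes one label and one candidate list (sorted by descending peak height) per frame, treats only frames labeled "still" as reliable, and resolves disagreement between the forward and backward passes by trusting the pass whose reliable anchor is nearer in time. That tie-breaking rule and the name `track_pulse_rate` are assumptions.

```python
def track_pulse_rate(labels, candidates):
    """labels[i]: "still", "motion" or "expression" for frame i.
    candidates[i]: pulse rate candidates (bpm) for frame i, highest peak first.
    Returns one pulse rate per frame (None if no reliable frame exists)."""
    n = len(labels)

    def one_pass(order):
        est, dist = [None] * n, [None] * n
        last, gap = None, None
        for i in order:
            if labels[i] == "still" and candidates[i]:
                # Reliable frame: accept the highest spectral peak.
                last, gap = candidates[i][0], 0
            elif last is not None:
                if candidates[i]:
                    # Noisy frame: follow the candidate nearest to the
                    # estimate carried over from the neighbouring frame.
                    last = min(candidates[i], key=lambda c: abs(c - last))
                gap += 1
            est[i], dist[i] = last, gap
        return est, dist

    fwd, fwd_gap = one_pass(range(n))
    bwd, bwd_gap = one_pass(range(n - 1, -1, -1))
    result = []
    for i in range(n):
        if fwd[i] is None:
            result.append(bwd[i])
        elif bwd[i] is None:
            result.append(fwd[i])
        else:
            # Assumed rule: prefer the pass with the closer reliable anchor.
            result.append(fwd[i] if fwd_gap[i] <= bwd_gap[i] else bwd[i])
    return result
```

For example, with a dominant noise peak near 130 bpm in two "motion" frames between two "still" frames at 72 and 76 bpm, the sketch selects the weaker candidates near 74 and 75 bpm instead of the noise peak.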
- spectral peak trackers used in prior arts PTL2 and PTL3 are employed in the tracking of spectral peaks of Heart rate estimates and slow varying frequencies, respectively.
- the HR being tracked in PTL2 is obtained using wearable HR sensors and these HR estimates are traditionally known to be much more robust to motion corruption than the HR estimates obtained through the observation of color changes on a human skin.
- both PTL2 and PTL3 use an additional sensor (e.g. an accelerometer), to measure the motion and the magnitude of corruption introduced into the HR estimates.
- the use of this additional sensor makes the noise in the HR estimates predictable to an extent and simplifies the tracking process.
- the extracted HR pulse 5041 and the subsequent steps of the new invention can also be used (with slight modification) to estimate other pulse signal related statistics such as the Heart Rate Variability (HRV) in the case of cardiac pulse signal which is measured using the time difference between subsequent peaks in the cardiac pulse signal.
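The HRV measurement mentioned above, based on the time difference between subsequent peaks of the cardiac pulse signal, can be sketched as follows. The peak detector (local maxima above the signal mean) and the choice of RMSSD as the summary statistic are assumptions made for illustration; the text does not specify either, and the function names are hypothetical.

```python
import math

def inter_beat_intervals(signal, fs):
    """Time differences (seconds) between successive peaks of the signal.

    A peak is assumed to be a local maximum that lies above the mean."""
    mean = sum(signal) / len(signal)
    peaks = [i for i in range(1, len(signal) - 1)
             if signal[i - 1] < signal[i] >= signal[i + 1] and signal[i] > mean]
    times = [i / fs for i in peaks]
    return [b - a for a, b in zip(times, times[1:])]

def rmssd(intervals):
    """Root mean square of successive interval differences (one common
    HRV statistic, chosen here as an assumption)."""
    diffs = [b - a for a, b in zip(intervals, intervals[1:])]
    if not diffs:
        raise ValueError("need at least two inter-beat intervals")
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))
```

In practice the temporal resolution of the intervals is limited to `1 / fs`, so a camera frame rate of 30 Hz gives interval steps of about 33 ms unless the peaks are interpolated.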
- a first effect is to ensure that it is possible to estimate HR with high accuracy even in the presence of several noise sources.
- the present embodiment detects the presence of head motion and/or facial expression changes and labels each video frame accordingly.
- This step is vital to identifying the noise corrupting the pulse signal and helps in adapting the ROI filter and pulse extraction steps in order to effectively localize and remove noise of both kinds, rigid and non-rigid.
- the generated video frame labels are also used for the spectral peak tracking step to identify which pulse rate estimates are reliable and which FFT peaks (during frequency analysis) are actually pulse frequency peaks and which ones are noise peaks.
- a second effect is to ensure that it is possible to estimate HR with high accuracy even in the presence of head motion.
- the present embodiment creates an ROI filter that assigns weights to each ROI sub-region inversely proportional to the local variance of the pulse intensity. This results in small weight being assigned to regions with large fluctuations in pulse signal and large weight being assigned to sub-regions with small fluctuations, since pulse fluctuations are typically small.
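The inverse-variance ROI filter described above can be sketched as follows. This is a minimal illustration: the normalization of the weights to sum to one and the small constant `eps` that guards against division by zero are assumptions, and the name `inverse_variance_weights` is hypothetical.

```python
def inverse_variance_weights(signals, eps=1e-12):
    """One weight per sub-region, inversely proportional to the temporal
    variance of that sub-region's pulse signal.

    signals: list of time series, one per ROI sub-region.
    eps: assumed guard against division by zero for constant signals.
    """
    raw = []
    for sig in signals:
        mean = sum(sig) / len(sig)
        variance = sum((x - mean) ** 2 for x in sig) / len(sig)
        raw.append(1.0 / (variance + eps))
    total = sum(raw)
    return [r / total for r in raw]  # normalized so the weights sum to 1
```

A sub-region whose signal has four times the variance of another receives one quarter of its weight, so strongly fluctuating (motion-corrupted) sub-regions contribute little to the combined pulse signal.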
- a third effect is to ensure that it is possible to estimate HR with high accuracy even in the absence of head motion.
- the present embodiment suppresses sub-regions containing glare or shadow spots which contain little HR information, and directly improves pulse extraction process in the absence of head motion.
- a fourth effect is to ensure that it is possible to estimate HR with high accuracy even in the presence of facial expression.
- the present embodiment creates an ROI filter that assigns weights to each ROI sub-region inversely proportional to the local variance of hue channel intensity. As a result, the fluctuations present in the direction of white light, which corrupt most sub-regions, are ignored completely, and sub-regions with small fluctuations in the direction orthogonal to white light are given large weight.
- it creates an ROI filter that assigns weights to each ROI sub-region inversely proportional to local maximum of the pulse signal extracted from that sub-region. In the presence of non-rigid, unpredictable fluctuations present on most sub-regions on the face, this ROI filtering process emphasizes sub-regions with the lowest amount of corruption introduced due to facial expression changes.
- the present embodiment achieves facial expression correction by first measuring the amount of noise present in each ROI sub-region by measuring the movement of the facial feature point(s) lying inside/near the ROI sub-region and then suppressing the sub-regions with the largest feature point movements and emphasizing the ones with the least.
- This is a color-independent facial expression correction step that results in efficient noise removal in a constantly changing area under observation.
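The feature-point-based facial expression correction described above can be sketched as follows. This is an illustration under assumptions: movement is measured as the mean frame-to-frame displacement of one landmark per sub-region, and the weight is taken as the normalized inverse of that movement. The text only states that sub-regions with the largest movements are suppressed and those with the least are emphasized; the exact mapping and the function names are hypothetical.

```python
import math

def landmark_motion(track):
    """Mean frame-to-frame displacement of one feature point.

    track: list of (x, y) positions, one per frame (at least two frames)."""
    steps = [math.hypot(x1 - x0, y1 - y0)
             for (x0, y0), (x1, y1) in zip(track, track[1:])]
    return sum(steps) / len(steps)

def expression_weights(tracks, eps=1e-6):
    """One weight per sub-region from the motion of its nearby landmark.

    tracks: list of landmark tracks, one per ROI sub-region.
    eps: assumed guard against division by zero for static landmarks."""
    raw = [1.0 / (landmark_motion(track) + eps) for track in tracks]
    total = sum(raw)
    return [r / total for r in raw]
```

Because the weights depend only on landmark positions, they are independent of pixel color, which matches the color-independent character of the step described in the text.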
- a fifth effect is to ensure that it is possible to estimate HR with high accuracy even in the presence of subtle head motions or facial expressions.
- the green channel is the most robust to noise of the three color channels (red, green and blue) and preserves the pulse signal in the presence of subtle head motion or facial expressions, as opposed to the chrominance pulse used in the prior art.
- the same processing can be performed using near-infrared cameras or infrared cameras and the output pixel values of these imaging devices. In that case, the HR can be estimated with high accuracy even in a dark place.
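A green-channel pulse signal for one sub-region can be sketched as the spatial mean of the green values in each frame. This is an assumption about how the per-sub-region signal is formed; the text only states that the green channel is used. The frame layout (rows of `(r, g, b)` tuples), the rectangular region, and the name `green_channel_pulse` are hypothetical.

```python
def green_channel_pulse(frames, region):
    """Mean green value of a rectangular sub-region, one sample per frame.

    frames: list of images, each a list of rows of (r, g, b) tuples.
    region: (top, left, bottom, right), with bottom and right exclusive.
    """
    top, left, bottom, right = region
    signal = []
    for frame in frames:
        greens = [frame[y][x][1]
                  for y in range(top, bottom)
                  for x in range(left, right)]
        signal.append(sum(greens) / len(greens))
    return signal
```

For a near-infrared or infrared camera with a single output channel, the same averaging could be applied to that channel's pixel values instead of the green component.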
- a sixth effect is to ensure that it is possible to estimate HR with high accuracy even in the presence of severe head motions or facial expressions.
- Dominant noise peaks in the FFT which appear due to severe head motion and facial expression are removed as a result of spectral peak tracking. This step helps in identifying the true pulse rate frequency when the above mentioned steps fail to effectively remove noise or when the noise is too dominant.
- Program: a program of the present embodiment need only be a program for causing a computer to execute steps A1 to A19 shown in FIG. 2.
- the pulse rate estimation apparatus 100 and the pulse rate estimation method according to the present embodiment can be realized by installing the program on a computer and executing it.
- the processor of the computer functions as the video capturing unit 101, the face tracker 102, the noise source detector 103, the ROI selection and pulse signal extraction unit 104, the pulse rate estimation unit 105, and the spectral peak tracker 106, and performs processing.
- the program according to the present exemplary embodiment may be executed by a computer system constructed using a plurality of computers.
- each computer may function as a different one of the video capturing unit 101, the face tracker 102, the noise source detector 103, the ROI selection and pulse signal extraction unit 104, the pulse rate estimation unit 105, and the spectral peak tracker 106.
- FIG. 6 is a block diagram showing an example of a computer that realizes the pulse rate estimation apparatus according to an embodiment of the present invention.
- the computer 10 includes a CPU (Central Processing Unit) 11, a main memory 12, a storage device 13, an input interface 14, a display controller 15, a data reader/writer 16, and a communication interface 17. These units are connected via a bus 21 so as to be capable of mutual data communication.
- the CPU 11 carries out various calculations by expanding programs (codes) according to the present embodiment, which are stored in the storage device 13, to the main memory 12 and executing them in a predetermined sequence.
- the main memory 12 is typically a volatile storage device such as a DRAM (Dynamic Random Access Memory).
- the program according to the present embodiment is provided in a state of being stored in a computer-readable storage medium 20. Note that the program according to the present embodiment may be distributed over the Internet, to which the computer is connected via the communication interface 17.
- the storage device 13 includes a semiconductor storage device such as a flash memory, in addition to a hard disk drive.
- the input interface 14 mediates data transmission between the CPU 11 and an input device 18 such as a keyboard or a mouse.
- the display controller 15 is connected to a display device 19 and controls display on the display device 19.
- the data reader/writer 16 mediates data transmission between the CPU 11 and the storage medium 20, reads out programs from the storage medium 20, and writes results of processing performed by the computer 10 in the storage medium 20.
- the communication interface 17 mediates data transmission between the CPU 11 and another computer.
- specific examples of the storage medium 20 include a general-purpose semiconductor storage device such as CF (Compact Flash (registered trademark)) and SD (Secure Digital), a magnetic storage medium such as a flexible disk, and an optical storage medium such as a CD-ROM (Compact Disk Read Only Memory).
- the pulse rate estimation apparatus 100 can also be realized using items of hardware corresponding to various components, rather than using the computer having the program installed therein. Furthermore, a part of the pulse rate estimation apparatus 100 may be realized by the program, and the remaining part of the pulse rate estimation apparatus 100 may be realized by hardware.
- a pulse rate estimation apparatus based on observation of human skin, comprising: a video capturing unit that captures a video of a human body part from which a pulse signal, which is the direct effect of a certain physiological process, is to be extracted; a body-part tracking unit that detects a specific human body part, and generates feature points indicating important structural landmarks on the body part for each frame where the body part is detected; a noise source detector that identifies the noise source and assigns labels to each video; a ROI selection and pulse signal extraction unit that selects an ROI and divides it into sub-regions, extracts a pulse signal from each sub-region, creates a label-dependent ROI filter to assign each ROI sub-region a weight proportional to the amount of useful pulse information present in it, and applies label-dependent noise estimation and correction to the pulse signals extracted from each sub-region; a pulse rate estimation unit that combines the extracted pulse signals and the ROI filter created in the previous step to form the final noise-suppressed pulse signal, performs frequency analysis on that noise-suppressed pulse signal, and generates pulse rate candidates; a spectral peak
- the pulse rate estimation apparatus detects the presence of rigid and/or non-rigid motion as the noise source, and assigns labels to each video frame indicating the detected noise source.
- the pulse rate estimation apparatus selects an area on human skin from where pulse rate is measured, divides the area into more than one sub-regions, extracts a pulse signal from each sub-region signifying color changes happening due to a physiological activity observed in that sub-region, estimates the amount of useful information present inside each sub-region, uses label assigned by noise-source detector to estimate amount of useful information present inside each sub-region, creates label-dependent ROI filter which emphasizes sub-regions by a measure which is related to the estimated amount of useful pulse information present inside that sub-region, applies label-dependent noise estimation and removal steps to the pulse signals extracted from each sub-region.
- the pulse rate estimation apparatus uses the fluctuations in the position of the feature point associated with the nose or another body part which is robust to non-rigid motion to estimate and/or remove rigid motion noise from extracted pulse signal when noise-source detector assigns label indicating presence of rigid motion, uses the fluctuations in the position of the feature point associated with the landmarks on body that show non-rigid movement to estimate and/or remove non-rigid motion noise from extracted pulse signal when noise-source detector assigns label indicating presence of non-rigid motion.
- the pulse rate estimation apparatus uses variance of pulse signal extracted from each sub-region to estimate useful information when noise-source detector assigns label indicating presence of rigid motion, and uses either variance of hue changes or local maximum of pulse signal observed in each sub-region to estimate useful information when noise-source detector assigns label indicating presence of non-rigid motion.
- the pulse rate estimation apparatus combines the pulse signals extracted from each ROI sub-region into a time series representing the overall extracted pulse signal by taking their linear combination, uses the weights assigned by the ROI filter to create said linear combination, and performs frequency analysis on that overall pulse signal to generate pulse rate candidates.
- the pulse rate estimation apparatus removes or replaces noisy pulse rate estimates, wherein, frequency peaks related to noise in the frequency band of interest are removed through identification of reliable and unreliable estimates, attempts to identify the correct pulse rate estimate out of a set of more than one pulse rate candidates using reliable pulse rate estimates identified in previous and/or upcoming video frames.
- a pulse rate estimation method based on observation of human skin comprising: (a) a step of capturing a video of a human body part from which a pulse signal, which is the direct effect of a certain physiological process, is to be extracted; (b) a step of detecting a specific human body part, and generating feature points indicating important structural landmarks on the body part for each frame where the body part is detected; (c) a step of identifying the noise source and assigning labels to each video; (d) a step of selecting an ROI and dividing it into sub-regions, extracting a pulse signal from each sub-region, creating a label-dependent ROI filter to assign each ROI sub-region a weight proportional to the amount of useful pulse information present in it, and applying label-dependent noise estimation and correction to the pulse signals extracted from each sub-region; (e) a step of combining the extracted pulse signals and the ROI filter created in the previous step to form the final noise-suppressed pulse signal, performing frequency analysis on that noise-suppressed pulse signal, and generating pulse rate candidates; (f)
- the pulse rate estimation method according to supplementary note 10 or 11, wherein, in the step (d), an area on human skin from where pulse rate is measured is selected, the area is divided into more than one sub-region, a pulse signal is extracted from each sub-region signifying color changes happening due to a physiological activity observed in that sub-region, the amount of useful information present inside each sub-region is estimated, the label assigned by the noise-source detector is used to estimate the amount of useful information present inside each sub-region, a label-dependent ROI filter which emphasizes sub-regions by a measure related to the estimated amount of useful pulse information present inside that sub-region is created, and label-dependent noise estimation and removal steps are applied to the pulse signals extracted from each sub-region.
- a computer-readable storage medium storing a program that includes commands for causing a computer to execute: (a) a step of capturing a video of a human body part from which a pulse signal, which is the direct effect of a certain physiological process, is to be extracted; (b) a step of detecting a specific human body part, and generating feature points indicating important structural landmarks on the body part for each frame where the body part is detected; (c) a step of identifying the noise source and assigning labels to each video; (d) a step of selecting an ROI and dividing it into sub-regions, extracting a pulse signal from each sub-region, creating a label-dependent ROI filter to assign each ROI sub-region a weight proportional to the amount of useful pulse information present in it, and applying label-dependent noise estimation and correction to the pulse signals extracted from each sub-region; (e) a step of combining the extracted pulse signals and the ROI filter created in the previous step to form the final noise-suppressed pulse signal, performing frequency analysis on that noise-suppressed pulse signal, and
- the computer-readable storage medium according to supplementary note 19 or 20, wherein, in the step (d), an area on human skin from where pulse rate is measured is selected, the area is divided into more than one sub-region, a pulse signal is extracted from each sub-region signifying color changes happening due to a physiological activity observed in that sub-region, the amount of useful information present inside each sub-region is estimated, the label assigned by the noise-source detector is used to estimate the amount of useful information present inside each sub-region, a label-dependent ROI filter which emphasizes sub-regions by a measure related to the estimated amount of useful pulse information present inside that sub-region is created, and label-dependent noise estimation and removal steps are applied to the pulse signals extracted from each sub-region.
- the present invention is useful in fields requiring pulse rate measurement.
- 10 Computer; 11 CPU; 12 Main memory; 13 Storage device; 14 Input interface; 15 Display controller; 16 Data reader/writer; 17 Communication interface; 18 Input device; 19 Display apparatus; 20 Storage medium; 21 Bus; 100 Pulse rate estimation apparatus; 101 Video capturing unit; 102 Face tracker (a body-part tracking unit); 103 Noise source detector; 104 ROI selection and pulse signal extraction unit; 105 Pulse rate estimation unit; 106 Spectral peak tracker
Landscapes
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Engineering & Computer Science (AREA)
- Surgery (AREA)
- General Health & Medical Sciences (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Heart & Thoracic Surgery (AREA)
- Medical Informatics (AREA)
- Molecular Biology (AREA)
- Physics & Mathematics (AREA)
- Animal Behavior & Ethology (AREA)
- Pathology (AREA)
- Public Health (AREA)
- Veterinary Medicine (AREA)
- Cardiology (AREA)
- Signal Processing (AREA)
- Physiology (AREA)
- Measuring Pulse, Heart Rate, Blood Pressure Or Blood Flow (AREA)
- Measuring And Recording Apparatus For Diagnosis (AREA)
Abstract
A pulse rate estimation apparatus based on observation of human skin includes: a video capturing means that captures a video of a human body part from which a pulse signal, which is the direct effect of a certain physiological process, is to be extracted; a body-part tracking means that detects a specific human body part, and generates feature points indicating important structural landmarks on the body part for each frame where the body part is detected; a noise source detection means that identifies the noise source and assigns labels to each video; a ROI selection and pulse signal extraction means that selects an ROI and divides it into sub-regions, extracts a pulse signal from each sub-region, creates a label-dependent ROI filter to assign each ROI sub-region a weight proportional to the amount of useful pulse information present in it, and applies label-dependent noise estimation and correction to the pulse signals extracted from each sub-region; a pulse rate estimation means that combines the extracted pulse signals and the ROI filter created in the previous step to form the final noise-suppressed pulse signal, performs frequency analysis on that noise-suppressed pulse signal, and generates pulse rate candidates; and a spectral peak tracking means that selects, for each video frame, a pulse rate value from the list of pulse rate candidates generated in the previous step that is consistent in time with the other pulse rate values.
Description
The present invention relates to an apparatus and a method for robust physiological pulse rate estimation from videos of human face, and a computer-readable storage medium storing a program for realizing these.
There’s been growing interest in the measurement of physiological information aimed at stress detection, health care, and accident prevention. Heart rate (HR) measurements are especially important, as it’s been shown that human psychological states such as stress, arousal, and sleepiness can be estimated from HR. HRs are usually measured by contact-based means, especially electrocardiography; for the aforementioned applications, however, continuous and simpler measurement is necessary. To this end, in recent years, HR measurement techniques employing videos captured with commonly-used cameras have been proposed.
Processes of a pulse rate measurement system can be broadly categorized into three steps: The first is to capture a video of a human face, which can be done using a video capturing device; the second is to extract the required pulse signal from a region of interest (ROI) on the face, which essentially represents periodic fluctuations happening on the color of face because of physiological activity (e.g. periodic heart-beating activity), henceforth referred to as pulse fluctuations; the third is to estimate the pulse rate frequency by performing frequency analysis on the extracted pulse signal.
In real world scenarios, noise, from background illumination changes, rigid head motion as well as change in facial expression, often affects performance of pulse extraction. Due to a variety of noise, the pulse signal can get corrupted and the expected pulse fluctuations can’t be captured effectively, thus negatively affecting the performance (e.g. accuracy) of pulse rate estimation methods.
Prior art which deals with heart rate estimation on face videos is disclosed in Patent Literature 1 and is illustrated as a block diagram in Figure 7. This prior art uses self-adaptive matrix completion on chrominance features to extract fluctuations (due to cardiac activity) from a color video of human face in the presence of head movement, facial expressions and background illumination changes. The important steps in this prior art are: a) generate facial feature points by tracking face on video (602); b) then inside the ROI selection and pulse extraction unit 603, illustrated in greater detail in Figure 8- i) use facial feature points to select region of interest on face and divide it into sub-regions (702); ii) extract pulse signal (chrominance) from each sub-region (703); iii) create binary ROI mask to remove sub-regions with high local (temporal) variance of pulse signal (704); iv) perform matrix completion to estimate a low rank matrix that approximates the ROI-masked chrominance signal matrix and combine linearly dependent rows to extract cardiac pulse signal (705); c) perform frequency analysis (604) on the extracted pulse signal to estimate heart rate (HR).
The prior art of Patent Literature 1 employs matrix completion as a method for replacing the removed noisy ROI sub-regions on the face and groups the linearly dependent rows together, because the pulse fluctuations are supposed to be present in all sub-regions (corrupted by varying amounts of noise). The method for identifying noisy regions (the prior art considers regions with high pulse intensity variance as noisy) and the subsequent removal of noise, however, is sub-optimal and is a contributing factor to the low HR estimation accuracy of the prior art. The main reason for this sub-optimality is that the prior art doesn’t adapt its noise localization and noise removal processes to the cause of the noise and hence gives low performance.
Prior art Patent Literature 1 uses the local variance of pulse signal intensity as a measure of the noise present in each ROI sub-region and relies on chrominance value to extract pulse signal. The chrominance signal is a linear combination of the three color-channels (with fixed coefficients) which is intended to be orthogonal to the direction of white light, but suffers due to the standardized-skin-tone assumption used to derive the fixed coefficients. Contrary to the standardized-skin-tone assumption, all people don’t have similar skin tone, hence the chrominance signal isn’t truly orthogonal to the direction of white light for a lot of skin colors, which is a reason for its failure to effectively extract pulse signal in such cases.
For cases where head motion (rigid noise) is involved, local variance of pulse intensity is a good measure of the noise present in the pulse signal, since physiological fluctuations are very small compared to noise. But the identification of noisy regions is not sufficient to remove rigid noise as local variance of pulse signal only tells us which ROI sub-region has suffered the most noise due to the rigid motion. But even the less noisy regions contain noise due to change in the relative positions of the light source, the face and the video capturing device.
For cases where head motion is absent, the local variance approach fails on another account. The regions which contain bright reflection (glare) spots or dark (shadow) spots, due to a certain placement of the light source, are usually the ones with the lowest light intensity variance. This is because without face movement, the glare/shadow artifact has a nearly constant intensity value. Consequently, such regions also contain the lowest amount of pulse information. A low local variance approach, as the one implemented in prior art, is likely to select such ROI sub-regions in the absence of head motion and this can be detrimental to the HR estimation accuracy.
For cases where facial expression (non-rigid noise) is involved, local variance of pulse intensity is not a good measure of the noise, as a non-rigid deformation of face when exhibiting an expression introduces noise in most regions of the face and so, most regions have a high variance due to unpredictable amount of noise being present. Most of the non-rigid noise is present in the direction of the white light and prior art’s pulse signal is not orthogonal to it.
Another limitation of this prior art originates from the chrominance signal being ineffective in cases where the strength of cardiac fluctuations is comparable to the strength of noise, for instance, in low light conditions or when noise is introduced by subtle changes in facial expressions and head motion. This particular limitation arises from the fact that chrominance signal is designed to cancel the noise fluctuations out when noise is dominant and reinforce the pulse fluctuations when pulse fluctuations are dominant, but it fails to do either when the strength of noise and cardiac fluctuations are comparable.
Finally, in severe cases of head motion or facial expression changes, for instance if a person is both laughing and moving their head at the same time, it can be very difficult to track and identify useful sub-regions on the ROI, and/or the noise present in the extracted HR pulse can be much more dominant than the cardiac fluctuations. In the presence of dominant noise, the frequency analysis used in the prior art Patent Literature 1 fails to pick the true HR frequency. The lack of sophisticated spectral peak tracking under severe head motion and facial expression noise is one of the reasons for poor pulse rate estimates.
There have been prior arts, Patent Literature 2 and Patent Literature 3, that perform spectral peak tracking for pulse rate measurements performed using wearable/contact-based sensors, as illustrated using a block diagram in Figure 8.
A first problem of HR estimation in the prior art is the deterioration of estimation accuracy. The reason is that several noise sources, mainly rigid and non-rigid motion performed by the person under observation, introduce complex corruption into the observed signal and make it difficult to estimate HR, but the prior art does not differentiate between the noise sources.
A second problem is that HR estimation accuracy deteriorates in the presence of head motion. The reason is that the prior art uses local pulse intensity variance to identify noisy sub-regions but fails to sufficiently correct for head motion noise, as a displacement of the face introduces noise fluctuations in the observed pulse from all regions.
A third problem is that HR estimation accuracy deteriorates in the absence of head motion. The reason is that the prior art uses local pulse intensity variance to identify noisy sub-regions and hence selects some glare/shadow spots which show low local variance in light intensity but don’t contain useful pulse information.
A fourth problem is that HR estimation accuracy deteriorates in the presence of facial expression. The reason is that the prior art uses local pulse intensity variance to identify noisy sub-regions, but since the face undergoes deformation and the area under observation changes, unpredictable noise fluctuations are introduced in most regions. In this case, the chrominance pulse is not orthogonal to the direction of white light, and the identification of noisy regions becomes difficult because noise is present in most of them.
A fifth problem is that HR estimation accuracy deteriorates in the presence of subtle head motions or facial expressions. The reason is that, when the strength of noise fluctuations is similar to the strength of pulse fluctuations, the chrominance value is a poor representation of the pulse signal, as the linear combination is able neither to sufficiently suppress noise nor to sufficiently emphasize pulse fluctuations.
A sixth problem is that HR estimation accuracy deteriorates in the presence of severe head motions or facial expressions. The reason is that, when the noise is dominant over the pulse signal, the prior art is unable to distinguish between noise frequency and pulse rate because of the naive frequency analysis used to estimate pulse rate from the extracted pulse signal.
In addition, the inventions disclosed in Patent Literature 2 and Patent Literature 3 have a problem that an additional sensor (e.g. accelerometer) 804 is required to detect the presence and magnitude of noise. Such sensors also help the prior arts Patent Literature 2 and Patent Literature 3 to estimate the effect of noise on the observed pulse signal.
One example of an object of the present invention is to provide a pulse rate estimation apparatus, pulse rate estimation method, and a computer-readable storage medium according to which the above-described problems are eliminated.
In addition, the present invention uses a noise-source identification step to detect the presence of noise (both head motion and facial expression, which a wearable pulse sensor does not need to deal with) and performs spectral peak tracking without the need for an additional motion sensor to predict the effect of noise on the observed pulse signal.
In order to achieve the foregoing object, a pulse rate estimation apparatus according to one aspect of the present invention includes:
a video capturing unit that captures a video of a human body part from which a pulse signal, which is the direct effect of a certain physiological process, is to be extracted;
a body-part tracking unit that detects a specific human body part and generates feature points indicating important structural landmarks on the body part for each frame where the body part is detected;
a noise source detector that identifies the noise source and assigns labels to each video;
a ROI selection and pulse signal extraction unit that selects an ROI and divides it into sub-regions, extracts a pulse signal from each sub-region, creates a label-dependent ROI filter that assigns each ROI sub-region a weight proportional to the amount of useful pulse information present in it, and applies label-dependent noise estimation and correction to the pulse signals extracted from each sub-region;
a pulse rate estimation unit that combines the extracted pulse signals and the ROI filter created in the previous step to form the final noise-suppressed pulse signal, performs frequency analysis on that noise-suppressed pulse signal, and generates pulse rate candidates; and
a spectral peak tracker that selects, for each video frame, a pulse rate value from the list of pulse rate candidates generated in the previous step that is consistent in time with the other pulse rate values.
In order to achieve the foregoing object, a pulse rate estimation method according to another aspect of the present invention includes:
(a) a step of capturing a video of a human body part from which a pulse signal, which is the direct effect of a certain physiological process, is to be extracted;
(b) a step of detecting a specific human body part and generating feature points indicating important structural landmarks on the body part for each frame where the body part is detected;
(c) a step of identifying the noise source and assigning labels to each video;
(d) a step of selecting an ROI and dividing it into sub-regions, extracting a pulse signal from each sub-region, creating a label-dependent ROI filter that assigns each ROI sub-region a weight proportional to the amount of useful pulse information present in it, and applying label-dependent noise estimation and correction to the pulse signals extracted from each sub-region;
(e) a step of combining the extracted pulse signals and the ROI filter created in the previous step to form the final noise-suppressed pulse signal, performing frequency analysis on that noise-suppressed pulse signal, and generating pulse rate candidates; and
(f) a step of selecting, for each video frame, a pulse rate value from the list of pulse rate candidates generated in the previous step that is consistent in time with the other pulse rate values.
In order to achieve the foregoing object, a computer-readable recording medium according to still another aspect of the present invention has recorded therein a program, and the program includes an instruction to cause the computer to execute:
(a) a step of capturing a video of a human body part from which a pulse signal, which is the direct effect of a certain physiological process, is to be extracted;
(b) a step of detecting a specific human body part and generating feature points indicating important structural landmarks on the body part for each frame where the body part is detected;
(c) a step of identifying the noise source and assigning labels to each video;
(d) a step of selecting an ROI and dividing it into sub-regions, extracting a pulse signal from each sub-region, creating a label-dependent ROI filter that assigns each ROI sub-region a weight proportional to the amount of useful pulse information present in it, and applying label-dependent noise estimation and correction to the pulse signals extracted from each sub-region;
(e) a step of combining the extracted pulse signals and the ROI filter created in the previous step to form the final noise-suppressed pulse signal, performing frequency analysis on that noise-suppressed pulse signal, and generating pulse rate candidates; and
(f) a step of selecting, for each video frame, a pulse rate value from the list of pulse rate candidates generated in the previous step that is consistent in time with the other pulse rate values.
As described above, according to the present invention, it is possible to improve HR estimation accuracy in the presence of rigid and/or non-rigid motion noise. The identification of noise source makes it possible to improve HR estimation accuracy in the presence of rigid and non-rigid motion noise of varying degrees and this can be done without the need of an additional motion sensor.
The drawings, together with the detailed description, serve to explain the principles of the inventive method. The drawings are for illustration and do not limit the application of the technique.
A block diagram illustrating an example of primary embodiment of the present invention
A flowchart illustrating an example of primary embodiment of the present invention
A block diagram illustrating the noise source detection step of primary embodiment of the present invention
A block diagram detailing ROI selection and Pulse signal extraction step of primary embodiment of the present invention
A block diagram illustrating the spectral peak tracking step of primary embodiment of the present invention
A block diagram illustrating an example of a computer that realizes the pulse rate estimation apparatus according to an embodiment of the present invention.
A block diagram of prior art PTL1 - self-adaptive matrix completion and adaptive ROI selection for HR pulse extraction on human face
A block diagram detailing ROI selection and Pulse signal extraction step of prior art PTL1
A block diagram of prior art PTL2, PTL3 - wearable sensor pulse rate measurement using spectral peak tracking
(Embodiment)
An example embodiment of the present invention is described in detail below with reference to the accompanying drawings.
Device Configuration
First, a configuration of a pulse rate estimation apparatus 100 according to the present embodiment will be described using FIG. 1. FIG. 1 is a block diagram schematically showing the configuration of the pulse rate estimation apparatus according to the embodiment of the present invention.
As shown in FIG. 1, the pulse rate estimation apparatus 100 includes a video capturing unit 101, a face tracker (a body-part tracking unit) 102, a noise source detector 103, a ROI selection and pulse signal extraction unit 104, a pulse rate estimation unit 105, and a spectral peak tracker 106. The pulse rate estimation apparatus 100 outputs an estimated pulse rate (HR) 1061.
Please note that the face tracker 102 in the primary embodiment of the present invention tracks a human face, but pulse rate estimation can be performed on many other body parts where the skin is visible, such as a hand or an ear. In other embodiments of our method, 102 can therefore be a hand tracker, an ear tracker, or the like.
The video capturing unit 101 captures a video of a human body part from which a pulse signal, which is the direct effect of a certain physiological process, is to be extracted. The body-part tracking unit 102 detects a specific human body part and generates feature points indicating important structural landmarks on the body part for each frame where the body part is detected. The noise source detector 103 identifies the noise source and assigns labels to each video.
The ROI selection and pulse signal extraction unit 104 selects an ROI and divides it into sub-regions, extracts a pulse signal from each sub-region, creates a label-dependent ROI filter that assigns each ROI sub-region a weight proportional to the amount of useful pulse information present in it, and applies label-dependent noise estimation and correction to the pulse signals extracted from each sub-region.
The pulse rate estimation unit 105 combines the extracted pulse signals and the ROI filter created in the previous step to form the final noise-suppressed pulse signal, performs frequency analysis on that noise-suppressed pulse signal, and generates pulse rate candidates.
The spectral peak tracker 106 selects, for each video frame, a pulse rate value from the list of pulse rate candidates generated in the previous step that is consistent in time with the other pulse rate values.
Operations of Apparatus
Next, operations performed by the pulse rate estimation apparatus 100 according to an embodiment of the present invention will be described with reference to Figure 2. Figure 2 illustrates the flowchart of the first embodiment of the present invention. This figure illustrates the process implemented to estimate heart rate from a video of a person’s face. Figure 1 will be referred to as needed in the following description.
Also, in the present embodiment, the pulse rate estimation method is carried out by allowing the pulse rate estimation apparatus 100 to operate. Accordingly, the description of the pulse rate estimation method of the present embodiment will be substituted with the following description of operations performed by the pulse rate estimation apparatus 100.
The video capturing unit 101 (in Figure 1) captures video of a human face (Step A1). Next, the face tracker 102 (in Figure 1) tracks the face of the person being observed to output face feature points for each video frame (Step A2). After that, the noise source is identified by noise source detector 103 (in Figure 1).
Simultaneously, inside the ROI selection and pulse extraction unit 104, an ROI is selected and divided into several sub-regions for the localization of noise and pulse information (Step A3).
Next, a pulse signal is extracted from each ROI sub-region (Step A4). The noise source detector 103 identifies the noise source and assigns a label to each video frame (Step A5). The frame label assigned by the noise source detector 103 is used to determine the dynamic ROI filtering process applied to the set of extracted sub-regional pulse signals.
Then, label-specific correction steps are performed to remove the noise and obtain a combined, noise-free pulse signal (Steps A6-A8, A11-A13, A14-A16).
Next, the pulse rate estimation unit 105 generates a set of pulse rate estimate candidates for each video frame, which are classified as reliable or noisy estimates by the spectral peak tracker 106, depending on the frame label (Steps A9 and A17).
The spectral peak tracker 106 then performs spectral peak tracking to select the correct pulse rate frequency from the set of noisy pulse rate estimate candidates (Step A18) and outputs the final pulse rate estimates for each video frame (Step A19). This flow process is discussed in greater detail below, starting from the top.
(Noise Source Estimation)
Figure 3 illustrates a block diagram for the noise source detection and frame label assignment step of the present embodiment. The face tracker 102 creates a stream of facial feature points 3011 for each video frame 3012 where the face of a person is observed. A number of facial feature points 3011 are generated on landmarks on the face such as, but not limited to, the nose, the eyes, the corners of the lips, and the boundary of the face. These facial feature points 3011 are generated for each frame in the video 3012 where a face is detected (with the help of face detection software).
The facial feature points 3011 are then fed to the noise source detector 103, which analyzes the movement of individual feature points with respect to time, as well as with respect to other feature points in order to detect whether the person in the video is performing (voluntary) head motions or if there are any facial expressions present on the face. Note that small involuntary vertical head motion has been shown to contain HR information in the past in the following reference. But for the purpose of activity detection, we are only concerned with large voluntary head motion, referred to simply as head motion hereafter, because such large head motion can cause corruption in the cardiac fluctuation pulse extracted by pulse extractor.
Balakrishnan G, Durand F and Guttag J, “Detecting pulse from head motions in video”, IEEE Conf. on Computer Vision and Pattern Recognition, 2013, pp 3430-3437
The position of the feature point on the nose is observed over time for the purpose of head motion detection as this feature point is fairly visible on all video frames that contain a face and this feature point is not severely affected by any facial expression changes on the face. Since most facial expressions involve the movement of the mouth/lips, hence for the purpose of facial expression detection, the position of the facial feature points on and around the lips are observed with respect to the other feature points which are least affected by facial expression change, like the facial feature points on the nose, the forehead, etc. If facial expressions or head motion is not observed in a sequence of frames, these frames are assigned a label “still” to signify the absence of any corruption affecting the pulse signal due to head motion or facial expressions. These frame labels are fed into the ROI selection and pulse extraction unit 104 and the spectral peak tracker 106 which are all described in detail below.
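The label assignment described above can be sketched as follows. This is only a minimal illustration: the function name `label_window`, the use of the standard deviation as the movement measure, the pixel thresholds, and the choice to test for head motion before facial expression are all assumptions, since the description does not specify them.

```python
from statistics import pstdev

def label_window(nose_xy, lip_xy, motion_thresh=5.0, expr_thresh=2.0):
    """Label a short sequence of frames as "motion", "expression" or "still".

    nose_xy, lip_xy: lists of (x, y) positions, one entry per frame.
    The thresholds (in pixels) are illustrative assumptions.
    """
    # Head motion: how much the nose feature point moves over time.
    nose_x = [p[0] for p in nose_xy]
    nose_y = [p[1] for p in nose_xy]
    head_spread = max(pstdev(nose_x), pstdev(nose_y))

    # Facial expression: movement of a lip feature point relative to the nose,
    # which is little affected by expression changes.
    rel_x = [l[0] - n[0] for l, n in zip(lip_xy, nose_xy)]
    rel_y = [l[1] - n[1] for l, n in zip(lip_xy, nose_xy)]
    lip_spread = max(pstdev(rel_x), pstdev(rel_y))

    if head_spread > motion_thresh:
        return "motion"
    if lip_spread > expr_thresh:
        return "expression"
    return "still"
```

Using the lip position relative to the nose, rather than its absolute position, keeps a rigid head movement from being mistaken for a facial expression.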
(ROI selection and Pulse Extraction)
(Fixed ROI selection and sub-regional pulse extraction)
Figure 4 illustrates a detailed block diagram of the ROI selection and pulse extraction unit of the new invention. Facial feature points 4011 and video frames 4012 are fed into the ROI selection and pulse extraction unit 104 (in Figure 1) by the face tracker 102 (in Figure 1). As shown in FIG. 4, the ROI selection and pulse signal extraction unit 104 includes a Fixed ROI selector 402, a pulse extractor 403, a dynamic ROI filter generation unit 404, and a noise correction unit 405.
In the Fixed ROI selector 402, the facial feature points 4011 are used to select a ROI on the video frame 4012, and this ROI is divided into several parts. A pulse signal is extracted from each sub-region by the pulse extractor 403, and these pulse signals are fed into the dynamic ROI filter 404, which also takes as input the frame labels from the noise source detector 103 (in Figure 1).
This noise source detector 103 uses the positions of facial feature points 4011 to detect the kind of noise (head motion, facial expression or neither) present in the video frames 4012 and assigns them with a label (“motion”, “expression” and “still”, respectively) which is fed into the ROI filter generation unit 404 and noise correction unit 405. The dynamic ROI filter generation unit 404 then creates a noise source-specific ROI filter to emphasize ROI sub-regions with useful pulse information and suppress those with noise.
The noise correction unit 405 combines the pulse signals from each ROI sub-region using the ROI filter generated in the previous step and then performs noise-source specific noise correction steps to generate a noise-free pulse signal 4051 which is fed into the pulse rate estimation unit 105 (in figure 1) for pulse rate estimation and subsequent spectral peak tracking steps.
The position and size of the ROI generated by the Fixed ROI selector 402 are decided based on the facial feature point on the nose as well as those on the face boundary. The ROI is a rectangular block centered at the center of the nose (the nose feature point), and its height and width are determined so that it lies between the eyes and the lips and covers the width of the face. This ROI is divided into many rectangular sub-regions of equal dimensions for the purpose of local analysis of the face regions, so that HR information can be extracted from the areas least affected by noise.
The pulse extractor 403 calculates the mean (over all pixels of each ROI sub-region) of the green channel values over a moving window of a few seconds (4-10 seconds) which represents the pulse signal of the ROI sub-region.
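As a rough sketch of the sub-regional pulse extraction described above, the following divides a rectangular ROI into a grid of equal sub-regions and records the mean green value of each sub-region for every frame. The frame layout (a nested list of `(r, g, b)` tuples), the `roi` tuple, and the function name are assumptions made for illustration; a real implementation would more likely operate on image arrays.

```python
def subregion_pulse(frames, roi, rows, cols):
    """Return signals[i][j], the mean-green time series of sub-region (i, j).

    frames: list of frames, each a 2-D grid (list of rows) of (r, g, b) tuples.
    roi: (top, left, height, width) of the rectangular ROI, in pixels.
    rows, cols: number of equal sub-regions along each axis.
    """
    top, left, height, width = roi
    sub_h, sub_w = height // rows, width // cols
    signals = [[[] for _ in range(cols)] for _ in range(rows)]
    for frame in frames:
        for i in range(rows):
            for j in range(cols):
                y0, x0 = top + i * sub_h, left + j * sub_w
                # Green channel (index 1) of every pixel in this sub-region.
                greens = [frame[y][x][1]
                          for y in range(y0, y0 + sub_h)
                          for x in range(x0, x0 + sub_w)]
                signals[i][j].append(sum(greens) / len(greens))
    return signals
```

Passing the frames of one moving window (for example the most recent 4 to 10 seconds) yields the pulse signal of each sub-region for that window.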
Next, each sub-region in the ROI is assigned a weight by the ROI selector: the more useful the information present in the pulse extracted from a sub-region, the larger the weight assigned to it and the greater its contribution towards pulse rate estimation. The usefulness of a sub-region increases with the amount of pulse information (physiological activity-related fluctuations) it contains, or alternatively, the usefulness decreases with the amount of head-motion/facial expression corruption present in the pulse signal of the sub-region. This is important because head motion and facial expression changes cause unpredictable fluctuations in the light reflected by various regions on the face.
The head motion/facial expression corruption is present in different magnitudes in different parts of the face. The ROI sub-regions serve the purpose of dividing the face into several small parts to look for parts that have accumulated the least of corruptions. The corruptions introduced by head motion, which is a rigid motion are different from the corruptions introduced by facial expressions and hence the sub-regions affected by each is determined using a different algorithm. Hence the subsequent method of pulse rate estimation is different for frames with different labels.
(Noise source-specific Dynamic ROI filtering and Noise Correction)
When the label assigned to a video frame 4012 is “still”, it means that the frame does not contain any corruption. For “still” labeled video frames, the dynamic ROI filter 404 assigns a weight to each sub-region which is inversely proportional to the local variance of the pulse signal in that region. This filtering emphasizes sub-regions with small changes in the pulse signal amplitude, as physiological activity-related fluctuations are small in magnitude compared to the corruptions caused by background illumination changes or other unaccounted factors.
Since there is no rigid or non-rigid corruption present in “still” labeled video frames, the noise correction unit 405 simply takes a ROI filter-weighted average of the sub-regional pulse signals and then performs glare/shadow removal to remove the effect of any glare or shadow artifact on the pulse rate estimation process.
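The weighting and combination for “still” labeled frames can be sketched as below. The function names, the normalization of the weights to sum to one, and the small constant `eps` that guards against division by zero are assumptions; the glare/shadow removal step is not detailed in this description and is therefore omitted.

```python
from statistics import pvariance

def inverse_variance_weights(signals, eps=1e-9):
    """One weight per sub-region, inversely proportional to its local variance."""
    raw = [1.0 / (pvariance(s) + eps) for s in signals]
    total = sum(raw)
    return [w / total for w in raw]  # normalized so the weights sum to 1

def combine(signals, weights):
    """ROI filter-weighted average of the sub-regional pulse signals."""
    n = len(signals[0])
    return [sum(w * s[t] for w, s in zip(weights, signals)) for t in range(n)]
```

For instance, with two sub-regions whose variances are 0.25 and 4, the first receives roughly 16/17 of the total weight and therefore dominates the combined signal.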
For “motion” labeled video frames, the dynamic ROI filter 404 assigns a weight to each sub-region which is inversely proportional to the local variance of the pulse signal in that region. This step is similar to the region selection step used by prior art PTL1 to select useful ROI sub-regions. However, this step alone is not sufficient to remove head motion corruption, hence the ROI filtered-weighted average of sub-regional pulse signals is subjected to a motion correction step by the noise correction unit 405. In this motion correction step, the projection of horizontal head motion in the direction of pulse signal is subtracted from the combined (ROI filtered) pulse signal to obtain a noise free pulse signal 4051.
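One possible reading of this motion correction step is a least-squares removal of the head-motion trace from the combined pulse signal. The sketch below follows that reading; the function name, the mean removal, and the use of a single horizontal displacement trace (for example the x-coordinate of the nose feature point) are assumptions rather than details given in this description.

```python
def remove_motion_component(pulse, motion):
    """Subtract from `pulse` its component along the head-motion trace.

    pulse: combined (ROI filtered) pulse signal, one value per frame.
    motion: horizontal head displacement, one value per frame.
    Both signals are mean-centered before the projection is computed.
    """
    p_mean = sum(pulse) / len(pulse)
    m_mean = sum(motion) / len(motion)
    p = [v - p_mean for v in pulse]
    m = [v - m_mean for v in motion]
    energy = sum(v * v for v in m)
    if energy == 0:
        return p  # no head motion observed, nothing to remove
    # Least-squares scale of the motion trace contained in the pulse signal.
    scale = sum(a * b for a, b in zip(p, m)) / energy
    return [a - scale * b for a, b in zip(p, m)]
```

If the pulse and the motion trace are uncorrelated, the scale is close to zero and the pulse signal is left essentially unchanged.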
As a side note, we want to point out that the ROI selection procedure of prior art removes all noisy regions completely and assigns equal importance to all selected sub-regions, but we feel that this can be detrimental to the pulse extraction step as some regions are noisier than the others and equal importance to such regions can introduce unwanted corruption in the extracted pulse. Hence the dynamic ROI filter unit assigns weights to each sub-region according to their estimated usefulness in HR extraction process.
For “expression” labeled frames, the corruptions introduced are more complicated as it involves movement of facial regions relative to each other and enforces a change in the regions of the face under observation. Non-rigid motions of numerous facial muscles introduce unpredictable changes in the reflected light intensity as the skin gets stretched/compressed in some parts of the face. The light intensity changes in the case of facial expressions originate from the above mentioned factors that affect most ROI sub-regions that results in even the useful ROI sub-regions having high local variance where facial expressions are observed. For this reason, we think that the local variance of light intensity is not the best measure to identify useful sub-regions when facial expression corruption is involved. We propose two new methods to identify useful sub-regions and create dynamic ROI filter when facial expression corruption is involved, below.
The first method for emphasizing useful pulse fluctuations in the presence of facial expressions is to assign high weights to sub-regions with the least local variance observed in the hue channel values. Since light intensity values in most sub-regions contain large facial expression corruption, we choose the hue value, which lies in the direction orthogonal to the light intensity, to estimate the usefulness of ROI sub-regions. The hue value experiences less corruption from external factors as compared to the light intensity value and hence, in this method, the dynamic ROI filter 404 assigns weights inversely proportional to the local variance of the hue value in each sub-region.
The second method for emphasizing useful pulse fluctuations in the presence of facial expressions is to assign high weights to sub-regions with the lowest maximum value of the pulse signal. Since light intensity values in most sub-regions contain large facial expression corruption, most sub-regions have a high local variance of the pulse signal. However, the extracted pulse signal in sub-regions with larger corruption undergoes larger fluctuations (deviation from the average observed value). So, the larger the facial expression corruption, the larger the maximum value of the pulse signal extracted from that sub-region. Hence, in this second method of identifying the useful sub-regions in the presence of facial expressions, the dynamic ROI filter 404 assigns weights inversely proportional to the local maximum of the pulse signal extracted from each sub-region.
One or both of these two aforementioned methods can be used to emphasize useful ROI sub-regions for “expression” labeled frames.
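The two weighting methods for “expression” labeled frames can be sketched as follows. The function names, the per-region mean RGB input, the interpretation of the “maximum value” as the largest deviation from the mean, and the constant `eps` are assumptions. Note also that hue is a circular quantity; this sketch ignores wrap-around at the red end of the hue scale.

```python
import colorsys
from statistics import pvariance

def _normalize(raw):
    total = sum(raw)
    return [w / total for w in raw]

def hue_variance_weights(rgb_per_region, eps=1e-9):
    """Method 1: weights inversely proportional to the local variance of hue.

    rgb_per_region: for each sub-region, a list of mean (r, g, b) values
    (0-255), one per frame.
    """
    raw = []
    for series in rgb_per_region:
        hues = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)[0]
                for r, g, b in series]
        raw.append(1.0 / (pvariance(hues) + eps))
    return _normalize(raw)

def peak_deviation_weights(signals, eps=1e-9):
    """Method 2: weights inversely proportional to the largest deviation of
    each sub-regional pulse signal from its own mean."""
    raw = []
    for s in signals:
        mean = sum(s) / len(s)
        raw.append(1.0 / (max(abs(v - mean) for v in s) + eps))
    return _normalize(raw)
```

If both methods are used, their weights could be merged, for example by element-wise multiplication followed by renormalization; the description leaves the manner of combination open.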
The noise correction unit 405 corrects facial expression corruption based on the movement of facial feature points rather than the changes in color of light reflected from the face. In this approach, each ROI sub-region is penalized (assigned a low weight) directly proportional to the amount of fluctuation (movement over time) observed in the position of the facial feature point(s) lying inside/nearest to that ROI sub-region. This way, the noise correction unit 405 suppresses the sub-regions that have moving facial muscles in and around them. As a result, the regions which are least affected by a particular facial expression change contribute the most towards pulse extraction. Additionally, the glare/shadow removal is performed in the absence of head motion. Finally, the sub-regional pulse signals are combined together using the weights assigned by ROI filter, the facial expression correction step and the glare/shadow correction step to obtain the pulse signal 4051.
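The feature-point-based penalty can be illustrated with the following sketch. The function name, the use of the mean frame-to-frame displacement as the fluctuation measure, the `1 / (1 + fluctuation)` penalty, and the multiplication with the existing ROI filter weights are all assumptions chosen for illustration.

```python
from math import hypot

def movement_penalty_weights(region_centers, feature_tracks, roi_weights):
    """Down-weight sub-regions whose nearest facial feature point moves a lot.

    region_centers: (x, y) center of each ROI sub-region.
    feature_tracks: one list of (x, y) positions over time per feature point.
    roi_weights: weights already assigned by the dynamic ROI filter.
    """
    def fluctuation(track):
        # Mean frame-to-frame displacement of one feature point.
        steps = [hypot(b[0] - a[0], b[1] - a[1])
                 for a, b in zip(track, track[1:])]
        return sum(steps) / len(steps)

    weights = []
    for center, w in zip(region_centers, roi_weights):
        # Feature point whose first observed position is closest to the center.
        nearest = min(feature_tracks,
                      key=lambda tr: hypot(tr[0][0] - center[0],
                                           tr[0][1] - center[1]))
        weights.append(w / (1.0 + fluctuation(nearest)))
    total = sum(weights)
    return [w / total for w in weights]
```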
The pulse signal 4051 is a time series that represents physiological activity-related fluctuations observed on the face and has undergone the head motion correction, facial expression correction and glare/shadow correction steps through the procedures mentioned above. This pulse signal is fed into the pulse rate estimation unit 105 (in Figure 1) for pulse rate value estimation.
(Pulse Rate Estimation and Spectral Peak Tracking)
As illustrated in Figure 5, the pulse rate estimation unit 105 receives the pulse signal 5041 extracted through the process described above and analyzes it in the frequency domain. A frequency analysis is required if the physiological activity (e.g. heartbeat, which is a cardiac activity) is a quasi-periodic activity. For instance, in the case of cardiac activity, the heart beats at regular intervals, and the length of these intervals changes slowly over time. Hence a frequency analysis performed over a short time can tell us about the pulse rate during that short period, as the pulse rate frequency is expected to be prominent in the frequency estimate.
Prior art PTL1 chooses one of the two methods to obtain an HR estimate from the frequency analysis of the HR pulse: one, take a fast Fourier transform (FFT) of the pulse signal over a short time (ranging from 1 second to 10 seconds) and choose the frequency with the highest peak in the FFT as the HR estimate; or two, take the power spectral density (PSD) of the pulse signal over a short time and choose the frequency with the highest energy in the PSD as the pulse rate (HR) estimate. This pulse rate estimate 5051 is directly declared as the final HR estimate (after the application of moving average filter to remove outliers). However, pulse rate of a physiological process can’t undergo a large abrupt change and a spectral peak tracker that enforces such a constraint is required, especially in phases involving severe head motion and/or facial expression distortion.
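The generation of pulse rate candidates from a short window can be sketched as below. To keep the example self-contained, a naive discrete Fourier transform is used in place of an FFT library call; the function name, the number of candidates `k`, and the 0.7 to 4.0 Hz search band (42 to 240 beats per minute) are assumptions, not values taken from this description.

```python
import cmath
import math

def pulse_rate_candidates(signal, fs, k=3, lo_hz=0.7, hi_hz=4.0):
    """Return the k strongest spectral bins of `signal`, in beats per minute.

    signal: pulse signal samples over a short window.
    fs: sampling (frame) rate in Hz.
    Candidates are ordered from the strongest to the weakest bin.
    """
    n = len(signal)
    mean = sum(signal) / n
    x = [v - mean for v in signal]  # remove the DC component
    power = []
    for b in range(1, n // 2 + 1):
        freq = b * fs / n
        if lo_hz <= freq <= hi_hz:
            # Naive DFT coefficient for bin b.
            coeff = sum(v * cmath.exp(-2j * math.pi * b * t / n)
                        for t, v in enumerate(x))
            power.append((abs(coeff) ** 2, freq * 60.0))
    power.sort(reverse=True)
    return [bpm for _, bpm in power[:k]]
```

Taking only the first candidate reproduces the highest-peak estimate of the prior art, while keeping several candidates gives a spectral peak tracker alternatives to choose from when the strongest bin is caused by noise.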
The spectral peak tracker 106 obtains frame labels 5021 from noise source detector 103 and pulse rate estimate candidates 5051 from the pulse rate estimation unit 105 and performs bi-directional label-specific spectral peak tracking on the pulse rate estimate candidates 5051. The effects of this procedure are especially prominent in cases where the motion and/or expression corruption is severe and the noise is more dominant than the pulse signal (despite noise-specific correction steps). The spectral peak tracker 106 performs a label-specific process which is described in detail below:
For “still” labeled frames, the pulse rate estimate candidate 5051 corresponding to the highest peak in the FFT is taken as the final pulse rate estimate 5061 and is considered “reliable”, as the HR pulse signal 5041 is less likely to be corrupted by noise.
For “motion” and “expression” labeled frames, the “reliable” pulse rate estimates 5061 are used as benchmark and when the transition is made from “still” labeled frames into frames which are not reliable, i.e. the frames labeled “motion” and “expression”, pulse rate estimate candidates 5051 which lie close to the nearest reliable HR peaks are tracked through the noisy periods. This peak tracking is performed in both directions to maintain the continuity of HR estimates from one “reliable” period to the other. The crucial step of identifying reliable HR estimates 5061 solves the problem of obtaining a string of erroneous estimates and outputs a string of consistent pulse rate estimates when noise becomes dominant.
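The bi-directional, label-specific tracking described above can be sketched as follows. The function name, the nearest-candidate selection rule, and the rule for merging the two directions (each noisy frame takes the estimate tracked from whichever reliable frame is closer in time) are assumptions. The sketch also assumes that every frame has at least one candidate and that the first candidate of a frame corresponds to its highest spectral peak.

```python
def track_peaks(labels, candidates):
    """Select one pulse rate per frame from its list of candidates.

    labels: "still", "motion" or "expression" for each frame.
    candidates: per-frame list of pulse rate candidates, strongest first.
    Returns None for every frame if no frame is labeled "still".
    """
    n = len(labels)
    # "still" frames are reliable: take the strongest candidate directly.
    reliable = [c[0] if lab == "still" else None
                for lab, c in zip(labels, candidates)]

    def sweep(order):
        est, dist = [None] * n, [None] * n
        prev, gap = None, None
        for i in order:
            if reliable[i] is not None:
                prev, gap = reliable[i], 0
            elif prev is not None:
                # Follow the candidate closest to the previous estimate.
                prev = min(candidates[i], key=lambda c: abs(c - prev))
                gap += 1
            est[i], dist[i] = prev, gap
        return est, dist

    fwd, fdist = sweep(range(n))
    bwd, bdist = sweep(range(n - 1, -1, -1))
    out = []
    for i in range(n):
        if fwd[i] is None:
            out.append(bwd[i])
        elif bwd[i] is None or fdist[i] <= bdist[i]:
            out.append(fwd[i])
        else:
            out.append(bwd[i])
    return out
```

For example, if a noisy frame between two reliable estimates of 72 and 77 has the candidates 118 and 73, the tracker keeps 73, whereas choosing the strongest peak alone would have reported 118.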
Please note that the spectral peak trackers used in the prior art documents PTL2 and PTL3 are employed to track spectral peaks of heart rate estimates and of slowly varying frequencies, respectively. The HR tracked in PTL2 is obtained using wearable HR sensors, and such HR estimates are known to be much more robust to motion corruption than HR estimates obtained by observing color changes on human skin.
In those settings, there is no involvement of complex corruption such as the facial expression corruption observed in video-based HR estimation. More importantly, both PTL2 and PTL3 use an additional sensor (e.g. an accelerometer) to measure the motion and the magnitude of the corruption introduced into the HR estimates. The use of this additional sensor makes the noise in the HR estimates predictable to an extent and simplifies the tracking process. In the present invention, no additional sensor is used to measure distortion; instead, the frame labels 5021 obtained by the noise source detector 103 are used to differentiate reliable estimates from noisy ones.
It is important to note that the extracted HR pulse 5041 and the subsequent steps of the present invention can also be used, with slight modification, to estimate other statistics related to the pulse signal, such as the heart rate variability (HRV) in the case of a cardiac pulse signal, which is measured using the time difference between successive peaks in the cardiac pulse signal.
As a final point, it should be clear that the processes, techniques and methodology described and illustrated here are not limited to or tied to a particular apparatus. They can be implemented using a combination of components. Also, various types of general-purpose devices may be used in accordance with the instructions herein. The present invention has also been described using a particular set of examples. However, these are merely illustrative and not restrictive. For example, the described software may be implemented in a wide variety of languages such as C++, Java, Python and Perl. Moreover, other implementations of the inventive technology will be apparent to those skilled in the art.
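As an illustration of the HRV remark above, the following sketch derives inter-beat intervals from the time difference between successive peaks of a pulse signal. The neighbor-comparison peak detector, the function names, and the use of the standard deviation of the intervals as a summary are assumptions chosen for brevity; a real signal would normally need filtering before peak detection.

```python
import numpy as np


def inter_beat_intervals(pulse, fps):
    """Return the time (s) between successive local maxima of a pulse signal."""
    pulse = np.asarray(pulse, dtype=float)
    mid = pulse[1:-1]
    # Local maximum: above the left neighbor and not below the right one.
    is_peak = (mid > pulse[:-2]) & (mid >= pulse[2:])
    peak_idx = np.flatnonzero(is_peak) + 1
    return np.diff(peak_idx) / fps


def interval_std(ibi):
    """Standard deviation of the inter-beat intervals (one simple HRV summary)."""
    return float(np.std(ibi))


# Synthetic, perfectly periodic pulse: 1.25 Hz (75 bpm) sampled at 100 fps.
fps = 100.0
t = np.arange(800) / fps
pulse = np.sin(2 * np.pi * 1.25 * t)

ibi = inter_beat_intervals(pulse, fps)
print(ibi, interval_std(ibi))
```

For this synthetic signal every interval equals the 0.8 s period, so the variability is zero; a measured cardiac pulse would show a spread of intervals.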
(Effects of the present embodiment)
A first effect is that HR can be estimated with high accuracy even in the presence of several noise sources.
According to the present embodiment, the presence of head motion and/or facial expression changes is detected and each video frame is labeled accordingly. This step is vital for identifying the noise corrupting the pulse signal, and it helps in adapting the ROI filter and pulse extraction steps so as to effectively localize and remove noise of both kinds, rigid and non-rigid. The generated video frame labels are also used in the spectral peak tracking step to identify which pulse rate estimates are reliable, and which FFT peaks (during frequency analysis) are actual pulse frequency peaks and which are noise peaks.
A second effect is to ensure that it is possible to estimate HR with high accuracy even in the presence of head motion.
According to the present embodiment, an ROI filter is created that assigns each ROI sub-region a weight inversely proportional to the local variance of the pulse intensity. As a result, a small weight is assigned to sub-regions with large fluctuations in the pulse signal and a large weight is assigned to sub-regions with small fluctuations, since pulse fluctuations are typically small.
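The inverse-variance weighting can be sketched as follows. The function name, the small constant `eps` that guards against division by zero, and the normalization of the weights to sum to one are assumptions added for illustration and are not stated in the description.

```python
import numpy as np


def inverse_variance_weights(pulses, eps=1e-8):
    """Weight each ROI sub-region inversely to the variance of its pulse signal.

    pulses : array of shape (n_subregions, n_frames)
    """
    pulses = np.asarray(pulses, dtype=float)
    variance = pulses.var(axis=1)        # local variance per sub-region
    weights = 1.0 / (variance + eps)     # large fluctuation -> small weight
    return weights / weights.sum()       # assumption: normalize to sum to 1


pulses = np.array([
    [0.0, 1.0, 0.0, 1.0],    # small fluctuations
    [0.0, 10.0, 0.0, 10.0],  # large fluctuations, e.g. motion-corrupted
])
w = inverse_variance_weights(pulses)
combined = w @ pulses        # weighted linear combination of sub-region pulses
print(w, combined)
```

The second sub-region fluctuates ten times more strongly, so its variance is one hundred times larger and it contributes roughly one hundredth of the weight of the first.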
Head motion correction is then achieved by subtracting the projection of these fluctuations in the direction of the pulse signal intensity. As a result, fluctuations caused by head motion (and not by the physiological process of interest) are removed from the pulse signal. In frames where head motion is present, this step results in the extraction of a clean, noise-free pulse.
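One possible reading of this projection step is sketched below: the component of the extracted pulse that lies along a motion reference signal is estimated by least-squares projection and subtracted. The choice of reference (for example, the trajectory of the nose feature point mentioned in Supplementary Note 4), the mean removal, and the function name are assumptions; the description does not fix these details.

```python
import numpy as np


def remove_motion_component(pulse, motion):
    """Subtract from `pulse` its projection onto the `motion` reference signal."""
    p = np.asarray(pulse, dtype=float)
    m = np.asarray(motion, dtype=float)
    p = p - p.mean()
    m = m - m.mean()
    energy = m @ m
    if energy == 0.0:
        return p                      # no motion fluctuation to remove
    return p - (p @ m) / energy * m   # orthogonal to the motion reference


fps = 30.0
t = np.arange(200) / fps
clean = np.sin(2 * np.pi * 1.2 * t)           # 72 bpm pulse
motion = np.sin(2 * np.pi * 0.3 * t + 0.5)    # slow head sway (synthetic)
corrupted = clean + 3.0 * motion

corrected = remove_motion_component(corrupted, motion)
residual = corrected @ (motion - motion.mean())
print(abs(residual))
```

After the subtraction the corrected signal has no remaining component along the motion reference, which is what the near-zero residual confirms.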
A third effect is to ensure that it is possible to estimate HR with high accuracy even in the absence of head motion.
According to the present embodiment, sub-regions containing glare or shadow spots, which contain little HR information, are suppressed, which directly improves the pulse extraction process in the absence of head motion.
A fourth effect is to ensure that it is possible to estimate HR with high accuracy even in the presence of facial expression changes.
According to the present embodiment, an ROI filter is created that assigns each ROI sub-region a weight inversely proportional to the local variance of the hue channel intensity. As a result, the fluctuations in the direction of white light, which corrupt most sub-regions, are ignored completely, and sub-regions with small fluctuations in the direction orthogonal to white light are given a large weight. Alternatively, an ROI filter is created that assigns each ROI sub-region a weight inversely proportional to the local maximum of the pulse signal extracted from that sub-region. In the presence of non-rigid, unpredictable fluctuations on most sub-regions of the face, this ROI filtering process emphasizes the sub-regions with the least corruption introduced by facial expression changes.
In addition, the present embodiment achieves facial expression correction by first measuring the amount of noise present in each ROI sub-region, based on the movement of the facial feature point(s) lying inside or near the ROI sub-region, and then suppressing the sub-regions with the largest feature point movements and emphasizing those with the smallest. This is a color-independent facial expression correction step that results in efficient noise removal in a constantly changing area under observation.
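The feature-point-based suppression can be sketched as follows, assuming one tracked feature point per ROI sub-region. Measuring movement as the total frame-to-frame displacement, and converting it to a weight by a normalized inverse, are illustrative assumptions; the function name and array layout are likewise hypothetical.

```python
import numpy as np


def expression_weights(landmark_tracks, eps=1e-8):
    """Weight sub-regions inversely to the movement of their feature point.

    landmark_tracks : array of shape (n_subregions, n_frames, 2) holding the
                      (x, y) position of the feature point in or near each
                      sub-region for every frame.
    """
    tracks = np.asarray(landmark_tracks, dtype=float)
    steps = np.diff(tracks, axis=1)                        # frame-to-frame displacement
    movement = np.linalg.norm(steps, axis=2).sum(axis=1)   # total path length
    weights = 1.0 / (movement + eps)                       # large movement -> suppressed
    return weights / weights.sum()


frames = np.arange(5, dtype=float)
calm = np.stack([0.1 * frames, np.zeros(5)], axis=1)     # e.g. a forehead point
active = np.stack([1.0 * frames, np.zeros(5)], axis=1)   # e.g. a mouth-corner point
w = expression_weights(np.stack([calm, active]))
print(w)
```

Because the weights depend only on landmark positions, the sketch is independent of pixel color, in line with the color-independent nature of the correction described above.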
A fifth effect is to ensure that it is possible to estimate HR with high accuracy even in the presence of subtle head motions or facial expressions.
According to the present embodiment, the green channel, which is the most robust to noise of the three color channels (red, green and blue), preserves the pulse signal in the presence of subtle head motion or facial expressions, as opposed to the chrominance pulse used in the prior art.
Alternatively, the same processing can be performed using near-infrared or infrared cameras and the output pixel values of these imaging devices. In that case, it is possible to obtain the effect that the HR can be estimated with high accuracy even in a dark place.
A sixth effect is to ensure that it is possible to estimate HR with high accuracy even in the presence of severe head motions or facial expressions.
According to the present embodiment, dominant noise peaks in the FFT which appear due to severe head motion and facial expression are removed as a result of spectral peak tracking. This step helps in identifying the true pulse rate frequency when the above-mentioned steps fail to remove noise effectively or when the noise is too dominant.
Program
A program of the present embodiment need only be a program for causing a computer to execute steps A1 to A19 shown in FIG. 2. The pulse rate estimation apparatus 100 and the pulse rate estimation method according to the present embodiment can be realized by installing the program on a computer and executing it. In this case, the processor of the computer functions as the video capturing unit 101, the face tracker 102, the noise source detector 103, the ROI selection and pulse signal extraction unit 104, the pulse rate estimation unit 105, and the spectral peak tracker 106, and performs processing.
The program according to the present exemplary embodiment may be executed by a computer system constructed using a plurality of computers. In this case, for example, each computer may function as a different one of the video capturing unit 101, the face tracker 102, the noise source detector 103, the ROI selection and pulse signal extraction unit 104, the pulse rate estimation unit 105, and the spectral peak tracker 106.
A computer that realizes the pulse rate estimation apparatus 100 by executing the program according to the present embodiment will now be described with reference to the drawings. FIG. 6 is a block diagram showing an example of a computer that realizes the pulse rate estimation apparatus according to an embodiment of the present invention.
As shown in FIG. 6, the computer 10 includes a CPU (Central Processing Unit) 11, a main memory 12, a storage device 13, an input interface 14, a display controller 15, a data reader/writer 16, and a communication interface 17. These units are connected via a bus 21 so as to be capable of mutual data communication.
The CPU 11 carries out various calculations by loading the programs (code) according to the present embodiment, which are stored in the storage device 13, into the main memory 12 and executing them in a predetermined sequence. The main memory 12 is typically a volatile storage device such as a DRAM (Dynamic Random Access Memory). Also, the program according to the present embodiment is provided in a state of being stored in a computer-readable storage medium 20. Note that the program according to the present embodiment may be distributed over the Internet, to which the computer is connected via the communication interface 17.
Also, specific examples of the storage device 13 include a semiconductor storage device such as a flash memory, in addition to a hard disk drive. The input interface 14 mediates data transmission between the CPU 11 and an input device 18 such as a keyboard or a mouse. The display controller 15 is connected to a display device 19 and controls display on the display device 19.
The data reader/writer 16 mediates data transmission between the CPU 11 and the storage medium 20, reads out programs from the storage medium 20, and writes results of processing performed by the computer 10 in the storage medium 20. The communication interface 17 mediates data transmission between the CPU 11 and another computer.
Also, specific examples of the storage medium 20 include a general-purpose semiconductor storage device such as CF (Compact Flash (registered trademark)) and SD (Secure Digital), a magnetic storage medium such as a flexible disk, and an optical storage medium such as a CD-ROM (Compact Disk Read Only Memory).
The pulse rate estimation apparatus 100 according to the present exemplary embodiment can also be realized using items of hardware corresponding to various components, rather than using the computer having the program installed therein. Furthermore, a part of the pulse rate estimation apparatus 100 may be realized by the program, and the remaining part of the pulse rate estimation apparatus 100 may be realized by hardware.
The above-described embodiment can be partially or entirely expressed by, but is not limited to, the following Supplementary Notes 1 to 27.
(Supplementary Note 1)
A pulse rate estimation apparatus based on observation of human skin, comprising:
a video capturing unit that captures a video of a human body part from which a pulse signal, which is the direct effect of a certain physiological process, is to be extracted;
a body-part tracking unit that detects a specific human body part and generates feature points indicating important structural landmarks on the body part for each frame where the body part is detected;
a noise source detector that identifies the noise source and assigns labels to each video;
an ROI selection and pulse signal extraction unit that selects an ROI and divides it into sub-regions, extracts a pulse signal from each sub-region, creates a label-dependent ROI filter to assign each ROI sub-region a weight proportional to the amount of useful pulse information present in it, and applies label-dependent noise estimation and correction to the pulse signals extracted from each sub-region;
a pulse rate estimation unit that combines the extracted pulse signals and the ROI filter created in the previous step to form the final noise-suppressed pulse signal, performs frequency analysis on that noise-suppressed pulse signal, and generates pulse rate candidates; and
a spectral peak tracker that selects, for each video frame, a pulse rate value from the list of pulse rate candidates generated in the previous step which is consistent with other pulse rate values in time.
(Supplementary Note 2)
The pulse rate estimation apparatus according to supplementary note 1,
wherein the noise source detector detects the presence of rigid and/or non-rigid motion as the noise source, and assigns labels to each video frame indicating the detected noise source.
(Supplementary Note 3)
The pulse rate estimation apparatus according to supplementary note 1 or 2,
wherein the ROI selection and pulse signal extraction unit selects an area on human skin from which the pulse rate is measured, divides the area into more than one sub-region, extracts from each sub-region a pulse signal signifying color changes due to a physiological activity observed in that sub-region, estimates the amount of useful information present inside each sub-region, uses the label assigned by the noise source detector to estimate the amount of useful information present inside each sub-region, creates a label-dependent ROI filter which emphasizes sub-regions by a measure related to the estimated amount of useful pulse information present inside that sub-region, and applies label-dependent noise estimation and removal steps to the pulse signals extracted from each sub-region.
(Supplementary Note 4)
The pulse rate estimation apparatus according to any of supplementary notes 1 to 3,
wherein the ROI selection and pulse signal extraction unit uses the fluctuations in the position of the feature point associated with the nose, or with another body part which is robust to non-rigid motion, to estimate and/or remove rigid motion noise from the extracted pulse signal when the noise source detector assigns a label indicating the presence of rigid motion, and uses the fluctuations in the position of the feature point associated with landmarks on the body that show non-rigid movement to estimate and/or remove non-rigid motion noise from the extracted pulse signal when the noise source detector assigns a label indicating the presence of non-rigid motion.
(Supplementary Note 5)
The pulse rate estimation apparatus according to any of supplementary notes 1 to 3,
wherein the ROI selection and pulse signal extraction unit uses the variance of the pulse signal extracted from each sub-region to estimate useful information when the noise source detector assigns a label indicating the presence of rigid motion, and uses either the variance of hue changes or the local maximum of the pulse signal observed in each sub-region to estimate useful information when the noise source detector assigns a label indicating the presence of non-rigid motion.
(Supplementary Note 6)
The pulse rate estimation apparatus according to any of supplementary notes 1 to 3,
wherein the ROI selection and pulse signal extraction unit identifies and suppresses regions with bright reflection or dark shadow spots when the noise source detector assigns a label indicating the absence of rigid motion.
(Supplementary Note 7)
The pulse rate estimation apparatus according to supplementary note 1 or 2,
wherein the pulse rate estimation unit combines the pulse signals extracted from each ROI sub-region to form a time series representing the overall extracted pulse signal, the pulse signals from each ROI sub-region being combined by taking their linear combination using the weights assigned by the ROI filter, and performs frequency analysis on that overall pulse signal to generate pulse rate candidates.
(Supplementary Note 8)
The pulse rate estimation apparatus according to supplementary note 1 or 2,
wherein the spectral peak tracker considers an estimated pulse rate value reliable when the noise source detector assigns a label indicating the absence of both rigid and non-rigid motion, and considers an estimated pulse rate value unreliable when the noise source detector assigns a label indicating the presence of rigid motion, non-rigid motion, or both.
(Supplementary Note 9)
The pulse rate estimation apparatus according to supplementary note 1 or 2,
wherein the spectral peak tracker removes or replaces noisy pulse rate estimates, wherein frequency peaks related to noise in the frequency band of interest are removed through identification of reliable and unreliable estimates, and attempts to identify the correct pulse rate estimate out of a set of more than one pulse rate candidate using reliable pulse rate estimates identified in previous and/or upcoming video frames.
(Supplementary Note 10)
A pulse rate estimation method based on observation of human skin, comprising:
(a) a step of capturing a video of a human body part from which a pulse signal, which is the direct effect of a certain physiological process, is to be extracted;
(b) a step of detecting a specific human body part, and generating feature points indicating important structural landmarks on the body part for each frame where the body part is detected;
(c) a step of identifying the noise source and assigning labels to each video;
(d) a step of selecting an ROI and dividing it into sub-regions, extracting a pulse signal from each sub-region, creating a label-dependent ROI filter to assign each ROI sub-region a weight proportional to the amount of useful pulse information present in it, and applying label-dependent noise estimation and correction to the pulse signals extracted from each sub-region;
(e) a step of combining the extracted pulse signals and the ROI filter created in the previous step to form the final noise-suppressed pulse signal, performing frequency analysis on that noise-suppressed pulse signal, and generating pulse rate candidates; and
(f) a step of selecting, for each video frame, a pulse rate value from the list of pulse rate candidates generated in the previous step which is consistent with other pulse rate values in time.
(Supplementary Note 11)
The pulse rate estimation method according to supplementary note 10,
wherein, in the step (c), the presence of rigid and/or non-rigid motion is detected as the noise source, and labels indicating the detected noise source are assigned to each video frame.
(Supplementary Note 12)
The pulse rate estimation method according to supplementary note 10 or 11,
wherein, in the step (d), an area on human skin from which the pulse rate is measured is selected, the area is divided into more than one sub-region, a pulse signal signifying color changes due to a physiological activity observed in each sub-region is extracted from that sub-region, the amount of useful information present inside each sub-region is estimated, the label assigned by the noise source detector is used to estimate the amount of useful information present inside each sub-region, a label-dependent ROI filter which emphasizes sub-regions by a measure related to the estimated amount of useful pulse information present inside that sub-region is created, and label-dependent noise estimation and removal steps are applied to the pulse signals extracted from each sub-region.
(Supplementary Note 13)
The pulse rate estimation method according to any of supplementary notes 10 to 12,
wherein, in the step (d), the fluctuations in the position of the feature point associated with the nose, or with another body part which is robust to non-rigid motion, are used to estimate and/or remove rigid motion noise from the extracted pulse signal when the noise source detector assigns a label indicating the presence of rigid motion, and the fluctuations in the position of the feature point associated with landmarks on the body that show non-rigid movement are used to estimate and/or remove non-rigid motion noise from the extracted pulse signal when the noise source detector assigns a label indicating the presence of non-rigid motion.
(Supplementary Note 14)
The pulse rate estimation method according to any of supplementary notes 10 to 12,
wherein, in the step (d), the variance of the pulse signal extracted from each sub-region is used to estimate useful information when the noise source detector assigns a label indicating the presence of rigid motion, and either the variance of hue changes or the local maximum of the pulse signal observed in each sub-region is used to estimate useful information when the noise source detector assigns a label indicating the presence of non-rigid motion.
(Supplementary Note 15)
The pulse rate estimation method according to any of supplementary notes 10 to 12,
wherein, in the step (d), regions with bright reflection or dark shadow spots are identified and suppressed when the noise source detector assigns a label indicating the absence of rigid motion.
(Supplementary Note 16)
The pulse rate estimation method according to supplementary note 10 or 11,
wherein, in the step (e), the pulse signals extracted from each ROI sub-region are combined to form a time series representing the overall extracted pulse signal, the pulse signals from each ROI sub-region being combined by taking their linear combination using the weights assigned by the ROI filter, and frequency analysis is performed on that overall pulse signal to generate pulse rate candidates.
(Supplementary Note 17)
The pulse rate estimation method according to supplementary note 10 or 11,
wherein, in the step (f), an estimated pulse rate value is considered reliable when the noise source detector assigns a label indicating the absence of both rigid and non-rigid motion, and an estimated pulse rate value is considered unreliable when the noise source detector assigns a label indicating the presence of rigid motion, non-rigid motion, or both.
(Supplementary Note 18)
The pulse rate estimation method according to supplementary note 10 or 11,
wherein, in the step (f), noisy pulse rate estimates are removed or replaced, wherein frequency peaks related to noise in the frequency band of interest are removed through identification of reliable and unreliable estimates, and the correct pulse rate estimate is identified out of a set of more than one pulse rate candidate using reliable pulse rate estimates identified in previous and/or upcoming video frames.
(Supplementary Note 19)
A computer-readable storage medium storing a program that includes commands for causing a computer to execute:
(a) a step of capturing a video of a human body part from which a pulse signal, which is the direct effect of a certain physiological process, is to be extracted;
(b) a step of detecting a specific human body part, and generating feature points indicating important structural landmarks on the body part for each frame where the body part is detected;
(c) a step of identifying the noise source and assigning labels to each video;
(d) a step of selecting an ROI and dividing it into sub-regions, extracting a pulse signal from each sub-region, creating a label-dependent ROI filter to assign each ROI sub-region a weight proportional to the amount of useful pulse information present in it, and applying label-dependent noise estimation and correction to the pulse signals extracted from each sub-region;
(e) a step of combining the extracted pulse signals and the ROI filter created in the previous step to form the final noise-suppressed pulse signal, performing frequency analysis on that noise-suppressed pulse signal, and generating pulse rate candidates; and
(f) a step of selecting, for each video frame, a pulse rate value from the list of pulse rate candidates generated in the previous step that is consistent in time with other pulse rate values.
(Supplementary Note 20)
The computer-readable storage medium according to supplementary note 19,
Wherein, in the step (c), the presence of rigid and/or non-rigid motion as the noise source is detected, and labels are assigned to each video frame indicating the detected noise-source.
(Supplementary Note 21)
The computer-readable storage medium according to supplementary note 19 or 20,
Wherein, in the step (d), an area on human skin from which the pulse rate is measured is selected, the area is divided into more than one sub-region, a pulse signal signifying color changes due to a physiological activity observed in each sub-region is extracted from that sub-region, the amount of useful information present inside each sub-region is estimated using the label assigned by the noise-source detector, a label-dependent ROI filter is created which emphasizes sub-regions by a measure related to the estimated amount of useful pulse information present inside each sub-region, and label-dependent noise estimation and removal steps are applied to the pulse signals extracted from each sub-region.
(Supplementary Note 22)
The computer-readable storage medium according to any of supplementary notes 19 to 21,
Wherein, in the step (d), when the noise-source detector assigns a label indicating the presence of rigid motion, the fluctuations in the position of the feature point associated with the nose, or with another body part which is robust to non-rigid motion, are used to estimate and/or remove rigid motion noise from the extracted pulse signal, and when the noise-source detector assigns a label indicating the presence of non-rigid motion, the fluctuations in the position of the feature points associated with landmarks on the body that show non-rigid movement are used to estimate and/or remove non-rigid motion noise from the extracted pulse signal.
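As a non-limiting illustration, the use of feature-point fluctuations to remove motion noise (supplementary note 22) can be sketched in Python as follows. This section does not state how the fluctuations are applied; the sketch assumes, purely for illustration, that the motion component is estimated by a least-squares fit of the pulse signal onto the landmark displacement and then subtracted. The function name and the one-dimensional displacement input are hypothetical.

```python
from typing import List, Sequence


def remove_motion_component(pulse: Sequence[float],
                            landmark_position: Sequence[float]) -> List[float]:
    """Subtract the part of the pulse signal explained by landmark motion.

    Both series are mean-centered, a single least-squares gain is fitted,
    and the scaled landmark fluctuation is removed from the pulse signal.
    """
    n = len(pulse)
    pulse_mean = sum(pulse) / n
    motion_mean = sum(landmark_position) / n
    p = [x - pulse_mean for x in pulse]
    m = [x - motion_mean for x in landmark_position]
    energy = sum(v * v for v in m)
    if energy == 0.0:
        # The landmark did not move, so there is nothing to remove.
        return p
    gain = sum(a * b for a, b in zip(p, m)) / energy
    return [a - gain * b for a, b in zip(p, m)]
```

For rigid motion the displacement series would come from the nose feature point (or another landmark robust to non-rigid motion); for non-rigid motion it would come from landmarks that show non-rigid movement.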
(Supplementary Note 23)
The computer-readable storage medium according to any of supplementary notes 19 to 21,
Wherein, in the step (d), the variance of the pulse signal extracted from each sub-region is used to estimate useful information when the noise-source detector assigns a label indicating the presence of rigid motion, and either the variance of hue changes or the local maximum of the pulse signal observed in each sub-region is used to estimate useful information when the noise-source detector assigns a label indicating the presence of non-rigid motion.
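As a non-limiting illustration, a variance-based ROI filter for the rigid-motion case of supplementary note 23 can be sketched in Python as follows. This section does not specify how variance maps to the amount of useful information; the sketch assumes, for illustration only, that a lower variance indicates more useful pulse information and therefore uses normalized inverse-variance weights. The function name and the `eps` guard are hypothetical.

```python
from statistics import pvariance
from typing import List, Sequence


def variance_based_roi_weights(subregion_signals: Sequence[Sequence[float]],
                               eps: float = 1e-9) -> List[float]:
    """Return one weight per sub-region, normalized to sum to 1.

    Assumption (not from the specification): usefulness is taken to be
    inversely related to the variance of the sub-region pulse signal.
    """
    scores = [1.0 / (pvariance(signal) + eps) for signal in subregion_signals]
    total = sum(scores)
    return [score / total for score in scores]
```

Under a different reading of the note, the same structure applies with another score, for example the variance of hue changes or a local maximum of the pulse signal in the non-rigid case.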
(Supplementary Note 24)
The computer-readable storage medium according to any of supplementary notes 19 to 21,
Wherein, in the step (d), regions with bright reflection/dark shadow spots are identified and suppressed when the noise-source detector assigns a label indicating the absence of rigid motion.
(Supplementary Note 25)
The computer-readable storage medium according to supplementary note 19 or 20,
Wherein, in the step (e), the pulse signals extracted from each ROI sub-region are combined, by taking their linear combination with the weights assigned by the ROI filter, to form a time series representing the overall extracted pulse signal, and frequency analysis is performed on that overall pulse signal to generate pulse rate candidates.
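As a non-limiting illustration, the weighted linear combination and frequency analysis of supplementary note 25 can be sketched in Python as follows. The sketch assumes a fixed frame rate, a band of interest of 40 to 240 beats per minute, and a plain discrete Fourier transform whose strongest bins are returned as candidates; none of these choices is stated in this section, and the function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def combine_subregion_signals(signals: Sequence[Sequence[float]],
                              weights: Sequence[float]) -> List[float]:
    """Form the overall pulse signal as a weighted sum of sub-region signals."""
    length = len(signals[0])
    return [sum(w * s[t] for w, s in zip(weights, signals))
            for t in range(length)]


def pulse_rate_candidates(signal: Sequence[float], fps: float,
                          band_bpm: Tuple[float, float] = (40.0, 240.0),
                          top_k: int = 3) -> List[float]:
    """Return up to top_k in-band frequencies (in bpm), strongest first."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    peaks = []
    for k in range(1, n // 2 + 1):
        bpm = 60.0 * k * fps / n  # frequency of DFT bin k in beats per minute
        if band_bpm[0] <= bpm <= band_bpm[1]:
            coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                        for t, x in enumerate(centered))
            peaks.append((abs(coeff), bpm))
    peaks.sort(reverse=True)  # largest magnitude first
    return [bpm for _, bpm in peaks[:top_k]]
```

A fast Fourier transform would normally replace the direct sum for long windows; the direct form is used here only to keep the sketch free of external dependencies.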
(Supplementary Note 26)
The computer-readable storage medium according to supplementary note 19 or 20,
Wherein, in the step (f), the estimated pulse rate value is considered reliable when the noise-source detector assigns a label indicating the absence of both rigid and non-rigid motion, and is considered unreliable when the noise-source detector assigns a label indicating the presence of rigid motion, non-rigid motion, or both.
(Supplementary Note 27)
The computer-readable storage medium according to supplementary note 19 or 20,
Wherein, in the step (f), noisy pulse rate estimates are removed or replaced, wherein frequency peaks related to noise in the frequency band of interest are removed through identification of reliable and unreliable estimates, and an attempt is made to identify the correct pulse rate estimate out of a set of more than one pulse rate candidate using reliable pulse rate estimates identified in previous and/or upcoming video frames.
Although the invention of the present application has been described above with reference to the embodiment, the invention of the present application is not limited to the above embodiment. Various changes that can be understood by a person skilled in the art can be made to the configurations and details of the invention of the present application within the scope of the invention of the present application.
This application is based upon and claims the benefit of priority from International patent application No. PCT/JP2018/015909, filed April 17, 2018, the disclosure of which is incorporated herein in its entirety by reference.
The present invention is useful in fields requiring pulse rate measurement.
10 Computer
11 CPU
12 Main memory
13 Storage device
14 Input interface
15 Display controller
16 Data reader/writer
17 Communication interface
18 Input device
19 Display apparatus
20 Storage medium
21 Bus
100 Pulse rate estimation apparatus
101 Video capturing unit
102 Face tracker (a body-part tracking unit)
103 Noise source detector
104 ROI selection and pulse signal extraction unit
105 Pulse rate estimation unit
106 Spectral peak tracker
Claims (11)
- A pulse rate estimation apparatus based on observation of human skin, comprising:
a video capturing means that captures a video of a human body part from which a pulse signal, which is the direct effect of a certain physiological process, is to be extracted;
a body-part tracking means that detects a specific human body part, and generates feature points indicating important structural landmarks on the body part for each frame where the body part is detected;
a noise source detection means that identifies the noise source and assigns labels to each video;
a ROI selection and pulse signal extraction means that selects an ROI and divides it into sub-regions, extracts a pulse signal from each sub-region, creates a label-dependent ROI filter to assign each ROI sub-region a weight proportional to the amount of useful pulse information present in it, and applies label-dependent noise estimation and correction to the pulse signals extracted from each sub-region;
a pulse rate estimation means that combines the extracted pulse signals and the ROI filter created in the previous step to form the final noise-suppressed pulse signal, performs frequency analysis on that noise-suppressed pulse signal, and generates pulse rate candidates; and
a spectral peak tracking means that selects, for each video frame, a pulse rate value from the list of pulse rate candidates generated in the previous step that is consistent in time with other pulse rate values.
- The pulse rate estimation apparatus according to claim 1,
Wherein the noise source detection means detects the presence of rigid and/or non-rigid motion as the noise source, and assigns labels to each video frame indicating the detected noise-source.
- The pulse rate estimation apparatus according to claim 1 or 2,
Wherein the ROI selection and pulse signal extraction means selects an area on human skin from which the pulse rate is measured, divides the area into more than one sub-region, extracts from each sub-region a pulse signal signifying color changes due to a physiological activity observed in that sub-region, estimates the amount of useful information present inside each sub-region using the label assigned by the noise-source detector, creates a label-dependent ROI filter which emphasizes sub-regions by a measure related to the estimated amount of useful pulse information present inside each sub-region, and applies label-dependent noise estimation and removal steps to the pulse signals extracted from each sub-region.
- The pulse rate estimation apparatus according to any of claims 1 to 3,
Wherein the ROI selection and pulse signal extraction means uses the fluctuations in the position of the feature point associated with the nose, or with another body part which is robust to non-rigid motion, to estimate and/or remove rigid motion noise from the extracted pulse signal when the noise-source detector assigns a label indicating the presence of rigid motion, and uses the fluctuations in the position of the feature points associated with landmarks on the body that show non-rigid movement to estimate and/or remove non-rigid motion noise from the extracted pulse signal when the noise-source detector assigns a label indicating the presence of non-rigid motion.
- The pulse rate estimation apparatus according to any of claims 1 to 3,
Wherein the ROI selection and pulse signal extraction means uses the variance of the pulse signal extracted from each sub-region to estimate useful information when the noise-source detector assigns a label indicating the presence of rigid motion, and uses either the variance of hue changes or the local maximum of the pulse signal observed in each sub-region to estimate useful information when the noise-source detector assigns a label indicating the presence of non-rigid motion.
- The pulse rate estimation apparatus according to any of claims 1 to 3,
Wherein the ROI selection and pulse signal extraction means identifies and suppresses regions with bright reflection/dark shadow spots when the noise-source detector assigns a label indicating the absence of rigid motion.
- The pulse rate estimation apparatus according to claim 1 or 2,
Wherein the pulse rate estimation means combines the pulse signals extracted from each ROI sub-region, by taking their linear combination with the weights assigned by the ROI filter, to form a time series representing the overall extracted pulse signal, and performs frequency analysis on that overall pulse signal to generate pulse rate candidates.
- The pulse rate estimation apparatus according to claim 1 or 2,
Wherein the spectral peak tracking means considers the estimated pulse rate value reliable when the noise-source detector assigns a label indicating the absence of both rigid and non-rigid motion, and considers the estimated pulse rate value unreliable when the noise-source detector assigns a label indicating the presence of rigid motion, non-rigid motion, or both.
- The pulse rate estimation apparatus according to claim 1 or 2,
Wherein the spectral peak tracking means removes or replaces noisy pulse rate estimates, wherein frequency peaks related to noise in the frequency band of interest are removed through identification of reliable and unreliable estimates, and attempts to identify the correct pulse rate estimate out of a set of more than one pulse rate candidate using reliable pulse rate estimates identified in previous and/or upcoming video frames.
- A pulse rate estimation method based on observation of human skin, comprising:
(a) a step of capturing a video of a human body part from which a pulse signal, which is the direct effect of a certain physiological process, is to be extracted;
(b) a step of detecting a specific human body part, and generating feature points indicating important structural landmarks on the body part for each frame where the body part is detected;
(c) a step of identifying the noise source and assigning labels to each video;
(d) a step of selecting an ROI and dividing it into sub-regions, extracting a pulse signal from each sub-region, creating a label-dependent ROI filter to assign each ROI sub-region a weight proportional to the amount of useful pulse information present in it, and applying label-dependent noise estimation and correction to the pulse signals extracted from each sub-region;
(e) a step of combining the extracted pulse signals and the ROI filter created in the previous step to form the final noise-suppressed pulse signal, performing frequency analysis on that noise-suppressed pulse signal, and generating pulse rate candidates; and
(f) a step of selecting, for each video frame, a pulse rate value from the list of pulse rate candidates generated in the previous step that is consistent in time with other pulse rate values.
- A computer-readable storage medium storing a program that includes commands for causing a computer to execute:
(a) a step of capturing a video of a human body part from which a pulse signal, which is the direct effect of a certain physiological process, is to be extracted;
(b) a step of detecting a specific human body part, and generating feature points indicating important structural landmarks on the body part for each frame where the body part is detected;
(c) a step of identifying the noise source and assigning labels to each video;
(d) a step of selecting an ROI and dividing it into sub-regions, extracting a pulse signal from each sub-region, creating a label-dependent ROI filter to assign each ROI sub-region a weight proportional to the amount of useful pulse information present in it, and applying label-dependent noise estimation and correction to the pulse signals extracted from each sub-region;
(e) a step of combining the extracted pulse signals and the ROI filter created in the previous step to form the final noise-suppressed pulse signal, performing frequency analysis on that noise-suppressed pulse signal, and generating pulse rate candidates; and
(f) a step of selecting, for each video frame, a pulse rate value from the list of pulse rate candidates generated in the previous step that is consistent in time with other pulse rate values.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP2020556331A JP7099542B2 (en) | 2018-04-17 | 2019-04-11 | Pulse rate estimation device, pulse rate estimation method, and program |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JPPCT/JP2018/015909 | 2018-04-17 | ||
PCT/JP2018/015909 WO2019202671A1 (en) | 2018-04-17 | 2018-04-17 | Pulse rate estimation apparatus, pulse rate estimation method, and computer-readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2019203106A1 true WO2019203106A1 (en) | 2019-10-24 |
Family
ID=68239112
Family Applications (2)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2018/015909 WO2019202671A1 (en) | 2018-04-17 | 2018-04-17 | Pulse rate estimation apparatus, pulse rate estimation method, and computer-readable storage medium |
PCT/JP2019/015742 WO2019203106A1 (en) | 2018-04-17 | 2019-04-11 | Pulse rate estimation apparatus, pulse rate estimation method, and computer-readable storage medium |
Family Applications Before (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/JP2018/015909 WO2019202671A1 (en) | 2018-04-17 | 2018-04-17 | Pulse rate estimation apparatus, pulse rate estimation method, and computer-readable storage medium |
Country Status (2)
Country | Link |
---|---|
JP (1) | JP7099542B2 (en) |
WO (2) | WO2019202671A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111310584A (en) * | 2020-01-19 | 2020-06-19 | 上海眼控科技股份有限公司 | Heart rate information acquisition method and device, computer equipment and storage medium |
CN111743524A (en) * | 2020-06-19 | 2020-10-09 | 联想(北京)有限公司 | Information processing method, terminal and computer readable storage medium |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113598728B (en) * | 2021-08-31 | 2024-05-07 | 嘉兴温芯智能科技有限公司 | Noise reduction method, monitoring method and monitoring device for physiological signals and wearable equipment |
CN115153473B (en) * | 2022-06-10 | 2024-04-19 | 合肥工业大学 | Non-contact heart rate detection method based on multivariate singular spectrum analysis |
WO2024116255A1 (en) * | 2022-11-29 | 2024-06-06 | 三菱電機株式会社 | Pulse wave estimation device, pulse wave estimation method, state estimation system, and state estimation method |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20140018635A1 (en) * | 2012-01-05 | 2014-01-16 | Scosche Industries, Inc. | Heart rate monitor |
JP2014198201A (en) * | 2013-03-29 | 2014-10-23 | 富士通株式会社 | Pulse wave detection program, pulse wave detection method, and pulse wave detection device |
US20150302158A1 (en) * | 2014-04-21 | 2015-10-22 | Microsoft Corporation | Video-based pulse measurement |
JP2016508401A (en) * | 2013-02-28 | 2016-03-22 | コーニンクレッカ フィリップス エヌ ヴェKoninklijke Philips N.V. | Apparatus and method for determining vital sign information from a subject |
JP2016513521A (en) * | 2013-03-14 | 2016-05-16 | コーニンクレッカ フィリップス エヌ ヴェKoninklijke Philips N.V. | Apparatus and method for determining a vital sign of an object |
JP2016137018A (en) * | 2015-01-26 | 2016-08-04 | 富士通株式会社 | Pulse wave detection device, pulse wave detection method, and pulse wave detection program |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP6521845B2 (en) * | 2015-11-22 | 2019-05-29 | 国立大学法人埼玉大学 | Device and method for measuring periodic fluctuation linked to heart beat |
CN106073729B (en) * | 2016-05-31 | 2019-08-27 | 中国科学院苏州生物医学工程技术研究所 | The acquisition method of photoplethysmographic signal |
2018
- 2018-04-17 WO PCT/JP2018/015909 patent/WO2019202671A1/en active Application Filing
2019
- 2019-04-11 JP JP2020556331A patent/JP7099542B2/en active Active
- 2019-04-11 WO PCT/JP2019/015742 patent/WO2019203106A1/en active Application Filing
Also Published As
Publication number | Publication date |
---|---|
JP2021517843A (en) | 2021-07-29 |
JP7099542B2 (en) | 2022-07-12 |
WO2019202671A1 (en) | 2019-10-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2019203106A1 (en) | Pulse rate estimation apparatus, pulse rate estimation method, and computer-readable storage medium | |
US11229372B2 (en) | Systems and methods for computer monitoring of remote photoplethysmography based on chromaticity in a converted color space | |
CN112074226B (en) | System and method for remote measurement of vital signs | |
US9737219B2 (en) | Method and associated controller for life sign monitoring | |
JP5834011B2 (en) | Method and system for processing a signal including a component representing at least a periodic phenomenon in a living body | |
Feng et al. | Motion artifacts suppression for remote imaging photoplethysmography | |
CN109820499B (en) | High anti-interference heart rate detection method based on video, electronic equipment and storage medium | |
Bousefsaf et al. | Automatic selection of webcam photoplethysmographic pixels based on lightness criteria | |
US20220240789A1 (en) | Estimation apparatus, method and program | |
JP7339676B2 (en) | Computer-implemented method and system for direct photoplethysmography (PPG) with multiple sensors | |
Wang et al. | Quality metric for camera-based pulse rate monitoring in fitness exercise | |
Wu et al. | Motion-robust atrial fibrillation detection based on remote-photoplethysmography | |
Li et al. | Comparison of region of interest segmentation methods for video-based heart rate measurements | |
CN111050638B (en) | Computer-implemented method and system for contact photoplethysmography (PPG) | |
JP7124974B2 (en) | Blood volume pulse signal detection device, blood volume pulse signal detection method, and program | |
EP3427640B1 (en) | Serial fusion of eulerian and lagrangian approaches for real-time heart rate estimation | |
Murashov et al. | A technique for detecting diagnostic events in video channel of synchronous video and electroencephalographic monitoring data | |
Das et al. | A multiresolution method for non-contact heart rate estimation using facial video frames | |
Kopeliovich et al. | Color signal processing methods for webcam-based heart rate evaluation | |
Le et al. | Remote PPG Estimation from RGB-NIR Facial Image Sequence for Heart Rate Estimation | |
US20230128766A1 (en) | Multimodal contactless vital sign monitoring | |
Vatanparvar et al. | Enhanced Contactless Heart Rate Monitoring Using Camera with Motion Artifact Removal During Physical Activities | |
Nguyen et al. | Evaluation of Video-Based rPPG in Challenging Environments: Artifact Mitigation and Network Resilience | |
Slapničar et al. | Reconstructing PPG Signal from Video Recordings | |
CN117617903A (en) | Pulse wave signal quality non-reference evaluation method based on heart rate continuity |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 19787841; Country of ref document: EP; Kind code of ref document: A1 |
ENP | Entry into the national phase | Ref document number: 2020556331; Country of ref document: JP; Kind code of ref document: A |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: pct application non-entry in european phase | Ref document number: 19787841; Country of ref document: EP; Kind code of ref document: A1 |