WO2022177501A1 - A system and method for measuring vital body signs
- Publication number: WO2022177501A1 (PCT/SG2021/050366)
- Authority: WO, WIPO (PCT)
- Prior art keywords: nir, ppg signal, person, estimating, ppg
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/0059—Measuring for diagnostic purposes; Identification of persons using light, e.g. diagnosis by transillumination, diascopy, fluorescence
- A61B5/0077—Devices for viewing the surface of the body, e.g. camera, magnifying lens
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/02—Detecting, measuring or recording pulse, heart rate, blood pressure or blood flow; Combined pulse/heart-rate/blood pressure determination; Evaluating a cardiovascular condition not otherwise provided for, e.g. using combinations of techniques provided for in this group with electrocardiography or electroauscultation; Heart catheters for measuring blood pressure
- A61B5/0205—Simultaneously evaluating both cardiovascular conditions and different types of body conditions, e.g. heart and respiratory condition
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/02—Detecting, measuring or recording pulse, heart rate, blood pressure or blood flow; Combined pulse/heart-rate/blood pressure determination; Evaluating a cardiovascular condition not otherwise provided for, e.g. using combinations of techniques provided for in this group with electrocardiography or electroauscultation; Heart catheters for measuring blood pressure
- A61B5/024—Detecting, measuring or recording pulse rate or heart rate
- A61B5/02416—Detecting, measuring or recording pulse rate or heart rate using photoplethysmograph signals, e.g. generated by infrared radiation
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/72—Signal processing specially adapted for physiological signals or for diagnostic purposes
- A61B5/7235—Details of waveform analysis
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/0059—Measuring for diagnostic purposes; Identification of persons using light, e.g. diagnosis by transillumination, diascopy, fluorescence
- A61B5/0075—Measuring for diagnostic purposes; Identification of persons using light, e.g. diagnosis by transillumination, diascopy, fluorescence by spectroscopy, i.e. measuring spectra, e.g. Raman spectroscopy, infrared absorption spectroscopy
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/02—Detecting, measuring or recording pulse, heart rate, blood pressure or blood flow; Combined pulse/heart-rate/blood pressure determination; Evaluating a cardiovascular condition not otherwise provided for, e.g. using combinations of techniques provided for in this group with electrocardiography or electroauscultation; Heart catheters for measuring blood pressure
- A61B5/021—Measuring pressure in heart or blood vessels
- A61B5/02108—Measuring pressure in heart or blood vessels from analysis of pulse wave characteristics
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/08—Detecting, measuring or recording devices for evaluating the respiratory organs
- A61B5/0816—Measuring devices for examining respiratory frequency
-
- A—HUMAN NECESSITIES
- A61—MEDICAL OR VETERINARY SCIENCE; HYGIENE
- A61B—DIAGNOSIS; SURGERY; IDENTIFICATION
- A61B5/00—Measuring for diagnostic purposes; Identification of persons
- A61B5/145—Measuring characteristics of blood in vivo, e.g. gas concentration, pH value; Measuring characteristics of body fluids or tissues, e.g. interstitial fluid, cerebral tissue
- A61B5/1455—Measuring characteristics of blood in vivo, e.g. gas concentration, pH value; Measuring characteristics of body fluids or tissues, e.g. interstitial fluid, cerebral tissue using optical sensors, e.g. spectral photometrical oximeters
- A61B5/14551—Measuring characteristics of blood in vivo, e.g. gas concentration, pH value; Measuring characteristics of body fluids or tissues, e.g. interstitial fluid, cerebral tissue using optical sensors, e.g. spectral photometrical oximeters for measuring blood gases
Definitions
- This invention relates to a system and method of estimating body vitals, such as Heart Rate, Respiration Rate, Blood Oxygen Saturation and Blood Pressure, of a person. Specifically, the vitals are estimated from images captured simultaneously by a colour camera and an active depth sensor.
- Prior Art
- Photoplethysmography is an optically obtained plethysmogram used to detect blood volume changes in the microvascular bed of tissue.
- Remote Photoplethysmography allows estimation of blood volume changes without skin contact by using ambient light sources and video cameras. Under proper lighting conditions, minute variations in skin colour and temperature due to blood volume changes can be observed.
- However, because the technique relies on ambient light sources, a slight change in the surroundings would inevitably lead to inaccurate results from the plethysmogram.
- A first advantage of the system and method in accordance with this invention is that they allow non-contact measurement of vital signs.
- A second advantage of the system and method in accordance with this invention is that they are able to estimate vital signs within a short period of time, less than 5 seconds, with an accuracy of more than 90%.
- A third advantage of the system and method in accordance with this invention is that they are independent of the ambient light.
- A fourth advantage of the system and method in accordance with this invention is that they utilise devices that are readily available. This allows remote monitoring of a patient's vital signs with ease.
- A first aspect of the disclosure describes a system for estimating vitals of a person.
- The system comprises an image sensor adapted to capture a sequence of color (RGB) and near infrared (NIR) images, and a processing unit comprising a processor, memory and instructions stored on the memory and executable by the processor to estimate vitals of the person from the sequence of RGB and NIR images.
- the instruction to estimating vitals of the person from the sequence of RGB and NIR images comprises instructions to: time synchronising the sequence of RGB and NIR images; detecting region of interest within the sequence of RGB and NIR images; extracting raw signals from the detected region of interest; extracting photoplethysmography (PPG) signals from the raw signals; and estimating vitals of the person from the extracted PPG signals.
- the instruction to detecting region of interest within the sequence of RGB and NIR images comprises instructions to: detecting a face of the person within each of the sequence of NIR images; in response to detecting the face of the person, detecting at least one region of interest within the detected face of the person; and mapping the detected region of interest in each of the sequence of NIR images to the corresponding sequence of RGB images.
- The instructions to detect at least one region of interest within the detected face of the person comprise instructions for: applying facial landmark detection where 81 prominent feature points on the face of the person are detected; and overlaying a box at the top of the detected 81 prominent feature points, between the facial edge and the eyebrows, wherein the box is one region of interest.
- The instructions to detect at least one region of interest within the detected face of the person further comprise instructions for: overlaying two boxes between the facial edge and the nose, wherein the two boxes are two other regions of interest.
- the instruction to detecting region of interest within the sequence of RGB and NIR images comprises instructions to: computing an average pixel intensity of the detected region of interest for each of a plurality of frequency bands including Red, Green and Blue from the sequence of RGB images and NIR frequency band from the sequence of NIR images; and appending the average pixel intensity value of each frequency band to the pixel intensity value of corresponding frequency band in respective RGB and NIR images to form a time series for each frequency band.
- The instructions to detect the region of interest within the sequence of RGB and NIR images further comprise instructions for: determining if a length of the time series is equal to a predetermined length, L; and, in response to the length of the time series being equal to the predetermined length, L, normalising the pixel intensity value in the time series for each frequency band.
- the instruction to applying Independent Component Analysis (ICA) to extract PPG signal from the detrended pixel intensity value in the time series for each frequency band comprises instructions to: grouping the detrended pixel intensity value in the time series for each frequency band into a set of Red PPG signal and a set of NIR PPG signals wherein the Red PPG signal is derived from the frequency bands of Red, Green and Blue and the NIR PPG signal is derived from the frequency bands of NIR, Green and Blue; applying a band pass filter with cutoff frequency of between 0.16 and 2 Hz to the Red PPG signal; applying a band pass filter with cutoff frequency of between 0.16 and 2 Hz to the NIR PPG signal.
- the instruction to estimating vitals of the person from the extracted PPG signals comprises instructions to: computing Ratio (R) of time-domain range for NIR PPG signal and Red PPG signal wherein the Ratio (R) can be expressed with the following expression, where amplitude range of Red PPG signal relates to difference between minimum and maximum values of the amplitude of the Red PPG signal in time domain, arithmetic mean of Red PPG signal relates to arithmetic mean of the Red PPG signal in between minimum and maximum values of the amplitude of the Red PPG signal in time domain, amplitude range of NIR PPG signal relates to difference between minimum and maximum values of the amplitude of the NIR PPG signal in time domain and arithmetic mean of NIR PPG signal relates to arithmetic mean of the NIR PPG signal in between minimum and maximum values of the amplitude of the NIR PPG signal in time domain; estimating the Blood Oxygen Saturation with the following expression,
- The instructions to estimate vitals of the person from the extracted PPG signals comprise instructions for: detecting the Systolic Peak (P_s), the Diastolic Peak (P_d) and the global minimum (P_m) from the time-domain NIR PPG signal; estimating the Systolic and Diastolic Blood Pressure with the following expressions,
- Y_1, Y_2, Y_3, Y_4, Y_5, Y_6 are parameters of the regression curve which maps the
- Y_1 and Y_4 are frequency-slope parameters for systolic and diastolic blood pressures respectively
- Y_2 and Y_5 are amplitude-slope parameters for systolic and diastolic blood pressures respectively
- Y_3 and Y_6 are offset parameters for systolic and diastolic blood pressures respectively.
- The instructions to detect the Systolic Peak (P_s), the Diastolic Peak (P_d) and the global minimum (P_m) from the time-domain NIR PPG signal comprise instructions for: applying a peak detection algorithm on the NIR PPG signal to obtain a second order derivative; and identifying the global maximum of the NIR PPG signal as the Systolic Peak (P_s), the peak in the NIR PPG signal corresponding to a minimum following the Systolic Peak (P_s) in the second order derivative as the Diastolic Peak (P_d), the maximum immediately after the Systolic Peak in the second order derivative as the Dicrotic notch, and the minimum of the entire NIR PPG signal as the global minimum (P_m).
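- The landmark search described above can be sketched as follows. This is a minimal illustration rather than the applicant's implementation: the function names are hypothetical, the second order derivative is approximated by a central second difference, and the notch is located first so that the minimum sitting at the systolic peak itself is not mistaken for the diastolic peak.

```python
def second_difference(signal):
    """Discrete second-order derivative; d2[i] is centred on signal[i + 1]."""
    return [signal[i] - 2 * signal[i + 1] + signal[i + 2]
            for i in range(len(signal) - 2)]


def find_landmarks(signal):
    """Return indices (systolic, notch, diastolic, global_min) into signal."""
    systolic = max(range(len(signal)), key=signal.__getitem__)
    global_min = min(range(len(signal)), key=signal.__getitem__)
    d2 = second_difference(signal)

    def first_extremum(start, want_max):
        # Scan d2 from `start` for the first strict local maximum or minimum.
        for j in range(max(start, 1), len(d2) - 1):
            if want_max and d2[j - 1] < d2[j] > d2[j + 1]:
                return j
            if not want_max and d2[j - 1] > d2[j] < d2[j + 1]:
                return j
        return None

    # d2[j] corresponds to signal[j + 1], hence the index shifts below.
    notch_j = first_extremum(systolic, want_max=True)
    notch = None if notch_j is None else notch_j + 1
    dia_j = None if notch_j is None else first_extremum(notch_j + 1, want_max=False)
    diastolic = None if dia_j is None else dia_j + 1
    return systolic, notch, diastolic, global_min
```

- Any of the notch or diastolic indices may be `None` when the waveform has no such feature, so a caller would need to handle that case before using them.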
- The instructions to estimate vitals of the person from the extracted PPG signals comprise instructions for: applying a fast Fourier transform to the NIR PPG signal to transform it from the time domain to the frequency domain; computing a plurality of local maxima of a power spectral density (PSD) of the NIR PPG signal in the frequency domain; and, in response to a local maximum within the frequency range of 0.8 to 2 Hz, estimating a Heart Rate (HR) with the following expression,
- Heart Rate (HR) = 60 * f_hr
- where f_hr refers to the local maximum that is within the frequency range of 0.8 to 2 Hz
- Respiration Rate (RR) = 60 * f_rr
- where f_rr refers to the local maximum that is within the frequency range of 0.1 to 0.5 Hz.
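- The two rate expressions above can be illustrated with a short sketch. It assumes a uniformly sampled NIR PPG signal and uses a direct discrete Fourier transform in place of the FFT so that it needs no external library; the function names, and the choice to take the strongest local maximum in each band, are assumptions of this sketch and are not taken from the application.

```python
import cmath
import math


def power_spectrum(signal, fps):
    """Return (frequencies in Hz, power) for the non-negative DFT bins."""
    n = len(signal)
    mean = sum(signal) / n
    centred = [x - mean for x in signal]  # remove the DC offset
    freqs, power = [], []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(centred))
        freqs.append(k * fps / n)
        power.append(abs(acc) ** 2 / n)
    return freqs, power


def dominant_peak(freqs, power, f_lo, f_hi):
    """Frequency of the strongest local maximum inside [f_lo, f_hi], or None."""
    best = None
    for k in range(1, len(power) - 1):
        in_band = f_lo <= freqs[k] <= f_hi
        is_peak = power[k] > power[k - 1] and power[k] > power[k + 1]
        if in_band and is_peak and (best is None or power[k] > power[best]):
            best = k
    return None if best is None else freqs[best]


def estimate_rates(signal, fps):
    """Return (heart rate, respiration rate) per minute; None if no peak."""
    freqs, power = power_spectrum(signal, fps)
    f_hr = dominant_peak(freqs, power, 0.8, 2.0)   # heart-rate band
    f_rr = dominant_peak(freqs, power, 0.1, 0.5)   # respiration band
    hr = None if f_hr is None else 60.0 * f_hr
    rr = None if f_rr is None else 60.0 * f_rr
    return hr, rr
```

- With a 5 second window the frequency resolution is 0.2 Hz, so each estimate is quantised to steps of 12 per minute unless the spectrum is interpolated or a longer window is used.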
- The instructions to estimate vitals of the person from the extracted PPG signals comprise instructions for applying a machine learning model based on a linear regression model.
- a second aspect of the disclosure describes a method for estimating vitals of a person.
- the method comprises estimating vitals of the person from a sequence of RGB and NIR images.
- the step of estimating vitals of the person from the sequence of RGB and NIR images comprises: time synchronising the sequence of RGB and NIR images; detecting region of interest within the sequence of RGB and NIR images; extracting raw signals from the detected region of interest; extracting photoplethysmography (PPG) signals from the raw signals; and estimating vitals of the person from the extracted PPG signals.
- the step of detecting region of interest within the sequence of RGB and NIR images comprises: detecting a face of the person within each of the sequence of NIR images; in response to detecting the face of the person, detecting at least one region of interest within the detected face of the person; and mapping the detected region of interest in each of the sequence of NIR images to the corresponding sequence of RGB images.
- The step of detecting at least one region of interest within the detected face of the person comprises: applying facial landmark detection where 81 prominent feature points on the face of the person are detected; and overlaying a box at the top of the detected 81 prominent feature points, between the facial edge and the eyebrows, wherein the box is one region of interest.
- The step of detecting at least one region of interest within the detected face of the person further comprises: overlaying two boxes between the facial edge and the nose, wherein the two boxes are two other regions of interest.
- the step of detecting region of interest within the sequence of RGB and NIR images comprises instructions to: computing an average pixel intensity of the detected region of interest for each of a plurality of frequency bands including Red, Green and Blue from the sequence of RGB images and NIR frequency band from the sequence of NIR images; and appending the average pixel intensity value of each frequency band to the pixel intensity value of corresponding frequency band in respective RGB and NIR images to form a time series for each frequency band.
- The step of detecting the region of interest within the sequence of RGB and NIR images further comprises: determining if a length of the time series is equal to a predetermined length, L; and, in response to the length of the time series being equal to the predetermined length, L, normalising the pixel intensity value in the time series for each frequency band.
- the instruction to estimating vitals of the person from the extracted PPG signals comprises instructions to: computing Ratio (R) of time-domain range for NIR PPG signal and Red PPG signal wherein the Ratio (R) can be expressed with the following expression, where amplitude range of Red PPG signal relates to difference between minimum and maximum values of the amplitude of the Red PPG signal in time domain, arithmetic mean of Red PPG signal relates to arithmetic mean of the Red PPG signal in between minimum and maximum values of the amplitude of the Red PPG signal in time domain, amplitude range of NIR PPG signal relates to difference between minimum and maximum values of the amplitude of the NIR PPG signal in time domain and arithmetic mean of NIR PPG signal relates to arithmetic mean of the NIR PPG signal in between minimum and maximum values of the amplitude of the NIR PPG signal in time domain; estimating the Blood Oxygen Saturation with the following expression,
- the step of estimating vitals of the person from the extracted PPG signals comprises instructions to: detecting Systolic Peak (P s ), Diastolic peak (P d ), and global minima (P m ) from the time-domain NIR PPG signal; estimating the Systolic and Diastolic Blood Pressure with the following expressions,
- The step of detecting the Systolic Peak (P_s), the Diastolic Peak (P_d) and the global minimum (P_m) from the time-domain NIR PPG signal comprises: applying a peak detection algorithm on the NIR PPG signal to obtain a second order derivative; and identifying the global maximum of the NIR PPG signal as the Systolic Peak (P_s), the peak in the NIR PPG signal corresponding to a minimum following the Systolic Peak (P_s) in the second order derivative as the Diastolic Peak (P_d), the maximum immediately after the Systolic Peak in the second order derivative as the Dicrotic notch, and the minimum of the entire NIR PPG signal as the global minimum (P_m).
- the step of estimating vitals of the person from the extracted PPG signals comprises instructions to: applying fast fourier transform to the NIR PPG signal to transform the NIR PPG signal from time-domain to frequency domain; computing a plurality of local maxima of a power spectral density (PSD) of the NIR PPG signal in frequency domain; in response to a local maxima between the frequency range of 0.8 and 2 Hz, estimating a Heart Rate (HR) with the following expression,
- Heart Rate (HR) = 60 * f_hr
- where f_hr refers to the local maximum that is within the frequency range of 0.8 to 2 Hz
- Respiration Rate (RR) = 60 * f_rr
- where f_rr refers to the local maximum that is within the frequency range of 0.1 to 0.5 Hz.
- the step of estimating vitals of the person from the extracted PPG signals comprises applying a machine learning model based on a linear regression model.
- Figure 1 illustrating a system for estimating vitals of a person in accordance with an embodiment of this invention
- Figure 2 illustrating a process flow that is executable by a processing unit of the system in Figure 1 in accordance with an embodiment of this invention
- Figure 3 illustrating a sample colour image and NIR image in accordance with an embodiment of this invention
- Figure 4 illustrating a process flow for extracting the raw signal in accordance with an embodiment of this invention
- Figure 5 illustrating a process flow for extracting the PPG signal in accordance with an embodiment of this invention
- Figure 6 illustrating a process flow for estimating the blood oxygen saturation of a person from the extracted PPG signal in accordance with this invention
- Figure 7 illustrating a process flow for estimating the Systolic and Diastolic Blood Pressure of a person from the extracted PPG signal in accordance with this invention
- Figure 8 illustrating a process flow for estimating the Heart Rate and Respiration Rate of a person from the extracted PPG signal in accordance with this invention
- Figure 9 illustrating the NIR PPG signal and the second order differential of the NIR
- Figure 11 illustrating a linear regression curve that can be used for estimation of blood pressure.
- This invention relates to a system and method of estimating body vitals, such as Heart Rate, Respiration Rate, Blood Oxygen Saturation and Blood Pressure, of a person. Specifically, the vitals are estimated from images captured simultaneously by a colour camera and a depth sensor.
- Figure 1 illustrates a system 100 for estimating body vitals such as Heart Rate, Respiration Rate, Blood Oxygen Saturation and Blood Pressure of a person.
- the system 100 comprises a depth sensor 110, an image sensor 120 and a processing unit 130.
- The depth sensor 110 is an active depth sensor that works in the near infrared (NIR) region.
- Absorption of light by Hemoglobin is lowest in the NIR region.
- The skin cells and the Hemoglobin in blood absorb less NIR light (800-1000 nm) than visible light, and hence blood flow changes underneath the skin are sensed with greater contrast than under visible light. Due to the low absorption, the PPG signal has a higher signal-to-noise ratio. Since the active depth sensor works in the NIR region, the active depth sensor 110 can be used to shine NIR light on the skin of a person and measure its absorption to assist in estimating the vitals of the person.
- the image sensor 120 is a typical colour camera for capturing sequence of images. Such image sensor 120 is commonly known in the art and a detailed description of the image sensor 120 would be omitted for brevity.
- the processing unit 130 is a typical computing system that comprises a processor, memory and instructions stored on the memory and executable by the processor.
- the processor may be a processor, microprocessor, microcontroller, application specific integrated circuit, digital signal processor (DSP), programmable logic circuit, or other data processing device that executes instructions to perform the processes in accordance with the present invention.
- the processor has the capability to execute various applications that are stored in the memory.
- the memory may include read-only memory (ROM), random- access memory (RAM), electrically erasable programmable ROM (EEPROM), flash cards, or any storage medium.
- Instructions are computing codes, software applications that are stored on the memory and executable by the processor to perform the processes in accordance with this invention. Such computing system is well known in the art and hence only briefly described herein.
- The instructions can be developed in the C++ language (or any other known programming language) and can be run on a System on Chip (SoC) such as a Raspberry Pi and/or on mobile devices such as cell phones or tablet PCs.
- FIG. 2 illustrates a process flow 200 that is executed by the instructions of the processing unit to estimate vital signs of a person in accordance with this invention.
- Both the active depth sensor 110 and the image sensor 120 would be pointing in the same direction. Images captured by both sensors 110 and 120 would be transmitted to the processing unit 130.
- Process 200 begins with step 205 where the images from the active depth sensor 110 and image sensor 120 are time synchronised. Essentially, there should not be any time shift between the images from both sensors.
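- The application does not state how the synchronisation in step 205 is performed. One plausible approach, sketched below purely as an assumption, is to pair each NIR frame with the RGB frame closest in capture time and to discard pairs whose time shift exceeds a tolerance. The function name, the timestamp lists and the tolerance value are all hypothetical.

```python
def pair_frames(nir_times, rgb_times, tolerance):
    """Pair each NIR timestamp with the closest RGB timestamp.

    Both lists hold capture times in seconds, sorted ascending. Returns a
    list of (nir_index, rgb_index) pairs whose time difference is within
    `tolerance`; NIR frames without a close enough RGB frame are dropped.
    """
    pairs = []
    j = 0
    for i, t in enumerate(nir_times):
        # Advance j while the next RGB frame is at least as close to t.
        while (j + 1 < len(rgb_times)
               and abs(rgb_times[j + 1] - t) <= abs(rgb_times[j] - t)):
            j += 1
        if rgb_times and abs(rgb_times[j] - t) <= tolerance:
            pairs.append((i, j))
    return pairs
```

- For example, `pair_frames([0.000, 0.036, 0.072, 0.108], [0.001, 0.034, 0.070, 0.150], 0.005)` keeps the first three frame pairs and drops the fourth NIR frame, whose nearest RGB frame is 38 ms away.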
- Process 200 then detects the region of interest.
- The region of interest includes the forehead, cheek and any other exposed skin.
- The region of interest would include the forehead and cheek.
- Figure 3 illustrates an example of an image taken from the active depth sensor 110 and image sensor 120. Specifically, the left image 310 is taken from the active depth sensor 110 and the right image 320 is taken from the image sensor 120.
- the process first detects the face 311 and thereafter detects whether the person is wearing a mask. If the person is not wearing a mask, the process would detect the forehead region 312 and the cheek regions 313. If the person is wearing a mask, the process would detect the forehead region 312 only.
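- The branching just described can be written as a small selection function. The function and region names are hypothetical and serve only to illustrate the logic; face and mask detection themselves are outside the scope of this sketch.

```python
def select_regions(face_found, wearing_mask):
    """Return the names of the skin regions to sample for one frame."""
    if not face_found:
        return []            # nothing to sample until a face is detected
    if wearing_mask:
        return ["forehead"]  # the cheeks are covered by the mask
    return ["forehead", "left_cheek", "right_cheek"]
```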
- One method of detecting the RoI is by using facial landmark detection, where 81 prominent feature points on the face are detected.
- The 81 prominent feature points include the eyebrows, eyes, nose, mouth and facial edge.
- From these points, the process is able to determine how the person of interest is facing the camera (i.e. frontal, profile or semi-frontal) and whether the person is wearing a mask (i.e. the mouth, part of the nose and the bottom facial edges would be obscured).
- the region of interest is computed for all the images from the active depth sensor 110 and mapped to the images from image sensor 120 using the calibration.
- The RoI detection is executed on the captured images to form a set of time synchronised images.
- The set of time synchronised images spans no more than 5 seconds.
- both active depth sensor 110 and image sensor 120 are calibrated using standard Tsai camera calibration technique to know the pixel mapping between them so that when we detect the region of interest in one image, it can be mapped to the other image without additional computations.
- In step 215, process 200 extracts raw signals within the detected region(s) of the images from both the active depth sensor 110 and image sensor 120. Further details will be described below with reference to figure 4.
- In step 220, process 200 extracts the PPG signal from the raw signals. Further details will be described below with reference to figure 5.
- In step 225, process 200 estimates the vital signs including Respiration Rate (RR), Heart Rate (HR), Blood Oxygen Saturation and Blood Pressure.
- Process 200 ends after step 225.
- Figure 4 illustrates a process flow 400 that is executed by the instructions of the processing unit to extract raw signals within the detected region(s) of the images from both the active depth sensor and the image sensor, as in step 215 of process 200, in accordance with this invention.
- the raw signals are pixel intensity values of each frequency band in the RGB images and NIR images.
- Process 400 begins with step 405 to compute the average pixel intensity of the detected region(s) for each of the frequency bands, including Red, Green and Blue from the image of the image sensor 120 and NIR from the depth sensor 110.
- Each detected Region of Interest (RoI) is a rectangular matrix of pixel intensity values, identified as 312 and 313 in Figure 3.
- the arithmetic mean of all the pixels (average pixel intensity) within the region of interest 312 and 313 is computed in step 405.
- the average pixel intensity value for each frequency band computed in step 405 is then appended to the time series of the corresponding frequency band in step 410.
- Process 400 appends the average pixel intensity value to form a time series corresponding to the frequency band. Specifically, process 400 appends the average pixel intensity value of each frequency band to the pixel intensity values of the corresponding frequency band in the respective RGB and NIR images to form a time series for each frequency band. The frequency bands include R, G, B and NIR.
- the time series relate to the time synchronised video for estimating the vital signs. Each set of time series is no longer than 5 seconds of video clip.
- the average pixel intensity values are appended in the following manner:
- 1. the average pixel intensity value of the Red frequency band computed in step 405 is appended to the pixel intensity values of the Red frequency band in each image from the video clip to form the red time series;
- 2. the average pixel intensity value of the Green frequency band computed in step 405 is appended to the pixel intensity values of the Green frequency band in each image from the video clip to form the green time series;
- 3. the average pixel intensity value of the Blue frequency band computed in step 405 is appended to the pixel intensity values of the Blue frequency band in each image from the video clip to form the blue time series; and
- 4. the average pixel intensity value of the NIR frequency band computed in step 405 is appended to the pixel intensity values of the NIR frequency band in each image from the video clip to form the NIR time series.
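Steps 405 and 410 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function names, the `(top, bottom, left, right)` RoI layout, the R, G, B channel order and the single-RoI handling are all hypothetical, and the NIR frame is assumed to be already aligned with the RGB frame.

```python
import numpy as np

BANDS = ("red", "green", "blue", "nir")

def roi_band_means(rgb_frame, nir_frame, roi):
    """Step 405 (sketch): average pixel intensity inside one rectangular RoI.

    rgb_frame: H x W x 3 array, channel order assumed to be R, G, B.
    nir_frame: H x W array, assumed aligned with rgb_frame.
    roi: (top, bottom, left, right) pixel bounds (hypothetical layout).
    """
    top, bottom, left, right = roi
    rgb_patch = rgb_frame[top:bottom, left:right, :].astype(float)
    nir_patch = nir_frame[top:bottom, left:right].astype(float)
    r, g, b = rgb_patch.reshape(-1, 3).mean(axis=0)
    return {"red": r, "green": g, "blue": b, "nir": nir_patch.mean()}

def append_sample(series, means):
    """Step 410 (sketch): append the new averages to each band's time series."""
    for band in BANDS:
        series[band].append(float(means[band]))
    return series

# Usage with one synthetic frame pair
series = {band: [] for band in BANDS}
rgb = np.zeros((4, 4, 3))
rgb[..., 0], rgb[..., 1], rgb[..., 2] = 10, 20, 30
nir = np.full((4, 4), 40.0)
append_sample(series, roi_band_means(rgb, nir, (1, 3, 1, 3)))
```

Calling `append_sample` once per synchronised frame pair grows the four time series in lockstep, which is what the table of time stamps described in the text represents.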
- process 400 generates a table as illustrated below.
- Each column of Red, Blue, Green and NIR corresponds to a time series where the average pixel intensity value has been appended to respective frequency band.
- Below table only illustrates four time stamps, at 0 second, 0.036 second, 0.072 second and 0.108 second.
- the time series should contain no more than 5 seconds worth of data.
- the minimum would be 104 frames; and if camera frame capture rate is more than 30 FPS, the minimum would be 128 frames.
- In step 415, process 400 determines if the time series length is equal to a predetermined length, L.
- the predetermined length, L, is no more than 5 seconds and is computed based on the frame rate of the image sensor. If the length of the time series is equal to the predetermined length, L, process 400 proceeds to step 425. If the length of the time series is not equal to the predetermined length, L, process 400 repeats from step 205. Essentially, in order to accurately estimate the vital signs, the system requires a minimum length of data and this length is no more than 5 seconds.
- step 415 may be modified to determine if the time series length is less than or not more than the predetermined length, L.
- process 400 normalises the pixel intensity values in the time series for each frequency band. Specifically, the pixel intensity values from each frequency band in the time series are normalised to restrict the amplitude to be within [0, 1].
- the pixel intensity values in each frequency band in the time series are normalised by subtracting their arithmetic mean and dividing the result by their standard deviation.
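The mean and standard deviation normalisation described above can be sketched as follows. The function name is hypothetical, and the guard for a constant series is an addition made here for robustness, not something the text specifies.

```python
import numpy as np

def normalise_series(series):
    """Subtract the arithmetic mean and divide by the standard deviation."""
    x = np.asarray(series, dtype=float)
    std = x.std()
    if std == 0:
        # Assumption: a constant series carries no pulsatile information,
        # so return zeros rather than dividing by zero.
        return np.zeros_like(x)
    return (x - x.mean()) / std

normalised = normalise_series([100.0, 102.0, 98.0, 100.0])
```

The result has zero mean and unit standard deviation, so the bands become comparable regardless of their absolute brightness.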
- Process 400 ends after step 425.
- Figure 5 illustrates a process flow 500 that is executed by the instructions of the processing unit to extract PPG signal from the raw signals of step 220 in process 200 in accordance with this invention.
- Process 500 begins with step 505 by applying a moving average filter for each frequency band. Specifically, process 505 applies a moving average filter to the pixel intensity value in the time series for each frequency band.
- the moving average filter has a window length of α1 * L, to remove sudden amplitude changes in the raw signal caused by movement of the person across the time series. The choices of α1 and L are shown in the table below.
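A moving average with a window of α1 * L samples might look like the sketch below. The value `alpha1=0.3` in the usage line is a placeholder, not a calibrated value from the text, and the edge padding is an assumption made here so that the output keeps the input length.

```python
import numpy as np

def moving_average(series, alpha1):
    """Smooth a time series with a window of round(alpha1 * L) samples."""
    x = np.asarray(series, dtype=float)
    window = max(1, int(round(alpha1 * len(x))))
    kernel = np.ones(window) / window
    # Assumption: repeat the edge samples so the output has the same
    # length as the input and the ends are not pulled towards zero.
    pad_left = window // 2
    pad_right = window - 1 - pad_left
    padded = np.pad(x, (pad_left, pad_right), mode="edge")
    return np.convolve(padded, kernel, mode="valid")

# Usage: a single-sample spike is spread over the window
smoothed = moving_average([0, 0, 0, 9, 0, 0, 0, 0, 0, 0], alpha1=0.3)
```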
- process 500 applies a detrending algorithm to the up-sampled pixel intensity value in the time series for each frequency band.
- a detrending algorithm based on the smoothness priors formulation is applied to remove noise in the up-sampled raw signal while preserving the frequency bands corresponding to respiration rate and heart rate. Further details of the detrending algorithm can be found in Tarvainen, M.P., Ranta-Aho, P.O. and Karjalainen, P.A., 2002. An advanced detrending method with application to HRV analysis. IEEE Transactions on Biomedical Engineering, 49(2), pp.172-175.
- the expression λ = α3 * L2 automates the tuning of the smoothness factor, λ, and eliminates the need for re-tuning the algorithm when the length of the raw signal is varied.
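The smoothness priors detrending of Tarvainen et al. removes a smooth trend by solving a regularised least-squares problem with a second-order difference matrix. The sketch below uses dense NumPy matrices for brevity; the function names are hypothetical and the values passed in the usage lines are placeholders, not calibrated values.

```python
import numpy as np

def smoothness_factor(alpha3, upsampled_length):
    """lambda = alpha3 * L2, as stated in the text."""
    return alpha3 * upsampled_length

def detrend_smoothness_priors(series, lam):
    """Return z - trend, where trend = (I + lam^2 * D2^T D2)^-1 z."""
    z = np.asarray(series, dtype=float)
    n = len(z)
    # Second-order difference matrix D2 with shape (n - 2, n)
    d2 = np.zeros((n - 2, n))
    rows = np.arange(n - 2)
    d2[rows, rows] = 1.0
    d2[rows, rows + 1] = -2.0
    d2[rows, rows + 2] = 1.0
    trend = np.linalg.solve(np.eye(n) + lam**2 * (d2.T @ d2), z)
    return z - trend

# Usage: a purely linear drift is removed almost entirely
ramp = np.linspace(0.0, 5.0, 50)
residual = detrend_smoothness_priors(ramp, lam=smoothness_factor(1.0, 50))
```

A larger λ makes the estimated trend stiffer, so more of the slow variation stays in the detrended output. A sparse solver would be preferable for long signals.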
- process 500 applies Independent Component Analysis (ICA) to extract PPG signal from the detrended raw signal (i.e. detrended pixel intensity value in the time series for each frequency band).
- the detrended raw signal is grouped into two sets based on the frequency bands to extract Red and NIR PPG signals using ICA.
- the frequency bands of Red, Green and Blue are used to extract Red PPG signal in step 530 and the frequency bands of NIR, Green and Blue are used to extract NIR PPG signal in step 540.
- ICA is a computational method for separating a multivariate signal into additive subcomponents. It is a standard technique well known in the field of signal processing and hence details of ICA are omitted for brevity.
- a band pass filter with cutoff frequency between 0.16 and 2 Hz is applied to the Red PPG signal to remove noise.
- a band pass filter with cutoff frequency between 0.16 and 2 Hz is applied to the NIR PPG signal to remove noise.
- the cutoff frequency is set to 0.16 and 2 Hz which covers the frequency range of average human respiration rate and heart rate.
- the band pass filter is a Finite Impulse Response (FIR) filter which is a standard time domain filtering technique.
- the cutoff frequency can be inverted to obtain the corresponding period in the time domain.
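An FIR band-pass filter with cutoffs at 0.16 Hz and 2 Hz could be sketched as below. The text does not state the filter design, so the windowed-sinc design with a Hamming window, the 301-tap length and the function names are all assumptions made for illustration.

```python
import numpy as np

def fir_bandpass_taps(num_taps, low_hz, high_hz, fs):
    """Windowed-sinc band-pass taps (Hamming window); num_taps should be odd."""
    n = np.arange(num_taps) - (num_taps - 1) / 2
    f_lo, f_hi = low_hz / fs, high_hz / fs  # cutoffs in cycles per sample
    # Ideal band-pass = ideal low-pass(f_hi) minus ideal low-pass(f_lo)
    ideal = 2 * f_hi * np.sinc(2 * f_hi * n) - 2 * f_lo * np.sinc(2 * f_lo * n)
    return ideal * np.hamming(num_taps)

def bandpass(signal, fs, low_hz=0.16, high_hz=2.0, num_taps=301):
    """Filter with a symmetric FIR kernel; the signal must be longer than num_taps."""
    taps = fir_bandpass_taps(num_taps, low_hz, high_hz, fs)
    # mode="same" keeps the input length and centres the symmetric kernel,
    # so the output is not delayed relative to the input.
    return np.convolve(np.asarray(signal, dtype=float), taps, mode="same")

# Usage: a 1 Hz component is kept, a 5 Hz disturbance is suppressed
fs = 30.0
t = np.arange(1800) / fs
clean = np.sin(2 * np.pi * 1.0 * t)
filtered = bandpass(clean + 0.5 * np.sin(2 * np.pi * 5.0 * t), fs)
```

The first and last `num_taps // 2` output samples are affected by the signal edges and are best discarded.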
- the values of α1, α2, α3 & L are calibrated based on the imaging devices used to capture video.
- the imaging device's video capture frame rate plays a key role in the calibration process, and this is dependent on the device and the processing unit 130.
- These calibrated values are used to automate the up-sampling and detrending algorithms. Below are two examples of known devices with the calibrated values.
- Process 500 ends after steps 535 and 545.
- FIG. 6 illustrates a process flow 600 that is executed by the instructions of the processing unit to estimate Blood Oxygen Saturation (SPO2) in step 225 in process 200 in accordance with this invention.
- Process 600 begins with step 605 by computing Ratio (R) of time-domain range for NIR PPG signal and Red PPG signal.
- Ratio (R) can be computed with the following expression,
- amplitude range of Red PPG signal refers to difference between minimum and maximum values of the amplitude of the Red PPG signal in time domain
- arithmetic mean of Red PPG signal refers to arithmetic mean of the Red PPG signal in between minimum and maximum values of the amplitude of the Red PPG signal in time domain
- amplitude range of NIR PPG signal refers to difference between minimum and maximum values of the amplitude of the NIR PPG signal in time domain
- arithmetic mean of NIR PPG signal refers to arithmetic mean of the NIR PPG signal in between minimum and maximum values of the amplitude of the NIR PPG signal in time domain.
- In step 610, process 600 estimates the Blood Oxygen Saturation with the following expression.
- Blood Oxygen Saturation (SPO2) = β1 + β2 * R
- β1 refers to off-set parameter and β2 refers to slope parameter of the linear regression curve which maps the Ratio (R) to the blood oxygen saturation (SPO2).
- β1 and β2 are variables that can be derived from the linear regression curve.
- Figure 10 illustrates a linear regression curve that can be used for estimation of SPO2 from Ratio (R).
- Process 600 ends after step 610.
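Process 600 can be sketched as follows. The expression for the Ratio (R) is not reproduced in the text above, so the form used here, the mean-normalised amplitude range of the Red PPG signal divided by that of the NIR PPG signal, is an assumption inferred from the four quantities the text defines. The function names and the β values in the usage lines are placeholders, not fitted parameters.

```python
import numpy as np

def normalised_range(ppg):
    """(max - min) / arithmetic mean; assumes the mean is non-zero."""
    x = np.asarray(ppg, dtype=float)
    return (x.max() - x.min()) / x.mean()

def ratio_r(red_ppg, nir_ppg):
    """Assumed form of R: normalised Red range over normalised NIR range."""
    return normalised_range(red_ppg) / normalised_range(nir_ppg)

def estimate_spo2(r, beta1, beta2):
    """SPO2 = beta1 + beta2 * R (linear regression mapping from the text)."""
    return beta1 + beta2 * r

# Usage with synthetic signals and placeholder regression parameters
t = np.linspace(0.0, 1.0, 5)
red = 10.0 + np.sin(2 * np.pi * t)
nir = 20.0 + 2.0 * np.sin(2 * np.pi * t)
r = ratio_r(red, nir)
spo2 = estimate_spo2(r, beta1=110.0, beta2=-25.0)
```

In practice β1 and β2 would come from fitting the regression curve to reference oximeter readings for the specific imaging device.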
- FIG. 7 illustrates a process flow 700 that is executed by the instructions of the processing unit to estimate Systolic and Diastolic Blood Pressure in step 225 in process 200 in accordance with this invention.
- Process 700 begins with step 705 by detecting the Systolic Peak (Ps), Diastolic Peak (Pd), and global minima (Pm) from the time-domain NIR PPG signal.
- Ps: Systolic Peak
- Pd: Diastolic Peak
- Pm: global minima
- the Systolic peak times can be detected accurately from the blood volume pulse (BVP) waveform as they are maxima within the signal.
- BVP waveform refers to the NIR PPG signal.
- Maxima refers to global maxima which is the maxima of the entire NIR PPG signal.
- the amplitude of the global maxima is identified as the Systolic Peak (Ps) 910. From Figure 9, the largest minima within the second order derivative correspond to the Systolic Peaks (Ps) 910 and the minima following these typically correspond to the diastolic peaks/inflections. Based on the location of the diastolic peak in the second order derivative, the amplitude of the corresponding point on the NIR PPG signal is identified as the Diastolic Peak (Pd) 930. The location corresponding to the Dicrotic notch 920 is identified as the maxima immediately after the location identified as the systolic peak in the second order derivative. The minima of the entire NIR PPG signal would be the global minima (Pm) 940.
- In step 710, process 700 estimates the Systolic and Diastolic Blood Pressure with the following expressions,
- Y1, Y2, Y3, Y4, Y5, Y6 are parameters of the regression curve which maps the Heart Rate and NIR PPG signal amplitude features to the systolic and diastolic blood pressure
- Y1 and Y4 are frequency-slope parameters for systolic and diastolic blood pressures respectively
- Y2 and Y5 are amplitude-slope parameters for systolic and diastolic blood pressures respectively
- Y3 and Y6 are off-set parameters for systolic and diastolic blood pressures respectively.
- Figure 11 illustrates a linear regression curve that can be used for estimation of blood pressure.
- the log in Figure 11 refers to logarithm base-10 and ln refers to logarithm base-e (natural log).
- Process 700 ends after step 710.
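The landmark detection of step 705 can be sketched on a single pulse as below. The function name is hypothetical, and restricting the search to negative local minima of the second derivative is a simplification introduced here to skip shallow minima between peaks. The blood pressure expressions themselves are not reproduced in the text, so they are not sketched.

```python
import numpy as np

def ppg_landmarks(nir_ppg):
    """Locate Ps, Pd and Pm on one pulse of the NIR PPG signal."""
    x = np.asarray(nir_ppg, dtype=float)
    d2 = np.gradient(np.gradient(x))  # second order derivative
    inner = d2[1:-1]
    # Local minima of d2 that are also negative (assumption, see lead-in)
    is_min = (inner < d2[:-2]) & (inner < d2[2:]) & (inner < 0)
    minima = np.flatnonzero(is_min) + 1
    if minima.size == 0:
        raise ValueError("no negative local minima in the second derivative")
    # The largest (deepest) minimum marks the systolic peak location
    systolic_idx = int(minima[np.argmin(d2[minima])])
    # The next minimum after it marks the diastolic peak or inflection
    later = minima[minima > systolic_idx]
    diastolic_idx = int(later[0]) if later.size else None
    return {
        "Ps": x.max(),  # amplitude of the global maximum
        "Pd": x[diastolic_idx] if diastolic_idx is not None else None,
        "Pm": x.min(),  # global minimum
        "systolic_idx": systolic_idx,
        "diastolic_idx": diastolic_idx,
    }

# Usage: synthetic pulse with a main peak at 0.3 s and a smaller one at 0.6 s
t = np.arange(1001) / 1000.0
pulse = np.exp(-0.5 * ((t - 0.3) / 0.05) ** 2) + 0.4 * np.exp(-0.5 * ((t - 0.6) / 0.05) ** 2)
landmarks = ppg_landmarks(pulse)
```

Real PPG pulses are noisier than this synthetic example, so the signal would normally be smoothed before differentiating.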
- FIG. 8 illustrates a process flow 800 that is executed by the instructions of the processing unit to estimate Heart Rate (HR) and Respiration Rate (RR) in step 225 in process 200 in accordance with this invention.
- Process 800 begins with step 805 by applying a Fast Fourier Transform (FFT) to the NIR PPG signal to transform the NIR PPG signal from the time domain to the frequency domain.
- process 800 computes the local maxima of the power spectral density (PSD). Specifically, the PSD is computed by dividing the NIR PPG signal in the frequency domain by the length of the NIR PPG signal (L). The local maxima of the PSD are then found using any generic peak-finding algorithm.
- PSD: power spectral density
- process 800 determines the frequency range of the local maxima to use for computing the heart rate and respiration rate.
- for the heart rate, the local maxima within the frequency range of 0.8 to 2 Hz are used.
- for the respiration rate, the local maxima within the frequency range of 0.1 to 0.5 Hz are used.
- process 800 proceeds to step 830 to estimate the Heart Rate (HR) using the local maxima that is within the frequency range of 0.8 to 2 Hz and to step 840 to estimate the Respiration Rate (RR) using the local maxima that is within the frequency range of 0.1 to 0.5 Hz.
- HR: Heart Rate
- RR: Respiration Rate
- In step 830, process 800 estimates the Heart Rate (HR) with the following expression,
- Heart Rate (HR) = 60 * fhr
- fhr refers to the local maxima that is within the frequency range of 0.8 to 2 Hz.
- In step 840, process 800 estimates the Respiration Rate (RR) with the following expression,
- Respiration Rate (RR) = 60 * frr
- frr refers to the local maxima that is within the frequency range of 0.1 to 0.5 Hz.
- Process 800 ends after steps 830 and 840.
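Process 800 can be sketched as follows. Two simplifications are assumptions made here: the PSD is taken as the squared FFT magnitude divided by the signal length, and the strongest bin inside each band stands in for a generic peak finder. The function names are hypothetical.

```python
import numpy as np

def dominant_frequency(signal, fs, f_lo, f_hi):
    """Frequency (Hz) of the strongest PSD bin between f_lo and f_hi."""
    x = np.asarray(signal, dtype=float)
    n = len(x)
    spectrum = np.fft.rfft(x - x.mean())
    psd = np.abs(spectrum) ** 2 / n  # assumed PSD estimate
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    band = np.flatnonzero((freqs >= f_lo) & (freqs <= f_hi))
    if band.size == 0:
        raise ValueError("no frequency bins fall inside the requested band")
    return freqs[band[np.argmax(psd[band])]]

def heart_and_respiration_rate(nir_ppg, fs):
    """HR = 60 * fhr and RR = 60 * frr, both per minute."""
    f_hr = dominant_frequency(nir_ppg, fs, 0.8, 2.0)
    f_rr = dominant_frequency(nir_ppg, fs, 0.1, 0.5)
    return 60.0 * f_hr, 60.0 * f_rr

# Usage: 1.2 Hz cardiac component plus 0.25 Hz respiratory component
fs = 30.0
t = np.arange(1200) / fs
ppg = np.sin(2 * np.pi * 1.2 * t) + 0.5 * np.sin(2 * np.pi * 0.25 * t)
hr, rr = heart_and_respiration_rate(ppg, fs)
```

The frequency resolution is fs / n, so the length of the analysed signal bounds how finely the two rates can be resolved.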
- the Heart rate, Respiration rate and Blood Pressure are estimated using NIR PPG signal derived from data captured by active depth camera. These estimates do not vary with the ambient lighting conditions.
- the blood pressure model works on the premise that the identified systolic and diastolic peaks correspond to the respective heart activities. This method gives greater accuracy compared to just using the global maxima in the model.
- the estimation of Blood Oxygen Saturation is based on the ratio of range of Red PPG signal to range of NIR PPG signal.
- the method according to this invention involves using an active depth camera to capture simultaneous, time synchronized video frames of both Color (RGB) and NIR wavelengths. From the video frames, a person's face is detected, the region corresponding to their forehead (the region of interest) is extracted, and the pixel intensity values are averaged to form a time domain raw signal across multiple video frames. The time domain raw signal is detrended to remove higher order harmonics, resampled, and filtered using Independent Component Analysis (ICA) and an FIR bandpass filter to obtain the PPG signal. Vitals such as Pulse rate, Respiration Rate, Blood Oxygen saturation and Blood Pressure are estimated from the PPG signal, using frequency analysis with FFT for the former two vitals and mathematical models for the latter two vitals. Processes 600-800 illustrate one embodiment of estimating Pulse rate, Respiration Rate, Blood Oxygen saturation and Blood Pressure based on the Red PPG signal and NIR PPG signal.
- Another embodiment of estimating Pulse rate, Respiration Rate, Blood Oxygen saturation and Blood Pressure based on the Red PPG signal and NIR PPG signal is the use of a machine learning model to estimate vitals of the person from the Red PPG signal and NIR PPG signal.
- the machine learning model is a linear regression model.
- processes 600-800 would be used for estimating Pulse rate, Respiration Rate, Blood Oxygen saturation and Blood Pressure based on the Red PPG signal and NIR PPG signal until at least a predetermined number of datasets is obtained to train the machine learning model, after which the machine learning model takes over from processes 600-800.
Landscapes
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Animal Behavior & Ethology (AREA)
- Cardiology (AREA)
- Physiology (AREA)
- Biophysics (AREA)
- Pathology (AREA)
- Biomedical Technology (AREA)
- Heart & Thoracic Surgery (AREA)
- Medical Informatics (AREA)
- Veterinary Medicine (AREA)
- Surgery (AREA)
- Public Health (AREA)
- Artificial Intelligence (AREA)
- Signal Processing (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Psychiatry (AREA)
- Pulmonology (AREA)
- Oral & Maxillofacial Surgery (AREA)
- Human Computer Interaction (AREA)
- General Physics & Mathematics (AREA)
- Multimedia (AREA)
- Theoretical Computer Science (AREA)
- Measuring Pulse, Heart Rate, Blood Pressure Or Blood Flow (AREA)
Abstract
This invention relates to a system and method for measuring vital body signs. The method comprises time synchronising a sequence of RGB and NIR images, detecting region of interest within the sequence of RGB and NIR images, extracting raw signals from the detected region of interest, extracting photoplethysmography (PPG) signals from the raw signals and estimating vitals of the person from the extracted PPG signals.
Description
A System And Method For Measuring Vital Body Signs
Field of the Invention
This invention relates to a system and method of estimating body vitals like Heart Rate, Respiration Rate, Blood Oxygen Saturation and Blood Pressure of a person. Specifically, this invention relates to a system and method of estimating body vitals like Heart Rate, Respiration Rate, Blood Oxygen Saturation and Blood Pressure of a person based on images from a colour camera and an active depth sensor simultaneously.
Prior Art
Photoplethysmography (PPG) is an optically obtained plethysmogram used to detect blood volume changes in the microvascular bed of tissue. Remote Photoplethysmography allows estimation of blood volume changes without skin contact by using ambient light sources and video cameras. Under proper lighting conditions, minute variations in skin colour and temperature due to blood volume changes can be observed. This allows estimating blood volume changes without skin contact. However, due to the use of ambient light sources, slight changes in the surroundings would inevitably lead to inaccurate results from the plethysmogram.
Therefore, those skilled in the art are striving to provide an improved method and system of using PPG to remotely obtain vital signs of patients accurately.
Summary of the Invention
The above and other problems are solved and an advance in the art is made by a system and method in accordance with this invention. A first advantage of the system and method in accordance with this invention is that the system and method allows non-contact measurement of vital signs. A second advantage of the system and method in accordance with this invention is that the system and method is able to estimate vital signs within a short period of time, at less than 5 seconds, with an accuracy of more than 90%. A third advantage of the system and method in accordance with this invention is that the system and method is independent of the ambient light. A fourth advantage of the system and method in accordance with this invention is that the system and method utilises devices that are readily available. This allows remote monitoring of patient's vital signs with ease.
A first aspect of the disclosure describes a system for estimating vitals of a person. The system comprises an image sensor adapted to capture a sequence of color (RGB) and near infrared (NIR) images, a processing unit comprising a processor, memory and instructions stored on the memory and executable by the processor to: estimating vitals of the person from the sequence of RGB and NIR images.
In an embodiment of the first aspect of the disclosure, the instruction to estimating vitals of the person from the sequence of RGB and NIR images comprises instructions to: time synchronising the sequence of RGB and NIR images; detecting region of interest within the sequence of RGB and NIR images; extracting raw signals from the detected region of interest; extracting photoplethysmography (PPG) signals from the raw signals; and estimating vitals of the person from the extracted PPG signals.
In an embodiment of the first aspect of the disclosure, the instruction to detecting region of interest within the sequence of RGB and NIR images comprises instructions to: detecting a face of the person within each of the sequence of NIR images; in response to detecting the face of the person, detecting at least one region of interest within the detected face of the person; and mapping the detected region of interest in each of the sequence of NIR images to the corresponding sequence of RGB images.
In an embodiment of the first aspect of the disclosure, the instruction to detecting at least one region of interest within the detected face of the person comprises instructions to: applying facial landmark detection where 81 prominent feature points on the face of the person are detected; and overlaying a box at a top of the detected 81 prominent feature points between facial edge and eyebrows, wherein the box is one region of interest.
In an embodiment of the first aspect of the disclosure, the instruction to detecting at least one region of interest within the detected face of the person further comprises instructions to: overlaying two boxes between facial edge and nose, wherein the two boxes are two other regions of interest.
In an embodiment of the first aspect of the disclosure, the instruction to detecting region of interest within the sequence of RGB and NIR images comprises instructions to: computing an average pixel intensity of the detected region of interest for each of a plurality of frequency bands including Red, Green and Blue from the sequence of RGB images and NIR frequency band from the sequence of NIR images; and appending the average pixel intensity value of each frequency band to the pixel intensity value of corresponding frequency band in respective RGB and NIR images to form a time series for each frequency band.
In an embodiment of the first aspect of the disclosure, the instruction to detecting region of interest within the sequence of RGB and NIR images further comprises instructions to: determining if a length of the time series is equal to a predetermined length, L; in response to the length of the time series being equal to the predetermined length, L, normalising the pixel intensity value in the time series for each frequency band.
In an embodiment of the first aspect of the disclosure, the instruction to extracting PPG signals from the raw signals comprises instructions to: applying a moving average filter to the pixel intensity value in the time series for each frequency band, the moving average filter has a window length of α1 * L; up-sampling the pixel intensity value in the time series for each frequency band using cubic spline interpolation method to form a new length, L2, where L2 = α2 * L; applying a detrending algorithm based on smoothness priors formulation for the up-sampled pixel intensity value in the time series for each frequency band wherein a smoothness factor, λ, of the detrending algorithm is set based on the following expression, λ = α3 * L2; and applying Independent Component Analysis (ICA) to extract PPG signal from the detrended pixel intensity value in the time series for each frequency band, wherein values of α1, α2, α3 & L are calibrated based on the imaging sensor.
In an embodiment of the first aspect of the disclosure, the instruction to applying Independent Component Analysis (ICA) to extract PPG signal from the detrended pixel intensity value in the time series for each frequency band comprises instructions to: grouping the detrended pixel intensity value in the time series for each frequency band into a set of Red PPG signal and a set of NIR PPG signals wherein the Red PPG signal is derived from the frequency bands of Red, Green and Blue and the NIR PPG signal is derived from the frequency bands of NIR, Green and Blue; applying a band pass filter with cutoff frequency of between 0.16 and 2 Hz to the Red PPG signal; applying a band pass filter with cutoff frequency of between 0.16 and 2 Hz to the NIR PPG signal.
In an embodiment of the first aspect of the disclosure, the instruction to estimating vitals of the person from the extracted PPG signals comprises instructions to: computing Ratio (R) of time-domain range for NIR PPG signal and Red PPG signal wherein the Ratio (R) can be expressed with the following expression,
where amplitude range of Red PPG signal relates to difference between minimum and maximum values of the amplitude of the Red PPG signal in time domain, arithmetic mean of Red PPG signal relates to arithmetic mean of the Red PPG signal in between minimum and maximum values of the amplitude of the Red PPG signal in time domain, amplitude range of NIR PPG signal relates to difference between minimum and maximum values of the amplitude of the NIR PPG signal in time domain and arithmetic mean of NIR PPG signal relates to arithmetic mean of the NIR PPG signal in between minimum and maximum values of the amplitude of the NIR PPG signal in time domain; estimating the Blood Oxygen Saturation with the following expression,
Blood Oxygen Saturation (SPO2) = β1 + β2 * R
where β1 refers to off-set parameter and β2 refers to slope parameter of the linear regression curve which maps the Ratio (R) to the blood oxygen saturation (SPO2).
In an embodiment of the first aspect of the disclosure, the instruction to estimating vitals of the person from the extracted PPG signals comprises instructions to: detecting Systolic Peak (Ps), Diastolic peak (Pd), and global minima (Pm) from the time-domain NIR PPG signal; estimating the Systolic and Diastolic Blood Pressure with the following expressions,
Diastolic Blood Pressure
where Y1, Y2, Y3, Y4, Y5, Y6 are parameters of the regression curve which maps the Heart Rate and NIR PPG signal amplitude features to the systolic and diastolic blood pressure, Y1 and Y4 are frequency-slope parameters for systolic and diastolic blood pressures respectively, Y2 and Y5 are amplitude-slope parameters for systolic and diastolic blood pressures respectively, Y3 and Y6 are off-set parameters for systolic and diastolic blood pressures respectively.
In an embodiment of the first aspect of the disclosure, the instruction to detecting Systolic Peak (Ps), Diastolic peak (Pd), and global minima (Pm) from the time-domain NIR PPG signal comprises instructions to: applying a peak detection algorithm on the NIR PPG signal to obtain a second order derivative; identifying a global maxima in the NIR PPG signal as the Systolic Peak (Ps), a peak in the NIR PPG signal corresponding to a minima following the Systolic Peak (Ps) in the second order derivative as Diastolic peak (Pd), a maxima immediately after the systolic peak in the second order derivative as the Dicrotic notch, a minima of the entire NIR PPG signal as the global minima (Pm).
In an embodiment of the first aspect of the disclosure, the instruction to estimating vitals of the person from the extracted PPG signals comprises instructions to: applying fast fourier transform to the NIR PPG signal to transform the NIR PPG signal from time-domain to frequency domain; computing a plurality of local maxima of a power spectral density (PSD) of the NIR PPG signal in frequency domain; in response to a local maxima between the frequency range of 0.8 and 2 Hz, estimating a Heart Rate (HR) with the following expression,
Heart Rate (HR) = 60 * fhr
where fhr refers to the local maxima that is between the frequency range of 0.8 and 2 Hz; and in response to a local maxima between the frequency range of 0.1 and 0.5 Hz, estimating a Respiration Rate (RR) with the following expression,
Respiration Rate (RR) = 60 * frr
where frr refers to the local maxima that is between the frequency range of 0.1 and 0.5 Hz.
In an embodiment of the first aspect of the disclosure, the instruction to estimating vitals of the person from the extracted PPG signals comprises instruction to applying a machine learning model based on a linear regression model.
A second aspect of the disclosure describes a method for estimating vitals of a person. The method comprises estimating vitals of the person from a sequence of RGB and NIR images.
In an embodiment of the second aspect of the disclosure, the step of estimating vitals of the person from the sequence of RGB and NIR images comprises: time synchronising the sequence of RGB and NIR images; detecting region of interest within the sequence of RGB and NIR images; extracting raw signals from the detected region of interest; extracting photoplethysmography (PPG) signals from the raw signals; and estimating vitals of the person from the extracted PPG signals.
In an embodiment of the second aspect of the disclosure, the step of detecting region of interest within the sequence of RGB and NIR images comprises: detecting a face of the person within each of the sequence of NIR images; in response to detecting the face of the person, detecting at least one region of interest within the detected face of the person; and mapping the detected region of interest in each of the sequence of NIR images to the corresponding sequence of RGB images.
In an embodiment of the second aspect of the disclosure, the step of detecting at least one region of interest within the detected face of the person comprises instructions to: applying facial landmark detection where 81 prominent feature points on the face of the person are detected; and overlaying a box at a top of the detected 81 prominent feature points between facial edge and eyebrows, wherein the box is one region of interest.
In an embodiment of the second aspect of the disclosure, the step of detecting at least one region of interest within the detected face of the person further comprises instructions to: overlaying two boxes between facial edge and nose, wherein the two boxes are two other regions of interest.
In an embodiment of the second aspect of the disclosure, the step of detecting region of interest within the sequence of RGB and NIR images comprises instructions to: computing an average pixel intensity of the detected region of interest for each of a plurality of frequency bands including Red, Green and Blue from the sequence of RGB images and NIR frequency band from the sequence of NIR images; and appending the average pixel intensity value of each frequency band to the pixel intensity value of corresponding frequency band in respective RGB and NIR images to form a time series for each frequency band.
In an embodiment of the second aspect of the disclosure, the step of detecting region of interest within the sequence of RGB and NIR images further comprises instructions to: determining if a length of the time series is equal to a predetermined length, L; in response to the length of the time series being equal to the predetermined length, L, normalising the pixel intensity value in the time series for each frequency band.
In an embodiment of the second aspect of the disclosure, the step of extracting PPG signals from the raw signals comprises instructions to: applying a moving average filter to the pixel intensity value in the time series for each frequency band, the moving average filter has a window length of α1 * L; up-sampling the pixel intensity value in the time series for each frequency band using cubic spline interpolation method to form a new length, L2, where L2 = α2 * L; applying a detrending algorithm based on smoothness priors formulation for the up-sampled pixel intensity value in the time series for each frequency band wherein a smoothness factor, λ, of the detrending algorithm is set based on the following expression, λ = α3 * L2; and applying Independent Component Analysis (ICA) to extract PPG signal from the detrended pixel intensity value in the time series for each frequency band, wherein values of α1, α2, α3 & L are calibrated based on the imaging sensor.
In an embodiment of the second aspect of the disclosure, the step of applying Independent Component Analysis (ICA) to extract PPG signal from the detrended pixel intensity value in the time series for each frequency band comprises instructions to: grouping the detrended pixel intensity value in the time series for each frequency band into a set of Red PPG signal and a set of NIR PPG signals wherein the Red PPG signal is derived from the frequency bands of Red, Green and Blue and the NIR PPG signal is derived from the frequency bands of NIR, Green and Blue; applying a band pass filter with cutoff frequency of between 0.16 and 2 Hz to the Red PPG signal; applying a band pass filter with cutoff frequency of between 0.16 and 2 Hz to the NIR PPG signal.
In an embodiment of the second aspect of the disclosure, the step of estimating vitals of the person from the extracted PPG signals comprises instructions to: computing Ratio (R) of time-domain range for NIR PPG signal and Red PPG signal wherein the Ratio (R) can be expressed with the following expression,
where amplitude range of Red PPG signal relates to difference between minimum and maximum values of the amplitude of the Red PPG signal in time domain, arithmetic mean of Red PPG signal relates to arithmetic mean of the Red PPG signal in between minimum and maximum values of the amplitude of the Red PPG signal in time domain, amplitude range of NIR PPG signal relates to difference between minimum and maximum values of the amplitude of the NIR PPG signal in time domain and arithmetic mean of NIR PPG signal relates to arithmetic mean of the NIR PPG signal in between minimum and maximum values of the amplitude of the NIR PPG signal in time domain; estimating the Blood Oxygen Saturation with the following expression,
Blood Oxygen Saturation (SPO2) = β1 + β2 * R
where β1 refers to off-set parameter and β2 refers to slope parameter of the linear regression curve which maps the Ratio (R) to the blood oxygen saturation (SPO2).
In an embodiment of the second aspect of the disclosure, the step of estimating vitals of the person from the extracted PPG signals comprises instructions to: detecting Systolic Peak (Ps), Diastolic peak (Pd), and global minima (Pm) from the time-domain NIR PPG signal; estimating the Systolic and Diastolic Blood Pressure with the following expressions,
Systolic Blood Pressure (SBP) =
Diastolic Blood Pressure (DBP) =
where Y1, Y2, Y3, Y4, Y5, Y6 are parameters of the regression curve which maps the Heart Rate and NIR PPG signal amplitude features to the systolic and diastolic blood pressure, Y1 and Y4 are frequency-slope parameters for systolic and diastolic blood pressures respectively, Y2 and Y5 are amplitude-slope parameters for systolic and diastolic blood pressures respectively, Y3 and Y6 are off-set parameters for systolic and diastolic blood pressures respectively.
In an embodiment of the second aspect of the disclosure, the step of detecting Systolic Peak (Ps), Diastolic peak (Pd), and global minima (Pm) from the time-domain NIR PPG signal comprises instructions to: applying a peak detection algorithm on the NIR PPG signal to obtain a second order differential; identifying a global maxima in the NIR PPG signal as the Systolic Peak (Ps), a peak in the NIR PPG signal corresponding to a minima following the Systolic Peak (Ps) in the second order derivative as Diastolic peak (Pd), a maxima immediately after the systolic peak in the second order derivative as the Dicrotic notch, a minima of the entire NIR PPG signal as the global minima (Pm).
In an embodiment of the second aspect of the disclosure, the step of estimating vitals of the person from the extracted PPG signals comprises instructions to: applying Fast Fourier Transform to the NIR PPG signal to transform the NIR PPG signal from time-domain to frequency domain; computing a plurality of local maxima of a power spectral density (PSD) of the NIR PPG signal in frequency domain; in response to a local maxima between the frequency range of 0.8 and 2 Hz, estimating a Heart Rate (HR) with the following expression,
Heart Rate (HR) = 60 * fhr
where fhr refers to the local maxima that is between the frequency range of 0.8 and 2 Hz; and in response to a local maxima between the frequency range of 0.1 and 0.5 Hz, estimating a Respiration Rate (RR) with the following expression,
Respiration Rate (RR) = 60 * frr
where frr refers to the local maxima that is between the frequency range of 0.1 and 0.5 Hz.
In an embodiment of the second aspect of the disclosure, the step of estimating vitals of the person from the extracted PPG signals comprises applying a machine learning model based on a linear regression model.
Brief Description of the Drawings
The above and other features and advantages in accordance with this invention are described in the following detailed description and are shown in the following drawings:
Figure 1 illustrating a system for estimating vitals of a person in accordance with an embodiment of this invention;
Figure 2 illustrating a process flow that is executable by a processing unit of the system in figure 1 in accordance with an embodiment of this invention;
Figure 3 illustrating a sample image of colour image and NIR image in accordance with an embodiment of this invention;
Figure 4 illustrating a process flow for extracting raw signal in accordance with an embodiment of this invention;
Figure 5 illustrating a process flow for extracting PPG signal in accordance with an embodiment of this invention;
Figure 6 illustrating a process flow for estimating blood oxygen saturation of a person from the extracted PPG signal in accordance with this invention;
Figure 7 illustrating a process flow for estimating Systolic and Diastolic Blood Pressure of a person from the extracted PPG signal in accordance with this invention;
Figure 8 illustrating a process flow for estimating Heart Rate and Respiration Rate of a person from the extracted PPG signal in accordance with this invention;
Figure 9 illustrating the NIR PPG signal and the second order differential of the NIR PPG signal in accordance with this invention;
Figure 10 illustrating a linear regression curve that can be used for estimation of SPO2 from Ratio (R); and
Figure 11 illustrating a linear regression curve that can be used for estimation of blood pressure.
Detailed Description
This invention relates to a system and method of estimating body vitals like Heart Rate, Respiration Rate, Blood Oxygen Saturation and Blood Pressure of a person. Specifically, this invention relates to a system and method of estimating body vitals like Heart Rate, Respiration Rate, Blood Oxygen Saturation and Blood Pressure of a person based on images from a Colour camera and depth sensor simultaneously.
Figure 1 illustrates a system 100 for estimating body vitals such as Heart Rate, Respiration Rate, Blood Oxygen Saturation and Blood Pressure of a person. The system 100 comprises a depth sensor 110, an image sensor 120 and a processing unit 130.
The depth sensor 110 is an active depth sensor that works in the near infrared (NIR) region. By using the active depth sensor 110, we can eliminate error due to variations in ambient light. It is also observed that absorption of light by Hemoglobin is lowest in the NIR region. Specifically, the skin cells and the Hemoglobin in blood absorb less NIR light (800-1000 nm) than visible light, and hence the blood flow changes underneath the skin are sensed with greater contrast than with visible light. Due to this low absorption, the PPG signal will have a higher signal to noise ratio. Since the active depth sensor works in the NIR region, we can use the active depth sensor 110 to shine NIR light on the skin of a person and measure its absorption to assist in estimating the vitals of the person.
The image sensor 120 is a typical colour camera for capturing sequence of images. Such image sensor 120 is commonly known in the art and a detailed description of the image sensor 120 would be omitted for brevity.
The processing unit 130 is a typical computing system that comprises a processor, memory and instructions stored on the memory and executable by the processor. The processor may be a processor, microprocessor, microcontroller, application specific integrated circuit, digital signal processor (DSP), programmable logic circuit, or other data processing device that executes instructions to perform the processes in accordance with the present invention. The processor has the capability to execute various applications that are stored in the memory. The memory may include read-only memory (ROM), random-access memory (RAM), electrically erasable programmable ROM (EEPROM), flash cards, or any storage medium. Instructions are computing codes or software applications that are stored on the memory and executable by the processor to perform the processes in accordance with this invention. Such a computing system is well known in the art and hence only briefly described herein. The instructions can be developed in C++ language (or any other known programming language) and can be run on a System on Chip (SoC) like Raspberry Pi and/or mobile devices like cell phones or tablet PCs.
Figure 2 illustrates a process flow 200 that is executed by the instructions of the processing unit to estimate vital signs of a person in accordance with this invention. To use the system 100, both the active depth sensor 110 and the image sensor 120 would be pointing in the same direction. Images captured by both sensors 110 and 120 would be transmitted to the processing unit 130.
Process 200 begins with step 205 where the images from the active depth sensor 110 and image sensor 120 are time synchronised. Essentially, there should not be any time shift between the images from both sensors.
In step 210, process 200 detects the region of interest. The region of interest (RoI) includes the forehead, cheek and any other exposed skin. In one embodiment, when the system is arranged in a manner to capture the face of the person, the region of interest would include the forehead and cheek. Figure 3 illustrates an example of an image taken from the active depth sensor 110 and image sensor 120. Specifically, the left image 310 is taken from the active depth sensor 110 and the right image 320 is taken from the image sensor 120. In step 210, the process first detects the face 311 and thereafter detects whether the person is wearing a mask. If the person is not wearing a mask, the process would detect the forehead region 312 and the cheek regions 313. If the person is wearing a mask, the process would detect the forehead region 312 only. One method of detecting the RoI is by using facial landmark detection where 81 prominent feature points on the face are detected. The 81 prominent feature points include eyebrow, eye, nose, mouth and facial edge. Based on the 81 prominent detected points, the process would be able to determine how the person of interest is facing the camera (i.e. frontal, profile, or semi-frontal) and whether the person is wearing a mask (i.e. mouth and part of nose and bottom facial edges would be obscured). Using the 81 prominent feature points detected, we can simply overlay a box on the top of the detected 81 prominent feature points between the facial edge and eyebrows. If the person is not wearing a mask, another two boxes would be overlaid at the cheeks which are between the facial edge and nose.
Based on the detected region, a similar region would be marked on the image from the image sensor. The region of interest is computed for all the images from the active depth sensor 110 and mapped to the images from image sensor 120 using the calibration. The RoI detection is executed on the captured images to form a set of time synchronised images. The set of time synchronised images has a length of no more than 5 seconds. In one embodiment, both active depth sensor 110 and image sensor 120 are calibrated using the standard Tsai camera calibration technique to know the pixel mapping between them so that when we detect the region of interest in one image, it can be mapped to the other image without additional computations. For example, if we detect forehead and cheek regions in the image from the active depth camera, we can map the regions corresponding to the forehead and cheek regions in the image from the image sensor by using the calibration.
In step 215, process 200 extracts raw signals within the detected region(s) of the images from both the active depth sensor 110 and image sensor 120. Further details will be described below with reference to figure 4.
In step 220, process 200 extracts PPG signal from the raw signals. Further details will be described below with reference to figure 5.
In step 225, process 200 estimates the vital signs including Respiration Rate (RR), Heart Rate (HR), Systolic Blood Pressure (SBP), Diastolic Blood Pressure (DBP) and Blood Oxygen Saturation (SPO2). Further details will be described below with reference to figures 6 to 8.
Process 200 ends after step 225.
Figure 4 illustrates a process flow 400 that is executed by the instructions of the processing unit to extract raw signals within the detected region(s) of the images from both the active depth sensor and the image sensor of step 215 in process 200 in accordance with this invention. The raw signals are pixel intensity values of each frequency band in the RGB images and NIR images. Process 400 begins with step 405 to compute the average pixel intensity of the detected region(s) for each of the frequency bands including Red, Green and Blue from the image of the image sensor 120 and NIR from the depth sensor 110. Each detected Region of Interest (RoI) is a rectangular matrix of the pixel intensity values, identified as 312 and 313 in Figure 3. For each of the frequency bands in the images (Red, Green, Blue and NIR), the arithmetic mean of all the pixels (average pixel intensity) within the regions of interest 312 and 313 is computed in step 405. The average pixel intensity value for each frequency band computed in step 405 is then appended to the time series of the corresponding frequency band in step 410.
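Steps 405 and 410 can be sketched as follows. This is a minimal illustration only: the function names `roi_mean` and `append_frame`, the representation of a frame as a dictionary of 2-D lists, and the toy pixel values are all assumptions made for the example and do not come from the specification.

```python
from statistics import fmean

def roi_mean(channel, top, left, height, width):
    """Arithmetic mean of one frequency band inside a rectangular RoI."""
    rows = channel[top:top + height]
    return fmean(v for row in rows for v in row[left:left + width])

def append_frame(series, frame, roi):
    """Append the RoI mean of every band in `frame` to that band's time series."""
    for band, channel in frame.items():
        series.setdefault(band, []).append(roi_mean(channel, *roi))
    return series

# Toy 4x4 frame with two bands (hypothetical values, not measured data).
frame = {
    "R": [[10, 10, 10, 10], [10, 20, 40, 10], [10, 60, 80, 10], [10, 10, 10, 10]],
    "NIR": [[0, 0, 0, 0], [0, 1, 2, 0], [0, 3, 4, 0], [0, 0, 0, 0]],
}
roi = (1, 1, 2, 2)  # top, left, height, width of the RoI rectangle

series = append_frame({}, frame, roi)      # first frame
series = append_frame(series, frame, roi)  # second frame extends each series
```

Each call adds one sample per band, so after N frames every band holds a time series of length N.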
In step 410, process 400 appends the average pixel intensity value to form a time series corresponding to the frequency band. Specifically, process 400 appends the average pixel intensity value of each frequency band to the pixel intensity values of the corresponding frequency band in the respective RGB and NIR images to form a time series for each frequency band, where the frequency bands include R, G, B and NIR. The time series relate to the time synchronised video for estimating the vital signs. Each set of time series is no longer than 5 seconds of video clip. The average pixel intensity values are appended in the following manner:
1. For Red time series, the average pixel intensity values of the Red frequency band computed in step 405 is appended to the pixel intensity values of Red frequency band in each image from the video clip to form the red time series;
2. For Green time series, the average pixel intensity values of the Green frequency band computed in step 405 is appended to the pixel intensity values of Green frequency band in each image from the video clip to form the green time series;
3. For Blue time series, the average pixel intensity values of the Blue frequency band computed in step 405 is appended to the pixel intensity values of Blue frequency band in each image from the video clip to form the blue time series; and
4. For NIR time series, the average pixel intensity values of the NIR frequency band computed in step 405 is appended to the pixel intensity values of NIR frequency band in each image from the video clip to form the NIR time series.
At the end of step 410, process 400 generates a table as illustrated below. Each column of Red, Blue, Green and NIR corresponds to a time series where the average pixel intensity value has been appended to the respective frequency band. The table below only illustrates four time stamps, at 0 second, 0.036 second, 0.072 second and 0.108 second. One skilled in the art will recognise that more time stamps are required for each successful estimation of the vital signs. As mentioned previously, the time series should contain no more than 5 seconds worth of data. Preferably, if the camera frame capture rate is less than or equal to 30 FPS, the minimum would be 104 frames; and if the camera frame capture rate is more than 30 FPS, the minimum would be 128 frames.
In step 415, process 400 determines if the time series length is equal to a predetermined length, L. The predetermined length, L, is no more than 5 seconds, and is computed based on the frame rate of the image sensor. If the length of the time series is equal to the predetermined length, L, process 400 proceeds to step 425. If the time series is not equal to the predetermined length, L, process 400 repeats from step 205. Essentially, in order to accurately estimate the vital signs, the system requires a minimum length of data and this length is no more than 5 seconds. One skilled in the art will recognise that other conditions may be applied in step 415 without departing from the invention. For example, step 415 may be modified to determine if the time series length is less than or not more than the predetermined length, L.
In step 425, process 400 normalises the pixel intensity value in the time series for each frequency band. Specifically, the pixel intensity values from each frequency band in the time series are normalised to restrict the amplitude to be within [0, 1]. One example of normalising the pixel intensity values of a frequency band in the time series is by subtracting their arithmetic mean from each pixel intensity value and dividing the result by their standard deviation.
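The mean-and-standard-deviation example of step 425 can be sketched as below. The function name `standardise`, the use of the population standard deviation, and the sample values are assumptions for illustration. Note that this particular operation yields a zero-mean, unit-variance series, so individual values can fall outside [0, 1]; a further rescaling would be needed if a strict bound is required.

```python
from statistics import fmean, pstdev

def standardise(series):
    """Subtract the arithmetic mean and divide by the standard deviation."""
    mu = fmean(series)
    sigma = pstdev(series)  # population standard deviation (an assumption)
    if sigma == 0:
        # A constant series carries no pulsatile information.
        return [0.0 for _ in series]
    return [(v - mu) / sigma for v in series]

red_series = [100.0, 102.0, 104.0, 102.0]  # hypothetical average intensities
red_normalised = standardise(red_series)
```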
Process 400 ends after step 425.
Figure 5 illustrates a process flow 500 that is executed by the instructions of the processing unit to extract PPG signal from the raw signals of step 220 in process 200 in accordance with this invention. Process 500 begins with step 505 by applying a moving average filter for each frequency band. Specifically, step 505 applies a moving average filter to the pixel intensity value in the time series for each frequency band. The moving average filter has a window length of a1 * L, to remove sudden amplitude changes in the raw signal caused by movement of the person across the time series. The choices of a1 and L are shown in the table below.
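A moving average filter of the kind used in step 505 might look like the following sketch. The value of `a1`, the series length, the centred window and the shrinking of the window at the edges are assumptions chosen for the example; the specification only fixes the window length as a1 * L.

```python
def moving_average(series, window):
    """Centred moving average over 2 * (window // 2) + 1 samples.

    Near the ends the window shrinks so the output keeps the input length.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    half = window // 2
    out = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        out.append(sum(series[lo:hi]) / (hi - lo))
    return out

L = 10    # hypothetical time series length in samples
a1 = 0.3  # hypothetical calibration value
window = max(1, round(a1 * L))

raw = [0, 0, 0, 0, 9, 0, 0, 0, 0, 0]  # a sudden spike, e.g. from movement
smooth = moving_average(raw, window)
```

The isolated spike of height 9 is spread over three samples of height 3, which is the smoothing effect the step relies on.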
In step 510, process 500 up-samples the raw signal using cubic spline interpolation method to a new length, L2, where L2 = a2 * L. Specifically, process 500 up-samples the pixel intensity value in the time series for each frequency band using cubic spline interpolation method to form the new length. Up-sampling would involve adding zero samples between the raw signal to increase the sampling rate. By using cubic spline interpolation method, the added zero samples would be reconstructed to non-zero samples which are typically approximate to adjacent samples of the raw signal. This increases the resolution of the respiration rate and heart rate and thus the accuracy of the estimate.
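The up-sampling of step 510 can be illustrated with a natural cubic spline on uniformly spaced samples. This is a sketch under stated assumptions: the natural end condition (zero second derivative at both ends), the function names and the value of `a2` are not taken from the specification, which only names the cubic spline interpolation method.

```python
import math

def natural_spline_second_derivs(y):
    """Second derivatives at the knots of a natural cubic spline (unit spacing)."""
    n = len(y)
    m = [0.0] * n
    cp = [0.0] * n
    dp = [0.0] * n
    # Thomas algorithm on the tridiagonal system m[i-1] + 4 m[i] + m[i+1] = rhs.
    for i in range(1, n - 1):
        rhs = 6.0 * (y[i - 1] - 2.0 * y[i] + y[i + 1])
        denom = 4.0 - cp[i - 1]
        cp[i] = 1.0 / denom
        dp[i] = (rhs - dp[i - 1]) / denom
    for i in range(n - 2, 0, -1):
        m[i] = dp[i] - cp[i] * m[i + 1]
    return m

def upsample_cubic(y, new_len):
    """Resample y onto new_len evenly spaced points over the same time span."""
    n = len(y)
    if n < 2 or new_len < 2:
        raise ValueError("need at least two input and two output samples")
    m = natural_spline_second_derivs(y)
    out = []
    for k in range(new_len):
        x = k * (n - 1) / (new_len - 1)
        i = min(int(x), n - 2)
        t = x - i
        u = 1.0 - t
        out.append(m[i] * u ** 3 / 6.0 + m[i + 1] * t ** 3 / 6.0
                   + (y[i] - m[i] / 6.0) * u + (y[i + 1] - m[i + 1] / 6.0) * t)
    return out

raw = [math.sin(2 * math.pi * i / 20) for i in range(41)]  # synthetic signal
a2 = 2                                                     # hypothetical factor
up = upsample_cubic(raw, a2 * len(raw))                    # L2 = a2 * L
```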
In step 515, process 500 applies a detrending algorithm to the up-sampled pixel intensity value in the time series for each frequency band. For example, an advanced detrending algorithm based on smoothness priors’ formulation is applied to remove noise in the up-sampled raw signal while preserving the frequency bands corresponding to respiration rate and heart rate. Further details of the detrending algorithm can be found in Tarvainen, M.P., Ranta-Aho, P.O. and Karjalainen, P.A., 2002. An advanced detrending method with application to HRV analysis. IEEE Transactions on Biomedical Engineering, 49(2), pp.172-175. The smoothness factor, λ, of the detrending algorithm is set based on the following expression,
λ = a3 * L2
The above expression automates the tuning of the smoothness factor, λ, and eliminates the need for re-tuning the algorithm when the length of raw signal is varied.
In step 520, process 500 applies Independent Component Analysis (ICA) to extract PPG signal from the detrended raw signal (i.e. detrended pixel intensity value in the time series for each frequency band). The detrended raw signal is grouped into two sets based on the frequency bands to extract Red and NIR PPG signals using ICA. Specifically, the frequency bands of Red, Green and Blue are used to extract Red PPG signal in step 530 and the frequency bands of NIR, Green and Blue are used to extract NIR PPG signal in step 540. ICA is a computational method for separating a multivariate signal into additive subcomponents. It is a standard technique well known in the field of signal processing and hence details of ICA are omitted for brevity. In step 535, a band pass filter with cutoff frequencies of 0.16 and 2 Hz is applied to the Red PPG signal to remove noise. Similarly in step 545, a band pass filter with cutoff frequencies of 0.16 and 2 Hz is applied to the NIR PPG signal to remove noise. The cutoff frequencies are set to 0.16 and 2 Hz, which covers the frequency range of average human respiration rate and heart rate. The band pass filter is a Finite Impulse Response (FIR) filter which is a standard time domain filtering technique. The cutoff frequencies can be inverted to translate them to the time domain.
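One way to realise an FIR band pass filter with 0.16 Hz and 2 Hz cutoffs is the windowed-sinc design sketched below. The specification does not state a design method, so the Hamming window, the 30 Hz sampling rate and the 301 taps are assumptions; the narrow lower band edge is what forces a comparatively long filter here.

```python
import math

def lowpass_taps(cutoff_hz, fs_hz, num_taps):
    """Hamming-windowed sinc low-pass taps, scaled to unit gain at 0 Hz."""
    mid = (num_taps - 1) / 2
    fc = cutoff_hz / fs_hz
    taps = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2 * fc if k == 0 else math.sin(2 * math.pi * fc * k) / (math.pi * k)
        window = 0.54 - 0.46 * math.cos(2 * math.pi * n / (num_taps - 1))
        taps.append(ideal * window)
    total = sum(taps)
    return [t / total for t in taps]

def bandpass_taps(low_hz, high_hz, fs_hz, num_taps):
    """Band pass as the difference of two low-pass filters."""
    upper = lowpass_taps(high_hz, fs_hz, num_taps)
    lower = lowpass_taps(low_hz, fs_hz, num_taps)
    return [a - b for a, b in zip(upper, lower)]

def gain(taps, freq_hz, fs_hz):
    """Magnitude of the filter's frequency response at freq_hz."""
    w = 2 * math.pi * freq_hz / fs_hz
    re = sum(t * math.cos(w * n) for n, t in enumerate(taps))
    im = sum(t * math.sin(w * n) for n, t in enumerate(taps))
    return math.hypot(re, im)

def fir_filter(signal, taps):
    """Causal convolution; the output has the same length as the input."""
    return [sum(t * signal[i - j] for j, t in enumerate(taps) if i - j >= 0)
            for i in range(len(signal))]

FS = 30.0  # hypothetical sampling rate in Hz
taps = bandpass_taps(0.16, 2.0, FS, 301)

# Synthetic input: a 1 Hz pulse riding on a constant offset of 5.
signal = [5.0 + math.sin(2 * math.pi * 1.0 * i / FS) for i in range(600)]
filtered = fir_filter(signal, taps)
```

After the start-up transient (the first 300 samples in this configuration), the constant offset is removed and the 1 Hz component passes essentially unchanged.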
The values of a1, a2, a3 and L are calibrated based on the imaging devices used to capture video. The imaging device’s video capture frame rate plays a key role in the calibration process, and this is dependent on the device and the processing unit 130. These calibrated values are used to automate the up-sampling and detrending algorithms. Below are two examples of known devices with the calibrated values.
Process 500 ends after steps 535 and 545.
Figure 6 illustrates a process flow 600 that is executed by the instructions of the processing unit to estimate Blood Oxygen Saturation (SPO2) in step 225 in process 200 in accordance with this invention. Process 600 begins with step 605 by computing Ratio (R) of time-domain range for NIR PPG signal and Red PPG signal. Ratio (R) can be computed with the following expression,
Where amplitude range of Red PPG signal refers to difference between minimum and maximum values of the amplitude of the Red PPG signal in time domain, arithmetic mean of Red PPG signal refers to arithmetic mean of the Red PPG signal in between minimum and maximum values of the amplitude of the Red PPG signal in time domain, amplitude range of NIR PPG signal refers to difference between minimum and maximum values of the amplitude of the NIR PPG signal in time domain and arithmetic mean of NIR PPG signal refers to arithmetic mean of the NIR PPG signal in between minimum and maximum values of the amplitude of the NIR PPG signal in time domain.
In step 610, process 600 estimates the Blood Oxygen Saturation with the following expression,
Blood Oxygen Saturation (SPO2) = β1 + β2 * R
Where β1 refers to the off-set parameter and β2 refers to the slope parameter of the linear regression curve which maps the Ratio (R) to the blood oxygen saturation (SPO2). β1 and β2 are variables that can be derived from the linear regression curve. Figure 10 illustrates a linear regression curve that can be used for estimation of SPO2 from Ratio (R). Process 600 ends after step 610.
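The two steps of process 600 can be sketched as below. The expression for Ratio (R) is not reproduced in this text, so the form used here is an assumption built from the four quantities the description names: the amplitude range of each signal divided by its arithmetic mean, with the Red term over the NIR term. The values of `beta1` and `beta2` and the sample signals are placeholders for illustration, not calibrated parameters or measured data.

```python
from statistics import fmean

def normalised_range(ppg):
    """Amplitude range (max - min) divided by the arithmetic mean."""
    return (max(ppg) - min(ppg)) / fmean(ppg)

def ratio_r(red_ppg, nir_ppg):
    """Assumed form of R: normalised Red range over normalised NIR range."""
    return normalised_range(red_ppg) / normalised_range(nir_ppg)

def spo2(r, beta1, beta2):
    """Linear map from R to blood oxygen saturation: beta1 + beta2 * R."""
    return beta1 + beta2 * r

red_ppg = [0.98, 1.00, 1.02, 1.00]  # hypothetical samples with non-zero mean
nir_ppg = [0.95, 1.00, 1.05, 1.00]

r = ratio_r(red_ppg, nir_ppg)
estimate = spo2(r, beta1=110.0, beta2=-25.0)  # placeholder regression values
```

The division by the arithmetic mean assumes the signals have a non-zero mean at this point in the pipeline.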
Figure 7 illustrates a process flow 700 that is executed by the instructions of the processing unit to estimate Systolic and Diastolic Blood Pressure in step 225 in process 200 in accordance with this invention. Process 700 begins with step 705 by detecting Systolic Peak (Ps), Diastolic peak (Pd), and global minima (Pm) from the time-domain NIR PPG signal. From figure 9, the Systolic peak times can be detected accurately from the blood volume pulse (BVP) waveform as they are maxima within the signal. Here BVP waveform refers to the NIR PPG signal. Maxima refers to global maxima, which is the maxima of the entire NIR PPG signal. The amplitude of the global maxima is identified as the Systolic Peak (Ps) 910. From figure 9, the largest minima within the second order derivative correspond to the Systolic Peaks (Ps) 910 and the minima following these typically correspond to the diastolic peaks/inflections. Based on the location of the diastolic peak in the second order derivative, the amplitude of the corresponding point on the NIR PPG signal is identified as Diastolic peak (Pd) 930. The location corresponding to the Dicrotic notch 920 is identified as the maxima immediately after the location identified as the systolic peak in the second order derivative. The minima of the entire NIR PPG signal would be the global minima (Pm) 940.
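A minimal sketch of step 705 follows. It assumes a single pulse in the analysed window, approximates the second order derivative by a central second difference, and takes the deepest minimum of that derivative after the systolic lobe as the diastolic location. These choices, the function names and the synthetic two-bump waveform are illustrative assumptions rather than the peak detection algorithm of the specification.

```python
import math

def second_difference(x):
    """Discrete second order derivative; the two end points are left at zero."""
    d2 = [0.0] * len(x)
    for i in range(1, len(x) - 1):
        d2[i] = x[i - 1] - 2.0 * x[i] + x[i + 1]
    return d2

def detect_features(ppg):
    """Return Ps, Pd and Pm together with the systolic and diastolic indices."""
    d2 = second_difference(ppg)
    i_s = max(range(len(ppg)), key=ppg.__getitem__)  # global maximum
    # Step past the negative lobe of the second derivative around Ps.
    j = i_s
    while j < len(d2) - 1 and d2[j] <= 0:
        j += 1
    tail = range(j, len(d2) - 1)
    if not tail:
        raise ValueError("no diastolic candidate after the systolic peak")
    i_d = min(tail, key=d2.__getitem__)  # deepest following minimum
    return {
        "Ps": ppg[i_s],
        "Pd": ppg[i_d],
        "Pm": min(ppg),
        "systolic_index": i_s,
        "diastolic_index": i_d,
    }

def bump(t, centre, height, width=0.05):
    return height * math.exp(-((t - centre) / width) ** 2 / 2)

# Synthetic pulse: systolic bump at 0.30 s, smaller diastolic bump at 0.60 s.
times = [i / 100 for i in range(101)]
ppg = [bump(t, 0.30, 1.0) + bump(t, 0.60, 0.4) for t in times]
features = detect_features(ppg)
```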
In step 710, process 700 estimates the Systolic and Diastolic Blood Pressure with the following expressions,
Diastolic Blood Pressure
Where Y1, Y2, Y3, Y4, Y5, Y6 are parameters of the regression curve which maps the Heart Rate and NIR PPG signal amplitude features to the systolic and diastolic blood pressure, Y1 and Y4 are frequency-slope parameters for systolic and diastolic blood pressures respectively, Y2 and Y5 are amplitude-slope parameters for systolic and diastolic blood pressures respectively, Y3 and Y6 are off-set parameters for systolic and diastolic blood pressures respectively. Figure 11 illustrates a linear regression curve that can be used for estimation of blood pressure. The log in figure 11 refers to logarithm base-10 and ln refers to logarithm base-e (natural log).
Process 700 ends after step 710.
Figure 8 illustrates a process flow 800 that is executed by the instructions of the processing unit to estimate Heart Rate (HR) and Respiration Rate (RR) in step 225 in process 200 in accordance with this invention. Process 800 begins with step 805 by applying fast fourier transform to the NIR PPG signal to transform the NIR PPG signal from time-domain to frequency domain.
In step 810, process 800 computes the local maxima of the power spectral density (PSD). Specifically, PSD is computed by dividing the NIR PPG in frequency domain by the length of the NIR PPG signal (L). The local maxima of PSD are then computed by using any generic peak finder algorithm.
In step 820, process 800 determines the frequency range of the local maxima to use for computing the heart rate and respiration rate. For Heart Rate computations, the local maxima within the frequency range of between 0.8 and 2 Hz is used. For Respiration Rate computations, the local maxima within the frequency range of between 0.1 and 0.5 Hz is used. Hence, process 800 proceeds to step 830 to estimate the Heart Rate (HR) using the local maxima that is within the frequency range of between 0.8 and 2 Hz and to step 840 to estimate the Respiration Rate (RR) using the local maxima that is within the frequency range of between 0.1 and 0.5 Hz.
In step 830, process 800 estimates the Heart Rate (HR) with the following expression,
Heart Rate ( HR ) = 60 * fhr
Where fhr refers to the local maxima that is within the frequency range of between 0.8 and 2 Hz.
In step 840, process 800 estimates the Respiration Rate (RR) with the following expression,
Respiration Rate ( RR ) = 60 * frr
Where frr refers to the local maxima that is within the frequency range of between 0.1 and 0.5 Hz.
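Process 800 can be sketched as follows. To keep the example self-contained it evaluates a plain discrete Fourier transform directly instead of calling an FFT routine; both give the same coefficients. The squared-magnitude scaling of the spectrum, the choice of the strongest local maximum in each band, the 30 Hz sampling rate and the synthetic signal are assumptions for illustration.

```python
import cmath
import math

def power_spectrum(signal, fs_hz):
    """One-sided spectrum |X[k]|^2 / n with the frequency of each bin in Hz."""
    n = len(signal)
    freqs, power = [], []
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(signal))
        freqs.append(k * fs_hz / n)
        power.append(abs(coeff) ** 2 / n)
    return freqs, power

def strongest_peak(freqs, power, low_hz, high_hz):
    """Frequency of the largest local maximum inside [low_hz, high_hz]."""
    best = None
    for k in range(1, len(power) - 1):
        in_band = low_hz <= freqs[k] <= high_hz
        is_peak = power[k] > power[k - 1] and power[k] >= power[k + 1]
        if in_band and is_peak and (best is None or power[k] > power[best]):
            best = k
    return None if best is None else freqs[best]

FS = 30.0  # hypothetical sampling rate in Hz
N = 150    # 5 seconds of samples at 30 Hz
# Synthetic NIR PPG: 1.2 Hz cardiac component plus 0.2 Hz respiratory component.
nir_ppg = [math.sin(2 * math.pi * 1.2 * i / FS)
           + 0.5 * math.sin(2 * math.pi * 0.2 * i / FS) for i in range(N)]

freqs, power = power_spectrum(nir_ppg, FS)
f_hr = strongest_peak(freqs, power, 0.8, 2.0)
f_rr = strongest_peak(freqs, power, 0.1, 0.5)
heart_rate = 60 * f_hr        # beats per minute
respiration_rate = 60 * f_rr  # breaths per minute
```

With a 5 second window the bin spacing is 0.2 Hz, which corresponds to 12 per minute; the synthetic frequencies were chosen to fall exactly on bins.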
Process 800 ends after steps 830 and 840.
The Heart Rate, Respiration Rate and Blood Pressure are estimated using the NIR PPG signal derived from data captured by the active depth camera. These estimates do not vary with the ambient lighting conditions. The blood pressure model works on the premise that the identified systolic and diastolic peaks correspond to the respective heart activities. This method gives greater accuracy compared to just using the global maxima in the model. The estimation of Blood Oxygen Saturation is based on the ratio of the range of the Red PPG signal to the range of the NIR PPG signal.
In short, the method according to this invention involves using an active depth camera to capture simultaneous, time synchronized, video frames of both Color (RGB) and NIR wavelengths. From the video frames, a person’s face is detected and the region corresponding to their forehead is extracted (region of interest) and the pixel intensity values are averaged to form a time domain raw signal across multiple video frames. The time domain raw signal is detrended to remove higher order harmonics and resampled and filtered using Independent Component Analysis (ICA) and FIR bandpass filter to obtain the PPG signal. Vitals such as Pulse rate, Respiration Rate, Blood Oxygen saturation and Blood Pressure are estimated from the PPG signal, using FFT-based frequency analysis for the former two vitals and mathematical models for the latter two vitals.
Processes 600-800 illustrate one embodiment of estimating Pulse rate, Respiration Rate, Blood Oxygen saturation and Blood Pressure based on the Red PPG signal and NIR PPG signal. Another embodiment of estimating Pulse rate, Respiration Rate, Blood Oxygen saturation and Blood Pressure based on the Red PPG signal and NIR PPG signal is the use of a machine learning model to estimate vitals of the person from the Red PPG signal and NIR PPG signal. One example of such a machine learning model is the linear regression model. In another embodiment, processes 600-800 would be used for estimating Pulse rate, Respiration Rate, Blood Oxygen saturation and Blood Pressure based on the Red PPG signal and NIR PPG signal until at least a predetermined number of datasets is obtained in order to train the machine learning model, before the machine learning model takes over from processes 600-800.
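As an illustration of the linear regression model mentioned above, the sketch below fits an off-set and a slope by ordinary least squares. The function name `fit_line` and the training pairs are hypothetical: the pairs are synthetic values placed exactly on a straight line, not reference measurements.

```python
from statistics import fmean

def fit_line(xs, ys):
    """Ordinary least squares fit of y = offset + slope * x."""
    mean_x, mean_y = fmean(xs), fmean(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

# Synthetic (Ratio R, reference SPO2) pairs lying on SPO2 = 110 - 25 * R.
ratios = [0.4, 0.6, 0.8, 1.0]
reference_spo2 = [100.0, 95.0, 90.0, 85.0]

beta1, beta2 = fit_line(ratios, reference_spo2)
predicted = beta1 + beta2 * 0.5  # estimate for a new ratio R = 0.5
```

The same fitting routine applies to any single-feature mapping; a model with several input features, such as heart rate together with amplitude features, would need a multi-variable fit instead.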
The above is a description of exemplary embodiments of a system and method in accordance with this disclosure. It is foreseeable that those skilled in the art can and will design alternative systems and methods based on this disclosure.
Claims
1. A system for estimating vitals of a person comprising: an image sensor adapted to capture a sequence of color (RGB) and near infrared (NIR) images; a processing unit comprising a processor, memory and instructions stored on the memory and executable by the processor to: estimating vitals of the person from the sequence of RGB and NIR images.
2. The system according to claim 1 wherein the instruction to estimating vitals of the person from the sequence of RGB and NIR images comprises instructions to: time synchronising the sequence of RGB and NIR images; detecting region of interest within the sequence of RGB and NIR images; extracting raw signals from the detected region of interest; extracting photoplethysmography (PPG) signals from the raw signals; and estimating vitals of the person from the extracted PPG signals.
3. The system according to claim 2 wherein the instruction to detecting region of interest within the sequence of RGB and NIR images comprises instructions to: detecting a face of the person within each of the sequence of NIR images; in response to detecting the face of the person, detecting at least one region of interest within the detected face of the person; and mapping the detected region of interest in each of the sequence of NIR images to the corresponding sequence of RGB images.
4. The system according to claim 3 wherein the instruction to detecting at least one region of interest within the detected face of the person comprises instructions to: applying facial landmark detection where 81 prominent feature points on the face of the person are detected; and overlaying a box at a top of the detected 81 prominent feature points between facial edge and eyebrows, wherein the box is one region of interest.
5. The system according to claim 4 wherein the instruction to detecting at least one region of interest within the detected face of the person further comprises instructions to: overlaying two boxes between facial edge and nose, wherein the two boxes are two other regions of interest.
6. The system according to any one of claims 3-5 wherein the instruction to detecting region of interest within the sequence of RGB and NIR images comprises instructions to: computing an average pixel intensity of the detected region of interest for each of a plurality of frequency bands including Red, Green and Blue from the sequence of RGB images and NIR frequency band from the sequence of NIR images; and appending the average pixel intensity value of each frequency band to the pixel intensity value of corresponding frequency band in respective RGB and NIR images to form a time series for each frequency band.
7. The system according to claim 6 wherein the instruction to detecting region of interest within the sequence of RGB and NIR images further comprises instructions to: determining if a length of the time series is equal to a predetermined length, L; in response to the length of the time series being equal to the predetermined length, L, normalising the pixel intensity value in the time series for each frequency band.
8. The system according to claim 7 wherein the instruction to extracting PPG signals from the raw signals comprises instructions to: applying a moving average filter to the pixel intensity value in the time series for each frequency band, the moving average filter has a window length of a1 * L; up-sampling the pixel intensity value in the time series for each frequency band using cubic spline interpolation method to form a new length, L2, where L2 = a2 * L; applying a detrending algorithm based on smoothness priors’ formulation for the up-sampled pixel intensity value in the time series for each frequency band wherein a smoothness factor, λ, of the detrending algorithm is set based on the following expression, λ = a3 * L2; and applying Independent Component Analysis (ICA) to extract PPG signal from the detrended pixel intensity value in the time series for each frequency band, wherein values of a1, a2, a3 and L are calibrated based on the imaging sensor.
9. The system according to claim 8 wherein the instruction to applying Independent Component Analysis (ICA) to extract PPG signal from the detrended pixel intensity value in the time series for each frequency band comprises instructions to: grouping the detrended pixel intensity value in the time series for each frequency band into a set of Red PPG signal and a set of NIR PPG signals wherein the Red PPG signal is derived from the frequency bands of Red, Green and Blue and the NIR PPG signal is derived from the frequency bands of NIR, Green and Blue; applying a band pass filter with cutoff frequency of between 0.16 and 2 Hz to the Red PPG signal; and applying a band pass filter with cutoff frequency of between 0.16 and 2 Hz to the NIR PPG signal.
10. The system according to claim 9 wherein the instruction to estimating vitals of the person from the extracted PPG signals comprises instructions to use a machine learning model based on a linear regression model.
11. The system according to claim 9 wherein the instruction to estimating vitals of the person from the extracted PPG signals comprises instructions to: computing Ratio (R) of time-domain range for NIR PPG signal and Red PPG signal wherein the Ratio (R) can be expressed with the following expression,
where amplitude range of Red PPG signal relates to difference between minimum and maximum values of the amplitude of the Red PPG signal in time domain, arithmetic mean of Red PPG signal relates to arithmetic mean of the Red PPG signal in between minimum and maximum values of the amplitude of the Red PPG signal in time domain, amplitude range of NIR PPG signal relates to difference between minimum and maximum values of the amplitude of the NIR PPG signal in time domain and arithmetic mean of NIR PPG signal relates to arithmetic mean of the NIR PPG signal in between minimum and maximum values of the amplitude of the NIR PPG signal in time domain; and estimating the Blood Oxygen Saturation with the following expression,
Blood Oxygen Saturation ( SPO2 ) = β1 + β2 * R where β1 refers to off-set parameter and β2 refers to slope parameter of the linear regression curve which maps the Ratio (R) to the blood oxygen saturation (SPO2).
12. The system according to claim 9 or 10 wherein the instruction to estimating vitals of the person from the extracted PPG signals comprises instructions to: detecting Systolic Peak (Ps), Diastolic peak (Pd), and global minima (Pm) from the time-domain NIR PPG signal; estimating the Systolic and Diastolic Blood Pressure with the following expressions, Systolic Blood Pressure (SBP)
Diastolic Blood Pressure (DBP)
where γ1, γ2, γ3, γ4, γ5, γ6 are parameters of the regression curve which maps the Heart Rate and NIR PPG signal amplitude features to the systolic and diastolic blood pressure, γ1 and γ4 are frequency-slope parameters for systolic and diastolic blood pressures respectively, γ2 and γ5 are amplitude-slope parameters for systolic and diastolic blood pressures respectively, γ3 and γ6 are off-set parameters for systolic and diastolic blood pressures respectively.
13. The system according to claim 12 wherein the instruction to detecting Systolic Peak (Ps), Diastolic peak (Pd), and global minima (Pm) from the time-domain NIR PPG signal comprises instructions to: applying a peak detection algorithm on the NIR PPG signal to obtain a second order differential; identifying a global maxima in the NIR PPG signal as the Systolic Peak (Ps), a peak in the NIR PPG signal corresponding to a minima following the Systolic Peak (Ps) in the second order derivative as Diastolic peak (Pd), a maxima immediately after the systolic peak in the second order derivative as the dicrotic notch, a minima of the entire NIR PPG signal as the global minima (Pm).
14. The system according to any one of claims 7-13 wherein the instruction to estimating vitals of the person from the extracted PPG signals comprises instructions to: applying fast Fourier transform to the NIR PPG signal to transform the NIR PPG signal from time-domain to frequency domain; computing a plurality of local maxima of a power spectral density (PSD) of the NIR PPG signal in frequency domain; in response to a local maxima between the frequency range of 0.8 and 2 Hz, estimating a Heart Rate (HR) with the following expression,
Heart Rate ( HR ) = 60 * fhr
where fhr refers to the local maxima that is between the frequency range of 0.8 and 2 Hz; and in response to a local maxima between the frequency range of 0.1 and 0.5 Hz, estimating a Respiration Rate (RR) with the following expression,
Respiration Rate ( RR ) = 60 * frr
where frr refers to the local maxima that is between the frequency range of 0.1 and 0.5 Hz.
15. A method of estimating vitals of a person from a sequence of RGB and NIR images comprising: time synchronising the sequence of RGB and NIR images; detecting region of interest within the sequence of RGB and NIR images; extracting raw signals from the detected region of interest; extracting photoplethysmography (PPG) signals from the raw signals; and estimating vitals of the person from the extracted PPG signals.
16. The method according to claim 15 wherein the step of detecting region of interest within the sequence of RGB and NIR images comprises: detecting a face of the person within each of the sequence of NIR images; in response to detecting the face of the person, detecting at least one region of interest within the detected face of the person; and mapping the detected region of interest in each of the sequence of NIR images to the corresponding sequence of RGB images.
17. The method according to claim 16 wherein the step of detecting at least one region of interest within the detected face of the person comprises:
applying facial landmark detection where 81 prominent feature points on the face of the person are detected; and overlaying a box at a top of the detected 81 prominent feature points between facial edge and eyebrows, wherein the box is one region of interest.
18. The method according to claim 17 wherein the step of detecting at least one region of interest within the detected face of the person further comprises: overlaying two boxes between facial edge and nose, wherein the two boxes are two other regions of interest.
19. The method according to any one of claims 16-18 wherein the step of detecting region of interest within the sequence of RGB and NIR images comprises: computing an average pixel intensity of the detected region of interest for each of a plurality of frequency bands including Red, Green and Blue from the sequence of RGB images and NIR frequency band from the sequence of NIR images; and appending the average pixel intensity value of each frequency band to the pixel intensity value of corresponding frequency band in respective RGB and NIR images to form a time series for each frequency band.
20. The method according to claim 19 wherein the step of detecting region of interest within the sequence of RGB and NIR images further comprises: determining if a length of the time series is equal to a predetermined length, L; in response to the length of the time series being equal to the predetermined length, L, normalising the pixel intensity value in the time series for each frequency band.
21. The method according to claim 20 wherein the step of extracting PPG signals from the raw signals comprises:
applying a moving average filter to the pixel intensity value in the time series for each frequency band, the moving average filter has a window length of α1 * L; up-sampling the pixel intensity value in the time series for each frequency band using cubic spline interpolation method to form a new length, L2, where L2 = α2 * L; applying a detrending algorithm based on smoothness priors’ formulation for the upsampled pixel intensity value in the time series for each frequency band wherein a smoothness factor, λ, of the detrending algorithm is set based on the following expression, λ = α3 * L2; and applying Independent Component Analysis (ICA) to extract PPG signal from the detrended pixel intensity value in the time series for each frequency band, wherein values of α1, α2, α3 and L are calibrated based on the imaging sensor.
22. The method according to claim 21 wherein the step of applying Independent Component Analysis (ICA) to extract PPG signal from the detrended pixel intensity value in the time series for each frequency band comprises: grouping the detrended pixel intensity value in the time series for each frequency band into a set of Red PPG signal and a set of NIR PPG signals wherein the Red PPG signal is derived from the frequency bands of Red, Green and Blue and the NIR PPG signal is derived from the frequency bands of NIR, Green and Blue; applying a band pass filter with cutoff frequency of between 0.16 and 2 Hz to the Red PPG signal; and applying a band pass filter with cutoff frequency of between 0.16 and 2 Hz to the NIR PPG signal.
23. The method according to claim 22 wherein the step of estimating vitals of the person from the extracted PPG signals comprises:
computing Ratio (R) of time-domain range for NIR PPG signal and Red PPG signal wherein the Ratio (R) can be expressed with the following expression,
where amplitude range of Red PPG signal relates to difference between minimum and maximum values of the amplitude of the Red PPG signal in time domain, arithmetic mean of Red PPG signal relates to arithmetic mean of the Red PPG signal in between minimum and maximum values of the amplitude of the Red PPG signal in time domain, amplitude range of NIR PPG signal relates to difference between minimum and maximum values of the amplitude of the NIR PPG signal in time domain and arithmetic mean of NIR PPG signal relates to arithmetic mean of the NIR PPG signal in between minimum and maximum values of the amplitude of the NIR PPG signal in time domain; estimating the Blood Oxygen Saturation with the following expression,
Blood Oxygen Saturation ( SPO2 ) = β1 + β2 * R where β1 refers to off-set parameter and β2 refers to slope parameter of the linear regression curve which maps the Ratio (R) to the blood oxygen saturation (SPO2).
24. The method according to claim 22 or 23 wherein the step of estimating vitals of the person from the extracted PPG signals comprises: detecting Systolic Peak (Ps), Diastolic peak ( Pd ), and global minima (Pm) from the time-domain NIR PPG signal; estimating the Systolic and Diastolic Blood Pressure with the following expressions, Systolic Blood Pressure (SBP)
Diastolic Blood Pressure (DBP)
where γ1, γ2, γ3, γ4, γ5, γ6 are parameters of the regression curve which maps the Heart Rate and NIR PPG signal amplitude features to the systolic and diastolic blood pressure, γ1 and γ4 are frequency-slope parameters for systolic and diastolic blood pressures respectively, γ2 and γ5 are amplitude-slope parameters for systolic and diastolic blood pressures respectively, γ3 and γ6 are off-set parameters for systolic and diastolic blood pressures respectively.
25. The method according to claim 24 wherein the step of detecting Systolic Peak (Ps), Diastolic peak (Pd), and global minima (Pm) from the time-domain NIR PPG signal comprises: applying a peak detection algorithm on the NIR PPG signal to obtain a second order differential; identifying a global maxima in the NIR PPG signal as the Systolic Peak (Ps), a peak in the NIR PPG signal corresponding to a minima following the Systolic Peak (Ps) in the second order derivative as Diastolic peak (Pd), a maxima immediately after the systolic peak in the second order derivative as the dicrotic notch, a minima of the entire NIR PPG signal as the global minima (Pm).
26. The method according to any one of claims 20-25 wherein the step of estimating vitals of the person from the extracted PPG signals comprises: applying fast Fourier transform to the NIR PPG signal to transform the NIR PPG signal from time-domain to frequency domain; computing a plurality of local maxima of a power spectral density (PSD) of the NIR PPG signal in frequency domain; in response to a local maxima between the frequency range of 0.8 and 2 Hz, estimating a Heart Rate (HR) with the following expression,
Heart Rate ( HR ) = 60 * fhr
where fhr refers to the local maxima that is between the frequency range of 0.8 and 2 Hz; and in response to a local maxima between the frequency range of 0.1 and 0.5 Hz, estimating a Respiration Rate (RR) with the following expression,
Respiration Rate ( RR ) = 60 * frr
where frr refers to the local maxima that is between the frequency range of 0.1 and 0.5 Hz.
27. The method according to claim 22 wherein the step of estimating vitals of the person from the extracted PPG signals comprises applying a machine learning model based on a linear regression model.
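The frequency-domain estimation recited in claims 14 and 26 (Heart Rate = 60 * fhr, Respiration Rate = 60 * frr) can be illustrated with a minimal sketch. It is not the applicant's implementation. The function name `estimate_rates`, the sampling rate `fs`, the periodogram-style PSD estimate, and the rule of taking the strongest local maximum when several fall within a band are all assumptions made for illustration; the claims specify only the frequency bands and the two expressions.

```python
import numpy as np

def estimate_rates(nir_ppg, fs):
    """Estimate (heart rate, respiration rate) per minute from a 1-D NIR PPG signal.

    nir_ppg: sequence of samples (hypothetical input, already filtered).
    fs: sampling rate in Hz (assumed known from the imaging sensor).
    Returns None for a rate when no local PSD maximum lies in its band.
    """
    x = np.asarray(nir_ppg, dtype=float)
    x = x - x.mean()                             # remove the DC component
    freqs = np.fft.rfftfreq(x.size, d=1.0 / fs)  # frequency of each FFT bin in Hz
    psd = np.abs(np.fft.rfft(x)) ** 2 / x.size   # simple periodogram (assumed estimator)

    # Interior bins that are strictly greater than both neighbours
    is_peak = (psd[1:-1] > psd[:-2]) & (psd[1:-1] > psd[2:])
    peaks = np.flatnonzero(is_peak) + 1

    def strongest_peak(lo, hi):
        # Keep local maxima inside [lo, hi] Hz and pick the most powerful one
        band = peaks[(freqs[peaks] >= lo) & (freqs[peaks] <= hi)]
        if band.size == 0:
            return None
        return freqs[band[np.argmax(psd[band])]]

    f_hr = strongest_peak(0.8, 2.0)   # heart-rate band from the claims
    f_rr = strongest_peak(0.1, 0.5)   # respiration band from the claims
    hr = None if f_hr is None else 60.0 * f_hr
    rr = None if f_rr is None else 60.0 * f_rr
    return hr, rr

# Synthetic check: a 1.2 Hz pulse component plus a 0.25 Hz breathing component
fs = 30.0
t = np.arange(1800) / fs              # 60 seconds of samples at 30 Hz
signal = np.sin(2 * np.pi * 1.2 * t) + 0.5 * np.sin(2 * np.pi * 0.25 * t)
hr, rr = estimate_rates(signal, fs)
print(hr, rr)
```

With a 60-second window the frequency resolution is 1/60 Hz, so each rate is quantised to steps of one per minute; a longer window gives finer resolution at the cost of slower response.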
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
SG10202101561W | 2021-02-16 | ||
SG10202101561W | 2021-02-16 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2022177501A1 (en) | 2022-08-25 |
Family
ID=82932307
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/SG2021/050366 WO2022177501A1 (en) | 2021-02-16 | 2021-06-24 | A system and method for measuring vital body signs |
Country Status (1)
Country | Link |
---|---|
WO (1) | WO2022177501A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
SE2250253A1 (en) * | 2022-02-25 | 2023-08-26 | Detectivio Ab | Non-contact oxygen saturation estimation |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110251493A1 (en) * | 2010-03-22 | 2011-10-13 | Massachusetts Institute Of Technology | Method and system for measurement of physiological parameters |
KR20120083994A (en) * | 2011-01-19 | 2012-07-27 | 임경근 | Hairstyle simulation system and method |
CN109008964A (en) * | 2018-06-27 | 2018-12-18 | 浏阳市安生智能科技有限公司 | A kind of method and device that physiological signal extracts |
US20190008402A1 (en) * | 2014-10-04 | 2019-01-10 | Government Of The United States, As Represented By The Secretary Of The Air Force | Non-Contact Assessment of Cardiovascular Function using a Multi-Camera Array |
WO2019145142A1 (en) * | 2018-01-24 | 2019-08-01 | Koninklijke Philips N.V. | Device, system and method for determining at least one vital sign of a subject |
CN110279406A (en) * | 2019-05-06 | 2019-09-27 | 苏宁金融服务(上海)有限公司 | A kind of touchless pulse frequency measurement method and device based on camera |
US20200178809A1 (en) * | 2017-08-08 | 2020-06-11 | Koninklijke Philips N.V. | Device, system and method for determining a physiological parameter of a subject |
WO2020126713A1 (en) * | 2018-12-19 | 2020-06-25 | Koninklijke Philips N.V. | System and method for determining at least one vital sign of a subject |
US20200367773A1 (en) * | 2017-08-08 | 2020-11-26 | Koninklijke Philips N.V. | Device, system and method for determining a physiological parameter of a subject |
WO2021100994A1 (en) * | 2019-11-21 | 2021-05-27 | 주식회사 지비소프트 | Non-contact method for measuring biological index |
-
2021
- 2021-06-24 WO PCT/SG2021/050366 patent/WO2022177501A1/en active Application Filing
Non-Patent Citations (4)
Title |
---|
ALGHOUL KARIM; ALHARTHI SAEED; AL OSMAN HUSSEIN; EL SADDIK ABDULMOTALEB: "Heart Rate Variability Extraction From Videos Signals: ICA vs. EVM Comparison", IEEE ACCESS, IEEE, USA, vol. 5, 1 January 1900 (1900-01-01), USA , pages 4711 - 4719, XP011646924, DOI: 10.1109/ACCESS.2017.2678521 * |
F. TSALAKANIDOU ; S. MALASSIOTIS: "Robust facial action recognition from real-time 3D streams", COMPUTER VISION AND PATTERN RECOGNITION WORKSHOPS, 2009. CVPR WORKSHOPS 2009. IEEE COMPUTER SOCIETY CONFERENCE ON, IEEE, PISCATAWAY, NJ, USA, 20 June 2009 (2009-06-20), Piscataway, NJ, USA , pages 4 - 11, XP031606940, ISBN: 978-1-4244-3994-2 * |
TARVAINEN M. P. ET AL.: "An advanced detrending method with application to HRV analysis", IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING, vol. 49, no. 2, 7 August 2002 (2002-08-07), pages 172 - 175, [retrieved on 20211006], DOI: 10.1109/10.979357 * |
XIONG P. ET AL.: "Combining Local and Global Features for 3D Face Tracking", PROCEEDINGS OF THE IEEE INTERNATIONAL CONFERENCE ON COMPUTER VISION (ICCV, 29 October 2017 (2017-10-29), pages 2529 - 2536, XP033303724, [retrieved on 20211006], DOI: 10.1109/ICCVW.2017.297 * |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
SE2250253A1 (en) * | 2022-02-25 | 2023-08-26 | Detectivio Ab | Non-contact oxygen saturation estimation |
WO2023163644A1 (en) * | 2022-02-25 | 2023-08-31 | Detectivio Ab | Non-contact oxygen saturation estimation using ambient light |
SE545755C2 (en) * | 2022-02-25 | 2024-01-02 | Detectivio Ab | Non-contact oxygen saturation estimation |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Tasli et al. | Remote PPG based vital sign measurement using adaptive facial regions | |
McDuff et al. | Remote detection of photoplethysmographic systolic and diastolic peaks using a digital camera | |
AU2016201690B2 (en) | Method and system for noise cleaning of photoplethysmogram signals | |
US10448900B2 (en) | Method and apparatus for physiological monitoring | |
Sinhal et al. | An overview of remote photoplethysmography methods for vital sign monitoring | |
US20110251493A1 (en) | Method and system for measurement of physiological parameters | |
KR101738278B1 (en) | Emotion recognition method based on image | |
US20180303351A1 (en) | Systems and methods for optimizing photoplethysmograph data | |
KR102215557B1 (en) | Heart rate estimation based on facial color variance and micro-movement | |
US20200015688A1 (en) | Blood pressure measurement method, device and storage medium | |
Alnaggar et al. | Video-based real-time monitoring for heart rate and respiration rate | |
WO2022177501A1 (en) | A system and method for measuring vital body signs | |
EP3318179B1 (en) | Method for measuring respiration rate and heart rate using dual camera of smartphone | |
Lamba et al. | Contactless heart rate estimation from face videos | |
Kurita et al. | Non-contact video based estimation for heart rate variability spectrogram using ambient light by extracting hemoglobin information | |
Jensen et al. | Camera-based heart rate monitoring | |
US20230000376A1 (en) | System and method for physiological measurements from optical data | |
US20230397826A1 (en) | Operation method for measuring biometric index of a subject | |
Wu et al. | Peripheral oxygen saturation measurement using an rgb camera | |
Hu et al. | Study on Real-Time Heart Rate Detection Based on Multi-People. | |
US20200155008A1 (en) | Biological information detecting apparatus and biological information detecting method | |
JP2021023490A (en) | Biological information detection device | |
Ozawa et al. | Improving the accuracy of noncontact blood pressure sensing using near-infrared light | |
CN115245318A (en) | Automatic identification method of effective IPPG signal based on deep learning | |
CN112784731A (en) | Method for detecting physiological indexes of driver and establishing model |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application | Ref document number: 21926958; Country of ref document: EP; Kind code of ref document: A1 |
NENP | Non-entry into the national phase | Ref country code: DE |
122 | Ep: pct application non-entry in european phase | Ref document number: 21926958; Country of ref document: EP; Kind code of ref document: A1 |