CN110321782B - System for detecting human body characteristic signals - Google Patents


Info

Publication number
CN110321782B
CN110321782B (application CN201910371120.5A)
Authority
CN
China
Prior art keywords
pulse rate
image information
camera
target position
roi
Prior art date
Legal status
Active
Application number
CN201910371120.5A
Other languages
Chinese (zh)
Other versions
CN110321782A (en)
Inventor
王元
Current Assignee
Shanghai Star Map Financial Services Group Co.,Ltd.
Original Assignee
Suning Financial Services Shanghai Co ltd
Priority date
Filing date
Publication date
Application filed by Suning Financial Services Shanghai Co ltd filed Critical Suning Financial Services Shanghai Co ltd
Priority to CN201910371120.5A priority Critical patent/CN110321782B/en
Publication of CN110321782A publication Critical patent/CN110321782A/en
Application granted granted Critical
Publication of CN110321782B publication Critical patent/CN110321782B/en
Legal status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/25 - Fusion techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/20 - Image preprocessing
    • G06V10/25 - Determination of region of interest [ROI] or a volume of interest [VOI]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 - Feature extraction; Face representation

Abstract

The embodiment of the invention discloses a system for detecting human body characteristic signals, which relates to the technical field of computer imaging and vision and enables remote, non-contact measurement of a human body characteristic signal (pulse rate). The system performs the following steps: receiving image information acquired by an image acquisition device, in which a depth camera is installed, so that the acquired image information includes depth-of-field frames; identifying a human face in the image and extracting facial feature points; tracking a target position in the face to obtain image information of the target position, where the target position includes the locations of the facial feature points; extracting skin pixels using the image information of the target position; and acquiring pulse-rate-related characteristic signals from the extracted skin pixels, deriving a pulse rate time series from the continuously generated signals, and outputting the measurement result. The invention is suitable for non-contact measurement of human vital-sign signals.

Description

System for detecting human body characteristic signals
Technical Field
The invention relates to the technical field of computer images and vision, in particular to a system for detecting human body characteristic signals.
Background
Currently, non-contact-based detection of human body physical sign characteristic signals, such as pulse rate (heartbeat) measurement, is one of the directions of scientific research in academia and industry. The non-contact acquisition of human heartbeats has wide business demands and commercial values in the fields of medical treatment, finance, transportation and the like.
Pulse rate measurement is currently done in industry mainly by electrocardiography (ECG) and photoplethysmography (PPG). However, both methods require a sensor (an electrode or a PPG light sensor) to be placed on the skin surface of the subject; that is, the measuring apparatus must be very close to, or in contact with, the subject. The main reason is that the way the signals collected by these devices are processed dictates the use of a contact sensor, and contact-based measurement schemes are inherently inconvenient to use.
Disclosure of Invention
The embodiment of the invention provides a system for detecting human body characteristic signals, which can realize remote acquisition and processing of human body characteristic signals (pulse rates) so as to realize non-contact pulse rate measurement.
In order to achieve the above purpose, the embodiment of the present invention adopts the following technical scheme:
recognizing a human face from a shot image, and extracting facial feature points;
Tracking a target position in the human face to obtain image information of the target position, wherein the target position comprises: the locations of the facial feature points.
Further extracting pulse rate related characteristic signals according to skin pixels extracted from the image information of the target position; selecting sample points from the extracted pulse rate related characteristic signals, and carrying out signal fusion according to the selected sample points; and calculating the pulse rate according to the fused pulse rate related characteristic signals.
Specifically, acquiring a region of interest (ROI) according to image information of a target position; from among the pixels within the ROI, the skin pixels are identified.
Each ROI grid is located and the skin pixels within it are identified; the average pixel intensity of the skin pixels and the number of skin-pixel points in each ROI grid are then obtained.
When the image information of the target position is a near-infrared frame, the average pixel intensity of the skin pixels in the ROI grid is sequentially resampled, normalized and filtered; when the image information of the target position is a color frame, the RGB three-color signals are combined to generate a chrominance signal, which is then sequentially mixed, resampled, detrended and filtered.
Specifically, the variation in the number of skin pixels in each ROI grid is counted to obtain the interquartile range (IQR) of each ROI grid. Selecting sample points from the extracted pulse-rate-related characteristic signals then comprises: acquiring the IQR and the signal-to-noise ratio of each ROI grid, and eliminating the ROI grids whose IQR is higher than a maximum IQR threshold or whose signal-to-noise ratio is lower than a minimum signal-to-noise-ratio threshold.
In this embodiment, the pulse rate of the measured subject is obtained by analyzing the pixels of the facial image based on camera face recognition. No other auxiliary hardware is needed, and the subject does not need to wear any body-contact sensor; remote acquisition and processing of the human vital-sign signal (pulse rate) is realized, thereby achieving non-contact pulse rate measurement.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic diagram of a system framework of an operating environment provided by an embodiment of the present invention;
FIG. 2a is a schematic diagram of an operation flow provided in an embodiment of the present invention;
fig. 2b, fig. 2c, fig. 2d are schematic diagrams of a system architecture according to an embodiment of the present invention;
fig. 3, fig. 4, and fig. 5 are schematic diagrams of specific examples provided in the embodiments of the present invention;
fig. 6 is a flowchart of sample automatic selection in an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail below with reference to the drawings and the detailed description, so that those skilled in the art may better understand the technical solution of the present invention. Embodiments of the present invention will hereinafter be described in detail, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or to elements having the same or similar functions throughout. The embodiments described below with reference to the drawings are exemplary, serve only to explain the present invention, and are not to be construed as limiting it. As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. Further, "connected" or "coupled" as used herein may include a wireless connection or coupling. The term "and/or" as used herein includes any and all combinations of one or more of the associated listed items. It will be understood by those skilled in the art that, unless otherwise defined, all terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs.
It will be further understood that terms, such as those defined in commonly used dictionaries, should be interpreted as having a meaning that is consistent with their meaning in the context of the prior art and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
The method flow in this embodiment may be executed in a system as shown in fig. 1, which includes: an image acquisition device, a detection terminal and a cloud server.
The image capturing device in this embodiment may be a camera with an independent shooting function and a built-in communication module that can communicate with the cloud server, for example a security camera commonly used at present. Such a camera is arranged in a designated area, for example at a security-inspection position, to capture facial images of inspected personnel. As another example, it may be mounted on a pan-tilt platform to capture facial images of everyone in a crowd; the platform can be installed inside a building or outdoors, and a specific deployment may rely on the "Skynet" surveillance systems currently used in some cities.
The image acquisition device may be a digital camera or an analog camera. A digital camera converts the captured analog video signal into a digital signal internally and then transmits it to the cloud server connected to the camera. The video signal captured by an analog camera is converted to digital form by a video capture card, compressed, and then transmitted to the cloud server connected to the camera. The scheme of this embodiment is also applicable to various camera types, such as pure color cameras (RGB cameras), pure near-infrared (NIR) cameras, depth cameras, and the like.
The cloud server disclosed in this embodiment may be a device such as a blade machine, a workstation, a supercomputer, or a server cluster system for data processing, where the server cluster system is composed of a plurality of server devices. The cloud server can interact data with the detection terminal through a mobile wireless network or an internet mode, and the specific data interaction mode or the communication mode is achieved by adopting the existing network standard and communication scheme, and details are omitted in the embodiment.
An embodiment of the present invention provides a system for detecting a human body characteristic signal, as shown in fig. 2a, including:
s101, receiving image information acquired by an image acquisition device, identifying a human face from the image and extracting facial feature points.
A depth camera is installed in the image acquisition device, and the acquired image information includes depth-of-field frames. In this embodiment, existing face recognition technology may be used to recognize the face region; the scheme of this embodiment focuses on further image feature extraction and analysis of the identified face region. The camera used to capture face images may be of various types and may be integrated in various detection terminals, for example:
The image acquisition device may also be a camera integrated on the detection terminal, such as: cameras on smartphones (current smartphones have realized multi-camera shooting and have applied pure color cameras (RGB cameras), pure Near Infrared (NIR) cameras, wide angle cameras, depth cameras, etc.).
The detection terminal may be implemented as a standalone device, or may be integrated with the personal terminals of various users, including smart phones, tablet computers, laptop computers, personal digital assistants (PDAs), wearable devices, and so on; the detection terminal may also be integrated in a dedicated recording instrument that comprises a portable camera and a storage device, such as the currently common dashboard camera or live-streaming video camera.
S102, tracking a target position in a human face to obtain image information of the target position.
The target position in the human face is tracked continuously over a period of time, yielding continuously and dynamically changing image frames of the target position; these frames are taken as the image information of the target position. In this embodiment, an "image frame of a position" is to be understood as an image extracted, from the complete frame captured by the image acquisition device, at the point where the target position lies or over a further refined area; such an extracted image is a part of the complete frame.
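The extraction of a partial image around a tracked position can be sketched as follows; the function name, square crop shape and half-size parameter are illustrative assumptions, not details from the patent:

```python
import numpy as np

def crop_target_region(frame, center, half_size):
    """Extract the sub-image around a tracked target position.

    `frame` is a full H x W (x C) image frame; `center` is the (row, col)
    of the tracked feature point; `half_size` is half the side length of
    the square crop, clipped to the frame boundary.
    """
    h, w = frame.shape[:2]
    r, c = center
    top = max(0, r - half_size)
    left = max(0, c - half_size)
    bottom = min(h, r + half_size)
    right = min(w, c + half_size)
    return frame[top:bottom, left:right]

frame = np.zeros((480, 640), dtype=np.uint8)   # a blank VGA frame
patch = crop_target_region(frame, (240, 320), 32)
```

The crop is a view into the complete frame, matching the patent's note that the extracted image is a part of the full image frame.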
Specifically, the target position includes the locations of the facial feature points. The facial feature points are characteristic positions of the human face obtained by a facial-feature-point recognition algorithm, for example: eyebrows, eyes, nose, mouth, face contour, etc. Optionally, the target position further comprises a head position and a gaze position, obtained by a head recognition algorithm and a gaze recognition algorithm respectively.
S103, extracting skin pixels by using the image information of the target position.
The skin pixels refer to pixels in the photographed image, which are in the area where the face is located and are identified as skin areas.
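The patent does not specify how pixels are classified as skin. As one illustrative sketch, a common heuristic thresholds the Cr/Cb chrominance components; the threshold values below are a widely used rule of thumb and are an assumption, not the patent's classifier:

```python
import numpy as np

def skin_mask_ycrcb(rgb):
    """Classify pixels as skin with a simple YCrCb threshold rule.

    Converts RGB to Cr/Cb and keeps pixels with Cr in [133, 173] and
    Cb in [77, 127], a common skin-tone heuristic (illustrative only).
    """
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    cr = 128 + 0.5 * r - 0.4187 * g - 0.0813 * b
    cb = 128 - 0.1687 * r - 0.3313 * g + 0.5 * b
    return (cr >= 133) & (cr <= 173) & (cb >= 77) & (cb <= 127)

img = np.zeros((2, 2, 3), dtype=np.uint8)
img[0, 0] = (200, 140, 120)   # one skin-like tone among black pixels
mask = skin_mask_ycrcb(img)
```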
S104, acquiring pulse rate related characteristic signals by using the extracted skin pixels, acquiring pulse rate time sequences according to the continuously generated pulse rate related characteristic signals, and outputting measurement results.
The pulse rate time series records the continuously obtained pulse rate values, and the series itself can be output as a measurement result. Further data processing can also be performed on the series to obtain higher-level, user-friendly results, such as textual information like "X% of the maximum pulse rate" or "rapid heartbeat", which are convenient for the user to check.
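Rendering the time series as user-facing text might look like the sketch below. The 220 minus age estimate of maximum pulse rate and the 100 bpm "rapid" threshold are common conventions assumed here for illustration; the patent only mentions outputs such as "X% of the maximum pulse rate":

```python
def pulse_summary(series_bpm, age):
    """Render the latest value of a pulse-rate time series as text.

    Uses the common 220 - age maximum-heart-rate estimate (an
    assumption) and flags values above 100 bpm as rapid heartbeat.
    """
    current = series_bpm[-1]
    max_rate = 220 - age
    pct = round(100.0 * current / max_rate)
    label = "rapid heartbeat" if current > 100 else "normal"
    return f"{current:.0f} bpm ({pct}% of maximum pulse rate, {label})"

msg = pulse_summary([72.0, 75.0, 110.0], age=30)
```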
In addition, the specific form of the output measurement result in this embodiment is not limited, and may depend on specific application scenarios, for example: the physiological data can be directly output to a screen of the intelligent terminal of the user or output to a cloud server, and the physiological data is recorded by the cloud server as physiological data of the user.
In the prior art, a sensor (an electrode or a PPG light sensor) must be arranged on the skin surface of the subject; that is, the measuring instrument needs to be very close to, or in contact with, the subject. In this embodiment, by contrast, the pulse rate of the measured subject is obtained by analyzing the pixels of the facial image based on camera face recognition: no other auxiliary hardware is needed, the subject does not need to wear any body-contact sensor, and non-contact remote measurement of the human vital-sign signal (pulse rate) is realized. Furthermore, since no body-surface sensor is required and pulse rate measurement depends on the camera as the only signal acquisition hardware, the scheme of this embodiment is also suitable for measuring the pulse rates of multiple people simultaneously and can be deployed on existing video surveillance systems, saving hardware construction costs.
Further, a color camera (RGB camera) is further installed in the image capturing device, and the captured image information includes: a color frame; and/or, a Near Infrared (NIR) camera is installed in the image acquisition device, and the acquired image information comprises: near infrared frames.
From a system implementation perspective, the pulse rate measurement system can be deployed in several ways. The deployment architectures differ mainly in whether the pulse rate measurement algorithm runs on the detection-terminal side or on the cloud side, and the different deployment modes support different application scenarios:
architecture embodiment one
As shown in fig. 2b, the image capturing device is a camera, and the captured uncompressed video streams of the subjects are transmitted to the cloud server over a network (the data interaction may be based on network transmission protocols such as RTSP (Real Time Streaming Protocol, RFC 2326, an application-layer protocol in the TCP/IP suite) or RTMP (Real Time Messaging Protocol)); the cloud server runs the pulse rate measurement algorithm to detect the pulse rate. The complete pulse rate measurement engine is deployed on the cloud server, i.e.: recognizing a human face from the image and extracting facial feature points; tracking a target position in the face to obtain image information of the target position; extracting skin pixels using that image information; and obtaining pulse-rate-related characteristic signals from the extracted skin pixels. The complete execution flow of deriving the pulse rate time series from the continuously generated signals runs on the cloud server, and the detection terminal only needs to wait for the measurement result output by the cloud server.
Architecture implementation II
The cloud server receives the measurement results sent by the detection terminals, arranges them in chronological order and records them. The measurement results sent by a detection terminal are calculated by the terminal itself.
As shown in fig. 2c, in the second architecture embodiment the complete pulse rate measurement engine is deployed on the detection terminal: the pulse rate measurement algorithm runs entirely on the detection terminal itself, such as a smart phone, and the pulse rate is calculated locally. The method comprises: recognizing a human face from the image and extracting facial feature points; tracking a target position in the face to obtain image information of the target position; extracting skin pixels using that image information; obtaining pulse-rate-related characteristic signals from the extracted skin pixels; and deriving the pulse rate time series from the continuously generated signals, all on the detection terminal. The detection terminal outputs the measurement result through its own output device, for example displaying it on a screen or reading it aloud through a loudspeaker, while the cloud server records the measurement results uploaded by the detection terminal, arranging them in chronological order.
Architecture embodiment III
The detection terminal acquires the image information acquired by the image acquisition equipment, and identifies a human face from the image and extracts facial feature points; the detection terminal tracks a target position in a human face to obtain image information of the target position, and then extracts skin pixels by utilizing the image information of the target position, wherein the target position comprises: the facial feature points are located; and the detection terminal sends the extracted skin pixels to the cloud server.
The cloud server acquires pulse rate related characteristic signals by using the extracted skin pixels, obtains pulse rate time sequences according to the continuously generated pulse rate related characteristic signals, and outputs measurement results to the detection terminal.
As shown in fig. 2d, in a specific application, the pulse rate measurement algorithm may be divided according to the actual network bandwidth condition, so that the processes of face recognition, ROI calculation and pulse rate feature extraction are directly operated on the front-end device, and the collected and preprocessed data is transmitted to the cloud in a lossless compression manner, and the data is decompressed at the cloud and then the pulse rate calculation model is operated, so as to obtain the final measurement result. Therefore, the real-time performance and the accuracy of front-end image information acquisition and tracking are ensured, and the calculated amount of the detection terminal is reduced.
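The lossless upload of preprocessed per-grid features to the cloud, as described for the split deployment above, could be sketched with zlib-backed NumPy archives; the function names and feature layout are illustrative assumptions:

```python
import io
import numpy as np

def pack_features(means, counts):
    """Losslessly compress per-grid features for upload to the cloud.

    np.savez_compressed (zlib/deflate) keeps the preprocessed data
    bit-exact, matching the lossless-compression requirement.
    """
    buf = io.BytesIO()
    np.savez_compressed(buf,
                        means=np.asarray(means),
                        counts=np.asarray(counts))
    return buf.getvalue()

def unpack_features(blob):
    """Decompress the uploaded blob back into feature arrays."""
    data = np.load(io.BytesIO(blob))
    return data["means"], data["counts"]

blob = pack_features([100.0] * 16, [256] * 16)   # 16 ROI grids, assumed
means, counts = unpack_features(blob)
```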
The present embodiment provides a system for detecting a human body characteristic signal, including:
and extracting pulse rate related characteristic signals according to the extracted skin pixels.
Sample points are selected from the extracted pulse rate related characteristic signals, and signal fusion is carried out according to the selected sample points.
Specifically, since the real pulse rate signal is weak, a single ROI grid is not sufficient to extract a high-quality pulse rate signal; the pulse rate characteristic signals generated by the grids with good signal quality therefore need to be combined by weighted addition to generate a pulse rate signal with a higher signal-to-noise ratio. The calculation formula can be expressed as follows:
S_final = p(S_1, S_2, ..., S_N)

wherein S_final represents the fused signal, S_i represents the signal of a single ROI grid, p() represents the fusion function (which may employ an existing signal-processing fusion algorithm), and N is an integer greater than 1.
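The patent leaves the fusion function p() open. As one concrete sketch, p() could be an SNR-weighted average of the per-grid signals; the weighting scheme is an assumption, chosen because it gives more weight to grids with better signal quality:

```python
import numpy as np

def fuse_signals(signals, snrs):
    """Fuse per-grid pulse signals S_i into S_final = p(S_1..S_N).

    Here p() is an SNR-weighted average: each grid's signal is
    weighted by its signal-to-noise ratio, normalized to sum to 1.
    """
    signals = np.asarray(signals, dtype=np.float64)   # shape (N, T)
    w = np.asarray(snrs, dtype=np.float64)
    w = w / w.sum()                                   # normalize weights
    return (w[:, None] * signals).sum(axis=0)

s1 = np.array([1.0, 2.0, 3.0])
s2 = np.array([3.0, 2.0, 1.0])
fused = fuse_signals([s1, s2], snrs=[3.0, 1.0])   # s1 weighted 3:1
```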
And calculating the pulse rate according to the fused pulse rate related characteristic signals.
The pulse rate is then solved from the generated pulse rate signal: specifically, in the corresponding power spectral density (PSD), the peak frequency f_peak is located, and the corresponding pulse rate is f_peak × 60, physically interpreted as the number of beats per minute. Furthermore, depending on the application scenario, in applications requiring continuous real-time tracking of the pulse rate, secondary processing with a smoothing-filter technique based on the historical time series and trend of the pulse rate can promptly correct isolated points where the pulse rate calculation is wrong.
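The f_peak × 60 step can be sketched with a plain FFT periodogram standing in for the PSD; restricting the search to a plausible pulse band (0.7 to 4 Hz, i.e. 42 to 240 bpm) and the 30 fps frame rate are illustrative assumptions:

```python
import numpy as np

def pulse_rate_from_signal(signal, fs):
    """Estimate pulse rate (beats/min) as 60 * f_peak of the PSD.

    Computes an FFT periodogram of the fused pulse signal and takes
    the peak frequency within the 0.7-4 Hz pulse band.
    """
    signal = np.asarray(signal, dtype=np.float64)
    signal = signal - signal.mean()                  # remove DC
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)
    psd = np.abs(np.fft.rfft(signal)) ** 2
    band = (freqs >= 0.7) & (freqs <= 4.0)
    f_peak = freqs[band][np.argmax(psd[band])]
    return 60.0 * f_peak

fs = 30.0                                 # 30 fps camera, assumed
t = np.arange(0, 10, 1.0 / fs)            # 10 s analysis window
sig = np.sin(2 * np.pi * 1.2 * t)         # synthetic 1.2 Hz pulse
bpm = pulse_rate_from_signal(sig, fs)     # expect about 72 bpm
```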
In this embodiment, there is also provided a specific manner of extracting skin pixels, including: a region of interest (ROI) is acquired from image information of the target location. From among the pixels within the ROI, the skin pixels are identified.
Identifying the skin pixels from among the pixels within the ROI comprises: locating each ROI grid and identifying the skin pixels within it, then obtaining the average pixel intensity of the skin pixels and the number of skin pixels in each ROI grid. Specifically, after the skin pixels are confirmed in each ROI grid, characteristic-signal extraction is performed grid by grid, recording the average pixel intensity of the skin pixels and the number of skin pixels respectively.
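A minimal sketch of this per-grid bookkeeping follows; the grid dimensions are not fixed by the patent, so the grid parameter here is an illustrative assumption:

```python
import numpy as np

def grid_features(intensity, skin_mask, grid=(4, 4)):
    """Per-ROI-grid mean skin-pixel intensity and skin-pixel count.

    `intensity` is an H x W channel image over the ROI, `skin_mask`
    the boolean skin map; the ROI is split into grid[0] x grid[1]
    cells and each cell's skin pixels are summarized.
    """
    h, w = intensity.shape
    gh, gw = h // grid[0], w // grid[1]
    means, counts = [], []
    for i in range(grid[0]):
        for j in range(grid[1]):
            cell = np.s_[i * gh:(i + 1) * gh, j * gw:(j + 1) * gw]
            m = skin_mask[cell]
            counts.append(int(m.sum()))
            means.append(float(intensity[cell][m].mean()) if m.any() else 0.0)
    return means, counts

img = np.full((8, 8), 100.0)              # uniform intensity ROI
mask = np.ones((8, 8), dtype=bool)        # every pixel is skin
means, counts = grid_features(img, mask, grid=(2, 2))
```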
Specifically, in the process of extracting the pulse rate related characteristic signal:
and when the image information of the target position is a near infrared frame, resampling, normalizing and filtering are sequentially carried out on the average pixel intensity of the skin pixels in the ROI grid.
When the image information of the target position is a color frame, the RGB three-color signals are combined to generate a chrominance signal, which is then sequentially mixed, resampled, detrended and filtered.
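The near-infrared branch (resample, then normalize, then filter) can be sketched as below. The band edges (0.7 to 4 Hz), the 30 fps target rate, and the moving-average filters standing in for a proper band-pass are all illustrative assumptions:

```python
import numpy as np

def process_nir(intensity, t_raw, fs=30.0):
    """Resample -> normalize -> band-pass: the NIR-frame pipeline.

    `t_raw` may be irregular (dropped frames); the trace is resampled
    onto a uniform grid, z-score normalized, then crudely band-limited
    by subtracting a slow moving average and smoothing with a fast one.
    """
    t_uniform = np.arange(t_raw[0], t_raw[-1], 1.0 / fs)
    x = np.interp(t_uniform, t_raw, intensity)         # resample
    x = (x - x.mean()) / (x.std() + 1e-12)             # normalize
    k_slow = max(1, int(fs / 0.7))                     # high-pass stand-in
    trend = np.convolve(x, np.ones(k_slow) / k_slow, mode="same")
    x = x - trend
    k_fast = max(1, int(fs / 4.0))                     # low-pass stand-in
    x = np.convolve(x, np.ones(k_fast) / k_fast, mode="same")
    return t_uniform, x

t_raw = np.sort(np.random.default_rng(0).uniform(0, 10, 200))
sig = np.sin(2 * np.pi * 1.2 * t_raw) + 0.1           # synthetic NIR trace
t_u, y = process_nir(sig, t_raw)
```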
Specifically, during pulse-rate characteristic-signal processing, further signal processing is applied to the average pixel intensity and the skin-pixel count acquired within each ROI grid. For the average pixel intensity, in the case of near-infrared frames the system sequentially resamples, normalizes and filters. For color images one additional step is added: the RGB three-color signals are combined to generate a chrominance signal S = g(R, G, B), where the function g() performs the mixing, resampling, detrending and filtering of the RGB signals. For the skin-pixel count, the system generates count-variation statistics, i.e. computes its first derivative and the corresponding interquartile range (IQR).
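The patent leaves g() unspecified. One widely used choice for combining RGB traces into a chrominance signal is the CHROM combination (de Haan and Jeanne, 2013), sketched here purely as an example of what g() might look like:

```python
import numpy as np

def chrominance_signal(r, g, b):
    """Combine RGB traces into a chrominance signal S = g(R, G, B).

    Follows the CHROM combination as an illustrative example:
    X = 3R - 2G, Y = 1.5R + G - 1.5B, S = X - (std X / std Y) * Y,
    after normalizing each trace by its mean.
    """
    r, g, b = (np.asarray(c, dtype=np.float64) for c in (r, g, b))
    r, g, b = r / r.mean(), g / g.mean(), b / b.mean()  # remove level
    x = 3.0 * r - 2.0 * g
    y = 1.5 * r + g - 1.5 * b
    alpha = x.std() / (y.std() + 1e-12)
    return x - alpha * y

t = np.linspace(0, 10, 300)
pulse = 0.01 * np.sin(2 * np.pi * 1.2 * t)   # tiny pulsatile component
s = chrominance_signal(100 + pulse, 80 + pulse, 60 + pulse)
```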
Further, the method also comprises: counting the variation in the number of skin pixels in each ROI grid to obtain the interquartile range (IQR) of each ROI grid.
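Computing the IQR of the first derivative of a grid's skin-pixel-count series, as described, is a few lines; a large IQR indicates that the count fluctuates, hinting at motion or occlusion noise in that grid:

```python
import numpy as np

def count_iqr(pixel_counts):
    """IQR of the first derivative of a grid's skin-pixel-count series.

    np.diff gives the frame-to-frame count change; the interquartile
    range of those changes measures how unstable the grid is.
    """
    d = np.diff(np.asarray(pixel_counts, dtype=np.float64))
    q75, q25 = np.percentile(d, [75, 25])
    return q75 - q25

stable = count_iqr([100, 100, 101, 100, 100, 101])  # steady grid
jumpy = count_iqr([100, 60, 130, 40, 150, 20])      # noisy grid
```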
Thus, selecting sample points from the extracted pulse rate related feature signals comprises:
and acquiring the IQR and the signal-to-noise ratio of each ROI grid, and eliminating the ROI grids with the IQR higher than the maximum IQR threshold and the signal-to-noise ratio lower than the minimum signal-to-noise ratio threshold.
In this embodiment, the pulse rate characteristic signals of some ROI grids are disturbed by movement of the subject, facial expressions, illumination changes and the like, so the system must automatically identify and reject them. This can be done by the automatic sample selection process shown in fig. 6. The ROI samples that are finally rejected automatically are regarded by the system as noisy and unsuitable for extracting the pulse rate signal; the remaining subset of samples is used for the subsequent signal fusion. Specifically, a clustering operation is performed on the IQR and signal-to-noise ratio of each grid, and grids with high IQR and low signal-to-noise ratio are automatically removed.
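A simple threshold-based version of the grid rejection can be sketched as follows; the patent also mentions clustering, which is omitted here, and the threshold values are illustrative assumptions:

```python
import numpy as np

def select_grids(iqrs, snrs, iqr_max, snr_min):
    """Keep ROI grids whose IQR and SNR pass the quality thresholds.

    Grids with IQR above iqr_max or SNR below snr_min are rejected
    as noisy; returns the indices of the surviving grids.
    """
    iqrs = np.asarray(iqrs, dtype=np.float64)
    snrs = np.asarray(snrs, dtype=np.float64)
    keep = (iqrs <= iqr_max) & (snrs >= snr_min)
    return np.flatnonzero(keep)

kept = select_grids(iqrs=[1.0, 160.0, 2.0], snrs=[5.0, 4.0, 0.5],
                    iqr_max=10.0, snr_min=2.0)   # thresholds assumed
```

The surviving subset then feeds the signal-fusion step.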
In the embodiment, the pulse rate of the measured object is measured by analyzing the pixels of the facial image based on the face recognition of the camera, other auxiliary hardware is not needed, the measured person does not need to carry any close-fitting sensor, and the remote acquisition and processing of the human body sign signals (pulse rate) are realized, so that the non-contact pulse rate measurement is realized.
In this embodiment, before the pulse-rate-related characteristic signal is extracted from the skin pixels, the target position from which the skin pixels are extracted is determined by the following procedure:
a face is recognized from the photographed image, and facial feature points are extracted.
In this embodiment, the existing face recognition technology may be used for the recognition of the face region. The scheme of the embodiment focuses on further image feature extraction and analysis on the identified face region. Cameras for capturing face images may be of various types and may be integrated in various detection terminals, for example:
the image acquisition device may also be a camera integrated on the detection terminal, such as: cameras on smartphones (current smartphones have realized multi-camera shooting and have applied pure color cameras (RGB cameras), pure Near Infrared (NIR) cameras, wide angle cameras, depth cameras, etc.).
The detection terminal may be implemented as a standalone device, or may be integrated with the personal terminals of various users, including smart phones, tablet computers, laptop computers, personal digital assistants (PDAs), wearable devices, and so on; the detection terminal may also be integrated in a dedicated recording instrument that comprises a portable camera and a storage device, such as the currently common dashboard camera or live-streaming video camera.
The face recognition module used in this embodiment may in principle use any mainstream face recognition engine. For example: faces (including multiple faces) can be located and marked with the Viola-Jones algorithm, facial feature points can be located with the DRMF algorithm, and real-time tracking of the feature points can be achieved by combining the KLT (Kanade-Lucas-Tomasi) tracking method with the MSAC algorithm.
Tracking the target position in the face to obtain the image information of the target position.
The target position includes: the locations of the facial feature points.
The target position in the face is tracked continuously for a period of time, yielding continuously and dynamically changing image frames of the target position; these frames are taken as the image information of the target position. In this embodiment, an "image frame of a position" is understood as the image extracted, from a complete frame captured by the image acquisition device, at the point or refined area where the target position lies; such an extracted image is a part of the complete frame.
Specifically, the target position includes the locations of the facial feature points, i.e. the characteristic parts of the face found by a facial feature point recognition algorithm, for example: eyebrows, eyes, nose, mouth, face contour, etc. Optionally, the target position further includes a head position and a gaze position, obtained by a head recognition algorithm and a gaze recognition algorithm respectively.
Specifically, a skin pixel is a pixel in the captured image that lies within the face region and is identified as skin.
Then, pulse-rate-related feature signals are acquired from the extracted skin pixels, a pulse rate time series is obtained from the continuously generated feature signals, and the measurement result is output.
The pulse rate time series records the continuously obtained pulse rate values and can itself be output as the measurement result. Further data processing can also be performed on the series to obtain more intuitive results, such as textual information like "X% of maximum pulse rate" or "rapid heartbeat", which is convenient for the user to review.
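As one concrete example of an "X% of maximum pulse rate" readout, the widely used age-predicted maximum heart rate (220 − age) could serve as the denominator; the embodiment does not specify a formula, so the helper name and the formula below are illustrative assumptions:

```python
def pct_of_max_pulse_rate(bpm, age):
    """Express a measured pulse rate as a percentage of the age-predicted
    maximum heart rate (220 - age) -- an illustrative formula, not one the
    embodiment specifies."""
    return 100.0 * bpm / (220.0 - age)

pct = pct_of_max_pulse_rate(95.0, 30)   # 95 BPM for a 30-year-old -> 50.0
```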
In addition, the specific form of the output measurement result is not limited in this embodiment and may depend on the application scenario; for example, the result can be displayed directly on the screen of the user's smart terminal, or uploaded to a cloud server and recorded there as the user's physiological data.
In the prior art, a sensor (electrode or PPG light sensor) must be attached to the skin surface of the tested person, i.e. the measuring instrument must be in, or very close to, contact with the subject. In this embodiment, the pulse rate is measured by analyzing the pixels of the facial image based on camera face recognition; no auxiliary hardware is needed, the subject wears no body-contact sensor, and non-contact remote measurement of the human body sign signal (pulse rate) is realized. Furthermore, since no body-surface sensor is required and the camera is the only signal acquisition hardware, the scheme is also suitable for measuring the pulse rates of multiple people simultaneously and can be deployed on existing video surveillance systems, saving hardware construction cost.
In this embodiment, the specific way of extracting the facial feature points may include:
And locating the face position in the acquired image information. And then, acquiring the position of the facial feature point according to the position of the face. For example, taking a color camera (RGB camera) as an example, the process of performing face recognition and facial feature points through the RGB camera includes:
color frames (RGB) photographed by the camera are recorded.
Optionally, the color frames may be pre-processed to improve image quality, e.g. white balance, exposure compensation, etc. Many cameras already perform such processing internally in hardware, so this step is optional.
A face recognition algorithm is used to locate the face in the image and mark it with a bounding box.
A facial feature point recognition algorithm is used to locate the characteristic parts of the face: eyebrows, eyes, nose, mouth, face contour, etc.
The facial feature points are dynamically tracked in real time, and the head position and gaze position are estimated at the same time. Head position and gaze position estimation are optional modules.
When the embodiment is applied to different camera hardware devices, the embodiment can be decomposed into 3 kinds of sub-schemes according to the camera type:
First, pulse rate measurement based on a color camera (RGB camera) or near-infrared camera, as shown in fig. 3:
Image information is collected by a color (RGB) camera; the collected image information includes color frames.
Alternatively, image information is acquired by a near-infrared (NIR) camera; the collected image information includes near-infrared frames.
The extracting skin pixels using the image information of the target location includes: a region of interest (ROI) is acquired from the image information of the target location. From among the pixels within the ROI, the skin pixels are identified. Specifically, the ROI processing method adopted in this embodiment approximately includes:
a region of interest (ROI) is calculated in real time according to the face position, the face feature point position and other auxiliary information such as the head position, the face dynamic tracking displacement matrix and the like.
Whether the pixels in the ROI are human skin pixels is identified, and the corresponding non-skin pixels (glasses, hair, etc.) are eliminated.
The ROI calculation also includes background extraction. Background information helps improve the quality of the pulse-rate feature signal; this step may be optional depending on the business scenario configuration. In this embodiment, for non-depth cameras, background extraction uses a 2-D image algorithm such as Distance Regularized Level Set Evolution (DRLSE); for depth cameras, the foreground is removed directly using the depth frame to obtain the background image.
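For the depth-camera branch, removing the foreground by the depth frame can be sketched as a simple threshold on distance; the cutoff value and function name are illustrative assumptions, not taken from the embodiment:

```python
import numpy as np

def background_mask_from_depth(depth_mm, fg_max_mm=1200):
    """Depth-camera background extraction: every pixel farther than the
    subject (depth > fg_max_mm) is labeled background; the rest is the
    foreground (the subject) and is excluded from the background image."""
    return np.asarray(depth_mm) > fg_max_mm

depth = np.array([[800,  900, 3000],
                  [850, 2900, 3100]])   # millimetres: face ~0.85 m, wall ~3 m
bg = background_mask_from_depth(depth)  # True where the wall is
```

A real pipeline would pick the cutoff from the tracked face's depth rather than a fixed constant.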
In this embodiment, a color (RGB) camera captures color frames, usually as signals of several color channels, e.g. 3 channels (red, green, blue). Each channel is a 2-D pixel matrix of width × height, such as 1920 × 1080, and each pixel value is typically in the range 0-255 (8-bit precision).
An infrared frame differs from a color frame in that it has only a single-channel pixel matrix, again with pixel values in the range 0-255 (usually 8-bit). The logic for processing color frames from the RGB camera and infrared frames from the NIR camera is therefore basically the same in this embodiment; the difference lies in the algorithms (calculation models) used in the skin recognition and pulse-rate feature processing stages.
In this embodiment, the ROI is generally calculated as follows: the face bounding box marked by the face recognition module is cut into small rectangular grids 20 pixels long and 20 pixels wide (the grid size is configurable). The ROI covers the forehead and cheek regions. When a depth camera is used, the grid size is calculated automatically from the relation between the depth of field and the area of the face bounding box so as to reach a specified number of grids. ROI grid tracking is computed with the transformation matrix from facial-feature-point tracking, i.e. new ROI vector = A × old ROI vector, where "×" is matrix multiplication and A is the transformation matrix.
The ROI grids are calculated as ROI_{1…N} = f(bbox, w, h, landmark_{1…M}, d), where bbox is the position of the face bounding box marked by the face recognition module, w and h are the width and height of a grid in pixels, d is the depth frame, landmark_{1…M} are the positions of the facial feature points, and M and N are positive integers greater than 1.
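The grid cutting and the "new ROI vector = A × old ROI vector" tracking step can be sketched as follows; the bounding-box format and the particular transformation matrix are illustrative assumptions:

```python
import numpy as np

def roi_grids(bbox, w=20, h=20):
    """Cut a face bounding box (x, y, box_w, box_h) into w x h pixel cells,
    mirroring the '20 x 20 pixel rectangular small grids' step."""
    x0, y0, bw, bh = bbox
    return [(gx, gy, w, h)
            for gy in range(y0, y0 + bh - h + 1, h)
            for gx in range(x0, x0 + bw - w + 1, w)]

def track_grids(grids, A):
    """Propagate each grid origin with the 2x3 affine matrix A obtained from
    feature-point tracking: new position = A @ [x, y, 1]."""
    out = []
    for (x, y, w, h) in grids:
        nx, ny = A @ np.array([x, y, 1.0])
        out.append((int(round(nx)), int(round(ny)), w, h))
    return out

grids = roi_grids((100, 80, 60, 40))   # 3 x 2 = 6 cells of 20 x 20 pixels
A = np.array([[1.0, 0.0, 5.0],
              [0.0, 1.0, -3.0]])       # pure translation, for illustration only
moved = track_grids(grids, A)
```

A real A would come from the MSAC-fitted feature-point correspondences; a translation is used here only to keep the example self-checking.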
After the ROI grids are obtained, the system performs skin recognition, i.e. decides whether the pixels in each grid are skin. For the color camera, skin recognition is implemented in the RGB and YCbCr color spaces; for the near-infrared camera, skin recognition combines a Bayesian model with a distance model (distance-based prior probability) over the statistics of skin pixels in the gray-level image; for the depth camera, skin recognition can be computed independently from the color frame or the near-infrared frame, and when both exist it can also be obtained indirectly by geometrically mapping the near-infrared frame onto the color frame.
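For the color-camera branch, a minimal sketch of skin recognition in the YCbCr color space might look like the following; the threshold ranges (77 ≤ Cb ≤ 127, 133 ≤ Cr ≤ 173) are commonly cited illustrative values, not thresholds taken from the embodiment:

```python
import numpy as np

def skin_mask_ycbcr(rgb):
    """Return a boolean mask of likely skin pixels for an HxWx3 uint8 RGB
    image, by converting to YCbCr (BT.601 coefficients) and thresholding
    the chroma planes: Cb in [77, 127] and Cr in [133, 173]."""
    r, g, b = (rgb[..., i].astype(np.float64) for i in range(3))
    cb = 128.0 - 0.168736 * r - 0.331264 * g + 0.5 * b
    cr = 128.0 + 0.5 * r - 0.418688 * g - 0.081312 * b
    return (cb >= 77) & (cb <= 127) & (cr >= 133) & (cr <= 173)

img = np.zeros((2, 2, 3), dtype=np.uint8)
img[0, 0] = (200, 140, 120)   # skin-toned pixel
img[1, 1] = (20, 200, 20)     # saturated green: not skin
mask = skin_mask_ycbcr(img)   # True only at [0, 0]
```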
Second, on the basis of pulse rate measurement with a color (RGB) or near-infrared camera, a depth camera is additionally applied. The method further includes acquiring the image information collected by the depth camera, which includes depth frames. The color camera, near-infrared camera and depth camera are mutually independent hardware. The detection terminal can also measure pulse rate from depth frames alone, but the preferred scheme measures from color frames plus depth frames, or from near-infrared frames plus depth frames.
The structured-light parameters collected by the depth camera can be fed into the ROI processing flow. In particular, a structured-light depth camera usually provides color frames, near-infrared frames and depth frames simultaneously. Referring to the above technical scheme, the pulse-rate measurement signal processing flow based on a structured-light depth camera supports the color frame as the main image source with the near-infrared and depth frames assisting, and also supports the near-infrared image as the main source with the color and depth frames assisting. As shown in fig. 4, the structured-light scheme is used in the sub-module algorithms of the pulse-rate calculation kernel (ROI calculation, skin recognition, background extraction, pulse-rate feature processing and signal fusion); using color, near-infrared and depth frame information together yields a result with better interference resistance, improving the accuracy and robustness of the final pulse rate measurement.
The process of extracting skin pixels by using the image information of the target position includes:
and acquiring a region of interest (ROI) according to the image information of the target position and the depth frame acquired by the depth camera. And identifying the skin pixels from the pixels in the ROI by using the depth field frames acquired by the depth camera.
That is, the image acquisition device collects a color frame plus a depth frame, or a near-infrared frame plus a depth frame. Depth frames are added in the ROI calculation and skin recognition stages, while color or near-infrared frames are still applied.
Third, pulse rate measurement based on a dual color camera (RGB cameras) or dual near-infrared cameras, with a depth camera further applied, includes:
acquiring image information collected by the dual color cameras, the image information including: a first color frame and a second color frame;
or, acquiring image information collected by the dual near-infrared cameras, the image information including a first near-infrared frame and a second near-infrared frame.
The technical scheme based on a binocular depth camera or a TOF depth camera is similar to that of a monocular camera; the difference is that the depth camera provides depth information, which is fed into the ROI calculation as shown in fig. 5 to improve the performance of the background extraction and skin recognition algorithms.
The process of extracting skin pixels by using the image information of the target position includes:
and acquiring a region of interest (ROI) according to the image information of the target position and the depth frame acquired by the depth camera. From among the pixels within the ROI, the skin pixels are identified.
The image information of the target position is acquired from the dual color cameras or the dual near-infrared cameras; that is, the image acquisition device collects two color frames plus a depth frame, or two near-infrared frames plus a depth frame. Depth frames are added in the ROI calculation stage, while color or near-infrared frames are still applied.
In this embodiment, the obtaining the pulse rate time sequence by using the continuously generated pulse rate related characteristic signal includes:
s1041, extracting pulse rate related characteristic signals according to the time dimension according to the extracted skin pixels.
Specifically, the pulse-rate-related feature signal may be extracted in the time dimension from the skin pixels in the ROI, and a series of signal processing steps, including resampling, noise reduction, filtering and signal synthesis, is performed on the extracted signal. Furthermore, the background-related feature signal can be exploited in this processing, which helps improve the processing quality; background-based feature extraction and processing is an optional auxiliary module.
S1042, selecting sample points from the extracted pulse rate related characteristic signals, and performing signal fusion according to the selected sample points.
Sample points are selected automatically according to the signal quality of each feature, and signal fusion is performed on the selected sample points to improve signal quality.
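One plausible realization of quality-driven signal fusion is an SNR-weighted average of the per-grid feature signals; the embodiment does not specify its fusion rule, so this is a sketch under that assumption:

```python
import numpy as np

def fuse_signals(signals, snr):
    """Fuse per-grid pulse feature signals into one trace by SNR-weighted
    averaging, with the weights normalized to sum to 1."""
    signals = np.asarray(signals, dtype=float)   # shape: (n_grids, n_samples)
    w = np.asarray(snr, dtype=float)
    return (w / w.sum()) @ signals               # weighted combination per sample

# two grids, two samples each; the higher-SNR grid dominates the fusion
fused = fuse_signals([[1.0, 2.0], [3.0, 6.0]], snr=[1.0, 3.0])
```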
S1043, performing pulse rate calculation according to the fused pulse rate related characteristic signals to obtain a continuously generated pulse rate time sequence.
Pulse rate calculation is performed on the fused signal to obtain a continuously generated pulse rate time series, which is then further smoothed and noise-corrected.
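The final step, turning a fused feature signal into a pulse rate value, can be sketched as a band-limited spectral peak search; the mean-removal detrending and the band limits are illustrative assumptions rather than the embodiment's exact algorithm:

```python
import numpy as np

def pulse_rate_bpm(signal, fs, lo=0.7, hi=4.0):
    """Estimate pulse rate in BPM as the dominant FFT frequency of the fused
    feature signal inside a physiological band (lo..hi Hz = 42..240 BPM)."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()                              # crude detrending
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum = np.abs(np.fft.rfft(x))
    band = (freqs >= lo) & (freqs <= hi)          # keep plausible heart rates only
    return freqs[band][np.argmax(spectrum[band])] * 60.0

fs = 30.0                                         # 30 fps camera
t = np.arange(0, 10, 1.0 / fs)                    # 10 s analysis window
sig = 0.5 + 0.02 * np.sin(2 * np.pi * 1.2 * t)    # 1.2 Hz pulse wave ~ 72 BPM
bpm = pulse_rate_bpm(sig, fs)
```

Repeating this over a sliding window yields the continuously generated pulse rate time series the text describes.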
The camera-based, non-contact, remote, multi-person pulse rate measurement system provided by this embodiment supports not only traditional monocular cameras (RGB and NIR) but also all mainstream depth camera structures (binocular, TOF and structured light). It expands the business scenarios of pulse rate measurement and is applicable to many industries such as medical care, security, traffic and finance.
This embodiment also provides a device for detecting human body characteristic signals. Its functional modules can be written as computer programs and run on a detection terminal; alternatively, the camera can transmit the captured image data to a cloud server, which analyzes and processes them directly. That is, the device can be realized as an online program in which the camera serves only as the front-end capture tool and the method flow of this embodiment is executed on the cloud server; this front-end-camera plus cloud-processing mode is gradually maturing under the current 5G technical framework. The device, as shown in fig. 6, comprises:
The image processing module is used for extracting pulse rate related characteristic signals according to the extracted skin pixels;
the analysis module is used for selecting sample points from the extracted pulse rate related characteristic signals and carrying out signal fusion according to the selected sample points;
and the calculation module is used for calculating the pulse rate according to the fused pulse rate related characteristic signals.
The image processing module is specifically used for acquiring a region of interest (ROI) according to the image information of the target position; from among the pixels within the ROI, the skin pixels are identified.
The image processing module is further used to locate each ROI grid and confirm the skin pixels within it, and to obtain the average skin-pixel intensity and the number of skin pixels in each ROI grid. In extracting the pulse-rate-related feature signal: when the image information of the target position is a near-infrared frame, the average skin-pixel intensity in each ROI grid is resampled, normalized and filtered in sequence; when it is a color frame, the three RGB signals are combined into a chrominance signal, which is then mixed, resampled, detrended and filtered in sequence. The variation of the number of skin pixels in each ROI grid is also counted to obtain each grid's interquartile range (IQR).
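The description does not specify how the three RGB signals are combined into a chrominance signal; one published construction that fits the description (the CHROM method of de Haan and Jeanne) is sketched below as a stand-in:

```python
import numpy as np

def chrom_signal(r, g, b):
    """Combine per-frame mean R, G, B traces into a pulse chrominance signal
    following the CHROM construction: X = 3R - 2G, Y = 1.5R + G - 1.5B,
    S = X - (std X / std Y) * Y, after normalizing each channel by its mean."""
    rn, gn, bn = (np.asarray(c, dtype=float) / np.mean(c) for c in (r, g, b))
    xs = 3.0 * rn - 2.0 * gn
    ys = 1.5 * rn + gn - 1.5 * bn
    return xs - (np.std(xs) / np.std(ys)) * ys

t = np.arange(0, 5, 1.0 / 30)                 # 5 s at 30 fps
pulse = 0.01 * np.sin(2 * np.pi * 1.2 * t)    # tiny pulsatile component
r = 90.0 * (1 + 0.6 * pulse)                  # the three channels carry the
g = 70.0 * (1 + 1.0 * pulse)                  # pulse with different strengths,
b = 60.0 * (1 + 0.4 * pulse)                  # as in real skin reflectance
s = chrom_signal(r, g, b)
```

The normalization and the X − αY combination suppress common-mode illumination changes while keeping the pulsatile component.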
The analysis module is specifically configured to obtain an IQR and a signal-to-noise ratio of each ROI grid, and reject ROI grids with an IQR higher than a maximum IQR threshold and a signal-to-noise ratio lower than a minimum signal-to-noise ratio threshold.
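A minimal sketch of this grid-rejection rule follows; the threshold values are illustrative, and since the conjunction in the text is ambiguous, this version rejects a grid as soon as either quality test fails:

```python
import numpy as np

def select_grids(iqr, snr, iqr_max=15.0, snr_min=2.0):
    """Return the indices of ROI grids that pass both quality checks:
    skin-pixel-count IQR at or below iqr_max AND SNR at or above snr_min."""
    keep = (np.asarray(iqr) <= iqr_max) & (np.asarray(snr) >= snr_min)
    return np.flatnonzero(keep)

# four grids: grid 1 fails on IQR, grid 2 fails on SNR, grid 3 fails on both
kept = select_grids(iqr=[3.0, 20.0, 4.0, 25.0], snr=[5.0, 6.0, 1.0, 0.5])
```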
Further, the method further comprises the following steps:
the preprocessing module is used for recognizing a human face from the shot image and extracting facial feature points;
the positioning module is used for tracking a target position in the face to obtain the image information of the target position, the target position including: the locations of the facial feature points.
In the prior art, a sensor (electrode or PPG light sensor) must be attached to the skin surface of the tested person, i.e. the measuring instrument must be in, or very close to, contact with the subject. In this embodiment, the pulse rate is measured by analyzing the pixels of the facial image based on camera face recognition; no auxiliary hardware is needed, the subject wears no body-contact sensor, and non-contact remote measurement of the human body sign signal (pulse rate) is realized. Furthermore, since no body-surface sensor is required and the camera is the only signal acquisition hardware, the scheme is also suitable for measuring the pulse rates of multiple people simultaneously and can be deployed on existing video surveillance systems, saving hardware construction cost.
In this specification, each embodiment is described in a progressive manner, and identical and similar parts of each embodiment are all referred to each other, and each embodiment mainly describes differences from other embodiments. In particular, for the apparatus embodiments, since they are substantially similar to the method embodiments, the description is relatively simple, and reference is made to the description of the method embodiments for relevant points. The foregoing is merely illustrative of the present invention, and the present invention is not limited thereto, and any changes or substitutions easily contemplated by those skilled in the art within the scope of the present invention should be included in the present invention. Therefore, the protection scope of the present invention should be subject to the protection scope of the claims.

Claims (6)

1. A system for detecting a characteristic signal of a person, comprising:
the cloud server receives image information acquired by image acquisition equipment, identifies a human face from an image and extracts facial feature points, the image acquisition equipment is provided with a depth camera, and the acquired image information comprises: depth of field frames; the color camera is installed in the image acquisition equipment, and the acquired image information comprises: a color frame; and/or, the near infrared camera is installed in the image acquisition equipment, and the acquired image information comprises: near infrared frames;
Tracking a target position in a human face to obtain image information of the target position, wherein the target position comprises: the facial feature points are located;
extracting skin pixels using the image information of the target location;
acquiring pulse rate related characteristic signals by using the extracted skin pixels, acquiring pulse rate time sequences according to the continuously generated pulse rate related characteristic signals, and outputting measurement results;
the extracting skin pixels using the image information of the target location includes: acquiring a region of interest (ROI) according to the image information of the target position; identifying the skin pixels from among pixels within the ROI;
the extracting skin pixels using the image information of the target location includes: acquiring a region of interest (ROI) according to the image information of the target position and a depth frame acquired by the depth camera; identifying the skin pixels from the pixels in the ROI by using depth frames acquired by the depth camera; the method comprises the following steps: cutting the human face rectangular frame marked by the human face recognition module into rectangular small grids with the length of 20 pixels and the width of 20 pixels, wherein the region of the ROI is the forehead and cheek part; when the depth camera is used, the size of the ROI grids is automatically calculated according to the area relation between the depth of field and the rectangular frame of the face so as to reach the designated grid number; after the ROI grids are obtained, the system performs skin recognition calculation, namely judging whether the pixel points in each grid are skin or not, wherein the ROI grids are calculated as ROI_{1…N} = f(bbox, w, h, landmark_{1…M}, d), wherein bbox represents the face rectangular frame azimuth marked by the face recognition module, w and h represent the width and the height of the grid respectively, the unit is a pixel, d represents a depth frame, landmark_{1…M} represents the positions of the facial feature points, and M and N are positive integers larger than 1.
2. The system of claim 1, wherein the cloud server receives image information acquired by an image acquisition device, comprising:
and the cloud server receives the uncompressed video stream sent by the camera equipment.
3. The system of claim 1, further comprising:
the image acquisition equipment is a camera integrated on the detection terminal, the cloud server receives measurement results sent by the detection terminal, the measurement results sent by the detection terminal are arranged according to time sequence and recorded, and the measurement results sent by the detection terminal are calculated by the detection terminal.
4. A system according to claim 3, further comprising:
the detection terminal acquires the image information acquired by the image acquisition equipment, and identifies a human face from the image and extracts facial feature points;
the detection terminal tracks a target position in a human face to obtain image information of the target position, and then extracts skin pixels by utilizing the image information of the target position, wherein the target position comprises: the facial feature points are located;
The detection terminal sends the extracted skin pixels to the cloud server;
the cloud server acquires pulse rate related characteristic signals by using the extracted skin pixels, obtains pulse rate time sequences according to the continuously generated pulse rate related characteristic signals, and outputs measurement results to the detection terminal.
5. The system of claim 1, wherein the extracting facial feature points comprises:
positioning the face position in the acquired image information;
and acquiring the position of the facial feature point according to the position of the human face, wherein the target position of the system also comprises a head position and a gaze position.
6. The system of claim 1, wherein the deriving a pulse rate time series from the continuously generated pulse rate related characteristic signal comprises:
extracting pulse rate related characteristic signals according to the time dimension according to the extracted skin pixels;
selecting sample points from the extracted pulse rate related characteristic signals, and carrying out signal fusion according to the selected sample points;
and (3) performing pulse rate calculation according to the fused pulse rate related characteristic signals to obtain a continuously generated pulse rate time sequence.
CN201910371120.5A 2019-05-06 2019-05-06 System for detecting human body characteristic signals Active CN110321782B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910371120.5A CN110321782B (en) 2019-05-06 2019-05-06 System for detecting human body characteristic signals

Publications (2)

Publication Number Publication Date
CN110321782A CN110321782A (en) 2019-10-11
CN110321782B true CN110321782B (en) 2023-09-19

Family

ID=68113359

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910371120.5A Active CN110321782B (en) 2019-05-06 2019-05-06 System for detecting human body characteristic signals

Country Status (1)

Country Link
CN (1) CN110321782B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111839681A (en) * 2020-07-08 2020-10-30 胡飞青 Compression area identification platform and method using content analysis
CN111870826A (en) * 2020-07-14 2020-11-03 杜兴林 Real-time analytic system of minimal access surgery index

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014030091A1 (en) * 2012-08-24 2014-02-27 Koninklijke Philips N.V. Method and apparatus for measuring physiological parameters of an object
US9962095B2 (en) * 2013-04-23 2018-05-08 Microsoft Technology Licensing, Llc Optical heartrate tracking
CN105989357A (en) * 2016-01-18 2016-10-05 合肥工业大学 Human face video processing-based heart rate detection method
CN109259749A (en) * 2018-08-29 2019-01-25 南京邮电大学 A kind of contactless method for measuring heart rate of view-based access control model camera
CN109480808A (en) * 2018-09-27 2019-03-19 深圳市君利信达科技有限公司 A kind of heart rate detection method based on PPG, system, equipment and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 210042 No.1, Suning Avenue, Xuanwu District, Nanjing City, Jiangsu Province

Patentee after: Shanghai Star Map Financial Services Group Co.,Ltd.

Address before: 210042 No.1, Suning Avenue, Xuanwu District, Nanjing City, Jiangsu Province

Patentee before: Suning Financial Services (Shanghai) Co.,Ltd.
