KR101738278B1 - Emotion recognition method based on image - Google Patents

Emotion recognition method based on image

Info

Publication number
KR101738278B1
KR101738278B1 (application KR1020150191185A)
Authority
KR
South Korea
Prior art keywords
value
image
computer device
region
signal
Prior art date
Application number
KR1020150191185A
Other languages
Korean (ko)
Inventor
홍광석
오병훈
서은주
Original Assignee
성균관대학교산학협력단
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 성균관대학교산학협력단 filed Critical 성균관대학교산학협력단
Priority to KR1020150191185A priority Critical patent/KR101738278B1/en
Application granted granted Critical
Publication of KR101738278B1 publication Critical patent/KR101738278B1/en


Classifications

    • G06K9/00281
    • AHUMAN NECESSITIES
    • A61MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61BDIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00Measuring for diagnostic purposes; Identification of persons
    • A61B5/02Detecting, measuring or recording pulse, heart rate, blood pressure or blood flow; Combined pulse/heart-rate/blood pressure determination; Evaluating a cardiovascular condition not otherwise provided for, e.g. using combinations of techniques provided for in this group with electrocardiography or electroauscultation; Heart catheters for measuring blood pressure
    • G06K9/00268
    • G06K9/00885
    • G06K9/3233
    • G06K9/4652
    • G06K9/6202

Abstract

An emotion recognition method using an image comprises: a computer device detecting, in the image, at least one region of interest among a first region of interest of a face region included in the image and a second region of interest of a skin region included in the image or of the face region; the computer device extracting at least one of a morphological feature value from the first region of interest and a skin color value from the second region of interest; the computer device estimating a bio-signal using the color value; the computer device mapping at least one of the morphological feature value and the bio-signal to an intensity value for an emotional response value; and the computer device calculating an emotion recognition result by applying the intensity value to an emotional response model.

Description

TECHNICAL FIELD [0001] The present invention relates to an emotion recognition method using an image.

The technique described below relates to a technique for recognizing human emotions based on images.

Recently, there has been active research on "affective computing," a technology for predicting or recognizing human emotions not only through external human-computer interaction but also through emotional sympathy at the interface level. Affective computing basically requires a technique for recognizing human emotions by analyzing certain digital data.

Korean Patent Publication No. 10-2010-0128023

The technique described below provides a method of recognizing human emotions using facial expressions and/or bio-signals obtained from images.

An emotion recognition method using an image comprises: a computer device detecting, in the image, at least one region of interest among a first region of interest of a face region included in the image and a second region of interest of a skin region included in the image or of the face region; the computer device extracting at least one of a morphological feature value from the first region of interest and a skin color value from the second region of interest; the computer device estimating a bio-signal using the color value; the computer device mapping at least one of the morphological feature value and the bio-signal to an intensity value for an emotional response value; and the computer device calculating an emotion recognition result by applying the intensity value to an emotional response model.

The technique described below can conveniently recognize human emotions by detecting bio-signals using only a video image, without a separate sensor device. Furthermore, it can recognize human emotions more accurately by applying human facial expressions (facial morphological features) to the emotion model in addition to the bio-signals.

FIG. 1 is an example of a system for performing emotion recognition using an image.
FIG. 2 is an example of a flow chart of an emotion recognition method using an image.
FIG. 3 is another example of a flow chart of an emotion recognition method using an image.
FIG. 4 is yet another example of a flow chart of an emotion recognition method using an image.
FIG. 5 is an example of a process of detecting an object in an image.
FIG. 6 is an example of criteria for determining feature points and morphological feature values in a facial image.
FIG. 7 is an example of a process of estimating blood pressure using an image.
FIG. 8 is an example of a process of estimating a pulse wave transit time using a facial image.
FIG. 9 is an example of a process of estimating oxygen saturation using an image.
FIG. 10 is an example of an emotional response model.
FIG. 11 is an example of calculating intensity values using the emotional response model.
FIG. 12 is an example of mapping emotions using the emotional response model.
FIG. 13 is another example of mapping emotions using the emotional response model.

Since various changes may be made and various embodiments are possible, specific embodiments are illustrated in the drawings and described in detail below. However, this is not intended to limit the technique to the specific embodiments; it should be understood to include all changes, equivalents, and alternatives falling within the spirit and scope of the following description.

The terms first, second, A, B, etc. may be used to describe various components, but the components are not limited by these terms; the terms are used only to distinguish one component from another. For example, without departing from the scope of the following description, a first component may be referred to as a second component, and similarly, a second component may be referred to as a first component. The term "and/or" includes any combination of a plurality of related listed items or any one of the plurality of related listed items.

As used herein, singular expressions should be understood to include plural expressions unless the context clearly dictates otherwise, and the terms "comprises" and "comprising" specify the presence of stated features, numbers, steps, operations, components, parts, or combinations thereof, and do not preclude the presence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof.

Before describing the drawings in detail, it should be clarified that the division of constituent units in this specification is merely a division according to the main function of each unit. That is, two or more constituent units described below may be combined into a single unit, or one constituent unit may be divided into two or more units according to more subdivided functions. In addition, each constituent unit may additionally perform some or all of the functions of other constituent units besides its own main function, and some of the main functions of each constituent unit may instead be carried out entirely by another constituent unit.

Also, in performing a method or an operating method, the processes constituting the method may take place in an order different from the stated order unless a specific order is clearly described in the context. That is, the processes may be performed in the stated order, substantially concurrently, or in the reverse order.

The following description assumes that the emotion of a subject is estimated using a facial expression corresponding to a morphological characteristic of the face region included in an image and/or a bio-signal estimated from a skin color value.

FIG. 1 is an example of a system 100 for performing emotion recognition using an image. Recognizing emotions from images requires three processes. The first is photographing a person's face and/or an area where the skin is exposed; in other words, acquiring an image containing the analysis target. Since the face is itself an area where the skin is exposed, an image including only the face region can also be used. Facial images are the essential source data for facial-expression-based emotion recognition. For bio-signals, other skin areas such as the hands or fingers can be used as well as facial images. FIG. 1 shows an example of acquiring a face image. The second process extracts a region of interest from the image and extracts feature values for that region: from the facial image, feature values characterizing the facial expression can be extracted, and a bio-signal can be estimated from the color values of the skin region. In the last step, emotion is estimated by applying the extracted feature values or bio-signals to an emotional response model. These processes may all be performed in one apparatus or each in a separate apparatus.

(1) Referring to FIG. 1, a camera 125 connected to the PC 120 photographs a user's face. The PC 120, which receives the image captured by the camera 125, collects the data, and the emotion can be recognized by analyzing the data collected by the PC 120. In FIG. 1, the data collected by the PC 120 is transmitted to the server 150 at a remote location, and the server 150 can extract feature values from the image and recognize the emotion.

(2) Referring to FIG. 1, a portable terminal 110 such as a smart phone photographs a user's face with a built-in camera. The portable terminal 110 can recognize the emotion by analyzing the image data. In the case of a smart phone, emotion recognition using an image can be performed using a specific application.

Hereinafter, a device for analyzing images and recognizing emotions is referred to as a computer device. The computer device includes a PC, a smart phone, a tablet PC, a server, and a dedicated image processing device.

FIG. 2 is an example of a flow chart of the emotion recognition method 200 using an image, in which a feature value corresponding to a facial expression is extracted from a facial image to recognize emotion. A detailed description of each process of the emotion recognition method 200 is given later; the overall flow is briefly described first.

The computer device first acquires a face image (210). The computer device may directly acquire the face image using the camera, or may receive the image through the network. Alternatively, the computer device may access an image stored on a storage medium such as a hard disk or a memory card to acquire the image.

The computer device extracts a region of interest from the facial image (220). Hereinafter, the region of interest used for extracting features related to facial expression is referred to as a first region of interest; that is, the first region of interest corresponds to a part of the face region and may vary depending on the feature points to be extracted. A facial expression is caused by the contraction of facial muscles that occurs when facial elements such as the eyebrows, eyes, nose, mouth, and skin are deformed, and the intensity of the expression is determined by the geometric change of the facial features or the density of the muscle activity. Therefore, the main areas related to facial expression are the eye region, the eyebrow region, the nose region, and the mouth region.

The computer device may extract feature points in the first region of interest and determine a feature value using the feature points (230). The feature value is a numerical value representing a human expression, for example based on the distances between feature points. There may be a plurality of feature values, or a feature value may be a combination of several values.

In order to apply the determined feature value to the emotional response model, the computer device determines an intensity value according to the degree of the feature value observed in the image (240). The intensity value can be determined according to the magnitude of the feature value: the computer device determines the intensity value matching each feature value using a mapping table prepared in advance. When there are a plurality of feature values, there may be a plurality of intensity values. The mapping table is prepared in advance according to the emotional response model. Although a human facial expression can be characterized by a single feature value, a plurality of feature values may be combined to capture the expression more accurately. The following description assumes that a mapping table prepared according to a certain standard is available.
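As an illustration of the mapping-table step, the sketch below buckets a feature value into an intensity level by comparing it against precomputed bin boundaries. The feature names, boundary values, and the 1-to-9 intensity scale are placeholders for illustration, not values defined in the patent.

```python
# Minimal sketch of a pre-built mapping table: a feature value is mapped to an
# intensity level by finding which bin it falls into. Boundary values are
# illustrative placeholders.
import bisect

MAPPING_TABLE = {
    # feature name -> bin edges separating intensity levels 1..9
    "mouth_open":      [0.02, 0.04, 0.06, 0.08, 0.10, 0.12, 0.14, 0.16],
    "lip_corner_pull": [0.30, 0.33, 0.36, 0.39, 0.42, 0.45, 0.48, 0.51],
}

def intensity_value(feature_name, value):
    """Return the intensity level (1..9) for a normalized feature value."""
    edges = MAPPING_TABLE[feature_name]
    return bisect.bisect_right(edges, value) + 1
```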

The computer device maps the intensity value onto the emotional response model (250). In FIG. 2, the emotional response model is exemplified by a model using arousal (Arousal), valence (Valence), and dominance (Dominance). A detailed description of the emotional response model is given later.

The computer device calculates the type of emotion determined according to the result of applying the intensity value to the emotional response model (260).

FIG. 3 is another example of a flow chart of an emotion recognition method 300 using an image, in this case a method using a bio-signal. A detailed description of each process of the emotion recognition method 300 is given later; the overall flow is briefly described first.

The computer device first acquires a skin image (310). The skin image refers to an image of human skin, such as a face image, a hand image, or a finger image.

The computer device extracts a region of interest from the skin image (320). Hereinafter, the region of interest used for extracting the skin color is referred to as a second region of interest. The second region of interest is chosen as a region in which the color of the skin can be detected relatively accurately.

The computer device determines a color value in a second region of interest and estimates a bio-signal according to a predetermined criterion (330). The contents related to the bio-signal estimation will be described later.

In order to apply the value of the determined bio-signal to the emotional response model, the computer device determines an intensity value according to the magnitude of the bio-signal (340). The computer device determines the intensity value matching each bio-signal value using a mapping table prepared in advance. When there are a plurality of bio-signals, there may be a plurality of intensity values. The mapping table is prepared in advance according to the emotional response model; that is, it must be prepared based on prior knowledge about how the bio-signals change with emotion. The following description assumes that a mapping table prepared according to a certain standard is available.

The computer device maps the intensity values onto the emotional response model (350). In FIG. 3, the emotional response model is illustrated by a model using arousal, valence, and dominance.

The computer device calculates the type of emotion determined according to the result of applying the intensity value to the emotional response model (360).

FIG. 4 is yet another example of a flow chart of an emotion recognition method 400 using an image, a method that uses both the feature value corresponding to the facial expression and the bio-signal obtained from the facial image. A detailed description of each process of the emotion recognition method 400 is given later; the overall flow is briefly described first.

The computer device acquires a face image (410) and extracts a first region of interest from the facial image (420). The computer device may extract feature points in the first region of interest and determine a feature value using the feature points (430).

The computer device also acquires a skin image (440). The skin image refers to an image of human skin; accordingly, the computer device may use the facial image acquired in step 410. The computer device extracts a second region of interest from the skin image (450), determines the color value in the second region of interest, and estimates the bio-signal according to a predetermined criterion (460).

The computer device determines intensity values for applying the determined feature value and the bio-signal value to the emotional response model (470). In this process, the computer device can determine a separate intensity value for the feature value and for the bio-signal value, or determine a combined intensity value for the combination of the feature value and the bio-signal value. The following description assumes that a mapping table prepared according to a certain standard is available.

The computer device maps the intensity value onto the emotional response model (480). In FIG. 4, the emotional response model is exemplified by a model using arousal, valence, and dominance.

The computer device calculates the type of emotion determined according to the result of applying the intensity value to the emotional response model (490).

FIG. 5 is an example of a process of detecting an object in an image. The computer device detects a particular object, such as a face region and/or a skin region, in the source image. Hereinafter, one technique by which a computer device detects a face region and/or a skin region in an image is described; however, the computer device may also detect the face region and the like using a variety of other techniques. FIG. 5 shows an example of detecting a face region.

The computer device performs RGB image normalization to obtain a robust image from the image acquired by the camera. The RGB normalization is applied to each pixel of the RGB color model as shown in Equation 1 below.

r = R / T,  g = G / T,  b = B / T,  where T = R + G + B   (Equation 1)

In Equation 1, R, G, and B denote the original color channels, r, g, and b denote the normalized color channels, and T = R + G + B. Face detection using gradient information and face detection using color information are then performed in parallel on the normalized image.

In FIG. 5, the branch to the left of the source image is the processing of the morphological gradient image, and the branch to the right is the processing of the YCbCr image. The preprocessing of the source image is completed by ANDing the morphological gradient image and the YCbCr image.

Face detection using gradient information generates an image that emphasizes the components useful for face detection by combining, for each pixel, the maximum morphological gradient value among the Red, Green, and Blue channels, instead of applying the morphological gradient operation to an ordinary gray image. The formula for the Maximum Morphological Gradient Combination (MMGC) image is shown in Equation 2 below.

MMGC(i, j) = max( MG_r(i, j), MG_g(i, j), MG_b(i, j) )   (Equation 2)

Here, i and j are pixel coordinates, and MG_r, MG_g, and MG_b denote the morphological gradient values of the R, G, and B channels, respectively.

The step of converting the RGB image to YCbCr color includes converting the image from the RGB color model to YCbCr, applying a skin color threshold to the source image, and removing noise using erosion and dilation operations.

A skin color threshold for separating the background and the face region image can be set as shown in Equation (3) below.

Figure 112015129374141-pat00003

The threshold values may vary depending on the skin color, and can be set appropriately by a person having ordinary skill in the art.

The detected skin color region is converted into a binary image, and noise is removed through a closing operation using erosion and dilation. Noise of large size may not be removed in this step; in that case, the regions are labeled and every region other than the face region is discarded so that only the face region remains. Finally, only the face image with the background removed is detected (blob detection).

Finally, the morphological gradient image and the YCbCr image are combined (AND). The computer device may then use the AdaBoost (Adaptive Boosting) algorithm to detect the face region. The AdaBoost algorithm learns weak classifiers through iterative computation on class samples and generates a strong classifier by combining the weak classifiers. In the initial stage, equal weights are applied to all samples and a weak classifier is learned; as the iterations progress, lower weights are applied to the samples correctly classified by the current classifier and higher weights to the incorrectly classified samples, which improves the performance of the combined classifier. The AdaBoost algorithm itself is widely known to those of ordinary skill in the art, so a detailed description thereof is omitted.
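The preprocessing and detection pipeline above can be outlined as follows. This is an illustrative sketch only: the skin color thresholds of Equation (3) are not given in the text, so placeholder YCbCr bounds are used, and OpenCV's Haar cascade detector stands in for the AdaBoost face detector described in the patent.

```python
# Illustrative sketch of the preprocessing: RGB normalization (Eq. 1), MMGC
# image (Eq. 2), YCbCr skin mask with closing, AND combination, and an
# AdaBoost-style face detector. Thresholds are placeholders, not Eq. (3).
import cv2
import numpy as np

def normalize_rgb(img_bgr):
    """Per-pixel chromaticity normalization: each channel divided by R+G+B."""
    img = img_bgr.astype(np.float32)
    total = img.sum(axis=2, keepdims=True) + 1e-6
    return img / total

def mmgc(img_bgr, ksize=3):
    """Maximum Morphological Gradient Combination: per-pixel max over channels."""
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (ksize, ksize))
    grads = [cv2.morphologyEx(img_bgr[:, :, c], cv2.MORPH_GRADIENT, kernel)
             for c in range(3)]
    return np.max(np.stack(grads, axis=2), axis=2)

def skin_mask_ycbcr(img_bgr, cr_range=(133, 173), cb_range=(77, 127)):
    """Binary skin mask in YCbCr; the threshold ranges are illustrative only."""
    ycrcb = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2YCrCb)
    _, cr, cb = cv2.split(ycrcb)
    mask = ((cr >= cr_range[0]) & (cr <= cr_range[1]) &
            (cb >= cb_range[0]) & (cb <= cb_range[1])).astype(np.uint8) * 255
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (5, 5))
    return cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)  # erosion + dilation

def detect_face(img_bgr):
    grad = mmgc((normalize_rgb(img_bgr) * 255).astype(np.uint8))
    _, grad_bin = cv2.threshold(grad, 0, 255, cv2.THRESH_BINARY | cv2.THRESH_OTSU)
    combined = cv2.bitwise_and(grad_bin, skin_mask_ycbcr(img_bgr))  # AND step
    # Haar cascade (an AdaBoost-based detector) run on the gray image as a
    # stand-in; the patent applies AdaBoost after the combined preprocessing.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
    faces = cascade.detectMultiScale(cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY))
    return combined, faces
```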

FIG. 6 is an example of criteria for determining feature points and morphological feature values in a facial image, and shows the process of extracting facial component features from the detected face. The computer device can limit the detection range within the detected face area using geometric characteristics; FIG. 6(a) shows an example of limiting the detection range. For example, assuming the height of the detected face region is 1, the average proportions of the eye, nose, and mouth regions within the face region may be 0.35, 0.18, and 0.22, respectively, so processing can be restricted to the corresponding regions. The computer device detects the face components (right eye, left eye, nose, mouth) through histogram analysis within the limited ranges set in FIG. 6(a) and extracts thirteen feature points. FIG. 6(b) is an example of extracting the points corresponding to the face components (or facial expression components) within the limited ranges.

Finally, the computer device can determine parameters for predicting the emotional response value based on the extracted points. Here, each parameter corresponds to a distance between the main feature points of the face region that affect emotion, and this distance corresponds to the feature value mentioned above. FIG. 6(c) illustrates the feature elements from which feature values can be extracted. The facial component features (feature values) extracted in this way are used as input parameters for facial-expression-based emotion recognition.

As shown in FIG. 6(c), the features are: 1. Inner Brow Raiser (degree to which the inner eyebrows are raised), 2. Outer Brow Raiser (degree to which the outer eyebrows are raised), 3. Brow Lowerer (distance between the two eyebrows), 4. Upper Lip Raiser (degree of upper lip elevation), 5. Lip Corner Puller (degree to which the corners of the mouth are pulled), 6. Mouth Stretch (distance between the two lips), and 7. Lower Lip Depressor (degree to which the lower lip is lowered). The computer device may use at least one of the extracted feature values as an input value for emotion recognition.
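A minimal sketch of turning extracted feature points into distance-based morphological feature values follows. The landmark names and the point pairs are hypothetical; the patent detects thirteen points via histogram analysis and uses distances such as those listed above.

```python
# Sketch: compute distance-based feature values from named landmark points.
# Landmark names and pairs are illustrative, not the patent's 13-point layout.
import numpy as np

def euclidean(p, q):
    return float(np.hypot(p[0] - q[0], p[1] - q[1]))

def morphological_features(pts):
    """pts: dict of landmark coordinates (x, y), e.g. normalized by face height."""
    return {
        "inner_brow_raise": euclidean(pts["inner_brow_l"], pts["eye_inner_l"]),
        "outer_brow_raise": euclidean(pts["outer_brow_l"], pts["eye_outer_l"]),
        "brow_distance":    euclidean(pts["inner_brow_l"], pts["inner_brow_r"]),
        "upper_lip_raise":  euclidean(pts["nose_tip"], pts["upper_lip"]),
        "lip_corner_pull":  euclidean(pts["mouth_corner_l"], pts["mouth_corner_r"]),
        "mouth_open":       euclidean(pts["upper_lip"], pts["lower_lip"]),
    }
```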

Hereinafter, a process of estimating a bio-signal from an image will be described.

 1. Pulse wave (PPG) estimation method

A pulse wave is the wave generated as the heart circulates blood. It is used mainly to measure heart rate variability (HRV), current blood circulation, and accumulated stress, and can be measured with a variety of medical devices.

A method of detecting a pulse wave using an image is described. A pulse wave can be detected either by using the skin color of a face or body skin area captured without contact, as described below, or by using an image obtained with the skin placed in close contact with the camera.

1) First, the method of estimating a pulse wave by detecting the face, or the skin color of part of the body, without bringing the skin close to the camera is described. After an area that can reflect the user's condition, such as the face or a finger, is photographed, the skin color is detected through preprocessing such as face detection and skin color detection. The PPG signal can then be detected by setting a region of interest within the detected skin region and extracting an average color value, such as Cg or Red, over all pixels in the region.

2) Next, the method using an image obtained by bringing the skin close to the camera is described.

(1) Instead of using every image obtained from the camera, the RGB color values extracted from each frame are substituted into Equation (4) below, and only the frames whose output value is 1 are selected and used. Here mean(·) denotes the mean value of a color channel (for example, mean(R) is the mean of the Red signal) and std(·) denotes the standard deviation of a color channel (for example, std(G) and std(B)).

Figure 112015129374141-pat00004

(2) A color threshold is set to estimate the pulse wave. For example, the maximum and minimum of the average Red signal value over the first 5 seconds can be substituted into Equation (5) below to calculate the threshold value.

Figure 112015129374141-pat00005

(3) For each frame whose output value from step (1) is 1, the process of Equation (6) is performed. I is the Red value of each pixel in the frame, and the PPG value for the frame is obtained by counting the pixels whose Red value is larger than the threshold (T) determined in step (2).

Figure 112015129374141-pat00006

This process is repeated every frame to obtain the PPG signal. When an image obtained by bringing the skin into close contact with the camera is used, the PPG signal may also be detected by extracting the average Red value of all pixels in the region of interest for each frame, as in the non-contact method above.
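The close-contact PPG procedure can be sketched as follows. Since Equations (4) and (5) are not reproduced in the text, the frame-selection condition and the threshold formula below are simple stand-ins (the threshold is taken as the midpoint of the red mean over the first few seconds), not the patent's exact expressions.

```python
# Sketch of close-contact PPG estimation: select usable frames, derive a red
# threshold from the first ~5 s, then count per-frame pixels above threshold
# (the role of Equation (6)). Selection rule and threshold are stand-ins.
import numpy as np

def red_threshold(first_frames_mean_red):
    """Stand-in for Equation (5): midpoint of the red mean over the first frames."""
    return (np.max(first_frames_mean_red) + np.min(first_frames_mean_red)) / 2.0

def frame_is_valid(frame_rgb, min_mean_red=100.0, max_std=40.0):
    """Stand-in for Equation (4): keep frames dominated by stable red skin color."""
    r, g, b = frame_rgb[..., 0], frame_rgb[..., 1], frame_rgb[..., 2]
    return r.mean() > min_mean_red and g.std() < max_std and b.std() < max_std

def ppg_signal(frames_rgb, fps=30):
    mean_red = np.array([f[..., 0].mean() for f in frames_rgb[: 5 * fps]])
    threshold = red_threshold(mean_red)
    signal = []
    for f in frames_rgb:
        if frame_is_valid(f):
            signal.append(int((f[..., 0] > threshold).sum()))  # pixel count
        else:
            signal.append(signal[-1] if signal else 0)  # hold value on bad frames
    return np.array(signal)
```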

 2. Pulse Signal Estimation Method

The pulse signal estimation uses the brightness values of the second region of interest. The computer device converts the image from RGB to YCgCo (a color space composed of luminance Y, green chrominance Cg, and orange chrominance Co) to extract the pulse signal. The Cg signal is obtained by calculating the average Cg value for every frame; after the Cg signal has been collected over several tens to several hundreds of frames, it is converted into the frequency domain using the FFT. The computer device determines the largest frequency component in the frequency domain as the pulse period, and can estimate as the pulse a frequency component larger than a predetermined threshold value. Since the pulse rate per minute normally ranges from about 40 to 200 depending on the degree of rest or excitement, the region observed in the frequency domain can be limited to 0.75 Hz to 4.00 Hz.
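A sketch of this pulse estimation follows: the per-frame mean Cg value of the region of interest is collected, the sequence is transformed with an FFT, and the dominant component within 0.75 to 4.00 Hz is taken as the pulse frequency. The Cg conversion uses the standard YCgCo/YCoCg weights; ROI extraction is assumed to be done elsewhere.

```python
# Sketch: pulse rate from the mean Cg signal of the ROI via FFT peak picking.
import numpy as np

def mean_cg(roi_rgb):
    """Cg channel of the YCgCo transform: Cg = -R/4 + G/2 - B/4."""
    r, g, b = (roi_rgb[..., c].astype(np.float64) for c in range(3))
    return float((-0.25 * r + 0.5 * g - 0.25 * b).mean())

def pulse_rate_bpm(roi_frames, fps=30.0, fmin=0.75, fmax=4.0):
    cg = np.array([mean_cg(f) for f in roi_frames])
    cg = cg - cg.mean()                        # remove the DC component
    spectrum = np.abs(np.fft.rfft(cg))
    freqs = np.fft.rfftfreq(len(cg), d=1.0 / fps)
    band = (freqs >= fmin) & (freqs <= fmax)   # plausible pulse range (45-240 bpm)
    peak_freq = freqs[band][np.argmax(spectrum[band])]
    return peak_freq * 60.0                    # beats per minute
```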

 3. Blood pressure signal estimation method

Blood pressure is the pressure exerted on the blood vessels by the blood pumped from the heart. Systolic blood pressure refers to the pressure on the artery when the heart contracts, and diastolic blood pressure refers to the pressure on the artery when the heart relaxes. Blood pressure fluctuates greatly with body weight, height, and age, and individual differences are relatively large.

FIG. 7 is an example of a process 500 of estimating blood pressure using an image.

First, the camera acquires the user's image (510). The camera acquires images during a certain period of time. As will be described later, it is necessary to extract the degree of change of the brightness value in the image. At this time, the user's image should include the skin of the user. For example, the camera needs to photograph an area including the skin, such as the user's face, arm, hand, and the like. The computer device then extracts the skin region from the image (520). The skin area means the area where the skin appears in the image. For example, the background can be removed from the image using the face recognition algorithm, and only the face region can be extracted. The computer device can detect a skin region in an image using a face region detection algorithm or a skin detection algorithm. The computer device includes a smart device with a built-in camera, a PC connected to the camera, a computer device at a remote location receiving the image collected by the camera, and a server receiving the image collected by the camera.

The computer device stores changes in the brightness values of two target regions within the skin region (530). Process 530 includes setting the two target regions and storing the change in brightness value for each target region. The computer device sets the two target regions within the skin region.

If the computer device has extracted one continuous skin region, it can divide that region into two, or set two specific areas within it, as the target regions. For example, if the skin region is a face region, the face region can be divided in two to set the two target regions. Furthermore, if two cameras are used to acquire images of the user, a skin region may be extracted from each camera's image and the two skin regions may be set as the target regions. For example, when one camera captures the face and the other captures a hand, the computer device may set the face region and the hand region as the target regions.

The computer device stores the change in brightness value for the two target regions. The skin changes color depending on the blood flow in the vessels near the skin; that is, by monitoring the brightness value of a target region, the regular flow of blood driven by the heartbeat can be observed. (1) The computer device can calculate the average brightness value of the target region for each frame and store it, in which case the change in brightness is stored frame by frame. (2) The computer device may also calculate and store the average brightness value of a frame at regular time intervals, in which case the brightness value of a still image is calculated at each predetermined interval. (3) Further, the computer device may calculate and store the average of the brightness values over a predetermined number of frames.

The computer device generates a pulse wave signal based on the change in the brightness values of the two target regions (540). As described above, the brightness value of a target region is related to the blood flow. With brightness on the vertical axis and time on the horizontal axis, the change in brightness forms a signal with a regular waveform. The computer device can convert the brightness signal into a pulse wave signal using a band pass filter, and may additionally use other filters to remove noise from the pulse wave signal.

The computer device determines associated peak points in the pulse wave signals of the two target regions and estimates the time difference between the two peak points as the pulse wave transit time (PTT) (550). A single pulse wave signal may contain several peak points; a peak point is a point at which the brightness value increases, and the brightness value varies with the blood flow.

Each time the heart beats, it delivers a surge of blood into the arteries, after which the arterial blood flow decreases; this process repeats with the heartbeat. The vessels downstream of the arteries likewise repeat a pattern in which the blood flow, or its velocity, increases and decreases with the heart rate. The point at which the brightness value increases in a target region is therefore caused by the heart pushing blood into the arteries, and the peak points in the pulse wave signal may be regular or somewhat irregular depending on the heartbeat.

The computer device finds peak points in the two target regions. The peak points to be found are those associated with each other, that is, points affected by the same heartbeat. For example, when the heart beats, the blood flow first increases in a vessel at a first point near the heart and then increases in a vessel at a second point at some distance from the first; the blood flow thus increases at the two points at different times, and the delay depends on the distance between the two points. The peak points associated with each other in the two target regions are those caused by the blood flow change from the same beat. The computer device therefore searches for the associated peak points in consideration of the distance between the target regions, and estimates the pulse wave transit time from the time interval between the associated peak points.

Finally, the computer device can estimate the blood pressure using the pulse wave transit time (570). The blood pressure estimation formula can be of the kind used in studies based on PPG and ECG signals, most of which estimate blood pressure through a regression equation. The formula includes the user's body information in addition to the pulse wave transit time; accordingly, the computer device receives the body information from the user in advance or retrieves it from a database storing such information (560). The body information includes the user's age, height, weight, and the like.

FIG. 8 is an example of a process of estimating a pulse wave transit time using a facial image. FIG. 8(a) shows the change in brightness value for two target regions in the form of signals; the signal for the upper region is shown in blue and the signal for the lower region in red.

FIG. 8(a) is a graph of the average brightness value of each target region computed for each frame of the moving image. The brightness can be computed from the R, G, and B values of the color image, or determined using another color model (YUV, YCbCr) that represents the brightness of the RGB image.

FIG. 8(b) shows an example in which the brightness values of the target regions shown in FIG. 8(a) are converted into pulse wave signals. FIG. 8(a) represents the change in brightness value stored for each frame as a signal. The brightness signal of a target region contains not only the pulse wave component to be captured but also noise due to motion, so a filtering process is required to remove the noise. The filter can be a band pass filter that passes only the frequency band of interest so that only the pulse wave component is extracted. FIG. 8(b) is an example of extracting only the signal corresponding to the pulse wave from the brightness signal.

FIG. 8(c) shows an example of estimating the pulse wave transit time between two target regions from the pulse wave signals. As described above, the pulse wave signals of the two target regions repeatedly rise and fall with a constant period. Peak points related to each other are found for the two target regions; in FIG. 8(c) the mutually related peak points of the two pulse wave signals are indicated by dotted lines. When the peak point of each cycle is found in the two pulse wave signals, the time difference between the corresponding peaks of the two signals is determined as the pulse wave transit time (PTT), as shown in FIG. 8(c). Of course, the pulse wave transit time does not necessarily have to be based on the peak points of the pulse wave signals; since it corresponds to the time offset between the two pulse wave signals, it can be calculated using another reference point.
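The PTT estimation described with FIG. 8 can be sketched as follows: each region's per-frame mean brightness is band-pass filtered to isolate the pulse wave, peaks are matched between the two signals, and the average peak-to-peak offset is taken as the PTT. The filter order, pass band, and peak-matching window below are illustrative choices, not values from the patent.

```python
# Sketch: pulse-wave transit time from the brightness signals of two regions.
import numpy as np
from scipy.signal import butter, filtfilt, find_peaks

def pulse_wave(brightness, fps=30.0, band=(0.75, 4.0)):
    """Band-pass filter the mean-brightness signal to isolate the pulse wave."""
    b, a = butter(3, [band[0] / (fps / 2), band[1] / (fps / 2)], btype="band")
    return filtfilt(b, a, np.asarray(brightness, dtype=float) - np.mean(brightness))

def estimate_ptt(brightness_a, brightness_b, fps=30.0):
    sig_a, sig_b = pulse_wave(brightness_a, fps), pulse_wave(brightness_b, fps)
    peaks_a, _ = find_peaks(sig_a, distance=int(fps / 4))   # peaks >= 0.25 s apart
    peaks_b, _ = find_peaks(sig_b, distance=int(fps / 4))
    # Match each peak in region A with the nearest later peak in region B,
    # assuming both belong to the same heartbeat when close enough in time.
    deltas = []
    for pa in peaks_a:
        later = peaks_b[peaks_b >= pa]
        if later.size and (later[0] - pa) / fps < 0.4:
            deltas.append((later[0] - pa) / fps)
    return float(np.mean(deltas)) if deltas else float("nan")
```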

The computer device calculates systolic and diastolic blood pressure using measurement equations of the form shown in Equation (7) below, in which body weight, height, age, and PTT are the independent variables and the actual blood pressure values (systolic and diastolic) are the dependent variables.

Figure 112015129374141-pat00007

In Equation (7), PTT is the time difference between the peak points of the two signals, weight is the body weight, height is the height, and age is the age. W_PTT, W_weight, W_height, and W_age represent the constant coefficients for each variable derived from multiple regression analysis.
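A minimal sketch of the regression of Equation (7) is shown below; the coefficient values are placeholders, not the constants derived from the multiple regression analysis in the patent.

```python
# Sketch: blood pressure as a linear combination of PTT and body information.
# All coefficient values are illustrative placeholders to be fitted by
# regression against reference blood pressure measurements.
def estimate_blood_pressure(ptt, weight_kg, height_cm, age,
                            w_ptt=-50.0, w_weight=0.2, w_height=0.1,
                            w_age=0.4, bias=90.0):
    return (w_ptt * ptt + w_weight * weight_kg + w_height * height_cm
            + w_age * age + bias)
```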

 4. Breathing Estimation Method

A method of measuring the respiration rate from an image is described; the respiration rate can be estimated from the brightness values in the image. The computer device calculates the mean Cg value of each frame and observes the signal in the frequency domain by applying the FFT to the Cg signal; that is, the frequency components of the Cg signal are observed over a fixed number of frames (a fixed time). Considering the relationship between the pulse rate and the breathing rate, the computer device estimates the respiration rate using the frequency with the largest component within the range of 0.13 to 0.33 Hz, and can estimate the number of breaths at regular time intervals.
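A sketch of the respiration estimate follows, reusing the per-frame mean Cg signal from the pulse estimation sketch: the signal is transformed with an FFT and the dominant frequency within 0.13 to 0.33 Hz is converted to breaths per minute.

```python
# Sketch: respiration rate from the mean Cg signal via FFT peak picking
# restricted to the 0.13-0.33 Hz band.
import numpy as np

def respiration_rate_bpm(cg_signal, fps=30.0, fmin=0.13, fmax=0.33):
    cg = np.asarray(cg_signal, dtype=np.float64)
    cg = cg - cg.mean()                        # remove the DC component
    spectrum = np.abs(np.fft.rfft(cg))
    freqs = np.fft.rfftfreq(len(cg), d=1.0 / fps)
    band = (freqs >= fmin) & (freqs <= fmax)
    return float(freqs[band][np.argmax(spectrum[band])] * 60.0)  # breaths/min
```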

 5. Estimation of oxygen saturation

Oxygen saturation is one of the important vital signs to manage in everyday life, because oxygen deficiency causes various diseases and measuring and observing oxygen saturation makes it possible to respond quickly to respiratory problems and the risk of hypoxia. In the present invention, a method of measuring oxygen saturation from an image obtained through a terminal with a camera is used: a region of interest for the oxygen saturation measurement is set, and RGB color values are extracted from the region of interest.

The oxygen saturation is then measured using, as characteristic parameters, color combinations obtained by weighting the extracted RGB channels. Oxygen saturation refers to the percentage of oxygenated hemoglobin relative to the total hemoglobin in the blood. Conventionally, oxygen saturation is measured using an infrared wavelength (750 to 940 nm) and a red wavelength (660 to 750 nm).

FIG. 9 is an example of a process 600 of estimating oxygen saturation using an image. The computer device acquires an image including the body part to be observed, such as the user's face or a finger, using a camera (620). Here, the camera may be a camera built into a smart phone, a camera connected to a PC, or a separate camera capable of capturing images. The camera may capture a moving image or a still image (photograph) that includes a partial area of the body such as the face or a finger.

The computer device may extract the body part to be observed, such as the face or a finger, from the image using image processing techniques, and then determine a specific region of interest within that body part (630). In general, body parts such as the face and fingers are distinguished from the background using skin color, and various face detection techniques can be applied to detect the face region. The computer device then determines, within the extracted face region, the region of interest on which the oxygen saturation measurement is based. Various regions of interest can be used, but it is desirable to choose a region containing as little non-skin color as possible; regions including the eyes, nose, or mouth have different colors and strong edges and are not desirable as regions of interest. Within the face region, the area around the cheek has relatively little noise and is a suitable choice, so the region of interest below is assumed to be the cheek region.

After determining the region of interest, the computer device generates feature parameters for the oxygen saturation measurement using the color values of the region of interest (640). The computer device generates the feature parameters using the R, G, and B color values of the region of interest.

A conventional oxygen saturation measuring apparatus irradiates a finger with an infrared wavelength and a red wavelength and measures the oxygen saturation using the light transmitted to the opposite side. The characteristic parameters here include a parameter corresponding to the red wavelength and a parameter corresponding to the infrared wavelength.

The computer device determines the first parameter (C_660nm) and the second parameter (C_940nm) using the RGB color values of the region of interest, as expressed in Equations (8) and (9) below, respectively.

C_660nm = W_R × mean(Red) + W_G × mean(Green) + W_B × mean(Blue)   (Equation 8)

C_940nm = T_R × mean(Red) + T_G × mean(Green) + T_B × mean(Blue)   (Equation 9)

Here, mean(Red) is the average R color value of the pixels included in the region of interest, mean(Green) is the average G color value, and mean(Blue) is the average B color value. W_R, W_G, and W_B are the per-channel weights for obtaining C_660nm, and T_R, T_G, and T_B are the per-channel weights for obtaining C_940nm. The weight determination method is described later. As shown in FIG. 9, the weights must be set (610) before the computer device generates the feature parameters.

The computer device then calculates the average value and the standard deviation of the feature parameters for the region of interest (650). The average value of the feature parameters includes an average for each of the first and second parameters described above, and the standard deviation likewise includes one for each parameter. Here, the average and standard deviation are computed over a plurality of frames: the computer device determines the first and second parameters of the region of interest for each video frame and calculates their average and standard deviation over a plurality of frames (based on a predetermined time or number of frames).

Finally, the computer device estimates the oxygen saturation based on the average and the standard deviation (660). The computer device can measure the oxygen saturation (SpO2) using Equation (10) below.

Figure 112015129374141-pat00010

The constants A and B can be determined using oxygen saturation values measured with actual equipment: A and B are chosen so as to minimize Equation (11) below, that is, by applying the least squares method between the R value and the actual oxygen saturation measurements.

Figure 112015129374141-pat00011

Here, Oximeter_i means the oxygen saturation value measured using the actual equipment, and R_i can be determined using Equation (12) below.

Figure 112015129374141-pat00012
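The oxygen saturation pipeline can be sketched as follows. Because Equations (10) through (12) are only available as images, the sketch assumes the conventional ratio-of-ratios form (R computed from the std/mean of C_660nm and C_940nm, with SpO2 = A - B·R); the channel weights and the constants A and B are placeholders to be fitted against a reference oximeter, not values from the patent.

```python
# Sketch: SpO2 from weighted RGB feature parameters of the ROI over many frames.
# Weights, A, B, and the ratio-of-ratios form are illustrative assumptions.
import numpy as np

def feature_parameters(roi_frames, w=(0.8, 0.1, 0.1), t=(0.3, 0.3, 0.4)):
    """Per-frame weighted RGB combinations standing in for Equations (8)-(9)."""
    c660, c940 = [], []
    for f in roi_frames:
        means = [f[..., c].astype(np.float64).mean() for c in range(3)]  # R, G, B
        c660.append(sum(wi * mi for wi, mi in zip(w, means)))
        c940.append(sum(ti * mi for ti, mi in zip(t, means)))
    return np.array(c660), np.array(c940)

def estimate_spo2(roi_frames, A=110.0, B=25.0):
    c660, c940 = feature_parameters(roi_frames)
    r_value = (c660.std() / c660.mean()) / (c940.std() / c940.mean())
    return A - B * r_value   # A, B fitted against a reference oximeter
```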

In psychology and cognitive science, an emotion model composed of multidimensional emotional elements can be generated using data on the degree of explicit response to the factors known to constitute human emotion (Arousal, Valence, Dominance, etc.); a representative example is Russell's emotional dimension model.

FIG. 10 is an example of an emotional response model and shows the basic structure of a multidimensional emotion model. According to existing research results, it is established that human emotions are formed by various emotional responses; accordingly, psychological theories rely on a multidimensional emotion model based on combinations of emotional response values.

Referring to FIG. 10(a), Arousal indicates the intensity of arousal: the higher the value, the more aroused and tense the state, and the lower the value, the more relaxed the state. Valence indicates the degree of positivity or negativity: the higher the value, the better the mood, the lower the value, the worse the mood, and the center indicates a neutral state. The two-dimensional emotional space formed by the combination of Arousal and Valence is the most fundamental form of the emotion model. In other words, changes in human emotion arise from the degree of positivity or negativity and the level of arousal, and human emotion can be detected if the degree of response of these emotional factors is known. In addition, as shown in FIG. 10(b), the two-dimensional emotion space model can be extended to a three-dimensional emotion space model by adding the Dominance axis to the two axes Arousal and Valence. Dominance, which indicates the degree of control, is defined as a criterion showing how dominant the subject feels with respect to a particular object, and is recognized as an element constituting human emotion together with Valence and Arousal.

FIG. 11 is an example of calculating intensity values using the emotional response model. The computer device takes as input the feature values of the face components (first region of interest) and the bio-signal values for the second region of interest, and predicts the emotional response values through the emotional response model. The emotional response model consists of Arousal, Valence, and Dominance models, and each model is generated from the corresponding facial component and bio-signal feature values for scores from 1 to 9. The computer device inputs the feature value for the first region of interest and the bio-signal for the second region of interest as parameters of the response value prediction module, and predicts the response values by comparing them with each emotional response model (Arousal, Valence, Dominance) through a pattern recognition algorithm. As shown in the prediction result table of FIG. 11, a similarity score from 1 to 9 can be derived for each of Arousal, Valence, and Dominance; the higher the score, the more similar the input is to the corresponding model. As an example, a random forest algorithm can be used as the recognition algorithm, in which case the recognition result is determined by the proportion of votes from the first-rank candidate to the last-rank candidate, and scoring can be done as a percentage of the votes cast. Based on the derived values, the intensity values can be determined through various weighting methods, as in Equation (13) below. In Equation (13), A represents the similarity score of Arousal, V the similarity score of Valence, and D the similarity score of Dominance, and the weights (Figure 112015129374141-pat00013) are the factors by which Arousal, Valence, and Dominance are multiplied, respectively.

Figure 112015129374141-pat00014

For example, the Dominance value can be weighted more heavily than the Arousal and Valence values in order to see the intensity of the emotion more clearly, and the results will differ accordingly. Once the response scoring has been performed, the scored values 1 to 9 of Arousal, 1 to 9 of Valence, and 1 to 9 of Dominance yield a total of 9 × 9 × 9 = 729 combined values. These values enter the emotion model mapping module as input and are mapped to three-dimensional coordinates.
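The scoring and weighting step can be sketched as follows. How the per-axis similarity scores are combined into the 729 values is not spelled out in the text, so a weighted sum is assumed here, and the weight values are illustrative.

```python
# Sketch: combine per-level similarity scores for Arousal, Valence, Dominance
# (each a length-9 array, e.g. random-forest vote ratios) into 9x9x9 = 729
# weighted values and pick the best (A, V, D) levels. Weighted-sum combination
# and the weights themselves are assumptions.
import numpy as np

def combined_scores(arousal_scores, valence_scores, dominance_scores,
                    w_a=1.0, w_v=1.0, w_d=1.5):
    a = w_a * np.asarray(arousal_scores, dtype=float)
    v = w_v * np.asarray(valence_scores, dtype=float)
    d = w_d * np.asarray(dominance_scores, dtype=float)
    # One combined value per (Arousal level, Valence level, Dominance level).
    return a[:, None, None] + v[None, :, None] + d[None, None, :]

def best_avd(grid):
    ai, vi, di = np.unravel_index(np.argmax(grid), grid.shape)
    return ai + 1, vi + 1, di + 1   # levels are 1..9
```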

FIG. 12 is an example of mapping emotions using the emotional response model, and shows the concept of mapping to the emotion model using the three-dimensional emotional elements Arousal, Valence, and Dominance, as performed by the emotion model mapping module. As mentioned above, the Arousal element means the intensity of arousal, the Valence element means positivity or negativity, and Dominance means the degree of control. As shown in FIG. 12(a), various emotions can be mapped onto a two-dimensional plane using the 1-to-9 Arousal and Valence results from the scoring module. In addition, as shown in FIG. 12(b), after emotion mapping is performed on the two-dimensional plane, the intensity of the emotion can be determined from the 1-to-9 Dominance result.

For example, assuming the largest Arousal value is 7 and the largest Valence value is 8, mapping is performed on the two-dimensional plane as shown in FIG. 12(a) and the emotion Happy is output. Further, assuming the largest Arousal value is 7, the Valence value is 8, and the Dominance value is 2 as shown in FIG. 12(b), a happy emotion of weak intensity is mapped in the three-dimensional space.
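The mapping of FIG. 12 can be sketched as a nearest-prototype lookup in the Arousal-Valence plane, with Dominance giving the intensity. The prototype coordinates below are illustrative placements, not values from the patent.

```python
# Sketch: map an (Arousal, Valence, Dominance) result to a discrete emotion by
# nearest prototype in the A-V plane; Dominance indicates intensity.
import numpy as np

EMOTION_PROTOTYPES = {          # (Arousal, Valence) prototypes on the 1..9 scale
    "happy":   (7, 8),
    "angry":   (8, 2),
    "sad":     (3, 2),
    "relaxed": (3, 7),
    "neutral": (5, 5),
}

def map_emotion(arousal, valence, dominance):
    coords = np.array(list(EMOTION_PROTOTYPES.values()), dtype=float)
    names = list(EMOTION_PROTOTYPES)
    idx = int(np.argmin(np.linalg.norm(coords - np.array([arousal, valence]), axis=1)))
    intensity = "strong" if dominance >= 5 else "weak"   # per FIG. 12(b)
    return names[idx], intensity

# Example: map_emotion(7, 8, 2) -> ("happy", "weak"), matching the text above.
```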

FIG. 13 is another example of mapping emotions using the emotional response model, and shows an example in which the seven predicted complex emotions corresponding to the top 1% of the 729 values with the highest similarity are mapped onto the three-dimensional emotion model. The larger the marker, the more similar the result is to the corresponding emotion in the 3D emotion model and the higher the intensity of the emotion.

It should be noted that the present embodiments and the accompanying drawings describe only a part of the technical idea included in the above-described technology, and that variations and specific embodiments which can readily be deduced by those skilled in the art within the scope of the technical idea included in the specification and drawings are included in the scope of the above-described technology.

100: a system for performing emotion recognition
110: Portable terminal
120: PC
125: camera
150: Server

Claims (12)

A computer device detecting, in an image, at least one region of interest among a first region of interest of a face region included in the image and a second region of interest of a skin region included in the image or of the face region;
the computer device extracting a morphological feature value in the first region of interest and a skin color value in the second region of interest;
the computer device estimating a bio-signal using the color value;
the computer device mapping the morphological feature value and the bio-signal to intensity values for emotional response values; and
the computer device calculating an emotion recognition result by applying the intensity values to an emotional response model.
The method according to claim 1,
wherein the computer device analyzes a histogram of the image in the face region to detect a plurality of feature points and determines a distance between two of the plurality of feature points as the morphological feature value.
The method according to claim 1,
wherein the morphological feature value is at least one of the degree to which the inner eyebrows are raised, the degree to which the outer eyebrows are raised, the distance between the two eyebrows, the degree of upper lip elevation, the degree to which the corners of the mouth are pulled, and the distance between the two lips.
The method according to claim 1,
wherein the bio-signal is at least one of a pulse wave, a pulse rate, a blood pressure, a respiration rate, and an oxygen saturation.
The method according to claim 1,
wherein, in the step of estimating the bio-signal, the computer device estimates a pulse wave (PPG) signal by counting, for each frame of the image, the number of pixels in the second region of interest whose red value is equal to or greater than a threshold value.
The method according to claim 1,
wherein, in the step of estimating the bio-signal, the computer device calculates an average brightness value of the second region of interest for each frame of the image, converts the average-value signal extracted over consecutive frames of a predetermined length into the frequency domain, and estimates a pulse signal using a frequency component having a magnitude equal to or greater than a threshold value.
The method according to claim 1,
wherein, in the step of estimating the bio-signal, the computer device calculates an average brightness value of each of two second regions of interest for every frame of the image, converts the average-value signals extracted over consecutive frames of a predetermined length into the frequency domain, determines associated peak points having magnitudes equal to or greater than a threshold value among the frequency components for the two regions, estimates a pulse wave transit time based on the time difference between the two peak points, and estimates a blood pressure using the pulse wave transit time.
The method according to claim 1,
wherein, in the step of estimating the bio-signal, the computer device calculates an average brightness value of the second region of interest for each frame of the image, converts the average-value signal extracted over consecutive frames of a predetermined length into the frequency domain, and estimates the respiration rate using the frequency components within the range of 0.13 to 0.33 Hz.
The method according to claim 1,
wherein, in the step of estimating the bio-signal, the computer device calculates an average value of each of the R color value, the G color value, and the B color value of the second region of interest over a plurality of frames of the image, and estimates the oxygen saturation based on these average values.
The method according to claim 1,
wherein the computer device determines an intensity value for the morphological feature value using a mapping table provided in advance, and determines an intensity value for the bio-signal using a mapping table provided in advance.
The method according to claim 1,
wherein the emotional response model is a two-dimensional model based on arousal and valence, or a three-dimensional model based on arousal, valence, and dominance.
The method according to claim 1,
wherein the computer device calculates the emotion recognition result using at least one recognition result whose emotion intensity, among the recognition results calculated with the emotional response model, is equal to or greater than a reference value.
KR1020150191185A 2015-12-31 2015-12-31 Emotion recognition method based on image KR101738278B1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
KR1020150191185A KR101738278B1 (en) 2015-12-31 2015-12-31 Emotion recognition method based on image

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
KR1020150191185A KR101738278B1 (en) 2015-12-31 2015-12-31 Emotion recognition method based on image

Publications (1)

Publication Number Publication Date
KR101738278B1 true KR101738278B1 (en) 2017-05-22

Family

ID=59049827

Family Applications (1)

Application Number Title Priority Date Filing Date
KR1020150191185A KR101738278B1 (en) 2015-12-31 2015-12-31 Emotion recognition method based on image

Country Status (1)

Country Link
KR (1) KR101738278B1 (en)


Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20190007803A (en) * 2017-07-13 2019-01-23 성균관대학교산학협력단 Method And Apparatus For Measuring Bio-Signal Using Infrared Image
KR101996996B1 (en) * 2017-07-13 2019-07-05 성균관대학교산학협력단 Method And Apparatus For Measuring Bio-Signal Using Infrared Image
WO2019078407A1 (en) * 2017-10-18 2019-04-25 상명대학교산학협력단 Apparatus and method for estimating emotions by using surrounding environment images
KR20190053602A (en) * 2017-11-10 2019-05-20 삼성전자주식회사 Face verifying method and apparatus
US11551476B2 (en) 2017-11-10 2023-01-10 Samsung Electronics Co., Ltd. Facial verification method and apparatus
KR102415509B1 (en) * 2017-11-10 2022-07-01 삼성전자주식회사 Face verifying method and apparatus
KR102142183B1 (en) 2018-04-30 2020-08-06 한국과학기술원 Method for estimating emotion based on psychological activity and biosignal of user and system therefor
KR20190125707A (en) * 2018-04-30 2019-11-07 한국과학기술원 Method for estimating emotion based on psychological activity and biosignal of user and system therefor
KR102152120B1 (en) 2018-07-11 2020-09-04 한국과학기술원 Automated Facial Expression Recognizing Systems on N frames, Methods, and Computer-Readable Mediums thereof
KR20200010680A (en) * 2018-07-11 2020-01-31 한국과학기술원 Automated Facial Expression Recognizing Systems on N frames, Methods, and Computer-Readable Mediums thereof
KR20200017797A (en) * 2018-08-09 2020-02-19 주식회사 룩시드랩스 Method for estimating emotion of user and apparatus therefor
KR102186580B1 (en) 2018-08-09 2020-12-03 주식회사 룩시드랩스 Method for estimating emotion of user and apparatus therefor
US20200110890A1 (en) * 2018-10-08 2020-04-09 Hyundai Motor Company Multi device system and method of controlling the same
KR20200140469A (en) * 2019-06-07 2020-12-16 와이케이씨테크(주) Method for measuringing skin and health indicators of an user using a video image and apparatus using the same
KR102305309B1 (en) * 2019-06-07 2021-09-27 와이케이씨테크(주) Method for measuringing skin and health indicators of an user using a video image and apparatus using the same
KR20210090305A (en) * 2020-01-09 2021-07-20 주식회사 휴메닉 Multi-type cognitive rehabilitation training system and method
KR102427569B1 (en) * 2020-01-09 2022-08-02 주식회사 휴메닉 Multi-type cognitive rehabilitation training system and method
WO2022211656A1 (en) * 2021-03-30 2022-10-06 Harman Becker Automotive Systems Gmbh Method and system for heart rate extraction from rgb images


Legal Events

Date Code Title Description
E701 Decision to grant or registration of patent right
GRNT Written decision to grant