CN110765838B - Real-time dynamic analysis method for facial feature region for emotional state monitoring - Google Patents

Real-time dynamic analysis method for facial feature region for emotional state monitoring

Info

Publication number
CN110765838B
CN110765838B (application CN201910823146.9A)
Authority
CN
China
Prior art keywords
image
signal
preset
feature points
channel
Prior art date
Legal status
Active
Application number
CN201910823146.9A
Other languages
Chinese (zh)
Other versions
CN110765838A (en)
Inventor
Ding Shuai (丁帅)
Li Yinghui (李莹辉)
Yang Shanlin (杨善林)
Wang Linjie (王林杰)
Li Xiaojian (李霄剑)
Li Zhili (李志利)
Zhang Caiyun (张彩云)
He Li (贺利)
Current Assignee
Hefei University of Technology
China Astronaut Research and Training Center
Original Assignee
Hefei University of Technology
China Astronaut Research and Training Center
Priority date
Filing date
Publication date
Application filed by Hefei University of Technology and China Astronaut Research and Training Center
Priority to CN201910823146.9A
Publication of CN110765838A
Application granted
Publication of CN110765838B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/20 Image preprocessing
    • G06V10/25 Determination of region of interest [ROI] or a volume of interest [VOI]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification

Abstract

The method extracts spatial feature information of a plurality of preset facial feature points from a continuous facial image sequence of a monitored target individual based on the Dlib face detection library and the Surrey face model, obtains color change features of a specific facial skin region of interest in order to estimate the heart rate of the target individual, and builds SVM classifiers on the feature information of the plurality of preset feature points and on the heart rate value to determine the emotion distribution of the monitored target individual. The scheme of the application overcomes the defects of the prior art, in which the monitoring data are not comprehensive enough, the data type is single and the emotion analysis method is simple, and comprehensively improves the accuracy of real-time analysis of the emotional state of the monitored target individual.

Description

Real-time dynamic analysis method for facial feature region for emotional state monitoring
Technical Field
The application relates to the field of information processing and psychology, in particular to a real-time dynamic analysis method for a facial feature area for emotional state monitoring.
Background
Emotion, a general term for a series of subjective cognitive experiences, is a psychological and physiological state resulting from the integration of various senses, ideas and behaviors. The most common emotions are happiness, anger, grief, surprise, terror and love, and there are also more subtle emotions such as jealousy, envy, shame and self-abasement. Emotion often interacts with factors such as mood, character, temperament and motivation, and is also affected by hormones and neurotransmitters. Both positive and negative emotions motivate people to act. Although some emotion-driven behaviors appear to occur without thought, in practice consciousness is one of the important links in producing emotion. It can therefore be seen that attention to the emotional characteristics of an individual plays a very important role in emotional guidance and in people's safety.
At present, in technical schemes for acquiring the emotional characteristics of individuals, the acquired data are not comprehensive enough and are of a single type, and the methods used to analyze the acquired data are simple, so that the emotional characteristics determined in the prior art have low accuracy.
Disclosure of Invention
Technical problem to be solved
In view of the defects of the prior art, the application provides a real-time dynamic analysis method for facial feature regions for emotional state monitoring, which overcomes the low accuracy of the emotional features determined in the prior art.
(II) technical scheme
In order to achieve the above purpose, the present application is implemented by the following technical solutions:
the application provides a real-time dynamic analysis method for a facial feature region for emotional state monitoring, which comprises the following steps:
acquiring a video image of a target individual;
extracting a plurality of facial images from the video image;
for each face image, changing the gray value of at least one pixel point in the face image based on the number of the pixel points of the face image in each gray value level, so that the difference value of the number of the pixel points in different gray value levels is smaller than a preset value, and obtaining a gray image corresponding to the face image; removing isolated point noise in the corresponding gray level image of the face image;
determining the position information of a plurality of preset feature points in each gray level image; performing geometric normalization processing on the feature points corresponding to the centers of the eyes, the upper lip and the forehead in the plurality of preset feature points; calculating distance information and angle information between any two preset feature points in the preset feature points based on the position information of the preset feature points; determining first probability distribution of the gray level image corresponding to each preset emotion type based on distance information and angle information between any two preset feature points in the plurality of preset feature points;
extracting an image in a preset area from each facial image, and amplifying a skin color change signal in the image in the preset area; acquiring R channel image information, G channel image information and B channel image information in an image amplified by a skin color change signal; decomposing a signal corresponding to R channel image information into a plurality of sub-signals, screening the plurality of sub-signals by using a first preset threshold value to obtain an R channel target signal, and normalizing the R channel target signal; decomposing a signal corresponding to G channel image information into a plurality of sub-signals, screening the plurality of sub-signals by using a second preset threshold value to obtain a G channel target signal, and carrying out normalization processing on the G channel target signal; decomposing a signal corresponding to B-channel image information into a plurality of sub-signals, screening the plurality of sub-signals by using a third preset threshold value to obtain a B-channel target signal, and normalizing the B-channel target signal; determining an effective chrominance signal by using the R channel target signal, the G channel target signal and the B channel target signal after normalization processing;
for each face image, converting an effective chrominance signal corresponding to the face image into a frequency domain signal, and determining a heart rate value of a target individual based on a frequency value at the peak value of the frequency domain signal obtained by conversion; determining a second probability distribution of the face image corresponding to each preset emotion category based on the determined heart rate value;
and determining a target emotion category of the target individual based on the first probability distribution, the second probability distribution, the heart rate characteristic weight information and the image characteristic weight information.
In one possible embodiment, the image in the preset area is what remains of the corresponding face image after the eye position image is removed.
In a possible implementation manner, the plurality of preset feature points include a feature point corresponding to an eye position, a feature point corresponding to a lip position, and a feature point corresponding to a forehead position.
In a possible implementation manner, the feature points corresponding to the forehead position comprise 13 feature points.
In one possible embodiment, the amplifying the skin color variation signal in the image in the preset area includes:
performing multi-resolution decomposition on a sub-image sequence corresponding to an image in a preset area by using a wavelet analysis theory to obtain first image sequences with different frequencies; filtering the decomposed image sequences with different frequencies according to a preset band-pass filter to obtain a second image sequence; amplifying the second image sequences with different frequencies by using different amplification coefficients to obtain a third image sequence; and recombining the third image sequence into images by using a wavelet reconstruction method.
In one possible embodiment, the extracting a plurality of facial images from the video image includes:
and carrying out face recognition on the video image by using opencv to obtain the plurality of face images.
In a possible embodiment, the determining the effective chrominance signal by using the normalized R-channel target signal, G-channel target signal, and B-channel target signal includes:
the effective chrominance signals are determined using the following equations:
X_s = 3R_n - 2G_n
Y_s = [second chrominance equation; rendered only as an image in the original, it defines Y_s in terms of R_n, G_n and B_n]
where X_s and Y_s represent the effective chrominance signals, R_n represents the nth R channel target signal, G_n represents the nth G channel target signal, and B_n represents the nth B channel target signal.
In a possible embodiment, the determining a heart rate value of the target individual based on the frequency value at the peak of the frequency-domain signal obtained by the transformation includes:
determining the heart rate value using the formula:
HeartRate = 60 * F_max
where HeartRate represents the heart rate value and F_max represents the frequency value at the peak of the frequency domain signal.
(III) advantageous effects
The application provides a real-time dynamic analysis method for a facial feature region for emotional state monitoring. The method has the following beneficial effects:
the method comprises the steps of extracting feature information of a plurality of preset feature points capable of reflecting the emotion of a target individual from a plurality of facial images of the target individual, determining the heart rate value of the target individual based on features of skin colors in the facial images, analyzing the feature information of the plurality of preset feature points and the heart rate value of the target individual, and determining the emotion category of the target individual. The scheme of the application overcomes the defects that data for analyzing individual emotion characteristics are not comprehensive enough, the type is single, and the method is simple in the prior art, and the accuracy of the determined emotion category of the target individual is improved comprehensively.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present application, and that those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 schematically shows a flow chart of a method for real-time dynamic analysis of facial feature regions for emotional state monitoring according to an embodiment of the present application;
fig. 2 is a schematic diagram schematically illustrating an image within a preset region obtained from a face image extracted in the present application;
fig. 3 schematically shows a flowchart of a real-time dynamic analysis method of facial feature regions for emotional state monitoring according to another embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
In order to overcome the defects of the prior art, namely that the data used to analyze individual emotional characteristics are not comprehensive enough, the data type is single, the method is simple and the accuracy is low, the application provides a real-time dynamic analysis method for facial feature regions for emotional state monitoring: facial feature points reflecting emotion are extracted from a visible-light face video, that is, the video image, heart rate parameters are extracted from the facial images, and the emotional state is analyzed comprehensively, which effectively improves the accuracy of the determined emotion category of the target individual. Specifically, as shown in fig. 1, the method of the present application includes the following steps:
s110, acquiring a video image of a target individual; and extracting a plurality of face images from the video image.
Here, the video image of the target individual may be acquired with a visible light camera, and face recognition is performed on the video image using opencv to obtain the plurality of face images.
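As an illustration only, the following Python sketch shows how this step could be realized with OpenCV's bundled Haar cascade detector; the cascade file, the choice to keep the largest face per frame and the function name extract_face_images are assumptions, since the application only specifies that opencv is used for face recognition.

    import cv2

    def extract_face_images(video_path):
        """Detect one face region per frame of a visible-light video (illustrative sketch)."""
        cascade = cv2.CascadeClassifier(
            cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
        cap = cv2.VideoCapture(video_path)
        faces = []
        while True:
            ok, frame = cap.read()
            if not ok:
                break
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            boxes = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
            if len(boxes) > 0:
                # Keep the largest detection as the face image for this frame.
                x, y, w, h = max(boxes, key=lambda b: b[2] * b[3])
                faces.append(frame[y:y + h, x:x + w])
        cap.release()
        return faces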
S120, aiming at each face image, changing the gray value of at least one pixel point in the face image based on the number of the pixel points of the face image in each gray value level, so that the difference value of the number of the pixel points in different gray value levels is smaller than a preset numerical value, and obtaining the corresponding gray image of the face image; and removing isolated point noise in the corresponding gray-scale image of the face image.
In this step, the histogram equalization and median filtering methods may be used to perform gray-level correction and noise filtering on the face image. Histogram equalization changes the gray values of the face image so that each gray level contains as nearly the same number of pixels as possible; the histogram tends toward balance and the gray image corresponding to the face image is obtained. Median filtering removes isolated-point noise from the gray image while preserving image edges.
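A minimal sketch of this preprocessing step, assuming OpenCV and a 3x3 median kernel (the kernel size is not specified in the application):

    import cv2

    def preprocess_face(face_bgr, median_ksize=3):
        """Gray-level correction and isolated-point noise removal for one face image."""
        gray = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2GRAY)
        equalized = cv2.equalizeHist(gray)                   # balance pixel counts across gray levels
        denoised = cv2.medianBlur(equalized, median_ksize)   # suppress isolated-point (salt-and-pepper) noise
        return denoised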
S130, determining the position information of a plurality of preset feature points in each gray level image; performing geometric normalization processing on feature points corresponding to the centers of the eyes, the upper lips and the forehead in the plurality of preset feature points; calculating distance information and angle information between any two preset feature points in the preset feature points based on the position information of the preset feature points; and determining first probability distribution of the gray level image corresponding to each preset emotion category based on distance information and angle information between any two preset feature points in the plurality of preset feature points.
In this step, 81 feature points of the gray-scale image are calibrated and matched based on the Dlib library and the Surrey face model: on the basis of the 68 facial feature points detected by the Dlib library, 13 additional feature points in the forehead area are trained with the Surrey face model, so that the forehead area of the face is covered and accuracy is improved.
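For illustration, the sketch below detects the 68 standard Dlib landmarks; the shape_predictor_68_face_landmarks.dat file is Dlib's publicly distributed predictor model, and the 13 additional forehead points trained with the Surrey face model are not reproduced here.

    import dlib

    detector = dlib.get_frontal_face_detector()
    predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

    def detect_landmarks(gray_image):
        """Return the 68 Dlib landmark coordinates of the first detected face, or None."""
        rects = detector(gray_image, 1)
        if not rects:
            return None
        shape = predictor(gray_image, rects[0])
        return [(p.x, p.y) for p in shape.parts()]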
After the positions of the feature points are calibrated for the face image, the feature points of the eyes, the center of the upper lip and the center of the forehead are selected, and geometric normalization is carried out on the image, so that the influence of head posture and of individual facial differences on the appearance is eliminated.
In this step, in the geometric feature extraction stage, the distances and angles between the geometric feature points are calculated from the normalized facial feature points corresponding to the input image, yielding the distance features and the Delaunay triangle angle features of the image.
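A sketch of the geometric feature computation, assuming NumPy and SciPy; pairwise Euclidean distances and the interior angles of the Delaunay triangulation are computed over the normalized landmark coordinates (the exact feature ordering used in the application is not specified).

    import numpy as np
    from scipy.spatial import Delaunay

    def geometric_features(points):
        """Distance and Delaunay-triangle angle features from normalized landmarks (N x 2)."""
        pts = np.asarray(points, dtype=float)
        # Pairwise Euclidean distances between every two preset feature points.
        diff = pts[:, None, :] - pts[None, :, :]
        dists = np.sqrt((diff ** 2).sum(-1))
        distance_features = dists[np.triu_indices(len(pts), k=1)]
        # Interior angles of each Delaunay triangle built over the landmarks.
        angles = []
        for a, b, c in Delaunay(pts).simplices:
            for i, j, k in ((a, b, c), (b, c, a), (c, a, b)):
                v1, v2 = pts[j] - pts[i], pts[k] - pts[i]
                cosang = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-12)
                angles.append(np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0))))
        return np.concatenate([distance_features, np.asarray(angles)])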
In this step, the geometric characteristics of the human face under different emotional states are learned and trained with an SVM classifier, and a facial-feature emotion analysis model is established. The trained classifier then determines, from the distance information and angle information between any two of the plurality of preset feature points, the first probability distribution of the gray-scale image over the preset emotion categories.
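A sketch of this classifier, assuming scikit-learn; the RBF kernel and the helper names are assumptions, and predict_proba supplies the per-category probabilities used as the first probability distribution.

    from sklearn.svm import SVC

    def train_facial_emotion_svm(feature_matrix, emotion_labels):
        """Train an SVM on geometric face features labelled with preset emotion categories."""
        clf = SVC(kernel="rbf", probability=True)
        clf.fit(feature_matrix, emotion_labels)
        return clf

    def first_probability_distribution(clf, features):
        """Probability of each preset emotion category for one gray-scale image."""
        return clf.predict_proba(features.reshape(1, -1))[0]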
S140, extracting an image in a preset area from each facial image, and amplifying a skin color change signal in the image in the preset area; acquiring R channel image information, G channel image information and B channel image information in an image amplified by a skin color change signal; decomposing a signal corresponding to R channel image information into a plurality of sub-signals, screening the plurality of sub-signals by using a first preset threshold value to obtain an R channel target signal, and normalizing the R channel target signal; decomposing a signal corresponding to G channel image information into a plurality of sub-signals, screening the plurality of sub-signals by using a second preset threshold value to obtain a G channel target signal, and carrying out normalization processing on the G channel target signal; decomposing a signal corresponding to the B channel image information into a plurality of sub-signals, screening the plurality of sub-signals by using a third preset threshold value to obtain a B channel target signal, and carrying out normalization processing on the B channel target signal; and determining an effective chrominance signal by using the R channel target signal, the G channel target signal and the B channel target signal after normalization processing.
In this step, a region of interest is selected in each facial image, the skin color information of the region is extracted across consecutive frames, and an Eulerian magnification algorithm is used to amplify the skin color change signal.
The region of interest may be what remains of the corresponding facial image after the eye region is removed; the resulting sequence of images of interest is shown in fig. 2.
In this step, amplifying the skin color change signal in the image in the preset region includes:
performing multi-resolution decomposition on a sub-image sequence corresponding to an image in a preset area by using a wavelet analysis theory to obtain first image sequences with different frequencies; filtering the decomposed image sequences with different frequencies according to a preset band-pass filter to obtain a second image sequence; amplifying the second image sequences with different frequencies by using different amplification coefficients to obtain third image sequences; and recombining the third image sequence into images by using a wavelet reconstruction method.
The signal amplification consists of four stages. First, spatial filtering: the image sequence to be processed is decomposed at multiple resolutions using wavelet analysis theory to obtain video image sequences of different frequencies. Second, temporal filtering: the decomposed video sequences of different frequencies are filtered with suitable band-pass filters to obtain the video image sequence of interest. Third, amplification of the filtering result: the video sequences of different frequencies are amplified with different amplification coefficients according to corresponding rules so that the sequence of interest is effectively amplified. Finally, image synthesis: the video image is re-synthesized by wavelet reconstruction.
The step of amplifying the signal is to regard the whole image as dynamically changing and then amplify the skin color change signal.
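The sketch below illustrates this amplification pipeline under simplifying assumptions: a single-level 2-D wavelet decomposition per frame, a Butterworth band-pass of 0.75-3 Hz (roughly 45-180 beats per minute) applied along time, and a single gain alpha; the wavelet, band and gain values are not specified in the application.

    import numpy as np
    import pywt
    from scipy.signal import butter, filtfilt

    def magnify_color_signal(frames, fps, low=0.75, high=3.0, alpha=20.0, wavelet="haar"):
        """Eulerian-style magnification of the skin-color variation in a single-channel ROI sequence."""
        # 1) Spatial decomposition: one 2-D wavelet level per frame.
        coeffs = [pywt.dwt2(f, wavelet) for f in frames]          # (cA, (cH, cV, cD))
        approx = np.stack([c[0] for c in coeffs])                 # shape (T, h, w)
        # 2) Temporal band-pass filtering of every spatial location.
        b, a = butter(2, [low / (fps / 2.0), high / (fps / 2.0)], btype="band")
        filtered = filtfilt(b, a, approx, axis=0)
        # 3) Amplify the band-passed variation and add it back.
        amplified = approx + alpha * filtered
        # 4) Reconstruct each frame with its original detail coefficients.
        return [pywt.idwt2((amplified[t], coeffs[t][1]), wavelet)
                for t in range(len(frames))]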
This step also denoises the signal using the continuous wavelet transform: a signal is decomposed into a series of wavelets, the basis functions of the wavelet transform, which have finite duration and an oscillatory character. Through the wavelet transform, the statistical properties of the noisy image can be obtained, and an appropriate threshold function is selected so that the wavelet coefficients of the image itself are retained as far as possible while the wavelet coefficients of the noise are removed as far as possible.
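A sketch of the thresholding idea, using the discrete wavelet transform with soft universal thresholding as a stand-in for the continuous-wavelet denoising described above; the db4 wavelet, decomposition level and threshold rule are assumptions.

    import numpy as np
    import pywt

    def wavelet_denoise(signal, wavelet="db4", level=4):
        """Soft-threshold wavelet denoising of a 1-D channel signal."""
        coeffs = pywt.wavedec(signal, wavelet, level=level)
        # Noise scale estimated from the finest detail coefficients (universal threshold).
        sigma = np.median(np.abs(coeffs[-1])) / 0.6745
        thresh = sigma * np.sqrt(2.0 * np.log(len(signal)))
        coeffs = [coeffs[0]] + [pywt.threshold(c, thresh, mode="soft") for c in coeffs[1:]]
        return pywt.waverec(coeffs, wavelet)[: len(signal)]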
After the image is denoised, the R, G and B three-channel signals of the image are normalized:
[normalization equation shown only as an image in the original, expressed in terms of the moving average μ(C_i)]
where μ(C_i) represents the moving average of the i-th frame image.
In this step, the determining an effective chrominance signal using the R channel target signal, the G channel target signal, and the B channel target signal after the normalization process includes:
the effective chrominance signals are determined using the following equations:
X_s = 3R_n - 2G_n
Y_s = [second chrominance equation; rendered only as an image in the original, it defines Y_s in terms of R_n, G_n and B_n]
where X_s and Y_s represent the effective chrominance signals, R_n represents the nth R channel target signal, G_n represents the nth G channel target signal, and B_n represents the nth B channel target signal.
In the step of determining the effective chrominance signals, the R, G and B three-channel signals are extracted and the amplified signal is separated into its primary colors.
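For illustration, the sketch below computes the chrominance signals from the per-frame channel values; X_s follows the equation given above, while the Y_s coefficients and the 30-frame moving-average window are assumptions taken from the standard chrominance-based (CHROM) formulation, since the second equation appears only as an image in the original.

    import numpy as np

    def chrominance_signals(r, g, b, window=30):
        """Effective chrominance signals from the R, G and B channel time series."""
        def normalize(c):
            c = np.asarray(c, dtype=float)
            mean = np.convolve(c, np.ones(window) / window, mode="same")  # moving average
            return c / mean
        r_n, g_n, b_n = normalize(r), normalize(g), normalize(b)
        x_s = 3.0 * r_n - 2.0 * g_n          # as in the description
        y_s = 1.5 * r_n + g_n - 1.5 * b_n    # assumed CHROM-style second component
        return x_s, y_s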
S150, aiming at each facial image, converting the effective chrominance signal corresponding to the facial image into a frequency domain signal, and determining the heart rate value of the target individual based on the frequency value of the peak value of the frequency domain signal obtained by conversion; and determining a second probability distribution of the face image corresponding to each preset emotion category based on the determined heart rate value.
In this step, determining a heart rate value of the target individual based on the frequency value at the peak of the frequency domain signal obtained by the transformation includes:
determining the heart rate value using the formula:
HeartRate = 60 * F_max
where HeartRate represents the heart rate value and F_max represents the frequency value at the peak of the frequency domain signal.
In this step, the obtained signal is converted to the frequency domain by the Fourier transform, and the position of the maximum peak is found on the frequency-domain plot, from which the corresponding heart rate value can be calculated. The Fourier transform describes the signal as a sum of complex exponentials of different amplitudes, frequencies and phases.
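A minimal sketch of this frequency-domain step, assuming NumPy's real FFT and a plausible heart-rate search band of 0.75-3 Hz (the band is an assumption):

    import numpy as np

    def heart_rate_from_signal(chroma_signal, fps, low=0.75, high=3.0):
        """Heart rate in beats per minute from the spectral peak of the chrominance signal."""
        x = np.asarray(chroma_signal, dtype=float)
        x = x - x.mean()
        spectrum = np.abs(np.fft.rfft(x))
        freqs = np.fft.rfftfreq(len(x), d=1.0 / fps)
        band = (freqs >= low) & (freqs <= high)        # restrict to plausible heart rates
        f_max = freqs[band][np.argmax(spectrum[band])]
        return 60.0 * f_max                            # HeartRate = 60 * F_max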
The heart rate values are trained with a support vector machine (SVM), an emotion analysis model is established, and the probability values of the corresponding preset emotion categories are output.
And S160, determining the target emotion category of the target individual based on the first probability distribution, the second probability distribution, the heart rate characteristic weight information and the image characteristic weight information.
The first probability distribution and the second probability distribution are combined by weighted averaging to obtain the final probability value of each preset emotion category, and the emotion category with the highest probability value is selected and output as the target emotion category of the target individual.
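A sketch of this fusion step; equal image-feature and heart-rate-feature weights are an assumption, since the application leaves the weight values open.

    import numpy as np

    def fuse_probabilities(first_probs, second_probs, image_weight=0.5, heart_rate_weight=0.5):
        """Weighted average of the two probability distributions; returns the winning category index."""
        fused = (image_weight * np.asarray(first_probs)
                 + heart_rate_weight * np.asarray(second_probs))
        return int(np.argmax(fused)), fused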
This embodiment combines two emotion analysis schemes: extracting facial feature points that reflect emotion from a visible-light image or visible-light video (that is, the video image), and extracting a facial region of interest from the visible-light video to obtain a heart rate parameter. Emotion classification is judged by combining the visible-light face image, with additional feature-point calibration of the forehead area, with the heart rate value obtained from the selected facial region of interest, which improves the accuracy of emotion judgment.
As shown in fig. 3, the method for extracting facial feature points reflecting emotion from the visible-light video includes the following steps: a visible-light image is collected by non-contact equipment; face detection is performed on the visible-light image with opencv to obtain a face image; histogram equalization and median filtering are used for gray-level correction and noise filtering, which effectively removes the influence of light, noise and the like on the image; real-time calibration and matching of 81 facial feature points is performed with the Dlib library and the Surrey face model, adding 13 feature points in the forehead area to the 68 points calibrated by the Dlib library, which covers the forehead area, effectively obtains the contour information of the face and reduces the loss of facial information; the facial feature quantities that can reflect the psychological state are extracted, and the image with calibrated facial feature points undergoes geometric normalization; finally, the facial features are trained with a support vector machine (SVM) and an emotional state analysis model is established to obtain the probability output for each emotion category.
The method for extracting the heart rate parameter from the visible-light video includes the following steps: a facial region of interest (ROI) is extracted from the visible-light video to avoid interference from non-skin areas such as the eyes; the facial color change information of the skin area is extracted; the change of the color signal is amplified with an Eulerian magnification algorithm; the R, G and B channel signals are extracted from the amplified signal; the three channel signals are denoised with the continuous wavelet transform and normalized; effective information is extracted from the denoised three-channel signals and the corresponding chrominance signals are calculated; the obtained signal is converted to the frequency domain with the Fourier transform to obtain a continuous heart rate value; and the heart rate value is trained with a support vector machine (SVM), an emotion analysis model is established, and the probability value of each corresponding emotion category is output. In this step, when the heart rate value is obtained from the facial video, a fixed facial region of interest is selected and the change of the color signal is amplified with the Eulerian magnification algorithm; the running speed is high, so the video image can be processed almost in real time. Meanwhile, when the heart rate value is obtained from the face image, the signal is calculated and processed with a chrominance-based extraction method, so that the extracted signal is optimized.
The probabilities of the emotion categories given by the facial features and by the heart rate value are averaged with weights to obtain the final probability value of each emotion category, and the emotion category with the highest probability value is selected as the output.
The real-time dynamic analysis method for facial feature regions for emotional state monitoring extracts, based on the Dlib face detection library and the Surrey face model, spatial feature information of a plurality of preset facial feature points from a continuous facial image sequence of the monitored target individual, obtains the color change features of a specific facial skin region of interest in order to estimate the heart rate value of the target individual, and establishes SVM classifiers on the feature information of the plurality of preset feature points and on the heart rate value of the target individual to determine the emotion distribution of the monitored target individual. The scheme of the application overcomes the defects of the prior art, in which the monitoring data are not comprehensive enough, the data type is single and the emotion analysis method is simple, and comprehensively improves the accuracy of real-time analysis of the emotional state of the monitored target individual.
It is noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of additional like elements in the process, method, article, or apparatus that comprises the element.
The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (8)

1. A real-time dynamic analysis method for facial feature regions for emotional state monitoring is characterized by comprising the following steps:
acquiring a video image of a target individual;
extracting a plurality of facial images from the video image;
for each face image, changing the gray value of at least one pixel point in the face image based on the number of the pixel points of the face image in each gray value level, so that the difference value of the number of the pixel points in different gray value levels is smaller than a preset value, and obtaining a gray image corresponding to the face image; removing isolated point noise in the corresponding gray level image of the face image;
determining the position information of a plurality of preset feature points in each gray level image; performing geometric normalization processing on the feature points corresponding to the centers of the eyes, the upper lip and the forehead in the plurality of preset feature points; calculating distance information and angle information between any two preset feature points in the preset feature points based on the position information of the preset feature points; determining first probability distribution of the gray level image corresponding to each preset emotion type based on distance information and angle information between any two preset feature points in the plurality of preset feature points;
extracting an image in a preset area from each facial image, and amplifying a skin color change signal in the image in the preset area; acquiring R channel image information, G channel image information and B channel image information in an image amplified by a skin color change signal; decomposing a signal corresponding to R channel image information into a plurality of sub-signals, screening the plurality of sub-signals by using a first preset threshold value to obtain an R channel target signal, and normalizing the R channel target signal; decomposing a signal corresponding to G channel image information into a plurality of sub-signals, screening the plurality of sub-signals by using a second preset threshold value to obtain a G channel target signal, and carrying out normalization processing on the G channel target signal; decomposing a signal corresponding to the B channel image information into a plurality of sub-signals, screening the plurality of sub-signals by using a third preset threshold value to obtain a B channel target signal, and carrying out normalization processing on the B channel target signal; determining an effective chrominance signal by using the R channel target signal, the G channel target signal and the B channel target signal after normalization processing;
for each face image, converting an effective chrominance signal corresponding to the face image into a frequency domain signal, and determining a heart rate value of the target individual based on a frequency value at the peak value of the frequency domain signal obtained by conversion; determining a second probability distribution of the face image corresponding to each preset emotion category based on the determined heart rate value;
and determining a target emotion category of the target individual based on the first probability distribution, the second probability distribution, the heart rate characteristic weight information and the image characteristic weight information.
2. The method of claim 1, wherein the image in the preset area is what remains of the corresponding facial image after the eye position image is removed.
3. The method according to claim 1, wherein the plurality of preset feature points comprise feature points corresponding to eye positions, feature points corresponding to lip positions and feature points corresponding to forehead positions.
4. The method according to claim 3, wherein the feature points corresponding to the forehead position comprise 13 feature points.
5. The method according to claim 1, wherein the amplifying the skin color change signal in the image in the preset region comprises:
performing multi-resolution decomposition on a sub-image sequence corresponding to an image in a preset area by using a wavelet analysis theory to obtain first image sequences with different frequencies; filtering the decomposed image sequences with different frequencies according to a preset band-pass filter to obtain a second image sequence; amplifying the second image sequences with different frequencies by using different amplification coefficients to obtain third image sequences; and recombining the third image sequence into images by using a wavelet reconstruction method.
6. The method of claim 1, wherein said extracting a plurality of facial images from said video image comprises:
and carrying out face recognition on the video image by using opencv to obtain the plurality of face images.
7. The method of claim 1, wherein determining the valid chrominance signal using the normalized R-channel target signal, G-channel target signal, and B-channel target signal comprises:
the effective chrominance signals are determined using the following equations:
X_s = 3R_n - 2G_n
Y_s = [second chrominance equation; rendered only as an image in the original, it defines Y_s in terms of R_n, G_n and B_n]
where X_s and Y_s represent the effective chrominance signals, R_n represents the nth R channel target signal, G_n represents the nth G channel target signal, and B_n represents the nth B channel target signal.
8. The method of claim 1, wherein determining a heart rate value for the target individual based on the frequency values at the peaks of the transformed frequency domain signal comprises:
determining the heart rate value using the formula:
HeartRate = 60 * F_max
where HeartRate represents the heart rate value and F_max represents the frequency value at the peak of the frequency domain signal.
CN201910823146.9A 2019-09-02 2019-09-02 Real-time dynamic analysis method for facial feature region for emotional state monitoring Active CN110765838B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910823146.9A CN110765838B (en) 2019-09-02 2019-09-02 Real-time dynamic analysis method for facial feature region for emotional state monitoring

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910823146.9A CN110765838B (en) 2019-09-02 2019-09-02 Real-time dynamic analysis method for facial feature region for emotional state monitoring

Publications (2)

Publication Number Publication Date
CN110765838A CN110765838A (en) 2020-02-07
CN110765838B true CN110765838B (en) 2023-04-11

Family

ID=69329318

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910823146.9A Active CN110765838B (en) 2019-09-02 2019-09-02 Real-time dynamic analysis method for facial feature region for emotional state monitoring

Country Status (1)

Country Link
CN (1) CN110765838B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113724838B (en) * 2020-08-19 2023-06-20 麦乐峰(厦门)智能科技有限公司 Emotion identification system based on big data
CN112819790B (en) * 2021-02-02 2022-09-16 南京邮电大学 Heart rate detection method and device
CN113111855B (en) * 2021-04-30 2023-08-29 北京邮电大学 Multi-mode emotion recognition method and device, electronic equipment and storage medium
CN114287938B (en) * 2021-12-13 2024-02-13 重庆大学 Method and equipment for obtaining safety interval of human body parameters in building environment
CN116311510A (en) * 2023-03-08 2023-06-23 广东兆邦智能科技股份有限公司 Emotion detection method and system based on image acquisition

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109259748A (en) * 2018-08-17 2019-01-25 西安电子科技大学 The system and method for handset processes face video extraction heart rate signal
CN109977858A (en) * 2019-03-25 2019-07-05 北京科技大学 A kind of heart rate detection method and device based on image analysis

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3030151A4 (en) * 2014-10-01 2017-05-24 Nuralogix Corporation System and method for detecting invisible human emotion

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109259748A (en) * 2018-08-17 2019-01-25 西安电子科技大学 The system and method for handset processes face video extraction heart rate signal
CN109977858A (en) * 2019-03-25 2019-07-05 北京科技大学 A kind of heart rate detection method and device based on image analysis

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Heart rate detection under non-cooperative facial shaking; Qi Gang et al.; Journal of Image and Graphics (中国图象图形学报); 2017-01-16 (No. 01); full text *

Also Published As

Publication number Publication date
CN110765838A (en) 2020-02-07

Similar Documents

Publication Publication Date Title
CN110765838B (en) Real-time dynamic analysis method for facial feature region for emotional state monitoring
Miri et al. Retinal image analysis using curvelet transform and multistructure elements morphology by reconstruction
CN104077579B (en) Facial expression recognition method based on expert system
CN107862249B (en) Method and device for identifying split palm prints
CN109820499B (en) High anti-interference heart rate detection method based on video, electronic equipment and storage medium
CN109171753B (en) Electroencephalogram EEG (electroencephalogram) identity recognition method based on deep self-coding neural network
CN112270654A (en) Image denoising method based on multi-channel GAN
CN110458792B (en) Method and device for evaluating quality of face image
Guan et al. Visibility and distortion measurement for no-reference dehazed image quality assessment via complex contourlet transform
Monwar et al. Pain recognition using artificial neural network
CN109472788A (en) A kind of scar detection method on airplane riveting surface
CN114402359B (en) System and method for detecting a composite video of a person
Khandizod et al. Comparative analysis of image enhancement technique for hyperspectral palmprint images
CN113591769B (en) Non-contact heart rate detection method based on photoplethysmography
CN114582003A (en) Sleep health management system based on cloud computing service
CN114187201A (en) Model training method, image processing method, device, equipment and storage medium
CN109712095A (en) A kind of method for beautifying faces that rapid edge retains
Chakour et al. Blood vessel segmentation of retinal fundus images using dynamic preprocessing and mathematical morphology
CN117235576A (en) Method for classifying motor imagery electroencephalogram intentions based on Riemann space
CN104966271B (en) Image de-noising method based on biological vision receptive field mechanism
Karamizadeh et al. Race classification using gaussian-based weight K-nn algorithm for face recognition.
Fukai et al. Age and gender estimation by using facial image
Budhiraja et al. Effect of pre-processing on MST based infrared and visible image fusion
Kaur et al. EEG artifact suppression based on SOBI based ICA using wavelet thresholding
CN112487904A (en) Video image processing method and system based on big data analysis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant