CN110619301A - Emotion automatic identification method based on bimodal signals - Google Patents

Emotion automatic identification method based on bimodal signals

Info

Publication number
CN110619301A
Authority
CN
China
Prior art keywords
signal
facial expression
extracting
lbp
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910868310.8A
Other languages
Chinese (zh)
Other versions
CN110619301B (en)
Inventor
王峰
牛锦
魏祥
宋剑桥
相虎生
王飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Dao An Bang Tianjin Security Technology Co Ltd
Original Assignee
Dao An Bang Tianjin Security Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Dao An Bang Tianjin Security Technology Co Ltd filed Critical Dao An Bang Tianjin Security Technology Co Ltd
Priority to CN201910868310.8A priority Critical patent/CN110619301B/en
Publication of CN110619301A publication Critical patent/CN110619301A/en
Application granted granted Critical
Publication of CN110619301B publication Critical patent/CN110619301B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/25Determination of region of interest [ROI] or a volume of interest [VOI]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • G06V40/165Detection; Localisation; Normalisation using facial parts and geometric relationships
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/08Feature extraction
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Multimedia (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Geometry (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
  • Measuring Pulse, Heart Rate, Blood Pressure Or Blood Flow (AREA)

Abstract

The invention discloses an automatic emotion recognition method based on bimodal signals, which comprises: cutting and framing video data containing expression actions and extracting a facial expression picture sequence; extracting LBP-TOP features of the facial expression picture sequence; extracting a pulse wave signal from the facial expression picture sequence based on a chrominance model and extracting time-domain and frequency-domain features of the pulse wave signal; fusing the extracted LBP-TOP features with the time-domain and frequency-domain features of the pulse wave signal; and dividing the fused features into a training set and a testing set, inputting the training set into a support vector machine for training and optimization, and then using the trained support vector machine to realize automatic emotion recognition from the facial expression pictures. The invention greatly reduces the complexity of the system and improves its convenience, and the fused features avoid the low recognition accuracy caused by deliberate masking of emotion or the absence of obvious facial expression change.

Description

Emotion automatic identification method based on bimodal signals
Technical Field
The invention relates to the technical field of image processing, in particular to an automatic emotion recognition method based on a bimodal signal.
Background
Emotion recognition technology is becoming mature with the advancement of instruments and equipment and the development of artificial intelligence, and is widely applied in fields such as clinical medicine, emotional intelligence, national security and political psychology. Existing emotion recognition methods fall mainly into two categories. The first is physical-sign detection based on precision instruments, in which emotion recognition and classification are achieved by measuring physiological signals such as the human electroencephalogram, electrocardiogram and pulse wave. The second is intelligent facial emotion recognition based on machine learning, which mainly captures the motion of the facial muscles, for example the corners of the mouth rising when a person is happy.
However, both methods have advantages and disadvantages. The first usually requires expensive and complex equipment; although its detection results are highly accurate, the cost is high, the approach is contact-based, and data acquisition is laborious and easily causes discomfort to the subject. It is therefore limited in practical application, unsuitable for large-scale deployment, and mostly used in special scenarios such as emotion monitoring of astronauts or of soldiers after major rescue missions. The second is the recognition and detection means commonly used at present; it requires no expensive equipment and is simple to operate. However, the accuracy of the recognized emotion cannot be guaranteed: although facial expressions visually display emotional change, many internal emotional processes are not accompanied by visible facial activity, and people can mask and hide their emotional experience, so that observers misinterpret the meaning of an expression and recognition accuracy suffers.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide an automatic emotion recognition method based on bimodal signals, aiming at the above-mentioned defects in the prior art.
The technical scheme adopted by the invention for solving the technical problems is as follows: the method for automatically recognizing the emotion based on the bimodal signals comprises the following steps:
step one: cutting and framing video data containing expression actions, extracting the facial expression picture sequence from the onset to the end of the expression, and preprocessing the extracted facial expression picture sequence, wherein the preprocessing at least comprises geometric correction and normalization;
step two: extracting LBP-TOP characteristics of the facial expression picture sequence;
step three: extracting pulse wave signals of the facial expression picture sequence based on the chrominance model, and extracting time domain and frequency domain characteristics of the pulse wave signals;
step four: fusing the LBP-TOP characteristics of the extracted facial expression picture sequence with the time domain and frequency domain characteristics of the pulse wave signal;
step five: dividing the fused feature data into a training set and a testing set, inputting the training set into a support vector machine for training, performing test optimization with the testing set after training is finished, and realizing automatic emotion recognition in the facial expression pictures with the support vector machine after training and optimization.
Further, the step of extracting the LBP-TOP characteristics of the facial expression picture sequence is as follows:
A. converting the normalized sequence of the facial expression pictures into a gray scale image;
B. setting LBP-TOP extraction parameters, including selection of the number of face blocks, selection of radii of an X axis, a Y axis and a T axis, the number of adjacent points p and selection of an LBP mode;
C. calculating LBP values of an XY plane, an XT plane and a YT plane respectively, and connecting the LBP values of the three planes in series to obtain an LBP-TOP characteristic, wherein the calculation formula of the LBP value of each plane is as follows:
LBP(x_c, y_c) = Σ_{i=0}^{p-1} s(g_i - g_c)·2^i,  with s(x) = 1 if x ≥ 0 and s(x) = 0 otherwise,
wherein (x_c, y_c) is the position of the centre pixel, g_i is the gray value of the i-th neighbouring pixel, g_c is the gray value of the centre pixel, and p is the number of neighbouring points of the pixel.
Further, pulse wave signals of the facial expression picture sequence are extracted based on the chromaticity model, and time domain and frequency domain features of the pulse wave signals are extracted, and the method specifically comprises the following steps:
A. performing frame-by-frame face detection on the cut video data containing expression motions using a detection method that combines AdaBoost and Cascade classifiers; selecting the face area with the eyes and mouth excluded as the region of interest, so that blinking and mouth motion do not affect pulse-signal extraction, while enlarging the region of interest as far as possible on the premise that it remains a pure skin area;
B. pulse wave signal extraction based on the chrominance model eliminates the static component, motion interference and diffuse-reflection interference by using differences and ratios between the information of the different colour channels; the change of skin-reflected light intensity caused by the blood-volume change due to the pulse is reflected as a change of brightness information in the acquired facial expression picture sequence. The brightness is obtained by averaging the gray values of all pixels, and the brightness information of each colour channel C ∈ {R, G, B} of the image is represented as:
C(n) = (1 / (h·w)) · Σ_{i=1}^{h} Σ_{j=1}^{w} c_n(i, j)
wherein n is the image index, N is the number of pictures, C(n) is the one-dimensional signal of the corresponding R, G or B channel over the region of interest, c_n(i, j) is the gray value of that channel at pixel (i, j), and h and w respectively denote the height and width of the region of interest;
for each frame image, the change of luminance information of each color channel C ∈ { R, G, B } is expressed as:
C_i = I_{c,i}·(ρ_{c,dc} + ρ_{c,i}) + s_i
wherein the subscript i denotes the current frame number, I_{c,i} represents the illumination intensity during the exposure time of the camera, ρ_{c,dc} represents the coefficient of the static component of the light reflected by the skin surface, ρ_{c,i} represents the dynamic component of the reflected light caused by the blood-volume change due to the pulse beat, and s_i denotes the additive specular reflection component, which is identical for the R, G and B channels;
normalizing each colour-channel information sequence over a period of time to eliminate the dependence on the illumination intensity I_{c,i}; the specific formula is:
C_1(n) = C(n) / μ_C,  with μ_C = (1/N)·Σ_{n=1}^{N} C(n)
wherein C(n) represents the R, G, B channel information over the period, n denotes the image index within the current period, N is the total number of images, and μ_C is the mean of the brightness information of colour channel C(n) over the current period;
defining the chrominance signals:
X_s = 2R_1(n) - 3G_1(n)
Y_s = 1.5R_1(n) + G_1(n) - 1.5B_1(n)
wherein R_1(n), G_1(n), B_1(n) are the normalized colour channel signals;
passing X_s and Y_s through a band-pass filter (0.7 Hz-4 Hz) to obtain X_f and Y_f, and extracting the pulse wave signal S by the following equation:
S = X_f - αY_f,  with α = σ(X_f) / σ(Y_f)
where σ(·) represents the standard deviation of a signal; this eliminates the interference of diffuse reflection and the static component;
extracting time-domain features from the pulse wave, including the mean, the standard deviation, the mean absolute value of the first-order difference signal, the mean absolute value of the second-order difference signal and the mean absolute value of the normalized difference signal; performing five-point moving-average filtering on the obtained pulse wave and removing abnormal beats, then detecting the dominant wave peaks of the waveform and computing the time intervals between adjacent dominant peaks, i.e. the P-P intervals; removing abnormal intervals shorter than 50 ms, plotting the normal P-P intervals to obtain the pulse-variability signal, and extracting its mean and standard deviation; counting the number of adjacent P-P intervals that differ by more than 50 ms, computing their percentage, and computing the root mean square of the successive P-P interval differences;
extracting frequency-domain features of the pulse wave: the original signal (0.7 Hz-4 Hz) is divided into 6 non-overlapping sub-bands using a 1024-point fast Fourier transform, and the power spectral entropy of each sub-band is computed as:
H_k = - Σ_i p_k(ω_i)·ln p_k(ω_i)
where p_k(ω_i) is the normalized power spectral density of the k-th sub-band. The first three of the 6 sub-bands are taken as the low-frequency band and the last three as the high-frequency band, and the ratio of the power spectral entropies of the high- and low-frequency bands is computed. Cubic-spline interpolation is applied to the pulse-variability signal to refine the pulse-wave peak points, the signal mean is removed to retain the instantaneous characteristics, and the frequency-domain characteristics of the pulse-variability signal are analysed by Fourier transform; the very-low-frequency power is computed as:
P = ∫_{f_1}^{f_2} PSD(f) df
where PSD(f) is the signal power spectral density and f_1 and f_2 are respectively the lower and upper cut-off frequencies of the band; the low-frequency power, high-frequency power, total power, ratio of low-frequency to high-frequency power, ratio of low-frequency power to total power and ratio of high-frequency power to total power are obtained in the same way.
Further, the step of fusing the extracted LBP-TOP characteristics of the facial expression picture sequence with the time domain and frequency domain characteristics of the pulse wave signal is as follows:
fusing the LBP-TOP features with the time- and frequency-domain features of the physiological signal through canonical correlation analysis, obtaining new features that contain both the expression signal and the physiological signal;
for the sample sets X and Y, the CCA algorithm finds corresponding basis vectors w_x ∈ R^q and w_y ∈ R^p such that the correlation between the projected variables X* = w_x^T X and Y* = w_y^T Y is maximal; the algorithm is formulated as maximizing the correlation coefficient:
ρ = (w_x^T Σ_12 w_y) / sqrt((w_x^T Σ_11 w_x)·(w_y^T Σ_22 w_y))
wherein Σ_11 is the covariance matrix of X, Σ_22 is the covariance matrix of Y, Σ_12 = cov(X, Y), and Σ_21 is the transpose of Σ_12; solving this problem yields w_x and w_y, and the projected variables X* and Y* serve as the combined features after projection, realizing the fusion of the two types of features.
Further, inputting the training set into a support vector machine for training, and after the training is finished, performing test optimization through the test set, wherein the test optimization steps comprise:
selecting the radial basis function (RBF) kernel, which maps samples nonlinearly with relatively low numerical complexity; the kernel function is:
K(x_i, x_j) = exp(-γ·||x_i - x_j||²)
wherein γ > 0, the default value is 1/k, and k is the number of categories;
determining two parameters, the penalty factor C and the number of cross-validation folds; the choice of C has an important influence on classification accuracy: the larger C is, the heavier the penalty on errors, but too large a value of C causes overfitting, so C must be chosen appropriately;
training the support vector machine with the training-set data and computing the recognition rate on the test set; training ends when the recognition result meets the expected requirement, otherwise the penalty factor C is optimized and training continues until the expected performance is reached.
The method for automatically identifying the emotion based on the bimodal signals has the following beneficial effects:
compared with the traditional emotion recognition technology needing wearing equipment, the method only needs to record videos containing different emotions;
compared with the traditional method with a single signal source, the method comprehensively utilizes the facial expression signals and the pulse signals, realizes emotion recognition based on multi-source information characteristic fusion, and avoids the problem of low recognition precision caused by artificial deliberate emotion masking or no obvious facial expression change;
compared with the inconvenience of traditional pulse signal acquisition and feature extraction, the pulse wave signal and the features thereof are acquired in a non-contact manner, so that the complexity of the system is greatly reduced and the convenience of the system is improved;
the LBP-TOP expression features and the pulse signal features are fused based on canonical correlation analysis (CCA), and a support vector machine is trained to realize the final classification.
Drawings
Fig. 1 is a schematic flow chart of an emotion automatic identification method based on a bimodal signal provided by the present invention.
Fig. 2 is a schematic diagram of an LBP-TOP expression feature extraction process of an emotion automatic identification method based on a bimodal signal provided by the invention.
Fig. 3 is a schematic flow chart of extracting a pulse signal based on a chromaticity model in the automatic emotion recognition method based on a bimodal signal provided by the invention.
FIG. 4 is a schematic flow chart of feature fusion and classification of an emotion automatic identification method based on bimodal signals provided by the present invention.
FIG. 5 is a schematic flow chart of the support vector machine classification recognition model establishment of the automatic emotion recognition method based on bimodal signals.
Detailed Description
For a clearer understanding of the technical features, objects and effects of the present invention, embodiments of the present invention will now be described in detail with reference to the accompanying drawings.
As shown in fig. 1, the method for automatically recognizing emotion based on bimodal signals provided by the present invention comprises the following steps:
step one: cutting and framing video data containing expression actions, extracting the facial expression picture sequence from the onset to the end of the expression, and preprocessing the extracted facial expression picture sequence, wherein the preprocessing at least comprises geometric correction and normalization;
step two: extracting LBP-TOP characteristics of the facial expression picture sequence;
step three: extracting pulse wave signals of the facial expression picture sequence based on the chrominance model, and extracting time domain and frequency domain characteristics of the pulse wave signals;
step four: fusing the LBP-TOP characteristics of the extracted facial expression picture sequence with the time domain and frequency domain characteristics of the pulse wave signal;
step five: dividing the fused feature data into a training set and a testing set, inputting the training set into a support vector machine for training, performing test optimization with the testing set after training is finished, and realizing automatic emotion recognition in the facial expression pictures with the support vector machine after training and optimization.
The steps of extracting the LBP-TOP features of the facial expression picture sequence are as follows:
converting the normalized sequence of the facial expression pictures into a gray scale image;
setting LBP-TOP extraction parameters, including selection of the number of face blocks, selection of radii of an X axis, a Y axis and a T axis, the number of adjacent points p and selection of an LBP mode;
calculating LBP values of an XY plane, an XT plane and a YT plane respectively, and connecting the LBP values of the three planes in series to obtain an LBP-TOP characteristic, wherein the calculation formula of the LBP value of each plane is as follows:
LBP(x_c, y_c) = Σ_{i=0}^{p-1} s(g_i - g_c)·2^i,  with s(x) = 1 if x ≥ 0 and s(x) = 0 otherwise,
wherein (x_c, y_c) is the position of the centre pixel, g_i is the gray value of the i-th neighbouring pixel, g_c is the gray value of the centre pixel, and p is the number of neighbouring points of the pixel.
The specific steps of extracting the pulse wave signal of the facial expression picture sequence based on the chrominance model and extracting its time-domain and frequency-domain features are as follows:
performing frame-by-frame face detection on the cut video data containing expression motions using a detection method that combines AdaBoost and Cascade classifiers; selecting the face area with the eyes and mouth excluded as the region of interest, so that blinking and mouth motion do not affect pulse-signal extraction, while enlarging the region of interest as far as possible on the premise that it remains a pure skin area;
pulse wave signal extraction based on the chrominance model eliminates the static component, motion interference and diffuse-reflection interference by using differences and ratios between the information of the different colour channels; the change of skin-reflected light intensity caused by the blood-volume change due to the pulse is reflected as a change of brightness information in the acquired facial expression picture sequence. The brightness is obtained by averaging the gray values of all pixels, and the brightness information of each colour channel C ∈ {R, G, B} of the image is represented as:
C(n) = (1 / (h·w)) · Σ_{i=1}^{h} Σ_{j=1}^{w} c_n(i, j)
wherein n is the image index, N is the number of pictures, C(n) is the one-dimensional signal of the corresponding R, G or B channel over the region of interest, c_n(i, j) is the gray value of that channel at pixel (i, j), and h and w respectively denote the height and width of the region of interest;
for each frame image, the change of luminance information of each color channel C ∈ { R, G, B } is expressed as:
C_i = I_{c,i}·(ρ_{c,dc} + ρ_{c,i}) + s_i
wherein the subscript i denotes the current frame number, I_{c,i} represents the illumination intensity during the exposure time of the camera, ρ_{c,dc} represents the coefficient of the static component of the light reflected by the skin surface, ρ_{c,i} represents the dynamic component of the reflected light caused by the blood-volume change due to the pulse beat, and s_i denotes the additive specular reflection component, which is identical for the R, G and B channels;
normalizing each colour-channel information sequence over a period of time to eliminate the dependence on the illumination intensity I_{c,i}; the specific formula is:
C_1(n) = C(n) / μ_C,  with μ_C = (1/N)·Σ_{n=1}^{N} C(n)
wherein C(n) represents the R, G, B channel information over the period, n denotes the image index within the current period, N is the total number of images, and μ_C is the mean of the brightness information of colour channel C(n) over the current period;
defining the chrominance signals:
X_s = 2R_1(n) - 3G_1(n)
Y_s = 1.5R_1(n) + G_1(n) - 1.5B_1(n)
wherein R_1(n), G_1(n), B_1(n) are the normalized colour channel signals;
passing X_s and Y_s through a band-pass filter (0.7 Hz-4 Hz) to obtain X_f and Y_f, and extracting the pulse wave signal S by the following equation:
S = X_f - αY_f,  with α = σ(X_f) / σ(Y_f)
where σ(·) represents the standard deviation of a signal; this eliminates the interference of diffuse reflection and the static component;
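By way of illustration, the chrominance-based pulse extraction described above can be sketched in Python roughly as follows; the per-frame averaging over the region of interest is assumed to have been done already, and the function name, filter order and the use of NumPy/SciPy are illustrative choices not fixed by this description:

```python
import numpy as np
from scipy.signal import butter, filtfilt

def chrom_pulse(rgb_means, fs):
    """Sketch of the chrominance-based pulse extraction described above.

    rgb_means : (N, 3) array of spatially averaged R, G, B values of the
                skin region of interest, one row per frame.
    fs        : video frame rate in Hz.
    """
    rgb = np.asarray(rgb_means, dtype=float)

    # Normalize each colour channel by its temporal mean (C_1(n) = C(n) / mu_C).
    norm = rgb / rgb.mean(axis=0)
    r1, g1, b1 = norm[:, 0], norm[:, 1], norm[:, 2]

    # Chrominance signals as defined in the description above.
    xs = 2.0 * r1 - 3.0 * g1
    ys = 1.5 * r1 + g1 - 1.5 * b1

    # Band-pass filter to the pulse band 0.7-4 Hz.
    b, a = butter(3, [0.7, 4.0], btype="bandpass", fs=fs)
    xf = filtfilt(b, a, xs)
    yf = filtfilt(b, a, ys)

    # Combine with the standard-deviation ratio alpha = sigma(Xf) / sigma(Yf).
    alpha = np.std(xf) / np.std(yf)
    return xf - alpha * yf
```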
extracting time-domain features from the pulse wave, including the mean, the standard deviation, the mean absolute value of the first-order difference signal, the mean absolute value of the second-order difference signal and the mean absolute value of the normalized difference signal; performing five-point moving-average filtering on the obtained pulse wave and removing abnormal beats, then detecting the dominant wave peaks of the waveform and computing the time intervals between adjacent dominant peaks, i.e. the P-P intervals; removing abnormal intervals shorter than 50 ms, plotting the normal P-P intervals to obtain the pulse-variability signal, and extracting its mean and standard deviation; counting the number of adjacent P-P intervals that differ by more than 50 ms, computing their percentage, and computing the root mean square of the successive P-P interval differences;
extracting frequency-domain features of the pulse wave: the original signal (0.7 Hz-4 Hz) is divided into 6 non-overlapping sub-bands using a 1024-point fast Fourier transform, and the power spectral entropy of each sub-band is computed as:
H_k = - Σ_i p_k(ω_i)·ln p_k(ω_i)
where p_k(ω_i) is the normalized power spectral density of the k-th sub-band. The first three of the 6 sub-bands are taken as the low-frequency band and the last three as the high-frequency band, and the ratio of the power spectral entropies of the high- and low-frequency bands is computed. Cubic-spline interpolation is applied to the pulse-variability signal to refine the pulse-wave peak points, the signal mean is removed to retain the instantaneous characteristics, and the frequency-domain characteristics of the pulse-variability signal are analysed by Fourier transform; the very-low-frequency power is computed as:
P = ∫_{f_1}^{f_2} PSD(f) df
where PSD(f) is the signal power spectral density and f_1 and f_2 are respectively the lower and upper cut-off frequencies of the band; the low-frequency power, high-frequency power, total power, ratio of low-frequency to high-frequency power, ratio of low-frequency power to total power and ratio of high-frequency power to total power are obtained in the same way.
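As a rough illustration of the pulse-wave feature extraction above, the following Python sketch computes a subset of the listed time-domain statistics and band powers; the minimum peak spacing, the low/high band boundary at 2.35 Hz and the use of Welch's method are assumptions for the sketch rather than values fixed by this description:

```python
import numpy as np
from scipy.signal import find_peaks, welch

def pulse_features(pulse, fs):
    """Sketch of a subset of the time- and frequency-domain pulse features."""
    s = np.asarray(pulse, dtype=float)

    # Time-domain statistics of the pulse wave itself.
    feats = {
        "mean": s.mean(),
        "std": s.std(),
        "abs_diff1_mean": np.abs(np.diff(s)).mean(),
        "abs_diff2_mean": np.abs(np.diff(s, n=2)).mean(),
    }

    # Dominant peaks and P-P intervals (assumed minimum peak spacing of 0.25 s).
    peaks, _ = find_peaks(s, distance=max(1, int(0.25 * fs)))
    pp = np.diff(peaks) / fs * 1000.0            # P-P intervals in milliseconds
    dpp = np.abs(np.diff(pp))                    # successive interval differences
    feats.update({
        "pp_mean": pp.mean(),
        "pp_std": pp.std(),
        "nn50": int(np.sum(dpp > 50.0)),         # intervals differing by > 50 ms
        "pnn50": float(np.mean(dpp > 50.0)),
        "rmssd": float(np.sqrt(np.mean(np.diff(pp) ** 2))),
    })

    # Band powers from the power spectral density; splitting the 0.7-4 Hz band
    # at its midpoint is only an assumed stand-in for the 6 sub-bands above.
    f, psd = welch(s, fs=fs, nperseg=min(len(s), 1024))

    def band_power(lo, hi):
        m = (f >= lo) & (f < hi)
        return float(np.trapz(psd[m], f[m]))

    lf, hf = band_power(0.7, 2.35), band_power(2.35, 4.0)
    feats.update({"lf_power": lf, "hf_power": hf, "lf_hf_ratio": lf / hf})
    return feats
```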
The steps of fusing the extracted LBP-TOP features of the facial expression picture sequence with the time-domain and frequency-domain features of the pulse wave signal are as follows:
fusing the LBP-TOP features with the time- and frequency-domain features of the physiological signal through canonical correlation analysis, obtaining new features that contain both the expression signal and the physiological signal;
for the sample sets X and Y, the CCA algorithm finds corresponding basis vectors w_x ∈ R^q and w_y ∈ R^p such that the correlation between the projected variables X* = w_x^T X and Y* = w_y^T Y is maximal; the algorithm is formulated as maximizing the correlation coefficient:
ρ = (w_x^T Σ_12 w_y) / sqrt((w_x^T Σ_11 w_x)·(w_y^T Σ_22 w_y))
wherein Σ_11 is the covariance matrix of X, Σ_22 is the covariance matrix of Y, Σ_12 = cov(X, Y), and Σ_21 is the transpose of Σ_12; solving this problem yields w_x and w_y, and the projected variables X* and Y* serve as the combined features after projection, realizing the fusion of the two types of features.
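A minimal sketch of such CCA-based feature-level fusion, using the CCA implementation in scikit-learn, might look as follows; the number of components and the concatenation of the two projections are illustrative choices, since the exact combination rule is not spelled out above:

```python
import numpy as np
from sklearn.cross_decomposition import CCA

def cca_fuse(expr_feats, pulse_feats, n_components=10):
    """Fuse LBP-TOP expression features and pulse features with CCA.

    expr_feats  : (n_samples, q) matrix of expression features (X).
    pulse_feats : (n_samples, p) matrix of pulse features (Y).
    Returns the two projections concatenated per sample as the fused feature.
    """
    x = np.asarray(expr_feats, dtype=float)
    y = np.asarray(pulse_feats, dtype=float)
    k = min(n_components, x.shape[1], y.shape[1])

    cca = CCA(n_components=k)
    x_c, y_c = cca.fit_transform(x, y)   # projections w_x^T X and w_y^T Y
    return np.hstack([x_c, y_c])
```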
The training set is input into a support vector machine for training, and the step of testing and optimizing through the testing set after the training is finished comprises the following steps:
selecting the radial basis function (RBF) kernel, which maps samples nonlinearly with relatively low numerical complexity; the kernel function is:
K(x_i, x_j) = exp(-γ·||x_i - x_j||²)
wherein γ > 0, the default value is 1/k, and k is the number of categories;
determining two parameters, the penalty factor C and the number of cross-validation folds; the choice of C has an important influence on classification accuracy: the larger C is, the heavier the penalty on errors, but too large a value of C causes overfitting, so C must be chosen appropriately;
training the support vector machine with the training-set data and computing the recognition rate on the test set; training ends when the recognition result meets the expected requirement, otherwise the penalty factor C is optimized and training continues until the expected performance is reached.
The following case is used for specific explanation.
Because extracting the physiological signal requires video input of a certain length, the CAS(ME)² expression database of the Chinese Academy of Sciences is selected. Since the amounts of data for the different expressions in this database differ widely, 55 anger samples, 74 disgust samples, 131 happiness samples, 36 surprise samples and 21 fear samples are finally selected, 317 samples in total.
The method comprises the following steps: the method comprises the steps of cutting video data containing expression actions, enabling the cut video to be required to be unified for 10 seconds, dividing the cut video into frames, extracting expression picture sequences from the beginning to the end of expressions, and normalizing the frame numbers of all the samples to be 120 through linear interpolation, wherein the shortest sample in a CAS (ME) 2 expression database comprises 4 frames, and the longest sample comprises 118 frames. Preprocessing the extracted expression sequence such as geometric correction and normalization;
step two: extracting the LBP-TOP feature of the sequence of facial expression pictures, as shown in fig. 2, specifically includes the steps of:
(1) converting the normalized picture sequence into a gray-scale image;
(2) setting the LBP-TOP extraction parameters: the number of neighbourhood points of the LBP-TOP operator is P_XY = P_XT = P_YT = 4, the radii of the x-axis and the y-axis are R_X = R_Y = 1 with the temporal radius R_T chosen accordingly, the LBP mode is the normalized mode, and the number of blocks is 3 × 3;
(3) respectively calculating LBP values of the XY plane, the XT plane and the YT plane according to the set parameters, and connecting the LBP values of the three planes in series to obtain an LBP-TOP characteristic, wherein the calculation formula of the LBP value of each plane is as follows:
LBP(x_c, y_c) = Σ_{i=0}^{p-1} s(g_i - g_c)·2^i,  with s(x) = 1 if x ≥ 0 and s(x) = 0 otherwise,
wherein (x_c, y_c) is the position of the centre pixel, g_i is the gray value of the i-th neighbouring pixel, g_c is the gray value of the centre pixel, and p is the number of neighbouring points of the pixel;
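For illustration, a simplified Python sketch of LBP-TOP extraction with 4 neighbouring points per plane is given below; it computes basic LBP codes on the three orthogonal planes of a single block and concatenates the normalized histograms, whereas the full method above also uses 3 × 3 blocking and the normalized LBP mode, and the default temporal radius here is only an assumption:

```python
import numpy as np

def lbp_top_histograms(volume, rx=1, ry=1, rt=2, p=4):
    """Simplified LBP-TOP descriptor of a grayscale face volume (T, H, W).

    Basic 4-neighbour LBP codes are computed on the XY, XT and YT planes and
    the three normalized histograms are concatenated (single block, basic
    codes; rt is an assumed temporal radius).
    """
    v = np.asarray(volume, dtype=float)
    t0, t1 = rt, v.shape[0] - rt
    y0, y1 = ry, v.shape[1] - ry
    x0, x1 = rx, v.shape[2] - rx
    centre = v[t0:t1, y0:y1, x0:x1]

    def codes(offsets):
        # offsets: (dt, dy, dx) neighbour displacements, one bit per neighbour
        code = np.zeros(centre.shape, dtype=np.uint8)
        for k, (dt, dy, dx) in enumerate(offsets):
            neigh = v[t0 + dt:t1 + dt, y0 + dy:y1 + dy, x0 + dx:x1 + dx]
            code |= (neigh >= centre).astype(np.uint8) << k
        return code

    xy = codes([(0, 0, rx), (0, ry, 0), (0, 0, -rx), (0, -ry, 0)])
    xt = codes([(0, 0, rx), (rt, 0, 0), (0, 0, -rx), (-rt, 0, 0)])
    yt = codes([(0, ry, 0), (rt, 0, 0), (0, -ry, 0), (-rt, 0, 0)])

    hists = [np.bincount(c.ravel(), minlength=2 ** p).astype(float)
             for c in (xy, xt, yt)]
    return np.concatenate([h / h.sum() for h in hists])
```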
step three: the pulse wave signal is extracted based on the chrominance model, and the time-domain and frequency-domain features of the pulse wave signal are extracted, as shown in fig. 3.
Step four: fusing the expression characteristics with the time domain and frequency domain characteristics of the pulse wave signals, as shown in FIG. 4;
step five: dividing all facial expression data into a training set and a testing set, processing according to the four steps, and finally classifying by using a support vector machine, as shown in fig. 5:
(1) randomly dividing the fused features into a training set and a test set, and selecting the radial basis function (RBF) kernel, which maps samples nonlinearly with relatively low numerical complexity; the kernel function is:
K(x_i, x_j) = exp(-γ·||x_i - x_j||²)
where γ > 0, the default value is 1/k, and k is the number of classes.
(2) determining two parameters, the penalty factor C and the number of cross-validation folds; the choice of C has an important influence on classification accuracy: the larger C is, the heavier the penalty on errors, but too large a value of C causes overfitting, so C must be chosen appropriately.
(3) training the support vector machine with the training-set data and computing the recognition rate on the test set; training ends when the recognition result meets the expected requirement, otherwise the penalty factor C is optimized and training continues until the expected performance is reached.
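A hedged Python sketch of this training and tuning loop, using scikit-learn's SVC with an RBF kernel and a grid search over the penalty factor C, is given below; the split ratio, the C grid and the 5-fold cross-validation are illustrative assumptions, and gamma='auto' (1 divided by the number of features) stands in for the 1/k default mentioned above:

```python
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def train_emotion_svm(features, labels):
    """Train and tune an RBF-kernel SVM on the fused features (sketch)."""
    x_tr, x_te, y_tr, y_te = train_test_split(
        features, labels, test_size=0.3, stratify=labels, random_state=0)

    # RBF-kernel SVM; gamma='auto' uses 1 / n_features as the default.
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf", gamma="auto"))

    # Grid-search the penalty factor C with 5-fold cross-validation.
    search = GridSearchCV(model,
                          param_grid={"svc__C": [0.1, 1.0, 10.0, 100.0]},
                          cv=5)
    search.fit(x_tr, y_tr)

    # Recognition rate on the held-out test set.
    accuracy = search.score(x_te, y_te)
    return search.best_estimator_, accuracy
```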
The method for automatically identifying the emotion based on the bimodal signals has the following beneficial effects:
compared with the traditional emotion recognition technology needing wearing equipment, the method only needs to record videos containing different emotions;
compared with the traditional method with a single signal source, the method comprehensively utilizes the facial expression signals and the pulse signals, realizes emotion recognition based on multi-source information characteristic fusion, and avoids the problem of low recognition precision caused by artificial deliberate emotion masking or no obvious facial expression change;
compared with the inconvenience of traditional pulse signal acquisition and feature extraction, the pulse wave signal and the features thereof are acquired in a non-contact manner, so that the complexity of the system is greatly reduced and the convenience of the system is improved;
the LBP-TOP expression features and the pulse signal features are fused based on canonical correlation analysis (CCA), and a support vector machine is trained to realize the final classification.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (5)

1. An automatic emotion recognition method based on a bimodal signal is characterized by comprising the following steps:
step one: cutting and framing video data containing expression actions, extracting the facial expression picture sequence from the onset to the end of the expression, and preprocessing the extracted facial expression picture sequence, wherein the preprocessing at least comprises geometric correction and normalization;
step two: extracting LBP-TOP characteristics of the facial expression picture sequence;
step three: extracting pulse wave signals of the facial expression picture sequence based on the chrominance model, and extracting time domain and frequency domain characteristics of the pulse wave signals;
step four: fusing the LBP-TOP characteristics of the extracted facial expression picture sequence with the time domain and frequency domain characteristics of the pulse wave signal;
step five: dividing the fused feature data into a training set and a testing set, inputting the training set into a support vector machine for training, performing test optimization with the testing set after training is finished, and realizing automatic emotion recognition in the facial expression pictures with the support vector machine after training and optimization.
2. The method as claimed in claim 1, wherein the step of extracting LBP-TOP features of the facial expression picture sequence in the second step is as follows:
A. converting the normalized sequence of the facial expression pictures into a gray scale image;
B. setting LBP-TOP extraction parameters, including selection of the number of face blocks, selection of radii of an X axis, a Y axis and a T axis, the number of adjacent points p and selection of an LBP mode;
C. calculating LBP values of an XY plane, an XT plane and a YT plane respectively, and connecting the LBP values of the three planes in series to obtain an LBP-TOP characteristic, wherein the calculation formula of the LBP value of each plane is as follows:
LBP(x_c, y_c) = Σ_{i=0}^{p-1} s(g_i - g_c)·2^i,  with s(x) = 1 if x ≥ 0 and s(x) = 0 otherwise,
wherein (x_c, y_c) is the position of the centre pixel, g_i is the gray value of the i-th neighbouring pixel, g_c is the gray value of the centre pixel, and p is the number of neighbouring points of the pixel.
3. The method of claim 1, wherein the third step of extracting the pulse wave signal of the sequence of facial expression pictures based on the chrominance model and extracting the time domain and frequency domain features of the pulse wave signal comprises the following steps:
A. performing frame-by-frame face detection on the cut video data containing expression motions using a detection method that combines AdaBoost and Cascade classifiers; selecting the face area with the eyes and mouth excluded as the region of interest, so that blinking and mouth motion do not affect pulse-signal extraction, while enlarging the region of interest as far as possible on the premise that it remains a pure skin area;
B. pulse wave signal extraction based on the chrominance model eliminates the static component, motion interference and diffuse-reflection interference by using differences and ratios between the information of the different colour channels; the change of skin-reflected light intensity caused by the blood-volume change due to the pulse is reflected as a change of brightness information in the acquired facial expression picture sequence. The brightness is obtained by averaging the gray values of all pixels, and the brightness information of each colour channel C ∈ {R, G, B} of the image is represented as:
C(n) = (1 / (h·w)) · Σ_{i=1}^{h} Σ_{j=1}^{w} c_n(i, j)
wherein n is the image index, N is the number of pictures, C(n) is the one-dimensional signal of the corresponding R, G or B channel over the region of interest, c_n(i, j) is the gray value of that channel at pixel (i, j), and h and w respectively denote the height and width of the region of interest;
for each frame image, the change of luminance information of each color channel C ∈ { R, G, B } is expressed as:
C_i = I_{c,i}·(ρ_{c,dc} + ρ_{c,i}) + s_i
wherein the subscript i denotes the current frame number, I_{c,i} represents the illumination intensity during the exposure time of the camera, ρ_{c,dc} represents the coefficient of the static component of the light reflected by the skin surface, ρ_{c,i} represents the dynamic component of the reflected light caused by the blood-volume change due to the pulse beat, and s_i denotes the additive specular reflection component, which is identical for the R, G and B channels;
normalizing each colour-channel information sequence over a period of time to eliminate the dependence on the illumination intensity I_{c,i}; the specific formula is:
C_1(n) = C(n) / μ_C,  with μ_C = (1/N)·Σ_{n=1}^{N} C(n)
wherein C(n) represents the R, G, B channel information over the period, n denotes the image index within the current period, N is the total number of images, and μ_C is the mean of the brightness information of colour channel C(n) over the current period;
defining the chrominance signals:
X_s = 2R_1(n) - 3G_1(n)
Y_s = 1.5R_1(n) + G_1(n) - 1.5B_1(n)
wherein R_1(n), G_1(n), B_1(n) are the normalized colour channel signals;
passing X_s and Y_s through a band-pass filter (0.7 Hz-4 Hz) to obtain X_f and Y_f, and extracting the pulse wave signal S by the following equation:
S = X_f - αY_f,  with α = σ(X_f) / σ(Y_f)
where σ(·) represents the standard deviation of a signal; this eliminates the interference of diffuse reflection and the static component;
extracting time-domain features from the pulse wave, including the mean, the standard deviation, the mean absolute value of the first-order difference signal, the mean absolute value of the second-order difference signal and the mean absolute value of the normalized difference signal; performing five-point moving-average filtering on the obtained pulse wave and removing abnormal beats, then detecting the dominant wave peaks of the waveform and computing the time intervals between adjacent dominant peaks, i.e. the P-P intervals; removing abnormal intervals shorter than 50 ms, plotting the normal P-P intervals to obtain the pulse-variability signal, and extracting its mean and standard deviation; counting the number of adjacent P-P intervals that differ by more than 50 ms, computing their percentage, and computing the root mean square of the successive P-P interval differences;
extracting frequency-domain features of the pulse wave: the original signal (0.7 Hz-4 Hz) is divided into 6 non-overlapping sub-bands using a 1024-point fast Fourier transform, and the power spectral entropy of each sub-band is computed as:
H_k = - Σ_i p_k(ω_i)·ln p_k(ω_i)
where p_k(ω_i) is the normalized power spectral density of the k-th sub-band. The first three of the 6 sub-bands are taken as the low-frequency band and the last three as the high-frequency band, and the ratio of the power spectral entropies of the high- and low-frequency bands is computed. Cubic-spline interpolation is applied to the pulse-variability signal to refine the pulse-wave peak points, the signal mean is removed to retain the instantaneous characteristics, and the frequency-domain characteristics of the pulse-variability signal are analysed by Fourier transform; the very-low-frequency power is computed as:
P = ∫_{f_1}^{f_2} PSD(f) df
where PSD(f) is the signal power spectral density and f_1 and f_2 are respectively the lower and upper cut-off frequencies of the band; the low-frequency power, high-frequency power, total power, ratio of low-frequency to high-frequency power, ratio of low-frequency power to total power and ratio of high-frequency power to total power are obtained in the same way.
4. The method as claimed in claim 1, wherein the step of fusing the extracted LBP-TOP features of the sequence of facial expression pictures with the time domain and frequency domain features of the pulse wave signal comprises the following steps:
fusing the LBP-TOP features with the time- and frequency-domain features of the physiological signal through canonical correlation analysis, obtaining new features that contain both the expression signal and the physiological signal;
for the sample sets X and Y, the CCA algorithm finds corresponding basis vectors w_x ∈ R^q and w_y ∈ R^p such that the correlation between the projected variables X* = w_x^T X and Y* = w_y^T Y is maximal; the algorithm is formulated as maximizing the correlation coefficient:
ρ = (w_x^T Σ_12 w_y) / sqrt((w_x^T Σ_11 w_x)·(w_y^T Σ_22 w_y))
wherein Σ_11 is the covariance matrix of X, Σ_22 is the covariance matrix of Y, Σ_12 = cov(X, Y), and Σ_21 is the transpose of Σ_12; solving this problem yields w_x and w_y, and the projected variables X* and Y* serve as the combined features after projection, realizing the fusion of the two types of features.
5. The method of claim 1, wherein in the fifth step, the training set is input into a support vector machine for training, and after training, the step of performing test optimization through the test set comprises:
selecting the radial basis function (RBF) kernel, which maps samples nonlinearly with relatively low numerical complexity; the kernel function is:
K(x_i, x_j) = exp(-γ·||x_i - x_j||²)
wherein γ > 0, the default value is 1/k, and k is the number of categories;
determining two parameters, the penalty factor C and the number of cross-validation folds; the choice of C has an important influence on classification accuracy: the larger C is, the heavier the penalty on errors, but too large a value of C causes overfitting, so C must be chosen appropriately;
training the support vector machine with the training-set data and computing the recognition rate on the test set; training ends when the recognition result meets the expected requirement, otherwise the penalty factor C is optimized and training continues until the expected performance is reached.
CN201910868310.8A 2019-09-13 2019-09-13 Emotion automatic identification method based on bimodal signals Active CN110619301B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910868310.8A CN110619301B (en) 2019-09-13 2019-09-13 Emotion automatic identification method based on bimodal signals

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910868310.8A CN110619301B (en) 2019-09-13 2019-09-13 Emotion automatic identification method based on bimodal signals

Publications (2)

Publication Number Publication Date
CN110619301A true CN110619301A (en) 2019-12-27
CN110619301B CN110619301B (en) 2023-04-18

Family

ID=68922889

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910868310.8A Active CN110619301B (en) 2019-09-13 2019-09-13 Emotion automatic identification method based on bimodal signals

Country Status (1)

Country Link
CN (1) CN110619301B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130300900A1 (en) * 2012-05-08 2013-11-14 Tomas Pfister Automated Recognition Algorithm For Detecting Facial Expressions
US20170311901A1 (en) * 2016-04-18 2017-11-02 Massachusetts Institute Of Technology Extraction of features from physiological signals
CN107491740A (en) * 2017-07-28 2017-12-19 北京科技大学 A kind of neonatal pain recognition methods based on facial expression analysis
CN108216254A (en) * 2018-01-10 2018-06-29 山东大学 The road anger Emotion identification method merged based on face-image with pulse information
CN110135254A (en) * 2019-04-12 2019-08-16 华南理工大学 A kind of fatigue expression recognition method
CN110164209A (en) * 2019-04-24 2019-08-23 薄涛 Instructional terminal, server and live teaching broadcast system
CN111797747A (en) * 2020-06-28 2020-10-20 道和安邦(天津)安防科技有限公司 Potential emotion recognition method based on EEG, BVP and micro-expression

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
ANASTASSIA ANGELOPOULOU等: "Evaluation of different chrominance models in the detection and reconstruction of faces and hands using the growing neural gas network", 《PATTERN ANALYSIS AND APPLICATIONS》 *
GERARD DE HAAN等: "Robust pulse-rate from chrominance-based rPPG", 《IEEE TRANSACTIONS ON BIOMEDICAL ENGINEERING》 *
XIAOPENG HONG等: "LBP-TOP: a Tensor Unfolding Revisit", 《ACCV 2016: COMPUTER VISION – ACCV 2016 WORKSHOPS》 *
于曼丽: "Research on bimodal video emotion recognition based on expression and physiological signals", China Master's Theses Full-text Database, Information Science and Technology Series *
牛锦: "Research on video-based latent emotion recognition", China Master's Theses Full-text Database, Information Science and Technology Series *
牛锦 et al.: "Video emotion recognition using facial features and pulse signal features", Journal of Chongqing University of Technology (Natural Science) *

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111353390A (en) * 2020-01-17 2020-06-30 道和安邦(天津)安防科技有限公司 Micro-expression recognition method based on deep learning
CN111523601A (en) * 2020-04-26 2020-08-11 道和安邦(天津)安防科技有限公司 Latent emotion recognition method based on knowledge guidance and generation counterstudy
CN111523601B (en) * 2020-04-26 2023-08-15 道和安邦(天津)安防科技有限公司 Potential emotion recognition method based on knowledge guidance and generation of countermeasure learning
CN111797747A (en) * 2020-06-28 2020-10-20 道和安邦(天津)安防科技有限公司 Potential emotion recognition method based on EEG, BVP and micro-expression
CN111797747B (en) * 2020-06-28 2023-08-18 道和安邦(天津)安防科技有限公司 Potential emotion recognition method based on EEG, BVP and micro-expression
CN112270327A (en) * 2020-10-19 2021-01-26 西安工程大学 Power transmission conductor icing classification method based on local fusion frequency domain characteristics
CN112270327B (en) * 2020-10-19 2023-03-14 西安工程大学 Power transmission conductor icing classification method based on local fusion frequency domain characteristics
CN112766112A (en) * 2021-01-08 2021-05-07 山东大学 Dynamic expression recognition method and system based on space-time multi-feature fusion
US11227161B1 (en) 2021-02-22 2022-01-18 Institute Of Automation, Chinese Academy Of Sciences Physiological signal prediction method
CN112580612A (en) * 2021-02-22 2021-03-30 中国科学院自动化研究所 Physiological signal prediction method
CN112580612B (en) * 2021-02-22 2021-06-08 中国科学院自动化研究所 Physiological signal prediction method
CN113017630A (en) * 2021-03-02 2021-06-25 贵阳像树岭科技有限公司 Visual perception emotion recognition method
CN113057633B (en) * 2021-03-26 2022-11-01 华南理工大学 Multi-modal emotional stress recognition method and device, computer equipment and storage medium
CN113057633A (en) * 2021-03-26 2021-07-02 华南理工大学 Multi-modal emotional stress recognition method and device, computer equipment and storage medium
CN113827240A (en) * 2021-09-22 2021-12-24 北京百度网讯科技有限公司 Emotion classification method and emotion classification model training method, device and equipment
CN113827240B (en) * 2021-09-22 2024-03-22 北京百度网讯科技有限公司 Emotion classification method, training device and training equipment for emotion classification model
CN114049677A (en) * 2021-12-06 2022-02-15 中南大学 Vehicle ADAS control method and system based on emotion index of driver
CN114049677B (en) * 2021-12-06 2023-08-25 中南大学 Vehicle ADAS control method and system based on driver emotion index
CN114399709A (en) * 2021-12-30 2022-04-26 北京北大医疗脑健康科技有限公司 Child emotion recognition model training method and child emotion recognition method
CN114391846A (en) * 2022-01-21 2022-04-26 中山大学 Emotion recognition method and system based on filtering type feature selection
CN114403877A (en) * 2022-01-21 2022-04-29 中山大学 Multi-physiological-signal emotion quantitative evaluation method based on two-dimensional continuous model
CN114391846B (en) * 2022-01-21 2023-12-01 中山大学 Emotion recognition method and system based on filtering type feature selection

Also Published As

Publication number Publication date
CN110619301B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
CN110619301B (en) Emotion automatic identification method based on bimodal signals
CN109730637B (en) Quantitative analysis system and method for facial image of human face
CN110069958B (en) Electroencephalogram signal rapid identification method of dense deep convolutional neural network
CN106778695B (en) Multi-person rapid heart rate detection method based on video
JP6521845B2 (en) Device and method for measuring periodic fluctuation linked to heart beat
CN113017630B (en) Visual perception emotion recognition method
CN110991406B (en) RSVP electroencephalogram characteristic-based small target detection method and system
CN111797747B (en) Potential emotion recognition method based on EEG, BVP and micro-expression
Qureshi et al. Detection of glaucoma based on cup-to-disc ratio using fundus images
CN108596237B (en) A kind of endoscopic polyp of colon sorter of LCI laser based on color and blood vessel
CN111259895B (en) Emotion classification method and system based on facial blood flow distribution
CN111832431A (en) Emotional electroencephalogram classification method based on CNN
CN112084927A (en) Lip language identification method fusing multiple visual information
Hernandez-Ortega et al. A comparative evaluation of heart rate estimation methods using face videos
CN110473176B (en) Image processing method and device, fundus image processing method and electronic equipment
Cvejic et al. A nonreference image fusion metric based on the regional importance measure
CN110610480A (en) MCASPP neural network eyeground image optic cup optic disc segmentation model based on Attention mechanism
CN110251076B (en) Method and device for detecting significance based on contrast and fusing visual attention
CN111814738A (en) Human face recognition method, human face recognition device, computer equipment and medium based on artificial intelligence
Karmuse et al. A robust rPPG approach for continuous heart rate measurement based on face
Soltani et al. Beta wave activity analysis of EEG during mental painting reflects influence of artistic expertise
CN116453171A (en) Method and device for detecting blood vessel color in white eye area, electronic equipment and medium
CN110507288A (en) Vision based on one-dimensional convolutional neural networks induces motion sickness detection method
CN115690528A (en) Electroencephalogram signal aesthetic evaluation processing method, device, medium and terminal across main body scene
Hamel et al. Contribution of color information in visual saliency model for videos

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant