CN112699785B - Group emotion recognition and abnormal emotion detection method based on dimension emotion model - Google Patents

Group emotion recognition and abnormal emotion detection method based on dimension emotion model Download PDF

Info

Publication number
CN112699785B
CN112699785B (application CN202011601643.3A)
Authority
CN
China
Prior art keywords
emotion
group
image
abnormal
dimension
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011601643.3A
Other languages
Chinese (zh)
Other versions
CN112699785A (en)
Inventor
潘磊
王艾
赵欣
刘国春
高大鹏
袁小珂
严宏
马婷
朱建刚
严崇耀
卢志伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Civil Aviation Flight University of China
Original Assignee
Civil Aviation Flight University of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Civil Aviation Flight University of China filed Critical Civil Aviation Flight University of China
Priority to CN202011601643.3A priority Critical patent/CN112699785B/en
Publication of CN112699785A publication Critical patent/CN112699785A/en
Application granted granted Critical
Publication of CN112699785B publication Critical patent/CN112699785B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • G06V20/41Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition

Abstract

The invention discloses a group emotion recognition and abnormal emotion detection method based on a dimensional emotion model, relating to the technical field of intelligent emotion recognition. Based on the PAD three-dimensional emotion model from cognitive psychology, a video data set of group emotion is created through data collection and manual annotation, and the positional relationship of six typical emotions in PAD space is revealed. An emotion prediction model based on group behavior is created, mapping group motion features to three-dimensional coordinates in PAD space. An abnormal emotion classifier is constructed, and the scene is judged to be in an abnormal state when the two abnormal emotions, anger and fear, are detected. For group motion videos, the method can accurately express the continuously changing state of group emotion and can effectively identify global abnormal states.

Description

Group emotion recognition and abnormal emotion detection method based on dimension emotion model
Technical Field
The invention relates to the technical field of intelligent emotion recognition, in particular to a group emotion recognition and abnormal emotion detection method based on a dimension emotion model.
Background
In recent years, with the continuous development of artificial intelligence, deep learning, psychology and cognitive science, using computers to identify, understand, express and communicate human emotion, and thereby give computers a more comprehensive and higher level of intelligence, has attracted increasingly extensive attention and in-depth exploration in academia. For intelligent video surveillance, this means collecting the verbal communication, facial expressions and body movements of the crowd in a scene, understanding the crowd's emotions, analyzing its emotional state and inner intention, inferring its next action, and having the computer give corresponding feedback, so that the surveillance system gains communication capability at the emotional level. As one of the important development directions of future intelligent monitoring technology, vision-based emotion analysis and emotion recognition has very important academic research value.
There are many media through which humans convey emotion, including text, speech, facial expressions and body behavior. Although both speech and facial expressions can richly express human emotion, speech signals are difficult to capture clearly in noisy public places. Meanwhile, given the high crowding and dynamic change of dense scenes, existing video analysis technology finds it difficult to accurately locate each person's face in a crowd and accurately extract facial expressions. Emotion analysis based on face tracking and facial feature extraction therefore struggles to achieve ideal results in dense scenes. A feasible approach is thus to identify and evaluate the emotional state of the crowd by analyzing its body behavior in surveillance video.
It is worth noting that current academic work on emotion analysis of body movement usually takes single individuals as the research object, with the emphasis on mining and identifying individual posture features and the emotional expressions behind them. However, unlike individual movement, group behavior has its own internal structure and abundant external forms under the combined action of subjective, environmental, social and psychological factors. On the one hand, individuals communicate and cooperate with each other, so that the group presents a certain tendency and integrity; on the other hand, the movement of individuals has a certain autonomy and randomness, so that the group shows disordered and unstructured characteristics. From the perspective of social psychology, in a dense crowd scene an individual's psychology is influenced by the surrounding environment: some independence is lost, a certain dependence on companions is formed, the individual's emotional state gradually becomes consistent with the crowd, and a collective, conformist psychological state is formed. Therefore, considering the specificity of dense scenes and the uniqueness of group psychology, it is necessary to explore specific methods and strategies for analyzing group emotional states.
Emotion recognition methods based on group behavior currently fall mainly into two types: recognition methods based on a discrete model and recognition methods based on the A-V two-dimensional emotion model. However, both have disadvantages. First, unlike a simple piece of speech or a single image, the content presented by surveillance video is very rich: it contains active group movement, complex group emotions and certain plot changes. A discrete emotion model can therefore only identify a few typical scenes with a single form and high recognizability, and the specific emotion types it covers are limited and insufficient for dense crowds. In addition, group emotion has many subtle characteristics, often manifests as a combination of multiple emotions, and changes continuously over time; these characteristics cannot be effectively expressed by a discrete model. Second, the A-V two-dimensional emotion model measures emotion mainly along two dimensions, Arousal and Valence, where Arousal reflects the intensity of the emotional state and Valence reflects its type. However, a two-dimensional description is still somewhat simple compared with a three-dimensional emotion model; for example, some existing work that adopts the A-V two-dimensional emotion model distinguishes only four emotion categories, which is clearly insufficient for complex group emotions. Third, the A-V emotion model cannot distinguish certain emotions (for example, anger and fear both have high Arousal and negative Valence), whereas the PAD three-dimensional emotion model can distinguish them effectively (anger is a high-dominance emotion and fear is a low-dominance emotion).
In order to solve the problems, the application provides a group emotion recognition and abnormal emotion detection method based on a dimension emotion model, and the PAD dimension model is used as a basis to express the group emotion as a three-dimensional coordinate point in an emotion space so as to realize accurate expression of complex emotion.
Disclosure of Invention
The invention aims to provide a group emotion recognition and abnormal emotion detection method based on a dimension emotion model, which is based on a PAD dimension model and expresses group emotion as a three-dimensional coordinate point in an emotion space so as to realize accurate expression of complex emotion.
The invention provides a group emotion recognition and abnormal emotion detection method based on a dimension emotion model, which comprises the following steps of:
s1: establishing a PAD three-dimensional emotion model based on group emotion: the model comprises three dimensions, a pleasure degree P, an activation degree A and a dominance degree D, wherein the value of each dimension is between -1 and +1, and a PAD emotion scale is set for reference of the emotion dimensions;
s2: establishing a group behavior and group emotion data set: aiming at video data of different scenes, acquiring a standard video data set through a manual labeling strategy based on a cognitive psychology principle;
s3: and (3) counting a group emotion data set: according to a standard video data set, defining emotion types of videos, marking the videos as videos with the same emotion, normalizing PAD values of the videos to be between [ -1, 1], and determining values of the emotion in PAD space by calculating central points of coordinates;
s4: evaluating the group emotion data set: checking whether the labeled data are consistent, and verifying whether the labeled data obey a Gaussian distribution by analysis with the Normplot function of the Matlab tool; if the labeled data do not obey a Gaussian distribution, the output plot is curved;
s5: group emotion recognition and abnormal emotion detection: extracting group motion features from the video, and expressing the middle-level semantics of the group motion;
s6: extracting and regressing the group emotional features: adopting Support Vector Regression (SVR) to search for an optimal hyperplane with the support of a training data set, and obtaining a regression function on the basis of structural risk minimization;
s7: detection of abnormal emotional states: and taking the PAD value of each marked sample as an input, and training by a Support Vector Machine (SVM).
Further, in step S2, an emotion labeling system is designed according to the manual labeling strategy; the system represents the P-dimension value by the facial expression of the character model, the A-dimension value by the degree of vibration of the model's heart, and the D-dimension value by the size of the manikin figure.
Further, the method for determining consistency in step S4 is as follows: a coefficient of variation is calculated, and three indexes of the PAD data are counted and evaluated: the sample mean μ, the sample standard deviation σ and the coefficient of variation CV, wherein the coefficient of variation is defined as:

CV = σ / μ

If the coefficient of variation is small, the consistency of the annotation data is verified to be high; otherwise, the consistency of the annotation data is low.
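As an illustration of the consistency evaluation described above, the following Python sketch (not part of the original disclosure; the function name and the 20% threshold borrowed from the embodiment below are assumptions for illustration) computes the per-dimension coefficient of variation over the volunteers' P, A, D scores for one video segment and flags the segment as consistent when every CV stays within the threshold.

```python
import numpy as np

def consistency_check(scores, cv_threshold=0.20):
    """Evaluate annotation consistency for one video segment.

    scores: array of shape (n_volunteers, 3) holding the P, A, D
            ratings given by each volunteer for this segment.
    Returns the per-dimension mean, standard deviation, coefficient
    of variation CV = sigma / mu, and a flag that is True when every
    dimension's CV stays below the threshold (high consistency).
    """
    scores = np.asarray(scores, dtype=float)
    mu = scores.mean(axis=0)                 # sample mean per dimension
    sigma = scores.std(axis=0, ddof=1)       # sample standard deviation
    cv = sigma / mu                          # coefficient of variation
    consistent = bool(np.all(cv <= cv_threshold))
    return mu, sigma, cv, consistent

# Example: 31 volunteers rating one segment on the 1-5 scale.
rng = np.random.default_rng(0)
ratings = rng.integers(3, 6, size=(31, 3))  # mostly 3-5 -> low divergence
print(consistency_check(ratings))
```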
Further, the extraction of group motion features in step S5 comprises extraction of the foreground region, extraction of optical flow features, extraction of trajectory features, and graphical expression of the motion features. The foreground region is extracted with an improved ViBe+ algorithm, and the detected foreground region of the t-th frame is denoted R_t. The optical flow features are extracted and visually expressed with the Gunnar Farnebäck dense optical flow field; for the t-th frame image, the optical flow offsets of pixel point (x, y) in the horizontal and vertical directions are u and v respectively. The trajectory features are extracted with the iDT algorithm, which densely samples video pixel points and determines the position of each tracking point in the next frame through optical flow, thereby forming a tracking trajectory denoted T(p_1, p_2, …, p_L), where L ≤ 15. The graphical expression of the motion features adopts three graphical feature forms: the global motion intensity chart (GMIC), the global motion orientation chart (GMOC) and the global motion trajectory chart (GMTC).
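To make the motion-feature step concrete, here is a minimal Python/OpenCV sketch (an assumption-laden stand-in, not the patented implementation): MOG2 background subtraction substitutes for the improved ViBe+ algorithm, and OpenCV's Farneback dense optical flow provides the per-pixel offsets (u, v); a per-frame motion intensity map is then obtained on the foreground region R_t. The iDT trajectory extraction is omitted here.

```python
import cv2
import numpy as np

def motion_maps(video_path):
    """Per-frame global motion intensity sketch (generator).

    Foreground is obtained with MOG2 as a stand-in for the improved
    ViBe+ algorithm of the patent; dense optical flow uses OpenCV's
    Farneback implementation.  For frame t this yields the foreground
    mask R_t, the flow field, and a motion intensity map.
    """
    cap = cv2.VideoCapture(video_path)
    bg = cv2.createBackgroundSubtractorMOG2(detectShadows=False)
    ok, frame = cap.read()
    prev = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        fg_mask = bg.apply(frame)                        # stand-in for R_t
        flow = cv2.calcOpticalFlowFarneback(
            prev, gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)  # (u, v) per pixel
        u, v = flow[..., 0], flow[..., 1]
        intensity = np.sqrt(u**2 + v**2) * (fg_mask > 0)  # motion intensity map
        yield fg_mask, flow, intensity
        prev = gray
```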
Further, each trajectory in the global motion trajectory chart is drawn as a solid line, and each trajectory comprises three attribute features <T(p_1, p_2, …, p_L), L, g_i>; wherein T(p_1, p_2, …, p_L) represents the tracking points p_i constituting the trajectory, L represents the length of the trajectory, and g_i ∈ [0, 255] represents the gray value of the i-th segment of the trajectory, with i ∈ [1, L−1]; g_i is assigned segment by segment according to the formula given in the original (rendered there as an equation image).
Further, the expression of the middle-level semantics in the group motion of step S5 is analyzed in depth using a gray level co-occurrence matrix; the adopted statistics comprise the variance, contrast, second moment, entropy, correlation and reciprocal difference moment (inverse difference moment);
The variance reflects the degree of gray level variation of the image; the larger the variance, the greater the gray level variation. The calculation formula of the variance is:

f_var = Σ_i Σ_j (i − μ)² · P(i, j)

wherein P(i, j) is the normalized element of the gray level co-occurrence matrix and

μ = Σ_i Σ_j i · P(i, j)

The contrast measures the value distribution of the matrix and the local variation in the image, reflecting the clarity of the image and the depth of its texture. The calculation formula of the contrast is:

f_con = Σ_i Σ_j (i − j)² · P(i, j)

The second moment measures the stability of the gray level variation of the image texture, reflecting the uniformity of the gray level distribution and the coarseness of the texture; a larger second moment indicates a texture pattern with uniform and regular variation. The calculation formula of the second moment is:

f_asm = Σ_i Σ_j P(i, j)²

The entropy measures the randomness of the information content of the image, reflecting the complexity of its gray level distribution. The calculation formula of the entropy is:

f_ent = −Σ_i Σ_j P(i, j) · log P(i, j)

The correlation measures the similarity of the elements of the spatial gray level co-occurrence matrix in the row or column direction, reflecting the consistency of the image texture. The calculation formula of the correlation is:

f_cor = [Σ_i Σ_j (i · j) · P(i, j) − μ_x · μ_y] / (σ_x · σ_y)

wherein μ_x, μ_y, σ_x and σ_y are the means and standard deviations of the row and column marginal distributions of P(i, j).

The reciprocal difference moment reflects the homogeneity of the image texture and measures its local variation; a large value indicates little variation between different regions of the texture, i.e. local uniformity. The calculation formula of the reciprocal difference moment is:

f_idm = Σ_i Σ_j P(i, j) / (1 + (i − j)²)
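The six texture statistics above can be computed from the normalized gray level co-occurrence matrix, for example with scikit-image (assuming version ≥ 0.19 for the graycomatrix/graycoprops spelling). The sketch below is illustrative only: contrast, second moment (ASM), correlation and the inverse difference moment (homogeneity) come from graycoprops, while the variance and entropy are computed directly from P(i, j).

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_statistics(gray_image, levels=256):
    """Texture statistics of a motion map via its gray level co-occurrence matrix.

    gray_image: uint8 image, e.g. a motion intensity map quantized to 0-255.
    Returns variance, contrast, second moment (ASM), entropy, correlation
    and the inverse difference moment (homogeneity), matching the six
    statistics described above.
    """
    glcm = graycomatrix(gray_image, distances=[1], angles=[0],
                        levels=levels, symmetric=True, normed=True)
    p = glcm[:, :, 0, 0]                      # normalized co-occurrence matrix P(i, j)
    i, j = np.indices(p.shape)
    mu = np.sum(i * p)                        # GLCM mean
    variance = np.sum((i - mu) ** 2 * p)
    entropy = -np.sum(p[p > 0] * np.log(p[p > 0]))
    return {
        "variance": variance,
        "contrast": graycoprops(glcm, "contrast")[0, 0],
        "second_moment": graycoprops(glcm, "ASM")[0, 0],
        "entropy": entropy,
        "correlation": graycoprops(glcm, "correlation")[0, 0],
        "inverse_difference_moment": graycoprops(glcm, "homogeneity")[0, 0],
    }
```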
further, the regression function of step S6 is as follows:
Figure GDA0003559410720000058
Figure GDA0003559410720000061
wherein, omega is a weight vector, C is a balance coefficient,
Figure GDA0003559410720000067
ξiin order to be a function of the relaxation variable,
Figure GDA0003559410720000062
for non-linear transformations mapping data to high dimensional space, b is the bias term and ε is the sensitivity;
introducing Lagrange multiplier, and converting the formula (10) into:
Figure GDA0003559410720000063
wherein, the first and the second end of the pipe are connected with each other,
Figure GDA0003559410720000064
the regression function finally found was:
Figure GDA0003559410720000065
wherein, k (x, x)i) Is a kernel function;
adopting a radial basis kernel function RBF, and the expression is as follows:
k(xi,xj)=exp(-||xi-xj||2/2σ2) (13)
A regression model is obtained after training to realize dimensional emotion prediction: a continuous value in PAD space is predicted for each video segment, and as the group emotion changes over time it is expressed as a continuous three-dimensional trajectory, presenting a gradual emotion process.
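A minimal sketch of the SVR-based regression of step S6, assuming scikit-learn and assuming that one RBF-kernel regressor is trained per PAD dimension on the extracted mid-level feature vectors (the hyperparameters C, ε and γ shown are placeholders, not values specified by the patent):

```python
import numpy as np
from sklearn.multioutput import MultiOutputRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# X: (n_segments, n_features) mid-level texture/motion features per video segment
# Y: (n_segments, 3) annotated PAD values in [-1, 1]
def train_pad_regressor(X, Y, C=10.0, epsilon=0.1, gamma="scale"):
    """Fit one RBF-kernel SVR per PAD dimension (a sketch of step S6)."""
    model = make_pipeline(
        StandardScaler(),
        MultiOutputRegressor(SVR(kernel="rbf", C=C, epsilon=epsilon, gamma=gamma)),
    )
    model.fit(X, Y)
    return model

def predict_emotion_trajectory(model, segment_features):
    """Map consecutive video segments to a continuous trajectory in PAD space."""
    pad = model.predict(np.asarray(segment_features))
    return np.clip(pad, -1.0, 1.0)   # keep predictions inside the PAD cube
```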
Further, the detection of the abnormal emotional state in step S7 solves a quadratic programming problem to obtain the SVM hyperplane, expressed as:

min (1/2)·||w||² + (1/(ν·N))·Σ_{i=1..N} ξ_i − ρ
s.t. w^T·Φ(x_i) ≥ ρ − ξ_i, ξ_i ≥ 0          (14)

wherein x_i, i = {1, 2, … N}, is the training set data, and w^T·Φ(x) − ρ = 0 is the maximum-margin decision hyperplane; ξ_i are the relaxation variables that penalize outliers; ν ∈ (0, 1] is a percentage estimate; Φ(·) is the nonlinear mapping of the training data to a high-dimensional feature space; further, the kernel function is defined as k(x_i, x_j) = <Φ(x_i), Φ(x_j)>, performing the dot product operation in the feature space, and a Gaussian kernel function is adopted, with the decision function defined as:

f(x) = sgn( Σ_{i=1..N} α_i·k(x_i, x) − ρ )   (15)

wherein α_i are the Lagrange multipliers.
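One plausible reading of step S7 is a one-class ν-SVM trained on the PAD coordinates of normal-emotion samples, so that segments whose predicted PAD point falls outside the learned region (toward the anger/fear area) are flagged as abnormal. The following scikit-learn sketch reflects that reading; the choice of ν and γ is an assumption, not a value specified by the patent.

```python
import numpy as np
from sklearn.svm import OneClassSVM

def train_abnormal_detector(normal_pad, nu=0.1, gamma="scale"):
    """One-class nu-SVM over PAD coordinates of normal segments (step S7 sketch).

    normal_pad: array of shape (n_samples, 3) with PAD values of normal samples.
    """
    detector = OneClassSVM(kernel="rbf", nu=nu, gamma=gamma)
    detector.fit(np.asarray(normal_pad))
    return detector

def is_scene_abnormal(detector, pad_point):
    """Return True when the predicted PAD point is an outlier with respect to
    the normal region, i.e. lies toward the anger / fear area of PAD space."""
    return detector.predict(np.asarray(pad_point).reshape(1, -1))[0] == -1
```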
compared with the prior art, the invention has the following remarkable advantages:
the method comprises the steps of firstly, applying a three-dimensional emotion model to group emotion recognition under a dense crowd scene for the first time, and representing group emotion as a three-dimensional coordinate point in an emotion space on the basis of a PAD dimension model so as to realize accurate expression of complex emotion.
Secondly, a dimension emotion data set facing to group behaviors is created for the first time, and coordinates and connection of various emotions in a three-dimensional emotion space are disclosed through a manual labeling and statistical analysis method, so that a data base is laid for subsequent emotion analysis.
Thirdly, a series of methods for extracting emotional features from group motion are provided. Under the relevant definition of dimension emotion, through support vector regression, an abstract process and a mapping method from motion to emotion are constructed.
Fourth, the two emotions of fear and anger are defined as abnormal emotions. By identifying these two emotions, it can be judged that an abnormal state has occurred in the scene, opening a novel solution for intelligent scene detection from the perspective of emotion recognition.
Drawings
Fig. 1 is a diagram of group abnormal emotion detection based on the UMN and PETS 2009 data sets according to an embodiment of the present invention;
FIG. 2 is a diagram of data analysis of PAD dimension for video segments according to an embodiment of the present invention;
FIG. 3 is a flowchart of group emotion recognition and abnormal state detection provided by an embodiment of the present invention;
FIG. 4 is a diagram of extraction of group motion features and middle level semantic representation provided by an embodiment of the present invention;
FIG. 5 is a diagram illustrating the effect of GMIC, GMOC and GMTC provided by an embodiment of the present invention;
FIG. 6 is a flowchart illustrating an exemplary method for detecting an abnormal emotional state;
FIG. 7 is a graph of the PAD dimension space for six emotion types provided by embodiments of the present invention.
Detailed Description
The technical solutions of the embodiments of the present invention are clearly and completely described below with reference to the drawings in the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be obtained by a person skilled in the art without any inventive step based on the embodiments of the present invention, shall fall within the scope of protection of the present invention.
Psychological theory holds that, because of the strong correlation between a person's inner emotion and external behavior, subconscious body movements can reveal a person's inner emotions and intentions. It is therefore feasible to identify emotional states through emotional attributes that describe group behavior, based on psychological models and principles of social emotion. From a macroscopic point of view, different forms and movement patterns of a crowd tend to reflect, as a whole, many typical emotional states.
According to the existing literature data analysis, the emotion recognition method based on the group behaviors is divided from the perspective of a psychological model, and is mainly divided into two types at present: the identification method based on the discrete model and the identification method based on the A-V two-dimensional emotion model. However, both of these methods have some disadvantages.
First, unlike a simple piece of speech or a single image, the content presented by surveillance video is very rich: it contains active group movement, complex group emotions and certain plot changes. A discrete emotion model can therefore only identify a few typical scenes with a single form and high recognizability, and the specific emotion types it covers are limited and insufficient for dense crowds. In addition, group emotion has many subtle characteristics, often manifests as a combination of multiple emotions, and changes continuously over time; these characteristics cannot be effectively expressed by a discrete model.
Second, the A-V two-dimensional emotional model is measured primarily from two dimensions, Arousal and Valence. Wherein, Arousal reflects the intensity of the emotional state, and Valence reflects the type of the emotional state. But the two-dimensional description is still somewhat simpler than the three-dimensional emotional model.
Third, the A-V emotion model cannot distinguish some emotions (for example, anger and fear both have high Arousal and negative Valence), whereas the PAD three-dimensional emotion model can distinguish them effectively (anger is a high-dominance emotion and fear is a low-dominance emotion).
According to the method, a video data set of group emotion is created from the perspective of group emotion recognition based on a psychological PAD three-dimensional emotion model through data collection and manual marking, and the position relation of six typical emotions in a PAD space is disclosed. An emotion prediction model based on group behaviors is constructed, and group motion characteristics are mapped into three-dimensional coordinates in a PAD space. The method comprises the following steps: motion feature extraction, middle-layer semantic expression, population emotion feature extraction and fusion, and emotion state regression. And constructing an abnormal emotion classifier, and judging that the scene has an abnormal state when two abnormal emotions, namely anger and fear, are detected. The method provided by the application can accurately express the continuous variation state of the group emotion and can also realize effective identification of the global abnormal state.
Referring to fig. 1-7, the invention provides a group emotion recognition and abnormal emotion detection method based on a dimension emotion model, which comprises the following steps:
s1: establishing a PAD three-dimensional emotion model based on group emotion: the model comprises three dimensions, a pleasure degree P, an activation degree A and a dominance degree D, wherein the value of each dimension is between -1 and +1, and a PAD emotion scale is set for reference of the emotion dimensions;
s2: establishing a group behavior and group emotion data set: acquiring a standard video data set by a manual marking strategy based on a cognitive psychology principle aiming at video data of different scenes;
s3: and (3) counting a group emotion data set: according to a standard video data set, defining emotion types of videos, marking the videos as videos with the same emotion, normalizing PAD values of the videos to be between [ -1, 1], and determining values of the emotion in PAD space by calculating central points of coordinates;
s4: evaluating the group emotion data set: checking whether the labeled data are consistent, and verifying whether the labeled data obey a Gaussian distribution by analysis with the Normplot function of the Matlab tool; if the labeled data do not obey a Gaussian distribution, the output plot is curved;
s5: group emotion recognition and abnormal emotion detection: extracting group motion features from the video, and expressing the middle-level semantics of the group motion;
s6: extracting and regressing the group emotional features: adopting Support Vector Regression (SVR) to search for an optimal hyperplane with the support of a training data set, and obtaining a regression function on the basis of structural risk minimization;
s7: detection of abnormal emotional states: and taking the PAD value of each marked sample as an input, and training by a Support Vector Machine (SVM).
Wherein P in step S1 represents the positive/negative characteristic of the individual emotional state, covering opposite emotional states such as like or dislike, satisfaction or dissatisfaction, pleasure or displeasure. A positive pleasure value represents a positive emotion; otherwise the emotion is negative. In general, different types of group movement may express positive or negative emotions of a dense crowd. For example, slow walking and standing conversation represent positive emotions, while fighting and fleeing are typical negative emotions.
A represents the neurophysiologic activation level, alertness, and the degree of activation of body energy associated with emotional states, i.e., the intensity characteristics of emotion, including both low arousal states (e.g., silence) and high arousal states (e.g., surprise). For dense populations, the intensity of the movement changes reflects the level of activation. For example, when the free forward movement of the crowd suddenly changes into escape, the general situation shows that the crowd is stimulated and influenced by some external factors, and the activation degree of the crowd is changed from a low awakening state to a high awakening state.
D represents the control state of the individual on the scene and other people, mainly refers to the subjective control degree of the individual on the emotional state, and is used for distinguishing whether the emotional state is generated by the individual subjectively or influenced by the objective environment. For dense populations, the variability and homogeneity of individual movements represent the magnitude of dominance. When the individual movement shows certain autonomy, randomness and disorder, for example, people who walk on squares and streets for leisure have individual behaviors which mainly follow the subjective consciousness of the individual, the dominance degree is higher. When the individual movement shows certain characteristics of people following and converging, for example, when people who escape are evacuated, all people run in a certain direction, the individual movement is limited to a group movement mode, and the group movement mode is macroscopically consistent, so that the dominance degree is low.
Example 1
In step S2, an emotion labeling system is designed according to the manual labeling strategy; the system represents the P-dimension value by the facial expression of the character model, the A-dimension value by the degree of vibration of the model's heart, and the D-dimension value by the size of the manikin figure.
There are mainly two methods for constructing an emotion data set: the acted (deduction) mode and the extraction mode. In the acted mode, performers (preferably with professional acting ability) simulate typical emotion types (joy, panic, sadness) through body movement. Emotions obtained this way are vivid and expressive, but the acted form differs from real emotion, the requirement on the performers' acting ability is high, and the method is not universal. In the extraction mode, video clips of real scenes are scored by manual annotation with respect to the emotional state displayed by the group behavior and each evaluation index. Emotions obtained this way are natural expressions of people and closer to real life, but the later annotation workload is larger.
Currently, there is no emotion database of group behaviors in academia. Some scholars have done similar work and proposed calibrated data sets, but these data sets are not published and their validity cannot be verified; moreover, the main purpose of such prior work is to improve the detection efficiency of group behaviors rather than emotion analysis. Drawing on existing data sets of individual posture and individual behavior, an emotion labeling experiment is therefore conducted to establish a data set of group behavior and group emotion: a number of real videos are extracted according to different scenes, and a standard video data set is obtained through a third-party manual labeling strategy.
The emotion data set constructed in this experiment is derived from the UMN data set, the PETS 2009 data set, the UCF data set, the SCU data set, the UCF Crowd data set, the BEHAVE data set, the Web Abnormal/Normal Crowd data set, the Violent-Flows data set and Rodriguez's data set, comprising 50 videos of dense crowd scenes in total. Video segments are cut out in units of 15 frames, giving 200 video segments in total. 31 volunteers (17 males and 14 females, aged 19-35) are invited, and each video segment is labeled individually; the labeling comprises the following two aspects of work:
(1) The volunteers scored the three dimensional values P, A and D for each video segment; each dimension is scored from low to high with five single-choice options {1, 2, 3, 4, 5}.
(2) The volunteers were to determine the type of emotion each video presented, including seven single options { excited, angry, fear, peaceful, boring, neutral, none of the above }.
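The scoring procedure above leaves two computational details to step S3: converting the 1-5 single-choice scores to the [-1, 1] PAD range and computing the center point of each emotion type. The following Python sketch assumes a simple linear mapping (1 → -1, 3 → 0, 5 → +1), which is one reasonable normalization but is not explicitly specified in the text:

```python
import numpy as np
from collections import defaultdict

def normalize_score(score):
    """Map a 1-5 rating to the [-1, 1] range used by the PAD model."""
    return (score - 3.0) / 2.0

def emotion_centers(segment_scores, segment_labels):
    """Center point of each emotion type in PAD space (sketch of step S3).

    segment_scores: (n_segments, 3) mean volunteer P, A, D ratings on the 1-5 scale.
    segment_labels: list of emotion-type strings, one per segment.
    """
    pad = normalize_score(np.asarray(segment_scores, dtype=float))
    buckets = defaultdict(list)
    for coords, label in zip(pad, segment_labels):
        buckets[label].append(coords)
    return {label: np.mean(pts, axis=0) for label, pts in buckets.items()}
```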
Example 2
The method for judging the consistency in step S4 is as follows: a coefficient of variation is calculated, and three indexes of the PAD data are counted and evaluated: the sample mean μ, the sample standard deviation σ and the coefficient of variation CV, wherein the coefficient of variation is defined as:

CV = σ / μ

If the coefficient of variation is small, the consistency of the annotation data is high; otherwise, the consistency of the annotation data is low.
For the PAD data of different video segments, their mark values in the same dimension are counted. If the coefficient of variation is large, the dispersion degree on the unit mean value is large, which indicates that the consistency and the certainty of the volunteer scoring the group are low; otherwise, it indicates that the volunteer has high consistency and certainty of scoring the group. Generally speaking, for a video with low consistency, if the coefficient of variation is greater than 20%, the data is considered to be possibly abnormal, which indicates that the volunteer has a large divergence, and the data can be considered to be removed from the data set so as to ensure the credibility of the data.
Taking two video segments as examples, the former shows a crowd running away and the latter shows a crowd fighting violently. For the PAD statistics of these two video segments, the results show that the coefficients of variation CV of the labeled data are concentrated in the [0, 20%] interval, so the volunteers' scores can be considered concentrated with little divergence, and the PAD data of the two video segments are credible.
Example 3
The extraction of group motion features in step S5 comprises extraction of the foreground region, extraction of optical flow features, extraction of trajectory features, and graphical expression of the motion features. The foreground region is extracted with an improved ViBe+ algorithm, and the detected foreground region of the t-th frame is denoted R_t. The optical flow features are extracted and visually expressed with the Gunnar Farnebäck dense optical flow field; for the t-th frame image, the optical flow offsets of pixel point (x, y) in the horizontal and vertical directions are u and v respectively. The trajectory features are extracted with the iDT algorithm, which densely samples video pixel points and determines the position of each tracking point in the next frame through optical flow, thereby forming a tracking trajectory denoted T(p_1, p_2, …, p_L), where L ≤ 15. The graphical expression of the motion features adopts three graphical feature forms: the global motion intensity chart, the global motion orientation chart and the global motion trajectory chart.
Each trajectory in the global motion trajectory chart is drawn as a solid line, and each trajectory comprises three attribute features <T(p_1, p_2, …, p_L), L, g_i>; wherein T(p_1, p_2, …, p_L) represents the tracking points p_i constituting the trajectory, L represents the length of the trajectory, and g_i ∈ [0, 255] represents the gray value of the i-th segment of the trajectory, with i ∈ [1, L−1]; g_i is assigned segment by segment according to the formula given in the original (rendered there as an equation image).
The expression of the middle-level semantics in the group motion of step S5 is analyzed in depth using a gray level co-occurrence matrix; the adopted statistics comprise the variance, contrast, second moment, entropy, correlation and reciprocal difference moment;
the variance is used for reflecting the gray level change degree of the image, when the variance is larger, the gray level change of the image is larger, and the calculation formula of the variance is as follows:
Figure GDA0003559410720000132
wherein the content of the first and second substances,
Figure GDA0003559410720000133
the contrast is used for measuring the value distribution of the matrix and the local variable quantity in the image and reflecting the definition of the image and the depth of the texture, and the calculation formula of the contrast is as follows:
Figure GDA0003559410720000134
the second moment is used for measuring the gray change stability of the image texture and reflecting the gray distribution uniformity and texture thickness of the image, and if the value of the second moment is larger, the texture mode with uniform and regular change is indicated, and the calculation formula of the second moment is as follows:
Figure GDA0003559410720000135
the entropy is used for measuring the randomness of the information content of the image and reflecting the complexity of the gray level distribution of the image, and the calculation formula of the entropy is as follows:
Figure GDA0003559410720000136
the correlation is used for measuring the similarity of the elements of the space gray level co-occurrence matrix in the row or column direction and reflecting the consistency of image textures, and a calculation formula of the correlation is as follows:
Figure GDA0003559410720000137
the reciprocal difference moment is used for reflecting the homogeneity of the image texture and measuring the local change of the image texture, if the value is large, the change is absent among different areas of the image texture, the local uniformity is realized, and the calculation formula of the reciprocal difference moment is as follows:
Figure GDA0003559410720000141
example 4
The regression function of step S6 takes the form

f(x) = ω^T · φ(x) + b

and its parameters are obtained by solving the following constrained optimization problem:

min (1/2)·||ω||² + C · Σ_{i=1..N} (ξ_i + ξ_i*)
s.t. y_i − ω^T·φ(x_i) − b ≤ ε + ξ_i
     ω^T·φ(x_i) + b − y_i ≤ ε + ξ_i*
     ξ_i ≥ 0, ξ_i* ≥ 0                     (10)

wherein ω is the weight vector, C is the balance coefficient, ξ_i and ξ_i* are the relaxation (slack) variables, φ(·) is the nonlinear transformation mapping the data to a high-dimensional space, b is the bias term, and ε is the sensitivity.

Introducing Lagrange multipliers, formula (10) is converted into its dual form:

max −(1/2)·Σ_{i=1..N} Σ_{j=1..N} (a_i − a_i*)(a_j − a_j*)·k(x_i, x_j) − ε·Σ_{i=1..N} (a_i + a_i*) + Σ_{i=1..N} y_i·(a_i − a_i*)
s.t. Σ_{i=1..N} (a_i − a_i*) = 0, 0 ≤ a_i ≤ C, 0 ≤ a_i* ≤ C

wherein a_i and a_i* are the Lagrange multipliers.

The regression function finally obtained is:

f(x) = Σ_{i=1..N} (a_i − a_i*)·k(x, x_i) + b

wherein k(x, x_i) is the kernel function;

the radial basis kernel function RBF is adopted, with the expression:

k(x_i, x_j) = exp(−||x_i − x_j||² / (2σ²))    (13)
A regression model is obtained after training to realize dimensional emotion prediction: a continuous value in PAD space is predicted for each video segment, and as the group emotion changes over time it is expressed as a continuous three-dimensional trajectory, presenting a gradual emotion process.
Example 5
The detection of the abnormal emotional state in step S7 solves a quadratic programming problem to obtain the SVM hyperplane, expressed as:

min (1/2)·||w||² + (1/(ν·N))·Σ_{i=1..N} ξ_i − ρ
s.t. w^T·Φ(x_i) ≥ ρ − ξ_i, ξ_i ≥ 0

wherein x_i, i = {1, 2, … N}, is the training set data, and w^T·Φ(x) − ρ = 0 is the maximum-margin decision hyperplane; ξ_i are the relaxation variables that penalize outliers; ν ∈ (0, 1] is a percentage estimate; Φ(·) is the nonlinear mapping of the training data to a high-dimensional feature space; further, the kernel function is defined as k(x_i, x_j) = <Φ(x_i), Φ(x_j)>, performing the dot product operation in the feature space, and a Gaussian kernel function is adopted, with the decision function defined as:

f(x) = sgn( Σ_{i=1..N} α_i·k(x_i, x) − ρ )

wherein α_i are the Lagrange multipliers.
for the detection result of the population abnormal emotion, the coordinates of six emotion types in the PAD space are determined. The gradual change of the curve from light to dark shows the sequence of the frame sequences. Since the group emotional state in the video is changed continuously, the group emotional state is represented as continuous change of the group emotional state. The variation process of the group emotion in the video along with time is represented as a continuous emotion track in the figure. It can be seen that the emotion initially fluctuates around the boring coordinate point, indicating that the population is in a normal state at this time. Then the emotion track suddenly moves to the vicinity of the coordinate point of fear, which shows that the group emotion is converted into abnormity and the change is very sudden. Therefore, from a qualitative point of view, the description of the population emotion by the experiment is consistent with the fact.
The above disclosure is only for a few specific embodiments of the present invention, however, the present invention is not limited to the above embodiments, and any modifications that can be made by those skilled in the art are intended to fall within the scope of the present invention.

Claims (4)

1. The method for group emotion recognition and abnormal emotion detection based on the dimension emotion model is characterized by comprising the following steps of:
s1: establishing a PAD three-dimensional emotion model based on group emotion: the model comprises three dimensions, a pleasure degree P, an activation degree A and a dominance degree D, wherein the value of each dimension is between -1 and +1, and a PAD emotion scale is set for reference of the emotion dimensions;
s2: establishing a group behavior and group emotion data set: acquiring a standard video data set by a manual marking strategy based on a cognitive psychology principle aiming at video data of different scenes;
s3: and (3) counting a group emotion data set: according to a standard video data set, defining emotion types of videos, marking the videos as videos with the same emotion, normalizing PAD values of the videos to be between [ -1, 1], and determining values of the emotion in PAD space by calculating central points of coordinates;
s4: evaluating the group emotion data set: checking whether the labeled data are consistent, and verifying whether the labeled data obey a Gaussian distribution by analysis with a Normplot function; if the labeled data do not obey a Gaussian distribution, the output plot is curved;
s5: group emotion recognition and abnormal emotion detection: extracting group motion features from the video, and expressing the middle-level semantics of the group motion;
extracting group motion features, wherein the extraction of group motion features comprises extraction of the foreground region, extraction of optical flow features, extraction of trajectory features, and graphical expression of the motion features; the foreground region is extracted with an improved ViBe+ algorithm, and the detected foreground region of the t-th frame is denoted R_t; the optical flow features are extracted and visually expressed with the Gunnar Farnebäck dense optical flow field, and for the t-th frame image the optical flow offsets of pixel point (x, y) in the horizontal and vertical directions are u and v respectively; the trajectory features are extracted with the iDT algorithm, which densely samples video pixel points and determines the position of each tracking point in the next frame through optical flow, thereby forming a tracking trajectory denoted T(p_1, p_2, …, p_L), wherein L ≤ 15; the graphical expression of the motion features adopts three graphical feature forms: the global motion intensity chart, the global motion orientation chart and the global motion trajectory chart;
each trajectory in the global motion trajectory chart is drawn as a solid line, and each trajectory comprises three attribute features <T(p_1, p_2, …, p_L), L, g_i>; wherein T(p_1, p_2, …, p_L) represents the tracking points p_i constituting the trajectory, L represents the length of the trajectory, and g_i ∈ [0, 255] represents the gray value of the i-th segment of the trajectory, with i ∈ [1, L−1]; g_i is assigned segment by segment according to the formula given in the original (rendered there as an equation image);
the expression of the middle-level semantics in the group motion of step S5 is analyzed in depth using a gray level co-occurrence matrix, and the adopted statistics comprise the variance, contrast, second moment, entropy, correlation and reciprocal difference moment;
the variance is used for reflecting the gray level change degree of the image, when the variance is larger, the gray level change of the image is larger, and the calculation formula of the variance is as follows:
Figure FDA0003605854010000022
wherein the content of the first and second substances,
Figure FDA0003605854010000023
the contrast is used for measuring the value distribution of the matrix and the local variation in the image and reflecting the definition of the image and the depth of the texture, and a calculation formula of the contrast is as follows:
Figure FDA0003605854010000024
the second moment is used for measuring the gray change stability degree of the image texture and reflecting the gray distribution uniformity degree and the texture thickness degree of the image, if the value of the second moment is larger, the texture mode is in a uniform and regular change, and the calculation formula of the second moment is as follows:
Figure FDA0003605854010000025
the entropy is used for measuring the randomness of the information content of the image and reflecting the complexity of the gray level distribution of the image, and the calculation formula of the entropy is as follows:
Figure FDA0003605854010000026
the correlation is used for measuring the similarity of the elements of the space gray level co-occurrence matrix in the row or column direction and reflecting the consistency of image textures, and a calculation formula of the correlation is as follows:
Figure FDA0003605854010000031
the reciprocal difference moment is used for reflecting the homogeneity of the image texture and measuring the local change of the image texture, if the value is large, the change is absent among different areas of the image texture, the local uniformity is realized, and the calculation formula of the reciprocal difference moment is as follows:
Figure FDA0003605854010000032
s6: extracting and regressing the group emotional features: adopting Support Vector Regression (SVR) to search for an optimal hyperplane with the support of a training data set, and obtaining a regression function on the basis of structural risk minimization; the regression function takes the form

f(x) = ω^T · φ(x) + b

and its parameters are obtained by solving the following constrained optimization problem:

min (1/2)·||ω||² + C · Σ_{i=1..N} (ξ_i + ξ_i*)
s.t. y_i − ω^T·φ(x_i) − b ≤ ε + ξ_i
     ω^T·φ(x_i) + b − y_i ≤ ε + ξ_i*
     ξ_i ≥ 0, ξ_i* ≥ 0                     (10)

wherein ω is the weight vector, C is the balance coefficient, ξ_i and ξ_i* are the relaxation (slack) variables, φ(·) is the nonlinear transformation mapping the data to a high-dimensional space, b is the bias term, and ε is the sensitivity;

introducing Lagrange multipliers, formula (10) is converted into its dual form:

max −(1/2)·Σ_{i=1..N} Σ_{j=1..N} (a_i − a_i*)(a_j − a_j*)·k(x_i, x_j) − ε·Σ_{i=1..N} (a_i + a_i*) + Σ_{i=1..N} y_i·(a_i − a_i*)
s.t. Σ_{i=1..N} (a_i − a_i*) = 0, 0 ≤ a_i ≤ C, 0 ≤ a_i* ≤ C

wherein a_i and a_i* are the Lagrange multipliers;

the regression function finally obtained is:

f(x) = Σ_{i=1..N} (a_i − a_i*)·k(x, x_i) + b

wherein k(x, x_i) is the kernel function;

the radial basis kernel function RBF is adopted, with the expression:

k(x_i, x_j) = exp(−||x_i − x_j||² / (2σ²))    (13)
obtaining a regression model after training to realize dimensional emotion prediction, predicting a continuous value in PAD space for each video segment, and expressing the values as a continuous three-dimensional trajectory that presents a gradual emotion process as the group emotion changes over time;
s7: detection of abnormal emotional states: and taking the PAD value of each marked sample as an input, and training by a Support Vector Machine (SVM).
2. The method for group emotion recognition and abnormal emotion detection based on the dimension emotion model as recited in claim 1, wherein in step S2 an emotion labeling system is designed according to the manual labeling strategy, and the system represents the value of the P dimension by the facial expression of the character model, the value of the A dimension by the degree of vibration of the model's heart, and the value of the D dimension by the size of the manikin figure.
3. The method for group emotion recognition and abnormal emotion detection based on the dimension emotion model as claimed in claim 1, wherein the method for judging consistency in step S4 is as follows: a coefficient of variation is calculated, and three indexes of the PAD data are counted and evaluated: the sample mean μ, the sample standard deviation σ and the coefficient of variation CV, wherein the coefficient of variation is defined as:

CV = σ / μ

if the coefficient of variation is small, the consistency of the annotation data is high; otherwise, the consistency of the annotation data is low.
4. The method for group emotion recognition and abnormal emotion detection based on the dimension emotion model as claimed in claim 1, wherein the detection of the abnormal emotional state in step S7 solves a quadratic programming problem to obtain the SVM hyperplane, expressed as:

min (1/2)·||w||² + (1/(ν·N))·Σ_{i=1..N} ξ_i − ρ
s.t. w^T·Φ(x_i) ≥ ρ − ξ_i, ξ_i ≥ 0

wherein x_i, i = {1, 2, … N}, is the training set data, and w^T·Φ(x) − ρ = 0 is the maximum-margin decision hyperplane; ξ_i are the relaxation variables penalizing outliers; ν ∈ (0, 1] is a percentage estimate; Φ(·) is the nonlinear mapping of the training data to a high-dimensional feature space; further, the radial basis kernel function is defined as k(x_i, x_j) = <Φ(x_i), Φ(x_j)>, performing the dot product operation in the feature space; a Gaussian kernel function is adopted, and the decision function is defined as:

f(x) = sgn( Σ_{i=1..N} α_i·k(x_i, x) − ρ )

wherein α_i are the Lagrange multipliers.
CN202011601643.3A 2020-12-29 2020-12-29 Group emotion recognition and abnormal emotion detection method based on dimension emotion model Active CN112699785B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011601643.3A CN112699785B (en) 2020-12-29 2020-12-29 Group emotion recognition and abnormal emotion detection method based on dimension emotion model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011601643.3A CN112699785B (en) 2020-12-29 2020-12-29 Group emotion recognition and abnormal emotion detection method based on dimension emotion model

Publications (2)

Publication Number Publication Date
CN112699785A CN112699785A (en) 2021-04-23
CN112699785B true CN112699785B (en) 2022-06-07

Family

ID=75512114

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011601643.3A Active CN112699785B (en) 2020-12-29 2020-12-29 Group emotion recognition and abnormal emotion detection method based on dimension emotion model

Country Status (1)

Country Link
CN (1) CN112699785B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113743271B (en) * 2021-08-27 2023-08-01 中国科学院软件研究所 Video content effectiveness visual analysis method and system based on multi-modal emotion
CN113822184A (en) * 2021-09-08 2021-12-21 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Expression recognition-based non-feeling emotion abnormity detection method
US11930226B2 (en) * 2022-07-29 2024-03-12 Roku, Inc. Emotion evaluation of contents
CN115905837B (en) * 2022-11-17 2023-06-30 杭州电子科技大学 Semi-supervised self-adaptive marker regression electroencephalogram emotion recognition method for automatic anomaly detection
CN117313723B (en) * 2023-11-28 2024-02-20 广州云趣信息科技有限公司 Semantic analysis method, system and storage medium based on big data

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104732203A (en) * 2015-03-05 2015-06-24 中国科学院软件研究所 Emotion recognizing and tracking method based on video information
CN107169426A (en) * 2017-04-27 2017-09-15 广东工业大学 A kind of detection of crowd's abnormal feeling and localization method based on deep neural network
CN107220591A (en) * 2017-04-28 2017-09-29 哈尔滨工业大学深圳研究生院 Multi-modal intelligent mood sensing system
US10061977B1 (en) * 2015-04-20 2018-08-28 Snap Inc. Determining a mood for a group
CN111368649A (en) * 2020-02-17 2020-07-03 杭州电子科技大学 Emotion perception method operating in raspberry pie
CN111914594A (en) * 2019-05-08 2020-11-10 四川大学 Group emotion recognition method based on motion characteristics

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9907473B2 (en) * 2015-04-03 2018-03-06 Koninklijke Philips N.V. Personal monitoring system
CN111353366A (en) * 2019-08-19 2020-06-30 深圳市鸿合创新信息技术有限责任公司 Emotion detection method and device and electronic equipment
CN110826466B (en) * 2019-10-31 2023-10-03 陕西励爱互联网科技有限公司 Emotion recognition method, device and storage medium based on LSTM audio-video fusion

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104732203A (en) * 2015-03-05 2015-06-24 中国科学院软件研究所 Emotion recognizing and tracking method based on video information
US10061977B1 (en) * 2015-04-20 2018-08-28 Snap Inc. Determining a mood for a group
CN107169426A (en) * 2017-04-27 2017-09-15 广东工业大学 A kind of detection of crowd's abnormal feeling and localization method based on deep neural network
CN107220591A (en) * 2017-04-28 2017-09-29 哈尔滨工业大学深圳研究生院 Multi-modal intelligent mood sensing system
CN111914594A (en) * 2019-05-08 2020-11-10 四川大学 Group emotion recognition method based on motion characteristics
CN111368649A (en) * 2020-02-17 2020-07-03 杭州电子科技大学 Emotion perception method operating in raspberry pie

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
Continuous Audiovisual Emotion Recognition Using Feature Selection and LSTM; Reda Elbarougy et al.; Journal of Signal Processing; 2020-11-30; Vol. 24, No. 6; pp. 229-235 *
Multimodal Multi-task Learning for Dimensional and Continuous Emotion Recognition; Shizhe Chen et al.; AVEC'17: Proceedings of the 7th Annual Workshop on Audio/Visual Emotion Challenge; 2017-10-31; pp. 19-26 *
Research on Emotional Speech Based on the PAD Three-Dimensional Emotion Model; 张婷; China Masters' Theses Full-text Database (Information Science and Technology); 2018-10-15; I136-90 *
Group Behavior Analysis Based on Structured Cognitive Computing; 张严浩; China Doctoral Dissertations Full-text Database (Information Science and Technology); 2018-01-15; I138-56 *
Research Progress of Affective Computing Technology Based on Semantic Analysis; 饶元 et al.; Journal of Software; 2018-03-14; Vol. 29, No. 8; pp. 2397-2462 *

Also Published As

Publication number Publication date
CN112699785A (en) 2021-04-23

Similar Documents

Publication Publication Date Title
CN112699785B (en) Group emotion recognition and abnormal emotion detection method based on dimension emotion model
Arunnehru et al. Automatic human emotion recognition in surveillance video
Karnati et al. LieNet: a deep convolution neural network framework for detecting deception
Chen et al. Analyze spontaneous gestures for emotional stress state recognition: A micro-gesture dataset and analysis with deep learning
Venkataraman et al. Shape distributions of nonlinear dynamical systems for video-based inference
Zhang et al. Exploring coherent motion patterns via structured trajectory learning for crowd mood modeling
Zhang et al. Physiognomy: Personality traits prediction by learning
CN108717548B (en) Behavior recognition model updating method and system for dynamic increase of sensors
Doshi et al. From deep learning to episodic memories: Creating categories of visual experiences
Butt et al. Fall detection using LSTM and transfer learning
Li et al. Deep learning and improved HMM training algorithm and its analysis in facial expression recognition of sports athletes
Adeniyi et al. Comparison of the performance of machine learning techniques in the prediction of employee
CN112801009B (en) Facial emotion recognition method, device, medium and equipment based on double-flow network
Pang et al. Dance video motion recognition based on computer vision and image processing
Alsaedi New Approach of Estimating Sarcasm based on the percentage of happiness of facial Expression using Fuzzy Inference System
CN111723869A (en) Special personnel-oriented intelligent behavior risk early warning method and system
TWI646438B (en) Emotion detection system and method
Naidu et al. Stress recognition using facial landmarks and CNN (Alexnet)
Hwooi et al. Monitoring application-driven continuous affect recognition from video frames
Adibuzzaman et al. In situ affect detection in mobile devices: a multimodal approach for advertisement using social network
Yashaswini et al. Stress detection using deep learning and IoT
Adeniyi et al. Comparative analysis of machine learning techniques for the prediction of employee performance
Lalitha et al. Micro-facial expression recognition in video based on optimal convolutional neural network (MFEOCNN) algorithm
Rew et al. Monitoring skin condition using life activities on the SNS user documents
Wang et al. The intelligent football players’ motion recognition system based on convolutional neural network and big data

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant