CN114694234A - Emotion recognition method, system, electronic device and storage medium - Google Patents

Emotion recognition method, system, electronic device and storage medium

Info

Publication number
CN114694234A
CN114694234A (application CN202210621369.9A)
Authority
CN
China
Prior art keywords
classification result
emotion recognition
value
model
result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210621369.9A
Other languages
Chinese (zh)
Other versions
CN114694234B (en)
Inventor
涂涛
郑舟旋
王增锹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Zeno Videopark Import Export Co ltd
Original Assignee
Hangzhou Zeno Videopark Import Export Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Zeno Videopark Import Export Co ltd
Priority to CN202210621369.9A
Publication of CN114694234A
Application granted
Publication of CN114694234B
Legal status: Active (current)
Anticipated expiration

Classifications

    • A HUMAN NECESSITIES
    • A61 MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B5/00 Measuring for diagnostic purposes; Identification of persons
    • A61B5/16 Devices for psychotechnics; Testing reaction times; Devices for evaluating the psychological state
    • A61B5/165 Evaluating the state of mind, e.g. depression, anxiety

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Psychiatry (AREA)
  • Engineering & Computer Science (AREA)
  • Educational Technology (AREA)
  • Biomedical Technology (AREA)
  • Hospice & Palliative Care (AREA)
  • Psychology (AREA)
  • Social Psychology (AREA)
  • Physics & Mathematics (AREA)
  • Child & Adolescent Psychology (AREA)
  • Biophysics (AREA)
  • Pathology (AREA)
  • Developmental Disabilities (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Surgery (AREA)
  • Animal Behavior & Ethology (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Veterinary Medicine (AREA)
  • Image Analysis (AREA)

Abstract

The method comprises: acquiring a face video; calculating the motion vectors between frames in the face video; performing DCT coefficient conversion processing on the motion vectors to obtain a time sequencing array and a space sequencing array; inputting the two arrays into an Explainable AI model to obtain emotion recognition classification results; and obtaining the value of a health status label according to the emotion recognition classification results, thereby helping parents or teachers discover students with potential psychological risks. In addition, compared with the traditional unsupervised K-Means clustering algorithm, which uses a nonlinear calculation, the Explainable AI model of this embodiment uses a linear calculation, which reduces computational complexity, shortens response time, and improves the interpretability of the emotion recognition classification results.

Description

Emotion recognition method, system, electronic device and storage medium
Technical Field
The present application relates to the field of emotion recognition technology, and in particular, to an emotion recognition method, system, electronic device, and storage medium.
Background
Everyone experiences emotions, young students in particular. Adolescents who already have behavioral or psychological difficulties can lose control when their emotions are not handled properly; left unattended, this harms children's mental health, affects their academic performance, and in severe cases can even lead to a series of psychological illnesses. The emotional state of children therefore urgently needs attention from parents and society.
At present, emotion analysis relies on the traditional unsupervised K-Means clustering algorithm. Its discovery accuracy is about 75% and its recall only about 30%, so the accuracy and stability of the test are low and far from any practical target, and students with potential psychological risks cannot be discovered effectively.
At present, no effective solution has been proposed for the problem in the related art that, because student emotion is analyzed with an unsupervised K-Means clustering algorithm whose test accuracy and stability are low, students with potential psychological risks cannot be effectively discovered.
Disclosure of Invention
The embodiment of the application provides an emotion recognition method, an emotion recognition system, electronic equipment and a storage medium, and aims to at least solve the problem that students with potential psychological risks cannot be effectively discovered due to low test accuracy and stability when the emotions of the students are analyzed through an unsupervised clustering K-Means algorithm in the related technology.
In a first aspect, an embodiment of the present application provides an emotion recognition method, including:
acquiring a face video, wherein the face video consists of a plurality of vibration images;
calculating a motion vector between frames in the face video;
performing DCT coefficient conversion processing on the motion vector to obtain a time sorting array and a space sorting array;
and inputting the time sequencing array and the space sequencing array into an Explainable AI model to obtain an emotion recognition classification result, and obtaining a value of the health state label according to the emotion recognition classification result.
In some embodiments, the Explainable AI model is a generalized additive model, wherein the generalized additive model is obtained, on the basis of an EBM model, by ensemble learning of each feature function with the Bagging algorithm and the Boosting algorithm, taking each feature function at its optimum. Each classification result in the emotion recognition classification results is obtained through one feature function, the feature functions corresponding one to one to the emotion recognition classification results, and the emotion recognition classification results at least include an aggressiveness classification result, a stress classification result, an anxiety classification result, a suspicion classification result, a balance classification result, a confidence classification result, a vitality classification result, a self-regulation classification result, an inhibition classification result, a neuroticism classification result, a depression classification result, and a happiness classification result.
In some embodiments, the Explainable AI model is trained as follows:
constructing a cross entropy function of the Explainable AI model, and calculating the value of the cross entropy function;
when the calculation result of the cross entropy function is minimum, obtaining a weight matrix, a first group of critical value matrixes and a second group of critical value matrixes of each classifier;
inputting a training set into the Explainable AI model to obtain a first vector;
comparing the first vector to the first set of threshold matrices;
multiplying the comparison result by the second group of critical value matrixes, and taking the maximum value of the multiplication result as a feature vector;
and taking the sum of a first result and a second result as the value of the health state label, wherein the first result is obtained by multiplying the feature vector by the weight matrix, the second result is obtained by multiplying a constant weight obtained by fitting by a first constant, and the first constant is the sum of a constant c and a regularization coefficient L that constrains overfitting.
In some embodiments, the cross entropy function is calculated as follows:
H = -Σ_i yi·log(pi)
wherein yi is a real label value, i is a sample, and pi is the probability distribution of the sample;
the calculation formula of the regularization coefficient L of the constraint overfitting is as follows:
L = (1/N)·Σ_i (yi - W^T·xi)^2 + λ·||W||^2
wherein, L is a regularization coefficient of constrained overfitting, N represents the total number of samples, y is a value of a health state label, x is a training sample, W is a weight matrix, and T is a matrix transposition symbol.
In some embodiments, after obtaining the value of the health status label according to the emotion recognition classification result, the method further comprises:
and judging whether the value of the health state label is smaller than a preset value, if so, indicating that the emotion of the student in the face video is unhealthy, and if not, indicating that the emotion of the student in the face video is healthy.
In some embodiments, after inputting the time-ordered array and the space-ordered array into the Explainable AI model to obtain an emotion recognition classification result and obtaining the value of the health status label according to the emotion recognition classification result, the method further comprises:
comparing the value of the health status label with psychological scales, and calculating the coincidence rate.
In some embodiments, the face video is a video encoded with MPEG, and the face video includes footage of the face and upper body for a preset duration.
In a second aspect, an embodiment of the present application provides an emotion recognition system, including:
the system comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring a face video, and the face video consists of a plurality of vibration images;
the computing module is used for computing the motion vector between frames in the face video;
the conversion module is used for carrying out DCT coefficient conversion processing on the motion vector to obtain a time sorting array and a space sorting array;
and the classification module is used for inputting the time sequencing array and the space sequencing array into an Explainable AI model to obtain an emotion recognition classification result, and obtaining a value of the health state label according to the emotion recognition classification result.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a memory and a processor, where the memory stores therein a computer program, and the processor is configured to execute the computer program to perform the emotion recognition method as described above.
In a fourth aspect, the present application provides a storage medium having a computer program stored therein, wherein the computer program is configured to execute the emotion recognition method as described above when the computer program runs.
Compared with the related art, in the emotion recognition method provided by the embodiments of the present application, for recognizing emotional states within groups of juvenile students, this embodiment first acquires a face video, then calculates the motion vectors between frames in the face video, performs DCT (discrete cosine transform) coefficient conversion processing on the motion vectors to obtain a time sequencing array and a space sequencing array, inputs the two arrays into an Explainable AI (artificial intelligence) model to obtain emotion recognition classification results, and obtains the value of a health status label according to the emotion recognition classification results, thereby solving the problem in the related art that students with potential psychological risks cannot be effectively discovered. In addition, compared with the traditional unsupervised K-Means clustering algorithm, which uses a nonlinear calculation, the Explainable AI model of this embodiment uses a linear calculation, which not only reduces computational complexity but also shortens response time (from 3 s to 0.02 s); moreover, the emotion recognition classification results can be plotted as feature value vectors to show each parameter's weight, improving the interpretability of the results.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1 is a first flowchart of an emotion recognition method according to an embodiment of the present application;
fig. 2 is a block diagram of a structure of an emotion recognition system according to an embodiment of the present application;
fig. 3 is a schematic diagram of an internal structure of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms referred to herein shall have the ordinary meaning as understood by those of ordinary skill in the art to which this application belongs. Reference to "a," "an," "the," and similar words throughout this application are not to be construed as limiting in number, and may refer to the singular or the plural. The use of the terms "including," "comprising," "having," and any variations thereof herein, is meant to cover a non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. Reference to "connected," "coupled," and the like in this application is not intended to be limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. Reference herein to "a plurality" means greater than or equal to two. "and/or" describes an association relationship of associated objects, meaning that three relationships may exist, for example, "A and/or B" may mean: a exists alone, A and B exist simultaneously, and B exists alone. Reference herein to the terms "first," "second," "third," and the like, are merely to distinguish similar objects and do not denote a particular ordering for the objects.
The present application provides an emotion recognition method, and fig. 1 is a first flowchart of the emotion recognition method in an embodiment of the present application, and as shown in fig. 1, in the embodiment, the method includes the following steps:
step S101, obtaining a face video, wherein the face video is composed of a plurality of vibration images; the vibration image in the embodiment is an image of a human face obtained by measuring vestibular-emotional reflex in muscles on the neck of a human to enable the human brain to keep the vertical direction, and compared with the situation that the vibration frequency and the vibration amplitude of a pixel point cannot be inversely mapped by a common image, a thermal imaging or an X-ray image, the vibration image in the embodiment can calculate the vector change between each frame of a video from two dimensions of time and space, namely the frequency and the amplitude, so as to reflect the vibration frequency and the vibration amplitude of the pixel point; it should be noted that the face video in this embodiment is mainly directed to a face video of a juvenile student group, and certainly, other embodiments may also be a group of different ages, professions, or regions, and this is not specifically limited herein;
it is readily understood by those skilled in the art that vibrographic techniques characterize the reflex movements of the head by psychophysiological parameters based on the antenatal emotional reflex. The mental state of a person has a significant influence on the vibration of the person, and even small changes in the psychological emotional state can instantaneously cause changes in the kinetic energy and the vibration. The normal of the psychological emotional state also follows the normal law of vibration distribution, similar to physiological conditions. The quiet state is characterized by low frequency vibrations, and mental up fluctuations cause the frequency of the vibrations to increase.
Since raw data is often very noisy and a machine learning model cannot quickly extract useful information from it, in order to extract the information in the face video quickly, after step S101, that is, after the face video is acquired, the method further includes the following step:
preprocessing the face video to remove noise and redundant jitter, and removing the background in the video by conventional AI means;
step S102, calculating motion vectors of all frames in the face video; the method for calculating the motion vector between frames in the face video comprises the following steps:
the method includes reading a plurality of frames of Macroblock (macro block) for calculation while decoding to obtain a motion vector, where the plurality of frames of Macroblock may be set according to user requirements, for example, reading every 2 frames of Macroblock while decoding for calculation, or reading every 5 frames of Macroblock while decoding for calculation, or reading every 10 frames of Macroblock while decoding for calculation, where no specific limitation is made here, and it is easy to understand that the difference in frame number affects calculation performance, that is, the more frames are superimposed, the faster the calculation is.
In addition, as is known to those skilled in the art, a macroblock (Macroblock) is a basic concept in video coding technology: a picture is divided into blocks of different sizes so that different compression strategies can be applied at different positions; a detailed description is therefore omitted here.
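To make step S102 concrete, the following is a minimal sketch in Python, assuming OpenCV is available. True MPEG macroblock motion vectors would be read at the decoder; here dense optical flow between sampled frames serves as a stand-in, and the function name, frame step, and flow parameters are illustrative assumptions rather than the patent's exact procedure.

```python
import cv2
import numpy as np

def frame_motion_vectors(video_path, frame_step=2):
    """Yield an (H, W, 2) motion-vector field every `frame_step` frames."""
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    if not ok:
        return
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    idx = 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        idx += 1
        if idx % frame_step:
            continue                      # skip frames; fewer reads -> faster calculation
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # dense optical flow as a stand-in for decoder-side macroblock motion vectors
        flow = cv2.calcOpticalFlowFarneback(prev_gray, gray, None,
                                            0.5, 3, 15, 3, 5, 1.2, 0)
        yield flow                        # per-pixel (dx, dy) between the two frames
        prev_gray = gray
    cap.release()
```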
Step S103, performing DCT coefficient conversion processing on the motion vectors to obtain a time sorting array and a space sorting array. In this embodiment, the DCT coefficient conversion separates the high-frequency and low-frequency components of the motion vectors, yielding the two types of arrays, one ordered in time and one ordered in space;
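A possible realization of step S103 is sketched below, assuming SciPy: a DCT along the time axis of the motion-vector magnitudes separates low- and high-frequency content, from which one array sorted in time and one sorted in space are formed. The patent does not state the exact ordering rule, so simple descending sorts are assumed here.

```python
import numpy as np
from scipy.fft import dct

def dct_time_space_arrays(flows):
    """flows: sequence of (H, W, 2) motion-vector fields from consecutive frames."""
    mags = np.stack([np.linalg.norm(f, axis=-1) for f in flows])     # (T, H, W)
    coeffs = dct(mags, axis=0, norm="ortho")                         # frequency content along time
    per_freq_energy = np.abs(coeffs).mean(axis=(1, 2))               # one value per temporal frequency
    per_pixel_energy = np.abs(coeffs).mean(axis=0).ravel()           # one value per spatial position
    time_sorted = np.sort(per_freq_energy)[::-1]                     # "time sorting array"
    space_sorted = np.sort(per_pixel_energy)[::-1]                   # "space sorting array"
    return time_sorted, space_sorted
```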
step S104, inputting the time sequencing array and the space sequencing array into an Explainable AI model to obtain an emotion recognition classification result, and obtaining a value of a health state label according to the emotion recognition classification result, so that the current emotion state of a tested person (namely, a juvenile student) can be accurately recognized, and parents or teachers can be rapidly helped to find students with potential psychological risks through the value of the health state label. The emotion recognition classification result is a multi-dimensional classification result, and may be one or more of aggressive classification result, stress classification result, anxiety classification result, suspicious classification result, balanced classification result, confidence classification result, energy classification result, self-regulation classification result, inhibitory classification result, neural classification result, depression classification result, and happy classification result, in other embodiments, the emotion recognition classification result may be more, and is not specifically limited herein.
Through the above steps S101 to S104, in recognizing emotional states within groups of juvenile students, this embodiment first obtains the face video, then calculates the motion vectors between frames in the face video, performs DCT coefficient conversion processing on the motion vectors to obtain a time sorting array and a space sorting array, inputs the time sorting array and the space sorting array into an Explainable AI model to obtain emotion recognition classification results, and obtains the value of the health status label according to the emotion recognition classification results. This solves the problem in the related art that students with potential psychological risks cannot be effectively discovered: the current psychological state of a person (i.e., a juvenile student) is evaluated comprehensively from classification results in multiple dimensions, so the current emotional state of the tested person can be recognized accurately, and the value of the health status label quickly helps parents or teachers discover students with potential psychological risks. In addition, compared with the traditional unsupervised K-Means clustering algorithm, which uses a nonlinear calculation, the Explainable AI model of this embodiment uses a linear calculation, which not only reduces computational complexity but also shortens response time (from 3 s to 0.02 s); moreover, the emotion recognition classification results can be plotted as feature value vectors to show each parameter's weight, improving the interpretability of the results.
In some embodiments, the Explainable AI model is a generalized additive model, wherein the generalized additive model is obtained, on the basis of an EBM model (Explainable Boosting Machine), by ensemble learning of each feature function with the Bagging algorithm (bootstrap aggregation) and the Boosting algorithm, taking each feature function at its optimum; in other words, the Explainable AI model is an EBM composed of several classifiers assembled by Bagging and then boosted by Boosting. The emotion recognition classification results at least include an aggressiveness classification result, a stress classification result, an anxiety classification result, a suspicion classification result, a balance classification result, a confidence classification result, a vitality classification result, a self-regulation classification result, an inhibition classification result, a neuroticism classification result, a depression classification result, and a happiness classification result. In this embodiment, the current psychological state of a person (i.e., a juvenile student) is evaluated in all respects through classification results in twelve dimensions, so the current emotional state of the tested person can be recognized accurately, and the value of the health status label quickly helps parents or teachers discover students with potential psychological risks. Since the working principles of the EBM model, the Bagging algorithm, and the Boosting algorithm are known to those skilled in the art, a detailed description is omitted here.
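For readers who wish to reproduce such a glass-box classifier, the following sketch assumes the open-source interpret package, whose ExplainableBoostingClassifier implements this kind of bagged-and-boosted generalized additive model. The feature matrix, label coding, and parameter choices are illustrative placeholders, not values taken from the patent.

```python
import numpy as np
from interpret.glassbox import ExplainableBoostingClassifier

# X: one row per subject, columns = concatenated time-sorting and space-sorting arrays
# y: emotion class index over the twelve dimensions (aggressiveness ... happiness)
X = np.random.rand(200, 64)                 # placeholder data, for illustration only
y = np.random.randint(0, 12, size=200)

ebm = ExplainableBoostingClassifier(interactions=0)   # purely additive feature functions
ebm.fit(X, y)

# The learned shape function of every feature can be inspected, which is what makes
# the per-parameter weights plottable and the classification result interpretable.
global_explanation = ebm.explain_global()
```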
In some of these embodiments, the Explainable AI model is trained as follows:
firstly, constructing a cross entropy function of an Explainable AI model, and calculating the value of the cross entropy function;
then, when the calculation result of the cross entropy function is at its minimum, obtaining the weight matrix of each classifier together with a first group of critical value matrixes and a second group of critical value matrixes; the first group and second group of critical value matrixes are matrices of partition points and can be pictured as a series of bins: data falls into each data interval, and the partition points between the intervals form the critical value matrixes; in this embodiment, the weight matrices are w1, w2, w3, w4, w5, w6, w7, w8, w9, w10, w11, w12, the first group of critical value matrixes are a1, a2, a3, a4, a5, a6, a7, a8, a9, a10, a11, a12, and the second group of critical value matrixes are b1, b2, b3, b4, b5, b6, b7, b8, b9, b10, b11, b12;
it should be noted that, integrated learning is performed on the basis of twelve emotional and physiological parameters (i.e. classification results) of aggressiveness, stress, anxiety, doubtful, balance, confidence, vitality, self-regulation, inhibition, nervousness, depression and happiness, a plurality of different trees are formed firstly, training is performed on a training set, each tree learns N rounds on a single feature with the minimum cross entropy as a target, and in the specific training process, if a certain sample point is accurately classified, the weight of the tree is reduced in the construction of the next training set; conversely, if a sample point is not classified accurately, its weight is increased. The sample set with updated weights is then used to train the next classifier, and the entire training process proceeds iteratively. It is easy to understand that each of the most basic classifiers is a tree, which is a data structure in a computer, such as a binary tree, and the explainable ai model is a forest formed by combining many trees after continuously training parameters.
Secondly, inputting the training set into an Explainable AI model to obtain a first vector;
then, comparing the first vector with a first set of critical value matrixes;
then, multiplying the comparison result by a second group of critical value matrixes, and taking the maximum value of the multiplication result as a feature vector;
and finally, taking the sum of a first result and a second result as the value of the health status label, wherein the first result is obtained by multiplying the feature vector by the weight matrix, the second result is obtained by multiplying a constant weight obtained by fitting by a first constant, and the first constant is the sum of the constant c and the regularization coefficient L that constrains overfitting, where the constant c is the average difference between the observed values and the estimated values of the sample set.
Specifically, the training step of the Explainable AI model can be represented by the following formula:
y = w1·x1 + w2·x2 + w3·x3 + w4·x4 + w5·x5 + w6·x6 + w7·x7 + w8·x8 + w9·x9 + w10·x10 + w11·x11 + w12·x12 + w0·(c + L)
wherein y is the value of the health status label, w0 is a constant weight calculated by fitting, c is the average difference between the observed values and the estimated values of the sample set, L is a regularization coefficient for constraining overfitting, w1 through w12 are the weight matrices, and x1 through x12 are the feature vectors.
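The scoring just described can be written as a short numpy sketch, under the assumption that "compare" means an element-wise greater-than test; the array shapes and the comparison rule are assumptions, since the text only states compare, multiply, and take the maximum per classifier.

```python
import numpy as np

def health_label_value(first_vec, a, b, w, w0, c, L):
    """
    first_vec : (12, n) first vector from the model, one row per classifier
    a, b      : (12, n) first and second groups of critical value matrices
    w         : (12,)   weight matrix w1..w12
    w0, c, L  : scalars  fitted constant weight, mean residual c, regularization term L
    """
    comparison = (first_vec > a).astype(float)   # compare with the first group of thresholds
    x = (comparison * b).max(axis=1)             # max of the product -> feature vector x1..x12
    return float(x @ w + w0 * (c + L))           # y = w1*x1 + ... + w12*x12 + w0*(c + L)
```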
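The iterative per-tree training with sample reweighting noted earlier can likewise be sketched, assuming scikit-learn decision trees and a standard AdaBoost-style exponential update; the exact update rule is not given in the text.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost_trees(X, y, n_rounds=10):
    n = len(y)
    weights = np.full(n, 1.0 / n)
    trees = []
    for _ in range(n_rounds):
        tree = DecisionTreeClassifier(max_depth=3)
        tree.fit(X, y, sample_weight=weights)
        wrong = tree.predict(X) != y
        err = max(weights[wrong].sum(), 1e-12)
        alpha = 0.5 * np.log((1.0 - err) / err)
        # misclassified samples gain weight, correctly classified samples lose weight
        weights *= np.exp(np.where(wrong, alpha, -alpha))
        weights /= weights.sum()
        trees.append((alpha, tree))
    return trees
```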
In some of these embodiments, the cross entropy function is calculated as follows:
H = -Σ_i yi·log(pi)
wherein yi is a real label value, i is a sample, and pi is the probability distribution of the sample;
in addition, the calculation formula of the regularization coefficient L of the constraint overfitting is as follows:
L = (1/N)·Σ_i (yi - W^T·xi)^2 + λ·||W||^2
wherein L is a regularization coefficient of constrained overfitting, N represents the total number of samples, y is the value of the health state label, x is a training sample, W is the weight matrix, T is the matrix transposition symbol, and λ is the regularization coefficient; the larger the value of λ, the stronger the penalty.
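As a sketch of these two training-time quantities, the following assumes the standard forms implied by the surrounding variable definitions (the original formula images are not reproduced in the text):

```python
import numpy as np

def cross_entropy(y_true, p_pred, eps=1e-12):
    """H = -sum_i yi * log(pi) over the samples."""
    return float(-np.sum(y_true * np.log(p_pred + eps)))

def regularization_term(y, X, W, lam):
    """L = (1/N) * sum_i (yi - W^T xi)^2 + lam * ||W||^2; larger lam -> stronger penalty."""
    N = len(y)
    residual = y - X @ W
    return float(residual @ residual / N + lam * (W @ W))
```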
In some embodiments, after obtaining the value of the health status label according to the emotion recognition classification result, the method further comprises the following steps:
and judging whether the value of the health status label is smaller than a preset value: if so, the emotion of the student in the face video is unhealthy, and if not, the emotion of the student in the face video is healthy. The preset value in this embodiment is 0, i.e., when the value of the health status label is less than 0, the student in the face video is emotionally unhealthy (a person who needs attention); otherwise the student is emotionally healthy. In this way parents or teachers can be helped to find students with potential psychological risks and then give them corresponding help in time.
In some embodiments, after inputting the time-ordered array and the space-ordered array into the Explainable AI model to obtain the emotion recognition classification results and obtaining the value of the health status label according to the emotion recognition classification results, the method further includes the following step:
comparing the value of the health status label with psychological scales, and calculating the coincidence rate. The psychological scales in this embodiment are the SAS (Self-Rating Anxiety Scale) and the SDS (Self-Rating Depression Scale), which measure anxiety and depression respectively. As is known to those skilled in the art, the SAS is a psychological scale for measuring the degree of anxiety and how it changes during treatment; it is mainly used for efficacy assessment, not for diagnosis. Likewise, the SDS is a depression self-rating scale widely used for rough screening, emotional-state assessment, surveys, and research in outpatient settings, and likewise cannot be used for diagnosis; neither is described further here.
In this embodiment, the coincidence rate is calculated as follows: coincidence rate = (number of people detected as anxious by the SAS or depressed by the SDS who are also determined by the model to be persons needing attention) / (total number of people tested);
the calculation formula of the accuracy is as follows:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
wherein Accuracy is the accuracy; TP (true positive) is a correct positive, i.e., an instance of the positive class that is also judged as positive; FN (false negative) is a missed detection, i.e., a positive instance judged as negative; FP (false positive) is a false alarm, i.e., a negative instance judged as positive; and TN (true negative) is a correct negative, i.e., a negative instance that is also judged as negative. In this embodiment, the positive class is healthy and the negative class is unhealthy (i.e., needing attention).
In addition, the recall ratio is calculated as follows:
Recall = TP / (TP + FN)
where Recall is the recall rate.
In an alternative embodiment, a middle school performed a batch test on students judged jointly by the class teacher and a physician to have certain behavioral or psychological difficulties (hereinafter, problem students). With 68 problem students and 32 ordinary students selected by the teacher, i.e., 100 students in total, as the test sample, steps S101 to S104 above gave: TP = 28, TN = 66, FN = 4, FP = 2. Accordingly, Accuracy = (28 + 66) / 100 = 0.94; Recall = 28 / (28 + 4) = 0.875; and the coincidence rate = (number of people detected as anxious or depressed by the scales and determined by the model to need attention) / (total number of people) = 85 / 100 = 85%. In summary, compared with the traditional unsupervised K-Means clustering algorithm, whose accuracy is 75% and recall is 30% and which cannot effectively discover students with potential psychological risks, the method of this embodiment achieves an accuracy of 94% (19 percentage points higher), a recall of about 88% (58 percentage points higher), and an 85% coincidence rate against the psychological scales, greatly improving the accuracy and stability of the test.
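As a quick check of the worked example, the metric definitions above applied to the reported confusion counts reproduce the stated figures:

```python
def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)

def recall(tp, fn):
    return tp / (tp + fn)

tp, tn, fn, fp = 28, 66, 4, 2
print(accuracy(tp, tn, fp, fn))   # 0.94
print(recall(tp, fn))             # 0.875
print(85 / 100)                   # coincidence rate against the SAS/SDS scales: 0.85
```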
In order to improve encoding efficiency, in one embodiment the face video is a video encoded with MPEG, and the face video includes footage of the face and upper body for a preset duration. Because the original face video is in AVI format with a resolution of 1280×720 and a frame rate of 30 fps, the MPEG-encoded face video of this embodiment improves coding efficiency and benefits the subsequent model. It should be noted that the preset duration may be set to 1 minute or another value according to the user's needs, which is not described in detail here.
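A minimal sketch of this re-encoding step, assuming the ffmpeg command-line tool is available; the codec choice (mpeg2video), quality setting, and file names are assumptions, since the embodiment only states that the video is MPEG-encoded:

```python
import subprocess

def reencode_to_mpeg(src="face_capture.avi", dst="face_capture.mpg"):
    subprocess.run(
        ["ffmpeg", "-y", "-i", src,
         "-c:v", "mpeg2video", "-q:v", "2",   # MPEG video at near-lossless quality
         "-r", "30", "-s", "1280x720",        # keep the original frame rate and resolution
         dst],
        check=True)
```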
It should be noted that the steps illustrated in the above-described flow diagrams or in the flow diagrams of the figures may be performed in a computer system, such as a set of computer-executable instructions, and that, although a logical order is illustrated in the flow diagrams, in some cases, the steps illustrated or described may be performed in an order different than here.
The present embodiment further provides an emotion recognition system, which is used to implement the above embodiments and preferred embodiments, and the description of the emotion recognition system is omitted. As used hereinafter, the terms "module," "unit," "subunit," and the like may implement a combination of software and/or hardware for a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Fig. 2 is a block diagram of a structure of an emotion recognition system according to an embodiment of the present application, as shown in fig. 2, the system including:
the acquisition module 21 is configured to acquire a face video, where the face video is composed of a plurality of vibration images;
the calculating module 22 is used for calculating motion vectors among frames in the face video;
the conversion module 23 is configured to perform DCT coefficient conversion processing on the motion vector to obtain a time ordering array and a space ordering array;
and the classification module 24 is used for inputting the time sequencing array and the space sequencing array into an Explainable AI model to obtain an emotion recognition classification result, and obtaining the value of the health status label according to the emotion recognition classification result. In recognizing emotional states within groups of juvenile students, the system first obtains a face video, then calculates the motion vectors between frames in the face video, performs DCT (discrete cosine transform) coefficient conversion processing on the motion vectors to obtain a time sequencing array and a space sequencing array, inputs the two arrays into an Explainable AI (artificial intelligence) model to obtain emotion recognition classification results, and obtains the value of the health status label according to the emotion recognition classification results. This solves the problem in the related art that students with potential psychological risks cannot be effectively discovered: the current emotional state of the tested person (i.e., a juvenile student) can be recognized accurately, and the value of the health status label quickly helps parents or teachers discover students with potential psychological risks. In addition, compared with the traditional unsupervised K-Means clustering algorithm, which uses a nonlinear calculation, the Explainable AI model of this embodiment uses a linear calculation, which not only reduces computational complexity but also shortens response time (from 3 s to 0.02 s); moreover, the emotion recognition classification results can be plotted as feature value vectors to show each parameter's weight, improving the interpretability of the results.
The above modules may be functional modules or program modules, and may be implemented by software or hardware. For a module implemented by hardware, the above modules may be located in the same processor; or the modules may be located in different processors in any combination.
The present embodiment also provides an electronic device comprising a memory having a computer program stored therein and a processor configured to execute the computer program to perform the steps of any of the above method embodiments.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the transmission device is connected to the processor, and the input/output device is connected to the processor.
Optionally, in this embodiment, the processor may be configured to execute the following steps by a computer program:
step S101, obtaining a face video, wherein the face video is composed of a plurality of vibration images;
step S102, calculating motion vectors among frames in the face video;
step S103, performing DCT coefficient conversion processing on the motion vector to obtain a time sorting array and a space sorting array;
and step S104, inputting the time sequencing array and the space sequencing array into an Explainable AI model to obtain an emotion recognition classification result, and obtaining a value of the health state label according to the emotion recognition classification result.
It should be noted that, for specific examples in this embodiment, reference may be made to examples described in the foregoing embodiments and optional implementations, and details of this embodiment are not described herein again.
In addition, in combination with the emotion recognition method in the above embodiments, the embodiments of the present application may be implemented by providing a storage medium. The storage medium having stored thereon a computer program; the computer program, when executed by a processor, implements any of the emotion recognition methods in the above embodiments.
In one embodiment, a computer device is provided, which may be a terminal. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operating system and the computer program to run on the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method of emotion recognition. The display screen of the computer equipment can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, a key, a track ball or a touch pad arranged on the shell of the computer equipment, an external keyboard, a touch pad or a mouse and the like.
In one embodiment, fig. 3 is a schematic diagram of an internal structure of an electronic device according to an embodiment of the present application, and as shown in fig. 3, there is provided an electronic device, which may be a server, and its internal structure diagram may be as shown in fig. 3. The electronic device comprises a processor, a network interface, an internal memory and a non-volatile memory connected by an internal bus, wherein the non-volatile memory stores an operating system, a computer program and a database. The processor is used for providing calculation and control capabilities, the network interface is used for communicating with an external terminal through a network connection, the internal memory is used for providing an environment for the operation of an operating system and a computer program, the computer program is executed by the processor to realize an emotion recognition method, and the database is used for storing data.
It will be understood by those skilled in the art that the structure shown in fig. 3 is a block diagram of only a portion of the structure associated with the present application, and does not constitute a limitation on the electronic device to which the present application applies, and that a particular electronic device may include more or fewer components than shown in the drawings, or may combine certain components, or have a different arrangement of components.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above may be implemented by hardware instructions of a computer program, which may be stored in a non-volatile computer-readable storage medium, and when executed, the computer program may include the processes of the embodiments of the methods described above. Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory, among others. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDRSDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), direct bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM).
It should be understood by those skilled in the art that various features of the above-described embodiments can be combined in any combination, and for the sake of brevity, all possible combinations of features in the above-described embodiments are not described in detail, but rather, all combinations of features which are not inconsistent with each other should be construed as being within the scope of the present disclosure.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (10)

1. A method of emotion recognition, the method comprising:
acquiring a face video, wherein the face video consists of a plurality of vibration images;
calculating a motion vector between frames in the face video;
performing DCT coefficient conversion processing on the motion vector to obtain a time sorting array and a space sorting array;
and inputting the time sequencing array and the space sequencing array into an Explainable AI model to obtain an emotion recognition classification result, and obtaining a value of the health state label according to the emotion recognition classification result.
2. The method according to claim 1, wherein the Explainable AI model is a generalized additive model, wherein the generalized additive model is obtained, on the basis of an EBM model, by ensemble learning of each feature function with the Bagging algorithm and the Boosting algorithm, taking each feature function at its optimum; each classification result in the emotion recognition classification results is obtained through one feature function, the feature functions corresponding one to one to the emotion recognition classification results, and the emotion recognition classification results at least include an aggressiveness classification result, a stress classification result, an anxiety classification result, a suspicion classification result, a balance classification result, a confidence classification result, a vitality classification result, a self-regulation classification result, an inhibition classification result, a neuroticism classification result, a depression classification result, and a happiness classification result.
3. The method of claim 1, wherein the Explainable AI model is trained by:
constructing a cross entropy function of the Explainable AI model, and calculating the value of the cross entropy function;
when the calculation result of the cross entropy function is minimum, obtaining a weight matrix of each classifier and a first group of critical value matrixes and a second group of critical value matrixes;
inputting a training set into the Explainable AI model to obtain a first vector;
comparing the first vector to the first set of threshold matrices;
multiplying the comparison result by the second group of critical value matrixes, and taking the maximum value of the multiplication result as a feature vector;
and taking the sum of a first result and a second result as the value of the health state label, wherein the first result is obtained by multiplying the feature vector by the weight matrix, the second result is obtained by multiplying a constant weight obtained by fitting by a first constant, and the first constant is the sum of a constant c and a regularization coefficient L that constrains overfitting.
4. The method of claim 3, wherein the cross entropy function is calculated as follows:
H = -Σ_i yi·log(pi)
wherein yi is a real label value, i is a sample, and pi is the probability distribution of the sample;
the calculation formula of the regularization coefficient L of the constraint overfitting is as follows:
L = (1/N)·Σ_i (yi - W^T·xi)^2 + λ·||W||^2
wherein, L is a regularization coefficient of constrained overfitting, N represents the total number of samples, y is a value of a health state label, x is a training sample, W is a weight matrix, and T is a matrix transposition symbol.
5. The method of claim 1, wherein after obtaining the value of the health status label according to the emotion recognition classification result, the method further comprises:
and judging whether the value of the health state label is smaller than a preset value, if so, indicating that the emotion of the student in the face video is unhealthy, and if not, indicating that the emotion of the student in the face video is healthy.
6. The method of claim 1, wherein after inputting the time-ordered array and the space-ordered array into the Explainable AI model to obtain an emotion recognition classification result and obtaining the value of the health status label according to the emotion recognition classification result, the method further comprises:
comparing the value of the health status label with psychological scales, and calculating the coincidence rate.
7. The method of claim 1, wherein the face video is a video encoded with MPEG, and the face video comprises footage of the face and upper body for a preset duration.
8. An emotion recognition system, characterized in that the system comprises:
the system comprises an acquisition module, a processing module and a display module, wherein the acquisition module is used for acquiring a face video, and the face video consists of a plurality of vibration images;
the computing module is used for computing the motion vector between frames in the face video;
the conversion module is used for carrying out DCT coefficient conversion processing on the motion vector to obtain a time sorting array and a space sorting array;
and the classification module is used for inputting the time sequencing array and the space sequencing array into an Explainable AI model to obtain an emotion recognition classification result, and obtaining a value of the health state label according to the emotion recognition classification result.
9. An electronic device comprising a memory and a processor, wherein the memory has stored therein a computer program, and wherein the processor is arranged to run the computer program to perform the emotion recognition method as claimed in any of claims 1 to 7.
10. A storage medium having stored thereon a computer program, wherein the computer program is arranged to, when run, perform the emotion recognition method of any of claims 1 to 7.
CN202210621369.9A 2022-06-02 2022-06-02 Emotion recognition method, system, electronic device and storage medium Active CN114694234B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210621369.9A CN114694234B (en) 2022-06-02 2022-06-02 Emotion recognition method, system, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210621369.9A CN114694234B (en) 2022-06-02 2022-06-02 Emotion recognition method, system, electronic device and storage medium

Publications (2)

Publication Number Publication Date
CN114694234A true CN114694234A (en) 2022-07-01
CN114694234B CN114694234B (en) 2023-02-03

Family

ID=82131009

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210621369.9A Active CN114694234B (en) 2022-06-02 2022-06-02 Emotion recognition method, system, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN114694234B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117612720A (en) * 2023-11-28 2024-02-27 郑州师范学院 Psychological test method, psychological test system and psychological test storage medium based on artificial intelligence model

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109190487A (en) * 2018-08-07 2019-01-11 平安科技(深圳)有限公司 Face Emotion identification method, apparatus, computer equipment and storage medium
CN110020582A (en) * 2018-12-10 2019-07-16 平安科技(深圳)有限公司 Face Emotion identification method, apparatus, equipment and medium based on deep learning
WO2020087494A1 (en) * 2018-11-02 2020-05-07 金湘范 Video collection emotion generation method
CN113057633A (en) * 2021-03-26 2021-07-02 华南理工大学 Multi-modal emotional stress recognition method and device, computer equipment and storage medium
CN113435330A (en) * 2021-06-28 2021-09-24 平安科技(深圳)有限公司 Micro-expression identification method, device, equipment and storage medium based on video
CN113647950A (en) * 2021-08-23 2021-11-16 北京图安世纪科技股份有限公司 Psychological emotion detection method and system
CN113822164A (en) * 2021-08-25 2021-12-21 深圳市安视宝科技有限公司 Dynamic emotion recognition method and device, computer equipment and storage medium
CN113868437A (en) * 2021-10-09 2021-12-31 中国人民大学 Interpretable emotion logic identification method, system and medium based on knowledge graph
CN114005468A (en) * 2021-09-07 2022-02-01 华院计算技术(上海)股份有限公司 Interpretable emotion recognition method and system based on global working space
CN114424940A (en) * 2022-01-27 2022-05-03 山东师范大学 Emotion recognition method and system based on multi-mode spatiotemporal feature fusion
US20220160245A1 (en) * 2020-11-26 2022-05-26 Industry- Academic Cooperation Foundation, Chosun University System and method of determining disease based on heat map image explainable from electrocardiogram signal

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109190487A (en) * 2018-08-07 2019-01-11 平安科技(深圳)有限公司 Face Emotion identification method, apparatus, computer equipment and storage medium
WO2020087494A1 (en) * 2018-11-02 2020-05-07 金湘范 Video collection emotion generation method
CN110020582A (en) * 2018-12-10 2019-07-16 平安科技(深圳)有限公司 Face Emotion identification method, apparatus, equipment and medium based on deep learning
US20220160245A1 (en) * 2020-11-26 2022-05-26 Industry- Academic Cooperation Foundation, Chosun University System and method of determining disease based on heat map image explainable from electrocardiogram signal
CN113057633A (en) * 2021-03-26 2021-07-02 华南理工大学 Multi-modal emotional stress recognition method and device, computer equipment and storage medium
CN113435330A (en) * 2021-06-28 2021-09-24 平安科技(深圳)有限公司 Micro-expression identification method, device, equipment and storage medium based on video
CN113647950A (en) * 2021-08-23 2021-11-16 北京图安世纪科技股份有限公司 Psychological emotion detection method and system
CN113822164A (en) * 2021-08-25 2021-12-21 深圳市安视宝科技有限公司 Dynamic emotion recognition method and device, computer equipment and storage medium
CN114005468A (en) * 2021-09-07 2022-02-01 华院计算技术(上海)股份有限公司 Interpretable emotion recognition method and system based on global working space
CN113868437A (en) * 2021-10-09 2021-12-31 中国人民大学 Interpretable emotion logic identification method, system and medium based on knowledge graph
CN114424940A (en) * 2022-01-27 2022-05-03 山东师范大学 Emotion recognition method and system based on multi-mode spatiotemporal feature fusion

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WEI SHIUNG LIEW et al.: "Emotion Recognition Using Explainable Genetically Optimized Fuzzy ART Ensembles", IEEE ACCESS
林志萍 et al.: "An Improved Machine Learning Interpretability Method Based on LIME" (基于LIME的改进机器学习可解释性方法), Data Mining (数据挖掘)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117612720A (en) * 2023-11-28 2024-02-27 郑州师范学院 Psychological test method, psychological test system and psychological test storage medium based on artificial intelligence model
CN117612720B (en) * 2023-11-28 2024-05-14 郑州师范学院 Psychological test method, psychological test system and psychological test storage medium based on artificial intelligence model

Also Published As

Publication number Publication date
CN114694234B (en) 2023-02-03

Similar Documents

Publication Publication Date Title
Jin et al. Internal feature selection method of CSP based on L1-norm and Dempster–Shafer theory
Gumaei et al. A hybrid deep learning model for human activity recognition using multimodal body sensing data
Qin et al. Imaging and fusing time series for wearable sensor-based human activity recognition
Amin et al. Attention-inception and long-short-term memory-based electroencephalography classification for motor imagery tasks in rehabilitation
Gkikas et al. Automatic assessment of pain based on deep learning methods: A systematic review
Tavakolian et al. Deep binary representation of facial expressions: A novel framework for automatic pain intensity recognition
Adam et al. Designing an Artificial Neural Network model for the prediction of kidney problems symptom through patient's metal behavior for pre-clinical medical diagnostic
CN115130656A (en) Training method, device and equipment of anomaly detection model and storage medium
CN114694234B (en) Emotion recognition method, system, electronic device and storage medium
Jiang et al. A resilient and hierarchical IoT-based solution for stress monitoring in everyday settings
Zhang et al. Self-supervised time series representation learning via cross reconstruction transformer
Li et al. Multistep deep system for multimodal emotion detection with invalid data in the internet of things
Soni et al. Electroencephalography signals-based sparse networks integration using a fuzzy ensemble technique for depression detection
Wei et al. A multi-source transfer joint matching method for inter-subject motor imagery decoding
CN114469141A (en) System and method for decoding chord information from brain activity
CN111063438B (en) Sleep quality evaluation system and method based on infrared image sequence
Li et al. Bone disease prediction and phenotype discovery using feature representation over electronic health records
Kim et al. Deep learning-based arrhythmia detection using rr-interval framed electrocardiograms
Lee et al. Online learning for classification of Alzheimer disease based on cortical thickness and hippocampal shape analysis
Villmann Neural networks approaches in medicine-a review of actual developments.
Badie et al. An efficient approach to mental sentiment classification with EEG-based signals using LSTM neural network
Luckett Nonlinear methods for detection and prediction of epileptic seizures
Vázquez et al. Morphological hetero-associative memories applied to restore true-color patterns
Albuquerque et al. Authentication based on electrocardiography signals and machine learning
CN118141377B (en) Negative emotion monitoring system and method for patient

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant