CN114581991A - Behavior attitude identification method based on dynamic perception of facial expressions - Google Patents

Behavior attitude identification method based on dynamic perception of facial expressions

Info

Publication number
CN114581991A
Authority
CN
China
Prior art keywords
face
displacement
facial
time
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210219980.9A
Other languages
Chinese (zh)
Inventor
余伟
余放
李宇轩
李石君
杨弋
卢可
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan Hangjun Technology Co ltd
Wuhan University WHU
Original Assignee
Wuhan Hangjun Technology Co ltd
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan Hangjun Technology Co ltd, Wuhan University WHU filed Critical Wuhan Hangjun Technology Co ltd
Priority to CN202210219980.9A
Publication of CN114581991A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G06F18/241 - Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a behavior attitude identification method based on dynamic perception of facial expressions, comprising the following steps. S1: facial image preprocessing, in which the face is detected and located in the expression image, cropped once its exact position is found so that other interfering information is thoroughly removed, and highlighted through image enhancement. S2: establishing a facial time-series feature evolution model. S3: judging the answering attitude according to the dynamic time-series features of the face. The invention analyzes facial expressions with artificial intelligence technology to recognize behavioral attitude, can effectively understand a user's real inner feelings, and can be widely applied to fields such as marital relationship prediction, communication and negotiation, and teaching evaluation. By analyzing a user's expressions, the user's real intention can be discovered, illegal behavior by dangerous individuals can be stopped in time, and whether a prisoner is lying or has violent tendencies can be well predicted, thereby protecting the long-term security of the country.

Description

Behavior attitude identification method based on facial expression dynamic perception
Technical Field
The invention relates to the technical field of artificial intelligence, in particular to a behavior attitude identification method based on dynamic perception of facial expressions.
Background
Facial expression recognition identifies human facial expressions such as surprise, sadness, happiness, and anger, and has a wide range of potential applications. Expression is the intuitive reflection of human emotion, and expression recognition has long been an important research topic in computer vision. Over the past few decades, researchers have achieved significant results on various expression recognition problems. One approach constructs computer vision features, seeking an efficient representation to describe expressions before performing model learning; another searches for a learning algorithm suited to the characteristics of expressions for building the model. In recent years, studies have encoded the six expressions proposed by the Facial Action Coding System (FACS), namely anger, disgust, fear, happiness, sadness, and surprise, and recognized the expression of a human face in a picture or a video. The main existing technologies are as follows:
1. LBP-TOP-based identification method
A 68-point active shape model is used to locate key points of the face. Based on the obtained key points, a locally weighted average algorithm computes the deformation between the face image in the first frame of each sequence and a model face image, and this deformation is then applied to every frame of the corresponding sequence. This eliminates, to some extent, the differences between different faces and different sequences in the expressionless state.
2. Identification method based on STCLQP
The completed local quantization pattern is an improvement on LBP. Unlike LBP, which encodes only the relationship between the gray values of local pixels, the completed local quantization pattern decomposes the local co-occurrence pattern of the central pixel and its surrounding pixels into sign and magnitude components, adds the gradient information of the central pixel, and encodes each component separately as binary numbers. When constructing the statistical histogram, in order to reduce the feature dimension, it does not count all possible binary codes but only the most frequently occurring binary patterns, and introduces vector quantization, which allows the number of centers (the number of words in the codebook) to be specified during quantization and yields a histogram of specified dimension as the feature.
3. LBP-SIP-based identification method
Unlike work that improves on LBP-TOP, the local binary pattern with six intersection points extends the LBP feature from another perspective for micro-expression recognition. LBP-SIP uses four points on the same plane as the central point as the spatial texture description and the central points of the preceding and following frames as the temporal texture description, which reduces the feature dimension and improves the efficiency of feature extraction.
4. Delaunay time domain coding-based identification method
The temporal coding model based on Delaunay triangulation uses an active appearance model to calibrate the face image sequence. Because the variation range of an expression is very small, expression change cannot be well described using key points alone, so the sequence images are normalized by the feature points to obtain a face image sequence with fixed feature point positions. Delaunay triangulation divides the face into a series of triangular regions based on the given feature points. Since the feature points have been normalized, every triangular region has the same size and shape, with the same number of pixels. By comparing how the same region changes over time, the dynamic process of the expression can be described.
However, these methods are susceptible to illumination changes and similar factors and lack robustness in real scenes. Moreover, these feature extraction methods depend mainly on the designers' prior knowledge and require manual parameter tuning, so only a small number of parameters can appear in the feature design, which may greatly reduce the recognition rate.
To solve the problems set forth above, we propose a behavior attitude identification method based on dynamic perception of facial expressions.
Disclosure of Invention
The invention aims to provide a behavior attitude identification method based on dynamic perception of facial expressions, so as to solve the problems raised in the background art. The invention judges the answering attitude of an online respondent from facial displacement features, and thereby indirectly judges the individual's credibility. Online answering is a dynamic process, so facial time-series data collected during the respondent's answering must be converted into a key-point displacement time series. This time series is perturbed through Markov theory to generate a random evolution sequence of facial key-point displacements; the evolved sequence differs only slightly from the original sequence yet forms a random disturbance of the original data over a longer time span. Using the evolution sequence as a contrastive term, a splitting loss function is constructed; this loss function gives the trained model stronger generalization ability. During training, the splitting loss function makes the facial key-point displacement time series of the same attitude as similar as possible and separates the corresponding sequences of different attitudes as much as possible. The encoder trained with the splitting loss function, combined with a logistic function, provides the ability to judge the respondent's answering attitude.
In order to achieve the above object, the invention provides the following technical solution. The behavior attitude identification method based on dynamic perception of facial expressions comprises the following steps:
S1, facial image preprocessing: the face is detected and located in the expression image, cropped once its exact position in the image is found so that other interfering information is thoroughly removed, and highlighted through image enhancement;
S2, establishing a facial time-series feature evolution model;
S3, judging the answering attitude according to the dynamic time-series features of the face.
Preferably, in step S1, the image enhancement stretches the gray-level details of the target object as required by the user through a piecewise linear transformation: the whole gray interval 0-255 is divided into several segments as needed, and a corresponding linear transformation is applied to each segment; taking a three-segment division with breakpoints (a, c) and (b, d) as an example, the linear transformation formula is as follows:
g(x, y) = (c/a)·f(x, y), for 0 ≤ f(x, y) < a;
g(x, y) = ((d - c)/(b - a))·(f(x, y) - a) + c, for a ≤ f(x, y) < b;
g(x, y) = ((255 - d)/(255 - b))·(f(x, y) - b) + d, for b ≤ f(x, y) ≤ 255.
Preferably, in step S1, the facial positioning of the expression image uses a gray-level integral projection method, on the basis of the piecewise linear transformation of the image, to locate the face region, including a horizontal integral projection and a vertical integral projection;
let the size of the gray image be M × N, and let f(x, y) be the gray value at point (x, y) of the image;
its vertical integral projection is then:
V(x) = Σ_{y=1..N} f(x, y)
the horizontal integral projection is:
H(y) = Σ_{x=1..M} f(x, y)
the image is then scale-normalized to a uniform size: it is scaled proportionally, and the boundary is cropped and size-normalized to obtain the facial expression image.
Preferably, in step S2, the establishing a face temporal feature evolution model includes the following:
S20, facial time-series data: the facial time-series data of a respondent is the record of how the pixels in the face region change over time within the respondent's answering time for a unit question amount;
S21, facial time-series features: the facial time-series data is dynamic data and contains the dynamic features of the face;
S22, facial key point coordinates and displacements: a facial key point is a set of key pixels whose relative positions on the features can be calibrated; the key points can be calibrated manually or assigned by machine-learned weights;
S23, facial key point displacement time series: the facial key point displacement time series is the sequence of coordinate displacements of a person's facial key point over time within the answering time of a unit question amount, recorded as a discrete vector; the facial key point displacement time series is defined by the following formula:
x(0) = [Δx(t_1), Δx(t_2), ···, Δx(t_n)]   (4);
wherein x(0) represents the original displacement vector sequence of the same key-point pixel at n moments, and the element Δx(t_i) indicates the displacement direction of the pixel at moment t_i;
S24, random evolution of facial key point displacements: the respondents' attitudes toward the test are divided into positive and negative; the individual deviation is used to apply a disturbance to x(0), so that the evolved displacement sequence differs somewhat from the original sequence and the influence extends over a period of time;
S25, initial distribution of facial key point displacement: the initial distribution of facial key point displacement is defined as follows:
φ(t_0) = [φ_1, φ_2, ···, φ_N]   (5);
wherein the vector φ(t_0) represents the initial distribution of the facial key point displacement, recording the probability that the facial key point is at each mark position at the initial time t_0, and the component φ_i represents the probability that the facial key point is at the i-th mark position;
the evolution matrix of the facial key point displacement is:
S = [S_ij] (an N × N matrix);
wherein S represents the displacement evolution matrix of the facial key point; the element S_ij in row i and column j is a random disturbance probability, representing the probability that the facial key point displacement moves from key position i to key position j over one unit moment, and it can be obtained by counting the frequency with which the facial key point falls on each new mark position when respondents finish answering;
S26, multi-step displacement distribution of facial key points: according to the Chapman-Kolmogorov equation:
φ(t_m) = φ(t_0)·S^m = [φ_1(m), φ_2(m), ···, φ_N(m)];
wherein φ(t_m) is the m-step displacement distribution of the key point, representing the estimated distribution of the respondent's facial key point displacement after m unit moments from the initial moment; the element φ_i(m) represents the probability that the key point is at the i-th position after m unit moments; the multi-step displacement distribution is estimated by the above Markov process, continuous in time and state;
S27, random evolution of facial key point displacement: the random evolution of the facial time-series feature refers to the probability that, at a certain moment, the same facial key point moves toward a key position among the mark positions.
Preferably, in step S22, after the video of the facial time-series data is compressed to a uniform standard aspect ratio, the set of coordinates of the pixels in the same key point is called the facial key point coordinate set:
X_m = {x_m^(1), x_m^(2), ···, x_m^(K)};
wherein X_m represents the facial key point coordinate set, and the element x_m^(k) represents the coordinate of the k-th pixel;
ΔX_m(t) = {Δx_m^(1)(t), Δx_m^(2)(t), ···, Δx_m^(K)(t)};
wherein ΔX_m(t) represents the set of displacement vectors of the key point coordinates at time t; the element Δx_m^(k)(t) indicates the displacement direction of the k-th pixel, with the coordinate at time t+1 as the end point and the coordinate at time t as the start point.
Preferably, in step S27, the random evolution sequence of the facial key point displacement is as follows:
x(m) = [Δx'(t_1), Δx'(t_2), ···, Δx'(t_n)];
wherein the random evolution model of the facial time-series feature is a transformation from the original displacement vector sequence x(0) to the m-step displacement sequence x(m); the transformation applies an N-dimensional displacement probability distribution to each component of x(0), and components are replaced by randomly evolved directions according to that probability.
Preferably, in step S3, the answer attitude determination according to the dynamic time-series feature of the face includes the following contents:
S30, constructing same-attitude sample pairs: the facial key point displacement time series x(0) of a given positive attitude sample is counted according to step S23; the corresponding random evolution sequence x(m) of the facial key point displacement is computed according to step S27 to form a sample pair; corresponding evolution sequences are constructed for all samples with positive attitude labels;
S31, constructing the splitting loss function: the splitting loss function is designed in a form that distinguishes sample pairs from non-sample pairs, so that face displacement sequences of the same attitude are gathered together as much as possible and sequences from different attitudes are separated;
S32, attitude judgment: the minimum points of the splitting loss function are obtained, and a new facial time series is input into the trained R[·] + logistic linear classifier structure.
Preferably, in step S31, the splitting loss function is defined as follows:
L(x_i) = -log( exp(R[x_i(0)]·R[x_i(m)]) / ( exp(R[x_i(0)]·R[x_i(m)]) + Σ_j exp(R[x_i(0)]·R[x_j(m)]) ) );
wherein L(x_i) represents the splitting loss function, the summation over j runs over the evolution sequences of samples with the opposite attitude, and R[·] is a backbone neural network with parameters, for which various existing specific network forms can be selected as required.
Compared with the prior art, the invention has the beneficial effects that:
the invention analyzes facial expressions through an artificial intelligence technology, realizes the recognition of behavior attitude, can effectively understand the real inner feeling of a user, can be widely applied to businesses such as marital relation prediction, communication negotiation, teaching evaluation and the like, can find the real intention of the user through analyzing the expressions of the user, can timely stop illegal behaviors of dangerous molecules, and can well predict whether a prisoner lies, whether violent behaviors exist and the like, thereby protecting the long-term security of the country.
Drawings
FIG. 1 is the overall flow chart of the behavior attitude identification method based on dynamic perception of facial expressions according to the present invention;
FIG. 2 is a diagram of the random evolution of facial key point displacements in the behavior attitude identification method based on dynamic perception of facial expressions.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1-2, the present invention provides a technical solution: the behavior attitude identification method based on the dynamic perception of the facial expressions comprises the following steps:
First, facial image preprocessing.
1. Image enhancement
The main purpose of image enhancement is to highlight the face in the image. A piecewise linear transformation stretches the gray-level details of the target object as the user requires: the whole gray interval 0-255 is divided into several segments as needed, and each segment is given its own linear transformation. Taking a three-segment division with breakpoints (a, c) and (b, d) as an example, the transformation formula is as follows:
g(x, y) = (c/a)·f(x, y), for 0 ≤ f(x, y) < a;
g(x, y) = ((d - c)/(b - a))·(f(x, y) - a) + c, for a ≤ f(x, y) < b;
g(x, y) = ((255 - d)/(255 - b))·(f(x, y) - b) + d, for b ≤ f(x, y) ≤ 255.
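By way of illustration, a minimal NumPy sketch of such a three-segment gray-level stretch follows; the breakpoint values (a, c) and (b, d) are assumed defaults for demonstration, not parameters fixed by the method.

```python
import numpy as np

def piecewise_linear_stretch(gray, a=80, c=30, b=170, d=220):
    """Three-segment piecewise linear gray-level stretch.

    gray: 2-D uint8 array with values in [0, 255].
    (a, c) and (b, d): assumed breakpoints; gray levels in [a, b) are mapped
    to [c, d), stretching the mid-range details of the target object.
    """
    f = gray.astype(np.float64)
    g = np.empty_like(f)

    low = f < a
    mid = (f >= a) & (f < b)
    high = f >= b

    g[low] = (c / a) * f[low]
    g[mid] = (d - c) / (b - a) * (f[mid] - a) + c
    g[high] = (255 - d) / (255 - b) * (f[high] - b) + d

    return np.clip(g, 0, 255).astype(np.uint8)
```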
2. Facial positioning of expression images
An important step in expression image preprocessing is face detection and localization, which finds the exact position of the face in the image; the face is then cropped, other interfering information is thoroughly removed, and only the face information is kept for subsequent analysis. On the basis of the piecewise linear transformation of the image, a gray-level integral projection method is used to locate the face region, including a horizontal integral projection and a vertical integral projection.
Let the size of the gray image be M × N, and let f(x, y) be the gray value at point (x, y) of the image.
Its vertical integral projection is then:
V(x) = Σ_{y=1..N} f(x, y)
The horizontal integral projection is:
H(y) = Σ_{x=1..M} f(x, y)
To obtain the complete facial expression information, the face boundary obtained by gray-level integral projection is slightly adjusted, i.e., the upper and lower boundaries are appropriately expanded. Because the face images cropped by gray-level integral projection differ in size, the images are scale-normalized to a uniform size to facilitate subsequent work such as feature extraction. The image is scaled proportionally, and the boundary is then cropped and size-normalized to obtain the facial expression image.
Secondly, establishing a face time sequence feature evolution model
1. Face time series data
The facial time-series data of a respondent is the record of how the pixels in the face region change over time within the respondent's answering time for a unit question amount. The unit question amount is an answer quantity that contains at least one feedback operation by the respondent and is meaningful to measure; depending on the specific task it can be set to half a question or a single question, or, for questions that take longer, to a specified period of time. A single question can generally be chosen as the unit question amount. The time starting point of the unit amount is the moment the question appears on the subject's display, and the end point is the moment the subject performs an effective feedback operation; the difference between the two is the answering time for the unit amount. The time-series data is recorded and stored in video format.
2. Facial timing features
Existing face recognition techniques extract 2D static features of the face. 3D face recognition collects data through structured light or optical depth techniques to build static 3D face data and extract 3D static features. Facial time-series data, by contrast, is dynamic data that contains the dynamic features of the face; in this work a generalized facial key point displacement time series is established to represent these dynamic facial features.
3. Face key point coordinates and displacements
A facial key point is a set of key pixels whose relative positions on the facial features can be calibrated; the key points can be calibrated manually or assigned by machine-learned weights. After the video of the facial time-series data is compressed to a uniform standard aspect ratio, the set of coordinates of the pixels in the same key point is called the facial key point coordinate set.
Definition: facial key point coordinate set
X_m = {x_m^(1), x_m^(2), ···, x_m^(K)};
wherein X_m represents the facial key point coordinate set and the element x_m^(k) represents the coordinate of the k-th pixel. Facial time-series features are dynamic, so the focus is on the dynamic displacement changes of the pixels within the set.
Defining: displaced set of facial keypoints
Figure BDA0003536793980000103
Wherein, Δ Xm(t) a set of displacement vectors representing the coordinates of the key points at time t; element(s)
Figure BDA0003536793980000104
Indicating the displacement direction of the kth pixel, with the coordinate at time t +1 as the end point and the coordinate at time t as the start point.
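A small sketch of computing the displacement sets ΔX_m(t) from tracked key-point pixel coordinates; the (frames × pixels × 2) array layout is an assumed representation.

```python
import numpy as np

def displacement_sets(coords):
    """coords: array of shape (T, K, 2) holding the (x, y) coordinates of the
    K pixels of one facial key point over T frames (assumed layout).
    Returns an array of shape (T-1, K, 2): element [t, k] is the displacement
    vector of the k-th pixel from frame t (start point) to frame t+1 (end
    point), i.e. the set ΔX_m(t) for each t."""
    coords = np.asarray(coords, dtype=np.float64)
    return coords[1:] - coords[:-1]
```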
4. Facial key point displacement time series
The facial key point displacement time series is the sequence of coordinate displacements of a person's facial key points over time within the answering time of a unit question amount, recorded as a discrete vector (array).
Definition: facial key point displacement time series
x(0) = [Δx(t_1), Δx(t_2), ···, Δx(t_n)]   (4);
wherein x(0) represents the original displacement vector sequence of the same key-point pixel at n moments, and the element Δx(t_i) indicates the displacement direction of the pixel at moment t_i. The facial key point displacement time series records no coordinates, because each of its elements is a displacement vector; as long as the initial coordinate of the key point is determined, the coordinate at each moment can be obtained by summing the displacement vectors. This sequence characterizes the dynamic features of the respondent's facial key points during answering.
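Continuing the same assumed layout, the displacement time series x(0) for one key point can be assembled as below; reducing the key point's K pixel displacements to one vector per moment by averaging is an assumption for illustration.

```python
import numpy as np

def displacement_time_series(coords):
    """Assemble x(0) = [Δx(t_1), ..., Δx(t_n)] for one facial key point.
    coords: (T, K, 2) pixel coordinates of the key point over T frames.
    Returns an (n, 2) array with n = T - 1; row i is the key point's
    displacement at moment t_i, taken here as the mean displacement of its
    K pixels (an assumed reduction)."""
    coords = np.asarray(coords, dtype=np.float64)
    per_pixel = coords[1:] - coords[:-1]   # (T-1, K, 2) pixel displacements
    return per_pixel.mean(axis=1)          # one displacement vector per moment
```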
5. Random evolution of facial keypoint displacements
Respondents' attitudes toward the test are divided into positive cooperation and negative resistance, hereinafter referred to simply as positive and negative. This work assumes that the two attitudes produce a certain deviation on each individual face: the facial time-series features of respondents with the same attitude are not exactly the same, but on the whole they share a certain commonality, so the differences between individuals with the same attitude are regarded as random evolutionary deviations from an ideal state. The invention uses this deviation to apply a disturbance to x(0), so that the evolved displacement sequence differs somewhat from the original sequence and the influence extends over a period of time. This random disturbance allows the model to gain generalization ability in the subsequent training phase.
6. Initial distribution of facial key point displacement
Definition: initial distribution of facial key point displacement
φ(t_0) = [φ_1, φ_2, ···, φ_N]   (5);
wherein the vector φ(t_0) represents the initial distribution of the facial key point displacement, recording the probability that the facial key point is at each mark position at the initial time t_0; the component φ_i represents the probability that the facial key point is at the i-th mark position.
Definition: evolution matrix of facial key point displacement
S = [S_ij] (an N × N matrix);
wherein S represents the displacement evolution matrix of the facial key point; the element S_ij in row i and column j is a random disturbance probability, representing the probability that the facial key point displacement moves from key position i to key position j over one unit moment, and it can be obtained by counting the frequency with which the facial key point falls on each new mark position when respondents finish answering.
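A sketch of estimating the evolution matrix S by frequency counting over observed mark-position transitions, as described above; the small smoothing constant is an added assumption to keep every row a valid distribution.

```python
import numpy as np

def estimate_evolution_matrix(position_sequences, n_positions, alpha=1e-3):
    """Estimate the evolution matrix S from observed mark-position indices.
    position_sequences: iterable of integer sequences; each lists the mark
    position (0..N-1) occupied by a facial key point at successive unit
    moments for one finished answer.
    Returns an N x N row-stochastic matrix with S[i, j] ~ P(j at t+1 | i at t).
    alpha is an assumed smoothing constant for unobserved transitions."""
    counts = np.full((n_positions, n_positions), alpha, dtype=np.float64)
    for seq in position_sequences:
        for i, j in zip(seq[:-1], seq[1:]):
            counts[i, j] += 1.0
    return counts / counts.sum(axis=1, keepdims=True)
```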
7. Multiple displacement distribution of facial key points
According to the Chapman-Kolmogorov equation:
φ(t_m) = φ(t_0)·S^m = [φ_1(m), φ_2(m), ···, φ_N(m)];
wherein φ(t_m) is the m-step displacement distribution of the key point, representing the estimated distribution of the respondent's facial key point displacement after m unit moments from the initial moment; the element φ_i(m) represents the probability that the key point is at the i-th position after m unit moments. The multi-step displacement distribution is estimated by the above Markov process, continuous in time and state.
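Given φ(t_0) and S, the m-step distribution follows directly from the Chapman-Kolmogorov relation; a minimal sketch:

```python
import numpy as np

def multi_step_distribution(phi0, S, m):
    """Chapman-Kolmogorov estimate phi(t_m) = phi(t_0) · S^m.
    phi0: length-N initial distribution over mark positions.
    S: (N, N) evolution matrix.  Returns the length-N distribution of the
    key point's displacement after m unit moments."""
    S = np.asarray(S, dtype=np.float64)
    return np.asarray(phi0, dtype=np.float64) @ np.linalg.matrix_power(S, m)
```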
8. Random evolution of displacement of facial keypoints
The random evolution of the facial time-series feature refers to the probability that, at a certain moment, the same facial key point moves toward a key position among the mark positions; this probability changes over time.
Definition: random evolution sequence of facial key point displacement
x(m) = [Δx'(t_1), Δx'(t_2), ···, Δx'(t_n)];
wherein the random evolution model of the facial time-series feature is a transformation from the original displacement vector sequence x(0) to the m-step displacement sequence x(m); the transformation applies an N-dimensional displacement probability distribution to each component of x(0), and components are replaced by randomly evolved directions according to that probability.
Specifically, the probability that any displacement component is replaced follows the distribution φ(t_m), and the replacement displacement points toward the mark position indicated by the sampled component of φ(t_m). The magnitude of the displacement is proportional to the distance from the current coordinate to the center of that mark position. Since respondents' facial movements share a certain overall commonality in the calculated φ(t_m), replacing a given displacement in x(0) with a position from φ(t_m) is a small-probability event; overall, x(m) is similar to the sequence x(0) after m displacements, except that replacements occur in individual components. Therefore x(m) is called the random evolution sequence of the facial time-series feature after m displacements.
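A sketch of the random evolution described above: each component of x(0) is kept with high probability and otherwise replaced by a displacement pointing toward a mark position sampled from φ(t_m); the replacement probability and the mark-center representation are assumptions for illustration.

```python
import numpy as np

def evolve_sequence(x0, phi_m, mark_centers, coords, replace_prob=0.05, rng=None):
    """Randomly evolve a displacement time series x(0) into x(m).
    x0: (n, 2) displacement vectors of one key point.
    phi_m: length-N distribution phi(t_m) over mark positions (sums to 1).
    mark_centers: (N, 2) center coordinates of the mark positions.
    coords: (n, 2) key-point coordinates at each moment (start points).
    replace_prob: assumed small probability that a component is replaced.
    Returns x(m): identical to x(0) except that a few components are replaced
    by displacements pointing toward a mark position sampled from phi(t_m)."""
    rng = np.random.default_rng() if rng is None else rng
    x0 = np.asarray(x0, dtype=np.float64)
    coords = np.asarray(coords, dtype=np.float64)
    mark_centers = np.asarray(mark_centers, dtype=np.float64)
    xm = x0.copy()
    for i in range(len(x0)):
        if rng.random() < replace_prob:
            j = rng.choice(len(phi_m), p=phi_m)   # sample a mark position
            xm[i] = mark_centers[j] - coords[i]   # displacement toward its center
    return xm
```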
Thirdly, answer attitude judgment according to the dynamic time sequence characteristics of the face
1. Constructing pairs of same-attitude samples
The first step in answer attitude judgment is the evolution of the facial time-series data. Positive attitude samples are taken as the example below; negative attitude samples are handled in the same way. For a given positive attitude sample, the facial key point displacement time series x(0) is counted as described above, and the corresponding random evolution sequence x(m) of the facial key point displacement is computed, forming a sample pair; corresponding evolution sequences are constructed for all samples with positive attitude labels.
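A small sketch of building the same-attitude sample pairs; the evolve callable stands for a random-evolution routine such as the evolve_sequence sketch above, already bound to φ(t_m) and the mark-position centers.

```python
def build_same_attitude_pairs(positive_samples, evolve):
    """positive_samples: list of (x0, coords) tuples for positive-attitude
    respondents; evolve: a callable mapping (x0, coords) to an evolved
    sequence x(m). Returns a list of (x(0), x(m)) same-attitude pairs."""
    return [(x0, evolve(x0, coords)) for x0, coords in positive_samples]

# Example binding with the hypothetical helpers above:
# pairs = build_same_attitude_pairs(
#     samples, lambda x0, c: evolve_sequence(x0, phi_m, mark_centers, c))
```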
2. Constructing a splitting loss function
The splitting loss function is designed in a form that distinguishes sample pairs from non-sample pairs: it gathers face displacement sequences of the same attitude together as much as possible and separates sequences from different attitudes.
Defining: splitting loss function
Figure BDA0003536793980000131
Wherein, L (x)i) Representing a cleavage loss function, R [, ]]The method is a backbone neural network with parameters, and can select various existing specific network forms (such as ResNet, VGG-19, and when the number of answers is large or the answer time is long, a serialization network (such as GRU, LSTM, BERT) can be selected as required to be used as a dot product of a subsection in an encoder logarithmic function of a face displacement time sequence and a corresponding evolution sequence, and to be used as a vector dot product of a sample sequence and the corresponding evolution sequence after the sample sequence and the corresponding evolution sequence are output by an encoder. When the fragmentation function is more optimized, R2]The middle parameter will make the point product value of the homomorphism sample larger, i.e. the similarity is higher. The point multiplication term of the accumulation part of the denominator is the point multiplication of the original sequence and the sample evolution sequence with different attitudes, so that when the splitting loss is reduced, R [ 2 ]]The medium parameters can reduce the dot product value of the part as much as possible, reduce the similar pairs of the partial dot product values and split the displacement sequences with different attitudes. Therefore, the optimization process of the splitting function enables the encoder R to extract the common mode of the displacement of the key points of the faces of the respondents with the same attitude and separate the non-common modes of the faces of the respondents with different attitudes. These patterns are dynamic time series at the input and are therefore a dynamic pattern.
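Under the reconstruction above, an InfoNCE-style sketch of the splitting loss for one anchor sample is given below in NumPy; it computes the loss value only (training would require an autograd framework), and the encoder R is any callable mapping a displacement sequence to an embedding vector, its architecture being left open as in the text.

```python
import numpy as np

def splitting_loss(encoder, x0_i, xm_i, xm_other_attitude):
    """Splitting loss for one anchor sample x_i (value only).
    encoder: callable R mapping a displacement sequence to a 1-D embedding.
    x0_i: original sequence of sample i; xm_i: its evolved sequence (the
    same-attitude pair); xm_other_attitude: evolved sequences of samples
    with the opposite attitude (the denominator terms)."""
    z = encoder(x0_i)
    pos = np.exp(z @ encoder(xm_i))                               # same-attitude pair
    neg = sum(np.exp(z @ encoder(xm_j)) for xm_j in xm_other_attitude)
    return float(-np.log(pos / (pos + neg)))
```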
3. Attitude determination
The main part of the splitting loss function is a factor similar to the softmax function, so its minimum points can be obtained by various existing optimization methods (e.g. ADAM, SGD). The trained encoder R becomes the input stage of the discriminator, and its output is connected to a two-class logistic linear classifier, so that new facial key point time-series data can be judged. Inputting a new facial time series into the trained R[·] + logistic linear classifier structure yields the attitude judgment result. It can be seen that the whole training process (the optimization of the splitting loss function) requires only a small amount of labeled data; most of the data is evolved from the original data. The essence of the evolution is to add to the original time series a random disturbance that does not cause the whole to deviate greatly, and the disturbance acts over a long stretch of the time series, so that even though the time series represents a long-term dynamic displacement pattern, the model still gains generalization ability during training.
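A sketch of the discriminator stage: embeddings from the trained encoder R feed a two-class logistic classifier; scikit-learn's LogisticRegression is used here as a stand-in for the logistic layer.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_attitude_classifier(encoder, sequences, labels):
    """sequences: list of facial key-point displacement time series;
    labels: 1 for positive attitude, 0 for negative.
    Fits a two-class logistic classifier on the trained encoder's embeddings."""
    feats = np.stack([encoder(x) for x in sequences])
    return LogisticRegression().fit(feats, labels)

def judge_attitude(encoder, clf, new_sequence):
    """Apply the trained R[·] + logistic structure to a new face time series."""
    return int(clf.predict(encoder(new_sequence)[None, :])[0])
```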
Although the present invention has been described in detail with reference to the foregoing embodiments, it will be apparent to those skilled in the art that various changes in the embodiments and/or modifications of the invention can be made, and equivalents and modifications of some features of the invention can be made without departing from the spirit and scope of the invention.

Claims (8)

1. The behavior attitude identification method based on the dynamic perception of the facial expressions is characterized by comprising the following steps of:
S1, facial image preprocessing: the face is detected and located in the expression image, cropped once its exact position in the image is found so that other interfering information is thoroughly removed, and highlighted through image enhancement;
S2, establishing a facial time-series feature evolution model;
S3, judging the answering attitude according to the dynamic time-series features of the face.
2. A behavioral attitude recognition method based on dynamic perception of facial expressions according to claim 1, characterized in that in step S1, the image enhancement stretches the gray-level details of the target object as required by the user through a piecewise linear transformation: the whole gray interval 0-255 is divided into several segments as needed, and a corresponding linear transformation is applied to each segment; taking a three-segment division with breakpoints (a, c) and (b, d) as an example, the linear transformation formula is as follows:
g(x, y) = (c/a)·f(x, y), for 0 ≤ f(x, y) < a;
g(x, y) = ((d - c)/(b - a))·(f(x, y) - a) + c, for a ≤ f(x, y) < b;
g(x, y) = ((255 - d)/(255 - b))·(f(x, y) - b) + d, for b ≤ f(x, y) ≤ 255.
3. A behavioral attitude recognition method based on dynamic perception of facial expressions according to claim 1, characterized in that in step S1, the facial positioning of the expression image uses a gray-level integral projection method, on the basis of the piecewise linear transformation of the image, to locate the face region, including a horizontal integral projection and a vertical integral projection;
let the size of the gray image be M × N, and let f(x, y) be the gray value at point (x, y) of the image;
its vertical integral projection is then:
V(x) = Σ_{y=1..N} f(x, y)
the horizontal integral projection is:
H(y) = Σ_{x=1..M} f(x, y)
the image is scale-normalized to a uniform size: it is scaled proportionally, and the boundary is cropped and size-normalized to obtain the facial expression image.
4. A behavioral attitude recognition method based on dynamic perception of facial expressions according to claim 1, characterized in that in step S2, the establishing of the evolution model of facial temporal features includes the following steps:
S20, facial time-series data: the facial time-series data of a respondent is the record of how the pixels in the face region change over time within the respondent's answering time for a unit question amount;
S21, facial time-series features: the facial time-series data is dynamic data and contains the dynamic features of the face;
S22, facial key point coordinates and displacements: a facial key point is a set of key pixels whose relative positions on the features can be calibrated; the key points can be calibrated manually or assigned by machine-learned weights;
S23, facial key point displacement time series: the facial key point displacement time series is the sequence of coordinate displacements of a person's facial key point over time within the answering time of a unit question amount, recorded as a discrete vector; the facial key point displacement time series is defined by the following formula:
x(0) = [Δx(t_1), Δx(t_2), ···, Δx(t_n)]   (4);
wherein x(0) represents the original displacement vector sequence of the same key-point pixel at n moments, and the element Δx(t_i) indicates the displacement direction of the pixel at moment t_i;
S24, random evolution of facial key point displacements: the respondents' attitudes toward the test are divided into positive and negative; the individual deviation is used to apply a disturbance to x(0), so that the evolved displacement sequence differs somewhat from the original sequence and the influence extends over a period of time;
S25, initial distribution of facial key point displacement: the initial distribution of facial key point displacement is defined as follows:
φ(t_0) = [φ_1, φ_2, ···, φ_N]   (5);
wherein the vector φ(t_0) represents the initial distribution of the facial key point displacement, recording the probability that the facial key point is at each mark position at the initial time t_0, and the component φ_i represents the probability that the facial key point is at the i-th mark position;
the evolution matrix of the facial key point displacement is:
S = [S_ij] (an N × N matrix);
wherein S represents the displacement evolution matrix of the facial key point; the element S_ij in row i and column j is a random disturbance probability, representing the probability that the facial key point displacement moves from key position i to key position j over one unit moment, and it can be obtained by counting the frequency with which the facial key point falls on each new mark position when respondents finish answering;
S26, multi-step displacement distribution of facial key points: according to the Chapman-Kolmogorov equation:
φ(t_m) = φ(t_0)·S^m = [φ_1(m), φ_2(m), ···, φ_N(m)];
wherein φ(t_m) is the m-step displacement distribution of the key point, representing the estimated distribution of the respondent's facial key point displacement after m unit moments from the initial moment; the element φ_i(m) represents the probability that the key point is at the i-th position after m unit moments; the multi-step displacement distribution is estimated by the above Markov process, continuous in time and state;
S27, random evolution of facial key point displacement: the random evolution of the facial time-series feature refers to the probability that, at a certain moment, the same facial key point moves toward a key position among the mark positions.
5. A behavioral attitude recognition method based on dynamic perception of facial expressions according to claim 4, characterized in that: in step S22, after compressing the video of the face time series data to a uniform standard aspect ratio, the coordinates of each pixel in the same key point are called a face key point coordinate set;
X_m = {x_m^(1), x_m^(2), ···, x_m^(K)};
wherein X_m represents the facial key point coordinate set, and the element x_m^(k) represents the coordinate of the k-th pixel;
ΔX_m(t) = {Δx_m^(1)(t), Δx_m^(2)(t), ···, Δx_m^(K)(t)};
wherein ΔX_m(t) represents the set of displacement vectors of the key point coordinates at time t; the element Δx_m^(k)(t) indicates the displacement direction of the k-th pixel, with the coordinate at time t+1 as the end point and the coordinate at time t as the start point.
6. A behavioral attitude recognition method based on dynamic perception of facial expressions according to claim 4, characterized in that in step S27, the random evolution sequence of the displacements of the facial key points is as follows:
x(m) = [Δx'(t_1), Δx'(t_2), ···, Δx'(t_n)];
wherein the random evolution model of the facial time-series feature is a transformation from the original displacement vector sequence x(0) to the m-step displacement sequence x(m); the transformation applies an N-dimensional displacement probability distribution to each component of x(0), and components are replaced by randomly evolved directions according to that probability.
7. A behavioral attitude recognition method based on dynamic perception of facial expressions according to claim 6, wherein in step S3, said answer attitude determination based on time-series characteristics of facial dynamics includes the following:
S30, constructing same-attitude sample pairs: the facial key point displacement time series x(0) of a given positive attitude sample is counted according to step S23; the corresponding random evolution sequence x(m) of the facial key point displacement is computed according to step S27 to form a sample pair; corresponding evolution sequences are constructed for all samples with positive attitude labels;
S31, constructing the splitting loss function: the splitting loss function is designed in a form that distinguishes sample pairs from non-sample pairs, so that face displacement sequences of the same attitude are gathered together as much as possible and sequences from different attitudes are separated;
S32, attitude judgment: the minimum points of the splitting loss function are obtained, and a new facial time series is input into the trained R[·] + logistic linear classifier structure.
8. A behavioral attitude recognition method based on dynamic perception of facial expressions according to claim 7, characterized in that in step S31, the splitting loss function is defined as follows:
L(x_i) = -log( exp(R[x_i(0)]·R[x_i(m)]) / ( exp(R[x_i(0)]·R[x_i(m)]) + Σ_j exp(R[x_i(0)]·R[x_j(m)]) ) );
wherein L(x_i) represents the splitting loss function, the summation over j runs over the evolution sequences of samples with the opposite attitude, and R[·] is a backbone neural network with parameters, for which various existing specific network forms can be selected as required.
CN202210219980.9A 2022-03-08 2022-03-08 Behavior attitude identification method based on dynamic perception of facial expressions Pending CN114581991A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210219980.9A CN114581991A (en) 2022-03-08 2022-03-08 Behavior attitude identification method based on dynamic perception of facial expressions

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210219980.9A CN114581991A (en) 2022-03-08 2022-03-08 Behavior attitude identification method based on dynamic perception of facial expressions

Publications (1)

Publication Number Publication Date
CN114581991A true CN114581991A (en) 2022-06-03

Family

ID=81773718

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210219980.9A Pending CN114581991A (en) 2022-03-08 2022-03-08 Behavior attitude identification method based on dynamic perception of facial expressions

Country Status (1)

Country Link
CN (1) CN114581991A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117373100A (en) * 2023-12-08 2024-01-09 成都乐超人科技有限公司 Face recognition method and system based on differential quantization local binary pattern
CN117373100B (en) * 2023-12-08 2024-02-23 成都乐超人科技有限公司 Face recognition method and system based on differential quantization local binary pattern

Similar Documents

Publication Publication Date Title
KR100969298B1 (en) Method For Social Network Analysis Based On Face Recognition In An Image or Image Sequences
CN112560810B (en) Micro-expression recognition method based on multi-scale space-time characteristic neural network
CN111209962B (en) Combined image classification method based on CNN (CNN) feature extraction network and combined heat map feature regression
Caldara et al. Simulating the ‘other-race’effect with autoassociative neural networks: further evidence in favor of the face-space model
CN112464808A (en) Rope skipping posture and number identification method based on computer vision
CN116311483B (en) Micro-expression recognition method based on local facial area reconstruction and memory contrast learning
CN113435335B (en) Microscopic expression recognition method and device, electronic equipment and storage medium
Wang et al. SmsNet: A new deep convolutional neural network model for adversarial example detection
CN116110089A (en) Facial expression recognition method based on depth self-adaptive metric learning
CN114973383A (en) Micro-expression recognition method and device, electronic equipment and storage medium
Singh et al. Age, gender prediction and emotion recognition using convolutional neural network
CN111968124A (en) Shoulder musculoskeletal ultrasonic structure segmentation method based on semi-supervised semantic segmentation
CN114581991A (en) Behavior attitude identification method based on dynamic perception of facial expressions
CN112380374B (en) Zero sample image classification method based on semantic expansion
Abdallah et al. Facial-expression recognition based on a low-dimensional temporal feature space
Sun et al. Using backpropagation neural network for face recognition with 2D+ 3D hybrid information
Verma et al. Hmm-based convolutional lstm for visual scanpath prediction
Diana et al. Cognitive-affective emotion classification: Comparing features extraction algorithm classified by multi-class support vector machine
CN112465054B (en) FCN-based multivariate time series data classification method
CN113255666A (en) Personalized question answering system and method based on computer vision
CN113591607A (en) Station intelligent epidemic prevention and control system and method
Liao et al. Video Face Detection Technology and Its Application in Health Information Management System
CN114724217B (en) SNN-based edge feature extraction and facial expression recognition method
CN112070023B (en) Neighborhood prior embedded type collaborative representation mode identification method
Momin et al. Recognizing facial expressions in the wild using multi-architectural representations based ensemble learning with distillation

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination