CN116884067A - Micro-expression recognition method based on improved implicit semantic data enhancement - Google Patents

Micro-expression recognition method based on improved implicit semantic data enhancement

Info

Publication number
CN116884067A
Authority
CN
China
Prior art keywords
optical flow
micro
frame
feature
depth
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310854565.5A
Other languages
Chinese (zh)
Inventor
岳希
王文鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chengdu University of Information Technology
Original Assignee
Chengdu University of Information Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chengdu University of Information Technology filed Critical Chengdu University of Information Technology
Priority to CN202310854565.5A priority Critical patent/CN116884067A/en
Publication of CN116884067A publication Critical patent/CN116884067A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/0464 Convolutional networks [CNN, ConvNet]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/20 Analysis of motion
    • G06T7/246 Analysis of motion using feature-based methods, e.g. the tracking of corners or segments
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774 Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77 Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806 Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02T CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00 Road transport of goods or passengers
    • Y02T10/10 Internal combustion engine [ICE] based vehicles
    • Y02T10/40 Engine management systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a micro-expression recognition method based on improved implicit semantic data enhancement, which comprises: acquiring the frame data from the micro-expression start frame to the end frame in a micro-expression dataset; preprocessing the frame data and extracting optical flow information and a facial action unit; inputting the optical flow information and the facial action unit into a micro-expression recognition network for training to obtain a trained micro-expression recognition network, wherein the micro-expression recognition network performs depth-feature enhancement through an adaptive weighted loss function; and obtaining an image to be processed and inputting it into the trained micro-expression recognition network to obtain a micro-expression recognition result. The application provides a micro-expression recognition method based on improved implicit semantic data enhancement, which solves the technical problems of data imbalance and small samples in micro-expression datasets in the prior art and achieves the purpose of improving recognition performance.

Description

Micro-expression recognition method based on improved implicit semantic data enhancement
Technical Field
The application relates to the field of micro-expression recognition, and in particular to a micro-expression recognition method based on improved implicit semantic data enhancement.
Background
Micro-expression recognition has great application value in fields such as judicial interrogation, business negotiation and psychological counselling, but it currently faces the following difficulties: micro-expressions are short in duration and low in motion intensity; micro-expression datasets are class-imbalanced and small; and subjects carry their own habitual expressions. These difficulties lead to the poor performance of existing micro-expression recognition methods.
In recent years, some micro-expression recognition methods use facial Action Units (AUs) to divide the facial regions where micro-expressions are relatively active into Regions of Interest (RoIs) and feed these regions as spatial-feature input. This removes redundant information irrelevant to micro-expressions and thereby improves recognition performance. However, purely hand-designed RoIs depend heavily on the researchers' psychological expertise, and the RoI division may cover both spontaneous micro-expressions and the subject's own habitual expressions, so recognition performance easily hits a bottleneck.
In addition, some micro-expression recognition methods study how the RoIs vary in the time dimension; the three commonly used approaches are the optical flow method, models combining a Convolutional Neural Network (CNN) with a Long Short-Term Memory (LSTM), and 3D Convolutional Neural Networks (3DCNN). Building the temporal information of micro-expressions usually relies on feeding in a large number of micro-expression video frames, but this large amount of frame data not only requires extra computational overhead, the excess of redundant frames also degrades recognition performance.
Moreover, current deep-learning network models have achieved great success in image recognition, but neural networks require a large amount of training data to avoid overfitting. In the prior art, the total sample size of the mainstream micro-expression datasets is below 500 and the per-class sample sizes differ greatly; the resulting data imbalance and small-sample problems of micro-expression datasets lead to the low performance of current deep-learning-based micro-expression recognition methods.
In summary, the micro-expression recognition methods in the prior art have a number of technical defects.
Disclosure of Invention
The application provides a micro-expression recognition method based on improved implicit semantic data enhancement, which solves the technical problems of data imbalance and small samples in micro-expression datasets in the prior art and achieves the purpose of improving recognition performance.
The application is realized by the following technical scheme:
a micro-expression recognition method based on improved implicit semantic data enhancement, comprising:
acquiring frame data from a micro-expression start frame to an end frame in the micro-expression data set;
preprocessing the frame data, and extracting optical flow information and a face action unit;
inputting the optical flow information and the human face action unit into a micro-expression recognition network for training to obtain a trained micro-expression recognition network; the micro expression recognition network is used for carrying out depth characteristic enhancement through a self-adaptive weighting loss function;
and obtaining an image to be processed, inputting the image to be processed into the trained microexpressive recognition network, and obtaining microexpressive recognition results.
The method comprises the steps of firstly acquiring frame data from a micro-expression start frame to an end frame based on a micro-expression data set, then preprocessing, extracting optical flow information and a human face action unit, and training a deep learning model by taking the optical flow information and the human face action unit as output data of a micro-expression recognition network. Aiming at the problems of unbalanced data and small samples in the microexpressive data set in the prior art, which result in low performance of the traditional microexpressive recognition method based on deep learning, the application performs depth characteristic enhancement through a self-adaptive weighting loss function in a microexpressive recognition network, thereby overcoming the problem of poor recognition effect caused by unbalanced data and small samples in the data set. And finally, inputting the image to be processed into a trained micro-expression recognition network to obtain a micro-expression recognition result.
Further, the optical flow information comprises a horizontal optical flow characteristic diagram, a vertical optical flow characteristic diagram, an optical flow strain characteristic diagram and a RAFT optical flow characteristic diagram; the face action unit is a face action feature sequence.
During the research, the inventors found that the subjects in the current mainstream micro-expression datasets carry their own habitual expressions. Analysing the cause, the inventors note that the publicly available micro-expression datasets elicit micro-expressions by playing video clips while requiring the subjects to suppress any visible expression as much as possible; when subjects deliberately suppress their expressions, they often produce expressions that are unrelated or even opposite to the video. For example, a subject may keep smiling at the beginning even though a negative video clip is being played, or may keep frowning or remain expressionless throughout, which is clearly not a micro-expression in a natural state.
These subject-carried expressions in the datasets inevitably interfere with the learning accuracy of a deep-learning network and increase the recognition error. The related prior art has not recognized the existence of this problem, let alone analysed it in depth or proposed a corresponding solution.
To overcome the problem of subject-carried expressions, the facial action feature sequence is used as the facial-action-unit input to the deep-learning network model, introducing temporal information in the form of a feature sequence. Unlike the prior art, AUs are not used merely as a one-dimensional feature describing the spatial motion intensity of the peak frame; instead, two-dimensional spatio-temporal AU information is constructed as the facial-action input. The temporal information contained in the facial action feature sequence reflects the micro-expressions actually exhibited by the subject, thereby alleviating the problem of subject-carried expressions. In addition, introducing the facial action feature sequence also avoids the huge computational cost of directly processing a large number of micro-expression video frames.
Further, the method for extracting optical flow information comprises the following steps:
collecting a start frame and a peak frame of the frame data to obtain a plurality of first images;
sequentially performing face cropping and background blackening on the first images to obtain second images;
respectively extracting horizontal, vertical, optical flow strain and RAFT optical flow characteristics of the second image to obtain a horizontal optical flow characteristic diagram, a vertical optical flow characteristic diagram, an optical flow strain characteristic diagram and a RAFT optical flow characteristic diagram;
the method for extracting the face action unit comprises the following steps:
and extracting facial action features from the frame data, and normalizing the extracted facial action features to obtain a facial action feature sequence.
Further, before face cropping, the crop-box coordinates of the first images are normalized as follows:
the following parameters are accumulated to obtain accumulated results: for each image in the frame data, the height and width coordinates of the top-left vertex, the top-right vertex and the lowest vertex of its initial crop box;
from the accumulated results, the mean height and mean width of the top-left vertex are computed and defined as the first mean and the second mean; the mean width of the top-right vertex is computed and defined as the third mean; and the mean height of the lowest vertex is computed and defined as the fifth mean;
the absolute difference between the first mean and the fifth mean is taken as the height of the final crop box;
the absolute difference between the second mean and the third mean is taken as the width of the final crop box;
the final crop box is obtained from its height and width.
In this scheme, the crop box used for face cropping is normalized in advance. Conventional face cropping uses a crop box at a fixed position and ignores the fact that differences in face shape across ethnicity, gender and age lead to poorly positioned faces in the cropped images. To improve cropping accuracy, this scheme proposes a new adaptive face-cropping method that, taking each subject as a unit, adaptively computes the face offset coordinates of every subject in the different datasets and normalizes them. Compared with the fixed-position crop box of the conventional method, it is more flexible and more accurate.
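For illustration, the crop-box normalization described above can be sketched in Python as follows; the (y, x) coordinate convention, the use of the mean top-left vertex as the anchor of the final crop box, and all names are assumptions, since the scheme only specifies how the height and width are obtained.

```python
# Minimal sketch of adaptive crop-box normalization (illustrative names and conventions).
import numpy as np

def normalized_crop_box(initial_boxes):
    """initial_boxes: list of dicts with 'top_left', 'top_right', 'bottom' vertices as (y, x)."""
    tl = np.array([b["top_left"] for b in initial_boxes], dtype=float)   # (n, 2)
    tr = np.array([b["top_right"] for b in initial_boxes], dtype=float)  # (n, 2)
    bt = np.array([b["bottom"] for b in initial_boxes], dtype=float)     # (n, 2)

    first_mean, second_mean = tl[:, 0].mean(), tl[:, 1].mean()  # mean height/width of top-left vertex
    third_mean = tr[:, 1].mean()                                # mean width of top-right vertex
    fifth_mean = bt[:, 0].mean()                                # mean height of lowest vertex

    height = abs(first_mean - fifth_mean)   # |first mean - fifth mean|
    width = abs(second_mean - third_mean)   # |second mean - third mean|
    # Anchoring at the mean top-left vertex is an assumption; the scheme only fixes height and width.
    y0, x0 = int(round(first_mean)), int(round(second_mean))
    return y0, x0, int(round(height)), int(round(width))
```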
Further, the extracted facial action features are normalized as follows:
the facial action features of the first frame of the video frame sequence are taken as the baseline facial action features;
the facial action features from the start frame to the end frame are then differenced with the baseline facial action features in turn to obtain the normalized facial action features.
This scheme defines the normalization of the facial action features, which effectively improves computational efficiency.
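A minimal sketch of this baseline normalization, assuming the AU features are stored as a frames-by-AUs array (the 17-AU width used elsewhere in the text); the function name is illustrative.

```python
# Subtract the first frame's AU features (the baseline) from every frame of the sequence.
import numpy as np

def normalize_au_sequence(au_features):
    """au_features: (n_frames, n_aus) array of facial action unit intensities."""
    baseline = au_features[0]          # AU features of the first frame (baseline)
    return au_features - baseline      # per-frame difference to the baseline

seq = np.random.rand(8, 17)            # e.g. an 8-frame sequence of 17 AU intensities
normalized = normalize_au_sequence(seq)  # first row becomes all zeros
```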
Further, the method for inputting the optical flow information and the facial action unit into the micro-expression recognition network for training comprises:
constructing a training set by using the horizontal optical flow characteristic diagram, the vertical optical flow characteristic diagram, the optical flow strain characteristic diagram, the RAFT optical flow characteristic diagram and the facial action characteristic sequence;
inputting the facial action feature sequence in the training set into a AUs feature extraction module to obtain AUs depth features;
inputting a horizontal optical flow feature map, a vertical optical flow feature map and an optical flow strain feature map in a training set into a first optical flow feature extraction module to obtain a first optical flow depth feature;
inputting the RAFT optical flow feature map in the training set into a second optical flow feature extraction module to obtain second optical flow depth features;
performing depth feature fusion on the AUs depth features, the first optical flow depth features and the second optical flow depth features, and outputting fused depth features;
performing depth feature enhancement on the fused depth features through the self-adaptive weighted loss function;
and outputting a training result.
This scheme defines the micro-expression recognition network. Extracting the AUs depth features, the first optical flow depth features and the second optical flow depth features with three independent feature-extraction networks significantly improves the network model's recognition of micro-expressions.
Further, the AUs feature extraction module obtains AUs depth features by:
adding time position codes on the time dimension of facial action features, and establishing the position relation of the same facial action unit among different frames through a time self-attention layer;
adding a CLS token on the space dimension of the facial action feature, and establishing the relation between the local facial action unit and the global facial action feature through a space self-attention layer;
and outputting the space-time information of the facial action characteristics through a plurality of layers Transformer Encoder, and taking out the CLS token with a preset size as the AUs depth characteristics.
This scheme improves the encoder and the network structure of the conventional Vision Transformer. Across several Transformer Encoder layers, the original single self-attention is replaced by two self-attentions, one temporal and one spatial, so that spatio-temporal information can be constructed for non-image facial-action-unit data, avoiding the redundant frames and excessive computational overhead of directly processing a large number of micro-expression video frames.
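The following PyTorch sketch illustrates the idea of the two factored self-attentions under stated assumptions (8 input frames, 17 AUs, 3 encoder layers, single-head attention, and a simplified arrangement of the temporal and spatial attention); it is an illustration, not the patented implementation.

```python
# Minimal sketch of a factored temporal/spatial AU encoder (assumed shapes and layer counts).
import torch
import torch.nn as nn

class AUSpatioTemporalEncoder(nn.Module):
    def __init__(self, n_frames=8, n_aus=17, depth=3, heads=1):
        super().__init__()
        self.time_pos = nn.Parameter(torch.zeros(1, n_frames, n_aus))   # temporal position code
        self.cls = nn.Parameter(torch.zeros(1, 1, n_aus))               # CLS token
        make_layer = lambda: nn.TransformerEncoderLayer(d_model=n_aus, nhead=heads,
                                                        dim_feedforward=4 * n_aus,
                                                        batch_first=True)
        self.temporal = nn.TransformerEncoder(make_layer(), num_layers=depth)  # relates frames over time
        self.spatial = nn.TransformerEncoder(make_layer(), num_layers=depth)   # relates local to global AUs

    def forward(self, au_seq):                     # au_seq: (B, 8, 17)
        x = au_seq + self.time_pos                 # add temporal position encoding
        x = self.temporal(x)                       # temporal self-attention over the 8 frames
        cls = self.cls.expand(x.size(0), -1, -1)
        x = torch.cat([cls, x], dim=1)             # prepend CLS -> (B, 9, 17)
        x = self.spatial(x)                        # spatial self-attention
        return x[:, 0]                             # CLS token as the 1x17 AUs depth feature
```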
Further, the AUs depth features, the first optical flow depth features and the second optical flow depth features are fused by a shallow Transformer.
This scheme provides a depth-feature fusion mode based on a shallow Transformer, which effectively fuses the spatio-temporal information of the AUs depth features with the first and second optical flow depth features. Experiments show that, compared with the conventional fully-connected-layer fusion, the fusion effect is markedly better and training is much faster; the poor performance of conventional Transformers on small data is alleviated, and the generalization and robustness of the network are increased.
In addition, thanks to the shallow Transformer, the recognition performance of the deep-learning network defined by this scheme also increases gradually with the length of the facial action feature sequence, without an excessive increase in the computational resources consumed.
Those skilled in the art will understand that the shallow Transformer in this embodiment refers to a Transformer with fewer self-attention layers than the at least 12 layers used by conventional Vision Transformers; the specific number of layers is not limited here.
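A minimal PyTorch sketch of such a shallow-Transformer fusion, assuming each branch yields a 1×17 depth feature that is treated as one token and that the fused feature is read from the AUs token; the 2-layer depth follows the embodiment below, and the remaining choices are assumptions.

```python
# Fuse the AUs depth feature with the two optical-flow depth features via a 2-layer Transformer.
import torch
import torch.nn as nn

class ShallowTransformerFusion(nn.Module):
    def __init__(self, dim=17, depth=2, heads=1):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=4 * dim, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=depth)

    def forward(self, au_feat, flow_feat1, flow_feat2):                 # each: (B, 17)
        tokens = torch.stack([au_feat, flow_feat1, flow_feat2], dim=1)  # (B, 3, 17)
        fused = self.encoder(tokens)                                    # shallow self-attention fusion
        return fused[:, 0]                                              # fused 1x17 feature (AUs token)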
Furthermore, the first optical flow feature extraction module and the second optical flow feature extraction module each extract optical flow depth features through a plurality of sequentially connected residual convolutional networks.
Incorporating residual convolutional networks into the deep-learning model enriches its feature-extraction capability, and the optical flow depth features they extract are used to correct the errors introduced by the AUs features. This is necessary because, owing to the limitations of current extraction techniques, the extracted AUs features contain a certain amount of error, with an accuracy still below 70%.
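A minimal sketch of one optical-flow branch built from sequentially connected residual convolution blocks, here using torchvision's ResNet18 with its classification head replaced so that a 1×17 depth feature is produced; the input-channel counts and other details are assumptions.

```python
# One optical-flow feature-extraction branch: stacked residual blocks ending in a 17-d feature.
import torch.nn as nn
from torchvision.models import resnet18

class OpticalFlowBranch(nn.Module):
    def __init__(self, in_channels=3, out_dim=17):
        super().__init__()
        backbone = resnet18(weights=None)                     # use pretrained=False on older torchvision
        backbone.conv1 = nn.Conv2d(in_channels, 64, kernel_size=7, stride=2,
                                   padding=3, bias=False)     # accept optical-flow channels
        backbone.fc = nn.Linear(backbone.fc.in_features, out_dim)
        self.backbone = backbone

    def forward(self, flow_maps):                             # (B, in_channels, H, W)
        return self.backbone(flow_maps)                       # (B, 17) optical-flow depth feature

# e.g. first branch: horizontal flow, vertical flow and optical strain as 3 channels;
# second branch: the RAFT flow field (in_channels=2).
```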
Further, the adaptive weighted loss function is

L_j = μ_j · L̄_j,  j = 1, …, C,

where L_j is the adaptive weighted loss of the class-j samples; C is the number of classes; μ_j is the weight of the class-j samples; and L̄_j is the unweighted loss of the class-j samples.
The weight μ_j is computed adaptively from the class sizes, where N is the total number of samples, N_j is the number of class-j samples, and ε ∈ (0, 1) is a positive number.
In the course of in-depth research, the inventors found that the micro-expression datasets in the prior art follow a long-tailed distribution. Taking the 3-class micro-expression task as an example, the number of samples in the negative class is far greater than in the positive and surprise classes. This imbalance makes the deep-learning network perform better on the negative class and worse on the positive and surprise classes, seriously affecting the recognition performance of the supervised network.
The implicit semantic data enhancement algorithm ISDA can augment minority classes to address data imbalance; it generates diverse augmented samples by translating deep features along multiple semantically meaningful directions.
However, the inventors also found during the research that, because ISDA obtains the semantic directions by estimating class-conditional statistics, it has little effect on small-sample classes when the training data are insufficient. The total sample size of the current mainstream micro-expression data is below 500 and the per-class sample sizes differ greatly, so the data fall into the small-sample regime. Consequently, on the currently mainstream micro-expression datasets, the conventional implicit semantic data enhancement algorithm is difficult to apply in practice to deep-learning network training for micro-expression recognition.
The core of the improved implicit semantic data enhancement algorithm provided by this scheme is to improve the loss function of the conventional ISDA algorithm: an adaptive weighted loss function is adopted to overcome the poor augmentation of conventional ISDA on the small-sample micro-expression classes. The adaptive weighted loss function defined here not only enhances the deep semantic features of the minority classes but also suppresses the importance of the majority classes through the adaptive weights, raising the importance of the minority classes in the feature space and thus effectively alleviating the small-sample problem of micro-expression datasets.
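As an illustration only, the sketch below shows how class-adaptive weights of this kind could be built from the quantities named later in the text (total sample count N, per-class counts N_j and a small constant ε); the actual weighting formula of the scheme is not reproduced in the text, so the inverse-frequency form used below is purely an assumption.

```python
# Illustrative class-adaptive weights from N, N_j and epsilon (NOT the patented formula).
import numpy as np

def adaptive_class_weights(class_counts, epsilon=0.001):
    n_j = np.asarray(class_counts, dtype=float)            # N_j, samples per class
    n_total = n_j.sum()                                    # N, total samples
    mu = (n_total / (len(n_j) * n_j)) ** (1.0 - epsilon)   # assumed inverse-frequency form
    return mu                                              # >1 for minority classes, <1 for majority

# The weighted loss of class j is then L_j = mu_j * unweighted_loss_j.
mu = adaptive_class_weights([250, 109, 83])                # class counts of the FULL dataset (Table 1)
print(mu)
```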
Compared with the prior art, the application has the following advantages and beneficial effects:
1. In the micro-expression recognition method based on improved implicit semantic data enhancement, AUs are not used merely as a one-dimensional feature describing the spatial motion intensity of the peak frame; instead, two-dimensional spatio-temporal AU information is constructed as the facial-action input, and the temporal information contained in the facial action feature sequence reflects the micro-expressions actually exhibited by the subject, alleviating the problem of subject-carried expressions.
2. Owing to the introduction of the facial action feature sequence, the method also avoids the huge computational cost of directly processing a large number of micro-expression video frames.
3. The application provides a brand-new adaptive face-cropping method, which is more flexible and more accurate than the fixed-position crop box of the conventional method.
4. The AUs depth features, the first optical flow depth features and the second optical flow depth features are extracted by three independent feature-extraction networks; introducing the first and second optical flow depth features helps the network model correct the errors introduced by the AUs features and markedly improves the recognition of micro-expressions.
5. Through several Transformer Encoder layers, the original single self-attention is replaced by two self-attentions, one temporal and one spatial, so that spatio-temporal information can be constructed for non-image facial-action-unit data while avoiding the huge computational cost of directly processing a large number of micro-expression video frames.
6. A depth-feature fusion mode based on a shallow Transformer is creatively provided; compared with the conventional fully-connected-layer fusion, the fusion effect is markedly better, training is much faster, the poor performance of conventional Transformers on small data is alleviated, and the generalization and robustness of the network are increased.
7. The recognition performance increases gradually with the length of the facial action feature sequence, without an excessive increase in the computational resources consumed.
8. Through the adaptive weighted loss function, the deep semantic features of the minority classes are enhanced, the importance of the majority classes is suppressed by the adaptive weights, and the importance of the minority classes in the feature space is raised, effectively alleviating the small-sample problem of micro-expression datasets.
Drawings
The accompanying drawings, which are included to provide a further understanding of embodiments of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the principles of the application. In the drawings:
FIG. 1 is a schematic flow chart of an embodiment of the present application;
FIG. 2 is a micro-expression recognition network model in accordance with an embodiment of the present application;
FIG. 3 is a confusion matrix in an embodiment of the application;
fig. 4 is a schematic diagram of the face cropping process according to an embodiment of the present application.
Detailed Description
For the purpose of making apparent the objects, technical solutions and advantages of the present application, the present application will be further described in detail with reference to the following examples and the accompanying drawings, wherein the exemplary embodiments of the present application and the descriptions thereof are for illustrating the present application only and are not to be construed as limiting the present application.
Example 1:
a method of microexpressive recognition based on improved implicit semantic data enhancement as shown in fig. 1, comprising:
s1, acquiring frame data from a micro-expression start frame to an end frame in a micro-expression data set;
s2, preprocessing the frame data, and extracting optical flow information and a face action unit;
s3, inputting the optical flow information and the human face action unit into a micro-expression recognition network for training, and obtaining a trained micro-expression recognition network; the micro expression recognition network is used for carrying out depth characteristic enhancement through a self-adaptive weighting loss function;
s4, obtaining an image to be processed, inputting the image to be processed into the trained micro-expression recognition network, and obtaining a micro-expression recognition result.
Wherein the optical flow information comprises a horizontal optical flow feature map, a vertical optical flow feature map, an optical flow strain feature map and a RAFT optical flow feature map; the face action unit is a face action feature sequence.
The method for extracting the optical flow information comprises the following steps:
s201, acquiring a start frame and a peak frame of the frame data to obtain a plurality of first images;
s202, face cutting and background blackening are sequentially carried out on a plurality of first images, and a plurality of second images are obtained;
s203, respectively extracting horizontal, vertical, optical flow strain and RAFT optical flow characteristics from the second image: obtaining a horizontal optical flow characteristic diagram, a vertical optical flow characteristic diagram and an optical flow strain characteristic diagram based on an optical flow algorithm TV-L1 of total variation and L1 norm; obtaining a RAFT optical flow feature map based on a RAFT deep neural network model;
the method for extracting the face action unit comprises the following steps: and extracting facial motion characteristics of the frame data through an openface algorithm, and carrying out normalization processing on the extracted facial motion characteristics to obtain a facial motion characteristic sequence. Among other things, the dark-lit SMIC-HS data set presents certain difficulties for facial motion feature extraction. Thus, the present embodiment darkly enhances SMIC-HS with an ultra lightweight darkness enhancement model IAT.
The method for carrying out normalization processing on the extracted facial action features comprises the following steps:
s2031, taking the facial motion characteristic of the first frame of the video frame sequence as a baseline facial motion characteristic;
s2032, calculating the difference value between the facial motion characteristics of the initial frame and the end frame and the baseline facial motion characteristics in sequence to obtain normalized facial motion characteristics.
The micro expression recognition network model used in the present embodiment is shown in fig. 2:
the AUs features of the microexpressive peak frame and the previous 7 frames are input into a space-time transducer for constructing time sequence changes and spatial information of microexpressive motions. In the AUs spatio-temporal feature extraction module, the model is focused more on the motion changes in the time dimension by adding a temporal position code in the AUs time dimension and then building the positional relationship of the same AU between different frames through a temporal self-saturation layer. CLS token is added in the AUs spatial dimension, and then the local AU is related to global AUs through the spatial self-saturation layer. After passing through the L layer Transformer Encoder, AUs spatiotemporal information with the size of 9×17 is output, and a CLS token with the size of 1×17 is taken out as a depth feature of AUs. Because of the limitation of the current technology, the extracted AUs features have a certain error, and the accuracy rate is less than 70%, so that an optical flow information auxiliary model is required to be corrected. Features are extracted from the TV-L1 optical flow information (horizontal optical flow, vertical optical flow, and optical flow strain) and RAFT optical flow information by the res net18, respectively, and 2 optical flow depth features with a scale of 1×17 are output. In order to better establish the relation between AUs depth features and optical flow depth features, the depth features are input into an M-layer transducer for feature fusion, and finally features with the scale of 1×17 are output as AUs depth features fused with optical flow features. And carrying out depth feature enhancement on the output depth features through an ISDA_AW loss function, and finally outputting the emotion category with the highest probability.
The working process of the network model can be summarized as follows:
s301, constructing a training set by utilizing a horizontal optical flow feature map, a vertical optical flow feature map, an optical flow strain feature map, a RAFT optical flow feature map and a facial action feature sequence;
s302, inputting a facial action feature sequence in a training set into a AUs feature extraction module to obtain AUs depth features;
s303, inputting a horizontal optical flow feature map, a vertical optical flow feature map and an optical flow strain feature map in a training set into a first optical flow feature extraction module to obtain a first optical flow depth feature;
s304, inputting the RAFT optical flow feature map in the training set into a second optical flow feature extraction module to obtain second optical flow depth features;
s305, carrying out depth feature fusion on the AUs depth features, the first optical flow depth features and the second optical flow depth features, and outputting fused depth features;
s306, performing depth feature enhancement on the fused depth features through the self-adaptive weighted loss function;
s307, outputting a training result.
The AUs feature extraction module obtains AUs depth features by the following method:
s3021, adding time position codes on the time dimension of facial action characteristics, and establishing the position relation of the same facial action unit among different frames through a time self-attention layer;
s3022, adding a CLS token to the space dimension of the facial action feature, and establishing a relation between a local facial action unit and the global facial action feature through a space self-attention layer;
s3023, outputting the space-time information of the facial motion characteristics through a plurality of layers Transformer Encoder, and taking out the CLS token with the preset size as the AUs depth characteristics.
It can be seen that, compared with the 12, 24 or 32 self-attention layers used by conventional Vision Transformers, the Transformer in the AUs depth-feature extraction module uses only 3 layers and the Transformer in the depth-feature fusion module uses only 2 layers, so the shallow character of the model is very pronounced.
Preferably, the first optical flow feature extraction module and the second optical flow feature extraction module extract optical flow depth features through a plurality of residual convolution networks which are connected in sequence.
The adaptive weighted loss function adopted in this embodiment is the ISDA_AW loss, computed as

L_j = μ_j · L̄_j,  j = 1, …, C,

where L_j is the adaptive weighted loss of the class-j samples; C is the number of classes; μ_j is the weight of the class-j samples; and L̄_j is the unweighted loss of the class-j samples.
The weight μ_j is computed adaptively from N, the total number of samples, and N_j, the number of class-j samples, with ε ∈ (0, 1), usually 0.001.
The unweighted loss L̄_j of the class-j samples is obtained with the implicit semantic data enhancement algorithm ISDA. For a sample with label y_i and depth feature α_i, the ISDA (upper-bound) loss is

ℓ_i = log ( Σ_{j=1}^{C} e^{ (w_j − w_{y_i})^T α_i + (b_j − b_{y_i}) + (λ/2)(w_j − w_{y_i})^T Σ_{y_i} (w_j − w_{y_i}) } ),

and L̄_j is obtained by averaging ℓ_i over the class-j samples, where: e is the base of the natural logarithm; λ is a positive coefficient controlling the class covariance matrices; T denotes transposition; Σ_{y_i} is the covariance matrix estimated from the class-y_i samples; α_i = [α_{i1}, …, α_{il}]^T is the depth feature, with l the length of the depth feature; (w_j − w_{y_i})^T is the difference between the fully connected layer weight vectors of class j and class y_i; b_j is the fully connected layer bias of class j; and b_{y_i} is the fully connected layer bias of class y_i.
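The ISDA upper-bound loss above can be evaluated in closed form from the fully connected layer parameters, the depth features and the per-class covariance estimates. The PyTorch sketch below computes it for a batch; variable names and the batching are illustrative.

```python
# Closed-form ISDA (upper-bound) loss for a batch of depth features.
# W: (C, l) fully connected weights, b: (C,) biases, covs: (C, l, l) per-class covariance
# estimates Sigma, feats: (B, l) depth features alpha_i, labels: (B,) class indices y_i.
import torch

def isda_loss(feats, labels, W, b, covs, lam, reduce=True):
    w_y, b_y = W[labels], b[labels]                    # parameters of each sample's true class
    dw = W.unsqueeze(0) - w_y.unsqueeze(1)             # (B, C, l): w_j - w_{y_i}
    db = b.unsqueeze(0) - b_y.unsqueeze(1)             # (B, C):    b_j - b_{y_i}
    lin = (dw * feats.unsqueeze(1)).sum(-1) + db       # (w_j - w_{y_i})^T alpha_i + (b_j - b_{y_i})
    sigma_y = covs[labels]                             # (B, l, l): Sigma_{y_i}
    quad = torch.einsum('bcl,blm,bcm->bc', dw, sigma_y, dw)   # (w_j - w_{y_i})^T Sigma (w_j - w_{y_i})
    per_sample = torch.logsumexp(lin + 0.5 * lam * quad, dim=1)   # ell_i for each sample
    # Averaging per_sample over the samples of one class gives that class's unweighted loss.
    return per_sample.mean() if reduce else per_sample
```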
In order to verify the feasibility and effectiveness of the application, the following experiments were performed in this example:
In this embodiment, three spontaneous micro-expression datasets from the prior art, CASME II, SAMM and SMIC-HS, are fused. The fused composite dataset FULL, shown in Table 1, contains 442 samples from 68 subjects in total, with 145, 133 and 164 samples coming from CASME II, SAMM and SMIC-HS, respectively.
Table 1. Micro-expression datasets

Dataset     Negative  Positive  Surprise
CASME II        88        32        25
SAMM            92        26        15
SMIC-HS         70        51        43
FULL           250       109        83
Leave-one-subject-out (LOSO) validation is performed on the composite dataset and on each single dataset. To make the evaluation results more reliable as a reference, this embodiment uses two metrics, the Unweighted F1 score (UF1) and the Unweighted Average Recall (UAR), to evaluate the recognition performance of the proposed model.
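For reference, UF1 is the macro-averaged (unweighted) F1 score over the classes and UAR is the macro-averaged recall; a minimal NumPy sketch:

```python
# Compute UF1 (mean per-class F1) and UAR (mean per-class recall) from label arrays.
import numpy as np

def uf1_uar(y_true, y_pred, n_classes=3):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    f1s, recalls = [], []
    for c in range(n_classes):
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * precision * recall / (precision + recall) if precision + recall else 0.0)
        recalls.append(recall)
    return float(np.mean(f1s)), float(np.mean(recalls))   # UF1, UAR
```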
In this embodiment, the method of the present application is compared with conventional handcrafted-feature methods and the latest deep-learning methods, and the recognition results of the respective methods are evaluated on the FULL, CASME II, SAMM and SMIC-HS datasets. The specific recognition performance results are shown in Table 2.
Table 2. Recognition performance results
From Table 2 it can be seen that, compared with the latest baseline model Micro-BERT, the method of the present application (ANAO) has a UF1 0.9% and 3.9% lower on the FULL and SMIC-HS datasets, a UF1 6.77% higher on the CASME II dataset, a UAR 0.52% lower on SMIC-HS, and UARs 1.22% and 8.77% higher on FULL and CASME II, respectively.
More importantly, the Transformer part of the method of the present application (ANAO) was not pre-trained, whereas the Transformer part of the Micro-BERT model was pre-trained on 8 million images. Furthermore, the performance of the method exceeds that of the Transformer-based C3DBed model, and ViT (the unmodified Vision Transformer) achieves the worst performance among the Transformer models. These results demonstrate that the shallow Transformer model proposed by the application has good generalization and robustness on small data.
Both the method of the present application (ANAO) and BDCNN employ the ISDA data enhancement algorithm. By comparison, the performance of ANAO + ISDA loss is much higher than that of BDCNN + ISDA loss, and ANAO + ISDA_AW loss also exceeds the best BDCNN variant, namely BDCNN + ISDA + GA (genetic algorithm).
Furthermore, ANAO is compared under 3 loss functions, namely the CE loss, the ISDA loss and the ISDA_AW loss, and the experimental performance increases step by step. This verifies the significant boost that the ISDA_AW algorithm of the application brings to micro-expression data enhancement.
Compared with the AUs-based AU-GCN model, the recognition performance of ANAO is far higher. This verifies that feeding AU sequences into a shallow spatio-temporal Transformer facilitates the model's classification and recognition of micro-expressions.
Fig. 3 shows the confusion matrices of the recognition method of this embodiment on the 4 datasets. It can be seen that most of the wrongly predicted samples are identified as the negative class, mainly because of the class imbalance, with the negative class dominating the datasets. Nevertheless, for all classes of the 4 datasets the number of correct predictions exceeds the number of wrong predictions, which demonstrates the effectiveness of the method of the application.
Example 2:
based on the embodiment 1, the specific algorithm process of the isda_aw algorithm is described in this embodiment:
assume that our deep network Model and weights Θ are trained in the microexpressive dataset D = { (x) j ,y j )},x j = (AUs feature j, optical flow feature j), y j ∈{1,...C is x j Is the j-th class tag of (c). Depth feature alpha j =[α j1 ,...,α jl ] T =Model(x j ,Θ)。
The ISDA_AW algorithm can be summarized as follows:
Input: the dataset D and the parameter λ.
Output: the weights Θ, the fully connected layer weights W and biases b, and the sample weights μ_j.
a) Randomly initialize Θ, W and b.
b) Initialize the sample weights μ_j.
c) for t = 0 to T
d)   draw {(x_j, y_j)} from the training set D;
e)   compute α_j = Model(x_j, Θ);
f)   compute the unweighted loss L̄_j;
g)   compute the weighted loss L_j;
h)   update Θ, W, b and μ_j with the Adam optimizer;
i) end.
It can be seen that, in the above steps, the training set D and the coefficient λ are first input. The outputs are the weights Θ, the weight matrix W and bias b of the fully connected layer, and the adaptive weight μ_j of the j-th class of samples. The parameters Θ, W and b are randomly initialized, and μ_j is initialized. The loop then begins: the AUs features and optical flow features of the class-j samples are taken from the training set D, the features and weights Θ are fed into the network Model, and the depth features α_j are output. The unweighted loss L̄_j of the class-j samples is computed from α_j, and the adaptive weighted loss L_j of the class-j samples is computed from L̄_j according to μ_j. Finally, the parameters Θ, W, b and μ_j are updated by the Adam optimizer using the loss L_j. After the maximum number of iterations T is reached, the loop ends and the parameters Θ, W, b and μ_j are returned.
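A rough PyTorch-style sketch of steps a)-i) is given below; `isda_per_sample_loss` is assumed to be the earlier ISDA-loss sketch without the final mean, `estimate_class_covariances` is a placeholder for the class-conditional covariance estimation, and `mu` holds the per-class adaptive weights. The rule for updating `mu` is not reproduced in the text and is therefore omitted.

```python
# Illustrative ISDA_AW training loop (placeholders for components described elsewhere in the text).
import torch

def train_isda_aw(model, fc, loader, estimate_class_covariances,
                  isda_per_sample_loss, mu, lam, T, device="cpu"):
    params = list(model.parameters()) + list(fc.parameters())
    optimizer = torch.optim.Adam(params)                       # used in step h)
    for t in range(T):                                         # step c): outer loop over iterations
        for x, y in loader:                                    # step d): draw (x_j, y_j) from D
            x, y = x.to(device), y.to(device)
            feats = model(x)                                   # step e): alpha_j = Model(x_j, Theta)
            covs = estimate_class_covariances(feats.detach(), y)
            per_sample = isda_per_sample_loss(feats, y, fc.weight, fc.bias, covs, lam)  # step f)
            loss = (mu[y] * per_sample).mean()                 # step g): weight class-j losses by mu_j
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()                                   # step h): update Theta, W and b
    return model, fc, mu                                       # step i): end, return the parameters
```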
The ISDA_AW algorithm not only can enhance the depth semantic features of the minority class, but also can inhibit the importance degree of the majority class through the self-adaptive weight, and the importance of the minority class in the feature space is improved, so that the problem of a small sample of the micro-expression dataset is relieved.
Example 3:
based on any one of the embodiments, the embodiment performs depth feature fusion on AUs depth features, first optical flow depth features and second optical flow depth features through a shallow layer transducer.
In this embodiment, on the composite dataset, the depth fusion mode of AUs and the optical flow depth feature is compared and analyzed, that is, the depth feature fusion mode based on the shallow layer transducer and the depth feature fusion mode based on the traditional full connection layer (Fully Connected layers, FC) in the present application are subjected to experimental comparison, and the comparison result is shown in table 3:
Table 3. Comparison of different depth-feature fusion methods

Depth-feature fusion method   Acc     F1-score   UF1      UAR
FC                            87.46   0.8744     0.8686   0.8669
Shallow Transformer           89.16   0.8921     0.8813   0.8964
The performance metrics compared in Table 3 are accuracy (Acc), F1 score (F1-score), unweighted F1 score (UF1) and unweighted average recall (UAR).
As can be seen from the comparison results in Table 3, for fusing the AUs spatio-temporal features with the optical flow depth features, the shallow Transformer adopted in this embodiment performs significantly better than the conventional FC.
Example 4:
based on any one of the above embodiments, before face clipping, the clipping frame coordinate normalization processing is performed on the first image by the following method:
s2021, accumulating the following parameters to obtain an accumulation result: in the frame data, each image corresponds to the height and width of the left top vertex, the height and width of the right top vertex and the height and width of the lowest vertex of the initial cutting frame;
s2022, calculating the average value of the height and the width of the top left vertex based on the accumulation result, wherein the average value is defined as a first average value and a second average value; respectively calculating the average value of the width and the height of the top right vertex, and defining the average value as a third average value and a fourth average value; respectively calculating the average value of the height and the width of the top point at the lowest point, and defining a fifth average value and a sixth average value;
s2023, taking the absolute value difference value of the first mean value and the fifth mean value as the height of the final cutting frame; taking the absolute value difference value of the second mean value and the third mean value as the width of the final cutting frame;
s2024, obtaining the final cutting frame based on the height and the width of the final cutting frame.
In a more preferred embodiment, the background blackening method is: based on the final crop-box coordinates, all pixel values outside the crop box are set to zero to obtain the face image with a blackened background.
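A minimal sketch of this blackening step, assuming the final crop box is given as a top-left corner plus height and width:

```python
# Keep only the pixels inside the final crop box and set everything else to zero.
import numpy as np

def blacken_background(image, y0, x0, h, w):
    """image: (H, W, 3) uint8 array; (y0, x0) top-left corner, (h, w) size of the final crop box."""
    out = np.zeros_like(image)
    out[y0:y0 + h, x0:x0 + w] = image[y0:y0 + h, x0:x0 + w]   # copy only the face region
    return out                                                 # background pixels are zero
```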
The result of the crop-box coordinate normalization in this embodiment is shown in the left image of fig. 4, and the result after cropping and background blackening are completed is shown in the right image of fig. 4.
The foregoing detailed description further explains the objects, technical solutions and beneficial effects of the application. It should be understood that the foregoing describes only specific embodiments of the application and is not intended to limit the scope of protection of the application; any modification, equivalent replacement, improvement and the like made within the spirit and principles of the application shall be included within the scope of protection of the application.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.

Claims (10)

1. A micro-expression recognition method based on improved implicit semantic data enhancement, comprising:
acquiring frame data from a micro-expression start frame to an end frame in the micro-expression data set;
preprocessing the frame data, and extracting optical flow information and a facial action unit;
inputting the optical flow information and the facial action unit into a micro-expression recognition network for training to obtain a trained micro-expression recognition network, wherein the micro-expression recognition network performs depth-feature enhancement through an adaptive weighted loss function;
and obtaining an image to be processed, inputting the image to be processed into the trained microexpressive recognition network, and obtaining microexpressive recognition results.
2. The improved implicit semantic data enhancement based micro-expression recognition method of claim 1, wherein the optical flow information comprises a horizontal optical flow feature map, a vertical optical flow feature map, an optical flow strain feature map, and a RAFT optical flow feature map; the face action unit is a face action feature sequence.
3. The improved implicit semantic data enhancement based micro-expression recognition method of claim 2, wherein
the method for extracting the optical flow information comprises the following steps:
collecting a start frame and a peak frame of the frame data to obtain a plurality of first images;
sequentially performing face cropping and background blackening on the first images to obtain second images;
respectively extracting horizontal, vertical, optical flow strain and RAFT optical flow characteristics of the second image to obtain a horizontal optical flow characteristic diagram, a vertical optical flow characteristic diagram, an optical flow strain characteristic diagram and a RAFT optical flow characteristic diagram;
the method for extracting the face action unit comprises the following steps:
and extracting facial action features from the frame data, and normalizing the extracted facial action features to obtain a facial action feature sequence.
4. The improved implicit semantic data enhancement based micro-expression recognition method of claim 3, wherein, before face cropping, the crop-box coordinates of the first images are normalized as follows:
the following parameters are accumulated to obtain accumulated results: for each image in the frame data, the height and width coordinates of the top-left vertex, the top-right vertex and the lowest vertex of its initial crop box;
from the accumulated results, the mean height and mean width of the top-left vertex are computed and defined as the first mean and the second mean; the mean width of the top-right vertex is computed and defined as the third mean; and the mean height of the lowest vertex is computed and defined as the fifth mean;
the absolute difference between the first mean and the fifth mean is taken as the height of the final crop box;
the absolute difference between the second mean and the third mean is taken as the width of the final crop box;
the final crop box is obtained from its height and width.
5. The improved implicit semantic data enhancement based micro-expression recognition method of claim 3, wherein the method for normalizing the extracted facial action features comprises:
taking the facial motion characteristic of the first frame of the video frame sequence as a baseline facial motion characteristic;
and carrying out difference calculation on the facial motion characteristics from the initial frame to the end frame and the baseline facial motion characteristics in sequence to obtain normalized facial motion characteristics.
6. The improved implicit semantic data enhancement based micro-expression recognition method of claim 2, wherein the method of inputting the optical flow information and the face action unit into the micro-expression recognition network for training comprises:
constructing a training set by using the horizontal optical flow characteristic diagram, the vertical optical flow characteristic diagram, the optical flow strain characteristic diagram, the RAFT optical flow characteristic diagram and the facial action characteristic sequence;
inputting the facial action feature sequence in the training set into a AUs feature extraction module to obtain AUs depth features;
inputting a horizontal optical flow feature map, a vertical optical flow feature map and an optical flow strain feature map in a training set into a first optical flow feature extraction module to obtain a first optical flow depth feature;
inputting the RAFT optical flow feature map in the training set into a second optical flow feature extraction module to obtain second optical flow depth features;
performing depth feature fusion on the AUs depth features, the first optical flow depth features and the second optical flow depth features, and outputting fused depth features;
performing depth feature enhancement on the fused depth features through the self-adaptive weighted loss function;
and outputting a training result.
7. The improved implicit semantic data enhancement based micro-expression recognition method of claim 6, wherein the AUs feature extraction module obtains AUs depth features by:
adding time position codes on the time dimension of facial action features, and establishing the position relation of the same facial action unit among different frames through a time self-attention layer;
adding a CLS token on the space dimension of the facial action feature, and establishing the relation between the local facial action unit and the global facial action feature through a space self-attention layer;
and outputting the space-time information of the facial action characteristics through a plurality of layers Transformer Encoder, and taking out the CLS token with a preset size as the AUs depth characteristics.
8. The improved implicit semantic data enhancement based micro-expression recognition method of claim 6, wherein the depth feature fusion of the AUs depth features, the first optical flow depth features and the second optical flow depth features is performed by a shallow Transformer.
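A hedged sketch of the shallow-Transformer fusion of claim 8: the three depth features are treated as a three-token sequence passed through a single Transformer encoder layer. The single-layer depth and the flattening of the output are assumptions made here for illustration.

```python
import torch
import torch.nn as nn

class ShallowTransformerFusion(nn.Module):
    """Assumed sketch: three depth features fused as a three-token sequence."""
    def __init__(self, dim=256, heads=4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=1)   # "shallow": one layer

    def forward(self, f_aus, f_flow1, f_flow2):
        tokens = torch.stack([f_aus, f_flow1, f_flow2], dim=1)  # (batch, 3, dim)
        fused = self.encoder(tokens)                             # cross-feature attention
        return fused.flatten(1)                                  # fused depth feature
```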
9. The improved implicit semantic data enhancement based micro-expression recognition method of claim 6, wherein the first optical flow feature extraction module and the second optical flow feature extraction module each extract optical flow depth features through a plurality of sequentially connected residual convolution networks.
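One plausible form of the residual-convolution branches of claim 9; the channel counts, strides and number of blocks are assumptions, not values taken from the patent.

```python
import torch.nn as nn

class ResidualBlock(nn.Module):
    """One residual convolution block; several are chained in each optical-flow branch."""
    def __init__(self, in_ch, out_ch, stride=2):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, stride, 1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, 1, 1), nn.BatchNorm2d(out_ch),
        )
        self.skip = nn.Conv2d(in_ch, out_ch, 1, stride)  # projection shortcut
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        return self.relu(self.body(x) + self.skip(x))

def flow_branch(in_ch):
    # e.g. three sequentially connected residual blocks followed by global pooling;
    # in_ch would be the number of stacked optical-flow channels of that branch.
    return nn.Sequential(
        ResidualBlock(in_ch, 64), ResidualBlock(64, 128), ResidualBlock(128, 256),
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    )
```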
10. The improved implicit semantic data enhancement based micro-expression recognition method according to any one of claims 1-9, wherein the adaptive weighted loss function is:
wherein: l (L) j Adaptive weighting loss for class j samples; c is the number of categories; mu (mu) j A weight value for a j-th class of samples;unweighted loss for sample class j;
wherein:
wherein: n is the total number of samples; n (N) j The number of samples for class j; epsilon (0, 1).
CN202310854565.5A 2023-07-12 2023-07-12 Micro-expression recognition method based on improved implicit semantic data enhancement Pending CN116884067A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310854565.5A CN116884067A (en) 2023-07-12 2023-07-12 Micro-expression recognition method based on improved implicit semantic data enhancement

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310854565.5A CN116884067A (en) 2023-07-12 2023-07-12 Micro-expression recognition method based on improved implicit semantic data enhancement

Publications (1)

Publication Number Publication Date
CN116884067A true CN116884067A (en) 2023-10-13

Family

ID=88254430

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310854565.5A Pending CN116884067A (en) 2023-07-12 2023-07-12 Micro-expression recognition method based on improved implicit semantic data enhancement

Country Status (1)

Country Link
CN (1) CN116884067A (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113496217A (en) * 2021-07-08 2021-10-12 河北工业大学 Method for identifying human face micro expression in video image sequence
EP4174770A1 (en) * 2021-10-28 2023-05-03 Toyota Jidosha Kabushiki Kaisha Monocular-vision-based detection of moving objects
KR20230063135A (en) * 2021-11-01 2023-05-09 현대모비스 주식회사 Method and Apparatus for Spatio-temporal Action Localization Based on Hierarchical Structure
CN114882570A (en) * 2022-05-31 2022-08-09 华中师范大学 Remote examination abnormal state pre-judging method, system, equipment and storage medium
CN115424315A (en) * 2022-07-25 2022-12-02 浙江大华技术股份有限公司 Micro-expression detection method, electronic device and computer-readable storage medium
CN115359534A (en) * 2022-08-25 2022-11-18 成都信息工程大学 Micro expression recognition method based on multi-feature fusion and double-flow network
CN115797835A (en) * 2022-12-01 2023-03-14 大连理工大学 Non-supervision video target segmentation algorithm based on heterogeneous Transformer
CN116030516A (en) * 2022-12-15 2023-04-28 中国矿业大学 Micro-expression recognition method and device based on multi-task learning and global circular convolution

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
SHUTING YIN 等: "Weakly Supervised Land Cover Classification Method For Large-Scale Multi-Resolution Labeled Satellite Images Data Sets", 《IGARSS 2020 - 2020 IEEE INTERNATIONAL GEOSCIENCE AND REMOTE SENSING SYMPOSIUM》, pages 2 *
ZHIHONG CHANG 等: "STA-GCN: Spatial-Temporal Self-Attention Graph Convolutional Networks for Traffic-Flow Prediction", 《MDPI》, pages 3 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117392727A (en) * 2023-11-02 2024-01-12 长春理工大学 Facial micro-expression recognition method based on contrast learning and feature decoupling
CN117392727B (en) * 2023-11-02 2024-04-12 长春理工大学 Facial micro-expression recognition method based on contrast learning and feature decoupling

Similar Documents

Publication Publication Date Title
CN108133188B (en) Behavior identification method based on motion history image and convolutional neural network
CN107526785B (en) Text classification method and device
US20190294929A1 (en) Automatic Filter Pruning Technique For Convolutional Neural Networks
Tang et al. Deep networks for robust visual recognition
CN110659665B (en) Model construction method of different-dimension characteristics and image recognition method and device
CN111832516B (en) Video behavior recognition method based on unsupervised video representation learning
CN109885709B (en) Image retrieval method and device based on self-coding dimensionality reduction and storage medium
CN110378208B (en) Behavior identification method based on deep residual error network
CN112257449B (en) Named entity recognition method and device, computer equipment and storage medium
CN107463917B (en) Improved LTP and two-dimensional bidirectional PCA fusion-based face feature extraction method
CN110889865B (en) Video target tracking method based on local weighted sparse feature selection
CN111814611B (en) Multi-scale face age estimation method and system embedded with high-order information
CN109255381B (en) Image classification method based on second-order VLAD sparse adaptive depth network
CN107066951B (en) Face spontaneous expression recognition method and system
CN113989890A (en) Face expression recognition method based on multi-channel fusion and lightweight neural network
CN110991554B (en) Improved PCA (principal component analysis) -based deep network image classification method
CN112329784A (en) Correlation filtering tracking method based on space-time perception and multimodal response
CN116884067A (en) Micro-expression recognition method based on improved implicit semantic data enhancement
CN112883931A (en) Real-time true and false motion judgment method based on long and short term memory network
CN114332500A (en) Image processing model training method and device, computer equipment and storage medium
He et al. What catches the eye? Visualizing and understanding deep saliency models
CN112069892A (en) Image identification method, device, equipment and storage medium
CN115457332A (en) Image multi-label classification method based on graph convolution neural network and class activation mapping
CN108388918B (en) Data feature selection method with structure retention characteristics
CN111310807B (en) Feature subspace and affinity matrix joint learning method based on heterogeneous feature joint self-expression

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination