CN116665278A - Micro-expression recognition method, micro-expression recognition device, computer equipment and storage medium

Info

Publication number
CN116665278A
Authority
CN
China
Prior art keywords
expression
micro
video
space
time
Prior art date
Legal status
Pending
Application number
CN202310687981.0A
Other languages
Chinese (zh)
Inventor
宋延新
王健宗
黄章成
吴天博
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202310687981.0A priority Critical patent/CN116665278A/en
Publication of CN116665278A publication Critical patent/CN116665278A/en
Pending legal-status Critical Current

Classifications

    • G06V 40/174 - Facial expression recognition
    • G06N 3/0464 - Convolutional networks [CNN, ConvNet]
    • G06V 10/462 - Salient features, e.g. scale invariant feature transforms [SIFT]
    • G06V 10/82 - Image or video recognition or understanding using neural networks
    • G06V 20/46 - Extracting features or characteristics from video content, e.g. video fingerprints, representative shots or key frames
    • G06V 40/161 - Human faces: detection; localisation; normalisation
    • G06V 40/168 - Human faces: feature extraction; face representation
    • G06V 40/172 - Human faces: classification, e.g. identification
    • G16H 40/67 - ICT for the operation of medical equipment or devices, for remote operation
    • G16H 50/20 - ICT for computer-aided diagnosis, e.g. based on medical expert systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Biomedical Technology (AREA)
  • Human Computer Interaction (AREA)
  • Medical Informatics (AREA)
  • Public Health (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Epidemiology (AREA)
  • Primary Health Care (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Business, Economics & Management (AREA)
  • Pathology (AREA)
  • Business, Economics & Management (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Measuring And Recording Apparatus For Diagnosis (AREA)

Abstract

The present application relates to the fields of image recognition and digital healthcare, and in particular to a micro-expression recognition method, apparatus, computer device, and storage medium. The method comprises the following steps: acquiring a video to be processed that contains both micro-expressions and macro-expressions; performing data preprocessing, including face detection, on the video to be processed to obtain a preprocessed video; extracting space-time features from the preprocessed video through a three-dimensional micro-expression recognition model to obtain micro-expression space-time features and macro-expression space-time features; extracting the space-time features shared by the micro-expressions and the macro-expressions from the micro-expression space-time features and the macro-expression space-time features; and classifying the micro-expressions according to the shared space-time features, the micro-expression space-time features, and the macro-expression space-time features to obtain a micro-expression recognition result. Because the application considers the space-time features of the macro-expressions alongside those of the micro-expressions, the representational capacity available for the micro-expressions is increased and the accuracy of micro-expression recognition is improved.

Description

Micro-expression recognition method, micro-expression recognition device, computer equipment and storage medium
Technical Field
The present application relates to the fields of image recognition and digital healthcare, and in particular to a micro-expression recognition method, apparatus, computer device, and storage medium.
Background
In recent years, with the rapid development of artificial intelligence technology, facial micro-expression recognition has become increasingly important. Current micro-expression recognition methods mainly identify micro-expressions from facial motion features, texture features, and the like. In digital-healthcare scenarios such as intelligent diagnosis and remote consultation, for example, a patient's current condition can be assessed by recognizing the patient's micro-expressions. However, the facial movements involved in a micro-expression have small amplitude, so its features are difficult to capture completely and describe accurately. In addition, micro-expression video sequences suffer from short micro-expression durations and a limited number of samples, which further increases the difficulty of capturing micro-expression features and makes it hard to improve the recognition rate. Existing micro-expression recognition methods therefore suffer from a low micro-expression recognition rate.
Disclosure of Invention
In view of the foregoing, it is necessary to provide a micro-expression recognition method, apparatus, computer device, and storage medium to address the low recognition rate of existing micro-expression recognition techniques.
A micro-expression recognition method, comprising:
acquiring a video to be processed comprising micro-expressions and macro-expressions;
performing data preprocessing, including face detection, on the video to be processed to obtain a preprocessed video;
extracting space-time characteristics of the preprocessed video through a three-dimensional micro-expression recognition model to obtain micro-expression space-time characteristics and macro-expression space-time characteristics;
extracting shared space-time features of the micro-expression and the macro-expression from the micro-expression space-time features and the macro-expression space-time features;
and carrying out expression classification on the micro-expressions according to the shared space-time characteristics, the micro-expression space-time characteristics and the macro-expression space-time characteristics to obtain the recognition result of the micro-expressions.
A microexpressive recognition device, comprising:
the video module to be processed is used for acquiring a video to be processed comprising micro-expressions and macro-expressions;
the preprocessing video module is used for performing data preprocessing, including face detection, on the video to be processed to obtain a preprocessed video;
the space-time feature extraction module is used for extracting space-time features of the preprocessed video through the three-dimensional micro-expression recognition model to obtain micro-expression space-time features and macro-expression space-time features;
the sharing space-time feature module is used for extracting sharing space-time features of the micro-expression and the macro-expression from the micro-expression space-time features and the macro-expression space-time features;
and the recognition result module is used for carrying out expression classification on the micro-expressions according to the shared space-time characteristics, the micro-expression space-time characteristics and the macro-expression space-time characteristics to obtain a recognition result of the micro-expressions.
A computer device comprising a memory, a processor, and computer readable instructions stored in the memory and executable on the processor, the processor implementing the microexpressive recognition method described above when executing the computer readable instructions.
One or more readable storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform a method of microexpressive recognition as described above.
The micro-expression recognition method, apparatus, computer device, and storage medium acquire a video to be processed that contains micro-expressions and macro-expressions; perform data preprocessing, including face detection, on the video to be processed to obtain a preprocessed video; extract space-time features from the preprocessed video through a three-dimensional micro-expression recognition model to obtain micro-expression space-time features and macro-expression space-time features; extract the space-time features shared by the micro-expressions and the macro-expressions from the micro-expression space-time features and the macro-expression space-time features; and classify the micro-expressions according to the shared space-time features, the micro-expression space-time features, and the macro-expression space-time features to obtain a micro-expression recognition result. The application obtains, through the three-dimensional micro-expression recognition model, the micro-expression space-time features and the macro-expression space-time features of the video to be processed, as well as the space-time features shared between them, and performs micro-expression recognition based on these three kinds of space-time features. Fully considering the macro-expression space-time features alongside the micro-expression space-time features increases the representational capacity available for the micro-expressions; and because the space-time features of macro-expressions are richer than those of micro-expressions, the accuracy of micro-expression recognition is improved. The micro-expression recognition method can be applied to intelligent diagnosis and remote consultation, where more accurate recognition of a patient's micro-expressions improves the efficiency and effect of the consultation.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments of the present application will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic view of an application environment of a micro-expression recognition method according to an embodiment of the application;
FIG. 2 is a flow chart illustrating a micro-expression recognition method according to an embodiment of the application;
FIG. 3 is a schematic diagram of a micro-expression recognition device according to an embodiment of the application;
FIG. 4 is a schematic diagram of a computer device in accordance with an embodiment of the application.
Detailed Description
The technical solutions in the embodiments of the present application are described below clearly and completely with reference to the accompanying drawings. The described embodiments are evidently only some, rather than all, of the embodiments of the application; all other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the scope of the application.
The micro-expression recognition method provided by the embodiment can be applied to an application environment as shown in fig. 1, wherein a client communicates with a server. Clients include, but are not limited to, various personal computers, notebook computers, smart phones, tablet computers, and portable wearable devices. The server may be implemented by a stand-alone server or a server cluster formed by a plurality of servers.
In an embodiment, as shown in fig. 2, a micro-expression recognition method is provided. The method is described, by way of illustration, as applied to the server in fig. 1, and includes the following steps:
s10, acquiring a video to be processed comprising the micro-expression and the macro-expression.
Understandably, a micro-expression is a fleeting facial expression that flashes across the face and reveals a person's true emotions. In contrast, a macro-expression is a facial expression of somewhat longer duration, which likewise reveals a person's true emotions. The video to be processed is a video on which micro-expression recognition is to be performed and which contains both micro-expressions and macro-expressions. Generally, the emotion types expressed by micro-expressions include happiness, sadness, fear, anger, disgust, and the like; the emotion types expressed by macro-expressions likewise include happiness, sadness, fear, anger, disgust, and the like.
S20, performing data preprocessing, including face detection, on the video to be processed to obtain a preprocessed video.
Understandably, data preprocessing means preprocessing the data of the video to be processed, and includes, but is not limited to, face detection. Face detection here refers to the process of detecting faces in the to-be-processed video frame sequence of the video to be processed to obtain face key points. The preprocessed video is the video obtained after this data preprocessing.
S30, extracting space-time characteristics of the preprocessed video through a three-dimensional micro-expression recognition model to obtain micro-expression space-time characteristics and macro-expression space-time characteristics.
Understandably, the three-dimensional micro-expression recognition model is a trained three-dimensional convolutional neural network comprising a micro-expression recognition network and a macro-expression recognition network for extracting space-time features from the preprocessed video; for example, the three-dimensional micro-expression recognition model may be 3D-ResNet10 (a three-dimensional deep residual network). The space-time features of the preprocessed video comprise micro-expression space-time features and macro-expression space-time features. The micro-expression space-time features are the features of the preprocessed video, in both the temporal and spatial dimensions, that pertain to micro-expressions; the macro-expression space-time features are the corresponding features that pertain to macro-expressions.
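For concreteness, the following is a minimal sketch of a 3D-ResNet10-style space-time feature extractor, assuming PyTorch. The block layout, channel widths, output feature dimension, and the clip shape (batch, channels, frames, height, width) are illustrative assumptions, not the exact network of this application.

```python
import torch
import torch.nn as nn

class BasicBlock3D(nn.Module):
    """A 3D residual block: two 3x3x3 convolutions with a skip connection."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.conv1 = nn.Conv3d(in_ch, out_ch, 3, stride=stride, padding=1, bias=False)
        self.bn1 = nn.BatchNorm3d(out_ch)
        self.conv2 = nn.Conv3d(out_ch, out_ch, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm3d(out_ch)
        self.relu = nn.ReLU(inplace=True)
        self.down = None
        if stride != 1 or in_ch != out_ch:
            self.down = nn.Sequential(
                nn.Conv3d(in_ch, out_ch, 1, stride=stride, bias=False),
                nn.BatchNorm3d(out_ch))

    def forward(self, x):
        identity = x if self.down is None else self.down(x)
        out = self.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return self.relu(out + identity)

class SpatioTemporalEncoder(nn.Module):
    """3D-ResNet10-like encoder that maps a face clip to a space-time feature vector."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.stem = nn.Sequential(
            nn.Conv3d(3, 64, kernel_size=(3, 7, 7), stride=(1, 2, 2),
                      padding=(1, 3, 3), bias=False),
            nn.BatchNorm3d(64), nn.ReLU(inplace=True))
        self.layer1 = BasicBlock3D(64, 64)
        self.layer2 = BasicBlock3D(64, 128, stride=2)
        self.layer3 = BasicBlock3D(128, 256, stride=2)
        self.layer4 = BasicBlock3D(256, feat_dim, stride=2)
        self.pool = nn.AdaptiveAvgPool3d(1)

    def forward(self, clip):  # clip: (batch, 3, frames, height, width)
        x = self.stem(clip)
        x = self.layer4(self.layer3(self.layer2(self.layer1(x))))
        return self.pool(x).flatten(1)  # (batch, feat_dim)

# e.g. two 16-frame 112x112 face clips -> two 256-dimensional feature vectors
feats = SpatioTemporalEncoder()(torch.randn(2, 3, 16, 112, 112))
```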
S40, extracting the shared space-time characteristics of the micro expression and the macro expression from the micro expression space-time characteristics and the macro expression space-time characteristics.
Understandably, some space-time features are common to the micro-expressions and macro-expressions of the preprocessed video. The shared space-time features are the space-time features that the micro-expression space-time features and the macro-expression space-time features have in common. For example, if the preprocessed video has both micro-expression space-time features expressing happiness and macro-expression space-time features expressing happiness, then a shared space-time feature expressing happiness exists between the micro-expression space-time features and the macro-expression space-time features of the preprocessed video.
S50, carrying out expression classification on the micro-expressions according to the shared space-time characteristics, the micro-expression space-time characteristics and the macro-expression space-time characteristics to obtain the recognition result of the micro-expressions.
Understandably, after the three-dimensional micro-expression recognition model has obtained the shared space-time features, the micro-expression space-time features, and the macro-expression space-time features of the preprocessed video, it can classify expressions according to the association among these three kinds of features. The expression classes include, but are not limited to, happiness, sadness, fear, anger, and disgust. The recognition result is the result of the three-dimensional micro-expression recognition model recognizing and classifying the micro-expressions in the preprocessed video according to the shared, micro-expression, and macro-expression space-time features.
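As one illustration of this classification step, the sketch below simply concatenates the shared, micro-expression, and macro-expression space-time features and applies a linear classification head. Fusion by concatenation, the feature dimension, and the five example classes are assumptions; the text only states that classification uses the association among the three kinds of features.

```python
import torch
import torch.nn as nn

class ExpressionClassifier(nn.Module):
    """Classify a micro-expression from shared, micro, and macro space-time features."""
    def __init__(self, feat_dim=256, num_classes=5):
        super().__init__()
        # Concatenation is an assumed fusion; the exact combination is unspecified.
        self.head = nn.Linear(3 * feat_dim, num_classes)

    def forward(self, shared_feat, micro_feat, macro_feat):
        fused = torch.cat([shared_feat, micro_feat, macro_feat], dim=1)
        return self.head(fused)  # logits over e.g. happiness/sadness/fear/anger/disgust

logits = ExpressionClassifier()(torch.randn(2, 256), torch.randn(2, 256), torch.randn(2, 256))
recognition_result = logits.argmax(dim=1)  # predicted expression class per video
```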
In steps S10-S50, a video to be processed comprising micro-expressions and macro-expressions is acquired; data preprocessing, including face detection, is performed on the video to be processed to obtain a preprocessed video; space-time features are extracted from the preprocessed video through the three-dimensional micro-expression recognition model to obtain micro-expression space-time features and macro-expression space-time features; the space-time features shared by the micro-expressions and macro-expressions are extracted from the micro-expression and macro-expression space-time features; and the micro-expressions are classified according to the shared, micro-expression, and macro-expression space-time features to obtain the micro-expression recognition result. In this embodiment, the three-dimensional micro-expression recognition model obtains the micro-expression space-time features, the macro-expression space-time features, and the space-time features shared between them, and micro-expression recognition is performed on the basis of these three kinds of space-time features. Fully considering the macro-expression space-time features alongside the micro-expression space-time features increases the representational capacity available for the micro-expressions; and because the space-time features of macro-expressions are richer than those of micro-expressions, the accuracy of micro-expression recognition is improved. The micro-expression recognition method can be applied in the digital-healthcare field, for example to intelligent diagnosis and remote consultation, where more accurate recognition of a patient's micro-expressions during an inquiry improves the efficiency and effect of the consultation.
Optionally, step S20, namely performing data preprocessing, including face detection, on the video to be processed to obtain a preprocessed video, includes:
s201, performing face detection on a video frame sequence to be processed in the video to be processed by using a visual library to obtain face key points;
s202, face clipping is carried out according to the face key points, and a face video frame sequence is obtained;
s203, generating the preprocessing video according to the face video frame sequence.
Understandably, a vision library generally refers to pre-written code and data used to build or optimize computer programs; for example, OpenCV is a widely used open-source computer vision library that includes face recognition, object detection, and other applications. The video to be processed comprises at least one to-be-processed video frame sequence, i.e. a sequence of video frames containing micro-expressions and/or macro-expressions. Face detection refers to detecting the faces contained in the to-be-processed video frame sequence with the vision library to obtain face key points; the face key points may comprise the 68 key points of the face. Face clipping refers to cropping the faces out of the to-be-processed video frame sequence according to the detected face key points to obtain a face video frame sequence containing the micro-expressions and/or macro-expressions, thereby eliminating the influence of irrelevant background in the video frames on micro-expression recognition. The preprocessed video is then generated from this face video frame sequence.
In this embodiment, face detection is performed on the to-be-processed video frame sequence containing micro-expressions and/or macro-expressions, and the detected faces are cropped, so that the faces containing the micro-expressions and/or macro-expressions can be clipped accurately, the influence of irrelevant background is eliminated, and the accuracy of micro-expression recognition is improved.
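A minimal sketch of this preprocessing step follows, assuming OpenCV for frame handling and dlib's 68-point landmark predictor; the landmark model file name is a placeholder that must be supplied separately, and the crop margin is an illustrative choice.

```python
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
# Placeholder path: dlib's 68-point landmark model must be downloaded separately.
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def crop_face(frame, margin=0.1):
    """Detect a face, locate its 68 key points, and crop the face region."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector(gray)
    if not faces:
        return None, None
    shape = predictor(gray, faces[0])
    pts = np.array([(p.x, p.y) for p in shape.parts()])  # the 68 face key points
    x0, y0 = pts.min(axis=0)
    x1, y1 = pts.max(axis=0)
    # Pad the landmark bounding box slightly so the whole face is kept.
    dx, dy = int((x1 - x0) * margin), int((y1 - y0) * margin)
    h, w = frame.shape[:2]
    crop = frame[max(y0 - dy, 0):min(y1 + dy, h), max(x0 - dx, 0):min(x1 + dx, w)]
    return crop, pts

# Applying crop_face to every frame of the video to be processed yields the face
# video frame sequence with irrelevant background removed.
```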
Optionally, in step S203, the generating the preprocessing video according to the face video frame sequence includes:
s2031, carrying out face alignment on face video frames in the face video frame sequence according to the face key points to generate a reference face video;
and S2032, performing time domain image interpolation on the reference face video to obtain the preprocessing video.
Face alignment is the process of aligning the faces in the face video frame sequence according to the face key points to obtain a reference face video. An aligned reference face video supports better and faster micro-expression recognition, improving both its accuracy and efficiency. In general, the space-time features of a micro-expression are difficult to acquire accurately because the micro-expression lasts only a short time; extending the duration of the micro-expression therefore helps capture its space-time features, and the same holds for macro-expressions. Specifically, time-domain image interpolation on the reference face video, that is, interpolating new frames in the time domain, increases the number of video frames in the reference face video and extends the duration of the micro-expressions and/or macro-expressions it contains, so that the micro-expression and/or macro-expression space-time features can be captured more completely, which in turn improves the accuracy of micro-expression recognition.
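The time-domain image interpolation can be sketched as simple linear blending between neighbouring frames, as below. The application does not fix the interpolation model, so linear interpolation (rather than, say, a learned temporal interpolation method) is an assumption.

```python
import numpy as np

def interpolate_frames(frames, target_len):
    """Lengthen a short clip by linearly interpolating between neighbouring frames.

    frames: sequence of HxWxC uint8 frames; target_len: desired frame count.
    """
    frames = np.asarray(frames, dtype=np.float32)
    n = len(frames)
    out = []
    for t in np.linspace(0, n - 1, target_len):
        i = int(np.floor(t))
        j = min(i + 1, n - 1)
        a = t - i  # blend weight between frame i and frame j
        out.append((1 - a) * frames[i] + a * frames[j])
    return np.stack(out).astype(np.uint8)

# e.g. stretch a 12-frame micro-expression clip to 32 frames before feature extraction:
# longer_clip = interpolate_frames(reference_clip, 32)
```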
Optionally, the three-dimensional micro-expression recognition model includes a micro-expression recognition network and a macro-expression recognition network;
in step S30, that is, the extracting the space-time feature of the preprocessed video through the three-dimensional micro-expression recognition model, to obtain the micro-expression space-time feature and the macro-expression space-time feature includes:
s301, extracting micro-expression space-time characteristics from a video frame sequence to be processed in the preprocessing video through the micro-expression recognition network to obtain the micro-expression space-time characteristics;
s302, extracting macro expression space-time characteristics from a video frame sequence to be processed in the preprocessing video through the macro expression recognition network to obtain the macro expression space-time characteristics.
Understandably, the micro-expression recognition network is used to recognize the micro-expressions in the preprocessed video and extract its micro-expression space-time features, while the macro-expression recognition network is used to recognize the macro-expressions and extract the macro-expression space-time features. The micro-expression recognition network and the macro-expression recognition network contain feature encoders of identical structure, and the two feature encoders share their parameters so as to extract the features that are common to the micro-expression space-time features and the macro-expression space-time features.
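One simple way to realize identically structured feature encoders whose parameters are shared between the two networks is to let both branches hold a reference to the same encoder module, as in this sketch. Whether the whole encoder or only part of it is shared is not specified here, so full sharing is an assumption.

```python
import torch.nn as nn

class DualExpressionModel(nn.Module):
    """Micro- and macro-expression branches whose feature encoders share parameters."""
    def __init__(self, encoder: nn.Module):
        super().__init__()
        # Both branches reference the SAME module, so the encoder weights are
        # identical and receive gradients from both micro and macro clips.
        self.micro_encoder = encoder
        self.macro_encoder = encoder

    def forward(self, micro_clip, macro_clip):
        return self.micro_encoder(micro_clip), self.macro_encoder(macro_clip)

# e.g. model = DualExpressionModel(SpatioTemporalEncoder())  # encoder sketched earlier
```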
Optionally, before step S30, that is, before the extracting the space-time features of the preprocessed video by using the three-dimensional micro-expression recognition model, the method includes:
s303, acquiring a micro-expression video sample set and a macro-expression video sample set;
s304, extracting space-time characteristic samples of the micro-expression video sample set and the macro-expression video sample set through an initial three-dimensional micro-expression recognition model to obtain a micro-expression space-time characteristic sample set corresponding to the micro-expression video sample set and a macro-expression space-time characteristic sample set corresponding to the macro-expression video sample set;
s305, constructing a four-tuple loss function according to the micro-expression space-time feature sample set and the macro-expression space-time feature sample set;
s306, determining a loss value according to the four-tuple loss function and the cross entropy loss function;
s307, when the loss value does not meet the convergence condition, iteratively updating initial parameters of the initial three-dimensional micro-expression recognition model, and calculating a new loss value according to the updated initial parameters; and when the new loss value meets the convergence condition, determining the initial three-dimensional micro-expression recognition model corresponding to the new loss value as the three-dimensional micro-expression recognition model.
Understandably, the micro-expression video sample set includes several micro-expression videos covering multiple expression classes. For example, the micro-expression video sample set includes a micro-expression standard video, a first micro-expression sample video, and a second micro-expression sample video. The micro-expression standard video and the first micro-expression sample video are different micro-expression videos corresponding to a first type of expression; the second micro-expression sample video is a micro-expression video corresponding to a second type of expression, which belongs to a different class from the first type of expression. The macro-expression video sample set includes several macro-expression videos of various expressions. The initial three-dimensional micro-expression recognition model is an untrained three-dimensional convolutional neural network comprising an initial micro-expression recognition network and an initial macro-expression recognition network, used for extracting space-time feature samples from the micro-expression video sample set and the macro-expression video sample set: the initial micro-expression recognition network extracts space-time feature samples from the micro-expression video sample set, and the initial macro-expression recognition network extracts space-time feature samples from the macro-expression video sample set. The micro-expression space-time feature sample set comprises several micro-expression space-time feature samples, and the macro-expression space-time feature sample set comprises several macro-expression space-time feature samples. To learn the space-time features shared between the micro-expression and macro-expression samples, a four-tuple (quadruplet) loss function I_q is constructed; to address the imbalance of the micro-expression samples, a focal cross-entropy loss function I_Focal is introduced. The total loss function I is then determined from the two as I = I_q + I_Focal. In model training, the smaller the loss value, the better the model, so the model is trained until the loss value satisfies the convergence condition. The convergence condition may be that the loss is below a preset convergence threshold; for example, with a preset threshold of 0.1, the convergence condition is that the loss value is less than 0.1. When the loss value satisfies the convergence condition, training stops and the initial three-dimensional micro-expression recognition model corresponding to that loss value is taken as the three-dimensional micro-expression recognition model. When the loss value does not satisfy the convergence condition, the initial parameters of the initial three-dimensional micro-expression recognition model are iteratively updated, a new loss value is computed with the updated parameters, and the convergence condition is checked again; training stops once the new loss value satisfies the convergence condition, and the corresponding model is taken as the three-dimensional micro-expression recognition model.
In this embodiment, the loss value is determined from the four-tuple loss function and the cross-entropy (focal) loss function, so that the features common to micro-expressions and macro-expressions are learned and the imbalance of the micro-expression samples is addressed, which improves the recognition accuracy of the three-dimensional micro-expression recognition model.
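A hedged sketch of the training procedure implied by steps S303-S307 follows: it combines the four-tuple loss I_q with the focal cross-entropy loss I_Focal and checks the convergence threshold after each update. The model interface (returning the four-tuple of features plus class logits), the loader contents, and the focusing parameter gamma are placeholder assumptions; the threshold of 0.1 mirrors the example above.

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, labels, gamma=2.0):
    """Cross-entropy-based focal loss I_Focal; down-weights easy samples to
    counter the imbalance of micro-expression samples (gamma is assumed)."""
    ce = F.cross_entropy(logits, labels, reduction="none")
    pt = torch.exp(-ce)  # probability the model assigns to the true class
    return ((1.0 - pt) ** gamma * ce).mean()

def train_until_converged(model, optimizer, loader, quadruplet_loss, threshold=0.1):
    """Iteratively update the initial parameters until the total loss
    I = I_q + I_Focal satisfies the convergence condition (loss < threshold)."""
    while True:
        for micro_clips, macro_clips, labels in loader:  # assumed loader contents
            quad_feats, logits = model(micro_clips, macro_clips)  # placeholder interface
            loss = quadruplet_loss(*quad_feats) + focal_loss(logits, labels)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
            if loss.item() < threshold:  # convergence condition, e.g. loss < 0.1
                return model
```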
Optionally, the micro-expression video sample set includes a micro-expression standard video, a first micro-expression sample video, and a second micro-expression sample video; the micro-expression standard video and the first micro-expression sample video are different micro-expression videos corresponding to a first type of expression, and the second micro-expression sample video is a micro-expression video corresponding to a second type of expression that belongs to a different class from the first type of expression; the macro-expression video sample set comprises macro-expression sample videos corresponding to the second type of expression;
in step S305, the constructing a four-tuple loss function according to the micro-expression space-time feature sample set and the macro-expression space-time feature sample set includes:
s3051, acquiring a first space-time feature of the micro-expression standard video, a second space-time feature of the first micro-expression sample video and a third space-time feature of the second micro-expression sample video from the micro-expression space-time feature sample set;
s3052, acquiring a fourth time-space feature of the macro expression sample video from the macro expression time-space feature sample set;
s3053, constructing the four-tuple loss function according to the first space-time feature, the second space-time feature, the third space-time feature and the fourth space-time feature.
Understandably, the first type of expression and the second type of expression are different expression classes; for example, if the first type of expression is happiness, the second type of expression may be fear. The micro-expression standard video serves as the micro-expression anchor sample of the first type of expression, and the first micro-expression sample video serves as the micro-expression positive sample of the first type of expression. The second micro-expression sample video and the first micro-expression sample video are micro-expression videos of different expression classes.
In this embodiment, the four-tuple loss function is constructed from the space-time features of micro-expressions and macro-expressions of different expression classes, so that the features common to micro-expressions and macro-expressions can be fully learned across expression classes, improving the recognition accuracy of the three-dimensional micro-expression recognition model.
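One plausible form of the four-tuple loss over the first, second, third, and fourth space-time features (the micro-expression anchor, the same-class micro-expression positive, the different-class micro-expression negative, and the macro-expression sample of that different class) is sketched below. The Euclidean distance and the margin values are assumptions; the exact formula is not given in the text.

```python
import torch
import torch.nn.functional as F

def quadruplet_loss(anchor, positive, negative, macro, margin1=0.5, margin2=0.3):
    """Four-tuple loss: pull same-class micro features together, push the
    different-class micro negative away, and also push away the macro sample
    of the different class, so micro and macro features share one feature space.
    Margins and the distance metric are assumed values."""
    d_ap = F.pairwise_distance(anchor, positive)  # same class, should be small
    d_an = F.pairwise_distance(anchor, negative)  # different class, should be large
    d_am = F.pairwise_distance(anchor, macro)     # cross-domain negative of the second class
    loss = F.relu(d_ap - d_an + margin1) + F.relu(d_ap - d_am + margin2)
    return loss.mean()

# e.g. loss = quadruplet_loss(f1, f2, f3, f4) with the first..fourth space-time features
```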
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an execution order; the execution order of the processes should be determined by their functions and internal logic, and does not constitute any limitation on the implementation of the embodiments of the present application.
In an embodiment, a micro-expression recognition device is provided, which corresponds one-to-one with the micro-expression recognition method in the above embodiment. As shown in fig. 3, the micro-expression recognition device includes a video module to be processed 10, a preprocessing video module 20, a space-time feature extraction module 30, a shared space-time feature module 40, and a recognition result module 50. The functional modules are described in detail as follows:
optionally, in step S203, the generating the preprocessing video according to the face video frame sequence includes:
the video module to be processed 10 is configured to obtain a video to be processed including a micro expression and a macro expression;
the preprocessing video module 20 is configured to perform data preprocessing including face detection on the video to be processed to obtain a preprocessed video;
the space-time feature extraction module 30 is used for extracting space-time features of the preprocessed video through a three-dimensional micro-expression recognition model to obtain micro-expression space-time features and macro-expression space-time features;
a shared space-time feature module 40, configured to extract a shared space-time feature of the micro-expression and the macro-expression from the micro-expression space-time feature and the macro-expression space-time feature;
and the recognition result module 50 is configured to perform expression classification on the micro-expressions according to the shared space-time feature, the micro-expression space-time feature and the macro-expression space-time feature, so as to obtain a recognition result of the micro-expressions.
Optionally, the preprocessing video module 20 includes:
the human face key point unit is used for carrying out human face detection on the video frame sequence to be processed in the video to be processed by utilizing the visual library to obtain human face key points;
the face video frame sequence unit is used for carrying out face cutting according to the face key points to obtain a face video frame sequence;
the preprocessing video unit is used for generating the preprocessing video according to the face video frame sequence.
Optionally, the preprocessing video unit includes:
the face alignment unit is used for carrying out face alignment on the face video frames in the face video frame sequence according to the face key points to generate a reference face video;
and the time domain image interpolation unit is used for performing time domain image interpolation on the reference face video to obtain the preprocessing video.
Optionally, the three-dimensional micro-expression recognition model includes a micro-expression recognition network and a macro-expression recognition network;
the spatiotemporal feature extraction module 30 includes:
the microexpressive space-time feature unit is used for extracting microexpressive space-time features of a video frame sequence to be processed in the preprocessing video through the microexpressive recognition network to obtain the microexpressive space-time features;
and the macro expression space-time feature unit is used for extracting macro expression space-time features from the video frame sequence to be processed in the preprocessing video through the macro expression recognition network to obtain the macro expression space-time features.
Optionally, the device further includes, before the space-time feature extraction module 30:
the video sample set module is used for acquiring a micro-expression video sample set and a macro-expression video sample set;
the space-time feature sample set module is used for extracting space-time feature samples of the micro-expression video sample set and the macro-expression video sample set through an initial three-dimensional micro-expression recognition model to obtain a micro-expression space-time feature sample set corresponding to the micro-expression video sample set and a macro-expression space-time feature sample set corresponding to the macro-expression video sample set;
the four-element loss function module is used for constructing a four-element loss function according to the micro-expression space-time characteristic sample set and the macro-expression space-time characteristic sample set;
the loss value module is used for determining a loss value according to the four-tuple loss function and the cross entropy loss function;
the three-dimensional micro-expression recognition model module is used for iteratively updating initial parameters of the initial three-dimensional micro-expression recognition model when the loss value does not meet the convergence condition, and calculating a new loss value according to the updated initial parameters; and when the new loss value meets the convergence condition, determining the initial three-dimensional micro-expression recognition model corresponding to the new loss value as the three-dimensional micro-expression recognition model.
Optionally, the micro-expression video sample set includes a micro-expression standard video, a first micro-expression sample video, and a second micro-expression sample video; the micro-expression standard video and the first micro-expression sample video are different micro-expression videos corresponding to a first type of expression, and the second micro-expression sample video is a micro-expression video corresponding to a second type of expression that belongs to a different class from the first type of expression; the macro-expression video sample set comprises macro-expression sample videos corresponding to the second type of expression;
namely, the four-tuple loss function module comprises:
the micro-expression space-time feature acquisition unit is used for acquiring a first space-time feature of the micro-expression standard video, a second space-time feature of the first micro-expression sample video and a third space-time feature of the second micro-expression sample video from the micro-expression space-time feature sample set;
the macro expression space-time feature acquisition unit is used for acquiring a fourth space-time feature of the macro expression sample video from the macro expression space-time feature sample set;
and the quadruple loss function unit is used for constructing the quadruple loss function according to the first space-time feature, the second space-time feature, the third space-time feature and the fourth space-time feature.
For specific limitations on the micro-expression recognition device, reference may be made to the limitations on the micro-expression recognition method above, which are not repeated here. Each of the modules in the micro-expression recognition device may be implemented in whole or in part by software, hardware, or a combination thereof. The modules may be embedded, in hardware form, in or independent of a processor in the computer device, or stored, in software form, in a memory of the computer device, so that the processor can invoke and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, the internal structure of which may be as shown in fig. 4. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a readable storage medium, an internal memory. The readable storage medium stores an operating system, computer readable instructions, and a database. The internal memory provides an environment for the execution of an operating system and computer-readable instructions in a readable storage medium. The database of the computer device is used for storing data related to the micro-expression recognition method. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer readable instructions when executed by a processor implement a method of microexpressive recognition. The readable storage medium provided by the present embodiment includes a nonvolatile readable storage medium and a volatile readable storage medium.
In one embodiment, a computer device is provided that includes a memory, a processor, and computer readable instructions stored on the memory and executable on the processor, when executing the computer readable instructions, performing the steps of:
acquiring a video to be processed comprising micro-expressions and macro-expressions;
performing data preprocessing, including face detection, on the video to be processed to obtain a preprocessed video;
extracting space-time characteristics of the preprocessed video through a three-dimensional micro-expression recognition model to obtain micro-expression space-time characteristics and macro-expression space-time characteristics;
extracting shared space-time features of the micro-expression and the macro-expression from the micro-expression space-time features and the macro-expression space-time features;
and carrying out expression classification on the micro-expressions according to the shared space-time characteristics, the micro-expression space-time characteristics and the macro-expression space-time characteristics to obtain the recognition result of the micro-expressions.
In one embodiment, one or more computer-readable storage media are provided having computer-readable instructions stored thereon, the readable storage media provided by the present embodiment including non-volatile readable storage media and volatile readable storage media. The readable storage medium has stored thereon computer readable instructions which when executed by one or more processors perform the steps of:
acquiring a video to be processed comprising micro-expressions and macro-expressions;
performing data preprocessing, including face detection, on the video to be processed to obtain a preprocessed video;
extracting space-time characteristics of the preprocessed video through a three-dimensional micro-expression recognition model to obtain micro-expression space-time characteristics and macro-expression space-time characteristics;
extracting shared space-time features of the micro-expression and the macro-expression from the micro-expression space-time features and the macro-expression space-time features;
and carrying out expression classification on the micro-expressions according to the shared space-time characteristics, the micro-expression space-time characteristics and the macro-expression space-time characteristics to obtain the recognition result of the micro-expressions.
Those skilled in the art will appreciate that all or part of the procedures of the method embodiments above may be implemented by computer readable instructions instructing the relevant hardware; the instructions may be stored on a non-volatile or volatile readable storage medium and, when executed, may include the procedures of the method embodiments above. Any reference to memory, storage, a database, or another medium used in the embodiments provided herein may include non-volatile and/or volatile memory. The non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. The volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above division into functional units and modules is illustrated; in practical applications, the above functions may be allocated to different functional units and modules as needed, that is, the internal structure of the device may be divided into different functional units or modules to perform all or part of the functions described above.
The above embodiments are only for illustrating the technical solution of the present application, and not for limiting the same; although the application has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present application, and are intended to be included in the scope of the present application.

Claims (10)

1. A method for identifying a microexpressive expression, comprising:
acquiring a video to be processed comprising micro-expressions and macro-expressions;
performing data preprocessing, including face detection, on the video to be processed to obtain a preprocessed video;
extracting space-time characteristics of the preprocessed video through a three-dimensional micro-expression recognition model to obtain micro-expression space-time characteristics and macro-expression space-time characteristics;
extracting shared space-time features of the micro-expression and the macro-expression from the micro-expression space-time features and the macro-expression space-time features;
and carrying out expression classification on the micro-expressions according to the shared space-time characteristics, the micro-expression space-time characteristics and the macro-expression space-time characteristics to obtain the recognition result of the micro-expressions.
2. The micro-expression recognition method of claim 1, wherein performing data preprocessing, including face detection, on the video to be processed to obtain a preprocessed video comprises:
performing face detection on a video frame sequence to be processed in the video to be processed by using a visual library to obtain face key points;
face clipping is carried out according to the face key points, so that a face video frame sequence is obtained;
and generating the preprocessing video according to the human face video frame sequence.
3. The method of microexpressive recognition according to claim 2, wherein said generating the pre-processed video from the sequence of facial video frames includes:
according to the face key points, face alignment is carried out on face video frames in the face video frame sequence, and a reference face video is generated;
and performing time domain image interpolation on the reference face video to obtain the preprocessing video.
4. The micro-expression recognition method of claim 1, wherein the three-dimensional micro-expression recognition model comprises a micro-expression recognition network and a macro-expression recognition network;
the method for extracting the space-time characteristics of the preprocessing video through the three-dimensional micro-expression recognition model to obtain micro-expression space-time characteristics and macro-expression space-time characteristics comprises the following steps:
extracting micro-expression space-time characteristics from a video frame sequence to be processed in the preprocessing video through the micro-expression recognition network to obtain the micro-expression space-time characteristics;
and extracting macro expression space-time characteristics from the video frame sequence to be processed in the preprocessing video through the macro expression recognition network to obtain the macro expression space-time characteristics.
5. The method of claim 1, wherein before extracting the space-time features of the preprocessed video by the three-dimensional micro-expression recognition model to obtain the micro-expression space-time features and the macro-expression space-time features, the method comprises:
acquiring a micro-expression video sample set and a macro-expression video sample set;
extracting space-time characteristic samples of the micro-expression video sample set and the macro-expression video sample set through an initial three-dimensional micro-expression recognition model to obtain a micro-expression space-time characteristic sample set corresponding to the micro-expression video sample set and a macro-expression space-time characteristic sample set corresponding to the macro-expression video sample set;
constructing a four-tuple loss function according to the micro-expression space-time characteristic sample set and the macro-expression space-time characteristic sample set;
determining a loss value according to the four-tuple loss function and the cross entropy loss function;
when the loss value does not meet the convergence condition, iteratively updating initial parameters of the initial three-dimensional micro-expression recognition model, and calculating a new loss value according to the updated initial parameters; and when the new loss value meets the convergence condition, determining the initial three-dimensional micro-expression recognition model corresponding to the new loss value as the three-dimensional micro-expression recognition model.
6. The method of claim 5, wherein the micro-expression video sample set comprises a micro-expression standard video, a first micro-expression sample video, and a second micro-expression sample video; the micro-expression standard video and the first micro-expression sample video are different micro-expression videos corresponding to a first type of expression, and the second micro-expression sample video is a micro-expression video corresponding to a second type of expression that belongs to a different class from the first type of expression; the macro-expression video sample set comprises macro-expression sample videos corresponding to the second type of expression;
the constructing a four-tuple loss function according to the micro-expression space-time feature sample set and the macro-expression space-time feature sample set comprises the following steps:
acquiring a first space-time feature of the micro-expression standard video, a second space-time feature of the first micro-expression sample video and a third space-time feature of the second micro-expression sample video from the micro-expression space-time feature sample set;
acquiring a fourth time-space feature of the macro expression sample video from the macro expression time-space feature sample set;
and constructing the four-tuple loss function according to the first time-space characteristic, the second time-space characteristic, the third time-space characteristic and the fourth time-space characteristic.
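The claims fix which four features enter the loss but not its functional form. A plausible margin-based form, assuming PyTorch and hypothetical margin values, pulls the two same-type micro features together while pushing the anchor away from both the different-type micro feature and the different-type macro feature:

```python
import torch
import torch.nn.functional as F

def quadruplet_loss(f_standard, f_first, f_second, f_macro,
                    margin_micro=0.5, margin_macro=0.3):
    """Margin-based loss over the four space-time features of claim 6.

    f_standard : micro-expression standard video (first type of expression)
    f_first    : first micro-expression sample video (same, first type)
    f_second   : second micro-expression sample video (second type)
    f_macro    : macro-expression sample video (second type)
    """
    d_pos = F.pairwise_distance(f_standard, f_first)    # same-type pair, keep small
    d_neg = F.pairwise_distance(f_standard, f_second)   # cross-type micro pair
    d_mac = F.pairwise_distance(f_standard, f_macro)    # cross-type, cross-domain pair
    loss = (F.relu(d_pos - d_neg + margin_micro)
            + F.relu(d_pos - d_mac + margin_macro))
    return loss.mean()
```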
7. A micro-expression recognition device, comprising:
a to-be-processed video module, configured to acquire a to-be-processed video comprising micro-expressions and macro-expressions;
a preprocessed video module, configured to perform data preprocessing, including face detection, on the to-be-processed video to obtain a preprocessed video;
a space-time feature extraction module, configured to extract space-time features from the preprocessed video through a three-dimensional micro-expression recognition model to obtain micro-expression space-time features and macro-expression space-time features;
a shared space-time feature module, configured to extract shared space-time features of the micro-expressions and the macro-expressions from the micro-expression space-time features and the macro-expression space-time features;
and a recognition result module, configured to classify the micro-expressions according to the shared space-time features, the micro-expression space-time features, and the macro-expression space-time features to obtain a recognition result of the micro-expressions.
8. The micro-expression recognition device of claim 7, wherein the preprocessed video module comprises:
a face key point unit, configured to perform face detection on the video frame sequence to be processed in the to-be-processed video by using a vision library to obtain face key points;
a face video frame sequence unit, configured to perform face cropping according to the face key points to obtain a face video frame sequence;
and a preprocessed video unit, configured to generate the preprocessed video from the face video frame sequence.
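Claim 8 names only "a vision library"; dlib is one common choice for face detection plus 68-point landmarks and is used in the sketch below. The landmark model path and the 20% crop padding are illustrative.

```python
import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
# 68-landmark model from the dlib model zoo; the path here is illustrative
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def keypoints_and_face_crop(frame, pad=0.2):
    """Detect one face, return its 68 key points and a padded face crop."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector(gray, 1)                    # upsample once to catch small faces
    if not faces:
        return None, None
    box = faces[0]
    shape = predictor(gray, box)
    pts = np.array([(p.x, p.y) for p in shape.parts()])   # the face key points
    # pad the detection box so the crop keeps some context around the face
    dx, dy = int(pad * box.width()), int(pad * box.height())
    x0, y0 = max(box.left() - dx, 0), max(box.top() - dy, 0)
    x1 = min(box.right() + dx, frame.shape[1])
    y1 = min(box.bottom() + dy, frame.shape[0])
    return pts, frame[y0:y1, x0:x1]
```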
9. A computer device comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the micro-expression recognition method of any one of claims 1 to 6.
10. One or more readable storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the micro-expression recognition method of any one of claims 1 to 6.
CN202310687981.0A 2023-06-09 2023-06-09 Micro-expression recognition method, micro-expression recognition device, computer equipment and storage medium Pending CN116665278A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310687981.0A CN116665278A (en) 2023-06-09 2023-06-09 Micro-expression recognition method, micro-expression recognition device, computer equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310687981.0A CN116665278A (en) 2023-06-09 2023-06-09 Micro-expression recognition method, micro-expression recognition device, computer equipment and storage medium

Publications (1)

Publication Number Publication Date
CN116665278A true CN116665278A (en) 2023-08-29

Family

ID=87716902

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310687981.0A Pending CN116665278A (en) 2023-06-09 2023-06-09 Micro-expression recognition method, micro-expression recognition device, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN116665278A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117649933A (en) * 2023-11-28 2024-03-05 广州方舟信息科技有限公司 Online consultation assistance method and device, electronic equipment and storage medium
CN117649933B (en) * 2023-11-28 2024-05-28 广州方舟信息科技有限公司 Online consultation assistance method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination