
Expression recognition method and device

Info

Publication number
CN116912920B
CN116912920B (grant of application CN202311168494.XA)
Authority
CN
China
Prior art keywords
loss
expression recognition
expression
classification
recognition model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311168494.XA
Other languages
Chinese (zh)
Other versions
CN116912920A (en)
Inventor
Jiang Zhao (蒋召)
Zhou Jingyu (周靖宇)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Xumi Yuntu Space Technology Co Ltd
Original Assignee
Shenzhen Xumi Yuntu Space Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Xumi Yuntu Space Technology Co Ltd filed Critical Shenzhen Xumi Yuntu Space Technology Co Ltd
Priority to CN202311168494.XA priority Critical patent/CN116912920B/en
Publication of CN116912920A publication Critical patent/CN116912920A/en
Application granted granted Critical
Publication of CN116912920B publication Critical patent/CN116912920B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/044Recurrent networks, e.g. Hopfield networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172Classification, e.g. identification
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02TCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
    • Y02T10/00Road transport of goods or passengers
    • Y02T10/10Internal combustion engine [ICE] based vehicles
    • Y02T10/40Engine management systems

Abstract

The application relates to the technical field of image recognition and provides an expression recognition method and device. The method comprises the following steps: acquiring an image to be recognized; inputting the image to be recognized into an expression recognition model obtained through training in advance, and obtaining an expression recognition result output by the expression recognition model. The expression recognition model is trained on a training set comprising an image sample, the expression category corresponding to the image sample, and the center vector corresponding to the expression category. A loss value of the expression recognition model is obtained based on a classification loss and a center loss; the center loss is determined based on the center vector corresponding to the expression category and the weights of the sample feature vector of the image sample in different dimensions, and the classification loss is determined based on the classification of the sample feature vector of the image sample. The embodiments of the application address the problem of low expression recognition accuracy.

Description

Expression recognition method and device
Technical Field
The present disclosure relates to the field of image recognition technologies, and in particular, to an expression recognition method and apparatus.
Background
Expression recognition data sets contain uncertain samples, that is, samples for which the annotator cannot determine which expression they belong to, and such data can greatly affect an expression recognition task in a real scene. To avoid the impact of uncertain samples on the expression recognition task, existing algorithms either manually re-label the samples or correct the data labels during training. However, manual re-labeling removes complex samples from the data set and reduces the generalization of the model, while correcting the data labels requires a carefully designed loss; both approaches lead to low expression recognition accuracy.
As can be seen, the related art has a problem of low expression recognition accuracy.
Disclosure of Invention
In view of this, the embodiments of the present application provide an expression recognition method and apparatus, so as to solve the problem in the prior art that the accuracy of the expression recognition algorithm is low.
In a first aspect of an embodiment of the present application, an expression recognition method is provided, including:
acquiring an image to be recognized;
inputting the image to be recognized into an expression recognition model obtained through training in advance, and obtaining an expression recognition result output by the expression recognition model;
the expression recognition model is obtained based on training of a training set, the training set comprises an image sample, an expression category corresponding to the image sample and a center vector corresponding to the expression category, a loss value of the expression recognition model is obtained based on classification loss and center loss, the center loss is determined based on the center vector corresponding to the expression category and the weight of a sample feature vector of the image sample in different dimensions, and the classification loss is determined based on classification of the sample feature vector of the image sample.
In a second aspect of the embodiments of the present application, there is provided an expression recognition apparatus, including:
the acquisition module is used for acquiring the image to be recognized;
the recognition module is used for inputting the image to be recognized into an expression recognition model obtained through training in advance to obtain an expression recognition result output by the expression recognition model;
the expression recognition model is obtained based on training of a training set, the training set comprises an image sample, an expression category corresponding to the image sample and a center vector corresponding to the expression category, a loss value of the expression recognition model is obtained based on classification loss and center loss, the center loss is determined based on the center vector corresponding to the expression category and the weight of a sample feature vector of the image sample in different dimensions, and the classification loss is determined based on classification of the sample feature vector of the image sample.
In a third aspect of the embodiments of the present application, there is provided an electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the above method when executing the computer program.
In a fourth aspect of the embodiments of the present application, there is provided a readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above method.
The beneficial effects of the embodiment of the application are that:
An image to be recognized is acquired; the image to be recognized is input into an expression recognition model obtained by training in advance, and the expression recognition result output by the expression recognition model is obtained. The expression recognition model is trained on a training set comprising an image sample, the expression category corresponding to the image sample, and the center vector corresponding to the expression category; a loss value of the expression recognition model is obtained based on a classification loss and a center loss, the center loss is determined based on the center vector corresponding to the expression category and the weights of the sample feature vector of the image sample in different dimensions, and the classification loss is determined based on the classification of the sample feature vector. In this way, determining the center loss from the center vector and the per-dimension weights of the sample feature vector raises the weight of dimensions with a larger influence on the center loss and lowers the weight of dimensions with a smaller influence, constrains the distance between samples and the center vector, reduces the influence of uncertain samples on expression recognition, and improves the discriminability of expression features, thereby improving the accuracy of the expression recognition result output by the expression recognition model.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application, and a person skilled in the art may obtain other drawings from them without inventive effort.
Fig. 1 is a schematic flow chart of an expression recognition method according to an embodiment of the present application;
fig. 2 is a flowchart of another expression recognition method according to an embodiment of the present application;
fig. 3 is a flowchart of another expression recognition method according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an expression recognition device according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system configurations, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
The terms "first", "second", and the like in the description and claims are used to distinguish between similar objects and are not necessarily used to describe a particular sequence or chronological order. It is to be understood that the data so used may be interchanged where appropriate, so that the embodiments of the present application can be implemented in sequences other than those illustrated or described herein. Objects identified by "first", "second", etc. denote a type rather than a number of objects; for example, the first object may be one or more. Furthermore, in the description and claims, "and/or" means at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the associated objects.
Furthermore, it should be noted that the terms "comprises", "comprising", or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element preceded by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
An expression recognition method and apparatus according to embodiments of the present application will be described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of an expression recognition method according to an embodiment of the present application. The expression recognition method as shown in fig. 1 includes:
step 101, obtaining an image to be identified.
The image to be recognized is an image that needs to be recognized by using the expression recognition model.
The image to be recognized may be a facial image, which may be an electronic image, a paper image, or a live human face; no specific limitation is imposed here, as long as the expression can be recognized.
In addition, the image to be recognized may be a clear, unoccluded image, or an image in which some features are occluded or blurred.
Step 102, inputting the image to be recognized into an expression recognition model obtained through training in advance, and obtaining an expression recognition result output by the expression recognition model.
The expression recognition model is obtained by training on a training set. The training set comprises an image sample, the expression category corresponding to the image sample, and the center vector corresponding to the expression category. A loss value of the expression recognition model is obtained based on a classification loss and a center loss; the center loss is determined based on the center vector corresponding to the expression category and the weights of the sample feature vector of the image sample in different dimensions, and the classification loss is determined based on the classification of the sample feature vector of the image sample.
The expression recognition model is an artificial intelligence model that can analyze and recognize facial expressions, typically judging the emotional state reflected by a face by analyzing its expression features.
The image samples in the training set can be obtained by photographing or video recording, or collected from the Internet using crawler technology; the acquisition method used for training the expression recognition model is not limited here. For efficient and complete training of the expression recognition model, image samples should be as varied as possible, such as images with occluded eyes, laughing images, and crying or blurred images.
It should be noted that when the training set is constructed, the types of sample images can be chosen with reference to the specific application scenario of the expression recognition model. For example, a scenario used only for company face-recognition clock-in does not require many complex sample images, whereas a model intended specifically for face recognition research should acquire complex sample images as far as possible.
In addition, a face recognition task can first be trained based on a training set, and the trained network parameter values then used as the initial values of the expression recognition model.
Expression categories include, but are not limited to, a predefined set of basic expressions; the expression categories can be expanded according to specific situations or special situations.
The loss value for expression recognition can be obtained from the classification loss and the center loss.
The center loss is determined from the center vector corresponding to the expression category and the weights of the sample feature vector of the image sample in different dimensions, so the center loss depends on those per-dimension weights. This makes it possible to raise the weight of vector dimensions with a larger influence on the center loss and lower the weight of dimensions with a smaller influence. In addition, the center loss constrains the distance between a sample and the center of its category, compressing samples of the same class and improving the separability of samples from different classes, which improves the accuracy of the expression recognition algorithm and reduces the influence of uncertain samples on it.
The image to be recognized is input into the expression recognition model obtained by training in advance, and the expression recognition result output by the expression recognition model is obtained; the high recognition accuracy of the model improves the accuracy of the expression recognition result.
According to the technical solution provided by this embodiment of the application, training yields an expression recognition model with higher accuracy; the image to be recognized is input into the trained model to obtain an accurate expression recognition result. The expression recognition model is thus used effectively to recognize the image to be recognized and obtain a more accurate recognition result, which solves the problem that the recognition results of common expression recognition methods are not accurate enough.
In some embodiments, before the inputting the image to be recognized into the expression recognition model obtained by training in advance, obtaining an expression recognition result output by the expression recognition model, the method further includes:
based on the training set, respectively calculating the classification loss and the center loss through an expression recognition model to be trained;
obtaining the loss value based on the classification loss and the center loss;
and under the condition that the loss value is smaller than or equal to a preset value, obtaining the expression recognition model after training.
Specifically, when training the expression recognition model, image samples are acquired using the acquisition methods described above, and the expression recognition model is trained on these image samples, that is, the training set. Training first requires extracting the feature vectors of the image samples, which can be done with a deep learning algorithm, for example a convolutional neural network or a recurrent neural network.
The classification loss may be calculated as a cross-entropy loss, which measures the difference between the classification result for an image sample and its true label.
The center loss can measure the squared distance between the feature vector of an image sample and the center vector of the category to which it belongs; the smaller the center loss, the closer the feature vector is to the center vector. The model can therefore be trained by optimizing the total of the center loss and the classification loss in expression recognition.
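For reference, a classic (unweighted) center loss takes the following form; the patent does not write out this formula, so the standard formulation is given here, and the weighted variant described later additionally scales the feature dimensions:

L_{center} = \frac{1}{2} \sum_{i=1}^{m} \left\| x_i - c_{y_i} \right\|_2^2

where m is the number of samples in a batch, x_i is the sample feature vector of the i-th sample, and c_{y_i} is the center vector of its expression category.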
The preset value should be chosen so that the expression recognition result is as accurate as possible. It need not be a fixed value, and should be determined comprehensively from the application scenario of the expression recognition model and the characteristics of the training-set image samples.
According to the technical solution provided by this embodiment of the application, the total loss is obtained by calculating the classification loss and the center loss, the total loss is compared with the preset value, and the trained expression recognition model is obtained when the total loss is smaller than or equal to the preset value. The expression recognition model can then be used to perform expression recognition on the image to be recognized and obtain an expression recognition result with high accuracy.
In some embodiments, the expression recognition model includes: the backbone network layer, the global average pooling layer and the classification layer are sequentially connected;
based on the training set, calculating the classification loss through an expression recognition model to be trained, wherein the method comprises the following steps:
obtaining the sample feature vector through the backbone network layer;
carrying out global average pooling on the sample feature vectors through the global average pooling layer to obtain first feature vectors;
based on the first feature vector, obtaining the probability of the image sample belonging to different expression categories through the classification layer, and obtaining the classification loss based on the probability.
In particular, the backbone network layer is used to extract the feature vectors of the image samples; it includes, but is not limited to, deep residual networks and convolutional neural networks, and is selected according to the specific scenario.
The first feature vector is the result of passing the sample feature vector through the global average pooling layer for averaging or dimensionality reduction. The first feature vector is then passed through the classification layer to obtain the probabilities of the different expression categories, and the classification loss is determined from these probabilities. As noted above, the classification loss can be calculated as a cross-entropy loss:

L = -\sum_{i=1}^{C} t_i \log(y_i)

where L denotes the cross-entropy loss, C the number of categories, t_i the true label value of the i-th category, and y_i the predicted probability of the i-th category.
According to this embodiment of the application, the feature vectors extracted by the backbone network and subjected to global average pooling are classified and the classification loss is calculated; the expression recognition model is trained by combining the classification loss with the center loss, yielding an expression recognition model with high accuracy.
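For illustration, a minimal PyTorch sketch of this classification branch follows; the batch size, channel count, class count, and spatial size are assumptions, and PyTorch itself is only a convenient choice, not something the patent prescribes.

```python
import torch
import torch.nn.functional as F

# Illustrative sizes (assumptions, not values from the patent).
batch, channels, num_classes = 8, 2048, 7

# Backbone output: a sample feature map of shape (batch, channels, H, W).
sample_features = torch.randn(batch, channels, 7, 7)

# Global average pooling layer -> first feature vector (batch, channels).
first_vector = sample_features.mean(dim=(2, 3))

# Classification layer producing per-category scores.
classifier = torch.nn.Linear(channels, num_classes)
logits = classifier(first_vector)

# Cross-entropy classification loss; F.cross_entropy applies log-softmax
# internally, matching L = -sum_i t_i * log(y_i) for one-hot labels t.
labels = torch.randint(0, num_classes, (batch,))
classification_loss = F.cross_entropy(logits, labels)
```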
In some embodiments, the expression recognition model further comprises: the system comprises an activation function layer, a first full-connection layer and a classification function layer which are connected in sequence; based on the training set, calculating the center loss through an expression recognition model to be trained, wherein the method comprises the following steps:
activating the first feature vector through the activation function layer to obtain a second feature vector;
carrying out feature transformation on N segmented vectors through the first full connection layer to obtain noise probability and non-noise probability, wherein the N segmented vectors are N vectors obtained by segmenting the second feature vector according to vector dimensions, and the vector dimensions of the second feature vector are the same as the number of output channels of the sample feature vector;
and carrying out normalization conversion on the non-noise probabilities corresponding to the N segmented vectors through the classification function layer to obtain weights corresponding to each segmented vector, multiplying the weights corresponding to each segmented vector by the partial vectors corresponding to the sample feature vectors to obtain weighted feature vectors, and obtaining the center loss based on the weighted feature vectors and the center vectors corresponding to the expression categories of the image samples.
Specifically, the activation function layer receives the feature vector output by the previous layer and applies a nonlinear transformation to it. Candidate activation functions include the Sigmoid, ReLU, Tanh, and Leaky ReLU functions; the activation function used by the activation function layer is not specifically limited here, as long as it achieves the purpose of activating the vector. A suitable activation function should nevertheless be selected for the specific application scenario; for example, the Tanh function has a wider output range and can provide greater nonlinear transformation capability.
Carrying out feature transformation on the N segmented vectors through the first fully connected layer means reducing each segmented vector to a two-dimensional vector, where one dimension represents the noise probability and the other the non-noise probability. The noise probability reflects features that interfere with the expression recognition task, and the non-noise probability reflects features that the expression recognition task needs to use.
The vector is segmented to improve the accuracy of expression recognition.
The number of segments N may be 3, 4, 5, and so on; it is not specifically limited here, as long as the segmentation improves feature recognition accuracy.
The number of channels of the sample feature vector may be set as required. For example, to give the expression recognition model higher accuracy in combination with current image-acquisition technology, the number of channels may be 3, but this is not specifically limited here; the number of channels is determined by the specific situation.
It should be noted that the weights corresponding to the segmented vectors are the weights of the sample feature vector in different dimensions. The dimensions are determined by the sample feature extraction method and the expression recognition architecture; the dimensionality of the sample feature vector is not specifically limited, as long as different expressions can be fully represented and distinguished.
When the weight corresponding to each segmented vector is multiplied by the corresponding partial vector of the sample feature vector, note that the vector dimension of the second feature vector is the same as the number of output channels of the sample feature vector. That is, the weight obtained by combining the N segmented vectors has the same dimension as the number of output channels of the sample feature vector, so each dimension of the sample feature vector corresponds to one weight value. The weight corresponding to each segmented vector can then be multiplied by the corresponding partial vector of the sample feature vector to obtain the weighted feature vector.
The center loss can be determined by calculating the cosine similarity between the weighted feature vector and the center vector corresponding to the expression category of the image sample.
According to this embodiment of the application, the feature vector is segmented and then feature-transformed to obtain the non-noise probability; the non-noise probability of each segment is normalized to a value between 0 and 1 to obtain the weight corresponding to that segmented vector; the weighted feature vector is obtained from these weights; and the center loss is obtained from the weighted feature vector and the center vector. The center loss is combined with the classification loss to train the expression recognition model, yielding an expression recognition model with high accuracy that can output accurate expression recognition results.
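A PyTorch sketch of this weight-learning branch is given below. It follows the description (Tanh activation, segmentation into N pieces, a fully connected transform to a (noise, non-noise) pair per segment, Softmax, splicing, and channel-wise weighting); where the text is ambiguous, the assumptions are flagged in comments: per-segment fully connected layers are used because a 2048-dimensional vector does not split evenly into three, and each segment's scalar non-noise probability is broadcast over that segment's channels so the spliced weight dimension equals the channel count.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class UnbalancedWeightModule(nn.Module):
    """Sketch of the unbalanced weight-learning branch. Per-segment fully
    connected layers and the broadcasting of segment weights over channels
    are assumptions where the patent text is ambiguous."""

    def __init__(self, channels: int = 2048, num_segments: int = 3):
        super().__init__()
        base, rem = divmod(channels, num_segments)
        # Segment sizes (2048 does not divide evenly by 3, so sizes may differ).
        self.seg_sizes = [base + (1 if i < rem else 0) for i in range(num_segments)]
        # First fully connected layer(s): segment features -> (noise, non-noise).
        self.segment_fcs = nn.ModuleList(nn.Linear(s, 2) for s in self.seg_sizes)

    def forward(self, first_vector: torch.Tensor) -> torch.Tensor:
        # Activation function layer (Tanh) -> second feature vector.
        second_vector = torch.tanh(first_vector)
        # Segment the second feature vector along the channel dimension.
        segments = torch.split(second_vector, self.seg_sizes, dim=1)
        weights = []
        for seg, fc, size in zip(segments, self.segment_fcs, self.seg_sizes):
            # Classification function layer: Softmax over (noise, non-noise).
            probs = F.softmax(fc(seg), dim=1)
            non_noise = probs[:, 1:2]  # keep the non-noise probability
            # Assumption: broadcast the scalar segment weight over the
            # segment's channels so the spliced weight matches the channel count.
            weights.append(non_noise.expand(-1, size))
        weight = torch.cat(weights, dim=1)  # spliced final weight
        return weight * first_vector        # weighted feature vector

def weighted_center_loss(weighted: torch.Tensor,
                         centers: torch.Tensor,
                         labels: torch.Tensor) -> torch.Tensor:
    # Squared-distance form of the center loss; the description also mentions
    # a cosine formulation that could be substituted here.
    return 0.5 * (weighted - centers[labels]).pow(2).sum(dim=1).mean()
```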
In some embodiments, the expression recognition model further includes a second fully connected layer, disposed between the global average pooling layer and the classification layer, for weighting the first feature vector and inputting the weighted feature vector to the classification layer.
Specifically, through the second fully connected layer, the weighted feature vector is obtained by multiplying the first feature vector by a weight matrix, and the weighted feature vector is input to the classification layer, where a nonlinear transformation is performed through an activation function. The activation functions include, but are not limited to, the Softmax function, which converts a vector into probabilities, so that the probabilities of belonging to different expression categories are finally output. In the expression recognition process, the Softmax function is generally used as the output-layer function to ensure that the output probabilities sum to 1, but this is not specifically limited here.
In this way, the embodiment can predict the corresponding expression category according to the obtained feature vector through the classification layer.
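A minimal sketch of this second fully connected layer, assuming it keeps the channel dimensionality and feeds a Softmax-terminated classification layer; all sizes are illustrative.

```python
import torch

channels, num_classes = 2048, 7  # illustrative sizes (assumptions)
second_fc = torch.nn.Linear(channels, channels, bias=False)  # weight matrix
classification_layer = torch.nn.Linear(channels, num_classes)

first_vector = torch.randn(8, channels)
weighted_vector = second_fc(first_vector)  # weighted feature vector
probabilities = torch.softmax(classification_layer(weighted_vector), dim=1)
# Each row of `probabilities` sums to 1, as the Softmax output layer ensures.
```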
In some embodiments, the deriving the loss value based on the classification loss and the center loss comprises:
and adding the classified loss and the central loss to obtain the loss value.
In the model training process, the loss value is calculated by comparing the model output with the real labels, and the parameters of the expression recognition model are updated using a back propagation algorithm.
Therefore, the expression recognition model can obtain more accurate expression recognition results.
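A minimal sketch of the loss combination, under the assumption (consistent with the description) that the two terms are added without a weighting coefficient:

```python
import torch

def total_loss(classification_loss: torch.Tensor,
               center_loss: torch.Tensor) -> torch.Tensor:
    # The description adds the two terms directly; no weighting coefficient
    # between them is mentioned, so none is introduced here.
    return classification_loss + center_loss

# Typical update step (assuming `model`, `optimizer`, and the two loss
# tensors from the sketches above):
#   loss = total_loss(classification_loss, center_loss)
#   optimizer.zero_grad()
#   loss.backward()      # back-propagation updates the recognition network
#   optimizer.step()
```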
In some embodiments, the backbone network layer is a convolutional neural network ResNet50.
Specifically, the ResNet50 network contains 49 convolutional layers and one fully connected layer.
In expression recognition, this network can improve the inter-class separability and intra-class compactness of the expression features, which are the optimization targets, so it can be used as the base network for training the expression recognition model; however, this is not specifically limited here.
In this embodiment of the application, the expression recognition model trained with this network as the base network improves the accuracy of the expression recognition result.
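A sketch of setting up ResNet50 as the backbone using torchvision; torchvision and the checkpoint path for the face-recognition pretraining are illustrative assumptions, not requirements of the patent.

```python
import torch
import torchvision

# ResNet50 backbone; torchvision's variant already ends in global average
# pooling, so replacing the final fully connected layer with Identity
# yields the pooled 2048-d feature vector directly.
backbone = torchvision.models.resnet50(weights=None)
backbone.fc = torch.nn.Identity()

# Hypothetical face-recognition-pretrained parameters used as initial values:
#   state = torch.load("face_recognition_pretrain.pth")
#   backbone.load_state_dict(state, strict=False)

features = backbone(torch.randn(1, 3, 224, 224))  # -> shape (1, 2048)
```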
The following describes the expression recognition model training process in one embodiment of the present application with reference to fig. 2.
Specifically, an image sample is first input into the expression recognition model to be trained, and the sample features of the image sample are extracted through the backbone network.
It should be noted that in this embodiment of the application the selected network is a ResNet50; the network is used as the base network, a face recognition task is trained on a large-scale face recognition data set, and the trained network parameter values are then used as the initial values of the expression recognition network.
The extracted sample features are then passed through a global average pooling layer.
Then, the probability of different expression categories is output through the classification layer, and the classification loss is calculated.
The classification loss is a cross entropy loss.
The extracted sample features then pass through an unbalanced weight-learning module. This module passes the input sample features through a global average pooling layer and three fully connected layers, divides them into three segments, learns the non-noise probability of each segment in turn, calculates the weighted sample features, and calculates the unbalanced center loss from the weighted sample features and the center vectors.
Then, the classification loss and the unbalanced center loss are added to obtain the total loss, and the expression recognition network is updated backwards through the total loss, thereby improving the accuracy of the expression recognition result.
The following describes the expression recognition model training process in one embodiment of the present application with reference to fig. 3.
Global average pooling is performed on the sample feature map, and a vector is then obtained through three fully connected layers; the vector dimension is the same as the number of output channels of the sample feature vector.
The vector of the previous step is activated by a hyperbolic tangent function (Tanh activation function), and then the activated vector is divided into three sections according to dimensions.
Then, feature-converting the segmented vector into a vector with dimension 2 by using a full-connection layer, wherein one dimension represents noise probability, and the other dimension represents non-noise probability, and then converting the non-noise probability by using a normalized exponential function (Softmax).
And then, splicing the vectors calculated by the different segments to obtain the final weight.
Then, the weights are multiplied by the input sample features to obtain weighted feature vectors.
Then, the unbalanced center loss is obtained using the weighted feature vector and the center vector corresponding to the expression category of the image sample.
The above unbalanced center loss is the center loss described above.
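Tying the steps of fig. 3 back to the earlier sketches, one forward pass of this branch could look as follows (UnbalancedWeightModule and weighted_center_loss are the illustrative definitions from the sketch above; all shapes remain assumptions):

```python
import torch

module = UnbalancedWeightModule(channels=2048, num_segments=3)
centers = torch.randn(7, 2048)        # one learnable center vector per expression category
labels = torch.randint(0, 7, (8,))    # expression-category labels for a batch of 8
first_vector = torch.randn(8, 2048)   # globally averaged sample features

weighted = module(first_vector)                                # weighted feature vector
center_loss = weighted_center_loss(weighted, centers, labels)  # unbalanced center loss
```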
Any combination of the above optional solutions may be adopted to form an optional embodiment of the present application, which is not described herein in detail.
The following are device embodiments of the present application, which may be used to perform method embodiments of the present application. For details not disclosed in the device embodiments of the present application, please refer to the method embodiments of the present application.
Fig. 4 is a schematic diagram of an expression recognition device according to an embodiment of the present application. As shown in fig. 4, the expression recognition apparatus includes:
an acquisition module 401, configured to acquire an image to be recognized;
the recognition module 402 is configured to input the image to be recognized into an expression recognition model obtained by training in advance, so as to obtain an expression recognition result output by the expression recognition model;
the expression recognition model is obtained based on training of a training set, the training set comprises an image sample, an expression category corresponding to the image sample and a center vector corresponding to the expression category, a loss value of the expression recognition model is obtained based on classification loss and center loss, the center loss is determined based on the center vector corresponding to the expression category and the weight of a sample feature vector of the image sample in different dimensions, and the classification loss is determined based on classification of the sample feature vector of the image sample.
According to the technical solution provided by this embodiment of the application, the image to be recognized is acquired by the acquisition module, and the image input into the expression recognition model is then recognized by the recognition module to obtain an expression recognition result. After training, the expression recognition model can recognize expression categories more accurately, which solves the problem in the prior art that expression recognition is not accurate enough.
In some embodiments, the recognition module is further configured to calculate, based on the training set, the classification loss and the center loss through an expression recognition model to be trained, respectively; obtaining the loss value based on the classification loss and the center loss; and under the condition that the loss value is smaller than or equal to a preset value, obtaining the expression recognition model after training.
In some embodiments, the expression recognition model includes: the backbone network layer, the global average pooling layer and the classification layer are sequentially connected; the identification module is specifically configured to obtain the sample feature vector through the backbone network layer; carrying out global average pooling on the sample feature vectors through the global average pooling layer to obtain first feature vectors; based on the first feature vector, obtaining the probability of the image sample belonging to different expression categories through the classification layer, and obtaining the classification loss based on the probability.
In some embodiments, the expression recognition model further comprises: the system comprises an activation function layer, a first full-connection layer and a classification function layer which are connected in sequence; the identification module is specifically configured to activate the first feature vector through the activation function layer to obtain a second feature vector; carrying out feature transformation on N segmented vectors through the first full connection layer to obtain noise probability and non-noise probability, wherein the N segmented vectors are N vectors obtained by segmenting the second feature vector according to vector dimensions, and the vector dimensions of the second feature vector are the same as the number of output channels of the sample feature vector; and carrying out normalization conversion on the non-noise probabilities corresponding to the N segmented vectors through the classification function layer to obtain weights corresponding to each segmented vector, multiplying the weights corresponding to each segmented vector by the partial vectors corresponding to the sample feature vectors to obtain weighted feature vectors, and obtaining the center loss based on the weighted feature vectors and the center vectors corresponding to the expression categories of the image samples.
In some embodiments, the expression recognition model further includes a second fully connected layer, disposed between the global average pooling layer and the classification layer, for weighting the first feature vector and inputting the weighted feature vector to the classification layer.
In some embodiments, the identification module is specifically configured to add the classification loss and the center loss to obtain the loss value.
It should be understood that the sequence numbers of the steps in the foregoing embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic and should not constitute any limitation on the implementation process of the embodiments of the present application.
Fig. 5 is a schematic diagram of an electronic device 5 according to an embodiment of the present application. As shown in fig. 5, the electronic apparatus 5 of this embodiment includes: a processor 501, a memory 502 and a computer program 503 stored in the memory 502 and executable on the processor 501. The steps of the various method embodiments described above are implemented by processor 501 when executing computer program 503. Alternatively, the processor 501, when executing the computer program 503, performs the functions of the modules/units in the above-described apparatus embodiments.
The electronic device 5 may be a desktop computer, a notebook computer, a palm computer, a cloud server, or the like. The electronic device 5 may include, but is not limited to, a processor 501 and a memory 502. It will be appreciated by those skilled in the art that fig. 5 is merely an example of the electronic device 5 and is not limiting of the electronic device 5 and may include more or fewer components than shown, or different components.
The processor 501 may be a central processing unit (Central Processing Unit, CPU) or other general purpose processor, digital signal processor (Digital Signal Processor, DSP), application specific integrated circuit (Application Specific Integrated Circuit, ASIC), field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, discrete gate or transistor logic device, discrete hardware components, or the like.
The memory 502 may be an internal storage unit of the electronic device 5, for example, a hard disk or a memory of the electronic device 5. The memory 502 may also be an external storage device of the electronic device 5, for example, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash Card (Flash Card) or the like, which are provided on the electronic device 5. Memory 502 may also include both internal storage units and external storage devices of electronic device 5. The memory 502 is used to store computer programs and other programs and data required by the electronic device.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit.
The integrated modules/units may be stored in a readable storage medium if implemented in the form of software functional units and sold or used as stand-alone products. Based on such understanding, the present application implements all or part of the flow in the method of the above embodiment, or may be implemented by a computer program to instruct related hardware, where the computer program may be stored in a readable storage medium, where the computer program may implement the steps of the method embodiments described above when executed by a processor. The computer program may comprise computer program code, which may be in source code form, object code form, executable file or in some intermediate form, etc. The readable storage medium may include: any entity or device capable of carrying computer program code, a recording medium, a U disk, a removable hard disk, a magnetic disk, an optical disk, a computer Memory, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so forth. The content contained in the readable storage medium may be appropriately increased or decreased according to the requirements of the legislation and the patent practice.

Claims (7)

1. An expression recognition method, comprising:
acquiring an image to be recognized;
inputting the image to be recognized into an expression recognition model obtained through training in advance, and obtaining an expression recognition result output by the expression recognition model;
the expression recognition model is obtained based on training of a training set, the training set comprises an image sample, an expression category corresponding to the image sample and a center vector corresponding to the expression category, a loss value of the expression recognition model is obtained based on classification loss and center loss, the center loss is determined based on the center vector corresponding to the expression category and the weight of a sample feature vector of the image sample in different dimensions, and the classification loss is determined based on classification of the sample feature vector of the image sample;
before the image to be recognized is input into the expression recognition model obtained through training in advance to obtain the expression recognition result output by the expression recognition model, the method further comprises the following steps:
based on the training set, respectively calculating the classification loss and the center loss through an expression recognition model to be trained;
obtaining the loss value based on the classification loss and the center loss;
under the condition that the loss value is smaller than or equal to a preset value, obtaining a trained expression recognition model;
the expression recognition model includes: the backbone network layer, the global average pooling layer and the classification layer are sequentially connected;
based on the training set, calculating the classification loss through an expression recognition model to be trained, wherein the method comprises the following steps:
obtaining the sample feature vector through the backbone network layer;
carrying out global average pooling on the sample feature vectors through the global average pooling layer to obtain first feature vectors;
based on the first feature vector, obtaining the probability that the image sample belongs to different expression categories through the classification layer, and obtaining the classification loss based on the probability;
the expression recognition model further includes: the system comprises an activation function layer, a first full-connection layer and a classification function layer which are connected in sequence;
based on the training set, calculating the center loss through an expression recognition model to be trained, wherein the method comprises the following steps:
activating the first feature vector through the activation function layer to obtain a second feature vector;
carrying out feature transformation on N segmented vectors through the first full connection layer to obtain noise probability and non-noise probability, wherein the N segmented vectors are N vectors obtained by segmenting the second feature vector according to vector dimensions, and the vector dimensions of the second feature vector are the same as the number of output channels of the sample feature vector;
and carrying out normalization conversion on the non-noise probabilities corresponding to the N segmented vectors through the classification function layer to obtain weights corresponding to each segmented vector, multiplying the weights corresponding to each segmented vector by the partial vectors corresponding to the sample feature vectors to obtain weighted feature vectors, and obtaining the center loss based on the weighted feature vectors and the center vectors corresponding to the expression categories of the image samples.
2. The expression recognition method according to claim 1, wherein the expression recognition model further comprises a second fully connected layer, the second fully connected layer being disposed between the global averaging pooling layer and the classification layer, for weighting the first feature vector and inputting the weighted feature vector to the classification layer.
3. The expression recognition method according to claim 1, wherein the obtaining the loss value based on the classification loss and the center loss includes:
and adding the classified loss and the central loss to obtain the loss value.
4. The expression recognition method of claim 1, wherein the backbone network layer is a convolutional neural network ResNet50.
5. An expression recognition apparatus, characterized by comprising:
the acquisition module is used for acquiring the image to be recognized;
the recognition module is used for inputting the image to be recognized into an expression recognition model obtained through training in advance to obtain an expression recognition result output by the expression recognition model;
the expression recognition model is obtained based on training of a training set, the training set comprises an image sample, an expression category corresponding to the image sample and a center vector corresponding to the expression category, a loss value of the expression recognition model is obtained based on classification loss and center loss, the center loss is determined based on the center vector corresponding to the expression category and the weight of a sample feature vector of the image sample in different dimensions, and the classification loss is determined based on classification of the sample feature vector of the image sample;
the recognition module is further used for respectively calculating the classification loss and the center loss through an expression recognition model to be trained based on the training set; obtaining the loss value based on the classification loss and the center loss; under the condition that the loss value is smaller than or equal to a preset value, obtaining a trained expression recognition model;
the expression recognition model includes: the backbone network layer, the global average pooling layer and the classification layer are sequentially connected; the identification module is specifically configured to obtain the sample feature vector through the backbone network layer; carrying out global average pooling on the sample feature vectors through the global average pooling layer to obtain first feature vectors; based on the first feature vector, obtaining the probability that the image sample belongs to different expression categories through the classification layer, and obtaining the classification loss based on the probability;
the expression recognition model further includes: the system comprises an activation function layer, a first full-connection layer and a classification function layer which are connected in sequence; the identification module is specifically configured to activate the first feature vector through the activation function layer to obtain a second feature vector; carrying out feature transformation on N segmented vectors through the first full connection layer to obtain noise probability and non-noise probability, wherein the N segmented vectors are N vectors obtained by segmenting the second feature vector according to vector dimensions, and the vector dimensions of the second feature vector are the same as the number of output channels of the sample feature vector; and carrying out normalization conversion on the non-noise probabilities corresponding to the N segmented vectors through the classification function layer to obtain weights corresponding to each segmented vector, multiplying the weights corresponding to each segmented vector by the partial vectors corresponding to the sample feature vectors to obtain weighted feature vectors, and obtaining the center loss based on the weighted feature vectors and the center vectors corresponding to the expression categories of the image samples.
6. An electronic device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the steps of the method according to any of claims 1 to 4 when the computer program is executed.
7. A readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the steps of the method according to any one of claims 1 to 4.
CN202311168494.XA 2023-09-12 2023-09-12 Expression recognition method and device Active CN116912920B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311168494.XA CN116912920B (en) 2023-09-12 2023-09-12 Expression recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311168494.XA CN116912920B (en) 2023-09-12 2023-09-12 Expression recognition method and device

Publications (2)

Publication Number Publication Date
CN116912920A CN116912920A (en) 2023-10-20
CN116912920B true CN116912920B (en) 2024-01-05

Family

ID=88368103

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311168494.XA Active CN116912920B (en) 2023-09-12 2023-09-12 Expression recognition method and device

Country Status (1)

Country Link
CN (1) CN116912920B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111079790A (en) * 2019-11-18 2020-04-28 清华大学深圳国际研究生院 Image classification method for constructing class center
CN112241715A (en) * 2020-10-23 2021-01-19 北京百度网讯科技有限公司 Model training method, expression recognition method, device, equipment and storage medium
CN113887538A (en) * 2021-11-30 2022-01-04 北京的卢深视科技有限公司 Model training method, face recognition method, electronic device and storage medium
CN114118370A (en) * 2021-11-19 2022-03-01 北京的卢深视科技有限公司 Model training method, electronic device, and computer-readable storage medium
CN116152587A (en) * 2022-07-06 2023-05-23 马上消费金融股份有限公司 Training method of expression recognition model, facial expression recognition method and facial expression recognition device

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2707147C1 (en) * 2018-10-31 2019-11-22 Общество с ограниченной ответственностью "Аби Продакшн" Neural network training by means of specialized loss functions
CN112990097B (en) * 2021-04-13 2022-11-04 电子科技大学 Face expression recognition method based on countermeasure elimination

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111079790A (en) * 2019-11-18 2020-04-28 清华大学深圳国际研究生院 Image classification method for constructing class center
CN112241715A (en) * 2020-10-23 2021-01-19 北京百度网讯科技有限公司 Model training method, expression recognition method, device, equipment and storage medium
CN114118370A (en) * 2021-11-19 2022-03-01 北京的卢深视科技有限公司 Model training method, electronic device, and computer-readable storage medium
CN113887538A (en) * 2021-11-30 2022-01-04 北京的卢深视科技有限公司 Model training method, face recognition method, electronic device and storage medium
CN116152587A (en) * 2022-07-06 2023-05-23 马上消费金融股份有限公司 Training method of expression recognition model, facial expression recognition method and facial expression recognition device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
引入注意力机制和中心损失的表情识别算法 (Expression recognition algorithm introducing attention mechanism and center loss); Zhang Xiang et al.; 传感器与微系统 (Transducer and Microsystem Technologies), No. 11, pp. 154-157 *

Also Published As

Publication number Publication date
CN116912920A (en) 2023-10-20

Similar Documents

Publication Publication Date Title
CN110020592B (en) Object detection model training method, device, computer equipment and storage medium
CN109583501B (en) Method, device, equipment and medium for generating image classification and classification recognition model
CN110188829B (en) Neural network training method, target recognition method and related products
CN111444765B (en) Image re-identification method, training method of related model, related device and equipment
CN110598603A (en) Face recognition model acquisition method, device, equipment and medium
CN113298152B (en) Model training method, device, terminal equipment and computer readable storage medium
CN113469088A (en) SAR image ship target detection method and system in passive interference scene
CN113065525B (en) Age identification model training method, face age identification method and related device
CN111310743B (en) Face recognition method and device, electronic equipment and readable storage medium
CN112820322A (en) Semi-supervised audio event labeling method based on self-supervised contrast learning
CN114091594A (en) Model training method and device, equipment and storage medium
CN116612500B (en) Pedestrian re-recognition model training method and device
CN113255557A (en) Video crowd emotion analysis method and system based on deep learning
CN111783688A (en) Remote sensing image scene classification method based on convolutional neural network
CN111652320A (en) Sample classification method and device, electronic equipment and storage medium
CN116912920B (en) Expression recognition method and device
CN114155388B (en) Image recognition method and device, computer equipment and storage medium
CN115731422A (en) Training method, classification method and device of multi-label classification model
CN113723431B (en) Image recognition method, apparatus and computer readable storage medium
CN115359296A (en) Image recognition method and device, electronic equipment and storage medium
CN115062769A (en) Knowledge distillation-based model training method, device, equipment and storage medium
CN114299304A (en) Image processing method and related equipment
CN114359786A (en) Lip language identification method based on improved space-time convolutional network
CN112613341A (en) Training method and device, fingerprint identification method and device, and electronic device
CN114444554A (en) Handwritten number recognition method and device, electronic equipment and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant