CN111160327A - Expression recognition method based on lightweight convolutional neural network - Google Patents

Expression recognition method based on lightweight convolutional neural network Download PDF

Info

Publication number
CN111160327A
Authority
CN
China
Prior art keywords
neural network
convolutional neural
lightweight
expression
convolution
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010252867.1A
Other languages
Chinese (zh)
Other versions
CN111160327B (en)
Inventor
赵光哲
张雷
杨瀚霆
朱娜
邵帅
田军伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Civil Engineering and Architecture
Original Assignee
Beijing University of Civil Engineering and Architecture
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Civil Engineering and Architecture filed Critical Beijing University of Civil Engineering and Architecture
Priority to CN202010252867.1A priority Critical patent/CN111160327B/en
Publication of CN111160327A publication Critical patent/CN111160327A/en
Application granted granted Critical
Publication of CN111160327B publication Critical patent/CN111160327B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06T5/80
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/50Extraction of image or video features by performing operations within image blocks; by using histograms, e.g. histogram of oriented gradients [HoG]; by summing image-intensity values; Projection analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/10Image acquisition modality
    • G06T2207/10004Still image; Photographic image
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20084Artificial neural networks [ANN]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/30Subject of image; Context of image processing
    • G06T2207/30196Human being; Person
    • G06T2207/30201Face

Abstract

The invention relates to the field of artificial intelligence, and in particular provides an expression recognition method based on a lightweight convolutional neural network, comprising the following steps. S1: building and training a lightweight convolutional network model, where the number of convolutional layers ranges from 36 to 58, the number of grouped-convolution groups ranges from 2 to 4, and the compression factor of the compression layers ranges from 0.3 to 0.5. S2: building a face corrector. S3: detecting and correcting an input image with the face corrector to obtain a preprocessed image. S4: classifying the facial expressions in the preprocessed image with the lightweight convolutional neural network model. The invention addresses the low recognition accuracy and low recognition speed of the prior art, offering high real-time performance while maintaining accuracy.

Description

Expression recognition method based on lightweight convolutional neural network
Technical Field
The invention relates to the field of computer vision, in particular to an expression recognition method based on a lightweight convolutional neural network.
Background
Emotion is a cognitive experience produced by humans under intense psychological activity and is an important element guiding communication in social environments. Emotions arise from many sources, including mood, character and motivation. Facial expressions, as a unique signaling system, convey a person's psychological state and are one of the effective means of analyzing emotion. Expression recognition mainly comprises four stages: face localization, face correction, feature extraction and expression classification. Feature extraction and expression classification are the key stages and the core difficulties of expression recognition. Conventional methods extract facial information with manually designed features: geometric features based on geometric attributes of the image, and appearance features based on its grayscale information. These methods achieve high recognition accuracy on specific data distributions, but handle large pose variations poorly and generalize badly to other data sets. In recent years, data-driven methods have attracted attention. Convolutional neural network models, for example, learn features directly from data through weight sharing and downsampling, and are robust to variations in pose, occlusion and illumination. To obtain higher accuracy, however, researchers keep deepening the models, and the resulting parameter counts are excessive, which hinders both training and practical deployment.
Disclosure of Invention
In order to solve the technical problems of low recognition accuracy and low recognition speed in the prior art, the invention provides an expression recognition method based on a lightweight convolutional neural network, which uses a computation-reduction parameter N to determine the parameters of the lightweight convolutional network model and comprises the following steps:
S1: building and training a lightweight convolutional network model, and acquiring input image information with the model; the number of convolutional layers of the lightweight convolutional network model ranges from 36 to 58, and the compression factor of the compression layers ranges from 0.3 to 0.5;
S2: building a face corrector;
S3: detecting and correcting the input image information with the face corrector to obtain a preprocessed image;
S4: classifying the facial expressions in the preprocessed image with the lightweight convolutional neural network model;
the building and training of the lightweight convolutional network model comprises the following steps:
S1.1: building the network model, with the output of each convolutional layer passed into the subsequent convolutional layers as additional input, where the number of initial grouped-convolution groups is 2 to 4 and each dense block contains no fewer than 12 convolutional layers;
S1.2: determining the structural parameters of the lightweight convolutional neural network: the growth rate k, the convolution filter length H_k, the convolution filter width W_k and the number of convolutional layers l.
Determining these structural parameters comprises: calculating the computation-reduction parameter N of the lightweight convolutional neural network model (the exact expression for N is rendered as an image in the source, as a function of k, H_k, W_k and l), and taking the growth rate k, filter length H_k, filter width W_k and layer count l that minimize N as the parameters of the lightweight convolutional network model, where k is the growth rate of the structural parameters, H_k is the length of the convolution filter, W_k is the width of the convolution filter, and l is the number of convolutional layers.
Preferably, the S1 includes: training the lightweight convolutional neural network model on the FERPLUS expression recognition database.
Preferably, the face corrector is built using HOG features and an SVM algorithm.
Preferably, the S3 includes: detecting at least four reference points of the face in the input image through regression trees, matching the at least four reference points with the face corrector, and segmenting the input image according to the at least four reference points to obtain the preprocessed image.
Preferably, the step of training the lightweight convolutional neural network comprises:
acquiring training samples comprising at least 1000 first expression images;
flipping, rotating, cropping, scaling and deforming each first expression image with a data augmentation method to obtain at least 10 corresponding second expression images;
randomly masking at least one picture block in each second expression image to obtain a third expression image with a blank area;
training the lightweight convolutional neural network model with the third expression images.
Preferably, the step of building the face corrector with HOG features and an SVM algorithm comprises:
acquiring training samples comprising at least 1000 standard face images;
computing the gradient value and gradient direction of the HOG features of the standard face images as:

G_x(x, y) = I(x+1, y) − I(x−1, y)
G_y(x, y) = I(x, y+1) − I(x, y−1)
G(x, y) = √(G_x(x, y)² + G_y(x, y)²)
θ(x, y) = arctan(G_y(x, y) / G_x(x, y))

where x is the abscissa and y the ordinate of a pixel, each ranging from 0 to 255; G_x(x, y) and G_y(x, y) are the gradient values of the image I in the horizontal and vertical directions at point (x, y); G(x, y) is the gradient value of the pixel; and θ(x, y) is the gradient direction of the pixel, limited to between 0 and 180 degrees;
building a face SVM model according to the support vector machine principle;
training the face SVM model with the gradient values and gradient directions of the HOG features of the standard face images to obtain a training result;
forming a face detector from the training result.
Preferably, the step of training the lightweight convolutional neural network model on the FERPLUS expression recognition database includes calculating the updated gradient value:

v_t = α·v_(t−1) − β·g_t,  θ ← θ + v_t

where v_t is the current gradient update direction, v_(t−1) is the update direction of the previous gradient step, g_t is the current gradient calculated from the second derivative of the gradient, α and β are the decay weights, and θ is the updated gradient value.
Preferably, an expression classifier for at least one class, constructed with the lightweight convolutional neural network model, obtains the predicted probability of each expression from the features fed into the Softmax layer, calculated as:

P(y = i | x_i; W) = exp(W_i^T · x_i) / Σ_j exp(W_j^T · x_i),  i, j = 1, …, n

where y_i is the label of the i-th expression class, x_i is the input feature of the i-th class, W generically denotes all the weights of the dense network, P is the vector composed of the predicted probabilities of all expressions, the superscript T denotes transposition, and i, j and n are integer variables.
According to the technical scheme of the invention, on the basis of face localization and face correction, a lightweight convolution scheme is realized by using the computation-reduction parameter N, so that the computation load is reduced while the accuracy of the dense convolutional network is preserved, giving the advantages of high accuracy and low computation. The invention combines facial feature extraction and expression classification in a single lightweight convolutional neural network model to realize facial expression recognition, achieves recognition of facial expressions with a single camera and image processing in a laboratory environment, offers high real-time performance while maintaining accuracy, and effectively analyzes facial expression information.
Drawings
Fig. 1 is a flowchart of the expression recognition method based on a lightweight convolutional neural network according to an embodiment of the present invention.
Fig. 2 is a detection schematic of the face detector according to an embodiment of the present invention.
Fig. 3 is a correction schematic of the face corrector according to an embodiment of the present invention.
Figs. 4a to 4e show the accuracy results of the lightweight convolutional neural network recognition method according to the first embodiment of the present invention on the validation data set.
Fig. 5a compares the model parameter counts of the lightweight convolutional neural network recognition method according to the first embodiment of the present invention with other models.
Fig. 5b compares the model computation amounts of the lightweight convolutional neural network recognition method according to the first embodiment of the present invention with other models.
Fig. 6a is a learning curve of the lightweight convolutional neural network recognition method on the data set FER2013 according to the first embodiment of the present invention.
Fig. 6b is a learning curve of the lightweight convolutional neural network recognition method on the data set FERPLUS according to the first embodiment of the present invention.
Fig. 6c is a learning curve of the lightweight convolutional neural network recognition method on the data set FERFIN according to the first embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the scope of the present invention.
Example one
The embodiment provides an expression recognition method based on a lightweight convolutional neural network.
As shown in fig. 1, the expression recognition method based on a lightweight convolutional neural network provided in this embodiment comprises four parts: single-frame image input, face detection, face correction and expression recognition. Starting from the original input image, the facial expression class is predicted after the two image-processing stages.
The light weight in this embodiment refers to a convolution computation scheme that is more efficient and requires less computation than standard convolution, reducing the computational complexity of the convolutional network and improving computational efficiency. The dense number refers to the number of densely connected convolutional layers; within a certain range, the more convolutional layers, the higher the model accuracy. This embodiment lightens the computation scheme of the convolutional network without changing its dense connectivity, obtaining a lightweight neural network structure by optimizing the convolutional network parameters and improving operating efficiency while maintaining the accuracy that the dense connectivity provides.
First, recognition accuracy and computation time are the two criteria for detecting and localizing faces in a human-computer interaction environment. Given the real-time requirement of the expression recognition system, features and learning algorithms with higher computation speed must be selected, on the premise of a certain accuracy, to optimize the parameters of the lightweight convolution model.
Since dense connections make the model's computation heavy and its training slow, the lightweight convolutional neural network used in this embodiment optimizes the convolutional layers. The network model is built with the output of each convolutional layer passed into the subsequent convolutional layers as additional input, where the number of initial grouped-convolution groups is 2 to 4 and each dense block contains no fewer than 12 convolutional layers.
The computation-reduction parameter N is calculated from the growth rate k, the convolution filter length H_k, the convolution filter width W_k and the number of convolutional layers l (the exact formula is rendered as an image in the source), and the values of k, H_k, W_k and l that minimize N are taken as the lightweight convolution model parameters.
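As a concrete illustration of this structure, the following is a minimal PyTorch sketch of one dense block whose layers are grouped depthwise-separable convolutions; the growth rate of 12 and the 12 layers per dense block follow the embodiment, while the class names, the stem channel count and the group count of 2 are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GroupedSeparableLayer(nn.Module):
    """One lightweight layer: a depthwise convolution followed by a grouped
    1x1 (pointwise) convolution, instead of one full standard convolution."""
    def __init__(self, in_ch, growth_rate=12, kernel=3, groups=2):
        super().__init__()
        self.op = nn.Sequential(
            nn.BatchNorm2d(in_ch),
            nn.ReLU(inplace=True),
            # depthwise: one kernel x kernel filter per input channel
            nn.Conv2d(in_ch, in_ch, kernel, padding=kernel // 2,
                      groups=in_ch, bias=False),
            # grouped pointwise: mixes channels within each of `groups` groups
            nn.Conv2d(in_ch, growth_rate, 1, groups=groups, bias=False),
        )

    def forward(self, x):
        return self.op(x)

class DenseBlock(nn.Module):
    """Each layer's output is concatenated into the input of all subsequent
    layers (dense connectivity); 12 layers per block as in the embodiment."""
    def __init__(self, in_ch, num_layers=12, growth_rate=12):
        super().__init__()
        self.layers = nn.ModuleList(
            GroupedSeparableLayer(in_ch + i * growth_rate, growth_rate)
            for i in range(num_layers)
        )

    def forward(self, x):
        for layer in self.layers:
            x = torch.cat([x, layer(x)], dim=1)  # dense connectivity
        return x

# e.g. a 48x48 face mapped to a 24-channel stem first:
# DenseBlock(in_ch=24)(torch.randn(1, 24, 48, 48))  # -> (1, 24 + 12*12, 48, 48)
```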
This embodiment forms a face detector based on HOG features and the SVM algorithm to detect the face position in a single-frame image. Specifically, training samples comprising 3000 face images from the LFW database are acquired; the HOG features of the face images are computed according to the histogram-of-oriented-gradients generation method; a face detection SVM model is trained with the extracted HOG features; and the face detector is formed from the training result. The HOG gradient is calculated as:

G_x(x, y) = I(x+1, y) − I(x−1, y)
G_y(x, y) = I(x, y+1) − I(x, y−1)
G(x, y) = √(G_x(x, y)² + G_y(x, y)²)
θ(x, y) = arctan(G_y(x, y) / G_x(x, y))

where x is the abscissa and y the ordinate of a pixel, each ranging from 0 to 255; G_x(x, y) and G_y(x, y) are the gradient values of the image I in the horizontal and vertical directions at point (x, y); G(x, y) is the gradient value of the pixel; and θ(x, y) is the gradient direction of the pixel, limited to between 0 and 180 degrees.
As shown in fig. 2, after the original image is input, the HOG feature representation of the image is first computed, the trained standard-face HOG features are then compared against it, and finally the position of the face in the original image is located and output.
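A minimal sketch of such a detector with off-the-shelf libraries follows. The patent specifies only the positive LFW face images, so the negative (non-face) patches, the window size and the stride here are illustrative assumptions.

```python
import numpy as np
from skimage.feature import hog
from sklearn.svm import LinearSVC

def hog_descriptor(gray):
    # 9 orientation bins over 0-180 degrees, the usual HOG configuration
    return hog(gray, orientations=9, pixels_per_cell=(8, 8),
               cells_per_block=(2, 2), block_norm='L2-Hys')

def train_face_detector(face_patches, non_face_patches):
    """face_patches / non_face_patches: lists of equally sized grayscale arrays."""
    X = np.array([hog_descriptor(p) for p in face_patches + non_face_patches])
    y = np.array([1] * len(face_patches) + [0] * len(non_face_patches))
    return LinearSVC(C=1.0).fit(X, y)

def detect_faces(clf, gray, win=64, stride=16):
    """Slide a window over the image; keep windows the SVM scores as face."""
    boxes = []
    h, w = gray.shape
    for top in range(0, h - win + 1, stride):
        for left in range(0, w - win + 1, stride):
            patch = gray[top:top + win, left:left + win]
            if clf.decision_function([hog_descriptor(patch)])[0] > 0:
                boxes.append((left, top, win, win))
    return boxes
```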
This embodiment uses an ensemble of regression trees to detect reference points in the face image block and thereby correct the face in the single-frame image. Specifically, training samples comprising 2000 training face images and 330 test face images are acquired; the samples are trained with a regression-tree ensemble using shape-invariant feature splits; and the face corrector is constructed from the training result. Fig. 3 is a correction schematic of the face corrector of the first embodiment. As shown in fig. 3, after a face image block is input, the 68 feature points of the face are first computed, then compared with the 68 feature points of a standard face, and the face image block is finally corrected.
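The regression-tree, 68-landmark approach described here matches what dlib ships as its pre-trained shape predictor, so a working correction stage can be sketched with that library; using dlib and its model file is an assumption of this sketch, not part of the patent.

```python
import dlib

detector = dlib.get_frontal_face_detector()
# Pre-trained 68-point regression-tree landmark model (Kazemi-Sullivan style)
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

def correct_face(img):
    """Detect a face, locate its 68 reference points, and return an aligned crop."""
    faces = detector(img, 1)
    if not faces:
        return None
    shape = predictor(img, faces[0])                # 68 feature points
    return dlib.get_face_chip(img, shape, size=48)  # rotate/scale to canonical pose
```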
This embodiment uses the lightweight convolutional neural network to perform feature extraction and prediction on the corrected face and obtain the expression classification. After the lightweight convolutional network model is built, the model parameters are optimized through extremum calculation; the facial expression recognition database FERFIN is acquired; the expression data are preprocessed, e.g. with data augmentation; the model is trained on the preprocessed expression image data set; and the training result is taken as the final expression classification model.
Considering that applications in real environments require high real-time performance and that an oversized neural network architecture increases computation, the number of convolutional layers of the lightweight convolutional network model is set to the range 36-58 and the compression factor of the compression layers to the range 0.3-0.5, and the growth rate k, filter length H_k, filter width W_k and layer count l that minimize the computation-reduction parameter N are adopted as the model parameters, which helps reduce the number of model parameters while learning more characteristic features.
Between the dense blocks are transition layers that compress parameters and adjust the computation variables. After 3 dense blocks, the feature tensor computed by the model is fed into a fully connected layer whose combined kernel function maps the features extracted from the image into a 1×7 vector, where the value at each position represents the confidence of the corresponding expression category. The vector composed of the predicted probabilities of all expressions is calculated as:

P(y = i | x_i; W) = exp(W_i^T · x_i) / Σ_j exp(W_j^T · x_i),  i, j = 1, …, n

where y_i is the label of the i-th expression class, x_i is the input feature of the i-th class, W generically denotes all the weights of the dense network, P is the vector composed of the predicted probabilities of all expressions, the superscript T denotes transposition, and i, j and n are integer variables.
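The mapping from pooled features to the 1×7 probability vector is ordinary softmax regression, which can be sketched in a few lines of numpy; the feature dimension of 128 is an illustrative assumption.

```python
import numpy as np

def softmax_probs(features, W):
    """Map a pooled feature vector to the 1x7 expression-probability vector.

    features: (d,) features from the last dense block
    W: (7, d) weights of the fully connected layer
    """
    logits = W @ features    # one confidence score per expression class
    logits -= logits.max()   # subtract the max for numerical stability
    exp = np.exp(logits)
    return exp / exp.sum()   # P, the predicted probability vector

rng = np.random.default_rng(0)
p = softmax_probs(rng.normal(size=128), rng.normal(size=(7, 128)))
print(p.argmax(), p.sum())   # predicted class index; probabilities sum to 1
```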
This embodiment trains the expression classifier with the lightweight convolutional neural network. A deep convolutional neural network model needs a large amount of training data to reach high accuracy, so the FERFIN data set is adopted as the training data set. FERFIN was refined from the FER2013 data set and contains 12858 "neutral" images, 9354 "happy" images, 4462 "surprised" images, 4351 "sad" images, 3082 "angry" images, 575 "disgusted" images and 816 "afraid" images, for a total of 35498 grayscale facial expression images of 48 by 48 pixels.
Considering the pose, illumination and occlusion variations present in the expression recognition task, this embodiment preprocesses the FERFIN database in two steps. Using data augmentation, each original picture is flipped, rotated, cropped, scaled and deformed to obtain twelve new pictures; blocks of 16 by 16 pixels are then randomly erased from the new pictures to obtain pictures with blank areas.
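A minimal torchvision sketch of this two-step preprocessing follows; the 48×48 input size, the twelve variants and the 16×16 erased block follow the text, while the specific transform parameters (rotation angle, crop scale, shear) are illustrative assumptions.

```python
from torchvision import transforms

# Step 1: flip/rotate/crop/scale/deform; step 2: erase one 16x16 block.
# 16*16 = 256 of 48*48 = 2304 pixels, so the erased area ratio is 256/2304.
augment = transforms.Compose([
    transforms.RandomHorizontalFlip(),
    transforms.RandomRotation(10),
    transforms.RandomResizedCrop(48, scale=(0.8, 1.0)),
    transforms.RandomAffine(degrees=0, shear=10),  # mild deformation
    transforms.ToTensor(),
    transforms.RandomErasing(p=1.0, scale=(256 / 2304, 256 / 2304),
                             ratio=(1.0, 1.0)),    # square 16x16 blank area
])

def make_variants(pil_face):
    """Twelve augmented variants per original 48x48 grayscale face (PIL image)."""
    return [augment(pil_face) for _ in range(12)]
```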
To accelerate the convergence of the lightweight convolutional network model, this embodiment obtains the current gradient with the following momentum method instead of conventional gradient descent:

v_t = α·v_(t−1) − β·g_t,  θ ← θ + v_t

where v_t and v_(t−1) are the update directions of the current and previous gradient steps, g_t is the current gradient calculated from the second derivative of the gradient, α and β are the two decay weights, and θ is the updated gradient value.
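A sketch of one such momentum step in numpy, under the classical-momentum reading of the formula above (the concrete decay values are illustrative):

```python
import numpy as np

def momentum_step(theta, v_prev, grad, alpha=0.9, beta=0.01):
    """One momentum update: blend the previous update direction with the
    current gradient, so steps accelerate along consistently downhill
    directions and converge faster than plain gradient descent."""
    v = alpha * v_prev - beta * grad  # current update direction v_t
    return theta + v, v               # updated parameters and new direction

# Toy usage: minimize f(theta) = ||theta||^2, whose gradient is 2*theta.
theta, v = np.ones(3), np.zeros(3)
for _ in range(100):
    theta, v = momentum_step(theta, v, 2 * theta)
print(np.round(theta, 4))  # approaches the minimum at the origin
```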
As shown in figs. 4a, 4b, 4c, 4d and 4e, the PGC-DenseNet model used to optimize DenseNet comprises 3 dense blocks with 12 convolutional layers each. Before each input image enters a dense block, all convolutional layers are optimized into grouped depthwise-separable convolutions, and the parameters are optimized with the computation-reduction parameter N. Comparing the parameter-optimized model with other popular lightweight networks shows that the method surpasses the other models in accuracy: it converges earlier, reaches 80% accuracy faster, and attains higher final accuracy.
As shown in fig. 5a, the lightweight models compared on parameter count are, from left to right, PGC-DenseNet, SqueezeNet, ShuffleNet1, ShuffleNet2, MobileNet1, MobileNet2 and MobileNet3; as shown in fig. 5b, the same models are compared in the same order on computation amount. As figs. 5a and 5b show, PGC-DenseNet has the smallest parameter count among the models while its computation remains of the same order of magnitude: it contains only about 250,000 parameters, up to 6 times fewer than the other lightweight models.
Figs. 6a, 6b and 6c are the learning curves of the PGC-DenseNet model on the data sets FER2013, FERPLUS and FERFIN respectively; in each figure the smoother, continuously rising curve lying above is the training set, and the curve that fluctuates more and tends to converge is the test set. As figs. 6a, 6b and 6c clearly show, the learning curves of the model on the training and validation sets exhibit sufficient robustness against overfitting, with the training and validation curves fitting closely within 150 epochs.
The invention provides an expression recognition method based on a lightweight convolutional neural network, comprising the steps of: preprocessing facial expression data, training the lightweight convolutional neural network model to obtain an expression classification model, training the face corrector, detecting the face in a single-frame image, correcting and recording the face in the single-frame image, and recognizing the expression in the single-frame image to obtain the expression classification. The invention uses the computation-reduction parameter N to optimize the model parameters and adopts a lightweight convolution scheme, so that computation is reduced while the accuracy of the dense convolutional network is preserved, combining high accuracy with low computation; online expression recognition is realized with a single camera and a network transmission scheme in a laboratory environment, with high real-time performance under guaranteed accuracy.
The above-mentioned embodiments, objects, technical solutions and advantages of the present invention are further described in detail, it should be understood that the above-mentioned embodiments are only examples of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (8)

1. An expression recognition method based on a lightweight convolutional neural network, characterized in that a computation-reduction parameter N is used to determine the parameters of the lightweight convolutional network model, comprising the steps of:
S1: building and training a lightweight convolutional network model, and acquiring input image information with the model; the number of convolutional layers of the lightweight convolutional network model ranges from 36 to 58, and the compression factor of the compression layers ranges from 0.3 to 0.5;
S2: building a face corrector;
S3: detecting and correcting the input image information with the face corrector to obtain a preprocessed image;
S4: classifying the facial expressions in the preprocessed image with the lightweight convolutional neural network model;
the building and training of the lightweight convolutional network model comprises the following steps:
S1.1: building the network model, with the output of each convolutional layer passed into the subsequent convolutional layers as additional input, where the number of initial grouped-convolution groups is 2 to 4 and each dense block contains no fewer than 12 convolutional layers;
S1.2: determining the structural parameters of the lightweight convolutional neural network: the growth rate k, the convolution filter length H_k, the convolution filter width W_k and the number of convolutional layers l;
wherein determining these structural parameters comprises: calculating the computation-reduction parameter N of the lightweight convolutional neural network model (the exact expression for N is rendered as an image in the source, as a function of k, H_k, W_k and l), and taking the growth rate k, filter length H_k, filter width W_k and layer count l that minimize N as the parameters of the lightweight convolutional network model, where k is the growth rate of the structural parameters, H_k is the length of the convolution filter, W_k is the width of the convolution filter, and l is the number of convolutional layers.
2. The expression recognition method based on a lightweight convolutional neural network of claim 1, wherein the S1 includes: training the lightweight convolutional neural network model on the FERPLUS expression recognition database.
3. The expression recognition method based on a lightweight convolutional neural network of claim 1, wherein the face corrector is built using HOG features and an SVM algorithm.
4. The expression recognition method based on a lightweight convolutional neural network of claim 1, wherein the S3 includes: detecting at least four reference points of the face in the input image through regression trees, matching the at least four reference points with the face corrector, and segmenting the input image according to the at least four reference points to obtain the preprocessed image.
5. The expression recognition method based on a lightweight convolutional neural network of claim 2, wherein the step of training the lightweight convolutional neural network comprises:
acquiring training samples comprising at least 1000 first expression images;
flipping, rotating, cropping, scaling and deforming each first expression image with a data augmentation method to obtain at least 10 corresponding second expression images;
randomly masking at least one picture block in each second expression image to obtain a third expression image with a blank area;
training the lightweight convolutional neural network model with the third expression images.
6. The expression recognition method based on a lightweight convolutional neural network of claim 3, wherein the step of building the face corrector with HOG features and an SVM algorithm comprises:
acquiring training samples comprising at least 1000 standard face images;
computing the gradient value and gradient direction of the HOG features of the standard face images as:

G_x(x, y) = I(x+1, y) − I(x−1, y)
G_y(x, y) = I(x, y+1) − I(x, y−1)
G(x, y) = √(G_x(x, y)² + G_y(x, y)²)
θ(x, y) = arctan(G_y(x, y) / G_x(x, y))

where x is the abscissa and y the ordinate of a pixel, each ranging from 0 to 255; G_x(x, y) and G_y(x, y) are the gradient values of the image I in the horizontal and vertical directions at point (x, y); G(x, y) is the gradient value of the pixel; and θ(x, y) is the gradient direction of the pixel, limited to between 0 and 180 degrees;
building a face SVM model according to the support vector machine principle;
training the face SVM model with the gradient values and gradient directions of the HOG features of the standard face images to obtain a training result;
forming a face detector from the training result.
7. The expression recognition method based on a lightweight convolutional neural network of claim 1, wherein the step of training the lightweight convolutional neural network model on the FERPLUS expression recognition database includes calculating the updated gradient value:

v_t = α·v_(t−1) − β·g_t,  θ ← θ + v_t

where v_t is the current gradient update direction, v_(t−1) is the update direction of the previous gradient step, g_t is the current gradient calculated from the second derivative of the gradient, α and β are the decay weights, and θ is the updated gradient value.
8. The expression recognition method based on a lightweight convolutional neural network of claim 1, wherein an expression classifier for at least one class, constructed with the lightweight convolutional neural network model, obtains the predicted probability of each expression from the features fed into the Softmax layer, calculated as:

P(y = i | x_i; W) = exp(W_i^T · x_i) / Σ_j exp(W_j^T · x_i),  i, j = 1, …, n

where y_i is the label of the i-th expression class, x_i is the input feature of the i-th class, W generically denotes all the weights of the dense network, P is the vector composed of the predicted probabilities of all expressions, the superscript T denotes transposition, and i, j and n are integer variables.
CN202010252867.1A 2020-04-02 2020-04-02 Expression recognition method based on lightweight convolutional neural network Active CN111160327B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010252867.1A CN111160327B (en) 2020-04-02 2020-04-02 Expression recognition method based on lightweight convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010252867.1A CN111160327B (en) 2020-04-02 2020-04-02 Expression recognition method based on lightweight convolutional neural network

Publications (2)

Publication Number Publication Date
CN111160327A true CN111160327A (en) 2020-05-15
CN111160327B CN111160327B (en) 2020-06-30

Family

ID=70567689

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010252867.1A Active CN111160327B (en) 2020-04-02 2020-04-02 Expression recognition method based on lightweight convolutional neural network

Country Status (1)

Country Link
CN (1) CN111160327B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113642477A (en) * 2021-08-17 2021-11-12 苏州大学 Character recognition method, device and equipment and readable storage medium
CN116958703A (en) * 2023-08-02 2023-10-27 德智鸿(上海)机器人有限责任公司 Identification method and device based on acetabulum fracture

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109753922A (en) * 2018-12-29 2019-05-14 北京建筑大学 Anthropomorphic robot expression recognition method based on dense convolutional neural networks
CN109829923A (en) * 2018-12-24 2019-05-31 五邑大学 A kind of antenna for base station based on deep neural network has a down dip angle measuring system and method
CN110853630A (en) * 2019-10-30 2020-02-28 华南师范大学 Lightweight speech recognition method facing edge calculation
WO2020051776A1 (en) * 2018-09-11 2020-03-19 Intel Corporation Method and system of deep supervision object detection for reducing resource usage

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020051776A1 (en) * 2018-09-11 2020-03-19 Intel Corporation Method and system of deep supervision object detection for reducing resource usage
CN109829923A (en) * 2018-12-24 2019-05-31 五邑大学 A kind of antenna for base station based on deep neural network has a down dip angle measuring system and method
CN109753922A (en) * 2018-12-29 2019-05-14 北京建筑大学 Anthropomorphic robot expression recognition method based on dense convolutional neural networks
CN110853630A (en) * 2019-10-30 2020-02-28 华南师范大学 Lightweight speech recognition method facing edge calculation

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
孙若钒 et al.: "VansNet lightweight convolutional neural network", Journal of Guizhou University (Natural Science Edition) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113642477A (en) * 2021-08-17 2021-11-12 苏州大学 Character recognition method, device and equipment and readable storage medium
CN116958703A (en) * 2023-08-02 2023-10-27 德智鸿(上海)机器人有限责任公司 Identification method and device based on acetabulum fracture

Also Published As

Publication number Publication date
CN111160327B (en) 2020-06-30

Similar Documents

Publication Publication Date Title
CN110569795B (en) Image identification method and device and related equipment
CN108133188B (en) Behavior identification method based on motion history image and convolutional neural network
CN109815826B (en) Method and device for generating face attribute model
CN109492529A (en) A kind of Multi resolution feature extraction and the facial expression recognizing method of global characteristics fusion
CN112801040B (en) Lightweight unconstrained facial expression recognition method and system embedded with high-order information
CN113158862A (en) Lightweight real-time face detection method based on multiple tasks
CN113420794B (en) Binaryzation Faster R-CNN citrus disease and pest identification method based on deep learning
CN111160327B (en) Expression recognition method based on lightweight convolutional neural network
CN113705290A (en) Image processing method, image processing device, computer equipment and storage medium
CN111476178A (en) Micro-expression recognition method based on 2D-3D CNN
Xu et al. Face expression recognition based on convolutional neural network
CN112883931A (en) Real-time true and false motion judgment method based on long and short term memory network
CN112418032A (en) Human behavior recognition method and device, electronic equipment and storage medium
CN111401116B (en) Bimodal emotion recognition method based on enhanced convolution and space-time LSTM network
CN115393944A (en) Micro-expression identification method based on multi-dimensional feature fusion
CN112836748A (en) Casting identification character recognition method based on CRNN-CTC
CN111126364A (en) Expression recognition method based on packet convolutional neural network
CN113343773B (en) Facial expression recognition system based on shallow convolutional neural network
CN113052132A (en) Video emotion recognition method based on face key point track feature map
TWI722383B (en) Pre feature extraction method applied on deep learning
CN113469116A (en) Face expression recognition method combining LBP (local binary pattern) features and lightweight neural network
Shijin et al. Research on classroom expression recognition based on deep circular convolution self-encoding network
CN112819133A (en) Construction method of deep hybrid neural network emotion recognition model
Nayak et al. Facial Expression Recognition based on Feature Enhancement and Improved Alexnet
Duan An object recognition method based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant