CN108537168B - Facial expression recognition method based on transfer learning technology - Google Patents

Facial expression recognition method based on transfer learning technology Download PDF

Info

Publication number
CN108537168B
Authority
CN
China
Prior art keywords
model
data
face
samples
domain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810309575.XA
Other languages
Chinese (zh)
Other versions
CN108537168A (en)
Inventor
杨云
赵航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Yunnan University YNU
Original Assignee
Yunnan University YNU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Yunnan University YNU filed Critical Yunnan University YNU
Priority to CN201810309575.XA priority Critical patent/CN108537168B/en
Publication of CN108537168A publication Critical patent/CN108537168A/en
Application granted granted Critical
Publication of CN108537168B publication Critical patent/CN108537168B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174Facial expression recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting

Abstract

The invention discloses a facial expression recognition method based on a transfer learning technology, which comprises the steps of collecting facial data in real time, recognizing and intercepting facial images, carrying out gray processing, extracting local binary pattern (LBP) features of the faces, and comparing average faces to determine the domain of each test sample; the model library is checked for a corresponding transfer model, and if one exists the picture file is put into that model for prediction, otherwise testing continues. Model training and prediction are carried out with a transfer learning method that samples the source-domain samples: when target-domain samples are insufficient, the sampling of the source domain is guided by a small number of target-domain samples; if a sample is judged to be related data, the new sample is marked, the retained source-domain data and the target-domain data participate together in the next round of supervised machine learning model training, and the newly trained model can be used to predict future facial expression categories of the target domain. The recognition accuracy is high when cross-domain facial expressions are classified in a real environment.

Description

Facial expression recognition method based on transfer learning technology
Technical Field
The invention belongs to the technical field of facial data processing, and particularly relates to a facial expression recognition method based on a transfer learning technology.
Background
Transfer learning is a very active research direction in the current machine learning field; it was proposed to solve machine learning problems in which the probability distributions of the data in the source domain and the target domain differ, or in which the source and target domains are related but have different tasks. However, transfer learning still has shortcomings: most of it is studied extensively in academia, while applications of transfer learning in industry are comparatively few.
Facial expression recognition is an important research area in the artificial intelligence and pattern recognition disciplines. It aims to identify facial expression categories in real scenes through feature engineering and pattern recognition methods.
Existing facial expression recognition centers on training data with feature engineering and supervised machine learning methods to obtain a facial expression classification model for recognizing the expression of a new sample. The mainstream method mainly comprises the following steps: a) recognizing the human face; b) extracting image features from the face, where commonly used features include LBP [1] or SIFT [2] features — LBP is a local binary pattern feature that largely prevents facial image features from deviating greatly due to illumination, while SIFT features are local image features that remain invariant to rotation, scale scaling and brightness changes, and also keep a certain degree of stability against affine transformation and noise; c) adding labels to the facial data, marking expression categories such as "fear", "anger", "joy", "neutral", "sadness", "surprise" and the like; d) using the vectorized facial expression sample data obtained in step b) and the label vectors corresponding to the sample set to learn a classification model with a supervised learning method, such as a support vector machine, naive Bayes, a neural network or another classification algorithm; e) solidifying and applying the learned model in a specific product.
Most of the existing technologies use software or a reference facial expression data set in a production environment for classification model training, and the classification model has the following technical disadvantages:
(1) The amount of training data in the actual environment (target domain) is too small.
Existing transfer learning has two fundamental problems. On the one hand, the amount of target-domain data is small, so auxiliary-domain data must be found for instance transfer; the domain data that can assist the target domain is single, and its distribution easily differs from the target-domain distribution, causing a negative transfer phenomenon. On the other hand, if the target-domain data is to be grown continuously, labeling the unlabeled data in the target domain requires a great deal of manual work and expert knowledge, consumes many resources, and the correctness of the assigned class labels cannot be guaranteed.
(2) Source domain training samples have limitations.
Most existing expression data model training is based on reference data sets. These data sets are unique and are often shot, collected or processed under specific conditions; famous facial expression reference data sets such as the KDEF data set have a single shooting background and basically uniform hues, whereas backgrounds in real scenes are often not single and the hues are varied, so predicting expressions in a specific environment with a model trained on a single data set often produces deviations.
(3) The data model is fixed in the software product, and updating iteration of the model cannot be carried out according to a specific production environment.
At present, software products based on machine learning or pattern recognition are usually trained with training data at the initial production stage to obtain a good model, but the distribution of that training data may be inconsistent with the data in the actual environment; if the model cannot be updated and iterated in time, the software product based on it may gradually fail to adapt to the new environment. For some software vendors, repeatedly training new models for new environments consumes considerable manpower and material resources, and a solidified data model does not help enterprises save cost.
(4) The transfer learning technique is rarely used in practical application products.
At present, practical applications based on transfer learning are few; most results are academic. Considering that in many practical problems the data often cannot satisfy the identical-distribution assumption of traditional machine learning (i.e., that the marginal and conditional probability distributions of the training and test data are the same), transfer learning can solve such problems well, using the source domain as auxiliary information to promote the solution of the target task.
The invention analyzes these practical problems and, combining the characteristics of the transfer learning technology, innovatively proposes using transfer learning to solve cross-domain facial expression recognition and classification; it is expected not only to perfect academic achievements in scientific research, but also to provide a highly usable transfer scheme for practical problems in real scenes.
Disclosure of Invention
The invention aims to provide a facial expression recognition method based on a transfer learning technology which achieves high recognition accuracy, without model under-fitting, when cross-domain facial expressions are classified in a real environment.
The invention adopts the technical scheme that a facial expression recognition method based on a transfer learning technology is carried out according to the following steps:
step 1, acquiring real-time facial data, recognizing and intercepting the facial image, performing gray processing and LBP feature extraction of the face, finally comparing average faces to determine the domain of each test sample, and checking whether a corresponding transfer model exists in the model library; if so, the picture file is put directly into the model for prediction, otherwise step 2 is executed;
and 2, performing model training and prediction by adopting a transfer learning method for sampling source domain samples, guiding the source domain samples to be sampled through a small number of samples in the target domain under the condition that the target domain samples are insufficient, marking new samples if relevant data samples are judged, participating in the next round of training of the supervised machine learning model by using the source domain data and the target domain data which are sampled and reserved, and predicting the facial expression category of the future target domain by using the model obtained by the new training.
Further, in the step 1, a Java environment is used as a running carrier.
Further, in the step 1, the face image is extracted using the face recognition class CascadeClassifier of OpenCV.
Further, in step 1, the LBP feature extraction of the face adopts the following formula:

$$\mathrm{LBP}(x_c,y_c)=\sum_{p=0}^{P-1}2^{p}\,s(i_p-i_c)$$

wherein

$$s(x)=\begin{cases}1,&x\geq 0\\0,&x<0\end{cases}$$

$(x_c,y_c)$ are the coordinates of the central pixel, $\mathrm{LBP}(x_c,y_c)$ is the LBP feature of the central pixel, $p$ indexes the $p$-th neighborhood pixel, $i_p$ is the gray value of the neighborhood pixel, $i_c$ is the gray value of the central pixel, and $s(x)$ is the sign function;

then the intercepted face picture is divided into blocks and an LBP histogram feature is built for each block; the vectors of all blocks are concatenated into one sample vector.
Further, in step 1, the domain to which each test sample belongs is determined by comparing average faces according to the following formula:

$$\mathrm{face}_{\mathrm{average}}(D_k)=\frac{1}{m}\sum_{i=1}^{m}\mathrm{face}_i^{D_k}$$

where $\mathrm{face}_i^{D_k}$ denotes the gray tensor of the $i$-th face of the $k$-th domain $D_k$, and $\mathrm{face}_{\mathrm{average}}(D_k)$ denotes the average of the face gray tensors of that domain;

when a new image sample is added to the model, the image is compared with the average faces of the candidate target domains and the domain to which the measured sample belongs is returned; denoting the returned target domain by $d$, the process is formalized as

$$d=\arg\min_{k}\left\|\mathrm{face}_{\mathrm{new}}-\mathrm{face}_{\mathrm{average}}(D_k)\right\|$$

and the domain of the measured sample is obtained from this formula.
Further, in the step 2, the supervised machine learning algorithm adopts a support vector machine, and a libsvm function library is used in an implementation method.
Further, in step 2, the transfer learning method is: if an instance in the source domain is predicted correctly, its weight should be increased, otherwise decreased; and if a data sample in the target domain is predicted incorrectly, its weight needs to be increased so that the next round of training treats it with emphasis.
The invention has the beneficial effects that: the method solves the problem that the model can obtain good recognition accuracy in the production environment but can generate lower accuracy in the real environment due to certain data distribution difference between the facial expression data in the real environment and the model training data during production of related facial expression recognition products. In addition, because the data in the real environment is relatively less, if samples in the real environment are collected and trained on the model, an under-fitting phenomenon of the model can be generated. For the data model training under the above conditions, the data acquisition and the data model training can be optimized by using the transfer learning technology, so as to obtain a better classifier and an actual classification result. Through the transfer learning technology, a user can quickly obtain a small number of suitable samples in the target field, namely the living environment where the user is located, and the model suitable for the local environment is trained in an auxiliary mode through a sampling method by combining the self-contained source domain, namely the reference training data set. Therefore, on one hand, the inadaptability of source field data to a new field model is avoided, and on the other hand, the defect that a trained classification model cannot be obtained due to too small sample amount of the target field is overcome.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
FIG. 1 is a flow chart of the method of the present invention.
Fig. 2 shows the original image read.
Figure 3 is the extraction of faces using OpenCV.
Fig. 4 is a graph of the effect after the formula gray scale processing.
Fig. 5 is a diagram after the face image blocking process.
Fig. 6 is a diagram of fetching a blocked LBP feature.
Fig. 7 is a graph of LBP features resulting in a complete image.
FIG. 8 is a SVM classification diagram.
FIG. 9 is a process diagram for dynamically adding target domain training.
Fig. 10 is a software initialization diagram for facial expression recognition test based on transfer learning.
Fig. 11 is a software layout for facial expression recognition testing based on transfer learning.
Fig. 12 is a category map predicting different expressions in an actual environment.
FIG. 13 is a graph showing the results of classification with migration and without migration.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The flow of the facial expression recognition method based on the transfer learning technology is shown in fig. 1. Technically and functionally, the method comprises two modules: (1) acquiring the face image in real time; (2) using the transfer learning technology to obtain a classification model that predicts the expression categories of the collected images. A detailed description of each module follows.
(1) Real-time facial image acquisition module
The invention uses Java environment as running carrier, and in the real-time face image acquisition module, the invention comprises the following specific work:
a) face recognition and facial image interception using third-party OpenCV open source tool library
Face recognition is performed using the OpenCV face detection class CascadeClassifier (face extraction is performed in combination with the "lbpcascade_frontalface" file); after the face is recognized, the image is cut out and processed into a new image of 100 × 100 pixels. The images before and after cropping are shown in figs. 2-3.
b) Graying processing
Since a grayscale image does not affect the recognition of facial expressions, and reducing the number of channels increases processing speed, graying is applied. Specifically, the 3-channel data of the original image is converted into a single channel; the most common graying formula is as follows, where R, G, B respectively represent the red, green and blue channel values of a pixel and Gray is the corresponding gray value:
Gray=0.30*R+0.59*G+0.11*B
the front and back effects of the image gray scale processing are shown in fig. 3-4.
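A minimal NumPy sketch of this graying formula (the patent's carrier is a Java environment; the function name `to_gray` and the H × W × 3 array layout are assumptions of this illustration):

```python
import numpy as np

def to_gray(rgb):
    """Convert an H x W x 3 RGB image to a single-channel gray image
    using Gray = 0.30*R + 0.59*G + 0.11*B from the description."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return 0.30 * r + 0.59 * g + 0.11 * b

# one pure-red, pure-green and pure-blue pixel
img = np.array([[[255, 0, 0], [0, 255, 0], [0, 0, 255]]], dtype=float)
print(to_gray(img))  # approx [76.5, 150.45, 28.05]
```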
c) Face-to-face LBP feature extraction
LBP stands for Local Binary Pattern, an operator for describing the local features of an image; LBP features have notable advantages such as gray-scale invariance and rotation invariance. Because LBP features are simple to compute and effective, they are widely applied in many fields of computer vision, especially face recognition and object detection. The LBP operator is defined on a 3 × 3 pixel neighborhood: with the center pixel as threshold, the gray values of the 8 adjacent pixels are compared with the center value; if a surrounding pixel value is not less than the center value, its position is marked 1, otherwise 0. Comparing the 8 points of the 3 × 3 neighborhood thus generates an 8-bit binary number, which, read in order, is the LBP value of the center pixel; there are therefore 2^8 = 256 possible LBP values. The LBP value of the center pixel reflects the texture information of the area around it. To adapt to texture features of different scales and to meet gray-scale and rotation invariance requirements, LBP has been improved by extending the 3 × 3 neighborhood to an arbitrary neighborhood. The LBP formula is as follows:
$$\mathrm{LBP}(x_c,y_c)=\sum_{p=0}^{P-1}2^{p}\,s(i_p-i_c)$$

wherein

$$s(x)=\begin{cases}1,&x\geq 0\\0,&x<0\end{cases}$$

$(x_c,y_c)$ are the coordinates of the central pixel, $p$ indexes the $p$-th neighborhood pixel, $i_p$ is the gray value of the neighborhood pixel, $i_c$ is the gray value of the central pixel, and $s(x)$ is the sign function;
according to the LBP feature extraction method, the intercepted face picture is subjected to blocking processing, as shown in figure 5, LBP histogram features are established for each block, and as the LBP value interval is [0,255], the feature value of each block is a 256-dimensional vector, as shown in figure 6; the vectors of all the blocks (the original block of 5 × 5 is constructed by the present invention) are linked into one sample vector, which is represented as a vector of 6400 dimensions 5 × 5 × 256, as shown in fig. 7.
d) Compare average faces
First, the face pictures of persons in one country or region (e.g., China, the USA) are defined as a domain. The average face of the m face pictures of the k-th domain $D_k$ is computed by the following formula, where $\mathrm{face}_i^{D_k}$ represents the gray-value tensor matrix of the original image of the $i$-th facial expression in domain $D_k$, and $\mathrm{face}_{\mathrm{average}}(D_k)$, the mean of the m face tensors of the domain, is still a matrix:

$$\mathrm{face}_{\mathrm{average}}(D_k)=\frac{1}{m}\sum_{i=1}^{m}\mathrm{face}_i^{D_k}$$
An average face tensor, expressed as a matrix, is calculated for each domain; each test sample is assigned to a domain by comparing it against these average faces, and model prediction is then performed with the transfer model of the corresponding domain.
When a new face image sample (the actual face image collected in the image acquisition stage) is added to the model, the image is compared with the average faces of the candidate target domains and the domain of the measured sample is returned. Denoting the returned target domain by $d$ and the average face tensor of the $k$-th domain by $\mathrm{face}_{\mathrm{average}}(D_k)$, this is formalized as

$$d=\arg\min_{k}\left\|\mathrm{face}_{\mathrm{new}}-\mathrm{face}_{\mathrm{average}}(D_k)\right\|$$

After the domain (target domain) of the measured sample is obtained from the above formula, the model library is checked for a corresponding transfer model; if one exists, the picture file is put directly into that model for prediction, otherwise the transfer model training module is used.
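The average-face comparison above can be sketched in NumPy (the distance measure is not fixed by the description — Euclidean distance is assumed here, as are the synthetic `dom_a`/`dom_b` domains):

```python
import numpy as np

def average_face(faces):
    """face_average(D_k): element-wise mean of the m gray tensors of a domain."""
    return np.mean(faces, axis=0)

def assign_domain(sample, domain_averages):
    """Return the index d of the domain whose average face is closest to
    the measured sample (Euclidean distance assumed)."""
    dists = [np.linalg.norm(sample - avg) for avg in domain_averages]
    return int(np.argmin(dists))

rng = np.random.default_rng(1)
dom_a = rng.normal(80, 5, (10, 100, 100))    # hypothetical darker-toned domain
dom_b = rng.normal(170, 5, (10, 100, 100))   # hypothetical lighter-toned domain
averages = [average_face(dom_a), average_face(dom_b)]

probe = rng.normal(170, 5, (100, 100))       # resembles the second domain
print(assign_domain(probe, averages))  # 1
```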
(2) Migration model training module
In the method, 3-4 pictures are given to each category, the data set of the target domain can be dynamically added by a user later, and the method for dynamically adding the data samples refers to the introduction of the updated data part in the following (3).
Regarding the selection of the migration model, the invention adopts a sampling migration method in example migration. The method is characterized in that under the condition that a target field sample is insufficient, a source field sample is guided to be sampled through the target field, and finally, the source field data and the target field data which are sampled and reserved participate in the training of a supervised machine learning model together to obtain a training model.
a) Through the domain parameter d obtained in step d) of module (1), a small amount of target-domain data is selected and used to train a classification model; the supervised machine learning algorithm adopts a support vector machine, and the libsvm function library is used in the implementation.
Support Vector Machine (SVM)
The support vector machine is a classic algorithm for the binary classification problem; before neural network algorithms became popular, many classification data models were trained with SVMs. The classic SVM is a convex optimization problem: under a linearly separable data distribution, it seeks the support vectors produced at the maximum inter-class margin and the optimal separating plane they determine. The prediction function of the model is $f(x)=w^{T}x+b$, where $x$ is the sample vector and the parameters $w$ and $b$ are the hyper-parameters of the model; setting $f(x)=0$, i.e., $w^{T}x+b=0$, gives the linear separating plane equation determined by $w$ and $b$.
The distance from any point in the sample space to this hyperplane can be written as:

$$r=\frac{\left|w^{T}x+b\right|}{\|w\|}$$
to find the maximum r value, the purpose of the support vector machine becomes to solve the transformed objective optimization function:
$$\min_{w,b}\ \frac{1}{2}\|w\|^{2}$$
$$\text{s.t.}\quad y_i\left(w^{T}x_i+b\right)\geq 1,\quad i=1,2,\ldots,m.
$$
The result of the solution can be classified as shown in FIG. 8, where the solid black line $w^{T}x+b=0$ is the optimal separating plane: if the sample to be predicted lies on the $w^{T}x+b=1$ side, its predicted value is positive, and if it lies on the $w^{T}x+b=-1$ side, its predicted value is negative.
In addition, the support vector machine solves the classification of low-dimensional, simple data samples; for high-dimensional, linearly inseparable data samples, it can map them to a high-dimensional space by the kernel-function method, where the originally linearly inseparable data become linearly separable, so that a hyperplane dividing the data can be obtained in the high-dimensional space.
Since there are more than two expression categories to distinguish, the multi-classification scheme of the support vector machine is used here to handle multi-expression classification. For the parameters of the multi-class SVM: the SVM type is first set to classification (C-SVC), i.e., the parameter "-s 0"; for the kernel function the invention selects the linear kernel, i.e., the parameter "-t 0"; probability estimation is enabled with the parameter "-b 1", so that prediction outputs the probability value of the predicted category label; the other support vector machine parameters are left at their defaults.
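A small numeric illustration of the prediction function $f(x)=w^{T}x+b$ and the point-to-hyperplane distance above (the weights `w`, `b` here are assumed, not trained; in the patent, training itself is done through libsvm):

```python
import numpy as np

def svm_predict(w, b, x):
    """Linear SVM prediction f(x) = w^T x + b: positive class if f(x) > 0."""
    return 1 if w @ x + b > 0 else -1

def margin_distance(w, b, x):
    """Distance r = |w^T x + b| / ||w|| from x to the separating hyperplane."""
    return abs(w @ x + b) / np.linalg.norm(w)

w = np.array([3.0, 4.0])   # assumed, already-trained hyper-parameters
b = -5.0
x = np.array([3.0, 4.0])   # f(x) = 9 + 16 - 5 = 20 > 0
print(svm_predict(w, b, x), margin_distance(w, b, x))  # 1 4.0
```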
b) And b) verifying the data set of the source field by using the model trained in the step a), if the related data samples are judged, the samples of the source field are similar to the samples of the same type labels of the target field, the establishment of the model of the target field can be assisted, at the moment, the new samples are marked, and the sample data is used for participating in training in the next round of training process to obtain a new model.
Adding pseudo tag data
Adding pseudo labels is an idea from semi-supervised machine learning methods, aimed at increasing the samples available for model training. Because the target domain addressed by the invention is uncertain, sample data of a specific domain cannot be obtained at the start, and classification models for the different domains cannot be prepared in advance. Therefore, lacking enough labeled sample data, suitable samples can only be selected from the source domain to assist the training of the target-domain model; the labeled source-domain data cannot be used directly, however, so the samples are selected with a screening algorithm, and the sampled samples are used in the same way as in semi-supervision to expand the training data.
c) The second round of training uses all target domain data and the sampled source domain data samples to retrain a new domain model.
After all the data sampled in the target domain and the source domain are collected, the two parts of data are added into a training model together for training, and the training parameters still use the parameters in a) to realize classification.
Explanation on the sample migration method in the present invention:
The invention, as a transfer learning algorithm based on sample screening, is an extension of the classic instance-transfer algorithm TrAdaBoost. The traditional TrAdaBoost method is given below and comprises the following steps:
First, the inputs are: the class-labeled data of the source domain, a small amount of class-labeled data of the target domain, the unlabeled target-domain data S, a base classifier, and the number of iterations N. The number of labeled data samples in the source domain is n, and in the target domain is m.
(I) Before the first round of classification model training, weights are assigned to all samples (source domain and target domain):

$$\mathbf{w}^{1}=\left(w^{1}_{1},\ldots,w^{1}_{n+m}\right)$$

where $\mathbf{w}^{1}$ represents the weight of each source-domain and target-domain sample in round 1 of the transfer learning training process. (II) At the t-th iterative training of the model, the weights are normalized, namely:

$$\mathbf{p}^{t}=\frac{\mathbf{w}^{t}}{\sum_{i=1}^{n+m}w^{t}_{i}}$$
(III) The weak classifier learns the t-th classification prediction model $h_t(x)$ using the weight distribution, and the error rate $\varepsilon_t$ of the model on the target domain is calculated:

$$\varepsilon_{t}=\sum_{i=n+1}^{n+m}\frac{w^{t}_{i}\left|h_{t}(x_i)-c(x_i)\right|}{\sum_{j=n+1}^{n+m}w^{t}_{j}}$$

where $h_t(x_i)$ is the label predicted for sample $x_i$ by the model $h_t(x)$ obtained in the t-th round of training, and $c(x_i)$ is the true category label of $x_i$.
(IV) The weights of the source-domain and target-domain samples are modified according to the error rate:

$$w^{t+1}_{i}=w^{t}_{i}\,\beta^{\left|h_{t}(x_i)-c(x_i)\right|},\quad i=1,\ldots,n,\qquad \beta=\frac{1}{1+\sqrt{2\ln n/N}}$$

$$w^{t+1}_{i}=w^{t}_{i}\,\beta_{t}^{-\left|h_{t}(x_i)-c(x_i)\right|},\quad i=n+1,\ldots,n+m,\qquad \beta_{t}=\frac{\varepsilon_{t}}{1-\varepsilon_{t}}$$
Steps (III) and (IV) are iterated N times, and the final classifier model is obtained by the ensemble method:

$$h_{f}(x)=\begin{cases}1,&\prod_{t=\lceil N/2\rceil}^{N}\beta_{t}^{-h_{t}(x)}\geq\prod_{t=\lceil N/2\rceil}^{N}\beta_{t}^{-\frac{1}{2}}\\0,&\text{otherwise}\end{cases}$$
this final model can then be used on the S data set.
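The weight update of step (IV) can be sketched in NumPy; the β and β_t formulas follow the classic TrAdaBoost formulation (reconstructed here as an assumption, with labels encoded so that |h_t(x_i) − c(x_i)| is 0 or 1):

```python
import numpy as np

def tradaboost_weight_update(w, err_ind, n, eps_t, N):
    """One TrAdaBoost weight update (classic formulation, assumed).
    w       : weights of the n source samples followed by the m target samples
    err_ind : |h_t(x_i) - c(x_i)| per sample (1 = mispredicted, 0 = correct)
    eps_t   : weighted error rate on the target domain
    N       : total number of boosting rounds"""
    beta = 1.0 / (1.0 + np.sqrt(2.0 * np.log(n) / N))
    beta_t = eps_t / (1.0 - eps_t)
    w = w.copy()
    w[:n] *= beta ** err_ind[:n]        # mispredicted source samples shrink
    w[n:] *= beta_t ** (-err_ind[n:])   # mispredicted target samples grow
    return w

w = np.full(6, 1.0 / 6)             # n = 4 source + m = 2 target samples
err = np.array([1, 0, 0, 0, 1, 0])  # one source error, one target error
w2 = tradaboost_weight_update(w, err, n=4, eps_t=0.3, N=10)
print(w2[0] < w[0], w2[4] > w[4])  # True True
```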
In the traditional TrAdaBoost algorithm, each sample instance is given a weight, consistent with the weighting idea in AdaBoost: an increased weight means the sample must be emphasized in the next round of learning. Unlike AdaBoost, where all samples are data instances of the same domain under the same task, transfer learning involves two parts of data instances: the samples of the source domain and the samples of the target domain. Here the rule given by the algorithm is: if an instance in the source domain is predicted correctly, its weight should be increased, otherwise decreased; while if a data sample in the target domain is predicted incorrectly, its weight needs to be increased so that the next round of training treats it with emphasis.
In the transfer algorithm of the invention, first, there is no unlabeled target-domain sample set S; and since facial expression recognition based on transfer learning requires real-time feedback of the expression category prediction results, in order to economize time and computational resources (both hardware and data resources), the training process is compressed herein to a single round, and the weighting problem is reduced to a Boolean one: weights 0 and 1 determine whether a source-domain sample is retained or discarded, and discarded samples do not participate in the second round of training. The second round of training determines the final model for predicting future sample categories.
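This single-round Boolean sampling can be sketched as follows, with a nearest-centroid stand-in for the SVM base classifier (the classifier choice and the toy two-cluster data are assumptions of this illustration):

```python
import numpy as np

def fit_centroids(X, y):
    """Stand-in base classifier (assumed; the patent uses an SVM):
    one mean vector per class label."""
    labels = np.unique(y)
    return labels, np.array([X[y == c].mean(axis=0) for c in labels])

def predict_centroids(model, X):
    labels, cents = model
    dists = np.linalg.norm(X[:, None, :] - cents[None, :, :], axis=2)
    return labels[np.argmin(dists, axis=1)]

def sample_source(Xs, ys, Xt, yt):
    """Single-round Boolean sampling: train on source + target together,
    then keep (weight 1) only the source samples predicted correctly and
    drop (weight 0) the rest before the second round of training."""
    model = fit_centroids(np.vstack([Xs, Xt]), np.concatenate([ys, yt]))
    keep = predict_centroids(model, Xs) == ys
    return Xs[keep], ys[keep]

rng = np.random.default_rng(2)
# small target domain: class 0 near (0,0), class 1 near (3,3)
Xt = np.vstack([rng.normal(0, 0.3, (6, 2)), rng.normal(3, 0.3, (6, 2))])
yt = np.array([0] * 6 + [1] * 6)
# source domain: mostly consistent, plus one mislabelled outlier at (3,3)
Xs = np.vstack([rng.normal(0, 0.3, (8, 2)), rng.normal(3, 0.3, (8, 2)),
                [[3.0, 3.0]]])
ys = np.array([0] * 8 + [1] * 8 + [0])
Xk, yk = sample_source(Xs, ys, Xt, yt)
print(len(yk))  # 16 - the outlier labelled 0 at (3,3) is dropped
```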
This completes the basic transfer model training; the model obtained in step c) is placed in the model library, which avoids having to retrain the model the next time.
(3) Updating data
In fig. 1, the arrow from the predicted class to the target domain data set can be regarded as a process of supplementing the target domain data set: in actual use, the target domain data set gradually grows, which is very beneficial for the target domain data model. An optional process for dynamically adding new samples and retraining is shown in fig. 9 below.
The flow chart comprises two modules: a transfer-model live-measurement module and a transfer training module. The former is driven by the user through the user interface (for example, by opening the camera); the latter transfer training part is carried out by the system. The data-update operation is implemented in the transfer training module.
a) First, the user opens the camera through the transfer-model live-measurement module to obtain a face picture. The system performs feature conversion on the face picture to construct an LBP vector of the facial expression, and this vector is then used as a test case: it is fed into the model to obtain a predicted class label for the expression.
b) Instances predicted accurately in step a) are confirmed a second time: if, besides being predicted accurately, the test case also has a high confidence (for example above 0.9; the confidence is the return value obtained by enabling the probability-estimate parameter of the support vector machine), it is put into the training set, so that the target-domain sample size keeps growing over time.
c) It should be noted that each stored sample is represented in the file as an LBP vector; the original image is not retained.
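Steps a)–c) above might be condensed into a sketch like this (Python for illustration; the 0.9 threshold comes from the text, while the function and parameter names are assumptions):

```python
CONF_THRESHOLD = 0.9  # confidence bound suggested in the text

def maybe_add_sample(train_set, lbp_vector, predicted_label, confidence, confirmed):
    """Append a measured sample to the target-domain training set only when the
    prediction was confirmed AND the model was confident; only the LBP vector
    is stored, the original image is discarded."""
    if confirmed and confidence > CONF_THRESHOLD:
        train_set.append((lbp_vector, predicted_label))
        return True
    return False

train_set = []
maybe_add_sample(train_set, [0.1, 0.3, 0.6], 'Happy', 0.95, confirmed=True)  # stored
maybe_add_sample(train_set, [0.2, 0.2, 0.6], 'Sad', 0.70, confirmed=True)    # too uncertain, skipped
```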
(4) Software model of the invention
In order to verify the method and steps provided by the invention, an actual software product implementing the method was also validated. The product model provides the following logical model specification of a facial expression recognition and classification system product based on the transfer learning technique:
a) Initializing the source domain and target domain data, and initializing the various path contents, as shown in fig. 10.
b) Demo layout and live measurement in the actual product, as shown in fig. 11.
During live measurement the transfer mode is invoked, i.e. "Transferring mode is on" is in the selected state, giving the data results of fig. 12.
Besides the live measurement of each category, the invention also compares the classification performance of the transfer and non-transfer modes during verification. The non-transfer mode means predicting the collected sample using only the source domain model or only the target domain model; the results are shown in fig. 13. As can be seen from fig. 13, predicting the example directly with the source domain model gives a wrong result, while predicting with the small-sample target domain model is correct but differs by about 12% from the classification result of the transfer model. Using the transfer learning algorithm therefore yields the better performance.
The key points of the invention are as follows: 1. A transfer-learning-based method for screening source domain samples to assist in training the target domain model is added to traditional facial expression recognition. The method uses a classification model trained on the target domain data to guide the retention or deletion of source domain data; the retained data and the target domain data jointly undergo the next round of model training, and the newly trained model can be used to predict future facial expression categories. 2. A technique for dynamically expanding the target domain sample set. New samples can be added continuously to enrich the target domain sample set, so that the prediction model can be optimized and the defect of a solidified model in a software product is avoided.
Specific advantages of the invention are described below:
Aiming at the scarcity of target domain samples, facial expression data supplemented from a related domain is used to assist the training of the current target domain model;
Aiming at the limited distribution of the source domain data, the invention uses the target domain to guide source domain data selection, ensuring that the sampled source domain data benefit the training of the target domain model;
Aiming at the defect that traditional related model training cannot change dynamically, the invention can dynamically update the model before each iteration as required, so that the trained model better fits the current environment;
Aiming at the fact that transfer learning has rarely been used in concrete applications, the invention innovatively applies the transfer learning technique to an actual software product.
In summary, the classification model is trained with data sets from two parts of the actual environment (source domain and target domain), and samples are first screened using the transfer learning technique. The invention discloses a classification algorithm based on the transfer learning technique, which improves the original TrAdaBoost method to match the requirements of the actual environment. In addition, the invention avoids the defect of the solidified model found in similar earlier software products: the model need not be fixed in the software product, and new data samples can be added dynamically according to the actual situation, which helps to dynamically update and optimize the model. Even with a small amount of data, a sufficiently good model can be trained and applied to transfer tasks in a related domain with the same task.
In addition, the invention also tests relevant benchmark data sets on top of the practical application, to confirm the feasibility of the method in practice. Among the benchmark data sets, the KDEF data set and the Yale data set are used here for two-domain validation, with the expression classes Happy, Sad, Normal and Surprised common to both data sets used for 4-class training.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, for the system embodiment, since it is substantially similar to the method embodiment, the description is simple, and for the relevant points, reference may be made to the partial description of the method embodiment.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (1)

1. A facial expression recognition method based on the transfer learning technique, characterized by comprising the following steps:
step 1, acquiring real-time facial data, recognizing and cropping the facial image, performing grayscale processing and local binary pattern (LBP) feature extraction of the face, and finally comparing against the average faces to determine the domain of each test sample; checking whether a corresponding transfer model exists in the model library; if so, directly feeding the picture file into the model for prediction, otherwise executing step 2;
step 2, performing model training and prediction by a transfer learning method that samples source domain samples: under the condition that the target domain samples are insufficient, a small number of target domain samples guide the sampling of the source domain samples; source samples judged to be relevant data are marked as new samples; the sampled and retained source domain data together with the target domain data participate in the next round of training of a supervised machine learning model, and the newly trained model is used to predict the facial expression categories of the future target domain;
in the step 1, the average faces are compared to determine the domain to which each test sample belongs as follows: the facial image data of people from a certain country or region is defined as one domain, and the average face is computed by the following formula:

face_average(D_k) = (1/m) · Σ_{i=1..m} face_i(D_k)

wherein face_i(D_k) denotes the i-th facial gray tensor of the k-th domain D_k, and face_average(D_k) denotes the average of the facial gray tensors of the domain D_k;
when a new sample image is fed to the model, the image is compared with the average faces of the several target domains, and the domain to which the measured sample belongs is returned; denoting the returned target domain by d, the process is formalized as

d = arg min_k ‖image − face_average(D_k)‖

wherein k refers to the serial number of the domain;
the domain of the measured sample is obtained through the above formula;
in the step 1, a Java environment is used as an operation carrier;
in the step 1, extracting a face picture by using a face recognition function CascadeClassifier of OpenCV;
in the step 1, the LBP feature extraction of the face adopts the following formula:

LBP(x_c, y_c) = Σ_{p=0..P−1} 2^p · s(i_p − i_c)

wherein

s(x) = 1 if x ≥ 0, and s(x) = 0 otherwise;

(x_c, y_c) are the coordinates of the central pixel, LBP(x_c, y_c) is the LBP feature of the central pixel, p is the p-th pixel of the neighborhood, i_p is the gray value of the neighborhood pixel, i_c is the gray value of the central pixel, and s(x) is the sign function;
then the cropped face picture is divided into blocks, LBP histogram features are built for each block, and the vectors of all the blocks are concatenated into one sample vector;
in the step 2, the supervised machine learning algorithm adopts a support vector machine, implemented using the libsvm function library;
in the step 2, the transfer learning method is as follows: if an instance in the source domain is predicted correctly, its weight should be increased, otherwise it is decreased; if a data sample in the target domain is predicted incorrectly, its weight needs to be increased, and it must be emphasized in the next round of training.
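For illustration only (not part of the claim), the per-pixel LBP formula above can be computed as in the following Python sketch; the clockwise 8-neighbourhood ordering is an assumption, since any fixed ordering yields a valid 8-bit LBP code:

```python
def lbp_value(img, xc, yc):
    """LBP code of the pixel at (xc, yc) over its 8-neighbourhood.
    img is a 2-D list of gray values; s(x) = 1 if x >= 0 else 0."""
    # neighbourhood offsets (dy, dx), clockwise from the top-left pixel
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    ic = img[yc][xc]
    code = 0
    for p, (dy, dx) in enumerate(offsets):
        ip = img[yc + dy][xc + dx]
        code += (1 << p) * (1 if ip - ic >= 0 else 0)  # 2^p * s(i_p - i_c)
    return code

img = [[10, 20, 30],
       [40, 50, 60],
       [70, 80, 90]]
center = lbp_value(img, 1, 1)  # compares the 8 neighbours with the gray value 50
```

In the method of claim 1 these per-pixel codes would then be accumulated into per-block histograms and concatenated into one sample vector.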
CN201810309575.XA 2018-04-09 2018-04-09 Facial expression recognition method based on transfer learning technology Active CN108537168B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810309575.XA CN108537168B (en) 2018-04-09 2018-04-09 Facial expression recognition method based on transfer learning technology


Publications (2)

Publication Number Publication Date
CN108537168A CN108537168A (en) 2018-09-14
CN108537168B true CN108537168B (en) 2021-12-31

Family

ID=63483362

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810309575.XA Active CN108537168B (en) 2018-04-09 2018-04-09 Facial expression recognition method based on transfer learning technology

Country Status (1)

Country Link
CN (1) CN108537168B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109376766B (en) * 2018-09-18 2023-10-24 平安科技(深圳)有限公司 Portrait prediction classification method, device and equipment
CN110458223B (en) * 2019-08-06 2023-03-17 湖南省华芯医疗器械有限公司 Automatic detection method and detection system for bronchial tumor under endoscope
CN111274973B (en) * 2020-01-21 2022-02-18 同济大学 Crowd counting model training method based on automatic domain division and application
CN111523683B (en) * 2020-07-06 2020-10-30 北京天泽智云科技有限公司 Method and system for predicting technological parameters in tobacco processing
CN111998936B (en) * 2020-08-25 2022-04-15 四川长虹电器股份有限公司 Equipment abnormal sound detection method and system based on transfer learning
CN112330488B (en) * 2020-11-05 2022-07-05 贵州电网有限责任公司 Power grid frequency situation prediction method based on transfer learning
CN112966601A (en) * 2021-03-05 2021-06-15 上海深硅信息科技有限公司 Method for artificial intelligence teachers and apprentices to learn by semi-supervision
CN115019084A (en) * 2022-05-16 2022-09-06 电子科技大学 Classification method based on tensor multi-attribute feature migration

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104915351A (en) * 2014-03-12 2015-09-16 华为技术有限公司 Picture sorting method and terminal
CN105447473A (en) * 2015-12-14 2016-03-30 江苏大学 PCANet-CNN-based arbitrary attitude facial expression recognition method
CN107316015A (en) * 2017-06-19 2017-11-03 南京邮电大学 A kind of facial expression recognition method of high accuracy based on depth space-time characteristic
CN107358169A (en) * 2017-06-21 2017-11-17 厦门中控智慧信息技术有限公司 A kind of facial expression recognizing method and expression recognition device
CN107545243A (en) * 2017-08-07 2018-01-05 南京信息工程大学 Yellow race's face identification method based on depth convolution model
CN107742107A (en) * 2017-10-20 2018-02-27 北京达佳互联信息技术有限公司 Facial image sorting technique, device and server




Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant