CN109492529A - Facial expression recognition method based on multi-scale feature extraction and global feature fusion - Google Patents


Info

Publication number
CN109492529A
CN109492529A (application CN201811167972.4A)
Authority
CN
China
Prior art keywords
training
data
test set data
random
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811167972.4A
Other languages
Chinese (zh)
Inventor
Wang Haibo (王海波)
Ye Bin (叶宾)
Li Huijun (李会军)
Zhang Jiaming (张家铭)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China University of Mining and Technology CUMT
Chinese University of Hong Kong CUHK
Original Assignee
China University of Mining and Technology CUMT
Chinese University of Hong Kong CUHK
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China University of Mining and Technology CUMT, Chinese University of Hong Kong CUHK filed Critical China University of Mining and Technology CUMT
Priority to CN201811167972.4A priority Critical patent/CN109492529A/en
Publication of CN109492529A publication Critical patent/CN109492529A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition
    • G06V40/175 Static expression
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168 Feature extraction; Face representation
    • G06V40/169 Holistic features and representations, i.e. based on the facial image taken as a whole

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a facial expression recognition method based on multi-scale feature extraction and global feature fusion. A facial expression data set is selected as the raw data and divided into training set data and test set data. A convolutional neural network with multi-scale feature extraction and global feature fusion is constructed using the TensorFlow machine learning framework. The network reads the training set data, preprocesses it, and performs model training; it then reads the test set data and identifies the expression category of each test sample in turn. After all expressions have been identified, the average accuracy and the average F1-score over all expressions are computed, completing the facial expression recognition process. The invention recognizes quickly while maintaining high recognition accuracy and adapts to a variety of lighting environments with strong robustness, so it can effectively meet practical application requirements.

Description

Facial expression recognition method based on multi-scale feature extraction and global feature fusion
Technical field
The present invention relates to a facial expression recognition method, and in particular to a facial expression recognition method based on multi-scale feature extraction and global feature fusion.
Background art
In everyday communication, facial expression is an important modality. As a carrier of information, it conveys much that words cannot, which we call non-verbal information. Because the face can be observed directly during an exchange, expressions transmit emotional information in a very intuitive way. In recent years, affective analysis has gradually attracted the attention of many researchers, and facial expression recognition, as an important component of affective analysis, has developed rapidly, especially in intelligent human-computer interaction, where a computer infers a person's emotional state from captured expressions and performs emotion imitation.
Facial expression recognition algorithms based on traditional computer vision can generally be divided into two steps: feature extraction and feature classification. Many mature hand-engineered features exist, such as LBP, HOG, Haar, and Gabor features. These features are designed from human experimental experience and therefore contain subjective factors; they cannot fully and objectively reflect the distribution of features in an image, which introduces uncertainty into expression recognition. There are also many well-performing traditional classification algorithms, such as PCA+LDA, SVM, and AdaBoost. Before the deep learning boom these algorithms were widely favored for their excellent performance, and SVM in particular was once the preferred model for many classification problems. However, traditional computer vision methods have shortcomings: 1) manually designed feature extraction rules introduce subjective factors into the extracted features, so they cannot objectively reflect the true distribution of the data; 2) the best-performing feature extraction methods are often computationally expensive, making real-time recognition difficult on limited hardware resources.
To address these problems, one study (Zhang Guoliang, Zhao Zhujun, Du Jixiang, Wang Zhanni, Wang Tian. Research on an interactive expression robot system based on visual expression analysis [J]. Journal of Chinese Computer Systems, 2017(6): 1381-1386) proposed extracting facial features with multi-scale, multi-orientation Gabor wavelet filters, obtaining a better feature representation than LBP. The principle of the Gabor wavelet transform is to apply a Fourier transform to each time segment of a signal through a window function; the characteristics of its filters all resemble those of the human visual system, which is a major reason for its strong performance in computer vision problems. Liu Weifeng et al. (Liu Weifeng, Li Juan, Wang Yanjiang. LBP feature analysis of facial expressions [J]. Computer Engineering and Applications, 2011, 47(2): 149-152) first locate key points in regions such as the eyes, nose, and mouth by integral projection, then divide the face into several sub-regions and compute the LBP feature of each sub-region for final classification; this method achieves a classification accuracy on the JAFFE data set nine percentage points higher than the traditional LBP method. Another study (Andrew J. Calder, A. Mike Burton, Paul Miller, Andrew W. Young, Shigeru Akamatsu. A principal component analysis of facial expressions [J]. Vision Research, 2001, 41(9): 1179-1208) used the PCA algorithm to obtain a low-dimensional representation of the face for expression classification; its computation cost is large, and because PCA cannot preserve image detail well it ultimately cannot be applied well to practical problems, but the method is insensitive to face scale, orientation, and position, providing ideas for subsequent work. Yin Yong et al. (Yin Yong, Shi Jinyu, Liu Danping. Facial expression recognition based on Gabor wavelets [J]. Opto-Electronic Engineering, 2009, 36(5): 111-116) extract Gabor features of the face, obtain a low-dimensional representation of the Gabor features by PCA, and finally classify the feature vectors with a multi-class algorithm, achieving a considerably higher recognition rate on the JAFFE data set than the 2DPCA algorithm. Another work (Wu T., Bartlett M. S., Movellan J. R. Facial expression recognition using Gabor motion energy filters [C]. San Francisco: 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), 2010) obtains corresponding eigenfaces by Gabor filtering and classifies the dimension-reduced eigenfaces with an SVM. The University of California developed a system that can automatically adjust video playback speed according to a viewer's expression. In addition, the ATR laboratory in Japan released the Japanese female facial expression data set JAFFE, and also proposed a facial expression recognition technique based on Gabor wavelet coding (Michael J. Lyons, Shigeru Akamatsu, Miyuki Kamachi, Jiro Gyoba. Coding facial expressions with Gabor wavelets [C]. Nara: Third IEEE International Conference on Automatic Face and Gesture Recognition, 1998) and an automatic facial expression recognition system (Michael J. Lyons, Julien Budynek, Shigeru Akamatsu. Automatic classification of single facial images [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1999, 21(12): 1357-1362). However, these existing recognition methods are either slow when their expression recognition accuracy is high, or inaccurate when their recognition speed is fast.
Summary of the invention
In view of the above problems in the prior art, the present invention provides a facial expression recognition method based on multi-scale feature extraction and global feature fusion that recognizes quickly while maintaining high recognition accuracy, adapts to a variety of lighting environments with strong robustness, and can therefore effectively meet practical application requirements.
To achieve the above goals, the technical solution adopted by the present invention is a facial expression recognition method based on multi-scale feature extraction and global feature fusion, with the following specific steps:
A. Select a facial expression data set as the raw data and divide the raw data into training set data and test set data; convert the training set data and the test set data to TFRecord files with the TensorFlow machine learning framework and save them;
B. Construct a convolutional neural network with multi-scale feature extraction and global feature fusion using the TensorFlow machine learning framework: the multi-scale feature extraction module is placed at the entrance of the convolutional neural network, which consists of multiple convolutional layers, multiple global pooling layers, a fusion layer, and a classification layer. The convolutional layers extract feature maps at different depths and pass them to the global pooling layers; the pooled features are combined by the fusion layer through a global feature fusion pooling operation and fed to the classification layer. The classification layer distinguishes seven expression categories: angry, disgust, fear, happy, sad, surprise, and neutral, with English labels Angry, Disgust, Fear, Happy, Sad, Surprise, and Neutral;
C. The convolutional neural network reads the training set data and preprocesses it; preprocessing includes random flipping, random cropping, random brightness variation, and random Gaussian noise. The network is then trained with the preprocessed training set data. After each completed training round, the network reads the training set data, performs a facial expression recognition test, and obtains the facial expression recognition accuracy after that round, then proceeds to the next training round; intermediate model files are saved during training;
D. After each training round, output the training/test accuracy data and the training/test loss data and display them as curves updated in real time; by observing the training/test accuracy curves and the training/test loss curves, dynamically adjust the model parameters;
E. When the training/test accuracy no longer rises and stabilizes and the training/test loss no longer falls and stabilizes, end the model training of the convolutional neural network and save the final model parameters;
F. The convolutional neural network loads the final model parameters, then reads the test set data and identifies the expression category of each expression in the test set in turn. After all expressions have been identified, the average accuracy and the average F1-score over all expressions are computed, completing the facial expression recognition process.
Further, the specific process of step A is: select the FER2013 facial expression data set as the raw data; the image data file is in .csv format and is divided by its Usage attribute into Training, PrivateTest, and PublicTest parts, of which the Training subset is used as the training set data and the PrivateTest subset as the test set data. The training set data and the test set data are converted to TFRecord files with the TensorFlow machine learning framework and saved; this format improves the efficiency of reading data during training and facilitates image processing and storage.
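As a concrete illustration of step A, the following minimal sketch splits FER2013-style CSV rows by their Usage attribute. It is plain Python on a tiny in-memory example; the column names `emotion`, `pixels`, and `Usage` follow the public FER2013 release, and the helper name `split_fer2013` is illustrative, not from the patent.

```python
import csv
import io

def split_fer2013(csv_text):
    """Split FER2013 rows into Training / PrivateTest / PublicTest by Usage."""
    splits = {"Training": [], "PrivateTest": [], "PublicTest": []}
    for row in csv.DictReader(io.StringIO(csv_text)):
        label = int(row["emotion"])                       # expression class 0..6
        pixels = [int(p) for p in row["pixels"].split()]  # flattened grayscale values
        splits[row["Usage"]].append((label, pixels))
    return splits

# Tiny in-memory example in the FER2013 column layout (real images are 48*48).
demo = ("emotion,pixels,Usage\n"
        "0,0 1 2 3,Training\n"
        "3,4 5 6 7,PrivateTest\n"
        "5,8 9 10 11,PublicTest\n")
splits = split_fer2013(demo)
train_set, test_set = splits["Training"], splits["PrivateTest"]  # per step A
```

Each subset would then be serialized to a TFRecord file with TensorFlow's writer before training.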
Further, in step B the pooling layers transform the feature maps of different depths into one-dimensional vectors through a global pooling operation; all vectors are then concatenated and fed to the classification layer.
Further, the specific process of step C is:
The convolutional neural network reads the TFRecord-format training set data with multiple threads to reduce data-reading time. After the data are converted from tensors to numpy arrays, the training set data are preprocessed. Preprocessing includes random flipping, random cropping, random brightness variation, and random Gaussian noise; random flipping includes horizontal, vertical, and combined horizontal-vertical flips. Random cropping crops the original image about a randomly chosen crop center and then normalizes the size of the generated image to the input size of the network. The random brightness transform is a linear transform of the original pixel values whose visible effect is a brightness change. Random Gaussian noise superimposes a random Gaussian signal on the original image to imitate the electrical noise generated by camera components. The network is then trained with the preprocessed training set data; model training comprises loading the model structure, optimizing the loss function, and back-propagating the error. The loss function is the softmax cross-entropy, optimized with the Adam algorithm. After each completed training round, the network reads the training set data, performs a facial expression recognition test, obtains the facial expression recognition accuracy after that round, and saves the model file for that round; training then continues, and the saved model file is updated after each subsequent round.
Further, the specific process of step D is: after each training round, output the training/test accuracy data and the training/test loss data and display them as curves updated in real time; by observing the training/test accuracy curves and the training/test loss curves, dynamically adjust the model parameters. The model parameters to be adjusted are stored in a separate configuration file, and adjustment combines automatic and manual modes.
Compared with the prior art, the present invention has the following advantages:
1. The FER2013 data set used by the invention is authoritative for the facial expression recognition problem, and many research institutions and scholars conduct research around it;
2. The invention is a new facial expression recognition method based on convolutional neural networks; it has strong robustness and delivers stable recognition even under variable illumination;
3. The data preprocessing method used by the invention effectively alleviates class imbalance and insufficient sample counts;
4. The invention achieves high recognition accuracy and fast recognition on the FER2013 data set, resolving the problem that other recognition methods are slow when their expression recognition accuracy is high and inaccurate when their recognition speed is fast, and can therefore effectively meet practical application requirements.
Detailed description of the invention
Fig. 1 is the training flow chart of the invention;
Fig. 2 shows the results of random flipping in the training-data preprocessing of the invention;
wherein the first row is the original image and the second row shows, in order, the vertical flip, the horizontal flip, and the combined vertical-horizontal flip;
Fig. 3 shows the results of random cropping in the training-data preprocessing of the invention;
wherein the first row is the original image and the second row shows three images obtained by random cropping;
Fig. 4 shows the results of random brightness variation in the training-data preprocessing of the invention;
wherein the first row is the original image and the second row shows three images after random brightness changes;
Fig. 5 shows the results of random Gaussian noise in the training-data preprocessing of the invention;
wherein the first row is the original image and the second row shows three images obtained by adding random Gaussian noise;
Fig. 6 is the flow chart of the preprocessing applied by the invention to the training data;
wherein k is the probability that controls whether the enhanced image is finally output, with range 0 to 1;
Fig. 7 is a schematic diagram of the multi-scale feature extraction module of the invention;
Fig. 8 is a schematic diagram of the global feature fusion layer of the invention;
Fig. 9 is a schematic diagram of the preprocessing applied by the invention to the test set data;
Fig. 10 is the confusion matrix corresponding to the recognition results of the invention on the test set data.
Specific embodiment
The present invention will be further described below.
As shown in the figures, the specific steps of the invention are as follows:
A. Select the FER2013 facial expression data set as the raw data. The image data file is in .csv format and is divided by its Usage attribute into Training, PrivateTest, and PublicTest parts; the Training subset is used as the training set data and the PrivateTest subset as the test set data. The training set data and the test set data are converted to TFRecord files with the TensorFlow machine learning framework and saved; this format improves the efficiency of reading data during training and facilitates image processing and storage. Since neural network training relies on large amounts of data, the number and diversity of the training samples largely determine the network's performance;
B. Construct a convolutional neural network with multi-scale feature extraction and global feature fusion using the TensorFlow machine learning framework: the multi-scale feature extraction module is placed at the entrance of the convolutional neural network, which consists of multiple convolutional layers, multiple global pooling layers, a fusion layer, and a classification layer. The convolutional layers extract feature maps at different depths and pass them to the global pooling layers; the pooled features are combined by the fusion layer through a global feature fusion pooling operation and fed to the final classification layer (see Table 1). Let X be the input image, A be a convolution kernel of size 1*1, k1 and k2 be convolution kernels of size 3*3 and 5*5, and Y be the feature map obtained by concatenating the feature maps generated by the upper and lower branches. Because features of different sizes are considered, this multi-scale feature extraction module lets the neural network exploit the information in the original pixels relatively fully; placed at the network entrance, where the number of feature maps is fairly small, it improves network performance without noticeably increasing the parameter count. Both branches begin and end with 1*1 convolution kernels, but for different purposes: the entry kernels have more output channels than input channels, augmenting the features in preparation for the subsequent convolutions, because the features extracted by a convolution kernel tend to be insufficient when the number of feature channels is small, and this is commonly remedied by expanding the channels; the exit kernels have fewer output channels than input channels, compressing the channels to reduce computation. The image data, after passing through the convolutional layers, yields feature maps of different depths; the global pooling layers transform these feature maps of different depths into one-dimensional vectors through a global pooling operation, and all vectors are then concatenated and fed to the final classification layer. The global pooling layer gives the network invariance to feature position by replacing the local response with a local statistic. For invariance to feature scale, neural networks commonly use convolution kernels of different sizes; an existing approach generates feature maps of corresponding sizes from images of different sizes, but this greatly increases the network's computation. The approach used by the invention instead reuses the feature maps of different network depths, which on the one hand improves data utilization and on the other hand, because this method uses data already generated by the network, adds no significant computation;
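The global pooling and fusion just described can be sketched in plain Python on toy data: each feature map, whatever its spatial size, is reduced to one average per channel, and the per-depth vectors are concatenated into the vector fed to the classification layer. The shapes below are toy values, not the Table 1 sizes, and the function names are illustrative.

```python
def global_avg_pool(fmap):
    """fmap: a list of channels, each channel a 2-D list (rows x cols).
    Returns one average per channel, i.e. a global average pooling."""
    out = []
    for ch in fmap:
        total = sum(sum(row) for row in ch)
        count = len(ch) * len(ch[0])
        out.append(total / count)
    return out

def fuse(fmaps):
    """Concatenate the pooled vectors of feature maps taken at different
    depths into the one-dimensional vector fed to the classification layer."""
    fused = []
    for fmap in fmaps:
        fused.extend(global_avg_pool(fmap))
    return fused

# Two toy feature maps of different spatial sizes and channel counts.
deep = [[[1.0, 3.0], [5.0, 7.0]]]              # 1 channel, 2 x 2
shallow = [[[2.0] * 4 for _ in range(4)],      # channel of constant 2.0
           [[0.0] * 4 for _ in range(4)]]      # channel of constant 0.0
vec = fuse([deep, shallow])                    # length 1 + 2 = 3
```

The pooled vector is position-invariant by construction: shifting a feature inside its map does not change the per-channel average.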
C. The convolutional neural network reads the TFRecord-format training set data with multiple threads to reduce data-reading time. After the data are converted from tensors to numpy arrays, the training set data are preprocessed as shown in Fig. 6. Preprocessing includes random flipping, random cropping, random brightness variation, and random Gaussian noise; random flipping includes horizontal, vertical, and combined horizontal-vertical flips. Random cropping crops the original image about a randomly chosen crop center and then normalizes the size of the generated image to the input size of the network. It can be divided into three steps: the first step determines the random crop size, the second step determines the random crop center, and the third step normalizes the image size. Assume the input image size is D × H, ignoring channels; the random crop size range is:
ε_size ∈ [ε_min, ε_max]   (1)
The random crop size is then:
ε_size = random(ε_min, ε_max)   (2)
where random() denotes taking a random value in the given interval. From the random crop size ε_size and the input image size, the random crop center ε_pos can be determined as:
ε_pos-x = random(ε_size/2, D − ε_size/2), ε_pos-y = random(ε_size/2, H − ε_size/2)   (3)
where ε_pos-x and ε_pos-y denote the horizontal and vertical coordinates of the random crop center in the image. Finally, the cropped image is normalized to the specified size.
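Under the stated constraint that the crop must lie inside the D × H image, the crop-size and crop-center sampling can be sketched as follows; the integer rounding and the function name are assumptions of this sketch, not the patent's.

```python
import random

def random_crop_params(D, H, eps_min, eps_max, rng):
    """Sample a crop size eps_size in [eps_min, eps_max] and a crop center
    (cx, cy) such that the eps_size square stays inside a D x H image."""
    eps_size = rng.randint(eps_min, eps_max)
    half = eps_size // 2
    cx = rng.randint(half, D - (eps_size - half))  # horizontal center coordinate
    cy = rng.randint(half, H - (eps_size - half))  # vertical center coordinate
    return eps_size, cx, cy

# Every sampled crop must fit inside a 48 x 48 image (the FER2013 size).
rng = random.Random(0)
ok = True
for _ in range(200):
    s, cx, cy = random_crop_params(48, 48, 24, 40, rng)
    ok = ok and 24 <= s <= 40
    ok = ok and 0 <= cx - s // 2 and cx + (s - s // 2) <= 48
    ok = ok and 0 <= cy - s // 2 and cy + (s - s // 2) <= 48
```

The crop itself would then be taken around (cx, cy) and resized to the network input size.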
The random brightness transform is a linear transform of the original pixel values whose visible effect is a brightness change. Let the brightness coefficient be ε_light and the original image pixel be v_x,y, where the subscripts x, y denote the pixel coordinates; the image pixel after the brightness change is then ε_light · v_x,y.
Random Gaussian noise superimposes a random Gaussian signal on the original image to imitate the electrical noise generated by camera components. Assume the input image pixel is v_x,y, where the subscripts x, y denote the pixel coordinates, the Gaussian noise is f(mean, sigma), and the enhancement coefficient of the noise is k, where mean, sigma, and k are random values in given intervals; the image pixel after adding noise is then v_x,y + k · f(mean, sigma).
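The two per-pixel formulas above, brightness ε_light · v_x,y and noise v_x,y + k · f(mean, sigma), can be sketched on single pixel values as follows; the clipping to the 8-bit range is an added practical detail, not part of the patent text.

```python
import random

def augment_pixel(v, eps_light, k, mean, sigma, rng):
    """Apply brightness (v -> eps_light * v) then additive Gaussian noise
    (v -> v + k * f(mean, sigma)) to one pixel value v."""
    v = eps_light * v                    # random brightness, a linear transform
    v = v + k * rng.gauss(mean, sigma)   # imitated camera electrical noise
    return min(255.0, max(0.0, v))       # clip to the valid 8-bit range

rng = random.Random(1)
img_row = [100.0, 120.0, 140.0]
out = [augment_pixel(v, eps_light=1.2, k=0.5, mean=0.0, sigma=2.0, rng=rng)
       for v in img_row]
```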
The original image is then output with probability k and the enhanced image with probability 1−k, which guarantees that training images of every form pass through network training. The network is trained with the preprocessed training set data; model training comprises loading the model structure, optimizing the loss function, and back-propagating the error, specifically optimizing the loss function with the Adam algorithm and back-propagating the error to update the network parameters.
The loss function used by the neural network is the softmax cross-entropy loss. The softmax function converts the output scores of the network into probabilities, which are easier to compare and process:
S_i = e^{y_i} / Σ_j e^{y_j}   (4)
where y_i is the linear score of class i and S_i is the corresponding softmax output. The softmax cross-entropy loss is then:
C = −Σ_i y'_i log S_i   (5)
where y'_i is the ground-truth (one-hot) label.
After the loss of the current neural network is computed, the parameters of the neural network are updated by the back-propagation algorithm.
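A minimal plain-Python sketch of the softmax conversion and cross-entropy loss described above; subtracting the maximum score before exponentiating is a standard numerical-stability trick and an assumption of this sketch, not part of the patent text.

```python
import math

def softmax(scores):
    """Convert network output scores to probabilities (eq. 4); subtracting the
    maximum before exponentiating only improves numerical stability."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(probs, one_hot):
    """Softmax cross-entropy loss C = -sum_i y'_i * log(S_i) (eq. 5)."""
    return -sum(y * math.log(p) for y, p in zip(one_hot, probs))

scores = [2.0, 1.0, 0.1]            # toy 3-class scores
p = softmax(scores)
loss = cross_entropy(p, [1, 0, 0])  # the true class is the first one
```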
In the neural network training process, the learning rate lr and the learning rate decay coefficient δ are declared separately in a dynamic configuration file, while the other hyperparameters are declared directly as variables in the program. In automatic adjustment mode, the relationship between the learning rate lr and the decay coefficient δ is:
lr = lr · δ^epoch   (6)
where epoch is the current iteration number. In manual fine-tuning mode, the program reads the dynamic configuration file in each iteration to obtain the latest parameters; in this mode the relationship between lr and δ is:
lr = lr · δ   (7)
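The two learning-rate rules above can be sketched directly; here lr0 denotes the initial learning rate for automatic mode, while manual mode multiplies the current value once per configuration refresh.

```python
def lr_auto(lr0, delta, epoch):
    """Automatic mode (eq. 6): lr = lr0 * delta ** epoch."""
    return lr0 * delta ** epoch

def lr_manual(lr, delta):
    """Manual mode (eq. 7): each configuration refresh multiplies lr by delta."""
    return lr * delta

schedule = [lr_auto(0.1, 0.5, e) for e in range(4)]  # 0.1, 0.05, 0.025, 0.0125
```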
After each completed training round, the convolutional neural network reads the training set data, performs a facial expression recognition test, obtains the facial expression recognition accuracy after that round, and saves the model file for that round; training then continues, and the saved model file is updated after each subsequent round.
D. After each training round, output the training/test accuracy data and the training/test loss data and display them as curves updated in real time in TensorBoard; by observing the training/test accuracy curves and the training/test loss curves, dynamically adjust the model parameters. The model parameters to be adjusted are stored in a separate configuration file, and adjustment combines automatic and manual modes;
E. When the training/test accuracy no longer rises and stabilizes and the training/test loss no longer falls and stabilizes, end the model training of the convolutional neural network and save the final model parameters;
F. The convolutional neural network loads the final model parameters and then reads the test set data (i.e., the PrivateTest subset of FER2013). As shown in Fig. 9, each test image is preprocessed by cropping at five positions (top left, top right, bottom left, bottom right, and center) and normalizing the cropped images to the network input size, so that one test image is transformed into six images (the five crops plus the original) for the neural network to identify; a voting decision mechanism finally yields the expression category of the current test image and its corresponding probability.
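The test-stage expansion into six views and the voting decision can be sketched as follows on a toy image; size normalization of each view to the network input is omitted, and the function names are illustrative.

```python
def five_crop_plus_original(img, size):
    """img: H x W list of lists. Return the original plus size x size crops
    taken at the four corners and the center, as in the test-stage preprocessing."""
    H, W = len(img), len(img[0])
    def crop(top, left):
        return [row[left:left + size] for row in img[top:top + size]]
    corners_and_center = [(0, 0), (0, W - size), (H - size, 0),
                          (H - size, W - size),
                          ((H - size) // 2, (W - size) // 2)]
    return [img] + [crop(t, l) for t, l in corners_and_center]

def vote(predictions):
    """Majority vote over the per-view predicted expression classes."""
    return max(set(predictions), key=predictions.count)

img = [[r * 4 + c for c in range(4)] for r in range(4)]  # toy 4 x 4 "image"
views = five_crop_plus_original(img, 3)                  # 6 views in total
winner = vote(["Happy", "Happy", "Sad", "Happy", "Neutral", "Happy"])
```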
The average accuracy and F1-score of the recognition results are computed from the confusion matrix, a common evaluation tool (also called an error matrix) in statistical learning and machine learning classification problems. If the number of classes is n, the confusion matrix is an n × n square matrix whose rows represent the actual distribution of the samples and whose columns represent the classifier's results; the number at each position is the proportion of the current test class that the network predicted as that class, so the closer the diagonal entries are to 1, the better the neural network performs. In the confusion-matrix stage, the system reads the images of each expression class in turn, passes them through the neural network, and tallies the recognition results into 7 × 7 form. As shown in Fig. 10, the value 0.664 at position (1,1) means that of all "Angry" expression images passed through the network, 66.4% were identified as the "Angry" class (i.e., correctly), while the remaining 33.6% were identified as other classes.
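The row-normalized confusion matrix and the averaged metrics described above can be sketched in plain Python; the toy labels below use three classes instead of seven, and the helper names are illustrative.

```python
def confusion_matrix(true_labels, pred_labels, n):
    """n x n matrix: row = actual class, column = predicted class, each row
    normalized to proportions."""
    counts = [[0] * n for _ in range(n)]
    for t, p in zip(true_labels, pred_labels):
        counts[t][p] += 1
    norm = []
    for row in counts:
        s = sum(row)
        norm.append([c / s if s else 0.0 for c in row])
    return norm

def f1_per_class(true_labels, pred_labels, cls):
    """F1 = 2 * precision * recall / (precision + recall) for one class."""
    pairs = list(zip(true_labels, pred_labels))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    if tp == 0:
        return 0.0
    prec, rec = tp / (tp + fp), tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)

y_true = [0, 0, 0, 1, 1, 2]
y_pred = [0, 0, 1, 1, 1, 2]
cm = confusion_matrix(y_true, y_pred, 3)
avg_acc = sum(cm[i][i] for i in range(3)) / 3  # mean of the diagonal entries
avg_f1 = sum(f1_per_class(y_true, y_pred, c) for c in range(3)) / 3
```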
As shown in Table 2, which illustrates the effectiveness of the method of the invention, the best recognition result was obtained on the FER2013 PrivateTest data set, substantially exceeding human-level recognition.
Table 1: structure of the constructed neural network
Name Kernel size/stride Input size(Name)
Conv1 3*3*64/1 96*96*1(*)
Multi resolution feature extraction module M - 48*48*32(Conv1)
dConv2 3*3*64/1 48*48*64(M)
dConv3 3*3*128/2 48*48*64(dConv2)
dConv4 3*3*128/1 24*24*128(dConv3)
dConv5 3*3*256/2 24*24*128(dConv4)
dConv6 1*3*256/1 12*12*256(dConv5)
dConv7 3*1*256/1 12*12*256(dConv6)
dConv8 1*3*256/1 12*12*256(dConv7)
dConv9 3*1*256/1 12*12*256(dConv8)
dConv10 1*3*256/1 12*12*256(dConv9)
dConv11 3*1*256/1 12*12*256(dConv10)
dConv12 3*3*512/2 12*12*256(dConv11)
dConv13 3*3*512/1 6*6*512(dConv12)
Global pool layer GP1 6*6/1 6*6*512(dConv13)
Global pool layer GP2 12*12/1 12*12*256(dConv11)
Global pool layer GP3 12*12/1 12*12*256(dConv9)
Fused layer Concat - (GP1+GP2+GP3)
Conv14 1*1/1 1*1*1024(Concat)
Softmax classifier - 1*1*7(Conv14)
Note: dConv denotes a depthwise separable convolution
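The fusion head of Table 1 can be sketched with tf.keras: feature maps at three depths are globally pooled (GP1 from dConv13, GP2 from dConv11, GP3 from dConv9), concatenated into a 1024-dimensional vector, and classified by a 1x1 convolution (Conv14) with softmax. The backbone below is a deliberately simplified stand-in that only reproduces the 96→48→24→12→6 spatial sizes and channel widths; layer choices, strides and activations are assumptions, and only the pool/concat/1x1-conv/softmax pattern follows the table.

```python
import tensorflow as tf
from tensorflow.keras import layers

inputs = tf.keras.Input(shape=(96, 96, 1))
# Simplified stand-in for Conv1..dConv13 of Table 1.
x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(inputs)              # 48x48
x = layers.SeparableConv2D(128, 3, strides=2, padding="same", activation="relu")(x)         # 24x24
mid_a = layers.SeparableConv2D(256, 3, strides=2, padding="same", activation="relu")(x)     # 12x12 (dConv9 stand-in)
mid_b = layers.SeparableConv2D(256, 3, padding="same", activation="relu")(mid_a)            # 12x12 (dConv11 stand-in)
deep = layers.SeparableConv2D(512, 3, strides=2, padding="same", activation="relu")(mid_b)  # 6x6  (dConv13 stand-in)

# GP1..GP3: global pooling of feature maps at three depths, then fusion.
fused = layers.Concatenate()([
    layers.GlobalAveragePooling2D()(deep),   # GP1: 512
    layers.GlobalAveragePooling2D()(mid_b),  # GP2: 256
    layers.GlobalAveragePooling2D()(mid_a),  # GP3: 256
])                                           # 512+256+256 = 1024-d fused vector
fused = layers.Reshape((1, 1, 1024))(fused)
logits = layers.Conv2D(7, 1)(fused)          # Conv14: 1x1 conv to the 7 classes
outputs = layers.Softmax()(layers.Flatten()(logits))
model = tf.keras.Model(inputs, outputs)
```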
Table 2: experimental results of various network models, and of humans, on the FER2013 PrivateTest data set
Note: the results of the existing methods are taken from Ian J. Goodfellow, Yoshua Bengio, et al., "Challenges in Representation Learning: A report on three machine learning contests", Machine Learning, 2015, 64: 59-63.

Claims (5)

1. A facial expression recognition method based on multi-resolution feature extraction and global feature fusion, characterised by the following specific steps:
A. Select a facial expression data set as raw data and divide it into training-set data and test-set data; use the TensorFlow machine-learning system to convert the training-set data and the test-set data into TFRecord-format files and save them;
B. Use TensorFlow to construct a convolutional neural network with multi-resolution feature extraction and global feature fusion: the multi-resolution feature-extraction module sits at the entrance of the network, which consists of multiple convolutional layers, multiple global pooling layers, a fusion layer and a classification layer; the convolutional layers extract feature maps at different depths and pass them to the global pooling layers; after the global-feature fusion and pooling operations, the fused result is fed into the classification layer, whose expression classes are: angry, disgust, fear, happy, sad, surprised and neutral, seven in total;
C. The convolutional neural network reads the training-set data and preprocesses it; preprocessing includes random flipping, random cropping, random brightness changes and random Gaussian noise; the network is then trained on the preprocessed training-set data; after each round of training, the network reads the training-set data and performs a facial-expression-recognition test to obtain the recognition accuracy after that round, then continues with the next round of training; intermediate model files are saved during training;
D. After each round of training, output the training/test accuracy data and the training/test loss data, and display them as curves updated in real time; by observing the training/test accuracy curves and training/test loss curves, dynamically adjust the model parameters;
E. When the training/test accuracy curves no longer rise and have stabilized, and the training/test loss curves no longer fall and have stabilized, terminate the training of the convolutional neural network and save the final set of model parameters;
F. The convolutional neural network loads the final model parameters and then reads the test-set data, recognising the expression class of each facial image in the test set in turn; once all expressions have been recognised, compute the average accuracy and the average F1-score over all expressions, completing the facial-expression-recognition process.
2. The facial expression recognition method based on multi-resolution feature extraction and global feature fusion according to claim 1, characterised in that the specific process of step A is: select the FER2013 facial-expression data set as raw data; the image data file is in .csv format, and the Usage attribute divides the data set into three parts, Training, PrivateTest and PublicTest, of which the Training subset is used as the training-set data and the PrivateTest subset as the test-set data; use TensorFlow to convert the training-set data and the test-set data into TFRecord-format files for storage.
3. The facial expression recognition method based on multi-resolution feature extraction and global feature fusion according to claim 1, characterised in that in step B the pooling layers transform the feature maps of different depths into one-dimensional vectors by the global pooling operation, all vectors are then concatenated, and the result is fed into the classification layer.
4. The facial expression recognition method based on multi-resolution feature extraction and global feature fusion according to claim 1, characterised in that the specific process of step C is:
The convolutional neural network reads the TFRecord-format training-set data using multiple threads and preprocesses it; preprocessing includes random flipping, random cropping, random brightness variation and random Gaussian noise, where random flipping includes horizontal, vertical and combined horizontal-vertical flips; random cropping crops the original image about a randomly selected centre, and the resulting new image is normalised to the network input size; the random brightness transform is a linear transformation of the original pixel values whose visible effect is a brightness change; random Gaussian noise superimposes a random Gaussian signal on the original image to imitate the electrical noise produced by camera components; the network is then trained on the preprocessed training-set data, the training being divided into loading the model structure, optimising the loss function and back-propagating the error; the loss function is softmax cross-entropy, optimised with the Adam algorithm; after each round of training, the network reads the training-set data and performs a facial-expression-recognition test to obtain the accuracy after that round, and saves the model file for that round; training then continues, and the saved model file is updated after each subsequent round.
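The four preprocessing operations of step C map directly onto tf.image ops, as the minimal sketch below shows. The crop size, brightness delta and noise standard deviation are assumed values, not taken from the patent, and the horizontal-vertical "combined" flip emerges here from applying the two random flips independently.

```python
import tensorflow as tf

def augment(image, crop=90, full=96, max_delta=0.2, noise_std=0.02):
    """image: float tensor of shape (full, full, 1). Returns an augmented view."""
    image = tf.image.random_flip_left_right(image)        # horizontal flip
    image = tf.image.random_flip_up_down(image)           # vertical flip
    image = tf.image.random_crop(image, [crop, crop, 1])  # crop about a random position
    image = tf.image.resize(image, [full, full])          # normalise back to input size
    image = tf.image.random_brightness(image, max_delta)  # linear brightness change
    # Additive Gaussian signal imitating camera electrical noise.
    image = image + tf.random.normal(tf.shape(image), stddev=noise_std)
    return image
```

Training then follows the claim: softmax cross-entropy (e.g. `tf.keras.losses.SparseCategoricalCrossentropy`) minimised with `tf.keras.optimizers.Adam`.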
5. The facial expression recognition method based on multi-resolution feature extraction and global feature fusion according to claim 1, characterised in that the specific process of step D is: after each round of training, output the training/test accuracy data and the training/test loss data and display them as curves updated in real time; by observing the training/test accuracy curves and training/test loss curves, dynamically adjust the model parameters; the model parameters to be adjusted are stored in a separate configuration file, and adjustment is performed by a combination of automatic and manual means.
CN201811167972.4A 2018-10-08 2018-10-08 A kind of Multi resolution feature extraction and the facial expression recognizing method of global characteristics fusion Pending CN109492529A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811167972.4A CN109492529A (en) 2018-10-08 2018-10-08 A kind of Multi resolution feature extraction and the facial expression recognizing method of global characteristics fusion


Publications (1)

Publication Number Publication Date
CN109492529A true CN109492529A (en) 2019-03-19

Family

ID=65690023

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811167972.4A Pending CN109492529A (en) 2018-10-08 2018-10-08 A kind of Multi resolution feature extraction and the facial expression recognizing method of global characteristics fusion

Country Status (1)

Country Link
CN (1) CN109492529A (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106548160A (en) * 2016-11-09 2017-03-29 浙江博天科技有限公司 A kind of face smile detection method
CN107330420A (en) * 2017-07-14 2017-11-07 河北工业大学 The facial expression recognizing method of rotation information is carried based on deep learning
CN108197633A (en) * 2017-11-24 2018-06-22 百年金海科技有限公司 Deep learning image classification based on TensorFlow is with applying dispositions method


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Yu Rui: "Facial Expression Feature Analysis Based on Deep Learning", Modern Computer (Professional Edition) *

Cited By (51)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110008876A (en) * 2019-03-26 2019-07-12 电子科技大学 A kind of face verification method based on data enhancing and Fusion Features
WO2020199931A1 (en) * 2019-04-02 2020-10-08 腾讯科技(深圳)有限公司 Face key point detection method and apparatus, and storage medium and electronic device
CN110163080A (en) * 2019-04-02 2019-08-23 腾讯科技(深圳)有限公司 Face critical point detection method and device, storage medium and electronic equipment
US11734851B2 (en) 2019-04-02 2023-08-22 Tencent Technology (Shenzhen) Company Limited Face key point detection method and apparatus, storage medium, and electronic device
CN110110644A (en) * 2019-04-30 2019-08-09 重庆邮电大学 A kind of Multiscale Fusion facial feature extraction method and system
CN110781760A (en) * 2019-05-24 2020-02-11 西安电子科技大学 Facial expression recognition method and device based on space attention
CN110188708A (en) * 2019-06-03 2019-08-30 西安工业大学 A kind of facial expression recognizing method based on convolutional neural networks
CN110321807A (en) * 2019-06-13 2019-10-11 南京行者易智能交通科技有限公司 A kind of convolutional neural networks based on multilayer feature fusion are yawned Activity recognition method and device
CN110348322A (en) * 2019-06-19 2019-10-18 西华师范大学 Human face in-vivo detection method and equipment based on multi-feature fusion
US11281895B2 (en) 2019-07-11 2022-03-22 Boe Technology Group Co., Ltd. Expression recognition method, computer device, and computer-readable storage medium
CN110321872B (en) * 2019-07-11 2021-03-16 京东方科技集团股份有限公司 Facial expression recognition method and device, computer equipment and readable storage medium
CN110472583A (en) * 2019-08-16 2019-11-19 广东工业大学 The micro- Expression Recognition system of face based on deep learning
CN110472583B (en) * 2019-08-16 2022-04-19 广东工业大学 Human face micro-expression recognition system based on deep learning
CN110490134A (en) * 2019-08-20 2019-11-22 四川九洲电器集团有限责任公司 Signal recognition method, equipment and storage medium
CN110728179A (en) * 2019-09-04 2020-01-24 天津大学 Pig face identification method adopting multi-path convolutional neural network
CN111079514A (en) * 2019-10-28 2020-04-28 湖北工业大学 Face recognition method based on CLBP and convolutional neural network
CN110991533B (en) * 2019-12-03 2023-08-04 Oppo广东移动通信有限公司 Image recognition method, recognition device, terminal device and readable storage medium
CN110991533A (en) * 2019-12-03 2020-04-10 Oppo广东移动通信有限公司 Image recognition method, recognition device, terminal device and readable storage medium
CN111444787B (en) * 2020-03-12 2023-04-07 江西赣鄱云新型智慧城市技术研究有限公司 Fully intelligent facial expression recognition method and system with gender constraint
CN111444787A (en) * 2020-03-12 2020-07-24 江西赣鄱云新型智慧城市技术研究有限公司 Fully intelligent facial expression recognition method and system with gender constraint
CN111507381B (en) * 2020-03-31 2024-04-02 上海商汤智能科技有限公司 Image recognition method, related device and equipment
CN111507381A (en) * 2020-03-31 2020-08-07 上海商汤智能科技有限公司 Image recognition method and related device and equipment
CN111695407A (en) * 2020-04-23 2020-09-22 西安电子科技大学 Gender identification method, system, storage medium and terminal based on multispectral fusion
CN111695407B (en) * 2020-04-23 2023-04-07 西安电子科技大学 Gender identification method, system, storage medium and terminal based on multispectral fusion
CN111539942A (en) * 2020-04-28 2020-08-14 中国科学院自动化研究所 Method for detecting face depth tampered image based on multi-scale depth feature fusion
CN111539942B (en) * 2020-04-28 2021-08-31 中国科学院自动化研究所 Method for detecting face depth tampered image based on multi-scale depth feature fusion
CN111709278A (en) * 2020-04-30 2020-09-25 北京航空航天大学 Method for identifying facial expressions of macaques
CN111652171B (en) * 2020-06-09 2022-08-05 电子科技大学 Construction method of facial expression recognition model based on double branch network
CN111652171A (en) * 2020-06-09 2020-09-11 电子科技大学 Construction method of facial expression recognition model based on double branch network
CN111967389A (en) * 2020-08-18 2020-11-20 厦门理工学院 Face attribute recognition method and system based on deep double-path learning network
CN111967389B (en) * 2020-08-18 2022-02-18 厦门理工学院 Face attribute recognition method and system based on deep double-path learning network
CN112151071B (en) * 2020-09-23 2022-10-28 哈尔滨工程大学 Speech emotion recognition method based on mixed wavelet packet feature deep learning
CN112151071A (en) * 2020-09-23 2020-12-29 哈尔滨工程大学 Speech emotion recognition method based on mixed wavelet packet feature deep learning
CN112151040B (en) * 2020-09-27 2023-04-28 湖北工业大学 Robust speaker recognition method based on end-to-end joint optimization and decision
CN112151040A (en) * 2020-09-27 2020-12-29 湖北工业大学 Robust speaker recognition method based on end-to-end joint optimization and decision
CN112507864A (en) * 2020-12-04 2021-03-16 河北地质大学 Credit archive identification method based on convolutional neural network
CN112784763B (en) * 2021-01-27 2022-07-29 南京邮电大学 Expression recognition method and system based on local and overall feature adaptive fusion
CN112784763A (en) * 2021-01-27 2021-05-11 南京邮电大学 Expression recognition method and system based on local and overall feature adaptive fusion
CN112986210A (en) * 2021-02-10 2021-06-18 四川大学 Scale-adaptive microbial Raman spectrum detection method and system
CN113011314A (en) * 2021-03-16 2021-06-22 华南理工大学 Facial expression recognition method based on frequency domain features and product neural network
CN113011314B (en) * 2021-03-16 2023-07-18 华南理工大学 Facial expression recognition method based on frequency domain characteristics and product neural network
CN113158828A (en) * 2021-03-30 2021-07-23 华南理工大学 Facial emotion calibration method and system based on deep learning
CN113158828B (en) * 2021-03-30 2024-04-09 华南理工大学 Facial emotion calibration method and system based on deep learning
CN113111940A (en) * 2021-04-13 2021-07-13 东南大学 Expression recognition method based on feature fusion
CN113177558A (en) * 2021-04-13 2021-07-27 电子科技大学 Radiation source individual identification method based on feature fusion of small samples
WO2023009061A1 (en) * 2021-07-30 2023-02-02 脸萌有限公司 Method and device for evaluating effect of performing classification on fuzzy attribute
US11978280B2 (en) 2021-07-30 2024-05-07 Lemon Inc. Method and device for evaluating effect of classifying fuzzy attribute
CN113822229A (en) * 2021-10-28 2021-12-21 重庆科炬企业孵化器有限公司 Expression recognition-oriented user experience evaluation modeling method and device
CN114360009A (en) * 2021-12-23 2022-04-15 电子科技大学长三角研究院(湖州) Multi-scale characteristic face attribute recognition system and method under complex scene
CN114926877A (en) * 2022-05-10 2022-08-19 西北工业大学 Cross-domain facial expression recognition method based on contrast domain difference
CN114926877B (en) * 2022-05-10 2024-02-20 西北工业大学 Cross-domain facial expression recognition method based on contrast domain difference

Similar Documents

Publication Publication Date Title
CN109492529A (en) A kind of Multi resolution feature extraction and the facial expression recognizing method of global characteristics fusion
CN110427867B (en) Facial expression recognition method and system based on residual attention mechanism
CN107273845B (en) Facial expression recognition method based on confidence region and multi-feature weighted fusion
CN106096538B (en) Face identification method and device based on sequencing neural network model
CN111401257B (en) Face recognition method based on cosine loss under non-constraint condition
CN103605972B (en) Non-restricted environment face verification method based on block depth neural network
CN107967456A (en) A kind of multiple neural network cascade identification face method based on face key point
CN108509839A (en) One kind being based on the efficient gestures detection recognition methods of region convolutional neural networks
CN106096535A (en) A kind of face verification method based on bilinearity associating CNN
CN112766158A (en) Multi-task cascading type face shielding expression recognition method
CN107871101A (en) A kind of method for detecting human face and device
CN106570521B (en) Multilingual scene character recognition method and recognition system
Ahranjany et al. A very high accuracy handwritten character recognition system for Farsi/Arabic digits using convolutional neural networks
CN106651915B (en) The method for tracking target of multi-scale expression based on convolutional neural networks
CN110378208B (en) Behavior identification method based on deep residual error network
CN108520213B (en) Face beauty prediction method based on multi-scale depth
CN113239784A (en) Pedestrian re-identification system and method based on space sequence feature learning
CN111652273B (en) Deep learning-based RGB-D image classification method
CN107704859A (en) A kind of character recognition method based on deep learning training framework
CN116386102A (en) Face emotion recognition method based on improved residual convolution network acceptance block structure
CN103942572A (en) Method and device for extracting facial expression features based on bidirectional compressed data space dimension reduction
CN111160327B (en) Expression recognition method based on lightweight convolutional neural network
CN108520539A (en) A kind of image object detection method based on sparse study variable model
CN113887509B (en) Rapid multi-modal video face recognition method based on image set
Chun-man et al. Face expression recognition based on improved MobileNeXt

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20190319