CN110458021A

CN110458021A - A facial action unit detection method based on physical characteristics and distribution characteristics

Info

Publication number: CN110458021A
Application number: CN201910620049.XA
Authority: CN
Inventors: 胡巧平; 申瑞民; 姜飞
Original assignee: Shanghai Jiaotong University
Current assignee: Shanghai Jiaotong University
Priority date: 2019-07-10
Filing date: 2019-07-10
Publication date: 2019-11-15

Abstract

The present invention relates to a facial action unit detection method based on physical characteristics and distribution characteristics. The method processes a sequence of pictures with a pre-trained facial action unit detection model to obtain a facial action unit detection result, where the model comprises a cross-concat network and a long short-term memory (LSTM) network connected in sequence. Compared with the prior art, the present invention is the first to consider and address the imbalanced data distribution across different facial action units, which further improves detection performance.

Description

A facial action unit detection method based on physical characteristics and distribution characteristics

Technical field

The present invention relates to the field of computer technology, and in particular to a facial action unit detection method based on physical characteristics and distribution characteristics.

Background art

Facial expression analysis is an important area of artificial intelligence, and the detection of facial action units (AUs, action units) is essential to it. A facial expression is produced by movements of the facial muscles, and the Facial Action Coding System refers to each movement of one or more muscles as a facial action unit. Almost every facial expression can be represented by a single AU or a combination of AUs. For example, a smile can be represented as the combination of raised cheeks (AU6) and raised mouth corners (AU12), as shown in Fig. 3.

AU detection means determining, from a picture or video, which facial action units appear on a person's face. For example, when a person smiles, raised cheeks (AU6) and raised mouth corners (AU12) are likely to appear.

In facial expression analysis, researchers generally divide facial expressions (emotions) into seven classes: happiness, sadness, surprise, fear, anger, disgust and contempt, known as the universal facial expressions. Each of them can be represented by a combination of AUs; the correspondence between the seven expressions and the facial action units is shown in Table 1. Therefore, once the facial action units have been detected, the universal facial expression can be obtained directly from Table 1.

Table 1: Correspondence between universal expressions and AUs

Emotion	Action units
Happiness	6+12
Sadness	1+4+15
Surprise	1+2+5B+26
Fear	1+2+4+5+7+20+26
Anger	4+5+7+23
Disgust	9+15+16
Contempt	R12A+R14A
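As a minimal sketch of how Table 1 can be used once AUs have been detected, the lookup below copies the table into a dictionary. The function name `matching_emotions` and the rule "an emotion matches when all of its AUs are in the detected set" are illustrative assumptions; intensity and side markers such as `5B` or `R12A` are treated as opaque strings.

```python
# Table 1 as a lookup: emotion -> set of AU codes (copied from the table above).
EMOTION_TO_AUS = {
    "Happiness": {"6", "12"},
    "Sadness": {"1", "4", "15"},
    "Surprise": {"1", "2", "5B", "26"},
    "Fear": {"1", "2", "4", "5", "7", "20", "26"},
    "Anger": {"4", "5", "7", "23"},
    "Disgust": {"9", "15", "16"},
    "Contempt": {"R12A", "R14A"},
}


def matching_emotions(detected_aus):
    """Return, sorted by name, the emotions whose full AU combination was detected.

    Assumption: a simple subset test; the source only says the expression
    can be read off Table 1 and does not specify a matching rule.
    """
    detected = set(detected_aus)
    return sorted(name for name, aus in EMOTION_TO_AUS.items() if aus <= detected)


print(matching_emotions(["6", "12"]))
```

A subset test is the simplest reading of the table; a practical system would probably also need a rule for partial or overlapping matches.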

Moreover, in practical applications the seven universal expressions are not enough for facial expression analysis. Examples include detecting expressions of pain in hospitals and detecting puzzled expressions of students in educational settings. These expressions cannot be described by the seven universal expressions and must be described by the detailed muscle movements that AUs capture. AU detection is therefore very important for facial expression analysis.

An AU detection algorithm extracts facial features and classifies them to determine which AUs appear on the face. Existing AU detection algorithms essentially focus on designing experiments around three main physical characteristics of AUs.

1) Temporal characteristics. For video data, an AU is a continuous movement, so timing information is very important for AU detection. For example, if a person is smiling at second 1, he is very likely still smiling at second 1.5.

2) Correlation between AUs. Certain AUs often occur together, while others are mutually exclusive and usually do not occur together. The most typical example is the correspondence between universal expressions and AUs: since universal expressions occur frequently in daily life, the AU combinations they correspond to are combinations that often occur together.

3) Region characteristics. By the definition and origin of AUs, whether a specific AU occurs depends only on a certain region of the face. For example, AU12 is related only to the region around the mouth corners and not to other parts of the face such as the eyes and forehead. Region characteristics are also called sparse characteristics.

AU detection algorithms to date have essentially all exploited these physical characteristics through elaborate algorithm design to improve AU detection results. The algorithm described herein, in addition to the physical characteristics of AUs (temporal characteristics and AU correlation), specifically considers their distribution characteristic (imbalanced distribution): different AUs occur with different probabilities, some appearing often in daily life and others very rarely. Taking this into account yields better detection performance.

Basics of AU algorithms: the input is a picture or video, and the output is a judgment of whether each AU is present. For a specific AU the output has only two states, present and absent, so it is a binary classification problem. Since the algorithm must determine the presence of many AUs, AU detection is a multi-label binary classification problem; that is, the algorithm produces a separate present/absent result for each AU.

The document "Deep region and multi-label learning for facial action unit detection" (Kaili Zhao, Wen-Sheng Chu, Honggang Zhang, in Computer Vision and Pattern Recognition, 2016, pp. 3391-3399) discloses the DRML algorithm. DRML is based on the typical deep neural network structure AlexNet and considers two physical characteristics in AU detection: region characteristics and the correlation between AUs.

As shown in Fig. 4, starting from the AlexNet structure, DRML removes the other pooling layers and retains only one pooling layer, and adds a region layer designed by its authors to learn the region characteristics of the face.

Specifically, the input to the DRML network is an aligned face image: a single color face picture 170 pixels wide and 170 pixels high (the leftmost picture in Fig. 4). The picture is fed into the DRML network and passes in order through conv1 (convolutional layer 1), region2 (region layer 2), pool3 (pooling layer 3), conv4 (convolutional layer 4), conv5 (convolutional layer 5), conv6 (convolutional layer 6), conv7 (convolutional layer 7), fc8 (fully connected layer 8), fc9 (fully connected layer 9) and output, 10 layers in total, to obtain output results for 12 AUs. The output layer is also a fully connected layer. Apart from the region layer, all layers are available in deep learning frameworks and can be called directly.

Fig. 5 is a schematic diagram of the region layer. After the first convolutional layer, the picture yields 32 feature maps, each 160 wide and 160 high. Each 160 × 160 feature map is split into 8 × 8 = 64 blocks of 20 × 20 pixels. Each 20 × 20 block then passes in order through a BN layer (batch normalization layer), a ReLU layer (ReLU activation layer) and a conv layer (convolution layer), and the 64 results are finally stitched together as the output. The physical meaning of the region layer is thus to divide the face evenly into blocks, learn each block separately, and then merge the per-block results.
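The split-and-stitch step of the region layer can be sketched as follows for a single 160 × 160 feature map stored as a list of rows. This is an illustration under assumptions: `process_block` is a hypothetical stand-in that applies ReLU only, whereas the layer described above runs BN, ReLU and a convolution with learned per-block weights.

```python
def split_into_blocks(fmap, grid=8):
    """Split a square 2-D feature map (a list of rows) into grid x grid blocks."""
    step = len(fmap) // grid  # 160 // 8 = 20 in the DRML description
    return {
        (gy, gx): [row[gx * step:(gx + 1) * step]
                   for row in fmap[gy * step:(gy + 1) * step]]
        for gy in range(grid) for gx in range(grid)
    }


def process_block(block):
    """Hypothetical stand-in for the per-block BN -> ReLU -> conv; applies ReLU only."""
    return [[max(v, 0) for v in row] for row in block]


def merge_blocks(blocks, grid=8):
    """Stitch the blocks back into one feature map of the original size."""
    step = len(blocks[(0, 0)])
    out = [[0] * (grid * step) for _ in range(grid * step)]
    for (gy, gx), block in blocks.items():
        for y, row in enumerate(block):
            out[gy * step + y][gx * step:(gx + 1) * step] = row
    return out


def region_layer(fmap, grid=8):
    """Split, process every block independently, then merge."""
    blocks = split_into_blocks(fmap, grid)
    return merge_blocks({k: process_block(b) for k, b in blocks.items()}, grid)
```

Because every block is processed on its own, each face region can in principle learn its own filters, which is the point of the layer.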

In addition, to learn the correlation between AUs, DRML is trained with a multi-label sigmoid cross-entropy function:

In the formula, L is the loss. Y is the true value (ground truth), i.e., whether each AU is actually present in the input picture. Ŷ is the value computed by the algorithm, i.e., the AU status the algorithm detects for the picture. N is the number of pictures: deep learning networks are usually fed one batch of pictures at a time, so N is the number of pictures in a batch, an integer greater than or equal to 1. C is the number of labels, i.e., how many AUs need to be detected; the output layer shown in Fig. 4 has 12 outputs, so C = 12 and the presence of 12 AUs has to be detected. n indexes the N pictures and c indexes the C labels.
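The formula itself is not reproduced in this text. One form that agrees with the symbol definitions here (loss L, true value Y, predicted value written Ŷ, batch size N, label count C, indices n and c) is sketched below. The 1/N normalisation and the bracket notation are assumptions, not a transcription of the original figure.

```latex
L(Y,\hat{Y}) = -\frac{1}{N}\sum_{n=1}^{N}\sum_{c=1}^{C}
\Big\{ [\,Y_{nc} > 0\,]\,\log \hat{Y}_{nc}
     + [\,Y_{nc} < 0\,]\,\log\big(1-\hat{Y}_{nc}\big) \Big\}
```

Here [x] equals 1 when the condition x holds and 0 otherwise, so an entry with Y_nc = 0 contributes to neither term.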

The true value of each facial action unit can be -1, +1 or 0, where +1 is a positive sample (the AU is present), -1 is a negative sample (the AU is absent), and 0 is invalid. The true value of each picture can therefore be represented by a 12-dimensional vector in which each dimension represents one AU. For example, [1, 1, 0, 0, -1, -1, -1, -1, -1, -1, -1, -1] means that the AUs of dimensions 1 and 2 appear on the face, the AUs of dimensions 5 to 12 do not appear, and it is unknown whether the AUs of dimensions 3 and 4 appear. In the multi-label sigmoid cross-entropy formula, the part before the "+" computes the loss for true values of +1 (positive samples), and the part after the "+" computes the loss for true values of -1 (negative samples). When the true value of an AU is 0, it takes part in neither term, so it is ignored in the calculation.
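The behaviour described above (positive term, negative term, invalid labels skipped) can be sketched in plain Python. The function name, the averaging over the N pictures and the `eps` clamp are assumptions made for this illustration; it is not the DRML Caffe layer itself.

```python
import math


def multilabel_sigmoid_ce(labels, probs, eps=1e-12):
    """Multi-label sigmoid cross-entropy with invalid labels ignored.

    labels: N x C values in {+1, -1, 0}; probs: N x C sigmoid outputs in (0, 1).
    Assumption: the sum is averaged over the N pictures in the batch.
    """
    total = 0.0
    for row_y, row_p in zip(labels, probs):
        for y, p in zip(row_y, row_p):
            if y > 0:    # positive sample: the term before the "+"
                total -= math.log(max(p, eps))
            elif y < 0:  # negative sample: the term after the "+"
                total -= math.log(max(1.0 - p, eps))
            # y == 0: invalid label, contributes nothing
    return total / len(labels)


loss = multilabel_sigmoid_ce([[1, 0, -1]], [[0.5, 0.9, 0.5]])
```

In the usage line, the middle entry is labelled 0, so its predicted probability has no effect on the loss.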

The DRML algorithm has the following disadvantages: 1) it is built on AlexNet, whose network is shallow, so learning is poor and AU detection suffers; 2) it does not consider timing information or the imbalanced distribution of different AUs, so it inevitably detects AUs with many samples somewhat better and AUs with few samples very poorly.

The document "Action unit detection with region adaptation, multi-labeling learning and optimal temporal fusing" (Wei Li, Farnaz Abtahi, and Zhigang Zhu, in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017) discloses the R-T1 algorithm. R-T1 is based on the deep learning network structure VGG and makes full use of the three physical characteristics of AUs (temporal characteristics, correlation between AUs, region characteristics): temporal characteristics are handled by an LSTM (long short-term memory) network, the correlation between AUs by multi-label learning, and region characteristics by ROI Nets designed by its authors.

As shown in Fig. 6, R-T1 is based on the VGG network structure: the part of VGG after convolutional layer conv12 is removed, and ROI Nets and an LSTM network are appended, finally outputting results for multiple AUs. The loss function used for multi-label learning during training is:

In the formula, l is the true value and can be 0 or 1: 0 is a negative sample (the AU does not appear) and 1 is a positive sample (the AU appears). P is the probability, computed by the algorithm, that the AU appears. The authors use 0.05 and 1.05 to prevent the loss from exploding and to limit its range.

Specifically, the input to R-T1 is a sequence of 24 pictures, each W wide and H high. A sequence is chosen as follows: for the current frame, 23 pictures are randomly selected from the part of the video before it, and the pictures are arranged in chronological order. N sequences are input each time. The image data first passes through layers conv1 to conv12 of VGG, giving 512 feature maps of 14 × 14. Then the positions of 20 AU-related landmarks in the original input picture are mapped onto the feature maps; on each feature map a 3 × 3 region centered on each landmark is cropped, upscaled to 6 × 6 and convolved. The results for all landmark regions are combined as the output of ROI Nets, and finally an LSTM network captures the timing information and produces the AU output (AU labels).

ROI Nets: an ROI (region of interest) is a region of interest. ROI Nets select, for different AUs, different regions with reference to the 20 landmarks in order to extract AU-related region features. Landmarks give the positions of the eyes, nose, mouth and so on. Because different AUs involve muscles at different positions, the muscle position of an AU can be roughly determined from the landmark positions, which fixes the region center. The landmark positions are mapped to the corresponding points on the feature map; with these 20 points as centers, 20 regions are extracted from the feature map, upscaled and convolved, and the final results are combined.
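The landmark-to-feature-map mapping and the 3 × 3 crop can be illustrated with the sketch below. The linear coordinate scaling, the clamping at the border and the nearest-neighbour upscaling are all assumptions chosen for simplicity; the text only states that regions are cropped around the mapped landmarks and expanded to 6 × 6. The function names and the 224 × 224 image size in the usage line are hypothetical.

```python
def landmark_to_feature_cell(x, y, image_w, image_h, fmap_size=14):
    """Map a landmark in pixel coordinates to a cell index on the feature map."""
    fx = min(int(x * fmap_size / image_w), fmap_size - 1)
    fy = min(int(y * fmap_size / image_h), fmap_size - 1)
    return fx, fy


def crop_3x3(fmap, fx, fy):
    """Crop a 3x3 window centred on (fx, fy), shifting the centre inward at borders."""
    size = len(fmap)
    cx = min(max(fx, 1), size - 2)
    cy = min(max(fy, 1), size - 2)
    return [row[cx - 1:cx + 2] for row in fmap[cy - 1:cy + 2]]


def upscale_2x(patch):
    """Nearest-neighbour upscaling: every value becomes a 2x2 block (3x3 -> 6x6)."""
    out = []
    for row in patch:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


# Hypothetical usage: one landmark at the centre of a 224 x 224 image.
fmap = [[r * 14 + c for c in range(14)] for r in range(14)]
fx, fy = landmark_to_feature_cell(112, 112, 224, 224)
roi = upscale_2x(crop_3x3(fmap, fx, fy))
```

In the described network the upscaled region is then convolved, and the outputs for all 20 landmarks are combined.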

LSTM network: LSTM is commonly used to process data with timing information. An LSTM consists of multiple LSTM memory cells; one cell is shown in Fig. 7. In the figure, t is the current time step, t-1 the previous time step, C the cell state and h the output; Sigmoid is the sigmoid activation function and tanh the tanh activation function. Through its input gate, forget gate and output gate, the memory cell decides what information to update and what to forget, and outputs a result and an updated state.
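To make the gate mechanics concrete, here is a sketch of one step of a standard LSTM cell reduced to scalar inputs and states. The weight layout (a dictionary of `(w_x, w_h, bias)` triples keyed by gate name) is a hypothetical convenience for this example; real implementations use weight matrices over vectors.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x_t, h_prev, c_prev, w):
    """One step of a scalar LSTM cell.

    w maps a gate name to (w_x, w_h, bias). Returns (h_t, c_t).
    """
    def gate(name, activation):
        w_x, w_h, b = w[name]
        return activation(w_x * x_t + w_h * h_prev + b)

    f = gate("forget", sigmoid)        # how much of the old state to keep
    i = gate("input", sigmoid)         # how much new information to admit
    g = gate("candidate", math.tanh)   # candidate new information
    o = gate("output", sigmoid)        # how much of the state to expose

    c_t = f * c_prev + i * g           # updated state C_t
    h_t = o * math.tanh(c_t)           # output h_t
    return h_t, c_t


zero = {k: (0.0, 0.0, 0.0) for k in ("forget", "input", "candidate", "output")}
h, c = lstm_step(1.0, 0.0, 2.0, zero)
```

With all weights at zero every sigmoid gate sits at 0.5, so the usage line simply halves the previous state, which is an easy sanity check.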

The R-T1 algorithm has the following disadvantages: 1) the input requires not only pictures but also landmark information, so the R-T1 network needs more input data than a network that takes pictures only; 2) it does not consider timing information or the imbalanced distribution of different AUs, so it inevitably detects AUs with many samples somewhat better and AUs with few samples very poorly; 3) it does not consider the imbalance between positive and negative samples of the same AU: in typical datasets the number of negative samples (AU absent) far exceeds the number of positive samples (AU present), and this imbalance affects training and lowers AU detection performance.

Summary of the invention

An object of the present invention is to overcome the above drawbacks of the prior art by providing a facial action unit detection method based on physical characteristics and distribution characteristics.

The object of the present invention can be achieved by the following technical solutions:

A facial action unit detection method based on physical characteristics and distribution characteristics, characterized in that the method processes a sequence of pictures with a pre-trained facial action unit detection model to obtain a facial action unit detection result, the facial action unit detection model comprising a cross-concat network and a long short-term memory network connected in sequence.

Further, the cross-concat network is a VGG network built on cross-concat blocks, i.e., it is obtained by replacing all convolutional layers in the VGG network with cross-concat blocks.

Further, each cross-concat block includes a first convolutional layer and a first activation function connected in sequence. The output of the first activation function is concatenated with the input of the first convolutional layer, which fuses the features of the previous layer with those of the next layer.

Further, the first activation function includes regularization, which automatically weakens unimportant feature variables and prevents over-fitting.

Further, a second convolutional layer is connected after the concatenation of the output of the first activation function and the input of the first convolutional layer. Its kernel size is 1 × 1, which keeps the number of channels the same as the original.

During feature extraction, the features extracted by earlier layers are lower-level and finer, while those extracted by later, higher layers lean toward semantic features. Fusing features between layers in this way strengthens the extraction of low-level features, and the finer features make it possible to learn sufficient information for discriminating AUs from a limited amount of training data. The cross-concat block is therefore inherently insensitive to the distribution of different AUs: even when some AUs have few samples, the algorithm can extract enough features to detect them.

Further, the sequence of pictures consists of consecutive frames. Adjacent pictures in such a sequence are more strongly correlated in time: since the video runs at 25 frames per second, the interval between two frames is only 40 milliseconds, and a facial AU can hardly change much within 40 milliseconds. A sequence of consecutive frames therefore carries very strong timing information.

Further, the sequence contains 32 pictures; comparison tests show that using 32 pictures as a sequence works better.

Further, the loss function used in pre-training the facial action unit detection model is the multi-label sigmoid cross-entropy function, used for multi-label learning. This loss function handles invalid labels, so some negative samples can be set as invalid labels, i.e., a true value of -1 is set to 0 and excluded from the loss calculation. This balances the positive and negative sample distribution of the same AU and lets the model train better.

Compared with prior art, the invention has the following advantages that

(1) The facial action unit detection method of the present invention is the first to consider the imbalanced data distribution across different facial action units. It proposes the cross-concat block structure, which makes the network no longer sensitive to that distribution; the improvement in detection is especially pronounced for facial action units with little data.

(2) The cross-concat block of the present invention also adds a 1 × 1 convolutional layer to restore the number of channels. The parameter counts of the original VGG convolutional layers are thereby preserved, so the detection method can be initialized with a trained VGG-Face model. Because VGG-Face was trained on a large amount of face data, the features it extracts are specific to faces; training on this basis greatly accelerates the convergence of the model.

(3) The present invention uses the multi-label sigmoid cross-entropy function as the loss function. The function handles invalid labels; by setting some negative samples as invalid labels, the positive and negative sample distribution of the same AU is balanced and the model trains better.

(4) The present invention uses 32 consecutive frames as one input sequence. Adjacent pictures in the sequence are more strongly correlated in time: since the video runs at 25 frames per second, the interval between two frames is only 40 milliseconds, and a facial AU can hardly change much within 40 milliseconds. The input sequence therefore carries very strong timing information, and using 32 pictures as a sequence gives the best results.

Description of the drawings

Fig. 1 is a structural schematic diagram of the facial action unit detection model of the present invention;

Fig. 2 is a structural schematic diagram of a convolutional layer in the VGG network and of a cross-concat block, where (2a) is a convolutional layer in the VGG network and (2b) is a cross-concat block;

Fig. 3 illustrates the facial action unit combination of a smile, where (3a) is the smile, (3b) is the raised cheeks (AU6) in the smile, and (3c) is the raised mouth corners (AU12) in the smile;

Fig. 4 is a structural schematic diagram of the DRML algorithm;

Fig. 5 is a structural schematic diagram of the region layer in the DRML algorithm;

Fig. 6 is a structural schematic diagram of the R-T1 algorithm;

Fig. 7 is a structural schematic diagram of one LSTM memory cell.

Specific embodiment

The present invention is described in detail below with reference to the drawings and specific embodiments. The embodiments are implemented on the premise of the technical solution of the present invention, and detailed implementations and specific operating procedures are given, but the scope of protection of the present invention is not limited to the following embodiments.

Embodiment 1

This embodiment provides a facial action unit detection method based on physical characteristics and distribution characteristics. The method performs multi-label classification on a sequence of pictures with a pre-trained facial action unit (AU) detection model to obtain a facial action unit detection result. The model comprises a cross-concat network and a long short-term memory network connected in sequence.

This embodiment implements the facial action unit detection model on Ubuntu with the Caffe deep learning framework. The model requires no special new layers; modifying the layer structure already available in Caffe is enough, so it is very easy to implement. It performs very well on the public AU detection datasets BP4D, DISFA and GFT, surpassing other algorithms that currently do well on them.

Implementing the detection method involves building, training and testing the facial action unit detection model, described in detail below:

1. Building the facial action unit detection model

As shown in Fig. 1, the facial action unit detection model of this embodiment is built on the VGG network, but with a newly designed cross-concat block replacing all convolutional layers in VGG to give the cross-concat network. An LSTM network is connected after the cross-concat network to form the facial action unit detection model. The input frames pass through the cross-concat network and then the LSTM network, and the output is the labels of multiple facial action units (AU labels).

As shown in Fig. 2, in a VGG convolutional layer (conv), the input passes directly through the convolutional layer and then through a ReLU activation function to produce the output. A cross-concat block consists, in order, of a convolutional layer with a 3 × 3 kernel, an activation function comprising regularization (BN) and ReLU, a convolutional layer with a 1 × 1 kernel, and a ReLU activation function. The input of the 3 × 3 convolutional layer is concatenated (C) with the output of the BN and ReLU activation function, so that the features of the previous layer and the next layer are fused. The 1 × 1 convolutional layer then adjusts the result so that the number of channels stays the same as in the original layer.
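The channel bookkeeping of one cross-concat block can be traced with the small sketch below. It tracks channel counts only and does not compute any convolution; the function names and the example sizes (64 input channels, 128 output channels) are illustrative assumptions.

```python
def conv_params(c_in, c_out, k):
    """Number of weights plus biases of a k x k convolution."""
    return c_in * c_out * k * k + c_out


def cross_concat_block(c_in, c_out):
    """Trace the channel count through one cross-concat block.

    Order taken from the description: conv3x3 -> BN+ReLU -> concat with the
    block input -> conv1x1 -> ReLU.
    """
    return [
        ("input", c_in),
        ("conv3x3", c_out),
        ("bn_relu", c_out),
        ("concat_with_input", c_in + c_out),  # channels are stacked
        ("conv1x1", c_out),                   # restores the original width
        ("relu", c_out),
    ]


for name, channels in cross_concat_block(64, 128):
    print(f"{name:18s} {channels}")
```

Since the 3 × 3 convolution keeps the same shape as the VGG layer it replaces, its weights can be copied from a pre-trained model; only the 1 × 1 convolution introduces new parameters.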

In general, during feature extraction, the features extracted by earlier layers are lower-level and finer, while those extracted by later, higher layers lean toward semantic features. Fusing features between layers in this way strengthens the extraction of low-level features, and the finer features make it possible to learn sufficient information for discriminating AUs from a limited amount of training data. The cross-concat block is therefore inherently insensitive to the distribution of different AUs: even when some AUs have few samples, the facial action unit detection model can extract enough features to detect them.

Model input: as with R-T1, the input of the facial action unit detection model is a sequence of pictures, and an LSTM network is correspondingly used to learn the timing information of AUs. The difference from R-T1 lies in how the sequence is chosen. R-T1 builds each 24-frame sequence by randomly selecting, for the current frame, 23 pictures from the part of the video before it and arranging them in chronological order, whereas the model of the present invention uses 32 consecutive frames. Adjacent pictures in such a sequence are more strongly correlated in time: since the video runs at 25 frames per second, the interval between two frames is only 40 milliseconds, and a facial AU can hardly change much within 40 milliseconds, so the sequence carries very strong timing information. As for how many pictures to use per sequence, the values 16, 24 and 32 were tested; the results show that 32 pictures per sequence works best, so 32 pictures are chosen as a sequence.
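The two ways of choosing a sequence can be contrasted in a few lines. This sketch works on frame indices only; the function names are hypothetical, and the choice that the consecutive window ends at the current frame is an assumption, since the text only says the 32 frames are consecutive.

```python
import random

FRAME_INTERVAL_MS = 1000 / 25  # 25 frames per second -> 40 ms between frames


def consecutive_sequence(current, length=32):
    """Indices of `length` consecutive frames ending at `current` (inclusive)."""
    start = current - length + 1
    if start < 0:
        raise ValueError("not enough earlier frames for a full sequence")
    return list(range(start, current + 1))


def random_earlier_sequence(current, length=24, rng=None):
    """R-T1-style choice: the current frame plus length - 1 randomly chosen
    earlier frames, arranged in chronological order."""
    rng = rng or random.Random()
    earlier = rng.sample(range(current), length - 1)
    return sorted(earlier) + [current]


seq = consecutive_sequence(100)
span_ms = (seq[-1] - seq[0]) * FRAME_INTERVAL_MS
```

For consecutive frames the gap between neighbours is always one frame interval, whereas the random selection can leave gaps of arbitrary length between neighbouring pictures.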

Loss function: the present invention uses the same loss function as the DRML algorithm, the multi-label sigmoid cross-entropy function, whose expression is:

In the formula, L is the loss. Y is the true value (ground truth), i.e., whether each AU is actually present in the input picture. Ŷ is the value computed by the algorithm, i.e., the AU status the algorithm detects for the picture. N is the number of pictures: deep learning networks are usually fed one batch of pictures at a time, so N is the number of pictures in a batch, an integer greater than or equal to 1. C is the number of labels, i.e., how many AUs need to be detected. n indexes the N pictures and c indexes the C labels.

The advantage of this loss function is that it handles invalid labels. Some negative samples can therefore be set as invalid labels, i.e., a true value of -1 is set to 0 and excluded from the loss calculation. This balances the positive and negative sample distribution of the same AU and lets the model train better.
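One possible way to set "some negative samples" to invalid is sketched below for the label column of a single AU. The 1:1 target ratio between negatives and positives and the random choice of which negatives to drop are assumptions made for illustration; the text does not state how many negatives are relabelled or how they are chosen.

```python
import random


def balance_negatives(column, rng=None):
    """Return a copy of one AU's labels with surplus negatives set to 0.

    column: labels in {+1, -1, 0} for a single AU across pictures.
    Assumption: negatives are reduced until they match the positive count.
    """
    rng = rng or random.Random()
    positives = sum(1 for y in column if y > 0)
    negative_idx = [i for i, y in enumerate(column) if y < 0]
    surplus = len(negative_idx) - positives
    balanced = list(column)
    if surplus > 0:
        for i in rng.sample(negative_idx, surplus):
            balanced[i] = 0  # invalid label: skipped by the loss
    return balanced


labels = [1, 1, -1, -1, -1, -1, -1, 0]
balanced = balance_negatives(labels, rng=random.Random(0))
```

Because the loss ignores entries equal to 0, the relabelled negatives simply stop contributing, without removing any pictures from the batch.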

2. Training the facial action unit detection model

Training the facial action unit detection model comprises the following steps:

1) Training data processing: obtain the pictures and label information for training

The dataset is divided into three parts, one of which is used as training data. Every 32 consecutive frames of the training data form one input sequence. Label files are made from the true values provided by the dataset. A label file is a txt file in which each line gives the labels of one picture. For example, the line "1.jpg 1 1 0 0 -1 -1 -1 -1 -1 -1 -1 -1" means that in the picture named 1.jpg the AUs of dimensions 1 and 2 appear, the AUs of dimensions 5 to 12 do not appear, and the data for the AUs of dimensions 3 and 4 are invalid. In the BP4D and GFT datasets, whether each AU is present in each picture is given directly and can be used as is. In the DISFA dataset, presence is not given directly; instead the intensity of each AU is given (one of six levels: 0, A, B, C, D, E). Intensities C, D and E are treated as the AU being present, and intensities 0, A and B as the AU being absent.
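The label handling in this step can be sketched as two small helpers: one parses a line of the label file, the other applies the DISFA intensity rule. The whitespace-separated layout follows the example line in the text; the function names and the representation of intensities as strings are assumptions.

```python
def parse_label_line(line):
    """Parse 'name l1 l2 ... lC' into (name, [labels]) with labels in {1, 0, -1}."""
    name, *values = line.split()
    return name, [int(v) for v in values]


DISFA_LEVELS = ("0", "A", "B", "C", "D", "E")
DISFA_PRESENT = ("C", "D", "E")


def disfa_intensity_to_label(intensity):
    """Map a DISFA intensity level to +1 (AU present) or -1 (AU absent)."""
    if intensity not in DISFA_LEVELS:
        raise ValueError(f"unknown intensity level: {intensity!r}")
    return 1 if intensity in DISFA_PRESENT else -1


name, labels = parse_label_line("1.jpg 1 1 0 0 -1 -1 -1 -1 -1 -1 -1 -1")
disfa_labels = [disfa_intensity_to_label(level) for level in ("0", "B", "C", "E")]
```

A label of 0 never comes out of the DISFA rule; invalid entries would only arise from missing annotations or from relabelling negatives for balancing.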

2) Network parameter configuration

The training parameters are configured as follows: the initial learning rate is 0.001 and is lowered step by step, with gamma set to 0.1 and stepsize to 15000; momentum is 0.9; the maximum number of iterations is 50000; and each training input is a sequence of 32 pictures.
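Reading the stepwise schedule as Caffe's "step" learning rate policy (an interpretation of the text, which names only gamma and stepsize), the learning rate at a given iteration is the base rate multiplied by gamma once for every completed stepsize iterations. A minimal sketch with the values quoted above as defaults:

```python
def step_lr(iteration, base_lr=0.001, gamma=0.1, stepsize=15000):
    """Learning rate under a step policy: base_lr * gamma ** floor(iteration / stepsize)."""
    return base_lr * gamma ** (iteration // stepsize)


# Learning rate at the start of each stage within the 50000 iterations.
schedule = {it: step_lr(it) for it in (0, 15000, 30000, 45000)}
```

With these settings the rate drops by a factor of ten three times before training stops at iteration 50000.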

3) Training code

Using the Caffe deep learning framework, the multi-label input layer and the multi-label sigmoid cross-entropy layer from the publicly released DRML code are added, the framework is compiled on Ubuntu, the network parameters are configured, and training is run.

3. Testing the facial action unit detection model

To test the facial action unit detection model, a threshold is first selected and the test is then run.

1) Threshold selection

The second part of the dataset is used as tuning data (the first part is used for training). The threshold for deciding whether an AU is present is adjusted, the AU detection results on the tuning data are computed under different thresholds, and the threshold that gives the best detection result is selected as the tuned threshold parameter.
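Threshold selection on the tuning data can be sketched as a simple search over candidate thresholds. Using the F1 score as the criterion for the "best detection result" and the particular candidate grid are assumptions made for this example; labels here are 1 (AU present) and 0 (AU absent).

```python
def f1_score(truth, predicted):
    """F1 score for binary labels (1 = AU present, 0 = AU absent)."""
    tp = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(truth, predicted) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(truth, predicted) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def best_threshold(truth, scores, candidates):
    """Return (threshold, f1) for the candidate with the highest F1.

    Ties are resolved in favour of the earlier candidate.
    """
    best_t, best_f1 = None, -1.0
    for t in candidates:
        predicted = [1 if s >= t else 0 for s in scores]
        f1 = f1_score(truth, predicted)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1


threshold, f1 = best_threshold([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8], [0.2, 0.3, 0.5])
```

In practice the search would be run separately for each AU, since the best threshold can differ between frequent and rare AUs.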

2) Testing

The third part of the dataset is used as test data (the first part is used for training and the second for tuning). The test uses the best threshold selected during tuning to compute whether each AU is present.

Embodiment 2

To test the performance of the invention, this embodiment evaluates the detection method on 3 public AU detection datasets. In the experiments the detection method is abbreviated CCT (cross-concat and temporal network); to better show the effect of the cross-concat block, the network obtained by removing the LSTM from CCT is abbreviated CC.

AU detection algorithms are generally evaluated by two metrics, the F1 score and AUC. The F1 score is the harmonic mean of precision and recall, and AUC is the area under the ROC curve. For both metrics, higher values indicate better detection.
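Both metrics are easy to state in code. The sketch below computes F1 from precision and recall, and AUC through its pairwise-ranking interpretation (the fraction of positive/negative pairs in which the positive sample receives the higher score, counting ties as one half), which equals the area under the ROC curve. The function names are chosen for this example.

```python
def f1_from_precision_recall(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def auc(truth, scores):
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    positives = [s for t, s in zip(truth, scores) if t == 1]
    negatives = [s for t, s in zip(truth, scores) if t == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


example_auc = auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
example_f1 = f1_from_precision_recall(2 / 3, 1.0)
```

The pairwise form is quadratic in the number of samples, which is fine for illustration; sorting-based implementations are used on real datasets.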

Table 2, Table 3 and Table 4 show the comparison results on these three data sets, respectively.

Table 2. Comparison of algorithms on the BP4D data set (bold in brackets indicates the best result; bold indicates the second best)

Table 3. Comparison of algorithms on the DISFA data set (bold in brackets indicates the best result; bold indicates the second best)

Table 4. F1 score comparison of algorithms on the GFT data set (bold indicates the best result)

The algorithm of the invention (CCT) and the CC algorithm are compared with the DRML, R-T1, JPML, CNN+LSTM, APL, EAC and CPM algorithms. (The data in the tables are taken from the papers in which these algorithms were published. Some data are missing, for example the R-T1 algorithm gives no AUC results, because those algorithms did not publish the corresponding test results. Since the other algorithms report only F1 and not AUC on the GFT data set, only F1 results are given for that data set and no AUC comparison is made.)

The experimental results show that the algorithm of the invention improves AU detection. Moreover, the improvement is especially marked for AUs such as AU1 and AU2, for which comparatively little data is available.

The preferred embodiments of the present invention have been described in detail above. It should be understood that those of ordinary skill in the art can make many modifications and variations according to the concept of the present invention without creative effort. Therefore, any technical solution that a person skilled in the art can obtain through logical analysis, reasoning or limited experimentation on the basis of the prior art under the concept of the present invention shall fall within the scope of protection determined by the claims.

Claims

1. A facial action unit (AU) detection method based on physical characteristics and distribution characteristics, characterized in that the method processes a sequence of pictures using a pre-trained facial action unit detection model to obtain a facial action unit detection result, the facial action unit detection model comprising a cross-concat network and a long short-term memory network connected in sequence.

2. The facial action unit detection method based on physical characteristics and distribution characteristics according to claim 1, characterized in that the cross-concat network is a VGG network based on cross-concat blocks.

3. The facial action unit detection method based on physical characteristics and distribution characteristics according to claim 2, characterized in that each cross-concat block comprises a first convolutional layer and a first activation function connected in sequence, and the output of the first activation function is concatenated with the input of the first convolutional layer.

4. The facial action unit detection method based on physical characteristics and distribution characteristics according to claim 3, characterized in that the first activation function includes regularization.

5. The facial action unit detection method based on physical characteristics and distribution characteristics according to claim 3 or 4, characterized in that a second convolutional layer is connected after the concatenation of the output of the first activation function and the input of the first convolutional layer, the convolution kernel size of the second convolutional layer being 1 × 1.

6. The facial action unit detection method based on physical characteristics and distribution characteristics according to claim 1, characterized in that the sequence of pictures consists of pictures of consecutive frames.

7. The facial action unit detection method based on physical characteristics and distribution characteristics according to claim 6, characterized in that the sequence contains 32 pictures.

8. The facial action unit detection method based on physical characteristics and distribution characteristics according to claim 1, characterized in that the loss function used in pre-training the facial action unit detection model is a multi-label sigmoid cross-entropy function.