CN110458021A - Facial action unit detection method based on physical characteristics and distribution characteristics
- Publication number
- CN110458021A (application CN201910620049.XA)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
- G06V40/176—Dynamic expression
Abstract
The present invention relates to a facial action unit detection method based on physical characteristics and distribution characteristics. The method processes a sequence of pictures with a pre-trained facial action unit detection model to obtain facial action unit detection results; the detection model comprises a cross-concat network and a long short-term memory (LSTM) network connected in sequence. Compared with the prior art, the present invention is the first to consider and address the imbalance in data distribution between different facial action units, which further improves facial action unit detection performance.
Description
Technical field
The present invention relates to the field of computer technology, and more particularly to a facial action unit detection method based on physical characteristics and distribution characteristics.
Background art
Facial expression analysis is an important area of artificial intelligence, and the detection of facial action units (AUs) is essential to it. Human facial expressions are produced by movements of the facial muscles, and the Facial Action Coding System calls each movement of one or more muscles a facial action unit. Almost every facial expression can be represented by a single AU or by a combination of AUs. For example, a smile can be represented by the combination of cheek raising (AU6) and mouth-corner raising (AU12), as shown in Fig. 3.
Facial action unit detection means determining, from a picture or a video, which AUs appear on a person's face. For example, when a person smiles, cheek raising (AU6) and mouth-corner raising (AU12) are likely to appear.
In facial expression analysis, researchers generally divide human facial expressions (emotions) into seven classes: happiness, sadness, surprise, fear, anger, disgust and contempt, referred to as the universal facial expressions. Each of them can be represented by a combination of AUs; the correspondence between the seven expressions and AUs is shown in Table 1. Therefore, once the AUs have been detected, the universal facial expression can be obtained directly from Table 1.
Table 1. Correspondence between universal expressions and AUs
Emotion | Action units |
Happiness | 6+12 |
Sadness | 1+4+15 |
Surprise | 1+2+5B+26 |
Fear | 1+2+4+5+7+20+26 |
Anger | 4+5+7+23 |
Disgust | 9+15+16 |
Contempt | R12A+R14A |
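Table 1 can be read as a lookup from detected AUs to universal expressions. The following is a minimal sketch of that lookup, not part of the patented method: the function name `emotions_from_aus` is hypothetical, and AU codes with intensity or side markers (such as 5B or R12A) are treated as opaque strings.

```python
# AU combinations copied from Table 1. Codes such as "5B" or "R12A" are kept
# as opaque strings; no intensity or side semantics are modeled here.
EMOTION_TO_AUS = {
    "Happiness": {"6", "12"},
    "Sadness": {"1", "4", "15"},
    "Surprise": {"1", "2", "5B", "26"},
    "Fear": {"1", "2", "4", "5", "7", "20", "26"},
    "Anger": {"4", "5", "7", "23"},
    "Disgust": {"9", "15", "16"},
    "Contempt": {"R12A", "R14A"},
}


def emotions_from_aus(detected):
    """Return every emotion whose full AU combination is contained in `detected`."""
    detected = set(detected)
    return [name for name, aus in EMOTION_TO_AUS.items() if aus <= detected]
```

For example, `emotions_from_aus(["6", "12"])` yields the single entry for happiness, matching the smile example in Fig. 3.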
Moreover, the seven universal expressions are not sufficient for facial expression analysis in practical applications. For example, a hospital may need to detect expressions of pain, and a teacher may need to detect puzzled expressions on students. These expressions cannot be described with the seven universal expressions and must be described at the level of detailed muscle movements, that is, with AUs. AU detection is therefore very important for the analysis of human facial expressions.
An AU detection algorithm extracts facial features and classifies them to determine which AUs appear on the face. Existing AU detection algorithms mostly design their experiments around three physical characteristics of AUs. 1) Temporal characteristics: for video data, an AU is a continuous movement, so temporal information is very important for AU detection. For example, if a person is smiling at second 1, it is very likely that he is still smiling at second 1.5. 2) Correlation between AUs: some AUs often appear together, while others are mutually exclusive and usually do not appear together. The most typical example is the correspondence between universal expressions and AUs: because universal expressions occur frequently in daily life, their corresponding AU combinations are combinations that often appear together. 3) Region characteristics: by the definition and origin of AUs, whether a particular AU appears depends only on a certain region of the face. AU12, for example, is related only to the region around the mouth corners and not to other parts of the face such as the eyes or forehead. Region characteristics are also called sparse characteristics. AU detection algorithms to date have essentially all relied on these physical characteristics, combined with careful algorithm design, to improve AU detection results. The algorithm described here, in addition to the physical characteristics of AUs (temporal characteristics and AU correlation), specifically considers the distribution characteristic of AUs, namely their imbalance: different AUs occur with different probabilities, some appearing often in daily life and others very rarely. Taking this into account yields better detection performance.
Basics of AU detection algorithms: the input is a picture or a video, and the output is a judgment of whether each AU is present. For any particular AU, the output has only two states, present or absent, so it is a binary classification problem. Because the algorithm must determine the presence of several AUs at once, AU detection is a multi-label binary classification problem: the algorithm must produce a separate present/absent result for each AU.
The paper "Deep region and multi-label learning for facial action unit detection" (Kaili Zhao, Wen-Sheng Chu, Honggang Zhang, in Computer Vision and Pattern Recognition, 2016, pp. 3391-3399) discloses the DRML algorithm. DRML is based on the typical deep neural network AlexNet and takes two physical characteristics of AUs into account: region characteristics and the correlation between AUs.
As shown in Fig. 4, DRML starts from the AlexNet network structure, removes all pooling layers except one, and adds a region layer designed by the authors to learn the region characteristics of the face.
Specifically, the input to the DRML network is an aligned face image: a single color face picture 170 pixels wide and 170 pixels high (leftmost in Fig. 4). The picture passes in turn through conv1 (convolutional layer 1), region2 (region layer 2), pool3 (pooling layer 3), conv4 (convolutional layer 4), conv5 (convolutional layer 5), conv6 (convolutional layer 6), conv7 (convolutional layer 7), fc8 (fully connected layer 8), fc9 (fully connected layer 9) and output, 10 layers in total, to produce the output results for 12 AUs. The output layer is also a fully connected layer. Apart from the region layer, all layers are available directly in deep learning frameworks.
Fig. 5 is a schematic diagram of the region layer. After the first convolutional layer, the picture yields 32 feature maps, each 160 pixels wide and 160 pixels high. Each 160 × 160 feature map is split into 8 × 8 = 64 patches of 20 × 20 pixels. Each patch passes in turn through a BN layer (batch normalization layer), a ReLU layer (ReLU activation layer) and a conv layer (convolutional layer), and the 64 results are finally stitched back together as the output. The physical meaning of the region layer is thus to divide the face uniformly into blocks, learn each block separately, and then merge the per-block results.
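The split-process-merge structure of the region layer can be sketched in plain Python as follows. This is an illustration only: `per_patch_fn` is a hypothetical stand-in for the per-patch BN, ReLU and conv stages, and the feature map is assumed to be square with a side divisible by the grid size.

```python
def split_into_patches(feature_map, grid=8):
    """Split a square feature map (list of rows) into grid x grid patches."""
    step = len(feature_map) // grid  # 160 // 8 = 20 for the DRML example
    return [
        [
            [row[c * step:(c + 1) * step]
             for row in feature_map[r * step:(r + 1) * step]]
            for c in range(grid)
        ]
        for r in range(grid)
    ]


def merge_patches(patches):
    """Stitch a grid of equally sized patches back into one feature map."""
    merged = []
    for patch_row in patches:
        for y in range(len(patch_row[0])):
            merged.append([v for patch in patch_row for v in patch[y]])
    return merged


def region_layer(feature_map, per_patch_fn, grid=8):
    """Apply per_patch_fn to every patch independently, then merge the results."""
    patches = split_into_patches(feature_map, grid)
    processed = [[per_patch_fn(p) for p in row] for row in patches]
    return merge_patches(processed)
```

Because each patch is processed independently, the learned weights can differ from one facial region to another, which is the point of the uniform blocking described above.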
In addition, to learn the correlation between AUs, DRML is trained with a multi-label sigmoid cross-entropy loss:
$$L(Y, \hat{Y}) = -\frac{1}{N} \sum_{n=1}^{N} \sum_{c=1}^{C} \left\{ [Y_{nc} > 0] \log \hat{Y}_{nc} + [Y_{nc} < 0] \log\left(1 - \hat{Y}_{nc}\right) \right\}$$
In the formula, L is the loss; Y is the ground truth, i.e. whether each AU is actually present in the input picture; \hat{Y} is the value computed by the algorithm, i.e. the AU status the algorithm detects for the picture; [x] equals 1 when the condition x holds and 0 otherwise. N is the number of pictures: deep learning networks are usually fed one batch of pictures at a time, so N is the number of pictures in a batch and is an integer greater than or equal to 1. C is the number of labels, i.e. how many AUs need to be detected; the output layer shown in Fig. 4 has 12 outputs, so C = 12 and 12 AUs are checked for presence. n indexes the pictures (1 to N) and c indexes the labels (1 to C).
The ground truth for each facial action unit can be -1, +1 or 0: +1 is a positive sample (the AU is present), -1 is a negative sample (the AU is absent), and 0 is invalid. The ground truth of each picture can therefore be expressed as a 12-dimensional vector, one dimension per AU. For example, [1, 1, 0, 0, -1, -1, -1, -1, -1, -1, -1, -1] means that the AUs of dimensions 1 and 2 appear on the face, the AUs of dimensions 5 to 12 do not appear, and it is unknown whether the AUs of dimensions 3 and 4 appear. In the multi-label sigmoid cross-entropy formula, the term before the "+" computes the loss for positive samples (ground truth +1), and the term after the "+" computes the loss for negative samples (ground truth -1). When the ground truth of an AU is 0, neither term applies, so that label is ignored and does not contribute to the loss.
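The labeling rule above (+1 positive, -1 negative, 0 ignored) can be sketched as a loss function. This is a minimal illustration rather than the DRML implementation: averaging over the batch size and the `eps` clamp that keeps the logarithm finite are assumptions of this sketch.

```python
import math


def multilabel_sigmoid_ce(y_true, y_prob, eps=1e-7):
    """Multi-label cross-entropy over a batch.

    y_true: N rows of C labels in {+1, -1, 0}; 0 marks an invalid label.
    y_prob: N rows of C predicted probabilities (sigmoid outputs).
    """
    total = 0.0
    for labels, probs in zip(y_true, y_prob):
        for y, p in zip(labels, probs):
            p = min(max(p, eps), 1.0 - eps)  # assumed clamp, avoids log(0)
            if y > 0:        # positive sample: the term before the "+"
                total += math.log(p)
            elif y < 0:      # negative sample: the term after the "+"
                total += math.log(1.0 - p)
            # y == 0: invalid label, contributes nothing
    return -total / len(y_true)
```

Changing the predicted probability of a label whose ground truth is 0 leaves the loss unchanged, which is exactly what makes the invalid label usable as a mask.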
DRML has the following disadvantages: 1) it is based on AlexNet, whose network is shallow, so learning is limited and AU detection performance is poor; 2) it does not take into account temporal information or the imbalanced distribution of different AUs, which inevitably leads to somewhat better detection for AUs with many samples and very poor detection for AUs with few samples.
The paper "Action unit detection with region adaptation, multi-labeling learning and optimal temporal fusing" (Wei Li, Farnaz Abtahi, and Zhigang Zhu, in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017) discloses the R-T1 algorithm. R-T1 is based on the VGG deep learning network and makes full use of the three physical characteristics of AUs (temporal characteristics, correlation between AUs, region characteristics): temporal characteristics are handled by an LSTM (long short-term memory) network, the correlation between AUs by multi-label learning, and region characteristics by ROI Nets designed by the authors.
As shown in Fig. 6, R-T1 is based on the VGG network structure: the part of VGG after convolutional layer conv12 is removed, ROI Nets and an LSTM network are appended, and the final output is the result for multiple AUs. The loss function used for multi-label learning during training is expressed in terms of the following quantities: l is the ground truth and takes the value 0 or 1, where 0 is a negative sample (the AU does not appear) and 1 is a positive sample (the AU appears); P is the probability, computed by the algorithm, that the AU appears. The authors use the constants 0.05 and 1.05 to prevent the loss from exploding and to limit its range.
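One form of this loss that is consistent with the description above can be written as follows. The placement of the constants, 0.05 as an offset and 1.05 as a normalizer inside the logarithms, is an assumption here, since the text states only their purpose.

```latex
\mathrm{Loss} = -\sum \left( l \cdot \log\frac{P + 0.05}{1.05} + (1 - l) \cdot \log\frac{1.05 - P}{1.05} \right)
```

With P in [0, 1], both logarithm arguments stay within [0.05/1.05, 1], so each term is bounded in magnitude by log(21), roughly 3.04. This matches the stated aim of limiting the range of the loss.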
Specifically, the input to R-T1 is a sequence of 24 pictures, each of width W and height H. A sequence is chosen as follows: for the current frame, 23 pictures are randomly selected from the part of the video preceding it, and the pictures are arranged in chronological order to form the sequence. N sequences are input at a time. The image data first passes through layers conv1 to conv12 of VGG, giving 512 feature maps of size 14 × 14. Then the positions in the original picture of 20 AU-related feature points (landmarks) are mapped onto the feature maps. On each feature map, a 3 × 3 region centered on each mapped landmark is cropped, upscaled to 6 × 6 and convolved. Finally, the results computed for the regions of all landmarks are merged as the output of ROI Nets, and an LSTM network captures the temporal information and produces the AU output (AU labels).
About ROI Nets: an ROI is a region of interest. For different AUs, ROI Nets selects different regions with reference to the 20 landmarks in order to extract AU-related regional features. The landmarks give the positions of the eyes, nose, mouth and so on. Because different AUs involve muscles in different positions, the landmark positions roughly determine where the muscles of an AU are, and hence the centers of the regions to select. Each landmark is mapped to the corresponding point on the feature map; with these 20 points as centers, 20 regions are extracted from the feature map, upscaled and convolved (conv), and the results are finally merged.
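The landmark-to-region step can be sketched as follows. This is an illustration under stated assumptions, not the R-T1 code: the input picture is assumed square with side `image_size`, crops that would cross the border are shifted inward, and the upscaling from 3 × 3 to 6 × 6 is done by nearest-neighbor repetition. All function names are hypothetical.

```python
def map_landmark(x, y, image_size, map_size=14):
    """Map a landmark from picture coordinates to (row, col) on the feature map."""
    scale = map_size / image_size
    col = min(int(x * scale), map_size - 1)
    row = min(int(y * scale), map_size - 1)
    return row, col


def crop_3x3(feature_map, row, col):
    """Crop a 3 x 3 region centered on (row, col), shifted inward at the border."""
    size = len(feature_map)
    r0 = min(max(row - 1, 0), size - 3)
    c0 = min(max(col - 1, 0), size - 3)
    return [r[c0:c0 + 3] for r in feature_map[r0:r0 + 3]]


def upscale_2x(region):
    """Nearest-neighbor upscaling: every value becomes a 2 x 2 block."""
    out = []
    for r in region:
        wide = [v for v in r for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out
```

For a hypothetical 224 × 224 picture, `map_landmark(112, 112, 224)` gives the center of the 14 × 14 map, and `upscale_2x(crop_3x3(feature_map, 7, 7))` gives the 6 × 6 region that would then be convolved.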
LSTM network: LSTMs are commonly used to process data with temporal information. An LSTM consists of multiple LSTM memory cells; one memory cell is shown in Fig. 7. In the figure, t is the current time step, t-1 is the previous time step, C is the cell state, h is the output, Sigmoid is the sigmoid activation function and tanh is the tanh activation function. Through its input gate, forget gate and output gate, the memory cell decides which information to update and which to forget, and outputs the result together with the updated state.
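The gate logic of one memory cell can be sketched with the standard LSTM update equations. The version below is a scalar toy, with one input value and one hidden value, meant only to show how the three gates combine; the `params` layout and gate names are assumptions of this sketch, not taken from Fig. 7.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, params):
    """One scalar LSTM step.

    params maps each of "forget", "input", "output", "candidate"
    to a tuple (w_x, w_h, b) of input weight, recurrent weight and bias.
    """
    def pre(name):
        w_x, w_h, b = params[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))        # forget gate: how much old state to keep
    i = sigmoid(pre("input"))         # input gate: how much new content to admit
    o = sigmoid(pre("output"))        # output gate: how much state to expose
    g = math.tanh(pre("candidate"))   # candidate content
    c = f * c_prev + i * g            # updated state C_t
    h = o * math.tanh(c)              # output h_t
    return h, c
```

Feeding the returned `h` and `c` back in as `h_prev` and `c_prev` for the next frame is what lets the network carry information along a sequence of pictures.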
R-T1 has the following disadvantages: 1) its input requires not only pictures but also landmark information, so the R-T1 network needs more input data than a network that takes pictures only; 2) it does not take into account the imbalanced distribution of different AUs, which inevitably leads to somewhat better detection for AUs with many samples and very poor detection for AUs with few samples; 3) it does not take into account the imbalance between positive and negative samples of the same AU. In typical datasets the number of negative samples (AU absent) far exceeds the number of positive samples (AU present), and this imbalance affects training and lowers AU detection performance.
Summary of the invention
The object of the present invention is to overcome the above drawbacks of the prior art by providing a facial action unit detection method based on physical characteristics and distribution characteristics.
The object of the present invention can be achieved through the following technical solutions:
A facial action unit detection method based on physical characteristics and distribution characteristics, characterized in that the method processes a sequence of pictures with a pre-trained facial action unit detection model to obtain facial action unit detection results, the detection model comprising a cross-concat network and a long short-term memory (LSTM) network connected in sequence.
Further, the cross-concat network is a VGG network built from cross-concat blocks: every convolutional layer in the VGG network is replaced with a cross-concat block to obtain the cross-concat network.
Further, each cross-concat block comprises a first convolutional layer and a first activation function connected in sequence. The output of the first activation function is concatenated with the input of the first convolutional layer, which fuses the features of the previous layer with those of the next layer.
Further, the first activation function includes regularization (batch normalization), which automatically weakens unimportant feature variables and prevents overfitting.
Further, a second convolutional layer is connected after the concatenation of the output of the first activation function and the input of the first convolutional layer. The kernel size of the second convolutional layer is 1 × 1, so as to keep the same number of channels as the original network.
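The data flow of one cross-concat block can be sketched in plain Python. This is an illustration only: `first_conv` is a hypothetical callable standing in for the first (3 × 3) convolutional layer, batch normalization is omitted, and feature maps are lists of channels, each a list of rows.

```python
def relu(maps):
    """Element-wise ReLU over a list of feature maps."""
    return [[[max(0.0, v) for v in row] for row in m] for m in maps]


def conv1x1(maps, weights):
    """1 x 1 convolution: each output channel is a weighted sum of input channels.

    maps: C_in feature maps of identical size; weights: C_out rows of C_in values.
    """
    height, width = len(maps[0]), len(maps[0][0])
    return [
        [[sum(wk * m[y][x] for wk, m in zip(w_row, maps)) for x in range(width)]
         for y in range(height)]
        for w_row in weights
    ]


def cross_concat_block(x, first_conv, w1x1):
    """x -> first conv -> ReLU, concatenate with x, then 1 x 1 conv -> ReLU."""
    y = relu(first_conv(x))            # first convolutional layer + activation (BN omitted)
    cat = x + y                        # channel-wise concatenation: C_in + C_out channels
    return relu(conv1x1(cat, w1x1))    # 1 x 1 conv brings the count back to C_out
```

The number of rows in `w1x1` sets the output channel count, so choosing it equal to the channel count of the first convolution leaves the next block's input shape the same as in the unmodified VGG layer.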
During feature extraction, layers computed earlier extract lower-level, finer features, while higher layers computed later extract features that lean toward semantics. Fusing features between layers in this way strengthens the extraction of low-level features, and these finer features make it possible to learn, from a limited amount of training data, enough feature information to discriminate AUs. The cross-concat block is therefore inherently insensitive to the distribution of different AUs: even if some AUs have few samples, the algorithm can still extract enough features to detect them.
Further, the sequence of pictures consists of consecutive frames. In such a sequence, adjacent pictures are more strongly correlated in time: since the video runs at 25 frames per second, the interval between two frames is only 40 milliseconds, and a facial AU can hardly change much within 40 milliseconds. A sequence of consecutive frames therefore carries strong temporal information.
Further, each sequence contains 32 pictures; comparative tests show that performance is better when 32 pictures form a sequence.
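The frame arithmetic and the choice of consecutive frames can be sketched as follows. The sliding-window enumeration and its `stride` parameter are assumptions of this sketch; the text specifies only that each sequence consists of 32 consecutive frames.

```python
def frame_interval_ms(fps=25):
    """Time between two adjacent frames in milliseconds."""
    return 1000.0 / fps


def consecutive_sequences(num_frames, length=32, stride=1):
    """Enumerate index windows of `length` consecutive frames.

    Returns a list of lists of frame indices; empty if the video is too short.
    """
    return [
        list(range(start, start + length))
        for start in range(0, num_frames - length + 1, stride)
    ]
```

At 25 frames per second a 32-frame window spans 31 intervals of 40 ms, i.e. 1.24 seconds of video, whereas frames sampled at random from the preceding video, as in R-T1, may lie much further apart.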
Further, the loss function used to pre-train the facial action unit detection model is the multi-label sigmoid cross-entropy function, with which multi-label learning is performed. The loss function is designed to handle invalid labels: some negative samples are set to invalid labels, i.e. labels whose ground truth is -1 are set to 0 so that they do not take part in the loss computation. This balances the distribution of positive and negative samples for the same AU, so that the model can be trained better.
Compared with the prior art, the present invention has the following advantages:
(1) The facial action unit detection method of the present invention is the first to consider the imbalance in data distribution between different facial action units. It proposes the cross-concat block structure, which makes the network no longer sensitive to the data distribution of different facial action units; the improvement in detection performance is especially pronounced for facial action units with little data.
(2) The cross-concat block of the present invention also adds a 1 × 1 convolutional layer to restore the number of channels, so that the number of parameters of the original VGG convolutional layers is preserved and a trained VGG-Face model can be used to initialize the detection model. Because VGG-Face was trained on a large amount of face data, the features it extracts are characteristic of faces; training on this basis greatly accelerates the convergence of the model of the present invention.
(3) The present invention uses the multi-label sigmoid cross-entropy function as the loss function and designs into it the handling of invalid labels: by setting some negative samples to invalid labels, the distribution of positive and negative samples of the same AU is balanced, so that the model can be trained better.
(4) The present invention uses 32 consecutive frames as one input sequence. Adjacent pictures in the sequence are more strongly correlated in time: since the video runs at 25 frames per second, the interval between two frames is only 40 milliseconds, and a facial AU can hardly change much within 40 milliseconds. The input sequence of the present invention therefore carries strong temporal information, and performance is best when 32 pictures form a sequence.
Brief description of the drawings
Fig. 1 is a structural schematic diagram of the facial action unit detection model of the present invention;
Fig. 2 is a structural schematic diagram of a convolutional layer in the VGG network and of the cross-concat block, wherein (2a) is a convolutional layer in the VGG network and (2b) is a cross-concat block;
Fig. 3 shows the facial action unit combination of a smile, wherein (3a) is the smile, (3b) is the cheek raising (AU6) in the smile, and (3c) is the mouth-corner raising (AU12) in the smile;
Fig. 4 is a structural schematic diagram of the DRML algorithm;
Fig. 5 is a structural schematic diagram of the region layer in the DRML algorithm;
Fig. 6 is a structural schematic diagram of the R-T1 algorithm;
Fig. 7 is a structural schematic diagram of an LSTM memory cell.
Detailed description of the embodiments
The present invention is described in detail below with reference to the accompanying drawings and specific embodiments. The embodiments are implemented on the basis of the technical solution of the present invention, and detailed implementations and specific operating procedures are given, but the scope of protection of the present invention is not limited to the following embodiments.
Embodiment 1
This embodiment provides a facial action unit detection method based on physical characteristics and distribution characteristics. The method performs multi-label classification on a sequence of pictures with a pre-trained facial action unit (AU) detection model to obtain facial action unit detection results; the detection model comprises a cross-concat network and a long short-term memory network connected in sequence.
This embodiment implements the facial action unit detection model on Ubuntu using the caffe deep learning framework. The model requires no special new layers; it only requires modifying the existing layer structure of caffe, so it is very easy to implement. It performs very well on the public AU detection datasets BP4D, DISFA and GFT, surpassing the other algorithms that currently perform well on these datasets.
The implementation of the detection method comprises building, training and testing the facial action unit detection model, as described in detail below:
1. Building the facial action unit detection model
As shown in Fig. 1, the facial action unit detection model of this embodiment is built on the VGG network, but a cross-concat block is designed to replace every convolutional layer in VGG, giving the cross-concat network. An LSTM network is connected after the cross-concat network to form the facial action unit detection model. The input pictures (input frames) pass through the cross-concat network and then the LSTM network, and the output is the set of facial action unit labels (AU labels).
As shown in Fig. 2, in a VGG convolutional layer (conv) the input data passes directly through the convolution and is output through a ReLU activation function. A cross-concat block comprises, in order, a convolutional layer with a 3 × 3 kernel, an activation function that includes batch normalization (BN) and ReLU, and a convolutional layer with a 1 × 1 kernel followed by a ReLU activation function. The input of the 3 × 3 convolutional layer is concatenated (C) with the output of the BN and ReLU activation function, so that the features of the previous layer are fused with the features of the next layer. The 1 × 1 convolutional layer then adjusts the concatenated result so that the block keeps the same number of channels as the original layer.
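The channel bookkeeping of one cross-concat block can be sketched as follows. This is an illustrative sketch only: the function name is hypothetical, and it assumes that the 3 × 3 convolution already produces the width of the VGG layer it replaces, which the description does not state.

```python
def cross_concat_channels(in_channels, out_channels):
    """Trace the channel count through one cross-concat block.

    Assumption: the 3x3 convolution outputs `out_channels`, the width of
    the VGG convolutional layer that the block replaces.
    """
    trace = []
    # 3x3 convolution followed by BN + ReLU: in_channels -> out_channels
    trace.append(("conv3x3+BN+ReLU", out_channels))
    # Concatenate the block input with the activated output along channels
    trace.append(("concat", in_channels + out_channels))
    # 1x1 convolution + ReLU restores the original channel count
    trace.append(("conv1x1+ReLU", out_channels))
    return trace


for name, channels in cross_concat_channels(64, 128):
    print(name, channels)
```

The concatenation is what widens the tensor, which is why the 1 × 1 convolution is needed to bring the width back before the next block.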
In general, during feature extraction, the earlier a layer is computed, the lower-level and finer the features it extracts; the later a layer is computed, the higher-level and more semantic its features. Fusing features between layers strengthens the extraction of low-level features, and these finer features make it possible to learn enough feature information from a limited amount of training data to discriminate AUs. The cross-concat block is therefore inherently insensitive to differences in AU distribution: even when some AUs have few samples, the AU detection model can still extract enough features to detect them.
Model input: as in the R-T1 algorithm, the input of the AU detection model is a picture sequence, and an LSTM network is used accordingly to learn the temporal information of AUs. The difference from R-T1 lies in how each sequence is chosen. R-T1 builds each 24-frame sequence by taking the current frame and randomly selecting 23 pictures from the part of the video before it, then arranging them in time order; the AU detection model of the present invention uses 32 consecutive frames. Adjacent pictures in such a sequence are more strongly correlated in time: the video runs at 25 frames per second, so two consecutive frames are only 40 milliseconds apart, and a facial AU can hardly change much within 40 milliseconds. The picture sequences of the present invention therefore carry strong temporal information. As for the number of pictures per sequence, the values 16, 24 and 32 were tested; the results showed that 32 pictures per sequence worked best, so 32 pictures were selected as one sequence.
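A minimal sketch of cutting a video into sequences of 32 consecutive frames is given below. The function names and the `stride` parameter are hypothetical; the description does not say whether neighbouring sequences overlap, so non-overlapping windows are assumed here.

```python
def consecutive_windows(num_frames, seq_len=32, stride=32):
    """Return (start, end) frame index pairs, end exclusive, for each window.

    Assumption: windows do not overlap (stride == seq_len); a trailing
    remainder shorter than seq_len is dropped.
    """
    return [(s, s + seq_len) for s in range(0, num_frames - seq_len + 1, stride)]


def window_duration_ms(seq_len=32, fps=25):
    """Time between the first and last frame of one window, in milliseconds."""
    return (seq_len - 1) * 1000 / fps


print(consecutive_windows(100))  # [(0, 32), (32, 64), (64, 96)]
print(window_duration_ms())      # 1240.0
```

At 25 frames per second, one 32-frame sequence spans 31 intervals of 40 milliseconds, i.e. about 1.24 seconds of video.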
Loss function: the present invention uses the same loss function as the DRML algorithm, the multi-label sigmoid cross-entropy function, whose expression is as follows:
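One form of the expression that is consistent with the symbol definitions and with the description of the terms on either side of the "+" is sketched below. The symbol $\hat{Y}$ for the predicted value and the $1/N$ normalization are assumptions made for illustration; the formula as filed may be normalized differently.

```latex
L(Y,\hat{Y}) = -\frac{1}{N}\sum_{n=1}^{N}\sum_{c=1}^{C}
\Big\{ [Y_{nc} > 0]\,\log \hat{Y}_{nc} \;+\; [Y_{nc} < 0]\,\log\big(1-\hat{Y}_{nc}\big) \Big\}
```

Here $[x]$ equals 1 when the condition $x$ holds and 0 otherwise, so a label of 0 switches off both terms.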
In the formula, L is the loss. Y is the ground truth, i.e. whether each AU is actually present in the input picture; the predicted value is the value computed by the algorithm, i.e. the AU status that the algorithm detects for the picture. N is the number of pictures: deep learning networks are usually fed one batch of pictures at a time, so N is the number of pictures in a batch, an integer greater than or equal to 1. C is the number of labels, i.e. how many kinds of AU need to be detected. n is the index over N, and c is the index over C.
The true value of each facial action unit can be -1, +1 or 0, where +1 is a positive sample (the AU is present), -1 is a negative sample (the AU is absent), and 0 is invalid. The ground truth of each picture can thus be represented by a 12-dimensional vector in which each dimension represents one AU. For example, [1, 1, 0, 0, -1, -1, -1, -1, -1, -1, -1, -1] means that the AUs represented by dimensions 1 and 2 appear on the face, the AUs represented by dimensions 5 to 12 do not appear, and it is unknown whether the AUs represented by dimensions 3 and 4 appear. In the multi-label sigmoid cross-entropy formula, the part before the "+" computes the loss for true values of +1 (positive samples), and the part after the "+" computes the loss for true values of -1 (negative samples). When the true value of an AU is 0, neither part takes effect, so that item is ignored and does not take part in the calculation.
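The behaviour described above can be sketched in plain Python. This is an illustrative sketch rather than the Caffe layer itself: the function name is hypothetical, the inputs are assumed to be raw network scores that are passed through a sigmoid, and dividing by the batch size N is an assumed normalization.

```python
import math


def multilabel_sigmoid_ce(labels, scores):
    """Multi-label sigmoid cross-entropy that skips invalid (0) labels.

    labels: N rows of C values in {+1, -1, 0}
    scores: N rows of C raw network outputs (assumed to be pre-sigmoid)
    """
    total = 0.0
    for label_row, score_row in zip(labels, scores):
        for y, s in zip(label_row, score_row):
            p = 1.0 / (1.0 + math.exp(-s))  # probability that the AU is present
            if y > 0:
                total -= math.log(p)         # term before the "+": positive sample
            elif y < 0:
                total -= math.log(1.0 - p)   # term after the "+": negative sample
            # y == 0: invalid label, contributes nothing
    return total / len(labels)               # assumed normalization by batch size


loss = multilabel_sigmoid_ce([[1, -1, 0]], [[0.0, 0.0, 5.0]])
print(loss)
```

In this example the third label is 0, so changing its score leaves the loss unchanged; only the first two labels contribute, each with log 2.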
The advantage of this loss function is that it handles invalid labels. Some negative samples can therefore be marked as invalid, i.e. their true value is changed from -1 to 0 so that they do not take part in the loss calculation. This balances the positive and negative sample distribution of the same AU and lets the model train better.
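One way such balancing could be done is sketched below. The description does not say which negative samples are invalidated or how many, so the random selection, the `keep_ratio` parameter and the function name are all hypothetical.

```python
import random


def invalidate_negatives(labels, au_index, keep_ratio, seed=0):
    """Return a copy of `labels` in which part of the -1 entries of one AU become 0.

    Assumption: the negatives to invalidate are chosen uniformly at random,
    and `keep_ratio` is the share of negatives that stay at -1.
    """
    rng = random.Random(seed)
    out = [list(row) for row in labels]
    negatives = [i for i, row in enumerate(out) if row[au_index] == -1]
    n_drop = len(negatives) - int(round(len(negatives) * keep_ratio))
    for i in rng.sample(negatives, n_drop):
        out[i][au_index] = 0  # invalid label: ignored by the loss
    return out


labels = [[1], [1]] + [[-1] for _ in range(8)]
balanced = invalidate_negatives(labels, au_index=0, keep_ratio=0.25)
print(sum(row[0] == -1 for row in balanced))  # 2 negatives remain for 2 positives
```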
2. Training of the facial action unit detection model
Training the AU detection model includes the following steps:
1) Training data processing: obtain the pictures and label information for training
The data set is divided into three parts, one of which is used as training data. Every 32 consecutive frames of the training data are taken as one input picture sequence. The labels of the training data are written to a label file according to the ground truth provided by the data set. The label file is a txt file in which each line holds the labels of one picture. For example, the line "1.jpg 1 1 0 0 -1 -1 -1 -1 -1 -1 -1 -1" means that in the picture named 1.jpg the AUs represented by dimensions 1 and 2 appear, the AUs represented by dimensions 5 to 12 do not appear, and the data for the AUs represented by dimensions 3 and 4 are invalid. In the BP4D and GFT data sets, whether each picture contains a given AU is provided directly and can be used as is. In the DISFA data set, the presence of an AU is not given directly; instead the intensity of each AU is given (one of six levels: 0, A, B, C, D, E). Intensities C, D and E are treated as the AU being present, and intensities 0, A and B as the AU being absent.
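The label handling in this step can be sketched as follows. The function names are hypothetical, and the whitespace-separated layout of a label line is an assumption based on the example line above.

```python
def parse_label_line(line):
    """Split one label-file line into (picture name, list of integer labels).

    Assumption: fields are separated by whitespace, picture name first.
    """
    name, *values = line.split()
    return name, [int(v) for v in values]


def disfa_intensity_to_label(intensity):
    """Map a DISFA intensity level to +1 (AU present) or -1 (AU absent)."""
    if intensity in ("C", "D", "E"):
        return 1
    if intensity in ("0", "A", "B"):
        return -1
    raise ValueError(f"unknown intensity: {intensity!r}")


name, labels = parse_label_line("1.jpg 1 1 0 0 -1 -1 -1 -1 -1 -1 -1 -1")
print(name, labels)
print([disfa_intensity_to_label(level) for level in "0ABCDE"])
```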
2) Network parameter configuration
The training parameters are configured as follows: the initial learning rate is set to 0.001 and decays stepwise, with the configuration parameter gamma set to 0.1 and stepsize set to 15000; the momentum is set to 0.9; the maximum number of iterations is 50000; and each training input consists of 32 pictures learned as one sequence.
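The resulting learning rate schedule can be sketched as below, assuming that the stepwise decay corresponds to Caffe's "step" learning rate policy, under which the rate is base_lr multiplied by gamma raised to floor(iteration / stepsize). The function name is hypothetical.

```python
def step_learning_rate(iteration, base_lr=0.001, gamma=0.1, stepsize=15000):
    """Learning rate at a given iteration under a step decay policy."""
    return base_lr * gamma ** (iteration // stepsize)


# Print the rate at each decay boundary within the 50000 training iterations
for it in (0, 15000, 30000, 45000):
    print(it, step_learning_rate(it))
```

With these values the rate drops by a factor of ten three times before training stops at iteration 50000.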
3) Training code
Using the Caffe deep learning framework, the multi-label input layer and the multi-label sigmoid cross-entropy loss layer from the publicly released DRML code are added, the framework is compiled on Ubuntu, the network parameters are configured, and training is carried out.
3. Testing of the facial action unit detection model
Testing the AU detection model first selects a threshold and then runs the test.
1) Threshold selection
The second part of the data set is used as tuning data (the first part is used for training) to adjust the threshold that decides whether an AU is present. The AU detection results on the tuning data are computed under different thresholds, and the threshold that gives the best detection result is taken as the tuned threshold parameter.
2) Testing
The third part of the data set is used as test data (the first part is used for training and the second part for tuning). Using the optimal threshold selected during tuning, the test is run to determine whether each AU is present.
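The threshold selection step can be sketched as follows. The description only says that the threshold with the best detection result is chosen; using the F1 score as the criterion, the candidate list and the function names are assumptions made for illustration.

```python
def f1_at_threshold(probs, truths, threshold):
    """F1 of one AU when 'present' is predicted for prob >= threshold.

    truths use +1 for present and -1 for absent; other values are skipped.
    """
    tp = sum(1 for p, t in zip(probs, truths) if p >= threshold and t == 1)
    fp = sum(1 for p, t in zip(probs, truths) if p >= threshold and t == -1)
    fn = sum(1 for p, t in zip(probs, truths) if p < threshold and t == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def select_threshold(probs, truths, candidates):
    """Pick the candidate threshold with the highest F1 on the tuning data."""
    return max(candidates, key=lambda th: f1_at_threshold(probs, truths, th))


tuning_probs = [0.9, 0.8, 0.4, 0.3, 0.2]
tuning_truth = [1, 1, 1, -1, -1]
best = select_threshold(tuning_probs, tuning_truth, [0.1, 0.25, 0.35, 0.5, 0.75])
print(best)  # 0.35
```

The selected threshold is then held fixed and applied to the third part of the data when testing.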
Embodiment 2
To test the performance of the invention, this embodiment evaluates the detection method of the invention on three public AU detection data sets. In the experiments the detection method is abbreviated as CCT (cross-concat and temporal network). To better show the effect of the cross-concat block, the network obtained by removing the LSTM from CCT is abbreviated as CC.
AU detection algorithms are generally evaluated with two metrics, the F1 score and AUC. The F1 score is the harmonic mean of precision and recall, and AUC is the area under the ROC curve. For both metrics, a higher value indicates better detection.
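Both metrics can be sketched in a few lines. The function names are hypothetical, and the AUC is computed here through its equivalent pairwise-ranking form (the share of positive/negative pairs in which the positive sample scores higher) rather than by integrating an ROC curve.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def auc(scores, truths):
    """Area under the ROC curve via pairwise ranking; ties count as half.

    truths use +1 for positive samples and -1 for negative samples.
    """
    pos = [s for s, t in zip(scores, truths) if t == 1]
    neg = [s for s, t in zip(scores, truths) if t == -1]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(f1_score(0.5, 1.0))
print(auc([0.9, 0.8, 0.4, 0.3], [1, -1, 1, -1]))  # 0.75
```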
Tables 2, 3 and 4 show the comparative experimental results on these three data sets.
Table 2. Algorithm comparison on the BP4D data set (bold in brackets indicates the best result, bold indicates the second best)
Table 3. Algorithm comparison on the DISFA data set (bold in brackets indicates the best result, bold indicates the second best)
Table 4. F1 score comparison on the GFT data set (bold indicates the best result)
The algorithm of the invention (CCT) is compared with the CC, DRML, R-T1, JPML, CNN+LSTM, APL, EAC and CPM algorithms. (The data in the tables are taken from the papers in which these algorithms were published. Some entries are missing, for example the R-T1 algorithm has no AUC result, because those algorithms did not publish the corresponding test results. On the GFT data set the other algorithms report only F1 and not AUC, so only F1 results are given and no AUC comparison is made.)
The experimental results show that the algorithm of the invention improves AU detection. The improvement is especially evident for AUs such as AU1 and AU2, which have comparatively few samples.
The preferred embodiments of the present invention have been described in detail above. It should be understood that those of ordinary skill in the art can make many modifications and variations according to the concept of the present invention without creative effort. Therefore, any technical solution that those skilled in the art can obtain on the basis of the prior art through logical analysis, reasoning or limited experimentation under the concept of the present invention shall fall within the scope of protection determined by the claims.
Claims (8)
1. A facial action unit detection method based on physical characteristics and distribution characteristics, characterized in that the method processes a sequence of pictures with a pre-trained facial action unit detection model to obtain a facial action unit detection result, the facial action unit detection model comprising a cross-concat network and a long short-term memory network connected in sequence.
2. The facial action unit detection method based on physical characteristics and distribution characteristics according to claim 1, characterized in that the cross-concat network is a VGG network based on cross-concat blocks.
3. The facial action unit detection method based on physical characteristics and distribution characteristics according to claim 2, characterized in that each cross-concat block comprises a first convolutional layer and a first activation function connected in sequence, and the output of the first activation function is concatenated with the input of the first convolutional layer.
4. The facial action unit detection method based on physical characteristics and distribution characteristics according to claim 3, characterized in that the first activation function includes normalization.
5. The facial action unit detection method based on physical characteristics and distribution characteristics according to claim 3 or 4, characterized in that a second convolutional layer is connected after the concatenation of the output of the first activation function and the input of the first convolutional layer, and the kernel size of the second convolutional layer is 1 × 1.
6. The facial action unit detection method based on physical characteristics and distribution characteristics according to claim 1, characterized in that the sequence of pictures consists of pictures of consecutive frames.
7. The facial action unit detection method based on physical characteristics and distribution characteristics according to claim 6, characterized in that there are 32 pictures.
8. The facial action unit detection method based on physical characteristics and distribution characteristics according to claim 1, characterized in that the loss function used in pre-training the facial action unit detection model is a multi-label sigmoid cross-entropy function.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910620049.XA CN110458021A (en) | 2019-07-10 | 2019-07-10 | A kind of face moving cell detection method based on physical characteristic and distribution character |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110458021A true CN110458021A (en) | 2019-11-15 |
Family
ID=68482630
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910620049.XA Pending CN110458021A (en) | 2019-07-10 | 2019-07-10 | A kind of face moving cell detection method based on physical characteristic and distribution character |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110458021A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113486867A (en) * | 2021-09-07 | 2021-10-08 | 北京世纪好未来教育科技有限公司 | Face micro-expression recognition method and device, electronic equipment and storage medium |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1731418A (en) * | 2005-08-19 | 2006-02-08 | 清华大学 | Method of robust accurate eye positioning in complicated background image |
CN105512624A (en) * | 2015-12-01 | 2016-04-20 | 天津中科智能识别产业技术研究院有限公司 | Smile face recognition method and device for human face image |
CN106683087A (en) * | 2016-12-26 | 2017-05-17 | 华南理工大学 | Coated tongue constitution distinguishing method based on depth neural network |
CN108304788A (en) * | 2018-01-18 | 2018-07-20 | 陕西炬云信息科技有限公司 | Face identification method based on deep neural network |
CN109508660A (en) * | 2018-10-31 | 2019-03-22 | 上海交通大学 | A kind of AU detection method based on video |
Non-Patent Citations (2)
Title |
---|
胡巧平等: "CCT: A Cross-Concat and Temporal Neural Network for Multi-Label Action Unit Detection", 《2018 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO (ICME)》 * |
龚震霆等: "卷积神经网络在脑脊液图像分类上的应用", 《计算机工程与设计》 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113486867A (en) * | 2021-09-07 | 2021-10-08 | 北京世纪好未来教育科技有限公司 | Face micro-expression recognition method and device, electronic equipment and storage medium |
CN113486867B (en) * | 2021-09-07 | 2021-12-14 | 北京世纪好未来教育科技有限公司 | Face micro-expression recognition method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
CB02 | Change of applicant information |
Address after: 200030 Dongchuan Road, Minhang District, Minhang District, Shanghai Applicant after: SHANGHAI JIAO TONG University Address before: 200030 Huashan Road, Shanghai, No. 1954, No. Applicant before: SHANGHAI JIAO TONG University |
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20191115 |