CN110458021A - Facial action unit detection method based on physical characteristics and distribution characteristics - Google Patents

Info

Publication number
CN110458021A
CN110458021A CN201910620049.XA CN201910620049A CN110458021A CN 110458021 A CN110458021 A CN 110458021A CN 201910620049 A CN201910620049 A CN 201910620049A CN 110458021 A CN110458021 A CN 110458021A
Authority
CN
China
Prior art keywords
action unit
facial action unit
unit detection
physical characteristic
detection method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910620049.XA
Other languages
Chinese (zh)
Inventor
胡巧平
申瑞民
姜飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN201910620049.XA
Publication of CN110458021A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/41 Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174 Facial expression recognition
    • G06V40/176 Dynamic expression

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Multimedia (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to a facial action unit detection method based on physical characteristics and distribution characteristics. The method processes a sequence of pictures with a pre-trained facial action unit detection model to obtain facial action unit detection results, where the detection model comprises a cross-concat network and a long short-term memory (LSTM) network connected in sequence. Compared with the prior art, the present invention is the first to consider and address the imbalance in data distribution between different facial action units, further improving detection performance.

Description

Facial action unit detection method based on physical characteristics and distribution characteristics
Technical Field
The present invention relates to the field of computer technology, and more particularly to a facial action unit detection method based on physical characteristics and distribution characteristics.
Background
Facial expression analysis is an important area of artificial intelligence, and the detection of facial action units (AUs) is central to it. A facial expression is produced by movements of the facial muscles; the Facial Action Coding System calls the movement of one or more muscles a facial action unit. Almost every facial expression can be described by one AU or a combination of AUs. For example, a smile can be represented by the combination of cheek raising (AU6) and lip-corner raising (AU12), as shown in Fig. 3.
Facial action unit detection determines, from a picture or video, which AUs appear on a person's face. For example, when a person smiles, cheek raising (AU6) and lip-corner raising (AU12) are likely to appear.
In facial expression analysis, researchers generally divide facial expressions (emotions) into seven classes: happiness, sadness, surprise, fear, anger, disgust and contempt, known as the universal facial expressions. Each of them can be represented by a combination of AUs; the correspondence between the seven expressions and the AUs is shown in Table 1. Therefore, once the AUs have been detected, the universal facial expression can be read directly from Table 1.
Table 1. Correspondence between universal expressions and AUs
Emotion      Action units
Happiness    6+12
Sadness      1+4+15
Surprise     1+2+5B+26
Fear         1+2+4+5+7+20+26
Anger        4+5+7+23
Disgust      9+15+16
Contempt     R12A+R14A
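The lookup from detected AUs to a universal expression can be sketched as a simple set comparison. The following is a minimal illustration, not part of the patent: the names `UNIVERSAL_EXPRESSIONS` and `matching_expressions` are hypothetical, and qualifiers such as "5B" or "R12A" are treated as opaque strings copied from Table 1.

```python
# AU combinations copied from Table 1. Qualified codes such as "5B" or
# "R12A" are kept as plain strings and must match exactly.
UNIVERSAL_EXPRESSIONS = {
    "Happiness": {"6", "12"},
    "Sadness": {"1", "4", "15"},
    "Surprise": {"1", "2", "5B", "26"},
    "Fear": {"1", "2", "4", "5", "7", "20", "26"},
    "Anger": {"4", "5", "7", "23"},
    "Disgust": {"9", "15", "16"},
    "Contempt": {"R12A", "R14A"},
}


def matching_expressions(detected_aus):
    """Return the expressions whose full AU combination was detected."""
    detected = set(detected_aus)
    return [name for name, aus in UNIVERSAL_EXPRESSIONS.items() if aus <= detected]
```

For instance, `matching_expressions(["6", "12"])` yields `["Happiness"]`, matching the smile example in the text.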
In addition, in practical applications the seven universal expressions are not sufficient for facial expression analysis. Examples include detecting expressions of pain in hospitals and detecting puzzled expressions of students in teaching activities. These expressions cannot be expressed with the seven universal expressions and must be described through detailed muscle movements, that is, AUs. AU detection is therefore very important for facial expression analysis.
An AU detection algorithm extracts facial features and classifies them to determine which AUs appear on a face. Existing AU detection algorithms essentially focus on designing experiments around three major physical characteristics of AUs. 1) Temporal characteristics: for video data, an AU is a continuous movement, so timing information is very important for AU detection. For example, if a person is smiling at second 1, he is very likely still smiling at second 1.5. 2) Correlation between AUs: certain AUs often appear together, while others are mutually exclusive and usually do not appear together. The most typical example is the correspondence between universal expressions and AUs: since universal expressions occur frequently in daily life, their AU combinations are combinations that often appear together. 3) Regional characteristics: by the definition and origin of AUs, whether a particular AU appears depends only on one region of the face. For example, AU12 is related only to the region around the mouth corners and not to other parts of the face such as the eyes or forehead. The regional characteristic is also called the sparse characteristic. AU detection algorithms to date essentially all exploit these physical characteristics with elaborate algorithm designs to improve detection results. The algorithm described herein, besides the physical characteristics of AUs (temporal characteristics and AU correlation), specifically considers the distribution characteristic of AUs (unbalanced distribution), that is, different AUs occur with different probabilities: some AUs appear often in daily life, while others appear very rarely. Taking this into account yields better detection performance.
Basics of AU algorithms: the input of the algorithm is a picture or video, and the output is a judgment of whether each AU is present. For a particular AU, the output has only two states, present or absent, so this is a binary classification problem. Because the algorithm must determine the presence of multiple AUs at once, AU detection is a multi-label binary classification problem: the algorithm must output, for each of several AUs, whether it is present.
The document "Deep region and multi-label learning for facial action unit detection" (Kaili Zhao, Wen-Sheng Chu, Honggang Zhang, in Computer Vision and Pattern Recognition, 2016, pp. 3391-3399) discloses the DRML algorithm. DRML is based on the network structure of AlexNet, a typical deep neural network, and considers two major physical characteristics in AU detection: the regional characteristic and the correlation between AUs.
As shown in Fig. 4, starting from the AlexNet structure, DRML removes all pooling layers except one and adds a region layer designed by its authors to learn the regional characteristics of the face.
Specifically, the input to the DRML network is an aligned face image: a single color face picture 170 pixels wide and 170 pixels high (leftmost in Fig. 4). The picture passes through conv1 (convolutional layer 1), region2 (region layer 2), pool3 (pooling layer 3), conv4 (convolutional layer 4), conv5 (convolutional layer 5), conv6 (convolutional layer 6), conv7 (convolutional layer 7), fc8 (fully connected layer 8), fc9 (fully connected layer 9) and output, 10 layers in total, to produce the output for 12 AUs. The output layer is also a fully connected layer. Apart from the region layer, all layers are available directly in deep learning frameworks.
Fig. 5 is a schematic diagram of the region layer. After the first convolutional layer, 32 feature maps of width and height 160 are obtained. Each 160 × 160 feature map is split into 8 × 8 = 64 patches of 20 × 20 pixels. Each patch passes in turn through a BN layer (batch normalization layer), a ReLU layer (ReLU activation layer) and a conv layer (convolutional layer), and the 64 results are finally stitched together as the output. The region layer thus divides the face uniformly into blocks, learns each block independently, and then merges the per-block results.
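The split-process-merge structure of the region layer can be sketched with NumPy as follows. This is an illustrative sketch under stated assumptions, not the DRML implementation: the function names are hypothetical, and `patch_fn` is a shape-preserving placeholder for the per-patch BN, ReLU and convolution, which in DRML have separate learned weights per patch.

```python
import numpy as np


def split_into_patches(feature_maps, grid=8):
    """Split (C, H, W) feature maps into grid x grid equal patches.

    Returns an array of shape (grid, grid, C, H // grid, W // grid).
    """
    c, h, w = feature_maps.shape
    ph, pw = h // grid, w // grid
    patches = feature_maps.reshape(c, grid, ph, grid, pw)
    return patches.transpose(1, 3, 0, 2, 4)


def merge_patches(patches):
    """Inverse of split_into_patches: stitch patches back to (C, H, W)."""
    grid_h, grid_w, c, ph, pw = patches.shape
    return patches.transpose(2, 0, 3, 1, 4).reshape(c, grid_h * ph, grid_w * pw)


def region_layer(feature_maps, patch_fn, grid=8):
    """Apply patch_fn to every patch independently, then reassemble."""
    patches = split_into_patches(feature_maps, grid)
    out = np.empty_like(patches)  # assumes patch_fn keeps the patch shape
    for i in range(grid):
        for j in range(grid):
            out[i, j] = patch_fn(patches[i, j])
    return merge_patches(out)
```

With 32 feature maps of 160 × 160, `split_into_patches` gives an 8 × 8 grid of patches, each of shape (32, 20, 20), matching the 64 patches described above.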
In addition, to learn the correlation between AUs, DRML is trained with the multi-label sigmoid cross-entropy function:
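The formula itself is not reproduced in this text. A form consistent with the symbol definitions that follow (a positive-sample term and a negative-sample term joined by "+", with entries labelled 0 contributing nothing) would be the one below, where Ŷ denotes the value computed by the algorithm. The 1/N normalization and the indicator notation [·] are assumptions.

```latex
L(Y, \hat{Y}) = -\frac{1}{N} \sum_{n=1}^{N} \sum_{c=1}^{C}
  \Big\{ [Y_{nc} > 0] \, \log \hat{Y}_{nc}
       + [Y_{nc} < 0] \, \log\big(1 - \hat{Y}_{nc}\big) \Big\}
```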
L is loss in formula, and Y is true value (ground truth), i.e. whether this input picture actually these AU deposit In;For the value that algorithm calculates, i.e. the algorithm AU situation that detected picture;N is number of pictures, this is because depth Practising in network is usually that a collection of picture of a batch is sent into e-learning, so N refers to that how many a collection of picture, N are more than or equal to 1 Integer;C is number of tags, i.e., how many kind AU needs to detect, and output (output) layer shown in Fig. 4 has 12 outputs, i.e. C= 12, there are 12 kinds of AU needs to detect whether exist;N is the index of N, and c is the index of C.
The ground truth for each AU can be -1, +1 or 0, where +1 is a positive sample (the AU is present), -1 is a negative sample (the AU is absent), and 0 is invalid. The ground truth of each picture can thus be represented by a 12-dimensional vector, with each dimension representing one AU. For example, [1, 1, 0, 0, -1, -1, -1, -1, -1, -1, -1, -1] means that the AUs of dimensions 1 and 2 appear on the face, the AUs of dimensions 5 to 12 do not, and it is unknown whether the AUs of dimensions 3 and 4 appear. In the multi-label sigmoid cross-entropy formula, the term before the "+" computes the loss for positive samples (ground truth +1) and the term after the "+" computes the loss for negative samples (ground truth -1). When the ground truth of an AU is 0, neither term applies, so that entry is ignored and does not contribute to the loss.
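The masking behaviour described above can be sketched in plain Python. This is a minimal illustration rather than the DRML code: the function name is hypothetical, and averaging over the batch and clamping with `eps` are assumptions added for numerical safety.

```python
import math


def multilabel_sigmoid_cross_entropy(labels, probs, eps=1e-12):
    """Masked multi-label cross-entropy.

    labels: list of per-picture label vectors with entries +1, -1 or 0.
    probs:  list of per-picture sigmoid outputs in (0, 1), same shape.
    Entries labelled 0 (invalid) contribute nothing to the loss.
    """
    total = 0.0
    for label_row, prob_row in zip(labels, probs):
        for y, p in zip(label_row, prob_row):
            if y > 0:        # positive sample: AU present
                total -= math.log(max(p, eps))
            elif y < 0:      # negative sample: AU absent
                total -= math.log(max(1.0 - p, eps))
            # y == 0: invalid label, skipped
    return total / len(labels)  # averaging over the batch is an assumption
```

Changing the predicted probability of an entry whose label is 0 leaves the returned loss unchanged, which is the property the text relies on.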
DRML has the following disadvantages: 1) it is built on AlexNet, whose shallow depth limits learning and leads to poor AU detection; 2) it considers neither timing information nor the unbalanced distribution of different AUs, so it detects AUs with many samples somewhat better but performs very poorly on AUs with few samples.
The document "Action unit detection with region adaptation, multi-labeling learning and optimal temporal fusing" (Wei Li, Farnaz Abtahi, and Zhigang Zhu, in The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017) discloses the R-T1 algorithm. R-T1 is based on the VGG deep network structure and makes full use of the three major physical characteristics of AUs (temporal characteristics, correlation between AUs, regional characteristics): temporal characteristics are handled by an LSTM (long short-term memory) network, the correlation between AUs by multi-label learning, and the regional characteristic by ROI Nets designed by the authors.
As shown in Fig. 6, R-T1 is based on the VGG structure: the part of VGG after convolutional layer conv12 is removed, ROI Nets and an LSTM network are appended, and the results for multiple AUs are output. The loss function used for multi-label learning during training is:
In the formula, l is the ground truth, taking the value 0 or 1: 0 is a negative sample (the AU does not appear) and 1 is a positive sample (the AU appears). P is the probability computed by the algorithm that the AU appears. The authors use the constants 0.05 and 1.05 to prevent the loss from exploding and to bound its range.
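The expression of this loss is not reproduced in this text. The sketch below shows one per-label form that is consistent with the description, in which the constants 0.05 and 1.05 keep the logarithm's argument away from zero. The exact placement of the constants is an assumption, and `clamped_log_loss` is a hypothetical name.

```python
import math


def clamped_log_loss(label, prob):
    """Binary log loss with 0.05 / 1.05 offsets that bound the loss.

    label: 0 (AU absent) or 1 (AU present).
    prob:  predicted probability in [0, 1] that the AU appears.
    The placement of the offsets is an assumption, not taken from the text.
    """
    pos_term = label * math.log((prob + 0.05) / 1.05)
    neg_term = (1 - label) * math.log((1.05 - prob) / 1.05)
    return -(pos_term + neg_term)
```

Under this form the worst-case loss for a single label is log(21), reached when the prediction is exactly wrong, instead of growing without bound as in the plain log loss.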
Specifically, the input to R-T1 is a sequence of 24 pictures of width W and height H. For the current frame, 23 pictures are randomly selected from the part of the video before it, and the pictures are arranged in chronological order to form the sequence; N sequences are input at a time. The image data first passes through layers conv1 to conv12 of VGG, giving 512 feature maps of size 14 × 14. The 20 AU-related landmarks of each picture are then mapped from their positions in the original picture onto the feature maps. On the feature maps, a 3 × 3 region centered on each landmark is cropped, enlarged to 6 × 6 and convolved. The results for all landmark regions are then merged as the output of ROI Nets. Finally, an LSTM network captures timing information and produces the AU output (AU labels).
About ROI Nets: an ROI is a region of interest. For different AUs, ROI Nets selects different regions with reference to the 20 landmarks and extracts AU-related regional features. The landmarks give the positions of the eyes, nose, mouth and so on. Since different AUs involve muscles at different positions, the muscle position of an AU can be roughly determined from the landmark positions, which fixes the center of each region. The landmarks are mapped to the corresponding points on the feature maps; 20 regions centered on these points are extracted from the feature maps, then upscaled and convolved, and the final results are merged.
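The landmark mapping and region cropping can be sketched as follows. This is a simplified illustration, not the R-T1 code: the function names are hypothetical, the proportional coordinate mapping and the border handling are assumptions, nearest-neighbour repetition stands in for the upscaling step, and the per-region convolution is omitted.

```python
import numpy as np


def landmark_to_feature_coords(x, y, image_w, image_h, fmap_size=14):
    """Map a landmark from image pixels to feature-map cell indices."""
    fx = min(int(x * fmap_size / image_w), fmap_size - 1)
    fy = min(int(y * fmap_size / image_h), fmap_size - 1)
    return fx, fy


def crop_and_upscale(feature_maps, fx, fy, crop=3, scale=2):
    """Cut a crop x crop window centred on (fx, fy) and enlarge it.

    feature_maps has shape (C, H, W). The window is shifted inward at the
    borders so it always stays inside the map (an assumption).
    """
    _, h, w = feature_maps.shape
    half = crop // 2
    top = min(max(fy - half, 0), h - crop)
    left = min(max(fx - half, 0), w - crop)
    region = feature_maps[:, top:top + crop, left:left + crop]
    return region.repeat(scale, axis=1).repeat(scale, axis=2)


def extract_roi_features(feature_maps, landmarks, image_w, image_h):
    """Return one (C, 6, 6) region per landmark, stacked on axis 0."""
    regions = []
    for x, y in landmarks:
        fx, fy = landmark_to_feature_coords(x, y, image_w, image_h, feature_maps.shape[-1])
        regions.append(crop_and_upscale(feature_maps, fx, fy))
    return np.stack(regions)
```

For 512 feature maps of 14 × 14 and 20 landmarks, the result has shape (20, 512, 6, 6), one enlarged region per landmark.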
LSTM network: LSTMs are commonly used to process data with timing information. An LSTM consists of multiple LSTM memory units; one such unit is shown in Fig. 7. In the figure, t is the current time step, t-1 is the previous time step, C is the state, h is the output, Sigmoid is the sigmoid activation function, and tanh is the tanh activation function. Through its input gate, forget gate and output gate, the memory unit decides which information to update and which to forget, and outputs the result and the updated state.
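A single step of a standard LSTM memory unit can be written out with NumPy as below. This follows the common textbook formulation and is only a sketch: the stacking order of the gate weights and the names `W`, `U`, `b` are conventions chosen here, not taken from the patent.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_cell_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM step.

    x_t: input at time t, shape (D,)
    h_prev, c_prev: output and state at time t-1, shape (H,)
    W: (4H, D), U: (4H, H), b: (4H,), stacked as [input, forget, cell, output].
    """
    hidden = h_prev.shape[0]
    z = W @ x_t + U @ h_prev + b
    i = sigmoid(z[0 * hidden:1 * hidden])   # input gate
    f = sigmoid(z[1 * hidden:2 * hidden])   # forget gate
    g = np.tanh(z[2 * hidden:3 * hidden])   # candidate state
    o = sigmoid(z[3 * hidden:4 * hidden])   # output gate
    c_t = f * c_prev + i * g                # forget part of the old state, add new information
    h_t = o * np.tanh(c_t)                  # output at time t
    return h_t, c_t
```

Processing a picture sequence amounts to calling this step once per frame, feeding each step's `h_t` and `c_t` into the next.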
R-T1 has the following disadvantages: 1) the input requires not only pictures but also landmark information, so the network needs more input data than one that takes pictures alone; 2) it does not consider the unbalanced distribution of different AUs, so it detects AUs with many samples somewhat better but performs very poorly on AUs with few samples; 3) it does not consider the imbalance between positive and negative samples of the same AU. In typical datasets the number of negative samples (AU absent) far exceeds the number of positive samples (AU present), and this imbalance affects training and lowers AU detection performance.
Summary of the Invention
The object of the present invention is to overcome the above drawbacks of the prior art by providing a facial action unit detection method based on physical characteristics and distribution characteristics.
The object of the present invention can be achieved through the following technical solutions:
A facial action unit detection method based on physical characteristics and distribution characteristics, characterized in that the method processes a sequence of pictures with a pre-trained facial action unit detection model to obtain facial action unit detection results, the detection model comprising a cross-concat network and a long short-term memory network connected in sequence.
Further, the cross-concat network is a VGG network built from cross-concat blocks, that is, it is obtained by replacing all convolutional layers in the VGG network with cross-concat blocks.
Further, each cross-concat block comprises a first convolutional layer and a first activation function connected in sequence. The output of the first activation function is concatenated with the input of the first convolutional layer, which fuses the features of the previous layer with those of the next layer.
Further, the first activation function includes normalization (BN), which automatically weakens unimportant feature variables and helps prevent overfitting.
Further, a second convolutional layer follows the concatenation of the output of the first activation function and the input of the first convolutional layer. Its kernel size is 1 × 1, so that the number of channels stays the same as in the original layer.
During feature extraction, layers computed earlier extract lower-level, finer features, while higher layers computed later extract features that lean toward semantics. Fusing features across layers in this way strengthens the extraction of low-level features, and the finer features make it possible to learn enough information to discriminate AUs from a limited amount of training data. The cross-concat block is therefore inherently insensitive to the distribution of different AUs: even when some AUs have few samples, the algorithm can still extract sufficient features to detect them.
Further, the sequence of pictures consists of consecutive frames. Within such a sequence, adjacent pictures are more strongly correlated in time: the video runs at 25 frames per second, so consecutive frames are only 40 milliseconds apart, and a facial AU can hardly change much within 40 milliseconds. A sequence of consecutive frames therefore carries strong timing information.
Further, the sequence contains 32 pictures; comparative tests show that results are better with 32 pictures per sequence.
Further, the loss function used to pre-train the facial action unit detection model is the multi-label sigmoid cross-entropy function, which performs multi-label learning. The loss function handles invalid labels, so some negative samples can be marked as invalid, that is, their ground truth is changed from -1 to 0 and they no longer take part in the loss calculation. This balances the positive and negative sample distribution of the same AU and lets the model train better.
Compared with the prior art, the present invention has the following advantages:
(1) The detection method of the present invention is the first to consider the imbalance in data distribution between different facial action units. It proposes the cross-concat block structure, which makes the network no longer sensitive to the data distribution of different facial action units; the improvement in detection is especially clear for the facial action units that have less data.
(2) The cross-concat block also includes a 1 × 1 convolutional layer to restore the number of channels, so that the parameter counts of the original VGG convolutional layers are preserved. The detection model can therefore be initialized from a trained VGG-Face model. Because VGG-Face was trained on a large amount of face data, the features it extracts are face-specific; training on this basis greatly speeds up the convergence of the model.
(3) The present invention uses the multi-label sigmoid cross-entropy function as the loss function. The function handles invalid labels, so marking some negative samples as invalid balances the positive and negative sample distribution of the same AU and lets the model train better.
(4) The present invention takes 32 consecutive frames as one input sequence. Adjacent pictures in the sequence are more strongly correlated in time: the video runs at 25 frames per second, so consecutive frames are only 40 milliseconds apart, and a facial AU can hardly change much within 40 milliseconds. The input sequence therefore carries strong timing information, and results are best with 32 pictures per sequence.
Brief Description of the Drawings
Fig. 1 is a structural schematic diagram of the facial action unit detection model of the present invention;
Fig. 2 is a structural schematic diagram of a convolutional layer in the VGG network and of the cross-concat block, where (2a) is a convolutional layer in the VGG network and (2b) is the cross-concat block;
Fig. 3 illustrates the facial action unit combination of a smile: (3a) is a smile, (3b) is the cheek raising (AU6) in the smile, and (3c) is the lip-corner raising (AU12) in the smile;
Fig. 4 is a structural schematic diagram of the DRML algorithm;
Fig. 5 is a structural schematic diagram of the region layer in the DRML algorithm;
Fig. 6 is a structural schematic diagram of the R-T1 algorithm;
Fig. 7 is a structural schematic diagram of an LSTM memory unit.
Detailed Description of the Embodiments
The present invention is described in detail below with reference to the drawings and a specific embodiment. The embodiment is implemented on the basis of the technical solution of the present invention and gives a detailed implementation and specific operating procedure, but the protection scope of the present invention is not limited to the following embodiment.
Embodiment 1
This embodiment provides a facial action unit detection method based on physical characteristics and distribution characteristics. The method performs multi-label classification on a sequence of pictures with a pre-trained facial action unit (AU) detection model to obtain facial action unit detection results. The detection model comprises a cross-concat network and a long short-term memory network connected in sequence.
This embodiment implements the detection model on Ubuntu using the caffe deep learning framework. The model needs no special new layers; it can be built simply by modifying the structure of existing caffe layers, so it is easy to implement. On the public AU detection datasets BP4D, DISFA and GFT its test performance is very good, exceeding that of the other algorithms that currently perform well on these datasets.
Implementing the detection method involves building, training and testing the facial action unit detection model, described in detail below:
1. Building the facial action unit detection model
As shown in Fig. 1, the facial action unit detection model of this embodiment is built on the VGG network. A cross-concat block is designed and substituted for every convolutional layer in VGG, giving the cross-concat network; an LSTM network is connected after the cross-concat network to form the detection model. The input frames pass through the cross-concat network and then the LSTM network, and the output is the set of facial action unit labels (AU labels).
As shown in Fig. 2, in a VGG convolutional layer (conv) the input passes directly through the convolution and then a ReLU activation function to produce the output. A cross-concat block comprises, in sequence, a convolutional layer with a 3 × 3 kernel, an activation function consisting of normalization (BN) and ReLU, a convolutional layer with a 1 × 1 kernel, and a ReLU activation function. The input of the 3 × 3 convolutional layer and the output of the BN + ReLU activation are concatenated (C), which fuses the features of the previous layer with those of the next layer. The 1 × 1 convolutional layer then adjusts the number of channels so that it stays the same as in the original layer.
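The data flow through one cross-concat block can be sketched with NumPy as below. This is a forward-pass illustration under stated assumptions, not the caffe implementation of the embodiment: the function names are hypothetical, the normalization uses per-channel statistics of a single sample without learned scale or shift, and stride 1 with zero padding is assumed.

```python
import numpy as np


def conv2d_same(x, weight):
    """Stride-1 convolution with zero padding.

    x: (C_in, H, W); weight: (C_out, C_in, k, k) with odd k.
    """
    c_out, _, k, _ = weight.shape
    pad = k // 2
    _, h, w = x.shape
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    out = np.zeros((c_out, h, w), dtype=np.float64)
    for dy in range(k):
        for dx in range(k):
            window = xp[:, dy:dy + h, dx:dx + w]
            out += np.einsum("oc,chw->ohw", weight[:, :, dy, dx], window)
    return out


def batch_norm(x, eps=1e-5):
    """Per-channel normalization over spatial positions (simplified BN)."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def relu(x):
    return np.maximum(x, 0.0)


def cross_concat_block(x, w3, w1):
    """x -> conv3x3 -> BN + ReLU -> concat with x -> conv1x1 -> ReLU.

    w3: (C_out, C_in, 3, 3); w1: (C_out, C_in + C_out, 1, 1).
    """
    y = relu(batch_norm(conv2d_same(x, w3)))
    merged = np.concatenate([x, y], axis=0)   # concatenate input and activation on channels
    return relu(conv2d_same(merged, w1))      # 1x1 conv brings channels back to C_out
```

Because the block's output has the same channel count as the 3 × 3 convolution alone, it can stand in for a plain VGG convolutional layer without changing the shapes seen by the layers that follow.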
Generally, during feature extraction, layers computed earlier extract lower-level, finer features, while higher layers computed later extract features that lean toward semantics. Fusing features across layers in this way strengthens the extraction of low-level features, and the finer features make it possible to learn enough information to discriminate AUs from a limited amount of training data. The cross-concat block is therefore inherently insensitive to the distribution of different AUs: even when some AUs have few samples, this facial action unit detection model can still extract sufficient features to detect them.
Model input: as with R-T1, the input of this detection model is a sequence of pictures, and an LSTM network is correspondingly used to learn the timing information of AUs. The difference lies in how the sequence is chosen. R-T1 builds each 24-frame sequence by taking the current frame, randomly selecting 23 pictures from the part of the video before it, and arranging them in chronological order, whereas the detection model of the present invention uses 32 consecutive frames. Within such a sequence, adjacent pictures are more strongly correlated in time: the video runs at 25 frames per second, so consecutive frames are only 40 milliseconds apart, and a facial AU can hardly change much within 40 milliseconds. The picture sequence of the present invention therefore carries strong timing information. As for the number of pictures per sequence, three values (16, 24 and 32) were tested; results were best with 32 pictures per sequence, so 32 was chosen.
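The two ways of choosing a sequence can be contrasted in a short sketch. The function names are hypothetical, and ending the consecutive sequence at the current frame is an assumption; the text only states that 32 consecutive frames are used.

```python
import random


def consecutive_sequence(current_index, length=32):
    """Indices of `length` consecutive frames ending at current_index."""
    start = current_index - length + 1
    if start < 0:
        raise ValueError("not enough preceding frames")
    return list(range(start, current_index + 1))


def random_prior_sequence(current_index, length=24, rng=random):
    """R-T1-style choice: the current frame plus length - 1 randomly
    selected earlier frames, arranged in chronological order."""
    if current_index < length - 1:
        raise ValueError("not enough preceding frames")
    earlier = rng.sample(range(current_index), length - 1)
    return sorted(earlier) + [current_index]


def frame_interval_ms(fps=25):
    """Time between two consecutive frames in milliseconds."""
    return 1000 / fps
```

At 25 frames per second, a sequence of 32 consecutive frames spans 31 intervals of 40 ms, about 1.24 seconds of video, whereas the randomly sampled frames may be spread over the whole preceding part of the video.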
Loss function: the present invention uses the same loss function as DRML, the multi-label sigmoid cross-entropy function, whose expression is:
In the formula, L is the loss; Y is the ground truth, that is, whether each AU is actually present in the input picture; Ŷ is the value computed by the algorithm, that is, the AU status the algorithm detects for the picture; N is the number of pictures: a deep learning network is usually fed pictures in batches, so N is the batch size, an integer greater than or equal to 1; C is the number of labels, that is, how many AUs need to be detected; n indexes the N pictures and c indexes the C labels.
The ground truth of each facial action unit can be -1, +1 or 0, where +1 is a positive sample (the AU is present), -1 is a negative sample (the AU is absent), and 0 is invalid. The ground truth of each picture can thus be represented by a 12-dimensional vector in which each dimension stands for one AU. For example, [1, 1, 0, 0, -1, -1, -1, -1, -1, -1, -1, -1] means that the AUs of dimensions 1 and 2 appear on the face, the AUs of dimensions 5 to 12 do not appear, and it is unknown whether the AUs of dimensions 3 and 4 appear. In the multi-label sigmoid cross-entropy formula, the part before the "+" computes the loss for labels whose ground truth is +1 (positive samples), and the part after the "+" computes the loss for labels whose ground truth is -1 (negative samples). When the ground truth of an AU is 0, it takes part in neither term, so that label is ignored and contributes nothing to the loss.
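The handling of the three label values can be sketched in NumPy as follows. This is an illustrative sketch rather than the Caffe layer used in the patent; the function name, the use of raw logits as input, and the division by the batch size are assumptions.

```python
import numpy as np

def multilabel_sigmoid_cross_entropy(logits, labels):
    """Masked multi-label sigmoid cross-entropy.

    logits: (N, C) raw network outputs; labels: (N, C) with values in {-1, 0, +1}.
    Labels equal to 0 are invalid and add nothing to the loss.
    """
    probs = 1.0 / (1.0 + np.exp(-logits))
    probs = np.clip(probs, 1e-12, 1.0 - 1e-12)       # keep the logarithms finite
    positive_term = (labels > 0) * np.log(probs)      # the part before the "+"
    negative_term = (labels < 0) * np.log(1.0 - probs)  # the part after the "+"
    return -(positive_term + negative_term).sum() / labels.shape[0]

labels = np.array([[1, -1, 0]])
loss_a = multilabel_sigmoid_cross_entropy(np.array([[0.0, 0.0, 0.0]]), labels)
loss_b = multilabel_sigmoid_cross_entropy(np.array([[0.0, 0.0, 5.0]]), labels)
```

Changing the prediction for the third label leaves the loss unchanged because that label is 0, which is the behaviour the paragraph describes.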
The advantage of this loss function is that it handles invalid labels. Some negative samples can therefore be marked as invalid, i.e. their ground truth is changed from -1 to 0 so that they do not take part in the loss calculation. This balances the positive and negative sample distribution of each AU and yields a better-trained model.
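A minimal sketch of this balancing step is given below. The patent does not say which negative samples are marked invalid or what ratio is targeted, so the random selection, the `ratio` parameter and the function name are assumptions.

```python
import numpy as np

def balance_negatives(labels, rng, ratio=1.0):
    """Mark surplus negatives as invalid (0) so each AU keeps about ratio * (#positives) negatives.

    labels: (N, C) array with values in {-1, 0, +1}. A modified copy is returned.
    """
    balanced = labels.copy()
    for c in range(balanced.shape[1]):
        num_pos = int(np.sum(balanced[:, c] == 1))
        neg_idx = np.flatnonzero(balanced[:, c] == -1)
        surplus = len(neg_idx) - int(num_pos * ratio)
        if surplus > 0:
            # Randomly chosen negatives (an assumption) are excluded from the loss.
            drop = rng.choice(neg_idx, size=surplus, replace=False)
            balanced[drop, c] = 0
    return balanced

labels = np.array([1, 1] + [-1] * 8).reshape(-1, 1)
balanced = balance_negatives(labels, np.random.default_rng(0))
```

With 2 positives and 8 negatives for an AU, 6 negatives are set to 0, leaving 2 positives and 2 negatives to contribute to the loss.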
2. Training of the facial action unit detection model
Training of the facial action unit detection model includes the following steps:
1) Training data processing: obtain the pictures and label information used for training
The dataset is divided into three parts, one of which is used as training data. Every 32 consecutive frames of the training data are taken as one input picture sequence. The labels of the training data are written to a label file according to the ground truth provided by the dataset. The label file is a txt file in which each line holds the label of one picture. For example, the line "1.jpg 1 1 0 0 -1 -1 -1 -1 -1 -1 -1 -1" means that, in the picture named 1.jpg, the AUs of dimensions 1 and 2 are present, the AUs of dimensions 5 to 12 are absent, and the data for the AUs of dimensions 3 and 4 are invalid. In the BP4D and GFT datasets, whether each AU is present in a picture is given directly and can be used as is. In the DISFA dataset, the presence of an AU is not given directly; instead the dataset gives the intensity with which each AU appears (one of six levels: 0, A, B, C, D, E). Intensities C, D and E are treated as the AU being present, and intensities 0, A and B as the AU being absent.
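The label-file format and the DISFA intensity rule described above can be sketched as follows. The helper names are hypothetical, and whitespace-separated fields are an assumption based on the example line.

```python
def parse_label_line(line):
    """Parse one line of the label file: a picture name followed by one value per AU."""
    name, *values = line.split()
    return name, [int(v) for v in values]

def disfa_intensity_to_label(intensity):
    """Map a DISFA intensity level (0, A, B, C, D, E) to +1 (present) or -1 (absent)."""
    return 1 if intensity in {"C", "D", "E"} else -1

name, label = parse_label_line("1.jpg 1 1 0 0 -1 -1 -1 -1 -1 -1 -1 -1")
disfa_labels = [disfa_intensity_to_label(level) for level in ["0", "A", "B", "C", "D", "E"]]
```

The parsed vector has one entry per AU, in the same order as the 12-dimensional ground-truth vector used by the loss function.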
2) Network configuration parameters
The training parameters are configured as follows: the initial learning rate is 0.001; the learning rate follows a step-decay policy with gamma set to 0.1 and stepsize set to 15000; the momentum is 0.9; the maximum number of iterations is 50000; and each training input is a sequence of 32 pictures.
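As a worked example of this schedule: under a step-decay policy the learning rate is multiplied by gamma once every stepsize iterations. The sketch below computes the resulting rate with the values listed above; the function name is illustrative.

```python
def step_learning_rate(iteration, base_lr=0.001, gamma=0.1, stepsize=15000):
    """Step decay: base_lr * gamma ** floor(iteration / stepsize)."""
    return base_lr * gamma ** (iteration // stepsize)

# Learning rate at the start of each decay stage within the 50000 iterations.
schedule = {it: step_learning_rate(it) for it in (0, 15000, 30000, 45000)}
```

Over 50000 iterations the rate is therefore reduced three times, going from 0.001 to roughly 0.0001, 0.00001 and 0.000001.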
3) Training code
The Caffe deep learning framework is used. The multi-label input layer and the multi-label sigmoid cross-entropy loss layer from the publicly released DRML code are added, the framework is compiled on Ubuntu, the network parameters are configured, and training is carried out.
3. Testing of the facial action unit detection model
Testing of the facial action unit detection model first selects a threshold and then runs the test.
1) Threshold selection
The second part of the dataset is used as parameter-tuning data (the first part is used for training). The threshold for deciding whether an AU is present is adjusted, the AU detection results on the tuning data are computed under different thresholds, and the threshold that gives the best detection result is chosen as the tuned threshold parameter.
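Threshold selection on the tuning data can be sketched as a simple sweep. The candidate grid, the use of F1 as the selection criterion, the per-AU (one-dimensional) inputs and the function names are all assumptions; the text only states that the threshold with the best detection result is kept.

```python
import numpy as np

def f1_score(pred, truth):
    """F1 from boolean predictions and boolean ground truth."""
    tp = np.sum(pred & truth)
    fp = np.sum(pred & ~truth)
    fn = np.sum(~pred & truth)
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0

def select_threshold(probs, labels, candidates=np.linspace(0.05, 0.95, 19)):
    """Return the candidate threshold with the highest F1 on the tuning data for one AU.

    probs: (N,) predicted probabilities; labels: (N,) values in {-1, 0, +1}.
    Pictures whose label is 0 (invalid) are left out of the evaluation.
    """
    valid = labels != 0
    truth = labels[valid] == 1
    return float(max(candidates, key=lambda t: f1_score(probs[valid] >= t, truth)))

probs = np.array([0.9, 0.78, 0.22, 0.1, 0.6])
labels = np.array([1, 1, -1, -1, 0])
threshold = select_threshold(probs, labels)
```

The selected threshold is then held fixed and applied to the test data, so that the test set plays no part in tuning.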
2) Testing
The third part of the dataset is used as test data (the first part is used for training and the second for tuning). The optimal threshold selected during tuning is used in testing to decide whether each AU is present.
Embodiment 2
To test the performance of the invention, this embodiment evaluates the detection method of the invention on three public AU detection datasets. In the experiments the detection method is referred to as CCT (cross-concat and temporal network). To better show the effect of the cross-concat block, the network obtained by removing the LSTM from CCT is referred to as CC.
Two metrics are commonly used to evaluate AU detection algorithms: F1 score and AUC. The F1 score is the harmonic mean of precision and recall, and AUC is the area under the ROC curve. For both metrics, a higher value indicates better detection.
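For illustration, AUC can be computed without plotting the ROC curve, because it equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below uses that pairwise form; the function name and the exclusion of invalid (0) labels are assumptions.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison of positive and negative scores.

    labels use +1 for positive and -1 for negative; any other value is ignored.
    """
    positives = [s for s, l in zip(scores, labels) if l == 1]
    negatives = [s for s, l in zip(scores, labels) if l == -1]
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

auc = roc_auc([0.9, 0.8, 0.3, 0.1], [1, -1, 1, -1])
```

In this small example three of the four positive-negative pairs are ranked correctly, giving an AUC of 0.75. Unlike F1, AUC does not depend on the chosen threshold.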
Tables 2, 3 and 4 show the comparative experimental results on these three datasets.
Table 2: Algorithm comparison results on the BP4D dataset (bold in brackets marks the best result, bold marks the second best)
Table 3: Algorithm comparison results on the DISFA dataset (bold in brackets marks the best result, bold marks the second best)
Table 4: F1 score comparison results on the GFT dataset (bold marks the best result)
The algorithm of the invention (CCT) is compared with the CC, DRML, R-T1, JPML, CNN+LSTM, APL, EAC and CPM algorithms. (The data in the tables are taken from the papers in which these algorithms were published. Some entries are missing, for example R-T1 reports no AUC result, because those algorithms did not publish the corresponding test results. On the GFT dataset the other algorithms report only F1 and not AUC, so only F1 is compared there.)
The experimental results show that the algorithm of the invention improves AU detection. The improvement is especially marked for AUs such as AU1 and AU2, for which less data is available.
The preferred embodiments of the present invention have been described in detail above. It should be understood that those of ordinary skill in the art can make many modifications and variations according to the concept of the invention without creative effort. Therefore, any technical solution that a person skilled in the art can obtain on the basis of the prior art through logical analysis, reasoning or limited experimentation under the concept of the invention shall fall within the scope of protection determined by the claims.

Claims (8)

1. A facial action unit detection method based on physical characteristics and distribution characteristics, characterized in that the method processes a picture sequence with a pre-trained facial action unit detection model to obtain a facial action unit detection result, the facial action unit detection model comprising a cross-concat network and a long short-term memory network connected in sequence.
2. The facial action unit detection method based on physical characteristics and distribution characteristics according to claim 1, characterized in that the cross-concat network is a VGG network based on cross-concat blocks.
3. The facial action unit detection method based on physical characteristics and distribution characteristics according to claim 2, characterized in that each cross-concat block comprises a first convolutional layer and a first activation function connected in sequence, and the output of the first activation function is concatenated with the input of the first convolutional layer.
4. The facial action unit detection method based on physical characteristics and distribution characteristics according to claim 3, characterized in that the first activation function includes regularization.
5. The facial action unit detection method based on physical characteristics and distribution characteristics according to claim 3 or 4, characterized in that a second convolutional layer is connected after the concatenation of the output of the first activation function and the input of the first convolutional layer, the convolution kernel size of the second convolutional layer being 1 × 1.
6. The facial action unit detection method based on physical characteristics and distribution characteristics according to claim 1, characterized in that the picture sequence consists of consecutive frames.
7. The facial action unit detection method based on physical characteristics and distribution characteristics according to claim 6, characterized in that the picture sequence contains 32 pictures.
8. The facial action unit detection method based on physical characteristics and distribution characteristics according to claim 1, characterized in that the loss function used in pre-training the facial action unit detection model is a multi-label sigmoid cross-entropy function.
CN201910620049.XA 2019-07-10 2019-07-10 A kind of face moving cell detection method based on physical characteristic and distribution character Pending CN110458021A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910620049.XA CN110458021A (en) 2019-07-10 2019-07-10 A kind of face moving cell detection method based on physical characteristic and distribution character

Publications (1)

Publication Number Publication Date
CN110458021A true CN110458021A (en) 2019-11-15

Family

ID=68482630

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910620049.XA Pending CN110458021A (en) 2019-07-10 2019-07-10 A kind of face moving cell detection method based on physical characteristic and distribution character

Country Status (1)

Country Link
CN (1) CN110458021A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1731418A (en) * 2005-08-19 2006-02-08 清华大学 Method of robust accurate eye positioning in complicated background image
CN105512624A (en) * 2015-12-01 2016-04-20 天津中科智能识别产业技术研究院有限公司 Smile face recognition method and device for human face image
CN106683087A (en) * 2016-12-26 2017-05-17 华南理工大学 Coated tongue constitution distinguishing method based on depth neural network
CN108304788A (en) * 2018-01-18 2018-07-20 陕西炬云信息科技有限公司 Face identification method based on deep neural network
CN109508660A (en) * 2018-10-31 2019-03-22 上海交通大学 A kind of AU detection method based on video

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
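胡巧平 et al.: "CCT: A Cross-Concat and Temporal Neural Network for Multi-Label Action Unit Detection", 2018 IEEE International Conference on Multimedia and Expo (ICME) *
龚震霆 et al.: "Application of convolutional neural networks to cerebrospinal fluid image classification" (卷积神经网络在脑脊液图像分类上的应用), 《计算机工程与设计》 (Computer Engineering and Design) *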

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113486867A (en) * 2021-09-07 2021-10-08 北京世纪好未来教育科技有限公司 Face micro-expression recognition method and device, electronic equipment and storage medium
CN113486867B (en) * 2021-09-07 2021-12-14 北京世纪好未来教育科技有限公司 Face micro-expression recognition method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
CB02 Change of applicant information

Address after: 200030 Dongchuan Road, Minhang District, Minhang District, Shanghai

Applicant after: SHANGHAI JIAO TONG University

Address before: 200030 Huashan Road, Shanghai, No. 1954, No.

Applicant before: SHANGHAI JIAO TONG University

SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20191115