CN108229268A - Expression recognition and convolutional neural network model training method, apparatus and electronic device - Google Patents
Expression recognition and convolutional neural network model training method, apparatus and electronic device
- Publication number
- CN108229268A CN108229268A CN201611268009.6A CN201611268009A CN108229268A CN 108229268 A CN108229268 A CN 108229268A CN 201611268009 A CN201611268009 A CN 201611268009A CN 108229268 A CN108229268 A CN 108229268A
- Authority
- CN
- China
- Prior art keywords
- expression
- roi
- facial image
- convolutional neural
- neural networks
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
- G06V40/175—Static expression
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/20—Image preprocessing
- G06V10/25—Determination of region of interest [ROI] or a volume of interest [VOI]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G06V40/171—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/172—Classification, e.g. identification
Abstract
Embodiments of the present invention provide an expression recognition method, a convolutional neural network model training method, corresponding apparatuses, and an electronic device. The method includes: performing facial expression feature extraction on a face image to be detected through the convolutional layer portion of a convolutional neural network model and according to the face key points in the acquired face image to be detected, to obtain a facial expression feature map; determining the ROI corresponding to each face key point in the facial expression feature map; performing pooling on each determined ROI through the pooling layer portion of the convolutional neural network model to obtain pooled ROI feature maps; and obtaining the expression recognition result of the face image according to at least the ROI feature maps. Embodiments of the present invention can effectively capture subtle expression changes, better handle the differences introduced by different facial poses, and make full use of the detailed information of changes in multiple facial regions, thereby recognizing faces with subtle expression changes and different poses more accurately.
Description
Technical field
Embodiments of the present invention relate to the field of artificial intelligence, and in particular to an expression recognition method, apparatus and electronic device, and to a convolutional neural network model training method, apparatus and electronic device.
Background technology
Facial expression recognition technology assigns an expression category to a given face image, for example: angry, disgusted, happy, sad, frightened, surprised, and so on. At present, facial expression recognition is gradually showing broad application prospects in fields such as human-computer interaction, clinical diagnosis, distance education, and investigation and interrogation, and is a popular research direction in computer vision and artificial intelligence.
One existing facial expression recognition technology is based on a conventional machine learning framework. Expression recognition under such a framework typically includes four basic steps: face detection, facial feature extraction, feature dimensionality reduction, and classification according to the features. However: first, facial feature extraction requires hand-engineered features designed with specialized domain knowledge; second, compared with deep features (feature maps), classical hand-crafted features such as Gabor filters and SIFT are weaker in their level of abstraction and expressive power; third, conventional machine learning methods have difficulty exploiting ever-growing amounts of training data, training takes a long time, and the training process is fragmented and complicated.
As a result, existing expression recognition is costly and its recognition accuracy is relatively low.
Summary of the invention
Embodiments of the present invention provide expression recognition technical solutions.
According to a first aspect of the embodiments of the present invention, an expression recognition method is provided, including: performing facial expression feature extraction on a face image to be detected through the convolutional layer portion of a convolutional neural network model and according to the face key points in the acquired face image to be detected, to obtain a facial expression feature map; determining the region of interest (ROI) corresponding to each face key point in the facial expression feature map; performing pooling on each determined ROI through the pooling layer portion of the convolutional neural network model to obtain pooled ROI feature maps; and obtaining the expression recognition result of the face image according to at least the ROI feature maps.
Optionally, the face image includes a static face image.
Optionally, the face image includes a face image in a video frame sequence.
Optionally, obtaining the expression recognition result of the face image according to at least the ROI feature maps includes: obtaining a preliminary expression recognition result of the face image of the current frame according to the ROI feature maps of the face image of the current frame; and obtaining the expression recognition result of the face image of the current frame according to the preliminary expression recognition result of the current frame and the expression recognition result of the face image of at least one prior frame.
Optionally, obtaining the expression recognition result of the face image of the current frame according to the preliminary expression recognition result of the current frame and the expression recognition result of the face image of at least one prior frame includes: performing weighted processing on the preliminary facial expression recognition result of the face image of the current frame and the facial expression recognition result of the face image of the at least one prior frame to obtain the expression recognition result of the face image of the current frame, where the weight of the preliminary expression recognition result of the face image of the current frame is greater than the weight of the expression recognition result of the face image of any prior frame.
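The weighted processing described above can be sketched as follows. This is a minimal illustration only: the particular weight split (0.6 for the current frame, the remainder shared evenly among prior frames) and the six-category score vectors are assumptions of the sketch, not values fixed by this disclosure.

```python
import numpy as np

def fuse_expression_scores(current_scores, prior_scores_list, current_weight=0.6):
    """Weighted fusion: the current frame's preliminary result carries a
    weight larger than that given to any single prior frame's result."""
    current_scores = np.asarray(current_scores, dtype=float)
    priors = [np.asarray(s, dtype=float) for s in prior_scores_list]
    if not priors:
        return current_scores
    prior_weight = (1.0 - current_weight) / len(priors)  # shared evenly
    fused = current_weight * current_scores
    for s in priors:
        fused = fused + prior_weight * s
    return fused

# Hypothetical six-category score vectors (e.g. angry ... surprised).
current = [0.05, 0.05, 0.70, 0.10, 0.05, 0.05]
priors = [[0.10, 0.10, 0.50, 0.10, 0.10, 0.10]]
fused = fuse_expression_scores(current, priors)
print(int(np.argmax(fused)))  # -> 2
```

Because the current frame dominates, the fused category follows the current frame while prior frames smooth out single-frame jitter.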
Optionally, before obtaining the expression recognition result of the face image of the current frame according to the preliminary expression recognition result of the current frame and the expression recognition result of the face image of at least one prior frame, the method further includes: determining that the position of the current frame in the video frame sequence is greater than or equal to a set position threshold.
Optionally, the method further includes: in response to the position of the current frame in the video frame sequence being less than the set position threshold, outputting the facial expression recognition result of the face image of the current frame, and/or saving the facial expression recognition result of the face image of the current frame.
Optionally, performing facial expression feature extraction on the face image to be detected through the convolutional layer portion of the convolutional neural network model and according to the face key points in the acquired face image to be detected, to obtain the facial expression feature map, includes: performing face key point detection on the face image to be detected to obtain the face key points in the face image; and performing facial expression feature extraction on the face image through the convolutional layer portion of the convolutional neural network model according to the face key points, to obtain the facial expression feature map.
Optionally, before performing facial expression feature extraction on the face image to be detected through the convolutional layer portion of the convolutional neural network model and according to the face key points in the acquired face image to be detected, the method further includes: obtaining sample images for training, and training the convolutional neural network model using the sample images, where the sample images include information on face key points and annotation information on facial expressions.
Optionally, obtaining the sample images for training and training the convolutional neural network model using the sample images includes: obtaining the sample images for training; performing facial expression feature extraction on the sample images through the convolutional layer portion of the convolutional neural network model to obtain facial expression feature maps; determining the ROI corresponding to each face key point in the facial expression feature maps; performing pooling on each determined ROI through the pooling layer portion of the convolutional neural network model to obtain pooled ROI feature maps; and adjusting the network parameters of the convolutional neural network model according to at least the ROI feature maps.
Optionally, determining the region of interest (ROI) corresponding to each face key point in the facial expression feature map includes: determining, in the facial expression feature map, the position corresponding to each face key point according to the coordinates of that key point; and, taking each determined position as a reference point, obtaining a region of a corresponding set extent and determining each obtained region as the corresponding ROI.
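The two steps above (locating, on the feature map, the position corresponding to each key point's coordinates, then taking a region of set extent around it as the ROI) can be sketched as follows. The feature-map stride, the half-window size, and the clipping at the feature-map borders are assumptions of this sketch, not values specified by the disclosure.

```python
import numpy as np

def keypoint_rois(feature_map, keypoints_xy, stride=4, half=3):
    """For each face key point, map its image coordinates onto the feature
    map (dividing by the assumed cumulative convolution stride) and take a
    square region of set extent around that position, clipped at borders."""
    h, w = feature_map.shape
    rois = []
    for x, y in keypoints_xy:
        cx, cy = int(round(x / stride)), int(round(y / stride))
        x0, x1 = max(cx - half, 0), min(cx + half + 1, w)
        y0, y1 = max(cy - half, 0), min(cy + half + 1, h)
        rois.append(feature_map[y0:y1, x0:x1])
    return rois

fmap = np.arange(32 * 32, dtype=float).reshape(32, 32)   # toy feature map
rois = keypoint_rois(fmap, [(48.0, 60.0), (2.0, 2.0)])
print([r.shape for r in rois])  # -> [(7, 7), (4, 4)]
```

Key points near the image border yield smaller clipped ROIs, which is one reason a subsequent pooling step to a set output size is useful.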
Optionally, performing pooling on each determined ROI through the pooling layer portion of the convolutional neural network model to obtain the pooled ROI feature maps includes: performing pooling on each determined ROI through the pooling layer portion of the convolutional neural network model to obtain pooled ROI feature maps of a set size; and adjusting the network parameters of the convolutional neural network model according to at least the ROI feature maps includes: feeding the ROI feature maps of the set size into a loss layer to obtain the expression classification result error of performing expression classification on the sample image; and adjusting the network parameters of the convolutional neural network model according to the expression classification result error.
Optionally, feeding the ROI feature maps of the set size into the loss layer to obtain the expression classification result error of performing expression classification on the sample image includes: feeding the ROI feature maps of the set size into the loss layer, and computing and outputting the expression classification result error through the logistic regression loss function of the loss layer.
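As a sketch of the loss-layer computation named above, the following computes a multi-class logistic regression (softmax cross-entropy) loss over a set number of expression categories. The raw per-category scores fed in are assumed to come from a linear scoring step over the pooled, fixed-size ROI features; that intermediate step is an assumption of this sketch, not spelled out by the disclosure.

```python
import numpy as np

def softmax_cross_entropy(logits, label):
    """Multi-class logistic-regression loss over expression categories.

    logits: raw per-category scores; label: ground-truth expression
    category index from the annotation information."""
    logits = np.asarray(logits, dtype=float)
    shifted = logits - logits.max()            # for numerical stability
    probs = np.exp(shifted) / np.exp(shifted).sum()
    return -float(np.log(probs[label])), probs

# Hypothetical scores for 4 expression categories; ground truth is index 0.
loss, probs = softmax_cross_entropy([2.0, 0.5, 0.1, -1.0], label=0)
print(probs.argmax(), loss > 0.0)  # -> 0 True
```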
Optionally, the logistic regression loss function is a logistic regression loss function with a set number of expression categories.
Optionally, the facial expression sample images for training are sample images of a video frame sequence.
Optionally, before obtaining the sample images for training and the information on the corresponding face key points, the method further includes: detecting the sample images for training to obtain the information on the face key points.
According to a second aspect of the embodiments of the present invention, an expression recognition apparatus is provided, including: a first determining module, configured to perform facial expression feature extraction on a face image to be detected through the convolutional layer portion of a convolutional neural network model and according to the face key points in the acquired face image to be detected, to obtain a facial expression feature map; a second determining module, configured to determine the region of interest (ROI) corresponding to each face key point in the facial expression feature map; a third determining module, configured to perform pooling on each determined ROI through the pooling layer portion of the convolutional neural network model to obtain pooled ROI feature maps; and a fourth determining module, configured to obtain the expression recognition result of the face image according to at least the ROI feature maps.
Optionally, the face image includes a static face image.
Optionally, the face image includes a face image in a video frame sequence.
Optionally, the third determining module includes: a first obtaining submodule, configured to obtain a preliminary expression recognition result of the face image of the current frame according to the ROI feature maps of the face image of the current frame; and a second obtaining submodule, configured to obtain the expression recognition result of the face image of the current frame according to the preliminary expression recognition result of the current frame and the expression recognition result of the face image of at least one prior frame.
Optionally, the second obtaining submodule is configured to perform weighted processing on the preliminary facial expression recognition result of the face image of the current frame and the facial expression recognition result of the face image of the at least one prior frame to obtain the expression recognition result of the face image of the current frame, where the weight of the preliminary expression recognition result of the face image of the current frame is greater than the weight of the expression recognition result of the face image of any prior frame.
Optionally, the apparatus further includes: a fifth determining module, configured to determine that the position of the current frame in the video frame sequence is greater than or equal to a set position threshold.
Optionally, the apparatus further includes: a response module, configured to, in response to the position of the current frame in the video frame sequence being less than the set position threshold, output the facial expression recognition result of the face image of the current frame, and/or save the facial expression recognition result of the face image of the current frame.
Optionally, the first determining module is configured to perform face key point detection on the face image to be detected to obtain the face key points in the face image, and to perform facial expression feature extraction on the face image through the convolutional layer portion of the convolutional neural network model according to the face key points, to obtain the facial expression feature map.
Optionally, the first determining module is configured to perform face key point extraction on the face image to be detected through the convolutional layer portion of the convolutional neural network model, and to perform facial expression feature extraction on the face image to be detected according to the extracted face key points, to obtain the facial expression feature map.
Optionally, the apparatus further includes: a training module, configured to obtain sample images for training and train the convolutional neural network model using the sample images, where the sample images include information on face key points and annotation information on facial expressions.
Optionally, the training module includes: a first submodule, configured to obtain the sample images for training and perform facial expression feature extraction on the sample images through the convolutional layer portion of the convolutional neural network model to obtain facial expression feature maps; a second submodule, configured to determine the region of interest (ROI) corresponding to each face key point in the facial expression feature maps; a third submodule, configured to perform pooling on each determined ROI through the pooling layer portion of the convolutional neural network model to obtain pooled ROI feature maps; and a fourth submodule, configured to adjust the network parameters of the convolutional neural network model according to at least the ROI feature maps.
Optionally, the second submodule is configured to determine, in the facial expression feature map, the position corresponding to each face key point according to the coordinates of that key point, and, taking each determined position as a reference point, to obtain a region of a corresponding set extent and determine each obtained region as the corresponding ROI.
Optionally, the third submodule is configured to perform pooling on each determined ROI through the pooling layer portion of the convolutional neural network model to obtain pooled ROI feature maps of a set size; and the fourth submodule is configured to feed the ROI feature maps of the set size into a loss layer to obtain the expression classification result error of performing expression classification on the sample image, and to adjust the network parameters of the convolutional neural network model according to the expression classification result error.
Optionally, the fourth submodule is configured to feed the ROI feature maps of the set size into the loss layer, and to compute and output the expression classification result error through the logistic regression loss function of the loss layer.
Optionally, the logistic regression loss function is a logistic regression loss function with a set number of expression categories.
Optionally, the facial expression sample images for training are sample images of a video frame sequence.
Optionally, the apparatus further includes: a sixth determining module, configured to detect the sample images for training to obtain the information on the face key points.
According to a third aspect of the embodiments of the present invention, an electronic device is provided, including: a processor, a memory, a communication element and a communication bus, where the processor, the memory and the communication element communicate with one another through the communication bus; and the memory is configured to store at least one executable instruction, the executable instruction causing the processor to perform any expression recognition method of the first aspect.
According to a fourth aspect of the embodiments of the present invention, a computer-readable storage medium is provided, which stores: an executable instruction for performing facial expression feature extraction on a face image to be detected through the convolutional layer portion of a convolutional neural network model and according to the face key points in the acquired face image to be detected, to obtain a facial expression feature map; an executable instruction for determining the region of interest (ROI) corresponding to each face key point in the facial expression feature map; an executable instruction for performing pooling on each determined ROI through the pooling layer portion of the convolutional neural network model to obtain pooled ROI feature maps; and an executable instruction for obtaining the expression recognition result of the face image according to at least the ROI feature maps.
According to a fifth aspect of the embodiments of the present invention, a convolutional neural network model training method is provided, including: obtaining sample images for training and information on the corresponding face key points, where the sample images include annotation information on facial expressions; performing facial expression feature extraction on the sample images through the convolutional layer portion of a convolutional neural network model to obtain facial expression feature maps; determining the region of interest (ROI) corresponding to each face key point in the facial expression feature maps; performing pooling on each determined ROI through the pooling layer portion of the convolutional neural network model to obtain pooled ROI feature maps; and adjusting the network parameters of the convolutional neural network model according to at least the ROI feature maps.
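The final step of the training method, adjusting the network parameters according to the classification error, is ordinarily implemented with gradient descent via backpropagation (cf. the G06N3/084 classification above). A minimal runnable illustration on a toy linear classifier, with random features standing in for the pooled ROI features, might look like this; the dimensions, learning rate, and two-category labels are assumptions of the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins: 64 pooled-ROI feature vectors (8-dim) with binary expression labels.
features = rng.normal(size=(64, 8))
labels = (features[:, 0] > 0).astype(int)

W = np.zeros((8, 2))   # the "network parameters" being adjusted
b = np.zeros(2)

def forward(X):
    logits = X @ W + b
    logits = logits - logits.max(axis=1, keepdims=True)  # stability
    e = np.exp(logits)
    return e / e.sum(axis=1, keepdims=True)

def mean_loss(X, y):
    return float(-np.log(forward(X)[np.arange(len(y)), y]).mean())

before = mean_loss(features, labels)
for _ in range(200):                                 # gradient-descent updates
    probs = forward(features)
    probs[np.arange(len(labels)), labels] -= 1.0     # dLoss/dLogits
    grad = probs / len(labels)
    W -= 0.5 * (features.T @ grad)                   # adjust parameters
    b -= 0.5 * grad.sum(axis=0)
after = mean_loss(features, labels)
print(after < before)  # -> True
```

Each update moves the parameters against the gradient of the classification error, which is the sense in which the error "adjusts the network parameters" in the method above.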
Optionally, determining the region of interest (ROI) corresponding to each face key point in the facial expression feature map includes: determining, in the facial expression feature map, the position corresponding to each face key point according to the coordinates of that key point; and, taking each determined position as a reference point, obtaining a region of a corresponding set extent and determining each obtained region as the corresponding ROI.
Optionally, performing pooling on each determined ROI through the pooling layer portion of the convolutional neural network model to obtain the pooled ROI feature maps includes: performing pooling on each determined ROI through the pooling layer portion of the convolutional neural network model to obtain pooled ROI feature maps of a set size; and adjusting the network parameters of the convolutional neural network model according to at least the ROI feature maps includes: feeding the ROI feature maps of the set size into a loss layer to obtain the expression classification result error of performing expression classification on the sample image; and adjusting the network parameters of the convolutional neural network model according to the expression classification result error.
Optionally, feeding the ROI feature maps of the set size into the loss layer to obtain the expression classification result error of performing expression classification on the sample image includes: feeding the ROI feature maps of the set size into the loss layer, and computing and outputting the expression classification result error through the logistic regression loss function of the loss layer.
Optionally, the logistic regression loss function is a logistic regression loss function with a set number of expression categories.
Optionally, the facial expression sample images for training are sample images of a video frame sequence.
Optionally, before obtaining the sample images for training and the information on the corresponding face key points, the method further includes: detecting the sample images for training to obtain the information on the face key points.
According to a sixth aspect of the embodiments of the present invention, a convolutional neural network model training apparatus is provided, including: a first obtaining module, configured to obtain sample images for training and information on the corresponding face key points, where the sample images include annotation information on facial expressions; a second obtaining module, configured to perform facial expression feature extraction on the sample images through the convolutional layer portion of a convolutional neural network model to obtain facial expression feature maps; a third obtaining module, configured to determine the region of interest (ROI) corresponding to each face key point in the facial expression feature maps; a fourth obtaining module, configured to perform pooling on each determined ROI through the pooling layer portion of the convolutional neural network model to obtain pooled ROI feature maps; and a fifth obtaining module, configured to adjust the network parameters of the convolutional neural network model according to at least the ROI feature maps.
Optionally, the third obtaining module is configured to determine, in the facial expression feature map, the position corresponding to each face key point according to the coordinates of that key point, and, taking each determined position as a reference point, to obtain a region of a corresponding set extent and determine each obtained region as the corresponding ROI.
Optionally, the fourth obtaining module is configured to perform pooling on each determined ROI through the pooling layer portion of the convolutional neural network model to obtain pooled ROI feature maps of a set size; and the fifth obtaining module includes: a first obtaining submodule, configured to feed the ROI feature maps of the set size into a loss layer to obtain the expression classification result error of performing expression classification on the sample image; and an adjusting submodule, configured to adjust the network parameters of the convolutional neural network model according to the expression classification result error.
Optionally, the first obtaining submodule is configured to feed the ROI feature maps of the set size into the loss layer, and to compute and output the expression classification result error through the logistic regression loss function of the loss layer.
Optionally, the logistic regression loss function is a logistic regression loss function with a set number of expression categories.
Optionally, the facial expression sample images for training are sample images of a video frame sequence.
Optionally, the apparatus further includes: a sixth obtaining module, configured to detect the sample images for training to obtain the information on the face key points.
According to a seventh aspect of the embodiments of the present invention, an electronic device is provided, including: a processor, a memory, a communication element and a communication bus, where the processor, the memory and the communication element communicate with one another through the communication bus; and the memory is configured to store at least one executable instruction, the executable instruction causing the processor to perform any convolutional neural network model training method of the fifth aspect.
According to an eighth aspect of the embodiments of the present invention, a computer-readable storage medium is provided, which stores: an executable instruction for obtaining sample images for training and information on the corresponding face key points, where the sample images include annotation information on facial expressions; an executable instruction for performing facial expression feature extraction on the sample images through the convolutional layer portion of a convolutional neural network model to obtain facial expression feature maps; an executable instruction for determining the region of interest (ROI) corresponding to each face key point in the facial expression feature maps; an executable instruction for performing pooling on each determined ROI through the pooling layer portion of the convolutional neural network model to obtain pooled ROI feature maps; and an executable instruction for adjusting the network parameters of the convolutional neural network model according to at least the ROI feature maps.
According to the technical solutions provided by the embodiments of the present invention, after facial expression feature extraction is performed according to the face key points and a facial expression feature map is obtained, the ROI (Region of Interest) corresponding to each face key point is determined from the facial expression feature map according to the face key points; after each ROI is processed by the ROI pooling layer, ROI feature maps are obtained; then, the facial expression is determined according to the ROI feature maps. By selecting the regions corresponding to the face key points as ROIs, subtle expression changes can be effectively captured while the differences introduced by different facial poses are better handled, the detailed information of changes in multiple facial regions is fully used, and faces with subtle expression changes and different poses are recognized more accurately.
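The ROI pooling step summarized above reduces each ROI, whatever its size, to a feature map of a set size so that downstream layers see fixed-shape input. A minimal sketch in the spirit of Fast R-CNN-style ROI max pooling follows; the 3x3 bin grid and the binning scheme are assumptions of the sketch.

```python
import numpy as np

def roi_max_pool(roi, out_h=3, out_w=3):
    """Max-pool an arbitrarily sized ROI down to a set (out_h, out_w) size
    by dividing it into a grid of bins and taking the max within each bin."""
    h, w = roi.shape
    ys = np.linspace(0, h, out_h + 1).astype(int)
    xs = np.linspace(0, w, out_w + 1).astype(int)
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = roi[ys[i]:ys[i + 1], xs[j]:xs[j + 1]].max()
    return out

# ROIs of different sizes all come out 3x3, ready for downstream layers.
a = roi_max_pool(np.arange(49.0).reshape(7, 7))
b = roi_max_pool(np.arange(20.0).reshape(4, 5))
print(a.shape, b.shape)  # -> (3, 3) (3, 3)
```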
Description of the drawings
Fig. 1 is a flowchart of the steps of an expression recognition method according to Embodiment 1 of the present invention;
Fig. 2 is a flowchart of the steps of an expression recognition method according to Embodiment 2 of the present invention;
Fig. 3 is a flowchart of the steps of an expression recognition method according to Embodiment 3 of the present invention;
Fig. 4 is a flowchart of the steps of a convolutional neural network model training method according to Embodiment 4 of the present invention;
Fig. 5 is a structural block diagram of an expression recognition apparatus according to Embodiment 5 of the present invention;
Fig. 6 is a structural block diagram of a convolutional neural network model training apparatus according to Embodiment 6 of the present invention;
Fig. 7 is a structural block diagram of an electronic device according to Embodiment 7 of the present invention;
Fig. 8 is a structural block diagram of an electronic device according to Embodiment 8 of the present invention.
Specific embodiment
Specific implementations of the embodiments of the present invention are described in further detail below with reference to the accompanying drawings (in which identical reference numerals denote identical elements) and the embodiments. The following embodiments are intended to illustrate the present invention, not to limit its scope.
Those skilled in the art will understand that terms such as "first" and "second" in the embodiments of the present invention are used only to distinguish different steps, devices or modules, and denote neither any particular technical meaning nor any necessary logical order between them.
Embodiment 1
Referring to Fig. 1, a flow chart of the steps of an expression recognition method according to Embodiment 1 of the present invention is shown.
The expression recognition method of the present embodiment includes the following steps:
Step S102: Perform facial expression feature extraction on a face image to be detected through the convolutional layer portion of a convolutional neural network model and the face key points in the acquired face image to be detected, obtaining a facial expression feature map.
A trained convolutional neural network model has a facial expression recognition function and includes at least an input layer portion, a convolutional layer portion, a pooling layer portion, a fully connected layer portion, and so on. The input layer portion is used to input images; the convolutional layer portion performs feature extraction; the pooling layer portion performs pooling on the processing results of the convolutional layer portion, for example down-sampling the feature maps obtained by the convolutional layer portion; the fully connected layer portion can be used for classification, etc.
In this embodiment, facial expression feature extraction is performed by the convolutional layer portion of the convolutional neural network model to obtain a facial expression feature map. As for acquiring the face key points, in one feasible manner they can be obtained by performing face key point detection on the face image to be detected before it is input into the convolutional neural network model; in another feasible manner, they can be extracted by the convolutional layer portion of the convolutional neural network model, that is, the convolutional layer portion first extracts the face key points in the face image to be detected and then performs further facial expression feature extraction based on the extracted face key points to obtain the facial expression feature map; in yet another feasible manner, they can be obtained by manually annotating the face key points on the face image to be detected before it is input into the convolutional neural network model.
Step S104: Determine the ROI corresponding to each face key point in the facial expression feature map.
The facial expression feature map output by the convolutional layer portion contains the processing result for the whole image, which involves a large amount of data; if facial expression recognition were performed directly on this basis, a large amount of data would have to be processed and the system processing load would be heavy. For this reason, in the solution of the embodiment of the present invention, the ROI (Region Of Interest) corresponding to each face key point is first determined according to the face key points. For example, when the ROI corresponding to each face key point is determined according to the information of the face key points, the corresponding position in the facial expression feature map is determined according to the coordinates of the face key point; a region of a set range centered on the determined position is then obtained, and the obtained region is determined as the ROI.
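As a minimal illustrative sketch (not the patent's own code), the mapping from an image-space key point to a fixed-size ROI on the feature map might look like the following; the stride value and the 3×3 region size are assumptions of this sketch:

```python
def keypoint_to_roi(x, y, stride=8, half=1):
    """Map an image-space key point (x, y) onto the feature map
    (dividing by the assumed cumulative stride of the convolutional
    layers) and return a region centered on it, as inclusive
    (x1, y1, x2, y2) coordinates; half=1 gives a 3x3 ROI."""
    fx, fy = x // stride, y // stride   # position on the feature map
    return (fx - half, fy - half, fx + half, fy + half)

# A key point at (100, 60) on the input maps to (12, 7) on the
# feature map, giving the ROI (11, 6, 13, 8).
print(keypoint_to_roi(100, 60))
```

In practice the stride depends on how many stride-2 convolutions and pooling layers precede the feature map, and the ROI boundary would be clipped to the feature map extent.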
Step S106: Perform pooling on each determined ROI through the pooling layer portion of the convolutional neural network model to obtain pooled ROI feature maps.
Here, the pooling includes but is not limited to down-sampling.
Step S108: Obtain the expression recognition result of the face image according to at least the ROI feature maps.
The ROI feature maps contain the feature information of the facial expression, and the expression recognition result for the facial expression in the face image to be detected can be obtained according to the ROI feature maps.
With this embodiment, after facial expression feature extraction is performed according to the face key points to obtain a facial expression feature map, the ROI corresponding to each face key point is determined from the facial expression feature map according to the face key points; after each ROI is processed by the ROI pooling layer, an ROI feature map is obtained; the facial expression is then determined according to the ROI feature maps. By selecting the regions corresponding to the face key points as ROIs, subtle expression changes can be captured effectively, the differences introduced by different facial poses can be handled better, and the detail information of changes in multiple facial regions is fully exploited, so that faces with subtle expression changes and different poses are recognized more accurately.
Embodiment 2
Referring to Fig. 2, a flow chart of the steps of an expression recognition method according to Embodiment 2 of the present invention is shown.
In this embodiment, a convolutional neural network model with a facial expression recognition function is first trained, and facial expression recognition is then performed on images based on that model. However, those skilled in the art should understand that, in actual use, a convolutional neural network model trained by a third party may also be used to perform facial expression recognition.
The expression recognition method of the present embodiment includes the following steps:
Step S202: Obtain sample images for training, and train the convolutional neural network model using the sample images.
The sample images may be static images, or sample images from a video frame sequence. The sample images contain the information of the face key points and the annotation information of the facial expressions. In this embodiment, the information of the face key points is obtained by performing detection on the training sample images.
In one feasible manner of implementing this step, the sample images for training are obtained; facial expression feature extraction is performed on the sample images through the convolutional layer portion of the convolutional neural network model to obtain facial expression feature maps; the ROI corresponding to each face key point in the facial expression feature maps is determined; pooling is performed on each determined ROI through the pooling layer portion of the convolutional neural network model to obtain pooled ROI feature maps; and the network parameters of the convolutional neural network model are adjusted according to at least the ROI feature maps.
In one feasible manner, determining the ROI corresponding to each face key point in the facial expression feature map includes: determining, in the facial expression feature map, the corresponding position according to the coordinates of each face key point; taking each determined position as a reference point, obtaining a region of a corresponding set range; and determining each obtained region as the corresponding ROI.
Pooling each determined ROI through the pooling layer portion of the convolutional neural network model yields pooled ROI feature maps of a set size. When the network parameters of the convolutional neural network model are adjusted according to the ROI feature maps, the ROI feature maps of the set size can be fed into a loss layer to obtain the expression classification result error of performing expression classification on the sample images; the network parameters of the convolutional neural network model are then adjusted according to the expression classification result error. The adjusted network parameters include but are not limited to the weight parameters (weight), the bias parameters (bias), and so on.
The expression classification result error can be obtained by feeding the ROIs of the set size into the loss layer, where it is calculated and output by the logistic regression loss function of the loss layer. The logistic regression loss function may be a logistic regression loss function with a set number of expression classes.
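A logistic regression (softmax cross-entropy) loss with a set number of expression classes can be sketched in plain Python as follows; this is an illustrative reimplementation with made-up scores, not the patent's actual loss layer:

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities."""
    m = max(scores)                        # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def logistic_regression_loss(scores, label):
    """Cross-entropy of the softmax output against the annotated class."""
    probs = softmax(scores)
    return -math.log(probs[label])

# Ten expression classes; the annotated class here is index 4.
# A higher score on the correct class gives a lower loss.
scores = [0.1, 0.0, 0.2, 0.1, 2.5, 0.0, 0.1, 0.3, 0.0, 0.2]
loss = logistic_regression_loss(scores, label=4)
print(round(loss, 4))
```

During training this loss would be back-propagated to update the network parameters; at inference time only the softmax probabilities are needed.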
Through the above process, the training of the convolutional neural network model with an expression recognition function is realized; expression detection on faces can then be performed based on the trained convolutional neural network model.
Step S204: Obtain a face image to be detected.
The face image to be detected may be a static face image, or a face image from a video frame sequence.
Step S206: Perform face key point detection on the face image to be detected to obtain the face key points in the face image.
In this embodiment, face image detection is performed first to obtain the face key points. However, as described above, if the convolutional neural network model has a face key point detection function, the face image to be detected can be input directly into the convolutional neural network model, and the convolutional layer portion of the convolutional neural network model extracts the face key points from the face image to be detected; facial expression feature extraction is then performed on the face image to be detected according to the extracted face key points to obtain the facial expression feature map.
Step S208: Perform facial expression feature extraction on the face image to be detected through the convolutional layer portion of the convolutional neural network model and the face key points in the acquired face image to be detected, obtaining a facial expression feature map.
Step S210: Determine the ROI corresponding to each face key point in the facial expression feature map.
Step S212: Perform pooling on each determined ROI through the pooling layer portion of the convolutional neural network model to obtain a pooled ROI feature map corresponding to each ROI.
Step S214: Obtain the expression recognition result of the face image according to at least the ROI feature maps.
After the ROI feature maps are obtained, expression recognition can be performed according to them.
In a preferred implementation, when the convolutional neural network model is used to detect facial expression images in a continuous video frame sequence, taking the current frame as the reference, the convolutional neural network model can first be used to detect the current frame in the video frame sequence, and a preliminary expression recognition result of the face image of the current frame is obtained according to the ROI feature maps of the face image of the current frame; the expression recognition result of the face image of the current frame is then obtained according to the preliminary expression recognition result of the current frame and the expression recognition result of the face image of at least one preceding frame. For example, after the preliminary facial expression recognition result of the current frame is obtained, it can further be judged whether the position of the current frame in the video frame sequence is greater than or equal to a set position threshold. If not, that is, the position of the current frame in the video frame sequence is less than the set position threshold, the preliminary facial expression recognition result of the current frame is output as the final facial expression recognition result of the face image of the current frame, and/or the facial expression recognition result of the face image of the current frame is saved. If so, the facial expression recognition results of a set number of video frames preceding the current frame are obtained, and linear weighting is performed on the preliminary expression recognition result of the face image of the current frame and the obtained facial expression recognition results of the face images of the at least one preceding frame, obtaining the expression recognition result of the face image of the current frame. The at least one preceding frame may be one or more continuous frames before the current frame, or one or more discontinuous frames before the current frame. Through the above process, the expression recognition result of the current frame can be determined according to the detection results of multiple continuous frames, avoiding the error of single-frame detection and making the detection result more accurate.
When linear weighting is performed on the facial expression recognition result of the current frame and the obtained facial expression recognition result of the at least one preceding frame, weights can be set respectively for the preliminary facial expression recognition result of the current frame and the obtained facial expression recognition results of the preceding frames; when the weights are set, the weight of the preliminary facial expression recognition result of the current frame is greater than the weight of the facial expression recognition result of any of the obtained preceding frames. Then, according to the set weights, linear weighting is performed on the preliminary facial expression recognition result of the current video frame and the obtained facial expression recognition results of the preceding frames. Because expression recognition is performed mainly for the current video frame, a heavier weight is set for the detection result of the current video frame; while the detection results of the related video frames serve as reference, this effectively ensures that the current video frame remains the detection target.
It should be noted that, in the above process, the set position threshold, the set number of video frames preceding the current frame and the set weights can all be set appropriately by those skilled in the art according to actual conditions. Preferably, the set number of video frames is 3.
With this embodiment, a convolutional neural network model that can accurately recognize facial expressions is used, so that subtle expression changes of the face can be captured and expression recognition is more accurate and faster. Moreover, for a continuous video frame sequence, fusing the detection results of multiple continuous frames effectively avoids the error of single-frame detection and further improves the accuracy of expression detection.
Embodiment 3
Referring to Fig. 3, a flow chart of the steps of an expression recognition method according to Embodiment 3 of the present invention is shown.
This embodiment illustrates the expression recognition method of the embodiment of the present invention in the form of a specific example. The expression recognition method of this embodiment includes both a convolutional neural network model training part and a part that performs expression recognition using the trained convolutional neural network model.
The expression recognition method of the present embodiment includes the following steps:
Step S302: Collect facial expression images and annotate the expressions, forming a set of sample images to be trained.
For example, ten kinds of expressions are annotated manually, namely: angry, calm, confused, disgusted, happy, sad, scared, surprised, squinting and screaming.
Step S304: Detect the face and its key points in each sample image using a face detection algorithm, and align the face using the key points.
In this step, a conventional face detection algorithm can be used to detect the face and its key points in each sample image, for example 21 face key points covering the eyes, mouth and so on; the face is then aligned using the 21 face key points.
Step S306: Train the CNN model using the expression-annotated sample images and the face key points.
In this embodiment, a brief example structure of a CNN model is as follows:
// Part 1
1. Data input layer
// Part 2
2. <= 1 Convolutional layer 1_1 (3x3x4/2)
3. <= 2 Nonlinear response ReLU layer
4. <= 3 Pooling layer // ordinary pooling layer
5. <= 4 Convolutional layer 1_2 (3x3x6/2)
6. <= 5 Nonlinear response ReLU layer
7. <= 6 Pooling layer
8. <= 7 Convolutional layer 1_3 (3x3x6)
9. <= 8 Nonlinear response ReLU layer
10. <= 9 Pooling layer
11. <= 10 Convolutional layer 2_1 (3x3x12/2)
12. <= 11 Nonlinear response ReLU layer
13. <= 12 Pooling layer
14. <= 13 Convolutional layer 2_2 (3x3x12)
15. <= 14 Nonlinear response ReLU layer
16. <= 15 Pooling layer
17. <= 16 Nonlinear response ReLU layer
18. <= 17 Convolutional layer 5_4 (3x3x16)
// Part 3
19. <= 18 ROI Pooling layer // performs ROI pooling
20. <= 19 Fully connected layer
21. <= 20 Loss layer
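As a rough illustration of how the spatial resolution shrinks through such a stack, the output-size arithmetic for a few of the listed blocks can be sketched as follows. Note the assumptions: "kxkxc/2" is read as kernel k, c output channels, stride 2 (stride 1 where no "/2" is given), and the 2x2 stride-2 pooling and padding of 1 are choices of this sketch, since the listing does not state them:

```python
def conv_out(size, kernel=3, stride=1, pad=1):
    """Spatial output size of a convolution on a square input."""
    return (size + 2 * pad - kernel) // stride + 1

def pool_out(size, kernel=2, stride=2):
    """Spatial output size of an ordinary 2x2 stride-2 pooling layer."""
    return (size - kernel) // stride + 1

size = 128  # hypothetical input resolution
for name, stride in [("conv1_1", 2), ("conv1_2", 2), ("conv1_3", 1)]:
    size = conv_out(size, stride=stride)
    size = pool_out(size)  # each conv block above is followed by pooling
    print(name, "->", size)
# Under these assumptions: conv1_1 -> 32, conv1_2 -> 8, conv1_3 -> 4
```

The same formulas explain why stride-2 convolutions enlarge the receptive field of upper-layer features without adding computation, as noted later in this embodiment.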
In the above CNN model structure, the expression-annotated sample images and the face key points are input into the CNN model through the input layer of the first part for training; they are then processed by the conventional convolutional layer portion of the second part; based on the processing result of the second part, ROI feature maps are obtained according to the face key points, and the obtained ROI feature maps are input into the ROI Pooling layer for ROI pooling, obtaining pooled ROI feature maps; the pooled ROI feature maps are then input into the fully connected layer and the loss layer in turn; how to adjust the network parameters of the CNN model is determined according to the processing result of the loss layer, and the CNN model is thus trained.
When the ROI feature maps are obtained according to the face key points based on the processing result of the second part, for the ROIs corresponding to the 21 face key points, the coordinates of the 21 key points can first be mapped back onto the feature map output by the last convolutional layer of the CNN model (the 32nd layer in this embodiment); that is, the 21 face key points detected on the original sample image are mapped onto the feature map output by the 32nd layer, and 21 small regions centered on these key points (for example 3×3 regions, or irregular regions) are extracted from the feature map. The feature maps of these 21 regions are then used as the input of the ROI Pooling layer to obtain the ROI feature maps; the ROI feature maps are input into the fully connected layer, followed by a ten-class logistic regression loss function layer (e.g., a SoftmaxWithLoss layer) that calculates the error between the result and the annotated facial expression, and the error is back-propagated so as to update the parameters of the CNN model (including the parameters of the fully connected layer). This cycle is repeated until the error no longer decreases and the CNN model converges, obtaining the trained model.
Because the 21 ROI regions cover all the positions relevant to facial expression without redundancy, the CNN model can focus more on learning these regions well and can more easily capture the subtle changes of the facial muscles; the ROI Pooling layer yields fixed-length ROI feature representations, so that the same network structure can be used even when ROI regions of different sizes are input; these fixed-length feature representations are then input into the fully connected layer and the loss layer in turn, obtaining the final expression classification result.
The ROI Pooling layer is a pooling layer for the ROI feature maps. For example, if the coordinates of an ROI region are (x1, y1, x2, y2), the input size is (y2-y1) × (x2-x1); if the output size of the ROI Pooling layer is pooled_height × pooled_width, then the output of each grid cell is the pooling result of a region of size [(y2-y1)/pooled_height] × [(x2-x1)/pooled_width].
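A simplified, pure-Python sketch of this grid computation (max pooling per bin; the real layer operates per channel on the feature maps and handles fractional bin boundaries, both omitted here):

```python
def roi_pool(feature, roi, pooled_h=2, pooled_w=2):
    """Max-pool the ROI (x1, y1, x2, y2) of a 2-D feature map into a
    fixed pooled_h x pooled_w grid, regardless of the ROI's input size."""
    x1, y1, x2, y2 = roi
    h, w = y2 - y1, x2 - x1
    out = []
    for i in range(pooled_h):
        row = []
        for j in range(pooled_w):
            # Each output cell covers roughly (h / pooled_h) x (w / pooled_w)
            # input cells, matching the grid formula in the text above.
            ys, ye = y1 + i * h // pooled_h, y1 + (i + 1) * h // pooled_h
            xs, xe = x1 + j * w // pooled_w, x1 + (j + 1) * w // pooled_w
            row.append(max(feature[y][x]
                           for y in range(ys, max(ye, ys + 1))
                           for x in range(xs, max(xe, xs + 1))))
        out.append(row)
    return out

feature = [[r * 10 + c for c in range(6)] for r in range(6)]  # 6x6 toy map
print(roi_pool(feature, (1, 1, 5, 5)))  # a 4x4 ROI pooled to 2x2
```

Because the output grid is fixed at pooled_h × pooled_w, ROIs of different sizes all produce feature representations of the same length, which is what allows a single fully connected layer to follow.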
Furthermore, it should be noted that, in the description of the above convolutional network structure, "2. <= 1" indicates that the current layer is the second layer and its input is the first layer; the bracket after a convolutional layer gives the convolutional layer parameters, for example (3x3x16) indicates that the convolution kernel size is 3x3 and the number of channels is 16. The rest can be deduced by analogy and is not repeated here.
In the above convolutional network structure, there is one nonlinear response unit (ReLU) after each convolutional layer. Preferably, the ReLU may be a PReLU (Parametric Rectified Linear Unit), to effectively improve the detection accuracy of the CNN model.
In addition, setting the convolution kernels of the convolutional layers to 3x3 allows local information to be integrated better; setting the stride of the convolutional layers allows upper-layer features to obtain a larger field of view without increasing the amount of computation.
However, it should be clear to those skilled in the art that the above convolution kernel sizes, numbers of channels and numbers of convolutional layers are exemplary; in practical applications, those skilled in the art can adapt them according to actual needs, and the embodiment of the present invention is not limited in this respect. In addition, all combinations of layers and parameters in the convolutional network model of this embodiment are optional and can be combined arbitrarily.
Step S308: Perform expression recognition on the aligned facial expression images through the trained CNN model, and obtain the recognition results.
The difference from CNN model training is that, when the trained CNN model is used for expression recognition, the fully connected layer of the CNN model is followed by a ten-class logistic regression layer rather than a logistic regression loss function layer, so as to obtain the recognition result directly.
For a single image, expression recognition can be performed directly by the CNN model trained as described above.
For a video frame sequence, each frame can be recognized as a single image. However, in order to improve the accuracy of expression recognition for a video frame sequence, multiple frames can be fused. For example, let t=1 be the first frame of the video; when t>=3, that is, when the position of the current frame is at or after the third frame, the current frame and the two frames before it are recognized at the same time, obtaining the recognition results of these three frames. If the three input frames are denoted as Xt-2, Xt-1 and Xt, and their recognition results as Yt-2, Yt-1 and Yt, the recognition results of these three frames are linearly weighted; if the weight of the current frame is 0.5 and the weights of the two frames before it are both 0.25, then the final prediction result is Y = 0.25 × Yt-2 + 0.25 × Yt-1 + 0.5 × Yt. When t<3, that is, when the current frame position is before the third frame, Y = Yt. It should be clear to those skilled in the art that the above weights for each frame are merely exemplary; in practical applications, those skilled in the art can set the weight for each frame appropriately according to actual needs, with the weight of the current frame greater than those of the other frames.
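The linear weighting above can be sketched as follows; the per-class probability vectors are made up for illustration, while the weights 0.25/0.25/0.5 are the example weights from the text:

```python
def fuse(results, weights):
    """Linearly weight per-class recognition results of consecutive frames.
    results: per-class probability vectors [Yt-2, Yt-1, Yt];
    weights: one weight per frame, heaviest on the current frame."""
    n_classes = len(results[0])
    return [sum(w * r[c] for w, r in zip(weights, results))
            for c in range(n_classes)]

def predict(frame_results, t, weights=(0.25, 0.25, 0.5)):
    """Y = Yt for t < 3; otherwise the weighted sum over the last 3 frames."""
    if t < 3:                        # before the third frame: no fusion
        return frame_results[t - 1]
    return fuse(frame_results[t - 3:t], weights)

# Three frames of made-up 3-class results (e.g. angry / calm / happy).
frames = [[0.2, 0.7, 0.1], [0.1, 0.6, 0.3], [0.1, 0.2, 0.7]]
print(predict(frames, t=3))
```

Because the weights sum to 1, the fused result remains a probability distribution, and the argmax over it gives the final expression class for the current frame.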
With this embodiment, by selecting the regions corresponding to the 21 key points of the face as ROIs, the detail information of changes in multiple facial regions is fully exploited and subtle expression changes of the face can be captured, making the recognition more accurate and fast; by fusing multiple frames, the solution of this embodiment can be effectively applied to video-based expression recognition.
The expression recognition method of this embodiment can be performed by any suitable device with data processing capability, including but not limited to: mobile terminals, PCs, servers, vehicle-mounted devices, advertising machines, face check-in devices, etc.
Embodiment 4
Referring to Fig. 4, a flow chart of the steps of a convolutional neural network model training method according to Embodiment 4 of the present invention is shown.
The convolutional neural network model training method of this embodiment includes the following steps:
Step S402: Obtain sample images for training and the information of the corresponding face key points.
The sample images contain the annotation information of the facial expressions; the sample images can be annotated in advance before CNN training is performed. In this embodiment, the corresponding facial expression information is annotated in the sample images, so that whether the CNN training results are accurate can subsequently be determined according to the annotation information.
In addition, for each sample image, the information of the corresponding face key points also needs to be obtained. Therefore, in practical applications, before this step, the method can further include: detecting the sample images for training to obtain the information of the face key points. To improve the training effect, after the information of the face key points is obtained, face alignment can also be performed according to these key points, and the face-aligned sample images are input into the CNN for sample training. Face alignment can improve the training effect of the sample images.
In the above process, detecting the face key points in the images and aligning the faces can be realized by those skilled in the art using any suitable relevant means. Detection of the face key points in the images can be realized by, but is not limited to: a CNN with a face key point locating function; the ASM (Active Shape Model) method; or G-EBGM (elastic bunch graph matching based on Gabor features), etc. Face alignment can be realized by, but is not limited to: AAM (Active Appearance Model), CLM (Constrained Local Models), etc.
In addition, conventional key points, such as 68 face key points, can be used as the face key points, but the key points are not limited thereto. In the embodiment of the present invention, 21 face key points can be used, which respectively include: 3 key points for each eyebrow (inner end, outer end and peak), 3 key points for each eye (inner corner, outer corner and pupil center), 4 key points for the nose region (the outermost points of the nostril wings on both sides, the nose tip and the lowest point of the nose), and 5 key points for the mouth region (the two mouth corners, the depression point of the upper lip, the depression point of the lower lip, and the middle point of the contact line between the lower lip and the upper lip). On the one hand, these 21 key points represent the key positions of the face and can efficiently characterize the facial features; on the other hand, the training of the convolutional neural network model of the embodiment of the present invention can be accomplished with these 21 key points, reducing the data volume and training cost.
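A tiny sanity check of the key point layout just described, with one count per facial region:

```python
# Key point groups as described above: counts per facial region
# (both sides listed separately for eyebrows and eyes).
keypoint_groups = {
    "left eyebrow": 3, "right eyebrow": 3,  # inner end, outer end, peak
    "left eye": 3, "right eye": 3,          # inner corner, outer corner, pupil center
    "nose": 4,                              # two nostril-wing points, tip, lowest point
    "mouth": 5,                             # corners, lip depression points, mid contact point
}
total = sum(keypoint_groups.values())
print(total)  # 21 key points in all
```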
Step S404: Perform facial expression feature extraction on the sample images through the convolutional layer portion of the CNN model, obtaining facial expression feature maps.
In this embodiment, the conventional portion of the CNN model can adopt the convolutional structure of conventional CNN models; the processing of the sample images can follow the processing of the convolutional layer portion of relevant CNN models and is not described again here. After the processing of the convolutional layer portion, the corresponding facial expression feature map is obtained (a given processing result of the convolutional layer portion can be understood as the output of the CNN model in a given training pass).
Step S406: Determine, in the facial expression feature map, the ROI corresponding to each face key point.
The facial expression feature map output by the convolutional layers contains the processing result for the whole image. If this result were used directly for subsequent expression training, on the one hand, the amount of data to be processed would be large; on the other hand, the training could not be targeted at facial expressions, making the training results inaccurate.
For this reason, in the solution of the embodiment of the present invention, when the ROI corresponding to each face key point is determined according to the information of the face key points, the corresponding position in the facial expression feature map is determined according to the coordinates of each face key point; taking each determined position as a reference point (for example as a center point; in practical applications, a small deviation of the center point is allowed), a region of a set range is obtained, and each obtained region is determined as the corresponding ROI. Taking 21 face key points as an example, when the ROIs are determined, the 21 face key points can first be mapped, according to their coordinates, back onto the facial expression feature map output by the last convolutional layer of the CNN; then, centered on each key point on the facial expression feature map, a region of a certain range (the extraction range is generally 3×3 to 7×7, preferably 3×3) is extracted, and the feature maps of these 21 regions are used as the input of the ROI Pooling layer. These 21 regions cover all the positions relevant to facial expression without redundancy, so that the network can focus more on learning these regions well and can more easily capture the subtle changes of the facial muscles.
Step S408: Perform pooling on each determined ROI through the pooling layer portion of the CNN, obtaining pooled ROI feature maps.
In CNN models, pooling layers often follow convolutional layers; pooling reduces the feature vectors output by the convolutional layers while improving the results, making the results less prone to over-fitting. For different images, the size and stride of the pooling window can be computed dynamically according to the size of the image, so as to obtain pooling results of the same size.
In the embodiment of the present invention, the ROIs are input into the pooling layer; after the ROI pooling of the pooling layer, fixed-length feature representations of the ROIs, that is, ROI feature maps of a unified size, can be obtained.
Step S410: Adjust the network parameters of the CNN model according to at least the pooled ROI feature maps.
In one feasible manner, the ROI feature maps of a set size obtained after ROI pooling can be input into the fully connected layer for corresponding processing; the processed ROIs of the set size are then fed into the loss layer to obtain the expression classification result error of performing expression classification on the sample images (for example, the error can be the output result of the loss layer); the network parameters of the CNN model are adjusted according to the expression classification result error.
After the processing result of the ROI pooling layer is obtained, the result may be input into a fully connected layer, which converts images of different sizes into features of the same dimension; the features output by the fully connected layer are then input into the loss layer to obtain a loss result; and whether to continue training by adjusting the network parameters of the CNN model is decided according to the loss result. Specifically, in this embodiment, after the ROI feature maps of the ROI pooling layer are obtained, the ROI feature maps are input into the fully connected layer to obtain ROI features of a set dimension, where the set dimension may be set appropriately by those skilled in the art according to actual demand, which is not limited by the embodiment of the present invention. The ROI features of the set dimension are input into the loss layer, and a loss result is calculated through a loss function; whether the training output of the CNN model satisfies a convergence condition is then judged according to the loss result. If the convergence condition is satisfied, the training of the CNN model is terminated; if the convergence condition is not satisfied, the parameters of the CNN model training (including but not limited to the weight parameters and the bias parameters) are adjusted according to the loss result, and the training of the CNN model is continued with the adjusted parameters until the training result satisfies the convergence condition. The convergence condition may be set appropriately by those skilled in the art according to actual needs, which is not limited by the embodiment of the present invention.
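The loop above — compute a loss, test a convergence condition, otherwise adjust the weight and bias parameters and continue training — can be illustrated with a deliberately simplified NumPy sketch. This is a toy logistic-regression trainer, not the CNN of the patent; the learning rate, tolerance, iteration cap, and random initialization are all assumptions made for the illustration.

```python
import numpy as np

def train_until_converged(X, y, lr=0.5, tol=1e-3, max_iter=5000):
    """Minimal training loop: compute loss, check a convergence
    condition, otherwise adjust weight (w) and bias (b) parameters."""
    rng = np.random.default_rng(0)
    w = rng.normal(size=X.shape[1]) * 0.01   # weight parameters
    b = 0.0                                  # bias parameter
    loss = np.inf
    for _ in range(max_iter):
        z = X @ w + b
        p = 1.0 / (1.0 + np.exp(-z))         # sigmoid prediction
        loss = -np.mean(y * np.log(p + 1e-12) +
                        (1 - y) * np.log(1 - p + 1e-12))
        if loss < tol:                       # convergence condition met
            break
        grad = p - y                         # dL/dz for the logistic loss
        w -= lr * (X.T @ grad) / len(y)      # adjust weights
        b -= lr * grad.mean()                # adjust bias
    return w, b, loss
```

The structure (loss, convergence check, parameter adjustment, repeat) is the same as in the CNN training described above, only applied to a far simpler model.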
In this embodiment, the loss function of the loss layer is a logistic regression loss function. In this case, the ROIs of the set size output by the fully connected layer are input into the loss layer, the error of the expression classification result is calculated through the logistic regression loss function of the loss layer, and the error calculation result is output. Optionally, the logistic regression loss function is a logistic regression loss function with a set number of expression classes. For example, if ten types of expressions are annotated in the sample images and the training goal of the CNN model is to recognize and classify these ten types of expressions, the logistic regression loss function may be a ten-class logistic regression loss function, where "ten-class" means that the logistic regression loss function can detect and recognize the ten annotated types of expressions.
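A multi-class ("N-class") logistic regression loss of the kind described is conventionally realized as softmax cross-entropy. The following NumPy sketch is illustrative only (the patent does not specify its exact formulation); the ten-entry logit vector mirrors the ten-expression example above.

```python
import numpy as np

def softmax_cross_entropy(logits, label):
    """Multi-class logistic-regression (softmax) loss for one sample.

    `logits` has one entry per expression class (e.g. 10 classes);
    `label` is the index of the annotated expression."""
    z = logits - logits.max()                  # shift for numerical stability
    log_probs = z - np.log(np.exp(z).sum())    # log-softmax
    return -log_probs[label]
```

With uniform logits over ten classes the loss equals ln(10), and it shrinks as the logit of the annotated class grows, which is exactly the error signal used to adjust the network parameters.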
In addition, in practical CNN model training, facial expression sample images from a video frame sequence may also be used as the sample images for training. There is a certain connection between the video frames in a video frame sequence, and using the video frame sequence as the training samples is more conducive to the trained CNN model recognizing facial expressions in consecutive video frames during actual detection.
Through the above process, the expression recognition training of the CNN model in the embodiment of the present invention is realized. Different from conventional training methods, after the facial expression sample images are processed by the convolutional layer part of the CNN model, the ROIs are determined in the convolutional layer processing result according to the face keypoints, the ROIs are input into the ROI pooling layer for processing, and finally the training of the convolutional neural network model is determined according to the processing result of the ROI pooling layer. By selecting the regions corresponding to the face keypoints as ROIs, the training can be more targeted, and the detailed information of variations in multiple facial regions can be fully utilized, so that faces with subtle expression changes and different postures are recognized more accurately. This CNN model training method, which uses the facial regions corresponding to the face keypoints as ROIs, can effectively capture subtle expression changes and better handle the differences brought about by different facial postures, thereby improving the prediction accuracy and robustness of the CNN model. Moreover, compared with performing expression recognition with a conventional machine learning framework, the CNN constructed in the embodiment of the present invention can, owing to its own structural characteristics, not only be trained with samples of large data volume, but also achieves high training efficiency at relatively low training cost.
The convolutional neural network model training method of this embodiment may be performed by any suitable device having data-processing capability, including but not limited to: a mobile terminal, a PC, and the like.
Embodiment five
Referring to Fig. 5, a structural block diagram of an expression recognition apparatus according to embodiment five of the present invention is shown. It specifically includes the following modules:
A first determining module 502, configured to perform facial expression feature extraction on a facial image to be detected through the convolutional layer part of a convolutional neural network model and the acquired face keypoints in the facial image to be detected, to obtain a facial expression feature map.
A second determining module 504, configured to determine regions of interest (ROIs) respectively corresponding to the face keypoints in the facial expression feature map.
A third determining module 506, configured to perform pooling on each determined ROI through the pooling layer part of the convolutional neural network model, to obtain pooled ROI feature maps.
A fourth determining module 508, configured to obtain an expression recognition result of the facial image according at least to the ROI feature maps.
Optionally, the facial image includes a static facial image.
Optionally, the facial image includes a facial image in a video frame sequence.
Optionally, the third determining module 506 includes: a first acquisition submodule 5062, configured to obtain a preliminary expression recognition result of the facial image of the current frame according to the ROI feature maps of the facial image of the current frame; and a second acquisition submodule 5064, configured to obtain the expression recognition result of the facial image of the current frame according to the preliminary expression recognition result of the current frame and the expression recognition result of the facial image of at least one prior frame.
Optionally, the second acquisition submodule 5064 is configured to perform weighted processing on the preliminary facial expression recognition result of the facial image of the current frame and the facial expression recognition result of the facial image of at least one prior frame, to obtain the expression recognition result of the facial image of the current frame, where the weight of the preliminary expression recognition result of the facial image of the current frame is greater than the weight of the expression recognition result of the facial image of any prior frame.
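The weighted processing just described can be sketched as follows. This NumPy fragment is illustrative only: the weight value 0.6 and the even split of the remaining weight over prior frames are assumptions, chosen so that the current frame's preliminary result always outweighs any single prior frame.

```python
import numpy as np

def fuse_expression_scores(current, priors, current_weight=0.6):
    """Weighted fusion of per-class expression scores across frames.

    The current frame's preliminary scores get `current_weight`; the
    remaining weight is split evenly over the prior frames, so with any
    current_weight > 0.5 the current frame outweighs each prior frame."""
    priors = np.asarray(priors, dtype=float)
    prior_w = (1.0 - current_weight) / len(priors)
    return current_weight * np.asarray(current, dtype=float) \
        + prior_w * priors.sum(axis=0)
```

For example, fusing current scores [0.9, 0.1] with two prior frames [0.2, 0.8] and [0.4, 0.6] yields [0.66, 0.34]: the current frame dominates, but the prior frames smooth the result across the sequence.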
Optionally, the apparatus further includes: a fifth determining module 510, configured to determine that the position of the current frame in the video frame sequence is greater than or equal to a set position threshold.
Optionally, the apparatus further includes: a response module 512, configured to, in response to the position of the current frame in the video frame sequence being less than the set position threshold, output the facial expression recognition result of the facial image of the current frame, and/or save the facial expression recognition result of the facial image of the current frame.
Optionally, the first determining module 502 is configured to perform face keypoint detection on the facial image to be detected to obtain the face keypoints in the facial image, and, according to the face keypoints, perform facial expression feature extraction on the facial image through the convolutional layer part of the convolutional neural network model to obtain the facial expression feature map.
Optionally, the first determining module 502 is configured to perform face keypoint extraction on the facial image to be detected through the convolutional layer part of the convolutional neural network model, and perform facial expression feature extraction on the facial image to be detected according to the extracted face keypoints to obtain the facial expression feature map.
Optionally, the apparatus further includes: a training module 514, configured to obtain sample images for training and train the convolutional neural network model using the sample images, where the sample images contain the information of the face keypoints and the annotation information of the facial expressions.
Optionally, the training module 514 includes: a first submodule 5142, configured to obtain the sample images for training and perform facial expression feature extraction on the sample images through the convolutional layer part of the convolutional neural network model, to obtain facial expression feature maps; a second submodule 5144, configured to determine regions of interest (ROIs) respectively corresponding to the face keypoints in the facial expression feature maps; a third submodule 5146, configured to perform pooling on each determined ROI through the pooling layer part of the convolutional neural network model, to obtain pooled ROI feature maps; and a fourth submodule 5148, configured to adjust the network parameters of the convolutional neural network model according at least to the ROI feature maps.
Optionally, the second submodule 5144 is configured to, in the facial expression feature map, determine the corresponding positions according to the coordinates of the face keypoints, take the determined positions as reference points, obtain the corresponding regions of a set range, and determine each obtained region as a corresponding ROI.
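The keypoint-to-ROI step (take each keypoint coordinate as a reference point and form a region of a set range around it) can be sketched in plain Python. The half-window size and the clipping to the feature-map borders are assumptions for the illustration; the patent only specifies a "set range" around each reference point.

```python
def rois_from_keypoints(feat_h, feat_w, keypoints, half_size=2):
    """For each face keypoint (x, y) on the expression feature map,
    take the point as a reference and form a square region of a set
    range around it, clipped to the feature-map borders.
    Returns (x0, y0, x1, y1) boxes with exclusive upper bounds."""
    rois = []
    for x, y in keypoints:
        x0 = max(0, x - half_size)
        y0 = max(0, y - half_size)
        x1 = min(feat_w, x + half_size + 1)
        y1 = min(feat_h, y + half_size + 1)
        rois.append((x0, y0, x1, y1))
    return rois
```

A keypoint near a border simply yields a smaller clipped region, which the subsequent ROI pooling step still maps to the fixed output size.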
Optionally, the third submodule 5146 is configured to perform pooling on each determined ROI through the pooling layer part of the convolutional neural network model, to obtain pooled ROI feature maps of a set size; and the fourth submodule 5148 is configured to input the ROIs of the set size into a loss layer to obtain the expression classification result error of performing expression classification on the sample image, and adjust the network parameters of the convolutional neural network model according to the expression classification result error.
Optionally, the fourth submodule 5148 is configured to input the ROIs of the set size into the loss layer, and calculate and output the expression classification result error through the logistic regression loss function of the loss layer.
Optionally, the logistic regression loss function is a logistic regression loss function with a set number of expression classes.
Optionally, the facial expression sample images to be trained are sample images of a video frame sequence.
Optionally, a sixth determining module 516 is configured to detect the sample images for training, to obtain the information of the face keypoints.
The expression recognition apparatus of this embodiment can perform any one of the expression recognition methods in embodiments one to three and achieve the advantageous effects of those methods, which are not repeated here.
Embodiment six
Referring to Fig. 6, a structural block diagram of a convolutional neural network model training apparatus according to embodiment six of the present invention is shown. It specifically includes the following modules:
A first acquisition module 602, configured to obtain sample images for training and the information of the corresponding face keypoints, where the sample images contain the annotation information of the facial expressions.
A second acquisition module 604, configured to perform facial expression feature extraction on the sample images through the convolutional layer part of a convolutional neural network model, to obtain facial expression feature maps.
A third acquisition module 606, configured to determine regions of interest (ROIs) respectively corresponding to the face keypoints in the facial expression feature maps.
A fourth acquisition module 608, configured to perform pooling on each determined ROI through the pooling layer part of the convolutional neural network model, to obtain pooled ROI feature maps.
A fifth acquisition module 610, configured to adjust the network parameters of the convolutional neural network model according at least to the ROI feature maps.
Optionally, the third acquisition module 606 is configured to, in the facial expression feature map, determine the corresponding positions according to the coordinates of the face keypoints, take the determined positions as reference points, obtain the corresponding regions of a set range, and determine each obtained region as a corresponding ROI.
Optionally, the fourth acquisition module 608 is configured to perform pooling on each determined ROI through the pooling layer part of the convolutional neural network model, to obtain pooled ROI feature maps of a set size; and the fifth acquisition module 610 includes: a first acquisition submodule 6102, configured to input the ROIs of the set size into a loss layer to obtain the expression classification result error of performing expression classification on the sample image; and an adjustment submodule 6104, configured to adjust the network parameters of the convolutional neural network model according to the expression classification result error.
Optionally, the first acquisition submodule 6102 is configured to input the ROIs of the set size into the loss layer, and calculate and output the expression classification result error through the logistic regression loss function of the loss layer.
Optionally, the logistic regression loss function is a logistic regression loss function with a set number of expression classes.
Optionally, the facial expression sample images to be trained are sample images of a video frame sequence.
Optionally, the apparatus further includes: a sixth acquisition module 612, configured to detect the sample images for training, to obtain the information of the face keypoints.
The convolutional neural network model training apparatus of this embodiment can perform the convolutional neural network model training method in embodiment four and achieve the advantageous effects of that method, which are not repeated here.
Embodiment seven
Embodiment seven of the present invention provides an electronic device, which may be, for example, a mobile terminal, a personal computer (PC), a tablet computer, a server, or the like. Referring now to Fig. 7, a structural block diagram of an electronic device 700 suitable for implementing a terminal device or a server of the embodiment of the present invention is shown. As shown in Fig. 7, the electronic device 700 includes one or more processors, a communication element, and the like. The one or more processors are, for example, one or more central processing units (CPUs) 701 and/or one or more graphics processing units (GPUs) 713. The processors may perform various appropriate actions and processing according to executable instructions stored in a read-only memory (ROM) 702 or executable instructions loaded from a storage section 708 into a random access memory (RAM) 703. The communication element includes a communication component 712 and/or a communication interface 709. The communication component 712 may include, but is not limited to, a network card, which may include, but is not limited to, an IB (InfiniBand) network card; the communication interface 709 includes a communication interface of a network card such as a LAN card or a modem, and performs communication processing via a network such as the Internet.
The processors may communicate with the read-only memory 702 and/or the random access memory 703 to execute the executable instructions, are connected with the communication component 712 through a communication bus 704, and communicate with other target devices through the communication component 712, thereby completing the operations corresponding to any one of the expression recognition methods provided by the embodiments of the present invention, for example: performing facial expression feature extraction on a facial image to be detected through the convolutional layer part of a convolutional neural network model and the acquired face keypoints in the facial image to be detected, to obtain a facial expression feature map; determining regions of interest (ROIs) respectively corresponding to the face keypoints in the facial expression feature map; performing pooling on each determined ROI through the pooling layer part of the convolutional neural network model, to obtain pooled ROI feature maps; and obtaining the expression recognition result of the facial image according at least to the ROI feature maps.
In addition, the RAM 703 may also store various programs and data required for the operation of the apparatus. The CPU 701 or GPU 713, the ROM 702, and the RAM 703 are connected with each other through the communication bus 704. In the presence of the RAM 703, the ROM 702 is an optional module. The RAM 703 stores executable instructions, or executable instructions are written into the ROM 702 at runtime, and the executable instructions cause the processors to perform the operations corresponding to the above communication method. An input/output (I/O) interface 705 is also connected to the communication bus 704. The communication component 712 may be integrally disposed, or may be provided with multiple submodules (for example, multiple IB network cards) and linked on the communication bus.
The I/O interface 705 is connected to the following components: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including a cathode ray tube (CRT), a liquid crystal display (LCD), a loudspeaker, and the like; a storage section 708 including a hard disk and the like; and the communication interface 709 of a network card including a LAN card, a modem, and the like. A drive 710 is also connected to the I/O interface 705 as needed. A detachable medium 711, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 710 as needed, so that a computer program read therefrom is installed into the storage section 708 as needed.
It should be noted that the architecture shown in Fig. 7 is only an optional implementation. In specific practice, the number and types of the components in Fig. 7 may be selected, deleted, added, or replaced according to actual needs. In the arrangement of different functional components, separate or integrated arrangements and other implementations may also be adopted; for example, the GPU and the CPU may be arranged separately, or the GPU may be integrated on the CPU, and the communication element may be arranged separately or integrally on the CPU or GPU, and so on. These interchangeable implementations all fall within the protection scope of the present invention.
In particular, according to the embodiments of the present invention, the process described above with reference to the flowchart may be implemented as a computer software program. For example, the embodiments of the present invention include a computer program product, which includes a computer program tangibly embodied on a machine-readable medium. The computer program includes program code for executing the method shown in the flowchart, and the program code may include instructions corresponding to the execution of the method steps provided by the embodiments of the present invention, for example: performing facial expression feature extraction on a facial image to be detected through the convolutional layer part of a convolutional neural network model and the acquired face keypoints in the facial image to be detected, to obtain a facial expression feature map; determining regions of interest (ROIs) respectively corresponding to the face keypoints in the facial expression feature map; performing pooling on each determined ROI through the pooling layer part of the convolutional neural network model, to obtain pooled ROI feature maps; and obtaining the expression recognition result of the facial image according at least to the ROI feature maps. In such embodiments, the computer program may be downloaded and installed from a network through the communication element, and/or installed from the detachable medium 711. When the computer program is executed by the processors, the above functions defined in the method of the embodiment of the present invention are performed.
Embodiment eight
Embodiment eight of the present invention provides an electronic device, which may be, for example, a mobile terminal, a personal computer (PC), a tablet computer, a server, or the like. Referring now to Fig. 8, a structural block diagram of an electronic device 800 suitable for implementing a terminal device or a server of the embodiment of the present invention is shown. As shown in Fig. 8, the electronic device 800 includes one or more processors, a communication element, and the like. The one or more processors are, for example, one or more central processing units (CPUs) 801 and/or one or more graphics processing units (GPUs) 813. The processors may perform various appropriate actions and processing according to executable instructions stored in a read-only memory (ROM) 802 or executable instructions loaded from a storage section 808 into a random access memory (RAM) 803. The communication element includes a communication component 812 and/or a communication interface 809. The communication component 812 may include, but is not limited to, a network card, which may include, but is not limited to, an IB (InfiniBand) network card; the communication interface 809 includes a communication interface of a network card such as a LAN card or a modem, and performs communication processing via a network such as the Internet.
The processors may communicate with the read-only memory 802 and/or the random access memory 803 to execute the executable instructions, are connected with the communication component 812 through a communication bus 804, and communicate with other target devices through the communication component 812, thereby completing the operations corresponding to any one of the convolutional neural network model training methods provided by the embodiments of the present invention, for example: obtaining sample images for training and the information of the corresponding face keypoints, where the sample images contain the annotation information of the facial expressions; performing facial expression feature extraction on the sample images through the convolutional layer part of a convolutional neural network model, to obtain facial expression feature maps; determining regions of interest (ROIs) respectively corresponding to the face keypoints in the facial expression feature maps; performing pooling on each determined ROI through the pooling layer part of the convolutional neural network model, to obtain pooled ROI feature maps; and adjusting the network parameters of the convolutional neural network model according at least to the ROI feature maps.
In addition, the RAM 803 may also store various programs and data required for the operation of the apparatus. The CPU 801 or GPU 813, the ROM 802, and the RAM 803 are connected with each other through the communication bus 804. In the presence of the RAM 803, the ROM 802 is an optional module. The RAM 803 stores executable instructions, or executable instructions are written into the ROM 802 at runtime, and the executable instructions cause the processors to perform the operations corresponding to the above communication method. An input/output (I/O) interface 805 is also connected to the communication bus 804. The communication component 812 may be integrally disposed, or may be provided with multiple submodules (for example, multiple IB network cards) and linked on the communication bus.
The I/O interface 805 is connected to the following components: an input section 806 including a keyboard, a mouse, and the like; an output section 807 including a cathode ray tube (CRT), a liquid crystal display (LCD), a loudspeaker, and the like; a storage section 808 including a hard disk and the like; and the communication interface 809 of a network card including a LAN card, a modem, and the like. A drive 810 is also connected to the I/O interface 805 as needed. A detachable medium 811, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 810 as needed, so that a computer program read therefrom is installed into the storage section 808 as needed.
It should be noted that the architecture shown in Fig. 8 is only an optional implementation. In specific practice, the number and types of the components in Fig. 8 may be selected, deleted, added, or replaced according to actual needs. In the arrangement of different functional components, separate or integrated arrangements and other implementations may also be adopted; for example, the GPU and the CPU may be arranged separately, or the GPU may be integrated on the CPU, and the communication element may be arranged separately or integrally on the CPU or GPU, and so on. These interchangeable implementations all fall within the protection scope of the present invention.
In particular, according to the embodiments of the present invention, the process described above with reference to the flowchart may be implemented as a computer software program. For example, the embodiments of the present invention include a computer program product, which includes a computer program tangibly embodied on a machine-readable medium. The computer program includes program code for executing the method shown in the flowchart, and the program code may include instructions corresponding to the execution of the method steps provided by the embodiments of the present invention, for example: obtaining sample images for training and the information of the corresponding face keypoints, where the sample images contain the annotation information of the facial expressions; performing facial expression feature extraction on the sample images through the convolutional layer part of a convolutional neural network model, to obtain facial expression feature maps; determining regions of interest (ROIs) respectively corresponding to the face keypoints in the facial expression feature maps; performing pooling on each determined ROI through the pooling layer part of the convolutional neural network model, to obtain pooled ROI feature maps; and adjusting the network parameters of the convolutional neural network model according at least to the ROI feature maps. In such embodiments, the computer program may be downloaded and installed from a network through the communication element, and/or installed from the detachable medium 811. When the computer program is executed by the processors, the above functions defined in the method of the embodiment of the present invention are performed.
It may be noted that, according to the needs of implementation, each component/step described in the embodiments of the present invention may be split into more components/steps, and two or more components/steps or partial operations of components/steps may also be combined into a new component/step, to achieve the purpose of the embodiments of the present invention.
The above methods according to the embodiments of the present invention may be implemented in hardware or firmware, or implemented as software or computer code that can be stored in a recording medium (such as a CD-ROM, RAM, floppy disk, hard disk, or magneto-optical disk), or implemented as computer code that is originally stored in a remote recording medium or a non-volatile machine-readable medium, downloaded through a network, and stored in a local recording medium, so that the methods described herein may be processed by such software stored on a recording medium using a general-purpose computer, a special-purpose processor, or programmable or dedicated hardware (such as an ASIC or FPGA). It may be understood that a computer, a processor, a microprocessor controller, or programmable hardware includes a storage component (for example, RAM, ROM, flash memory, etc.) that can store or receive software or computer code; when the software or computer code is accessed and executed by the computer, the processor, or the hardware, the processing methods described herein are realized. In addition, when a general-purpose computer accesses the code for implementing the processing shown herein, the execution of the code converts the general-purpose computer into a special-purpose computer for performing the processing shown herein.
Those of ordinary skill in the art may realize that the units and method steps of each example described with reference to the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or in software depends on the specific application and the design constraints of the technical solution. Professional technicians may use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of the embodiments of the present invention.
The above implementations are only used to illustrate the embodiments of the present invention and are not limitations on the embodiments of the present invention. Those of ordinary skill in the relevant technical field may also make various changes and modifications without departing from the spirit and scope of the embodiments of the present invention. Therefore, all equivalent technical solutions also belong to the scope of the embodiments of the present invention, and the patent protection scope of the embodiments of the present invention shall be defined by the claims.
Claims (10)
1. An expression recognition method, characterized by comprising:
performing facial expression feature extraction on a facial image to be detected through the convolutional layer part of a convolutional neural network model and the acquired face keypoints in the facial image to be detected, to obtain a facial expression feature map;
determining regions of interest (ROIs) respectively corresponding to the face keypoints in the facial expression feature map;
performing pooling on each determined ROI through the pooling layer part of the convolutional neural network model, to obtain pooled ROI feature maps;
obtaining an expression recognition result of the facial image according at least to the ROI feature maps.
2. The method according to claim 1, characterized in that the facial image comprises a static facial image.
3. The method according to claim 1, characterized in that the facial image comprises a facial image in a video frame sequence.
4. The method according to claim 3, characterized in that obtaining the expression recognition result of the facial image according at least to the ROI feature maps comprises:
obtaining a preliminary expression recognition result of the facial image of the current frame according to the ROI feature maps of the facial image of the current frame;
obtaining the expression recognition result of the facial image of the current frame according to the preliminary expression recognition result of the current frame and the expression recognition result of the facial image of at least one prior frame.
5. The method according to claim 4, wherein obtaining the expression recognition result of the facial image of the current frame according to the preliminary expression recognition result of the current frame and the expression recognition result of the facial image of the at least one previous frame comprises:
performing weighting processing on the preliminary facial expression recognition result of the facial image of the current frame and the facial expression recognition result of the facial image of the at least one previous frame, to obtain the expression recognition result of the facial image of the current frame, wherein a weight of the preliminary expression recognition result of the facial image of the current frame is greater than a weight of the expression recognition result of the facial image of any previous frame.
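The weighting of claim 5 amounts to a temporal smoothing of per-class expression scores in which the current frame dominates. A small sketch (the weight values and the three-class score vectors are hypothetical):

```python
import numpy as np

def fuse_expression_scores(current, previous, w_current=0.6):
    """Weighted combination of the current frame's preliminary class
    scores with previous frames' scores; the current frame's weight
    exceeds the weight given to any single previous frame."""
    previous = np.asarray(previous, dtype=float)
    w_prev = (1.0 - w_current) / len(previous)  # remainder shared equally
    return w_current * np.asarray(current, dtype=float) + w_prev * previous.sum(axis=0)

cur = [0.1, 0.7, 0.2]            # preliminary scores for 3 expression classes
prev = [[0.2, 0.5, 0.3],         # recognition results of two previous frames
        [0.3, 0.4, 0.3]]
print(fuse_expression_scores(cur, prev))  # → [0.16 0.6  0.24]
```

With w_current = 0.6 and two previous frames, each previous frame carries weight 0.2, satisfying the claim's constraint that the current frame's weight is the largest; the fused result still favors the class the current frame favors while damping single-frame jitter.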
6. A convolutional neural network model training method, characterized by comprising:
acquiring a sample image for training and information of corresponding face key points, wherein the sample image contains annotation information of a facial expression;
performing facial expression feature extraction on the sample image through a convolutional layer part of a convolutional neural network model, to obtain a facial expression feature map;
determining regions of interest (ROIs) corresponding to each face key point in the facial expression feature map;
performing pooling processing on each determined ROI through a pooling layer part of the convolutional neural network model, to obtain pooled ROI feature maps; and
adjusting network parameters of the convolutional neural network model according to at least the ROI feature maps.
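The final step of claim 6, adjusting network parameters from the ROI features and the annotated expression, is a supervised gradient update. The claim does not fix a loss or optimizer; the sketch below assumes softmax cross-entropy and plain gradient descent on a linear classifier head over a flattened ROI feature vector, with all shapes invented for illustration:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def train_step(W, roi_feat, label, lr=0.05):
    """One gradient-descent step of softmax cross-entropy for a linear
    expression-classifier head; returns the adjusted parameters."""
    probs = softmax(W @ roi_feat)
    grad = np.outer(probs - np.eye(len(probs))[label], roi_feat)
    return W - lr * grad

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))   # 3 expression classes, 4-dim ROI feature
x = rng.normal(size=4)        # one flattened ROI feature vector
before = softmax(W @ x)[1]
W = train_step(W, x, label=1)
after = softmax(W @ x)[1]
print(after > before)  # → True: the annotated class becomes more probable
```

In the full method the gradient would flow back through the pooling and convolutional layer parts as well; this head-only update just illustrates what "adjusting the network parameters according to at least the ROI feature maps" accomplishes.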
7. An expression recognition apparatus, characterized by comprising:
a first determining module, configured to perform facial expression feature extraction on a facial image to be detected through a convolutional layer part of a convolutional neural network model and face key points in the acquired facial image to be detected, to obtain a facial expression feature map;
a second determining module, configured to determine regions of interest (ROIs) corresponding to each face key point in the facial expression feature map;
a third determining module, configured to perform pooling processing on each determined ROI through a pooling layer part of the convolutional neural network model, to obtain pooled ROI feature maps; and
a fourth determining module, configured to obtain an expression recognition result of the facial image according to at least the ROI feature maps.
8. A convolutional neural network model training apparatus, characterized by comprising:
a first acquisition module, configured to acquire a sample image for training and information of corresponding face key points, wherein the sample image contains annotation information of a facial expression;
a second acquisition module, configured to perform facial expression feature extraction on the sample image through a convolutional layer part of the convolutional neural network model, to obtain a facial expression feature map;
a third acquisition module, configured to determine regions of interest (ROIs) corresponding to each face key point in the facial expression feature map;
a fourth acquisition module, configured to perform pooling processing on each determined ROI through a pooling layer part of the convolutional neural network model, to obtain pooled ROI feature maps; and
a fifth acquisition module, configured to adjust network parameters of the convolutional neural network model according to at least the ROI feature maps.
9. An electronic device, characterized by comprising: a processor, a memory, a communication device, and a communication bus, wherein the processor, the memory, and the communication device communicate with one another through the communication bus; and
the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the expression recognition method according to any one of claims 1-5.
10. An electronic device, characterized by comprising: a processor, a memory, a communication device, and a communication bus, wherein the processor, the memory, and the communication device communicate with one another through the communication bus; and
the memory is configured to store at least one executable instruction, and the executable instruction causes the processor to perform the convolutional neural network model training method according to claim 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611268009.6A CN108229268A (en) | 2016-12-31 | 2016-12-31 | Expression Recognition and convolutional neural networks model training method, device and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108229268A true CN108229268A (en) | 2018-06-29 |
Family
ID=62656512
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611268009.6A Pending CN108229268A (en) | 2016-12-31 | 2016-12-31 | Expression Recognition and convolutional neural networks model training method, device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108229268A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102339391A (en) * | 2010-07-27 | 2012-02-01 | 株式会社理光 | Multiobject identification method and device |
CN103258204A (en) * | 2012-02-21 | 2013-08-21 | 中国科学院心理研究所 | Automatic micro-expression recognition method based on Gabor features and edge orientation histogram (EOH) features |
CN104715249A (en) * | 2013-12-16 | 2015-06-17 | 株式会社理光 | Object tracking method and device |
CN105095827A (en) * | 2014-04-18 | 2015-11-25 | 汉王科技股份有限公司 | Facial expression recognition device and facial expression recognition method |
CN105654049A (en) * | 2015-12-29 | 2016-06-08 | 中国科学院深圳先进技术研究院 | Facial expression recognition method and device |
CN105787867A (en) * | 2016-04-21 | 2016-07-20 | 华为技术有限公司 | Method and apparatus for processing video images based on neural network algorithm |
CN106096557A (en) * | 2016-06-15 | 2016-11-09 | 浙江大学 | A kind of semi-supervised learning facial expression recognizing method based on fuzzy training sample |
Non-Patent Citations (1)
Title |
---|
ROSS GIRSHICK: "Fast R-CNN", 2015 IEEE International Conference on Computer Vision *
Cited By (50)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110728168B (en) * | 2018-07-17 | 2022-07-22 | 广州虎牙信息科技有限公司 | Part recognition method, device, equipment and storage medium |
CN110728168A (en) * | 2018-07-17 | 2020-01-24 | 广州虎牙信息科技有限公司 | Part recognition method, device, equipment and storage medium |
CN112911393B (en) * | 2018-07-24 | 2023-08-01 | 广州虎牙信息科技有限公司 | Method, device, terminal and storage medium for identifying part |
CN112911393A (en) * | 2018-07-24 | 2021-06-04 | 广州虎牙信息科技有限公司 | Part recognition method, device, terminal and storage medium |
CN109389030A (en) * | 2018-08-23 | 2019-02-26 | 平安科技(深圳)有限公司 | Facial feature points detection method, apparatus, computer equipment and storage medium |
CN109255827A (en) * | 2018-08-24 | 2019-01-22 | 太平洋未来科技(深圳)有限公司 | Three-dimensional face image generation method, device and electronic equipment |
CN109190564A (en) * | 2018-09-05 | 2019-01-11 | 厦门集微科技有限公司 | Image analysis method, apparatus, computer storage medium and terminal |
CN109409262A (en) * | 2018-10-11 | 2019-03-01 | 北京迈格威科技有限公司 | Image processing method, image processing apparatus, computer readable storage medium |
CN109522818B (en) * | 2018-10-29 | 2021-03-30 | 中国科学院深圳先进技术研究院 | Expression recognition method and device, terminal equipment and storage medium |
CN109522818A (en) * | 2018-10-29 | 2019-03-26 | 中国科学院深圳先进技术研究院 | Expression recognition method, apparatus, terminal device and storage medium |
US11151363B2 (en) | 2018-10-30 | 2021-10-19 | Baidu Online Network Technology (Beijing) Co., Ltd. | Expression recognition method, apparatus, electronic device, and storage medium |
EP3564854A1 (en) * | 2018-10-30 | 2019-11-06 | Baidu Online Network Technology (Beijing) Co., Ltd. | Facial expression recognition method, apparatus, electronic device, and storage medium |
CN109544537A (en) * | 2018-11-26 | 2019-03-29 | 中国科学技术大学 | Fast automatic analysis method for hip joint X-ray images |
CN111259689A (en) * | 2018-11-30 | 2020-06-09 | 百度在线网络技术(北京)有限公司 | Method and apparatus for transmitting information |
CN111259689B (en) * | 2018-11-30 | 2023-04-25 | 百度在线网络技术(北京)有限公司 | Method and device for transmitting information |
CN109711356A (en) * | 2018-12-28 | 2019-05-03 | 广州海昇教育科技有限责任公司 | Expression recognition method and system |
CN109711356B (en) * | 2018-12-28 | 2023-11-10 | 广州海昇教育科技有限责任公司 | Expression recognition method and system |
CN109840485B (en) * | 2019-01-23 | 2021-10-08 | 科大讯飞股份有限公司 | Micro-expression feature extraction method, device, equipment and readable storage medium |
CN109840485A (en) * | 2019-01-23 | 2019-06-04 | 科大讯飞股份有限公司 | Micro-expression feature extraction method, apparatus, equipment and readable storage medium |
CN111723926B (en) * | 2019-03-22 | 2023-09-12 | 北京地平线机器人技术研发有限公司 | Training method and training device for neural network model for determining image parallax |
CN111723926A (en) * | 2019-03-22 | 2020-09-29 | 北京地平线机器人技术研发有限公司 | Training method and training device for neural network model for determining image parallax |
CN110110611A (en) * | 2019-04-16 | 2019-08-09 | 深圳壹账通智能科技有限公司 | Portrait attribute model construction method, device, computer equipment and storage medium |
CN109977925A (en) * | 2019-04-22 | 2019-07-05 | 北京字节跳动网络技术有限公司 | Expression determination method, apparatus and electronic equipment |
CN109977925B (en) * | 2019-04-22 | 2020-11-27 | 北京字节跳动网络技术有限公司 | Expression determination method and device and electronic equipment |
CN110135476A (en) * | 2019-04-28 | 2019-08-16 | 深圳市中电数通智慧安全科技股份有限公司 | A kind of detection method of personal safety equipment, device, equipment and system |
CN110097004B (en) * | 2019-04-30 | 2022-03-29 | 北京字节跳动网络技术有限公司 | Facial expression recognition method and device |
CN110070076A (en) * | 2019-05-08 | 2019-07-30 | 北京字节跳动网络技术有限公司 | Method and apparatus for selecting training samples |
US11023716B2 (en) | 2019-05-27 | 2021-06-01 | Beijing Bytedance Network Technology Co., Ltd. | Method and device for generating stickers |
CN110162670A (en) * | 2019-05-27 | 2019-08-23 | 北京字节跳动网络技术有限公司 | Method and apparatus for generating expression packages |
CN110162670B (en) * | 2019-05-27 | 2020-05-08 | 北京字节跳动网络技术有限公司 | Method and device for generating expression package |
CN110176012B (en) * | 2019-05-28 | 2022-12-13 | 腾讯科技(深圳)有限公司 | Object segmentation method in image, pooling method, device and storage medium |
CN110176012A (en) * | 2019-05-28 | 2019-08-27 | 腾讯科技(深圳)有限公司 | Object segmentation method in images, pooling method, apparatus and storage medium |
CN110490164A (en) * | 2019-08-26 | 2019-11-22 | 北京达佳互联信息技术有限公司 | Method, apparatus, equipment and medium for generating virtual expressions |
CN110705419A (en) * | 2019-09-24 | 2020-01-17 | 新华三大数据技术有限公司 | Emotion recognition method, early warning method, model training method and related device |
CN112733574A (en) * | 2019-10-14 | 2021-04-30 | 中移(苏州)软件技术有限公司 | Face recognition method and device and computer readable storage medium |
CN112733574B (en) * | 2019-10-14 | 2023-04-07 | 中移(苏州)软件技术有限公司 | Face recognition method and device and computer readable storage medium |
CN112825122A (en) * | 2019-11-20 | 2021-05-21 | 北京眼神智能科技有限公司 | Ethnicity judgment method, ethnicity judgment device, ethnicity judgment medium and ethnicity judgment equipment based on two-dimensional face image |
CN111259753A (en) * | 2020-01-10 | 2020-06-09 | 杭州飞步科技有限公司 | Method and device for processing key points of human face |
CN111325190A (en) * | 2020-04-01 | 2020-06-23 | 京东方科技集团股份有限公司 | Expression recognition method and device, computer equipment and readable storage medium |
CN111325190B (en) * | 2020-04-01 | 2023-06-30 | 京东方科技集团股份有限公司 | Expression recognition method and device, computer equipment and readable storage medium |
WO2021196928A1 (en) * | 2020-04-01 | 2021-10-07 | 京东方科技集团股份有限公司 | Expression recognition method and apparatus, computer device, and readable storage medium |
US12002289B2 (en) | 2020-04-01 | 2024-06-04 | Boe Technology Group Co., Ltd. | Expression recognition method and apparatus, computer device, and readable storage medium |
CN111508495A (en) * | 2020-05-02 | 2020-08-07 | 北京花兰德科技咨询服务有限公司 | Artificial intelligent robot cooperating with human and communication method |
CN112084953A (en) * | 2020-09-10 | 2020-12-15 | 济南博观智能科技有限公司 | Method, system and equipment for identifying face attributes and readable storage medium |
CN112084953B (en) * | 2020-09-10 | 2024-05-10 | 济南博观智能科技有限公司 | Face attribute identification method, system, equipment and readable storage medium |
CN113537124A (en) * | 2021-07-28 | 2021-10-22 | 平安科技(深圳)有限公司 | Model training method, device and storage medium |
WO2023142886A1 (en) * | 2022-01-28 | 2023-08-03 | 华为技术有限公司 | Expression transfer method, model training method, and device |
CN114627218A (en) * | 2022-05-16 | 2022-06-14 | 成都市谛视无限科技有限公司 | Human face fine expression capturing method and device based on virtual engine |
CN116302294A (en) * | 2023-05-18 | 2023-06-23 | 安元科技股份有限公司 | Method and system for automatically identifying component attribute through interface |
CN116302294B (en) * | 2023-05-18 | 2023-09-01 | 安元科技股份有限公司 | Method and system for automatically identifying component attribute through interface |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108229268A (en) | Expression recognition and convolutional neural network model training method, apparatus and electronic device | |
CN110569795B (en) | Image recognition method and device, and related equipment | |
CN108229269A (en) | Face detection method, device and electronic equipment | |
CN108038469B (en) | Method and apparatus for detecting human body | |
CN108875708A (en) | Video-based behavior analysis method, device, equipment, system and storage medium | |
CN106407889A (en) | Video human interaction action recognition method based on an optical flow graph deep learning model | |
CN106326857A (en) | Gender identification method and device based on face image | |
CN107358157A (en) | Face liveness detection method, device and electronic equipment | |
CN105469376B (en) | Method and device for determining picture similarity | |
CN110443189A (en) | Face attribute recognition method based on a multi-task multi-label learning convolutional neural network | |
CN105426850A (en) | Face-recognition-based related information pushing device and method | |
CN106068514A (en) | System and method for identifying faces in free media | |
CN108717663A (en) | Micro-expression-based face tag fraud judgment method, device, equipment and medium | |
CN109344759A (en) | Kinship recognition method based on an angular loss neural network | |
CN108629326A (en) | Action behavior recognition method and device for a target body | |
CN106295591A (en) | Gender identification method and device based on face image | |
CN110532925B (en) | Driver fatigue detection method based on a spatio-temporal graph convolutional network | |
CN110197729A (en) | Resting-state fMRI data classification method and device based on deep learning | |
CN110503081A (en) | Violent behavior detection method, system, equipment and medium based on inter-frame difference | |
CN106897659A (en) | Blink motion recognition method and device | |
CN113723530B (en) | Intelligent psychological assessment system based on video analysis and an electronic psychological sandbox | |
CN104517097A (en) | Kinect-based moving human body posture recognition method | |
CN107292229A (en) | Image recognition method and device | |
CN109034134A (en) | Abnormal driving behavior detection method based on a multi-task deep convolutional neural network | |
CN107316029A (en) | Liveness verification method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20180629 |
|
RJ01 | Rejection of invention patent application after publication |