CN107423707A - Facial emotion recognition method for complex environments - Google Patents
Facial emotion recognition method for complex environments Download PDF Info
- Publication number
- Publication number: CN107423707A; Application number: CN201710612421.3A (CN201710612421A)
- Authority
- CN
- China
- Prior art keywords
- face
- illumination
- under
- emotion identification
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/168—Feature extraction; Face representation
- G06V40/171—Local features and components; Facial parts ; Occluding parts, e.g. glasses; Geometrical relationships
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformation in the plane of the image
- G06T3/40—Scaling the whole image or part thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformation in the plane of the image
- G06T3/60—Rotation of a whole image or part thereof
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/60—Analysis of geometric attributes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
- G06V40/174—Facial expression recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20081—Training; Learning
Abstract
The invention discloses a facial emotion recognition method for mobile embedded devices operating in complex environments. The method divides the face into six main regions (forehead, eyes, cheeks, nose, mouth and chin) and further into 68 feature points. To achieve a high recognition rate, accuracy and reliability of facial emotion recognition across environments using these feature points, the method classifies facial and expression features directly under normal conditions; under illumination, reflection and shadow conditions it applies Faster R-CNN, a region-based convolutional neural network; under movement, shaking, rocking and other motion conditions it combines Bayesian networks, Markov chains and variational inference; and under partially visible faces, multi-face scenes and cluttered backgrounds it combines deep convolutional neural networks with super-resolution generative adversarial networks (SRGANs), reinforcement learning, backpropagation and dropout, effectively improving the performance, accuracy and reliability of facial expression recognition.
Description
Technical field
The present invention relates to the technical field of moving-video image recognition in artificial intelligence, and in particular to a facial emotion recognition method for complex environments, an emerging field worldwide.
Background technology
Face recognition based on video images is an emerging and important research field worldwide. Since NVIDIA released deep-learning graphics processors (GPUs) for servers and workstations in 2006, the world's leading high-tech companies, such as Google, Apple and Amazon, and well-known research institutions, such as MIT, Berkeley and Stanford, have all developed image recognition applications on NVIDIA GPUs. By contrast, the worldwide market for mobile embedded video image recognition lags far behind comparable server-based applications; only in mid-May 2017 did manufacturers, NVIDIA among them, begin to offer GPU modules for high-performance machine vision and deep learning, so the market for mobile embedded video image recognition applications is only now poised to take off.
Face recognition mainly comprises three parts: facial region division, feature point extraction and classification. Facial region division splits the face into several regions of interest. Feature point extraction takes the key feature points from each region of interest and then merges them; common feature extraction methods include principal component analysis (PCA) and linear discriminant analysis (LDA). Classification selects suitable discrimination criteria and decision rules and designs classifiers suited to facial features, such as the relevance vector machine (RVM), kernel techniques, statistical features or neural networks. In facial image recognition, feature point extraction and classification require designing parameters, determining their weights and applying weighted processing to obtain the image transform coefficient matrix, thereby achieving face recognition with a certain accuracy. In mobile embedded video face recognition, the recognizer must cope with varied conditions such as lighting brightness, motion state, face pose and completeness, multiple people and background clutter. Existing techniques used for face recognition include CNN convolutional neural networks, Bayesian networks (Bayes Network), deep convolutional neural networks (Deep Convolutional Neural Network), super-resolution generative adversarial networks SRGANs (Super-resolution Generative Adversarial Network), reinforcement learning (Reinforcement Learning), sparse features, random forests, motion tracking, statistical models and support vector machines. In the face recognition field, most companies currently use a single one of the above methods. A single method can achieve a certain effect, but large blind spots remain: under the various complex conditions captured by a camera (illumination, blur, shadow, rocking, shaking, movement, partially visible faces, multiple faces, cluttered backgrounds) the recognition rate and accuracy are low and the cost of implementation is prohibitive. On mobile embedded computing platforms built from ARM CPUs and the Linux operating system, current emotion recognition methods lack the support of underlying core face recognition technology, and efficient, fast facial emotion recognition in mobile embedded video is rarely, if ever, deployed. A facial emotion recognition method built on any single one of the above techniques is therefore difficult to develop into a workable solution on a mobile embedded computing platform, and hence difficult to industrialize.
Summary of the invention
In view of the shortcomings prevailing in the above techniques, the primary purpose of the present invention is to provide a facial emotion recognition method for complex environments that delivers a high real-time facial emotion recognition rate, accuracy and reliability under the various complex conditions of mobile embedded use: illumination, blur, shadow, movement, rocking, shaking, partially visible faces, multi-face scenes and cluttered backgrounds.
Another purpose of the present invention is to provide a facial emotion recognition method for complex environments that combines the image and pattern recognition (Image and Pattern Recognition) of traditional artificial intelligence with deep learning (Deep Learning) to effectively distinguish the basic moods of happiness, sadness, anger, fear, disgust, surprise, contempt and neutrality, as well as deeper combined moods, promoting affective interaction between humans and machines: emotional props in various video games and simulation games, four-dimensional virtual environments such as augmented and virtual reality, and emotional consulting, mood regulation and emotional communication in remote interpersonal video exchange.
To achieve these goals, the implementation of the invention is described as follows.
A facial emotion recognition method for complex environments, characterized in that the method divides the face into the main regions of forehead, eyes, cheeks, nose, mouth and chin, and further into 68 feature points. For these 68 feature points, in order to achieve a high recognition rate, accuracy and reliability of facial emotion recognition in various complex environments, the method combines the fast region-based convolutional neural network Faster R-CNN with an RVM relevance vector machine under illumination, reflection and shadow conditions; combines Bayesian networks (Bayes Network), Markov chains and variational inference under movement, shaking, rocking and other motion conditions; and combines deep convolutional neural networks (Deep Convolutional Neural Network) with super-resolution generative adversarial networks SRGANs (Super-resolution Generative Adversarial Network), reinforcement learning (Reinforcement Learning), backpropagation (Backpropagation) and dropout (Dropout) when faces are partially visible, multiple faces are present or the background is cluttered, effectively improving the performance, accuracy and reliability of facial expression recognition.
The 68 feature points form a uniform feature distribution over the face, together with a set of detection indices comprising six intensity states: Traceable, Slight, Medium, Marked, Great and Maximum.
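The six intensity states can be illustrated with a simple thresholding sketch. The patent does not specify how a measured feature point displacement maps to a state, so the equal-width bins below are an illustrative assumption only:

```python
# The six intensity states named in the description, from weakest to
# strongest. Mapping a normalized action-unit displacement in [0, 1]
# to a state via six equal-width bins is an illustrative assumption;
# the patent does not define the thresholds.
STATES = ["Traceable", "Slight", "Medium", "Marked", "Great", "Maximum"]

def intensity_state(displacement):
    """Return the intensity state for a normalized displacement in [0, 1]."""
    d = min(max(displacement, 0.0), 1.0)  # clamp out-of-range input
    return STATES[min(int(d * 6), 5)]     # six equal-width bins
```

For example, a barely detectable motion of an eyebrow feature point would fall in the Traceable bin, while a fully open mouth would reach Maximum.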
Further, several combined algorithms are developed at the application level, including facial feature and texture processing algorithms, lighting-effect processing algorithms and multi-face tracking algorithms; the mobile embedded software is built in C, and the internet application software is built on top of it.
Specifically, facial expression recognition must cope with varied conditions such as lighting brightness, motion state, face pose and completeness, multiple people and background clutter. For all of the above complex face recognition processes, features and classifiers are jointly optimized so that feature representation and classifier cooperate, and the linear mappings of traditional artificial intelligence (input, hidden and output layers, three layers in total) are combined with the nonlinear mappings of deep neural networks (input, hidden and output layers, at least twenty layers). For example, under illumination, blur and shadow, the feature point extraction of a traditional RVM relevance vector machine is combined with the feature classification of a Faster R-CNN neural network to raise the accuracy of facial emotion recognition under those conditions; then a deep learning mathematical model with strong learning ability and efficient feature representation (Representation) extracts information layer by layer, from pixel-level raw data through edges, textures and parts to whole face pictures and the expressions of multiple faces.
Further, under other complex conditions such as continuous motion, partially visible faces and cluttered environments, deep learning methods are used, combining two or more of Deep CNN deep convolutional neural networks, Bayes CNN Bayesian convolutional neural networks, SRGANs super-resolution adversarial generative networks and RL reinforcement learning, to obtain local feature points, global representation ability and interference-resistant classification and decision methods, thereby improving the accuracy of facial emotion recognition in video shot in real time by mobile embedded devices in internet and Internet-of-Things environments.
Further, interference is also handled during face recognition. In deep convolutional neural networks of more than twenty layers, the strengths of multilayer networks in abstraction, clustering and representation are exploited so that, while faces in video captured by mobile embedded devices are processed efficiently, the various complex non-facial features are stripped away. A super-resolution adversarial network is also incorporated: through adversarial training on positive and negative examples, it computes the weighted sum of the adversarial loss over positive and negative face images and the reconstruction loss between the reconstructed high-resolution image and the original high-resolution image, enabling facial expression recognition under the above complex conditions. During deep network training, backpropagation checks the face recognition error, and stochastic gradient descent (Stochastic Gradient Descent) updates the hidden-to-hidden and hidden-to-output weights, promoting global optimization during learning and yielding a prediction-based method for recognizing expressions across consecutive captured video frames.
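The stochastic gradient descent weight update described above can be sketched for a single linear output neuron with squared-error loss. The layer shape, inputs and learning rate below are illustrative assumptions, not values from the patent:

```python
# Minimal stochastic-gradient-descent update for one linear output
# neuron with loss L = (y - target)^2 / 2, as used when correcting
# the recognition error during backpropagation. The learning rate
# and toy data are illustrative assumptions.
def sgd_step(weights, inputs, target, lr=0.1):
    """One SGD update: w <- w - lr * dL/dw."""
    y = sum(w * x for w, x in zip(weights, inputs))  # neuron output
    error = y - target                               # dL/dy
    return [w - lr * error * x for w, x in zip(weights, inputs)]

# Repeated updates drive the output toward the target value.
w = [0.0, 0.0]
for _ in range(100):
    w = sgd_step(w, [1.0, 2.0], target=1.0)
```

After these updates the neuron output `w[0] + 2 * w[1]` has converged to the target 1.0, illustrating how the error shrinks geometrically under repeated gradient steps.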
Specifically, under illumination, reflection or shadow conditions, the fast region-based convolutional neural network (Faster R-CNN) is combined with a traditional RVM classifier. Face detection quickly locates the face region in the video camera feed and obtains the facial feature point information. Illumination intensity and direction, reflection and shadow are quantized: an illumination model is determined; the lighting system is divided into light source, face and lighting environment; the model's global ambient light intensity and viewpoint position are determined; the lighting of the face's front, back and background is computed; and specular color is computed and separated. Illumination elimination, reflection absorption and illumination compensation are then applied to illumination, reflection and shadow, removing their negative effect on face recognition under these complex lighting conditions. One of the illumination models, the diffuse reflectivity model (Diffuse Reflectivity Model) including ambient light, the mirror reflectivity model (Mirror Reflectivity Model) based on skin material and texture, or the Lambert illumination model (Lambert Illumination Model), is used to process the lighting on the face and to generate training and test samples under various lighting. Facial images are then pre-processed by combined size normalization and illumination normalization. The size normalization algorithm balances, within the given transformations, the features that are invariant such as face area, perimeter and the positions of facial parts, rotating and scaling the image to obtain a standard face image that meets requirements. The illumination normalization algorithm builds, on the basis of Retinex imaging theory, a fast multiresolution algorithm based on total variation: it computes the illumination component of the image, then the reflectance coefficient image carrying the image texture, and finally separates the low-frequency illumination from the high-frequency reflectance to obtain an accurate face picture. By the above steps, non-facial attributes are stripped from the face. The segmented face region images are fed into the trained fast region-based convolutional network to quickly obtain the features of each facial region; the features of all face regions are concatenated, and a supervised relevance vector machine RVM (Relevance Vector Machine) classifier classifies the expression, achieving facial emotion recognition (Facial Emotion Recognition) under illumination, reflection or shadow.
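The Retinex-based illumination normalization can be sketched in its simplest single-scale form: estimate the low-frequency illumination with a local mean and recover the reflectance in the log domain. The patent's algorithm is a multiresolution total-variation method; the 3x3 mean below is a deliberately crude stand-in for it:

```python
import math

# Single-scale Retinex-style illumination/reflectance separation:
# illumination is estimated as a 3x3 local mean and the reflectance
# is recovered as log(I) - log(illumination). This is a simplified
# sketch; the patent uses a multiresolution total-variation algorithm.
def illumination_normalize(img):
    """img: 2-D list of positive intensities. Returns log-reflectance."""
    h, w = len(img), len(img[0])
    out = []
    for i in range(h):
        row = []
        for j in range(w):
            # crude low-pass estimate of the illumination component
            vals = [img[y][x]
                    for y in range(max(0, i - 1), min(h, i + 2))
                    for x in range(max(0, j - 1), min(w, j + 2))]
            illum = sum(vals) / len(vals)
            row.append(math.log(img[i][j]) - math.log(illum))
        out.append(row)
    return out
```

The key property, visible even in this sketch, is that the result is invariant to a global change of lighting level: multiplying every pixel by a constant leaves the log-reflectance unchanged, which is exactly why the separation helps face recognition under varying illumination.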
Under the complex conditions of movement, rocking and blur, following the classification features of the static case, a Bayesian convolutional neural network (Bayes CNN) performs unsupervised deep learning: from the face pictures continuously captured on video it extracts facial features, classifies them automatically with a Bayesian classifier and handles data fitting. For common distributions and dimensions, the expectation maximization (Expectation Maximization, EM) algorithm statistically computes sufficient expected values and uses the existing estimates of the hidden variables of the face parameters to compute the maximum likelihood estimate (Maximum Likelihood Estimation); on the basis of the maximized expected value, maximum a posteriori estimation (Maximum A Posteriori) computes the values of the face parameters. Data mining methods perform descriptive cluster analysis on the data, automatically grouping data objects into classes by similarity, spatial distance or density; the spatial-distance method, for example, exploits the invariant shape of the space around each facial part. Descriptive clustering mainly classifies the general properties of the facial feature data, aiming to find the similarities, regularities and patterns among the variables within all classes of the face recognition data set, for example judging the combinations of motion units of facial regions of interest, forming new features and classes, and forming subtler moods. A naive Bayes classifier then reasons over these classes and numeric data, classifying and subdividing the eight basic moods of the face (such as pleasure, anger, sorrow and joy).
Under movement, rocking and blur, video object tracking, locating and recognizing dynamic face objects involves solving high-dimensional conditional distributions. Under these conditions the convolution integral of the posterior probability over pictures cannot be computed directly, so Markov chain Monte Carlo (MCMC) is used. MCMC is an efficient family of methods that progressively draws face and emotion picture samples from the posterior distribution; by averaging face samples continuously acquired under the above dynamic conditions, it yields inferences about the model parameters and approximate likelihood values, solving the optimization problem of facial emotion recognition under these conditions. When the quantities computed in real time are extremely large, for example with dynamic face capture at 60 frames per second, each device processes 60*60*60*24 = 5,184,000 pictures per 24-hour day (compared with about 3,000 photos on an average iPhone), and variational Bayesian methods are used to process the video picture data. Variational Bayes approximately solves, by function approximation, intractable integrals in Bayesian inference and machine learning; it is commonly used for complex statistical models under real-time video big-data conditions and can be regarded as an extension of the EM algorithm. It introduces latent variables alongside the observable variables and parameter values and approximates by optimization, thereby solving both the incomputability of multiple probability distributions under dynamic video frames and the incomputability of inverting the large matrices of picture convolutions. When the Backpropagation algorithm retraces the training of the above video picture data, the Dropout algorithm randomly removes neurons of the hidden and visible layers to prevent overfitting; the Bayes CNN is trained over multiple batches and the average prediction rate is taken, generalizing a reliable and stable Bayesian model. In this process, reinforcement learning (Reinforcement Learning) is also used: without supervision, it intelligently maps from environment to behavior so as to maximize the reward function, increasing the reliability of the Bayesian network. Reinforcement learning in the embedded environment mainly comprises analyzing the policy network's strategy process, the reward and punishment evaluation of the value network, and the recommendation algorithm of the model network, searching for the optimal decision behavior and analyzing the classification of video face pictures under movement, rocking and blur. For example, shaking is judged from the frame difference between consecutive frames, and blurred states are judged and intelligently bridged across face pictures over a continuous time series. Interfering features affecting face and expression recognition are stripped away, achieving accurate and reliable facial expression recognition under movement, rocking and blur; the environment is characterized with a model, and expression recognition is performed on moving or continuous faces captured by ARM mobile embedded devices.
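The frame-difference test for shaking mentioned above can be sketched directly. The mean-absolute-difference measure and the threshold value are illustrative assumptions; the patent only states that shaking is judged from the difference between consecutive frames:

```python
# Frame-difference shake detection, as described for deciding when the
# motion/rocking/blur processing path should be used. The difference
# measure (mean absolute pixel difference) and the threshold are
# illustrative assumptions.
def mean_abs_diff(frame_a, frame_b):
    """Mean absolute pixel difference between two equal-size frames."""
    n = len(frame_a) * len(frame_a[0])
    return sum(abs(a - b)
               for ra, rb in zip(frame_a, frame_b)
               for a, b in zip(ra, rb)) / n

def is_shaking(frames, threshold=8.0):
    """True if any consecutive pair of frames differs strongly."""
    return any(mean_abs_diff(f0, f1) > threshold
               for f0, f1 in zip(frames, frames[1:]))
```

A static camera yields near-zero frame differences, while camera shake shifts many pixels at once and pushes the mean difference over the threshold, routing those frames to the Bayes CNN / MCMC path.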
When faces are partially visible, multiple faces are present or the background is cluttered, to ensure the reliability of expression classification, the method combines Deep CNN deep convolutional neural networks, SRGANs super-resolution adversarial generative networks and RL reinforcement learning. The moderately sparse responses of the upper-layer neurons of the deep network are used to analyze facial attributes: once a face is detected, features such as gray values and edge values are obtained that keep within-class variation minimal while preserving between-class variation, giving strong selectivity for faces and expressions. When a face is partially occluded, the method can still identify it accurately from dynamic features, with strong robustness: the global information of the face and the contextual information of successive video frames identify the face to be judged, and the face shapes in the face database are used to splice and complete the partially visible face, describing its features and global expression to obtain the complete data of the face to be recognized. With multiple faces, the face capture system tracks the multiple faces in motion, judges the importance and precedence of the faces from their dwell time across continuously captured video frames, and classifies each face's expression. Against a cluttered background, the scale-invariant feature transform SIFT (Scale Invariant Feature Transformation) algorithm detects and locates background objects unrelated to the face, such as windows, chairs and landscape paintings, and strips them away from the face picture to be recognized, leaving only the faces and expressions to be tracked and recognized. Throughout this process, the Backpropagation algorithm (introduced above) is used: when the output layer yields no data, the weights of each neuron are revised by backtracking along the stochastic gradient descent, averaged test values are obtained, and the Deep CNN convolution model is optimized. The SRGANs super-resolution adversarial network computes, through adversarial training on positive and negative examples, the weighted sum of the adversarial loss over positive and negative face images and the reconstruction loss between the reconstructed high-resolution image and the original high-resolution image, yielding facial expression recognition under the above complex conditions. Adversarial training mainly involves a generator, a discriminator and an upsampler: the generator outputs perturbed face pictures, and the discriminator judges whether a face picture is real or fake. Reconstruction training trains on paired samples of the original high-definition image and its downsampled low-resolution version, reconstructing a high-definition picture from the low-resolution picture by minimizing the Euclidean distance between the feature maps of the reconstructed and original high-resolution images, and feeds the result back to the Deep CNN for face and emotion recognition under the above complex environments.
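The SRGAN-style generator objective described above, a reconstruction term plus a weighted adversarial term, can be sketched as follows. Here the reconstruction loss is computed directly on pixels rather than on feature maps, and the adversarial weight follows the common SRGAN setting of 1e-3; both simplifications are illustrative assumptions:

```python
import math

# SRGAN-style generator loss: reconstruction (content) loss between
# the reconstructed and original high-resolution images, plus a
# weighted adversarial term. Pixel-space MSE stands in for the
# feature-map Euclidean distance, and the 1e-3 weight follows the
# common SRGAN setting; both are illustrative assumptions.
def srgan_generator_loss(reconstructed, original, disc_prob, adv_weight=1e-3):
    """reconstructed/original: flat pixel lists; disc_prob: the
    discriminator's probability that the reconstruction is real."""
    n = len(original)
    content = sum((r - o) ** 2 for r, o in zip(reconstructed, original)) / n
    adversarial = -math.log(max(disc_prob, 1e-12))  # reward fooling the discriminator
    return content + adv_weight * adversarial
```

A perfect reconstruction that fully fools the discriminator incurs zero loss; degrading either the pixel fidelity or the discriminator's belief raises the loss, which is the balance the weighted sum expresses.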
The present invention combines the methodology of image and pattern recognition, neural networks and Bayesian networks to create a facial emotion recognition method for mobile embedded complex environments. Combined with embedded software in C, three-tier internet application software, image and pattern recognition, deep neural networks, complex mathematical models, big-data applications and other high technologies, it has considerable market value in fields such as robotics, interactive entertainment, the Internet of Things, the internet, and the four-dimensional environments of virtual and augmented reality.
The present invention combines the simplicity and directness of traditional image and pattern recognition with the efficient classification and automatic reasoning of neural networks, and uses super-resolution generative adversarial networks, reinforcement learning and backpropagation to improve the stability with which shallow and deep learning run, in sync with the video frames, for expression recognition in mobile embedded environments under various conditions. It can perform real-time facial expression processing under movement, blur, rocking, shaking, shadow, illumination change, partially visible faces, multi-face scenes and cluttered backgrounds, achieving high accuracy, high efficiency and reliability.
Brief description of the drawings
Fig. 1 is the facial expression recognition and classification flowchart realized by the present invention.
Fig. 2 is the flowchart of the fast region-based convolutional neural network realized by the present invention.
Fig. 3 is the flowchart of the Bayesian convolutional neural network realized by the present invention.
Fig. 4 is the flowchart of the Deep CNN deep convolutional neural network realized by the present invention.
Embodiment
In order to state the present invention more clearly, it is further described below with reference to the accompanying drawings.
Fig. 1 shows the overall flowchart of the facial emotion recognition method for complex environments realized by the present invention. The flow of facial expression recognition is: first input the video pictures, then perform face detection; pre-process the detection results and remove interference; then perform edge mapping, face region recognition and face region selection; apply Gabor wavelet transforms to eight regions of interest in the mapped data; extract uniform-pattern LBP features from the transformed images; reduce the dimensionality of the extracted features by PCA principal component analysis; compare feature vectors and contrast the motion units with the neutral expression; and finally output the eight expressions.
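The LBP step of the pipeline above can be sketched for a single pixel. This is the standard 8-neighbour local binary pattern with the usual "uniform" test (at most two 0/1 transitions around the circle); the tiny hand-built images are illustrative:

```python
# 8-neighbour local binary pattern (LBP) code for one interior pixel:
# the texture descriptor extracted from each Gabor-filtered region of
# interest before PCA dimensionality reduction. A "uniform" pattern
# has at most two 0/1 transitions around the circular neighbourhood.
def lbp_code(img, i, j):
    """LBP code of interior pixel (i, j), neighbours taken clockwise
    from the top-left and thresholded against the centre value."""
    c = img[i][j]
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = 0
    for bit, (di, dj) in enumerate(offsets):
        if img[i + di][j + dj] >= c:
            code |= 1 << bit
    return code

def is_uniform(code):
    """True if the circular 8-bit pattern has at most two transitions."""
    bits = [(code >> k) & 1 for k in range(8)]
    return sum(bits[k] != bits[(k + 1) % 8] for k in range(8)) <= 2
```

Restricting the histogram to uniform patterns, as the pipeline does, keeps the 58 patterns that describe edges, corners and flat spots while folding the noisy non-uniform codes into a single bin, which shrinks the feature vector before PCA.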
Under relatively normal conditions, with the ARM device's camera shooting the face, image recognition methods quickly locate the face region in the camera feed through face detection and obtain the facial feature point information. According to the positions of the 68 facial feature points, Gabor wavelet transforms are applied to eight regions of interest, uniform-pattern LBP features are extracted from the transformed images, and PCA principal component analysis reduces the dimensionality of the extracted features; comparison with the feature vectors of the training and test samples forms a preliminary recognition result for face and expression. The basic action units (Basic Action Unit) exhibited by the face regions are contrasted with the motion units of the neutral facial expression (Neutral Facial Expression) and, combined with the preliminary recognition result, the facial expression is subdivided to finally yield the eight main expression classes and their subdivisions, forming the facial emotion recognition.
The face is divided into the main regions of forehead, eyes, cheeks, nose, mouth and chin, which are further subdivided into a number of feature points. To achieve high discrimination, accuracy and reliability of face emotion recognition under varied environments, different methods are applied to these feature points: under normal conditions, a method of facial-part and expression-feature classification; under illumination, reflection and shadow conditions, a Faster R-CNN method based on facial-region convolutional neural networks; under complex motion, shake and sway conditions, a method combining a Bayesian network (Bayes Network), Markov chain Monte Carlo methods and variational inference; and under incomplete face display, multiple faces and noisy backgrounds, a method combining a deep convolutional neural network (Deep Convolutional Neural Network) with a super-resolution generative adversarial network (SRGANs, Super-Resolution Generative Adversarial Network), reinforcement learning (Reinforcement Learning), the backpropagation algorithm (Backpropagation) and the dropout algorithm (Dropout). These effectively improve the effect, accuracy and reliability of facial expression recognition.
The feature points are 68 in number and form the feature-point distribution of the face. A set of measurement indices is provided, comprising six states: traceable (Traceable), slight (Slight), medium (Medium), marked (Marked), great (Great) and maximum (Maximum).
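The six-state measurement index can be illustrated with a small scoring helper that maps a normalized action-unit intensity to a state label. The cut points below are illustrative assumptions, not values specified by the patent.

```python
# Map a normalized action-unit intensity in [0, 1] to the six states of the
# measurement index. The thresholds are hypothetical, for illustration only.
STATES = ["Traceable", "Slight", "Medium", "Marked", "Great", "Maximum"]
CUTS = [0.05, 0.20, 0.40, 0.60, 0.80]  # hypothetical thresholds

def intensity_state(x):
    if not 0.0 <= x <= 1.0:
        raise ValueError("intensity must be normalized to [0, 1]")
    for cut, state in zip(CUTS, STATES):
        if x <= cut:
            return state
    return STATES[-1]

print(intensity_state(0.03), intensity_state(0.5), intensity_state(0.95))
# Traceable Marked Maximum
```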
Facial expression recognition must overcome various kinds of interference, such as light and shade of illumination, motion state, face position, multiple people and background noise. For each of the above complex face-recognition processes, the features and the classifier are optimized jointly rather than separately, giving full play to the cooperative function of feature representation and classifier combination, and the linear mapping of traditional artificial intelligence (input layer, hidden layer and output layer, three layers or more in total) is combined with the nonlinear mapping of deep neural networks (input layer, hidden layers and output layer, at least 20 layers in total).
Under illumination, blur and shadow conditions, the feature-point extraction of a traditional RVM (relevance vector machine) is combined with the feature classification of a Faster R-CNN neural network, improving the accuracy of face emotion recognition under these conditions. Deep learning then uses mathematical models with powerful learning ability and efficient feature representation (Representation) to extract facial-expression information layer by layer, from pixel-level raw data through edges, textures and parts up to the whole face.
Under conditions such as continuous motion, incomplete display and noisy environments, deep learning methods are used, including Deep CNN deep convolutional neural networks, Bayes CNN Bayesian convolutional neural networks, SRGANs super-resolution adversarial generative networks and RL reinforcement learning, combining two or more of these methods to achieve local feature-point acquisition, global representation ability, and interference rejection and decision-making during face-recognition classification, thereby improving the accuracy of face emotion recognition for real-time embedded video capture under mobile Internet and Internet-of-Things environments.
Interference can also be processed during face recognition: in deep convolutional neural networks of more than 20 layers, the advantages of multilayer networks in abstraction, clustering and representation are exploited to strip off various complex non-facial intrinsic features while ensuring efficient face processing during video capture on mobile embedded devices. A super-resolution adversarial network is combined: through adversarial training with positive and negative examples, it computes the adversarial loss of positive and negative face images together with the weighted sum of the loss between the reconstructed high-resolution image and the original high-resolution image, yielding facial expression recognition under the above complex conditions. The backpropagation algorithm is used during deep-network learning to check the face-recognition error, and stochastic gradient descent (Stochastic Gradient Descent) updates the hidden-to-hidden and hidden-to-output weights, promoting global optimization during learning and forming a prediction-based facial expression recognition method for continuously captured video pictures.
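The stochastic-gradient-descent update of the hidden-to-hidden and hidden-to-output weights via backpropagation can be sketched on a toy two-layer network. The dimensions, activations and learning rate here are illustrative assumptions; a real deep network would have many more layers and be trained on face data.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(32, 16))                     # a mini-batch of 32 feature vectors
y = (X[:, 0] > 0).astype(float).reshape(-1, 1)    # toy binary target

W1 = rng.normal(scale=0.1, size=(16, 8))          # input -> hidden weights
W2 = rng.normal(scale=0.1, size=(8, 1))           # hidden -> output weights
lr = 0.5

def forward(X):
    h = np.tanh(X @ W1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2)))           # sigmoid output
    return h, p

def mse(p):                                       # mean squared error on the batch
    return float(np.mean((p - y) ** 2))

_, p = forward(X)
loss_before = mse(p)
for _ in range(200):                              # SGD steps on the mini-batch
    h, p = forward(X)
    # Backpropagate the error from the output layer to the hidden layer.
    d_out = (p - y) * p * (1 - p)                 # dLoss/d(pre-sigmoid), up to a constant
    gW2 = h.T @ d_out / len(X)
    d_hid = (d_out @ W2.T) * (1 - h ** 2)         # tanh derivative
    gW1 = X.T @ d_hid / len(X)
    W2 -= lr * gW2                                # hidden -> output update
    W1 -= lr * gW1                                # input -> hidden update
_, p = forward(X)
loss_after = mse(p)
print(loss_after < loss_before)  # True
```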
As shown in Fig. 2, under illumination, reflection or shadow conditions a method combining a fast region convolutional neural network (Faster R-CNN) with a traditional RVM relevance classifier is used. Face detection quickly locates the face region in the video camera feed and obtains the facial feature-point information; the light source and lighting environment are analyzed; illumination elimination, absorption and compensation are carried out; an illumination-processing model is established (including diffuse reflection, specular reflection and Lambertian illumination models); test samples are generated according to the model; preprocessing is then performed with size normalization and Retinex normalization; non-facial features are stripped off; Faster R-CNN face-image training is carried out; local facial features are acquired and concatenated; and finally expression recognition is performed by the RVM relevance classifier.
Specifically, by quantizing illumination intensity and direction, reflection and shadow illumination, a deterministic illumination mode is adopted: the illumination system is divided into light source, face and lighting environment; the global ambient light intensity, the observation-point position, and the front and back illumination of the face and background are determined in the illumination model; the specular color is computed and separated; and illumination elimination, reflection absorption and illumination compensation are applied to illumination, reflection and shadow, removing the negative effects of face recognition under these complex lighting conditions. The illumination model applies one of the diffuse reflection model of ambient light (Diffuse Reflectivity Model), the specular reflection model of skin (Mirror Reflectivity Model) and the Lambertian illumination model based on material and texture (Lambert Illumination Model) to the face, generating various illuminated training and test samples. The face image is abstractly preprocessed by combining size normalization and illumination normalization. The size-normalization algorithm balances the distribution of invariant features such as face area, perimeter and facial-part positions within a given transformation, rotating and scaling the image to obtain a standard face image meeting the requirements. The illumination-normalization algorithm, based on Retinex imaging theory, establishes a multiresolution fast algorithm based on the total variation model to compute the illumination component of the image and then the reflectance-coefficient image carrying the texture; finally the low-frequency illumination and the high-frequency reflectance coefficients are separated to obtain an accurate face picture. Through the above, non-facial intrinsic attributes are stripped from the face. The segmented face-region image is fed into the trained fast region convolutional neural network to quickly obtain the features of each facial part; the features of all face regions are concatenated, and a classifier trained by relevance vector machine (RVM, Relevance Vector Machine) supervised learning classifies the expression, forming face emotion recognition (Facial Emotion Recognition) under illumination, reflection or shadow.
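A minimal single-scale Retinex normalization, in the spirit of the illumination/reflectance separation described above, can be sketched as follows: the low-frequency illumination is estimated with a Gaussian blur and the log-reflectance is kept. The kernel size, sigma and rescaling are illustrative assumptions; the patent's actual algorithm is a multiresolution total-variation method.

```python
import numpy as np

def gaussian_kernel(sigma, radius):
    x = np.arange(-radius, radius + 1, dtype=float)
    k = np.exp(-(x ** 2) / (2 * sigma ** 2))
    return k / k.sum()

def blur(img, sigma=3.0):
    """Separable Gaussian blur with edge padding (low-frequency illumination estimate)."""
    k = gaussian_kernel(sigma, radius=3 * int(sigma))
    pad = len(k) // 2
    out = np.pad(img, pad, mode="edge").astype(float)
    out = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, out)
    out = np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, out)
    return out

def retinex(img, eps=1.0):
    """Single-scale Retinex: log image minus log of its smoothed illumination."""
    img = img.astype(float)
    r = np.log(img + eps) - np.log(blur(img) + eps)
    # Rescale the reflectance to [0, 1] for display or training.
    return (r - r.min()) / (r.max() - r.min() + 1e-12)

rng = np.random.default_rng(0)
face = rng.integers(0, 256, size=(48, 48)).astype(float)
gradient = np.linspace(50, 200, 48)[None, :]  # simulated uneven lighting across the face
norm = retinex(face + gradient)
print(norm.shape, float(norm.min()) >= 0.0, float(norm.max()) <= 1.0)
# (48, 48) True True
```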
Training with positive and negative examples exploits the adversarial network's strengths in generating interference pictures and discriminating them: the generator (Generator) inputs face pictures with interference patterns, the discriminator (Discriminator) verifies whether the generator's face pictures are real or fake, and pictures judged to be real faces are forwarded by the feedback processor (Feedback Processor) to the deep convolutional neural network for facial-expression processing. For the loss between the reconstructed high-resolution image and the original high-resolution image, the feature values and classes at the original resolution are mainly analyzed, and the image is reconstructed using global and contextual information. The backpropagation algorithm is used during deep-network learning to check the face-recognition error and update the hidden-to-hidden and hidden-to-output weights, promoting global optimization during learning and forming a prediction-based facial expression recognition method for sequentially captured video pictures.
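The weighted sum of reconstruction loss and adversarial loss described above can be sketched as follows. The choice of mean squared error for the content term, binary cross-entropy for the adversarial term, and the weight value are assumptions in the spirit of SRGAN-style training, not values given by the patent.

```python
import numpy as np

def bce(pred, target, eps=1e-7):
    """Binary cross-entropy for discriminator scores in (0, 1)."""
    pred = np.clip(pred, eps, 1 - eps)
    return float(-np.mean(target * np.log(pred) + (1 - target) * np.log(1 - pred)))

def mse(a, b):
    return float(np.mean((a - b) ** 2))

def generator_loss(d_score_fake, sr_image, hr_image, adv_weight=1e-3):
    """Weighted sum: reconstruction (content) loss + adv_weight * adversarial loss.

    d_score_fake : discriminator output on the reconstructed image
    sr_image     : reconstructed high-resolution image
    hr_image     : original high-resolution image
    """
    adversarial = bce(d_score_fake, np.ones_like(d_score_fake))  # fool D into "real"
    reconstruction = mse(sr_image, hr_image)
    return reconstruction + adv_weight * adversarial

rng = np.random.default_rng(0)
hr = rng.random((16, 16))
sr = hr + 0.05 * rng.normal(size=(16, 16))  # imperfect reconstruction
d_fake = np.array([0.3])                    # discriminator not yet fooled
loss = generator_loss(d_fake, sr, hr)
print(loss > 0.0)  # True
```

A perfect reconstruction that also fools the discriminator drives both terms, and hence the weighted sum, toward zero.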
Under the complex conditions of movement, shaking and blur, based on the classification features under static conditions and as shown in Fig. 3, unsupervised deep learning is carried out with a Bayesian convolutional neural network (Bayes CNN): from the face pictures continuously captured from video, facial features are extracted and automatically classified by a Bayesian classifier, which also addresses data-fitting problems. Under general distributions and ordinary dimensionality, the EM expectation-maximization (Expectation Maximization) algorithm computes the statistical expectation, and the maximum likelihood estimate (Maximum Likelihood Estimation) is computed from the existing estimates of the hidden variables of the face parameters; on the basis of expectation maximization, the face-parameter values are computed by maximum a posteriori estimation (Maximum A Posteriori). Through data mining, descriptive cluster analysis groups the data objects automatically into several classes by similarity, spatial distance or density, for example by spatial-distance methods over those facial parts whose spatial positions are shape-invariant. Descriptive clustering mainly classifies the general properties of the facial feature data, aiming to find the similarities, regularities and patterns among the variables of the face-recognition data set within each class, for example judging combinations of action units in facial regions of interest to form new features and classes, and thereby finer emotions. A naive Bayes classifier (Naive Bayes) then reasons over the above class and numeric data, classifying and subdividing the eight emotions of the face, such as happiness, anger, sorrow and joy.

Under movement, shaking and blur, video object tracking, positioning and recognition must be carried out for the dynamic face object, which involves solving high-dimensional conditional distributions. Under these conditions the picture-convolution integral of the posterior probability cannot be computed directly, so Markov chain Monte Carlo methods (MCMC) are used. MCMC is a set of effective methods for progressively generating face and emotion picture samples from the posterior distribution; by averaging the face samples continuously acquired under the above dynamic conditions, inferences about the model parameters are obtained, the approximate likelihood function is solved, and the optimization problem of face emotion recognition under the above conditions is addressed. When the parameters computed in real time become extremely large, for example with dynamic face capture at 60 frames per second, the picture computation of one device over a 24-hour day amounts to 60*60*60*24 = 5,184,000 pictures (compared with the roughly 3,000 photos an iPhone stores on average), so variational Bayesian methods are used to process the video-picture data. The variational Bayesian method is a technique for approximately solving the intractable integrals of Bayesian inference and machine learning through functional approximation; it is generally used for complex statistical models under real-time video big-data conditions and can be regarded as an extension of the EM algorithm. It introduces latent variables on top of the observable variables and parameters and approximates by optimization, thereby solving the problem that multiple probability distributions cannot be computed under mobile-video dynamic-frame environments, and the problem that the large matrix inversions of picture convolutions are intractable. When backtracking over the above video-picture data with the Backpropagation algorithm, to prevent over-fitting the Dropout algorithm randomly removes neurons of the hidden and visible layers; the Bayes CNN network is trained over multiple batches and the average prediction rate is obtained, so as to induce (or generalize) a reliable and stable Bayesian model.

In this process, reinforcement learning (Reinforcement Learning) is also used to learn, without supervision, an intelligent mapping from environment to behavior that maximizes the reward-signal value, increasing the reliability of the Bayesian network. Reinforcement learning in the embedded environment mainly includes analyzing the strategy process of the policy network (Policy Network), the reward-and-punishment evaluation of the value network, and the algorithm recommendation of the model network; it searches for optimal decision behaviors and analyzes the classification of video face pictures under movement, shaking and blur, for example judging a shaking pattern by analyzing the frame difference between successive frames, or a blurred state by judging continuous time series. It intelligently connects face pictures, strips off the interference features that affect face and expression recognition, achieves accurate and reliable facial expression recognition under movement, shaking and blur, characterizes the environment with the model, and performs expression recognition on the moving or continuous faces captured by ARM mobile embedded devices.
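The naive-Bayes classification of emotion classes from numeric face features can be illustrated with a minimal Gaussian naive Bayes. Everything here is an illustrative assumption: the features are synthetic 2-D "action unit" values and only three of the eight emotion classes are shown, for brevity.

```python
import numpy as np

class GaussianNB:
    """Minimal Gaussian naive Bayes: per-class feature means/variances plus priors."""
    def fit(self, X, y):
        self.classes = np.unique(y)
        self.mu = np.stack([X[y == c].mean(axis=0) for c in self.classes])
        self.var = np.stack([X[y == c].var(axis=0) + 1e-6 for c in self.classes])
        self.logprior = np.log(np.array([np.mean(y == c) for c in self.classes]))
        return self

    def predict(self, X):
        # log p(c|x) is proportional to log p(c) + sum_d log N(x_d; mu_cd, var_cd)
        ll = -0.5 * (np.log(2 * np.pi * self.var)[None]
                     + (X[:, None, :] - self.mu[None]) ** 2 / self.var[None]).sum(-1)
        return self.classes[np.argmax(ll + self.logprior[None], axis=1)]

rng = np.random.default_rng(0)
# Synthetic 2-D "action unit" features for three emotion classes.
means = {"joy": (2.0, 0.0), "anger": (-2.0, 0.0), "sorrow": (0.0, 2.5)}
X = np.vstack([rng.normal(m, 0.5, size=(50, 2)) for m in means.values()])
y = np.array([c for c in means for _ in range(50)])

clf = GaussianNB().fit(X, y)
acc = float(np.mean(clf.predict(X) == y))
print(acc > 0.95)  # True
```

With real action-unit features, the same machinery would classify and then subdivide all eight emotions.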
Under incomplete face display, multiple faces and noisy backgrounds, to ensure reliable expression classification when processing incompletely displayed faces, and as shown in Fig. 4, a method combining a Deep CNN deep convolutional neural network, SRGANs super-resolution adversarial generative networks and RL reinforcement learning is employed. The moderately sparse features of the upper-layer neuron responses of the deep network are used to analyze the attributes of the face: once a face is detected and recognized, facial features such as gray values or edge values are obtained that keep within-class variation minimal while preserving between-class variation, giving strong selectivity for the attributes of faces and expressions. For a partially occluded face, this method can still accurately identify the face through dynamic features and is highly robust; the face to be recognized is judged through the global information of the face and the contextual information of successive video frames, and facial shapes from the face system are called on to complete the incompletely displayed face, splicing the feature descriptions and the global expression to obtain the complete data of the face to be recognized. With multiple faces, the face-capture system tracks the multiple faces from their motion and, through the residence time over continuously captured video frames, judges the importance and precedence of the faces, classifying the machine expression of each face. Under noisy backgrounds, the scale-invariant feature transform (SIFT, Scale Invariant Feature Transformation) algorithm detects and locates background objects unrelated to the face, such as windows, chairs and landscape pictures, and strips them from the face picture to be recognized, so that only the face and expression to be tracked and recognized are displayed. During this process the Backpropagation algorithm (introduced above) is used: when the output layer cannot obtain data, the weights of each neuron are modified by backtracking along stochastic gradient descent, average test values are obtained, and the Deep CNN convolution model is optimized. The SRGANs super-resolution adversarial network computes, through adversarial training with positive and negative examples, the adversarial loss of positive and negative face images together with the weighted sum of the loss between the reconstructed high-resolution image and the original high-resolution image, obtaining facial expression recognition under the above complex conditions. Adversarial training mainly comprises a generator, a discriminator and a feedback processor, where the generator outputs interference face pictures and the discriminator judges whether face pictures are real or fake; reconstruction training trains on paired samples of the original high-resolution image and its down-sampled low-resolution version, and through the Euclidean distance of the feature mapping between the reconstructed high-resolution image and the original high-resolution image, reconstructs a high-definition picture from the low-resolution picture and feeds it back to the Deep CNN for face and emotion recognition under the above complex environment.
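Judging the importance and precedence of multiple faces by their residence time over continuously captured frames can be sketched as a simple counter over per-frame track IDs. The track-ID representation and the tie-breaking rule are assumptions for illustration.

```python
from collections import Counter

def rank_faces_by_residence(frames):
    """frames: list of sets of face track-IDs visible in each captured frame.
    Returns track-IDs sorted by residence time (frame count), longest first."""
    dwell = Counter()
    for visible in frames:
        dwell.update(visible)
    # Longest-dwelling faces first; ties broken by ID for determinism.
    return [face for face, _ in sorted(dwell.items(), key=lambda kv: (-kv[1], kv[0]))]

# Hypothetical capture: face "A" stays in view, "B" passes through, "C" flickers.
frames = [{"A", "B"}, {"A", "B"}, {"A"}, {"A", "C"}, {"A"}]
print(rank_faces_by_residence(frames))  # ['A', 'B', 'C']
```

The highest-ranked faces would then be processed first by the expression classifier.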
The present invention combines the methodologies of image and pattern recognition, deep neural networks and Bayesian networks to create a method of face emotion recognition for mobile embedded complex environments. Combined with C-based embedded software, industry application software based on a three-tier architecture, image and pattern recognition, deep neural networks, complex mathematical models, big-data applications and other high technologies, it has high market value in fields such as robotics, the Internet of Things, the Internet, and the four-dimensional environments of virtual and augmented reality.
The present invention combines the simplicity and directness of traditional image and pattern recognition with the efficient classification and reasoning of neural networks, and uses super-resolution adversarial neural networks, reinforcement learning and the backpropagation algorithm to improve the synchronous stability of video-frame operation and facial expression recognition in mobile embedded environments under the various operating conditions of shallow and deep learning. It achieves real-time facial expression processing under movement, blur, shaking, shadow, illumination changes, incomplete face display, multiple faces and noisy backgrounds, reaching high accuracy, efficiency and reliability.
What is disclosed above is only several specific embodiments of the present invention, but the present invention is not limited thereto; any changes conceivable by those skilled in the art shall fall within the protection scope of the present invention.
Claims (10)
- 1. A method of face emotion recognition under complex environments, characterized in that the method divides the face into the main regions of forehead, eyes, cheeks, nose, mouth and chin, further subdivided into 68 feature points; for these feature points, under normal conditions a method of facial-part and expression-feature classification is used; under illumination, reflection and shadow conditions a Faster R-CNN method based on facial-region convolutional neural networks is used; under complex motion, shake and sway conditions a method combining a Bayesian network, Markov chains and variational inference is used; and under incomplete face display, multiple faces and noisy backgrounds a method combining deep convolutional neural networks with the super-resolution adversarial neural network SRGANs, reinforcement learning, the backpropagation algorithm and the dropout algorithm is used, effectively improving the effect, accuracy and reliability of facial expression recognition.
- 2. The method of face emotion recognition under complex environments according to claim 1, characterized in that the feature points are 68 in number and form a uniform feature-point distribution of the face, and a set of measurement indices is provided, the indices comprising six states: traceable (Traceable), slight (Slight), medium (Medium), marked (Marked), great (Great) and maximum (Maximum).
- 3. The method of face emotion recognition under complex environments according to claim 2, characterized in that for the above complex face-recognition processes, the features and the classifier are jointly optimized to give full play to the cooperative function of feature representation and classifier combination, and the linear mapping of the input, hidden and output layers is combined with the nonlinear mapping of deep neural networks; under illumination, blur and shadow conditions, the feature-point extraction of a traditional RVM relevance vector machine is combined with the feature classification of a Faster R-CNN neural network, improving the accuracy of face emotion recognition under these conditions.
- 4. The method of face emotion recognition under complex environments according to claim 3, characterized in that interference is also processed during face recognition: in deep convolutional neural networks of more than 20 layers, the advantages of multilayer networks in abstraction, clustering and representation are used to strip off various complex non-facial intrinsic features while ensuring efficient face processing during video capture on mobile embedded devices; combined with the super-resolution adversarial network, adversarial training with positive and negative examples computes the adversarial loss of positive and negative face images together with the weighted sum of the loss between the reconstructed high-resolution image and the original high-resolution image, obtaining facial expression recognition under the above complex conditions; the backpropagation algorithm is used during deep-network learning to check the face-recognition error, and stochastic gradient descent updates the hidden-to-hidden and hidden-to-output weights, promoting global optimization during learning.
- 5. The method of face emotion recognition under complex environments according to claim 4, characterized in that under illumination, reflection or shadow conditions a method combining a fast region convolutional neural network with traditional RVM classifiers is used: face detection quickly locates the face region in the video camera feed and obtains facial feature-point information; specifically, by quantizing illumination intensity and direction, reflection and shadow illumination, a deterministic illumination mode divides the illumination system into light source, face and lighting environment, determines the global ambient light intensity, the observation-point position, and the front and back illumination of the face and background in the illumination model, computes and separates the specular color, and applies illumination elimination, reflection absorption and illumination compensation to illumination, reflection and shadow, removing the negative effects of face recognition under these complex lighting conditions; the illumination model applies one of the diffuse reflection model of ambient light, the specular reflection model of skin and the Lambertian illumination model based on material and texture to the face, generating various illuminated training and test samples; size normalization and illumination normalization are combined to abstractly preprocess the face image; the above strips non-facial intrinsic attributes from the face; the segmented face-region image is fed into the trained fast region convolutional neural network to quickly obtain the features of each facial part; the features of all face regions are concatenated, and a classifier trained by relevance vector machine RVM supervised learning classifies the expression, forming face emotion recognition under illumination, reflection or shadow.
- 6. The method of face emotion recognition under complex environments according to claim 4, characterized in that under the complex conditions of movement, shaking and blur, based on the classification features under static conditions, unsupervised deep learning is carried out with a Bayesian convolutional neural network: from the face pictures continuously captured from video, facial features are extracted and automatically classified by a Bayesian classifier, which also addresses data-fitting problems.
- 7. The method of face emotion recognition under complex environments according to claim 6, characterized in that under general distributions and ordinary dimensionality, the EM expectation-maximization algorithm computes the statistical expectation, the maximum likelihood estimate is computed from the existing estimates of the hidden variables of the face parameters, and on the basis of expectation maximization the face-parameter values are computed by maximum a posteriori estimation; through data mining, descriptive cluster analysis groups the data objects automatically into several classes by similarity, spatial distance or density, and a naive Bayes classifier reasons over the above class and numeric data, classifying and subdividing the eight emotions of the face, such as happiness, anger, sorrow and joy.
- 8. The method of face emotion recognition under complex environments according to claim 7, characterized in that under movement, shaking and blur, video object tracking, positioning and recognition must be carried out for the dynamic face object; Markov chain Monte Carlo methods are used, and by averaging the face samples continuously acquired under the above dynamic conditions, inferences about the model parameters are obtained, the approximate likelihood function is solved, and the optimization problem of face emotion recognition under the above conditions is addressed.
- 9. The method of face emotion recognition under complex environments according to claim 4, characterized in that under incomplete face display, multiple faces and noisy backgrounds, to ensure reliable expression classification when processing incompletely displayed faces, a method combining a Deep CNN deep convolutional neural network, SRGANs super-resolution adversarial generative networks and RL reinforcement learning is employed; the moderately sparse features of the upper-layer neuron responses of the deep network analyze the attributes of the face, and once a face is detected and recognized, facial features such as gray values or edge values are obtained that keep within-class variation minimal while preserving between-class variation; the face to be recognized is judged through the global information of the face and the contextual information of successive video frames, and facial shapes from the face system are called on to complete the incompletely displayed face, splicing the feature descriptions and the global expression to obtain the complete data of the face to be recognized.
- 10. The method of face emotion recognition under complex environments according to claim 9, characterized in that with multiple faces, the face-capture system tracks the multiple faces from their motion and, through the residence time of continuously captured video frames, judges the importance and precedence of the faces, classifying the machine expression of each face; under noisy backgrounds the scale-invariant feature transform SIFT algorithm detects and locates background objects unrelated to the face and strips them from the face picture to be recognized; the Backpropagation algorithm is used so that, when the output layer cannot obtain data, the weights of each neuron are modified by backtracking along stochastic gradient descent, average test values are obtained, and the Deep CNN convolution model is optimized; the SRGANs super-resolution adversarial network computes, through adversarial training with positive and negative examples, the adversarial loss of positive and negative face images together with the weighted sum of the loss between the reconstructed high-resolution image and the original high-resolution image, obtaining facial expression recognition under the above complex conditions; adversarial training mainly comprises a generator, a discriminator and a feedback processor, where the generator outputs interference face pictures and the discriminator judges whether face pictures are real or fake; reconstruction training trains on paired samples of the original high-resolution image and its down-sampled low-resolution version, and through the Euclidean distance of the feature mapping between the reconstructed and original high-resolution images, reconstructs a high-definition picture from the low-resolution picture and feeds it back to the Deep CNN for face and emotion recognition under the above complex environment.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710612421.3A CN107423707A (en) | 2017-07-25 | 2017-07-25 | A kind of face Emotion identification method based under complex environment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107423707A true CN107423707A (en) | 2017-12-01 |
Family
ID=60430495
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710612421.3A Pending CN107423707A (en) | 2017-07-25 | 2017-07-25 | A kind of face Emotion identification method based under complex environment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107423707A (en) |
- 2017-07-25: CN application CN201710612421.3A (publication CN107423707A), status Pending
Cited By (79)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107909061B (en) * | 2017-12-07 | 2021-03-30 | 电子科技大学 | Head posture tracking device and method based on incomplete features |
CN107909061A (en) * | 2017-12-07 | 2018-04-13 | 电子科技大学 | Head pose tracking device and method based on incomplete features |
US20230080164A1 (en) * | 2017-12-20 | 2023-03-16 | Alpvision S.A. | Authentication machine learning from multiple digital presentations |
CN108230239A (en) * | 2017-12-25 | 2018-06-29 | 中国科学院自动化研究所 | Facial expression synthesis device |
CN110019732A (en) * | 2017-12-27 | 2019-07-16 | 杭州华为数字技术有限公司 | Intelligent question answering method and related apparatus |
CN108177146A (en) * | 2017-12-28 | 2018-06-19 | 北京奇虎科技有限公司 | Robot head control method, apparatus and computing device |
CN108280413A (en) * | 2018-01-17 | 2018-07-13 | 百度在线网络技术(北京)有限公司 | Face recognition method and device |
CN108280413B (en) * | 2018-01-17 | 2022-04-19 | 百度在线网络技术(北京)有限公司 | Face recognition method and device |
CN108447041A (en) * | 2018-01-30 | 2018-08-24 | 中国航天电子技术研究院 | Multi-source image fusion method based on reinforcement learning |
CN108447041B (en) * | 2018-01-30 | 2020-12-15 | 中国航天电子技术研究院 | Multi-source image fusion method based on reinforcement learning |
CN108446609A (en) * | 2018-03-02 | 2018-08-24 | 南京邮电大学 | Multi-angle facial expression recognition method based on generative adversarial networks |
CN108491816A (en) * | 2018-03-30 | 2018-09-04 | 百度在线网络技术(北京)有限公司 | Method and apparatus for target tracking in video |
CN108509940A (en) * | 2018-04-20 | 2018-09-07 | 北京达佳互联信息技术有限公司 | Face image tracking method, device, computer equipment and storage medium |
CN108509940B (en) * | 2018-04-20 | 2019-11-05 | 北京达佳互联信息技术有限公司 | Face image tracking method, device, computer equipment and storage medium |
CN111279682A (en) * | 2018-05-14 | 2020-06-12 | 合刃科技(武汉)有限公司 | Electronic device and shooting control method |
WO2019218111A1 (en) * | 2018-05-14 | 2019-11-21 | 合刃科技(武汉)有限公司 | Electronic device and photographing control method |
CN108765014A (en) * | 2018-05-30 | 2018-11-06 | 中海云智慧(北京)物联网科技有限公司 | Intelligent advertisement delivery method based on an access control system |
CN108961272A (en) * | 2018-07-02 | 2018-12-07 | 浙江工业大学 | Skin disease image generation method based on deep convolutional generative adversarial networks |
CN108985377A (en) * | 2018-07-18 | 2018-12-11 | 太原理工大学 | High-level image semantics recognition method based on multi-feature fusion with deep networks |
CN109086707A (en) * | 2018-07-25 | 2018-12-25 | 电子科技大学 | Expression tracking method based on a DCNNs-LSTM model |
CN109048934A (en) * | 2018-08-20 | 2018-12-21 | 深圳威琳懋生物科技有限公司 | Intelligent shopping-guide robot system |
CN108858245A (en) * | 2018-08-20 | 2018-11-23 | 深圳威琳懋生物科技有限公司 | Shopping-guide robot |
CN109145992B (en) * | 2018-08-27 | 2021-07-20 | 西安电子科技大学 | Hyperspectral image classification method for cooperatively generating countermeasure network and spatial spectrum combination |
CN109145992A (en) * | 2018-08-27 | 2019-01-04 | 西安电子科技大学 | Hyperspectral image classification method combining cooperative generative adversarial networks and joint spatial-spectral features |
CN109299669B (en) * | 2018-08-30 | 2020-11-13 | 清华大学 | Video face key point detection method and device based on double intelligent agents |
CN109299669A (en) * | 2018-08-30 | 2019-02-01 | 清华大学 | Video face key point detection method and device based on dual agents |
WO2020052170A1 (en) * | 2018-09-11 | 2020-03-19 | 深圳云天励飞技术有限公司 | Target object identification method and device, and storage medium |
CN109376625A (en) * | 2018-10-10 | 2019-02-22 | 东北大学 | Facial expression recognition method based on convolutional neural networks |
CN112840359A (en) * | 2018-10-12 | 2021-05-25 | 渊慧科技有限公司 | Controlling agents over long time scales using temporal value transport |
CN109493297A (en) * | 2018-11-01 | 2019-03-19 | 重庆中科云丛科技有限公司 | Low-quality face image enhancement method, system, device and storage medium |
CN109345770A (en) * | 2018-11-14 | 2019-02-15 | 深圳市尼欧科技有限公司 | In-vehicle alarm system and method for detecting a child left in a vehicle |
CN109598212A (en) * | 2018-11-20 | 2019-04-09 | 北京知道创宇信息技术有限公司 | Face detection method and device |
CN109614611B (en) * | 2018-11-28 | 2021-09-03 | 中山大学 | Emotion analysis method for fusion generation of non-antagonistic network and convolutional neural network |
CN109614611A (en) * | 2018-11-28 | 2019-04-12 | 中山大学 | Sentiment analysis method fusing non-adversarial generative networks and convolutional neural networks |
CN109508689A (en) * | 2018-11-28 | 2019-03-22 | 中山大学 | Expression recognition method with adversarial reinforcement |
CN109508689B (en) * | 2018-11-28 | 2023-01-03 | 中山大学 | Face recognition method for strengthening confrontation |
CN109635727A (en) * | 2018-12-11 | 2019-04-16 | 昆山优尼电能运动科技有限公司 | Facial expression recognition method and device |
CN109829362A (en) * | 2018-12-18 | 2019-05-31 | 深圳壹账通智能科技有限公司 | Security-check auxiliary analysis method, device, computer equipment and storage medium |
CN111368579B (en) * | 2018-12-25 | 2023-07-07 | 中国电信股份有限公司 | Image recognition method, apparatus and system, and computer-readable medium |
CN109829959B (en) * | 2018-12-25 | 2021-01-08 | 中国科学院自动化研究所 | Facial analysis-based expression editing method and device |
CN111368579A (en) * | 2018-12-25 | 2020-07-03 | 中国电信股份有限公司 | Image recognition method, apparatus and system, and computer-readable storage medium |
CN109508783B (en) * | 2018-12-28 | 2021-07-20 | 沛誉(武汉)科技有限公司 | Method for constructing rough emotion classifying model and automatically performing rough emotion acquisition |
CN109508783A (en) * | 2018-12-28 | 2019-03-22 | 杭州翼兔网络科技有限公司 | Method for constructing a rough emotion classification model and automatically performing rough emotion acquisition |
CN109711392A (en) * | 2019-01-24 | 2019-05-03 | 郑州市现代人才测评与考试研究院 | Talent assessment method based on face recognition |
CN109620269A (en) * | 2019-01-28 | 2019-04-16 | 深圳市赛梅斯凯科技有限公司 | Fatigue detection method, device, equipment and readable storage medium |
CN109620269B (en) * | 2019-01-28 | 2021-10-22 | 锦图计算技术(深圳)有限公司 | Fatigue detection method, device, equipment and readable storage medium |
CN109948441A (en) * | 2019-02-14 | 2019-06-28 | 北京奇艺世纪科技有限公司 | Model training, image processing method, device, electronic equipment and computer readable storage medium |
CN109934150A (en) * | 2019-03-07 | 2019-06-25 | 百度在线网络技术(北京)有限公司 | Meeting engagement recognition method, device, server and storage medium |
CN110059593B (en) * | 2019-04-01 | 2022-09-30 | 华侨大学 | Facial expression recognition method based on feedback convolutional neural network |
CN110059593A (en) * | 2019-04-01 | 2019-07-26 | 华侨大学 | Facial expression recognition method based on feedback convolutional neural networks |
CN110084143B (en) * | 2019-04-04 | 2021-09-28 | 广州大学 | Emotion information protection method and system for face recognition |
CN110084143A (en) * | 2019-04-04 | 2019-08-02 | 广州大学 | Emotion information protection method and system for face recognition |
CN110147822A (en) * | 2019-04-16 | 2019-08-20 | 北京师范大学 | Mood index calculation method based on facial action unit detection |
CN110287895B (en) * | 2019-04-17 | 2021-08-06 | 北京阳光易德科技股份有限公司 | Method for measuring emotion based on convolutional neural network |
CN110287895A (en) * | 2019-04-17 | 2019-09-27 | 北京阳光易德科技股份有限公司 | Method for emotion measurement based on convolutional neural networks |
CN110188604A (en) * | 2019-04-18 | 2019-08-30 | 盎锐(上海)信息科技有限公司 | Face recognition method and device based on 2D and 3D images |
CN110288677A (en) * | 2019-05-21 | 2019-09-27 | 北京大学 | Pedestrian image generation method and device based on deformable structures |
US11668684B2 (en) * | 2019-07-23 | 2023-06-06 | Landmark Graphics Corporation | Stochastic realization of parameter inversion in physics-based empirical models |
CN110569741A (en) * | 2019-08-19 | 2019-12-13 | 昆山琪奥智能科技有限公司 | Expression recognition system based on artificial intelligence |
TWI717008B (en) * | 2019-09-09 | 2021-01-21 | 訊連科技股份有限公司 | Method for establishing biometric database, method for facial recognition and system thereof |
CN110866962B (en) * | 2019-11-20 | 2023-06-16 | 成都威爱新经济技术研究院有限公司 | Virtual portrait and expression synchronization method based on convolutional neural network |
CN110866962A (en) * | 2019-11-20 | 2020-03-06 | 成都威爱新经济技术研究院有限公司 | Virtual portrait and expression synchronization method based on convolutional neural network |
CN111325243B (en) * | 2020-02-03 | 2023-06-16 | 天津大学 | Visual relationship detection method based on regional attention learning mechanism |
CN111325243A (en) * | 2020-02-03 | 2020-06-23 | 天津大学 | Visual relation detection method based on regional attention learning mechanism |
CN111539476A (en) * | 2020-04-24 | 2020-08-14 | 四川大学 | Observation point deployment method for information source positioning based on naive Bayes |
CN111753798A (en) * | 2020-07-03 | 2020-10-09 | 重庆智者炎麒科技有限公司 | Teaching auxiliary system based on image processing and method thereof |
CN111967526B (en) * | 2020-08-20 | 2023-09-22 | 东北大学秦皇岛分校 | Remote sensing image change detection method and system based on edge mapping and deep learning |
CN111967526A (en) * | 2020-08-20 | 2020-11-20 | 东北大学秦皇岛分校 | Remote sensing image change detection method and system based on edge mapping and deep learning |
CN112487811A (en) * | 2020-10-21 | 2021-03-12 | 上海旻浦科技有限公司 | Cascading information extraction system and method based on reinforcement learning |
CN113205504A (en) * | 2021-05-12 | 2021-08-03 | 青岛大学附属医院 | Artificial intelligence kidney tumor prediction system based on knowledge graph |
CN113674373B (en) * | 2021-07-02 | 2024-04-26 | 清华大学 | Real face rendering method based on deep learning |
CN113674373A (en) * | 2021-07-02 | 2021-11-19 | 清华大学 | Realistic face rendering method and device based on deep learning |
WO2023098912A1 (en) * | 2021-12-02 | 2023-06-08 | 新东方教育科技集团有限公司 | Image processing method and apparatus, storage medium, and electronic device |
CN115052193A (en) * | 2022-05-25 | 2022-09-13 | 天翼爱音乐文化科技有限公司 | Video recommendation method, system, device and storage medium |
CN115641236A (en) * | 2022-09-30 | 2023-01-24 | 广州宏途数字科技有限公司 | Campus intelligent security management system based on big data |
CN115690891B (en) * | 2022-12-30 | 2023-04-07 | 苏州魔视智能科技有限公司 | Face emotion recognition method, device, equipment and storage medium |
CN115690891A (en) * | 2022-12-30 | 2023-02-03 | 苏州魔视智能科技有限公司 | Face emotion recognition method, device, equipment and storage medium |
CN116186594A (en) * | 2023-04-26 | 2023-05-30 | 成都市环境应急指挥保障中心 | Intelligent environmental change trend detection method based on a decision network combined with big data |
CN117315749A (en) * | 2023-09-25 | 2023-12-29 | 惠州市沃生照明有限公司 | Intelligent light regulation and control method and system for desk lamp |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107423707A (en) | Face emotion recognition method for complex environments | |
Zahara et al. | The facial emotion recognition (FER-2013) dataset for prediction system of micro-expressions face using the convolutional neural network (CNN) algorithm based Raspberry Pi | |
Li et al. | Deep reinforcement learning for robust emotional classification in facial expression recognition | |
Zhou et al. | A lightweight convolutional neural network for real-time facial expression detection | |
Suzuki et al. | Recognition and mapping of facial expressions to avatar by embedded photo reflective sensors in head mounted display | |
Peng et al. | Towards facial expression recognition in the wild: A new database and deep recognition system | |
Tian et al. | Ear recognition based on deep convolutional network | |
Pathar et al. | Human emotion recognition using convolutional neural network in real time | |
Singh et al. | Deep learning and machine learning based facial emotion detection using CNN | |
Ali et al. | Facial emotion detection using neural network | |
CN109815920A (en) | Gesture recognition method based on convolutional neural networks and adversarial convolutional neural networks | |
Zhang et al. | A survey on face anti-spoofing algorithms | |
Rusia et al. | An efficient CNN approach for facial expression recognition with some measures of overfitting | |
Patil et al. | Performance analysis of static hand gesture recognition approaches using artificial neural network, support vector machine and two stream based transfer learning approach | |
Shanthi et al. | Algorithms for face recognition drones | |
Guo et al. | Dynamic facial expression recognition based on ResNet and LSTM | |
Tunc et al. | Age group and gender classification using convolutional neural networks with a fuzzy logic-based filter method for noise reduction | |
Moran | Classifying emotion using convolutional neural networks | |
Hua et al. | Collaborative Generative Adversarial Network with Visual perception and memory reasoning | |
ALISAWI et al. | Real-Time Emotion Recognition Using Deep Learning Methods: Systematic Review | |
Kousalya et al. | Prediction of Best Optimizer for Facial Expression Detection using Convolutional Neural Network | |
Sultana et al. | A Deep CNN based Kaggle Contest Winning Model to Recognize Real-Time Facial Expression | |
Tiwari et al. | Personality prediction from Five-Factor Facial Traits using Deep learning | |
Wang | Facial affect detection using convolutional neural networks | |
Andrian et al. | Comparative analysis of deep convolutional neural networks architecture in facial expression recognition: A survey |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20171201 |
|