CN109508625A - Analysis method and device for affective data - Google Patents
Analysis method and device for affective data
- Publication number: CN109508625A
- Application number: CN201811046253.7A
- Authority: CN (China)
- Prior art keywords: data, skeleton, classifier, characteristic, trained
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Classifications
- G—PHYSICS › G06—COMPUTING; CALCULATING OR COUNTING
- G06V40/172—Human faces: classification, e.g. identification
- G06F18/24—Pattern recognition: classification techniques
- G06V40/174—Facial expression recognition
Abstract
The invention discloses an analysis method for affective data, the method comprising: performing feature extraction on sample data of a group photo image to obtain the facial feature data, environmental feature data and human skeleton feature data in the sample data; performing feature fusion on the facial feature data, the environmental feature data and/or the human skeleton feature data to obtain fused feature data distinct from the facial feature data, the environmental feature data and/or the human skeleton feature data; and, using the trained face classifier, the trained environment classifier, the trained skeleton classifier and the trained fusion classifier, respectively performing image recognition on target data of a group photo image and determining the emotion attribute corresponding to the target data according to the recognition results. The invention also provides an analysis device for affective data.
Description
Technical field
The present invention relates to data analysis technology, and in particular to an analysis method and device for affective data.
Background technique
The prior art can only perform emotion recognition on a picture containing a single person; it cannot perform emotion recognition on a group photo containing multiple people.
Summary of the invention
To solve the above technical problem, embodiments of the present invention provide an analysis method and device for affective data.
The technical solution of the embodiments of the present invention is achieved as follows:
According to one aspect of the embodiments of the present invention, an analysis method for affective data is provided, the method comprising:

performing feature extraction on sample data of a group photo image to obtain the facial feature data, environmental feature data and human skeleton feature data in the sample data;

performing feature fusion on the facial feature data, the environmental feature data and/or the human skeleton feature data to obtain fused feature data distinct from the facial feature data, the environmental feature data and/or the human skeleton feature data;

training a face classifier based on the facial feature data to obtain a trained face classifier; training an environment classifier based on the environmental feature data to obtain a trained environment classifier; training a skeleton classifier based on the human skeleton feature data to obtain a trained skeleton classifier; training a fusion classifier based on the fused feature data to obtain a trained fusion classifier;

using the trained face classifier, the trained environment classifier, the trained skeleton classifier and the trained fusion classifier, respectively performing image recognition on target data of a group photo image, and determining the emotion attribute corresponding to the target data according to the recognition results.
In the above scheme, performing feature fusion on the facial feature data, the environmental feature data and/or the human skeleton feature data to obtain the fused feature data distinct from the facial feature data, the environmental feature data and/or the human skeleton feature data comprises:

using a nonlinear mapping function to map the facial feature data, the environmental feature data and/or the human skeleton feature data into the same high-dimensional space, obtaining new feature data distinct from the facial feature data, the environmental feature data and/or the human skeleton feature data;

taking the new feature data as the fused feature data.
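As an illustration only (the patent does not specify the mapping function), the fusion step can be sketched by passing each feature vector through a nonlinear mapping into a shared space and concatenating the results; the `tanh` nonlinearity and the weight layout here are assumptions, not part of the patent:

```python
import math

def nonlinear_map(features, weights, bias):
    """Map one feature vector into the shared space; `weights` is a list
    of rows, one per output dimension, and tanh supplies the nonlinearity."""
    return [math.tanh(sum(w * x for w, x in zip(row, features)) + bias)
            for row in weights]

def fuse(face, env, skeleton, maps):
    """Concatenate the nonlinearly mapped facial, environmental and skeleton
    features into one fused vector distinct from any single input."""
    fused = []
    for feats, (weights, bias) in zip((face, env, skeleton), maps):
        fused.extend(nonlinear_map(feats, weights, bias))
    return fused
```

In practice each modality would have its own learned weights; the sketch merely illustrates that the fused data is a new vector rather than any one of its inputs.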
In the above scheme, the method further comprises:

performing emotion classification on the sample data based on the facial feature data;
attaching an emotion label to the sample data corresponding to each emotion category in the classification results.

In the above scheme, the method further comprises:

performing environment classification on the sample data based on the environmental feature data;
attaching an environment label to the sample data corresponding to each environment category in the classification results.
In the above scheme, the method further comprises:

calculating, over the recognition results, the sum of the emotion attribute probabilities for each emotion attribute value characterizing an emotion label; or calculating, over the recognition results, the sum of the environment attribute probabilities for each environment attribute value characterizing an environment label;

determining the emotion attribute value whose summed emotion attribute probability is largest, and taking the emotion label corresponding to that emotion attribute value as the emotion attribute of the target data;

alternatively, determining the environment attribute value whose summed environment attribute probability is largest, and taking the environment label corresponding to that environment attribute value as the emotion attribute of the target data.
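A minimal sketch of the probability-sum decision described above (the patent gives no implementation; names are illustrative): each trained classifier reports a probability per label, the probabilities are summed per label, and the label with the largest sum becomes the emotion attribute of the target data:

```python
def fuse_predictions(classifier_outputs):
    """Sum, per label, the probabilities reported by each classifier and
    return the label whose summed probability is largest."""
    totals = {}
    for probs in classifier_outputs:  # one {label: probability} dict per classifier
        for label, p in probs.items():
            totals[label] = totals.get(label, 0.0) + p
    return max(totals, key=totals.get)
```

For example, if the face, environment and skeleton classifiers report 0.7, 0.4 and 0.9 for "positive", the summed probability 2.0 outweighs the complementary "negative" sum and "positive" is selected.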
In the above scheme, training the face classifier based on the facial feature data to obtain the trained face classifier comprises:

determining, based on the facial feature data, the face region corresponding to each group photo image in the sample data; and performing network fine-tuning on the face classifier using the sample data set corresponding to the face regions, obtaining the fine-tuned face classifier.

In the above scheme, training the environment classifier based on the environmental feature data to obtain the trained environment classifier comprises:

determining, based on the environmental feature data, the environment region corresponding to each group photo image in the sample data; and performing network fine-tuning on the environment classifier using the sample data set corresponding to the environment regions, obtaining the fine-tuned environment classifier.

In the above scheme, training the skeleton classifier based on the human skeleton feature data to obtain the trained skeleton classifier comprises:

determining, based on the human skeleton feature data, the human skeleton key points corresponding to each group photo image in the sample data; and performing network fine-tuning on the skeleton classifier using the sample data set corresponding to the skeleton key points, obtaining the fine-tuned skeleton classifier.
According to another aspect of the embodiments of the present invention, an analysis device for affective data is provided, the device comprising:

an extraction unit, configured to perform feature extraction on sample data of a group photo image to obtain the facial feature data, environmental feature data and human skeleton feature data in the sample data;

a fusion unit, configured to perform feature fusion on the facial feature data, the environmental feature data and/or the human skeleton feature data to obtain fused feature data distinct from the facial feature data, the environmental feature data and/or the human skeleton feature data;

a training unit, configured to train a face classifier based on the facial feature data to obtain a trained face classifier; train an environment classifier based on the environmental feature data to obtain a trained environment classifier; train a skeleton classifier based on the human skeleton feature data to obtain a trained skeleton classifier; and train a fusion classifier based on the fused feature data to obtain a trained fusion classifier;

a recognition unit, configured to use the trained face classifier, the trained environment classifier, the trained skeleton classifier and the trained fusion classifier to respectively perform image recognition on target data of a group photo image, and determine the emotion attribute corresponding to the target data according to the recognition results.
According to a third aspect of the embodiments of the present invention, an analysis device for affective data is provided, the device comprising: a memory, a processor, and an executable program stored in the memory and run by the processor, wherein the processor, when running the executable program, executes the steps of the analysis method for affective data described in any one of the above.
In the technical solution of the embodiments of the present invention, an analysis method and device for affective data are provided. By extracting the facial feature data, environmental feature data and human skeleton feature data in the sample data, emotion-intensity analysis is performed on the crowd in a group photo image; and by performing feature fusion on the facial feature data and the environmental feature data, fused feature data distinct from the facial feature data, the environmental feature data and/or the human skeleton feature data are obtained, so that the overall emotional intensity of the multiple people in the group photo image can be obtained, thereby improving emotion recognition for group photo images.
Brief description of the drawings
Fig. 1 is a flow diagram of the analysis method for affective data in an embodiment of the present invention;
Fig. 2 is a flow diagram of the cascaded convolutional neural network;
Fig. 3 is a schematic diagram of the basic fine-tuning procedure;
Fig. 4 is a schematic diagram of the facial expression recognition principle based on fine-tuning the VGG-FACE model;
Fig. 5 is a schematic diagram of the network structure of the VGG-FACE model;
Fig. 6 is a schematic diagram of the convolution operation in a neural network;
Fig. 7 is a schematic diagram of the max-pooling operation;
Fig. 8 is a schematic diagram of the structure of the Inception module;
Fig. 9 is a first schematic diagram of the structure of the analysis device for affective data in an embodiment of the present invention;
Fig. 10 is a second schematic diagram of the structure of the analysis device for affective data.
Specific embodiments
Preferred embodiments are described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here are only intended to illustrate and explain the present invention, and are not intended to limit it.
Fig. 1 is a flow diagram of the analysis method for affective data in an embodiment of the present invention. As shown in Fig. 1, the method includes:
Step 101: perform feature extraction on the sample data of a group photo image to obtain the facial feature data, environmental feature data and human skeleton feature data in the sample data.

Here, a group photo image may specifically be an image that includes at least two people.

Specifically, this method is mainly applied to an image processing terminal with image processing functions; the image processing terminal may be a mobile phone, a computer, a monitoring device, etc.

First, before performing feature extraction on the sample data of group photo images, the image processing terminal obtains from the network, through a web crawler, a large number of group photo images each containing at least two people; the number of group photo images obtained may exceed 100,000.

Then, the image processing terminal screens the large number of group photo images obtained. Specifically, the image processing terminal deletes the invalid images among the group photos and retains only the valid group photo images, which are saved as the sample data of group photo images for subsequent training of the image models.

Here, an invalid image may be an image in which no face is present or the faces are unclear.

In this application, after the image processing terminal obtains the valid group photo images, it performs feature extraction on the sample data of the group photo images to obtain the facial feature data, environmental feature data and human skeleton feature data in the sample data.
Specifically, when performing facial feature extraction on the sample data of a group photo image, the image processing terminal may perform face detection on the group photo image corresponding to the sample data using face recognition technology, and determine the face positions from the detection results. Then, the face pictures in the group photo image are cropped based on the face positions, and the cropped face pictures are fed into a face recognition model to obtain the facial feature data of the group photo image.

Here, the image processing terminal may specifically use a cascaded convolutional neural network (the MTCNN algorithm) to detect facial landmark points in the group photo image, and then use the detected facial landmarks to calibrate the face positions in the group photo image.

Specifically, MTCNN is a cascaded convolutional neural network framework that integrates the two tasks of face detection and facial landmark localization through multi-task learning. The MTCNN network structure mainly comprises three stages, each consisting of a convolutional neural network (CNN).
First, in the first stage of the MTCNN network, a shallow convolutional neural network (Proposal Network, P-Net) obtains candidate face-region windows and bounding-box regression vectors from the group photo image, uses the bounding-box regression to calibrate the candidate windows, and then merges highly overlapping candidate boxes by non-maximum suppression (NMS), quickly generating a large number of candidate windows.

Second, in the second stage of the MTCNN network, a convolutional neural network more complex than P-Net (Refine Network, R-Net) refines the candidate windows to exclude non-face windows.

Finally, in the third stage of the MTCNN network, an even more complex convolutional neural network (Output Network, O-Net) optimizes the output windows again, while outputting the coordinates of five facial landmark points. Here, the five facial landmarks comprise: the nose tip, the left and right mouth corners, and the left and right eyes. This is shown in Fig. 2.
Fig. 2 is a flow diagram of the cascaded convolutional neural network. As shown in Fig. 2:

First, the input group photo image is scaled to different sizes to form an image pyramid, and the resulting image pyramid is used as the input image.
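The image pyramid can be sketched as follows; the scale factor 0.709 and the 12-pixel minimum detection window are conventional MTCNN settings assumed here, not values stated in the patent:

```python
def pyramid_scales(height, width, min_face=12, factor=0.709):
    """Return the scales at which the input image is resized so that a
    12 x 12 P-Net window can detect progressively larger faces."""
    scales = []
    m = min_face / 12.0          # 1.0 when detecting faces down to 12 px
    min_side = min(height, width) * m
    scale = m
    while min_side >= 12:        # stop once the image is too small for P-Net
        scales.append(scale)
        scale *= factor
        min_side *= factor
    return scales
```

Each scale produces one level of the pyramid, and P-Net is run on every level.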
Then, in the first stage (Proposal Network, P-Net), the fully convolutional P-Net obtains candidate face-region windows and their bounding-box regression vectors from the input image, corrects the candidate windows using bounding-box regression, and then merges highly overlapping candidate boxes using non-maximum suppression (NMS).

In the second stage (Refine Network, R-Net), R-Net improves the candidate windows. The candidate windows that passed P-Net are input into R-Net, which rejects most of the false (non-face) windows, again using bounding-box regression and non-maximum suppression to merge overlapping candidate boxes.

In the third stage (Output Network, O-Net), O-Net outputs the final face boxes and landmark positions. The third stage is similar to the second, except that in the third stage the positions of the five facial landmark points are also produced.
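The non-maximum suppression step used by all three stages can be sketched as follows (an illustrative implementation, not the patent's; boxes are (x1, y1, x2, y2) tuples):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms(boxes, scores, threshold=0.5):
    """Keep the highest-scoring box, drop candidates that overlap it by
    more than `threshold`, and repeat on the remaining candidates."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= threshold]
    return keep
```

This is how the "highly overlapping candidate boxes" described above are merged into a single detection per face.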
In this application, after the image processing terminal has detected the face images in a group photo, the detected face images may also be screened. Specifically, face pictures that are suitably sized, clear and frontal can be kept, while faces that are too small or blurred are discarded, so as to improve the quality of the trained model.

Here, the face images extracted from a group photo by the image processing terminal are rectangular pictures containing faces, and each group photo has at least two such rectangular pictures. When the short side of a rectangular picture is smaller than 24 pixels, that picture can be discarded directly, and the largest of the rectangular face pictures is saved.
In this application, after obtaining the face pictures in the group photo images, the image processing terminal may also use the cropped face pictures to fine-tune a preset VGG-FACE model; specifically, the VGG-FACE model is adjusted using a fine-tuning method.

In this application, the face pictures obtained by the image processing terminal are fed into the fine-tuned VGG-FACE model to obtain facial emotion features; the fine-tuned VGG-FACE model can recognize both the emotion shown by a face in a face picture and its intensity. In other words, a facial emotion feature carries both the emotion shown by the face, such as "happiness", "anger" or "sadness", and the intensity grade of that emotion, such as "joyful", "great rejoicing" or "wild with joy". The basic fine-tuning procedure is shown in Fig. 3.
Fig. 3 is a schematic diagram of the neural network fine-tuning principle. As shown in Fig. 3, the parameters of each layer are adjusted using the backpropagation algorithm. Here, the object of fine-tuning may be all network layers, or only designated layers. Specifically, the network layers include convolutional layers, pooling layers and Inception layers.

Specifically, when the data set used for fine-tuning is not sufficiently large, generally only the parameters of the higher layers of the network are fine-tuned, in order to prevent overfitting; moreover, the extracted features differ considerably as the convolution depth increases. It is generally believed that the lower-level convolutional features are generic for image data and can be regarded as edge detectors or color-block detectors, while with increasing depth the features contain more of the detail of the current data set.
In this application, the VGG-FACE model, trained mainly on a large amount of face data (containing 2.6M face images covering about 2.6K different identities), is selected for fine-tuning. The VGG-FACE model is a very deep but structurally simple and effective convolutional neural network, used mainly for training on face images. The principle of facial expression recognition based on fine-tuning the VGG-FACE model is shown in Fig. 4.

In Fig. 4, the network comprises 11 computing modules in total, each consisting of one linear processor and one or more nonlinear processors (mainly ReLU nonlinear units and max-pooling filters). The first eight computing modules are called convolution modules; their linear processor is a bank of linear filters mainly performing linear convolution. The last three computing modules are called fully connected modules; their linear processor is also a bank of linear filters, the difference being that the convolution kernel of each filter has the same size as the input data, i.e., each filter processes the entire input image directly.
Fig. 5 is a schematic diagram of the network structure of the VGG-FACE model; Fig. 5 gives, for each convolutional layer, the filter size, the number of filters, the convolution stride and the amount of outward padding.
When fine-tuning the VGG-FACE model, the sizes of the group photo images to be trained are first adjusted to 224 x 224 (pixels); then 80% of the images to be trained are taken out as the training sample set and the remaining 20% as the test sample set. The face pictures obtained from the group photo images are input into the VGG-FACE model to obtain the facial emotion features of the group photo images.
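The 80/20 split described above can be sketched as follows (the shuffle and the seed are our assumptions; the patent only specifies the 80%/20% proportions):

```python
import random

def split_dataset(images, train_fraction=0.8, seed=0):
    """Shuffle the group photo images and split them into a training
    sample set and a test sample set in the stated proportions."""
    images = list(images)
    random.Random(seed).shuffle(images)
    cut = int(len(images) * train_fraction)
    return images[:cut], images[cut:]
```

Every image lands in exactly one of the two sets, preserving the 80/20 ratio.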
In Fig. 5, "conv" denotes a convolutional layer; a convolutional layer extracts image features by performing convolution operations on the group photo image. In a convolutional neural network, each convolutional layer usually contains multiple trainable convolution templates (i.e., convolution kernels), and different templates correspond to different image features. After a convolution kernel is convolved with the input image and processed by a nonlinear activation function (e.g., the sigmoid, ReLU or ELU function), the corresponding feature map is obtained. The parameters of the convolution kernels are usually computed using a specific learning algorithm (e.g., stochastic gradient descent). Convolution refers to the operation of taking a weighted sum of the template parameters and the pixel values at the corresponding positions in the image. A typical convolution process is shown in Fig. 6.
Fig. 6 illustrates the convolution operation in a neural network. In Fig. 6, convolving a 4 x 4 matrix with a 2 x 2 kernel yields a 3 x 3 convolution result; specifically, by sliding the template window and performing the convolution operation at every position of the input image, the corresponding feature map is obtained.
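A minimal sketch of this operation, matching the Fig. 6 shapes (a 4 x 4 input and a 2 x 2 kernel yield a 3 x 3 feature map); as in most CNN frameworks, the kernel is applied without flipping, i.e. as cross-correlation:

```python
def convolve2d(image, kernel):
    """Valid-mode 2-D convolution: slide the kernel window over every
    position of the input and take the weighted sum of the overlap."""
    kh, kw = len(kernel), len(kernel[0])
    oh = len(image) - kh + 1          # output height: 4 - 2 + 1 = 3
    ow = len(image[0]) - kw + 1       # output width:  4 - 2 + 1 = 3
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(ow)]
            for r in range(oh)]
```

Each output entry is the weighted sum of one 2 x 2 window of the input, which is exactly the feature-map computation described above.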
Compared with traditional neural networks, the outstanding advantage of convolutional neural networks is that they abandon the "fully connected" design between adjacent layers of traditional networks; through local connectivity and weight sharing, the number of model parameters to train is greatly reduced, reducing the amount of computation.

Local connectivity means that in a convolutional neural network each neuron is connected to a local region of the input image, rather than fully connected to all neurons of the input image. Weight sharing means that the connection parameters (i.e., the convolution kernel parameters) are shared across different regions of the input image. In addition, the locally connected, weight-sharing design gives the features extracted by the network a high degree of stability, making them insensitive to translation, scaling, deformation and the like.
In Fig. 5, "pool" denotes a pooling layer. Pooling layers usually appear in pairs with convolutional layers: placed after a convolutional layer, the pooling layer downsamples the input feature maps. An input image typically yields a large number of feature maps after convolution, and an excessively high feature dimension leads to a sharp increase in the network's computation. The pooling layer therefore greatly reduces the number of model parameters by reducing the dimension of the feature maps.
In Fig. 5, RELU is the rectified linear unit (Rectified Linear Unit) function, a common activation function in artificial neural networks, generally referring to the nonlinear functions represented by the ramp function and its variants. The ReLU function has the form:

θ(x) = max(0, x)    (1)

For an input x entering a neuron from the previous layer of the network, a neuron using the rectified linear activation outputs max(0, x), either to the next layer of neurons or as the output of the whole network.
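Equation (1) translates directly into code; applied elementwise to a layer's pre-activations it implements the ReLU unit described above:

```python
def relu(x):
    """Rectified linear unit: theta(x) = max(0, x) from equation (1)."""
    return max(0.0, x)

def relu_layer(vector):
    """Apply the ReLU activation elementwise to a layer's outputs."""
    return [relu(v) for v in vector]
```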
In this way, the computation of the network is reduced on the one hand, and the risk of network overfitting is reduced on the other. The feature maps produced by the pooling layer correspond one-to-one to the feature maps of the convolutional layer, so the pooling operation only reduces the dimension of the feature maps; their number does not change.
The pooling methods currently common in convolutional neural networks are: max pooling (Max Pooling), mean pooling (Mean Pooling) and stochastic pooling (Stochastic Pooling). For a sampling subregion, max pooling selects the point with the largest pixel value as the output of the region; mean pooling computes the mean of all pixels in the region and uses that mean as the output of the sampling region; stochastic pooling randomly selects one pixel value from the sampling region as the output, with larger pixel values generally having a higher probability of being selected. The max-pooling process is shown in Fig. 7.
Fig. 7 illustrates the max-pooling operation in a neural network. In Fig. 7, a 4 x 4 image comprises the feature regions 1, 2, 3 and 4, where region 1 contains the 4 pixel values (1, 1, 5, 6); region 2 contains the 4 pixel values (2, 4, 7, 8); region 3 contains the 4 pixel values (3, 2, 1, 2); and region 4 contains the 4 pixel values (1, 0, 3, 4). Choosing the largest pixel value in each feature region as the output of that region gives "6" for region 1, "8" for region 2, "3" for region 3 and "4" for region 4.
In this application, after obtaining the facial feature data in the sample data of the group photo images, the image processing terminal may further perform emotion classification on the sample data based on the facial feature data, and attach an emotion label to the sample data corresponding to each emotion category in the classification results.

For example, using an existing face recognition approach, for example the MTCNN algorithm, emotion classification is performed on the group photo images, dividing the large number of acquired group photo images into three sample sets: "positive emotion" samples, "calm" samples and "negative emotion" samples. Of course, the emotion division of the group photo images may also be carried out semi-automatically or manually.

After obtaining the emotion classification results for the sample data, the sample data corresponding to each emotion category in the classification results can be given an emotion label. For example, pictures in which the people in the group photo show positive emotions such as happiness or surprise are labeled "positive emotion" samples; pictures in which the people in the group photo show no particular expression are labeled "calm" samples; and pictures in which the people in the group photo show negative emotions such as sadness, anger or fear are labeled "negative emotion" samples.
In this application, when extracting environmental features from the sample data of group photo images, the image processing terminal may first adjust the sizes of the group photo images to 224 x 224; then 80% of the images to be trained are taken out as the training sample set and the remaining 20% as the test sample set. The pictures obtained from the group photo images are then input into the deep learning GoogLeNet and VGG-16 structured networks for fine-tuning; at this point the number of nodes of the last fully connected layer of the GoogLeNet and VGG-16 networks needs to be set to 3 or 9, and the features output by the pooling layer are extracted as the environmental features. The structure of the GoogLeNet network is shown in Table 1:
Table 1
The specific structure of the Inception module in Table 1 is shown in Fig. 8.
Fig. 8 is a schematic diagram of the structure of the Inception module. In Fig. 8, by stacking 3 x 3 convolutions and 1 x 1 convolutions together, the width of the network is increased on the one hand, and the network's adaptability to scale is increased on the other.
In Table 1, a Dropout layer means that during network training a portion of the neurons are ignored at random, left inactive; on the one hand this method speeds up computation, and on the other it reduces the risk of model overfitting. With p = 0.5, 50% of the neurons are ignored at random.
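A minimal sketch of this behavior; the 1/(1-p) rescaling of surviving activations is the common "inverted dropout" convention, an assumption beyond the patent's description, which only states that neurons are ignored at random:

```python
import random

def dropout(vector, p=0.5, rng=None):
    """Training-mode dropout: zero each activation with probability p,
    scaling survivors by 1/(1-p) so the expected output is unchanged."""
    rng = rng or random.Random()
    return [0.0 if rng.random() < p else v / (1.0 - p) for v in vector]
```

At inference time the layer is simply skipped (identity), which is why the scaling is applied during training.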
The specific structure of the VGG-16 network module is shown in Table 2:
Table 2
In this application, after obtaining the environmental feature data in the sample data of the group photo images, the image processing terminal may further perform environment classification on the sample data based on the environmental feature data, and attach an environment label to the sample data corresponding to each environment category in the classification results.
For example, environments can be divided into 9 grades. Environments related to a "wedding" or "dinner party" can be labeled with a "great rejoicing" atmosphere, while pleasant environments such as a "park" or "flowers and plants" can be labeled with a "joyful" atmosphere. Specifically, the atmosphere grade of a label can be represented by a number: for example, the nine digits "-4" to "+4" (or other labels such as 0-8) can represent the environmental atmosphere, with "4" denoting "wild with joy", "3" "great rejoicing", "2" "joyful", "1" "pleased", "0" "calm", "-1" "sad", "-2" "sadness", "-3" "grief" and "-4" "sorrow". The above atmosphere grade can be marked on a picture by means of a tag, so that the picture can then be obtained quickly through search.
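As an illustration, the nine-grade atmosphere labelling could be held in a simple lookup table. The grade-to-name mapping below just transcribes the text; the table and function names are hypothetical:

```python
# Hypothetical lookup table for the nine atmosphere grades (-4 .. +4)
# described in the text.
ATMOSPHERE_GRADES = {
    4: "wild with joy",
    3: "great rejoicing",
    2: "joyful",
    1: "pleased",
    0: "calm",
    -1: "sad",
    -2: "sadness",
    -3: "grief",
    -4: "sorrow",
}

def label_scene(grade):
    """Return the atmosphere label for a numeric grade."""
    return ATMOSPHERE_GRADES[grade]
```

A picture tagged with the numeric grade can then be mapped to its human-readable label at search time.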
In the application, when the image processing terminal performs human-skeleton feature extraction on the sample data of a group photo image, it can specifically use the trained OpenPose code to extract the key points of the human skeleton from the group photo image corresponding to the sample data. Nearly 130 skeleton key points can be extracted in total, covering face + body movement + hands. The extracted skeleton images are then used to fine-tune the pre-trained GoogLeNet and ResNet50 models respectively.

In the application, the skeleton image after extraction has the same size as the original group photo image and contains only the human skeleton features.
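For orientation, the keypoint count can be reconstructed from the standard OpenPose model sizes. The group sizes below are assumptions based on the usual BODY_25, face and hand models, and roughly match the "nearly 130 key points" figure in the text:

```python
# Assumed OpenPose keypoint groups (standard model sizes, not taken
# from the patent itself): BODY_25 body model, 70 face points, and
# 21 points per hand.
KEYPOINT_GROUPS = {
    "body": 25,
    "face": 70,
    "left_hand": 21,
    "right_hand": 21,
}

total = sum(KEYPOINT_GROUPS.values())
print(total)  # 137
```

The sum of 137 is in the same ballpark as the "nearly 130" figure; the exact count depends on which OpenPose models are enabled.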
Step 102: perform feature fusion on the facial feature data, the environmental feature data and/or the human skeleton feature data, to obtain fusion feature data other than the facial feature data, the environmental feature data and/or the human skeleton feature data;
In the application, a nonlinear mapping function can specifically be used to map the facial feature data, the environmental feature data and/or the human skeleton feature data into the same high-dimensional space, obtaining new feature data other than the facial feature data, the environmental feature data and/or the human skeleton feature data; the new feature data is then taken as the fusion feature data.
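A minimal sketch of such a nonlinear mapping into a shared space is shown below. The fixed random projection followed by a tanh nonlinearity is an illustrative stand-in for the patent's unspecified mapping function, and the output dimension is an arbitrary choice:

```python
import numpy as np

def nonlinear_fuse(features, out_dim=64, seed=0):
    """Concatenate the modality feature vectors and map them through a
    fixed random nonlinear projection (tanh) into one shared space.
    The result is a new 'fusion' vector distinct from any input modality.
    out_dim, the seed and the tanh map are illustrative assumptions."""
    x = np.concatenate(features)
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((out_dim, x.size)) / np.sqrt(x.size)
    return np.tanh(W @ x)

face = np.random.default_rng(1).standard_normal(4096)    # e.g. face features
scene = np.random.default_rng(2).standard_normal(4096)   # e.g. scene features
fused = nonlinear_fuse([face, scene])
print(fused.shape)  # (64,)
```

In practice the mapping would be learned (e.g. by the kernel methods discussed in the text) rather than fixed at random; this sketch only shows the shape of the operation.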
Specifically, in the neural network, the image processing terminal can fuse the 4096-dimensional features output by the fully connected layer before the softmax layer.
The specific fusion method can be chosen from: kernel canonical correlation analysis (KCCA, Kernel Canonical Correlation Analysis), kernel matrix fusion (KMF, Kernel Matrix Fusion), kernel cross-factor analysis (KCFA, Kernel Cross Factor Analysis), etc. In this way, a fusion feature other than the above facial feature data, environmental feature data and/or human skeleton feature data is obtained from the same model, which can improve the recognition accuracy for face pictures in a group photo image.
Here, (1) feature fusion based on the KCCA algorithm is implemented as follows:

Assume that sample X = (x_1, x_2, ..., x_n)^T ∈ R^(n×p) and sample Y = (y_1, y_2, ..., y_n)^T ∈ R^(n×q) are two zero-mean feature matrices, where each row x_i and y_i of a matrix denotes a feature vector, the row count n denotes the size of the data set characterized by the feature matrix, and p, q denote the dimensions of the two kinds of features respectively. The feature vectors in the same row of the two feature matrices form the feature pairs {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, where each pair comes from two different modalities. The purpose of canonical correlation analysis is to find mapping matrices α = (α_1, α_2, ..., α_d) and β = (β_1, β_2, ..., β_d), d ≤ min(p, q), such that the following holds:

ρ = max(α,β) α^T C_xy β / √((α^T C_xx α)·(β^T C_yy β))   (3)

where C_xx = X^T X, C_yy = Y^T Y and C_xy = X^T Y denote the auto-covariance matrices of the two feature matrices and their cross-covariance matrix respectively. The above optimization problem can be converted into solving the eigenvalue problem:

C_xx^-1 C_xy C_yy^-1 C_yx α = ρ^2 α   (4)
Canonical correlation analysis is based on a linear space and cannot capture the nonlinear relationship between features of different modalities. Therefore, the kernel method is introduced on the basis of canonical correlation analysis, giving the kernel canonical correlation analysis (KCCA) method, which adds a nonlinear property to the original canonical correlation analysis algorithm. The basic idea of the KCCA algorithm is similar to that of the nonlinear support vector machine: the original feature matrices X and Y are mapped to a higher-dimensional space, i.e., the kernel spaces X' and Y', and the correlation analysis is carried out in the kernel space.
The optimization function of kernel canonical correlation analysis is:

ρ = max(α,β) α^T K_x K_y β / √((α^T K_x^2 α)·(β^T K_y^2 β))   (5)

where K_x and K_y are kernel matrices satisfying K_x = X'^T X' and K_y = Y'^T Y'. As with CCA, solving the above optimization function can also be converted into an eigenvalue problem. Since matrix inversion is involved in the eigenvalue solution process and a kernel matrix is not guaranteed to be invertible, to solve this problem formula (5) is regularized:

ρ = max(α,β) α^T K_x K_y β / √((α^T ((1-t)K_x^2 + t·K_x) α)·(β^T ((1-t)K_y^2 + t·K_y) β))   (6)

where 0 ≤ t ≤ 1 is the regularization coefficient.
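A compact NumPy sketch of regularized kernel CCA is given below. It uses one standard eigenproblem reduction of the KCCA objective, with a ridge term (K + r·I) standing in for the regularization, so the details are illustrative rather than the patent's exact formulation:

```python
import numpy as np

def kcca(Kx, Ky, reg=0.1):
    """Leading canonical correlation of regularized kernel CCA, via the
    eigenproblem (Kx + r*I)^-1 Ky (Ky + r*I)^-1 Kx a = rho^2 a.
    Returns (rho, alpha); the ridge regularizer is an assumption."""
    n = Kx.shape[0]
    I = np.eye(n)
    A = np.linalg.solve(Kx + reg * I, Ky)   # (Kx + r*I)^-1 Ky
    B = np.linalg.solve(Ky + reg * I, Kx)   # (Ky + r*I)^-1 Kx
    vals, vecs = np.linalg.eig(A @ B)
    i = int(np.argmax(vals.real))
    rho = float(np.sqrt(max(vals[i].real, 0.0)))
    return rho, vecs[:, i].real

# Two noisy views of the same latent signal should be highly correlated.
rng = np.random.default_rng(0)
z = rng.standard_normal((50, 2))
X = z + 0.1 * rng.standard_normal((50, 2))
Y = z + 0.1 * rng.standard_normal((50, 2))
Kx, Ky = X @ X.T, Y @ Y.T                   # linear kernels for simplicity
rho, alpha = kcca(Kx, Ky)
```

With nearly identical views the leading canonical correlation comes out close to 1, which is the behaviour the regularized objective (6) is after.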
(2) Feature fusion based on KMF is implemented as follows:

The idea of kernel matrix fusion is to find, for two different modalities, a common subspace that can characterize the features of the two modalities to the greatest extent. Assume X = (x_1, x_2, ..., x_n)^T and Y = (y_1, y_2, ..., y_n)^T respectively correspond to the features extracted from the two modalities of the sample data, and K_x and K_y respectively correspond to the kernel matrices of the two modalities, satisfying

K_x(i, j) = κ(x_i, x_j), K_y(i, j) = κ(y_i, y_j)

where κ(·, ·) denotes the kernel function. Kernel matrix fusion combines the above two kernel matrices through algebraic operations: either the weighted sum or the product of the elements at corresponding positions in the kernel matrices can be chosen as the elements of the combined kernel matrix. The former fusion method is chosen here, and the fused matrix can be expressed as:
Kf=aKx+bKy (7)
where a + b = 1. After the fused matrix is obtained, dimension-reduction processing can be carried out on it by the traditional kernel method to obtain the compressed feature values.
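The weighted-sum kernel fusion of formula (7) is easy to sketch. The RBF kernel and the feature dimensions below are illustrative choices:

```python
import numpy as np

def rbf_kernel(X, gamma=0.1):
    """K[i, j] = exp(-gamma * ||x_i - x_j||^2), one common kernel choice."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * d2)

def fuse_kernels(Kx, Ky, a=0.5):
    """Weighted-sum kernel fusion per formula (7): Kf = a*Kx + b*Ky, b = 1 - a."""
    return a * Kx + (1.0 - a) * Ky

rng = np.random.default_rng(0)
Kx = rbf_kernel(rng.standard_normal((10, 8)))   # e.g. modality-1 features
Ky = rbf_kernel(rng.standard_normal((10, 3)))   # e.g. modality-2 features
Kf = fuse_kernels(Kx, Ky, a=0.6)
```

The fused matrix inherits symmetry and a unit diagonal from its RBF inputs, so it can be fed to any kernel method for the subsequent dimension reduction.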
(3) Feature fusion based on KCFA is implemented as follows:

The solution of the projection matrices in the canonical correlation analysis (CCA, Canonical Correlation Analysis) algorithm can be converted into the problem of solving an eigen-matrix, but this correspondingly requires the covariance matrix to be invertible, which limits the application of canonical correlation analysis to a certain extent. Cross-factor analysis improves on canonical correlation analysis; its goal is to find the mapping matrices that minimize the Frobenius norm in the projection space. Assume the feature matrices of the two modalities are X ∈ R^(n×p) and Y ∈ R^(n×q) respectively, the corresponding linear transformation matrices are U = (u_1, u_2, ..., u_d) and V = (v_1, v_2, ..., v_d), satisfying d ≤ min(p, q); the optimization function of the cross-modal factor is then:

min(U,V) ||X U − Y V||_F^2,  s.t. U^T U = I, V^T V = I   (8)

where ||·||_F denotes the Frobenius norm of the input matrix and I denotes the identity matrix. From the properties of the Frobenius norm:

||X U − Y V||_F^2 = tr(U^T X^T X U) − 2·tr(X U V^T Y^T) + tr(V^T Y^T Y V)

where tr(·) denotes the trace of a matrix. Since X and Y are given feature matrices, tr(X X^T) and tr(Y Y^T) are constants, so formula (8) can be simplified as:

min(U,V) −tr(X U V^T Y^T),  s.t. U^T U = I, V^T V = I   (9)

The solution of the above problem can be converted into a singular value decomposition problem: let X^T Y = S_xy Λ_xy D_xy^T; the corresponding transformation matrices are then U = S_xy and V = D_xy.
Basic cross-factor analysis can only learn the linear characteristics of the two modalities; by the kernel method it can likewise be extended to kernel cross-factor analysis (KCFA). Assume X' and Y' are the feature matrices after X and Y are nonlinearly mapped to a higher-dimensional space respectively, and K_x, K_y are the corresponding kernel matrices. Similar to cross-factor analysis, kernel cross-factor analysis can also be converted into solving the singular value decomposition problem of X'^T Y'.
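The SVD solution for the (linear) cross-factor analysis above can be sketched directly in NumPy; keeping the top d singular vector pairs is an assumption about how d is chosen:

```python
import numpy as np

def cfa(X, Y, d=2):
    """Cross-factor analysis: transformation matrices U, V that minimize
    ||X U - Y V||_F subject to U^T U = V^T V = I, obtained from the
    singular value decomposition of X^T Y (top d singular vector pairs)."""
    S, _, Dt = np.linalg.svd(X.T @ Y)
    return S[:, :d], Dt.T[:, :d]

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 5))   # modality 1, p = 5
Y = rng.standard_normal((30, 4))   # modality 2, q = 4
U, V = cfa(X, Y, d=2)
```

The orthonormality constraints of formula (9) are satisfied automatically because singular vectors are orthonormal.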
Step 103: train the face classifier based on the facial feature data to obtain the trained face classifier; train the environment classifier based on the environmental feature data to obtain the trained environment classifier; train the human skeleton classifier based on the human skeleton feature data to obtain the trained human skeleton classifier; train the integrated classifier based on the fusion feature data to obtain the trained integrated classifier.

In the application, the type of a classifier can be one or more of types such as the support vector machine (SVM, Support Vector Machine), the k-nearest-neighbour classification algorithm (KNN) and the fully connected network; the classifier types corresponding to the respective features can be identical or different.
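As one example of the classifier types listed, a minimal k-nearest-neighbour classifier can be written in a few lines of NumPy; this generic sketch is not tied to the patent's particular features:

```python
import numpy as np

def knn_predict(train_X, train_y, x, k=3):
    """Minimal KNN classifier: majority vote among the k training
    samples closest to x under Euclidean distance."""
    dists = np.linalg.norm(train_X - x, axis=1)
    nearest = train_y[np.argsort(dists)[:k]]
    values, counts = np.unique(nearest, return_counts=True)
    return values[np.argmax(counts)]

# Toy 2-D features with two classes.
X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.05, 0.05])))  # 0
```

In this scheme the same interface could sit behind any of the listed classifier types (SVM, KNN or a fully connected network), one per feature kind.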
Specifically, when the image processing terminal trains the face classifier based on the facial feature data to obtain the trained face classifier, it can determine, based on the facial feature data, the facial area corresponding to each group photo image in the sample data, and then use the sample data set corresponding to the facial area to perform network fine-tuning on the face classifier, obtaining the fine-tuned face classifier.

When the image processing terminal trains the environment classifier based on the environmental feature data to obtain the trained environment classifier, it can specifically determine, based on the environmental feature data, the environmental area corresponding to each group photo image in the sample data, and then use the sample data set corresponding to the environmental area to perform network fine-tuning on the environment classifier, obtaining the fine-tuned environment classifier.

When the image processing terminal trains the human skeleton classifier based on the human skeleton feature data to obtain the trained human skeleton classifier, it can specifically determine, based on the human skeleton feature data, the human skeleton key points corresponding to each group photo image in the sample data, and then use the sample data set corresponding to the human skeleton key points to perform network fine-tuning on the human skeleton classifier, obtaining the fine-tuned human skeleton classifier.
Step 104: use the trained face classifier, the trained environment classifier, the trained human skeleton classifier and the trained integrated classifier to respectively perform image recognition on the target data of a group photo image, and determine the emotion attribute corresponding to the target data according to the recognition result.

In the application, when the image processing terminal uses the trained face classifier, the trained environment classifier, the trained human skeleton classifier and the trained integrated classifier to respectively perform image recognition on the target data of a group photo image, it can also calculate the sum of the emotion attribute probabilities of the emotion attribute values characterizing the emotion label in the recognition result, or calculate the sum of the environment attribute probabilities of the environment attribute values characterizing the environment label in the recognition result. Then, the emotion label corresponding to the emotion attribute value with the largest emotion attribute probability in the sum of emotion attribute probabilities is determined as the emotion attribute corresponding to the target data; alternatively, the environment label corresponding to the environment attribute value with the largest environment attribute probability in the sum of environment attribute probabilities is determined as the emotion attribute corresponding to the target data.
Specifically, after the image processing terminal obtains the trained face classifier, the trained environment classifier, the trained human skeleton classifier and the trained integrated classifier, it can calculate the prediction result corresponding to each classifier through a weight-redistribution algorithm.

For example, the environment image of a group photo image is input into the GoogLeNet network and the VGG-16 network for prediction calculation, obtaining the prediction result of the environment image;

a face image in the group photo image is input into the Goog-FACE network for prediction calculation, obtaining the prediction result of the face image;

the first human skeleton image (face + body) in the group photo image is input into the GoogLeNet network for prediction calculation, obtaining the prediction result of the first human skeleton image;

the second human skeleton image (face + body + hands) in the group photo image is input into the GoogLeNet network for prediction calculation, obtaining the prediction result of the second human skeleton image.
In the application, the features in the environment image and the face image can also be fused to obtain the prediction result of a fusion image other than the environment image and the face image. Here, the fusion image can also contain a new environment image and face image, as long as it does not contain the extracted environment image and face image.

In the application, after the image prediction result corresponding to each classifier is obtained, a decision-level information fusion method can be used to fuse the prediction results of the various classifiers obtained in step 103 (the trained face classifier, the trained environment classifier, the trained human skeleton classifier and the trained integrated classifier).
Specifically, the image processing terminal can filter out the emotion features of each person from the group photo image according to a grade-classification algorithm; here, the emotion features of each person in the group photo image are taken as the crowd emotion features of the group photo image.

Here, using the emotion labels or environment labels in the group photo image, the trained face classifier, the trained environment classifier, the trained human skeleton classifier and the trained integrated classifier can respectively perform image recognition on the target data of the group photo image, and during the image recognition process the image attribute corresponding to the target data is determined, where the image attribute includes an emotion label or an environment label.

Specifically, the image attribute corresponding to the target data can be determined by calculating the probability that the target data belongs to each emotion label or environment label. Then, after the probabilities of each label are summed, the label with the largest probability is taken as the target label of the target data, and that target label is the image attribute of the target data. The calculation formula for predicting the image attribute of the target data is as follows:
S = ω0·Sscene + ω1·Sfusion + ω2·Sface + ω3·Sskeleton1 + ω4·Sskeleton2   (10)

where S is the predicted probability that the target data belongs to a certain label attribute, and ω0-ω4 are the weights corresponding to the respective features, satisfying ω0 + ω1 + ω2 + ω3 + ω4 = 1.
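Formula (10) amounts to a weighted average of per-label scores followed by an argmax; a sketch with made-up probabilities and weights:

```python
import numpy as np

def fuse_scores(scores, weights):
    """Decision-level fusion per formula (10): S = sum_i w_i * S_i,
    with the weights summing to 1. Each row of `scores` holds one
    source's per-label probabilities."""
    weights = np.asarray(weights)
    assert np.isclose(weights.sum(), 1.0)
    return weights @ np.asarray(scores)

# Illustrative per-label probabilities (3 labels) from the five sources
# named in formula (10): scene, fusion, face, skeleton1, skeleton2.
scores = [
    [0.2, 0.5, 0.3],   # S_scene
    [0.1, 0.7, 0.2],   # S_fusion
    [0.3, 0.4, 0.3],   # S_face
    [0.2, 0.6, 0.2],   # S_skeleton1
    [0.3, 0.5, 0.2],   # S_skeleton2
]
S = fuse_scores(scores, [0.2, 0.3, 0.2, 0.15, 0.15])  # hypothetical weights
label = int(np.argmax(S))  # index of the predicted label attribute
```

The label with the largest fused probability becomes the target label, i.e. the image attribute of the target data.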
This scheme analyzes the crowd emotion intensity using face emotion features, scene features and human skeleton features, and fuses the face emotion-intensity features with the scene features to obtain a fusion feature other than the identified face features and environment features. Then, classifiers are trained separately with the identified facial feature data, environmental feature data, human skeleton feature data and fusion feature data, obtaining the emotion features of each person in the group photo image, from which the overall emotion intensity of a multi-person group photo image can be obtained. The face recognition uses two kinds of face emotion models, improving the accuracy of face emotion recognition for multi-person group photo images.
Fig. 9 is the first structural composition schematic diagram of the analysis device for emotion data in the embodiment of the present invention. As shown in Fig. 9, the device includes:

an extraction unit 901, configured to perform feature extraction on the sample data of a group photo image to obtain facial feature data, environmental feature data and human skeleton feature data in the sample data;

a fusion unit 902, configured to perform feature fusion on the facial feature data, the environmental feature data and/or the human skeleton feature data, to obtain fusion feature data other than the facial feature data, the environmental feature data and/or the human skeleton feature data;

a training unit 903, configured to train the face classifier based on the facial feature data to obtain the trained face classifier; train the environment classifier based on the environmental feature data to obtain the trained environment classifier; train the human skeleton classifier based on the human skeleton feature data to obtain the trained human skeleton classifier; and train the integrated classifier based on the fusion feature data to obtain the trained integrated classifier;

a recognition unit 904, configured to use the trained face classifier, the trained environment classifier, the trained human skeleton classifier and the trained integrated classifier to respectively perform image recognition on the target data of a group photo image, and determine the emotion attribute corresponding to the target data according to the recognition result.
In the application, the fusion unit 902 is specifically configured to use a nonlinear mapping function to map the facial feature data, the environmental feature data and/or the human skeleton feature data into the same high-dimensional space, obtain new feature data other than the facial feature data, the environmental feature data and/or the human skeleton feature data, and take the new feature data as the fusion feature data.

In the application, the device further includes a classification unit 905 and a marking unit 906.

Specifically, the classification unit 905 is configured to perform emotion classification on the sample data based on the facial feature data; the marking unit 906 is configured to attach an emotion label to the sample data corresponding to each emotion category in the classification result.

In the application, the classification unit 905 is also configured to perform environment classification on the sample data based on the environmental feature data; the marking unit 906 is also configured to attach an environment label to the sample data corresponding to each environment category in the classification result.
In the application, the device further includes a calculation unit 907 and a determination unit 908.

Specifically, the calculation unit 907 is configured to calculate the sum of the emotion attribute probabilities of the emotion attribute values characterizing the emotion label in the recognition result, or calculate the sum of the environment attribute probabilities of the environment attribute values characterizing the environment label in the recognition result.

The determination unit 908 is configured to determine the emotion attribute value corresponding to the largest emotion attribute probability in the sum of emotion attribute probabilities, or determine the environment attribute value corresponding to the largest environment attribute probability in the sum of environment attribute probabilities; and to determine the emotion label corresponding to the emotion attribute value, or alternatively the environment label corresponding to the environment attribute value, as the emotion attribute corresponding to the target data.

In the application, the device further includes an adjustment unit 909.

Specifically, the determination unit 908 is also configured to determine, based on the facial feature data, the facial area corresponding to each group photo image in the sample data; the adjustment unit 909 is configured to use the sample data set corresponding to the facial area to perform network fine-tuning on the face classifier, obtaining the fine-tuned face classifier.

In the application, the determination unit 908 is also configured to determine, based on the environmental feature data, the environmental area corresponding to each group photo image in the sample data; the adjustment unit 909 is also configured to use the sample data set corresponding to the environmental area to perform network fine-tuning on the environment classifier, obtaining the fine-tuned environment classifier.

In the application, the determination unit 908 is also configured to determine, based on the human skeleton feature data, the human skeleton key points corresponding to each group photo image in the sample data; the adjustment unit 909 is also configured to use the sample data set corresponding to the human skeleton key points to perform network fine-tuning on the human skeleton classifier, obtaining the fine-tuned human skeleton classifier.
It should be understood that, when the analysis device for emotion data provided by the above embodiment performs information pushing, the division of the above program modules is only taken as an example; in practical applications, the above processing can be allocated to and completed by different program modules as needed, i.e., the internal structure of the analysis device for emotion data can be divided into different program modules to complete all or part of the processing described above. In addition, the analysis device for emotion data provided by the above embodiment and the embodiments of the analysis method for emotion data belong to the same concept; the specific implementation process is detailed in the method embodiments and is not repeated here.
Fig. 10 is the second structural composition schematic diagram of the analysis device for emotion data. The analysis device 1000 for emotion data can be a mobile phone, a computer, a digital broadcast terminal, an information transceiving device, a game console, a tablet device, a personal digital assistant, an information push server, a content server, etc. The analysis device 1000 for emotion data shown in Fig. 10 includes: at least one processor 1001, a memory 1002, at least one network interface 1004 and a user interface 1003. The various components in the analysis device 1000 for emotion data are coupled by a bus system 1005. It can be understood that the bus system 1005 is used to realize connection and communication between these components. In addition to a data bus, the bus system 1005 also includes a power bus, a control bus and a status signal bus. However, for the sake of clarity, the various buses are all designated as the bus system 1005 in Fig. 10.

The user interface 1003 may include a display, a keyboard, a mouse, a trackball, a click wheel, a key, a button, a touch pad or a touch screen, etc.
It can be understood that the memory 1002 can be a volatile memory or a nonvolatile memory, and may also comprise both volatile and nonvolatile memory. The nonvolatile memory can be a read-only memory (ROM, Read Only Memory), a programmable read-only memory (PROM, Programmable Read-Only Memory), an erasable programmable read-only memory (EPROM, Erasable Programmable Read-Only Memory), an electrically erasable programmable read-only memory (EEPROM, Electrically Erasable Programmable Read-Only Memory), a magnetic random access memory (FRAM, ferromagnetic random access memory), a flash memory (Flash Memory), a magnetic surface memory, an optical disc, or a compact disc read-only memory (CD-ROM, Compact Disc Read-Only Memory); the magnetic surface memory can be a magnetic disk memory or a magnetic tape memory. The volatile memory can be a random access memory (RAM, Random Access Memory), used as an external cache. By way of example but not limitation, many forms of RAM are available, such as a static random access memory (SRAM, Static Random Access Memory), a synchronous static random access memory (SSRAM, Synchronous Static Random Access Memory), a dynamic random access memory (DRAM, Dynamic Random Access Memory), a synchronous dynamic random access memory (SDRAM, Synchronous Dynamic Random Access Memory), a double data rate synchronous dynamic random access memory (DDRSDRAM, Double Data Rate Synchronous Dynamic Random Access Memory), an enhanced synchronous dynamic random access memory (ESDRAM, Enhanced Synchronous Dynamic Random Access Memory), a synclink dynamic random access memory (SLDRAM, SyncLink Dynamic Random Access Memory) and a direct rambus random access memory (DRRAM, Direct Rambus Random Access Memory). The memory 1002 described in the embodiments of the present invention is intended to include, but is not limited to, these and any other suitable types of memory.
The memory 1002 in the embodiments of the present invention is used to store various types of data to support the operation of the analysis device 1000 for emotion data. Examples of such data include: any computer program for operating on the analysis device 1000 for emotion data, such as an operating system 10021 and application programs 10022; music data; animation data; book information; video, etc. The operating system 10021 includes various system programs, such as a framework layer, a core library layer and a driver layer, for realizing various basic services and handling hardware-based tasks. The application programs 10022 may include various application programs, such as a media player (Media Player) and a browser (Browser), for realizing various application services. The program implementing the method of the embodiments of the present invention may be included in the application programs 10022.
The method disclosed by the embodiments of the present invention can be applied to, or realized by, the processor 1001. The processor 1001 may be an integrated circuit chip with signal processing capability. In the implementation process, each step of the above method can be completed by an integrated logic circuit of hardware in the processor 1001 or by instructions in the form of software. The above processor 1001 can be a general-purpose processor, a digital signal processor (DSP, Digital Signal Processor), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. The processor 1001 can realize or execute the methods, steps and logic diagrams disclosed in the embodiments of the present invention. The general-purpose processor can be a microprocessor or any conventional processor, etc. The steps of the method disclosed in the embodiments of the present invention can be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in a decoding processor. The software module can be located in a storage medium; the storage medium is located in the memory 1002, and the processor 1001 reads the information in the memory 1002 and completes the steps of the preceding method in combination with its hardware.
In an exemplary embodiment, the analysis device 1000 for emotion data can be realized by one or more application-specific integrated circuits (ASIC, Application Specific Integrated Circuit), DSPs, programmable logic devices (PLD, Programmable Logic Device), complex programmable logic devices (CPLD, Complex Programmable Logic Device), field programmable gate arrays (FPGA, Field-Programmable Gate Array), general-purpose processors, controllers, microcontrollers (MCU, Micro Controller Unit), microprocessors (Microprocessor) or other electronic elements, for executing the preceding method.
Specifically, when the processor 1001 runs the computer program, it executes: performing feature extraction on the sample data of a group photo image to obtain facial feature data, environmental feature data and human skeleton feature data in the sample data; performing feature fusion on the facial feature data, the environmental feature data and/or the human skeleton feature data to obtain fusion feature data other than the facial feature data, the environmental feature data and/or the human skeleton feature data; training the face classifier based on the facial feature data to obtain the trained face classifier; training the environment classifier based on the environmental feature data to obtain the trained environment classifier; training the human skeleton classifier based on the human skeleton feature data to obtain the trained human skeleton classifier; training the integrated classifier based on the fusion feature data to obtain the trained integrated classifier; and using the trained face classifier, the trained environment classifier, the trained human skeleton classifier and the trained integrated classifier to respectively perform image recognition on the target data of a group photo image, and determining the emotion attribute corresponding to the target data according to the recognition result.
Specifically, when the processor 1001 runs the computer program, it also executes: using a nonlinear mapping function to map the facial feature data, the environmental feature data and/or the human skeleton feature data into the same high-dimensional space, obtaining new feature data other than the facial feature data, the environmental feature data and/or the human skeleton feature data; and taking the new feature data as the fusion feature data.

Specifically, when the processor 1001 runs the computer program, it also executes: performing emotion classification on the sample data based on the facial feature data; and attaching an emotion label to the sample data corresponding to each emotion category in the classification result.

Specifically, when the processor 1001 runs the computer program, it also executes: performing environment classification on the sample data based on the environmental feature data; and attaching an environment label to the sample data corresponding to each environment category in the classification result.

Specifically, when the processor 1001 runs the computer program, it also executes: calculating the sum of the emotion attribute probabilities of the emotion attribute values characterizing the emotion label in the recognition result, or calculating the sum of the environment attribute probabilities of the environment attribute values characterizing the environment label in the recognition result; determining the emotion attribute value corresponding to the largest emotion attribute probability in the sum of emotion attribute probabilities, or determining the environment attribute value corresponding to the largest environment attribute probability in the sum of environment attribute probabilities; and determining the emotion label corresponding to the emotion attribute value, or alternatively the environment label corresponding to the environment attribute value, as the emotion attribute corresponding to the target data.
Specifically, when the processor 1001 runs the computer program, it also executes: determining, based on the facial feature data, the facial area corresponding to each group photo image in the sample data; and performing network fine-tuning on the face classifier by using the sample data set corresponding to the facial areas, to obtain the fine-tuned face classifier.
Specifically, when the processor 1001 runs the computer program, it also executes: determining, based on the environment feature data, the environment area corresponding to each group photo image in the sample data; and performing network fine-tuning on the environment classifier by using the sample data set corresponding to the environment areas, to obtain the fine-tuned environment classifier.
Specifically, when the processor 1001 runs the computer program, it also executes: determining, based on the human skeleton feature data, the human skeleton key points corresponding to each group photo image in the sample data; and performing network fine-tuning on the human skeleton classifier by using the sample data set corresponding to the human skeleton key points, to obtain the fine-tuned human skeleton classifier.
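Network fine-tuning of the kind described for the face, environment, and skeleton classifiers typically freezes the pretrained feature-extraction layers and updates only the classification head on the region-specific samples (facial areas or skeleton key points). The sketch below is a minimal illustration under that assumption, not the patented procedure: the frozen features are represented by precomputed vectors, and a logistic classification head is trained by stochastic gradient descent on hypothetical data.

```python
import math

def finetune_head(features, labels, dim, lr=0.5, epochs=200):
    """Fine-tune only a logistic classification head on precomputed
    (frozen) features -- a minimal stand-in for network fine-tuning on
    region-specific samples. Returns the trained weights and bias."""
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # sigmoid output
            g = p - y                        # gradient of the log-loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

# Hypothetical frozen features for two emotion classes (0 / 1).
feats = [[0.0, 1.0], [0.1, 0.9], [1.0, 0.0], [0.9, 0.2]]
labs = [0, 0, 1, 1]
w, b = finetune_head(feats, labs, dim=2)
```

Only the head parameters `w` and `b` change during this loop; the frozen backbone would be shared with the pretrained network, which is what makes fine-tuning cheap relative to full retraining.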
In an exemplary embodiment, an embodiment of the present invention further provides a computer-readable storage medium, for example the memory 1002 storing a computer program. The computer program can be executed by the processor 1001 of the emotion data analysis device 1000 to complete the steps of the foregoing method. The computer-readable storage medium may be a memory such as an FRAM, a ROM, a PROM, an EPROM, an EEPROM, a Flash Memory, a magnetic surface memory, an optical disc or a CD-ROM; it may also be any device including one of the above memories or any combination thereof, such as a mobile phone, a computer, a tablet device or a personal digital assistant.
A computer-readable storage medium stores a computer program thereon. When the computer program is run by a processor, it executes: mapping the facial feature data, the environment feature data and/or the human skeleton feature data into the same higher-dimensional space by using a nonlinear mapping function, to obtain new characteristic data other than the facial feature data, the environment feature data and/or the human skeleton feature data; and using the new characteristic data as the fusion feature data.
When the computer program is run by the processor, it also executes: performing emotion classification on the sample data based on the facial feature data; and attaching an emotion label to the sample data corresponding to each emotion category in the classification result.
When the computer program is run by the processor, it also executes: performing environment classification on the sample data based on the environment feature data; and attaching an environment label to the sample data corresponding to each environment category in the classification result.
When the computer program is run by the processor, it also executes: calculating the sum of the emotion attribute probabilities of the emotion attribute values characterized by the emotion labels in the recognition results, or calculating the sum of the environment attribute probabilities of the environment attribute values characterized by the environment labels in the recognition results; determining the emotion attribute value corresponding to the maximum emotion attribute probability among the sums of the emotion attribute probabilities, or determining the environment attribute value corresponding to the maximum environment attribute probability among the sums of the environment attribute probabilities; and determining the emotion label corresponding to the emotion attribute value, or the environment label corresponding to the environment attribute value, as the emotion attribute corresponding to the target data.
When the computer program is run by the processor, it also executes: determining, based on the facial feature data, the facial area corresponding to each group photo image in the sample data; and performing network fine-tuning on the face classifier by using the sample data set corresponding to the facial areas, to obtain the fine-tuned face classifier.
When the computer program is run by the processor, it also executes: determining, based on the environment feature data, the environment area corresponding to each group photo image in the sample data; and performing network fine-tuning on the environment classifier by using the sample data set corresponding to the environment areas, to obtain the fine-tuned environment classifier.
When the computer program is run by the processor, it also executes: determining, based on the human skeleton feature data, the human skeleton key points corresponding to each group photo image in the sample data; and performing network fine-tuning on the human skeleton classifier by using the sample data set corresponding to the human skeleton key points, to obtain the fine-tuned human skeleton classifier.
The above description covers merely specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that can readily occur to those skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention.
Claims (10)
1. A method for analyzing emotion data, the method comprising:
performing feature extraction on sample data of group photo images to obtain facial feature data, environment feature data and human skeleton feature data in the sample data;
performing feature fusion on the facial feature data, the environment feature data and/or the human skeleton feature data to obtain fusion feature data other than the facial feature data, the environment feature data and/or the human skeleton feature data;
training a face classifier based on the facial feature data to obtain a trained face classifier; training an environment classifier based on the environment feature data to obtain a trained environment classifier; training a human skeleton classifier based on the human skeleton feature data to obtain a trained human skeleton classifier; and training an ensemble classifier based on the fusion feature data to obtain a trained ensemble classifier; and
performing image recognition on target data of a group photo image by using the trained face classifier, the trained environment classifier, the trained human skeleton classifier and the trained ensemble classifier respectively, and determining an emotion attribute corresponding to the target data according to the recognition results.
2. The method according to claim 1, wherein performing feature fusion on the facial feature data, the environment feature data and/or the human skeleton feature data to obtain the fusion feature data other than the facial feature data, the environment feature data and/or the human skeleton feature data comprises:
mapping the facial feature data, the environment feature data and/or the human skeleton feature data into the same higher-dimensional space by using a nonlinear mapping function, to obtain new characteristic data other than the facial feature data, the environment feature data and/or the human skeleton feature data; and
using the new characteristic data as the fusion feature data.
3. The method according to claim 1, further comprising:
performing emotion classification on the sample data based on the facial feature data; and
attaching an emotion label to the sample data corresponding to each emotion category in the classification result.
4. The method according to claim 1, further comprising:
performing environment classification on the sample data based on the environment feature data; and
attaching an environment label to the sample data corresponding to each environment category in the classification result.
5. The method according to claim 3 or 4, further comprising:
calculating the sum of the emotion attribute probabilities of the emotion attribute values characterized by the emotion labels in the recognition results, or calculating the sum of the environment attribute probabilities of the environment attribute values characterized by the environment labels in the recognition results;
determining the emotion attribute value corresponding to the maximum emotion attribute probability among the sums of the emotion attribute probabilities, and determining the emotion label corresponding to the emotion attribute value as the emotion attribute corresponding to the target data; or
determining the environment attribute value corresponding to the maximum environment attribute probability among the sums of the environment attribute probabilities, and determining the environment label corresponding to the environment attribute value as the emotion attribute corresponding to the target data.
6. The method according to claim 1, wherein training the face classifier based on the facial feature data to obtain the trained face classifier comprises:
determining, based on the facial feature data, the facial area corresponding to each group photo image in the sample data; and
performing network fine-tuning on the face classifier by using the sample data set corresponding to the facial areas, to obtain a fine-tuned face classifier.
7. The method according to claim 1, wherein training the environment classifier based on the environment feature data to obtain the trained environment classifier comprises:
determining, based on the environment feature data, the environment area corresponding to each group photo image in the sample data; and
performing network fine-tuning on the environment classifier by using the sample data set corresponding to the environment areas, to obtain a fine-tuned environment classifier.
8. The method according to claim 1, wherein training the human skeleton classifier based on the human skeleton feature data to obtain the trained human skeleton classifier comprises:
determining, based on the human skeleton feature data, the human skeleton key points corresponding to each group photo image in the sample data; and
performing network fine-tuning on the human skeleton classifier by using the sample data set corresponding to the human skeleton key points, to obtain a fine-tuned human skeleton classifier.
9. A device for analyzing emotion data, the device comprising:
an extraction unit, configured to perform feature extraction on sample data of group photo images to obtain facial feature data, environment feature data and human skeleton feature data in the sample data;
a fusion unit, configured to perform feature fusion on the facial feature data, the environment feature data and/or the human skeleton feature data to obtain fusion feature data other than the facial feature data, the environment feature data and/or the human skeleton feature data;
a training unit, configured to train a face classifier based on the facial feature data to obtain a trained face classifier, train an environment classifier based on the environment feature data to obtain a trained environment classifier, train a human skeleton classifier based on the human skeleton feature data to obtain a trained human skeleton classifier, and train an ensemble classifier based on the fusion feature data to obtain a trained ensemble classifier; and
a recognition unit, configured to perform image recognition on target data of a group photo image by using the trained face classifier, the trained environment classifier, the trained human skeleton classifier and the trained ensemble classifier respectively, and determine an emotion attribute corresponding to the target data according to the recognition results.
10. A device for analyzing emotion data, the device comprising a memory, a processor, and an executable program stored in the memory and runnable by the processor, wherein the processor, when running the executable program, executes the steps of the method for analyzing emotion data according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811046253.7A CN109508625A (en) | 2018-09-07 | 2018-09-07 | A kind of analysis method and device of affection data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109508625A true CN109508625A (en) | 2019-03-22 |
Family
ID=65745711
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811046253.7A Pending CN109508625A (en) | 2018-09-07 | 2018-09-07 | A kind of analysis method and device of affection data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109508625A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110008371A (en) * | 2019-04-16 | 2019-07-12 | 张怡卓 | A kind of individualized music recommended method and system based on facial expression recognition |
CN111179175A (en) * | 2019-12-27 | 2020-05-19 | 深圳力维智联技术有限公司 | Image processing method and device based on convolutional neural network and storage medium |
CN111832364A (en) * | 2019-04-22 | 2020-10-27 | 普天信息技术有限公司 | Face recognition method and device |
CN112215930A (en) * | 2020-10-19 | 2021-01-12 | 珠海金山网络游戏科技有限公司 | Data processing method and device |
CN112784776A (en) * | 2021-01-26 | 2021-05-11 | 山西三友和智慧信息技术股份有限公司 | BPD facial emotion recognition method based on improved residual error network |
JP2022503426A (en) * | 2019-09-27 | 2022-01-12 | ベイジン センスタイム テクノロジー デベロップメント カンパニー, リミテッド | Human body detection methods, devices, computer equipment and storage media |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20090055426A (en) * | 2007-11-28 | 2009-06-02 | 중앙대학교 산학협력단 | Emotion recognition mothod and system based on feature fusion |
CN105138991A (en) * | 2015-08-27 | 2015-12-09 | 山东工商学院 | Video emotion identification method based on emotion significant feature integration |
CN105739688A (en) * | 2016-01-21 | 2016-07-06 | 北京光年无限科技有限公司 | Man-machine interaction method and device based on emotion system, and man-machine interaction system |
CN107169508A (en) * | 2017-04-17 | 2017-09-15 | 杭州电子科技大学 | A kind of cheongsam Image emotional semantic method for recognizing semantics based on fusion feature |
CN107729835A (en) * | 2017-10-10 | 2018-02-23 | 浙江大学 | A kind of expression recognition method based on face key point region traditional characteristic and face global depth Fusion Features |
- 2018-09-07: CN CN201811046253.7A patent/CN109508625A/en, active, Pending
Non-Patent Citations (2)
Title |
---|
XIN GUO ET AL: "Group-level emotion recognition using deep models on image scene, faces, and skeletons", Proceedings of the 19th ACM International Conference on Multimodal Interaction * |
CHEN GUOQING ET AL: "China Information Systems Research: Opportunities and Challenges in the Context of Emerging Technologies" (《中国信息系统研究 新兴技术背景下的机遇与挑战》), Tongji University Press, 30 November 2011 * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110008371A (en) * | 2019-04-16 | 2019-07-12 | 张怡卓 | A kind of individualized music recommended method and system based on facial expression recognition |
CN111832364A (en) * | 2019-04-22 | 2020-10-27 | 普天信息技术有限公司 | Face recognition method and device |
CN111832364B (en) * | 2019-04-22 | 2024-04-23 | 普天信息技术有限公司 | Face recognition method and device |
JP2022503426A (en) * | 2019-09-27 | 2022-01-12 | ベイジン センスタイム テクノロジー デベロップメント カンパニー, リミテッド | Human body detection methods, devices, computer equipment and storage media |
JP7101829B2 (en) | 2019-09-27 | 2022-07-15 | ベイジン・センスタイム・テクノロジー・デベロップメント・カンパニー・リミテッド | Human body detection methods, devices, computer equipment and storage media |
CN111179175A (en) * | 2019-12-27 | 2020-05-19 | 深圳力维智联技术有限公司 | Image processing method and device based on convolutional neural network and storage medium |
CN111179175B (en) * | 2019-12-27 | 2023-04-07 | 深圳力维智联技术有限公司 | Image processing method and device based on convolutional neural network and storage medium |
CN112215930A (en) * | 2020-10-19 | 2021-01-12 | 珠海金山网络游戏科技有限公司 | Data processing method and device |
CN112784776A (en) * | 2021-01-26 | 2021-05-11 | 山西三友和智慧信息技术股份有限公司 | BPD facial emotion recognition method based on improved residual error network |
CN112784776B (en) * | 2021-01-26 | 2022-07-08 | 山西三友和智慧信息技术股份有限公司 | BPD facial emotion recognition method based on improved residual error network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109508625A (en) | A kind of analysis method and device of affection data | |
CN109359538B (en) | Training method of convolutional neural network, gesture recognition method, device and equipment | |
US20210271862A1 (en) | Expression recognition method and related apparatus | |
He et al. | Supercnn: A superpixelwise convolutional neural network for salient object detection | |
CN110532920B (en) | Face recognition method for small-quantity data set based on FaceNet method | |
Zhang et al. | End-to-end photo-sketch generation via fully convolutional representation learning | |
CN112990054B (en) | Compact linguistics-free facial expression embedding and novel triple training scheme | |
Guo et al. | Automatic image cropping for visual aesthetic enhancement using deep neural networks and cascaded regression | |
Nakajima et al. | Full-body person recognition system | |
CN109492638A (en) | Method for text detection, device and electronic equipment | |
Kadam et al. | Detection and localization of multiple image splicing using MobileNet V1 | |
CN109886153A (en) | A kind of real-time face detection method based on depth convolutional neural networks | |
CN112183435A (en) | Two-stage hand target detection method | |
CN107967461A (en) | The training of SVM difference models and face verification method, apparatus, terminal and storage medium | |
CN109325408A (en) | A kind of gesture judging method and storage medium | |
CN112257665A (en) | Image content recognition method, image recognition model training method, and medium | |
CN111291713B (en) | Gesture recognition method and system based on skeleton | |
Wang et al. | Labanotation generation from motion capture data for protection of folk dance | |
CN112819510A (en) | Fashion trend prediction method, system and equipment based on clothing multi-attribute recognition | |
CN114627312B (en) | Zero sample image classification method, system, equipment and storage medium | |
Fu et al. | Complementarity-aware Local-global Feature Fusion Network for Building Extraction in Remote Sensing Images | |
Begum et al. | A novel approach for multimodal facial expression recognition using deep learning techniques | |
CN113688864B (en) | Human-object interaction relation classification method based on split attention | |
CN115083011A (en) | Sign language understanding visual gesture recognition method and system based on deep learning, computer device and storage medium | |
CN114998702A (en) | Entity recognition and knowledge graph generation method and system based on BlendMask |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190322 |