CN109508625A - Analysis method and device for affective data - Google Patents
Analysis method and device for affective data
- Publication number: CN109508625A
- Application number: CN201811046253.7A
- Authority: CN (China)
- Prior art keywords: data, skeleton, classifier, characteristic, trained
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion)
Classifications
- G—PHYSICS › G06—COMPUTING; CALCULATING OR COUNTING
- G06V40/172—Human faces: classification, e.g. identification
- G06F18/24—Pattern recognition: classification techniques
- G06V40/174—Facial expression recognition
Abstract
The invention discloses an analysis method for affective data, the method comprising: performing feature extraction on sample data of a group photo image to obtain the facial feature data, environmental feature data and human skeleton feature data in the sample data; performing feature fusion on the facial feature data, the environmental feature data and/or the human skeleton feature data to obtain fused feature data distinct from the facial feature data, the environmental feature data and/or the human skeleton feature data; and, using the trained face classifier, the trained environment classifier, the trained skeleton classifier and the trained fusion classifier, respectively performing image recognition on target data of a group photo image and determining the emotion attribute corresponding to the target data according to the recognition results. The invention also provides an analysis device for affective data.
Description
Technical field
The present invention relates to data analysis technology, and in particular to an analysis method and device for affective data.
Background technique
The prior art can only perform emotion recognition on a picture containing a single person; it cannot perform emotion recognition on a group photo containing multiple people.
Summary of the invention
To solve the above technical problem, embodiments of the present invention provide an analysis method and device for affective data.
The technical solution of the embodiments of the present invention is achieved as follows:
According to one aspect of the embodiments of the present invention, an analysis method for affective data is provided, the method comprising:

performing feature extraction on sample data of a group photo image to obtain the facial feature data, environmental feature data and human skeleton feature data in the sample data;

performing feature fusion on the facial feature data, the environmental feature data and/or the human skeleton feature data to obtain fused feature data distinct from the facial feature data, the environmental feature data and/or the human skeleton feature data;

training a face classifier based on the facial feature data to obtain a trained face classifier; training an environment classifier based on the environmental feature data to obtain a trained environment classifier; training a skeleton classifier based on the human skeleton feature data to obtain a trained skeleton classifier; training a fusion classifier based on the fused feature data to obtain a trained fusion classifier;

using the trained face classifier, the trained environment classifier, the trained skeleton classifier and the trained fusion classifier, respectively performing image recognition on target data of a group photo image, and determining the emotion attribute corresponding to the target data according to the recognition results.
In the above scheme, performing feature fusion on the facial feature data, the environmental feature data and/or the human skeleton feature data to obtain the fused feature data distinct from the facial feature data, the environmental feature data and/or the human skeleton feature data comprises:

using a nonlinear mapping function to map the facial feature data, the environmental feature data and/or the human skeleton feature data into the same high-dimensional space, obtaining new feature data distinct from the facial feature data, the environmental feature data and/or the human skeleton feature data;

taking the new feature data as the fused feature data.
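As an illustration only (the patent does not specify the mapping function), the fusion step can be sketched by passing each feature vector through a nonlinear mapping into a shared space and concatenating the results; the `tanh` nonlinearity and the weight layout here are assumptions, not part of the patent:

```python
import math

def nonlinear_map(features, weights, bias):
    """Map one feature vector into the shared space; `weights` is a list
    of rows, one per output dimension, and tanh supplies the nonlinearity."""
    return [math.tanh(sum(w * x for w, x in zip(row, features)) + bias)
            for row in weights]

def fuse(face, env, skeleton, maps):
    """Concatenate the nonlinearly mapped facial, environmental and skeleton
    features into one fused vector distinct from any single input."""
    fused = []
    for feats, (weights, bias) in zip((face, env, skeleton), maps):
        fused.extend(nonlinear_map(feats, weights, bias))
    return fused
```

In practice each modality would have its own learned weights; the sketch merely illustrates that the fused data is a new vector rather than any one of its inputs.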
In the above scheme, the method further comprises:

performing emotion classification on the sample data based on the facial feature data;
attaching an emotion label to the sample data corresponding to each emotion category in the classification results.

In the above scheme, the method further comprises:

performing environment classification on the sample data based on the environmental feature data;
attaching an environment label to the sample data corresponding to each environment category in the classification results.
In the above scheme, the method further comprises:

calculating, over the recognition results, the sum of the emotion attribute probabilities for each emotion attribute value characterizing an emotion label; or calculating, over the recognition results, the sum of the environment attribute probabilities for each environment attribute value characterizing an environment label;

determining the emotion attribute value whose summed emotion attribute probability is largest, and taking the emotion label corresponding to that emotion attribute value as the emotion attribute of the target data;

alternatively, determining the environment attribute value whose summed environment attribute probability is largest, and taking the environment label corresponding to that environment attribute value as the emotion attribute of the target data.
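A minimal sketch of the probability-sum decision described above (the patent gives no implementation; names are illustrative): each trained classifier reports a probability per label, the probabilities are summed per label, and the label with the largest sum becomes the emotion attribute of the target data:

```python
def fuse_predictions(classifier_outputs):
    """Sum, per label, the probabilities reported by each classifier and
    return the label whose summed probability is largest."""
    totals = {}
    for probs in classifier_outputs:  # one {label: probability} dict per classifier
        for label, p in probs.items():
            totals[label] = totals.get(label, 0.0) + p
    return max(totals, key=totals.get)
```

For example, if the face, environment and skeleton classifiers report 0.7, 0.4 and 0.9 for "positive", the summed probability 2.0 outweighs the complementary "negative" sum and "positive" is selected.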
In the above scheme, training the face classifier based on the facial feature data to obtain the trained face classifier comprises:

determining, based on the facial feature data, the face region corresponding to each group photo image in the sample data; and performing network fine-tuning on the face classifier using the sample data set corresponding to the face regions, obtaining the fine-tuned face classifier.

In the above scheme, training the environment classifier based on the environmental feature data to obtain the trained environment classifier comprises:

determining, based on the environmental feature data, the environment region corresponding to each group photo image in the sample data; and performing network fine-tuning on the environment classifier using the sample data set corresponding to the environment regions, obtaining the fine-tuned environment classifier.

In the above scheme, training the skeleton classifier based on the human skeleton feature data to obtain the trained skeleton classifier comprises:

determining, based on the human skeleton feature data, the human skeleton key points corresponding to each group photo image in the sample data; and performing network fine-tuning on the skeleton classifier using the sample data set corresponding to the skeleton key points, obtaining the fine-tuned skeleton classifier.
According to another aspect of the embodiments of the present invention, an analysis device for affective data is provided, the device comprising:

an extraction unit, configured to perform feature extraction on sample data of a group photo image to obtain the facial feature data, environmental feature data and human skeleton feature data in the sample data;

a fusion unit, configured to perform feature fusion on the facial feature data, the environmental feature data and/or the human skeleton feature data to obtain fused feature data distinct from the facial feature data, the environmental feature data and/or the human skeleton feature data;

a training unit, configured to train a face classifier based on the facial feature data to obtain a trained face classifier; train an environment classifier based on the environmental feature data to obtain a trained environment classifier; train a skeleton classifier based on the human skeleton feature data to obtain a trained skeleton classifier; and train a fusion classifier based on the fused feature data to obtain a trained fusion classifier;

a recognition unit, configured to use the trained face classifier, the trained environment classifier, the trained skeleton classifier and the trained fusion classifier to respectively perform image recognition on target data of a group photo image, and determine the emotion attribute corresponding to the target data according to the recognition results.
According to a third aspect of the embodiments of the present invention, an analysis device for affective data is provided, the device comprising: a memory, a processor, and an executable program stored in the memory and run by the processor, wherein the processor, when running the executable program, executes the steps of the analysis method for affective data described in any one of the above.
In the technical solution of the embodiments of the present invention, an analysis method and device for affective data are provided. By extracting the facial feature data, environmental feature data and human skeleton feature data in the sample data, emotion-intensity analysis is performed on the crowd in a group photo image; and by performing feature fusion on the facial feature data and the environmental feature data, fused feature data distinct from the facial feature data, the environmental feature data and/or the human skeleton feature data are obtained, so that the overall emotional intensity of the multiple people in the group photo image can be obtained, thereby improving emotion recognition for group photo images.
Brief description of the drawings
Fig. 1 is a flow diagram of the analysis method for affective data in an embodiment of the present invention;
Fig. 2 is a flow diagram of the cascaded convolutional neural network;
Fig. 3 is a schematic diagram of the basic fine-tuning procedure;
Fig. 4 is a schematic diagram of the facial expression recognition principle based on fine-tuning the VGG-FACE model;
Fig. 5 is a schematic diagram of the network structure of the VGG-FACE model;
Fig. 6 is a schematic diagram of the convolution operation in a neural network;
Fig. 7 is a schematic diagram of the max-pooling operation;
Fig. 8 is a schematic diagram of the structure of the Inception module;
Fig. 9 is a first schematic diagram of the structure of the analysis device for affective data in an embodiment of the present invention;
Fig. 10 is a second schematic diagram of the structure of the analysis device for affective data.
Specific embodiments
Preferred embodiments are described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here are only intended to illustrate and explain the present invention, and are not intended to limit it.
Fig. 1 is a flow diagram of the analysis method for affective data in an embodiment of the present invention. As shown in Fig. 1, the method includes:
Step 101: perform feature extraction on the sample data of a group photo image to obtain the facial feature data, environmental feature data and human skeleton feature data in the sample data.

Here, a group photo image may specifically be an image that includes at least two people.

Specifically, this method is mainly applied to an image processing terminal with image processing functions; the image processing terminal may be a mobile phone, a computer, a monitoring device, etc.

First, before performing feature extraction on the sample data of group photo images, the image processing terminal obtains from the network, through a web crawler, a large number of group photo images each containing at least two people; the number of group photo images obtained may exceed 100,000.

Then, the image processing terminal screens the large number of group photo images obtained. Specifically, the image processing terminal deletes the invalid images among the group photos and retains only the valid group photo images, which are saved as the sample data of group photo images for subsequent training of the image models.

Here, an invalid image may be an image in which no face is present or the faces are unclear.

In this application, after the image processing terminal obtains the valid group photo images, it performs feature extraction on the sample data of the group photo images to obtain the facial feature data, environmental feature data and human skeleton feature data in the sample data.
Specifically, when performing facial feature extraction on the sample data of a group photo image, the image processing terminal may perform face detection on the group photo image corresponding to the sample data using face recognition technology, and determine the face positions from the detection results. Then, the face pictures in the group photo image are cropped based on the face positions, and the cropped face pictures are fed into a face recognition model to obtain the facial feature data of the group photo image.

Here, the image processing terminal may specifically use a cascaded convolutional neural network (the MTCNN algorithm) to detect facial landmark points in the group photo image, and then use the detected facial landmarks to calibrate the face positions in the group photo image.

Specifically, MTCNN is a cascaded convolutional neural network framework that integrates the two tasks of face detection and facial landmark localization through multi-task learning. The MTCNN network structure mainly comprises three stages, each consisting of a convolutional neural network (CNN).
First, in the first stage of the MTCNN network, a shallow convolutional neural network (Proposal Network, P-Net) obtains candidate face-region windows and bounding-box regression vectors from the group photo image, uses the bounding-box regression to calibrate the candidate windows, and then merges highly overlapping candidate boxes by non-maximum suppression (NMS), quickly generating a large number of candidate windows.

Second, in the second stage of the MTCNN network, a convolutional neural network more complex than P-Net (Refine Network, R-Net) refines the candidate windows to exclude non-face windows.

Finally, in the third stage of the MTCNN network, an even more complex convolutional neural network (Output Network, O-Net) optimizes the output windows again, while outputting the coordinates of five facial landmark points. Here, the five facial landmarks comprise: the nose tip, the left and right mouth corners, and the left and right eyes. This is shown in Fig. 2.
Fig. 2 is a flow diagram of the cascaded convolutional neural network. As shown in Fig. 2:

First, the input group photo image is scaled to different sizes to form an image pyramid, and the resulting image pyramid is used as the input image.
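The image pyramid can be sketched as follows; the scale factor 0.709 and the 12-pixel minimum detection window are conventional MTCNN settings assumed here, not values stated in the patent:

```python
def pyramid_scales(height, width, min_face=12, factor=0.709):
    """Return the scales at which the input image is resized so that a
    12 x 12 P-Net window can detect progressively larger faces."""
    scales = []
    m = min_face / 12.0          # 1.0 when detecting faces down to 12 px
    min_side = min(height, width) * m
    scale = m
    while min_side >= 12:        # stop once the image is too small for P-Net
        scales.append(scale)
        scale *= factor
        min_side *= factor
    return scales
```

Each scale produces one level of the pyramid, and P-Net is run on every level.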
Then, in the first stage (Proposal Network, P-Net), the fully convolutional P-Net obtains candidate face-region windows and their bounding-box regression vectors from the input image, corrects the candidate windows using bounding-box regression, and then merges highly overlapping candidate boxes using non-maximum suppression (NMS).

In the second stage (Refine Network, R-Net), R-Net improves the candidate windows. The candidate windows that passed P-Net are input into R-Net, which rejects most of the false (non-face) windows, again using bounding-box regression and non-maximum suppression to merge overlapping candidate boxes.

In the third stage (Output Network, O-Net), O-Net outputs the final face boxes and landmark positions. The third stage is similar to the second, except that in the third stage the positions of the five facial landmark points are also produced.
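The non-maximum suppression step used by all three stages can be sketched as follows (an illustrative implementation, not the patent's; boxes are (x1, y1, x2, y2) tuples):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def nms(boxes, scores, threshold=0.5):
    """Keep the highest-scoring box, drop candidates that overlap it by
    more than `threshold`, and repeat on the remaining candidates."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= threshold]
    return keep
```

This is how the "highly overlapping candidate boxes" described above are merged into a single detection per face.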
In this application, after the image processing terminal has detected the face images in a group photo, the detected face images may also be screened. Specifically, face pictures that are suitably sized, clear and frontal can be kept, while faces that are too small or blurred are discarded, so as to improve the quality of the trained model.

Here, the face images extracted from a group photo by the image processing terminal are rectangular pictures containing faces, and each group photo has at least two such rectangular pictures. When the short side of a rectangular picture is smaller than 24 pixels, that picture can be discarded directly, and the largest of the rectangular face pictures is saved.
In this application, after obtaining the face pictures in the group photo images, the image processing terminal may also use the cropped face pictures to fine-tune a preset VGG-FACE model; specifically, the VGG-FACE model is adjusted using a fine-tuning method.

In this application, the face pictures obtained by the image processing terminal are fed into the fine-tuned VGG-FACE model to obtain facial emotion features; the fine-tuned VGG-FACE model can recognize both the emotion shown by a face in a face picture and its intensity. In other words, a facial emotion feature carries both the emotion shown by the face, such as "happiness", "anger" or "sadness", and the intensity grade of that emotion, such as "joyful", "great rejoicing" or "wild with joy". The basic fine-tuning procedure is shown in Fig. 3.
Fig. 3 is a schematic diagram of the neural network fine-tuning principle. As shown in Fig. 3, the parameters of each layer are adjusted using the backpropagation algorithm. Here, the object of fine-tuning may be all network layers, or only designated layers. Specifically, the network layers include convolutional layers, pooling layers and Inception layers.

Specifically, when the data set used for fine-tuning is not sufficiently large, generally only the parameters of the higher layers of the network are fine-tuned, in order to prevent overfitting; moreover, the extracted features differ considerably as the convolution depth increases. It is generally believed that the lower-level convolutional features are generic for image data and can be regarded as edge detectors or color-block detectors, while with increasing depth the features contain more of the detail of the current data set.
In this application, the VGG-FACE model, trained mainly on a large amount of face data (containing 2.6M face images covering about 2.6K different identities), is selected for fine-tuning. The VGG-FACE model is a very deep but structurally simple and effective convolutional neural network, used mainly for training on face images. The principle of facial expression recognition based on fine-tuning the VGG-FACE model is shown in Fig. 4.

In Fig. 4, the network comprises 11 computing modules in total, each consisting of one linear processor and one or more nonlinear processors (mainly ReLU nonlinear units and max-pooling filters). The first eight computing modules are called convolution modules; their linear processor is a bank of linear filters mainly performing linear convolution. The last three computing modules are called fully connected modules; their linear processor is also a bank of linear filters, the difference being that the convolution kernel of each filter has the same size as the input data, i.e., each filter processes the entire input image directly.
Fig. 5 is a schematic diagram of the network structure of the VGG-FACE model; Fig. 5 gives, for each convolutional layer, the filter size, the number of filters, the convolution stride and the amount of outward padding.
When fine-tuning the VGG-FACE model, the sizes of the group photo images to be trained are first adjusted to 224 x 224 (pixels); then 80% of the images to be trained are taken out as the training sample set and the remaining 20% as the test sample set. The face pictures obtained from the group photo images are input into the VGG-FACE model to obtain the facial emotion features of the group photo images.
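The 80/20 split described above can be sketched as follows (the shuffle and the seed are our assumptions; the patent only specifies the 80%/20% proportions):

```python
import random

def split_dataset(images, train_fraction=0.8, seed=0):
    """Shuffle the group photo images and split them into a training
    sample set and a test sample set in the stated proportions."""
    images = list(images)
    random.Random(seed).shuffle(images)
    cut = int(len(images) * train_fraction)
    return images[:cut], images[cut:]
```

Every image lands in exactly one of the two sets, preserving the 80/20 ratio.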
In Fig. 5, "conv" denotes a convolutional layer; a convolutional layer extracts image features by performing convolution operations on the group photo image. In a convolutional neural network, each convolutional layer usually contains multiple trainable convolution templates (i.e., convolution kernels), and different templates correspond to different image features. After a convolution kernel is convolved with the input image and processed by a nonlinear activation function (e.g., the sigmoid, ReLU or ELU function), the corresponding feature map is obtained. The parameters of the convolution kernels are usually computed using a specific learning algorithm (e.g., stochastic gradient descent). Convolution refers to the operation of taking a weighted sum of the template parameters and the pixel values at the corresponding positions in the image. A typical convolution process is shown in Fig. 6.
Fig. 6 illustrates the convolution operation in a neural network. In Fig. 6, convolving a 4 x 4 matrix with a 2 x 2 kernel yields a 3 x 3 convolution result; specifically, by sliding the template window and performing the convolution operation at every position of the input image, the corresponding feature map is obtained.
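A minimal sketch of this operation, matching the Fig. 6 shapes (a 4 x 4 input and a 2 x 2 kernel yield a 3 x 3 feature map); as in most CNN frameworks, the kernel is applied without flipping, i.e. as cross-correlation:

```python
def convolve2d(image, kernel):
    """Valid-mode 2-D convolution: slide the kernel window over every
    position of the input and take the weighted sum of the overlap."""
    kh, kw = len(kernel), len(kernel[0])
    oh = len(image) - kh + 1          # output height: 4 - 2 + 1 = 3
    ow = len(image[0]) - kw + 1       # output width:  4 - 2 + 1 = 3
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(ow)]
            for r in range(oh)]
```

Each output entry is the weighted sum of one 2 x 2 window of the input, which is exactly the feature-map computation described above.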
Compared with traditional neural networks, the outstanding advantage of convolutional neural networks is that they abandon the "fully connected" design between adjacent layers of traditional networks; through local connectivity and weight sharing, the number of model parameters to train is greatly reduced, reducing the amount of computation.

Local connectivity means that in a convolutional neural network each neuron is connected to a local region of the input image, rather than fully connected to all neurons of the input image. Weight sharing means that the connection parameters (i.e., the convolution kernel parameters) are shared across different regions of the input image. In addition, the locally connected, weight-sharing design gives the features extracted by the network a high degree of stability, making them insensitive to translation, scaling, deformation and the like.
In Fig. 5, "pool" denotes a pooling layer. Pooling layers usually appear in pairs with convolutional layers: placed after a convolutional layer, the pooling layer downsamples the input feature maps. An input image typically yields a large number of feature maps after convolution, and an excessively high feature dimension leads to a sharp increase in the network's computation. The pooling layer therefore greatly reduces the number of model parameters by reducing the dimension of the feature maps.
In Fig. 5, RELU is the rectified linear unit (Rectified Linear Unit) function, a common activation function in artificial neural networks, generally referring to the nonlinear functions represented by the ramp function and its variants. The ReLU function has the form:

θ(x) = max(0, x)    (1)

For an input x entering a neuron from the previous layer of the network, a neuron using the rectified linear activation outputs max(0, x), either to the next layer of neurons or as the output of the whole network.
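Equation (1) translates directly into code; applied elementwise to a layer's pre-activations it implements the ReLU unit described above:

```python
def relu(x):
    """Rectified linear unit: theta(x) = max(0, x) from equation (1)."""
    return max(0.0, x)

def relu_layer(vector):
    """Apply the ReLU activation elementwise to a layer's outputs."""
    return [relu(v) for v in vector]
```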
In this way, the computation of the network is reduced on the one hand, and the risk of network overfitting is reduced on the other. The feature maps produced by the pooling layer correspond one-to-one to the feature maps of the convolutional layer, so the pooling operation only reduces the dimension of the feature maps; their number does not change.
The pooling methods currently common in convolutional neural networks are: max pooling (Max Pooling), mean pooling (Mean Pooling) and stochastic pooling (Stochastic Pooling). For a sampling subregion, max pooling selects the point with the largest pixel value as the output of the region; mean pooling computes the mean of all pixels in the region and uses that mean as the output of the sampling region; stochastic pooling randomly selects one pixel value from the sampling region as the output, with larger pixel values generally having a higher probability of being selected. The max-pooling process is shown in Fig. 7.
Fig. 7 illustrates the max-pooling operation in a neural network. In Fig. 7, a 4 x 4 image comprises the feature regions 1, 2, 3 and 4, where region 1 contains the 4 pixel values (1, 1, 5, 6); region 2 contains the 4 pixel values (2, 4, 7, 8); region 3 contains the 4 pixel values (3, 2, 1, 2); and region 4 contains the 4 pixel values (1, 0, 3, 4). Choosing the largest pixel value in each feature region as the output of that region gives "6" for region 1, "8" for region 2, "3" for region 3 and "4" for region 4.
In this application, after obtaining the facial feature data in the sample data of the group photo images, the image processing terminal may further perform emotion classification on the sample data based on the facial feature data, and attach an emotion label to the sample data corresponding to each emotion category in the classification results.

For example, using an existing face recognition approach, for example the MTCNN algorithm, emotion classification is performed on the group photo images, dividing the large number of acquired group photo images into three sample sets: "positive emotion" samples, "calm" samples and "negative emotion" samples. Of course, the emotion division of the group photo images may also be carried out semi-automatically or manually.

After obtaining the emotion classification results for the sample data, the sample data corresponding to each emotion category in the classification results can be given an emotion label. For example, pictures in which the people in the group photo show positive emotions such as happiness or surprise are labeled "positive emotion" samples; pictures in which the people in the group photo show no particular expression are labeled "calm" samples; and pictures in which the people in the group photo show negative emotions such as sadness, anger or fear are labeled "negative emotion" samples.
In this application, when extracting environmental features from the sample data of group photo images, the image processing terminal may first adjust the sizes of the group photo images to 224 x 224; then 80% of the images to be trained are taken out as the training sample set and the remaining 20% as the test sample set. The pictures obtained from the group photo images are then input into the deep learning GoogLeNet and VGG-16 structured networks for fine-tuning; at this point the number of nodes of the last fully connected layer of the GoogLeNet and VGG-16 networks needs to be set to 3 or 9, and the features output by the pooling layer are extracted as the environmental features. The structure of the GoogLeNet network is shown in Table 1:
Table 1
The specific structure of the Inception module in Table 1 is shown in Fig. 8.
Fig. 8 is a schematic diagram of the structure of the Inception module. In Fig. 8, by stacking 3 x 3 convolutions and 1 x 1 convolutions together, the width of the network is increased on the one hand, and the network's adaptability to scale is increased on the other.
In Table 1, a Dropout layer means that during network training a portion of the neurons are ignored at random, left inactive; on the one hand this method speeds up computation, and on the other it reduces the risk of model overfitting. With p = 0.5, 50% of the neurons are ignored at random.
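A minimal sketch of this behavior; the 1/(1-p) rescaling of surviving activations is the common "inverted dropout" convention, an assumption beyond the patent's description, which only states that neurons are ignored at random:

```python
import random

def dropout(vector, p=0.5, rng=None):
    """Training-mode dropout: zero each activation with probability p,
    scaling survivors by 1/(1-p) so the expected output is unchanged."""
    rng = rng or random.Random()
    return [0.0 if rng.random() < p else v / (1.0 - p) for v in vector]
```

At inference time the layer is simply skipped (identity), which is why the scaling is applied during training.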
The specific structure of the VGG-16 network module is shown in Table 2:
Table 2
In this application, after obtaining the environmental feature data in the sample data of the group photo images, the image processing terminal may further perform environment classification on the sample data based on the environmental feature data, and attach an environment label to the sample data corresponding to each environment category in the classification results.
For example, environments can be divided into 9 grades. Environments related to a "wedding" or "dinner party" can be labeled with a "great rejoicing" atmosphere, while pleasant environments such as a "park" or "flowers and plants" can be labeled with a "joyful" atmosphere. Specifically, the atmosphere grade of a label can be represented by a number: for example, the nine digits "-4" to "+4" (or other labels such as 0-8) can represent the environmental atmosphere, with "4" denoting "wild with joy", "3" "great rejoicing", "2" "joyful", "1" "pleased", "0" "calm", "-1" "sad", "-2" "sadness", "-3" "grief" and "-4" "sorrow". The above atmosphere grade can be marked on a picture by means of a tag, so that the picture can then be obtained quickly through search.
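As an illustration, the nine-grade atmosphere labelling could be held in a simple lookup table. The grade-to-name mapping below just transcribes the text; the table and function names are hypothetical:

```python
# Hypothetical lookup table for the nine atmosphere grades (-4 .. +4)
# described in the text.
ATMOSPHERE_GRADES = {
    4: "wild with joy",
    3: "great rejoicing",
    2: "joyful",
    1: "pleased",
    0: "calm",
    -1: "sad",
    -2: "sadness",
    -3: "grief",
    -4: "sorrow",
}

def label_scene(grade):
    """Return the atmosphere label for a numeric grade."""
    return ATMOSPHERE_GRADES[grade]
```

A picture tagged with the numeric grade can then be mapped to its human-readable label at search time.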
In the application, when the image processing terminal performs human-skeleton feature extraction on the sample data of a group photo image, it can specifically use the trained OpenPose code to extract the key points of the human skeleton from the group photo image corresponding to the sample data. Nearly 130 skeleton key points can be extracted in total, covering face + body movement + hands. The extracted skeleton images are then used to fine-tune the pre-trained GoogLeNet and ResNet50 models respectively.

In the application, the skeleton image after extraction has the same size as the original group photo image and contains only the human skeleton features.
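For orientation, the keypoint count can be reconstructed from the standard OpenPose model sizes. The group sizes below are assumptions based on the usual BODY_25, face and hand models, and roughly match the "nearly 130 key points" figure in the text:

```python
# Assumed OpenPose keypoint groups (standard model sizes, not taken
# from the patent itself): BODY_25 body model, 70 face points, and
# 21 points per hand.
KEYPOINT_GROUPS = {
    "body": 25,
    "face": 70,
    "left_hand": 21,
    "right_hand": 21,
}

total = sum(KEYPOINT_GROUPS.values())
print(total)  # 137
```

The sum of 137 is in the same ballpark as the "nearly 130" figure; the exact count depends on which OpenPose models are enabled.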
Step 102: perform feature fusion on the facial feature data, the environmental feature data and/or the human skeleton feature data, to obtain fusion feature data other than the facial feature data, the environmental feature data and/or the human skeleton feature data;
In the application, a nonlinear mapping function can specifically be used to map the facial feature data, the environmental feature data and/or the human skeleton feature data into the same high-dimensional space, obtaining new feature data other than the facial feature data, the environmental feature data and/or the human skeleton feature data; the new feature data is then taken as the fusion feature data.
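A minimal sketch of such a nonlinear mapping into a shared space is shown below. The fixed random projection followed by a tanh nonlinearity is an illustrative stand-in for the patent's unspecified mapping function, and the output dimension is an arbitrary choice:

```python
import numpy as np

def nonlinear_fuse(features, out_dim=64, seed=0):
    """Concatenate the modality feature vectors and map them through a
    fixed random nonlinear projection (tanh) into one shared space.
    The result is a new 'fusion' vector distinct from any input modality.
    out_dim, the seed and the tanh map are illustrative assumptions."""
    x = np.concatenate(features)
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((out_dim, x.size)) / np.sqrt(x.size)
    return np.tanh(W @ x)

face = np.random.default_rng(1).standard_normal(4096)    # e.g. face features
scene = np.random.default_rng(2).standard_normal(4096)   # e.g. scene features
fused = nonlinear_fuse([face, scene])
print(fused.shape)  # (64,)
```

In practice the mapping would be learned (e.g. by the kernel methods discussed in the text) rather than fixed at random; this sketch only shows the shape of the operation.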
Specifically, in the neural network, the image processing terminal can fuse the 4096-dimensional features output by the fully connected layer before the softmax layer.
The specific fusion method can be chosen from: kernel canonical correlation analysis (KCCA, Kernel Canonical Correlation Analysis), kernel matrix fusion (KMF, Kernel Matrix Fusion), kernel cross-factor analysis (KCFA, Kernel Cross Factor Analysis), etc. In this way, a fusion feature other than the above facial feature data, environmental feature data and/or human skeleton feature data is obtained from the same model, which can improve the recognition accuracy for face pictures in a group photo image.
Here, (1) feature fusion based on the KCCA algorithm is implemented as follows:

Assume that sample X = (x_1, x_2, ..., x_n)^T ∈ R^(n×p) and sample Y = (y_1, y_2, ..., y_n)^T ∈ R^(n×q) are two zero-mean feature matrices, where each row x_i and y_i of a matrix denotes a feature vector, the row count n denotes the size of the data set characterized by the feature matrix, and p, q denote the dimensions of the two kinds of features respectively. The feature vectors in the same row of the two feature matrices form the feature pairs {(x_1, y_1), (x_2, y_2), ..., (x_n, y_n)}, where each pair comes from two different modalities. The purpose of canonical correlation analysis is to find mapping matrices α = (α_1, α_2, ..., α_d) and β = (β_1, β_2, ..., β_d), d ≤ min(p, q), such that the following holds:

ρ = max(α,β) α^T C_xy β / √((α^T C_xx α)·(β^T C_yy β))   (3)

where C_xx = X^T X, C_yy = Y^T Y and C_xy = X^T Y denote the auto-covariance matrices of the two feature matrices and their cross-covariance matrix respectively. The above optimization problem can be converted into solving the eigenvalue problem:

C_xx^-1 C_xy C_yy^-1 C_yx α = ρ^2 α   (4)
Canonical correlation analysis is based on a linear space and cannot capture the nonlinear relationship between features of different modalities. Therefore, the kernel method is introduced on the basis of canonical correlation analysis, giving the kernel canonical correlation analysis (KCCA) method, which adds a nonlinear property to the original canonical correlation analysis algorithm. The basic idea of the KCCA algorithm is similar to that of the nonlinear support vector machine: the original feature matrices X and Y are mapped to a higher-dimensional space, i.e., the kernel spaces X' and Y', and the correlation analysis is carried out in the kernel space.
The optimization function of kernel canonical correlation analysis is:

ρ = max(α,β) α^T K_x K_y β / √((α^T K_x^2 α)·(β^T K_y^2 β))   (5)

where K_x and K_y are kernel matrices satisfying K_x = X'^T X' and K_y = Y'^T Y'. As with CCA, solving the above optimization function can also be converted into an eigenvalue problem. Since matrix inversion is involved in the eigenvalue solution process and a kernel matrix is not guaranteed to be invertible, to solve this problem formula (5) is regularized:

ρ = max(α,β) α^T K_x K_y β / √((α^T ((1-t)K_x^2 + t·K_x) α)·(β^T ((1-t)K_y^2 + t·K_y) β))   (6)

where 0 ≤ t ≤ 1 is the regularization coefficient.
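A compact NumPy sketch of regularized kernel CCA is given below. It uses one standard eigenproblem reduction of the KCCA objective, with a ridge term (K + r·I) standing in for the regularization, so the details are illustrative rather than the patent's exact formulation:

```python
import numpy as np

def kcca(Kx, Ky, reg=0.1):
    """Leading canonical correlation of regularized kernel CCA, via the
    eigenproblem (Kx + r*I)^-1 Ky (Ky + r*I)^-1 Kx a = rho^2 a.
    Returns (rho, alpha); the ridge regularizer is an assumption."""
    n = Kx.shape[0]
    I = np.eye(n)
    A = np.linalg.solve(Kx + reg * I, Ky)   # (Kx + r*I)^-1 Ky
    B = np.linalg.solve(Ky + reg * I, Kx)   # (Ky + r*I)^-1 Kx
    vals, vecs = np.linalg.eig(A @ B)
    i = int(np.argmax(vals.real))
    rho = float(np.sqrt(max(vals[i].real, 0.0)))
    return rho, vecs[:, i].real

# Two noisy views of the same latent signal should be highly correlated.
rng = np.random.default_rng(0)
z = rng.standard_normal((50, 2))
X = z + 0.1 * rng.standard_normal((50, 2))
Y = z + 0.1 * rng.standard_normal((50, 2))
Kx, Ky = X @ X.T, Y @ Y.T                   # linear kernels for simplicity
rho, alpha = kcca(Kx, Ky)
```

With nearly identical views the leading canonical correlation comes out close to 1, which is the behaviour the regularized objective (6) is after.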
(2) Feature fusion based on KMF is implemented as follows:

The idea of kernel matrix fusion is to find, for two different modalities, a common subspace that can characterize the features of the two modalities to the greatest extent. Assume X = (x_1, x_2, ..., x_n)^T and Y = (y_1, y_2, ..., y_n)^T respectively correspond to the features extracted from the two modalities of the sample data, and K_x and K_y respectively correspond to the kernel matrices of the two modalities, satisfying

K_x(i, j) = κ(x_i, x_j), K_y(i, j) = κ(y_i, y_j)

where κ(·, ·) denotes the kernel function. Kernel matrix fusion combines the above two kernel matrices through algebraic operations: either the weighted sum or the product of the elements at corresponding positions in the kernel matrices can be chosen as the elements of the combined kernel matrix. The former fusion method is chosen here, and the fused matrix can be expressed as:
Kf=aKx+bKy (7)
where a + b = 1. After the fused matrix is obtained, dimension-reduction processing can be carried out on it by the traditional kernel method to obtain the compressed feature values.
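The weighted-sum kernel fusion of formula (7) is easy to sketch. The RBF kernel and the feature dimensions below are illustrative choices:

```python
import numpy as np

def rbf_kernel(X, gamma=0.1):
    """K[i, j] = exp(-gamma * ||x_i - x_j||^2), one common kernel choice."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * (X @ X.T)
    return np.exp(-gamma * d2)

def fuse_kernels(Kx, Ky, a=0.5):
    """Weighted-sum kernel fusion per formula (7): Kf = a*Kx + b*Ky, b = 1 - a."""
    return a * Kx + (1.0 - a) * Ky

rng = np.random.default_rng(0)
Kx = rbf_kernel(rng.standard_normal((10, 8)))   # e.g. modality-1 features
Ky = rbf_kernel(rng.standard_normal((10, 3)))   # e.g. modality-2 features
Kf = fuse_kernels(Kx, Ky, a=0.6)
```

The fused matrix inherits symmetry and a unit diagonal from its RBF inputs, so it can be fed to any kernel method for the subsequent dimension reduction.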
(3) Feature fusion based on KCFA is implemented as follows:

The solution of the projection matrices in the canonical correlation analysis (CCA, Canonical Correlation Analysis) algorithm can be converted into the problem of solving an eigen-matrix, but this correspondingly requires the covariance matrix to be invertible, which limits the application of canonical correlation analysis to a certain extent. Cross-factor analysis improves on canonical correlation analysis; its goal is to find the mapping matrices that minimize the Frobenius norm in the projection space. Assume the feature matrices of the two modalities are X ∈ R^(n×p) and Y ∈ R^(n×q) respectively, the corresponding linear transformation matrices are U = (u_1, u_2, ..., u_d) and V = (v_1, v_2, ..., v_d), satisfying d ≤ min(p, q); the optimization function of the cross-modal factor is then:

min(U,V) ||X U − Y V||_F^2,  s.t. U^T U = I, V^T V = I   (8)

where ||·||_F denotes the Frobenius norm of the input matrix and I denotes the identity matrix. From the properties of the Frobenius norm:

||X U − Y V||_F^2 = tr(U^T X^T X U) − 2·tr(X U V^T Y^T) + tr(V^T Y^T Y V)

where tr(·) denotes the trace of a matrix. Since X and Y are given feature matrices, tr(X X^T) and tr(Y Y^T) are constants, so formula (8) can be simplified as:

min(U,V) −tr(X U V^T Y^T),  s.t. U^T U = I, V^T V = I   (9)

The solution of the above problem can be converted into a singular value decomposition problem: let X^T Y = S_xy Λ_xy D_xy^T; the corresponding transformation matrices are then U = S_xy and V = D_xy.
Basic cross-factor analysis can only learn the linear characteristics of the two modalities; by the kernel method it can likewise be extended to kernel cross-factor analysis (KCFA). Assume X' and Y' are the feature matrices after X and Y are nonlinearly mapped to a higher-dimensional space respectively, and K_x, K_y are the corresponding kernel matrices. Similar to cross-factor analysis, kernel cross-factor analysis can also be converted into solving the singular value decomposition problem of X'^T Y'.
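The SVD solution for the (linear) cross-factor analysis above can be sketched directly in NumPy; keeping the top d singular vector pairs is an assumption about how d is chosen:

```python
import numpy as np

def cfa(X, Y, d=2):
    """Cross-factor analysis: transformation matrices U, V that minimize
    ||X U - Y V||_F subject to U^T U = V^T V = I, obtained from the
    singular value decomposition of X^T Y (top d singular vector pairs)."""
    S, _, Dt = np.linalg.svd(X.T @ Y)
    return S[:, :d], Dt.T[:, :d]

rng = np.random.default_rng(0)
X = rng.standard_normal((30, 5))   # modality 1, p = 5
Y = rng.standard_normal((30, 4))   # modality 2, q = 4
U, V = cfa(X, Y, d=2)
```

The orthonormality constraints of formula (9) are satisfied automatically because singular vectors are orthonormal.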
Step 103: train the face classifier based on the facial feature data to obtain the trained face classifier; train the environment classifier based on the environmental feature data to obtain the trained environment classifier; train the human skeleton classifier based on the human skeleton feature data to obtain the trained human skeleton classifier; train the integrated classifier based on the fusion feature data to obtain the trained integrated classifier.

In the application, the type of a classifier can be one or more of types such as the support vector machine (SVM, Support Vector Machine), the k-nearest-neighbour classification algorithm (KNN) and the fully connected network; the classifier types corresponding to the respective features can be identical or different.
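As one example of the classifier types listed, a minimal k-nearest-neighbour classifier can be written in a few lines of NumPy; this generic sketch is not tied to the patent's particular features:

```python
import numpy as np

def knn_predict(train_X, train_y, x, k=3):
    """Minimal KNN classifier: majority vote among the k training
    samples closest to x under Euclidean distance."""
    dists = np.linalg.norm(train_X - x, axis=1)
    nearest = train_y[np.argsort(dists)[:k]]
    values, counts = np.unique(nearest, return_counts=True)
    return values[np.argmax(counts)]

# Toy 2-D features with two classes.
X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.9, 1.1]])
y = np.array([0, 0, 1, 1])
print(knn_predict(X, y, np.array([0.05, 0.05])))  # 0
```

In this scheme the same interface could sit behind any of the listed classifier types (SVM, KNN or a fully connected network), one per feature kind.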
Specifically, when the image processing terminal trains the face classifier based on the facial feature data to obtain the trained face classifier, it can determine, based on the facial feature data, the facial area corresponding to each group photo image in the sample data, and then use the sample data set corresponding to the facial area to perform network fine-tuning on the face classifier, obtaining the fine-tuned face classifier.

When the image processing terminal trains the environment classifier based on the environmental feature data to obtain the trained environment classifier, it can specifically determine, based on the environmental feature data, the environmental area corresponding to each group photo image in the sample data, and then use the sample data set corresponding to the environmental area to perform network fine-tuning on the environment classifier, obtaining the fine-tuned environment classifier.

When the image processing terminal trains the human skeleton classifier based on the human skeleton feature data to obtain the trained human skeleton classifier, it can specifically determine, based on the human skeleton feature data, the human skeleton key points corresponding to each group photo image in the sample data, and then use the sample data set corresponding to the human skeleton key points to perform network fine-tuning on the human skeleton classifier, obtaining the fine-tuned human skeleton classifier.
Step 104: use the trained face classifier, the trained environment classifier, the trained human skeleton classifier and the trained integrated classifier to respectively perform image recognition on the target data of a group photo image, and determine the emotion attribute corresponding to the target data according to the recognition result.

In the application, when the image processing terminal uses the trained face classifier, the trained environment classifier, the trained human skeleton classifier and the trained integrated classifier to respectively perform image recognition on the target data of a group photo image, it can also calculate the sum of the emotion attribute probabilities of the emotion attribute values characterizing the emotion label in the recognition result, or calculate the sum of the environment attribute probabilities of the environment attribute values characterizing the environment label in the recognition result. Then, the emotion label corresponding to the emotion attribute value with the largest emotion attribute probability in the sum of emotion attribute probabilities is determined as the emotion attribute corresponding to the target data; alternatively, the environment label corresponding to the environment attribute value with the largest environment attribute probability in the sum of environment attribute probabilities is determined as the emotion attribute corresponding to the target data.
Specifically, after the image processing terminal obtains the trained face classifier, the trained environment classifier, the trained human skeleton classifier and the trained integrated classifier, it can calculate the prediction result corresponding to each classifier through a weight-redistribution algorithm.

For example, the environment image of a group photo image is input into the GoogLeNet network and the VGG-16 network for prediction calculation, obtaining the prediction result of the environment image;

a face image in the group photo image is input into the Goog-FACE network for prediction calculation, obtaining the prediction result of the face image;

the first human skeleton image (face + body) in the group photo image is input into the GoogLeNet network for prediction calculation, obtaining the prediction result of the first human skeleton image;

the second human skeleton image (face + body + hands) in the group photo image is input into the GoogLeNet network for prediction calculation, obtaining the prediction result of the second human skeleton image.
In the application, the features in the environment image and the face image can also be fused to obtain the prediction result of a fusion image other than the environment image and the face image. Here, the fusion image can also contain a new environment image and face image, as long as it does not contain the extracted environment image and face image.

In the application, after the image prediction result corresponding to each classifier is obtained, a decision-level information fusion method can be used to fuse the prediction results of the various classifiers obtained in step 103 (the trained face classifier, the trained environment classifier, the trained human skeleton classifier and the trained integrated classifier).
Specifically, the image processing terminal can filter out the emotion features of each person from the group photo image according to a grade-classification algorithm; here, the emotion features of each person in the group photo image are taken as the crowd emotion features of the group photo image.

Here, using the emotion labels or environment labels in the group photo image, the trained face classifier, the trained environment classifier, the trained human skeleton classifier and the trained integrated classifier can respectively perform image recognition on the target data of the group photo image, and during the image recognition process the image attribute corresponding to the target data is determined, where the image attribute includes an emotion label or an environment label.

Specifically, the image attribute corresponding to the target data can be determined by calculating the probability that the target data belongs to each emotion label or environment label. Then, after the probabilities of each label are summed, the label with the largest probability is taken as the target label of the target data, and that target label is the image attribute of the target data. The calculation formula for predicting the image attribute of the target data is as follows:
S = ω0·Sscene + ω1·Sfusion + ω2·Sface + ω3·Sskeleton1 + ω4·Sskeleton2   (10)

where S is the predicted probability that the target data belongs to a certain label attribute, and ω0-ω4 are the weights corresponding to the respective features, satisfying ω0 + ω1 + ω2 + ω3 + ω4 = 1.
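Formula (10) amounts to a weighted average of per-label scores followed by an argmax; a sketch with made-up probabilities and weights:

```python
import numpy as np

def fuse_scores(scores, weights):
    """Decision-level fusion per formula (10): S = sum_i w_i * S_i,
    with the weights summing to 1. Each row of `scores` holds one
    source's per-label probabilities."""
    weights = np.asarray(weights)
    assert np.isclose(weights.sum(), 1.0)
    return weights @ np.asarray(scores)

# Illustrative per-label probabilities (3 labels) from the five sources
# named in formula (10): scene, fusion, face, skeleton1, skeleton2.
scores = [
    [0.2, 0.5, 0.3],   # S_scene
    [0.1, 0.7, 0.2],   # S_fusion
    [0.3, 0.4, 0.3],   # S_face
    [0.2, 0.6, 0.2],   # S_skeleton1
    [0.3, 0.5, 0.2],   # S_skeleton2
]
S = fuse_scores(scores, [0.2, 0.3, 0.2, 0.15, 0.15])  # hypothetical weights
label = int(np.argmax(S))  # index of the predicted label attribute
```

The label with the largest fused probability becomes the target label, i.e. the image attribute of the target data.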
This scheme analyzes the crowd emotion intensity using face emotion features, scene features and human skeleton features, and fuses the face emotion-intensity features with the scene features to obtain a fusion feature other than the identified face features and environment features. Then, classifiers are trained separately with the identified facial feature data, environmental feature data, human skeleton feature data and fusion feature data, obtaining the emotion features of each person in the group photo image, from which the overall emotion intensity of a multi-person group photo image can be obtained. The face recognition uses two kinds of face emotion models, improving the accuracy of face emotion recognition for multi-person group photo images.
Fig. 9 is the first structural composition schematic diagram of the analysis device for emotion data in the embodiment of the present invention. As shown in Fig. 9, the device includes:

an extraction unit 901, configured to perform feature extraction on the sample data of a group photo image to obtain facial feature data, environmental feature data and human skeleton feature data in the sample data;

a fusion unit 902, configured to perform feature fusion on the facial feature data, the environmental feature data and/or the human skeleton feature data, to obtain fusion feature data other than the facial feature data, the environmental feature data and/or the human skeleton feature data;

a training unit 903, configured to train the face classifier based on the facial feature data to obtain the trained face classifier; train the environment classifier based on the environmental feature data to obtain the trained environment classifier; train the human skeleton classifier based on the human skeleton feature data to obtain the trained human skeleton classifier; and train the integrated classifier based on the fusion feature data to obtain the trained integrated classifier;

a recognition unit 904, configured to use the trained face classifier, the trained environment classifier, the trained human skeleton classifier and the trained integrated classifier to respectively perform image recognition on the target data of a group photo image, and determine the emotion attribute corresponding to the target data according to the recognition result.
In the application, the fusion unit 902 is specifically configured to use a nonlinear mapping function to map the facial feature data, the environmental feature data and/or the human skeleton feature data into the same high-dimensional space, obtain new feature data other than the facial feature data, the environmental feature data and/or the human skeleton feature data, and take the new feature data as the fusion feature data.

In the application, the device further includes a classification unit 905 and a marking unit 906.

Specifically, the classification unit 905 is configured to perform emotion classification on the sample data based on the facial feature data; the marking unit 906 is configured to attach an emotion label to the sample data corresponding to each emotion category in the classification result.

In the application, the classification unit 905 is also configured to perform environment classification on the sample data based on the environmental feature data; the marking unit 906 is also configured to attach an environment label to the sample data corresponding to each environment category in the classification result.
In the application, the device further includes a calculation unit 907 and a determination unit 908.

Specifically, the calculation unit 907 is configured to calculate the sum of the emotion attribute probabilities of the emotion attribute values characterizing the emotion label in the recognition result, or calculate the sum of the environment attribute probabilities of the environment attribute values characterizing the environment label in the recognition result.

The determination unit 908 is configured to determine the emotion attribute value corresponding to the largest emotion attribute probability in the sum of emotion attribute probabilities, or determine the environment attribute value corresponding to the largest environment attribute probability in the sum of environment attribute probabilities; and to determine the emotion label corresponding to the emotion attribute value, or alternatively the environment label corresponding to the environment attribute value, as the emotion attribute corresponding to the target data.

In the application, the device further includes an adjustment unit 909.

Specifically, the determination unit 908 is also configured to determine, based on the facial feature data, the facial area corresponding to each group photo image in the sample data; the adjustment unit 909 is configured to use the sample data set corresponding to the facial area to perform network fine-tuning on the face classifier, obtaining the fine-tuned face classifier.

In the application, the determination unit 908 is also configured to determine, based on the environmental feature data, the environmental area corresponding to each group photo image in the sample data; the adjustment unit 909 is also configured to use the sample data set corresponding to the environmental area to perform network fine-tuning on the environment classifier, obtaining the fine-tuned environment classifier.

In the application, the determination unit 908 is also configured to determine, based on the human skeleton feature data, the human skeleton key points corresponding to each group photo image in the sample data; the adjustment unit 909 is also configured to use the sample data set corresponding to the human skeleton key points to perform network fine-tuning on the human skeleton classifier, obtaining the fine-tuned human skeleton classifier.
It should be understood that, when the analysis device for emotion data provided by the above embodiment performs information pushing, the division of the above program modules is only taken as an example; in practical applications, the above processing can be allocated to and completed by different program modules as needed, i.e., the internal structure of the analysis device for emotion data can be divided into different program modules to complete all or part of the processing described above. In addition, the analysis device for emotion data provided by the above embodiment and the embodiments of the analysis method for emotion data belong to the same concept; the specific implementation process is detailed in the method embodiments and is not repeated here.
Fig. 10 is the second structural composition schematic diagram of the analysis device for emotion data. The analysis device 1000 for emotion data can be a mobile phone, a computer, a digital broadcast terminal, an information transceiving device, a game console, a tablet device, a personal digital assistant, an information push server, a content server, etc. The analysis device 1000 for emotion data shown in Fig. 10 includes: at least one processor 1001, a memory 1002, at least one network interface 1004 and a user interface 1003. The various components in the analysis device 1000 for emotion data are coupled by a bus system 1005. It can be understood that the bus system 1005 is used to realize connection and communication between these components. In addition to a data bus, the bus system 1005 also includes a power bus, a control bus and a status signal bus. However, for the sake of clarity, the various buses are all designated as the bus system 1005 in Fig. 10.

The user interface 1003 may include a display, a keyboard, a mouse, a trackball, a click wheel, a key, a button, a touch pad or a touch screen, etc.
It can be understood that the memory 1002 can be a volatile memory or a nonvolatile memory, and may also comprise both volatile and nonvolatile memory. The nonvolatile memory can be a read-only memory (ROM, Read Only Memory), a programmable read-only memory (PROM, Programmable Read-Only Memory), an erasable programmable read-only memory (EPROM, Erasable Programmable Read-Only Memory), an electrically erasable programmable read-only memory (EEPROM, Electrically Erasable Programmable Read-Only Memory), a magnetic random access memory (FRAM, ferromagnetic random access memory), a flash memory (Flash Memory), a magnetic surface memory, an optical disc, or a compact disc read-only memory (CD-ROM, Compact Disc Read-Only Memory); the magnetic surface memory can be a magnetic disk memory or a magnetic tape memory. The volatile memory can be a random access memory (RAM, Random Access Memory), used as an external cache. By way of example but not limitation, many forms of RAM are available, such as a static random access memory (SRAM, Static Random Access Memory), a synchronous static random access memory (SSRAM, Synchronous Static Random Access Memory), a dynamic random access memory (DRAM, Dynamic Random Access Memory), a synchronous dynamic random access memory (SDRAM, Synchronous Dynamic Random Access Memory), a double data rate synchronous dynamic random access memory (DDRSDRAM, Double Data Rate Synchronous Dynamic Random Access Memory), an enhanced synchronous dynamic random access memory (ESDRAM, Enhanced Synchronous Dynamic Random Access Memory), a synclink dynamic random access memory (SLDRAM, SyncLink Dynamic Random Access Memory) and a direct rambus random access memory (DRRAM, Direct Rambus Random Access Memory). The memory 1002 described in the embodiments of the present invention is intended to include, but is not limited to, these and any other suitable types of memory.
The memory 1002 in the embodiments of the present invention is used to store various types of data to support the operation of the analysis device 1000 for emotion data. Examples of such data include: any computer program for operating on the analysis device 1000 for emotion data, such as an operating system 10021 and application programs 10022; music data; animation data; book information; video, etc. The operating system 10021 includes various system programs, such as a framework layer, a core library layer and a driver layer, for realizing various basic services and handling hardware-based tasks. The application programs 10022 may include various application programs, such as a media player (Media Player) and a browser (Browser), for realizing various application services. The program implementing the method of the embodiments of the present invention may be included in the application programs 10022.
The method disclosed by the embodiments of the present invention can be applied to, or realized by, the processor 1001. The processor 1001 may be an integrated circuit chip with signal processing capability. In the implementation process, each step of the above method can be completed by an integrated logic circuit of hardware in the processor 1001 or by instructions in the form of software. The above processor 1001 can be a general-purpose processor, a digital signal processor (DSP, Digital Signal Processor), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. The processor 1001 can realize or execute the methods, steps and logic diagrams disclosed in the embodiments of the present invention. The general-purpose processor can be a microprocessor or any conventional processor, etc. The steps of the method disclosed in the embodiments of the present invention can be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in a decoding processor. The software module can be located in a storage medium; the storage medium is located in the memory 1002, and the processor 1001 reads the information in the memory 1002 and completes the steps of the preceding method in combination with its hardware.
In an exemplary embodiment, the analysis device 1000 for emotion data can be realized by one or more application-specific integrated circuits (ASIC, Application Specific Integrated Circuit), DSPs, programmable logic devices (PLD, Programmable Logic Device), complex programmable logic devices (CPLD, Complex Programmable Logic Device), field programmable gate arrays (FPGA, Field-Programmable Gate Array), general-purpose processors, controllers, microcontrollers (MCU, Micro Controller Unit), microprocessors (Microprocessor) or other electronic elements, for executing the preceding method.
Specifically, when the processor 1001 runs the computer program, it executes: performing feature extraction on the sample data of a group photo image to obtain facial feature data, environmental feature data and human skeleton feature data in the sample data; performing feature fusion on the facial feature data, the environmental feature data and/or the human skeleton feature data to obtain fusion feature data other than the facial feature data, the environmental feature data and/or the human skeleton feature data; training the face classifier based on the facial feature data to obtain the trained face classifier; training the environment classifier based on the environmental feature data to obtain the trained environment classifier; training the human skeleton classifier based on the human skeleton feature data to obtain the trained human skeleton classifier; training the integrated classifier based on the fusion feature data to obtain the trained integrated classifier; and using the trained face classifier, the trained environment classifier, the trained human skeleton classifier and the trained integrated classifier to respectively perform image recognition on the target data of a group photo image, and determining the emotion attribute corresponding to the target data according to the recognition result.
Specifically, when the processor 1001 runs the computer program, it also executes: using a nonlinear mapping function to map the facial feature data, the environmental feature data and/or the human skeleton feature data into the same high-dimensional space, obtaining new feature data other than the facial feature data, the environmental feature data and/or the human skeleton feature data; and taking the new feature data as the fusion feature data.

Specifically, when the processor 1001 runs the computer program, it also executes: performing emotion classification on the sample data based on the facial feature data; and attaching an emotion label to the sample data corresponding to each emotion category in the classification result.

Specifically, when the processor 1001 runs the computer program, it also executes: performing environment classification on the sample data based on the environmental feature data; and attaching an environment label to the sample data corresponding to each environment category in the classification result.

Specifically, when the processor 1001 runs the computer program, it also executes: calculating the sum of the emotion attribute probabilities of the emotion attribute values characterizing the emotion label in the recognition result, or calculating the sum of the environment attribute probabilities of the environment attribute values characterizing the environment label in the recognition result; determining the emotion attribute value corresponding to the largest emotion attribute probability in the sum of emotion attribute probabilities, or determining the environment attribute value corresponding to the largest environment attribute probability in the sum of environment attribute probabilities; and determining the emotion label corresponding to the emotion attribute value, or alternatively the environment label corresponding to the environment attribute value, as the emotion attribute corresponding to the target data.
Specifically, when the processor 1001 runs the computer program, it also executes: determining, based on the facial feature data, the facial area corresponding to each group photo image in the sample data; and performing network fine-tuning on the face classifier by using the sample data set corresponding to the facial areas, to obtain the fine-tuned face classifier.
Specifically, when the processor 1001 runs the computer program, it also executes: determining, based on the environment feature data, the environment area corresponding to each group photo image in the sample data; and performing network fine-tuning on the environment classifier by using the sample data set corresponding to the environment areas, to obtain the fine-tuned environment classifier.
Specifically, when the processor 1001 runs the computer program, it also executes: determining, based on the human skeleton feature data, the human skeleton key points corresponding to each group photo image in the sample data; and performing network fine-tuning on the human skeleton classifier by using the sample data set corresponding to the human skeleton key points, to obtain the fine-tuned human skeleton classifier.
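Network fine-tuning of the kind described for the face, environment, and skeleton classifiers typically freezes the pretrained feature-extraction layers and updates only the classification head on the region-specific samples (facial areas or skeleton key points). The sketch below is a minimal illustration under that assumption, not the patented procedure: the frozen features are represented by precomputed vectors, and a logistic classification head is trained by stochastic gradient descent on hypothetical data.

```python
import math

def finetune_head(features, labels, dim, lr=0.5, epochs=200):
    """Fine-tune only a logistic classification head on precomputed
    (frozen) features -- a minimal stand-in for network fine-tuning on
    region-specific samples. Returns the trained weights and bias."""
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))   # sigmoid output
            g = p - y                        # gradient of the log-loss
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

# Hypothetical frozen features for two emotion classes (0 / 1).
feats = [[0.0, 1.0], [0.1, 0.9], [1.0, 0.0], [0.9, 0.2]]
labs = [0, 0, 1, 1]
w, b = finetune_head(feats, labs, dim=2)
```

Only the head parameters `w` and `b` change during this loop; the frozen backbone would be shared with the pretrained network, which is what makes fine-tuning cheap relative to full retraining.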
In an exemplary embodiment, an embodiment of the present invention further provides a computer-readable storage medium, for example the memory 1002 storing a computer program. The computer program can be executed by the processor 1001 of the emotion data analysis device 1000 to complete the steps of the foregoing method. The computer-readable storage medium may be a memory such as an FRAM, a ROM, a PROM, an EPROM, an EEPROM, a Flash Memory, a magnetic surface memory, an optical disc or a CD-ROM; it may also be any device including one of the above memories or any combination thereof, such as a mobile phone, a computer, a tablet device or a personal digital assistant.
A computer-readable storage medium stores a computer program thereon. When the computer program is run by a processor, it executes: mapping the facial feature data, the environment feature data and/or the human skeleton feature data into the same higher-dimensional space by using a nonlinear mapping function, to obtain new characteristic data other than the facial feature data, the environment feature data and/or the human skeleton feature data; and using the new characteristic data as the fusion feature data.
When the computer program is run by the processor, it also executes: performing emotion classification on the sample data based on the facial feature data; and attaching an emotion label to the sample data corresponding to each emotion category in the classification result.
When the computer program is run by the processor, it also executes: performing environment classification on the sample data based on the environment feature data; and attaching an environment label to the sample data corresponding to each environment category in the classification result.
When the computer program is run by the processor, it also executes: calculating the sum of the emotion attribute probabilities of the emotion attribute values characterized by the emotion labels in the recognition results, or calculating the sum of the environment attribute probabilities of the environment attribute values characterized by the environment labels in the recognition results; determining the emotion attribute value corresponding to the maximum emotion attribute probability among the sums of the emotion attribute probabilities, or determining the environment attribute value corresponding to the maximum environment attribute probability among the sums of the environment attribute probabilities; and determining the emotion label corresponding to the emotion attribute value, or the environment label corresponding to the environment attribute value, as the emotion attribute corresponding to the target data.
When the computer program is run by the processor, it also executes: determining, based on the facial feature data, the facial area corresponding to each group photo image in the sample data; and performing network fine-tuning on the face classifier by using the sample data set corresponding to the facial areas, to obtain the fine-tuned face classifier.
When the computer program is run by the processor, it also executes: determining, based on the environment feature data, the environment area corresponding to each group photo image in the sample data; and performing network fine-tuning on the environment classifier by using the sample data set corresponding to the environment areas, to obtain the fine-tuned environment classifier.
When the computer program is run by the processor, it also executes: determining, based on the human skeleton feature data, the human skeleton key points corresponding to each group photo image in the sample data; and performing network fine-tuning on the human skeleton classifier by using the sample data set corresponding to the human skeleton key points, to obtain the fine-tuned human skeleton classifier.
The above description covers merely specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or replacement that can readily occur to those skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention.
Claims (10)
1. A method for analyzing emotion data, the method comprising:
performing feature extraction on sample data of group photo images to obtain facial feature data, environment feature data and human skeleton feature data in the sample data;
performing feature fusion on the facial feature data, the environment feature data and/or the human skeleton feature data to obtain fusion feature data other than the facial feature data, the environment feature data and/or the human skeleton feature data;
training a face classifier based on the facial feature data to obtain a trained face classifier; training an environment classifier based on the environment feature data to obtain a trained environment classifier; training a human skeleton classifier based on the human skeleton feature data to obtain a trained human skeleton classifier; and training an ensemble classifier based on the fusion feature data to obtain a trained ensemble classifier; and
performing image recognition on target data of a group photo image by using the trained face classifier, the trained environment classifier, the trained human skeleton classifier and the trained ensemble classifier respectively, and determining an emotion attribute corresponding to the target data according to the recognition results.
2. The method according to claim 1, wherein performing feature fusion on the facial feature data, the environment feature data and/or the human skeleton feature data to obtain the fusion feature data other than the facial feature data, the environment feature data and/or the human skeleton feature data comprises:
mapping the facial feature data, the environment feature data and/or the human skeleton feature data into the same higher-dimensional space by using a nonlinear mapping function, to obtain new characteristic data other than the facial feature data, the environment feature data and/or the human skeleton feature data; and
using the new characteristic data as the fusion feature data.
3. The method according to claim 1, further comprising:
performing emotion classification on the sample data based on the facial feature data; and
attaching an emotion label to the sample data corresponding to each emotion category in the classification result.
4. The method according to claim 1, further comprising:
performing environment classification on the sample data based on the environment feature data; and
attaching an environment label to the sample data corresponding to each environment category in the classification result.
5. The method according to claim 3 or 4, further comprising:
calculating the sum of the emotion attribute probabilities of the emotion attribute values characterized by the emotion labels in the recognition results, or calculating the sum of the environment attribute probabilities of the environment attribute values characterized by the environment labels in the recognition results;
determining the emotion attribute value corresponding to the maximum emotion attribute probability among the sums of the emotion attribute probabilities, and determining the emotion label corresponding to the emotion attribute value as the emotion attribute corresponding to the target data; or
determining the environment attribute value corresponding to the maximum environment attribute probability among the sums of the environment attribute probabilities, and determining the environment label corresponding to the environment attribute value as the emotion attribute corresponding to the target data.
6. The method according to claim 1, wherein training the face classifier based on the facial feature data to obtain the trained face classifier comprises:
determining, based on the facial feature data, the facial area corresponding to each group photo image in the sample data; and
performing network fine-tuning on the face classifier by using the sample data set corresponding to the facial areas, to obtain a fine-tuned face classifier.
7. The method according to claim 1, wherein training the environment classifier based on the environment feature data to obtain the trained environment classifier comprises:
determining, based on the environment feature data, the environment area corresponding to each group photo image in the sample data; and
performing network fine-tuning on the environment classifier by using the sample data set corresponding to the environment areas, to obtain a fine-tuned environment classifier.
8. The method according to claim 1, wherein training the human skeleton classifier based on the human skeleton feature data to obtain the trained human skeleton classifier comprises:
determining, based on the human skeleton feature data, the human skeleton key points corresponding to each group photo image in the sample data; and
performing network fine-tuning on the human skeleton classifier by using the sample data set corresponding to the human skeleton key points, to obtain a fine-tuned human skeleton classifier.
9. A device for analyzing emotion data, the device comprising:
an extraction unit, configured to perform feature extraction on sample data of group photo images to obtain facial feature data, environment feature data and human skeleton feature data in the sample data;
a fusion unit, configured to perform feature fusion on the facial feature data, the environment feature data and/or the human skeleton feature data to obtain fusion feature data other than the facial feature data, the environment feature data and/or the human skeleton feature data;
a training unit, configured to train a face classifier based on the facial feature data to obtain a trained face classifier, train an environment classifier based on the environment feature data to obtain a trained environment classifier, train a human skeleton classifier based on the human skeleton feature data to obtain a trained human skeleton classifier, and train an ensemble classifier based on the fusion feature data to obtain a trained ensemble classifier; and
a recognition unit, configured to perform image recognition on target data of a group photo image by using the trained face classifier, the trained environment classifier, the trained human skeleton classifier and the trained ensemble classifier respectively, and determine an emotion attribute corresponding to the target data according to the recognition results.
10. A device for analyzing emotion data, the device comprising a memory, a processor, and an executable program stored in the memory and runnable by the processor, wherein the processor, when running the executable program, executes the steps of the method for analyzing emotion data according to any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811046253.7A CN109508625A (en) | 2018-09-07 | 2018-09-07 | A kind of analysis method and device of affection data |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109508625A true CN109508625A (en) | 2019-03-22 |
Family
ID=65745711
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811046253.7A Pending CN109508625A (en) | 2018-09-07 | 2018-09-07 | A kind of analysis method and device of affection data |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109508625A (en) |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110008371A (en) * | 2019-04-16 | 2019-07-12 | 张怡卓 | A kind of individualized music recommended method and system based on facial expression recognition |
CN111179175A (en) * | 2019-12-27 | 2020-05-19 | 深圳力维智联技术有限公司 | Image processing method and device based on convolutional neural network and storage medium |
CN111832364A (en) * | 2019-04-22 | 2020-10-27 | 普天信息技术有限公司 | Face recognition method and device |
CN112215930A (en) * | 2020-10-19 | 2021-01-12 | 珠海金山网络游戏科技有限公司 | Data processing method and device |
CN112784776A (en) * | 2021-01-26 | 2021-05-11 | 山西三友和智慧信息技术股份有限公司 | BPD facial emotion recognition method based on improved residual error network |
JP2022503426A (en) * | 2019-09-27 | 2022-01-12 | ベイジン センスタイム テクノロジー デベロップメント カンパニー, リミテッド | Human body detection methods, devices, computer equipment and storage media |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20090055426A (en) * | 2007-11-28 | 2009-06-02 | 중앙대학교 산학협력단 | Emotion recognition mothod and system based on feature fusion |
CN105138991A (en) * | 2015-08-27 | 2015-12-09 | 山东工商学院 | Video emotion identification method based on emotion significant feature integration |
CN105739688A (en) * | 2016-01-21 | 2016-07-06 | 北京光年无限科技有限公司 | Man-machine interaction method and device based on emotion system, and man-machine interaction system |
CN107169508A (en) * | 2017-04-17 | 2017-09-15 | 杭州电子科技大学 | A kind of cheongsam Image emotional semantic method for recognizing semantics based on fusion feature |
CN107729835A (en) * | 2017-10-10 | 2018-02-23 | 浙江大学 | A kind of expression recognition method based on face key point region traditional characteristic and face global depth Fusion Features |
- 2018-09-07: CN CN201811046253.7A patent/CN109508625A/en, active, Pending
Non-Patent Citations (2)
Title |
---|
XIN GUO ET AL: "Group-level emotion recognition using deep models on image scene, faces, and skeletons", Proceedings of the 19th ACM International Conference on Multimodal Interaction * |
CHEN GUOQING ET AL: "China Information Systems Research: Opportunities and Challenges in the Context of Emerging Technologies" (《中国信息系统研究 新兴技术背景下的机遇与挑战》), Tongji University Press, 30 November 2011 * |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110008371A (en) * | 2019-04-16 | 2019-07-12 | 张怡卓 | A kind of individualized music recommended method and system based on facial expression recognition |
CN111832364A (en) * | 2019-04-22 | 2020-10-27 | 普天信息技术有限公司 | Face recognition method and device |
CN111832364B (en) * | 2019-04-22 | 2024-04-23 | 普天信息技术有限公司 | Face recognition method and device |
JP2022503426A (en) * | 2019-09-27 | 2022-01-12 | ベイジン センスタイム テクノロジー デベロップメント カンパニー, リミテッド | Human body detection methods, devices, computer equipment and storage media |
JP7101829B2 (en) | 2019-09-27 | 2022-07-15 | ベイジン・センスタイム・テクノロジー・デベロップメント・カンパニー・リミテッド | Human body detection methods, devices, computer equipment and storage media |
CN111179175A (en) * | 2019-12-27 | 2020-05-19 | 深圳力维智联技术有限公司 | Image processing method and device based on convolutional neural network and storage medium |
CN111179175B (en) * | 2019-12-27 | 2023-04-07 | 深圳力维智联技术有限公司 | Image processing method and device based on convolutional neural network and storage medium |
CN112215930A (en) * | 2020-10-19 | 2021-01-12 | 珠海金山网络游戏科技有限公司 | Data processing method and device |
CN112784776A (en) * | 2021-01-26 | 2021-05-11 | 山西三友和智慧信息技术股份有限公司 | BPD facial emotion recognition method based on improved residual error network |
CN112784776B (en) * | 2021-01-26 | 2022-07-08 | 山西三友和智慧信息技术股份有限公司 | BPD facial emotion recognition method based on improved residual error network |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109508625A (en) | A kind of analysis method and device of affection data | |
CN109359538B (en) | Training method of convolutional neural network, gesture recognition method, device and equipment | |
US20210271862A1 (en) | Expression recognition method and related apparatus | |
He et al. | Supercnn: A superpixelwise convolutional neural network for salient object detection | |
CN110532920B (en) | Face recognition method for small-quantity data set based on FaceNet method | |
Zhang et al. | End-to-end photo-sketch generation via fully convolutional representation learning | |
CN112990054B (en) | Compact linguistics-free facial expression embedding and novel triple training scheme | |
Guo et al. | Automatic image cropping for visual aesthetic enhancement using deep neural networks and cascaded regression | |
Nakajima et al. | Full-body person recognition system | |
CN109492638A (en) | Method for text detection, device and electronic equipment | |
Kadam et al. | Detection and localization of multiple image splicing using MobileNet V1 | |
CN109886153A (en) | A kind of real-time face detection method based on depth convolutional neural networks | |
CN112183435A (en) | Two-stage hand target detection method | |
CN107967461A (en) | The training of SVM difference models and face verification method, apparatus, terminal and storage medium | |
CN109325408A (en) | A kind of gesture judging method and storage medium | |
CN112257665A (en) | Image content recognition method, image recognition model training method, and medium | |
CN111291713B (en) | Gesture recognition method and system based on skeleton | |
Wang et al. | Labanotation generation from motion capture data for protection of folk dance | |
CN112819510A (en) | Fashion trend prediction method, system and equipment based on clothing multi-attribute recognition | |
CN114627312B (en) | Zero sample image classification method, system, equipment and storage medium | |
Fu et al. | Complementarity-aware Local-global Feature Fusion Network for Building Extraction in Remote Sensing Images | |
Begum et al. | A novel approach for multimodal facial expression recognition using deep learning techniques | |
CN113688864B (en) | Human-object interaction relation classification method based on split attention | |
CN115083011A (en) | Sign language understanding visual gesture recognition method and system based on deep learning, computer device and storage medium | |
CN114998702A (en) | Entity recognition and knowledge graph generation method and system based on BlendMask |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190322 |