CN106096642A - Multi-modal affective feature fusion method based on discriminant locality preserving projections - Google Patents
Multi-modal affective feature fusion method based on discriminant locality preserving projections
- Publication number
- CN106096642A (application number CN201610397708.4A)
- Authority
- CN
- China
- Prior art keywords
- alpha
- sigma
- characteristic vector
- matrix
- class
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
Abstract
The invention discloses a multi-modal affective feature fusion method based on discriminant locality preserving projections. The method first extracts affective features, such as speech, facial-expression, and posture features, from the sample data of each modality in a multi-modal emotion database. It then uses the discriminant locality preserving projection method to map the affective features of all modalities into a unified discriminant subspace, and finally concatenates the mapped feature sets to obtain the fused multi-modal affective feature. A classifier taking the fused multi-modal affective feature as input can effectively recognize the basic emotions of anger, disgust, fear, happiness, sadness, and surprise, providing a new method and approach for developing human emotion classification and recognition systems and realizing human-computer interaction.
Description
Technical field
The invention belongs to the fields of image processing and pattern recognition, and relates to a feature fusion method applied to multi-modal emotion recognition, in particular to a multi-modal affective feature fusion method based on discriminant locality preserving projections.
Background technology
Emotional expression has always been the main way humans communicate and understand one another. With the vigorous development of computer technology, human-computer interaction (HCI) has become increasingly valuable in research and in practice, and how a computer recognizes human emotion has become ever more important. As information technology advances, the emotional information people express, whether in the laboratory or in everyday life, is easily captured by various sensors. Among these signals, images and speech are the easiest to acquire and are also the most important for emotion recognition.
Having a computer recognize which emotion is being expressed is a complicated problem. In real life, the emotions people express often differ only subtly, and even humans find these differences hard to distinguish, so for now computers can only identify a few basic emotions such as anger, disgust, fear, happiness, sadness, and surprise. Even so, technology that recognizes these basic emotions already has broad applications, for example in education, healthcare, human-computer interaction, and audio-visual entertainment.
Over the past few decades there has been a great deal of emotion recognition based on a single modality, most commonly facial-expression recognition, speech emotion recognition, and posture-based emotion recognition. Single-modality emotion recognition, however, is quite limited: the emotional information a person expresses is inherently multi-modal. When a person expresses anger, for example, their voice, facial expression, body posture, heart rate, and body temperature may all differ markedly from their normal state. Relying on the affective features of only one modality therefore rarely yields good results, especially in real environments. Studies show that, compared with single-modality recognition, multi-modal emotion recognition is more reliable and accurate. It considers the multiple emotional signals a person expresses, weighs the expressed emotion jointly, and is more robust to the interference found in real life (for instance, face images captured under different illumination or viewing angles).
In multi-modal emotion recognition, feature fusion is one of the most important steps: it merges the different affective features obtained from different sensors into a fused feature that is fed to a classifier for recognition. Common feature fusion methods fall into three classes: score-level fusion, feature-level fusion, and decision-level fusion. To preserve real-time performance, these methods must compress the information while keeping enough of what matters, so some information loss is unavoidable and recognition accuracy declines. Among them, feature-level fusion is widely used in the speech and image domains. At present, research on multi-modal emotion recognition remains far less developed and less abundant than single-modality emotion recognition.
In the prior art, the invention patent with publication number CN105138991A, entitled "Video emotion identification method based on emotion significant feature integration", discloses a video emotion recognition method based on fusing emotionally salient features. Its shortcomings are: it can only fuse the image and speech features within a video and extends poorly to the features of additional modalities; the extracted image and speech features are not direct affective features but are represented through color emotion intensity values and an audio sentiment dictionary; and the fusion algorithm is overly simple, so the affective features obtained by plain weighted fusion discriminate poorly.
Summary of the invention
The technical problems to be solved by the present invention are the poor discriminability of the fused affective features produced by existing feature fusion methods for multi-modal emotion recognition, and the inability of existing single-modality emotion recognition techniques to obtain accurate recognition results.
To solve the above problems, and aiming at the needs of automatic human-emotion assessment systems and human-computer interaction systems, the present invention proposes a multi-modal affective feature fusion method based on discriminant locality preserving projections, providing a more accurate and reliable approach to human-computer interaction. The concrete technical scheme is as follows:
A multi-modal affective feature fusion method based on discriminant locality preserving projections comprises the following steps:
A. First extract affective features from the sample data of each modality in a multi-modal emotion database, then reduce the dimensionality of the affective feature vectors of every modality. A sample of the j-th modality is represented by a d_j-dimensional feature vector x_ijr, i.e. x_ijr ∈ R^{d_j}, where 1 ≤ j ≤ m and m is the number of modalities, 1 ≤ i ≤ c and c is the number of emotion classes, 1 ≤ r ≤ n_ij, n_ij is the number of samples of the j-th modality belonging to the i-th emotion class, and x_ijr denotes the feature vector of the r-th sample of the j-th modality belonging to the i-th emotion class;
B. Apply discriminant locality preserving projection to the dimension-reduced feature vectors of the different modalities to obtain the optimal projection matrix α;
C. Map the feature vectors of each modality separately, Y_j = α^T X_j, where X_j is the matrix formed by the c blocks X_ij, i.e. X_j = [X_1j, ..., X_ij, ..., X_cj]^T;
D. Concatenate the mapped features to obtain the fused feature:
Z = [α^T X_1, ..., α^T X_j, ..., α^T X_m]^T.
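Steps C and D above — mapping each modality with the shared projection and then concatenating — can be sketched in a few lines of NumPy. The dimensions and the random projection matrix here are hypothetical stand-ins; in the method itself α comes from the discriminant locality preserving projection of step B.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes: m = 2 modalities, each sample PCA-reduced to d = 20
# dimensions, projected to p = 5 dimensions by a shared matrix alpha.
d, p, m = 20, 5, 2
alpha = rng.standard_normal((d, p))   # stand-in for the learned projection

x_speech = rng.standard_normal(d)     # dimension-reduced speech feature vector
x_face = rng.standard_normal(d)       # dimension-reduced facial feature vector

# Step C: map each modality into the shared discriminant subspace.
y_speech = alpha.T @ x_speech
y_face = alpha.T @ x_face

# Step D: serial (tandem) fusion -- concatenate the projected vectors.
z = np.concatenate([y_speech, y_face])
assert z.shape == (m * p,)
```

The fused vector z is what the classifier in the embodiment receives as input.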
Further, the purpose of the discriminant locality preserving projection performed after dimensionality reduction in step B is to solve for the optimal projection matrix α that maps the affective feature vectors x_ijr of all modalities into a unified discriminant subspace, yielding the mapped feature vectors y_ijr. The concrete steps are as follows:
B1: Define the within-class scatter matrix S_w,
where y_ikl denotes the mapped feature vector of the l-th sample from the i-th emotion class and k-th modality, 1 ≤ k ≤ m, and W_rl is the locality preserving weight between feature vectors from the same emotion class and the same modality;
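The within-class scatter formula itself did not survive extraction; a plausible reconstruction from the definitions above (this follows the standard discriminant locality preserving projection form, so the exact index ranges are an assumption) is:

```latex
S_w = \sum_{i=1}^{c}\sum_{k=1}^{m}\sum_{r=1}^{n_{ik}}\sum_{l=1}^{n_{ik}}
      W_{rl}\,\bigl(y_{ikr}-y_{ikl}\bigr)\bigl(y_{ikr}-y_{ikl}\bigr)^{\mathsf T}
```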
B2: Define the between-class scatter matrix S_b,
where B_ih is the locality preserving weight between the mean feature vectors of the same modality, and μ_i is the mean of the mapped feature vectors of the i-th class samples,
where n_i is the number of samples in the i-th class and μ_h is the mean feature vector of the h-th class samples;
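The between-class scatter and class-mean formulas are likewise missing from the extraction; a reconstruction consistent with the definitions above (the pairwise sum over class means is an assumption based on the standard DLPP formulation) is:

```latex
S_b = \sum_{i=1}^{c}\sum_{h=1}^{c} B_{ih}\,(\mu_i-\mu_h)(\mu_i-\mu_h)^{\mathsf T},
\qquad
\mu_i = \frac{1}{n_i}\sum_{k=1}^{m}\sum_{r=1}^{n_{ik}} y_{ikr}
```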
B3: Maximize the between-class scatter matrix while minimizing the within-class scatter matrix; this goal can be expressed as an optimization problem in which Tr(·) denotes the trace of a matrix.
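The optimization problem referred to here was an equation image in the source; a reconstruction consistent with step B3.2 below (a trace-ratio objective over the projection matrix α in the original sample space, which is an assumption) is:

```latex
\alpha^{*} \;=\; \arg\max_{\alpha}\;
\frac{\operatorname{Tr}\!\bigl(\alpha^{\mathsf T} S_b^{(x)}\, \alpha\bigr)}
     {\operatorname{Tr}\!\bigl(\alpha^{\mathsf T} S_w^{(x)}\, \alpha\bigr)}
```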
Further, in step B1, which defines the within-class scatter matrix S_w, the locality preserving weight matrix W_rl between feature vectors is defined as follows: for feature vectors x_ijr and x_ijl from the same emotion class and the same modality, the locality preserving weight is a heat-kernel function of their distance, where x_ijl denotes the feature vector of the l-th sample of the j-th modality from the i-th emotion class, 1 ≤ l ≤ n_ij, and the parameter t can be set empirically; no weight is considered between feature vectors from different emotion classes or different modalities.
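The weight formula itself was an image in the source; assuming the heat-kernel form W_rl = exp(−‖x_ijr − x_ijl‖² / t) standard in locality preserving projections, computing W for one class-and-modality block can be sketched as:

```python
import numpy as np

def local_weights(X, t=1.0):
    """Heat-kernel locality preserving weights between the columns of X.
    The columns are assumed to come from the SAME emotion class and SAME
    modality; across different classes or modalities the description
    assigns no weight (zero)."""
    n = X.shape[1]
    W = np.zeros((n, n))
    for r in range(n):
        for l in range(n):
            W[r, l] = np.exp(-np.sum((X[:, r] - X[:, l]) ** 2) / t)
    return W

rng = np.random.default_rng(0)
X = rng.standard_normal((4, 5))   # 5 samples of one class/modality, 4-dim
W = local_weights(X)
assert np.allclose(W, W.T)        # W is symmetric, as the description notes
assert np.allclose(np.diag(W), 1.0)
```

The parameter t plays the role of the empirically chosen bandwidth mentioned above.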
Further, in step B2, which defines the between-class scatter matrix S_b, the locality preserving weight matrix B_ih between mean feature vectors is computed and defined as follows:
First compute the mean feature vector μ_ij^(x) of the i-th emotion class and j-th modality, where the superscript (x) denotes the original sample space. Likewise compute the mean feature vector μ_hj^(x) of the h-th emotion class and j-th modality, where n_hj is the number of samples of the j-th modality belonging to the h-th emotion class, x_hjr denotes the feature vector of the r-th sample of the j-th modality belonging to the h-th emotion class, and 1 ≤ h ≤ c.
Then define the locality preserving weight matrix between the mean feature vectors μ_ij^(x) and μ_hj^(x) of the same modality, where the parameter t can again be set empirically, and no weight is considered between mean feature vectors from different modalities.
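The class-mean and mean-level weight formulas here were images in the source; a reconstruction matching the heat-kernel form of W_rl (the exact expression is an assumption) is:

```latex
\mu_{ij}^{(x)} = \frac{1}{n_{ij}}\sum_{r=1}^{n_{ij}} x_{ijr},
\qquad
B_{ih} = \exp\!\left(-\frac{\bigl\lVert \mu_{ij}^{(x)}-\mu_{hj}^{(x)}\bigr\rVert^{2}}{t}\right)
```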
Further, in step B3 the optimization problem maximizes the between-class scatter matrix and minimizes the within-class scatter to obtain the optimal projection matrix α. The concrete steps are as follows:
B3.1: Transform the optimization problem in B3 into an equivalent optimization problem. In the resulting formula, the denominator is the within-class scatter, whose matrix can be expressed in terms of the data: μ_ik^(x) is the mean feature vector of the i-th emotion class and k-th modality, n_ik is the number of samples of the k-th modality from the i-th emotion class, X_ij is the feature matrix formed by the n_ij feature vectors x_ijr, and L = mD_rr − W_rl, where D_rr is a diagonal matrix whose entries are the row (or column, since W is symmetric) sums of the weight matrix W between sample feature vectors. The numerator of the optimization formula is the between-class scatter, whose matrix can be expressed in terms of the matrix formed by the c mean vectors μ_ij^(x), where E_jj is the row (or column) sum of the mean-level locality preserving weights B_ih.
B3.2: Because the optimization problem in B3.1 has no closed-form solution, the ratio of traces is converted into the trace of a ratio, finally giving an optimization problem that is solved for the optimal projection matrix α by the method of generalized eigenvalue decomposition.
Compared with the prior art, the advantages of the present invention are:
(1) In emotion recognition, fused multi-modal affective features offer higher accuracy and objectivity than single-modality affective features, and better robustness in practice.
(2) The multi-modal affective feature fusion method based on discriminant locality preserving projections considers not only the between-class scatter but also the within-class scatter, so samples of different classes are well separated, while the locality preserving projection it introduces adapts well to nonlinear data. The result is a fused multi-modal affective feature better suited to emotion recognition.
These advantages are also verified by experimental results: by introducing the multi-modal affective feature fusion method based on discriminant locality preserving projections into multi-modal expression classification, the present invention effectively recognizes the six expressions of anger, disgust, fear, happiness, sadness, and surprise, providing a new method and approach for developing automatic human-emotion assessment systems and human-computer interaction systems.
Brief description of the drawings
Fig. 1 is a flow chart of the multi-modal affective feature fusion method based on discriminant locality preserving projections of the present invention.
Fig. 2 shows some of the images in the bimodal emotion database.
Detailed description of the invention
Specific embodiments of the present invention are further described in detail below with reference to the drawings. The implementation of the multi-modal affective feature fusion method based on discriminant locality preserving projections, shown in Fig. 1, mainly comprises the following steps.
Step 1: Collect the still images and speech segments of the videos in the multi-modal database.
In this implementation, the eNTERFACE bimodal database is used. The database contains 1260 video clips from 42 people, each with an emotion label expressing one of six basic emotions: anger, disgust, fear, happiness, sadness, and surprise (labels 1-6 respectively), as shown in Fig. 2. The video frame size is 720 × 576 at 25 fps, and the audio in the videos is sampled at 48 kHz. Each video is split into frames, and the frame with the richest expression is taken as the still image of that video; the audio track is separated out as the speech segment corresponding to that video. Each video clip thus finally corresponds to one still image and one segment of speech. 75% of the images and their corresponding speech are arbitrarily chosen as training samples, and the remaining 25% serve as test samples.
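The 75/25 split above can be sketched as follows. The description says the choice is arbitrary; splitting per class so each emotion stays equally represented is an assumption of this sketch, and the indices are synthetic placeholders for the 1260 eNTERFACE clips.

```python
import numpy as np

rng = np.random.default_rng(42)

# 1260 clips with labels 1-6, 210 clips per emotion, as in the setup above.
labels = np.repeat(np.arange(1, 7), 210)

# Roughly 75/25 split, drawn independently within each emotion class.
train_idx, test_idx = [], []
for c in range(1, 7):
    idx = rng.permutation(np.flatnonzero(labels == c))
    cut = int(0.75 * len(idx))
    train_idx.extend(idx[:cut])
    test_idx.extend(idx[cut:])

assert abs(len(train_idx) / 1260 - 0.75) < 0.01
```

Every clip lands in exactly one of the two sets, so the test samples are never seen during training.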
Step 2: Extract features from the image and speech information, reduce their dimensionality, and represent them as feature vectors.
The still image obtained in the previous step is first cropped to the 128 × 128 face region, then preprocessed by alignment, size normalization, and gray-level equalization, and finally features such as Gabor, SIFT, or LBP are extracted from the image (Gabor features in this embodiment). For the speech segments, the professional speech-processing toolkit openSMILE is used to extract various features (the emobase2010 feature set in this embodiment). Because the extracted feature vectors are usually of very high dimension, PCA is used to reduce them to a suitable dimension, and the reduced image and speech features are represented by d_j-dimensional feature vectors, i.e. x_ijr ∈ R^{d_j}, where 1 ≤ j ≤ m and m is the number of modalities, 1 ≤ i ≤ c and c is the number of emotion classes, 1 ≤ r ≤ n_ij, n_ij is the number of samples of the j-th modality belonging to the i-th emotion class, and x_ijr denotes the feature vector of the r-th sample of the j-th modality from the i-th emotion class. In addition, n_i is the number of samples in the i-th class and n is the total number of samples. In this embodiment c = 6, m = 2, n_ij = 210, n_i = 420, and n = 1260; for a different multi-modal database only these parameters need to change, e.g. m = 3 for a three-modality database.
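The PCA reduction step can be sketched with an SVD of the centered feature matrix. The 512-dimensional input and the target dimension of 40 are hypothetical; the embodiment only says the dimension is chosen to be "suitable".

```python
import numpy as np

def pca_reduce(X, d):
    """Project the rows of X (samples x features) onto the top-d principal
    components, shrinking over-long feature vectors before the DLPP step."""
    Xc = X - X.mean(axis=0)
    # SVD of the centered data; the rows of Vt are the principal axes,
    # ordered by decreasing explained variance.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:d].T

rng = np.random.default_rng(3)
feats = rng.standard_normal((100, 512))   # e.g. high-dimensional Gabor features
reduced = pca_reduce(feats, 40)
assert reduced.shape == (100, 40)
```

In practice the axes would be fit on the training samples only and reused to project the test samples.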
Step 3: Use the discriminant locality preserving projection method to solve for the optimal projection matrix α, mapping the affective feature vectors x_ijr of all modalities into a unified discriminant subspace to obtain the mapped feature vectors y_ijr. The concrete steps are as follows.
First, define the locality preserving weight matrix W_rl between feature vectors x_ijr and x_ijl from the same class and the same modality, where x_ijl denotes the feature vector of the l-th sample of the j-th modality from the i-th class, 1 ≤ l ≤ n_ij, and the parameter t can be chosen empirically. No weight is considered between feature vectors from different modalities or classes. Then define the within-class scatter matrix S_w over all classes, where y_ikl denotes the mapped feature vector of the l-th sample from the i-th emotion class and k-th modality, 1 ≤ k ≤ m.
Next, compute the mean feature vector μ_ij^(x) of the i-th emotion class and j-th modality, where the superscript (x) denotes the original sample space, and likewise the mean feature vector μ_hj^(x) of the h-th emotion class and j-th modality, where n_hj is the number of samples of the j-th modality belonging to the h-th emotion class, x_hjr denotes the feature vector of its r-th sample, and 1 ≤ h ≤ c. Analogously to the within-class scatter matrix S_w, define the locality preserving weight matrix B_ih between the mean feature vectors μ_ij^(x) and μ_hj^(x) of the same modality, where the parameter t can again be set empirically and no weight is considered between mean feature vectors from different modalities.
Then define the between-class scatter matrix S_b over all classes, where μ_i is the mean of the mapped feature vectors of the i-th class samples, and similarly μ_h is the mean of the mapped features of the h-th class samples.
Finally, minimize the within-class scatter matrix while maximizing the between-class scatter matrix, obtaining an optimization formula in which Tr(·) denotes the trace of a matrix. Simplification and transformation yield the following optimization problem:
In the optimization formula, the denominator is the within-class scatter, whose matrix can be expressed in terms of the data: μ_ik^(x) is the mean feature vector of the i-th emotion class and k-th modality, n_ik is the number of samples of the k-th modality from the i-th emotion class, X_ij is the feature matrix formed by the n_ij feature vectors x_ijr, and L = mD_rr − W_rl, where D_rr is a diagonal matrix whose entries are the row (or column, since W is symmetric) sums of the weight matrix W between sample feature vectors. The numerator of the optimization formula is the between-class scatter, whose matrix can be expressed in terms of the matrix formed by the c mean feature vectors μ_ij^(x), where E_jj is the row (or column) sum of the mean-level locality preserving weights B_ih.
Because formula (9) has no closed-form solution, the ratio of traces is converted into the trace of a ratio, and formula (13) is solved by generalized eigenvalue decomposition to obtain the optimal mapping α.
Step 4: Project the training and test samples to obtain the mapped features, and concatenate the mapped features into the fused feature.
The image features and speech features are each multiplied by α to perform the mapping, Y_j = α^T X_j, where X_j is the matrix formed by the c blocks X_ij, i.e. X_j = [X_1j, ..., X_ij, ..., X_cj]^T; the mapped features of all modalities are then concatenated into the fused feature Z = [α^T X_1, ..., α^T X_j, ..., α^T X_m]^T, as in step D.
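For a whole batch of samples, the projection and concatenation of step 4 amounts to stacking the projected feature matrices row-wise. The sizes and the random matrices here are hypothetical stand-ins for the PCA-reduced image and speech feature matrices and the learned α.

```python
import numpy as np

rng = np.random.default_rng(4)
d, p, n = 20, 5, 12

alpha = rng.standard_normal((d, p))      # learned projection (stand-in)
X_image = rng.standard_normal((d, n))    # image features, one column per clip
X_speech = rng.standard_normal((d, n))   # speech features, aligned columns

# Map each modality with the shared alpha, then stack the projected
# features sample-by-sample to form the fused feature matrix Z.
Y_image = alpha.T @ X_image
Y_speech = alpha.T @ X_speech
Z = np.vstack([Y_image, Y_speech])       # (2p) x n fused features

assert Z.shape == (2 * p, n)
```

Each column of Z is the fused feature vector of one clip, ready for the classifier in step 5.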
Step 5: Feed the fused features of the training samples into a classifier for training, and test with the test samples.
The fused features of the training samples obtained in the previous step are fed into a classifier (libSVM in this embodiment); a suitable model and parameters are obtained by training the classifier, and finally the test data are fed into the classifier to obtain the recognition results.
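The train-then-predict flow of step 5 can be illustrated without libSVM itself; the nearest-centroid classifier below is only a lightweight stand-in for the SVM named in the embodiment, and the two-class Gaussian data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(5)

# Fused training features for 2 emotion classes (columns are samples).
Z_train = np.hstack([rng.normal(0, 1, (10, 30)), rng.normal(3, 1, (10, 30))])
y_train = np.array([1] * 30 + [2] * 30)

# "Training": one centroid per class in the fused feature space.
centroids = {c: Z_train[:, y_train == c].mean(axis=1) for c in (1, 2)}

def predict(z):
    # "Testing": assign the label of the nearest class centroid.
    return min(centroids, key=lambda c: np.linalg.norm(z - centroids[c]))

# A held-out sample drawn near the class-2 cluster should come back as 2.
z_test = rng.normal(3, 0.1, 10)
assert predict(z_test) == 2
```

With libSVM (or scikit-learn's SVC), the centroid fit and nearest-centroid rule would simply be replaced by the SVM's fit and predict calls on the same fused features.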
The embodiments described above are not intended to limit the present invention; any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.
Claims (5)
1. A multi-modal affective feature fusion method based on discriminant locality preserving projections, characterized by comprising the following steps:
A. first extracting affective features from the sample data of each modality in a multi-modal emotion database, then reducing the dimensionality of the affective feature vectors of every modality, a sample of the j-th modality being represented by a d_j-dimensional feature vector x_ijr, i.e. x_ijr ∈ R^{d_j}, wherein 1 ≤ j ≤ m, m is the number of modalities, 1 ≤ i ≤ c, c is the number of emotion classes, 1 ≤ r ≤ n_ij, n_ij is the number of samples of the j-th modality belonging to the i-th emotion class, and x_ijr denotes the feature vector of the r-th sample of the j-th modality belonging to the i-th emotion class;
B. performing discriminant locality preserving projection on the dimension-reduced feature vectors of the different modalities to obtain the optimal projection matrix α;
C. mapping the feature vectors of each modality separately, Y_j = α^T X_j, X_j being the matrix formed by the c blocks X_ij, i.e. X_j = [X_1j, ..., X_ij, ..., X_cj]^T;
D. concatenating the mapped features to obtain the fused feature:
Z = [α^T X_1, ..., α^T X_j, ..., α^T X_m]^T.
2. The multi-modal affective feature fusion method based on discriminant locality preserving projections according to claim 1, characterized in that in step B, the purpose of the discriminant locality preserving projection is to solve for the optimal projection matrix α that maps the affective feature vectors x_ijr of all modalities into a unified discriminant subspace, yielding the mapped feature vectors y_ijr, with the following concrete steps:
B1: defining the within-class scatter matrix, wherein y_ikl denotes the mapped feature vector of the l-th sample from the i-th emotion class and k-th modality, 1 ≤ k ≤ m, and W_rl is the locality preserving weight between feature vectors from the same emotion class and the same modality;
B2: defining the between-class scatter matrix, wherein B_ih is the locality preserving weight between mean feature vectors of the same modality and μ_i is the mean of the mapped feature vectors of the i-th class samples, n_i being the number of samples in the i-th class and μ_h the mean feature vector of the h-th class samples;
B3: maximizing the between-class scatter matrix while minimizing the within-class scatter matrix, this goal being expressible as an optimization problem, wherein Tr(·) denotes the trace of a matrix.
3. The multi-modal affective feature fusion method based on discriminant locality preserving projections according to claim 2, characterized in that in step B1, the locality preserving weight matrix W_rl between feature vectors is defined as the locality preserving weight between feature vectors x_ijr and x_ijl from the same emotion class and the same modality, wherein x_ijl denotes the feature vector of the l-th sample of the j-th modality from the i-th emotion class, 1 ≤ l ≤ n_ij, the parameter t can be set empirically, and no weight is considered between feature vectors from different emotion classes or modalities.
4. The multi-modal affective feature fusion method based on discriminant locality preserving projections according to claim 2, characterized in that in step B2, the locality preserving weight matrix B_ih between mean feature vectors is computed and defined as follows: first computing the mean feature vector μ_ij^(x) of the i-th emotion class and j-th modality, wherein the superscript (x) denotes the original sample space; likewise computing the mean feature vector μ_hj^(x) of the h-th emotion class and j-th modality, wherein n_hj is the number of samples of the j-th modality belonging to the h-th emotion class, x_hjr denotes the feature vector of the r-th sample of the j-th modality belonging to the h-th emotion class, and 1 ≤ h ≤ c; and then defining the locality preserving weight matrix between the mean feature vectors μ_ij^(x) and μ_hj^(x) of the same modality, wherein the parameter t can again be set empirically and no weight is considered between mean feature vectors from different modalities.
5. The multi-modal affective feature fusion method based on discriminant locality preserving projections according to claim 2, characterized in that in step B3 the optimization problem maximizes the between-class scatter matrix and minimizes the within-class scatter to obtain the optimal projection matrix α, with the following concrete steps:
B3.1: transforming the optimization problem in B3 into an equivalent optimization problem in which the denominator is the within-class scatter, whose matrix can be expressed in terms of the data, wherein μ_ik^(x) is the mean feature vector of the i-th emotion class and k-th modality, n_ik is the number of samples of the k-th modality from the i-th emotion class, X_ij is the feature matrix formed by the n_ij feature vectors x_ijr, and L = mD_rr − W_rl, D_rr being a diagonal matrix whose entries are the row (or column, since W is symmetric) sums of the weight matrix W between sample feature vectors; and in which the numerator is the between-class scatter, whose matrix can be expressed in terms of the matrix formed by the c mean vectors μ_ij^(x), E_jj being the row (or column) sum of the mean-level locality preserving weights B_ih;
B3.2: since the optimization problem in B3.1 has no closed-form solution, converting the ratio of traces into the trace of a ratio, finally obtaining an optimization problem that is solved for the optimal projection matrix α by the method of generalized eigenvalue decomposition.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610397708.4A CN106096642B (en) | 2016-06-07 | 2016-06-07 | Multi-mode emotional feature fusion method based on identification of local preserving projection |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106096642A true CN106096642A (en) | 2016-11-09 |
CN106096642B CN106096642B (en) | 2020-11-13 |
Family
ID=57227299
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610397708.4A Active CN106096642B (en) | 2016-06-07 | 2016-06-07 | Multi-mode emotional feature fusion method based on identification of local preserving projection |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106096642B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103544963A (en) * | 2013-11-07 | 2014-01-29 | 东南大学 | Voice emotion recognition method based on core semi-supervised discrimination and analysis |
CN105138991A (en) * | 2015-08-27 | 2015-12-09 | 山东工商学院 | Video emotion identification method based on emotion significant feature integration |
CN104778689B (en) * | 2015-03-30 | 2018-01-05 | 广西师范大学 | A kind of image hashing method based on average secondary image and locality preserving projections |
Non-Patent Citations (6)
Title |
---|
WEI-LUN CHAO, ET AL.: "Facial expression recognition based on improved local binary pattern and class-regularized locality preserving projection", 《SIGNAL PROCESSING》 * |
WEIWEI YU ET AL.: "Discriminant Locality Preserving Projections: A New Method to Face Representation and Recognition", 《PROCEEDINGS 2ND JOINT IEEE INTERNATIONAL WORKSHOP ON VS-PETS》 * |
ZHIHONG ZENG, ET AL.: "Audio-Visual Emotion Recognition in Adult Attachment Interview", 《ICMI "06 PROCEEDINGS OF THE 8TH INTERNATIONAL CONFERENCE ON MULTIMODAL INTERFACES》 * |
张石清: "Research on Emotion Recognition Based on Speech and Face" (基于语音和人脸的情感识别研究), China Doctoral Dissertations Full-text Database, Information Science and Technology *
徐涛: "Locality Preserving Canonical Correlation Analysis and Its Application to Face Recognition" (局部保持典型相关分析及其在人脸识别中的应用), China Master's Theses Full-text Database, Information Science and Technology *
韩丹: "Research on Expression Analysis Techniques Incorporating Local Feature Learning and Its Application System" (融合局部特征学习的表情分析技术研究及应用系统), China Master's Theses Full-text Database, Information Science and Technology *
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106776740A (en) * | 2016-11-17 | 2017-05-31 | 天津大学 | Social network text clustering method based on convolutional neural networks |
CN108122006A (en) * | 2017-12-20 | 2018-06-05 | 南通大学 | Fault diagnosis method based on differentially weighted locality preserving embedding |
CN109284783A (en) * | 2018-09-27 | 2019-01-29 | 广州慧睿思通信息科技有限公司 | Machine learning-based worship counting method and device, user equipment and medium |
CN109584885A (en) * | 2018-10-29 | 2019-04-05 | 李典 | Audio-video output method based on multi-modal emotion recognition technology |
CN109872728A (en) * | 2019-02-27 | 2019-06-11 | 南京邮电大学 | Speech and posture bimodal emotion recognition method based on kernel canonical correlation analysis |
CN112289306A (en) * | 2020-11-18 | 2021-01-29 | 上海依图网络科技有限公司 | Method and device for identifying minors based on human body characteristics |
CN112289306B (en) * | 2020-11-18 | 2024-03-26 | 上海依图网络科技有限公司 | Method and device for identifying minors based on human body characteristics |
Also Published As
Publication number | Publication date |
---|---|
CN106096642B (en) | 2020-11-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106096642A (en) | Multi-modal affective feature fusion method based on discriminant locality preserving projections | |
CN112990054B (en) | Compact language-free facial expression embedding and a novel triplet training scheme | |
CN105739688A (en) | Man-machine interaction method, device and system based on an emotion system | |
CN109190561B (en) | Face recognition method and system during video playback | |
Areeb et al. | Helping hearing-impaired in emergency situations: A deep learning-based approach | |
CN104063721B (en) | Human behavior recognition method based on automatic semantic feature learning and screening | |
CN109637522A (en) | Speech emotion recognition method extracting deep spatial attention features from spectrograms | |
CN105516280A (en) | Multi-modal learning process state information compression and recording method | |
Kaluri et al. | An enhanced framework for sign gesture recognition using hidden Markov model and adaptive histogram technique. | |
CN105205449A (en) | Sign language recognition method based on deep learning | |
CN109034099A (en) | Expression recognition method and device | |
Hazourli et al. | Multi-facial patches aggregation network for facial expression recognition and facial regions contributions to emotion display | |
Jing et al. | Recognizing american sign language manual signs from rgb-d videos | |
CN115936944B (en) | Virtual teaching management method and device based on artificial intelligence | |
CN111028319A (en) | Three-dimensional non-photorealistic expression generation method based on facial motion unit | |
CN110163156A (en) | Lip feature extraction method based on a convolutional autoencoder model | |
CN106991385A (en) | Facial expression recognition method based on feature fusion | |
Chetty et al. | A multilevel fusion approach for audiovisual emotion recognition | |
CN111985532B (en) | Scene-level context-aware deep network method for emotion recognition | |
CN114724224A (en) | Multi-modal emotion recognition method for a medical care robot | |
Abdulsalam et al. | Emotion recognition system based on hybrid techniques | |
Aran et al. | Sign language tutoring tool | |
Ullah et al. | Emotion recognition from occluded facial images using deep ensemble model. | |
Guo et al. | Facial expression recognition: a review | |
CN111368663A (en) | Method, device, medium and equipment for recognizing static facial expressions in natural scene |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information |
Address after: Room 201, Building 2, Phase II, No. 1 Kechuang Road, Yaohua Street, Qixia District, Nanjing, Jiangsu Province, 210003
Applicant after: NANJING UNIVERSITY OF POSTS AND TELECOMMUNICATIONS
Address before: No. 66, New Model Road, Gulou District, Nanjing, Jiangsu, 210003
Applicant before: NANJING UNIVERSITY OF POSTS AND TELECOMMUNICATIONS |
GR01 | Patent grant | ||