CN106096642A - Multi-modal affective feature fusion method based on discriminant locality preserving projections - Google Patents

Multi-modal affective feature fusion method based on discriminant locality preserving projections

Info

Publication number
CN106096642A
CN106096642A (application CN201610397708.4A)
Authority
CN
China
Prior art keywords
alpha
sigma
characteristic vector
matrix
class
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610397708.4A
Other languages
Chinese (zh)
Other versions
CN106096642B (en)
Inventor
徐嵚嵛 (Xu Qinyu)
卢官明 (Lu Guanming)
闫静杰 (Yan Jingjie)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Posts and Telecommunications
Priority to CN201610397708.4A (granted as patent CN106096642B)
Publication of CN106096642A
Application granted; publication of CN106096642B
Legal status: Active

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06F — ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 — Pattern recognition
    • G06F 18/20 — Analysing
    • G06F 18/25 — Fusion techniques
    • G06F 18/253 — Fusion techniques of extracted features

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a multi-modal affective feature fusion method based on discriminant locality preserving projections. The method first extracts affective features, such as speech, facial-expression, and posture features, from the sample data of each modality in a multi-modal emotion database. It then uses discriminant locality preserving projections to map the affective features of the various modalities into a unified discriminant subspace, and finally concatenates the mapped feature sets to obtain a fused multi-modal affective feature. A classifier taking the fused multi-modal affective feature as input can effectively recognize the basic emotions of anger, disgust, fear, happiness, sadness, and surprise, providing a new method and approach for developing human emotion classification and recognition systems and human-computer interaction.

Description

Multi-modal affective feature fusion method based on discriminant locality preserving projections
Technical field
The invention belongs to the field of image processing and pattern recognition, and relates to a feature fusion method applied to multi-modal emotion recognition, in particular to a multi-modal affective feature fusion method based on discriminant locality preserving projections.
Background technology
Emotional expression has always been the main way humans communicate and understand one another. With the vigorous development of computer technology, human-computer interaction (HCI) has become increasingly valuable as a research topic and increasingly significant in practice, and enabling computers to recognize human emotions has become essential. With the development of information technology, the emotion information that humans express, whether in the laboratory or in daily life, can easily be captured by various sensors. Among these, images and speech are the easiest emotion signals to acquire and also the most important for emotion recognition.
Recognizing emotions by computer is a complicated problem. In real life, the emotions people express often differ only subtly, and even humans find these differences hard to distinguish, so at present computers can only recognize a few basic emotions, such as anger, disgust, fear, happiness, sadness, and surprise. Nevertheless, technology that recognizes these basic emotions already has wide application, for example in education, medical care, human-computer interaction, and audio-visual entertainment.
Over the past few decades there has been much research on single-modality emotion recognition, most commonly facial-expression recognition, speech emotion recognition, and posture-based emotion recognition. Single-modality recognition, however, is severely limited: the emotion a person expresses is inherently multi-modal. When a person expresses anger, for example, their voice, facial expression, body posture, heart rate, and body temperature all differ markedly from their normal state. Relying on the affective features of a single modality therefore cannot yield good results, especially in real environments. Studies show that multi-modal emotion recognition is more reliable and accurate than single-modality recognition: it considers the multiple emotion signals a person expresses, weighs them jointly, and is more robust to the interference found in real life (for example, face images captured under different illumination or viewing angles).
For multi-modal emotion recognition, feature fusion is the most important step: it fuses the affective features obtained from different sensors into a fusion feature that is fed to a classifier for recognition. Common fusion methods fall into three classes: score-level fusion, feature-level fusion, and decision-level fusion. To preserve real-time performance, these methods must compress information while retaining what matters, so some information loss is inevitable and recognition accuracy declines. Among them, feature-level fusion is widely used in the speech and image domains. At present, research on multi-modal emotion recognition remains far less developed and abundant than single-modality emotion recognition.
In the prior art, the invention patent with publication number CN105138991A, entitled "Video emotion recognition method based on emotion-salient feature fusion", discloses a video emotion recognition method based on fusing emotion-salient features. Its shortcomings are: only the image and speech features in a video can be fused, so extensibility is poor and features from additional modalities cannot be incorporated; the extracted image and speech features are not direct affective features but are represented by color-emotion intensity values and an audio sentiment dictionary; and the fusion algorithm is overly simple, so the affective features obtained by simple weighted fusion discriminate poorly.
Summary of the invention
The technical problem to be solved by the present invention is the poor discriminability of the fused affective features produced by existing feature fusion methods for multi-modal emotion recognition, and the inability of existing single-modality emotion recognition techniques to obtain accurate recognition results.
To solve the above problems, and addressing the needs of automatic human-emotion assessment systems and human-computer interaction systems, the present invention proposes a multi-modal affective feature fusion method based on discriminant locality preserving projections, providing a more accurate and reliable approach to human-computer interaction. The specific technical scheme is as follows:
A multi-modal affective feature fusion method based on discriminant locality preserving projections comprises the following steps:
A. First, extract affective features from the sample data of each modality in the multi-modal emotion database, then reduce the dimensionality of the affective feature vectors of the various modalities; a sample of the j-th modality is represented by a d_j-dimensional feature vector x_ijr, i.e. x_ijr ∈ R^{d_j}, where 1 ≤ j ≤ m, m being the number of modalities; 1 ≤ i ≤ c, c being the number of emotion categories; 1 ≤ r ≤ n_ij, n_ij being the number of samples belonging to the i-th emotion class and the j-th modality; and x_ijr denotes the feature vector of the r-th sample belonging to the i-th emotion class and the j-th modality;
B. Apply discriminant locality preserving projections to the dimension-reduced feature vectors of the different modalities to obtain the optimal projection direction α;
C. Map the feature vectors of each modality separately: Y_j = α^T X_j, where X_j is the matrix formed by the c blocks X_ij, i.e. X_j = [X_1j, ..., X_ij, ..., X_cj]^T;
D. Concatenate the mapped features to obtain the fusion feature:

$$Z=[\alpha^T X_1,\ldots,\alpha^T X_j,\ldots,\alpha^T X_m]^T$$
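The projection and concatenation in steps C and D can be sketched numerically as follows (an illustrative sketch only: the dimensions are arbitrary and a random matrix stands in for the trained projection α):

```python
import numpy as np

# Illustrative sketch of steps C-D: project each modality's feature matrix
# with a shared projection matrix alpha, then concatenate the projections.
rng = np.random.default_rng(0)
d, n, m, p = 20, 30, 2, 5                             # feature dim, samples, modalities, projected dim
X = [rng.standard_normal((d, n)) for _ in range(m)]   # X_j: one d x n matrix per modality
alpha = rng.standard_normal((d, p))                   # stand-in projection matrix

Y = [alpha.T @ Xj for Xj in X]                        # Y_j = alpha^T X_j, each p x n
Z = np.vstack(Y)                                      # fused feature matrix, (m*p) x n
print(Z.shape)                                        # (10, 30)
```

Each column of Z is then the fused multi-modal feature vector of one sample.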
Further, the purpose of the discriminant locality preserving projection performed after dimensionality reduction in step B is to solve for the optimal projection matrix α and map the affective feature vectors x_ijr of the various modalities into a unified discriminant subspace, obtaining the mapped feature vectors y_ijr. The specific steps are as follows:
B1: Define the within-class scatter matrix S_W^y:

$$S_W^y=\frac{1}{2}\left(\sum_{i=1}^{c}\sum_{j=1}^{m}\sum_{r=1}^{n_{ij}}\sum_{l=1}^{n_{ij}}(y_{ijr}-y_{ijl})(y_{ijr}-y_{ijl})^T+\sum_{i=1}^{c}\sum_{j=1}^{m}\sum_{k=1,k\neq j}^{m}\sum_{r=1}^{n_{ij}}\sum_{l=1}^{n_{ik}}(y_{ijr}-y_{ikl})(y_{ijr}-y_{ikl})^T\right)W_{rl}$$

where y_ikl denotes the mapped feature vector of the l-th sample from the i-th emotion class and the k-th modality, 1 ≤ k ≤ m, and W_rl is the local preserving weight between feature vectors of the same emotion class and modality;
B2: Define the between-class scatter matrix S_B^y:

$$S_B^y=\frac{1}{2}\left(\sum_{i=1}^{c}\sum_{h=1}^{c}(\mu_i-\mu_h)(\mu_i-\mu_h)^T\right)B_{ih}$$

where B_ih is the local preserving weight between feature-vector means of the same modality, and μ_i is the mean of the mapped feature vectors of the i-th class samples:

$$\mu_i=\frac{1}{n_i}\sum_{j=1}^{m}\sum_{r=1}^{n_{ij}}y_{ijr}$$

where n_i is the number of samples in the i-th class and μ_h is the feature-vector mean of the h-th class samples;
B3: Maximize the between-class scatter matrix while minimizing the within-class scatter matrix; this objective can be expressed as the optimization problem:

$$(\alpha_1^*,\alpha_2^*,\ldots,\alpha_m^*)=\arg\max_{\alpha_1,\alpha_2,\ldots,\alpha_m}\frac{\mathrm{Tr}(S_B^y)}{\mathrm{Tr}(S_W^y)}$$

where Tr(·) is the trace of a matrix.
Further, in step B1 defining the within-class scatter matrix S_W^y, the local preserving weight matrix W_rl between feature vectors is defined as follows: the local preserving weight between feature vectors x_ijr and x_ijl from the same emotion class and modality is

$$W_{rl}=\exp\!\left(-\frac{\|x_{ijr}-x_{ijl}\|^2}{t}\right)$$

where x_ijl denotes the feature vector of the l-th sample from the i-th emotion class and the j-th modality, 1 ≤ l ≤ n_ij, and the parameter t can be set empirically; no weight is considered between feature vectors from different emotion classes or modalities.
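A minimal numeric sketch of the local preserving weight W_rl (assumed here to take the standard heat-kernel form exp(−‖x_r − x_l‖²/t) used by locality preserving projections; the toy vectors and the value of t are illustrative only):

```python
import numpy as np

# Local preserving weights W_rl = exp(-||x_r - x_l||^2 / t) between feature
# vectors of the same emotion class and modality (heat-kernel form).
def local_weights(X, t):
    """X: d x n matrix whose columns are same-class, same-modality features."""
    sq = ((X[:, :, None] - X[:, None, :]) ** 2).sum(axis=0)  # pairwise squared distances
    return np.exp(-sq / t)

X = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0]])           # three 2-D toy feature vectors
W = local_weights(X, t=2.0)
print(np.round(W, 3))                     # symmetric, ones on the diagonal
```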
Further, in step B2 defining the between-class scatter matrix S_B^y, the local preserving weight matrix B_ih between feature-vector means is obtained through the following specific steps and definitions:

First compute the feature-vector mean μ_ij^(x) of the i-th emotion class and the j-th modality:

$$\mu_{ij}^{(x)}=\frac{1}{n_{ij}}\sum_{r=1}^{n_{ij}}x_{ijr}$$

where the superscript (x) of μ_ij^(x) denotes the original sample space. Similarly compute the feature-vector mean μ_hj^(x) of the h-th emotion class and the j-th modality:

$$\mu_{hj}^{(x)}=\frac{1}{n_{hj}}\sum_{r=1}^{n_{hj}}x_{hjr}$$

where n_hj is the number of samples belonging to the h-th emotion class and the j-th modality, x_hjr denotes the feature vector of the r-th sample belonging to the h-th emotion class and the j-th modality, and 1 ≤ h ≤ c;

The local preserving weight between the feature-vector means μ_ij^(x) and μ_hj^(x) of the same modality is defined as

$$B_{ih}=\exp\!\left(-\frac{\|\mu_{ij}^{(x)}-\mu_{hj}^{(x)}\|^2}{t}\right)$$

where the parameter t can likewise be set empirically; no weight is considered between feature-vector means from different modalities.
Further, in step B3 the optimization problem that maximizes the between-class scatter matrix and minimizes the within-class scatter to obtain the optimal projection direction (α_1^*, α_2^*, ..., α_m^*) is solved through the following specific steps:

B3.1: Transform the optimization problem in B3 into the following optimization problem:

$$(\alpha_1^*,\alpha_2^*,\ldots,\alpha_m^*)=\arg\max_{\alpha_1,\alpha_2,\ldots,\alpha_m}\frac{\mathrm{Tr}(\alpha^T D\alpha)}{\mathrm{Tr}(\alpha^T S\alpha)}$$

In the optimization formula, the denominator is the within-class scatter matrix:

$$\alpha^T S\alpha=\begin{bmatrix}\alpha_1^T&\alpha_2^T&\cdots&\alpha_m^T\end{bmatrix}\begin{bmatrix}S_{11}&S_{12}&\cdots&S_{1m}\\S_{21}&S_{22}&\cdots&S_{2m}\\\vdots&\vdots&\ddots&\vdots\\S_{m1}&S_{m2}&\cdots&S_{mm}\end{bmatrix}\begin{bmatrix}\alpha_1\\\alpha_2\\\vdots\\\alpha_m\end{bmatrix}=S_W^y$$

where the matrix S_jk can be expressed as:

$$S_{jk}=\begin{cases}\sum_{i=1}^{c}X_{ij}LX_{ij}^T,&j=k\\-\sum_{i=1}^{c}n_{ij}n_{ik}\,\mu_{ij}^{(x)}\mu_{ik}^{(x)T},&j\neq k\end{cases}$$

where μ_ik^(x) is the feature-vector mean of the i-th emotion class and the k-th modality, n_ik is the number of samples from the i-th emotion class and the k-th modality, X_ij is the feature matrix formed by the n_ij feature vectors x_ijr, and L = mD_rr − W_rl, where D_rr is a diagonal matrix whose entries are the row (or column, since W is symmetric) sums of the sample feature-vector weight matrix W, i.e. D_rr = Σ_l W_rl.
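The matrix L = mD_rr − W_rl above is a Laplacian-style matrix built from the weight matrix W. A small numeric sketch (the weight values below are made up for illustration):

```python
import numpy as np

# L = m*D - W, where D is the diagonal matrix of row sums of the symmetric
# weight matrix W; the entries of W here are illustrative only.
W = np.array([[1.0, 0.5, 0.2],
              [0.5, 1.0, 0.4],
              [0.2, 0.4, 1.0]])
m = 2                                     # number of modalities
D = np.diag(W.sum(axis=1))                # D_rr = sum_l W_rl
L = m * D - W
print(np.round(L, 2))
```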
The numerator of the optimization formula is the between-class scatter matrix:

$$\alpha^T D\alpha=\begin{bmatrix}\alpha_1^T&\alpha_2^T&\cdots&\alpha_m^T\end{bmatrix}\begin{bmatrix}D_{11}&D_{12}&\cdots&D_{1m}\\D_{21}&D_{22}&\cdots&D_{2m}\\\vdots&\vdots&\ddots&\vdots\\D_{m1}&D_{m2}&\cdots&D_{mm}\end{bmatrix}\begin{bmatrix}\alpha_1\\\alpha_2\\\vdots\\\alpha_m\end{bmatrix}=S_B^y$$

where the matrix D_jk can be expressed as:

$$D_{jk}=\begin{cases}\frac{mc}{4}M_j^{(x)}E_{jj}M_j^{(x)T}-\frac{1}{m^2}\sum_{j=1}^{m}\sum_{k=1}^{m}\left(\sum_{i=1}^{c}n_{ij}\mu_{ij}^{(x)}\right)\left(\sum_{i=1}^{c}n_{ik}\mu_{ik}^{(x)}\right)^T,&j=k\\\frac{c}{4}\left(\sum_{i=1}^{c}\mu_{ij}^{(x)}\mu_{ik}^{(x)T}\right)-\frac{1}{m^2}\sum_{j=1}^{m}\sum_{k=1}^{m}\left(\sum_{i=1}^{c}n_{ij}\mu_{ij}^{(x)}\right)\left(\sum_{i=1}^{c}n_{ik}\mu_{ik}^{(x)}\right)^T,&j\neq k\end{cases}$$

where M_j^(x) is the matrix formed by the c mean vectors μ_ij^(x), and E_jj is the diagonal matrix of row (or column) sums of the mean local preserving weight matrix B_ih.
B3.2: Since the optimization problem in B3.1 has no closed-form solution, the ratio of traces must be converted into the trace of a ratio, finally giving the following optimization problem:

$$(\alpha_1^*,\alpha_2^*,\ldots,\alpha_m^*)=\arg\max_{\alpha_1,\alpha_2,\ldots,\alpha_m}\mathrm{Tr}\!\left(\frac{\alpha^T D\alpha}{\alpha^T S\alpha}\right)$$

The optimal projection matrix (α_1^*, α_2^*, ..., α_m^*) is solved from the above formula by the method of generalized eigenvalue decomposition.
Compared with the prior art, the advantages of the present invention are:
(1) In emotion recognition, fused multi-modal affective features have higher accuracy and objectivity than single-modality affective features, and better robustness in practice.
(2) The multi-modal affective feature fusion method based on discriminant locality preserving projections considers not only the between-class scatter but also the within-class scatter, so samples of different classes are well discriminated, and the introduced locality preserving projection adapts well to nonlinear situations, finally yielding fused multi-modal affective features better suited to emotion recognition.
The above advantages are also verified by experimental results. By introducing the multi-modal affective feature fusion method based on discriminant locality preserving projections into multi-modal expression classification, the six expressions of anger, disgust, fear, happiness, sadness, and surprise can be recognized effectively, providing a new method and approach for developing automatic human-emotion assessment systems and human-computer interaction systems.
Brief description of the drawings
Fig. 1 is a flow chart of the multi-modal affective feature fusion method based on discriminant locality preserving projections of the present invention.
Fig. 2 shows part of the images in the bimodal emotion database.
Detailed description of the embodiments
Specific embodiments of the present invention are further described in detail with reference to the drawings. The implementation of the multi-modal affective feature fusion method based on discriminant locality preserving projections of the present invention, as shown in Fig. 1, mainly comprises the following steps:
Step 1: Collect the still images and speech segments of the videos in the multi-modal database
In the specific implementation, the eNTERFACE bimodal database is used. This database contains 1260 video clips from 42 people; each video has an emotion label expressing one of six basic emotions: anger, disgust, fear, happiness, sadness, and surprise (labels 1-6 respectively), as shown in Fig. 2. The video resolution is 720 × 576 at a frame rate of 25 fps, and the audio sampling frequency is 48 kHz. Each video is split into frames, and the frame with the richest expression is taken as the still image of that video; the speech is separated from each video as its corresponding speech segment, so each video clip finally corresponds to one still image and one speech segment. 75% of the images and their corresponding speech are arbitrarily chosen as training samples, and the remaining 25% serve as test samples.
Step 2: Extract features from the image and speech information, reduce their dimensionality, and represent them as feature vectors
First, the still image obtained in the previous step is cropped to the 128 × 128 face region; image preprocessing operations such as alignment, scale normalization, and gray-level equalization are then applied; and finally features such as Gabor, SIFT, and LBP are extracted (Gabor features in this embodiment). For the speech segments, the professional speech-processing toolkit openSMILE is used to extract various features (the emobase2010 feature set in this embodiment). Since the extracted feature vectors are usually of excessively high dimension, PCA is used to reduce them to a suitable dimension. The reduced image and speech feature vectors are d_j-dimensional, i.e. x_ijr ∈ R^{d_j}, where 1 ≤ j ≤ m, m being the number of modalities; 1 ≤ i ≤ c, c being the number of emotion categories; 1 ≤ r ≤ n_ij, n_ij being the number of samples belonging to the i-th emotion class and the j-th modality; x_ijr denotes the feature vector of the r-th sample belonging to the i-th emotion class and the j-th modality; n_i is the number of samples in the i-th class; and n is the total number of samples. In this embodiment c = 6, m = 2, n_ij = 210, n_i = 420, and n = 1260; for other multi-modal databases only these parameters need to be changed, e.g. m = 3 for a three-modality database.
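The PCA reduction step can be sketched as follows; the data and target dimension are illustrative stand-ins, not the Gabor or emobase2010 features of the embodiment:

```python
import numpy as np

# Minimal PCA sketch for the dimensionality-reduction step: center the data,
# take the leading right singular vectors, and project onto them.
def pca_reduce(X, k):
    """X: n_samples x d data matrix; returns the n_samples x k PCA scores."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T

rng = np.random.default_rng(2)
X = rng.standard_normal((100, 50))        # e.g. 100 samples of 50-D raw features
Xr = pca_reduce(X, 10)
print(Xr.shape)                           # (100, 10)
```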
Step 3: Use the discriminant locality preserving projection method to solve for the optimal projection matrix α, mapping the affective feature vectors x_ijr of the various modalities into a unified discriminant subspace to obtain the mapped feature vectors y_ijr. The specific steps are as follows:
First, the local preserving weight between feature vectors x_ijr and x_ijl from the same class and modality is defined as

$$W_{rl}=\exp\!\left(-\frac{\|x_{ijr}-x_{ijl}\|^2}{t}\right)\quad(1)$$

where x_ijl denotes the feature vector of the l-th sample from the i-th class and the j-th modality, 1 ≤ l ≤ n_ij, and the parameter t can be obtained empirically; no weight is considered between feature vectors from different modalities or classes. The within-class scatter matrix S_W^y of all classes is then defined as

$$S_W^y=\frac{1}{2}\left(\sum_{i=1}^{c}\sum_{j=1}^{m}\sum_{r=1}^{n_{ij}}\sum_{l=1}^{n_{ij}}(y_{ijr}-y_{ijl})(y_{ijr}-y_{ijl})^T+\sum_{i=1}^{c}\sum_{j=1}^{m}\sum_{k=1,k\neq j}^{m}\sum_{r=1}^{n_{ij}}\sum_{l=1}^{n_{ik}}(y_{ijr}-y_{ikl})(y_{ijr}-y_{ikl})^T\right)W_{rl}\quad(2)$$

where y_ikl denotes the mapped feature vector of the l-th sample from the i-th emotion class and the k-th modality, 1 ≤ k ≤ m.
Then, the feature-vector mean μ_ij^(x) of the i-th emotion class and the j-th modality is obtained:

$$\mu_{ij}^{(x)}=\frac{1}{n_{ij}}\sum_{r=1}^{n_{ij}}x_{ijr}\quad(3)$$

where the superscript (x) of μ_ij^(x) denotes the original sample space. Similarly, the feature-vector mean μ_hj^(x) of the h-th emotion class and the j-th modality is computed:

$$\mu_{hj}^{(x)}=\frac{1}{n_{hj}}\sum_{r=1}^{n_{hj}}x_{hjr}\quad(4)$$

where n_hj is the number of samples belonging to the h-th emotion class and the j-th modality, x_hjr denotes the feature vector of the r-th sample belonging to the h-th emotion class and the j-th modality, and 1 ≤ h ≤ c. Analogously to the within-class scatter matrix S_W^y, the local preserving weight between the feature-vector means μ_ij^(x) and μ_hj^(x) of the same modality is defined as

$$B_{ih}=\exp\!\left(-\frac{\|\mu_{ij}^{(x)}-\mu_{hj}^{(x)}\|^2}{t}\right)\quad(5)$$

where the parameter t can likewise be set empirically; no weight is considered between feature-vector means from different modalities.
The between-class scatter matrix S_B^y of all classes is then defined as

$$S_B^y=\frac{1}{2}\left(\sum_{i=1}^{c}\sum_{h=1}^{c}(\mu_i-\mu_h)(\mu_i-\mu_h)^T\right)B_{ih}\quad(6)$$

where μ_i is the mean of the mapped feature vectors of the i-th class samples:

$$\mu_i=\frac{1}{n_i}\sum_{j=1}^{m}\sum_{r=1}^{n_{ij}}y_{ijr}\quad(7)$$

Similarly, μ_h is the mean of the mapped feature vectors of the h-th class samples.
Finally, the within-class scatter matrix is minimized while the between-class scatter matrix is maximized, giving the following optimization formula:

$$(\alpha_1^*,\alpha_2^*,\ldots,\alpha_m^*)=\arg\max_{\alpha_1,\alpha_2,\ldots,\alpha_m}\frac{\mathrm{Tr}(S_B^y)}{\mathrm{Tr}(S_W^y)}\quad(8)$$

where Tr(·) is the trace of a matrix. By simplification and transformation the following optimization problem is obtained:

$$(\alpha_1^*,\alpha_2^*,\ldots,\alpha_m^*)=\arg\max_{\alpha_1,\alpha_2,\ldots,\alpha_m}\frac{\mathrm{Tr}(\alpha^T D\alpha)}{\mathrm{Tr}(\alpha^T S\alpha)}\quad(9)$$
In the optimization formula, the denominator is the within-class scatter matrix:

$$\alpha^T S\alpha=\begin{bmatrix}\alpha_1^T&\alpha_2^T&\cdots&\alpha_m^T\end{bmatrix}\begin{bmatrix}S_{11}&S_{12}&\cdots&S_{1m}\\S_{21}&S_{22}&\cdots&S_{2m}\\\vdots&\vdots&\ddots&\vdots\\S_{m1}&S_{m2}&\cdots&S_{mm}\end{bmatrix}\begin{bmatrix}\alpha_1\\\alpha_2\\\vdots\\\alpha_m\end{bmatrix}=S_W^y\quad(10)$$

where the matrix S_jk can be expressed as:

$$S_{jk}=\begin{cases}\sum_{i=1}^{c}X_{ij}LX_{ij}^T,&j=k\\-\sum_{i=1}^{c}n_{ij}n_{ik}\,\mu_{ij}^{(x)}\mu_{ik}^{(x)T},&j\neq k\end{cases}\quad(11)$$

where μ_ik^(x) is the feature-vector mean of the i-th emotion class and the k-th modality, n_ik is the number of samples from the i-th emotion class and the k-th modality, X_ij is the feature matrix formed by the n_ij feature vectors x_ijr, and L = mD_rr − W_rl, where D_rr is a diagonal matrix whose entries are the row (or column, since W is symmetric) sums of the sample feature-vector weight matrix W, i.e. D_rr = Σ_l W_rl.
The numerator of the optimization formula is the between-class scatter matrix:

$$\alpha^T D\alpha=\begin{bmatrix}\alpha_1^T&\alpha_2^T&\cdots&\alpha_m^T\end{bmatrix}\begin{bmatrix}D_{11}&D_{12}&\cdots&D_{1m}\\D_{21}&D_{22}&\cdots&D_{2m}\\\vdots&\vdots&\ddots&\vdots\\D_{m1}&D_{m2}&\cdots&D_{mm}\end{bmatrix}\begin{bmatrix}\alpha_1\\\alpha_2\\\vdots\\\alpha_m\end{bmatrix}=S_B^y\quad(12)$$

where the matrix D_jk can be expressed as:

$$D_{jk}=\begin{cases}\frac{mc}{4}M_j^{(x)}E_{jj}M_j^{(x)T}-\frac{1}{m^2}\sum_{j=1}^{m}\sum_{k=1}^{m}\left(\sum_{i=1}^{c}n_{ij}\mu_{ij}^{(x)}\right)\left(\sum_{i=1}^{c}n_{ik}\mu_{ik}^{(x)}\right)^T,&j=k\\\frac{c}{4}\left(\sum_{i=1}^{c}\mu_{ij}^{(x)}\mu_{ik}^{(x)T}\right)-\frac{1}{m^2}\sum_{j=1}^{m}\sum_{k=1}^{m}\left(\sum_{i=1}^{c}n_{ij}\mu_{ij}^{(x)}\right)\left(\sum_{i=1}^{c}n_{ik}\mu_{ik}^{(x)}\right)^T,&j\neq k\end{cases}\quad(13)$$

where M_j^(x) is the matrix formed by the c feature mean vectors μ_ij^(x), and E_jj is the diagonal matrix of row (or column) sums of the mean local preserving weight matrix B_ih.
Since formula (9) has no closed-form solution, the ratio of traces must be converted into the trace of a ratio:

$$(\alpha_1^*,\alpha_2^*,\ldots,\alpha_m^*)=\arg\max_{\alpha_1,\alpha_2,\ldots,\alpha_m}\mathrm{Tr}\!\left(\frac{\alpha^T D\alpha}{\alpha^T S\alpha}\right)\quad(14)$$

The optimal projection (α_1^*, α_2^*, ..., α_m^*) is obtained by solving formula (14) through generalized eigenvalue decomposition.
Step 4: Project the training and test samples to obtain the mapped features, and concatenate the mapped features to obtain the fusion features

The image and speech features are each mapped by multiplying by α: Y_j = α^T X_j, where X_j is the matrix formed by the c blocks X_ij, i.e. X_j = [X_1j, ..., X_ij, ..., X_cj]^T. The mapped features are then concatenated as follows:

$$Z_{train}=[\alpha^T X_1^{train},\ldots,\alpha^T X_j^{train},\ldots,\alpha^T X_m^{train}]^T\quad(15)$$

$$Z_{test}=[\alpha^T X_1^{test},\ldots,\alpha^T X_j^{test},\ldots,\alpha^T X_m^{test}]^T\quad(16)$$
Step 5: Feed the fusion features of the training samples into the classifier for training, and test with the test samples

The fusion features of the training samples obtained in the previous step are fed into the classifier (libSVM in this embodiment); a suitable model and parameters are obtained by training the classifier, and finally the test data are fed into the classifier to obtain the recognition results.
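The embodiment trains libSVM on the fused features. As a dependency-free stand-in for the classifier stage, the sketch below uses a nearest-centroid classifier on synthetic fused vectors (the data, labels, and function names are all illustrative, not the patent's method):

```python
import numpy as np

# Stand-in classifier stage: fit per-class centroids on fused training
# features, then assign each test vector to the nearest centroid.
def fit_centroids(Z, y):
    """Z: n x d fused features, y: integer class labels."""
    return {c: Z[y == c].mean(axis=0) for c in np.unique(y)}

def predict(Z, centroids):
    labels = sorted(centroids)
    M = np.stack([centroids[c] for c in labels])             # c x d centroid matrix
    d2 = ((Z[:, None, :] - M[None, :, :]) ** 2).sum(-1)      # squared distances
    return np.array(labels)[d2.argmin(axis=1)]

rng = np.random.default_rng(3)
Z_train = np.vstack([rng.normal(c, 0.1, (20, 4)) for c in (1, 2, 3)])
y_train = np.repeat([1, 2, 3], 20)
model = fit_centroids(Z_train, y_train)
print(predict(Z_train[:3], model))        # [1 1 1]
```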
The above specific embodiments are not intended to limit the present invention; any modifications, equivalent substitutions, and improvements made within the spirit and principles of the present invention shall fall within the protection scope of the present invention.

Claims (5)

1. A multi-modal affective feature fusion method based on discriminant locality preserving projections, characterised in that it comprises the following steps:
A. First, extract affective features from the sample data of each modality in the multi-modal emotion database, then reduce the dimensionality of the affective feature vectors of the various modalities; a sample of the j-th modality is represented by a d_j-dimensional feature vector x_ijr, i.e. x_ijr ∈ R^{d_j}, where 1 ≤ j ≤ m, m being the number of modalities; 1 ≤ i ≤ c, c being the number of emotion categories; 1 ≤ r ≤ n_ij, n_ij being the number of samples belonging to the i-th emotion class and the j-th modality; and x_ijr denotes the feature vector of the r-th sample belonging to the i-th emotion class and the j-th modality;
B. Apply discriminant locality preserving projections to the dimension-reduced feature vectors of the different modalities to obtain the optimal projection direction α;
C. Map the feature vectors of each modality separately: Y_j = α^T X_j, where X_j is the matrix formed by the c blocks X_ij, i.e. X_j = [X_1j, ..., X_ij, ..., X_cj]^T;
D. Concatenate the mapped features to obtain the fusion feature:

$$Z=[\alpha^T X_1,\ldots,\alpha^T X_j,\ldots,\alpha^T X_m]^T$$
2. The multi-modal affective feature fusion method based on discriminant locality preserving projections according to claim 1, characterised in that in step B, the purpose of the discriminant locality preserving projection is to solve for the optimal projection matrix α and map the affective feature vectors x_ijr of the various modalities into a unified discriminant subspace, obtaining the mapped feature vectors y_ijr, through the following specific steps:
B1: Define the within-class scatter matrix S_W^y:

$$S_W^y=\frac{1}{2}\left(\sum_{i=1}^{c}\sum_{j=1}^{m}\sum_{r=1}^{n_{ij}}\sum_{l=1}^{n_{ij}}(y_{ijr}-y_{ijl})(y_{ijr}-y_{ijl})^T+\sum_{i=1}^{c}\sum_{j=1}^{m}\sum_{k=1,k\neq j}^{m}\sum_{r=1}^{n_{ij}}\sum_{l=1}^{n_{ik}}(y_{ijr}-y_{ikl})(y_{ijr}-y_{ikl})^T\right)W_{rl}$$

where y_ikl denotes the mapped feature vector of the l-th sample from the i-th emotion class and the k-th modality, 1 ≤ k ≤ m, and W_rl is the local preserving weight between feature vectors of the same emotion class and modality;
B2: Define the between-class scatter matrix S_B^y:

$$S_B^y=\frac{1}{2}\left(\sum_{i=1}^{c}\sum_{h=1}^{c}(\mu_i-\mu_h)(\mu_i-\mu_h)^T\right)B_{ih}$$

where B_ih is the local preserving weight between feature-vector means of the same modality, and μ_i is the mean of the mapped feature vectors of the i-th class samples:

$$\mu_i=\frac{1}{n_i}\sum_{j=1}^{m}\sum_{r=1}^{n_{ij}}y_{ijr}$$

where n_i is the number of samples in the i-th class and μ_h is the feature-vector mean of the h-th class samples;
B3: Maximize the between-class scatter matrix while minimizing the within-class scatter matrix, which can be expressed as the optimization problem:

$$(\alpha_1^*,\alpha_2^*,\ldots,\alpha_m^*)=\arg\max_{\alpha_1,\alpha_2,\ldots,\alpha_m}\frac{\mathrm{Tr}(S_B^y)}{\mathrm{Tr}(S_W^y)}$$

where Tr(·) is the trace of a matrix.
3. The multi-modal affective feature fusion method based on discriminant locality preserving projections according to claim 2, characterised in that in step B1, the local preserving weight matrix W_rl between feature vectors is defined as follows: the local preserving weight between feature vectors x_ijr and x_ijl from the same emotion class and modality is

$$W_{rl}=\exp\!\left(-\frac{\|x_{ijr}-x_{ijl}\|^2}{t}\right)$$

where x_ijl denotes the feature vector of the l-th sample from the i-th emotion class and the j-th modality, 1 ≤ l ≤ n_ij, and the parameter t can be set empirically; no weight is considered between feature vectors from different emotion classes or modalities.
4. The multi-modal affective feature fusion method based on discriminant locality preserving projections according to claim 2, characterised in that in step B2, the local preserving weight matrix B_ih between feature-vector means is obtained through the following specific steps and definitions:
first compute the feature-vector mean μ_ij^(x) of the i-th emotion class and the j-th modality:

$$\mu_{ij}^{(x)}=\frac{1}{n_{ij}}\sum_{r=1}^{n_{ij}}x_{ijr}$$

where the superscript (x) of μ_ij^(x) denotes the original sample space; similarly compute the feature-vector mean μ_hj^(x) of the h-th emotion class and the j-th modality:

$$\mu_{hj}^{(x)}=\frac{1}{n_{hj}}\sum_{r=1}^{n_{hj}}x_{hjr}$$

where n_hj is the number of samples belonging to the h-th emotion class and the j-th modality, x_hjr denotes the feature vector of the r-th sample belonging to the h-th emotion class and the j-th modality, and 1 ≤ h ≤ c;
the local preserving weight between the feature-vector means μ_ij^(x) and μ_hj^(x) of the same modality is defined as

$$B_{ih}=\exp\!\left(-\frac{\|\mu_{ij}^{(x)}-\mu_{hj}^{(x)}\|^2}{t}\right)$$

where the parameter t can likewise be set empirically; no weight is considered between feature-vector means from different modalities.
Multi-modal affective characteristics fusion method based on discriminating locality preserving projections the most according to claim 2, its feature It is in step B3, described optimization problem, inter _ class relationship matrix will be maximized, minimize within-cluster variance, obtain Big projecting directionSpecifically comprise the following steps that
B3.1: convert the optimization problem in B3, obtains following optimization problem:
( α 1 * , α 2 * , ... , α m * ) = arg m a x α 1 , α 2 , ... , α m T r ( α T D α ) T r ( α T S α )
In optimization formula, denominator part is within class scatter matrix:
α T S α = α 1 T α 2 T ... α m T S 11 S 12 ... S 1 m S 21 S 22 ... S 2 m . . . . . . . . . . . . S m 1 S m 2 ... S m m α 1 α 2 . . . α m = S W y
Wherein matrixCan be expressed as:
S j k = Σ i = 1 c X i j LX i j T j = k - Σ i = 1 c n i j n i k μ i j ( x ) μ i k ( x ) T j ≠ k
Wherein μik (x)For from the i-th class emotion, the characteristic vector average of kth kind mode, nikFor from the i-th class emotion, kth kind mould The number of samples of state, XijFor nijIndividual characteristic vector xijrThe eigenmatrix of composition, L=mDrr-Wrl,DrrIt is a diagonal matrix, Its value is row or column and (W is symmetrical matrix) of the weight matrix W of characteristic vector between sample, i.e.
The numerator of the optimization formula is the between-class scatter matrix:

$$\alpha^T D \alpha = \begin{bmatrix} \alpha_1^T & \alpha_2^T & \cdots & \alpha_m^T \end{bmatrix} \begin{bmatrix} D_{11} & D_{12} & \cdots & D_{1m} \\ D_{21} & D_{22} & \cdots & D_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ D_{m1} & D_{m2} & \cdots & D_{mm} \end{bmatrix} \begin{bmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_m \end{bmatrix} = S_B^y$$

where the block matrix $D_{jk}$ can be expressed as:

$$D_{jk} = \begin{cases} \displaystyle mc^4 M_j^{(x)} E_{jj} M_j^{(x)T} - \frac{1}{m^2}\sum_{j=1}^{m}\sum_{k=1}^{m}\left(\sum_{i=1}^{c} n_{ij}\mu_{ij}^{(x)}\right)\left(\sum_{i=1}^{c} n_{ik}\mu_{ik}^{(x)}\right)^T, & j = k \\ \displaystyle c^4\left(\sum_{i=1}^{c} \mu_{ij}^{(x)}\mu_{ik}^{(x)T}\right) - \frac{1}{m^2}\sum_{j=1}^{m}\sum_{k=1}^{m}\left(\sum_{i=1}^{c} n_{ij}\mu_{ij}^{(x)}\right)\left(\sum_{i=1}^{c} n_{ik}\mu_{ik}^{(x)}\right)^T, & j \neq k \end{cases}$$

where $M_j^{(x)}$ is the matrix formed by the $c$ mean vectors $\mu_{ij}^{(x)}$, and $E_{jj}$ is a diagonal matrix whose $i$-th diagonal entry is the row (or column) sum of the local-preserving weights $B_{ih}$ of the means, i.e. $(E_{jj})_{ii} = \sum_{h} B_{ih}$.
B3.2: Since the optimization problem in B3.1 has no closed-form solution, the ratio of traces needs to be converted into the trace of a ratio, finally yielding the following optimization problem:

$$(\alpha_1^*, \alpha_2^*, \ldots, \alpha_m^*) = \arg\max_{\alpha_1, \alpha_2, \ldots, \alpha_m} \mathrm{Tr}\left(\left(\alpha^T S \alpha\right)^{-1}\left(\alpha^T D \alpha\right)\right)$$

The optimal projection matrix $\alpha^* = (\alpha_1^*, \alpha_2^*, \ldots, \alpha_m^*)$ is then solved from this formula by generalized eigenvalue decomposition.
CN201610397708.4A 2016-06-07 2016-06-07 Multi-mode emotional feature fusion method based on identification of local preserving projection Active CN106096642B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610397708.4A CN106096642B (en) 2016-06-07 2016-06-07 Multi-mode emotional feature fusion method based on identification of local preserving projection


Publications (2)

Publication Number Publication Date
CN106096642A true CN106096642A (en) 2016-11-09
CN106096642B CN106096642B (en) 2020-11-13

Family

ID=57227299

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610397708.4A Active CN106096642B (en) 2016-06-07 2016-06-07 Multi-mode emotional feature fusion method based on identification of local preserving projection

Country Status (1)

Country Link
CN (1) CN106096642B (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103544963A (en) * 2013-11-07 2014-01-29 东南大学 Voice emotion recognition method based on core semi-supervised discrimination and analysis
CN105138991A (en) * 2015-08-27 2015-12-09 山东工商学院 Video emotion identification method based on emotion significant feature integration
CN104778689B (en) * 2015-03-30 2018-01-05 广西师范大学 A kind of image hashing method based on average secondary image and locality preserving projections


Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
WEI-LUN CHAO, ET AL.: "Facial expression recognition based on improved local binary pattern and class-regularized locality preserving projection", 《SIGNAL PROCESSING》 *
WEIWEI YU ET AL.: "Discriminant Locality Preserving Projections: A New Method to Face Representation and Recognition", 《PROCEEDINGS 2ND JOINT IEEE INTERNATIONAL WORKSHOP ON VS-PETS》 *
ZHIHONG ZENG, ET AL.: "Audio-Visual Emotion Recognition in Adult Attachment Interview", 《ICMI "06 PROCEEDINGS OF THE 8TH INTERNATIONAL CONFERENCE ON MULTIMODAL INTERFACES》 *
ZHANG SHIQING: "Research on Emotion Recognition Based on Speech and Face", China Doctoral Dissertations Full-text Database, Information Science and Technology *
XU TAO: "Locality Preserving Canonical Correlation Analysis and Its Application in Face Recognition", China Masters' Theses Full-text Database, Information Science and Technology *
HAN DAN: "Research on Expression Analysis Technology Fusing Local Feature Learning and Its Application System", China Masters' Theses Full-text Database, Information Science and Technology *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106776740A (en) * 2016-11-17 2017-05-31 天津大学 A kind of social networks Text Clustering Method based on convolutional neural networks
CN108122006A (en) * 2017-12-20 2018-06-05 南通大学 Embedded method for diagnosing faults is locally kept based on differential weights
CN109284783A (en) * 2018-09-27 2019-01-29 广州慧睿思通信息科技有限公司 Machine learning-based worship counting method and device, user equipment and medium
CN109584885A (en) * 2018-10-29 2019-04-05 李典 A kind of audio-video output method based on multimode emotion recognition technology
CN109872728A (en) * 2019-02-27 2019-06-11 南京邮电大学 Voice and posture bimodal emotion recognition method based on kernel canonical correlation analysis
CN112289306A (en) * 2020-11-18 2021-01-29 上海依图网络科技有限公司 Method and device for identifying minor based on human body characteristics
CN112289306B (en) * 2020-11-18 2024-03-26 上海依图网络科技有限公司 Juvenile identification method and device based on human body characteristics

Also Published As

Publication number Publication date
CN106096642B (en) 2020-11-13

Similar Documents

Publication Publication Date Title
CN106096642A (en) Based on the multi-modal affective characteristics fusion method differentiating locality preserving projections
CN112990054B (en) Compact linguistics-free facial expression embedding and novel triple training scheme
CN105739688A (en) Man-machine interaction method and device based on emotion system, and man-machine interaction system
CN109190561B (en) Face recognition method and system in video playing
Areeb et al. Helping hearing-impaired in emergency situations: A deep learning-based approach
CN104063721B (en) A kind of human behavior recognition methods learnt automatically based on semantic feature with screening
CN109637522A (en) A kind of speech-emotion recognition method extracting deep space attention characteristics based on sound spectrograph
CN105516280A (en) Multi-mode learning process state information compression recording method
Kaluri et al. An enhanced framework for sign gesture recognition using hidden Markov model and adaptive histogram technique.
CN105205449A (en) Sign language recognition method based on deep learning
CN109034099A (en) A kind of expression recognition method and device
Hazourli et al. Multi-facial patches aggregation network for facial expression recognition and facial regions contributions to emotion display
Jing et al. Recognizing american sign language manual signs from rgb-d videos
CN115936944B (en) Virtual teaching management method and device based on artificial intelligence
CN111028319A (en) Three-dimensional non-photorealistic expression generation method based on facial motion unit
CN110163156A (en) It is a kind of based on convolution from the lip feature extracting method of encoding model
CN106991385A (en) A kind of facial expression recognizing method of feature based fusion
Chetty et al. A multilevel fusion approach for audiovisual emotion recognition
CN111985532B (en) Scene-level context-aware emotion recognition deep network method
CN114724224A (en) Multi-mode emotion recognition method for medical care robot
Abdulsalam et al. Emotion recognition system based on hybrid techniques
Aran et al. Sign language tutoring tool
Ullah et al. Emotion recognition from occluded facial images using deep ensemble model.
Guo et al. Facial expression recognition: a review
CN111368663A (en) Method, device, medium and equipment for recognizing static facial expressions in natural scene

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 201, building 2, phase II, No.1 Kechuang Road, Yaohua street, Qixia District, Nanjing City, Jiangsu Province, 210003

Applicant after: NANJING University OF POSTS AND TELECOMMUNICATIONS

Address before: 210003 Gulou District, Jiangsu, Nanjing new model road, No. 66

Applicant before: NANJING University OF POSTS AND TELECOMMUNICATIONS

CB02 Change of applicant information
GR01 Patent grant
GR01 Patent grant