CN108985358A - Emotion recognition method, apparatus, device and storage medium - Google Patents
Emotion recognition method, apparatus, device and storage medium
- Publication number
- CN108985358A (application number CN201810694899.XA)
- Authority
- CN
- China
- Prior art keywords
- session
- modal
- session information
- fusion
- vector
- Prior art date
- 2018-06-29
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/253—Fusion techniques of extracted features
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
Abstract
Embodiments of the present invention disclose an emotion recognition method, apparatus, device, and storage medium. The method comprises: determining fused session features of multi-modal session information; and inputting the fused session features of the multi-modal session information into a pre-built multi-modal emotion recognition model to obtain emotional features of the multi-modal session information. In the technical solution provided by the embodiments of the present invention, the session features of each modality in the multi-modal session information are fused to obtain fused session features, and the fused session features are input into one unified multi-modal emotion recognition model for training, so that the final emotion result can be predicted directly, without training a separate recognition model for each modality and fusing the results of the different models. This simplifies the sample-training process and improves the accuracy of the emotion recognition result.
Description
Technical field
Embodiments of the present invention relate to the field of artificial intelligence, and in particular to an emotion recognition method, apparatus, device, and storage medium.
Background art
With the development of artificial intelligence, intelligent interaction is playing an increasingly important role in more and more fields. In intelligent interaction, one important problem is how to recognize the user's current emotional state during a multi-modal interaction, so as to provide emotion-level feedback to the whole intelligent interactive system, which can then adjust in time to users in different emotional states and improve the service quality of the entire interaction.
At present, the mainstream emotion recognition method is shown in Fig. 1 and proceeds as follows: each modality, such as speech, text, and facial-expression images, is modeled independently, and the results of the individual models are finally fused together; according to rules or a machine learning model, a fusion decision is made over the results of the multiple modalities, and a single overall multi-modal emotion recognition result is output.
Since the same word has different meanings, and thus expresses different emotional states, in different scenes, the above method generalizes poorly. In addition, it requires collecting massive amounts of data, its cost is high, and, because it depends on manual operation, its results are hard to control.
Summary of the invention
Embodiments of the present invention provide an emotion recognition method, apparatus, device, and storage medium, which simplify the sample-training process and improve the accuracy of the emotion recognition result.
In a first aspect, an embodiment of the present invention provides an emotion recognition method, comprising:
determining fused session features of multi-modal session information; and
inputting the fused session features of the multi-modal session information into a pre-built multi-modal emotion recognition model to obtain emotional features of the multi-modal session information.
In a second aspect, an embodiment of the present invention further provides an emotion recognition apparatus, comprising:
a fusion feature determining module, configured to determine fused session features of multi-modal session information; and
an emotional feature determining module, configured to input the fused session features of the multi-modal session information into a pre-built multi-modal emotion recognition model to obtain emotional features of the multi-modal session information.
In a third aspect, an embodiment of the present invention further provides a device, comprising:
one or more processors; and
a storage apparatus configured to store one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement any emotion recognition method of the first aspect.
In a fourth aspect, an embodiment of the present invention further provides a storage medium storing a computer program which, when executed by a processor, implements any emotion recognition method of the first aspect.
In the technical solution provided by the embodiments of the present invention, the session features of each modality in the multi-modal session information are fused to obtain fused session features, and the fused session features are input into one unified multi-modal emotion recognition model for training, so that the final emotion result can be predicted directly, without training a separate recognition model for each modality and fusing the results of the different models. This simplifies the sample-training process and improves the accuracy of the emotion recognition result.
Brief description of the drawings
Fig. 1 is a schematic diagram of prior-art multi-modal emotion recognition based on training each modality independently;
Fig. 2A is a flowchart of an emotion recognition method provided in Embodiment 1 of the present invention;
Fig. 2B is a schematic diagram of a learning model based on multi-modal feature fusion to which embodiments of the present invention are applicable;
Fig. 3 is a flowchart of an emotion recognition method provided in Embodiment 2 of the present invention;
Fig. 4 is a structural block diagram of an emotion recognition apparatus provided in Embodiment 3 of the present invention;
Fig. 5 is a structural schematic diagram of a device provided in Embodiment 4 of the present invention.
Detailed description of the embodiments
The embodiments of the present invention are described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described here are used only to explain the embodiments of the present invention, and do not limit the invention. It should further be noted that, for ease of description, the accompanying drawings show only the parts related to the embodiments of the present invention, rather than the entire structure.
Embodiment 1
Fig. 2A is a flowchart of an emotion recognition method provided in Embodiment 1 of the present invention, and Fig. 2B is a schematic diagram of a learning model based on multi-modal feature fusion to which the embodiment is applicable. This embodiment is suitable for the case of accurately recognizing the user's emotion during a multi-modal interaction. The method can be executed by the emotion recognition apparatus provided in the embodiments of the present invention; the apparatus can be implemented in software and/or hardware and can be integrated in a computing device. Referring to Figs. 2A and 2B, the method specifically includes the following steps.
S210: determine fused session features of multi-modal session information.
Here, a modality is a term for a means of interaction, and multi-modal refers to the phenomenon of interacting through an integrated use of several means and symbolic carriers such as text, image, video, voice, and gesture. Correspondingly, multi-modal session information is session information that contains at least two modalities at the same time, for example session information that simultaneously contains the three modalities of voice, text, and image.
Fused session features are obtained by fusing the session features of the different modalities contained in one piece of session information. Optionally, a deep learning model can be used to determine the fused session features of the multi-modal session information while considering the multiple modality features contained in one piece of session information simultaneously.
S220: input the fused session features of the multi-modal session information into a pre-built multi-modal emotion recognition model to obtain emotional features of the multi-modal session information.
Here, the multi-modal emotion recognition model is a model built on techniques in artificial intelligence such as speech recognition, intelligent knowledge graphs, and text recognition; specifically, it can be obtained in advance by training an initial machine learning model, such as a neural network model, on a sample data set. The emotional features are the multi-modal emotion recognition result; they characterize the degree of an individual's state toward external things and may include an emotion type and an emotional intensity. The emotion type may include, for example, joy, anger, sorrow, and happiness; the emotional intensity characterizes how strong a certain emotion is.
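For illustration only, such an emotional-feature output could be represented as a small record like the sketch below; the field names, the four example types, and the [0, 1] intensity range are assumptions made for the sketch and are not fixed by this embodiment.

```python
# A minimal sketch of an emotional-feature record; all names and ranges here
# are illustrative assumptions, not part of the embodiment.
from dataclasses import dataclass

@dataclass
class EmotionalFeature:
    emotion_type: str   # e.g. "joy", "anger", "sorrow", "happiness"
    intensity: float    # assumed strength of the emotion, e.g. in [0, 1]

# Example: a strongly negative recognition result.
result = EmotionalFeature(emotion_type="anger", intensity=0.9)
```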
Exemplarily, before the fused session features of the multi-modal session information are input into the pre-built multi-modal emotion recognition model, the method may further include: training an initial machine learning model according to fused session features of multi-modal session sample information and emotional features of the multi-modal session sample information, to obtain the multi-modal emotion recognition model.
Specifically, by continuously accumulating session information under various scenes during interactions, a large number of fused session features of multi-modal session sample information, together with the emotional features of the corresponding multi-modal session sample information, are obtained as a training sample set; each sample is input into a neural network, which is trained on it, and after training the multi-modal emotion recognition model is obtained. When the fused session features of a piece of multi-modal session information are input into the multi-modal emotion recognition model, the model can judge the input fused session features in combination with its learned parameters and output the corresponding emotional features.
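A minimal training sketch under stated assumptions follows: the fused session features and emotion labels are taken as ready-made tensors (here synthetic), and a small feed-forward network stands in for the initial machine learning model, since this embodiment does not fix an architecture. All dimensions, the class count, and the hyperparameters are illustrative.

```python
# Training a single unified emotion recognition model on fused session
# features, as described above; data and dimensions are synthetic stand-ins.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

FUSED_DIM, NUM_EMOTIONS = 768, 4   # assumed fused-vector size and emotion types

# Stand-in for the initial machine learning model (a neural network).
model = nn.Sequential(
    nn.Linear(FUSED_DIM, 256), nn.ReLU(),
    nn.Linear(256, NUM_EMOTIONS),
)

# Placeholder training set: fused features of multi-modal session samples
# paired with annotated emotion labels.
features = torch.randn(1000, FUSED_DIM)
labels = torch.randint(0, NUM_EMOTIONS, (1000,))
loader = DataLoader(TensorDataset(features, labels), batch_size=32, shuffle=True)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for epoch in range(10):
    for x, y in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)   # predict emotion from fused features
        loss.backward()
        optimizer.step()
```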
It should be noted that, because the prior art needs to build a separate recognition model for each modality and weight the results of the individual models to obtain the final emotion result, it requires a large number of training samples, and the model learned on a single modality may be of poor quality, which ultimately degrades the overall emotion recognition. In this embodiment, referring to Fig. 2B, the session features of the individual modalities in the multi-modal session information are fused directly into fused session features, and only the fused session features need to be input into one unified multi-modal emotion recognition model for training, after which the final emotional features can be output; compared with the prior art, the number of training samples is greatly reduced. Moreover, because the multi-modal session features are fused, the multi-modal emotion recognition model can learn not only the feature information of each modality but also the feature relations between different modalities, which avoids the prior-art problem that a model learned on a single modality is of poor quality and ultimately degrades the overall emotion recognition.
This is illustrated with bimodal session information of text and speech. When the user says the sentence "I just want to buy the Apple X now, I just want that certificate", an existing technique that considers the text-modality information and the speech-modality information separately cannot determine from the text alone whether the sentence should be labeled as a negative emotion, which ultimately makes the emotion recognition result inaccurate. With the technical solution of this embodiment, however, the information from the user's speech modality is considered at the same time as the text-modality information; for example, if the user's voice fluctuates violently when saying this sentence, then by fusing the "text" + "speech" bimodal features, the emotion can finally be accurately recognized as negative.
Furthermore, it should be emphasized that the emotional features of the multi-modal session sample information used in this embodiment are annotated on the multi-modal session information while considering all the modalities together, which ensures that the annotated emotional states are unambiguous and builds a more accurate data set for the subsequent model training, making the resulting multi-modal emotion recognition model more accurate. The prior art, in contrast, annotates each modality independently; because a single modality is annotated in isolation, the emotional features of a sentence may not be annotated correctly, which makes the recognition accuracy of the emotion model of each modality poor and ultimately degrades the subsequent result-fusion stage.
In the technical solution provided by this embodiment of the present invention, the session features of each modality in the multi-modal session information are fused to obtain fused session features, and the fused session features are input into one unified multi-modal emotion recognition model for training, so that the final emotion result can be predicted directly, without training a separate recognition model for each modality and fusing the results of the different models. This simplifies the sample-training process and improves the accuracy of the emotion recognition result.
Embodiment 2
Fig. 3 is a kind of flow chart of Emotion identification method provided by Embodiment 2 of the present invention, and the present embodiment is in above-mentioned implementation
On the basis of example one, further the fusion session characteristics of the multi-modal session information of determination are optimized.Referring to Fig. 3, the party
Method specifically includes:
S310: determine vector representations of at least two of voice session information, text session information, and image session information.
Exemplarily, the multi-modal session information may include voice session information, text session information, and image session information. The vector representation of a piece of session information is its representation in a vector space and can be obtained by modeling.
Specifically, characteristic parameters that reflect emotional change can be extracted from the voice session information; the text session information can be segmented into sentences and words and keywords extracted; and effective dynamic or static expression features can be extracted from the image session information. These are input into a vector-extraction model, yielding the vector representation of the voice session information, the vector representation of the image session information, and the vector representation of the text session information. The vector-extraction model can be one collective model that converts speech features, text keywords, image features, and so on into the corresponding vector representations, or it can be composed of separate sub-models.
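A toy sketch of this vector-extraction step is given below under stated assumptions: the three linear "encoders" are hypothetical stand-ins for the vector-extraction model (or its sub-models), their inputs are assumed to be already-preprocessed per-modality features, and all dimensions are illustrative.

```python
# Hypothetical per-modality encoders mapping preprocessed features to
# fixed-size vector representations; none of these names or sizes are
# prescribed by the patent.
import torch
import torch.nn as nn

speech_encoder = nn.Linear(40, 128)   # e.g. 40 emotion-related acoustic parameters
text_encoder = nn.Linear(50, 300)     # e.g. a 50-d encoding of extracted keywords
image_encoder = nn.Linear(68, 256)    # e.g. 68 expression-feature values

speech_vec = speech_encoder(torch.randn(40))   # vector representation of speech
text_vec = text_encoder(torch.randn(50))       # vector representation of text
image_vec = image_encoder(torch.randn(68))     # vector representation of image
```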
S320: fuse the vector representations of the at least two modalities of session information to obtain a vector representation of the fused session features of the multi-modal session information.
Specifically, the vector representations of the individual modalities of session information can be directly spliced, according to certain rules, into one long unified vector that serves as the vector representation of the fused session features of the multi-modal session information, thereby fusing the vector representations of the multiple modalities of session information. Alternatively, the parts of each modality's vector representation that carry the key information can be extracted and spliced to obtain the vector representation of the fused session features of the multi-modal session information.
Exemplarily, fusing the vector representations of the at least two modalities of session information may include: sequentially splicing the vector representations of the at least two modalities of session information according to a preset modality order.
Here, the preset modality order can be a pre-configured order of modality inputs and can be modified according to the actual situation; for example, modalities can be added, deleted, or inserted, so that the order of the modality inputs can be adjusted dynamically.
Specifically, after the vector representation of each modality of session information corresponding to the input multi-modal session information has been determined, the vector representations of the individual modalities are directly joined according to the input order of the modalities, thereby fusing the vector representations of the multiple modalities of session information, as sketched below.
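A minimal sketch of this sequential splicing, assuming the per-modality vectors from S310 and an illustrative three-entry order list:

```python
# Splice per-modality vectors in a preset, adjustable modality order.
import torch

modality_order = ["speech", "text", "image"]   # preset order of modality inputs;
                                               # entries can be added, deleted,
                                               # or inserted to adjust it
vectors = {
    "speech": torch.randn(128),   # stand-ins for the per-modality
    "text": torch.randn(300),     # vector representations determined
    "image": torch.randn(256),    # in S310
}

# Directly join the vectors in the preset order into one long unified vector.
fused = torch.cat([vectors[m] for m in modality_order])   # shape: (684,)
```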
Exemplarily, fusing the vector representations of the at least two modalities of session information may also include: separately extracting nonlinear features of the vector representations of the at least two modalities of session information; and fusing the extracted nonlinear features of the at least two modalities of session information.
Here, the nonlinear features of a vector representation characterize its distinctive part and can be the non-zero part of the vector representation. The nonlinear features of the vector representation of one modality of session information refer to the vector representation of the words in that modality's session information from which emotion can be identified. For example, if the vector representation of one modality of session information is [0, 1, 1, 0, 0], the nonlinear features of that vector representation can be [1, 1].
Specifically, referring to Fig. 2B, in the multi-modal feature-fusion layer, the vector representation of each modality of session information can be input into a deep learning model and first passed through one fully connected layer (Full Connection Layer, FCL) per modality, which extracts the nonlinear features of that modality's vector representation and yields the corresponding hidden-layer vector; the output hidden-layer vectors are then spliced together, thereby fusing the vector representations of the multiple modalities of session information.
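A sketch of this fusion layer of Fig. 2B follows, under stated assumptions: one fully connected layer per modality extracts that modality's nonlinear features as a hidden-layer vector, and the hidden vectors are spliced before a final classifier. The dimensions, the ReLU nonlinearity, and the classifier head are illustrative choices, not fixed by this embodiment.

```python
# Per-modality FC layers followed by hidden-vector splicing, standing in for
# the multi-modal feature-fusion layer of Fig. 2B.
import torch
import torch.nn as nn

class MultiModalFusionModel(nn.Module):
    def __init__(self, dims, hidden_dim=64, num_emotions=4):
        super().__init__()
        # One FC layer (with a nonlinearity) per modality.
        self.fc = nn.ModuleDict({
            name: nn.Sequential(nn.Linear(d, hidden_dim), nn.ReLU())
            for name, d in dims.items()
        })
        self.classifier = nn.Linear(hidden_dim * len(dims), num_emotions)

    def forward(self, inputs):
        # Extract each modality's nonlinear features, splice the hidden-layer
        # vectors, and classify the fused representation.
        hidden = [self.fc[name](inputs[name]) for name in self.fc]
        return self.classifier(torch.cat(hidden, dim=-1))

model = MultiModalFusionModel({"speech": 128, "text": 300, "image": 256})
logits = model({"speech": torch.randn(1, 128),
                "text": torch.randn(1, 300),
                "image": torch.randn(1, 256)})   # shape: (1, 4)
```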
S330: input the fused session features of the multi-modal session information into the pre-built multi-modal emotion recognition model to obtain the emotional features of the multi-modal session information.
Specifically, the vector representation of the fused session features of the multi-modal session information is input into the pre-built multi-modal emotion recognition model; the model can judge the input fused session features in combination with its learned parameters and output the corresponding emotional features.
In the technical solution provided by this embodiment of the present invention, the vector representations of the individual modalities of session information in the multi-modal session information are fused to obtain the vector representation of the fused session features of the multi-modal session information, and this vector representation is input into one unified multi-modal emotion recognition model for training, so that the final emotion result can be predicted directly, without training a separate recognition model for each modality and fusing the results of the different models. This simplifies the sample-training process and improves the accuracy of the emotion recognition result.
Embodiment 3
Fig. 4 is a structural block diagram of an emotion recognition apparatus provided in Embodiment 3 of the present invention. The apparatus can execute the emotion recognition method provided by any embodiment of the present invention and has the functional modules and beneficial effects corresponding to the executed method. As shown in Fig. 4, the apparatus may include:
a fusion feature determining module 410, configured to determine fused session features of multi-modal session information; and
an emotional feature determining module 420, configured to input the fused session features of the multi-modal session information into a pre-built multi-modal emotion recognition model to obtain emotional features of the multi-modal session information.
In the technical solution provided by this embodiment of the present invention, the session features of each modality in the multi-modal session information are fused to obtain fused session features, and the fused session features are input into one unified multi-modal emotion recognition model for training, so that the final emotion result can be predicted directly, without training a separate recognition model for each modality and fusing the results of the different models. This simplifies the sample-training process and improves the accuracy of the emotion recognition result.
Exemplarily, the fusion feature determining module 410 may include:
a multi-modal vector determination unit, configured to determine vector representations of at least two of voice session information, text session information, and image session information; and
a fusion vector determination unit, configured to fuse the vector representations of the at least two modalities of session information to obtain a vector representation of the fused session features of the multi-modal session information.
Optionally, the fusion vector determination unit is specifically configured to: sequentially splice the vector representations of the at least two modalities of session information according to a preset modality order.
Optionally, the fusion vector determination unit is also specifically configured to: separately extract nonlinear features of the vector representations of the at least two modalities of session information, and fuse the extracted nonlinear features of the at least two modalities of session information.
Exemplarily, the above apparatus may further include:
a recognition model determining module, configured to train an initial machine learning model according to fused session features of multi-modal session sample information and emotional features of the multi-modal session sample information, to obtain the multi-modal emotion recognition model.
Embodiment 4
Fig. 5 is a structural schematic diagram of a device provided in Embodiment 4 of the present invention; it shows a block diagram of an exemplary device suitable for implementing embodiments of the present invention. The device 12 shown in Fig. 5 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present invention. As shown in Fig. 5, the device 12 takes the form of a general-purpose computing device. The components of the device 12 may include, but are not limited to, one or more processors or processing units 16, a system memory 28, and a bus 18 connecting the different system components (including the system memory 28 and the processing unit 16).
The bus 18 represents one or more of several classes of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The device 12 typically comprises a variety of computer-system-readable media. These media can be any usable media that can be accessed by the device 12, including volatile and non-volatile media and removable and non-removable media.
The system memory 28 may include computer-system-readable media in the form of volatile memory, such as a random access memory (RAM) 30 and/or a cache memory 32. The device 12 may further include other removable/non-removable, volatile/non-volatile computer-system storage media. By way of example only, a storage system 34 can be used for reading from and writing to non-removable, non-volatile magnetic media (not shown in Fig. 5, commonly referred to as a "hard disk drive"). Although not shown in Fig. 5, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk"), and an optical disc drive for reading from and writing to a removable non-volatile optical disc (such as a CD-ROM, DVD-ROM, or other optical media), can be provided. In these cases, each drive can be connected to the bus 18 through one or more data-media interfaces. The system memory 28 may include at least one program product having a set of (for example, at least one) program modules, these program modules being configured to perform the functions of the embodiments of the present invention.
A program/utility 40 having a set of (at least one) program modules 42 can be stored, for example, in the system memory 28; such program modules 42 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data, and each or some combination of these examples may include an implementation of a network environment. The program modules 42 generally perform the functions and/or methods of the embodiments described in the present invention.
The device 12 can also communicate with one or more external devices 14 (such as a keyboard, a pointing device, a display 24, etc.), with one or more devices that enable a user to interact with the device 12, and/or with any device (such as a network card, a modem, etc.) that enables the device 12 to communicate with one or more other computing devices. This communication can be carried out through an input/output (I/O) interface 22. Moreover, the device 12 can also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN), and/or a public network such as the Internet) through a network adapter 20. As shown, the network adapter 20 communicates with the other modules of the device 12 through the bus 18. It should be understood that, although not shown in the figure, other hardware and/or software modules can be used in conjunction with the device 12, including but not limited to: microcode, device drivers, redundant processing units, external disk-drive arrays, RAID systems, tape drives, data-backup storage systems, and so on.
The processing unit 16, by running the programs stored in the system memory 28, executes various functional applications and data processing, for example implementing the emotion recognition method provided by the embodiments of the present invention.
Embodiment 5
Embodiment 5 of the present invention further provides a computer-readable storage medium on which a computer program (also called computer-executable instructions) is stored; when executed by a processor, the program can implement the emotion recognition method described in any of the above embodiments.
The computer storage medium of the embodiments of the present invention can adopt any combination of one or more computer-readable media. A computer-readable medium can be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium can be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by or in connection with an instruction-execution system, apparatus, or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, which carries computer-readable program code. Such a propagated data signal can take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium; this computer-readable medium can send, propagate, or transmit a program for use by or in connection with an instruction-execution system, apparatus, or device.
The program code contained on a computer-readable medium can be transmitted by any suitable medium, including, but not limited to, wireless media, electrical wire, optical cable, RF, and so on, or any suitable combination of the above.
Computer program code for carrying out the operations of the embodiments of the present invention can be written in one or more programming languages or a combination thereof; the programming languages include object-oriented programming languages such as Java, Smalltalk, and C++, and also conventional procedural programming languages such as the "C" language or similar programming languages. The program code can execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (for example, through the Internet using an Internet service provider).
Note that the above are only preferred embodiments of the present invention and the applied technical principles. Those skilled in the art will appreciate that the invention is not limited to the specific embodiments described here; for those skilled in the art, various obvious changes, readjustments, and substitutions can be made without departing from the scope of protection of the present invention. Therefore, although the embodiments of the present invention have been described in further detail through the above embodiments, the embodiments of the present invention are not limited to the above embodiments and can include more other equivalent embodiments without departing from the concept of the present invention; the scope of the present invention is determined by the scope of the appended claims.
Claims (12)
1. An emotion recognition method, characterized by comprising:
determining fused session features of multi-modal session information; and
inputting the fused session features of the multi-modal session information into a pre-built multi-modal emotion recognition model to obtain emotional features of the multi-modal session information.
2. The method according to claim 1, characterized in that determining the fused session features of the multi-modal session information comprises:
determining vector representations of at least two of voice session information, text session information, and image session information; and
fusing the vector representations of the at least two modalities of session information to obtain a vector representation of the fused session features of the multi-modal session information.
3. The method according to claim 2, characterized in that fusing the vector representations of the at least two modalities of session information comprises:
sequentially splicing the vector representations of the at least two modalities of session information according to a preset modality order.
4. The method according to claim 2, characterized in that fusing the vector representations of the at least two modalities of session information comprises:
separately extracting nonlinear features of the vector representations of the at least two modalities of session information; and
fusing the extracted nonlinear features of the at least two modalities of session information.
5. The method according to claim 1, characterized in that, before inputting the fused session features of the multi-modal session information into the pre-built multi-modal emotion recognition model, the method further comprises:
training an initial machine learning model according to fused session features of multi-modal session sample information and emotional features of the multi-modal session sample information, to obtain the multi-modal emotion recognition model.
6. An emotion recognition apparatus, characterized by comprising:
a fusion feature determining module, configured to determine fused session features of multi-modal session information; and
an emotional feature determining module, configured to input the fused session features of the multi-modal session information into a pre-built multi-modal emotion recognition model to obtain emotional features of the multi-modal session information.
7. The apparatus according to claim 6, characterized in that the fusion feature determining module comprises:
a multi-modal vector determination unit, configured to determine vector representations of at least two of voice session information, text session information, and image session information; and
a fusion vector determination unit, configured to fuse the vector representations of the at least two modalities of session information to obtain a vector representation of the fused session features of the multi-modal session information.
8. The apparatus according to claim 7, characterized in that the fusion vector determination unit is specifically configured to:
sequentially splice the vector representations of the at least two modalities of session information according to a preset modality order.
9. The apparatus according to claim 7, characterized in that the fusion vector determination unit is also specifically configured to:
separately extract nonlinear features of the vector representations of the at least two modalities of session information; and
fuse the extracted nonlinear features of the at least two modalities of session information.
10. The apparatus according to claim 6, characterized by further comprising:
a recognition model determining module, configured to train an initial machine learning model according to fused session features of multi-modal session sample information and emotional features of the multi-modal session sample information, to obtain the multi-modal emotion recognition model.
11. A device, characterized in that the device comprises:
one or more processors; and
a storage apparatus configured to store one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the emotion recognition method according to any one of claims 1 to 5.
12. A storage medium on which a computer program is stored, characterized in that, when executed by a processor, the program implements the emotion recognition method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810694899.XA (granted as CN108985358B) | 2018-06-29 | 2018-06-29 | Emotion recognition method, device, equipment and storage medium
Publications (2)
Publication Number | Publication Date |
---|---|
CN108985358A (en) | 2018-12-11 |
CN108985358B (en) | 2021-03-02 |
Family
ID=64538992
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810694899.XA (active, granted as CN108985358B) | Emotion recognition method, device, equipment and storage medium | 2018-06-29 | 2018-06-29
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108985358B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102930298A (en) * | 2012-09-02 | 2013-02-13 | 北京理工大学 | Audio visual emotion recognition method based on multi-layer boosted HMM |
US8781989B2 (en) * | 2008-01-14 | 2014-07-15 | Aptima, Inc. | Method and system to predict a data value |
CN104835507A (en) * | 2015-03-30 | 2015-08-12 | 渤海大学 | Serial-parallel combined multi-mode emotion information fusion and identification method |
CN105427869A (en) * | 2015-11-02 | 2016-03-23 | 北京大学 | Session emotion autoanalysis method based on depth learning |
CN106503805A (en) * | 2016-11-14 | 2017-03-15 | 合肥工业大学 | A kind of bimodal based on machine learning everybody talk with sentiment analysis system and method |
CN107705807A (en) * | 2017-08-24 | 2018-02-16 | 平安科技(深圳)有限公司 | Voice quality detecting method, device, equipment and storage medium based on Emotion identification |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111681645B (en) * | 2019-02-25 | 2023-03-31 | 北京嘀嘀无限科技发展有限公司 | Emotion recognition model training method, emotion recognition device and electronic equipment |
CN111681645A (en) * | 2019-02-25 | 2020-09-18 | 北京嘀嘀无限科技发展有限公司 | Emotion recognition model training method, emotion recognition device and electronic equipment |
CN111816211A (en) * | 2019-04-09 | 2020-10-23 | Oppo广东移动通信有限公司 | Emotion recognition method and device, storage medium and electronic equipment |
CN110083716A (en) * | 2019-05-07 | 2019-08-02 | 青海大学 | Multi-modal affection computation method and system based on Tibetan language |
CN110021308A (en) * | 2019-05-16 | 2019-07-16 | 北京百度网讯科技有限公司 | Voice mood recognition methods, device, computer equipment and storage medium |
CN112347774A (en) * | 2019-08-06 | 2021-02-09 | 北京搜狗科技发展有限公司 | Model determination method and device for user emotion recognition |
CN110390956A (en) * | 2019-08-15 | 2019-10-29 | 龙马智芯(珠海横琴)科技有限公司 | Emotion recognition network model, method and electronic equipment |
CN110991427A (en) * | 2019-12-25 | 2020-04-10 | 北京百度网讯科技有限公司 | Emotion recognition method and device for video and computer equipment |
CN113128284A (en) * | 2019-12-31 | 2021-07-16 | 上海汽车集团股份有限公司 | Multi-mode emotion recognition method and device |
CN111507402A (en) * | 2020-04-17 | 2020-08-07 | 北京声智科技有限公司 | Method, device, medium and equipment for determining response mode |
CN112148836A (en) * | 2020-09-07 | 2020-12-29 | 北京字节跳动网络技术有限公司 | Multi-modal information processing method, device, equipment and storage medium |
CN112183022A (en) * | 2020-09-25 | 2021-01-05 | 北京优全智汇信息技术有限公司 | Loss assessment method and device |
CN112233698A (en) * | 2020-10-09 | 2021-01-15 | 中国平安人寿保险股份有限公司 | Character emotion recognition method and device, terminal device and storage medium |
CN112233698B (en) * | 2020-10-09 | 2023-07-25 | 中国平安人寿保险股份有限公司 | Character emotion recognition method, device, terminal equipment and storage medium |
CN112418034A (en) * | 2020-11-12 | 2021-02-26 | 元梦人文智能国际有限公司 | Multi-modal emotion recognition method and device, electronic equipment and storage medium |
CN112418034B (en) * | 2020-11-12 | 2024-08-20 | 上海元梦智能科技有限公司 | Multi-mode emotion recognition method and device, electronic equipment and storage medium |
CN114005468A (en) * | 2021-09-07 | 2022-02-01 | 华院计算技术(上海)股份有限公司 | Interpretable emotion recognition method and system based on global working space |
WO2023226239A1 (en) * | 2022-05-24 | 2023-11-30 | 网易(杭州)网络有限公司 | Object emotion analysis method and apparatus and electronic device |
CN115496226A (en) * | 2022-09-29 | 2022-12-20 | 中国电信股份有限公司 | Multi-modal emotion analysis method, device, equipment and storage based on gradient adjustment |
CN115496226B (en) * | 2022-09-29 | 2024-08-09 | 中国电信股份有限公司 | Multi-modal emotion analysis method, device, equipment and storage based on gradient adjustment |
Also Published As
Publication number | Publication date |
---|---|
CN108985358B (en) | 2021-03-02 |
Similar Documents
Publication | Title | Publication Date
---|---|---|
CN108985358A (en) | Emotion identification method, apparatus, equipment and storage medium | |
JP7432556B2 (en) | Methods, devices, equipment and media for man-machine interaction | |
Zhao et al. | Exploring spatio-temporal representations by integrating attention-based bidirectional-LSTM-RNNs and FCNs for speech emotion recognition | |
CN107657017B (en) | Method and apparatus for providing voice service | |
CN109003624B (en) | Emotion recognition method and device, computer equipment and storage medium | |
CN114694076A (en) | Multi-modal emotion analysis method based on multi-task learning and stacked cross-modal fusion | |
WO2020253509A1 (en) | Situation- and emotion-oriented chinese speech synthesis method, device, and storage medium | |
CN108326855A (en) | A kind of exchange method of robot, device, equipment and storage medium | |
CN109036405A (en) | Voice interactive method, device, equipment and storage medium | |
CN108922564B (en) | Emotion recognition method and device, computer equipment and storage medium | |
CN111862977A (en) | Voice conversation processing method and system | |
US10956480B2 (en) | System and method for generating dialogue graphs | |
CN107481720A (en) | A kind of explicit method for recognizing sound-groove and device | |
CN110262665A (en) | Method and apparatus for output information | |
CN109034203A (en) | Training, expression recommended method, device, equipment and the medium of expression recommended models | |
CN112527962A (en) | Intelligent response method and device based on multi-mode fusion, machine readable medium and equipment | |
CN114880441A (en) | Visual content generation method, device, system, equipment and medium | |
CN112765971B (en) | Text-to-speech conversion method and device, electronic equipment and storage medium | |
CN111159358A (en) | Multi-intention recognition training and using method and device | |
CN115577161A (en) | Multi-mode emotion analysis model fusing emotion resources | |
US20200234181A1 (en) | Implementing training of a machine learning model for embodied conversational agent | |
CN109408834A (en) | Auxiliary machinery interpretation method, device, equipment and storage medium | |
CN115222857A (en) | Method, apparatus, electronic device and computer readable medium for generating avatar | |
US10559298B2 (en) | Discussion model generation system and method | |
CN116403601A (en) | Emotion recognition model training method, emotion recognition device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||