CN108039181A - Method and device for analyzing the emotion information of a sound signal - Google Patents

Method and device for analyzing the emotion information of a sound signal

Info

Publication number
CN108039181A
Authority
CN
China
Prior art keywords
expressed
information
emotion
emotion information
voice signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711065483.3A
Other languages
Chinese (zh)
Other versions
CN108039181B (en)
Inventor
王富田
李健
张连毅
武卫东
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING INFOQUICK SINOVOICE SPEECH TECHNOLOGY CORP
Beijing Sinovoice Technology Co Ltd
Original Assignee
BEIJING INFOQUICK SINOVOICE SPEECH TECHNOLOGY CORP
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING INFOQUICK SINOVOICE SPEECH TECHNOLOGY CORP
Priority to CN201711065483.3A
Publication of CN108039181A
Application granted
Publication of CN108039181B
Legal status: Active
Anticipated expiration


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00
    • G10L25/48 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00, specially adapted for particular use
    • G10L25/51 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00, specially adapted for particular use, for comparison or discrimination
    • G10L25/63 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00-G10L21/00, specially adapted for particular use, for comparison or discrimination, for estimating an emotional state

Abstract

An embodiment of the present invention provides a method and device for analyzing the emotion information of a sound signal. When the emotion information expressed by a sound signal uttered by a user is analyzed, the method extracts the text information and speech parameter information from the sound signal; performs text emotion analysis on the text information to obtain the emotion information expressed by the text information, and performs speech emotion analysis on the speech parameter information to obtain the emotion information expressed by the speech parameters; and obtains the emotion information expressed by the sound signal from the emotion information expressed by the text information and the emotion information expressed by the speech parameter information. Embodiments of the present invention can improve the accuracy of determining the emotion information expressed by a sound signal.

Description

Method and device for analyzing the emotion information of a sound signal
Technical field
The present invention relates to the field of computer technology, and in particular to a method and device for analyzing the emotion information of a sound signal.
Background technology
People express various emotions when speaking, such as happiness, anger, shock, sadness, and neutrality.
With the rapid development of technology, intelligent voice interaction terminals are widely used, and more and more enterprises use them to provide services to users. To improve the quality of those services, an intelligent voice interaction terminal generally needs to analyze the emotion that the sound signal uttered by a user is meant to express.
In the prior art, an intelligent voice interaction terminal can analyze the emotion information expressed by the sound signal a user utters, for example by determining the user's emotion from the loudness, intonation, and speaking rate of the user's speech. For example, a user who is very angry may say "What you did makes people furious" loudly, quickly, and with a high intonation to express anger, and the intelligent voice interaction terminal infers from the loudness, speaking rate, and intonation of the utterance that the user is very angry.
However, the inventors found that if the user is very angry but says "What you did makes people furious" in a relatively calm tone, the loudness, intonation, and speaking rate of the utterance do not reach the standard for anger. The intelligent voice interaction terminal will therefore not classify the emotion information expressed by the sentence as anger and is likely to classify it as neutral instead. Such misjudgments make the determination of the emotion information expressed by the user's sound signal less accurate.
Summary of the invention
The technical problem to be solved by embodiments of the present invention is the low accuracy of determining the emotion information expressed by the sound signal a user utters.
To improve the accuracy of determining the emotion information expressed by the sound signal a user utters, embodiments of the present invention provide a method and device for analyzing the emotion of a sound signal.
In a first aspect, an embodiment of the present invention provides a method for analyzing the emotion of a sound signal, the method including:
extracting the text information and speech parameter information from a sound signal;
performing text emotion analysis on the text information to obtain the emotion information expressed by the text information;
performing speech emotion analysis on the speech parameter information to obtain the emotion information expressed by the speech parameter information;
obtaining the emotion information expressed by the sound signal from the emotion information expressed by the text information and the emotion information expressed by the speech parameter information.
Wherein, performing text emotion analysis on the text information to obtain the emotion information expressed by the text information includes:
performing text emotion analysis on the text information using an LSTM algorithm to obtain the probability value of each emotion expressed by the text information.
Wherein, performing speech emotion analysis on the speech parameter information to obtain the emotion information expressed by the speech parameter information includes:
performing speech emotion analysis on the speech parameters using a CNN algorithm to obtain the probability value of each emotion expressed by the speech parameters.
Wherein, obtaining the emotion information expressed by the sound signal from the emotion information expressed by the text information and the emotion information expressed by the speech parameter information includes:
for each emotion, calculating the combined probability value of that emotion for the sound signal from the probability value of the emotion expressed by the text information and the probability value of the emotion expressed by the speech parameter information;
determining the emotion with the highest combined probability value as the emotion information expressed by the sound signal.
Wherein, the calculation from the probability value of the emotion expressed by the text information and the probability value of the emotion expressed by the speech parameter information includes:
calculating a first product between the probability value of the emotion expressed by the text information and a preset text emotion coefficient;
calculating a second product between the probability value of the emotion expressed by the speech parameter information and a preset speech emotion coefficient;
calculating a third product between the first product and a preset matrix vector for the emotion;
calculating a fourth product between the second product and the preset matrix vector for the emotion;
obtaining the combined probability value of the emotion expressed by the sound signal from the third product and the fourth product.
In a second aspect, an embodiment of the present invention provides a device for analyzing the emotion information of a sound signal, the device including:
an extraction module for extracting the text information and speech parameter information from a sound signal;
a first analysis module for performing text emotion analysis on the text information to obtain the emotion information expressed by the text information;
a second analysis module for performing speech emotion analysis on the speech parameter information to obtain the emotion information expressed by the speech parameter information;
an acquisition module for obtaining the emotion information expressed by the sound signal from the emotion information expressed by the text information and the emotion information expressed by the speech parameter information.
Wherein, the first analysis module is specifically configured to perform text emotion analysis on the text information using an LSTM algorithm to obtain the probability value of each emotion expressed by the text information.
Wherein, the second analysis module is specifically configured to perform speech emotion analysis on the speech parameters using a CNN algorithm to obtain the probability value of each emotion expressed by the speech parameters.
Wherein, the acquisition module includes:
a computing unit for, for each emotion, calculating the combined probability value of that emotion for the sound signal from the probability value of the emotion expressed by the text information and the probability value of the emotion expressed by the speech parameter information;
a determination unit for determining the emotion with the highest combined probability value as the emotion information expressed by the sound signal.
Wherein, the computing unit includes:
a first computation subunit for calculating a first product between the probability value of the emotion expressed by the text information and a preset text emotion coefficient;
a second computation subunit for calculating a second product between the probability value of the emotion expressed by the speech parameter information and a preset speech emotion coefficient;
a third computation subunit for calculating a third product between the first product and a preset matrix vector for the emotion;
a fourth computation subunit for calculating a fourth product between the second product and the preset matrix vector for the emotion;
an acquisition subunit for obtaining the combined probability value of the emotion expressed by the sound signal from the third product and the fourth product.
Compared with the prior art, embodiments of the present invention have the following advantages:
In embodiments of the present invention, when analyzing the emotion information expressed by a sound signal uttered by a user, the text information and speech parameter information are extracted from the sound signal; text emotion analysis is performed on the text information to obtain the emotion information expressed by the text information, and speech emotion analysis is performed on the speech parameter information to obtain the emotion information expressed by the speech parameters; and the emotion information expressed by the sound signal is obtained from the emotion information expressed by the text information and the emotion information expressed by the speech parameter information.
When determining the emotion information expressed by a sound signal, the prior art relies only on the loudness, intonation, and speaking rate in the sound signal, whereas embodiments of the present invention determine it from both the text information and the speech parameter information in the sound signal.
Compared with the prior art, embodiments of the present invention combine the text information with the speech parameter information and therefore analyze the emotion information expressed by the sound signal more comprehensively. This avoids the misjudgments that occur in the prior art and thus improves the accuracy of determining the emotion information expressed by a sound signal.
Brief description of the drawings
Fig. 1 is a flowchart of the steps of an embodiment of a method for analyzing the emotion information of a sound signal according to the present invention;
Fig. 2 is a structural diagram of an embodiment of a device for analyzing the emotion information of a sound signal according to the present invention.
Detailed description
To make the above objectives, features, and advantages of the present invention clearer and easier to understand, the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments.
Referring to Fig. 1, a flowchart of the steps of an embodiment of a method for analyzing the emotion information of a sound signal according to the present invention is shown. The method may specifically include the following steps:
Step S101: extract the text information and speech parameter information from the sound signal.
In embodiments of the present invention, a DNN (Deep Neural Network) algorithm may be used to extract the text information and speech parameter information from the sound signal; alternatively, an LSTM (Long Short-Term Memory) algorithm and a CTC (Connectionist Temporal Classification) model may be used to extract them.
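As an illustration of the LSTM-plus-CTC option, the following is a minimal PyTorch sketch of an acoustic model trained with a CTC objective. The feature dimensions, vocabulary size, and batch shapes are all illustrative assumptions, not part of the patent.

```python
# A minimal sketch of LSTM + CTC transcription for step S101. All dimensions,
# the vocabulary, and the feature pipeline are illustrative assumptions.
import torch
import torch.nn as nn

class CTCTranscriber(nn.Module):
    def __init__(self, n_feats=40, n_hidden=256, n_tokens=5000):
        super().__init__()
        # Bidirectional LSTM over acoustic feature frames
        self.lstm = nn.LSTM(n_feats, n_hidden, num_layers=2,
                            batch_first=True, bidirectional=True)
        # Project to token logits; index 0 is reserved for the CTC blank
        self.proj = nn.Linear(2 * n_hidden, n_tokens + 1)

    def forward(self, feats):                  # feats: (batch, time, n_feats)
        out, _ = self.lstm(feats)
        return self.proj(out).log_softmax(-1)  # (batch, time, n_tokens + 1)

model = CTCTranscriber()
ctc_loss = nn.CTCLoss(blank=0, zero_infinity=True)

feats = torch.randn(8, 200, 40)               # a dummy batch of feature frames
log_probs = model(feats).transpose(0, 1)      # CTCLoss wants (time, batch, C)
targets = torch.randint(1, 5001, (8, 20))     # dummy token transcripts
loss = ctc_loss(log_probs, targets,
                torch.full((8,), 200), torch.full((8,), 20))
```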
The text information is the content expressed by the sound signal. For example, if a user says "You make me so angry", those words can be the text information of the sound signal.
The speech parameter information includes the speaking rate, signal-to-noise ratio, loudness, tone, average pitch, pitch range, and pitch variation of the sound signal.
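The patent does not specify how these parameters are computed. The numpy sketch below shows one plausible way to derive a few of them (loudness, average pitch, pitch range, pitch variation) from raw samples; the frame sizes, the 16 kHz sampling rate, and the simple autocorrelation pitch tracker are assumptions.

```python
# A rough sketch of extracting a few of the speech parameters listed above.
import numpy as np

def frame_signal(x, frame_len=400, hop=160):           # 25 ms / 10 ms at 16 kHz
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n)])

def pitch_autocorr(frame, sr=16000, fmin=60.0, fmax=400.0):
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)
    lag = lo + np.argmax(ac[lo:hi])
    return sr / lag if ac[lag] > 0.3 * ac[0] else 0.0   # 0.0 = unvoiced frame

def speech_parameters(x, sr=16000):
    frames = frame_signal(x)
    rms = np.sqrt((frames ** 2).mean(axis=1))           # per-frame loudness
    f0 = np.array([pitch_autocorr(f, sr) for f in frames])
    voiced = f0[f0 > 0]
    return {
        "loudness": float(rms.mean()),
        "average_pitch": float(voiced.mean()) if voiced.size else 0.0,
        "pitch_range": float(np.ptp(voiced)) if voiced.size else 0.0,
        "pitch_variation": float(voiced.std()) if voiced.size else 0.0,
    }
```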
In embodiments of the present invention, after a sound signal is emitted through the user's mouth and nose, its strength at some frequencies is attenuated; for example, the signal strength at high frequencies drops below that at low frequencies. This distorts the sound signal and in turn reduces the accuracy of the emotion information determined from it. Therefore, to improve that accuracy, the signal strength of the sound signal needs to be measured at each frequency, and where the strength is found to be low the signal strength at those frequencies should be boosted.
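One standard way to perform this kind of compensation is a first-order pre-emphasis filter, sketched below. The patent does not name a specific technique, and the coefficient 0.97 is a conventional choice assumed here for illustration.

```python
# A minimal pre-emphasis sketch, assuming the conventional coefficient 0.97.
import numpy as np

def pre_emphasis(x, alpha=0.97):
    # y[n] = x[n] - alpha * x[n-1]: flattens the spectral tilt so the
    # attenuated high-frequency band is no longer under-represented
    return np.append(x[0], x[1:] - alpha * x[:-1])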
In another embodiment of the present invention, the sound signal needs to be split over time into multiple short segments, and short-time signal strength analysis, short-time zero-crossing analysis, short-time average signal strength analysis, correlation analysis, and average signal strength difference analysis are performed on each segment, in order to identify the unvoiced and voiced parts of the sound signal so that its speech parameter information can be extracted afterwards.
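A sketch of two of these short-time analyses follows, assuming 25 ms frames with a 10 ms hop and toy thresholds: voiced speech tends to show high short-time energy and a low zero-crossing rate, while unvoiced speech shows the opposite.

```python
# Per-frame short-time energy and zero-crossing rate, with a toy
# voiced/unvoiced/silence rule; the thresholds are assumptions.
import numpy as np

def short_time_features(x, frame_len=400, hop=160):
    n = 1 + max(0, (len(x) - frame_len) // hop)
    frames = np.stack([x[i * hop:i * hop + frame_len] for i in range(n)])
    energy = (frames ** 2).mean(axis=1)                     # short-time energy
    zcr = (np.abs(np.diff(np.sign(frames), axis=1)) > 0).mean(axis=1)
    return energy, zcr

def voiced_unvoiced(energy, zcr, z_thresh=0.25):
    e_thresh = 0.5 * energy.mean()
    return np.where((energy > e_thresh) & (zcr < z_thresh), "voiced",
           np.where(energy > 0.1 * e_thresh, "unvoiced", "silence"))
```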
Moreover, the environment in which the user speaks usually contains noise, and whereas the noise is typically present all the time, the sound signal is not. It is therefore necessary to detect whether a sound signal is present, and for this, methods such as the double-threshold decision method can be used to detect the start point and end point of the sound signal and so delimit it. This avoids processing the noise mixed around the sound signal, which reduces the amount of data processed and the processing time, and it also prevents that noise from influencing the emotion analysis of the sound signal, improving the accuracy of the analysis result.
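A compact sketch of a double-threshold endpoint detector over the per-frame energies from the previous snippet: a high threshold locates the region that is certainly speech, and a lower threshold extends it outward to the start and end points. The threshold ratios and the noise-floor estimate are assumptions.

```python
# Double-threshold endpoint detection over per-frame energy; the ratios
# and the median noise-floor estimate are illustrative assumptions.
import numpy as np

def detect_endpoints(energy, high_ratio=4.0, low_ratio=1.5):
    noise_floor = np.median(energy)            # assume noise dominates the clip
    high, low = high_ratio * noise_floor, low_ratio * noise_floor
    above = np.flatnonzero(energy > high)
    if above.size == 0:
        return None                            # no sound signal detected
    start, end = above[0], above[-1]
    while start > 0 and energy[start - 1] > low:            # extend left
        start -= 1
    while end < len(energy) - 1 and energy[end + 1] > low:  # extend right
        end += 1
    return start, end                          # frame indices of the signal
```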
Step S102: perform text emotion analysis on the text information to obtain the emotion information expressed by the text information.
In embodiments of the present invention, an LSTM algorithm can be used to perform emotion analysis on the text information, yielding the probability value of each emotion expressed by the text information; these probability values serve as the emotion information expressed by the text information.
Of course, when performing text emotion analysis on the text information, embodiments of the present invention may also use other text emotion analysis methods; the text emotion analysis method used is not limited.
In embodiments of the present invention, technical staff can preset multiple emotions locally in advance, for example happy, angry, shocked, sad, worried, and neutral. After the text information is analyzed, the probability values that the text information expresses anger, happiness, shock, sadness, worry, and neutrality, respectively, can be obtained.
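A minimal sketch of an LSTM text-emotion classifier of the kind step S102 describes, producing one probability value per preset emotion; the vocabulary size, layer dimensions, and the six labels taken from the example above are assumptions.

```python
# A toy LSTM text-emotion analyzer: embed tokens, run an LSTM, and emit a
# probability for each preset emotion. All sizes are assumptions.
import torch
import torch.nn as nn

EMOTIONS = ["happy", "angry", "shocked", "sad", "worried", "neutral"]

class TextEmotionLSTM(nn.Module):
    def __init__(self, vocab_size=30000, emb_dim=128, hidden=128):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, len(EMOTIONS))

    def forward(self, token_ids):                   # (batch, seq_len)
        out, _ = self.lstm(self.emb(token_ids))
        return self.head(out[:, -1]).softmax(-1)    # (batch, 6) probabilities

model = TextEmotionLSTM()
tokens = torch.randint(0, 30000, (1, 8))            # dummy tokenized transcript
p_text = dict(zip(EMOTIONS, model(tokens)[0].tolist()))
```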
Step S103: perform speech emotion analysis on the speech parameter information to obtain the emotion information expressed by the speech parameters.
In embodiments of the present invention, a CNN (Convolutional Neural Network) algorithm is used to perform speech emotion analysis on the speech parameters, yielding the probability value of each emotion expressed by the speech parameters; these probability values serve as the emotion information expressed by the speech parameter information.
For example, the probability values that the speech parameters express anger, happiness, shock, sadness, worry, and neutrality, respectively, are obtained.
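A matching sketch for step S103: a small 1-D CNN over frame-level speech-parameter tracks that outputs the same six probability values. The number of input parameter tracks, channel counts, and kernel sizes are assumptions.

```python
# A toy 1-D CNN speech-emotion analyzer over frame-level parameter tracks
# (e.g. energy, pitch, ZCR); the architecture details are assumptions.
import torch
import torch.nn as nn

class SpeechEmotionCNN(nn.Module):
    def __init__(self, n_params=8, n_emotions=6):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_params, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),                # pool over time
        )
        self.head = nn.Linear(64, n_emotions)

    def forward(self, params):                      # (batch, n_params, time)
        return self.head(self.net(params).squeeze(-1)).softmax(-1)

model = SpeechEmotionCNN()
p_speech = model(torch.randn(1, 8, 200))[0]         # six probability values
```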
Of course, when performing speech emotion analysis on the speech parameters, embodiments of the present invention may also use other speech emotion analysis methods; the speech emotion analysis method used is not limited.
Step S104: obtain the emotion information expressed by the sound signal from the emotion information expressed by the text information and the emotion information expressed by the speech parameter information.
In embodiments of the present invention, for any one of the preset emotions, the combined probability value of that emotion for the sound signal can be calculated from the probability value of the emotion expressed by the text information and the probability value of the emotion expressed by the speech parameter information. Performing the same operation for each of the other preset emotions yields the combined probability value of every emotion for the sound signal. The emotion with the highest combined probability value is then determined as the emotion information expressed by the sound signal.
The specific calculation of the combined probability value of an emotion for the sound signal, from the probability value of the emotion expressed by the text information and the probability value of the emotion expressed by the speech parameter information, can be realized by the following procedure:
calculate a first product between the probability value of the emotion expressed by the text information and a preset text emotion coefficient; calculate a second product between the probability value of the emotion expressed by the speech parameter information and a preset speech emotion coefficient; calculate a third product between the first product and a preset matrix vector for the emotion; calculate a fourth product between the second product and the preset matrix vector for the emotion; and obtain the combined probability value of the emotion expressed by the sound signal from the third product and the fourth product, for example by feeding the third and fourth products into a tanh function.
In embodiments of the present invention, the preset speech emotion coefficient may be the same as or different from the preset text emotion coefficient.
Technical staff can analyze a large number of sound signals expressing user emotion in advance and measure the weight with which the text information and the speech parameters each convey emotion. If the weight carried by the text information exceeds the weight carried by the speech parameter information, the preset text emotion coefficient can be set greater than the preset speech emotion coefficient; if it is smaller, the preset text emotion coefficient can be set smaller than the preset speech emotion coefficient; and if the two weights are equal, the coefficients can be set equal. The chosen preset text emotion coefficient and preset speech emotion coefficient are then stored locally, so that in step S104 they can be read directly and used to calculate the first, second, third, and fourth products and obtain the combined probability value of the emotion expressed by the sound signal, for example by feeding the third and fourth products into a tanh function. A sketch of this fusion follows.
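Here is a sketch of the fusion under one reading of the formula above, where each emotion's preset "matrix vector" is taken as a scalar for simplicity; the coefficient values and the example probabilities are assumptions. Note how an angry transcript outweighs a calm voice, which is exactly the misjudgment the invention is designed to avoid.

```python
# One reading of the step S104 fusion: scale the two per-emotion probabilities
# by the preset coefficients, multiply each by the emotion's preset matrix
# vector (a scalar here), and feed the sum through tanh. Values are assumed.
import numpy as np

EMOTIONS = ["happy", "angry", "shocked", "sad", "worried", "neutral"]
TEXT_COEF, SPEECH_COEF = 0.6, 0.4          # preset emotion coefficients
MATRIX_VEC = {e: 1.0 for e in EMOTIONS}    # preset per-emotion matrix vectors

def combined_probabilities(p_text, p_speech):
    combined = {}
    for e in EMOTIONS:
        first = p_text[e] * TEXT_COEF                 # first product
        second = p_speech[e] * SPEECH_COEF            # second product
        third = first * MATRIX_VEC[e]                 # third product
        fourth = second * MATRIX_VEC[e]               # fourth product
        combined[e] = np.tanh(third + fourth)         # combined probability
    return combined

p_text = {"happy": .05, "angry": .70, "shocked": .05,
          "sad": .10, "worried": .05, "neutral": .05}
p_speech = {"happy": .10, "angry": .20, "shocked": .05,
            "sad": .10, "worried": .05, "neutral": .50}
scores = combined_probabilities(p_text, p_speech)
emotion = max(scores, key=scores.get)      # "angry": text overrides calm voice
```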
In embodiments of the present invention, when analyzing the emotion information expressed by a sound signal uttered by a user, the text information and speech parameter information are extracted from the sound signal; text emotion analysis is performed on the text information to obtain the emotion information expressed by the text information, and speech emotion analysis is performed on the speech parameter information to obtain the emotion information expressed by the speech parameters; and the emotion information expressed by the sound signal is obtained from the emotion information expressed by the text information and the emotion information expressed by the speech parameter information.
When determining the emotion information expressed by a sound signal, the prior art relies only on the loudness, intonation, and speaking rate in the sound signal, whereas embodiments of the present invention determine it from both the text information and the speech parameter information in the sound signal.
Compared with the prior art, embodiments of the present invention combine the text information with the speech parameter information and therefore analyze the emotion information expressed by the sound signal more comprehensively. This avoids the misjudgments that occur in the prior art and thus improves the accuracy of determining the emotion information expressed by a sound signal.
It should be noted that the method embodiments are described as a series of action combinations for brevity of description, but those skilled in the art should know that embodiments of the present invention are not limited by the described order of actions, because according to embodiments of the present invention some steps can be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in this specification are preferred embodiments, and the actions involved are not necessarily required by embodiments of the present invention.
Referring to Fig. 2, a structural diagram of an embodiment of a device for analyzing the emotion information of a sound signal according to the present invention is shown. The device may specifically include the following modules:
an extraction module 11 for extracting the text information and speech parameter information from a sound signal;
a first analysis module 12 for performing text emotion analysis on the text information to obtain the emotion information expressed by the text information;
a second analysis module 13 for performing speech emotion analysis on the speech parameter information to obtain the emotion information expressed by the speech parameter information;
an acquisition module 14 for obtaining the emotion information expressed by the sound signal from the emotion information expressed by the text information and the emotion information expressed by the speech parameter information.
Wherein, the first analysis module 12 is specifically configured to perform text emotion analysis on the text information using a Long Short-Term Memory (LSTM) algorithm to obtain the probability value of each emotion expressed by the text information.
Wherein, the second analysis module 13 is specifically configured to perform speech emotion analysis on the speech parameters using a Convolutional Neural Network (CNN) algorithm to obtain the probability value of each emotion expressed by the speech parameters.
Wherein, the acquisition module 14 includes:
a computing unit for, for each emotion, calculating the combined probability value of that emotion for the sound signal from the probability value of the emotion expressed by the text information and the probability value of the emotion expressed by the speech parameter information;
a determination unit for determining the emotion with the highest combined probability value as the emotion information expressed by the sound signal.
Wherein, the computing unit includes:
a first computation subunit for calculating a first product between the probability value of the emotion expressed by the text information and a preset text emotion coefficient;
a second computation subunit for calculating a second product between the probability value of the emotion expressed by the speech parameter information and a preset speech emotion coefficient;
a third computation subunit for calculating a third product between the first product and a preset matrix vector for the emotion;
a fourth computation subunit for calculating a fourth product between the second product and the preset matrix vector for the emotion;
an acquisition subunit for obtaining the combined probability value of the emotion expressed by the sound signal from the third product and the fourth product.
In embodiments of the present invention, when analyzing the emotion information expressed by a sound signal uttered by a user, the text information and speech parameter information are extracted from the sound signal; text emotion analysis is performed on the text information to obtain the emotion information expressed by the text information, and speech emotion analysis is performed on the speech parameter information to obtain the emotion information expressed by the speech parameters; and the emotion information expressed by the sound signal is obtained from the emotion information expressed by the text information and the emotion information expressed by the speech parameter information.
When determining the emotion information expressed by a sound signal, the prior art relies only on the loudness, intonation, and speaking rate in the sound signal, whereas embodiments of the present invention determine it from both the text information and the speech parameter information in the sound signal.
Compared with the prior art, embodiments of the present invention combine the text information with the speech parameter information and therefore analyze the emotion information expressed by the sound signal more comprehensively. This avoids the misjudgments that occur in the prior art and thus improves the accuracy of determining the emotion information expressed by a sound signal.
As the device embodiments are substantially similar to the method embodiments, their description is relatively brief; for the relevant details, refer to the description of the method embodiments.
The embodiments in this specification are described progressively. Each embodiment focuses on its differences from the other embodiments, and for the parts that are identical or similar between embodiments, the embodiments can be referred to one another.
Those skilled in the art should understand that embodiments of the present invention may be provided as a method, a device, or a computer program product. Therefore, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Moreover, embodiments of the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
Embodiments of the present invention are described with reference to flowcharts and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to the processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing terminal device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing terminal device produce a device for realizing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or another programmable data processing terminal device to work in a specific way, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction device that realizes the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be loaded onto a computer or another programmable data processing terminal device, so that a series of operation steps is performed on the computer or other programmable terminal device to produce computer-implemented processing; the instructions executed on the computer or other programmable terminal device thus provide steps for realizing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
Although preferred embodiments of the present invention have been described, those skilled in the art can make additional changes and modifications to these embodiments once they learn the basic inventive concept. The appended claims are therefore intended to be construed as including the preferred embodiments and all changes and modifications falling within the scope of the embodiments of the present invention.
Finally, it should also be noted that in this document relational terms such as "first" and "second" are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include", and any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or terminal device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or terminal device. In the absence of further limitations, an element defined by the phrase "including a ..." does not exclude the presence of other identical elements in the process, method, article, or terminal device that includes the element.
The method and device for analyzing the emotion information of a sound signal provided by the present invention have been introduced in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is intended only to help understand the method of the present invention and its core idea. Meanwhile, for a person of ordinary skill in the art, there will be changes in the specific implementation and application scope according to the idea of the present invention. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

  1. A method for analyzing the emotion information of a sound signal, characterized in that the method includes:
    extracting the text information and speech parameter information from a sound signal;
    performing text emotion analysis on the text information to obtain the emotion information expressed by the text information;
    performing speech emotion analysis on the speech parameter information to obtain the emotion information expressed by the speech parameter information;
    obtaining the emotion information expressed by the sound signal from the emotion information expressed by the text information and the emotion information expressed by the speech parameter information.
  2. The method according to claim 1, characterized in that performing text emotion analysis on the text information to obtain the emotion information expressed by the text information includes:
    performing text emotion analysis on the text information using a Long Short-Term Memory (LSTM) algorithm to obtain the probability value of each emotion expressed by the text information.
  3. The method according to claim 2, characterized in that performing speech emotion analysis on the speech parameter information to obtain the emotion information expressed by the speech parameter information includes:
    performing speech emotion analysis on the speech parameters using a Convolutional Neural Network (CNN) algorithm to obtain the probability value of each emotion expressed by the speech parameters.
  4. The method according to claim 3, characterized in that obtaining the emotion information expressed by the sound signal from the emotion information expressed by the text information and the emotion information expressed by the speech parameter information includes:
    for each emotion, calculating the combined probability value of that emotion for the sound signal from the probability value of the emotion expressed by the text information and the probability value of the emotion expressed by the speech parameter information;
    determining the emotion with the highest combined probability value as the emotion information expressed by the sound signal.
  5. The method according to claim 4, characterized in that the calculation from the probability value of the emotion expressed by the text information and the probability value of the emotion expressed by the speech parameter information includes:
    calculating a first product between the probability value of the emotion expressed by the text information and a preset text emotion coefficient;
    calculating a second product between the probability value of the emotion expressed by the speech parameter information and a preset speech emotion coefficient;
    calculating a third product between the first product and a preset matrix vector for the emotion;
    calculating a fourth product between the second product and the preset matrix vector for the emotion;
    obtaining the combined probability value of the emotion expressed by the sound signal from the third product and the fourth product.
  6. A device for analyzing the emotion information of a sound signal, characterized in that the device includes:
    an extraction module for extracting the text information and speech parameter information from a sound signal;
    a first analysis module for performing text emotion analysis on the text information to obtain the emotion information expressed by the text information;
    a second analysis module for performing speech emotion analysis on the speech parameter information to obtain the emotion information expressed by the speech parameter information;
    an acquisition module for obtaining the emotion information expressed by the sound signal from the emotion information expressed by the text information and the emotion information expressed by the speech parameter information.
  7. The device according to claim 6, characterized in that the first analysis module is specifically configured to perform text emotion analysis on the text information using a Long Short-Term Memory (LSTM) algorithm to obtain the probability value of each emotion expressed by the text information.
  8. The device according to claim 7, characterized in that the second analysis module is specifically configured to perform speech emotion analysis on the speech parameters using a Convolutional Neural Network (CNN) algorithm to obtain the probability value of each emotion expressed by the speech parameters.
  9. The device according to claim 8, characterized in that the acquisition module includes:
    a computing unit for, for each emotion, calculating the combined probability value of that emotion for the sound signal from the probability value of the emotion expressed by the text information and the probability value of the emotion expressed by the speech parameter information;
    a determination unit for determining the emotion with the highest combined probability value as the emotion information expressed by the sound signal.
  10. The device according to claim 9, characterized in that the computing unit includes:
    a first computation subunit for calculating a first product between the probability value of the emotion expressed by the text information and a preset text emotion coefficient;
    a second computation subunit for calculating a second product between the probability value of the emotion expressed by the speech parameter information and a preset speech emotion coefficient;
    a third computation subunit for calculating a third product between the first product and a preset matrix vector for the emotion;
    a fourth computation subunit for calculating a fourth product between the second product and the preset matrix vector for the emotion;
    an acquisition subunit for obtaining the combined probability value of the emotion expressed by the sound signal from the third product and the fourth product.
CN201711065483.3A 2017-11-02 2017-11-02 Method and device for analyzing emotion information of sound signal Active CN108039181B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711065483.3A CN108039181B (en) 2017-11-02 2017-11-02 Method and device for analyzing emotion information of sound signal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711065483.3A CN108039181B (en) 2017-11-02 2017-11-02 Method and device for analyzing emotion information of sound signal

Publications (2)

Publication Number Publication Date
CN108039181A true CN108039181A (en) 2018-05-15
CN108039181B CN108039181B (en) 2021-02-12

Family

ID=62092727

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711065483.3A Active CN108039181B (en) 2017-11-02 2017-11-02 Method and device for analyzing emotion information of sound signal

Country Status (1)

Country Link
CN (1) CN108039181B (en)

Patent Citations (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100145695A1 (en) * 2008-12-08 2010-06-10 Electronics And Telecommunications Research Institute Apparatus for context awareness and method using the same
CN102623009A (en) * 2012-03-02 2012-08-01 安徽科大讯飞信息技术股份有限公司 Abnormal emotion automatic detection and extraction method and system on basis of short-time analysis
US20130268273A1 (en) * 2012-04-10 2013-10-10 Oscal Tzyh-Chiang Chen Method of recognizing gender or age of a speaker according to speech emotion or arousal
CN102819744A (en) * 2012-06-29 2012-12-12 北京理工大学 Emotion recognition method with information of two channels fused
CN103456314A (en) * 2013-09-03 2013-12-18 广州创维平面显示科技有限公司 Emotion recognition method and device
CN103810994A (en) * 2013-09-05 2014-05-21 江苏大学 Method and system for voice emotion inference on basis of emotion context
CN104200804A (en) * 2014-09-19 2014-12-10 合肥工业大学 Various-information coupling emotion recognition method for human-computer interaction
CN105427869A (en) * 2015-11-02 2016-03-23 北京大学 Session emotion autoanalysis method based on depth learning
CN105334743A (en) * 2015-11-18 2016-02-17 深圳创维-Rgb电子有限公司 Intelligent home control method and system based on emotion recognition
US20170278067A1 (en) * 2016-03-25 2017-09-28 International Business Machines Corporation Monitoring activity to detect potential user actions
CN105976809A (en) * 2016-05-25 2016-09-28 中国地质大学(武汉) Voice-and-facial-expression-based identification method and system for dual-modal emotion fusion
CN106128479A (en) * 2016-06-30 2016-11-16 福建星网视易信息系统有限公司 A kind of performance emotion identification method and device
CN106297783A (en) * 2016-08-05 2017-01-04 易晓阳 A kind of interactive voice identification intelligent terminal
CN106297826A (en) * 2016-08-18 2017-01-04 竹间智能科技(上海)有限公司 Speech emotional identification system and method
CN107038154A (en) * 2016-11-25 2017-08-11 阿里巴巴集团控股有限公司 A kind of text emotion recognition methods and device
CN106782602A (en) * 2016-12-01 2017-05-31 南京邮电大学 Speech-emotion recognition method based on length time memory network and convolutional neural networks
CN106598948A (en) * 2016-12-19 2017-04-26 杭州语忆科技有限公司 Emotion recognition method based on long-term and short-term memory neural network and by combination with autocoder
CN106782615A (en) * 2016-12-20 2017-05-31 科大讯飞股份有限公司 Speech data emotion detection method and apparatus and system
CN106847309A (en) * 2017-01-09 2017-06-13 华南理工大学 A kind of speech-emotion recognition method
CN107247702A (en) * 2017-05-05 2017-10-13 桂林电子科技大学 A kind of text emotion analysis and processing method and system
CN107272607A (en) * 2017-05-11 2017-10-20 上海斐讯数据通信技术有限公司 A kind of intelligent home control system and method
CN107291696A (en) * 2017-06-28 2017-10-24 达而观信息科技(上海)有限公司 A kind of comment word sentiment analysis method and system based on deep learning

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
徐莹莹, "Research on sentence-level text sentiment classification based on deep neural network models", China Master's Theses Full-text Database, Information Science and Technology *
曹宇慧, "Research on text sentiment analysis based on deep learning", China Master's Theses Full-text Database, Information Science and Technology *
蔡慧苹, "Sentiment analysis based on word embedding and CNN", Application Research of Computers *
谢坷珍, "Research on bimodal emotion recognition fusing facial expressions and speech", China Master's Theses Full-text Database, Information Science and Technology *
陈雷, "Research and implementation of a sentiment analysis system for stock comments", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109192225A (en) * 2018-09-28 2019-01-11 清华大学 Method and device for speech emotion recognition and annotation
CN109243492A (en) * 2018-10-28 2019-01-18 国家计算机网络与信息安全管理中心 Speech emotion recognition system and recognition method
CN110390956A (en) * 2019-08-15 2019-10-29 龙马智芯(珠海横琴)科技有限公司 Emotion recognition network model, method and electronic equipment
CN110675859A (en) * 2019-09-05 2020-01-10 华南理工大学 Multi-emotion recognition method, system, medium, and apparatus combining speech and text
CN110675859B (en) * 2019-09-05 2021-11-23 华南理工大学 Multi-emotion recognition method, system, medium, and apparatus combining speech and text
CN110570879A (en) * 2019-09-11 2019-12-13 深圳壹账通智能科技有限公司 Intelligent conversation method and device based on emotion recognition and computer equipment
WO2021047180A1 (en) * 2019-09-11 2021-03-18 深圳壹账通智能科技有限公司 Emotion recognition-based smart chat method, device, and computer apparatus
WO2021139108A1 (en) * 2020-01-10 2021-07-15 平安科技(深圳)有限公司 Intelligent emotion recognition method and apparatus, electronic device, and storage medium

Also Published As

Publication number Publication date
CN108039181B (en) 2021-02-12

Similar Documents

Publication Publication Date Title
CN108039181A (en) The emotion information analysis method and device of a kind of voice signal
CN108630193B (en) Voice recognition method and device
JP6755304B2 (en) Information processing device
WO2021128741A1 (en) Voice emotion fluctuation analysis method and apparatus, and computer device and storage medium
KR102413692B1 (en) Apparatus and method for caculating acoustic score for speech recognition, speech recognition apparatus and method, and electronic device
CN109545192B (en) Method and apparatus for generating a model
CN109545193B (en) Method and apparatus for generating a model
WO2020253128A1 (en) Voice recognition-based communication service method, apparatus, computer device, and storage medium
CN112581938B (en) Speech breakpoint detection method, device and equipment based on artificial intelligence
CN110570853A (en) Intention recognition method and device based on voice data
CN110992942B (en) Voice recognition method and device for voice recognition
CN107919137A (en) The long-range measures and procedures for the examination and approval, device, equipment and readable storage medium storing program for executing
US10971149B2 (en) Voice interaction system for interaction with a user by voice, voice interaction method, and program
CN112992191B (en) Voice endpoint detection method and device, electronic equipment and readable storage medium
CN114127849A (en) Speech emotion recognition method and device
CN109994126A (en) Audio message segmentation method, device, storage medium and electronic equipment
CN113096647B (en) Voice model training method and device and electronic equipment
CN109215647A (en) Voice awakening method, electronic equipment and non-transient computer readable storage medium
CN108877779B (en) Method and device for detecting voice tail point
CN111667834B (en) Hearing-aid equipment and hearing-aid method
CN112735385A (en) Voice endpoint detection method and device, computer equipment and storage medium
CN107910021A (en) A kind of symbol insertion method and device
US11551707B2 (en) Speech processing method, information device, and computer program product
CN111414748A (en) Traffic data processing method and device
CN110444194A (en) A kind of speech detection method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant