CN108039181B - Method and device for analyzing emotion information of sound signal - Google Patents

Method and device for analyzing emotion information of sound signal

Info

Publication number
CN108039181B
CN108039181B · CN201711065483.3A
Authority
CN
China
Prior art keywords: information, emotion, expressed, text, sound signal
Prior art date
Legal status: Active
Application number
CN201711065483.3A
Other languages
Chinese (zh)
Other versions
CN108039181A (en)
Inventor
王富田
李健
张连毅
武卫东
Current Assignee
Beijing Sinovoice Technology Co Ltd
Original Assignee
Beijing Sinovoice Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Sinovoice Technology Co Ltd filed Critical Beijing Sinovoice Technology Co Ltd
Priority to CN201711065483.3A
Publication of CN108039181A
Application granted
Publication of CN108039181B
Status: Active

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/63 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state

Abstract

The embodiment of the invention provides a method and a device for analyzing emotion information of a sound signal, wherein the method comprises the steps of extracting text information and voice parameter information in the sound signal when analyzing emotion information expressed by the sound signal sent by a user; performing text emotion analysis on the text information to obtain emotion information expressed by the text information, and performing voice emotion analysis on the voice parameter information to obtain emotion information expressed by the voice parameter; and acquiring the expressed emotion information of the sound signal according to the emotion information expressed by the text information and the emotion information expressed by the voice parameter information. The embodiment of the invention can improve the accuracy of determining the emotion information expressed by the sound signal.

Description

Method and device for analyzing emotion information of sound signal
Technical Field
The invention relates to the technical field of computers, in particular to a method and a device for analyzing emotion information of a sound signal.
Background
When a person speaks, a variety of emotional information may be expressed, such as happy, angry, frightened, sad, neutral, and so on.
With the rapid development of technology, intelligent voice interaction terminals have come into wide use, and more and more enterprises use them to provide services to users. To improve service quality, an intelligent voice interaction terminal often needs to analyze the emotion expressed by the sound signal uttered by the user.
In the prior art, the intelligent voice interaction terminal may analyze the emotion information expressed by a sound signal uttered by a user according to the sound signal itself, for example, by inferring the emotion from the loudness, intonation, speech rate, and the like of the user's voice. For example, when the user is angry and says "what you are doing really makes people angry" loudly, quickly, and in a raised intonation to express anger, the intelligent voice interaction terminal concludes from the loudness, speech rate, and intonation of that sentence that the user is angry.
However, the inventors found that if the user is angry but says "what you are doing really makes people angry" in a calm voice, the loudness, intonation, and speech rate of the utterance do not meet the criteria for anger, so the intelligent voice interaction terminal will not classify the emotion expressed by the sentence as angry and will likely classify it as neutral instead. Such misjudgments result in a low accuracy in determining the emotion information expressed by the sound signal uttered by the user.
Disclosure of Invention
The technical problem to be solved by the embodiments of the invention is the low accuracy of determining the emotion information expressed by a sound signal uttered by a user.
In order to improve the accuracy of determining emotion information expressed by a sound signal emitted by a user, the embodiment of the invention provides an emotion analysis method and device for the sound signal.
In a first aspect, an embodiment of the present invention provides an emotion analysis method for a sound signal, where the method includes:
extracting text information and voice parameter information in the sound signal;
performing text emotion analysis on the text information to obtain emotion information expressed by the text information;
performing voice emotion analysis on the voice parameter information to obtain emotion information expressed by the voice parameter information;
and acquiring the emotion information expressed by the sound signal according to the emotion information expressed by the text information and the emotion information expressed by the voice parameter information.
The obtaining of emotion information expressed by the text information by performing text emotion analysis on the text information includes:
and carrying out text emotion analysis on the text information by using an LSTM algorithm to obtain probability values of all emotion information expressed by the text information.
Wherein, the obtaining the emotion information expressed by the voice parameter information by performing voice emotion analysis on the voice parameter information includes:
and performing voice emotion analysis on the voice parameters by using a CNN algorithm to obtain probability values of all emotion information expressed by the voice parameters.
The acquiring the emotion information expressed by the sound signal according to the emotion information expressed by the text information and the emotion information expressed by the voice parameter information includes:
for each emotional information, calculating a comprehensive probability value of the emotional information expressed by the sound signal according to the probability value of the emotional information expressed by the text information and the probability value of the emotional information expressed by the voice parameter information;
and determining the emotion information with the highest comprehensive probability value as the expressed emotion information of the sound signal.
Wherein, the calculating a comprehensive probability value of the emotion information expressed by the sound signal according to the probability value of the emotion information expressed by the text information and the probability value of the emotion information expressed by the voice parameter information includes:
calculating a first product between the probability value of the emotion information expressed by the text information and a preset text emotion coefficient;
calculating a second product between the probability value of the emotion information expressed by the voice parameter information and a preset voice emotion coefficient;
calculating a third product between the first product and a preset matrix vector of the emotion information;
calculating a fourth product between the second product and a preset matrix vector of the emotion information;
and acquiring the comprehensive probability value of the emotion expressed by the sound signal according to the third product and the fourth product.
In a second aspect, an embodiment of the present invention provides an apparatus for analyzing emotion information of a sound signal, where the apparatus includes:
the extraction module is used for extracting text information and voice parameter information in the sound signal;
the first analysis module is used for carrying out text emotion analysis on the text information to obtain emotion information expressed by the text information;
the second analysis module is used for carrying out voice emotion analysis on the voice parameter information to obtain emotion information expressed by the voice parameter information;
and the obtaining module is used for obtaining the emotion information expressed by the sound signal according to the emotion information expressed by the text information and the emotion information expressed by the voice parameter information.
Wherein the first analysis module is specifically configured to: and carrying out text emotion analysis on the text information by using an LSTM algorithm to obtain probability values of all emotion information expressed by the text information.
Wherein the second analysis module is specifically configured to: and performing voice emotion analysis on the voice parameters by using a CNN algorithm to obtain probability values of all emotion information expressed by the voice parameters.
Wherein the acquisition module comprises:
a calculating unit, configured to calculate, for each piece of emotion information, a comprehensive probability value of the emotion information expressed by the sound signal according to a probability value of the emotion information expressed by the text information and a probability value of the emotion information expressed by the speech parameter information;
and the determining unit is used for determining the emotion information with the highest comprehensive probability value as the expressed emotion information of the sound signal.
Wherein the calculation unit includes:
the first calculating subunit is used for calculating a first product between the probability value of the emotion information expressed by the text information and a preset text emotion coefficient;
the second calculating subunit is used for calculating a second product between the probability value of the emotion information expressed by the voice parameter information and a preset voice emotion coefficient;
the third calculation subunit is used for calculating a third product between the first product and a preset matrix vector of the emotion information;
the fourth calculating subunit is used for calculating a fourth product between the second product and the preset matrix vector of the emotion information;
and the obtaining subunit is used for obtaining the comprehensive probability value of the emotion information expressed by the sound signal according to the third product and the fourth product.
Compared with the prior art, the embodiment of the invention has the following advantages:
in the embodiment of the invention, when the emotion information expressed by a sound signal sent by a user is analyzed, text information and voice parameter information in the sound signal are extracted; performing text emotion analysis on the text information to obtain emotion information expressed by the text information, and performing voice emotion analysis on the voice parameter information to obtain emotion information expressed by the voice parameter; and acquiring the expressed emotion information of the sound signal according to the emotion information expressed by the text information and the emotion information expressed by the voice parameter information.
In determining the expressed emotion information of the sound signal, the prior art determines the expressed emotion information of the sound signal only according to the size, intonation and speech speed in the sound signal, and the embodiment of the invention determines the expressed emotion information of the sound signal according to the text information and the speech parameter information in the sound signal.
Compared with the prior art, the method and the device have the advantages that the emotion information expressed by the sound signal is analyzed more comprehensively by combining the text information besides the voice parameter information, so that the misjudgment condition in the prior art can be avoided, and the accuracy of determining the emotion information expressed by the sound signal can be improved.
Drawings
FIG. 1 is a flowchart illustrating the steps of an embodiment of a method for emotion information analysis of a sound signal according to the present invention;
fig. 2 is a block diagram showing an embodiment of an emotion information analyzing apparatus for audio signals according to the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in further detail below.
Referring to fig. 1, a flowchart illustrating steps of an embodiment of a method for analyzing emotion information of a sound signal according to the present invention is shown, which may specifically include the following steps:
in step S101, extracting text information and speech parameter information in the sound signal;
in the embodiment of the present invention, text information and speech parameter information in the sound signal may be extracted using a Deep Neural Network (DNN) algorithm, or using a Long Short-Term Memory (LSTM) algorithm together with a Connectionist Temporal Classification (CTC) model.
The text information includes the content expressed by the sound signal. For example, if the user says "what you are doing really makes people happy", that sentence may serve as the text information of the sound signal.
The voice parameter information includes the speech rate, signal-to-noise ratio, loudness, pitch, average pitch, pitch range, pitch variation, and the like of the sound signal.
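As an illustration of what such speech parameter information can look like in practice, the following sketch computes a few of the listed parameters (pitch statistics and a loudness proxy) with the librosa library. The use of librosa and this particular parameter set are assumptions made for illustration only; the patent does not prescribe any specific tool.

```python
import numpy as np
import librosa

def extract_speech_parameters(wav_path: str) -> dict:
    """Compute a few of the speech parameters listed above (pitch statistics, loudness proxy)."""
    y, sr = librosa.load(wav_path, sr=None, mono=True)

    # Frame-level fundamental frequency (pitch) estimated with the pYIN tracker.
    f0, voiced_flag, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
    )
    f0 = f0[~np.isnan(f0)]  # keep voiced frames only

    rms = librosa.feature.rms(y=y)[0]  # frame-level RMS energy as a loudness proxy

    return {
        "pitch_mean": float(f0.mean()) if f0.size else 0.0,
        "pitch_range": float(f0.max() - f0.min()) if f0.size else 0.0,
        "pitch_std": float(f0.std()) if f0.size else 0.0,
        "loudness_mean": float(rms.mean()),
    }
```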
In the embodiment of the present invention, after the sound signal leaves the user's mouth and nose, the signal strength at some frequencies is attenuated; for example, the strength at high frequencies may drop below that at low frequencies, which distorts the sound signal and can reduce the accuracy of the emotion information determined from it. Therefore, to improve the accuracy of determining the emotion information expressed by the sound signal, the signal strength of the sound signal at each frequency may be detected, and when the strength at certain frequencies is found to be low, the signal strength at those frequencies may be enhanced.
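One common way to boost attenuated high-frequency content is a first-order pre-emphasis filter. The sketch below is only an illustrative reading of the enhancement step described above, not the patented implementation; the coefficient 0.97 is an assumed, conventional value.

```python
import numpy as np

def pre_emphasis(signal: np.ndarray, coeff: float = 0.97) -> np.ndarray:
    """Boost high-frequency energy with a first-order pre-emphasis filter.

    Implements y[n] = x[n] - coeff * x[n-1]; the coefficient 0.97 is a
    conventional choice, not a value specified by the patent.
    """
    return np.append(signal[0], signal[1:] - coeff * signal[:-1])
```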
In another embodiment of the present invention, the sound signal is split into a plurality of short sound segments along the time axis, and short-time signal strength analysis, short-time zero-crossing analysis, average signal strength analysis, short-time correlation analysis, and average signal strength difference analysis are performed on these segments respectively to distinguish unvoiced and voiced portions of the sound signal, so that the speech parameter information can be extracted subsequently.
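The short-time analyses mentioned above operate on fixed-length frames of the signal. A minimal sketch of framing together with short-time energy and the zero-crossing rate, two of the quantities involved, is given below; the 25 ms frame length and 10 ms hop are assumed values, not figures taken from the patent.

```python
import numpy as np

def frame_signal(x: np.ndarray, sr: int, frame_ms: float = 25.0, hop_ms: float = 10.0) -> np.ndarray:
    """Split a mono signal into overlapping short frames (one frame per row)."""
    frame_len = int(sr * frame_ms / 1000)
    hop_len = int(sr * hop_ms / 1000)
    n_frames = 1 + max(0, (len(x) - frame_len) // hop_len)
    return np.stack([x[i * hop_len:i * hop_len + frame_len] for i in range(n_frames)])

def short_time_energy(frames: np.ndarray) -> np.ndarray:
    """Short-time energy of each frame."""
    return np.sum(frames ** 2, axis=1)

def zero_crossing_rate(frames: np.ndarray) -> np.ndarray:
    """Fraction of sign changes per frame; low for voiced speech, higher for unvoiced speech."""
    signs = np.sign(frames)
    return np.mean(np.abs(np.diff(signs, axis=1)) > 0, axis=1)
```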
In addition, the environment in which the user speaks contains noise. The noise is present all the time, whereas the sound signal is not, so it is necessary to detect whether a sound signal is present. When doing so, the start point and end point of the sound signal can be detected with methods such as the double-threshold discrimination method. Delimiting the sound signal in this way prevents excessive noise from being mixed into the signal to be processed, reduces the amount of data and the processing time, and avoids the influence of noise on the emotion analysis result, thereby improving the accuracy of the emotion analysis of the sound signal.
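A double-threshold endpoint detector of the kind referred to above typically combines short-time energy with the zero-crossing rate: a high energy threshold locates the certain speech region, and lower thresholds extend it outward to the actual start and end points. The sketch below is a simplified illustration under those assumptions and reuses the framing helpers above; the threshold factors are illustrative values only.

```python
import numpy as np

def detect_endpoints(energy: np.ndarray, zcr: np.ndarray, high_factor: float = 0.5, low_factor: float = 0.1):
    """Return (start_frame, end_frame) of speech using a simplified double-threshold rule."""
    high_thr = high_factor * energy.max()
    low_thr = low_factor * energy.max()

    above_high = np.where(energy > high_thr)[0]
    if above_high.size == 0:
        return None  # no frame exceeds the high threshold: treat as silence/noise only

    start, end = int(above_high[0]), int(above_high[-1])

    # Extend backwards/forwards while energy stays above the low threshold.
    while start > 0 and energy[start - 1] > low_thr:
        start -= 1
    while end < len(energy) - 1 and energy[end + 1] > low_thr:
        end += 1

    # Optionally extend further through high-ZCR frames to keep unvoiced consonants.
    while start > 0 and zcr[start - 1] > zcr.mean():
        start -= 1
    while end < len(zcr) - 1 and zcr[end + 1] > zcr.mean():
        end += 1

    return start, end
```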
In step S102, performing text emotion analysis on the text information to obtain emotion information expressed by the text information;
in the embodiment of the invention, text emotion analysis may be performed on the text information using an LSTM algorithm to obtain the probability value of each emotion information expressed by the text information, which serves as the emotion information expressed by the text information.
Of course, when performing text emotion analysis on the text information, other text emotion analysis methods may also be adopted in the embodiments of the present invention, and the embodiments of the present invention do not limit the text emotion analysis method used when performing text emotion analysis on the text information.
In the embodiment of the present invention, the technician may set a variety of emotions locally in advance, such as happy, angry, frightened, sad, urgent, neutral, and so on. Thus, after analyzing the text information, the probability value of anger expressed by the text information, the probability value of joy expressed by the text information, the probability value of startle expressed by the text information, the probability value of sadness expressed by the text information, the probability value of urgency expressed by the text information and the probability value of neutrality expressed by the text information can be obtained.
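For illustration, a minimal LSTM text classifier that outputs one probability per preset emotion could be built as sketched below. Keras is an assumed framework choice, and the vocabulary size, embedding size, and six emotion classes are illustrative values; the patent only requires that an LSTM algorithm produce a probability value for each emotion.

```python
import tensorflow as tf

NUM_EMOTIONS = 6  # happy, angry, frightened, sad, urgent, neutral (illustrative set)

def build_text_emotion_model(vocab_size: int = 20000, embed_dim: int = 128) -> tf.keras.Model:
    """LSTM over token ids producing softmax probabilities over the emotion classes."""
    model = tf.keras.Sequential([
        tf.keras.layers.Embedding(vocab_size, embed_dim, mask_zero=True),
        tf.keras.layers.LSTM(128),
        tf.keras.layers.Dense(NUM_EMOTIONS, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    return model

# After training, model.predict(token_ids) yields one probability per emotion for each utterance.
```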
In step S103, performing speech emotion analysis on the speech parameter information to obtain emotion information expressed by the speech parameter;
in the embodiment of the present invention, a CNN (Convolutional Neural Network) algorithm is used to perform speech emotion analysis on the speech parameter, so as to obtain probability values of each emotion information expressed by the speech parameter, and the probability values are used as emotion information expressed by the speech parameter information.
For example, a probability value of anger expressed by the voice parameter, a probability value of joy expressed by the voice parameter, a probability value of startle expressed by the voice parameter, a probability value of sadness expressed by the voice parameter, a probability value of urgency expressed by the voice parameter, and a probability value of neutrality expressed by the voice parameter are obtained.
Of course, when performing speech emotion analysis on the speech parameter information, other speech emotion analysis methods may also be adopted in the embodiments of the present invention, and the embodiments of the present invention do not limit the speech emotion analysis method used.
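Similarly, a small convolutional network that maps a matrix of frame-level speech parameters to emotion probabilities could look like the sketch below. The input shape and layer sizes are assumptions made for illustration; the patent only requires that a CNN algorithm produce a probability value for each emotion.

```python
import tensorflow as tf

NUM_EMOTIONS = 6  # same illustrative emotion set as the text model

def build_speech_emotion_model(num_frames: int = 300, num_features: int = 40) -> tf.keras.Model:
    """1-D CNN over a (frames x features) matrix of speech parameters producing emotion probabilities."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(num_frames, num_features)),
        tf.keras.layers.Conv1D(64, kernel_size=5, activation="relu"),
        tf.keras.layers.MaxPooling1D(2),
        tf.keras.layers.Conv1D(128, kernel_size=5, activation="relu"),
        tf.keras.layers.GlobalAveragePooling1D(),
        tf.keras.layers.Dense(NUM_EMOTIONS, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
    return model
```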
In step S104, emotion information expressed by the sound signal is acquired according to emotion information expressed by the text information and emotion information expressed by the speech parameter information.
In the embodiment of the present invention, for any one of a plurality of preset emotions, a comprehensive probability value of the emotion information expressed by the sound signal may be calculated according to a probability value of the emotion information expressed by the text information and a probability value of the emotion information expressed by the speech parameter information; the above operation is performed for each of the other emotions in the preset emotions, so that the comprehensive probability value of each emotion expressed by the sound signal can be obtained respectively, and then the emotion information with the highest comprehensive probability value is determined as the expressed emotion information of the sound signal.
The specific step of calculating the comprehensive probability value of the emotion expressed by the sound signal according to the probability value of the emotion expressed by the text information and the probability value of the emotion expressed by the voice parameter information may be implemented by the following processes, including:
calculating a first product between the probability value of the emotion information expressed by the text information and a preset text emotion coefficient; calculating a second product between the probability value of the emotion information expressed by the voice parameter information and a preset voice emotion coefficient; calculating a third product between the first product and a preset matrix vector of the emotion information; calculating a fourth product between the second product and a preset matrix vector of the emotion information; and acquiring the comprehensive probability value of the emotional information expressed by the sound signal according to the third product and the fourth product. For example, the third product and the fourth product are input into the tanh function to obtain the integrated probability value of the emotion information expressed by the sound signal.
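Read literally, this fusion step weights the two probability values with modality coefficients, multiplies each weighted value by a preset vector for the emotion, and passes the combination through the tanh function. The sketch below is one plausible reading of that description; the coefficient values, the form of the per-emotion matrix vector, and the summation used to reduce the result to a scalar are all assumptions.

```python
import numpy as np

def fuse_emotion_scores(
    p_text: np.ndarray,          # per-emotion probabilities from the text model
    p_speech: np.ndarray,        # per-emotion probabilities from the speech model
    emotion_vectors: np.ndarray, # preset per-emotion vectors, shape (num_emotions, dim); assumed form
    text_coeff: float = 0.6,     # preset text emotion coefficient (illustrative value)
    speech_coeff: float = 0.4,   # preset voice emotion coefficient (illustrative value)
) -> int:
    """Return the index of the emotion with the highest comprehensive probability value."""
    scores = []
    for i in range(len(p_text)):
        first = p_text[i] * text_coeff               # first product
        second = p_speech[i] * speech_coeff          # second product
        third = first * emotion_vectors[i]           # third product (vector)
        fourth = second * emotion_vectors[i]         # fourth product (vector)
        # Combine the two products and squash with tanh; summing to a scalar is an assumption.
        scores.append(np.tanh(np.sum(third + fourth)))
    return int(np.argmax(scores))
```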
In the embodiment of the present invention, the preset speech emotion coefficient and the preset text emotion coefficient may be the same or different.
Technicians can analyze a large number of sound signals expressing user emotion in advance and estimate the respective weights with which text information and voice parameter information express emotion. If the weight of the text information is larger than that of the voice parameter information, the preset text emotion coefficient may be set larger than the preset voice emotion coefficient; if it is smaller, the preset text emotion coefficient may be set smaller than the preset voice emotion coefficient; and if the two weights are equal, the two coefficients may be set equal. The preset text emotion coefficient and the preset voice emotion coefficient are then stored locally, so that they can be obtained directly from local storage in step S104. Thereafter, a first product between the probability value of the emotion information expressed by the text information and the preset text emotion coefficient is calculated; a second product between the probability value of the emotion information expressed by the voice parameter information and the preset voice emotion coefficient is calculated; a third product between the first product and the preset matrix vector of the emotion information is calculated; a fourth product between the second product and the preset matrix vector of the emotion information is calculated; and the comprehensive probability value of the emotion information expressed by the sound signal is acquired according to the third product and the fourth product, for example by inputting the third product and the fourth product into the tanh function.
In the embodiment of the invention, when the emotion information expressed by a sound signal sent by a user is analyzed, text information and voice parameter information in the sound signal are extracted; performing text emotion analysis on the text information to obtain emotion information expressed by the text information, and performing voice emotion analysis on the voice parameter information to obtain emotion information expressed by the voice parameter; and acquiring the expressed emotion information of the sound signal according to the emotion information expressed by the text information and the emotion information expressed by the voice parameter information.
In determining the expressed emotion information of the sound signal, the prior art determines the expressed emotion information of the sound signal only according to the size, intonation and speech speed in the sound signal, and the embodiment of the invention determines the expressed emotion information of the sound signal according to the text information and the speech parameter information in the sound signal.
Compared with the prior art, the method and the device have the advantages that the emotion information expressed by the sound signal is analyzed more comprehensively by combining the text information besides the voice parameter information, so that the misjudgment condition in the prior art can be avoided, and the accuracy of determining the emotion information expressed by the sound signal can be improved.
It should be noted that, for simplicity of description, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will recognize that the present invention is not limited by the illustrated order of acts, as some steps may occur in other orders or concurrently in accordance with the embodiments of the present invention. Further, those skilled in the art will appreciate that the embodiments described in the specification are presently preferred and that no particular act is required to implement the invention.
Referring to fig. 2, a block diagram of an embodiment of an emotion information analyzing apparatus for a sound signal according to the present invention is shown, and may specifically include the following modules:
the extraction module 11 is used for extracting text information and voice parameter information in the sound signal;
the first analysis module 12 is configured to perform text emotion analysis on the text information to obtain emotion information expressed by the text information;
the second analysis module 13 is configured to perform speech emotion analysis on the speech parameter information to obtain emotion information expressed by the speech parameter information;
and the obtaining module 14 is configured to obtain emotion information expressed by the sound signal according to the emotion information expressed by the text information and the emotion information expressed by the voice parameter information.
Wherein the first analysis module 12 is specifically configured to: and carrying out text emotion analysis on the text information by using a long-short term memory network (LSTM) algorithm to obtain probability values of all emotion information expressed by the text information.
Wherein the second analysis module 13 is specifically configured to: and performing voice emotion analysis on the voice parameters by using a Convolutional Neural Network (CNN) algorithm to obtain probability values of all emotion information expressed by the voice parameters.
Wherein the obtaining module 14 includes:
a calculating unit, configured to calculate, for each piece of emotion information, a comprehensive probability value of the emotion information expressed by the sound signal according to a probability value of the emotion information expressed by the text information and a probability value of the emotion information expressed by the speech parameter information;
and the determining unit is used for determining the emotion information with the highest comprehensive probability value as the expressed emotion information of the sound signal.
Wherein the calculation unit includes:
the first calculating subunit is used for calculating a first product between the probability value of the emotion information expressed by the text information and a preset text emotion coefficient;
the second calculating subunit is used for calculating a second product between the probability value of the emotion information expressed by the voice parameter information and a preset voice emotion coefficient;
the third calculation subunit is used for calculating a third product between the first product and a preset matrix vector of the emotion information;
the fourth calculating subunit is used for calculating a fourth product between the second product and the preset matrix vector of the emotion information;
and the obtaining subunit is used for obtaining the comprehensive probability value of the emotion information expressed by the sound signal according to the third product and the fourth product.
In the embodiment of the invention, when the emotion information expressed by a sound signal sent by a user is analyzed, text information and voice parameter information in the sound signal are extracted; performing text emotion analysis on the text information to obtain emotion information expressed by the text information, and performing voice emotion analysis on the voice parameter information to obtain emotion information expressed by the voice parameter; and acquiring the expressed emotion information of the sound signal according to the emotion information expressed by the text information and the emotion information expressed by the voice parameter information.
In determining the expressed emotion information of the sound signal, the prior art determines the expressed emotion information of the sound signal only according to the size, intonation and speech speed in the sound signal, and the embodiment of the invention determines the expressed emotion information of the sound signal according to the text information and the speech parameter information in the sound signal.
Compared with the prior art, the method and the device have the advantages that the emotion information expressed by the sound signals is analyzed more comprehensively by combining the text information besides the voice parameter information, so that misjudgment in the prior art can be avoided, and the accuracy of determining the emotion information expressed by the sound signals can be improved.
For the device embodiment, since it is basically similar to the method embodiment, the description is simple, and for the relevant points, refer to the partial description of the method embodiment.
The embodiments in the present specification are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, terminal devices (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing terminal to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing terminal, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing terminal to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing terminal to cause a series of operational steps to be performed on the computer or other programmable terminal to produce a computer implemented process such that the instructions which execute on the computer or other programmable terminal provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications of these embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or terminal that comprises the element.
The method and the device for analyzing emotion information of a sound signal provided by the invention are described in detail, a specific example is applied in the text to explain the principle and the implementation mode of the invention, and the description of the embodiment is only used for helping to understand the method and the core idea of the invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present invention.

Claims (6)

1. A method for emotion information analysis of a sound signal, the method comprising:
extracting text information and voice parameter information in a sound signal, wherein the method comprises the following steps: detecting signal intensities of the sound signal at respective frequencies, and when it is detected that the signal intensity at a high frequency is lower than the signal intensity at a low frequency, enhancing the signal intensity at the high frequency;
performing text emotion analysis on the text information to obtain emotion information expressed by the text information;
performing voice emotion analysis on the voice parameter information to obtain emotion information expressed by the voice parameter information;
obtaining the emotion information expressed by the sound signal according to the emotion information expressed by the text information and the emotion information expressed by the voice parameter information, wherein the method comprises the following steps:
calculating the comprehensive probability value of the emotion information expressed by the sound signal according to the probability value of the emotion information expressed by the text information and the probability value of the emotion information expressed by the voice parameter information;
determining the emotion information with the highest comprehensive probability value as the expressed emotion information of the sound signal;
wherein, the calculating the comprehensive probability value of the emotion information expressed by the sound signal according to the probability value of the emotion information expressed by the text information and the probability value of the emotion information expressed by the voice parameter information comprises:
calculating a first product between the probability value of the emotion information expressed by the text information and a preset text emotion coefficient;
calculating a second product between the probability value of the emotion information expressed by the voice parameter information and a preset voice emotion coefficient;
calculating a third product between the first product and a preset matrix vector of the emotion information;
calculating a fourth product between the second product and a preset matrix vector of the emotion information;
and acquiring the comprehensive probability value of the emotional information expressed by the sound signal according to the third product and the fourth product.
2. The method of claim 1, wherein the obtaining emotion information expressed by the text information through text emotion analysis of the text information comprises:
and carrying out text emotion analysis on the text information by using a long-short term memory network (LSTM) algorithm to obtain probability values of all emotion information expressed by the text information.
3. The method of claim 2, wherein performing speech emotion analysis on the speech parameter information to obtain emotion information expressed by the speech parameter information comprises:
and performing voice emotion analysis on the voice parameters by using a Convolutional Neural Network (CNN) algorithm to obtain probability values of all emotion information expressed by the voice parameters.
4. An apparatus for analyzing emotion information of a sound signal, the apparatus comprising:
the extraction module is used for extracting text information and voice parameter information in the sound signal, wherein the extraction module comprises: detecting signal intensities of the sound signal at respective frequencies, and when it is detected that the signal intensity at a high frequency is lower than the signal intensity at a low frequency, enhancing the signal intensity at the high frequency;
the first analysis module is used for carrying out text emotion analysis on the text information to obtain emotion information expressed by the text information;
the second analysis module is used for carrying out voice emotion analysis on the voice parameter information to obtain emotion information expressed by the voice parameter information;
an obtaining module, configured to obtain emotion information expressed by the sound signal according to emotion information expressed by the text information and emotion information expressed by the speech parameter information, where the obtaining module includes:
calculating the comprehensive probability value of the emotion information expressed by the sound signal according to the probability value of the emotion information expressed by the text information and the probability value of the emotion information expressed by the voice parameter information;
determining the emotion information with the highest comprehensive probability value as the expressed emotion information of the sound signal;
wherein, the calculating the comprehensive probability value of the emotion information expressed by the sound signal according to the probability value of the emotion information expressed by the text information and the probability value of the emotion information expressed by the voice parameter information comprises:
calculating a first product between the probability value of the emotion information expressed by the text information and a preset text emotion coefficient;
calculating a second product between the probability value of the emotion information expressed by the voice parameter information and a preset voice emotion coefficient;
calculating a third product between the first product and a preset matrix vector of the emotion information;
calculating a fourth product between the second product and a preset matrix vector of the emotion information;
and acquiring the comprehensive probability value of the emotional information expressed by the sound signal according to the third product and the fourth product.
5. The apparatus of claim 4, wherein the first analysis module is specifically configured to: and carrying out text emotion analysis on the text information by using a long-short term memory network (LSTM) algorithm to obtain probability values of all emotion information expressed by the text information.
6. The apparatus of claim 5, wherein the second analysis module is specifically configured to: and performing voice emotion analysis on the voice parameters by using a Convolutional Neural Network (CNN) algorithm to obtain probability values of all emotion information expressed by the voice parameters.
CN201711065483.3A 2017-11-02 2017-11-02 Method and device for analyzing emotion information of sound signal Active CN108039181B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711065483.3A CN108039181B (en) 2017-11-02 2017-11-02 Method and device for analyzing emotion information of sound signal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711065483.3A CN108039181B (en) 2017-11-02 2017-11-02 Method and device for analyzing emotion information of sound signal

Publications (2)

Publication Number Publication Date
CN108039181A CN108039181A (en) 2018-05-15
CN108039181B (en) 2021-02-12

Family

ID=62092727

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711065483.3A Active CN108039181B (en) 2017-11-02 2017-11-02 Method and device for analyzing emotion information of sound signal

Country Status (1)

Country Link
CN (1) CN108039181B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109192225B (en) * 2018-09-28 2021-07-09 清华大学 Method and device for recognizing and marking speech emotion
CN109243492A (en) * 2018-10-28 2019-01-18 国家计算机网络与信息安全管理中心 A kind of speech emotion recognition system and recognition methods
CN110390956A (en) * 2019-08-15 2019-10-29 龙马智芯(珠海横琴)科技有限公司 Emotion recognition network model, method and electronic equipment
CN110675859B (en) * 2019-09-05 2021-11-23 华南理工大学 Multi-emotion recognition method, system, medium, and apparatus combining speech and text
CN110570879A (en) * 2019-09-11 2019-12-13 深圳壹账通智能科技有限公司 Intelligent conversation method and device based on emotion recognition and computer equipment
CN111223498A (en) * 2020-01-10 2020-06-02 平安科技(深圳)有限公司 Intelligent emotion recognition method and device and computer readable storage medium


Family Cites Families (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105427869A (en) * 2015-11-02 2016-03-23 北京大学 Session emotion autoanalysis method based on depth learning
CN105334743B (en) * 2015-11-18 2018-10-26 深圳创维-Rgb电子有限公司 A kind of intelligent home furnishing control method and its system based on emotion recognition
US20170278067A1 (en) * 2016-03-25 2017-09-28 International Business Machines Corporation Monitoring activity to detect potential user actions
CN106297783A (en) * 2016-08-05 2017-01-04 易晓阳 A kind of interactive voice identification intelligent terminal
CN106297826A (en) * 2016-08-18 2017-01-04 竹间智能科技(上海)有限公司 Speech emotional identification system and method
CN107038154A (en) * 2016-11-25 2017-08-11 阿里巴巴集团控股有限公司 A kind of text emotion recognition methods and device
CN106782602B (en) * 2016-12-01 2020-03-17 南京邮电大学 Speech emotion recognition method based on deep neural network
CN106598948B (en) * 2016-12-19 2019-05-03 杭州语忆科技有限公司 Emotion identification method based on shot and long term Memory Neural Networks combination autocoder
CN106782615B (en) * 2016-12-20 2020-06-12 科大讯飞股份有限公司 Voice data emotion detection method, device and system
CN106847309A (en) * 2017-01-09 2017-06-13 华南理工大学 A kind of speech-emotion recognition method
CN107247702A (en) * 2017-05-05 2017-10-13 桂林电子科技大学 A kind of text emotion analysis and processing method and system
CN107291696A (en) * 2017-06-28 2017-10-24 达而观信息科技(上海)有限公司 A kind of comment word sentiment analysis method and system based on deep learning

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100145695A1 (en) * 2008-12-08 2010-06-10 Electronics And Telecommunications Research Institute Apparatus for context awareness and method using the same
CN102623009A (en) * 2012-03-02 2012-08-01 安徽科大讯飞信息技术股份有限公司 Abnormal emotion automatic detection and extraction method and system on basis of short-time analysis
US20130268273A1 (en) * 2012-04-10 2013-10-10 Oscal Tzyh-Chiang Chen Method of recognizing gender or age of a speaker according to speech emotion or arousal
CN102819744A (en) * 2012-06-29 2012-12-12 北京理工大学 Emotion recognition method with information of two channels fused
CN103456314A (en) * 2013-09-03 2013-12-18 广州创维平面显示科技有限公司 Emotion recognition method and device
CN103810994A (en) * 2013-09-05 2014-05-21 江苏大学 Method and system for voice emotion inference on basis of emotion context
CN104200804A (en) * 2014-09-19 2014-12-10 合肥工业大学 Various-information coupling emotion recognition method for human-computer interaction
CN105976809A (en) * 2016-05-25 2016-09-28 中国地质大学(武汉) Voice-and-facial-expression-based identification method and system for dual-modal emotion fusion
CN106128479A (en) * 2016-06-30 2016-11-16 福建星网视易信息系统有限公司 A kind of performance emotion identification method and device
CN107272607A (en) * 2017-05-11 2017-10-20 上海斐讯数据通信技术有限公司 A kind of intelligent home control system and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on Bimodal Emotion Recognition Fusing Facial Expression and Speech; Xie Kezhen; China Master's Theses Full-text Database, Information Science and Technology Series; 2016-07-15 (No. 07); I138-873 *

Also Published As

Publication number Publication date
CN108039181A (en) 2018-05-15

Similar Documents

Publication Publication Date Title
CN108039181B (en) Method and device for analyzing emotion information of sound signal
CN109473123B (en) Voice activity detection method and device
US9875739B2 (en) Speaker separation in diarization
CN108630193B (en) Voice recognition method and device
WO2021128741A1 (en) Voice emotion fluctuation analysis method and apparatus, and computer device and storage medium
US20180218731A1 (en) Voice interaction apparatus and voice interaction method
US11630999B2 (en) Method and system for analyzing customer calls by implementing a machine learning model to identify emotions
US20200168209A1 (en) System and method for determining the compliance of agent scripts
WO2020253128A1 (en) Voice recognition-based communication service method, apparatus, computer device, and storage medium
CN110570853A (en) Intention recognition method and device based on voice data
CN108899033B (en) Method and device for determining speaker characteristics
US10971149B2 (en) Voice interaction system for interaction with a user by voice, voice interaction method, and program
CN110364178B (en) Voice processing method and device, storage medium and electronic equipment
CN111916061A (en) Voice endpoint detection method and device, readable storage medium and electronic equipment
CN111415653B (en) Method and device for recognizing speech
US20210118464A1 (en) Method and apparatus for emotion recognition from speech
CN108877779B (en) Method and device for detecting voice tail point
CN110728996A (en) Real-time voice quality inspection method, device, equipment and computer storage medium
CN112509561A (en) Emotion recognition method, device, equipment and computer readable storage medium
CN107680584B (en) Method and device for segmenting audio
CN108962226B (en) Method and apparatus for detecting end point of voice
CN112992190B (en) Audio signal processing method and device, electronic equipment and storage medium
CN111640450A (en) Multi-person audio processing method, device, equipment and readable storage medium
JP2018005122A (en) Detection device, detection method, and detection program
CN113099043A (en) Customer service control method, apparatus and computer-readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant