CN107545905B - Emotion recognition method based on sound characteristics - Google Patents
- Publication number
- CN107545905B (granted publication of application CN201710720391.8A)
- Authority
- CN
- China
- Prior art keywords
- voice
- emotion
- emotion recognition
- voice signal
- recognition result
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related
Abstract
The invention provides an emotion recognition method based on sound characteristics, which comprises the following steps: a voice recording module reads a voice signal; a voice preprocessing module identifies the language to which the read voice signal belongs and divides the signal into sentences, obtaining a preprocessed voice signal with a language mark; a voice processing module calculates and extracts voice characteristic parameters according to a preset method and the language mark of the preprocessed voice signal; an emotion processing module obtains an emotion recognition result for each sentence, described as a probability, from the language mark of the preprocessed voice signal and the extracted voice characteristic parameters; an emotion post-processing module acquires the emotion recognition result of each sentence and adjusts it in a preset manner to obtain the emotion recognition result of the whole voice signal. The method provided by the invention can improve the accuracy of emotion recognition.
Description
Technical Field
The invention relates to the field of emotion recognition, in particular to an emotion recognition method based on sound characteristics.
Background
With the advancement of science and technology, natural language processing, and speech recognition in particular, has been applied in more and more industries, such as mobile phone voice assistants and self-service voice systems. Among these applications, improving the ability to recognize emotion in speech is an important way to improve service quality.
In other fields the benefits are similar. Adding an emotion recognition function to a voice communication tool can help both parties perceive each other's emotions in time and promote communication. In distance teaching, the learner's mood can be identified: when the learner shows anxiety or dissatisfaction because of problems or unintelligible material, the teacher or system may adjust the manner and pace of teaching, or give mood-adjusting guidance. In voice navigation, when speech emotion recognition detects that the driver is in an emotionally unstable state, the system can give a reminder or automatically adjust driving parameters to prevent accidents.
Existing emotion recognition methods based on voice characteristics, whether they use a vector-segmented Mahalanobis distance discrimination method, principal component analysis, a neural network, or a hidden Markov model, consider only the content of a single sentence of speech. In addition, because of the differences between language cultures, emotion recognition can be performed only on speech in a single language. As a result, the accuracy of speech emotion recognition is not high enough.
Disclosure of Invention
In order to solve the problems, the invention provides a method for recognizing emotion based on voice characteristics, which can improve the accuracy of emotion recognition.
The embodiment of the invention provides an emotion recognition method based on sound characteristics, which comprises the following steps:
the voice recording module reads a voice signal;
the voice preprocessing module identifies the language to which the read voice signal belongs, and divides the read voice signal into sentences to obtain a preprocessed voice signal with a language mark;
the voice processing module calculates and extracts voice characteristic parameters according to a preset method and the language mark of the preprocessed voice signal;
the emotion processing module obtains an emotion recognition result of each sentence according to the language mark of the preprocessed sound signal and the extracted voice characteristic parameters, and the emotion recognition result is described in a probability manner;
the emotion post-processing module acquires an emotion recognition result of each sentence of the voice signal, and adjusts the emotion recognition result according to a preset mode to obtain the emotion recognition result of the voice signal.
Preferably, the voice feature further includes:
prosodic features, including rising tones, falling tones, stressed sounds, and unstressed sounds.
Preferably, the emotion recognition result for each sentence is calculated by using a principal component analysis method, a Gaussian mixture model method, or a hidden Markov model.
Preferably, the emotion recognition result of each sentence is obtained by:
calculating the distance between each sentence and each emotion by using a vector segmentation type Mahalanobis distance discrimination method;
converting the distance values into probability values according to a preset method, such that the smaller the distance, the greater the probability, and the sum of all probabilities is 1;
and taking the probability as the emotion recognition result of each sentence.
Preferably, the adjusting the emotion recognition result according to a preset mode includes:
calculating a comprehensive probability after adjusting the emotion recognition result according to a first formula, and selecting an adjusting scheme with the highest comprehensive probability to adjust the emotion recognition result, wherein the first formula is as follows:
P = K(θ) · α^(n−i) · (1−α)^i
where K(θ), a preset monotonically decreasing function obtained through sample statistics, gives the probability corresponding to the number of emotions contained in the sound signal; θ is the number of emotions contained in the sound signal; α is the accuracy rate of emotion recognition for a single sentence; n is the number of sentences contained in the sound signal; and i is the number of sentences whose emotion recognition results are adjusted.
Preferably, the emotion recognition method based on voice characteristics further includes:
the voice-to-text module reads the voice signal and converts it into text information;
the text emotion recognition module segments the converted text information into words and looks them up in an emotion word database, in which words corresponding to different emotions are stored;
and when the text information contains words corresponding to a certain emotion and the number of words corresponding to each other emotion is below a preset threshold, the emotion of the voice signal is recognized as that emotion.
Preferably, the voice recording module reads voice signals, and includes:
the voice recording module reads a voice signal;
the voice recording module checks the length of the voice signal, and when the length of the voice signal exceeds a preset threshold value, the voice recording module segments the voice signal so that the length of each segment of the voice signal does not exceed the preset threshold value.
The emotion recognition method based on the voice characteristics can improve the accuracy of emotion recognition.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
fig. 1 is a flowchart of an emotion recognition method based on voice characteristics according to an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
The embodiment of the invention provides an emotion recognition method based on sound characteristics, as shown in fig. 1, comprising the following steps:
the voice recording module reads a voice signal;
the voice preprocessing module identifies the language to which the read voice signal belongs, and divides the read voice signal into sentences to obtain a preprocessed voice signal with a language mark;
the voice processing module calculates and extracts voice characteristic parameters according to a preset method and the language mark of the preprocessed voice signal;
the emotion processing module obtains an emotion recognition result of each sentence according to the language mark of the preprocessed sound signal and the extracted voice characteristic parameters, and the emotion recognition result is described in a probability manner;
the emotion post-processing module acquires an emotion recognition result of each sentence of the voice signal, and adjusts the emotion recognition result according to a preset mode to obtain the emotion recognition result of the voice signal.
Identifying in advance the language to which the sound signal belongs allows emotion to be recognized using language-specific sound characteristics, which increases the accuracy of emotion recognition; using the emotion post-processing module to recognize the emotion of a passage of sound as a whole increases that accuracy further.
In an embodiment of the present invention, the voice feature further includes:
prosodic features, including rising tones, falling tones, stressed sounds, and unstressed sounds. Combined with these prosodic characteristics, the voice features are more comprehensive, and higher-accuracy emotion recognition is easier to achieve.
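As an illustrative sketch of extracting one such prosodic cue, the function below labels a sentence's pitch (F0) track as rising, falling, or level by comparing the mean pitch of its final third against its first third. The thirds-based comparison and the 1.05 ratio threshold are assumptions for illustration, not taken from the patent:

```python
import numpy as np

def prosody_label(f0, rise_ratio=1.05):
    # Label a sentence's pitch track: compare mean F0 of the final
    # third against the first third; drop unvoiced (f0 <= 0) frames.
    voiced = np.asarray([v for v in f0 if v > 0], dtype=float)
    third = max(1, len(voiced) // 3)
    head, tail = voiced[:third].mean(), voiced[-third:].mean()
    if tail > head * rise_ratio:
        return "rising"
    if head > tail * rise_ratio:
        return "falling"
    return "level"
```

In practice the F0 track would come from a pitch tracker applied to each preprocessed sentence; here any list of per-frame pitch values in Hz will do.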
In an embodiment of the present invention, the emotion recognition result for each sentence is calculated using a principal component analysis method, a Gaussian mixture model method, or a hidden Markov model. Each of these directly yields an emotion recognition result described as a probability, which makes it convenient for the emotion post-processing module to recognize the emotion of a passage of speech as a whole and thus increases the accuracy of emotion recognition on the sound signal.
In an embodiment of the present invention, the emotion recognition result for each sentence is obtained by:
calculating the distance between each sentence and each emotion by using a vector segmentation type Mahalanobis distance discrimination method;
converting the distance values into probability values according to a preset method, such that the smaller the distance, the greater the probability, and the sum of all probabilities is 1;
and taking the probability as the emotion recognition result of each sentence.
By this method, an emotion recognition result described as a probability is obtained, making it convenient for the emotion post-processing module to recognize the emotion of a passage of speech as a whole and increasing the accuracy of emotion recognition on the sound signal.
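The patent leaves both the per-class statistics and the distance-to-probability conversion as "preset methods". One minimal sketch that satisfies the stated constraints (smaller distance gives greater probability, all probabilities sum to 1) is inverse-distance normalization; the Mahalanobis helper, the per-emotion class statistics, and the `eps` guard are assumptions:

```python
import numpy as np

def mahalanobis(x, mean, cov_inv):
    # Mahalanobis distance from a sentence's feature vector x to one
    # emotion class described by its mean and inverse covariance.
    d = np.asarray(x, dtype=float) - mean
    return float(np.sqrt(d @ cov_inv @ d))

def distances_to_probabilities(distances, eps=1e-9):
    # Smaller distance -> larger probability; probabilities sum to 1.
    inv = {emo: 1.0 / (d + eps) for emo, d in distances.items()}
    total = sum(inv.values())
    return {emo: v / total for emo, v in inv.items()}

# Toy distances from one sentence to three emotion classes
probs = distances_to_probabilities({"happy": 1.0, "neutral": 2.0, "angry": 4.0})
```

Any other monotone conversion (e.g. a softmax over negated distances) would satisfy the same constraints; the patent does not single one out.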
In an embodiment of the present invention, the adjusting the emotion recognition result according to a preset manner includes:
calculating a comprehensive probability after adjusting the emotion recognition result according to a first formula, and selecting an adjusting scheme with the highest comprehensive probability to adjust the emotion recognition result, wherein the first formula is as follows:
P = K(θ) · α^(n−i) · (1−α)^i
where K(θ), a preset monotonically decreasing function obtained through sample statistics, gives the probability corresponding to the number of emotions contained in the sound signal; θ is the number of emotions contained in the sound signal; α is the accuracy rate of emotion recognition for a single sentence; n is the number of sentences contained in the sound signal; and i is the number of sentences whose emotion recognition results are adjusted.
By considering both how likely the emotion is to change within a passage of speech and how likely each per-sentence recognition result is to be wrong, the emotion post-processing module recognizes the emotion of the passage as a whole, further increasing the accuracy of emotion recognition on the sound signal.
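The adjustment step can be sketched as a brute-force search over per-sentence emotion assignments scored by the first formula. The patent specifies neither the search procedure nor the form of K(θ) beyond monotonic decrease, so the enumeration and the example K functions below are assumptions:

```python
from itertools import product

def composite_probability(k_theta, alpha, n, i):
    # The first formula: P = K(theta) * alpha^(n-i) * (1-alpha)^i
    return k_theta * alpha ** (n - i) * (1 - alpha) ** i

def best_adjustment(top_emotions, emotions, alpha, k):
    # top_emotions: per-sentence most likely emotion before adjustment.
    # i counts overridden sentences; theta counts distinct emotions used.
    n = len(top_emotions)
    best, best_p = None, -1.0
    for assign in product(emotions, repeat=n):
        i = sum(a != t for a, t in zip(assign, top_emotions))
        theta = len(set(assign))
        p = composite_probability(k(theta), alpha, n, i)
        if p > best_p:
            best, best_p = assign, p
    return best, best_p
```

With a mild K(θ) the per-sentence results survive unchanged; with a K(θ) that penalizes multiple emotions sharply, the search smooths an outlier sentence toward the dominant emotion, which is the post-processing behavior the patent describes. Note the enumeration is exponential in n, which is one reason for the length check introduced below.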
In an embodiment of the present invention, the emotion recognition method based on voice characteristics further includes:
the voice-to-text module reads the voice signal and converts it into text information;
the text emotion recognition module segments the converted text information into words and looks them up in an emotion word database, in which words corresponding to different emotions are stored;
and when the text information contains words corresponding to a certain emotion and the number of words corresponding to each other emotion is below a preset threshold, the emotion of the voice signal is recognized as that emotion.
By converting the voice into text, more accurate emotion recognition can be achieved when the wording has an obvious emotional tendency.
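A minimal sketch of this text branch follows. The cue-word database, the word lists in it, and the threshold semantics (an emotion is returned only when every other emotion's cue-word count stays below the threshold) are illustrative assumptions:

```python
from collections import Counter

# Hypothetical emotion-word database: emotion -> cue words
EMOTION_WORDS = {
    "happy": {"great", "wonderful", "glad"},
    "angry": {"furious", "annoyed", "hate"},
}

def text_emotion(words, threshold=1):
    # Count cue words per emotion over the segmented text.
    counts = Counter()
    for w in words:
        for emo, cues in EMOTION_WORDS.items():
            if w in cues:
                counts[emo] += 1
    # Return an emotion only if it has cue words and every other
    # emotion's count is below the threshold; otherwise undecided.
    for emo in [e for e, c in counts.items() if c > 0]:
        if all(counts[o] < threshold for o in counts if o != emo):
            return emo
    return None
```

When the lookup is undecided (mixed or absent cue words), the acoustic pipeline's result would presumably stand.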
In one embodiment of the present invention, the sound recording module reads a sound signal, including:
the voice recording module reads a voice signal;
the voice recording module checks the length of the voice signal, and when the length of the voice signal exceeds a preset threshold value, the voice recording module segments the voice signal so that the length of each segment of the voice signal does not exceed the preset threshold value.
Limiting the length of each sound segment bounds the amount of computation needed for the emotion post-processing module to recognize the emotion of a segment as a whole, increasing the speed of emotion recognition on the sound signal.
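The length check above can be sketched as a fixed-size split; treating the signal as a flat list of samples (or frames) and interpreting the preset threshold as a sample count are assumptions:

```python
def segment_signal(samples, max_len):
    # Split the signal into consecutive chunks of at most max_len
    # samples so each chunk's post-processing search stays bounded.
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    return [samples[i:i + max_len] for i in range(0, len(samples), max_len)]
```

A production implementation would more likely cut at sentence or silence boundaries near the threshold rather than at exact sample counts, but the bounding effect on the per-segment computation is the same.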
The emotion recognition method based on the voice characteristics can improve the accuracy of emotion recognition.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.
Claims (6)
1. A method for emotion recognition based on voice characteristics, comprising:
the voice recording module reads a voice signal;
the voice preprocessing module identifies the language to which the read voice signal belongs, and divides the read voice signal into sentences to obtain a preprocessed voice signal with a language mark;
the voice processing module calculates and extracts voice characteristic parameters according to a preset method and the language mark of the preprocessed voice signal;
the emotion processing module obtains an emotion recognition result of each sentence according to the language mark of the preprocessed sound signal and the extracted voice characteristic parameters, and the emotion recognition result is described in a probability manner;
the emotion post-processing module acquires an emotion recognition result of each sentence of the voice signal, and adjusts the emotion recognition result according to a preset mode to obtain the emotion recognition result of the voice signal;
the adjusting of the emotion recognition result according to the preset mode comprises the following steps:
calculating a comprehensive probability after adjusting the emotion recognition result according to a first formula, and selecting an adjusting scheme with the highest comprehensive probability to adjust the emotion recognition result, wherein the first formula is as follows:
P = K(θ) · α^(n−i) · (1−α)^i
where K(θ), a preset monotonically decreasing function obtained through sample statistics, gives the probability corresponding to the number of emotions contained in the sound signal; θ is the number of emotions contained in the sound signal; α is the accuracy rate of emotion recognition for a single sentence; n is the number of sentences contained in the sound signal; and i is the number of sentences whose emotion recognition results are adjusted.
2. The method of claim 1, wherein the speech feature further comprises:
prosodic features, including rising tones, falling tones, stressed sounds, and unstressed sounds.
3. The method of claim 1, wherein the emotion recognition result for each sentence is obtained by calculation using a principal component analysis method, a Gaussian mixture model method, or a hidden Markov model.
4. The method of claim 1, wherein the emotion recognition result for each sentence is obtained by:
calculating the distance between each sentence and each emotion by using a vector segmentation type Mahalanobis distance discrimination method;
converting the distance values into probability values according to a preset method, such that the smaller the distance, the greater the probability, and the sum of all probabilities is 1;
and taking the probability as the emotion recognition result of each sentence.
5. The method of claim 1, further comprising:
the voice-to-text module reads the voice signal and converts it into text information;
the text emotion recognition module segments the converted text information into words and looks them up in an emotion word database, in which words corresponding to different emotions are stored;
and when the text information contains words corresponding to a certain emotion and the number of words corresponding to each other emotion is below a preset threshold, the emotion of the voice signal is recognized as that emotion.
6. The method of claim 1, wherein the sound entry module reads a sound signal comprising:
the voice recording module reads a voice signal;
the voice recording module checks the length of the voice signal, and when the length of the voice signal exceeds a preset threshold value, the voice recording module segments the voice signal so that the length of each segment of the voice signal does not exceed the preset threshold value.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710720391.8A CN107545905B (en) | 2017-08-21 | 2017-08-21 | Emotion recognition method based on sound characteristics |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710720391.8A CN107545905B (en) | 2017-08-21 | 2017-08-21 | Emotion recognition method based on sound characteristics |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107545905A CN107545905A (en) | 2018-01-05 |
CN107545905B true CN107545905B (en) | 2021-01-05 |
Family
ID=60958751
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710720391.8A Expired - Fee Related CN107545905B (en) | 2017-08-21 | 2017-08-21 | Emotion recognition method based on sound characteristics |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107545905B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108682419A (en) * | 2018-03-30 | 2018-10-19 | 京东方科技集团股份有限公司 | Sound control method and equipment, computer readable storage medium and equipment |
CN110660412A (en) * | 2018-06-28 | 2020-01-07 | Tcl集团股份有限公司 | Emotion guiding method and device and terminal equipment |
CN112447170A (en) * | 2019-08-29 | 2021-03-05 | 北京声智科技有限公司 | Security method and device based on sound information and electronic equipment |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE102005010285A1 (en) * | 2005-03-01 | 2006-09-07 | Deutsche Telekom Ag | Speech recognition involves speech recognizer which uses different speech models for linguistic analysis and an emotion recognizer is also present for determining emotional condition of person |
CN102142253A (en) * | 2010-01-29 | 2011-08-03 | 富士通株式会社 | Voice emotion identification equipment and method |
CN104504027A (en) * | 2014-12-12 | 2015-04-08 | 北京国双科技有限公司 | Method and device for automatically selecting webpage content |
CN105320960A (en) * | 2015-10-14 | 2016-02-10 | 北京航空航天大学 | Voting based classification method for cross-language subjective and objective sentiments |
CN106297825A (en) * | 2016-07-25 | 2017-01-04 | 华南理工大学 | A kind of speech-emotion recognition method based on integrated degree of depth belief network |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
DE102005010285A1 (en) * | 2005-03-01 | 2006-09-07 | Deutsche Telekom Ag | Speech recognition involves speech recognizer which uses different speech models for linguistic analysis and an emotion recognizer is also present for determining emotional condition of person |
CN102142253A (en) * | 2010-01-29 | 2011-08-03 | 富士通株式会社 | Voice emotion identification equipment and method |
CN104504027A (en) * | 2014-12-12 | 2015-04-08 | 北京国双科技有限公司 | Method and device for automatically selecting webpage content |
CN105320960A (en) * | 2015-10-14 | 2016-02-10 | 北京航空航天大学 | Voting based classification method for cross-language subjective and objective sentiments |
CN106297825A (en) * | 2016-07-25 | 2017-01-04 | 华南理工大学 | A kind of speech-emotion recognition method based on integrated degree of depth belief network |
Non-Patent Citations (2)
Title |
---|
Prosodic feature analysis and emotion recognition of multilingual emotional speech; Jiang Xiaoqing et al.; Acta Acustica (声学学报); 2006-05-31; Vol. 31, No. 3; pp. 217-221 *
A survey of emotional feature analysis and recognition in speech signals; Yu Lingli et al.; Journal of Circuits and Systems (电路与系统学报); 2007-08-31; Vol. 12, No. 4; pp. 76-84 *
Also Published As
Publication number | Publication date |
---|---|
CN107545905A (en) | 2018-01-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106503646B (en) | Multi-mode emotion recognition system and method | |
CN107039034B (en) | Rhythm prediction method and system | |
CN106575502B (en) | System and method for providing non-lexical cues in synthesized speech | |
CN102194454B (en) | Equipment and method for detecting key word in continuous speech | |
CN103761975B (en) | Method and device for oral evaluation | |
US20050228664A1 (en) | Refining of segmental boundaries in speech waveforms using contextual-dependent models | |
EP3734595A1 (en) | Methods and systems for providing speech recognition systems based on speech recordings logs | |
CN107545905B (en) | Emotion recognition method based on sound characteristics | |
ATE389225T1 (en) | VOICE RECOGNITION | |
CN109545197B (en) | Voice instruction identification method and device and intelligent terminal | |
CN105261246A (en) | Spoken English error correcting system based on big data mining technology | |
US11810471B2 (en) | Computer implemented method and apparatus for recognition of speech patterns and feedback | |
US11961524B2 (en) | System and method for extracting and displaying speaker information in an ATC transcription | |
CN112818680B (en) | Corpus processing method and device, electronic equipment and computer readable storage medium | |
CN112927679A (en) | Method for adding punctuation marks in voice recognition and voice recognition device | |
Sinclair et al. | A semi-markov model for speech segmentation with an utterance-break prior | |
CN108806691B (en) | Voice recognition method and system | |
Manjunath et al. | Development of consonant-vowel recognition systems for Indian languages: Bengali and Odia | |
CN112767961B (en) | Accent correction method based on cloud computing | |
Ishihara et al. | Automatic transformation of environmental sounds into sound-imitation words based on Japanese syllable structure. | |
CN110992986B (en) | Word syllable stress reading error detection method, device, electronic equipment and storage medium | |
Sreejith et al. | Automatic prosodic labeling and broad class Phonetic Engine for Malayalam | |
CN112863493A (en) | Voice data labeling method and device and electronic equipment | |
CN117275458B (en) | Speech generation method, device and equipment for intelligent customer service and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
CF01 | Termination of patent right due to non-payment of annual fee | ||
Granted publication date: 2021-01-05; Termination date: 2021-08-21