CN107545905B - Emotion recognition method based on sound characteristics

Info

Publication number
CN107545905B
Authority
CN
China
Prior art keywords
voice
emotion
emotion recognition
voice signal
recognition result
Prior art date
Legal status
Expired - Fee Related
Application number
CN201710720391.8A
Other languages
Chinese (zh)
Other versions
CN107545905A (en)
Inventor
王超
Current Assignee
Beijing Uai Robot Technology Co ltd
Original Assignee
Beijing Uai Robot Technology Co ltd
Priority date
Filing date
Publication date
Application filed by Beijing Uai Robot Technology Co ltd filed Critical Beijing Uai Robot Technology Co ltd
Priority to CN201710720391.8A priority Critical patent/CN107545905B/en
Publication of CN107545905A publication Critical patent/CN107545905A/en
Application granted granted Critical
Publication of CN107545905B publication Critical patent/CN107545905B/en

Abstract

The invention provides an emotion recognition method based on sound characteristics, which comprises the following steps: a sound recording module reads a sound signal; a sound preprocessing module identifies the language to which the sound signal belongs and divides the signal into sentences, obtaining a preprocessed sound signal with a language mark; a voice processing module calculates and extracts voice characteristic parameters according to a preset method and the language mark of the preprocessed sound signal; an emotion processing module obtains an emotion recognition result for each sentence, expressed as probabilities, according to the language mark of the preprocessed sound signal and the extracted voice characteristic parameters; an emotion post-processing module acquires the emotion recognition result of each sentence and adjusts the results according to a preset mode to obtain the emotion recognition result of the whole sound signal. The method provided by the invention can improve the accuracy of emotion recognition.

Description

Emotion recognition method based on sound characteristics
Technical Field
The invention relates to the field of emotion recognition, in particular to an emotion recognition method based on sound characteristics.
Background
With the advancement of science and technology, natural language processing, and speech recognition in particular, has been applied in more and more industries, such as mobile phone voice assistants and self-service voice systems. In these applications, improving the ability to recognize emotion in speech is an important way to improve service quality.
Emotion recognition is also valuable in other fields. Adding an emotion recognition function to a voice communication tool can help both parties in a conversation understand each other's emotions in time and improve communication. In distance teaching, the learner's mood can be identified: when a learner shows anxiety or dissatisfaction after encountering problems or unintelligible material, the teacher or the system can adjust the manner and pace of teaching, or give mood-adjusting guidance. In voice navigation, when speech emotion recognition detects that the driver is in an emotionally unstable state, the system can give a reminder or automatically adjust driving parameters to prevent accidents.
Existing emotion recognition methods based on voice characteristics, whether they use a vector-segmented Mahalanobis distance discrimination method, principal component analysis, a neural network, or a hidden Markov model, consider only the content of a single sentence of speech. In addition, because of differences between language cultures, they can recognize emotion only in speech of a single language. As a result, the accuracy of speech emotion recognition is not high enough.
Disclosure of Invention
In order to solve the above problems, the invention provides an emotion recognition method based on sound characteristics, which can improve the accuracy of emotion recognition.
The embodiment of the invention provides an emotion recognition method based on sound characteristics, which comprises the following steps:
the sound recording module reads a sound signal;
the sound preprocessing module identifies the language to which the read sound signal belongs and divides the signal into sentences, obtaining a preprocessed sound signal with a language mark;
the voice processing module calculates and extracts voice characteristic parameters according to a preset method and the language mark of the preprocessed sound signal;
the emotion processing module obtains an emotion recognition result for each sentence, expressed as probabilities, according to the language mark of the preprocessed sound signal and the extracted voice characteristic parameters;
the emotion post-processing module acquires the emotion recognition result of each sentence of the sound signal and adjusts the results according to a preset mode to obtain the emotion recognition result of the whole sound signal.
Preferably, the voice characteristic parameters further include:
prosodic features, including rising tone, falling tone, heavy accent, and light accent.
Preferably, the emotion recognition result for each sentence is calculated using principal component analysis, a Gaussian mixture model, or a hidden Markov model.
Preferably, the emotion recognition result of each sentence is obtained by:
calculating the distance between each sentence and each emotion using a vector-segmented Mahalanobis distance discrimination method;
converting the distance values into probability values according to a preset method, such that the smaller the distance, the greater the probability, and the probabilities sum to 1;
taking the probabilities as the emotion recognition result of the sentence.
Preferably, adjusting the emotion recognition results according to a preset mode includes:
calculating, according to a first formula, the composite probability of each candidate adjustment of the emotion recognition results, and selecting the adjustment scheme with the highest composite probability, wherein the first formula is:
P = K(θ)·α^(n-i)·(1-α)^i
wherein K(θ) is a preset monotonically decreasing function, obtained through sample statistics, that gives the probability that a sound signal contains θ emotions; θ is the number of emotions contained in the sound signal; α is the accuracy of emotion recognition for a single sentence; n is the number of sentences contained in the sound signal; and i is the number of sentences whose emotion recognition results are adjusted.
Preferably, the emotion recognition method based on sound characteristics further includes:
the voice-to-text module reads the sound signal and converts it into text information;
the text emotion recognition module segments the converted text information into words and looks the words up in an emotion word database, in which words corresponding to different emotions are stored;
when the text information contains words corresponding to a certain emotion and the number of words corresponding to any other emotion is below a preset threshold, the emotion of the sound signal is recognized as that emotion.
Preferably, the sound recording module reading a sound signal includes:
the sound recording module reads a sound signal;
the sound recording module checks the length of the sound signal, and when the length exceeds a preset threshold, segments the signal so that the length of each segment does not exceed the preset threshold.
The emotion recognition method based on sound characteristics provided by the invention can improve the accuracy of emotion recognition.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
Fig. 1 is a flowchart of an emotion recognition method based on sound characteristics according to an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
The embodiment of the invention provides an emotion recognition method based on sound characteristics, as shown in Fig. 1, comprising the following steps:
the sound recording module reads a sound signal;
the sound preprocessing module identifies the language to which the read sound signal belongs and divides the signal into sentences, obtaining a preprocessed sound signal with a language mark;
the voice processing module calculates and extracts voice characteristic parameters according to a preset method and the language mark of the preprocessed sound signal;
the emotion processing module obtains an emotion recognition result for each sentence, expressed as probabilities, according to the language mark of the preprocessed sound signal and the extracted voice characteristic parameters;
the emotion post-processing module acquires the emotion recognition result of each sentence of the sound signal and adjusts the results according to a preset mode to obtain the emotion recognition result of the whole sound signal.
Because the language of the sound signal is identified in advance, the emotion of the sound can be recognized using language-specific sound characteristics, which increases the accuracy of emotion recognition of the sound signal. In addition, the emotion post-processing module recognizes the emotion of a passage of sound as a whole, which further increases the accuracy of emotion recognition.
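To make the data flow between the five modules concrete, the following is a minimal Python sketch of the pipeline. It is an illustration only: the function names, the stubbed stage implementations, the toy features, and the 30-second segment limit are assumptions of this sketch, not details disclosed by the patent.

```python
import numpy as np

def read_sound(signal, sr, max_seconds=30.0):
    """Sound recording module: read the signal, splitting over-long input."""
    step = int(max_seconds * sr)
    return [signal[i:i + step] for i in range(0, len(signal), step)]

def preprocess(segment):
    """Sound preprocessing module: language identification and sentence
    division (stubbed: the whole segment is one sentence marked 'zh')."""
    return [{"audio": segment, "language": "zh"}]

def extract_features(audio, language):
    """Voice processing module: language-dependent feature extraction
    (stubbed with two toy statistics of the waveform)."""
    return np.array([np.mean(np.abs(audio)), np.std(audio)])

def classify_emotion(features, language):
    """Emotion processing module: per-sentence probabilities (stubbed)."""
    return {"neutral": 0.6, "happy": 0.3, "angry": 0.1}

def post_process(per_sentence):
    """Emotion post-processing module: whole-passage result (stubbed as the
    average of the per-sentence probabilities)."""
    return {e: sum(p[e] for p in per_sentence) / len(per_sentence)
            for e in per_sentence[0]}

def recognize(signal, sr=16000):
    results = []
    for segment in read_sound(signal, sr):
        sentences = preprocess(segment)
        probs = [classify_emotion(extract_features(s["audio"], s["language"]),
                                  s["language"])
                 for s in sentences]
        results.append(post_process(probs))
    return results

print(recognize(np.random.default_rng(0).normal(size=16000 * 5)))
```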
In an embodiment of the present invention, the voice characteristic parameters further include:
prosodic features, including rising tone, falling tone, heavy accent, and light accent. Combining prosodic features makes the voice characteristics more comprehensive, so that higher-accuracy emotion recognition is easier to achieve.
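The patent does not say how these prosodic marks are extracted. One plausible approach, sketched below, fits a straight line to the pitch contour produced by librosa's pyin tracker and reads rising or falling tone off its slope, with RMS energy as a crude stand-in for heavy versus light accent; the thresholds and the mapping are assumptions, not the patent's method.

```python
import numpy as np
import librosa

def prosody_marks(y, sr, fmin=65.0, fmax=400.0):
    """Rough rising/falling-tone and accent marks for one sentence."""
    f0, voiced, _ = librosa.pyin(y, fmin=fmin, fmax=fmax, sr=sr)
    f0 = f0[voiced & ~np.isnan(f0)]            # voiced frames with pitch
    if len(f0) < 2:
        return {"contour": "flat", "energy": 0.0}
    # Slope of a line fitted to the pitch contour:
    # clearly positive -> rising tone, clearly negative -> falling tone.
    slope = np.polyfit(np.arange(len(f0)), f0, 1)[0]
    contour = "rising" if slope > 0.1 else "falling" if slope < -0.1 else "flat"
    energy = float(np.sqrt(np.mean(y ** 2)))   # proxy for accent weight
    return {"contour": contour, "energy": energy}

# A one-second tone whose pitch glides upward should be marked "rising".
sr = 16000
t = np.linspace(0, 1, sr, endpoint=False)
print(prosody_marks(np.sin(2 * np.pi * (120 + 80 * t) * t), sr))
```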
In an embodiment of the present invention, the emotion recognition result for each sentence is calculated using principal component analysis, a Gaussian mixture model, or a hidden Markov model. Each of these methods directly yields an emotion recognition result expressed as probabilities, which makes it convenient for the emotion post-processing module to recognize the emotion of a passage of sound as a whole, increasing the accuracy of emotion recognition of the sound signal.
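As an illustration of the Gaussian-mixture variant, the sketch below fits one GMM per emotion and normalizes the per-sentence likelihoods into probabilities under an assumed uniform prior; the emotion set, feature dimensionality, and training data are invented for the example.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

EMOTIONS = ["neutral", "happy", "angry", "sad"]   # assumed emotion set

def train_models(features_by_emotion, n_components=2):
    """Fit one GMM per emotion on that emotion's training feature vectors."""
    return {e: GaussianMixture(n_components=n_components,
                               covariance_type="diag",
                               random_state=0).fit(X)
            for e, X in features_by_emotion.items()}

def sentence_probabilities(models, x):
    """Per-sentence emotion probabilities from per-model likelihoods,
    assuming a uniform prior over emotions."""
    log_lik = np.array([models[e].score_samples(x[None, :])[0]
                        for e in EMOTIONS])
    p = np.exp(log_lik - log_lik.max())       # subtract max for stability
    return dict(zip(EMOTIONS, p / p.sum()))

# Synthetic 12-dimensional "voice characteristic parameters" per emotion.
rng = np.random.default_rng(0)
train = {e: rng.normal(loc=i, scale=1.0, size=(200, 12))
         for i, e in enumerate(EMOTIONS)}
models = train_models(train)
print(sentence_probabilities(models, rng.normal(loc=1.0, scale=1.0, size=12)))
```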
In an embodiment of the present invention, the emotion recognition result for each sentence is obtained by:
calculating the distance between each sentence and each emotion using a vector-segmented Mahalanobis distance discrimination method;
converting the distance values into probability values according to a preset method, such that the smaller the distance, the greater the probability, and the probabilities sum to 1;
taking the probabilities as the emotion recognition result of the sentence.
In this way an emotion recognition result expressed as probabilities is obtained, which makes it convenient for the emotion post-processing module to recognize the emotion of a passage of sound as a whole, increasing the accuracy of emotion recognition of the sound signal.
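The patent fixes only the two constraints on the conversion (smaller distance gives larger probability; the probabilities sum to 1), not the formula itself. The sketch below uses SciPy's Mahalanobis distance and a simple inverse-distance normalization as one choice that satisfies both constraints; the per-emotion means and covariances are assumed.

```python
import numpy as np
from scipy.spatial.distance import mahalanobis

def emotion_probabilities(x, emotion_stats):
    """Convert per-emotion Mahalanobis distances to probabilities via
    inverse-distance normalization (one choice meeting the constraints)."""
    dists = {}
    for emotion, (mean, cov) in emotion_stats.items():
        vi = np.linalg.inv(cov)               # inverse covariance matrix
        dists[emotion] = mahalanobis(x, mean, vi)
    inv = {e: 1.0 / max(d, 1e-12) for e, d in dists.items()}
    total = sum(inv.values())
    return {e: v / total for e, v in inv.items()}

# Example: two emotions with assumed 2-D feature statistics.
stats = {
    "calm":  (np.array([0.0, 0.0]), np.eye(2)),
    "angry": (np.array([3.0, 3.0]), np.eye(2)),
}
print(emotion_probabilities(np.array([0.5, 0.2]), stats))
```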
In an embodiment of the present invention, adjusting the emotion recognition results according to a preset mode includes:
calculating, according to a first formula, the composite probability of each candidate adjustment of the emotion recognition results, and selecting the adjustment scheme with the highest composite probability, wherein the first formula is:
P = K(θ)·α^(n-i)·(1-α)^i
wherein K(θ) is a preset monotonically decreasing function, obtained through sample statistics, that gives the probability that a sound signal contains θ emotions; θ is the number of emotions contained in the sound signal; α is the accuracy of emotion recognition for a single sentence; n is the number of sentences contained in the sound signal; and i is the number of sentences whose emotion recognition results are adjusted.
In this way the emotion post-processing module recognizes the emotion of a passage of sound as a whole, weighing the probability that the emotion changes within the passage against the probability that individual sentences were misrecognized, which further increases the accuracy of emotion recognition of the sound signal.
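A brute-force reading of this selection step is sketched below: enumerate candidate labelings, score each with P = K(θ)·α^(n-i)·(1-α)^i, and keep the best. The K(θ) table and the α value are invented for illustration, and the exhaustive search is only practical for short passages (the length limit on sound segments, described later, keeps n small).

```python
import itertools

EMOTIONS = ["neutral", "happy", "angry", "sad"]

# Assumed monotonically decreasing K(theta): the probability that a passage
# contains theta distinct emotions. The patent obtains K from sample
# statistics; these values are invented for illustration.
K = {1: 0.70, 2: 0.20, 3: 0.07, 4: 0.03}

def adjust(labels, alpha=0.7):
    """Exhaustively score candidate labelings with
    P = K(theta) * alpha**(n - i) * (1 - alpha)**i
    (i = sentences changed, theta = distinct emotions) and keep the best.
    Exponential in n, so only practical for short passages."""
    n = len(labels)
    best, best_p = list(labels), -1.0
    for cand in itertools.product(EMOTIONS, repeat=n):
        i = sum(a != b for a, b in zip(cand, labels))
        theta = len(set(cand))
        p = K[theta] * alpha ** (n - i) * (1 - alpha) ** i
        if p > best_p:
            best, best_p = list(cand), p
    return best, best_p

# A lone "angry" between "happy" sentences is judged a recognition error:
print(adjust(["happy", "happy", "angry", "happy"]))
# -> (['happy', 'happy', 'happy', 'happy'], ~0.072)
```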
In an embodiment of the present invention, the emotion recognition method based on sound characteristics further includes:
the voice-to-text module reads the sound signal and converts it into text information;
the text emotion recognition module segments the converted text information into words and looks the words up in an emotion word database, in which words corresponding to different emotions are stored;
when the text information contains words corresponding to a certain emotion and the number of words corresponding to any other emotion is below a preset threshold, the emotion of the sound signal is recognized as that emotion.
Converting the speech to text in this way enables more accurate emotion recognition when the wording has an obvious emotional tendency.
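A minimal sketch of this lookup rule follows. The toy lexicon, the whitespace segmentation, and the tie handling are assumptions: a real system would use a full emotion word database per language and a proper word segmenter (for example, jieba for Chinese).

```python
# Assumed miniature emotion word database; whitespace splitting is a
# stand-in for proper word segmentation.
EMOTION_WORDS = {
    "happy": {"glad", "wonderful", "delighted"},
    "angry": {"furious", "annoying", "outraged"},
    "sad":   {"miserable", "crying", "hopeless"},
}

def text_emotion(text, threshold=1):
    """Return an emotion when its words appear and every other emotion has
    fewer than `threshold` matching words; otherwise return None so the
    caller can fall back to the acoustic recognition result."""
    words = text.lower().split()
    counts = {e: sum(w in vocab for w in words)
              for e, vocab in EMOTION_WORDS.items()}
    top = max(counts, key=counts.get)
    if counts[top] > 0 and all(c < threshold
                               for e, c in counts.items() if e != top):
        return top
    return None

print(text_emotion("I am so glad and delighted today"))   # -> happy
```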
In one embodiment of the present invention, the sound recording module reading a sound signal includes:
the sound recording module reads a sound signal;
the sound recording module checks the length of the sound signal, and when the length exceeds a preset threshold, segments the signal so that the length of each segment does not exceed the preset threshold.
Limiting the length of each sound segment bounds the amount of computation the emotion post-processing module performs when recognizing the emotion of a segment as a whole, increasing the speed of emotion recognition of the sound signal.
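A sketch of this length check, assuming the signal is an array of samples and a 30-second threshold (the patent leaves the threshold as a preset):

```python
import numpy as np

def split_sound(samples, sr, max_seconds=30.0):
    """Split a sampled signal into segments no longer than max_seconds."""
    max_len = int(max_seconds * sr)
    return [samples[i:i + max_len] for i in range(0, len(samples), max_len)]

# 75 s of audio at 16 kHz -> segments of 30 s, 30 s and 15 s.
segments = split_sound(np.zeros(75 * 16000), sr=16000)
print([len(s) / 16000 for s in segments])   # [30.0, 30.0, 15.0]
```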
The emotion recognition method based on sound characteristics provided by the invention can improve the accuracy of emotion recognition.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (6)

1. A method for emotion recognition based on sound characteristics, comprising:
the sound recording module reads a sound signal;
the sound preprocessing module identifies the language to which the read sound signal belongs and divides the signal into sentences, obtaining a preprocessed sound signal with a language mark;
the voice processing module calculates and extracts voice characteristic parameters according to a preset method and the language mark of the preprocessed sound signal;
the emotion processing module obtains an emotion recognition result for each sentence, expressed as probabilities, according to the language mark of the preprocessed sound signal and the extracted voice characteristic parameters;
the emotion post-processing module acquires the emotion recognition result of each sentence of the sound signal and adjusts the results according to a preset mode to obtain the emotion recognition result of the whole sound signal;
wherein adjusting the emotion recognition results according to the preset mode comprises:
calculating, according to a first formula, the composite probability of each candidate adjustment of the emotion recognition results, and selecting the adjustment scheme with the highest composite probability, wherein the first formula is:
P = K(θ)·α^(n-i)·(1-α)^i
wherein K(θ) is a preset monotonically decreasing function, obtained through sample statistics, that gives the probability that a sound signal contains θ emotions; θ is the number of emotions contained in the sound signal; α is the accuracy of emotion recognition for a single sentence; n is the number of sentences contained in the sound signal; and i is the number of sentences whose emotion recognition results are adjusted.
2. The method of claim 1, wherein the voice characteristic parameters further comprise:
prosodic features, including rising tone, falling tone, heavy accent, and light accent.
3. The method of claim 1, wherein the emotion recognition result for each sentence is calculated using principal component analysis, a Gaussian mixture model, or a hidden Markov model.
4. The method of claim 1, wherein the emotion recognition result for each sentence is obtained by:
calculating the distance between each sentence and each emotion using a vector-segmented Mahalanobis distance discrimination method;
converting the distance values into probability values according to a preset method, such that the smaller the distance, the greater the probability, and the probabilities sum to 1;
taking the probabilities as the emotion recognition result of the sentence.
5. The method of claim 1, further comprising:
the voice-to-text module reads the sound signal and converts it into text information;
the text emotion recognition module segments the converted text information into words and looks the words up in an emotion word database, in which words corresponding to different emotions are stored;
when the text information contains words corresponding to a certain emotion and the number of words corresponding to any other emotion is below a preset threshold, the emotion of the sound signal is recognized as that emotion.
6. The method of claim 1, wherein the sound recording module reading a sound signal comprises:
the sound recording module reads a sound signal;
the sound recording module checks the length of the sound signal, and when the length exceeds a preset threshold, segments the signal so that the length of each segment does not exceed the preset threshold.
CN201710720391.8A 2017-08-21 2017-08-21 Emotion recognition method based on sound characteristics Expired - Fee Related CN107545905B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710720391.8A CN107545905B (en) 2017-08-21 2017-08-21 Emotion recognition method based on sound characteristics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710720391.8A CN107545905B (en) 2017-08-21 2017-08-21 Emotion recognition method based on sound characteristics

Publications (2)

Publication Number Publication Date
CN107545905A CN107545905A (en) 2018-01-05
CN107545905B (en) 2021-01-05

Family

ID=60958751

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710720391.8A Expired - Fee Related CN107545905B (en) 2017-08-21 2017-08-21 Emotion recognition method based on sound characteristics

Country Status (1)

Country Link
CN (1) CN107545905B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108682419A (en) * 2018-03-30 2018-10-19 京东方科技集团股份有限公司 Sound control method and equipment, computer readable storage medium and equipment
CN110660412A (en) * 2018-06-28 2020-01-07 Tcl集团股份有限公司 Emotion guiding method and device and terminal equipment
CN112447170A (en) * 2019-08-29 2021-03-05 北京声智科技有限公司 Security method and device based on sound information and electronic equipment

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102005010285A1 (en) * 2005-03-01 2006-09-07 Deutsche Telekom Ag Speech recognition involves speech recognizer which uses different speech models for linguistic analysis and an emotion recognizer is also present for determining emotional condition of person
CN102142253A (en) * 2010-01-29 2011-08-03 富士通株式会社 Voice emotion identification equipment and method
CN104504027A (en) * 2014-12-12 2015-04-08 北京国双科技有限公司 Method and device for automatically selecting webpage content
CN105320960A (en) * 2015-10-14 2016-02-10 北京航空航天大学 Voting based classification method for cross-language subjective and objective sentiments
CN106297825A (en) * 2016-07-25 2017-01-04 华南理工大学 A kind of speech-emotion recognition method based on integrated degree of depth belief network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Prosodic feature analysis and emotion recognition of multilingual emotional speech; Jiang Xiaoqing et al.; Acta Acustica; 2006-05-31; Vol. 31, No. 3; pp. 217-221 *
A survey of emotional feature analysis and recognition of speech signals; Yu Lingli et al.; Journal of Circuits and Systems; 2007-08-31; Vol. 12, No. 4; pp. 76-84 *

Also Published As

Publication number Publication date
CN107545905A (en) 2018-01-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210105

Termination date: 20210821