CN107545905A - Emotion recognition method based on sound characteristics - Google Patents

Emotion recognition method based on sound characteristics

Info

Publication number
CN107545905A
CN107545905A (application CN201710720391.8A)
Authority
CN
China
Prior art keywords
voice signal
emotion recognition
emotion
word
recognition result
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201710720391.8A
Other languages
Chinese (zh)
Other versions
CN107545905B (en)
Inventor
王超 (Wang Chao)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing He Guang Artificial Intelligent Robot Technology Co Ltd
Original Assignee
Beijing He Guang Artificial Intelligent Robot Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing He Guang Artificial Intelligent Robot Technology Co Ltd
Priority to CN201710720391.8A
Publication of CN107545905A
Application granted
Publication of CN107545905B
Status: Expired - Fee Related

Abstract

The invention provides an emotion recognition method based on sound characteristics, including: a sound recording module reads a voice signal; a sound preprocessing module identifies the language of the voice signal and splits the signal into sentences, obtaining a preprocessed, language-tagged voice signal; a sound processing module extracts speech feature parameters from the preprocessed voice signal by a preset method according to its language tag; an emotion processing module obtains an emotion recognition result for each sentence from the language tag and the extracted speech feature parameters, the result being expressed as probabilities; an emotion post-processing module collects the per-sentence emotion recognition results, adjusts them in a preset manner, and obtains the emotion recognition result of the voice signal. The method provided by the invention can improve the accuracy of emotion recognition.

Description

Emotion recognition method based on sound characteristics
Technical field
The present invention relates to the field of emotion recognition, and in particular to an emotion recognition method based on sound characteristics.
Background art
With the development of science and technology, natural language processing, and speech recognition in particular, has been applied in more and more industries, such as mobile phone voice assistants and self-service voice systems. In these services, improving the ability to recognize the emotion in speech is an important way to improve service quality.
Emotion recognition is valuable in other fields as well. Added to a voice communication tool, it can help both parties understand each other's mood in time and so promote communication. In distance education, recognizing the learner's emotion lets the teacher or the system adjust the teaching method and pace, or offer guidance on regulating emotion, when the learner shows anxiety or frustration over difficult or incomprehensible material. In voice navigation, when the speech emotion recognition function detects that the driver is emotionally unstable, the system can issue a reminder or automatically adjust driving parameters to prevent accidents.
Existing emotion recognition methods based on sound characteristics, whether using the vector-separable Mahalanobis distance discriminant method, principal component analysis, neural networks, or hidden Markov models, usually consider only the content of a single utterance. Moreover, because of the differences between language cultures, they tend to perform emotion recognition for a single language only. As a result, the accuracy of speech emotion recognition is not high enough.
Summary of the invention
To solve the above problems, the invention provides an emotion recognition method based on sound characteristics that can improve the accuracy of emotion recognition.
An embodiment of the invention provides an emotion recognition method based on sound characteristics, including:
a sound recording module reads a voice signal;
a sound preprocessing module identifies the language of the voice signal and splits the signal into sentences, obtaining a preprocessed, language-tagged voice signal;
a sound processing module extracts speech feature parameters from the preprocessed voice signal by a preset method according to its language tag;
an emotion processing module obtains an emotion recognition result for each sentence from the language tag of the preprocessed voice signal and the extracted speech feature parameters, the emotion recognition result being expressed as probabilities;
an emotion post-processing module collects the per-sentence emotion recognition results, adjusts them in a preset manner, and obtains the emotion recognition result of the voice signal.
Preferably, the speech features further include:
prosodic features, including rising tone, falling tone, accent, and stress.
Preferably, the per-sentence emotion recognition result is computed using principal component analysis, a Gaussian mixture model, or a hidden Markov model.
Preferably, the per-sentence emotion recognition result is obtained by the following method:
the distance between each sentence and each emotion is computed using the vector-separable Mahalanobis distance discriminant method;
the distance values are converted into probability values by a preset method, such that smaller distances give larger probabilities and all probabilities sum to 1;
these probabilities are taken as the emotion recognition result of the sentence.
Preferably, adjusting the emotion recognition results in a preset manner includes:
computing, according to a first formula, the combined probability of each candidate adjustment of the emotion recognition results, and applying the adjustment with the highest combined probability, the first formula being:
P = K(θ) · α^(n−i) · (1−α)^i
where K(θ) is the probability corresponding to the number of emotions contained in the voice signal, obtained statistically from samples and preset as a monotonically decreasing function; θ is the number of emotions contained in the voice signal; α is the accuracy of per-sentence emotion recognition; n is the number of sentences in the voice signal; and i is the number of sentences whose emotion recognition result is adjusted.
Preferably, the emotion recognition method based on sound characteristics further includes:
a speech-to-text module reads the voice signal and converts it into text;
a text emotion recognition module segments the converted text into words and looks them up in an emotion word database, the emotion word database storing the words corresponding to each emotion;
when the number of words corresponding to a certain emotion exceeds a preset threshold and the number of words corresponding to every other emotion is below the threshold, the emotion of the voice signal is recognized as that emotion.
Preferably, the sound recording module reading the voice signal includes:
the sound recording module reads the voice signal;
the sound recording module checks the length of the voice signal, and when the length exceeds a preset threshold, segments the voice signal so that the length of each segment does not exceed the threshold.
The emotion recognition method based on sound characteristics provided by the invention can improve the accuracy of emotion recognition.
Other features and advantages of the invention will be set forth in the following description, and in part will become apparent from the description or be understood by practicing the invention. The objectives and other advantages of the invention can be realized and obtained by the structures particularly pointed out in the written description, the claims, and the accompanying drawings.
The technical solution of the invention is described in further detail below through the drawings and embodiments.
Brief description of the drawings
The accompanying drawings provide a further understanding of the invention and form a part of the specification; together with the embodiments of the invention, they serve to explain the invention and are not to be construed as limiting it. In the drawings:
Fig. 1 is a flowchart of an emotion recognition method based on sound characteristics in an embodiment of the invention.
Detailed description
The preferred embodiments of the invention are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described here are intended only to illustrate and explain the invention, not to limit it.
An embodiment of the invention provides an emotion recognition method based on sound characteristics, as shown in Fig. 1, including:
a sound recording module reads a voice signal;
a sound preprocessing module identifies the language of the voice signal and splits the signal into sentences, obtaining a preprocessed, language-tagged voice signal;
a sound processing module extracts speech feature parameters from the preprocessed voice signal by a preset method according to its language tag;
an emotion processing module obtains an emotion recognition result for each sentence from the language tag and the extracted speech feature parameters, the result being expressed as probabilities;
an emotion post-processing module collects the per-sentence emotion recognition results, adjusts them in a preset manner, and obtains the emotion recognition result of the voice signal.
By identifying the language of the voice signal in advance, the emotion in the sound can be recognized with sound characteristics specific to that language, which increases the accuracy of emotion recognition of the voice signal. At the same time, by using the emotion post-processing module to treat the emotion of a whole passage of sound as a unit, the accuracy of emotion recognition of the voice signal is further increased.
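To make the data flow concrete, the following is a minimal Python sketch of the pipeline described above. The five callables are hypothetical stand-ins for the patent's sound preprocessing, sound processing, emotion processing, and emotion post-processing modules, not implementations from the patent.

```python
from typing import Callable, Dict, List

def recognize_emotion(
    audio: bytes,
    detect_language: Callable[[bytes], str],
    split_sentences: Callable[[bytes, str], List[bytes]],
    extract_features: Callable[[bytes, str], List[float]],
    classify_sentence: Callable[[List[float], str], Dict[str, float]],
    post_process: Callable[[List[Dict[str, float]]], Dict[str, float]],
) -> Dict[str, float]:
    # Sound preprocessing module: language identification, then sentence split.
    language = detect_language(audio)
    sentences = split_sentences(audio, language)
    # Sound processing + emotion processing modules: per-sentence probability
    # distributions, computed with features chosen according to the language tag.
    per_sentence = [
        classify_sentence(extract_features(s, language), language)
        for s in sentences
    ]
    # Emotion post-processing module: adjust and merge the per-sentence results.
    return post_process(per_sentence)
```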
In one embodiment of the invention, the speech features further include:
prosodic features, including rising tone, falling tone, accent, and stress. Incorporating prosodic features makes the speech features more complete, which makes higher recognition accuracy easier to achieve.
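As one plausible illustration (the patent does not specify an extraction algorithm), the sketch below derives a rising/falling-tone feature from the slope of the pitch contour, assuming librosa's pyin pitch tracker is available.

```python
import numpy as np
import librosa

def tone_direction(path: str) -> str:
    """Classify a sentence's overall pitch movement as rising or falling by the
    sign of a linear fit to its F0 contour - one plausible proxy for the
    rising-tone/falling-tone prosodic features named above, not the patent's method."""
    y, sr = librosa.load(path)
    f0, voiced, _ = librosa.pyin(
        y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C7"), sr=sr
    )
    t = np.arange(len(f0))[voiced]           # frame indices where pitch was detected
    slope = np.polyfit(t, f0[voiced], 1)[0]  # Hz per frame
    return "rising" if slope > 0 else "falling"
```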
In one embodiment of the invention, the per-sentence emotion recognition result is computed using principal component analysis, a Gaussian mixture model, or a hidden Markov model. These methods directly yield emotion recognition results expressed as probabilities, which makes it convenient for the emotion post-processing module to treat the emotion of a whole passage of sound as a unit, increasing the accuracy of emotion recognition of the voice signal.
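For the Gaussian mixture model option, one standard construction fits one GMM per emotion and normalizes the per-class log-likelihoods into a posterior. The sketch below assumes scikit-learn's GaussianMixture; the number of mixture components is an illustrative choice, not a value from the patent.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_gmms(features_by_emotion: dict) -> dict:
    # One GMM per emotion, fitted on that emotion's (n_samples, n_features) array.
    return {
        emotion: GaussianMixture(n_components=4, random_state=0).fit(X)
        for emotion, X in features_by_emotion.items()
    }

def sentence_posterior(gmms: dict, x: np.ndarray) -> dict:
    # Log-likelihood of the sentence's feature vector under each emotion's GMM,
    # normalized with a softmax so the result is a probability distribution.
    emotions = list(gmms)
    loglik = np.array([gmms[e].score_samples(x[None, :])[0] for e in emotions])
    p = np.exp(loglik - loglik.max())
    p /= p.sum()
    return dict(zip(emotions, p))
```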
In one embodiment of the invention, the per-sentence emotion recognition result is obtained by the following method:
the distance between each sentence and each emotion is computed using the vector-separable Mahalanobis distance discriminant method;
the distance values are converted into probability values by a preset method, such that smaller distances give larger probabilities and all probabilities sum to 1;
these probabilities are taken as the emotion recognition result of the sentence.
This method likewise yields emotion recognition results expressed as probabilities, which makes it convenient for the emotion post-processing module to treat the emotion of a whole passage of sound as a unit, increasing the accuracy of emotion recognition of the voice signal.
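The preset distance-to-probability conversion is not fixed by the patent; one common choice that satisfies both stated constraints (smaller distance gives larger probability, and the probabilities sum to 1) is a softmax over negative distances, sketched below with a per-emotion Mahalanobis distance.

```python
import numpy as np

def mahalanobis(x: np.ndarray, mean: np.ndarray, cov_inv: np.ndarray) -> float:
    # Mahalanobis distance between a sentence's feature vector x and an
    # emotion class described by its mean and inverse covariance matrix.
    d = x - mean
    return float(np.sqrt(d @ cov_inv @ d))

def distances_to_probabilities(distances: dict) -> dict:
    # Softmax over negative distances: smaller distance -> larger probability,
    # and all probabilities sum to 1, as the method requires.
    emotions = list(distances)
    neg = -np.array([distances[e] for e in emotions])
    p = np.exp(neg - neg.max())
    p /= p.sum()
    return dict(zip(emotions, p))
```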
In one embodiment of the invention, adjusting the emotion recognition results in a preset manner includes:
computing, according to a first formula, the combined probability of each candidate adjustment of the emotion recognition results, and applying the adjustment with the highest combined probability, the first formula being:
P = K(θ) · α^(n−i) · (1−α)^i
where K(θ) is the probability corresponding to the number of emotions contained in the voice signal, obtained statistically from samples and preset as a monotonically decreasing function; θ is the number of emotions contained in the voice signal; α is the accuracy of per-sentence emotion recognition; n is the number of sentences in the voice signal; and i is the number of sentences whose emotion recognition result is adjusted.
By weighing the probability that the emotion changes within a passage of sound against the probability that individual sentences are misrecognized, the emotion post-processing module treats the emotion of the whole passage as a unit, further increasing the accuracy of emotion recognition of the voice signal.
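As an illustration, here is a minimal sketch of the post-processing search implied by the first formula. The patent does not specify the candidate set of adjustments or the form of K(θ); here the candidates are "keep the per-sentence labels" or "relabel every sentence to one emotion" (θ = 1), and K(θ) = 1/θ is an assumed monotonically decreasing function.

```python
def combined_probability(K, theta: int, alpha: float, n: int, i: int) -> float:
    # The patent's first formula: P = K(theta) * alpha^(n - i) * (1 - alpha)^i.
    return K(theta) * alpha ** (n - i) * (1 - alpha) ** i

def adjust_labels(labels: list, alpha: float = 0.8) -> list:
    """Keep the per-sentence labels or relabel all sentences to one emotion,
    whichever candidate has the highest combined probability. K and the
    candidate set are illustrative assumptions; the patent fixes neither."""
    K = lambda theta: 1.0 / theta  # assumed monotonically decreasing K(theta)
    n = len(labels)
    best_p = combined_probability(K, len(set(labels)), alpha, n, 0)
    best = labels
    for emotion in set(labels):
        i = sum(1 for lab in labels if lab != emotion)  # sentences that would change
        p = combined_probability(K, 1, alpha, n, i)
        if p > best_p:
            best_p, best = p, [emotion] * n
    return best
```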
In one embodiment of the invention, the emotion recognition method based on sound characteristics further includes:
a speech-to-text module reads the voice signal and converts it into text;
a text emotion recognition module segments the converted text into words and looks them up in an emotion word database, the emotion word database storing the words corresponding to each emotion;
when the number of words corresponding to a certain emotion exceeds a preset threshold and the number of words corresponding to every other emotion is below the threshold, the emotion of the voice signal is recognized as that emotion.
By converting the sound to text, recognition can be made more accurate in cases where the wording carries an obvious emotional tendency.
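A minimal sketch of this text rule follows, assuming a toy emotion word database and pre-segmented words; a real system would plug in a speech recognizer and a word segmenter, and the database entries and threshold here are illustrative only.

```python
from typing import Optional

# Illustrative emotion word database; the patent does not give its contents.
EMOTION_WORDS = {
    "happy": {"great", "wonderful", "glad"},
    "angry": {"furious", "annoyed", "hate"},
}

def text_emotion(words: list, threshold: int = 2) -> Optional[str]:
    """Return an emotion when its word count exceeds the threshold and every
    other emotion's count stays below it, as the rule above describes."""
    counts = {
        emotion: sum(1 for w in words if w in vocab)
        for emotion, vocab in EMOTION_WORDS.items()
    }
    above = [e for e, c in counts.items() if c > threshold]
    others_below = all(c < threshold for e, c in counts.items() if e not in above)
    return above[0] if len(above) == 1 and others_below else None
```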
In one embodiment of the invention, the sound recording module reading the voice signal includes:
the sound recording module reads the voice signal;
the sound recording module checks the length of the voice signal, and when the length exceeds a preset threshold, segments the voice signal so that the length of each segment does not exceed the threshold.
Limiting the length of each passage of sound bounds the amount of computation the emotion post-processing module spends treating the emotion of a passage as a unit, which improves the speed of emotion recognition of the voice signal.
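A minimal sketch of the length check and segmentation on a raw sample array; the 30-second threshold is an assumption for illustration, not a value from the patent.

```python
import numpy as np

def segment_signal(samples: np.ndarray, sr: int, max_seconds: float = 30.0) -> list:
    """Split a recording into chunks no longer than max_seconds (an assumed
    threshold), as the recording module step describes."""
    max_len = int(max_seconds * sr)
    if len(samples) <= max_len:
        return [samples]
    return [samples[i:i + max_len] for i in range(0, len(samples), max_len)]
```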
The emotion recognition method based on sound characteristics provided by the invention can improve the accuracy of emotion recognition.
The invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to work in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. If these modifications and variations fall within the scope of the claims of the invention and their technical equivalents, the invention is intended to encompass them as well.

Claims (7)

1. An emotion recognition method based on sound characteristics, characterized by including:
a sound recording module reads a voice signal;
a sound preprocessing module identifies the language of the voice signal and splits the signal into sentences, obtaining a preprocessed, language-tagged voice signal;
a sound processing module extracts speech feature parameters from the preprocessed voice signal by a preset method according to its language tag;
an emotion processing module obtains an emotion recognition result for each sentence from the language tag of the preprocessed voice signal and the extracted speech feature parameters, the emotion recognition result being expressed as probabilities;
an emotion post-processing module collects the per-sentence emotion recognition results, adjusts them in a preset manner, and obtains the emotion recognition result of the voice signal.
2. The method of claim 1, characterized in that the speech features further include:
prosodic features, including rising tone, falling tone, accent, and stress.
3. The method of claim 1, characterized in that the per-sentence emotion recognition result is computed using principal component analysis, a Gaussian mixture model, or a hidden Markov model.
4. The method of claim 1, characterized in that the per-sentence emotion recognition result is obtained by the following method:
the distance between each sentence and each emotion is computed using the vector-separable Mahalanobis distance discriminant method;
the distance values are converted into probability values by a preset method, such that smaller distances give larger probabilities and all probabilities sum to 1;
these probabilities are taken as the emotion recognition result of the sentence.
5. The method of claim 1, characterized in that adjusting the emotion recognition results in a preset manner includes:
computing, according to a first formula, the combined probability of each candidate adjustment of the emotion recognition results, and applying the adjustment with the highest combined probability, the first formula being:
P = K(θ) · α^(n−i) · (1−α)^i
where K(θ) is the probability corresponding to the number of emotions contained in the voice signal, obtained statistically from samples and preset as a monotonically decreasing function; θ is the number of emotions contained in the voice signal; α is the accuracy of per-sentence emotion recognition; n is the number of sentences in the voice signal; and i is the number of sentences whose emotion recognition result is adjusted.
6. The method of claim 1, characterized by further including:
a speech-to-text module reads the voice signal and converts it into text;
a text emotion recognition module segments the converted text into words and looks them up in an emotion word database, the emotion word database storing the words corresponding to each emotion;
when the number of words corresponding to a certain emotion exceeds a preset threshold and the number of words corresponding to every other emotion is below the threshold, the emotion of the voice signal is recognized as that emotion.
7. The method of claim 1, characterized in that the sound recording module reading the voice signal includes:
the sound recording module reads the voice signal;
the sound recording module checks the length of the voice signal, and when the length exceeds a preset threshold, segments the voice signal so that the length of each segment does not exceed the threshold.
CN201710720391.8A 2017-08-21 2017-08-21 Emotion recognition method based on sound characteristics Expired - Fee Related CN107545905B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710720391.8A CN107545905B (en) 2017-08-21 2017-08-21 Emotion recognition method based on sound characteristics

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710720391.8A CN107545905B (en) 2017-08-21 2017-08-21 Emotion recognition method based on sound characteristics

Publications (2)

Publication Number Publication Date
CN107545905A (en) 2018-01-05
CN107545905B (en) 2021-01-05

Family

ID=60958751

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710720391.8A Expired - Fee Related CN107545905B (en) 2017-08-21 2017-08-21 Emotion recognition method based on sound characteristics

Country Status (1)

Country Link
CN (1) CN107545905B (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
DE102005010285A1 (en) * 2005-03-01 2006-09-07 Deutsche Telekom Ag Speech recognition involves speech recognizer which uses different speech models for linguistic analysis and an emotion recognizer is also present for determining emotional condition of person
CN102142253A (en) * 2010-01-29 2011-08-03 Fujitsu Limited Voice emotion identification equipment and method
CN104504027A (en) * 2014-12-12 2015-04-08 Beijing Gridsum Technology Co., Ltd. Method and device for automatically selecting webpage content
CN105320960A (en) * 2015-10-14 2016-02-10 Beihang University Voting based classification method for cross-language subjective and objective sentiments
CN106297825A (en) * 2016-07-25 2017-01-04 South China University of Technology Speech emotion recognition method based on ensemble deep belief networks

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YU LINGLI et al.: "A survey of emotional feature analysis and recognition in speech signals", Journal of Circuits and Systems *
JIANG XIAOQING et al.: "Prosodic feature analysis and emotion recognition of multilingual emotional speech", Acta Acustica *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108682419A (en) * 2018-03-30 2018-10-19 BOE Technology Group Co., Ltd. Voice control method and device, computer-readable storage medium and device
CN110660412A (en) * 2018-06-28 2020-01-07 TCL Corporation Emotion guiding method and device and terminal equipment
CN112447170A (en) * 2019-08-29 2021-03-05 Beijing SoundAI Technology Co., Ltd. Security method and device based on sound information and electronic equipment

Also Published As

Publication number Publication date
CN107545905B (en) 2021-01-05

Similar Documents

Publication Publication Date Title
CN111048062B (en) Speech synthesis method and apparatus
KR20210082153A (en) Method and system for generating synthesis voice for text via user interface
CN109036377A Speech synthesis method and device
US11676572B2 (en) Instantaneous learning in text-to-speech during dialog
CN110914898A (en) System and method for speech recognition
Liu et al. Mongolian text-to-speech system based on deep neural network
Caponetti et al. Biologically inspired emotion recognition from speech
CN107221344A Speech emotion transfer method
CN107545905A Emotion recognition method based on sound characteristics
CN112509550A (en) Speech synthesis model training method, speech synthesis device and electronic equipment
Meng et al. Synthesizing English emphatic speech for multimodal corrective feedback in computer-aided pronunciation training
Laurinčiukaitė et al. Lithuanian Speech Corpus Liepa for development of human-computer interfaces working in voice recognition and synthesis mode
Sheikhan Generation of suprasegmental information for speech using a recurrent neural network and binary gravitational search algorithm for feature selection
US20140074468A1 (en) System and Method for Automatic Prediction of Speech Suitability for Statistical Modeling
Lee et al. Korean dialect identification based on intonation modeling
Wu et al. Generating emphatic speech with hidden Markov model for expressive speech synthesis
CN115359778A Adversarial and meta-learning method based on a speaker emotional speech synthesis model
Krug et al. Articulatory synthesis for data augmentation in phoneme recognition
CN107886938A Speech processing method and device for virtual-reality-guided hypnosis
Gharavian et al. Combined classification method for prosodic stress recognition in Farsi language
Heba et al. Lexical emphasis detection in spoken French using F-Banks and neural networks
Houidhek et al. Dnn-based speech synthesis for arabic: modelling and evaluation
CN117711444B (en) Interaction method, device, equipment and storage medium based on talent expression
James et al. Exploring prosodic features modelling for secondary emotions needed for empathetic speech synthesis
Wusu-Ansah Emotion recognition from speech: An implementation in MATLAB

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20210105

Termination date: 20210821