CN101751923A - Speech emotion classification method and method for establishing an emotion semantic model therefor - Google Patents


Info

Publication number
CN101751923A
CN101751923A CN200810179497A CN101751923B
Authority
CN
China
Prior art keywords
word
semantics
prosody
emotion
attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN200810179497A
Other languages
Chinese (zh)
Other versions
CN101751923B (en)
Inventor
吴宗宪
李伟铨
林瑞堂
许进顺
朱家德
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute for Information Industry
Original Assignee
Institute for Information Industry
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute for Information Industry filed Critical Institute for Information Industry
Priority to CN2008101794972A
Publication of CN101751923A
Application granted
Publication of CN101751923B
Expired - Fee Related
Anticipated expiration

Abstract

The invention discloses a speech emotion classification method and a method for establishing an emotion semantic model therefor. First, an emotion semantic model is established from the semantic attributes and prosodic attributes contained in the speech signals of an emotion corpus. The semantic and prosodic attributes of each word in a speech signal under test are then extracted and input into the emotion semantic model, which classifies the test speech signal into the corresponding emotion category. The prosodic attributes thereby reinforce the emotional characteristics of each emotion category, improving the accuracy of emotion classification.

Description

Speech emotion classification technique and method for establishing an emotion semantic model therefor
Technical field
The invention relates to an emotion recognition method, and particularly to a speech emotion classification method that combines semantics with prosody, and to a method for establishing an emotion semantic model therefor.
Background technology
In recent years, with the rapid advance of technology, communication between people and intelligent electronic devices is no longer limited to entering commands and receiving textual responses. The human-machine interface of the future will therefore be controlled through the most natural and convenient medium of communication: speech. To make human-machine interfaces more diverse and user-friendly, many researchers and manufacturers have taken up the study of emotion recognition.
Take customer service systems as an example. Shopping through television and the Internet is now increasingly common, and when a product breaks down, most users telephone a customer service center to inquire. If the customer service system can recognize the user's current emotional state, the service staff can soothe the user's emotions as early as possible. The staff can also judge from the recognized emotion whether they can resolve the matter themselves, or whether the call should be forwarded to senior staff, thereby avoiding many unnecessary conflicts. Accordingly, improving the accuracy of emotion recognition is an important topic of current research.
Summary of the invention
The invention provides a method for establishing an emotion semantic model, which constructs the model by combining semantic attributes with prosodic attributes.
The invention further provides a speech emotion classification method that analyzes the semantic attributes of words in combination with prosodic attributes, thereby improving classification accuracy.
The invention proposes a method for establishing an emotion semantic model. First, an emotion corpus is provided, comprising a plurality of speech signals belonging to a plurality of emotion categories. Next, the semantic attribute and prosodic attribute of each word in each speech signal are extracted, where the semantic attribute is obtained by querying a lexical knowledge base and the prosodic attribute is extracted from the speech signal itself. The emotion semantic model is then established from the semantic and prosodic attributes of the speech signals.
In an embodiment of the invention, establishing the emotion semantic model comprises converting each speech signal into a semantic-prosodic vector according to its semantic and prosodic attributes, and substituting these vectors into a Gaussian mixture model to establish the emotion semantic model.
In an embodiment of the invention, converting each speech signal into a semantic-prosodic vector comprises first obtaining a semantic-prosodic record from the semantic and prosodic attributes of each word, then mining emotion rules from the semantic-prosodic records, and finally converting each record into a semantic-prosodic vector according to the mined emotion rules.
In an embodiment of the invention, obtaining the semantic-prosodic record comprises: judging, according to its semantic attribute, whether each word matches a semantic label, where the semantic labels are defined from the lexical knowledge base. When a word matches a semantic label, the label is combined with the word's corresponding prosodic attribute into a semantic-prosodic label, which is written into the semantic-prosodic record. In addition, among the words that match no semantic label, emotional feature words can be identified according to their semantic attributes; each emotional feature word is combined with its corresponding prosodic attribute into a feature set, which is also written into the semantic-prosodic record.
In an embodiment of the invention, the method further comprises defining the semantic labels according to basic emotion rules and the lexical knowledge base, the semantic labels comprising specific semantic labels, negation semantic labels, and transition semantic labels.
In an embodiment of the invention, converting the semantic-prosodic record into a semantic-prosodic vector comprises first calculating, according to the emotion rules, the semantic score and prosodic score of the emotional feature words in the record, and then obtaining from these scores the value of each speech signal's record in each dimension of the semantic-prosodic vector. The dimensionality of the semantic-prosodic vector is determined by the number of emotion rules.
In an embodiment of the invention, before the semantic and prosodic attributes of each word are extracted, each speech signal is first converted into a sentence, and the sentence undergoes word segmentation to obtain the words.
In an embodiment of the invention, the lexical knowledge base is HowNet, and the prosodic attributes comprise pitch, energy, and duration.
The invention further proposes a speech emotion classification method. First, an emotion semantic model is established from the semantic and prosodic attributes of the words in a plurality of speech signals, where the semantic attributes are obtained by querying a lexical knowledge base and the prosodic attributes are extracted from the speech signals. Next, the semantic and prosodic attributes of each word in a speech signal under test are extracted. These are then substituted into the emotion semantic model to obtain emotion semantic scores. Finally, the emotion category of the test speech signal is judged from the emotion semantic scores.
In an embodiment of the invention, the speech emotion classification method further comprises detecting an emotionally salient segment in the test speech signal, extracting the prosodic features of that segment, and substituting the prosodic features into an emotion prosodic model to obtain an emotion prosodic score. The emotion category of the test speech signal can then be judged from both the emotion semantic score and the emotion prosodic score.
In an embodiment of the invention, detecting the emotionally salient segment in the test speech signal comprises extracting the pitch contour of the signal, which comprises a plurality of pitch values, detecting a continuous segment in the pitch contour, and taking that continuous segment as the emotionally salient segment.
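The embodiment above does not specify how "continuous" is determined; a minimal sketch, assuming the salient segment is the longest run of voiced (nonzero-pitch) frames, could look like this (the function name is illustrative):

```python
def emotionally_salient_segment(pitch_contour):
    """Return (start, end) indices of the longest continuous voiced run
    (nonzero pitch values) in the pitch contour -- one simple reading of
    the 'continuous segment' detection described above."""
    best = (0, 0)
    start = None
    for i, p in enumerate(list(pitch_contour) + [0.0]):  # sentinel closes the last run
        if p > 0 and start is None:
            start = i                       # a voiced run begins
        elif p <= 0 and start is not None:
            if i - start > best[1] - best[0]:
                best = (start, i)           # keep the longest run so far
            start = None
    return best
```

A production system would likely also smooth the pitch contour and bridge short unvoiced gaps before picking the segment.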
Based on the above, the invention first trains a semantic-prosodic model from the semantic and prosodic attributes of each word, and then uses this model to classify speech emotion. Reinforcing the emotional characteristics of each emotion category through prosodic attributes improves the accuracy of emotion classification.
Description of drawings
To make the above objects, features, and advantages of the invention more apparent, specific embodiments of the invention are described in detail below with reference to the accompanying drawings.
Fig. 1 is a flowchart of the method for establishing an emotion semantic model according to the first embodiment of the invention.
Fig. 2 is a flowchart of the speech emotion classification method according to the first embodiment of the invention.
Fig. 3 is a flowchart of the method for establishing an emotion semantic model according to the second embodiment of the invention.
Fig. 4 is a schematic diagram of the HowNet concept record format according to the second embodiment of the invention.
Fig. 5 is a flowchart of the method for defining semantic labels according to the second embodiment of the invention.
Fig. 6 is a schematic diagram of the basic emotion factors according to the second embodiment of the invention.
Fig. 7 is a schematic diagram of the semantic labels according to the second embodiment of the invention.
Fig. 8A is a schematic diagram of the additional attributes according to the second embodiment of the invention.
Fig. 8B is a schematic diagram of the additional attribute weight table according to the second embodiment of the invention.
Fig. 9 is a schematic diagram of the prosodic attribute weight table according to the second embodiment of the invention.
Fig. 10 is a flowchart of the speech emotion classification method according to the third embodiment of the invention.
Description of the main element symbols:
S105~S115: steps of the method for establishing the emotion semantic model of the first embodiment of the invention
S205~S220: steps of the speech emotion classification method of the first embodiment of the invention
S310~S365: steps of the method for establishing the emotion semantic model of the second embodiment of the invention
S505~S510: steps of the method for defining semantic labels of the second embodiment of the invention
S1010~S1030: steps of the speech emotion classification method of the third embodiment of the invention
301: HowNet
302, 1002: semantic label database
303, 1003: prosodic attribute database
304, 1004: emotion rules
305: additional attribute weight table
306: prosodic attribute weight table
1001: acoustic model
1005: emotion semantic model
1006: emotion prosodic model
Embodiment
Unlike conventional approaches that classify emotion using only emotion keywords, the following embodiments further analyze the semantics of the text and apply it to emotion classification. To make the content of the invention clearer, embodiments by which the invention can indeed be implemented are given below as examples.
First embodiment
Fig. 1 is a flowchart of the method for establishing an emotion semantic model according to the first embodiment of the invention. Referring to Fig. 1, in step S105 an emotion corpus comprising a plurality of speech signals is provided. Before the emotion semantic model is established, speech signals of multiple emotion categories are collected: for example, a number of different speakers can be recorded for the four emotion categories of anger, sadness, happiness, and neutrality to build the emotion corpus.
Next, in step S110, the semantic attribute and prosodic attribute of each word in each speech signal are extracted to serve as the feature values for classification. For example, each speech signal is first converted into a sentence, and the sentence undergoes word segmentation to be cut into a plurality of words. The semantic attributes of these words are then queried in a lexical knowledge base, and their prosodic attributes (for example pitch, energy, and duration) are extracted from the speech signal.
Finally, in step S115, the emotion semantic model is established from the semantic and prosodic attributes. For example, each speech signal is converted into a semantic-prosodic vector according to its semantic and prosodic attributes, and the emotion semantic model is then trained from these vectors; that is, the extracted emotional features, such as semantic and prosodic attributes, are generalized by a classification technique.
Generally speaking, classification techniques include the support vector machine (SVM), neural network (NN), hidden Markov model (HMM), and Gaussian mixture model (GMM). In these techniques, training is typically performed on vectors in a feature space.
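As a rough sketch of per-category training and scoring, the following pure-Python example fits a single diagonal Gaussian to each emotion category's vectors and classifies by maximum likelihood. This is a one-component simplification of the Gaussian mixture model named above, not the patent's actual implementation; all names are illustrative.

```python
import math

def fit_gaussian(vectors):
    """Fit a diagonal Gaussian (per-dimension mean and variance) to one category."""
    n, d = len(vectors), len(vectors[0])
    mean = [sum(v[i] for v in vectors) / n for i in range(d)]
    var = [max(sum((v[i] - mean[i]) ** 2 for v in vectors) / n, 1e-6)
           for i in range(d)]                      # floor avoids zero variance
    return mean, var

def log_likelihood(x, model):
    """Diagonal-Gaussian log-likelihood of vector x under one category model."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * var[i]) + (x[i] - mean[i]) ** 2 / var[i])
               for i in range(len(x)))

def classify(x, models):
    """Return the emotion category whose model scores x highest."""
    return max(models, key=lambda c: log_likelihood(x, models[c]))
```

A real implementation would fit a multi-component mixture with EM rather than a single Gaussian per category.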
For instance, a semantic-prosodic record of a sentence can be obtained from the semantic and prosodic attributes of each word. Emotion rules for each emotion category are then mined from all the semantic-prosodic records obtained from the emotion corpus, and each semantic-prosodic record is converted into a semantic-prosodic vector according to these emotion rules.
Here, a plurality of semantic labels can be defined in advance using the lexical knowledge base. After the semantic labels are defined, the prosodic attributes in the speech signal are added, extending each semantic label into a semantic-prosodic label. The prosodic attributes thereby reinforce the emotional characteristics of each emotion category.
More specifically, after the semantic attribute of each word is extracted, it is judged whether that semantic attribute matches a semantic label. When a word matches a semantic label, the label is combined with the word's corresponding prosodic attribute into a semantic-prosodic label, which is then written into the semantic-prosodic record of the sentence. On the other hand, among the words that match no semantic label, emotional feature words can additionally be identified according to their semantic attributes; each is combined with its corresponding prosodic attribute into a feature set, which is also written into the semantic-prosodic record. The semantic-prosodic records are then converted into semantic-prosodic vectors according to the automatically mined emotion rules, and the emotion semantic model is trained from the spatial relations of these vectors.
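The combination of a semantic label with quantized prosody can be sketched as a small helper. The bracketed string format follows the [negation_PM_EH_DS]-style notation used in the embodiments; the lookup-table shape is an assumption.

```python
def semantic_prosodic_label(word, semantic_labels, prosody):
    """Combine a word's semantic label (if any) with its quantized prosody
    (pitch, energy, duration levels) into a semantic-prosodic label string."""
    label = semantic_labels.get(word)       # e.g. "negation" or "acquire"
    if label is None:
        return None                         # word carries no semantic label
    pitch, energy, duration = prosody       # e.g. ("M", "H", "S")
    return f"[{label}_P{pitch}_E{energy}_D{duration}]"
```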
After the emotion semantic model is established, speech emotion recognition can begin. Another example is given below.
Fig. 2 is a flowchart of the speech emotion classification method according to the first embodiment of the invention. Referring to Fig. 2, in step S205 a speech signal under test is received. After it is received, the test speech signal is converted into a sentence, and the sentence is cut into a plurality of words to be tested.
Next, as shown in step S210, the semantic attributes of these test words are queried in the lexical knowledge base, and their prosodic attributes are extracted from the test speech signal.
Then, in step S215, the semantic and prosodic attributes of each test word are substituted into the emotion semantic model to obtain emotion semantic scores. Since the emotion semantic model has already been established, substituting the semantic and prosodic attributes into it yields the score the test speech signal obtains for each emotion category.
Finally, in step S220, the emotion category of the test speech signal is judged from the emotion semantic scores. Generally speaking, the highest-scoring category represents the final classification result: if the emotion semantic score of the happy category is the highest, the test speech signal belongs to the happy emotion category.
The lexical knowledge base is, for example, HowNet. HowNet takes the concepts represented by Chinese and English words as its objects of description, and reveals the relations between concepts and between the attributes that concepts possess. A further embodiment below takes HowNet as an example to describe in detail the steps of establishing the emotion semantic model.
Second embodiment
Fig. 3 is a flowchart of the method for establishing an emotion semantic model according to the second embodiment of the invention. Referring to Fig. 3, in step S310, HowNet 301 is first queried to extract the semantic attribute of each word in the sentence. HowNet 301 records the concepts of a plurality of words and the relations between these words.
An example illustrates the HowNet concept record format. Fig. 4 is a schematic diagram of the HowNet concept record format according to the second embodiment of the invention. Referring to Fig. 4, in HowNet each word forms a record consisting of its concept and description. Each record mainly comprises function-name variables (including the word, its part of speech, word examples, and concept definition) and data. Take "W_C=beat" as an example: "beat" is the data, and the function-name variable of "beat" is "W_C", which means that "beat" is a word. Likewise, in "G_C=V", "V" is the data and its function-name variable is "G_C", which means that "V" is a part of speech. The rest follow by analogy.
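A record in this "NAME=data" shape can be parsed into a dictionary with a few lines. This is a sketch of the format as described in Fig. 4 (field names W_C, G_C, DEF follow that description); the actual HowNet file layout may differ.

```python
def parse_hownet_record(lines):
    """Parse HowNet-style 'NAME=data' lines into a dict, e.g. W_C (word),
    G_C (part of speech), DEF (concept definition)."""
    record = {}
    for line in lines:
        name, _, data = line.partition("=")  # split on the first '='
        record[name.strip()] = data.strip()
    return record
```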
Returning to Fig. 3, in step S315 the semantic label database 302 is queried to judge whether the semantic attribute matches a semantic label. Here, the semantic labels are defined from the semantic attributes defined in HowNet 301. An example of the steps for defining semantic labels follows.
Fig. 5 is a flowchart of the method for defining semantic labels according to the second embodiment of the invention. Referring to Fig. 5, in step S505 the basic emotion-triggering factors are first specified: with reference to the psychology of emotion, the situations or circumstances under which humans come to feel an emotion are identified. After the basic emotion factors that trigger emotions are summarized, the principal semantics implied by these factors are analyzed.
For instance, Fig. 6 is a schematic diagram of the basic emotion factors according to the second embodiment of the invention, showing the basic emotion factors obtained by this summarization. Here they are, respectively, happy emotion factors, angry emotion factors, and sad emotion factors.
Observing the basic emotion factors, one finds that each involves the expression of certain specific semantics, for example: obtaining some benefit, removing some pressure, losing some benefit, and so on. In these semantic expressions, the action descriptions such as "obtain", "remove", and "lose" are called "action content words", and the parts attached to an action content word to complete the meaning, such as the benefit, pressure, or goal, are called "attached action content words".
Returning to Fig. 5, in order to correctly extract the semantic attributes in the sentences recognized from speech signals, the semantic labels are then defined using the basic emotion rules and HowNet, as in step S510.
For instance, Fig. 7 is a schematic diagram of the semantic labels according to the second embodiment of the invention. Here, the semantic labels comprise specific semantic labels, negation semantic labels, and transition semantic labels. Specific semantic labels are words that express a specific meaning, negation semantic labels are words with a negative meaning, and transition semantic labels are words that mark a turn in tone.
Here, from the verbs in HowNet, the semantic attributes that express a specific meaning are selected and divided into 15 classes, which become the definitions of 15 specific semantic labels.
Take the semantic label [achieve] as an example: words whose HowNet attributes include "Vachieve" (achieve), "fulfil" (realize), "end" (terminate), "finish" (complete), or "succeed" (succeed) are classified into the [achieve] semantic label. For example, "find" and "guess" are both recorded in HowNet as "DEF=Vachieve", so both words are classified into the [achieve] semantic label.
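The attribute-to-label grouping can be sketched as a lookup over a word's DEF field. The mapping below contains only the [achieve] example from the text; in the patent there are 15 such specific labels.

```python
# Attribute sets per specific semantic label; only [achieve] is taken from the
# text, and the remaining 14 labels would be filled in the same way.
LABEL_ATTRIBUTES = {
    "achieve": {"Vachieve", "fulfil", "end", "finish", "succeed"},
}

def semantic_label_for(def_field):
    """Return the specific semantic label matching a word's DEF attributes, if any."""
    attrs = {a.split("|")[0].strip() for a in def_field.split(",")}
    for label, triggers in LABEL_ATTRIBUTES.items():
        if attrs & triggers:                # any trigger attribute present
            return label
    return None
```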
The negation semantic label is defined by directly extracting, from the definitions of all words in HowNet, the words whose definition contains the feature "neg" (negative); these become the definition of the negation label.
The transition semantic label is defined by examining all adverbs and conjunctions in HowNet and extracting the words that carry a transitional tone. In addition, according to the characteristics of transitional expressions, the transition labels are further divided into two kinds: [transition-acquire] and [transition-omit].
Once the semantic labels are defined, they can be used to classify speech emotion.
Returning to Fig. 3, in step S315, when the semantic attribute of a word matches a semantic label, the corresponding semantic label is marked, as shown in step S320. Then, as shown in step S325, the prosodic attribute database 303 is queried to extend the semantic label into a semantic-prosodic label, whereby the prosodic attribute reinforces the emotional characteristics of each emotion category. Next, in step S340, the semantic-prosodic label is written into the semantic-prosodic record of the sentence.
For instance, after the speech signal is converted into a sentence and segmented into words, the prosodic attributes of each word's segment in the speech signal can be extracted and stored in the prosodic attribute database 303. Here the prosodic attributes comprise pitch, energy, and duration, and each can be quantized into three levels: pitch and energy into high (H), middle (M), and low (L); duration into long (L), middle (M), and short (S).
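The three-level quantization can be sketched with simple thresholding. The patent states the three-level scheme but not the cut points, so the thresholds below (Hz, normalized energy, seconds) are purely illustrative assumptions.

```python
def quantize(value, low_threshold, high_threshold, levels=("L", "M", "H")):
    """Quantize a raw prosodic value into one of three levels."""
    if value < low_threshold:
        return levels[0]
    if value < high_threshold:
        return levels[1]
    return levels[2]

def quantize_prosody(pitch_hz, energy, duration_s):
    """Quantize pitch/energy into L/M/H and duration into S/M/L.
    Thresholds are illustrative, not from the patent."""
    return (quantize(pitch_hz, 150, 250),
            quantize(energy, 0.3, 0.7),
            quantize(duration_s, 0.2, 0.5, levels=("S", "M", "L")))
```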
Returning to step S315, when the semantic attribute of a word matches no semantic label, the emotional feature words are extracted from among these words, as shown in step S330. Then, as shown in step S335, each emotional feature word is combined with its corresponding prosodic attribute into a feature set. In step S340, the feature set is written into the semantic-prosodic record of the sentence.
That is to say, besides the words marked with semantic labels, the other words not so marked (for example, adjectives or nouns) have their emotional feature words looked up in HowNet 301, and these emotional feature words are added into the semantic-prosodic record, so that the record finally contains the semantic features of the complete sentence.
For instance, suppose the sentence converted from the speech signal is "I was almost out of money to live on; luckily I got a bit of money today." After steps S310~S325, the semantic-prosodic label obtained for the negation word ("out of") is [negation_PM_EH_DS], that for "luckily" is [transition-acquire_PL_EM_DL], and that for "got" is [acquire_PH_EH_DM]; the other words carry no semantic label.
Next, among the other words bearing no semantic label, emotional feature words are found according to their semantic attributes. After steps S330~S335, the feature set obtained for "money" is [wealth_PH_EM_DS], and that for "a bit" is [few_PL_EM_DM].
Accordingly, the semantic-prosodic record obtained is {[negation_PM_EH_DS], [transition-acquire_PL_EM_DL], [acquire_PH_EH_DM], [few_PL_EM_DM], [wealth_PH_EM_DS]}.
It should be noted that in this embodiment only the semantic-prosodic labels after [transition-acquire] are retained, so the semantic-prosodic record finally obtained is {[acquire_PH_EH_DM], [few_PL_EM_DM], [wealth_PH_EM_DS]}.
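The retention behavior in this example can be sketched as follows. The patent shows only this single example, so the assumption here, that everything after the last [transition-acquire] marker is kept and everything before it is dropped, is a best-effort reading.

```python
def apply_transition_rule(record):
    """Keep only the labels after the last [transition-acquire] marker,
    following the embodiment's example. Labels are plain strings such as
    '[acquire_PH_EH_DM]'."""
    for i in range(len(record) - 1, -1, -1):
        if record[i].startswith("[transition-acquire"):
            return record[i + 1:]           # drop the marker and what precedes it
    return record                           # no transition marker: keep everything
```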
After the recognized sentence has had its semantic and prosodic attributes extracted, it becomes a semantic-prosodic record. Data mining techniques are then used to automatically mine the emotion rules 304 from all the semantic-prosodic records.
Since the marking procedure for semantic-prosodic labels is centered on the marked semantic-prosodic labels, the desired emotion rules take the form T → D. Here T denotes a semantic-prosodic label, for example [achieve_PH_EM_DS] or [remove_PM_EH_DM], and D is the attached content word of the action, obtained by using HowNet 301 to extract the principal emotional feature word and combining it with its prosodic attribute into a feature set, such as [symbol_PM_EH_DM]. Both T and D may be one or more, so rules such as T1^T2 → D1 or T3 → D2^D3 are all possible.
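The patent does not name the mining algorithm, so as a minimal stand-in, the following sketch mines single-antecedent T → D rules by counting how often a label T co-occurs with a feature set D across a category's records and keeping pairs above a support threshold.

```python
from collections import Counter
from itertools import product

def mine_rules(records, min_support=2):
    """Mine simple T -> D emotion rules by co-occurrence counting.
    Each record is a pair (T_labels, D_feature_sets) of string lists."""
    counts = Counter()
    for labels, features in records:
        for t, d in product(labels, features):
            counts[(t, d)] += 1             # count each T/D co-occurrence
    return [(t, d) for (t, d), c in counts.items() if c >= min_support]
```

A fuller implementation would mine multi-item antecedents and consequents (T1^T2 → D1, etc.), e.g. with an Apriori-style algorithm.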
After the emotion rules 304 are obtained through data mining, in steps S345~S355 each semantic-prosodic record is converted, using the emotion rules 304, into a semantic-prosodic vector representation, in which each emotion rule represents one dimension of the vector space.
Suppose the neutral emotion rules are R^N_1, R^N_2, ..., R^N_{r_N}; the happy emotion rules are R^H_1, R^H_2, ..., R^H_{r_H}; the angry emotion rules are R^A_1, R^A_2, ..., R^A_{r_A}; and the sad emotion rules are R^S_1, R^S_2, ..., R^S_{r_S}. Each semantic-prosodic record is then represented as a semantic-prosodic vector holding one score per rule, i.e. a point in a vector space of dimension r_N + r_H + r_A + r_S.
In step S345, the semantic score of the semantic-prosodic record is calculated according to the emotion rules 304. In detail, when scoring, it is first checked whether the T part of the semantic-prosodic record matches the T part of an emotion rule 304. If it does, the D part is checked further to calculate the score in that dimension.
Because in knowing net 301 definition, word has social strata relation, this social strata relation is when two emotional characteristics speech are different between meaning of one's words rhythm record and mood rule 304, is used for calculating the meaning of one's words similarity of two emotional characteristics speech.In knowing net 301, if the darkest m layer that is divided into of social strata relation, stratum is darker, and it concerns that better promptly the meaning of one's words is more close.So two emotional characteristics speech D between meaning of one's words rhythm record and the mood rule 304 i, D jThe comparison mark be:
v_p(D_i, D_j) = 1, if D_i = D_j;

v_p(D_i, D_j) = (L(D_i, D_j) / m) · (1 / N(L(D_i, D_j)))^(m − L(D_i, D_j) − 1), if D_i ≠ D_j.
Here L(D_i, D_j) is the length of the longest common path of D_i and D_j, N(L(D_i, D_j)) is the number of child nodes under the last node of that common path, and m is the depth of the hierarchy.
After the hierarchy-based score is obtained, if the emotion feature word in the semantic-prosodic record carries additional attributes, a further computation is performed according to the attribute relations: as shown in step S350, the additional-attribute score is computed from the additional-attribute weight table 305. For instance, Fig. 8A is a schematic diagram of the additional attributes according to the second embodiment of the invention, and Fig. 8B is a schematic diagram of the additional-attribute weight table according to the second embodiment of the invention. In Fig. 8A, taking HowNet 301 as an example, eight kinds of additional attributes are defined. In Fig. 8B, each additional attribute is given a weight value according to the relations among the attributes.
Returning to Fig. 4, step S355 then computes the prosodic-attribute score according to the prosodic-attribute weight table 306. For instance, Fig. 9 is a schematic diagram of the prosodic-attribute weight table according to the second embodiment of the invention. In Fig. 9, each prosodic attribute is quantized into three levels: pitch and energy are represented as high (H), medium (M) or low (L), and duration as long (L), medium (M) or short (S). A weight is assigned according to the gap between the quantized levels: H and M are close, so their weight is 0.5; H and L are far apart, so their weight is 0.25; likewise, the weight between M and H is also 0.5, and so on.
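A minimal sketch of the level-gap weighting described above. The concrete values 1.0/0.5/0.25 follow the example in the text; the table layout as a distance lookup is an assumption for illustration.

```python
# Prosodic-attribute weight lookup: identical levels weigh 1.0, levels one
# step apart weigh 0.5, levels two steps apart weigh 0.25.
LEVELS = {"pitch": "HML", "energy": "HML", "duration": "LMS"}

def prosody_weight(attr, a, b):
    """Weight between two quantized levels of a prosodic attribute."""
    scale = LEVELS[attr]
    gap = abs(scale.index(a) - scale.index(b))
    return {0: 1.0, 1: 0.5, 2: 0.25}[gap]

assert prosody_weight("pitch", "H", "M") == 0.5
assert prosody_weight("pitch", "H", "L") == 0.25
assert prosody_weight("duration", "M", "M") == 1.0
```

The same distance-based lookup would serve for the additional-attribute weight table 305, with the eight attributes of Fig. 8A in place of the three levels.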
Afterwards, in step S360, each dimension score of the semantic-prosodic vector is computed according to steps S345~S355 above.
For instance, suppose D_i is [symbol|symbol_PM_EH_DM] and D_j is [language|language_PH_EM_DM]. The path of D_i in HowNet is 1.1.2.5.1.1 and the path of D_j is 1.1.2.5.1, so L(D_i, D_j) is 5 and N(L(D_i, D_j)) is 4; with a hierarchy depth m of 7, the hierarchy score v_p is 5/28. The additional-attribute score v_r is 0.5 and the prosodic-attribute score v_f is 0.5. The dimension score in the semantic-prosodic vector is finally v = v_p × v_r × v_f.
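The worked example above can be reproduced directly from the similarity formula. The hierarchy depth m = 7 and child count N = 4 are the values implied by the example; the path-string encoding is an assumption.

```python
# Hierarchy similarity v_p from HowNet-style dotted paths.
# v_p = 1 if the words coincide, else (L/m) * (1/N)^(m - L - 1).
def vp(path_i, path_j, n_children, m=7):
    li, lj = path_i.split("."), path_j.split(".")
    L = 0
    while L < min(len(li), len(lj)) and li[L] == lj[L]:
        L += 1                      # length of the longest common path
    if path_i == path_j:
        return 1.0
    return (L / m) * (1 / n_children) ** (m - L - 1)

v_p = vp("1.1.2.5.1.1", "1.1.2.5.1", n_children=4)
assert abs(v_p - 5 / 28) < 1e-12    # matches the example in the text
v = v_p * 0.5 * 0.5                 # v = v_p * v_r * v_f
```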
Finally, in step S365, once the semantic-prosodic vectors of every emotion class have been collected, the emotion semantic model of each emotion class can be constructed with a Gaussian mixture model. After the emotion semantic models are built, speech emotion classification can begin, as illustrated in the following embodiment.
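The per-class model fitting can be sketched as follows. The patent specifies a Gaussian mixture model; to keep the sketch dependency-free, a single diagonal Gaussian per emotion class stands in for the mixture, which is a deliberate simplification.

```python
import math

def fit_gaussian(vectors):
    """Fit a diagonal Gaussian to a list of equal-length vectors."""
    dim = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / len(vectors) for d in range(dim)]
    var = [max(sum((v[d] - mean[d]) ** 2 for v in vectors) / len(vectors), 1e-6)
           for d in range(dim)]   # variance floor avoids division by zero
    return mean, var

def log_likelihood(x, model):
    """Log-likelihood of vector x under a diagonal Gaussian model."""
    mean, var = model
    return sum(-0.5 * (math.log(2 * math.pi * var[d])
                       + (x[d] - mean[d]) ** 2 / var[d])
               for d in range(len(x)))
```

In use, one model would be fitted per emotion class over that class's semantic-prosodic vectors, and a test vector scored against each model.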
The third embodiment
Figure 10 is the flowchart of the speech emotion classification method according to the third embodiment of the invention. Referring to Figure 10, in the present embodiment, after a test speech signal is received, textual semantic analysis and prosodic feature extraction are performed on the test speech signal respectively.
In the textual semantic analysis, first, as shown in step S1010, an acoustic model 1001 of a speech recognizer (for example an HMM) converts the test speech signal into a sentence. Then, as shown in step S1015, the sentence is converted into a semantic-prosodic vector using the semantic label database 1002, the prosodic attribute database 1003 and the emotion rules 1004.
Here, step S1015 is the same as or similar to steps S310~S360 of the second embodiment, and the methods of building the semantic label database 1002, the prosodic attribute database 1003 and the emotion rules 1004 are likewise the same as or similar to those of the semantic label database 302, the prosodic attribute database 303 and the emotion rules 304 of the second embodiment, so they are not repeated here.
On the other hand, in the prosodic feature extraction, first, as shown in step S1020, an emotionally salient segment is detected in the test speech signal. To prevent non-emotional segments of the test speech signal from degrading the accuracy of emotion recognition, the emotionally salient segment (Emotionally Salient Segment) is detected first, and prosodic features are extracted only from that segment. To find the emotionally salient segment, the pitch contour (Pitch Contour) of the whole test speech signal is computed first; if a continuous segment exists in the pitch contour, that continuous segment is defined as the emotionally salient segment.
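The salient-segment detection can be sketched as follows, under the assumption that the pitch contour is a per-frame list and that a nonzero pitch value marks a voiced frame (the text does not specify the frame representation).

```python
# Keep the longest run of consecutive voiced frames in the pitch contour
# as the emotionally salient segment.
def salient_segment(pitch_contour):
    best, cur_start, cur_len = (0, 0), 0, 0
    for i, p in enumerate(pitch_contour):
        if p > 0:
            if cur_len == 0:
                cur_start = i
            cur_len += 1
            if cur_len > best[1] - best[0]:
                best = (cur_start, cur_start + cur_len)
        else:
            cur_len = 0
    return best  # (start, end) frame indices, end exclusive

# The second voiced run (3 frames) is longer than the first (2 frames).
assert salient_segment([0, 120, 130, 0, 110, 115, 118, 0]) == (4, 7)
```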
Then, in step S1025, prosodic features are extracted from the emotionally salient segment. The prosodic features comprise the maximum pitch value, minimum pitch value, mean pitch value, pitch variance, maximum energy value, minimum energy value, mean energy value, energy variance, maximum formant value, minimum formant value, mean formant value and formant variance, twelve parameters in total. These twelve parameters are treated as a 12-dimensional prosodic vector.
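Assembling the 12-dimensional prosodic vector is a straightforward computation over the per-frame pitch, energy and formant tracks of the salient segment (the per-frame list representation is an assumption):

```python
# Max, min, mean and variance of each of the three tracks -> 12 dimensions.
def stats(xs):
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [max(xs), min(xs), mean, var]

def prosodic_vector(pitch, energy, formant):
    return stats(pitch) + stats(energy) + stats(formant)

v = prosodic_vector([100, 120], [0.5, 0.7], [800, 900])
assert len(v) == 12
assert v[0] == 120 and v[1] == 100   # max and min pitch
```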
Finally, in step S1030, the emotion class of the test speech signal is decided by combining the emotion semantic model 1005 and the emotion prosodic model 1006 according to Bayes' theorem. In detail, the semantic-prosodic vector is substituted into the emotion semantic model 1005 to obtain an emotion semantic score, and the prosodic vector is substituted into the emotion prosodic model 1006 to obtain an emotion prosodic score. Afterwards, the emotion class with the maximum posterior probability found by Bayes' theorem is the final recognition result.
Here, the emotion prosodic model 1006 is likewise built with a Gaussian mixture model: the twelve parameters above are extracted from each speech signal in the emotion corpus as a prosodic vector, and these prosodic vectors are used to train the emotion prosodic model 1006.
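The Bayesian combination in step S1030 can be sketched as follows, treating the two model scores as class-conditional log-likelihoods, adding log priors, and picking the class with the largest posterior. Equal class priors and the concrete score values are assumptions for illustration.

```python
import math

def classify(semantic_scores, prosodic_scores, priors=None):
    """Pick the emotion with the largest combined log posterior."""
    emotions = list(semantic_scores)
    priors = priors or {e: 1 / len(emotions) for e in emotions}
    return max(emotions,
               key=lambda e: semantic_scores[e] + prosodic_scores[e]
                             + math.log(priors[e]))

scores_s = {"neutral": -3.0, "happy": -1.0, "angry": -4.0, "sad": -5.0}
scores_p = {"neutral": -2.0, "happy": -2.5, "angry": -1.0, "sad": -4.0}
assert classify(scores_s, scores_p) == "happy"
```

Summing log-likelihoods corresponds to assuming the semantic and prosodic evidence are conditionally independent given the emotion class, which is the usual way two separately trained models are fused under Bayes' theorem.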
In summary, in the above embodiments, the semantic attribute of each word is analyzed and further combined with prosodic attributes, thereby improving the accuracy of emotion classification. In addition, the analysis can be restricted to the emotionally salient segment of the speech signal, so that non-emotional segments do not affect the accuracy of emotion classification.
Although the present invention is disclosed above by way of preferred embodiments, they are not intended to limit the invention. Any person skilled in the art may make minor modifications and refinements without departing from the spirit and scope of the invention, so the protection scope of the invention shall be defined by the appended claims.

Claims (24)

1. the method for building up of a mood semanteme model comprises:
One mood corpus is provided, and it comprises a plurality of voice signals that belong to a plurality of mood classifications;
Capture an a plurality of words meaning of one's words attribute and the rhythm attribute separately in those voice signals respectively, wherein this meaning of one's words attribute is to obtain according to a lexical knowledge bank, and this rhythm attribute is obtained by each those voice signal; And
By those voice signals this meaning of one's words attribute and this rhythm attribute separately, set up this mood semanteme model.
2. the method for building up of mood semanteme model as claimed in claim 1 is characterized in that, by those voice signals this meaning of one's words attribute and this rhythm attribute separately, sets up the step of this mood semanteme model, comprising:
According to this meaning of one's words attribute and this rhythm attribute, each those voice signal of conversion are a meaning of one's words rhythm vector respectively, with by those voice signals this meaning of one's words rhythm vector separately, set up this mood semanteme model.
3. the method for building up of mood semanteme model as claimed in claim 2 is characterized in that, by those voice signals this meaning of one's words rhythm vector separately, sets up the step of this mood semanteme model, comprising:
With those voice signals this meaning of one's words rhythm vector substitution one gauss hybrid models separately, to set up this mood semanteme model.
4. the method for building up of mood semanteme model as claimed in claim 2 is characterized in that, according to this meaning of one's words attribute and this rhythm attribute, each those voice signal of conversion are the step of this meaning of one's words rhythm vector respectively, comprising:
According to those words this meaning of one's words attribute and this rhythm attribute separately, obtain a meaning of one's words rhythm record;
By this meaning of one's words rhythm record, prospect a mood rule; And
According to this mood rule, this meaning of one's words rhythm record is converted to this meaning of one's words rhythm vector.
5. the method for building up of mood semanteme model as claimed in claim 4 is characterized in that, according to those words this meaning of one's words attribute and this rhythm attribute separately, obtains the step of this meaning of one's words rhythm record, comprising:
According to this meaning of one's words attribute, judge whether each those word belongs to a meaning of one's words label, and wherein this meaning of one's words label is to define and get according to this lexical knowledge bank; And
When one of them belongs to this meaning of one's words label when those words, be a meaning of one's words rhythm label in conjunction with this meaning of one's words label this rhythm attribute pairing with it, with this meaning of one's words rhythm tag record to this meaning of one's words rhythm record.
6. the method for building up of mood semanteme model as claimed in claim 5 is characterized in that, more comprises:
According to a basic emotion factor and this lexical knowledge bank, stipulate this meaning of one's words label, wherein this meaning of one's words label comprises that a specific meaning of one's words label, is negated a meaning of one's words label and a turnover meaning of one's words label.
7. the method for building up of mood semanteme model as claimed in claim 5 is characterized in that, according to those words this meaning of one's words attribute and this rhythm attribute separately, obtains the step of this meaning of one's words rhythm record, more comprises:
According to this meaning of one's words attribute, judge in not belonging to those words of this meaning of one's words label, whether to comprise an emotional characteristics speech, being a characteristic set, and this characteristic set is recorded to this meaning of one's words rhythm record in conjunction with this emotional characteristics speech this rhythm attribute corresponding with it.
8. the method for building up of mood semanteme model as claimed in claim 7 is characterized in that, according to this mood rule, this meaning of one's words rhythm is write down the step that is converted to this meaning of one's words rhythm vector, comprising:
According to this mood rule, calculate a meaning of one's words mark and a rhythm mark of the emotional characteristics speech in this meaning of one's words rhythm record; And
According to this meaning of one's words mark and this rhythm mark, obtain those voice signals this meaning of one's words rhythm separately respectively and be recorded in a dimension mark in this meaning of one's words rhythm vector, and the dimension of this meaning of one's words rhythm vector is to determine according to the quantity of this mood rule.
9. the method for building up of mood semanteme model as claimed in claim 1 is characterized in that, before this meaning of one's words attribute and the step of this rhythm attribute separately of those words in capturing those voice signals respectively, more comprises:
Each those voice signal of conversion are a sentence; And
This sentence speech that breaks is handled, and obtained those words.
10. the method for building up of mood semanteme model as claimed in claim 1 is characterized in that, this lexical knowledge bank is for knowing net.
11. the method for building up of mood semanteme model as claimed in claim 1 is characterized in that, this rhythm attribute comprises pitch, energy and the duration of a sound.
12. A speech emotion classification method, comprising:
building an emotion semantic model according to the semantic attribute and the prosodic attribute of each of a plurality of training words in a plurality of speech signals, wherein the semantic attribute is obtained according to a lexical knowledge base and the prosodic attribute is obtained from each speech signal;
receiving a test speech signal;
extracting the semantic attribute and the prosodic attribute of each of a plurality of test words in the test speech signal;
substituting the semantic attributes and prosodic attributes of the test words into the emotion semantic model to obtain an emotion semantic score; and
judging the emotion class of the test speech signal according to the emotion semantic score.
13. The speech emotion classification method as claimed in claim 12, further comprising:
detecting an emotionally salient segment in the test speech signal;
extracting a plurality of prosodic features of the emotionally salient segment; and
substituting the prosodic features into an emotion prosodic model to obtain an emotion prosodic score.
14. The speech emotion classification method as claimed in claim 13, wherein the step of judging the emotion class of the test speech signal according to the emotion semantic score further comprises:
judging the emotion class of the test speech signal according to the emotion semantic score and the emotion prosodic score.
15. The speech emotion classification method as claimed in claim 13, wherein the step of detecting the emotionally salient segment in the test speech signal comprises:
extracting a pitch contour of the test speech signal; and
detecting a continuous segment in the pitch contour and taking the continuous segment in the pitch contour as the emotionally salient segment.
16. The speech emotion classification method as claimed in claim 12, wherein the step of substituting the semantic attributes and prosodic attributes of the test words into the emotion semantic model comprises:
converting the test speech signal into a semantic-prosodic vector according to the semantic attribute and the prosodic attribute; and
substituting the semantic-prosodic vector into the emotion semantic model.
17. The speech emotion classification method as claimed in claim 16, wherein the step of converting the test speech signal into the semantic-prosodic vector according to the semantic attribute and the prosodic attribute comprises:
obtaining a semantic-prosodic record according to the semantic attributes and prosodic attributes of the test words; and
converting the semantic-prosodic record into the semantic-prosodic vector according to an emotion rule.
18. The speech emotion classification method as claimed in claim 17, wherein the step of obtaining the semantic-prosodic record according to the semantic attributes and prosodic attributes of the test words comprises:
judging, according to the semantic attribute, whether each test word belongs to a semantic label, wherein the semantic label is defined according to the lexical knowledge base;
when one of the test words belongs to the semantic label, combining the semantic label with its corresponding prosodic attribute into a semantic-prosodic label, and recording the semantic-prosodic label in the semantic-prosodic record; and
judging, according to the semantic attribute, whether any test word not belonging to the semantic label is an emotion feature word, combining the emotion feature word with its corresponding prosodic attribute into a feature set, and recording the feature set in the semantic-prosodic record.
19. The speech emotion classification method as claimed in claim 18, wherein the semantic label comprises a specific semantic label, a negation semantic label and an adversative semantic label.
20. The speech emotion classification method as claimed in claim 18, wherein the step of converting the semantic-prosodic record into the semantic-prosodic vector according to the emotion rule comprises:
computing a semantic score and a prosodic score of the emotion feature word in the semantic-prosodic record according to the emotion rule; and
obtaining, from the semantic score and the prosodic score, a dimension score of the semantic-prosodic record in the semantic-prosodic vector, wherein the dimension of the semantic-prosodic vector is determined by the number of emotion rules.
21. The speech emotion classification method as claimed in claim 12, wherein the step of building the emotion semantic model comprises:
building the emotion semantic model according to a Gaussian mixture model.
22. The speech emotion classification method as claimed in claim 12, further comprising, before the step of extracting the semantic attribute and the prosodic attribute of each of the test words in the test speech signal:
converting the test speech signal into a sentence; and
performing word segmentation on the sentence to obtain the test words.
23. The speech emotion classification method as claimed in claim 12, wherein the lexical knowledge base is HowNet.
24. The speech emotion classification method as claimed in claim 12, wherein the prosodic attribute comprises pitch, energy and duration.
CN2008101794972A 2008-12-03 2008-12-03 Voice mood sorting method and establishing method for mood semanteme model thereof Expired - Fee Related CN101751923B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008101794972A CN101751923B (en) 2008-12-03 2008-12-03 Voice mood sorting method and establishing method for mood semanteme model thereof

Publications (2)

Publication Number Publication Date
CN101751923A true CN101751923A (en) 2010-06-23
CN101751923B CN101751923B (en) 2012-04-18

Family

ID=42478794

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008101794972A Expired - Fee Related CN101751923B (en) 2008-12-03 2008-12-03 Voice mood sorting method and establishing method for mood semanteme model thereof

Country Status (1)

Country Link
CN (1) CN101751923B (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102637433A (en) * 2011-02-09 2012-08-15 富士通株式会社 Method and system for identifying affective state loaded in voice signal
CN103024521A (en) * 2012-12-27 2013-04-03 深圳Tcl新技术有限公司 Program screening method, program screening system and television with program screening system
CN103390409A (en) * 2012-05-11 2013-11-13 鸿富锦精密工业(深圳)有限公司 Electronic device and method for sensing pornographic voice bands
CN103634472A (en) * 2013-12-06 2014-03-12 惠州Tcl移动通信有限公司 Method, system and mobile phone for judging mood and character of user according to call voice
CN104700829A (en) * 2015-03-30 2015-06-10 中南民族大学 System and method for recognizing voice emotion of animal
CN105355193A (en) * 2015-10-30 2016-02-24 百度在线网络技术(北京)有限公司 Speech synthesis method and device
CN106294845A (en) * 2016-08-19 2017-01-04 清华大学 The many emotions sorting technique extracted based on weight study and multiple features and device
CN106910513A (en) * 2015-12-22 2017-06-30 微软技术许可有限责任公司 Emotional intelligence chat engine
CN107004428A (en) * 2014-12-01 2017-08-01 雅马哈株式会社 Session evaluating apparatus and method
US9742710B2 (en) 2014-03-27 2017-08-22 Huawei Technologies Co., Ltd. Mood information processing method and apparatus
CN107331388A (en) * 2017-06-15 2017-11-07 重庆柚瓣科技有限公司 A kind of dialect collection system based on endowment robot
CN107516511A (en) * 2016-06-13 2017-12-26 微软技术许可有限责任公司 The Text To Speech learning system of intention assessment and mood
CN108962255A (en) * 2018-06-29 2018-12-07 北京百度网讯科技有限公司 Emotion identification method, apparatus, server and the storage medium of voice conversation
CN110910865A (en) * 2019-11-25 2020-03-24 秒针信息技术有限公司 Voice conversion method and device, storage medium and electronic device
CN111475023A (en) * 2020-04-07 2020-07-31 四川虹美智能科技有限公司 Refrigerator control method and device based on speech emotion recognition
CN111540358A (en) * 2020-04-26 2020-08-14 云知声智能科技股份有限公司 Man-machine interaction method, device, equipment and storage medium

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI602174B (en) * 2016-12-27 2017-10-11 李景峰 Emotion recording and management device, system and method based on voice recognition

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7983910B2 (en) * 2006-03-03 2011-07-19 International Business Machines Corporation Communicating across voice and text channels with emotion preservation

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102637433B (en) * 2011-02-09 2015-11-25 富士通株式会社 The method and system of the affective state carried in recognition of speech signals
CN102637433A (en) * 2011-02-09 2012-08-15 富士通株式会社 Method and system for identifying affective state loaded in voice signal
CN103390409A (en) * 2012-05-11 2013-11-13 鸿富锦精密工业(深圳)有限公司 Electronic device and method for sensing pornographic voice bands
CN103024521B (en) * 2012-12-27 2017-02-08 深圳Tcl新技术有限公司 Program screening method, program screening system and television with program screening system
CN103024521A (en) * 2012-12-27 2013-04-03 深圳Tcl新技术有限公司 Program screening method, program screening system and television with program screening system
CN103634472A (en) * 2013-12-06 2014-03-12 惠州Tcl移动通信有限公司 Method, system and mobile phone for judging mood and character of user according to call voice
US9742710B2 (en) 2014-03-27 2017-08-22 Huawei Technologies Co., Ltd. Mood information processing method and apparatus
CN107004428A (en) * 2014-12-01 2017-08-01 雅马哈株式会社 Session evaluating apparatus and method
CN104700829B (en) * 2015-03-30 2018-05-01 中南民族大学 Animal sounds Emotion identification system and method
CN104700829A (en) * 2015-03-30 2015-06-10 中南民族大学 System and method for recognizing voice emotion of animal
CN105355193B (en) * 2015-10-30 2020-09-25 百度在线网络技术(北京)有限公司 Speech synthesis method and device
CN105355193A (en) * 2015-10-30 2016-02-24 百度在线网络技术(北京)有限公司 Speech synthesis method and device
CN106910513A (en) * 2015-12-22 2017-06-30 微软技术许可有限责任公司 Emotional intelligence chat engine
US11238842B2 (en) 2016-06-13 2022-02-01 Microsoft Technology Licensing, Llc Intent recognition and emotional text-to-speech learning
CN107516511A (en) * 2016-06-13 2017-12-26 微软技术许可有限责任公司 The Text To Speech learning system of intention assessment and mood
CN106294845B (en) * 2016-08-19 2019-08-09 清华大学 The susceptible thread classification method and device extracted based on weight study and multiple features
CN106294845A (en) * 2016-08-19 2017-01-04 清华大学 The many emotions sorting technique extracted based on weight study and multiple features and device
CN107331388A (en) * 2017-06-15 2017-11-07 重庆柚瓣科技有限公司 A kind of dialect collection system based on endowment robot
CN108962255A (en) * 2018-06-29 2018-12-07 北京百度网讯科技有限公司 Emotion identification method, apparatus, server and the storage medium of voice conversation
CN110910865A (en) * 2019-11-25 2020-03-24 秒针信息技术有限公司 Voice conversion method and device, storage medium and electronic device
CN111475023A (en) * 2020-04-07 2020-07-31 四川虹美智能科技有限公司 Refrigerator control method and device based on speech emotion recognition
CN111540358A (en) * 2020-04-26 2020-08-14 云知声智能科技股份有限公司 Man-machine interaction method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN101751923B (en) 2012-04-18

Similar Documents

Publication Publication Date Title
CN101751923B (en) Voice mood sorting method and establishing method for mood semanteme model thereof
CN101710490B (en) Method and device for compensating noise for voice assessment
CN106228980B (en) Data processing method and device
CN110570873B (en) Voiceprint wake-up method and device, computer equipment and storage medium
CN103177733B (en) Standard Chinese suffixation of a nonsyllabic "r" sound voice quality evaluating method and system
CN101261832A (en) Extraction and modeling method for Chinese speech sensibility information
CN108597494A (en) Tone testing method and device
CN108447471A (en) Audio recognition method and speech recognition equipment
CN106294774A (en) User individual data processing method based on dialogue service and device
CN103077720B (en) Speaker identification method and system
CN106782607A (en) Determine hot word grade of fit
CN101930735A (en) Speech emotion recognition equipment and speech emotion recognition method
CN108648759A (en) A kind of method for recognizing sound-groove that text is unrelated
CN109815336A (en) A kind of text polymerization and system
CN102982811A (en) Voice endpoint detection method based on real-time decoding
TWI389100B (en) Method for classifying speech emotion and method for establishing emotional semantic model thereof
CN105810191B (en) Merge the Chinese dialects identification method of prosodic information
CN112259106A (en) Voiceprint recognition method and device, storage medium and computer equipment
CN109471942A (en) Chinese comment sensibility classification method and device based on evidential reasoning rule
CN111724770B (en) Audio keyword identification method for generating confrontation network based on deep convolution
CN109243460A (en) A method of automatically generating news or interrogation record based on the local dialect
CN109272991A (en) Method, apparatus, equipment and the computer readable storage medium of interactive voice
CN110211595A (en) A kind of speaker clustering system based on deep learning
Gong et al. Vocalsound: A dataset for improving human vocal sounds recognition
CN110245216A (en) For the semantic matching method of question answering system, device, equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120418