CN101751923B - Voice emotion classification method and method for establishing an emotional semantic model therefor - Google Patents


Info

Publication number
CN101751923B
CN101751923B (application CN2008101794972A, also written CN200810179497A)
Authority
CN
China
Prior art keywords
words
meaning
rhythm
mood
attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2008101794972A
Other languages
Chinese (zh)
Other versions
CN101751923A (en)
Inventor
吴宗宪
李伟铨
林瑞堂
许进顺
朱家德
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute for Information Industry
Original Assignee
Institute for Information Industry
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute for Information Industry
Priority to CN2008101794972A
Publication of CN101751923A
Application granted
Publication of CN101751923B
Expired - Fee Related
Anticipated expiration

Abstract

The invention discloses a voice emotion classification method and a method for establishing its emotional semantic model. First, an emotional semantic model is established from the semantic attributes and prosodic attributes of the words in the voice signals of an emotion corpus. Next, the semantic attribute and prosodic attribute of each word to be tested are extracted from a voice signal under test and input into the emotional semantic model, which classifies the voice signal under test into the corresponding emotion category. The prosodic attributes thereby reinforce the emotional characteristics of each emotion category and improve the accuracy of emotion classification.

Description

Voice emotion classification method and method for establishing an emotional semantic model therefor
Technical field
The invention relates to an emotion identification method, and more particularly to a voice emotion classification method that combines semantics and prosody, and to a method for establishing its emotional semantic model.
Background technology
In recent years, as technology has advanced by leaps and bounds, communication between people and intelligent electronic devices is no longer limited to the old pattern of typed commands answered in text. In the future, the human-machine interface between people and intelligent electronic devices will be controlled through the most natural and convenient communication medium: voice. To make human-machine interfaces more diverse and human-friendly, many researchers and manufacturers have taken up the study of emotion identification.
Take customer service systems as an example. Shopping by television and over the network is now increasingly common, and when a product breaks down, most users call a customer service center to inquire. If the customer service system could identify the user's current emotional state, the service staff could soothe the user's mood as early as possible. The staff could also judge from the identified emotion whether they can handle the matter themselves, or whether the call should be forwarded to senior service staff for pacification, thereby avoiding many unnecessary conflicts. Accordingly, improving the accuracy of emotion identification is an important part of current research.
Summary of the invention
The present invention provides a method for establishing an emotional semantic model, which constructs the model by combining semantic attributes and prosodic attributes.
The present invention also provides a voice emotion classification method that analyzes the semantic attributes of words in combination with their prosodic attributes, so as to improve classification accuracy.
The proposed method for establishing an emotional semantic model proceeds as follows. First, an emotion corpus is provided, comprising a plurality of voice signals belonging to a plurality of emotion categories. Next, the semantic attribute and the prosodic attribute of each word in the voice signals are extracted, where the semantic attribute is obtained by querying a lexical knowledge base and the prosodic attribute is extracted from each voice signal. The emotional semantic model is then established from the semantic and prosodic attributes of each voice signal.
In one embodiment of the invention, establishing the emotional semantic model comprises converting each voice signal into a semantic-prosodic vector according to its semantic and prosodic attributes, and then substituting these vectors into a Gaussian mixture model to establish the emotional semantic model.
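As a rough illustration of the modeling step above (not the patent's implementation), one Gaussian can be fitted per emotion category and a new semantic-prosodic vector scored against each; a single diagonal Gaussian is used here as a simplified one-component special case of the Gaussian mixture model, and the vectors, categories, and data are invented placeholders:

```python
import numpy as np

# Toy semantic-prosodic vectors per emotion category (made-up data);
# each row stands for one utterance's vector as described above.
train = {
    "happy": np.array([[0.9, 0.1], [0.8, 0.2], [0.85, 0.15]]),
    "angry": np.array([[0.1, 0.9], [0.2, 0.8], [0.15, 0.85]]),
}

# Fit one diagonal Gaussian per category: a single-component
# special case of the Gaussian mixture model named in the text.
models = {emo: (X.mean(axis=0), X.var(axis=0) + 1e-6)
          for emo, X in train.items()}

def log_likelihood(x, mean, var):
    # Diagonal-Gaussian log density (shared constants are irrelevant for argmax).
    return float(-0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var))

def classify(x):
    scores = {e: log_likelihood(x, m, v) for e, (m, v) in models.items()}
    return max(scores, key=scores.get), scores

label, scores = classify(np.array([0.88, 0.12]))
print(label)  # happy
```

A full mixture would replace each per-category Gaussian with several weighted components fitted by EM; the decision rule stays the same.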
In one embodiment of the invention, converting each voice signal into a semantic-prosodic vector comprises: first obtaining a semantic-prosodic record from the semantic and prosodic attributes of each word; then mining emotion rules from the semantic-prosodic records; and finally converting each semantic-prosodic record into a semantic-prosodic vector according to the emotion rules.
In one embodiment of the invention, obtaining the semantic-prosodic record comprises judging, according to its semantic attribute, whether each word is a semantic tag, the semantic tags being defined from the lexical knowledge base. When a word belongs to a semantic tag, the tag is combined with the word's corresponding prosodic attribute into a semantic-prosodic tag, which is recorded into the semantic-prosodic record. In addition, for words that are not semantic tags, it is judged from their semantic attributes whether they contain emotional feature words; each emotional feature word is combined with its corresponding prosodic attribute into a feature set, which is likewise recorded into the semantic-prosodic record.
In one embodiment of the invention, the method further comprises defining the semantic tags according to basic emotion rules and the lexical knowledge base, the semantic tags comprising specific semantic tags, negation semantic tags, and transition semantic tags.
In one embodiment of the invention, converting the semantic-prosodic record into the semantic-prosodic vector comprises: first calculating, according to the emotion rules, the semantic score and the prosodic score of the emotional feature words in the record, and then obtaining from these scores the value of each voice signal's record in each dimension of the semantic-prosodic vector. The dimension of the vector is determined by the number of emotion rules.
In one embodiment of the invention, before the semantic and prosodic attributes of each word are extracted, each voice signal is converted into a sentence, and the sentence is word-segmented to obtain the words.
In one embodiment of the invention, the lexical knowledge base is HowNet, and the prosodic attributes comprise pitch, energy, and duration.
The present invention further proposes a voice emotion classification method. First, an emotional semantic model is established from the semantic and prosodic attributes of the words in a plurality of voice signals, where the semantic attributes are obtained by querying a lexical knowledge base and the prosodic attributes are obtained from the voice signals. Next, the semantic and prosodic attributes of each word to be tested in a voice signal under test are extracted and substituted into the emotional semantic model to obtain an emotional semantic score. Finally, the emotion category of the voice signal under test is judged from the emotional semantic score.
In one embodiment of the invention, the method further comprises detecting an emotionally salient segment in the voice signal under test, extracting the prosodic features of that segment, and substituting them into an emotional prosodic model to obtain an emotional prosodic score. The emotion category of the voice signal under test can then be judged from both the emotional semantic score and the emotional prosodic score.
In one embodiment of the invention, detecting the emotionally salient segment in the voice signal under test comprises extracting the pitch contour of the signal, the contour comprising a plurality of pitch values, detecting a continuous segment in the pitch contour, and taking that continuous segment as the emotionally salient segment.
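A minimal sketch of this detection step, assuming a frame-level pitch contour in which unvoiced frames carry a pitch of 0 and the longest voiced run is treated as the continuous segment; the threshold and the longest-run heuristic are assumptions, not taken from the patent:

```python
def salient_segment(pitch, voiced_min=1.0):
    """Return (start, end) frame indices of the longest run of
    consecutive frames whose pitch is at least voiced_min."""
    best = (0, 0)
    start = None
    for i, p in enumerate(pitch + [0.0]):  # sentinel flushes the final run
        if p >= voiced_min and start is None:
            start = i
        elif p < voiced_min and start is not None:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best

contour = [0, 0, 120, 130, 0, 110, 115, 118, 122, 0, 0]
print(salient_segment(contour))  # (5, 9)
```

The prosodic features of the returned segment would then be fed to the emotional prosodic model.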
Based on the above, the invention first trains a semantic-prosodic model from the semantic and prosodic attributes of each word and then uses this model to classify voice emotions. Because the prosodic attributes reinforce the emotional characteristics of each emotion category, the accuracy of emotion classification can be improved.
Description of drawings
To make the above objects, features, and advantages of the invention more comprehensible, specific embodiments of the invention are described in detail below with reference to the accompanying drawings, in which:
Fig. 1 is a flowchart of the method for establishing an emotional semantic model according to the first embodiment of the invention.
Fig. 2 is a flowchart of the voice emotion classification method according to the first embodiment of the invention.
Fig. 3 is a flowchart of the method for establishing an emotional semantic model according to the second embodiment of the invention.
Fig. 4 is a schematic diagram of the HowNet concept record format according to the second embodiment of the invention.
Fig. 5 is a flowchart of the method for defining semantic tags according to the second embodiment of the invention.
Fig. 6 is a schematic diagram of the basic emotion factors according to the second embodiment of the invention.
Fig. 7 is a schematic diagram of the semantic tags according to the second embodiment of the invention.
Fig. 8A is a schematic diagram of the additional attributes according to the second embodiment of the invention.
Fig. 8B is a schematic diagram of the additional attribute weight table according to the second embodiment of the invention.
Fig. 9 is a schematic diagram of the prosodic attribute weight table according to the second embodiment of the invention.
Fig. 10 is a flowchart of the voice emotion classification method according to the third embodiment of the invention.
Description of the main element symbols:
S105~S115: steps of the method for establishing the emotional semantic model of the first embodiment of the invention
S205~S220: steps of the voice emotion classification method of the first embodiment of the invention
S310~S365: steps of the method for establishing the emotional semantic model of the second embodiment of the invention
S505~S510: steps of the method for defining semantic tags of the second embodiment of the invention
S1010~S1030: steps of the voice emotion classification method of the third embodiment of the invention
301: HowNet
302, 1002: semantic tag database
303, 1003: prosodic attribute database
304, 1004: emotion rules
305: additional attribute weight table
306: prosodic attribute weight table
1001: acoustic model
1005: emotional semantic model
1006: emotional prosodic model
Embodiment
Unlike traditional approaches that classify emotion using only emotion keywords, the following embodiments further analyze the textual semantics and apply them to emotion classification. To make the content of the invention clearer, embodiments by which the invention can indeed be implemented are given below.
First embodiment
Fig. 1 is a flowchart of the method for establishing an emotional semantic model according to the first embodiment of the invention. Referring to Fig. 1, in step S105 an emotion corpus comprising a plurality of voice signals is provided. Before the emotional semantic model is established, voice signals of multiple emotion categories are collected; for example, a plurality of different speakers can record voice for four emotion categories (angry, sad, happy, and neutral) to build the emotion corpus.
Next, in step S110, the semantic attribute and the prosodic attribute of each word in each voice signal are extracted to serve as classification features. For example, each voice signal is first converted into a sentence, and the sentence is word-segmented into a plurality of words. The semantic attributes of these words are then queried in a lexical knowledge base, and their prosodic attributes (for example pitch, energy, and duration) are extracted from the voice signal.
Finally, in step S115, the emotional semantic model is established from the semantic and prosodic attributes. For example, each voice signal is converted into a semantic-prosodic vector according to its semantic and prosodic attributes, and the emotional semantic model is then trained from these vectors; that is, a classification technique is used to generalize the extracted emotional features, such as the semantic and prosodic attributes.
Generally speaking, classification techniques include the support vector machine (SVM), the neural network (NN), the hidden Markov model (HMM), and the Gaussian mixture model (GMM). These techniques are generally trained on vectors in a vector space.
For instance, a semantic-prosodic record of a sentence can be obtained from the semantic and prosodic attributes of each word. The emotion rules of each emotion category are then mined from all the semantic-prosodic records obtained from the emotion corpus, and each semantic-prosodic record can be converted into a semantic-prosodic vector according to these emotion rules.
Here, a plurality of semantic tags can be defined in advance using the lexical knowledge base. After the semantic tags are defined, the prosodic attributes in the voice signal are added to extend the semantic tags into semantic-prosodic tags; the prosodic attributes thereby reinforce the emotional characteristics of each emotion category.
More specifically, after the semantic attribute of each word is extracted, it is judged whether the attribute is a semantic tag. When a word belongs to a semantic tag, the tag is combined with the word's corresponding prosodic attribute into a semantic-prosodic tag, which is recorded into the semantic-prosodic record corresponding to the sentence. For words that are not semantic tags, it is judged from their semantic attributes whether they contain emotional feature words; each emotional feature word is combined with its corresponding prosodic attribute into a feature set, which is also recorded into the semantic-prosodic record. The automatically mined emotion rules then convert the semantic-prosodic record into a semantic-prosodic vector, and the emotional semantic model is trained from the spatial relations of these vectors.
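The record-building step above can be sketched roughly as follows; the tag dictionary, the feature-word dictionary, and the prosody codes are invented placeholders standing in for the semantic tag database and HowNet, not real HowNet data:

```python
# Hypothetical lookup tables standing in for the semantic tag
# database and the HowNet emotional feature words.
SEMANTIC_TAGS = {"take": "obtain", "without": "negation"}
FEATURE_WORDS = {"money": "wealth|wealth", "a bit": "few|few"}

def build_record(words, prosody):
    """words: segmented sentence; prosody: word -> code such as 'PH_EH_DM'."""
    record = []
    for w in words:
        if w in SEMANTIC_TAGS:          # semantic-prosodic tag branch
            record.append(f"[{SEMANTIC_TAGS[w]}_{prosody[w]}]")
        elif w in FEATURE_WORDS:        # feature-set branch
            record.append(f"[{FEATURE_WORDS[w]}_{prosody[w]}]")
    return record

rec = build_record(
    ["take", "money", "today"],
    {"take": "PH_EH_DM", "money": "PH_EM_DS", "today": "PM_EM_DM"},
)
print(rec)  # ['[obtain_PH_EH_DM]', '[wealth|wealth_PH_EM_DS]']
```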
After the emotional semantic model is established, voice emotion identification can begin. An example is given below.
Fig. 2 is a flowchart of the voice emotion classification method according to the first embodiment of the invention. Referring to Fig. 2, in step S205 a voice signal under test is received. The received signal is converted into a sentence, and the sentence is segmented into a plurality of words to be tested.
Then, as shown in step S210, the semantic attributes of the words to be tested are queried in the lexical knowledge base, and their prosodic attributes are extracted from the voice signal under test.
Afterwards, in step S215, the semantic and prosodic attributes of each word to be tested are substituted into the emotional semantic model to obtain the emotional semantic scores. Since the emotional semantic model has already been established, this substitution yields the score that the voice signal under test obtains in each emotion category.
Finally, in step S220, the emotion category of the voice signal under test is judged from the emotional semantic scores. Generally, the highest score indicates the final classification result: if the score of the happy emotion category is the highest, the voice signal under test belongs to the happy category.
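The decision in step S220 amounts to an argmax over the per-category scores; a trivial sketch, with made-up score values:

```python
def pick_category(scores):
    """Return the emotion category with the highest semantic score."""
    return max(scores, key=scores.get)

scores = {"happy": 0.71, "angry": 0.12, "sad": 0.05, "neutral": 0.12}
print(pick_category(scores))  # happy
```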
The lexical knowledge base is, for example, HowNet. HowNet takes the concepts represented by Chinese and English words as its objects of description and reveals the relations between concepts and between the attributes of concepts. Taking HowNet as an example, the steps of establishing the emotional semantic model are described in detail in the embodiment below.
Second embodiment
Fig. 3 is a flowchart of the method for establishing an emotional semantic model according to the second embodiment of the invention. Referring to Fig. 3, in step S310, HowNet 301 is first queried to extract the semantic attribute of each word in the sentence. HowNet 301 records the concepts of a plurality of words and the relations between those words.
An example of HowNet's concept record format is given below. Fig. 4 is a schematic diagram of the HowNet concept record format according to the second embodiment of the invention. Referring to Fig. 4, in HowNet each word forms a record from its concept and description, and each record mainly comprises function-name variables (word, part of speech, word example, and concept definition) together with data. Take 'W_C=beat' as an example: 'beat' is the data, and its function-name variable is 'W_C', which means that 'beat' is a word. Take 'G_C=V' as an example: 'V' is the data, and its function-name variable is 'G_C', which means that 'V' is a part of speech. The rest can be deduced by analogy.
Returning to Fig. 3, in step S315 the semantic tag database 302 is queried to judge whether the semantic attribute belongs to a semantic tag. Here, the semantic tags are defined from the semantic attributes defined in HowNet 301. The steps of the tag definition method are illustrated by the example below.
Fig. 5 is a flowchart of the method for defining semantic tags according to the second embodiment of the invention. Referring to Fig. 5, in step S505 the basic emotion-inducing factors are first specified. For example, emotional psychology can be consulted to understand under what situations or circumstances humans generate emotions. After the basic emotion factors that induce emotions are summarized, the main semantics these factors imply are analyzed.
For instance, Fig. 6 is a schematic diagram of the basic emotion factors according to the second embodiment of the invention, showing the summarized factors: the happy, angry, and sad emotion factors, respectively.
Observation of the basic emotion factors shows that they all contain certain specific semantic expressions, for example obtaining some benefit, removing some pressure, or losing some benefit. In these semantic expressions, action descriptions such as 'obtain', 'remove', and 'lose' are called 'action content words', and the parts attached to an action content word to form the complete meaning, such as a benefit, a pressure, or a target, are called 'attached action content words'.
Returning to Fig. 5, in order to correctly extract the semantic attributes in the sentence recognized from the voice signal, the basic emotion rules and HowNet are then used to define the semantic tags, as in step S510.
For instance, Fig. 7 is a schematic diagram of the semantic tags according to the second embodiment of the invention. Here, the semantic tags comprise specific semantic tags, negation semantic tags, and transition semantic tags. A specific semantic tag covers words expressing a specific meaning, a negation semantic tag covers words carrying a negative meaning, and a transition semantic tag covers words marking a transition in tone.
Here, among the verbs in HowNet, the semantic attributes that express specific meanings are selected and divided into 15 types, which become the definitions of the 15 specific semantic tags.
Take the semantic tag [achieve] as an example: words that have the attributes 'Vachieve|achieve', 'fulfil|realize', 'end|terminate', 'finish|finish', or 'succeed|succeed' in HowNet are classified into the tag [achieve]. For example, the words 'find' and 'guess' are recorded in HowNet as 'DEF=Vachieve|achieve', so both are classified into the tag [achieve].
The negation semantic tag is defined by directly extracting, from the definitions of all words in HowNet, the words whose definition contains the feature 'neg|negative'.
The transition semantic tags are defined by observing all the adverbs and conjunctions in HowNet and extracting the words with a transitional tone. According to the characteristics of transition words, the transition tags are further divided into two kinds: [transition-acquisition] and [transition-omission].
Once the semantic tag definitions are complete, they can be used to classify voice emotions.
Returning to Fig. 3, in step S315, when the semantic attribute of a word matches a semantic tag, the corresponding semantic tag is marked, as shown in step S320. Then, as shown in step S325, the prosodic attribute database 303 is queried to extend the semantic tag into a semantic-prosodic tag; the prosodic attribute thereby reinforces the emotional characteristics of each emotion category. Then, in step S340, the semantic-prosodic tag is recorded into the semantic-prosodic record corresponding to the sentence.
For instance, after the voice signal is converted into a sentence and word-segmented, the prosodic attribute of the segment of the voice signal corresponding to each word can be extracted and recorded into the prosodic attribute database 303. Here, the prosodic attributes comprise pitch, energy, and duration, and each can be quantized into three levels: pitch and energy into high (H), middle (M), and low (L), and duration into long (L), middle (M), and short (S).
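This three-level quantization could be sketched as follows; the numeric thresholds are an assumption for illustration, since the patent does not specify how the level boundaries are chosen:

```python
def quantize(value, low_cut, high_cut, levels=("L", "M", "H")):
    """Map a raw prosodic measurement to a three-level symbol."""
    if value < low_cut:
        return levels[0]
    if value < high_cut:
        return levels[1]
    return levels[2]

def prosody_code(pitch, energy, duration):
    # Thresholds below are invented for illustration only.
    p = quantize(pitch, 150, 250)                        # Hz
    e = quantize(energy, 55, 70)                         # dB
    d = quantize(duration, 0.15, 0.35, ("S", "M", "L"))  # seconds
    return f"P{p}_E{e}_D{d}"

print(prosody_code(pitch=300, energy=75, duration=0.2))  # PH_EH_DM
```

The resulting code string is the suffix attached to semantic tags and feature words in the examples that follow.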
Returning to step S315, when the semantic attribute of a word does not match any semantic tag, the emotional feature word is taken from among these words, as shown in step S330. Then, as shown in step S335, the emotional feature word is combined with its corresponding prosodic attribute into a feature set. In step S340, the feature set is recorded into the semantic-prosodic record corresponding to the sentence.
That is to say, apart from the words marked with semantic tags, the other words not marked with semantic tags (for example adjectives or nouns) have their emotional feature words taken from HowNet 301, and these feature words are added to the semantic-prosodic record as well; the record thus finally contains the semantic features of the whole sentence.
For instance, suppose the sentence converted from the voice signal is 'I was almost out of money to live on; fortunately I got a bit of money today, which is not bad.' After steps S310~S325, the semantic-prosodic tag obtained for 'out of' is [negation_PM_EH_DS], that for 'fortunately' is [transition-acquisition_PL_EM_DL], and that for 'got' is [obtain_PH_EH_DM]; the other words carry no semantic tag.
Then, among the other words not marked with semantic tags, the emotional feature words are found according to their semantic attributes. After steps S330~S335, the feature set obtained for 'money' is [wealth|wealth_PH_EM_DS], and that for 'a bit' is [few|few_PL_EM_DM].
Accordingly, the semantic-prosodic record obtained is {[negation_PM_EH_DS], [transition-acquisition_PL_EM_DL], [obtain_PH_EH_DM], [few|few_PL_EM_DM], [wealth|wealth_PH_EM_DS]}.
It should be noted that in the present embodiment only the content after the [transition-acquisition] semantic-prosodic tag is captured, so the semantic-prosodic record finally obtained is {[obtain_PH_EH_DM], [few|few_PL_EM_DM], [wealth|wealth_PH_EM_DS]}.
After the semantic and prosodic attributes of each recognized sentence are extracted in this way, the sentence becomes a semantic-prosodic record. Data-mining techniques are then used to mine the emotion rules 304 automatically from all the semantic-prosodic records.
Since the tagging procedure is centered on the marked semantic-prosodic tags, the desired emotion rules take the form T → D. Here T stands for a semantic-prosodic tag, for example [achieve_PH_EM_DS] or [remove_PM_EH_DM], and D is the attached content word of some action, namely the feature set formed by combining the main emotional feature word extracted from HowNet 301 with its prosodic attribute, for example [symbol|symbol_PM_EH_DM]. Both T and D can be one or more, so rules such as T1^T2 → D1 or T3 → D2^D3 are all possible.
After the emotion rules 304 are obtained by data mining, in steps S345~S355 each semantic-prosodic record is converted into a semantic-prosodic vector representation according to the rules 304, where each emotion rule represents one dimension of the vector space.
Suppose there are r_N neutral emotion rules, r_H happy emotion rules, r_A angry emotion rules, and r_S sad emotion rules (the rule listings appear as figures in the original). Then the semantic-prosodic vector representation of each semantic-prosodic record is

Sv = (v_1^N, v_2^N, ..., v_{r_N}^N, v_1^H, v_2^H, ..., v_{r_H}^H, v_1^A, v_2^A, ..., v_{r_A}^A, v_1^S, v_2^S, ..., v_{r_S}^S),

and this semantic-prosodic vector is a point in a vector space of dimension r_N + r_H + r_A + r_S.
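The conversion in steps S345~S355 can be sketched as scoring each record against every rule and concatenating the per-rule scores into one vector; the rules below and the binary match function are invented placeholders (the patent's graded semantic and prosodic scores are described afterwards):

```python
# Hypothetical emotion rules in a fixed order (one vector dimension each);
# each rule is (category, (T tag set, D feature-set set)).
RULES = [
    ("neutral", ({"[state_PM_EM_DM]"}, {"[time|time_PM_EM_DM]"})),
    ("happy",   ({"[obtain_PH_EH_DM]"}, {"[wealth|wealth_PH_EM_DS]"})),
    ("angry",   ({"[lose_PH_EH_DS]"},  {"[wealth|wealth_PH_EH_DS]"})),
]

def record_to_vector(record):
    """One dimension per rule: 1.0 if the record contains both the rule's
    T tags and D feature sets, else 0.0 (a crude stand-in for the
    graded scoring described in the text)."""
    items = set(record)
    return [1.0 if t <= items and d <= items else 0.0
            for _, (t, d) in RULES]

rec = ["[obtain_PH_EH_DM]", "[few|few_PL_EM_DM]", "[wealth|wealth_PH_EM_DS]"]
print(record_to_vector(rec))  # [0.0, 1.0, 0.0]
```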
In step S345,, calculate the meaning of one's words mark of meaning of one's words rhythm record according to mood rule 304.Say at length when score, whether the T part of inspection meaning of one's words rhythm record meets the T part of mood rule 304 earlier.If the T of meaning of one's words rhythm record partly meets the T part of mood rule, just further D is partly checked, to calculate this dimension mark.
Because words have hierarchical relations in the definitions of HowNet 301, this hierarchical relation is used to calculate the semantic similarity of two emotion feature words when the emotion feature word in a semantic-prosodic record differs from that in an emotion rule 304. In HowNet 301, if the hierarchy is at most m layers deep, then the deeper the layer, the closer the relation, that is, the more similar the meanings. Therefore, the comparison score of two emotion feature words D_i, D_j between a semantic-prosodic record and an emotion rule 304 is:
v_p(D_i, D_j) = 1, if D_i = D_j;
v_p(D_i, D_j) = (L(D_i, D_j) / m) × (1 / N(L(D_i, D_j)))^(m − L(D_i, D_j) − 1), if D_i ≠ D_j.
Here L(D_i, D_j) is the length of the maximal common path of D_i and D_j, N(L(D_i, D_j)) is the number of child nodes under the node at the end of the maximal common path, and m is the depth of the hierarchical relation.
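As a sketch of the formula above (taking the hierarchy depth m = 7, which is what the worked example later in the description implies), the comparison score can be computed as:

```python
def vp(d_i, d_j, L, N, m=7):
    """Comparison score of two emotion feature words.
    L: length of the maximal common path, N: number of child nodes at its
    end, m: hierarchy depth (m = 7 is inferred from the worked example)."""
    if d_i == d_j:
        return 1.0
    return (L / m) * (1.0 / N) ** (m - L - 1)

# The worked example in the description: L = 5, N = 4 gives v_p = 5/28.
score = vp("symbol_PM_EH_DM", "language_PH_EM_DM", L=5, N=4)
assert abs(score - 5 / 28) < 1e-12
```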
After the hierarchical-relation score is obtained, if the emotion feature word in the semantic-prosodic record also carries additional attributes, a further computation is performed according to the different attribute relations: as shown in step S350, the score of the additional attributes is calculated according to the additional-attribute weight table 305. For instance, Fig. 8A is a schematic diagram of the additional attributes according to the second embodiment of the invention, and Fig. 8B is a schematic diagram of the additional-attribute weight table according to the second embodiment. In Fig. 8A, taking HowNet 301 as an example, eight kinds of additional attributes are defined. In Fig. 8B, each additional attribute is given a weight value according to the relations between the additional attributes.
Returning to Fig. 4, in step S355 the score of the prosodic attributes is calculated according to the prosodic-attribute weight table 306. For instance, Fig. 9 is a schematic diagram of the prosodic-attribute weight table according to the second embodiment of the invention. In Fig. 9, each prosodic attribute is quantized into three levels: pitch and energy are represented as high (H), medium (M), and low (L), while duration is represented as long (L), medium (M), and short (S). A weight value is given according to the gap between quantized levels: H and M are close, so the weight given is 0.5; H and L are farther apart, so the weight given is 0.25; likewise, M and H are close, so the weight given is also 0.5, and so on.
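A minimal sketch of this lookup follows. Levels one step apart get weight 0.5 and two steps apart get 0.25, per the description of Fig. 9; identical levels scoring 1.0 is an assumption, since the text only gives the 0.5 and 0.25 cases:

```python
# Quantized prosodic levels; duration uses L/M/S analogously to H/M/L.
PITCH_ENERGY_LEVELS = {"H": 2, "M": 1, "L": 0}
DURATION_LEVELS     = {"L": 2, "M": 1, "S": 0}

def prosody_weight(a, b, levels=PITCH_ENERGY_LEVELS):
    gap = abs(levels[a] - levels[b])          # 0, 1, or 2 quantized steps
    return {0: 1.0, 1: 0.5, 2: 0.25}[gap]

assert prosody_weight("H", "M") == 0.5
assert prosody_weight("H", "L") == 0.25
assert prosody_weight("M", "H") == 0.5
```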
Afterwards, in step S360, each dimension score in the semantic-prosodic vector is calculated according to steps S345~S355 above.
For instance, suppose D_i is [symbol|symbol_PM_EH_DM] and D_j is [language|language_PH_EM_DM]. The path of D_i in HowNet is 1.1.2.5.1.1 and the path of D_j in HowNet is 1.1.2.5.1, so L(D_i, D_j) is 5 and N(L(D_i, D_j)) is 4, giving a hierarchical-relation score v_p of 5/28. The additional-attribute score v_r is 0.5 and the prosodic-attribute score v_f is 0.5. The dimension score finally obtained in the semantic-prosodic vector is v = v_p × v_r × v_f.
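Plugging the example's numbers into v = v_p × v_r × v_f gives one dimension score of the semantic-prosodic vector:

```python
v_p = 5 / 28   # hierarchical-relation score from the example
v_r = 0.5      # additional-attribute score
v_f = 0.5      # prosodic-attribute score

v = v_p * v_r * v_f   # one dimension score of the semantic-prosodic vector
assert abs(v - 5 / 112) < 1e-12
```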
Finally, in step S365, after the semantic-prosodic vectors of each emotion category have been collected, the emotion-semantic model of each emotion category can be constructed with a Gaussian mixture model. After the emotion-semantic models are built, classification of speech emotion can begin. Another embodiment is given below for further explanation.
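A minimal stand-in for the per-category model is sketched below. To keep it short it replaces the Gaussian mixture with a single diagonal Gaussian per emotion category (the patent itself uses a GMM), and the training vectors are toy data:

```python
import numpy as np

def fit_gaussian(X):
    """Fit a diagonal Gaussian: per-dimension mean and variance."""
    return X.mean(axis=0), X.var(axis=0) + 1e-6

def log_likelihood(x, mean, var):
    return float(-0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var))

rng = np.random.default_rng(0)
# Toy training vectors: "neutral" clustered near 0.5, "happy" near 1.5.
models = {
    "neutral": fit_gaussian(rng.random((50, 9))),
    "happy":   fit_gaussian(rng.random((50, 9)) + 1.0),
}
x = np.full(9, 1.5)                    # a vector near the "happy" cluster
best = max(models, key=lambda e: log_likelihood(x, *models[e]))
assert best == "happy"
```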
Third embodiment
Figure 10 is a flowchart of the speech emotion classification method according to the third embodiment of the invention. Referring to Fig. 10, in the present embodiment, after a test speech signal is received, textual semantic analysis and prosodic feature extraction are performed on the test speech signal respectively.
In the textual semantic analysis, first, as shown in step S1010, an acoustic model 1001 for speech recognition (for example, a hidden Markov model) is used to convert the test speech signal into a sentence. Then, as shown in step S1015, the semantic label database 1002, the prosodic attribute database 1003, and the emotion rules 1004 are used to convert the sentence into a semantic-prosodic vector.
Here, step S1015 is the same as or similar to steps S310~S360 of the second embodiment, and the methods for establishing the semantic label database 1002, the prosodic attribute database 1003, and the emotion rules 1004 are also the same as or similar to those for the semantic label database 302, the prosodic attribute database 303, and the emotion rules 304 of the second embodiment, so they are not described again here.
On the other hand, in the prosodic feature extraction, first, as shown in step S1020, an emotionally salient segment is detected in the test speech signal. To prevent non-emotional segments of the test speech signal from degrading the accuracy of emotion recognition, the emotionally salient segment (Emotionally Salient Segment) is detected first, and prosodic features are extracted from that salient segment. The emotionally salient segment is found by first computing the pitch contour (Pitch Contour) of the whole test speech signal; if a continuous segment exists in the pitch contour, that continuous segment is defined as the emotionally salient segment.
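A minimal sketch of the detection step, assuming unvoiced frames appear as zeros in the pitch contour and taking the longest voiced run as the emotionally salient segment (the contour values are toy data):

```python
def salient_segment(pitch):
    """Return (start, end) of the longest continuous voiced run in a
    frame-wise pitch contour; 0 marks unvoiced frames. End is exclusive."""
    best, cur_start, best_span = None, None, 0
    for i, p in enumerate(pitch + [0]):      # sentinel flushes the last run
        if p > 0 and cur_start is None:
            cur_start = i
        elif p <= 0 and cur_start is not None:
            if i - cur_start > best_span:
                best, best_span = (cur_start, i), i - cur_start
            cur_start = None
    return best

contour = [0, 0, 180, 185, 190, 0, 200, 210, 205, 215, 0, 0]
assert salient_segment(contour) == (6, 10)   # the longest voiced run
```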
Then, in step S1025, prosodic features are extracted from the emotionally salient segment. The prosodic features comprise the maximum pitch, minimum pitch, mean pitch, pitch variance, maximum energy, minimum energy, mean energy, energy variance, maximum formant, minimum formant, mean formant, and formant variance, for a total of 12 parameters. These 12 parameters are regarded as a 12-dimensional prosody vector.
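The 12 parameters can be sketched as the max, min, mean, and variance of the pitch, energy, and formant tracks over the salient segment; the frame values below are toy data:

```python
import numpy as np

def prosody_vector(pitch, energy, formant):
    """Build the 12-dimensional prosody vector: max, min, mean, variance
    of each of the pitch, energy, and formant tracks."""
    feats = []
    for track in (pitch, energy, formant):
        t = np.asarray(track, dtype=float)
        feats += [t.max(), t.min(), t.mean(), t.var()]
    return np.array(feats)

v = prosody_vector([200, 210, 205], [0.5, 0.7, 0.6], [700, 720, 710])
assert v.shape == (12,)
assert v[0] == 210.0        # maximum pitch
```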
Finally, in step S1030, the emotion category of the test speech signal is decided by combining the emotion-semantic model 1005 and the emotion-prosody model 1006 according to Bayes' theorem. In detail, the semantic-prosodic vector is substituted into the emotion-semantic model 1005 to obtain an emotion-semantic score, and the prosody vector is substituted into the emotion-prosody model 1006 to obtain an emotion-prosody score. The emotion category with the maximum posterior probability found by Bayes' theorem is then the final recognition result.
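A sketch of this decision step, assuming uniform priors and working with log-likelihoods so the semantic and prosody scores add (the per-category scores below are toy values):

```python
import math

def classify(sem_loglik, pros_loglik, log_prior=None):
    """Pick the maximum-posterior emotion category by combining the
    semantic-model and prosody-model log-likelihoods with the priors."""
    emotions = list(sem_loglik)
    if log_prior is None:                     # uniform priors assumed
        log_prior = {e: -math.log(len(emotions)) for e in emotions}
    return max(emotions,
               key=lambda e: sem_loglik[e] + pros_loglik[e] + log_prior[e])

sem = {"neutral": -10.0, "happy": -8.0, "angry": -12.0, "sad": -11.0}
pros = {"neutral": -9.0, "happy": -7.5, "angry": -8.0, "sad": -10.0}
assert classify(sem, pros) == "happy"
```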
Here, the emotion-prosody model 1006 is established with a Gaussian mixture model. That is, the above 12 parameters are extracted from each speech signal in the emotion corpus and used as prosody vectors to train the emotion-prosody model 1006.
In summary, in the above embodiments, the semantic attribute of each word is analyzed and further combined with the prosodic attributes, thereby improving the accuracy of emotion classification. In addition, only the emotionally salient segment of the speech signal may be analyzed, so that non-emotional segments do not degrade the accuracy of emotion classification.
Although the invention has been disclosed above by means of preferred embodiments, they are not intended to limit the invention. Those skilled in the art may make modifications and refinements without departing from the spirit and scope of the invention; accordingly, the protection scope of the invention shall be defined by the appended claims.

Claims (16)

1. A method for establishing an emotion-semantic model, comprising:
providing an emotion corpus comprising a plurality of speech signals belonging to a plurality of emotion categories;
converting the speech signals into sentences;
performing word segmentation on each sentence to obtain a plurality of words;
respectively extracting a semantic attribute and a prosodic attribute of each of the words in the speech signals, wherein the semantic attribute is obtained according to a lexical knowledge base and the prosodic attribute is obtained from the speech signals;
judging, according to the semantic attribute, whether each of the words belongs to a semantic label, wherein the semantic label is defined according to the lexical knowledge base;
when one of the words belongs to the semantic label, combining the semantic label with its corresponding prosodic attribute into a semantic-prosodic label, and recording the semantic-prosodic label into a semantic-prosodic record;
converting the semantic-prosodic record into a semantic-prosodic vector; and
establishing the emotion-semantic model from the semantic-prosodic vectors of the speech signals.
2. The method for establishing an emotion-semantic model as claimed in claim 1, wherein the step of establishing the emotion-semantic model from the semantic-prosodic vectors of the speech signals comprises:
substituting the semantic-prosodic vectors of the speech signals into a Gaussian mixture model to establish the emotion-semantic model.
3. The method for establishing an emotion-semantic model as claimed in claim 1, wherein the step of converting the semantic-prosodic record into the semantic-prosodic vector comprises:
mining an emotion rule from the semantic-prosodic record; and
converting the semantic-prosodic record into the semantic-prosodic vector according to the emotion rule.
4. The method for establishing an emotion-semantic model as claimed in claim 1, further comprising:
defining the semantic label according to a basic emotion factor and the lexical knowledge base, wherein the semantic label comprises a specific semantic label, a negation semantic label, and a transition semantic label.
5. The method for establishing an emotion-semantic model as claimed in claim 3, wherein the step of combining the semantic label with its corresponding prosodic attribute into the semantic-prosodic label and recording the semantic-prosodic label into the semantic-prosodic record further comprises:
judging, according to the semantic attribute, whether the words not belonging to the semantic label comprise an emotion feature word, combining the emotion feature word with its corresponding prosodic attribute into a feature set, and recording the feature set into the semantic-prosodic record.
6. The method for establishing an emotion-semantic model as claimed in claim 5, wherein the step of converting the semantic-prosodic record into the semantic-prosodic vector according to the emotion rule comprises:
calculating a semantic score and a prosodic score of the emotion feature word in the semantic-prosodic record according to the emotion rule; and
respectively obtaining, according to the semantic score and the prosodic score, the dimension scores of the semantic-prosodic record of each of the speech signals in the semantic-prosodic vector, wherein the dimension of the semantic-prosodic vector is determined according to the number of the emotion rules.
7. The method for establishing an emotion-semantic model as claimed in claim 1, wherein the prosodic attribute comprises pitch, energy, and duration.
8. A method for classifying speech emotion, comprising:
establishing an emotion-semantic model according to a semantic attribute and a prosodic attribute of each of a plurality of words in a plurality of speech signals, wherein the semantic attribute is obtained according to a lexical knowledge base and the prosodic attribute is obtained from the speech signals;
receiving a test speech signal;
converting the test speech signal into a sentence;
performing word segmentation on the sentence to obtain a plurality of test words;
extracting the semantic attribute and the prosodic attribute of each of the test words in the test speech signal;
judging, according to the semantic attribute, whether each of the test words belongs to a semantic label, wherein the semantic label is defined according to the lexical knowledge base;
when one of the test words belongs to the semantic label, combining the semantic label with its corresponding prosodic attribute into a semantic-prosodic label, and recording the semantic-prosodic label into a semantic-prosodic record;
converting the semantic-prosodic record into a semantic-prosodic vector according to an emotion rule;
substituting the semantic-prosodic vector into the emotion-semantic model to obtain an emotion-semantic score; and
judging the emotion category of the test speech signal according to the emotion-semantic score.
9. The method for classifying speech emotion as claimed in claim 8, further comprising:
detecting an emotionally salient segment in the test speech signal;
extracting a plurality of prosodic features of the emotionally salient segment; and
substituting the prosodic features into an emotion-prosody model to obtain an emotion-prosody score.
10. The method for classifying speech emotion as claimed in claim 9, wherein the step of judging the emotion category of the test speech signal according to the emotion-semantic score further comprises:
judging the emotion category of the test speech signal according to the emotion-semantic score and the emotion-prosody score.
11. The method for classifying speech emotion as claimed in claim 9, wherein the step of detecting the emotionally salient segment in the test speech signal comprises:
extracting a pitch contour of the test speech signal; and
detecting a continuous segment in the pitch contour, and taking the continuous segment in the pitch contour as the emotionally salient segment.
12. The method for classifying speech emotion as claimed in claim 8, wherein the step of combining the semantic label with its corresponding prosodic attribute into the semantic-prosodic label and recording the semantic-prosodic label into the semantic-prosodic record comprises:
judging, according to the semantic attribute, whether an emotion feature word exists among the test words not belonging to the semantic label, combining the emotion feature word with its corresponding prosodic attribute into a feature set, and recording the feature set into the semantic-prosodic record.
13. The method for classifying speech emotion as claimed in claim 8, wherein the semantic label comprises a specific semantic label, a negation semantic label, and a transition semantic label.
14. The method for classifying speech emotion as claimed in claim 8, wherein the step of converting the semantic-prosodic record into the semantic-prosodic vector according to the emotion rule comprises:
calculating a semantic score and a prosodic score of the emotion feature word in the semantic-prosodic record according to the emotion rule; and
obtaining, according to the semantic score and the prosodic score, the dimension scores of the semantic-prosodic record in the semantic-prosodic vector, wherein the dimension of the semantic-prosodic vector is determined according to the number of the emotion rules.
15. The method for classifying speech emotion as claimed in claim 8, wherein the step of establishing the emotion-semantic model comprises:
establishing the emotion-semantic model according to a Gaussian mixture model.
16. The method for classifying speech emotion as claimed in claim 8, wherein the prosodic attribute comprises pitch, energy, and duration.
CN2008101794972A 2008-12-03 2008-12-03 Voice mood sorting method and establishing method for mood semanteme model thereof Expired - Fee Related CN101751923B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008101794972A CN101751923B (en) 2008-12-03 2008-12-03 Voice mood sorting method and establishing method for mood semanteme model thereof


Publications (2)

Publication Number Publication Date
CN101751923A CN101751923A (en) 2010-06-23
CN101751923B true CN101751923B (en) 2012-04-18

Family

ID=42478794

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008101794972A Expired - Fee Related CN101751923B (en) 2008-12-03 2008-12-03 Voice mood sorting method and establishing method for mood semanteme model thereof

Country Status (1)

Country Link
CN (1) CN101751923B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI602174B (en) * 2016-12-27 2017-10-11 李景峰 Emotion recording and management device, system and method based on voice recognition

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102637433B (en) * 2011-02-09 2015-11-25 富士通株式会社 The method and system of the affective state carried in recognition of speech signals
CN103390409A (en) * 2012-05-11 2013-11-13 鸿富锦精密工业(深圳)有限公司 Electronic device and method for sensing pornographic voice bands
CN103024521B (en) * 2012-12-27 2017-02-08 深圳Tcl新技术有限公司 Program screening method, program screening system and television with program screening system
CN103634472B (en) * 2013-12-06 2016-11-23 惠州Tcl移动通信有限公司 User mood and the method for personality, system and mobile phone is judged according to call voice
CN103905296A (en) 2014-03-27 2014-07-02 华为技术有限公司 Emotion information processing method and device
JP6464703B2 (en) * 2014-12-01 2019-02-06 ヤマハ株式会社 Conversation evaluation apparatus and program
CN104700829B (en) * 2015-03-30 2018-05-01 中南民族大学 Animal sounds Emotion identification system and method
CN105355193B (en) * 2015-10-30 2020-09-25 百度在线网络技术(北京)有限公司 Speech synthesis method and device
CN106910513A (en) * 2015-12-22 2017-06-30 微软技术许可有限责任公司 Emotional intelligence chat engine
CN107516511B (en) 2016-06-13 2021-05-25 微软技术许可有限责任公司 Text-to-speech learning system for intent recognition and emotion
CN106294845B (en) * 2016-08-19 2019-08-09 清华大学 The susceptible thread classification method and device extracted based on weight study and multiple features
CN107331388A (en) * 2017-06-15 2017-11-07 重庆柚瓣科技有限公司 A kind of dialect collection system based on endowment robot
CN108962255B (en) * 2018-06-29 2020-12-08 北京百度网讯科技有限公司 Emotion recognition method, emotion recognition device, server and storage medium for voice conversation
CN110910865B (en) * 2019-11-25 2022-12-13 秒针信息技术有限公司 Voice conversion method and device, storage medium and electronic device
CN111475023A (en) * 2020-04-07 2020-07-31 四川虹美智能科技有限公司 Refrigerator control method and device based on speech emotion recognition
CN111540358B (en) * 2020-04-26 2023-05-26 云知声智能科技股份有限公司 Man-machine interaction method, device, equipment and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101030368A (en) * 2006-03-03 2007-09-05 国际商业机器公司 Method and system for communicating across channels simultaneously with emotion preservation



Also Published As

Publication number Publication date
CN101751923A (en) 2010-06-23


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120418