CN101751923B - Voice emotion classification method and method for establishing an emotional semantic model therefor - Google Patents


Info

Publication number
CN101751923B
CN101751923B (application CN2008101794972A, also written CN200810179497A)
Authority
CN
China
Prior art keywords
words
meaning
rhythm
mood
attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN2008101794972A
Other languages
Chinese (zh)
Other versions
CN101751923A (en)
Inventor
吴宗宪
李伟铨
林瑞堂
许进顺
朱家德
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute for Information Industry
Original Assignee
Institute for Information Industry
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute for Information Industry
Priority to CN2008101794972A
Publication of CN101751923A
Application granted
Publication of CN101751923B
Expired - Fee Related
Anticipated expiration

Abstract

The invention discloses a voice emotion classification method and a method for establishing its emotional semantic model. First, an emotional semantic model is established from the semantic attributes and prosodic attributes of the words in the voice signals of an emotion corpus. Next, the semantic attribute and prosodic attribute of each word to be tested are extracted from a voice signal under test and input into the emotional semantic model, which classifies the voice signal under test into the corresponding emotion category. The prosodic attributes thereby reinforce the emotional characteristics of each emotion category and improve the accuracy of emotion classification.

Description

Voice emotion classification method and method for establishing an emotional semantic model therefor
Technical field
The invention relates to an emotion identification method, and more particularly to a voice emotion classification method that combines semantics and prosody, and to a method for establishing its emotional semantic model.
Background technology
In recent years, as technology has advanced by leaps and bounds, communication between people and intelligent electronic devices is no longer limited to the old pattern of typed commands answered in text. In the future, the human-machine interface between people and intelligent electronic devices will be controlled through the most natural and convenient communication medium: voice. To make human-machine interfaces more diverse and human-friendly, many researchers and manufacturers have taken up the study of emotion identification.
Take customer service systems as an example. Shopping by television and over the network is now increasingly common, and when a product breaks down, most users call a customer service center to inquire. If the customer service system could identify the user's current emotional state, the service staff could soothe the user's mood as early as possible. The staff could also judge from the identified emotion whether they can handle the matter themselves, or whether the call should be forwarded to senior service staff for pacification, thereby avoiding many unnecessary conflicts. Accordingly, improving the accuracy of emotion identification is an important part of current research.
Summary of the invention
The present invention provides a method for establishing an emotional semantic model, which constructs the model by combining semantic attributes and prosodic attributes.
The present invention also provides a voice emotion classification method that analyzes the semantic attributes of words in combination with their prosodic attributes, so as to improve classification accuracy.
The proposed method for establishing an emotional semantic model proceeds as follows. First, an emotion corpus is provided, comprising a plurality of voice signals belonging to a plurality of emotion categories. Next, the semantic attribute and the prosodic attribute of each word in the voice signals are extracted, where the semantic attribute is obtained by querying a lexical knowledge base and the prosodic attribute is extracted from each voice signal. The emotional semantic model is then established from the semantic and prosodic attributes of each voice signal.
In one embodiment of the invention, establishing the emotional semantic model comprises converting each voice signal into a semantic-prosodic vector according to its semantic and prosodic attributes, and then substituting these vectors into a Gaussian mixture model to establish the emotional semantic model.
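As a rough illustration of the modeling step above (not the patent's implementation), one Gaussian can be fitted per emotion category and a new semantic-prosodic vector scored against each; a single diagonal Gaussian is used here as a simplified one-component special case of the Gaussian mixture model, and the vectors, categories, and data are invented placeholders:

```python
import numpy as np

# Toy semantic-prosodic vectors per emotion category (made-up data);
# each row stands for one utterance's vector as described above.
train = {
    "happy": np.array([[0.9, 0.1], [0.8, 0.2], [0.85, 0.15]]),
    "angry": np.array([[0.1, 0.9], [0.2, 0.8], [0.15, 0.85]]),
}

# Fit one diagonal Gaussian per category: a single-component
# special case of the Gaussian mixture model named in the text.
models = {emo: (X.mean(axis=0), X.var(axis=0) + 1e-6)
          for emo, X in train.items()}

def log_likelihood(x, mean, var):
    # Diagonal-Gaussian log density (shared constants are irrelevant for argmax).
    return float(-0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var))

def classify(x):
    scores = {e: log_likelihood(x, m, v) for e, (m, v) in models.items()}
    return max(scores, key=scores.get), scores

label, scores = classify(np.array([0.88, 0.12]))
print(label)  # happy
```

A full mixture would replace each per-category Gaussian with several weighted components fitted by EM; the decision rule stays the same.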
In one embodiment of the invention, converting each voice signal into a semantic-prosodic vector comprises: first obtaining a semantic-prosodic record from the semantic and prosodic attributes of each word; then mining emotion rules from the semantic-prosodic records; and finally converting each semantic-prosodic record into a semantic-prosodic vector according to the emotion rules.
In one embodiment of the invention, obtaining the semantic-prosodic record comprises judging, according to its semantic attribute, whether each word is a semantic tag, the semantic tags being defined from the lexical knowledge base. When a word belongs to a semantic tag, the tag is combined with the word's corresponding prosodic attribute into a semantic-prosodic tag, which is recorded into the semantic-prosodic record. In addition, for words that are not semantic tags, it is judged from their semantic attributes whether they contain emotional feature words; each emotional feature word is combined with its corresponding prosodic attribute into a feature set, which is likewise recorded into the semantic-prosodic record.
In one embodiment of the invention, the method further comprises defining the semantic tags according to basic emotion rules and the lexical knowledge base, the semantic tags comprising specific semantic tags, negation semantic tags, and transition semantic tags.
In one embodiment of the invention, converting the semantic-prosodic record into the semantic-prosodic vector comprises: first calculating, according to the emotion rules, the semantic score and the prosodic score of the emotional feature words in the record, and then obtaining from these scores the value of each voice signal's record in each dimension of the semantic-prosodic vector. The dimension of the vector is determined by the number of emotion rules.
In one embodiment of the invention, before the semantic and prosodic attributes of each word are extracted, each voice signal is converted into a sentence, and the sentence is word-segmented to obtain the words.
In one embodiment of the invention, the lexical knowledge base is HowNet, and the prosodic attributes comprise pitch, energy, and duration.
The present invention further proposes a voice emotion classification method. First, an emotional semantic model is established from the semantic and prosodic attributes of the words in a plurality of voice signals, where the semantic attributes are obtained by querying a lexical knowledge base and the prosodic attributes are obtained from the voice signals. Next, the semantic and prosodic attributes of each word to be tested in a voice signal under test are extracted and substituted into the emotional semantic model to obtain an emotional semantic score. Finally, the emotion category of the voice signal under test is judged from the emotional semantic score.
In one embodiment of the invention, the method further comprises detecting an emotionally salient segment in the voice signal under test, extracting the prosodic features of that segment, and substituting them into an emotional prosodic model to obtain an emotional prosodic score. The emotion category of the voice signal under test can then be judged from both the emotional semantic score and the emotional prosodic score.
In one embodiment of the invention, detecting the emotionally salient segment in the voice signal under test comprises extracting the pitch contour of the signal, the contour comprising a plurality of pitch values, detecting a continuous segment in the pitch contour, and taking that continuous segment as the emotionally salient segment.
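A minimal sketch of this detection step, assuming a frame-level pitch contour in which unvoiced frames carry a pitch of 0 and the longest voiced run is treated as the continuous segment; the threshold and the longest-run heuristic are assumptions, not taken from the patent:

```python
def salient_segment(pitch, voiced_min=1.0):
    """Return (start, end) frame indices of the longest run of
    consecutive frames whose pitch is at least voiced_min."""
    best = (0, 0)
    start = None
    for i, p in enumerate(pitch + [0.0]):  # sentinel flushes the final run
        if p >= voiced_min and start is None:
            start = i
        elif p < voiced_min and start is not None:
            if i - start > best[1] - best[0]:
                best = (start, i)
            start = None
    return best

contour = [0, 0, 120, 130, 0, 110, 115, 118, 122, 0, 0]
print(salient_segment(contour))  # (5, 9)
```

The prosodic features of the returned segment would then be fed to the emotional prosodic model.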
Based on the above, the invention first trains a semantic-prosodic model from the semantic and prosodic attributes of each word and then uses this model to classify voice emotions. Because the prosodic attributes reinforce the emotional characteristics of each emotion category, the accuracy of emotion classification can be improved.
Description of drawings
To make the above objects, features, and advantages of the invention more comprehensible, specific embodiments of the invention are described in detail below with reference to the accompanying drawings, in which:
Fig. 1 is a flowchart of the method for establishing an emotional semantic model according to the first embodiment of the invention.
Fig. 2 is a flowchart of the voice emotion classification method according to the first embodiment of the invention.
Fig. 3 is a flowchart of the method for establishing an emotional semantic model according to the second embodiment of the invention.
Fig. 4 is a schematic diagram of the HowNet concept record format according to the second embodiment of the invention.
Fig. 5 is a flowchart of the method for defining semantic tags according to the second embodiment of the invention.
Fig. 6 is a schematic diagram of the basic emotion factors according to the second embodiment of the invention.
Fig. 7 is a schematic diagram of the semantic tags according to the second embodiment of the invention.
Fig. 8A is a schematic diagram of the additional attributes according to the second embodiment of the invention.
Fig. 8B is a schematic diagram of the additional attribute weight table according to the second embodiment of the invention.
Fig. 9 is a schematic diagram of the prosodic attribute weight table according to the second embodiment of the invention.
Fig. 10 is a flowchart of the voice emotion classification method according to the third embodiment of the invention.
Description of the main element symbols:
S105~S115: steps of the method for establishing the emotional semantic model of the first embodiment of the invention
S205~S220: steps of the voice emotion classification method of the first embodiment of the invention
S310~S365: steps of the method for establishing the emotional semantic model of the second embodiment of the invention
S505~S510: steps of the method for defining semantic tags of the second embodiment of the invention
S1010~S1030: steps of the voice emotion classification method of the third embodiment of the invention
301: HowNet
302, 1002: semantic tag database
303, 1003: prosodic attribute database
304, 1004: emotion rules
305: additional attribute weight table
306: prosodic attribute weight table
1001: acoustic model
1005: emotional semantic model
1006: emotional prosodic model
Embodiment
Unlike traditional approaches that classify emotion using only emotion keywords, the following embodiments further analyze the textual semantics and apply them to emotion classification. To make the content of the invention clearer, embodiments by which the invention can indeed be implemented are given below.
First embodiment
Fig. 1 is a flowchart of the method for establishing an emotional semantic model according to the first embodiment of the invention. Referring to Fig. 1, in step S105 an emotion corpus comprising a plurality of voice signals is provided. Before the emotional semantic model is established, voice signals of multiple emotion categories are collected; for example, a plurality of different speakers can record voice for four emotion categories (angry, sad, happy, and neutral) to build the emotion corpus.
Next, in step S110, the semantic attribute and the prosodic attribute of each word in each voice signal are extracted to serve as classification features. For example, each voice signal is first converted into a sentence, and the sentence is word-segmented into a plurality of words. The semantic attributes of these words are then queried in a lexical knowledge base, and their prosodic attributes (for example pitch, energy, and duration) are extracted from the voice signal.
Finally, in step S115, the emotional semantic model is established from the semantic and prosodic attributes. For example, each voice signal is converted into a semantic-prosodic vector according to its semantic and prosodic attributes, and the emotional semantic model is then trained from these vectors; that is, a classification technique is used to generalize the extracted emotional features, such as the semantic and prosodic attributes.
Generally speaking, classification techniques include the support vector machine (SVM), the neural network (NN), the hidden Markov model (HMM), and the Gaussian mixture model (GMM). These techniques are generally trained on vectors in a vector space.
For instance, a semantic-prosodic record of a sentence can be obtained from the semantic and prosodic attributes of each word. The emotion rules of each emotion category are then mined from all the semantic-prosodic records obtained from the emotion corpus, and each semantic-prosodic record can be converted into a semantic-prosodic vector according to these emotion rules.
Here, a plurality of semantic tags can be defined in advance using the lexical knowledge base. After the semantic tags are defined, the prosodic attributes in the voice signal are added to extend the semantic tags into semantic-prosodic tags; the prosodic attributes thereby reinforce the emotional characteristics of each emotion category.
More specifically, after the semantic attribute of each word is extracted, it is judged whether the attribute is a semantic tag. When a word belongs to a semantic tag, the tag is combined with the word's corresponding prosodic attribute into a semantic-prosodic tag, which is recorded into the semantic-prosodic record corresponding to the sentence. For words that are not semantic tags, it is judged from their semantic attributes whether they contain emotional feature words; each emotional feature word is combined with its corresponding prosodic attribute into a feature set, which is also recorded into the semantic-prosodic record. The automatically mined emotion rules then convert the semantic-prosodic record into a semantic-prosodic vector, and the emotional semantic model is trained from the spatial relations of these vectors.
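The record-building step above can be sketched roughly as follows; the tag dictionary, the feature-word dictionary, and the prosody codes are invented placeholders standing in for the semantic tag database and HowNet, not real HowNet data:

```python
# Hypothetical lookup tables standing in for the semantic tag
# database and the HowNet emotional feature words.
SEMANTIC_TAGS = {"take": "obtain", "without": "negation"}
FEATURE_WORDS = {"money": "wealth|wealth", "a bit": "few|few"}

def build_record(words, prosody):
    """words: segmented sentence; prosody: word -> code such as 'PH_EH_DM'."""
    record = []
    for w in words:
        if w in SEMANTIC_TAGS:          # semantic-prosodic tag branch
            record.append(f"[{SEMANTIC_TAGS[w]}_{prosody[w]}]")
        elif w in FEATURE_WORDS:        # feature-set branch
            record.append(f"[{FEATURE_WORDS[w]}_{prosody[w]}]")
    return record

rec = build_record(
    ["take", "money", "today"],
    {"take": "PH_EH_DM", "money": "PH_EM_DS", "today": "PM_EM_DM"},
)
print(rec)  # ['[obtain_PH_EH_DM]', '[wealth|wealth_PH_EM_DS]']
```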
After the emotional semantic model is established, voice emotion identification can begin. An example is given below.
Fig. 2 is a flowchart of the voice emotion classification method according to the first embodiment of the invention. Referring to Fig. 2, in step S205 a voice signal under test is received. The received signal is converted into a sentence, and the sentence is segmented into a plurality of words to be tested.
Then, as shown in step S210, the semantic attributes of the words to be tested are queried in the lexical knowledge base, and their prosodic attributes are extracted from the voice signal under test.
Afterwards, in step S215, the semantic and prosodic attributes of each word to be tested are substituted into the emotional semantic model to obtain the emotional semantic scores. Since the emotional semantic model has already been established, this substitution yields the score that the voice signal under test obtains in each emotion category.
Finally, in step S220, the emotion category of the voice signal under test is judged from the emotional semantic scores. Generally, the highest score indicates the final classification result: if the score of the happy emotion category is the highest, the voice signal under test belongs to the happy category.
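The decision in step S220 amounts to an argmax over the per-category scores; a trivial sketch, with made-up score values:

```python
def pick_category(scores):
    """Return the emotion category with the highest semantic score."""
    return max(scores, key=scores.get)

scores = {"happy": 0.71, "angry": 0.12, "sad": 0.05, "neutral": 0.12}
print(pick_category(scores))  # happy
```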
The lexical knowledge base is, for example, HowNet. HowNet takes the concepts represented by Chinese and English words as its objects of description and reveals the relations between concepts and between the attributes of concepts. Taking HowNet as an example, the steps of establishing the emotional semantic model are described in detail in the embodiment below.
Second embodiment
Fig. 3 is a flowchart of the method for establishing an emotional semantic model according to the second embodiment of the invention. Referring to Fig. 3, in step S310, HowNet 301 is first queried to extract the semantic attribute of each word in the sentence. HowNet 301 records the concepts of a plurality of words and the relations between those words.
An example of HowNet's concept record format is given below. Fig. 4 is a schematic diagram of the HowNet concept record format according to the second embodiment of the invention. Referring to Fig. 4, in HowNet each word forms a record from its concept and description, and each record mainly comprises function-name variables (word, part of speech, word example, and concept definition) together with data. Take 'W_C=beat' as an example: 'beat' is the data, and its function-name variable is 'W_C', which means that 'beat' is a word. Take 'G_C=V' as an example: 'V' is the data, and its function-name variable is 'G_C', which means that 'V' is a part of speech. The rest can be deduced by analogy.
Returning to Fig. 3, in step S315 the semantic tag database 302 is queried to judge whether the semantic attribute belongs to a semantic tag. Here, the semantic tags are defined from the semantic attributes defined in HowNet 301. The steps of the tag definition method are illustrated by the example below.
Fig. 5 is a flowchart of the method for defining semantic tags according to the second embodiment of the invention. Referring to Fig. 5, in step S505 the basic emotion-inducing factors are first specified. For example, emotional psychology can be consulted to understand under what situations or circumstances humans generate emotions. After the basic emotion factors that induce emotions are summarized, the main semantics these factors imply are analyzed.
For instance, Fig. 6 is a schematic diagram of the basic emotion factors according to the second embodiment of the invention, showing the summarized factors: the happy, angry, and sad emotion factors, respectively.
Observation of the basic emotion factors shows that they all contain certain specific semantic expressions, for example obtaining some benefit, removing some pressure, or losing some benefit. In these semantic expressions, action descriptions such as 'obtain', 'remove', and 'lose' are called 'action content words', and the parts attached to an action content word to form the complete meaning, such as a benefit, a pressure, or a target, are called 'attached action content words'.
Returning to Fig. 5, in order to correctly extract the semantic attributes in the sentence recognized from the voice signal, the basic emotion rules and HowNet are then used to define the semantic tags, as in step S510.
For instance, Fig. 7 is a schematic diagram of the semantic tags according to the second embodiment of the invention. Here, the semantic tags comprise specific semantic tags, negation semantic tags, and transition semantic tags. A specific semantic tag covers words expressing a specific meaning, a negation semantic tag covers words carrying a negative meaning, and a transition semantic tag covers words marking a transition in tone.
Here, among the verbs in HowNet, the semantic attributes that express specific meanings are selected and divided into 15 types, which become the definitions of the 15 specific semantic tags.
Take the semantic tag [achieve] as an example: words that have the attributes 'Vachieve|achieve', 'fulfil|realize', 'end|terminate', 'finish|finish', or 'succeed|succeed' in HowNet are classified into the tag [achieve]. For example, the words 'find' and 'guess' are recorded in HowNet as 'DEF=Vachieve|achieve', so both are classified into the tag [achieve].
The negation semantic tag is defined by directly extracting, from the definitions of all words in HowNet, the words whose definition contains the feature 'neg|negative'.
The transition semantic tags are defined by observing all the adverbs and conjunctions in HowNet and extracting the words with a transitional tone. According to the characteristics of transition words, the transition tags are further divided into two kinds: [transition-acquisition] and [transition-omission].
Once the semantic tag definitions are complete, they can be used to classify voice emotions.
Returning to Fig. 3, in step S315, when the semantic attribute of a word matches a semantic tag, the corresponding semantic tag is marked, as shown in step S320. Then, as shown in step S325, the prosodic attribute database 303 is queried to extend the semantic tag into a semantic-prosodic tag; the prosodic attribute thereby reinforces the emotional characteristics of each emotion category. Then, in step S340, the semantic-prosodic tag is recorded into the semantic-prosodic record corresponding to the sentence.
For instance, after the voice signal is converted into a sentence and word-segmented, the prosodic attribute of the segment of the voice signal corresponding to each word can be extracted and recorded into the prosodic attribute database 303. Here, the prosodic attributes comprise pitch, energy, and duration, and each can be quantized into three levels: pitch and energy into high (H), middle (M), and low (L), and duration into long (L), middle (M), and short (S).
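This three-level quantization could be sketched as follows; the numeric thresholds are an assumption for illustration, since the patent does not specify how the level boundaries are chosen:

```python
def quantize(value, low_cut, high_cut, levels=("L", "M", "H")):
    """Map a raw prosodic measurement to a three-level symbol."""
    if value < low_cut:
        return levels[0]
    if value < high_cut:
        return levels[1]
    return levels[2]

def prosody_code(pitch, energy, duration):
    # Thresholds below are invented for illustration only.
    p = quantize(pitch, 150, 250)                        # Hz
    e = quantize(energy, 55, 70)                         # dB
    d = quantize(duration, 0.15, 0.35, ("S", "M", "L"))  # seconds
    return f"P{p}_E{e}_D{d}"

print(prosody_code(pitch=300, energy=75, duration=0.2))  # PH_EH_DM
```

The resulting code string is the suffix attached to semantic tags and feature words in the examples that follow.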
Returning to step S315, when the semantic attribute of a word does not match any semantic tag, the emotional feature word is taken from among these words, as shown in step S330. Then, as shown in step S335, the emotional feature word is combined with its corresponding prosodic attribute into a feature set. In step S340, the feature set is recorded into the semantic-prosodic record corresponding to the sentence.
That is to say, apart from the words marked with semantic tags, the other words not marked with semantic tags (for example adjectives or nouns) have their emotional feature words taken from HowNet 301, and these feature words are added to the semantic-prosodic record as well; the record thus finally contains the semantic features of the whole sentence.
For instance, suppose the sentence converted from the voice signal is 'I was almost out of money to live on; fortunately I got a bit of money today, which is not bad.' After steps S310~S325, the semantic-prosodic tag obtained for 'out of' is [negation_PM_EH_DS], that for 'fortunately' is [transition-acquisition_PL_EM_DL], and that for 'got' is [obtain_PH_EH_DM]; the other words carry no semantic tag.
Then, among the other words not marked with semantic tags, the emotional feature words are found according to their semantic attributes. After steps S330~S335, the feature set obtained for 'money' is [wealth|wealth_PH_EM_DS], and that for 'a bit' is [few|few_PL_EM_DM].
Accordingly, the semantic-prosodic record obtained is {[negation_PM_EH_DS], [transition-acquisition_PL_EM_DL], [obtain_PH_EH_DM], [few|few_PL_EM_DM], [wealth|wealth_PH_EM_DS]}.
It should be noted that in the present embodiment only the content after the [transition-acquisition] semantic-prosodic tag is captured, so the semantic-prosodic record finally obtained is {[obtain_PH_EH_DM], [few|few_PL_EM_DM], [wealth|wealth_PH_EM_DS]}.
After the semantic and prosodic attributes of each recognized sentence are extracted in this way, the sentence becomes a semantic-prosodic record. Data-mining techniques are then used to mine the emotion rules 304 automatically from all the semantic-prosodic records.
Since the tagging procedure is centered on the marked semantic-prosodic tags, the desired emotion rules take the form T → D. Here T stands for a semantic-prosodic tag, for example [achieve_PH_EM_DS] or [remove_PM_EH_DM], and D is the attached content word of some action, namely the feature set formed by combining the main emotional feature word extracted from HowNet 301 with its prosodic attribute, for example [symbol|symbol_PM_EH_DM]. Both T and D can be one or more, so rules such as T1^T2 → D1 or T3 → D2^D3 are all possible.
After the emotion rules 304 are obtained by data mining, in steps S345~S355 each semantic-prosodic record is converted into a semantic-prosodic vector representation according to the rules 304, where each emotion rule represents one dimension of the vector space.
Suppose there are r_N neutral emotion rules, r_H happy emotion rules, r_A angry emotion rules, and r_S sad emotion rules (the rule listings appear as figures in the original). Then the semantic-prosodic vector representation of each semantic-prosodic record is

Sv = (v_1^N, v_2^N, ..., v_{r_N}^N, v_1^H, v_2^H, ..., v_{r_H}^H, v_1^A, v_2^A, ..., v_{r_A}^A, v_1^S, v_2^S, ..., v_{r_S}^S),

and this semantic-prosodic vector is a point in a vector space of dimension r_N + r_H + r_A + r_S.
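The conversion in steps S345~S355 can be sketched as scoring each record against every rule and concatenating the per-rule scores into one vector; the rules below and the binary match function are invented placeholders (the patent's graded semantic and prosodic scores are described afterwards):

```python
# Hypothetical emotion rules in a fixed order (one vector dimension each);
# each rule is (category, (T tag set, D feature-set set)).
RULES = [
    ("neutral", ({"[state_PM_EM_DM]"}, {"[time|time_PM_EM_DM]"})),
    ("happy",   ({"[obtain_PH_EH_DM]"}, {"[wealth|wealth_PH_EM_DS]"})),
    ("angry",   ({"[lose_PH_EH_DS]"},  {"[wealth|wealth_PH_EH_DS]"})),
]

def record_to_vector(record):
    """One dimension per rule: 1.0 if the record contains both the rule's
    T tags and D feature sets, else 0.0 (a crude stand-in for the
    graded scoring described in the text)."""
    items = set(record)
    return [1.0 if t <= items and d <= items else 0.0
            for _, (t, d) in RULES]

rec = ["[obtain_PH_EH_DM]", "[few|few_PL_EM_DM]", "[wealth|wealth_PH_EM_DS]"]
print(record_to_vector(rec))  # [0.0, 1.0, 0.0]
```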
In step S345,, calculate the meaning of one's words mark of meaning of one's words rhythm record according to mood rule 304.Say at length when score, whether the T part of inspection meaning of one's words rhythm record meets the T part of mood rule 304 earlier.If the T of meaning of one's words rhythm record partly meets the T part of mood rule, just further D is partly checked, to calculate this dimension mark.
Because words have hierarchical relations in the definitions of HowNet 301, this hierarchical relation is used to calculate the semantic similarity of two emotion feature words when the emotion feature word in a semantic-prosodic record differs from that in an emotion rule 304. In HowNet 301, if the hierarchy is at most m layers deep, then the deeper the layer, the closer the relation, that is, the more similar the meanings. Therefore, the comparison score of two emotion feature words D_i, D_j between a semantic-prosodic record and an emotion rule 304 is:
v_p(D_i, D_j) = 1, if D_i = D_j;
v_p(D_i, D_j) = (L(D_i, D_j) / m) × (1 / N(L(D_i, D_j)))^(m − L(D_i, D_j) − 1), if D_i ≠ D_j.
Here L(D_i, D_j) is the length of the maximal common path of D_i and D_j, N(L(D_i, D_j)) is the number of child nodes under the node at the end of the maximal common path, and m is the depth of the hierarchical relation.
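As a sketch of the formula above (taking the hierarchy depth m = 7, which is what the worked example later in the description implies), the comparison score can be computed as:

```python
def vp(d_i, d_j, L, N, m=7):
    """Comparison score of two emotion feature words.
    L: length of the maximal common path, N: number of child nodes at its
    end, m: hierarchy depth (m = 7 is inferred from the worked example)."""
    if d_i == d_j:
        return 1.0
    return (L / m) * (1.0 / N) ** (m - L - 1)

# The worked example in the description: L = 5, N = 4 gives v_p = 5/28.
score = vp("symbol_PM_EH_DM", "language_PH_EM_DM", L=5, N=4)
assert abs(score - 5 / 28) < 1e-12
```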
After the hierarchical-relation score is obtained, if the emotion feature word in the semantic-prosodic record also carries additional attributes, a further computation is performed according to the different attribute relations: as shown in step S350, the score of the additional attributes is calculated according to the additional-attribute weight table 305. For instance, Fig. 8A is a schematic diagram of the additional attributes according to the second embodiment of the invention, and Fig. 8B is a schematic diagram of the additional-attribute weight table according to the second embodiment. In Fig. 8A, taking HowNet 301 as an example, eight kinds of additional attributes are defined. In Fig. 8B, each additional attribute is given a weight value according to the relations between the additional attributes.
Returning to Fig. 4, in step S355 the score of the prosodic attributes is calculated according to the prosodic-attribute weight table 306. For instance, Fig. 9 is a schematic diagram of the prosodic-attribute weight table according to the second embodiment of the invention. In Fig. 9, each prosodic attribute is quantized into three levels: pitch and energy are represented as high (H), medium (M), and low (L), while duration is represented as long (L), medium (M), and short (S). A weight value is given according to the gap between quantized levels: H and M are close, so the weight given is 0.5; H and L are farther apart, so the weight given is 0.25; likewise, M and H are close, so the weight given is also 0.5, and so on.
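A minimal sketch of this lookup follows. Levels one step apart get weight 0.5 and two steps apart get 0.25, per the description of Fig. 9; identical levels scoring 1.0 is an assumption, since the text only gives the 0.5 and 0.25 cases:

```python
# Quantized prosodic levels; duration uses L/M/S analogously to H/M/L.
PITCH_ENERGY_LEVELS = {"H": 2, "M": 1, "L": 0}
DURATION_LEVELS     = {"L": 2, "M": 1, "S": 0}

def prosody_weight(a, b, levels=PITCH_ENERGY_LEVELS):
    gap = abs(levels[a] - levels[b])          # 0, 1, or 2 quantized steps
    return {0: 1.0, 1: 0.5, 2: 0.25}[gap]

assert prosody_weight("H", "M") == 0.5
assert prosody_weight("H", "L") == 0.25
assert prosody_weight("M", "H") == 0.5
```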
Afterwards, in step S360, each dimension score in the semantic-prosodic vector is calculated according to steps S345~S355 above.
For instance, suppose D_i is [symbol|symbol_PM_EH_DM] and D_j is [language|language_PH_EM_DM]. The path of D_i in HowNet is 1.1.2.5.1.1 and the path of D_j in HowNet is 1.1.2.5.1, so L(D_i, D_j) is 5 and N(L(D_i, D_j)) is 4, giving a hierarchical-relation score v_p of 5/28. The additional-attribute score v_r is 0.5 and the prosodic-attribute score v_f is 0.5. The dimension score finally obtained in the semantic-prosodic vector is v = v_p × v_r × v_f.
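Plugging the example's numbers into v = v_p × v_r × v_f gives one dimension score of the semantic-prosodic vector:

```python
v_p = 5 / 28   # hierarchical-relation score from the example
v_r = 0.5      # additional-attribute score
v_f = 0.5      # prosodic-attribute score

v = v_p * v_r * v_f   # one dimension score of the semantic-prosodic vector
assert abs(v - 5 / 112) < 1e-12
```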
Finally, in step S365, after the semantic-prosodic vectors of each emotion category have been collected, the emotion-semantic model of each emotion category can be constructed with a Gaussian mixture model. After the emotion-semantic models are built, classification of speech emotion can begin. Another embodiment is given below for further explanation.
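A minimal stand-in for the per-category model is sketched below. To keep it short it replaces the Gaussian mixture with a single diagonal Gaussian per emotion category (the patent itself uses a GMM), and the training vectors are toy data:

```python
import numpy as np

def fit_gaussian(X):
    """Fit a diagonal Gaussian: per-dimension mean and variance."""
    return X.mean(axis=0), X.var(axis=0) + 1e-6

def log_likelihood(x, mean, var):
    return float(-0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var))

rng = np.random.default_rng(0)
# Toy training vectors: "neutral" clustered near 0.5, "happy" near 1.5.
models = {
    "neutral": fit_gaussian(rng.random((50, 9))),
    "happy":   fit_gaussian(rng.random((50, 9)) + 1.0),
}
x = np.full(9, 1.5)                    # a vector near the "happy" cluster
best = max(models, key=lambda e: log_likelihood(x, *models[e]))
assert best == "happy"
```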
Third embodiment
Figure 10 is a flowchart of the speech emotion classification method according to the third embodiment of the invention. Referring to Fig. 10, in the present embodiment, after a test speech signal is received, textual semantic analysis and prosodic feature extraction are performed on the test speech signal respectively.
In the textual semantic analysis, first, as shown in step S1010, an acoustic model 1001 for speech recognition (for example, a hidden Markov model) is used to convert the test speech signal into a sentence. Then, as shown in step S1015, the semantic label database 1002, the prosodic attribute database 1003, and the emotion rules 1004 are used to convert the sentence into a semantic-prosodic vector.
Here, step S1015 is the same as or similar to steps S310~S360 of the second embodiment, and the methods for establishing the semantic label database 1002, the prosodic attribute database 1003, and the emotion rules 1004 are also the same as or similar to those for the semantic label database 302, the prosodic attribute database 303, and the emotion rules 304 of the second embodiment, so they are not described again here.
On the other hand, in the prosodic feature extraction, first, as shown in step S1020, an emotionally salient segment is detected in the test speech signal. To prevent non-emotional segments of the test speech signal from degrading the accuracy of emotion recognition, the emotionally salient segment (Emotionally Salient Segment) is detected first, and prosodic features are extracted from that salient segment. The emotionally salient segment is found by first computing the pitch contour (Pitch Contour) of the whole test speech signal; if a continuous segment exists in the pitch contour, that continuous segment is defined as the emotionally salient segment.
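A minimal sketch of the detection step, assuming unvoiced frames appear as zeros in the pitch contour and taking the longest voiced run as the emotionally salient segment (the contour values are toy data):

```python
def salient_segment(pitch):
    """Return (start, end) of the longest continuous voiced run in a
    frame-wise pitch contour; 0 marks unvoiced frames. End is exclusive."""
    best, cur_start, best_span = None, None, 0
    for i, p in enumerate(pitch + [0]):      # sentinel flushes the last run
        if p > 0 and cur_start is None:
            cur_start = i
        elif p <= 0 and cur_start is not None:
            if i - cur_start > best_span:
                best, best_span = (cur_start, i), i - cur_start
            cur_start = None
    return best

contour = [0, 0, 180, 185, 190, 0, 200, 210, 205, 215, 0, 0]
assert salient_segment(contour) == (6, 10)   # the longest voiced run
```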
Then, in step S1025, prosodic features are extracted from the emotionally salient segment. The prosodic features comprise the maximum pitch, minimum pitch, mean pitch, pitch variance, maximum energy, minimum energy, mean energy, energy variance, maximum formant, minimum formant, mean formant, and formant variance, for a total of 12 parameters. These 12 parameters are regarded as a 12-dimensional prosody vector.
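The 12 parameters can be sketched as the max, min, mean, and variance of the pitch, energy, and formant tracks over the salient segment; the frame values below are toy data:

```python
import numpy as np

def prosody_vector(pitch, energy, formant):
    """Build the 12-dimensional prosody vector: max, min, mean, variance
    of each of the pitch, energy, and formant tracks."""
    feats = []
    for track in (pitch, energy, formant):
        t = np.asarray(track, dtype=float)
        feats += [t.max(), t.min(), t.mean(), t.var()]
    return np.array(feats)

v = prosody_vector([200, 210, 205], [0.5, 0.7, 0.6], [700, 720, 710])
assert v.shape == (12,)
assert v[0] == 210.0        # maximum pitch
```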
Finally, in step S1030, the emotion category of the test speech signal is decided by combining the emotion-semantic model 1005 and the emotion-prosody model 1006 according to Bayes' theorem. In detail, the semantic-prosodic vector is substituted into the emotion-semantic model 1005 to obtain an emotion-semantic score, and the prosody vector is substituted into the emotion-prosody model 1006 to obtain an emotion-prosody score. The emotion category with the maximum posterior probability found by Bayes' theorem is then the final recognition result.
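A sketch of this decision step, assuming uniform priors and working with log-likelihoods so the semantic and prosody scores add (the per-category scores below are toy values):

```python
import math

def classify(sem_loglik, pros_loglik, log_prior=None):
    """Pick the maximum-posterior emotion category by combining the
    semantic-model and prosody-model log-likelihoods with the priors."""
    emotions = list(sem_loglik)
    if log_prior is None:                     # uniform priors assumed
        log_prior = {e: -math.log(len(emotions)) for e in emotions}
    return max(emotions,
               key=lambda e: sem_loglik[e] + pros_loglik[e] + log_prior[e])

sem = {"neutral": -10.0, "happy": -8.0, "angry": -12.0, "sad": -11.0}
pros = {"neutral": -9.0, "happy": -7.5, "angry": -8.0, "sad": -10.0}
assert classify(sem, pros) == "happy"
```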
Here, the emotion-prosody model 1006 is established with a Gaussian mixture model. That is, the above 12 parameters are extracted from each speech signal in the emotion corpus and used as prosody vectors to train the emotion-prosody model 1006.
In summary, in the above embodiments, the semantic attribute of each word is analyzed and further combined with the prosodic attributes, thereby improving the accuracy of emotion classification. In addition, only the emotionally salient segment of the speech signal may be analyzed, so that non-emotional segments do not degrade the accuracy of emotion classification.
Although the invention has been disclosed above by means of preferred embodiments, they are not intended to limit the invention. Those skilled in the art may make modifications and refinements without departing from the spirit and scope of the invention; accordingly, the protection scope of the invention shall be defined by the appended claims.

Claims (16)

1. A method for establishing an emotion-semantic model, comprising:
providing an emotion corpus comprising a plurality of speech signals belonging to a plurality of emotion categories;
converting the speech signals into sentences;
performing word segmentation on each sentence to obtain a plurality of words;
respectively extracting a semantic attribute and a prosodic attribute of each of the words in the speech signals, wherein the semantic attribute is obtained according to a lexical knowledge base and the prosodic attribute is obtained from the speech signals;
judging, according to the semantic attribute, whether each of the words belongs to a semantic label, wherein the semantic label is defined according to the lexical knowledge base;
when one of the words belongs to the semantic label, combining the semantic label with its corresponding prosodic attribute into a semantic-prosodic label, and recording the semantic-prosodic label into a semantic-prosodic record;
converting the semantic-prosodic record into a semantic-prosodic vector; and
establishing the emotion-semantic model from the semantic-prosodic vectors of the speech signals.
2. The method for establishing an emotion-semantic model as claimed in claim 1, wherein the step of establishing the emotion-semantic model from the semantic-prosodic vectors of the speech signals comprises:
substituting the semantic-prosodic vectors of the speech signals into a Gaussian mixture model to establish the emotion-semantic model.
3. The method for establishing an emotion-semantic model as claimed in claim 1, wherein the step of converting the semantic-prosodic record into the semantic-prosodic vector comprises:
mining an emotion rule from the semantic-prosodic record; and
converting the semantic-prosodic record into the semantic-prosodic vector according to the emotion rule.
4. The method for establishing an emotion-semantic model as claimed in claim 1, further comprising:
defining the semantic label according to a basic emotion factor and the lexical knowledge base, wherein the semantic label comprises a specific semantic label, a negation semantic label, and a transition semantic label.
5. The method for establishing an emotion-semantic model as claimed in claim 3, wherein the step of combining the semantic label with its corresponding prosodic attribute into the semantic-prosodic label and recording the semantic-prosodic label into the semantic-prosodic record further comprises:
judging, according to the semantic attribute, whether the words not belonging to the semantic label comprise an emotion feature word, combining the emotion feature word with its corresponding prosodic attribute into a feature set, and recording the feature set into the semantic-prosodic record.
6. The method for establishing an emotion-semantic model as claimed in claim 5, wherein the step of converting the semantic-prosodic record into the semantic-prosodic vector according to the emotion rule comprises:
calculating a semantic score and a prosodic score of the emotion feature word in the semantic-prosodic record according to the emotion rule; and
respectively obtaining, according to the semantic score and the prosodic score, the dimension scores of the semantic-prosodic record of each of the speech signals in the semantic-prosodic vector, wherein the dimension of the semantic-prosodic vector is determined according to the number of the emotion rules.
7. The method for establishing an emotion-semantic model as claimed in claim 1, wherein the prosodic attribute comprises pitch, energy, and duration.
8. A method for classifying speech emotion, comprising:
establishing an emotion-semantic model according to a semantic attribute and a prosodic attribute of each of a plurality of words in a plurality of speech signals, wherein the semantic attribute is obtained according to a lexical knowledge base and the prosodic attribute is obtained from the speech signals;
receiving a test speech signal;
converting the test speech signal into a sentence;
performing word segmentation on the sentence to obtain a plurality of test words;
extracting the semantic attribute and the prosodic attribute of each of the test words in the test speech signal;
judging, according to the semantic attribute, whether each of the test words belongs to a semantic label, wherein the semantic label is defined according to the lexical knowledge base;
when one of the test words belongs to the semantic label, combining the semantic label with its corresponding prosodic attribute into a semantic-prosodic label, and recording the semantic-prosodic label into a semantic-prosodic record;
converting the semantic-prosodic record into a semantic-prosodic vector according to an emotion rule;
substituting the semantic-prosodic vector into the emotion-semantic model to obtain an emotion-semantic score; and
judging the emotion category of the test speech signal according to the emotion-semantic score.
9. The method for classifying speech emotion as claimed in claim 8, further comprising:
detecting an emotionally salient segment in the test speech signal;
extracting a plurality of prosodic features of the emotionally salient segment; and
substituting the prosodic features into an emotion-prosody model to obtain an emotion-prosody score.
10. The method for classifying speech emotion as claimed in claim 9, wherein the step of judging the emotion category of the test speech signal according to the emotion-semantic score further comprises:
judging the emotion category of the test speech signal according to the emotion-semantic score and the emotion-prosody score.
11. The method for classifying speech emotion as claimed in claim 9, wherein the step of detecting the emotionally salient segment in the test speech signal comprises:
extracting a pitch contour of the test speech signal; and
detecting a continuous segment in the pitch contour, and taking the continuous segment in the pitch contour as the emotionally salient segment.
12. The method for classifying speech emotion as claimed in claim 8, wherein the step of combining the semantic label with its corresponding prosodic attribute into the semantic-prosodic label and recording the semantic-prosodic label into the semantic-prosodic record comprises:
judging, according to the semantic attribute, whether an emotion feature word exists among the test words not belonging to the semantic label, combining the emotion feature word with its corresponding prosodic attribute into a feature set, and recording the feature set into the semantic-prosodic record.
13. The method for classifying speech emotion as claimed in claim 8, wherein the semantic label comprises a specific semantic label, a negation semantic label, and a transition semantic label.
14. The method for classifying speech emotion as claimed in claim 8, wherein the step of converting the semantic-prosodic record into the semantic-prosodic vector according to the emotion rule comprises:
calculating a semantic score and a prosodic score of the emotion feature word in the semantic-prosodic record according to the emotion rule; and
obtaining, according to the semantic score and the prosodic score, the dimension scores of the semantic-prosodic record in the semantic-prosodic vector, wherein the dimension of the semantic-prosodic vector is determined according to the number of the emotion rules.
15. The method for classifying speech emotion as claimed in claim 8, wherein the step of establishing the emotion-semantic model comprises:
establishing the emotion-semantic model according to a Gaussian mixture model.
16. The method for classifying speech emotion as claimed in claim 8, wherein the prosodic attribute comprises pitch, energy, and duration.
CN2008101794972A 2008-12-03 2008-12-03 Voice mood sorting method and establishing method for mood semanteme model thereof Expired - Fee Related CN101751923B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2008101794972A CN101751923B (en) 2008-12-03 2008-12-03 Voice mood sorting method and establishing method for mood semanteme model thereof


Publications (2)

Publication Number Publication Date
CN101751923A CN101751923A (en) 2010-06-23
CN101751923B true CN101751923B (en) 2012-04-18

Family

ID=42478794

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2008101794972A Expired - Fee Related CN101751923B (en) 2008-12-03 2008-12-03 Voice mood sorting method and establishing method for mood semanteme model thereof

Country Status (1)

Country Link
CN (1) CN101751923B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI602174B (en) * 2016-12-27 2017-10-11 李景峰 Emotion recording and management device, system and method based on voice recognition

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102637433B (en) * 2011-02-09 2015-11-25 富士通株式会社 The method and system of the affective state carried in recognition of speech signals
CN103390409A (en) * 2012-05-11 2013-11-13 鸿富锦精密工业(深圳)有限公司 Electronic device and method for sensing pornographic voice bands
CN103024521B (en) * 2012-12-27 2017-02-08 深圳Tcl新技术有限公司 Program screening method, program screening system and television with program screening system
CN103634472B (en) * 2013-12-06 2016-11-23 惠州Tcl移动通信有限公司 User mood and the method for personality, system and mobile phone is judged according to call voice
CN103905296A (en) 2014-03-27 2014-07-02 华为技术有限公司 Emotion information processing method and device
JP6464703B2 (en) * 2014-12-01 2019-02-06 ヤマハ株式会社 Conversation evaluation apparatus and program
CN104700829B (en) * 2015-03-30 2018-05-01 中南民族大学 Animal sounds Emotion identification system and method
CN105355193B (en) * 2015-10-30 2020-09-25 百度在线网络技术(北京)有限公司 Speech synthesis method and device
CN106910513A (en) * 2015-12-22 2017-06-30 微软技术许可有限责任公司 Emotional intelligence chat engine
CN107516511B (en) 2016-06-13 2021-05-25 微软技术许可有限责任公司 Text-to-speech learning system for intent recognition and emotion
CN106294845B (en) * 2016-08-19 2019-08-09 清华大学 The susceptible thread classification method and device extracted based on weight study and multiple features
CN107331388A (en) * 2017-06-15 2017-11-07 重庆柚瓣科技有限公司 A kind of dialect collection system based on endowment robot
CN108962255B (en) * 2018-06-29 2020-12-08 北京百度网讯科技有限公司 Emotion recognition method, emotion recognition device, server and storage medium for voice conversation
CN110910865B (en) * 2019-11-25 2022-12-13 秒针信息技术有限公司 Voice conversion method and device, storage medium and electronic device
CN111475023A (en) * 2020-04-07 2020-07-31 四川虹美智能科技有限公司 Refrigerator control method and device based on speech emotion recognition
CN111540358B (en) * 2020-04-26 2023-05-26 云知声智能科技股份有限公司 Man-machine interaction method, device, equipment and storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101030368A (en) * 2006-03-03 2007-09-05 国际商业机器公司 Method and system for communicating across channels simultaneously with emotion preservation



Also Published As

Publication number Publication date
CN101751923A (en) 2010-06-23


Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20120418