CN105280179A - Text-to-speech processing method and system - Google Patents
Text-to-speech processing method and system
- Publication number
- CN105280179A CN105280179A CN201510741753.2A CN201510741753A CN105280179A CN 105280179 A CN105280179 A CN 105280179A CN 201510741753 A CN201510741753 A CN 201510741753A CN 105280179 A CN105280179 A CN 105280179A
- Authority
- CN
- China
- Prior art keywords
- sound
- tone color
- feature
- user
- word message
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- User Interface Of Digital Computer (AREA)
Abstract
The invention discloses a text-to-speech processing method and system. The method comprises the steps of: acquiring text information input by a user; converting the text information into speech; acquiring the emotional characteristics of the text information and reading pre-stored feature values corresponding to those characteristics; and adjusting the speech with the feature values to obtain output speech. Acquiring the emotional characteristics of the text information comprises either identifying keywords in the text information and acquiring the emotional characteristics corresponding to the keywords, or acquiring emotional characteristics, corresponding to the text information, that the user inputs directly. By applying the feature values corresponding to the emotional characteristics, the method converts input text into speech information that carries the intended emotion, enriches the characteristics of the output speech, restores the emotional characteristics the user wants to express, and improves the user experience.
Description
Technical field
The present invention relates to the field of text processing, and in particular to a text-to-speech processing method and system.
Background technology
In daily life it often happens that a sender cannot conveniently speak and can only send text, while the recipient can only receive voice messages. In that case the user can convert the edited text into voice information with text-to-speech technology and send it. However, the voice produced by current text-to-speech processing methods is merely simple speech synthesis that pieces sounds together: it cannot convey the emotion in a speaker's voice, the resulting speech sounds very stiff, and the emotional characteristics the user wants to express are not shown. The present invention uses feature values corresponding to emotional characteristics to convert input text into sound information with those characteristics, enriching the characteristics of the output speech, restoring the emotion the user wants to express, and improving the user experience.
Summary of the invention
The present invention proposes a text-to-speech processing method and system. The method uses feature values corresponding to emotional characteristics to convert input text into sound information with those characteristics, restoring the emotion the user wants to express.
To achieve this object, the present invention adopts the following technical solutions.
In a first aspect, the present invention proposes a text-to-speech processing method, comprising:
acquiring text information input by a user;
converting the text information into speech;
acquiring the emotional characteristics of the text information, and reading pre-stored feature values corresponding to the emotional characteristics;
adjusting the speech with the feature values to obtain output speech.
Acquiring the emotional characteristics of the text information comprises:
identifying keywords in the text information and acquiring the emotional characteristics corresponding to the keywords; or
acquiring emotional characteristics, corresponding to the text information, that the user inputs directly.
After the text information is converted into speech and before the speech is adjusted with the feature values, the method further comprises performing timbre processing on the speech.
Performing timbre processing on the speech comprises: acquiring information about the user, reading pre-stored voice data of the user, extracting timbre characteristics from the voice data, and using the timbre characteristics to perform timbre processing on the speech.
Before the timbre characteristics are used to perform timbre processing on the speech, the method further comprises storing the voice data of the user.
Before the feature values corresponding to the emotional characteristics are read, the method further comprises storing those feature values.
The emotional characteristics comprise sadness, anger, affection, and happiness; the feature values comprise voice frequency, pitch, speech rate, softness, and stress.
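The steps of the first aspect can be sketched in a few lines of Python. This is only an illustrative interpretation of the claims, not the patent's implementation: the emotion labels, the feature-value fields (`pitch_scale`, `rate_scale`, `volume`), and all numbers are assumptions.

```python
# Pre-stored feature values per emotional characteristic (the "read
# the pre-stored feature values" step). Values are illustrative.
EMOTION_FEATURES = {
    "happy":  {"pitch_scale": 1.15, "rate_scale": 1.10, "volume": 1.05},
    "sad":    {"pitch_scale": 0.90, "rate_scale": 0.85, "volume": 0.90},
    "angry":  {"pitch_scale": 1.10, "rate_scale": 1.20, "volume": 1.20},
    "loving": {"pitch_scale": 1.05, "rate_scale": 0.95, "volume": 0.95},
}

def text_to_emotional_speech(text, emotion):
    """Convert text to a speech descriptor, then adjust it with the
    feature values for the given emotion (a stand-in for a real TTS
    engine plus emotion processing)."""
    # Plain conversion: a neutral speech descriptor for the text.
    sound = {"text": text, "pitch_scale": 1.0, "rate_scale": 1.0, "volume": 1.0}
    # Adjustment: overwrite neutral values with the emotion's feature values.
    sound.update(EMOTION_FEATURES.get(emotion, {}))
    return sound
```

An unknown emotion leaves the descriptor neutral, mirroring the patent's fallback of plain synthesis when no emotional characteristic is found.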
In a second aspect, the present invention proposes a text-to-speech processing system, comprising:
a first acquiring unit, for acquiring text information input by a user;
a converting unit, for converting the text information into speech;
a second acquiring unit, for acquiring the emotional characteristics of the text information and reading pre-stored feature values corresponding to the emotional characteristics;
an emotion processing unit, for adjusting the speech with the feature values to obtain output speech.
The second acquiring unit comprises:
a recognition acquiring unit, for identifying keywords in the text information and acquiring the emotional characteristics corresponding to the keywords; and
a direct acquiring unit, for acquiring emotional characteristics, corresponding to the text information, that the user inputs directly.
The system further comprises a timbre processing unit, for performing timbre processing on the speech after the converting unit converts the text information into speech.
The timbre processing unit comprises:
a third acquiring unit, for acquiring information about the user, reading pre-stored voice data of the user, and extracting timbre characteristics from the voice data; and
a processing unit, for using the timbre characteristics to perform timbre processing on the speech.
The system further comprises a storage unit, for storing the voice data of the user and the feature values corresponding to the emotional characteristics.
Beneficial effects of the present invention: by acquiring the emotional characteristics of the text information and applying the corresponding feature values, the present invention converts input text into sound information with the intended emotion, enriching the characteristics of the output speech. In addition, the present invention performs timbre processing on the converted voice information: timbre characteristics are extracted from the user's voice data and used to process the speech, so that the conversion can be personalized for each user. The emotion the user wants to express is restored to a much greater degree, and the user experience is better.
Accompanying drawing explanation
Fig. 1 is a flowchart of embodiment one of the text-to-speech processing method provided by the present invention.
Fig. 2 is a flowchart of embodiment two of the text-to-speech processing method provided by the present invention.
Fig. 3 is a functional block diagram of the text-to-speech processing system provided by the present invention.
Fig. 4 is a functional block diagram of another text-to-speech processing system provided by the present invention.
Embodiment
The technical solutions of the present invention are further illustrated below through specific embodiments, in conjunction with the accompanying drawings.
Embodiment one
Referring to Fig. 1, a text-to-speech processing method comprises:
S101, acquiring text information input by a user: automatically acquiring the text message the user has edited and wants to send.
S102, converting the text information into speech.
This step mainly uses existing text-to-speech (TTS) technology to convert the text into sound. On its own it only converts words into sound in a simple way, piecing plain speech together.
S103, acquiring the emotional characteristics of the text information, and reading the pre-stored feature values corresponding to those characteristics.
Keyword recognition is performed on the text, and the emotional characteristic of the message is identified from the keywords; emotional characteristics include sadness, anger, affection, happiness, and so on. The pre-stored feature values corresponding to the identified characteristic are then read from a database; these feature values describe the voice frequency, speech rate, pitch, softness, and stress of speech under that emotion.
For example, if keyword recognition on the text finds keywords related to happiness, the emotion the user wants to express is judged to be happiness, and the corresponding pre-stored feature values for happiness, such as voice frequency, speech rate, pitch, softness, and stress, are read from the database.
To improve the accuracy of the emotional characteristics, in addition to recognizing them from keywords in the text, the user can also input the intended emotion manually.
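Step S103 can be illustrated with a small keyword lookup that prefers the user's manual input over recognition, as the paragraph above describes. The keyword lists, emotion labels, and the whitespace tokenization are assumptions made for the sketch.

```python
# Hypothetical keyword-to-emotion table; real systems would use a
# richer lexicon or a classifier.
EMOTION_KEYWORDS = {
    "happy": {"great", "wonderful", "congratulations", "yay"},
    "sad":   {"sorry", "miss", "unfortunately"},
    "angry": {"furious", "unacceptable", "outrageous"},
}

def detect_emotion(text, manual_override=None):
    """Return the emotional characteristic of a message: prefer the
    user's manual input, otherwise scan for keywords; default to
    'neutral' when nothing matches."""
    if manual_override:
        return manual_override
    words = set(text.lower().split())
    for emotion, keywords in EMOTION_KEYWORDS.items():
        if words & keywords:  # any keyword present in the message
            return emotion
    return "neutral"
```

The returned label would then index the database of pre-stored feature values, as in S103.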
S104, adjusting the speech with the feature values to obtain output speech.
The feature values under each emotional characteristic, such as voice frequency, speech rate, pitch, softness, and stress, can be obtained from the database and used to apply emotion processing to the sound obtained from the plain conversion, so that the final output speech carries the corresponding emotion and the user's emotional characteristics are conveyed to the recipient.
For example, if the previous step determined that the emotion the user wants to express is happiness, the feature values extracted from the database describe the voice frequency, speech rate, pitch, softness, and stress of happy speech; applying them to the plainly converted sound makes the converted speech express a happy emotional state.
By acquiring the emotional characteristics of the text information and applying the corresponding feature values, this method converts input text into sound information with the intended emotion, enriches the characteristics of the output speech, restores the emotion the user wants to express, and improves the user experience.
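The adjustment in S104 can be shown with a deliberately crude example: applying a rate scale and a volume scale to a list of audio samples. Real systems would use proper time-stretching (e.g. PSOLA) rather than sample dropping; this sketch only shows where the pre-stored feature values plug in.

```python
def apply_features(samples, rate_scale, volume):
    """Adjust plainly synthesized audio with two feature values:
    speed it up/slow it down by resampling at a stride of
    `rate_scale`, and scale the amplitude by `volume`."""
    out = []
    pos = 0.0
    while pos < len(samples):
        out.append(samples[int(pos)] * volume)  # amplitude adjustment
        pos += rate_scale                        # >1.0 -> faster speech
    return out
```

With `rate_scale=2.0` the output has half as many samples (faster, as for an excited emotion); with `rate_scale=0.5` it has twice as many (slower, as for sadness).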
Embodiment two
Referring to Fig. 2, this embodiment provides another text-to-speech processing method, comprising:
S201, acquiring text information input by a user: automatically acquiring the text message the user has edited and wants to send.
S202, converting the text information into speech.
This step mainly uses existing text-to-speech (TTS) technology to convert the text into sound. On its own it only converts words into sound in a simple way, piecing plain speech together.
S203, acquiring the emotional characteristics of the text information, and reading the pre-stored feature values corresponding to those characteristics. This step is identical to S103 and is not repeated here.
S204, acquiring information about the user, reading the pre-stored voice data of the user, extracting timbre characteristics from the voice data, and using the timbre characteristics to perform timbre processing on the speech.
The voice data contains the user's timbre characteristics, i.e. the distinctive qualities of the user's voice: one person's voice is low and deep, another's is clear; one person speaks in a gentle tone, another in an irritable one. Different people have different timbres, and the timbre the same user exhibits also differs depending on whom they are talking to. The user's timbre characteristics therefore need to be collected and stored, preserving not only the user's own timbre but also the timbre the user shows in dialogue with different people. Using these timbre characteristics to process the plainly converted sound makes the voice conversion more targeted and personalized, tailoring the timbre processing to the style of each user.
For example, when a mother sends a message to her child, the system can learn that the text is from the mother to the child, take as reference the stored timbre characteristics the mother previously used when talking with the child, extract those characteristics, and apply them to the plainly converted sound to obtain speech with the mother's timbre.
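The per-(sender, recipient) storage that S204 describes can be sketched as a keyed lookup with fallbacks. The store keys, the profile fields (`base_pitch_hz`, `warmth`), and all values are illustrative assumptions, not the patent's data model.

```python
# Hypothetical store of timbre profiles, keyed by (sender, recipient)
# because the same user's timbre differs per conversation partner.
TIMBRE_STORE = {
    ("mother", "child"):    {"base_pitch_hz": 220.0, "warmth": 0.9},
    ("mother", "coworker"): {"base_pitch_hz": 200.0, "warmth": 0.5},
}

def lookup_timbre(sender, recipient):
    """Prefer the timbre this sender uses with this recipient; fall
    back to any stored profile of the sender; else a generic default."""
    if (sender, recipient) in TIMBRE_STORE:
        return TIMBRE_STORE[(sender, recipient)]
    for (s, _), profile in TIMBRE_STORE.items():
        if s == sender:  # sender known, but not this recipient
            return profile
    return {"base_pitch_hz": 180.0, "warmth": 0.5}
```

The mother-to-child example above corresponds to the first, exact-match branch.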
S205, adjusting the speech with the feature values to obtain output speech.
The previous step performed timbre processing on the converted sound, so that it carries the user's timbre characteristics. According to the feature values obtained in S203, which describe the voice frequency, speech rate, pitch, softness, and stress of speech under the corresponding emotion, emotion processing is then applied to the timbre-processed sound to obtain the final output speech. The output speech carries both the user's timbre and the emotion the user wants to express: the sent text is faithfully restored to the user's own voice and conveyed to the recipient with the corresponding emotional characteristics.
For example, when a mother sends a message to her child, keyword recognition on the text, or the mother's manual input, determines that the emotion to be expressed is happiness, and the feature values for happiness are extracted from the database. The system then learns that the text is from the mother to the child, takes as reference the stored timbre the mother previously used when talking with the child, extracts the corresponding timbre characteristics, and applies voice-conversion processing to the plainly converted sound to obtain speech with the mother's timbre. Finally, the feature values for happiness are applied to the speech with the mother's timbre, so that the input text is faithfully restored to the mother's happy voice and sent to the child as output speech.
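Embodiment two as a whole — plain TTS, then timbre processing, then emotion processing — can be sketched end to end under the same illustrative assumptions as the earlier snippets. Every name, field, and number here is a hypothetical stand-in.

```python
def embodiment_two(text, sender, recipient, emotion):
    """S202-S205 as one pipeline over a simple speech descriptor."""
    # S202: plain conversion to a neutral speech descriptor.
    sound = {"text": text, "pitch_scale": 1.0, "rate_scale": 1.0}
    # S203: read pre-stored feature values for the emotion.
    features = {"happy": {"pitch_scale": 1.15, "rate_scale": 1.1}}.get(emotion, {})
    # S204: timbre processing keyed on (sender, recipient), with a default.
    timbre = {("mother", "child"): {"base_pitch_hz": 220.0}}.get(
        (sender, recipient), {"base_pitch_hz": 180.0})
    sound.update(timbre)
    # S205: emotion processing applied on top of the timbre-processed sound.
    sound.update(features)
    return sound
```

The ordering matters: timbre fields are merged first, then emotion fields, matching the patent's sequence of S204 before S205.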
By acquiring the emotional characteristics of the text information and applying the corresponding feature values, the present invention converts input text into sound information with the intended emotion, enriching the characteristics of the output speech. In addition, the present invention performs timbre processing on the converted voice information: timbre characteristics are extracted from the user's voice data and used to process the speech, so that the conversion can be personalized for each user. The emotion the user wants to express is restored to a much greater degree, and the user experience is better.
Embodiment three
Referring to Fig. 3, this embodiment provides a text-to-speech processing system, comprising:
101, a first acquiring unit, for acquiring text information input by a user: the first acquiring unit automatically acquires the text message the user has edited and wants to send.
102, a converting unit, for converting the text information into speech, mainly using existing text-to-speech (TTS) technology. On its own this step only converts words into sound in a simple way, piecing plain speech together.
103, a second acquiring unit, for acquiring the emotional characteristics of the text information and reading the pre-stored feature values corresponding to those characteristics. The second acquiring unit 103 comprises:
1031, a recognition acquiring unit, for identifying keywords in the text information and acquiring the emotional characteristics corresponding to the keywords;
1032, a direct acquiring unit, for acquiring emotional characteristics, corresponding to the text information, that the user inputs directly.
The recognition acquiring unit performs keyword recognition on the text and identifies the emotional characteristic from the keywords. To improve accuracy, the direct acquiring unit can also directly acquire an emotional characteristic that the user inputs manually. Emotional characteristics include sadness, anger, affection, happiness, and so on. The pre-stored feature values corresponding to the identified characteristic, describing the voice frequency, speech rate, pitch, softness, and stress of speech under that emotion, are then read from a database.
For example, if the recognition acquiring unit finds keywords related to happiness in the text, the emotion the user wants to express is judged to be happiness, and the pre-stored feature values for happy speech, such as voice frequency, speech rate, pitch, softness, and stress, are read from the database.
104, an emotion processing unit, for adjusting the speech with the feature values to obtain output speech.
The feature values under each emotional characteristic, such as voice frequency, speech rate, pitch, softness, and stress, can be obtained from the database and used to apply emotion processing to the plainly converted sound, so that the final output speech carries the corresponding emotion and the user's emotional characteristics are conveyed to the recipient.
For example, when the second acquiring unit extracts from the database the feature values for happiness, such as voice frequency, speech rate, pitch, softness, and stress, applying them to the plainly converted sound makes the converted speech express a happy emotional state.
By acquiring the emotional characteristics of the text information and applying the corresponding feature values, this system converts input text into sound information with the intended emotion, enriches the characteristics of the output speech, restores the emotion the user wants to express, and improves the user experience.
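The data flow between the units of Fig. 3 can be mirrored with one small class per unit. Only the wiring follows the patent's description; the placeholder logic inside each class (the keyword check, the feature values) is an assumption for illustration.

```python
class Converter:
    """Converting unit 102: plain text-to-speech stand-in."""
    def convert(self, text):
        return {"text": text, "pitch_scale": 1.0, "rate_scale": 1.0}

class EmotionAcquirer:
    """Second acquiring unit 103: keyword recognition plus a
    pre-stored feature-value table (values are illustrative)."""
    FEATURES = {"happy": {"pitch_scale": 1.15, "rate_scale": 1.1}}
    def acquire(self, text):
        emotion = "happy" if "congratulations" in text.lower() else "neutral"
        return self.FEATURES.get(emotion, {})

class EmotionProcessor:
    """Emotion processing unit 104: adjust the sound with feature values."""
    def process(self, sound, features):
        return {**sound, **features}

def run_system(text):
    # First acquiring unit 101 (fetching the user's message) is
    # elided; `text` stands in for its output.
    sound = Converter().convert(text)
    features = EmotionAcquirer().acquire(text)
    return EmotionProcessor().process(sound, features)
```

A neutral message passes through unchanged; a message with a recognized keyword comes out with the emotion's feature values applied.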
Embodiment four
Referring to Fig. 4, this embodiment provides another text-to-speech processing system, comprising:
201, a first acquiring unit, for acquiring text information input by a user: the first acquiring unit automatically acquires the text message the user has edited and wants to send.
202, a converting unit, for converting the text information into speech, mainly using existing text-to-speech (TTS) technology. On its own this step only converts words into sound in a simple way, piecing plain speech together.
203, a second acquiring unit, for acquiring the emotional characteristics of the text information and reading the pre-stored feature values corresponding to those characteristics. The second acquiring unit 203 comprises:
2031, a recognition acquiring unit, for identifying keywords in the text information and acquiring the emotional characteristics corresponding to the keywords;
2032, a direct acquiring unit, for acquiring emotional characteristics, corresponding to the text information, that the user inputs directly.
The recognition acquiring unit performs keyword recognition on the text and identifies the emotional characteristic from the keywords; emotional characteristics include sadness, anger, affection, happiness, and so on. The pre-stored feature values corresponding to the identified characteristic, describing the voice frequency, speech rate, pitch, softness, and stress of speech under that emotion, are then read from a database.
For example, if the recognition acquiring unit finds keywords related to happiness in the text, the emotion the user wants to express is judged to be happiness, and the pre-stored feature values for happy speech, such as voice frequency, speech rate, pitch, softness, and stress, are read from the database.
To improve the accuracy of the emotional characteristics, the direct acquiring unit can also directly acquire an emotional characteristic that the user inputs manually.
204, a timbre processing unit, for performing timbre processing on the speech. The timbre processing unit 204 comprises:
2041, a third acquiring unit, for acquiring information about the user, reading the pre-stored voice data of the user, and extracting timbre characteristics from the voice data.
The voice data contains the user's timbre characteristics, i.e. the distinctive qualities of the user's voice: one person's voice is low and deep, another's is clear; one person speaks in a gentle tone, another in an irritable one. Different people have different timbres, and the timbre of the same user also differs depending on whom they are talking to. The user's timbre characteristics therefore need to be collected and stored, preserving not only the user's own timbre but also the timbre the user shows in dialogue with different people.
2042, a processing unit, for using the timbre characteristics to perform timbre processing on the speech.
Using the timbre characteristics to refine the plainly converted sound makes the voice conversion more targeted and personalized, tailoring the processing to the style of each user.
For example, when a mother sends a message to her child, the system can learn that the text is from the mother to the child, take as reference the stored timbre the mother previously used when talking with the child, extract the corresponding timbre characteristics, and apply voice-conversion processing to the plainly converted sound to obtain speech with the mother's timbre.
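The processing unit 2042 can be sketched as a merge of a stored timbre profile into the speech descriptor, leaving any emotion fields for the later emotion processing unit. The profile fields (`base_pitch_hz`, `warmth`) and defaults are assumptions for the sketch.

```python
def apply_timbre(sound, profile):
    """Processing-unit stand-in: return a copy of the speech
    descriptor with the speaker's timbre fields merged in. Fields
    not in the profile fall back to generic defaults."""
    out = dict(sound)  # do not mutate the caller's descriptor
    out["base_pitch_hz"] = profile.get("base_pitch_hz", 180.0)
    out["warmth"] = profile.get("warmth", 0.5)
    return out
```

Because the function copies its input, the same plainly converted sound could be re-rendered with different timbre profiles for different recipients.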
205, an emotion processing unit, for adjusting the speech with the feature values to obtain output speech.
After the timbre processing unit, the converted sound carries the user's timbre characteristics. According to the feature values obtained by the second acquiring unit, which describe the voice frequency, speech rate, pitch, softness, and stress of speech under the corresponding emotion, emotion processing is then applied to the timbre-processed sound to obtain the final output speech. The output speech carries both the user's timbre and the emotion the user wants to express: the sent text is faithfully restored to the user's own voice and conveyed to the recipient with the corresponding emotional characteristics.
For example, when a mother sends a message to her child, keyword recognition on the text, or the mother's manual input, determines that the emotion to be expressed is happiness, and the feature values for happiness are extracted from the database. The system then learns that the text is from the mother to the child, takes as reference the stored timbre the mother previously used when talking with the child, extracts the corresponding timbre characteristics, and applies voice-conversion processing to the plainly converted sound to obtain speech with the mother's timbre. Finally, the feature values for happiness are applied to the speech with the mother's timbre, so that the input text is faithfully restored to the mother's happy voice and sent to the child as output speech.
206, a storage unit, for storing the voice data of the user and the feature values corresponding to the emotional characteristics.
The present invention uses the feature values corresponding to emotional characteristics to convert input text into sound information with those characteristics, enriching the characteristics of the output speech. In addition, the present invention performs timbre processing on the converted voice information: timbre characteristics are extracted from the user's voice data and used for further processing, so that the processing can be personalized for each user. The emotion the user wants to express is restored to a much greater degree, and the user experience is better.
The technical principles of the embodiments of the present invention have been described above in conjunction with specific embodiments. These descriptions are intended only to explain the principles of the embodiments and shall not be construed in any way as limiting their scope of protection. Other embodiments that those skilled in the art can derive from the embodiments of the present invention without creative effort shall all fall within the scope of protection of the present invention.
Claims (10)
1. A text-to-speech processing method, characterized by comprising:
acquiring text information input by a user;
converting the text information into speech;
acquiring emotional characteristics of the text information, and reading pre-stored feature values corresponding to the emotional characteristics;
adjusting the speech with the feature values to obtain output speech;
wherein acquiring the emotional characteristics of the text information comprises:
identifying keywords in the text information and acquiring emotional characteristics corresponding to the keywords; or
acquiring emotional characteristics, corresponding to the text information, input by the user.
2. The processing method according to claim 1, characterized in that after the text information is converted into speech and before the speech is adjusted with the feature values, the method further comprises performing timbre processing on the speech.
3. The processing method according to claim 2, characterized in that performing timbre processing on the speech comprises: acquiring information about the user, reading pre-stored voice data of the user, extracting timbre characteristics from the voice data, and using the timbre characteristics to perform timbre processing on the speech.
4. The processing method according to claim 3, characterized in that before the timbre characteristics are used to perform timbre processing on the speech, the method further comprises storing the voice data of the user;
and before the feature values corresponding to the emotional characteristics are read, the method further comprises storing the feature values corresponding to the emotional characteristics.
5. The processing method according to claim 4, characterized in that the emotional characteristics comprise sadness, anger, affection, and happiness;
and the feature values comprise voice frequency, pitch, speech rate, softness, and stress.
6. A text-to-speech processing system, characterized by comprising:
a first acquiring unit, for acquiring text information input by a user;
a converting unit, for converting the text information into speech;
a second acquiring unit, for acquiring emotional characteristics of the text information and reading pre-stored feature values corresponding to the emotional characteristics; and
an emotion processing unit, for adjusting the speech with the feature values to obtain output speech.
7. The processing system according to claim 6, characterized in that the second acquiring unit comprises:
a recognition acquiring unit, for identifying keywords in the text information and acquiring emotional characteristics corresponding to the keywords; and
a direct acquiring unit, for acquiring emotional characteristics, corresponding to the text information, input by the user.
8. The processing system according to claim 6, characterized by further comprising a timbre processing unit, for performing timbre processing on the speech after the converting unit converts the text information into speech.
9. The processing system according to claim 8, characterized in that the timbre processing unit comprises:
a third acquiring unit, for acquiring information about the user, reading pre-stored voice data of the user, and extracting timbre characteristics from the voice data; and
a processing unit, for using the timbre characteristics to perform timbre processing on the speech.
10. The processing system according to claim 9, characterized by further comprising:
a storage unit, for storing the voice data of the user and the feature values corresponding to the emotional characteristics.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510741753.2A CN105280179A (en) | 2015-11-02 | 2015-11-02 | Text-to-speech processing method and system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510741753.2A CN105280179A (en) | 2015-11-02 | 2015-11-02 | Text-to-speech processing method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105280179A true CN105280179A (en) | 2016-01-27 |
Family
ID=55149072
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510741753.2A Pending CN105280179A (en) | 2015-11-02 | 2015-11-02 | Text-to-speech processing method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105280179A (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH04199098A (en) * | 1990-11-29 | 1992-07-20 | Meidensha Corp | Rule-based speech synthesis device |
JP2007011308A (en) * | 2005-05-30 | 2007-01-18 | Kyocera Corp | Document display device and document reading method |
CN101064104A (en) * | 2006-04-24 | 2007-10-31 | 中国科学院自动化研究所 | Emotional speech generation method based on voice conversion |
CN102385858A (en) * | 2010-08-31 | 2012-03-21 | 国际商业机器公司 | Emotional voice synthesis method and system |
CN103543979A (en) * | 2012-07-17 | 2014-01-29 | 联想(北京)有限公司 | Voice outputting method, voice interaction method and electronic device |
US20140067397A1 (en) * | 2012-08-29 | 2014-03-06 | Nuance Communications, Inc. | Using emoticons for contextual text-to-speech expressivity |
CN103761963A (en) * | 2014-02-18 | 2014-04-30 | 大陆汽车投资(上海)有限公司 | Method for processing text containing emotion information |
CN104900226A (en) * | 2014-03-03 | 2015-09-09 | 联想(北京)有限公司 | Information processing method and device |
Cited By (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106571136A (en) * | 2016-10-28 | 2017-04-19 | 努比亚技术有限公司 | Voice output device and method |
CN106791913A (en) * | 2016-12-30 | 2017-05-31 | 深圳市九洲电器有限公司 | Digital television program simultaneous interpretation output method and system |
CN109417504A (en) * | 2017-04-07 | 2019-03-01 | 微软技术许可有限责任公司 | Voice forwarding in automatic chatting |
US11233756B2 (en) | 2017-04-07 | 2022-01-25 | Microsoft Technology Licensing, Llc | Voice forwarding in automated chatting |
CN107705783A (en) * | 2017-11-27 | 2018-02-16 | 北京搜狗科技发展有限公司 | Speech synthesis method and device |
CN108364658A (en) * | 2018-03-21 | 2018-08-03 | 冯键能 | Network chat method and server |
WO2019218773A1 (en) * | 2018-05-15 | 2019-11-21 | 中兴通讯股份有限公司 | Voice synthesis method and device, storage medium, and electronic device |
CN110867177A (en) * | 2018-08-16 | 2020-03-06 | 林其禹 | Voice playing system with selectable timbre, playing method thereof and readable recording medium |
CN109634552A (en) * | 2018-12-17 | 2019-04-16 | 广东小天才科技有限公司 | Input control method applied to dictation, and terminal device |
CN109712604A (en) * | 2018-12-26 | 2019-05-03 | 广州灵聚信息科技有限公司 | Emotional speech synthesis control method and device |
CN109754779A (en) * | 2019-01-14 | 2019-05-14 | 出门问问信息科技有限公司 | Controllable emotional speech synthesis method and device, electronic device, and readable storage medium |
CN109658917A (en) * | 2019-01-17 | 2019-04-19 | 深圳壹账通智能科技有限公司 | E-book read-aloud method and apparatus, computer device, and storage medium |
CN111883098A (en) * | 2020-07-15 | 2020-11-03 | 青岛海尔科技有限公司 | Voice processing method and device, computer readable storage medium and electronic device |
CN111883098B (en) * | 2020-07-15 | 2023-10-24 | 青岛海尔科技有限公司 | Speech processing method and device, computer readable storage medium and electronic device |
CN111966257A (en) * | 2020-08-25 | 2020-11-20 | 维沃移动通信有限公司 | Information processing method and device and electronic equipment |
CN114420086A (en) * | 2022-03-30 | 2022-04-29 | 北京沃丰时代数据科技有限公司 | Speech synthesis method and device |
CN114420086B (en) * | 2022-03-30 | 2022-06-17 | 北京沃丰时代数据科技有限公司 | Speech synthesis method and device |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN105280179A (en) | Text-to-speech processing method and system | |
KR101513888B1 (en) | Apparatus and method for generating multimedia email | |
CN111477216B (en) | Training method and system for voice and meaning understanding model of conversation robot | |
CN101694772B (en) | Method for converting text into rap music and device thereof | |
CN108364632B (en) | Emotional Chinese text voice synthesis method | |
CN110751943A (en) | Voice emotion recognition method and device and related equipment | |
CN102903361A (en) | Instant call translation system and instant call translation method | |
KR20090085376A (en) | Service method and apparatus for using speech synthesis of text message | |
KR101628050B1 (en) | Animation system for reproducing text-based data as animation | |
CN104202455A (en) | Intelligent voice dialing method and intelligent voice dialing device | |
CN108536655A (en) | Display-based read-aloud audio production method and system using a handheld intelligent terminal | |
CN103761963A (en) | Method for processing text containing emotion information | |
CN104142936A (en) | Audio and video match method and audio and video match device | |
KR20150017662A (en) | Method, apparatus and storing medium for text to speech conversion | |
US20020169610A1 (en) | Method and system for automatically converting text messages into voice messages | |
CN109346057A (en) | Speech processing system for an intelligent children's toy | |
CN109492126B (en) | Intelligent interaction method and device | |
CN110767233A (en) | Voice conversion system and method | |
CN112349266A (en) | Voice editing method and related equipment | |
EP3113175A1 (en) | Method for converting text to individual speech, and apparatus for converting text to individual speech | |
US20170221481A1 (en) | Data structure, interactive voice response device, and electronic device | |
CN106708789A (en) | Text processing method and device | |
KR20080037402A (en) | Method for creating a conference record file in a mobile terminal | |
CN103516582A (en) | Method and system for conducting information prompt in instant messaging | |
CN113724690B (en) | PPG feature output method, target audio output method and device |
Legal Events
Date | Code | Title | Description
---|---|---|---|
| C06 | Publication | |
| PB01 | Publication | |
| C10 | Entry into substantive examination | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20160127 |