CN108615524A - Speech synthesis method, system and terminal device - Google Patents
Speech synthesis method, system and terminal device
- Publication number
- CN108615524A CN108615524A CN201810456213.3A CN201810456213A CN108615524A CN 108615524 A CN108615524 A CN 108615524A CN 201810456213 A CN201810456213 A CN 201810456213A CN 108615524 A CN108615524 A CN 108615524A
- Authority
- CN
- China
- Prior art keywords
- sentence
- data
- feature words
- sound
- tone feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 53
- 230000002194 synthesizing effect Effects 0.000 title claims abstract description 16
- 230000008451 emotion Effects 0.000 claims abstract description 69
- 230000015572 biosynthetic process Effects 0.000 claims abstract description 40
- 238000003786 synthesis reaction Methods 0.000 claims abstract description 40
- 239000000284 extract Substances 0.000 claims abstract description 13
- 238000004458 analytical method Methods 0.000 claims description 25
- 238000004590 computer program Methods 0.000 claims description 20
- 230000033764 rhythmic process Effects 0.000 claims description 6
- 230000001419 dependent effect Effects 0.000 claims description 3
- 238000010168 coupling process Methods 0.000 abstract description 8
- 238000005859 coupling reaction Methods 0.000 abstract description 8
- 230000008878 coupling Effects 0.000 abstract description 7
- 238000012545 processing Methods 0.000 abstract description 6
- 238000010586 diagram Methods 0.000 description 10
- 230000006870 function Effects 0.000 description 10
- 230000008569 process Effects 0.000 description 8
- 230000007935 neutral effect Effects 0.000 description 5
- 238000011109 contamination Methods 0.000 description 4
- 230000036651 mood Effects 0.000 description 4
- 238000005516 engineering process Methods 0.000 description 3
- 238000004891 communication Methods 0.000 description 2
- 238000013461 design Methods 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 230000005611 electricity Effects 0.000 description 2
- 230000007246 mechanism Effects 0.000 description 2
- 238000013528 artificial neural network Methods 0.000 description 1
- 238000004422 calculation algorithm Methods 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 230000035772 mutation Effects 0.000 description 1
- 238000013139 quantization Methods 0.000 description 1
- 238000012549 training Methods 0.000 description 1
- 230000007704 transition Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
- G10L13/10—Prosody rules derived from text; Stress or intonation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/04—Details of speech synthesis systems, e.g. synthesiser structure or memory management
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
- G10L13/10—Prosody rules derived from text; Stress or intonation
- G10L2013/105—Duration
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Machine Translation (AREA)
Abstract
The present invention is applicable to the technical field of data processing and provides a speech synthesis method, system and terminal device, including: obtaining text data, splitting it into sentences, extracting tone feature words, and analyzing the emotion attribute of each sentence according to the tone feature words; synthesizing the basic speech data of each sentence according to its emotion attribute, based on a preset speech database and a preset voice pronunciation model; and performing prosodic feature adjustment on the basic speech data of each sentence according to the tone feature words to obtain target speech data. By extracting the tone feature words of every sentence in the text data to analyze the emotion attribute of each sentence, synthesizing basic speech data adjusted by the preset voice pronunciation model in combination with the emotion attribute of the sentence, and then performing prosodic feature adjustment on the basic speech data, target speech data with a higher degree of human-likeness is obtained. The pronunciation of emotionally inclined words is richer in emotion and closer to the way an actual user speaks, which effectively improves the quality of the synthesized speech data.
Description
Technical field
The invention belongs to the technical field of data processing, and in particular relates to a speech synthesis method, system and terminal device.
Background technology
An audiobook is a work recorded from a manuscript by one or more people, using different voices and recording formats. Audiobooks currently on the market are all recorded manually in advance, stored, and played back directly when needed, which consumes a large amount of human resources. To save labor cost, speech data can instead be synthesized by speech synthesis technology. Speech synthesis refers to generating artificial speech by mechanical or electronic means; it is the technology of converting text information, generated by a computer itself or supplied as external input, into intelligible, audible speech. When synthesizing speech, current speech synthesis techniques first analyze the text data to obtain its words, then fetch the corresponding basic speech data for those words from a speech library, and finally combine the fetched basic speech data in order to obtain the final speech data. The speech data obtained in this way is not very human-like, and its quality is therefore poor.
In summary, the speech data synthesized by existing speech synthesis technology is of poor quality.
Summary of the invention
In view of this, the embodiments of the present invention provide a speech synthesis method, system and terminal device, to solve the problem that the speech data synthesized by existing speech synthesis technology is of poor quality.
The first aspect of the present invention provides a speech synthesis method, including:
obtaining text data, splitting it into sentences, extracting tone feature words, and analyzing the emotion attribute of each sentence according to the tone feature words;
synthesizing the basic speech data of each sentence according to the emotion attribute of each sentence, based on a preset speech database and a preset voice pronunciation model;
performing prosodic feature adjustment on the basic speech data of each sentence according to the tone feature words to obtain target speech data.
The second aspect of the present invention provides a speech synthesis system, including:
a sentiment analysis module, configured to obtain text data, split it into sentences, extract tone feature words, and analyze the emotion attribute of each sentence according to the tone feature words;
a speech synthesis module, configured to synthesize the basic speech data of each sentence according to the emotion attribute of each sentence, based on a preset speech database and a preset voice pronunciation model;
a speech adjustment module, configured to perform prosodic feature adjustment on the basic speech data of each sentence according to the tone feature words to obtain target speech data.
The third aspect of the present invention provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer program:
obtaining text data, splitting it into sentences, extracting tone feature words, and analyzing the emotion attribute of each sentence according to the tone feature words;
synthesizing the basic speech data of each sentence according to the emotion attribute of each sentence, based on a preset speech database and a preset voice pronunciation model;
performing prosodic feature adjustment on the basic speech data of each sentence according to the tone feature words to obtain target speech data.
The fourth aspect of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the following steps:
obtaining text data, splitting it into sentences, extracting tone feature words, and analyzing the emotion attribute of each sentence according to the tone feature words;
synthesizing the basic speech data of each sentence according to the emotion attribute of each sentence, based on a preset speech database and a preset voice pronunciation model;
performing prosodic feature adjustment on the basic speech data of each sentence according to the tone feature words to obtain target speech data.
In the speech synthesis method, system and terminal device provided by the present invention, the tone feature words of every sentence in the text data are extracted to analyze the emotion attribute of each sentence, basic speech data is synthesized and adjusted by the preset voice pronunciation model in combination with the emotion attribute of each sentence, and prosodic feature adjustment is then performed on the basic speech data to obtain target speech data with a higher degree of human-likeness. The pronunciation of emotionally inclined words is richer in emotion and closer to the way an actual user speaks, which effectively improves the quality of the synthesized speech data and solves the problem that the speech data synthesized by existing speech synthesis technology is of poor quality.
Description of the drawings
In order to describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a schematic flowchart of a speech synthesis method provided by Embodiment 1 of the present invention;
Fig. 2 is a schematic flowchart of step S101 of Embodiment 1, provided by Embodiment 2 of the present invention;
Fig. 3 is a schematic flowchart of step S102 of Embodiment 1, provided by Embodiment 3 of the present invention;
Fig. 4 is a schematic flowchart of step S103 of Embodiment 1, provided by Embodiment 4 of the present invention;
Fig. 5 is a schematic structural diagram of a speech synthesis system provided by Embodiment 5 of the present invention;
Fig. 6 is a schematic structural diagram of the sentiment analysis module 101 of Embodiment 5, provided by Embodiment 6 of the present invention;
Fig. 7 is a schematic structural diagram of the speech synthesis module 102 of Embodiment 5, provided by Embodiment 7 of the present invention;
Fig. 8 is a schematic structural diagram of the speech adjustment module 103 of Embodiment 5, provided by Embodiment 8 of the present invention;
Fig. 9 is a schematic diagram of the terminal device provided by Embodiment 9 of the present invention.
Specific implementation mode
In the following description, for the purpose of illustration rather than limitation, specific details such as particular system structures and techniques are set forth in order to provide a thorough understanding of the embodiments of the present invention. However, it will be apparent to those skilled in the art that the present invention can also be implemented in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits and methods are omitted so that unnecessary detail does not obscure the description of the present invention.
To solve the problem that the speech data synthesized by existing speech synthesis technology is of poor quality, the embodiments of the present invention provide a speech synthesis method, system and terminal device. The tone feature words of every sentence in the text data are extracted to analyze the emotion attribute of each sentence, basic speech data is synthesized and adjusted by the preset voice pronunciation model in combination with the emotion attribute of each sentence, and prosodic feature adjustment is then performed on the basic speech data to obtain target speech data with a higher degree of human-likeness. The pronunciation of emotionally inclined words is richer in emotion and closer to the way an actual user speaks, which effectively improves the quality of the synthesized speech data and solves the problem that the speech data synthesized by existing speech synthesis technology is of poor quality.
In order to illustrate the technical solutions of the present invention, specific embodiments are described below.
Embodiment one:
As shown in Fig. 1, this embodiment provides a speech synthesis method, which specifically includes:
Step S101: obtaining text data, splitting it into sentences, extracting tone feature words, and analyzing the emotion attribute of each sentence according to the tone feature words.
In a specific application, text data containing text information is obtained by the terminal. The format of the text data may be plain text (txt), Rich Text Format (RTF) or a document (DOC); it may also be a file containing text information, such as a Portable Document Format (PDF) file or a picture, in which case the PDF or picture is first converted into a file from which the text data can be read directly. This is not limited here.
In a specific application, after the text data is obtained, the tone feature words in each sentence are extracted sentence by sentence. Tone feature words are words, symbols or word combinations that carry emotion, such as "happy", "heavens", "excellent" and "good", together with the punctuation marks that express mood and tone. Because tone feature words reflect the emotional inclination of the user, they carry distinct prosodic features when pronounced. Therefore the tone feature words in each sentence are extracted, and the emotion attribute of each sentence is analyzed according to the tone feature words.
In a specific application, a tone feature word database is preset, and the tone feature words in each sentence that match this database are extracted according to it. When expressing a mood, a user may use a combination of words. In order to enrich the tone feature word database and extract the tone feature words in each sentence accurately, combination rules for words are set according to grammatical rules, and when tone feature words are extracted, the words that satisfy a combination rule are extracted together as one tone feature word.
Exemplarily, the above combination rules include, but are not limited to, the following (a rule-matching sketch is given after the list):
A: degree adverb + emotion word, e.g. "rather + good", "very + good", "especially + good";
B: negation word + emotion word, e.g. "not + good", "not + bad";
C: negation word + degree adverb + emotion word, e.g. "not + too + good", "not + too + bad";
D: degree adverb + negation word + emotion word, e.g. "very + not + good", "also + not + bad".
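A minimal sketch of such rule-based extraction is shown below, assuming tiny illustrative word lists for the degree adverbs, negation words and emotion words; the patent does not prescribe concrete lexicons, so the lists and the greedy longest-match strategy here are assumptions for illustration only.

```python
# Hypothetical sketch of rule-based tone-feature-word extraction following
# rules A-D above; the word lists are small illustrative assumptions.
DEGREE_ADVERBS = {"rather", "very", "especially", "too", "also"}
NEGATION_WORDS = {"not", "no"}
EMOTION_WORDS = {"good", "bad", "happy", "sad", "excellent"}

def extract_tone_feature_words(tokens):
    """Merge adjacent tokens matching rules A-D into single tone feature words."""
    features, i = [], 0
    while i < len(tokens):
        for span in (3, 2, 1):  # try the longest pattern ending in an emotion word
            chunk = tokens[i:i + span]
            if (len(chunk) == span and chunk[-1] in EMOTION_WORDS
                    and all(t in DEGREE_ADVERBS | NEGATION_WORDS for t in chunk[:-1])):
                features.append(" ".join(chunk))
                i += span
                break
        else:
            i += 1
    return features

# extract_tone_feature_words("the weather is not too good".split()) -> ["not too good"]
```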
In a specific application, in order to guarantee the speech synthesis effect of every sentence, this embodiment extracts tone feature words sentence by sentence, and the emotion attribute of each sentence is analyzed from the tone feature words of that sentence.
In a specific application, each sentence of the text data is first segmented into multiple word combinations, and the segmented word combinations are divided into neutral words and tone feature words, where the tone feature words include positive words and negative words. The emotion attribute of the sentence can be obtained from the graded proportions of the neutral words, positive words and negative words within the sentence.
Step S102: synthesizing the basic speech data of each sentence according to the emotion attribute of each sentence, based on a preset speech database and a preset voice pronunciation model.
In a specific application, for a sentence that has been segmented into multiple word combinations, the speech data of each word is obtained from the preset speech database word by word, and the speech data of the multiple words is combined to obtain the speech data of the whole sentence.
In a specific application, after the speech data of the whole sentence is obtained, the acoustic features of the speech data are adjusted according to the emotion attribute of the sentence based on the preset voice pronunciation model, to obtain basic speech data matching the emotion attribute of the sentence, so that the pronunciation is closer to that of an actual user. In a specific application, the above acoustic features include sound intensity, speaking rate and pitch.
In a specific application, the preset voice pronunciation model is built as follows: speech data of a large number of actual users is collected as training samples, the speech data is labelled with an emotion attribute sentence by sentence, and a neural network is trained on the labelled data to obtain the acoustic features of the pronunciation corresponding to each emotion attribute.
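As a loose illustration of what such a model ultimately provides (a mapping from an emotion attribute to acoustic-feature parameters), the sketch below simply averages labelled samples per emotion. The embodiment itself describes training a neural network; the averaging and the "pitch"/"loudness"/"rate" field names here are simplifying assumptions, not the claimed training procedure.

```python
# Minimal stand-in for the preset voice pronunciation model: acoustic features
# of labelled sample sentences are averaged per emotion attribute. (The patent
# describes neural-network training; this averaging is an assumed simplification.)
from collections import defaultdict
from statistics import mean

def build_pronunciation_model(samples):
    """samples: iterable of (emotion, {"pitch": ..., "loudness": ..., "rate": ...})."""
    grouped = defaultdict(list)
    for emotion, features in samples:
        grouped[emotion].append(features)
    return {emotion: {key: mean(f[key] for f in feats) for key in feats[0]}
            for emotion, feats in grouped.items()}
```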
Step S103: performing prosodic feature adjustment on the basic speech data of each sentence according to the tone feature words to obtain target speech data.
In a specific application, the basic speech data is based on the emotion attribute of the whole sentence. In order to further match the pronunciation characteristics of an actual user under the corresponding emotion, prosodic feature adjustment is then performed on the basic speech data of the whole sentence, targeted at the tone feature words.
In a specific application, the above prosodic features include loudness, pitch and duration. Loudness covers the variation in strength of the speech, such as stressed and weakened sounds; pitch covers the character tones and intonation of the speech; duration covers the tempo of the speech.
In a specific application, tone feature words with different emotional inclinations express different user emotions, and the prosodic features of speech under different moods can differ considerably; for example, the pitch when happy is noticeably higher than when sad. That is, each tone feature word corresponds to one prosodic feature (or one class of prosodic features). Therefore the prosodic features corresponding to a tone feature word are obtained first, and prosodic feature adjustment is performed on that tone feature word in the basic speech data according to those prosodic features. If a sentence contains multiple tone feature words, prosodic feature adjustment is performed on all of them, yielding speech data that better matches the pronunciation of an actual user.
In a specific application, the prosodic feature adjustment may use preset prosodic feature parameters for each class of tone feature words; for example, the prosodic feature parameters of a tone feature word expressing happiness are set to loudness 1, pitch 1 and duration 1, while those of a tone feature word expressing sadness are set to loudness 2, pitch 2 and duration 2. Alternatively, the adjustment may be a percentage change applied to the prosodic feature parameters of the basic speech data; for example, when a tone feature word expressing happiness is adjusted, the pitch corresponding to the tone feature word is raised by 10% relative to the basic speech data and its duration is shortened by 15%.
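The percentage-based variant can be illustrated with the short sketch below; the 10% pitch increase and 15% duration reduction are the example figures from the text, while the per-word segment layout is an assumption.

```python
# Illustrative sketch of the percentage-based prosodic adjustment described above.
def adjust_feature_word_prosody(segment, pitch_gain=0.10, duration_cut=0.15):
    """segment: dict holding the prosodic parameters of one tone feature word."""
    adjusted = dict(segment)
    adjusted["pitch"] = segment["pitch"] * (1 + pitch_gain)
    adjusted["duration"] = segment["duration"] * (1 - duration_cut)
    return adjusted

# adjust_feature_word_prosody({"pitch": 200.0, "duration": 0.30})
# -> {"pitch": 220.0, "duration": 0.255}
```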
In the speech synthesis method provided by this embodiment, the tone feature words of every sentence in the text data are extracted to analyze the emotion attribute of each sentence, personalized basic speech data is synthesized and adjusted by the preset voice pronunciation model in combination with the emotion attribute of the sentence, and prosodic feature adjustment is then performed on the basic speech data to obtain target speech data that is closer to the pronunciation of an actual user. The pronunciation of emotionally inclined words is richer in emotion and closer to the way an actual user speaks, which effectively improves the quality of the synthesized speech data and solves the problem that the speech data synthesized by existing speech synthesis technology is of poor quality.
Embodiment two:
As shown in Fig. 2, in this embodiment, step S101 of Embodiment 1 specifically includes:
Step S201: obtaining the sentiment analysis parameters of the sentence on multiple preset dimensions according to the tone feature words.
In a specific application, after the sentence is segmented, each word is scored on three preset dimensions such as [positive, neutral, negative], the combined scores of the sentence on these three preset dimensions are obtained, and the proportion of the total score taken by each of the three preset dimensions is then calculated. Exemplarily, after the sentence is segmented, the words can be divided into neutral words and tone feature words, and the tone feature words can further be divided into positive words and negative words. When the tone feature words are classified, each tone feature word is assigned a class and a grade score corresponding to its level.
For example, "happy" is set as a positive word with a grade score of +2;
"excellent" is set as a positive word with a grade score of +5;
"bad" is set as a negative word with a grade score of -2;
"very bad" is set as a negative word with a grade score of -5. It should be noted that the above classification and scoring of tone feature words can be realized by a classification and grading module built on a neural network structure; the specific implementation is not described in detail here.
Step S202: determining the emotion attribute of the sentence from the proportion of the total sentiment analysis parameter taken by the sentiment analysis parameter of each of the multiple preset dimensions.
In a specific application, the score of each preset dimension is calculated from the grade scores of the tone feature words, and the proportion of the total score taken by each preset dimension is then calculated. For example, if the scores of the three preset dimensions of a sentence are [+10, 4, -6], the proportions of the three dimension scores are [+0.5, 0.2, -0.3].
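The ratio computation can be sketched as follows; it simply divides each dimension's signed score by the total absolute score, which reproduces the [+10, 4, -6] to [+0.5, 0.2, -0.3] example above.

```python
# Sketch of the dimension-ratio computation described in the text.
def dimension_ratios(scores):
    total = sum(abs(s) for s in scores)
    return [s / total if total else 0.0 for s in scores]

print(dimension_ratios([10, 4, -6]))  # [0.5, 0.2, -0.3]
```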
In a specific application, a text emotion classification model based on a support vector machine is used to process the proportions of the scores of the three preset dimensions of each sentence, and the corresponding emotion attribute of the sentence is thereby determined. The text emotion classification model is trained in advance on a large amount of different text data to obtain a text emotion analysis model based on the support vector mechanism. The emotion attribute can be divided into a number of different states so as to quantify the user's emotion, and the quantified value obtained for each sentence indicates the emotion attribute of that sentence. Exemplarily, the above quantified emotion attributes include, but are not limited to: happy, sad, angry, afraid, doubtful and normal.
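A hedged sketch of such a support-vector-machine classifier using scikit-learn is shown below; the training ratio vectors and emotion labels are placeholders for illustration, not data from the patent.

```python
# Sketch of an SVM text-emotion classifier over [positive, neutral, negative] ratios.
from sklearn.svm import SVC

ratio_vectors = [[0.5, 0.2, -0.3],   # placeholder training proportions
                 [0.7, 0.2, -0.1],
                 [-0.1, 0.3, -0.6],
                 [-0.4, 0.1, -0.5]]
emotion_labels = ["happy", "happy", "sad", "angry"]  # placeholder labels

classifier = SVC(kernel="linear")
classifier.fit(ratio_vectors, emotion_labels)

print(classifier.predict([[0.4, 0.3, -0.3]]))  # e.g. ['happy']
```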
Embodiment three:
As shown in Fig. 3, in this embodiment, step S102 of Embodiment 1 specifically includes:
Step S301: obtaining the speech data corresponding to each word of the sentence from the preset speech database.
In a specific application, each sentence is segmented into multiple word combinations, and the speech data of each word is obtained from the preset speech database word by word.
Step S302: combining the speech data to obtain the electronic speech data of the sentence.
In a specific application, the speech data of the multiple words is combined to obtain the speech data of the whole sentence, i.e. the electronic speech data of the sentence.
Step S303: adjusting the pitch, loudness and speaking rate of the electronic speech data by the voice pronunciation model according to the emotion attribute of the sentence, to obtain the basic speech data of the sentence.
In a specific application, after the electronic speech data is obtained, the pitch, loudness and speaking rate of the electronic speech data are adjusted according to the emotion attribute of the sentence based on the preset voice pronunciation model, to obtain basic speech data matching the emotion attribute of the sentence, so that the pronunciation is closer to that of an actual user.
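Steps S301 to S303 can be summarized in the simplified sketch below; the dictionary-based speech database and model structures are assumptions for illustration, not the data formats used by the patent.

```python
# Simplified sketch of steps S301-S303: per-word lookup, concatenation, and
# emotion-driven adjustment of pitch, loudness and speaking rate.
def synthesize_basic_speech(words, speech_db, pronunciation_model, emotion):
    word_audio = [speech_db[w] for w in words]                 # S301: per-word data
    electronic_speech = {"words": words, "audio": word_audio}  # S302: whole sentence
    scales = pronunciation_model[emotion]                      # S303: emotion scaling
    electronic_speech.update(pitch_scale=scales["pitch"],
                             loudness_scale=scales["loudness"],
                             rate_scale=scales["rate"])
    return electronic_speech
```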
Embodiment four:
As shown in Fig. 4, in this embodiment, step S103 of Embodiment 1 specifically includes:
Step S401: obtaining the prosodic feature adjustment rules of the tone feature words in each sentence, including the adjustment rules of prosodic feature parameters such as pitch, loudness and duration.
In a specific application, the tone feature words are classified and the grades of the different classes of tone feature words are set, and the corresponding prosodic feature adjustment rule is obtained according to the class and grade of a tone feature word, namely the adjustment rule for prosodic feature parameters such as the pitch, loudness and duration of that tone feature word.
In a specific application, the prosodic feature parameters of tone feature words of different classes and grades are preset; each tone feature word is then classified and graded, and its corresponding prosodic feature parameters are obtained, thereby obtaining its corresponding prosodic feature adjustment rule.
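Such a preset table can be sketched as a simple lookup keyed by class and grade; the classes, grades and parameter values below are placeholders, not values specified by the patent.

```python
# Illustrative class-and-grade lookup for prosodic feature adjustment rules.
PROSODY_RULES = {
    ("happy", 1): {"pitch": 1.05, "loudness": 1.05, "duration": 0.95},
    ("happy", 2): {"pitch": 1.10, "loudness": 1.10, "duration": 0.85},
    ("sad", 1): {"pitch": 0.95, "loudness": 0.90, "duration": 1.10},
}

def adjustment_rule(word_class, grade):
    """Return the preset prosodic feature adjustment rule for a tone feature word."""
    return PROSODY_RULES.get((word_class, grade))
```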
Step S402: adjusting the pitch, loudness and duration of the basic speech data according to the prosodic feature adjustment rules of the tone feature words, to obtain target speech data.
In a specific application, after the prosodic feature adjustment rule corresponding to a tone feature word is obtained, the pitch, loudness and duration of each tone feature word in the basic speech data are adjusted according to that rule, and target speech data closer to the pronunciation of an actual user is obtained after the adjustment.
In a specific application, the above adjustment process may calculate the prosodic feature parameters of the tone feature words according to the prosodic feature adjustment rules and then set the prosodic features of the corresponding tone feature words in the basic speech data to those parameters, or it may adjust the basic speech data in the form of a percentage according to the prosodic feature adjustment rules; this is not limited here.
In one embodiment, the following steps are further included after the above step S402:
obtaining the prosodic feature parameters of the target speech data;
calculating the average values of the prosodic feature parameters of each sentence from the prosodic feature parameters of the target speech data;
adjusting the pitch, loudness and duration of each word of the sentence according to the average values, to obtain smoothly transitioning speech data.
In a specific application, since the above prosodic feature adjustment is applied only to the tone feature words, abrupt changes may appear in the speech, making the pronunciation of a tone feature word sound sudden and disharmonious against the pronunciation of the words immediately before and after it. In a specific application, to avoid this problem, the prosodic feature parameters of the target speech data obtained after the prosodic feature adjustment can be adjusted again with the whole sentence as the unit, so that the sentence transitions smoothly. Specifically, the average values of the prosodic feature parameters of each sentence are obtained from the prosodic feature parameters of the target speech data, and for the words adjacent to a tone feature word, the pitch, loudness and duration of those words are adjusted using the average values. In a specific application, when multiple tone feature words are adjacent to one another, only the pitch, loudness and duration of the words adjoining the first and the last tone feature word need to be adjusted.
Exemplarily, in the sentence "Shall we go to the amusement park to play this afternoon, okay?", "okay" serves as the tone feature word, so its pitch and tone are both raised, while the pitch and tone of the adjacent word "play" are not; the tone and pitch may therefore jump suddenly from "play" to "okay", making the transition sound unnatural. The average prosodic feature parameters of the whole sentence are therefore calculated from its prosodic feature parameters, and the prosodic feature parameters of "play" are adjusted to these averages, which effectively reduces the gap in loudness and tone between "play" and "okay" and achieves a smooth transition.
Embodiment five:
As shown in Fig. 5, this embodiment provides a speech synthesis system 100 for executing the method steps of Embodiment 1, which includes a sentiment analysis module 101, a speech synthesis module 102 and a speech adjustment module 103.
The sentiment analysis module 101 is configured to obtain text data, split it into sentences, extract tone feature words, and analyze the emotion attribute of each sentence according to the tone feature words.
The speech synthesis module 102 is configured to synthesize the basic speech data of each sentence according to the emotion attribute of each sentence, based on a preset speech database and a preset voice pronunciation model.
The speech adjustment module 103 is configured to perform prosodic feature adjustment on the basic speech data of each sentence according to the tone feature words to obtain target speech data.
It should be noted that, since the speech synthesis system provided by this embodiment of the present invention is based on the same conception as the method embodiment shown in Fig. 1, it brings the same technical effect as the method embodiment shown in Fig. 1; for details, reference can be made to the description in the method embodiment shown in Fig. 1, which is not repeated here.
Therefore, the speech synthesis system provided by this embodiment can likewise extract the tone feature words of every sentence in the text data to analyze the emotion attribute of each sentence, synthesize personalized basic speech data adjusted by the preset voice pronunciation model in combination with the emotion attribute of the sentence, and then perform prosodic feature adjustment on the basic speech data to obtain target speech data closer to the pronunciation of an actual user. The pronunciation of emotionally inclined words is richer in emotion and closer to the way an actual user speaks, which effectively improves the quality of the synthesized speech data and solves the problem that the speech data synthesized by existing speech synthesis technology is of poor quality.
Embodiment six:
As shown in Fig. 6, in this embodiment, the sentiment analysis module 101 of Embodiment 5 includes structures for executing the method steps of the embodiment corresponding to Fig. 2, namely a parameter obtaining unit 201 and a sentiment analysis unit 202.
The parameter obtaining unit 201 is configured to obtain the sentiment analysis parameters of the sentence on multiple preset dimensions according to the tone feature words.
The sentiment analysis unit 202 is configured to determine the emotion attribute of the sentence from the proportion of the total sentiment analysis parameter taken by the sentiment analysis parameter of each of the multiple preset dimensions.
Embodiment seven:
As shown in Fig. 7, in this embodiment, the speech synthesis module 102 of Embodiment 5 includes structures for executing the method steps of the embodiment corresponding to Fig. 3, namely a speech data obtaining unit 301, a speech data synthesis unit 302 and an acoustic feature adjustment unit 303.
The speech data obtaining unit 301 is configured to obtain the speech data corresponding to each word of the sentence from the preset speech database.
The speech data synthesis unit 302 is configured to combine the speech data to obtain the electronic speech data of the sentence.
The acoustic feature adjustment unit 303 is configured to adjust the pitch, loudness and speaking rate of the electronic speech data by the voice pronunciation model according to the emotion attribute of the sentence, to obtain the basic speech data of the sentence.
Embodiment eight:
As shown in Fig. 8, in this embodiment, the speech adjustment module 103 of Embodiment 5 includes structures for executing the method steps of the embodiment corresponding to Fig. 4, namely a prosodic feature rule obtaining unit 401 and a prosodic feature adjustment unit 402.
The prosodic feature rule obtaining unit 401 is configured to obtain the prosodic feature adjustment rules of the tone feature words in each sentence, including the adjustment rules of prosodic feature parameters such as pitch, loudness and duration.
The prosodic feature adjustment unit 402 is configured to adjust the pitch, loudness and duration of the basic speech data according to the prosodic feature adjustment rules of the tone feature words, to obtain target speech data.
In one embodiment, the above speech adjustment module 103 further includes a feature parameter obtaining unit, a calculation unit and a smooth transition adjustment unit.
The feature parameter obtaining unit is configured to obtain the prosodic feature parameters of the target speech data.
The calculation unit is configured to calculate the average values of the prosodic feature parameters of each sentence from the prosodic feature parameters of the target speech data.
The smooth transition adjustment unit is configured to adjust the pitch, loudness and duration of each word of the sentence according to the average values, to obtain smoothly transitioning speech data.
Embodiment nine:
Fig. 9 is a schematic diagram of the terminal device provided by Embodiment 9 of the present invention. As shown in Fig. 9, the terminal device 9 of this embodiment includes a processor 90, a memory 91, and a computer program 92 stored in the memory 91 and executable on the processor 90, such as a program. When executing the computer program 92, the processor 90 implements the steps in each of the above speech synthesis method embodiments, such as steps S101 to S103 shown in Fig. 1; alternatively, when executing the computer program 92, the processor 90 implements the functions of the modules/units in each of the above system embodiments, such as the functions of modules 101 to 103 shown in Fig. 5.
Exemplarily, the computer program 92 can be divided into one or more modules/units, and the one or more modules/units are stored in the memory 91 and executed by the processor 90 to carry out the present invention. The one or more modules/units can be a series of computer program instruction segments capable of completing specific functions, and these instruction segments are used to describe the execution process of the computer program 92 in the terminal device 9. For example, the computer program 92 can be divided into a sentiment analysis module, a speech synthesis module and a speech adjustment module, whose specific functions are as follows:
the sentiment analysis module is configured to obtain text data, split it into sentences, extract tone feature words, and analyze the emotion attribute of each sentence according to the tone feature words;
the speech synthesis module is configured to synthesize the basic speech data of each sentence according to the emotion attribute of each sentence, based on a preset speech database and a preset voice pronunciation model;
the speech adjustment module is configured to perform prosodic feature adjustment on the basic speech data of each sentence according to the tone feature words to obtain target speech data.
The terminal device 9 can be a computing device such as a desktop computer, a notebook computer, a palmtop computer or a cloud management server. The terminal device may include, but is not limited to, the processor 90 and the memory 91. Those skilled in the art will understand that Fig. 9 is only an example of the terminal device 9 and does not constitute a limitation on the terminal device 9; it may include more or fewer components than shown, combine certain components, or use different components; for example, the terminal device may further include input and output devices, network access devices, buses and the like.
The so-called processor 90 can be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor can be a microprocessor, or the processor can be any conventional processor.
The memory 91 can be an internal storage unit of the terminal device 9, such as a hard disk or internal memory of the terminal device 9. The memory 91 can also be an external storage device of the terminal device 9, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card equipped on the terminal device 9. Further, the memory 91 can include both the internal storage unit of the terminal device 9 and an external storage device. The memory 91 is used to store the computer program and other programs and data required by the terminal device. The memory 91 can also be used to temporarily store data that has been output or is to be output.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the division of the above functional units and modules is only used as an example for illustration. In practical applications, the above functions can be allocated to different functional units and modules as needed; that is, the internal structure of the system can be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments can be integrated in one processing unit, or each unit can physically exist alone, or two or more units can be integrated in one unit. The above integrated unit can be realized either in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only used to distinguish them from one another and are not used to limit the protection scope of this application. For the specific working process of the units and modules in the above terminal, reference can be made to the corresponding processes in the foregoing method embodiments, which is not repeated here.
In the above embodiments, the description of each embodiment has its own emphasis; for parts that are not detailed or recorded in a certain embodiment, reference can be made to the relevant descriptions of the other embodiments.
Those of ordinary skill in the art will realize that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are executed in hardware or software depends on the specific application and the design constraints of the technical solution. Professionals may use different methods to implement the described functions for each specific application, but such implementations should not be regarded as going beyond the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed system/terminal device and method can be implemented in other ways. For example, the system/terminal device embodiments described above are only illustrative; the division of the modules or units is only a logical function division, and there can be other division manners in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed can be indirect coupling or communication connection through some interfaces, systems or units, and can be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention can be integrated in one processing unit, or each unit can physically exist alone, or two or more units can be integrated in one unit. The above integrated unit can be realized either in the form of hardware or in the form of a software functional unit.
If the integrated module/unit is realized in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, all or part of the flows in the above method embodiments of the present invention can also be completed by instructing relevant hardware through a computer program. The computer program can be stored in a computer-readable storage medium, and when executed by a processor, the computer program can realize the steps of each of the above method embodiments. The computer program includes computer program code, which can be in the form of source code, object code, an executable file, some intermediate form, etc. The computer-readable medium may include: any entity or system capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electric carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium can be appropriately increased or decreased according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electric carrier signals and telecommunication signals.
The above embodiments are only used to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions recorded in the foregoing embodiments or replace some of the technical features with equivalents; and these modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and should all be included within the protection scope of the present invention.
Claims (10)
1. A speech synthesis method, characterized by comprising:
obtaining text data, splitting it into sentences, extracting tone feature words, and analyzing the emotion attribute of each sentence according to the tone feature words;
synthesizing the basic speech data of each sentence according to the emotion attribute of each sentence, based on a preset speech database and a preset voice pronunciation model;
performing prosodic feature adjustment on the basic speech data of each sentence according to the tone feature words to obtain target speech data.
2. The speech synthesis method according to claim 1, characterized in that analyzing the emotion attribute of each sentence according to the tone feature words comprises:
obtaining the sentiment analysis parameters of the sentence on multiple preset dimensions according to the tone feature words;
determining the emotion attribute of the sentence from the proportion of the total sentiment analysis parameter taken by the sentiment analysis parameter of each of the multiple preset dimensions.
3. The speech synthesis method according to claim 1, characterized in that synthesizing the basic speech data of each sentence according to the emotion attribute of each sentence, based on a preset speech database and a preset voice pronunciation model, comprises:
obtaining the speech data corresponding to each word of the sentence from the preset speech database;
combining the speech data to obtain the electronic speech data of the sentence;
adjusting the pitch, loudness and speaking rate of the electronic speech data by the voice pronunciation model according to the emotion attribute of the sentence, to obtain the basic speech data of the sentence.
4. The speech synthesis method according to claim 1, characterized in that performing prosodic feature adjustment on the basic speech data of each sentence according to the tone feature words to obtain target speech data comprises:
obtaining the prosodic feature adjustment rules of the tone feature words in each sentence, including the adjustment rules of prosodic feature parameters such as pitch, loudness and duration;
adjusting the pitch, loudness and duration of the basic speech data according to the prosodic feature adjustment rules of the tone feature words, to obtain target speech data.
5. The speech synthesis method according to claim 4, characterized in that, after adjusting the pitch, loudness and duration of the basic speech data according to the prosodic feature adjustment rules of the tone feature words to obtain target speech data, the method further comprises:
obtaining the prosodic feature parameters of the target speech data;
calculating the average values of the prosodic feature parameters of each sentence from the prosodic feature parameters of the target speech data;
adjusting the pitch, loudness and duration of each word of the sentence according to the average values, to obtain smoothly transitioning speech data.
6. A speech synthesis system, characterized by comprising:
a sentiment analysis module, configured to obtain text data, split it into sentences, extract tone feature words, and analyze the emotion attribute of each sentence according to the tone feature words;
a speech synthesis module, configured to synthesize the basic speech data of each sentence according to the emotion attribute of each sentence, based on a preset speech database and a preset voice pronunciation model;
a speech adjustment module, configured to perform prosodic feature adjustment on the basic speech data of each sentence according to the tone feature words, to obtain target speech data.
7. The speech synthesis system according to claim 6, characterized in that the sentiment analysis module comprises:
a parameter obtaining unit, configured to obtain the sentiment analysis parameters of the sentence on multiple preset dimensions according to the tone feature words;
a sentiment analysis unit, configured to determine the emotion attribute of the sentence from the proportion of the total sentiment analysis parameter taken by the sentiment analysis parameter of each of the multiple preset dimensions.
8. The speech synthesis system according to claim 6, characterized in that the speech synthesis module comprises:
a speech data obtaining unit, configured to obtain the speech data corresponding to each word of the sentence from the preset speech database;
a speech data synthesis unit, configured to combine the speech data to obtain the electronic speech data of the sentence;
an acoustic feature adjustment unit, configured to adjust the pitch, loudness and speaking rate of the electronic speech data by the voice pronunciation model according to the emotion attribute of the sentence, to obtain the basic speech data of the sentence.
9. A terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 5.
10. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, the steps of the method according to any one of claims 1 to 5 are implemented.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810456213.3A CN108615524A (en) | 2018-05-14 | 2018-05-14 | Speech synthesis method, system and terminal device |
PCT/CN2018/097560 WO2019218481A1 (en) | 2018-05-14 | 2018-07-27 | Speech synthesis method, system, and terminal apparatus |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810456213.3A CN108615524A (en) | 2018-05-14 | 2018-05-14 | Speech synthesis method, system and terminal device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108615524A true CN108615524A (en) | 2018-10-02 |
Family
ID=63663006
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810456213.3A Pending CN108615524A (en) | 2018-05-14 | 2018-05-14 | Speech synthesis method, system and terminal device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108615524A (en) |
WO (1) | WO2019218481A1 (en) |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4125362B2 (en) * | 2005-05-18 | 2008-07-30 | 松下電器産業株式会社 | Speech synthesizer |
CN101064103B (en) * | 2006-04-24 | 2011-05-04 | 中国科学院自动化研究所 | Chinese voice synthetic method and system based on syllable rhythm restricting relationship |
KR20080060909A (en) * | 2006-12-27 | 2008-07-02 | 엘지전자 주식회사 | Method for synthesing voice according to text and voice synthesis using the same |
CN101000765B (en) * | 2007-01-09 | 2011-03-30 | 黑龙江大学 | Speech synthetic method based on rhythm character |
CN101452699A (en) * | 2007-12-04 | 2009-06-10 | 株式会社东芝 | Rhythm self-adapting and speech synthesizing method and apparatus |
KR101203188B1 (en) * | 2011-04-14 | 2012-11-22 | 한국과학기술원 | Method and system of synthesizing emotional speech based on personal prosody model and recording medium |
CN103366731B (en) * | 2012-03-31 | 2019-02-01 | 上海果壳电子有限公司 | Phoneme synthesizing method and system |
CN103198827B (en) * | 2013-03-26 | 2015-06-17 | 合肥工业大学 | Voice emotion correction method based on relevance of prosodic feature parameter and emotion parameter |
US20150046164A1 (en) * | 2013-08-07 | 2015-02-12 | Samsung Electronics Co., Ltd. | Method, apparatus, and recording medium for text-to-speech conversion |
US9824681B2 (en) * | 2014-09-11 | 2017-11-21 | Microsoft Technology Licensing, Llc | Text-to-speech with emotional content |
CN105355193B (en) * | 2015-10-30 | 2020-09-25 | 百度在线网络技术(北京)有限公司 | Speech synthesis method and device |
2018
- 2018-05-14 CN CN201810456213.3A patent/CN108615524A/en active Pending
- 2018-07-27 WO PCT/CN2018/097560 patent/WO2019218481A1/en active Application Filing
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2003271172A (en) * | 2002-03-15 | 2003-09-25 | Sony Corp | Method and apparatus for voice synthesis, program, recording medium and robot apparatus |
US20050187772A1 (en) * | 2004-02-25 | 2005-08-25 | Fuji Xerox Co., Ltd. | Systems and methods for synthesizing speech using discourse function level prosodic features |
CN102103856A (en) * | 2009-12-21 | 2011-06-22 | 盛大计算机(上海)有限公司 | Voice synthesis method and system |
US20130211838A1 (en) * | 2010-10-28 | 2013-08-15 | Acriil Inc. | Apparatus and method for emotional voice synthesis |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11545135B2 (en) * | 2018-10-05 | 2023-01-03 | Nippon Telegraph And Telephone Corporation | Acoustic model learning device, voice synthesis device, and program |
CN109461435A (en) * | 2018-11-19 | 2019-03-12 | 北京光年无限科技有限公司 | A kind of phoneme synthesizing method and device towards intelligent robot |
CN109599094A (en) * | 2018-12-17 | 2019-04-09 | 海南大学 | The method of sound beauty and emotion modification |
CN109545245A (en) * | 2018-12-21 | 2019-03-29 | 斑马网络技术有限公司 | Method of speech processing and device |
CN109710748A (en) * | 2019-01-17 | 2019-05-03 | 北京光年无限科技有限公司 | It is a kind of to draw this reading exchange method and system towards intelligent robot |
CN110379409A (en) * | 2019-06-14 | 2019-10-25 | 平安科技(深圳)有限公司 | Phoneme synthesizing method, system, terminal device and readable storage medium storing program for executing |
CN110379409B (en) * | 2019-06-14 | 2024-04-16 | 平安科技(深圳)有限公司 | Speech synthesis method, system, terminal device and readable storage medium |
CN111031386A (en) * | 2019-12-17 | 2020-04-17 | 腾讯科技(深圳)有限公司 | Video dubbing method and device based on voice synthesis, computer equipment and medium |
CN111031386B (en) * | 2019-12-17 | 2021-07-30 | 腾讯科技(深圳)有限公司 | Video dubbing method and device based on voice synthesis, computer equipment and medium |
CN111091810A (en) * | 2019-12-19 | 2020-05-01 | 佛山科学技术学院 | VR game character expression control method based on voice information and storage medium |
WO2021127979A1 (en) * | 2019-12-24 | 2021-07-01 | 深圳市优必选科技股份有限公司 | Speech synthesis method and apparatus, computer device, and computer readable storage medium |
CN111108549A (en) * | 2019-12-24 | 2020-05-05 | 深圳市优必选科技股份有限公司 | Speech synthesis method, speech synthesis device, computer equipment and computer readable storage medium |
CN111108549B (en) * | 2019-12-24 | 2024-02-02 | 深圳市优必选科技股份有限公司 | Speech synthesis method, device, computer equipment and computer readable storage medium |
CN111128118A (en) * | 2019-12-30 | 2020-05-08 | 科大讯飞股份有限公司 | Speech synthesis method, related device and readable storage medium |
CN111128118B (en) * | 2019-12-30 | 2024-02-13 | 科大讯飞股份有限公司 | Speech synthesis method, related device and readable storage medium |
CN113539230A (en) * | 2020-03-31 | 2021-10-22 | 北京奔影网络科技有限公司 | Speech synthesis method and device |
CN112349272A (en) * | 2020-10-15 | 2021-02-09 | 北京捷通华声科技股份有限公司 | Speech synthesis method, speech synthesis device, storage medium and electronic device |
CN113990286A (en) * | 2021-10-29 | 2022-01-28 | 北京大学深圳研究院 | Speech synthesis method, device, equipment and storage medium |
CN113990286B (en) * | 2021-10-29 | 2024-11-19 | 北京大学深圳研究院 | Speech synthesis method, device, equipment and storage medium |
CN114783402A (en) * | 2022-06-22 | 2022-07-22 | 广东电网有限责任公司佛山供电局 | Variation method and device for synthetic voice, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
WO2019218481A1 (en) | 2019-11-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108615524A (en) | A kind of phoneme synthesizing method, system and terminal device | |
Li et al. | Controllable emotion transfer for end-to-end speech synthesis | |
Weninger et al. | On the acoustics of emotion in audio: what speech, music, and sound have in common | |
Xue et al. | Voice conversion for emotional speech: Rule-based synthesis with degree of emotion controllable in dimensional space | |
Baird et al. | The perception of vocal traits in synthesized voices: Age, gender, and human likeness | |
CN109271493A (en) | A kind of language text processing method, device and storage medium | |
Zhang et al. | Pre-trained deep convolution neural network model with attention for speech emotion recognition | |
CN101156196A (en) | Hybrid speech synthesizer, method and use | |
Schuller et al. | Synthesized speech for model training in cross-corpus recognition of human emotion | |
CN109147831A (en) | A kind of voice connection playback method, terminal device and computer readable storage medium | |
Pinto-Coelho et al. | On the development of an automatic voice pleasantness classification and intensity estimation system | |
Pauletto et al. | Exploring expressivity and emotion with artificial voice and speech technologies | |
Pravena et al. | Development of simulated emotion speech database for excitation source analysis | |
CN107221344A (en) | A kind of speech emotional moving method | |
Wang et al. | Significance of phonological features in speech emotion recognition | |
CN114927126A (en) | Scheme output method, device and equipment based on semantic analysis and storage medium | |
Alessandri et al. | A critical ear: analysis of value judgments in reviews of Beethoven's piano sonata recordings | |
Arnhold | Complex prosodic focus marking in Finnish: Expanding the data landscape | |
Chen et al. | Voice-Cloning Artificial-Intelligence Speakers Can Also Mimic Human-Specific Vocal Expression | |
Dhiman et al. | Modified dense convolutional networks based emotion detection from speech using its paralinguistic features | |
Wang et al. | Rigdelet neural network and improved partial reinforcement effect optimizer for music genre classification from sound spectrum images | |
Worrall et al. | Intelligible sonifications | |
CN110390097A (en) | A kind of sentiment analysis method and system based on the interior real time data of application | |
Shahmohammadi et al. | ViPE: Visualise Pretty-much Everything | |
CN112017668A (en) | Intelligent voice conversation method, device and system based on real-time emotion detection |
Legal Events
Date | Code | Title | Description |
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20181002 |