CN108615524A - Speech synthesis method, system and terminal device - Google Patents
Speech synthesis method, system and terminal device
- Publication number
- CN108615524A CN108615524A CN201810456213.3A CN201810456213A CN108615524A CN 108615524 A CN108615524 A CN 108615524A CN 201810456213 A CN201810456213 A CN 201810456213A CN 108615524 A CN108615524 A CN 108615524A
- Authority
- CN
- China
- Prior art keywords
- sentence
- data
- feature words
- sound
- tone feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 53
- 230000002194 synthesizing effect Effects 0.000 title claims abstract description 16
- 230000008451 emotion Effects 0.000 claims abstract description 69
- 230000015572 biosynthetic process Effects 0.000 claims abstract description 40
- 238000003786 synthesis reaction Methods 0.000 claims abstract description 40
- 239000000284 extract Substances 0.000 claims abstract description 13
- 238000004458 analytical method Methods 0.000 claims description 25
- 238000004590 computer program Methods 0.000 claims description 20
- 230000033764 rhythmic process Effects 0.000 claims description 6
- 230000001419 dependent effect Effects 0.000 claims description 3
- 238000010168 coupling process Methods 0.000 abstract description 8
- 238000005859 coupling reaction Methods 0.000 abstract description 8
- 230000008878 coupling Effects 0.000 abstract description 7
- 238000012545 processing Methods 0.000 abstract description 6
- 238000010586 diagram Methods 0.000 description 10
- 230000006870 function Effects 0.000 description 10
- 230000008569 process Effects 0.000 description 8
- 230000007935 neutral effect Effects 0.000 description 5
- 238000011109 contamination Methods 0.000 description 4
- 230000036651 mood Effects 0.000 description 4
- 238000005516 engineering process Methods 0.000 description 3
- 238000004891 communication Methods 0.000 description 2
- 238000013461 design Methods 0.000 description 2
- 230000000694 effects Effects 0.000 description 2
- 230000005611 electricity Effects 0.000 description 2
- 230000007246 mechanism Effects 0.000 description 2
- 238000013528 artificial neural network Methods 0.000 description 1
- 238000004422 calculation algorithm Methods 0.000 description 1
- 238000004364 calculation method Methods 0.000 description 1
- 230000008859 change Effects 0.000 description 1
- 230000035772 mutation Effects 0.000 description 1
- 238000013139 quantization Methods 0.000 description 1
- 238000012549 training Methods 0.000 description 1
- 230000007704 transition Effects 0.000 description 1
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
- G10L13/10—Prosody rules derived from text; Stress or intonation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/04—Details of speech synthesis systems, e.g. synthesiser structure or memory management
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
- G10L13/10—Prosody rules derived from text; Stress or intonation
- G10L2013/105—Duration
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
- G10L25/63—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Machine Translation (AREA)
Abstract
The present invention is applicable to the technical field of data processing and provides a speech synthesis method, system and terminal device, including: obtaining text data, splitting it into sentences, extracting tone feature words, and analyzing the emotion attribute of each sentence according to the tone feature words; synthesizing the basic speech data of each sentence according to its emotion attribute, based on a preset speech database and a preset voice pronunciation model; and performing prosodic feature adjustment on the basic speech data of each sentence according to the tone feature words to obtain target speech data. By extracting the tone feature words of every sentence in the text data to analyze the emotion attribute of each sentence, synthesizing basic speech data adjusted by the preset voice pronunciation model in combination with the emotion attribute of the sentence, and then performing prosodic feature adjustment on the basic speech data, target speech data with a higher degree of human-likeness is obtained. The pronunciation of emotionally inclined words is richer in emotion and closer to the way an actual user speaks, which effectively improves the quality of the synthesized speech data.
Description
Technical field
The invention belongs to the technical field of data processing, and in particular relates to a speech synthesis method, system and terminal device.
Background technology
An audiobook is a work recorded from a manuscript by one or more people, using different voices and recording formats. Audiobooks currently on the market are all recorded manually in advance, stored, and played back directly when needed, which consumes a large amount of human resources. To save labor cost, speech data can instead be synthesized by speech synthesis technology. Speech synthesis refers to generating artificial speech by mechanical or electronic means; it is the technology of converting text information, generated by a computer itself or supplied as external input, into intelligible, audible speech. When synthesizing speech, current speech synthesis techniques first analyze the text data to obtain its words, then fetch the corresponding basic speech data for those words from a speech library, and finally combine the fetched basic speech data in order to obtain the final speech data. The speech data obtained in this way is not very human-like, and its quality is therefore poor.
In summary, the speech data synthesized by existing speech synthesis technology is of poor quality.
Summary of the invention
In view of this, the embodiments of the present invention provide a speech synthesis method, system and terminal device, to solve the problem that the speech data synthesized by existing speech synthesis technology is of poor quality.
The first aspect of the present invention provides a speech synthesis method, including:
obtaining text data, splitting it into sentences, extracting tone feature words, and analyzing the emotion attribute of each sentence according to the tone feature words;
synthesizing the basic speech data of each sentence according to the emotion attribute of each sentence, based on a preset speech database and a preset voice pronunciation model;
performing prosodic feature adjustment on the basic speech data of each sentence according to the tone feature words to obtain target speech data.
The second aspect of the present invention provides a speech synthesis system, including:
a sentiment analysis module, configured to obtain text data, split it into sentences, extract tone feature words, and analyze the emotion attribute of each sentence according to the tone feature words;
a speech synthesis module, configured to synthesize the basic speech data of each sentence according to the emotion attribute of each sentence, based on a preset speech database and a preset voice pronunciation model;
a speech adjustment module, configured to perform prosodic feature adjustment on the basic speech data of each sentence according to the tone feature words to obtain target speech data.
The third aspect of the present invention provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the following steps when executing the computer program:
obtaining text data, splitting it into sentences, extracting tone feature words, and analyzing the emotion attribute of each sentence according to the tone feature words;
synthesizing the basic speech data of each sentence according to the emotion attribute of each sentence, based on a preset speech database and a preset voice pronunciation model;
performing prosodic feature adjustment on the basic speech data of each sentence according to the tone feature words to obtain target speech data.
The fourth aspect of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the following steps:
obtaining text data, splitting it into sentences, extracting tone feature words, and analyzing the emotion attribute of each sentence according to the tone feature words;
synthesizing the basic speech data of each sentence according to the emotion attribute of each sentence, based on a preset speech database and a preset voice pronunciation model;
performing prosodic feature adjustment on the basic speech data of each sentence according to the tone feature words to obtain target speech data.
In the speech synthesis method, system and terminal device provided by the present invention, the tone feature words of every sentence in the text data are extracted to analyze the emotion attribute of each sentence, basic speech data is synthesized and adjusted by the preset voice pronunciation model in combination with the emotion attribute of each sentence, and prosodic feature adjustment is then performed on the basic speech data to obtain target speech data with a higher degree of human-likeness. The pronunciation of emotionally inclined words is richer in emotion and closer to the way an actual user speaks, which effectively improves the quality of the synthesized speech data and solves the problem that the speech data synthesized by existing speech synthesis technology is of poor quality.
Description of the drawings
In order to describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. It is apparent that the drawings in the following description show only some embodiments of the present invention; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
Fig. 1 is a schematic flowchart of a speech synthesis method provided by Embodiment 1 of the present invention;
Fig. 2 is a schematic flowchart of step S101 of Embodiment 1, provided by Embodiment 2 of the present invention;
Fig. 3 is a schematic flowchart of step S102 of Embodiment 1, provided by Embodiment 3 of the present invention;
Fig. 4 is a schematic flowchart of step S103 of Embodiment 1, provided by Embodiment 4 of the present invention;
Fig. 5 is a schematic structural diagram of a speech synthesis system provided by Embodiment 5 of the present invention;
Fig. 6 is a schematic structural diagram of the sentiment analysis module 101 of Embodiment 5, provided by Embodiment 6 of the present invention;
Fig. 7 is a schematic structural diagram of the speech synthesis module 102 of Embodiment 5, provided by Embodiment 7 of the present invention;
Fig. 8 is a schematic structural diagram of the speech adjustment module 103 of Embodiment 5, provided by Embodiment 8 of the present invention;
Fig. 9 is a schematic diagram of the terminal device provided by Embodiment 9 of the present invention.
Specific implementation mode
In the following description, for the purpose of illustration rather than limitation, specific details such as particular system structures and techniques are set forth in order to provide a thorough understanding of the embodiments of the present invention. However, it will be apparent to those skilled in the art that the present invention can also be implemented in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits and methods are omitted so that unnecessary detail does not obscure the description of the present invention.
To solve the problem that the speech data synthesized by existing speech synthesis technology is of poor quality, the embodiments of the present invention provide a speech synthesis method, system and terminal device. The tone feature words of every sentence in the text data are extracted to analyze the emotion attribute of each sentence, basic speech data is synthesized and adjusted by the preset voice pronunciation model in combination with the emotion attribute of each sentence, and prosodic feature adjustment is then performed on the basic speech data to obtain target speech data with a higher degree of human-likeness. The pronunciation of emotionally inclined words is richer in emotion and closer to the way an actual user speaks, which effectively improves the quality of the synthesized speech data and solves the problem that the speech data synthesized by existing speech synthesis technology is of poor quality.
In order to illustrate the technical solutions of the present invention, specific embodiments are described below.
Embodiment one:
As shown in Fig. 1, this embodiment provides a speech synthesis method, which specifically includes:
Step S101: obtaining text data, splitting it into sentences, extracting tone feature words, and analyzing the emotion attribute of each sentence according to the tone feature words.
In a specific application, text data containing text information is obtained by the terminal. The format of the text data may be plain text (txt), Rich Text Format (RTF) or a document (DOC); it may also be a file containing text information, such as a Portable Document Format (PDF) file or a picture, in which case the PDF or picture is first converted into a file from which the text data can be read directly. This is not limited here.
In a specific application, after the text data is obtained, the tone feature words in each sentence are extracted sentence by sentence. Tone feature words are words, symbols or word combinations that carry emotion, such as "happy", "heavens", "excellent" and "good", together with the punctuation marks that express mood and tone. Because tone feature words reflect the emotional inclination of the user, they carry distinct prosodic features when pronounced. Therefore the tone feature words in each sentence are extracted, and the emotion attribute of each sentence is analyzed according to the tone feature words.
In a specific application, a tone feature word database is preset, and the tone feature words in each sentence that match this database are extracted according to it. When expressing a mood, a user may use a combination of words. In order to enrich the tone feature word database and extract the tone feature words in each sentence accurately, combination rules for words are set according to grammatical rules, and when tone feature words are extracted, the words that satisfy a combination rule are extracted together as one tone feature word.
Exemplarily, the above combination rules include, but are not limited to, the following (a rule-matching sketch is given after the list):
A: degree adverb + emotion word, e.g. "rather + good", "very + good", "especially + good";
B: negation word + emotion word, e.g. "not + good", "not + bad";
C: negation word + degree adverb + emotion word, e.g. "not + too + good", "not + too + bad";
D: degree adverb + negation word + emotion word, e.g. "very + not + good", "also + not + bad".
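A minimal sketch of such rule-based extraction is shown below, assuming tiny illustrative word lists for the degree adverbs, negation words and emotion words; the patent does not prescribe concrete lexicons, so the lists and the greedy longest-match strategy here are assumptions for illustration only.

```python
# Hypothetical sketch of rule-based tone-feature-word extraction following
# rules A-D above; the word lists are small illustrative assumptions.
DEGREE_ADVERBS = {"rather", "very", "especially", "too", "also"}
NEGATION_WORDS = {"not", "no"}
EMOTION_WORDS = {"good", "bad", "happy", "sad", "excellent"}

def extract_tone_feature_words(tokens):
    """Merge adjacent tokens matching rules A-D into single tone feature words."""
    features, i = [], 0
    while i < len(tokens):
        for span in (3, 2, 1):  # try the longest pattern ending in an emotion word
            chunk = tokens[i:i + span]
            if (len(chunk) == span and chunk[-1] in EMOTION_WORDS
                    and all(t in DEGREE_ADVERBS | NEGATION_WORDS for t in chunk[:-1])):
                features.append(" ".join(chunk))
                i += span
                break
        else:
            i += 1
    return features

# extract_tone_feature_words("the weather is not too good".split()) -> ["not too good"]
```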
In a specific application, in order to guarantee the speech synthesis effect of every sentence, this embodiment extracts tone feature words sentence by sentence, and the emotion attribute of each sentence is analyzed from the tone feature words of that sentence.
In a specific application, each sentence of the text data is first segmented into multiple word combinations, and the segmented word combinations are divided into neutral words and tone feature words, where the tone feature words include positive words and negative words. The emotion attribute of the sentence can be obtained from the graded proportions of the neutral words, positive words and negative words within the sentence.
Step S102: synthesizing the basic speech data of each sentence according to the emotion attribute of each sentence, based on a preset speech database and a preset voice pronunciation model.
In a specific application, for a sentence that has been segmented into multiple word combinations, the speech data of each word is obtained from the preset speech database word by word, and the speech data of the multiple words is combined to obtain the speech data of the whole sentence.
In a specific application, after the speech data of the whole sentence is obtained, the acoustic features of the speech data are adjusted according to the emotion attribute of the sentence based on the preset voice pronunciation model, to obtain basic speech data matching the emotion attribute of the sentence, so that the pronunciation is closer to that of an actual user. In a specific application, the above acoustic features include sound intensity, speaking rate and pitch.
In a specific application, the preset voice pronunciation model is built as follows: speech data of a large number of actual users is collected as training samples, the speech data is labelled with an emotion attribute sentence by sentence, and a neural network is trained on the labelled data to obtain the acoustic features of the pronunciation corresponding to each emotion attribute.
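As a loose illustration of what such a model ultimately provides (a mapping from an emotion attribute to acoustic-feature parameters), the sketch below simply averages labelled samples per emotion. The embodiment itself describes training a neural network; the averaging and the "pitch"/"loudness"/"rate" field names here are simplifying assumptions, not the claimed training procedure.

```python
# Minimal stand-in for the preset voice pronunciation model: acoustic features
# of labelled sample sentences are averaged per emotion attribute. (The patent
# describes neural-network training; this averaging is an assumed simplification.)
from collections import defaultdict
from statistics import mean

def build_pronunciation_model(samples):
    """samples: iterable of (emotion, {"pitch": ..., "loudness": ..., "rate": ...})."""
    grouped = defaultdict(list)
    for emotion, features in samples:
        grouped[emotion].append(features)
    return {emotion: {key: mean(f[key] for f in feats) for key in feats[0]}
            for emotion, feats in grouped.items()}
```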
Step S103: performing prosodic feature adjustment on the basic speech data of each sentence according to the tone feature words to obtain target speech data.
In a specific application, the basic speech data is based on the emotion attribute of the whole sentence. In order to further match the pronunciation characteristics of an actual user under the corresponding emotion, prosodic feature adjustment is then performed on the basic speech data of the whole sentence, targeted at the tone feature words.
In a specific application, the above prosodic features include loudness, pitch and duration. Loudness covers the variation in strength of the speech, such as stressed and weakened sounds; pitch covers the character tones and intonation of the speech; duration covers the tempo of the speech.
In a specific application, tone feature words with different emotional inclinations express different user emotions, and the prosodic features of speech under different moods can differ considerably; for example, the pitch when happy is noticeably higher than when sad. That is, each tone feature word corresponds to one prosodic feature (or one class of prosodic features). Therefore the prosodic features corresponding to a tone feature word are obtained first, and prosodic feature adjustment is performed on that tone feature word in the basic speech data according to those prosodic features. If a sentence contains multiple tone feature words, prosodic feature adjustment is performed on all of them, yielding speech data that better matches the pronunciation of an actual user.
In a specific application, the prosodic feature adjustment may use preset prosodic feature parameters for each class of tone feature words; for example, the prosodic feature parameters of a tone feature word expressing happiness are set to loudness 1, pitch 1 and duration 1, while those of a tone feature word expressing sadness are set to loudness 2, pitch 2 and duration 2. Alternatively, the adjustment may be a percentage change applied to the prosodic feature parameters of the basic speech data; for example, when a tone feature word expressing happiness is adjusted, the pitch corresponding to the tone feature word is raised by 10% relative to the basic speech data and its duration is shortened by 15%.
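The percentage-based variant can be illustrated with the short sketch below; the 10% pitch increase and 15% duration reduction are the example figures from the text, while the per-word segment layout is an assumption.

```python
# Illustrative sketch of the percentage-based prosodic adjustment described above.
def adjust_feature_word_prosody(segment, pitch_gain=0.10, duration_cut=0.15):
    """segment: dict holding the prosodic parameters of one tone feature word."""
    adjusted = dict(segment)
    adjusted["pitch"] = segment["pitch"] * (1 + pitch_gain)
    adjusted["duration"] = segment["duration"] * (1 - duration_cut)
    return adjusted

# adjust_feature_word_prosody({"pitch": 200.0, "duration": 0.30})
# -> {"pitch": 220.0, "duration": 0.255}
```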
In the speech synthesis method provided by this embodiment, the tone feature words of every sentence in the text data are extracted to analyze the emotion attribute of each sentence, personalized basic speech data is synthesized and adjusted by the preset voice pronunciation model in combination with the emotion attribute of the sentence, and prosodic feature adjustment is then performed on the basic speech data to obtain target speech data that is closer to the pronunciation of an actual user. The pronunciation of emotionally inclined words is richer in emotion and closer to the way an actual user speaks, which effectively improves the quality of the synthesized speech data and solves the problem that the speech data synthesized by existing speech synthesis technology is of poor quality.
Embodiment two:
As shown in Fig. 2, in this embodiment, step S101 of Embodiment 1 specifically includes:
Step S201: obtaining the sentiment analysis parameters of the sentence on multiple preset dimensions according to the tone feature words.
In a specific application, after the sentence is segmented, each word is scored on three preset dimensions such as [positive, neutral, negative], the combined scores of the sentence on these three preset dimensions are obtained, and the proportion of the total score taken by each of the three preset dimensions is then calculated. Exemplarily, after the sentence is segmented, the words can be divided into neutral words and tone feature words, and the tone feature words can further be divided into positive words and negative words. When the tone feature words are classified, each tone feature word is assigned a class and a grade score corresponding to its level.
For example, "happy" is set as a positive word with a grade score of +2;
"excellent" is set as a positive word with a grade score of +5;
"bad" is set as a negative word with a grade score of -2;
"very bad" is set as a negative word with a grade score of -5. It should be noted that the above classification and scoring of tone feature words can be realized by a classification and grading module built on a neural network structure; the specific implementation is not described in detail here.
Step S202: determining the emotion attribute of the sentence from the proportion of the total sentiment analysis parameter taken by the sentiment analysis parameter of each of the multiple preset dimensions.
In a specific application, the score of each preset dimension is calculated from the grade scores of the tone feature words, and the proportion of the total score taken by each preset dimension is then calculated. For example, if the scores of the three preset dimensions of a sentence are [+10, 4, -6], the proportions of the three dimension scores are [+0.5, 0.2, -0.3].
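The ratio computation can be sketched as follows; it simply divides each dimension's signed score by the total absolute score, which reproduces the [+10, 4, -6] to [+0.5, 0.2, -0.3] example above.

```python
# Sketch of the dimension-ratio computation described in the text.
def dimension_ratios(scores):
    total = sum(abs(s) for s in scores)
    return [s / total if total else 0.0 for s in scores]

print(dimension_ratios([10, 4, -6]))  # [0.5, 0.2, -0.3]
```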
In a specific application, a text emotion classification model based on a support vector machine is used to process the proportions of the scores of the three preset dimensions of each sentence, and the corresponding emotion attribute of the sentence is thereby determined. The text emotion classification model is trained in advance on a large amount of different text data to obtain a text emotion analysis model based on the support vector mechanism. The emotion attribute can be divided into a number of different states so as to quantify the user's emotion, and the quantified value obtained for each sentence indicates the emotion attribute of that sentence. Exemplarily, the above quantified emotion attributes include, but are not limited to: happy, sad, angry, afraid, doubtful and normal.
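A hedged sketch of such a support-vector-machine classifier using scikit-learn is shown below; the training ratio vectors and emotion labels are placeholders for illustration, not data from the patent.

```python
# Sketch of an SVM text-emotion classifier over [positive, neutral, negative] ratios.
from sklearn.svm import SVC

ratio_vectors = [[0.5, 0.2, -0.3],   # placeholder training proportions
                 [0.7, 0.2, -0.1],
                 [-0.1, 0.3, -0.6],
                 [-0.4, 0.1, -0.5]]
emotion_labels = ["happy", "happy", "sad", "angry"]  # placeholder labels

classifier = SVC(kernel="linear")
classifier.fit(ratio_vectors, emotion_labels)

print(classifier.predict([[0.4, 0.3, -0.3]]))  # e.g. ['happy']
```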
Embodiment three:
As shown in Fig. 3, in this embodiment, step S102 of Embodiment 1 specifically includes:
Step S301: obtaining the speech data corresponding to each word of the sentence from the preset speech database.
In a specific application, each sentence is segmented into multiple word combinations, and the speech data of each word is obtained from the preset speech database word by word.
Step S302: combining the speech data to obtain the electronic speech data of the sentence.
In a specific application, the speech data of the multiple words is combined to obtain the speech data of the whole sentence, i.e. the electronic speech data of the sentence.
Step S303: adjusting the pitch, loudness and speaking rate of the electronic speech data by the voice pronunciation model according to the emotion attribute of the sentence, to obtain the basic speech data of the sentence.
In a specific application, after the electronic speech data is obtained, the pitch, loudness and speaking rate of the electronic speech data are adjusted according to the emotion attribute of the sentence based on the preset voice pronunciation model, to obtain basic speech data matching the emotion attribute of the sentence, so that the pronunciation is closer to that of an actual user.
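Steps S301 to S303 can be summarized in the simplified sketch below; the dictionary-based speech database and model structures are assumptions for illustration, not the data formats used by the patent.

```python
# Simplified sketch of steps S301-S303: per-word lookup, concatenation, and
# emotion-driven adjustment of pitch, loudness and speaking rate.
def synthesize_basic_speech(words, speech_db, pronunciation_model, emotion):
    word_audio = [speech_db[w] for w in words]                 # S301: per-word data
    electronic_speech = {"words": words, "audio": word_audio}  # S302: whole sentence
    scales = pronunciation_model[emotion]                      # S303: emotion scaling
    electronic_speech.update(pitch_scale=scales["pitch"],
                             loudness_scale=scales["loudness"],
                             rate_scale=scales["rate"])
    return electronic_speech
```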
Embodiment four:
As shown in Fig. 4, in this embodiment, step S103 of Embodiment 1 specifically includes:
Step S401: obtaining the prosodic feature adjustment rules of the tone feature words in each sentence, including the adjustment rules of prosodic feature parameters such as pitch, loudness and duration.
In a specific application, the tone feature words are classified and the grades of the different classes of tone feature words are set, and the corresponding prosodic feature adjustment rule is obtained according to the class and grade of a tone feature word, namely the adjustment rule for prosodic feature parameters such as the pitch, loudness and duration of that tone feature word.
In a specific application, the prosodic feature parameters of tone feature words of different classes and grades are preset; each tone feature word is then classified and graded, and its corresponding prosodic feature parameters are obtained, thereby obtaining its corresponding prosodic feature adjustment rule.
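Such a preset table can be sketched as a simple lookup keyed by class and grade; the classes, grades and parameter values below are placeholders, not values specified by the patent.

```python
# Illustrative class-and-grade lookup for prosodic feature adjustment rules.
PROSODY_RULES = {
    ("happy", 1): {"pitch": 1.05, "loudness": 1.05, "duration": 0.95},
    ("happy", 2): {"pitch": 1.10, "loudness": 1.10, "duration": 0.85},
    ("sad", 1): {"pitch": 0.95, "loudness": 0.90, "duration": 1.10},
}

def adjustment_rule(word_class, grade):
    """Return the preset prosodic feature adjustment rule for a tone feature word."""
    return PROSODY_RULES.get((word_class, grade))
```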
Step S402: adjusting the pitch, loudness and duration of the basic speech data according to the prosodic feature adjustment rules of the tone feature words, to obtain target speech data.
In a specific application, after the prosodic feature adjustment rule corresponding to a tone feature word is obtained, the pitch, loudness and duration of each tone feature word in the basic speech data are adjusted according to that rule, and target speech data closer to the pronunciation of an actual user is obtained after the adjustment.
In a specific application, the above adjustment process may calculate the prosodic feature parameters of the tone feature words according to the prosodic feature adjustment rules and then set the prosodic features of the corresponding tone feature words in the basic speech data to those parameters, or it may adjust the basic speech data in the form of a percentage according to the prosodic feature adjustment rules; this is not limited here.
In one embodiment, the following steps are further included after the above step S402:
obtaining the prosodic feature parameters of the target speech data;
calculating the average values of the prosodic feature parameters of each sentence from the prosodic feature parameters of the target speech data;
adjusting the pitch, loudness and duration of each word of the sentence according to the average values, to obtain smoothly transitioning speech data.
In a specific application, since the above prosodic feature adjustment is applied only to the tone feature words, abrupt changes may appear in the speech, making the pronunciation of a tone feature word sound sudden and disharmonious against the pronunciation of the words immediately before and after it. In a specific application, to avoid this problem, the prosodic feature parameters of the target speech data obtained after the prosodic feature adjustment can be adjusted again with the whole sentence as the unit, so that the sentence transitions smoothly. Specifically, the average values of the prosodic feature parameters of each sentence are obtained from the prosodic feature parameters of the target speech data, and for the words adjacent to a tone feature word, the pitch, loudness and duration of those words are adjusted using the average values. In a specific application, when multiple tone feature words are adjacent to one another, only the pitch, loudness and duration of the words adjoining the first and the last tone feature word need to be adjusted.
Exemplarily, in the sentence "Shall we go to the amusement park to play this afternoon, okay?", "okay" serves as the tone feature word, so its pitch and tone are both raised, while the pitch and tone of the adjacent word "play" are not; the tone and pitch may therefore jump suddenly from "play" to "okay", making the transition sound unnatural. The average prosodic feature parameters of the whole sentence are therefore calculated from its prosodic feature parameters, and the prosodic feature parameters of "play" are adjusted to these averages, which effectively reduces the gap in loudness and tone between "play" and "okay" and achieves a smooth transition.
Embodiment five:
As shown in Fig. 5, this embodiment provides a speech synthesis system 100 for executing the method steps of Embodiment 1, which includes a sentiment analysis module 101, a speech synthesis module 102 and a speech adjustment module 103.
The sentiment analysis module 101 is configured to obtain text data, split it into sentences, extract tone feature words, and analyze the emotion attribute of each sentence according to the tone feature words.
The speech synthesis module 102 is configured to synthesize the basic speech data of each sentence according to the emotion attribute of each sentence, based on a preset speech database and a preset voice pronunciation model.
The speech adjustment module 103 is configured to perform prosodic feature adjustment on the basic speech data of each sentence according to the tone feature words to obtain target speech data.
It should be noted that, since the speech synthesis system provided by this embodiment of the present invention is based on the same conception as the method embodiment shown in Fig. 1, it brings the same technical effect as the method embodiment shown in Fig. 1; for details, reference can be made to the description in the method embodiment shown in Fig. 1, which is not repeated here.
Therefore, the speech synthesis system provided by this embodiment can likewise extract the tone feature words of every sentence in the text data to analyze the emotion attribute of each sentence, synthesize personalized basic speech data adjusted by the preset voice pronunciation model in combination with the emotion attribute of the sentence, and then perform prosodic feature adjustment on the basic speech data to obtain target speech data closer to the pronunciation of an actual user. The pronunciation of emotionally inclined words is richer in emotion and closer to the way an actual user speaks, which effectively improves the quality of the synthesized speech data and solves the problem that the speech data synthesized by existing speech synthesis technology is of poor quality.
Embodiment six:
As shown in Fig. 6, in this embodiment, the sentiment analysis module 101 of Embodiment 5 includes structures for executing the method steps of the embodiment corresponding to Fig. 2, namely a parameter obtaining unit 201 and a sentiment analysis unit 202.
The parameter obtaining unit 201 is configured to obtain the sentiment analysis parameters of the sentence on multiple preset dimensions according to the tone feature words.
The sentiment analysis unit 202 is configured to determine the emotion attribute of the sentence from the proportion of the total sentiment analysis parameter taken by the sentiment analysis parameter of each of the multiple preset dimensions.
Embodiment seven:
As shown in Fig. 7, in this embodiment, the speech synthesis module 102 of Embodiment 5 includes structures for executing the method steps of the embodiment corresponding to Fig. 3, namely a speech data obtaining unit 301, a speech data synthesis unit 302 and an acoustic feature adjustment unit 303.
The speech data obtaining unit 301 is configured to obtain the speech data corresponding to each word of the sentence from the preset speech database.
The speech data synthesis unit 302 is configured to combine the speech data to obtain the electronic speech data of the sentence.
The acoustic feature adjustment unit 303 is configured to adjust the pitch, loudness and speaking rate of the electronic speech data by the voice pronunciation model according to the emotion attribute of the sentence, to obtain the basic speech data of the sentence.
Embodiment eight:
As shown in Fig. 8, in this embodiment, the speech adjustment module 103 of Embodiment 5 includes structures for executing the method steps of the embodiment corresponding to Fig. 4, namely a prosodic feature rule obtaining unit 401 and a prosodic feature adjustment unit 402.
The prosodic feature rule obtaining unit 401 is configured to obtain the prosodic feature adjustment rules of the tone feature words in each sentence, including the adjustment rules of prosodic feature parameters such as pitch, loudness and duration.
The prosodic feature adjustment unit 402 is configured to adjust the pitch, loudness and duration of the basic speech data according to the prosodic feature adjustment rules of the tone feature words, to obtain target speech data.
In one embodiment, the above speech adjustment module 103 further includes a feature parameter obtaining unit, a calculation unit and a smooth transition adjustment unit.
The feature parameter obtaining unit is configured to obtain the prosodic feature parameters of the target speech data.
The calculation unit is configured to calculate the average values of the prosodic feature parameters of each sentence from the prosodic feature parameters of the target speech data.
The smooth transition adjustment unit is configured to adjust the pitch, loudness and duration of each word of the sentence according to the average values, to obtain smoothly transitioning speech data.
Embodiment nine:
Fig. 9 is a schematic diagram of the terminal device provided by Embodiment 9 of the present invention. As shown in Fig. 9, the terminal device 9 of this embodiment includes a processor 90, a memory 91, and a computer program 92 stored in the memory 91 and executable on the processor 90, such as a program. When executing the computer program 92, the processor 90 implements the steps in each of the above speech synthesis method embodiments, such as steps S101 to S103 shown in Fig. 1; alternatively, when executing the computer program 92, the processor 90 implements the functions of the modules/units in each of the above system embodiments, such as the functions of modules 101 to 103 shown in Fig. 5.
Exemplarily, the computer program 92 can be divided into one or more modules/units, and the one or more modules/units are stored in the memory 91 and executed by the processor 90 to carry out the present invention. The one or more modules/units can be a series of computer program instruction segments capable of completing specific functions, and these instruction segments are used to describe the execution process of the computer program 92 in the terminal device 9. For example, the computer program 92 can be divided into a sentiment analysis module, a speech synthesis module and a speech adjustment module, whose specific functions are as follows:
the sentiment analysis module is configured to obtain text data, split it into sentences, extract tone feature words, and analyze the emotion attribute of each sentence according to the tone feature words;
the speech synthesis module is configured to synthesize the basic speech data of each sentence according to the emotion attribute of each sentence, based on a preset speech database and a preset voice pronunciation model;
the speech adjustment module is configured to perform prosodic feature adjustment on the basic speech data of each sentence according to the tone feature words to obtain target speech data.
The terminal device 9 can be a computing device such as a desktop computer, a notebook computer, a palmtop computer or a cloud management server. The terminal device may include, but is not limited to, the processor 90 and the memory 91. Those skilled in the art will understand that Fig. 9 is only an example of the terminal device 9 and does not constitute a limitation on the terminal device 9; it may include more or fewer components than shown, combine certain components, or use different components; for example, the terminal device may further include input and output devices, network access devices, buses and the like.
The so-called processor 90 can be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general-purpose processor can be a microprocessor, or the processor can be any conventional processor.
The memory 91 can be an internal storage unit of the terminal device 9, such as a hard disk or internal memory of the terminal device 9. The memory 91 can also be an external storage device of the terminal device 9, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card or a flash card equipped on the terminal device 9. Further, the memory 91 can include both the internal storage unit of the terminal device 9 and an external storage device. The memory 91 is used to store the computer program and other programs and data required by the terminal device. The memory 91 can also be used to temporarily store data that has been output or is to be output.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the division of the above functional units and modules is only used as an example for illustration. In practical applications, the above functions can be allocated to different functional units and modules as needed; that is, the internal structure of the system can be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiments can be integrated in one processing unit, or each unit can physically exist alone, or two or more units can be integrated in one unit. The above integrated unit can be realized either in the form of hardware or in the form of a software functional unit. In addition, the specific names of the functional units and modules are only used to distinguish them from one another and are not used to limit the protection scope of this application. For the specific working process of the units and modules in the above terminal, reference can be made to the corresponding processes in the foregoing method embodiments, which is not repeated here.
In the above embodiments, the description of each embodiment has its own emphasis; for parts that are not detailed or recorded in a certain embodiment, reference can be made to the relevant descriptions of the other embodiments.
Those of ordinary skill in the art will realize that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether these functions are executed in hardware or software depends on the specific application and the design constraints of the technical solution. Professionals may use different methods to implement the described functions for each specific application, but such implementations should not be regarded as going beyond the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed system/terminal device and method can be implemented in other ways. For example, the system/terminal device embodiments described above are only illustrative; the division of the modules or units is only a logical function division, and there can be other division manners in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed can be indirect coupling or communication connection through some interfaces, systems or units, and can be electrical, mechanical or in other forms.
The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network elements. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention can be integrated in one processing unit, or each unit can physically exist alone, or two or more units can be integrated in one unit. The above integrated unit can be realized either in the form of hardware or in the form of a software functional unit.
If the integrated module/unit is realized in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, all or part of the flows in the above method embodiments of the present invention can also be completed by instructing relevant hardware through a computer program. The computer program can be stored in a computer-readable storage medium, and when executed by a processor, the computer program can realize the steps of each of the above method embodiments. The computer program includes computer program code, which can be in the form of source code, object code, an executable file, some intermediate form, etc. The computer-readable medium may include: any entity or system capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electric carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium can be appropriately increased or decreased according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electric carrier signals and telecommunication signals.
The above embodiments are only used to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions recorded in the foregoing embodiments or replace some of the technical features with equivalents; and these modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and should all be included within the protection scope of the present invention.
Claims (10)
1. A speech synthesis method, characterized by comprising:
obtaining text data, splitting it into sentences, extracting tone feature words, and analyzing the emotion attribute of each sentence according to the tone feature words;
synthesizing the basic speech data of each sentence according to the emotion attribute of each sentence, based on a preset speech database and a preset voice pronunciation model;
performing prosodic feature adjustment on the basic speech data of each sentence according to the tone feature words to obtain target speech data.
2. The speech synthesis method according to claim 1, characterized in that analyzing the emotion attribute of each sentence according to the tone feature words comprises:
obtaining the sentiment analysis parameters of the sentence on multiple preset dimensions according to the tone feature words;
determining the emotion attribute of the sentence from the proportion of the total sentiment analysis parameter taken by the sentiment analysis parameter of each of the multiple preset dimensions.
3. The speech synthesis method according to claim 1, characterized in that synthesizing the basic speech data of each sentence according to the emotion attribute of each sentence, based on a preset speech database and a preset voice pronunciation model, comprises:
obtaining the speech data corresponding to each word of the sentence from the preset speech database;
combining the speech data to obtain the electronic speech data of the sentence;
adjusting the pitch, loudness and speaking rate of the electronic speech data by the voice pronunciation model according to the emotion attribute of the sentence, to obtain the basic speech data of the sentence.
4. The speech synthesis method according to claim 1, characterized in that performing prosodic feature adjustment on the basic speech data of each sentence according to the tone feature words to obtain target speech data comprises:
obtaining the prosodic feature adjustment rules of the tone feature words in each sentence, including the adjustment rules of prosodic feature parameters such as pitch, loudness and duration;
adjusting the pitch, loudness and duration of the basic speech data according to the prosodic feature adjustment rules of the tone feature words, to obtain target speech data.
5. The speech synthesis method according to claim 4, characterized in that, after adjusting the pitch, loudness and duration of the basic speech data according to the prosodic feature adjustment rules of the tone feature words to obtain target speech data, the method further comprises:
obtaining the prosodic feature parameters of the target speech data;
calculating the average values of the prosodic feature parameters of each sentence from the prosodic feature parameters of the target speech data;
adjusting the pitch, loudness and duration of each word of the sentence according to the average values, to obtain smoothly transitioning speech data.
6. A speech synthesis system, characterized by comprising:
a sentiment analysis module, configured to obtain text data, split it into sentences, extract tone feature words, and analyze the emotion attribute of each sentence according to the tone feature words;
a speech synthesis module, configured to synthesize the basic speech data of each sentence according to the emotion attribute of each sentence, based on a preset speech database and a preset voice pronunciation model;
a speech adjustment module, configured to perform prosodic feature adjustment on the basic speech data of each sentence according to the tone feature words, to obtain target speech data.
7. The speech synthesis system according to claim 6, characterized in that the sentiment analysis module comprises:
a parameter obtaining unit, configured to obtain the sentiment analysis parameters of the sentence on multiple preset dimensions according to the tone feature words;
a sentiment analysis unit, configured to determine the emotion attribute of the sentence from the proportion of the total sentiment analysis parameter taken by the sentiment analysis parameter of each of the multiple preset dimensions.
8. The speech synthesis system according to claim 6, characterized in that the speech synthesis module comprises:
a speech data obtaining unit, configured to obtain the speech data corresponding to each word of the sentence from the preset speech database;
a speech data synthesis unit, configured to combine the speech data to obtain the electronic speech data of the sentence;
an acoustic feature adjustment unit, configured to adjust the pitch, loudness and speaking rate of the electronic speech data by the voice pronunciation model according to the emotion attribute of the sentence, to obtain the basic speech data of the sentence.
9. A terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 5.
10. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, the steps of the method according to any one of claims 1 to 5 are implemented.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810456213.3A CN108615524A (en) | 2018-05-14 | 2018-05-14 | Speech synthesis method, system and terminal device |
PCT/CN2018/097560 WO2019218481A1 (en) | 2018-05-14 | 2018-07-27 | Speech synthesis method, system, and terminal apparatus |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810456213.3A CN108615524A (en) | 2018-05-14 | 2018-05-14 | Speech synthesis method, system and terminal device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108615524A true CN108615524A (en) | 2018-10-02 |
Family
ID=63663006
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810456213.3A Pending CN108615524A (en) | 2018-05-14 | 2018-05-14 | Speech synthesis method, system and terminal device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN108615524A (en) |
WO (1) | WO2019218481A1 (en) |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP4125362B2 (en) * | 2005-05-18 | 2008-07-30 | 松下電器産業株式会社 | Speech synthesizer |
CN101064103B (en) * | 2006-04-24 | 2011-05-04 | 中国科学院自动化研究所 | Chinese voice synthetic method and system based on syllable rhythm restricting relationship |
KR20080060909A (en) * | 2006-12-27 | 2008-07-02 | 엘지전자 주식회사 | Method for synthesing voice according to text and voice synthesis using the same |
CN101000765B (en) * | 2007-01-09 | 2011-03-30 | 黑龙江大学 | Speech synthetic method based on rhythm character |
CN101452699A (en) * | 2007-12-04 | 2009-06-10 | 株式会社东芝 | Rhythm self-adapting and speech synthesizing method and apparatus |
KR101203188B1 (en) * | 2011-04-14 | 2012-11-22 | 한국과학기술원 | Method and system of synthesizing emotional speech based on personal prosody model and recording medium |
CN103366731B (en) * | 2012-03-31 | 2019-02-01 | 上海果壳电子有限公司 | Phoneme synthesizing method and system |
CN103198827B (en) * | 2013-03-26 | 2015-06-17 | 合肥工业大学 | Voice emotion correction method based on relevance of prosodic feature parameter and emotion parameter |
US20150046164A1 (en) * | 2013-08-07 | 2015-02-12 | Samsung Electronics Co., Ltd. | Method, apparatus, and recording medium for text-to-speech conversion |
US9824681B2 (en) * | 2014-09-11 | 2017-11-21 | Microsoft Technology Licensing, Llc | Text-to-speech with emotional content |
CN105355193B (en) * | 2015-10-30 | 2020-09-25 | 百度在线网络技术(北京)有限公司 | Speech synthesis method and device |
2018
- 2018-05-14 CN CN201810456213.3A patent/CN108615524A/en active Pending
- 2018-07-27 WO PCT/CN2018/097560 patent/WO2019218481A1/en active Application Filing
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2003271172A (en) * | 2002-03-15 | 2003-09-25 | Sony Corp | Method and apparatus for voice synthesis, program, recording medium and robot apparatus |
US20050187772A1 (en) * | 2004-02-25 | 2005-08-25 | Fuji Xerox Co., Ltd. | Systems and methods for synthesizing speech using discourse function level prosodic features |
CN102103856A (en) * | 2009-12-21 | 2011-06-22 | 盛大计算机(上海)有限公司 | Voice synthesis method and system |
US20130211838A1 (en) * | 2010-10-28 | 2013-08-15 | Acriil Inc. | Apparatus and method for emotional voice synthesis |
Cited By (20)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US11545135B2 (en) * | 2018-10-05 | 2023-01-03 | Nippon Telegraph And Telephone Corporation | Acoustic model learning device, voice synthesis device, and program |
CN109461435A (en) * | 2018-11-19 | 2019-03-12 | 北京光年无限科技有限公司 | A kind of phoneme synthesizing method and device towards intelligent robot |
CN109599094A (en) * | 2018-12-17 | 2019-04-09 | 海南大学 | The method of sound beauty and emotion modification |
CN109545245A (en) * | 2018-12-21 | 2019-03-29 | 斑马网络技术有限公司 | Method of speech processing and device |
CN109710748A (en) * | 2019-01-17 | 2019-05-03 | 北京光年无限科技有限公司 | It is a kind of to draw this reading exchange method and system towards intelligent robot |
CN110379409A (en) * | 2019-06-14 | 2019-10-25 | 平安科技(深圳)有限公司 | Phoneme synthesizing method, system, terminal device and readable storage medium storing program for executing |
CN110379409B (en) * | 2019-06-14 | 2024-04-16 | 平安科技(深圳)有限公司 | Speech synthesis method, system, terminal device and readable storage medium |
CN111031386A (en) * | 2019-12-17 | 2020-04-17 | 腾讯科技(深圳)有限公司 | Video dubbing method and device based on voice synthesis, computer equipment and medium |
CN111031386B (en) * | 2019-12-17 | 2021-07-30 | 腾讯科技(深圳)有限公司 | Video dubbing method and device based on voice synthesis, computer equipment and medium |
CN111091810A (en) * | 2019-12-19 | 2020-05-01 | 佛山科学技术学院 | VR game character expression control method based on voice information and storage medium |
WO2021127979A1 (en) * | 2019-12-24 | 2021-07-01 | 深圳市优必选科技股份有限公司 | Speech synthesis method and apparatus, computer device, and computer readable storage medium |
CN111108549A (en) * | 2019-12-24 | 2020-05-05 | 深圳市优必选科技股份有限公司 | Speech synthesis method, speech synthesis device, computer equipment and computer readable storage medium |
CN111108549B (en) * | 2019-12-24 | 2024-02-02 | 深圳市优必选科技股份有限公司 | Speech synthesis method, device, computer equipment and computer readable storage medium |
CN111128118A (en) * | 2019-12-30 | 2020-05-08 | 科大讯飞股份有限公司 | Speech synthesis method, related device and readable storage medium |
CN111128118B (en) * | 2019-12-30 | 2024-02-13 | 科大讯飞股份有限公司 | Speech synthesis method, related device and readable storage medium |
CN113539230A (en) * | 2020-03-31 | 2021-10-22 | 北京奔影网络科技有限公司 | Speech synthesis method and device |
CN112349272A (en) * | 2020-10-15 | 2021-02-09 | 北京捷通华声科技股份有限公司 | Speech synthesis method, speech synthesis device, storage medium and electronic device |
CN113990286A (en) * | 2021-10-29 | 2022-01-28 | 北京大学深圳研究院 | Speech synthesis method, device, equipment and storage medium |
CN113990286B (en) * | 2021-10-29 | 2024-11-19 | 北京大学深圳研究院 | Speech synthesis method, device, equipment and storage medium |
CN114783402A (en) * | 2022-06-22 | 2022-07-22 | 广东电网有限责任公司佛山供电局 | Variation method and device for synthetic voice, electronic equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
WO2019218481A1 (en) | 2019-11-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108615524A (en) | A kind of phoneme synthesizing method, system and terminal device | |
Li et al. | Controllable emotion transfer for end-to-end speech synthesis | |
Weninger et al. | On the acoustics of emotion in audio: what speech, music, and sound have in common | |
Xue et al. | Voice conversion for emotional speech: Rule-based synthesis with degree of emotion controllable in dimensional space | |
Baird et al. | The perception of vocal traits in synthesized voices: Age, gender, and human likeness | |
CN109271493A (en) | A kind of language text processing method, device and storage medium | |
Zhang et al. | Pre-trained deep convolution neural network model with attention for speech emotion recognition | |
CN101156196A (en) | Hybrid speech synthesizer, method and use | |
Schuller et al. | Synthesized speech for model training in cross-corpus recognition of human emotion | |
CN109147831A (en) | A kind of voice connection playback method, terminal device and computer readable storage medium | |
Pinto-Coelho et al. | On the development of an automatic voice pleasantness classification and intensity estimation system | |
Pauletto et al. | Exploring expressivity and emotion with artificial voice and speech technologies | |
Pravena et al. | Development of simulated emotion speech database for excitation source analysis | |
CN107221344A (en) | A kind of speech emotional moving method | |
Wang et al. | Significance of phonological features in speech emotion recognition | |
CN114927126A (en) | Scheme output method, device and equipment based on semantic analysis and storage medium | |
Alessandri et al. | A critical ear: analysis of value judgments in reviews of Beethoven's piano sonata recordings | |
Arnhold | Complex prosodic focus marking in Finnish: Expanding the data landscape | |
Chen et al. | Voice-Cloning Artificial-Intelligence Speakers Can Also Mimic Human-Specific Vocal Expression | |
Dhiman et al. | Modified dense convolutional networks based emotion detection from speech using its paralinguistic features | |
Wang et al. | Rigdelet neural network and improved partial reinforcement effect optimizer for music genre classification from sound spectrum images | |
Worrall et al. | Intelligible sonifications | |
CN110390097A (en) | A kind of sentiment analysis method and system based on the interior real time data of application | |
Shahmohammadi et al. | ViPE: Visualise Pretty-much Everything | |
CN112017668A (en) | Intelligent voice conversation method, device and system based on real-time emotion detection |
Legal Events
Date | Code | Title | Description |
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20181002 |