CN109147760A - Method, apparatus, system and equipment for synthesizing voice - Google Patents
- Publication number
- CN109147760A CN201710508321.6A
- Authority
- CN
- China
- Prior art keywords
- voice
- phonetic dictionary
- index information
- text
- dictionary
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/08—Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
- G10L13/10—Prosody rules derived from text; Stress or intonation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
- G10L13/04—Details of speech synthesis systems, e.g. synthesiser structure or memory management
Abstract
The invention discloses a method, apparatus, system and equipment for synthesizing voice. The method comprises: receiving text to be converted and index information; obtaining a corresponding phonetic dictionary according to the index information, wherein the phonetic dictionaries corresponding to different index information characterize pronunciation patterns under different application environments; and calling a speech synthesis service to process the text to be converted with the corresponding phonetic dictionary, generating the synthesized voice. The invention solves the technical problem that the voice generated by existing speech synthesis systems is not sufficiently accurate.
Description
Technical field
The present invention relates to the technical field of speech synthesis, and in particular to a method, apparatus, system and equipment for synthesizing voice.
Background
Speech synthesis is the technology of generating artificial speech by mechanical or electronic means. TTS (text-to-speech) technology is a branch of speech synthesis: it converts text information generated inside a computer, or input from outside, into speech and plays it back. Automatic telephone customer service and audio novels, for example, are implemented with speech synthesis technology. As user demand for speech synthesis grows, the requirements placed on synthesized speech are becoming increasingly diverse. How to improve the accuracy of a speech synthesis system, and how to convert text information into speech that is rich in emotion and closer to human language, is therefore one of the important topics for future speech synthesis systems.
Existing speech synthesis systems generally provide only one pronunciation for a given character, phrase or sentence. In practice, however, a particular character, phrase or sentence may be pronounced differently in different application scenarios. For example, when used in a personal name or place name, some characters or words are pronounced very differently from their usual reading. If an existing speech synthesis system is used for such special-purpose characters or words, the synthesized pronunciation may be wrong and may even cause ambiguity. In addition, the intonation of the same character, word, phrase or sentence often differs between application scenarios. Whether considered from the application scenario or from the need for specialization, traditional speech synthesis systems cannot solve this problem.
No effective solution to the above problem has yet been proposed.
Summary of the invention
Embodiments of the present invention provide a method, apparatus, system and equipment for synthesizing voice, so as to at least solve the technical problem that the voice generated by existing speech synthesis systems is not sufficiently accurate.
According to one aspect of the embodiments of the present invention, a method for synthesizing voice is provided, comprising: receiving text to be converted and index information; obtaining a corresponding phonetic dictionary according to the index information, wherein the phonetic dictionaries corresponding to different index information characterize pronunciation patterns under different application environments; and calling a speech synthesis service to process the text to be converted with the corresponding phonetic dictionary, generating the synthesized voice.
According to another aspect of the embodiments of the present invention, an equipment for synthesizing voice is also provided, comprising: an input unit, configured to receive text to be converted and index information; a processor, configured to obtain a corresponding phonetic dictionary according to the index information and to call a speech synthesis service to process the text to be converted with the corresponding phonetic dictionary, generating the synthesized voice, wherein the phonetic dictionaries corresponding to different index information characterize pronunciation patterns under different application environments; and a pronunciation device, configured to output the synthesized voice.
According to another aspect of the embodiments of the present invention, a system for synthesizing voice is also provided, comprising: a front-end device, configured to receive input text to be converted and index information; and a server, connected to the front-end device, configured to receive the text to be converted and the index information and to return to the front-end device the phonetic dictionary obtained according to the index information, wherein the phonetic dictionaries corresponding to different index information characterize pronunciation patterns under different application environments. The front-end device is further configured to call a speech synthesis service to process the text to be converted with the corresponding phonetic dictionary, generating the synthesized voice.
According to another aspect of the embodiments of the present invention, an apparatus for synthesizing voice is also provided, comprising: a receiving module, configured to receive text to be converted and index information; an obtaining module, configured to obtain a corresponding phonetic dictionary according to the index information, wherein the phonetic dictionaries corresponding to different index information characterize pronunciation patterns under different application environments; and a generation module, configured to call a speech synthesis service to process the text to be converted with the corresponding phonetic dictionary, generating the synthesized voice.
According to another aspect of the embodiments of the present invention, a method for synthesizing voice is provided, comprising: receiving text to be converted and index information; obtaining a corresponding phonetic dictionary according to the index information, wherein the phonetic dictionaries corresponding to different index information characterize pronunciation patterns under different application environments; and processing the text to be converted with the corresponding phonetic dictionary to generate the synthesized voice.
According to another aspect of the embodiments of the present invention, an apparatus for synthesizing voice is also provided, comprising: a receiving unit, configured to receive text to be converted and index information; an obtaining unit, configured to obtain a corresponding phonetic dictionary according to the index information, wherein the phonetic dictionaries corresponding to different index information characterize pronunciation patterns under different application environments; and a generation unit, configured to process the text to be converted with the corresponding phonetic dictionary to generate the synthesized voice.
According to another aspect of the embodiments of the present invention, a storage medium is also provided. The storage medium includes a stored program, wherein, when the program runs, the equipment on which the storage medium resides is controlled to execute the above method for synthesizing voice.
According to another aspect of the embodiments of the present invention, a processor is also provided. The processor is configured to run a program, wherein the program, when run, executes the above method for synthesizing voice.
In the embodiments of the present invention, text to be converted and index information are received; a corresponding phonetic dictionary is obtained according to the index information, wherein the phonetic dictionaries corresponding to different index information characterize pronunciation patterns under different application environments; and a speech synthesis service is called to process the text to be converted with the corresponding phonetic dictionary, generating the synthesized voice. This achieves the purpose of converting the same text content into voice for different application scenarios within a single speech synthesis system, realizes the technical effect of a more intelligent and more diverse speech synthesis service, and thereby solves the technical problem that the voice generated by existing speech synthesis systems is not sufficiently accurate.
Brief description of the drawings
The drawings described here are provided for a further understanding of the present invention and constitute a part of this application. The illustrative embodiments of the present invention and their descriptions are used to explain the present invention and do not constitute an improper limitation of it. In the drawings:
Fig. 1 is a schematic diagram of an equipment for synthesizing voice according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of an optional speech synthesis principle according to an embodiment of the present invention;
Fig. 3 is a flowchart of a method for synthesizing voice according to an embodiment of the present invention;
Fig. 4 is a flowchart of an optional method for synthesizing voice according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of a system for synthesizing voice according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of an apparatus for synthesizing voice according to an embodiment of the present invention;
Fig. 7 is a hardware block diagram of a terminal according to an embodiment of the present invention;
Fig. 8 is a flowchart of a method for synthesizing voice according to an embodiment of the present invention; and
Fig. 9 is a schematic diagram of an apparatus for synthesizing voice according to an embodiment of the present invention.
Detailed description of the embodiments
To enable those skilled in the art to better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention, without creative effort, shall fall within the scope of protection of the present invention.
It should be noted that the terms "first", "second" and the like in the description, the claims and the above drawings are used to distinguish similar objects and not to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the present invention described here can be implemented in an order other than those illustrated or described. In addition, the terms "include" and "have", and any variants of them, are intended to cover non-exclusive inclusion. For example, a process, method, system, product or equipment that contains a series of steps or units is not necessarily limited to the steps or units explicitly listed, but may include other steps or units that are not explicitly listed or that are inherent to the process, method, product or equipment.
Embodiment 1
According to an embodiment of the present invention, an embodiment of an equipment for synthesizing voice is provided. It should be noted that the equipment provided in this embodiment may be an intelligent electronic device that provides voice services, such as a computer, mobile phone, e-book reader, MP3 player or in-vehicle navigation unit, or an intelligent robot in the field of artificial intelligence. The equipment can convert text information that it generates or receives into a voice signal and output it through a voice device (for example, a player).
The equipment for synthesizing voice provided in this embodiment allows the user to have the text information to be converted output as voice that matches the application scenario of the text (for example, audio novel, storytelling, drama, speech, recitation), a voice style (for example, male voice, female voice, child's voice, humorous style, serious style) or a special purpose (for example, personal name, place name), so as to meet users' diverse speech synthesis needs.
As an optional embodiment, if the equipment is used to convert text information generated inside the equipment into a voice signal, the user can directly select an application scenario, voice style or special purpose, and the equipment outputs voice for the selected application scenario, voice style or special purpose. If the equipment is used to convert text information input by the user into a voice signal, the user can, while entering the text, enter or select the index information of the phonetic dictionary used to convert it into the target voice (the index information specifies the application scenario, voice style or special purpose of the text), and the equipment outputs the entered text as the target voice.
Fig. 1 is a schematic diagram of an equipment for synthesizing voice according to an embodiment of the present invention. As shown in Fig. 1, the equipment 10 includes an input unit 101, a processor 103 and a pronunciation device 105.
The input unit 101 is configured to receive text to be converted and index information.
Specifically, the text to be converted may be text information obtained through the input unit. The text is not limited to Chinese or English and may be in the language of any country. The index information may be preset identification information used to index at least one phonetic dictionary, for example the number of a phonetic dictionary. The phonetic dictionary is used to convert the text to be converted into voice for a particular application scenario, voice style or special purpose.
Optionally, the input unit may be a hardware input device such as a keyboard, scanning device, handwriting pad or microphone. If the input unit is a keyboard, the equipment 10 can directly receive, through the keyboard, the text information that the user wants converted into the target voice. If the input unit is a scanning device (for example, a scanner or camera), the equipment 10 first recognizes the text in the scanned image and converts it into the corresponding text information, which serves as the text to be converted into the target voice. If the input unit is a handwriting pad, the equipment 10 derives the corresponding text from the trajectory of the user's strokes on the pad and uses it as the text to be converted. If the input unit is a microphone, the equipment 10 converts the voice input by the user into the corresponding text information and uses it as the text to be converted. As an optional embodiment, the text to be converted may be data obtained from a network. For example, when a user of an online translation dictionary enters text in one language, the server can return the corresponding text in another language, and the returned text is output as voice. In that case the input unit may also be a communication device that receives the information returned by the server.
In an optional embodiment, the index information may be used to specify at least one phonetic dictionary for converting the text to be converted into voice. The index information may be the identification information of the phonetic dictionary or the number of the phonetic dictionary.
As an optional embodiment, the index information may be entered freely by the user, or selected by the user from the identification information of at least one phonetic dictionary suggested by the system. The index information may be entered together with the text, or before the text is entered.
As another optional embodiment, the index information may be the identification information or number of a phonetic dictionary that the system selects automatically after recognizing, from the contextual semantics, which phonetic dictionary corresponds to the text information.
Optionally, in both of the above modes, voice is generated by default with a general-purpose speech synthesis dictionary. When the user enters the index information of a special-purpose phonetic dictionary, or the system recognizes special-purpose text information, the corresponding special-purpose phonetic dictionary is obtained and used for speech synthesis.
It should be noted that the embodiments protected by this application include but are not limited to those above. Any scheme that adds label information (that is, index information) to text information in order to synthesize voice for different scenarios and purposes falls within the scope of protection of the present invention.
It should also be noted that a priority can be set for the index information, and the corresponding phonetic dictionaries are called in priority order. For example, the order may be special purpose, application scenario, voice style. The phonetic dictionary for the special purpose (for example, personal names, place names) is called first, which ensures accurate pronunciation of words under different purposes. The phonetic dictionary for the application scenario (for example, audio novel, storytelling, drama, speech, recitation) is called next, which determines the general intonation. Finally, the voice style (for example, male voice, female voice, child's voice, humorous style, serious style) is applied, which makes the voice more diverse.
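The priority ordering described above can be sketched as follows. This is only an illustration under stated assumptions: the category names and dictionary identifiers are hypothetical, and the order (special purpose, then application scenario, then voice style) is the example order given in the text, not a fixed rule.

```python
# Hypothetical category names; the order follows the example in the text.
PRIORITY = ["special_purpose", "application_scenario", "voice_style"]


def order_by_priority(index_infos):
    """Sort (category, dictionary_id) pairs so higher-priority categories come first."""
    rank = {category: i for i, category in enumerate(PRIORITY)}
    return sorted(index_infos, key=lambda item: rank[item[0]])


# Index information in the order a user might have supplied it (illustrative values).
requested = [
    ("voice_style", "male_voice"),
    ("special_purpose", "names"),
    ("application_scenario", "audio_novel"),
]
ordered = order_by_priority(requested)
for category, dictionary_id in ordered:
    print(category, dictionary_id)
```

The dictionaries would then be applied in the printed order, so that a special-purpose reading is fixed before intonation and style are chosen.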
The processor 103 is configured to obtain a corresponding phonetic dictionary according to the index information and to call a speech synthesis service to process the text to be converted with the corresponding phonetic dictionary, generating the synthesized voice, wherein the phonetic dictionaries corresponding to different index information characterize pronunciation patterns under different application environments.
Specifically, the phonetic dictionary may be a voice library used to convert the text to be converted into the pronunciation pattern of a particular application environment, voice style or special purpose. The voice library contains text content and the voice information corresponding to that content. After the equipment 10 receives, through the input unit 101, the text information to be converted into the target voice together with the index information, it obtains the phonetic dictionary corresponding to the target voice according to the index information and calls the speech synthesis service (TTS), which synthesizes the text information into the corresponding target voice on the basis of that phonetic dictionary.
In an optional embodiment, suppose the text to be converted is "我的名字叫单于" ("My name is Shanyu"). When used as a name, the word "单于" is pronounced differently from its usual reading, and an existing speech synthesis system would produce "wo de ming zi jiao dan yu". Based on the above embodiments of this application, a phonetic dictionary (user dictionary) dedicated to personal names or place names is established, in the format shown in Table 1. During synthesis, if the user enters the index information "1" of the user dictionary together with the text "我的名字叫单于", the synthesized result is "wo3de0ming2zi4jiao4shan4yu2", which avoids misreading "shan4yu2" as "dan1yu2". Here "0", "1", "2", "3" and "4" denote the neutral tone, first tone, second tone, third tone and fourth tone respectively.
Table 1: Phonetic dictionary format
Number | Word | Pronunciation mark |
1 | 单于 (Shanyu) | shan4yu2 |
2 | 亳州 (Bozhou) | bo2zhou1 |
In another optional embodiment, suppose the text to be converted is "我家来自亳州" ("My family comes from Bozhou"). When used as a place name, the word "亳州" (Bozhou) is pronounced differently from the usual reading, and an existing speech synthesis system would produce "wo jia lai zi hao zhou". With the user dictionary shown in Table 1, the synthesized result is "wo3jia1lai2zi1bo2zhou1". As before, "0", "1", "2", "3" and "4" denote the neutral tone, first tone, second tone, third tone and fourth tone respectively.
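The effect of the user dictionary in the two examples can be sketched in a few lines. This is a minimal illustration, not the patent's implementation: the default lexicon is a toy table invented here, the Chinese strings are an assumed reconstruction of the example text (the surname rendered as 单于), and `USER_DICTS` and `to_pinyin` are hypothetical names. Only the dictionary entries and the tone-numbered outputs come from the text.

```python
# Toy default lexicon (assumption): one reading per character, as a generic
# system without special-purpose knowledge might use.
DEFAULT_LEXICON = {
    "我": "wo3", "的": "de0", "名": "ming2", "字": "zi4", "叫": "jiao4",
    "单": "dan1", "于": "yu2",
}

# Entries of Table 1, stored under index information "1" (layout is an assumption).
USER_DICTS = {"1": {"单于": "shan4yu2", "亳州": "bo2zhou1"}}


def to_pinyin(text, index_info=None):
    """Convert text to tone-numbered pinyin; the indexed dictionary overrides defaults."""
    user_dict = USER_DICTS.get(index_info, {})
    words = sorted(user_dict, key=len, reverse=True)  # prefer the longest entry
    out, i = [], 0
    while i < len(text):
        match = next((w for w in words if text.startswith(w, i)), None)
        if match:
            out.append(user_dict[match])
            i += len(match)
        else:
            # Fall back to the per-character default reading.
            out.append(DEFAULT_LEXICON.get(text[i], text[i]))
            i += 1
    return "".join(out)


print(to_pinyin("我的名字叫单于"))        # default reading of the surname
print(to_pinyin("我的名字叫单于", "1"))   # reading taken from the user dictionary
```

Without index information the sketch falls back to the general reading; with index information "1" the dictionary entry takes precedence, which is the behaviour the examples describe.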
The pronunciation device 105 is configured to output the synthesized voice.
Specifically, the pronunciation device may be a player for outputting voice. After the processor has synthesized the voice of the text to be converted, the pronunciation device outputs the synthesized voice.
As can be seen from the above, in the embodiments of this application, phonetic dictionaries of pronunciation patterns under different application environments are established. During speech synthesis, the text to be converted into the target voice is received together with the index information of the phonetic dictionary corresponding to that target voice; the phonetic dictionary is obtained according to the index information; the speech synthesis service is called to synthesize the target voice of the text on the basis of that dictionary; and the synthesized target voice is output. This achieves the purpose of converting the same text content into voice for different application scenarios within a single speech synthesis system, realizes the technical effect of a more intelligent and more diverse speech synthesis service, and thereby solves the technical problem that the voice generated by existing speech synthesis systems is not sufficiently accurate.
In an optional embodiment, the phonetic dictionary records the different pronunciations of the same pronunciation object under different application environments, wherein the pronunciation object includes at least one of the following: a character, a word, a phrase and a sentence.
Specifically, in the above embodiment, the pronunciation object may be a character, word, phrase or sentence that makes up the text to be converted. For the same pronunciation object, pronunciation patterns can be established for different application environments, voice styles or special purposes, forming the phonetic dictionaries for those environments, styles or purposes. When text is synthesized into voice, the pronunciation of each character, word, phrase or sentence can then be selected according to the application scenario, voice style or special purpose in question.
Take the synthesis of an audio novel as an example. A novel usually involves different characters (for example, a male lead and a female lead) and lines spoken in different moods (for example, anger, sadness, happiness). With the equipment for synthesizing voice provided in this embodiment, the voices in the phonetic dictionaries of different characters or moods can be called from the phonetic dictionary library while the text novel is converted into an audio novel. For example, when the text to be converted is dialogue spoken by the male lead in a happy mood, the label "male voice, happy" can be entered together with the dialogue. The pronunciations corresponding to the dialogue (characters, words, phrases or sentences) are then taken from the "male voice, happy" dictionary to synthesize the final voice of the dialogue.
In an optional embodiment, each character and the gender of each character can be set in advance. While the audio novel is synthesized, the system automatically recognizes the dialogue of each character and automatically retrieves the corresponding phonetic dictionary for synthesis; when special-purpose text information is recognized, the special-purpose phonetic dictionary can likewise be retrieved automatically. Optionally, natural language processing can be used to recognize special-purpose text from the contextual semantics. For example, voice is generated by default with the general-purpose speech synthesis dictionary, and when special-purpose text information is recognized (for example, "我的名字叫单于", "My name is Shanyu"), the system can synthesize the phrase adjacent to "名字" ("name") using the name phonetic dictionary.
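As a rough illustration of automatic dictionary selection, the sketch below uses plain keyword matching as a stand-in for the natural language processing mentioned above. The trigger words, the dictionary identifiers and the function name `pick_dictionary` are all assumptions made for this example; a real system would rely on a proper semantic analysis rather than substring tests.

```python
# Hypothetical trigger words mapped to special-purpose dictionary identifiers.
TRIGGERS = {
    "名字": "names",        # "name": the adjacent phrase is likely a personal name
    "来自": "place_names",  # "come from": the adjacent phrase is likely a place name
}
DEFAULT_DICT = "general"    # general-purpose dictionary used by default


def pick_dictionary(text):
    """Return the id of the first special-purpose dictionary whose trigger occurs in text."""
    for trigger, dict_id in TRIGGERS.items():
        if trigger in text:
            return dict_id
    return DEFAULT_DICT


print(pick_dictionary("我的名字叫单于"))
print(pick_dictionary("今天天气很好"))
```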
Through the above embodiment, different pronunciations can be synthesized for the same character, word, phrase or sentence according to the application scenario, voice style or special purpose, which diversifies pronunciation and improves the user experience.
In an optional embodiment, as shown in Fig. 1, the equipment 10 further includes a communication device 107, configured to upload the dictionary information of a phonetic dictionary to a server. The server stores at least one phonetic dictionary, and each phonetic dictionary includes uploaded dictionary information. After receiving uploaded dictionary information, the server generates matching index information; different phonetic dictionaries correspond to different index information.
Specifically, in the above embodiment, before the speech synthesis service is called, the user can use the equipment for synthesizing voice to create phonetic dictionaries for different application environments, voice styles or special purposes and upload them to the server. After receiving the uploaded dictionary information, the server generates matching index information. Optionally, the index information may be a randomly generated number or the identity information (ID) of the user. Because different phonetic dictionaries correspond to different index information, the user can enter the corresponding index information together with the text to be converted, and the corresponding phonetic dictionary is then obtained.
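The upload-and-index step can be sketched with an in-memory stand-in for the server. The class and method names are hypothetical, and the sketch issues sequential numbers as index information purely for simplicity; the text itself mentions a random number or the user's ID as options.

```python
import itertools


class DictionaryServer:
    """In-memory stand-in for the server that stores uploaded phonetic dictionaries."""

    def __init__(self):
        self._dicts = {}
        self._ids = itertools.count(1)  # assumption: sequential numbers as index information

    def upload(self, entries):
        """Store a dictionary and return the index information generated for it."""
        index_info = str(next(self._ids))
        self._dicts[index_info] = dict(entries)
        return index_info

    def fetch(self, index_info):
        """Return the dictionary stored under index_info, or None if there is none."""
        return self._dicts.get(index_info)


server = DictionaryServer()
names_index = server.upload({"单于": "shan4yu2"})
places_index = server.upload({"亳州": "bo2zhou1"})
print(names_index, server.fetch(names_index))
```

Each upload yields distinct index information, so a client only needs to keep the returned value and can fetch the dictionary later instead of storing it locally.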
Through the above embodiment, a library of phonetic dictionaries for different application environments, voice styles or special purposes can be created. Moreover, because the corresponding phonetic dictionary is obtained from the server according to the index information, less local storage space is occupied.
In an optional embodiment, after the server receives the uploaded dictionary information, it checks whether the format and/or pronunciation of the pronunciation objects contained in the dictionary information meets a predetermined condition. If so, the dictionary information is written into the corresponding index library.
Specifically, in the above embodiment, after the server receives a phonetic dictionary (user dictionary, UserDict), a validity detection module on the server checks whether the format and the pronunciations of the dictionary are valid. Once the validity check passes, the dictionary information is written to the server, and all user dictionaries are compiled together according to their index information (for example, an identifier id) to form the phonetic dictionary library (user dictionary library, UserDicts).
Through the foregoing embodiment, a verification step is added, which improves the security of the system.
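A validity check of this kind might look like the sketch below. The application does not define the predetermined condition, so the rule used here is an assumption: each pronunciation must be a sequence of lowercase pinyin-like syllables, each followed by a tone digit from 0 to 4, as in the annotations of Table 1. The function names are hypothetical.

```python
import re

# Assumed format: one or more syllables, each made of letters plus a tone digit 0-4.
PRONUNCIATION = re.compile(r"(?:[a-z]+[0-4])+")


def is_valid_entry(word, pronunciation):
    # The word must be a non-empty string.
    if not isinstance(word, str) or not word.strip():
        return False
    if not isinstance(pronunciation, str):
        return False
    # Case is ignored so that "Shan4yu2" and "shan4yu2" are treated alike.
    return PRONUNCIATION.fullmatch(pronunciation.lower()) is not None


def validate_dictionary(entries):
    # Return (is_valid, list of words whose entries failed the check).
    invalid = [word for word, pron in entries.items()
               if not is_valid_entry(word, pron)]
    return (not invalid, invalid)


ok, bad = validate_dictionary({"Bozhou": "bo2zhou1", "Shan Yu": "Shan4yu2"})
```

A dictionary that fails the check would simply not be written to the index database.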
In an optional embodiment, the above processor 103 is further configured to: query the server according to the index information to obtain the corresponding phonetic dictionary; detect whether the attributes of the phonetic dictionary are valid; if they are valid, determine that phonetic dictionary to be the one corresponding to the index information; and if they are not valid, determine that the query has failed and return to the server to query again. If the query has still failed after a predetermined time, or the number of failed queries exceeds a predetermined number, the current query request is abandoned and a prompt message is output.
Specifically, in the above embodiment, when obtaining the corresponding phonetic dictionary according to the index information, the processor 103 first queries the server for the phonetic dictionary corresponding to the index information of the text to be converted. After a phonetic dictionary corresponding to the index information is found, the processor checks whether the attributes of that dictionary are valid. If they are valid, the dictionary is used as the phonetic dictionary corresponding to the index information. If they are not valid, the query fails and the processor returns to the server to query again. If the query has still failed after the predetermined time, or the number of failed queries exceeds the predetermined number, the current query request is abandoned and a prompt message is output. In an optional embodiment, if no corresponding phonetic dictionary is found, the default pronunciation may be used.
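The query-and-retry behavior described above can be sketched as a small loop. Everything named here is hypothetical: `fetch_dictionary`, the `query` and `is_valid` callables, and the default limits of three attempts and two seconds are illustrative choices, since the application leaves the predetermined time and number unspecified.

```python
import time


def fetch_dictionary(query, index, is_valid, max_attempts=3, timeout_s=2.0):
    """Query until a valid dictionary is returned, or give up and return None."""
    deadline = time.monotonic() + timeout_s
    for _ in range(max_attempts):
        if time.monotonic() > deadline:
            break  # predetermined time exceeded
        candidate = query(index)
        if candidate is not None and is_valid(candidate):
            return candidate
        # Invalid or missing result: the query failed, so ask the server again.
    return None


# A stand-in server that returns an invalid dictionary first, then a valid one.
responses = iter([{"Bozhou": ""}, {"Bozhou": "bo2zhou1"}])
result = fetch_dictionary(lambda idx: next(responses), "1",
                          lambda d: all(d.values()))

# When the request is abandoned, the caller can prompt the user and fall back
# to the default pronunciation.
fallback = fetch_dictionary(lambda idx: None, "1", lambda d: True)
message = "dictionary unavailable, using default pronunciation" if fallback is None else ""
```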
As an optional embodiment, Fig. 2 is a schematic diagram of an optional speech synthesis principle according to an embodiment of the present invention. As shown in Fig. 2, before calling the speech synthesis (TTS) service, the user first uploads his or her user dictionaries (user dictionary 1, user dictionary 2, ..., user dictionary N) to the server. After receiving a user dictionary, the server uses the validity detection module to check whether the format and the pronunciations of the dictionary are valid. After the validity check, all user dictionaries can be compiled together according to their index information to form the user dictionary library. In the synthesis phase, the user enters the index information of a dictionary together with the text to be synthesized into the TTS (Text to Speech) engine, and speech in the corresponding style is synthesized.
In an optional embodiment, the above communication device 107 is further configured to periodically download phonetic dictionaries from the server to the local device and cache them, so that, when the corresponding phonetic dictionary is being obtained according to the index information and cannot be found in the local cache, the query request is forwarded to the server to obtain it.
Specifically, in the above embodiment, the communication device 107 of the above device 10 periodically downloads phonetic dictionaries from the server to the local device and caches them. When the processor 103 obtains the corresponding phonetic dictionary according to the index information and cannot find it in the local cache, the query request is forwarded to the server through the communication device 107 to obtain the corresponding phonetic dictionary.
Through the foregoing embodiment, caching increases the speed of speech synthesis, and the large number of phonetic dictionaries on the server remains available for synthesis, which ensures the effectiveness of speech synthesis.
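The cache-then-server lookup can be sketched as below. The class `CachedDictionaryClient` and the `fetch_from_server` callable are hypothetical names. Storing a dictionary in the cache after a successful server lookup is an added design choice, not something the application states.

```python
class CachedDictionaryClient:
    """Hypothetical client that checks a local cache before asking the server."""

    def __init__(self, fetch_from_server):
        self._fetch = fetch_from_server
        self._cache = {}

    def refresh(self, indexes):
        # Meant to be called periodically: download dictionaries and cache them.
        for index in indexes:
            dictionary = self._fetch(index)
            if dictionary is not None:
                self._cache[index] = dictionary

    def get(self, index):
        if index in self._cache:
            return self._cache[index]  # served locally, no network round trip
        dictionary = self._fetch(index)  # cache miss: forward to the server
        if dictionary is not None:
            self._cache[index] = dictionary
        return dictionary


remote = {"1": {"Bozhou": "bo2zhou1"}, "2": {"Shan Yu": "shan4yu2"}}
server_calls = []


def fetch(index):
    server_calls.append(index)
    return remote.get(index)


client = CachedDictionaryClient(fetch)
client.refresh(["1"])
from_cache = client.get("1")   # no additional server call
from_server = client.get("2")  # not cached, so the server is queried
```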
Embodiment 2
According to an embodiment of the present invention, a method embodiment for synthesizing voice is also provided. It can be applied to various speech synthesis scenarios that involve converting text information into speech, such as the speech synthesis services of Baidu, iFlytek, SinoVoice and AISpeech.
Speech synthesis technology, also known as text-to-speech conversion technology and abbreviated as TTS (Text to Speech), mainly converts text information that is generated by the computer itself or input from outside (for example, the content of a text file or a Word document) into a speech signal according to speech processing rules. TTS technology can convert arbitrary text information into fluent spoken output in real time, and it draws on several disciplines, including acoustics, linguistics, digital signal processing and computer science. A text-to-speech conversion system can in fact be regarded as an artificial intelligence system. To synthesize high-quality speech, the system must not only rely on various rules, including semantic, lexical and phonetic rules, but also understand the content of the text well.
As the demand for speech synthesis grows, the requirements that people place on synthesized speech are becoming increasingly diverse. Prosodic parameters differ from one application environment to another, and as expectations for the naturalness and sound quality of synthesized speech keep rising, a speech synthesis system should generate personalized and diversified speech to suit different application scenarios.
On the other hand, the same text can have different pronunciations under particular uses. For example, the character 一 ("one") is read in the first tone when it is read alone and when it is the last character of a word. When it comes first in a word, it undergoes tone change: it is read in the second tone before a fourth-tone syllable and in the fourth tone before a first-, second- or third-tone syllable. As another example, some characters have one pronunciation in everyday use and a different pronunciation when used as a name.
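The tone rule for the character "one" (一, yi) described above can be written as a small function. The function name and the position labels `"alone"`, `"end"` and `"before"` are hypothetical; the rule itself follows the description: first tone alone or at the end of a word, second tone before a fourth-tone syllable, and fourth tone before a first-, second- or third-tone syllable.

```python
def yi_tone(position, next_tone=None):
    """Return the tone (1-4) of the character 一 for a given context.

    position: "alone", "end" (last character of a word), or
              "before" (followed by another syllable in the same word).
    next_tone: tone of the following syllable, needed only for "before".
    """
    if position in ("alone", "end"):
        return 1
    if position == "before":
        if next_tone == 4:
            return 2  # e.g. 一定: yi2 ding4
        if next_tone in (1, 2, 3):
            return 4  # e.g. 一般: yi4 ban1
        raise ValueError("next_tone must be 1, 2, 3 or 4")
    raise ValueError(f"unknown position: {position!r}")
```

A single fixed dictionary entry for 一 cannot capture this context dependence, which is the limitation of existing systems that the surrounding text points to.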
In existing speech synthesis systems, however, a character, word, phrase or sentence usually has only one pronunciation. Such systems cannot satisfy different application scenarios, and in some special uses they may even produce an incorrect pronunciation.
Against this background, the present application provides a method for synthesizing voice as shown in Fig. 3. A speech synthesis system based on this method embodiment allows the user to have the text information to be converted output as speech that matches the application scenario of the text (for example, audiobook, storytelling, stage play, speech or recitation), the voice style (for example, male voice, female voice, child's voice, humorous style or serious style) or the special purpose (for example, personal name or place name), thereby providing the user with diversified speech synthesis services.
Fig. 3 is a flowchart of a method for synthesizing voice according to an embodiment of the present invention. It should be noted that the steps illustrated in the flowchart can be executed in a computer system, for example as a set of computer-executable instructions, and that, although a logical order is shown in the flowchart, in some cases the steps shown or described may be executed in a different order. As shown in Fig. 3, the method includes the following steps:
Step S302: receive the text to be converted and the index information.
Specifically, in the above step, the text to be converted may be text information obtained through an input device. The text is not limited to Chinese or English and may be in the language of any country. The index information may be preset identification information for indexing at least one phonetic dictionary, and the phonetic dictionary may be used to convert the text to be converted into speech for different application scenarios, voice styles or special purposes.
It should be noted here that, during speech synthesis, the index information is used to direct the text to be converted to different phonetic dictionaries. In one optional embodiment, if the text to be converted is text entered by the user, the index information may be entered by the user, in a user-defined manner, together with the text. For example, when the user enters "我的名字叫单于" ("My name is Shan Yu"), the word 单于 (otherwise the title of the Xiongnu chief) is pronounced differently as a personal name than in ordinary use. After entering 单于, the user can therefore enter the identification information "1" of the phonetic dictionary dedicated to personal names immediately before or after 单于. Optionally, the input interface may show the user the phonetic dictionaries that are currently available, together with the identification information of each, for the user to choose from.
Because the text used for speech synthesis is not necessarily entered by the user and may instead be text generated inside the computer, in another optional embodiment the index information can be set automatically by the system through semantic analysis of the context; optionally, the text information can be recognized automatically using natural language analysis technology. For example, when a user of an online translation dictionary enters text in one language, the server returns the corresponding text in another language. If the returned text is to be output as speech, the system can automatically identify the purpose, application scenario, voice style and so on of the internally generated text, automatically select the corresponding index information, and pass the internally generated text together with the index information to the speech synthesis system.
It should also be noted that, in the first optional embodiment above, where the user has to enter the index information, speech can be generated by default with a general-purpose speech synthesis dictionary, so that the user only needs to enter the index information of the corresponding phonetic dictionary when entering text with a special purpose or application scenario. For example, to synthesize speech for a self-introduction, the user first selects a default phonetic dictionary (for example, a general "female voice" dictionary), and the text entered is synthesized with that default dictionary. When text with a special purpose is encountered, such as a personal name or place name, the user enters the identification information of the corresponding name or place-name phonetic dictionary together with the name or place name. Once the name or place name has been synthesized, synthesis continues with the default "female voice" dictionary, which spares the user the trouble of entering index information repeatedly. In the second optional embodiment above, speech can likewise be generated by default with the general-purpose speech synthesis dictionary, and when text with a special purpose is recognized, the phonetic dictionary for that purpose is obtained and used for synthesis.
In an optional embodiment, the text may be: text information to be converted into the target speech that the user types directly on a keyboard; text in an image captured by a scanning device and converted into text information by recognition processing; text information obtained from the user's handwriting on a writing tablet; or text information converted from speech that the user inputs through a microphone or similar audio input device. As a further option, the text to be converted may be data obtained from a network. For example, when a user of an online translation dictionary enters text in one language, the server returns the corresponding text in another language, and the returned text is output as speech.
In an optional embodiment, the index information can be used to specify at least one phonetic dictionary for converting the text to be converted into speech. The index information may be the identification information of the phonetic dictionary or the number of the phonetic dictionary.
It should be noted here that priorities can be set for the index information, and the corresponding phonetic dictionaries are called in order of priority. For example, the order may be special purpose, then application scenario, then voice style. The phonetic dictionary for the special purpose (for example, personal name or place name) is called first to ensure that words are pronounced correctly for that purpose. The phonetic dictionary for the application scenario (for example, audiobook, storytelling, stage play, speech or recitation) is called next to determine the approximate intonation. Finally, the phonetic dictionary for the voice style (for example, male voice, female voice, child's voice, humorous style or serious style) is applied to make the speech more varied.
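One way to read the priority order above is "the highest-priority dictionary that contains the word wins". The sketch below implements that reading only; the application may instead intend the dictionaries to be applied in layers (pronunciation, then intonation, then style). The category names, the `resolve` function and the sample entries are hypothetical.

```python
# Lower number means higher priority, following the order given in the text.
PRIORITY = {"special_purpose": 0, "application_scenario": 1, "voice_style": 2}


def resolve(word, dictionaries):
    """Return (pronunciation, category) from the highest-priority match.

    dictionaries: list of (category, entries) pairs in any order.
    """
    for category, entries in sorted(dictionaries, key=lambda d: PRIORITY[d[0]]):
        if word in entries:
            return entries[word], category
    return None, None  # caller falls back to the default pronunciation


selected = [
    ("voice_style", {"Bozhou": "style reading", "hello": "style reading"}),
    ("special_purpose", {"Bozhou": "bo2zhou1"}),
]
place = resolve("Bozhou", selected)
greeting = resolve("hello", selected)
```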
Step S304: obtain the corresponding phonetic dictionary according to the index information, where the phonetic dictionaries corresponding to different index information characterize the pronunciation modes under different application environments.
Specifically, in the above step, the phonetic dictionary may be a speech library used to convert the text to be converted into the pronunciation mode of a particular application environment, voice style or special purpose. The speech library contains text content and the voice information corresponding to that text content. After the text information to be converted into the target speech and the index information are received, the phonetic dictionary corresponding to the target speech is obtained according to the index information. The phonetic dictionary may be stored locally or on a server.
Step S306: call the speech synthesis service to process the text to be converted together with the corresponding phonetic dictionary, and generate the synthesized speech.
Specifically, in the above step, after the phonetic dictionary corresponding to the target speech has been obtained according to the index information, the speech synthesis service (TTS) is called and, based on that phonetic dictionary, synthesizes the text information into the corresponding target speech.
In an optional embodiment, assume that the text to be converted is "我的名字叫单于" ("My name is Shan Yu"), where the word 单于 (otherwise the title of the Xiongnu chief) is pronounced differently as a personal name than in ordinary use. An existing speech synthesis system would produce "wo de ming zi jiao dan yu". According to the above embodiments of the present application, a phonetic dictionary (user dictionary) dedicated to personal names or place names is established; its format can be as shown in Table 1. During synthesis, if the user enters the index information "1" of the user dictionary together with the text "我的名字叫单于", the synthesized result is "wo3de0ming2zi4jiao4shan4yu2", which avoids misreading "shan4yu2" as "dan1yu2". Here "0", "1", "2", "3" and "4" denote the neutral tone, first tone, second tone, third tone and fourth tone, respectively.
Table 1 Phonetic dictionary format
Number | Word | Pronunciation annotation |
1 | 单于 (the chief of the Xiongnu; here a personal name) | shan4yu2 |
2 | 亳州 (Bozhou) | bo2zhou1 |
In another optional embodiment, assume that the text to be converted is "我家来自亳州" ("My family comes from Bozhou"), where the word 亳州 (Bozhou), used as a place name, is pronounced differently from the ordinary reading. An existing speech synthesis system would produce "wo jia lai zi hao zhou". With the user dictionary of the above embodiments of the present application shown in Table 1, the synthesized result is "wo3jia1lai2zi1bo2zhou1". As before, "0", "1", "2", "3" and "4" denote the neutral tone, first tone, second tone, third tone and fourth tone, respectively.
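The effect of the user dictionary in the two examples above can be illustrated with a text-to-pinyin lookup, which stands in for the front part of a synthesis pipeline. This is a sketch under assumptions: the function `to_pinyin`, the greedy longest-match strategy and the per-character default table are illustrative and not taken from the application. The words are written in Chinese characters (单于 for the name in the first example), which is an assumption about the source text; the pinyin values are those quoted in the examples and in Table 1.

```python
# Default per-character readings, as an ordinary system would produce them.
DEFAULT = {"我": "wo3", "的": "de0", "名": "ming2", "字": "zi4",
           "叫": "jiao4", "单": "dan1", "于": "yu2"}

# User dictionaries keyed by index information; index "1" mirrors Table 1.
USER_DICTS = {"1": {"单于": "shan4yu2", "亳州": "bo2zhou1"}}


def to_pinyin(text, index=None):
    user = USER_DICTS.get(index, {})
    longest = max((len(word) for word in user), default=0)
    out, i = [], 0
    while i < len(text):
        # Try the longest user-dictionary match starting at position i.
        for n in range(longest, 0, -1):
            chunk = text[i:i + n]
            if chunk in user:
                out.append(user[chunk])
                i += n
                break
        else:
            # No user entry matched: fall back to the default reading.
            out.append(DEFAULT.get(text[i], text[i]))
            i += 1
    return "".join(out)


with_dictionary = to_pinyin("我的名字叫单于", index="1")
without_dictionary = to_pinyin("我的名字叫单于")
```

With index "1" the name is read from the user dictionary; without it, each character falls back to its default reading, which reproduces the misreading described in the text.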
As can be seen from the above, in the above embodiments of the present application, phonetic dictionaries are established for the pronunciation modes of different application environments. During speech synthesis, the text to be converted into the target speech is received together with the index information of the phonetic dictionary corresponding to the target speech; the corresponding phonetic dictionary is obtained according to the index information; and, based on that dictionary, the speech synthesis service is called to synthesize the target speech of the text, which is then output. In this way, the same text content can be converted into speech for different application scenarios within a single speech synthesis system, which achieves the technical effect of more intelligent and more diversified speech synthesis services and solves the technical problem that the speech generated by existing speech synthesis systems is not sufficiently accurate.
In an optional embodiment, the phonetic dictionary records the different pronunciations of the same pronunciation object under different application environments, where the pronunciation object includes at least one of the following: a character, a word, a phrase and a sentence.
Specifically, in the above embodiment, the pronunciation object may be a character, word, phrase or sentence that makes up the text to be converted. For the same pronunciation object, pronunciation modes can be established for different application environments, voice styles or special purposes, and these make up the phonetic dictionaries for those environments, styles and purposes. When text is synthesized into speech, the pronunciation that matches the current application scenario, voice style or special purpose can then be selected for each character, word, phrase or sentence in the text.
It should be noted here that the index information may apply to a single character, word, phrase or sentence in the text to be converted or, alternatively, to the entire text; different indexing behavior can be realized depending on the specific application scenario or preset conditions. When the index information applies only to one character, word, phrase or sentence, it is added immediately before or after that item as it is entered. The phonetic dictionary indexed by the index information is then used only for that item, and the rest of the text is synthesized with the default phonetic dictionary. To make clear which part of the text the index information applies to, in one optional embodiment the index information can carry the number of characters, or the first and last characters, of the span to be synthesized with the indexed phonetic dictionary. In another optional embodiment, special characters (for example, brackets or quotation marks) can be used to delimit the text to be synthesized with the indexed phonetic dictionary.
For example, suppose the text to be converted is "My name is 单于 (Shan Yu), and I am reading a 单行本 (separate edition)", in which the character 单 occurs twice. By default, speech is synthesized with the general-purpose phonetic dictionary. When 单于 is entered, the index information pointing to the name phonetic dictionary can be added before it, and the word 单于 can be enclosed in brackets or quotation marks. When the system converts the sentence, it then uses the name phonetic dictionary only for 单于 and the general-purpose phonetic dictionary for the rest of the text, so that the two occurrences of 单 in the sentence are synthesized with different phonetic dictionaries and receive different pronunciations.
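The bracket-based marking described above could be parsed as in the following sketch. The concrete syntax, a numeric index immediately followed by the word in square brackets (for example `1[单于]`), is a hypothetical choice made for illustration, as is the function name `split_segments`. The Chinese sample sentence is a reconstruction of the example and should be treated as an assumption.

```python
import re

# Hypothetical markup: <index digits>[<text to read with that dictionary>]
MARKED = re.compile(r"(\d+)\[([^\]]+)\]")


def split_segments(text):
    """Split text into (segment, index) pairs; index is None for default text."""
    segments, pos = [], 0
    for match in MARKED.finditer(text):
        if match.start() > pos:
            segments.append((text[pos:match.start()], None))
        segments.append((match.group(2), match.group(1)))
        pos = match.end()
    if pos < len(text):
        segments.append((text[pos:], None))
    return segments


segments = split_segments("我的名字叫1[单于]，正在读单行本")
```

Each segment with an index would be synthesized with the indexed phonetic dictionary, and each segment whose index is `None` with the default dictionary, so the two occurrences of 单 end up in different segments.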
Optionally, the system can also automatically recognize the characters, words, phrases or sentences in the text to be converted that need to be synthesized with a specific phonetic dictionary, and then synthesize a preset amount of text following them with the specific phonetic dictionary indexed by the index information, while the rest of the text is synthesized with the default phonetic dictionary. For example, after the word "name" is recognized in "My name is 单于 (Shan Yu), and I am reading a 单行本 (separate edition)", in one optional embodiment the entire sentence segment containing "name" can be synthesized automatically with the name phonetic dictionary and the rest of the text with the general-purpose phonetic dictionary, so that the two occurrences of the character 单 are synthesized with different phonetic dictionaries and receive different pronunciations.
Take the synthesis of an audiobook as an example. A novel usually involves different characters (for example, the male lead and the female lead) as well as speech in different emotional situations (for example, anger, sadness or happiness). With the voice-synthesizing device provided in this embodiment, when a text novel is converted into an audiobook, the voices in the phonetic dictionaries for different characters or situations can be called from the phonetic dictionary library. For example, when the text to be converted is a line spoken by the male lead in a happy situation, the label "male voice, happy" can be entered together with the line, so that the pronunciations corresponding to the line (its characters, words, phrases or sentences) are taken from the "male voice, happy" dictionary in the phonetic dictionary library and the final speech for the line is synthesized.
Through the foregoing embodiment, the same character, word, phrase or sentence can be synthesized with different pronunciations depending on the application scenario, voice style or special purpose, which yields diversified pronunciation and improves the user experience.
In an optional embodiment, before the corresponding phonetic dictionary is obtained according to the index information, the above method may further include the following step. Step S303: upload the dictionary information of a phonetic dictionary to a server. The server stores at least one phonetic dictionary, and each phonetic dictionary contains the uploaded dictionary information. After receiving uploaded dictionary information, the server generates matching index information; different phonetic dictionaries correspond to different index information.
Specifically, in the above embodiment, before the speech synthesis service is called, the user can use the voice-synthesizing device to generate phonetic dictionaries for different application environments, voice styles or special purposes and upload them to the server. After receiving the uploaded dictionary information, the server generates matching index information. Optionally, the index information may be a randomly generated number or the user's identity information (ID). Because different phonetic dictionaries correspond to different index information, the user can enter the corresponding index information together with the text to be converted, and the corresponding phonetic dictionary can then be obtained.
Through the foregoing embodiment, a library of phonetic dictionaries for different application environments, voice styles or special purposes can be created. In addition, because the corresponding phonetic dictionary is obtained from the server according to the index information, less local storage space is occupied.
In an optional embodiment, after receiving the uploaded dictionary information, the server checks whether the format and/or the pronunciation of the pronunciation objects contained in the dictionary information meets a predetermined condition. If so, the dictionary information is written into the corresponding index database.
Specifically, in the above embodiment, after the server receives a phonetic dictionary (user dictionary, UserDict), a validity detection module on the server checks whether the format and the pronunciations of the dictionary are valid. Once the validity check passes, the dictionary information is written to the server, and all user dictionaries are compiled together according to their index information (for example, an identifier id) to form the phonetic dictionary library (user dictionary library, UserDicts).
Through the foregoing embodiment, a verification step is added, which improves the security of the system.
In an optional embodiment, as shown in Fig. 4, obtaining the corresponding phonetic dictionary according to the index information includes the following steps:
Step S402: query the server according to the index information to obtain the corresponding phonetic dictionary;
Step S404: detect whether the attributes of the phonetic dictionary are valid;
Step S406: if they are valid, determine that phonetic dictionary to be the one corresponding to the index information;
Step S408: if they are not valid, determine that the query has failed and return to the server to query again, where, if the query has still failed after a predetermined time, or the number of failed queries exceeds a predetermined number, the current query request is abandoned and a prompt message is output.
Specifically, in the above embodiment, when the corresponding phonetic dictionary is obtained according to the index information, the server is first queried for the phonetic dictionary corresponding to the index information of the text to be converted. After a phonetic dictionary corresponding to the index information is found, its attributes are checked for validity. If they are valid, the dictionary is used as the phonetic dictionary corresponding to the index information. If they are not valid, the query fails and the server is queried again. If the query has still failed after the predetermined time, or the number of failed queries exceeds the predetermined number, the current query request is abandoned and a prompt message is output. In an optional embodiment, if no corresponding phonetic dictionary is found, the default pronunciation may be used.
As an optional embodiment, as shown in Fig. 2, before calling the speech synthesis (TTS) service, the user first uploads his or her user dictionaries (user dictionary 1, user dictionary 2, ..., user dictionary N) to the server. After receiving a user dictionary, the server uses the validity detection module to check whether the format and the pronunciations of the dictionary are valid. After the validity check, all user dictionaries can be compiled together according to their index information to form the user dictionary library. In the synthesis phase, the user enters the index information of a dictionary together with the text to be synthesized into the TTS (Text to Speech) engine, and speech in the corresponding style is synthesized.
In an optional embodiment, the above method further includes step S502: periodically download phonetic dictionaries from the server to the local device and cache them, so that, when the corresponding phonetic dictionary is being obtained according to the index information and cannot be found in the local cache, the query request is forwarded to the server to obtain it.
Specifically, in the above embodiment, phonetic dictionaries are periodically downloaded from the server to the local device and cached. When the corresponding phonetic dictionary is being obtained according to the index information and cannot be found in the local cache, the query request is forwarded to the server to obtain it.
Through the foregoing embodiment, caching increases the speed of speech synthesis, and the large number of phonetic dictionaries on the server remains available for synthesis, which ensures the effectiveness of speech synthesis.
The solutions disclosed in the above embodiments of the present application achieve the following technical effects. First, by associating different scenarios with different user dictionaries, particular characters, words, phrases and sentences are given specific pronunciations. Second, when the speech synthesis service is called to synthesize speech, the corresponding pronunciation index information is entered together with the text to be synthesized, which yields diversified pronunciation.
Embodiment 3
According to an embodiment of the present invention, a system embodiment for synthesizing voice is also provided. Fig. 5 is a schematic diagram of a system for synthesizing voice according to an embodiment of the present invention. As shown in Fig. 5, the system includes a front-end device 501 and a server 503.
Wherein, headend equipment 501, text and index information to be converted for receiving input;
Server 503, connect with headend equipment, for receiving text and index information to be converted, and will be according to index
The phonetic dictionary of acquisition of information returns to headend equipment, wherein the corresponding phonetic dictionary of different index information characterizes different answer
With the sound producing pattern under environment;
The text and corresponding voice word that above-mentioned headend equipment 501 is also used to call speech synthesis service processing to be converted
Allusion quotation, the voice after generating synthesis.
Specifically, above-mentioned headend equipment can may be used to provide voice with computer, notebook, tablet computer, mobile phone etc.
The Intelligent mobile equipment of service;User can input text to be converted and specified application scenarios, voice wind by headend equipment
The index information of lattice or special-purpose, and server is sent to by headend equipment, server receive text to be converted and
After index information, corresponding phonetic dictionary is obtained according to index information, and the phonetic dictionary is back to headend equipment, front end is set
The standby voice for calling TTS service to synthesize text to be converted using the phonetic dictionary that server returns.
As can be seen from the above, in the above embodiments of the present application, phonetic dictionaries of the
pronunciation modes in different application environments are established. During speech synthesis, the text to be
converted into the target speech and the index information of the phonetic dictionary corresponding to the target
speech are received; the phonetic dictionary corresponding to the target speech is obtained according to the index
information; and, based on that phonetic dictionary, the speech synthesis service is called to synthesize the
target speech of the text to be converted, which is then output. This achieves the purpose of converting the same
text content into speech for different application scenarios within a single speech synthesis system, realizes the
technical effect of a more intelligent and diversified speech synthesis service, and thereby solves the technical
problem that the speech generated by existing speech synthesis systems is not sufficiently accurate.
In an optional embodiment, the phonetic dictionary is used to record the different pronunciations of the same
pronunciation object in different application environments, where the pronunciation object includes at least one of
the following: a character, a word, a phrase and a sentence.
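One way to picture such a set of phonetic dictionaries is as a mapping from index information to per-environment
entries. The sketch below is only illustrative: the index identifiers, the entries and the helper
`pronunciation_for` are invented for the example and do not come from the application.

```python
from typing import Dict

# Illustrative data only: each index identifier selects one phonetic
# dictionary, and the same pronunciation object ("1/2") is read differently
# in each application environment.
PHONETIC_DICTIONARIES: Dict[str, Dict[str, str]] = {
    "math": {"1/2": "one half"},
    "calendar": {"1/2": "January second"},
}


def pronunciation_for(index_id: str, pronunciation_object: str) -> str:
    """Look up how an object is pronounced in the environment named by index_id.

    Falls back to the object itself when no dictionary or entry exists.
    """
    dictionary = PHONETIC_DICTIONARIES.get(index_id, {})
    return dictionary.get(pronunciation_object, pronunciation_object)
```

Under these assumptions, the same object yields a different reading depending only on which index identifier
accompanies it.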
In an optional embodiment, the front-end device 501 is further configured to upload the dictionary information of a
phonetic dictionary to the server, where the server stores at least one phonetic dictionary, each phonetic
dictionary includes the uploaded dictionary information, and the server generates matching index information after
receiving the uploaded dictionary information, with different phonetic dictionaries corresponding to different
index information.
In an optional embodiment, after receiving the uploaded dictionary information, the server 503 detects whether the
format and/or pronunciation of the pronunciation objects contained in the dictionary information meets a
predetermined condition and, if so, determines to write the dictionary information into the corresponding index
database.
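The upload path on the server side (validate, generate index information, write to the index database) might look
like the sketch below. The application does not state what the predetermined condition is, so the rule used here
(non-empty objects, pronunciations made only of letters, digits and spaces) is an assumption, as are the names
`IndexDatabase`, `is_valid` and the use of a random hexadecimal string as index information.

```python
import re
import uuid
from typing import Dict, Optional

# Assumed "predetermined condition": the source does not specify the real rule.
_PRONUNCIATION_RE = re.compile(r"[A-Za-z0-9 ]+")


def is_valid(dictionary_info: Dict[str, str]) -> bool:
    """Check the format of each pronunciation object and its pronunciation."""
    if not dictionary_info:
        return False
    for obj, pronunciation in dictionary_info.items():
        if not isinstance(obj, str) or not obj.strip():
            return False
        if not isinstance(pronunciation, str):
            return False
        if not _PRONUNCIATION_RE.fullmatch(pronunciation):
            return False
    return True


class IndexDatabase:
    """In-memory stand-in for the server's index database."""

    def __init__(self) -> None:
        self._entries: Dict[str, Dict[str, str]] = {}

    def upload(self, dictionary_info: Dict[str, str]) -> Optional[str]:
        """Store valid dictionary information and return its new index information.

        Returns None when the information fails validation.
        """
        if not is_valid(dictionary_info):
            return None
        index_id = uuid.uuid4().hex  # a distinct identifier per dictionary
        self._entries[index_id] = dict(dictionary_info)
        return index_id

    def lookup(self, index_id: str) -> Optional[Dict[str, str]]:
        return self._entries.get(index_id)
```

Generating the identifier on the server keeps the one-to-one correspondence between phonetic dictionaries and index
information in a single place.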
In an optional embodiment, the front-end device 501 is further configured to: query the server according to the
index information to obtain the corresponding phonetic dictionary; detect whether the attributes of the phonetic
dictionary are legal; if they are legal, determine the phonetic dictionary corresponding to the index information;
and if they are not legal, determine that the query has failed and return to the server to query again, where, if
the query fails within a predetermined time or the number of failed queries exceeds a predetermined number, the
current query request is abandoned and prompt information is output.
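The query, legality check and bounded retry described above can be sketched as a single loop. The function name
`query_with_retry`, its parameters and default limits, and the use of `print` for the prompt information are all
assumptions for illustration; the application does not define which dictionary attributes are checked.

```python
import time
from typing import Callable, Dict, Optional


def query_with_retry(
    index_id: str,
    query_server: Callable[[str], Optional[Dict[str, str]]],
    is_legal: Callable[[Dict[str, str]], bool],
    max_attempts: int = 3,
    timeout_s: float = 5.0,
) -> Optional[Dict[str, str]]:
    """Query the server until a legal dictionary is returned or limits are hit.

    query_server and is_legal are hypothetical callables standing in for the
    real server query and attribute check.
    """
    deadline = time.monotonic() + timeout_s
    for _ in range(max_attempts):
        if time.monotonic() > deadline:
            break  # predetermined time exceeded
        dictionary = query_server(index_id)
        if dictionary is not None and is_legal(dictionary):
            return dictionary
        # Otherwise the query counts as failed and is retried.
    # Abandon the request and output prompt information.
    print(f"No valid phonetic dictionary found for index {index_id!r}")
    return None
```

Bounding both the elapsed time and the number of attempts keeps a persistently invalid dictionary from blocking the
synthesis request indefinitely.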
In an optional embodiment, the front-end device 501 is further configured to periodically download phonetic
dictionaries from the server to local storage and cache them, so that, while obtaining the phonetic dictionary
corresponding to the index information, if that dictionary cannot be found in the local cache, the query request is
forwarded to the server to obtain it.
Embodiment 4
According to an embodiment of the present invention, an apparatus embodiment for implementing the above method for
synthesizing speech is further provided. Fig. 6 is a schematic diagram of an apparatus for synthesizing speech
according to an embodiment of the present invention. As shown in Fig. 6, the apparatus includes a receiving module
601, an obtaining module 603 and a generation module 605.
The receiving module 601 is configured to receive the text to be converted and index information;
The obtaining module 603 is configured to obtain the corresponding phonetic dictionary according to the index
information, where the phonetic dictionaries corresponding to different index information characterize the
pronunciation modes in different application environments;
The generation module 605 is configured to call a speech synthesis service to process the text to be converted and
the corresponding phonetic dictionary, generating the synthesized speech.
It should be noted here that the receiving module 601, the obtaining module 603 and the generation module 605 may
correspond to steps S302 to S306 in Embodiment 2. The three modules share the examples and application scenarios
realized by the corresponding steps, but are not limited to the content disclosed in Embodiment 2.
As can be seen from the above, in the above embodiments of the present application, phonetic dictionaries of the
pronunciation modes in different application environments are established. During speech synthesis, the text to be
converted into the target speech and the index information of the phonetic dictionary corresponding to the target
speech are received; the phonetic dictionary corresponding to the target speech is obtained according to the index
information; and, based on that phonetic dictionary, the speech synthesis service is called to synthesize the
target speech of the text to be converted, which is then output. This achieves the purpose of converting the same
text content into speech for different application scenarios within a single speech synthesis system, realizes the
technical effect of a more intelligent and diversified speech synthesis service, and thereby solves the technical
problem that the speech generated by existing speech synthesis systems is not sufficiently accurate.
In an optional embodiment, the phonetic dictionary is used to record the different pronunciations of the same
pronunciation object in different application environments, where the pronunciation object includes at least one of
the following: a character, a word, a phrase and a sentence.
In an optional embodiment, the apparatus further includes an uploading module, configured to upload the dictionary
information of a phonetic dictionary to the server, where the server stores at least one phonetic dictionary, each
phonetic dictionary includes the uploaded dictionary information, and the server generates matching index
information after receiving the uploaded dictionary information, with different phonetic dictionaries corresponding
to different index information.
It should be noted here that the uploading module may correspond to step S303 in Embodiment 2. The module shares
the examples and application scenarios realized by the corresponding step, but is not limited to the content
disclosed in Embodiment 2.
In an optional embodiment, after receiving the uploaded dictionary information, the server detects whether the
format and/or pronunciation of the pronunciation objects contained in the dictionary information meets a
predetermined condition and, if so, determines to write the dictionary information into the corresponding index
database.
In an optional embodiment, the obtaining module further includes: a query module, configured to query the server
according to the index information to obtain the corresponding phonetic dictionary; a detection module, configured
to detect whether the attributes of the phonetic dictionary are legal; a first execution module, configured to
determine the phonetic dictionary corresponding to the index information if the attributes are legal; and a second
execution module, configured to determine, if the attributes are not legal, that the query has failed and return to
the server to query again, where, if the query fails within a predetermined time or the number of failed queries
exceeds a predetermined number, the current query request is abandoned and prompt information is output.
It should be noted here that the query module, the detection module, the first execution module and the second
execution module may correspond to steps S402 to S408 in Embodiment 2. The four modules share the examples and
application scenarios realized by the corresponding steps, but are not limited to the content disclosed in
Embodiment 2.
In an optional embodiment, the apparatus is further configured to periodically download phonetic dictionaries from
the server to local storage and cache them, so that, while obtaining the phonetic dictionary corresponding to the
index information, if that dictionary cannot be found in the local cache, the query request is forwarded to the
server to obtain it.
It should be noted that, for simplicity of description, each of the foregoing method embodiments is expressed as a
series of combined actions. Those skilled in the art should understand, however, that the present invention is not
limited by the described order of actions, because according to the present invention some steps may be performed
in another order or simultaneously. Those skilled in the art should also understand that the embodiments described
in the specification are all preferred embodiments, and the actions and modules involved are not necessarily
required by the present invention.
From the above description of the embodiments, those skilled in the art can clearly understand that the method for
synthesizing speech according to the above embodiments can be implemented by means of software plus a necessary
general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better
implementation. Based on this understanding, the technical solution of the present invention, in essence or in the
part that contributes over the prior art, can be embodied in the form of a software product. The computer software
product is stored in a storage medium (such as ROM/RAM, a magnetic disk or an optical disc) and includes several
instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device or
the like) to execute the methods described in the embodiments of the present invention.
Embodiment 5
An embodiment of the present invention may provide a computer terminal, which may be any computer terminal device in
a group of computer terminals. Optionally, in this embodiment, the computer terminal may also be replaced by a
terminal device such as a mobile terminal.
Optionally, in this embodiment, the computer terminal may be located in at least one of a plurality of network
devices in a computer network.
Fig. 7 shows a hardware block diagram of a computer terminal. As shown in Fig. 7, the computer terminal 70 may
include one or more processors 702 (shown in the figure as 702a, 702b, ..., 702n; the processor 702 may include,
but is not limited to, a processing device such as a microprocessor (MCU) or a programmable logic device (FPGA)), a
memory 704 for storing data, and a transmission device 706 for communication functions. In addition, it may also
include a display, an input/output interface (I/O interface), a universal serial bus (USB) port (which may be
included as one of the ports of the I/O interface), a network interface, a power supply and/or a camera. Those of
ordinary skill in the art will understand that the structure shown in Fig. 7 is only illustrative and does not
limit the structure of the above electronic device. For example, the computer terminal 70 may include more or fewer
components than shown in Fig. 7, or have a configuration different from that shown in Fig. 7.
It should be noted that the one or more processors 702 and/or other data processing circuits may generally be
referred to herein as a "data processing circuit". The data processing circuit may be embodied in whole or in part
as software, hardware, firmware or any combination thereof. In addition, the data processing circuit may be a
single independent processing module, or may be incorporated in whole or in part into any of the other elements of
the computer terminal 70. As involved in the embodiments of the present application, the data processing circuit
acts as a kind of processor control (for example, selection of a variable-resistance terminal path connected to the
interface).
The processor 702 may call, through the transmission device, the information and application programs stored in the
memory to perform the following steps: obtaining a sliding-window sequence of a key, where the sliding-window
sequence includes a plurality of sliding windows obtained by performing sliding-window processing on the key;
scrambling at least one sliding window in the sliding-window sequence to obtain a scrambled sliding-window
sequence; and traversing the scrambled sliding-window sequence and post-processing it using a Montgomery modular
multiplier.
The memory 704 may be used to store software programs and modules of application software, such as the program
instructions/data storage device corresponding to the key processing method in the embodiments of the present
invention. By running the software programs and modules stored in the memory 704, the processor 702 executes
various functional applications and data processing, that is, implements the key processing method of the above
application program. The memory 704 may include high-speed random access memory and may also include non-volatile
memory, such as one or more magnetic storage devices, flash memory or other non-volatile solid-state memory. In
some examples, the memory 704 may further include memory located remotely from the processor 702, and such remote
memory may be connected to the computer terminal 70 through a network. Examples of such networks include, but are
not limited to, the Internet, an intranet, a local area network, a mobile communication network and combinations
thereof.
The transmission device 706 is used to receive or send data via a network. A specific example of the above network
may include a wireless network provided by the communication provider of the computer terminal 70. In one example,
the transmission device 706 includes a network adapter (Network Interface Controller, NIC), which can be connected
to other network devices through a base station so as to communicate with the Internet. In one example, the
transmission device 706 may be a radio frequency (RF) module, which is used to communicate with the Internet
wirelessly.
The display may be, for example, a touch-screen liquid crystal display (LCD), which enables the user to interact
with the user interface of the computer terminal 70.
It should be noted here that, in some optional embodiments, the computer terminal 70 shown in Fig. 7 may include
hardware elements (including circuits), software elements (including computer code stored on a computer-readable
medium), or a combination of both hardware and software elements. It should be pointed out that Fig. 7 is only one
example of a particular embodiment and is intended to show the types of components that may be present in the
computer terminal 70.
In this embodiment, the computer terminal 70 may execute program code for the following steps in the method for
synthesizing speech of the application program: receiving a currently input question; obtaining at least one
candidate answer to the question based on a retrieval model, and obtaining a first answer to the question based on a
generation model, where the retrieval model is a model that obtains results based on search technology and the
generation model is a model that obtains results based on a trained model; and performing evaluation processing
according to at least the first answer and the at least one candidate answer to generate an output answer to the
question.
Optionally, the processor may call, through the transmission device, the information and application programs
stored in the memory to perform the following steps: receiving the text to be converted and index information;
obtaining the corresponding phonetic dictionary according to the index information, where the phonetic dictionaries
corresponding to different index information characterize the pronunciation modes in different application
environments; and calling a speech synthesis service to process the text to be converted and the corresponding
phonetic dictionary, generating the synthesized speech.
Optionally, the phonetic dictionary is used to record the different pronunciations of the same pronunciation object
in different application environments, where the pronunciation object includes at least one of the following: a
character, a word, a phrase and a sentence.
Optionally, the processor may also execute program code for the following step: uploading the dictionary
information of a phonetic dictionary to the server, where the server stores at least one phonetic dictionary, each
phonetic dictionary includes the uploaded dictionary information, and the server generates matching index
information after receiving the uploaded dictionary information, with different phonetic dictionaries corresponding
to different index information.
Optionally, after the server receives the uploaded dictionary information, the processor may also execute program
code for the following step: detecting whether the format and/or pronunciation of the pronunciation objects
contained in the dictionary information meets a predetermined condition and, if so, determining to write the
dictionary information into the corresponding index database.
Optionally, the processor may also execute program code for the following steps: querying the server according to
the index information to obtain the corresponding phonetic dictionary; detecting whether the attributes of the
phonetic dictionary are legal; if they are legal, determining the phonetic dictionary corresponding to the index
information; and if they are not legal, determining that the query has failed and returning to the server to query
again, where, if the query fails within a predetermined time or the number of failed queries exceeds a
predetermined number, the current query request is abandoned and prompt information is output.
Optionally, the processor may also execute program code for the following steps: periodically downloading phonetic
dictionaries from the server to local storage and caching them, so that, while obtaining the phonetic dictionary
corresponding to the index information, if that dictionary cannot be found in the local cache, the query request is
forwarded to the server to obtain it.
Those of ordinary skill in the art will understand that the structure shown in Fig. 7 is only illustrative. The
computer terminal may also be a terminal device such as a smartphone (for example an Android phone or an iOS phone),
a tablet computer, a palmtop computer, a mobile Internet device (Mobile Internet Devices, MID) or a PAD. Fig. 7 does
not limit the structure of the above electronic device. For example, the computer terminal 70 may include more or
fewer components than shown in Fig. 7 (such as a network interface or a display device), or have a configuration
different from that shown in Fig. 7.
Those of ordinary skill in the art will understand that all or part of the steps in the various methods of the
above embodiments can be completed by a program instructing the hardware of a terminal device. The program can be
stored in a computer-readable storage medium, and the storage medium may include a flash drive, read-only memory
(Read-Only Memory, ROM), random access memory (Random Access Memory, RAM), a magnetic disk, an optical disc or the
like.
Embodiment 6
An embodiment of the present invention also provides a storage medium. Optionally, in this embodiment, the storage
medium may be used to store the program code executed by the method for synthesizing speech provided in
Embodiment 1.
Optionally, in this embodiment, the storage medium may be located in any computer terminal in a group of computer
terminals in a computer network, or in any mobile terminal in a group of mobile terminals.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following
steps: receiving the text to be converted and index information; obtaining the corresponding phonetic dictionary
according to the index information, where the phonetic dictionaries corresponding to different index information
characterize the pronunciation modes in different application environments; and calling a speech synthesis service
to process the text to be converted and the corresponding phonetic dictionary, generating the synthesized speech.
Optionally, the phonetic dictionary is used to record the different pronunciations of the same pronunciation object
in different application environments, where the pronunciation object includes at least one of the following: a
character, a word, a phrase and a sentence.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following
step: uploading the dictionary information of a phonetic dictionary to the server, where the server stores at least
one phonetic dictionary, each phonetic dictionary includes the uploaded dictionary information, and the server
generates matching index information after receiving the uploaded dictionary information, with different phonetic
dictionaries corresponding to different index information.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following
step: detecting whether the format and/or pronunciation of the pronunciation objects contained in the dictionary
information meets a predetermined condition and, if so, determining to write the dictionary information into the
corresponding index database.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following
steps: querying the server according to the index information to obtain the corresponding phonetic dictionary;
detecting whether the attributes of the phonetic dictionary are legal; if they are legal, determining the phonetic
dictionary corresponding to the index information; and if they are not legal, determining that the query has failed
and returning to the server to query again, where, if the query fails within a predetermined time or the number of
failed queries exceeds a predetermined number, the current query request is abandoned and prompt information is
output.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following
steps: periodically downloading phonetic dictionaries from the server to local storage and caching them, so that,
while obtaining the phonetic dictionary corresponding to the index information, if that dictionary cannot be found
in the local cache, the query request is forwarded to the server to obtain it.
Embodiment 7
An embodiment of the present invention also provides a method embodiment for synthesizing speech, which can be
applied to various speech synthesis systems or devices that convert text information into speech, including but not
limited to the application scenarios of Embodiment 1. It should be noted that the steps shown in the flowchart of
the accompanying drawings can be executed in a computer system, for example as a set of computer-executable
instructions, and that, although a logical order is shown in the flowchart, in some cases the steps shown or
described may be performed in an order different from the one given here.
Fig. 8 is a flowchart of a method for synthesizing speech according to an embodiment of the present invention. As
shown in Fig. 8, the method includes the following steps:
Step S802: receiving the text to be converted and index information;
Step S804: obtaining the corresponding phonetic dictionary according to the index information, where the phonetic
dictionaries corresponding to different index information characterize the pronunciation modes in different
application environments;
Step S806: processing the text to be converted and the corresponding phonetic dictionary to generate the
synthesized speech.
Specifically, in the above steps, the text to be converted may be text information entered through an input device
such as a keyboard, text information generated inside a computer, or text information returned by an application or
Web-based service (for example, Baidu Translate). The text is not limited to Chinese or English and may be in the
language of any country. The index information may be preset identification information used to index one or more
phonetic dictionaries. The phonetic dictionary may be a speech library used to convert the text to be converted
into the pronunciation mode of a particular application environment, voice style or special purpose; the speech
library contains text content and the speech information corresponding to the text content to be converted. After
the text information to be converted into the target speech and the index information are received, the phonetic
dictionary corresponding to the target speech is obtained according to the index information of the target speech,
and the text to be converted and the corresponding phonetic dictionary are then processed to generate the speech
corresponding to the text to be converted.
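Steps S802 to S806 can be sketched end to end as below. The application does not say how the phonetic dictionary is
applied to the text, so the single-pass substitution in `apply_dictionary` is an assumed approach, and
`get_dictionary` and `tts_engine` are hypothetical callables standing in for the dictionary lookup and for whatever
speech synthesis service is actually called.

```python
import re
from typing import Callable, Dict


def apply_dictionary(text: str, dictionary: Dict[str, str]) -> str:
    """Replace each pronunciation object in the text with its pronunciation.

    Entries are matched in one pass, longest first, so that a phrase takes
    precedence over the words it contains and replacements are not rescanned.
    """
    if not dictionary:
        return text
    keys = sorted(dictionary, key=len, reverse=True)
    pattern = re.compile("|".join(re.escape(key) for key in keys))
    return pattern.sub(lambda match: dictionary[match.group(0)], text)


def synthesize(
    text: str,
    index_id: str,
    get_dictionary: Callable[[str], Dict[str, str]],
    tts_engine: Callable[[str], bytes],
) -> bytes:
    """S802: receive text and index; S804: get dictionary; S806: synthesize."""
    dictionary = get_dictionary(index_id)          # step S804
    prepared = apply_dictionary(text, dictionary)  # apply pronunciation rules
    return tts_engine(prepared)                    # step S806
```

With two dictionaries that map "Dr." to "drive" and to "doctor" respectively, the same input text produces different
prepared text, and therefore different speech, depending solely on the index information supplied with it.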
It should be noted that the phonetic dictionary may be a phonetic dictionary stored locally or a phonetic
dictionary on the server.
As can be seen from the above, in the above embodiments of the present application, phonetic dictionaries of the
pronunciation modes in different application environments are established. During speech synthesis, the text to be
converted into the target speech and the index information of the phonetic dictionary corresponding to the target
speech are received; the phonetic dictionary corresponding to the target speech is obtained according to the index
information; and, based on that phonetic dictionary, the target speech of the text to be converted is synthesized
and output. This achieves the purpose of converting the same text content into speech for different application
scenarios within a single speech synthesis system, realizes the technical effect of a more intelligent and
diversified speech synthesis service, and thereby solves the technical problem that the speech generated by
existing speech synthesis systems is not sufficiently accurate.
Embodiment 8
According to an embodiment of the present invention, an apparatus embodiment for implementing the above method for
synthesizing speech is further provided. Fig. 9 is a schematic diagram of an apparatus for synthesizing speech
according to an embodiment of the present invention. As shown in Fig. 9, the apparatus includes a receiving unit
901, an obtaining unit 903 and a generation unit 905.
The receiving unit 901 is configured to receive the text to be converted and index information;
The obtaining unit 903 is configured to obtain the corresponding phonetic dictionary according to the index
information, where the phonetic dictionaries corresponding to different index information characterize the
pronunciation modes in different application environments;
The generation unit 905 is configured to process the text to be converted and the corresponding phonetic
dictionary to generate the synthesized speech.
It should be noted here that the receiving unit 901, the obtaining unit 903 and the generation unit 905 may
correspond to steps S802 to S806 in Embodiment 7. The three units share the examples and application scenarios
realized by the corresponding steps, but are not limited to the content disclosed in Embodiment 7.
As can be seen from the above, in the above embodiments of the present application, phonetic dictionaries of the
pronunciation modes in different application environments are established. During speech synthesis, the receiving
unit 901 receives the text to be converted into the target speech and the index information of the phonetic
dictionary corresponding to the target speech; the obtaining unit 903 obtains the phonetic dictionary corresponding
to the target speech according to the index information; and finally the generation unit 905 synthesizes the target
speech of the text to be converted based on that phonetic dictionary. This achieves the purpose of converting the
same text content into speech for different application scenarios within a single speech synthesis system, realizes
the technical effect of a more intelligent and diversified speech synthesis service, and thereby solves the
technical problem that the speech generated by existing speech synthesis systems is not sufficiently accurate.
Embodiment 9
According to an embodiment of the present invention, a system embodiment is further provided. The system includes a
processor and a memory connected to the processor, the memory being configured to provide the processor with
instructions for the following processing steps:
receiving the text to be converted and index information;
obtaining the corresponding phonetic dictionary according to the index information, where the phonetic dictionaries
corresponding to different index information characterize the pronunciation modes in different application
environments;
calling a speech synthesis service to process the text to be converted and the corresponding phonetic dictionary,
generating the synthesized speech.
As can be seen from the above, in the above embodiments of the present application, phonetic dictionaries of the
pronunciation modes in different application environments are established. During speech synthesis, the text to be
converted into the target speech and the index information of the phonetic dictionary corresponding to the target
speech are received; the phonetic dictionary corresponding to the target speech is obtained according to the index
information; and, based on that phonetic dictionary, the speech synthesis service is called to synthesize the
target speech of the text to be converted, which is then output. This achieves the purpose of converting the same
text content into speech for different application scenarios within a single speech synthesis system, realizes the
technical effect of a more intelligent and diversified speech synthesis service, and thereby solves the technical
problem that the speech generated by existing speech synthesis systems is not sufficiently accurate.
The serial numbers of the above embodiments of the present invention are for description only and do not represent
the relative merits of the embodiments.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis. For
parts that are not described in detail in one embodiment, reference may be made to the related descriptions of the
other embodiments.
In the several embodiments provided in the present application, it should be understood that the disclosed technical content may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division of the units is only a division by logical function, and other divisions are possible in actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disc.
The above are only preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art may make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention.
Claims (15)
1. A device for synthesizing voice, characterized by comprising:
an input unit, configured to receive text to be converted and index information;
a processor, configured to obtain a corresponding phonetic dictionary according to the index information, and to call a speech synthesis service to process the text to be converted and the corresponding phonetic dictionary to generate the synthesized voice, wherein the phonetic dictionaries corresponding to different index information characterize sound producing patterns under different application environments; and
a pronunciation device, configured to output the synthesized voice.
2. A method for synthesizing voice, characterized by comprising:
receiving text to be converted and index information;
obtaining a corresponding phonetic dictionary according to the index information, wherein the phonetic dictionaries corresponding to different index information characterize sound producing patterns under different application environments; and
calling a speech synthesis service to process the text to be converted and the corresponding phonetic dictionary, and generating the synthesized voice.
3. The method according to claim 2, characterized in that the phonetic dictionary is used to record different pronunciations of the same pronunciation object under different application environments, wherein the pronunciation object includes at least one of the following: a character, a word, a phrase, and a sentence.
4. The method according to claim 2, characterized in that, before the corresponding phonetic dictionary is obtained according to the index information, the method further comprises:
uploading dictionary information of a phonetic dictionary to a server, wherein the server stores at least one phonetic dictionary, each phonetic dictionary includes the uploaded dictionary information, the server generates matching index information after receiving the uploaded dictionary information, and different phonetic dictionaries correspond to different index information.
5. The method according to claim 4, characterized in that, after receiving the uploaded dictionary information, the server detects whether the format and/or the pronunciation of the pronunciation objects included in the dictionary information meets a predetermined condition and, if so, determines to write the dictionary information into the corresponding index database.
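The upload path of claims 4 and 5 can be sketched as follows. This is an illustration under stated assumptions only: the class DictionaryServer, the use of a random UUID as index information, and the pinyin-like pattern used as the "predetermined condition" are all hypothetical; the claims do not specify the index format or the concrete check.

```python
import re
import uuid
from typing import Dict, Optional

# Assumed "predetermined condition": lowercase syllables with an optional
# tone digit 1-5, separated by single spaces (a pinyin-like format).
PRONUNCIATION_RE = re.compile(r"[a-z]+[1-5]?( [a-z]+[1-5]?)*")


class DictionaryServer:
    """Hypothetical in-memory server that stores uploaded phonetic dictionaries."""

    def __init__(self) -> None:
        self._index_db: Dict[str, Dict[str, str]] = {}

    def upload(self, entries: Dict[str, str]) -> Optional[str]:
        """Validate the dictionary information; return new index information or None."""
        if not entries:
            return None
        for pronunciation_object, pronunciation in entries.items():
            # Reject blank pronunciation objects and malformed pronunciations.
            if not pronunciation_object.strip():
                return None
            if not PRONUNCIATION_RE.fullmatch(pronunciation):
                return None
        # Generate index information for this dictionary; a random UUID makes
        # different dictionaries receive different index information.
        index = uuid.uuid4().hex
        self._index_db[index] = dict(entries)
        return index

    def query(self, index: str) -> Optional[Dict[str, str]]:
        """Return the dictionary stored under the index information, if any."""
        return self._index_db.get(index)


server = DictionaryServer()
idx = server.upload({"重庆": "chong2 qing4"})
print(idx is not None, server.query(idx))
print(server.upload({"重庆": "CHONG!"}))  # rejected by the format check
```

Only dictionary information that passes the check is written to the index database, so a later query by index information never returns an entry that failed validation at upload time.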
6. The method according to claim 4 or 5, characterized in that obtaining the corresponding phonetic dictionary according to the index information comprises:
querying the server according to the index information to obtain the corresponding phonetic dictionary;
detecting whether an attribute of the phonetic dictionary is legal;
if legal, determining the phonetic dictionary corresponding to the index information; and
if illegal, determining that the query has failed and returning to the server to perform the query operation again, wherein if no query result is obtained within a predetermined time or the number of query failures exceeds a predetermined number, the current query request is abandoned and prompt information is output.
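The query, legality check, and retry limit of claim 6 can be sketched as a bounded loop. The function names, the default limits, and the legality check itself (a non-empty mapping) are assumptions made for illustration; the claim does not define which attribute is checked.

```python
import time
from typing import Callable, Dict, Optional

Dictionary = Dict[str, str]


def is_legal(dictionary: object) -> bool:
    """Assumed legality check: the result is a non-empty mapping."""
    return isinstance(dictionary, dict) and len(dictionary) > 0


def fetch_dictionary(
    index: str,
    query_server: Callable[[str], Optional[Dictionary]],
    legal: Callable[[object], bool] = is_legal,
    max_failures: int = 3,
    timeout_s: float = 2.0,
) -> Optional[Dictionary]:
    """Query until a legal dictionary is returned, the time budget is used up,
    or the number of failed queries exceeds max_failures."""
    deadline = time.monotonic() + timeout_s
    failures = 0
    while time.monotonic() < deadline:
        dictionary = query_server(index)
        if dictionary is not None and legal(dictionary):
            return dictionary
        failures += 1  # missing or illegal result counts as a failed query
        if failures > max_failures:
            break
    # Abandon the current query request and output prompt information.
    print(f"No valid phonetic dictionary found for index {index!r}")
    return None
```

A caller would pass its own server query function, for example the query method of whatever client it uses, and treat a None result as the abandoned request.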
7. The method according to claim 4, characterized in that phonetic dictionaries are periodically downloaded from the server to the local device and the downloaded phonetic dictionaries are cached, so that, in the process of obtaining the corresponding phonetic dictionary according to the index information, if the corresponding phonetic dictionary cannot be found in the local cache, the query request is forwarded to the server to obtain the corresponding phonetic dictionary.
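The local cache with server fallback of claim 7 can be sketched as below. The class name and the two callables are hypothetical; in a real deployment refresh() would be invoked on a schedule (for example by a timer), which is left out here to keep the sketch self-contained.

```python
from typing import Callable, Dict, Optional

Dictionary = Dict[str, str]


class CachedDictionaryClient:
    """Hypothetical client that answers from a local cache and falls back to the server."""

    def __init__(
        self,
        download_all: Callable[[], Dict[str, Dictionary]],
        query_one: Callable[[str], Optional[Dictionary]],
    ) -> None:
        self._download_all = download_all
        self._query_one = query_one
        self._cache: Dict[str, Dictionary] = {}

    def refresh(self) -> None:
        """Replace the local cache with a fresh download; meant to be called periodically."""
        self._cache = dict(self._download_all())

    def get(self, index: str) -> Optional[Dictionary]:
        """Look in the local cache first; on a miss, forward the query to the server."""
        if index in self._cache:
            return self._cache[index]
        dictionary = self._query_one(index)
        if dictionary is not None:
            self._cache[index] = dictionary  # remember the result for later lookups
        return dictionary
```

Caching the fallback result is a design choice of this sketch, not something the claim requires; it simply avoids repeating the same server query before the next scheduled download.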
8. A system for synthesizing voice, characterized by comprising:
a front-end device, configured to receive input text to be converted and index information; and
a server, connected to the front-end device and configured to receive the text to be converted and the index information, and to return the phonetic dictionary obtained according to the index information to the front-end device, wherein the phonetic dictionaries corresponding to different index information characterize sound producing patterns under different application environments;
wherein the front-end device is further configured to call a speech synthesis service to process the text to be converted and the corresponding phonetic dictionary, and to generate the synthesized voice.
9. An apparatus for synthesizing voice, characterized by comprising:
a receiving module, configured to receive text to be converted and index information;
an obtaining module, configured to obtain a corresponding phonetic dictionary according to the index information, wherein the phonetic dictionaries corresponding to different index information characterize sound producing patterns under different application environments; and
a generation module, configured to call a speech synthesis service to process the text to be converted and the corresponding phonetic dictionary, and to generate the synthesized voice.
10. The apparatus according to claim 9, characterized in that the obtaining module comprises:
a query module, configured to query a server according to the index information to obtain the corresponding phonetic dictionary;
a detection module, configured to detect whether an attribute of the phonetic dictionary is legal;
a first execution module, configured to, if legal, determine the phonetic dictionary corresponding to the index information; and
a second execution module, configured to, if illegal, determine that the query has failed and return to the server to perform the query operation again, wherein if no query result is obtained within a predetermined time or the number of query failures exceeds a predetermined number, the current query request is abandoned and prompt information is output.
11. A method for synthesizing voice, characterized by comprising:
receiving text to be converted and index information;
obtaining a corresponding phonetic dictionary according to the index information, wherein the phonetic dictionaries corresponding to different index information characterize sound producing patterns under different application environments; and
processing the text to be converted and the corresponding phonetic dictionary, and generating the synthesized voice.
12. An apparatus for synthesizing voice, characterized by comprising:
a receiving unit, configured to receive text to be converted and index information;
an obtaining unit, configured to obtain a corresponding phonetic dictionary according to the index information, wherein the phonetic dictionaries corresponding to different index information characterize sound producing patterns under different application environments; and
a generation unit, configured to process the text to be converted and the corresponding phonetic dictionary, and to generate the synthesized voice.
13. A storage medium, characterized in that the storage medium includes a stored program, wherein, when the program runs, a device in which the storage medium is located is controlled to execute the method for synthesizing voice according to any one of claims 2 to 7.
14. A processor, characterized in that the processor is configured to run a program, wherein the program, when run, executes the method for synthesizing voice according to any one of claims 2 to 7.
15. A system for synthesizing voice, characterized by comprising:
a processor; and
a memory, connected to the processor and configured to provide the processor with instructions for the following processing steps:
receiving text to be converted and index information;
obtaining a corresponding phonetic dictionary according to the index information, wherein the phonetic dictionaries corresponding to different index information characterize sound producing patterns under different application environments; and
calling a speech synthesis service to process the text to be converted and the corresponding phonetic dictionary, and generating the synthesized voice.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710508321.6A CN109147760A (en) | 2017-06-28 | 2017-06-28 | Synthesize method, apparatus, system and the equipment of voice |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109147760A true CN109147760A (en) | 2019-01-04 |
Family
ID=64803493
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710508321.6A Pending CN109147760A (en) | 2017-06-28 | 2017-06-28 | Synthesize method, apparatus, system and the equipment of voice |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109147760A (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH09245021A (en) * | 1996-03-11 | 1997-09-19 | Matsushita Electric Ind Co Ltd | Speech synthesizing device |
CN1161529A (en) * | 1995-11-30 | 1997-10-08 | 冲电气工业株式会社 | Text voice readup system |
CN102867512A (en) * | 2011-07-04 | 2013-01-09 | 余喆 | Method and device for recognizing natural speech |
CN104881403A (en) * | 2015-06-04 | 2015-09-02 | 百度在线网络技术(北京)有限公司 | Word segmentation method and device |
-
2017
- 2017-06-28 CN CN201710508321.6A patent/CN109147760A/en active Pending
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210134295A1 (en) * | 2017-08-10 | 2021-05-06 | Facet Labs, Llc | Oral communication device and computing system for processing data and outputting user feedback, and related methods |
US11763811B2 (en) * | 2017-08-10 | 2023-09-19 | Facet Labs, Llc | Oral communication device and computing system for processing data and outputting user feedback, and related methods |
CN111414732A (en) * | 2019-01-07 | 2020-07-14 | 北京嘀嘀无限科技发展有限公司 | Text style conversion method and device, electronic equipment and storage medium |
CN110211564A (en) * | 2019-05-29 | 2019-09-06 | 泰康保险集团股份有限公司 | Phoneme synthesizing method and device, electronic equipment and computer-readable medium |
CN112927675A (en) * | 2019-11-20 | 2021-06-08 | 阿里巴巴集团控股有限公司 | Dictionary generation method, device and system for voice synthesis, and voice synthesis method, device and system |
CN111145719A (en) * | 2019-12-31 | 2020-05-12 | 北京太极华保科技股份有限公司 | Data labeling method and device for Chinese-English mixing and tone labeling |
CN111160044A (en) * | 2019-12-31 | 2020-05-15 | 出门问问信息科技有限公司 | Text-to-speech conversion method and device, terminal and computer readable storage medium |
CN111145719B (en) * | 2019-12-31 | 2022-04-05 | 北京太极华保科技股份有限公司 | Data labeling method and device for Chinese-English mixing and tone labeling |
CN111402859A (en) * | 2020-03-02 | 2020-07-10 | 问问智能信息科技有限公司 | Voice dictionary generation method and device and computer readable storage medium |
CN111402859B (en) * | 2020-03-02 | 2023-10-27 | 问问智能信息科技有限公司 | Speech dictionary generating method, equipment and computer readable storage medium |
CN111768755A (en) * | 2020-06-24 | 2020-10-13 | 华人运通(上海)云计算科技有限公司 | Information processing method, information processing apparatus, vehicle, and computer storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109147760A (en) | Synthesize method, apparatus, system and the equipment of voice | |
US11074904B2 (en) | Speech synthesis method and apparatus based on emotion information | |
US7596499B2 (en) | Multilingual text-to-speech system with limited resources | |
JP5598998B2 (en) | Speech translation system, first terminal device, speech recognition server device, translation server device, and speech synthesis server device | |
US7386449B2 (en) | Knowledge-based flexible natural speech dialogue system | |
CN101030368B (en) | Method and system for communicating across channels simultaneously with emotion preservation | |
KR102321789B1 (en) | Speech synthesis method based on emotion information and apparatus therefor | |
CN111402894B (en) | Speech recognition method and electronic equipment | |
CN105190614A (en) | Search results using intonation nuances | |
CN114757176B (en) | Method for acquiring target intention recognition model and intention recognition method | |
CN112154465A (en) | Method, device and equipment for learning intention recognition model | |
CN108682420A (en) | A kind of voice and video telephone accent recognition method and terminal device | |
JP6625772B2 (en) | Search method and electronic device using the same | |
CN110544470A (en) | voice recognition method and device, readable storage medium and electronic equipment | |
JP2021096847A (en) | Recommending multimedia based on user utterance | |
CN104347081B (en) | A kind of method and apparatus of test scene saying coverage | |
CN109949814A (en) | Audio recognition method, system, computer system and computer readable storage medium | |
US11551012B2 (en) | Apparatus and method for providing personal assistant service based on automatic translation | |
CN110287498A (en) | Stratification interpretation method, device and storage medium | |
CN110534115A (en) | Recognition methods, device, system and the storage medium of multi-party speech mixing voice | |
CN115762471A (en) | Voice synthesis method, device, equipment and storage medium | |
CN114860910A (en) | Intelligent dialogue method and system | |
KR20220140301A (en) | Video learning systems for enable learners to be identified through artificial intelligence and method thereof | |
KR20210117827A (en) | Voice service supply system and supply method using artificial intelligence | |
KR102376552B1 (en) | Voice synthetic apparatus and voice synthetic method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190104 |