CN109448694A - Method and device for rapidly synthesizing TTS voice
- Publication number
- CN109448694A CN201811611687.7A
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L13/00—Speech synthesis; Text to speech systems
- G10L13/02—Methods for producing synthetic speech; Speech synthesisers
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Telephonic Communication Services (AREA)
Abstract
The present invention discloses a method for rapidly synthesizing TTS voice, comprising the following steps: obtaining response text information; determining a convergence strategy according to the response text information; and generating TTS voice according to the determined convergence strategy. The invention also discloses a device for rapidly synthesizing TTS voice. The disclosed method and device can reduce the voice interaction time between an intelligent voice device and a user, thereby improving the voice interaction function, and can provide the user with a complete voice interaction experience even on devices with a low hardware configuration.
Description
Technical field
The present invention relates to the technical field of voice interaction, and in particular to a method and device for rapidly synthesizing TTS voice.
Background
With the continuous development of voice interaction technology, voice interaction is used more and more widely. In the prior art, voice interaction works as follows: the user speaks a voice instruction; the device recognizes the instruction and performs semantic understanding on it; based on the semantics, the device outputs the text information with which to respond to the instruction; and the text information is converted into TTS voice and played. This realizes voice interaction between the intelligent voice device and the user, so that questions are answered and human-machine communication is fluent.
In this voice interaction scenario, however, TTS synthesis speed is an important factor affecting user experience. In the prior art in particular, the hardware that carries voice technology varies widely in configuration, so the voice interaction function must adapt to device models with both high and low configurations. On low-configuration models, TTS synthesis is often slow during voice interaction, which degrades the user's voice interaction experience.
Summary of the invention
To solve the above problem, it is contemplated to start from the TTS synthesis process and perform TTS synthesis by means of a convergence strategy, so as to improve the response speed of the voice.
According to a first aspect of the present invention, a method for rapidly synthesizing TTS voice is provided, comprising the following steps:
obtaining response text information;
determining a convergence strategy according to the response text information;
generating TTS voice according to the determined convergence strategy.
According to a second aspect of the present invention, a device for rapidly synthesizing TTS voice is provided, comprising:
a response information acquisition module, configured to obtain response text information;
a strategy determination module, configured to determine a convergence strategy according to the response text information;
a voice output module, configured to generate TTS voice according to the determined convergence strategy.
According to a third aspect of the present invention, an electronic device is provided, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the steps of the above method.
According to a fourth aspect of the present invention, a storage medium is provided, on which a computer program is stored, and the program implements the steps of the above method when executed by a processor.
The device and method provided by the present invention perform TTS synthesis by means of a convergence strategy, and the convergence strategy is determined from the response text information. Speech synthesis can therefore be handled flexibly according to the response information, which reduces the voice interaction time between the intelligent voice device and the user and thereby improves the voice interaction function. Moreover, the device and method provided by the present invention can provide the user with a complete voice interaction experience even on devices with a low hardware configuration.
Brief description of the drawings
Fig. 1 is a flowchart of a method for rapidly synthesizing TTS voice according to an embodiment of the present invention;
Fig. 2 is a schematic block diagram of a device for rapidly synthesizing TTS voice according to an embodiment of the present invention;
Fig. 3 is a schematic block diagram of a device for rapidly synthesizing TTS voice according to another embodiment of the present invention;
Fig. 4 is a block diagram of an electronic device according to an embodiment of the present invention.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are evidently some, but not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
It should be noted that, where no conflict arises, the embodiments of the present application and the features in the embodiments may be combined with one another.
The present invention may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, elements, data structures and the like that perform particular tasks or implement particular abstract data types. The present invention may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in local and remote computer storage media, including storage devices.
In the present invention, "module", "device", "system" and the like refer to computer-related entities, such as hardware, a combination of hardware and software, software, or software in execution. In detail, an element may be, for example but not limited to, a process running on a processor, a processor, an object, an executable element, a thread of execution, a program, and/or a computer. An application program or script running on a server, or the server itself, may also be an element. One or more elements may reside within a process and/or thread of execution, and an element may be localized on one computer and/or distributed between two or more computers, and may be run from various computer-readable media. Elements may also communicate through local and/or remote processes according to a signal having one or more data packets, for example a signal carrying data from one element that interacts with another element in a local system, in a distributed system, and/or across a network such as the Internet with other systems.
Finally, it should also be noted that, herein, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "include" and "comprise" cover not only the listed elements but also other elements not explicitly listed, as well as elements inherent to the process, method, article or device in question. Without further limitation, an element defined by the phrase "including ..." does not exclude the presence of other identical elements in the process, method, article or device that includes the element.
The method for rapidly synthesizing TTS voice of the embodiments of the present invention can be applied to any terminal device configured with a voice function, for example a smartphone, a tablet computer, or a smart home device; the present invention is not limited in this respect. Users thereby obtain more timely and accurate responses while using these terminal devices, which improves the user experience.
The invention is described in further detail below with reference to the accompanying drawings.
Fig. 1 schematically shows a flowchart of a method for rapidly synthesizing TTS voice according to an embodiment of the present invention. As shown in Fig. 1, this embodiment includes the following steps:
Step S101: obtain response text information. The response text information is the text information with which a response is to be made; illustratively, it may be the response text content output according to the semantics during speech recognition. The way it is obtained depends on the application scenario and can be implemented with reference to the prior art. For example, during voice interaction, preconfigured response text information may be fetched from a database according to the speech recognition result; alternatively, a calling interface for response text information may be provided, and the response text information passed in is received directly from that interface.
Step S102: determine a convergence strategy according to the response text information, wherein the convergence strategy is one of a high-frequency strategy, a local synthesis strategy and a cloud synthesis strategy.
Illustratively, a high-frequency speech library is configured first. The high-frequency speech library contains frequently used corpus entries and their corresponding voice. When the library is configured, the frequently used corpus entries and their corresponding voice can be determined from experience. For example, on an in-vehicle device, voice instructions for listening to songs are common, so the voice reply "will play music" can be set as high-frequency speech and configured in the high-frequency speech library: the corpus entry is configured as "will play music", and the audio file is the spoken audio of that entry. During configuration, the voice is stored in the form of an audio file, and a mapping list that maps each corpus entry one-to-one to an audio file name or ID is generated.
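The configuration described above can be sketched as a small mapping structure. This is only an illustrative sketch: the class name `HighFrequencyLibrary`, the directory `hf_audio` and the file name `will_play_music.pcm` are hypothetical and do not come from the patent, which only requires a one-to-one mapping between corpus entries and audio file names or IDs.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, Optional


@dataclass
class HighFrequencyLibrary:
    """Maps each corpus entry to the audio file holding its pre-rendered voice."""

    audio_dir: Path
    # Mapping list: corpus entry -> audio file name (one-to-one).
    mapping: Dict[str, str] = field(default_factory=dict)

    def add(self, corpus: str, audio_file_name: str) -> None:
        """Register a corpus entry together with its audio file name."""
        self.mapping[corpus] = audio_file_name

    def lookup(self, response_text: str) -> Optional[Path]:
        """Return the audio path for a matching corpus entry, or None."""
        name = self.mapping.get(response_text)
        return self.audio_dir / name if name is not None else None


# Example configuration for the in-vehicle case (names are hypothetical).
library = HighFrequencyLibrary(audio_dir=Path("hf_audio"))
library.add("will play music", "will_play_music.pcm")
```

A dictionary keeps the lookup cost independent of library size, which suits the low-configuration devices the description targets; the patent itself does not prescribe a data structure.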
After the response text information is obtained, it is matched against the corpus entries in the high-frequency speech library. When the match succeeds, the corresponding audio content is already stored in the high-frequency speech library, that is, the response text information is text used at high frequency, and the convergence strategy is determined to be the high-frequency strategy.
When the match fails, the current response text information is not yet stored in the high-frequency speech library. The network state is then obtained and checked, and the convergence strategy is determined to be the local synthesis strategy or the cloud synthesis strategy according to the network state. Specifically, if the network is not connected, the local synthesis strategy is chosen; if the network is connected, the cloud synthesis strategy is chosen.
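The decision logic of step S102 can be sketched as follows. The names `Strategy` and `determine_strategy` are hypothetical, and the sketch assumes that "matching" means exact string equality with a corpus entry; the patent does not specify the matching rule.

```python
from enum import Enum
from typing import Collection


class Strategy(Enum):
    HIGH_FREQUENCY = "high_frequency"
    LOCAL = "local"
    CLOUD = "cloud"


def determine_strategy(
    response_text: str,
    corpora: Collection[str],
    network_connected: bool,
) -> Strategy:
    """Pick a convergence strategy for the given response text."""
    # Match succeeded: the voice is already stored in the high-frequency library.
    if response_text in corpora:
        return Strategy.HIGH_FREQUENCY
    # Match failed: fall back on the network state.
    return Strategy.CLOUD if network_connected else Strategy.LOCAL
```

For example, `determine_strategy("will play music", {"will play music"}, False)` selects the high-frequency strategy regardless of the network state, because the library check comes first.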
Step S103: generate TTS voice according to the determined convergence strategy. After the convergence strategy has been determined in step S102, the corresponding TTS voice is generated according to that strategy, so as to achieve a quick response. Specifically, there are the following cases:
In the first case, when the convergence strategy is determined to be the high-frequency strategy, generating TTS voice according to the determined convergence strategy is implemented as follows: the voice corresponding to the stored corpus entries is looked up in the high-frequency speech library, the voice corresponding to the corpus entry that matches the current response text information is obtained (that is, the audio file corresponding to the matched corpus entry is obtained), and the obtained voice is output directly as the TTS voice (that is, the corresponding audio file is played). This achieves a near-instant response.
In the second case, when the convergence strategy is determined to be the local synthesis strategy, generating TTS voice according to the determined convergence strategy is implemented as follows: the response text information is synthesized into TTS voice by a local synthesis engine. The local synthesis engine is prior art; it converts text to TTS voice automatically on the device, and its processing speed is very fast.
In the third case, when the convergence strategy is determined to be the cloud synthesis strategy, generating TTS voice according to the determined convergence strategy is implemented as follows: the response text information is output to a cloud synthesis engine, and the voice information returned by the cloud synthesis engine is obtained. The cloud synthesis engine can be implemented with reference to the prior art. To guarantee response speed and limit data traffic, the voice information returned by the cloud synthesis engine is in compressed form, so after the voice information is returned from the cloud it must be decoded according to the agreed protocol into TTS voice in a format the device can play. Illustratively, the cloud synthesis engine synthesizes the voice information in MP3 format, and the returned MP3-format voice information is decoded into TTS voice in PCM format.
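The three cases above can be sketched as a single dispatch function. This is a minimal sketch under stated assumptions: `generate_tts` and its parameters are hypothetical names, the engines and the decoder are passed in as plain callables because the patent treats them as prior art, and audio is represented as raw `bytes`.

```python
from typing import Callable, Mapping


def generate_tts(
    strategy: str,
    response_text: str,
    *,
    hf_audio: Mapping[str, bytes],
    local_synthesize: Callable[[str], bytes],
    cloud_synthesize: Callable[[str], bytes],
    decode: Callable[[bytes], bytes],
) -> bytes:
    """Generate playable TTS audio according to the determined strategy."""
    if strategy == "high_frequency":
        # Case 1: reuse the stored voice of the matched corpus entry.
        return hf_audio[response_text]
    if strategy == "local":
        # Case 2: synthesize on the device with the local engine.
        return local_synthesize(response_text)
    if strategy == "cloud":
        # Case 3: the cloud returns compressed audio (e.g. MP3),
        # which is decoded into a playable format (e.g. PCM).
        return decode(cloud_synthesize(response_text))
    raise ValueError(f"unknown strategy: {strategy!r}")
```

Injecting the engines as callables keeps the sketch independent of any particular TTS or audio-decoding library; a real implementation would bind them to whichever engines the device ships with.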
In a preferred embodiment, after the TTS voice is generated according to the determined convergence strategy, it is also judged whether the currently generated TTS voice is high-frequency speech. One way to judge is to count how many times the response text information has been used and decide from the count whether it is high-frequency; for example, when the number of uses reaches 10 or more, it is judged to be high-frequency speech. When it is determined to be high-frequency speech, the current TTS voice and its corresponding response text information are stored in the high-frequency speech library: the response text information is used as the corpus entry, the audio file name of the TTS voice is bound to the corpus entry and stored, and the audio file is saved under the corresponding storage path for use in subsequent voice interaction. In this way the high-frequency speech library is continuously expanded, which makes processing more efficient.
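The counting rule in this preferred embodiment can be sketched as below. The class name `HighFrequencyPromoter` is hypothetical; the threshold of 10 follows the example in the text, and the sketch only records the corpus-to-file-name binding, leaving the actual saving of the audio file to the caller.

```python
from collections import Counter
from typing import Dict

# Example threshold from the description: "10 times or more".
HIGH_FREQUENCY_THRESHOLD = 10


class HighFrequencyPromoter:
    """Counts uses of each response text and promotes frequent ones."""

    def __init__(self, threshold: int = HIGH_FREQUENCY_THRESHOLD) -> None:
        self.threshold = threshold
        self.usage: Counter = Counter()
        # Corpus entry -> audio file name, as in the high-frequency library.
        self.library: Dict[str, str] = {}

    def record(self, response_text: str, audio_file_name: str) -> bool:
        """Count one use; return True if the text was promoted on this call."""
        self.usage[response_text] += 1
        already_stored = response_text in self.library
        if already_stored or self.usage[response_text] < self.threshold:
            return False
        # Bind the audio file name to the corpus entry for later reuse.
        self.library[response_text] = audio_file_name
        return True
```

Once an entry is promoted, later requests for the same text would match the library and take the high-frequency strategy instead of being synthesized again.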
The method of this embodiment can reduce the voice interaction time between the intelligent voice device and the user, thereby improving the voice interaction function, and can provide the user with a complete voice interaction experience even on devices with a low hardware configuration.
Fig. 2 schematically shows a block diagram of a device for rapidly synthesizing TTS voice according to an embodiment of the present invention. As shown in Fig. 2, the device for rapidly synthesizing TTS voice includes: a response information acquisition module 2, a strategy determination module 3, a voice output module 4, a high-frequency speech library 6, a local synthesis engine 5 and a speech processing module 7.
The speech processing module 7 is configured to receive the user's voice instruction, recognize and parse it, generate response text information according to the recognition and parsing result, and output it to the response information acquisition module 2. The speech recognition can be implemented with reference to the prior art.
The response information acquisition module 2 is configured to receive the response text information, and can obtain the current response text information in response to a trigger instruction from the speech processing module 7.
The strategy determination module 3 is configured to determine a convergence strategy according to the response text information; the convergence strategy is one of a high-frequency strategy, a local synthesis strategy and a cloud synthesis strategy. The way the convergence strategy is determined can be implemented as in the method part above and is not repeated here.
The voice output module 4 is configured to generate TTS voice according to the determined convergence strategy.
The high-frequency speech library 6 is configured to store high-frequency speech and its corresponding corpus entries.
The local synthesis engine 5 is configured to synthesize TTS voice from the input text information.
The voice output module 4 includes a high-frequency synthesis unit 401, a local synthesis unit 402 and a cloud synthesis unit 403. The high-frequency synthesis unit 401 is configured to obtain the corresponding voice from the high-frequency speech library 6 according to the response text information and output it as the TTS voice. The local synthesis unit 402 is configured to call the local synthesis engine 5 to synthesize the response text information into TTS voice for output. The cloud synthesis unit 403 is configured to output the response text information to the cloud synthesis engine, receive the voice information returned by the cloud synthesis engine, and decode it into TTS voice for output. The voice output module 4 can thus perform TTS speech synthesis in all three cases, which improves the voice response speed.
When the device is used, the user only needs to speak a voice instruction to an audio acquisition component with a pickup function, such as a microphone, on the equipment that carries the device. The speech processing module 7 processes the voice instruction and finally obtains the response text information, which is output to the response information acquisition module 2. The response information acquisition module 2 passes the text information to the strategy determination module 3 to determine the strategy. Once determined, the strategy is output to the voice output module 4, which applies the speech synthesis mode corresponding to that strategy to obtain the final TTS voice output.
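The data flow between modules 3 and 4 described above can be sketched as two small classes wired together. All class, method and parameter names here are hypothetical, the synthesis units are stand-in callables, and matching is again assumed to be exact string equality; the sketch only illustrates how the determined strategy selects a synthesis unit.

```python
from typing import Callable, Collection, Dict


class StrategyDeterminationModule:
    """Sketch of module 3: decides the strategy from text and network state."""

    def __init__(self, corpora: Collection[str], is_connected: Callable[[], bool]) -> None:
        self.corpora = corpora
        self.is_connected = is_connected

    def determine(self, response_text: str) -> str:
        if response_text in self.corpora:
            return "high_frequency"
        return "cloud" if self.is_connected() else "local"


class VoiceOutputModule:
    """Sketch of module 4: holds one synthesis unit per strategy (401 to 403)."""

    def __init__(self, units: Dict[str, Callable[[str], bytes]]) -> None:
        self.units = units

    def output(self, strategy: str, response_text: str) -> bytes:
        return self.units[strategy](response_text)


def handle_response(
    response_text: str,
    strategy_module: StrategyDeterminationModule,
    output_module: VoiceOutputModule,
) -> bytes:
    """Pass response text through module 3 and then module 4."""
    strategy = strategy_module.determine(response_text)
    return output_module.output(strategy, response_text)
```

Keeping the strategy decision separate from the synthesis units mirrors the block diagram of Fig. 2 and lets either side be replaced without touching the other.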
The device of this embodiment can reduce the voice interaction time between the intelligent voice device and the user, thereby improving the voice interaction function, and can provide the user with a complete voice interaction experience even on devices with a low hardware configuration.
Fig. 3 schematically shows a block diagram of a device for rapidly synthesizing TTS voice according to another embodiment of the present invention. As shown in Fig. 3, the device further includes a high-frequency speech adding module 8. This module is configured to judge whether the TTS voice output by the local synthesis unit 402 or the cloud synthesis unit 403, together with its corresponding response text information, is high-frequency speech; the way of judging can follow the method part above. When it is determined to be high-frequency speech, the TTS voice and its corresponding response text information are added to the high-frequency speech library 6.
With the device of this embodiment, the high-frequency speech adding module 8 can evaluate each response result and extend the high-frequency speech library accordingly. Resources are thus continuously added to the high-frequency speech library, which greatly improves the speed of voice response during processing.
It should be noted that, in other embodiments, the device for rapidly synthesizing TTS voice may omit the speech processing module and instead expose the response information acquisition module directly as an external calling interface, so that response text information is received directly through the response information acquisition module for processing. The embodiments of the present invention do not limit how the modules of the device are combined; the above is only one specific implementation. In a specific application, those skilled in the art can combine the above module features in any way as required to suit the intended use.
In some embodiments, an embodiment of the present invention provides a non-volatile computer-readable storage medium storing one or more programs that include execution instructions. The execution instructions can be read and executed by an electronic device (including but not limited to a computer, a server or a network device) to perform any of the above methods for rapidly synthesizing TTS voice of the present invention.
In some embodiments, an embodiment of the present invention also provides a computer program product. The computer program product includes a computer program stored on a non-volatile computer-readable storage medium, and the computer program includes program instructions which, when executed by a computer, cause the computer to perform any of the above methods for rapidly synthesizing TTS voice.
In some embodiments, an embodiment of the present invention also provides an electronic device, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method for rapidly synthesizing TTS voice.
In some embodiments, an embodiment of the present invention also provides a storage medium on which a computer program is stored, wherein the program implements the method for rapidly synthesizing TTS voice when executed by a processor.
The device for rapidly synthesizing TTS voice of the above embodiments can be used to perform the method for rapidly synthesizing TTS voice of the embodiments of the present invention, and accordingly achieves the technical effects achieved by that method, which are not repeated here. In the embodiments of the present invention, the relevant functional modules may be implemented by a hardware processor.
Fig. 4 is a schematic diagram of the hardware structure of an electronic device that performs the method for rapidly synthesizing TTS voice according to an embodiment of the present invention. As shown in Fig. 4, the device includes:
one or more processors 410 and a memory 420; one processor 410 is taken as an example in Fig. 4.
The device that performs the method for rapidly synthesizing TTS voice may further include an input device 430 and an output device 440.
The processor 410, the memory 420, the input device 430 and the output device 440 may be connected by a bus or in other ways; connection by a bus is taken as an example in Fig. 4.
The memory 420, as a non-volatile computer-readable storage medium, can be used to store non-volatile software programs, non-volatile computer-executable programs and modules, such as the program instructions/modules corresponding to the method for rapidly synthesizing TTS voice in the embodiments of the present application. By running the non-volatile software programs, instructions and modules stored in the memory 420, the processor 410 executes the various functional applications and data processing of the server, that is, implements the method for rapidly synthesizing TTS voice of the above method embodiments.
The memory 420 may include a program storage area and a data storage area. The program storage area can store an operating system and the application programs required by at least one function; the data storage area can store data created through use of the device for rapidly synthesizing TTS voice, and the like. In addition, the memory 420 may include high-speed random access memory and may also include non-volatile memory, for example at least one magnetic disk storage device, flash memory device or other non-volatile solid-state storage device. In some embodiments, the memory 420 optionally includes memories located remotely from the processor 410, and these remote memories can be connected to the device for rapidly synthesizing TTS voice through a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks and combinations thereof.
The input device 430 can receive input numeric or character information and generate signals related to the user settings and function control of the device for rapidly synthesizing TTS voice. The output device 440 may include a display device such as a display screen.
The one or more modules are stored in the memory 420 and, when executed by the one or more processors 410, perform the method for rapidly synthesizing TTS voice in any of the above method embodiments.
The above product can perform the method provided by the embodiments of the present application, and has the functional modules and beneficial effects corresponding to performing the method. For technical details not described in detail in this embodiment, refer to the method provided by the embodiments of the present application.
The electronic device of the embodiments of the present application exists in a variety of forms, including but not limited to:
(1) Mobile communication devices: these devices are characterized by mobile communication capability, and their main purpose is to provide voice and data communication. Such terminals include smartphones (such as the iPhone), multimedia phones, feature phones and low-end phones.
(2) Ultra-mobile personal computer devices: these devices belong to the category of personal computers, have computing and processing capabilities, and generally also support mobile Internet access. Such terminals include PDA, MID and UMPC devices, such as the iPad.
(3) In-vehicle devices: these devices are used in vehicle driving and can connect to other auxiliary systems of the automobile.
(4) Servers: devices that provide computing services. A server comprises a processor, hard disk, memory, system bus and so on. Its architecture is similar to that of a general-purpose computer, but because it must provide highly reliable services, the requirements on processing capacity, stability, reliability, security, scalability and manageability are higher.
(5) Other electronic devices with data interaction functions.
The device embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
From the above description of the embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a general-purpose hardware platform, and of course also by hardware. Based on this understanding, the above technical solutions, in essence or in the part that contributes over the related art, can be embodied in the form of a software product. The computer software product can be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk or an optical disc, and includes instructions that cause a computer device (which may be a personal computer, a server, a network device or the like) to perform the methods described in the embodiments or in certain parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions described in the foregoing embodiments or replace some of the technical features with equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.
Claims (10)
1. the method for rapid synthesis TTS voice, which comprises the steps of:
Obtain response text information;
Convergence strategy is determined according to response text information;
TTS voice is generated according to determining convergence strategy.
2. The method according to claim 1, wherein the convergence strategy includes a high-frequency strategy, a local synthesis strategy, and a cloud synthesis strategy, and the method further comprises:
configuring a high-frequency speech library, the high-frequency speech library including corpus entries and their corresponding voices;
wherein determining the convergence strategy according to the response text information comprises:
matching the response text information against the corpus, and determining the convergence strategy to be the high-frequency strategy when the match succeeds; and
when the match fails, obtaining and evaluating the network state, and determining the convergence strategy to be the local synthesis strategy or the cloud synthesis strategy according to the network state.
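The strategy selection recited in claim 2 can be sketched roughly as follows. This is a minimal illustration, not the applicant's implementation: the names `Strategy` and `determine_strategy` are hypothetical, matching is assumed to be an exact string lookup, and the mapping "network available means cloud, otherwise local" is an assumption, since the claim only says the choice depends on the network state.

```python
from enum import Enum
from typing import Mapping


class Strategy(Enum):
    """The three strategies named in claim 2 (identifiers are illustrative)."""
    HIGH_FREQUENCY = "high_frequency"
    LOCAL = "local"
    CLOUD = "cloud"


def determine_strategy(response_text: str,
                       high_freq_library: Mapping[str, bytes],
                       network_available: bool) -> Strategy:
    """Pick a strategy for one piece of response text.

    Assumption: a "match" is an exact lookup of the text in the library,
    and a usable network selects the cloud strategy.
    """
    # Step 1: try to match the response text against the stored corpus.
    if response_text in high_freq_library:
        return Strategy.HIGH_FREQUENCY
    # Step 2: on a failed match, fall back according to the network state.
    return Strategy.CLOUD if network_available else Strategy.LOCAL
```

A real system would probably normalize the text before matching and might grade the network state (latency, bandwidth) instead of using a single boolean.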
3. The method according to claim 2, wherein, when the convergence strategy is determined to be the high-frequency strategy, generating TTS voice according to the determined convergence strategy comprises:
obtaining the voice corresponding to the corpus entry matched by the current response text information, and outputting the obtained voice as the TTS voice;
when the convergence strategy is determined to be the local synthesis strategy, generating TTS voice according to the determined convergence strategy comprises:
synthesizing the response text information into TTS voice by a local synthesis engine; and
when the convergence strategy is determined to be the cloud synthesis strategy, generating TTS voice according to the determined convergence strategy comprises:
outputting the response text information to a cloud synthesis engine and obtaining the voice information returned by the cloud synthesis engine; and
decoding the returned voice information to generate the TTS voice.
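The three generation paths of claim 3 could be dispatched along the following lines. This is a sketch under stated assumptions: `generate_tts` is a hypothetical name, the two engines are passed in as plain callables rather than any real TTS SDK, strategies are represented as plain strings, and Base64 stands in for whatever encoding the cloud service actually uses, which the claim does not specify.

```python
import base64
from typing import Callable, Mapping


def generate_tts(strategy: str,
                 response_text: str,
                 high_freq_library: Mapping[str, bytes],
                 local_engine: Callable[[str], bytes],
                 cloud_engine: Callable[[str], bytes]) -> bytes:
    """Return audio bytes for response_text using the chosen strategy."""
    if strategy == "high_frequency":
        # Reuse the stored voice for the matched corpus entry; no synthesis.
        return high_freq_library[response_text]
    if strategy == "local":
        # Synthesize on the device with the local engine.
        return local_engine(response_text)
    if strategy == "cloud":
        # Send the text to the cloud engine, then decode what comes back.
        # Assumption: the cloud reply is Base64-encoded audio.
        encoded = cloud_engine(response_text)
        return base64.b64decode(encoded)
    raise ValueError(f"unknown strategy: {strategy!r}")
```

Passing the engines in as callables keeps the dispatch testable with stand-in functions and leaves the choice of actual engines open.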
4. The method according to claim 2 or 3, characterized in that, after generating the TTS voice according to the local synthesis strategy or the cloud synthesis strategy, the method further comprises:
determining whether the currently generated TTS voice is a high-frequency voice and, when it is determined to be a high-frequency voice, storing the current TTS voice and its corresponding response text information in the high-frequency speech library.
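Claim 4 does not say how a voice is judged to be high-frequency. One plausible reading, shown here purely as an assumption, is a usage counter with a threshold: once the same response text has been synthesized a given number of times, its audio is stored in the library so later requests can skip synthesis. The class name `HighFrequencyLibrary`, the method names, and the default threshold of 3 are all hypothetical.

```python
from collections import Counter
from typing import Dict, Optional


class HighFrequencyLibrary:
    """Stores text-to-audio pairs once a text has been seen often enough."""

    def __init__(self, threshold: int = 3) -> None:
        self.threshold = threshold            # assumed criterion, not from the claim
        self.voices: Dict[str, bytes] = {}    # response text -> stored audio
        self._counts: Counter = Counter()     # how often each text was synthesized

    def lookup(self, response_text: str) -> Optional[bytes]:
        """Return the stored voice for this text, or None if there is none."""
        return self.voices.get(response_text)

    def observe(self, response_text: str, voice: bytes) -> bool:
        """Record one synthesized response; return True if it was just stored."""
        self._counts[response_text] += 1
        if (self._counts[response_text] >= self.threshold
                and response_text not in self.voices):
            self.voices[response_text] = voice
            return True
        return False
```

Other criteria (a sliding time window, or a curated list of common replies) would fit the claim language equally well.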
5. A device for rapidly synthesizing TTS voice, characterized by comprising:
a response information obtaining module, configured to obtain response text information;
a strategy determining module, configured to determine a convergence strategy according to the response text information; and
a voice output module, configured to generate TTS voice according to the determined convergence strategy.
6. The device according to claim 5, characterized in that the convergence strategy includes a high-frequency strategy, a local synthesis strategy, and a cloud synthesis strategy, and the device further comprises:
a high-frequency speech library, configured to store high-frequency voices and their corresponding corpus entries; and
a local synthesis engine, configured to synthesize TTS voice from input text information;
wherein the voice output module comprises:
a high-frequency synthesis unit, configured to obtain the corresponding voice from the high-frequency speech library according to the response text information and output it as the TTS voice;
a local synthesis unit, configured to call the local synthesis engine to synthesize the response text information into TTS voice for output; and
a cloud synthesis unit, configured to send the response text information to a cloud synthesis engine, receive the voice information returned by the cloud synthesis engine, and decode it into TTS voice for output.
7. The device according to claim 6, characterized by further comprising:
a high-frequency speech adding module, configured to evaluate the TTS voice output by the local synthesis unit and the cloud synthesis unit together with its corresponding response text information and, when the voice is determined to be a high-frequency voice, add the TTS voice and its corresponding response text information to the high-frequency speech library.
8. The device according to any one of claims 5 to 7, characterized by further comprising:
a speech processing module, configured to receive a user voice instruction, recognize and parse it, generate response text information according to the recognition and parsing result, and output the response text information to the response information obtaining module.
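The module structure of claims 5 to 8 can be pictured as a small pipeline. The sketch below only illustrates the wiring: `TtsDevice` and its field names are hypothetical, each module is represented by an injected callable, and the user's voice instruction is modeled as raw bytes.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TtsDevice:
    """Wires the modules together; each field stands in for one module."""
    process_speech: Callable[[bytes], str]       # speech processing module
    determine_strategy: Callable[[str], str]     # strategy determining module
    output_voice: Callable[[str, str], bytes]    # voice output module

    def handle(self, user_audio: bytes) -> bytes:
        # Recognize and parse the instruction, yielding response text.
        response_text = self.process_speech(user_audio)
        # Choose a strategy for that text.
        strategy = self.determine_strategy(response_text)
        # Produce the TTS voice with the chosen strategy.
        return self.output_voice(strategy, response_text)
```

For example, a device built with stand-in callables can be exercised without any real recognizer or synthesizer, which is convenient for testing the control flow in isolation.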
9. An electronic device, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the steps of the method according to any one of claims 1-4.
10. A storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the method according to any one of claims 1-4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811611687.7A CN109448694A (en) | 2018-12-27 | 2018-12-27 | A kind of method and device of rapid synthesis TTS voice |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811611687.7A CN109448694A (en) | 2018-12-27 | 2018-12-27 | A kind of method and device of rapid synthesis TTS voice |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109448694A (en) | 2019-03-08 |
Family
ID=65538423
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811611687.7A Pending CN109448694A (en) | 2018-12-27 | 2018-12-27 | A kind of method and device of rapid synthesis TTS voice |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109448694A (en) |
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1889170A (en) * | 2005-06-28 | 2007-01-03 | 国际商业机器公司 | Method and system for generating synthesized speech base on recorded speech template |
CN103366732A (en) * | 2012-04-06 | 2013-10-23 | 上海博泰悦臻电子设备制造有限公司 | Voice broadcast method and device and vehicle-mounted system |
CN104992704A (en) * | 2015-07-15 | 2015-10-21 | 百度在线网络技术(北京)有限公司 | Speech synthesizing method and device |
US20170255616A1 (en) * | 2016-03-03 | 2017-09-07 | Electronics And Telecommunications Research Institute | Automatic interpretation system and method for generating synthetic sound having characteristics similar to those of original speaker's voice |
CN107995249A (en) * | 2016-10-27 | 2018-05-04 | 中兴通讯股份有限公司 | A kind of method and apparatus of voice broadcast |
CN108877765A (en) * | 2018-05-31 | 2018-11-23 | 百度在线网络技术(北京)有限公司 | Processing method and processing device, computer equipment and the readable medium of voice joint synthesis |
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110600000A (en) * | 2019-09-29 | 2019-12-20 | 百度在线网络技术(北京)有限公司 | Voice broadcasting method and device, electronic equipment and storage medium |
CN110600000B (en) * | 2019-09-29 | 2022-04-15 | 阿波罗智联(北京)科技有限公司 | Voice broadcasting method and device, electronic equipment and storage medium |
CN110600003A (en) * | 2019-10-18 | 2019-12-20 | 北京云迹科技有限公司 | Robot voice output method and device, robot and storage medium |
CN110782869A (en) * | 2019-10-30 | 2020-02-11 | 标贝(北京)科技有限公司 | Speech synthesis method, apparatus, system and storage medium |
CN111161725A (en) * | 2019-12-17 | 2020-05-15 | 珠海格力电器股份有限公司 | Voice interaction method and device, computing equipment and storage medium |
CN111161725B (en) * | 2019-12-17 | 2022-09-27 | 珠海格力电器股份有限公司 | Voice interaction method and device, computing equipment and storage medium |
CN111354334A (en) * | 2020-03-17 | 2020-06-30 | 北京百度网讯科技有限公司 | Voice output method, device, equipment and medium |
EP3882909A1 (en) * | 2020-03-17 | 2021-09-22 | Beijing Baidu Netcom Science And Technology Co., Ltd. | Speech output method and apparatus, device and medium |
CN111354334B (en) * | 2020-03-17 | 2023-09-15 | 阿波罗智联(北京)科技有限公司 | Voice output method, device, equipment and medium |
CN112099628A (en) * | 2020-09-08 | 2020-12-18 | 平安科技(深圳)有限公司 | VR interaction method and device based on artificial intelligence, computer equipment and medium |
CN112863479A (en) * | 2021-01-05 | 2021-05-28 | 杭州海康威视数字技术股份有限公司 | TTS voice processing method, device, equipment and system |
CN113421542A (en) * | 2021-06-22 | 2021-09-21 | 广州小鹏汽车科技有限公司 | Voice interaction method, server, voice interaction system and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109448694A (en) | A kind of method and device of rapid synthesis TTS voice | |
KR102660922B1 (en) | Management layer for multiple intelligent personal assistant services | |
CN110442701B (en) | Voice conversation processing method and device | |
CN109637548A (en) | Voice interactive method and device based on Application on Voiceprint Recognition | |
CN111049996B (en) | Multi-scene voice recognition method and device and intelligent customer service system applying same | |
CN113127609B (en) | Voice control method, device, server, terminal equipment and storage medium | |
CN110288997A (en) | Equipment awakening method and system for acoustics networking | |
CN111261151B (en) | Voice processing method and device, electronic equipment and storage medium | |
CN107004411A (en) | Voice Applications framework | |
JP7342286B2 (en) | Voice function jump method, electronic equipment and storage medium for human-machine interaction | |
CN110874200B (en) | Interactive method, device, storage medium and operating system | |
CN109473104A (en) | Speech recognition network delay optimization method and device | |
US11595774B2 (en) | Spatializing audio data based on analysis of incoming audio data | |
WO2021129240A1 (en) | Skill scheduling method and apparatus for voice conversation platform | |
CN108877804A (en) | Voice service method, system, electronic equipment and storage medium | |
CN109669754A (en) | The dynamic display method of interactive voice window, voice interactive method and device with telescopic interactive window | |
CN109462546A (en) | A kind of voice dialogue history message recording method, apparatus and system | |
CN111966441A (en) | Information processing method and device based on virtual resources, electronic equipment and medium | |
US20230133146A1 (en) | Method and apparatus for determining skill field of dialogue text | |
CN110136713A (en) | Dialogue method and system of the user in multi-modal interaction | |
CN108702368A (en) | Integrating additional information into a telecommunications call | |
CN109726000A (en) | The management method of more application views, for more application views management device and operating method | |
CN112529585A (en) | Interactive awakening method, device, equipment and system for risk transaction | |
CN109637536A (en) | A kind of method and device of automatic identification semantic accuracy | |
CN111063348B (en) | Information processing method, device and equipment and computer storage medium |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
CB02 | Change of applicant information | Address after: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province; Applicant after: Sipic Technology Co.,Ltd.; Address before: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province; Applicant before: AI SPEECH Co.,Ltd. |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20190308 |