CN109448694A - Method and device for rapid synthesis of TTS speech - Google Patents

Method and device for rapid synthesis of TTS speech

Info

Publication number
CN109448694A
Authority
CN
China
Prior art keywords
speech
strategy
text information
high frequency
TTS speech
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811611687.7A
Other languages
Chinese (zh)
Inventor
林婷
郭志煌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
AI Speech Ltd
Original Assignee
AI Speech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by AI Speech Ltd
Priority to CN201811611687.7A
Publication of CN109448694A
Legal status: Pending (current)


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The present invention discloses a method for rapidly synthesizing TTS speech, comprising the following steps: obtaining response text information; determining a synthesis strategy according to the response text information; and generating TTS speech according to the determined synthesis strategy. The invention also discloses a device for rapidly synthesizing TTS speech. The disclosed method and device shorten the voice interaction time between a smart voice device and its user, thereby improving voice interaction, and can still provide a good voice interaction experience on devices with low-end hardware.

Description

Method and device for rapid synthesis of TTS speech
Technical field
The present invention relates to the field of voice interaction technology, and in particular to a method and device for rapidly synthesizing TTS speech.
Background
With the continuous development of voice interaction technology, voice interaction is used more and more widely. Prior-art voice interaction works as follows: the user speaks a voice command; the device recognizes the command, performs semantic understanding on it, outputs according to the semantics the text that should be used to respond to the command, converts that text into TTS speech, and plays it. This realizes voice interaction between the smart voice device and the user, so that every question receives an answer and human-machine communication flows smoothly.
In this voice interaction scenario, however, TTS synthesis speed is a key factor in user experience. In particular, the hardware on which voice technology is deployed varies widely in capability, so the voice interaction function must be adapted to both high-end and low-end models. On low-end models, TTS synthesis during voice interaction is often slow, which degrades the user's voice interaction experience.
Summary of the invention
To solve the above problem, the invention starts from the TTS synthesis process and performs TTS synthesis according to a synthesis strategy, so as to improve speech response speed.
According to a first aspect of the invention, a method for rapidly synthesizing TTS speech is provided, comprising the following steps:
obtaining response text information;
determining a synthesis strategy according to the response text information;
generating TTS speech according to the determined synthesis strategy.
According to a second aspect of the invention, a device for rapidly synthesizing TTS speech is provided, comprising:
a response information acquisition module, configured to obtain response text information;
a strategy determination module, configured to determine a synthesis strategy according to the response text information;
a speech output module, configured to generate TTS speech according to the determined synthesis strategy.
According to a third aspect of the invention, an electronic device is provided, comprising at least one processor and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the steps of the above method.
According to a fourth aspect of the invention, a storage medium is provided on which a computer program is stored; when executed by a processor, the program implements the steps of the above method.
The device and method provided by the invention perform TTS synthesis according to a synthesis strategy that is determined from the response text information, so speech synthesis can be adapted flexibly to each response. This shortens the voice interaction time between the smart voice device and the user and improves voice interaction. Moreover, the device and method can still provide a good voice interaction experience on devices with low-end hardware.
Brief description of the drawings
Fig. 1 is a flowchart of a method for rapidly synthesizing TTS speech according to an embodiment of the invention;
Fig. 2 is a schematic block diagram of a device for rapidly synthesizing TTS speech according to an embodiment of the invention;
Fig. 3 is a schematic block diagram of a device for rapidly synthesizing TTS speech according to another embodiment of the invention;
Fig. 4 is a block diagram of an electronic device according to an embodiment of the invention.
Detailed description of embodiments
To make the objectives, technical solutions and advantages of the embodiments of the invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Evidently, the described embodiments are only some, rather than all, of the embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the invention.
It should be noted that, where no conflict arises, the embodiments of the present application and the features within them may be combined with one another.
The invention can be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures and the like that perform particular tasks or implement particular abstract data types. The invention can also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
In the present invention, terms such as "module", "device" and "system" refer to computer-related entities, such as hardware, a combination of hardware and software, software, or software in execution. In detail, for example, an element may be, but is not limited to, a process running on a processor, a processor, an object, an executable element, a thread of execution, a program and/or a computer. An application or script running on a server, and the server itself, may also be elements. One or more elements may reside within a process and/or thread of execution; an element may be localized on one computer and/or distributed between two or more computers, and may be run from various computer-readable media. Elements may also communicate through local and/or remote processes according to a signal having one or more data packets, for example, a signal carrying data from one element that interacts with another element in a local system or distributed system, and/or interacts with other systems over a network such as the Internet.
Finally, it should also be noted that relational terms such as first and second are used herein only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between them. Moreover, the terms "include" and "comprise" cover not only the elements listed but also other elements not explicitly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article or device that comprises it.
The method for rapidly synthesizing TTS speech of the embodiments can be applied to any terminal device with voice functions, such as smartphones, tablet computers and smart home devices; the invention is not limited in this regard. Users of these devices thus receive faster and more accurate responses, which improves the user experience.
The invention will now be described in further detail with reference to the accompanying drawings.
Fig. 1 schematically shows a flowchart of a method for rapidly synthesizing TTS speech according to an embodiment of the invention. As shown in Fig. 1, this embodiment includes the following steps:
Step S101: obtain response text information. The response text information is the text to be used as a response; for example, it may be the response text output according to the semantics during speech recognition. How it is obtained depends on the application scenario and can follow the prior art. For example, during voice interaction, preconfigured response text may be fetched from a database according to the speech recognition result; alternatively, a calling interface for response text may be provided, and response text passed in through that interface is received directly.
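As an illustration of the two acquisition paths in step S101, the following Python sketch assumes that the speech recognition result has already been reduced to an intent label and that the preconfigured response text is kept in an in-memory table standing in for the database. The table contents, the intent labels and the function name `get_response_text` are hypothetical and are not taken from the patent.

```python
from typing import Dict, Optional

# Hypothetical preconfigured responses, keyed by a recognized intent label.
# This table stands in for the database mentioned in step S101.
RESPONSE_TABLE: Dict[str, str] = {
    "play_music": "Now playing music",
    "weather": "Here is today's weather",
}


def get_response_text(recognized_intent: Optional[str] = None,
                      direct_text: Optional[str] = None) -> str:
    """Return the response text for the current turn (step S101).

    Path 1: text passed in directly through a calling interface.
    Path 2: preconfigured text looked up from the recognition result.
    """
    if direct_text is not None:
        return direct_text
    if recognized_intent in RESPONSE_TABLE:
        return RESPONSE_TABLE[recognized_intent]
    raise LookupError(f"no response text configured for {recognized_intent!r}")
```

Checking the direct-interface path first means a caller that already has the text never touches the lookup table.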
Step S102: determine a synthesis strategy according to the response text information, where the synthesis strategies include a high-frequency strategy, a local synthesis strategy and a cloud synthesis strategy.
Illustratively, a high-frequency speech library is configured first. It contains frequently used corpus entries (texts) and their corresponding speech. When configuring the library, frequently used corpus entries and their speech can be chosen from experience. For example, on an in-vehicle device the voice command for playing songs is common, so the spoken reply "Now playing music" can be set as high-frequency speech and configured in the library: the corpus entry is "Now playing music", and the audio file is the spoken rendition of that entry. During configuration, the speech is stored as an audio file, and a one-to-one mapping list between corpus entries and audio file names or IDs is generated.
After the response text information is obtained, it is matched against the corpus entries in the high-frequency speech library. A successful match means the corresponding audio is already stored in the library, i.e., the response text is frequently used, so the synthesis strategy is set to the high-frequency strategy.
If matching fails, the current response text is not yet stored in the library. The network state is then obtained and checked, and the synthesis strategy is set to either the local synthesis strategy or the cloud synthesis strategy according to that state. Specifically, if the network is not connected, the local synthesis strategy is used; if it is connected, the cloud synthesis strategy is used.
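The configuration described above (speech stored as audio files, plus a one-to-one list mapping each corpus entry to an audio file name) can be sketched as a small Python class. The class name `HighFrequencyLibrary`, the directory path and the use of exact string matching are assumptions made for illustration; the patent does not state how matching is performed.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, Optional


@dataclass
class HighFrequencyLibrary:
    """Sketch of the high-frequency speech library.

    Audio lives as files under `audio_dir`; the library itself only holds
    the one-to-one mapping from corpus text to audio file name.
    """
    audio_dir: Path
    index: Dict[str, str] = field(default_factory=dict)  # corpus -> file name

    def add(self, corpus: str, audio_file_name: str) -> None:
        # One corpus entry maps to exactly one audio file.
        self.index[corpus] = audio_file_name

    def lookup(self, text: str) -> Optional[Path]:
        """Return the audio file path on an exact corpus match, else None."""
        name = self.index.get(text)
        return self.audio_dir / name if name is not None else None
```

Keeping only file names in the index, rather than the audio data, follows the description of storing speech as audio files and keeps the mapping list small.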
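The decision made in step S102 reduces to a short function. In this Python sketch the network state is passed in as a boolean and the high-frequency corpus as any container of strings; how a device actually probes connectivity is not specified in the text and is left to the caller. The names `Strategy` and `determine_strategy` are hypothetical.

```python
from enum import Enum
from typing import Container


class Strategy(Enum):
    HIGH_FREQUENCY = "high_frequency"  # replay pre-stored audio
    LOCAL = "local"                    # on-device synthesis engine
    CLOUD = "cloud"                    # cloud synthesis engine


def determine_strategy(response_text: str,
                       hf_corpus: Container[str],
                       network_connected: bool) -> Strategy:
    """Step S102: choose how the TTS speech will be produced."""
    # A hit in the high-frequency library wins regardless of network state.
    if response_text in hf_corpus:
        return Strategy.HIGH_FREQUENCY
    # Otherwise: not connected -> local synthesis, connected -> cloud synthesis.
    return Strategy.CLOUD if network_connected else Strategy.LOCAL
```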
Step S103: generate TTS speech according to the determined synthesis strategy. Once the strategy has been determined in step S102, the corresponding TTS speech is generated according to that strategy, achieving a fast response. Specifically, there are the following three cases:
Case 1: when the synthesis strategy is the high-frequency strategy, generating TTS speech consists of looking up, among the speech stored in the high-frequency speech library, the speech of the corpus entry that matches the current response text (i.e., fetching the audio file of the matched entry) and outputting it directly as the TTS speech (i.e., playing that audio file). This yields an almost instantaneous response.
Case 2: when the synthesis strategy is the local synthesis strategy, generating TTS speech consists of synthesizing the response text into TTS speech with a local synthesis engine. The local synthesis engine is prior art; it converts text to TTS speech automatically on the device, and processing is very fast.
Case 3: when the synthesis strategy is the cloud synthesis strategy, generating TTS speech consists of sending the response text to a cloud synthesis engine and obtaining the speech information it returns. The cloud synthesis engine can be implemented with reference to the prior art. To ensure response speed and limit data traffic, the cloud engine returns the speech information in compressed form, so after receiving it the device must decode it according to the agreed protocol into TTS speech in a format the device can play. For example, the cloud engine encodes the synthesized speech as MP3, and the device decodes the returned MP3 data into TTS speech in PCM format.
In a preferred embodiment, after TTS speech is generated according to the determined strategy, it is also determined whether the newly generated TTS speech is high-frequency speech. One way to do this is to count how many times the response text has been used and decide from that count; for example, if it has been used 10 or more times, it is judged to be high-frequency speech. When it is, the current TTS speech and its response text are stored in the high-frequency speech library: the response text becomes a corpus entry, the audio file name of the TTS speech is bound to that entry and stored, and the audio file is saved under the corresponding storage path for use in later voice interactions. In this way the high-frequency speech library keeps growing and processing becomes more efficient.
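The three cases of step S103 can be expressed as a single dispatch function. In this Python sketch the local engine, the cloud request and the MP3-to-PCM decoder are injected as plain callables, because the patent treats them as prior art and names no particular engine or codec; all three parameters, the function name `generate_tts` and the representation of audio as byte strings are assumptions.

```python
from enum import Enum
from typing import Callable, Mapping


class Strategy(Enum):
    HIGH_FREQUENCY = "high_frequency"
    LOCAL = "local"
    CLOUD = "cloud"


def generate_tts(strategy: Strategy,
                 text: str,
                 hf_audio: Mapping[str, bytes],
                 local_engine: Callable[[str], bytes],
                 cloud_request: Callable[[str], bytes],
                 decode_to_pcm: Callable[[bytes], bytes]) -> bytes:
    """Step S103: produce playable audio for `text` under the chosen strategy."""
    if strategy is Strategy.HIGH_FREQUENCY:
        # Case 1: no synthesis at all, return the stored audio directly.
        return hf_audio[text]
    if strategy is Strategy.LOCAL:
        # Case 2: synthesize on the device.
        return local_engine(text)
    if strategy is Strategy.CLOUD:
        # Case 3: the cloud returns compressed audio (MP3 in the example),
        # which is decoded into a format the device can play (PCM).
        return decode_to_pcm(cloud_request(text))
    raise ValueError(f"unknown strategy: {strategy}")
```

Injecting the engines keeps the dispatch logic independent of any specific TTS engine or codec, which also makes it easy to exercise with stubs.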
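The usage-count rule in this preferred embodiment can be sketched as follows. The threshold of 10 comes from the example above; everything else, including the class name `HighFrequencyPromoter` and keeping audio as bytes in a dictionary instead of writing files under a storage path, is a simplifying assumption.

```python
from collections import Counter
from typing import Dict


class HighFrequencyPromoter:
    """Count uses of each response text and promote frequent ones."""

    def __init__(self, threshold: int = 10) -> None:
        self.threshold = threshold
        self.usage: Counter = Counter()       # response text -> use count
        self.library: Dict[str, bytes] = {}   # corpus text -> TTS audio

    def record(self, response_text: str, tts_audio: bytes) -> bool:
        """Record one use; return True if the text is (now) high-frequency."""
        if response_text in self.library:
            return True
        self.usage[response_text] += 1
        if self.usage[response_text] >= self.threshold:
            # Bind the response text (as a corpus entry) to its audio for reuse.
            self.library[response_text] = tts_audio
            return True
        return False
```

Texts already in the library are short-circuited, so they are neither counted again nor stored twice.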
The method of this embodiment shortens the voice interaction time between the smart voice device and the user, thereby improving voice interaction, and can still provide a good voice interaction experience on devices with low-end hardware.
Fig. 2 schematically shows a block diagram of a device for rapidly synthesizing TTS speech according to an embodiment of the invention. As shown in Fig. 2,
the device for rapidly synthesizing TTS speech includes: a response information acquisition module 2, a strategy determination module 3, a speech output module 4, a local synthesis engine 5, a high-frequency speech library 6 and a speech processing module 7.
The speech processing module 7 receives the user's voice command, recognizes and parses it, generates response text information from the recognition and parsing results, and outputs it to the response information acquisition module 2. Speech recognition can be implemented with reference to the prior art.
The response information acquisition module 2 receives the response text information; it can obtain the current response text information in response to a trigger instruction from the speech processing module 7.
The strategy determination module 3 determines the synthesis strategy according to the response text information; the strategies include the high-frequency strategy, the local synthesis strategy and the cloud synthesis strategy. The strategy is determined as described in the method embodiment above and is not repeated here.
The speech output module 4 generates TTS speech according to the determined synthesis strategy.
The high-frequency speech library 6 stores high-frequency speech and the corresponding corpus entries.
The local synthesis engine 5 synthesizes TTS speech from input text information.
The speech output module 4 includes a high-frequency synthesis unit 401, a local synthesis unit 402 and a cloud synthesis unit 403. The high-frequency synthesis unit 401 obtains the corresponding speech from the high-frequency speech library 6 according to the response text and outputs it as TTS speech. The local synthesis unit 402 calls the local synthesis engine 5 to synthesize the response text into TTS speech and outputs it. The cloud synthesis unit 403 sends the response text to the cloud synthesis engine, receives the speech information returned by the cloud engine, and decodes it into TTS speech for output. The speech output module 4 can thus perform TTS synthesis in all three cases, which speeds up speech response.
To use the device, the user simply speaks a voice command into an audio capture device with sound pickup capability, such as the microphone of the equipment in which the device is installed. The speech processing module 7 processes the command and finally obtains response text information, which it outputs to the response information acquisition module 2. Module 2 passes the text to the strategy determination module 3 to determine the strategy; once determined, the strategy is passed to the speech output module 4, which applies the speech synthesis method corresponding to that strategy to produce the final TTS speech output.
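The module chain of Fig. 2 can be sketched in Python by modeling each module as a callable. The class name `RapidTTSDevice` and every signature below are hypothetical. Making the speech processor optional also covers the variant, described later for other embodiments, in which the response information acquisition module acts as an external calling interface that receives text directly.

```python
from typing import Callable, Dict, Optional

# Hypothetical module interfaces for the Fig. 2 device.
SpeechProcessor = Callable[[bytes], str]  # module 7: voice command -> response text
StrategyChooser = Callable[[str], str]    # module 3: response text -> strategy name
SynthesisUnit = Callable[[str], bytes]    # units 401-403 of module 4: text -> audio


class RapidTTSDevice:
    def __init__(self,
                 choose_strategy: StrategyChooser,
                 units: Dict[str, SynthesisUnit],
                 speech_processor: Optional[SpeechProcessor] = None) -> None:
        self.choose_strategy = choose_strategy
        self.units = units
        self.speech_processor = speech_processor

    def acquire_response(self, audio: Optional[bytes] = None,
                         text: Optional[str] = None) -> str:
        """Module 2: accept text directly, or obtain it from module 7."""
        if text is not None:
            return text
        if audio is not None and self.speech_processor is not None:
            return self.speech_processor(audio)
        raise ValueError("need response text, or audio plus a speech processor")

    def handle(self, audio: Optional[bytes] = None,
               text: Optional[str] = None) -> bytes:
        """Run one interaction turn through modules 2 -> 3 -> 4."""
        response_text = self.acquire_response(audio, text)
        strategy = self.choose_strategy(response_text)  # module 3
        return self.units[strategy](response_text)      # one unit of module 4
```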
The device of this embodiment shortens the voice interaction time between the smart voice device and the user, thereby improving voice interaction, and can still provide a good voice interaction experience on devices with low-end hardware.
Fig. 3 schematically shows a block diagram of a device for rapidly synthesizing TTS speech according to another embodiment of the invention. As shown in Fig. 3,
the device further includes a high-frequency speech adding module 8, which judges whether the TTS speech output by the local synthesis unit 402 and the cloud synthesis unit 403, together with its corresponding response text, is high-frequency speech; the judgment can follow the method part above. When the speech is judged to be high-frequency, the TTS speech and its response text are added to the high-frequency speech library 6.
With the high-frequency speech adding module 8, the device of this embodiment accumulates the results of each response into the high-frequency speech library. The library thus gains resources continuously, which greatly speeds up speech responses during processing.
It should be noted that in other embodiments the device for rapidly synthesizing TTS speech need not include the speech processing module; instead, the response information acquisition module serves as an external calling interface and receives response text directly for processing. The embodiments of the invention do not restrict how the device's modules are combined. The above is only one specific implementation; in a given application, those skilled in the art may combine the above module features as needed to achieve the intended purpose.
In some embodiments, a non-volatile computer-readable storage medium is provided that stores one or more programs comprising executable instructions. The instructions can be read and executed by an electronic device (including but not limited to a computer, server or network device) to perform any of the above methods for rapidly synthesizing TTS speech of the invention.
In some embodiments, a computer program product is also provided, comprising a computer program stored on a non-volatile computer-readable storage medium. The computer program includes program instructions which, when executed by a computer, cause the computer to perform any of the above methods for rapidly synthesizing TTS speech.
In some embodiments, an electronic device is also provided, comprising at least one processor and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions, when executed by the at least one processor, enable the at least one processor to perform the method for rapidly synthesizing TTS speech.
In some embodiments, a storage medium is also provided on which a computer program is stored; when executed by a processor, the program performs the method for rapidly synthesizing TTS speech.
The device for rapidly synthesizing TTS speech of the above embodiments can be used to perform the method for rapidly synthesizing TTS speech of the embodiments, and correspondingly achieves the technical effects of that method, which are not repeated here. In the embodiments of the invention, the relevant functional modules can be implemented by a hardware processor.
Fig. 4 is a schematic diagram of the hardware structure of an electronic device that performs the method for rapidly synthesizing TTS speech, according to an embodiment of the invention. As shown in Fig. 4, the device includes:
one or more processors 410 and a memory 420; one processor 410 is taken as an example in Fig. 4.
The device that performs the method for rapidly synthesizing TTS speech may further include an input apparatus 430 and an output apparatus 440.
The processor 410, memory 420, input apparatus 430 and output apparatus 440 may be connected by a bus or in other ways; Fig. 4 shows a bus connection as an example.
As a non-volatile computer-readable storage medium, the memory 420 can store non-volatile software programs, non-volatile computer-executable programs and modules, such as the program instructions/modules corresponding to the method for rapidly synthesizing TTS speech in the embodiments of the present application. By running the non-volatile software programs, instructions and modules stored in the memory 420, the processor 410 executes the server's various functional applications and data processing, i.e., implements the method for rapidly synthesizing TTS speech of the above method embodiment.
The memory 420 may include a program storage area and a data storage area. The program storage area can store an operating system and the application required by at least one function; the data storage area can store data created through use of the device for rapidly synthesizing TTS speech, among others. In addition, the memory 420 may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device or other non-volatile solid-state storage device. In some embodiments, the memory 420 optionally includes memory located remotely from the processor 410, which may be connected to the device for rapidly synthesizing TTS speech over a network. Examples of such networks include, but are not limited to, the Internet, corporate intranets, local area networks, mobile communication networks and combinations thereof.
The input apparatus 430 can receive input numeric or character information and generate signals related to user settings and function control of the device for rapidly synthesizing TTS speech. The output apparatus 440 may include a display device such as a screen.
The one or more modules are stored in the memory 420 and, when executed by the one or more processors 410, perform the method for rapidly synthesizing TTS speech in any of the above method embodiments.
The above product can perform the method provided by the embodiments of the present application and has the functional modules and beneficial effects corresponding to performing that method. For technical details not described in this embodiment, refer to the method provided by the embodiments of the present application.
The electronic device of the embodiments of the present application exists in various forms, including but not limited to:
(1) Mobile communication devices: these devices have mobile communication functions, with voice and data communication as their main purpose. Such terminals include smartphones (e.g., iPhone), multimedia phones, feature phones and low-end phones.
(2) Ultra-mobile personal computer devices: these belong to the category of personal computers, have computing and processing functions, and generally also support mobile Internet access. Such terminals include PDA, MID and UMPC devices, e.g., iPad.
(3) In-vehicle devices: these are used in vehicle driving and can connect with other driver-assistance systems of the vehicle.
(4) Servers: devices that provide computing services. A server consists of a processor, hard disk, memory, system bus and so on. Its architecture is similar to that of a general-purpose computer, but because it must provide highly reliable services, it has higher requirements for processing capability, stability, reliability, security, scalability and manageability.
(5) Other electronic devices with data interaction functions.
The device embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
From the description of the above embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by software plus a general-purpose hardware platform, or of course by hardware. Based on this understanding, the above technical solutions, or the part of them that contributes over the related art, may be embodied in the form of a software product. The computer software product may be stored in a computer-readable storage medium, such as ROM/RAM, a magnetic disk or an optical disc, and includes instructions that cause a computer device (which may be a personal computer, server, network device, etc.) to perform the methods described in the embodiments or in parts of the embodiments.
Finally, it should be noted that the above embodiments are only intended to illustrate, not limit, the technical solutions of the present application. Although the application has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be equivalently replaced, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (10)

1. A method for rapidly synthesizing TTS speech, characterized by comprising the following steps:
obtaining response text information;
determining a synthesis strategy according to the response text information;
generating TTS speech according to the determined synthesis strategy.
2. The method according to claim 1, wherein the synthesis strategy includes a high-frequency strategy, a local synthesis strategy and a cloud synthesis strategy, and the method further comprises:
configuring a high-frequency speech library, the high-frequency speech library containing corpus entries and corresponding speech;
wherein determining a synthesis strategy according to the response text information comprises:
matching the response text information against the corpus entries, and determining the synthesis strategy to be the high-frequency strategy when the match succeeds;
when the match fails, obtaining and checking the network state, and determining the synthesis strategy to be the local synthesis strategy or the cloud synthesis strategy according to the network state.
3. The method according to claim 2, wherein, when the convergence strategy is determined to be the high-frequency strategy, generating TTS voice according to the determined convergence strategy comprises:
obtaining the speech corresponding to the corpus entry currently matched by the response text information, and outputting the obtained speech as the TTS voice;
when the convergence strategy is determined to be the local synthesis strategy, generating TTS voice according to the determined convergence strategy comprises:
synthesizing the response text information into TTS voice by a local synthesis engine;
when the convergence strategy is determined to be the cloud synthesis strategy, generating TTS voice according to the determined convergence strategy comprises:
outputting the response text information to a cloud synthesis engine, and obtaining the voice information returned by the cloud synthesis engine;
decoding the returned voice information to generate the TTS voice.
4. The method according to claim 2 or 3, characterized in that, after the TTS voice is generated according to the local synthesis strategy or the cloud synthesis strategy, the method further comprises:
judging whether the currently generated TTS voice is high-frequency speech, and, when it is determined to be high-frequency speech, storing the current TTS voice and its corresponding response text information in the high-frequency speech library.
5. A device for rapidly synthesizing TTS voice, characterized by comprising:
a response information obtaining module, configured to obtain response text information;
a strategy determining module, configured to determine a convergence strategy according to the response text information;
a voice output module, configured to generate TTS voice according to the determined convergence strategy.
6. The device according to claim 5, characterized in that the convergence strategy includes a high-frequency strategy, a local synthesis strategy and a cloud synthesis strategy, and the device further comprises:
a high-frequency speech library, configured to store high-frequency speech and its corresponding corpus;
a local synthesis engine, configured to synthesize TTS voice from input text information;
wherein the voice output module comprises:
a high-frequency synthesis unit, configured to obtain the corresponding speech from the high-frequency speech library according to the response text information and output it as TTS voice;
a local synthesis unit, configured to call the local synthesis engine to synthesize the response text information into TTS voice and output it;
a cloud synthesis unit, configured to output the response text information to a cloud synthesis engine, receive the voice information returned by the cloud synthesis engine, and decode it for output as TTS voice.
7. The device according to claim 6, characterized by further comprising:
a high-frequency speech adding module, configured to judge the TTS voice output by the local synthesis unit and the cloud synthesis unit together with its corresponding response text information, and, when the TTS voice is determined to be high-frequency speech, add the TTS voice and its corresponding response text information to the high-frequency speech library.
8. The device according to any one of claims 5 to 7, characterized by further comprising:
a speech processing module, configured to receive a user speech instruction, recognize and parse it, generate response text information according to the recognition and parsing results, and output the response text information to the response information obtaining module.
9. An electronic device, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor is able to perform the steps of the method according to any one of claims 1 to 4.
10. A storage medium on which a computer program is stored, characterized in that, when executed by a processor, the program implements the steps of the method according to any one of claims 1 to 4.
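The method of claims 1 to 4 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the applicant's implementation. The claims do not say how response text is matched against the corpus, how the network state is judged, what format the cloud engine returns, or what makes a TTS voice "high-frequency", so the sketch assumes, respectively: an exact string match against a dictionary keyed by corpus text; a boolean `network_ok` probe that selects cloud synthesis when it returns True; a base64-encoded cloud response decoded with `base64.b64decode`; and a per-text counter that stores synthesized audio once the same text has been generated `threshold` times. `FastTts`, `Strategy`, `fake_local` and `fake_cloud` are hypothetical names, and the two engines are stand-ins rather than real synthesizers.

```python
import base64
from collections import Counter
from enum import Enum
from typing import Callable, Dict


class Strategy(Enum):
    HIGH_FREQUENCY = "high_frequency"
    LOCAL = "local"
    CLOUD = "cloud"


class FastTts:
    """Hypothetical sketch of the method of claims 1 to 4."""

    def __init__(
        self,
        local_engine: Callable[[str], bytes],
        cloud_engine: Callable[[str], str],
        network_ok: Callable[[], bool],
        threshold: int = 3,
    ) -> None:
        self.local_engine = local_engine  # stand-in for the local synthesis engine
        self.cloud_engine = cloud_engine  # assumed to return base64-encoded audio
        self.network_ok = network_ok      # assumed boolean network-state probe
        self.threshold = threshold        # assumed high-frequency cut-off
        self.library: Dict[str, bytes] = {}  # high-frequency speech library
        self.counts: Counter = Counter()     # how often each text was synthesized

    def determine_strategy(self, response_text: str) -> Strategy:
        # Claim 2: try the corpus first (exact match is an assumption).
        if response_text in self.library:
            return Strategy.HIGH_FREQUENCY
        # On a miss, the network state picks cloud or local synthesis.
        return Strategy.CLOUD if self.network_ok() else Strategy.LOCAL

    def synthesize(self, response_text: str) -> bytes:
        strategy = self.determine_strategy(response_text)
        if strategy is Strategy.HIGH_FREQUENCY:
            # Claim 3: reuse the stored speech for the matched corpus entry.
            return self.library[response_text]
        if strategy is Strategy.LOCAL:
            audio = self.local_engine(response_text)
        else:
            # Claim 3: send the text to the cloud, then decode the returned payload.
            audio = base64.b64decode(self.cloud_engine(response_text))
        self._maybe_store(response_text, audio)
        return audio

    def _maybe_store(self, response_text: str, audio: bytes) -> None:
        # Claim 4: store synthesized speech once it counts as high-frequency.
        self.counts[response_text] += 1
        if self.counts[response_text] >= self.threshold:
            self.library[response_text] = audio


# Usage with stub engines that record every call they receive.
calls = []


def fake_local(text: str) -> bytes:
    calls.append(("local", text))
    return b"LOCAL:" + text.encode("utf-8")


def fake_cloud(text: str) -> str:
    calls.append(("cloud", text))
    return base64.b64encode(b"CLOUD:" + text.encode("utf-8")).decode("ascii")


tts = FastTts(fake_local, fake_cloud, network_ok=lambda: True, threshold=2)
tts.synthesize("Hello")  # cloud synthesis, count 1
tts.synthesize("Hello")  # cloud synthesis, count 2, now stored
tts.synthesize("Hello")  # served from the library, no engine call
```

Counting before storing keeps one-off replies out of the library, so only repeated responses skip synthesis on later requests; the claims themselves leave this criterion open.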
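The device claims 5, 6 and 8 restate the method as a chain of modules. A minimal Python sketch of that wiring, under stated assumptions, treats each module as an injected callable and the voice output module as a table of synthesis units keyed by strategy. `TtsDevice`, its field names and the string keys are hypothetical; the recognizer and parser are stubs rather than a real speech recognition or dialogue system, and the cloud synthesis unit is omitted from the usage stub for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class TtsDevice:
    """Hypothetical wiring of the modules named in claims 5, 6 and 8."""

    # Speech processing module (claim 8): recognition and parsing, stubbed here.
    recognize: Callable[[bytes], str]
    parse: Callable[[str], str]
    # Strategy determining module (claim 5): maps response text to a strategy key.
    determine_strategy: Callable[[str], str]
    # Voice output module (claim 6): one synthesis unit per strategy key.
    output_units: Dict[str, Callable[[str], bytes]]

    def handle(self, user_audio: bytes) -> bytes:
        # Speech processing module -> response information obtaining module.
        response_text = self.parse(self.recognize(user_audio))
        # Strategy determining module -> voice output module.
        strategy = self.determine_strategy(response_text)
        return self.output_units[strategy](response_text)


# Usage with stub modules: a cached greeting, everything else synthesized locally.
cache = {"Hi there.": b"cached-audio"}
device = TtsDevice(
    recognize=lambda audio: audio.decode("utf-8"),
    parse=lambda query: "Hi there." if "hello" in query else "Sorry, say again?",
    determine_strategy=lambda text: "high_frequency" if text in cache else "local",
    output_units={
        "high_frequency": lambda text: cache[text],
        "local": lambda text: b"local:" + text.encode("utf-8"),
    },
)
print(device.handle(b"hello"))     # b'cached-audio'
print(device.handle(b"weather?"))  # b'local:Sorry, say again?'
```

Keeping the synthesis units in a table mirrors claim 6, where the voice output module is split into one unit per strategy; adding a cloud unit would only add one more entry.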
CN201811611687.7A 2018-12-27 2018-12-27 A kind of method and device of rapid synthesis TTS voice Pending CN109448694A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811611687.7A CN109448694A (en) 2018-12-27 2018-12-27 A kind of method and device of rapid synthesis TTS voice

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811611687.7A CN109448694A (en) 2018-12-27 2018-12-27 A kind of method and device of rapid synthesis TTS voice

Publications (1)

Publication Number Publication Date
CN109448694A true CN109448694A (en) 2019-03-08

Family

ID=65538423

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811611687.7A Pending CN109448694A (en) 2018-12-27 2018-12-27 A kind of method and device of rapid synthesis TTS voice

Country Status (1)

Country Link
CN (1) CN109448694A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110600003A (en) * 2019-10-18 2019-12-20 北京云迹科技有限公司 Robot voice output method and device, robot and storage medium
CN110600000A (en) * 2019-09-29 2019-12-20 百度在线网络技术(北京)有限公司 Voice broadcasting method and device, electronic equipment and storage medium
CN110782869A (en) * 2019-10-30 2020-02-11 标贝(北京)科技有限公司 Speech synthesis method, apparatus, system and storage medium
CN111161725A (en) * 2019-12-17 2020-05-15 珠海格力电器股份有限公司 Voice interaction method and device, computing equipment and storage medium
CN111354334A (en) * 2020-03-17 2020-06-30 北京百度网讯科技有限公司 Voice output method, device, equipment and medium
CN112099628A (en) * 2020-09-08 2020-12-18 平安科技(深圳)有限公司 VR interaction method and device based on artificial intelligence, computer equipment and medium
CN112863479A (en) * 2021-01-05 2021-05-28 杭州海康威视数字技术股份有限公司 TTS voice processing method, device, equipment and system
CN113421542A (en) * 2021-06-22 2021-09-21 广州小鹏汽车科技有限公司 Voice interaction method, server, voice interaction system and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1889170A (en) * 2005-06-28 2007-01-03 国际商业机器公司 Method and system for generating synthesized speech base on recorded speech template
CN103366732A (en) * 2012-04-06 2013-10-23 上海博泰悦臻电子设备制造有限公司 Voice broadcast method and device and vehicle-mounted system
CN104992704A (en) * 2015-07-15 2015-10-21 百度在线网络技术(北京)有限公司 Speech synthesizing method and device
US20170255616A1 (en) * 2016-03-03 2017-09-07 Electronics And Telecommunications Research Institute Automatic interpretation system and method for generating synthetic sound having characteristics similar to those of original speaker's voice
CN107995249A (en) * 2016-10-27 2018-05-04 中兴通讯股份有限公司 A kind of method and apparatus of voice broadcast
CN108877765A (en) * 2018-05-31 2018-11-23 百度在线网络技术(北京)有限公司 Processing method and processing device, computer equipment and the readable medium of voice joint synthesis

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110600000A (en) * 2019-09-29 2019-12-20 百度在线网络技术(北京)有限公司 Voice broadcasting method and device, electronic equipment and storage medium
CN110600000B (en) * 2019-09-29 2022-04-15 阿波罗智联(北京)科技有限公司 Voice broadcasting method and device, electronic equipment and storage medium
CN110600003A (en) * 2019-10-18 2019-12-20 北京云迹科技有限公司 Robot voice output method and device, robot and storage medium
CN110782869A (en) * 2019-10-30 2020-02-11 标贝(北京)科技有限公司 Speech synthesis method, apparatus, system and storage medium
CN111161725A (en) * 2019-12-17 2020-05-15 珠海格力电器股份有限公司 Voice interaction method and device, computing equipment and storage medium
CN111161725B (en) * 2019-12-17 2022-09-27 珠海格力电器股份有限公司 Voice interaction method and device, computing equipment and storage medium
CN111354334A (en) * 2020-03-17 2020-06-30 北京百度网讯科技有限公司 Voice output method, device, equipment and medium
EP3882909A1 (en) * 2020-03-17 2021-09-22 Beijing Baidu Netcom Science And Technology Co., Ltd. Speech output method and apparatus, device and medium
CN111354334B (en) * 2020-03-17 2023-09-15 阿波罗智联(北京)科技有限公司 Voice output method, device, equipment and medium
CN112099628A (en) * 2020-09-08 2020-12-18 平安科技(深圳)有限公司 VR interaction method and device based on artificial intelligence, computer equipment and medium
CN112863479A (en) * 2021-01-05 2021-05-28 杭州海康威视数字技术股份有限公司 TTS voice processing method, device, equipment and system
CN113421542A (en) * 2021-06-22 2021-09-21 广州小鹏汽车科技有限公司 Voice interaction method, server, voice interaction system and storage medium

Similar Documents

Publication Publication Date Title
CN109448694A (en) A kind of method and device of rapid synthesis TTS voice
KR102660922B1 (en) Management layer for multiple intelligent personal assistant services
CN110442701B (en) Voice conversation processing method and device
CN109637548A (en) Voice interactive method and device based on Application on Voiceprint Recognition
CN111049996B (en) Multi-scene voice recognition method and device and intelligent customer service system applying same
CN113127609B (en) Voice control method, device, server, terminal equipment and storage medium
CN110288997A (en) Equipment awakening method and system for acoustics networking
CN111261151B (en) Voice processing method and device, electronic equipment and storage medium
CN107004411A (en) Voice Applications framework
JP7342286B2 (en) Voice function jump method, electronic equipment and storage medium for human-machine interaction
CN110874200B (en) Interactive method, device, storage medium and operating system
CN109473104A (en) Speech recognition network delay optimization method and device
US11595774B2 (en) Spatializing audio data based on analysis of incoming audio data
WO2021129240A1 (en) Skill scheduling method and apparatus for voice conversation platform
CN108877804A (en) Voice service method, system, electronic equipment and storage medium
CN109669754A (en) The dynamic display method of interactive voice window, voice interactive method and device with telescopic interactive window
CN109462546A (en) A kind of voice dialogue history message recording method, apparatus and system
CN111966441A (en) Information processing method and device based on virtual resources, electronic equipment and medium
US20230133146A1 (en) Method and apparatus for determining skill field of dialogue text
CN110136713A (en) Dialogue method and system of the user in multi-modal interaction
CN108702368A (en) Integrating additional information into a telecommunications call
CN109726000A (en) The management method of more application views, for more application views management device and operating method
CN112529585A (en) Interactive awakening method, device, equipment and system for risk transaction
CN109637536A (en) A kind of method and device of automatic identification semantic accuracy
CN111063348B (en) Information processing method, device and equipment and computer storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province

Applicant after: Sipic Technology Co.,Ltd.

Address before: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province

Applicant before: AI SPEECH Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20190308
