CN104992703B

CN104992703B - Speech synthesis method and system

Info

Publication number: CN104992703B
Application number: CN201510441079.6A
Authority: CN
Inventors: 李秀林; 白洁; 李维高; 唐海员
Original assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Current assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Priority date: 2015-07-24
Filing date: 2015-07-24
Publication date: 2017-10-03
Anticipated expiration: 2035-07-24
Also published as: CN104992703A; WO2017016135A1

Abstract

The present invention proposes a speech synthesis method and system. The speech synthesis method includes: when speech synthesis is needed, querying a server for a list of available voice libraries, where the list includes information on multiple available voice libraries and the available voice libraries include featured voice libraries; obtaining the voice library that the user selects from the list, and downloading the selected voice library from the server; and synthesizing text into speech using the downloaded voice library. The method can reduce the size of an offline speech synthesis APP and offer users more choices, enabling personalized speech synthesis.

Description

Speech synthesis method and system

Technical field

The present invention relates to the field of speech processing technology, and in particular to a speech synthesis method and system.

Background

In the prior art, when a user downloads an offline speech synthesis application (APP), the APP contains one or two voice libraries. When using the APP, the user can select one of them, and the APP then uses the selected voice library to perform speech synthesis (Text To Speech, TTS) on the text to be played.

However, the prior-art scheme bundles the voice libraries in the APP. On the one hand, voice library files are generally large, which makes the APP large; on the other hand, the number of voice libraries the APP contains is limited, which restricts the user's choice.

Summary of the invention

The present invention aims to solve, at least to some extent, one of the technical problems in the related art.

To this end, one object of the present invention is to propose a speech synthesis method that can reduce the size of an offline speech synthesis APP and offer users more choices, enabling personalized speech synthesis.

Another object of the present invention is to propose a speech synthesis system.

To achieve the above objects, the speech synthesis method proposed by embodiments of the first aspect of the present invention includes: when speech synthesis is needed, querying a server for a list of available voice libraries, where the list includes information on multiple available voice libraries and the available voice libraries include featured voice libraries; obtaining the voice library that the user selects from the list, and downloading the selected voice library from the server; and synthesizing text into speech using the downloaded voice library.

In the speech synthesis method proposed by embodiments of the first aspect, the voice library is downloaded from the server at synthesis time rather than bundled in the APP, which reduces the size of the APP. In addition, compared with bundling voice libraries in the APP, the server can store more voice libraries, so downloading from the server offers users more choices. Because the available voice libraries include featured voice libraries, users' personalized needs can be met and the user experience improved.

To achieve the above objects, the speech synthesis system proposed by embodiments of the second aspect of the present invention includes a client device, and the client device includes: a query module, configured to query a server for a list of available voice libraries when speech synthesis is needed, where the list includes information on multiple available voice libraries and the available voice libraries include featured voice libraries; an acquisition module, configured to obtain the voice library that the user selects from the list and to download the selected voice library from the server; and a synthesis module, configured to synthesize text into speech using the downloaded voice library.

In the speech synthesis system proposed by embodiments of the second aspect, the voice library is downloaded from the server at synthesis time rather than bundled in the APP, which reduces the size of the APP. In addition, compared with bundling voice libraries in the APP, the server can store more voice libraries, so downloading from the server offers users more choices. Because the available voice libraries include featured voice libraries, users' personalized needs can be met and the user experience improved.

Additional aspects and advantages of the present invention will be set forth in part in the following description, and in part will become apparent from that description or be learned through practice of the invention.

Brief description of the drawings

The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of the embodiments taken in conjunction with the accompanying drawings, in which:

Fig. 1 is a schematic flowchart of a speech synthesis method proposed by one embodiment of the present invention;

Fig. 2 is a schematic flowchart of a speech synthesis method proposed by another embodiment of the present invention;

Fig. 3 is a schematic diagram of a specific example of a speech synthesis system in an embodiment of the present invention;

Fig. 4 is a schematic flowchart of speech synthesis in one specific example of an embodiment of the present invention;

Fig. 5 is a schematic flowchart of speech synthesis in another specific example of an embodiment of the present invention;

Fig. 6 is a schematic flowchart of speech synthesis in another specific example of an embodiment of the present invention;

Fig. 7 is a schematic flowchart of speech synthesis in another specific example of an embodiment of the present invention;

Fig. 8 is a schematic structural diagram of a speech synthesis system proposed by another embodiment of the present invention;

Fig. 9 is a schematic structural diagram of a speech synthesis system proposed by another embodiment of the present invention.

Detailed description

Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar modules or modules with the same or similar functions. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the present invention and are not to be construed as limiting it. On the contrary, the embodiments of the present invention include all changes, modifications and equivalents falling within the spirit and scope of the appended claims.

Fig. 1 is a schematic flowchart of a speech synthesis method proposed by one embodiment of the present invention. The method includes:

S11: When speech synthesis is needed, query the server for a list of available voice libraries, where the list includes information on multiple available voice libraries and the available voice libraries include featured voice libraries.

Unlike the prior art, in which voice libraries are bundled directly in the APP, in this embodiment the APP does not need to contain a voice library; instead, a voice library is downloaded from the server when it is needed.

For example, the software development kit (Software Development Kit, SDK) corresponding to the APP on the client sends a query request to the server. The query request asks for the list of available voice libraries. After receiving the query request, the server obtains the list and sends it to the SDK.

The available voice libraries in this embodiment include featured voice libraries. Of course, it is understood that the available voice libraries may also include existing ordinary voice libraries.

A featured voice library is a pre-generated voice library that meets personalized needs and differs from ordinary voice libraries, for example a child-voice library or a user-customized voice library.

S12: Obtain the voice library that the user selects from the list of available voice libraries, and download the selected voice library from the server.

After the SDK obtains the list of available voice libraries from the server, it can present the list to the user. Specifically, it can display the information of each available voice library, for example the creator of the voice library, the generation time, the suitable version of the offline speech synthesis engine, the domain of the text it is suited to synthesize, whether it is a male voice, a female voice or another featured voice, and the sound quality, to make selection easier for the user.

Based on the displayed information, the user can select the information of one or more available voice libraries.

After the user selects the information of an available voice library, the SDK can determine the corresponding voice library and download it from the server. For example, the information of an available voice library also includes link information; after the user selects the information of an available voice library, the corresponding voice library can be downloaded according to the link information in it.

S13: Synthesize text into speech using the downloaded voice library.

After the SDK downloads the voice library from the server, it can use that voice library to perform speech synthesis.
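Steps S11 to S13 can be sketched as client-side code. This is only an illustration under stated assumptions: `VoiceLibraryInfo`, `FakeServer`, `synthesize_with_selected_library` and all field names are hypothetical, the server is an in-memory stand-in rather than a network service, and the synthesis step is a placeholder that only records which library and text would be used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceLibraryInfo:
    """One entry of the available-library list (field names are assumed)."""
    library_id: str
    description: str
    link: str  # link information used to download the library


class FakeServer:
    """In-memory stand-in for the server; a real SDK would use network calls."""

    def __init__(self, libraries):
        # libraries: dict mapping link -> (VoiceLibraryInfo, library bytes)
        self._libraries = libraries

    def query_available(self):
        return [info for info, _ in self._libraries.values()]

    def download(self, link):
        return self._libraries[link][1]


def synthesize_with_selected_library(server, choose, text):
    """S11: query the list; S12: select and download; S13: synthesize."""
    available = server.query_available()        # S11
    selected = choose(available)                # S12: user selection callback
    library = server.download(selected.link)    # S12: download via link information
    # S13: placeholder for a real synthesis engine.
    return {"library_size": len(library), "voice": selected.library_id, "text": text}
```

Passing the user's choice as a callback keeps the flow independent of any particular user interface.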

In this embodiment, the voice library is downloaded from the server at synthesis time rather than bundled in the APP, which reduces the size of the APP. In addition, compared with bundling voice libraries in the APP, the server can store more voice libraries, so downloading from the server offers users more choices. Because the available voice libraries include featured voice libraries, users' personalized needs can be met and the user experience improved.

Fig. 2 is a schematic flowchart of a speech synthesis method proposed by another embodiment of the present invention. This embodiment takes offering the user featured voice libraries to choose from as an example. The method includes:

S21: The server creates a featured voice library and the corresponding featured voice library information, and stores both.

Creating a featured voice library may include:

building a featured acoustic model and obtaining acoustic segments, and composing the featured voice library from the featured acoustic model and the acoustic segments; or,

building a featured acoustic model, and composing the featured voice library from the featured acoustic model; or,

obtaining audio data corresponding to a specific text, and composing the featured voice library from the specific text and the audio data; or,

building a featured acoustic model, obtaining acoustic segments, and obtaining audio data corresponding to a specific text, and composing the featured voice library from the featured acoustic model, the acoustic segments, and the specific text and audio data; or,

building a featured acoustic model and obtaining audio data corresponding to a specific text, and composing the featured voice library from the featured acoustic model and the specific text and audio data.
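The five compositions listed above can be captured in a small data structure. The sketch below is hypothetical: the class name, the field names and the use of `bytes` for models and audio are assumptions made for illustration, not the patent's storage format.

```python
from dataclasses import dataclass, field
from typing import Optional

# The five compositions listed in the text, as tuples of component names.
LISTED_COMPOSITIONS = {
    ("model", "segments"),
    ("model",),
    ("recordings",),
    ("model", "segments", "recordings"),
    ("model", "recordings"),
}


@dataclass
class FeaturedVoiceLibrary:
    acoustic_model: Optional[bytes] = None                 # featured acoustic model
    acoustic_segments: dict = field(default_factory=dict)  # segment key -> waveform
    recordings: dict = field(default_factory=dict)         # specific text -> audio data

    def composition(self):
        """Return the names of the components that are present."""
        parts = []
        if self.acoustic_model is not None:
            parts.append("model")
        if self.acoustic_segments:
            parts.append("segments")
        if self.recordings:
            parts.append("recordings")
        return tuple(parts)

    def is_listed_composition(self):
        """True if the library matches one of the five listed compositions."""
        return self.composition() in LISTED_COMPOSITIONS
```

Note that acoustic segments never appear without an acoustic model in the listed compositions, which is why a segments-only library is rejected here.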

In some embodiments, building a featured acoustic model may include:

obtaining featured audio data and training on it to build a featured acoustic model; or,

obtaining an existing acoustic model and featured audio data, and performing adaptive training on the existing acoustic model with the featured audio data to build a featured acoustic model.

Training a featured acoustic model directly on featured audio data requires more samples than adaptive training of an existing acoustic model.

For example, audio data of a particular timbre is recorded or collected at a certain scale, prosody and boundary annotations are added manually or automatically, and a featured acoustic model is obtained by training. Alternatively, starting from an existing acoustic model, a small amount of audio data of the particular timbre is recorded or collected, and the existing acoustic model is updated into a featured acoustic model through adaptive model training.

In some embodiments, obtaining acoustic segments may include:

segmenting the training samples to obtain acoustic segments.

For example, audio data of a particular timbre is recorded or collected at a certain scale, prosody and boundary annotations are added manually or automatically, and the data is segmented to obtain acoustic segments.

In some embodiments, obtaining audio data corresponding to a specific text may include:

selecting the specific text to be read aloud;

obtaining a specific speaker's reading of the specific text;

using the read speech, or the speech obtained by compressing it, as the audio data corresponding to the specific text.

For example, a specific speaker is asked to read the specific text expressively, and the corresponding audio data is obtained, thereby customizing the audio data.

Optionally, to save space, the read speech can be compressed, and the compressed speech is used as the audio data finally stored in the featured voice library.

In addition, different speakers can read the same or different specific texts to obtain different audio data. The multiple specific texts and their audio data can then be stored in correspondence to form a customized library.

Featured voice library information is the related information generated for a featured voice library, for example the creator, the generation time, the suitable version of the offline speech synthesis engine, the domain of the text it is suited to synthesize, whether it is a male voice, a female voice or another featured voice, and the sound quality.

After the featured voice library and its information are created, they can be stored. For example, referring to Fig. 3, after the creation module 31 (shown in Fig. 3 as the management console) creates a featured voice library, the library can be stored in the storage module 32 for featured voice libraries (shown in Fig. 3 as BOS cloud storage); after the featured voice library information is created, it is stored in the storage module 33 for featured voice library information (shown in Fig. 3 as the mysql cluster). In later steps, the featured voice library information can be provided to the user as a query result, and each query result can serve as one item in the information of each available voice library in the list.

S22: The SDK sends a query request to the server.

The SDK can send the query request when speech synthesis is needed. For example, after the user opens the SDK and clicks the button that triggers speech synthesis, the SDK sends the query request to the server.

Referring to Fig. 3, the query request sent by the SDK 34 is first sent to the server's entry node 35 (shown in Fig. 3 as the physical machine room).

S23: The server obtains query results according to the query request.

The query request may contain query conditions, for example the speech synthesis engine version, the domain, or the featured voice. After receiving the query request, the server obtains the query results that satisfy the query conditions.
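Matching stored library information against query conditions can be sketched as a simple filter. The field names (`engine_version`, `domain`, `voice`) and the exact-equality matching rule are assumptions for illustration; the text does not specify how conditions are compared.

```python
def matches(info, conditions):
    """True if every requested field equals the value stored for the library."""
    return all(info.get(name) == wanted for name, wanted in conditions.items())


def query_library_info(all_info, conditions):
    """Return the library records that satisfy all query conditions."""
    return [info for info in all_info if matches(info, conditions)]
```

An empty set of conditions matches every record, which corresponds to a query for the full list.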

To cope with bursts of query requests that may come from SDK clients, query results can be cached. Referring to Fig. 3, storing the query results in the memcached cluster 36 is taken as an example.

After the server receives a query request, it can therefore first look in the memcached cluster. If a query result satisfying the query conditions is found there, the result can be obtained directly from the memcached cluster. If no such result is found in the memcached cluster, the server can then query the mysql cluster. When the mysql cluster contains a query result satisfying the query conditions, the result is obtained from the mysql cluster and can also be cached in the memcached cluster, so that subsequent requests can obtain it directly from the memcached cluster.
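The lookup order described above is the cache-aside pattern. In the minimal sketch below, two plain dictionaries stand in for the memcached and mysql clusters; the function name and the use of a hashable tuple as the query condition are assumptions, and no real memcached or MySQL client API is used.

```python
def lookup_query_result(condition, cache, database):
    """Return the result for `condition`, or None if neither store has it.

    `cache` stands in for the memcached cluster and `database` for the
    mysql cluster; both are plain dicts in this sketch.
    """
    if condition in cache:
        return cache[condition]        # cache hit: answer from the cache
    result = database.get(condition)   # cache miss: fall back to the database
    if result is not None:
        cache[condition] = result      # populate the cache for later requests
    return result
```

Misses in both stores are not cached here, so a library added later becomes visible on the next query.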

S24: The server obtains the list of available voice libraries from the query results.

For example, referring to Fig. 3, the physical machine room obtains the query results from the memcached cluster and can also obtain from BOS cloud storage the storage address of each available voice library as link information. For each available voice library, its information can then include the query result (for example the suitable speech synthesis engine version, domain, and featured voice) and the link information. The list of available voice libraries is then composed of the information of multiple available voice libraries.

S25: The server sends the list of available voice libraries to the SDK.

S26: The SDK obtains the voice library that the user selects from the list of available voice libraries, and downloads the selected voice library from the server.

For example, after obtaining the list, the SDK presents it to the user, and the user can select an available voice library based on the displayed information.

In addition, referring to Fig. 3, after a featured voice library is stored, the storage module holding it can send its storage address to the server's entry node as link information. The entry node then uses the featured voice library information obtained from mysql together with the link information obtained from the storage module as the information of an available voice library, composes the list of available voice libraries from the information of multiple available voice libraries, and sends the list to the SDK.

When the storage module sends link information to the entry node, it can send the identifier of the featured voice library together with the link information. When featured voice library information is stored in mysql, the identifier of the featured voice library is stored together with the information, so that when the entry node obtains featured voice library information from mysql it obtains both the identifier and the information. The information obtained from mysql can then be associated with the link information obtained from the storage module by the identifier of the featured voice library.
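Associating the two sources by library identifier amounts to a join on that identifier. The sketch below assumes both sources have already been read into dictionaries keyed by identifier; the function name, the field names and the decision to skip libraries that have no link are illustrative assumptions.

```python
def build_available_list(info_by_id, link_by_id):
    """Join library information with link information on the library identifier.

    info_by_id: identifier -> information dict (as read from the information store)
    link_by_id: identifier -> storage address (as reported by the storage module)
    """
    entries = []
    for library_id, info in info_by_id.items():
        link = link_by_id.get(library_id)
        if link is None:
            continue  # assumption: a library without a stored file is not offered
        entries.append({"id": library_id, **info, "link": link})
    return entries
```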

After the user selects the information of an available voice library, the SDK downloads the selected voice library from the server according to the link information contained in the selected information.

S27: The SDK synthesizes the text into speech using the downloaded voice library.

After obtaining the voice library, the SDK can use it to synthesize the text into speech.

During speech synthesis, different synthesis modes can be used depending on the contents of the downloaded featured voice library.

Optionally, synthesizing text into speech using the downloaded voice library includes:

when the voice library contains an acoustic model and acoustic segments, processing the text, obtaining acoustic parameters from the processed text and the acoustic model, obtaining the corresponding acoustic segments according to the acoustic parameters, and concatenating the obtained acoustic segments to obtain the synthesized speech; or,

when the voice library contains an acoustic model, processing the text, obtaining acoustic parameters from the processed text and the acoustic model, and performing vocoder parameter synthesis according to the acoustic parameters to obtain the synthesized speech; or,

when the voice library contains an acoustic model and specific texts with corresponding audio data, preprocessing the text and, when the voice library contains a specific text consistent with the preprocessed text, obtaining the audio data corresponding to that specific text and using that audio data, or the audio data obtained by decompressing it, as the synthesized speech.
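Choosing a synthesis mode from the contents of the library can be sketched as a small dispatcher. The dictionary keys, the mode names and the order of the checks (a matching recording first, then concatenation, then the vocoder) are assumptions for illustration; the exact-match test also stands in for whatever consistency check a real system would apply.

```python
def choose_synthesis_mode(library, preprocessed_text):
    """Pick a synthesis mode from what the downloaded library contains.

    `library` is a dict with optional keys "acoustic_model",
    "acoustic_segments" and "recordings" (specific text -> audio data).
    """
    recordings = library.get("recordings", {})
    if preprocessed_text in recordings:
        return "recorded"          # play back the stored audio data
    if library.get("acoustic_model") is None:
        raise ValueError("no acoustic model and no matching recording")
    if library.get("acoustic_segments"):
        return "concatenative"     # model plus segments: splice segments
    return "vocoder"               # model only: vocoder parameter synthesis
```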

Taking the case where the voice library is a featured voice library as an example, the details can be as follows:

In some embodiments, referring to Fig. 4, the speech synthesis flow may include:

S41: Preprocess the text to be synthesized.

S42: Perform text analysis on the preprocessed text.

S43: Perform prosody prediction on the analyzed text.

For details of S41 to S43, see the corresponding steps of existing speech synthesis.

S44: Generate acoustic parameters from the text after prosody prediction and the featured acoustic model in the featured voice library.

Unlike the prior art, this embodiment uses a featured acoustic model rather than an existing acoustic model. Once the acoustic model is determined, the acoustic parameters can be generated in the existing way.

S45: Obtain the corresponding acoustic segments from the featured voice library according to the acoustic parameters, and concatenate the obtained acoustic segments to obtain the synthesized speech for the text to be synthesized.

The corresponding acoustic parameters can also be created when the acoustic segments are created, and then stored in the featured voice library together with the acoustic segments, so that the corresponding acoustic segments can be found by acoustic parameter during speech synthesis.

After the acoustic segments are obtained, they can be concatenated to obtain the speech corresponding to the text, completing speech synthesis.
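Finding stored segments by acoustic parameter and splicing them can be sketched as a nearest-neighbour lookup followed by concatenation. This is a deliberately simplified illustration: the Euclidean distance, the tuple representation of parameters and the byte-string waveforms are assumptions, and practical concatenative synthesis typically also weighs how well adjacent segments join and smooths the boundaries.

```python
import math


def select_segment(target, stored):
    """Return the waveform whose stored parameters are closest to `target`.

    stored: list of (parameter tuple, waveform bytes) pairs.
    """
    return min(stored, key=lambda item: math.dist(item[0], target))[1]


def concatenate_segments(targets, stored):
    """Select one segment per target parameter vector and splice them in order."""
    return b"".join(select_segment(target, stored) for target in targets)
```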

In some embodiments, referring to Fig. 5, the speech synthesis flow may include:

S51: Preprocess the text to be synthesized.

S52: Perform text analysis on the preprocessed text.

S53: Perform prosody prediction on the analyzed text.

For details of S51 to S53, see the corresponding steps of existing speech synthesis.

S54: Generate acoustic parameters from the text after prosody prediction and the featured acoustic model in the featured voice library.

S55: Perform vocoder parameter synthesis according to the acoustic parameters to obtain the synthesized speech for the text to be synthesized.

A vocoder is a device that can generate sound from acoustic parameters, so it can be used to output the synthesized speech.

In some embodiments, referring to Fig. 6, the speech synthesis flow may include:

S61: Preprocess the text to be synthesized.

S62: Determine whether the featured voice library contains audio data corresponding to the text to be synthesized; if so, perform S63; otherwise, perform S64.

When specific texts and their corresponding audio data are stored in the featured voice library, a lookup can determine whether the library contains a specific text consistent with the text to be synthesized.

It is understood that different speakers may read the same text content in different ways, so the specific text and the content of its corresponding audio data may be fully identical or consistent within a tolerance. For example, for the specific text "Traffic light ahead, please obey the traffic rules", different people may express it differently; the audio recorded in some voice libraries may correspondingly say something like "Be careful, traffic light coming right up, running a red light gets you fined!"

S63: Obtain the corresponding audio data.

For example, when the featured voice library contains a specific text consistent with the text to be synthesized, the audio data corresponding to that specific text can be obtained.

After the audio data is obtained, it can be used as the final synthesized speech. Alternatively, if the audio data corresponding to the specific text was compressed before being stored in the featured voice library, the audio data obtained from the library can be decompressed, and the decompressed audio data is used as the synthesized speech.
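Storing recordings with optional compression and decompressing them on retrieval (S62 and S63) can be sketched as follows. The function names and the per-entry compression flag are hypothetical, and `zlib` is used only as a generic lossless stand-in: the text does not say which compression is applied, and a real system would more likely use an audio codec.

```python
import zlib


def store_recording(library, text, audio, compress=True):
    """Store audio for a specific text, remembering whether it was compressed."""
    payload = zlib.compress(audio) if compress else audio
    library[text] = (compress, payload)


def fetch_recording(library, text):
    """Return the audio for `text`, decompressing if needed; None if absent."""
    entry = library.get(text)
    if entry is None:
        return None  # S62 "no" branch: the caller continues with S64
    compressed, payload = entry
    return zlib.decompress(payload) if compressed else payload
```

Returning `None` on a miss lets the caller fall through to text analysis and model-based synthesis.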

S64: Perform text analysis on the preprocessed text.

S65: Perform prosody prediction on the analyzed text.

For details of S61, S64 and S65, see the corresponding steps of existing speech synthesis.

S66: Generate acoustic parameters from the text after prosody prediction and the featured acoustic model in the featured voice library.

S67: Perform vocoder parameter synthesis according to the acoustic parameters to obtain the synthesized speech for the text to be synthesized.

In some embodiments, referring to Fig. 7, the speech synthesis flow may include:

S71: Preprocess the text to be synthesized.

S72: Determine whether the featured voice library contains audio data corresponding to the text to be synthesized; if so, perform S73; otherwise, perform S74.

S73: Obtain the corresponding audio data.

S74: Perform text analysis on the preprocessed text.

S75: Perform prosody prediction on the analyzed text.

For details of S71, S74 and S75, see the corresponding steps of existing speech synthesis.

S76: Generate acoustic parameters from the text after prosody prediction and the featured acoustic model in the featured voice library.

S77: Obtain the corresponding acoustic segments from the featured voice library according to the acoustic parameters, and concatenate the obtained acoustic segments to obtain the synthesized speech for the text to be synthesized.

In this embodiment, the voice library is downloaded from the server at synthesis time rather than bundled in the APP, which reduces the size of the APP. In addition, compared with bundling voice libraries in the APP, the server can store more voice libraries, so downloading from the server offers users more choices. Because the available voice libraries include featured voice libraries, users' personalized needs can be met and the user experience improved. By creating featured voice libraries in different ways and performing speech synthesis from them in different modes, the needs of different scenarios can be met and variety achieved.

Fig. 8 is a schematic structural diagram of a speech synthesis system proposed by another embodiment of the present invention. The system includes a client device 81, and the client device 81 includes:

a query module 811, configured to query the server for a list of available voice libraries when speech synthesis is needed, where the list includes information on multiple available voice libraries and the available voice libraries include featured voice libraries;

an acquisition module 812, configured to obtain the voice library that the user selects from the list of available voice libraries, and to download the selected voice library from the server;

After the user selects the information of an available voice library, the SDK can determine the corresponding voice library and download it from the server according to the selected information. For example, the information of an available voice library also includes link information; after the user selects the information of an available voice library, the corresponding voice library can be downloaded according to the link information in the selected information.

a synthesis module 813, configured to synthesize text into speech using the downloaded voice library.

In some embodiments, referring to Fig. 9, the system further includes a server device 82. The server device includes a creation module 821 for creating featured voice libraries, and the creation module 821 is specifically configured to create a featured voice library in the ways described for S21 in the method embodiment.

In some embodiments, the creation module 821 builds a featured acoustic model in the ways described in the method embodiment.

In some embodiments, obtaining acoustic segments may include:

segmenting the training samples to obtain acoustic segments.

In some embodiments, the creation module 821 is configured to obtain audio data corresponding to a specific text, including:

selecting the specific text to be read aloud;

obtaining a specific speaker's reading of the specific text;

In some embodiments, referring to Fig. 9, the system further includes a first cluster system 822 and a second cluster system 823 located at the server, and the query module 811 is specifically configured to:

Send a query request containing a query condition to the server, so that the server obtains a query result according to the query condition, wherein, when the query result exists in the first cluster system, the query result is obtained from the first cluster system, or, when the query result does not exist in the first cluster system, the query result is obtained from the second cluster system and the obtained query result is cached in the first cluster system;

Receive the available sound library list sent by the server, the available sound library list being obtained by the server according to the query result.

The first cluster system here corresponds to the memcached cluster in the method embodiment.
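The lookup order described for the query module follows a cache-aside pattern. The following is a minimal sketch only, assuming that plain in-memory dictionaries stand in for the first cluster system (memcached) and the second cluster system (mysql); the class name `LibraryQueryService`, the counter `backing_reads`, and the string query conditions are hypothetical and do not come from the embodiment.

```python
from typing import Dict, List, Optional


class LibraryQueryService:
    """Cache-aside lookup: try the first cluster, fall back to the second."""

    def __init__(self, backing_store: Dict[str, List[str]]) -> None:
        # Stand-in for the first cluster system (a cache such as memcached).
        self.cache: Dict[str, List[str]] = {}
        # Stand-in for the second cluster system (a database such as mysql).
        self.backing_store = backing_store
        # Counts how often the second cluster had to be consulted.
        self.backing_reads = 0

    def query(self, condition: str) -> List[str]:
        cached: Optional[List[str]] = self.cache.get(condition)
        if cached is not None:
            # The query result exists in the first cluster system.
            return cached
        # Otherwise read from the second cluster system and cache the result.
        self.backing_reads += 1
        result = self.backing_store.get(condition, [])
        self.cache[condition] = result
        return result


service = LibraryQueryService({"female": ["library_a", "library_b"]})
first = service.query("female")   # miss: read from the second cluster, then cached
second = service.query("female")  # hit: served from the first cluster
```

In this sketch a repeated query with the same condition never reaches the second cluster, which is the effect the caching step is intended to have.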

In some embodiments, referring to Fig. 9, the information of an available sound library includes information generated correspondingly after the available sound library is created, and the system further includes a second cluster system 823 located at the server, the second cluster system 823 being configured to store the information generated correspondingly after the available sound library is created. This information corresponds to the characteristic sound library information in the method embodiment. The characteristic sound library information can be provided to the user as a query result in the subsequent flow, and each query result can serve as the information of one available sound library in the available sound library list.

In some embodiments, referring to Fig. 9, the information of an available sound library includes link information of the available sound library, and the system further includes a storage module 824 located at the server, the storage module 824 being configured to store the generated available sound library and to use the storage address of the available sound library as the link information.

Here, the information generated correspondingly after an available sound library is created corresponds to the characteristic sound library information in the above embodiment. Characteristic sound library information refers to the relevant information generated for a characteristic sound library, for example, creator information, generation time, the applicable version of the offline speech synthesis engine, the domain of the applicable text to be synthesized, whether the voice is a male voice, a female voice or another characteristic voice, sound quality, etc.

Accordingly, the second cluster system here corresponds to the mysql cluster in the method embodiment, and the storage module here corresponds to the BOS cloud storage in the method embodiment.
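The characteristic sound library information listed above could be modeled as a simple record that is filtered by query conditions. This is only an illustrative sketch: the class `LibraryInfo`, its field names, the helper `match`, and the sample values are all hypothetical, and the embodiment does not specify how the information is laid out in the second cluster system.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LibraryInfo:
    """Hypothetical record for the information generated with a sound library."""
    name: str
    creator: str          # creator information
    created_at: str       # generation time, written here as a date string
    engine_version: str   # applicable offline speech synthesis engine version
    text_domain: str      # domain of the text to be synthesized
    voice_type: str       # "male", "female", or another characteristic voice
    quality: str          # sound quality label


def match(records: List[LibraryInfo], **conditions: str) -> List[LibraryInfo]:
    """Return the records whose fields equal every given query condition."""
    return [
        record for record in records
        if all(getattr(record, key) == value for key, value in conditions.items())
    ]


records = [
    LibraryInfo("story_voice", "studio_1", "2015-07-01", "1.0", "novel", "female", "high"),
    LibraryInfo("news_voice", "studio_2", "2015-07-02", "1.0", "news", "male", "high"),
]
female_libraries = match(records, voice_type="female")
```

Each matching record would then correspond to one entry in the available sound library list returned to the client.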

After the user selects the information of an available sound library, the SDK downloads the selected sound library from the server according to the link information in the information selected by the user.
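The download step can be sketched as follows, assuming the selected information is a dictionary with a hypothetical `link` key and that the actual transfer is delegated to a caller-supplied `fetch` function. The function name `download_selected_library`, the example URL, and the fake storage dictionary are illustrative assumptions, not part of the SDK described here.

```python
from typing import Callable, Dict


def download_selected_library(
    selected_info: Dict[str, str],
    fetch: Callable[[str], bytes],
) -> bytes:
    """Download the library whose link information is in the selected entry."""
    link = selected_info.get("link")
    if not link:
        raise ValueError("selected sound library information has no link")
    # The link is the storage address of the library on the server side.
    return fetch(link)


# A fake fetcher standing in for a request to the server-side storage.
fake_storage = {"https://storage.example/libs/story_voice.dat": b"library-bytes"}
data = download_selected_library(
    {"name": "story_voice", "link": "https://storage.example/libs/story_voice.dat"},
    fake_storage.__getitem__,
)
```

Passing the fetcher in as a parameter keeps the sketch independent of any particular network library; a real client would supply a function that performs the request.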

In some embodiments, the synthesis module 813 is specifically configured to:

For the specific content of speech synthesis, refer to Figs. 4-7; it is not repeated here.
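As a rough illustration of how a synthesis module might choose among the synthesis cases recited in the claims (stored voice data for a matching particular text, concatenation of acoustic segments, or vocoder parametric synthesis), consider the sketch below. The class `SoundLibrary`, the functions `preprocess` and `choose_path`, the toy text normalization, and the order in which the cases are checked are all assumptions made for illustration; the actual processing is the one described with reference to Figs. 4-7.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SoundLibrary:
    """Hypothetical view of what a downloaded sound library contains."""
    has_acoustic_model: bool = False
    has_segments: bool = False
    # Particular texts (already normalized) mapped to their stored voice data.
    recorded: Dict[str, bytes] = field(default_factory=dict)


def preprocess(text: str) -> str:
    """Toy normalization; real text processing is far more involved."""
    return " ".join(text.lower().split())


def choose_path(library: SoundLibrary, text: str) -> str:
    """Pick a synthesis path from what the downloaded library contains."""
    if preprocess(text) in library.recorded:
        return "recorded"       # use the stored voice data for this text
    if library.has_acoustic_model and library.has_segments:
        return "concatenative"  # predict parameters, then splice segments
    if library.has_acoustic_model:
        return "vocoder"        # predict parameters, then vocoder synthesis
    raise ValueError("library cannot synthesize this text")


library = SoundLibrary(has_acoustic_model=True, recorded={"good morning": b"\x00\x01"})
path_for_known_text = choose_path(library, "Good   Morning")
path_for_other_text = choose_path(library, "hello")
```

Checking the stored particular texts first means that a library carrying recorded audio falls back to model-based synthesis only for text it has no recording for.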

It should be noted that, in the description of the present invention, the terms "first", "second", etc. are used for descriptive purposes only and are not to be understood as indicating or implying relative importance. In addition, in the description of the present invention, unless otherwise specified, "multiple" means at least two.

Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment or portion of code that includes one or more executable instructions for implementing the steps of a specific logical function or process. The scope of the preferred embodiments of the present invention includes additional implementations in which functions may be performed out of the order shown or discussed, including in a substantially simultaneous manner or in the reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.

It should be understood that the parts of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented with any one or a combination of the following technologies well known in the art: a discrete logic circuit having logic gate circuits for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gate circuits, a programmable gate array (PGA), a field-programmable gate array (FPGA), etc.

Those of ordinary skill in the art can understand that all or part of the steps carried by the methods of the above embodiments can be completed by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, includes one of the steps of the method embodiments or a combination thereof.

In addition, the functional units in the embodiments of the present invention may be integrated in one processing module, or each unit may exist physically alone, or two or more units may be integrated in one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented in the form of a software functional module and is sold or used as an independent product, it may also be stored in a computer-readable storage medium.

The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, etc.

In the description of this specification, a description referring to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic uses of these terms do not necessarily refer to the same embodiment or example. Moreover, the specific features, structures, materials or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.

Although embodiments of the present invention have been shown and described above, it is to be understood that the above embodiments are exemplary and are not to be construed as limiting the present invention; those of ordinary skill in the art may make changes, modifications, substitutions and variations to the above embodiments within the scope of the present invention.

Claims

1. A speech synthesis method, characterized by comprising:

When speech synthesis is needed, querying an available sound library list from a server, the available sound library list including information of multiple available sound libraries, the available sound libraries including a characteristic sound library;

Obtaining the sound library selected by a user according to the available sound library list, and downloading the sound library selected by the user from the server;

Synthesizing text into speech using the downloaded sound library;

Wherein querying the available sound library list from the server comprises:

Sending a query request containing a query condition to the server, so that the server obtains a query result according to the query condition, wherein, when the query result exists in a first cluster system, the query result is obtained from the first cluster system, or, when the query result does not exist in the first cluster system, the query result is obtained from a second cluster system and the obtained query result is cached in the first cluster system; the query result includes a characteristic sound;

Receiving the available sound library list sent by the server, the available sound library list being obtained by the server according to the query result;

The information of an available sound library includes: information generated correspondingly after the available sound library is created, the information being stored in the second cluster system of the server.

2. The method according to claim 1, characterized by further comprising: creating a characteristic sound library, wherein creating the characteristic sound library comprises:

Establishing a characteristic acoustic model and obtaining acoustic segments, the characteristic sound library being composed of the characteristic acoustic model and the acoustic segments; or,

Obtaining voice data corresponding to a particular text, the characteristic sound library being composed of the particular text and the voice data; or,

Establishing a characteristic acoustic model, obtaining acoustic segments, and obtaining voice data corresponding to a particular text, the characteristic sound library being composed of the characteristic acoustic model, the acoustic segments, the particular text and the voice data; or,

Establishing a characteristic acoustic model and obtaining voice data corresponding to a particular text, the characteristic sound library being composed of the characteristic acoustic model, the particular text and the voice data.

3. The method according to claim 2, characterized in that establishing the characteristic acoustic model comprises:

Obtaining an existing acoustic model and characteristic voice data, and performing adaptive training on the existing acoustic model according to the characteristic voice data to establish the characteristic acoustic model.

4. The method according to claim 2, characterized in that obtaining the voice data corresponding to the particular text comprises:

Selecting the particular text to be read aloud;

Obtaining a specific speaker's read-aloud speech of the particular text;

Using the read-aloud speech, or the speech obtained by compressing the read-aloud speech, as the voice data corresponding to the particular text.

5. The method according to any one of claims 1-4, characterized in that the information of an available sound library includes link information of the available sound library, and downloading the sound library selected by the user from the server comprises:

Downloading the corresponding sound library from the server according to the link information, wherein the link information is the storage address at which the available sound library is stored.

6. The method according to any one of claims 1-4, characterized in that synthesizing text into speech using the downloaded sound library comprises:

When the sound library includes an acoustic model and acoustic segments, processing the text, obtaining acoustic parameters according to the processed text and the acoustic model, obtaining the corresponding acoustic segments according to the acoustic parameters, and concatenating the obtained acoustic segments to obtain synthesized speech; or,

When the sound library includes an acoustic model, processing the text, obtaining acoustic parameters according to the processed text and the acoustic model, and performing vocoder parametric synthesis according to the acoustic parameters to obtain synthesized speech; or,

When the sound library includes an acoustic model, particular text and corresponding voice data, preprocessing the text and, when a particular text consistent with the preprocessed text exists in the sound library, obtaining the voice data corresponding to that particular text and using the voice data, or the voice data after decompression, as the synthesized speech.

7. A speech synthesis system, characterized by comprising: a client device, the client device including:

A query module, configured to query an available sound library list from a server when speech synthesis is needed, the available sound library list including information of multiple available sound libraries, the available sound libraries including a characteristic sound library;

An acquisition module, configured to obtain the sound library selected by a user according to the available sound library list, and to download the sound library selected by the user from the server;

A synthesis module, configured to synthesize text into speech using the downloaded sound library;

The system further includes: a first cluster system and a second cluster system located at the server, the query module being specifically configured to:

The information of an available sound library includes: information generated correspondingly after the available sound library is created, the second cluster system being configured to store the information generated correspondingly after the available sound library is created.

8. The system according to claim 7, characterized by further comprising: a server device, the server device including a creation module for creating a characteristic sound library, the creation module being specifically configured to:

9. The system according to claim 8, characterized in that the creation module is configured to establish a characteristic acoustic model, including:

10. The system according to claim 8, characterized in that the creation module is configured to obtain voice data corresponding to a particular text, including:

Selecting the particular text to be read aloud;

Obtaining a specific speaker's read-aloud speech of the particular text;

11. The system according to any one of claims 7-10, characterized in that the information of an available sound library includes link information of the available sound library, and the system further includes: a storage module located at the server, the storage module being configured to store the generated available sound library and to use the storage address of the available sound library as the link information.

12. The system according to any one of claims 7-10, characterized in that the synthesis module is specifically configured to: