CN104916281B - Large-corpus speech library pruning method and system - Google Patents
Large-corpus speech library pruning method and system
- Publication number
- CN104916281B CN104916281B CN201510326068.3A CN201510326068A CN104916281B CN 104916281 B CN104916281 B CN 104916281B CN 201510326068 A CN201510326068 A CN 201510326068A CN 104916281 B CN104916281 B CN 104916281B
- Authority
- CN
- China
- Prior art keywords
- speech library
- selection
- pruning
- unit
- speech unit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Abstract
The invention discloses a large-corpus speech library pruning method and system. The method includes: collecting text data covering multiple domains as auxiliary pruning text; using the auxiliary pruning text to preselect, based on a decision-tree model, the speech units in a large-corpus speech library, obtaining a preselected library; computing a pruning score for each speech unit according to the similarity between the speech units contained in the leaf nodes of the decision-tree model corresponding to the preselected library; and pruning the speech units in the preselected library according to their pruning scores, obtaining the pruned large-corpus speech library. The invention preserves the coverage of the library's speech units while reducing the storage space the library occupies.
Description
Technical field
The present invention relates to the field of speech signal processing, and more particularly to a large-corpus speech library pruning method and system.
Background art
A speech library is a collection of speech data, together with its annotations, built for speech technology research and development. One direct application is speech synthesis, also called text-to-speech (TTS), whose goal is synthesized speech that is clear, intelligible, natural, and expressive. Concatenative synthesis systems built on large-corpus speech libraries achieve this well and have been widely deployed. However, because a large-corpus speech library occupies a great deal of storage, its range of application is severely limited, for example in the field of embedded products. Although techniques such as clustering, coding, and compression can reduce the occupied space, they degrade sound quality and reduce flexibility. Consequently, more and more large-corpus speech library pruning methods have appeared in recent years, aiming to reduce the library's footprint by pruning the speech units it contains.
An existing pruning method first collects a large amount of text from all domains and trains a decision-tree model on the full large-corpus library; the decision-tree model is then used to synthesize the collected text, thereby preselecting the speech units of the library; finally, the preselected units are pruned according to how frequently each unit was used during preselection, discarding the units with low usage frequency.
Because the existing method prunes the preselected units solely by their usage frequency during preselection, it suffers from the variability of the synthesis text: a unit used infrequently with one text may be used frequently when preselection is run with other texts, and some low-frequency units have distinctive characteristics that make them essential to the large-corpus library. Directly discarding low-frequency units is therefore clearly unreasonable, and it also tends to reduce the unit coverage of the library. Pruning by usage frequency alone is thus prone to mistakenly removing units, so that the pruned library loses naturalness when synthesizing other texts and its practical effect declines. For example, in a large-corpus concatenative synthesis system, no suitable unit can be found for splicing at synthesis time, causing the naturalness of the synthesized speech to drop.
Summary of the invention
Embodiments of the present invention provide a large-corpus speech library pruning method and system that preserve the coverage of the library's speech units while reducing the storage space the library occupies.
To achieve the above object, the embodiment of the present invention provides the following technical solutions:
A large-corpus speech library pruning method, including:
collecting text data covering multiple domains as auxiliary pruning text;
using the auxiliary pruning text to preselect, based on a decision-tree model, the speech units in a large-corpus speech library, obtaining a preselected library;
computing a pruning score for each speech unit according to the similarity between the speech units contained in the leaf nodes of the decision-tree model corresponding to the preselected library;
pruning the speech units in the preselected library according to their pruning scores, obtaining the pruned large-corpus speech library.
Preferably, using the auxiliary pruning text to preselect the speech units in the large-corpus speech library based on a decision-tree model, obtaining a preselected library, includes:
training a decision-tree model with all speech units in the large-corpus speech library;
synthesizing the auxiliary pruning text with the decision-tree model, and recording the speech units used during synthesis together with their usage frequencies;
selecting the speech units whose usage frequency exceeds a preselection threshold to form the preselected library.
Preferably, computing the pruning score of each speech unit according to the similarity between the speech units contained in the leaf nodes of the decision-tree model corresponding to the preselected library includes:
training the decision-tree model corresponding to the preselected library with all speech units in the preselected library;
computing the similarity between the speech units contained in each leaf node of that decision-tree model;
counting the frequency with which each speech unit occurs in each leaf node of that decision-tree model;
computing the pruning score of the current speech unit from the frequency with which it occurs in its leaf node and its similarity to the other speech units belonging to the same leaf node.
Preferably, pruning the speech units in the preselected library according to their pruning scores, obtaining the pruned large-corpus speech library, includes:
deleting from the preselected library the speech units whose pruning score exceeds a pruning threshold, obtaining the pruned large-corpus speech library; or
sorting the speech units in the preselected library by pruning score from high to low, and then deleting the higher-scoring units according to a set ratio, obtaining the pruned large-corpus speech library.
Preferably, the method further includes:
collecting text data of a specified domain as domain-specific pruning text;
using the domain-specific pruning text to preselect, based on a decision-tree model, the speech units in the pruned large-corpus speech library, obtaining a specified-domain preselected library;
computing the pruning score of each speech unit according to the similarity between the speech units contained in the leaf nodes of the decision-tree model corresponding to the specified-domain preselected library;
pruning the speech units in the specified-domain preselected library according to their pruning scores, obtaining a specified-domain library.
Preferably, the method further includes:
before computing the pruning scores of the speech units, collecting text data of a specified domain as domain-specific pruning text;
using the domain-specific pruning text to preselect, based on a decision-tree model, the speech units in the preselected library, obtaining a specified-domain preselected library, and executing the subsequent steps with the specified-domain preselected library as the new preselected library.
A large-corpus speech library pruning system, including:
a data collection unit, configured to collect text data covering multiple domains as auxiliary pruning text;
a preselection unit, configured to use the auxiliary pruning text to preselect, based on a decision-tree model, the speech units in a large-corpus speech library, obtaining a preselected library;
a computation unit, configured to compute the pruning score of each speech unit according to the similarity between the speech units contained in the leaf nodes of the decision-tree model corresponding to the preselected library;
a pruning unit, configured to prune the speech units in the preselected library according to their pruning scores, obtaining the pruned large-corpus speech library.
Preferably, the preselection unit includes:
a first training subunit, configured to train a decision-tree model with all speech units in the large-corpus speech library;
a synthesis subunit, configured to synthesize the auxiliary pruning text with the decision-tree model and to record the speech units used during synthesis together with their usage frequencies;
a selection subunit, configured to select the speech units whose usage frequency exceeds a preselection threshold to form the preselected library.
Preferably, the computation unit includes:
a second training subunit, configured to train the decision-tree model corresponding to the preselected library with all speech units in the preselected library;
a similarity computation subunit, configured to compute the similarity between the speech units contained in each leaf node of that decision-tree model;
a counting subunit, configured to count the frequency with which each speech unit occurs in each leaf node of that decision-tree model;
a pruning-score computation subunit, configured to compute the pruning score of the current speech unit from the frequency with which it occurs in its leaf node and its similarity to the other speech units belonging to the same leaf node.
Preferably, the pruning unit is specifically configured to delete from the preselected library the speech units whose pruning score exceeds a pruning threshold, obtaining the pruned large-corpus speech library; or to sort the speech units in the preselected library by pruning score from high to low and then delete the higher-scoring units according to a set ratio, obtaining the pruned large-corpus speech library.
Preferably, the data collection unit is further configured to collect text data of a specified domain as domain-specific pruning text; the preselection unit is further configured to use the domain-specific pruning text to preselect, based on a decision-tree model, the speech units in the pruned large-corpus speech library, obtaining a specified-domain preselected library; the computation unit is further configured to compute the pruning score of each speech unit according to the similarity between the speech units contained in the leaf nodes of the decision-tree model corresponding to the specified-domain preselected library; and the pruning unit is further configured to prune the speech units in the specified-domain preselected library according to their pruning scores, obtaining a specified-domain library.
Preferably, in the system:
the data collection unit is further configured to collect text data of a specified domain as domain-specific pruning text before the computation unit computes the pruning scores of the speech units;
the preselection unit is further configured to use the domain-specific pruning text to preselect, based on a decision-tree model, the speech units in the preselected library, obtaining a specified-domain preselected library, and to pass the specified-domain preselected library to the computation unit as the new preselected library.
With the large-corpus speech library pruning method and system provided by the embodiments of the present invention, the speech units in the large-corpus speech library are preselected based on a decision-tree model to obtain a preselected library; a pruning score is then computed for each speech unit from the similarity between the speech units contained in the leaf nodes of the decision-tree model corresponding to the preselected library; and the speech units in the preselected library are pruned according to their pruning scores to obtain the pruned large-corpus library. In this way, highly similar speech units, that is, the redundant units of the library, can be pruned away, so that the coverage of the library's speech units is preserved while its storage footprint is reduced.
Description of the drawings
To explain the technical solutions of the present invention more clearly, the drawings needed for the embodiments are briefly described below. Obviously, the drawings described below cover only some embodiments of the invention; a person of ordinary skill in the art may obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of a first embodiment of the large-corpus speech library pruning method of the invention;
Fig. 2 is a flow chart of a second embodiment of the large-corpus speech library pruning method of the invention;
Fig. 3 is a flow chart of a third embodiment of the large-corpus speech library pruning method of the invention;
Fig. 4 is a structural schematic diagram of an embodiment of the large-corpus speech library pruning system of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the drawings. Obviously, the described embodiments are only a part of the embodiments of the invention rather than all of them. All other embodiments obtained by a person of ordinary skill in the art from these embodiments without creative effort shall fall within the protection scope of the present invention.
As shown in Fig. 1, a first embodiment of the large-corpus speech library pruning method of the invention includes the following steps:
Step 101: collect text data covering multiple domains as auxiliary pruning text.
In practical applications, the collected text should cover the common words of each domain as fully as possible.
Step 102: use the auxiliary pruning text to preselect, based on a decision-tree model, the speech units in the large-corpus speech library, obtaining a preselected library.
The decision-tree-based preselection of speech units proceeds as follows:
(1) Train a decision-tree model with all speech units in the large-corpus speech library.
The decision-tree model can usually be built from the context-related information of the speech units in the library, using a pre-designed set of context-dependent questions; the specific training process is the same as in the prior art and is not detailed here.
(2) Synthesize the auxiliary pruning text with the decision-tree model, and record the speech units used during synthesis together with their usage frequencies.
After model prediction is run on the auxiliary pruning text with the decision-tree model, the speech units needed to synthesize that text are selected from the large-corpus library and spliced to obtain the synthesized speech, and the number of times each library unit is used, and hence its usage frequency, is recorded. The detailed process is the same as in the prior art and is not described here.
(3) Preselect the speech units of the library according to their usage frequencies and a preset preselection threshold: specifically, select the units whose usage frequency exceeds the threshold to form the preselected library. In other words, prune away the units whose usage frequency is less than or equal to the threshold.
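Step (3) can be sketched as follows; the data layout and the use of relative frequency are assumptions made for illustration, since the patent does not fix them:

```python
from collections import Counter

def preselect(used_units, threshold):
    """Frequency-based preselection sketch of step (3).

    used_units: ids of the library units recorded while synthesizing
    the auxiliary pruning text; threshold: preselection threshold on
    the relative usage frequency. Units whose frequency is less than
    or equal to the threshold are pruned away.
    """
    counts = Counter(used_units)
    total = sum(counts.values())
    return {unit for unit, c in counts.items() if c / total > threshold}
```

For example, a unit accounting for three of four recorded uses survives a threshold of 0.3, while a unit accounting for one of four does not.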
Step 103: compute the pruning score of each speech unit according to the similarity between the speech units contained in the leaf nodes of the decision-tree model corresponding to the preselected library.
First, the decision-tree model corresponding to the preselected library is trained with all speech units in that library; the specific training process is the same as in the prior art and is not described here. Then, taking each leaf node of that decision-tree model as a unit, the similarity of acoustic features between the speech units contained in the leaf node is computed; the acoustic features may be one or more of fundamental frequency, spectrum, and duration. The frequency with which each speech unit occurs in each leaf node of the model is also counted. Finally, the pruning score of each unit is computed from this similarity and from the frequency with which the unit occurs in its leaf node. The detailed process is as follows:
(1) Compute the similarity score between the speech units contained in each leaf node of the decision-tree model. In the calculation formula, Sij denotes the similarity score of the i-th and j-th speech units in the current leaf node, xik and xjk denote the k-th dimensional feature parameters of the i-th and j-th units respectively, vk2 denotes the global variance of the k-th feature dimension, and m denotes the dimensionality of the current unit features; for example, for a 39-dimensional spectral feature, m = 39.
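The equation itself appears as an image in the original publication and does not survive as text. A plausible reconstruction from the surrounding variable definitions, offered as an assumption rather than the patent's verbatim formula, is a variance-normalized squared distance:

```latex
S_{ij} = \sum_{k=1}^{m} \frac{(x_{ik} - x_{jk})^{2}}{v_k^{2}}
```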
As can be seen, the smaller the similarity score, the closer the two speech units are, that is, the higher their similarity.
(2) Count the frequency with which each speech unit occurs in each leaf node of the decision-tree model corresponding to the preselected library.
(3) Compute the pruning score of the current speech unit from the frequency with which it occurs in its leaf node and its similarity to the other speech units belonging to the same leaf node.
The pruning score of a speech unit describes how likely it is that the unit should be pruned: the larger the score, the more likely the current unit is to be pruned; the smaller the score, the less likely. The specific calculation formula is as follows:
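The pruning-score equation is likewise an image lost in extraction. One reconstruction consistent with the variable definitions below and with the stated behavior (smaller similarity score gives a larger pruning score), again an assumption rather than the patent's exact form, is:

```latex
C_{score}(x_i) = \frac{1}{n}\sum_{j \neq i} \frac{f_j}{f_i \, S_{ij}}
```

Under this form, a unit that occurs rarely in its leaf (small fi) yet lies close (small Sij) to frequently used neighbours (large fj) receives a high score and becomes a pruning candidate.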
In the formula, Cscore(xi) denotes the pruning score of the i-th speech unit, fi denotes the frequency with which the i-th unit occurs in the current leaf node, fj denotes the frequency with which the j-th unit occurs in the current leaf node, and n denotes the number of speech units in the current leaf node other than the i-th unit.
As can be seen, the higher the similarity of the current unit to the other units, that is, the smaller the similarity score, the larger the pruning score and the more easily the unit is pruned.
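The per-leaf computation of steps (1) to (3) can be sketched as follows. The concrete formulas (a variance-normalized squared distance for similarity, and a frequency-weighted inverse-distance pruning score) are assumptions standing in for the patent's unreproduced equations, and all names are illustrative:

```python
def similarity(x_i, x_j, var):
    """Assumed similarity score: variance-normalized squared distance.

    A smaller value means the two units are closer, i.e. more similar,
    matching the patent's stated convention.
    """
    return sum((a - b) ** 2 / v for a, b, v in zip(x_i, x_j, var))

def pruning_scores(features, freqs, var):
    """Assumed pruning score of every unit in one leaf node.

    features: per-unit acoustic feature vectors (e.g. spectrum, F0,
    duration); freqs: per-unit occurrence frequencies in this leaf;
    var: global variance of each feature dimension. A unit that sits
    close to frequently occurring neighbours scores high, i.e. it is
    redundant and easy to prune.
    """
    n = len(features)
    scores = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if j == i:
                continue
            d = similarity(features[i], features[j], var)
            # epsilon guards against identical units (distance 0)
            total += freqs[j] / (freqs[i] * max(d, 1e-9))
        scores.append(total / (n - 1))
    return scores
```

For example, a rare unit lying very close to a frequent neighbour receives the highest score in its leaf and would be pruned first.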
Step 104: prune the speech units in the preselected library according to their pruning scores, obtaining the pruned large-corpus speech library.
Specifically, the speech units of the preselected library may be pruned against a preset pruning threshold. In the corresponding decision rule, I(xi) denotes the pruning result for the i-th speech unit in the current leaf node and σ is the pruning-score threshold: I(xi) = 1, meaning the i-th unit is pruned, when its pruning score exceeds σ, and I(xi) = 0, meaning the unit is kept, otherwise.
Alternatively, the speech units of the preselected library are sorted by pruning score from high to low, and the higher-scoring units are then deleted according to a set ratio (for example 8%), obtaining the pruned large-corpus speech library.
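Both pruning strategies of step 104 can be sketched as follows; function and parameter names are illustrative, since the patent specifies only the threshold test and the sorted-ratio deletion:

```python
def prune_by_threshold(scores, sigma):
    """Keep the indices of units whose pruning score is at most sigma."""
    return [i for i, s in enumerate(scores) if s <= sigma]

def prune_by_ratio(scores, ratio):
    """Delete the top `ratio` fraction of units by pruning score."""
    n_drop = int(len(scores) * ratio)
    # indices sorted from highest pruning score to lowest
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    dropped = set(order[:n_drop])
    return [i for i in range(len(scores)) if i not in dropped]
```

With scores [0.1, 5.0, 0.3, 9.0], a threshold of 1.0 keeps units 0 and 2, while a ratio of 25% deletes only the single highest-scoring unit.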
With the large-corpus speech library pruning method provided by this embodiment, the speech units in the large-corpus speech library are preselected based on a decision-tree model to obtain a preselected library; a pruning score is then computed for each speech unit from the similarity between the speech units contained in the leaf nodes of the decision-tree model corresponding to the preselected library; and the speech units of the preselected library are pruned according to their pruning scores to obtain the pruned library. In this way, highly similar speech units, that is, the redundant units of the library, are pruned away, so that the coverage of the library's speech units is preserved while its storage footprint is reduced.
Further, a large amount of text in one or more specific domains may also be collected as domain-specific pruning text. The above process can then be repeated on the pruned large-corpus library using the domain-specific pruning text, yielding a specified-domain library. Alternatively, a second round of preselection can be run on the preselected library using the domain-specific pruning text to obtain a specified-domain preselected library, which is then pruned by the method described above to obtain the specified-domain library.
The two ways of generating a specified-domain library are detailed below.
As shown in Fig. 2, a second embodiment of the large-corpus speech library pruning method of the invention includes the following steps:
Step 201: collect text data covering multiple domains as auxiliary pruning text.
Step 202: use the auxiliary pruning text to preselect, based on a decision-tree model, the speech units in the large-corpus speech library, obtaining a preselected library.
Step 203: compute the pruning score of each speech unit according to the similarity between the speech units contained in the leaf nodes of the decision-tree model corresponding to the preselected library.
Step 204: prune the speech units in the preselected library according to their pruning scores, obtaining the pruned large-corpus speech library.
Steps 201 to 204 are implemented in the same way as the corresponding steps of the embodiment shown in Fig. 1 and are not repeated here.
Step 205: collect text data of a specified domain as domain-specific pruning text.
Specifically, the text data of one or several specific domains may be collected.
Step 206: use the domain-specific pruning text to preselect, based on a decision-tree model, the speech units in the pruned large-corpus library, obtaining a specified-domain preselected library.
The preselection process is similar to that of step 202 and is not detailed here.
Step 207: compute the pruning score of each speech unit according to the similarity between the speech units contained in the leaf nodes of the decision-tree model corresponding to the specified-domain preselected library.
The specific calculation is described in the first method embodiment above.
Step 208: prune the speech units in the specified-domain preselected library according to their pruning scores, obtaining the specified-domain library.
The specific pruning method is described in the first method embodiment above.
As shown in Fig. 3, a third embodiment of the large-corpus speech library pruning method of the invention includes the following steps:
Step 301: collect text data covering multiple domains as auxiliary pruning text.
Step 302: use the auxiliary pruning text to preselect, based on a decision-tree model, the speech units in the large-corpus speech library, obtaining a preselected library.
Step 303: collect text data of a specified domain as domain-specific pruning text.
Step 304: use the domain-specific pruning text to preselect, based on a decision-tree model, the speech units in the preselected library, obtaining a specified-domain preselected library.
Step 305: compute the pruning score of each speech unit according to the similarity between the speech units contained in the leaf nodes of the decision-tree model corresponding to the specified-domain preselected library.
Step 306: prune the speech units in the specified-domain preselected library according to their pruning scores, obtaining the specified-domain library.
Correspondingly, an embodiment of the present invention also provides a large-corpus speech library pruning system; Fig. 4 is a structural schematic diagram of the system embodiment.
In this embodiment, the system includes:
a data collection unit 401, configured to collect text data covering multiple domains as auxiliary pruning text;
a preselection unit 402, configured to use the auxiliary pruning text to preselect, based on a decision-tree model, the speech units in a large-corpus speech library, obtaining a preselected library;
a computation unit 403, configured to compute the pruning score of each speech unit according to the similarity between the speech units contained in the leaf nodes of the decision-tree model corresponding to the preselected library;
a pruning unit 404, configured to prune the speech units in the preselected library according to their pruning scores, obtaining the pruned large-corpus speech library.
One concrete structure of the preselection unit 402 may include the following subunits:
a first training subunit, configured to train a decision-tree model with all speech units in the large-corpus speech library;
a synthesis subunit, configured to synthesize the auxiliary pruning text with the decision-tree model and to record the speech units used during synthesis together with their usage frequencies;
a selection subunit, configured to select the speech units whose usage frequency exceeds a preselection threshold to form the preselected library.
One concrete structure of the computation unit 403 may include the following subunits:
a second training subunit, configured to train the decision-tree model corresponding to the preselected library with all speech units in the preselected library;
a similarity computation subunit, configured to compute the similarity between the speech units contained in each leaf node of that decision-tree model;
a counting subunit, configured to count the frequency with which each speech unit occurs in each leaf node of that decision-tree model;
a pruning-score computation subunit, configured to compute the pruning score of the current speech unit from the frequency with which it occurs in its leaf node and its similarity to the other speech units belonging to the same leaf node.
The specific calculation is described in the method embodiments above and is not repeated here.
The pruning unit 404 may specifically delete from the preselected library the speech units whose pruning score exceeds a pruning threshold, obtaining the pruned large-corpus speech library; or sort the speech units in the preselected library by pruning score from high to low and then delete the higher-scoring units according to a set ratio, obtaining the pruned large-corpus speech library.
With the large-corpus speech library pruning system provided by this embodiment, the speech units in the large-corpus speech library are preselected based on a decision-tree model to obtain a preselected library; a pruning score is then computed for each speech unit from the similarity between the speech units contained in the leaf nodes of the decision-tree model corresponding to the preselected library; and the speech units of the preselected library are pruned according to their pruning scores to obtain the pruned library. In this way, highly similar speech units, that is, the redundant units of the library, are pruned away, so that the coverage of the library's speech units is preserved while its storage footprint is reduced.
Further, the large-corpus speech library pruning system of the invention may also collect a large amount of text in one or more specific domains as domain-specific pruning text, and then repeat the above process on the pruned large-corpus library using that text to obtain a specified-domain library. That is, in another embodiment of the system, the units further have the following functions:
the data collection unit 401 is further configured to collect text data of a specified domain as domain-specific pruning text;
the preselection unit 402 is further configured to use the domain-specific pruning text to preselect, based on a decision-tree model, the speech units in the pruned large-corpus library, obtaining a specified-domain preselected library;
the computation unit 403 is further configured to compute the pruning score of each speech unit according to the similarity between the speech units contained in the leaf nodes of the decision-tree model corresponding to the specified-domain preselected library;
the pruning unit 404 is further configured to prune the speech units in the specified-domain preselected library according to their pruning scores, obtaining the specified-domain library.
Further, the large-corpus speech library pruning system of the invention may also collect a large amount of text in one or more specific domains as domain-specific pruning text, and then run a second round of preselection on the preselected library using that text to obtain a specified-domain preselected library, which is then pruned by the method described above to obtain the specified-domain library. That is, in another embodiment of the system, the units further have the following functions:
the data collection unit 401 is further configured to collect text data of a specified domain as domain-specific pruning text before the computation unit computes the pruning scores of the speech units;
the preselection unit 402 is further configured to use the domain-specific pruning text to preselect, based on a decision-tree model, the speech units in the preselected library, obtaining a specified-domain preselected library, and to pass the specified-domain preselected library to the computation unit as the new preselected library.
The embodiments in this specification are described in a progressive manner; identical or similar parts of the embodiments may be understood with reference to one another, and each embodiment focuses on its differences from the others. The system embodiments in particular are described relatively briefly because they are substantially similar to the method embodiments; for the relevant details, refer to the description of the method embodiments. The system embodiments described above are merely illustrative: the units and modules described as separate components may or may not be physically separate, and some or all of the units and modules may be selected according to actual needs to achieve the purpose of the embodiment. A person of ordinary skill in the art can understand and implement the embodiments without creative effort.
The structure, features, and effects of the present invention have been described in detail above on the basis of the embodiments shown in the drawings. The above is merely a preferred embodiment of the present invention, and the scope of the invention is not limited to what is shown in the drawings: any change made according to the inventive concept shown in the drawings, or any equivalent embodiment modified into an equivalent variation, shall fall within the scope of the present invention so long as it does not go beyond the spirit of the description and the drawings.
Claims (12)
1. A method for trimming a large-corpus sound library, characterized by comprising:
collecting text data covering multiple domains as auxiliary trimming text;
preselecting, based on a decision-tree model, the voice units in the large-corpus sound library using the auxiliary trimming text, to obtain a preselected sound library;
computing a cutting score for each voice unit in the preselected sound library according to the similarity between the voice units contained in the leaf nodes of the decision-tree model corresponding to the preselected sound library;
trimming the voice units in the preselected sound library according to the cutting scores, to obtain the trimmed large-corpus sound library.
2. The method according to claim 1, characterized in that preselecting the voice units in the large-corpus sound library based on a decision-tree model using the auxiliary trimming text to obtain a preselected sound library comprises:
training a decision-tree model using all the voice units in the large-corpus sound library;
performing speech synthesis on the auxiliary trimming text using the decision-tree model, and recording the voice units used during synthesis together with their usage frequencies;
selecting the voice units whose usage frequency exceeds a preselection threshold to form the preselected sound library.
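The preselection step of claim 2 (synthesize the auxiliary text with the decision-tree model, record unit usage, keep the frequently used units) can be sketched as follows. The `synthesize` callable is a hypothetical stand-in for the decision-tree synthesizer, whose real interface the patent does not specify; here it only needs to return the sequence of voice-unit IDs it used for a sentence.

```python
from collections import Counter

def preselect_units(synthesize, aux_sentences, pre_threshold):
    """Synthesize each auxiliary sentence and keep only the voice units
    whose total usage count exceeds pre_threshold."""
    usage = Counter()
    for sentence in aux_sentences:
        # synthesize() is assumed to return the voice-unit IDs chosen
        # by the decision-tree model for this sentence.
        for unit_id in synthesize(sentence):
            usage[unit_id] += 1
    return {u for u, n in usage.items() if n > pre_threshold}

# Toy stand-in synthesizer: treats each character as one voice unit.
fake_synth = lambda s: list(s)
preselected = preselect_units(fake_synth, ["aba", "aca"], pre_threshold=1)
```

With the toy synthesizer, only the unit used more than once ("a") survives preselection; the rarely used units ("b", "c") are dropped before any similarity computation is needed.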
3. The method according to claim 2, characterized in that computing the cutting score of each voice unit in the preselected sound library according to the similarity between the voice units contained in the leaf nodes of the decision-tree model corresponding to the preselected sound library comprises:
training the decision-tree model corresponding to the preselected sound library using all the voice units in the preselected sound library;
computing the similarity between the voice units contained in each leaf node of the decision-tree model corresponding to the preselected sound library;
counting the frequency with which each voice unit occurs in each leaf node of the decision-tree model corresponding to the preselected sound library;
computing the cutting score of the current voice unit according to the frequency with which the voice unit occurs in the leaf node to which it belongs and its similarity to the other voice units in the same leaf node.
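The scoring rule of claim 3 — combining a unit's occurrence frequency in its leaf with its similarity to its leaf-mates — can be illustrated with a minimal sketch. The patent does not give the exact formula; the sketch below assumes cosine similarity over hypothetical acoustic feature vectors and uses mean similarity divided by frequency as one plausible choice, so that rarely used units that closely resemble their leaf-mates get the highest (most prunable) scores.

```python
from math import sqrt

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b)))

def cutting_scores(leaf_units, features, freqs):
    """Cutting score for each voice unit in one decision-tree leaf node.

    A unit that is used rarely (low freq) and is acoustically close to its
    leaf-mates (high mean similarity) is redundant, so it scores high.
    The formula  score = mean_similarity / freq  is an assumption, not the
    patent's own; only the inputs (leaf-level similarity and frequency)
    come from claim 3.
    """
    scores = {}
    for u in leaf_units:
        sims = [cosine(features[u], features[v]) for v in leaf_units if v != u]
        scores[u] = (sum(sims) / len(sims)) / freqs[u]
    return scores

# Three units in one leaf: u1 and u2 share a feature direction, u3 differs.
feats = {"u1": [1, 0], "u2": [1, 0], "u3": [0, 1]}
scores = cutting_scores(["u1", "u2", "u3"], feats, {"u1": 1, "u2": 4, "u3": 2})
```

Here `u1` scores highest: it duplicates `u2` acoustically but is used far less often, which is exactly the kind of redundancy the method targets.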
4. The method according to claim 3, characterized in that trimming the voice units in the preselected sound library according to the cutting scores to obtain the trimmed large-corpus sound library comprises:
deleting the voice units in the preselected sound library whose cutting score exceeds a cutting threshold, to obtain the trimmed large-corpus sound library; or
sorting the voice units in the preselected sound library from high to low by cutting score, and then deleting the voice units with the higher cutting scores according to a set ratio, to obtain the trimmed large-corpus sound library.
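Both deletion variants of claim 4 — threshold-based and ratio-based — can be sketched directly; the function names and toy scores below are illustrative only.

```python
def prune_by_threshold(scores, cut_threshold):
    """Variant 1: delete every unit whose cutting score exceeds the threshold."""
    return sorted(u for u, s in scores.items() if s <= cut_threshold)

def prune_by_ratio(scores, ratio):
    """Variant 2: sort units by cutting score (high to low) and delete the
    top `ratio` fraction of them."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    n_cut = int(len(ranked) * ratio)
    return sorted(ranked[n_cut:])

s = {"a": 0.9, "b": 0.5, "c": 0.1}
survivors_t = prune_by_threshold(s, 0.6)   # drops "a", whose score exceeds 0.6
survivors_r = prune_by_ratio(s, 1 / 3)     # drops the highest-scoring third
```

The ratio variant gives direct control over the final library size, while the threshold variant gives direct control over how much redundancy is tolerated; the patent allows either.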
5. The method according to any one of claims 1 to 4, characterized in that the method further comprises:
collecting text data of a designated domain as domain-specific trimming text;
preselecting, based on a decision-tree model, the voice units in the trimmed large-corpus sound library using the domain-specific trimming text, to obtain a designated-domain preselected sound library;
computing the cutting score of each voice unit according to the similarity between the voice units contained in the leaf nodes of the decision-tree model corresponding to the designated-domain preselected sound library;
trimming the voice units in the designated-domain preselected sound library according to their cutting scores, to obtain the designated-domain sound library.
6. The method according to any one of claims 1 to 4, characterized in that the method further comprises:
before computing the cutting scores of the voice units, collecting text data of a designated domain as domain-specific trimming text;
preselecting, based on a decision-tree model, the voice units in the preselected sound library using the domain-specific trimming text, to obtain a designated-domain preselected sound library, and performing the subsequent steps with the designated-domain preselected sound library as the new preselected sound library.
7. A system for trimming a large-corpus sound library, characterized by comprising:
a data collection unit, configured to collect text data covering multiple domains as auxiliary trimming text;
a preselection unit, configured to preselect, based on a decision-tree model, the voice units in the large-corpus sound library using the auxiliary trimming text, to obtain a preselected sound library;
a computing unit, configured to compute a cutting score for each voice unit in the preselected sound library according to the similarity between the voice units contained in the leaf nodes of the decision-tree model corresponding to the preselected sound library;
a cutting unit, configured to trim the voice units in the preselected sound library according to the cutting scores, to obtain the trimmed large-corpus sound library.
8. The system according to claim 7, characterized in that the preselection unit comprises:
a first training subunit, configured to train a decision-tree model using all the voice units in the large-corpus sound library;
a synthesis subunit, configured to perform speech synthesis on the auxiliary trimming text using the decision-tree model, and to record the voice units used during synthesis together with their usage frequencies;
a selection subunit, configured to select the voice units whose usage frequency exceeds a preselection threshold to form the preselected sound library.
9. The system according to claim 8, characterized in that the computing unit comprises:
a second training subunit, configured to train the decision-tree model corresponding to the preselected sound library using all the voice units in the preselected sound library;
a similarity computation subunit, configured to compute the similarity between the voice units contained in each leaf node of the decision-tree model corresponding to the preselected sound library;
a counting subunit, configured to count the frequency with which each voice unit occurs in each leaf node of the decision-tree model corresponding to the preselected sound library;
a cutting-score computation subunit, configured to compute the cutting score of the current voice unit according to the frequency with which the voice unit occurs in the leaf node to which it belongs and its similarity to the other voice units in the same leaf node.
10. The system according to claim 9, characterized in that:
the cutting unit is specifically configured to delete the voice units in the preselected sound library whose cutting score exceeds a cutting threshold, to obtain the trimmed large-corpus sound library; or to sort the voice units in the preselected sound library from high to low by cutting score and then delete the voice units with the higher cutting scores according to a set ratio, to obtain the trimmed large-corpus sound library.
11. The system according to any one of claims 7 to 10, characterized in that:
the data collection unit is further configured to collect text data of a designated domain as domain-specific trimming text;
the preselection unit is further configured to preselect, based on a decision-tree model, the voice units in the trimmed large-corpus sound library using the domain-specific trimming text, to obtain a designated-domain preselected sound library;
the computing unit is further configured to compute the cutting score of each voice unit according to the similarity between the voice units contained in the leaf nodes of the decision-tree model corresponding to the designated-domain preselected sound library;
the cutting unit is further configured to trim the voice units in the designated-domain preselected sound library according to their cutting scores, to obtain the designated-domain sound library.
12. The system according to any one of claims 7 to 10, characterized in that:
the data collection unit is further configured to collect text data of a designated domain as domain-specific trimming text before the computing unit computes the cutting scores of the voice units;
the preselection unit is further configured to preselect, based on a decision-tree model, the voice units in the preselected sound library using the domain-specific trimming text, to obtain a designated-domain preselected sound library, and to pass the designated-domain preselected sound library to the computing unit as the new preselected sound library.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510326068.3A CN104916281B (en) | 2015-06-12 | 2015-06-12 | Big language material sound library method of cutting out and system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104916281A CN104916281A (en) | 2015-09-16 |
CN104916281B true CN104916281B (en) | 2018-09-21 |
Family
ID=54085310
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510326068.3A Active CN104916281B (en) | 2015-06-12 | 2015-06-12 | Big language material sound library method of cutting out and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104916281B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107492371A (en) * | 2017-07-17 | 2017-12-19 | 广东讯飞启明科技发展有限公司 | A kind of big language material sound storehouse method of cutting out |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1471027A (en) * | 2002-07-25 | 2004-01-28 | 摩托罗拉公司 | Method and apparatus for compressing voice library |
CN1731509A (en) * | 2005-09-02 | 2006-02-08 | 清华大学 | Mobile speech synthesis method |
CN1924994A (en) * | 2005-08-31 | 2007-03-07 | 中国科学院自动化研究所 | Embedded language synthetic method and system |
CN102063897A (en) * | 2010-12-09 | 2011-05-18 | 北京宇音天下科技有限公司 | Sound library compression for embedded type voice synthesis system and use method thereof |
CN102201232A (en) * | 2011-06-01 | 2011-09-28 | 北京宇音天下科技有限公司 | Voice database structure compression used for embedded voice synthesis system and use method thereof |
CN102281196A (en) * | 2011-08-11 | 2011-12-14 | 中兴通讯股份有限公司 | Decision tree generating method and equipment, decision-tree-based message classification method and equipment |
CN102298635A (en) * | 2011-09-13 | 2011-12-28 | 苏州大学 | Method and system for fusing event information |
CN103077704A (en) * | 2010-12-09 | 2013-05-01 | 北京宇音天下科技有限公司 | Voice library compression and use method for embedded voice synthesis system |
CN104103268A (en) * | 2013-04-03 | 2014-10-15 | 中国移动通信集团安徽有限公司 | Corpus processing method, device and voice synthesis system |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR100612843B1 (en) * | 2004-02-28 | 2006-08-14 | 삼성전자주식회사 | Method for compensating probability density function, method and apparatus for speech recognition thereby |
US8886523B2 (en) * | 2010-04-14 | 2014-11-11 | Huawei Technologies Co., Ltd. | Audio decoding based on audio class with control code for post-processing modes |
- 2015-06-12: Application CN201510326068.3A filed in China; granted as CN104916281B (status: Active)
Non-Patent Citations (2)
Title |
---|
"A Decision-Tree-Based Large-Corpus Chinese Speech Synthesis System" (基于决策树的汉语大语料库合成系统); Wang Renhua; Proceedings of the 6th National Conference on Man-Machine Speech Communication; 2001-11-20; full text * |
"A Variable-Length Hierarchical Clustering Method for Speech Corpus Pruning" (语音库裁减的一种不定长递阶聚类方法); Zhang Wei et al.; Chinese Journal of Computers, Vol. 30, No. 11; 2007-11; full text * |
Also Published As
Publication number | Publication date |
---|---|
CN104916281A (en) | 2015-09-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US6535852B2 (en) | Training of text-to-speech systems | |
DE102020205786A1 (en) | VOICE RECOGNITION USING NATURAL LANGUAGE UNDERSTANDING (NLU) RELATED KNOWLEDGE ABOUT DEEP FORWARD NEURAL NETWORKS | |
DE69832393T2 (en) | LANGUAGE RECOGNITION SYSTEM FOR THE DETECTION OF CONTINUOUS AND ISOLATED LANGUAGE | |
DE602005002706T2 (en) | Method and system for the implementation of text-to-speech | |
DE602004012909T2 (en) | A method and apparatus for modeling a speech recognition system and estimating a word error rate based on a text | |
DE112017001830T5 (en) | LANGUAGE IMPROVEMENT AND AUDIO EVENT DETECTION FOR AN ENVIRONMENT WITH NON-STATIONARY NOISE | |
CN1924994B (en) | Embedded language synthetic method and system | |
DE69629763T2 (en) | Method and device for determining triphone hidden markov models (HMM) | |
US11282503B2 (en) | Voice conversion training method and server and computer readable storage medium | |
DE60004420T2 (en) | Recognition of areas of overlapping elements for a concatenative speech synthesis system | |
EP0710378A1 (en) | A method and apparatus for converting text into audible signals using a neural network | |
CN106653056A (en) | Fundamental frequency extraction model based on LSTM recurrent neural network and training method thereof | |
CN104538024A (en) | Speech synthesis method, apparatus and equipment | |
DE60201939T2 (en) | Device for speaker-independent speech recognition, based on a client-server system | |
DE102008040739A1 (en) | Method and system for calculating or determining confidence or confidence scores for syntax trees at all levels | |
CN105893414A (en) | Method and apparatus for screening valid term of a pronunciation lexicon | |
DE69727046T2 (en) | METHOD, DEVICE AND SYSTEM FOR GENERATING SEGMENT PERIODS IN A TEXT-TO-LANGUAGE SYSTEM | |
CN106531157A (en) | Regularization accent adapting method for speech recognition | |
EP1611568A1 (en) | Three-stage word recognition | |
CN106205601B (en) | Determine the method and system of text voice unit | |
JP2008090272A (en) | Using child directed speech to bootstrap model based speech segmentation and recognition system | |
DE112006000322T5 (en) | Audio recognition system for generating response audio using extracted audio data | |
EP1187095B1 (en) | Grapheme-phoneme assignment | |
CN104123857B (en) | A kind of Apparatus and method for realizing personalized some reading | |
CN106297794A (en) | The conversion method of a kind of language and characters and equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right |
Effective date of registration: 20200512 Address after: 200335 room 1966, floor 1, building 8, No. 33, Guangshun Road, Changning District, Shanghai Patentee after: IFLYTEK (Shanghai) Technology Co., Ltd Address before: Wangjiang Road high tech Development Zone Hefei city Anhui province 230088 No. 666 Patentee before: IFLYTEK Co.,Ltd. |
TR01 | Transfer of patent right |