CN104916281B - Large-corpus speech database pruning method and system - Google Patents


Info

Publication number
CN104916281B
CN104916281B (application CN201510326068.3A)
Authority
CN
China
Prior art keywords
speech database
selection
pruning
unit
speech unit
Prior art date
Legal status
Active
Application number
CN201510326068.3A
Other languages
Chinese (zh)
Other versions
CN104916281A (en)
Inventor
陈彬彬
高毅
于振华
王影
Current Assignee
Iflytek Shanghai Technology Co Ltd
Original Assignee
iFlytek Co Ltd
Priority date
Filing date
Publication date
Application filed by iFlytek Co Ltd
Priority to CN201510326068.3A
Publication of CN104916281A
Application granted
Publication of CN104916281B
Legal status: Active
Anticipated expiration


Abstract

The invention discloses a large-corpus speech database pruning method and system. The method includes: collecting text data covering multiple domains as auxiliary pruning text; using the auxiliary pruning text to preselect the speech units in the large-corpus database on the basis of a decision-tree model, obtaining a preselected database; computing a pruning score for each speech unit from the similarities between the speech units contained in each leaf node of the decision-tree model corresponding to the preselected database; and pruning the speech units in the preselected database according to their pruning scores, obtaining the pruned large-corpus database. The invention reduces the storage space the database occupies while preserving the coverage of its speech units.

Description

Large-corpus speech database pruning method and system
Technical field
The present invention relates to the field of speech signal processing, and in particular to a large-corpus speech database pruning method and system.
Background technology
A speech database is a collection of speech data and its annotations built for speech-technology research and development. One of its direct applications is speech synthesis, i.e., text-to-speech (TTS), whose goal is synthesized speech that is clear, intelligible, natural, and expressive. Concatenative synthesis systems built on large-corpus speech databases achieve this goal well and are widely deployed. However, because a large-corpus database occupies a great deal of storage, its application is severely limited in fields such as embedded products. Although clustering, coding, compression, and similar techniques can reduce the occupied space, they damage sound quality and reduce flexibility. In recent years, therefore, more and more large-corpus database pruning methods have appeared, which aim to reduce the database's footprint by pruning the speech units it contains.
Existing pruning methods first collect a large amount of text from all domains and train a decision-tree model on the large-corpus database; the decision-tree model is then used to synthesize the collected text, preselecting the speech units in the database; finally, the preselected units are pruned according to how frequently each unit was used during preselection, with the least frequently used units removed.
Because existing methods prune the preselected units solely by their usage frequency during preselection, and synthesis texts vary, a unit used infrequently on one text may be used frequently when preselection is run on another. Moreover, some infrequently used units have distinctive characteristics and are indispensable to the large-corpus database. Directly removing low-frequency units is therefore clearly unreasonable and easily reduces the unit coverage of the database. Pruning by usage frequency alone is thus prone to mistaken deletions, which degrade naturalness when the pruned database is used to synthesize other texts and reduce its practical effectiveness: in a large-corpus concatenative synthesis system, no suitable unit can be found for splicing at synthesis time, so the naturalness of the synthesized speech declines.
Summary of the invention
Embodiments of the present invention provide a large-corpus speech database pruning method and system that preserve the coverage of the database's speech units while reducing the space it occupies.
To achieve the above object, embodiments of the present invention provide the following technical solutions:
A large-corpus speech database pruning method, including:
collecting text data covering multiple domains as auxiliary pruning text;
preselecting the speech units in the large-corpus database on the basis of a decision-tree model using the auxiliary pruning text, obtaining a preselected database;
computing a pruning score for each speech unit from the similarities between the speech units contained in each leaf node of the decision-tree model corresponding to the preselected database;
pruning the speech units in the preselected database according to their pruning scores, obtaining the pruned large-corpus database.
Preferably, preselecting the speech units in the large-corpus database on the basis of a decision-tree model using the auxiliary pruning text to obtain a preselected database includes:
training a decision-tree model on all the speech units in the large-corpus database;
synthesizing the auxiliary pruning text with the decision-tree model and recording the speech units used during synthesis together with their usage frequencies;
selecting the units whose usage frequency exceeds a preselection threshold to form the preselected database.
Preferably, computing the pruning scores from the similarities between the speech units contained in each leaf node of the decision-tree model corresponding to the preselected database includes:
training the decision-tree model corresponding to the preselected database on all the speech units in the preselected database;
computing the similarity between the speech units contained in each leaf node of that decision-tree model;
counting the frequency with which each speech unit occurs in each leaf node;
computing each unit's pruning score from the frequency with which it occurs in its leaf node and its similarity to the other units belonging to the same leaf node.
Preferably, pruning the speech units in the preselected database according to their pruning scores to obtain the pruned large-corpus database includes:
deleting the units in the preselected database whose pruning score exceeds a pruning threshold, obtaining the pruned large-corpus database; or
ranking the units in the preselected database by pruning score from high to low and then deleting a set proportion of the highest-scoring units, obtaining the pruned large-corpus database.
Preferably, the method further includes:
collecting text data from a specified domain as domain-specific pruning text;
preselecting the speech units in the pruned large-corpus database on the basis of a decision-tree model using the domain-specific pruning text, obtaining a specified-domain preselected database;
computing pruning scores from the similarities between the speech units contained in each leaf node of the decision-tree model corresponding to the specified-domain preselected database;
pruning the speech units in the specified-domain preselected database according to their pruning scores, obtaining a specified-domain database.
Preferably, the method further includes:
before the pruning scores are computed, collecting text data from a specified domain as domain-specific pruning text;
preselecting the speech units in the preselected database on the basis of a decision-tree model using the domain-specific pruning text, obtaining a specified-domain preselected database, and executing the subsequent steps with the specified-domain preselected database as the new preselected database.
A large-corpus speech database pruning system, including:
a data collection unit for collecting text data covering multiple domains as auxiliary pruning text;
a preselection unit for preselecting the speech units in the large-corpus database on the basis of a decision-tree model using the auxiliary pruning text, obtaining a preselected database;
a computation unit for computing pruning scores from the similarities between the speech units contained in each leaf node of the decision-tree model corresponding to the preselected database;
a pruning unit for pruning the speech units in the preselected database according to their pruning scores, obtaining the pruned large-corpus database.
Preferably, the preselection unit includes:
a first training subunit for training a decision-tree model on all the speech units in the large-corpus database;
a synthesis subunit for synthesizing the auxiliary pruning text with the decision-tree model and recording the speech units used during synthesis together with their usage frequencies;
a selection subunit for selecting the units whose usage frequency exceeds a preselection threshold to form the preselected database.
Preferably, the computation unit includes:
a second training subunit for training the decision-tree model corresponding to the preselected database on all the speech units in the preselected database;
a similarity computation subunit for computing the similarity between the speech units contained in each leaf node of that decision-tree model;
a counting subunit for counting the frequency with which each speech unit occurs in each leaf node;
a pruning-score computation subunit for computing each unit's pruning score from the frequency with which it occurs in its leaf node and its similarity to the other units belonging to the same leaf node.
Preferably, the pruning unit is specifically configured to delete the units in the preselected database whose pruning score exceeds a pruning threshold, obtaining the pruned large-corpus database; or to rank the units in the preselected database by pruning score from high to low and then delete a set proportion of the highest-scoring units, obtaining the pruned large-corpus database.
Preferably, the data collection unit is further configured to collect text data from a specified domain as domain-specific pruning text; the preselection unit is further configured to preselect the speech units in the pruned large-corpus database on the basis of a decision-tree model using the domain-specific pruning text, obtaining a specified-domain preselected database; the computation unit is further configured to compute pruning scores from the similarities between the speech units contained in each leaf node of the decision-tree model corresponding to the specified-domain preselected database; and the pruning unit is further configured to prune the speech units in the specified-domain preselected database according to their pruning scores, obtaining a specified-domain database.
Preferably, in the system:
the data collection unit is further configured, before the computation unit computes the pruning scores, to collect text data from a specified domain as domain-specific pruning text;
the preselection unit is further configured to preselect the speech units in the preselected database on the basis of a decision-tree model using the domain-specific pruning text, obtaining a specified-domain preselected database, and to pass the specified-domain preselected database to the computation unit as the new preselected database.
With the large-corpus speech database pruning method and system provided by the embodiments of the present invention, the speech units in the large-corpus database are preselected on the basis of a decision-tree model to obtain a preselected database; pruning scores are then computed from the similarities between the speech units contained in each leaf node of the decision-tree model corresponding to the preselected database; and the units in the preselected database are pruned according to those scores to obtain the pruned large-corpus database. In this way, highly similar speech units, i.e., the redundant units in the large-corpus database, are removed, so the coverage of the database's speech units is preserved while the space the database occupies is reduced.
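The four steps above can be sketched end to end in a toy form. This is an illustrative sketch, not the patent's implementation: the decision-tree model is replaced by a fixed unit-to-leaf assignment, the similarity score is an unnormalized squared distance, and the pruning score follows one reconstruction consistent with the description (neighbor frequency over distance, normalized by the unit's own frequency); all names are made up.

```python
def prune_database(units, usage, leaf_of, features, preselect_thresh, cut_thresh):
    """Toy sketch of the four-step pruning pipeline (all names illustrative)."""
    # Steps 1-2: keep only units used more than preselect_thresh times
    # when synthesizing the auxiliary pruning text.
    pre = [u for u in units if usage[u] > preselect_thresh]

    # Group the preselected units by the decision-tree leaf node they fall in.
    leaves = {}
    for u in pre:
        leaves.setdefault(leaf_of[u], []).append(u)

    # Step 3: pruning score. A unit scores high when its leaf-mates are
    # frequent and acoustically close; its own frequency lowers the score.
    def dist(a, b):
        d = sum((x - y) ** 2 for x, y in zip(features[a], features[b]))
        return max(d, 1e-9)  # guard against identical feature vectors

    score = {}
    for members in leaves.values():
        for u in members:
            others = [v for v in members if v != u]
            score[u] = sum(usage[v] / dist(u, v) for v in others) / usage[u]

    # Step 4: delete every unit whose pruning score exceeds cut_thresh.
    return [u for u in pre if score[u] <= cut_thresh]
```

With four toy units in one leaf, the rarely used near-duplicate of a frequent unit gets the highest score and is the one removed, while the frequent original and the acoustically distant unit survive.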
Description of the drawings
To explain the technical solutions of the present invention more clearly, the accompanying drawings needed in the embodiments are briefly described below. Obviously, the drawings described below illustrate only some embodiments of the invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of the first embodiment of the large-corpus speech database pruning method of the invention;
Fig. 2 is a flow chart of the second embodiment of the method;
Fig. 3 is a flow chart of the third embodiment of the method;
Fig. 4 is a schematic structural diagram of an embodiment of the large-corpus speech database pruning system of the invention.
Detailed description
The technical solutions in the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the invention.
As shown in Fig. 1, the first embodiment of the large-corpus speech database pruning method of the invention includes the following steps:
Step 101: collect text data covering multiple domains as auxiliary pruning text.
In practical applications, the collected text should cover the common vocabulary of each domain as fully as possible.
Step 102: preselect the speech units in the large-corpus database on the basis of a decision-tree model using the auxiliary pruning text, obtaining a preselected database.
The decision-tree-based preselection proceeds as follows:
(1) Train a decision-tree model on all the speech units in the large-corpus database.
The decision-tree model can usually be built from the context-related information of the speech units in the large-corpus database, using a preselected set of context-dependent questions; the training process is the same as in the prior art and is not detailed here.
(2) Synthesize the auxiliary pruning text with the decision-tree model, recording the speech units used during synthesis and their usage frequencies.
After the decision-tree model predicts over the auxiliary pruning text, the speech units required to synthesize it are selected from the large-corpus database and spliced into synthesized speech; the identity and usage frequency of each database unit used are recorded. The detailed process is the same as in the prior art and is not described here.
(3) Preselect the units in the large-corpus database according to their usage frequency and a preset preselection threshold: specifically, select the units whose usage frequency exceeds the threshold to form the preselected database. In other words, the units whose usage frequency is at or below the preselection threshold are pruned away.
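Step (3) amounts to a frequency filter over the sequence of units the synthesizer selected; a minimal sketch (the unit labels are made up):

```python
from collections import Counter

def preselect(selected_units, threshold):
    """Keep the units used more than `threshold` times while synthesizing
    the auxiliary pruning text; the rest are pruned away."""
    usage = Counter(selected_units)
    return {u for u, freq in usage.items() if freq > threshold}
```

For example, if the synthesizer picked the units `['da3', 'shi4', 'da3', 'ni3', 'da3', 'shi4']` and the threshold is 1, only `da3` and `shi4` enter the preselected database.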
Step 103: compute a pruning score for each speech unit from the similarities between the speech units contained in each leaf node of the decision-tree model corresponding to the preselected database.
First, train the decision-tree model corresponding to the preselected database on all the speech units in the preselected database; the training process is the same as in the prior art and is not repeated here. Then, taking each leaf node of that decision-tree model as the unit of computation, compute the similarity of acoustic features between the speech units the leaf node contains; the acoustic features may be one or more of fundamental frequency, spectrum, and duration. Also count the frequency with which each speech unit occurs in each leaf node. Finally, compute each unit's pruning score from these similarities and from the frequency with which the unit occurs in its leaf node. The detailed process is as follows:
(1) Compute the similarity score between the speech units contained in each leaf node of the decision-tree model:

S_ij = Σ_{k=1}^{m} (x_ik - x_jk)² / v_k²

where S_ij is the similarity score between the i-th and j-th speech units in the current leaf node, x_ik and x_jk are the k-th dimensions of the feature vectors of the i-th and j-th units respectively, v_k² is the global variance of the k-th feature dimension, and m is the dimensionality of the unit's current feature (for example, m = 39 for a 39-dimensional spectral feature).

As the formula shows, the smaller the similarity score, the closer the two speech units are, i.e., the higher their similarity.
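The similarity score S_ij described above is a per-dimension variance-normalized squared distance; a direct sketch:

```python
def similarity_score(x_i, x_j, global_var):
    """S_ij over m feature dimensions: sum of squared differences, each
    normalized by the global variance of that dimension. Smaller means
    the two speech units are more similar."""
    return sum((a - b) ** 2 / v for a, b, v in zip(x_i, x_j, global_var))
```

The score is symmetric in the two units and is exactly 0 for identical feature vectors; the variance normalization keeps high-variance dimensions (e.g., fundamental frequency) from dominating low-variance ones.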
(2) Count the frequency with which each speech unit occurs in each leaf node of the decision-tree model corresponding to the preselected database.
(3) Compute each unit's pruning score from the frequency with which it occurs in its leaf node and its similarity to the other units belonging to the same leaf node.
The pruning score of a speech unit describes how likely it is that the unit should be pruned: the larger the score, the more likely the unit is to be pruned; the smaller the score, the less likely. The score is computed from the units' occurrence frequencies and pairwise similarity scores:

C_score(x_i) = (1/f_i) Σ_{j=1}^{n} f_j / S_ij

where C_score(x_i) is the pruning score of the i-th speech unit, f_i is the frequency with which the i-th unit occurs in the current leaf node, f_j is the frequency with which the j-th unit occurs in the current leaf node, and n is the number of speech units in the current leaf node other than the i-th.

As the formula shows, the higher the similarity between the current unit and the other units (i.e., the smaller the similarity scores S_ij), the larger the pruning score and the more readily the unit is pruned.
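A minimal sketch of the pruning-score computation. The exact functional form used here (each leaf-mate's frequency divided by its similarity score to the current unit, summed and normalized by the current unit's own frequency) is an assumption consistent with the behavior described above, not a verbatim transcription of the patent's formula:

```python
def cutting_score(i, leaf_units, freq, sim):
    """Reconstructed pruning score for unit i: frequent, acoustically
    close leaf-mates (small S_ij) raise the score; the unit's own
    frequency lowers it. sim(i, j) returns the similarity score S_ij."""
    neighbours = sum(freq[j] / sim(i, j) for j in leaf_units if j != i)
    return neighbours / freq[i]
```

With a constant similarity score, the formula reduces to the ratio of the leaf-mates' total frequency to the unit's own frequency, which makes the direction of each term easy to check.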
Step 104: prune the speech units in the preselected database according to their pruning scores, obtaining the pruned large-corpus database.
Specifically, the units in the preselected database can be pruned against a preset pruning threshold:

I(x_i) = 1 if C_score(x_i) > σ, otherwise I(x_i) = 0

where I(x_i) is the pruning decision for the i-th speech unit in the current leaf node and σ is the pruning-score threshold; I(x_i) = 1 means the i-th unit is pruned, and I(x_i) = 0 means it is kept.
Alternatively, the units in the preselected database can be ranked by pruning score from high to low, and a set proportion (for example, 8%) of the highest-scoring units deleted, obtaining the pruned large-corpus database.
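Both deletion strategies can be sketched in a few lines (the score values in the example are illustrative):

```python
def prune_by_threshold(scores, sigma):
    """Delete every unit whose pruning score exceeds the threshold sigma,
    i.e. the unit is kept exactly when its score is at or below sigma."""
    return [u for u, s in scores.items() if s <= sigma]

def prune_by_ratio(scores, ratio):
    """Rank units by pruning score from high to low and delete the top
    `ratio` fraction (e.g. 0.08 for 8%)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    kept = set(ranked[int(len(ranked) * ratio):])
    return [u for u in scores if u in kept]
```

The threshold variant gives direct control over redundancy, while the ratio variant gives direct control over the final database size; which is preferable depends on whether the deployment target fixes a quality bar or a storage budget.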
With the large-corpus speech database pruning method provided by this embodiment of the invention, the speech units in the large-corpus database are preselected on the basis of a decision-tree model to obtain a preselected database; pruning scores are then computed from the similarities between the speech units contained in each leaf node of the decision-tree model corresponding to the preselected database; and the units in the preselected database are pruned according to those scores to obtain the pruned large-corpus database. Highly similar speech units, i.e., the redundant units in the large-corpus database, are thereby removed, preserving the coverage of the database's speech units while reducing the space the database occupies.
Further, a large amount of text from one or more specific domains can also be collected as domain-specific pruning text. The process above can then be repeated on the pruned large-corpus database using the domain-specific pruning text, yielding a specified-domain database. Alternatively, a second round of preselection can be run on the preselected database using the domain-specific pruning text, yielding a specified-domain preselected database, which is then pruned by the method described above to obtain the specified-domain database.
The two ways of generating a specified-domain database are described in detail below.
As shown in Fig. 2, the second embodiment of the large-corpus speech database pruning method of the invention includes the following steps:
Step 201: collect text data covering multiple domains as auxiliary pruning text.
Step 202: preselect the speech units in the large-corpus database on the basis of a decision-tree model using the auxiliary pruning text, obtaining a preselected database.
Step 203: compute a pruning score for each speech unit from the similarities between the speech units contained in each leaf node of the decision-tree model corresponding to the preselected database.
Step 204: prune the speech units in the preselected database according to their pruning scores, obtaining the pruned large-corpus database.
Steps 201 to 204 are implemented in the same way as the corresponding steps of the embodiment shown in Fig. 1 and are not repeated here.
Step 205: collect text data from a specified domain as domain-specific pruning text.
Specifically, text data from one particular domain or from several particular domains can be collected.
Step 206: preselect the speech units in the pruned large-corpus database on the basis of a decision-tree model using the domain-specific pruning text, obtaining a specified-domain preselected database.
The preselection process is similar to that of step 202 and is not detailed here.
Step 207: compute pruning scores from the similarities between the speech units contained in each leaf node of the decision-tree model corresponding to the specified-domain preselected database.
For the calculation method, see the description of the first method embodiment above.
Step 208: prune the speech units in the specified-domain preselected database according to their pruning scores, obtaining the specified-domain database.
For the pruning method, see the description of the first method embodiment above.
As shown in Fig. 3, the third embodiment of the large-corpus speech database pruning method of the invention includes the following steps:
Step 301: collect text data covering multiple domains as auxiliary pruning text.
Step 302: preselect the speech units in the large-corpus database on the basis of a decision-tree model using the auxiliary pruning text, obtaining a preselected database.
Step 303: collect text data from a specified domain as domain-specific pruning text.
Step 304: preselect the speech units in the preselected database on the basis of a decision-tree model using the domain-specific pruning text, obtaining a specified-domain preselected database.
Step 305: compute pruning scores from the similarities between the speech units contained in each leaf node of the decision-tree model corresponding to the specified-domain preselected database.
Step 306: prune the speech units in the specified-domain preselected database according to their pruning scores, obtaining the specified-domain database.
Correspondingly, an embodiment of the present invention also provides a large-corpus speech database pruning system; Fig. 4 is a schematic structural diagram of the system embodiment.
In this embodiment, the system includes:
a data collection unit 401 for collecting text data covering multiple domains as auxiliary pruning text;
a preselection unit 402 for preselecting the speech units in the large-corpus database on the basis of a decision-tree model using the auxiliary pruning text, obtaining a preselected database;
a computation unit 403 for computing pruning scores from the similarities between the speech units contained in each leaf node of the decision-tree model corresponding to the preselected database;
a pruning unit 404 for pruning the speech units in the preselected database according to their pruning scores, obtaining the pruned large-corpus database.
One concrete structure of the preselection unit 402 may include the following subunits:
a first training subunit for training a decision-tree model on all the speech units in the large-corpus database;
a synthesis subunit for synthesizing the auxiliary pruning text with the decision-tree model and recording the speech units used during synthesis together with their usage frequencies;
a selection subunit for selecting the units whose usage frequency exceeds a preselection threshold to form the preselected database.
One concrete structure of the computation unit 403 may include the following subunits:
a second training subunit for training the decision-tree model corresponding to the preselected database on all the speech units in the preselected database;
a similarity computation subunit for computing the similarity between the speech units contained in each leaf node of that decision-tree model;
a counting subunit for counting the frequency with which each speech unit occurs in each leaf node;
a pruning-score computation subunit for computing each unit's pruning score from the frequency with which it occurs in its leaf node and its similarity to the other units belonging to the same leaf node. The detailed calculation process is described in the method embodiments above and is not repeated here.
The pruning unit 404 can specifically delete the units in the preselected database whose pruning score exceeds a pruning threshold, obtaining the pruned large-corpus database; or rank the units in the preselected database by pruning score from high to low and then delete a set proportion of the highest-scoring units, obtaining the pruned large-corpus database.
With the large-corpus speech database pruning system provided by this embodiment of the invention, the speech units in the large-corpus database are preselected on the basis of a decision-tree model to obtain a preselected database; pruning scores are then computed from the similarities between the speech units contained in each leaf node of the decision-tree model corresponding to the preselected database; and the units in the preselected database are pruned according to those scores to obtain the pruned large-corpus database. Highly similar speech units, i.e., the redundant units in the large-corpus database, are thereby removed, preserving the coverage of the database's speech units while reducing the space the database occupies.
Further, the large-corpus speech database pruning system of the invention can also collect a large amount of text from one or more specific domains as domain-specific pruning text, and repeat the process above on the pruned large-corpus database with it, obtaining a specified-domain database. That is, in another embodiment of the system, the units above additionally have the following functions:
the data collection unit 401 is further configured to collect text data from a specified domain as domain-specific pruning text;
the preselection unit 402 is further configured to preselect the speech units in the pruned large-corpus database on the basis of a decision-tree model using the domain-specific pruning text, obtaining a specified-domain preselected database;
the computation unit 403 is further configured to compute pruning scores from the similarities between the speech units contained in each leaf node of the decision-tree model corresponding to the specified-domain preselected database;
the pruning unit 404 is further configured to prune the speech units in the specified-domain preselected database according to their pruning scores, obtaining a specified-domain database.
Further, the large-corpus sound library cutting system of the present invention may also collect a large amount of text in one or more specific domains as a specific cutting text, and then perform a second pre-selection on the pre-selection sound library according to the specific cutting text, obtaining a designated-domain pre-selection sound library, which is then cut according to the method described above to obtain the designated-domain sound library. That is, in another embodiment of the system of the present invention, the above units additionally have the following functions:
The data acquisition unit 401 is further configured to collect text data of a designated domain as the specific cutting text before the computing unit calculates the cutting scores of the voice units;
The pre-selection unit 402 is further configured to preselect, based on the decision-tree model and using the specific cutting text, the voice units in the pre-selection sound library, obtaining a designated-domain pre-selection sound library, and to pass the designated-domain pre-selection sound library to the computing unit as the new pre-selection sound library.
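The data flow of this second variant (pre-select on the multi-domain auxiliary text, pre-select again on the domain text, then compute scores and cut once) can be sketched as a driver function. Everything below is an illustrative skeleton with assumed names and toy stand-ins, not the patent's implementation:

```python
def build_domain_library(big_library, auxiliary_text, domain_text,
                         preselect, compute_scores, cut):
    """Two-pass variant: the domain text re-filters the pre-selection
    library before the cutting scores are computed and applied once."""
    pre = preselect(big_library, auxiliary_text)   # first pass: multi-domain auxiliary text
    pre = preselect(pre, domain_text)              # second pass: domain-specific text
    scores = compute_scores(pre)                   # e.g. leaf-node similarity scores
    return cut(pre, scores)                        # single final cut

# Tiny stand-ins just to exercise the data flow (not real TTS components).
toy_preselect = lambda lib, text: [u for u in lib if u in text]
toy_scores = lambda pre: {u: (0.0 if u == "a" else 1.0) for u in pre}
toy_cut = lambda pre, s: [u for u in pre if s[u] < 0.5]
domain_lib = build_domain_library(["a", "b", "c", "d"], "abc", "ab",
                                  toy_preselect, toy_scores, toy_cut)
```

Compared with the first variant, scoring and cutting run only once here, on the already domain-filtered pre-selection library.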
The embodiments in this specification are described in a progressive manner; identical or similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiments are substantially similar to the method embodiments, their description is relatively brief, and for related details reference may be made to the description of the method embodiments. The system embodiments described above are merely illustrative; the units and modules described as separate components may or may not be physically separate. Some or all of the units and modules may be selected according to actual needs to achieve the purpose of the solution of an embodiment. Those of ordinary skill in the art can understand and implement the solution without creative effort.
The structure, features, and effects of the present invention have been described in detail above based on the embodiments shown in the drawings. The above is only a preferred embodiment of the present invention, and the drawings do not limit the scope of practice. Any change made according to the inventive concept shown in the drawings, or any equivalent embodiment of equivalent variation, shall fall within the scope of the present invention as long as it does not go beyond the spirit of the description and the drawings.

Claims (12)

1. A large-corpus sound library cutting method, characterized by comprising:
collecting text data covering multiple domains as an auxiliary cutting text;
preselecting the voice units in a large-corpus sound library based on a decision-tree model using the auxiliary cutting text, obtaining a pre-selection sound library;
calculating a cutting score for each voice unit in the pre-selection sound library according to the similarity between the voice units contained in the leaf nodes of a decision-tree model corresponding to the pre-selection sound library;
cutting the voice units in the pre-selection sound library according to the cutting scores, obtaining the cut large-corpus sound library.
2. The method according to claim 1, characterized in that preselecting the voice units in the large-corpus sound library based on the decision-tree model using the auxiliary cutting text to obtain the pre-selection sound library comprises:
training a decision-tree model using all voice units in the large-corpus sound library;
performing speech synthesis on the auxiliary cutting text using the decision-tree model, and recording the voice units used during the speech synthesis and their usage frequencies;
selecting the voice units whose usage frequency exceeds a pre-selection threshold to generate the pre-selection sound library.
3. The method according to claim 2, characterized in that calculating the cutting score of each voice unit in the pre-selection sound library according to the similarity between the voice units contained in the leaf nodes of the decision-tree model corresponding to the pre-selection sound library comprises:
training the decision-tree model corresponding to the pre-selection sound library using all voice units in the pre-selection sound library;
calculating the similarity between the voice units contained in each leaf node of the decision-tree model corresponding to the pre-selection sound library;
counting the frequency with which each voice unit occurs in each leaf node of the decision-tree model corresponding to the pre-selection sound library;
calculating the cutting score of the current voice unit according to the frequency with which the voice unit occurs in its leaf node and the similarity between the voice unit and the other voice units belonging to the same leaf node.
4. The method according to claim 3, characterized in that cutting the voice units in the pre-selection sound library according to the cutting scores to obtain the cut large-corpus sound library comprises:
deleting, from the pre-selection sound library, the voice units whose cutting score exceeds a cutting threshold, obtaining the cut large-corpus sound library; or
sorting the voice units in the pre-selection sound library by cutting score from high to low, and then deleting the voice units with the higher cutting scores according to a set proportion, obtaining the cut large-corpus sound library.
5. The method according to any one of claims 1 to 4, characterized in that the method further comprises:
collecting text data of a designated domain as a specific cutting text;
preselecting the voice units in the cut large-corpus sound library based on a decision-tree model using the specific cutting text, obtaining a designated-domain pre-selection sound library;
calculating the cutting score of each voice unit according to the similarity between the voice units contained in the leaf nodes of a decision-tree model corresponding to the designated-domain pre-selection sound library;
cutting the voice units in the designated-domain pre-selection sound library according to their cutting scores, obtaining a designated-domain sound library.
6. The method according to any one of claims 1 to 4, characterized in that the method further comprises:
before calculating the cutting scores of the voice units, collecting text data of a designated domain as a specific cutting text;
preselecting the voice units in the pre-selection sound library based on the decision-tree model using the specific cutting text, obtaining a designated-domain pre-selection sound library, and performing the subsequent steps with the designated-domain pre-selection sound library as the new pre-selection sound library.
7. A large-corpus sound library cutting system, characterized by comprising:
a data acquisition unit, configured to collect text data covering multiple domains as an auxiliary cutting text;
a pre-selection unit, configured to preselect the voice units in a large-corpus sound library based on a decision-tree model using the auxiliary cutting text, obtaining a pre-selection sound library;
a computing unit, configured to calculate a cutting score for each voice unit in the pre-selection sound library according to the similarity between the voice units contained in the leaf nodes of a decision-tree model corresponding to the pre-selection sound library;
a cutting unit, configured to cut the voice units in the pre-selection sound library according to the cutting scores, obtaining the cut large-corpus sound library.
8. The system according to claim 7, characterized in that the pre-selection unit comprises:
a first training subunit, configured to train a decision-tree model using all voice units in the large-corpus sound library;
a synthesis subunit, configured to perform speech synthesis on the auxiliary cutting text using the decision-tree model, and to record the voice units used during the speech synthesis and their usage frequencies;
a selection subunit, configured to select the voice units whose usage frequency exceeds a pre-selection threshold to generate the pre-selection sound library.
9. The system according to claim 8, characterized in that the computing unit comprises:
a second training subunit, configured to train the decision-tree model corresponding to the pre-selection sound library using all voice units in the pre-selection sound library;
a similarity calculation subunit, configured to calculate the similarity between the voice units contained in each leaf node of the decision-tree model corresponding to the pre-selection sound library;
a counting subunit, configured to count the frequency with which each voice unit occurs in each leaf node of the decision-tree model corresponding to the pre-selection sound library;
a cutting score calculation subunit, configured to calculate the cutting score of the current voice unit according to the frequency with which the voice unit occurs in its leaf node and the similarity between the voice unit and the other voice units belonging to the same leaf node.
10. The system according to claim 9, characterized in that
the cutting unit is specifically configured to delete, from the pre-selection sound library, the voice units whose cutting score exceeds a cutting threshold, obtaining the cut large-corpus sound library; or to sort the voice units in the pre-selection sound library by cutting score from high to low and then delete the voice units with the higher cutting scores according to a set proportion, obtaining the cut large-corpus sound library.
11. The system according to any one of claims 7 to 10, characterized in that
the data acquisition unit is further configured to collect text data of a designated domain as a specific cutting text;
the pre-selection unit is further configured to preselect the voice units in the cut large-corpus sound library based on the decision-tree model using the specific cutting text, obtaining a designated-domain pre-selection sound library;
the computing unit is further configured to calculate the cutting score of each voice unit according to the similarity between the voice units contained in the leaf nodes of a decision-tree model corresponding to the designated-domain pre-selection sound library;
the cutting unit is further configured to cut the voice units in the designated-domain pre-selection sound library according to their cutting scores, obtaining a designated-domain sound library.
12. The system according to any one of claims 7 to 10, characterized in that
the data acquisition unit is further configured to collect text data of a designated domain as a specific cutting text before the computing unit calculates the cutting scores of the voice units;
the pre-selection unit is further configured to preselect the voice units in the pre-selection sound library based on the decision-tree model using the specific cutting text, obtaining a designated-domain pre-selection sound library, and to pass the designated-domain pre-selection sound library to the computing unit as the new pre-selection sound library.
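The pre-selection step recited in claims 2 and 8 — synthesize the auxiliary text, record which voice units the synthesizer selects, and keep the frequently used ones — can be sketched as below. The `synthesize` callback and all names are assumptions; a real system would drive a unit-selection TTS engine here:

```python
from collections import Counter

def preselect(library, auxiliary_sentences, synthesize, pre_threshold):
    """Synthesize every auxiliary sentence, count which voice units the
    synthesizer selects, and keep the units used more than the threshold."""
    usage = Counter()
    for sentence in auxiliary_sentences:
        usage.update(synthesize(sentence))  # synthesize() returns the unit ids it used
    return [u for u in library if usage[u] > pre_threshold]

# Toy synthesizer: pretend each sentence consumes a fixed set of units.
toy_synthesize = lambda s: [1, 2] if s == "s1" else [1, 3]
pre_library = preselect([1, 2, 3, 4], ["s1", "s2"], toy_synthesize, 1)
```

Units never (or rarely) chosen by the synthesizer on multi-domain text are dropped before any similarity scoring, which keeps the later leaf-node computation small.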
CN201510326068.3A 2015-06-12 2015-06-12 Big language material sound library method of cutting out and system Active CN104916281B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510326068.3A CN104916281B (en) 2015-06-12 2015-06-12 Big language material sound library method of cutting out and system


Publications (2)

Publication Number Publication Date
CN104916281A CN104916281A (en) 2015-09-16
CN104916281B true CN104916281B (en) 2018-09-21

Family

ID=54085310

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510326068.3A Active CN104916281B (en) 2015-06-12 2015-06-12 Big language material sound library method of cutting out and system

Country Status (1)

Country Link
CN (1) CN104916281B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107492371A * 2017-07-17 2017-12-19 Guangdong iFlytek Qiming Technology Development Co., Ltd. Large-corpus sound library cutting method

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1471027A * 2002-07-25 2004-01-28 Motorola Inc. Method and apparatus for compressing a voice library
CN1731509A * 2005-09-02 2006-02-08 Tsinghua University Mobile speech synthesis method
CN1924994A * 2005-08-31 2007-03-07 Institute of Automation, Chinese Academy of Sciences Embedded speech synthesis method and system
CN102063897A * 2010-12-09 2011-05-18 Beijing Yuyin Tianxia Technology Co., Ltd. Sound library compression for an embedded speech synthesis system and method of use thereof
CN102201232A * 2011-06-01 2011-09-28 Beijing Yuyin Tianxia Technology Co., Ltd. Voice database structure compression for an embedded speech synthesis system and method of use thereof
CN102281196A * 2011-08-11 2011-12-14 ZTE Corporation Decision tree generation method and device, and decision-tree-based message classification method and device
CN102298635A * 2011-09-13 2011-12-28 Soochow University Method and system for fusing event information
CN103077704A * 2010-12-09 2013-05-01 Beijing Yuyin Tianxia Technology Co., Ltd. Voice library compression and use method for an embedded speech synthesis system
CN104103268A * 2013-04-03 2014-10-15 China Mobile Group Anhui Co., Ltd. Corpus processing method and device, and speech synthesis system

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR100612843B1 * 2004-02-28 2006-08-14 Samsung Electronics Co., Ltd. Method for compensating probability density function, method and apparatus for speech recognition thereby
US8886523B2 (en) * 2010-04-14 2014-11-11 Huawei Technologies Co., Ltd. Audio decoding based on audio class with control code for post-processing modes

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Wang Renhua, "A decision-tree-based Chinese large-corpus synthesis system", Proceedings of the 6th National Conference on Man-Machine Speech Communication, Nov. 20, 2001, full text *
Zhang Wei et al., "A variable-length hierarchical clustering method for speech library pruning", Chinese Journal of Computers, Vol. 30, No. 11, Nov. 30, 2007, full text *

Also Published As

Publication number Publication date
CN104916281A (en) 2015-09-16

Similar Documents

Publication Publication Date Title
US6535852B2 (en) Training of text-to-speech systems
DE102020205786A1 Speech recognition using natural language understanding (NLU)-related knowledge about deep feedforward neural networks
DE69832393T2 Speech recognition system for recognizing continuous and isolated speech
DE602005002706T2 Method and system for the implementation of text-to-speech
DE602004012909T2 A method and apparatus for modeling a speech recognition system and estimating a word error rate based on a text
DE112017001830T5 Speech enhancement and audio event detection for an environment with non-stationary noise
CN1924994B Embedded speech synthesis method and system
DE69629763T2 (en) Method and device for determining triphone hidden markov models (HMM)
US11282503B2 (en) Voice conversion training method and server and computer readable storage medium
DE60004420T2 (en) Recognition of areas of overlapping elements for a concatenative speech synthesis system
EP0710378A1 (en) A method and apparatus for converting text into audible signals using a neural network
CN106653056A (en) Fundamental frequency extraction model based on LSTM recurrent neural network and training method thereof
CN104538024A (en) Speech synthesis method, apparatus and equipment
DE60201939T2 (en) Device for speaker-independent speech recognition, based on a client-server system
DE102008040739A1 (en) Method and system for calculating or determining confidence or confidence scores for syntax trees at all levels
CN105893414A Method and apparatus for screening valid terms of a pronunciation lexicon
DE69727046T2 Method, device and system for generating segment durations in a text-to-speech system
CN106531157A (en) Regularization accent adapting method for speech recognition
EP1611568A1 (en) Three-stage word recognition
CN106205601B Method and system for determining text voice units
JP2008090272A (en) Using child directed speech to bootstrap model based speech segmentation and recognition system
DE112006000322T5 (en) Audio recognition system for generating response audio using extracted audio data
EP1187095B1 (en) Grapheme-phoneme assignment
CN104123857B Apparatus and method for realizing personalized point reading
CN106297794A Language and text conversion method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20200512

Address after: Room 1966, Floor 1, Building 8, No. 33 Guangshun Road, Changning District, Shanghai 200335

Patentee after: IFLYTEK (Shanghai) Technology Co., Ltd

Address before: No. 666, Wangjiang Road, High-tech Development Zone, Hefei City, Anhui Province 230088

Patentee before: IFLYTEK Co.,Ltd.

TR01 Transfer of patent right