CN106227721B - Chinese Prosodic Hierarchy forecasting system - Google Patents
- Publication number
- CN106227721B (application CN201610642956.0A)
- Authority
- CN
- China
- Prior art keywords
- module
- text
- word
- feature
- model
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
Abstract
The invention discloses a Chinese prosodic hierarchy prediction system. The system includes: a text analysis module that outputs analyzed text data; a text feature parameterization module that outputs parameterized text features; a character-word vector joint training module that receives the analyzed text data produced by the text analysis module and outputs a character-enhanced word vector representation model of the text; a word vector generation module that uses the character-enhanced word vector representation model to output the character-enhanced word vectors of the analyzed text data; a first single-classifier training module that outputs a first mapping model; a second single-classifier training module that outputs a second mapping model; a feature importance ranking module that outputs text parameter features with predetermined classification performance; and a model fusion module that outputs the result of the prosodic hierarchy structure prediction. Embodiments of the present invention improve the accuracy of prosodic hierarchy prediction.
Description
Technical field
Embodiments of the present invention relate to the field of speech synthesis technology for human-computer interaction, and more particularly to a Chinese prosodic hierarchy structure prediction system.
Background art
Accurate description of prosodic hierarchy, and the prediction of prosodic hierarchy structure from text, have always been a key step in speech synthesis; they are an important part of improving the naturalness and expressiveness of synthesized speech and of building harmonious human-computer interaction technology. A prosodic structure model captures the cadence and emphasis of speech, and thereby improves the expressiveness and naturalness of synthesized speech. Prosodic structure modeling and prediction are of great significance to the development of speech synthesis, human-computer interaction, and related fields.
Although much research has been done in this field, prosodic hierarchy prediction still faces several problems that have not been well solved. They are mainly the following:
1. The description of text features is inaccurate. Most existing work describes the text using only a few traditional features such as part of speech and word length; few works consider character vector features in the description of text features, and they account neither for the fact that the same character may have different meanings in different words, nor for the possible influence that the characters inside a word may exert on the word vector under different contexts.
2. The accuracy of a single model's prediction generally falls short of the ideal, so the naturalness of synthesized speech is significantly degraded, which in turn harms the listening experience.
In view of this, the present invention is specifically proposed.
Summary of the invention
To solve the above technical problems, embodiments of the present invention provide a Chinese prosodic hierarchy prediction system that improves the accuracy of prosodic hierarchy prediction.
To achieve the above goal, according to one aspect of the invention, the following technical scheme is provided:
A Chinese prosodic hierarchy prediction system, the prediction system including:
a text analysis module, configured to receive text data to be analyzed and output analyzed text data;
a text feature parameterization module, connected to the text analysis module and configured to receive the analyzed text data and output parameterized text features;
a character-word vector joint training module, connected to the text analysis module and configured to receive the analyzed text data produced by the text analysis module, jointly train a language model based on character vectors and word vectors, and output a character-enhanced word vector representation model of the text;
a word vector generation module, configured to output, based on the analyzed text data produced by the text analysis module and using the character-enhanced word vector representation model, the character-enhanced word vectors of the analyzed text data;
a first single-classifier training module, connected to the text feature parameterization module and configured to train a first mapping model from the parameterized text features output by the text feature parameterization module to the prosodic hierarchy structure;
a second single-classifier training module, connected to the word vector generation module and configured to train a second mapping model from the character-enhanced word vectors output by the word vector generation module to the prosodic hierarchy structure;
a feature importance ranking module, connected to the first single-classifier training module and configured to output the text parameter features with predetermined classification performance;
a model fusion module, connected to the first single-classifier training module, the second single-classifier training module, and the feature importance ranking module, and configured to receive the first mapping model and the second mapping model output by the first and second single-classifier training modules and the text parameter features with predetermined classification performance output by the feature importance ranking module, and to fuse the first single-classifier training module and the second single-classifier training module at the decision level using an ensemble learning approach, so as to output the result of the prosodic hierarchy structure prediction.
Further, the text analysis module is specifically configured to regularize the text data to be analyzed, correct polyphone pronunciation errors, normalize numbers, and output the analyzed text data.
Further, the analyzed text data includes symbolic features and numeric features; the text feature parameterization module is specifically configured to process the symbolic features output by the text analysis module using one-hot representation, and to retain the numeric features output by the text analysis module, so as to output the parameterized text features.
Further, the analyzed text data also includes word segmentation features;
the character-word vector joint training module specifically includes:
a character position extraction module, configured to cluster each character according to the word segmentation features output by the text analysis module and the position in which the character appears within a word, and to extract character position information;
a character context clustering module, configured to cluster characters, based on the character position information extracted by the character position extraction module, according to their different contexts, and to represent the same character with multiple vectors;
a non-decomposable word list building module, configured to build a word list of non-decomposable words based on the analyzed text data output by the text analysis module;
a specific character-word vector joint training module, configured to output the character-enhanced word vector representation of the text according to the word list of non-decomposable words output by the non-decomposable word list building module and the output of the character context clustering module.
Further, the analyzed text data also includes word segmentation features;
the word vector generation module is specifically configured to construct a joint character-word vector model based on the word segmentation features output by the text analysis module and the character-enhanced word vector representation model of the text output by the character-word vector joint training module, combining the semantic information carried by the word and by the characters inside the word, and to output the character-enhanced word vectors of the analyzed text data through the mapping of the joint character-word vector model.
Further, the first single-classifier training module is specifically configured to use a conditional random field method to establish the first mapping model between the parameterized text features and the prosodic hierarchy structure.
Further, the second single-classifier training module is specifically configured to use a bidirectional long short-term memory recurrent neural network to establish the second mapping model between the character-enhanced word vectors and the prosodic hierarchy structure.
Further, the feature importance ranking module specifically includes:
a text feature set extraction module, configured to extract text feature sets from the parameterized text features by enumeration;
an F-score lift computing module, configured to compute, for each feature set extracted by the text feature set extraction module, the lift in F-score on the validation set when that feature set is used as the input of the first single-classifier training module;
a feature importance ranking output module, configured to rank the F-score lift values obtained by the F-score lift computing module and to output the text parameter features with predetermined classification performance.
Further, the model fusion module may specifically include:
a first single-classifier output module, connected to the first single-classifier training module and configured to determine, according to the first mapping model, the first probabilities that the prosodic hierarchy prediction is a pause or a non-pause;
a second single-classifier output module, connected to the second single-classifier training module and configured to determine, according to the second mapping model, the second probabilities that the prosodic hierarchy prediction is a pause or a non-pause;
an important feature generation module, connected to the feature importance ranking module and configured to output the important features of the prosodic hierarchy by computing the contribution of the text parameter features with predetermined classification performance to the F-score;
a fusion prediction module, configured to fuse, using iterated decision trees, the first probabilities output by the first single-classifier output module, the second probabilities output by the second single-classifier output module, and the important features output by the important feature generation module, so as to determine the prediction result of the prosodic boundaries of the prosodic hierarchy structure.
As can be seen from the above technical scheme, in embodiments of the present invention the text analysis module receives the text data to be analyzed and outputs the analyzed text data. The text is then described from two different aspects: the word vectors obtained by the character-word vector joint training module, and the traditional text features obtained by the text feature parameterization module; this allows the text features to be described more finely. The first single-classifier training module is connected to the text feature parameterization module and trains the first mapping model from the parameterized text features output by the text feature parameterization module to the prosodic hierarchy structure; the second single-classifier training module is connected to the word vector generation module and trains the second mapping model from the character-enhanced word vectors output by the word vector generation module to the prosodic hierarchy structure. For different prosodic levels, different text feature combinations and feature window lengths are used, which increases the accuracy of the model prediction. Embodiments of the present invention also generate more accurate character-enhanced word vectors through the character-word vector joint training module, and output text parameter features with predetermined classification performance through the feature importance ranking module connected to the first single-classifier training module, thereby improving the performance of prosody prediction. Finally, the model fusion module introduces the iterated decision tree algorithm from ensemble learning and fuses the outputs of the two single classifiers with the important features produced by the feature importance ranking module, which greatly improves the performance of prosody prediction, so that the naturalness and expressiveness of the synthesized speech are better.
Other features and advantages of the present invention will be set forth in the following description, and will in part become apparent from the description or be understood through the practice of the invention. The objectives and other advantages of the present invention can be realized and obtained by the means specifically pointed out in the written description, the claims, and the accompanying drawings.
Brief description of the drawings
Through the following detailed description in conjunction with the accompanying drawings, the above and other aspects, features, and advantages of the present invention will become more apparent, in which:
Fig. 1 is a structural schematic diagram of the Chinese prosodic hierarchy prediction system according to an exemplary embodiment;
Fig. 2 is a structural schematic diagram of the character-word vector joint training module according to another exemplary embodiment;
Fig. 3 is a structural schematic diagram of the feature importance ranking module according to an exemplary embodiment;
Fig. 4 is a structural schematic diagram of the model fusion module according to an exemplary embodiment.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the present invention clearer, the present invention is described in more detail below in conjunction with specific embodiments and with reference to the accompanying drawings. It should be noted that, in the absence of explicit limitations or conflicts, the embodiments of the invention and the technical features therein may be combined with one another to form technical solutions. In the drawings and the description, similar or identical parts use the same reference numbers, and the drawings may be simplified for convenience of labeling. Implementations not depicted or described in the drawings are in forms known to a person of ordinary skill in the art. In addition, although examples of parameters with particular values may be provided herein, it should be understood that a parameter need not be exactly equal to the corresponding value, but may approximate it within acceptable error tolerances or design constraints.
The general idea of the embodiments of the present invention is to train the word vector model jointly with character vectors, to model with a bidirectional long short-term memory recurrent neural network based on character-enhanced word vectors, and to fuse the results of the different classifiers using ensemble learning. Specifically, through the text analysis and parameterization modules, a traditional feature representation of the text is obtained; for predicting prosodic structures at different levels, the text is described using different text feature combinations and feature window lengths, and the text features are then used as the input of a conditional random field to build the first single classifier. Through the joint character-word training model, a character-enhanced vectorized representation of the text features is built, so that the text features are described by a set of vectorized parameters obtained statistically, which serve as the input of the bidirectional long short-term memory recurrent neural network to build the second single classifier. Finally, the feature importance ranking module generates important features that are beneficial to classification, and these features, together with the outputs of the first and second single classifiers, serve as the input of model fusion, where the iterated decision tree algorithm is used for fusion.
Fig. 1 is a structural schematic diagram of the Chinese prosodic hierarchy prediction system of an embodiment of the present invention. As shown in Fig. 1, the prediction system may include: a text analysis module 1, a text feature parameterization module 2, a character-word vector joint training module 3, a word vector generation module 4, a first single-classifier training module 5, a second single-classifier training module 6, a feature importance ranking module 7, and a model fusion module 8. The text analysis module 1 receives the text data to be analyzed and outputs the analyzed text data. The text feature parameterization module 2 is connected to the text analysis module 1, receives the analyzed text data, and outputs the parameterized text features. The character-word vector joint training module 3 is connected to the text analysis module 1, receives the analyzed text data produced by the text analysis module 1, jointly trains a language model based on character vectors and word vectors, and outputs the character-enhanced word vector representation model of the text. The word vector generation module 4 uses the character-enhanced word vector representation model output by the character-word vector joint training module 3 to output, based on the analyzed text data output by the text analysis module, the character-enhanced word vectors of the analyzed text data. The first single-classifier training module 5 is connected to the text feature parameterization module 2 and trains the first mapping model from the parameterized text features output by the text feature parameterization module 2 to the prosodic hierarchy structure. The second single-classifier training module 6 is connected to the word vector generation module 4 and trains the second mapping model from the character-enhanced word vectors output by the word vector generation module 4 to the prosodic hierarchy structure. The feature importance ranking module 7 is connected to the first single-classifier training module 5 and outputs the text parameter features with predetermined classification performance. The model fusion module 8 is connected to the first single-classifier training module 5, the second single-classifier training module 6, and the feature importance ranking module 7; it receives the first mapping model and the second mapping model output by the first and second single-classifier training modules and the text parameter features with predetermined classification performance output by the feature importance ranking module 7, and fuses the first single-classifier training module 5 and the second single-classifier training module 6 at the decision level using an ensemble learning approach, so as to output the result of the prosodic hierarchy structure prediction.
In the above embodiment, the text analysis module 1 may specifically be configured to regularize the text data to be analyzed, correct polyphone pronunciation errors, normalize numbers, and output the analyzed text data. The analyzed text data includes symbolic features and numeric features, where the symbolic features include word segmentation features.
When the text analysis module 1 regularizes the text data to be analyzed, rule-based methods are used to remove extraneous symbols from the text data. The analyzed text data output by the text analysis module 1 may include, but is not limited to, the word segmentation of the text, the part of speech, the word length, and the number of syllables in the sentence.
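The regularization step above can be sketched with simple rules; the symbol set, the digit-by-digit number reading, and the example sentence are illustrative assumptions, not the patent's actual rule base.

```python
# Hedged sketch of rule-based text regularization: strip extraneous symbols
# and expand digits into Chinese number words (illustrative rules only).
import re

DIGITS = "零一二三四五六七八九"

def normalize_number(match):
    """Read a digit string out digit-by-digit (one simple regularization rule)."""
    return "".join(DIGITS[int(d)] for d in match.group())

def regularize(text):
    text = re.sub(r"[#*@^~]+", "", text)           # remove extraneous symbols
    text = re.sub(r"\d+", normalize_number, text)  # normalize numbers
    return text

print(regularize("今天气温25度##"))  # → 今天气温二五度
```

A production front end would also need context-dependent number reading (e.g. quantities vs. years) and a polyphone disambiguation dictionary, which are omitted here.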
In the above embodiment, the text feature parameterization module 2 may specifically be configured to process the symbolic features output by the text analysis module 1 using one-hot representation, and to retain the numeric features output by the text analysis module 1, thereby outputting the parameterized text features.
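The parameterization just described can be sketched as follows; the part-of-speech tag set and the concrete feature layout are illustrative assumptions.

```python
# Hedged sketch of feature parameterization: symbolic features (e.g. a POS
# tag) become one-hot vectors, numeric features (e.g. word length) pass
# through unchanged. The tag vocabulary below is illustrative.

POS_TAGS = ["n", "v", "a", "d", "p"]  # assumed symbolic feature vocabulary

def parameterize(pos_tag, word_length):
    one_hot = [1.0 if t == pos_tag else 0.0 for t in POS_TAGS]
    return one_hot + [float(word_length)]  # numeric feature retained as-is

print(parameterize("v", 2))  # → [0.0, 1.0, 0.0, 0.0, 0.0, 2.0]
```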
In the above embodiment, when the character-word vector joint training module 3 trains the character-enhanced word vectors, the influence of the character vectors of the characters contained inside a word is considered during word vector training.
Fig. 2 schematically illustrates the structure of the character-word vector joint training module. As shown in Fig. 2, the character-word vector joint training module 3 may specifically include: a character position extraction module 31, a character context clustering module 32, a non-decomposable word list building module 33, and a specific character-word vector joint training module 34.
The character position extraction module 31 clusters each character according to the word segmentation features output by the text analysis module 1 and the position in which the character appears within a word, and extracts the character position information.
As an example, the character position extraction module 31 may cluster each character into three different classes according to whether it appears at the beginning, in the middle, or at the end of a word. Through the character position extraction module 31, the position-aware character-enhanced word vector representation of the text can be:

$$X_j = \frac{1}{2}\left(w_j + \frac{1}{N_j}\left(c_1^{B} + \sum_{k=2}^{N_j-1} c_k^{M} + c_{N_j}^{E}\right)\right)$$

where $c_1^{B}$ denotes the first character of word $x_j$; $c_k^{M}$ denotes the k-th character of word $x_j$, excluding the first and the last character; $c_{N_j}^{E}$ denotes the last character of word $x_j$; $N_j$ denotes the number of characters in word $x_j$; k denotes the character index; and j takes positive integer values.
By providing the character position extraction module 31, the embodiment of the present invention takes into account the position information of the character within the word, thereby eliminating character position ambiguity.
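The begin/middle/end clustering can be sketched as follows; treating a single-character word as word-initial is an assumption made for illustration.

```python
# Hedged sketch of character position extraction: each character of a
# segmented word is assigned one of three position classes (Begin / Middle /
# End), so that position-specific character vectors can be trained.

def position_classes(word):
    """Map each character of a word to its position class B, M, or E."""
    n = len(word)
    if n == 1:
        return [(word, "B")]  # assumption: a lone character counts as word-initial
    return [(ch, "B" if i == 0 else "E" if i == n - 1 else "M")
            for i, ch in enumerate(word)]

print(position_classes("图书馆"))  # → [('图', 'B'), ('书', 'M'), ('馆', 'E')]
```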
The character context clustering module 32 clusters characters, based on the character position information extracted by the character position extraction module 31, according to their different contexts, and represents the same character with multiple vectors.
For example, for a word $x_j = \{c_1, \ldots, c_{N_j}\}$, the context-clustered character-enhanced word vector representation of the text can be:

$$X_j = \frac{1}{2}\left(w_j + \frac{1}{N_j}\sum_{k=1}^{N_j} c_k^{most}\right), \qquad c_k^{most} = \arg\max_{c_k^{u}} S\!\left(c_k^{u}, \frac{1}{2K}\sum_{t=j-K}^{j+K} x_t\right)$$

where $X_j$ denotes the character-enhanced word vector representation of the text; $S(\cdot)$ denotes the cosine similarity function; K denotes the number of context words considered, i.e., the window length, preferably K = 5; $c_k^{most}$ denotes the character vector most frequently selected for word $x_j$ during training; $c_k^{u}$ denotes a candidate character vector selected for word $x_j$ during training; $w_j$ denotes the word vector of word $x_j$; $w_t$ denotes a plain word vector; k denotes the character index; and $x_t$ denotes the character-enhanced word representation used for the context.
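The prototype selection via cosine similarity can be sketched as follows; the toy 3-dimensional vectors and the two-prototype setup are illustrative assumptions.

```python
# Hedged sketch of character context clustering: among several vector
# prototypes of the same character, pick the one most similar (by cosine
# similarity, the S(.) of the formula) to the averaged context vector.
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def select_prototype(prototypes, context_vectors):
    """Return the prototype with the highest cosine similarity to the mean context."""
    k = len(context_vectors)
    mean_ctx = [sum(v[i] for v in context_vectors) / k
                for i in range(len(context_vectors[0]))]
    return max(prototypes, key=lambda p: cosine(p, mean_ctx))

protos = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # two senses of one character
ctx = [[0.9, 0.1, 0.0], [0.8, 0.0, 0.2]]     # window of context vectors
print(select_prototype(protos, ctx))         # → [1.0, 0.0, 0.0]
```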
The non-decomposable word list building module 33 builds a word list of non-decomposable words based on the analyzed text data output by the text analysis module 1.
In practical applications, non-decomposable words are common in Mandarin Chinese, such as "sofa", "chocolate", and "hovering". In these words, the characters inside the word contribute essentially nothing to the semantics of the whole word. Therefore, to eliminate the influence of the characters inside non-decomposable words on the word vectors in Mandarin Chinese, the influence of the characters in the structure of a non-decomposable word must be ignored when training its word vector. When the embodiment of the present invention uses the non-decomposable word list building module 33, the influence of the characters inside the word on the word vector is not considered during training, and a word list of the non-decomposable words is built.
In order to keep the word vector dimensions consistent between decomposable and non-decomposable words, $X_j$ in the formula needs to be multiplied by 1/2.
The specific character-word vector joint training module 34 outputs the character-enhanced word vector representation model of the text according to the word list of non-decomposable words output by the non-decomposable word list building module 33 and the output of the character context clustering module 32.
Specifically, the character-word vector joint training module 34 considers the influence of the character vectors contained inside the word during word vector training; the character-enhanced word vector representation model $X_j$ of word $x_j$ can then be expressed as:

$$X_j = \frac{1}{2}\left(w_j + \frac{1}{N_j}\sum_{k=1}^{N_j} c_k\right)$$

where $w_j$ denotes the word vector of word $x_j$; $N_j$ denotes the number of characters in word $x_j$; $c_k$ denotes the character vector of the k-th character of word $x_j$; and k denotes the character index.
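The averaging formula, together with the non-decomposable word list, can be sketched as follows; the toy 2-dimensional vectors and the two-entry word list are illustrative assumptions, and returning the bare word vector for non-decomposable words is one simple reading of "ignoring the inner characters".

```python
# Hedged sketch of the character-enhanced word vector
# X_j = 1/2 * (w_j + (1/N_j) * sum_k c_k): average the character vectors,
# add the word vector, halve; words on the non-decomposable list skip the
# character term entirely.

NON_DECOMPOSABLE = {"沙发", "巧克力"}  # illustrative non-decomposable word list

def enhanced_word_vector(word, word_vec, char_vecs):
    if word in NON_DECOMPOSABLE:
        return word_vec  # ignore the characters inside the word entirely
    n = len(char_vecs)
    char_mean = [sum(c[i] for c in char_vecs) / n for i in range(len(word_vec))]
    return [0.5 * (w + m) for w, m in zip(word_vec, char_mean)]

print(enhanced_word_vector("词典", [1.0, 1.0], [[0.5, 0.0], [0.5, 1.0]]))
# → [0.75, 0.75]
```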
Through the character-word vector joint training module, the embodiment of the present invention can produce more accurate character-enhanced word vectors. The character-word vector joint training module considers the influence of the characters inside a word on the word vector. Moreover, for the character vectors, it takes into account factors such as the different positions a character occupies within different words and the different contexts in which a word occurs, so that one character is represented by different vectors, and applies this in the joint character-word training model. In addition, for the words in Chinese that cannot be decomposed, the influence of the characters inside the word on the word vector is not considered during training.
In the embodiments of the present invention, the trained character-enhanced word vector representation model of the text obtained by the character-word vector joint training module and the traditional parameterized text features obtained by the text feature parameterization module describe the text data to be analyzed from two different aspects, so that text features can be described more finely.
In the above embodiment, the word vector generation module 4 may specifically be configured to construct the joint character-word vector model based on the word segmentation features output by the text analysis module 1 and the character-enhanced word vector representation model of the text output by the character-word vector joint training module 3, combining the semantic information carried by the word and by the characters inside the word, and to output the character-enhanced word vectors of the analyzed text data through the mapping of the joint character-word vector model.
During training, the word vector generation module 4 comprehensively considers the semantic information carried by the word and by the characters inside the word. Once the trained joint character-word vector model is obtained, the word vector description of the input text can be obtained through the mapping of the model.
In the above embodiment, the first single-classifier training module 5 may specifically be configured to use a conditional random field method to establish the first mapping model between the parameterized text features and the prosodic hierarchy structure.
The first mapping model reflects the probability that each word is or is not followed by a pause at the prosodic level in question. Here, the prosodic hierarchy structure may include prosodic words, prosodic phrases, and intonation phrases.
In the prosodic structure prediction based on the first single-classifier training module 5, different text feature combinations and feature window lengths are adopted for different prosodic levels, which increases the accuracy of the model prediction.
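The per-level choice of window length can be sketched as feature extraction for a linear-chain CRF; the window sizes, feature names, and feature-dictionary format are illustrative assumptions, not the patent's tuned configuration.

```python
# Hedged sketch of building CRF input features with per-level window lengths:
# for each word position, collect the features of neighbouring words inside a
# window whose length depends on the prosodic level (sizes are illustrative).

WINDOW = {"prosodic_word": 1, "prosodic_phrase": 2, "intonation_phrase": 3}

def window_features(words, pos_tags, i, level):
    """Feature dict for position i, e.g. fed to a linear-chain CRF toolkit."""
    w = WINDOW[level]
    feats = {"bias": 1.0}
    for off in range(-w, w + 1):
        j = i + off
        if 0 <= j < len(words):
            feats[f"word[{off}]"] = words[j]
            feats[f"pos[{off}]"] = pos_tags[j]
    return feats

f = window_features(["我们", "去", "图书馆"], ["r", "v", "n"], 1, "prosodic_word")
print(sorted(f))
# → ['bias', 'pos[-1]', 'pos[0]', 'pos[1]', 'word[-1]', 'word[0]', 'word[1]']
```

Larger windows for higher prosodic levels let the classifier see the broader context that intonation-phrase boundaries depend on.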
In the above embodiment, the second single-classifier training module 6 may specifically be configured to use a bidirectional long short-term memory recurrent neural network to establish the second mapping model between the character-enhanced word vectors and the prosodic hierarchy structure.
The second mapping model reflects the probability that each word is or is not followed by a pause at the prosodic level in question.
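The recurrent building block can be sketched with scalar weights so the gate arithmetic is visible; hidden size 1, the shared toy weights, and the example sequence are illustrative assumptions, while a real model is vectorized and trained.

```python
# Hedged sketch of one LSTM cell step and a bidirectional pass over a short
# sequence, the building block behind the second mapping model.
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def lstm_step(x, h_prev, c_prev, W):
    """Input, forget, and output gates plus candidate, all from x and h_prev."""
    i = sigmoid(W["i"][0] * x + W["i"][1] * h_prev + W["i"][2])
    f = sigmoid(W["f"][0] * x + W["f"][1] * h_prev + W["f"][2])
    o = sigmoid(W["o"][0] * x + W["o"][1] * h_prev + W["o"][2])
    g = math.tanh(W["g"][0] * x + W["g"][1] * h_prev + W["g"][2])
    c = f * c_prev + i * g                 # new cell state
    return math.tanh(c) * o, c             # new hidden state, cell state

W = {k: (1.0, 0.5, 0.0) for k in "ifog"}   # illustrative shared toy weights

def run(seq, reverse=False):
    h = c = 0.0
    out = []
    for x in (reversed(seq) if reverse else seq):
        h, c = lstm_step(x, h, c, W)
        out.append(h)
    return out[::-1] if reverse else out    # align outputs to original order

seq = [0.2, -0.4, 0.9]
forward, backward = run(seq), run(seq, reverse=True)
bidir = list(zip(forward, backward))        # per-word bidirectional states
print(all(-1.0 < f < 1.0 and -1.0 < b < 1.0 for f, b in bidir))  # → True
```

The concatenated forward/backward states per word would feed a softmax over pause vs. non-pause in a full model.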
Fig. 3 schematically illustrates the structure of the feature importance ranking module. As shown in Fig. 3, the feature importance ranking module 7 may specifically include: a text feature set extraction module 71, an F-Score lift computing module 72 and a feature importance ranking output module 73. The text feature set extraction module 71 extracts a text feature set from the parameterized text features by enumeration; that is, it enumerates the possible feature combinations that serve as the input of the feature importance ranking module 7. The F-Score lift computing module 72 computes, for each feature extracted by the text feature set extraction module 71, the lift in F-Score obtained on a validation set when that feature is used as input to the first single-classifier training module 5. The feature importance ranking output module 73 ranks the F-Score lift values obtained by the F-Score lift computing module 72 and outputs the parameterized text features with a predetermined classification performance.
The text features with the predetermined classification performance can be obtained by selecting, from the ranked results in descending order, a predetermined number of features as the importance features.
By providing the feature importance ranking module 7, the embodiment of the present invention improves the performance of prosody prediction.
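The ranking step can be sketched as follows (the F-Score here is the usual harmonic mean of precision and recall, and the per-feature scores are toy values, not measured results):

```python
# Illustrative sketch of the feature-importance ranking step: for each
# candidate feature, compare the F-Score of the classifier with and without
# the feature on a validation set, and rank features by the resulting lift.
# The per-feature F-Scores below are toy values, not measured results.

def f_score(precision, recall):
    """Harmonic mean of precision and recall (the F1 measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def rank_by_lift(baseline_f, f_with_feature):
    """Rank features by F-Score lift over the baseline, descending."""
    lifts = {name: f - baseline_f for name, f in f_with_feature.items()}
    return sorted(lifts.items(), key=lambda kv: kv[1], reverse=True)

if __name__ == "__main__":
    baseline = f_score(0.80, 0.80)
    with_feature = {
        "pos_tag":  f_score(0.85, 0.83),
        "word_len": f_score(0.81, 0.80),
        "punct":    f_score(0.84, 0.86),
    }
    for name, lift in rank_by_lift(baseline, with_feature):
        print(f"{name}: +{lift:.4f}")
```

Taking the top-N features of this ranking corresponds to selecting the predetermined number of importance features described above.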
Fig. 4 schematically illustrates the structure of the model fusion module. As shown in Fig. 4, the model fusion module 8 may specifically include: a first single-classifier output module 81, a second single-classifier output module 82, an important feature generation module 83 and a fusion prediction module 84. The first single-classifier output module 81 is connected to the first single-classifier training module 5 and determines, according to the first mapping model, the first probabilities of a pause and of no pause in the prosodic hierarchy prediction. The second single-classifier output module 82 is connected to the second single-classifier training module 6 and determines, according to the second mapping model, the second probabilities of a pause and of no pause in the prosodic hierarchy prediction. The important feature generation module 83 is connected to the feature importance ranking module 7 and outputs the important features of the prosodic hierarchy by computing the contribution to the F-Score of the text features with the predetermined classification performance. The fusion prediction module 84 fuses, by iterated decision trees, the first probabilities output by the first single-classifier output module 81, the second probabilities output by the second single-classifier output module 82 and the important features output by the important feature generation module 83, so as to determine the prediction result for the prosodic boundaries of the prosodic hierarchy structure.
The fusion prediction module 84 thus jointly considers the influence of the first single-classifier training module 5, the second single-classifier training module 6 and the feature importance ranking module 7 on the final result, and generates the prediction of the prosodic boundaries (pauses) at the current level; this result is then used as an input feature for the prediction of the next level.
By providing the model fusion module, the embodiment of the present invention introduces the iterated decision tree algorithm from ensemble learning to fuse the outputs of the two single classifiers with the importance features generated by the feature ranking module, which substantially improves the performance of prosody prediction and yields synthesized speech with better naturalness and expressiveness.
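The fusion can be sketched with a toy gradient-boosting loop over one-level decision stumps (an assumption about how "iterated decision tree fusion" might be realized in practice; the data, the learning rate and the squared-error objective are all invented for the example):

```python
# Sketch of the decision-level fusion, assumed here to behave like
# gradient-boosted decision trees from ensemble learning: this toy version
# boosts one-level decision stumps against squared-error residuals.
# Each sample is [p1, p2, ...]: the two single-classifier pause
# probabilities (plus, optionally, importance features); the label is
# pause (1) / no pause (0).

def fit_stump(X, residuals):
    """Best single-feature threshold split minimizing squared error."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({row[j] for row in X}):
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            if not left or not right:
                continue
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            err = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
            if best is None or err < best[0]:
                best = (err, j, thr, lv, rv)
    return best[1:]

def boost(X, y, rounds=20, lr=0.5):
    """Fit stumps iteratively, each one on the residuals of the current sum."""
    pred = [0.0] * len(X)
    stumps = []
    for _ in range(rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        j, thr, lv, rv = fit_stump(X, residuals)
        stumps.append((j, thr, lv, rv))
        pred = [p + lr * (lv if row[j] <= thr else rv) for row, p in zip(X, pred)]
    return stumps

def predict(stumps, row, lr=0.5):
    return sum(lr * (lv if row[j] <= thr else rv) for j, thr, lv, rv in stumps)

if __name__ == "__main__":
    # [p1, p2] pause probabilities from the two classifiers; label = pause?
    X = [[0.9, 0.8], [0.7, 0.9], [0.2, 0.3], [0.1, 0.4], [0.6, 0.2], [0.3, 0.7]]
    y = [1, 1, 0, 0, 1, 0]
    stumps = boost(X, y)
    print([round(predict(stumps, row)) for row in X])  # → [1, 1, 0, 0, 1, 0]
```

Because the boosted trees see both classifiers' probabilities as input features, the fusion can learn when to trust the CRF and when to trust the BLSTM, which is the point of combining them at the decision level.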
Therefore, through the text analysis module 1, the text feature parameterization module 2, the character-word vector joint training module 3, the word vector generation module 4, the first single-classifier training module 5, the second single-classifier training module 6, the feature importance ranking module 7 and the model fusion module 8, the embodiment of the present invention can predict, for any text, the prosodic hierarchy structure at the three levels of prosodic word, prosodic phrase and intonation phrase, which guides the back end of a speech synthesis system and thereby improves the naturalness and expressiveness of the synthesized speech.
It should be noted that when the Chinese prosodic hierarchy prediction system provided by the above embodiment predicts the prosodic hierarchy structure of Chinese text, the division into the above functional modules is only an example. In practical applications, the above functions may be allocated to different functional modules as needed; that is, the modules in the embodiment of the present invention may be further decomposed or combined. For example, the modules of the above embodiment may be merged into one module, or further split into multiple sub-modules, to accomplish all or part of the functions described above. The names of the modules involved in the embodiment of the present invention are used only to distinguish the modules from one another and are not intended to improperly limit the present invention.
As used herein, the term "module" may refer to a software object or routine executed on a computing system (which may be implemented in a language such as C). The various modules described herein may be implemented as objects or processes executed on a computing system (for example, as separate threads). While the systems and methods described herein are preferably implemented in software, implementations in hardware, or in a combination of software and hardware, are also possible and contemplated.
The embodiment of the present invention may run on platforms such as Windows and Linux.
It will be understood by those skilled in the art that the above Chinese prosodic hierarchy prediction system may also include other well-known structures, such as processors, controllers and memories, where the memories include but are not limited to random access memory, flash memory, read-only memory, programmable read-only memory, volatile memory, non-volatile memory, serial memory, parallel memory, registers and the like, and the processors include but are not limited to CPLD/FPGA, DSP, ARM processors, MIPS processors and the like. In order not to unnecessarily obscure the embodiments of the present disclosure, these well-known structures are not shown in Fig. 1.
It should be understood that the number of each module in Fig. 1 is only schematic. Each module may be present in any number according to actual needs.
The technical solutions provided by the embodiments of the present invention have been described in detail above. Although specific examples are used herein to explain the principles and implementations of the present invention, the description of the above embodiments is only intended to help in understanding the principles of the embodiments of the present invention; meanwhile, those skilled in the art may make changes to the specific implementations and the scope of application according to the embodiments of the present invention.
It should be noted that the block diagrams referred to herein are not limited to the forms shown herein, and may be divided and/or combined in other ways.
It should be noted that the reference signs and text in the drawings are only intended to illustrate the present invention more clearly and are not intended to improperly limit the scope of the present invention.
It should also be noted that the terms "first", "second" and the like in the description, the claims and the above drawings are used to distinguish similar objects, not to describe or indicate a particular order or precedence. It should be understood that data so used are interchangeable in appropriate circumstances, so that the embodiments of the present invention described herein can be implemented in orders other than those illustrated or described herein.
The term "comprising" or any other similar term is intended to cover a non-exclusive inclusion, so that a process, method, article or device comprising a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article or device.
The specific embodiments described above further explain the objectives, technical solutions and beneficial effects of the present invention in detail. It should be understood that the above are only specific embodiments of the present invention and are not intended to limit the present invention; any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (9)
1. A Chinese prosodic hierarchy prediction system, characterized in that the prediction system comprises:
a text analysis module, configured to receive text data to be analyzed and to output analyzed text data;
a text feature parameterization module, connected to the text analysis module and configured to receive the analyzed text data and to output parameterized text features;
a character-word vector joint training module, connected to the text analysis module and configured to receive the analyzed text data generated by the text analysis module, to jointly train a language model based on character vectors and word vectors, and to output a character-vector-enhanced word vector representation model of the text;
a word vector generation module, configured to use the character-vector-enhanced word vector representation model, based on the analyzed text data output by the text analysis module, to output the character-vector-enhanced word vectors of the analyzed text data;
a first single-classifier training module, connected to the text feature parameterization module and configured to train a first mapping model from the parameterized text features output by the text feature parameterization module to a prosodic hierarchy structure;
a second single-classifier training module, connected to the word vector generation module and configured to train a second mapping model from the character-vector-enhanced word vectors output by the word vector generation module to the prosodic hierarchy structure;
a feature importance ranking module, connected to the first single-classifier training module and configured to output text features with a predetermined classification performance;
a model fusion module, connected to the first single-classifier training module, the second single-classifier training module and the feature importance ranking module, and configured to receive the first mapping model and the second mapping model output by the first single-classifier training module and the second single-classifier training module and the text features with the predetermined classification performance output by the feature importance ranking module, and to fuse the first single-classifier training module and the second single-classifier training module at the decision level using an ensemble learning method, so as to output a result of the prosodic hierarchy structure prediction.
2. The prediction system according to claim 1, characterized in that the text analysis module is specifically configured to perform normalization processing on the text data to be analyzed, correct polyphonic-character pronunciation errors and normalize digits, and to output the analyzed text data.
3. The prediction system according to claim 1, characterized in that the analyzed text data includes symbolic features and numeric features;
the text feature parameterization module is specifically configured to process the symbolic features output by the text analysis module using a one-hot representation and to retain the numeric features output by the text analysis module, so as to output the parameterized text features.
4. The prediction system according to claim 1, characterized in that the analyzed text data further includes word segmentation features;
the character-word vector joint training module specifically includes:
a character position extraction module, configured to cluster each character according to the positions at which the character appears within words, based on the word segmentation features output by the text analysis module, and to extract character position information;
a character context clustering module, configured to cluster the different contexts of a character based on the character position information extracted by the character position extraction module, so that the same character is represented by multiple vectors;
a non-decomposable word list building module, configured to build a word list of non-decomposable words based on the analyzed text data output by the text analysis module;
a specific character-word vector joint training module, configured to output the character-vector-enhanced word vector representation of the text according to the word list of non-decomposable words output by the non-decomposable word list building module and the results output by the character context clustering module.
5. The prediction system according to claim 1, characterized in that the analyzed text data further includes word segmentation features;
the word vector generation module is specifically configured to combine the semantic information carried by words and by the characters within them, based on the word segmentation features output by the text analysis module and the character-vector-enhanced word vector representation model of the text output by the character-word vector joint training module, to construct a joint character-word vector model, and to map the text through the joint character-word vector model to output the character-vector-enhanced word vectors of the analyzed text data.
6. The prediction system according to claim 1, characterized in that the first single-classifier training module is specifically configured to establish, using a conditional random field method, the first mapping model that maps the relationship between the parameterized text features and the prosodic hierarchy structure.
7. The prediction system according to claim 1, characterized in that the second single-classifier training module is specifically configured to establish, using a bidirectional long short-term memory recurrent neural network, the second mapping model that maps the relationship between the character-vector-enhanced word vectors and the prosodic hierarchy structure.
8. The prediction system according to claim 1, characterized in that the feature importance ranking module specifically includes:
a text feature set extraction module, configured to extract a text feature set from the parameterized text features by enumeration;
an F-Score lift computing module, configured to compute, for each feature extracted by the text feature set extraction module, the lift in F-Score obtained on a validation set when that feature is used as input to the first single-classifier training module;
a feature importance ranking output module, configured to rank the F-Score lift values obtained by the F-Score lift computing module and to output the text features with the predetermined classification performance.
9. The prediction system according to claim 8, characterized in that the model fusion module specifically includes:
a first single-classifier output module, connected to the first single-classifier training module and configured to determine, according to the first mapping model, first probabilities of a pause and of no pause in the prosodic hierarchy prediction;
a second single-classifier output module, connected to the second single-classifier training module and configured to determine, according to the second mapping model, second probabilities of a pause and of no pause in the prosodic hierarchy prediction;
an important feature generation module, connected to the feature importance ranking module and configured to output important features of the prosodic hierarchy by computing the contribution to the F-Score of the text features with the predetermined classification performance;
a fusion prediction module, configured to fuse, by iterated decision trees, the first probabilities output by the first single-classifier output module, the second probabilities output by the second single-classifier output module and the important features output by the important feature generation module, so as to determine the prediction result for the prosodic boundaries of the prosodic hierarchy structure.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610642956.0A CN106227721B (en) | 2016-08-08 | 2016-08-08 | Chinese Prosodic Hierarchy forecasting system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610642956.0A CN106227721B (en) | 2016-08-08 | 2016-08-08 | Chinese Prosodic Hierarchy forecasting system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106227721A CN106227721A (en) | 2016-12-14 |
CN106227721B true CN106227721B (en) | 2019-02-01 |
Family
ID=57547688
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610642956.0A Active CN106227721B (en) | 2016-08-08 | 2016-08-08 | Chinese Prosodic Hierarchy forecasting system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106227721B (en) |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106682217A (en) * | 2016-12-31 | 2017-05-17 | 成都数联铭品科技有限公司 | Method for enterprise second-grade industry classification based on automatic screening and learning of information |
CN106652995A (en) * | 2016-12-31 | 2017-05-10 | 深圳市优必选科技有限公司 | Voice broadcasting method and system for text |
CN108628868B (en) * | 2017-03-16 | 2021-08-10 | 北京京东尚科信息技术有限公司 | Text classification method and device |
CN107423284B (en) * | 2017-06-14 | 2020-03-06 | 中国科学院自动化研究所 | Method and system for constructing sentence representation fusing internal structure information of Chinese words |
CN107451115B (en) * | 2017-07-11 | 2020-03-06 | 中国科学院自动化研究所 | Method and system for constructing end-to-end Chinese prosody hierarchical structure prediction model |
CN107995428B (en) * | 2017-12-21 | 2020-02-07 | Oppo广东移动通信有限公司 | Image processing method, image processing device, storage medium and mobile terminal |
CN108595416A (en) * | 2018-03-27 | 2018-09-28 | 义语智能科技(上海)有限公司 | Character string processing method and equipment |
CN108549850B (en) * | 2018-03-27 | 2021-07-16 | 联想(北京)有限公司 | Image identification method and electronic equipment |
CN108595590A (en) * | 2018-04-19 | 2018-09-28 | 中国科学院电子学研究所苏州研究院 | A kind of Chinese Text Categorization based on fusion attention model |
CN108763487B (en) * | 2018-05-30 | 2021-08-10 | 华南理工大学 | Mean Shift-based word representation method fusing part-of-speech and sentence information |
CN110427608B (en) * | 2019-06-24 | 2021-06-08 | 浙江大学 | Chinese word vector representation learning method introducing layered shape-sound characteristics |
CN111178046A (en) * | 2019-12-16 | 2020-05-19 | 山东众阳健康科技集团有限公司 | Word vector training method based on sorting |
CN111226275A (en) * | 2019-12-31 | 2020-06-02 | 深圳市优必选科技股份有限公司 | Voice synthesis method, device, terminal and medium based on rhythm characteristic prediction |
CN111738360B (en) * | 2020-07-24 | 2020-11-27 | 支付宝(杭州)信息技术有限公司 | Two-party decision tree training method and system |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101452699A (en) * | 2007-12-04 | 2009-06-10 | 株式会社东芝 | Rhythm self-adapting and speech synthesizing method and apparatus |
CN104867490A (en) * | 2015-06-12 | 2015-08-26 | 百度在线网络技术(北京)有限公司 | Metrical structure predicting method and metrical structure predicting device |
CN104916284A (en) * | 2015-06-10 | 2015-09-16 | 百度在线网络技术(北京)有限公司 | Prosody and acoustics joint modeling method and device for voice synthesis system |
CN105185374A (en) * | 2015-09-11 | 2015-12-23 | 百度在线网络技术(北京)有限公司 | Prosodic hierarchy annotation method and device |
CN105244020A (en) * | 2015-09-24 | 2016-01-13 | 百度在线网络技术(北京)有限公司 | Prosodic hierarchy model training method, text-to-speech method and text-to-speech device |
CN105654939A (en) * | 2016-01-04 | 2016-06-08 | 北京时代瑞朗科技有限公司 | Voice synthesis method based on voice vector textual characteristics |
- 2016-08-08 CN CN201610642956.0A patent/CN106227721B/en active Active
Patent Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101452699A (en) * | 2007-12-04 | 2009-06-10 | 株式会社东芝 | Rhythm self-adapting and speech synthesizing method and apparatus |
CN104916284A (en) * | 2015-06-10 | 2015-09-16 | 百度在线网络技术(北京)有限公司 | Prosody and acoustics joint modeling method and device for voice synthesis system |
CN104867490A (en) * | 2015-06-12 | 2015-08-26 | 百度在线网络技术(北京)有限公司 | Metrical structure predicting method and metrical structure predicting device |
CN105185374A (en) * | 2015-09-11 | 2015-12-23 | 百度在线网络技术(北京)有限公司 | Prosodic hierarchy annotation method and device |
CN105244020A (en) * | 2015-09-24 | 2016-01-13 | 百度在线网络技术(北京)有限公司 | Prosodic hierarchy model training method, text-to-speech method and text-to-speech device |
CN105654939A (en) * | 2016-01-04 | 2016-06-08 | 北京时代瑞朗科技有限公司 | Voice synthesis method based on voice vector textual characteristics |
Non-Patent Citations (2)
Title |
---|
Prosodic word prediction using the lexical information; Honghui DONG et al; 《Natural Language Processing and Knowledge Engineering, 2005. IEEE NLP-KE '05》; 20060227; pages 189-193 *
Prosodic Structure Prediction Based on Deep Learning (基于深度学习的韵律结构预测); Ding Xingguang et al; 《NCMMSC2015》; 20151031; pages 1-5 *
Also Published As
Publication number | Publication date |
---|---|
CN106227721A (en) | 2016-12-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106227721B (en) | Chinese Prosodic Hierarchy forecasting system | |
CN105869634B (en) | It is a kind of based on field band feedback speech recognition after text error correction method and system | |
CN111026842B (en) | Natural language processing method, natural language processing device and intelligent question-answering system | |
CN111931506B (en) | Entity relationship extraction method based on graph information enhancement | |
CN103345922B (en) | A kind of large-length voice full-automatic segmentation method | |
WO2018028077A1 (en) | Deep learning based method and device for chinese semantics analysis | |
CN107330011A (en) | The recognition methods of the name entity of many strategy fusions and device | |
CN107577662A (en) | Towards the semantic understanding system and method for Chinese text | |
CN107818164A (en) | A kind of intelligent answer method and its system | |
CN110083700A (en) | A kind of enterprise's public sentiment sensibility classification method and system based on convolutional neural networks | |
CN110321432A (en) | Textual event information extracting method, electronic device and non-volatile memory medium | |
CN108711421A (en) | A kind of voice recognition acoustic model method for building up and device and electronic equipment | |
CN110502610A (en) | Intelligent sound endorsement method, device and medium based on text semantic similarity | |
CN106294344A (en) | Video retrieval method and device | |
CN108228758A (en) | A kind of file classification method and device | |
Anand Kumar et al. | A sequence labeling approach to morphological analyzer for tamil language | |
CN109918501A (en) | Method, apparatus, equipment and the storage medium of news article classification | |
CN110232123A (en) | The sentiment analysis method and device thereof of text calculate equipment and readable medium | |
CN110188195A (en) | A kind of text intension recognizing method, device and equipment based on deep learning | |
CN110851601A (en) | Cross-domain emotion classification system and method based on layered attention mechanism | |
CN113051914A (en) | Enterprise hidden label extraction method and device based on multi-feature dynamic portrait | |
CN109582788A (en) | Comment spam training, recognition methods, device, equipment and readable storage medium storing program for executing | |
CN110019741A (en) | Request-answer system answer matching process, device, equipment and readable storage medium storing program for executing | |
JP2018005690A (en) | Information processing apparatus and program | |
CN110853656A (en) | Audio tampering identification algorithm based on improved neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||