CN111949796B - Method and system for speech synthesis front-end text analysis of a resource-limited language
- Publication number: CN111949796B
- Application number: CN202010858597.9A
- Authority: CN (China)
- Prior art keywords: data, classifier, domain data, neural network, target domain
- Prior art date: 2020-08-24
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G06F16/35: Information retrieval of unstructured textual data; Clustering; Classification
- G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2415: Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
- G06N3/045: Neural network architectures; Combinations of networks
- G06N3/047: Probabilistic or stochastic networks
- G06N3/08: Neural network learning methods
- Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention provides a method and a system for speech synthesis front-end text analysis of a resource-limited language, wherein the method comprises the following steps: acquiring training data, wherein the training data comprises source domain data, labeled target domain data, and unlabeled target domain data; training a neural network structure based on mixed data and the unlabeled target domain data; and performing speech synthesis front-end text analysis on the resource-limited language using the trained neural network structure. The method requires only a small amount of labeled data in the resource-limited language, whose quality is easier to control. From a semi-supervised learning perspective, a small amount of labeled resource-limited language data is added to the training process, so that prior knowledge of the feature distribution of the resource-limited language data can be learned while the feature distribution of the resource-rich language data is learned, avoiding the drawback that unsupervised domain adaptation methods cannot take the feature distribution of the resource-limited language data into account.
Description
Technical Field
The invention relates to the technical field of speech synthesis, and in particular to a method and a system for speech synthesis front-end text analysis of a resource-limited language.
Background
Currently, for resource-limited languages (e.g., Chinese dialects), there are generally two approaches to speech synthesis front-end text analysis. The first is to define label types according to expert knowledge, manually produce a large amount of labeled target domain data, and then feed that data into a designed neural network to train its parameters; this approach suffers from problems of data distribution balance, labeling correctness, consistency, safety, and timeliness. The second is transfer learning: given a neural network A trained on a resource-rich language (e.g., Mandarin Chinese), a new neural network B is built for the resource-limited language, and the parameters of network B are trained using local features (parameters) from network A together with a small amount of labeled target domain data, i.e., data in the resource-limited language; alternatively, network A is fine-tuned directly with a small amount of labeled target domain data to obtain a neural network for the resource-limited language. This approach performs very poorly across open data sets, which better reflect real scenarios; it does not fully exploit the feature correlation between the target domain data and the source domain data (i.e., the resource-rich language); and fine-tuning a neural network with a large number of parameters using a small amount of labeled target domain data is often difficult in practice.
In order to solve the problem of scarce data resources in speech synthesis front-end text analysis for resource-limited languages, a method and a system for speech synthesis front-end text analysis of resource-limited languages are needed.
Disclosure of Invention
The invention provides a method and a system for speech synthesis front-end text analysis of a resource-limited language, which are used to solve the problem of scarce data resources in speech synthesis front-end text analysis for resource-limited languages.
The invention provides a method for speech synthesis front-end text analysis of a resource-limited language, comprising the following steps:
Step 1: acquiring training data, wherein the training data includes source domain data, labeled target domain data, and unlabeled target domain data; the source domain data includes text data in a resource-rich language, and the target domain data includes text data in the resource-limited language;
Step 2: training a neural network structure based on mixed data and the unlabeled target domain data, wherein the mixed data includes the labeled source domain data and the labeled target domain data;
Step 3: performing speech synthesis front-end text analysis on the resource-limited language using the trained neural network structure.
Further, in step 1, the source domain data accounts for 55%-65% of the training data, the labeled target domain data accounts for 8%-12% of the training data, and the unlabeled target domain data accounts for 27%-33% of the training data.
Further, in step 2, the neural network structure includes a feature extractor and a classifier, wherein the classifier immediately follows the feature extractor.
Further, the feature extractor includes a Transformer encoder.
Further, the classifier includes a fully connected layer, a softmax layer, and a CRF layer.
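For illustration only, a minimal PyTorch sketch of such a neural network structure is given below; the class names, dimensions, and hyperparameters are assumptions of this sketch rather than values specified by the invention, and the CRF layer is omitted for brevity (a package such as pytorch-crf could supply it):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureExtractor(nn.Module):
    """Transformer encoder mapping token ids to contextual features."""
    def __init__(self, vocab_size=8000, d_model=256, n_heads=4, n_layers=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

    def forward(self, token_ids):                   # (batch, seq)
        return self.encoder(self.embed(token_ids))  # (batch, seq, d_model)

class Classifier(nn.Module):
    """Fully connected layer followed by softmax; the CRF layer is omitted."""
    def __init__(self, d_model=256, n_labels=16, p_drop=0.3):
        super().__init__()
        self.p_drop = p_drop
        self.fc = nn.Linear(d_model, n_labels)

    def forward(self, feats, use_dropout=False):
        if use_dropout:
            # Forced-on dropout: each call drops a different set of nodes,
            # which is how two classifier sub-networks are sampled later.
            feats = F.dropout(feats, self.p_drop, training=True)
        return torch.softmax(self.fc(feats), dim=-1)  # output probabilities
```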
Further, step 2 (training the neural network structure based on the mixed data and the unlabeled target domain data) performs the following steps:
Step S21: inputting the mixed data into the neural network structure to perform supervised learning on the neural network structure and synchronously update the network parameters of the feature extractor and the classifier;
Step S22: simultaneously inputting the mixed data and the unlabeled target domain data into the neural network structure to perform semi-supervised learning on the neural network structure and update only the network parameters of the classifier.
Further, in step S21, when the mixed data is input into the neural network structure, the output features of the feature extractor are input to the classifier once, without a dropout strategy, so that discriminative features are learned.
Further, in step S22, the mixed data and the unlabeled target domain data are simultaneously input into the neural network structure to perform semi-supervised learning on the neural network structure and update only the network parameters of the classifier, performing the following steps:
Step S221: when the mixed data and the unlabeled target domain data are simultaneously input into the neural network structure, the output features of the feature extractor are input to the classifier twice, and the dropout strategy drops different network nodes in each pass so as to sample a first classifier network and a second classifier network;
Step S222: only the network parameters of the classifier are updated, by maximizing the KL divergence between the output probabilities of the first classifier network and the second classifier network.
Further, step 2 (training the neural network structure based on the mixed data and the unlabeled target domain data) further performs the following steps:
Step S23: during training of the neural network structure, the classifier and the feature extractor oppose each other, so that the network parameters of both are updated interactively;
Step S24: inputting the unlabeled target domain data into the neural network structure, and updating the network parameters of the feature extractor by minimizing the KL divergence between the output probabilities of the first classifier network and the second classifier network.
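By way of a non-authoritative illustration, the following sketch shows how the alternating updates of steps S21, S22, and S24 (with the adversarial interplay of step S23 arising from their alternation) could be realized in PyTorch, assuming the FeatureExtractor and Classifier classes sketched above; the symmetric-KL formulation and the optimizer setup are assumptions of this sketch:

```python
def sym_kl(p, q, eps=1e-8):
    """Symmetric KL divergence between two batches of output distributions."""
    kl = lambda a, b: (a * ((a + eps) / (b + eps)).log()).sum(-1).mean()
    return 0.5 * (kl(p, q) + kl(q, p))

def train_step(extractor, classifier, opt_f, opt_c, mixed_x, mixed_y, unlab_x):
    # S21: supervised pass on the (labeled) mixed data, no dropout,
    # updating the feature extractor and the classifier synchronously.
    probs = classifier(extractor(mixed_x), use_dropout=False)
    loss_sup = F.nll_loss(probs.clamp_min(1e-8).log().transpose(1, 2), mixed_y)
    opt_f.zero_grad(); opt_c.zero_grad()
    loss_sup.backward()
    opt_f.step(); opt_c.step()

    # S22: two dropout-sampled passes over unlabeled target features give
    # classifier networks C1 and C2; maximize their KL divergence while
    # updating only the classifier (the extractor is detached here).
    with torch.no_grad():
        feats = extractor(unlab_x)
    p1 = classifier(feats, use_dropout=True)   # C1
    p2 = classifier(feats, use_dropout=True)   # C2
    loss_c = -sym_kl(p1, p2)                   # maximize via the negative
    opt_c.zero_grad(); loss_c.backward(); opt_c.step()

    # S24: minimize the same divergence, stepping only the extractor's
    # optimizer; gradients flow through the classifier without changing it.
    p1 = classifier(extractor(unlab_x), use_dropout=True)
    p2 = classifier(extractor(unlab_x), use_dropout=True)
    loss_f = sym_kl(p1, p2)
    opt_f.zero_grad(); opt_c.zero_grad()
    loss_f.backward()
    opt_f.step()
```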
The method for speech synthesis front-end text analysis of a resource-limited language provided by the embodiment of the invention has the following beneficial effects: only a small amount of labeled data in the resource-limited language is needed, and its quality is easier to control; in addition, from a semi-supervised learning perspective, a small amount of labeled resource-limited language data is added to the training process, so that prior knowledge of the feature distribution of the resource-limited language data can be learned while the feature distribution of the resource-rich language data is learned, avoiding the drawback that unsupervised domain adaptation methods cannot take the feature distribution of the resource-limited language data into account.
The invention also provides a system for speech synthesis front-end text analysis of a resource-limited language, comprising:
a training data acquisition module for acquiring training data, wherein the training data includes source domain data, labeled target domain data, and unlabeled target domain data; the source domain data includes text data in a resource-rich language, and the target domain data includes text data in the resource-limited language;
a neural network training module for training a neural network structure based on mixed data and the unlabeled target domain data, wherein the mixed data includes the labeled source domain data and the labeled target domain data; and
a front-end text analysis module for performing speech synthesis front-end text analysis on the resource-limited language using the trained neural network structure.
The system for speech synthesis front-end text analysis of a resource-limited language provided by the embodiment of the invention has the following beneficial effects: only a small amount of labeled data in the resource-limited language is needed, and its quality is easier to control; in addition, from a semi-supervised learning perspective, the neural network training module adds a small amount of labeled resource-limited language data to the training process and can learn prior knowledge of the feature distribution of the resource-limited language data while learning the feature distribution of the resource-rich language data, avoiding the drawback that unsupervised domain adaptation cannot take the feature distribution of the resource-limited language into account; the semi-supervised domain adaptation technique thus addresses the scarcity of data resources in speech synthesis front-end text analysis for resource-limited languages.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
The technical solution of the invention is described in further detail below with reference to the drawings and embodiments.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:
FIG. 1 is a flowchart of a method for speech synthesis front-end text analysis of a resource-limited language in an embodiment of the invention;
FIG. 2 is a block diagram of a system for speech synthesis front-end text analysis of a resource-limited language in an embodiment of the invention.
Detailed Description
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.
An embodiment of the invention provides a method for speech synthesis front-end text analysis of a resource-limited language, which, as shown in FIG. 1, comprises the following steps:
Step 1: acquiring training data, wherein the training data includes source domain data, labeled target domain data, and unlabeled target domain data; the source domain data includes text data in a resource-rich language, and the target domain data includes text data in the resource-limited language;
Step 2: training a neural network structure based on mixed data and the unlabeled target domain data, wherein the mixed data includes the labeled source domain data and the labeled target domain data;
Step 3: performing speech synthesis front-end text analysis on the resource-limited language using the trained neural network structure.
The working principle of this technical solution is as follows: the resource-rich language may be, for example, Mandarin Chinese, and the resource-limited language may be, for example, a Chinese dialect.
Based on semi-supervised learning, a small amount of labeled resource-limited language data is added to the resource-rich language data, so that the neural network can learn some prior knowledge of the feature distribution of the resource-limited language data, thereby improving label prediction accuracy on the resource-limited language data. Specifically, training data composed of source domain data, labeled target domain data, and unlabeled target domain data is first acquired; the neural network structure is then trained based on the unlabeled target domain data and on mixed data composed of the labeled source domain data and the labeled target domain data; finally, the trained neural network structure is used to perform speech synthesis front-end text analysis on the resource-limited language.
The beneficial effects of this technical solution are as follows: only a small amount of labeled data in the resource-limited language is needed, and its quality is easier to control; in addition, from a semi-supervised learning perspective, a small amount of labeled resource-limited language data is added to the training process, so that prior knowledge of the feature distribution of the resource-limited language data can be learned while the feature distribution of the resource-rich language data is learned, avoiding the drawback that unsupervised domain adaptation methods cannot take the feature distribution of the resource-limited language into account.
In one embodiment, in step 1, the source domain data accounts for 55%-65% of the training data, the labeled target domain data accounts for 8%-12% of the training data, and the unlabeled target domain data accounts for 27%-33% of the training data.
The working principle of this technical solution is as follows: by way of example and not limitation, the source domain data accounts for 60%, the labeled target domain data for 10%, and the unlabeled target domain data for 30%.
The beneficial effects of this technical solution are as follows: proportions are provided for the source domain data, the labeled target domain data, and the unlabeled target domain data within the training data, so that a small amount of labeled resource-limited language data is added to the training process and prior knowledge of the feature distribution of the resource-limited language data can be learned while the feature distribution of the resource-rich language data is learned.
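As a sketch only, assembling a training set under the 60/10/30 split described above might look as follows; the corpus variables and the total size are placeholders, not values taken from the invention:

```python
import random

def build_training_mix(source, target_labeled, target_unlabeled, total=10000):
    """Draw ~60% source-domain, ~10% labeled target-domain, and ~30%
    unlabeled target-domain samples, then shuffle them together."""
    random.seed(0)  # reproducible sampling for the sketch
    mix = (random.sample(source, int(0.60 * total))
           + random.sample(target_labeled, int(0.10 * total))
           + random.sample(target_unlabeled, int(0.30 * total)))
    random.shuffle(mix)
    return mix
```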
In one embodiment, in step 2, the neural network structure includes a feature extractor and a classifier, wherein the classifier immediately follows the feature extractor.
The working principle of this technical solution is as follows: the feature extractor includes a Transformer encoder, which can fully extract feature information from the text context.
The classifier includes a fully connected layer, a softmax layer, and a CRF layer. The classifier serves two purposes. First, it acts as a conventional classifier whose parameters are trained on the mixed data together with those of the rest of the neural network. Second, it acts as a decision device: for unlabeled target domain data, the features produced by the feature extractor are input to the classifier network twice; dropout drops different network nodes in the fully connected layer on each of the two passes, and the similarity between the two different output probabilities is measured with the Kullback-Leibler divergence (KL divergence).
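A minimal sketch of this decision-device role, assuming the Classifier class and the sym_kl helper from the sketches above (the function name here is ours, not the patent's):

```python
def decision_divergence(extractor, classifier, target_x):
    """Pass the same target-domain features through the classifier twice with
    independent dropout masks (sub-networks C1 and C2) and measure the KL
    divergence between the two output probability distributions."""
    feats = extractor(target_x)
    p1 = classifier(feats, use_dropout=True)  # C1: one set of dropped nodes
    p2 = classifier(feats, use_dropout=True)  # C2: a different set
    return sym_kl(p1, p2)  # large values flag samples near the decision boundary
```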
The beneficial effects of this technical solution are as follows: a specific structure of the neural network is provided; the feature extractor adopts a Transformer encoder, which can fully extract the feature information of the text context, and the classifier, comprising a fully connected layer, a softmax layer, and a CRF layer, can serve both as a conventional classifier and as a decision device.
In one embodiment, step 2 (training the neural network structure based on the mixed data and the unlabeled target domain data) performs the following steps:
Step S21: inputting the mixed data into the neural network structure to perform supervised learning on the neural network structure and synchronously update the network parameters of the feature extractor and the classifier;
Step S22: simultaneously inputting the mixed data and the unlabeled target domain data into the neural network structure to perform semi-supervised learning on the neural network structure and update only the network parameters of the classifier.
The working principle of this technical solution is as follows: first, the mixed data is input and supervised learning is performed, training the parameters of the feature extractor and the classifier synchronously; the goal is to classify the mixed data accurately and thereby obtain discriminative features. Then, the mixed data (all of which is labeled) and the unlabeled target domain data are input simultaneously, and only the parameters of the classifier are updated.
The beneficial effects of this technical solution are as follows: specific steps are provided for training the neural network structure based on the mixed data and the unlabeled target domain data.
In one embodiment, in step S21, when the mixed data is input into the neural network structure, the output features of the feature extractor are input to the classifier once, without a dropout strategy, so that discriminative features are learned.
The working principle of this technical solution is as follows: when the mixed data is input, the output of the feature extractor is passed to the classifier once, and dropout is not performed, so that discriminative features are learned.
The beneficial effects of this technical solution are as follows: a specific method is provided for inputting the mixed data into the neural network structure to perform supervised learning on the neural network structure.
In one embodiment, in step S22, the mixed data and the unlabeled target domain data are simultaneously input into the neural network structure to perform semi-supervised learning on the neural network structure and update only the network parameters of the classifier, performing the following steps:
Step S221: when the mixed data and the unlabeled target domain data are simultaneously input into the neural network structure, the output features of the feature extractor are input to the classifier twice, and the dropout strategy drops different network nodes in each pass so as to sample a first classifier network and a second classifier network;
Step S222: only the network parameters of the classifier are updated, by maximizing the KL divergence between the output probabilities of the first classifier network and the second classifier network.
The working principle of this technical solution is as follows: the unlabeled target domain data is input, and the output features of the feature extractor are input to the classifier twice; dropping different network nodes on each pass is equivalent to sampling two classifier networks, a first classifier network C1 and a second classifier network C2. The KL divergence between the two output probabilities is then maximized. In this way, the classifier can detect target domain data near the decision boundary, and different neural nodes in the classifier can learn more diverse feature representations.
The beneficial effects of this technical solution are as follows: specific steps are provided for simultaneously inputting the mixed data and the unlabeled target domain data into the neural network structure to perform semi-supervised learning on the neural network structure.
In one embodiment, step 2 (training the neural network structure based on the mixed data and the unlabeled target domain data) further performs the following steps:
Step S23: during training of the neural network structure, the classifier and the feature extractor oppose each other, so that the network parameters of both are updated interactively;
Step S24: inputting the unlabeled target domain data into the neural network structure, and updating the network parameters of the feature extractor by minimizing the KL divergence between the output probabilities of the first classifier network and the second classifier network.
The working principle of this technical solution is as follows: as described above, two classifier networks are sampled and the KL divergence between their output probabilities is maximized, so that the classifier can detect target domain data near the decision boundary and different neural nodes in the classifier can learn more diverse feature representations; at the same time, in order to escape these ambiguous regions of the feature space, the feature extractor is forced to generate more discriminative features, so that the classifier and the feature extractor oppose each other and are updated interactively during training.
In addition, the unlabeled target domain data is input, and the parameters of the feature extractor are updated by minimizing the KL divergence between the output probabilities of the first classifier network C1 and the second classifier network C2, so that feature representations of the unlabeled target domain data that lie far from the decision boundary can be obtained.
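The training-loop sketch given earlier realizes "update only the feature extractor" by stepping only the extractor's optimizer; an equivalent alternative, again an implementation assumption of this sketch rather than a prescription of the patent, is to freeze the classifier's parameters explicitly while still letting gradients propagate through its operations (unlike step S22, the classifier's output cannot be detached here, because the KL signal must reach the extractor):

```python
# Freeze the classifier's parameters for the S24 update; its activations
# still carry gradients back to the feature extractor.
for p in classifier.parameters():
    p.requires_grad_(False)

loss_f = sym_kl(classifier(extractor(unlab_x), use_dropout=True),
                classifier(extractor(unlab_x), use_dropout=True))
opt_f.zero_grad()
loss_f.backward()   # classifier weights receive no gradient
opt_f.step()

for p in classifier.parameters():  # unfreeze before the next S21/S22 updates
    p.requires_grad_(True)
```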
The beneficial effects of this technical solution are as follows: further specific steps are provided for training the neural network structure based on the mixed data and the unlabeled target domain data.
As shown in FIG. 2, an embodiment of the invention provides a system for speech synthesis front-end text analysis of a resource-limited language, comprising:
a training data acquisition module 201 for acquiring training data, wherein the training data includes source domain data, labeled target domain data, and unlabeled target domain data; the source domain data includes text data in a resource-rich language, and the target domain data includes text data in the resource-limited language;
a neural network training module 202 for training a neural network structure based on mixed data and the unlabeled target domain data, wherein the mixed data includes the labeled source domain data and the labeled target domain data; and
a front-end text analysis module 203 for performing speech synthesis front-end text analysis on the resource-limited language using the trained neural network structure.
The working principle of this technical solution is as follows: the resource-rich language may be, for example, Mandarin Chinese, and the resource-limited language may be, for example, a Chinese dialect.
Based on semi-supervised learning, a small amount of labeled resource-limited language data is added to the resource-rich language data, so that the neural network can learn some prior knowledge of the feature distribution of the resource-limited language data, thereby improving label prediction accuracy on the resource-limited language data. Specifically, the training data acquisition module 201 acquires training data composed of source domain data, labeled target domain data, and unlabeled target domain data; the neural network training module 202 trains the neural network structure based on the unlabeled target domain data and on mixed data composed of the labeled source domain data and the labeled target domain data; and the front-end text analysis module 203 performs speech synthesis front-end text analysis on the resource-limited language using the trained neural network structure.
The beneficial effects of this technical solution are as follows: only a small amount of labeled data in the resource-limited language is needed, and its quality is easier to control; in addition, from a semi-supervised learning perspective, the neural network training module adds a small amount of labeled resource-limited language data to the training process and can learn prior knowledge of the feature distribution of the resource-limited language data while learning the feature distribution of the resource-rich language data, avoiding the drawback that unsupervised domain adaptation cannot take the feature distribution of the resource-limited language into account; the semi-supervised domain adaptation technique thus addresses the scarcity of data resources in speech synthesis front-end text analysis for resource-limited languages.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.
Claims (6)
1. A method for speech synthesis front-end text analysis of a resource-limited language, characterized by comprising the following steps:
Step 1: acquiring training data, wherein the training data comprises source domain data, labeled target domain data, and unlabeled target domain data; the source domain data comprises text data in a resource-rich language, and the target domain data comprises text data in the resource-limited language; the source domain data accounts for 55%-65% of the training data, the labeled target domain data accounts for 8%-12% of the training data, and the unlabeled target domain data accounts for 27%-33% of the training data;
Step 2: training a neural network structure based on mixed data and the unlabeled target domain data, comprising: Step S21: inputting the mixed data into the neural network structure to perform supervised learning on the neural network structure and synchronously update the network parameters of a feature extractor and a classifier; Step S22: simultaneously inputting the mixed data and the unlabeled target domain data into the neural network structure to perform semi-supervised learning on the neural network structure and update only the network parameters of the classifier, comprising: Step S221: when the mixed data and the unlabeled target domain data are simultaneously input into the neural network structure, inputting the output features of the feature extractor to the classifier twice, a dropout strategy dropping different network nodes in each pass so as to sample a first classifier network and a second classifier network; Step S222: updating only the network parameters of the classifier by maximizing the KL divergence between the output probabilities of the first classifier network and the second classifier network; wherein the mixed data comprises the labeled source domain data and the labeled target domain data, and the neural network structure comprises the feature extractor and the classifier, the classifier immediately following the feature extractor;
Step 3: performing speech synthesis front-end text analysis on the resource-limited language using the trained neural network structure.
2. The method of claim 1, wherein the feature extractor comprises a Transformer encoder.
3. The method of claim 1, wherein the classifier comprises a fully connected layer, a softmax layer, and a CRF layer.
4. The method of claim 1, wherein in step S21, when the mixed data is input into the neural network structure, the output features of the feature extractor are input to the classifier once, without a dropout strategy, so that discriminative features are learned.
5. The method of claim 1, wherein step 2 (training the neural network structure based on the mixed data and the unlabeled target domain data) further performs the following steps:
Step S23: during training of the neural network structure, the classifier and the feature extractor oppose each other, so that the network parameters of both are updated interactively;
Step S24: inputting the unlabeled target domain data into the neural network structure, and updating the network parameters of the feature extractor by minimizing the KL divergence between the output probabilities of the first classifier network and the second classifier network.
6. A system for speech synthesis front-end text analysis of a resource-limited language, characterized by comprising:
a training data acquisition module for acquiring training data, wherein the training data comprises source domain data, labeled target domain data, and unlabeled target domain data; the source domain data comprises text data in a resource-rich language, and the target domain data comprises text data in the resource-limited language; the source domain data accounts for 55%-65% of the training data, the labeled target domain data accounts for 8%-12% of the training data, and the unlabeled target domain data accounts for 27%-33% of the training data;
a neural network training module for training a neural network structure based on mixed data and the unlabeled target domain data, comprising: Step S21: inputting the mixed data into the neural network structure to perform supervised learning on the neural network structure and synchronously update the network parameters of a feature extractor and a classifier; Step S22: simultaneously inputting the mixed data and the unlabeled target domain data into the neural network structure to perform semi-supervised learning on the neural network structure and update only the network parameters of the classifier, comprising: Step S221: when the mixed data and the unlabeled target domain data are simultaneously input into the neural network structure, inputting the output features of the feature extractor to the classifier twice, a dropout strategy dropping different network nodes in each pass so as to sample a first classifier network and a second classifier network; Step S222: updating only the network parameters of the classifier by maximizing the KL divergence between the output probabilities of the first classifier network and the second classifier network; wherein the mixed data comprises the labeled source domain data and the labeled target domain data, and the neural network structure comprises the feature extractor and the classifier, the classifier immediately following the feature extractor; and
a front-end text analysis module for performing speech synthesis front-end text analysis on the resource-limited language using the trained neural network structure.
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN202010858597.9A (granted as CN111949796B) | 2020-08-24 | 2020-08-24 | Method and system for speech synthesis front-end text analysis of a resource-limited language |
Publications (2)
| Publication Number | Publication Date |
| --- | --- |
| CN111949796A | 2020-11-17 |
| CN111949796B | 2023-10-20 |
Family
- Family ID: 73359690
- Family Applications (1): CN202010858597.9A (country CN), priority date 2020-08-24, filing date 2020-08-24, granted as CN111949796B (Active)
Families Citing this family (1)
| Publication Number | Priority Date | Publication Date | Assignee | Title |
| --- | --- | --- | --- | --- |
| CN114239859B | 2022-02-25 | 2022-07-08 | Hangzhou Hikvision Digital Technology Co., Ltd. | Power consumption data prediction method and device based on transfer learning and storage medium |
Citations (5)
| Publication Number | Priority Date | Publication Date | Assignee | Title |
| --- | --- | --- | --- | --- |
| CN107909101A | 2017-11-10 | 2018-04-13 | Tsinghua University | Semi-supervised transfer learning character recognition method and system based on convolutional neural networks |
| CN108460134A | 2018-03-06 | 2018-08-28 | Yunnan University | Text topic classification model and classification method based on multi-source-domain integrated transfer learning |
| CN109947086A | 2019-04-11 | 2019-06-28 | Tsinghua University | Mechanical fault transfer diagnosis method and system based on adversarial learning |
| CN110148398A | 2019-05-16 | 2019-08-20 | Ping An Technology (Shenzhen) Co., Ltd. | Training method, apparatus, device, and storage medium for a speech synthesis model |
| CN110428818A | 2019-08-09 | 2019-11-08 | Institute of Automation, Chinese Academy of Sciences | Low-resource multilingual speech recognition model and speech recognition method |
Family Cites Families (1)
| Publication Number | Priority Date | Publication Date | Assignee | Title |
| --- | --- | --- | --- | --- |
| US20190147854A1 | 2017-11-16 | 2019-05-16 | Microsoft Technology Licensing, LLC | Speech Recognition Source to Target Domain Adaptation |
Legal Events
- PB01: Publication
- SE01: Entry into force of request for substantive examination
- GR01: Patent grant