CN111949796B - Method and system for analyzing front-end text of voice synthesis of resource-limited language - Google Patents


Info

Publication number
CN111949796B
CN111949796B
Authority
CN
China
Prior art keywords
data
classifier
domain data
neural network
target domain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010858597.9A
Other languages
Chinese (zh)
Other versions
CN111949796A (en
Inventor
吴朗
Current Assignee
Unisound Intelligent Technology Co Ltd
Xiamen Yunzhixin Intelligent Technology Co Ltd
Original Assignee
Unisound Intelligent Technology Co Ltd
Xiamen Yunzhixin Intelligent Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Unisound Intelligent Technology Co Ltd, Xiamen Yunzhixin Intelligent Technology Co Ltd filed Critical Unisound Intelligent Technology Co Ltd
Priority to CN202010858597.9A priority Critical patent/CN111949796B/en
Publication of CN111949796A publication Critical patent/CN111949796A/en
Application granted granted Critical
Publication of CN111949796B publication Critical patent/CN111949796B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G06F16/35 Information retrieval of unstructured textual data — Clustering; Classification
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N3/045 Neural networks — Combinations of networks
    • G06N3/047 Neural networks — Probabilistic or stochastic networks
    • G06N3/08 Neural networks — Learning methods
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention provides a method and a system for front-end text analysis in speech synthesis of a resource-limited language. The method comprises the following steps: acquiring training data, wherein the training data comprises source domain data, labeled target domain data, and unlabeled target domain data; training a neural network structure based on mixed data and the unlabeled target domain data; and performing speech synthesis front-end text analysis on the resource-limited language by using the trained neural network structure. The method requires only a small amount of labeled data in the resource-limited language, whose quality is easier to control. From a semi-supervised learning point of view, a small amount of labeled resource-limited language data is added in the training process, so that prior knowledge of the feature distribution of the resource-limited language data can be learned while the feature distribution of the resource-rich language data is learned, avoiding the drawback that unsupervised domain adaptation methods cannot take the feature distribution of the resource-limited language data into account.

Description

Method and system for front-end text analysis in speech synthesis of a resource-limited language
Technical Field
The invention relates to the technical field of speech synthesis, and in particular to a method and a system for front-end text analysis in speech synthesis of a resource-limited language.
Background
Currently, for resource-limited languages (e.g., Chinese domestic dialects), there are generally two approaches to speech synthesis front-end text analysis. The first is to define label types according to expert knowledge, manually produce a large amount of labeled target domain data, and then feed the target domain data into a designed neural network to train its parameters; this approach suffers from problems of data distribution balance, labeling correctness, consistency, safety, and timeliness. The second is transfer learning: given a neural network A trained on a resource-rich language (e.g., Mandarin Chinese), build a new neural network B for the resource-limited language and train the parameters of network B using local features (parameters) from network A together with a small amount of labeled target domain data, i.e., resource-limited language data; alternatively, fine-tune network A directly with a small amount of labeled target domain data to obtain a neural network for the resource-limited language. This approach performs very poorly across open data sets, which better reflect real scenarios; it does not fully exploit the feature correlation between the target domain data and the source domain data (i.e., the resource-rich language); and fine-tuning a large-parameter neural network with a small amount of labeled target domain data is often difficult to achieve.
In order to solve the problem of data resource shortage in speech synthesis front-end text analysis for resource-limited languages, a method and a system for such analysis are needed.
Disclosure of Invention
The invention provides a method and a system for front-end text analysis in speech synthesis of a resource-limited language, which are used to solve the problem of data resource shortage in speech synthesis front-end text analysis of resource-limited languages.
The invention provides a method for front-end text analysis in speech synthesis of a resource-limited language, comprising the following steps:
step 1: acquiring training data, wherein the training data comprises source domain data, labeled target domain data, and unlabeled target domain data; the source domain data comprises text data of a resource-rich language, and the target domain data comprises text data of the resource-limited language;
step 2: training a neural network structure based on mixed data and the unlabeled target domain data, wherein the mixed data includes the labeled source domain data and the labeled target domain data;
step 3: performing speech synthesis front-end text analysis on the resource-limited language by using the trained neural network structure.
Further, in step 1, the proportion of the source domain data in the training data is 55%-65%, the proportion of the labeled target domain data is 8%-12%, and the proportion of the unlabeled target domain data is 27%-33%.
Further, in step 2, the neural network structure includes a feature extractor and a classifier, wherein the classifier immediately follows the feature extractor.
Further, the feature extractor includes a Transformer encoder.
Further, the classifier includes a fully connected layer, a softmax layer, and a CRF layer.
Further, step 2 (training the neural network structure based on the mixed data and the unlabeled target domain data) performs the following steps:
step S21: inputting the mixed data into the neural network structure to perform supervised learning and synchronously update the network parameters of the feature extractor and the classifier;
step S22: simultaneously inputting the mixed data and the unlabeled target domain data into the neural network structure to perform semi-supervised learning and update only the network parameters of the classifier.
Further, in step S21, when the mixed data is input into the neural network structure, the features output by the feature extractor are input to the classifier once, without a dropout strategy, so that discriminative features are learned.
Further, in step S22, the mixed data and the unlabeled target domain data are input into the neural network structure simultaneously to perform semi-supervised learning and update only the network parameters of the classifier, by the following steps:
step S221: when the mixed data and the unlabeled target domain data are input into the neural network structure simultaneously, the features output by the feature extractor are input to the classifier twice, and a dropout strategy drops different network nodes in each pass, which is equivalent to sampling a first classifier network and a second classifier network;
step S222: updating only the network parameters of the classifier by maximizing the KL divergence between the output probabilities of the first classifier network and the second classifier network.
Further, step 2 (training the neural network structure based on the mixed data and the unlabeled target domain data) further performs the following steps:
step S23: during training of the neural network structure, the classifier and the feature extractor oppose each other, and their network parameters are updated interactively;
step S24: inputting the unlabeled target domain data into the neural network structure and updating the network parameters of the feature extractor by minimizing the KL divergence between the output probabilities of the first classifier network and the second classifier network.
The method for front-end text analysis in speech synthesis of a resource-limited language provided by the embodiment of the invention has the following beneficial effects: only a small amount of labeled data in the resource-limited language is needed, and its quality is easier to control. In addition, from a semi-supervised learning point of view, a small amount of labeled resource-limited language data is added in the training process, so that prior knowledge of the feature distribution of the resource-limited language data can be learned while the feature distribution of the resource-rich language data is learned, avoiding the drawback that unsupervised domain adaptation methods cannot take the feature distribution of the resource-limited language data into account.
The invention also provides a system for front-end text analysis in speech synthesis of a resource-limited language, comprising:
a training data acquisition module, configured to acquire training data, wherein the training data comprises source domain data, labeled target domain data, and unlabeled target domain data; the source domain data comprises text data of a resource-rich language, and the target domain data comprises text data of the resource-limited language;
a neural network training module, configured to train a neural network structure based on mixed data and the unlabeled target domain data, wherein the mixed data comprises the labeled source domain data and the labeled target domain data;
and a front-end text analysis module, configured to perform speech synthesis front-end text analysis on the resource-limited language by using the trained neural network structure.
The system for front-end text analysis in speech synthesis of a resource-limited language provided by the embodiment of the invention has the following beneficial effects: only a small amount of labeled data in the resource-limited language is needed, and its quality is easier to control. In addition, from a semi-supervised learning point of view, the neural network training module adds a small amount of labeled resource-limited language data in the training process, so that prior knowledge of the feature distribution of the resource-limited language data can be learned while the feature distribution of the resource-rich language data is learned. This avoids the drawback that unsupervised domain adaptation cannot take the feature distribution of the resource-limited language into account, and the semi-supervised domain adaptation technique thereby solves the problem of data resource shortage in speech synthesis front-end text analysis for resource-limited languages.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims thereof as well as the appended drawings.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:
FIG. 1 is a flow chart of a method for analyzing a front-end text of a speech synthesis of a resource-constrained language in an embodiment of the invention;
FIG. 2 is a block diagram of a system for front-end text analysis for speech synthesis in a resource-constrained language in accordance with an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.
The embodiment of the invention provides a method for front-end text analysis in speech synthesis of a resource-limited language, as shown in fig. 1, comprising the following steps:
step 1: acquiring training data, wherein the training data comprises source domain data, labeled target domain data, and unlabeled target domain data; the source domain data comprises text data of a resource-rich language, and the target domain data comprises text data of the resource-limited language;
step 2: training a neural network structure based on mixed data and the unlabeled target domain data, wherein the mixed data includes the labeled source domain data and the labeled target domain data;
step 3: performing speech synthesis front-end text analysis on the resource-limited language by using the trained neural network structure.
The working principle of the technical scheme is as follows: the resource-rich language may be, for example, Mandarin Chinese, and the resource-limited language may be, for example, a Chinese domestic dialect.
Based on semi-supervised learning, a small amount of labeled resource-limited language data is added to the resource-rich language data, so that the neural network can learn some prior knowledge of the feature distribution of the resource-limited language data and thereby improve label prediction accuracy on that data. Specifically, training data composed of source domain data, labeled target domain data, and unlabeled target domain data is first acquired; the neural network structure is then trained on the mixed data (the labeled source domain data plus the labeled target domain data) together with the unlabeled target domain data; finally, the trained neural network structure performs speech synthesis front-end text analysis on the resource-limited language.
The beneficial effects of the technical scheme are as follows: only a small amount of labeled data in the resource-limited language is needed, and its quality is easier to control. In addition, from a semi-supervised learning point of view, a small amount of labeled resource-limited language data is added in the training process, so that prior knowledge of the feature distribution of the resource-limited language data can be learned while the feature distribution of the resource-rich language data is learned, avoiding the drawback that unsupervised domain adaptation methods cannot take the feature distribution of the resource-limited language into account.
In one embodiment, in step 1, the proportion of the source domain data in the training data is 55%-65%, the proportion of the labeled target domain data is 8%-12%, and the proportion of the unlabeled target domain data is 27%-33%.
The working principle of the technical scheme is as follows: as an example and not by way of limitation, the proportion of source domain data is 60%, that of labeled target domain data is 10%, and that of unlabeled target domain data is 30%.
The beneficial effects of the technical scheme are as follows: these proportions of source domain data, labeled target domain data, and unlabeled target domain data in the training data ensure that a small amount of labeled resource-limited language data is added in the training process, so that prior knowledge of the feature distribution of the resource-limited language data can be learned while the feature distribution of the resource-rich language data is learned.
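As a concrete illustration of the 60%/10%/30% example split, the following minimal sketch assembles a training set at those proportions. This is not part of the patent: the function name, the fixed total, and the choice of sampling with replacement are illustrative assumptions.

```python
import random

def assemble_training_data(source, labeled_target, unlabeled_target,
                           ratios=(0.60, 0.10, 0.30), total=100, seed=0):
    """Sample a training set at the example ratio from the embodiment:
    60% source domain, 10% labeled target domain, 30% unlabeled target domain.
    (Illustrative sketch; the patent does not prescribe this sampling code.)"""
    rng = random.Random(seed)
    n_src = int(total * ratios[0])
    n_lab = int(total * ratios[1])
    n_unl = total - n_src - n_lab  # remainder, here 30% of the total
    # Sampling with replacement, since the resource-limited corpus may be small.
    return (rng.choices(source, k=n_src),
            rng.choices(labeled_target, k=n_lab),
            rng.choices(unlabeled_target, k=n_unl))

src, lab, unl = assemble_training_data(["s1", "s2"], ["t1"], ["u1", "u2"])
```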
In one embodiment, in the step 2, the neural network structure includes a feature extractor and a classifier, wherein the classifier immediately follows the feature extractor.
The working principle of the technical scheme is as follows: the feature extractor includes a Transformer encoder, which can fully extract contextual feature information from the text.
The classifier includes a fully connected layer, a softmax layer, and a CRF layer. The classifier serves two purposes. First, as a conventional classifier, its network parameters are trained on the mixed data. Second, as a discriminator on the unlabeled target domain data: the features of the target domain data, after passing through the feature extractor, are input to the classifier network twice; dropout drops different network nodes in the fully connected layer in each pass, and the similarity between the two resulting output probabilities is measured by the Kullback-Leibler divergence (KL divergence).
The beneficial effects of the technical scheme are as follows: a specific neural network structure is provided. The feature extractor uses a Transformer encoder, which can fully extract contextual feature information from the text, and the classifier, comprising a fully connected layer, a softmax layer, and a CRF layer, can serve both as a conventional classifier and as a discriminator.
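A minimal PyTorch sketch of this structure — a Transformer-encoder feature extractor immediately followed by a classifier head — might look as follows. All sizes are illustrative, and the CRF layer is omitted for brevity (only the fully connected and softmax layers are modeled), so this is a hedged sketch under stated assumptions rather than the patented implementation.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    """Transformer encoder extracting contextual features from token ids."""
    def __init__(self, vocab_size=1000, d_model=64, nhead=4, num_layers=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, d_model)
        layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, token_ids):
        return self.encoder(self.embed(token_ids))

class Classifier(nn.Module):
    """Fully connected layer + softmax; the patent's CRF layer is omitted here."""
    def __init__(self, d_model=64, num_labels=8, p_drop=0.3):
        super().__init__()
        self.dropout = nn.Dropout(p_drop)
        self.fc = nn.Linear(d_model, num_labels)

    def forward(self, features, use_dropout=False):
        # Dropout is switched on only for the discriminator role (step S22).
        h = self.dropout(features) if use_dropout else features
        return torch.softmax(self.fc(h), dim=-1)

extractor, classifier = FeatureExtractor(), Classifier()
tokens = torch.randint(0, 1000, (2, 5))   # batch of 2 sequences, length 5
probs = classifier(extractor(tokens))     # per-token label probabilities
```

The `use_dropout` flag reflects the patent's distinction between the single no-dropout supervised pass and the two dropout passes of the discriminator role.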
In one embodiment, step 2 (training the neural network structure based on the mixed data and the unlabeled target domain data) performs the following steps:
step S21: inputting the mixed data into the neural network structure to perform supervised learning and synchronously update the network parameters of the feature extractor and the classifier;
step S22: simultaneously inputting the mixed data and the unlabeled target domain data into the neural network structure to perform semi-supervised learning and update only the network parameters of the classifier.
The working principle of the technical scheme is as follows: first, the mixed data is input and supervised learning is performed, training the parameters of the feature extractor and the classifier synchronously; the aim is to classify the mixed data accurately and thereby obtain discriminative features. Then, the mixed data (all labeled) and the unlabeled target domain data are input simultaneously, and only the parameters of the classifier are updated.
The beneficial effects of the technical scheme are as follows: specific steps are provided for training a neural network structure based on the hybrid data and the unlabeled target domain data.
In one embodiment, in step S21, when the mixed data is input into the neural network structure, the features output by the feature extractor are input to the classifier once, without a dropout strategy, so that discriminative features are learned.
The working principle of the technical scheme is as follows: when the mixed data is input, the output of the feature extractor is passed to the classifier once, and dropout is not applied, so that discriminative features are learned.
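The supervised pass of step S21 can be sketched as follows. The linear stand-ins for the feature extractor and classifier, the batch shapes, and the use of cross-entropy loss are illustrative assumptions, not taken from the patent.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
extractor = nn.Linear(16, 8)    # stand-in for the Transformer-encoder extractor
classifier = nn.Linear(8, 4)    # stand-in for the FC+softmax(+CRF) classifier
optimizer = torch.optim.SGD(
    list(extractor.parameters()) + list(classifier.parameters()), lr=0.1)

# Mixed batch: labeled source-domain plus labeled target-domain samples.
x_mixed = torch.randn(32, 16)
y_mixed = torch.randint(0, 4, (32,))

# Step S21: a single pass with no dropout; the parameters of BOTH the
# feature extractor and the classifier are updated synchronously.
logits = classifier(extractor(x_mixed))
loss = F.cross_entropy(logits, y_mixed)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```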
The beneficial effects of the technical scheme are as follows: specific methods of inputting the hybrid data in a neural network structure to supervise learning the neural network structure are provided.
In one embodiment, in step S22, the mixed data and the unlabeled target domain data are input into the neural network structure simultaneously to perform semi-supervised learning and update only the network parameters of the classifier, by the following steps:
step S221: when the mixed data and the unlabeled target domain data are input into the neural network structure simultaneously, the features output by the feature extractor are input to the classifier twice, and a dropout strategy drops different network nodes in each pass, which is equivalent to sampling a first classifier network and a second classifier network;
step S222: updating only the network parameters of the classifier by maximizing the KL divergence between the output probabilities of the first classifier network and the second classifier network.
The working principle of the technical scheme is as follows: the unlabeled target domain data is input, and the features output by the feature extractor are passed to the classifier twice; dropping different network nodes in each pass is equivalent to sampling two classifier networks, a first classifier network C1 and a second classifier network C2. The KL divergence between the two output probabilities is then maximized. In this way the classifier can detect target domain data near the decision boundary, and different neural nodes in the classifier learn more diverse feature representations.
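The divergence measure used by this discriminator role can be illustrated with a small standalone sketch. The probability vectors below are invented for illustration; in practice they would be the softmax outputs of the two dropout passes.

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete label distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Two forward passes of the same classifier with different dropout masks behave
# like two sampled networks C1 and C2 (these probabilities are illustrative).
p_c1 = [0.7, 0.2, 0.1]
p_c2 = [0.4, 0.4, 0.2]

divergence = kl_divergence(p_c1, p_c2)
# Updating only the classifier to MAXIMIZE this divergence (i.e., minimize its
# negative) pushes C1 and C2 apart on samples near the decision boundary.
classifier_loss = -divergence
```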
The beneficial effects of the technical scheme are as follows: specific steps are provided for simultaneously inputting hybrid data and unlabeled target domain data in a neural network structure to perform semi-supervised learning of the neural network structure.
In one embodiment, step 2 (training the neural network structure based on the mixed data and the unlabeled target domain data) further performs the following steps:
step S23: during training of the neural network structure, the classifier and the feature extractor oppose each other, and their network parameters are updated interactively;
step S24: inputting the unlabeled target domain data into the neural network structure and updating the network parameters of the feature extractor by minimizing the KL divergence between the output probabilities of the first classifier network and the second classifier network.
The working principle of the technical scheme is as follows: as described above, two classifier networks are sampled and the KL divergence between their output probabilities is maximized, so that the classifier can detect target domain data near the decision boundary and different neural nodes in the classifier learn more diverse feature representations. At the same time, to escape these feature spaces, the feature extractor must generate more discriminative features; thus the classifier and the feature extractor oppose each other and are updated interactively during training.
In addition, the unlabeled target domain data is input, and the parameters of the feature extractor are updated by minimizing the KL divergence between the output probabilities of the first classifier network C1 and the second classifier network C2, yielding feature representations of the unlabeled target domain data that lie far from the decision boundary.
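The adversarial interplay described above can be sketched as one alternating update on an unlabeled batch: first the classifier maximizes the two-pass KL divergence, then the feature extractor minimizes it. The linear stand-in modules, dropout rate, optimizer, and batch are illustrative assumptions, not the patented implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
extractor = nn.Linear(16, 8)   # stand-in for the Transformer-encoder extractor
head = nn.Linear(8, 4)         # stand-in for the classifier's FC layer
drop = nn.Dropout(0.5)
opt_cls = torch.optim.SGD(head.parameters(), lr=0.1)
opt_ext = torch.optim.SGD(extractor.parameters(), lr=0.1)

def two_pass_kl(features):
    """Two dropout passes sample classifier networks C1 and C2; return the KL
    divergence between their output probabilities."""
    log_p1 = F.log_softmax(head(drop(features)), dim=-1)
    p2 = F.softmax(head(drop(features)), dim=-1)
    return F.kl_div(log_p1, p2, reduction="batchmean")

x_unlabeled = torch.randn(32, 16)   # unlabeled target-domain batch

# Steps S221-S222: update ONLY the classifier by MAXIMIZING the divergence
# (features detached so the extractor receives no gradient).
loss_cls = -two_pass_kl(extractor(x_unlabeled).detach())
opt_cls.zero_grad(); loss_cls.backward(); opt_cls.step()

# Step S24: update ONLY the feature extractor by MINIMIZING the divergence,
# pulling unlabeled target features away from the decision boundary.
loss_ext = two_pass_kl(extractor(x_unlabeled))
opt_ext.zero_grad(); loss_ext.backward(); opt_ext.step()
```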
The beneficial effects of the technical scheme are as follows: specific steps for training the neural network structure based on the hybrid data and the unlabeled target domain data are provided.
As shown in fig. 2, an embodiment of the present invention provides a system for front-end text analysis in speech synthesis of a resource-limited language, including:
a training data acquisition module 201, configured to acquire training data, where the training data includes source domain data, labeled target domain data, and unlabeled target domain data; the source domain data includes text data of a resource-rich language, and the target domain data includes text data of the resource-limited language;
a neural network training module 202, configured to train a neural network structure based on mixed data and the unlabeled target domain data, where the mixed data includes the labeled source domain data and the labeled target domain data;
a front-end text analysis module 203, configured to perform speech synthesis front-end text analysis on the resource-limited language by using the trained neural network structure.
The working principle of the technical scheme is as follows: the resource-rich language may be, for example, Mandarin Chinese, and the resource-limited language may be, for example, a Chinese domestic dialect.
Based on semi-supervised learning, a small amount of labeled resource-limited language data is added to the resource-rich language data, so that the neural network can learn some prior knowledge of the feature distribution of the resource-limited language data and thereby improve label prediction accuracy on that data. Specifically, the training data acquisition module 201 acquires training data composed of source domain data, labeled target domain data, and unlabeled target domain data; the neural network training module 202 trains the neural network structure on the mixed data (the labeled source domain data plus the labeled target domain data) together with the unlabeled target domain data; and the front-end text analysis module 203 performs speech synthesis front-end text analysis on the resource-limited language by using the trained neural network structure.
The beneficial effects of the technical scheme are as follows: only a small amount of labeled data in the resource-limited language is needed, and its quality is easier to control. In addition, from a semi-supervised learning point of view, the neural network training module adds a small amount of labeled resource-limited language data in the training process, so that prior knowledge of the feature distribution of the resource-limited language data can be learned while the feature distribution of the resource-rich language data is learned. This avoids the drawback that unsupervised domain adaptation cannot take the feature distribution of the resource-limited language into account, and the semi-supervised domain adaptation technique thereby solves the problem of data resource shortage in speech synthesis front-end text analysis for resource-limited languages.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (6)

1. A method for front-end text analysis in speech synthesis of a resource-limited language, characterized by comprising the following steps:
step 1: acquiring training data, wherein the training data comprises source domain data, labeled target domain data, and unlabeled target domain data; the source domain data comprises text data of a resource-rich language, and the target domain data comprises text data of the resource-limited language; the proportion of the source domain data in the training data is 55%-65%, the proportion of the labeled target domain data is 8%-12%, and the proportion of the unlabeled target domain data is 27%-33%;
step 2: training a neural network structure based on mixed data and the unlabeled target domain data, comprising: step S21: inputting the mixed data into the neural network structure to perform supervised learning and synchronously update the network parameters of a feature extractor and a classifier; step S22: simultaneously inputting the mixed data and the unlabeled target domain data into the neural network structure to perform semi-supervised learning and update only the network parameters of the classifier, comprising: step S221: when the mixed data and the unlabeled target domain data are input into the neural network structure simultaneously, the features output by the feature extractor are input to the classifier twice, and a dropout strategy drops different network nodes in each pass, which is equivalent to sampling a first classifier network and a second classifier network; step S222: updating only the network parameters of the classifier by maximizing the KL divergence between the output probabilities of the first classifier network and the second classifier network; wherein the mixed data comprises the labeled source domain data and the labeled target domain data; and the neural network structure comprises the feature extractor and the classifier, the classifier immediately following the feature extractor;
step 3: performing speech synthesis front-end text analysis on the resource-limited language by using the trained neural network structure.
2. The method of claim 1, wherein the feature extractor comprises a Transformer-based encoder.
3. The method of claim 1, wherein the classifier comprises a fully connected layer, a softmax layer, and a CRF layer.
4. The method of claim 1, wherein in step S21, when the mixed data is input into the neural network structure, the output features of the feature extractor are fed to the classifier once, without applying a dropout strategy, so as to learn discriminative features.
5. The method according to claim 1, wherein step 2, training the neural network structure based on the mixed data and the unlabeled target domain data, further comprises the following steps:
step S23: in the process of training the neural network structure, the classifier and the feature extractor oppose each other so as to interactively update the network parameters in the classifier and the feature extractor;
step S24: inputting unlabeled target domain data into the neural network structure, and updating the network parameters of the feature extractor by minimizing the KL divergence between the output probabilities of the first classifier network and the second classifier network.
6. A system for speech synthesis front-end text analysis of a resource-limited language, comprising:
a training data acquisition module, configured to acquire training data, wherein the training data comprises source domain data, labeled target domain data and unlabeled target domain data; the source domain data comprises text data of a resource-rich language, and the target domain data comprises text data of a resource-limited language; the source domain data accounts for 55%-65% of the training data, the labeled target domain data accounts for 8%-12%, and the unlabeled target domain data accounts for 27%-33%;
a neural network training module, configured to train a neural network structure based on mixed data and the unlabeled target domain data, wherein the training comprises: step S21: inputting the mixed data into the neural network structure to perform supervised learning and synchronously update the network parameters of a feature extractor and a classifier; step S22: simultaneously inputting the mixed data and the unlabeled target domain data into the neural network structure to perform semi-supervised learning and update only the network parameters of the classifier, comprising: step S221: when the mixed data and the unlabeled target domain data are input into the neural network structure, feeding the output features of the feature extractor into the classifier twice, with a dropout strategy dropping different network nodes on each pass so as to sample a first classifier network and a second classifier network; step S222: updating only the network parameters of the classifier by maximizing the KL divergence between the output probabilities of the first classifier network and the second classifier network; wherein the mixed data comprises the labeled source domain data and the labeled target domain data, and the neural network structure comprises the feature extractor and the classifier, the classifier immediately following the feature extractor;
a front-end text analysis module, configured to perform speech synthesis front-end text analysis on the resource-limited language by using the trained neural network structure.
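Claims 2 and 3 specify the network as a Transformer-based encoder (the feature extractor) followed by a classifier made of a fully connected layer, a softmax layer and a CRF layer. The following is only an illustrative NumPy sketch of that classifier head, not the patented implementation; the dimensions, the random weights and the all-zero CRF transition matrix are hypothetical stand-ins:

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def viterbi_decode(emissions, transitions):
    """CRF-style decoding. emissions: (T, K) per-token tag log-scores;
    transitions: (K, K) log-score of moving from tag i to tag j."""
    T, K = emissions.shape
    score = emissions[0].copy()
    backptr = np.zeros((T, K), dtype=int)
    for t in range(1, T):
        # best previous tag i for ending at tag j at time t
        cand = score[:, None] + transitions + emissions[t][None, :]
        backptr[t] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    path = [int(score.argmax())]
    for t in range(T - 1, 0, -1):
        path.append(int(backptr[t, path[-1]]))
    return path[::-1]

# Hypothetical encoder output: 6 tokens, 16-dim features, 4 tags.
features = rng.standard_normal((6, 16))
W = rng.standard_normal((16, 4)) * 0.1   # fully connected layer weights
b = np.zeros(4)
logits = features @ W + b                # fully connected layer
probs = softmax(logits)                  # softmax layer
transitions = np.zeros((4, 4))           # CRF transition scores (untrained: zeros)
tags = viterbi_decode(np.log(probs), transitions)
print(tags)
```

With an all-zero transition matrix the CRF decode reduces to the per-token argmax; a trained transition matrix is what lets the CRF layer enforce consistent tag sequences over the sentence.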
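Steps S221, S222 and S24 describe an adversarial scheme: dropout samples two classifier sub-networks, the classifier is updated to maximize the KL divergence between their output probabilities, and the feature extractor is updated to minimize it. The sketch below illustrates only the sign of the two updates; it uses hypothetical toy dimensions, fixed dropout masks (so the objective is deterministic) and finite-difference gradients in place of backpropagation, and is not the patent's implementation:

```python
import numpy as np

rng = np.random.default_rng(2)

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def kl(p, q, eps=1e-12):
    """KL(p || q) for categorical distributions."""
    return float(np.sum(p * (np.log(p + eps) - np.log(q + eps))))

def discrepancy(h, W, m1, m2):
    """KL divergence between the first and second dropout-sampled
    classifier networks (softmax over masked features times W)."""
    p1 = softmax((h * m1) @ W)  # first classifier network
    p2 = softmax((h * m2) @ W)  # second classifier network
    return kl(p1, p2)

def num_grad(f, x, eps=1e-5):
    """Central finite-difference gradient of f() with respect to array x."""
    g = np.zeros_like(x)
    for i in np.ndindex(x.shape):
        x[i] += eps; fp = f()
        x[i] -= 2 * eps; fm = f()
        x[i] += eps  # restore original value
        g[i] = (fp - fm) / (2 * eps)
    return g

h = rng.standard_normal((1, 6))          # stand-in for feature-extractor output
W = rng.standard_normal((6, 3))          # classifier weights, 3 output classes
m1 = (rng.random((1, 6)) >= 0.5) * 2.0   # two fixed inverted-dropout masks (rate 0.5)
m2 = (rng.random((1, 6)) >= 0.5) * 2.0

lr = 0.005
before = discrepancy(h, W, m1, m2)
# Step S222: the classifier ASCENDS the KL divergence (update W only).
W += lr * num_grad(lambda: discrepancy(h, W, m1, m2), W)
after_classifier = discrepancy(h, W, m1, m2)
# Step S24: the feature extractor DESCENDS the KL divergence (update h only).
h -= lr * num_grad(lambda: discrepancy(h, W, m1, m2), h)
after_extractor = discrepancy(h, W, m1, m2)
print(before, after_classifier, after_extractor)
```

In the patent the same tension is realized with backpropagation, and step S23 frames it as the classifier and feature extractor opposing each other, in the spirit of maximum-classifier-discrepancy and adversarial-dropout methods.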
CN202010858597.9A 2020-08-24 2020-08-24 Method and system for analyzing front-end text of voice synthesis of resource-limited language Active CN111949796B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010858597.9A CN111949796B (en) 2020-08-24 2020-08-24 Method and system for analyzing front-end text of voice synthesis of resource-limited language


Publications (2)

Publication Number Publication Date
CN111949796A CN111949796A (en) 2020-11-17
CN111949796B true CN111949796B (en) 2023-10-20

Family

ID=73359690

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010858597.9A Active CN111949796B (en) 2020-08-24 2020-08-24 Method and system for analyzing front-end text of voice synthesis of resource-limited language

Country Status (1)

Country Link
CN (1) CN111949796B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114239859B (en) * 2022-02-25 2022-07-08 Hangzhou Hikvision Digital Technology Co., Ltd. Power consumption data prediction method and device based on transfer learning and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107909101A (en) * 2017-11-10 2018-04-13 Tsinghua University Semi-supervised transfer learning character recognition method and system based on convolutional neural networks
CN108460134A (en) * 2018-03-06 2018-08-28 Yunnan University Text topic classification model and classification method based on multi-source-domain integrated transfer learning
CN109947086A (en) * 2019-04-11 2019-06-28 Tsinghua University Mechanical fault transfer diagnosis method and system based on adversarial learning
CN110148398A (en) * 2019-05-16 2019-08-20 Ping An Technology (Shenzhen) Co., Ltd. Training method, apparatus, device and storage medium for a speech synthesis model
CN110428818A (en) * 2019-08-09 2019-11-08 Institute of Automation, Chinese Academy of Sciences Low-resource multilingual speech recognition model and speech recognition method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190147854A1 (en) * 2017-11-16 2019-05-16 Microsoft Technology Licensing, Llc Speech Recognition Source to Target Domain Adaptation


Also Published As

Publication number Publication date
CN111949796A (en) 2020-11-17

Similar Documents

Publication Publication Date Title
CN106446045B (en) User portrait construction method and system based on dialogue interaction
CN111949796B (en) Method and system for analyzing front-end text of voice synthesis of resource-limited language
CN112800222B Multi-task-assisted extreme multi-label short text classification method using co-occurrence information
CN116416480B (en) Visual classification method and device based on multi-template prompt learning
CN110009025A Semi-supervised additive-noise autoencoder for speech lie detection
CN112364125A (en) Text information extraction system and method combining reading course learning mechanism
CN114254077A (en) Method for evaluating integrity of manuscript based on natural language
CN111984790B (en) Entity relation extraction method
CN117350286A (en) Natural language intention translation method oriented to intention driving data link network
CN117251562A (en) Text abstract generation method based on fact consistency enhancement
CN116595169A (en) Question-answer intention classification method for coal mine production field based on prompt learning
CN112233655A (en) Neural network training method for improving voice command word recognition performance
CN116304064A (en) Text classification method based on extraction
CN111063335B (en) End-to-end tone recognition method based on neural network
CN116306502A (en) Data annotation optimization system and method for BERT classification task
CN112035680B (en) Knowledge graph construction method of intelligent auxiliary learning machine
CN112015921A (en) Natural language processing method based on learning-assisted knowledge graph
CN114281966A (en) Question template generation method, question answering device and electronic equipment
CN112541339A (en) Knowledge extraction method based on random forest and sequence labeling model
CN114564942A (en) Text error correction method, storage medium and device for supervision field
CN113434669A (en) Natural language relation extraction method based on sequence marking strategy
CN116562275B (en) Automatic text summarization method combined with entity attribute diagram
CN114444506B (en) Relation triplet extraction method for fusing entity types
CN116386637B (en) Radar flight command voice instruction generation method and system
CN111599349B (en) Method and system for training language model

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant