CN112735376A - Self-learning platform

Self-learning platform

Info

Publication number
CN112735376A
Authority
CN
China
Prior art keywords
module
model
corpus data
self-learning platform
Prior art date
Legal status
Pending
Application number
CN202011595340.5A
Other languages
Chinese (zh)
Inventor
简仁贤
黄怀鋐
林长洲
Current Assignee
Emotibot Technologies Ltd
Original Assignee
Emotibot Technologies Ltd
Priority date
Filing date
Publication date
Application filed by Emotibot Technologies Ltd filed Critical Emotibot Technologies Ltd
Priority to CN202011595340.5A
Publication of CN112735376A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00: Speech synthesis; Text to speech systems
    • G10L 13/02: Methods for producing synthetic speech; Speech synthesisers
    • G10L 13/04: Details of speech synthesis systems, e.g. synthesiser structure or memory management
    • G10L 13/047: Architecture of speech synthesisers
    • G10L 13/08: Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention discloses a self-learning platform comprising an uploading module, a storage module and a model fine-tuning module. The uploading module allows a user to upload corpus data of polyphonic characters (polyphones); the storage module stores the uploaded corpus data; and the model fine-tuning module allows the user to call the stored corpus data to fine-tune a preset polyphone prediction model. The platform helps the user collect polyphone corpus data more conveniently and makes it easier to train the desired polyphone model.

Description

Self-learning platform
Technical Field
The invention belongs to the technical field of natural language processing, and particularly relates to a self-learning platform.
Background
In existing speech synthesis systems, accurate prediction of polyphonic characters (polyphones) plays a very important role: correct polyphone pronunciation greatly improves the user experience of a speech synthesis system. Current speech synthesis systems, however, still mispronounce polyphones, mainly because polyphone corpus data are too scarce and the polyphone prediction models are insufficiently trained.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a self-learning platform that overcomes the above shortcomings of the prior art: it helps a user collect polyphone corpus data more conveniently and makes it easier for the user to train the desired polyphone model.
In order to solve the above technical problem, the invention adopts the following technical scheme: a self-learning platform comprising an uploading module, a storage module and a model fine-tuning module;
the uploading module is used by the user to upload corpus data of polyphones;
the storage module is used to store the corpus data uploaded by the user;
and the model fine-tuning module is used by the user to call the stored corpus data to fine-tune a preset polyphone prediction model.
Further, the platform also comprises a data screening module, which removes duplicate corpus data and corpus data in a wrong format from the corpus data uploaded by the user.
Further, the platform also comprises a classification module, which allows the user to classify the stored corpus data as required.
Further, the classification module classifies the corpus data through the following steps:
inputting the corpus data into a pre-trained classification model, whose classification categories are preset according to the user's requirements;
and storing the results of each category into the storage module according to the classification results produced by the classification model.
Further, the model fine-tuning module is also configured to select the corpus data of one or more categories to fine-tune the preset polyphone prediction model.
Further, the polyphone prediction model is an ALBERT model, with a classification layer attached after the output layer of the ALBERT model.
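As an illustration of this structure, the following sketch builds an ALBERT encoder with a classification layer attached after its output layer, using the Hugging Face transformers library; the checkpoint name, the label inventory and all identifiers are illustrative assumptions and are not taken from the patent.

```python
# Sketch: ALBERT encoder with a classification layer mounted after its output layer,
# predicting the reading (pinyin + tone) of a polyphone at a given token position.
# Checkpoint name and label set below are illustrative assumptions.
import torch
from torch import nn
from transformers import AlbertModel

PINYIN_LABELS = ["zhang3", "chang2", "wei2", "wei4", "de1", "di4"]  # example label inventory

class PolyphonePredictor(nn.Module):
    def __init__(self, pretrained_name="albert-base-v2", num_labels=len(PINYIN_LABELS)):
        super().__init__()
        self.encoder = AlbertModel.from_pretrained(pretrained_name)
        # classification layer attached after the ALBERT output layer
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask, polyphone_index):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        # pick the hidden state at the polyphone's position and classify its reading
        target = hidden[torch.arange(hidden.size(0)), polyphone_index]
        return self.classifier(target)
```

In practice a Chinese ALBERT checkpoint and a full pinyin label inventory would be substituted; the sketch only shows where the additional classification layer sits relative to the encoder output.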
Further, the model fine-tuning module fine-tunes the polyphone prediction model through the following steps:
calling the open-source fine-tuning script of the ALBERT model;
and adjusting the parameters of the ALBERT model according to the corpus data in the storage module.
Further, the platform also comprises an importing module, a marking module and a broadcasting module;
the importing module is used by the user to import text data;
the marking module calls the polyphone prediction model fine-tuned by the model fine-tuning module to mark the pronunciation of the text data;
and the broadcasting module performs voice broadcasting of the text data marked by the marking module.
Further, when the storage module stores the uploaded corpus data, the storage module automatically generates a storage folder belonging to the user according to the ID of the user.
Further, the user uploads the polyphone corpus data by writing it into a txt document.
Further, the format of a corpus data entry written in the txt document is "Y1,Y2 XXXY1(X)XY2(X)XX", where Y1 and Y2 are polyphones, XXXY1XY2XX is a sentence, X stands for a non-polyphonic character, and (X) is the reading of the preceding polyphone.
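A minimal sketch of the data screening and format checking described above, assuming the txt line format of the preceding paragraph; the regular expression and helper names are illustrative assumptions rather than the patent's actual implementation.

```python
# Sketch: screen uploaded corpus lines by removing duplicates and entries whose format
# does not match "Y1,Y2 <sentence with (pinyin) after each listed polyphone>".
# The regular expression is one possible reading of the format, not taken from the patent.
import re

ENTRY_RE = re.compile(r"^(?P<polyphones>\S+(?:,\S+)*)\s+(?P<sentence>.+)$")

def is_well_formed(line: str) -> bool:
    m = ENTRY_RE.match(line.strip())
    if not m:
        return False
    sentence = m.group("sentence")
    # every declared polyphone must appear in the sentence followed by a parenthesised reading
    for p in m.group("polyphones").split(","):
        if not re.search(re.escape(p) + r"\([a-z]+[1-5]\)", sentence):
            return False
    return True

def screen(lines):
    """Drop duplicate and badly formatted corpus entries, preserving order."""
    seen, kept = set(), []
    for line in lines:
        line = line.strip()
        if line and line not in seen and is_well_formed(line):
            seen.add(line)
            kept.append(line)
    return kept

# Example: screen(["长 我长(zhang3)得高", "长 我长(zhang3)得高", "长 我长得高"])
# keeps only the first entry (the second is a duplicate, the third lacks the reading).
```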
Compared with the prior art, the invention has the following advantages: through the self-learning platform, the user can conveniently and quickly upload the collected polyphone corpus data to the storage module via the uploading module, and can use the polyphone corpus data accumulated in the storage module to train the polyphone prediction model whenever needed, thereby obtaining a customized polyphone prediction model. In addition, the self-learning platform allows the user to correct polyphone errors anytime and anywhere, making the whole optimization process of the polyphone prediction model faster and more convenient.
The technical solution of the present invention is further described in detail below with reference to the accompanying drawings and embodiments.
Drawings
Fig. 1 is a schematic block diagram of the circuit of the present invention.
Detailed Description
As shown in Fig. 1, the self-learning platform comprises an uploading module 1, a data screening module 8, a storage module 2, a classification module 3, a model fine-tuning module 4, an importing module 5, a marking module 6 and a broadcasting module 7.
The uploading module 1 is used by the user to upload polyphone corpus data; the user uploads the polyphone corpus data he or she has collected to the storage module through the uploading module 1. Specifically, the user writes the corpus data into a txt document, for example "长(zhang3)" in a sentence meaning "I am tall", or "为(wei4)" and "的(de1)" in a sentence meaning "because today's weather is gloomy"; the number after the pinyin in parentheses indicates the tone. To upload a piece of polyphone corpus data, it is first written into a txt document, and the txt document is then stored into the storage module through the uploading module 1; several pieces of polyphone corpus data can be written into one txt document. The uploading module 1 therefore makes it very convenient for the user to store collected corpus data into the storage module anytime and anywhere.
The storage module 2 is used to store the corpus data uploaded by the user. When the storage module stores the uploaded corpus data, it automatically generates a storage folder belonging to the user according to the user's ID. A number of subfolders are created under this storage folder, and each subfolder corresponds to one classification category of the classification module 3.
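A minimal sketch of this per-user storage layout, assuming a local file system backend; the root path, the function name and the example category names are illustrative assumptions.

```python
# Sketch: store an uploaded txt document under a per-user folder generated from the user ID,
# with one subfolder per classification category. Paths and category names are illustrative.
from pathlib import Path

CATEGORIES = ["conventional_corpus", "novel_corpus"]  # example categories

def store_upload(root: str, user_id: str, category: str, txt_name: str, content: str) -> Path:
    user_dir = Path(root) / user_id              # storage folder named after the user's ID
    for c in CATEGORIES:                         # one subfolder per classification category
        (user_dir / c).mkdir(parents=True, exist_ok=True)
    target = user_dir / category / txt_name
    target.write_text(content, encoding="utf-8")
    return target

# Example: store_upload("corpus_store", "user_42", "conventional_corpus", "batch1.txt", "长 我长(zhang3)得高\n")
```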
The classification module 3 is used by the user to classify the stored corpus data as required.
Specifically, the user can mark the corpus data stored in the storage module with classification information through the classification module 3, for example labeling corpus data used in ordinary dialogue scenarios as "conventional corpus" and corpus data used in e-book reading scenarios as "novel corpus".
The classification module 3 classifies the corpus data through the following steps:
Step 1, inputting the corpus data into a pre-trained classification model, whose classification categories are preset according to the user's requirements;
Step 2, storing the results of each category into the storage module according to the classification results produced by the classification model.
In this embodiment, the pre-trained classification model is a Latent Dirichlet Allocation (LDA) model.
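A minimal sketch of such an LDA-based classifier using scikit-learn; the library choice, the character-level tokenization and the mapping from LDA topics to the user-defined categories are assumptions made for illustration.

```python
# Sketch: classify corpus sentences into categories with Latent Dirichlet Allocation.
# Each LDA topic is mapped to one user-defined category; the library choice,
# character-level tokenization and topic-to-category mapping are illustrative assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

CATEGORY_NAMES = ["conventional_corpus", "novel_corpus"]  # example categories

def train_lda(sentences):
    # character n-grams avoid the need for a Chinese word segmenter in this sketch
    vectorizer = CountVectorizer(analyzer="char", ngram_range=(1, 2))
    X = vectorizer.fit_transform(sentences)
    lda = LatentDirichletAllocation(n_components=len(CATEGORY_NAMES), random_state=0)
    lda.fit(X)
    return vectorizer, lda

def classify(vectorizer, lda, sentence):
    # assign the sentence to the category whose topic has the highest probability
    topic_dist = lda.transform(vectorizer.transform([sentence]))[0]
    return CATEGORY_NAMES[topic_dist.argmax()]
```

Because LDA is unsupervised, a real deployment would still need a rule or a small labeled set to associate each discovered topic with a user-defined category.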
Through the classification module 3, the user can train the polyphone prediction model with corpus data of a specific type and thereby obtain a polyphone prediction model adapted to a specific scenario.
The model fine-tuning module 4 is used by the user to call the stored corpus data to fine-tune the preset polyphone prediction model.
By uploading the collected polyphone corpus data to the storage module via the uploading module 1, the user effectively builds a customized corpus, and can use the uploaded corpus to optimize the preset polyphone prediction model whenever needed, thereby obtaining a customized polyphone prediction model. The polyphone prediction model may include, but is not limited to, a maximum-entropy model, a TBL (transformation-based learning) model, a decision-tree model, a neural-network model, or the like.
It should be noted that the main polyphone prediction techniques at present include rule-based methods such as part-of-speech-based prediction, and machine-learning methods such as decision trees, conditional random fields (CRF), long short-term memory networks (LSTM) and Transformer neural networks, of which Transformer networks have been verified to perform best so far. Compared with existing polyphone prediction models, the present invention achieves a better prediction effect. The customized polyphone prediction model finally obtained can be called directly by an existing speech synthesis module for text reading or intelligent customer service, and the customized model improves the fluency and intelligibility of the synthesized speech for the listener.
The model fine-tuning module fine-tunes the polyphone prediction model through the following steps (a code sketch follows these steps):
Step 1, calling the open-source fine-tuning script of the ALBERT model and selecting a subfolder under the user's storage folder;
Step 2, adjusting the parameters of the ALBERT model according to the corpus data in the selected subfolder.
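The sketch below illustrates one way such fine-tuning could look, reusing the PolyphonePredictor sketched earlier; it is not the open-source ALBERT fine-tuning script referred to above, and the checkpoint, sample format and hyperparameters are illustrative assumptions.

```python
# Sketch: fine-tune the ALBERT-based polyphone predictor on user corpus data.
# This is an illustrative training loop, not the patent's open-source fine-tuning script.
import torch
from torch.optim import AdamW
from transformers import AutoTokenizer

def fine_tune(model, samples, pretrained_name="albert-base-v2", epochs=3, lr=2e-5):
    """samples: list of (sentence, character index of the polyphone, label id) triples."""
    tokenizer = AutoTokenizer.from_pretrained(pretrained_name)
    optimizer = AdamW(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for sentence, char_idx, label_id in samples:
            enc = tokenizer(sentence, return_tensors="pt", return_offsets_mapping=True)
            # map the polyphone's character index to its token index via the offset mapping
            token_idx = next(i for i, (s, e) in enumerate(enc["offset_mapping"][0].tolist())
                             if s <= char_idx < e)
            logits = model(enc["input_ids"], enc["attention_mask"],
                           torch.tensor([token_idx]))
            loss = loss_fn(logits, torch.tensor([label_id]))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```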
The importing module 5 is used by the user to import text data;
the marking module 6 calls the polyphone prediction model fine-tuned by the model fine-tuning module 4 to mark the pronunciation of the text data;
and the broadcasting module 7 performs voice broadcasting of the text data marked by the marking module 6.
It should be noted that the importing module 5 allows the user to import some text for testing. Specifically, the user imports text data through the importing module 5, the marking module 6 then calls the polyphone prediction model to mark the pronunciation of the text data, and after marking the broadcasting module 7 broadcasts the text by voice. By listening to the broadcast result, polyphones with wrong pronunciation marks can be found in time, and corpus data for new polyphones can be collected accordingly.
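A minimal sketch of this import-mark-broadcast test loop; `predict_pinyin` stands in for the fine-tuned polyphone prediction model and `speak` for an external text-to-speech engine, both of which are hypothetical placeholders rather than components defined by the patent.

```python
# Sketch of the import -> mark -> broadcast test loop.
# predict_pinyin() and speak() are hypothetical placeholders: the former stands for the
# fine-tuned polyphone prediction model, the latter for any text-to-speech backend.
from typing import Callable, List, Set, Tuple

def mark_text(text: str, polyphones: Set[str],
              predict_pinyin: Callable[[str, int], str]) -> List[Tuple[str, str]]:
    """Return (character, pinyin-or-empty) pairs; only polyphones get a predicted reading."""
    return [(ch, predict_pinyin(text, i) if ch in polyphones else "")
            for i, ch in enumerate(text)]

def test_loop(text: str, polyphones: Set[str], predict_pinyin, speak) -> None:
    marked = mark_text(text, polyphones, predict_pinyin)
    annotated = "".join(ch + (f"({py})" if py else "") for ch, py in marked)
    print(annotated)   # lets the user inspect the pronunciation marks
    speak(annotated)   # voice broadcast; mispronounced polyphones are then collected as new corpus data
```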
In use, suppose the user finds that, in a sentence meaning "because today's weather is gloomy", a polyphone that should be read "wei4" is pronounced "wei2" and another that should be read "de4" is pronounced "di1", and wants to correct these errors. The user then enters the sentence in the required format, which must contain the polyphones to be corrected and a sentence containing them with their correct readings, such as "(wei4)" and "(de4)", writes it into a txt document, and uploads the document to the self-learning platform. The storage module automatically creates a storage folder exclusive to the user and writes the user's ID into the name field of the folder; the new txt documents uploaded by the user are then stored in this folder and can be called at any time when the user needs the corpus data for model training. The training process is in fact a process of continuously optimizing the polyphone prediction model with the corpus data collected by the user, and as more and more corpus data are collected, the accuracy of the polyphone prediction model becomes higher and higher.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and all simple modifications, changes and equivalent structural changes made to the above embodiment according to the technical spirit of the present invention still fall within the protection scope of the technical solution of the present invention.

Claims (10)

1. A self-learning platform, comprising:
the uploading module is used for uploading the corpus data of the polyphones;
the storage module is used for storing the corpus data;
and the model fine-tuning module is used for calling the corpus data to perform fine tuning on a preset polyphone prediction model.
2. The self-learning platform of claim 1, further comprising:
and the data screening module is used for removing repeated corpus data and corpus data with wrong format in the corpus data.
3. The self-learning platform of claim 1 or 2, further comprising:
and the classification module is used for classifying the corpus data as required.
4. The self-learning platform of claim 3,
the classification module is further configured to input the corpus data into a pre-trained classification model, and to send the classification results of the corpus data produced by the classification model to the storage module; the classification categories of the classification model are preset according to the user's requirements.
5. The self-learning platform of claim 4,
and the model fine-tuning module is further configured to select the corpus data of one or more categories to fine-tune the preset polyphone prediction model.
6. The self-learning platform of claim 1, wherein the polyphone prediction model is an ALBERT model, and a classification layer is attached after the output layer of the ALBERT model.
7. The self-learning platform of claim 6, wherein
the model fine-tuning module is further configured to call the open-source fine-tuning script of the ALBERT model and to adjust the parameters of the ALBERT model according to the corpus data.
8. The self-learning platform of claim 1 or 2, further comprising:
the import module is used for importing the text data;
the marking module is used for calling the polyphone prediction model fine-tuned by the model fine-tuning module to mark the pronunciation of the text data;
and the broadcasting module is used for carrying out voice broadcasting on the text data marked by the marking module.
9. The self-learning platform of claim 1 or 2,
and the storage module is also used for automatically generating a storage folder belonging to the user according to the ID of the user.
10. The self-learning platform of claim 9, wherein the format of the corpus data is "Y1,Y2 XXXY1(X)XY2(X)XX", wherein Y1 and Y2 are polyphones, XXXY1XY2XX is a sentence, X is a non-polyphonic character, and (X) is the reading of the preceding polyphone.
Application CN202011595340.5A, filed 2020-12-29 (priority 2020-12-29): Self-learning platform, published as CN112735376A (en), status Pending

Priority Applications (1)

CN202011595340.5A, priority date 2020-12-29, filing date 2020-12-29, Self-learning platform (published as CN112735376A (en))

Publications (1)

CN112735376A, published 2021-04-30

Family

ID=75611289

Family Applications (1)

CN202011595340.5A, Self-learning platform, priority date 2020-12-29, filing date 2020-12-29, CN112735376A (en), Pending

Country Status (1)

Country Link
CN (1) CN112735376A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1811912A (en) * 2005-01-28 2006-08-02 北京捷通华声语音技术有限公司 Minor sound base phonetic synthesis method
CN104992704A (en) * 2015-07-15 2015-10-21 百度在线网络技术(北京)有限公司 Speech synthesizing method and device
CN105336322A (en) * 2015-09-30 2016-02-17 百度在线网络技术(北京)有限公司 Polyphone model training method, and speech synthesis method and device
CN111599340A (en) * 2020-07-27 2020-08-28 南京硅基智能科技有限公司 Polyphone pronunciation prediction method and device and computer readable storage medium
WO2020238061A1 (en) * 2019-05-28 2020-12-03 平安科技(深圳)有限公司 Natural language classification method and apparatus, computer device, and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination