CN112735376A - Self-learning platform

Self-learning platform

Info

Publication number
CN112735376A
Authority
CN
China
Prior art keywords
module
model
corpus data
self-learning platform
Prior art date
Legal status
Pending
Application number
CN202011595340.5A
Other languages
Chinese (zh)
Inventor
简仁贤
黄怀鋐
林长洲
Current Assignee
Emotibot Technologies Ltd
Original Assignee
Emotibot Technologies Ltd
Priority date
Filing date
Publication date
Application filed by Emotibot Technologies Ltd filed Critical Emotibot Technologies Ltd
Priority to CN202011595340.5A
Publication of CN112735376A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 13/00: Speech synthesis; Text to speech systems
    • G10L 13/02: Methods for producing synthetic speech; Speech synthesisers
    • G10L 13/04: Details of speech synthesis systems, e.g. synthesiser structure or memory management
    • G10L 13/047: Architecture of speech synthesisers
    • G10L 13/08: Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The invention discloses a self-learning platform comprising an uploading module, a storage module and a model fine-tuning module. The uploading module allows a user to upload corpus data of polyphonic characters (polyphones); the storage module stores the uploaded corpus data; and the model fine-tuning module allows the user to call the stored corpus data to fine-tune a preset polyphone prediction model. The platform helps the user collect polyphone corpus data more conveniently and makes it easier to train the desired polyphone model.

Description

Self-learning platform
Technical Field
The invention belongs to the technical field of natural language processing, and particularly relates to a self-learning platform.
Background
In existing speech synthesis systems, accurate prediction of polyphonic characters (polyphones) plays a very important role: correct polyphone pronunciation greatly improves the user experience of a speech synthesis system. Current speech synthesis systems, however, still mispronounce polyphones, mainly because polyphone corpus data are too scarce and the polyphone prediction models are insufficiently trained.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a self-learning platform that overcomes the above shortcomings of the prior art: it helps a user collect polyphone corpus data more conveniently and makes it easier for the user to train the desired polyphone model.
In order to solve the above technical problem, the invention adopts the following technical scheme: a self-learning platform comprising an uploading module, a storage module and a model fine-tuning module;
the uploading module is used by the user to upload corpus data of polyphones;
the storage module is used to store the corpus data uploaded by the user;
and the model fine-tuning module is used by the user to call the stored corpus data to fine-tune a preset polyphone prediction model.
Further, the platform also comprises a data screening module, which removes duplicate corpus data and corpus data in a wrong format from the corpus data uploaded by the user.
Further, the platform also comprises a classification module, which allows the user to classify the stored corpus data as required.
Further, the classification module classifies the corpus data through the following steps:
inputting the corpus data into a pre-trained classification model, whose classification categories are preset according to the user's requirements;
and storing the results of each category into the storage module according to the classification results produced by the classification model.
Further, the model fine-tuning module is also configured to select the corpus data of one or more categories to fine-tune the preset polyphone prediction model.
Further, the polyphone prediction model is an ALBERT model, with a classification layer attached after the output layer of the ALBERT model.
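As an illustration of this structure, the following sketch builds an ALBERT encoder with a classification layer attached after its output layer, using the Hugging Face transformers library; the checkpoint name, the label inventory and all identifiers are illustrative assumptions and are not taken from the patent.

```python
# Sketch: ALBERT encoder with a classification layer mounted after its output layer,
# predicting the reading (pinyin + tone) of a polyphone at a given token position.
# Checkpoint name and label set below are illustrative assumptions.
import torch
from torch import nn
from transformers import AlbertModel

PINYIN_LABELS = ["zhang3", "chang2", "wei2", "wei4", "de1", "di4"]  # example label inventory

class PolyphonePredictor(nn.Module):
    def __init__(self, pretrained_name="albert-base-v2", num_labels=len(PINYIN_LABELS)):
        super().__init__()
        self.encoder = AlbertModel.from_pretrained(pretrained_name)
        # classification layer attached after the ALBERT output layer
        self.classifier = nn.Linear(self.encoder.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask, polyphone_index):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        # pick the hidden state at the polyphone's position and classify its reading
        target = hidden[torch.arange(hidden.size(0)), polyphone_index]
        return self.classifier(target)
```

In practice a Chinese ALBERT checkpoint and a full pinyin label inventory would be substituted; the sketch only shows where the additional classification layer sits relative to the encoder output.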
Further, the model fine-tuning module fine-tunes the polyphone prediction model through the following steps:
calling the open-source fine-tuning script of the ALBERT model;
and adjusting the parameters of the ALBERT model according to the corpus data in the storage module.
Further, the platform also comprises an importing module, a marking module and a broadcasting module;
the importing module is used by the user to import text data;
the marking module calls the polyphone prediction model fine-tuned by the model fine-tuning module to mark the pronunciation of the text data;
and the broadcasting module performs voice broadcasting of the text data marked by the marking module.
Further, when the storage module stores the uploaded corpus data, the storage module automatically generates a storage folder belonging to the user according to the ID of the user.
Further, the user uploads the polyphone corpus data by writing it into a txt document.
Further, the format of a corpus data entry written in the txt document is "Y1,Y2 XXXY1(X)XY2(X)XX", where Y1 and Y2 are polyphones, XXXY1XY2XX is a sentence, X stands for a non-polyphonic character, and (X) is the reading of the preceding polyphone.
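A minimal sketch of the data screening and format checking described above, assuming the txt line format of the preceding paragraph; the regular expression and helper names are illustrative assumptions rather than the patent's actual implementation.

```python
# Sketch: screen uploaded corpus lines by removing duplicates and entries whose format
# does not match "Y1,Y2 <sentence with (pinyin) after each listed polyphone>".
# The regular expression is one possible reading of the format, not taken from the patent.
import re

ENTRY_RE = re.compile(r"^(?P<polyphones>\S+(?:,\S+)*)\s+(?P<sentence>.+)$")

def is_well_formed(line: str) -> bool:
    m = ENTRY_RE.match(line.strip())
    if not m:
        return False
    sentence = m.group("sentence")
    # every declared polyphone must appear in the sentence followed by a parenthesised reading
    for p in m.group("polyphones").split(","):
        if not re.search(re.escape(p) + r"\([a-z]+[1-5]\)", sentence):
            return False
    return True

def screen(lines):
    """Drop duplicate and badly formatted corpus entries, preserving order."""
    seen, kept = set(), []
    for line in lines:
        line = line.strip()
        if line and line not in seen and is_well_formed(line):
            seen.add(line)
            kept.append(line)
    return kept

# Example: screen(["长 我长(zhang3)得高", "长 我长(zhang3)得高", "长 我长得高"])
# keeps only the first entry (the second is a duplicate, the third lacks the reading).
```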
Compared with the prior art, the invention has the following advantages: through the self-learning platform, the user can conveniently and quickly upload the collected polyphone corpus data to the storage module via the uploading module, and can use the polyphone corpus data accumulated in the storage module to train the polyphone prediction model whenever needed, thereby obtaining a customized polyphone prediction model. In addition, the self-learning platform allows the user to correct polyphone errors anytime and anywhere, making the whole optimization process of the polyphone prediction model faster and more convenient.
The technical solution of the present invention is further described in detail below with reference to the accompanying drawings and embodiments.
Drawings
Fig. 1 is a schematic block diagram of the circuit of the present invention.
Detailed Description
As shown in Fig. 1, the self-learning platform comprises an uploading module 1, a data screening module 8, a storage module 2, a classification module 3, a model fine-tuning module 4, an importing module 5, a marking module 6 and a broadcasting module 7.
The uploading module 1 is used by the user to upload polyphone corpus data; the user uploads the polyphone corpus data he or she has collected to the storage module through the uploading module 1. Specifically, the user writes the corpus data into a txt document, for example "长(zhang3)" in a sentence meaning "I am tall", or "为(wei4)" and "的(de1)" in a sentence meaning "because today's weather is gloomy"; the number after the pinyin in parentheses indicates the tone. To upload a piece of polyphone corpus data, it is first written into a txt document, and the txt document is then stored into the storage module through the uploading module 1; several pieces of polyphone corpus data can be written into one txt document. The uploading module 1 therefore makes it very convenient for the user to store collected corpus data into the storage module anytime and anywhere.
The storage module 2 is used to store the corpus data uploaded by the user. When the storage module stores the uploaded corpus data, it automatically generates a storage folder belonging to the user according to the user's ID. A number of subfolders are created under this storage folder, and each subfolder corresponds to one classification category of the classification module 3.
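A minimal sketch of this per-user storage layout, assuming a local file system backend; the root path, the function name and the example category names are illustrative assumptions.

```python
# Sketch: store an uploaded txt document under a per-user folder generated from the user ID,
# with one subfolder per classification category. Paths and category names are illustrative.
from pathlib import Path

CATEGORIES = ["conventional_corpus", "novel_corpus"]  # example categories

def store_upload(root: str, user_id: str, category: str, txt_name: str, content: str) -> Path:
    user_dir = Path(root) / user_id              # storage folder named after the user's ID
    for c in CATEGORIES:                         # one subfolder per classification category
        (user_dir / c).mkdir(parents=True, exist_ok=True)
    target = user_dir / category / txt_name
    target.write_text(content, encoding="utf-8")
    return target

# Example: store_upload("corpus_store", "user_42", "conventional_corpus", "batch1.txt", "长 我长(zhang3)得高\n")
```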
The classification module 3 is used by the user to classify the stored corpus data as required.
Specifically, the user can mark the corpus data stored in the storage module with classification information through the classification module 3, for example labeling corpus data used in ordinary dialogue scenarios as "conventional corpus" and corpus data used in e-book reading scenarios as "novel corpus".
The classification module 3 classifies the corpus data through the following steps:
Step 1, inputting the corpus data into a pre-trained classification model, whose classification categories are preset according to the user's requirements;
Step 2, storing the results of each category into the storage module according to the classification results produced by the classification model.
In this embodiment, the pre-trained classification model is a Latent Dirichlet Allocation (LDA) model.
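A minimal sketch of such an LDA-based classifier using scikit-learn; the library choice, the character-level tokenization and the mapping from LDA topics to the user-defined categories are assumptions made for illustration.

```python
# Sketch: classify corpus sentences into categories with Latent Dirichlet Allocation.
# Each LDA topic is mapped to one user-defined category; the library choice,
# character-level tokenization and topic-to-category mapping are illustrative assumptions.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

CATEGORY_NAMES = ["conventional_corpus", "novel_corpus"]  # example categories

def train_lda(sentences):
    # character n-grams avoid the need for a Chinese word segmenter in this sketch
    vectorizer = CountVectorizer(analyzer="char", ngram_range=(1, 2))
    X = vectorizer.fit_transform(sentences)
    lda = LatentDirichletAllocation(n_components=len(CATEGORY_NAMES), random_state=0)
    lda.fit(X)
    return vectorizer, lda

def classify(vectorizer, lda, sentence):
    # assign the sentence to the category whose topic has the highest probability
    topic_dist = lda.transform(vectorizer.transform([sentence]))[0]
    return CATEGORY_NAMES[topic_dist.argmax()]
```

Because LDA is unsupervised, a real deployment would still need a rule or a small labeled set to associate each discovered topic with a user-defined category.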
Through the classification module 3, the user can train the polyphone prediction model with corpus data of a specific type and thereby obtain a polyphone prediction model adapted to a specific scenario.
The model fine-tuning module 4 is used by the user to call the stored corpus data to fine-tune the preset polyphone prediction model.
By uploading the collected polyphone corpus data to the storage module via the uploading module 1, the user effectively builds a customized corpus, and can use the uploaded corpus to optimize the preset polyphone prediction model whenever needed, thereby obtaining a customized polyphone prediction model. The polyphone prediction model may include, but is not limited to, a maximum-entropy model, a TBL (transformation-based learning) model, a decision-tree model, a neural-network model, or the like.
It should be noted that the main polyphone prediction techniques at present include rule-based methods such as part-of-speech-based prediction, and machine-learning methods such as decision trees, conditional random fields (CRF), long short-term memory networks (LSTM) and Transformer neural networks, of which Transformer networks have been verified to perform best so far. Compared with existing polyphone prediction models, the present invention achieves a better prediction effect. The customized polyphone prediction model finally obtained can be called directly by an existing speech synthesis module for text reading or intelligent customer service, and the customized model improves the fluency and intelligibility of the synthesized speech for the listener.
The model fine-tuning module fine-tunes the polyphone prediction model through the following steps (a code sketch follows these steps):
Step 1, calling the open-source fine-tuning script of the ALBERT model and selecting a subfolder under the user's storage folder;
Step 2, adjusting the parameters of the ALBERT model according to the corpus data in the selected subfolder.
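The sketch below illustrates one way such fine-tuning could look, reusing the PolyphonePredictor sketched earlier; it is not the open-source ALBERT fine-tuning script referred to above, and the checkpoint, sample format and hyperparameters are illustrative assumptions.

```python
# Sketch: fine-tune the ALBERT-based polyphone predictor on user corpus data.
# This is an illustrative training loop, not the patent's open-source fine-tuning script.
import torch
from torch.optim import AdamW
from transformers import AutoTokenizer

def fine_tune(model, samples, pretrained_name="albert-base-v2", epochs=3, lr=2e-5):
    """samples: list of (sentence, character index of the polyphone, label id) triples."""
    tokenizer = AutoTokenizer.from_pretrained(pretrained_name)
    optimizer = AdamW(model.parameters(), lr=lr)
    loss_fn = torch.nn.CrossEntropyLoss()
    model.train()
    for _ in range(epochs):
        for sentence, char_idx, label_id in samples:
            enc = tokenizer(sentence, return_tensors="pt", return_offsets_mapping=True)
            # map the polyphone's character index to its token index via the offset mapping
            token_idx = next(i for i, (s, e) in enumerate(enc["offset_mapping"][0].tolist())
                             if s <= char_idx < e)
            logits = model(enc["input_ids"], enc["attention_mask"],
                           torch.tensor([token_idx]))
            loss = loss_fn(logits, torch.tensor([label_id]))
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return model
```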
The importing module 5 is used by the user to import text data;
the marking module 6 calls the polyphone prediction model fine-tuned by the model fine-tuning module 4 to mark the pronunciation of the text data;
and the broadcasting module 7 performs voice broadcasting of the text data marked by the marking module 6.
It should be noted that the importing module 5 allows the user to import some text for testing. Specifically, the user imports text data through the importing module 5, the marking module 6 then calls the polyphone prediction model to mark the pronunciation of the text data, and after marking the broadcasting module 7 broadcasts the text by voice. By listening to the broadcast result, polyphones with wrong pronunciation marks can be found in time, and corpus data for new polyphones can be collected accordingly.
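A minimal sketch of this import-mark-broadcast test loop; `predict_pinyin` stands in for the fine-tuned polyphone prediction model and `speak` for an external text-to-speech engine, both of which are hypothetical placeholders rather than components defined by the patent.

```python
# Sketch of the import -> mark -> broadcast test loop.
# predict_pinyin() and speak() are hypothetical placeholders: the former stands for the
# fine-tuned polyphone prediction model, the latter for any text-to-speech backend.
from typing import Callable, List, Set, Tuple

def mark_text(text: str, polyphones: Set[str],
              predict_pinyin: Callable[[str, int], str]) -> List[Tuple[str, str]]:
    """Return (character, pinyin-or-empty) pairs; only polyphones get a predicted reading."""
    return [(ch, predict_pinyin(text, i) if ch in polyphones else "")
            for i, ch in enumerate(text)]

def test_loop(text: str, polyphones: Set[str], predict_pinyin, speak) -> None:
    marked = mark_text(text, polyphones, predict_pinyin)
    annotated = "".join(ch + (f"({py})" if py else "") for ch, py in marked)
    print(annotated)   # lets the user inspect the pronunciation marks
    speak(annotated)   # voice broadcast; mispronounced polyphones are then collected as new corpus data
```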
In use, suppose the user finds that, in a sentence meaning "because today's weather is gloomy", a polyphone that should be read "wei4" is pronounced "wei2" and another that should be read "de4" is pronounced "di1", and wants to correct these errors. The user then enters the sentence in the required format, which must contain the polyphones to be corrected and a sentence containing them with their correct readings, such as "(wei4)" and "(de4)", writes it into a txt document, and uploads the document to the self-learning platform. The storage module automatically creates a storage folder exclusive to the user and writes the user's ID into the name field of the folder; the new txt documents uploaded by the user are then stored in this folder and can be called at any time when the user needs the corpus data for model training. The training process is in fact a process of continuously optimizing the polyphone prediction model with the corpus data collected by the user, and as more and more corpus data are collected, the accuracy of the polyphone prediction model becomes higher and higher.
The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and all simple modifications, changes and equivalent structural changes made to the above embodiment according to the technical spirit of the present invention still fall within the protection scope of the technical solution of the present invention.

Claims (10)

1. A self-learning platform, comprising:
the uploading module is used for uploading the corpus data of the polyphones;
the storage module is used for storing the corpus data;
and the model fine-tuning module is used for calling the corpus data to perform fine tuning on a preset polyphone prediction model.
2. The self-learning platform of claim 1, further comprising:
and the data screening module is used for removing repeated corpus data and corpus data with wrong format in the corpus data.
3. The self-learning platform of claim 1 or 2, further comprising:
and the classification module is used for classifying the corpus data as required.
4. The self-learning platform of claim 3,
the classification module is further configured to input the corpus data into a pre-trained classification model, and to send the classification results of the corpus data produced by the classification model to the storage module; the classification categories of the classification model are preset according to the user's requirements.
5. The self-learning platform of claim 4,
and the model fine-tuning module is further configured to select the corpus data of one or more categories to fine-tune the preset polyphone prediction model.
6. The self-learning platform of claim 1, wherein the polyphone prediction model is an ALBERT model, and a classification layer is attached after the output layer of the ALBERT model.
7. The self-learning platform of claim 6, wherein
the model fine-tuning module is further configured to call the open-source fine-tuning script of the ALBERT model and to adjust the parameters of the ALBERT model according to the corpus data.
8. The self-learning platform of claim 1 or 2, further comprising:
the import module is used for importing the text data;
the marking module is used for calling the polyphone prediction model fine-tuned by the model fine-tuning module to mark the pronunciation of the text data;
and the broadcasting module is used for carrying out voice broadcasting on the text data marked by the marking module.
9. The self-learning platform of claim 1 or 2,
and the storage module is also used for automatically generating a storage folder belonging to the user according to the ID of the user.
10. The self-learning platform of claim 9, wherein the format of the corpus data is "Y1,Y2 XXXY1(X)XY2(X)XX", wherein Y1 and Y2 are polyphones, XXXY1XY2XX is a sentence, X is a non-polyphonic character, and (X) is the reading of the preceding polyphone.
Application CN202011595340.5A, filed 2020-12-29 (priority 2020-12-29): Self-learning platform, published as CN112735376A (en), status Pending

Priority Applications (1)

CN202011595340.5A, priority date 2020-12-29, filing date 2020-12-29, Self-learning platform (published as CN112735376A (en))

Publications (1)

CN112735376A, published 2021-04-30

Family

ID=75611289

Family Applications (1)

CN202011595340.5A, Self-learning platform, priority date 2020-12-29, filing date 2020-12-29, CN112735376A (en), Pending

Country Status (1)

Country Link
CN (1) CN112735376A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1811912A (en) * 2005-01-28 2006-08-02 北京捷通华声语音技术有限公司 Minor sound base phonetic synthesis method
CN104992704A (en) * 2015-07-15 2015-10-21 百度在线网络技术(北京)有限公司 Speech synthesizing method and device
CN105336322A (en) * 2015-09-30 2016-02-17 百度在线网络技术(北京)有限公司 Polyphone model training method, and speech synthesis method and device
CN111599340A (en) * 2020-07-27 2020-08-28 南京硅基智能科技有限公司 Polyphone pronunciation prediction method and device and computer readable storage medium
WO2020238061A1 (en) * 2019-05-28 2020-12-03 平安科技(深圳)有限公司 Natural language classification method and apparatus, computer device, and storage medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination