CN112786023B - Mark model construction method and voice broadcasting system - Google Patents


Info

Publication number
CN112786023B
CN112786023B (application CN202011539406.9A)
Authority
CN
China
Prior art date
Legal status: Active
Application number
CN202011539406.9A
Other languages
Chinese (zh)
Other versions
CN112786023A
Inventor
简仁贤
黄怀鋐
林长州
Current Assignee
Emotibot Technologies Ltd
Original Assignee
Emotibot Technologies Ltd
Filing date
Publication date
Application filed by Emotibot Technologies Ltd
Priority to CN202011539406.9A
Publication of CN112786023A
Application granted
Publication of CN112786023B


Abstract

The invention discloses a marking model construction method and a voice broadcasting system. The construction method comprises: retrieving corpus data A2 labeled according to the user's language habits, where the label of each corpus item is the pause point information used when the corpus is broadcast; training a model on the corpus data A2 to obtain a pause point marking model P2; using the pause point marking model P2 to update the labels of the corpus data A1 in a public corpus K1, obtaining a public corpus K2 composed of corpus data A2' bearing user-language-habit labels; and training a model on the corpus data A2' in the public corpus K2 to obtain a language pause point marking model P3. The language pause point marking model P3 selects pause points in voice broadcasting according to the user's language habits, so that the broadcast effect more closely matches the way the user speaks.

Description

Mark model construction method and voice broadcasting system
Technical Field
The invention belongs to the technical field of natural language processing, and particularly relates to a marking model construction method and a voice broadcasting system.
Background
With the development of AI language technology, functions that read out text data using speech synthesis have appeared in various mobile phone readers. The purpose of speech synthesis is to convert text into speech played to the user, with the ultimate goal of approaching the effect of a real person reading aloud; the pause points chosen during broadcasting are an important link in speech synthesis. For example, broadcasting the sentence "somehow he is obligated to do so" as "somehow | he is obligated to do so" versus "somehow he | is obligated to do so" produces two quite different effects.
It can be seen how important it is to determine the pause points correctly.
Current speech synthesis technology mainly predicts pause points with a model trained on generic training data. In practice, however, the pause points of a sentence depend on each person's way of speaking: pause point selection is subjective, and different people place pauses in different positions.
Therefore, using a different pause point selection mode for each user can yield a better voice broadcasting effect.
Disclosure of Invention
The invention aims to solve the above technical problems in the prior art by providing a language pause point decision strategy that selects pause points in voice broadcasting according to the user's language habits, so that the broadcast effect more closely matches the way the user speaks.
The first aspect of the invention discloses a method for constructing a marking model, which comprises the following steps:
retrieving corpus data A2 labeled according to the user's language habits, wherein the label of each corpus item is the pause point information used when the corpus is broadcast;
training a model with the corpus data A2 to obtain a pause point marking model P2;
using the pause point marking model P2 to update the labels of the corpus data A1 in a public corpus K1, obtaining a public corpus K2 composed of corpus data A2' bearing user-language-habit labels;
training a model with the corpus data A2' in the public corpus K2 to obtain a language pause point marking model P3.
Further, the corpus data A2 labeled according to the user's language habits is generated by the following steps:
training a model with the corpus data A1 to obtain a pause point marking model P1;
testing the pause point marking model P1 and screening out the misclassified corpus data A1' according to the test results, where "misclassified" means that the label assigned to the corpus data A1' by the pause point marking model P1 does not match its original label;
re-labeling the screened corpus data A1' according to the user's language habits to obtain corpus data A2 labeled according to the user's language habits.
Further, when training a model with the corpus data A2, the selected model is the pause point marking model P1.
Further, when training a model with the corpus data A2', the selected model is the pause point marking model P1.
Further, the corpus data A1'' used to test the pause point marking model P1 is the corpus data A1' in the public corpus K1 that did not participate in training the pause point marking model P1.
Further, when training a model with the corpus data A2', the selected model is the pause point marking model P2.
The invention discloses a voice broadcasting system, which comprises a user habit acquisition module, a model generation module, an importing module, a label marking module and a broadcasting module;
The user habit collection module is used for collecting corpus data A2 labeled according to the user's language habits; each label is the pause point information used when the corpus data is broadcast;
the model generating module is used for generating the pause point marking model P3 according to the method of the first aspect;
the importing module is used for importing the external corpus data A3 into the voice broadcasting system;
The label marking module is used for marking the pause point on the imported corpus data A3 by using the pause point marking model P3;
And the broadcasting module is used for carrying out voice broadcasting on the corpus data A4 marked by the label marking module.
Further, the user habit collection module is used for sending a plurality of corpus data A1' to the user for labeling; the corpus data A2 is obtained after the user labels them.
The corpus data A1' is obtained by testing the pause point marking model P1 and screening out the misclassified corpus data A1' according to the test results; "misclassified" means that the label assigned to the corpus data A1' by the pause point marking model P1 does not match its original label. The pause point marking model P1 is obtained by training a model with the corpus data A1 in the public corpus K1.
Further, the corpus data A1'' used to test the pause point marking model P1 is the corpus data A1' in the public corpus K1 that did not participate in training the pause point marking model P1.
Further, when the model generating module trains a model with the corpus data A2' in the public corpus K2, the selected model is the pause point marking model P2.
A third aspect of the invention provides an electronic device comprising a processor, a memory, the processor establishing a communication connection with the memory;
A processor configured to read a program in a memory to perform the method provided by the first aspect or any implementation manner of the first aspect.
A fourth aspect of the invention provides a computer readable storage medium having stored therein a program which, when executed by a computing device, performs the method provided by the foregoing first aspect or any implementation of the first aspect.
Compared with the prior art, the invention has the following advantages:
1. The invention discloses a language pause point marking model construction strategy. After the misclassified corpus data A1' is re-labeled, it is used to train a pause point marking model P2 that strongly reflects the user's language habits; at this point, P2 is effectively a customized pause point marking model specific to the user. The corpus data in the public corpus K1 is then re-labeled by the pause point marking model P2, so that the resulting large body of labeled corpus data carries a strong flavor of the user's pausing habits. Retraining on this re-labeled data yields a pause point marking model P3 that marks pause points in corpus data according to the user's language habits.
2. In the voice broadcasting system disclosed by the invention, before the system is used, the user habit collection module first collects the user's language pause information, and the model generation module generates a personalized pause point marking model P3 that marks pause points on imported corpus data. The marked corpus data is therefore processed, when broadcast, in a manner close to the user's own habits, giving a good user experience.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
Fig. 1 is a flow chart of the method of embodiment 1 of the present invention.
Fig. 2 is a flow chart of the method of embodiment 2 of the present invention.
Fig. 3 is a flow chart of the method of embodiment 3 of the present invention.
Fig. 4 is a diagram illustrating a voice broadcast system according to embodiment 4 of the present invention.
Fig. 5 is a schematic structural diagram of embodiment 6 of the present invention.
Detailed Description
Example 1
As shown in fig. 1, a method for constructing a marking model includes the following steps:
Step 1: retrieve corpus data A2 labeled according to the user's language habits, wherein the label of each corpus item is the pause point information used when the corpus is broadcast.
It should be noted that the corpus data A2 may be stored in advance in an independent storage partition and retrieved directly when needed. An example of corpus data before labeling is: "today's weather is cloudy and rainy"; after labeling, the corpus data A2 reads "today's | weather is cloudy and rainy", where "|" indicates a pause point.
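The pause-mark notation above can be made concrete with a small sketch (not part of the patent; function names are illustrative): "|" marks in a labeled corpus item are converted into a per-character 0/1 tag sequence, where tag[i] = 1 means a pause follows the i-th character, and the inverse operation re-inserts the marks.

```python
def encode_pause_labels(marked: str, pause_mark: str = "|"):
    """Strip pause marks from a labeled sentence; return (plain text, tags)."""
    chars, labels = [], []
    for ch in marked:
        if ch == pause_mark:
            if labels:          # a pause follows the previous character
                labels[-1] = 1
        else:
            chars.append(ch)
            labels.append(0)
    return "".join(chars), labels

def decode_pause_labels(text: str, labels, pause_mark: str = "|"):
    """Inverse operation: re-insert pause marks according to the tags."""
    out = []
    for ch, tag in zip(text, labels):
        out.append(ch)
        if tag == 1:
            out.append(pause_mark)
    return "".join(out)
```

With this representation, a labeled corpus item and its per-character tags carry the same information, so either form can serve as a model's training target.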
Step 2: train a model with the corpus data A2 to obtain a pause point marking model P2.
It should be noted that the model here is an Albert model, a lightweight model designed by Google on the basis of BERT; those skilled in the art will appreciate that other similar machine learning models may also be selected. By training the Albert model with the corpus data A2, the trained pause point marking model P2 can predict the pause point information for broadcasting corpus data, and the predicted pause points conform to the user's habits.
Step 3: use the pause point marking model P2 to update the labels of the corpus data A1 in the public corpus K1, obtaining a public corpus K2 composed of corpus data A2' bearing user-language-habit labels.
Specifically, the text portion of each corpus item A1 in the public corpus K1 is input into the pause point marking model P2, which outputs predicted pause point information for that item. The predicted pause point information replaces the original label as a new label conforming to the user's language habits; the corpus data A1 marked with the new labels is the corpus data A2' bearing user-language-habit labels, and the corpus data A2' is stored in a new storage partition to form the public corpus K2.
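As a hedged sketch of this label-updating step, the following illustrative function (the names and the `predict` callable are assumptions, not the patent's API) replaces each item's original label with the label predicted by model P2 to form corpus K2:

```python
def update_labels(corpus_k1, predict):
    """corpus_k1: list of (text, labels) pairs; predict: text -> new labels.

    Returns corpus K2, where every item carries the predicted labels in
    place of its original ones (the original labels are discarded).
    """
    return [(text, predict(text)) for text, _old_labels in corpus_k1]
```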
Step 4: train a model with the corpus data A2' in the public corpus K2 to obtain a language pause point marking model P3.
It should be noted that the model here is a new Albert model; the pause point marking model P2 may also be used. On the one hand, the corpus data A2' in the public corpus K2 is plentiful enough to train a stable model; on the other hand, the pause point information of the corpus data A2' conforms to the user's language habits, so the trained language pause point marking model P3 can label corpus data with pause point information according to the user's language habits.
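Since the patent's Albert-based training cannot be reproduced in a short sketch, a deliberately tiny stand-in can illustrate the train/predict interface used throughout: the toy model below counts, for each character, how often a pause follows it in the training data, and predicts a pause when that frequency exceeds one half. All names here are illustrative assumptions, not the patent's implementation.

```python
from collections import defaultdict

def train_pause_model(samples):
    """samples: list of (text, labels) with labels[i] in {0, 1}.

    Returns a predict(text) -> labels callable. A pause is predicted after
    a character if, in the training data, a pause followed that character
    more than half the time.
    """
    pause, total = defaultdict(int), defaultdict(int)
    for text, labels in samples:
        for ch, tag in zip(text, labels):
            total[ch] += 1
            pause[ch] += tag

    def predict(text):
        return [1 if total[ch] and pause[ch] / total[ch] > 0.5 else 0
                for ch in text]

    return predict
```

A real implementation would fine-tune a token-classification model instead; the point of the toy is only that "training a model" here means producing a callable that maps text to a pause-label sequence.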
Example 2
As shown in fig. 2, a method for constructing a marking model includes the following steps:
Step 1: select a public corpus K1 and divide its corpus data equally into five parts, four of which serve as training corpus data A1 and one as test corpus data A1'. For example, if the public corpus contains 1,000,000 sentences, 800,000 sentences are used as training data and 200,000 sentences as the corpus data A1'. The label of each corpus item is the pause point information within the corpus; a piece of corpus data reads, for example, "outside | it is snowing heavily", where "|" indicates a pause point.
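The 4:1 split in step 1 can be sketched as follows (illustrative code, not from the patent); a seeded shuffle keeps the split reproducible:

```python
import random

def split_corpus(corpus, test_fraction=0.2, seed=42):
    """Shuffle the corpus and split it into (A1 training, A1' testing)."""
    data = list(corpus)
    random.Random(seed).shuffle(data)   # seeded for reproducibility
    cut = int(len(data) * (1 - test_fraction))
    return data[:cut], data[cut:]
```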
Step 2: train an Albert model with the corpus data A1 to obtain a pause point marking model P1. It should be noted that the Albert model is a lightweight model designed by Google on the basis of BERT. Those skilled in the art will appreciate that other machine learning models similar to the Albert model may also be trained here.
Step 3: test the pause point marking model P1 with the corpus data A1' and screen out the misclassified corpus data A1' according to the test results. "Misclassified" means that the label assigned to the corpus data A1' by the pause point marking model P1 does not match its original label; that is, the pause point marking model P1 cannot accurately predict the pause points of the corpus data A1'.
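Step 3's screening can be sketched as below (an assumption-laden illustration: `model` stands in for the pause point marking model P1 as a callable returning a label sequence):

```python
def screen_misclassified(test_set, model):
    """test_set: list of (text, gold_labels).

    Returns the samples whose predicted labels disagree with the original
    labels, i.e. the misclassified corpus data A1' to be re-labeled.
    """
    return [(text, gold) for text, gold in test_set
            if model(text) != gold]
```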
Step 4: re-label the screened corpus data A1' according to the user's language habits, obtaining corpus data A2 labeled according to the user's language habits.
Specifically, when labeling, the pause positions can be obtained from the user reading the text of the corpus data A1' aloud or silently, and the labels are then marked at those pause positions.
Step 5: train the Albert model with the corpus data A2 as training data to obtain a pause point marking model P2. At this point, the pause point marking model P2 exhibits a strong user-habit style when predicting pause point positions.
The Albert model here may be a new, untrained model, or it may be the pause point marking model P1.
Step 6: re-label the corpus data A1 in the public corpus K1 with the pause point marking model P2. The re-labeled corpus data A2' has very strong characteristics of the user's language pausing habits, but is still relatively coarse at this point. The next step therefore trains the pause point marking model P1 from step 2, strengthening its tendency to predict pause positions close to the user's language habits without over-correcting.
Step 7: retrain the pause point marking model P1 with the corpus data A2' in the re-labeled public corpus K2 to obtain a pause point marking model P3, which is the target model.
When outputting data, the Albert model outputs its predictions as a sequence.
Example 3
As shown in fig. 3, a method for constructing a marking model includes the following steps:
Step 1: select a public corpus K1 and divide its corpus data equally into five parts, four of which serve as training corpus data A1 and one as test corpus data A1'. For example, if the public corpus contains 1,000,000 sentences, 800,000 sentences are used as training data and 200,000 sentences as the corpus data A1'. The label of each corpus item is the pause point information within the corpus; a piece of corpus data reads, for example, "outside | it is snowing heavily", where "|" indicates a pause point.
Step 2: train an Albert model with the corpus data A1 to obtain a pause point marking model P1. It should be noted that the Albert model is a lightweight model designed by Google on the basis of BERT. Those skilled in the art will appreciate that other machine learning models similar to the Albert model may also be trained here.
Step 3: test the pause point marking model P1 with the corpus data A1' and screen out the misclassified corpus data A1' according to the test results. "Misclassified" means that the label assigned to the corpus data A1' by the pause point marking model P1 does not match its original label; that is, the pause point marking model P1 cannot accurately predict the pause points of the corpus data A1'.
Step 4: re-label the screened corpus data A1' according to the user's language habits, obtaining corpus data A2 labeled according to the user's language habits.
Specifically, when labeling, the pause positions can be obtained from the user reading the text of the corpus data A1' aloud or silently, and the labels are then marked at those pause positions.
Step 5: train the Albert model with the corpus data A2 as training data to obtain a pause point marking model P2. At this point, the pause point marking model P2 exhibits a strong user-habit style when predicting pause point positions.
The Albert model here may be a new, untrained model, or it may be the pause point marking model P1.
Step 6: re-label the corpus data A1 in the public corpus K1 with the pause point marking model P2. The re-labeled corpus data A2' has very strong characteristics of the user's language pausing habits, but is still relatively coarse at this point. The next step therefore trains the pause point marking model P2 from step 5; training on a large amount of corpus data improves the stability of the pause point marking model P2.
Step 7: retrain the pause point marking model P2 with the corpus data A2' in the re-labeled public corpus K2 to obtain a pause point marking model P3, which is the target model.
When outputting data, the Albert model outputs its predictions as a sequence.
Example 4
As shown in fig. 4, a voice broadcasting system includes a user habit collection module 2, a model generation module 4, an introduction module 3, a tag marking module 5 and a broadcasting module 1.
In this embodiment, the user habit collection module 2 is configured to retrieve corpus data A2 of labels marked according to user language habits.
Specifically, the user habit collection module 2 sends a plurality of corpus data A1' to the user for labeling; the corpus data A2 is obtained after the user labels them. The corpus data A1' is obtained by testing the pause point marking model P1 and screening out the misclassified corpus data A1' according to the test results; "misclassified" means that the label assigned to the corpus data A1' by the pause point marking model P1 does not match its original label. The pause point marking model P1 is obtained by training a model with the corpus data A1 in the public corpus K1. The model is an Albert model, a lightweight model designed by Google on the basis of BERT; those skilled in the art will appreciate that other similar machine learning models may also be selected.
In practice, the plurality of corpus data A1' sent to the user are preset.
In this embodiment, the model generating module 4 is configured to generate the pause point marking model P3 according to the method described in embodiment 1.
Specifically, the generation process includes the steps of:
Step 1: retrieve corpus data A2 labeled according to the user's language habits, wherein the label of each corpus item is the pause point information used when the corpus is broadcast.
It should be noted that the corpus data A2 may be stored in advance in an independent storage partition and retrieved directly when needed. An example of corpus data before labeling is: "today's weather is cloudy and rainy"; after labeling, the corpus data A2 reads "today's | weather is cloudy and rainy", where "|" indicates a pause point.
Step 2: train a model with the corpus data A2 to obtain a pause point marking model P2.
It should be noted that the model here is an Albert model, a lightweight model designed by Google on the basis of BERT; those skilled in the art will appreciate that other similar machine learning models may also be selected. By training the Albert model with the corpus data A2, the trained pause point marking model P2 can predict the pause point information for broadcasting corpus data, and the predicted pause points conform to the user's habits. The model here may also be the above-described pause point marking model P1.
Step 3: use the pause point marking model P2 to update the labels of the corpus data A1 in the public corpus K1, obtaining a public corpus K2 composed of corpus data A2' bearing user-language-habit labels.
Specifically, the text portion of each corpus item A1 in the public corpus K1 is input into the pause point marking model P2, which outputs predicted pause point information for that item. The predicted pause point information replaces the original label as a new label conforming to the user's language habits; the corpus data A1 marked with the new labels is the corpus data A2' bearing user-language-habit labels, and the corpus data A2' is stored in a new storage partition to form the public corpus K2.
Step 4: train a model with the corpus data A2' in the public corpus K2 to obtain a language pause point marking model P3.
It should be noted that the model here is a new Albert model; the pause point marking model P2 or the pause point marking model P1 may also be used. On the one hand, the corpus data A2' in the public corpus K2 is plentiful enough to train a stable model; on the other hand, the pause point information of the corpus data A2' conforms to the user's language habits, so the trained language pause point marking model P3 can label corpus data with pause point information according to the user's language habits.
In this embodiment, the importing module 3 is configured to import the external corpus data A3 into the voice broadcasting system.
The label marking module 5 is configured to mark a pause point on the imported corpus data A3 by using the pause point marking model P3.
The broadcasting module 1 is used for performing voice broadcasting on the corpus data A4 marked by the label marking module; the labels are pause point information when the corpus data is broadcasted.
When the voice broadcasting system is used, first, after a user logs in, the system has the user mark pause points on a plurality of corpus data A1' through the user habit collection module 2, obtaining a plurality of corpus data A2. A new Albert model or the pause point marking model P1 is then trained with the corpus data A2 as training data to obtain a pause point marking model P2; the pause point marking model P2 re-labels the corpus data in the public corpus K1, and the re-labeled corpus data A2' is used to retrain the pause point marking model P1 or the pause point marking model P2, yielding a pause point marking model P3 specific to the user. After the user imports external corpus data A3 through the importing module 3, the label marking module 5 marks pause points on the imported corpus data A3 with the pause point marking model P3 to obtain corpus data A4, and finally the broadcasting module 1 broadcasts the corpus data A4.
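The overall workflow described above can be condensed into one hedged sketch, composing placeholder callables for the models (`train` returns a predict function; none of these names come from the patent):

```python
def build_user_model(k1_train, k1_test, user_relabel, train):
    """k1_train/k1_test: lists of (text, labels); train(dataset) -> predict fn;
    user_relabel(text) -> labels reflecting the user's own pausing habits."""
    p1 = train(k1_train)                                  # step 2: model P1
    wrong = [(t, g) for t, g in k1_test if p1(t) != g]    # step 3: screen A1'
    a2 = [(t, user_relabel(t)) for t, _ in wrong]         # step 4: user labels A2
    p2 = train(a2)                                        # step 5: model P2
    k2 = [(t, p2(t)) for t, _ in k1_train + k1_test]      # step 6: corpus K2
    return train(k2)                                      # step 7: model P3
```

In the patent, `train` would fine-tune an Albert model and `user_relabel` would be the user habit collection module; the sketch only shows how the seven steps compose.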
The implementation principle and the technical effect of this embodiment are the same as those of embodiment 1, embodiment 2 or embodiment 3, and for brevity, reference may be made to the corresponding contents of the foregoing method embodiments where this embodiment is not mentioned.
Example 5
A computer-readable storage medium having stored thereon a computer program which, when executed by a computer, performs the steps involved in the marking model construction method of embodiment 1 or embodiment 2 described above.
Example 6
An electronic device may be, but is not limited to, a personal computer (PC), a tablet computer, a mobile Internet device (MID), and the like.
Wherein the electronic device 100 may comprise: a processor 110 and a memory 120.
It should be noted that the components and structures of the electronic device 100 shown in fig. 5 are exemplary only and not limiting, as the electronic device 100 may have other components and structures as desired.
The processor 110, the memory 120, and other components that may be present in the electronic device 100 are electrically connected to each other, either directly or indirectly, to enable transmission or interaction of data. For example, the processor 110, the memory 120, and possibly other components may be electrically connected to each other by one or more communication buses or signal lines.
The memory 120 is used for storing programs, for example, a program corresponding to the foregoing marking model construction method or a voice broadcast system of the foregoing. Optionally, when the memory 120 stores a voice broadcast system, the voice broadcast system includes at least one software functional module that may be stored in the memory 120 in the form of software or firmware (firmware).
Alternatively, the software functional modules included in the voice broadcast system may be implemented in an Operating System (OS) of the electronic device 100.
The processor 110 is configured to execute executable modules stored in the memory 120, such as the software functional modules or computer programs included in the voice broadcast system. When the processor 110 receives an execution instruction, it may execute a computer program, for example, to perform: when the corpus data A2 labeled according to the user's habits is obtained, training a pause point marking model P2; generating a public corpus K2 using the pause point marking model P2; and training a pause point marking model P3 using the public corpus K2.
Of course, the methods disclosed in any of the embodiments of the present application may be applied to the processor 110 or implemented by the processor 110.
It should be noted that, in the present specification, each embodiment is described in a progressive manner, and each embodiment is mainly described as different from other embodiments, and identical and similar parts between the embodiments are all enough to be referred to each other.
In the several embodiments provided in the present application, it should be understood that the disclosed system and method may be implemented in other manners as well. The system embodiments described above are merely illustrative, for example, of the flowcharts and block diagrams in the figures that illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
In addition, functional modules in the embodiments of the present application may be integrated together to form a single part, or each module may exist alone, or two or more modules may be integrated to form a single part.
The functions, if implemented in the form of software functional modules and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application may be embodied, essentially or in the part contributing to the prior art, in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a notebook computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present application. The aforementioned storage medium includes: a USB disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other media capable of storing program code.
The foregoing is merely illustrative of the present application, and the present application is not limited thereto, and any person skilled in the art will readily recognize that variations or substitutions are within the scope of the present application.
The foregoing description is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and any simple modification, variation and equivalent structural changes made to the above embodiment according to the technical substance of the present invention still fall within the scope of the technical solution of the present invention.

Claims (8)

1. A method of constructing a marking model, comprising: retrieving corpus data A2 labeled according to the language habits of a user, wherein the label of each piece of corpus data is the pause-point information used when that corpus is broadcast; training a model with the corpus data A2 to obtain a pause point marking model P2; using the pause point marking model P2 to update the labels of the corpus data A1 in a public corpus K1, thereby obtaining a public corpus K2 formed from corpus data A2' labeled according to the user's language habits; and training a model with the corpus data A2' in the public corpus K2 to obtain a language pause point marking model P3;
wherein the corpus data A2 labeled according to the user's language habits is generated as follows: training a model with the corpus data A1 to obtain a pause point marking model P1; testing the pause point marking model P1 and screening out the misclassified corpus data A1' according to the test results, where misclassified means that the label assigned to the corpus data A1' by the pause point marking model P1 does not match its original label; and having the user re-label the screened corpus data A1' according to the user's language habits, thereby obtaining the corpus data A2 labeled according to the user's language habits.
2. The method of constructing a marking model according to claim 1, wherein the model selected when training with the corpus data A2 is the pause point marking model P1.
3. The method of constructing a marking model according to claim 1, wherein the model selected when training with the corpus data A2' is the pause point marking model P1.
4. The method of constructing a marking model according to claim 1 or 2, wherein the corpus data A1" used to test the pause point marking model P1 is the corpus data in the public corpus K1 that did not participate in the training of the pause point marking model P1.
5. The method of constructing a marking model according to claim 1, wherein the model selected when training with the corpus data A2' is the pause point marking model P2.
6. A voice broadcasting system, characterized in that it comprises a user habit collection module, a model generation module, an importing module, a label marking module, and a broadcasting module; the user habit collection module is used to collect corpus data A2 labeled according to the user's language habits, the labels being the pause-point information used when the corpus data is broadcast; the model generation module is used to generate the pause point marking model P3 according to the method of claim 1; the importing module is used to import external corpus data A3 into the voice broadcasting system; the label marking module is used to mark pause points on the imported corpus data A3 with the pause point marking model P3; and the broadcasting module is used to perform voice broadcasting of the corpus data A4 marked by the label marking module;
wherein the user habit collection module is used to send several pieces of corpus data A1' to the user for labeling and to obtain the corpus data A2 after labeling; the corpus data A1' is obtained by testing the pause point marking model P1 and screening out the misclassified corpus data according to the test results, where misclassified means that the label assigned to the corpus data A1' by the pause point marking model P1 does not match its original label; and the pause point marking model P1 is a model obtained by training with the corpus data A1 in the public corpus K1.
7. The voice broadcasting system according to claim 6, wherein the corpus data A1" used to test the pause point marking model P1 is the corpus data in the public corpus K1 that did not participate in the training of the pause point marking model P1.
8. The voice broadcasting system according to claim 6, wherein the model selected by the model generation module when training with the corpus data A2' in the public corpus K2 is the pause point marking model P2.
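The label-bootstrapping loop of claim 1 (train P1 on the public corpus, screen out the sentences P1 mislabels, have the user re-label them, train P2, relabel the whole corpus with P2, then train P3) can be sketched as follows. This is only an illustration: the token-level majority-vote "model", the data format, and the helper names (train_model, predict, build_p3, user_relabel) are this sketch's own assumptions, not the patent's implementation, which would use a trained sequence model for pause-point prediction.

```python
from collections import Counter, defaultdict

def train_model(corpus):
    """Toy pause-point 'model': memorize the majority label seen for each token.

    corpus: list of (tokens, labels) pairs, where labels[i] == 1 means
    'pause after tokens[i]'. A real system would train a sequence model here.
    """
    votes = defaultdict(Counter)
    for tokens, labels in corpus:
        for tok, lab in zip(tokens, labels):
            votes[tok][lab] += 1
    return {tok: c.most_common(1)[0][0] for tok, c in votes.items()}

def predict(model, tokens):
    # Unknown tokens default to 'no pause' (label 0).
    return [model.get(tok, 0) for tok in tokens]

def build_p3(corpus_a1, user_relabel):
    """Run the full pipeline of claim 1 and return the final model P3."""
    p1 = train_model(corpus_a1)                    # train P1 on public corpus A1
    # Screen out the sentences whose P1 predictions differ from the
    # original tags (the misclassified corpus data A1').
    screened = [(t, l) for t, l in corpus_a1 if predict(p1, t) != l]
    # The user re-labels the screened sentences per their own habits -> A2.
    corpus_a2 = [(t, user_relabel(t)) for t, _ in screened]
    p2 = train_model(corpus_a2)                    # train P2 on A2
    # Use P2 to update the labels of every sentence in A1 -> A2' (corpus K2).
    corpus_k2 = [(t, predict(p2, t)) for t, _ in corpus_a1]
    return train_model(corpus_k2)                  # train P3 on K2
```

Because P3 is trained on labels produced by P2, which in turn was fit only to the user's corrections, P3's pause choices follow the user's habits even on sentences the user never touched, which is the effect the abstract claims for the broadcasting system.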
CN202011539406.9A 2020-12-23 Mark model construction method and voice broadcasting system Active CN112786023B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011539406.9A CN112786023B (en) 2020-12-23 Mark model construction method and voice broadcasting system

Publications (2)

Publication Number Publication Date
CN112786023A CN112786023A (en) 2021-05-11
CN112786023B true CN112786023B (en) 2024-07-02

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110010136A (en) * 2019-04-04 2019-07-12 Beijing Horizon Robotics Technology R&D Co., Ltd. Training and text analysis method, apparatus, medium and device for a prosody prediction model

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
GR01: Patent grant