CN113270088B - Text processing method, data processing method, voice processing method, data processing device, voice processing device and electronic equipment - Google Patents
- Publication number
- CN113270088B (application CN202010092098.3A)
- Authority
- CN
- China
- Prior art keywords
- language element
- sentence
- mask
- language
- text
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G10L15/063 — Speech recognition; creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G06F40/221 — Handling natural language data; natural language analysis; parsing markup language streams
- G06F40/232 — Handling natural language data; orthographic correction, e.g. spell checking or vowelisation
- G10L15/10 — Speech classification or search using distance or distortion measures between unknown speech and reference templates
- G10L19/012 — Comfort noise or silence coding
- G10L2015/0635 — Training: updating or merging of old and new templates; mean values; weighting
Abstract
Embodiments of the present disclosure provide a text processing method, a data processing method, a voice processing method, corresponding apparatuses, and an electronic device. The text processing method includes: in response to the text to be corrected containing a first language element, generating a first mask for the text to be corrected, and masking the first language element in the text to be corrected with the first mask to generate a replacement text; inputting the first mask and the replacement text into an error correction model to predict a first language element vector corresponding to the first language element; generating a target language element from the predicted first language element vector; and replacing the first language element in the text to be corrected with the generated target language element to obtain the corrected text. Because the replacement text and the first mask are input into the error correction model, which generates the element that replaces the first language element, the recognition accuracy of the first language element in the text to be corrected is improved.
Description
Technical Field
The present disclosure relates to the field of computer technologies, and in particular, to a method and an apparatus for text processing, data processing, and speech processing, an electronic device, and a readable storage medium.
Background
In recent years, with the development of computer technology and growing user demand, technologies such as text processing and voice processing have become increasingly important. The sources of the text and speech to be processed are varied: they may be messages typed in an instant messaging application, or real-time transcriptions of speech. In speech recognition, for example, accurately recognizing specific language elements (e.g., pronouns) poses a great challenge, and during text and speech processing an error in such an element can further degrade downstream tasks, such as translation accuracy.
Disclosure of Invention
In order to solve the problems in the related art, embodiments of the present disclosure propose text processing, data processing, and voice processing methods and apparatuses, an electronic device, and a readable storage medium.
In a first aspect, an embodiment of the present disclosure provides a text processing method, including:
in response to the text to be corrected containing a first language element, generating a first mask for the text to be corrected, and masking the first language element in the text to be corrected with the first mask to generate a replacement text;
inputting the first mask and the replacement text into an error correction model to predict a first language element vector corresponding to the first language element;
generating a target language element according to the predicted first language element vector;
replacing a first language element in the text to be corrected with the generated target language element to obtain a corrected text.
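The four steps above can be sketched in a few lines of Python. This is a minimal illustration, not the disclosed implementation: the token-level representation, the `[MASK]` flag value, and the stub `predict_element` model are all assumptions.

```python
# Minimal sketch of the four-step correction method; names and the
# "[MASK]" flag value are illustrative assumptions.

MASK_FLAG = "[MASK]"

def generate_mask(tokens, first_element):
    # One number per language element: 0 for ordinary elements,
    # 1 for the first language element.
    return [1 if tok == first_element else 0 for tok in tokens]

def make_replacement_text(tokens, mask):
    # Replace each masked element with the specific flag.
    return [MASK_FLAG if m == 1 else tok for tok, m in zip(tokens, mask)]

def correct(tokens, first_element, predict_element):
    mask = generate_mask(tokens, first_element)
    replacement = make_replacement_text(tokens, mask)
    # The model sees only the mask and the replacement text.
    target = predict_element(mask, replacement)
    return [target if m == 1 else tok for tok, m in zip(tokens, mask)]

# Toy "model" that always predicts "he" for the masked position.
corrected = correct(["i", "saw", "him", "yesterday"], "him",
                    lambda mask, text: "he")
```

In practice `predict_element` would be the trained error correction model predicting a first language element vector, from which the target language element is then generated.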
With reference to the first aspect, in a first implementation manner of the first aspect, the present disclosure further includes:
screening out sentences containing the first language element from unlabeled data to train the error correction model.
With reference to the first implementation manner of the first aspect, in a second implementation manner of the first aspect, the screening out of sentences containing the first language element from the unlabeled data to train the error correction model includes:
generating a second mask for a first statement containing a first language element, and masking the first language element in the first statement with the second mask to generate first training data;
acquiring a second sentence converted from the first sentence through a preset conversion process, wherein the second sentence contains the first language element and contains a language element different from the language elements in the first sentence;
generating a third mask for the second statement and masking the first language element in the second statement with the third mask to generate second training data;
generating third training data from the first language element, the second mask, the first training data, the third mask, and the second training data, and training an error correction model using the third training data.
With reference to the first aspect, or any one of the first implementation manner and the second implementation manner of the first aspect, in a third implementation manner of the first aspect, the first mask is an array including numbers in one-to-one correspondence with language elements in the text to be corrected, wherein the numbers corresponding to language elements other than the first language element in the text to be corrected are identical to each other and different from the number corresponding to the first language element; the second mask is an array including numbers in one-to-one correspondence with language elements in the first sentence, wherein the numbers corresponding to language elements other than the first language element in the first sentence are identical to each other and different from the number corresponding to the first language element; the third mask is an array including numbers in one-to-one correspondence with language elements in the second sentence, wherein the numbers corresponding to language elements other than the first language element in the second sentence are identical to each other and different from the number corresponding to the first language element.
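Such a mask array can be illustrated as follows, assuming tokenized language elements and the arbitrary numbers 0 and 1; the disclosure only requires that the number for the first language element differ from the single number shared by all other elements:

```python
# Illustrative mask array construction; the tokens and the particular
# numbers 0/1 are invented for demonstration.

def build_mask(elements, first_element, other=0, target=1):
    # Same number for every ordinary element, a different number for
    # each occurrence of the first language element.
    return [target if e == first_element else other for e in elements]

text_to_correct = ["she", "kicked", "her", "with", "her", "foot"]
first_mask = build_mask(text_to_correct, "her")
```

The second and third masks described above would be built the same way over the first and second sentences respectively.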
With reference to the third implementation manner of the first aspect, in a fourth implementation manner of the first aspect, the generating a first mask for the text to be corrected in response to the text to be corrected containing the first language element, and masking the first language element in the text to be corrected with the first mask to generate a replacement text, includes:
masking the first language element with a number in the first mask corresponding to the first language element, wherein the resulting replacement text differs from the text to be corrected in that the first language element is replaced with a particular flag.
With reference to the second implementation manner of the first aspect, in a fifth implementation manner of the first aspect, the obtaining a second sentence that is converted from the first sentence through a preset conversion process, where the second sentence includes the first language element and includes a language element different from the language element in the first sentence, includes:
converting the first sentence into an audio signal;
adding noise to the audio signal;
the noise-added audio signal is converted into a second sentence, wherein the second sentence contains the first linguistic element and contains linguistic elements different from those in the first sentence.
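A hedged sketch of the noise-injection step described above. A real system would use a text-to-speech engine to produce the audio signal and a speech recognizer to convert the noisy audio back into the second sentence; here the synthesis is stubbed as a sine tone and the recognition step is omitted, so only the noise addition itself is shown.

```python
import math
import random

def synthesize(sentence, n_samples=160, freq=440.0, rate=16000.0):
    # Stand-in for text-to-speech: a fixed sine tone regardless of text.
    return [math.sin(2 * math.pi * freq * i / rate)
            for i in range(n_samples)]

def add_noise(signal, sigma=0.05, seed=0):
    # Add Gaussian noise sample-by-sample; sigma controls its strength.
    rng = random.Random(seed)
    return [s + rng.gauss(0.0, sigma) for s in signal]

first_sentence = "she kicked her with her foot"
audio = synthesize(first_sentence)
noisy = add_noise(audio)
# `noisy` would then be fed to a speech recognizer to obtain the
# second sentence, which may contain recognition errors.
```

The noise level is what drives the recognizer to produce a second sentence whose language elements differ from the first, yielding naturally erroneous training pairs without manual labeling.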
With reference to the second implementation manner of the first aspect, in a sixth implementation manner of the first aspect, the generating third training data according to the first language element, the second mask, the first training data, the third mask, and the second training data, and training an error correction model by using the third training data includes:
performing data mixing on a first language element vector corresponding to the first language element, the second mask, the first training data, the third mask, and the second training data to generate third training data;
training the error correction model using the third training data.
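The data mixing step might be sketched as below. The record format is an assumption; the disclosure only requires that the first language element vector, both masks, and both sets of training data be combined into the third training data.

```python
# Hypothetical record format for the mixed (third) training data; the
# disclosure does not fix a concrete layout.

def mix_training_data(element_vec, mask_a, data_a, mask_b, data_b):
    # Pair each masked input with the vector the model should predict
    # for the masked position.
    return [
        {"mask": mask_a, "input": data_a, "label": element_vec},
        {"mask": mask_b, "input": data_b, "label": element_vec},
    ]

third_training_data = mix_training_data(
    element_vec=[0.1, 0.9],
    mask_a=[0, 1, 0], data_a=["she", "[MASK]", "him"],
    mask_b=[0, 1, 0, 0], data_b=["she", "[MASK]", "at", "him"],
)
```

Both the clean first sentence and the noisy second sentence share the same label, so the model learns to recover the original first language element from either context.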
With reference to the sixth implementation manner of the first aspect, in a seventh implementation manner of the first aspect, the error correction model is a Bi-LSTM sequence labeling model.
With reference to the first aspect, in an eighth implementation manner of the first aspect, the first language element includes at least one language element.
With reference to the eighth implementation manner of the first aspect, in a ninth implementation manner of the first aspect, the screening out of sentences containing the first language element from the unlabeled data to train the error correction model includes:
eliminating, from the screened sentences containing the first language element, those sentences in which the first language element meets a preset condition.
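The screening and elimination steps can be illustrated together; the "preset condition" here is a hypothetical minimum-length cutoff chosen purely for demonstration.

```python
# Screen unlabeled sentences for the first language element, then drop
# those meeting a preset condition (here a hypothetical length cutoff).

def screen_sentences(sentences, first_element, exclude=lambda s: False):
    kept = [s for s in sentences if first_element in s.split()]
    return [s for s in kept if not exclude(s)]

corpus = ["he left", "she saw her dog", "her", "rain fell"]
selected = screen_sentences(corpus, "her",
                            exclude=lambda s: len(s.split()) < 2)
```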
In a second aspect, an embodiment of the present disclosure provides a text processing apparatus, including:
a masking module configured to generate a first mask for the text to be corrected in response to the text to be corrected containing a first language element, and to mask the first language element in the text to be corrected with the first mask to generate a replacement text;
a first generation module configured to input the first mask and the replacement text into an error correction model to predict a first language element vector corresponding to the first language element;
a second generation module configured to generate a target language element from the predicted first language element vector;
an error correction module configured to replace a first language element in the text to be error corrected with the generated target language element to obtain an error corrected text.
In a third aspect, an embodiment of the present disclosure provides a data processing method, including:
generating a first mask for a first statement containing a first language element, and masking the first language element in the first statement with the first mask to generate first training data;
acquiring a second sentence converted from the first sentence through a preset conversion process, wherein the second sentence contains the first language element and contains a language element different from the language elements in the first sentence;
generating a second mask for the second statement containing the first language element and masking the first language element in the second statement with the second mask to generate second training data;
generating third training data according to the first language element, the first mask, the first training data, the second mask and the second training data, and training a preset error correction model by using the third training data.
With reference to the third aspect, in a first implementation manner of the third aspect, the first mask is an array including numbers in one-to-one correspondence with language elements in the first sentence, where the numbers corresponding to language elements other than the first language element in the first sentence are the same as each other and different from the number corresponding to the first language element; the second mask is an array including numbers in one-to-one correspondence with language elements in the second sentence, wherein the numbers corresponding to language elements other than the first language element in the second sentence are identical to each other and different from the numbers corresponding to the first language element.
With reference to the first implementation manner of the third aspect, in a second implementation manner of the third aspect, the generating a first mask for a first sentence containing a first language element and masking the first language element in the first sentence with the first mask to generate first training data includes:
masking the first language element with a number in the first mask that corresponds to the first language element, wherein the resulting first training data differs from the first sentence in that the first language element is replaced with a particular flag.
With reference to the third aspect, or any one of the first implementation manner and the second implementation manner of the third aspect, in a third implementation manner of the third aspect, the obtaining a second sentence that is converted from the first sentence through preset conversion processing, where the second sentence includes the first language element and includes a language element different from a language element in the first sentence, includes:
converting the first sentence into an audio signal;
adding noise to the audio signal;
the noise-added audio signal is converted into a second sentence, wherein the second sentence contains the first linguistic element and contains linguistic elements different from those in the first sentence.
With reference to the third implementation manner of the third aspect, in a fourth implementation manner of the third aspect, the generating a second mask for the second sentence that includes the first language element, and masking the first language element in the second sentence with the second mask to generate second training data includes:
masking the first language element with a number in the second mask that corresponds to the first language element, wherein the resulting second training data differs from the second sentence in that the first language element is replaced with a particular flag.
With reference to the fourth implementation manner of the third aspect, in a fifth implementation manner of the third aspect, the generating third training data according to the first language element, the first mask, the first training data, the second mask, and the second training data, and training a preset error correction model by using the third training data includes:
performing data mixing on a first language element vector corresponding to the first language element, the first mask, the first training data, the second mask, and the second training data to generate third training data;
and training the preset error correction model by using the third training data.
With reference to the fifth implementation manner of the third aspect, in a sixth implementation manner of the third aspect, the preset error correction model is a Bi-LSTM sequence labeling model.
With reference to the third aspect, in a seventh implementation manner of the third aspect, the present disclosure further includes:
and correcting the error of the text to be corrected by using the trained error correction model.
With reference to the seventh implementation manner of the third aspect, in an eighth implementation manner of the third aspect, the performing error correction on the text to be corrected by using the trained error correction model includes:
detecting whether the text to be corrected comprises the first language element;
and outputting the text to be corrected unchanged when the text to be corrected does not include the first language element.
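The detection step above amounts to a short-circuit: text without the first language element is output unchanged, and only text containing it is passed to the trained model. A sketch, with the model stubbed:

```python
# If the text lacks the first language element, output it unchanged;
# otherwise hand it to the trained error correction model (stubbed).

def correct_if_needed(text, first_element, run_model):
    if first_element not in text.split():
        return text  # no first language element: output as-is
    return run_model(text)

unchanged = correct_if_needed("the rain fell", "her",
                              run_model=lambda t: t.upper())
changed = correct_if_needed("she saw her", "her",
                            run_model=lambda t: t.upper())
```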
With reference to the eighth implementation manner of the third aspect, in a ninth implementation manner of the third aspect, the performing error correction on the text to be corrected by using the trained error correction model further includes:
generating a third mask for the text to be corrected in response to the text to be corrected containing the first language element, and masking the first language element in the text to be corrected with the third mask to generate fourth training data;
inputting the third mask and the fourth training data into a trained error correction model to predict a first linguistic element vector corresponding to the first linguistic element;
generating a target language element according to the predicted first language element vector;
replacing a first language element in the text to be corrected with the generated target language element to obtain a corrected text.
With reference to the third aspect, in a tenth implementation manner of the third aspect, the first language element includes at least one language element.
With reference to the third aspect, in an eleventh implementation manner of the third aspect, before generating a first mask for a first statement that includes a first language element, and masking the first language element in the first statement with the first mask to generate first training data, the method includes:
a first sentence containing a first language element is screened from the unlabeled data.
With reference to the eleventh implementation manner of the third aspect, in a twelfth implementation manner of the third aspect, the present disclosure further includes:
excluding, from the screened first sentences, those first sentences in which the first language element meets a preset condition.
In a fourth aspect, an embodiment of the present disclosure provides a data processing apparatus, including:
a first generation module configured to generate a first mask for a first statement containing a first language element and mask the first language element in the first statement with the first mask to generate first training data;
an acquisition module configured to acquire a second sentence converted from the first sentence through a preset conversion process, wherein the second sentence contains the first language element and contains a language element different from the language element in the first sentence;
a second generation module configured to generate a second mask for the second statement containing the first language element and mask the first language element in the second statement with the second mask to generate second training data;
a training module configured to generate third training data according to the first language element, the first mask, the first training data, the second mask, and the second training data, and train a preset error correction model using the third training data.
In a fifth aspect, an embodiment of the present disclosure provides a speech processing method, including:
screening out a first sentence containing a first language element from unlabeled data;
generating a first mask for a first statement containing a first language element, and masking the first language element in the first statement with the first mask to generate first training data;
acquiring a second sentence converted from the first sentence through voice recognition processing, wherein the second sentence contains the first language element and contains language elements different from the language elements in the first sentence;
generating a second mask for the second statement containing the first language element and masking the first language element in the second statement with the second mask to generate second training data;
generating third training data according to the first language element, the first mask, the first training data, the second mask and the second training data, and training a preset error correction model by using the third training data.
With reference to the fifth aspect, in a first implementation manner of the fifth aspect, the obtaining a second sentence converted from the first sentence through speech recognition processing, wherein the second sentence contains the first linguistic element and contains linguistic elements different from those in the first sentence, includes:
converting the first sentence into an audio signal;
adding noise to the audio signal;
the noise-added audio signal is converted into a second sentence, wherein the second sentence contains the first linguistic element and contains linguistic elements different from those in the first sentence.
With reference to the fifth aspect, in a second implementation manner of the fifth aspect, the present disclosure further includes:
and correcting the error of the text recognized by the voice by using the trained error correction model.
In a sixth aspect, an embodiment of the present disclosure provides a speech processing apparatus, including:
a screening module configured to screen out a first sentence containing a first language element from unlabeled data;
a first generation module configured to generate a first mask for a first statement containing a first language element and mask the first language element in the first statement with the first mask to generate first training data;
an acquisition module configured to acquire a second sentence converted from the first sentence by the speech recognition processing, wherein the second sentence contains the first linguistic element and contains linguistic elements different from those in the first sentence;
a second generation module configured to generate a second mask for the second statement containing the first language element and mask the first language element in the second statement with the second mask to generate second training data;
a training module configured to generate third training data according to the first language element, the first mask, the first training data, the second mask, and the second training data, and train a preset error correction model using the third training data.
In a seventh aspect, an embodiment of the present disclosure provides an electronic device, including a memory and a processor; wherein,
the memory is configured to store one or more computer instructions, where the one or more computer instructions are executed by the processor to implement the method as described in any one of the first aspect, the first implementation manner to the ninth implementation manner of the first aspect, the third aspect, the first implementation manner to the twelfth implementation manner of the third aspect, the fifth aspect, the first implementation manner of the fifth aspect, and the second implementation manner.
In an eighth aspect, an embodiment of the present disclosure provides a readable storage medium, on which computer instructions are stored, which when executed by a processor implement the method according to any one of the first aspect, the first implementation manner to the ninth implementation manner of the first aspect, the third aspect, the first implementation manner to the twelfth implementation manner of the third aspect, the fifth aspect, the first implementation manner of the fifth aspect, and the second implementation manner.
The technical scheme provided by the embodiment of the disclosure can have the following beneficial effects:
according to the technical scheme provided by the embodiment of the disclosure, a first mask is generated for the text to be corrected containing a first language element according to the fact that the text to be corrected comprises the first language element, and the first language element in the text to be corrected is masked by the first mask to generate a replacement text; inputting the first mask and the replacement text into an error correction model to predict a first language element vector corresponding to the first language element; generating a target language element according to the predicted first language element vector; the generated target language element is used for replacing the first language element in the text to be corrected to obtain the corrected text, the replacement text of the text to be corrected and the first mask code can be input into the correction model, and the first language element is generated to replace the first language element in the text to be corrected, so that the identification accuracy of the first language element in the text to be corrected is improved.
According to the technical solution provided by the embodiments of the present disclosure, sentences containing the first language element are screened out from unlabeled data to train the error correction model, so that a language error correction model can be constructed without labeling data, improving the recognition accuracy of the first language element in converted sentences. The replacement text and the first mask of the text to be corrected can be input into the error correction model, which generates the element that replaces the first language element, improving the recognition accuracy of the first language element in the text to be corrected.
According to the technical solution provided by the embodiments of the present disclosure, screening out text containing the first language element from unlabeled data to train the error correction model includes: generating a second mask for a first sentence containing the first language element, and masking the first language element in the first sentence with the second mask to generate first training data; acquiring a second sentence converted from the first sentence through a preset conversion process, wherein the second sentence contains the first language element and contains a language element different from the language elements in the first sentence; generating a third mask for the second sentence, and masking the first language element in the second sentence with the third mask to generate second training data; and generating third training data from the first language element, the second mask, the first training data, the third mask, and the second training data, and training the error correction model with the third training data. In this way, a language error correction model can be constructed without labeling data, improving the recognition accuracy of the first language element in converted sentences. The replacement text and the first mask of the text to be corrected can be input into the error correction model, which generates the element that replaces the first language element, improving the recognition accuracy of the first language element in the text to be corrected.
According to the technical solution provided by the embodiment of the present disclosure, the first mask is an array of numbers in one-to-one correspondence with the language elements in the text to be corrected, wherein the numbers corresponding to language elements other than the first language element are identical to one another and differ from the number corresponding to the first language element; the second mask is an array of numbers in one-to-one correspondence with the language elements in the first sentence, with the same property; and the third mask is an array of numbers in one-to-one correspondence with the language elements in the second sentence, with the same property. A language error correction model can thus be constructed without labeling data, and the recognition accuracy of the first language element in the converted sentence is improved. The replacement text and the first mask of the text to be corrected can be input into the error correction model, and the first language element can be generated to replace the first language element in the text to be corrected, thereby improving the recognition accuracy of the first language element in the text to be corrected.
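As an illustration of the mask arrays described above, the following Python sketch builds an array with one number per language element, treating each character as a language element. The helper name and the specific choice of 0 and 1 as the two numbers are assumptions for illustration; the disclosure does not prescribe a concrete implementation:

```python
def build_mask(sentence, target):
    """Build a mask array with one number per language element (here, per
    character).  Language elements other than the target share one number
    (0); positions covered by the target get a different number (1)."""
    mask = [0] * len(sentence)
    start = sentence.find(target)
    if start != -1:
        for i in range(start, start + len(target)):
            mask[i] = 1
    return mask

# e.g. masking the first language element "bone" in "foot bone"
print(build_mask("foot bone", "bone"))
```

Any two distinct numbers would satisfy the definition; 0/1 simply makes the masked span easy to read off.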
According to the technical solution provided by the embodiment of the present disclosure, generating a first mask for a text to be corrected that contains a first language element, and masking the first language element in the text to be corrected with the first mask to generate a replacement text, includes: covering the first language element with the number corresponding to the first language element in the first mask, wherein the resulting replacement text differs from the text to be corrected only in that the first language element is replaced by a specific mark. The replacement text and the first mask of the text to be corrected can be input into the error correction model, and the first language element can be generated to replace the first language element in the text to be corrected, thereby improving the recognition accuracy of the first language element in the text to be corrected.
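The covering step above can be sketched as follows. The `[MASK]` string used as the "specific mark" is an assumption borrowed from common masked-language-model practice; the disclosure only requires some distinguished mark:

```python
def apply_mask(sentence, mask, mark="[MASK]"):
    """Produce the replacement text: positions whose mask number differs
    from the background (0) are collapsed into one specific mark per
    contiguous masked run; all other language elements pass through."""
    out, i = [], 0
    while i < len(sentence):
        if mask[i] == 0:
            out.append(sentence[i])
            i += 1
        else:
            out.append(mark)
            while i < len(sentence) and mask[i] != 0:
                i += 1
    return "".join(out)

print(apply_mask("foot bone", [0, 0, 0, 0, 0, 1, 1, 1, 1]))
```

The replacement text is identical to the input except that the first language element has been replaced by the mark, matching the property stated above.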
According to the technical solution provided by the embodiment of the present disclosure, acquiring a second sentence converted from the first sentence through a preset conversion process, wherein the second sentence contains the first language element and contains language elements different from those in the first sentence, includes: converting the first sentence into an audio signal; adding noise to the audio signal; and converting the noise-added audio signal into the second sentence. A language error correction model can thus be constructed without labeling data, and the recognition accuracy of the first language element in the converted sentence is improved. The replacement text and the first mask of the text to be corrected can be input into the error correction model, and the first language element can be generated to replace the first language element in the text to be corrected, thereby improving the recognition accuracy of the first language element in the text to be corrected.
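The text-to-speech, add-noise, speech-recognition round trip above needs real audio components; the sketch below only simulates its net effect for illustration — some language elements outside the preserved first language element get corrupted, as a noisy recognizer would corrupt them. It is a stand-in, not the actual conversion process:

```python
import random

def simulate_conversion(sentence, keep, seed=0):
    """Simulated stand-in for the TTS -> add-noise -> ASR round trip:
    characters outside the preserved element `keep` may be corrupted,
    while `keep` itself survives, so the output contains the first
    language element plus language elements differing from the original."""
    rng = random.Random(seed)  # fixed seed: deterministic demonstration
    start = sentence.find(keep)
    chars = list(sentence)
    for i, c in enumerate(chars):
        if start <= i < start + len(keep):
            continue  # never corrupt the first language element
        if c.isalpha() and rng.random() < 0.3:
            chars[i] = rng.choice("abcdefghijklmnopqrstuvwxyz")
    return "".join(chars)

print(simulate_conversion("the foot bone connects to the leg bone", "foot bone"))
```

In the actual scheme, the corruption pattern comes from genuine acoustic noise and recognizer errors rather than random substitution.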
According to the technical solution provided by the embodiment of the present disclosure, generating third training data from the first language element, the second mask, the first training data, the third mask, and the second training data, and training the error correction model with the third training data, includes: mixing a first language element vector corresponding to the first language element with the second mask, the first training data, the third mask, and the second training data to generate the third training data; and training the error correction model with the third training data. A language error correction model can thus be constructed without labeling data, and the recognition accuracy of the first language element in the converted sentence is improved. The replacement text and the first mask of the text to be corrected can be input into the error correction model, and the first language element can be generated to replace the first language element in the text to be corrected, thereby improving the recognition accuracy of the first language element in the text to be corrected.
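The data-mixing step can be sketched as packing the five components into one training record. The record layout (a dict with these keys) is purely an assumption for illustration; the disclosure does not fix a concrete format:

```python
def mix_training_data(element_vec, mask_a, data_a, mask_b, data_b):
    """Mix the first language element vector, both masks, and both
    pieces of masked training data into one third-training-data record.
    The dict layout here is illustrative only."""
    return {
        "target_vector": element_vec,   # supervision signal for the model
        "clean": (mask_a, data_a),      # mask + masked original sentence
        "converted": (mask_b, data_b),  # mask + masked converted sentence
    }

record = mix_training_data([0.1, 0.9], [0, 0, 1], "ab[MASK]",
                           [0, 1, 0], "a[MASK]c")
print(sorted(record))
```

A training loop would then feed the `clean` and `converted` pairs to the model and score its prediction against `target_vector`.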
According to the technical solution provided by the embodiment of the present disclosure, the error correction model is a Bi-LSTM sequence labeling model, so that a language error correction model can be constructed without labeling data, and the recognition accuracy of the first language element in the converted sentence is improved.
According to the technical solution provided by the embodiment of the present disclosure, the first language element includes at least one language element, so that a language error correction model can be constructed and the recognition accuracy of the first language element in the converted sentence is improved. The replacement text and the first mask of the text to be corrected can be input into the error correction model, and the first language element can be generated to replace the first language element in the text to be corrected, thereby improving the recognition accuracy of the first language element in the text to be corrected.
According to the technical solution provided by the embodiment of the present disclosure, screening sentences containing the first language element from unlabeled data to train the error correction model includes: excluding, from the screened sentences, those sentences whose first language element meets a preset condition, so that a language error correction model can be constructed and the recognition accuracy of the first language element in the converted sentence is improved. The replacement text and the first mask of the text to be corrected can be input into the error correction model, and the first language element can be generated to replace the first language element in the text to be corrected, thereby improving the recognition accuracy of the first language element in the text to be corrected.
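The screen-then-exclude flow can be sketched as two filters over the unlabeled corpus. The example exclusion condition (dropping parenthesized occurrences) is hypothetical; the disclosure leaves the preset condition open:

```python
def screen_sentences(corpus, element, exclude=lambda s: False):
    """Screen unlabeled sentences that contain the first language
    element, then drop those meeting a caller-supplied preset
    condition."""
    kept = [s for s in corpus if element in s]
    return [s for s in kept if not exclude(s)]

corpus = ["foot bone x-ray", "leg day", "foot bone (quoted)", "hand"]
# hypothetical preset condition: exclude parenthesized occurrences
print(screen_sentences(corpus, "foot bone", exclude=lambda s: "(" in s))
```

Only sentences that both contain the element and survive the exclusion condition reach training.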
According to the technical solution provided by the embodiment of the present disclosure, a masking module is configured to generate a first mask for a text to be corrected that contains a first language element, and to mask the first language element in the text to be corrected with the first mask to generate a replacement text; a first generation module is configured to input the first mask and the replacement text into an error correction model to predict a first language element vector corresponding to the first language element; a second generation module is configured to generate a target language element from the predicted first language element vector; and an error correction module is configured to replace the first language element in the text to be corrected with the generated target language element to obtain a corrected text. The replacement text and the first mask of the text to be corrected can be input into the error correction model, and the first language element can be generated to replace the first language element in the text to be corrected, thereby improving the recognition accuracy of the first language element in the text to be corrected.
According to the technical solution provided by the embodiment of the present disclosure, a first mask is generated for a first sentence containing a first language element, and the first language element in the first sentence is masked with the first mask to generate first training data; a second sentence converted from the first sentence through a preset conversion process is acquired, wherein the second sentence contains the first language element and contains language elements different from those in the first sentence; a second mask is generated for the second sentence containing the first language element, and the first language element in the second sentence is masked with the second mask to generate second training data; and third training data is generated from the first language element, the first mask, the first training data, the second mask, and the second training data, and a preset error correction model is trained with the third training data. Masking the first language element in both the original sentence and the converted sentence trains the error correction model's ability to model the first language element, so that a language error correction model can be constructed without labeling data, and the recognition accuracy of the first language element in the converted sentence is improved.
According to the technical solution provided by the embodiment of the present disclosure, the first mask is an array of numbers in one-to-one correspondence with the language elements in the first sentence, wherein the numbers corresponding to language elements other than the first language element are identical to one another and differ from the number corresponding to the first language element; the second mask is an array of numbers in one-to-one correspondence with the language elements in the second sentence, with the same property. Masking the first language element in both the original sentence and the converted sentence trains the error correction model's ability to model the first language element, so that a language error correction model can be constructed without labeling data, and the recognition accuracy of the first language element in the converted sentence is improved.
According to the technical solution provided by the embodiment of the present disclosure, generating a first mask for a first sentence containing a first language element, and masking the first language element in the first sentence with the first mask to generate first training data, includes: masking the first language element with the number corresponding to the first language element in the first mask, wherein the resulting first training data differs from the first sentence only in that the first language element is replaced by a specific mark. Masking the first language element in both the original sentence and the converted sentence trains the error correction model's ability to model the first language element, so that a language error correction model can be constructed without labeling data, and the recognition accuracy of the first language element in the converted sentence is improved.
According to the technical solution provided by the embodiment of the present disclosure, acquiring a second sentence converted from the first sentence through a preset conversion process, wherein the second sentence contains the first language element and contains language elements different from those in the first sentence, includes: converting the first sentence into an audio signal; adding noise to the audio signal; and converting the noise-added audio signal into the second sentence. Masking the first language element in both the original sentence and the speech-recognized sentence trains the error correction model's ability to model the first language element, so that a language error correction model can be constructed without labeling data, and the recognition accuracy of the first language element in the converted sentence is improved.
According to the technical solution provided by the embodiment of the present disclosure, generating a second mask for the second sentence containing the first language element, and masking the first language element in the second sentence with the second mask to generate second training data, includes: covering the first language element with the number corresponding to the first language element in the second mask, wherein the resulting second training data differs from the second sentence only in that the first language element is replaced by a specific mark. Masking the first language element in both the original sentence and the speech-recognized sentence trains the error correction model's ability to model the first language element, so that a language error correction model can be constructed without labeling data, and the recognition accuracy of the first language element in the converted sentence is improved.
According to the technical solution provided by the embodiment of the present disclosure, generating third training data from the first language element, the first mask, the first training data, the second mask, and the second training data, and training a preset error correction model with the third training data, includes: mixing a first language element vector corresponding to the first language element with the first mask, the first training data, the second mask, and the second training data to generate the third training data; and training the preset error correction model with the third training data. Masking the first language element in both the original sentence and the speech-recognized sentence trains the error correction model's ability to model the first language element, so that a language error correction model can be constructed without labeling data, and the recognition accuracy of the first language element in the converted sentence is improved.
According to the technical solution provided by the embodiment of the present disclosure, the preset error correction model is a Bi-LSTM sequence labeling model, so that a language error correction model can be constructed without labeling data, and the recognition accuracy of the first language element in the converted sentence is improved.
According to the technical solution provided by the embodiment of the present disclosure, the trained error correction model is used to correct the text to be corrected. Masking the first language element in both the original sentence and the converted sentence trains the error correction model's ability to model the first language element, so that a language error correction model can be constructed without labeling data, the recognition accuracy of the first language element in the converted sentence is improved, and the first language element in the converted sentence is accurately recognized.
According to the technical solution provided by the embodiment of the present disclosure, correcting the text to be corrected with the trained error correction model includes: detecting whether the text to be corrected includes the first language element; and, when the text to be corrected does not include the first language element, outputting the text to be corrected as-is. Masking the first language element in both the original sentence and the converted sentence trains the error correction model's ability to model the first language element, so that a language error correction model can be constructed without labeling data, the recognition accuracy of the first language element in the converted sentence is improved, and the first language element in the converted sentence is accurately recognized.
According to the technical solution provided by the embodiment of the present disclosure, correcting the text to be corrected with the trained error correction model further includes: generating a third mask for the text to be corrected that contains the first language element, and masking the first language element in the text to be corrected with the third mask to generate fourth training data; inputting the third mask and the fourth training data into the trained error correction model to predict a first language element vector corresponding to the first language element; generating a target language element from the predicted first language element vector; and replacing the first language element in the text to be corrected with the generated target language element to obtain the corrected text. Masking the first language element in both the original sentence and the converted sentence trains the error correction model's ability to model the first language element, so that a language error correction model can be constructed without labeling data, the recognition accuracy of the first language element in the converted sentence is improved, and the first language element in the converted sentence is accurately recognized.
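The detect-mask-predict-replace flow above can be sketched end to end. The `model` callable stands in for the trained Bi-LSTM (here a stub that always answers "bone"), and the `[MASK]` mark is an illustrative assumption:

```python
def correct_text(text, element, model):
    """Error-correction flow: if the text lacks the first language
    element, output it unchanged; otherwise mask the element, let the
    model predict a target language element from the mask and the
    masked text, and splice the prediction back in."""
    if element not in text:
        return text  # no first language element: output as-is
    start = text.find(element)
    mask = [0] * len(text)
    for i in range(start, start + len(element)):
        mask[i] = 1  # mask number differing from the background
    masked = text[:start] + "[MASK]" + text[start + len(element):]
    target = model(mask, masked)  # hypothetical trained model call
    return masked.replace("[MASK]", target, 1)

# stub model that always predicts "bone", for demonstration only
print(correct_text("the foot bown aches", "bown", lambda m, t: "bone"))
```

In deployment, the model's prediction is a first language element vector that is decoded into the target language element before splicing.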
According to the technical solution provided by the embodiment of the present disclosure, the first language element includes at least one language element. Masking the first language element in both the original sentence and the converted sentence trains the error correction model's ability to model the first language element, so that a language error correction model can be constructed without labeling data, and the recognition accuracy of the first language element in the converted sentence is improved.
According to the technical solution provided by the embodiment of the present disclosure, before generating a first mask for a first sentence containing a first language element and masking the first language element in the first sentence with the first mask to generate first training data, the method includes: screening the first sentence containing the first language element from unlabeled data. Masking the first language element in both the original sentence and the converted sentence trains the error correction model's ability to model the first language element, so that a language error correction model can be constructed without labeling data, the recognition accuracy of the first language element in the converted sentence is improved, and the first language element in the converted sentence is accurately recognized.
According to the technical solution provided by the embodiment of the present disclosure, first sentences whose first language element meets a preset condition are excluded from the screened first sentences, so that interference related to the first language element can be excluded, a language error correction model can be constructed without labeling data, and the recognition accuracy of the first language element in the converted sentence is improved.
According to the technical solution provided by the embodiment of the present disclosure, a first generation module is configured to generate a first mask for a first sentence containing a first language element, and to mask the first language element in the first sentence with the first mask to generate first training data; an acquisition module is configured to acquire a second sentence converted from the first sentence through a preset conversion process, wherein the second sentence contains the first language element and contains language elements different from those in the first sentence; a second generation module is configured to generate a second mask for the second sentence containing the first language element, and to mask the first language element in the second sentence with the second mask to generate second training data; and a training module is configured to generate third training data from the first language element, the first mask, the first training data, the second mask, and the second training data, and to train a preset error correction model with the third training data. Masking the first language element in both the original sentence and the converted sentence trains the error correction model's ability to model the first language element, so that a language error correction model can be constructed without labeling data, and the recognition accuracy of the first language element in the converted sentence is improved.
According to the technical solution provided by the embodiment of the present disclosure, a first sentence containing a first language element is screened from unlabeled data; a first mask is generated for the first sentence, and the first language element in the first sentence is masked with the first mask to generate first training data; a second sentence converted from the first sentence through speech recognition processing is acquired, wherein the second sentence contains the first language element and contains language elements different from those in the first sentence; a second mask is generated for the second sentence containing the first language element, and the first language element in the second sentence is masked with the second mask to generate second training data; and third training data is generated from the first language element, the first mask, the first training data, the second mask, and the second training data, and a preset error correction model is trained with the third training data. Masking the first language element in both the original sentence and the speech-recognized sentence trains the error correction model's ability to model the first language element, so that a language error correction model can be constructed without labeling data, and the recognition accuracy of the first language element in the converted sentence is improved.
According to the technical solution provided by the embodiment of the present disclosure, acquiring a second sentence converted from the first sentence through speech recognition processing, wherein the second sentence contains the first language element and contains language elements different from those in the first sentence, includes: converting the first sentence into an audio signal; adding noise to the audio signal; and converting the noise-added audio signal into the second sentence. Masking the first language element in both the original sentence and the speech-recognized sentence trains the error correction model's ability to model the first language element, so that a language error correction model can be constructed without labeling data, and the recognition accuracy of the first language element in the converted sentence is improved.
According to the technical solution provided by the embodiment of the present disclosure, the trained error correction model is used to correct speech-recognized text. Masking the first language element in both the original sentence and the speech-recognized sentence trains the error correction model's ability to model the first language element, so that a language error correction model can be constructed without labeling data, and the recognition accuracy of the first language element in the converted sentence is improved.
According to the technical solution provided by the embodiment of the present disclosure, a screening module is configured to screen a first sentence containing a first language element from unlabeled data; a first generation module is configured to generate a first mask for the first sentence containing the first language element, and to mask the first language element in the first sentence with the first mask to generate first training data; an acquisition module is configured to acquire a second sentence converted from the first sentence through speech recognition processing, wherein the second sentence contains the first language element and contains language elements different from those in the first sentence; a second generation module is configured to generate a second mask for the second sentence containing the first language element, and to mask the first language element in the second sentence with the second mask to generate second training data; and a training module is configured to generate third training data from the first language element, the first mask, the first training data, the second mask, and the second training data, and to train a preset error correction model with the third training data. Masking the first language element in both the original sentence and the speech-recognized sentence trains the error correction model's ability to model the first language element, so that a language error correction model can be constructed without labeling data, and the recognition accuracy of the first language element in the converted sentence is improved.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
Other features, objects, and advantages of the present disclosure will become more apparent from the following detailed description of non-limiting embodiments taken in conjunction with the accompanying drawings. In the drawings:
FIG. 1 shows a flowchart of a data processing method according to an embodiment of the present disclosure;
FIG. 2 shows a flowchart of step S120 in a data processing method according to an embodiment of the present disclosure;
FIG. 3 shows a flowchart of an example of an error correction model training process in a data processing method according to an embodiment of the present disclosure;
FIG. 4 is a diagram illustrating an example of an implementation scenario of the error correction model training process in the data processing method according to the embodiment shown in FIG. 3;
FIG. 5 shows a flowchart of an example of an error correction process in a data processing method according to an embodiment of the present disclosure;
FIG. 6 shows a schematic block diagram of a data processing apparatus according to an embodiment of the present disclosure;
FIG. 7 shows a flowchart of a text processing method according to an embodiment of the present disclosure;
FIG. 8 shows a schematic structural diagram of a text processing apparatus according to an embodiment of the present disclosure;
FIG. 9 shows a flowchart of a speech processing method according to an embodiment of the present disclosure;
FIG. 10 is a schematic diagram of a speech processing apparatus according to an embodiment of the present disclosure;
FIG. 11 is a schematic block diagram of an electronic device suitable for implementing embodiments according to the present disclosure;
FIG. 12 is a schematic block diagram of a computer device suitable for implementing an embodiment according to the present disclosure.
Detailed Description
Hereinafter, exemplary embodiments of the present disclosure will be described in detail with reference to the accompanying drawings so that those skilled in the art can easily implement them. Also, for the sake of clarity, parts not relevant to the description of the exemplary embodiments are omitted in the drawings.
In the present disclosure, it is to be understood that terms such as "including" or "having" are intended to indicate the presence of the features, numbers, steps, actions, components, parts, or combinations thereof disclosed in this specification, and are not intended to preclude the possibility that one or more other features, numbers, steps, actions, components, parts, or combinations thereof are present or added.
It should be further noted that, in the absence of conflict, the embodiments of the present disclosure and the features in those embodiments may be combined with one another. The present disclosure will be described in detail below with reference to the accompanying drawings and in conjunction with the embodiments.
According to the technical solution provided by the embodiment of the present disclosure, a first mask is generated for a first sentence containing a first language element, and the first language element in the first sentence is masked with the first mask to generate first training data; a second sentence converted from the first sentence through a preset conversion process is acquired, wherein the second sentence contains the first language element and contains language elements different from those in the first sentence; a second mask is generated for the second sentence containing the first language element, and the first language element in the second sentence is masked with the second mask to generate second training data; and third training data is generated from the first language element, the first mask, the first training data, the second mask, and the second training data, and a preset error correction model is trained with the third training data. Masking the first language element in both the original sentence and the converted sentence trains the error correction model's ability to model the first language element, so that a language error correction model can be constructed without labeling data, and the recognition accuracy of the first language element in the converted sentence is improved.
Fig. 1 shows a flowchart of a data processing method according to an embodiment of the present disclosure. As shown in Fig. 1, the data processing method includes the following steps S110, S120, S130, and S140:
In step S110, a first mask is generated for a first sentence having a first language element, and the first language element in the first sentence is masked with the first mask to generate first training data.
In step S120, a second sentence converted from the first sentence by the preset conversion process is acquired, wherein the second sentence includes the first language element and includes a language element different from the language element in the first sentence.
In step S130, a second mask is generated for the second sentence having the first language element, and the first language element in the second sentence is masked with the second mask to generate second training data.
In step S140, third training data is generated according to the first language element, the first mask, the first training data, the second mask, and the second training data, and a preset error correction model is trained using the third training data.
In an embodiment of the present disclosure, the data processing method may train an error correction model to accurately identify the first language element in the converted sentence, so as to improve the accuracy of identifying the sentence. In one embodiment of the present disclosure, a language element may refer to a basic unit in a sentence, for example, a character or a word. A sentence includes at least one language element. In an embodiment of the present disclosure, a sentence may also be referred to as a statement.
In one embodiment of the present disclosure, the mask may be represented by a vector, specifically, a vector composed of an array. The dimension of the vector, i.e., the number of digits in the array, may be the same as the number of linguistic elements of the statement that it is intended to mask. In an embodiment of the present disclosure, a number in the mask corresponding to a first language element in a corresponding statement may mask the first language element, and the remaining language elements in the statement may not be masked by the mask. In an embodiment of the disclosure, a first language element in a first statement is masked with a first mask to generate first training data, and a first language element in a second statement is masked with a second mask to generate second training data.
In one embodiment of the present disclosure, the first mask is an array including numbers corresponding one-to-one to the language elements in the first sentence, wherein the numbers corresponding to the language elements other than the first language element in the first sentence are identical to each other and different from the numbers corresponding to the first language element; the second mask is an array including numbers corresponding one-to-one to the language elements in the second sentence, wherein the numbers corresponding to the language elements other than the first language element in the second sentence are identical to each other and different from the numbers corresponding to the first language element.
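As a minimal sketch of the mask construction described above (assuming a pre-tokenized sentence and the illustrative helper names below, which are not part of this disclosure), the first/second mask and the masking operation can be expressed as:

```python
def build_mask(tokens, first_language_element):
    # One number per language element: a distinct value (1) at positions
    # holding the first language element, a shared value (0) elsewhere.
    return [1 if t == first_language_element else 0 for t in tokens]

def apply_mask(tokens, mask, marker="Ta"):
    # Masking replaces each marked element with a specific marker token,
    # leaving the remaining language elements of the sentence untouched.
    return [marker if m else t for t, m in zip(tokens, mask)]

tokens = ["he", "is", "a", "flower", "of", "the", "country"]
m = build_mask(tokens, "he")   # [1, 0, 0, 0, 0, 0, 0]
x = apply_mask(tokens, m)      # masked sentence = first training data
```

The same two helpers produce the second mask and second training data when applied to the converted sentence.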
According to the technical scheme provided by the embodiment of the disclosure, the first mask is an array comprising numbers corresponding to the language elements in the first sentence one by one, wherein the numbers corresponding to the language elements except the first language element in the first sentence are the same as each other and different from the numbers corresponding to the first language element; the second mask is an array comprising numbers corresponding to language elements in the second sentence, wherein the numbers corresponding to the language elements except the first language element in the second sentence are the same and different from the numbers corresponding to the first language element, the modeling capability of the error correction model for the first language element can be trained through the masking operation of the original sentence and the first language element in the sentence after conversion processing, and the language error correction model can be constructed under the condition that data do not need to be labeled, so that the identification accuracy of the first language element in the sentence after conversion processing is improved.
In one embodiment of the present disclosure, step S110 includes: the first language element is masked with a number in the first mask that corresponds to the first language element, wherein the resulting first training data differs from the first sentence in that the first language element is replaced with a particular token.
According to the technical scheme provided by the embodiment of the disclosure, generating a first mask for a first sentence containing a first language element, and masking the first language element in the first sentence by using the first mask to generate first training data comprises: the first language element is covered by using the number corresponding to the first language element in the first mask, wherein the obtained first training data is different from the first sentence in that the first language element is replaced by a specific mark, the modeling capability of an error correction model for the first language element can be trained through the covering operation of the first language element in the original sentence and the sentence after conversion processing, and the language error correction model can be constructed under the condition that the data is not required to be labeled, so that the identification accuracy of the first language element in the sentence after conversion processing is improved.
In one embodiment of the present disclosure, a third-person pronoun, e.g., "he," "she," or "it," is defined as the first language element. For a first sentence s = "they are flowers of the country" containing the first language element, a first mask m is generated, i.e., the vector (array) [1 0 0 0 0 0 0 0]. After masking the third-person pronoun "he" in sentence s with mask m, the sentence x = "Ta is a flower of the country" is obtained, where "Ta" is a specific marker replacing the third-person pronoun. Sentence x is the first training data. The correct third-person pronoun vector is y = [he].
In one embodiment of the present disclosure, a second sentence converted from the first sentence by a preset conversion process is acquired, wherein the second sentence contains the first language element and contains a language element different from the language elements in the first sentence. The preset conversion process may be considered a "man-taint" process, i.e., it transforms the first sentence into artificially contaminated data. For example, for the first sentence s = "they are flowers of the country", the second sentence obtained by the preset conversion processing is s' = "they are flowers of the foot bones". For the second sentence s' containing the first language element, a second mask m' is generated, i.e., the vector (array) [1 0 0 0 0 0 0 0]. After masking the third-person pronoun "he" in sentence s' with mask m', the sentence x' = "Ta is the flower of the foot bone" is obtained, where "Ta" is a specific marker replacing the third-person pronoun. Sentence x' is the second training data. The correct third-person pronoun vector is y' = [he]. It should be noted that for both the first training data and the second training data, the correct third-person pronoun vector is [he], i.e., the vector corresponding to the first language element.
In one embodiment of the present disclosure, third training data may be generated according to the first language element, the first mask, the first training data, the second mask, and the second training data, and the preset error correction model may be trained using the third training data. Following the above example in which the third-person pronoun "he", "she", or "it" is defined as the first language element, the third training data X = X1 ∪ X2 can be generated, where X1 = [(x, m, y), …] and X2 = [(x', m', y'), …]. The error correction model is trained with the third training data X, and the trained error correction model has a high recognition accuracy for the first language element in a sentence.
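Under the tuple layout above, the data mixing that yields X = X1 ∪ X2 can be sketched as follows (function and variable names are illustrative, not from the disclosure):

```python
import random

def make_third_training_data(x1, x2, seed=0):
    # X = X1 ∪ X2: (masked sentence, mask, correct element vector) tuples
    # from the original and the converted sentences, mixed together so the
    # error correction model sees both during training.
    x = list(x1) + list(x2)
    random.Random(seed).shuffle(x)
    return x

X1 = [(["Ta", "is", "a", "flower", "of", "the", "country"],
       [1, 0, 0, 0, 0, 0, 0], ["he"])]
X2 = [(["Ta", "is", "the", "flower", "of", "the", "foot", "bone"],
       [1, 0, 0, 0, 0, 0, 0, 0], ["he"])]
X = make_third_training_data(X1, X2)
```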
An example of step S120 in the data processing method according to an embodiment of the present disclosure is described below with reference to fig. 2.
Fig. 2 shows a flowchart of step S120 in a data processing method according to an embodiment of the present disclosure. As shown in fig. 2, step S120 includes steps S210, S220, and S230.
In step S210, the first sentence is converted into an audio signal.
In step S220, noise is added to the audio signal.
In step S230, the audio signal with the noise added thereto is converted into a second sentence, wherein the second sentence contains the first linguistic element and contains linguistic elements different from those in the first sentence.
According to the technical scheme provided by the embodiment of the present disclosure, acquiring a second sentence converted from a first sentence through preset conversion processing, wherein the second sentence includes a first language element and includes a language element different from the language element in the first sentence, includes: converting the first sentence into an audio signal; adding noise to the audio signal; the audio signal added with the noise is converted into a second sentence, wherein the second sentence comprises a first language element and a language element different from the language element in the first sentence, the modeling capability of the error correction model for the first language element can be trained through the masking operation on the first language element in the original sentence and the sentence after the voice recognition processing, and the language error correction model can be constructed under the condition that the data is not required to be labeled, so that the recognition accuracy rate of the first language element in the sentence after the conversion processing is improved.
In one embodiment of the present disclosure, the preset conversion process in step S120 may be a speech recognition process. For example, the speech recognition process may include converting the text of a sentence containing a first language element, such as a third-person pronoun, to audio using speech synthesis techniques, adding noise to the audio signal, and finally converting the noisy audio back to text using an ASR (Automatic Speech Recognition) system. It will be appreciated by those skilled in the art that ASR is a technique in the related art for converting speech to text, and generally refers to converting human speech to text. It is understood that the second sentence resulting from the ASR processing may differ from the first sentence, and therefore ASR may be considered a technique for "artificially contaminating" the first sentence. For example, for the first sentence s = "they are flowers of the country", the second sentence obtained by the speech recognition processing is s' = "they are flowers of the foot bones". In this case, the "country" in the first sentence is recognized as the "foot bone" in the second sentence resulting from the speech recognition processing. Thus, the second sentence s' contains the same first language element "he" as the first sentence s, and contains the language element "foot" different from the language element "ancestor" in the first sentence s and the language element "bone" different from the language element "nation"; stated otherwise, the second sentence s' contains the language element "foot bone" different from the language element "country" in the first sentence s. It should be noted that, in this embodiment, the first language element whose recognition accuracy specifically needs to be improved is the third-person pronoun "he".
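The text → audio → noisy audio → text pipeline above can be sketched as follows. Here `tts` and `asr` are stand-ins for a real speech-synthesis and ASR system (they are assumptions, not part of this disclosure); only the noise-injection step is concrete:

```python
import random

def add_noise(audio, sigma=0.05, seed=0):
    # Perturb each audio sample with Gaussian noise so that the later
    # ASR pass may mis-recognize some language elements of the sentence.
    rng = random.Random(seed)
    return [s + rng.gauss(0.0, sigma) for s in audio]

def man_taint(sentence, tts, asr):
    # First sentence -> audio -> noisy audio -> second sentence.
    audio = tts(sentence)        # speech synthesis (external system)
    noisy = add_noise(audio)
    return asr(noisy)            # speech recognition (external system)
```

In practice `tts` and `asr` would be replaced by calls to actual speech-synthesis and ASR services; the sketch only fixes the order of the three steps in S210–S230.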
In one embodiment of the present disclosure, since the pronunciations of the third-person pronouns "he", "she", and "it" in the Chinese text are the same, this causes difficulty in recognition of these third-person pronouns by Chinese ASR speech recognition. In one aspect, the manual annotation data source used for ASR model training is relatively single, such as news. On the other hand, the overall data volume of the manual annotation data used by the ASR model is relatively limited, and the third person pronoun sentences contained in the manual annotation data are more limited. In practical scenarios, the speech source that ASR needs to recognize is very complex, and may be the speech input in instant messaging or the real-time text conversion of speech. And in daily conversations, the frequency of use of the pronouns called by the third person is relatively high. This presents a significant challenge to the accuracy of third-person pronouns in ASR recognition, which may further impact the effectiveness of downstream tasks, such as translation. In the embodiments of the present disclosure, it is more representative to use the third person pronoun as the first language element. Those skilled in the art will understand that the use of the third person pronoun as the first language element in the embodiments of the present disclosure is only an example, and other language elements may also be used as the first language element to improve the accuracy of recognition of the first language element in a relatively targeted manner.
In one embodiment of the present disclosure, because differences between different accents of the same language may cause differences between the text and the original text recognized by ASR, the solution of the embodiment of the present disclosure may correct the language elements different from the original text in the sentence recognized by ASR caused by accents. In this case, when training the error correction model, the mask may be used to mask specific language elements (first language elements) in the same sentence that are different due to accents, so as to train the error correction model and improve the recognition accuracy for the specific language elements. It will be appreciated by those skilled in the art that different accents may be determined by the pronunciation characteristics of different objects, or may be due to specific pronunciation rules such as dialects. In one embodiment of the present disclosure, the corpus used to train the error correction model may include corpora with different accent characteristics, such as corpora identified from a particular dialect. Under the condition that the error correction model is trained based on the corpus of different accents, the trained error correction model can be utilized to realize accurate recognition of specific language elements aiming at the different accents.
In one embodiment of the present disclosure, the voice recognition processing as the preset conversion processing is merely an example, and the embodiment of the present disclosure may also adopt other conversion processing manners such as image recognition as the manner of the preset conversion processing.
In one embodiment of the present disclosure, step S130 includes: the first language element is masked with a number in the second mask that corresponds to the first language element, wherein the resulting second training data differs from the second sentence in that the first language element is replaced with a particular token.
According to the technical scheme provided by the embodiment of the disclosure, generating a second mask for a second statement containing a first language element, and masking the first language element in the second statement by using the second mask to generate second training data includes: and covering the first language element by using the number corresponding to the first language element in the second mask, wherein the obtained second training data is different from the second sentence in that the first language element is replaced by a specific mark, the modeling capacity of an error correction model for the first language element can be trained through the covering operation of the first language element in the original sentence and the sentence after the voice recognition processing, and the language error correction model can be constructed under the condition that the data is not required to be labeled, so that the recognition accuracy of the first language element in the sentence after the conversion processing is improved.
In one embodiment of the present disclosure, the related description of the specific token may refer to the aforementioned example of the third person pronoun as the first language element.
In one embodiment of the present disclosure, step S140 includes: performing data mixing on a first language element vector, a first mask, first training data, a second mask and second training data corresponding to a first language element to generate third training data; and training the preset error correction model by using the third training data.
According to the technical scheme provided by the embodiment of the disclosure, generating third training data according to the first language element, the first mask, the first training data, the second mask and the second training data, and training the preset error correction model by using the third training data includes: performing data mixing on a first language element vector, a first mask, first training data, a second mask and second training data corresponding to a first language element to generate third training data; the third training data is used for training the preset error correction model, the modeling capability of the error correction model for the first language element can be trained through the masking operation on the first language element in the original sentence and the sentence after the voice recognition processing, the language error correction model can be constructed under the condition that the data is not required to be labeled, and the recognition accuracy rate of the first language element in the sentence after the conversion processing is improved.
In one embodiment of the present disclosure, the preset error correction model is a Bi-LSTM sequence labeling model.
According to the technical scheme provided by the embodiment of the disclosure, the preset error correction model is the Bi-LSTM sequence labeling model, so that the language error correction model can be constructed under the condition that data is not required to be labeled, and the identification accuracy of the first language element in the converted statement is improved.
In one embodiment of the present disclosure, the Bi-LSTM sequence annotation model refers to a bidirectional LSTM sequence annotation model. LSTM refers to a Long Short-Term Memory network, a recurrent neural network specially designed to solve the long-term dependency problem of a general RNN (recurrent neural network); it can learn long-term dependency information. In one embodiment of the present disclosure, sequence tagging refers to taking a Chinese sentence as input, producing a sequence string composed of "BEMS" tags as output, and then performing word segmentation to obtain the division of the input sentence into words. Here, B indicates that the character is the initial character of a word, M a middle character of a word, E the ending character of a word, and S a single-character word. Details of the Bi-LSTM sequence labeling model can be obtained from the related art and are not described in detail in this disclosure.
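The BEMS tagging scheme above can be decoded into a word segmentation like this (a sketch assuming one tag per character; the helper name is illustrative):

```python
def bems_to_words(chars, tags):
    # B: word-initial character, M: word-internal character,
    # E: word-final character, S: a single-character word.
    words, cur = [], []
    for c, t in zip(chars, tags):
        if t == "S":
            words.append(c)
        elif t == "B":
            cur = [c]
        elif t == "M":
            cur.append(c)
        else:  # "E" closes the current multi-character word
            cur.append(c)
            words.append("".join(cur))
            cur = []
    return words
```

For example, characters `a b c d` tagged `B E S S` decode to the words `ab`, `c`, `d`.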
In one embodiment of the present disclosure, when training the error correction model using the third training data, the error correction model structure may be a multi-layer Bi-LSTM sequence labeling model, with the model input being the third training data (e.g., the aforementioned X set). In the case of using the X set as the third training data, the general process may be to convert x into word vectors, obtain a vector h containing context information using the multi-layer Bi-LSTM error correction model, compute a softmax over the third-person pronoun word table at the position corresponding to 1 in the mask m within the vector h to obtain the probability of each third-person pronoun, compute a cross-entropy loss function against the third-person pronoun vector y, and finally perform parameter updates through gradient back-propagation. In machine learning, and especially in deep learning, softmax is a common and important function, widely used in multi-classification scenarios. softmax maps its inputs to real numbers between 0 and 1, and normalization guarantees that they sum to 1, so the sum of the probabilities over the classes is exactly 1. It can be understood by those skilled in the art that the above discussion of training the error correction model based on the multi-layer Bi-LSTM sequence labeling model is only an example; the present disclosure can also use other models as the error correction model for training, and details thereof can be obtained from the related art and are not described herein again.
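The softmax and cross-entropy computation at the masked position can be sketched in pure Python (the Bi-LSTM forward pass is elided; the pronoun table and logit values below are illustrative assumptions, not from the disclosure):

```python
import math

def softmax(logits):
    # Map scores to probabilities in (0, 1) that sum to 1.
    mx = max(logits)                       # subtract max for stability
    exps = [math.exp(v - mx) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(probs, target_index):
    # Negative log-likelihood of the correct third-person pronoun.
    return -math.log(probs[target_index])

vocab = ["he", "she", "it"]      # third-person pronoun word table
logits = [2.0, 0.5, 0.1]         # model scores at the masked position
probs = softmax(logits)
loss = cross_entropy(probs, vocab.index("he"))
```

Gradients of this loss with respect to the model parameters would then be back-propagated through the Bi-LSTM to perform the parameter update.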
In one embodiment of the present disclosure, the data processing method may further include: and correcting the error of the text to be corrected by using the trained error correction model.
According to the technical scheme provided by the embodiment of the disclosure, by using the trained error correction model to correct the text to be corrected, the modeling capability of the error correction model for the first language element can be trained through the masking operation on the first language element in the original sentence and the sentence after conversion processing, and the language error correction model can be constructed under the condition that data is not required to be labeled, so that the identification accuracy of the first language element in the sentence after conversion processing is improved, and the first language element in the sentence after conversion processing is accurately identified.
In an embodiment of the present disclosure, the error correction of the text to be corrected by using the trained error correction model includes: detecting whether the text to be corrected comprises a first language element; and outputting the text to be corrected according to the condition that the text to be corrected does not comprise the first language element.
According to the technical scheme provided by the embodiment of the disclosure, the error correction of the text to be corrected by using the trained error correction model comprises the following steps: detecting whether the text to be corrected comprises a first language element; according to the method, the text to be corrected is output according to the fact that the text to be corrected does not include the first language element, the modeling capacity of the error correction model for the first language element can be trained through the masking operation on the first language element in the original sentence and the sentence subjected to conversion processing, the language error correction model can be constructed under the condition that data do not need to be marked, the recognition accuracy of the first language element in the sentence subjected to conversion processing is improved, and accurate recognition of the first language element in the sentence subjected to conversion processing is achieved.
In one embodiment of the present disclosure, according to the above example in which the third-person pronoun "he", "she", or "it" is defined as the first language element, when it is detected that the text to be corrected does not include the first language element, the original sentence that does not include the third-person pronoun is returned.
In an embodiment of the present disclosure, the error correcting the text to be corrected by using the trained error correction model further includes: generating a third mask for the text to be corrected containing the first language element according to the text to be corrected containing the first language element, and masking the first language element in the text to be corrected by using the third mask to generate fourth training data; inputting the third mask and the fourth training data into a trained error correction model to predict a first language element vector corresponding to the first language element; generating a target language element according to the predicted first language element vector; and replacing the first language element in the text to be corrected with the generated target language element to obtain corrected text.
According to the technical scheme provided by the embodiment of the present disclosure, the error correction of the text to be corrected by using the trained error correction model further includes: generating a third mask for the text to be corrected containing the first language element according to the text to be corrected containing the first language element, and masking the first language element in the text to be corrected by using the third mask to generate fourth training data; inputting the third mask and the fourth training data into a trained error correction model to predict a first language element vector corresponding to the first language element; generating a target language element according to the predicted first language element vector; the generated target language element is used for replacing the first language element in the text to be corrected to obtain the corrected text, the modeling capacity of the correction model for the first language element can be trained through the masking operation on the first language element in the original sentence and the converted sentence, the language correction model can be constructed under the condition that data do not need to be labeled, the recognition accuracy of the first language element in the converted sentence is improved, and the first language element in the converted sentence is accurately recognized.
In one embodiment of the present disclosure, according to the above example in which the third-person pronoun "he", "she" or "it" is defined as the first language element, for the text to be corrected containing a third-person pronoun, the mask m is generated to mask the third-person pronoun therein, resulting in the sentence x. The sentence x and the mask m are input into the error correction model, which predicts a first language element vector (third-person pronoun vector) y; that is, the error correction model generates a third-person pronoun, the target language element computed by the error correction model replaces the third-person pronoun in the original sentence, and the error-corrected sentence (text) is returned. Since the first language element has been masked, the error correction model needs to predict a first language element vector corresponding to the masked first language element. For example, the predicted first language element vector may be y = [he], y = [she], or y = [it]. Therefore, the target language element generated from the first language element vector predicted by the error correction model may or may not be consistent with the first language element in the text to be corrected. In either case, in the embodiment of the present disclosure the target language element replaces the first language element in the text to be corrected to obtain an accurate text.
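The detect → mask → predict → replace inference flow can be sketched as follows, where `predict` is a stand-in for the trained error correction model (an assumption, not the disclosure's actual model interface):

```python
def correct(tokens, pronouns, predict, marker="Ta"):
    # If the text to be corrected contains no first language element,
    # return it unchanged (the no-pronoun branch of the flow).
    if not any(t in pronouns for t in tokens):
        return tokens
    # Build the mask and the masked sentence, then let the model
    # generate the target language element for the masked position.
    mask = [1 if t in pronouns else 0 for t in tokens]
    masked = [marker if m else t for t, m in zip(tokens, mask)]
    target = predict(masked, mask)
    # Replace the original pronoun with the generated target element.
    return [target if m else t for t, m in zip(tokens, mask)]
```

The returned token sequence is the corrected text regardless of whether the generated target element happens to match the original pronoun.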
In one embodiment of the present disclosure, the first linguistic element includes at least one linguistic element.
According to the technical scheme provided by the embodiment of the disclosure, the first language element comprises at least one language element, the modeling capability of the error correction model for the first language element can be trained through the masking operation on the first language element in the original sentence and the converted sentence, and the language error correction model can be constructed under the condition that data is not required to be labeled, so that the identification accuracy of the first language element in the converted sentence is improved.
In one embodiment of the present disclosure, the first language element may be defined as a class of language elements. For example, the first linguistic element may be a third-person pronoun that includes at least three linguistic elements "he", "she", and "it".
In one embodiment of the present disclosure, before step S110, the method includes: a first sentence containing a first language element is screened from the unlabeled data.
According to the technical scheme provided by the embodiment of the disclosure, before generating a first mask for a first statement containing a first language element and masking the first language element in the first statement with the first mask to generate first training data, the method includes: the first sentence containing the first language element is screened out from the non-labeled data, the modeling capacity of the error correction model for the first language element can be trained through the masking operation on the first language element in the original sentence and the sentence subjected to conversion processing, the language error correction model can be constructed under the condition that the data is not required to be labeled, the identification accuracy rate of the first language element in the sentence subjected to conversion processing is improved, and the first language element in the sentence subjected to conversion processing is accurately identified.
In one embodiment of the present disclosure, since the overall data volume of the manual annotation data is relatively limited, the statements of the first language element contained therein are more limited and are not suitable for training the error correction model. Therefore, the method can adopt the label-free data to generate a large number of sentences containing the first language elements, which can also be called pseudo-corpora, and the accuracy of identifying the first language elements in the data processing result can be obviously improved by using the pseudo-corpora to train the error correction model. Therefore, the data processing scheme in the embodiment of the present disclosure focuses on the process of generating training data from label-free data, and trains the modeling capability of the error correction model for the first language element through the masking operation on the first language element in the original sentence and the artificially contaminated sentence.
In one embodiment of the present disclosure, the data processing method further includes: and excluding the first sentences containing the first language elements meeting the preset conditions from the screened first sentences.
According to the technical scheme provided by the embodiment of the disclosure, the first sentence containing the first language element meeting the preset condition is excluded from the screened first sentence, so that the interference related to the first language element can be excluded, the language error correction model can be constructed under the condition that data is not required to be labeled, and the identification accuracy of the first language element in the sentence subjected to conversion processing is improved.
In one embodiment of the present disclosure, in light of the above example where the third-person pronoun "he", "she" or "it" is defined as the first language element, some special cases may be removed, such as words like "other" that formally contain the third-person pronoun character but do not carry its meaning, resulting in a set of sentences that genuinely contain third-person pronouns.
A flowchart of an error correction model training process in the data processing method according to an embodiment of the present disclosure is described below with reference to fig. 3 and 4.
Fig. 3 shows a flowchart of an example of an error correction model training process in a data processing method according to an embodiment of the present disclosure. Fig. 4 is a schematic diagram illustrating an implementation scenario example of an error correction model training process in the data processing method according to the embodiment shown in fig. 3.
As shown in fig. 3, in steps S310 and S320, sentences containing third person pronouns are screened from a large amount of multi-domain unlabeled data. As can be seen from fig. 4, for the first sentence s = "they are flowers of the country" containing a third person pronoun, a first mask m is generated (step S340), i.e., the vector (array) [1 0 0 0 0 0 0 0]. After masking the third person pronoun "he" in the sentence s with the mask m, the sentence x = "Ta is a flower of the country" is obtained, where "Ta" is a special mark replacing the third person pronoun. The sentence x is the first training data. The correct third person pronoun vector is y = "he".
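The mask-and-replace step of fig. 4 can be sketched as follows. The function name is illustrative, and the sketch assumes exactly one pronoun character per sentence, matching the example above.

```python
def make_mask_and_training_pair(sentence, pronouns=("他", "她", "它"), mark="Ta"):
    """Return (x, m, y): the masked sentence x with the pronoun replaced
    by the special mark "Ta", the 0/1 mask m (1 at the pronoun
    position), and the correct pronoun y. Assumes exactly one pronoun
    character per sentence."""
    chars = list(sentence)
    m = [1 if c in pronouns else 0 for c in chars]
    y = next(c for c, flag in zip(chars, m) if flag)
    x = "".join(mark if flag else c for c, flag in zip(chars, m))
    return x, m, y

# The fig. 4 example: "他们是祖国的花朵" -> mask [1 0 0 0 0 0 0 0]
x, m, y = make_mask_and_training_pair("他们是祖国的花朵")
print(x, m, y)  # → Ta们是祖国的花朵 [1, 0, 0, 0, 0, 0, 0, 0] 他
```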
In step S330, artificially contaminated data is generated. As can be seen from fig. 4, a second sentence converted from the first sentence by a speech recognition process is obtained, wherein the second sentence contains a third person pronoun and contains a language element different from those in the first sentence. The speech recognition process may be regarded as an "artificial contamination" process, i.e., it converts the first sentence into artificially contaminated data. For example, for the first sentence s = "they are flowers of the country", the second sentence obtained by the speech recognition processing is s' = "they are flowers of the foot bone" ("foot bone" being a near-homophone of "country" in Chinese). For the second sentence s' containing a third person pronoun, a second mask m' is generated (step S340), i.e., the vector [1 0 0 0 0 0 0 0]. After masking the third person pronoun "he" in the sentence s' with the mask m', the sentence x' = "Ta is the flower of the foot bone" is obtained, where "Ta" is a special mark replacing the third person pronoun. The sentence x' is the second training data. The correct third person pronoun vector is y' = "he".
In step S350, an error correction model is trained. As shown in fig. 4, third training data (mixed data) may be generated from the first mask, the first training data, the third person pronoun vector y, the second mask, the second training data, and the third person pronoun vector y', and the preset error correction model may be trained using the mixed data. The mixed data X = X1 ∪ X2 may be generated, where X1 = [(x, m, y), …] and X2 = [(x', m', y'), …]. The error correction model is trained using the mixed data X, and the trained error correction model has a high recognition accuracy for third person pronouns in sentences.
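The mixing of X1 and X2 in step S350 amounts to concatenating the two lists of triples. A minimal sketch follows; the shuffle and the seed parameter are conventional additions for training, not something the patent specifies.

```python
import random

def build_mixed_training_data(clean_pairs, contaminated_pairs, seed=None):
    """Mix clean and artificially contaminated (x, m, y) triples into one
    training set X = X1 ∪ X2, as in step S350."""
    X = list(clean_pairs) + list(contaminated_pairs)
    random.Random(seed).shuffle(X)  # shuffling is a common practice, not mandated
    return X

X1 = [("Ta们是祖国的花朵", [1, 0, 0, 0, 0, 0, 0, 0], "他")]
X2 = [("Ta们是足骨的花朵", [1, 0, 0, 0, 0, 0, 0, 0], "他")]
X = build_mixed_training_data(X1, X2)
print(len(X))  # → 2
```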
A flowchart of an error correction process in the data processing method according to an embodiment of the present disclosure is described below with reference to fig. 5.
Fig. 5 shows a flowchart of an example of an error correction process in a data processing method according to an embodiment of the present disclosure.
As shown in fig. 5, in step S510, a text to be corrected is input. In step S520, it is detected whether the text to be corrected contains a third person pronoun. In step S530, if the text to be corrected does not contain a third person pronoun, the original sentence is returned directly. In step S540, for a text containing a third person pronoun, a pronoun mask is generated for the third person pronoun and the pronoun is masked, obtaining a sentence x and a mask m. In step S550, the sentence x and the mask m are input into the error correction model, which predicts the third person pronoun vector y, i.e., the error correction model generates the third person pronoun; the result computed by the error correction model then replaces the third person pronoun in the original sentence, and the corrected sentence is returned.
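The whole fig. 5 flow can be sketched as one function. The `predict` callable stands in for the trained error correction model; the stub used below is illustrative only and not part of the patent.

```python
def correct_text(text, predict, pronouns=("他", "她", "它"), mark="Ta"):
    """Fig. 5 error-correction flow: return the text unchanged when it
    contains no third-person pronoun (S530); otherwise mask the pronoun
    (S540), ask the model for the correct pronoun (S550), and splice it
    back into the original sentence."""
    m = [1 if c in pronouns else 0 for c in text]
    if not any(m):
        return text                      # S530: no pronoun, return original
    x = "".join(mark if flag else c for c, flag in zip(text, m))
    y = predict(x, m)                    # S550: model predicts the pronoun
    return "".join(y if flag else c for c, flag in zip(text, m))

# A stub model that always answers "它" ("it") -- illustrative only.
print(correct_text("他在桌子上", lambda x, m: "它"))  # → 它在桌子上
```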
An example of a data processing apparatus according to an embodiment of the present disclosure is described below with reference to fig. 6.
Fig. 6 shows a schematic structural diagram of a data processing apparatus 600 according to an embodiment of the present disclosure. As shown in fig. 6, the data processing apparatus 600 includes the following first generation module 610, acquisition module 620, second generation module 630, and training module 640.
The first generation module 610 is configured to generate a first mask for a first statement containing a first language element and mask the first language element in the first statement with the first mask to generate first training data.
The obtaining module 620 is configured to obtain a second sentence converted from the first sentence through a preset conversion process, wherein the second sentence includes the first language element and includes a language element different from the language element in the first sentence.
The second generation module 630 is configured to generate a second mask for a second statement containing the first language element and mask the first language element in the second statement with the second mask to generate second training data.
The training module 640 is configured to generate third training data according to the first language element, the first mask, the first training data, the second mask, and the second training data, and train the preset error correction model using the third training data.
According to the technical scheme provided by the embodiment of the disclosure, the first generation module is configured to generate a first mask for a first statement containing a first language element, and mask the first language element in the first statement by using the first mask to generate first training data; an acquisition module configured to acquire a second sentence converted from the first sentence by a preset conversion process, wherein the second sentence contains the first language element and contains a language element different from the language element in the first sentence; a second generation module configured to generate a second mask for a second statement containing the first language element and mask the first language element in the second statement with the second mask to generate second training data; the training module is configured to generate third training data according to the first language element, the first mask code, the first training data, the second mask code and the second training data, train a preset error correction model by using the third training data, train the modeling capacity of the error correction model for the first language element through the masking operation on the first language element in the original sentence and the converted sentence, and construct the language error correction model without labeling data, so that the recognition accuracy of the first language element in the converted sentence is improved.
It will be understood by those skilled in the art that the technical solution described with reference to fig. 6 may be combined with the embodiments described with reference to fig. 1 to 5, so as to have the technical effects achieved by the embodiments described with reference to fig. 1 to 5. For details, reference may be made to the description made above with reference to fig. 1 to 5, and details thereof are not repeated herein.
An example of a text processing method according to an embodiment of the present disclosure is described below with reference to fig. 7.
FIG. 7 shows a flow diagram of a text processing method according to an embodiment of the present disclosure. As shown in fig. 7, the text processing method includes the following steps S710, S720, S730, and S740:
in step S710, in response to detecting that the text to be corrected contains the first language element, a first mask is generated for the text to be corrected, and the first language element in the text to be corrected is masked with the first mask to generate a replacement text.
In step S720, the first mask and the replacement text are input to an error correction model to predict a first language element vector corresponding to the first language element.
In step S730, a target linguistic element is generated from the predicted first linguistic element vector.
In step S740, the generated target language element is used to replace the first language element in the text to be corrected to obtain the corrected text.
According to the technical scheme provided by the embodiment of the disclosure, a first mask is generated for the text to be corrected containing a first language element according to the text to be corrected including the first language element, and the first language element in the text to be corrected is masked by the first mask to generate a replacement text; inputting the first mask and the replacement text into an error correction model to predict a first language element vector corresponding to the first language element; generating a target language element according to the predicted first language element vector; the generated target language element is used for replacing the first language element in the text to be corrected to obtain the corrected text, the replacement text of the text to be corrected and the first mask can be input into the correction model, and the first language element is generated to replace the first language element in the text to be corrected, so that the identification accuracy of the first language element in the text to be corrected is improved.
In one embodiment of the present disclosure, the text to be corrected may be text obtained by various means, for example, text obtained via speech recognition, text obtained via image recognition that requires correction, and the like.
In one embodiment of the present disclosure, the text processing method further includes: and (5) screening out sentences containing the first language elements from the non-labeled data to train an error correction model.
According to the technical scheme provided by the embodiment of the disclosure, the sentences containing the first language elements are screened from the non-labeled data to train the error correction model, so that the language error correction model can be constructed under the condition that the data are not required to be labeled, and the identification accuracy of the first language elements in the converted sentences is improved. The replacement text and the first mask code of the text to be corrected can be input into the error correction model, and the first language element is generated to replace the first language element in the text to be corrected, so that the identification accuracy of the first language element in the text to be corrected is improved.
In one embodiment of the present disclosure, screening out text containing a first language element from unlabeled data to train an error correction model includes: generating a second mask for the first statement containing the first language element, and masking the first language element in the first statement with the second mask to generate first training data; acquiring a second sentence converted from the first sentence through preset conversion processing, wherein the second sentence comprises a first language element and a language element different from the language element in the first sentence; generating a third mask for the second statement and masking the first language element in the second statement with the third mask to generate second training data; third training data is generated according to the first language element, the second mask, the first training data, the third mask and the second training data, and an error correction model is trained by using the third training data.
According to the technical scheme provided by the embodiment of the disclosure, the method for training the error correction model by screening out the text containing the first language element from the non-labeled data comprises the following steps: generating a second mask for the first statement containing the first language element, and masking the first language element in the first statement with the second mask to generate first training data; acquiring a second sentence converted from the first sentence through preset conversion processing, wherein the second sentence comprises a first language element and a language element different from the language element in the first sentence; generating a third mask for the second statement and masking the first language element in the second statement with the third mask to generate second training data; and generating third training data according to the first language element, the second mask code, the first training data, the third mask code and the second training data, training an error correction model by using the third training data, constructing the language error correction model under the condition of not marking data, and improving the identification accuracy of the first language element in the converted statement. The replacement text and the first mask code of the text to be corrected can be input into the error correction model, and the first language element is generated to replace the first language element in the text to be corrected, so that the identification accuracy of the first language element in the text to be corrected is improved.
In one embodiment of the present disclosure, the first mask is an array including numbers corresponding one-to-one to language elements in the text to be corrected, wherein the numbers corresponding to language elements other than the first language element in the text to be corrected are identical to each other and different from the numbers corresponding to the first language element; the second mask is an array including numbers corresponding to the language elements in the first sentence one by one, wherein the numbers corresponding to the language elements other than the first language element in the first sentence are identical to each other and different from the numbers corresponding to the first language element; the third mask is an array including numbers corresponding one-to-one to the language elements in the second sentence, wherein the numbers corresponding to the language elements other than the first language element in the second sentence are identical to each other and different from the numbers corresponding to the first language element.
According to the technical scheme provided by the embodiment of the disclosure, the first mask is an array comprising numbers corresponding to language elements in the text to be corrected one by one, wherein the numbers corresponding to the language elements except the first language element in the text to be corrected are the same as each other and different from the numbers corresponding to the first language element; the second mask is an array including numbers corresponding to the language elements in the first sentence one by one, wherein the numbers corresponding to the language elements other than the first language element in the first sentence are identical to each other and different from the numbers corresponding to the first language element; the third mask is an array including numbers corresponding to the language elements in the second sentence one by one, wherein the numbers corresponding to the language elements other than the first language element in the second sentence are the same and different from the numbers corresponding to the first language element, and the language error correction model can be constructed without labeling data, so that the recognition accuracy of the first language element in the converted sentence is improved. The replacement text and the first mask code of the text to be corrected can be input into the error correction model, and the first language element is generated to replace the first language element in the text to be corrected, so that the identification accuracy of the first language element in the text to be corrected is improved.
In one embodiment of the present disclosure, generating a first mask for a text to be corrected containing a first language element according to the text to be corrected including the first language element, and masking the first language element in the text to be corrected with the first mask to generate a replacement text, includes: and masking the first language element by using a number corresponding to the first language element in the first mask, wherein the obtained replacement text is different from the text to be corrected in that the first language element is replaced by a specific mark.
According to the technical scheme provided by the embodiment of the disclosure, generating a first mask for the text to be corrected containing the first language element according to the text to be corrected including the first language element, and masking the first language element in the text to be corrected by using the first mask to generate a replacement text, includes: and covering the first language element by using a number corresponding to the first language element in the first mask, wherein the obtained replacement text is different from the text to be corrected in that the first language element is replaced by a specific mark, the replacement text of the text to be corrected and the first mask can be input into an error correction model, and the first language element is generated to replace the first language element in the text to be corrected, so that the identification accuracy of the first language element in the text to be corrected is improved.
In one embodiment of the present disclosure, acquiring a second sentence converted from a first sentence by a preset conversion process, wherein the second sentence contains a first language element and contains a language element different from the language element in the first sentence, includes: converting the first sentence into an audio signal; adding noise to the audio signal; the noise-added audio signal is converted into a second sentence, wherein the second sentence contains the first linguistic elements and contains linguistic elements different from those in the first sentence.
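Of the three conversion steps above, only the noise-adding step can be shown without external TTS and speech recognition engines. A minimal sketch follows, assuming additive Gaussian noise over a list of float samples; the patent does not specify the noise model or scale.

```python
import random

def add_noise(audio, scale=0.05, seed=0):
    """Add small Gaussian noise to an audio signal (a list of float
    samples). In the full pipeline the noisy signal would then be fed
    to a speech recognizer to produce the second sentence; that step
    requires an external ASR engine and is omitted here."""
    rng = random.Random(seed)
    return [s + rng.gauss(0.0, scale) for s in audio]

clean = [0.0, 0.5, -0.5, 0.25]
noisy = add_noise(clean)
print(len(noisy))  # → 4 (same length, slightly perturbed samples)
```

Mild noise makes the recognizer more likely to produce homophone errors such as "foot bone" for "country", which is exactly the kind of contamination the training data needs.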
According to the technical solution provided by the embodiment of the present disclosure, acquiring a second sentence converted from a first sentence through preset conversion processing, wherein the second sentence includes a first language element and includes a language element different from the language element in the first sentence, includes: converting the first sentence into an audio signal; adding noise to the audio signal; the audio signal added with the noise is converted into the second sentence, wherein the second sentence comprises the first language element and the language element different from the language element in the first sentence, a language error correction model can be constructed under the condition that data do not need to be labeled, and the identification accuracy of the first language element in the sentence subjected to conversion processing is improved. The replacement text and the first mask code of the text to be corrected can be input into the error correction model, and the first language element is generated to replace the first language element in the text to be corrected, so that the identification accuracy of the first language element in the text to be corrected is improved.
In one embodiment of the disclosure, generating third training data from the first language element, the second mask, the first training data, the third mask, and the second training data, and training the error correction model using the third training data includes: performing data mixing on a first language element vector, a second mask, first training data, a third mask and second training data corresponding to the first language element to generate third training data; the error correction model is trained using the third training data.
According to the technical scheme provided by the embodiment of the disclosure, generating third training data according to the first language element, the second mask, the first training data, the third mask and the second training data, and training the error correction model by using the third training data includes: performing data mixing on a first language element vector, a second mask, first training data, a third mask and second training data corresponding to the first language element to generate third training data; and training the error correction model by using the third training data, constructing a language error correction model under the condition that data do not need to be labeled, and improving the identification accuracy of the first language element in the converted statement. The replacement text and the first mask code of the text to be corrected can be input into the error correction model, and the first language element is generated to replace the first language element in the text to be corrected, so that the identification accuracy of the first language element in the text to be corrected is improved.
In one embodiment of the present disclosure, the error correction model is a Bi-LSTM sequence labeling model.
According to the technical scheme provided by the embodiment of the disclosure, the error correction model is the Bi-LSTM sequence labeling model, so that the language error correction model can be constructed under the condition that data is not required to be labeled, and the identification accuracy of the first language element in the converted statement is improved.
In one embodiment of the present disclosure, the first linguistic element includes at least one linguistic element.
According to the technical scheme provided by the embodiment of the disclosure, the language correction model can be constructed by the first language element including at least one language element, so that the recognition accuracy of the first language element in the converted statement is improved. The replacement text and the first mask code of the text to be corrected can be input into the error correction model, and the first language element is generated to replace the first language element in the text to be corrected, so that the identification accuracy of the first language element in the text to be corrected is improved.
In one embodiment of the present disclosure, screening out sentences containing first language elements from unlabeled data to train an error correction model comprises: and eliminating sentences containing the first language elements meeting preset conditions from the screened sentences containing the first language elements.
According to the technical scheme provided by the embodiment of the disclosure, the method for training the error correction model by screening out the sentences containing the first language elements from the non-labeled data comprises the following steps: and eliminating the sentences containing the first language elements meeting the preset conditions from the screened sentences containing the first language elements, so that a language error correction model can be constructed, and the identification accuracy of the first language elements in the sentences subjected to conversion processing is improved. The replacement text and the first mask code of the text to be corrected can be input into the error correction model, and the first language element is generated to replace the first language element in the text to be corrected, so that the identification accuracy of the first language element in the text to be corrected is improved.
It can be understood by those skilled in the art that the technical solution described with reference to fig. 7 can be combined with the embodiments described with reference to fig. 1 to 6, so as to have the technical effects achieved by the embodiments described with reference to fig. 1 to 6. For details, reference may be made to the description made above with reference to fig. 1 to 6, and details thereof are not repeated herein.
An example of a text processing apparatus according to an embodiment of the present disclosure is described below with reference to fig. 8.
Fig. 8 shows a schematic structural diagram of a text processing apparatus 800 according to an embodiment of the present disclosure. As shown in fig. 8, the text processing apparatus 800 includes a mask module 810, a first generation module 820, a second generation module 830, and an error correction module 840 as follows.
The masking module 810 is configured to generate a first mask for the text to be corrected containing the first language element according to the text to be corrected including the first language element, and mask the first language element in the text to be corrected with the first mask to generate a replacement text.
The first generation module 820 is configured to input the first mask and the replacement text into an error correction model to predict a first language element vector corresponding to the first language element.
The second generation module 830 is configured to generate the target language element from the predicted first language element vector.
The correction module 840 is configured to replace a first language element in the text to be corrected with the generated target language element to obtain a corrected text.
According to the technical scheme provided by the embodiment of the disclosure, the masking module is configured to generate a first mask for the text to be corrected containing the first language element according to the text to be corrected including the first language element, and mask the first language element in the text to be corrected by using the first mask to generate a replacement text; a first generation module configured to input the first mask and the replacement text into an error correction model to predict a first language element vector corresponding to the first language element; the second generation module generates a target language element according to the predicted first language element vector; the error correction module is configured to substitute the generated target language element for the first language element in the text to be corrected to obtain an error-corrected text, the substituted text of the text to be corrected and the first mask can be input into the error correction model, and the first language element is generated to substitute for the first language element in the text to be corrected, so that the identification accuracy of the first language element in the text to be corrected is improved.
It can be understood by those skilled in the art that the technical solution described with reference to fig. 8 can be combined with the embodiments described with reference to fig. 1 to 7, so as to have the technical effects achieved by the embodiments described with reference to fig. 1 to 7. For details, reference may be made to the description made above with reference to fig. 1 to 7, and details thereof are not repeated herein.
An example of a voice processing method according to an embodiment of the present disclosure is described below with reference to fig. 9.
FIG. 9 shows a flow diagram of a speech processing method according to an embodiment of the present disclosure. As shown in fig. 9, the speech processing method includes the following steps S910, S920, S930, S940, and S950:
in step S910, a first sentence having a first language element is screened from the annotation-free data.
In step S920, a first mask is generated for a first sentence having a first language element, and the first language element in the first sentence is masked with the first mask to generate first training data.
In step S930, a second sentence converted from the first sentence by the speech recognition process is acquired, wherein the second sentence includes the first linguistic element and includes linguistic elements different from those in the first sentence.
In step S940, a second mask is generated for the second sentence having the first language element, and the first language element in the second sentence is masked with the second mask to generate second training data.
In step S950, third training data is generated according to the first language element, the first mask, the first training data, the second mask, and the second training data, and a preset error correction model is trained using the third training data.
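Steps S910–S940 (everything up to the training step) can be sketched end to end. The `contaminate` callable stands in for the TTS → noise → speech-recognition round trip, which needs external engines; the toy replacement used below (祖国 → 足骨) mirrors the homophone example from fig. 4. The sketch also assumes one pronoun per sentence and that the contamination preserves the pronoun.

```python
def generate_pseudo_corpus(unlabeled, contaminate, pronouns=("他", "她", "它"), mark="Ta"):
    """Build the mixed training set fed into step S950: screen sentences
    containing a pronoun (S910), then build (x, m, y) triples from both
    the clean sentence (S920) and its artificially contaminated version
    (S930/S940)."""
    def triple(sentence):
        m = [1 if c in pronouns else 0 for c in sentence]
        y = next(c for c, f in zip(sentence, m) if f)
        x = "".join(mark if f else c for c, f in zip(sentence, m))
        return x, m, y

    X = []
    for s in unlabeled:
        if not any(c in pronouns for c in s):
            continue                      # S910: screening
        X.append(triple(s))               # S920: clean training pair
        X.append(triple(contaminate(s)))  # S930/S940: contaminated pair
    return X

fake_asr = lambda s: s.replace("祖国", "足骨")  # toy stand-in for the ASR round trip
X = generate_pseudo_corpus(["他们是祖国的花朵", "天气很好"], fake_asr)
print(len(X))  # → 2 (one clean triple and one contaminated triple)
```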
According to the technical scheme provided by the embodiment of the disclosure, a first statement containing a first language element is screened from non-labeled data; generating a first mask for a first statement containing a first language element, and masking the first language element in the first statement with the first mask to generate first training data; acquiring a second sentence converted from the first sentence through voice recognition processing, wherein the second sentence contains a first language element and contains a language element different from the language element in the first sentence; generating a second mask for a second statement containing the first language element, and masking the first language element in the second statement with the second mask to generate second training data; generating third training data according to the first language element, the first mask, the first training data, the second mask and the second training data, training a preset error correction model by using the third training data, training the modeling capacity of the error correction model for the first language element through the masking operation on the first language element in the original sentence and the sentence after voice recognition processing, constructing the language error correction model under the condition that data do not need to be labeled, and improving the recognition accuracy of the first language element in the sentence after conversion processing.
In one embodiment of the present disclosure, obtaining a second sentence converted from a first sentence through a speech recognition process, wherein the second sentence contains a first linguistic element and contains a linguistic element different from the linguistic element in the first sentence, comprises: converting the first sentence into an audio signal; adding noise to the audio signal; the noise-added audio signal is converted into a second sentence, wherein the second sentence contains the first linguistic elements and contains linguistic elements different from those in the first sentence.
According to the technical solution provided by the embodiment of the present disclosure, acquiring a second sentence converted from a first sentence through speech recognition processing, wherein the second sentence includes a first language element and includes a language element different from the language element in the first sentence, includes: converting the first sentence into an audio signal; adding noise to the audio signal; the audio signal added with the noise is converted into a second sentence, wherein the second sentence comprises a first language element and a language element different from the language element in the first sentence, the modeling capability of the error correction model for the first language element can be trained through the masking operation on the first language element in the original sentence and the sentence after the voice recognition processing, and the language error correction model can be constructed under the condition that the data is not required to be labeled, so that the recognition accuracy rate of the first language element in the sentence after the conversion processing is improved.
In one embodiment of the present disclosure, the speech processing method further includes: and correcting the error of the text recognized by the voice by using the trained error correction model.
According to the technical solution provided by the embodiment of the present disclosure, the text recognized from speech is corrected using the trained error correction model. By masking the first language element in both the original sentence and the sentence produced by speech recognition processing, the modeling capability of the error correction model for the first language element can be trained, and the language error correction model can be constructed without labeling data, thereby improving the recognition accuracy of the first language element in the converted sentence.
Those skilled in the art will appreciate that speech processing techniques according to embodiments of the present disclosure are suitable for various fields such as voice messaging, real-time audio-video, voice-to-text, simultaneous interpretation, game speech processing, and the like.
It can be understood by those skilled in the art that the technical solution described with reference to fig. 9 can be combined with the embodiments described with reference to fig. 1 to 8, so as to have the technical effects achieved by the embodiments described with reference to fig. 1 to 8. For details, reference may be made to the description made above with reference to fig. 1 to 8, and details thereof are not repeated herein.
An example of a speech processing apparatus according to an embodiment of the present disclosure is described below with reference to fig. 10.
Fig. 10 shows a schematic configuration diagram of a speech processing apparatus 1000 according to an embodiment of the present disclosure. As shown in fig. 10, the speech processing apparatus 1000 includes the following filtering module 1010, first generating module 1020, obtaining module 1030, second generating module 1040, and training module 1050.
The filtering module 1010 is configured to filter out a first sentence containing a first language element from the unlabeled data.
The first generation module 1020 is configured to generate a first mask for a first statement containing a first language element and mask the first language element in the first statement with the first mask to generate first training data.
The obtaining module 1030 is configured to obtain a second sentence converted from the first sentence through the speech recognition process, wherein the second sentence contains the first language element and contains a language element different from the language element in the first sentence.
The second generation module 1040 is configured to generate a second mask for a second statement containing the first language element and mask the first language element in the second statement with the second mask to generate second training data.
The training module 1050 is configured to generate third training data according to the first language element, the first mask, the first training data, the second mask, and the second training data, and train the preset error correction model using the third training data.
According to the technical solution provided by the embodiment of the present disclosure, the screening module is configured to screen out a first sentence containing a first language element from unlabeled data; the first generation module is configured to generate a first mask for the first sentence and mask the first language element in the first sentence with the first mask to generate first training data; the acquisition module is configured to acquire a second sentence converted from the first sentence by speech recognition processing, wherein the second sentence contains the first language element and contains a language element different from a language element in the first sentence; the second generation module is configured to generate a second mask for the second sentence and mask the first language element in the second sentence with the second mask to generate second training data; and the training module is configured to generate third training data from the first language element, the first mask, the first training data, the second mask, and the second training data, and to train a preset error correction model using the third training data. By masking the first language element in both the original sentence and the sentence produced by speech recognition processing, the modeling capability of the error correction model for the first language element is trained, the language error correction model is constructed without labeling data, and the recognition accuracy of the first language element in the converted sentence is improved.
It can be understood by those skilled in the art that the technical solution described with reference to fig. 10 can be combined with the embodiments described with reference to fig. 1 to 9, so as to have the technical effects achieved by the embodiments described with reference to fig. 1 to 9. For details, reference may be made to the description made above with reference to fig. 1 to 9, and details thereof are not repeated herein.
The foregoing embodiments describe the internal functions and structures of the text processing apparatus, the data processing apparatus, and the speech processing apparatus, and in one possible design, the structures of the text processing apparatus, the data processing apparatus, and the speech processing apparatus may be implemented as an electronic device, such as shown in fig. 11, and the electronic device 1100 may include a processor 1101 and a memory 1102.
The memory 1102 is used for storing a program that supports the electronic device in executing the text processing method, the data processing method, or the voice processing method in any of the above embodiments, and the processor 1101 is configured to execute the program stored in the memory 1102.
In one embodiment of the present disclosure, the memory 1102 is configured to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor 1101 to implement the steps of:
generating a first mask for the text to be corrected containing the first language element according to the text to be corrected containing the first language element, and masking the first language element in the text to be corrected by using the first mask to generate a replacement text;
inputting the first mask and the replacement text into an error correction model to predict a first language element vector corresponding to the first language element;
generating a target language element according to the predicted first language element vector;
replacing a first language element in the text to be corrected with the generated target language element to obtain a corrected text.
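The four inference steps listed above can be sketched as follows; `toy_model` is a hypothetical stand-in for the trained error correction model, returning a fixed score vector instead of a learned prediction, and the vocabulary is an illustrative assumption:

```python
VOCAB = ["的", "地", "得"]

def toy_model(masked_tokens, mask_index):
    """Hypothetical stand-in for the error correction model: returns a
    score vector over VOCAB for the masked position."""
    return [0.1, 0.8, 0.1]  # always favors "地" in this sketch

def correct(tokens, element, mask_token="[MASK]"):
    mask = [1 if t == element else 0 for t in tokens]   # step 1: mask, replacement text
    idx = mask.index(1)
    masked = [mask_token if m else t for m, t in zip(mask, tokens)]
    vector = toy_model(masked, idx)                     # step 2: predict element vector
    target = VOCAB[vector.index(max(vector))]           # step 3: generate target element
    return tokens[:idx] + [target] + tokens[idx + 1:]   # step 4: replace in the text
```

A trained model would score the masked slot from its bidirectional context rather than return a constant; the control flow around it is the same.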
In one embodiment of the present disclosure, the one or more computer instructions are further executable by the processor 1101 to perform the steps of:
and screening out sentences containing the first language elements from the non-labeled data so as to train the error correction model.
In one embodiment of the disclosure, the screening out the text containing the first language element from the label-free data to train the error correction model includes:
generating a second mask for a first statement containing a first language element, and masking the first language element in the first statement with the second mask to generate first training data;
acquiring a second sentence converted from the first sentence through preset conversion processing, wherein the second sentence comprises the first language element and comprises a language element different from the language element in the first sentence;
generating a third mask for the second statement and masking the first language element in the second statement with the third mask to generate second training data;
generating third training data from the first language element, the second mask, the first training data, the third mask, and the second training data, and training an error correction model using the third training data.
In one embodiment of the disclosure, the first mask is an array including numbers corresponding to language elements in the text to be corrected in a one-to-one manner, wherein the numbers corresponding to language elements other than the first language element in the text to be corrected are identical to each other and different from the numbers corresponding to the first language element; the second mask is an array including numbers in one-to-one correspondence with language elements in the first sentence, wherein the numbers corresponding to language elements other than the first language element in the first sentence are identical to each other and different from the numbers corresponding to the first language element; the third mask is an array including numbers in one-to-one correspondence with language elements in the second sentence, wherein the numbers corresponding to language elements other than the first language element in the second sentence are identical to each other and different from the numbers corresponding to the first language element.
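Under these definitions, each mask is simply an integer array aligned one-to-one with the sentence. A minimal sketch follows; the values 0 and 1 are illustrative choices, since the embodiment only requires the two numbers to differ:

```python
def make_mask(tokens, element, other=0, target=1):
    """One number per language element: all non-target elements share one
    number (`other`), while the first language element gets a distinct
    number (`target`)."""
    return [target if t == element else other for t in tokens]
```

The same helper covers the first, second, and third masks of this embodiment, since they differ only in which sentence or text they are aligned with.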
In an embodiment of the present disclosure, the generating a first mask for the text to be corrected containing the first language element according to the text to be corrected including the first language element, and masking the first language element in the text to be corrected with the first mask to generate a replacement text includes:
masking the first language element with a number in the first mask corresponding to the first language element, wherein the resulting replacement text differs from the text to be corrected in that the first language element is replaced with a particular flag.
In an embodiment of the present disclosure, the obtaining a second sentence converted from the first sentence through a preset conversion process, wherein the second sentence includes the first language element and includes a language element different from the language element in the first sentence, includes:
converting the first sentence into an audio signal;
adding noise to the audio signal;
the noise-added audio signal is converted into a second sentence, wherein the second sentence contains the first linguistic element and contains linguistic elements different from those in the first sentence.
In one embodiment of the disclosure, the generating third training data according to the first language element, the second mask, the first training data, the third mask, and the second training data, and training an error correction model using the third training data includes:
performing data mixing on a first language element vector corresponding to the first language element, the second mask, the first training data, the third mask, and the second training data to generate third training data;
training the error correction model using the third training data.
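The data mixing step can be sketched as packing the five inputs into a single training example; the dictionary layout below is an assumption for illustration, not the disclosed format:

```python
def mix(element_vector, first_mask, first_data, second_mask, second_data):
    """Pack the five inputs into one training example; the clean and
    ASR-converted views share the element vector as the prediction target."""
    return {
        "target": element_vector,
        "clean": {"mask": first_mask, "tokens": first_data},
        "asr": {"mask": second_mask, "tokens": second_data},
    }

# Hypothetical example: a clean sentence and its noisier ASR counterpart.
example = mix(
    element_vector=[1.0, 0.0],
    first_mask=[0, 1, 0],
    first_data=["打", "[MASK]", "车"],
    second_mask=[0, 1, 0, 0],
    second_data=["打", "[MASK]", "车", "了"],
)
```

Training then asks the model to recover the element vector from either view, which is what teaches it to be robust to recognition noise around the first language element.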
In one embodiment of the present disclosure, the error correction model is a Bi-LSTM sequence labeling model.
In one embodiment of the present disclosure, the first language element includes at least one language element.
In one embodiment of the disclosure, the screening out of the sentences containing the first language element from the unlabeled data to train the error correction model includes:
and eliminating sentences containing the first language elements meeting preset conditions from the screened sentences containing the first language elements.
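The disclosure leaves the preset condition open; as one hypothetical example of such a condition, screened sentences too short to supply useful context around the first language element could be eliminated:

```python
def exclude_by_condition(sentences, min_len=4):
    """Eliminate screened sentences meeting an (illustrative) preset
    condition: here, having fewer than `min_len` language elements."""
    return [s for s in sentences if len(s) >= min_len]

kept = exclude_by_condition([["打", "的"], ["打", "的", "去", "机场"]])
```

Other plausible conditions, equally unconfirmed by the text, include duplicate sentences or sentences where the element appears many times.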
In one embodiment of the present disclosure, the memory 1102 is configured to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor 1101 to implement the steps of:
generating a first mask for a first statement containing a first language element, and masking the first language element in the first statement with the first mask to generate first training data;
acquiring a second sentence converted from the first sentence through preset conversion processing, wherein the second sentence comprises a first language element and a language element different from the language element in the first sentence;
generating a second mask for the second statement containing the first language element and masking the first language element in the second statement with the second mask to generate second training data;
generating third training data according to the first language element, the first mask, the first training data, the second mask and the second training data, and training a preset error correction model by using the third training data.
In one embodiment of the present disclosure, the first mask is an array including numbers in one-to-one correspondence with language elements in the first sentence, wherein the numbers corresponding to language elements other than the first language element in the first sentence are identical to each other and different from the number corresponding to the first language element; the second mask is an array including numbers in one-to-one correspondence with language elements in the second sentence, wherein the numbers corresponding to language elements other than the first language element in the second sentence are identical to each other and different from the numbers corresponding to the first language element.
In one embodiment of the disclosure, the generating a first mask for a first statement containing a first language element and masking the first language element in the first statement with the first mask to generate first training data includes:
masking the first language element with a number in the first mask that corresponds to the first language element, wherein the resulting first training data differs from the first sentence in that the first language element is replaced with a particular flag.
In one embodiment of the present disclosure, the obtaining a second sentence converted from the first sentence through a preset conversion process, wherein the second sentence includes a first language element and includes a language element different from the language element in the first sentence, includes:
converting the first sentence into an audio signal;
adding noise to the audio signal;
the noise-added audio signal is converted into a second sentence, wherein the second sentence contains the first linguistic elements and contains linguistic elements different from those in the first sentence.
In one embodiment of the disclosure, the generating a second mask for the second statement containing the first language element and masking the first language element in the second statement with the second mask to generate second training data includes:
masking the first language element with a number in the second mask that corresponds to the first language element, wherein the resulting second training data differs from the second sentence in that the first language element is replaced with a particular flag.
In an embodiment of the disclosure, the generating third training data according to the first language element, the first mask, the first training data, the second mask, and the second training data, and training a preset error correction model using the third training data includes:
performing data mixing on a first language element vector corresponding to the first language element, the first mask, the first training data, the second mask, and the second training data to generate third training data;
and training the preset error correction model by using the third training data.
In an embodiment of the present disclosure, the preset error correction model is a Bi-LSTM sequence labeling model.
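A full Bi-LSTM requires a deep learning framework; as a framework-free illustration of the bidirectional sequence labeling idea only, the sketch below substitutes a plain tanh cell for the LSTM cell, an acknowledged simplification with made-up weights:

```python
import math

def rnn_pass(xs, w=1.0, reverse=False):
    """One direction of a recurrent pass over scalar features
    (a simple tanh cell standing in for an LSTM cell)."""
    seq = list(reversed(xs)) if reverse else list(xs)
    h, hs = 0.0, []
    for x in seq:
        h = math.tanh(w * x + 0.5 * h)  # recurrence over the sequence
        hs.append(h)
    return list(reversed(hs)) if reverse else hs

def bi_rnn_tags(xs):
    """Bidirectional pass, then one label per position: the shape of a
    Bi-LSTM sequence labeling model, minus the learned parameters."""
    fwd = rnn_pass(xs)
    bwd = rnn_pass(xs, reverse=True)
    return [1 if f + b > 0 else 0 for f, b in zip(fwd, bwd)]
```

The essential property carried over from the real model is that each position's label depends on both left and right context, which is what lets a masked element be reconstructed from its surroundings.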
In one embodiment of the present disclosure, the one or more computer instructions are further executable by the processor 1101 to perform the steps of:
and correcting the error of the text to be corrected by using the trained error correction model.
In an embodiment of the present disclosure, the performing error correction on the text to be corrected by using the trained error correction model includes:
detecting whether the text to be corrected comprises the first language element;
and outputting the text to be corrected unchanged in a case where the text to be corrected does not comprise the first language element.
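The detection-and-passthrough behavior above amounts to a guard before the correction step; a minimal sketch with a hypothetical `corrector` callback standing in for the trained model:

```python
def maybe_correct(tokens, element, corrector):
    """Detect whether the first language element is present; if not,
    output the text to be corrected unchanged."""
    if element not in tokens:
        return list(tokens)
    return corrector(tokens)

# Text without the element passes through untouched.
passthrough = maybe_correct(["你", "好"], "的", corrector=lambda t: t + ["!"])
```

Only texts that actually contain the suspect element pay the cost of running the error correction model.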
In an embodiment of the present disclosure, the performing error correction on the text to be corrected by using the trained error correction model further includes:
generating a third mask for the text to be corrected containing the first language element according to the text to be corrected containing the first language element, and masking the first language element in the text to be corrected by using the third mask to generate fourth training data;
inputting the third mask and the fourth training data into a trained error correction model to predict a first linguistic element vector corresponding to the first linguistic element;
generating a target language element according to the predicted first language element vector;
replacing a first language element in the text to be corrected with the generated target language element to obtain a corrected text.
In one embodiment of the present disclosure, the first language element includes at least one language element.
In one embodiment of the disclosure, prior to generating a first mask for a first statement containing a first language element and masking the first language element in the first statement with the first mask to generate first training data, the one or more computer instructions are executed by the processor 1101 to:
a first sentence containing a first language element is screened from the unlabeled data.
In one embodiment of the present disclosure, the one or more computer instructions are further executable by the processor 1101 to perform the steps of:
and excluding the first sentences containing the first language elements meeting the preset conditions from the screened first sentences.
In one embodiment of the present disclosure, the memory 1102 is configured to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor 1101 to implement the steps of:
screening out a first sentence containing a first language element from the non-labeled data;
generating a first mask for a first statement containing a first language element, and masking the first language element in the first statement with the first mask to generate first training data;
acquiring a second sentence converted from the first sentence through voice recognition processing, wherein the second sentence comprises a first language element and a language element different from the language element in the first sentence;
generating a second mask for the second statement containing the first language element and masking the first language element in the second statement with the second mask to generate second training data;
generating third training data according to the first language element, the first mask, the first training data, the second mask and the second training data, and training a preset error correction model by using the third training data.
In one embodiment of the present disclosure, the obtaining a second sentence converted from the first sentence through a speech recognition process, wherein the second sentence contains a first linguistic element and contains a linguistic element different from that in the first sentence, comprises:
converting the first sentence into an audio signal;
adding noise to the audio signal;
the noise-added audio signal is converted into a second sentence, wherein the second sentence contains the first linguistic elements and contains linguistic elements different from those in the first sentence.
In one embodiment of the present disclosure, the one or more computer instructions are further executable by the processor 1101 to perform the steps of:
and correcting the error of the text recognized by the voice by using the trained error correction model.
The exemplary embodiments of the present disclosure also provide a computer storage medium for storing computer software instructions for the above electronic device, the instructions containing a program for executing the method in any of the above embodiments, thereby achieving the technical effects brought by the method.
FIG. 12 is a schematic block diagram of a computer device suitable for use in implementing an embodiment according to the present disclosure.
As shown in fig. 12, the computer apparatus 1200 includes a processor (CPU, GPU, FPGA, etc.) 1201 that can execute the various processes of the embodiments shown in the above drawings according to a program stored in a Read Only Memory (ROM) 1202 or a program loaded from a storage section 1208 into a Random Access Memory (RAM) 1203. The RAM 1203 also stores the various programs and data necessary for the operation of the apparatus 1200. The processor 1201, the ROM 1202, and the RAM 1203 are connected to each other by a bus 1204. An input/output (I/O) interface 1205 is also connected to the bus 1204.
The following components are connected to the I/O interface 1205: an input section 1206 including a keyboard, a mouse, and the like; an output section 1207 including a display device such as a Cathode Ray Tube (CRT) or Liquid Crystal Display (LCD), and a speaker; a storage section 1208 including a hard disk and the like; and a communication section 1209 including a network interface card such as a LAN card or a modem. The communication section 1209 performs communication processing via a network such as the Internet. A drive 1210 is also connected to the I/O interface 1205 as needed. A removable medium 1211, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1210 as necessary, so that a computer program read out therefrom is installed into the storage section 1208 as necessary.
In particular, according to embodiments of the present disclosure, the methods described above with reference to the figures may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program tangibly embodied on a computer-readable medium, the computer program containing program code for performing the methods illustrated in the figures. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 1209, and/or installed from the removable medium 1211.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of apparatus, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowcharts or block diagrams may represent a module, a program segment, or a portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based devices that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units or modules described in the embodiments of the present disclosure may be implemented by software or hardware. The units or modules described may also be provided in a processor, and the names of the units or modules do not in some cases constitute a limitation of the units or modules themselves.
As another aspect, the present disclosure also provides a computer-readable storage medium, which may be the computer-readable storage medium included in the apparatus in the above-described embodiment; or it may be a separate computer readable storage medium not incorporated into the device. The computer-readable storage medium stores one or more programs which are used by one or more processors to perform the methods described in the present disclosure, thereby providing technical effects brought by the methods.
The foregoing description covers only the preferred embodiments of the present disclosure and illustrates the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the present disclosure is not limited to technical solutions formed by the specific combination of the above features, and also encompasses other technical solutions formed by any combination of the above features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with features disclosed in (but not limited to) the present disclosure that have similar functions.
Claims (25)
1. A method of text processing, comprising:
generating a first mask for the text to be corrected containing a first language element according to the text to be corrected comprising the first language element, and masking the first language element in the text to be corrected by using the first mask to generate a replacement text;
inputting the first mask and the replacement text into an error correction model to predict a first language element vector corresponding to the first language element;
generating a target language element according to the predicted first language element vector;
replacing a first language element in the text to be corrected with the generated target language element to obtain a corrected text,
selecting sentences containing the first language element from the non-labeled data to train the error correction model,
the screening out the sentences containing the first language elements from the non-labeled data to train the error correction model comprises the following steps:
generating a second mask for a first statement containing a first language element, and masking the first language element in the first statement with the second mask to generate first training data;
acquiring a second sentence converted from the first sentence through preset conversion processing, wherein the second sentence comprises the first language element and comprises a language element different from the language element in the first sentence;
generating a third mask for the second statement and masking the first language element in the second statement with the third mask to generate second training data;
data mixing according to the first language element, the second mask, the first training data, the third mask, and the second training data to generate third training data, and training an error correction model using the third training data,
the first mask is an array comprising numbers corresponding to language elements in the text to be corrected in a one-to-one mode, wherein the numbers corresponding to the language elements except the first language element in the text to be corrected are identical to each other and different from the numbers corresponding to the first language element; the second mask is an array including numbers in one-to-one correspondence with language elements in the first sentence, wherein the numbers corresponding to language elements other than the first language element in the first sentence are identical to each other and different from the numbers corresponding to the first language element,
the third mask is an array including numbers in one-to-one correspondence with language elements in the second sentence, wherein the numbers corresponding to language elements other than the first language element in the second sentence are identical to each other and different from the numbers corresponding to the first language element.
2. The method according to claim 1, wherein the generating a first mask for the text to be corrected containing the first language element according to the text to be corrected containing the first language element, and masking the first language element in the text to be corrected with the first mask to generate a replacement text comprises:
masking the first language element with a number in the first mask corresponding to the first language element, wherein the resulting replacement text differs from the text to be corrected in that the first language element is replaced with a particular flag.
3. The method according to claim 1, wherein the obtaining a second sentence converted from the first sentence by a preset conversion process, wherein the second sentence contains the first language element and contains a language element different from the language element in the first sentence, comprises:
converting the first sentence into an audio signal;
adding noise to the audio signal;
the noise-added audio signal is converted into a second sentence, wherein the second sentence contains the first linguistic element and contains linguistic elements different from those in the first sentence.
4. The method of claim 1, wherein the error correction model is a Bi-LSTM sequence labeling model.
5. The method of claim 1, wherein the first linguistic element includes at least one linguistic element.
6. A text processing apparatus, comprising:
the system comprises a masking module, a correction module and a correction module, wherein the masking module is configured to generate a first mask for a text to be corrected containing a first language element according to the text to be corrected comprising the first language element, and mask the first language element in the text to be corrected by using the first mask to generate a replacement text;
a first generation module configured to input the first mask and the replacement text into an error correction model to predict a first language element vector corresponding to the first language element;
a second generation module configured to generate a target language element from the predicted first language element vector;
a correction module configured to replace a first language element in the text to be corrected with the generated target language element to obtain a corrected text,
a training module configured to screen out sentences containing the first language element from the unlabeled data to train the error correction model,
the screening out the sentences containing the first language elements from the non-labeled data to train the error correction model comprises the following steps:
generating a second mask for a first sentence containing the first language element, and masking the first language element in the first sentence with the second mask to generate first training data;
acquiring a second sentence converted from the first sentence through a preset conversion process, wherein the second sentence contains the first language element and also contains language elements different from those in the first sentence;
generating a third mask for the second sentence, and masking the first language element in the second sentence with the third mask to generate second training data;
generating third training data from the first language element, the second mask, the first training data, the third mask, and the second training data, and training the error correction model using the third training data,
wherein the first mask is an array including numbers in one-to-one correspondence with language elements in the text to be corrected, wherein the numbers corresponding to language elements other than the first language element in the text to be corrected are identical to each other and different from the numbers corresponding to the first language element; the second mask is an array including numbers in one-to-one correspondence with language elements in the first sentence, wherein the numbers corresponding to language elements other than the first language element in the first sentence are identical to each other and different from the numbers corresponding to the first language element,
the third mask is an array including numbers in one-to-one correspondence with language elements in the second sentence, wherein the numbers corresponding to language elements other than the first language element in the second sentence are identical to each other and different from the numbers corresponding to the first language element.
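The mask arrays recited above (one number per language element, a shared number for non-target elements and a distinct number for the target element) can be illustrated with a short sketch. This is an interpretive example, not the patented implementation; word-level tokens and the values 0/1 are assumptions:

```python
def build_mask(tokens, target, other_num=0, target_num=1):
    """Return an array with one number per language element:
    every element other than the target gets the same number,
    and the target element gets a distinct number."""
    return [target_num if tok == target else other_num for tok in tokens]

# "foot" plays the role of the first language element here.
tokens = ["the", "foot", "bone", "is", "connected"]
print(build_mask(tokens, "foot"))  # [0, 1, 0, 0, 0]
```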
7. A data processing method, comprising:
generating a first mask for a first sentence containing a first language element, and masking the first language element in the first sentence with the first mask to generate first training data;
acquiring a second sentence converted from the first sentence through a preset conversion process, wherein the second sentence contains the first language element and also contains language elements different from those in the first sentence;
generating a second mask for the second sentence containing the first language element, and masking the first language element in the second sentence with the second mask to generate second training data; and
performing data mixing according to the first language element, the first mask, the first training data, the second mask, and the second training data to generate third training data, and training a preset error correction model using the third training data,
wherein the first mask is an array including numbers in one-to-one correspondence with language elements in a text to be corrected, wherein the numbers corresponding to language elements other than the first language element in the text to be corrected are identical to each other and different from the numbers corresponding to the first language element; and the second mask is an array including numbers in one-to-one correspondence with language elements in the first sentence, wherein the numbers corresponding to language elements other than the first language element in the first sentence are identical to each other and different from the numbers corresponding to the first language element.
8. The method of claim 7, wherein the first mask is an array comprising numbers in one-to-one correspondence with language elements in the first sentence, wherein the numbers corresponding to language elements other than the first language element in the first sentence are identical to each other and different from the numbers corresponding to the first language element; the second mask is an array including numbers in one-to-one correspondence with language elements in the second sentence, wherein the numbers corresponding to language elements other than the first language element in the second sentence are identical to each other and different from the numbers corresponding to the first language element.
9. The method of claim 8, wherein generating a first mask for a first sentence containing a first language element and masking the first language element in the first sentence with the first mask to generate first training data comprises:
masking the first language element with the number in the first mask that corresponds to the first language element, wherein the resulting first training data differs from the first sentence only in that the first language element is replaced with a particular flag.
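The masking step of claim 9 — replacing the element whose mask number is distinct with a particular flag — can be sketched as follows (the `[MASK]` token and word-level tokens are illustrative assumptions, not the flag fixed by the patent):

```python
MASK_TOKEN = "[MASK]"  # stand-in for the "particular flag" in the claim

def apply_mask(tokens, mask, target_num=1):
    """Produce training data that differs from the sentence only in
    that elements carrying the target number are replaced by the flag."""
    return [MASK_TOKEN if m == target_num else tok for tok, m in zip(tokens, mask)]

tokens = ["please", "check", "the", "flight", "number"]
mask = [0, 0, 0, 1, 0]  # "flight" is the first language element
print(apply_mask(tokens, mask))  # ['please', 'check', 'the', '[MASK]', 'number']
```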
10. The method according to any one of claims 7 to 9, wherein acquiring the second sentence converted from the first sentence through the preset conversion process, wherein the second sentence contains the first language element and contains language elements different from those in the first sentence, comprises:
converting the first sentence into an audio signal;
adding noise to the audio signal;
converting the noise-added audio signal into the second sentence, wherein the second sentence contains the first language element and contains language elements different from those in the first sentence.
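The synthesize-add-noise-recognize round trip of claim 10 yields a second sentence that keeps the target element while perturbing others. A real pipeline would call a speech synthesizer and a recognizer; the sketch below only simulates the net effect with a homophone-style confusion table (the function name, the confusion table, and the 50% perturbation rate are all illustrative assumptions):

```python
import random

def simulated_roundtrip(tokens, target, confusions, seed=0):
    """Simulate sentence -> audio -> noisy audio -> recognized sentence.
    Non-target elements listed in `confusions` may be swapped for a
    plausible misrecognition; the target element is kept intact, so the
    output still contains the first language element."""
    rng = random.Random(seed)
    out = []
    for tok in tokens:
        if tok != target and tok in confusions and rng.random() < 0.5:
            out.append(rng.choice(confusions[tok]))
        else:
            out.append(tok)
    return out
```

A pair (first sentence, simulated second sentence) then supplies the clean and noisy views of the same target element for training.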
11. The method of claim 10, wherein generating the second mask for the second sentence containing the first language element and masking the first language element in the second sentence with the second mask to generate second training data comprises:
masking the first language element with the number in the second mask that corresponds to the first language element, wherein the resulting second training data differs from the second sentence only in that the first language element is replaced with a particular flag.
12. The method of claim 11, wherein generating third training data based on the first language element, the first mask, the first training data, the second mask, and the second training data, and training a preset error correction model using the third training data comprises:
performing data mixing on a first language element vector corresponding to the first language element, the first mask, the first training data, the second mask, and the second training data to generate third training data;
and training the preset error correction model by using the third training data.
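The data mixing of claim 12 pairs the clean and noisy masked versions of a sentence with the label vector for the masked element. The record layout below is an assumption for illustration — the claims do not fix a concrete format:

```python
def mix_training_example(element_vector, first_mask, first_data, second_mask, second_data):
    """Bundle one supervised example: the model is trained to predict
    the element vector at the masked position, given the masked clean
    and masked noisy views of the same sentence."""
    return {
        "label_vector": element_vector,
        "clean": {"mask": first_mask, "tokens": first_data},
        "noisy": {"mask": second_mask, "tokens": second_data},
    }

example = mix_training_example(
    [0.12, 0.87, 0.03],           # hypothetical embedding of the target element
    [0, 1, 0], ["book", "[MASK]", "now"],
    [0, 1, 0], ["look", "[MASK]", "now"],
)
```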
13. The method of claim 12, wherein the predetermined error correction model is a Bi-LSTM sequence labeling model.
14. The method of claim 7, further comprising:
error correcting the text to be corrected using the trained error correction model.
15. The method of claim 14, wherein error correcting the text to be corrected using the trained error correction model comprises:
detecting whether the text to be corrected includes the first language element; and
outputting the text to be corrected without modification when the text to be corrected does not include the first language element.
16. The method of claim 15, wherein error correcting the text to be corrected using the trained error correction model further comprises:
generating a third mask for the text to be corrected that contains the first language element, and masking the first language element in the text to be corrected with the third mask to generate fourth training data;
inputting the third mask and the fourth training data into the trained error correction model to predict a first language element vector corresponding to the first language element;
generating a target language element from the predicted first language element vector; and
replacing the first language element in the text to be corrected with the generated target language element to obtain a corrected text.
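The inference flow of claim 16 — mask the suspect element, predict an element vector, decode it, splice it back — can be sketched with the model abstracted behind two callables. `predict_vector` and `decode_element` are hypothetical stand-ins for the trained error correction model, not APIs from the patent:

```python
MASK_TOKEN = "[MASK]"

def correct_text(tokens, suspect, predict_vector, decode_element):
    """Replace the suspect element with the flag, query the model for
    the element vector at that position, and substitute the decoded
    target element back into the text."""
    mask = [1 if t == suspect else 0 for t in tokens]
    masked = [MASK_TOKEN if m else t for t, m in zip(tokens, mask)]
    vector = predict_vector(mask, masked)
    target = decode_element(vector)
    return [target if m else t for t, m in zip(tokens, mask)]

# Toy stubs: the "model" always decodes to "flight".
result = correct_text(["book", "a", "fright", "now"], "fright",
                      lambda mask, masked: [0.9],
                      lambda vec: "flight")
print(result)  # ['book', 'a', 'flight', 'now']
```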
17. The method of claim 7, wherein the first language element includes at least one language element.
18. The method of claim 7, further comprising, before generating the first mask for the first sentence containing the first language element and masking the first language element in the first sentence with the first mask to generate first training data:
screening out the first sentence containing the first language element from unlabeled data.
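Screening the unlabeled corpus down to sentences that contain the first language element is a plain filter. A minimal sketch (word-level matching is an assumption — the element could equally be matched at character or subword level):

```python
def screen_sentences(corpus, target):
    """Keep only sentences whose elements include the target."""
    return [sentence for sentence in corpus if target in sentence.split()]

corpus = ["the flight is late", "no relevant words here", "flight delayed again"]
print(screen_sentences(corpus, "flight"))
# ['the flight is late', 'flight delayed again']
```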
19. A data processing apparatus, comprising:
a first generation module configured to generate a first mask for a first sentence containing a first language element, and to mask the first language element in the first sentence with the first mask to generate first training data;
an acquisition module configured to acquire a second sentence converted from the first sentence through a preset conversion process, wherein the second sentence contains the first language element and also contains language elements different from those in the first sentence;
a second generation module configured to generate a second mask for the second sentence containing the first language element, and to mask the first language element in the second sentence with the second mask to generate second training data; and
a training module configured to perform data mixing according to the first language element, the first mask, the first training data, the second mask, and the second training data to generate third training data, and to train a preset error correction model using the third training data,
wherein the first mask is an array including numbers in one-to-one correspondence with language elements in a text to be corrected, wherein the numbers corresponding to language elements other than the first language element in the text to be corrected are identical to each other and different from the numbers corresponding to the first language element; and the second mask is an array including numbers in one-to-one correspondence with language elements in the first sentence, wherein the numbers corresponding to language elements other than the first language element in the first sentence are identical to each other and different from the numbers corresponding to the first language element.
20. A method of speech processing, comprising:
screening out a first sentence containing a first language element from unlabeled data;
generating a first mask for the first sentence containing the first language element, and masking the first language element in the first sentence with the first mask to generate first training data;
acquiring a second sentence converted from the first sentence through speech recognition processing, wherein the second sentence contains the first language element and also contains language elements different from those in the first sentence;
generating a second mask for the second sentence containing the first language element, and masking the first language element in the second sentence with the second mask to generate second training data; and
performing data mixing according to the first language element, the first mask, the first training data, the second mask, and the second training data to generate third training data, and training a preset error correction model using the third training data,
wherein the first mask is an array including numbers in one-to-one correspondence with language elements in a text to be corrected, wherein the numbers corresponding to language elements other than the first language element in the text to be corrected are identical to each other and different from the numbers corresponding to the first language element; and the second mask is an array including numbers in one-to-one correspondence with language elements in the first sentence, wherein the numbers corresponding to language elements other than the first language element in the first sentence are identical to each other and different from the numbers corresponding to the first language element.
21. The method of claim 20, wherein acquiring the second sentence converted from the first sentence through the speech recognition processing, wherein the second sentence contains the first language element and contains language elements different from those in the first sentence, comprises:
converting the first sentence into an audio signal;
adding noise to the audio signal;
converting the noise-added audio signal into the second sentence, wherein the second sentence contains the first language element and contains language elements different from those in the first sentence.
22. The method of claim 20, further comprising:
error correcting text recognized from speech using the trained error correction model.
23. A speech processing apparatus, comprising:
a screening module configured to screen out a first sentence containing a first language element from unlabeled data;
a first generation module configured to generate a first mask for the first sentence containing the first language element, and to mask the first language element in the first sentence with the first mask to generate first training data;
an acquisition module configured to acquire a second sentence converted from the first sentence through speech recognition processing, wherein the second sentence contains the first language element and also contains language elements different from those in the first sentence;
a second generation module configured to generate a second mask for the second sentence containing the first language element, and to mask the first language element in the second sentence with the second mask to generate second training data; and
a training module configured to perform data mixing according to the first language element, the first mask, the first training data, the second mask, and the second training data to generate third training data, and to train a preset error correction model using the third training data,
wherein the first mask is an array including numbers in one-to-one correspondence with language elements in a text to be corrected, wherein the numbers corresponding to language elements other than the first language element in the text to be corrected are identical to each other and different from the numbers corresponding to the first language element; and the second mask is an array including numbers in one-to-one correspondence with language elements in the first sentence, wherein the numbers corresponding to language elements other than the first language element in the first sentence are identical to each other and different from the numbers corresponding to the first language element.
24. An electronic device comprising a memory and a processor; wherein,
the memory is configured to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method of any one of claims 1-5, 7-18, and 20-22.
25. A readable storage medium having stored thereon computer instructions, which when executed by a processor, implement the method of any one of claims 1-5, 7-18, 20-22.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010092098.3A CN113270088B (en) | 2020-02-14 | 2020-02-14 | Text processing method, data processing method, voice processing method, data processing device, voice processing device and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113270088A CN113270088A (en) | 2021-08-17 |
CN113270088B true CN113270088B (en) | 2022-04-29 |
Family
ID=77227255
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010092098.3A Active CN113270088B (en) | 2020-02-14 | 2020-02-14 | Text processing method, data processing method, voice processing method, data processing device, voice processing device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113270088B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS56147270A (en) * | 1980-03-25 | 1981-11-16 | Ibm | Automatic spelling correcting system |
WO2006035402A1 (en) * | 2004-09-30 | 2006-04-06 | Koninklijke Philips Electronics N.V. | Automatic text correction |
CN105190650A (en) * | 2012-12-19 | 2015-12-23 | 电装波动株式会社 | Information code, information code generation method, information code reading device, and information code application system |
JP2017208655A (en) * | 2016-05-17 | 2017-11-24 | 京セラドキュメントソリューションズ株式会社 | Information processing system, information processing method and program |
CN107451106A (en) * | 2017-07-26 | 2017-12-08 | 阿里巴巴集团控股有限公司 | Text method and device for correcting, electronic equipment |
CN110196894A (en) * | 2019-05-30 | 2019-09-03 | 北京百度网讯科技有限公司 | The training method and prediction technique of language model |
CN110765763A (en) * | 2019-09-24 | 2020-02-07 | 金蝶软件(中国)有限公司 | Error correction method and device for speech recognition text, computer equipment and storage medium |
CN110765996A (en) * | 2019-10-21 | 2020-02-07 | 北京百度网讯科技有限公司 | Text information processing method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108417210B (en) | Word embedding language model training method, word recognition method and system | |
CN110134968B (en) | Poem generation method, device, equipment and storage medium based on deep learning | |
CN112528637B (en) | Text processing model training method, device, computer equipment and storage medium | |
CN115795009A (en) | Cross-language question-answering system construction method and device based on generating type multi-language model | |
CN112070114B (en) | Scene character recognition method and system based on Gaussian constraint attention mechanism network | |
CN110148400A (en) | The pronunciation recognition methods of type, the training method of model, device and equipment | |
CN110866510A (en) | Video description system and method based on key frame detection | |
CN116663530B (en) | Data generation method, device, electronic equipment and storage medium | |
JP7287062B2 (en) | Translation method, translation program and learning method | |
CN112270917B (en) | Speech synthesis method, device, electronic equipment and readable storage medium | |
CN112349294B (en) | Voice processing method and device, computer readable medium and electronic equipment | |
CN112800177A (en) | FAQ knowledge base automatic generation method and device based on complex data types | |
CN115658898A (en) | Chinese and English book entity relation extraction method, system and equipment | |
CN108920560B (en) | Generation method, training method, device, computer readable medium and electronic equipment | |
CN115455946A (en) | Voice recognition error correction method and device, electronic equipment and storage medium | |
CN113270088B (en) | Text processing method, data processing method, voice processing method, data processing device, voice processing device and electronic equipment | |
KR102395702B1 (en) | Method for providing english education service using step-by-step expanding sentence structure unit | |
CN117672218A (en) | Speech recognition method based on multi-head attention mechanism and time delay neural network | |
CN116909435A (en) | Data processing method and device, electronic equipment and storage medium | |
CN116844522A (en) | Phonetic boundary label marking method and speech synthesis method | |
CN111310847B (en) | Method and device for training element classification model | |
CN110147556B (en) | Construction method of multidirectional neural network translation system | |
CN115688792A (en) | Problem generation method and device based on document and server | |
CN114625759A (en) | Model training method, intelligent question answering method, device, medium, and program product | |
CN113609157A (en) | Language conversion model training method, language conversion device, language conversion equipment and medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||