CN111709234B - Training method and device for text processing model and electronic equipment

Info

Publication number: CN111709234B (granted); earlier published as CN111709234A
Application number: CN202010465386.9A
Authority: CN (China)
Other languages: Chinese (zh)
Prior art keywords: sentence, sentences, primitive, original, processing model
Inventors: 陈亮宇, 刘家辰, 肖欣延
Assignee: Beijing Baidu Netcom Science and Technology Co Ltd
Legal status: Active
Events: application filed by Beijing Baidu Netcom Science and Technology Co Ltd with priority to CN202010465386.9A; publication of CN111709234A; application granted; publication of CN111709234B


Classifications

    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking (natural language analysis; recognition of textual entities)
    • G06F16/355 Class or cluster creation or modification (information retrieval of unstructured textual data; clustering; classification)
    • G06F18/214 Generating training patterns; bootstrap methods, e.g. bagging or boosting (pattern recognition)
    • G06F18/24 Classification techniques (pattern recognition)
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars (natural language analysis)
    • G06F40/242 Dictionaries (natural language analysis; lexical tools)
    • G06F40/247 Thesauruses; synonyms (natural language analysis; lexical tools)

Abstract

The application discloses a training method and device for a text processing model, and an electronic device, relating to the technical field of natural language processing. The specific implementation scheme is as follows: acquiring an original sentence set, wherein the original sentence set comprises a plurality of original sentences; performing word segmentation on each original sentence to determine the terms contained in each original sentence; replacing at least one of the terms contained in each original sentence with a synonym to generate a plurality of replacement sentences corresponding to the original sentences; and training an initial text processing model using the plurality of original sentences and the corresponding plurality of replacement sentences. With this training method, the trained text processing model can polish input text directly without relying on a dictionary, so the computational cost is low and the text polishing effect of the model is improved.

Description

Training method and device for text processing model and electronic equipment
Technical Field
The application relates to the technical field of computers, in particular to the technical field of natural language processing, and provides a training method and device for a text processing model, and an electronic device.
Background
Text polishing is an important technique in assisted writing and can help authors improve their wording.
In the related art, text polishing is generally realized by combining a dictionary with a language model. However, this approach is computationally intensive and depends heavily on dictionary quality, resulting in poor polishing quality.
Disclosure of Invention
A method, apparatus, electronic device, storage medium, and computer program product for text processing model training are provided.
According to an aspect of the present application, there is provided a training method for a text processing model, including: acquiring an original sentence set, wherein the original sentence set comprises a plurality of original sentences; performing word segmentation on each original sentence to determine the terms contained in each original sentence; replacing at least one of the terms contained in each original sentence with a synonym to generate a plurality of replacement sentences corresponding to the plurality of original sentences; and training an initial text processing model using the plurality of original sentences and the corresponding plurality of replacement sentences.
According to another aspect of the present application, there is provided a training device for a text processing model, including: a first obtaining module, configured to obtain an original sentence set, wherein the original sentence set comprises a plurality of original sentences; a determining module, configured to perform word segmentation on each original sentence to determine the terms contained in each original sentence; a replacing module, configured to replace at least one of the terms contained in each original sentence with a synonym, so as to generate a plurality of replacement sentences corresponding to the plurality of original sentences; and a training module, configured to train an initial text processing model using the plurality of original sentences and the corresponding plurality of replacement sentences.
According to still another aspect of the present application, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the training method for a text processing model as described above.
According to yet another aspect of the present application, there is provided a non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the training method for a text processing model as described above.
According to a further aspect of the present application, there is provided a computer program product comprising a computer program which, when executed by a processor, implements a method of training a text processing model as described above.
The technical solution of the application addresses the problems of the related art, in which text polishing is realized by combining a dictionary with a language model, the computation is heavy, and the polishing quality is poor because the method depends heavily on dictionary quality. Some of the terms in each original sentence in the original sentence set are replaced with synonyms to generate a plurality of replacement sentences corresponding to each original sentence, and the initial text processing model is trained to generate, from the replacement sentences, the original sentences they correspond to. The initial text processing model thus learns to generate a high-quality original sentence from a low-quality replacement sentence, so that the trained text processing model can polish input text directly without relying on a dictionary; the computational cost is low, and the text polishing effect of the model is improved.
It should be understood that the description of this section is not intended to identify key or critical features of the embodiments of the application or to delineate the scope of the application. Other features of the present application will become apparent from the description that follows.
Drawings
The drawings are for better understanding of the present solution and do not constitute a limitation of the present application. Wherein:
FIG. 1 is a schematic flowchart of a training method for a text processing model according to an embodiment of the present application;
FIG. 2 is a schematic flowchart of another training method for a text processing model according to an embodiment of the present application;
FIG. 3 is a schematic flowchart of yet another training method for a text processing model according to an embodiment of the present application;
FIG. 4 is a schematic structural diagram of a training device for a text processing model according to an embodiment of the present application;
FIG. 5 is a block diagram of an electronic device for implementing the training method for a text processing model according to an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present application to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
To address the problems of the related art, in which text polishing is realized by combining a dictionary with a language model, the computation is heavy, the method depends heavily on dictionary quality, and the polishing quality is poor, an embodiment of the application provides a training method for a text processing model.
The text processing model training method, device, electronic equipment, storage medium and computer program product provided by the application are described in detail below with reference to the accompanying drawings.
Fig. 1 is a flow chart of a training method of a text processing model according to an embodiment of the present application.
As shown in fig. 1, the training method of the text processing model includes the following steps:
step 101, acquiring an original sentence set, wherein the original sentence set comprises a plurality of original sentences.
An original sentence may be high-quality corpus data obtained from the web, from literature, or the like.
In the embodiment of the application, high-quality corpus data can be obtained from news articles, encyclopedia entries, outstanding written works and other sources to serve as original sentences, and a large number of such original sentences form the original sentence set.
Step 102, performing word segmentation on each original sentence to determine the terms contained in each original sentence.
In the embodiment of the application, some of the terms in each original sentence can be replaced to generate replacement sentences that are synonymous with the original sentences but expressed differently, and these replacement sentences serve as the corpus for training the text processing model. Therefore, word segmentation can first be performed on each original sentence in the original sentence set to determine the terms contained in each original sentence.
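For illustration only, a minimal Python sketch of this step. The patent does not name a segmentation tool, so the use of the open-source jieba segmenter here is an assumption:
```python
# Sketch of step 102 under the assumption that jieba is the segmenter;
# the patent itself does not prescribe a particular word-segmentation tool.
import jieba

original_sentences = ["这朵花很美丽", "今天的天气非常好"]  # toy original sentence set

# Map each original sentence to the list of terms it contains.
terms_per_sentence = {s: jieba.lcut(s) for s in original_sentences}

for sentence, terms in terms_per_sentence.items():
    print(sentence, "->", terms)
```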
Step 103, replacing at least one of the terms contained in each original sentence with a synonym to generate a plurality of replacement sentences corresponding to the original sentences.
In the embodiment of the application, after the terms contained in each original sentence are determined, a pre-established synonym library may be used to determine synonyms for some of the terms in each original sentence, and those synonyms are then used to replace the corresponding terms, so as to generate a plurality of replacement sentences corresponding to the plurality of original sentences. One original sentence may correspond to one replacement sentence or to a plurality of replacement sentences.
As a possible implementation, a preset number of terms to be replaced may be selected from each original sentence at random, synonyms corresponding to each term to be replaced may be obtained from a preset synonym library, and each term to be replaced may then be replaced with its synonym, so as to generate a replacement sentence corresponding to each original sentence.
In practice, the number of terms replaced in each original sentence may be determined according to actual needs and the specific application scenario, which is not limited in the embodiments of the present application. For example, the number of terms replaced in each original sentence may be 1-2.
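A minimal sketch of this random replacement strategy follows; the in-memory SYNONYMS dictionary is a toy stand-in for the pre-established synonym library, whose actual form the patent leaves open:
```python
import random

# Toy stand-in for the preset synonym library.
SYNONYMS = {
    "美丽": ["漂亮", "好看"],
    "非常": ["十分", "特别"],
}

def make_replacement_sentence(terms, num_to_replace=1):
    """Randomly select up to num_to_replace replaceable terms and swap in synonyms."""
    replaceable = [i for i, t in enumerate(terms) if t in SYNONYMS]
    chosen = random.sample(replaceable, min(num_to_replace, len(replaceable)))
    replaced = list(terms)
    for i in chosen:
        replaced[i] = random.choice(SYNONYMS[replaced[i]])
    return "".join(replaced)  # Chinese terms are joined without spaces

print(make_replacement_sentence(["这朵", "花", "很", "美丽"]))
```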
Step 104, training the initial text processing model using the plurality of original sentences and the corresponding plurality of replacement sentences.
In the embodiment of the application, after the replacement sentence corresponding to each original sentence is determined, each replacement sentence may be processed by the initial text processing model to generate a reconstructed original sentence, and the parameters of the initial text processing model are updated according to the difference between the sentence generated by the model and the corresponding original sentence in the original sentence set, until the performance of the updated text processing model meets the requirements, at which point the training process of the text processing model is complete.
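The patent leaves the model architecture open at this point; purely as an illustration, a schematic training loop for a PyTorch model that maps a replacement sentence back to its original sentence might look as follows (all names and the loss function are assumptions):
```python
import torch

def train(model, optimizer, loss_fn, pairs, epochs=10):
    """pairs: iterable of (replacement_tensor, original_tensor) training examples."""
    for _ in range(epochs):
        for replacement, original in pairs:
            optimizer.zero_grad()
            prediction = model(replacement)       # model's attempt to restore the original
            loss = loss_fn(prediction, original)  # difference from the true original sentence
            loss.backward()                       # back-propagate the loss
            optimizer.step()                      # update the model parameters
```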
According to the technical solution of this embodiment, some of the terms in each original sentence in the original sentence set are replaced with synonyms to generate a plurality of replacement sentences corresponding to each original sentence, and the initial text processing model is trained to generate, from the replacement sentences, the original sentences they correspond to. The initial text processing model thus learns to generate a high-quality original sentence from a low-quality replacement sentence, so that the trained text processing model can polish input text directly without relying on a dictionary; the computational cost is low, and the text polishing effect of the model is improved.
In one possible implementation of the method, the replacement strategy for an original sentence can be determined according to parameters such as the part of speech of each term it contains, the number of terms it contains, and the number of synonyms available for each term to be replaced, so as to increase the richness of the replacement sentences and further improve the training effect of the text processing model.
The training method of the text processing model provided in the embodiment of the present application is further described below with reference to fig. 2.
Fig. 2 is a flow chart of another training method of a text processing model according to an embodiment of the present application.
As shown in fig. 2, the training method of the text processing model includes the following steps:
step 201, an original sentence set is obtained, wherein the original sentence set includes a plurality of original sentences.
Step 202, word segmentation is performed on each original sentence to determine the terms contained in each original sentence.
The specific implementation and principles of steps 201 to 202 may refer to the detailed description of the foregoing embodiment and are not repeated here.
Step 203, obtaining the part of speech of each term in each original sentence.
As a possible implementation, some of the terms in an original sentence may be replaced according to their parts of speech, so as to generate the replacement sentence corresponding to that original sentence. Thus, after word segmentation is performed on each original sentence, any part-of-speech tagging tool may be used to determine the part of speech of each term in each original sentence.
Step 204, determining a plurality of candidate terms contained in each original sentence according to the part of speech of each term.
A candidate term is a term in the original sentence whose part of speech satisfies a preset condition. For example, if the preset condition is "the part of speech is a verb or an adjective", the candidate terms contained in the original sentence are all the verbs and adjectives it contains.
As one possible implementation, when text is polished, parts of speech such as verbs and adjectives are usually polished with high frequency, while parts of speech such as nouns and pronouns do not need polishing, or are polished with low frequency. Therefore, when generating a replacement sentence, the terms in the original sentence whose parts of speech are polished with high frequency can be replaced, so that the trained text processing model pays more attention to high-frequency polished parts of speech, further improving the text polishing effect of the model.
Specifically, the parts of speech to be replaced may be preset; then, according to the part of speech of each term in each original sentence, the terms whose parts of speech are among those to be replaced are determined as the candidate terms of that original sentence.
For example, if the preset parts of speech to be replaced are verbs and adjectives, the verbs and adjectives contained in each original sentence may be determined as its candidate terms.
In practice, the parts of speech of candidate terms may be preset according to actual needs and the specific application scenario, which is not limited in the embodiments of the present application.
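A minimal sketch of this part-of-speech filter, again assuming jieba (here its posseg tagger) as the tooling, which the patent does not prescribe:
```python
import jieba.posseg as pseg  # jieba's POS tagger; an assumed tool choice

REPLACEABLE_POS = ("v", "a")  # verb and adjective tags, per the example preset condition

def candidate_terms(sentence):
    """Return the terms whose part of speech satisfies the preset condition."""
    return [pair.word for pair in pseg.lcut(sentence)
            if pair.flag.startswith(REPLACEABLE_POS)]

print(candidate_terms("这朵花很美丽"))  # e.g. ['美丽'] if tagged as an adjective
```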
Step 205, replacing at least one of the candidate terms contained in each original sentence with a synonym to generate a plurality of replacement sentences corresponding to the plurality of original sentences.
As a possible implementation, the synonyms of the candidate terms contained in each original sentence can be determined from a preset synonym library, and all the candidate terms can then be replaced with their synonyms, so as to generate the replacement sentence corresponding to each original sentence.
As another possible implementation, a preset number of candidate terms may be selected at random from the candidate terms contained in each original sentence, their synonyms looked up in the preset synonym library, and the selected candidate terms replaced with those synonyms, so as to generate the replacement sentence corresponding to each original sentence.
Further, different numbers of terms can be replaced in original sentences of different lengths. That is, in one possible implementation of the embodiment of the present application, the method may further include:
determining the number N of terms to be replaced in each original sentence according to the number of terms it contains, wherein N is a positive integer;
replacing N terms in each original sentence with their corresponding synonyms to generate a plurality of first replacement sentences corresponding to the original sentences.
As a possible implementation, the number N of terms to be replaced may be positively correlated with the number of terms contained in the original sentence; that is, the more terms an original sentence contains, the larger the number N of terms to be replaced.
Optionally, in one possible implementation of the embodiment of the present application, a mapping between ranges of term counts and the number N of terms to be replaced may be preset, so that N can be determined for each original sentence according to the range into which its term count falls. The N terms to be replaced are then selected from each original sentence, either at random or by part of speech as described above, their synonyms are determined from the preset synonym library, and the terms to be replaced are replaced with those synonyms, so as to generate the first replacement sentence corresponding to each original sentence.
For example, suppose the preset mapping is: when the term count is at most 10, N is 1; when the term count is greater than 10 and at most 20, N is 2; when the term count is greater than 20 and at most 30, N is 3; and so on. If original sentence A contains 15 terms, its N is 2, so 2 terms can be selected from sentence A as terms to be replaced and replaced with their synonyms, generating the first replacement sentence corresponding to sentence A.
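A minimal sketch of this length-to-N mapping; the behavior beyond 30 terms is an assumed extrapolation, since the example above stops there:
```python
def num_terms_to_replace(term_count):
    """Map an original sentence's term count to N, following the example mapping."""
    if term_count <= 10:
        return 1
    if term_count <= 20:
        return 2
    if term_count <= 30:
        return 3
    return 4  # assumed extension of the "and so on" in the example

print(num_terms_to_replace(15))  # -> 2, matching original sentence A above
```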
Furthermore, a longer original sentence can be replaced in several different ways, so as to generate a plurality of replacement sentences corresponding to it. That is, in one possible implementation of the embodiment of the present application, the method may further include:
obtaining the number M of terms in each original sentence, wherein M is a positive integer;
if the number M of terms contained in any original sentence in the original sentence set is greater than a threshold, replacing i terms in that original sentence with synonyms to generate a second replacement sentence corresponding to it, and replacing j terms in that original sentence with synonyms to generate a third replacement sentence corresponding to it, wherein the i terms differ from the j terms.
In the embodiment of the application, for an original sentence whose term count M is greater than the threshold, the number N of terms to be replaced can be determined as described above, and different subsets of those N terms can then be replaced, so as to generate a plurality of replacement sentences corresponding to the original sentence.
Optionally, i of the N terms to be replaced can be selected for replacement each time, generating one replacement sentence per selection, until all N terms have been covered, wherein i is a positive integer less than or equal to N.
For example, when N is 3 and i=1, one of the 3 terms to be replaced can be replaced each time, generating three second replacement sentences corresponding to the original sentence; when i=2, 2 of the terms to be replaced can be replaced each time, again generating three second replacement sentences; when i=3, all 3 terms to be replaced are replaced at once, generating one second replacement sentence.
Optionally, when the term count M of an original sentence is greater than the threshold, the determined number N of terms to be replaced is also larger. Replacing too many terms at once easily leaves the replacement sentence with incomplete semantics, or makes it differ too much from the original sentence, which harms the training effect of the text processing model. Therefore, an upper bound j on the number of terms replaced at once can be preset, so that j of the N terms to be replaced are selected for replacement each time, generating a plurality of third replacement sentences corresponding to the original sentence, until all N terms to be replaced have been covered.
For example, if N is 4 and j is 3, 3 of the terms to be replaced can be replaced each time, generating 4 third replacement sentences corresponding to the original sentence.
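The enumeration of size-i subsets can be sketched with itertools; picking the first synonym for each term is an illustrative simplification:
```python
from itertools import combinations

def replacement_variants(terms, replace_idx, synonyms, i):
    """Replace every size-i subset of the terms to be replaced (indices in
    replace_idx), yielding one replacement sentence per subset: C(N, i) in total."""
    for subset in combinations(replace_idx, i):
        variant = list(terms)
        for idx in subset:
            variant[idx] = synonyms[terms[idx]][0]  # first synonym; an assumed choice
        yield "".join(variant)

# With N = 3 and i = 1 this yields 3 sentences; with i = 2, another 3;
# with i = 3, one sentence, matching the example above.
```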
Further, a term to be replaced that has multiple synonyms can be replaced with each of them in turn, so as to generate several different replacement sentences. That is, in one possible implementation of the embodiment of the present application, the method may further include:
replacing any term to be replaced in any original sentence with each one of its Y synonyms in turn, so as to generate Y fourth replacement sentences corresponding to that original sentence.
In the embodiment of the application, if an original sentence contains a term to be replaced that has a plurality of synonyms, each synonym of that term can be used to replace it, so as to generate a plurality of fourth replacement sentences corresponding to the original sentence.
For example, if a term B to be replaced in original sentence A has 4 synonyms, B can be replaced with each of the 4 synonyms, generating 4 fourth replacement sentences corresponding to sentence A.
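A sketch of this one-term, Y-synonym expansion (illustrative names throughout):
```python
def variants_for_term(terms, idx, synonyms_of_term):
    """Replace the term at position idx with each of its Y synonyms in turn,
    producing Y fourth replacement sentences."""
    for synonym in synonyms_of_term:
        variant = list(terms)
        variant[idx] = synonym
        yield "".join(variant)

# A term with 4 synonyms yields 4 replacement sentences, as in the example above.
```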
Step 206, training the initial text processing model using the plurality of original sentences and the corresponding plurality of replacement sentences.
The specific implementation and principles of step 206 may refer to the detailed description of the foregoing embodiment and are not repeated here.
According to the technical solution of this embodiment, the way replacement sentences are generated for the original sentences is determined by parameters such as the part of speech of each term, the number of terms contained, and the number of synonyms available for each term to be replaced, which increases the richness of the replacement sentences; the initial text processing model is then trained to generate the original sentence corresponding to each replacement sentence. Generating low-quality replacement sentences from high-quality original sentences in multiple ways thus enriches the corpus for training the initial text processing model, so that the trained text processing model can polish input text directly without relying on a dictionary; the computational cost is low, and both the training effect and the text polishing effect of the model are further improved.
In one possible implementation of the application, when the initial text processing model processes a replacement sentence, it can predict the positions to be replaced in the sentence and the replacement word for each such position, so as to polish the replacement sentence. The initial text processing model can therefore be trained on two fronts, the accuracy of its replacement position predictions and the accuracy of its replacement word predictions, improving the text polishing effect of the model.
The training method of the text processing model provided in the embodiment of the present application is further described below with reference to fig. 3.
FIG. 3 is a schematic flowchart of yet another training method for a text processing model according to an embodiment of the present application.
As shown in fig. 3, the training method of the text processing model includes the following steps:
step 301, obtaining an original sentence set, wherein the original sentence set comprises a plurality of original sentences;
step 302, word segmentation processing is performed on each primitive sentence to determine the respective vocabulary entry contained in each primitive sentence.
At step 303, at least one term in each term included in each primitive sentence is replaced by a synonym, so as to generate a plurality of replacement sentences corresponding to the primitive sentences respectively.
The specific implementation process and principle of the above steps 301 to 303 may refer to the detailed description of the above embodiments, which is not repeated here.
Step 304, processing each replacement sentence with the initial text processing model to generate a prediction category label and a predicted replacement word for each token in each replacement sentence.
The prediction category label of a token indicates whether the token needs to be replaced. For example, the prediction category label can take the values 0 and 1: 0 indicates that the token does not need to be replaced, and 1 indicates that it does. This is illustrated with examples below.
The predicted replacement word of a token is the word the initial text processing model proposes for the token when it determines that the token needs to be replaced.
In the embodiment of the application, after a replacement sentence is input into the initial text processing model, the model may first segment the sentence into tokens, determine whether each token needs to be replaced, and generate the prediction category label of each token from that determination. When the model determines that the prediction category label of a token is 1, i.e. the token needs to be replaced, it can select the token's predicted replacement word from a preset vocabulary.
Step 305, determining the loss value of the initial text processing model according to the difference between each original sentence and the corresponding replacement sentence, and the prediction category label and predicted replacement word of each token in the corresponding replacement sentence.
As a possible implementation, the accuracy of the predictions of the initial text processing model can be checked against the original sentence corresponding to each replacement sentence, and the loss value of the model generated accordingly, so as to train the model. That is, in one possible implementation of the embodiment of the present application, step 305 may include:
determining the actual category label and target replacement word of each token in the corresponding replacement sentence according to the difference between each original sentence and the corresponding replacement sentence;
determining a first loss value according to the difference between the actual category label and the prediction category label of each token;
determining a second loss value according to the difference between the target replacement word and the predicted replacement word;
and determining the loss value of the initial text processing model according to the first loss value and the second loss value.
The actual category label of a token indicates whether the token was replaced when the replacement sentence was generated. For example, the actual category label can take the values 0 and 1: 0 indicates that the token is not one that was replaced when generating the replacement sentence, and 1 indicates that it is. This is illustrated with examples below.
The target replacement word of a token is the term at the corresponding position in the original sentence, before replacement. For example, if the original sentence is "the flower is beautiful" and the replacement sentence is "the flower is pretty", the target replacement word of the token "pretty" is "beautiful".
In the embodiment of the application, whether each token in the replacement sentence is a word that was replaced when generating the replacement sentence can be judged from the difference between the original sentence and the corresponding replacement sentence: if so, the actual category label of the token is 1; if not, it is 0. When the actual category label of a token in the replacement sentence is 1, the token at the corresponding position in the original sentence can further be determined as its target replacement word.
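A minimal sketch of deriving actual labels and target replacement words by position-wise alignment; that synonym substitution preserves token positions, so the two sentences align one-to-one, is an assumption consistent with the replacement procedure described above:
```python
def actual_labels_and_targets(original_terms, replacement_terms):
    """Align original and replacement sentences position by position and derive
    each token's actual category label and target replacement word."""
    labels, targets = [], []
    for orig, repl in zip(original_terms, replacement_terms):
        replaced = orig != repl
        labels.append(1 if replaced else 0)
        targets.append(orig if replaced else None)  # target = pre-replacement term
    return labels, targets

labels, targets = actual_labels_and_targets(
    ["the", "flower", "is", "beautiful"], ["the", "flower", "is", "pretty"])
print(labels, targets)  # [0, 0, 0, 1] [None, None, None, 'beautiful']
```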
In the embodiment of the application, the accuracy with which the initial text processing model predicts the category labels of tokens and the accuracy with which it predicts their replacement words reflect its performance from different angles. The performance of the initial text processing model can therefore be measured, and the model trained, on both its label prediction accuracy and its replacement word prediction accuracy.
As one possible implementation, the label confidence of a replacement sentence can be determined from the differences between the prediction category labels and the actual category labels of its tokens. For example, the ratio of the number of tokens whose prediction category label matches their actual category label to the total number of tokens in the replacement sentence can be taken as the label confidence of the sentence, although the embodiment of the application is not limited to this. The label confidence of each replacement sentence is then substituted into a preset loss function (such as a cross-entropy loss function) to determine the first loss value of the initial text processing model.
For example, if replacement sentence A contains 10 tokens and the prediction category label of every token matches its actual category label, the label confidence of sentence A is 1; if the labels match for 8 of the tokens, the label confidence of sentence A is 0.8.
As one possible implementation, the difference between a target replacement word and a predicted replacement word can be measured by their semantic similarity, which in turn can be measured by parameters such as the distance or cosine similarity between their word vectors. Therefore, in the embodiment of the application, the replacement word confidence of a replacement sentence can be determined from the distance or cosine similarity between the word vector of each token's target replacement word and the word vector of its predicted replacement word. For example, the mean cosine similarity between the target and predicted replacement word vectors over the tokens of the replacement sentence can be taken as the sentence's replacement word confidence, although the embodiment of the application is not limited to this. The replacement word confidence of each replacement sentence is then substituted into a preset loss function (such as a cross-entropy loss function) to determine the second loss value of the initial text processing model.
For example, if replacement sentence A contains two tokens B and C whose prediction category labels are 1, the cosine similarity between the target and predicted replacement words of B is 0.8, and that of C is 0.6, then the replacement word confidence of sentence A is 0.7.
As one possible implementation, after the first and second loss values of the initial text processing model are determined, they can be fused to generate the loss value of the model. For example, the sum of the first and second loss values can be taken as the loss value of the initial text processing model; alternatively, their average can be taken, which the embodiment of the application does not limit.
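A schematic PyTorch version of this two-part loss. As a simplification (an assumption, not the patent's exact formulation), the first loss is computed as token-level cross entropy on the category labels rather than via a sentence-level label confidence, and the second loss as one minus the mean cosine similarity:
```python
import torch.nn.functional as F

def model_loss(label_logits, actual_labels, pred_word_vecs, target_word_vecs):
    """label_logits: (tokens, 2); actual_labels: (tokens,) long tensor;
    *_word_vecs: (replaced_tokens, dim) word vectors of the replaced tokens."""
    first_loss = F.cross_entropy(label_logits, actual_labels)
    cos = F.cosine_similarity(pred_word_vecs, target_word_vecs, dim=-1)
    second_loss = 1.0 - cos.mean()            # high similarity -> low loss
    return first_loss + second_loss           # fused by summation; averaging also valid
```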
Step 306, correcting the initial text processing model according to the loss value.
As a possible implementation, after the loss value of the initial text processing model is determined, it can be checked whether the loss value falls within a preset range. If so, the performance of the initial text processing model meets the requirements and the training process of the text processing model is complete. If not, the performance does not yet meet the requirements, and the loss value can be back-propagated to correct the parameters of the initial text processing model, generating a corrected text processing model. The corrected text processing model then continues to process the replacement sentences as before, until the loss value falls within the preset range, which completes the training process of the text processing model.
In practice, the method for correcting the text processing model according to the loss value may be determined according to actual needs and the specific application scenario, which is not limited in the embodiments of the present application. For example, gradient descent may be used to correct the text processing model.
As another possible implementation, the initial text processing model may include a label prediction layer and a replacement word prediction layer, each connected to a shared feature processing layer, so that the label prediction layer and the replacement word prediction layer can be trained alternately according to the first loss value and the second loss value to correct the initial text processing model. That is, in one possible implementation of the embodiment of the present application, step 306 may include:
correcting the label prediction layer and the feature processing layer of the initial text processing model according to the first loss value;
and correcting the replacement word prediction layer and the feature processing layer of the initial text processing model according to the second loss value.
The feature processing layer is the layer that produces the feature representation of the replacement sentence input into the initial text processing model. For example, the feature processing layer may map the replacement sentence to a vector representation.
The label prediction layer predicts the prediction category label of each token in the replacement sentence from the feature representation of the sentence output by the feature processing layer.
The replacement word prediction layer predicts the predicted replacement word of each token whose prediction category label is 1, likewise from the feature representation output by the feature processing layer.
In the embodiment of the application, the first loss value reflects the accuracy with which the initial text processing model predicts the prediction category labels of tokens, so the label prediction layer and the feature processing layer can be corrected using the first loss value; the second loss value reflects the accuracy with which the model predicts the replacement words of tokens, so the replacement word prediction layer and the feature processing layer can be corrected using the second loss value, until the first and second loss values of the corrected text processing model both fall within the preset range, completing the training process of the text processing model.
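One way to realize this alternating correction, sketched under the assumption that the two optimizers partition the parameters as described (label head plus feature layer, and word head plus feature layer):
```python
def alternating_train_step(model, batch, step_idx, opt_label, opt_word, loss_fns):
    """Alternate by step: even steps update the label prediction layer and the
    feature layer with the first loss; odd steps update the replacement word
    prediction layer and the feature layer with the second loss."""
    label_logits, word_logits = model(batch["token_ids"])
    if step_idx % 2 == 0:
        loss = loss_fns["first"](label_logits, batch["actual_labels"])
        opt = opt_label   # covers label head + feature processing layer parameters
    else:
        loss = loss_fns["second"](word_logits, batch["target_words"])
        opt = opt_word    # covers word head + feature processing layer parameters
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```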
According to the technical solution of this embodiment, some of the terms in each original sentence in the original sentence set are replaced with synonyms to generate a plurality of replacement sentences corresponding to each original sentence; the initial text processing model predicts the prediction category label and predicted replacement word of each token in the replacement sentences; the label prediction accuracy and the replacement word prediction accuracy of the model are checked against the original sentences to generate the first loss value and the second loss value respectively; and the label prediction layer and the replacement word prediction layer of the model are corrected according to the first and second loss values respectively. Training the text processing model on both label prediction and replacement word prediction thus further improves the training effect of the model, and in turn its text polishing effect.
To implement the above embodiments, the present application further provides a training device for a text processing model.
Fig. 4 is a schematic structural diagram of a training device for a text processing model according to an embodiment of the present application.
As shown in fig. 4, the training device 40 for a text processing model includes:
a first obtaining module 41, configured to obtain an original sentence set, wherein the original sentence set includes a plurality of original sentences;
a determining module 42, configured to perform word segmentation on each original sentence to determine the terms contained in each original sentence;
a replacing module 43, configured to replace at least one of the terms contained in each original sentence with a synonym, so as to generate a plurality of replacement sentences corresponding to the plurality of original sentences; and
a training module 44, configured to train the initial text processing model using the plurality of original sentences and the corresponding plurality of replacement sentences.
In practical use, the training device for the text processing model provided by the embodiment of the application can be configured in any electronic equipment to execute the training method for the text processing model.
According to the technical solution of this embodiment, some of the terms in each original sentence in the original sentence set are replaced with synonyms to generate a plurality of replacement sentences corresponding to each original sentence, and the initial text processing model is trained to generate, from the replacement sentences, the original sentences they correspond to. The initial text processing model thus learns to generate a high-quality original sentence from a low-quality replacement sentence, so that the trained text processing model can polish input text directly without relying on a dictionary; the computational cost is low, and the text polishing effect of the model is improved.
In one possible implementation form of the present application, the replacing module 43 includes:
a first obtaining unit, configured to obtain the part of speech of each term in each original sentence;
a first determining unit, configured to determine a plurality of candidate terms contained in each original sentence according to the part of speech of each term;
and a first replacing unit, configured to replace at least one of the candidate terms contained in each original sentence with a synonym to generate a plurality of replacement sentences corresponding to the plurality of original sentences.
Further, in another possible implementation form of the present application, the replacing module 43 includes:
a second determining unit, configured to determine the number N of terms to be replaced in each original sentence according to the number of terms contained in it, wherein N is a positive integer;
and a second replacing unit, configured to replace N terms in each original sentence with their corresponding synonyms to generate a plurality of first replacement sentences corresponding to the original sentences.
Further, in still another possible implementation form of the present application, the replacing module 43 includes:
a second obtaining unit, configured to obtain the number M of terms in each original sentence, wherein M is a positive integer;
and a third replacing unit, configured to, when the number M of terms contained in any original sentence in the original sentence set is greater than a threshold, replace i terms in that original sentence with synonyms to generate a second replacement sentence corresponding to it, and replace j terms in that original sentence with synonyms to generate a third replacement sentence corresponding to it, wherein the i terms differ from the j terms.
Further, in still another possible implementation form of the present application, if any term to be replaced in any original sentence of the original sentence set has Y synonyms, the replacing module 43 includes:
a fourth replacing unit, configured to replace that term to be replaced with each one of the Y synonyms in turn, so as to generate Y fourth replacement sentences corresponding to that original sentence.
Further, in still another possible implementation form of the present application, the training module 44 includes:
a generating unit, configured to process each replacement sentence with the initial text processing model to generate a prediction category label and a predicted replacement word for each token in each replacement sentence;
a third determining unit, configured to determine the loss value of the initial text processing model according to the difference between each original sentence and the corresponding replacement sentence, and the prediction category label and predicted replacement word of each token in the corresponding replacement sentence; and
a correcting unit, configured to correct the initial text processing model according to the loss value.
Further, in another possible implementation form of the present application, the third determining unit includes:
a first determining subunit, configured to determine the actual category label and target replacement word of each token in the corresponding replacement sentence according to the difference between each original sentence and the corresponding replacement sentence;
a second determining subunit, configured to determine a first loss value according to the difference between the actual category label and the prediction category label of each token;
a third determining subunit, configured to determine a second loss value according to the difference between the target replacement word and the predicted replacement word;
and a fourth determining subunit, configured to determine the loss value of the initial text processing model according to the first loss value and the second loss value.
Further, in still another possible implementation form of the present application, the initial text processing model includes a label prediction layer and a replacement word prediction layer, each connected to the feature processing layer; correspondingly, the correcting unit includes:
a first correcting subunit, configured to correct the label prediction layer and the feature processing layer of the initial text processing model according to the first loss value;
and a second correcting subunit, configured to correct the replacement word prediction layer and the feature processing layer of the initial text processing model according to the second loss value.
It should be noted that the foregoing explanation of the embodiments of the training method of the text processing model shown in FIG. 1, FIG. 2 and FIG. 3 also applies to the training device 40 of the text processing model of this embodiment and is not repeated here.
According to the technical solution of this embodiment, some of the terms in each original sentence in the original sentence set are replaced with synonyms to generate a plurality of replacement sentences corresponding to each original sentence; the initial text processing model predicts the prediction category label and predicted replacement word of each token in the replacement sentences; the label prediction accuracy and the replacement word prediction accuracy of the model are checked against the original sentences to generate the first loss value and the second loss value respectively; and the label prediction layer and the replacement word prediction layer of the model are corrected according to the first and second loss values respectively. Training the text processing model on both label prediction and replacement word prediction thus further improves the training effect of the model, and in turn its text polishing effect.
According to embodiments of the present application, there is also provided an electronic device, a readable storage medium and a computer program product.
As shown in FIG. 5, a block diagram of an electronic device for the training method of a text processing model according to an embodiment of the present application is provided. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processing devices, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.
As shown in FIG. 5, the electronic device includes: one or more processors 501, a memory 502, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to an interface. In other embodiments, multiple processors and/or multiple buses may be used with multiple memories, if desired. Likewise, multiple electronic devices may be connected, each providing part of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 501 is illustrated in FIG. 5.
The memory 502 is a non-transitory computer readable storage medium provided herein. The memory stores instructions executable by at least one processor, so that the at least one processor performs the training method of the text processing model provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the training method of the text processing model provided by the present application.
As a non-transitory computer readable storage medium, the memory 502 may be used to store non-transitory software programs, non-transitory computer executable programs and modules, such as the program instructions/modules corresponding to the training method of the text processing model in the embodiments of the present application (e.g., the first acquisition module 41, the determining module 42, the replacement module 43 and the training module 44 shown in fig. 4). By running the non-transitory software programs, instructions and modules stored in the memory 502, the processor 501 executes the various functional applications and data processing of the server, that is, implements the training method of the text processing model in the above method embodiments.
The memory 502 may include a program storage area and a data storage area; the program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created according to the use of the electronic device for the training method of the text processing model, and the like. In addition, the memory 502 may include a high-speed random access memory, and may also include a non-transitory memory, such as at least one magnetic disk storage device, a flash memory device, or other non-transitory solid-state storage devices. In some embodiments, the memory 502 may optionally include memories remotely located with respect to the processor 501, and these remote memories may be connected via a network to the electronic device for the training method of the text processing model. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for the training method of the text processing model may further include an input device 503 and an output device 504. The processor 501, the memory 502, the input device 503 and the output device 504 may be connected by a bus or in other manners; connection by a bus is taken as an example in fig. 5.
The input device 503 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for the training method of the text processing model; examples include a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, and a joystick. The output device 504 may include a display device, auxiliary lighting devices (e.g., LEDs), haptic feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that can receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also referred to as programs, software, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memories, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as machine-readable signals. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or a middleware component (e.g., an application server), or a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include local area networks (LANs), wide area networks (WANs), the Internet, and blockchain networks.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server can be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the defects of high management difficulty and weak service scalability in traditional physical hosts and VPS (Virtual Private Server) services. The server may also be a server of a distributed system, or a server combined with a blockchain.
According to the technical solution of the embodiments of the present application, some entries in each original sentence in the original sentence set are replaced with synonyms to generate a plurality of replacement sentences corresponding to each original sentence, and the initial text processing model is used to generate, from the plurality of replacement sentences, the original sentences corresponding to the replacement sentences, thereby training the initial text processing model. In this way, the initial text processing model learns to generate a high-quality original sentence from a low-quality replacement sentence, so that the trained text processing model can polish an input text directly, without relying on a dictionary and with a small amount of computation, which improves the text polishing effect of the text processing model.
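As a hedged illustration of how such low-quality training inputs can be constructed, the short Python sketch below derives replacement sentences from an original sentence via a toy synonym table; the whitespace tokenizer, the SYNONYMS dictionary and the function name make_replacement_sentences are assumptions for exposition only, not the segmentation or dictionary used by this application.

import random

SYNONYMS = {  # toy synonym dictionary, an assumption for this sketch
    "improve": ["boost", "enhance"],
    "quick": ["fast", "rapid"],
}

def make_replacement_sentences(original, n_variants=2):
    """Build (replacement_sentence, original_sentence) training pairs."""
    words = original.split()  # whitespace split stands in for real word segmentation
    replaceable = [i for i, w in enumerate(words) if w in SYNONYMS]
    pairs = []
    for _ in range(n_variants):
        if not replaceable:
            break
        out = list(words)
        # swap a random subset of the replaceable entries for synonyms
        for i in random.sample(replaceable, k=random.randint(1, len(replaceable))):
            out[i] = random.choice(SYNONYMS[out[i]])
        pairs.append((" ".join(out), original))
    return pairs

# the model is then trained to map each low-quality replacement sentence
# back to its high-quality original sentence
print(make_replacement_sentences("a quick way to improve text"))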
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flows shown above. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, as long as the desired results of the technical solutions disclosed in the present application can be achieved; no limitation is imposed herein.
The above embodiments do not limit the scope of the application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (16)

1. A method of training a text processing model, comprising:
acquiring an original sentence set, wherein the original sentence set comprises a plurality of original sentences;
performing word segmentation on each original sentence to determine the entries contained in each original sentence;
replacing at least one of the entries contained in each original sentence with a synonym to generate a plurality of replacement sentences respectively corresponding to the plurality of original sentences; and
training an initial text processing model by using the plurality of original sentences and the corresponding plurality of replacement sentences;
wherein training the initial text processing model by using the plurality of original sentences and the corresponding plurality of replacement sentences comprises:
processing each replacement sentence by using the initial text processing model to generate a predicted category label and a predicted replacement word for each word segment in each replacement sentence;
determining a loss value of the initial text processing model according to the difference between each original sentence and the corresponding replacement sentence, and the predicted category label and the predicted replacement word of each word segment in the corresponding replacement sentence; and
correcting the initial text processing model according to the loss value.
2. The method of claim 1, wherein replacing at least one of the entries contained in each original sentence with a synonym to generate the plurality of replacement sentences respectively corresponding to the plurality of original sentences comprises:
acquiring the part of speech of each entry in each original sentence;
determining a plurality of candidate entries contained in each original sentence according to the part of speech of each entry; and
replacing at least one of the candidate entries contained in each original sentence with a synonym to generate the plurality of replacement sentences respectively corresponding to the plurality of original sentences.
3. The method of claim 1, wherein replacing at least one of the entries contained in each original sentence with a synonym comprises:
determining a number N of entries to be replaced in each original sentence according to the number of entries contained in each original sentence, wherein N is a positive integer; and
replacing N entries in each original sentence with corresponding synonyms, respectively, to generate a plurality of first replacement sentences corresponding to the plurality of original sentences.
4. The method of claim 1, wherein replacing at least one of the entries contained in each original sentence with a synonym comprises:
acquiring the number M of entries in each original sentence, wherein M is a positive integer; and
if the number M of entries contained in any original sentence in the original sentence set is greater than a threshold, replacing i entries in said original sentence with synonyms to generate a second replacement sentence corresponding to said original sentence, and replacing j entries in said original sentence with synonyms to generate a third replacement sentence corresponding to said original sentence, wherein the i entries are different from the j entries.
5. The method of claim 1, wherein, if any entry to be replaced in any original sentence has Y synonyms, replacing at least one of the entries contained in each original sentence with a synonym to generate the plurality of replacement sentences respectively corresponding to the plurality of original sentences comprises:
replacing said entry to be replaced in said original sentence with each of the Y synonyms, respectively, to generate Y fourth replacement sentences corresponding to said original sentence.
6. The method of any one of claims 1-5, wherein determining the loss value of the initial text processing model according to the difference between each original sentence and the corresponding replacement sentence, and the predicted category label and the predicted replacement word of each word segment in the corresponding replacement sentence comprises:
determining an actual category label and a target replacement word of each word segment in the corresponding replacement sentence according to the difference between each original sentence and the corresponding replacement sentence;
determining a first loss value according to the difference between the actual category label and the predicted category label of each word segment;
determining a second loss value according to the difference between the target replacement word and the predicted replacement word; and
determining the loss value of the initial text processing model according to the first loss value and the second loss value.
7. The method of claim 6, wherein the initial text processing model includes a label prediction layer and a replacement word prediction layer respectively connected to a feature processing layer, and correcting the initial text processing model according to the loss value comprises:
correcting the label prediction layer and the feature processing layer of the initial text processing model according to the first loss value; and
correcting the replacement word prediction layer and the feature processing layer of the initial text processing model according to the second loss value.
8. A training device for a text processing model, comprising:
a first acquisition module, configured to acquire an original sentence set, wherein the original sentence set comprises a plurality of original sentences;
a determining module, configured to perform word segmentation on each original sentence to determine the entries contained in each original sentence;
a replacement module, configured to replace at least one of the entries contained in each original sentence with a synonym to generate a plurality of replacement sentences respectively corresponding to the plurality of original sentences; and
a training module, configured to train an initial text processing model by using the plurality of original sentences and the corresponding plurality of replacement sentences;
wherein the training module comprises:
a generation unit, configured to process each replacement sentence by using the initial text processing model to generate a predicted category label and a predicted replacement word for each word segment in each replacement sentence;
a third determining unit, configured to determine a loss value of the initial text processing model according to the difference between each original sentence and the corresponding replacement sentence, and the predicted category label and the predicted replacement word of each word segment in the corresponding replacement sentence; and
a correction unit, configured to correct the initial text processing model according to the loss value.
9. The apparatus of claim 8, wherein the replacement module comprises:
a first acquisition unit, configured to acquire the part of speech of each entry in each original sentence;
a first determining unit, configured to determine a plurality of candidate entries contained in each original sentence according to the part of speech of each entry; and
a first replacing unit, configured to replace at least one of the candidate entries contained in each original sentence with a synonym to generate the plurality of replacement sentences respectively corresponding to the plurality of original sentences.
10. The apparatus of claim 8, wherein the replacement module comprises:
a second determining unit, configured to determine a number N of entries to be replaced in each original sentence according to the number of entries contained in each original sentence, wherein N is a positive integer; and
a second replacing unit, configured to replace N entries in each original sentence with corresponding synonyms, respectively, to generate a plurality of first replacement sentences corresponding to the plurality of original sentences.
11. The apparatus of claim 8, wherein the replacement module comprises:
a second acquisition unit, configured to acquire the number M of entries in each original sentence, wherein M is a positive integer; and
a third replacing unit, configured to, when the number M of entries contained in any original sentence in the original sentence set is greater than a threshold, replace i entries in said original sentence with synonyms to generate a second replacement sentence corresponding to said original sentence, and replace j entries in said original sentence with synonyms to generate a third replacement sentence corresponding to said original sentence, wherein the i entries are different from the j entries.
12. The apparatus of claim 8, wherein, if any entry to be replaced in any original sentence in the original sentence set has Y synonyms, the replacement module comprises:
a fourth replacing unit, configured to replace said entry to be replaced in said original sentence with each of the Y synonyms, respectively, to generate Y fourth replacement sentences corresponding to said original sentence.
13. The apparatus according to any one of claims 8-12, wherein the third determining unit comprises:
a first determining subunit, configured to determine an actual category label and a target replacement word of each word segment in the corresponding replacement sentence according to the difference between each original sentence and the corresponding replacement sentence;
a second determining subunit, configured to determine a first loss value according to the difference between the actual category label and the predicted category label of each word segment;
a third determining subunit, configured to determine a second loss value according to the difference between the target replacement word and the predicted replacement word; and
a fourth determining subunit, configured to determine the loss value of the initial text processing model according to the first loss value and the second loss value.
14. The apparatus of claim 13, wherein the initial text processing model includes a label prediction layer and a replacement word prediction layer respectively connected to a feature processing layer, and the correction unit comprises:
a first correction subunit, configured to correct the label prediction layer and the feature processing layer of the initial text processing model according to the first loss value; and
a second correction subunit, configured to correct the replacement word prediction layer and the feature processing layer of the initial text processing model according to the second loss value.
15. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-7.
16. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1-7.
CN202010465386.9A 2020-05-28 2020-05-28 Training method and device for text processing model and electronic equipment Active CN111709234B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010465386.9A CN111709234B (en) 2020-05-28 2020-05-28 Training method and device for text processing model and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010465386.9A CN111709234B (en) 2020-05-28 2020-05-28 Training method and device for text processing model and electronic equipment

Publications (2)

Publication Number Publication Date
CN111709234A CN111709234A (en) 2020-09-25
CN111709234B true CN111709234B (en) 2023-07-25

Family

ID=72538159

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010465386.9A Active CN111709234B (en) 2020-05-28 2020-05-28 Training method and device for text processing model and electronic equipment

Country Status (1)

Country Link
CN (1) CN111709234B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112364641A (en) * 2020-11-12 2021-02-12 北京中科闻歌科技股份有限公司 Chinese countermeasure sample generation method and device for text audit
CN112380860B (en) * 2020-11-13 2023-12-29 平安科技(深圳)有限公司 Sentence vector processing method, sentence matching device, sentence vector processing equipment and sentence matching medium
CN112489740A (en) * 2020-12-17 2021-03-12 北京惠及智医科技有限公司 Medical record detection method, training method of related model, related equipment and device
CN113408280B (en) * 2021-06-30 2024-03-22 北京百度网讯科技有限公司 Negative example construction method, device, equipment and storage medium
CN113836297B (en) * 2021-07-23 2023-04-14 北京三快在线科技有限公司 Training method and device for text emotion analysis model
CN113554107A (en) * 2021-07-28 2021-10-26 工银科技有限公司 Corpus generating method, apparatus, device, storage medium and program product
CN113590761B (en) * 2021-08-13 2022-03-25 网易有道信息技术(北京)有限公司 Training method of text processing model, text processing method and related equipment
CN115713071A (en) * 2022-11-11 2023-02-24 北京百度网讯科技有限公司 Training method of neural network for processing text and method for processing text

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108460015A (en) * 2018-02-08 2018-08-28 合肥工业大学 Text emotion grouped data enhances analysis method
CN109522547A (en) * 2018-10-23 2019-03-26 浙江大学 Chinese synonym iteration abstracting method based on pattern learning
CN110162767A (en) * 2018-02-12 2019-08-23 北京京东尚科信息技术有限公司 The method and apparatus of text error correction

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007068123A1 (en) * 2005-12-16 2007-06-21 National Research Council Of Canada Method and system for training and applying a distortion component to machine translation
US20180018322A1 (en) * 2016-07-15 2018-01-18 Intuit Inc. System and method for automatically understanding lines of compliance forms through natural language patterns
CN106372063A (en) * 2016-11-01 2017-02-01 上海智臻智能网络科技股份有限公司 Information processing method and device and terminal
CN108509409A (en) * 2017-02-27 2018-09-07 芋头科技(杭州)有限公司 A method of automatically generating semantic similarity sentence sample
CN110188351B (en) * 2019-05-23 2023-08-25 鼎富智能科技有限公司 Sentence smoothness and syntax scoring model training method and device
CN110188360B (en) * 2019-06-06 2023-04-25 北京百度网讯科技有限公司 Model training method and device
CN110532547A (en) * 2019-07-31 2019-12-03 厦门快商通科技股份有限公司 Building of corpus method, apparatus, electronic equipment and medium
CN110795934B (en) * 2019-10-31 2023-09-19 北京金山数字娱乐科技有限公司 Sentence analysis model training method and device and sentence analysis method and device
CN111128122B (en) * 2019-12-31 2022-08-16 思必驰科技股份有限公司 Method and system for optimizing rhythm prediction model


Also Published As

Publication number Publication date
CN111709234A (en) 2020-09-25

Similar Documents

Publication Publication Date Title
CN111709234B (en) Training method and device for text processing model and electronic equipment
CN111709248B (en) Training method and device for text generation model and electronic equipment
KR102565673B1 (en) Method and apparatus for generating semantic representation model,and storage medium
CN111078865B (en) Text title generation method and device
JP2021120863A (en) Method and device for generating information, electronic apparatus, storage medium, and computer program
JP7269913B2 (en) Knowledge graph construction method, device, electronic device, storage medium and computer program
CN111144108B (en) Modeling method and device of emotion tendentiousness analysis model and electronic equipment
CN111859997B (en) Model training method and device in machine translation, electronic equipment and storage medium
CN111079945B (en) End-to-end model training method and device
KR102630243B1 (en) method and device for predicting punctuation
CN111090991A (en) Scene error correction method and device, electronic equipment and storage medium
KR20210139152A (en) Training method, device, electronic equipment and storage medium of semantic similarity model
CN113160822B (en) Speech recognition processing method, device, electronic equipment and storage medium
CN112597288B (en) Man-machine interaction method, device, equipment and storage medium
CN112466277B (en) Prosody model training method and device, electronic equipment and storage medium
CN112148856B (en) Method and device for establishing punctuation prediction model
CN110990569B (en) Text clustering method and device and related equipment
CN111339314A (en) Method and device for generating triple-group data and electronic equipment
US20210280189A1 (en) Method and apparatus for generating conversation, electronic device, and storage medium
CN113553833B (en) Text error correction method and device and electronic equipment
CN112687271B (en) Voice translation method and device, electronic equipment and storage medium
CN115688796B (en) Training method and device for pre-training model in natural language processing field
CN111090673B (en) Cache unit searching method and related equipment
CN113221550B (en) Text filtering method, device, equipment and medium
US11900918B2 (en) Method for training a linguistic model and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant