WO2021082842A1 - Quality perception-based text generation method and apparatus, device, and storage medium - Google Patents

Quality perception-based text generation method and apparatus, device, and storage medium

Info

Publication number
WO2021082842A1
WO2021082842A1 · PCT/CN2020/118114 · CN2020118114W
Authority
WO
WIPO (PCT)
Prior art keywords
text
word
replaced
language model
draft
Prior art date
Application number
PCT/CN2020/118114
Other languages
French (fr)
Chinese (zh)
Inventor
邓黎明
庄伯金
王少军
Original Assignee
平安科技(深圳)有限公司
Priority date
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2021082842A1 publication Critical patent/WO2021082842A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/355Class or cluster creation or modification
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/26Speech to text systems

Definitions

  • This application relates to the technical field of artificial intelligence, and in particular to a method, equipment, storage medium, and device for text generation based on quality perception.
  • The inventor realizes that existing text generation methods are mainly single-round generation methods based on the sequence-to-sequence (Seq2seq) model.
  • In the text generation stage, such a model generates the text word by word in one direction, from left to right (or from right to left), and only considers the text that has already been generated. Once the earlier text is generated poorly, it has a large impact on the text generated later, so that deviations accumulate.
  • Current multi-round iteration techniques simply update every word once from left to right and rely on manually set iteration rounds, which is equivalent to completely regenerating the entire text.
  • the main purpose of this application is to provide a method, equipment, storage medium and device for text generation based on quality perception, aiming to solve the technical problem of poor quality of automatically generated text in the prior art.
  • the text generation method based on quality perception includes the following steps:
  • replacing the word to be replaced with the target word, through the trained quality-aware occlusion language model, to obtain the first iteration text; using the first iteration text as a new text draft; and returning to the step of predicting, according to the new text draft and through the trained quality-aware occlusion language model, the position of the word to be replaced in the new text draft to obtain the target position of the word to be replaced, until all the words to be replaced in the text draft have been replaced, at which point the iteration terminates and the iteratively updated target text is obtained.
  • this application also proposes a text generation device based on quality perception.
  • the text generation device based on quality perception includes a memory, a processor, and a quality perception-based text generation program that is stored on the memory and can run on the processor, and the quality perception-based text generation program is configured to implement the following steps:
  • replacing the word to be replaced with the target word, through the trained quality-aware occlusion language model, to obtain the first iteration text; using the first iteration text as a new text draft; and returning to the step of predicting, according to the new text draft and through the trained quality-aware occlusion language model, the position of the word to be replaced in the new text draft to obtain the target position of the word to be replaced, until all the words to be replaced in the text draft have been replaced, at which point the iteration terminates and the iteratively updated target text is obtained.
  • the present application also proposes a storage medium that stores a quality-perception-based text generation program, and when the quality-perception-based text generation program is executed by a processor, the following steps are implemented:
  • replacing the word to be replaced with the target word, through the trained quality-aware occlusion language model, to obtain the first iteration text; using the first iteration text as a new text draft; and returning to the step of predicting, according to the new text draft and through the trained quality-aware occlusion language model, the position of the word to be replaced in the new text draft to obtain the target position of the word to be replaced, until all the words to be replaced in the text draft have been replaced, at which point the iteration terminates and the iteratively updated target text is obtained.
  • this application also proposes a text generation device based on quality perception, and the text generation device based on quality perception includes:
  • a generating module used to obtain a corpus to be processed, perform multi-thread processing on the corpus to be processed, and generate a text draft through a sequence-to-sequence model
  • a prediction module configured to predict the position of the word to be replaced in the text draft through the trained quality perception occlusion language model according to the text draft, and obtain the target position of the word to be replaced;
  • the prediction module is further configured to predict the semantics of the target location according to the context information of the target location through the trained quality perception occlusion language model, and obtain the target word corresponding to the target location;
  • an iteration module, used to replace, through the trained quality-aware occlusion language model, the word to be replaced with the target word to obtain the first iteration text; use the first iteration text as a new text draft; and return to the step of predicting, according to the new text draft and through the trained quality-aware occlusion language model, the position of the word to be replaced in the new text draft to obtain the target position of the word to be replaced, until all the words to be replaced in the draft text have been replaced, at which point the iteration terminates and the iteratively updated target text is obtained.
  • This application can be based on artificial intelligence and improve the quality of text generation through multiple iterations.
  • FIG. 1 is a schematic structural diagram of a text generation device based on quality perception in a hardware operating environment related to a solution of an embodiment of the present application;
  • FIG. 2 is a schematic flowchart of a first embodiment of a text generation method based on quality perception according to this application;
  • FIG. 3 is a schematic flowchart of a second embodiment of a text generation method based on quality perception according to this application;
  • FIG. 4 is a schematic flowchart of a third embodiment of a text generation method based on quality perception according to this application;
  • Fig. 5 is a structural block diagram of a first embodiment of a text generation device based on quality perception in this application.
  • the technical solution of this application can be applied to the fields of artificial intelligence, blockchain and/or big data technology, and the data involved, such as text, can be stored in a database or in a blockchain, for example in distributed storage through a blockchain; this application does not limit this.
  • Fig. 1 is a schematic structural diagram of a text generation device based on quality perception in a hardware operating environment involved in a solution of an embodiment of the application.
  • the text generation device based on quality perception may include a processor 1001, such as a central processing unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005.
  • the communication bus 1002 is used to implement connection and communication between these components.
  • the user interface 1003 may include a display screen (Display), and optionally the user interface 1003 may also include a standard wired interface and a wireless interface.
  • the wired interface of the user interface 1003 may be a USB interface in this application.
  • the network interface 1004 may optionally include a standard wired interface and a wireless interface (for example, a wireless fidelity (WI-FI) interface).
  • the memory 1005 may be a high-speed random access memory (Random Access Memory, RAM), or a non-volatile memory (Non-volatile Memory, NVM), such as disk storage.
  • the memory 1005 may also be a storage device independent of the aforementioned processor 1001.
  • the structure shown in FIG. 1 does not constitute a limitation on the text generation device based on quality perception, which may include more or fewer components than shown in the figure, a combination of certain components, or a different arrangement of components.
  • the memory 1005 as a computer storage medium may include an operating system, a network communication module, a user interface module, and a text generation program based on quality perception.
  • the network interface 1004 is mainly used to connect to a back-end server for data communication with the back-end server; the user interface 1003 is mainly used to connect to user equipment;
  • the text generation device calls the quality perception-based text generation program stored in the memory 1005 through the processor 1001, and executes the quality perception-based text generation method provided in the embodiments of the present application.
  • the text generation method based on quality perception includes the following steps:
  • Step S10 Obtain a corpus to be processed, perform multi-thread processing on the corpus to be processed, and generate a text draft through a sequence-to-sequence model.
  • the execution subject of this embodiment is the text generation device based on quality perception
  • the text generation device based on quality perception may be an electronic device such as a smart phone, a personal computer, or a server.
  • Automatic text generation can be applied to a variety of application scenarios, such as artificial intelligence (AI) to automatically generate lyrics.
  • the sequence-to-sequence model generates a sentence from the keywords and outputs the first sentence; the first sentence is then fed back into the sequence-to-sequence model, which generates a second sentence from the first sentence; the second sentence is fed into the sequence-to-sequence model in turn, and the process is repeated until the text draft is generated.
  • a multi-threaded processor may be used to perform multi-thread processing on the to-be-processed corpus, thereby generating multiple drafts of the text.
  • for example, when a user asks a question, voice recognition is performed: the user's voice is collected and converted into text, which is the corpus to be processed.
  • the content of the corpus to be processed may not accurately express the true intention conveyed in the video conference.
  • the corpus to be processed needs to be processed through the sequence-to-sequence model; the sequence-to-sequence model (Sequence to Sequence network or Encoder-Decoder network, Seq2Seq) is a model composed of two parts, an encoder and a decoder.
  • the encoder reads the input sequence and outputs a single vector, and the decoder reads the vector to produce the output sequence.
  • in the seq2seq model, the encoder creates a single vector that, ideally, encodes the "meaning" of the input sequence into a single vector, a single point in an N-dimensional space of sentences, from which the text draft is generated.
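  • The following is a minimal Python sketch of this sentence-by-sentence draft generation; the `generate_next` callable, the `num_sentences` parameter, and the example call in the comment are assumptions standing in for a trained Seq2Seq model and are not part of the application.
```python
from typing import Callable, List

def generate_draft(keywords: str,
                   generate_next: Callable[[str], str],
                   num_sentences: int = 4) -> List[str]:
    """Generate a text draft sentence by sentence.

    `generate_next` is assumed to wrap a trained sequence-to-sequence
    (encoder-decoder) model: given the previous sentence (or the keywords
    for the first step), it returns the next sentence.
    """
    draft: List[str] = []
    previous = keywords                      # the first sentence is generated from the keywords
    for _ in range(num_sentences):
        sentence = generate_next(previous)   # encode the previous text, decode the next sentence
        draft.append(sentence)
        previous = sentence                  # feed the newly generated sentence back in
    return draft

# hypothetical usage: generate_draft("中秋 月 思乡", my_seq2seq_model_generate)
```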
  • this embodiment proposes a trained quality-aware occlusion language model, which predicts the semantics of the masked word by the position of the masked word, and realizes the prediction by learning the context information of the masked word.
  • Step S20 Predict the position of the word to be replaced in the text draft through the trained quality perception occlusion language model according to the text draft, and obtain the target position of the word to be replaced.
  • the draft text includes at least one sentence, and one sentence, two sentences, three sentences or more sentences of the draft text can be input into the trained quality-aware masked language model (Quality Aware-Masked Language Model, QAM).
  • the trained quality-aware occlusion language model is obtained by training the quality-aware occlusion language model to be trained; the quality-aware occlusion language model to be trained may be based on an improved Bidirectional Encoder Representations from Transformers (BERT) model. The input of the original BERT model is two sentences, a first sentence and a second sentence; it can predict whether the second sentence is the next sentence of the first sentence, but it cannot predict the quality of the individual words in a sentence.
  • the quality-aware occlusion language model to be trained is established; a large amount of standard text is obtained, and characters in the standard text are randomly replaced to obtain the replacement text; and according to a large number of pairs of the standard text and the replacement text, the quality-aware occlusion language model to be trained is trained to obtain the trained quality-aware occlusion language model.
  • the trained quality-aware occlusion language model can predict whether the quality of each word in a sentence is poor, so that the words predicted to be of poor quality can be replaced.
  • the input is not limited to two sentences; it can be one sentence, three sentences, or more, and the trained quality-aware occlusion language model has a better quality perception ability.
  • Step S30 Predict the semantics of the target location according to the context information of the target location through the trained quality perception occlusion language model, and obtain the target word corresponding to the target location.
  • the masked language model in the trained quality-aware occlusion language model occludes the word to be replaced at the target position, fuses the context on the left and right sides of the target position, that is, the context information, predicts the semantics of the occluded target position, and predicts a word of better quality, that is, the target word.
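  • As an illustrative sketch of this masking step, the snippet below uses an off-the-shelf masked language model from the Hugging Face `transformers` library; the model name `bert-base-chinese`, the example sentence, and the target position are assumptions for illustration and stand in for the application's own quality-aware model.
```python
from transformers import pipeline

# A generic masked language model only illustrates the principle of
# "occlude the target position, then predict a word from both-side context";
# the application would use its trained quality-aware occlusion model instead.
fill_mask = pipeline("fill-mask", model="bert-base-chinese")

draft = "白日依山尽，黄河入海流。"
target_index = 3                                   # assumed target position (0-based) of the word to replace
masked = draft[:target_index] + fill_mask.tokenizer.mask_token + draft[target_index + 1:]

candidates = fill_mask(masked)                     # candidate words ranked by probability
target_word = candidates[0]["token_str"]           # best word predicted from the two-sided context
updated_draft = masked.replace(fill_mask.tokenizer.mask_token, target_word)
print(updated_draft)
```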
  • Step S40 Through the trained quality-aware occlusion language model, replace the word to be replaced with the target word to obtain the first iteration text, use the first iteration text as a new text draft, and return to the step of predicting, according to the new text draft and through the trained quality-aware occlusion language model, the position of the word to be replaced in the new text draft to obtain the target position of the word to be replaced, until all the words to be replaced in the text draft have been replaced, at which point the iteration terminates and the iteratively updated target text is obtained.
  • replacing the word to be replaced with the target word obtains the first iteration text; the first iteration text is used as a new draft text and is input into the trained quality-aware occlusion language model again; through the trained quality-aware occlusion language model, the position of the word to be replaced in the first iteration text is predicted according to the first iteration text to obtain the target position of the word to be replaced;
  • the trained quality-aware occlusion language model then predicts the semantics of the target position based on the context information to obtain the target word corresponding to the target position;
  • through the trained quality-aware occlusion language model, the word to be replaced is replaced with the target word to obtain the second iteration text, which realizes another iteration; the second iteration text is used as a new draft text and is input into the trained quality-aware occlusion language model again, until all the words to be replaced in the draft text have been replaced, at which point the iteration terminates and the iteratively updated target text is obtained.
  • the method further includes: judging whether the target position is a second preset value; if the target position is not the second preset value, it is determined that there are still unreplaced words to be replaced in the draft text, the iteration continues, and the step of predicting, through the trained quality-aware occlusion language model, the semantics of the target position based on the context information to obtain the target word corresponding to the target position is executed, until the target position is the second preset value, at which point it is determined that all the words to be replaced in the draft text have been replaced, the iteration terminates, and the iteratively updated target text is obtained.
  • the second preset value is equal to the first preset value and is used to determine whether any word to be replaced is still perceived in the draft text; if no word to be replaced is perceived, it is determined that all the words to be replaced in the draft text have been replaced.
  • the draft lyric text is iteratively updated through the trained quality-aware occlusion language model to obtain the target lyric text.
  • in an iterative update, all possible positions of words to be replaced in the draft text are first predicted, the characters at these positions are then masked, and the draft text is input into the trained quality-aware occlusion language model to predict the corresponding characters.
  • the predicted characters are more suitable than the original characters in terms of semantic coherence and consistency, so the characters in the draft text are replaced with the predicted characters, and one iterative update step is completed.
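  • A minimal sketch of this iterative update loop follows; `predict_position` and `predict_word` are assumed wrappers around the trained quality-aware occlusion language model, and the 1-based position convention with 0 meaning "nothing left to replace" mirrors the second preset value described later.
```python
from typing import Callable

STOP = 0  # "second preset value": no more low-quality word is perceived

def iterative_refine(draft: str,
                     predict_position: Callable[[str], int],
                     predict_word: Callable[[str, int], str],
                     max_rounds: int = 50) -> str:
    """Iteratively replace low-quality characters until none are perceived.

    `predict_position` returns the 1-based position of the word to be
    replaced (or STOP when the draft is already good); `predict_word`
    predicts the target word for that position from its left and right
    context after the position has been masked.
    """
    for _ in range(max_rounds):                       # safety bound; STOP normally ends the loop
        position = predict_position(draft)
        if position == STOP:                          # all words to be replaced have been replaced
            break
        target_word = predict_word(draft, position)
        idx = position - 1                            # convert 1-based position to string index
        draft = draft[:idx] + target_word + draft[idx + 1:]   # replace and iterate again
    return draft
```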
  • the corpus to be processed is multi-threaded, and a text draft is generated through a sequence-to-sequence model.
  • the trained quality-aware occlusion language model is used to predict the position of the word to be replaced in the draft text and obtain the target position of the word to be replaced, which improves the accuracy of the prediction; through the trained quality-aware occlusion language model, the semantics of the target position are then predicted according to the context information of the target position to obtain the target word corresponding to the target position.
  • combining the surrounding context improves the accuracy of the semantic prediction and allows words of better quality to be predicted; through the trained quality-aware occlusion language model, the word to be replaced is replaced with the target word to obtain the first iteration text, the first iteration text is used as a new text draft, and the process returns to the step of predicting, according to the new text draft and through the trained quality-aware occlusion language model, the position of the word to be replaced in the new text draft to obtain the target position of the word to be replaced, until all the words to be replaced in the draft text have been replaced, at which point the iteration terminates and the iteratively updated target text is obtained; based on artificial intelligence, the quality of text generation is thus improved through multiple iterations.
  • Figure 3 is a schematic flowchart of the second embodiment of the text generation method based on quality perception of this application. Based on the first embodiment shown in Figure 2 above, a second embodiment of the text generation method based on quality perception of this application is proposed.
  • before the step S20, the method further includes:
  • Step S101 Obtain a standard text, perform random replacement of words in the standard text, and obtain a replacement text.
  • the standard text is training text with accurate semantic expression; characters or words in the standard text are randomly replaced, and the text containing the replacement characters or words is the replacement text.
  • the original characters or words in the standard text are those with the best semantic expression quality, and the replacement characters or words are of poorer quality.
  • the replacement text includes: a first replacement text of a first preset ratio, a second replacement text of a second preset ratio, and a standard text of a third preset ratio;
  • the step S101 includes:
  • through random marking, any one word in each sentence of the standard text is randomly replaced with another word to obtain the first replacement text, and the position label of the replaced word is recorded; the first preset ratio is the ratio of the first replacement text to all the replacement text;
  • through random marking, any two words in each sentence of the standard text are randomly replaced with two other words to obtain the second replacement text, and the position labels of the replaced words are recorded; the second preset ratio is the ratio of the second replacement text to all the replacement text;
  • the standard text is kept unchanged and used directly as replacement text, with its position label recorded as the first preset value; the third preset ratio is the ratio of the standard text to all the replacement text.
  • the first preset ratio, the second preset ratio, and the third preset ratio can be set differently according to the training process: for each setting of the ratios, the prediction time needed to obtain the final predicted text is calculated, and a shorter prediction time indicates that the setting is more favorable to the training process, so that the best first preset ratio, second preset ratio, and third preset ratio can be determined.
  • for example, the first preset ratio is 60%, the second preset ratio is 20%, and the third preset ratio is 20%, as in the illustrative sketch below.
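  • The following Python sketch illustrates building such a replacement-text corpus; the 1-based position labels, the `vocab` list used for corruption, and the assumption that each sentence has at least two characters are illustrative choices, not requirements stated in the application.
```python
import random
from typing import List, Tuple

def build_training_samples(standard_sentences: List[str],
                           vocab: List[str],
                           p_one: float = 0.6,
                           p_two: float = 0.2) -> List[Tuple[str, List[int]]]:
    """Build (replacement_text, position_labels) pairs for training.

    With probability p_one one character is randomly replaced, with
    probability p_two two characters are replaced, and otherwise the
    sentence is kept unchanged with position label 0 (the value meaning
    "nothing was replaced"), matching the 60% / 20% / 20% example above.
    """
    samples: List[Tuple[str, List[int]]] = []
    for sentence in standard_sentences:
        chars = list(sentence)
        r = random.random()
        if r < p_one:
            pos = random.randrange(len(chars))
            chars[pos] = random.choice(vocab)            # corrupt one character
            labels = [pos + 1]                           # record its 1-based position
        elif r < p_one + p_two:
            positions = random.sample(range(len(chars)), 2)
            for p in positions:
                chars[p] = random.choice(vocab)          # corrupt two characters
            labels = sorted(p + 1 for p in positions)
        else:
            labels = [0]                                 # keep the standard text unchanged
        samples.append(("".join(chars), labels))
    return samples
```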
  • Step S102 Establish a to-be-trained quality-aware occlusion language model.
  • the quality-aware occlusion language model to be trained may be based on an improved bidirectional-encoder BERT model.
  • the quality-aware occlusion language model to be trained first predicts the position of a poor-quality character, and then predicts the character that should occupy that position; the quality-aware occlusion language model to be trained is trained with a large amount of sample data to obtain the trained quality-aware occlusion language model.
  • Step S103 Training the quality-aware occlusion language model to be trained according to the standard text and the replacement text to obtain a trained quality-aware occlusion language model.
  • the quality-aware occlusion language model to be trained is a language model based on the BERT model and is used to iteratively update the replacement text according to the context information of the standard text. Specifically, the position of the poor-quality character or word in the replacement text (that is, the word to be updated) is predicted according to the context information to obtain the predicted position of the poor-quality word; the real semantics of the predicted position are then predicted in combination with the context information, that is, a predicted word representing the real semantics is obtained, and the word to be updated is replaced with the predicted word, thereby updating the replacement text. The above steps are repeated until all the characters or words to be updated in the replacement text have been replaced, and the iteration stops.
  • the quality-aware occlusion language model to be trained is thus trained into the trained quality-aware occlusion language model, which can accurately identify the position of the word to be replaced in the draft text and predict the semantics of that position, that is, a target word of better quality; the word to be replaced is replaced with the target word to obtain the first iteration text, which realizes one iteration; the first iteration text is used as the new draft text, and the process returns to the step of predicting the position of the word to be replaced in the draft text, until all the words to be replaced in the draft text have been replaced, at which point the iteration terminates and the target text is obtained.
  • step S103 includes:
  • through the quality-aware occlusion language model to be trained, the position of the word to be updated in the first replacement text or the second replacement text is predicted to obtain the predicted position of the word to be updated;
  • through the quality-aware occlusion language model to be trained, the semantics of the word at the predicted position are predicted to obtain the predicted word, the word to be updated is replaced with the predicted word to obtain the first predicted text, and one iteration is realized;
  • the first predicted text is used as the new replacement text, and the process returns to the step of predicting, according to the new replacement text and through the quality-aware occlusion language model to be trained, the position of the word to be updated in the new replacement text to obtain the predicted position of the word to be updated, until all the words to be updated in the first replacement text or the second replacement text have been replaced, at which point the iteration stops.
  • the poetry corpus includes poems of the Tang, Song, Yuan, Ming and Qing dynasties; approximately 130,525 poems, containing a total of 905,790 poem lines, were selected from the corpus for model training and evaluation, where each selected poem contains four or more poem lines and each poem line contains seven characters.
  • the method further includes:
  • the first ratio, the second ratio, and the third ratio are adjusted to obtain a new first ratio, a new second ratio, and a new third ratio;
  • the first replacement text of the first preset ratio, the second replacement text of the second preset ratio, and the standard text of the third preset ratio are then set accordingly.
  • the preset similarity threshold can be set according to the level of the output text quality requirements in actual applications, for example, the preset similarity threshold is set to 80%.
  • word segmentation is performed on the predicted text and the standard text to obtain all the first words of the predicted text and all the second words of the standard text; the Term Frequency-Inverse Document Frequency (TF-IDF) value of each first word and each second word is calculated; both the predicted text and the standard text are expressed as word vectors composed of the words and their TF-IDF values; the cosine distance between the word vector corresponding to the predicted text and the word vector corresponding to the standard text is calculated, and the cosine distance is used as the text similarity.
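  • A sketch of this similarity computation using scikit-learn is given below; splitting the text into characters is only a placeholder for proper Chinese word segmentation, and the 0.8 threshold is the 80% example mentioned above.
```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def text_similarity(predicted_text: str, standard_text: str) -> float:
    """Cosine similarity between the TF-IDF vectors of the two texts."""
    vectorizer = TfidfVectorizer(analyzer=lambda s: list(s))   # character split stands in for word segmentation
    vectors = vectorizer.fit_transform([predicted_text, standard_text])
    return float(cosine_similarity(vectors[0], vectors[1])[0, 0])

# e.g. if text_similarity(predicted, standard) <= 0.8, adjust the preset ratios and retrain
```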
  • if the text similarity does not exceed the preset similarity threshold, it indicates that the quality perception ability of the trained quality-aware occlusion language model is still poor; in that case the first ratio and the second ratio can be reduced and the third ratio increased, that is, the first ratio, the second ratio, and the third ratio are adjusted to obtain a new first ratio, a new second ratio, and a new third ratio;
  • the quality-aware occlusion language model to be trained is then trained according to the replacement text of the new first ratio, the new second ratio, and the new third ratio to obtain a new predicted text, and the process returns to the step of computing the text similarity between the predicted text and the standard text, until the text similarity exceeds the preset similarity threshold, at which point the adjustment of the first ratio, the second ratio, and the third ratio stops.
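  • A sketch of this adjustment loop follows; `train_and_predict` and `similarity` are assumed callables standing in for the training step and the TF-IDF similarity above, and the fixed 0.05 adjustment step is an illustrative choice.
```python
from typing import Callable, Tuple

def tune_ratios(train_and_predict: Callable[[Tuple[float, float, float]], str],
                similarity: Callable[[str], float],
                threshold: float = 0.8,
                ratios: Tuple[float, float, float] = (0.6, 0.2, 0.2),
                step: float = 0.05,
                max_rounds: int = 10) -> Tuple[float, float, float]:
    """Lower the corruption ratios and raise the unchanged ratio until the
    predicted text is similar enough to the standard text."""
    r1, r2, r3 = ratios
    for _ in range(max_rounds):
        predicted = train_and_predict((r1, r2, r3))        # retrain with the current ratios
        if similarity(predicted) > threshold:              # quality perception is good enough
            break
        r1 = max(r1 - step, 0.0)                           # fewer one-word corruptions
        r2 = max(r2 - step, 0.0)                           # fewer two-word corruptions
        r3 = min(r3 + 2 * step, 1.0)                       # more unchanged standard text
    return r1, r2, r3
```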
  • the text generated by using the trained quality perception occlusion language model is:
  • the trained quality-aware occlusion language model can generate better quality text.
  • in this embodiment, the standard text is obtained, the characters in the standard text are randomly replaced to obtain the replacement text, the quality-aware occlusion language model to be trained is established, and the quality-aware occlusion language model to be trained is trained according to the standard text and the replacement text to obtain the trained quality-aware occlusion language model; positions are masked and then predicted, and the prediction is realized by learning all the context information, which improves the predictive ability of the trained quality-aware occlusion language model and improves the text generation quality.
  • Figure 4 is a schematic flowchart of the third embodiment of the text generation method based on quality perception of this application. Based on the above first or second embodiment, a third embodiment of the text generation method based on quality perception of this application is proposed. This embodiment is described based on the first embodiment.
  • step S40 includes:
  • Step S401 Through the trained quality-aware occlusion language model, replace the word to be replaced with the target word to obtain the first iteration text, use the first iteration text as a new text draft, return to the step of predicting, according to the new text draft and through the trained quality-aware occlusion language model, the position of the word to be replaced in the new text draft to obtain the target position of the word to be replaced, and judge whether the target position is the second preset value; if the target position is the second preset value, it is determined that all the words to be replaced in the draft text have been replaced, the iteration terminates, and the iteratively updated target text is obtained.
  • the second preset value is usually set to 0.
  • if the target position of the word to be replaced is predicted to be 0, it means that all the words in the current text are appropriate and no further iterative update is required; this is also consistent with the training corpus,
  • in which the real position label is 0 for the 20% of the text corpus that was not randomly replaced, so that this part of the corpus is already high-quality text and does not need to be updated iteratively.
  • the quality-aware occlusion language model predicts the position of the word to be replaced in the new draft text, and obtains the target position of the word to be replaced.
  • the method further includes:
  • the text draft is vectorized to obtain the input vector of the trained quality-aware occlusion language model.
  • step S20 includes:
  • the position of the word to be replaced in the input vector is predicted to obtain the target position of the word to be replaced.
  • the draft text needs to be expressed in a vector form in order to iterate through the preset quality perception occlusion language model to generate a better quality target text.
  • the text draft is expressed in vector form to obtain the input vector of the trained quality-aware occlusion language model, so that the position of the word to be replaced in the input vector can be predicted through the trained quality-aware occlusion language model to obtain the target position of the word to be replaced.
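  • A sketch of this vectorization is given below; the tokenizer name `bert-base-chinese`, the example draft, and the fixed maximum length are assumptions for illustration only.
```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")   # assumed tokenizer for illustration

draft = "明月几时有，把酒问青天。"
encoded = tokenizer(draft,
                    return_tensors="pt",        # PyTorch tensors
                    padding="max_length",
                    max_length=32,
                    truncation=True)

# encoded["input_ids"] and encoded["attention_mask"] together form the input
# vector that is fed to the trained quality-aware occlusion language model.
```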
  • step S30 includes:
  • the word at the target position is occluded to obtain the occluded text; according to the occluded text, the trained quality-aware occlusion language model predicts the semantics of the target position of the occluded text in combination with the context information of the target position, and the target word corresponding to the target position is obtained.
  • if the target position is the second preset value, it is determined that all the words to be replaced in the draft text have been replaced, the iteration terminates, and the iteratively updated target text is obtained; the iteration thus terminates automatically, which significantly improves the text generation effect and quality, avoids the iterative process of existing methods that simply regenerate the text from left to right, and also avoids the problems of being unable to choose a suitable number of iteration rounds and of excessive computation.
  • an embodiment of the present application also proposes a storage medium storing a quality-perception-based text generation program, and when the quality-perception-based text generation program is executed by a processor, the steps of the quality-perception-based text generation method described above are implemented.
  • the storage medium involved in this application may be a computer-readable storage medium, and the storage medium, such as a computer-readable storage medium, may be non-volatile or volatile.
  • an embodiment of the present application also proposes a text generation device based on quality perception, and the text generation device based on quality perception includes:
  • the generating module 10 is configured to obtain a corpus to be processed, perform multi-thread processing on the corpus to be processed, and generate a text draft through a sequence-to-sequence model.
  • automatic text generation can be applied to a variety of application scenarios, such as artificial intelligence (AI) to automatically generate lyrics.
  • the sequence-to-sequence model generates a sentence from the keywords and outputs the first sentence; the first sentence is then fed back into the sequence-to-sequence model, which generates a second sentence from the first sentence; the second sentence is fed into the sequence-to-sequence model in turn, and the process is repeated until the text draft is generated.
  • a multi-threaded processor may be used to perform multi-thread processing on the to-be-processed corpus, thereby generating multiple drafts of the text.
  • for example, when a user asks a question, voice recognition is performed: the user's voice is collected and converted into text, which is the corpus to be processed.
  • the content of the corpus to be processed may not accurately express the true intention conveyed in the video conference.
  • the corpus to be processed needs to be processed through the sequence-to-sequence model; the sequence-to-sequence model (Sequence to Sequence network or Encoder-Decoder network, Seq2Seq) is a model composed of two parts, an encoder and a decoder.
  • the encoder reads the input sequence and outputs a single vector, and the decoder reads the vector to produce the output sequence.
  • in the seq2seq model, the encoder creates a single vector that, ideally, encodes the "meaning" of the input sequence into a single vector, a single point in an N-dimensional space of sentences, from which the text draft is generated.
  • this embodiment proposes a trained quality-aware occlusion language model, which predicts the semantics of the masked word by the position of the masked word, and realizes the prediction by learning the context information of the masked word.
  • the prediction module 20 is configured to predict the position of the word to be replaced in the text draft through the trained quality perception occlusion language model according to the text draft, and obtain the target position of the word to be replaced.
  • the draft text includes at least one sentence, and one sentence, two sentences, three sentences or more sentences of the draft text can be input into the trained quality-aware masked language model (Quality Aware-Masked Language Model, QAM).
  • the trained quality-aware occlusion language model is obtained by training the quality-aware occlusion language model to be trained; the quality-aware occlusion language model to be trained may be based on an improved Bidirectional Encoder Representations from Transformers (BERT) model. The input of the original BERT model is two sentences, a first sentence and a second sentence; it can predict whether the second sentence is the next sentence of the first sentence, but it cannot predict the quality of the individual words in a sentence.
  • the quality-aware occlusion language model to be trained is established; a large amount of standard text is obtained, and characters in the standard text are randomly replaced to obtain the replacement text; and according to a large number of pairs of the standard text and the replacement text, the quality-aware occlusion language model to be trained is trained to obtain the trained quality-aware occlusion language model.
  • the trained quality-aware occlusion language model can predict whether the quality of each word in a sentence is poor, so that the words predicted to be of poor quality can be replaced.
  • the input is not limited to two sentences; it can be one sentence, three sentences, or more, and the trained quality-aware occlusion language model has a better quality perception ability.
  • the prediction module 20 is also used to predict the semantics of the target location according to the context information of the target location through the trained quality perception occlusion language model, and obtain the target word corresponding to the target location .
  • the masked language model in the trained quality-aware occlusion language model occludes the word to be replaced at the target position, fuses the context on the left and right sides of the target position, that is, the context information, predicts the semantics of the occluded target position, and predicts a word of better quality, that is, the target word.
  • the iteration module 30 is configured to replace, through the trained quality-aware occlusion language model, the word to be replaced with the target word to obtain the first iteration text, use the first iteration text as a new text draft, and return to the step of predicting, according to the new text draft and through the trained quality-aware occlusion language model, the position of the word to be replaced in the new text draft to obtain the target position of the word to be replaced, until all the words to be replaced in the draft text have been replaced, at which point the iteration terminates and the iteratively updated target text is obtained.
  • replacing the word to be replaced with the target word obtains the first iteration text; the first iteration text is used as a new draft text and is input into the trained quality-aware occlusion language model again; through the trained quality-aware occlusion language model, the position of the word to be replaced in the first iteration text is predicted according to the first iteration text to obtain the target position of the word to be replaced;
  • the trained quality-aware occlusion language model then predicts the semantics of the target position based on the context information to obtain the target word corresponding to the target position;
  • through the trained quality-aware occlusion language model, the word to be replaced is replaced with the target word to obtain the second iteration text, which realizes another iteration; the second iteration text is used as a new draft text and is input into the trained quality-aware occlusion language model again, until all the words to be replaced in the draft text have been replaced, at which point the iteration terminates and the iteratively updated target text is obtained.
  • the method further includes: judging whether the target position is a second preset value; if the target position is not the second preset value, it is determined that there are still unreplaced words to be replaced in the draft text, the iteration continues, and the step of predicting, through the trained quality-aware occlusion language model, the semantics of the target position based on the context information to obtain the target word corresponding to the target position is executed, until the target position is the second preset value, at which point it is determined that all the words to be replaced in the draft text have been replaced, the iteration terminates, and the iteratively updated target text is obtained.
  • the second preset value is equal to the first preset value and is used to determine whether any word to be replaced is still perceived in the draft text; if no word to be replaced is perceived, it is determined that all the words to be replaced in the draft text have been replaced.
  • the draft lyric text is iteratively updated through the trained quality-aware occlusion language model to obtain the target lyric text.
  • in an iterative update, all possible positions of words to be replaced in the draft text are first predicted, the characters at these positions are then masked, and the draft text is input into the trained quality-aware occlusion language model to predict the corresponding characters.
  • the predicted characters are more suitable than the original characters in terms of semantic coherence and consistency, so the characters in the draft text are replaced with the predicted characters, and one iterative update step is completed.
  • the corpus to be processed is multi-threaded, and a text draft is generated through a sequence-to-sequence model.
  • the trained quality-aware occlusion language model is used to predict the position of the word to be replaced in the draft text and obtain the target position of the word to be replaced, which improves the accuracy of the prediction; through the trained quality-aware occlusion language model, the semantics of the target position are then predicted according to the context information of the target position to obtain the target word corresponding to the target position.
  • combining the surrounding context improves the accuracy of the semantic prediction and allows words of better quality to be predicted; through the trained quality-aware occlusion language model, the word to be replaced is replaced with the target word to obtain the first iteration text, the first iteration text is used as a new text draft, and the process returns to the step of predicting, according to the new text draft and through the trained quality-aware occlusion language model, the position of the word to be replaced in the new text draft to obtain the target position of the word to be replaced, until all the words to be replaced in the draft text have been replaced, at which point the iteration terminates and the iteratively updated target text is obtained; based on artificial intelligence, the quality of text generation is thus improved through multiple iterations.
  • the apparatus for generating text based on quality perception further includes:
  • the random replacement module is used to obtain the standard text, and randomly replace the words in the standard text to obtain the replacement text;
  • the establishment module is used to establish the quality perception occlusion language model to be trained
  • the training module is configured to train the to-be-trained quality-aware occlusion language model according to the standard text and the replacement text to obtain a trained quality-aware occlusion language model.
  • the replacement text includes: a first replacement text of a first preset ratio, a second replacement text of a second preset ratio, and a standard text of a third preset ratio;
  • the random replacement module is also used to: through random marking, randomly replace any one word in each sentence of the standard text with another word to obtain the first replacement text, and record the position label of the replaced word,
  • where the first preset ratio is the ratio of the first replacement text to all the replacement text; through random marking, randomly replace any two words in each sentence of the standard text with two other words to obtain the second replacement text, and record the position labels of the replaced words, where the second preset ratio is the ratio of the second replacement text to all the replacement text; and keep the standard text unchanged, use the standard text as replacement text, and record its position label as the first preset value, where the third ratio is the ratio of the standard text to all the replacement text.
  • the prediction module 20 is further configured to: through the quality-aware occlusion language model to be trained and according to the first replacement text or the second replacement text, predict the position of the word to be updated in the first replacement text or the second replacement text to obtain the predicted position of the word to be updated; predict the semantics of the word at the predicted position through the quality-aware occlusion language model to be trained to obtain the predicted word for the predicted position; through the quality-aware occlusion language model to be trained, replace the word to be updated with the predicted word to obtain the first predicted text, realizing one iteration; and use the first predicted text as the new replacement text and return to the step of predicting, according to the new replacement text and through the quality-aware occlusion language model to be trained, the position of the word to be updated in the new replacement text to obtain the predicted position of the word to be updated, until all the words to be updated in the first replacement text or the second replacement text have been replaced, at which point the iteration terminates.
  • the apparatus for generating text based on quality perception further includes:
  • a judging module for judging whether the text similarity exceeds a preset similarity threshold
  • the adjustment module is configured to adjust the first ratio, the second ratio, and the third ratio when the text similarity does not exceed the preset similarity threshold to obtain a new first ratio, New second ratio and new third ratio;
  • the training module is further configured to train the to-be-trained quality perception occlusion language model according to the replacement text of the new first ratio, the new second ratio, and the new third ratio until all If the text similarity exceeds the preset similarity threshold, stop adjusting the first ratio, the second ratio, and the third ratio.
  • the apparatus for generating text based on quality perception further includes:
  • the judgment module is also used to judge whether the target position is a second preset value
  • the iteration module 30 is further configured to, if the target position is the second preset value, determine that all the words to be replaced in the draft text have been replaced, the iteration is terminated, and the target text after the iteration is obtained .
  • the prediction module 20 is further configured to occlude the characters at the target position to obtain occluded text, and, according to the occluded text, predict the semantics of the target position of the occluded text through the trained quality-aware occlusion language model in combination with the context information of the target position, to obtain the target word corresponding to the target position.
  • the storage medium may be a ROM/RAM, a magnetic disk, an optical disc, or the like, and stores several instructions that enable a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the various embodiments of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Machine Translation (AREA)

Abstract

A quality perception-based text generation method and apparatus, a device, and a storage medium. The method comprises: acquiring a corpus to be processed, performing multi-threaded processing on said corpus, and generating a draft text by means of a sequence-to-sequence model (S10); predicting, according to the draft text by means of a trained quality perception occlusion language model, the position of a word to be replaced in the draft text to obtain a target position of the word to be replaced (S20); predicting the semantics of the target position according to the context of the target position by means of the trained quality perception occlusion language model to obtain a target word corresponding to the target position (S30); and replacing the word to be replaced with the target word by means of the trained quality perception occlusion language model to obtain a first iteration text, using the first iteration text as a new draft text and returning to the step of predicting the position of the word to be replaced in the new draft text according to the new draft text by means of the trained quality perception occlusion language model to obtain the target position of the word to be replaced, until all the words to be replaced in the draft text are replaced, terminating the iteration, and obtaining a target text after iterations (S40). On the basis of artificial intelligence, the text generation quality is improved by means of multiple iterations.

Description

基于质量感知的文本生成方法、设备、存储介质及装置Quality perception-based text generation method, equipment, storage medium and device
本申请要求于2019年10月29日提交中国专利局、申请号为201911040951.0,发明名称为“基于质量感知的文本生成方法、设备、存储介质及装置”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。This application claims the priority of a Chinese patent application filed with the Chinese Patent Office on October 29, 2019, the application number is 201911040951.0, and the invention title is "Quality Perception-based Text Generation Method, Equipment, Storage Medium, and Device", and its entire content Incorporated in this application by reference.
技术领域Technical field
本申请涉及人工智能的技术领域,尤其涉及一种基于质量感知的文本生成方法、设备、存储介质及装置。This application relates to the technical field of artificial intelligence, and in particular to a method, equipment, storage medium, and device for text generation based on quality perception.
背景技术Background technique
发明人意识到,现有的文本生成方法主要是基于序列到序列的模型(Seq2seq)的单轮生成方法,该模型在文本生成阶段,是由左到右(或由右到左)逐字单向生成的,只考虑了前面已经生成的文本信息,一旦前面文本生成效果不好,则会对后生成的文本产生较大影响,造成偏差累积。目前的多轮迭代技术,采用的也是简单的从左到右每个字都更新一次,人工设定迭代轮次,相当于完全重新生成了整个文本。该方法存在三个关键问题:第一,无法判断生成的文本中哪些字词需要修改,哪些字词可以保留;第二,不能获得更符合该语境的字?第三,人工设定迭代轮次非常的经验化,无法明确迭代终止的客观条件是什么,导致自动生成的文本质量不佳。The inventor realizes that the existing text generation method is mainly based on the single-round generation method of the sequence-to-sequence model (Seq2seq). In the text generation stage, the model is written verbatim from left to right (or from right to left). To generate, only consider the text information that has been generated before. Once the previous text generation effect is not good, it will have a greater impact on the later generated text, resulting in accumulation of deviations. The current multi-round iteration technology uses a simple update of each word from left to right, and manually sets the iteration rounds, which is equivalent to completely regenerating the entire text. There are three key problems with this method: First, it is impossible to determine which words in the generated text need to be modified and which words can be retained; second, it is impossible to obtain words that are more in line with the context? Third, the manual setting of iteration rounds is very empirical, and it is impossible to clarify the objective conditions for the termination of the iteration, resulting in poor quality of the automatically generated text.
上述内容仅用于辅助理解本申请的技术方案,并不代表承认上述内容是现有技术。The above content is only used to assist the understanding of the technical solutions of this application, and does not mean that the above content is recognized as prior art.
发明内容Summary of the invention
本申请的主要目的在于提供一种基于质量感知的文本生成方法、设备、存储介质及装置,旨在解决现有技术中自动生成的文本质量不佳的技术问题。The main purpose of this application is to provide a method, equipment, storage medium and device for text generation based on quality perception, aiming to solve the technical problem of poor quality of automatically generated text in the prior art.
为实现上述目的,本申请提供一种基于质量感知的文本生成方法,所述基于质量感知的文本生成方法包括以下步骤:In order to achieve the above objective, the present application provides a text generation method based on quality perception. The text generation method based on quality perception includes the following steps:
获取待处理语料集,将所述待处理语料集进行多线程处理,通过序列到序列模型生成文本草稿;Obtain a corpus to be processed, perform multi-thread processing on the corpus to be processed, and generate a text draft through a sequence-to-sequence model;
根据所述文本草稿通过训练好的质量感知遮挡语言模型,对所述文本草稿中待替换字的位置进行预测,获得所述待替换字的目标位置;Predicting the position of the word to be replaced in the draft text according to the trained quality perception occlusion language model to obtain the target position of the word to be replaced;
通过所述训练好的质量感知遮挡语言模型,根据所述目标位置的上下文语境信息对所述目标位置的语义进行预测,获得所述目标位置对应的目标字;Predicting the semantics of the target location according to the context information of the target location through the trained quality-aware occlusion language model to obtain the target word corresponding to the target location;
通过所述训练好的质量感知遮挡语言模型,将所述目标字替换所述待替换字,获得第一次迭代文本,将所述第一次迭代文本作为新的文本草稿,返回所述根据所述新的文本草稿通过训练好的质量感知遮挡语言模型,对所述新的文本草稿中待替换字的位置进行预测,获得所述待替换字的目标位置的步骤,直至所述文本草稿中所有所述待替换字均被替换,迭代终止,获得迭代更新后的目标文本。Through the trained quality perception occlusion language model, replace the target word with the word to be replaced to obtain the first iteration text, use the first iteration text as a new text draft, and return to the The new draft of the text predicts the position of the word to be replaced in the new draft of the text through the trained quality perception occlusion language model, and obtains the target position of the word to be replaced, until all the words in the draft are The words to be replaced are all replaced, the iteration is terminated, and the updated target text is obtained.
此外,为实现上述目的,本申请还提出一种基于质量感知的文本生成设备,所述基于质量感知的文本生成设备包括存储器、处理器及存储在所述存储器上并可在所述处理器上运行的基于质量感知的文本生成程序,所述基于质量感知的文本生成程序配置为实现以下步骤:In addition, in order to achieve the above-mentioned object, this application also proposes a text generation device based on quality perception. The text generation device based on quality perception includes a memory, a processor, and a device stored on the memory and available on the processor. A running text generation program based on quality perception, and the text generation program based on quality perception is configured to implement the following steps:
获取待处理语料集,将所述待处理语料集进行多线程处理,通过序列到序列模型生成文本草稿;Obtain a corpus to be processed, perform multi-thread processing on the corpus to be processed, and generate a text draft through a sequence-to-sequence model;
根据所述文本草稿通过训练好的质量感知遮挡语言模型,对所述文本草稿中待替换字的位置进行预测,获得所述待替换字的目标位置;Predicting the position of the word to be replaced in the draft text according to the trained quality perception occlusion language model to obtain the target position of the word to be replaced;
通过所述训练好的质量感知遮挡语言模型,根据所述目标位置的上下文语境信息对所述目标位置的语义进行预测,获得所述目标位置对应的目标字;Predicting the semantics of the target location according to the context information of the target location through the trained quality-aware occlusion language model to obtain the target word corresponding to the target location;
通过所述训练好的质量感知遮挡语言模型,将所述目标字替换所述待替换字,获得第 一次迭代文本,将所述第一次迭代文本作为新的文本草稿,返回所述根据所述新的文本草稿通过训练好的质量感知遮挡语言模型,对所述新的文本草稿中待替换字的位置进行预测,获得所述待替换字的目标位置的步骤,直至所述文本草稿中所有所述待替换字均被替换,迭代终止,获得迭代更新后的目标文本。Through the trained quality perception occlusion language model, replace the target word with the word to be replaced to obtain the first iteration text, use the first iteration text as a new text draft, and return to the The new draft of the text predicts the position of the word to be replaced in the new draft of the text through the trained quality perception occlusion language model, and obtains the target position of the word to be replaced, until all the words in the draft are The words to be replaced are all replaced, the iteration is terminated, and the updated target text is obtained.
此外,为实现上述目的,本申请还提出一种存储介质,所述存储介质上存储有基于质量感知的文本生成程序,所述基于质量感知的文本生成程序被处理器执行时实现以下步骤:In addition, in order to achieve the above objective, the present application also proposes a storage medium that stores a quality-perception-based text generation program, and when the quality-perception-based text generation program is executed by a processor, the following steps are implemented:
获取待处理语料集,将所述待处理语料集进行多线程处理,通过序列到序列模型生成文本草稿;Obtain a corpus to be processed, perform multi-thread processing on the corpus to be processed, and generate a text draft through a sequence-to-sequence model;
根据所述文本草稿通过训练好的质量感知遮挡语言模型,对所述文本草稿中待替换字的位置进行预测,获得所述待替换字的目标位置;Predicting the position of the word to be replaced in the draft text according to the trained quality perception occlusion language model to obtain the target position of the word to be replaced;
通过所述训练好的质量感知遮挡语言模型,根据所述目标位置的上下文语境信息对所述目标位置的语义进行预测,获得所述目标位置对应的目标字;Predicting the semantics of the target location according to the context information of the target location through the trained quality-aware occlusion language model to obtain the target word corresponding to the target location;
Through the trained quality-aware occlusion language model, replacing the word to be replaced with the target word to obtain a first iteration text, taking the first iteration text as a new text draft, and returning to the step of predicting the position of the word to be replaced in the new text draft through the trained quality-aware occlusion language model according to the new text draft to obtain the target position of the word to be replaced, until all the words to be replaced in the text draft have been replaced, whereupon the iteration terminates and the iteratively updated target text is obtained.
此外,为实现上述目的,本申请还提出一种基于质量感知的文本生成装置,所述基于质量感知的文本生成装置包括:In addition, in order to achieve the above objective, this application also proposes a text generation device based on quality perception, and the text generation device based on quality perception includes:
生成模块,用于获取待处理语料集,将所述待处理语料集进行多线程处理,通过序列到序列模型生成文本草稿;A generating module, used to obtain a corpus to be processed, perform multi-thread processing on the corpus to be processed, and generate a text draft through a sequence-to-sequence model;
预测模块,用于根据所述文本草稿通过训练好的质量感知遮挡语言模型,对所述文本草稿中待替换字的位置进行预测,获得所述待替换字的目标位置;A prediction module, configured to predict the position of the word to be replaced in the text draft through the trained quality perception occlusion language model according to the text draft, and obtain the target position of the word to be replaced;
所述预测模块,还用于通过所述训练好的质量感知遮挡语言模型,根据所述目标位置的上下文语境信息对所述目标位置的语义进行预测,获得所述目标位置对应的目标字;The prediction module is further configured to predict the semantics of the target location according to the context information of the target location through the trained quality perception occlusion language model, and obtain the target word corresponding to the target location;
An iteration module, configured to replace, through the trained quality-aware occlusion language model, the word to be replaced with the target word to obtain a first iteration text, take the first iteration text as a new text draft, and return to the step of predicting the position of the word to be replaced in the new text draft through the trained quality-aware occlusion language model according to the new text draft to obtain the target position of the word to be replaced, until all the words to be replaced in the text draft have been replaced, whereupon the iteration terminates and the iteratively updated target text is obtained.
Based on artificial intelligence, this application can improve the quality of text generation through multiple iterations.
附图说明Description of the drawings
图1是本申请实施例方案涉及的硬件运行环境的基于质量感知的文本生成设备的结构示意图;FIG. 1 is a schematic structural diagram of a text generation device based on quality perception in a hardware operating environment related to a solution of an embodiment of the present application;
图2为本申请基于质量感知的文本生成方法第一实施例的流程示意图;2 is a schematic flowchart of a first embodiment of a text generation method based on quality perception according to this application;
图3为本申请基于质量感知的文本生成方法第二实施例的流程示意图;3 is a schematic flowchart of a second embodiment of a text generation method based on quality perception according to this application;
图4为本申请基于质量感知的文本生成方法第三实施例的流程示意图;4 is a schematic flowchart of a third embodiment of a text generation method based on quality perception according to this application;
图5为本申请基于质量感知的文本生成装置第一实施例的结构框图。Fig. 5 is a structural block diagram of a first embodiment of a text generation device based on quality perception in this application.
Detailed Description of the Embodiments
应当理解,此处所描述的具体实施例仅仅用以解释本申请,并不用于限定本申请。It should be understood that the specific embodiments described here are only used to explain the application, and not used to limit the application.
The technical solution of this application can be applied to the fields of artificial intelligence, blockchain, and/or big data technology. The data involved, such as text, can be stored in a database or in a blockchain, for example through blockchain distributed storage, which is not limited in this application.
参照图1,图1为本申请实施例方案涉及的硬件运行环境的基于质量感知的文本生成 设备结构示意图。Referring to Fig. 1, Fig. 1 is a schematic structural diagram of a text generation device based on quality perception in a hardware operating environment involved in a solution of an embodiment of the application.
如图1所示,该基于质量感知的文本生成设备可以包括:处理器1001,例如中央处理器(Central Processing Unit,CPU),通信总线1002、用户接口1003,网络接口1004,存储器1005。其中,通信总线1002用于实现这些组件之间的连接通信。用户接口1003可以包括显示屏(Display),可选用户接口1003还可以包括标准的有线接口、无线接口,对于用户接口1003的有线接口在本申请中可为USB接口。网络接口1004可选的可以包括标准的有线接口、无线接口(如无线保真(WIreless-FIdelity,WI-FI)接口)。存储器1005可以是高速的随机存取存储器(Random Access Memory,RAM)存储器,也可以是稳定的存储器(Non-volatile Memory,NVM),例如磁盘存储器。存储器1005可选的还可以是独立于前述处理器1001的存储装置。As shown in FIG. 1, the text generation device based on quality perception may include a processor 1001, such as a central processing unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. Among them, the communication bus 1002 is used to implement connection and communication between these components. The user interface 1003 may include a display screen (Display), and the optional user interface 1003 may also include a standard wired interface and a wireless interface. The wired interface of the user interface 1003 may be a USB interface in this application. The network interface 1004 may optionally include a standard wired interface and a wireless interface (for example, a wireless fidelity (WI-FI) interface). The memory 1005 may be a high-speed random access memory (Random Access Memory, RAM) memory, or a stable memory (Non-volatile Memory, NVM), such as a disk memory. Optionally, the memory 1005 may also be a storage device independent of the aforementioned processor 1001.
本领域技术人员可以理解,图1中示出的结构并不构成对基于质量感知的文本生成设备的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件布置。Those skilled in the art can understand that the structure shown in FIG. 1 does not constitute a limitation on the text generation device based on quality perception, and may include more or less components than shown in the figure, or a combination of certain components, or different components. Component arrangement.
如图1所示,作为一种计算机存储介质的存储器1005中可以包括操作系统、网络通信模块、用户接口模块以及基于质量感知的文本生成程序。As shown in FIG. 1, the memory 1005 as a computer storage medium may include an operating system, a network communication module, a user interface module, and a text generation program based on quality perception.
在图1所示的基于质量感知的文本生成设备中,网络接口1004主要用于连接后台服务器,与所述后台服务器进行数据通信;用户接口1003主要用于连接用户设备;所述基于质量感知的文本生成设备通过处理器1001调用存储器1005中存储的基于质量感知的文本生成程序,并执行本申请实施例提供的基于质量感知的文本生成方法。In the text generation device based on quality perception shown in FIG. 1, the network interface 1004 is mainly used to connect to a back-end server for data communication with the back-end server; the user interface 1003 is mainly used to connect to user equipment; The text generation device calls the quality perception-based text generation program stored in the memory 1005 through the processor 1001, and executes the quality perception-based text generation method provided in the embodiments of the present application.
基于上述硬件结构,提出本申请基于质量感知的文本生成方法的实施例。Based on the above hardware structure, an embodiment of the text generation method based on quality perception of the present application is proposed.
参照图2,图2为本申请基于质量感知的文本生成方法第一实施例的流程示意图,提出本申请基于质量感知的文本生成方法第一实施例。2, which is a schematic flowchart of the first embodiment of the text generation method based on quality perception of this application, and the first embodiment of the text generation method based on quality perception of this application is proposed.
在第一实施例中,所述基于质量感知的文本生成方法包括以下步骤:In the first embodiment, the text generation method based on quality perception includes the following steps:
步骤S10:获取待处理语料集,将所述待处理语料集进行多线程处理,通过序列到序列模型生成文本草稿。Step S10: Obtain a corpus to be processed, perform multi-thread processing on the corpus to be processed, and generate a text draft through a sequence-to-sequence model.
It should be understood that the execution subject of this embodiment is the quality-perception-based text generation device, where the quality-perception-based text generation device may be an electronic device such as a smartphone, a personal computer, or a server, which is not limited in this embodiment. Automatic text generation can be applied to a variety of scenarios, for example, automatic lyric generation by artificial intelligence (AI). First, a keyword is set and input into the sequence-to-sequence model, which generates a sentence from the keyword and outputs the first sentence; the first sentence is then fed back into the sequence-to-sequence model, which generates a second sentence from it; the second sentence is fed into the model in turn, and this is repeated until the text draft is generated. To improve efficiency, the corpus to be processed can be processed by a multi-threaded processor, thereby generating multiple text drafts.
In specific implementations there are many other application scenarios, such as human customer service: the user asks a question, speech recognition is performed, the user's speech is collected and converted into text, namely the corpus to be processed. The content of the corpus to be processed may not accurately express the true intention conveyed by the video conference, in which case the corpus to be processed needs to be processed through the sequence-to-sequence model. The sequence-to-sequence model (Sequence to Sequence network, or Encoder-Decoder network, Seq2Seq) is composed of two parts, called the encoder and the decoder. The encoder reads the input sequence and outputs a single vector, and the decoder reads that vector to produce the output sequence. With the seq2seq model, the encoder creates a single vector that, ideally, encodes the "meaning" of the input sequence as a single point in an N-dimensional sentence space, thereby generating the text draft.
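For reference only, the following is a minimal sketch of the draft-generation step described above: a seq2seq model generates a draft sentence by sentence, each output being fed back as the next input, and a thread pool processes several corpus items in parallel. The callable next_sentence is a hypothetical stand-in for a trained Seq2Seq encoder-decoder, assumed here only to keep the sketch self-contained.

```python
# Sketch of sentence-by-sentence draft generation with multi-threaded corpus processing.
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

def generate_draft(keyword: str,
                   next_sentence: Callable[[str], str],
                   num_sentences: int = 4) -> List[str]:
    """Generate a draft: each generated sentence is fed back as the next input."""
    sentences, context = [], keyword
    for _ in range(num_sentences):
        sentence = next_sentence(context)   # seq2seq: encode the context, decode one sentence
        sentences.append(sentence)
        context = sentence
    return sentences

def generate_drafts(keywords: List[str],
                    next_sentence: Callable[[str], str],
                    workers: int = 4) -> List[List[str]]:
    """Multi-threaded processing of the corpus: one draft per keyword."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda k: generate_draft(k, next_sentence), keywords))
```

Any callable that maps a previous sentence to the next one can be plugged in as next_sentence, so the same loop works for lyric, poem, or reply generation.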
It should be noted that generating a text draft in the above encoding-decoding manner has a drawback: during decoding, the text is generated unidirectionally, character by character, from left to right (or from right to left), taking into account only the text that has already been generated. Once the earlier text is poorly generated, it has a large impact on the text generated afterwards, causing deviations to accumulate. Therefore, this embodiment proposes a trained quality-aware occlusion language model, which masks the position of a character and then predicts the semantics of the masked character, realizing the prediction by learning the context information of the masked character.
步骤S20:根据所述文本草稿通过训练好的质量感知遮挡语言模型,对所述文本草稿中待替换字的位置进行预测,获得所述待替换字的目标位置。Step S20: Predict the position of the word to be replaced in the text draft through the trained quality perception occlusion language model according to the text draft, and obtain the target position of the word to be replaced.
It can be understood that the text draft includes at least one sentence, and one, two, three, or more sentences of the text draft can be input into the trained quality-aware occlusion language model. The trained Quality-Aware Masked Language Model (QA-MLM) predicts the position of the word to be replaced in the text draft according to the contextual information. For example, an input sentence contains 7 characters, Sg=[s1,s2,s3,s4,s5,s6,s7]; for the 7 characters in this sentence there are therefore 7 classes, and the model judges, in combination with the context, whether a poor-quality character exists, that is, whether there is a word to be replaced. If position P=2 is predicted to be a poor-quality character, the target position is P=2.
It should be understood that the trained quality-aware occlusion language model is obtained by training a to-be-trained quality-aware occlusion language model, and the to-be-trained quality-aware occlusion language model may be based on an improved Bidirectional Encoder Representations from Transformers (BERT) model. The input of the original BERT model is two sentences, a first sentence and a second sentence; it can predict whether the next sentence of the first sentence is the second sentence, but it cannot predict the quality of the individual words in a sentence. In this embodiment, a to-be-trained quality-aware occlusion language model is established; a large amount of standard text is obtained, and words in the standard text are randomly replaced to obtain replacement text; the to-be-trained quality-aware occlusion language model is then trained on the large amount of standard text and replacement text to obtain the trained quality-aware occlusion language model. The trained quality-aware occlusion language model can predict whether each word in a sentence is of poor quality and replace the words predicted to be of poor quality; its input is not limited to two sentences and may be one sentence, three sentences, or more, so the trained quality-aware occlusion language model has better quality-perception ability.
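For illustration only, the readout of the position prediction described above can be sketched as follows, assuming the model outputs one score per position 0..n for a sentence of n characters, where position 0 means that no poor-quality word was perceived; the logit values below are made up to match the P=2 example and are not the output of a real trained model.

```python
# Reading out the predicted target position from per-position quality scores.
import torch

sentence = ["s1", "s2", "s3", "s4", "s5", "s6", "s7"]                      # n = 7 characters
position_logits = torch.tensor([0.1, 0.3, 2.4, 0.2, 0.0, 0.1, 0.2, 0.1])  # indices 0..7

target_position = int(torch.argmax(position_logits))                       # -> 2 here
if target_position == 0:
    print("all characters judged adequate, stop iterating")
else:
    print(f"character at position P={target_position} "
          f"({sentence[target_position - 1]}) will be replaced")
```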
步骤S30:通过所述训练好的质量感知遮挡语言模型,根据所述目标位置的上下文语境信息对所述目标位置的语义进行预测,获得所述目标位置对应的目标字。Step S30: Predict the semantics of the target location according to the context information of the target location through the trained quality perception occlusion language model, and obtain the target word corresponding to the target location.
It should be noted that the masked language model (MLM) in the trained quality-aware occlusion language model masks the word to be replaced at the target position and fuses the context on the left and right sides of the target position, that is, the contextual information, to predict the semantics of the masked target position, thereby predicting a word of better quality, namely the target word.
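As an illustrative sketch of this mask-and-predict step, the snippet below masks the character at the target position and lets a stock Chinese BERT masked language model predict a replacement from the two-sided context; the pretrained bert-base-chinese model is used here only as a stand-in for the quality-aware occlusion language model of this application, and the sample sentence and target position are taken from the examples in this description.

```python
# Mask the character at the target position and predict it from both-sided context.
import torch
from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForMaskedLM.from_pretrained("bert-base-chinese")
model.eval()

draft = "寂寞春风鸟自狂"
target_position = 3                                   # 1-indexed character to replace
chars = list(draft)
chars[target_position - 1] = tokenizer.mask_token     # -> "寂寞[MASK]风鸟自狂"
masked_text = "".join(chars)

inputs = tokenizer(masked_text, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits                   # (1, seq_len, vocab_size)

mask_index = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
predicted_id = logits[0, mask_index].argmax(dim=-1)
print(tokenizer.convert_ids_to_tokens(predicted_id.tolist()))  # predicted replacement character
```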
Step S40: Through the trained quality-aware occlusion language model, replace the word to be replaced with the target word to obtain a first iteration text, take the first iteration text as a new text draft, and return to the step of predicting the position of the word to be replaced in the new text draft through the trained quality-aware occlusion language model according to the new text draft to obtain the target position of the word to be replaced, until all the words to be replaced in the text draft have been replaced, whereupon the iteration terminates and the iteratively updated target text is obtained.
It should be understood that the word to be replaced is replaced with the target word to obtain the first iteration text, and the first iteration text is taken as a new text draft and is again input into the trained quality-aware occlusion language model. Through the trained quality-aware occlusion language model, the position of the word to be replaced in the first iteration text is predicted according to the first iteration text to obtain the target position of the word to be replaced; through the trained quality-aware occlusion language model, the semantics of the target position is predicted according to the contextual information to obtain the target word corresponding to the target position; through the trained quality-aware occlusion language model, the word to be replaced is replaced with the target word to obtain a second iteration text, realizing another iteration. The second iteration text is taken as a new text draft and is again input into the trained quality-aware occlusion language model, and so on, until all the words to be replaced in the text draft have been replaced, the iteration terminates, and the iteratively updated target text is obtained.
It should be noted that, after the target position of the word to be replaced is predicted, the method further includes: judging whether the target position is a second preset value; if the target position is not the second preset value, it is determined that there are still unreplaced words to be replaced in the text draft, the iteration continues, and the step of predicting the semantics of the target position according to the contextual information through the trained quality-aware occlusion language model to obtain the target word corresponding to the target position is executed, until the target position is the second preset value, whereupon it is determined that all the words to be replaced in the text draft have been replaced, the iteration terminates, and the iteratively updated target text is obtained. The second preset value is equal to the first preset value and is used to judge whether any word to be replaced is still perceived in the text draft; if no word to be replaced is perceived, it is determined that all the words to be replaced in the text draft have been replaced.
具体应用中,通过所述训练好的质量感知遮挡语言模型对所述歌词文本草稿进行迭代更新,获得目标歌词文本。In a specific application, the draft lyric text is iteratively updated through the trained quality-aware occlusion language model to obtain the target lyric text.
During iterative updating, all possible positions of words to be replaced in the text draft are first predicted, and the characters at these positions are then masked; by inputting the text draft into the trained quality-aware occlusion language model, the corresponding characters can be predicted. Combined with the context, the predicted characters are more suitable than the original characters in terms of semantic consistency and coherence. Therefore, the characters in the text draft are replaced with the predicted characters, completing one iterative update step; the text draft can be iteratively updated multiple times until the quality-aware masked language model predicts the preset termination position (P=0).
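The overall iterative update loop can be sketched as follows, assuming the two predictions of the quality-aware occlusion language model are exposed as the hypothetical callables predict_worst_position and predict_character; the safety bound max_rounds is an added assumption, not part of the described method, since the loop normally stops as soon as position 0 is predicted.

```python
# Iterative refinement: predict the worst position, stop at 0, otherwise replace and repeat.
from typing import Callable

def refine_draft(draft: str,
                 predict_worst_position: Callable[[str], int],
                 predict_character: Callable[[str, int], str],
                 max_rounds: int = 28) -> str:
    text = draft
    for _ in range(max_rounds):                       # safety bound on the number of iterations
        position = predict_worst_position(text)       # 0 means the text is already good enough
        if position == 0:
            break
        new_char = predict_character(text, position)  # mask the position, predict from context
        text = text[:position - 1] + new_char + text[position:]  # 1-indexed in-place replacement
    return text
```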
In this embodiment, the corpus to be processed is obtained and processed with multiple threads, and a text draft is generated through the sequence-to-sequence model; according to the text draft, the position of the word to be replaced in the text draft is predicted through the trained quality-aware occlusion language model to obtain the target position of the word to be replaced, and predicting the position improves the accuracy of the prediction. Through the trained quality-aware occlusion language model, the semantics of the target position is predicted according to the contextual information of the target position to obtain the target word corresponding to the target position; combining the context improves the accuracy of the semantic prediction and allows words of better quality to be predicted. Through the trained quality-aware occlusion language model, the word to be replaced is replaced with the target word to obtain the first iteration text, the first iteration text is taken as a new text draft, and the process returns to the step of predicting the position of the word to be replaced in the new text draft through the trained quality-aware occlusion language model according to the new text draft to obtain the target position of the word to be replaced, until all the words to be replaced in the text draft have been replaced, whereupon the iteration terminates and the iteratively updated target text is obtained. Based on artificial intelligence, the quality of text generation is thus improved through multiple iterations.
参照图3,图3为本申请基于质量感知的文本生成方法第二实施例的流程示意图,基于上述图2所示的第一实施例,提出本申请基于质量感知的文本生成方法的第二实施例。Referring to Figure 3, Figure 3 is a schematic flowchart of the second embodiment of the text generation method based on quality perception of this application. Based on the first embodiment shown in Figure 2 above, a second implementation of the text generation method based on quality perception of this application is proposed. example.
在第二实施例中,所述步骤S20之前,还包括:In the second embodiment, before the step S20, the method further includes:
步骤S101:获取标准文本,对所述标准文本中的字进行随机替换,获得替换文本。Step S101: Obtain a standard text, perform random replacement of words in the standard text, and obtain a replacement text.
应理解的是,所述标准文本为语义表达准确的训练文本,对所述标准文本中的字或词进行随机替换,存在替换字或词的文本即为所述替换文本。通常所述标准文本中原有的字或词均为语义表达质量最佳的字或词,则替换的字或词为质量较差的字词。It should be understood that the standard text is a training text with accurate semantic expression, and characters or words in the standard text are randomly replaced, and the text with replacement words or words is the replacement text. Generally, the original words or words in the standard text are the words or words with the best semantic expression quality, and the replaced words or words are the words or words of poor quality.
进一步地,在本实施例中,所述替换文本包括:第一预设比例的第一替换文本、第二预设比例的第二替换文本和第三预设比例的标准文本;Further, in this embodiment, the replacement text includes: a first replacement text of a first preset ratio, a second replacement text of a second preset ratio, and a standard text of a third preset ratio;
所述步骤S101,包括:The step S101 includes:
通过随机标记,选取所述标准文本中每句话中的任意一个字随机替换为另外一个字获得第一替换文本,并记录被替换的字的位置标签,所述第一预设比例为所述第一替换文本占所有替换文本的比例;Through random marking, any word in each sentence in the standard text is selected and randomly replaced with another word to obtain the first replacement text, and the position label of the replaced word is recorded. The first preset ratio is the The ratio of the first replacement text to all the replacement text;
通过随机标记,选取所述标准文本中每句话中的任意两个字随机替换为另外两个字获得第二替换文本,并记录被替换的字的位置标签,所述第二预设比例为所述第二替换文本占所有替换文本的比例;Through random marking, any two words in each sentence in the standard text are selected and randomly replaced with other two words to obtain the second replacement text, and the position label of the replaced word is recorded. The second preset ratio is The proportion of the second replacement text in all the replacement text;
保持所述标准文本不变,将所述标准文本作为替换文本,并将位置标签记录为第一预设值,所述第三比例为所述标准文本占所有替换文本的比例。Keep the standard text unchanged, use the standard text as the replacement text, and record the position label as a first preset value, and the third ratio is the ratio of the standard text to all the replacement text.
It should be noted that, for the first preset ratio, the second preset ratio, and the third preset ratio, different values of the three ratios can be set during training and the prediction time needed to obtain the final predicted text can be measured; the shorter the prediction time, the more the ratio setting benefits the training process, so the best first preset ratio, second preset ratio, and third preset ratio can be determined. The similarity between the iterated text after each iteration and the standard text can also be calculated; the higher the similarity, the more the ratio setting benefits quality perception, which likewise helps determine the best first preset ratio, second preset ratio, and third preset ratio. For example, the first preset ratio is 60%, the second preset ratio is 20%, and the third preset ratio is 20%. The details are as follows (a code sketch follows these examples):
60%的第一替换文本:通过随机标记,用一个字符替换一个字符,例如原始文本Sg=[s1,s2,s3,s4,s5,s6,s7]改为Sc=[s1,s2,si1,s4,s5,s6,s7],并且位置标签是p=3,则替换文本行是Sm=[s1,s2,MASK,s4,s5,s6,s7]。60% of the first replacement text: replace one character with one character through random marking, for example, the original text Sg=[s1,s2,s3,s4,s5,s6,s7] is changed to Sc=[s1,s2,si1, s4, s5, s6, s7], and the position label is p=3, then the replacement text line is Sm=[s1, s2, MASK, s4, s5, s6, s7].
20%的第二替换文本:用随机标记替换两个字符,例如原始文本Sg=[s1,s2,s3,s4,s5,s6,s7]改为Sc=[s1,si1,s3,s4,s5,si2,s7],并且位置标签是p=[2,6],则替换文本是Sm=[s1,MASK,s3,s4,s5,MASK,s7]20% of the second replacement text: replace two characters with random tags, for example, the original text Sg=[s1,s2,s3,s4,s5,s6,s7] is changed to Sc=[s1,si1,s3,s4,s5 ,si2,s7], and the position label is p=[2,6], the replacement text is Sm=[s1,MASK,s3,s4,s5,MASK,s7]
20%的标准文本:保持所述标准文本不变,则将位置标签设置为0,即Sg=Sc,位置标签是p=0。即可将所述第一预设值设置为0。20% standard text: Keep the standard text unchanged, then set the position label to 0, that is, Sg=Sc, and the position label is p=0. That is, the first preset value can be set to 0.
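For reference, the corpus-construction rule above (60% of sentences with one replaced character, 20% with two replaced characters, 20% unchanged with position label 0) could be sketched as follows; drawing replacement characters from the sentences' own characters is an assumption made only to keep the example self-contained, and a real implementation would sample from a larger vocabulary.

```python
# Build (corrupted sentence, position labels) training pairs from standard sentences.
import random

def corrupt_sentence(sentence: str, vocabulary: str):
    chars = list(sentence)
    r = random.random()
    if r < 0.6:
        positions = random.sample(range(len(chars)), 1)   # replace one character
    elif r < 0.8:
        positions = random.sample(range(len(chars)), 2)   # replace two characters
    else:
        return sentence, [0]                               # keep unchanged, label p = 0
    for p in positions:
        chars[p] = random.choice(vocabulary)               # may occasionally pick the same character
    # Position labels are 1-indexed, matching the Sm = [s1, s2, MASK, ...] example above.
    return "".join(chars), sorted(p + 1 for p in positions)

# Example usage: pair each corrupted sentence with its original for supervision.
standard_lines = ["寂寞春风鸟自狂", "秋风吹雨满庭香"]
vocab = "".join(standard_lines)
corpus = [(corrupt_sentence(line, vocab), line) for line in standard_lines]
```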
步骤S102:建立待训练质量感知遮挡语言模型。Step S102: Establish a to-be-trained quality-aware occlusion language model.
It should be understood that the to-be-trained quality-aware occlusion language model may be based on an improved bidirectional encoder representation (BERT) model; the to-be-trained quality-aware occlusion language model first predicts the position of the poor-quality character and then predicts the character at that position. The to-be-trained quality-aware occlusion language model is trained on a large amount of sample data to obtain the trained quality-aware occlusion language model. The training corpus is constructed as follows: the replaced positions can be expressed as P=[pi1,pi2,...,pir], where ir is less than n and n is the total number of characters in the text draft, and the masked real characters are si=[si1,si2,...,sir]. The number r of replaced positions reflects the learning ability of the to-be-trained quality-aware occlusion language model, and an appropriate r is selected according to the capacity and quality of the model.
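One possible shape of such a model is sketched below: a shared BERT encoder with a position head that scores positions (position 0, read from the [CLS] token, meaning "good enough") and a character head over the vocabulary. This is an assumed, illustrative architecture built on the public transformers library, not the exact implementation of this application.

```python
# A two-headed quality-aware masked language model on top of a BERT encoder.
import torch.nn as nn
from transformers import BertModel

class QualityAwareMLM(nn.Module):
    def __init__(self, model_name: str = "bert-base-chinese"):
        super().__init__()
        self.encoder = BertModel.from_pretrained(model_name)
        hidden = self.encoder.config.hidden_size
        self.position_head = nn.Linear(hidden, 1)                     # one quality score per token
        self.char_head = nn.Linear(hidden, self.encoder.config.vocab_size)

    def forward(self, input_ids, attention_mask):
        hidden = self.encoder(input_ids=input_ids,
                              attention_mask=attention_mask).last_hidden_state
        # Index 0 of the sequence is [CLS], whose score stands for "position 0 / stop";
        # scores for padding and other special tokens would be masked out in practice.
        position_logits = self.position_head(hidden).squeeze(-1)      # (batch, seq_len)
        char_logits = self.char_head(hidden)                          # (batch, seq_len, vocab)
        return position_logits, char_logits
```

During training, the position head can be supervised with the recorded position labels and the character head with the masked original characters.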
步骤S103:根据所述标准文本和所述替换文本对所述待训练质量感知遮挡语言模型进行训练,获得训练好的质量感知遮挡语言模型。Step S103: Training the quality-aware occlusion language model to be trained according to the standard text and the replacement text to obtain a trained quality-aware occlusion language model.
It can be understood that the to-be-trained quality-aware occlusion language model is a language model based on the BERT model. Using this basic quality-aware occlusion language model, the replacement text is iteratively updated according to the contextual information of the standard text. Specifically, the position of the poor-quality character or word (that is, the word to be updated) in the replacement text is predicted according to the contextual information to obtain the predicted position of the poor-quality character; the true semantics at the predicted position is then predicted in combination with the contextual information, that is, a predicted word representing the true semantics is obtained, and the word to be updated is replaced with the predicted word, thereby updating the replacement text. The above steps are repeated until all the characters or words to be updated in the replacement text have been replaced, and the iteration then stops. After training, the to-be-trained quality-aware occlusion language model becomes the trained quality-aware occlusion language model, which can accurately identify the position of the word to be replaced in the text draft and predict the semantics at that position, that is, a target word of better quality. The word to be replaced is replaced with the target word to obtain the first iteration text, realizing one iteration; the first iteration text is taken as the new text draft, and the process returns to the step of predicting the position of the word to be replaced in the text draft, until all the words to be replaced in the text draft have been replaced, whereupon the iteration terminates and the target text is obtained.
进一步地,所述步骤S103,包括:Further, the step S103 includes:
According to the first replacement text or the second replacement text, predicting, through the to-be-trained quality-aware occlusion language model, the position of the word to be updated in the first replacement text or the second replacement text to obtain the predicted position of the word to be updated;
Predicting the semantics of the word at the predicted position through the to-be-trained quality-aware occlusion language model to obtain the predicted word corresponding to the predicted position;
Through the to-be-trained quality-aware occlusion language model, replacing the word to be updated with the predicted word to obtain a first predicted text, realizing one iteration; taking the first predicted text as a new replacement text and returning to the step of predicting, according to the new replacement text and through the to-be-trained quality-aware occlusion language model, the position of the word to be updated in the new replacement text to obtain the predicted position of the word to be updated, until all the words to be updated in the first replacement text or the second replacement text have been replaced, whereupon the iteration terminates and the predicted text is obtained; and training the to-be-trained quality-aware occlusion language model according to the standard text to obtain the trained quality-aware occlusion language model.
It should be understood that, taking a collection of poems as the standard text as an example, the collection includes poems of the Tang, Song, Yuan, Ming, and Qing dynasties. Approximately 130,525 poems, comprising 905,790 poem lines in total, were selected from the poetry corpus for model training and evaluation; each filtered poem contains four or more poem lines, and each poem line contains seven characters. First, the sequence-to-sequence model is used to generate a poem draft. After the poem draft text is generated, the to-be-trained quality-aware occlusion language model is used for iterative updating: it first predicts which character position has the worst semantic quality, and for that worst position it integrates the surrounding context information to predict the character at that position. In this example each poem line has seven characters and the four lines give twenty-eight positions in total, and an end position (p=0) is added to indicate that the whole poem is already good enough. If the end position is predicted, the quality of the poem is considered good enough and the iterative replacement process terminates automatically.
进一步地,直至所述第一替换文本或所述第二替换文本中所有待更新字均被替换,则迭代终止,获得预测文本之后,还包括:Further, until all the words to be updated in the first replacement text or the second replacement text are replaced, the iteration is terminated, and after the predicted text is obtained, the method further includes:
计算所述预测文本与所述标准文本之间的文本相似度;Calculating the text similarity between the predicted text and the standard text;
判断所述文本相似度是否超过预设相似度阈值;Judging whether the text similarity exceeds a preset similarity threshold;
在所述文本相似度未超过所述预设相似度阈值时,对所述第一比例、所述第二比例和所述第三比例进行调整,获得新的第一比例、新的第二比例和新的第三比例;When the text similarity does not exceed the preset similarity threshold, the first ratio, the second ratio, and the third ratio are adjusted to obtain a new first ratio and a new second ratio And the new third ratio;
Training the to-be-trained quality-aware occlusion language model according to the replacement texts at the new first ratio, the new second ratio, and the new third ratio, until the text similarity exceeds the preset similarity threshold, whereupon the adjustment of the first ratio, the second ratio, and the third ratio is stopped.
In a specific implementation, in order to improve the effectiveness of training the to-be-trained quality-aware occlusion language model, after the first replacement text at the first preset ratio, the second replacement text at the second preset ratio, and the standard text at the third preset ratio have been set, it is also necessary to judge, according to the quality of the predicted text obtained by training, whether the first preset ratio, the second preset ratio, and the third preset ratio are set reasonably. The preset similarity threshold can be set according to how high the output-text quality requirement is in the actual application; for example, the preset similarity threshold is set to 80%.
It should be understood that word segmentation is performed on the predicted text and the standard text to obtain all the first words of the predicted text and all the second words of the standard text, and the Term Frequency-Inverse Document Frequency (TF-IDF) values of the first words and of the second words are calculated. Both the predicted text and the standard text are then represented as word vectors composed of the words and their TF-IDF values, the cosine distance between the word vector corresponding to the predicted text and the word vector corresponding to the standard text is calculated, and this cosine distance is taken as the text similarity.
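A minimal sketch of this similarity computation, assuming jieba for word segmentation and scikit-learn for the TF-IDF vectors and cosine similarity, is given below; these libraries are example choices, not requirements of this application.

```python
# TF-IDF cosine similarity between the predicted text and the standard text.
import jieba
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def text_similarity(predicted_text: str, standard_text: str) -> float:
    # Segment both texts into words and join with spaces so the vectorizer can tokenize them.
    segmented = [" ".join(jieba.lcut(t)) for t in (predicted_text, standard_text)]
    # Keep single-character words, which are common in Chinese segmentation output.
    tfidf = TfidfVectorizer(token_pattern=r"(?u)\b\w+\b").fit_transform(segmented)
    return float(cosine_similarity(tfidf[0], tfidf[1])[0, 0])

# e.g. stop adjusting the ratios once text_similarity(predicted, standard) exceeds 0.8
```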
When the text similarity does not exceed the preset similarity threshold, it indicates that the quality-perception ability of the trained quality-aware occlusion language model is still poor; the first ratio and the second ratio can then be reduced and the third ratio increased, that is, the first ratio, the second ratio, and the third ratio are adjusted to obtain a new first ratio, a new second ratio, and a new third ratio. The to-be-trained quality-aware occlusion language model is trained according to the replacement texts at the new first ratio, the new second ratio, and the new third ratio to obtain a new predicted text, and the process returns to calculating the text similarity between the predicted text and the standard text, until the text similarity exceeds the preset similarity threshold, whereupon the adjustment of the first ratio, the second ratio, and the third ratio is stopped.
在实际应用中,采用序列到序列模型生成的文本为:In practical applications, the text generated by the sequence-to-sequence model is:
寂寞春风鸟自狂,秋风吹雨满庭香。The lonely spring breeze bird is crazy, and the autumn breeze blows and rains the fragrance of the garden.
欲识故为归来晚,只有幽香伴钓芳。The desire to know is to come back late, only the fragrance is accompanied by Diaofang.
采用所述训练好的质量感知遮挡语言模型生成的文本为:The text generated by using the trained quality perception occlusion language model is:
寂寞春风鸟自狂,秋风吹雨满庭香。The lonely spring breeze bird is crazy, and the autumn breeze blows and rains the fragrance of the garden.
欲知故国归来晚,只有幽香伴众芳。If you want to know that the homeland is coming back late, only Youxiang will accompany the public.
可见,所述训练好的质量感知遮挡语言模型能够生成质量更好的文本。It can be seen that the trained quality-aware occlusion language model can generate better quality text.
在本实施例中,通过获取标准文本,对所述标准文本中的字进行随机替换,获得替换文本,建立待训练质量感知遮挡语言模型,根据所述标准文本和所述替换文本对所述待训练质量感知遮挡语言模型进行训练,获得训练好的质量感知遮挡语言模型,通过掩盖位置然后预测,通过学习所有上下文信息来实现预测,提高了训练好的质量感知遮挡语言模型的预测能力,提高文本生成质量。In this embodiment, the standard text is obtained, the characters in the standard text are randomly replaced, the replacement text is obtained, the quality perception occlusion language model to be trained is established, and the standard text and the replacement text are compared to the to-be-trained language model. Train the quality-aware occlusion language model for training, obtain a trained quality-aware occlusion language model, mask the position and then predict, realize the prediction by learning all context information, improve the predictive ability of the trained quality-aware occlusion language model, and improve the text Build quality.
参照图4,图4为本申请基于质量感知的文本生成方法第三实施例的流程示意图,基于上述第一实施例或第二实施例,提出本申请基于质量感知的文本生成方法的第三实施例。本实施例基于所述第一实施例进行说明。Referring to Figure 4, Figure 4 is a schematic flowchart of the third embodiment of the text generation method based on quality perception of this application. Based on the above-mentioned first or second embodiment, the third implementation of the text generation method based on quality perception of this application is proposed. example. This embodiment is described based on the first embodiment.
在第三实施例中,所述步骤S40,包括:In the third embodiment, the step S40 includes:
Step S401: Through the trained quality-aware occlusion language model, replace the word to be replaced with the target word to obtain a first iteration text, take the first iteration text as a new text draft, return to the step of predicting the position of the word to be replaced in the new text draft through the trained quality-aware occlusion language model according to the new text draft to obtain the target position of the word to be replaced, and judge whether the target position is a second preset value; if the target position is the second preset value, it is determined that all the words to be replaced in the text draft have been replaced, the iteration terminates, and the iteratively updated target text is obtained.
It should be noted that the second preset value is usually set to 0. When the target position of the word to be replaced is predicted to be 0, it means that all the words of the current text are already appropriate and no further iterative update is needed. The case in which the true position is 0 is also retained in the training corpus, that is, 20% of the text corpus did not undergo the random replacement operation, so this part of the corpus is still high-quality text and does not need to be updated iteratively.
For example, the original text is Sg=[s1,s2,s3,s4,s5,s6,s7]; one of its words is randomly replaced, giving Sc=[s1,s2,si1,s4,s5,s6,s7] with the position label p=3, so the replacement text line is Sm=[s1,s2,MASK,s4,s5,s6,s7]. Through the trained quality-aware occlusion language model, the target position of the word to be replaced is predicted to be p=3, and the word to be replaced is replaced with the target word to obtain the first iteration text. If the first iteration text is Sg1=[s1,s2,s3,s4,s5,s6,s7], the first iteration text is taken as a new text draft, and the process returns to the step of predicting the position of the word to be replaced in the new text draft through the trained quality-aware occlusion language model according to the new text draft to obtain the target position of the word to be replaced; the new target position is predicted to be P=0, and since 0 is the second preset value, it is determined that all the words to be replaced in the text draft have been replaced, the iteration terminates, and the iteratively updated target text is obtained.
进一步地,在所述步骤S20之前,还包括:Further, before the step S20, the method further includes:
将所述文本草稿进行向量化获得训练好的质量感知遮挡语言模型的输入向量。The text draft is vectorized to obtain the input vector of the trained quality-aware occlusion language model.
相应地,所述步骤S20,包括:Correspondingly, the step S20 includes:
根据所述输入向量通过训练好的质量感知遮挡语言模型,对所述输入向量中待替换字的位置进行预测,获得所述待替换字的目标位置。According to the input vector through a trained quality perception occlusion language model, the position of the word to be replaced in the input vector is predicted to obtain the target position of the word to be replaced.
可理解的是,需要将所述文本草稿表示成向量形式,才能通过所述预设质量感知遮挡语言模型进行迭代,以生成质量更好的目标文本。将所述文本草稿表示成向量形式,获得训练好的质量感知遮挡语言模型的输入向量,从而通过所述训练好的质量感知遮挡语言模型,对所述输入向量中所述待替换字的位置进行预测,获得所述待替换字的目标位置。It is understandable that the draft text needs to be expressed in a vector form in order to iterate through the preset quality perception occlusion language model to generate a better quality target text. The text draft is expressed in vector form, and the input vector of the trained quality-aware occlusion language model is obtained, so that the position of the word to be replaced in the input vector is performed through the trained quality-aware occlusion language model. Predict and obtain the target position of the word to be replaced.
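As an illustration, the vectorization step could look as follows, using a stock Chinese BERT tokenizer as an example; the actual quality-aware occlusion language model may use its own vocabulary and tokenization.

```python
# Convert a text draft into token-id and attention-mask tensors for the model input.
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
draft = "寂寞春风鸟自狂，秋风吹雨满庭香。"
inputs = tokenizer(draft, return_tensors="pt", padding=True, truncation=True)
print(inputs["input_ids"].shape, inputs["attention_mask"].shape)
# `inputs` can then be fed to the model, e.g. the QualityAwareMLM sketched earlier.
```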
相应地,所述步骤S30,包括:Correspondingly, the step S30 includes:
对所述目标位置的字进行遮挡,获得遮挡文本,根据所述遮挡文本通过所述训练好的质量感知遮挡语言模型,结合所述目标位置的上下文语境信息对所述遮挡文本的所述目标位置的语义进行预测,获得所述目标位置对应的目标字。The word at the target location is occluded to obtain occluded text, and the trained quality perception occlusion language model is used according to the occluded text, and the context information of the target location is used to compare the target of the occluded text. The semantics of the position are predicted, and the target word corresponding to the target position is obtained.
可理解的是,所述训练好的质量感知遮挡语言模型中的遮蔽语言模型对所述目标位置的待替换字进行遮挡,获得所述遮挡文本,例如所述文本草稿Sg=[s1,s2,s3,s4,s5,s6,s7],所述目标位置是p=3,对p=3处的字进行遮挡,则所述遮挡文本是Sm=[s1,s2,MASK,s4,s5,s6,s7]。 将所述遮挡文本输入所述训练好的质量感知遮挡语言模型,所述训练好的质量感知遮挡语言模型结合所述目标位置p=3的左右两侧语境,即所述上下文语境信息,对所述遮挡文本中进行遮挡的所述目标位置p=3的语义进行预测,预测出质量更好的字,即所述目标字。It is understandable that the occlusion language model in the trained quality perception occlusion language model occludes the word to be replaced at the target location to obtain the occluded text, for example, the text draft Sg=[s1,s2, s3, s4, s5, s6, s7], the target position is p=3, and the word at p=3 is occluded, then the occluded text is Sm=[s1,s2,MASK,s4,s5,s6 ,s7]. The occlusion text is input into the trained quality-aware occlusion language model, and the trained quality-aware occlusion language model is combined with the context of the left and right sides of the target position p=3, that is, the contextual information, Predict the semantics of the target position p=3 in the occluded text, and predict a word with better quality, that is, the target word.
In this embodiment, it is judged whether the target position is the second preset value, and if the target position is the second preset value it is determined that all the words to be replaced in the text draft have been replaced, the iteration terminates, and the iteratively updated target text is obtained. Automatic termination of the iteration is thereby realized, which significantly improves the effect and quality of text generation, avoids the iterative process of existing methods that simply and completely regenerate the text from left to right, and also avoids the problems that a suitable number of iteration rounds cannot be chosen and that the amount of computation is relatively large.
此外,本申请实施例还提出一种存储介质,所述存储介质上存储有基于质量感知的文本生成程序,所述基于质量感知的文本生成程序被处理器执行时实现如上文所述的基于质量感知的文本生成方法的步骤。In addition, an embodiment of the present application also proposes a storage medium, the storage medium stores a quality-perception-based text generation program, and when the quality-perception-based text generation program is executed by a processor, the quality-based The steps of a perceptual text generation method.
可选的,本申请涉及的存储介质可以是计算机可读存储介质,该存储介质如计算机可读存储介质可以是非易失性的,也可以是易失性的。Optionally, the storage medium involved in this application may be a computer-readable storage medium, and the storage medium, such as a computer-readable storage medium, may be non-volatile or volatile.
此外,参照图5,本申请实施例还提出一种基于质量感知的文本生成装置,所述基于质量感知的文本生成装置包括:In addition, referring to FIG. 5, an embodiment of the present application also proposes a text generation device based on quality perception, and the text generation device based on quality perception includes:
生成模块10,用于获取待处理语料集,将所述待处理语料集进行多线程处理,通过序列到序列模型生成文本草稿。The generating module 10 is configured to obtain a corpus to be processed, perform multi-thread processing on the corpus to be processed, and generate a text draft through a sequence-to-sequence model.
It should be understood that automatic text generation can be applied to a variety of scenarios, for example, automatic lyric generation by artificial intelligence (AI). First, a keyword is set and input into the sequence-to-sequence model, which generates a sentence from the keyword and outputs the first sentence; the first sentence is then fed back into the sequence-to-sequence model, which generates a second sentence from it; the second sentence is fed into the model in turn, and this is repeated until the text draft is generated. To improve efficiency, the corpus to be processed can be processed by a multi-threaded processor, thereby generating multiple text drafts.
In specific implementations there are many other application scenarios, such as human customer service: the user asks a question, speech recognition is performed, the user's speech is collected and converted into text, namely the corpus to be processed. The content of the corpus to be processed may not accurately express the true intention conveyed by the video conference, in which case the corpus to be processed needs to be processed through the sequence-to-sequence model. The sequence-to-sequence model (Sequence to Sequence network, or Encoder-Decoder network, Seq2Seq) is composed of two parts, called the encoder and the decoder. The encoder reads the input sequence and outputs a single vector, and the decoder reads that vector to produce the output sequence. With the seq2seq model, the encoder creates a single vector that, ideally, encodes the "meaning" of the input sequence as a single point in an N-dimensional sentence space, thereby generating the text draft.
It should be noted that generating a text draft in the above encoding-decoding manner has a drawback: during decoding, the text is generated unidirectionally, character by character, from left to right (or from right to left), taking into account only the text that has already been generated. Once the earlier text is poorly generated, it has a large impact on the text generated afterwards, causing deviations to accumulate. Therefore, this embodiment proposes a trained quality-aware occlusion language model, which masks the position of a character and then predicts the semantics of the masked character, realizing the prediction by learning the context information of the masked character.
预测模块20,用于根据所述文本草稿通过训练好的质量感知遮挡语言模型,对所述文本草稿中待替换字的位置进行预测,获得所述待替换字的目标位置。The prediction module 20 is configured to predict the position of the word to be replaced in the text draft through the trained quality perception occlusion language model according to the text draft, and obtain the target position of the word to be replaced.
It can be understood that the text draft includes at least one sentence, and one, two, three, or more sentences of the text draft can be input into the trained quality-aware occlusion language model. The trained Quality-Aware Masked Language Model (QA-MLM) predicts the position of the word to be replaced in the text draft according to the contextual information. For example, an input sentence contains 7 characters, Sg=[s1,s2,s3,s4,s5,s6,s7]; for the 7 characters in this sentence there are therefore 7 classes, and the model judges, in combination with the context, whether a poor-quality character exists, that is, whether there is a word to be replaced. If position P=2 is predicted to be a poor-quality character, the target position is P=2.
It should be understood that the trained quality-aware occlusion language model is obtained by training a to-be-trained quality-aware occlusion language model, and the to-be-trained quality-aware occlusion language model may be based on an improved Bidirectional Encoder Representations from Transformers (BERT) model. The input of the original BERT model is two sentences, a first sentence and a second sentence; it can predict whether the next sentence of the first sentence is the second sentence, but it cannot predict the quality of the individual words in a sentence. In this embodiment, a to-be-trained quality-aware occlusion language model is established; a large amount of standard text is obtained, and words in the standard text are randomly replaced to obtain replacement text; the to-be-trained quality-aware occlusion language model is then trained on the large amount of standard text and replacement text to obtain the trained quality-aware occlusion language model. The trained quality-aware occlusion language model can predict whether each word in a sentence is of poor quality and replace the words predicted to be of poor quality; its input is not limited to two sentences and may be one sentence, three sentences, or more, so the trained quality-aware occlusion language model has better quality-perception ability.
所述预测模块20,还用于通过所述训练好的质量感知遮挡语言模型,根据所述目标位置的上下文语境信息对所述目标位置的语义进行预测,获得所述目标位置对应的目标字。The prediction module 20 is also used to predict the semantics of the target location according to the context information of the target location through the trained quality perception occlusion language model, and obtain the target word corresponding to the target location .
It should be noted that the masked language model (MLM) within the trained quality-aware occlusion language model occludes the word to be replaced at the target position and fuses the context on both sides of the target position, that is, the contextual information, to predict the semantics of the occluded target position and thereby predict a better-quality word, namely the target word.
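A minimal illustration of the mask-and-predict step, assuming a generic fill-mask pipeline from the transformers library rather than the patent's own QA-MLM; the function name and the choice of model are hypothetical.

```python
# Hedged example of occluding position P and letting bidirectional context
# choose a replacement; this only illustrates the idea, it is not the
# patent's model.
from transformers import pipeline

fill = pipeline("fill-mask", model="bert-base-chinese")

def predict_target_word(draft_chars, target_pos):
    """Mask the character at 1-based target_pos and return the top prediction."""
    masked = list(draft_chars)
    masked[target_pos - 1] = fill.tokenizer.mask_token
    candidates = fill("".join(masked))
    return candidates[0]["token_str"]  # highest-scoring replacement word
```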
The iteration module 30 is configured to replace, through the trained quality-aware occlusion language model, the word to be replaced with the target word to obtain the first-iteration text, use the first-iteration text as a new text draft, and return to the step of predicting, according to the new text draft and through the trained quality-aware occlusion language model, the position of the word to be replaced in the new text draft and obtaining the target position of the word to be replaced, until all the words to be replaced in the text draft have been replaced, whereupon the iteration terminates and the iteratively updated target text is obtained.
It should be understood that the target word replaces the word to be replaced to obtain the first-iteration text, and the first-iteration text is used as a new text draft and fed back into the trained quality-aware occlusion language model. The model predicts the position of the word to be replaced in the first-iteration text to obtain its target position, predicts the semantics of that target position from the contextual information to obtain the corresponding target word, and replaces the word to be replaced with the target word to obtain the second-iteration text, completing another iteration. The second-iteration text is then used as a new text draft and input into the trained quality-aware occlusion language model again, until all the words to be replaced in the text draft have been replaced, the iteration terminates, and the iteratively updated target text is obtained.
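The overall iterative loop can be sketched as below; position_model and word_model stand for the position-prediction and mask-filling steps described above, the terminal position 0 plays the role of the second preset value, and the max_iters safety bound is an added assumption.

```python
def refine_draft(draft, position_model, word_model, max_iters=20):
    """Iteratively replace perceived poor-quality characters until none remain."""
    text = list(draft)
    for _ in range(max_iters):           # safety bound; the patent iterates until P=0
        p = position_model(text)         # target position of the word to replace
        if p == 0:                       # no poor-quality word perceived -> terminate
            break
        text[p - 1] = word_model(text, p)  # occlude position p and fill it in
    return "".join(text)

# Toy demo: "fix" position 3 once, then report that nothing is left to replace.
demo_positions = iter([3, 0])
print(refine_draft("abXde", lambda t: next(demo_positions), lambda t, p: "c"))  # -> "abcde"
```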
It should be noted that, after the target position of the word to be replaced is predicted, the method further includes: judging whether the target position is a second preset value; if the target position is not the second preset value, it is determined that words to be replaced remain in the text draft, the iteration continues, and the step of predicting, through the trained quality-aware occlusion language model, the semantics of the target position from the contextual information to obtain the target word corresponding to the target position is executed, until the target position equals the second preset value, at which point all words to be replaced in the text draft are deemed replaced, the iteration terminates, and the iteratively updated target text is obtained. The second preset value equals the first preset value and is used to judge whether any word to be replaced is still perceived in the text draft; if no word to be replaced is perceived, all words to be replaced in the text draft are deemed replaced.
In a specific application, the lyric text draft is iteratively updated through the trained quality-aware occlusion language model to obtain the target lyric text.
During the iterative update, all possible positions of words to be replaced in the text draft are first predicted, the characters at those positions are then masked, and the text draft is input into the trained quality-aware occlusion language model to predict the corresponding characters. Combined with the context, the predicted characters fit better than the original characters in terms of semantic consistency and coherence. Therefore, replacing the characters in the text draft with the predicted characters completes one iterative update step, and the text draft can be updated over multiple iterations until the quality-aware occlusion language model predicts the preset termination position (P=0).
In this embodiment, the corpus to be processed is obtained and processed with multiple threads, and a text draft is generated through a sequence-to-sequence model; the position of the word to be replaced in the text draft is predicted through the trained quality-aware occlusion language model according to the text draft, and the target position of the word to be replaced is obtained, so that predicting the position improves prediction accuracy; the semantics of the target position is predicted from the contextual information of the target position through the trained model to obtain the corresponding target word, and combining the context improves the accuracy of the semantic prediction so that better-quality words can be predicted; the word to be replaced is replaced with the target word to obtain the first-iteration text, the first-iteration text is used as a new text draft, and the process returns to the step of predicting the position of the word to be replaced in the new text draft and obtaining its target position, until all the words to be replaced in the text draft have been replaced, the iteration terminates, and the iteratively updated target text is obtained. Based on artificial intelligence, the quality of the generated text is thus improved through multiple iterations.
In an embodiment, the quality-perception-based text generation apparatus further includes:
a random replacement module, configured to obtain standard text and randomly replace words in the standard text to obtain replacement text;
an establishment module, configured to establish a to-be-trained quality-aware occlusion language model;
a training module, configured to train the to-be-trained quality-aware occlusion language model according to the standard text and the replacement text to obtain a trained quality-aware occlusion language model.
In an embodiment, the replacement text includes first replacement text in a first preset proportion, second replacement text in a second preset proportion, and standard text in a third preset proportion.
The random replacement module is further configured to: select, through random marking, any one word in each sentence of the standard text and randomly replace it with another word to obtain the first replacement text, and record the position label of the replaced word, the first preset proportion being the proportion of the first replacement text in all replacement text; select, through random marking, any two words in each sentence of the standard text and randomly replace them with two other words to obtain the second replacement text, and record the position labels of the replaced words, the second preset proportion being the proportion of the second replacement text in all replacement text; and keep the standard text unchanged, use the standard text itself as replacement text, and record its position label as a first preset value, the third proportion being the proportion of the standard text in all replacement text.
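A possible way to build such training data is sketched below; the sampling ratios, the helper names, and the character-level vocabulary argument are assumptions for illustration and are not proportions given in the patent.

```python
# Hedged sketch of corpus construction: corrupt one or two characters per
# sentence, or keep the sentence clean with position label 0 (the "first
# preset value"). The 45/45/10 split is an assumed default, not from the patent.
import random

def make_replacement_sample(sentence, vocab, mode):
    chars = list(sentence)
    if mode == "clean":                       # standard text kept unchanged
        return sentence, [0]
    n = 1 if mode == "one" else 2             # corrupt one or two positions
    positions = random.sample(range(len(chars)), k=min(n, len(chars)))
    for p in positions:
        chars[p] = random.choice(vocab)       # random substitute character
    return "".join(chars), [p + 1 for p in positions]  # 1-based position labels

def build_corpus(sentences, vocab, ratios=(0.45, 0.45, 0.10)):
    modes = ["one", "two", "clean"]
    corpus = []
    for s in sentences:
        mode = random.choices(modes, weights=ratios, k=1)[0]
        corpus.append((s, *make_replacement_sample(s, vocab, mode)))
    return corpus  # (standard text, replacement text, position labels) triples
```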
In an embodiment, the prediction module 20 is further configured to: predict, according to the first replacement text or the second replacement text and through the to-be-trained quality-aware occlusion language model, the position of the word to be updated in the first or second replacement text, to obtain the predicted position of the word to be updated; predict, through the to-be-trained model, the semantics of the word at the predicted position to obtain the predicted word corresponding to the predicted position; replace, through the to-be-trained model, the word to be updated with the predicted word to obtain the first prediction text, completing one iteration; use the first prediction text as new replacement text and return to the step of predicting the position of the word to be updated in the new replacement text to obtain its predicted position, until all words to be updated in the first or second replacement text have been replaced, whereupon the iteration terminates and the prediction text is obtained; and train the to-be-trained quality-aware occlusion language model according to the standard text to obtain the trained quality-aware occlusion language model.
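For illustration, one plausible training step is sketched below under the assumption that the model exposes a per-token position (quality) head and a per-token word head; the loss combination and the batch field names are assumptions, not the patent's exact training recipe.

```python
# Hypothetical training step: the position head is supervised with per-token
# quality labels derived from the recorded replaced positions, and the word
# head with the original characters of the standard text.
import torch.nn.functional as F

def training_step(model, batch, optimizer):
    # Assumed model interface: returns (position_logits, word_logits), each
    # shaped (batch, seq_len, num_classes), for the replacement-text inputs.
    logits_pos, logits_word = model(batch["input_ids"], batch["attention_mask"])
    loss_pos = F.cross_entropy(logits_pos.transpose(1, 2), batch["position_labels"])
    loss_word = F.cross_entropy(logits_word.transpose(1, 2), batch["standard_ids"])
    loss = loss_pos + loss_word
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```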
In an embodiment, the quality-perception-based text generation apparatus further includes:
a calculation module, configured to calculate the text similarity between the prediction text and the standard text;
a judgment module, configured to judge whether the text similarity exceeds a preset similarity threshold; and
an adjustment module, configured to adjust the first proportion, the second proportion, and the third proportion when the text similarity does not exceed the preset similarity threshold, to obtain a new first proportion, a new second proportion, and a new third proportion.
The training module is further configured to train the to-be-trained quality-aware occlusion language model according to replacement text in the new first proportion, the new second proportion, and the new third proportion, and to stop adjusting the first proportion, the second proportion, and the third proportion once the text similarity exceeds the preset similarity threshold.
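A minimal sketch of the similarity check that drives the proportion adjustment is given below; character-level sequence matching and the specific adjustment rule are assumptions for illustration, since the patent does not prescribe a particular similarity measure or update rule.

```python
# Hedged sketch: measure how close the prediction text is to the standard
# text, and nudge the corruption proportions until the threshold is reached.
from difflib import SequenceMatcher

def text_similarity(predicted, standard):
    return SequenceMatcher(None, predicted, standard).ratio()

def maybe_adjust_ratios(predicted, standard, ratios, threshold=0.9, step=0.05):
    """Return (possibly adjusted ratios, whether adjustment can stop)."""
    if text_similarity(predicted, standard) >= threshold:
        return ratios, True                      # similarity reached: stop adjusting
    one, two, clean = ratios
    one, two = one + step, max(two - step, 0.0)  # hypothetical adjustment rule
    return (one, two, clean), False
```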
In an embodiment, the quality-perception-based text generation apparatus further includes:
the judgment module, further configured to judge whether the target position is a second preset value; and
the iteration module 30, further configured to, if the target position is the second preset value, determine that all the words to be replaced in the text draft have been replaced, terminate the iteration, and obtain the iteratively updated target text.
In an embodiment, the prediction module 20 is further configured to occlude the word at the target position to obtain occluded text, and to predict, according to the occluded text and through the trained quality-aware occlusion language model, the semantics of the target position of the occluded text in combination with the contextual information of the target position, to obtain the target word corresponding to the target position.
For other embodiments or specific implementations of the quality-perception-based text generation apparatus described in this application, reference may be made to the foregoing method embodiments, which are not repeated here.
It should be noted that, in this document, the terms "include", "comprise", or any other variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or system that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or system. Without further limitation, an element defined by the phrase "including a..." does not exclude the existence of other identical elements in the process, method, article, or system that includes that element.
The serial numbers of the foregoing embodiments of this application are for description only and do not represent the superiority or inferiority of the embodiments. In unit claims enumerating several apparatuses, several of these apparatuses may be embodied by one and the same item of hardware. The use of the words first, second, third, and the like does not indicate any order; these words may be interpreted as labels.
Through the description of the foregoing implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a read-only memory image (ROM)/random access memory (RAM), a magnetic disk, or an optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of this application.
The above are merely preferred embodiments of this application and are not intended to limit the patent scope of this application. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of this application, or any direct or indirect application thereof in other related technical fields, is likewise included within the patent protection scope of this application.

Claims (20)

1. A quality-perception-based text generation method, wherein the quality-perception-based text generation method comprises the following steps:
    obtaining a corpus to be processed, performing multi-thread processing on the corpus to be processed, and generating a text draft through a sequence-to-sequence model;
    predicting, according to the text draft and through a trained quality-aware occlusion language model, the position of a word to be replaced in the text draft, to obtain a target position of the word to be replaced;
    predicting, through the trained quality-aware occlusion language model, the semantics of the target position according to contextual information of the target position, to obtain a target word corresponding to the target position; and
    replacing, through the trained quality-aware occlusion language model, the word to be replaced with the target word to obtain first-iteration text, using the first-iteration text as a new text draft, and returning to the step of predicting, according to the new text draft and through the trained quality-aware occlusion language model, the position of the word to be replaced in the new text draft to obtain the target position of the word to be replaced, until all words to be replaced in the text draft have been replaced, whereupon the iteration terminates and iteratively updated target text is obtained.
2. The quality-perception-based text generation method according to claim 1, wherein before the predicting, according to the text draft and through the trained quality-aware occlusion language model, the position of the word to be replaced in the text draft to obtain the target position of the word to be replaced, the quality-perception-based text generation method further comprises:
    obtaining standard text, and randomly replacing words in the standard text to obtain replacement text;
    establishing a to-be-trained quality-aware occlusion language model; and
    training the to-be-trained quality-aware occlusion language model according to the standard text and the replacement text to obtain the trained quality-aware occlusion language model.
3. The quality-perception-based text generation method according to claim 2, wherein the replacement text comprises first replacement text in a first preset proportion, second replacement text in a second preset proportion, and standard text in a third preset proportion; and
    the obtaining standard text and randomly replacing words in the standard text to obtain replacement text comprises:
    selecting, through random marking, any one word in each sentence of the standard text and randomly replacing it with another word to obtain the first replacement text, and recording a position label of the replaced word, the first preset proportion being the proportion of the first replacement text in all replacement text;
    selecting, through random marking, any two words in each sentence of the standard text and randomly replacing them with two other words to obtain the second replacement text, and recording position labels of the replaced words, the second preset proportion being the proportion of the second replacement text in all replacement text; and
    keeping the standard text unchanged, using the standard text itself as replacement text, and recording its position label as a first preset value, the third proportion being the proportion of the standard text in all replacement text.
4. The quality-perception-based text generation method according to claim 3, wherein the training the to-be-trained quality-aware occlusion language model according to the standard text and the replacement text to obtain the trained quality-aware occlusion language model comprises:
    predicting, according to the first replacement text or the second replacement text and through the to-be-trained quality-aware occlusion language model, the position of a word to be updated in the first replacement text or the second replacement text, to obtain a predicted position of the word to be updated;
    predicting, through the to-be-trained quality-aware occlusion language model, the semantics of the word at the predicted position to obtain a predicted word corresponding to the predicted position; and
    replacing, through the to-be-trained quality-aware occlusion language model, the word to be updated with the predicted word to obtain first prediction text, completing one iteration; using the first prediction text as new replacement text, and returning to the step of predicting, according to the new replacement text and through the to-be-trained quality-aware occlusion language model, the position of the word to be updated in the new replacement text to obtain the predicted position of the word to be updated, until all words to be updated in the first replacement text or the second replacement text have been replaced, whereupon the iteration terminates and prediction text is obtained; and training the to-be-trained quality-aware occlusion language model according to the standard text to obtain the trained quality-aware occlusion language model.
5. The quality-perception-based text generation method according to claim 4, wherein after the iteration terminates and the prediction text is obtained once all the words to be updated in the first replacement text or the second replacement text have been replaced, the method comprises:
    calculating a text similarity between the prediction text and the standard text;
    judging whether the text similarity exceeds a preset similarity threshold;
    when the text similarity does not exceed the preset similarity threshold, adjusting the first proportion, the second proportion, and the third proportion to obtain a new first proportion, a new second proportion, and a new third proportion; and
    training the to-be-trained quality-aware occlusion language model according to replacement text in the new first proportion, the new second proportion, and the new third proportion, and stopping the adjustment of the first proportion, the second proportion, and the third proportion once the text similarity exceeds the preset similarity threshold.
6. The quality-perception-based text generation method according to claim 1, wherein the terminating of the iteration and the obtaining of the iteratively updated target text once all the words to be replaced in the text draft have been replaced comprise:
    judging whether the target position is a second preset value; and
    if the target position is the second preset value, determining that all the words to be replaced in the text draft have been replaced, terminating the iteration, and obtaining the iteratively updated target text.
7. The quality-perception-based text generation method according to any one of claims 1-6, wherein before the predicting, according to the text draft and through the trained quality-aware occlusion language model, the position of the word to be replaced in the text draft to obtain the target position of the word to be replaced, the quality-perception-based text generation method further comprises:
    vectorizing the text draft to obtain an input vector of the trained quality-aware occlusion language model;
    the predicting, according to the text draft and through the trained quality-aware occlusion language model, the position of the word to be replaced in the text draft to obtain the target position of the word to be replaced comprises:
    predicting, according to the input vector and through the trained quality-aware occlusion language model, the position of the word to be replaced in the input vector to obtain the target position of the word to be replaced; and
    the predicting, through the trained quality-aware occlusion language model, the semantics of the target position according to the contextual information of the target position to obtain the target word corresponding to the target position comprises:
    occluding the word at the target position to obtain occluded text, and predicting, according to the occluded text and through the trained quality-aware occlusion language model, the semantics of the target position of the occluded text in combination with the contextual information of the target position, to obtain the target word corresponding to the target position.
8. A quality-perception-based text generation device, wherein the quality-perception-based text generation device comprises a memory, a processor, and a quality-perception-based text generation program that is stored in the memory and executable on the processor, and the quality-perception-based text generation program, when executed by the processor, implements the following steps:
    obtaining a corpus to be processed, performing multi-thread processing on the corpus to be processed, and generating a text draft through a sequence-to-sequence model;
    predicting, according to the text draft and through a trained quality-aware occlusion language model, the position of a word to be replaced in the text draft, to obtain a target position of the word to be replaced;
    predicting, through the trained quality-aware occlusion language model, the semantics of the target position according to contextual information of the target position, to obtain a target word corresponding to the target position; and
    replacing, through the trained quality-aware occlusion language model, the word to be replaced with the target word to obtain first-iteration text, using the first-iteration text as a new text draft, and returning to the step of predicting, according to the new text draft and through the trained quality-aware occlusion language model, the position of the word to be replaced in the new text draft to obtain the target position of the word to be replaced, until all words to be replaced in the text draft have been replaced, whereupon the iteration terminates and iteratively updated target text is obtained.
9. The quality-perception-based text generation device according to claim 8, wherein before the position of the word to be replaced in the text draft is predicted, according to the text draft and through the trained quality-aware occlusion language model, to obtain the target position of the word to be replaced, the quality-perception-based text generation program, when executed by the processor, is further used to implement the following steps:
    obtaining standard text, and randomly replacing words in the standard text to obtain replacement text;
    establishing a to-be-trained quality-aware occlusion language model; and
    training the to-be-trained quality-aware occlusion language model according to the standard text and the replacement text to obtain the trained quality-aware occlusion language model.
10. The quality-perception-based text generation device according to claim 9, wherein the replacement text comprises first replacement text in a first preset proportion, second replacement text in a second preset proportion, and standard text in a third preset proportion; and
    when the standard text is obtained and the words in the standard text are randomly replaced to obtain the replacement text, the following steps are specifically implemented:
    selecting, through random marking, any one word in each sentence of the standard text and randomly replacing it with another word to obtain the first replacement text, and recording a position label of the replaced word, the first preset proportion being the proportion of the first replacement text in all replacement text;
    selecting, through random marking, any two words in each sentence of the standard text and randomly replacing them with two other words to obtain the second replacement text, and recording position labels of the replaced words, the second preset proportion being the proportion of the second replacement text in all replacement text; and
    keeping the standard text unchanged, using the standard text itself as replacement text, and recording its position label as a first preset value, the third proportion being the proportion of the standard text in all replacement text.
11. The quality-perception-based text generation device according to claim 10, wherein, when the to-be-trained quality-aware occlusion language model is trained according to the standard text and the replacement text to obtain the trained quality-aware occlusion language model, the following steps are specifically implemented:
    predicting, according to the first replacement text or the second replacement text and through the to-be-trained quality-aware occlusion language model, the position of a word to be updated in the first replacement text or the second replacement text, to obtain a predicted position of the word to be updated;
    predicting, through the to-be-trained quality-aware occlusion language model, the semantics of the word at the predicted position to obtain a predicted word corresponding to the predicted position; and
    replacing, through the to-be-trained quality-aware occlusion language model, the word to be updated with the predicted word to obtain first prediction text, completing one iteration; using the first prediction text as new replacement text, and returning to the step of predicting, according to the new replacement text and through the to-be-trained quality-aware occlusion language model, the position of the word to be updated in the new replacement text to obtain the predicted position of the word to be updated, until all words to be updated in the first replacement text or the second replacement text have been replaced, whereupon the iteration terminates and prediction text is obtained; and training the to-be-trained quality-aware occlusion language model according to the standard text to obtain the trained quality-aware occlusion language model.
12. The quality-perception-based text generation device according to claim 11, wherein, after the iteration terminates and the prediction text is obtained once all the words to be updated in the first replacement text or the second replacement text have been replaced, the quality-perception-based text generation program, when executed by the processor, is further used to implement the following steps:
    calculating a text similarity between the prediction text and the standard text;
    judging whether the text similarity exceeds a preset similarity threshold;
    when the text similarity does not exceed the preset similarity threshold, adjusting the first proportion, the second proportion, and the third proportion to obtain a new first proportion, a new second proportion, and a new third proportion; and
    training the to-be-trained quality-aware occlusion language model according to replacement text in the new first proportion, the new second proportion, and the new third proportion, and stopping the adjustment of the first proportion, the second proportion, and the third proportion once the text similarity exceeds the preset similarity threshold.
13. The quality-perception-based text generation device according to claim 8, wherein, when all the words to be replaced in the text draft have been replaced, the iteration terminates, and the iteratively updated target text is obtained, the following steps are specifically implemented:
    judging whether the target position is a second preset value; and
    if the target position is the second preset value, determining that all the words to be replaced in the text draft have been replaced, terminating the iteration, and obtaining the iteratively updated target text.
14. The quality-perception-based text generation device according to any one of claims 8-13, wherein before the position of the word to be replaced in the text draft is predicted, according to the text draft and through the trained quality-aware occlusion language model, to obtain the target position of the word to be replaced, the quality-perception-based text generation program, when executed by the processor, is further used to implement the following steps:
    vectorizing the text draft to obtain an input vector of the trained quality-aware occlusion language model;
    when the position of the word to be replaced in the text draft is predicted, according to the text draft and through the trained quality-aware occlusion language model, to obtain the target position of the word to be replaced, the following steps are specifically implemented:
    predicting, according to the input vector and through the trained quality-aware occlusion language model, the position of the word to be replaced in the input vector to obtain the target position of the word to be replaced; and
    when the semantics of the target position is predicted, through the trained quality-aware occlusion language model and according to the contextual information of the target position, to obtain the target word corresponding to the target position, the following steps are specifically implemented:
    occluding the word at the target position to obtain occluded text, and predicting, according to the occluded text and through the trained quality-aware occlusion language model, the semantics of the target position of the occluded text in combination with the contextual information of the target position, to obtain the target word corresponding to the target position.
15. A storage medium, wherein a quality-perception-based text generation program is stored on the storage medium, and the quality-perception-based text generation program, when executed by a processor, implements the following steps:
    obtaining a corpus to be processed, performing multi-thread processing on the corpus to be processed, and generating a text draft through a sequence-to-sequence model;
    predicting, according to the text draft and through a trained quality-aware occlusion language model, the position of a word to be replaced in the text draft, to obtain a target position of the word to be replaced;
    predicting, through the trained quality-aware occlusion language model, the semantics of the target position according to contextual information of the target position, to obtain a target word corresponding to the target position; and
    replacing, through the trained quality-aware occlusion language model, the word to be replaced with the target word to obtain first-iteration text, using the first-iteration text as a new text draft, and returning to the step of predicting, according to the new text draft and through the trained quality-aware occlusion language model, the position of the word to be replaced in the new text draft to obtain the target position of the word to be replaced, until all words to be replaced in the text draft have been replaced, whereupon the iteration terminates and iteratively updated target text is obtained.
16. The storage medium according to claim 15, wherein before the position of the word to be replaced in the text draft is predicted, according to the text draft and through the trained quality-aware occlusion language model, to obtain the target position of the word to be replaced, the quality-perception-based text generation program, when executed by the processor, is further used to implement the following steps:
    obtaining standard text, and randomly replacing words in the standard text to obtain replacement text;
    establishing a to-be-trained quality-aware occlusion language model; and
    training the to-be-trained quality-aware occlusion language model according to the standard text and the replacement text to obtain the trained quality-aware occlusion language model.
17. The storage medium according to claim 16, wherein the replacement text comprises first replacement text in a first preset proportion, second replacement text in a second preset proportion, and standard text in a third preset proportion; and
    when the standard text is obtained and the words in the standard text are randomly replaced to obtain the replacement text, the following steps are specifically implemented:
    selecting, through random marking, any one word in each sentence of the standard text and randomly replacing it with another word to obtain the first replacement text, and recording a position label of the replaced word, the first preset proportion being the proportion of the first replacement text in all replacement text;
    selecting, through random marking, any two words in each sentence of the standard text and randomly replacing them with two other words to obtain the second replacement text, and recording position labels of the replaced words, the second preset proportion being the proportion of the second replacement text in all replacement text; and
    keeping the standard text unchanged, using the standard text itself as replacement text, and recording its position label as a first preset value, the third proportion being the proportion of the standard text in all replacement text.
18. The storage medium according to claim 15, wherein, when all the words to be replaced in the text draft have been replaced, the iteration terminates, and the iteratively updated target text is obtained, the following steps are specifically implemented:
    judging whether the target position is a second preset value; and
    if the target position is the second preset value, determining that all the words to be replaced in the text draft have been replaced, terminating the iteration, and obtaining the iteratively updated target text.
19. The storage medium according to any one of claims 15-18, wherein before the position of the word to be replaced in the text draft is predicted, according to the text draft and through the trained quality-aware occlusion language model, to obtain the target position of the word to be replaced, the quality-perception-based text generation program, when executed by the processor, is further used to implement the following steps:
    vectorizing the text draft to obtain an input vector of the trained quality-aware occlusion language model;
    when the position of the word to be replaced in the text draft is predicted, according to the text draft and through the trained quality-aware occlusion language model, to obtain the target position of the word to be replaced, the following steps are specifically implemented:
    predicting, according to the input vector and through the trained quality-aware occlusion language model, the position of the word to be replaced in the input vector to obtain the target position of the word to be replaced; and
    when the semantics of the target position is predicted, through the trained quality-aware occlusion language model and according to the contextual information of the target position, to obtain the target word corresponding to the target position, the following steps are specifically implemented:
    occluding the word at the target position to obtain occluded text, and predicting, according to the occluded text and through the trained quality-aware occlusion language model, the semantics of the target position of the occluded text in combination with the contextual information of the target position, to obtain the target word corresponding to the target position.
20. A quality-perception-based text generation apparatus, wherein the quality-perception-based text generation apparatus comprises:
    a generation module, configured to obtain a corpus to be processed, perform multi-thread processing on the corpus to be processed, and generate a text draft through a sequence-to-sequence model;
    a prediction module, configured to predict, according to the text draft and through a trained quality-aware occlusion language model, the position of a word to be replaced in the text draft, to obtain a target position of the word to be replaced;
    the prediction module being further configured to predict, through the trained quality-aware occlusion language model, the semantics of the target position according to contextual information of the target position, to obtain a target word corresponding to the target position; and
    an iteration module, configured to replace, through the trained quality-aware occlusion language model, the word to be replaced with the target word to obtain first-iteration text, use the first-iteration text as a new text draft, and return to the step of predicting, according to the new text draft and through the trained quality-aware occlusion language model, the position of the word to be replaced in the new text draft to obtain the target position of the word to be replaced, until all the words to be replaced in the text draft have been replaced, whereupon the iteration terminates and the iteratively updated target text is obtained.
PCT/CN2020/118114 2019-10-29 2020-09-27 Quality perception-based text generation method and apparatus, device, and storage medium WO2021082842A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201911040951.0A CN111061867B (en) 2019-10-29 2019-10-29 Text generation method, equipment, storage medium and device based on quality perception
CN201911040951.0 2019-10-29

Publications (1)

Publication Number Publication Date
WO2021082842A1 true WO2021082842A1 (en) 2021-05-06

Family

ID=70297629

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/118114 WO2021082842A1 (en) 2019-10-29 2020-09-27 Quality perception-based text generation method and apparatus, device, and storage medium

Country Status (2)

Country Link
CN (1) CN111061867B (en)
WO (1) WO2021082842A1 (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111061867B (en) * 2019-10-29 2022-10-25 平安科技(深圳)有限公司 Text generation method, equipment, storage medium and device based on quality perception
CN111309908B (en) * 2020-02-12 2023-08-25 支付宝(杭州)信息技术有限公司 Text data processing method and device
CN111401037B (en) * 2020-06-05 2020-11-06 平安国际智慧城市科技股份有限公司 Natural language generation method and device, electronic equipment and storage medium
CN111695342B (en) * 2020-06-12 2023-04-25 复旦大学 Text content correction method based on context information
CN111783413A (en) * 2020-06-30 2020-10-16 平安科技(深圳)有限公司 Lyric recomposing method, apparatus, computer device and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110137653A1 (en) * 2009-12-04 2011-06-09 At&T Intellectual Property I, L.P. System and method for restricting large language models
CN109918630A (en) * 2019-01-23 2019-06-21 平安科技(深圳)有限公司 Document creation method, device, computer equipment and storage medium
CN110032644A (en) * 2019-04-03 2019-07-19 人立方智能科技有限公司 Language model pre-training method
CN110287494A (en) * 2019-07-01 2019-09-27 济南浪潮高新科技投资发展有限公司 A method of the short text Similarity matching based on deep learning BERT algorithm
CN110347799A (en) * 2019-07-12 2019-10-18 腾讯科技(深圳)有限公司 Language model training method, device and computer equipment
CN111061867A (en) * 2019-10-29 2020-04-24 平安科技(深圳)有限公司 Text generation method, equipment, storage medium and device based on quality perception

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8442813B1 (en) * 2009-02-05 2013-05-14 Google Inc. Methods and systems for assessing the quality of automatically generated text
US9898458B2 (en) * 2015-05-08 2018-02-20 International Business Machines Corporation Generating distributed word embeddings using structured information
EP3358471A1 (en) * 2017-02-04 2018-08-08 Tata Consultancy Services Limited Systems and methods for assessing quality of input text using recurrent neural networks
US10901604B2 (en) * 2017-11-28 2021-01-26 Microsoft Technology Licensing, Llc Transformation of data object based on context
CN108363697B (en) * 2018-03-08 2022-02-22 腾讯科技(深圳)有限公司 Text information generation method and device, storage medium and equipment
US10825227B2 (en) * 2018-04-03 2020-11-03 Sri International Artificial intelligence for generating structured descriptions of scenes
JP2019185551A (en) * 2018-04-13 2019-10-24 株式会社Preferred Networks Annotation added text data expanding method, annotation added text data expanding program, annotation added text data expanding apparatus, and training method of text classification model
CN109117485B (en) * 2018-09-06 2023-08-08 北京汇钧科技有限公司 Method and device for generating blessing language text and computer readable storage medium
CN109684501B (en) * 2018-11-26 2023-08-22 平安科技(深圳)有限公司 Lyric information generation method and device
CN110134968B (en) * 2019-05-22 2023-11-24 网易(杭州)网络有限公司 Poem generation method, device, equipment and storage medium based on deep learning
CN110196894B (en) * 2019-05-30 2021-06-08 北京百度网讯科技有限公司 Language model training method and language model prediction method

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110137653A1 (en) * 2009-12-04 2011-06-09 At&T Intellectual Property I, L.P. System and method for restricting large language models
CN109918630A (en) * 2019-01-23 2019-06-21 平安科技(深圳)有限公司 Document creation method, device, computer equipment and storage medium
CN110032644A (en) * 2019-04-03 2019-07-19 人立方智能科技有限公司 Language model pre-training method
CN110287494A (en) * 2019-07-01 2019-09-27 济南浪潮高新科技投资发展有限公司 A method of the short text Similarity matching based on deep learning BERT algorithm
CN110347799A (en) * 2019-07-12 2019-10-18 腾讯科技(深圳)有限公司 Language model training method, device and computer equipment
CN111061867A (en) * 2019-10-29 2020-04-24 平安科技(深圳)有限公司 Text generation method, equipment, storage medium and device based on quality perception

Also Published As

Publication number Publication date
CN111061867B (en) 2022-10-25
CN111061867A (en) 2020-04-24

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20881872

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20881872

Country of ref document: EP

Kind code of ref document: A1