CN114925658A - Open text generation method and storage medium - Google Patents

Open text generation method and storage medium

Info

Publication number
CN114925658A
Authority
CN
China
Prior art keywords
text
sequence
decoding
probability distribution
language model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202210547656.XA
Other languages
Chinese (zh)
Other versions
CN114925658B (en)
Inventor
陈峥
陶锐文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN202210547656.XA priority Critical patent/CN114925658B/en
Publication of CN114925658A publication Critical patent/CN114925658A/en
Application granted granted Critical
Publication of CN114925658B publication Critical patent/CN114925658B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/12Use of codes for handling textual entities
    • G06F40/126Character encoding
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to an open text generation method and a storage medium. To address the problems in the prior art that generated text suffers from long-text degeneration, poor logical consistency and factual inconsistency, and that these defects cannot be corrected by tuning the parameters of existing decoding algorithms, the method introduces additional control information into the decoding stage of text generation. On the basis of the decoding method, a decoding model based on a deep neural model is trained with a language model and a corpus: the decoding model is trained by fitting the information-amount sequence that the text produces under the language model, and in the generation stage characters are selected from the probability distribution according to the character information predicted by the decoding model. The method achieves better results on static evaluation metrics and in human evaluation of generated text. The invention is applicable to the field of natural language processing.

Description

Open text generation method and storage medium
Technical Field
The invention relates to the field of natural language processing, in particular to an open text generation method and a storage medium.
Background
Generating long text that is as realistic as possible has long been a goal of artificial intelligence and is in wide application demand, and the quality of text generated entirely by machines has attracted great interest. The generation systems that currently produce the best text quality use a neural network language model together with a decoding algorithm as the generation framework. This framework generates text in two steps: first, the language model produces a conditional probability distribution; second, a decoding algorithm selects a character from that distribution. While great advances have been made in language model modeling, the decoding algorithm is usually treated as a mere technical detail of the generation process.
Because generated text is required to be diverse, a stochastic decoding algorithm is typically used to select characters from the probability distribution produced by the model. However, stochastic decoding has a drawback: although the text looks fluent, it can be recognized by experts or by dedicated detection programs, and generated long text still drifts off topic, degenerates and contradicts itself, i.e., suffers from severe text degeneration. This is because stochastic decoding algorithms exert no control at the decoding stage; handing more control information to the decoder, rather than performing purely random sampling, can improve the quality of text generation.
The main decoding algorithms currently used for open text generation include the following:
1. Fully random sampling: fully random sampling draws from all words in the probability distribution with a multinomial sampling method, the number of possible outcomes being the vocabulary size and the weight of each word being its probability in the distribution. Because poorly modeled low-frequency words are mixed into the sampling, text generated by fully random sampling has poor readability and fluency, and severe text degeneration occurs during generation.
2. Temperature sampling: temperature sampling reshapes the probability distribution with a temperature parameter before sampling. Given the logit values that precede the Softmax function and a temperature parameter t, the Softmax re-estimates the probability of each character under the control of t. Temperature sampling exploits the nonlinear scaling of the exponential function to tilt the distribution toward high-probability characters and reduce the chance that low-probability words are captured by the sampling algorithm. Experimental analysis shows that although lowering t improves generation quality, it also reduces diversity, so temperature sampling is usually applied before other sampling algorithms to reshape the distribution and partially alleviate the text degeneration of random sampling.
3. Top-k sampling: as a popular stochastic sampling algorithm, Top-k sampling can be defined as sampling at each time step only from the k characters with the highest probabilities, according to their relative probabilities in the distribution. The Top-k algorithm is simple to implement, and although it produces much better text than random sampling over the complete distribution, different contexts require different values of k, and dynamically choosing a suitable k is a difficult problem.
4. Nucleus sampling: the main idea of nucleus sampling is to use the shape of the probability distribution to determine the set of characters to sample from, thereby solving the problem of Top-k sampling. It keeps only the highest-probability characters, and the size of the sampling set is adjusted dynamically according to the distribution generated at each time step; the part sampled is the highest-probability portion of the distribution.
Whether temperature sampling reshapes the probability distribution or Top-k and nucleus sampling restrict sampling to specific target characters, the goal is the same: increase the influence of high-probability words and reduce that of low-probability words. This preference for high-probability choices, combined with the randomness of the sampling process, leads to a persistent reduction in the quality of text generation.
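For reference, a minimal sketch of these prior-art sampling strategies over a single next-word distribution is given below (PyTorch-style Python); the function name and default values are illustrative, and the snippet describes the baselines discussed above rather than the claimed method.

import torch

def sample_next_token(logits: torch.Tensor,
                      temperature: float = 1.0,
                      top_k: int = 0,
                      top_p: float = 1.0) -> int:
    """Prior-art decoders: temperature, Top-k and nucleus (top-p) sampling.
    `logits` holds the unnormalized next-word scores (shape [vocab_size]);
    top_k=0 and top_p=1.0 disable the filters, which reduces the function
    to fully random (multinomial) sampling over the whole vocabulary."""
    logits = logits.clone().float()
    # Temperature sampling: rescale the logits before Softmax so that a
    # lower t tilts the distribution toward high-probability characters.
    logits = logits / max(temperature, 1e-8)
    # Top-k sampling: keep only the k highest-scoring characters.
    if top_k > 0:
        kth_best = torch.topk(logits, top_k).values[-1]
        logits[logits < kth_best] = float("-inf")
    # Nucleus (top-p) sampling: keep the smallest head of the sorted
    # distribution whose cumulative probability reaches p.
    if top_p < 1.0:
        sorted_logits, sorted_idx = torch.sort(logits, descending=True)
        cum_probs = torch.softmax(sorted_logits, dim=-1).cumsum(dim=-1)
        drop = cum_probs > top_p
        drop[1:] = drop[:-1].clone()   # always keep the single best character
        drop[0] = False
        logits[sorted_idx[drop]] = float("-inf")
    probs = torch.softmax(logits, dim=-1)
    return int(torch.multinomial(probs, num_samples=1).item())

In all three variants the final choice is still a random draw weighted by the reshaped probabilities, which is exactly the behavior that the invention replaces with a trained decoding model.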
Existing open text generation algorithms sample mainly according to the character probabilities in the distribution produced by the language model. The generated text suffers from long-text degeneration, poor logical consistency, factual inconsistency and similar problems, and these cannot be corrected by tuning the parameters of the algorithms. It is therefore desirable for the decoding step of the generation process to carry more control information so as to improve the quality of text generation.
Disclosure of Invention
In order to solve or alleviate some or all of the above technical problems, the invention is realized by the following technical solutions:
An open text generation method, wherein the open text generation method comprises the following steps:
Step 1: a word encoding step, for preprocessing the text and pre-encoding it to obtain a word encoding result.
Step 2: a language model processing step, for generating the probability distribution of the next word from the word encoding result.
Step 3: an intermediate processing step, for processing the next-word probability distribution and the hidden vector sequence of the last decoder layer in the language model, and providing them to the decoding step for training or character prediction.
Step 4: a decoding step, for performing training or character prediction on the input provided by encoding the text via the language model processing step, so as to generate the text.
In some embodiments, the language model is based on a Transformer language model.
In one class of embodiments, the intermediate processing step comprises:
substep 1): taking the hidden vector sequence O as input;
substep 2): calculating the information amount sequence of the text from the character sequence of the text and the next-word probability distribution sequence output by the language model, and taking this information amount sequence as the training target Y;
substep 3): shifting the training target Y to the right and padding a 0 at the head to obtain the input information amount sequence B;
substep 4): calculating the entropy of each probability distribution in the next-word probability distribution sequence to obtain an entropy sequence E;
substep 5): concatenating the input information amount sequence B and the entropy sequence E as the information embedding A of the decoding model, taking the last-layer logit vector output by the language model as the semantic embedding C, and taking the information embedding, the semantic embedding and the training target as the output of the intermediate processing layer.
In a certain class of embodiments, the decoding step comprises the following sub-steps:
a) performing linear mappings on the input information embedding A and the semantic embedding C to project them to the same dimension, obtaining A' and C', initializing a distance embedding P, and adding A', C' and P to obtain the semantic understanding target S of the decoding model;
b) processing the merged semantic understanding target S with M Transformer decoding modules, each comprising a multi-head attention layer and a feed-forward layer, where M is a positive integer;
c) mapping the semantic vectors finally output by the M Transformer decoding modules to a one-dimensional predicted information amount sequence, fitting it to the real information amount sequence, and fitting the error with an L1 loss;
d) in the generation phase, selecting from the next-word probability distribution D_n the character whose information amount is closest to the predicted information amount P_{n+1} output by the decoding module as the generated character.
In some embodiments, in the generation phase the generated character is obtained as follows: the K characters whose information amounts are closest are selected from the next-word probability distribution and sampled to obtain a candidate character, where K is a user-defined filtering number.
In a certain type of embodiment, the open text generation method further comprises the following step: a cleaning step, for cleaning the training text.
In some class of embodiments, texts of different lengths are generated iteratively using the open text generation method described in any of the foregoing, ending when an end token or a user-defined length is reached.
A storage medium having computer readable code stored thereon, the computer readable code being read by a processor to perform the open text generation method of any of the foregoing.
Some or all embodiments of the invention have the following beneficial technical effects:
1) The invention introduces more control information into the decoding stage of text generation. A mainstream generation framework and language model are adopted in the open text generation method, a decoding method based on a deep neural model is proposed, and the decoding model is trained on this basis with the language model and a corpus. Compared with existing decoding methods, the decoding model performs better on static evaluation metrics and in human evaluation of the generated text, which demonstrates that it improves the quality of text generation.
2) In the training stage, the method trains the decoding model by fitting the information-amount sequence that the text produces under the language model; in the generation stage it selects characters from the probability distribution according to the character information predicted by the decoding model, and supports different decoding modes such as maximal (closest-match) selection and diversity-oriented sampling.
Further advantages will be further described in the preferred embodiments.
The technical solutions/features disclosed above are summarized in the detailed description, so their scopes may not be exactly the same. The technical features disclosed in this section, together with the technical features disclosed in the following detailed description and the parts of the drawings not explicitly described in the specification, disclose further technical aspects in mutually reasonable combinations.
The technical solutions formed by combining the technical features disclosed at any position of the invention are used to support generalization of the technical solutions, amendment of the patent document, and disclosure of the technical solutions.
Drawings
FIG. 1 is an overall flow of an open text generation method;
FIG. 2 is a detailed flow of language model processing steps;
FIG. 3 is a schematic illustration of an intermediate processing step;
FIG. 4 is a schematic diagram of the decoding step.
Detailed Description
Since the various alternatives cannot be described exhaustively, the following clearly and completely describes the gist of the technical solutions in the embodiments of the present invention with reference to the drawings of those embodiments. It should be understood that the invention is not limited to the details disclosed herein, which may vary widely from one implementation to another.
In the present invention, "/" at any position denotes a logical "or" unless it carries the meaning of division. Ordinal terms such as "first" and "second" at any position of the invention are used merely as distinguishing labels in the description and do not imply an absolute temporal or spatial order, nor that terms so labeled are necessarily different from other similarly labeled terms.
The present invention may be described in terms of various elements combined into various embodiments, which may in turn be combined into various methods and articles of manufacture. In the present invention, even where a point is described only when introducing a method/product scheme, the corresponding product/method scheme is meant to explicitly include that technical feature.
When a step, module or feature is described as existing or being included anywhere in the invention, its existence is not implied to be exclusive; other embodiments can fully be obtained by a person skilled in the art from the technical solutions disclosed herein with the aid of other technical means; based on the points described in the embodiments of the present invention, those skilled in the art may replace, delete, add, combine or reorder some features to obtain technical solutions that still follow the concept of the invention; such solutions, which do not depart from the technical idea of the invention, also fall within its scope of protection.
Fig. 1 shows the overall flow of the open text generation method of the present invention. The open text generation method comprises the following steps:
Step 1: a word encoding step, for preprocessing the text and pre-encoding it to obtain a word encoding result.
Step 2: a language model processing step, for generating the probability distribution (i.e., a probability table) of the next word from the word encoding result.
Step 3: an intermediate processing step, for processing the next-word probability distribution and the hidden vector sequence of the last decoder layer in the language model, and providing them to the decoding step for training or character prediction.
Step 4: a decoding step, for performing training or character prediction on the input provided by encoding the text via the language model processing step, so as to generate the text.
Optionally, the present invention further comprises a cleaning step, for cleaning the training text.
Further, using the text generation method described above, texts of different lengths are generated iteratively, ending when an end token or a user-defined length is reached.
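A minimal sketch of this iterative generation loop is given below (PyTorch-style Python); `language_model`, `decode_step`, `eos_id` and the assumed return values are hypothetical placeholders rather than the patent's reference implementation.

import torch

def generate(language_model, decode_step, prompt_ids, eos_id, max_length=512):
    """Iteratively extend `prompt_ids` one character at a time, stopping at an
    end token or a user-defined maximum length (sketch only)."""
    ids = list(prompt_ids)
    while len(ids) < max_length:
        with torch.no_grad():
            # Step 2: the language model returns the next-word probability
            # distributions and its last-layer hidden states for the prefix
            # (this return convention is an assumption of the sketch).
            probs, hidden = language_model(torch.tensor([ids]))
        # Steps 3-4: intermediate processing plus the trained decoding model
        # pick one character from the last distribution (see sketches below).
        next_id = decode_step(probs[0, -1], hidden[0, -1])
        ids.append(next_id)
        if next_id == eos_id:
            break
    return ids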
Fig. 2 discloses the specific flow of the aforementioned language model processing step. Before input to the language model, the method further comprises the following:
cleaning the text corpus, building a sub-word vocabulary with a sub-word encoding method, and segmenting the input text according to the sub-word vocabulary.
The segmentation result is then input into the language model. For example, the language model of the present invention is a Transformer language model, and the following language model processing sub-steps are performed on this basis:
Sub-step a): the sequences in the training set are shuffled and fed sequentially, in batches, into the Transformer language model;
Sub-step b): the Transformer language model performs an encoding preprocessing operation on the input, converting it into word embeddings, which are then combined with position embedding information to obtain context embeddings.
Sub-step c): the preprocessed context embeddings are fed into the neural network and multiplied by three weight matrices to obtain the matrices Q, K and V; Q, K and V are then passed through the self-attention module to obtain the attention score matrix between each character and the other characters, computed as follows:
Z_i = Softmax((Q_i · K_i^T) / √d_k + M) · V_i
where Q is the target-word matrix, K is the keyword matrix, V is the original feature matrix, d_k is the dimension of the query and key vectors, i is an index with 1 ≤ i ≤ n, M is the autoregressive mask matrix, and n is the length of the current sequence;
Sub-step d): Z_1 to Z_n are concatenated (concat) and passed through a linear layer to obtain the final output Z, which has the same dimension as the input matrix X of the multi-head attention layer.
Sub-step e): Z is fed into a multilayer perceptron layer P to obtain an intermediate output vector; the multilayer perceptron module processes the result with two fully connected layers whose output has the same dimension as the input; the result then undergoes a residual operation and a normalization layer, where the residual operation means adding the input to the output and the normalization means regularizing the input, i.e., converting the neuron inputs into an output X whose mean and variance follow a specific distribution:
X = LayerNorm(P(Z) + Z)
Sub-step f): starting the loop from sub-step c), the output of sub-step e) is used as the input of the next iteration; the loop ends after N iterations, yielding the hidden vector sequence output O, where N is a positive integer.
Sub-step g): the hidden vector sequence output O is mapped by a linear layer to a vector of vocabulary size and then normalized with Softmax to obtain the next-word probability distribution set D output by the language model.
Matters not described in detail above are well known to those skilled in the art and are not repeated here. In addition, the network structure and training process of the Transformer language model and of the decoding network can be adjusted, for example by changing the number of network layers, the dimensionality of each layer, or the learning rate.
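For illustration, the language model processing sub-steps b) to g) above can be sketched with standard PyTorch modules roughly as follows; the layer counts, dimensions, class names and defaults are assumptions of this sketch and do not reflect the exact configuration of the invention.

import torch
import torch.nn as nn

class TransformerLMBlock(nn.Module):
    """One block of sub-steps c) to e): masked multi-head self-attention and a
    two-layer perceptron, each followed by a residual connection and LayerNorm."""
    def __init__(self, d_model=512, n_heads=8, d_ff=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.mlp = nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(),
                                 nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):
        n = x.size(1)
        # Autoregressive mask M: position i may only attend to positions <= i.
        mask = torch.triu(torch.full((n, n), float("-inf"), device=x.device), 1)
        z, _ = self.attn(x, x, x, attn_mask=mask)   # sub-steps c) and d)
        x = self.norm1(x + z)
        return self.norm2(x + self.mlp(x))          # sub-step e): X = LayerNorm(P(Z)+Z)

class TransformerLM(nn.Module):
    """Sub-steps b), f) and g): embeddings -> N blocks -> hidden sequence O
    -> linear layer and Softmax giving the next-word distributions D."""
    def __init__(self, vocab_size, d_model=512, n_layers=6, max_len=1024):
        super().__init__()
        self.tok = nn.Embedding(vocab_size, d_model)
        self.pos = nn.Embedding(max_len, d_model)
        self.blocks = nn.ModuleList(
            [TransformerLMBlock(d_model) for _ in range(n_layers)])
        self.lm_head = nn.Linear(d_model, vocab_size)

    def forward(self, ids):                          # ids: [batch, seq_len]
        positions = torch.arange(ids.size(1), device=ids.device)
        x = self.tok(ids) + self.pos(positions)      # sub-step b): context embedding
        for block in self.blocks:                    # sub-step f): N iterations
            x = block(x)
        hidden_O = x                                 # hidden vector sequence O
        probs_D = torch.softmax(self.lm_head(hidden_O), dim=-1)  # sub-step g)
        return probs_D, hidden_O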
Referring to fig. 3, a schematic illustration of the intermediate processing step is disclosed. The intermediate processing layer processes the next-word probability distribution D and the hidden vector sequence O output by the language model to obtain the inputs and outputs used to train the decoding model. The intermediate processing step comprises the following sub-steps:
Sub-step 1): the hidden vector sequence O is taken as input.
Sub-step 2): according to the character sequence I of the text and the next-word probability distribution sequence D output by the language model, the information amount sequence of the text is calculated and taken as the training target Y.
Sub-step 3): the training target Y is shifted to the right and a 0 is padded at the head to obtain the input information amount sequence B.
Sub-step 4): according to the next-word probability distribution sequence D, the entropy of each corresponding probability distribution is calculated to obtain an entropy sequence E.
Sub-step 5): the input information amount sequence B and the entropy sequence E are concatenated as the information embedding A of the decoding model, the last-layer logit vector output by the language model is used as the semantic embedding C, and the information embedding, the semantic embedding and the training target are used as the output of the intermediate processing layer.
Referring to fig. 4, a schematic diagram of the decoding step is disclosed. The information embedding A and the semantic embedding C output by the intermediate processing layer, together with the training target Y, are input into the decoding module to train it; during text generation, the trained decoding model predicts the information amount of the next character from its input, and the character closest to this prediction is selected from the probability distribution as the output text.
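A minimal sketch of how the intermediate processing sub-steps 1) to 5) above could build the information embedding A, the semantic embedding C and the training target Y just mentioned is given below (PyTorch-style Python); the tensor shapes, the alignment between the character sequence I and the distributions D, and the use of the hidden sequence O as the semantic embedding are assumptions made for illustration.

import torch

def intermediate_processing(char_ids, probs_D, hidden_O):
    """Sketch of intermediate processing sub-steps 1) to 5).
    char_ids : LongTensor [seq_len]         character sequence I of the text
    probs_D  : FloatTensor [seq_len, vocab] next-word distributions D
    hidden_O : FloatTensor [seq_len, d]     last-layer hidden vector sequence O"""
    eps = 1e-12
    # Sub-step 2): information amount of each observed character under the
    # distribution that predicted it (the alignment D_j -> I_{j+1} is assumed).
    next_ids = char_ids[1:].unsqueeze(1)
    target_Y = -torch.log(probs_D[:-1].gather(1, next_ids).squeeze(1) + eps)
    # Sub-step 3): shift right and pad a 0 at the head -> input sequence B.
    input_B = torch.cat([torch.zeros(1), target_Y[:-1]])
    # Sub-step 4): entropy of each predicted distribution -> sequence E.
    entropy_E = -(probs_D[:-1] * torch.log(probs_D[:-1] + eps)).sum(dim=1)
    # Sub-step 5): concatenate B and E as the information embedding A; the
    # last-layer hidden sequence O stands in for the semantic embedding C here.
    info_embed_A = torch.stack([input_B, entropy_E], dim=1)   # [seq_len-1, 2]
    semantic_C = hidden_O[:-1]
    return info_embed_A, semantic_C, target_Y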
The decoding step of the present invention comprises the following substeps:
a) The input information embedding A and the semantic embedding C are linearly mapped and projected to the same dimension to obtain A' and C'; a distance embedding P is initialized, and A', C' and P are added to obtain the semantic understanding target S of the decoding model.
b) The merged semantic understanding target S is processed by M Transformer decoding modules, each comprising a multi-head attention layer and a feed-forward layer, where M is a positive integer. The structure and processing of this part are the same as those of the language model described above.
c) The semantic vectors finally output by the M Transformer decoding modules are mapped to a one-dimensional predicted information amount sequence, which is fitted to the real information amount sequence; the error is fitted with an L1 loss to ensure the stability of the fit, where the L1 loss is defined as:
Loss_L1 = (1/n) · Σ_{j=1}^{n} |Y_j − P_j|
where Y represents the real information amount sequence, P represents the predicted information amount sequence output by the decoding module, j is the position index of the current character in the sequence, and n is the length of the sequence.
d) In the generation phase, the character whose information amount is closest to the predicted information amount P_{n+1} output by the decoding module is selected from the next-word probability distribution D_n as the generated character; the selected character T_{n+1} is defined as:
T_{n+1} = argmin_w |−log D_n(w) − P_{n+1}|
where n is the current position index and w ranges over the characters of the vocabulary.
Optionally, to meet diversity requirements, the K characters whose information amounts are closest to the prediction may be selected from the probability distribution and sampled to obtain a candidate character, where K is a user-defined filtering number; character selection under the diversity requirement is as follows:
T_{n+1} ~ Sample(C_K), where C_K is the set of the K characters w with the smallest |−log D_n(w) − P_{n+1}|
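For concreteness, a sketch of the decoding model of sub-steps a) to c) and of the character selection of sub-step d) is given below (PyTorch-style Python). Causally masked Transformer encoder layers stand in for the M decoding modules here, and the layer sizes, the distance-embedding scheme and the helper names are illustrative assumptions rather than the claimed implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class DecodingModel(nn.Module):
    """Sub-steps a) to c): project A and C to a shared dimension, add a distance
    embedding P, run M causally masked Transformer blocks, and predict a
    one-dimensional information-amount sequence."""
    def __init__(self, info_dim=2, sem_dim=512, d_model=256, n_heads=4,
                 n_layers=4, max_len=1024):
        super().__init__()
        self.proj_a = nn.Linear(info_dim, d_model)      # A  -> A'
        self.proj_c = nn.Linear(sem_dim, d_model)       # C  -> C'
        self.dist_p = nn.Embedding(max_len, d_model)    # distance embedding P
        layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)  # M decoding modules
        self.head = nn.Linear(d_model, 1)               # -> predicted info amounts

    def forward(self, info_A, sem_C):                   # [batch, seq, *]
        positions = torch.arange(info_A.size(1), device=info_A.device)
        s = self.proj_a(info_A) + self.proj_c(sem_C) + self.dist_p(positions)
        n = s.size(1)
        mask = torch.triu(torch.full((n, n), float("-inf"), device=s.device), 1)
        return self.head(self.blocks(s, mask=mask)).squeeze(-1)

def training_step(model, info_A, sem_C, target_Y):
    """Sub-step c): fit the predicted information-amount sequence with an L1 loss."""
    pred_P = model(info_A, sem_C)
    return F.l1_loss(pred_P, target_Y)

def select_character(probs_Dn, predicted_Pn1, top_k=1):
    """Sub-step d): pick the character whose information amount -log D_n(w) is
    closest to the predicted amount P_{n+1}; with top_k > 1 the K closest
    characters are kept and one is sampled for diversity."""
    info = -torch.log(probs_Dn + 1e-12)
    distance = (info - predicted_Pn1).abs()
    candidates = torch.topk(-distance, top_k).indices
    if top_k == 1:
        return int(candidates[0])
    weights = probs_Dn[candidates]                      # sample among the K closest
    return int(candidates[torch.multinomial(weights, 1)])

In this sketch the batch dimension, padding and masking of variable-length sequences are omitted for brevity.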
thus, all steps of the open text generation are completed.
In addition, the invention also discloses a storage medium on which computer readable code is stored; a processor reads the computer readable code to execute any of the open text generation methods described above.
While the present invention has been described with reference to particular features and embodiments thereof, various modifications, combinations and substitutions may be made without departing from the invention. The scope of the present application is not limited to the particular embodiments of the process, machine, manufacture, composition of matter, means, methods and steps described in the specification, and the methods and modules may also be implemented in association with, interdependent on, interoperable with, and/or before or after one or more other products or methods.
Therefore, the specification and drawings should be regarded simply as a description of some embodiments of the technical solutions defined by the appended claims, and the appended claims should be interpreted under the principle of broadest reasonable interpretation, being intended to cover, as far as possible, all modifications, variations, combinations or equivalents within the scope of the disclosure while avoiding unreasonable interpretations.
To achieve better technical results or to meet the requirements of certain applications, a person skilled in the art may further improve the technical solution on the basis of the present invention. However, even if such a partial improvement/design is inventive or advanced, as long as it relies on the technical features covered by the claims in accordance with the technical idea of the present invention, the technical solution likewise falls within the protection scope of the present invention.
Several technical features mentioned in the appended claims may have alternatives, or the order of certain technical processes or the organization of materials may be rearranged. Those skilled in the art can readily conceive of such alternatives, or change the order of the technical processes and the material organization, and then use substantially the same means to solve substantially the same technical problems and achieve substantially the same technical effects; therefore, even if the means and/or the order are explicitly defined in the claims, such modifications, changes and substitutions fall within the protection scope of the claims according to the doctrine of equivalents.
The method steps or modules described in connection with the embodiments disclosed herein may be implemented in hardware, software, or a combination of both, and the steps and components of the embodiments have been described above generically in terms of their functions in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends on the particular application and the design constraints imposed on the solution. Skilled artisans may implement the described functionality in different ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

Claims (8)

1. An open text generation method, characterized in that the open text generation method comprises the following steps:
step 1: a word encoding step, for preprocessing the text and pre-encoding it to obtain a word encoding result;
step 2: a language model processing step, for generating the probability distribution of the next word from the word encoding result;
step 3: an intermediate processing step, for processing the next-word probability distribution and the hidden vector sequence of the last decoder layer in the language model, and providing them to the decoding step for training or character prediction;
step 4: a decoding step, for performing training or character prediction on the input provided by encoding the text via the language model processing step, so as to generate the text.
2. The open text generation method according to claim 1, characterized in that:
the language model is based on a Transformer language model.
3. The open text generation method according to claim 2, characterized in that the intermediate processing step comprises:
substep 1): taking the hidden vector sequence O as input;
substep 2): calculating the information amount sequence of the text from the character sequence of the text and the next-word probability distribution sequence output by the language model, and taking this information amount sequence as the training target Y;
substep 3): shifting the training target Y to the right and padding a 0 at the head to obtain the input information amount sequence B;
substep 4): calculating the entropy of each probability distribution in the next-word probability distribution sequence to obtain an entropy sequence E;
substep 5): concatenating the input information amount sequence B and the entropy sequence E as the information embedding A of the decoding model, taking the last-layer logit vector output by the language model as the semantic embedding C, and taking the information embedding, the semantic embedding and the training target as the output of the intermediate processing layer.
4. The open text generation method according to claim 3, characterized in that the decoding step comprises the following sub-steps:
a) performing linear mappings on the input information embedding A and the semantic embedding C to project them to the same dimension, obtaining A' and C', initializing a distance embedding P, and adding A', C' and P to obtain the semantic understanding target S of the decoding model;
b) processing the merged semantic understanding target S with M Transformer decoding modules, each comprising a multi-head attention layer and a feed-forward layer, where M is a positive integer;
c) mapping the semantic vectors finally output by the M Transformer decoding modules to a one-dimensional predicted information amount sequence, fitting it to the real information amount sequence, and fitting the error with an L1 loss;
d) in the generation phase, selecting from the next-word probability distribution D_n the character whose information amount is closest to the predicted information amount P_{n+1} output by the decoding module as the generated character.
5. The open text generation method according to claim 4, characterized in that:
in the generation phase, the generated character is obtained as follows: the K characters whose information amounts are closest are selected from the next-word probability distribution and sampled to obtain a candidate character, where K is a user-defined filtering number.
6. The open text generation method according to any one of claims 1 to 5, characterized in that the open text generation method further comprises the following step:
a cleaning step, for cleaning the training text.
7. The open text generation method according to any one of claims 1 to 5, characterized in that the open text generation method further comprises the following step:
iteratively generating texts of different lengths using the open text generation method of any one of claims 1 to 5, ending when an end token or a user-defined length is reached.
8. A storage medium having computer readable code stored thereon, characterized in that the computer readable code is read by a processor to perform the open text generation method of any one of claims 1 to 7.
CN202210547656.XA 2022-05-18 2022-05-18 Open text generation method and storage medium Active CN114925658B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210547656.XA CN114925658B (en) 2022-05-18 2022-05-18 Open text generation method and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210547656.XA CN114925658B (en) 2022-05-18 2022-05-18 Open text generation method and storage medium

Publications (2)

Publication Number Publication Date
CN114925658A true CN114925658A (en) 2022-08-19
CN114925658B CN114925658B (en) 2023-04-28

Family

ID=82808428

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210547656.XA Active CN114925658B (en) 2022-05-18 2022-05-18 Open text generation method and storage medium

Country Status (1)

Country Link
CN (1) CN114925658B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117272253A (en) * 2023-11-23 2023-12-22 北京知呱呱科技有限公司 Method for embedding and detecting digital watermark in large language model generated text


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2004090804A1 (en) * 2003-04-03 2004-10-21 Wireaction Inc. Two-dimensional bar-code creating device, two-dimensional bar-code reader, two-dimensional bar-code creating method, two-dimensional bar-code reading method, and program
CN111858931A (en) * 2020-07-08 2020-10-30 华中师范大学 Text generation method based on deep learning
CN112559702A (en) * 2020-11-10 2021-03-26 西安理工大学 Transformer-based natural language problem generation method in civil construction information field
CN112988785A (en) * 2021-05-10 2021-06-18 浙江大学 SQL conversion method and system based on language model coding and multitask decoding
CN114462018A (en) * 2022-01-10 2022-05-10 电子科技大学 Password guessing system and method based on Transformer model and deep reinforcement learning

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WEIJIE YUAN et al.: "Text Generation with Syntax-Enhanced Variational Autoencoder" *
周青宇 et al.: "Open text generation method and storage medium" (开放性文本生成方法以及存储介质) *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117272253A (en) * 2023-11-23 2023-12-22 北京知呱呱科技有限公司 Method for embedding and detecting digital watermark in large language model generated text
CN117272253B (en) * 2023-11-23 2024-02-23 北京知呱呱科技有限公司 Method for embedding and detecting digital watermark in large language model generated text

Also Published As

Publication number Publication date
CN114925658B (en) 2023-04-28

Similar Documents

Publication Publication Date Title
Zhou et al. A comprehensive survey on pretrained foundation models: A history from bert to chatgpt
CN112818159B (en) Image description text generation method based on generation countermeasure network
CN106126507B (en) A kind of depth nerve interpretation method and system based on character code
CN114048464B (en) Ether house intelligent contract security vulnerability detection method and system based on deep learning
Zhou et al. Linguistic steganography based on adaptive probability distribution
CN109933808A (en) One kind is based on the decoded neural machine translation method of dynamic configuration
CN116450796B (en) Intelligent question-answering model construction method and device
CN114611492B (en) Text smoothing method, system and computer equipment
CN113535953A (en) Meta learning-based few-sample classification method
CN114691858B (en) Improved UNILM digest generation method
KR20240065281A (en) Vector-quantized image modeling
CN113535897A (en) Fine-grained emotion analysis method based on syntactic relation and opinion word distribution
CN114925658A (en) Open text generation method and storage medium
JP2023051724A (en) Method and device for customized deep learning-based text correction
Li et al. Diversified text-to-image generation via deep mutual information estimation
CN112926344B (en) Word vector replacement data enhancement-based machine translation model training method and device, electronic equipment and storage medium
CN110347853A (en) A kind of image hash code generation method based on Recognition with Recurrent Neural Network
CN111667006A (en) Method for generating family font based on AttGan model
CN114548090B (en) Fast relation extraction method based on convolutional neural network and improved cascade labeling
CN115659172A (en) Generation type text summarization method based on key information mask and copy
CN113032558B (en) Variable semi-supervised hundred degree encyclopedia classification method integrating wiki knowledge
CN112487231B (en) Automatic image labeling method based on double-image regularization constraint and dictionary learning
CN115101122A (en) Protein processing method, apparatus, storage medium, and computer program product
CN115422329A (en) Knowledge-driven multi-channel screening fusion dialogue generation method
CN114139011A (en) Image Chinese description generation method based on encoder-double decoder

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant