CN111538831A - Text generation method and device and electronic equipment - Google Patents

Text generation method and device and electronic equipment Download PDF

Info

Publication number
CN111538831A
Authority
CN
China
Prior art keywords
output
word
probability distribution
self
current step
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010502724.1A
Other languages
Chinese (zh)
Other versions
CN111538831B (en)
Inventor
梁忠平
温祖杰
张琳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010502724.1A
Publication of CN111538831A
Application granted
Publication of CN111538831B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34Browsing; Visualisation therefor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/332Query formulation
    • G06F16/3329Natural language query formulation or dialogue systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/34Browsing; Visualisation therefor
    • G06F16/345Summarisation for human users
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Molecular Biology (AREA)
  • Human Computer Interaction (AREA)
  • Machine Translation (AREA)

Abstract

One or more embodiments of the present specification provide a text generation method, apparatus, and electronic device. A text generation model is designed and constructed that adopts an encoder-decoder structure. The encoder encodes the input text, and the probability that an output word comes from the input text is determined based on the self-attention features that a self-attention mechanism generates for the words included in the input text. The probability that the output word comes from a dictionary, and the influence of the previous step's output word on the output probability of the current step's output word, are further determined by combining the self-attention feature generated by the encoder at the current step with the self-attention feature generated by the decoder at the previous step. The decoder outputs the output words step by step, finally yielding the output text.

Description

Text generation method and device and electronic equipment
Technical Field
One or more embodiments of the present disclosure relate to the field of natural language processing technologies, and in particular, to a text generation method and apparatus, and an electronic device.
Background
Text generation is a widely used natural language processing technique that can be applied to many natural language processing tasks, such as question answering systems, chat systems, and the like. In many application scenarios, output is continued with part of the content of the input text; that is, the generated output text is a continuous segment of the input text. For example, in a reading comprehension task, the output text should often be a continuous segment of the input text. However, existing language models do not address this case of continuous content output, so the generated text is not accurate enough when a continuous content output task is performed.
Disclosure of Invention
In view of this, one or more embodiments of the present disclosure are directed to a text generation method, a text generation device, and an electronic device.
In view of the above, one or more embodiments of the present specification provide a text generation method, including:
acquiring an input text;
inputting the input text into a pre-trained text generation model so that the text generation model generates an output text corresponding to the input text; the output text comprises a plurality of output words which are gradually output by the text generation model;
wherein the text generation model comprises an encoder, a decoder, a pointer network and a probability prediction network; the text generation model outputs an output word at each step, and the method comprises the following steps:
causing the encoder to generate first self-attention features for respective words in the input text based on a self-attention mechanism;
causing the decoder to generate a second self-attention feature for the current step based on a self-attention mechanism;
inputting the first self-attention feature and the second self-attention feature of the current step into the pointer network, so that the pointer network generates a first output probability distribution that the output words of the current step correspond to each word in the input text;
inputting the second self-attention feature of the current step into the probability prediction network so that the probability prediction network generates second output probability distribution that the output words of the current step correspond to all words in a preset dictionary;
acquiring a first output probability distribution of a previous step generated by the decoder;
and determining and outputting the output word of the current step according to the first output probability distribution, the second output probability distribution and the first output probability distribution of the previous step.
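For illustration only, the per-step processing listed above can be summarized in the following minimal Python sketch; the function and object names (generate_step, pointer_net, prob_net, combine, and so on) are hypothetical placeholders under assumed interfaces, not part of the claimed embodiments.

# Hypothetical sketch of one decoding step of the described text generation model.
# Names and shapes are illustrative assumptions, not the patented implementation.
def generate_step(encoder, decoder, pointer_net, prob_net,
                  input_words, prev_first_dist, combine):
    # 1. Encoder: first self-attention features for each word in the input text.
    first_feats = encoder.self_attention_features(input_words)      # [n_in, d]
    # 2. Decoder: second self-attention feature for the current step.
    second_feat = decoder.self_attention_feature_current_step()     # [d]
    # 3. Pointer network: distribution over the words of the input text.
    first_dist = pointer_net(first_feats, second_feat)              # [n_in]
    # 4. Probability prediction network: distribution over the preset dictionary.
    second_dist = prob_net(second_feat)                             # [vocab]
    # 5. Combine with the previous step's first output distribution and
    #    determine the current step's output word.
    return combine(first_dist, second_dist, prev_first_dist), first_dist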
Based on the same inventive concept, one or more embodiments of the present specification further provide a text generation apparatus, including:
an acquisition module configured to acquire an input text;
a generation module configured to input the input text into a pre-trained text generation model, so that the text generation model generates an output text corresponding to the input text; the output text comprises a plurality of output words which are gradually output by the text generation model;
wherein the text generation model comprises an encoder, a decoder, a pointer network and a probability prediction network; the text generation model outputs an output word at each step, and the method comprises the following steps: causing the encoder to generate first self-attention features for respective words in the input text based on a self-attention mechanism; causing the decoder to generate a second self-attention feature for the current step based on a self-attention mechanism; inputting the first self-attention feature and the second self-attention feature of the current step into the pointer network, so that the pointer network generates a first output probability distribution that the output words of the current step correspond to each word in the input text; inputting the second self-attention feature of the current step into the probability prediction network so that the probability prediction network generates second output probability distribution that the output words of the current step correspond to all words in a preset dictionary; acquiring a first output probability distribution of a previous step generated by the decoder; and determining and outputting the output word of the current step according to the first output probability distribution, the second output probability distribution and the first output probability distribution of the previous step.
Based on the same inventive concept, one or more embodiments of the present specification further provide an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor implements the text generation method as described in any one of the above items when executing the program.
As can be seen from the foregoing, the text generation method, apparatus and electronic device provided in one or more embodiments of the present specification design and construct a text generation model. The text generation model adopts an encoder-decoder structure: it encodes the input text and determines the probability that an output word comes from the input text based on the self-attention features that a self-attention mechanism generates for the words included in the input text; it further determines, by combining the self-attention feature generated by the encoder at the current step with the self-attention feature generated by the decoder at the previous step, the probability that the output word comes from a dictionary and the influence of the previous step's output word on the output probability of the current step's output word; and the decoder outputs the output words step by step to finally obtain the output text. The self-attention mechanism effectively captures the interaction between each word in the input text and each output word. Meanwhile, because the self-attention feature generated at the previous step is also used when the output word of the current step is generated, the text generation model can additionally take into account the influence of the previous step's output word on the current step when processing a continuous content output task, reflecting the actual situation of continuing output with continuous content in the input text, which effectively improves the accuracy of the output text.
Drawings
In order to more clearly illustrate one or more embodiments of the present specification or the solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description illustrate only one or more embodiments of the present specification, and that other drawings may be obtained by those skilled in the art from these drawings without inventive effort.
FIG. 1 is a flow diagram of a text generation method in accordance with one or more embodiments of the present disclosure;
FIG. 2 is a schematic diagram of a text generation model in accordance with one or more embodiments of the present disclosure;
FIG. 3 is a schematic structural diagram of a text generation apparatus according to one or more embodiments of the present disclosure;
fig. 4 is a schematic structural diagram of an electronic device according to one or more embodiments of the present disclosure.
Detailed Description
For the purpose of promoting a better understanding of the objects, aspects and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.
It is to be noted that unless otherwise defined, technical or scientific terms used in one or more embodiments of the present specification should have the ordinary meaning as understood by those of ordinary skill in the art to which this disclosure belongs. The use of "first," "second," and similar terms in one or more embodiments of the specification is not intended to indicate any order, quantity, or importance, but rather is used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect.
As described in the Background section, in natural language processing tasks there are cases where output is continued with part of the content of an input text, that is, the output text is a continuous segment of the input text (hereinafter simply referred to as continuous content output); this is common in reading comprehension tasks, summary generation tasks, and the like. In the process of implementing the present disclosure, the applicant found that existing language models generally produce output text of poor accuracy when processing a continuous content output task. The main reason is that existing language models only consider the context and take no special account of continuous content output, which degrades the accuracy of the generated output text on such tasks.
In view of the foregoing problems, one or more embodiments of the present specification provide a text generation scheme that designs and constructs a text generation model. The text generation model adopts an encoder-decoder structure: it encodes the input text and, based on the self-attention feature of each word included in the input text, further combined with the self-attention feature generated by the encoder at the current step and the self-attention feature generated by the decoder at the previous step, the decoder outputs the output words step by step to obtain the output text. The self-attention mechanism effectively captures the interaction between each word in the input text and each output word. Meanwhile, because the self-attention feature generated at the previous step is also used when the output word of the current step is generated, the text generation model can additionally take into account the influence of the previous step's output word on the current step when processing a continuous content output task, reflecting the actual situation of continuing output with continuous content in the input text, which effectively improves the accuracy of the output text.
Hereinafter, the text generation scheme of one or more embodiments of the present specification will be described in detail by specific examples.
One or more embodiments of the present specification provide a text generation method. Referring to fig. 1, the text generation method includes the following steps:
s101, acquiring an input text;
step S102, inputting the input text into a pre-trained text generation model so that the text generation model generates an output text corresponding to the input text.
In this embodiment, an input text is first acquired. Word segmentation is performed on the input text to obtain a plurality of words, which are arranged into a word sequence according to the order in which the words appear in the input text.
In this embodiment, a pre-trained text generation model is used. The text generation model is a language processing model adopting an encoder-decoder structure; its input is the word sequence, its output is the output words produced step by step, and all the output words together form the output text.
Specifically, referring to fig. 2, the text generation model may include: an input layer 201, an encoder 202, a decoder 203, and an output layer 207. The input layer 201 is configured to receive the input text and perform word embedding processing on it to obtain word vectors. The output word produced by the output layer 207 at each step of the text generation model is determined from the first self-attention features generated by the encoder 202 for the respective words in the input text based on the self-attention mechanism, the second self-attention feature generated by the decoder 203 at the current step based on the self-attention mechanism, and the second self-attention feature of the previous step of the decoder 203. The second self-attention feature of the previous step refers to the second self-attention feature generated when the decoder 203 output the output word of the previous step.
In this embodiment, the encoder 202 and the decoder 203 may employ a recurrent neural network, for example a long short-term memory network or a gated recurrent unit network. Both the encoder 202 and the decoder 203 are trained in advance. The training samples may be a large number of different training input texts and the target output texts corresponding to them; specifically, the word vectors of the words in a training input text are used as input, the corresponding target output text is used as the target, and any machine learning algorithm may be used for training to obtain the trained encoder 202 and decoder 203.
The input text is input into the text generation model of this embodiment, and the model outputs the output text corresponding to the input text. Referring to fig. 2, the specific processing procedure of the text generation model may include the following:
the obtained input text is input into the input layer 201, specifically, the input layer 201 is a sequence of words which are obtained by segmenting the input text and are arranged in sequence, and each word can be encoded into a vector form in a one-hot manner. The input layer 201 performs word embedding processing on the input text to extract the features of each word to obtain a word vector of each word in the input text, such as the word vector in fig. 2V X As shown. The algorithm used in the Word embedding process may be arbitrarily selected, such as Word2Vec, GloVe, and the like.
The word vectors of the words in the input text are input into the encoder 202 step by step, and the encoder 202 step by step generates a first hidden state for each word, where the first hidden state represents the combined semantics of the word of the current step and the words of all previous steps. The first hidden state is a vector whose dimension equals the number of neurons in the hidden layer of the encoder 202, and whose value in each dimension is the output value of the activation function of the corresponding neuron. Further, self-attention processing is performed on the first hidden states of the words based on a self-attention mechanism (Self-Attention) to obtain the first self-attention feature of each word in the input text, shown as H_X in fig. 2; the self-attention processing is indicated by the double-headed arrows in the encoder 202 in fig. 2. Specifically, for any word, dot products are computed between the first hidden state of that word and the first hidden states of that word and the other words, yielding a plurality of dot-product values in one-to-one correspondence with the first hidden states of the words; these dot-product values are then used as the weights of the corresponding first hidden states in a weighted summation to obtain the first self-attention feature of the word. Performing this self-attention processing for every word yields the first self-attention feature corresponding to each word in the input text. The first self-attention features obtained through the self-attention processing reflect the influence of the words in the input text on one another.
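A minimal sketch of this dot-product self-attention, assuming the first hidden states are collected in an [n, d] array and used without any extra normalization, as in the description above:

import numpy as np

def encoder_self_attention(first_hidden_states):
    # For each word, the dot products of its first hidden state with all first
    # hidden states serve as weights in a weighted sum of those hidden states.
    scores = first_hidden_states @ first_hidden_states.T   # [n, n] dot-product values
    return scores @ first_hidden_states                    # [n, d] first self-attention features H_X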
For the first step of the decoder 203, the first hidden state generated by the last step of the encoder 202 is input into the decoder 203; since there is no previous step at this time, a start symbol is also input into the decoder 203, and the decoder 203 generates the second hidden state corresponding to the first output word from the first hidden state generated by the last step of the encoder 202 and the start symbol. For each step after the first step, the decoder 203 generates the second hidden state of the current step from the second hidden state generated at the previous step and the output word generated at the previous step.
Specifically, for the output word of the current step, the decoder 203 obtains the second hidden states generated at the current step and at each previous step, and performs self-attention processing on the second hidden state of the current step based on a self-attention mechanism: dot products are computed between the second hidden state of the current step and the second hidden states generated at the current step and at each previous step, yielding a plurality of dot-product values in one-to-one correspondence with those second hidden states; these dot-product values are then used as the weights of the corresponding second hidden states in a weighted summation to obtain the second self-attention feature of the current step. The second hidden state represents the combined semantics of the output word of the current step and the output words of all previous steps, shown as H_R in fig. 2, where the pictorial object with diagonal lines represents the second self-attention feature of the current step. The second hidden state is a vector whose dimension equals the number of neurons in the hidden layer of the decoder 203, and whose value in each dimension is the output value of the activation function of the corresponding neuron. The second self-attention feature obtained by the self-attention processing reflects the mutual influence of the already output words and the current output word.
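A corresponding sketch for the decoder side, assuming the second hidden states generated so far are stacked in a [t, d] array with the current step in the last row:

import numpy as np

def decoder_self_attention(second_hidden_states):
    # Dot products between the current step's second hidden state and the
    # second hidden states of the current and all previous steps are used as
    # weights in a weighted sum, giving the second self-attention feature H_R.
    h_t = second_hidden_states[-1]                  # current step's second hidden state, [d]
    weights = second_hidden_states @ h_t            # [t] dot-product values
    return weights @ second_hidden_states           # [d] second self-attention feature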
In this embodiment, the second self-attention feature generated by the decoder 203 does not directly determine the output word to be output at the current step; further processing is performed within the text generation model. Referring to fig. 2, the text generation model of this embodiment further includes: a pointer network 204, a probability prediction network 205, and a perceptron 206. The second self-attention feature generated by the decoder 203 at the current step is further processed as follows:
The first self-attention features and the second self-attention feature of the current step are input into the pointer network 204, so that the pointer network 204 generates, from the first self-attention features and the second self-attention feature of the current step, the first output probability distribution that the output word of the current step corresponds to each word in the input text. A pointer network is characterized by taking the output as one of its inputs and being able to generate the probability, i.e. a probability distribution, that the output corresponds to each of the inputs. In this embodiment, the pointer network 204 processes the first self-attention features and the second self-attention feature of the current step as follows: the dot products between the second self-attention feature generated by the decoder 203 at the current step and the first self-attention features of the words in the input text are computed and then normalized by a Softmax function (so that the probability values in the first output probability distribution sum to 1), yielding the first output probability distribution that the output word of the current step corresponds to each word in the input text, shown as p_input in fig. 2, where a taller columnar object indicates a larger value. For any word in the input text, the dot product of its first self-attention feature with the second self-attention feature generated by the decoder 203 at the current step gives the probability value that the output word generated at the current step is that word.
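The dot-product-plus-Softmax computation attributed to the pointer network 204 can be sketched as follows (array shapes are illustrative assumptions):

import numpy as np

def pointer_network(H_X, h_r):
    # Dot products between the current step's second self-attention feature
    # h_r ([d]) and each input word's first self-attention feature in H_X
    # ([n, d]), normalized by Softmax into the distribution p_input ([n]).
    scores = H_X @ h_r
    exp = np.exp(scores - scores.max())             # numerically stable Softmax
    return exp / exp.sum()                          # probability values sum to 1

Under this sketch, an input word whose first self-attention feature aligns closely with the decoder's current-step feature receives a correspondingly higher probability of being produced from the input text.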
The second self-attention feature of the current step is input into the probability prediction network 205, so that the probability prediction network 205 generates, from the second self-attention feature of the current step, the second output probability distribution that the output word of the current step corresponds to each word in a preset dictionary. The probability prediction network 205 is obtained by pre-training; the training samples are a training corpus containing training texts and the corresponding target texts, and any machine learning algorithm may be used for training. The probability prediction network 205 predicts the probability distribution of the output word over the words of the preset dictionary. It is understood that the output word predicted by the probability prediction network 205 is not limited to a word in the input text but is a word from the preset dictionary. In this embodiment, the second output probability distribution reflects the probability that the output word of the current step comes from each word in the preset dictionary, that is, the case of discontinuous content output.
The first output probability distribution, the first self-attention features and the second self-attention feature of the current step are input into the perceptron 206, and the perceptron 206 performs a three-way classification whose purpose is to predict three cases for the output word generated at the current step: the output word of the current step comes from the input text; the output word of the current step comes from the preset dictionary; and the output word of the current step is continuous with the output word of the previous step, i.e. the output word of the current step is the word that follows the previous step's output word in the input text. Specifically, the perceptron 206 may be a single-hidden-layer or multi-hidden-layer artificial neural network that takes as input a context feature constructed from the first self-attention features and the first output probability distribution, and outputs the probabilities of the above three cases for the output word generated at the current step.
In this embodiment, the context feature is constructed as follows: the first output probability distribution is used as weights and the first self-attention features are weighted and summed to obtain the context feature. Specifically, the probability values included in the first output probability distribution and the first self-attention features both correspond one-to-one with the words in the input text, so each probability value in the first output probability distribution corresponds one-to-one with a first self-attention feature; each first self-attention feature is multiplied by its corresponding probability value in the first output probability distribution and the results are summed to obtain a vector, which is used as the context feature, shown as C in fig. 2.
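A minimal sketch of this weighted summation, under the same illustrative shapes as above:

import numpy as np

def context_feature(p_input, H_X):
    # The first output probability distribution p_input ([n]) is used as the
    # weights in a weighted sum of the first self-attention features H_X ([n, d]).
    return p_input @ H_X                            # [d] context feature C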
The perceptron 206 makes a prediction from the input context feature and, after normalization by a Softmax function, generates a first output weight, a second output weight and a third output weight, shown in fig. 2 as q_input, q_vocab and q_copy. The first output weight represents the probability that the output word of the current step comes from the input text; the second output weight represents the probability that the output word of the current step comes from the dictionary; the third output weight represents the probability that the output word of the current step is the word that follows the previous step's output word in the input text.
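A possible sketch of the perceptron 206 as a single-hidden-layer network with a Softmax output; the parameter matrices W1, b1, W2, b2 are hypothetical trained weights, and the exact input construction and layer sizes are assumptions for illustration.

import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def perceptron_weights(C, h_r, W1, b1, W2, b2):
    # Single hidden layer over the context feature C and the current step's
    # second self-attention feature h_r, followed by a three-way Softmax.
    x = np.concatenate([C, h_r])
    h = np.tanh(W1 @ x + b1)                        # hidden layer
    q_input, q_vocab, q_copy = softmax(W2 @ h + b2) # three probabilities summing to 1
    return q_input, q_vocab, q_copy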
The first output weight, the first output probability distribution, the second output weight, the second output probability distribution and the third output weight are input into the output layer 207, and the output layer 207 generates the final output probability distribution that the output word of the current step corresponds to each word in the input text and the dictionary. In this embodiment, the process of generating the final output probability distribution includes:
and for the first output weight and the first output probability distribution, multiplying the first output weight by the first output probability distribution, namely multiplying each probability included in the first output probability distribution by the first output weight to obtain the first weighted output probability distribution. The first weighted output probability distribution includes: a first weighted output probability for each word in the input text.
The first output probability distribution of the previous step is acquired, and the third output weight is multiplied by the first output probability distribution of the previous step to obtain the continuous output probability of each word in the input text. Then, for any word in the input text, the first weighted output probability of that word is added to the continuous output probability of the word that precedes it in the input text, to obtain a combined output probability distribution.
The process of obtaining the combined output probability distribution is illustrated below with a specific example. Suppose the input text includes three words, the first output probability distribution generated by the pointer network 204 is (0.6, 0.3, 0.1), and the first output probability distribution of the previous step is (0.5, 0.3, 0.2). The first output weight predicted by the perceptron 206 is 0.2 and the third output weight is 0.7. The first weighted output probability distribution is then 0.2 × (0.6, 0.3, 0.1) = (0.12, 0.06, 0.02), and the continuous output probability of each word is 0.7 × (0.5, 0.3, 0.2) = (0.35, 0.21, 0.14). For each word, its first weighted output probability is added to the continuous output probability of the word that precedes it in the input text, i.e. (0.12, 0.06 + 0.35, 0.02 + 0.21) = (0.12, 0.41, 0.23), which gives the combined output probability distribution.
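The arithmetic of this example can be reproduced with the following short snippet (the numbers are taken directly from the example above):

import numpy as np

p_input      = np.array([0.6, 0.3, 0.1])   # first output probability distribution, current step
p_input_prev = np.array([0.5, 0.3, 0.2])   # first output probability distribution, previous step
q_input, q_copy = 0.2, 0.7                  # first and third output weights

first_weighted = q_input * p_input          # (0.12, 0.06, 0.02)
continuous     = q_copy * p_input_prev      # (0.35, 0.21, 0.14)

# Each word adds the continuous output probability of the word that precedes
# it in the input text; the first word has no predecessor.
combined = first_weighted + np.concatenate(([0.0], continuous[:-1]))
print(combined)                             # [0.12 0.41 0.23]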
The second output weight is multiplied by the second output probability distribution, that is, each probability included in the second output probability distribution is multiplied by the second output weight, to obtain the second weighted output probability distribution.
The obtained first weighted output probability distribution, second weighted output probability distribution and combined output probability distribution are taken as the final output probability distribution. The final output probability distribution includes, for the three cases that the output word of the current step comes from the input text, that it comes from the preset dictionary, and that it is continuous with the output word of the previous step, the probability of each word in the input text and each word in the dictionary being output as the output word of the current step. The final output probability distribution is shown as p_final in fig. 2, where a taller columnar object indicates a larger value, that is, a higher probability that the corresponding word is output as the output word.
Finally, the word corresponding to the maximum value in the final output probability distribution is taken as the output word output at the current step.
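A minimal sketch of this final selection step, under one reading of how the distributions are assembled (the combined distribution, which already folds in the first weighted output probabilities, is taken to cover the input words, and the second weighted distribution to cover the dictionary):

import numpy as np

def select_output_word(combined, second_weighted, input_words, dictionary):
    # Concatenate the distribution over the input words with the distribution
    # over the dictionary to form p_final, then emit the argmax word.
    p_final = np.concatenate([np.asarray(combined), np.asarray(second_weighted)])
    candidates = list(input_words) + list(dictionary)
    return candidates[int(np.argmax(p_final))]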
It can be seen that the text generation method of this embodiment designs and constructs a text generation model for application scenarios of continuous content output. The encoder generates the self-attention feature of each word included in the input text based on a self-attention mechanism, and this is combined with the self-attention feature generated by the decoder at the current step and the self-attention feature of the previous step, so that the text generation model can comprehensively consider the three cases in which the output word of the current step comes from the input text, comes from the preset dictionary, or is continuous with the output word of the previous step. In particular, taking into account the influence of the previous step's output word on the current step reflects the actual situation of continuing output with continuous content in the input text, which effectively improves the accuracy of the output text when a continuous content output text generation task is performed. In addition, the text generation model is an end-to-end model, so it is more convenient to train and use and has a smaller error.
It should be noted that, to keep fig. 2 simple, the reference markers in fig. 2 for technical features such as the first self-attention features, the second self-attention feature, the first output probability distribution, the second output probability distribution and the final output probability distribution each point to only one pictorial object, even though there are several such features; the specific number of each feature equals the number of words in the corresponding text. That is, identical pictorial objects within the same component of the text generation model in fig. 2 all represent the corresponding technical feature.
It should be noted that the method of one or more embodiments of the present disclosure may be performed by a single device, such as a computer or server. The method of the embodiment can also be applied to a distributed scene and completed by the mutual cooperation of a plurality of devices. In such a distributed scenario, one of the devices may perform only one or more steps of the method of one or more embodiments of the present disclosure, and the devices may interact with each other to complete the method.
It should be noted that the above description describes certain embodiments of the present disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Based on the same inventive concept, referring to fig. 3, one or more embodiments of the present specification further provide a text generation apparatus, including:
an obtaining module 301 configured to obtain an input text;
a generation module 302 configured to input the input text into a pre-trained text generation model, so that the text generation model generates an output text corresponding to the input text; the output text comprises a plurality of output words which are gradually output by the text generation model;
wherein the text generation model comprises an encoder, a decoder, a pointer network and a probability prediction network; the text generation model outputs an output word at each step, and the method comprises the following steps: causing the encoder to generate first self-attention features for respective words in the input text based on a self-attention mechanism; causing the decoder to generate a second self-attention feature for the current step based on a self-attention mechanism; inputting the first self-attention feature and the second self-attention feature of the current step into the pointer network, so that the pointer network generates a first output probability distribution that the output words of the current step correspond to each word in the input text; inputting the second self-attention feature of the current step into the probability prediction network so that the probability prediction network generates second output probability distribution that the output words of the current step correspond to all words in a preset dictionary; acquiring a first output probability distribution of a previous step generated by the decoder; and determining and outputting the output word of the current step according to the first output probability distribution, the second output probability distribution and the first output probability distribution of the previous step.
For convenience of description, the above apparatus is described as being divided into various modules by function. Of course, when implementing one or more embodiments of the present specification, the functionality of the modules may be implemented in the same piece or pieces of software and/or hardware.
The apparatus of the foregoing embodiment is used to implement the corresponding method in the foregoing embodiment, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
As an optional embodiment, the text generation model further comprises an input layer;
the generating module 302 is specifically configured to input the input text into the input layer, so that the input layer generates a word vector of each word in the input text; and inputting the word vector into the encoder so that the encoder generates a first hidden state for each word in the input text respectively, and generating the first self-attention feature for each word in the input text respectively according to the first hidden state based on a self-attention mechanism.
As an optional embodiment, the generating module 302 is specifically configured to enable the decoder to obtain a current step and a second hidden state generated in each previous step, and generate a second self-attention feature of the current step according to the second hidden state based on a self-attention mechanism.
As an optional embodiment, the text generation model further comprises a perceptron;
the generating module 302 is specifically configured to input the first output probability distribution, the first self-attention feature, and the second self-attention feature of the current step into the perceptron, so that the perceptron generates a context feature of the current step, and generates a first output weight, a second output weight, and a third output weight of the current step according to the context feature and the second self-attention feature of the current step; wherein the first output weight value represents a probability that the output word of the current step is from the input text; the second output weight value represents the probability that the output word of the current step comes from the dictionary; the third output weight value represents the probability that the output word of the current step is the next word of the output word of the previous step in the input text.
As an optional embodiment, the text generation model further comprises an output layer;
the generating module 302 is specifically configured to input the first output weight, the first output probability distribution, the second output weight, the second output probability distribution, and the third output weight into the output layer, so that the output layer generates a final output probability distribution that an output word in a current step corresponds to each word in the input text and the dictionary, and uses a word corresponding to a maximum value in the final output probability distribution as an output word output in the current step;
wherein the step of generating the final output probability distribution comprises: multiplying the first output weight by the first output probability distribution to obtain a first weighted output probability distribution; the first weighted output probability distribution includes: a first weighted output probability for each word in the input text; acquiring first output probability distribution of the previous step, and multiplying the third output weight by the first output probability distribution of the previous step to obtain continuous output probability of each word in the input text; for any word in the input text, adding the first weighted output probability of the word and the continuous output probability of the previous word of the word in the input text to obtain a combined output probability distribution; multiplying the second output weight by the second output probability distribution to obtain a second weighted output probability distribution; and taking the first weighted output probability distribution, the second weighted output probability distribution and the combined output probability distribution as the final output probability distribution.
As an optional embodiment, the inputting the first output probability distribution, the first self-attention feature, and the second self-attention feature of the current step into the perceptron to enable the perceptron to generate a context feature of the current step specifically includes: enabling the perceptron to take the first output probability distribution as a weight, and carrying out weighted summation on the first self-attention feature to obtain the context feature.
Based on the same inventive concept, one or more embodiments of the present specification further provide an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the processor implements the text generation method according to any one of the above embodiments.
Fig. 4 is a schematic diagram illustrating a more specific hardware structure of an electronic device according to this embodiment, where the electronic device may include: a processor 1010, a memory 1020, an input/output interface 1030, a communication interface 1040, and a bus 1050. Wherein the processor 1010, memory 1020, input/output interface 1030, and communication interface 1040 are communicatively coupled to each other within the device via bus 1050.
The processor 1010 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application Specific Integrated Circuit (ASIC), or one or more Integrated circuits, and is configured to execute related programs to implement the technical solutions provided in the embodiments of the present disclosure.
The Memory 1020 may be implemented in the form of a ROM (Read Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1020 may store an operating system and other application programs; when the technical solutions provided by the embodiments of the present specification are implemented by software or firmware, the relevant program code is stored in the memory 1020 and called and executed by the processor 1010.
The input/output interface 1030 is used for connecting an input/output module to input and output information. The i/o module may be configured as a component in a device (not shown) or may be external to the device to provide a corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.
The communication interface 1040 is used for connecting a communication module (not shown in the drawings) to implement communication interaction between the present apparatus and other apparatuses. The communication module can realize communication in a wired mode (such as USB, network cable and the like) and also can realize communication in a wireless mode (such as mobile network, WIFI, Bluetooth and the like).
Bus 1050 includes a path that transfers information between various components of the device, such as processor 1010, memory 1020, input/output interface 1030, and communication interface 1040.
It should be noted that although the above-mentioned device only shows the processor 1010, the memory 1020, the input/output interface 1030, the communication interface 1040 and the bus 1050, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.
Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to intimate that the scope of the disclosure, including the claims, is limited to these examples; within the spirit of the present disclosure, features from the above embodiments or from different embodiments may also be combined, steps may be implemented in any order, and there are many other variations of different aspects of one or more embodiments of the present description as described above, which are not provided in detail for the sake of brevity.
It is intended that the one or more embodiments of the present specification embrace all such alternatives, modifications and variations as fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of one or more embodiments of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (13)

1. A text generation method, comprising:
acquiring an input text;
inputting the input text into a pre-trained text generation model so that the text generation model generates an output text corresponding to the input text; the output text comprises a plurality of output words which are gradually output by the text generation model;
wherein the text generation model comprises an encoder, a decoder, a pointer network and a probability prediction network; the text generation model outputs an output word at each step, and the method comprises the following steps:
causing the encoder to generate first self-attention features for respective words in the input text based on a self-attention mechanism;
causing the decoder to generate a second self-attention feature for the current step based on a self-attention mechanism;
inputting the first self-attention feature and the second self-attention feature of the current step into the pointer network, so that the pointer network generates a first output probability distribution that the output words of the current step correspond to each word in the input text;
inputting the second self-attention feature of the current step into the probability prediction network so that the probability prediction network generates second output probability distribution that the output words of the current step correspond to all words in a preset dictionary;
acquiring a first output probability distribution of a previous step generated by the decoder;
and determining and outputting the output word of the current step according to the first output probability distribution, the second output probability distribution and the first output probability distribution of the previous step.
2. The method of claim 1, the text generation model further comprising an input layer;
the causing the encoder to generate a first self-attention feature for each word in the input text based on a self-attention mechanism includes:
inputting the input text into the input layer to enable the input layer to generate word vectors of all words in the input text;
and inputting the word vector into the encoder so that the encoder generates a first hidden state for each word in the input text respectively, and generating the first self-attention feature for each word in the input text respectively according to the first hidden state based on a self-attention mechanism.
3. The method according to claim 2, wherein the causing the decoder to generate the second self-attention feature of the current step based on a self-attention mechanism includes:
and enabling the decoder to acquire a current step and a second hidden state generated by each previous step, and generating a second self-attention feature of the current step according to the second hidden state based on a self-attention mechanism.
4. The method of claim 3, the text generation model further comprising a perceptron;
determining and outputting the output word of the current step according to the first output probability distribution, the second output probability distribution and the first output probability distribution of the previous step, specifically comprising:
inputting the first output probability distribution, the first self-attention feature and the second self-attention feature of the current step into the perceptron so that the perceptron generates a context feature of the current step, and predicting and generating a first output weight, a second output weight and a third output weight of the current step according to the context feature and the second self-attention feature of the current step; wherein the first output weight value represents a probability that the output word of the current step is from the input text; the second output weight value represents the probability that the output word of the current step comes from the dictionary; the third output weight value represents the probability that the output word of the current step is the next word of the output word of the previous step in the input text.
5. The method of claim 4, the text generation model further comprising an output layer;
determining and outputting the output word of the current step according to the first output probability distribution, the second output probability distribution and the first output probability distribution of the previous step, specifically comprising:
inputting the first output weight, the first output probability distribution, the second output weight, the second output probability distribution and the third output weight into the output layer, so that the output layer generates a final output probability distribution that the output words of the current step correspond to each word in the input text and the dictionary, and takes the word corresponding to the maximum value in the final output probability distribution as the output word output by the current step;
wherein the step of generating the final output probability distribution comprises:
multiplying the first output weight by the first output probability distribution to obtain a first weighted output probability distribution; the first weighted output probability distribution includes: a first weighted output probability for each word in the input text;
acquiring first output probability distribution of the previous step, and multiplying the third output weight by the first output probability distribution of the previous step to obtain continuous output probability of each word in the input text;
for any word in the input text, adding the first weighted output probability of the word and the continuous output probability of the previous word of the word in the input text to obtain a combined output probability distribution;
multiplying the second output weight by the second output probability distribution to obtain a second weighted output probability distribution;
and taking the first weighted output probability distribution, the second weighted output probability distribution and the combined output probability distribution as the final output probability distribution.
6. The method of claim 4, wherein said inputting said first output probability distribution, said first self-attention feature and said second self-attention feature of the current step into said perceptron to cause said perceptron to generate a context feature of the current step, comprises:
and enabling the perceptron to take the first output probability distribution as a weight, and carrying out weighted summation on the first self-attention feature to obtain the context feature.
7. A text generation apparatus comprising:
an acquisition module configured to acquire an input text;
a generation module configured to input the input text into a pre-trained text generation model, so that the text generation model generates an output text corresponding to the input text; the output text comprises a plurality of output words which are gradually output by the text generation model;
wherein the text generation model comprises an encoder, a decoder, a pointer network and a probability prediction network; the text generation model outputs an output word at each step, and the method comprises the following steps: causing the encoder to generate first self-attention features for respective words in the input text based on a self-attention mechanism; causing the decoder to generate a second self-attention feature for the current step based on a self-attention mechanism; inputting the first self-attention feature and the second self-attention feature of the current step into the pointer network, so that the pointer network generates a first output probability distribution that the output words of the current step correspond to each word in the input text; inputting the second self-attention feature of the current step into the probability prediction network so that the probability prediction network generates second output probability distribution that the output words of the current step correspond to all words in a preset dictionary; acquiring a first output probability distribution of a previous step generated by the decoder; and determining and outputting the output word of the current step according to the first output probability distribution, the second output probability distribution and the first output probability distribution of the previous step.
8. The apparatus of claim 7, wherein the text generation model further comprises an input layer;
the generation module is specifically configured to input the input text into the input layer, so that the input layer generates a word vector for each word in the input text; and to input the word vectors into the encoder, so that the encoder generates a first hidden state for each word in the input text and, based on a self-attention mechanism, generates the first self-attention feature for each word in the input text according to the first hidden states.
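A single-head NumPy sketch of the encoder-side self-attention over the first hidden states; the projection matrices and the single attention head are assumptions for illustration.

```python
import numpy as np

def first_self_attention(first_hidden_states, Wq, Wk, Wv):
    # first_hidden_states: (src_len, hidden_dim), one row per input word
    Q = first_hidden_states @ Wq
    K = first_hidden_states @ Wk
    V = first_hidden_states @ Wv
    scores = Q @ K.T / np.sqrt(K.shape[-1])           # scaled dot-product attention
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # row-wise softmax
    return weights @ V                                # (src_len, hidden_dim): one feature per word
```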
9. The apparatus of claim 8, wherein the generation module is specifically configured to cause the decoder to obtain the second hidden states generated at the current step and at each previous step, and to generate the second self-attention feature of the current step from those second hidden states based on a self-attention mechanism.
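The decoder-side counterpart attends from the current step over the second hidden states of the current and all previous steps; again a single-head sketch with assumed projection matrices.

```python
import numpy as np

def second_self_attention(second_hidden_states, Wq, Wk, Wv):
    # second_hidden_states: (t + 1, hidden_dim), decoder steps 0..t
    q = second_hidden_states[-1] @ Wq                 # query from the current step only
    K = second_hidden_states @ Wk
    V = second_hidden_states @ Wv
    scores = q @ K.T / np.sqrt(K.shape[-1])
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                          # softmax over current + previous steps
    return weights @ V                                # (hidden_dim,): second self-attention feature
```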
10. The apparatus of claim 9, wherein the text generation model further comprises a perceptron;
the generation module is specifically configured to input the first output probability distribution, the first self-attention feature and the second self-attention feature of the current step into the perceptron, so that the perceptron generates a context feature of the current step and predicts a first output weight, a second output weight and a third output weight of the current step according to the context feature and the second self-attention feature of the current step; wherein the first output weight represents the probability that the output word of the current step comes from the input text; the second output weight represents the probability that the output word of the current step comes from the dictionary; and the third output weight represents the probability that the output word of the current step is the next word, in the input text, of the output word of the previous step.
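One plausible realisation of the perceptron's prediction of the three output weights is a single linear layer over the concatenated features followed by a softmax, so that the three weights behave as the probabilities described above; the layer shape and the softmax normalisation are assumptions, not stated in the claim.

```python
import numpy as np

def output_weights(context_feat, second_feat, W, b):
    # W: (3, 2 * hidden_dim), b: (3,) -- one logit per weight (copy / generate / continue)
    x = np.concatenate([context_feat, second_feat])
    logits = W @ x + b
    e = np.exp(logits - logits.max())
    w1, w2, w3 = e / e.sum()                          # the three weights sum to one
    return w1, w2, w3
```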
11. The apparatus of claim 10, wherein the text generation model further comprises an output layer;
the generation module is specifically configured to input the first output weight, the first output probability distribution, the second output weight, the second output probability distribution and the third output weight into the output layer, so that the output layer generates a final output probability distribution of the output word of the current step over each word in the input text and the dictionary, and takes the word corresponding to the maximum value in the final output probability distribution as the output word of the current step;
wherein the step of generating the final output probability distribution comprises: multiplying the first output weight by the first output probability distribution to obtain a first weighted output probability distribution, wherein the first weighted output probability distribution comprises a first weighted output probability for each word in the input text; acquiring the first output probability distribution of the previous step, and multiplying the third output weight by the first output probability distribution of the previous step to obtain a continued output probability for each word in the input text; for any word in the input text, adding the first weighted output probability of that word to the continued output probability of the word immediately preceding it in the input text, so as to obtain a combined output probability distribution; multiplying the second output weight by the second output probability distribution to obtain a second weighted output probability distribution; and taking the first weighted output probability distribution, the second weighted output probability distribution and the combined output probability distribution as the final output probability distribution.
12. The apparatus of claim 10, wherein the inputting of the first output probability distribution, the first self-attention feature and the second self-attention feature of the current step into the perceptron, so that the perceptron generates the context feature of the current step, specifically comprises: causing the perceptron to take the first output probability distribution as weights and perform a weighted summation over the first self-attention features to obtain the context feature.
13. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any one of claims 1 to 6 when executing the program.
CN202010502724.1A 2020-06-05 2020-06-05 Text generation method and device and electronic equipment Active CN111538831B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010502724.1A CN111538831B (en) 2020-06-05 2020-06-05 Text generation method and device and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010502724.1A CN111538831B (en) 2020-06-05 2020-06-05 Text generation method and device and electronic equipment

Publications (2)

Publication Number Publication Date
CN111538831A true CN111538831A (en) 2020-08-14
CN111538831B CN111538831B (en) 2023-04-18

Family

ID=71978233

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010502724.1A Active CN111538831B (en) 2020-06-05 2020-06-05 Text generation method and device and electronic equipment

Country Status (1)

Country Link
CN (1) CN111538831B (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110125746A1 (en) * 2007-12-20 2011-05-26 Forensic Pathways Dynamic machine assisted informatics
AU2015261598A1 (en) * 2011-11-04 2015-12-17 Intel Corporation Downlink resource scheduling
CN110309292A (en) * 2018-03-08 2019-10-08 奥多比公司 Abstract summary is carried out to lengthy document using deep learning
US20190287012A1 (en) * 2018-03-16 2019-09-19 Microsoft Technology Licensing, Llc Encoder-decoder network with intercommunicating encoder agents
US20190362020A1 (en) * 2018-05-22 2019-11-28 Salesforce.Com, Inc. Abstraction of text summarizaton
DE102019207712A1 (en) * 2018-05-31 2019-12-05 Robert Bosch Gmbh Slot filling in spoken language comprehension with a combination of pointer and attention
CN110555097A (en) * 2018-05-31 2019-12-10 罗伯特·博世有限公司 Slot filling with joint pointer and attention in spoken language understanding
WO2019228065A1 (en) * 2018-06-01 2019-12-05 Beijing Didi Infinity Technology And Development Co., Ltd. Systems and methods for processing queries
US20200090034A1 (en) * 2018-09-18 2020-03-19 Salesforce.Com, Inc. Determining Intent from Unstructured Input to Update Heterogeneous Data Stores
CN110188176A (en) * 2019-04-30 2019-08-30 深圳大学 Deep learning neural network and training, prediction technique, system, equipment, medium
CN110209801A (en) * 2019-05-15 2019-09-06 华南理工大学 A kind of text snippet automatic generation method based on from attention network
CN111159394A (en) * 2019-12-31 2020-05-15 重庆觉晓教育科技有限公司 Text abstract generation method and device

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113255292A (en) * 2021-06-23 2021-08-13 中国平安人寿保险股份有限公司 End-to-end text generation method based on pre-training model and related equipment

Also Published As

Publication number Publication date
CN111538831B (en) 2023-04-18

Similar Documents

Publication Publication Date Title
US20200251099A1 (en) Generating Target Sequences From Input Sequences Using Partial Conditioning
KR102170199B1 (en) Classify input examples using comparison sets
CN110766142A (en) Model generation method and device
CN110326002B (en) Sequence processing using online attention
CN111563593B (en) Training method and device for neural network model
CN111104516B (en) Text classification method and device and electronic equipment
CN111401036B (en) Method and device for generating reference resolution text and electronic equipment
CN111461445A (en) Short-term wind speed prediction method and device, computer equipment and storage medium
CN113160819A (en) Method, apparatus, device, medium and product for outputting animation
CN111538831B (en) Text generation method and device and electronic equipment
CN110889290B (en) Text encoding method and apparatus, text encoding validity checking method and apparatus
US10402719B1 (en) Generating output sequences from input sequences using neural networks
CN116401522A (en) Financial service dynamic recommendation method and device
CN113468857B (en) Training method and device for style conversion model, electronic equipment and storage medium
CN111460126B (en) Reply generation method and device for man-machine conversation system and electronic equipment
CN113919584A (en) Self-adaptive short-time passenger flow prediction method based on LSTM and related equipment
CN111339432A (en) Recommendation method and device of electronic object and electronic equipment
CN114792086A (en) Information extraction method, device, equipment and medium supporting text cross coverage
CN111382562B (en) Text similarity determination method and device, electronic equipment and storage medium
CN113361712B (en) Training method of feature determination model, semantic analysis method, semantic analysis device and electronic equipment
CN112817604B (en) Android system control intention identification method and device, electronic equipment and storage medium
CN116757254B (en) Task processing method, electronic device and storage medium
CN117744312A (en) Proxy model generation method, device, equipment, storage medium and program product
CN111241263A (en) Text generation method and device and electronic equipment
CN111522917A (en) Dialogue emotion detection method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant