CN115630651A - Text generation method and training method and device of text generation model - Google Patents
Text generation method and training method and device of text generation model
- Publication number
- CN115630651A (application number CN202211306837.XA)
- Authority
- CN
- China
- Prior art keywords
- text
- feature
- sequence
- characteristic
- unit
- Prior art date
- Legal status: Granted
Classifications
- G06F40/30 — Handling natural language data; Semantic analysis
- G06F16/383 — Information retrieval; Retrieval characterised by using metadata automatically derived from the content
- G06N3/02, G06N3/08 — Computing arrangements based on biological models; Neural networks; Learning methods
Abstract
The present disclosure provides a text generation method and a training method and apparatus of a text generation model, and relates to the field of artificial intelligence, in particular to the technical fields of deep learning, natural language processing, and intelligent speech. A specific implementation scheme of the text generation method is as follows: preprocessing a text to be processed to obtain an embedded feature sequence, wherein the embedded feature sequence comprises embedded features corresponding to text units; inputting the embedded feature sequence into an attention network composed of encoding units to obtain a text feature sequence output by the attention network; and decoding the text feature sequence to generate a subsequent text of the text to be processed, wherein each encoding unit is configured to perform the following operations: encoding the input feature sequence by using an attention mechanism to obtain a first feature sequence; adjusting the first feature sequence according to a hidden state feature to obtain a second feature sequence, wherein the hidden state feature represents the semantics of the preceding text of the text to be processed; and updating the hidden state feature according to the second feature sequence.
Description
Technical Field
The present disclosure relates to the field of artificial intelligence, in particular to the technical fields of deep learning, natural language processing, and intelligent speech, and more particularly to a text generation method, a training method of a text generation model, and a corresponding apparatus, device, and medium.
Background
With the development of computer technology and network technology, the self-attention mechanism, inspired by visual attention, has been widely applied. For example, in the field of natural language processing, the self-attention mechanism can be relied upon to capture long-range semantic features in text. However, because its encoding length is limited, the self-attention mechanism is usually unable to remember long-term information.
Disclosure of Invention
The present disclosure aims to provide a text generation method, apparatus, electronic device, and storage medium that can generate text using long-term memory, so as to improve the accuracy of the generated text.
According to an aspect of the present disclosure, there is provided a text generation method including: preprocessing a text to be processed to obtain an embedded feature sequence, wherein the embedded feature sequence comprises embedded features corresponding to text units in the text to be processed; inputting the embedded feature sequence into an attention network composed of encoding units to obtain a text feature sequence output by the attention network; and decoding the text feature sequence to generate a subsequent text of the text to be processed, wherein the encoding unit is configured to perform the following operations: encoding the input feature sequence by using an attention mechanism to obtain a first feature sequence; adjusting the first feature sequence according to a hidden state feature to obtain a second feature sequence, wherein the hidden state feature represents the semantics of the preceding text of the text to be processed; and updating the hidden state feature according to the second feature sequence.
According to another aspect of the present disclosure, there is provided a training method of a text generation model, wherein the text generation model includes a preprocessing network, an attention network, and a decoding network, the attention network being composed of encoding units. The training method comprises: preprocessing each target text in a text sequence by using the preprocessing network to obtain an embedded feature sequence, wherein the embedded feature sequence comprises embedded features corresponding to text units in each target text; inputting the embedded feature sequence into the attention network to obtain a text feature sequence output by the attention network; decoding the text feature sequence by using the decoding network to generate a predicted subsequent text of each target text; and training the text generation model according to the predicted subsequent text and the adjacent subsequent text of each target text in the text sequence, wherein the encoding unit is configured to perform the following operations: encoding the input feature sequence by using an attention mechanism to obtain a first feature sequence; adjusting the first feature sequence according to a hidden state feature to obtain a second feature sequence, wherein the hidden state feature represents the semantics of the preceding text of each target text; and updating the hidden state feature according to the second feature sequence.
According to another aspect of the present disclosure, there is provided a text generation apparatus including: a preprocessing module for preprocessing a text to be processed to obtain an embedded feature sequence, the embedded feature sequence comprising embedded features corresponding to text units in the text to be processed; a text feature obtaining module for inputting the embedded feature sequence into an attention network composed of encoding units to obtain a text feature sequence output by the attention network; and a feature decoding module for decoding the text feature sequence to generate a subsequent text of the text to be processed, wherein the text feature obtaining module comprises: an encoding submodule for encoding, for the encoding unit, the input feature sequence by using an attention mechanism to obtain a first feature sequence; an adjusting submodule for adjusting the first feature sequence according to a hidden state feature to obtain a second feature sequence, wherein the hidden state feature represents the semantics of the preceding text of the text to be processed; and an updating submodule for updating the hidden state feature according to the second feature sequence.
According to another aspect of the present disclosure, there is provided a training apparatus for a text generation model, wherein the text generation model includes a preprocessing network, an attention network, and a decoding network, the attention network being composed of encoding units. The training apparatus comprises: a preprocessing module for preprocessing each target text in a text sequence by using the preprocessing network to obtain an embedded feature sequence, the embedded feature sequence comprising embedded features corresponding to text units in each target text; a text feature obtaining module for inputting the embedded feature sequence into the attention network to obtain a text feature sequence output by the attention network; a feature decoding module for decoding the text feature sequence by using the decoding network to generate a predicted subsequent text of each target text; and a model training module for training the text generation model according to the predicted subsequent text and the adjacent subsequent text of each target text in the text sequence, wherein the text feature obtaining module comprises: an encoding submodule for encoding, for the encoding unit, the input feature sequence by using an attention mechanism to obtain a first feature sequence; an adjusting submodule for adjusting the first feature sequence according to a hidden state feature to obtain a second feature sequence, wherein the hidden state feature represents the semantics of the preceding text of each target text; and an updating submodule for updating the hidden state feature according to the second feature sequence.
According to another aspect of the present disclosure, there is provided an electronic device including: at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a text generation method and/or a training method of a text generation model provided by the present disclosure.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform a text generation method and/or a training method of a text generation model provided by the present disclosure.
According to another aspect of the present disclosure, there is provided a computer program product comprising a computer program/instructions stored on a readable storage medium and/or an electronic device, which, when executed by a processor, implements the text generation method and/or the training method of the text generation model provided by the present disclosure.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are included to provide a better understanding of the present solution and are not to be construed as limiting the present disclosure. Wherein:
FIG. 1 is a schematic diagram of an application scenario of a training method and apparatus for a text generation method and a text generation model according to an embodiment of the present disclosure;
FIG. 2 is a flow diagram of a text generation method according to an embodiment of the present disclosure;
FIG. 3 is a schematic diagram of adjusting a first sequence of features according to an embodiment of the present disclosure;
FIG. 4 is a schematic diagram of updating hidden state features according to an embodiment of the present disclosure;
FIG. 5 is an implementation schematic diagram of a text generation method according to an embodiment of the present disclosure;
FIG. 6 is a flow diagram illustrating a method of training a text generation model according to an embodiment of the present disclosure;
FIG. 7 is a block diagram of a structure of a text generation apparatus according to an embodiment of the present disclosure;
FIG. 8 is a block diagram of a structure of a training apparatus for a text generation model according to an embodiment of the present disclosure; and
FIG. 9 is a block diagram of an electronic device for implementing a method of text generation and/or a method of training a text generation model according to embodiments of the present disclosure.
Detailed Description
Exemplary embodiments of the present disclosure are described below with reference to the accompanying drawings, in which various details of the embodiments of the disclosure are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present disclosure. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
An attention network constructed based on the self-attention mechanism can encode a text segment, that is, perform encoding processing on the text segment and represent it as a vector sequence. For example, the attention network may be a Transformer network. However, such an attention network can only encode a short piece of text and does not have the ability to encode that piece in conjunction with the semantics of its preceding text. Although the attention network can adjust its network parameters by learning the semantics of a large amount of text during extensive pre-training, the network parameters are fixed after training. In an actual usage scenario, the attention network cannot encode the current text according to the semantics of previously input text and complete the prediction task on that basis. This makes it difficult for the attention network to grasp a large amount of vertical-domain information.
For example, when an attention network is applied to a question-and-answer scenario for a certain product, it cannot perform the question-and-answer task well if it lacks a detailed introduction document for that product. Instead, the attention network needs to be trained individually on a large number of introduction documents for that product, which leads to the problem that the generalization ability of the trained attention network is weak.
In order to solve the problem of weak generalization ability, methods such as transfer learning, meta-learning, pre-training, and prompt (Prompting) learning have emerged. However, these methods are still limited by the encoding length of the self-attention mechanism and can only encode short texts. As a result, when the current text is encoded, the semantics of its preceding text cannot be taken into account, which limits the accuracy of the information expressed by the encoded features and affects the precision of the generated text.
In order to solve this problem, the present disclosure provides a text generation method, a training method of a text generation model, and corresponding apparatuses, devices, and media. The application scenario of the methods and apparatuses provided by the present disclosure will be described in detail below with reference to fig. 1.
Fig. 1 is a schematic view of an application scenario of a text generation method and a training method and device of a text generation model according to an embodiment of the present disclosure.
As shown in fig. 1, the application scenario 100 of this embodiment may include an electronic device 110, and the electronic device 110 may be various electronic devices with processing functions, including but not limited to a smart phone, a tablet computer, a laptop computer, a desktop computer, a server, and so on.
The electronic device 110 may have text processing functionality for processing the entered text 120 to predict the subsequent text 130 of the text 120. In one embodiment, the electronic device 110 may also have, for example, a smart voice function for converting a voice signal provided by a user into a text 120 and generating a subsequent text 130 of the text, and simultaneously converting the subsequent text 130 into a voice signal for playing, so as to achieve smart interaction with the user.
Illustratively, electronic device 110 may employ a model that combines a self-attention mechanism and a recurrence mechanism to encode text 120. For example, a Transformer-XL model or a Block-Recurrent Transformer model designed for long sequences can be used, where XL stands for Extra Long. With these models, the encoding length of the attention network is not limited to a local window, and long-term information can be accumulated through recursion so as to influence and correct the subsequent inference process. For example, in a text generation scenario, the text generation model may include such a model combining a self-attention mechanism and a recurrence mechanism.
Illustratively, the electronic device 110 may also process the text 120 using the text generation method provided by the present disclosure to generate the subsequent text 130. In this way, the length of the semantics that the model can memorize is extended, and information from long ago can be continuously remembered, thereby improving the accuracy of the generated subsequent text 130. Accordingly, the electronic device 110 may employ the text generation model 140 provided by the present disclosure to implement the text generation method.
As shown in fig. 1, the application scenario 100 may further include a server 150, and the server 150 may be a background management server supporting the running of the client application in the electronic device 110. The electronic device 110 may be communicatively coupled to the server 150 via a network, which may include wired or wireless communication links. The server 150 may also be a cloud server, a server of a distributed system, or a server that incorporates a blockchain.
For example, the server 150 may train the text generation model 140 with a large amount of text, and in response to an acquisition request of the electronic device 110, send the trained text generation model 140 to the electronic device 110, so that the electronic device 110 generates the subsequent text 130 with the text generation model 140.
In an embodiment, the electronic device 110 may further send the text 120 to the server 150, and the server 150 processes the text 120 using the trained text generation model to obtain the subsequent text 130.
It should be noted that the text generation method provided in the present disclosure may be executed by the electronic device 110, and may also be executed by the server 150. Accordingly, the text generation apparatus provided by the present disclosure may be provided in the electronic device 110, and may also be provided in the server 150. The training method of the text generation model provided by the present disclosure may be performed by the server 150. Accordingly, the training device of the text generation model provided by the present disclosure may be provided in the server 150.
It should be understood that the number and type of electronic devices 110 and servers 150 in fig. 1 are merely illustrative. There may be any number and type of electronic devices 110 and servers 150, as desired for an implementation.
The text generation method provided by the present disclosure will be described in detail below with reference to fig. 2 to 5.
Fig. 2 is a flow diagram of a text generation method according to an embodiment of the present disclosure.
As shown in fig. 2, the text generation method 200 of this embodiment may include operations S210 to S230. Wherein, in operation S220, each encoding unit in the attention network may be configured to perform operations S221 to S223.
In operation S210, the text to be processed is preprocessed to obtain an embedded feature sequence.
According to an embodiment of the present disclosure, operation S210 may cut the text to be processed into a plurality of text units, and the plurality of text units may constitute a text unit sequence. For example, if L text units are obtained by the division, the text to be processed can be represented as X = (x_1, x_2, …, x_L), where x_i represents the i-th text unit among the L text units, L may be an integer greater than 1, and i is in the range [1, L]. A text unit may have any granularity, for example character granularity or word granularity, which is not limited by this disclosure.
According to an embodiment of the present disclosure, after obtaining the text unit sequence, this embodiment may perform embedding processing on the text unit sequence to obtain the embedded feature sequence. For example, for a text unit x_i, the embedded feature E(x_i) can be obtained by the embedding processing. In this embodiment, an embedding layer may be used, for example, to perform the embedding processing on the text units, so that a dense feature sequence is obtained through linear processing. For example, the embedding layer may be a fully connected layer, which is not limited by this disclosure. The embedded feature sequence obtained in this embodiment includes the embedded features corresponding to the text units in the text to be processed, that is, the embedded features in the embedded feature sequence correspond one-to-one to the text units in the text unit sequence.
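As an illustration only, the preprocessing described above might be sketched as follows; the word-granularity tokenizer, the vocabulary, and the embedding width are assumptions of the sketch, not choices fixed by the disclosure.

```python
import torch
import torch.nn as nn

class Preprocessor(nn.Module):
    """Cuts a text into text units and maps each unit to an embedded feature (sketch)."""

    def __init__(self, vocab: dict, d_model: int = 512):
        super().__init__()
        self.vocab = vocab
        # The embedding layer E(.) could be an nn.Embedding or a fully connected layer.
        self.embedding = nn.Embedding(len(vocab), d_model)

    def forward(self, text: str) -> torch.Tensor:
        # Split the text to be processed into L text units (word granularity assumed).
        units = text.split()
        ids = torch.tensor([self.vocab.get(u, 0) for u in units])
        # Return the embedded feature sequence: one embedding E(x_i) per text unit.
        return self.embedding(ids)  # shape (L, d_model)
```

A character- or subword-granularity tokenizer could replace text.split() without changing the rest of the sketch.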
In operation S220, the embedded feature sequence is input into the attention network composed of the encoding units, and a text feature sequence output by the attention network is obtained.
According to an embodiment of the present disclosure, the attention network may be composed of at least two encoding units connected in sequence, for example, and the attention network may be understood as an encoding network for processing the embedded feature sequence based on a self-attention mechanism to extract the context semantic features of the text X. The embodiment can input the embedded characteristic sequence into the coding unit arranged at the first position in at least two coding units which are connected in sequence, and after the processing of the at least two coding units, the characteristic sequence output by the coding unit arranged at the last position in the at least two coding units is used as the text characteristic sequence.
According to an embodiment of the present disclosure, each encoding unit in the attention network may be configured to perform operations S221 to S223, for example.
In operation S221, the input feature sequence is encoded using an attention mechanism, resulting in a first feature sequence.
According to an embodiment of the present disclosure, for the encoding unit ranked first, the input signature sequence may include the embedded signature sequence obtained in operation S210. For other coding units except the coding unit arranged at the head, the input characteristic sequence is the characteristic sequence output by the previous coding unit connected with the other coding units.
In this embodiment, the encoding of the input feature sequence may be implemented by performing attention operations on the features in the input feature sequence, so as to obtain the first feature sequence. For example, a self-attention mechanism may be employed to encode the input feature sequence; that is, an attention operation is performed between every two features according to the self-attention principle.
In an embodiment, the encoding unit may include a self-attention layer as in a Transformer encoder. The self-attention layer can be constructed using a multi-head attention mechanism; then, for the i-th feature in the input feature sequence, the encoding can be implemented by formula (1) to obtain the corresponding feature in the first feature sequence, where j represents the position of the encoding unit among the at least two sequentially connected encoding units, the value range of j is [1, N], and N is the total number of encoding units in the attention network. The i-th feature input to the j-th encoding unit is thus the i-th feature in the feature sequence output by the (j-1)-th encoding unit.
For example, for the first-ranked encoding unit of the at least two encoding units, the input feature sequence may be the embedded feature sequence composed of the embedded features E(x_i) described above. Alternatively, the attention network may regularize the embedded feature sequence and use the regularized embedded feature sequence as the input feature sequence of the first-ranked encoding unit. For example, the i-th embedded feature in the regularized embedded feature sequence may be calculated by formula (2).
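The original images of formulas (1) and (2) are not reproduced in this text, so the LaTeX block below is only a plausible reconstruction consistent with the description above; the symbols A_i^j (first-sequence feature), h_i^{j-1} (input feature of the j-th encoding unit), E(x_i), and LN (regularization) are assumptions.

```latex
% Plausible forms of formulas (1) and (2); not the patent's exact notation.
A^{j}_{i} = \mathrm{MultiHeadSelfAttention}\left(h^{j-1}_{1}, \dots, h^{j-1}_{L}\right)_{i} \qquad \text{(1, assumed)}
h^{0}_{i} = \mathrm{LN}\bigl(E(x_{i})\bigr) \qquad \text{(2, assumed)}
```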
In operation S222, the first feature sequence is adjusted according to the hidden state feature to obtain a second feature sequence.
In operation S223, the hidden-state feature is updated according to the second feature sequence.
According to an embodiment of the present disclosure, the hidden state feature is similar to a hidden variable in a recurrent neural network: it is a dynamic variable used to store and propagate historical information. For example, the hidden state feature may characterize the semantics of the preceding text of the text to be processed. This embodiment can adjust the first feature sequence by a weighted fusion of the hidden state feature and the first feature sequence. Alternatively, the first feature sequence and the hidden state feature may be concatenated, and the concatenated feature may then be linearly processed to implement the adjustment of the first feature sequence.
After the second feature sequence is obtained, the embodiment may obtain an update amount of the hidden-state feature according to the predetermined learning rate and the second feature sequence, and then update the hidden-state feature according to the update amount. The predetermined learning rate may be set according to actual requirements, which is not limited by the present disclosure.
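As a minimal sketch of the concatenation variant mentioned above (the tensor shapes, the broadcasting of the hidden state over positions, and the projection layer are assumptions of this sketch):

```python
import torch
import torch.nn as nn

def adjust_with_hidden_state(first_seq: torch.Tensor,
                             hidden_state: torch.Tensor,
                             proj: nn.Linear) -> torch.Tensor:
    """Concatenate each feature of the first feature sequence with the hidden state
    feature, then project linearly back to the model width (one possible adjustment)."""
    L = first_seq.size(0)
    h = hidden_state.expand(L, -1)                  # broadcast the hidden state over the L positions
    return proj(torch.cat([first_seq, h], dim=-1))  # second feature sequence, shape (L, d_model)
```

Here proj would be an nn.Linear(2 * d_model, d_model) layer.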
In operation S230, the text feature sequence is decoded to generate a subsequent text of the text to be processed.
According to the embodiment of the present disclosure, a decoder based on a recurrent neural network, a Transformer architecture, or the like can be used to decode the text feature sequence, and the decoder generates the subsequent text of the text to be processed. For example, if the text to be processed is the text of a query sentence, the subsequent text may be the text of a reply sentence.
According to the embodiment of the present disclosure, the first feature sequence is adjusted according to the hidden state feature representing the semantics of the preceding text to obtain the second feature sequence, so that the second feature sequence can express not only the semantics of the text to be processed but also the semantics of the preceding text, which improves the expressive power of the second feature sequence. By updating the hidden state feature according to the second feature sequence, the semantics of the sentence being processed in real time can then be merged into the hidden state feature. That is, the hidden state feature has the ability to learn semantics in real time during text generation. Therefore, in the text generation process, the existing knowledge can be continuously corrected according to the semantics learned in real time, without gradient back-propagation or fine-tuning of the network parameters of the attention network to adapt to different application scenarios. This helps improve the accuracy of the generated subsequent text and the robustness of the text generation method.
In an embodiment, the encoding unit may further include a nonlinear processing layer, such as the one in a Transformer encoder, configured to perform nonlinear processing on the obtained second feature sequence; the feature sequence output by the encoding unit is then the feature sequence obtained after the nonlinear processing. Correspondingly, the encoding unit is also configured to perform nonlinear processing on the second feature sequence to obtain the output feature sequence. For example, the second feature sequence may be processed with a ReLU or GeLU activation function, so as to improve the robustness of the overall text generation method.
In an embodiment, the encoding unit may further include a regularization layer, configured to perform regularization on the obtained second feature sequence. The feature sequence output by the encoding unit is the feature sequence obtained after the regularization processing. Correspondingly, the encoding unit is also configured to perform regularization processing on the second feature sequence, so as to obtain an output feature sequence. By carrying out regularization processing on the second characteristic sequence, the complexity of a decoding process can be reduced, and the text generation efficiency is favorably improved.
In an embodiment, the encoding unit may include both the regularization layer and the non-linear processing layer. The encoding unit may be configured to: and firstly, regularizing the second feature sequence, and then carrying out nonlinear processing on the feature sequence obtained by the regularization. The characteristic sequence output by the coding unit is a characteristic sequence after nonlinear processing.
In an embodiment, the second feature sequence may be added to its regularized version to avoid degradation problems, and the summed features are then subjected to nonlinear processing. For example, for the i-th feature in the second feature sequence, the regularization may be performed by formula (3), and the regularized feature is added to the i-th feature to obtain the summed feature.
After the summed feature is obtained, formula (4) may be applied to it to perform nonlinear processing, yielding the output feature corresponding to the i-th feature in the second feature sequence, where σ_1() is a nonlinear activation function and the remaining quantities in formula (4) are network parameters of the nonlinear layer.
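Formulas (3) and (4) likewise appear only by reference; under the description above they might take a form like the following, where Y'^j_i is the i-th feature of the second feature sequence, g^j_i the summed feature, LN denotes the regularization, and W^j, b^j are assumed parameter names.

```latex
% Plausible reconstruction of formulas (3) and (4); the notation is assumed.
g^{j}_{i} = Y'^{j}_{i} + \mathrm{LN}\bigl(Y'^{j}_{i}\bigr) \qquad \text{(3, assumed)}
h^{j}_{i} = \sigma_{1}\bigl(W^{j} g^{j}_{i} + b^{j}\bigr) \qquad \text{(4, assumed)}
```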
The implementation principle of the operation S222 of adjusting the first characteristic sequence described above will be further expanded and defined below with reference to fig. 3.
Fig. 3 is a schematic diagram of adjusting a first signature sequence according to an embodiment of the disclosure.
As shown in fig. 3, in an embodiment 300, the encoding network may determine an adjustment 303 corresponding to the first feature sequence, for example, from hidden state features 301 and an input feature sequence 302. The first signature sequence 304 is then adjusted according to the adjustment 303 to obtain a second signature sequence 305. For example, the encoding network may fuse the hidden-state feature 301 and the input feature sequence 302, so that the fused feature may represent the long-term semantics (which may be understood as global semantics) of the text to be processed, that is, the semantics of the text to be processed determined according to the semantics of the preceding text. This embodiment may use the long-term semantics as an adjustment to the first feature sequence. Since the first feature sequence is obtained by processing only the input feature sequence, the first feature sequence can represent short-term semantics (which can be understood as local semantics) of the text to be processed. Therefore, the adjusted second feature sequence not only can represent the short-term semantics of the text to be processed, but also can represent the long-term semantics of the text to be processed, and the expression capability of the second feature sequence is improved.
For example, the hidden state feature 301 and the input feature sequence 302 may be concat () concatenated, and then the concatenated features may be linearly processed, so as to obtain features representing the long-term semantics of the text to be processed.
For example, for the encoding unit arranged at the j-th position, given the current hidden state feature and the i-th feature in the input feature sequence, this embodiment may also fuse the two features by computing the inner product of the hidden state feature 301 and that i-th feature; that is, the computed inner product is used as the feature representing the long-term semantics of the text to be processed. On this basis, the feature representing the long-term semantics can be added to the first feature sequence to obtain the second feature sequence. For example, for the feature arranged at the i-th position in the first feature sequence, formula (5) can be used to calculate the corresponding feature in the second feature sequence, where t represents the order of the text to be processed within the text sequence processed by the text generation method. That is, the text to be processed is the t-th text processed by the text generation method, and the current hidden state feature is the updated hidden state feature obtained by the encoding unit arranged at the j-th position when generating the subsequent text for the (t-1)-th text to be processed.
For example, the value of W_1^j may be set to a preset initial value, for example any initial value such as a zero matrix, which is not limited by this disclosure.
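Given that the long-term-semantic feature is described as an inner product of the hidden state feature with the input feature, added to the first feature sequence, a plausible (assumed) reading of formula (5) is sketched below, writing W^j_{t-1} for the hidden state feature of the j-th encoding unit (consistent with the initial value W^j_1 mentioned above), A^j_i for the i-th first-sequence feature, and h^{j-1}_i for the i-th input feature.

```latex
% Plausible reconstruction of formula (5); the symbols are assumptions.
Y'^{j}_{i} = A^{j}_{i} + W^{j}_{t-1}\, h^{j-1}_{i} \qquad \text{(5, assumed)}
```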
The implementation principle of operation S223 of updating the hidden-state feature described above will be further extended and defined in conjunction with fig. 4.
Fig. 4 is a schematic diagram of updating hidden state features according to an embodiment of the present disclosure.
According to the embodiment of the present disclosure, the hidden state feature can be updated according to the principle of biological plasticity. For example, the input feature sequence of the encoding network may be treated as analogous to the pre-synaptic neuron state, and the second feature sequence to the post-synaptic neuron state. An update amount of the hidden state is determined according to the pre-synaptic and post-synaptic neuron states, and this update amount can represent the connection between the preceding text and the text to be processed. In this way, the hidden state can be updated continuously and learned automatically without manual intervention and without relying on the guidance of a loss function. Moreover, the update principle of the hidden state feature better fits the biological mechanism, which improves the accuracy of the update.
As shown in fig. 4, in this embodiment 400, the input feature sequence 401 is set to include first text features corresponding to the text units. If L text units are obtained after segmenting the text to be processed, the input feature sequence includes L first text features. Correspondingly, the first feature sequence obtained by encoding and the second feature sequence 402 obtained by adjustment according to the hidden state feature both include L features corresponding to the L text units. This embodiment may take the L features included in the second feature sequence 402 as the L second text features.
In this embodiment 400, the amount of updates for each unit of text may be determined based on the first and second textual features for that unit of text. So that L update amounts corresponding to L text units can be obtained in total.
For example, this embodiment may take the cross (outer) product of the first text feature and the second text feature and determine the update amount according to the resulting feature, which may, for example, have the same size as the hidden state feature to be updated. For the i-th text unit among the L text units, let H_i denote the first text feature corresponding to the i-th text unit in the feature sequence input into the j-th encoding unit, and Y'_i denote the second text feature corresponding to the i-th text unit in the second feature sequence; the adjustment amount ΔW_i^j for the i-th text unit can then be calculated by formula (6), whose remaining quantities are network parameters of the j-th encoding unit.
In one embodiment, Hebb's rule may further be used to process the second text feature and the first text feature corresponding to each text unit, thereby obtaining the update amount for that text unit. For example, for the encoding unit arranged at the j-th position, the adjustment amount ΔW_i^j for the i-th text unit can be calculated by formula (7), whose coefficients are all network parameters of the encoding unit arranged at the j-th position. If the position of the encoding unit among the N encoding units is not considered, the superscript j in formula (7) can be removed for each encoding unit; H_i is then the first text feature of the corresponding i-th text unit input into each encoding unit, and Y'_i is the second text feature of the corresponding i-th text unit obtained by each encoding unit.
It is understood that the adjustment amount ΔW_i^j for the i-th text unit may also be calculated using a formula that includes only the first term on the right-hand side of formula (7) together with at least one of its second to fourth right-hand terms, which is not limited by this disclosure.
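Since formulas (6) and (7) are also not reproduced here, the following block sketches forms they could plausibly take based on the cross-product and Hebb's-rule descriptions; ⊗ denotes the outer product, and A^j, B^j, C^j, D^j are assumed names for the encoding unit's parameters (giving the four right-hand terms mentioned above).

```latex
% Plausible reconstructions; the exact parametrization is an assumption.
\Delta W^{j}_{i} = Y'_{i} \otimes H_{i} \qquad \text{(6, assumed)}
\Delta W^{j}_{i} = A^{j}\,\bigl(Y'_{i} \otimes H_{i}\bigr) + B^{j}\, Y'_{i} + C^{j}\, H_{i} + D^{j} \qquad \text{(7, assumed)}
```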
By using Hebb's rule to determine the update amount, the update of the hidden state feature conforms more closely to biological principles and better fits reality, which can improve the accuracy of the generated subsequent text.
Subsequently, the embodiment 400 may update the hidden-state feature according to the L update amounts after obtaining the L update amounts. For example, the weighted sum of the L update amounts may be used as the total update amount 403, and then the total update amount 403 is added to the hidden-state feature 404, so as to complete the update of the hidden-state feature, resulting in the updated hidden-state feature 405. The weights used in calculating the weighted sum may be obtained by pre-training, for example, as network parameters of the coding unit. Alternatively, the weight used in calculating the weighted sum may be a value set according to actual requirements, such as 1/L, and the present disclosure does not limit this.
In one embodiment, the principle of dopamine neurons can be employed to determine the weights employed in weighting and use the weights as learning rates. By this principle, signals of different areas in the second text feature can be integrated, thereby increasing the plasticity of the hidden state feature. For example, the embodiment may perform non-linear processing on the second text feature corresponding to each text unit, thereby obtaining a learning rate for the each text unit. Finally, the hidden state feature is updated according to the learning rate and the update amount.
For example, for the i-th text unit, the encoding unit arranged at the j-th position can obtain the learning rate by formula (8), in which Y'_i is the second text feature for the i-th text unit in the second feature sequence, the weights are network parameters of the encoding unit ranked at the j-th position, and σ_2() is a nonlinear activation function similar to σ_1().
According to the embodiment of the present disclosure, after the learning rate is obtained, the embodiment 400 may update the hidden state feature based on the learning rate and the update amount.
For example, the weighted update amount for each text unit may be determined according to the learning rate and the update amount for each text unit. Finally, the hidden state feature is updated according to the L weighted update amounts for the L text units. For example, the learning rate for each text unit is used as a weight of the update amount for each text unit, and the learning rate and the update amount are multiplied by each other to obtain a weighted update amount. Finally, the sum of the weighted update amounts is taken as the total update amount 403. The total update amount 403 is added to the hidden state feature 404 to be updated, so as to obtain an updated hidden state feature 405. For example, for the coding unit arranged at the j-th position, the updated hidden state feature can be calculated by the following formula (9)
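Formulas (8) and (9) might then plausibly read as follows, with W^j_η, b^j_η assumed names for the parameters of the learning-rate projection and the sum running over the L text units; this is a sketch consistent with the description, not the patent's own formulas.

```latex
% Plausible reconstructions of formulas (8) and (9); parameter names are assumptions.
\eta^{j}_{i} = \sigma_{2}\bigl(W^{j}_{\eta}\, Y'_{i} + b^{j}_{\eta}\bigr) \qquad \text{(8, assumed)}
W^{j}_{t} = W^{j}_{t-1} + \sum_{i=1}^{L} \eta^{j}_{i}\, \Delta W^{j}_{i} \qquad \text{(9, assumed)}
```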
In an embodiment, a boundary may be set for the elements of the hidden state feature, to prevent element values from becoming so large that they create numerical risks for the text generation method and make the update of the hidden state feature unreasonable. This can improve the accuracy of the semantics expressed by the hidden state feature.
For example, when the hidden state feature is updated according to a plurality of weighted update amounts, the hidden state feature may be adjusted according to the sum of the plurality of weighted update amounts, that is, by using formula (9) described above, and the feature obtained by formula (9) is taken as the adjusted state feature. A boundary function is then applied to the adjusted state feature to obtain the updated hidden state feature. For example, with the boundary value set to H_W, formula (9) for obtaining the updated hidden state feature may be rewritten as formula (10).
where BoundDecay() is a boundary function.
For example, this embodiment may compare each element of the adjusted state feature with the boundary value; if an element's value exceeds the boundary value, the boundary value is assigned to that element, and otherwise the element's value is kept.
In an embodiment, when the adjusted state feature is updated by the boundary function, for example when a target element beyond a predetermined boundary is included in the adjusted state feature, the target element may be updated according to a predetermined forgetting rate.
For example, the difference between 1 and the predetermined forgetting rate may be used as an adjustment coefficient, the adjustment coefficient may be multiplied by the value of the target element, and the resulting value may be assigned to the target element.
For example, the boundary function can also be expressed by the following formula (11).
This embodiment may substitute the adjusted state feature obtained in formula (10) for a in formula (11), i.e., update the adjusted state feature to obtain the updated hidden state feature, where p is a predetermined forgetting rate whose value may be set according to actual requirements, for example to 0.05, which is not limited by this disclosure.
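A minimal sketch of the bounding-with-forgetting behaviour described for formulas (10) and (11), assuming an element-wise rule with boundary value H_W and forgetting rate p (the concrete values below are illustrative):

```python
import torch

def bound_decay(a: torch.Tensor, h_w: float = 1.0, p: float = 0.05) -> torch.Tensor:
    """Element-wise boundary function: elements whose magnitude exceeds the boundary
    h_w are decayed by the forgetting rate p; the other elements are kept as-is."""
    over = a.abs() > h_w
    return torch.where(over, (1.0 - p) * a, a)

def update_hidden_state(hidden: torch.Tensor, weighted_updates: torch.Tensor) -> torch.Tensor:
    """Add the summed weighted update amounts, then apply the boundary function
    (a sketch of formula (10)); shapes are assumed to match."""
    return bound_decay(hidden + weighted_updates.sum(dim=0))
```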
The principles of the text generation method provided by the present disclosure will be further extended and defined below in connection with fig. 5.
Fig. 5 is an implementation schematic diagram of a text generation method according to an embodiment of the present disclosure.
As shown in FIG. 5, this embodiment 500 may implement a text generation method using a text generation model that includes a preprocessing network 510, an attention network 520, and a decoding network 530.
For the text 501 to be processed, the text 501 to be processed may be input into the preprocessing network 510, and the embedded feature sequence 502 is output from the preprocessing network 510. The preprocessing network includes the embedded network described above.
The attention network 520 is composed of N encoding units connected in sequence. Each encoding unit includes a multi-head self-attention layer 521, a feature adjustment layer 522, a hidden state update layer 523, and a superposition & regularization layer 524. In order to reduce the complexity of the model calculation, before the N encoding units, the attention network 520 may further be provided with a regularization layer 525, for example to regularize the embedded feature sequence 502 by using formula (2) described above.
Accordingly, after the embedded feature sequence 502 is obtained, it may be input into the regularization layer 525. Subsequently, the feature sequence output by the regularization layer 525 is input to the first-ranked encoding unit of the N sequentially connected encoding units. The multi-head self-attention layer 521 in the j-th encoding unit is used for encoding the input feature sequence by using formula (1) described above, so as to obtain the first feature sequence described above. For example, the multi-head self-attention layer 521 may be used to perform operation S221 described above.
The hidden state update layer 523 is used to store and update the hidden state feature. The first feature sequence may be input into the hidden state update layer 523 and the feature adjustment layer 522. Meanwhile, the hidden state feature stored by the hidden state update layer 523 may be input to the feature adjustment layer 522 in synchronization with the first feature sequence. Note that the hidden state feature input to the feature adjustment layer 522 is the hidden state feature as stored before being updated according to the first feature sequence. The feature adjustment layer 522 is configured to adjust the first feature sequence according to the input hidden state feature, so as to obtain the second feature sequence. For example, the feature adjustment layer 522 may be used to perform operation S222 described above, and may adjust the first feature sequence using formula (5) described above. Meanwhile, the hidden state update layer 523 is configured to update the hidden state feature according to the first feature sequence. For example, the hidden state update layer 523 may be used to perform operation S223 described above and may update the hidden state by using formula (9) described above.
The superposition & regularization layer 524 may be configured to regularize the second feature sequence using formula (3) described above and to add the feature sequence resulting from the regularization to the first feature sequence. The superposition & regularization layer 524 may also be used to perform nonlinear processing on the summed features using formula (4) described above, so as to obtain the output feature sequence. The feature sequence output by the last of the N encoding units may be used as the text feature sequence 503.
The decoding network 530 is used to decode the input text feature sequence 503 to output the post-text 504.
It is understood that the attention network can be constructed based on a Transformer encoder. The encoding unit in the attention network differs from the encoding unit in a Transformer encoder in that it is provided with the feature adjustment layer 522 and the hidden state update layer 523. Therefore, the attention network can continuously correct and supplement the learned semantics during encoding and has the ability to capture long-term memory without adjusting its network parameters.
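Putting the pieces of fig. 5 together, the block below is a hedged PyTorch-style sketch of one encoding unit; the layer widths, the single-text batch shape, and the concrete forms standing in for formulas (1) and (3)-(9) follow the assumptions of the earlier sketches rather than the patent's undisclosed formulas, and the boundary function of formula (10) is omitted for brevity.

```python
import torch
import torch.nn as nn

class EncodingUnit(nn.Module):
    """One encoding unit of the attention network (illustrative sketch only)."""

    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)  # multi-head self-attention layer 521
        self.norm = nn.LayerNorm(d_model)                                       # regularization in layer 524
        self.ffn = nn.Linear(d_model, d_model)                                  # nonlinear processing in layer 524
        self.lr_proj = nn.Linear(d_model, 1)                                    # learning-rate head (formula (8), assumed)
        # Hidden state feature stored by layer 523, initialized to a zero matrix as suggested for W_1^j.
        self.register_buffer("hidden_state", torch.zeros(d_model, d_model))

    def forward(self, x: torch.Tensor) -> torch.Tensor:    # x: (1, L, d_model) input feature sequence
        a, _ = self.attn(x, x, x)                           # first feature sequence (formula (1))
        y = a + x @ self.hidden_state.T                     # feature adjustment layer 522 (formula (5), assumed)
        g = y + self.norm(y)                                # superposition & regularization (formula (3), assumed)
        out = torch.relu(self.ffn(g))                       # nonlinear processing (formula (4), assumed)
        # Hidden state update layer 523: Hebbian-style update (formulas (6)-(9), simplified).
        eta = torch.sigmoid(self.lr_proj(y))                # per-unit learning rates, shape (1, L, 1)
        delta = torch.einsum("bli,blj->ij", eta * y, x)     # sum_i eta_i * (Y'_i outer H_i)
        self.hidden_state = self.hidden_state + delta
        return out
```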
Based on the implementation principle of this embodiment, for the coding unit ranked at the jth position, the following formula (12) can be adopted to express its calculation principle for the ith text unit, for example.
It can be understood that the calculation principles of the N encoding units are all given by formula (12); the only difference is the value of the superscript j. If it is not considered which text unit is processed, the calculation principle of the encoding unit arranged at the j-th position for the t-th text X to be processed can be expressed by formula (13).
The calculation principle can be expressed by the following equation (14) for the entire attention network.
When the text generation method of the embodiment of the present disclosure is applied to an intelligent voice interaction system, the system can remember the user's preferences through intelligent voice interaction with the user, which helps provide the user with more truthful and reliable voice responses in subsequent interactions. For example, suppose that in the history of voice interactions with the intelligent voice interaction system, the user provided a first utterance, "the highest mountain in the world is the Himalayan mountain, with an altitude of 8848 meters", and, some time later, a second utterance, "the Himalayan mountain changed in height to 8850 meters due to geological motion". If, after providing the second utterance, the user asks "how high is the highest mountain in the world", the intelligent voice interaction system may generate the text "the Himalayan mountain is 8850 meters" and convert the text to speech for playback.
In order to facilitate the implementation of the text generation method provided by the present disclosure, the present disclosure also provides a training method of a text generation model. The training method will be described in detail below with reference to fig. 6.
FIG. 6 is a flowchart illustrating a method for training a text generation model according to an embodiment of the present disclosure.
As shown in fig. 6, the training method 600 of the text generation model of this embodiment may include operations S610 to S640. Wherein, in operation S620, each coding unit in the attention network may be configured to perform operations S621 to S623. The text generation model may include a preprocessing network, an attention network, and a decoding network. Wherein the attention network may be constituted by the coding unit.
In one embodiment, the text generation model may employ a model structure as illustrated in FIG. 5 described above.
In operation S610, each target text in the text sequence is preprocessed by using the preprocessing network to obtain an embedded feature sequence, where the embedded feature sequence includes embedded features corresponding to the text units in each target text. This embodiment may use the text sequence as training text, use each text in the text sequence except the last one as a target text, and use the next text after that target text as the ground truth of its subsequent text. The text sequence can be obtained, for example, by splitting a text passage into sentences.
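A small sketch of how the (target text, adjacent subsequent text) pairs described above could be laid out, assuming the passage has already been split into sentences by some sentence splitter:

```python
def build_training_pairs(sentences: list[str]) -> list[tuple[str, str]]:
    """Every sentence except the last is a target text; the sentence that follows it
    is the ground-truth adjacent subsequent text."""
    return [(sentences[k], sentences[k + 1]) for k in range(len(sentences) - 1)]
```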
The implementation principle of operation S610 is similar to that of operation S210 described above, and is not described herein again.
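To make the data preparation concrete, the following sketch builds (target text, adjacent subsequent text) pairs from a text passage. The function name and the naive regex-based sentence splitter are illustrative assumptions; any sentence segmentation consistent with the description above could be used.

```python
import re

def build_training_pairs(passage: str):
    # naive sentence splitting on ., ! or ? followed by whitespace
    sentences = [s.strip() for s in re.split(r'(?<=[.!?])\s+', passage) if s.strip()]
    # every sentence except the last is a target text; its next sentence is the
    # ground truth of the adjacent subsequent text
    return [(sentences[i], sentences[i + 1]) for i in range(len(sentences) - 1)]

pairs = build_training_pairs(
    "The highest mountain in the world is the Himalayas. Its altitude was 8848 meters. "
    "Due to geological motion its height changed to 8850 meters."
)
# pairs[0] -> ("The highest mountain in the world is the Himalayas.", "Its altitude was 8848 meters.")
```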
In operation S620, the embedded feature sequence is input into the attention network, resulting in a text feature sequence output by the attention network.
In operation S621, the input feature sequence is encoded using an attention mechanism to obtain a first feature sequence.
In operation S622, the first feature sequence is adjusted according to the hidden state feature, resulting in a second feature sequence. Wherein the hidden state features characterize the semantics of the preceding text of each target text.
In operation S623, the hidden-state feature is updated according to the second feature sequence.
It is understood that the implementation principle of operation S620 may be similar to the implementation principle of operation S220 described above, and the implementation principles of operations S621 to S623 may be similar to the implementation principles of operations S221 to S223 described above, respectively, and are not described herein again.
In operation S630, the text feature sequence is decoded using the decoding network to generate a predicted subsequent text for each target text. The implementation principle of operation S630 may be similar to that of operation S230 described above, and is not described herein again.
In operation S640, the text generation model is trained according to the predicted subsequent text and the adjacent subsequent text of each target text in the text sequence.
According to an embodiment of the present disclosure, the adjacent subsequent text is the next text of each target text in the text sequence. This embodiment may determine a loss value of the text generation model according to the difference between the predicted subsequent text and the adjacent subsequent text, and train the text generation model with the goal of minimizing the loss value. For example, the difference between the predicted subsequent text and the adjacent subsequent text may be determined using the semantic similarity between the texts, the ratio of the number of identical characters, or the like. For example, the generation accuracy may be employed to determine the loss value of the text generation model. The present disclosure does not limit the way in which the difference and the loss value are determined. It is understood that the loss function can be designed to require the predicted subsequent text to be close to the adjacent subsequent text, and the network parameters in the text generation model can be optimized by a gradient back-propagation algorithm.
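As one concrete option among those listed above, the sketch below computes a token-level cross-entropy loss between the decoding network's output distribution and the tokens of the adjacent subsequent text. This is only one possible loss consistent with the description; the tensor shapes and the padding id are assumptions.

```python
import torch
import torch.nn.functional as F

def generation_loss(logits: torch.Tensor, target_ids: torch.Tensor, pad_id: int = 0):
    # logits:     (batch, seq_len, vocab_size), output distribution of the decoding network
    # target_ids: (batch, seq_len), token ids of the adjacent subsequent text
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        target_ids.reshape(-1),
        ignore_index=pad_id,   # padding positions do not contribute to the loss
    )
```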
In one embodiment, after the loss value is determined, a gradient back-propagation algorithm may be employed to determine the gradient of the loss value with respect to the network parameters in the text generation model.
In an embodiment, a gradient back-propagation algorithm and a predetermined truncation position of the gradient back-propagation may be employed to determine the gradient of the loss value with respect to the network parameters in the text generation model. This is because, according to the above formula (14), the gradient would need to be propagated all the way back to M_1 during back-propagation. As described above, such a long back-propagation path may take too much time and may occupy too much video memory. To avoid this, this embodiment may preset the truncation position of the gradient back-propagation. For example, if the gradient is set to be propagated back only to M_{t-k}, then when determining the gradient of the loss value with respect to the network parameters, only the gradients of the network parameters related to M_{t-k} are calculated, while the gradients involving M_{t-k-1} to M_1 are not calculated. Subsequently, the embodiment may adjust the network parameters of the text generation model according to the calculated gradients with the goal of minimizing the loss value, so as to implement the training of the text generation model.
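The following training-loop sketch illustrates one way to realize the predetermined truncation position: the hidden states are detached every k texts, so gradients stop flowing past M_{t-k}. The model interface (model.init_states, the (target, next_ids) batches), the value of k and the optimizer handling are assumptions, and generation_loss is the illustrative loss from the previous sketch.

```python
def train_step(model, optimizer, text_sequence, k: int = 4):
    # text_sequence: list of (target_text_batch, next_text_token_ids), already preprocessed
    states = model.init_states()          # assumed interface, see the earlier sketch
    loss = 0.0
    for t, (target, next_ids) in enumerate(text_sequence):
        if t % k == 0:
            # predetermined truncation position: detach so gradients stop at M_{t-k}
            states = [m.detach() for m in states]
        logits, states = model(target, states)
        loss = loss + generation_loss(logits, next_ids)
    optimizer.zero_grad()
    loss.backward()                       # back-propagates only up to the truncation position
    optimizer.step()
    return float(loss)
```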
Based on the text generation method provided by the disclosure, the disclosure also provides a text generation device. The apparatus will be described in detail below with reference to fig. 7.
Fig. 7 is a block diagram of a structure of a text generation apparatus according to an embodiment of the present disclosure.
As shown in fig. 7, the text generation apparatus 700 of this embodiment includes a preprocessing module 710, a text feature obtaining module 720, and a feature decoding module 730. The text feature obtaining module 720 may include an encoding sub-module 721, an adjusting sub-module 722, and an updating sub-module 723.
The preprocessing module 710 is configured to preprocess the text to be processed to obtain an embedded feature sequence, where the embedded feature sequence includes embedded features corresponding to text units in the text to be processed. In an embodiment, the preprocessing module 710 can be configured to perform the operation S210 described above, which is not described herein again.
The feature decoding module 730 is configured to decode the text feature sequence to generate a subsequent text of the text to be processed. In an embodiment, the feature decoding module 730 may be configured to perform the operation S230 described above, which is not described herein again.
The encoding sub-module 721 is configured to encode the input feature sequence with respect to the encoding unit by using an attention mechanism, so as to obtain a first feature sequence. In an embodiment, the encoding sub-module 721 may be configured to perform the operation S221 described above, which is not described herein again.
The adjusting submodule 722 is configured to adjust the first feature sequence according to the hidden state feature to obtain a second feature sequence; the hidden state features characterize the semantics of the preceding text of the text to be processed. In an embodiment, the adjusting submodule 722 may be configured to perform the operation S222 described above, and is not described herein again.
The update sub-module 723 is configured to update the hidden-state feature according to the second feature sequence. In an embodiment, the update sub-module 723 may be configured to perform the operation S223 described above, which is not described herein again.
The text feature obtaining module 720 is configured to input the embedded feature sequence into an attention network formed by the coding units, and obtain a text feature sequence output by the attention network. In an embodiment, the text feature obtaining module 720 may be configured to perform the operation S220 described above, which is not described herein again.
According to an embodiment of the disclosure, the adjustment submodule includes: an adjustment amount determining unit, configured to determine an adjustment amount corresponding to the first feature sequence according to the hidden state feature and the input feature sequence; and an adjusting unit, configured to adjust the first feature sequence according to the adjustment amount to obtain the second feature sequence.
According to an embodiment of the present disclosure, the input feature sequence includes a first text feature corresponding to a text unit; the second sequence of features includes a second text feature corresponding to the text unit; the update submodule includes: an update amount determination unit configured to determine an update amount for the text unit based on the second text feature and the first text feature; and an updating unit for updating the hidden state feature according to the update amount.
According to an embodiment of the present disclosure, the update amount determination unit is configured to: process the second text feature and the first text feature by using Hebb's rule to obtain the update amount for the text unit.
According to an embodiment of the present disclosure, the update amount determination unit is configured to obtain the update amount by using the following formula:
wherein ΔW_i is the update amount for the i-th text unit in the text to be processed; Y'_i is the second text feature corresponding to the i-th text unit; H_i is the first text feature corresponding to the i-th text unit; and W_A, W_B, W_C and W_D are network parameters of the coding unit.
According to an embodiment of the present disclosure, the update sub-module further includes: a learning rate determining unit, configured to perform nonlinear processing on the second text feature to obtain a learning rate for the text unit, wherein the updating unit is configured to update the hidden state feature according to the learning rate and the update amount.
According to the embodiment of the disclosure, a plurality of text units are included in the text to be processed. The update unit includes: a weighted amount determining subunit configured to determine a weighted update amount for the text unit, based on the learning rate and the update amount for the text unit; and an updating subunit, configured to update the hidden-state feature according to the plurality of weighted update amounts for the plurality of text units.
According to an embodiment of the present disclosure, the update subunit is configured to: adjusting the hidden state characteristic according to the sum of the weighted updating quantities to obtain an adjusted state characteristic; and updating the adjusted state characteristic by adopting a boundary function to obtain an updated hidden state characteristic.
According to an embodiment of the present disclosure, the update subunit is configured to: determine the adjusted state feature as the updated hidden state feature in response to each element in the adjusted state feature being within a predetermined boundary; and in response to the adjusted state feature including a target element exceeding the predetermined boundary, update the target element according to a predetermined forgetting rate to obtain the updated hidden state feature.
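Tying the Hebbian update amount, the learning rate and the boundary function together, the sketch below shows one way the update sub-module could compute the new hidden state. It is a simplified stand-in: the formula of the disclosure uses the four parameter matrices W_A to W_D, while this illustration uses a reduced set, and the boundary value and forgetting rate are assumed constants.

```python
import torch

def update_hidden_state(m, h, y2, W_A, W_B, W_C, bound: float = 1.0, forget_rate: float = 0.9):
    # m:  hidden state feature, shape (dim, dim)
    # h:  first text features,  shape (seq_len, dim), one row per text unit
    # y2: second text features, shape (seq_len, dim)
    # Hebbian-style update amount per text unit (outer product of two projections)
    delta = torch.einsum('sd,se->sde', y2 @ W_A, h @ W_B)      # (seq_len, dim, dim)
    # learning rate per text unit from a nonlinear processing of the second feature
    lr = torch.sigmoid(y2 @ W_C).mean(dim=-1)                  # (seq_len,)
    weighted_sum = (lr[:, None, None] * delta).sum(dim=0)      # sum of the weighted update amounts
    adjusted = m + weighted_sum                                # adjusted state feature
    # boundary function: apply the forgetting rate only to elements beyond the boundary
    return torch.where(adjusted.abs() > bound, adjusted * forget_rate, adjusted)
```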
According to an embodiment of the present disclosure, the attention network is composed of a plurality of coding units connected in sequence; the input feature sequence of the coding unit ranked first among the plurality of coding units includes the embedded feature sequence. The text feature obtaining module further comprises: a nonlinear processing submodule, configured to perform nonlinear processing on the second feature sequence to obtain an output feature sequence, wherein the text feature sequence comprises the feature sequence output by the coding unit ranked last among the plurality of coding units, and the input feature sequence of each coding unit other than the coding unit ranked first comprises the feature sequence output by the previous coding unit connected to it.
Based on the training method of the text generation model provided by the disclosure, the disclosure also provides a training device of the text generation model. The apparatus will be described in detail below with reference to fig. 8.
Fig. 8 is a block diagram of a structure of a training apparatus for a text generation model according to an embodiment of the present disclosure.
As shown in fig. 8, the training apparatus 800 of the text generation model of this embodiment includes a preprocessing module 810, a text feature obtaining module 820, a feature decoding module 830, and a model training module 840. The text feature obtaining module 820 may include an encoding sub-module 821, an adjusting sub-module 822, and an updating sub-module 823. The text generation model comprises a preprocessing network, an attention network and a decoding network; the attention network is composed of coding units.
The preprocessing module 810 is configured to perform preprocessing on each target text in the text sequence by using a preprocessing network to obtain an embedded feature sequence, where the embedded feature sequence includes embedded features corresponding to text units in each target text. In an embodiment, the preprocessing module 810 can be configured to perform the operation S610 described above, which is not described herein again.
The text feature obtaining module 820 is configured to input the embedded feature sequence into the attention network to obtain a text feature sequence output by the attention network. In an embodiment, the text feature obtaining module 820 may be configured to perform the operation S620 described above, which is not described herein again.
The feature decoding module 830 is configured to decode the text feature sequence by using the decoding network to generate a predicted subsequent text of each target text. In an embodiment, the feature decoding module 830 may be configured to perform the operation S630 described above, which is not described herein again.
The model training module 840 is configured to train the text generation model according to the predicted subsequent text and the adjacent subsequent text of each target text in the text sequence. In an embodiment, the model training module 840 may be configured to perform the operation S640 described above, which is not described herein again.
The encoding submodule 821 is configured to encode the input feature sequence by using an attention mechanism for the encoding unit, so as to obtain a first feature sequence. In an embodiment, the encoding submodule 821 may be configured to perform the operation S621 described above, and will not be described herein again.
The adjusting submodule 822 is configured to adjust the first feature sequence according to the hidden state feature to obtain a second feature sequence; the hidden state feature characterizes the semantics of the preceding text of each target text. In an embodiment, the adjusting submodule 822 may be configured to perform the operation S622 described above, and details thereof are not repeated herein.
The update submodule 823 is configured to update the hidden-state feature according to the second feature sequence. In an embodiment, the update sub-module 823 may be configured to perform the operation S623 described above, which is not described herein again.
According to an embodiment of the present disclosure, the model training module includes: a loss value determining submodule, configured to determine a loss value of the text generation model for each target text according to the difference between the predicted subsequent text and the adjacent subsequent text of each target text in the text sequence; a gradient determination submodule, configured to determine the gradient of the loss value with respect to the network parameters in the text generation model by using a gradient back-propagation algorithm and a predetermined truncation position of the gradient back-propagation; and a training sub-module, configured to train the text generation model according to the gradient of the network parameters with the goal of minimizing the loss value.
In the technical solution of the present disclosure, the collection, storage, use, processing, transmission, provision, disclosure and application of the personal information of the users involved all comply with the relevant laws and regulations, necessary security measures are taken, and public order and good customs are not violated. In the technical solution of the present disclosure, the authorization or consent of the user is obtained before the personal information of the user is obtained or collected.
The present disclosure also provides an electronic device, a readable storage medium, and a computer program product according to embodiments of the present disclosure.
FIG. 9 illustrates a schematic block diagram of an example electronic device 900 that may be used to implement the text generation methods and/or the training methods of text generation models of embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital processors, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not intended to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 9, the device 900 includes a computing unit 901, which can perform various appropriate actions and processes in accordance with a computer program stored in a Read Only Memory (ROM) 902 or a computer program loaded from a storage unit 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data required for the operation of the device 900 can also be stored. The computing unit 901, the ROM 902, and the RAM 903 are connected to each other via a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
A number of components in the device 900 are connected to the I/O interface 905, including: an input unit 906 such as a keyboard, a mouse, and the like; an output unit 907 such as various types of displays, speakers, and the like; a storage unit 908 such as a magnetic disk, optical disk, or the like; and a communication unit 909 such as a network card, a modem, a wireless communication transceiver, and the like. The communication unit 909 allows the device 900 to exchange information/data with other devices through a computer network such as the internet and/or various telecommunication networks.
The computing unit 901 may be a variety of general and/or special purpose processing components with processing and computing capabilities. Some examples of the computing unit 901 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, microcontroller, and so forth. The computing unit 901 performs the respective methods and processes described above, such as the text generation method and/or the training method of the text generation model. For example, in some embodiments, the text generation method and/or the training method of the text generation model may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as the storage unit 908. In some embodiments, part or all of the computer program may be loaded onto and/or installed onto the device 900 via the ROM 902 and/or the communication unit 909. When the computer program is loaded into the RAM 903 and executed by the computing unit 901, one or more steps of the text generation method and/or the training method of the text generation model described above may be performed. Alternatively, in other embodiments, the computing unit 901 may be configured by any other suitable means (e.g., by means of firmware) to perform the text generation method and/or the training method of the text generation model.
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, field programmable gate arrays (FPGAs), application specific integrated circuits (ASICs), application specific standard products (ASSPs), systems on chip (SOCs), complex programmable logic devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include: implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, receiving data and instructions from, and transmitting data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program code, when executed by the processor or controller, causes the functions/acts specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user can be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local area networks (LANs), wide area networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in the cloud computing service system and overcomes the defects of high management difficulty and weak service scalability of a traditional physical host and a virtual private server (VPS) service. The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, and are not limited herein as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.
Claims (27)
1. A text generation method, comprising:
preprocessing a text to be processed to obtain an embedded characteristic sequence, wherein the embedded characteristic sequence comprises embedded characteristics corresponding to text units in the text to be processed;
inputting the embedded characteristic sequence into an attention network formed by a coding unit to obtain a text characteristic sequence output by the attention network; and
decoding the text feature sequence to generate a subsequent text of the text to be processed,
wherein the encoding unit is configured to perform the following operations:
coding the input characteristic sequence by adopting an attention mechanism to obtain a first characteristic sequence;
adjusting the first characteristic sequence according to the hidden state characteristic to obtain a second characteristic sequence; the hidden state features represent the semantics of a previous text of the text to be processed; and
and updating the hidden state feature according to the second feature sequence.
2. The method of claim 1, wherein the adjusting the first sequence of features to obtain a second sequence of features according to hidden state features comprises:
determining an adjustment amount corresponding to the first feature sequence according to the hidden state feature and the input feature sequence; and
and adjusting the first characteristic sequence according to the adjustment amount to obtain the second characteristic sequence.
3. The method of claim 1, wherein the input sequence of features includes a first text feature corresponding to the unit of text; the second sequence of features includes a second text feature corresponding to the text unit; the updating the hidden state feature according to the second feature sequence comprises:
determining an update amount for the text unit according to the second text feature and the first text feature; and
and updating the hidden state characteristics according to the updating amount.
4. The method of claim 3, wherein the determining an update amount for the unit of text from the second textual feature and the first textual feature comprises:
and processing the second text feature and the first text feature by using Hebb's rule to obtain the update amount for the text unit.
5. The method of claim 4, wherein the processing the second text feature and the first text feature by using Hebb's rule to obtain the update amount for the text unit comprises: obtaining the update amount by using the following formula:
wherein ΔW_i is the update amount for the i-th text unit in the text to be processed; Y'_i is the second text feature corresponding to the i-th text unit; H_i is the first text feature corresponding to the i-th text unit; and W_A, W_B, W_C and W_D are network parameters of the coding unit.
6. The method of claim 3, wherein the updating the hidden-state features according to the second sequence of features further comprises:
carrying out nonlinear processing on the second text features to obtain a learning rate aiming at the text unit; and
and updating the hidden state features according to the learning rate and the updating amount.
7. The method of claim 6, wherein the text to be processed includes a plurality of text units; the updating the hidden-state feature according to the learning rate and the update amount includes:
determining a weighted update amount for the text unit according to the learning rate and the update amount for the text unit; and
updating the hidden state feature according to a plurality of weighted update amounts for a plurality of the text units.
8. The method of claim 7, wherein said updating the hidden-state feature according to a plurality of weighted update amounts for a plurality of the text units comprises:
adjusting the hidden state characteristic according to the sum of the weighted updating quantities to obtain an adjusted state characteristic; and
and updating the adjusted state characteristic by adopting a boundary function to obtain an updated hidden state characteristic.
9. The method of claim 8, wherein the updating the adjusted state feature with a boundary function to obtain an updated hidden state feature comprises:
determining the adjusted state feature as the updated hidden state feature in response to each element in the adjusted state feature being within a predetermined boundary; and
and responding to the adjusted state characteristics including the target elements exceeding the preset boundary, and updating the target elements according to a preset forgetting rate to obtain updated hidden state characteristics.
10. The method of claim 1, wherein the attention network is composed of a plurality of coding units connected in sequence; the input feature sequence of the coding unit ranked first among the plurality of coding units comprises the embedded feature sequence; the coding unit is further configured to:
carrying out nonlinear processing on the second characteristic sequence to obtain an output characteristic sequence,
wherein the text feature sequence comprises: the feature sequence output by the coding unit ranked last among the plurality of coding units; and the input feature sequence of each coding unit other than the coding unit ranked first among the plurality of coding units comprises: the feature sequence output by the previous coding unit connected to the other coding unit.
11. A training method of a text generation model, wherein the text generation model comprises a preprocessing network, an attention network and a decoding network; the attention network is composed of coding units; the method comprises the following steps:
preprocessing each target text in the text sequence by adopting the preprocessing network to obtain an embedded feature sequence, wherein the embedded feature sequence comprises embedded features corresponding to text units in each target text;
inputting the embedded feature sequence into the attention network to obtain a text feature sequence output by the attention network;
decoding the text feature sequence by using the decoding network to generate a predicted subsequent text of each target text; and
training the text generation model according to the predicted subsequent text and the adjacent subsequent text of each target text in the text sequence;
wherein the encoding unit is configured to perform the following operations:
coding the input characteristic sequence by adopting an attention mechanism to obtain a first characteristic sequence;
adjusting the first characteristic sequence according to the hidden state characteristic to obtain a second characteristic sequence; the hidden state features characterize the semantics of the preceding text of each target text; and
and updating the hidden state feature according to the second feature sequence.
12. The method of claim 11, wherein the training the text generation model according to the predicted subsequent text and the adjacent subsequent text of each of the target texts in the text sequence comprises:
determining a loss value of the text generation model for each target text according to a difference between the predicted subsequent text and the adjacent subsequent text of each target text in the text sequence;
determining the gradient of the loss value with respect to the network parameters in the text generation model by using a gradient back-propagation algorithm and a predetermined truncation position of the gradient back-propagation; and
and training the text generation model according to the gradient of the network parameters by taking the minimized loss value as a target.
13. A text generation apparatus comprising:
the system comprises a preprocessing module, a processing module and a processing module, wherein the preprocessing module is used for preprocessing a text to be processed to obtain an embedded characteristic sequence, and the embedded characteristic sequence comprises embedded characteristics corresponding to text units in the text to be processed;
the text characteristic obtaining module is used for inputting the embedded characteristic sequence into an attention network formed by a coding unit to obtain a text characteristic sequence output by the attention network;
a feature decoding module, configured to decode the text feature sequence to generate a subsequent text of the text to be processed,
wherein the text feature obtaining module comprises:
the coding submodule is used for coding the input characteristic sequence by adopting an attention mechanism aiming at the coding unit to obtain a first characteristic sequence;
the adjusting submodule is used for adjusting the first characteristic sequence according to the hidden state characteristic to obtain a second characteristic sequence; the hidden state features represent the semantics of a previous text of the text to be processed; and
and the updating submodule is used for updating the hidden state characteristic according to the second characteristic sequence.
14. The apparatus of claim 13, wherein the adjustment submodule comprises:
an adjustment amount determining unit, configured to determine an adjustment amount corresponding to the first feature sequence according to the hidden state feature and the input feature sequence; and
and the adjusting unit is used for adjusting the first characteristic sequence according to the adjustment amount to obtain the second characteristic sequence.
15. The apparatus of claim 13, wherein the input sequence of features includes a first text feature corresponding to the unit of text; the second sequence of features includes a second text feature corresponding to the text unit; the update sub-module includes:
an update amount determination unit configured to determine an update amount for the text unit according to the second text feature and the first text feature; and
and the updating unit is used for updating the hidden state characteristics according to the updating amount.
16. The apparatus of claim 15, wherein the update amount determination unit is to:
and processing the second text feature and the first text feature by using Hebb's rule to obtain the update amount for the text unit.
17. The apparatus of claim 16, wherein the update amount determination unit is configured to derive the update amount by using the following formula:
wherein ΔW_i is the update amount for the i-th text unit in the text to be processed; Y'_i is the second text feature corresponding to the i-th text unit; H_i is the first text feature corresponding to the i-th text unit; and W_A, W_B, W_C and W_D are network parameters of the coding unit.
18. The apparatus of claim 15, wherein the update submodule further comprises:
a learning rate determining unit, configured to perform nonlinear processing on the second text feature to obtain a learning rate for the text unit,
wherein the update unit is configured to: and updating the hidden state feature according to the learning rate and the updating amount.
19. The apparatus of claim 18, wherein the text to be processed includes a plurality of text units; the update unit includes:
a weighted amount determination subunit operable to determine a weighted update amount for the text unit, based on the learning rate and the update amount for the text unit; and
and the updating subunit is used for updating the hidden state feature according to a plurality of weighted updating quantities aiming at the plurality of text units.
20. The apparatus of claim 19, wherein the update subunit is to:
adjusting the hidden state feature according to the sum of the weighted updating quantities to obtain an adjusted state feature; and
and updating the adjusted state characteristic by adopting a boundary function to obtain an updated hidden state characteristic.
21. The apparatus of claim 20, wherein the update subunit is to:
determining the adjusted state feature as the updated hidden state feature in response to each element in the adjusted state feature being within a predetermined boundary; and
and responding to the adjusted state characteristics including the target elements exceeding the preset boundary, and updating the target elements according to a preset forgetting rate to obtain updated hidden state characteristics.
22. The apparatus of claim 13, wherein the attention network is composed of a plurality of coding units connected in sequence; the input feature sequence of the coding unit ranked first among the plurality of coding units comprises the embedded feature sequence; the text feature obtaining module further comprises:
a nonlinear processing submodule for performing nonlinear processing on the second feature sequence to obtain an output feature sequence,
wherein the text feature sequence comprises: the feature sequence output by the coding unit ranked last among the plurality of coding units; and the input feature sequence of each coding unit other than the coding unit ranked first among the plurality of coding units comprises: the feature sequence output by the previous coding unit connected to the other coding unit.
23. A training device of a text generation model, wherein the text generation model comprises a preprocessing network, an attention network and a decoding network; the attention network is composed of coding units; the device comprises:
the preprocessing module is used for preprocessing each target text in the text sequence by adopting the preprocessing network to obtain an embedded feature sequence, and the embedded feature sequence comprises embedded features corresponding to text units in each target text;
a text feature obtaining module, configured to input the embedded feature sequence into the attention network, so as to obtain a text feature sequence output by the attention network;
the feature decoding module is used for decoding the text feature sequence by using the decoding network to generate a predicted subsequent text of each target text; and
the model training module is used for training the text generation model according to the predicted subsequent text and the adjacent subsequent text of each target text in the text sequence;
wherein the text feature obtaining module comprises:
the encoding submodule is used for encoding the input characteristic sequence by adopting an attention mechanism aiming at the encoding unit to obtain a first characteristic sequence;
the adjusting submodule is used for adjusting the first characteristic sequence according to the hidden state characteristic to obtain a second characteristic sequence; the hidden state features characterize semantics of a preceding text of each of the target texts; and
and the updating submodule is used for updating the hidden state characteristic according to the second characteristic sequence.
24. The apparatus of claim 23, wherein the model training module comprises:
a loss value determining sub-module, configured to determine a loss value of the text generation model for each target text according to a difference between the predicted subsequent text and the adjacent subsequent text of each target text in the text sequence;
the gradient determination submodule is used for determining the gradient of the loss value with respect to the network parameters in the text generation model by using a gradient back-propagation algorithm and a predetermined truncation position of the gradient back-propagation; and
and the training submodule is used for training the text generation model according to the gradient of the network parameters by taking the minimized loss value as a target.
25. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1 to 12.
26. A non-transitory computer readable storage medium having stored thereon computer instructions for causing the computer to perform the method according to any one of claims 1-12.
27. A computer program product comprising a computer program/instructions stored on at least one of a readable storage medium and an electronic device, which, when executed by a processor, implements the steps of the method according to any one of claims 1 to 12.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211306837.XA CN115630651B (en) | 2022-10-24 | 2022-10-24 | Text generation method and training method and device of text generation model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115630651A true CN115630651A (en) | 2023-01-20 |
CN115630651B CN115630651B (en) | 2023-06-02 |
Family
ID=84906404
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211306837.XA Active CN115630651B (en) | 2022-10-24 | 2022-10-24 | Text generation method and training method and device of text generation model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115630651B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10380236B1 (en) * | 2017-09-22 | 2019-08-13 | Amazon Technologies, Inc. | Machine learning system for annotating unstructured text |
US20200082807A1 (en) * | 2018-01-11 | 2020-03-12 | Neosapience, Inc. | Text-to-speech synthesis method and apparatus using machine learning, and computer-readable storage medium |
US20210216726A1 (en) * | 2020-05-08 | 2021-07-15 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method, apparatus, device and medium for generating recruitment position description text |
CN111858914A (en) * | 2020-07-27 | 2020-10-30 | 湖南大学 | Text abstract generation method and system based on sentence-level evaluation |
CN113836928A (en) * | 2021-09-28 | 2021-12-24 | 平安科技(深圳)有限公司 | Text entity generation method, device, equipment and storage medium |
Non-Patent Citations (3)
Title |
---|
DZMITRY BAHDANAU ET AL.: "Neural Machine Translation by Jointly Learning to Align and Translate" *
PETAR VELIČKOVIĆ ET AL.: "Graph Attention Networks" *
YONGHUI WU ET AL.: "Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation" *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116050465A (en) * | 2023-02-09 | 2023-05-02 | 北京百度网讯科技有限公司 | Training method of text understanding model, text understanding method and device |
CN116050465B (en) * | 2023-02-09 | 2024-03-19 | 北京百度网讯科技有限公司 | Training method of text understanding model, text understanding method and device |
CN117807963A (en) * | 2024-03-01 | 2024-04-02 | 之江实验室 | Text generation method and device in appointed field |
CN117807963B (en) * | 2024-03-01 | 2024-04-30 | 之江实验室 | Text generation method and device in appointed field |
Also Published As
Publication number | Publication date |
---|---|
CN115630651B (en) | 2023-06-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Chen et al. | End-to-end memory networks with knowledge carryover for multi-turn spoken language understanding. | |
CN106910497B (en) | Chinese word pronunciation prediction method and device | |
CN110929092B (en) | Multi-event video description method based on dynamic attention mechanism | |
CN115630651B (en) | Text generation method and training method and device of text generation model | |
US20240029436A1 (en) | Action classification in video clips using attention-based neural networks | |
CN112528655B (en) | Keyword generation method, device, equipment and storage medium | |
CN113657399A (en) | Training method of character recognition model, character recognition method and device | |
CN113553864A (en) | Translation model training method and device, electronic equipment and storage medium | |
CN113609965B (en) | Training method and device of character recognition model, storage medium and electronic equipment | |
WO2018156373A1 (en) | Sequence processing using online attention | |
CN113435208A (en) | Student model training method and device and electronic equipment | |
JP2022151649A (en) | Training method, device, equipment, and storage method for speech recognition model | |
CN111145914B (en) | Method and device for determining text entity of lung cancer clinical disease seed bank | |
CN113792855A (en) | Model training and word stock establishing method, device, equipment and storage medium | |
CN113785314A (en) | Semi-supervised training of machine learning models using label guessing | |
CN115687934A (en) | Intention recognition method and device, computer equipment and storage medium | |
CN116152833A (en) | Training method of form restoration model based on image and form restoration method | |
CN115359323A (en) | Image text information generation method and deep learning model training method | |
CN114528387A (en) | Deep learning conversation strategy model construction method and system based on conversation flow bootstrap | |
Yuan et al. | Deep learning from a statistical perspective | |
CN116991252A (en) | Input text prediction method and device, electronic equipment and storage medium | |
CN116362242A (en) | Small sample slot value extraction method, device, equipment and storage medium | |
CN114691836B (en) | Text emotion tendentiousness analysis method, device, equipment and medium | |
CN115481285A (en) | Cross-modal video text matching method and device, electronic equipment and storage medium | |
CN115309894A (en) | Text emotion classification method and device based on confrontation training and TF-IDF |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |