CN116383652A - Model training method, controllable text generation method, system, equipment and medium - Google Patents

Model training method, controllable text generation method, system, equipment and medium

Info

Publication number
CN116383652A
CN116383652A
Authority
CN
China
Prior art keywords
model
text
time step
sub
matrix
Prior art date
Legal status
Granted
Application number
CN202310354856.8A
Other languages
Chinese (zh)
Other versions
CN116383652B (en)
Inventor
蔡华
Current Assignee
Huayuan Computing Technology Shanghai Co ltd
Original Assignee
Huayuan Computing Technology Shanghai Co ltd
Priority date
Filing date
Publication date
Application filed by Huayuan Computing Technology Shanghai Co ltd filed Critical Huayuan Computing Technology Shanghai Co ltd
Priority to CN202310354856.8A priority Critical patent/CN116383652B/en
Publication of CN116383652A publication Critical patent/CN116383652A/en
Application granted granted Critical
Publication of CN116383652B publication Critical patent/CN116383652B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/284Lexical analysis, e.g. tokenisation or collocates
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a model training method, a controllable text generation method, a system, equipment and a medium, wherein the model training method comprises the following steps: training a controllable text generation model by taking a first training sample as input to determine parameters of a prompt sub-model and an attention sub-model in the controllable text generation model; the controllable text generation model further comprises a pre-trained text generation sub-model; in each time step, the attention sub-model takes the implicit state of the prompt word of all previous time steps, the implicit state of the prompt word of the current time step and the implicit state of the text of all previous time steps as input, and takes the attention text matrix of the current time step as output. The invention uses the prompt sub-model to guide the text generation sub-model, prevents the theme of the controllable text from diverging, and prevents content irrelevant to the prompt words from appearing; the attention sub-model realizes an independent prompt at each time step, prevents the content of the controllable text from failing to correspond to the content of all the prompt words, and avoids the loss of prompt words.

Description

Model training method, controllable text generation method, system, equipment and medium
Technical Field
The invention belongs to the technical field of information processing, and particularly relates to a model training method, a controllable text generation method, a system, equipment and a medium.
Background
Controllable text generation adds control over certain attributes, styles and key information of the generated text on top of conventional text generation, so that the generated text meets certain expectations. Its application scenarios are very broad, such as generating text with a specific emotion, tone, personality or style. Current causal language models (CLM, Causal Language Model) are typically based on the Transformer architecture (a neural network architecture), such as GPT (Generative Pre-Trained Transformer, an artificial intelligence model based on deep learning techniques), and have very strong text generation capabilities. However, the output of a CLM is difficult to control, because each token in a CLM can only see the tokens before it and not the tokens after it, and the training goal of the model is to predict the token at the next position from the previous tokens, decoding one token per prediction step.
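For illustration only (not part of the original disclosure), the following sketch assumes the Hugging Face transformers library and the public gpt2 checkpoint; it shows how a causal language model decodes one token per step, conditioning only on the tokens already generated:

```python
# Sketch of causal-language-model decoding: at each step the model sees only
# the tokens generated so far and predicts the token at the next position.
# Assumes the Hugging Face `transformers` library and the public "gpt2" checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

input_ids = tokenizer("The captain watched the", return_tensors="pt").input_ids
with torch.no_grad():
    for _ in range(20):                                   # one token per prediction step
        logits = model(input_ids).logits                  # [batch, seq_len, vocab_size]
        next_id = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy choice of next token
        input_ids = torch.cat([input_ids, next_id], dim=-1)
print(tokenizer.decode(input_ids[0]))
```

Because nothing in this loop looks at tokens that have not yet been generated, control information placed far from the current position loses influence, which is the difficulty addressed below.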
At present, research on controllable text generation follows two main directions:
Direction one: instructions are interpreted with CLM and incorporated into text generation, typically by fine tuning or hinting (Prompt). However, this method has no fine control capability in controlling the generation of specific points, which eventually leads to the divergence of the subject matter and the appearance of content that is not related to the control element. This is because the effect of a hint is inversely related to the distance of the hint from the next predictive marker, making hinting of non-adjacent text difficult.
Direction two: the emphasis is placed on fine-grained control, and the generation process is guided at any point in time. This approach separates the CLM from the control method, which typically limits the types of attributes that can be controlled and often leads to loss of control elements and of the positional relationships between them.
Disclosure of Invention
The invention aims to overcome the defect that the generation of a controllable text is difficult in the prior art, and provides a model training method, a controllable text generation method, a system, equipment and a medium.
The invention solves the technical problems by the following technical scheme:
the invention provides a model training method, which comprises the following steps:
acquiring a plurality of first training samples; each first training sample comprises a first prompt word sequence;
Training a controllable text generation model by taking the first training sample as input so as to determine parameters of a prompt sub-model and an attention sub-model in the controllable text generation model; the controllable text generation model comprises the prompt sub-model, the attention sub-model and a pre-trained text generation sub-model; the text generation sub-model is a causal language model and is obtained by pre-training by taking a pre-training text as input;
in each training, the output of the controllable text generation model is controllable text; the content of the controllable text corresponds to the sequence and the content of the prompt words in the first prompt word sequence;
each training process comprises the same time steps as the number of the prompting words in the first prompting word sequence;
in each time step, the prompt sub-model takes the first prompt word sequence as input and takes the prompt word hidden state of the current time step as output; the attention sub-model takes the implicit state of the prompt word of all previous time steps, the implicit state of the prompt word of the current time step and the implicit state of the text of all previous time steps as input, and takes the attention text matrix of the current time step as output; the text generation sub-model generates a text hidden state of the current time step based on the attention text matrix of the current time step; in the first time step, the attention sub-model takes the implicit state of the prompt word and the implicit state of the preset text in the first time step as input.
Preferably, in each time step, the attention sub-model is used to:
generating a prompt word matrix of the current time step according to a self-attention mechanism according to the prompt word hidden states of all the previous time steps and the prompt word hidden states of the current time step; the prompt word matrix comprises a prompt word query matrix, a prompt word key matrix and a prompt word value matrix;
generating a prompt word self-attention matrix of the current time step according to a self-attention mechanism according to the prompt word matrix of the current time step;
generating a first text matrix of the current time step according to a self-attention mechanism according to the text hidden states of all the previous time steps; the first text matrix comprises a first text query matrix, a first text key matrix and a first text value matrix;
generating a text self-attention matrix of the current time step according to a self-attention mechanism according to the first text matrix of the current time step;
generating a second text matrix of the current time step according to the text hidden state of the last time step; the second text matrix comprises a second text key matrix and a second text value matrix;
generating a prompt word attention matrix of the current time step according to an attention mechanism according to the prompt word inquiry matrix of the current time step and the second text matrix of the current time step;
And adding the prompting word self-attention matrix of the current time step, the text self-attention matrix of the current time step and the prompting word attention moment matrix of the current time step to obtain the attention text matrix of the current time step.
Preferably, the controllable text generation model further comprises a location sub-model;
the step of training the controllable text generation model by taking the first training sample as input to determine parameters of a prompt sub-model and an attention sub-model in the controllable text generation model further comprises the following steps:
setting parameters of the prompt sub-model, the attention sub-model and the text generation sub-model to be fixed;
acquiring a plurality of second training samples; each second training sample comprises a second prompt word sequence;
training the controllable text generation model by taking the second training sample as input so as to determine parameters of the position sub-model;
in each time step, the position sub-model takes the attention text matrix of the current time step as input and takes the position text matrix of the current time step as output; the text generation sub-model takes a position text matrix of the current time step as input.
Preferably, the step of training the controllable text generation model to determine the parameters of the location sub-model using the second training sample as input further includes:
setting the parameters of the position sub-model and the text generation sub-model to be fixed;
acquiring third training data;
and training the controllable text generation model by taking the third training data as input so as to finely adjust the parameters of the prompt sub-model and the attention sub-model.
The invention also provides a controllable text generation method, which comprises the following steps:
acquiring a prompt word sequence to be processed;
taking the prompt word sequence to be processed as input of a controllable text generation model to obtain a controllable text; the controllable text generation model is trained according to the model training method.
The invention also provides a model training system, which comprises:
the first training sample acquisition module is used for acquiring a plurality of first training samples; each first training sample comprises a first prompt word sequence;
the training module is used for training a controllable text generation model by taking the first training sample as input so as to determine parameters of a prompt sub-model and an attention sub-model in the controllable text generation model; the controllable text generation model comprises the prompt sub-model, the attention sub-model and a pre-trained text generation sub-model; the text generation sub-model is a causal language model and is obtained by pre-training by taking a pre-training text as input;
In each training, the output of the controllable text generation model is controllable text; the content of the controllable text corresponds to the sequence and the content of the prompt words in the first prompt word sequence;
each training process comprises the same time steps as the number of the prompting words in the first prompting word sequence;
in each time step, the prompt sub-model takes the first prompt word sequence as input and takes the prompt word hidden state of the current time step as output; the attention sub-model takes the implicit state of the prompt word of all previous time steps, the implicit state of the prompt word of the current time step and the implicit state of the text of all previous time steps as input, and takes the attention text matrix of the current time step as output; the text generation sub-model generates a text hidden state of the current time step based on the attention text matrix of the current time step; in the first time step, the attention sub-model takes the implicit state of the prompt word and the implicit state of the preset text in the first time step as input.
The invention also provides a controllable text generation system, which comprises:
the acquisition module is used for acquiring a prompt word sequence to be processed;
The controllable text generation module is used for taking the to-be-processed prompt word sequence as input of a controllable text generation model to obtain a controllable text; the controllable text generation model is trained according to the model training system.
The invention also provides an electronic device comprising a memory, a processor and a computer program stored on the memory and capable of running on the processor, wherein the processor realizes the model training method or the controllable text generating method when executing the computer program.
The invention also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the foregoing model training method or the foregoing controllable text generation method.
The invention has the following positive effects: a prompt sub-model is added into the controllable text generation model and the text generation sub-model is guided by using the prompt sub-model, so that the theme of the controllable text is prevented from diverging and content irrelevant to the prompt words is prevented from appearing; an attention sub-model is added into the controllable text generation model, wherein the input of the attention sub-model comprises the implicit state of the prompting words of all previous time steps and the implicit state of the prompting word of the current time step output by the prompting sub-model, and also comprises the implicit state of the texts of all previous time steps output by the text generation sub-model, so that the attention text matrix of the current time step is more closely related to the implicit state of the texts of the previous time steps, independent prompting is realized in each time step, the content of the controllable text is prevented from failing to correspond to the content of all prompting words, and the loss of prompting words is avoided; and a position sub-model is added into the controllable text generation model to prevent the content of the controllable text from failing to correspond to the sequence of the prompt words.
Drawings
Fig. 1 is a flowchart of a model training method provided in embodiment 1 of the present invention.
Fig. 2 is a schematic diagram of a parameter confirmation process of a prompt submodel provided in embodiment 1 of the present invention.
Fig. 3 is a schematic diagram of a first training process of the controllable text generation model provided in embodiment 1 of the present invention.
Fig. 4 is a schematic diagram of a second training process of the controllable text generation model provided in embodiment 1 of the present invention.
Fig. 5 is a flowchart of a controllable text generation method provided in embodiment 2 of the present invention.
Fig. 6 is a schematic block diagram of a model training system according to embodiment 3 of the present invention.
Fig. 7 is a schematic block diagram of a controllable text generation system according to embodiment 4 of the present invention.
Fig. 8 is a schematic structural diagram of an electronic device according to embodiment 5 of the present invention.
Detailed Description
The invention is further illustrated by means of the following examples, which are not intended to limit the scope of the invention.
Example 1
The present embodiment provides a model training method, as shown in fig. 1, including:
s101, acquiring a plurality of first training samples.
Each first training sample includes a first sequence of cue words.
S102, training the controllable text generation model by taking the first training sample as input so as to determine parameters of a prompt sub-model and an attention sub-model in the controllable text generation model.
The controllable text generation model comprises a prompt sub-model, an attention sub-model and a pre-trained text generation sub-model; the text generation sub-model is a causal language model and is obtained by pre-training by taking pre-training text as input.
Specifically, in this embodiment, the text generation sub-model may be based on the Transformer architecture, and keywords may be extracted from a single sentence as the prompt words in the first prompt word sequence.
In each training, the output of the controllable text generation model is controllable text; the content of the controllable text corresponds to the sequence and the content of the prompt words in the first prompt word sequence; each training process includes the same number of time steps as the number of cue words in the first sequence of cue words.
Preferably, keywords may be extracted from a single sentence as the prompt words in the first prompt word sequence. Specifically, in this embodiment, "happy" may be used as the first prompt word sequence, and the obtained controllable text is "He is particularly satisfied", where "satisfied" in the sentence corresponds to the prompt word "happy".
In each time step, the prompt sub-model takes a first prompt word sequence as input and takes the prompt word hidden state of the current time step as output; the attention sub-model takes the implicit state of the prompting word of all previous time steps, the implicit state of the prompting word of the current time step and the implicit state of the text of all previous time steps as input, and takes the attention text matrix of the current time step as output; the text generation sub-model generates a text hidden state of the current time step based on the attention text matrix of the current time step; in the first time step, the attention sub-model takes the implicit state of the prompt word and the implicit state of the preset text in the first time step as input.
Preferably, as shown in fig. 2, a pre-trained text generation sub-model may be used as a framework of a prompt sub-model, parameters of the text generation sub-model are copied into the prompt sub-model, and parameters of the pre-trained text generation sub-model are used as initial values of the parameters of the prompt sub-model, so that model training time is reduced, and model training efficiency is improved.
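A minimal sketch of this initialization, assuming both sub-models are PyTorch modules with the same architecture (the function and argument names are illustrative, not part of the disclosure):

```python
import copy
import torch.nn as nn

def init_prompt_submodel(text_generation_submodel: nn.Module) -> nn.Module:
    """Use the pre-trained text generation sub-model as the framework of the
    prompt sub-model, copying its parameters as the initial parameter values."""
    prompt_submodel = copy.deepcopy(text_generation_submodel)
    # Equivalent when a separately constructed instance already exists:
    # prompt_submodel.load_state_dict(text_generation_submodel.state_dict())
    return prompt_submodel
```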
Preferably, in each time step, the attention sub-model may be used to generate a prompt word matrix of the current time step according to a self-attention mechanism according to the prompt word hidden states of all previous time steps and the prompt word hidden states of the current time step; the prompt word matrix comprises a prompt word query matrix, a prompt word key matrix and a prompt word value matrix; generating a prompt word self-attention matrix of the current time step according to a self-attention mechanism according to the prompt word matrix of the current time step; generating a first text matrix of the current time step according to a self-attention mechanism according to the text hidden states of all the previous time steps; the first text matrix comprises a first text query matrix, a first text key matrix and a first text value matrix; generating a text self-attention matrix of the current time step according to a self-attention mechanism according to the first text matrix of the current time step; generating a prompt word attention matrix of the current time step according to an attention mechanism according to the prompt word inquiry matrix of the current time step and the second text matrix of the current time step; generating a second text matrix of the current time step according to the text hidden state of the last time step; the second text matrix comprises a second text key matrix and a second text value matrix; and adding the prompting word self-attention matrix of the current time step, the text self-attention matrix of the current time step and the prompting word attention moment matrix of the current time step to obtain the attention text matrix of the current time step.
Specifically, in this embodiment, the prompt word matrix of the current time step may be generated by the following formula:
Q_t = W_Q · H_t
K_t = W_K · H_t
V_t = W_V · H_t
wherein Q_t denotes the prompt word query matrix of time step t, K_t the prompt word key matrix of time step t, and V_t the prompt word value matrix of time step t; W_Q, W_K and W_V are parameters to be determined by training, W_Q denoting the prompt word query weight matrix, W_K the prompt word key weight matrix and W_V the prompt word value weight matrix; H_t denotes the prompt word hidden states of all time steps before time step t together with the prompt word hidden state of time step t.
Specifically, in this embodiment, the cue word self-attention matrix of the current time step may be generated by the following formula:
A_t = softmax(Q_t · K_t^T / √d_k) · V_t
wherein softmax denotes the normalized exponential function, A_t the prompt word self-attention matrix of time step t, Q_t the prompt word query matrix of time step t, K_t the prompt word key matrix of time step t, V_t the prompt word value matrix of time step t, and d_k the dimension of the hidden layer of the attention sub-model.
Specifically, in the present embodiment, the first text matrix of the current time step may be generated by the following formula:
q_t = W_q · Y_t
k_t = W_k · Y_t
v_t = W_v · Y_t
wherein q_t denotes the first text query matrix of time step t, k_t the first text key matrix of time step t, and v_t the first text value matrix of time step t; W_q, W_k and W_v are parameters to be determined by training, W_q denoting the first text query weight matrix, W_k the first text key weight matrix and W_v the first text value weight matrix; Y_t denotes the text hidden states of all time steps before time step t.
Specifically, in the present embodiment, the text self-attention matrix of the current time step may be generated by the following formula:
B_t = softmax(q_t · k_t^T / √d_k) · v_t
wherein softmax denotes the normalized exponential function, B_t the text self-attention matrix of time step t, q_t the first text query matrix of time step t, k_t the first text key matrix of time step t, v_t the first text value matrix of time step t, and d_k the dimension of the hidden layer of the attention sub-model.
Specifically, in the present embodiment, the second text matrix of the current time step may be generated by the following formula:
M_t = W_M · L_{t-1}
N_t = W_N · L_{t-1}
wherein M_t denotes the second text key matrix of time step t and N_t the second text value matrix of time step t; W_M and W_N are parameters to be determined by training, W_M denoting the second text key weight matrix and W_N the second text value weight matrix; L_{t-1} denotes the text hidden state of time step t-1.
Specifically, in the present embodiment, the prompt word attention matrix of the current time step may be generated by the following formula:
C_t = softmax(Q_t · M_t^T / √d_k) · N_t
wherein softmax denotes the normalized exponential function, C_t the prompt word attention matrix of time step t, Q_t the prompt word query matrix of time step t, M_t the second text key matrix of time step t, N_t the second text value matrix of time step t, and d_k the dimension of the hidden layer of the attention sub-model.
Specifically, in the present embodiment, the attention text matrix of the current time step may be generated by the following formula:
Z_t = (A_t + B_t + C_t) · W_Z
wherein Z_t denotes the attention text matrix of time step t, A_t the prompt word self-attention matrix of time step t, B_t the text self-attention matrix of time step t, and C_t the prompt word attention matrix of time step t; W_Z is a parameter to be determined by training.
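The formulas above can be collected into a single attention sub-model step. The PyTorch sketch below is only an interpretation under stated assumptions: the patent does not fix tensor shapes, so the sketch keeps, from each attention term, the row belonging to the current time step so that A_t, B_t and C_t can be summed; all class, module and variable names are illustrative.

```python
import math
import torch
import torch.nn as nn

class AttentionSubModel(nn.Module):
    """Sketch of the attention sub-model: prompt word self-attention (A_t),
    text self-attention (B_t) and prompt-word-to-text attention (C_t) are
    summed and projected by W_Z into the attention text matrix Z_t."""

    def __init__(self, d_model: int, d_k: int):
        super().__init__()
        self.d_k = d_k
        # Prompt word projections W_Q, W_K, W_V
        self.W_Q = nn.Linear(d_model, d_k, bias=False)
        self.W_K = nn.Linear(d_model, d_k, bias=False)
        self.W_V = nn.Linear(d_model, d_k, bias=False)
        # First text projections W_q, W_k, W_v
        self.W_q = nn.Linear(d_model, d_k, bias=False)
        self.W_k = nn.Linear(d_model, d_k, bias=False)
        self.W_v = nn.Linear(d_model, d_k, bias=False)
        # Second text projections W_M, W_N (applied to the previous text hidden state)
        self.W_M = nn.Linear(d_model, d_k, bias=False)
        self.W_N = nn.Linear(d_model, d_k, bias=False)
        # Output projection W_Z
        self.W_Z = nn.Linear(d_k, d_model, bias=False)

    def _attend(self, q, k, v):
        scores = q @ k.transpose(-2, -1) / math.sqrt(self.d_k)
        return torch.softmax(scores, dim=-1) @ v

    def forward(self, prompt_hidden, text_hidden):
        # prompt_hidden: [t, d_model] prompt word hidden states up to and including the current step
        # text_hidden:   [t', d_model] text hidden states of all previous steps
        # A_t: prompt word self-attention
        Q, K, V = self.W_Q(prompt_hidden), self.W_K(prompt_hidden), self.W_V(prompt_hidden)
        A = self._attend(Q, K, V)[-1]              # row belonging to the current time step
        # B_t: text self-attention over previous text hidden states
        q, k, v = self.W_q(text_hidden), self.W_k(text_hidden), self.W_v(text_hidden)
        B = self._attend(q, k, v)[-1]
        # C_t: prompt word query attends to the previous text hidden state L_{t-1}
        last = text_hidden[-1:]
        M, N = self.W_M(last), self.W_N(last)
        C = self._attend(Q, M, N)[-1]
        # Z_t = (A_t + B_t + C_t) · W_Z
        return self.W_Z(A + B + C)
```

At the first time step, a preset text hidden state would take the place of `text_hidden`, as described above.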
In one non-limiting implementation of this embodiment, the output of the attention sub-model is used directly as input to the text generation sub-model. In this case, in each time step, as shown in fig. 3, the prompt sub-model takes the first prompt word sequence as input and takes the prompt word hidden state of the current time step as output; the attention sub-model takes the implicit state of the prompting word of all previous time steps, the implicit state of the prompting word of the current time step and the implicit state of the text of all previous time steps as input, and takes the attention text matrix of the current time step as output; the text generation sub-model takes the attention text matrix of the current time step as input and takes the text hidden state of the current time step as output.
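Tying the sub-models together, a schematic per-time-step generation loop for this implementation (the prompt and text generation sub-models are stand-ins with assumed call signatures; the attention sub-model can be the sketch above; only the data flow follows the description):

```python
import torch

def generate_step_by_step(prompt_submodel, attention_submodel, text_submodel,
                          prompt_word_ids, preset_text_hidden, num_steps):
    """Schematic loop: one time step per prompt word. Sub-model interfaces are
    illustrative stand-ins; only the data flow follows the description above."""
    prompt_hidden = []                      # prompt word hidden states, one per step
    text_hidden = [preset_text_hidden]      # preset text hidden state for the first step
    for t in range(num_steps):
        # Prompt sub-model: prompt word sequence in, prompt word hidden state of step t out.
        prompt_hidden.append(prompt_submodel(prompt_word_ids, step=t))
        # Attention sub-model: all prompt word hidden states so far + all previous text hidden states.
        z_t = attention_submodel(torch.stack(prompt_hidden), torch.stack(text_hidden))
        # Text generation sub-model: attention text matrix in, text hidden state of step t out.
        text_hidden.append(text_submodel(z_t))
    return text_hidden[1:]
```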
In another non-limiting implementation manner of this embodiment, since the position coding scheme in the prompt sub-model is not conducive to controlling text generation at an arbitrary time step (a situation that is more pronounced when the prompt sub-model adopts a Transformer architecture), a position sub-model is added as a plug-in to the controllable text generation model, so that, through a set of additional weights, the content of the controllable text is prevented from failing to correspond to the sequence of the prompt words. The parameters of the position sub-model are trained after the parameters of the prompt sub-model and the attention sub-model are determined, so that the overall training time is shortened and the model training efficiency is improved. In the case that the controllable text generation model further includes a position sub-model, step S102 is further followed by: setting parameters of the prompt sub-model, the attention sub-model and the text generation sub-model to be fixed; acquiring a plurality of second training samples, each second training sample comprising a second prompt word sequence; and training the controllable text generation model by taking the second training samples as input so as to determine the parameters of the position sub-model. In each time step, as shown in fig. 4, the prompt sub-model takes the first prompt word sequence as input and takes the prompt word hidden state of the current time step as output; the attention sub-model takes the implicit state of the prompting word of all previous time steps, the implicit state of the prompting word of the current time step and the implicit state of the text of all previous time steps as input, and takes the attention text matrix of the current time step as output; the position sub-model takes the attention text matrix of the current time step as input and takes the position text matrix of the current time step as output; the text generation sub-model takes the position text matrix of the current time step as input and takes the text hidden state of the current time step as output.
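A brief sketch of this second training stage (attribute names are illustrative; only the freezing pattern is taken from the description): the prompt, attention and text generation sub-models are fixed and only the position sub-model receives gradient updates.

```python
import torch
import torch.nn as nn

def freeze(module: nn.Module) -> None:
    """Fix a sub-model's parameters so they are not updated in this stage."""
    for p in module.parameters():
        p.requires_grad = False

def stage2_optimizer(model: nn.Module) -> torch.optim.Optimizer:
    """Stage 2: freeze the prompt, attention and text generation sub-models and
    train only the position sub-model (attribute names are illustrative)."""
    freeze(model.prompt_submodel)
    freeze(model.attention_submodel)
    freeze(model.text_generation_submodel)
    return torch.optim.Adam(model.position_submodel.parameters(), lr=1e-4)
```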
Preferably, keywords may be extracted from a compound sentence as the prompt words in the second prompt word sequence. Specifically, in this embodiment, when the training compound sentence is "The captain lazily watched the coastline. Suddenly an inexplicable sadness came over him, and tears flowed involuntarily. When he saw the rainbow in the sky, his brow relaxed again and he felt happy.", the second prompt word sequence may comprise: "captain, coastline; emotion: sad; emotion: happy". In the compound sentence, "The captain lazily watched the coastline" corresponds to the prompt words "captain" and "coastline", "an inexplicable sadness came over him" corresponds to the prompt word "sad", and "his brow relaxed again and he felt happy" corresponds to the prompt word "happy". In order to improve the usability of the controllable text generation model in a vertical field, it is preferable to include, after step S102: acquiring third training data; and training the controllable text generation model by taking the third training data as input so as to fine-tune the parameters of the prompt sub-model and the attention sub-model. Specifically, the third training data may be derived from a vertical domain, and the specific vertical domain may be determined according to the practical situation during implementation.
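By analogy, a brief sketch of this fine-tuning stage, again with illustrative attribute names: the position and text generation sub-models stay fixed while the prompt and attention sub-models are fine-tuned on the third (vertical-domain) training data.

```python
import torch
import torch.nn as nn

def finetune_optimizer(model: nn.Module) -> torch.optim.Optimizer:
    """Freeze the position and text generation sub-models; fine-tune only the
    prompt and attention sub-models (attribute names are illustrative)."""
    for sub in (model.position_submodel, model.text_generation_submodel):
        for p in sub.parameters():
            p.requires_grad = False
    finetune_params = (list(model.prompt_submodel.parameters())
                       + list(model.attention_submodel.parameters()))
    return torch.optim.Adam(finetune_params, lr=1e-5)  # smaller learning rate for fine-tuning
```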
According to this embodiment, a prompt sub-model is added into the controllable text generation model and the text generation sub-model is guided by using the prompt sub-model, so that the theme of the controllable text is prevented from diverging and content irrelevant to the prompt words is prevented from appearing; an attention sub-model is added into the controllable text generation model, wherein the input of the attention sub-model comprises the implicit state of the prompting words of all previous time steps and the implicit state of the prompting word of the current time step output by the prompting sub-model, and also comprises the implicit state of the texts of all previous time steps output by the text generation sub-model, so that the attention text matrix of the current time step is more closely related to the implicit state of the texts of the previous time steps, independent prompting is realized in each time step, the content of the controllable text is prevented from failing to correspond to the content of all prompting words, and the loss of prompting words is avoided; a position sub-model is added into the controllable text generation model to prevent the content of the controllable text from failing to correspond to the sequence of the prompt words; and the parameters of the prompt sub-model and the attention sub-model are fine-tuned, so that the practicability of the controllable text generation model in the vertical field is improved.
Example 2
The embodiment provides a controllable text generating method, as shown in fig. 5, including:
S201, acquiring a prompt word sequence to be processed.
S202, taking the prompt word sequence to be processed as input of a controllable text generation model to obtain a controllable text.
The controllable text generation model is trained according to the model training method of example 1.
In the controllable text generation model used in this embodiment, a prompt sub-model is added and the text generation sub-model is guided by using the prompt sub-model, so that the theme of the controllable text is prevented from diverging and content irrelevant to the prompt words is prevented from appearing; an attention sub-model is added into the controllable text generation model, wherein the input of the attention sub-model comprises the implicit state of the prompting words of all previous time steps and the implicit state of the prompting word of the current time step output by the prompting sub-model, and also comprises the implicit state of the texts of all previous time steps output by the text generation sub-model, so that the attention text matrix of the current time step is more closely related to the implicit state of the texts of the previous time steps, independent prompting is realized in each time step, the content of the controllable text is prevented from failing to correspond to the content of all prompting words, and the loss of prompting words is avoided; a position sub-model is added into the controllable text generation model to prevent the content of the controllable text from failing to correspond to the sequence of the prompt words; and the parameters of the prompt sub-model and the attention sub-model are fine-tuned, so that the practicability of the controllable text generation model in the vertical field is improved.
Example 3
The present embodiment provides a model training system, and as shown in fig. 6, the model training system 30 includes a first training sample acquisition module 31 and a training module 32.
The first training sample acquiring module 31 is configured to acquire a plurality of first training samples; each first training sample includes a first sequence of cue words.
The training module 32 is configured to train the controllable text generation model using the first training sample as input to determine parameters of the prompt sub-model and the attention sub-model in the controllable text generation model.
The controllable text generation model comprises a prompt sub-model, an attention sub-model and a pre-trained text generation sub-model; the text generation sub-model is a causal language model and is obtained by pre-training by taking pre-training text as input.
Specifically, in the present embodiment, the text generation sub-model may be based on the Transformer architecture.
In each training, the output of the controllable text generation model is controllable text; the content of the controllable text corresponds to the sequence and the content of the prompt words in the first prompt word sequence; each training process includes the same number of time steps as the number of cue words in the first sequence of cue words.
Preferably, keywords may be extracted from a single sentence as the prompt words in the first prompt word sequence. Specifically, in this embodiment, "happy" may be used as the first prompt word sequence, and the obtained controllable text is "He is particularly satisfied", where "satisfied" in the sentence corresponds to the prompt word "happy". In each time step, the prompt sub-model takes the first prompt word sequence as input and takes the prompt word hidden state of the current time step as output; the attention sub-model takes the implicit state of the prompting word of all previous time steps, the implicit state of the prompting word of the current time step and the implicit state of the text of all previous time steps as input, and takes the attention text matrix of the current time step as output; the text generation sub-model generates the text hidden state of the current time step based on the attention text matrix of the current time step; in the first time step, the attention sub-model takes the implicit state of the prompt word and the implicit state of the preset text in the first time step as input.
Preferably, the pre-trained text generation sub-model can be used as a framework of the prompt sub-model, parameters of the text generation sub-model are copied into the prompt sub-model, and the parameters of the pre-trained text generation sub-model are used as initial values of the parameters of the prompt sub-model, so that the model training time is shortened, and the model training efficiency is improved.
Preferably, in each time step, the attention sub-model may be used to generate a prompt word matrix of the current time step according to a self-attention mechanism according to the prompt word hidden states of all previous time steps and the prompt word hidden states of the current time step; the prompt word matrix comprises a prompt word query matrix, a prompt word key matrix and a prompt word value matrix; generating a prompt word self-attention matrix of the current time step according to a self-attention mechanism according to the prompt word matrix of the current time step; generating a first text matrix of the current time step according to a self-attention mechanism according to the text hidden states of all the previous time steps; the first text matrix comprises a first text query matrix, a first text key matrix and a first text value matrix; generating a text self-attention matrix of the current time step according to a self-attention mechanism according to the first text matrix of the current time step; generating a prompt word attention matrix of the current time step according to an attention mechanism according to the prompt word inquiry matrix of the current time step and the second text matrix of the current time step; generating a second text matrix of the current time step according to the text hidden state of the last time step; the second text matrix comprises a second text key matrix and a second text value matrix; and adding the prompting word self-attention matrix of the current time step, the text self-attention matrix of the current time step and the prompting word attention moment matrix of the current time step to obtain the attention text matrix of the current time step.
Specifically, in this embodiment, the prompt word matrix of the current time step may be generated by the following formula:
Q_t = W_Q · H_t
K_t = W_K · H_t
V_t = W_V · H_t
wherein Q_t denotes the prompt word query matrix of time step t, K_t the prompt word key matrix of time step t, and V_t the prompt word value matrix of time step t; W_Q, W_K and W_V are parameters to be determined by training, W_Q denoting the prompt word query weight matrix, W_K the prompt word key weight matrix and W_V the prompt word value weight matrix; H_t denotes the prompt word hidden states of all time steps before time step t together with the prompt word hidden state of time step t.
Specifically, in this embodiment, the cue word self-attention matrix of the current time step may be generated by the following formula:
A_t = softmax(Q_t · K_t^T / √d_k) · V_t
wherein softmax denotes the normalized exponential function, A_t the prompt word self-attention matrix of time step t, Q_t the prompt word query matrix of time step t, K_t the prompt word key matrix of time step t, V_t the prompt word value matrix of time step t, and d_k the dimension of the hidden layer of the attention sub-model.
Specifically, in the present embodiment, the first text matrix of the current time step may be generated by the following formula:
q_t = W_q · Y_t
k_t = W_k · Y_t
v_t = W_v · Y_t
wherein q_t denotes the first text query matrix of time step t, k_t the first text key matrix of time step t, and v_t the first text value matrix of time step t; W_q, W_k and W_v are parameters to be determined by training, W_q denoting the first text query weight matrix, W_k the first text key weight matrix and W_v the first text value weight matrix; Y_t denotes the text hidden states of all time steps before time step t.
Specifically, in the present embodiment, the text self-attention matrix of the current time step may be generated by the following formula:
B_t = softmax(q_t · k_t^T / √d_k) · v_t
wherein softmax denotes the normalized exponential function, B_t the text self-attention matrix of time step t, q_t the first text query matrix of time step t, k_t the first text key matrix of time step t, v_t the first text value matrix of time step t, and d_k the dimension of the hidden layer of the attention sub-model.
Specifically, in the present embodiment, the second text matrix of the current time step may be generated by the following formula:
M_t = W_M · L_{t-1}
N_t = W_N · L_{t-1}
wherein M_t denotes the second text key matrix of time step t and N_t the second text value matrix of time step t; W_M and W_N are parameters to be determined by training, W_M denoting the second text key weight matrix and W_N the second text value weight matrix; L_{t-1} denotes the text hidden state of time step t-1.
Specifically, in the present embodiment, the prompt word attention matrix of the current time step may be generated by the following formula:
C_t = softmax(Q_t · M_t^T / √d_k) · N_t
wherein softmax denotes the normalized exponential function, C_t the prompt word attention matrix of time step t, Q_t the prompt word query matrix of time step t, M_t the second text key matrix of time step t, N_t the second text value matrix of time step t, and d_k the dimension of the hidden layer of the attention sub-model.
Specifically, in the present embodiment, the attention text matrix of the current time step may be generated by the following formula:
Z_t = (A_t + B_t + C_t) · W_Z
wherein Z_t denotes the attention text matrix of time step t, A_t the prompt word self-attention matrix of time step t, B_t the text self-attention matrix of time step t, and C_t the prompt word attention matrix of time step t; W_Z is a parameter to be determined by training.
In one non-limiting implementation of this embodiment, the output of the attention sub-model is used directly as input to the text generation sub-model. In this case, in each time step, the prompt sub-model takes the first prompt word sequence as input and takes the prompt word hidden state of the current time step as output; the attention sub-model takes the implicit state of the prompting word of all previous time steps, the implicit state of the prompting word of the current time step and the implicit state of the text of all previous time steps as input, and takes the attention text matrix of the current time step as output; the text generation sub-model takes the attention text matrix of the current time step as input and takes the text hidden state of the current time step as output.
In another non-limiting implementation manner of this embodiment, since the position coding scheme in the prompt sub-model is not conducive to controlling text generation at an arbitrary time step (a situation that is more pronounced when the prompt sub-model adopts a Transformer architecture), the position sub-model is added as a plug-in to the controllable text generation model, so that, through a set of additional weights, the content of the controllable text is prevented from failing to correspond to the sequence of the prompt words. The parameters of the position sub-model are trained after the parameters of the prompt sub-model and the attention sub-model are determined, so that the overall training time is shortened and the model training efficiency is improved. Where the controllable text generation model further includes a position sub-model, the model training system 30 further includes a fixing module, a second training sample acquisition module, and a position sub-model training module. The fixing module is used for setting the parameters of the prompt sub-model, the attention sub-model and the text generation sub-model to be fixed. The second training sample acquisition module is used for acquiring a plurality of second training samples. Each second training sample includes a second prompt word sequence. The position sub-model training module is used for training the controllable text generation model by taking the second training samples as input so as to determine the parameters of the position sub-model. In each time step, the prompt sub-model takes the first prompt word sequence as input and takes the prompt word hidden state of the current time step as output; the attention sub-model takes the implicit state of the prompting word of all previous time steps, the implicit state of the prompting word of the current time step and the implicit state of the text of all previous time steps as input, and takes the attention text matrix of the current time step as output; the position sub-model takes the attention text matrix of the current time step as input and takes the position text matrix of the current time step as output; the text generation sub-model takes the position text matrix of the current time step as input and takes the text hidden state of the current time step as output.
Preferably, keywords may be extracted from a compound sentence as the prompt words in the second prompt word sequence. Specifically, in this embodiment, when the training compound sentence is "The captain lazily watched the coastline. Suddenly an inexplicable sadness came over him, and tears flowed involuntarily. When he saw the rainbow in the sky, his brow relaxed again and he felt happy.", the second prompt word sequence may comprise: "captain, coastline; emotion: sad; emotion: happy". In the compound sentence, "The captain lazily watched the coastline" corresponds to the prompt words "captain" and "coastline", "an inexplicable sadness came over him" corresponds to the prompt word "sad", and "his brow relaxed again and he felt happy" corresponds to the prompt word "happy".
To enhance the usability of the controllable text generation model in the vertical field, the model training system 30 may preferably further comprise a third training data acquisition module and a fine tuning module. The third training data acquisition module is used for acquiring third training data. The fine tuning module is used for training the controllable text generation model by taking the third training data as input so as to carry out fine tuning on the parameters of the prompt sub-model and the attention sub-model. Specifically, the third training sample may be derived from a vertical domain, and the specific vertical domain may be determined according to the practical situation when implementing.
According to this embodiment, a prompt sub-model is added into the controllable text generation model and the text generation sub-model is guided by using the prompt sub-model, so that the theme of the controllable text is prevented from diverging and content irrelevant to the prompt words is prevented from appearing; an attention sub-model is added into the controllable text generation model, wherein the input of the attention sub-model comprises the implicit state of the prompting words of all previous time steps and the implicit state of the prompting word of the current time step output by the prompting sub-model, and also comprises the implicit state of the texts of all previous time steps output by the text generation sub-model, so that the attention text matrix of the current time step is more closely related to the implicit state of the texts of the previous time steps, independent prompting is realized in each time step, the content of the controllable text is prevented from failing to correspond to the content of all prompting words, and the loss of prompting words is avoided; a position sub-model is added into the controllable text generation model to prevent the content of the controllable text from failing to correspond to the sequence of the prompt words; and the parameters of the prompt sub-model and the attention sub-model are fine-tuned, so that the practicability of the controllable text generation model in the vertical field is improved.
Example 4
The present embodiment provides a controllable text generation system, and as shown in fig. 7, the generation system 40 includes an acquisition module 41 and a controllable text generation module 42.
The obtaining module 41 is configured to obtain a sequence of to-be-processed prompt words.
The controllable text generation module 42 is configured to obtain controllable text by using the sequence of to-be-processed prompt words as input of a controllable text generation model.
The controllable text generation model is trained in accordance with the model training system of embodiment 3.
In the controllable text generation model used in this embodiment, a prompt sub-model is added and the text generation sub-model is guided by using the prompt sub-model, so that the theme of the controllable text is prevented from diverging and content irrelevant to the prompt words is prevented from appearing; an attention sub-model is added into the controllable text generation model, wherein the input of the attention sub-model comprises the implicit state of the prompting words of all previous time steps and the implicit state of the prompting word of the current time step output by the prompting sub-model, and also comprises the implicit state of the texts of all previous time steps output by the text generation sub-model, so that the attention text matrix of the current time step is more closely related to the implicit state of the texts of the previous time steps, independent prompting is realized in each time step, the content of the controllable text is prevented from failing to correspond to the content of all prompting words, and the loss of prompting words is avoided; a position sub-model is added into the controllable text generation model to prevent the content of the controllable text from failing to correspond to the sequence of the prompt words; and the parameters of the prompt sub-model and the attention sub-model are fine-tuned, so that the practicability of the controllable text generation model in the vertical field is improved.
Example 5
Fig. 8 is a schematic structural diagram of an electronic device according to embodiment 5 of the present invention. The electronic device comprises a memory, a processor and a computer program stored on the memory and executable on the processor, and the processor implements the model training method of embodiment 1 or the controllable text generation method of embodiment 2 when executing the computer program. The electronic device 50 shown in fig. 8 is merely an example and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.
The electronic device 50 may be embodied in the form of a general purpose computing device, which may be a server device, for example. Components of electronic device 50 may include, but are not limited to: the at least one processor 51, the at least one memory 52, a bus 53 connecting the different system components, including the memory 52 and the processor 51.
The bus 53 includes a data bus, an address bus, and a control bus.
Memory 52 may include volatile memory such as Random Access Memory (RAM) 521 and/or cache memory 522, and may further include Read Only Memory (ROM) 523.
Memory 52 may also include a program/utility 525 having a set (at least one) of program modules 524, such program modules 524 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment.
The processor 51 executes various functional applications and data processing such as the model training method of embodiment 1 of the present invention or the controllable text generation method of embodiment 2 by running a computer program stored in the memory 52.
The electronic device 50 may also communicate with one or more external devices 54 (e.g., keyboard, pointing device, etc.). Such communication may occur through an input/output (I/O) interface 55. Also, model-generating device 50 may also communicate with one or more networks such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the internet via network adapter 56. As shown, the network adapter 56 communicates with other modules of the model-generating device 50 via the bus 53. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in connection with the model-generating device 50, including, but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID (disk array) systems, tape drives, data backup storage systems, and the like.
It should be noted that although several units/modules or sub-units/modules of an electronic device are mentioned in the above detailed description, such a division is merely exemplary and not mandatory. Indeed, the features and functionality of two or more units/modules described above may be embodied in one unit/module in accordance with embodiments of the present invention. Conversely, the features and functions of one unit/module described above may be further divided into ones that are embodied by a plurality of units/modules.
Example 6
The present invention also provides a computer readable medium having stored thereon a computer program which, when executed by a processor, implements the model training method of the foregoing embodiment 1 or the controllable text generation method of embodiment 2.
More specifically, among others, readable storage media may be employed including, but not limited to: portable disk, hard disk, random access memory, read only memory, erasable programmable read only memory, optical storage device, magnetic storage device, or any suitable combination of the foregoing.
In a possible embodiment, the invention may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the model training method of embodiment 1 or the controllable text generation method of embodiment 2 when said program product is run on the terminal device.
Wherein the program code for carrying out the invention may be written in any combination of one or more programming languages, which program code may execute entirely on the user device, partly on the user device, as a stand-alone software package, partly on the user device and partly on the remote device or entirely on the remote device.
While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that this is by way of example only, and the scope of the invention is defined by the appended claims. Various changes and modifications to these embodiments may be made by those skilled in the art without departing from the principles and spirit of the invention, but such changes and modifications fall within the scope of the invention.

Claims (9)

1. A model training method, characterized in that the model training method comprises:
acquiring a plurality of first training samples; each first training sample comprises a first prompt word sequence;
training a controllable text generation model by taking the first training sample as input so as to determine parameters of a prompt sub-model and an attention sub-model in the controllable text generation model; the controllable text generation model comprises the prompt sub-model, the attention sub-model and a pre-trained text generation sub-model; the text generation sub-model is a causal language model and is obtained by pre-training with a pre-training text as input;
in each training, the output of the controllable text generation model is a controllable text; the content of the controllable text corresponds to the order and the content of the prompt words in the first prompt word sequence;
each training process comprises the same number of time steps as the number of prompt words in the first prompt word sequence;
in each time step, the prompt sub-model takes the first prompt word sequence as input and takes the prompt word hidden state of the current time step as output; the attention sub-model takes the prompt word hidden states of all previous time steps, the prompt word hidden state of the current time step and the text hidden states of all previous time steps as input, and takes the attention text matrix of the current time step as output; the text generation sub-model generates the text hidden state of the current time step based on the attention text matrix of the current time step; in the first time step, the attention sub-model takes the prompt word hidden state of the first time step and a preset text hidden state as input.
2. The model training method of claim 1, wherein
in each time step, the attention sub-model is used to:
generating a prompt word matrix of the current time step from the prompt word hidden states of all previous time steps and the prompt word hidden state of the current time step according to a self-attention mechanism; the prompt word matrix comprises a prompt word query matrix, a prompt word key matrix and a prompt word value matrix;
generating a prompt word self-attention matrix of the current time step from the prompt word matrix of the current time step according to the self-attention mechanism;
generating a first text matrix of the current time step from the text hidden states of all previous time steps according to the self-attention mechanism; the first text matrix comprises a first text query matrix, a first text key matrix and a first text value matrix;
generating a text self-attention matrix of the current time step from the first text matrix of the current time step according to the self-attention mechanism;
generating a second text matrix of the current time step from the text hidden state of the previous time step; the second text matrix comprises a second text key matrix and a second text value matrix;
generating a prompt word attention matrix of the current time step from the prompt word query matrix of the current time step and the second text matrix of the current time step according to an attention mechanism;
and adding the prompt word self-attention matrix of the current time step, the text self-attention matrix of the current time step and the prompt word attention matrix of the current time step to obtain the attention text matrix of the current time step.
3. The model training method of claim 2, wherein the controllable text generation model further comprises a position sub-model;
after the step of training the controllable text generation model by taking the first training sample as input to determine the parameters of the prompt sub-model and the attention sub-model in the controllable text generation model, the model training method further comprises:
setting parameters of the prompt sub-model, the attention sub-model and the text generation sub-model to be fixed;
acquiring a plurality of second training samples; each second training sample comprises a second prompt word sequence;
training the controllable text generation model by taking the second training sample as input so as to determine parameters of the position sub-model;
in each time step, the position sub-model takes the attention text matrix of the current time step as input and takes the position text matrix of the current time step as output; the text generation sub-model takes a position text matrix of the current time step as input.
4. The model training method of claim 3, wherein after the step of training the controllable text generation model by taking the second training sample as input to determine the parameters of the position sub-model, the model training method further comprises:
setting the parameters of the position sub-model and the text generation sub-model to be fixed;
acquiring third training data;
and training the controllable text generation model by taking the third training data as input so as to fine-tune the parameters of the prompt sub-model and the attention sub-model.
5. A method for generating controllable text, the method comprising:
acquiring a prompt word sequence to be processed;
taking the prompt word sequence to be processed as input of a controllable text generation model to obtain a controllable text; the controllable text generation model is trained according to the model training method of any one of claims 1-4.
6. A model training system, the model training system comprising:
the first training sample acquisition module is used for acquiring a plurality of first training samples; each first training sample comprises a first prompt word sequence;
the training module is used for training a controllable text generation model by taking the first training sample as input so as to determine parameters of a prompt sub-model and an attention sub-model in the controllable text generation model; the controllable text generation model comprises the prompt sub-model, the attention sub-model and a pre-trained text generation sub-model; the text generation sub-model is a causal language model and is obtained by pre-training with a pre-training text as input;
in each training, the output of the controllable text generation model is a controllable text; the content of the controllable text corresponds to the order and the content of the prompt words in the first prompt word sequence;
each training process comprises the same number of time steps as the number of prompt words in the first prompt word sequence;
in each time step, the prompt sub-model takes the first prompt word sequence as input and takes the prompt word hidden state of the current time step as output; the attention sub-model takes the prompt word hidden states of all previous time steps, the prompt word hidden state of the current time step and the text hidden states of all previous time steps as input, and takes the attention text matrix of the current time step as output; the text generation sub-model generates the text hidden state of the current time step based on the attention text matrix of the current time step; in the first time step, the attention sub-model takes the prompt word hidden state of the first time step and a preset text hidden state as input.
7. A system for generating controllable text, the system comprising:
the acquisition module is used for acquiring a prompt word sequence to be processed;
the controllable text generation module is used for taking the prompt word sequence to be processed as input of a controllable text generation model to obtain a controllable text; the controllable text generation model is trained by the model training system of claim 6.
8. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the model training method of any one of claims 1 to 4 or the controllable text generation method of claim 5 when executing the computer program.
9. A computer readable storage medium having stored thereon a computer program, which, when executed by a processor, implements the model training method of any one of claims 1 to 4 or the controllable text generation method of claim 5.
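As an editorial illustration only (not part of the claims), the staged parameter schedule of claims 1, 3 and 4 and the inference step of claim 5 might be organised as follows, reusing the hypothetical ControllableTextGenerator from the sketch at the end of the description. The names set_trainable, train_stage, first_samples, second_samples, third_data, loss_fn and position_sub_model are assumptions introduced for this sketch; a position sub-model is only looked up if the model defines one.

```python
import torch


def set_trainable(model, prompt=False, attention=False, position=False, text=False):
    # Freeze or unfreeze each sub-model for the current training stage.
    modules = [(prompt, model.prompt_sub_model),
               (attention, model.attention_sub_model),
               (position, getattr(model, "position_sub_model", None)),
               (text, model.text_sub_model)]
    for flag, module in modules:
        if module is not None:
            for p in module.parameters():
                p.requires_grad_(flag)


def train_stage(model, samples, loss_fn, epochs=1, lr=1e-4):
    # Optimise only the parameters left trainable by set_trainable().
    opt = torch.optim.Adam([p for p in model.parameters() if p.requires_grad], lr=lr)
    for _ in range(epochs):
        for prompt_ids, target in samples:
            opt.zero_grad()
            loss = loss_fn(model(prompt_ids), target)
            loss.backward()
            opt.step()


# Stage 1 (claim 1): train the prompt and attention sub-models; the pre-trained
# text generation sub-model stays fixed.
#   set_trainable(model, prompt=True, attention=True); train_stage(model, first_samples, loss_fn)
# Stage 2 (claim 3): fix the prompt, attention and text sub-models; train the position sub-model.
#   set_trainable(model, position=True); train_stage(model, second_samples, loss_fn)
# Stage 3 (claim 4): fix the position and text sub-models; fine-tune prompt and attention.
#   set_trainable(model, prompt=True, attention=True); train_stage(model, third_data, loss_fn)
# Inference (claim 5): feed a prompt word sequence to be processed to the trained model.
#   controllable_text = model(prompt_ids_to_process)
```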
CN202310354856.8A 2023-04-03 2023-04-03 Model training method, controllable text generation method, system, equipment and medium Active CN116383652B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310354856.8A CN116383652B (en) 2023-04-03 2023-04-03 Model training method, controllable text generation method, system, equipment and medium

Publications (2)

Publication Number Publication Date
CN116383652A true CN116383652A (en) 2023-07-04
CN116383652B CN116383652B (en) 2024-02-06

Family

ID=86974633

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310354856.8A Active CN116383652B (en) 2023-04-03 2023-04-03 Model training method, controllable text generation method, system, equipment and medium

Country Status (1)

Country Link
CN (1) CN116383652B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109871532A (en) * 2019-01-04 2019-06-11 平安科技(深圳)有限公司 Text subject extracting method, device and storage medium
CN112765333A (en) * 2021-01-08 2021-05-07 山东师范大学 Automatic dialogue generation method and system based on emotion and prompt word combination
CN113962315A (en) * 2021-10-28 2022-01-21 北京百度网讯科技有限公司 Model pre-training method, device, equipment, storage medium and program product
CN114297382A (en) * 2021-12-28 2022-04-08 杭州电子科技大学 Controllable text generation method based on parameter fine adjustment of generative pre-training model
US20220391587A1 (en) * 2021-08-20 2022-12-08 Beijing Baidu Netcom Science Technology Co., Ltd. Method of training image-text retrieval model, method of multimodal image retrieval, electronic device and medium



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant