CN115905852A - Story generation method, system, storage medium and terminal based on pre-training prompt

Story generation method, system, storage medium and terminal based on pre-training prompt

Info

Publication number
CN115905852A
CN115905852A (application CN202210818147.6A)
Authority
CN
China
Prior art keywords
story
model
training
answer
question
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210818147.6A
Other languages
Chinese (zh)
Inventor
倪宣凡 (Ni Xuanfan)
李丕绩 (Li Piji)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202210818147.6A priority Critical patent/CN115905852A/en
Publication of CN115905852A publication Critical patent/CN115905852A/en
Pending legal-status Critical Current

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses a story generation method based on pre-training prompts, which comprises the following steps: a story beginning containing several sentences is input to a first pre-trained model, which generates an event inference corresponding to each sentence; the events are filled into different question templates according to their types to obtain questions about the story so far; the questions are answered with a question-answering model to obtain an answer set; for each answer in the set, a second pre-trained model computes its perplexity, and the answer with the smallest score is selected as the story continuation. To address the shortage of training data faced by deep neural network models, the method constructs high-quality prompt templates that fully exploit the potential of pre-trained models, helping them recall the knowledge learned during pre-training so as to better complete the downstream task.

Description

Story generation method, system, storage medium and terminal based on pre-training prompt
Technical Field
The invention relates to a story generation method, system, storage medium and terminal based on pre-training prompts, and belongs to the field of natural language processing within computer science.
Background
Open-ended story generation is a classic task in natural language processing, and ensuring the consistency, continuity and logical coherence of the generated story has always been a very challenging problem. With the development of deep neural networks, models with more parameters and more complex structures are continuously proposed and applied to story generation. However, such models usually require large amounts of data and training time to achieve good results, which is difficult to satisfy in many scenarios. Research has therefore turned to reusing models that have already been trained on large-scale datasets for different tasks. The common practice in the prior art is to fine-tune a pre-trained model: part of the model structure is changed and the model is retrained to perform the downstream task. However, a full copy of the model parameters must then be saved for each task, and in most cases this cost is considerable.
Recently, research on prompting has made substantial progress. A large body of work shows that prompting achieves few-shot and zero-shot performance that ordinary fine-tuned models cannot match, and that it fits naturally with the characteristics of pre-trained language models. Prompt learning is based on a language model that directly models the probability of text. To perform a task with such a pre-trained model, the original input is rewritten with a prompt template into a text string containing unfilled slots; the language model then fills the slots probabilistically, and the final output is derived from the completed string.
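As an illustration of this slot-filling mechanism, the following minimal sketch wraps an input sentence in a cloze-style template and lets a masked language model fill the slot. The Huggingface transformers library, the roberta-base checkpoint and the template wording are illustrative assumptions and are not specified by the invention.

```python
# Minimal cloze-style prompting sketch; model and template are illustrative.
from transformers import pipeline

fill_mask = pipeline("fill-mask", model="roberta-base")

story_sentence = "Jenny studied all night for her exam."
# Rewrite the input into a text string with one unfilled slot.
prompt = f"{story_sentence} To do this, she first needed to <mask>."

# The language model fills the slot probabilistically; the completed string
# (here, the top candidates) yields the final output.
for candidate in fill_mask(prompt, top_k=3):
    print(candidate["token_str"], round(candidate["score"], 3))
```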
Disclosure of Invention
The technical problems to be solved by the invention are as follows:
The invention aims to overcome the following difficulties in story generation and to produce stories with fluent sentences, correct logic and reasonable plots:
Readability: the most basic requirement is that the generated story can be read, understood and extended by others; it must not be a pile of garbled characters, nor may the output switch languages, for example producing English when a Chinese story is requested.
Logicality: the characters behave logically, the story content conforms to the logic of human society, and the story follows common-sense logical rules.
Consistency: characters, settings, events and so on remain consistent across the context.
Continuity: the narrative remains continuous; the preceding text and the following text must not tell two unrelated stories.
To solve the above technical problems, the invention adopts the following technical scheme. A story generation method based on pre-training prompts comprises the following steps:
1) A story beginning containing a plurality of sentences is input to a first pre-trained model, which generates an event inference corresponding to each sentence;
2) The events are filled into different question templates according to their types to obtain questions about the story so far;
3) The questions are answered with a question-answering model to obtain an answer set;
4) For each answer in the answer set, a second pre-trained model computes its perplexity, and the answer with the smallest score is selected as the story continuation.
Preferably, the first pre-trained model is a Para-COMET pre-trained model.
Preferably, step 2) is implemented as follows:
2.1) Different linking templates and question templates are constructed according to the type of each event;
2.2) Each event inference obtained in step 1) is filled into the corresponding linking template, the story beginning is spliced before it, and the result is input to a RoBERTa model to generate a sentence; the generated sentences are appended to the story beginning and serve as the input of the next round of the RoBERTa model, so that the character corresponding to each event inference is obtained, yielding an <event, character> pair;
2.3) The event and the character are filled into the question template to obtain the final question.
Preferably, step 3) is implemented as follows:
the sentences generated in each round of the RoBERTa model are spliced after the story beginning in turn to serve as the document, and the document and the questions are input to an ELI5QA model to generate a candidate answer set; alternatively, the questions and the document are spliced into the training-data format and input to a BART model to generate the candidate answer set.
Preferably, step 4) is implemented as follows: the answer sets of all questions are merged, the perplexity of each element (answer) in the merged set is calculated with a second pre-trained model, and the element with the smallest perplexity is selected as the story continuation, wherein the second pre-trained model is a GPT2 model.
Preferably, after the ELI5QA model generates an answer, the answer is cleaned to a certain extent: a number of prohibited phrases are collected to form a set; when the answer is a single sentence, the sentence is added directly to the candidate answer set; when the answer contains several sentences, any sentence containing an element of the prohibited-phrase set is discarded together with all sentences after it; if the length of the first sentence is less than 6, its content is also discarded; the remaining sentences are added to the candidates, yielding a candidate answer set for each question.
Preferably, when the BART model generates the candidate answer set, each question and its document are spliced into the form Question--T--Document, and the spliced content is used as the value of the corresponding key "text"; all sentences after the current sentence are used as the value of the corresponding key, and together with the "text" key-value pair they form a dictionary that is stored in a jsonlines file, yielding a training set and a validation set on which BART is trained; at generation time, the question and the document are spliced into the same format as the training data, a top-k sampling decoding algorithm is selected with a chosen value of k, and the generated results are cleaned with the same strategy as for the ELI5QA model, yielding a candidate answer set for each question.
A story generation system based on pre-training prompts comprises the following modules:
1) A common sense reasoning module, which uses a Para-COMET pre-trained model to generate the inference events corresponding to each sentence of the input story beginning;
2) A question generation module, which fills the inference events into different question templates according to their types to obtain multi-angle guiding questions about the story so far;
3) An answer generation module, which answers the questions with an ELI5QA model or a BART model and generates a candidate set of story continuations;
4) A score selection module, which computes a perplexity score for each element (answer) in the candidate set with a GPT2 model and selects the element with the smallest score as the next sentence.
A storage medium having stored thereon a computer program which, when executed by a processor, implements the above story generation method based on pre-training prompts.
A terminal, comprising: a processor and a memory;
the memory is used for storing a computer program;
the processor is used for executing the computer program stored in the memory so as to enable the terminal to execute the story generation method based on the pre-training prompt.
Compared with the prior art, the above technical scheme has the following beneficial effects:
1) To address the shortage of training data for large-scale deep neural network models, the invention combines the new idea of prompt learning and constructs high-quality prompt templates that fully exploit the potential of pre-trained models, helping them recall the knowledge learned during pre-training so as to complete downstream tasks well;
2) The proposed system makes a preliminary exploration of how pre-trained models with different architectures, targeted at different downstream tasks, can be used in combination: the models are linked together by prompt templates;
3) The proposed system is highly extensible. In the future, when a better-performing pre-trained model appears, it can replace the corresponding model in the system by changing only the template style.
Drawings
Fig. 1 is an overall flow diagram of a proposed story generation system framework.
Detailed Description
The invention is described in further detail below with reference to the accompanying drawings.
The story generation system provided by the invention consists of four modules, each of which uses a different pre-trained language model. The whole system connects the different models through high-quality prompt question templates to perform the story generation task. The overall architecture of the system is shown in Fig. 1. For an input S, the common sense reasoning module generates n possible inference events E_1, E_2, ..., E_n; the question generation module uses these events to construct the corresponding questions Q_1, Q_2, ..., Q_n; the answer generation module answers the questions to obtain the answers A_1, A_2, ..., A_n; and the score selection module selects the answer A' with the smallest perplexity as the story continuation. A' is concatenated with the story beginning to form the input for a new round of generation.
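The loop below is a structural sketch of this four-module flow. Every function name is a hypothetical placeholder supplied by the caller; the invention does not define such an API, and the sentence count is a deliberately crude stand-in.

```python
from typing import Callable, List

def generate_story(
    beginning: str,
    target_sentences: int,
    infer_events: Callable[[str], List[str]],                # common sense reasoning module
    build_questions: Callable[[str, List[str]], List[str]],  # question generation module
    answer_questions: Callable[[str, List[str]], List[str]], # answer generation module
    perplexity: Callable[[str], float],                      # score selection module (GPT2)
) -> str:
    story = beginning
    while story.count(".") < target_sentences:   # crude sentence count, for the sketch only
        events = infer_events(story)                  # E_1, ..., E_n
        questions = build_questions(story, events)    # Q_1, ..., Q_n
        answers = answer_questions(story, questions)  # A_1, ..., A_n
        best = min(answers, key=perplexity)           # A' with the smallest perplexity
        story = story + " " + best                    # concatenate A' for the next round
    return story
```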
(1) Common sense reasoning module
The common sense reasoning module uses the Para-COMET pre-trained model to generate an inference event for each sentence of the input story beginning. Commonsense inference has long been regarded as high-quality external knowledge for assisting text generation. With external knowledge injected, the model is no longer limited to the input text and can generate from a perspective more consistent with human common sense.
The Para-COMET model was proposed to address the limitation that the COMET model is trained only on short-phrase commonsense inference and cannot reason over long passages. It takes implicit commonsense knowledge into account from two aspects: semantic knowledge based on world knowledge and culture-specific social knowledge, and situational knowledge based on causal understanding and cognitive reasoning. It derives commonsense knowledge from the context and assists generation through recurrent memory augmentation.
Building on COMET, Para-COMET takes narrative text as input and outputs the commonsense inferences corresponding to each sentence. These inferences take the form of events and fall into 9 types: xIntent, xNeed, xAttr, xWant, xEffect, xReact, oWant, oEffect and oReact. The first six are selected for subsequent generation.
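For illustration, the per-sentence inferences can be pictured as a mapping from relation type to event text, from which only the six x* relations are kept. The sentence and event strings below are invented examples, not actual Para-COMET output.

```python
# Keep only the six person-x relations used for subsequent generation.
SELECTED_RELATIONS = {"xIntent", "xNeed", "xAttr", "xWant", "xEffect", "xReact"}

inferences = {
    "Jenny studied all night for her exam.": {
        "xIntent": "to pass the exam",
        "xNeed": "to have study material",
        "oReact": "impressed",   # o* relations (about others) are dropped
    },
}

events = [
    (sentence, relation, event)
    for sentence, by_relation in inferences.items()
    for relation, event in by_relation.items()
    if relation in SELECTED_RELATIONS
]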
(2) Question generation module
The events produced by the common sense reasoning module do not name the corresponding character. Therefore, the character most likely related to each event must first be obtained with a pre-trained model, and then the event (Event) and the character (Character) are filled into the question template to generate the final question.
A RoBERTa model provided by Huggingface and fine-tuned on SQuAD (the Stanford Question Answering Dataset) was selected. Taking xNeed and xIntent as examples, the following linking templates were designed:
- Who needs to Event?
- Who wants to Event?
The event is filled into the template, the content generated so far is spliced before the question, and the result is sent to the RoBERTa model for generation. The generated content is filtered and cleaned to obtain the character corresponding to each inference event.
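A sketch of this character-extraction step with an extractive QA pipeline is shown below. The invention only specifies a Huggingface RoBERTa model trained on SQuAD; the concrete checkpoint name, the example story and the linking question are assumptions.

```python
from transformers import pipeline

# Checkpoint name is an assumption; any RoBERTa model fine-tuned on SQuAD would fit.
qa = pipeline("question-answering", model="deepset/roberta-base-squad2")

story_so_far = "Jenny studied all night. She wanted to pass her chemistry exam."
event = "pass the exam"                       # an xNeed-style inference event
linking_question = f"Who needs to {event}?"   # linking template for xNeed

result = qa(question=linking_question, context=story_so_far)
character = result["answer"]                  # e.g. "Jenny", after filtering and cleaning
```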
Once the <event, character> pair is obtained, it is filled into the question template for its event type to generate the corresponding question. Again taking xNeed and xIntent as examples, the following templates were designed:
- What does Character do to need Event?
- Why does Character do Event?
The event and the character are filled into the template to obtain the final question.
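A minimal sketch of this final template-filling step follows; the template strings mirror the xNeed and xIntent examples above, and the helper function itself is illustrative.

```python
# Question templates keyed by relation type (xNeed / xIntent shown above).
QUESTION_TEMPLATES = {
    "xNeed": "What does {character} do to need {event}?",
    "xIntent": "Why does {character} do {event}?",
}

def build_question(relation: str, character: str, event: str) -> str:
    return QUESTION_TEMPLATES[relation].format(character=character, event=event)

question = build_question("xNeed", "Jenny", "pass the exam")
# -> "What does Jenny do to need pass the exam?"
```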
(3) Answer generation module
The questions obtained from the question generation module are input to a pre-trained model to generate the story continuation. Two pre-trained models were selected: ELI5QA and a fine-tuned BART.
The ELI5QA model is a long-form question-answering model trained under the Fairseq-py framework on the ELI5 dataset. ELI5 stands for "Explain Like I'm Five" and was collected from the Reddit community corpus. In this dataset, people give long, easy-to-understand answers to open questions, as if explaining to a five-year-old child.
The inputs to the ELI5QA model are a question (Question) and a document (Document); it generates an answer to the question from the knowledge in the document and the knowledge acquired during training. In this system, to preserve the consistency and continuity of the generated story as much as possible, the story beginning together with the content generated so far is used as the document and input to the model along with the question.
The question templates were constructed to elicit a short, relevant sentence from the ELI5QA model as the story continuation. To ensure this, the top-k sampling algorithm is used for decoding. At each decoding step, the k tokens with the highest probability are taken and the sum of their probabilities is recorded as sum-topk. The probability distribution is then transformed as follows: the probability of each of the k tokens is divided by sum-topk to obtain its new probability, and the probabilities of the remaining tokens become 0. The constant k is a given value: setting k too small tends to produce bland or generic sentences, while a large k lets inappropriate tokens into the candidate set. k = 50 was chosen.
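The redistribution step can be sketched as follows; NumPy is used here only for clarity and is an assumption, not the decoder actually employed.

```python
import numpy as np

def top_k_redistribute(probs: np.ndarray, k: int = 50) -> np.ndarray:
    """Keep the k most probable tokens, renormalize by sum-topk, zero the rest."""
    top_idx = np.argsort(probs)[-k:]           # indices of the k largest probabilities
    sum_topk = probs[top_idx].sum()
    new_probs = np.zeros_like(probs)
    new_probs[top_idx] = probs[top_idx] / sum_topk
    return new_probs

# The next token is then sampled from new_probs instead of probs.
```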
Even with top-k sampling, the generated answers still have some probability of containing meaningless or repetitive sentences, so they must be cleaned to a certain extent. The strategy is as follows: about 100 prohibited phrases are collected to form a set. When the answer is a single sentence, it is added directly to the candidates; when the answer contains several sentences, any sentence containing an element of the prohibited-phrase set is discarded together with all sentences after it. If the length of the first sentence is less than 6, its content is also discarded, and the remaining sentences are added to the candidates. In this way, a candidate answer set is obtained for each question.
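A sketch of this cleaning strategy is given below, assuming the answer has already been split into sentences and the prohibited phrases are available as a set; measuring the first sentence's length in words is an assumption, since the description only says "less than 6".

```python
from typing import List, Set

def clean_answer(sentences: List[str], forbidden: Set[str]) -> List[str]:
    if len(sentences) == 1:
        return sentences                       # a single-sentence answer is kept as-is
    kept = []
    for sent in sentences:
        if any(phrase in sent for phrase in forbidden):
            break                              # drop this sentence and everything after it
        kept.append(sent)
    if kept and len(kept[0].split()) < 6:      # length unit assumed to be words
        kept = kept[1:]                        # discard a too-short first sentence
    return kept
```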
In addition to the ELI5QA model, BART was selected as a pre-trained model for generating answers. However, the original BART model does not perform well on QA tasks, because its training data is not in question-answer form. It is therefore fine-tuned on the ROCStories dataset to improve generation quality. ROCStories is a collection of commonsense short stories, containing 100,000 five-sentence stories. Each story follows a daily theme, and the stories cover a variety of commonsense causal and temporal relations between everyday events. The dataset was randomly split into a training set, a validation set and a test set in a 70%:15%:15% ratio. Each story in the training and validation sets is processed as follows:
1) For sentences one through four of each story, 20 questions are generated with the common sense reasoning module and the question generation module;
2) The current sentence and all sentences before it serve as the document, and the question is spliced with it as follows:
Question--T--Document
The spliced content is used as the value of the corresponding key "text".
3) All sentences after the current sentence are used as the value of the corresponding key ("query"); together with the "text" key-value pair they form a dictionary, which is stored in a jsonlines file.
In this way, a training set and a validation set were obtained (the construction of these records is sketched below); the learning rate was set to 2e-5, the batch size to 16, and BART was trained.
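The record construction in steps 1)–3) can be sketched as follows. The key names follow the description ("text" and "query"); the example story and the helper function are illustrative.

```python
import json

def make_record(question: str, sentences: list, position: int) -> dict:
    document = " ".join(sentences[: position + 1])   # current sentence and everything before it
    spliced = f"{question}--T--{document}"           # Question--T--Document splice
    target = " ".join(sentences[position + 1 :])     # all sentences after the current one
    return {"text": spliced, "query": target}

story = [
    "Jenny had a chemistry exam.",
    "She studied all night.",
    "She felt ready in the morning.",
    "She passed with a high grade.",
    "Jenny celebrated with her friends.",
]
record = make_record("Why does Jenny study all night?", story, position=1)

with open("train.jsonl", "a", encoding="utf-8") as f:   # one JSON object per line
    f.write(json.dumps(record) + "\n")
```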
During generation, the question and document are spliced into the same format as the training data, the top-k sampling decoding algorithm is selected with k = 50, and the generated results are cleaned with the same strategy as for the ELI5QA model, yielding a candidate answer set for each question.
(4) Score selection module
After the candidate answer sets for all questions are obtained, they are merged, the perplexity of each element in the merged set is computed, and the element with the smallest perplexity is selected as the story continuation.
A language model can be used to compute the probability of a sentence. Given a sentence (word sequence) S = W_1, W_2, ..., W_K, the probability that the model generates this sentence is:
P(S) = P(W_1, W_2, ..., W_K) = P(W_1) P(W_2 | W_1) ... P(W_K | W_1, W_2, ..., W_{K-1})
A language model that assigns higher probability to the sentences of the test set is better. Perplexity is a measure that quantifies model quality; it is the inverse sentence probability normalized by the sentence length:
PPL(S) = P(W_1, W_2, ..., W_K)^(-1/K)
By this formula, the lower the perplexity a language model assigns to a sentence, the better that sentence is scored.
GPT2 is used as the reference model to compute sentence probabilities. For a given sentence, the token sequence shifted left by one position serves as the labels, the sequence with its last token removed serves as the input, and the cross-entropy loss is computed between the model output and the labels. Raising 2 to the power of this loss gives the perplexity score of the sentence, but since only the final ranking matters, the cross-entropy losses can be compared directly: the smaller the loss, the smaller the perplexity score and the better the sentence.
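The scoring step can be sketched with the Huggingface transformers implementation of GPT-2, which shifts the labels internally and returns the cross-entropy loss directly. The library, the "gpt2" checkpoint and the candidate sentences are assumptions; the invention only specifies GPT2 as the reference model.

```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

def loss_score(sentence: str) -> float:
    ids = tokenizer(sentence, return_tensors="pt").input_ids
    with torch.no_grad():
        # Passing labels=ids makes the model shift them by one position
        # internally and return the next-token cross-entropy loss.
        out = model(ids, labels=ids)
    return out.loss.item()

candidates = ["She passed the exam with ease.", "Exam exam exam exam exam."]
best = min(candidates, key=loss_score)   # smallest loss = smallest perplexity
```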
(5) Flow of generation
The story beginning is entered and the system produces the story continuation. If a target number of sentences is set, the system splices the currently generated content onto the story beginning and uses it as the input of the next round, and continues generating until the number of sentences in the story meets the requirement.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and these improvements and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (8)

1. A story generation method based on pre-training prompts, characterized by comprising the following steps:
1) a story beginning containing a plurality of sentences is input to a first pre-trained model, which generates an event inference corresponding to each sentence;
2) the events are filled into different question templates according to their types to obtain questions about the story so far;
3) the questions are answered with a question-answering model to obtain an answer set;
4) for each answer in the answer set, a second pre-trained model computes its perplexity, and the answer with the smallest score is selected as the story continuation.
2. The story generation method based on pre-training prompts according to claim 1, characterized in that the first pre-trained model is a Para-COMET pre-trained model.
3. The story generation method based on pre-training prompts according to claim 1, characterized in that step 2) is implemented as follows:
2.1) different linking templates and question templates are constructed according to the type of each event;
2.2) each event inference obtained in step 1) is filled into the corresponding linking template, the story beginning is spliced before it, and the result is input to a RoBERTa model for generation, so that the character corresponding to each event inference is obtained, yielding an <event, character> pair;
2.3) the event and the character are filled into the question template to obtain the final question.
4. The story generation method based on pre-training prompts according to claim 3, characterized in that step 3) is implemented as follows:
the story beginning is used as the document and input together with the questions to an ELI5QA model to generate a candidate answer set; or the questions and the story beginning are spliced and input to a BART model to generate the candidate answer set.
5. The story generation method based on pre-training prompts according to claim 4, characterized in that step 4) is implemented as follows: the answer sets of all questions are merged, the perplexity of each element (answer) in the merged set is calculated with a second pre-trained model, and the element with the smallest perplexity is selected as the story continuation, wherein the second pre-trained model is a GPT2 model.
6. A story generation system based on pre-training prompts, characterized by comprising the following modules:
1) a common sense reasoning module, which uses a Para-COMET pre-trained model to generate the inference events corresponding to each sentence of the input story beginning;
2) a question generation module, which fills the inference events into different question templates according to their types to obtain multi-angle guiding questions about the story so far;
3) an answer generation module, which answers the questions with an ELI5QA model or a BART model and generates a candidate set of story continuations;
4) a score selection module, which computes a perplexity score for each element (answer) in the candidate set with a GPT2 model and selects the element with the smallest score as the next sentence.
7. A storage medium having stored thereon a computer program which, when executed by a processor, implements the story generation method based on pre-training prompts described above.
8. A terminal, comprising: a processor and a memory;
the memory is used for storing a computer program;
the processor is used for executing the computer program stored in the memory so as to enable the terminal to execute the story generating method based on the pre-training prompt.
CN202210818147.6A 2022-07-12 2022-07-12 Story generation method, system, storage medium and terminal based on pre-training prompt Pending CN115905852A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210818147.6A CN115905852A (en) 2022-07-12 2022-07-12 Story generation method, system, storage medium and terminal based on pre-training prompt

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210818147.6A CN115905852A (en) 2022-07-12 2022-07-12 Story generation method, system, storage medium and terminal based on pre-training prompt

Publications (1)

Publication Number Publication Date
CN115905852A true CN115905852A (en) 2023-04-04

Family

ID=86469691

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210818147.6A Pending CN115905852A (en) 2022-07-12 2022-07-12 Story generation method, system, storage medium and terminal based on pre-training prompt

Country Status (1)

Country Link
CN (1) CN115905852A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116561286A (en) * 2023-07-06 2023-08-08 杭州华鲤智能科技有限公司 Dialogue method and device
CN116561286B (en) * 2023-07-06 2023-10-27 杭州华鲤智能科技有限公司 Dialogue method and device


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination