WO2021217935A1 - Method for training question generation model, question generation method, and related device - Google Patents

Method for training question generation model, question generation method, and related device

Info

Publication number
WO2021217935A1
WO2021217935A1 PCT/CN2020/105777 CN2020105777W
Authority
WO
WIPO (PCT)
Prior art keywords
model
text
question
training
answer
Prior art date
Application number
PCT/CN2020/105777
Other languages
French (fr)
Chinese (zh)
Inventor
曹辰捷
徐国强
Original Assignee
深圳壹账通智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳壹账通智能科技有限公司
Publication of WO2021217935A1 publication Critical patent/WO2021217935A1/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 - Querying
    • G06F 16/332 - Query formulation
    • G06F 16/3329 - Natural language query formulation or dialogue systems
    • G06F 16/90 - Details of database functions independent of the retrieved data types
    • G06F 16/95 - Retrieval from the web
    • G06F 16/951 - Indexing; Web crawling techniques
    • G06F 18/00 - Pattern recognition
    • G06F 18/20 - Analysing
    • G06F 18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 40/00 - Handling natural language data
    • G06F 40/20 - Natural language analysis
    • G06F 40/279 - Recognition of textual entities
    • G06F 40/289 - Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/295 - Named entity recognition

Definitions

  • This application relates to the field of artificial intelligence technology, and in particular to a training method of a question generation model, a question generation method and related equipment.
  • Question generation involves machine learning and natural language processing in the field of artificial intelligence, and also relates to smart life in the field of smart cities.
  • Question generation, which studies how to generate natural-language questions, is an important problem in the field of natural language processing.
  • Question generation has a wide range of applications.
  • Machine knowledge bases can use active questioning to build or supplement knowledge bases and expand data sets; in the field of education, question generation can help students ask questions; in the field of dialogue, question generation can start a topic as a cold start, or obtain feedback by asking questions; the application scenarios are very rich.
  • The inventor realizes that existing question generation techniques are usually based on known grammatical rules, using syntax trees to generate questions and filling existing templates with entities from a knowledge base.
  • Such techniques have poor migration capability.
  • A large amount of prior expert knowledge is required for construction or migration; another technique uses deep learning models to generate questions based on pre-labeled answers.
  • This technique requires manual labeling of large amounts of data in advance, which is time-consuming and labor-intensive, and most of the labeled texts are short, which affects question generation. It can be seen that existing question generation techniques have poor question generation performance.
  • An embodiment of the present application provides a method for training a question generation model, which adopts the following technical solution:
  • adjusting the mask matrix during pre-training so that the network in the initial model implements a one-way model, a two-way model, and a sequence-to-sequence model;
  • adjusting the pre-trained language model according to the prediction error until the prediction error satisfies the training stop condition, to obtain a question generation model.
  • An embodiment of the present application further provides a question generation method, including:
  • the question generation model is a model obtained by using any one of the above-mentioned training methods of the question generation model.
  • An embodiment of the present application also provides a training device for a question generation model, including:
  • a model training module, configured to pre-train an initial model to obtain a pre-trained language model, and to adjust the mask matrix during pre-training so that the network in the initial model implements a one-way model, a two-way model, and a sequence-to-sequence model;
  • An information acquisition module for acquiring question and answer information from a web page through a web crawler, where the question and answer information includes question text and answer text;
  • An entity extraction module for extracting key entities related to the question text from the answer text;
  • a model setting module configured to set the network in the pre-training language model to a sequence-to-sequence model to obtain a pre-training language model for Chinese text generation;
  • a text input module configured to input the key entity and the answer text into the pre-training language model to obtain the predicted question text output by the pre-training language model;
  • An error determination module configured to determine a prediction error according to the predicted question text and the question text; and
  • a model adjustment module configured to adjust the pre-trained language model according to the prediction error until the prediction error satisfies the training stop condition, to obtain a question generation model.
  • An embodiment of the present application also provides a computer device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the processor, when executing the computer-readable instructions, implements the following steps:
  • adjusting the mask matrix during pre-training so that the network in the initial model implements a one-way model, a two-way model, and a sequence-to-sequence model;
  • adjusting the pre-trained language model according to the prediction error until the prediction error satisfies the training stop condition, to obtain a question generation model.
  • An embodiment of the present application also provides a computer device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the processor, when executing the computer-readable instructions, implements the steps of the above question generation method.
  • Embodiments of the present application also provide a computer-readable storage medium storing computer-readable instructions, where the computer-readable instructions, when executed by a processor, implement the following steps:
  • adjusting the mask matrix during pre-training so that the network in the initial model implements a one-way model, a two-way model, and a sequence-to-sequence model;
  • adjusting the pre-trained language model according to the prediction error until the prediction error satisfies the training stop condition, to obtain a question generation model.
  • Embodiments of the present application also provide a computer-readable storage medium storing computer-readable instructions, where the computer-readable instructions, when executed by a processor, implement the steps of the above question generation method.
  • The embodiments of the training method of the question generation model of the present application mainly have the following beneficial effects: the network in the initial model implements three language models by adjusting the mask matrix, so that the initial model is comprehensively pre-trained to obtain a pre-trained language model that can both understand and generate natural language; through web crawlers, a large amount of question and answer information can be obtained from web pages for model training.
  • The question and answer information includes question text and answer text, and key entities related to the question text are automatically extracted from the answer text without relying on large-scale manual annotation, which improves the efficiency of obtaining key entities and thereby the efficiency of model training; the network in the pre-trained language model is adjusted to a sequence-to-sequence model, so that the pre-trained language model is oriented to text generation tasks and has good text generation capability; the key entities and the answer text are input into the pre-trained language model to obtain the predicted question text, and the pre-trained language model is adjusted according to the error between the predicted question text and the real question text to obtain the question generation model.
  • The question generation model is obtained by fine-tuning the pre-trained language model according to the downstream task, which ensures the quality of the generated questions and thereby improves question generation performance.
  • Figure 1 is an exemplary system architecture diagram to which the present application can be applied;
  • Fig. 2 is a flowchart of an embodiment of a training method for a question generation model according to the present application
  • FIG. 3 is a flowchart of a specific implementation of step 201 in FIG. 2;
  • FIG. 4 is a flowchart of a specific implementation of step 203 in FIG. 2;
  • FIG. 5 is a flowchart of a specific implementation of step 205 in FIG. 2;
  • Fig. 6 is a flowchart of an embodiment of the question generation method according to the present application.
  • FIG. 7 is a flowchart of a specific implementation of step 302 in FIG. 6;
  • Fig. 8 is a schematic structural diagram of an embodiment of a training device for a question generation model according to the present application.
  • Fig. 9 is a schematic structural diagram of an embodiment of a computer device according to the present application.
  • the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105.
  • the network 104 is used to provide a medium for communication links between the terminal devices 101, 102, 103 and the server 105.
  • the network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, and so on.
  • the user can use the terminal devices 101, 102, and 103 to interact with the server 105 through the network 104 to receive or send messages and so on.
  • Various communication client applications can be installed on the terminal devices 101, 102, 103.
  • The terminal devices 101, 102, and 103 may be various electronic devices that have display screens and support web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and the like.
  • the server 105 may be a server that provides various services.
  • The training method of the question generation model provided by the embodiments of the present application is generally executed by the server, and accordingly, the training device of the question generation model is generally provided in the server.
  • terminal devices, networks, and servers in FIG. 1 are merely illustrative. There can be any number of terminal devices, networks, and servers according to implementation needs.
  • The training method of the question generation model includes the following steps:
  • Step 201 Pre-train the initial model to obtain a pre-trained language model, and adjust the mask matrix in the pre-training to implement a one-way model, a two-way model, and a sequence-to-sequence model for the network in the initial model.
  • The electronic device (for example, the server shown in FIG. 1) on which the training method of the question generation model runs can communicate with the terminal through various wired or wireless connection methods.
  • the initial model may be a model that has not been pre-trained.
  • the mask matrix can be the mask matrix of the network in the initial model, which is used to control the context information used in training;
  • The one-way model is a one-way LM, the two-way model is a two-way LM, and the sequence-to-sequence model is a seq2seq LM.
  • the server first obtains the pre-built initial model, and pre-trains the initial model.
  • The server sets the initial model to three different language models by adjusting the mask matrix of the network in the initial model, including the one-way model, the two-way model, and the sequence-to-sequence model, so as to enrich the pre-training objectives; in this way, a pre-trained language model that can both understand natural language and generate natural language is obtained.
  • Step 202 Obtain question and answer information from a web page through a web crawler, where the question and answer information includes question text and answer text.
  • the user can configure the web crawler at the terminal, and the terminal generates an information acquisition instruction according to the crawler configuration information input by the user, and sends the information acquisition instruction to the server.
  • the configured web crawler is used to crawl information from the World Wide Web.
  • the crawler configuration information may include the URL of the page, the storage address of the information, and so on.
  • After the server receives the information acquisition instruction, it extracts the crawler configuration information from the instruction and generates a web crawler according to the crawler configuration information.
  • the server runs the generated web crawler, the web crawler crawls the question and answer information from the web page, and the server saves the question and answer information crawled by the web crawler into the database.
  • the question and answer information may be composed of question text and answer text corresponding to the question text.
  • the web crawler may be a Scrapy-based web crawler.
  • Scrapy is a fast, high-level screen scraping and web crawling framework developed in Python, used to crawl web sites and extract structured data from pages.
  • A Scrapy-based web crawler can crawl a large amount of question and answer information from public question and answer community websites such as Zhihu and Baidu Zhidao, and store the crawled question and answer information as JSON files in the server's database.
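  • As an illustration of this step, the following is a minimal Scrapy spider sketch for collecting question and answer pairs; the start URL and CSS selectors are placeholders that would have to be adapted to the actual question and answer community pages being crawled.

```python
# Minimal Scrapy spider sketch: crawl question pages and emit one item per
# question, containing the question text and the list of sub-answer texts.
import scrapy


class QASpider(scrapy.Spider):
    name = "qa_spider"
    start_urls = ["https://example.com/questions"]  # hypothetical listing page

    def parse(self, response):
        # Follow each question link found on the listing page.
        for href in response.css("a.question-link::attr(href)").getall():
            yield response.follow(href, callback=self.parse_question)

    def parse_question(self, response):
        # One question may have several answers; each becomes a sub-answer text.
        yield {
            "question_text": response.css("h1.question-title::text").get(default="").strip(),
            "answer_texts": [
                " ".join(ans.css("::text").getall()).strip()
                for ans in response.css("div.answer-content")
            ],
        }
```

  • Running the spider with `scrapy runspider qa_spider.py -o qa_info.json` stores the crawled question and answer information as a JSON file, matching the storage format described above.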
  • A question on a web page has at least one answer, and at least one sub-answer text is obtained after crawling the at least one answer; the at least one sub-answer text corresponding to one question text constitutes the answer text corresponding to that question text.
  • The step of obtaining question and answer information from a web page through a web crawler specifically includes: receiving a target text; splitting the target text to obtain several sentences; generating the same number of web crawlers as the number of sentences; embedding the sentences into the web crawlers respectively; and running each web crawler to obtain the question and answer information that each web crawler crawls from web pages according to its embedded sentence.
  • the target text may be text that instructs the web crawler to crawl the question and answer information.
  • the server receives the target text sent by the user through the terminal, and performs sentence-level disassembly of the target text according to punctuation to obtain several sentences.
  • the server generates the same number of web crawlers as the sentences obtained by the split, and embeds the sentences obtained by the split into the code layer of each web crawler.
  • The server runs the web crawlers after the sentences are embedded, and each web crawler crawls the question and answer information related to its embedded sentence from web pages through columnar crawling.
  • After the target text is received, it is split into several sentences, and the sentences are embedded into different web crawlers; after the web crawlers are run, question and answer information related to the embedded sentences can be crawled.
  • Step 203 Extract key entities related to the question text from the answer text.
  • the key entity can be an entity in the answer text, and the key entity is related to the question text.
  • The server performs word segmentation on the question text and the answer text respectively, obtaining multiple entities from each.
  • The server recognizes the part of speech of each entity and filters the entities with preset parts of speech, which may be verbs and nouns.
  • The server performs exact matching and fuzzy matching between the entities selected from the question text and those in the answer text, and uses the matched entities in the answer text as the key entities.
  • the answer text includes at least one sub-answer text; the server respectively extracts key entities related to the question text from the sub-answer texts, and associates the sub-answer texts with the key entities extracted from the sub-answer texts.
  • Before the step of extracting key entities related to the question text from the answer text, the method further includes: matching the question and answer information through regular expressions to obtain character strings to be cleaned; and deleting the matched character strings to be cleaned, so as to perform data cleaning on the question and answer information.
  • A character string to be cleaned may be a meaningless character string in the question and answer information.
  • The server matches the question and answer information against preset regular expressions to obtain the character strings to be cleaned in the question and answer information, and deletes the matched character strings to clean the question and answer information.
  • Regular expressions are pre-configured, and a regular expression can correspond to a meaningless string.
  • When the Q&A information is crawled from Zhihu, it may include hyperlinks, dividing lines, and invalid characters, as well as content unrelated to the main body of the text, such as the "Source:" and "Author:" headers of Zhihu columns.
  • When the question and answer information is crawled from Baidu, it may include a large number of meaningless characters.
  • the server can delete the above meaningless content through regular expressions.
  • the question and answer information is matched by a regular expression to obtain the character string to be cleaned, and the matched character string to be cleaned is deleted, so as to realize the data cleaning of the question and answer information and increase the proportion of effective content in the question and answer information.
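  • The following is a sketch of this cleaning step; each pattern targets one kind of meaningless string, and the exact patterns are illustrative assumptions rather than the ones used in the application.

```python
# Regular-expression cleaning sketch: delete hyperlinks, dividing lines,
# "Source:" / "Author:" headers and invalid characters from Q&A text.
import re

CLEANING_PATTERNS = [
    re.compile(r"https?://\S+"),           # hyperlinks
    re.compile(r"[-=]{3,}"),               # dividing lines
    re.compile(r"(来源|作者)[:：]\S*"),     # "Source:" / "Author:" column headers
    re.compile(r"[\u200b\ufeff\xa0]+"),    # zero-width and other invalid characters
]


def clean_qa_text(text: str) -> str:
    """Delete every matched character string to be cleaned from a piece of Q&A text."""
    for pattern in CLEANING_PATTERNS:
        text = pattern.sub("", text)
    return re.sub(r"\s{2,}", " ", text).strip()


if __name__ == "__main__":
    raw = "作者：xxx 复旦大学位于上海。 https://example.com ----"
    print(clean_qa_text(raw))  # -> 复旦大学位于上海。
```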
  • Step 204 Set the network in the pre-training language model to a sequence-to-sequence model to obtain a pre-training language model for Chinese text generation.
  • The pre-trained language model (Unified pre-trained Language Model, UNILM) is a model that can handle both natural language understanding and natural language generation.
  • The pre-training of the pre-trained language model adopts three unsupervised language model objectives: the one-way model is a one-way LM (including left-to-right and right-to-left), the two-way model is a two-way LM, and the sequence-to-sequence model is a sequence-to-sequence LM (seq2seq LM), where LM stands for language model.
  • the pre-training language model uses a Transformer network with shared parameters, and also uses specific self-attention masks to control the context information used in prediction.
  • the above three LMs are realized by adjusting the mask matrix in the Transformer network.
  • the pre-training language model can be regarded as a one-way encoder, a two-way encoder or a sequence-to-sequence model.
  • The mask matrix in the Transformer network can be adjusted to adapt to different downstream tasks (natural language understanding tasks and generation tasks).
  • Seq2seq is an Encoder-Decoder structure model with good text generation performance; the input of seq2seq is a sequence, and the output is also a sequence.
  • the Encoder turns a variable-length input sequence into a fixed-length vector, and the Decoder decodes the fixed-length vector into a variable-length output sequence.
  • the server obtains a pre-trained language model, and the pre-trained language model is used for Chinese processing, can be used for natural language understanding, and can also be used for text generation.
  • The pre-trained language model needs to be fine-tuned into a model for question generation, so the mask matrix of the Transformer network in the pre-trained language model is set so as to implement the sequence-to-sequence model, that is, the seq2seq LM.
  • In the mask matrix of the seq2seq LM, the matrix elements for the source segment are all 0, which means that both the preceding and the following information can be used; in the part of the matrix for the target segment, the upper-right elements are negative infinity, which means that only the preceding information can be used.
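  • The sketch below illustrates how such self-attention masks can be built, assuming the common additive convention in which allowed positions are 0 and blocked positions are negative infinity (added to the attention scores before softmax); it illustrates the masking idea rather than the application's actual implementation.

```python
# Attention-mask sketches for the three language-model objectives.
import torch


def seq2seq_attention_mask(src_len: int, tgt_len: int) -> torch.Tensor:
    """Source (answer + key entity) tokens see the whole source; target
    (question) tokens see the source plus earlier target tokens only."""
    total = src_len + tgt_len
    mask = torch.zeros(total, total)
    # Source tokens may not look at the target segment at all.
    mask[:src_len, src_len:] = float("-inf")
    # Upper-right triangle of the target block is masked (no peeking ahead).
    mask[src_len:, src_len:] = torch.triu(
        torch.full((tgt_len, tgt_len), float("-inf")), diagonal=1)
    return mask


def bidirectional_attention_mask(length: int) -> torch.Tensor:
    # Two-way LM: every token sees every other token.
    return torch.zeros(length, length)


def left_to_right_attention_mask(length: int) -> torch.Tensor:
    # One-way LM (left to right): only preceding tokens are visible.
    return torch.triu(torch.full((length, length), float("-inf")), diagonal=1)


print(seq2seq_attention_mask(src_len=3, tgt_len=2))
```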
  • Step 205 Input the key entity and the answer text into the pre-training language model to obtain the predicted question text output by the pre-training language model.
  • the predictive question text may be a question text related to the answer text generated by the pre-training language model according to the key entity and the answer text.
  • the server fine-tunes the pre-training language model according to key entities, question text, and answer text.
  • The pre-trained language model converts the key entities and the answer text into vectors, processes the vectors, and outputs the predicted question text.
  • The pre-trained language model splits the key entities and the answer text into individual characters, converts each character into a vector according to the character conversion table, and processes the vectors.
  • the character conversion table is created in advance, and the correspondence between words and vectors is determined.
  • When converting a character, the server looks the character up in the character conversion table and uses the vector corresponding to the found character as the converted vector.
  • Step 206 Determine the prediction error according to the prediction question text and the question text.
  • the above-mentioned prediction question text can also be stored in a node of a blockchain.
  • the question text in the question and answer information is the target output of the pre-trained language model.
  • the server obtains the prediction question text output by the pre-training language model and the question text in the Q&A information, and calculates the prediction error according to the preset error formula.
  • The calculation formula of the prediction error is the softmax cross-entropy loss:
  • softmaxLoss = -(1/N) * Σ_{i=1..N} log( exp(logits_i[y_i]) / Σ_{j=1..M} exp(logits_i[j]) )
  • where y_i is the identifier of the i-th character of the question text when it is converted into a vector according to the character conversion table; logits_i is the score vector of the i-th character of the predicted question text over the character conversion table (of size M); N is the number of characters in the question text; and softmaxLoss is the prediction error between the predicted question text and the question text.
  • each word can be regarded as a token, and each token has a unique identifier in the character conversion table, that is, the identifier token_id.
  • For example, when the size of the character conversion table is 20000, that is, the character conversion table records the conversion relationship between 20000 characters and vectors, the range of token_id is 0 to 19999.
  • The goal of the pre-trained language model is to obtain the token_id sequence of the predicted question text.
  • the question text contains N (N is a positive integer) words.
  • The pre-trained language model encodes the answer text and the key entities to obtain N hidden states H, where each H corresponds to one character of the predicted question text to be generated.
  • The pre-trained language model calculates the score logits of H against each character in the character conversion table; the score logits can be understood as the similarity between H and each character in the character table, and the character with the highest score is selected as the character corresponding to H.
  • y_i is the identifier token_id of the i-th character of the crawled question text, and logits_i is the score of the i-th character of the predicted question text; the prediction error is obtained by calculating the cross-entropy between them.
  • the prediction error can be accurately measured by the error formula, which ensures that the pre-training language model can be accurately adjusted according to the error.
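  • A minimal sketch of this computation follows, assuming the standard softmax cross-entropy form given above; the tensor shapes and random inputs are illustrative only.

```python
# Prediction-error sketch: cross-entropy between the predicted scores and the
# token_ids of the crawled question text.
import torch
import torch.nn.functional as F

N, M = 8, 20000                          # question length, character table size
logits = torch.randn(N, M)               # logits_i: one score per table entry and character
target_ids = torch.randint(0, M, (N,))   # y_i: token_ids of the real question text

# Equivalent to -(1/N) * sum_i log softmax(logits_i)[y_i]
prediction_error = F.cross_entropy(logits, target_ids)
print(prediction_error.item())
```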
  • Step 207 Adjust the pre-trained language model according to the prediction error until the prediction error meets the training stop condition, to obtain the question generation model.
  • the training stop condition is a condition for stopping model training, and the training stop condition may be that the prediction error is less than a predetermined error threshold.
  • the terminal obtains a predetermined error threshold, and compares the prediction error with the error threshold.
  • the terminal adjusts the model parameters in the pre-training language model in the direction of reducing the prediction error.
  • The key entities and the answer text are reprocessed to obtain the predicted question text, the prediction error is obtained from the predicted question text and the question text, and the prediction error is compared with the error threshold; if the prediction error is still greater than or equal to the error threshold, the model is adjusted again, and this loop iterates until the prediction error is less than the error threshold, at which point training stops and the pre-trained language model at the time training stops is used as the question generation model.
  • During back-propagation, the output of the current layer and the gradient propagated back from the following layer are required, so the output of each layer is normally stored in video memory.
  • For a 24-layer network, the outputs of all 24 layers would need to be saved, which occupies a large amount of video memory; for this reason, only the outputs of some of the layers can be saved.
  • When back-propagation needs to update the model parameters, the output of the current layer can be recomputed from the saved output of a nearby layer, thereby saving video memory resources and reducing the hardware requirements for model training.
  • For example, the Transformer network has 24 layers, and only the outputs of layers 1, 7, 13, 19, and 24 are saved.
  • The outputs of layers 2 to 6 are recomputed from the output of layer 1, the outputs of layers 8 to 12 from the output of layer 7, the outputs of layers 14 to 18 from the output of layer 13, and the outputs of layers 20 to 23 from the output of layer 19.
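  • This memory-saving scheme corresponds to gradient checkpointing; a minimal PyTorch sketch follows, in which a stack of linear layers stands in for the 24 Transformer layers and the four checkpoint segments play the role of the saved anchor layers.

```python
# Gradient-checkpointing sketch: keep only segment-boundary activations and
# recompute the intermediate layer outputs during the backward pass.
import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

layers = nn.Sequential(*[nn.Linear(768, 768) for _ in range(24)])  # stand-in for 24 Transformer layers
x = torch.randn(4, 768, requires_grad=True)

# Splitting the 24 layers into 4 segments keeps roughly one activation per
# segment boundary (comparable to saving layers 1, 7, 13, 19, 24) and
# recomputes the rest when gradients are needed.
out = checkpoint_sequential(layers, 4, x)
out.sum().backward()
print(x.grad.shape)
```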
  • the network in the initial model is adjusted to realize three language models by adjusting the mask matrix, so as to perform all-round pre-training on the initial model to obtain a pre-trained language model that can understand natural language and generate natural language;
  • a large amount of question and answer information can be obtained from web pages for model training.
  • The question and answer information includes question text and answer text, and key entities related to the question text are automatically extracted from the answer text without relying on large-scale manual labeling, which improves the efficiency of obtaining key entities and thereby the efficiency of model training;
  • the network in the pre-training language model is adjusted to a sequence-to-sequence model, so that the pre-training language model is oriented to text generative tasks and has good text generation capabilities;
  • The key entities and the answer text are input into the pre-trained language model to obtain the predicted question text.
  • The pre-trained language model is adjusted according to the error between the predicted question text and the real question text to obtain the question generation model.
  • The question generation model is obtained by fine-tuning the pre-trained language model according to the downstream task, which ensures the quality of the generated questions and thereby improves question generation performance.
  • step 201 specifically includes:
  • Step 2011 Obtain an initial model for pre-training and multiple sets of pre-training samples.
  • the pre-training sample set may be a data set used to train the initial model.
  • the built initial model and multiple sets of pre-training sample sets for pre-training the initial model are pre-stored in the server.
  • the server obtains the initial model and the pre-training sample set, and needs to pre-train the initial model to obtain the pre-trained language model.
  • Step 2012 Randomly generate the mask identifier corresponding to each group of pre-training sample sets, where the mask matrices corresponding to the mask identifiers implement a one-way model, a two-way model, and a sequence-to-sequence model.
  • The mask identifier may be the identifier of the mask matrix of the network in the model.
  • the initial model constructed is a Transformer network, and the Transformer network can be 12 layers or 24 layers.
  • Pre-training uses three unsupervised language model targets: one-way LM (including left-to-right and right-to-left), two-way LM and seq2seq LM.
  • The server randomly generates the mask identifier of each training sample set; the mask identifier corresponds to a mask matrix, and the server sets the Transformer network to a different LM according to that mask matrix. Randomly generating the mask identifier of each group of training sample sets realizes roughly equal pre-training of the different LMs.
  • The model parameters in the initial model are half precision, and before the step of randomly generating the mask identifiers corresponding to each group of pre-training sample sets, the method further includes: setting the model parameters of the layernorm layer and the embedding layer in the initial model to single precision.
  • Half precision, or a half-precision floating-point number (FP16), is a binary floating-point data type used by computers.
  • Half-precision floating-point numbers use 2 bytes (16 bits) for storage; and single-precision floating-point numbers (FP32) occupy 4 bytes (32 bits) of storage space.
  • Model training places high demands on the computer's hardware resources, and the training time is long.
  • Therefore, the model parameters in the initial model are set to half precision;
  • the model parameters of the embedding layer in the initial model are set to single precision;
  • and the model parameters of the layernorm layer in the initial model are set to single precision.
  • Setting the model parameters in the initial model to half precision while keeping the model parameters of the layernorm layer and the embedding layer at single precision improves both the speed and the accuracy of model training.
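  • A sketch of this mixed-precision setup is shown below; the tiny module stack is only a stand-in for the Transformer-based initial model, and the exact layer sizes are illustrative.

```python
# Mixed-precision sketch: cast the model to FP16, then restore the LayerNorm
# and embedding parameters to FP32 for numerical stability.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Embedding(20000, 768),
    nn.Linear(768, 768),
    nn.LayerNorm(768),
).half()  # all parameters become FP16

for module in model.modules():
    if isinstance(module, (nn.LayerNorm, nn.Embedding)):
        module.float()  # back to FP32

for name, param in model.named_parameters():
    print(name, param.dtype)
```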
  • Step 2013 Input each group of pre-training sample sets into the initial model respectively, and adjust the mask matrix of the network in the initial model according to the mask identifier corresponding to that pre-training sample set.
  • the server sequentially inputs the pre-training sample set into the initial model. After inputting a set of pre-training sample sets, the server adjusts the mask matrix of the Transformer network in the initial model according to the mask identifier corresponding to the pre-training sample set, thereby setting the Transformer network to one-way LM, two-way LM or seq2seq LM.
  • Step 2014 Sequentially pre-train the initial model adjusted by the mask matrix according to the input pre-training sample sets to obtain the pre-trained language model.
  • After the server adjusts the mask matrix, it pre-trains the initial model according to the pre-training sample set; when training on one set of pre-training samples is completed, the next set of pre-training samples is input, the mask matrix is adjusted, and the next round of pre-training proceeds. After all the pre-training sample sets have been used for training, the server obtains the pre-trained language model.
  • the Transformer network randomly switches between one-way LM (including left-to-right and right-to-left), two-way LM, and seq2seq LM. Each layer in the Transformer network shares model parameters in multiple rounds of pre-training.
  • The mask identifier of each pre-training sample set is randomly generated, and the mask matrix in the initial model is adjusted according to the mask identifier, so that the initial model can complete the pre-training objectives of the three language models evenly, which ensures the soundness of pre-training.
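  • A small sketch of this random assignment follows; the identifier names are illustrative, and the mask each identifier selects would be built as in the attention-mask sketch above.

```python
# Randomly assign one mask identifier per group of pre-training samples so that
# the one-way, two-way, and seq2seq objectives are used with similar frequency.
import random

MASK_IDS = ["left_to_right", "right_to_left", "bidirectional", "seq2seq"]


def assign_mask_identifiers(num_sample_sets: int, seed: int = 0) -> list:
    rng = random.Random(seed)
    return [rng.choice(MASK_IDS) for _ in range(num_sample_sets)]


print(assign_mask_identifiers(6))  # one identifier per pre-training sample set
```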
  • the above step 203 may include:
  • Step 2031 Extract text entities from the question text and the answer text in the question and answer information respectively.
  • the text entity can be an entity in the question text and the answer text.
  • the server may segment the question text and the answer text to obtain multiple entities.
  • the server can use pkuseg for word segmentation to segment the question text and the answer text in word units.
  • pkuseg is an open source Chinese word segmentation toolkit released by Peking University, which has a high accuracy rate of word segmentation.
  • First, stop words are removed; stop words are words that have no obvious meaning and can be deleted, such as common Chinese function words (for example, "的" and "了"). Then, entities whose parts of speech are verbs and nouns are extracted as text entities.
  • Step 2032 Calculate the similarity between each text entity in the answer text and each text entity in the question text.
  • The text entities in the answer text form the first data set, the text entities in the question text form the second data set, and the server calculates the similarity between each entity in the first data set and each entity in the second data set.
  • the server can calculate the similarity through exact matching and fuzzy matching, and the similarity between text entities that can be accurately matched is 100%.
  • The server can convert the text entities into vectors and calculate the cosine similarity between the vectors; or it can calculate the text edit distance between text entities (also known as the Levenshtein distance, the minimum number of edit operations required to convert one string into another, where the operations are insertion, deletion, and replacement). The shorter the edit distance, the higher the similarity.
  • Step 2033 Extract text entities whose similarity meets a preset similarity threshold from each text entity of the answer text as key entities.
  • Assuming the answer text yields M text entities and the question text yields N text entities, M*N similarities are calculated.
  • the server obtains the preset similarity threshold, and selects the similarity whose similarity value is greater than the similarity threshold from the M*N group of similarities.
  • For each selected similarity, of the two text entities corresponding to it, the text entity from the first data set is used as a key entity.
  • the server may also arrange the M*N groups of similarities in descending order, select a preset number of similarities according to the arrangement order, and use the first data set text entity corresponding to the selected similarity as the key entity.
  • For example, the question text is "What is the approximate ranking of Fudan University in China?", which pkuseg divides into {"Fudan University", "in", "domestic", "of", "ranking", "approximately", "how many", "?"}.
  • The stop words (such as "in" and "of") are removed, the verbs and nouns {"Fudan University", "ranking"} are extracted, and the answer text is processed in the same way.
  • the similarity between "Fudan University” and “Fudan” is calculated to meet the similarity threshold, and "Fudan” is taken as the key entity.
  • the extracted key entities are highly related to the question text and the answer text, which can assist the pre-training language model to output the question text.
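  • The following sketch illustrates the extraction pipeline (segmentation with pkuseg, stop-word removal, part-of-speech filtering, and exact plus fuzzy matching); the stop-word list, similarity threshold, and example texts are illustrative assumptions.

```python
# Key-entity extraction sketch: keep answer-text nouns/verbs that match a
# question-text entity exactly or by a Levenshtein-based fuzzy similarity.
import pkuseg

STOP_WORDS = {"的", "了", "是", "在", "吗"}          # illustrative stop-word list
seg = pkuseg.pkuseg(postag=True)                      # returns (word, part-of-speech) pairs


def text_entities(text):
    return [w for w, pos in seg.cut(text)
            if w not in STOP_WORDS and pos.startswith(("n", "v"))]


def edit_distance(a, b):
    # Levenshtein distance via dynamic programming over a single row.
    dp = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        prev, dp[0] = dp[0], i
        for j, cb in enumerate(b, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1, dp[j - 1] + 1, prev + (ca != cb))
    return dp[-1]


def similarity(a, b):
    if a == b:                                                 # exact match
        return 1.0
    return 1.0 - edit_distance(a, b) / max(len(a), len(b))     # fuzzy match


def key_entities(question, answer, threshold=0.5):
    q_entities, a_entities = text_entities(question), text_entities(answer)
    return [a for a in a_entities
            if any(similarity(a, q) >= threshold for q in q_entities)]


print(key_entities("复旦大学在国内的排名大约是多少?", "复旦在国内综合排名前五。"))
```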
  • the answer text includes at least one sub-answer text.
  • the above step 205 may include:
  • Step 2051 Input at least one sub-answer text and key entities corresponding to the sub-answer text into the pre-training language model to obtain at least one three-dimensional word vector matrix.
  • the answer text corresponding to one question text may be composed of at least one sub-answer text, and each sub-answer text is extracted to obtain a key entity.
  • the server performs batch processing, and at least one sub-answer text corresponding to a question text and key entities corresponding to the sub-answer text are processed as a batch.
  • The server pads the sub-answer texts to the same text length (that is, the same number of characters) by adding zeros, and then converts them into one-hot vectors (also known as "one-hot encoding") according to the character conversion table to obtain a one-hot matrix.
  • The number of sub-answer texts is batch, the padded text length is length, and the number of characters in the character conversion table is M; the three dimensions of the one-hot matrix are, in turn, batch, length, and M, where batch indicates which sub-answer text the one-hot matrix comes from, length is the number of rows of the one-hot matrix, and M is the number of columns of the one-hot matrix.
  • The server needs to convert the one-hot vectors into word vectors: the three-dimensional one-hot matrix is input into the embedding layer of the pre-trained language model, and the M dimension is replaced by the dim dimension to obtain a three-dimensional word vector matrix. Here dim is the feature dimension, which is a uniform constant within a model; for example, dim can be 512, 768, or 1024.
  • Step 2052 Combine the converted three-dimensional word vector matrix into a two-dimensional word vector matrix.
  • the three-dimensional word vector matrices are combined to obtain a larger matrix, that is, the two-dimensional word vector matrix.
  • Merging the matrices eliminates the batch dimension, so that the matrix calculations in the pre-trained language model become operations on a two-dimensional matrix, which improves calculation speed and reduces training time.
  • Step 2053 Process the two-dimensional word vector matrix through the pre-trained language model to obtain the predicted question text output by the pre-trained language model, where the predicted question text is stored in the blockchain.
  • The server processes the two-dimensional word vector matrix through the pre-trained language model to obtain the score logits at each character position of the predicted question text; at each position, the character with the highest score is selected, thereby outputting the predicted question text.
  • the server can also upload the predicted question text to the blockchain for storage to record the training process of the pre-trained language model, while ensuring the privacy and security of the predicted question text.
  • Each sub-answer text and its corresponding key entities are converted into a three-dimensional word vector matrix, the three-dimensional word vector matrices are then merged into a two-dimensional word vector matrix, and the pre-trained language model processes the two-dimensional word vector matrix, which improves the efficiency of outputting the predicted question text.
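  • A shape-level sketch of this batching scheme is given below; the sizes are illustrative, and a linear layer stands in for the embedding layer of the pre-trained language model.

```python
# Batching sketch: (batch, length, M) one-hot matrix -> (batch, length, dim)
# word-vector matrix -> merged (batch*length, dim) two-dimensional matrix.
import torch
import torch.nn.functional as F

batch, length, M, dim = 3, 10, 20000, 768            # sub-answers, padded length, table size, feature dim

token_ids = torch.randint(0, M, (batch, length))      # zero-padded token_id sequences
one_hot = F.one_hot(token_ids, num_classes=M).float() # (batch, length, M)

embedding = torch.nn.Linear(M, dim, bias=False)        # stand-in for the embedding layer
word_vectors = embedding(one_hot)                      # (batch, length, dim)

two_d = word_vectors.reshape(batch * length, dim)      # merged 2-D word-vector matrix
print(one_hot.shape, word_vectors.shape, two_d.shape)
```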
  • a method for generating a question is provided. Taking the method applied to the server in FIG. 1 as an example for description, the method includes the following steps:
  • Step 301 Obtain the source text for question generation.
  • the question generation model generates question text based on the input text.
  • the user sends the source text to the server through the terminal, and the question generation model generates the question text based on the source text.
  • the terminal may also send voice data to the server, and the server converts the voice data into text data through voice recognition to obtain the source text.
  • Step 302 Filter several groups of source entities from the source text.
  • the server performs word segmentation on the source text to obtain multiple entities.
  • the server can randomly screen multiple entities to obtain a group of source entities, and can screen several groups of source entities.
  • the server can also filter several groups of source entities according to the instruction information sent by the terminal.
  • Step 303 Input several groups of source entities into the question generation model respectively; wherein, the question generation model is a model obtained by using the training method of the above question generation model.
  • the server inputs the selected groups of source entities into the question generation model, and the question generation model converts the source entities into vectors in units of characters to perform question generation processing.
  • the question generation model is a model obtained using the training method of the above question generation model.
  • When the server generates the question text, it can generate the question text based on the entire source text, or based on several groups of source entities extracted from the source text.
  • Step 304 Obtain several question texts generated by the question generation model based on several groups of source entities.
  • the question generation model is based on a set of source entities to process and generate a set of question texts.
  • When there are several groups of source entities, the server generates the question texts corresponding to the several groups of source entities.
  • the server sends several generated question texts to the terminal, and the user selects the question texts through the terminal for subsequent use.
  • step 302 may include:
  • Step 3021 Identify text entities in the source text.
  • After receiving the source text, the server performs word segmentation on it to obtain multiple entities, recognizes the part of speech of each entity, and uses the entities that meet the preset parts of speech as text entities.
  • the part of speech of the text entity can include nouns, verbs, adjectives, etc.
  • Step 3022 Randomly extract several groups of text entities from the recognized text entities to obtain several groups of source entities.
  • After the server recognizes the text entities, it randomly selects several groups of text entities, and uses each group of text entities as a group of source entities, thereby obtaining multiple groups of source entities.
  • Step 3023 Perform semantic annotation on the text entities in the source text according to a preset semantic knowledge base to obtain a semantic annotation result.
  • a semantic knowledge base is preset in the server.
  • the server recognizes the semantics of each text entity according to the semantic knowledge base, and performs semantic annotation on each text entity to obtain the semantic annotation result.
  • Step 3024 According to the semantic annotation result, filter several text entities that meet the preset semantic range to obtain several groups of source entities.
  • the semantic information expressed by the text entity can be determined according to the semantic annotation result.
  • the server obtains the preset semantic range, filters several text entities whose semantic information meets the preset semantic range, and obtains several sets of source entities.
  • the preset semantic range may come from the instruction information sent by the terminal.
  • the preset semantic range in the instruction information is set to the financial field, and the server filters the text entities belonging to the financial field to obtain the source entity.
  • the text entities in the source text are recognized, and the text entities are extracted randomly or semantically, so as to ensure the flexibility of text entity extraction, thereby ensuring the flexibility of generating the question text.
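  • A short sketch of the random screening variant follows; the part-of-speech set, group size, and group count are illustrative assumptions, and the semantic-range filter described above would replace the random sampling when the instruction information specifies a domain.

```python
# Source-entity screening sketch: segment the source text, keep entities with
# the preset parts of speech, and randomly sample several groups of them.
import random
import pkuseg

seg = pkuseg.pkuseg(postag=True)


def source_entity_groups(source_text, num_groups=3, group_size=2, seed=0):
    entities = [w for w, pos in seg.cut(source_text)
                if pos.startswith(("n", "v", "a"))]   # nouns, verbs, adjectives
    rng = random.Random(seed)
    return [rng.sample(entities, min(group_size, len(entities)))
            for _ in range(num_groups)]


for group in source_entity_groups("复旦大学位于上海，综合排名位居国内前列。"):
    print(group)   # each group is fed to the question generation model separately
```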
  • this application provides an embodiment of a training device for a question generation model.
  • The device embodiment corresponds to the method embodiment shown in FIG. 2.
  • the device can be applied to various electronic devices.
  • The training device 400 for the question generation model described in this embodiment includes: a model training module 401, an information acquisition module 402, an entity extraction module 403, a model setting module 404, a text input module 405, an error determination module 406, and a model adjustment module 407, in which:
  • the model training module 401 is used to pre-train the initial model to obtain the pre-trained language model, and adjust the mask matrix in the pre-training to realize the one-way model, the two-way model and the sequence-to-sequence model of the network in the initial model.
  • the information obtaining module 402 is configured to obtain question and answer information from a web page through a web crawler, and the question and answer information includes question text and answer text.
  • the entity extraction module 403 is used to extract key entities related to the question text from the answer text.
  • the model setting module 404 is used to set the network in the pre-training language model to a sequence-to-sequence model to obtain a pre-training language model for Chinese text generation.
  • the text input module 405 is used to input key entities and answer text into the pre-training language model to obtain the predicted question text output by the pre-training language model.
  • the error determination module 406 is configured to determine the prediction error according to the prediction question text and the question text.
  • The model adjustment module 407 is configured to adjust the pre-trained language model according to the prediction error until the prediction error meets the training stop condition, to obtain the question generation model.
  • In this embodiment, the network in the pre-trained language model is adjusted to a sequence-to-sequence model, so that the pre-trained language model is oriented to text generation tasks and has good text generation capability; the pre-trained language model is then fine-tuned to obtain the question generation model, which ensures the quality of the generated questions.
  • the above-mentioned model training module 401 includes: an acquisition sub-module, an identity generation sub-module, an input sub-module, and a pre-training sub-module, wherein:
  • the acquisition sub-module is used to acquire the initial model used for pre-training and multiple sets of pre-training samples
  • the identification generation sub-module is used to randomly generate the mask identification corresponding to each group of pre-training sample sets; the mask matrix corresponding to the mask identification realizes one-way model, two-way model and sequence-to-sequence model;
  • the input sub-module is used to input each group of pre-training sample sets into the initial model, and adjust the mask matrix of the network in the initial model according to the mask identifier corresponding to the pre-training sample set;
  • the pre-training sub-module is used to sequentially pre-train the initial model adjusted by the mask matrix according to the input pre-training sample set to obtain the pre-training language model.
  • the model parameters in the initial model are half-precision
  • The above model training module 401 further includes a parameter setting sub-module, which is used to set the model parameters of the layernorm layer and the embedding layer in the initial model to single precision.
  • the entity extraction module 403 is further configured to: extract text entities from the question text and answer text in the question and answer information, respectively; calculate each text entity and question text in the answer text The similarity of each text entity in the answer text; from each text entity of the answer text, extract the text entity whose similarity meets the preset similarity threshold as the key entity.
  • the answer text includes at least one sub-answer text
  • The text input module 405 is further configured to: input at least one sub-answer text and the key entities corresponding to the sub-answer text into the pre-trained language model to obtain at least one three-dimensional word vector matrix; merge the converted three-dimensional word vector matrices into a two-dimensional word vector matrix; and process the two-dimensional word vector matrix through the pre-trained language model to obtain the predicted question text output by the pre-trained language model, where the predicted question text is stored in the blockchain.
  • a question generation device including: a source text acquisition module, a source entity extraction module, a source entity input module, and a question generation module, wherein:
  • the source text obtaining module is used to obtain the source text used for question generation.
  • the source entity extraction module is used to filter several groups of source entities from the source text.
  • the source entity input module is used to input several groups of source entities into the question generation model; wherein, the question generation model is a model obtained by using the training method of the above question generation model.
  • the question generation module is used to obtain several question texts generated by the question generation model based on several groups of source entities.
  • the aforementioned source entity extraction module is further used to: identify text entities in the source text; randomly extract several groups of text entities from the recognized text entities to obtain several groups of source entities; Or, perform semantic annotation on the text entities in the source text according to a preset semantic knowledge base to obtain a semantic annotation result; according to the semantic annotation result, filter several text entities that meet the preset semantic range to obtain several groups of source entities.
  • FIG. 9 is a block diagram of the basic structure of the computer device in this embodiment.
  • the computer device 5 includes a memory 51, a processor 52, and a network interface 53 that communicate with each other through a system bus. It should be pointed out that the figure only shows the computer device 5 with the components 51-53, and it is not required to implement all the shown components, and more or fewer components may be implemented instead.
  • the computer device here is a device that can automatically perform numerical calculation and/or information processing in accordance with pre-set or stored instructions.
  • the memory 51 includes at least one type of computer-readable storage medium.
  • the computer-readable storage medium may be non-volatile or volatile.
  • The computer-readable storage medium includes a flash memory, a hard disk, a multimedia card, a card-type memory (for example, SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like.
  • the memory 51 may be an internal storage unit of the computer device 5, such as a hard disk or a memory of the computer device 5.
  • the memory 51 may also be an external storage device of the computer device 5, for example, a plug-in hard disk equipped on the computer device 5, a smart memory card (Smart Media Card, SMC), and a secure digital (Secure Digital, SD) card, Flash Card, etc.
  • the memory 51 may also include both the internal storage unit of the computer device 5 and its external storage device.
  • the memory 51 is generally used to store an operating system and various application software installed in the computer device 5, such as a training method of a question generation model, or computer readable instructions of a question generation method, and the like.
  • the memory 51 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 52 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips in some embodiments.
  • the processor 52 is generally used to control the overall operation of the computer device 5.
  • The processor 52 is configured to run the computer-readable instructions or process data stored in the memory 51, for example, to run the computer-readable instructions of the training method of the question generation model or of the question generation method.
  • the network interface 53 may include a wireless network interface or a wired network interface, and the network interface 53 is generally used to establish a communication connection between the computer device 5 and other electronic devices.
  • The computer device provided in this embodiment can execute the steps of the training method of the question generation model described above.
  • the steps of the training method of the question generation model may be the steps in the training method of the question generation model of each of the foregoing embodiments.
  • The network in the pre-trained language model is adjusted to a sequence-to-sequence model, so that the pre-trained language model is oriented to text generation tasks and has good text generation capability; the pre-trained language model is then fine-tuned to obtain the question generation model, which ensures the quality of the generated questions.
  • the computer device provided in this embodiment can execute the steps of the above-mentioned problem generation method.
  • the steps of the question generation method may be the steps in the question generation method of each of the foregoing embodiments.
  • This application also provides another implementation manner, that is, a computer-readable storage medium storing computer-readable instructions for training a question generation model, where the computer-readable instructions for training the question generation model may be executed by at least one processor, so that the at least one processor executes the steps of the training method of the question generation model as described above.
  • The network in the pre-trained language model is adjusted to a sequence-to-sequence model, so that the pre-trained language model is oriented to text generation tasks and has good text generation capability; the pre-trained language model is then fine-tuned to obtain the question generation model, which ensures the quality of the generated questions.
  • This application also provides another implementation manner, that is, a computer-readable storage medium storing computer-readable instructions for question generation, where the computer-readable instructions for question generation may be executed by at least one processor, so that the at least one processor executes the steps of the question generation method as described above.
  • The method of the above embodiments can be implemented by means of software plus a necessary general hardware platform; of course, it can also be implemented by hardware, but in many cases the former is the better implementation.
  • the technical solution of this application essentially or the part that contributes to the existing technology can be embodied in the form of a software product, and the computer software product is stored in a storage medium (such as ROM/RAM, magnetic disk, The optical disc) includes several instructions to make a terminal device (which can be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) execute the methods described in the various embodiments of the present application.
  • the blockchain referred to in this application is a new application mode of computer technology such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • Blockchain is essentially a decentralized database, a chain of data blocks linked to one another by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Human Computer Interaction (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A method for training a question generation model, a question generation method, and a related device. The method comprises: pre-training an initial model to obtain a pre-trained language model, and adjusting a mask matrix during pre-training so as to realize three language models; acquiring question-and-answer information that comprises a question text and an answer text; extracting, from the answer text, a key entity related to the question text; configuring a network in the pre-trained language model such that same adapts to the generation of a Chinese text; inputting the key entity and the answer text into the pre-trained language model, so as to obtain a predicted question text; according to the predicted question text and the question text, determining a prediction error; and adjusting the model according to the prediction error, so as to obtain a question generation model. The method does not need to rely on manual data labeling. The method belongs to the field of artificial intelligence and further relates to blockchain technology, and the predicted question text can be stored in a blockchain node.

Description

Method for training question generation model, question generation method, and related device
This application claims the priority of the Chinese patent application filed with the Chinese Patent Office on April 29, 2020, with application number 202010356637.X and entitled "Method for training question generation model, question generation method, and related device", the entire content of which is incorporated into this application by reference.
Technical Field
This application relates to the field of artificial intelligence technology, and in particular to a method for training a question generation model, a question generation method, and related devices.
Background
With the development of natural language processing technology, question generation technology has emerged. Question generation involves machine learning and natural language processing in the field of artificial intelligence, and also relates to smart life in the field of smart cities. Question generation, which studies how to generate questions in natural language, is an important topic in natural language processing. Question generation has a wide range of applications: a machine knowledge base can use active questioning to build or supplement the knowledge base and expand its data set; in the field of education, question generation can help students ask questions; in the field of dialogue, question generation can be used as a cold start to open a topic, or to obtain feedback by asking questions. The application scenarios are very rich.
The inventor realizes that existing question generation techniques usually rely on known grammatical rules, using syntax trees to generate questions and filling existing templates with entities from a knowledge base; such techniques transfer poorly and require a large amount of prior expert knowledge when being built or migrated. Other techniques use deep learning models to generate questions based on pre-labeled answers; they require a large amount of data to be labeled manually in advance, which is time-consuming and labor-intensive, and most of the labeled texts are short, which affects question generation. It can be seen that existing question generation techniques have poor question generation performance.
Summary of the Invention
The purpose of the embodiments of the present application is to propose a method for training a question generation model, a question generation method, and related devices that improve question generation performance. In order to solve the above technical problems, an embodiment of the present application provides a method for training a question generation model, which adopts the following technical solution:
pre-training an initial model to obtain a pre-trained language model, and in the pre-training, adjusting a mask matrix so that the network in the initial model realizes a unidirectional model, a bidirectional model, and a sequence-to-sequence model;
obtaining question-and-answer information from web pages through a web crawler, the question-and-answer information including a question text and an answer text;
extracting, from the answer text, key entities related to the question text;
setting the network in the pre-trained language model to a sequence-to-sequence model to obtain a pre-trained language model for Chinese text generation;
inputting the key entities and the answer text into the pre-built pre-trained language model for Chinese text generation to obtain a predicted question text output by the pre-trained language model;
determining a prediction error according to the predicted question text and the question text; and
adjusting the pre-trained language model according to the prediction error until the prediction error satisfies a training stop condition, to obtain a question generation model.
A question generation method includes:
obtaining a source text for question generation;
selecting several groups of source entities from the source text;
inputting the several groups of source entities into a question generation model respectively, where the question generation model is a model obtained by using any one of the above training methods of the question generation model; and
obtaining several question texts generated by the question generation model based on the several groups of source entities.
In order to solve the above technical problems, an embodiment of the present application also provides a training device for a question generation model, including:
a model training module, configured to pre-train an initial model to obtain a pre-trained language model, and in the pre-training, adjust a mask matrix so that the network in the initial model realizes a unidirectional model, a bidirectional model, and a sequence-to-sequence model;
an information acquisition module, configured to obtain question-and-answer information from web pages through a web crawler, the question-and-answer information including a question text and an answer text;
an entity extraction module, configured to extract, from the answer text, key entities related to the question text;
a model setting module, configured to set the network in the pre-trained language model to a sequence-to-sequence model to obtain a pre-trained language model for Chinese text generation;
a text input module, configured to input the key entities and the answer text into the pre-trained language model to obtain a predicted question text output by the pre-trained language model;
an error determination module, configured to determine a prediction error according to the predicted question text and the question text; and
a model adjustment module, configured to adjust the pre-trained language model according to the prediction error until the prediction error satisfies a training stop condition, to obtain a question generation model.
In order to solve the above technical problems, an embodiment of the present application also provides a computer device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the processor implements the following steps when executing the computer-readable instructions:
pre-training an initial model to obtain a pre-trained language model, and in the pre-training, adjusting a mask matrix so that the network in the initial model realizes a unidirectional model, a bidirectional model, and a sequence-to-sequence model;
obtaining question-and-answer information from web pages through a web crawler, the question-and-answer information including a question text and an answer text;
extracting, from the answer text, key entities related to the question text;
setting the network in the pre-trained language model to a sequence-to-sequence model to obtain a pre-trained language model for Chinese text generation;
inputting the key entities and the answer text into the pre-trained language model to obtain a predicted question text output by the pre-trained language model;
determining a prediction error according to the predicted question text and the question text; and
adjusting the pre-trained language model according to the prediction error until the prediction error satisfies a training stop condition, to obtain a question generation model.
In order to solve the above technical problems, an embodiment of the present application also provides a computer device, including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the processor implements the following steps when executing the computer-readable instructions:
obtaining a source text for question generation;
selecting several groups of source entities from the source text;
inputting the several groups of source entities into a question generation model respectively, where the question generation model is a model obtained by using the above training method of the question generation model; and
obtaining several question texts generated by the question generation model based on the several groups of source entities.
In order to solve the above technical problems, embodiments of the present application also provide a computer-readable storage medium storing computer-readable instructions, where the computer-readable instructions implement the following steps when executed by a processor:
pre-training an initial model to obtain a pre-trained language model, and in the pre-training, adjusting a mask matrix so that the network in the initial model realizes a unidirectional model, a bidirectional model, and a sequence-to-sequence model;
obtaining question-and-answer information from web pages through a web crawler, the question-and-answer information including a question text and an answer text;
extracting, from the answer text, key entities related to the question text;
setting the network in the pre-trained language model to a sequence-to-sequence model to obtain a pre-trained language model for Chinese text generation;
inputting the key entities and the answer text into the pre-trained language model to obtain a predicted question text output by the pre-trained language model;
determining a prediction error according to the predicted question text and the question text; and
adjusting the pre-trained language model according to the prediction error until the prediction error satisfies a training stop condition, to obtain a question generation model.
In order to solve the above technical problems, embodiments of the present application also provide a computer-readable storage medium storing computer-readable instructions, where the computer-readable instructions implement the following steps when executed by a processor:
obtaining a source text for question generation;
selecting several groups of source entities from the source text;
inputting the several groups of source entities into a question generation model respectively, where the question generation model is a model obtained by using the above training method of the question generation model; and
obtaining several question texts generated by the question generation model based on the several groups of source entities.
Compared with the prior art, the embodiments of the method for training a question generation model of the present application mainly have the following beneficial effects. By adjusting the mask matrix, the network in the initial model realizes three language models, so that the initial model is pre-trained in an all-round way to obtain a pre-trained language model that can both understand and generate natural language. Through a web crawler, a large amount of question-and-answer information can be obtained from web pages for model training; the question-and-answer information includes question texts and answer texts, and key entities related to the question text are automatically extracted from the answer text, without relying on a large amount of manual labeling, which improves the efficiency of obtaining key entities and thereby the efficiency of model training. The network in the pre-trained language model is adjusted to a sequence-to-sequence model, so that the pre-trained language model is oriented to text generation tasks and has good text generation capability. The key entities and the answer text are input into the pre-trained language model to obtain a predicted question text, and the pre-trained language model is adjusted according to the error between the predicted question text and the real question text, thereby obtaining a question generation model. The question generation model is obtained by fine-tuning the pre-trained language model according to the downstream task, which guarantees the quality of the generated questions and thus improves question generation performance.
Description of the Drawings
In order to explain the solution in this application more clearly, the drawings used in the description of the embodiments of the application are briefly introduced below. Obviously, the drawings in the following description are merely some embodiments of the application; for those of ordinary skill in the art, other drawings can be obtained based on these drawings without creative work.
Figure 1 is an exemplary system architecture diagram to which the present application can be applied;
Figure 2 is a flowchart of an embodiment of a method for training a question generation model according to the present application;
Figure 3 is a flowchart of a specific implementation of step 201 in Figure 2;
Figure 4 is a flowchart of a specific implementation of step 203 in Figure 2;
Figure 5 is a flowchart of a specific implementation of step 205 in Figure 2;
Figure 6 is a flowchart of an embodiment of a question generation method according to the present application;
Figure 7 is a flowchart of a specific implementation of step 302 in Figure 4;
Figure 8 is a schematic structural diagram of an embodiment of a training device for a question generation model according to the present application;
Figure 9 is a schematic structural diagram of an embodiment of a computer device according to the present application.
Detailed Description
The terms used in the specification of the application herein are only for the purpose of describing specific embodiments and are not intended to limit the application.
The technical solutions in the embodiments of the present application are described clearly and completely below in conjunction with the accompanying drawings.
As shown in Figure 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired or wireless communication links, or fiber optic cables.
The user can use the terminal devices 101, 102, 103 to interact with the server 105 through the network 104 to receive or send messages and so on. Various communication client applications can be installed on the terminal devices 101, 102, 103.
The terminal devices 101, 102, 103 may be various electronic devices that have a display screen and support web browsing, including but not limited to smart phones, tablet computers, e-book readers, MP3 (Moving Picture Experts Group Audio Layer III) players, MP4 (Moving Picture Experts Group Audio Layer IV) players, laptop computers, desktop computers, and so on. The server 105 may be a server that provides various services.
It should be noted that the method for training a question generation model provided by the embodiments of the present application is generally executed by the server; accordingly, the processing device of the question generation model is generally set in the server.
It should be understood that the numbers of terminal devices, networks, and servers in Figure 1 are merely illustrative. There can be any number of terminal devices, networks, and servers according to implementation needs.
Continuing to refer to Figure 2, a flowchart of an embodiment of the method for training a question generation model according to the present application is shown. The method for training a question generation model includes the following steps:
Step 201: pre-train the initial model to obtain a pre-trained language model, and in the pre-training, adjust the mask matrix so that the network in the initial model realizes a unidirectional model, a bidirectional model, and a sequence-to-sequence model.
In this embodiment, the electronic device (for example, the server shown in Figure 1) on which the method for training a question generation model runs can communicate with the terminal through various wired or wireless connection methods.
The initial model may be a model that has not been pre-trained. The mask matrix may be the mask matrix of the network in the initial model, which is used to control the context information used in training; the unidirectional model is a unidirectional LM, the bidirectional model is a bidirectional LM, and the sequence-to-sequence model is a seq2seq LM.
Specifically, the server first obtains a pre-built initial model and pre-trains the initial model. During the pre-training process, the server sets the initial model to three different language models, including the unidirectional model, the bidirectional model, and the sequence-to-sequence model, by adjusting the mask matrix of the network in the initial model, so as to enrich the information obtained in pre-training and obtain a pre-trained language model that can both understand and generate natural language.
Step 202: obtain question-and-answer information from web pages through a web crawler, where the question-and-answer information includes a question text and an answer text.
Specifically, the user can configure the web crawler at the terminal; the terminal generates an information acquisition instruction according to the crawler configuration information input by the user and sends the information acquisition instruction to the server. The configured web crawler is used to crawl information from the World Wide Web. The crawler configuration information may include the URLs of the pages, the storage address of the information, and so on.
After the server receives the information acquisition instruction, it extracts the crawler configuration information from the information acquisition instruction and generates a web crawler according to the crawler configuration information. The server runs the generated web crawler, the web crawler crawls question-and-answer information from web pages, and the server saves the crawled question-and-answer information into a database. The question-and-answer information may consist of a question text and an answer text corresponding to the question text.
In one embodiment, the web crawler may be a Scrapy-based web crawler. Scrapy is a fast, high-level screen scraping and web scraping framework developed in Python, used to crawl web sites and extract structured data from pages. A Scrapy-based web crawler can crawl a large amount of question-and-answer information from public question-and-answer community websites such as Zhihu and Baidu Zhidao, and store the crawled question-and-answer information in the server's database in the form of JSON files.
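As an illustration only, a minimal Scrapy spider of the kind described above might look like the following sketch; the start URL, CSS selectors, and field names are hypothetical placeholders and would have to be adapted to the structure of the actual question-and-answer pages being crawled.
```python
import scrapy


class QAPairSpider(scrapy.Spider):
    """A minimal sketch of a spider that collects question/answer pairs."""
    name = "qa_pair_spider"
    # hypothetical listing page; real Q&A sites need site-specific start URLs
    start_urls = ["https://example.com/questions?page=1"]

    def parse(self, response):
        # the CSS selectors below are placeholders for the real page structure
        for item in response.css("div.question-item"):
            yield {
                "question": item.css("h2.title::text").get(),
                "answers": item.css("div.answer p::text").getall(),
            }
        # follow pagination so that a large amount of Q&A data is collected
        next_page = response.css("a.next::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)
```
Running such a spider with `scrapy crawl qa_pair_spider -o qa.json` would write the crawled items to a JSON file, matching the JSON storage described above.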
In one embodiment, a question in a web page has at least one answer, and at least one sub-answer text is obtained after crawling the at least one answer; the at least one sub-answer text corresponding to one question text constitutes the answer text corresponding to that question text.
In one embodiment, the step of obtaining question-and-answer information from web pages through a web crawler specifically includes: receiving a target text; splitting the target text to obtain several sentences; generating the same number of web crawlers as the several sentences; embedding the several sentences into the respective web crawlers; and running each web crawler to obtain the question-and-answer information that each web crawler crawls from web pages according to the embedded sentence.
The target text may be a text that instructs the web crawlers what question-and-answer information to crawl.
Specifically, the server receives the target text sent by the user through the terminal, and splits the target text at the sentence level according to punctuation marks to obtain several sentences. The server generates the same number of web crawlers as the sentences obtained by splitting, and embeds the split sentences into the code layer of each web crawler respectively. The server runs the web crawlers with the embedded sentences, and the web crawlers crawl the question-and-answer information related to the embedded sentences from web pages through column-wise crawling.
In this embodiment, after the target text is received, the target text is split into several sentences, the sentences are embedded into different web crawlers, and question-and-answer information related to the embedded sentences can be crawled after the web crawlers are run, for example as in the sketch below.
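A minimal sketch of the sentence-splitting step and the one-crawler-per-sentence idea follows; the punctuation set and the way each sentence is handed to a spider (as a Scrapy spider argument) are assumptions for illustration.
```python
import re


def split_into_sentences(target_text: str) -> list[str]:
    """Split the target text at sentence-ending punctuation (Chinese and English)."""
    parts = re.split(r"[。！？!?；;]\s*", target_text)
    return [p.strip() for p in parts if p.strip()]


# One crawler per sentence: each sentence is passed to its own spider instance,
# e.g. with Scrapy's CrawlerProcess (QAPairSpider is the sketch above and would
# need to accept a `query` argument and build its start URL from it).
#
# from scrapy.crawler import CrawlerProcess
# process = CrawlerProcess()
# for sentence in split_into_sentences(target_text):
#     process.crawl(QAPairSpider, query=sentence)
# process.start()
```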
Step 203: extract key entities related to the question text from the answer text.
A key entity may be an entity in the answer text that is correlated with the question text.
Specifically, the server performs word segmentation on the question text and the answer text respectively, obtaining multiple entities from each. The server recognizes the part of speech of each entity and selects the entities with preset parts of speech; the preset parts of speech may be verbs and nouns. The server performs exact matching and fuzzy matching between the entities selected from the question text and those selected from the answer text, and takes the entities in the answer text that can be matched as the key entities.
In one embodiment, the answer text includes at least one sub-answer text; the server extracts the key entities related to the question text from each sub-answer text respectively, and associates each sub-answer text with the key entities extracted from it.
In one embodiment, before the step of extracting the key entities from the question text and the answer text in the question-and-answer information, the method further includes: matching the question-and-answer information with regular expressions to obtain character strings to be cleaned; and deleting the matched character strings to be cleaned so as to perform data cleaning on the question-and-answer information.
A character string to be cleaned may be a meaningless character string in the question-and-answer information.
Specifically, the crawled question-and-answer information contains meaningless content. In order to increase the proportion of effective content, the server matches the question-and-answer information with preset regular expressions to obtain the character strings to be cleaned in the question-and-answer information, and deletes the matched character strings to be cleaned, so as to perform data cleaning on the question-and-answer information. The regular expressions are pre-configured, and one regular expression can correspond to one kind of meaningless character string.
For example, when question-and-answer information is crawled from Zhihu, it may include hyperlinks, divider lines, and invalid characters, as well as content unrelated to the main text such as "来源: ..." ("Source: ...") and "作者: ..." ("Author: ...") in Zhihu column articles. When question-and-answer information is crawled from Baidu Zhidao, it may include a large number of meaningless characters. The server can delete the above meaningless content through regular expressions, for example as in the sketch below.
In this embodiment, the question-and-answer information is matched with regular expressions to obtain the character strings to be cleaned, and the matched character strings to be cleaned are deleted, thereby realizing data cleaning of the question-and-answer information and increasing the proportion of effective content in the question-and-answer information.
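A minimal sketch of such regex-based cleaning is given below; the concrete patterns are illustrative assumptions, not the exact expressions used in the embodiment.
```python
import re

# Each pattern corresponds to one kind of meaningless string mentioned above.
CLEANING_PATTERNS = [
    re.compile(r"https?://\S+"),   # hyperlinks
    re.compile(r"[-=*]{4,}"),      # divider lines
    re.compile(r"来源[:：].*"),     # "Source: ..." lines in column articles
    re.compile(r"作者[:：].*"),     # "Author: ..." lines in column articles
]


def clean_qa_text(text: str) -> str:
    """Delete every substring matched by one of the cleaning patterns."""
    for pattern in CLEANING_PATTERNS:
        text = pattern.sub("", text)
    return text
```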
Step 204: set the network in the pre-trained language model to a sequence-to-sequence model to obtain a pre-trained language model for Chinese text generation.
The pre-trained language model (Unified pre-trained Language Model, UNILM) is a model that can handle both natural language understanding and natural language generation.
The pre-training of the pre-trained language model adopts three unsupervised language model objectives: the unidirectional LM (including left-to-right and right-to-left), the bidirectional LM, and the sequence-to-sequence LM (seq2seq LM), where LM stands for language model.
The pre-trained language model uses a Transformer network with shared parameters, and also uses specific self-attention masks to control the context information used in prediction. During pre-training, the above three LMs are realized by adjusting the mask matrix in the Transformer network.
When fine-tuning for downstream tasks, the pre-trained language model can be regarded as a unidirectional encoder, a bidirectional encoder, or a sequence-to-sequence model, and the mask matrix in the Transformer network is adjusted to adapt to different downstream tasks (natural language understanding and generation tasks).
Seq2seq is a model with an Encoder-Decoder structure and has a good text generation effect; the input of seq2seq is a sequence, and the output is also a sequence. The Encoder turns a variable-length input sequence into a fixed-length vector, and the Decoder decodes this fixed-length vector into a variable-length output sequence.
Specifically, the server obtains the pre-trained language model; the pre-trained language model is used for Chinese processing and can be used for natural language understanding as well as text generation. This application needs to fine-tune the pre-trained language model into a question generation model, so the mask matrix of the Transformer network in the pre-trained language model needs to be set so as to realize the sequence-to-sequence model, i.e. the seq2seq LM. In the mask matrix of the seq2seq LM, the matrix elements on the left are all 0, which means that both the preceding and the following context can be seen; in the right part of the matrix, the elements in the upper-right corner are infinite, which means that only the preceding context can be seen.
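The mask description above can be made concrete with a small sketch. The helper below builds a UNILM-style seq2seq attention mask in which 0 means "may attend" and a large negative value (added to the attention scores) means "masked out"; the function name and the use of -1e9 as "infinity" are illustrative assumptions.
```python
import numpy as np


def seq2seq_attention_mask(src_len: int, tgt_len: int) -> np.ndarray:
    """Build a seq2seq LM attention mask for src_len source tokens (answer text
    plus key entities) followed by tgt_len target tokens (the question)."""
    total = src_len + tgt_len
    neg_inf = -1e9
    mask = np.zeros((total, total), dtype=np.float32)
    # left part (source columns) stays 0: every position may attend to the
    # whole source segment, i.e. both preceding and following source context.
    # upper-right block: source positions must not attend to the target segment.
    mask[:src_len, src_len:] = neg_inf
    # within the target segment, each position may only attend to the already
    # generated part (strictly upper-triangular entries are masked).
    mask[src_len:, src_len:] = np.triu(np.full((tgt_len, tgt_len), neg_inf), k=1)
    return mask
```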
Step 205: input the key entities and the answer text into the pre-trained language model to obtain the predicted question text output by the pre-trained language model.
The predicted question text may be a question text related to the answer text that the pre-trained language model generates based on the key entities and the answer text.
Specifically, after the network in the pre-trained language model is set to a sequence-to-sequence model, the server fine-tunes the pre-trained language model based on the key entities, the question text, and the answer text. The pre-trained language model converts the key entities and the question text into vectors, processes the vectors, and outputs the predicted question text.
In one embodiment, the pre-trained language model splits the key entities and the question text into individual characters, converts each character into a vector according to a character conversion table, and processes the vectors. The character conversion table is created in advance and determines the correspondence between characters and vectors. When the server converts a character, it looks up the character in the character conversion table and takes the vector corresponding to the found character as the vector of that character.
Step 206: determine the prediction error according to the predicted question text and the question text.
It should be emphasized that, in order to further ensure the privacy and security of the predicted question text, the predicted question text may also be stored in a node of a blockchain.
Specifically, the question text in the question-and-answer information is the target output of the pre-trained language model. The server obtains the predicted question text output by the pre-trained language model and the question text in the question-and-answer information, and calculates the prediction error according to a preset error formula.
Further, in an embodiment, the prediction error is calculated with a softmax cross-entropy of the form
softmaxLoss = -\sum_{i=1}^{N} \log \frac{\exp(logits_{i, y_i})}{\sum_{j} \exp(logits_{i, j})}
where y_i is the identifier of the i-th character of the question text when it is converted into a vector according to the character conversion table, logits_i is the score of the i-th character of the predicted question text over the character conversion table, and softmaxLoss is the prediction error between the predicted question text and the question text.
Specifically, in the character conversion table, each character can be regarded as a token, and each token has a unique identifier in the character conversion table, namely the identifier token_id. For example, when the size of the character conversion table is 20000, that is, the character conversion table records the conversion relationship between 20000 characters and vectors, the range of token_id is 0-19999. The purpose of the pre-trained language model is to obtain the token_id sequence of the predicted question text.
Suppose the question text contains N (N is a positive integer) characters. The pre-trained language model encodes the answer text and the key entities to obtain N hidden states H, where each H corresponds to a character of the predicted question text to be generated. The pre-trained language model calculates the score logits of H at every character of the character conversion table; it can be understood that the score logits is equivalent to the similarity between H and each character in the character table, and the character with the highest score is selected as the character corresponding to H.
After the pre-trained language model determines each character of the predicted question text and its corresponding score logits, the prediction error is calculated: y_i is the identifier token_id of the i-th character of the crawled question text, logits_i is the score of the i-th character of the predicted question text, and the prediction error is obtained by cross-entropy.
In this embodiment, the prediction error can be accurately measured by the error formula, which ensures that the pre-trained language model can be accurately adjusted according to the error, as in the sketch below.
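A minimal NumPy sketch of the cross-entropy computation described above (the loss is summed over the N positions; whether the patent's formula sums or averages over positions is not recoverable from the text):
```python
import numpy as np


def softmax_loss(logits: np.ndarray, token_ids: np.ndarray) -> float:
    """Cross-entropy between predicted scores and the real question text.

    logits:    shape (N, M) - score of every character of the character
               conversion table for each of the N question positions.
    token_ids: shape (N,)   - token_id of each character of the real question.
    """
    # numerically stable log-softmax over the vocabulary dimension
    shifted = logits - logits.max(axis=1, keepdims=True)
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    # negative log-likelihood of the true characters, summed over positions
    return float(-log_probs[np.arange(len(token_ids)), token_ids].sum())
```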
Step 207: adjust the pre-trained language model according to the prediction error until the prediction error satisfies the training stop condition, to obtain the question generation model.
The training stop condition is the condition for stopping model training; the training stop condition may be that the prediction error is smaller than a predetermined error threshold.
Specifically, the terminal obtains the predetermined error threshold and compares the prediction error with the error threshold. When the prediction error is greater than or equal to the error threshold, the terminal adjusts the model parameters of the pre-trained language model in the direction of reducing the prediction error. Each time the parameters of the pre-trained language model are adjusted, the key entities and the answer text are processed again to obtain a new predicted question text, the prediction error is obtained from the predicted question text and the question text, and the prediction error is compared with the error threshold; if the prediction error is still greater than or equal to the error threshold, the model is adjusted again. This is iterated until the prediction error is smaller than the error threshold, at which point training stops and the pre-trained language model at the time training stops is taken as the question generation model.
When adjusting the model parameters of each layer of the pre-trained language model, the output of the current layer and the gradient propagated back are needed, and the output of each layer is kept in GPU memory. When the Transformer network in the pre-trained language model has many layers, for example 24 layers, the outputs of all 24 layers need to be saved, which occupies a large amount of GPU memory. For this reason, only the outputs of some of the layers may be saved; when back-propagation needs to update the model parameters, the output of the current layer can be recomputed from the saved outputs of those layers, thereby saving GPU memory and lowering the hardware requirements of model training.
For example, if the Transformer network has 24 layers, only the outputs of some of the layers are saved, namely the outputs of layers 1, 7, 13, 19, and 24. When back-propagation is performed, the outputs of layers 2-6 are recomputed from the output of layer 1, the outputs of layers 8-12 are recomputed from the output of layer 7, the outputs of layers 14-18 are recomputed from the output of layer 13, and the outputs of layers 20-23 are recomputed from the output of layer 19.
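This "save only some layer outputs and recompute the rest" strategy is what deep-learning frameworks call gradient (activation) checkpointing. The patent does not name a framework; the sketch below shows one way to get a comparable effect with PyTorch's built-in utility, where splitting the 24 blocks into 4 segments keeps roughly every sixth activation, similar to the example above.
```python
import torch
from torch.utils.checkpoint import checkpoint_sequential


def forward_with_checkpointing(transformer_blocks: torch.nn.Sequential,
                               hidden_states: torch.Tensor) -> torch.Tensor:
    """Run a stack of Transformer blocks, keeping only segment-boundary
    activations and recomputing the intermediate ones during backward."""
    # `transformer_blocks` is a hypothetical nn.Sequential of the 24 blocks.
    return checkpoint_sequential(transformer_blocks, 4, hidden_states)
```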
In this embodiment, by adjusting the mask matrix, the network in the initial model realizes three language models, so that the initial model is pre-trained in an all-round way to obtain a pre-trained language model that can both understand and generate natural language. Through the web crawler, a large amount of question-and-answer information can be obtained from web pages for model training; the question-and-answer information includes question texts and answer texts, and the key entities related to the question text are automatically extracted from the answer text, without relying on a large amount of manual labeling, which improves the efficiency of obtaining key entities and thereby the efficiency of model training. The network in the pre-trained language model is adjusted to a sequence-to-sequence model, so that the pre-trained language model is oriented to text generation tasks and has good text generation capability. The key entities and the answer text are input into the pre-trained language model to obtain the predicted question text, and the pre-trained language model is adjusted according to the error between the predicted question text and the real question text, thereby obtaining the question generation model. The question generation model is obtained by fine-tuning the pre-trained language model according to the downstream task, which guarantees the quality of the generated questions and thus improves question generation performance.
Further, as shown in Figure 3, the above step 201 specifically includes:
Step 2011: obtain the initial model for pre-training and multiple groups of pre-training sample sets.
A pre-training sample set may be a data set used to train the initial model.
Specifically, the built initial model and the multiple groups of pre-training sample sets used for pre-training the initial model are pre-stored in the server. The server obtains the initial model and the pre-training sample sets, and first needs to pre-train the initial model so as to obtain the pre-trained language model.
Step 2012: randomly generate the mask identifier corresponding to each group of pre-training sample sets; the mask matrices corresponding to the mask identifiers realize the unidirectional model, the bidirectional model, and the sequence-to-sequence model.
A mask identifier may be the identifier of a mask matrix of the network in the model.
Specifically, the initial model is built as a Transformer network, and the Transformer network may have 12 layers or 24 layers. Pre-training adopts three unsupervised language model objectives: the unidirectional LM (including left-to-right and right-to-left), the bidirectional LM, and the seq2seq LM.
For each group of training sample sets, the server randomly generates the mask identifier of the training sample set; the mask identifier corresponds to a mask matrix, and the server sets the Transformer network to a different LM according to the mask matrix. By randomly generating the mask identifier of each group of training sample sets, equal pre-training of the different LMs is achieved, for example as in the sketch below.
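A trivial sketch of this random assignment (the objective names are illustrative; left-to-right and right-to-left together form the unidirectional LM):
```python
import random

# one "mask identifier" per pre-training objective
LM_OBJECTIVES = ["left_to_right", "right_to_left", "bidirectional", "seq2seq"]


def assign_mask_identifiers(num_sample_sets: int) -> list[str]:
    """Randomly assign one LM objective to each pre-training sample set so that,
    on average, the three kinds of LM are pre-trained equally often."""
    return [random.choice(LM_OBJECTIVES) for _ in range(num_sample_sets)]
```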
In one embodiment, the model parameters in the initial model are half-precision, and before the step of randomly generating the mask identifiers corresponding to the groups of pre-training sample sets, the method further includes: setting the model parameters of the layernorm layers and the embedding layer in the initial model to single precision.
Half precision, i.e. the half-precision floating-point format (FP16), is a binary floating-point data type used by computers. A half-precision floating-point number is stored in 2 bytes (16 bits), while a single-precision floating-point number (FP32) occupies 4 bytes (32 bits) of storage space.
Specifically, model training places high demands on the computer's hardware resources and takes a long time. In order to increase the training speed and reduce GPU (Graphics Processing Unit) usage, the model parameters in the initial model are half-precision. To prevent the initial model from failing to converge, the model parameters of the embedding layer in the initial model are set to single precision; to avoid large losses caused by insufficient precision in operations such as computing the mean and variance during training, the model parameters of the layernorm layers in the initial model are set to single precision.
In this embodiment, the model parameters in the initial model are set to half precision, while the model parameters of the layernorm layers and the embedding layer are set to single precision, which improves the speed and accuracy of model training.
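A minimal PyTorch-style sketch of this precision arrangement (the patent does not specify a framework, so the helper below is an assumption for illustration):
```python
import torch


def set_mixed_precision(model: torch.nn.Module) -> torch.nn.Module:
    """Cast the model to FP16 but keep LayerNorm and Embedding weights in FP32,
    matching the precision arrangement described above."""
    model.half()
    for module in model.modules():
        if isinstance(module, (torch.nn.LayerNorm, torch.nn.Embedding)):
            module.float()
    return model
```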
Step 2013: input each group of pre-training sample sets into the initial model respectively, and adjust the mask matrix of the network in the initial model according to the mask identifier corresponding to the pre-training sample set.
Specifically, the server inputs the pre-training sample sets into the initial model in turn. After a group of pre-training sample sets is input, the server adjusts the mask matrix of the Transformer network in the initial model according to the mask identifier corresponding to that pre-training sample set, thereby setting the Transformer network to the unidirectional LM, the bidirectional LM, or the seq2seq LM.
Step 2014: pre-train the mask-matrix-adjusted initial model in turn according to the input pre-training sample sets, to obtain the pre-trained language model.
Specifically, after adjusting the mask matrix, the server pre-trains the initial model according to the pre-training sample set; when training with one group of pre-training sample sets is completed, the next group of pre-training sample sets is input, the mask matrix is adjusted, and the next round of pre-training is carried out. When training with all the pre-training sample sets has been completed, the server obtains the pre-trained language model.
During pre-training, the Transformer network switches randomly among the unidirectional LM (including left-to-right and right-to-left), the bidirectional LM, and the seq2seq LM, and the layers of the Transformer network share model parameters across the multiple rounds of pre-training.
In this embodiment, the mask identifier of each pre-training sample set is randomly generated, and when the initial model is pre-trained according to a pre-training sample set, the mask matrix in the initial model is adjusted according to the mask identifier, so that the initial model fulfils the pre-training objectives of the three language models evenly, which ensures that the pre-training is well founded.
Further, as shown in Figure 4, the above step 203 may include:
Step 2031: extract text entities from the question text and the answer text in the question-and-answer information respectively.
A text entity may be an entity in the question text or the answer text.
Specifically, the server can perform word segmentation on the question text and the answer text to obtain multiple entities. The server can use pkuseg for word segmentation, splitting the question text and the answer text into words. pkuseg is an open-source Chinese word segmentation toolkit released by Peking University with high segmentation accuracy.
After word segmentation, stop words are removed from the entities; stop words are words without obvious meaning that can be deleted, such as "在", "的" and "是". Then the entities whose parts of speech are verbs and nouns are extracted as text entities, for example as in the sketch below.
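A minimal sketch of this extraction step using pkuseg with part-of-speech tagging (the stop-word list is illustrative, and pkuseg's POS-tagging model must be available locally):
```python
import pkuseg

STOP_WORDS = {"在", "的", "是"}          # illustrative stop-word list
seg = pkuseg.pkuseg(postag=True)          # segmentation with POS tagging


def extract_text_entities(text: str) -> list[str]:
    """Segment the text, drop stop words, and keep only nouns and verbs."""
    entities = []
    for word, tag in seg.cut(text):
        if word in STOP_WORDS:
            continue
        if tag.startswith("n") or tag.startswith("v"):   # nouns and verbs
            entities.append(word)
    return entities
```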
步骤2032,计算答案文本中的各文本实体与问题文本中的各文本实体的相似度。Step 2032: Calculate the similarity between each text entity in the answer text and each text entity in the question text.
具体地,答案文本中的文本实体组成第一数据集,问题文本中的文本实体组成第二数据集,服务器计算第一数据集中每个实体与第二数据集中每个实体间的相似度。服务器可以通过精确匹配和模糊匹配来计算相似度,能够精确匹配的文本实体间的相似度为100%。在进行模糊匹配时,服务器可以将文本实体转化为向量,计算向量之间的余弦相似度;或者计算文本实体之间的文本编辑距离(又称Levenshtein距离,是将一个字符串转化成另一个字符串所需的最少操作次数,操作包括插入、删除、替换),文本编辑距离越短,相似度越高。Specifically, the text entities in the answer text form the first data set, and the text entities in the question text form the second data set, and the server calculates the similarity between each entity in the first data set and each entity in the second data set. The server can calculate the similarity through exact matching and fuzzy matching, and the similarity between text entities that can be accurately matched is 100%. When performing fuzzy matching, the server can convert text entities into vectors and calculate the cosine similarity between vectors; or calculate the text edit distance between text entities (also known as Levenshtein distance, which converts a string into another character) The minimum number of operations required for string, operations include insert, delete, and replace). The shorter the text editing distance, the higher the similarity.
步骤2033,从答案文本的各文本实体中,提取相似度符合预设相似度阈值的文本实体作为关键实体。Step 2033: Extract text entities whose similarity meets a preset similarity threshold from each text entity of the answer text as key entities.
具体地,假设第一数据集中有M(M为正整数)个文本实体,第二数据集中有N(N为正整数)个文本实体,则计算得到M*N组相似度。服务器获取预设的相似度阈值,从M*N组相似度中,选取相似度数值大于相似度阈值的相似度,每个选取到的相似度所对应的两个文本实体中,将来自第一数据集的文本实体作为关键实体。服务器还可以将M*N组相似度按照从大到小的顺序进行排列,按照排列顺序选取预设数量的相似度,将选取到的相似度对应的第一数据集文本实体作为关键实体。Specifically, assuming that there are M (M is a positive integer) text entities in the first data set, and there are N (N is a positive integer) text entities in the second data set, then M*N groups of similarities are calculated. The server obtains the preset similarity threshold, and selects the similarity whose similarity value is greater than the similarity threshold from the M*N group of similarities. The two text entities corresponding to each selected similarity will be from the first The text entity of the data set is used as the key entity. The server may also arrange the M*N groups of similarities in descending order, select a preset number of similarities according to the arrangement order, and use the first data set text entity corresponding to the selected similarity as the key entity.
举例说明,问题文本为“复旦大学在国内的排名大概是多少?”,通过pkuseg切分为{“复旦大学”,“在”,“国内”,“的”,“排名”,“大概”,“是”,“多少”,“?”}。分词后去除停用词{“在”、“的”,“是”},再提取动词和名词{“复旦大学”,“排名”},并对答案文本进行同样的处理。假设答案文本中提取到实体“复旦”,计算“复旦大学”和“复旦”间的相似度满足相似度阈值,将“复旦”作为关键实体。For example, the question text is "What is the ranking of Fudan University in China?", divided into {"Fudan University", "in", "domestic", "of", "ranking", "approximately" through pkuseg, "how many","?"}. After word segmentation, the stop words {"在", "的", "YES"} are removed, and the verbs and nouns {"Fudan University", "ranking"} are extracted, and the answer text is processed in the same way. Assuming that the entity "Fudan" is extracted from the answer text, the similarity between "Fudan University" and "Fudan" is calculated to meet the similarity threshold, and "Fudan" is taken as the key entity.
本实施例中，提取到的关键实体与问题文本和答案文本均高度关联，可以辅助预训练语言模型输出问题文本。In this embodiment, the extracted key entities are highly correlated with both the question text and the answer text, and can assist the pre-trained language model in outputting the question text.
进一步的,答案文本包括至少一个子答案文本,如图5所示,上述步骤205可以包括:Further, the answer text includes at least one sub-answer text. As shown in FIG. 5, the above step 205 may include:
步骤2051,将至少一个子答案文本以及与子答案文本对应的关键实体输入预训练语言模型,得到至少一个三维字向量矩阵。Step 2051: Input at least one sub-answer text and key entities corresponding to the sub-answer text into the pre-training language model to obtain at least one three-dimensional word vector matrix.
具体地,一个问题文本对应的答案文本可以由至少一个子答案文本组成,每个子答案文本均提取得到关键实体。Specifically, the answer text corresponding to one question text may be composed of at least one sub-answer text, and each sub-answer text is extracted to obtain a key entity.
服务器进行批处理(batch),一个问题文本对应的至少一个子答案文本以及与子答案文本对应的关键实体作为一个batch进行处理。The server performs batch processing, and at least one sub-answer text corresponding to a question text and key entities corresponding to the sub-answer text are processed as a batch.
服务器通过补零的方式将子答案文本的文本长度(即子答案文本中字符的个数)补齐，再依据字符转化表转化为one-hot向量(又称“独热编码”)，得到one-hot矩阵。假设子答案文本数量为batch，补齐后文本长度为length，字符转化表中字符个数为M，则one-hot矩阵的三个维度依次为batch、length和M，其中batch表示one-hot矩阵来自哪一个子答案文本，length为one-hot矩阵的行数，M为one-hot矩阵的列数。The server pads the text length of each sub-answer text (that is, the number of characters in the sub-answer text) with zeros, and then converts it into one-hot vectors (also known as "one-hot encoding") according to a character conversion table to obtain a one-hot matrix. Assuming the number of sub-answer texts is batch, the padded text length is length, and the number of characters in the character conversion table is M, then the three dimensions of the one-hot matrix are batch, length, and M in turn, where batch indicates which sub-answer text the one-hot matrix comes from, length is the number of rows of the one-hot matrix, and M is the number of columns of the one-hot matrix.
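A small NumPy sketch of the padding and one-hot step is given below. The toy character conversion table, the choice to reserve index 0 for padding, and the tiny shapes are illustrative assumptions, not values from the embodiment.

```python
import numpy as np

# Toy character conversion table; reserving index 0 for padding is an assumption.
char_table = {"<pad>": 0, "复": 1, "旦": 2, "排": 3, "名": 4}
M = len(char_table)

def to_one_hot(sub_answers, length):
    """Pad every sub-answer text to `length` characters, then one-hot encode it.
    Returns an array of shape (batch, length, M)."""
    batch = len(sub_answers)
    ids = np.zeros((batch, length), dtype=np.int64)
    for i, text in enumerate(sub_answers):
        for j, ch in enumerate(text[:length]):
            ids[i, j] = char_table.get(ch, 0)
    one_hot = np.zeros((batch, length, M), dtype=np.float32)
    np.put_along_axis(one_hot, ids[..., None], 1.0, axis=-1)
    return one_hot

one_hot = to_one_hot(["复旦", "排名复旦"], length=4)
print(one_hot.shape)   # (2, 4, 5) -> (batch, length, M)
```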
服务器需要将one-hot向量转换为字向量，将三维的one-hot矩阵输入预训练语言模型的embedding层，将M维度替换为dim维度，得到三维字向量矩阵；dim为特征维度，在一个模型中dim是统一的常量，例如dim可以取512、768或者1024。The server needs to convert the one-hot vectors into word vectors: the three-dimensional one-hot matrix is input into the embedding layer of the pre-trained language model, and the M dimension is replaced with the dim dimension to obtain a three-dimensional word vector matrix. Here dim is the feature dimension; within one model dim is a uniform constant, for example 512, 768, or 1024.
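The embedding step can be pictured as a matrix product that swaps the M axis for the dim axis. The sketch below uses a tiny random embedding table purely for illustration; the real embedding layer's weights are learned during pre-training.

```python
import numpy as np

batch, length, M, dim = 2, 4, 5, 8            # dim would typically be 512, 768 or 1024
rng = np.random.default_rng(0)

one_hot = np.zeros((batch, length, M), dtype=np.float32)
one_hot[..., 0] = 1.0                          # placeholder content for the sketch

embedding_weight = rng.normal(size=(M, dim)).astype(np.float32)  # the embedding layer's table

# Multiplying the one-hot matrix by the embedding table replaces the M axis with dim.
word_vectors = one_hot @ embedding_weight
print(word_vectors.shape)                      # (2, 4, 8) -> (batch, length, dim)
```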
步骤2052,将转化得到的三维字向量矩阵合并为二维字向量矩阵。Step 2052: Combine the converted three-dimensional word vector matrix into a two-dimensional word vector matrix.
具体地，为了提高计算效率，将各三维字向量矩阵进行合并，得到一个更大的矩阵即二维字向量矩阵，矩阵合并取消了batch维度，使得预训练语言模型中对矩阵的计算变为对二维矩阵的运算，提高了计算速度，减少了训练时间。Specifically, to improve computational efficiency, the three-dimensional word vector matrices are merged into one larger matrix, namely the two-dimensional word vector matrix. Merging eliminates the batch dimension, so that the matrix computations in the pre-trained language model become operations on a two-dimensional matrix, which increases the calculation speed and reduces the training time.
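A reshape is one way to realize the merge described in step 2052; whether the embodiment merges exactly along the length axis like this is an assumption made for the sketch.

```python
import numpy as np

batch, length, dim = 2, 4, 8
three_d = np.arange(batch * length * dim, dtype=np.float32).reshape(batch, length, dim)

# Merging removes the batch axis: the (batch, length, dim) tensor becomes a
# (batch * length, dim) matrix, so downstream layers operate on a single 2-D matrix.
two_d = three_d.reshape(batch * length, dim)
print(two_d.shape)        # (8, 8)

# The mapping is reversible, so per-sample rows can still be recovered if needed.
restored = two_d.reshape(batch, length, dim)
assert np.array_equal(restored, three_d)
```

Because the merge is a pure reshape, no information is lost; it only changes how the same values are laid out for the matrix operations inside the model.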
步骤2053,通过预训练语言模型对二维字向量矩阵进行处理,得到预训练语言模型输出的预测问题文本,其中,预测问题文本存储在区块链中。In step 2053, the two-dimensional word vector matrix is processed through the pre-training language model to obtain the prediction question text output by the pre-training language model, where the prediction question text is stored in the blockchain.
具体地，服务器通过预训练语言模型对二维字向量矩阵进行处理，得到预测问题文本中每个字处的得分logits，在每一个字处，选取具有最高得分的字作为该处的字，从而输出预测问题文本。服务器还可以将预测问题文本上传至区块链中进行存储，以记录预训练语言模型的训练过程，同时保证预测问题文本的私密性和安全性。Specifically, the server processes the two-dimensional word vector matrix with the pre-trained language model to obtain score logits for each character position of the predicted question text; at each position, the character with the highest score is selected as the character for that position, and the predicted question text is thereby output. The server may also upload the predicted question text to a blockchain for storage, so as to record the training process of the pre-trained language model while ensuring the privacy and security of the predicted question text.
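Step 2053's greedy selection of the highest-scoring character at each position amounts to an argmax over the logits. The vocabulary and scores below are made up for illustration only.

```python
import numpy as np

# Hypothetical logits for a 4-character predicted question over a 5-character vocabulary.
vocab = ["<pad>", "复", "旦", "排", "名"]
logits = np.array([[0.1, 2.3, 0.2, 0.1, 0.0],
                   [0.0, 0.1, 3.0, 0.2, 0.1],
                   [0.2, 0.1, 0.0, 2.8, 0.3],
                   [0.1, 0.0, 0.1, 0.2, 2.5]])

# At every position, keep the character with the highest score.
best_ids = logits.argmax(axis=-1)
predicted = "".join(vocab[i] for i in best_ids)
print(predicted)   # 复旦排名
```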
本实施例中，将各子答案文本以及对应的关键实体转换为多个三维字向量矩阵，再将三维字向量矩阵合并为二维字向量矩阵，使得预训练语言模型对二维字向量矩阵进行处理，提高了输出预测问题文本的效率。In this embodiment, each sub-answer text and its corresponding key entities are converted into multiple three-dimensional word vector matrices, which are then merged into a two-dimensional word vector matrix, so that the pre-trained language model processes the two-dimensional word vector matrix; this improves the efficiency of outputting the predicted question text.
在一个实施例中,如图6所示,提供了一种问题生成方法,以该方法应用于图1中的服务器为例进行说明,包括以下步骤:In an embodiment, as shown in FIG. 6, a method for generating a question is provided. Taking the method applied to the server in FIG. 1 as an example for description, the method includes the following steps:
步骤301,获取用于问题生成的源文本。Step 301: Obtain the source text for question generation.
具体地,问题生成模型依据输入的文本生成问题文本。用户通过终端向服务器发送源文本,问题生成模型依据源文本生成问题文本。Specifically, the question generation model generates question text based on the input text. The user sends the source text to the server through the terminal, and the question generation model generates the question text based on the source text.
在一个实施例中,终端还可以向服务器发送语音数据,服务器通过语音识别将语音数据转化为文本数据,得到源文本。In an embodiment, the terminal may also send voice data to the server, and the server converts the voice data into text data through voice recognition to obtain the source text.
步骤302,从源文本中筛选若干组源实体。Step 302: Filter several groups of source entities from the source text.
具体地，服务器将源文本进行分词得到多个实体。服务器可以随机筛选多个实体得到一组源实体，可以筛选若干组源实体。服务器还可以根据终端发送的指示信息筛选若干组源实体。Specifically, the server performs word segmentation on the source text to obtain multiple entities. The server can randomly select multiple entities to form one group of source entities, and can repeat this to obtain several groups of source entities. The server can also filter several groups of source entities according to instruction information sent by the terminal.
步骤303,分别将若干组源实体输入问题生成模型;其中,问题生成模型是采用上述问题生成模型的训练方法获取的模型。Step 303: Input several groups of source entities into the question generation model respectively; wherein, the question generation model is a model obtained by using the training method of the above question generation model.
具体地,服务器将筛选到的若干组源实体输入至问题生成模型,问题生成模型将源实体以字符为单位转化为向量,进行问题生成的处理。问题生成模型是采用上述问题生成模型的训练方法获取的模型。Specifically, the server inputs the selected groups of source entities into the question generation model, and the question generation model converts the source entities into vectors in units of characters to perform question generation processing. The question generation model is a model obtained using the training method of the above question generation model.
服务器在生成问题文本时，可以根据整段源文本生成问题文本，也可以根据从源文本中提取到的若干组源实体生成问题文本。When generating question text, the server can generate the question text based on the entire source text, or based on several groups of source entities extracted from the source text.
步骤304,获取问题生成模型基于若干组源实体生成的若干问题文本。Step 304: Obtain several question texts generated by the question generation model based on several groups of source entities.
具体地，问题生成模型基于一组源实体进行处理，生成一组问题文本。当存在若干组源实体时，服务器生成与若干组源实体分别对应的问题文本。Specifically, the question generation model processes one group of source entities to generate one set of question text. When there are several groups of source entities, the server generates question texts corresponding respectively to the several groups of source entities.
在一个实施例中,服务器将生成的若干问题文本发送至终端,由用户通过终端选取问题文本进行后续使用。In one embodiment, the server sends several generated question texts to the terminal, and the user selects the question texts through the terminal for subsequent use.
本实施例中,从用于问题文本生成的源文本中筛选若干组源实体,可以通过问题生成模型,依据不同的源实体生成不同的问题文本,提高了生成问题文本的灵活性。In this embodiment, several groups of source entities are filtered from the source text used for question text generation, and different question texts can be generated according to different source entities through the question generation model, which improves the flexibility of generating question text.
进一步的,如图7所示,步骤302可以包括:Further, as shown in FIG. 7, step 302 may include:
步骤3021,识别源文本中的文本实体。Step 3021: Identify text entities in the source text.
具体地,服务器接收到源文本后,对源文本进行分词得到多个实体,识别各个实体的词性,将符合预设词性的实体作为文本实体。其中,文本实体的词性可以包括名词、动词、形容词等。Specifically, after receiving the source text, the server performs word segmentation on the source text to obtain multiple entities, recognizes the part of speech of each entity, and uses the entity that meets the preset part of speech as the text entity. Among them, the part of speech of the text entity can include nouns, verbs, adjectives, etc.
步骤3022,从识别到的文本实体中随机抽取若干组文本实体,得到若干组源实体。Step 3022: Randomly extract several groups of text entities from the recognized text entities to obtain several groups of source entities.
具体地,服务器识别到文本实体后,随机抽取若干组文本实体,将每一组文本实体作为一组源实体,得到多组源实体。Specifically, after the server recognizes the text entities, it randomly selects several groups of text entities, and uses each group of text entities as a group of source entities to obtain multiple groups of source entities.
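A possible reading of steps 3021-3022 in Python is sketched below. The number of groups and the group size are parameters the embodiment leaves open, so the values used here are assumptions for illustration.

```python
import random

text_entities = ["复旦大学", "排名", "高校", "招生", "专业", "学费"]

def sample_entity_groups(entities, num_groups=3, group_size=2, seed=None):
    """Randomly draw `num_groups` groups of `group_size` text entities each."""
    rng = random.Random(seed)
    return [rng.sample(entities, group_size) for _ in range(num_groups)]

print(sample_entity_groups(text_entities, seed=42))
# e.g. [['专业', '招生'], ['排名', '学费'], ['高校', '复旦大学']]
```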
步骤3023,根据预设的语义知识库对源文本中的文本实体进行语义标注,得到语义标注结果。Step 3023: Perform semantic annotation on the text entities in the source text according to a preset semantic knowledge base to obtain a semantic annotation result.
具体地,服务器中预设有语义知识库。服务器根据语义知识库,识别各文本实体的语义,并对各文本实体进行语义标注,得到语义标注结果。Specifically, a semantic knowledge base is preset in the server. The server recognizes the semantics of each text entity according to the semantic knowledge base, and performs semantic annotation on each text entity to obtain the semantic annotation result.
步骤3024,根据语义标注结果,筛选符合预设语义范围的若干文本实体,得到若干组源实体。Step 3024: According to the semantic annotation result, filter several text entities that meet the preset semantic range to obtain several groups of source entities.
具体地,根据语义标注结果可以确定文本实体所表达的语义信息。服务器获取预设语义范围,筛选语义信息符合预设语义范围的若干文本实体,得到若干组源实体。预设语义范围可以来自终端发送的指示信息。Specifically, the semantic information expressed by the text entity can be determined according to the semantic annotation result. The server obtains the preset semantic range, filters several text entities whose semantic information meets the preset semantic range, and obtains several sets of source entities. The preset semantic range may come from the instruction information sent by the terminal.
举例说明,当用户想得到金融领域的问题文本时,将指示信息中的预设语义范围设置为金融领域,则服务器筛选属于金融领域的文本实体,得到源实体。For example, when the user wants to obtain the question text in the financial field, the preset semantic range in the instruction information is set to the financial field, and the server filters the text entities belonging to the financial field to obtain the source entity.
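Steps 3023-3024 can be pictured as a lookup against the semantic knowledge base followed by a filter. The toy knowledge base, its label set, and the "金融" (finance) preset range below are illustrative assumptions only; the embodiment's actual knowledge base is not specified here.

```python
# A toy semantic knowledge base mapping entities to domain labels (an assumption).
semantic_kb = {"利率": "金融", "贷款": "金融", "排名": "教育", "招生": "教育"}

def filter_by_semantic_range(entities, preset_range):
    """Keep only entities whose annotated domain falls inside the preset semantic range."""
    annotated = {e: semantic_kb.get(e, "未知") for e in entities}   # semantic annotation result
    return [e for e, label in annotated.items() if label in preset_range]

entities = ["利率", "排名", "贷款", "招生"]
print(filter_by_semantic_range(entities, preset_range={"金融"}))   # ['利率', '贷款']
```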
本实施例中,识别源文本中的文本实体,通过随机抽取或者根据语义抽取文本实体,保证了文本实体抽取的灵活性,从而保证了生成问题文本的灵活性。In this embodiment, the text entities in the source text are recognized, and the text entities are extracted randomly or semantically, so as to ensure the flexibility of text entity extraction, thereby ensuring the flexibility of generating the question text.
本领域普通技术人员可以理解实现上述实施例方法中的全部或部分流程,是可以通过计算机可读指令来指令相关的硬件来完成,该计算机可读指令可存储于一计算机可读取存储介质中,该计算机可读指令在执行时,可包括如上述各方法的实施例的流程。Those of ordinary skill in the art can understand that all or part of the processes in the above-mentioned embodiment methods can be implemented by instructing relevant hardware through computer-readable instructions, which can be stored in a computer-readable storage medium. When the computer-readable instructions are executed, they may include the processes of the above-mentioned method embodiments.
虽然附图的流程图中的各个步骤按照箭头的指示依次显示,但是这些步骤并不是必然按照箭头指示的顺序依次执行。除非本文中有明确的说明,这些步骤的执行并没有严格的顺序限制,其可以以其他的顺序执行。Although the steps in the flowchart of the drawings are shown in sequence as indicated by the arrows, these steps are not necessarily executed in sequence in the order indicated by the arrows. Unless explicitly stated in this article, the execution of these steps is not strictly limited in order, and they can be executed in other orders.
进一步参考图8，作为对上述图2所示方法的实现，本申请提供了一种问题生成模型的训练装置的一个实施例，该装置实施例与图2所示的方法实施例相对应，该装置具体可以应用于各种电子设备中。With further reference to FIG. 8, as an implementation of the method shown in FIG. 2 above, this application provides an embodiment of a training apparatus for a question generation model. The apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus can be applied to various electronic devices.
如图8所示，本实施例所述的问题生成模型的训练装置400包括：模型训练模块401、信息获取模块402、实体提取模块403、模型设置模块404、文本输入模块405、误差确定模块406以及模型调整模块407。其中：As shown in FIG. 8, the training apparatus 400 for a question generation model in this embodiment includes: a model training module 401, an information acquisition module 402, an entity extraction module 403, a model setting module 404, a text input module 405, an error determination module 406, and a model adjustment module 407. Specifically:
模型训练模块401，用于对初始模型进行预训练得到预训练语言模型，并在预训练中通过调整掩膜矩阵将初始模型中的网络实现单向模型、双向模型和序列到序列模型。The model training module 401 is configured to pre-train the initial model to obtain the pre-trained language model, and, during pre-training, to adjust the mask matrix so that the network in the initial model implements a unidirectional model, a bidirectional model, and a sequence-to-sequence model (a sketch of such mask matrices follows this module list).
信息获取模块402,用于通过网络爬虫从网络页面中获取问答信息,所述问答信息包括问题文本和答案文本。The information obtaining module 402 is configured to obtain question and answer information from a web page through a web crawler, and the question and answer information includes question text and answer text.
实体提取模块403,用于从所述答案文本中,提取与所述问题文本相关的关键实体。The entity extraction module 403 is used to extract key entities related to the question text from the answer text.
模型设置模块404,用于将预训练语言模型中的网络设置为序列到序列模型,以得到用于中文文本生成的预训练语言模型。The model setting module 404 is used to set the network in the pre-training language model to a sequence-to-sequence model to obtain a pre-training language model for Chinese text generation.
文本输入模块405,用于将关键实体和答案文本输入预训练语言模型,得到预训练语言模型输出的预测问题文本。The text input module 405 is used to input key entities and answer text into the pre-training language model to obtain the predicted question text output by the pre-training language model.
误差确定模块406,用于根据预测问题文本和问题文本,确定预测误差。The error determination module 406 is configured to determine the prediction error according to the prediction question text and the question text.
模型调整模块407,用于根据预测误差对预训练语言模型进行调整,直至预测误差满足训练停止条件,得到问题生成模型。The model adjustment module 407 is configured to adjust the pre-training language model according to the prediction error until the prediction error meets the training stop condition, and the problem generation model is obtained.
本实施例中，将预训练语言模型中的网络调整为序列到序列模型，使得预训练语言模型面向文本生成式任务且具备良好的文本生成能力，再对预训练语言模型进行微调得到问题生成模型，保证了生成问题的质量。In this embodiment, the network in the pre-trained language model is set to a sequence-to-sequence model, so that the pre-trained language model is oriented toward text generation tasks and has good text generation ability; the pre-trained language model is then fine-tuned to obtain the question generation model, which ensures the quality of the generated questions.
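As referenced for the model training module 401 above, the switch between the unidirectional, bidirectional, and sequence-to-sequence behaviours can be pictured as nothing more than a different self-attention mask over the same network. The sketch below is a minimal NumPy illustration under that reading; the function name, the "1 = may attend" convention, and the source/target split are assumptions made for illustration, not the patent's actual implementation.

```python
import numpy as np

def build_mask(src_len: int, tgt_len: int, mode: str) -> np.ndarray:
    """Return an attention mask M where M[i, j] = 1 means position i may attend to position j."""
    total = src_len + tgt_len
    if mode == "bidirectional":          # every token sees every token
        return np.ones((total, total), dtype=np.float32)
    if mode == "unidirectional":         # each token sees itself and the tokens to its left
        return np.tril(np.ones((total, total), dtype=np.float32))
    if mode == "seq2seq":                # source is bidirectional, target is causal
        mask = np.zeros((total, total), dtype=np.float32)
        mask[:, :src_len] = 1.0                            # everyone sees the source
        mask[src_len:, src_len:] = np.tril(                # target sees previously generated target tokens
            np.ones((tgt_len, tgt_len), dtype=np.float32))
        return mask
    raise ValueError(f"unknown mode: {mode}")

# e.g. an answer text of 4 tokens as source and a question text of 3 tokens as target
print(build_mask(4, 3, "seq2seq"))
```

In the sequence-to-sequence mask, the answer-text tokens see each other fully while the question-text tokens see the whole answer plus only the question tokens generated so far, which is the behaviour the embodiment relies on for text generation.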
在本实施例的一些可选的实现方式中，上述模型训练模块401包括：获取子模块、标识生成子模块、输入子模块和预训练子模块，其中：In some optional implementations of this embodiment, the above model training module 401 includes: an acquisition submodule, an identifier generation submodule, an input submodule, and a pre-training submodule, wherein:
获取子模块,用于获取用于预训练的初始模型以及多组预训练样本集;The acquisition sub-module is used to acquire the initial model used for pre-training and multiple sets of pre-training samples;
标识生成子模块,用于随机生成各组预训练样本集所对应的掩膜标识;掩膜标识对应的掩膜矩阵实现单向模型、双向模型和序列到序列模型;The identification generation sub-module is used to randomly generate the mask identification corresponding to each group of pre-training sample sets; the mask matrix corresponding to the mask identification realizes one-way model, two-way model and sequence-to-sequence model;
输入子模块,用于将各组预训练样本集分别输入初始模型,并根据预训练样本集所对应的掩膜标识调整初始模型中网络的掩膜矩阵;The input sub-module is used to input each group of pre-training sample sets into the initial model, and adjust the mask matrix of the network in the initial model according to the mask identifier corresponding to the pre-training sample set;
预训练子模块,用于根据输入的预训练样本集对掩膜矩阵调整后的初始模型依次进行预训练,得到预训练语言模型。The pre-training sub-module is used to sequentially pre-train the initial model adjusted by the mask matrix according to the input pre-training sample set to obtain the pre-training language model.
在本实施例的一些可选的实现方式中，初始模型中模型参数为半精度，上述模型训练模块401还包括参数设置子模块，参数设置子模块用于将初始模型中layernorm层和embedding层的模型参数设置为单精度。In some optional implementations of this embodiment, the model parameters in the initial model are half-precision, and the above model training module 401 further includes a parameter setting submodule, which is configured to set the model parameters of the layernorm layer and the embedding layer in the initial model to single precision.
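One common way to realize this mixed-precision setup is sketched below in PyTorch; the toy model and the use of torch are assumptions, since the embodiment does not name a framework.

```python
import torch
from torch import nn

# A stand-in for the initial model; the real model's architecture is not shown here.
model = nn.Sequential(
    nn.Embedding(100, 64),
    nn.Linear(64, 64),
    nn.LayerNorm(64),
)

model.half()                                   # model parameters become half precision (fp16)

# Keep the numerically sensitive layers in single precision (fp32).
for module in model.modules():
    if isinstance(module, (nn.LayerNorm, nn.Embedding)):
        module.float()

for name, p in model.named_parameters():
    print(name, p.dtype)
# 0.* -> torch.float32 (embedding), 1.* -> torch.float16 (linear), 2.* -> torch.float32 (layernorm)
```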
在本实施例的一些可选的实现方式中，上述实体提取模块403进一步用于：从问答信息内的问题文本和答案文本中，分别提取文本实体；计算答案文本中的各文本实体与问题文本中的各文本实体的相似度；从答案文本的各文本实体中，提取相似度符合预设相似度阈值的文本实体作为关键实体。In some optional implementations of this embodiment, the entity extraction module 403 is further configured to: extract text entities from the question text and the answer text in the question and answer information, respectively; calculate the similarity between each text entity in the answer text and each text entity in the question text; and, from the text entities of the answer text, extract those whose similarity meets a preset similarity threshold as key entities.
在本实施例的一些可选的实现方式中，答案文本包括至少一个子答案文本，上述文本输入模块405进一步用于：将至少一个子答案文本以及与子答案文本对应的关键实体输入预训练语言模型，得到至少一个三维字向量矩阵；将转化得到的三维字向量矩阵合并为二维字向量矩阵；通过预训练语言模型对二维字向量矩阵进行处理，得到预训练语言模型输出的预测问题文本，其中，预测问题文本存储在区块链中。In some optional implementations of this embodiment, the answer text includes at least one sub-answer text, and the text input module 405 is further configured to: input the at least one sub-answer text and the key entities corresponding to the sub-answer text into the pre-trained language model to obtain at least one three-dimensional word vector matrix; merge the converted three-dimensional word vector matrices into a two-dimensional word vector matrix; and process the two-dimensional word vector matrix through the pre-trained language model to obtain the predicted question text output by the pre-trained language model, where the predicted question text is stored in a blockchain.
在一个实施例中,提供了一种问题生成装置,包括:源文本获取模块、源实体抽取模块、源实体输入模块和问题生成模块,其中:In one embodiment, a question generation device is provided, including: a source text acquisition module, a source entity extraction module, a source entity input module, and a question generation module, wherein:
源文本获取模块,用于获取用于问题生成的源文本。The source text obtaining module is used to obtain the source text used for question generation.
源实体抽取模块,用于从源文本中筛选若干组源实体。The source entity extraction module is used to filter several groups of source entities from the source text.
源实体输入模块,用于分别将若干组源实体输入问题生成模型;其中,问题生成模型是采用上述问题生成模型的训练方法获取的模型。The source entity input module is used to input several groups of source entities into the question generation model; wherein, the question generation model is a model obtained by using the training method of the above question generation model.
问题生成模块,用于获取问题生成模型基于若干组源实体生成的若干问题文本。The question generation module is used to obtain several question texts generated by the question generation model based on several groups of source entities.
本实施例中,从用于问题文本生成的源文本中筛选若干组源实体,可以通过问题生成模型,依据不同的源实体生成不同的问题文本,提高了生成问题文本的灵活性。In this embodiment, several groups of source entities are filtered from the source text used for question text generation, and different question texts can be generated according to different source entities through the question generation model, which improves the flexibility of generating question text.
在本实施例的一些可选的实现方式中,上述源实体抽取模块进一步用于:识别源文本中的文本实体;从识别到的文本实体中随机抽取若干组文本实体,得到若干组源实体;或者,根据预设的语义知识库对源文本中的文本实体进行语义标注,得到语义标注结果;根据语义标注结果,筛选符合预设语义范围的若干文本实体,得到若干组源实体。In some optional implementations of this embodiment, the aforementioned source entity extraction module is further used to: identify text entities in the source text; randomly extract several groups of text entities from the recognized text entities to obtain several groups of source entities; Or, perform semantic annotation on the text entities in the source text according to a preset semantic knowledge base to obtain a semantic annotation result; according to the semantic annotation result, filter several text entities that meet the preset semantic range to obtain several groups of source entities.
为解决上述技术问题,本申请实施例还提供计算机设备。具体请参阅图9,图9为本实施例计算机设备基本结构框图。In order to solve the above technical problems, the embodiments of the present application also provide computer equipment. Please refer to FIG. 9 for details. FIG. 9 is a block diagram of the basic structure of the computer device in this embodiment.
所述计算机设备5包括通过系统总线相互通信连接存储器51、处理器52、网络接口53。需要指出的是,图中仅示出了具有组件51-53的计算机设备5,并不要求实施所有示出的组件,可以替代的实施更多或者更少的组件。这里的计算机设备是一种能够按照事先设定或存储的指令,自动进行数值计算和/或信息处理的设备。The computer device 5 includes a memory 51, a processor 52, and a network interface 53 that communicate with each other through a system bus. It should be pointed out that the figure only shows the computer device 5 with the components 51-53, and it is not required to implement all the shown components, and more or fewer components may be implemented instead. The computer device here is a device that can automatically perform numerical calculation and/or information processing in accordance with pre-set or stored instructions.
所述存储器51至少包括一种类型的计算机可读存储介质,所述计算机可读存储介质可以是非易失性,也可以是易失性,所述计算机可读存储介质包括闪存、硬盘、多媒体卡、卡型存储器(例如,SD或DX存储器等)、随机访问存储器(RAM)、静态随机访问存储器(SRAM)、只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、可编程只读存储器(PROM)、磁性存储器、磁盘、光盘等。在一些实施例中,所述存储器51可以是所述计算机设备5的内部存储单元,例如该计算机设备5的硬盘或内存。在另一些实施例中,所述存储器51也可以是所述计算机设备5的外部存储设备,例如该计算机设备5上配备的插接式硬盘,智能存储卡(Smart Media Card,SMC),安全数字(Secure Digital,SD)卡,闪存卡(Flash Card)等。当然,所述存储器51还可以既包括所述计算机设备5的内部存储单元也包括其外部存储设备。本实施例中,所述存储器51通常用于存储安装于所述计算机设备5的操作系统和各类应用软件,例如问题生成模型的训练方法、或问题生成方法的计算机可读指令等。此外,所述存储器51还可以用于暂时地存储已经输出或者将要输出的各类数据。The memory 51 includes at least one type of computer-readable storage medium. The computer-readable storage medium may be non-volatile or volatile. The computer-readable storage medium includes flash memory, hard disk, and multimedia card. , Card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), Programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc. In some embodiments, the memory 51 may be an internal storage unit of the computer device 5, such as a hard disk or a memory of the computer device 5. In other embodiments, the memory 51 may also be an external storage device of the computer device 5, for example, a plug-in hard disk equipped on the computer device 5, a smart memory card (Smart Media Card, SMC), and a secure digital (Secure Digital, SD) card, Flash Card, etc. Of course, the memory 51 may also include both the internal storage unit of the computer device 5 and its external storage device. In this embodiment, the memory 51 is generally used to store an operating system and various application software installed in the computer device 5, such as a training method of a question generation model, or computer readable instructions of a question generation method, and the like. In addition, the memory 51 can also be used to temporarily store various types of data that have been output or will be output.
所述处理器52在一些实施例中可以是中央处理器(Central Processing Unit，CPU)、控制器、微控制器、微处理器、或其他数据处理芯片。该处理器52通常用于控制所述计算机设备5的总体操作。本实施例中，所述处理器52用于运行所述存储器51中存储的计算机可读指令或者处理数据，例如运行问题生成模型的训练方法、或问题生成方法的计算机可读指令。In some embodiments, the processor 52 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 52 is generally used to control the overall operation of the computer device 5. In this embodiment, the processor 52 is configured to run the computer-readable instructions stored in the memory 51 or to process data, for example, to run the computer-readable instructions of the training method of the question generation model or of the question generation method.
所述网络接口53可包括无线网络接口或有线网络接口,该网络接口53通常用于在所述计算机设备5与其他电子设备之间建立通信连接。The network interface 53 may include a wireless network interface or a wired network interface, and the network interface 53 is generally used to establish a communication connection between the computer device 5 and other electronic devices.
本实施例中提供的计算机设备可以执行上述问题生成模型的训练方法的步骤。此处问题生成模型的训练方法的步骤可以是上述各个实施例的问题生成模型的训练方法中的步骤。The computer device provided in this embodiment can execute the steps of the training method of the problem generation model described above. Here, the steps of the training method of the question generation model may be the steps in the training method of the question generation model of each of the foregoing embodiments.
本实施例中，将预训练语言模型中的网络调整为序列到序列模型，使得预训练语言模型面向文本生成式任务且具备良好的文本生成能力，再对预训练语言模型进行微调得到问题生成模型，保证了生成问题的质量。本实施例中提供的计算机设备可以执行上述问题生成方法的步骤。此处问题生成方法的步骤可以是上述各个实施例的问题生成方法中的步骤。In this embodiment, the network in the pre-trained language model is set to a sequence-to-sequence model, so that the pre-trained language model is oriented toward text generation tasks and has good text generation ability; the pre-trained language model is then fine-tuned to obtain the question generation model, which ensures the quality of the generated questions. The computer device provided in this embodiment can also execute the steps of the above question generation method. Here, the steps of the question generation method may be the steps in the question generation methods of the foregoing embodiments.
本实施例中,从用于问题文本生成的源文本中筛选若干组源实体,可以通过问题生成模型,依据不同的源实体生成不同的问题文本,提高了生成问题文本的灵活性。In this embodiment, several groups of source entities are filtered from the source text used for question text generation, and different question texts can be generated according to different source entities through the question generation model, which improves the flexibility of generating question text.
本申请还提供了另一种实施方式，即提供一种计算机可读存储介质，所述计算机可读存储介质存储有训练问题生成模型的计算机可读指令，所述训练问题生成模型的计算机可读指令可被至少一个处理器执行，以使所述至少一个处理器执行如上述的问题生成模型的训练方法的步骤。This application also provides another implementation, namely a computer-readable storage medium storing computer-readable instructions for training a question generation model; the computer-readable instructions for training the question generation model can be executed by at least one processor, so that the at least one processor executes the steps of the training method of the question generation model described above.
本实施例中，将预训练语言模型中的网络调整为序列到序列模型，使得预训练语言模型面向文本生成式任务且具备良好的文本生成能力，再对预训练语言模型进行微调得到问题生成模型，保证了生成问题的质量。本申请还提供了另一种实施方式，即提供一种计算机可读存储介质，所述计算机可读存储介质存储有用于问题生成的计算机可读指令，所述用于问题生成的计算机可读指令可被至少一个处理器执行，以使所述至少一个处理器执行如上述的问题生成方法的步骤。In this embodiment, the network in the pre-trained language model is set to a sequence-to-sequence model, so that the pre-trained language model is oriented toward text generation tasks and has good text generation ability; the pre-trained language model is then fine-tuned to obtain the question generation model, which ensures the quality of the generated questions. This application also provides another implementation, namely a computer-readable storage medium storing computer-readable instructions for question generation; the computer-readable instructions for question generation can be executed by at least one processor, so that the at least one processor executes the steps of the question generation method described above.
本实施例中,从用于问题文本生成的源文本中筛选若干组源实体,可以通过问题生成模型,依据不同的源实体生成不同的问题文本,提高了生成问题文本的灵活性。In this embodiment, several groups of source entities are filtered from the source text used for question text generation, and different question texts can be generated according to different source entities through the question generation model, which improves the flexibility of generating question text.
通过以上的实施方式的描述，本领域的技术人员可以清楚地了解到上述实施例方法可借助软件加必需的通用硬件平台的方式来实现，当然也可以通过硬件，但很多情况下前者是更佳的实施方式。基于这样的理解，本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来，该计算机软件产品存储在一个存储介质(如ROM/RAM、磁碟、光盘)中，包括若干指令用以使得一台终端设备(可以是手机，计算机，服务器，空调器，或者网络设备等)执行本申请各个实施例所述的方法。From the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform; of course, they can also be implemented by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to execute the methods described in the various embodiments of this application.
本申请所指区块链是分布式数据存储、点对点传输、共识机制、加密算法等计算机技术的新型应用模式。区块链(Blockchain)，本质上是一个去中心化的数据库，是一串使用密码学方法相关联产生的数据块，每一个数据块中包含了一批次网络交易的信息，用于验证其信息的有效性(防伪)和生成下一个区块。区块链可以包括区块链底层平台、平台产品服务层以及应用服务层等。The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database, a chain of data blocks generated in association with each other by cryptographic methods; each data block contains information on a batch of network transactions, which is used to verify the validity of the information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, an application service layer, and the like.
显然,以上所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例,附图中给出了本申请的较佳实施例,但并不限制本申请的专利范围。本申请可以以许多不同的形式来实现。凡是利用本申请说明书及附图内容所做的等效结构,直接或间接运用在其他相关的技术领域,均同理在本申请专利保护范围之内。Obviously, the above-described embodiments are only a part of the embodiments of the present application, rather than all of the embodiments. The drawings show preferred embodiments of the present application, but do not limit the patent scope of the present application. This application can be implemented in many different forms. All equivalent structures made by using the contents of the description and drawings of this application, directly or indirectly used in other related technical fields, are similarly within the scope of patent protection of this application.

Claims (20)

  1. 一种问题生成模型的训练方法,包括下述步骤:A training method of a problem generation model includes the following steps:
    对初始模型进行预训练得到预训练语言模型,并在预训练中通过调整掩膜矩阵将所述初始模型中的网络实现单向模型、双向模型和序列到序列模型;Pre-training the initial model to obtain a pre-training language model, and in the pre-training, by adjusting the mask matrix, the network in the initial model is realized as a one-way model, a two-way model, and a sequence-to-sequence model;
    通过网络爬虫从网络页面中获取问答信息,所述问答信息包括问题文本和答案文本;Obtaining question and answer information from a web page through a web crawler, where the question and answer information includes question text and answer text;
    从所述答案文本中,提取与所述问题文本相关的关键实体;Extract key entities related to the question text from the answer text;
    将所述预训练语言模型中的网络设置为序列到序列模型,以得到用于中文文本生成的预训练语言模型;Setting the network in the pre-training language model to a sequence-to-sequence model to obtain a pre-training language model for Chinese text generation;
    将所述关键实体和所述答案文本输入所述预训练语言模型,得到所述预训练语言模型输出的预测问题文本;Inputting the key entity and the answer text into the pre-training language model to obtain the predicted question text output by the pre-training language model;
    根据所述预测问题文本和所述问题文本,确定预测误差;Determine the prediction error according to the prediction question text and the question text;
    根据所述预测误差对所述预训练语言模型进行调整,直至所述预测误差满足训练停止条件,得到问题生成模型。The pre-training language model is adjusted according to the prediction error until the prediction error satisfies the training stop condition, and a problem generation model is obtained.
2. 根据权利要求1所述的问题生成模型的训练方法，其中，所述对初始模型进行预训练得到预训练语言模型，并在预训练中通过调整掩膜矩阵将所述初始模型中的网络实现单向模型、双向模型和序列到序列模型的步骤具体包括：The method for training a question generation model according to claim 1, wherein the step of pre-training the initial model to obtain the pre-trained language model and, during the pre-training, implementing a unidirectional model, a bidirectional model, and a sequence-to-sequence model for the network in the initial model by adjusting the mask matrix specifically comprises:
    获取用于预训练的初始模型以及多组预训练样本集;Obtain an initial model for pre-training and multiple sets of pre-training samples;
    随机生成各组预训练样本集所对应的掩膜标识;所述掩膜标识对应的掩膜矩阵实现单向模型、双向模型和序列到序列模型;Randomly generating the mask identifiers corresponding to each group of pre-training sample sets; the mask matrix corresponding to the mask identifiers realizes a one-way model, a two-way model, and a sequence-to-sequence model;
    将所述各组预训练样本集分别输入所述初始模型,并根据预训练样本集所对应的掩膜标识调整所述初始模型中网络的掩膜矩阵;Input each of the pre-training sample sets into the initial model, and adjust the mask matrix of the network in the initial model according to the mask identifier corresponding to the pre-training sample set;
    根据输入的预训练样本集对掩膜矩阵调整后的初始模型依次进行预训练,得到预训练语言模型。The initial model adjusted by the mask matrix is sequentially pre-trained according to the input pre-training sample set to obtain the pre-training language model.
  3. 根据权利要求2所述的问题生成模型的训练方法,其中,所述初始模型中模型参数为半精度,所述随机生成各组预训练样本集所对应的掩膜标识的步骤之前,还包括:The method for training a question generation model according to claim 2, wherein the model parameters in the initial model are half-precision, and before the step of randomly generating mask identifiers corresponding to each set of pre-training sample sets, the method further comprises:
    将所述初始模型中layernorm层和embedding层的模型参数设置为单精度。The model parameters of the layernorm layer and the embedding layer in the initial model are set to single precision.
  4. 根据权利要求1所述的问题生成模型的训练方法,其中,所述从所述答案文本中,提取与所述问题文本相关的关键实体的步骤具体包括:The method for training a question generation model according to claim 1, wherein the step of extracting key entities related to the question text from the answer text specifically comprises:
    从所述问答信息内的问题文本和答案文本中,分别提取文本实体;Extract text entities from the question text and answer text in the question and answer information;
    计算所述答案文本中的各文本实体与所述问题文本中的各文本实体的相似度;Calculating the similarity between each text entity in the answer text and each text entity in the question text;
    从所述答案文本的各文本实体中,提取相似度符合预设相似度阈值的文本实体作为关键实体。From each text entity of the answer text, extract the text entity whose similarity meets the preset similarity threshold as the key entity.
5. 根据权利要求1所述的问题生成模型的训练方法，其中，所述答案文本包括至少一个子答案文本，所述将所述关键实体和所述答案文本输入所述预训练语言模型，得到所述预训练语言模型输出的预测问题文本的步骤具体包括：The method for training a question generation model according to claim 1, wherein the answer text comprises at least one sub-answer text, and the step of inputting the key entities and the answer text into the pre-trained language model to obtain the predicted question text output by the pre-trained language model specifically comprises:
    将至少一个子答案文本以及与子答案文本对应的关键实体输入所述预训练语言模型,得到至少一个三维字向量矩阵;Input at least one sub-answer text and key entities corresponding to the sub-answer text into the pre-training language model to obtain at least one three-dimensional word vector matrix;
    将转化得到的三维字向量矩阵合并为二维字向量矩阵;Combine the converted three-dimensional word vector matrix into a two-dimensional word vector matrix;
    通过所述预训练语言模型对所述二维字向量矩阵进行处理,得到所述预训练语言模型输出的预测问题文本,其中,所述预测问题文本存储在区块链中。The two-dimensional word vector matrix is processed by the pre-training language model to obtain the prediction question text output by the pre-training language model, wherein the prediction question text is stored in a blockchain.
  6. 一种问题生成方法,包括下述步骤:A problem generation method includes the following steps:
    获取用于问题生成的源文本;Obtain the source text used for question generation;
    从所述源文本中筛选若干组源实体;Filter several groups of source entities from the source text;
    分别将所述若干组源实体输入问题生成模型,其中,所述问题生成模型是采用权利要求1-5任一项所述问题生成模型的训练方法获取的模型;Respectively inputting the several groups of source entities into a question generation model, wherein the question generation model is a model obtained by using the training method of the question generation model of any one of claims 1 to 5;
    获取所述问题生成模型基于所述若干组源实体生成的若干问题文本。Acquiring the question generation model based on several question texts generated by the several groups of source entities.
7. 根据权利要求6所述的问题生成方法，其中，所述从所述源文本中筛选若干组源实体包括：The question generation method according to claim 6, wherein the filtering of several groups of source entities from the source text comprises:
    识别所述源文本中的文本实体;Identifying text entities in the source text;
    从识别到的文本实体中随机抽取若干组文本实体,得到若干组源实体;Randomly extract several groups of text entities from the recognized text entities to obtain several groups of source entities;
    或者,or,
    根据预设的语义知识库对所述源文本中的文本实体进行语义标注,得到语义标注结果;Performing semantic annotation on the text entities in the source text according to a preset semantic knowledge base to obtain a semantic annotation result;
    根据所述语义标注结果,筛选符合预设语义范围的若干文本实体,得到若干组源实体。According to the semantic annotation result, several text entities that meet the preset semantic range are screened to obtain several groups of source entities.
  8. 一种问题生成模型的训练装置,包括:A training device for a problem generation model, including:
    模型训练模块,用于对初始模型进行预训练得到预训练语言模型,并在预训练中通过调整掩膜矩阵将所述初始模型中的网络实现单向模型、双向模型和序列到序列模型;The model training module is used to pre-train the initial model to obtain the pre-trained language model, and adjust the mask matrix in the pre-training to realize the one-way model, the two-way model and the sequence-to-sequence model of the network in the initial model;
    信息获取模块,用于通过网络爬虫从网络页面中获取问答信息,所述问答信息包括问题文本和答案文本;An information acquisition module for acquiring question and answer information from a web page through a web crawler, where the question and answer information includes question text and answer text;
    实体提取模块,用于从所述答案文本中,提取与所述问题文本相关的关键实体;An entity extraction module for extracting key entities related to the question text from the answer text;
    模型设置模块,用于将所述预训练语言模型中的网络设置为序列到序列模型,以得到用于中文文本生成的预训练语言模型;A model setting module, configured to set the network in the pre-training language model to a sequence-to-sequence model to obtain a pre-training language model for Chinese text generation;
    文本输入模块,用于将所述关键实体和所述答案文本输入所述预训练语言模型,得到所述预训练语言模型输出的预测问题文本;A text input module, configured to input the key entity and the answer text into the pre-training language model to obtain the predicted question text output by the pre-training language model;
    误差确定模块,用于根据所述预测问题文本和所述问题文本,确定预测误差;An error determination module, configured to determine a prediction error according to the prediction question text and the question text;
    模型调整模块,用于根据所述预测误差对所述预训练语言模型进行调整,直至所述预测误差满足训练停止条件,得到问题生成模型。The model adjustment module is configured to adjust the pre-training language model according to the prediction error until the prediction error satisfies the training stop condition to obtain a problem generation model.
  9. 一种计算机设备,包括存储器和处理器,所述存储器中存储有计算机可读指令,所述处理器执行所述计算机可读指令时实现如下步骤:A computer device includes a memory and a processor. The memory stores computer readable instructions. When the processor executes the computer readable instructions, the following steps are implemented:
    对初始模型进行预训练得到预训练语言模型,并在预训练中通过调整掩膜矩阵将所述初始模型中的网络实现单向模型、双向模型和序列到序列模型;Pre-training the initial model to obtain a pre-training language model, and in the pre-training, by adjusting the mask matrix, the network in the initial model is realized as a one-way model, a two-way model, and a sequence-to-sequence model;
    通过网络爬虫从网络页面中获取问答信息,所述问答信息包括问题文本和答案文本;Obtaining question and answer information from a web page through a web crawler, where the question and answer information includes question text and answer text;
    从所述答案文本中,提取与所述问题文本相关的关键实体;Extract key entities related to the question text from the answer text;
    将所述预训练语言模型中的网络设置为序列到序列模型,以得到用于中文文本生成的预训练语言模型;Setting the network in the pre-training language model to a sequence-to-sequence model to obtain a pre-training language model for Chinese text generation;
    将所述关键实体和所述答案文本输入所述预训练语言模型,得到所述预训练语言模型输出的预测问题文本;Inputting the key entity and the answer text into the pre-training language model to obtain the predicted question text output by the pre-training language model;
    根据所述预测问题文本和所述问题文本,确定预测误差;Determine the prediction error according to the prediction question text and the question text;
    根据所述预测误差对所述预训练语言模型进行调整,直至所述预测误差满足训练停止条件,得到问题生成模型。The pre-training language model is adjusted according to the prediction error until the prediction error satisfies the training stop condition, and a problem generation model is obtained.
10. 根据权利要求9所述的计算机设备，其中，所述对初始模型进行预训练得到预训练语言模型，并在预训练中通过调整掩膜矩阵将所述初始模型中的网络实现单向模型、双向模型和序列到序列模型的步骤具体包括：The computer device according to claim 9, wherein the step of pre-training the initial model to obtain the pre-trained language model and, during the pre-training, implementing a unidirectional model, a bidirectional model, and a sequence-to-sequence model for the network in the initial model by adjusting the mask matrix specifically comprises:
    获取用于预训练的初始模型以及多组预训练样本集;Obtain an initial model for pre-training and multiple sets of pre-training samples;
    随机生成各组预训练样本集所对应的掩膜标识;所述掩膜标识对应的掩膜矩阵实现单向模型、双向模型和序列到序列模型;Randomly generating the mask identifiers corresponding to each group of pre-training sample sets; the mask matrix corresponding to the mask identifiers realizes a one-way model, a two-way model, and a sequence-to-sequence model;
    将所述各组预训练样本集分别输入所述初始模型,并根据预训练样本集所对应的掩膜标识调整所述初始模型中网络的掩膜矩阵;Input each of the pre-training sample sets into the initial model, and adjust the mask matrix of the network in the initial model according to the mask identifier corresponding to the pre-training sample set;
    根据输入的预训练样本集对掩膜矩阵调整后的初始模型依次进行预训练,得到预训练语言模型。The initial model adjusted by the mask matrix is sequentially pre-trained according to the input pre-training sample set to obtain the pre-training language model.
  11. 根据权利要求10所述的计算机设备,其中,所述初始模型中模型参数为半精度,所述随机生成各组预训练样本集所对应的掩膜标识的步骤之前,还包括:10. The computer device according to claim 10, wherein the model parameters in the initial model are half-precision, and before the step of randomly generating mask identifiers corresponding to each group of pre-training sample sets, the method further comprises:
    将所述初始模型中layernorm层和embedding层的模型参数设置为单精度。The model parameters of the layernorm layer and the embedding layer in the initial model are set to single precision.
  12. 根据权利要求9所述的计算机设备,其中,所述从所述答案文本中,提取与所述问题文本相关的关键实体的步骤具体包括:The computer device according to claim 9, wherein the step of extracting key entities related to the question text from the answer text specifically comprises:
    从所述问答信息内的问题文本和答案文本中,分别提取文本实体;Extract text entities from the question text and answer text in the question and answer information;
    计算所述答案文本中的各文本实体与所述问题文本中的各文本实体的相似度;Calculating the similarity between each text entity in the answer text and each text entity in the question text;
    从所述答案文本的各文本实体中,提取相似度符合预设相似度阈值的文本实体作为关键实体。From each text entity of the answer text, extract the text entity whose similarity meets the preset similarity threshold as the key entity.
13. 根据权利要求9所述的计算机设备，其中，所述答案文本包括至少一个子答案文本，所述将所述关键实体和所述答案文本输入所述预训练语言模型，得到所述预训练语言模型输出的预测问题文本的步骤具体包括：The computer device according to claim 9, wherein the answer text comprises at least one sub-answer text, and the step of inputting the key entities and the answer text into the pre-trained language model to obtain the predicted question text output by the pre-trained language model specifically comprises:
    将至少一个子答案文本以及与子答案文本对应的关键实体输入所述预训练语言模型,得到至少一个三维字向量矩阵;Input at least one sub-answer text and key entities corresponding to the sub-answer text into the pre-training language model to obtain at least one three-dimensional word vector matrix;
    将转化得到的三维字向量矩阵合并为二维字向量矩阵;Combine the converted three-dimensional word vector matrix into a two-dimensional word vector matrix;
    通过所述预训练语言模型对所述二维字向量矩阵进行处理,得到所述预训练语言模型输出的预测问题文本,其中,所述预测问题文本存储在区块链中。The two-dimensional word vector matrix is processed by the pre-training language model to obtain the prediction question text output by the pre-training language model, wherein the prediction question text is stored in a blockchain.
  14. 一种计算机设备,包括存储器和处理器,所述存储器中存储有计算机可读指令,所述处理器执行所述计算机可读指令时实现如下步骤:A computer device includes a memory and a processor. The memory stores computer readable instructions. When the processor executes the computer readable instructions, the following steps are implemented:
    获取用于问题生成的源文本;Obtain the source text used for question generation;
    从所述源文本中筛选若干组源实体;Filter several groups of source entities from the source text;
    分别将所述若干组源实体输入问题生成模型,其中,所述问题生成模型是采用权利要求1-5任一项所述问题生成模型的训练方法获取的模型;Respectively inputting the several groups of source entities into a question generation model, wherein the question generation model is a model obtained by using the training method of the question generation model of any one of claims 1 to 5;
    获取所述问题生成模型基于所述若干组源实体生成的若干问题文本。Acquiring the question generation model based on several question texts generated by the several groups of source entities.
  15. 一种计算机可读存储介质,所述计算机可读存储介质上存储有计算机可读指令,所述计算机可读指令被处理器执行时实现如下步骤:A computer-readable storage medium having computer-readable instructions stored thereon, and when the computer-readable instructions are executed by a processor, the following steps are implemented:
    对初始模型进行预训练得到预训练语言模型,并在预训练中通过调整掩膜矩阵将所述初始模型中的网络实现单向模型、双向模型和序列到序列模型;Pre-training the initial model to obtain a pre-training language model, and in the pre-training, by adjusting the mask matrix, the network in the initial model is realized as a one-way model, a two-way model, and a sequence-to-sequence model;
    通过网络爬虫从网络页面中获取问答信息,所述问答信息包括问题文本和答案文本;Obtaining question and answer information from a web page through a web crawler, where the question and answer information includes question text and answer text;
    从所述答案文本中,提取与所述问题文本相关的关键实体;Extract key entities related to the question text from the answer text;
    将所述预训练语言模型中的网络设置为序列到序列模型,以得到用于中文文本生成的预训练语言模型;Setting the network in the pre-training language model to a sequence-to-sequence model to obtain a pre-training language model for Chinese text generation;
    将所述关键实体和所述答案文本输入所述预训练语言模型,得到所述预训练语言模型输出的预测问题文本;Inputting the key entity and the answer text into the pre-training language model to obtain the predicted question text output by the pre-training language model;
    根据所述预测问题文本和所述问题文本,确定预测误差;Determine the prediction error according to the prediction question text and the question text;
    根据所述预测误差对所述预训练语言模型进行调整,直至所述预测误差满足训练停止条件,得到问题生成模型。The pre-training language model is adjusted according to the prediction error until the prediction error satisfies the training stop condition, and a problem generation model is obtained.
16. 根据权利要求15所述的一种计算机可读存储介质，其中，所述对初始模型进行预训练得到预训练语言模型，并在预训练中通过调整掩膜矩阵将所述初始模型中的网络实现单向模型、双向模型和序列到序列模型的步骤具体包括：The computer-readable storage medium according to claim 15, wherein the step of pre-training the initial model to obtain the pre-trained language model and, during the pre-training, implementing a unidirectional model, a bidirectional model, and a sequence-to-sequence model for the network in the initial model by adjusting the mask matrix specifically comprises:
    获取用于预训练的初始模型以及多组预训练样本集;Obtain an initial model for pre-training and multiple sets of pre-training samples;
    随机生成各组预训练样本集所对应的掩膜标识;所述掩膜标识对应的掩膜矩阵实现单向模型、双向模型和序列到序列模型;Randomly generating the mask identifiers corresponding to each group of pre-training sample sets; the mask matrix corresponding to the mask identifiers realizes a one-way model, a two-way model, and a sequence-to-sequence model;
    将所述各组预训练样本集分别输入所述初始模型,并根据预训练样本集所对应的掩膜标识调整所述初始模型中网络的掩膜矩阵;Input each of the pre-training sample sets into the initial model, and adjust the mask matrix of the network in the initial model according to the mask identifier corresponding to the pre-training sample set;
    根据输入的预训练样本集对掩膜矩阵调整后的初始模型依次进行预训练,得到预训练语言模型。The initial model adjusted by the mask matrix is sequentially pre-trained according to the input pre-training sample set to obtain the pre-training language model.
  17. 根据权利要求16所述的一种计算机可读存储介质,其中,所述初始模型中模型参数为半精度,所述随机生成各组预训练样本集所对应的掩膜标识的步骤之前,还包括:The computer-readable storage medium according to claim 16, wherein the model parameters in the initial model are half-precision, and before the step of randomly generating the mask identifiers corresponding to each set of pre-training sample sets, the method further comprises :
    将所述初始模型中layernorm层和embedding层的模型参数设置为单精度。The model parameters of the layernorm layer and the embedding layer in the initial model are set to single precision.
18. 根据权利要求15所述的一种计算机可读存储介质，其中，所述从所述答案文本中，提取与所述问题文本相关的关键实体的步骤具体包括：The computer-readable storage medium according to claim 15, wherein the step of extracting key entities related to the question text from the answer text specifically comprises:
    从所述问答信息内的问题文本和答案文本中,分别提取文本实体;Extract text entities from the question text and answer text in the question and answer information;
    计算所述答案文本中的各文本实体与所述问题文本中的各文本实体的相似度;Calculating the similarity between each text entity in the answer text and each text entity in the question text;
    从所述答案文本的各文本实体中,提取相似度符合预设相似度阈值的文本实体作为关键实体。From each text entity of the answer text, extract the text entity whose similarity meets the preset similarity threshold as the key entity.
19. 根据权利要求15所述的一种计算机可读存储介质，其中，所述答案文本包括至少一个子答案文本，所述将所述关键实体和所述答案文本输入所述预训练语言模型，得到所述预训练语言模型输出的预测问题文本的步骤具体包括：The computer-readable storage medium according to claim 15, wherein the answer text comprises at least one sub-answer text, and the step of inputting the key entities and the answer text into the pre-trained language model to obtain the predicted question text output by the pre-trained language model specifically comprises:
    将至少一个子答案文本以及与子答案文本对应的关键实体输入所述预训练语言模型,得到至少一个三维字向量矩阵;Input at least one sub-answer text and key entities corresponding to the sub-answer text into the pre-training language model to obtain at least one three-dimensional word vector matrix;
    将转化得到的三维字向量矩阵合并为二维字向量矩阵;Combine the converted three-dimensional word vector matrix into a two-dimensional word vector matrix;
    通过所述预训练语言模型对所述二维字向量矩阵进行处理,得到所述预训练语言模型输出的预测问题文本,其中,所述预测问题文本存储在区块链中。The two-dimensional word vector matrix is processed by the pre-training language model to obtain the prediction question text output by the pre-training language model, wherein the prediction question text is stored in a blockchain.
  20. 一种计算机可读存储介质,所述计算机可读存储介质上存储有计算机可读指令,所述计算机可读指令被处理器执行时实现如下步骤:A computer-readable storage medium having computer-readable instructions stored thereon, and when the computer-readable instructions are executed by a processor, the following steps are implemented:
    获取用于问题生成的源文本;Obtain the source text used for question generation;
    从所述源文本中筛选若干组源实体;Filter several groups of source entities from the source text;
    分别将所述若干组源实体输入问题生成模型,其中,所述问题生成模型是采用权利要求1-5任一项所述问题生成模型的训练方法获取的模型;Respectively inputting the several groups of source entities into a question generation model, wherein the question generation model is a model obtained by using the training method of the question generation model of any one of claims 1 to 5;
    获取所述问题生成模型基于所述若干组源实体生成的若干问题文本。Acquiring the question generation model based on several question texts generated by the several groups of source entities.
PCT/CN2020/105777 2020-04-29 2020-07-30 Method for training question generation model, question generation method, and related device WO2021217935A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010356637.X 2020-04-29
CN202010356637.XA CN111639163A (en) 2020-04-29 2020-04-29 Problem generation model training method, problem generation method and related equipment

Publications (1)

Publication Number Publication Date
WO2021217935A1 true WO2021217935A1 (en) 2021-11-04

Family

ID=72330978

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/105777 WO2021217935A1 (en) 2020-04-29 2020-07-30 Method for training question generation model, question generation method, and related device

Country Status (2)

Country Link
CN (1) CN111639163A (en)
WO (1) WO2021217935A1 (en)

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113887245A (en) * 2021-12-02 2022-01-04 腾讯科技(深圳)有限公司 Model training method and related device
CN114330512A (en) * 2021-12-13 2022-04-12 腾讯科技(深圳)有限公司 Data processing method and device, electronic equipment and computer readable storage medium
CN114970563A (en) * 2022-07-28 2022-08-30 山东大学 Chinese question generation method and system fusing content and form diversity
CN115277626A (en) * 2022-07-29 2022-11-01 平安科技(深圳)有限公司 Address information conversion method, electronic device, and computer-readable storage medium
CN115438176A (en) * 2022-11-08 2022-12-06 阿里巴巴达摩院(杭州)科技有限公司 Method and equipment for generating downstream task model and executing task
CN115600602A (en) * 2022-12-13 2023-01-13 中南大学(Cn) Method, system and terminal device for extracting key elements of long text
CN115713065A (en) * 2022-11-08 2023-02-24 贝壳找房(北京)科技有限公司 Method for generating question, electronic equipment and computer readable storage medium
CN116383365A (en) * 2023-06-01 2023-07-04 广州里工实业有限公司 Learning material generation method and system based on intelligent manufacturing and electronic equipment
CN116402164A (en) * 2023-06-06 2023-07-07 之江实验室 Robot task generation method, device and medium based on pre-training language model
CN116757254A (en) * 2023-08-16 2023-09-15 阿里巴巴(中国)有限公司 Task processing method, electronic device and storage medium
CN116775847A (en) * 2023-08-18 2023-09-19 中国电子科技集团公司第十五研究所 Question answering method and system based on knowledge graph and large language model
CN116910572A (en) * 2023-09-13 2023-10-20 浪潮(北京)电子信息产业有限公司 Training method and device for three-dimensional content generation model based on pre-training language model
CN116935230A (en) * 2023-09-13 2023-10-24 山东建筑大学 Crop pest identification method, device, equipment and medium
CN116932803A (en) * 2023-09-13 2023-10-24 浪潮(北京)电子信息产业有限公司 Data set generation method and training method based on multi-mode pre-training model
CN117011612A (en) * 2023-08-16 2023-11-07 海南省新超豪信息技术有限公司 AI identification method for traditional Chinese medicinal materials
CN117235240A (en) * 2023-11-14 2023-12-15 神州医疗科技股份有限公司 Multi-model result fusion question-answering method and system based on asynchronous consumption queue
CN117290492A (en) * 2023-11-27 2023-12-26 深圳市灵智数字科技有限公司 Knowledge base question-answering method and device, electronic equipment and storage medium
CN117555644A (en) * 2024-01-11 2024-02-13 之江实验室 Front-end page construction method and device based on natural language interaction
CN117609444A (en) * 2023-11-08 2024-02-27 天讯瑞达通信技术有限公司 Searching question-answering method based on large model
CN117710538A (en) * 2023-11-16 2024-03-15 北京百悟科技有限公司 Digital person display method, device, equipment and storage medium
CN117852654A (en) * 2024-02-05 2024-04-09 清华大学 Model training method and method for solving problems in specific field
CN117892139A (en) * 2024-03-14 2024-04-16 中国医学科学院医学信息研究所 Large language model training and using method based on interlayer comparison and related device
CN118350463A (en) * 2024-06-17 2024-07-16 恒生电子股份有限公司 Question-answer model training method, text processing method and rewarding model training method

Families Citing this family (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112100351A (en) * 2020-09-11 2020-12-18 陕西师范大学 Method and equipment for constructing intelligent question-answering system through question generation data set
CN114385809B (en) * 2020-10-22 2024-06-18 中移(成都)信息通信科技有限公司 Training method, device and equipment for entity text extraction model
CN112559702B (en) * 2020-11-10 2022-09-30 西安理工大学 Method for generating natural language problem in civil construction information field based on Transformer
CN112257393B (en) 2020-12-22 2021-04-13 北京百度网讯科技有限公司 Method, device, equipment and medium for realizing text generation
CN112347793B (en) * 2020-12-30 2021-05-14 北京智源人工智能研究院 Semantic analysis method and device based on rules and learning and electronic equipment
CN113420129B (en) * 2021-05-08 2022-11-18 天津大学 Method for controlling dialog generation based on large-scale general pre-training model
CN113743095B (en) * 2021-07-19 2024-09-20 西安理工大学 Chinese problem generation unified pre-training method based on word lattice and relative position embedding
CN113569025B (en) * 2021-07-23 2024-08-20 上海明略人工智能(集团)有限公司 Data processing method and device, electronic equipment and storage medium
CN113673702B (en) * 2021-07-27 2022-07-29 北京师范大学 Method and device for evaluating pre-training language model and storage medium
CN113569033A (en) * 2021-08-04 2021-10-29 工银科技有限公司 Government affairs question generation method and device
CN113590844A (en) * 2021-08-09 2021-11-02 北京智源人工智能研究院 Knowledge graph-based question-answer library generation method and device, electronic equipment and storage medium
CN114360537A (en) * 2021-12-27 2022-04-15 科大讯飞股份有限公司 Spoken question and answer scoring method, spoken question and answer training method, computer equipment and storage medium
CN114461749B (en) * 2022-02-15 2023-04-07 北京百度网讯科技有限公司 Data processing method and device for conversation content, electronic equipment and medium
CN115687031A (en) * 2022-11-15 2023-02-03 北京优特捷信息技术有限公司 Method, device, equipment and medium for generating alarm description text
CN118312598A (en) * 2023-06-30 2024-07-09 北京百度网讯科技有限公司 Text generation method, training method and device of text generation model
CN116860933B (en) * 2023-06-30 2024-07-12 北京百度网讯科技有限公司 Dialogue model training method, reply information generating method, device and medium
CN118228839A (en) * 2024-04-23 2024-06-21 北京面壁智能科技有限责任公司 Method and device for constructing complex instruction training data for model training, electronic equipment and storage medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019079922A1 (en) * 2017-10-23 2019-05-02 腾讯科技(深圳)有限公司 Session information processing method and device, and storage medium
CN108846130A (en) * 2018-06-29 2018-11-20 北京百度网讯科技有限公司 Question text generation method, device, equipment and medium
CN109657041A (en) * 2018-12-04 2019-04-19 南京理工大学 Automatic question generation method based on deep learning
CN110162613A (en) * 2019-05-27 2019-08-23 腾讯科技(深圳)有限公司 Question generation method, device, equipment and storage medium
CN110188182A (en) * 2019-05-31 2019-08-30 中国科学院深圳先进技术研究院 Model training method, dialogue generation method, device, equipment and medium
CN110188331A (en) * 2019-06-03 2019-08-30 腾讯科技(深圳)有限公司 Model training method, conversational system evaluation method, device, equipment and storage medium

Cited By (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113887245A (en) * 2021-12-02 2022-01-04 腾讯科技(深圳)有限公司 Model training method and related device
CN114330512A (en) * 2021-12-13 2022-04-12 腾讯科技(深圳)有限公司 Data processing method and device, electronic equipment and computer readable storage medium
CN114330512B (en) * 2021-12-13 2024-04-26 腾讯科技(深圳)有限公司 Data processing method, device, electronic equipment and computer readable storage medium
CN114970563A (en) * 2022-07-28 2022-08-30 山东大学 Chinese question generation method and system fusing content and form diversity
CN115277626A (en) * 2022-07-29 2022-11-01 平安科技(深圳)有限公司 Address information conversion method, electronic device, and computer-readable storage medium
CN115713065A (en) * 2022-11-08 2023-02-24 贝壳找房(北京)科技有限公司 Method for generating question, electronic equipment and computer readable storage medium
CN115438176B (en) * 2022-11-08 2023-04-07 阿里巴巴达摩院(杭州)科技有限公司 Method and equipment for generating downstream task model and executing task
CN115713065B (en) * 2022-11-08 2023-09-15 贝壳找房(北京)科技有限公司 Method for generating question, electronic equipment and computer readable storage medium
WO2024099144A1 (en) * 2022-11-08 2024-05-16 阿里巴巴达摩院(杭州)科技有限公司 Downstream task model generation method, task execution method, and device
CN115438176A (en) * 2022-11-08 2022-12-06 阿里巴巴达摩院(杭州)科技有限公司 Method and equipment for generating downstream task model and executing task
CN115600602A (en) * 2022-12-13 2023-01-13 中南大学(Cn) Method, system and terminal device for extracting key elements of long text
CN116383365A (en) * 2023-06-01 2023-07-04 广州里工实业有限公司 Learning material generation method and system based on intelligent manufacturing and electronic equipment
CN116383365B (en) * 2023-06-01 2023-09-08 广州里工实业有限公司 Learning material generation method and system based on intelligent manufacturing and electronic equipment
CN116402164A (en) * 2023-06-06 2023-07-07 之江实验室 Robot task generation method, device and medium based on pre-training language model
CN116402164B (en) * 2023-06-06 2023-09-05 之江实验室 Robot task generation method, device and medium based on pre-training language model
CN117011612A (en) * 2023-08-16 2023-11-07 海南省新超豪信息技术有限公司 AI identification method for traditional Chinese medicinal materials
CN116757254A (en) * 2023-08-16 2023-09-15 阿里巴巴(中国)有限公司 Task processing method, electronic device and storage medium
CN116757254B (en) * 2023-08-16 2023-11-14 阿里巴巴(中国)有限公司 Task processing method, electronic device and storage medium
CN116775847B (en) * 2023-08-18 2023-11-28 中国电子科技集团公司第十五研究所 Question answering method and system based on knowledge graph and large language model
CN116775847A (en) * 2023-08-18 2023-09-19 中国电子科技集团公司第十五研究所 Question answering method and system based on knowledge graph and large language model
CN116910572B (en) * 2023-09-13 2024-02-09 浪潮(北京)电子信息产业有限公司 Training method and device for three-dimensional content generation model based on pre-training language model
CN116935230B (en) * 2023-09-13 2023-12-15 山东建筑大学 Crop pest identification method, device, equipment and medium
CN116932803B (en) * 2023-09-13 2024-01-26 浪潮(北京)电子信息产业有限公司 Data set generation method and training method based on multi-mode pre-training model
CN116910572A (en) * 2023-09-13 2023-10-20 浪潮(北京)电子信息产业有限公司 Training method and device for three-dimensional content generation model based on pre-training language model
CN116935230A (en) * 2023-09-13 2023-10-24 山东建筑大学 Crop pest identification method, device, equipment and medium
CN116932803A (en) * 2023-09-13 2023-10-24 浪潮(北京)电子信息产业有限公司 Data set generation method and training method based on multi-mode pre-training model
CN117609444A (en) * 2023-11-08 2024-02-27 天讯瑞达通信技术有限公司 Searching question-answering method based on large model
CN117235240A (en) * 2023-11-14 2023-12-15 神州医疗科技股份有限公司 Multi-model result fusion question-answering method and system based on asynchronous consumption queue
CN117235240B (en) * 2023-11-14 2024-02-20 神州医疗科技股份有限公司 Multi-model result fusion question-answering method and system based on asynchronous consumption queue
CN117710538A (en) * 2023-11-16 2024-03-15 北京百悟科技有限公司 Digital person display method, device, equipment and storage medium
CN117290492A (en) * 2023-11-27 2023-12-26 深圳市灵智数字科技有限公司 Knowledge base question-answering method and device, electronic equipment and storage medium
CN117555644A (en) * 2024-01-11 2024-02-13 之江实验室 Front-end page construction method and device based on natural language interaction
CN117555644B (en) * 2024-01-11 2024-04-30 之江实验室 Front-end page construction method and device based on natural language interaction
CN117852654A (en) * 2024-02-05 2024-04-09 清华大学 Model training method and method for solving problems in specific field
CN117892139A (en) * 2024-03-14 2024-04-16 中国医学科学院医学信息研究所 Large language model training and using method based on interlayer comparison and related device
CN117892139B (en) * 2024-03-14 2024-05-14 中国医学科学院医学信息研究所 Large language model training and using method based on interlayer comparison and related device
CN118350463A (en) * 2024-06-17 2024-07-16 恒生电子股份有限公司 Question-answer model training method, text processing method and reward model training method

Also Published As

Publication number Publication date
CN111639163A (en) 2020-09-08

Similar Documents

Publication Publication Date Title
WO2021217935A1 (en) Method for training question generation model, question generation method, and related device
CN112101041B (en) Entity relationship extraction method, device, equipment and medium based on semantic similarity
CN112131366A (en) Method, device and storage medium for training text classification model and text classification
CN111310436B (en) Text processing method and device based on artificial intelligence and electronic equipment
CN115587175B (en) Man-machine conversation and pre-training language model training method and system and electronic equipment
CN112395385B (en) Text generation method and device based on artificial intelligence, computer equipment and medium
CN112287069B (en) Information retrieval method and device based on voice semantics and computer equipment
CN110347790B (en) Text duplicate checking method, device and equipment based on attention mechanism and storage medium
CN112215008A (en) Entity recognition method and device based on semantic understanding, computer equipment and medium
CN111695356A (en) Synonym corpus generation method, synonym corpus generation device, computer system and readable storage medium
CN109117474A (en) Statement similarity calculation method, device and storage medium
CN111767394A (en) Abstract extraction method and device based on artificial intelligence expert system
CN111552798B (en) Name information processing method and device based on name prediction model and electronic equipment
JP2023002690A (en) Semantic recognition method, apparatus, electronic device, and storage medium
CN115438149A (en) End-to-end model training method and device, computer equipment and storage medium
CN111382563A (en) Text relevance determining method and device
WO2022073341A1 (en) Disease entity matching method and apparatus based on voice semantics, and computer device
WO2024125155A1 (en) Entity linking method and apparatus, computer device and storage medium
CN116913278A (en) Voice processing method, device, equipment and storage medium
CN115169370B (en) Corpus data enhancement method and device, computer equipment and medium
CN115310429B (en) Data compression and high-performance calculation method in multi-round listening dialogue model
CN116881446A (en) Semantic classification method, device, equipment and storage medium thereof
KR102541806B1 (en) Method, system, and computer readable record medium for ranking reformulated query
CN113591493A (en) Translation model training method and translation model device
CN113434789A (en) Search sorting method based on multi-dimensional text features and related equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20933498

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 160223)

122 Ep: pct application non-entry in european phase

Ref document number: 20933498

Country of ref document: EP

Kind code of ref document: A1