CN115510814B - Chapter-level complex problem generation method based on dual planning - Google Patents
- Publication number
- CN115510814B (application CN202211394785.6A)
- Authority
- CN
- China
- Prior art keywords
- semantic
- fact
- sentence
- vector
- graph
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06F40/126 — Character encoding (G—Physics; G06—Computing; Calculating or Counting; G06F—Electric digital data processing; G06F40/00—Handling natural language data; G06F40/10—Text processing; G06F40/12—Use of codes for handling textual entities)
- G06F40/30 — Semantic analysis (G06F40/00—Handling natural language data)
- Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management (Y02D—Climate change mitigation technologies in information and communication technologies)
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a chapter-level complex question generation method based on dual planning, which generates, from a given article and answer, a natural language question whose answer is the given answer. The method first encodes the given article and answer with the pre-trained language model BERT to obtain an answer-aware semantic vector. It then constructs a semantic structure graph for each sentence in the given article, encodes the graphs with a multi-head attention mechanism, and captures the relations among them to guide the generation of complex questions. Finally, a Transformer is used as the decoder to generate the complex question: at each decoding time step, dual planning — fact-level planning and semantic-graph-level planning — selects the semantic graph and the fact triple within it that deserve the most attention, and this information is fused in to increase the complexity of the generated question and assist generation of the current word.
Description
Technical Field
The invention relates to the field of computer technology, and in particular to a chapter-level complex question generation method based on dual planning.
Background
In recent years, with the rapid development of artificial intelligence, the Question Generation (QG) task has become a research focus. Question generation refers to automatically generating content-relevant, well-formed natural language questions from a range of data sources (e.g., text, pictures, knowledge bases). The question generation task of the present invention takes factual text and an answer as input. Question generation has broad application prospects: it can produce training data for question answering tasks; proactively pose questions in a dialogue system to improve conversational fluency; and support automatic tutoring systems that generate targeted questions from course materials to guide student learning.
Current deep-learning-based QG methods mainly study the generation of simple questions; little work addresses complex questions. A simple question contains only one entity-relation triple, whereas a complex question involves multiple entity-relation triples and its answer requires multi-hop reasoning. Generating complex questions has greater practical significance than generating simple ones. In education, for example, students differ in their ability to absorb knowledge, and a simple question cannot probe a strong student's true level; complex questions are needed to obtain genuine feedback. In addition, existing Question Answering (QA) systems have reached a performance bottleneck on simple questions, and complex questions are more useful for improving them. Research on complex question generation therefore has practical value and application prospects. However, most existing complex question generation methods are based on knowledge graphs and cannot be applied directly to unstructured text, which lacks an explicit logical structure. Text-based complex question generation typically takes multiple texts as input and does not consider generating complex questions from a single text. Moreover, when modeling useful information, these methods fuse in the full sentence of each node without further screening the facts within the sentence — even though a single sentence often contains several facts.
As a result, chapter-level question generation methods lack overall planning and cannot select specific facts, which easily mismatches entities and relations and harms the factual correctness of the generated questions. Sentences also contain redundant information that may introduce noise.
The invention therefore provides a chapter-level question generation model based on dual planning: a semantic structure graph is constructed for each sentence in the text, and dual planning (fact-level planning and semantic-graph-level planning) precisely locates the information that requires attention at each decoding time step. Specifically, during decoding the model first selects the semantic structure graph to attend to, then determines the fact triple within it to attend to, and fuses this fact-triple information in to increase the complexity of the generated question.
Disclosure of Invention
The technical problem to be solved by the invention is that most existing complex question generation methods construct a single semantic graph and ignore the rich factual information contained in individual sentences; the lack of overall planning means specific facts cannot be selected, entities and relations are easily mismatched, and the factual correctness of the generated questions suffers.
The technical solution adopted by the invention is a chapter-level complex question generation method based on dual planning. The method first encodes the given article and answer with BERT to obtain an answer-aware semantic vector. It then constructs a semantic structure graph for each sentence in the given article, encodes the graphs with a multi-head attention mechanism, and captures the relations among them to guide the generation of complex questions. Finally, Transformer decoding generates the complex question: at each decoding time step, dual planning (fact-level planning and semantic-graph-level planning) selects the semantic graph and the fact triple within it to attend to, and this information is fused in to increase the complexity of the generated question and assist generation of the current word.
The chapter-level complex question generation method based on dual planning disclosed by the invention comprises the following steps:
1) Encode the given article and answer with BERT to obtain an answer-aware text vector representation.
2) For each sentence in the given article, first preprocess the sentence with an adaptive cross-sentence coreference resolution technique, then construct a fine-grained semantic structure graph with a memory-aware semantic graph construction method.
3) For each fine-grained semantic structure graph obtained in step 2), treat the edges in the graph as nodes; first obtain the vector representation of each node with a multi-head-attention graph encoder, then the vector representation of each individual fact, and finally the vector representation of the whole graph.
4) Feed the answer-aware text vector representation from step 1) into a Transformer model for decoding. At each decoding time step, select the semantic graph and the fact triple within it to attend to based on dual planning (fact-level planning and semantic-graph-level planning), assisting generation of the current word.
5) Design a loss function and train the question generation model over multiple iterations.
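The input construction in step 1) can be sketched as follows. This is a simplified illustration: whitespace tokenization stands in for BERT's WordPiece tokenizer, and the function name is ours, not from the source.

```python
def build_bert_input(article: str, answer: str) -> list:
    """Concatenate article and answer with BERT-style special tokens.

    [CLS] is the classification token whose output learns a joint
    representation of text and answer (the vector C); [SEP] separates
    the two segments. Whitespace split stands in for WordPiece.
    """
    return ["[CLS]"] + article.split() + ["[SEP]"] + answer.split() + ["[SEP]"]

tokens = build_bert_input("Marie Curie won two Nobel Prizes", "two")
```

A real implementation would feed the resulting token sequence (with segment ids distinguishing text from answer) into a pre-trained BERT encoder and read off the [CLS] output vector.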
As a further improvement of the invention, in step 1), BERT encodes the given text and answer. The input takes the form [CLS] text [SEP] answer [SEP]: the text sequence and the answer are concatenated, the separator token [SEP] is inserted between them to separate text from answer, and the classification token [CLS] is inserted at the beginning. Through BERT's pre-training, the classification token learns a joint representation of the text and the answer, denoted by the vector C.
As a further improvement of the invention, in step 2), a fine-grained semantic structure graph is constructed for each sentence in the given article. First, an adaptive cross-sentence coreference resolution technique replaces each pronoun with the entity it refers to, facilitating entity fusion during subsequent graph construction. In this technique, entity mentions must be replaced with real-world entities: each entity mention is first represented as a semantic vector; a softmax layer over similarity features then predicts the coreference link between a query entity mention and a set of candidates, and the mention is linked to the candidate with the highest coreference probability.
As a further improvement of the invention, in step 2), the adaptive cross-sentence coreference resolution technique predicts cross-sentence coreference links with an algorithm that traverses the sentence list and predicts links between the entity mentions in the current sentence and the candidate clusters computed over all previous sentences. The algorithm first orders the sentence list D; then, for each entity mention in the current sentence, it computes a candidate set from the coreference clusters of the previous sentences, predicts the coreference link between the mention and each candidate, and finally updates the predicted clusters and recomputes the candidates.
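The traversal described above can be sketched as follows. This is a toy illustration: `corefer` stands in for the learned softmax link predictor, and — as in the text — candidates for each mention come only from clusters formed in previous sentences.

```python
def resolve_across_sentences(sentences, mentions_per_sentence, corefer):
    """Traverse sentences in order; link each mention of the current
    sentence to the first candidate cluster (built from previous
    sentences) accepted by the link predictor `corefer(mention, cluster)`.
    Unlinked mentions start new clusters."""
    clusters = []  # each cluster: list of coreferent mention strings
    for i, _ in enumerate(sentences):
        candidates = list(clusters)  # clusters from previous sentences only
        for mention in mentions_per_sentence[i]:
            linked = next((c for c in candidates if corefer(mention, c)), None)
            if linked is not None:
                linked.append(mention)   # update the predicted cluster
            else:
                clusters.append([mention])
    return clusters

# Toy link predictor: exact string match with any cluster member.
corefer = lambda m, cluster: m in cluster
sents = ["Curie won the prize.", "Curie discovered radium."]
mentions = [["Curie"], ["Curie", "radium"]]
clusters = resolve_across_sentences(sents, mentions, corefer)
```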
As a further improvement of the invention, in step 2), when the adaptive cross-sentence coreference resolution technique predicts coreference links, the number of possible candidates for each entity mention grows with the number of previous sentences, and the computational cost rises sharply. To reduce this cost, the invention considers only the previous sentences that are similar to the current one.
As a further improvement of the invention, in step 2), after each sentence has undergone coreference resolution, entity-relation triples are extracted from it with a memory-aware semantic graph construction method to build the semantic graph. The memory-aware method stores the extraction results produced in each round in an iterative memory, so that the next decoding iteration can access all previous extractions. Specifically, the sentence is fed into a sequence-to-sequence architecture to produce the first extraction; the extraction is then concatenated with the source sentence and fed into the architecture again to produce a new extraction, and the process repeats until the EndOfExtractions token is generated, which marks the end of extraction.
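The iterative, memory-aware extraction loop might look like this. This is a sketch: `extract_once` stands in for the sequence-to-sequence extractor, and the round cap is an added safeguard not mentioned in the source.

```python
def iterative_extract(sentence, extract_once, max_rounds=10):
    """Memory-aware iterative extraction: each round re-feeds the source
    sentence concatenated with all previous extractions, so the extractor
    can see its earlier output; stops when the EndOfExtractions marker
    is emitted."""
    memory = []
    for _ in range(max_rounds):
        inp = sentence + " " + " ".join(memory)
        out = extract_once(inp, memory)
        if out == "EndOfExtractions":
            break
        memory.append(out)
    return memory

# Fake extractor that yields two triples, then the end marker.
outputs = iter(["(Curie; won; the Nobel Prize)",
                "(Curie; discovered; radium)",
                "EndOfExtractions"])
facts = iterative_extract("Curie won the Nobel Prize and discovered radium.",
                          lambda inp, mem: next(outputs))
```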
As a further improvement of the invention, the memory-aware semantic graph construction method in step 2) uses a sequence-to-sequence model, whose training requires a set of sentence-extraction pairs. Manually constructing such a dataset works well but is time-consuming and labor-intensive and cannot reach large scale, so the invention proposes a method for automatically constructing a sentence-extraction dataset. In general this takes two steps: first, all extraction results are sorted in descending order of the confidence output by the original systems; then, training data is constructed according to the model's input-output format. Simply pooling all extraction results, however, is not feasible, because of the following problems: 1) No calibration: the confidence scores assigned by different systems are not calibrated to a comparable scale. 2) Redundant extractions: beyond outright repetition, multiple systems produce similar extractions with low marginal utility. 3) Erroneous extractions: pooling inevitably contaminates the data and amplifies wrong instances, forcing the downstream open information extraction system to learn low-quality extractions. To solve these problems, the invention uses a score-filter framework to obtain high-quality extractions: the pooled extraction results are scored first — in general, good (correct, informative) extractions receive higher scores than bad (incorrect) and redundant ones — and redundant data is then filtered out. Through this score-filter framework, high-quality fact triples are obtained, from which the semantic graph is constructed.
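A minimal sketch of the score-filter idea, assuming a simple greedy filter with Jaccard token similarity and an illustrative threshold; the actual framework uses a learned scorer and a graph-based subset selection.

```python
def score_filter(extractions, scores, sim_threshold=0.5):
    """Greedy score-then-filter over pooled triples: keep extractions in
    descending score order, dropping any that is too similar (Jaccard
    over tokens) to one already kept. Threshold is an illustrative choice."""
    def jaccard(a, b):
        ta, tb = set(" ".join(a).split()), set(" ".join(b).split())
        return len(ta & tb) / len(ta | tb)
    kept = []
    for ext, _ in sorted(zip(extractions, scores), key=lambda p: -p[1]):
        if all(jaccard(ext, k) < sim_threshold for k in kept):
            kept.append(ext)
    return kept

pool = [("Curie", "won", "Nobel Prize"),
        ("Curie", "won", "the Nobel Prize"),   # redundant variant
        ("Curie", "discovered", "radium")]
kept = score_filter(pool, [0.9, 0.6, 0.8])
```

The redundant, lower-scored variant is dropped while the two distinct, informative facts survive.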
As a further improvement of the invention, in step 3), when the semantic structure graph is encoded, its edges are encoded as nodes as well. For a given semantic structure graph, the node embedding vectors are first initialized with pre-trained word vectors. Then, to capture the semantic relations between nodes, a relation-enhanced graph Transformer encodes the nodes: a relation-enhanced multi-head attention mechanism produces an embedding vector for each node, so that the encoding of every node in the graph contains not only the node's own information but also that of the other nodes, i.e., the relations between the current node and the others are preserved. Finally, all node vectors in the graph are fed into a fully connected feed-forward network (FFN) to obtain the final node semantic representation vectors, with residual connections alleviating the degradation problem in deep learning. After the node semantic representation vectors are obtained, the node vectors belonging to the same fact triple are fed into an average pooling layer to obtain the semantic vector representation of that fact triple. Similarly, the vector representation of the i-th semantic structure graph is computed by feeding all fact-triple representation vectors contained in the graph into an average pooling layer.
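The two levels of average pooling described above can be illustrated as follows (pure-Python sketch; real node vectors would come from the relation-enhanced graph Transformer, not be handed in directly).

```python
def mean_pool(vectors):
    """Average a list of equal-length vectors elementwise."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def encode_graph(facts):
    """facts: list of fact triples, each given as a list of node embedding
    vectors (subject, relation, object — edges are treated as nodes).
    Returns (fact_vectors, graph_vector) via two levels of mean pooling,
    mirroring the average-pooling layers in the text."""
    fact_vecs = [mean_pool(nodes) for nodes in facts]
    return fact_vecs, mean_pool(fact_vecs)

facts = [[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],   # one fact triple
         [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]]   # another
fact_vecs, graph_vec = encode_graph(facts)
```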
As a further improvement of the invention, in step 4), a Transformer decoder generates the question from the encoding results of the text and the semantic structure graphs. At each decoding time step, dual planning (fact-level planning and semantic-graph-level planning) selects the semantic graph and the fact triple within it to attend to, assisting generation of the current word. Specifically, a semantic structure graph is selected first, then the relevant fact triple is selected from it, and finally the decoder's hidden state is updated based on the text vector and the selected fact triple, generating the current word.
Semantic-graph-level planning selects, at each decoding time step, the semantic structure graph that currently needs attention, using an attention mechanism over the text semantic vector C and the words generated at previous time steps to obtain an attention-based semantic structure graph representation. This representation is then concatenated with the text semantic vector C, a softmax layer computes the probability of each subgraph, and the subgraph with the highest probability is selected to guide generation of the current question.
Fact-level planning selects, at each decoding time step, the fact triple that currently needs attention, using an attention mechanism over the text semantic vector C, the words generated at previous time steps, and the selected semantic structure graph to obtain attention-based fact-triple representations within the k-th semantic structure graph. As in semantic-graph-level planning, the attention-based fact-triple representations are concatenated with the text semantic vector C, a softmax layer computes the probability of each fact triple, and the fact triple with the highest probability is selected to guide generation of the current question.
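Dual planning's two-stage softmax selection can be sketched as follows. The raw scores stand in for the concatenated attention/context features described above; the function and its inputs are illustrative.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dual_plan(graph_scores, fact_scores_per_graph):
    """Two-level selection: pick the semantic graph with the highest
    softmax probability, then the fact triple with the highest softmax
    probability inside the chosen graph."""
    g_probs = softmax(graph_scores)
    g = max(range(len(g_probs)), key=g_probs.__getitem__)
    f_probs = softmax(fact_scores_per_graph[g])
    f = max(range(len(f_probs)), key=f_probs.__getitem__)
    return g, f

# Two sentence graphs; the second graph and its second fact score highest.
result = dual_plan([0.1, 2.0], [[1.0], [0.2, 3.0]])
```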
As a further improvement of the invention, in step 5), the loss function consists of three parts: cross-entropy loss, supervision loss, and coverage loss. The cross-entropy loss minimizes the negative log-likelihood over all model parameters. The supervision loss measures the deviation between the semantic graph and fact selected by dual planning and the gold semantic graph and fact. The coverage loss is computed additionally from the coverage vectors of the semantic graphs and facts in step 4), constraining the model so that it does not repeatedly attend to the same semantic graph or fact.
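A rough sketch of the three-part loss. The weighting hyperparameters `lam_sup` and `lam_cov`, and the 0/1 form of the supervision penalty, are illustrative assumptions not specified in the source.

```python
import math

def total_loss(word_probs, graph_match, fact_match, attn_history, attn_now,
               lam_sup=1.0, lam_cov=1.0):
    """Three-part training loss, per the description above:
    - cross-entropy: negative log-likelihood of the gold words;
    - supervision: penalty when dual planning picks the wrong graph/fact
      (simplified here to a 0/1 penalty per selection);
    - coverage: sum of min(cumulative attention, current attention),
      discouraging repeated attention to the same graph or fact."""
    ce = -sum(math.log(p) for p in word_probs)
    sup = (0.0 if graph_match else 1.0) + (0.0 if fact_match else 1.0)
    cov = sum(min(h, a) for h, a in zip(attn_history, attn_now))
    return ce + lam_sup * sup + lam_cov * cov

# Perfect prediction, correct selections, no attention overlap -> zero loss.
loss = total_loss([1.0], True, True, [0.0, 0.0], [0.5, 0.5])
```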
Beneficial effects:
Compared with the prior art, the invention has the following advantages. 1) Existing question generation methods construct a semantic graph only at the chapter level and easily miss the rich factual information contained in sentences. The invention constructs a semantic structure graph for each sentence in the given article, so the facts in each sentence can be captured comprehensively and accurately, providing strong data support for generating complex questions. 2) Existing methods lack overall planning and cannot select specific facts, which easily mismatches entities and relations and harms the factual correctness of the questions. The invention uses dual planning: during decoding, semantic-graph-level planning and fact-level planning select the semantic graph and the fact triple within it that need attention, and fusing this information in to assist generation of the current word keeps the generated relations matched to their entities, improving the factual correctness of the questions.
Experimental analysis shows that the chapter-level complex question generation method based on dual planning proposed by the invention improves the factual correctness of the generated complex questions and enhances the overall quality of question generation.
Drawings
FIG. 1 is a schematic diagram of the basic process of the present invention;
FIG. 2 is a diagram of a model framework of the present invention;
FIG. 3 is a diagram of the dual-planning-based decoding implementation of the present invention.
Detailed Description
The invention is further described with reference to the following examples and the accompanying drawings.
Example: The invention discloses a chapter-level complex question generation method based on dual planning, comprising the following steps. 1) The given article and answer are encoded with BERT to obtain an answer-aware text vector representation. BERT is based on a bidirectional Transformer structure; by adopting a masked language model it achieves integrated feature fusion, can model polysemy, and produces deep bidirectional language representations. The invention therefore adopts BERT encoding, with input of the form [CLS] text [SEP] answer [SEP].
Specifically, the text sequence and the answer are concatenated, the separator token [SEP] is inserted between them to separate text from answer, and the classification token [CLS] is inserted at the beginning. Through BERT's pre-training, the classification token learns a joint representation of the text and the answer, denoted by the vector C.
2) For each sentence in the given article, the sentence is first preprocessed with an adaptive cross-sentence coreference resolution technique, and a fine-grained semantic structure graph is then constructed with a memory-aware semantic graph construction method. The semantic structure graph clearly exposes the semantic information among the different entities in the text, making it easy to select suitable information to fuse into the question during decoding and thereby assist complex question generation. Because a single text is long, a separate semantic structure graph is constructed for each sentence, which helps capture semantic information more accurately. For each sentence, the adaptive cross-sentence coreference resolution technique first replaces each pronoun with the entity it refers to, facilitating entity fusion during subsequent graph construction; entity mentions must be replaced with real-world entities. Each entity is defined as a pair consisting of the entity itself and the set of events in which it participates. Each entity mention is first represented as a semantic vector: the entity span is fed into BERT to obtain an initial vector representation, the vector of each event is obtained in the same way, and the event set is fed into a BiLSTM followed by a mean pooling layer to obtain the event-set vector representation. Finally, the initial entity vector and the event-set vector are combined into the final semantic representation vector of the entity mention.
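The entity-mention representation step might be sketched as follows. The BiLSTM is elided; mean pooling over event vectors and vector concatenation stand in for the combination described above, which are assumptions of this sketch.

```python
def mention_representation(span_vec, event_vecs):
    """Final entity-mention vector: concatenate the (BERT-derived) span
    vector with the mean-pooled representation of the entity's event set.
    Mean pooling here stands in for the BiLSTM + mean-pooling step."""
    dim = len(event_vecs[0])
    pooled = [sum(e[i] for e in event_vecs) / len(event_vecs)
              for i in range(dim)]
    return span_vec + pooled  # list concatenation

# 1-d toy vectors: span vector [1.0], two event vectors [2.0] and [4.0].
vec = mention_representation([1.0], [[2.0], [4.0]])
```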
Suppose P is the set of antecedent mentions of a coreference cluster for a set of related entities. For each coreference antecedent set P, the invention computes a candidate cluster representation by combining sentence-level information and word-level information, where the sentence-level information is the CLS vector obtained by feeding the containing sentence into BERT, carrying the sentence's semantic information. The combination uses learnable parameters, and the representations of all coreferent antecedent mentions in P are then averaged to obtain the candidate cluster representation.
A softmax layer over similarity features then predicts the coreference link between a query entity mention and its set of candidate representations. Cosine similarity and multi-view cosine similarity are first computed between each candidate and the query mention. These similarity features are then combined with the element-wise difference and the dot product of the candidate and query representations to obtain a final characterization. Finally, for all candidates, the probability that the query entity mention corefers with each candidate is computed via softmax.
To predict coreference links across sentences, the invention designs an algorithm that traverses the sentence list and predicts coreference links between the entity mentions in the current sentence and the candidate clusters computed over all previous sentences. The algorithm first orders the sentence list D; then, for each entity mention in the current sentence, it computes a candidate set from the coreference clusters of the previous sentences, predicts the coreference link between the mention and each candidate, and finally updates the predicted clusters and recomputes the candidates.
When the adaptive cross-sentence coreference resolution technique predicts coreference links, the number of possible candidates for each entity mention grows with the number of previous sentences, and the computational cost rises sharply. To reduce this cost, the invention considers only the previous sentences that are similar to the current one, where sentences with the same topic are considered similar. During training, standard entity clusters are used to compute the candidates and standard sentence topic clusters; during inference, by contrast, the currently predicted coreference clusters are used to compute the candidates, and the predicted topic clusters are computed with K-means. The model is trained by minimizing cross-entropy loss in batches: all M entity mentions in a single sentence form one batch, and the loss is computed after M sequential predictions. After each sentence has undergone coreference resolution, the memory-aware semantic graph construction method extracts triples of the form (head entity, relation, tail entity) from the sentence, where the head and tail entities are the subject and object and the relation corresponds to the predicate connecting them. The memory-aware method stores the extraction results produced in each round in an iterative memory, so that the next decoding iteration can access all previous extractions.
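Restricting candidates to topically similar previous sentences can be sketched as follows; topic labels are assumed given (e.g. from K-means over sentence vectors, as described).

```python
def similar_previous_sentences(topics, current_idx):
    """Return the indices of previous sentences that share the current
    sentence's topic cluster — only these contribute coreference
    candidates, reducing the candidate set's growth."""
    return [j for j in range(current_idx) if topics[j] == topics[current_idx]]

# Four sentences with alternating topic labels 0 and 1.
prev = similar_previous_sentences([0, 1, 0, 1], 3)
```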
Specifically, the sentence is fed into a sequence-to-sequence architecture to produce the first extraction; the extraction is then concatenated with the source sentence and fed into the architecture again to produce a new extraction, and the process repeats until the EndOfExtractions token is generated, which marks the end of extraction. Because the memory-aware semantic graph construction method uses a sequence-to-sequence model, the invention needs a set of sentence-extraction pairs as training data. Manually constructing such a dataset works well but is time-consuming and labor-intensive and cannot reach large scale, so the invention proposes a method for automatically constructing a sentence-extraction dataset. In general this takes two steps: first, all extraction results are sorted in descending order of the confidence output by the original systems; then, training data is constructed according to the model's input-output format. Simply pooling all extraction results, however, is not feasible, because of the following problems: 1) No calibration: the confidence scores assigned by different systems are not calibrated to a comparable scale. 2) Redundant extractions: beyond outright repetition, multiple systems produce similar extractions with low marginal utility. 3) Erroneous extractions: pooling inevitably contaminates the data and amplifies wrong instances, forcing the downstream open information extraction system to learn low-quality extractions. To solve these problems, the invention uses a score-filter framework to obtain high-quality extractions.
Scoring: the invention scores the pooled extractions with a model pre-trained on a random bootstrap dataset, which is generated by randomly sampling, for each sentence, an extraction from any one of the bootstrap systems being aggregated. The model assigns each extraction in the pool a score based on its confidence value; in general, good (correct, informative) extractions receive higher scores than bad (incorrect) and redundant ones. Filtering: redundant data is then filtered out of the extraction results. Given the set of ranked extractions, the invention seeks the subset with the best confidence scores (assigned by the random bootstrap model) and minimal similarity to the other selected extractions. To this end, a complete weighted graph is built over all ranked extraction results: each node corresponds to one extraction, each pair of nodes is connected by an edge whose weight is the similarity between the two corresponding extractions, and each node is assigned a score equal to the confidence given by the random bootstrap model. The best subgraph is then selected as the high-quality extraction result, and the remaining nodes are treated as redundant data and filtered out automatically. The process is expressed mathematically as follows:
where the first symbol denotes a node in the graph and the second denotes the similarity weight between a pair of nodes. The first term of the formula is the sum of the significance scores of all selected triples, and the second term is the redundant information shared between those triples. If the graph has n nodes, the objective may be set to:
where the score vector collects the per-node scores, the similarity weights form a symmetric matrix, and a binary decision vector indicates whether each particular node belongs to the selected subgraph. Through this score-filter framework, high-quality fact triples can be obtained; finally, a semantic structure graph is constructed with entities as nodes and relationships as edges connecting two entities.
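The selection objective above — maximize the summed scores of the chosen nodes minus the pairwise similarity among them — can be sketched by exhaustive search over the decision vector. The patent does not state which optimizer is used, so brute force over small pools is shown purely for illustration:

```python
from itertools import product

def select_extractions(scores, sim):
    """Pick the subset maximizing sum(scores) - sum(pairwise similarities).

    scores: per-extraction confidence from the scoring model.
    sim:    symmetric matrix, sim[i][j] = similarity of extractions i and j.
    """
    n = len(scores)
    best_subset, best_value = [], float("-inf")
    for mask in product([0, 1], repeat=n):  # the binary decision vector
        value = sum(s * x for s, x in zip(scores, mask))
        value -= sum(sim[i][j] * mask[i] * mask[j]
                     for i in range(n) for j in range(i + 1, n))
        if value > best_value:
            best_value, best_subset = value, [i for i in range(n) if mask[i]]
    return best_subset
```

For realistic pool sizes a greedy or integer-programming solver would replace the exhaustive loop; the objective being optimized is the same.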
3) For the fine-grained semantic structure graph finally obtained in step 2), the edges in the graph are treated as nodes, and the vector representation of the whole graph is obtained through a multi-head graph attention encoder. Specifically, for a given semantic structure graph, pre-trained word vectors are first used to initialize the node embedding vectors. Then, in order to capture the semantic relations between nodes, the invention adopts a relation-enhanced graph Transformer to encode the nodes. The method uses a relation-enhanced multi-head attention mechanism to obtain an embedding vector for each node, whose dimensionality is the node embedding size; the calculation formula is as follows:
where the weight matrices are model parameters. The function of the multi-head attention mechanism is that, when each node in the semantic structure graph is encoded, the result contains not only the encoding information of the current node but also the information of the other nodes in the graph; that is, the connection between the current node and the other nodes is preserved. This process is formulated as follows:
As the formula shows, the key point of the relation-enhanced multi-head attention mechanism is to integrate the semantic relationship between nodes into the query vector and the key vector. The relation terms are the encodings of the shortest relation path between a pair of nodes; each encoding is obtained by summing the embedding vectors of all the relation nodes on that path.
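The exact parameterization was lost in translation, so the following single-head NumPy sketch makes one common assumption: the relation-path encoding r_ij is added to both the query and the key before the dot product. Multi-head operation would repeat this with separate projections:

```python
import numpy as np

def relation_enhanced_attention(H, R, Wq, Wk, Wv):
    """Single-head sketch of relation-enhanced graph attention.

    H: (n, d) node embeddings.
    R: (n, n, d) shortest-relation-path encodings; R[i, j] is the sum of the
       embeddings of the relation nodes on the path from node i to node j.
    """
    n, d = H.shape
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    scores = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            # fold the relation encoding into both the query and the key
            scores[i, j] = (Q[i] + R[i, j]) @ (K[j] + R[i, j]) / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ V  # updated node embeddings
```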
Finally, all the node vectors in the semantic structure graph are input into a fully connected feed-forward network (FFN) to obtain the final node semantic representation vectors, and residual connections are adopted to alleviate the degradation problem in deep learning. The calculation formula is as follows:
where all the weight matrices and biases are trainable parameters, and the inner layer is a linear network using the GELU activation function.
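The feed-forward sub-layer with its residual connection can be written compactly as below; the inner dimension is illustrative, not taken from the patent:

```python
import numpy as np

def gelu(x):
    """Tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def ffn_with_residual(h, W1, b1, W2, b2):
    """FFN(h) = GELU(h W1 + b1) W2 + b2, added back to h (residual)."""
    return h + (gelu(h @ W1 + b1) @ W2 + b2)
```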
After the node semantic representation vectors are obtained, the node vectors belonging to the same fact triple in the graph are input into an average pooling layer, yielding the semantic vector representation of that fact triple within the semantic structure graph. Similarly, when computing the vector representation of a semantic structure graph, all the fact triple representation vectors contained in the graph are input into the average pooling layer to obtain the semantic vector representation of the graph. The calculation formula is as follows:
where the pooling function denotes average pooling, applied first to the embedding vectors of all the nodes in a given fact triple of the semantic structure graph, and then to all the fact triple vectors in that graph.
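The two-level average pooling just described — nodes into fact vectors, fact vectors into a graph vector — is short enough to state directly:

```python
import numpy as np

def encode_graph(node_vectors, fact_node_indices):
    """Average-pool node vectors into fact-triple vectors, then average-pool
    the fact vectors into a single graph vector.

    node_vectors:      (n, d) array of node semantic representations.
    fact_node_indices: list of index lists, one per fact triple.
    """
    fact_vectors = [node_vectors[idx].mean(axis=0) for idx in fact_node_indices]
    graph_vector = np.stack(fact_vectors).mean(axis=0)
    return fact_vectors, graph_vector
```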
4) The answer-aware text vector representation obtained in step 1) is sent into a Transformer model for decoding. At each decoding time step, dual planning (fact-level planning and semantic-graph-level planning) is used to select the semantic graph, and the fact triples within it, that require focused attention, assisting the generation of the current word. As shown in FIG. 3, specifically, a semantic structure graph is first selected; then the relevant fact triples are selected from that graph; finally, the hidden state of the decoder is updated based on the text vector and the selected fact triples, and the current word is generated. The calculation process is as follows:
Semantic-graph-level planning aims, at each decoding time step, to select the semantic structure graph that currently requires focused attention through an attention mechanism, based on the text semantic vector C and the words generated at previous time steps, yielding an attention-based semantic structure graph representation. In order to prevent the decoder from repeatedly selecting the same semantic graph, the invention integrates a coverage mechanism that encourages the decoder to cover all semantic structure graphs when generating words. The calculation process is as follows:
where the two quantities are, respectively, the attention and coverage over a semantic structure graph, and the remaining symbols are model parameters.
At each time step of model decoding, a coverage loss is calculated for the selected semantic graph. The attention-based semantic structure graph representation is then spliced with the text semantic vector C, the probability of each subgraph is calculated through a softmax layer, and the subgraph with the highest probability is selected to guide the generation of the current question.
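One decoding step of graph-level planning with coverage can be sketched as follows. The patent's learned scoring network is replaced here by a plain dot product, and the coverage loss is assumed to take the standard sum-of-minimums form; both are simplifying assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def coverage_attention(query, graph_vectors, coverage, cov_weight=1.0):
    """One planning step: attend over semantic-graph vectors, penalized by
    accumulated coverage so already-used graphs are discouraged."""
    scores = graph_vectors @ query - cov_weight * coverage
    attention = softmax(scores)
    # coverage loss: overlap between current attention and past coverage
    coverage_loss = float(np.minimum(attention, coverage).sum())
    new_coverage = coverage + attention
    selected = int(attention.argmax())  # graph chosen to guide this step
    return selected, attention, new_coverage, coverage_loss
```

The same step, run over fact-triple vectors instead of graph vectors, gives the fact-level planner described next.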
Fact-level planning aims, at each decoding time step, to select the fact triples that currently require focused attention through an attention mechanism, based on the text semantic vector C, the words generated at previous time steps, and the selected semantic structure graph, yielding an attention-based fact triple representation within that graph. As in semantic-graph-level planning, in order to prevent the decoder from repeatedly selecting the same fact triple, the invention incorporates a coverage mechanism that encourages the decoder to cover all fact triples when generating words.
where the two quantities are, respectively, the attention and coverage over a fact triple, and the remaining symbols are model parameters.
Similarly, at each time step of model decoding, a coverage loss is computed for the selected fact triplet, as follows:
The attention-based fact triple representation is then spliced with the text semantic vector C, the probability of each fact triple is calculated through a softmax layer, and the fact triple with the highest probability is selected to guide the generation of the current question.
5) A loss function is designed, and the question generation model is trained through multiple iterations. The loss function consists of three parts: cross entropy loss, supervision information loss, and coverage loss. The cross entropy loss refers to minimizing the negative log-likelihood over all model parameters; given a text D and an answer A, it is calculated as follows:
The invention also collects supervision information for the semantic structure graph and fact triples selected at each step of the reasoning process: the question and the answer are analyzed, the entities involved in the answer and the question are located in the text, and the gold-standard semantic graph and fact triples are determined. At each time step of question generation, a probability distribution over semantic structure graphs and a probability distribution over fact triples are produced; these are matched against the gold-standard semantic graph and fact triples that should be selected, and the corresponding loss is calculated as follows:
The coverage loss means that, when the coverage vectors of the semantic graphs and facts are calculated in step 4), an additional coverage loss is computed. This loss effectively prevents information in the same graph from being selected repeatedly, because the higher the probability assigned to a graph in past selections, the greater the loss it produces.
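Assembled, the training objective is cross entropy plus supervision loss plus coverage loss. The supervision term below is the negative log-likelihood of the gold graph/fact under the planner's distribution, and the mixing weights are hypothetical (the text does not state them):

```python
import math

def supervision_loss(predicted_dist, gold_index):
    """NLL of the gold semantic graph (or fact triple) under the planner."""
    return -math.log(predicted_dist[gold_index])

def total_loss(ce, sup_graph, sup_fact, cov, w_sup=1.0, w_cov=1.0):
    """Three-part objective: cross entropy + supervision + coverage."""
    return ce + w_sup * (sup_graph + sup_fact) + w_cov * cov
```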
The present example was evaluated using the following criteria. For automatic evaluation: Bilingual Evaluation Understudy (BLEU) measures the n-gram overlap between the generated result and the reference; Metric for Evaluation of Translation with Explicit ORdering (METEOR) evaluates the semantic relevance between the generated result and the reference; Recall-Oriented Understudy for Gisting Evaluation (ROUGE-L) evaluates the longest common subsequence between the generated result and the reference. For manual evaluation: fluency measures how fluently the generated result is expressed; relevance evaluates the degree of correlation between the generated result and the given input text; complexity refers to whether the generated question is complex, assessed by the number of clauses and modifiers the sentence contains and by the number of multi-hop reasoning steps needed to answer the question; correctness refers to whether the facts contained in the generated question are correct, i.e., whether the fact triples exist in the given source text and whether the entities and relationships match.
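As an illustration of the automatic metrics, ROUGE-L scores the longest common subsequence between a generated question and the reference; a minimal implementation follows (the β = 1.2 weighting is a common choice, assumed here, not taken from the patent):

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of token lists a and b."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l(candidate, reference, beta=1.2):
    """ROUGE-L F-score over whitespace tokens."""
    c, r = candidate.split(), reference.split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)
```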
In order to verify the effect of the invention, automatic and manual evaluations were carried out on the public data sets SQuAD and MSMARCO. The experimental results are as follows:
Table 1 Automatic evaluation results of different methods on SQuAD;
Table 2 Automatic evaluation results of different methods on MSMARCO;
Table 3 Manual evaluation results of different methods on MSMARCO;
the method achieves optimal performance on a plurality of data sets, and is greatly improved compared with other methods.
The above examples are only preferred embodiments of the present invention. It should be noted that various modifications and equivalents may occur to those skilled in the art without departing from the spirit of the invention, and all such modifications and equivalents are intended to fall within the scope of the appended claims.
Claims (9)
1. A chapter-level complex problem generation method based on dual planning, characterized by comprising the following steps:
1) Encoding a given article and answer by adopting the pre-trained language model BERT to obtain an answer-aware text vector representation,
2) For each sentence sequence in the given article, preliminarily processing the sentence sequence by using an adaptive cross-sentence coreference resolution technique, and then constructing a fine-grained semantic structure graph by adopting a memory-aware semantic graph construction method,
3) For the fine-grained semantic structure graph finally obtained in step 2), treating the edges in the graph as nodes, first obtaining the vector representation of each node through a multi-head graph attention encoder, then obtaining the vector representation of each single fact, and finally obtaining the vector representation of the whole graph,
4) Sending the answer-aware text vector representation obtained in step 1) into a Transformer model for decoding, and at each decoding time step, selecting the semantic graph requiring focused attention and the fact triples within it based on dual planning, namely fact-level planning and semantic-graph-level planning, to assist the generation of the current word,
5) Designing a loss function, and training a problem generation model through multiple iterations;
in step 4), based on the encoding results of the text and the semantic structure graphs, a Transformer is used as the decoder to generate the question; at each decoding time step, based on dual planning, namely fact-level planning and semantic-graph-level planning, the semantic graph and the fact triples within it that require focused attention are selected to assist the generation of the current word; specifically, a semantic structure graph is first selected, then the relevant fact triples are selected from that graph, and finally the hidden state of the decoder is updated based on the text vector and the selected fact triples and the current word is generated,
wherein semantic-graph-level planning aims, at each decoding time step, to select the semantic structure graph currently requiring focused attention through an attention mechanism, based on the text semantic vector and the words generated at previous time steps, obtaining an attention-based semantic structure graph representation; the attention-based semantic structure graph representation is then spliced with the text semantic vector, the probability of each subgraph is calculated through a softmax layer, and the subgraph with the highest probability is selected to guide the generation of the current question,
and fact-level planning aims, at each decoding time step, to select the fact triples currently requiring focused attention through an attention mechanism, based on the text semantic vector, obtaining an attention-based fact triple representation within the kth semantic structure graph; the attention-based fact triple representation is then spliced with the text semantic vector, the probability of each fact triple is calculated through a softmax layer, and the fact triple with the highest probability is selected to guide the generation of the current question.
2. The method for generating chapter-level complex problems based on dual planning according to claim 1, wherein in step 1), BERT is used to encode the given text and answer: the text sequence and the answer are spliced, a separator [SEP] is inserted between them to separate the text from the answer, and a specific classification identifier [CLS] is inserted at the beginning; after the pre-training process of BERT, the classification identifier learns the representation information of the fused text and answer and is represented by a vector C.
3. The method for generating chapter-level complex problems based on dual planning according to claim 1, wherein in step 2), a fine-grained semantic structure graph is constructed for each sentence sequence in the given article; an adaptive cross-sentence coreference resolution technique is first adopted to replace pronouns with the entities they refer to, so that the entities can be merged during subsequent graph construction; in the adaptive cross-sentence coreference resolution technique, entity mentions need to be resolved to entities in the real world, each entity mention is represented as a semantic vector, and similarity features are then input into a softmax layer to predict, for a query entity and a set of candidates, the candidate with the greatest coreference probability.
4. The method of claim 1, wherein in step 2), the adaptive cross-sentence coreference resolution technique is used as follows: in order to predict inter-sentence coreference links, an algorithm traverses the sentence list and predicts the coreference links between the entities mentioned in the current sentence and the candidate clusters computed across all previous sentences; the algorithm first orders the sentence list D, then, for each entity in the current sentence, computes a candidate set of coreference clusters from the previous sentences, predicts the coreference link between the entity and each candidate, and finally updates the predicted candidate set and recomputes the new candidates.
5. The method for generating chapter-level complex problems based on dual planning according to claim 4, wherein in step 2), when the adaptive cross-sentence coreference resolution technique predicts coreference links, the number of possible candidates for each entity grows with the number of previous sentences and the computation cost increases accordingly; therefore, only the previous sentences similar to the current sentence are considered in the calculation process.
6. The method for generating chapter-level complex problems based on dual planning according to claim 1, wherein in step 2), after each sentence has undergone coreference resolution, entity-relationship triples are extracted from the sentence by using a memory-aware semantic graph construction method to construct the semantic graph; in the memory-aware semantic graph construction method, an iterative memory stores the extraction results generated in each round so that the next decoding iteration can access all previous extractions; first, the sentence is input into a sequence-to-sequence architecture to generate a first extraction result, then the extraction result is spliced with the source sentence and input into the architecture again to generate a new extraction result, and the process is repeated until the symbol EndOfExtractions is generated, indicating that the extraction process is finished;
in step 2), the memory-aware semantic graph construction method uses a sequence-to-sequence model, and a score-filter framework is used to obtain high-quality extraction results: the collected extraction results are first scored, so that good extraction results obtain higher values than bad and redundant ones; redundant data is then filtered from the extraction results; the high-quality fact triples obtained through the score-filter framework are used to construct the semantic graph.
7. The method for generating chapter-level complex problems based on dual planning according to claim 1, wherein in step 3), when the semantic structure graph is encoded, the edges in the graph are also encoded as nodes; for a given semantic structure graph, pre-trained word vectors are used to initialize the node embedding vectors; then, in order to capture the semantic relations between nodes, a relation-enhanced graph Transformer is used to encode the nodes, using a relation-enhanced multi-head attention mechanism to obtain the embedding vector of each node, so that when each node in the semantic structure graph is encoded, the result contains not only the encoding information of the current node but also the information of the other nodes, i.e., the relation between the current node and the other nodes is preserved; finally, all the node vectors in the semantic structure graph are input into a fully connected feed-forward network (FFN) to obtain the final node semantic representation vectors, with residual connections adopted to alleviate the degradation problem in deep learning; the node vectors belonging to the same fact triple are then input into an average pooling layer to obtain the semantic vector representation of the fact triple, and the fact triple vectors are in turn averaged to obtain the semantic vector representation of the graph.
8. The method for generating chapter-level complex problems based on dual planning according to claim 1, wherein in step 4), during the decoding that generates the question, a coverage mechanism is incorporated to encourage the decoder to cover all semantic structure graphs and all fact triples when generating words.
9. The method for generating chapter-level complex problems based on dual planning according to claim 1, wherein in step 5), the loss function consists of three parts, namely cross entropy loss, supervision information loss, and coverage loss; the cross entropy loss refers to minimizing the negative log-likelihood over all model parameters; the supervision information loss refers to the deviation between the semantic graphs and facts selected by the dual planning and the gold-standard semantic graphs and facts; and the coverage loss refers to the additional coverage loss computed when the coverage vectors of the semantic graphs and facts are calculated in step 4), which constrains the model from repeatedly attending to the same semantic graph or fact.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211394785.6A CN115510814B (en) | 2022-11-09 | 2022-11-09 | Chapter-level complex problem generation method based on dual planning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115510814A CN115510814A (en) | 2022-12-23 |
CN115510814B true CN115510814B (en) | 2023-03-14 |
Family
ID=84513613
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211394785.6A Active CN115510814B (en) | 2022-11-09 | 2022-11-09 | Chapter-level complex problem generation method based on dual planning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115510814B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115795018B (en) * | 2023-02-13 | 2023-05-09 | 广州海昇计算机科技有限公司 | Multi-strategy intelligent search question-answering method and system for power grid field |
CN116662582B (en) * | 2023-08-01 | 2023-10-10 | 成都信通信息技术有限公司 | Specific domain business knowledge retrieval method and retrieval device based on natural language |
CN117151069B (en) * | 2023-10-31 | 2024-01-02 | 中国电子科技集团公司第十五研究所 | Security scheme generation system |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111538838B (en) * | 2020-04-28 | 2023-06-16 | 中国科学技术大学 | Problem generating method based on article |
CN113065336B (en) * | 2021-05-06 | 2022-11-25 | 清华大学深圳国际研究生院 | Text automatic generation method and device based on deep learning and content planning |
-
2022
- 2022-11-09 CN CN202211394785.6A patent/CN115510814B/en active Active
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||