CN115510814A - Chapter-level complex problem generation method based on double planning - Google Patents
- Publication number: CN115510814A (application CN202211394785.6A)
- Authority: CN (China)
- Prior art keywords: semantic, fact, sentence, vector, planning
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F40/126 — Handling natural language data; Text processing; Use of codes for handling textual entities; Character encoding
- G06F40/30 — Handling natural language data; Semantic analysis
- Y02D10/00 — Climate change mitigation technologies in ICT; Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The invention discloses a chapter-level complex question generation method based on dual planning, which generates, from a given article and answer, a sequence of natural-language questions that can be answered by the given answer. The method first encodes the given article and answer with the pre-trained language model BERT to obtain an answer-aware semantic vector. Then, a semantic structure graph is constructed for each sentence in the given article; the graphs are encoded with a multi-head attention mechanism, and the related information among them is used to guide the generation of complex questions. Finally, a Transformer neural network is used as the decoder: at each decoding time step, dual planning, i.e., fact-level planning and semantic-graph-level planning, selects the semantic graph and the fact triple within it that require focused attention; integrating this information enhances the complexity of the generated questions and assists the generation of the current word.
Description
Technical Field
The invention relates to the field of computer technology, and in particular to a chapter-level complex question generation method based on dual planning.
Background
In recent years, with the rapid development of artificial intelligence, the Question Generation (QG) task has become a research focus. Question generation refers to automatically generating content-relevant, fluent natural-language questions from a range of data sources (e.g., text, pictures, knowledge bases). The question generation task of the present invention takes fact text and an answer as input. The task has broad application prospects: it can generate training data for question-answering tasks; actively pose questions in a dialogue system to improve the fluency of the dialogue; and build automatic tutoring systems that generate targeted questions from course materials to guide students' learning.
Current deep-learning-based QG methods mainly study the generation of simple questions; little work studies the generation of complex questions. A simple question contains only one entity-relation triple, whereas a complex question contains several entity-relation triples, and its answer can be obtained only through complex multi-hop reasoning. Generating complex questions has more practical significance than generating simple questions that contain only one entity-relation triple. For example, in education, students differ in their ability to absorb knowledge, and simple questions make it hard to test a student's true level; strong students need complex questions to yield genuine feedback. In addition, the performance of existing Question Answering (QA) systems on simple questions has reached a bottleneck, and complex questions are more beneficial for improving QA systems. Research on complex question generation therefore has practical value and application prospects. However, most existing complex question generation methods are based on knowledge graphs and cannot be directly applied to question generation over unstructured text, which lacks a clear logical structure. Text-based complex question generation generally takes multiple texts as input and does not consider generating complex questions from a single text. Moreover, these methods directly integrate the whole sentence sequence of a node when modeling effective information, without further screening the facts within the sentences, and a single sentence often contains several facts.
Therefore, existing chapter-level question generation methods lack overall planning and cannot select specific facts, which easily causes mismatches between entities and relations and thus harms the factual correctness of the generated questions. Moreover, sentences contain other redundant information, so noise may be introduced.
Therefore, the invention provides a dual-planning-based chapter-level question generation model: a semantic structure graph is constructed for each sentence of the text, and dual planning (fact-level planning and semantic-graph-level planning) precisely locates the information that requires focused attention at each decoding time step. Specifically, during decoding, the semantic structure graph requiring attention is selected first, the fact triple requiring attention is then determined within it, and integrating this fact triple information enhances the complexity of the generated question.
Disclosure of Invention
The technical problem to be solved by the invention is that most existing complex question generation methods construct a single semantic graph and ignore the rich factual information contained in individual sentences; their lack of overall planning means specific facts cannot be selected, and entities and relations are easily mismatched, harming the factual correctness of the generated questions.
The technical scheme adopted by the invention to solve this problem is a chapter-level complex question generation method based on dual planning. The method first encodes the given article and answer with BERT to obtain an answer-aware semantic vector. Then, a semantic structure graph is constructed for each sentence in the given article; the graphs are encoded with a multi-head attention mechanism, and the related information among them is used to guide the generation of complex questions. Finally, a Transformer decoder generates the complex question: at each decoding time step, dual planning (fact-level planning and semantic-graph-level planning) selects the semantic graph and the fact triple within it that require focused attention, and integrating this information enhances the complexity of the generated question and assists the generation of the current word.
The chapter-level complex question generation method based on dual planning disclosed by the invention comprises the following steps:
1) Encode the given article and answer with BERT to obtain an answer-aware text vector representation.
2) For each sentence in the given article, first process it with an adaptive cross-sentence coreference resolution technique, then construct a fine-grained semantic structure graph with a memory-aware semantic graph construction method.
3) For the fine-grained semantic structure graph obtained in step 2), treat the edges in the graph as nodes; first obtain the vector representation of each node through a multi-head-attention graph encoder, then the vector representation of each single fact, and finally the vector representation of the whole graph.
4) Feed the answer-aware text vector representation obtained in step 1) into a Transformer model for decoding. At each decoding time step, select the semantic graph and the fact triples within it that require focused attention based on dual planning (fact-level planning and semantic-graph-level planning), assisting the generation of the current word.
5) Design a loss function and train the question generation model through multiple iterations.
As a further improvement of the invention, in step 1), BERT encodes the given text and answer, with input of the form [CLS] text [SEP] answer [SEP]. Specifically, the text sequence and the answer are concatenated, with the separator [SEP] inserted between them to separate text from answer, and the specific classification identifier [CLS] inserted at the beginning. After BERT's pre-training process, the classification identifier learns a representation that fuses the text and the answer, denoted by the vector C.
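The splicing described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: tokenization is abstracted to whitespace splitting, whereas a real pipeline would use BERT's WordPiece tokenizer, and the function name is hypothetical.

```python
# Sketch of the BERT-style input construction: [CLS] text [SEP] answer [SEP].
# Whitespace "tokenization" stands in for WordPiece.
def build_bert_input(text: str, answer: str) -> list:
    text_tokens = text.split()
    answer_tokens = answer.split()
    return ["[CLS]"] + text_tokens + ["[SEP]"] + answer_tokens + ["[SEP]"]

tokens = build_bert_input("Marie Curie won two Nobel Prizes", "two")
# The [CLS] position is where the fused text-answer vector C would be read out.
```

In a real BERT pipeline the token-type (segment) ids would also distinguish the text segment from the answer segment.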
As a further improvement of the present invention, in step 2), a fine-grained semantic structure graph is constructed for each sentence in the given article. First, an adaptive cross-sentence coreference resolution technique replaces pronouns with the entities they refer to, which makes it convenient to merge entities during subsequent graph construction. In this technique, each entity mention needs to be resolved to a real-world entity. Each entity mention is first represented as a semantic vector. Coreference links between a query entity and a set of candidates are then predicted by feeding similarity features into a softmax layer, and the entity is linked to the candidate with the highest coreference probability.
As a further improvement of the present invention, in step 2), the adaptive cross-sentence coreference resolution technique predicts coreference links across sentences with an algorithm that traverses the sentence list and predicts links between the entities mentioned in the current sentence and candidate clusters computed over all previous sentences. The algorithm first orders the sentence list D; then, for each entity in the current sentence, it computes a candidate set of coreference clusters from the previous sentences, predicts the coreference link between the entity and each candidate, and finally updates the predicted clusters and recomputes the candidates for the next sentence.
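The traversal above can be sketched as a simple loop. This is an illustrative stand-in, not the patent's algorithm: `mentions_of` and `score` are hypothetical placeholders for the mention extractor and the similarity-feature softmax scorer, and the 0.5 link threshold is an assumption.

```python
# Sketch: for each mention in each sentence, score it against candidate
# clusters built from previous sentences; link to the best cluster if the
# score clears a threshold, otherwise start a new cluster.
def resolve(sentences, mentions_of, score, threshold=0.5):
    clusters = []  # each cluster is a list of coreferent mentions
    for sent in sentences:
        for mention in mentions_of(sent):
            if clusters:
                best = max(range(len(clusters)),
                           key=lambda k: score(mention, clusters[k]))
                if score(mention, clusters[best]) > threshold:
                    clusters[best].append(mention)
                    continue
            clusters.append([mention])
    return clusters

# Toy run with a trivial scorer: a mention matches a cluster whose first
# member is identical (pretend "She" was already resolved to "Alice").
sents = ["Alice slept", "She woke", "Bob ran"]
ments = {"Alice slept": ["Alice"], "She woke": ["Alice"], "Bob ran": ["Bob"]}
toy_score = lambda m, c: 1.0 if m == c[0] else 0.0
out = resolve(sents, ments.get, toy_score)
```

A real implementation would restrict the candidate set to similar previous sentences, as the next claim describes.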
As a further improvement of the invention, in step 2), when the adaptive cross-sentence coreference resolution technique predicts coreference links, the number of possible candidates for each entity can grow as the number of previous sentences increases, which greatly increases the computational cost. To reduce this cost, the invention considers only the previous sentences that are similar to the current sentence during the computation.
As a further improvement of the present invention, in step 2), after coreference resolution of each sentence, entity-relation triples are extracted from the sentence with the memory-aware semantic graph construction method, so as to construct the semantic graph. In this method, the invention uses iterative storage to keep the extraction results generated in each round in a memory, so that the next decoding iteration can access all previous extractions. Specifically, a first extraction result is generated by feeding the sentence into a sequence-to-sequence architecture; the extraction result is then concatenated with the source sentence and fed into the architecture again to generate a new extraction; this process repeats until an EndOfExtractions symbol is generated, indicating that the extraction process has finished.
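The memory-aware extraction loop can be sketched as follows. The `seq2seq` callable is a hypothetical stand-in for the trained sequence-to-sequence model; only the loop structure reflects the claim.

```python
# Sketch of memory-aware iterative extraction: each round's output is
# appended to the model input so the next round conditions on all previous
# extractions, stopping at the EndOfExtractions marker.
END = "EndOfExtractions"

def extract_all(sentence, seq2seq, max_rounds=10):
    memory = []
    for _ in range(max_rounds):
        model_input = sentence + " " + " ".join(memory)
        triple = seq2seq(model_input)
        if triple == END:
            break
        memory.append(triple)
    return memory

# Toy stand-in model: emits two triples, then the end marker.
script = iter(["(Curie; won; Nobel Prize)", "(Curie; born-in; Warsaw)", END])
triples = extract_all("Curie, born in Warsaw, won the Nobel Prize.",
                      lambda _inp: next(script))
```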
As a further improvement of the present invention, in step 2), the memory-aware semantic graph construction method uses a sequence-to-sequence model. To train this model, the invention needs a set of sentence-extraction pairs as training data. Manually constructed data sets work well but are time- and labor-consuming and cannot be built at scale. Therefore, the invention proposes a method for automatically constructing a sentence-extraction-pair data set. In general, the automatic construction has two steps: first, all pooled extraction results are sorted in descending order of the confidence output by their original systems; then, training data are constructed according to the model's input and output format. However, simply aggregating all extraction results is infeasible, for the following reasons: 1) No calibration: the confidence scores assigned by different systems are not calibrated to comparable scales. 2) Redundant extractions: besides full repetitions, multiple systems produce similar extractions with little marginal utility. 3) Erroneous extractions: pooling inevitably contaminates the data and amplifies false instances, forcing the downstream open information extraction system to learn poor-quality extractions. To solve these problems, the invention uses a scoring-filtering framework to obtain high-quality extraction results: the pooled extractions are scored first, and generally, good (correct, informative) extractions receive higher values than bad (incorrect) and redundant ones; redundant data are then filtered out of the extraction results. Through the scoring-filtering framework, high-quality fact triples are obtained and the semantic graph can be constructed.
As a further improvement of the present invention, in step 3), when the semantic structure graph is encoded, the edges in the graph are also encoded as nodes. For a given semantic structure graph, pre-trained word vectors first initialize its node embedding vectors. Then, to capture the semantic relations between nodes, a relation-enhanced graph Transformer encodes the nodes. It uses a relation-enhanced multi-head attention mechanism to obtain the embedding vector of each node, so that the encoding of each node in the semantic structure graph contains not only the current node's information but also that of the other nodes; that is, the relations between the current node and the others are preserved. Finally, all node vectors in the semantic structure graph are fed into a fully connected feed-forward network (FFN) to obtain the final node semantic representation vectors, with residual connections used to mitigate the degradation problem in deep learning. After the node semantic representation vectors are obtained, the node vectors belonging to the same fact triple in the graph are fed into an average pooling layer to obtain the semantic vector representation of that fact triple. Similarly, when computing the vector representation of the i-th semantic structure graph, all the fact triple representation vectors contained in the graph are fed into an average pooling layer to obtain the semantic vector representation of the semantic structure graph.
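The pooling hierarchy (node vectors → fact vectors → graph vector) can be sketched in a few lines. The vectors here are toy values; a real model would pool learned node embeddings.

```python
# Sketch of the two-level average pooling: nodes of one fact triple are
# mean-pooled into a fact vector, and fact vectors are mean-pooled into
# the graph vector.
def mean_pool(vectors):
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]

# Two facts, each built from three node vectors (head, relation, tail).
fact1_nodes = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
fact2_nodes = [[3.0, 3.0], [0.0, 0.0], [0.0, 3.0]]
fact_vecs = [mean_pool(fact1_nodes), mean_pool(fact2_nodes)]
graph_vec = mean_pool(fact_vecs)
```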
As a further improvement of the present invention, in step 4), a question is generated using a Transformer as the decoder based on the encoding results of the text and the semantic structure graphs. At each decoding time step, dual planning (fact-level planning and semantic-graph-level planning) selects the semantic graph and the fact triple within it that require focused attention, assisting the generation of the current word. Specifically, a semantic structure graph is selected first, then the relevant fact triple is selected from it, and finally the decoder's hidden state is updated based on the text vector and the selected fact triple to generate the current word.
The semantic-graph-level planning aims, at each decoding time step, to select through an attention mechanism the semantic structure graph that currently requires focused attention, based on the text semantic vector C and the words generated in previous time steps, yielding an attention-based semantic structure graph representation. This attention-based representation is then concatenated with the text semantic vector C, the probability of each subgraph is computed through a softmax layer, and the subgraph with the highest probability is selected to guide the generation of the current question.
The fact-level planning aims, at each decoding time step, to select through an attention mechanism the fact triple that currently requires focused attention, based on the text semantic vector C, the words generated in previous time steps, and the selected semantic structure graph, yielding attention-based representations of the fact triples in the k-th semantic structure graph. As in the semantic-graph-level planning, the attention-based fact triple representation is concatenated with the text semantic vector C, the probability of each fact triple is computed through a softmax layer, and the fact triple with the highest probability is selected to guide the generation of the current question.
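The two-stage selection described above can be sketched as nested softmax argmax steps. The raw scores stand in for the attention-plus-context features; this is an illustration of the control flow, not the patent's scoring functions.

```python
# Sketch of dual planning: softmax over graph scores picks the semantic
# graph, then softmax over that graph's fact scores picks the fact triple.
import math

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    z = sum(es)
    return [e / z for e in es]

def dual_plan(graph_scores, fact_scores_per_graph):
    g_probs = softmax(graph_scores)
    g = g_probs.index(max(g_probs))      # semantic-graph-level planning
    f_probs = softmax(fact_scores_per_graph[g])
    f = f_probs.index(max(f_probs))      # fact-level planning
    return g, f

g, f = dual_plan([0.2, 1.5, 0.1], [[0.3], [0.9, 2.1, 0.4], [0.5]])
```

The selected pair (g, f) would then condition the decoder's hidden-state update for the current word.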
As a further improvement of the present invention, in step 5), the loss function consists of three parts: cross-entropy loss, supervision-information loss, and coverage loss. The cross-entropy loss minimizes the negative log-likelihood over all model parameters. The supervision-information loss is the deviation between the semantic graph and fact selected by the dual planning and the gold-standard semantic graph and fact. The coverage loss is computed additionally when the coverage vectors of the semantic graphs and facts are computed in step 4), constraining the model from repeatedly attending to the same semantic graph or fact.
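The three-part objective can be sketched numerically. This is a simplified illustration under stated assumptions: the supervision term is reduced to a 0/1 penalty and the coverage term follows the common min(coverage, attention) formulation, which the patent does not spell out; all inputs and the weight are toy values.

```python
# Sketch of the combined loss: cross-entropy over gold tokens, a
# supervision penalty on the planner's choice, and a coverage penalty on
# repeated attention to the same graph/fact.
import math

def total_loss(token_probs, plan_correct, attn_history, cov_weight=1.0):
    ce = -sum(math.log(p) for p in token_probs)   # cross-entropy loss
    sup = 0.0 if plan_correct else 1.0            # supervision-information loss
    coverage, cov = {}, 0.0
    for item, a in attn_history:                  # coverage loss
        cov += min(coverage.get(item, 0.0), a)    # penalize re-attending
        coverage[item] = coverage.get(item, 0.0) + a
    return ce + sup + cov_weight * cov

loss = total_loss([0.5, 0.25], True, [("g1", 0.9), ("g1", 0.8), ("g2", 0.3)])
```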
Beneficial effects:
Compared with the prior art, the invention has the following advantages: 1) Existing question generation methods construct a semantic graph only at the chapter level and easily omit the rich factual information contained in individual sentences. The invention constructs a semantic structure graph for each sentence of the given article, so the facts in each sentence can be captured comprehensively and accurately, providing strong data support for generating complex questions. 2) Existing methods lack overall planning and cannot select specific facts, which easily causes mismatches between entities and relations and thus harms the factual correctness of the questions. The invention uses dual planning: during decoding, semantic-graph-level planning and fact-level planning select the semantic graph and the fact triple within it that require focused attention, and integrating this information to assist the generation of the current word ensures that the generated relations match their entities, improving the factual correctness of the questions.
Experimental analysis shows that the dual-planning-based chapter-level complex question generation method proposed here improves the factual correctness of generated complex questions and enhances the overall question generation effect.
Drawings
FIG. 1 is a schematic diagram of the basic process of the present invention;
FIG. 2 is a diagram of a model framework of the present invention;
FIG. 3 is a diagram of the dual-planning-based decoding implementation of the present invention.
Detailed Description
The invention is further described with reference to the following examples and the accompanying drawings.
Example (b): the invention discloses a chapter-level complex question generation method based on dual planning, which comprises the following steps: 1) The given article and answer are encoded using BERT to obtain an answer-aware text vector representation. BERT is based on a bidirectional Transformer structure, uses a masked language model to achieve integrated feature fusion, can model polysemy, and generates deep bidirectional language representations. Therefore, the invention adopts BERT encoding, with input of the specific form [CLS] text [SEP] answer [SEP].
Specifically, the text sequence and the answer are concatenated, with the separator [SEP] inserted between them to separate text from answer, and the specific classification identifier [CLS] inserted at the beginning. After BERT's pre-training process, the classification identifier learns a representation that fuses the text and the answer, denoted by the vector C.
2) For each sentence in the given article, the sentence is first processed with the adaptive cross-sentence coreference resolution technique, and then a fine-grained semantic structure graph is constructed with the memory-aware semantic graph construction method. Constructing the semantic structure graphs makes the semantic information among different entities in the text explicit, which facilitates selecting appropriate information to integrate into the question during decoding and assists the generation of complex questions. Because a single text is long, this method constructs a semantic structure graph for each sentence of the text separately, which helps capture semantic information more accurately. For each sentence, the adaptive cross-sentence coreference resolution technique first replaces pronouns with the entities they refer to, which makes it convenient to merge entities during subsequent graph construction. In this technique, each entity mention needs to be resolved to a real-world entity. Each entity mention is defined as a pair consisting of an entity and the set of events that entity participates in, and is first represented as a semantic vector: the entity span is fed into BERT to obtain an initial vector representation, each event is represented as a vector in the same way, and the event set is fed into a BiLSTM followed by a mean-pooling layer to obtain a vector representation of the event set. Finally, the initial vector representation of the entity is combined with the vector representation of the event set to obtain the final semantic representation vector of the entity mention.
Suppose P is the antecedent coreference cluster of a set of related entities. The invention computes a candidate cluster representation for each coreferent antecedent set P by incrementally combining sentence-level information with word-level information, where the sentence-level information is the CLS vector (obtained from BERT) of the sentence containing the mention and carries the sentence's semantic information. The combination is computed with learned parameters, and all the coreferent antecedents in the set P are then averaged to obtain the candidate cluster representation.
Coreference links between the query entity and the set of candidates are then predicted by feeding similarity features into a softmax layer. Given the set of candidate representations for a query entity, the invention first computes the similarity between each candidate and the entity using cosine similarity and multi-view cosine similarity. These similarity features are then combined with the difference and the dot product of the candidate and the query to obtain a final characterization. Then, for every candidate, the probability that the query entity is coreferent with it is computed.
To predict coreference links across sentences, the invention designs an algorithm that traverses the sentence list and predicts coreference links between entities mentioned in the current sentence and candidate clusters computed over all previous sentences. The algorithm first arbitrarily orders the sentence list D; then, for each entity in the current sentence, it computes a candidate set of coreference clusters from the previous sentences, predicts the coreference link between the entity and each candidate, and finally updates the predicted clusters and recomputes the candidates for the next sentence.
When the adaptive cross-sentence coreference resolution technique predicts coreference links, the number of possible candidates for each entity can grow as the number of previous sentences increases, which greatly increases the computational cost. To reduce this cost, the invention considers only the previous sentences similar to the current sentence, where sentences with the same topic are considered similar. During training, standard entity clusters are used to compute the candidates, together with standard sentence topic clusters; during inference, in contrast, the currently predicted coreference clusters are used to compute the candidates, and the predicted topic clusters are computed using K-means. The model is trained by minimizing the cross-entropy loss in batches: all M entities in a single sentence form a batch, and the loss is computed after M sequential predictions. After coreference resolution of each sentence, the memory-aware semantic graph construction method extracts triples of the form (head entity, relation, tail entity) from the sentence. The head and tail entities correspond to the subject and object, respectively, and the relation corresponds to the predicate connecting them. In the memory-aware semantic graph construction method, the invention uses iterative storage to keep the extraction results generated in each round in a memory, so that the next decoding iteration can access all previous extractions.
Specifically, a first extraction result is generated by feeding the sentence into the sequence-to-sequence architecture; the extraction result is then concatenated with the source sentence and fed into the architecture again to generate a new extraction; this process repeats until an EndOfExtractions symbol is generated, indicating that the extraction process has finished. Because the memory-aware semantic graph construction method uses a sequence-to-sequence model, the invention needs a set of sentence-extraction pairs as training data. Manually constructed data sets work well but are time- and labor-consuming and cannot be built at scale. Therefore, the invention proposes a method for automatically constructing a sentence-extraction-pair data set. In general, the automatic construction has two steps: first, all pooled extraction results are sorted in descending order of the confidence output by their original systems; then, training data are constructed according to the model's input and output format. However, simply aggregating all extraction results is infeasible, for the following reasons: 1) No calibration: the confidence scores assigned by different systems are not calibrated to comparable scales. 2) Redundant extractions: besides full repetitions, multiple systems produce similar extractions with little marginal utility. 3) Erroneous extractions: pooling inevitably contaminates the data and amplifies false instances, forcing the downstream open information extraction system to learn poor-quality extractions. To solve these problems, the invention uses a scoring-filtering framework to obtain high-quality extraction results.
Scoring: the invention scores the pooled extractions with a model pre-trained on a random bootstrap data set, which is generated by taking, for each sentence, a random extraction from any of the bootstrap systems being aggregated. The model assigns each extraction in the pool a score based on its confidence value; generally, good (correct, informative) extractions receive higher values than bad (incorrect) and redundant ones. Filtering: redundant data are then filtered out of the extraction results. Given a set of ranked extraction results, the invention selects the subset of extractions with the best confidence scores (assigned by the random bootstrap model) and the least similarity to the other selected extractions. Therefore, the invention constructs a complete weighted graph over all the ranked extraction results: each node corresponds to one extraction; each pair of nodes is connected by an edge whose weight represents the similarity between the two corresponding extractions; and each node is assigned a score equal to the confidence given by the random bootstrap model. The best subgraph is then selected as the high-quality extraction result, while the other nodes are treated as redundant data and automatically filtered out. The selection is formalized as an optimization over this graph.
where the symbols denote the nodes in the graph and the weight score between a pair of nodes, respectively. The first term of the formula is the sum of the significance of all selected triples, and the second term is the redundant information between those triples. If the graph has n nodes, the objective may be set as:
where the node scores form a vector, the pairwise weights form a symmetric matrix, and a binary decision vector indicates whether each particular node belongs to the selected subgraph. Through the scoring-filtering framework, high-quality fact triples can be obtained; finally, a semantic structure graph is constructed with entities as nodes and relations as edges connecting two entities.
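The subgraph selection trade-off (total confidence of the selected extractions minus their pairwise similarity) can be illustrated with a brute-force sketch over a small pool; the scores and similarity matrix below are illustrative, and the patent's formula expresses the same trade-off in matrix form.

```python
import itertools

def best_subset(scores, sim):
    """Brute-force the scoring-filtering objective: sum of confidence
    scores of the selected extractions minus the pairwise similarity
    (redundancy) among them. Only feasible for small pools; shown here
    to make the objective concrete."""
    n = len(scores)
    best, best_val = set(), float("-inf")
    for r in range(1, n + 1):
        for subset in itertools.combinations(range(n), r):
            val = sum(scores[i] for i in subset)
            val -= sum(sim[i][j] for i, j in itertools.combinations(subset, 2))
            if val > best_val:
                best, best_val = set(subset), val
    return best, best_val
```

With two near-duplicate extractions (high mutual similarity), the redundant one is filtered out even though its confidence is high.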
3) For the fine-grained semantic structure graph finally obtained in step 2), the edges in the graph are also treated as nodes, and the vector representation of the whole graph is obtained through a multi-head graph attention encoder. Specifically, for a given semantic structure graph, the node embedding vectors are first initialized with pre-trained word vectors. Then, in order to capture the semantic relations between nodes, the invention adopts a relation-enhanced graph Transformer to encode the nodes. The method uses a relation-enhanced multi-head attention mechanism to obtain the embedding vector of each node, whose dimension is the node embedding size; the calculation formula is as follows:
where the projection matrices are model parameters. The function of the multi-head attention mechanism is that, when each node in the semantic structure graph is encoded, the result contains not only the encoding information of the current node but also the information of the other nodes in the graph; that is, the relation between the current node and the other nodes is preserved. This process is formulated as follows:
as can be seen from the formula, the key point of the multi-head attention mechanism is to integrate the semantic relationship between nodes into the query vectorSum key vectorIn (1). Wherein, the first and the second end of the pipe are connected with each other,are respectively nodesThe shortest relation path betweenThe coding of (2). The weaveThe code result is obtained by adding the embedding vectors of all the relation nodes in the path.
Finally, all the node vectors in the semantic structure graph are input into a fully connected feed-forward network (FFN) to obtain the final node semantic representation vectors, and residual connections are adopted to alleviate the degradation problem in deep learning. The calculation formula is as follows:
where the weight matrices and bias terms are all trainable parameters, and the feed-forward network is a linear network using the GELU activation function.
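The residual feed-forward step can be sketched as follows, using the standard tanh approximation of GELU; the two-layer shape of the FFN is an assumption based on the usual Transformer design, since the patent's formula is elided.

```python
import numpy as np

def gelu(x):
    # Tanh approximation of the GELU activation function.
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def ffn_with_residual(h, W1, b1, W2, b2):
    """Position-wise feed-forward network with a residual connection,
    producing the final node semantic representation vectors."""
    return h + (gelu(h @ W1 + b1) @ W2 + b2)
```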
After the node semantic representation vectors are obtained, the node vectors belonging to the same fact triple in the graph are input into an average pooling layer to obtain the semantic vector representation of that fact triple, i.e., of the i-th fact triple in the semantic structure graph. Similarly, when computing the representation vector of each semantic structure graph, all the fact triple representation vectors contained in the graph are input into the average pooling layer to obtain the semantic vector representation of the semantic structure graph. The calculation formula is as follows:
where the average pooling function is applied first to all the node embedding vectors in the i-th fact triple of the k-th semantic structure graph, and then to all the fact triple vectors in the k-th semantic structure graph.
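The two-level average pooling can be sketched directly; the node dimension of 8 and two triples of three nodes each (subject, relation, object) are illustrative.

```python
import numpy as np

def mean_pool(vectors):
    # Average pooling layer: mean over the first axis.
    return np.mean(vectors, axis=0)

rng = np.random.default_rng(0)
# Node vectors for two fact triples, three nodes each.
triple_nodes = [rng.random((3, 8)), rng.random((3, 8))]
# Level 1: pool node vectors within a triple -> one vector per fact.
triple_vecs = [mean_pool(nodes) for nodes in triple_nodes]
# Level 2: pool triple vectors -> one vector per semantic structure graph.
graph_vec = mean_pool(np.stack(triple_vecs))
```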
4) The answer-aware text vector representation obtained in step 1) is sent into a Transformer model for decoding. At each decoding time step, the semantic graph and the fact triples within it that require attention are selected based on dual planning (fact-level planning and semantic-graph-level planning) to assist the generation of the current word. As shown in FIG. 3, specifically, a semantic structure graph is first selected; then the relevant fact triples are selected from that semantic structure graph; finally, the hidden state of the decoder is updated based on the text vector and the selected fact triples, and the current word is generated. The calculation process is as follows:
the semantic graph level planning aims at selecting a semantic structure graph needing important attention at each decoding time step through an attention mechanism based on a text semantic vector C and words generated at previous time steps to obtain a semantic structure graph representation based on attentionIn order to prevent the decoder from repeatedly selecting the same semantic graph, the invention integrates an overlay mechanism to encourage the decoder to overlay all semantic structure graphs when generating words. The calculation process is as follows:
where the two terms denote, respectively, the attention to and the coverage of each semantic structure graph, and the remaining symbols are all model parameters.
At each time step of model decoding, a coverage loss is calculated for the selected semantic graph in the following manner: the attention-based semantic structure graph representation is then spliced with the text semantic vector C, the probability of each subgraph is calculated through a softmax layer, and the subgraph with the highest probability is selected to guide the generation of the current question.
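One semantic-graph-level planning step with the coverage mechanism can be sketched as follows. The coverage loss as the elementwise minimum of attention and accumulated coverage is a common formulation (See et al.'s pointer-generator style) assumed here, and the bilinear scoring function is likewise an illustrative choice, since the patent's formulas are elided.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def graph_level_step(graph_vecs, query, coverage, Wa):
    """One decoding step of semantic-graph-level planning: attend over
    graph vectors given the decoder query; a coverage vector accumulates
    past attention, and the coverage loss penalizes re-attending to
    graphs that are already covered."""
    scores = np.array([g @ Wa @ query for g in graph_vecs])
    attn = softmax(scores)
    cov_loss = np.minimum(attn, coverage).sum()
    context = attn @ np.stack(graph_vecs)  # attention-based graph representation
    new_coverage = coverage + attn         # update coverage for the next step
    return context, attn, new_coverage, cov_loss
```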
The goal of fact-level planning is, at each decoding time step, to select the fact triples that currently require attention through an attention mechanism based on the text semantic vector C, the words generated at previous time steps, and the selected semantic structure graph, obtaining an attention-based fact triple representation within the k-th semantic structure graph. Similar to semantic-graph-level planning, in order to prevent the decoder from repeatedly selecting the same fact triple, the present invention incorporates a coverage mechanism that encourages the decoder to cover all fact triples when generating words.
where the two terms denote, respectively, the attention to and the coverage of each fact triple, and the remaining symbols are all model parameters.
Similarly, at each time step of model decoding, a coverage loss is computed for the selected fact triples, as follows:
The attention-based fact triple representation is then spliced with the text semantic vector C, the probability of each fact triple is calculated through a softmax layer, and the fact triple with the highest probability is selected to guide the generation of the current question.
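The selection step shared by both planning levels (splice each attention-based representation with the text semantic vector C, score with softmax, pick the argmax) can be sketched as follows; the linear scoring weights W are an illustrative assumption.

```python
import numpy as np

def select_candidate(candidates, text_vec, W):
    """Concatenate each attention-based representation with the text
    semantic vector C, score with a linear layer plus softmax, and pick
    the highest-probability candidate to guide generation."""
    logits = np.array([np.concatenate([cand, text_vec]) @ W for cand in candidates])
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return int(np.argmax(probs)), probs
```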
5) A loss function is designed, and the question generation model is trained through multiple iterations. The loss function consists of three parts: cross-entropy loss, supervision information loss, and coverage loss. The cross-entropy loss refers to minimizing the negative log-likelihood over all model parameters; given a text D and an answer A, it is calculated as follows:
the invention also makes statistics for the semantic structure chart and fact triple supervision information selected in each step of reasoning process, at the same time, analyzes the question and answer, finds the answer and the entity involved in the question in the text, thereby determining the standard semantic chart and the fact triple supervision informationAnd fact triplets. At each time step of the problem generation, probability distribution of the semantic structure chart is generatedAnd probability distribution of fact tripletsAt this time, the semantic structure diagram is compared with the semantic structure diagram of the standard which should be selectedAnd fact tripletsMatching is performed and the corresponding losses are calculatedThe formula is as follows:
The coverage loss is calculated additionally when the coverage vectors of the semantic graphs and facts are computed in step 4). Through the coverage loss, the repeated selection of information in the same graph can be effectively avoided, because the more attention a graph has received in the past, the larger the loss incurred by selecting it again.
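The three loss terms might combine as sketched below; the supervision loss is written as the negative log-likelihood of the gold selection, and the weighting coefficients are illustrative assumptions since the patent does not state how the terms are weighted.

```python
import numpy as np

def supervision_loss(pred_probs, gold_index):
    # Negative log-likelihood of the gold semantic graph / fact triple
    # under the predicted selection distribution.
    return -np.log(pred_probs[gold_index])

def total_loss(ce, sup, cov, lam_sup=1.0, lam_cov=1.0):
    # Weighted sum of cross-entropy, supervision, and coverage losses;
    # the weights are illustrative, not taken from the patent.
    return ce + lam_sup * sup + lam_cov * cov
```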
This example was evaluated using the following criteria. For automatic evaluation: Bilingual Evaluation Understudy (BLEU) is used to evaluate the degree of overlap between the generated result and the reference; Metric for Evaluation of Translation with Explicit ORdering (METEOR) is used to evaluate the semantic relevance between the generated result and the reference; Recall-Oriented Understudy for Gisting Evaluation (ROUGE-L) evaluates the longest common subsequence between the generated result and the reference. For manual evaluation: fluency indicates how smooth the generated result reads; relevance evaluates the degree of correlation between the generated result and the given input text; complexity refers to whether the generated question is complex, which can be assessed by observing the number of clauses, the number of modifiers, and the number of multi-hop reasoning steps required to answer the question; correctness refers to whether the facts contained in the generated question are correct, i.e., whether the fact triples exist in the given source text and whether the entities and relations match.
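As a toy illustration of the overlap-based automatic metrics, here is a minimal unigram-precision variant of BLEU; real evaluations use the full BLEU with n-grams up to 4 and a brevity penalty, alongside METEOR and ROUGE-L.

```python
from collections import Counter

def bleu1(candidate, reference):
    """Minimal unigram-precision sketch of BLEU-1 (no brevity penalty):
    fraction of candidate tokens that also appear in the reference,
    with clipped counts."""
    cand, ref = candidate.split(), reference.split()
    overlap = Counter(cand) & Counter(ref)  # clipped unigram matches
    return sum(overlap.values()) / max(len(cand), 1)
```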
In order to verify the effect of the invention, automatic evaluation and manual evaluation are carried out on the common data sets SQuAD and MSMARCO. The experimental results are as follows:
Table 1 Automatic evaluation results of different methods on SQuAD;
table 2 automatic evaluation results of different methods on MSMARCO;
Table 3 Manual evaluation results of different methods on MSMARCO;
the method achieves optimal performance on a plurality of data sets, and is greatly improved compared with other methods.
The above examples are only preferred embodiments of the present invention. It should be noted that various modifications and equivalents can be made by those skilled in the art without departing from the spirit of the invention, and all such modifications and equivalents are intended to fall within the scope of the invention as defined in the claims.
Claims (10)
1. A chapter-level complex problem generation method based on dual planning, characterized by comprising the following steps:
1) Encoding a given article and answer by adopting the pre-trained language model BERT to obtain an answer-aware text vector representation,
2) For each sentence in the given article, preliminarily processing the sentence using an adaptive cross-sentence coreference resolution technique, and then constructing a fine-grained semantic structure graph by adopting a memory-aware semantic graph construction method,
3) For the fine-grained semantic structure graph finally obtained in step 2), treating the edges in the graph also as nodes; first obtaining the vector representation of each node through a multi-head graph attention encoder, then obtaining the vector representation of each single fact, and finally obtaining the vector representation of the whole graph,
4) Sending the answer-aware text vector representation obtained in step 1) into a Transformer model for decoding, and at each decoding time step, selecting the semantic graph requiring attention and the fact triples within it based on dual planning, namely fact-level planning and semantic-graph-level planning, to assist the generation of the current word,
5) Designing a loss function and training the question generation model through multiple iterations.
2. The method for generating chapter-level complex problems based on dual planning according to claim 1, wherein in step 1), BERT is adopted to encode the given text and answer. Specifically, the text sequence and the answer are spliced, a separator is inserted between them to separate the text from the answer, and a specific classification identifier is inserted at the beginning. After the pre-training process of BERT, the classification identifier learns the representation information of the fused text and answer and is denoted by a vector C.
3. The method for generating chapter-level complex problems based on dual planning according to claim 1, wherein in step 2), a fine-grained semantic structure graph is constructed for each sentence in the given article. First, an adaptive cross-sentence coreference resolution technique is adopted: pronouns are replaced by the entities they refer to, and these entities are merged during subsequent graph construction. In the adaptive cross-sentence coreference resolution technique, entity mentions are required to be resolved to real-world entities: each entity mention is first expressed as a semantic vector, and similarity features are then input into a softmax layer to predict the coreference links between a query entity and a set of candidates, and to predict, for each entity, the candidate with the greatest coreference probability.
4. The method for generating chapter-level complex problems based on dual planning according to claim 1, wherein in step 2), adaptive cross-sentence coreference resolution is applied; in order to predict coreference links across sentences, an algorithm traverses the sentence list and predicts the coreference links between the entities mentioned in the current sentence and the candidate clusters computed across all previous sentences. The algorithm first randomly orders the sentence list D; then, for each entity in each sentence, a candidate set is computed from the coreference sets of the previous sentences; the coreference link between the entity and each candidate is then predicted; finally, the set of predicted candidates is updated and new candidates are recomputed.
5. The method for generating chapter-level complex problems based on dual planning according to claim 4, wherein in step 2), when the adaptive cross-sentence coreference resolution technique is adopted, the number of possible candidates for each entity's coreference link prediction may grow with the number of previous sentences, increasing the computational cost; therefore, only the previous sentences most similar to the current sentence are considered in the calculation.
6. The method for generating chapter-level complex problems based on dual planning according to claim 1, wherein in step 2), after coreference resolution is performed on each sentence, entity-relation triples are extracted from the sentence by using the memory-aware semantic graph construction method to construct the semantic graph. In the memory-aware semantic graph construction method, an iterative memory stores the extraction results generated in each round so that the next decoding iteration can access all previous extractions: first, a first extraction result is generated by inputting the sentence into a sequence-to-sequence architecture; then, the extraction result is spliced with the source sentence and input into the sequence-to-sequence architecture again to generate a new extraction result; this process is repeated until the symbol EndOfExtractions is generated, which indicates that the extraction process is finished;
in step 2), the memory-aware semantic graph construction method uses a sequence-to-sequence model, and a scoring-filtering framework is used to obtain high-quality extraction results: the collected extraction results are scored first (generally, good extraction results obtain higher values than bad and redundant ones), and the redundant data in the extraction results are then filtered out; through the scoring-filtering framework, high-quality fact triples can be obtained, and the semantic graph is thereby constructed.
7. The method for generating chapter-level complex problems based on dual planning according to claim 1, wherein in step 3), when encoding the semantic structure graph, the edges in the graph are also encoded as nodes. For a semantic structure graph, pre-trained word vectors are used to initialize the node embedding vectors; then, in order to capture the semantic relations between nodes, a relation-enhanced graph Transformer is used to encode the nodes. The method uses a relation-enhanced multi-head attention mechanism to obtain the embedding vector of each node, so that when each node in the semantic structure graph is encoded, the result contains not only the encoding information of the current node but also the information of the other nodes in the graph, i.e., the relation between the current node and the other nodes is preserved. Finally, all the node vectors in the semantic structure graph are input into a fully connected feed-forward network (FFN) to obtain the final node semantic representation vectors, with residual connections used to alleviate the degradation problem in deep learning; the node vectors belonging to the same fact triple are then input into an average pooling layer to obtain the semantic vector representation of the fact triple, and all the fact triple vectors in the graph are averaged to obtain the semantic vector representation of the semantic structure graph.
8. The method for generating chapter-level complex problems based on dual planning according to claim 1, wherein in step 4), a Transformer is used as the decoder to generate a question based on the encoding results of the text and the semantic structure graphs. At each decoding time step, a semantic graph and the fact triples within it are selected based on dual planning, namely fact-level planning and semantic-graph-level planning, to assist the generation of the current word. Specifically, a semantic structure graph is first selected; then the relevant fact triples are selected from that semantic structure graph; finally, the hidden state of the decoder is updated based on the text vector and the selected fact triples, and the current word is generated. The goal of semantic-graph-level planning is, at each decoding time step, to select the semantic structure graph that currently requires attention through an attention mechanism based on the text semantic vector C and the words generated at previous time steps; the attention-based semantic structure graph representation is then spliced with the text semantic vector C, the probability of each subgraph is calculated through a softmax layer, and the subgraph with the highest probability is selected to guide the generation of the current question. The goal of fact-level planning is, at each decoding time step, to select the fact triples that currently require attention through an attention mechanism based on the text semantic vector C, obtaining the attention-based fact triple representation in the k-th semantic structure graph; the attention-based fact triple representation is then spliced with the text semantic vector C, the probability of each fact triple is calculated through the softmax layer, and the fact triple with the highest probability is selected to guide the generation of the current question.
9. The method for generating chapter-level complex problems based on dual planning according to claim 1, wherein in step 4), during the decoding of the generated question, a coverage mechanism is incorporated to encourage the decoder to cover all semantic structure graphs and all fact triples when generating words.
10. The method for generating chapter-level complex problems based on dual planning according to claim 1, wherein in step 5), the loss function is composed of three parts, i.e., cross-entropy loss, supervision information loss, and coverage loss, wherein the cross-entropy loss refers to minimizing the negative log-likelihood over all model parameters; the supervision information loss refers to the deviation between the semantic graph and facts selected by dual planning and the gold-standard semantic graph and facts; and the coverage loss is calculated additionally when computing the coverage vectors of the semantic graph and facts in step 4), so as to constrain the model from repeatedly attending to the same semantic graph or fact.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211394785.6A CN115510814B (en) | 2022-11-09 | 2022-11-09 | Chapter-level complex problem generation method based on dual planning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211394785.6A CN115510814B (en) | 2022-11-09 | 2022-11-09 | Chapter-level complex problem generation method based on dual planning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115510814A true CN115510814A (en) | 2022-12-23 |
CN115510814B CN115510814B (en) | 2023-03-14 |
Family
ID=84513613
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211394785.6A Active CN115510814B (en) | 2022-11-09 | 2022-11-09 | Chapter-level complex problem generation method based on dual planning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115510814B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115795018A (en) * | 2023-02-13 | 2023-03-14 | 广州海昇计算机科技有限公司 | Multi-strategy intelligent searching question-answering method and system for power grid field |
CN116662582A (en) * | 2023-08-01 | 2023-08-29 | 成都信通信息技术有限公司 | Specific domain business knowledge retrieval method and retrieval device based on natural language |
CN117151069A (en) * | 2023-10-31 | 2023-12-01 | 中国电子科技集团公司第十五研究所 | Security scheme generation system |
CN116824461B (en) * | 2023-08-30 | 2023-12-08 | 山东建筑大学 | Question understanding guiding video question answering method and system |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111538838A (en) * | 2020-04-28 | 2020-08-14 | 中国科学技术大学 | Question generation method based on article |
CN113065336A (en) * | 2021-05-06 | 2021-07-02 | 清华大学深圳国际研究生院 | Text automatic generation method and device based on deep learning and content planning |
-
2022
- 2022-11-09 CN CN202211394785.6A patent/CN115510814B/en active Active
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111538838A (en) * | 2020-04-28 | 2020-08-14 | 中国科学技术大学 | Question generation method based on article |
CN113065336A (en) * | 2021-05-06 | 2021-07-02 | 清华大学深圳国际研究生院 | Text automatic generation method and device based on deep learning and content planning |
Cited By (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115795018A (en) * | 2023-02-13 | 2023-03-14 | 广州海昇计算机科技有限公司 | Multi-strategy intelligent searching question-answering method and system for power grid field |
CN116662582A (en) * | 2023-08-01 | 2023-08-29 | 成都信通信息技术有限公司 | Specific domain business knowledge retrieval method and retrieval device based on natural language |
CN116662582B (en) * | 2023-08-01 | 2023-10-10 | 成都信通信息技术有限公司 | Specific domain business knowledge retrieval method and retrieval device based on natural language |
CN116824461B (en) * | 2023-08-30 | 2023-12-08 | 山东建筑大学 | Question understanding guiding video question answering method and system |
CN117151069A (en) * | 2023-10-31 | 2023-12-01 | 中国电子科技集团公司第十五研究所 | Security scheme generation system |
CN117151069B (en) * | 2023-10-31 | 2024-01-02 | 中国电子科技集团公司第十五研究所 | Security scheme generation system |
Also Published As
Publication number | Publication date |
---|---|
CN115510814B (en) | 2023-03-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109840287B (en) | Cross-modal information retrieval method and device based on neural network | |
CN108363743B (en) | Intelligent problem generation method and device and computer readable storage medium | |
CN107133211B (en) | Composition scoring method based on attention mechanism | |
CN115510814B (en) | Chapter-level complex problem generation method based on dual planning | |
He et al. | See: Syntax-aware entity embedding for neural relation extraction | |
CN110633730A (en) | Deep learning machine reading understanding training method based on course learning | |
CN112232087B (en) | Specific aspect emotion analysis method of multi-granularity attention model based on Transformer | |
CN111125333B (en) | Generation type knowledge question-answering method based on expression learning and multi-layer covering mechanism | |
CN115048447B (en) | Database natural language interface system based on intelligent semantic completion | |
CN114969304A (en) | Case public opinion multi-document generation type abstract method based on element graph attention | |
CN114429143A (en) | Cross-language attribute level emotion classification method based on enhanced distillation | |
CN113505583A (en) | Sentiment reason clause pair extraction method based on semantic decision diagram neural network | |
CN114254645A (en) | Artificial intelligence auxiliary writing system | |
Jeon et al. | Dropout prediction over weeks in MOOCs via interpretable multi-layer representation learning | |
CN113283488B (en) | Learning behavior-based cognitive diagnosis method and system | |
CN113378581A (en) | Knowledge tracking method and system based on multivariate concept attention model | |
Li et al. | Approach of intelligence question-answering system based on physical fitness knowledge graph | |
CN117235261A (en) | Multi-modal aspect-level emotion analysis method, device, equipment and storage medium | |
CN114943216B (en) | Case microblog attribute level view mining method based on graph attention network | |
CN111767388B (en) | Candidate pool generation method | |
Song | Distilling knowledge from user information for document level sentiment classification | |
CN110879838A (en) | Open domain question-answering system | |
CN117521812B (en) | Automatic arithmetic text question solving method and system based on variational knowledge distillation | |
CN114462380B (en) | Story ending generation method based on emotion pre-training model | |
CN116681087B (en) | Automatic problem generation method based on multi-stage time sequence and semantic information enhancement |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |