CN115510814B - Chapter-level complex problem generation method based on dual planning - Google Patents
- Publication number
- CN115510814B (application CN202211394785.6A)
- Authority
- CN
- China
- Prior art keywords
- semantic
- fact
- sentence
- vector
- graph
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06F40/126 — Character encoding (G—Physics; G06—Computing; Calculating or Counting; G06F—Electric digital data processing; G06F40/00—Handling natural language data; G06F40/10—Text processing; G06F40/12—Use of codes for handling textual entities)
- G06F40/30 — Semantic analysis (G06F40/00—Handling natural language data)
- Y02D10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management (Y02D—Climate change mitigation technologies in information and communication technologies)
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a chapter-level complex question generation method based on dual planning, which generates, from a given article and answer, a natural language question whose answer is the given answer. The method first encodes the given article and answer with the pre-trained language model BERT to obtain an answer-aware semantic vector. It then constructs a semantic structure graph for each sentence in the given article, encodes the graphs with a multi-head attention mechanism, and captures the relations among them to guide the generation of complex questions. Finally, a Transformer is used as the decoder to generate the complex question: at each decoding time step, dual planning — fact-level planning and semantic-graph-level planning — selects the semantic graph and the fact triple within it that deserve the most attention, and this information is fused in to increase the complexity of the generated question and assist generation of the current word.
Description
Technical Field
The invention relates to the field of computer technology, and in particular to a chapter-level complex question generation method based on dual planning.
Background
In recent years, with the rapid development of artificial intelligence, the Question Generation (QG) task has become a research focus. Question generation refers to automatically generating content-relevant, well-formed natural language questions from a range of data sources (e.g., text, pictures, knowledge bases). The question generation task of the present invention takes factual text and an answer as input. Question generation has broad application prospects: it can produce training data for question answering tasks; proactively pose questions in a dialogue system to improve conversational fluency; and support automatic tutoring systems that generate targeted questions from course materials to guide student learning.
Current deep-learning-based QG methods mainly study the generation of simple questions; little work addresses complex questions. A simple question contains only one entity-relation triple, whereas a complex question involves multiple entity-relation triples and its answer requires multi-hop reasoning. Generating complex questions has greater practical significance than generating simple ones. In education, for example, students differ in their ability to absorb knowledge, and a simple question cannot probe a strong student's true level; complex questions are needed to obtain genuine feedback. In addition, existing Question Answering (QA) systems have reached a performance bottleneck on simple questions, and complex questions are more useful for improving them. Research on complex question generation therefore has practical value and application prospects. However, most existing complex question generation methods are based on knowledge graphs and cannot be applied directly to unstructured text, which lacks an explicit logical structure. Text-based complex question generation typically takes multiple texts as input and does not consider generating complex questions from a single text. Moreover, when modeling useful information, these methods fuse in the full sentence of each node without further screening the facts within the sentence — even though a single sentence often contains several facts.
As a result, chapter-level question generation methods lack overall planning and cannot select specific facts, which easily mismatches entities and relations and harms the factual correctness of the generated questions. Sentences also contain redundant information that may introduce noise.
The invention therefore provides a chapter-level question generation model based on dual planning: a semantic structure graph is constructed for each sentence in the text, and dual planning (fact-level planning and semantic-graph-level planning) precisely locates the information that requires attention at each decoding time step. Specifically, during decoding the model first selects the semantic structure graph to attend to, then determines the fact triple within it to attend to, and fuses this fact-triple information in to increase the complexity of the generated question.
Disclosure of Invention
The technical problem to be solved by the invention is that most existing complex question generation methods construct a single semantic graph and ignore the rich factual information contained in individual sentences; the lack of overall planning means specific facts cannot be selected, entities and relations are easily mismatched, and the factual correctness of the generated questions suffers.
The technical solution adopted by the invention is a chapter-level complex question generation method based on dual planning. The method first encodes the given article and answer with BERT to obtain an answer-aware semantic vector. It then constructs a semantic structure graph for each sentence in the given article, encodes the graphs with a multi-head attention mechanism, and captures the relations among them to guide the generation of complex questions. Finally, Transformer decoding generates the complex question: at each decoding time step, dual planning (fact-level planning and semantic-graph-level planning) selects the semantic graph and the fact triple within it to attend to, and this information is fused in to increase the complexity of the generated question and assist generation of the current word.
The chapter-level complex question generation method based on dual planning disclosed by the invention comprises the following steps:
1) Encode the given article and answer with BERT to obtain an answer-aware text vector representation.
2) For each sentence in the given article, first preprocess the sentence with an adaptive cross-sentence coreference resolution technique, then construct a fine-grained semantic structure graph with a memory-aware semantic graph construction method.
3) For each fine-grained semantic structure graph obtained in step 2), treat the edges in the graph as nodes; first obtain the vector representation of each node with a multi-head-attention graph encoder, then the vector representation of each individual fact, and finally the vector representation of the whole graph.
4) Feed the answer-aware text vector representation from step 1) into a Transformer model for decoding. At each decoding time step, select the semantic graph and the fact triple within it to attend to based on dual planning (fact-level planning and semantic-graph-level planning), assisting generation of the current word.
5) Design a loss function and train the question generation model over multiple iterations.
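The input construction in step 1) can be sketched as follows. This is a simplified illustration: whitespace tokenization stands in for BERT's WordPiece tokenizer, and the function name is ours, not from the source.

```python
def build_bert_input(article: str, answer: str) -> list:
    """Concatenate article and answer with BERT-style special tokens.

    [CLS] is the classification token whose output learns a joint
    representation of text and answer (the vector C); [SEP] separates
    the two segments. Whitespace split stands in for WordPiece.
    """
    return ["[CLS]"] + article.split() + ["[SEP]"] + answer.split() + ["[SEP]"]

tokens = build_bert_input("Marie Curie won two Nobel Prizes", "two")
```

A real implementation would feed the resulting token sequence (with segment ids distinguishing text from answer) into a pre-trained BERT encoder and read off the [CLS] output vector.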
As a further improvement of the invention, in step 1), BERT encodes the given text and answer. The input takes the form [CLS] text [SEP] answer [SEP]: the text sequence and the answer are concatenated, the separator token [SEP] is inserted between them to separate text from answer, and the classification token [CLS] is inserted at the beginning. Through BERT's pre-training, the classification token learns a joint representation of the text and the answer, denoted by the vector C.
As a further improvement of the invention, in step 2), a fine-grained semantic structure graph is constructed for each sentence in the given article. First, an adaptive cross-sentence coreference resolution technique replaces each pronoun with the entity it refers to, facilitating entity fusion during subsequent graph construction. In this technique, entity mentions must be replaced with real-world entities: each entity mention is first represented as a semantic vector; a softmax layer over similarity features then predicts the coreference link between a query entity mention and a set of candidates, and the mention is linked to the candidate with the highest coreference probability.
As a further improvement of the invention, in step 2), the adaptive cross-sentence coreference resolution technique predicts cross-sentence coreference links with an algorithm that traverses the sentence list and predicts links between the entity mentions in the current sentence and the candidate clusters computed over all previous sentences. The algorithm first orders the sentence list D; then, for each entity mention in the current sentence, it computes a candidate set from the coreference clusters of the previous sentences, predicts the coreference link between the mention and each candidate, and finally updates the predicted clusters and recomputes the candidates.
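The traversal described above can be sketched as follows. This is a toy illustration: `corefer` stands in for the learned softmax link predictor, and — as in the text — candidates for each mention come only from clusters formed in previous sentences.

```python
def resolve_across_sentences(sentences, mentions_per_sentence, corefer):
    """Traverse sentences in order; link each mention of the current
    sentence to the first candidate cluster (built from previous
    sentences) accepted by the link predictor `corefer(mention, cluster)`.
    Unlinked mentions start new clusters."""
    clusters = []  # each cluster: list of coreferent mention strings
    for i, _ in enumerate(sentences):
        candidates = list(clusters)  # clusters from previous sentences only
        for mention in mentions_per_sentence[i]:
            linked = next((c for c in candidates if corefer(mention, c)), None)
            if linked is not None:
                linked.append(mention)   # update the predicted cluster
            else:
                clusters.append([mention])
    return clusters

# Toy link predictor: exact string match with any cluster member.
corefer = lambda m, cluster: m in cluster
sents = ["Curie won the prize.", "Curie discovered radium."]
mentions = [["Curie"], ["Curie", "radium"]]
clusters = resolve_across_sentences(sents, mentions, corefer)
```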
As a further improvement of the invention, in step 2), when the adaptive cross-sentence coreference resolution technique predicts coreference links, the number of possible candidates for each entity mention grows with the number of previous sentences, and the computational cost rises sharply. To reduce this cost, the invention considers only the previous sentences that are similar to the current one.
As a further improvement of the invention, in step 2), after each sentence has undergone coreference resolution, entity-relation triples are extracted from it with a memory-aware semantic graph construction method to build the semantic graph. The memory-aware method stores the extraction results produced in each round in an iterative memory, so that the next decoding iteration can access all previous extractions. Specifically, the sentence is fed into a sequence-to-sequence architecture to produce the first extraction; the extraction is then concatenated with the source sentence and fed into the architecture again to produce a new extraction, and the process repeats until the EndOfExtractions token is generated, which marks the end of extraction.
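The iterative, memory-aware extraction loop might look like this. This is a sketch: `extract_once` stands in for the sequence-to-sequence extractor, and the round cap is an added safeguard not mentioned in the source.

```python
def iterative_extract(sentence, extract_once, max_rounds=10):
    """Memory-aware iterative extraction: each round re-feeds the source
    sentence concatenated with all previous extractions, so the extractor
    can see its earlier output; stops when the EndOfExtractions marker
    is emitted."""
    memory = []
    for _ in range(max_rounds):
        inp = sentence + " " + " ".join(memory)
        out = extract_once(inp, memory)
        if out == "EndOfExtractions":
            break
        memory.append(out)
    return memory

# Fake extractor that yields two triples, then the end marker.
outputs = iter(["(Curie; won; the Nobel Prize)",
                "(Curie; discovered; radium)",
                "EndOfExtractions"])
facts = iterative_extract("Curie won the Nobel Prize and discovered radium.",
                          lambda inp, mem: next(outputs))
```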
As a further improvement of the invention, the memory-aware semantic graph construction method in step 2) uses a sequence-to-sequence model, whose training requires a set of sentence-extraction pairs. Manually constructing such a dataset works well but is time-consuming and labor-intensive and cannot reach large scale, so the invention proposes a method for automatically constructing a sentence-extraction dataset. In general this takes two steps: first, all extraction results are sorted in descending order of the confidence output by the original systems; then, training data is constructed according to the model's input-output format. Simply pooling all extraction results, however, is not feasible, because of the following problems: 1) No calibration: the confidence scores assigned by different systems are not calibrated to a comparable scale. 2) Redundant extractions: beyond outright repetition, multiple systems produce similar extractions with low marginal utility. 3) Erroneous extractions: pooling inevitably contaminates the data and amplifies wrong instances, forcing the downstream open information extraction system to learn low-quality extractions. To solve these problems, the invention uses a score-filter framework to obtain high-quality extractions: the pooled extraction results are scored first — in general, good (correct, informative) extractions receive higher scores than bad (incorrect) and redundant ones — and redundant data is then filtered out. Through this score-filter framework, high-quality fact triples are obtained, from which the semantic graph is constructed.
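A minimal sketch of the score-filter idea, assuming a simple greedy filter with Jaccard token similarity and an illustrative threshold; the actual framework uses a learned scorer and a graph-based subset selection.

```python
def score_filter(extractions, scores, sim_threshold=0.5):
    """Greedy score-then-filter over pooled triples: keep extractions in
    descending score order, dropping any that is too similar (Jaccard
    over tokens) to one already kept. Threshold is an illustrative choice."""
    def jaccard(a, b):
        ta, tb = set(" ".join(a).split()), set(" ".join(b).split())
        return len(ta & tb) / len(ta | tb)
    kept = []
    for ext, _ in sorted(zip(extractions, scores), key=lambda p: -p[1]):
        if all(jaccard(ext, k) < sim_threshold for k in kept):
            kept.append(ext)
    return kept

pool = [("Curie", "won", "Nobel Prize"),
        ("Curie", "won", "the Nobel Prize"),   # redundant variant
        ("Curie", "discovered", "radium")]
kept = score_filter(pool, [0.9, 0.6, 0.8])
```

The redundant, lower-scored variant is dropped while the two distinct, informative facts survive.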
As a further improvement of the invention, in step 3), when the semantic structure graph is encoded, its edges are encoded as nodes as well. For a given semantic structure graph, the node embedding vectors are first initialized with pre-trained word vectors. Then, to capture the semantic relations between nodes, a relation-enhanced graph Transformer encodes the nodes: a relation-enhanced multi-head attention mechanism produces an embedding vector for each node, so that the encoding of every node in the graph contains not only the node's own information but also that of the other nodes, i.e., the relations between the current node and the others are preserved. Finally, all node vectors in the graph are fed into a fully connected feed-forward network (FFN) to obtain the final node semantic representation vectors, with residual connections alleviating the degradation problem in deep learning. After the node semantic representation vectors are obtained, the node vectors belonging to the same fact triple are fed into an average pooling layer to obtain the semantic vector representation of that fact triple. Similarly, the vector representation of the i-th semantic structure graph is computed by feeding all fact-triple representation vectors contained in the graph into an average pooling layer.
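The two levels of average pooling described above can be illustrated as follows (pure-Python sketch; real node vectors would come from the relation-enhanced graph Transformer, not be handed in directly).

```python
def mean_pool(vectors):
    """Average a list of equal-length vectors elementwise."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def encode_graph(facts):
    """facts: list of fact triples, each given as a list of node embedding
    vectors (subject, relation, object — edges are treated as nodes).
    Returns (fact_vectors, graph_vector) via two levels of mean pooling,
    mirroring the average-pooling layers in the text."""
    fact_vecs = [mean_pool(nodes) for nodes in facts]
    return fact_vecs, mean_pool(fact_vecs)

facts = [[[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],   # one fact triple
         [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]]   # another
fact_vecs, graph_vec = encode_graph(facts)
```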
As a further improvement of the invention, in step 4), a Transformer decoder generates the question from the encoding results of the text and the semantic structure graphs. At each decoding time step, dual planning (fact-level planning and semantic-graph-level planning) selects the semantic graph and the fact triple within it to attend to, assisting generation of the current word. Specifically, a semantic structure graph is selected first, then the relevant fact triple is selected from it, and finally the decoder's hidden state is updated based on the text vector and the selected fact triple, generating the current word.
Semantic-graph-level planning selects, at each decoding time step, the semantic structure graph that currently needs attention, using an attention mechanism over the text semantic vector C and the words generated at previous time steps to obtain an attention-based semantic structure graph representation. This representation is then concatenated with the text semantic vector C, a softmax layer computes the probability of each subgraph, and the subgraph with the highest probability is selected to guide generation of the current question.
Fact-level planning selects, at each decoding time step, the fact triple that currently needs attention, using an attention mechanism over the text semantic vector C, the words generated at previous time steps, and the selected semantic structure graph to obtain attention-based fact-triple representations within the k-th semantic structure graph. As in semantic-graph-level planning, the attention-based fact-triple representations are concatenated with the text semantic vector C, a softmax layer computes the probability of each fact triple, and the fact triple with the highest probability is selected to guide generation of the current question.
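Dual planning's two-stage softmax selection can be sketched as follows. The raw scores stand in for the concatenated attention/context features described above; the function and its inputs are illustrative.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]

def dual_plan(graph_scores, fact_scores_per_graph):
    """Two-level selection: pick the semantic graph with the highest
    softmax probability, then the fact triple with the highest softmax
    probability inside the chosen graph."""
    g_probs = softmax(graph_scores)
    g = max(range(len(g_probs)), key=g_probs.__getitem__)
    f_probs = softmax(fact_scores_per_graph[g])
    f = max(range(len(f_probs)), key=f_probs.__getitem__)
    return g, f

# Two sentence graphs; the second graph and its second fact score highest.
result = dual_plan([0.1, 2.0], [[1.0], [0.2, 3.0]])
```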
As a further improvement of the invention, in step 5), the loss function consists of three parts: cross-entropy loss, supervision loss, and coverage loss. The cross-entropy loss minimizes the negative log-likelihood over all model parameters. The supervision loss measures the deviation between the semantic graph and fact selected by dual planning and the gold semantic graph and fact. The coverage loss is computed additionally from the coverage vectors of the semantic graphs and facts in step 4), constraining the model so that it does not repeatedly attend to the same semantic graph or fact.
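A rough sketch of the three-part loss. The weighting hyperparameters `lam_sup` and `lam_cov`, and the 0/1 form of the supervision penalty, are illustrative assumptions not specified in the source.

```python
import math

def total_loss(word_probs, graph_match, fact_match, attn_history, attn_now,
               lam_sup=1.0, lam_cov=1.0):
    """Three-part training loss, per the description above:
    - cross-entropy: negative log-likelihood of the gold words;
    - supervision: penalty when dual planning picks the wrong graph/fact
      (simplified here to a 0/1 penalty per selection);
    - coverage: sum of min(cumulative attention, current attention),
      discouraging repeated attention to the same graph or fact."""
    ce = -sum(math.log(p) for p in word_probs)
    sup = (0.0 if graph_match else 1.0) + (0.0 if fact_match else 1.0)
    cov = sum(min(h, a) for h, a in zip(attn_history, attn_now))
    return ce + lam_sup * sup + lam_cov * cov

# Perfect prediction, correct selections, no attention overlap -> zero loss.
loss = total_loss([1.0], True, True, [0.0, 0.0], [0.5, 0.5])
```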
Beneficial effects:
Compared with the prior art, the invention has the following advantages. 1) Existing question generation methods construct a semantic graph only at the chapter level and easily miss the rich factual information contained in sentences. The invention constructs a semantic structure graph for each sentence in the given article, so the facts in each sentence can be captured comprehensively and accurately, providing strong data support for generating complex questions. 2) Existing methods lack overall planning and cannot select specific facts, which easily mismatches entities and relations and harms the factual correctness of the questions. The invention uses dual planning: during decoding, semantic-graph-level planning and fact-level planning select the semantic graph and the fact triple within it that need attention, and fusing this information in to assist generation of the current word keeps the generated relations matched to their entities, improving the factual correctness of the questions.
Experimental analysis shows that the chapter-level complex question generation method based on dual planning proposed by the invention improves the factual correctness of the generated complex questions and enhances the overall quality of question generation.
Drawings
FIG. 1 is a schematic diagram of the basic process of the present invention;
FIG. 2 is a diagram of a model framework of the present invention;
FIG. 3 is a diagram of the dual-planning-based decoding implementation of the present invention.
Detailed Description
The invention is further described with reference to the following examples and the accompanying drawings.
Example: The invention discloses a chapter-level complex question generation method based on dual planning, comprising the following steps. 1) The given article and answer are encoded with BERT to obtain an answer-aware text vector representation. BERT is based on a bidirectional Transformer structure; by adopting a masked language model it achieves integrated feature fusion, can model polysemy, and produces deep bidirectional language representations. The invention therefore adopts BERT encoding, with input of the form [CLS] text [SEP] answer [SEP].
Specifically, the text sequence and the answer are concatenated, the separator token [SEP] is inserted between them to separate text from answer, and the classification token [CLS] is inserted at the beginning. Through BERT's pre-training, the classification token learns a joint representation of the text and the answer, denoted by the vector C.
2) For each sentence in the given article, the sentence is first preprocessed with an adaptive cross-sentence coreference resolution technique, and a fine-grained semantic structure graph is then constructed with a memory-aware semantic graph construction method. The semantic structure graph clearly exposes the semantic information among the different entities in the text, making it easy to select suitable information to fuse into the question during decoding and thereby assist complex question generation. Because a single text is long, a separate semantic structure graph is constructed for each sentence, which helps capture semantic information more accurately. For each sentence, the adaptive cross-sentence coreference resolution technique first replaces each pronoun with the entity it refers to, facilitating entity fusion during subsequent graph construction; entity mentions must be replaced with real-world entities. Each entity is defined as a pair consisting of the entity itself and the set of events in which it participates. Each entity mention is first represented as a semantic vector: the entity span is fed into BERT to obtain an initial vector representation, the vector of each event is obtained in the same way, and the event set is fed into a BiLSTM followed by a mean pooling layer to obtain the event-set vector representation. Finally, the initial entity vector and the event-set vector are combined into the final semantic representation vector of the entity mention.
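The entity-mention representation step might be sketched as follows. The BiLSTM is elided; mean pooling over event vectors and vector concatenation stand in for the combination described above, which are assumptions of this sketch.

```python
def mention_representation(span_vec, event_vecs):
    """Final entity-mention vector: concatenate the (BERT-derived) span
    vector with the mean-pooled representation of the entity's event set.
    Mean pooling here stands in for the BiLSTM + mean-pooling step."""
    dim = len(event_vecs[0])
    pooled = [sum(e[i] for e in event_vecs) / len(event_vecs)
              for i in range(dim)]
    return span_vec + pooled  # list concatenation

# 1-d toy vectors: span vector [1.0], two event vectors [2.0] and [4.0].
vec = mention_representation([1.0], [[2.0], [4.0]])
```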
Suppose P is the set of antecedent mentions of a coreference cluster for a set of related entities. For each coreference antecedent set P, the invention computes a candidate cluster representation by combining sentence-level information and word-level information, where the sentence-level information is the CLS vector obtained by feeding the containing sentence into BERT, carrying the sentence's semantic information. The combination uses learnable parameters, and the representations of all coreferent antecedent mentions in P are then averaged to obtain the candidate cluster representation.
A softmax layer over similarity features then predicts the coreference link between a query entity mention and its set of candidate representations. Cosine similarity and multi-view cosine similarity are first computed between each candidate and the query mention. These similarity features are then combined with the element-wise difference and the dot product of the candidate and query representations to obtain a final characterization. Finally, for all candidates, the probability that the query entity mention corefers with each candidate is computed via softmax.
To predict coreference links across sentences, the invention designs an algorithm that traverses the sentence list and predicts coreference links between the entity mentions in the current sentence and the candidate clusters computed over all previous sentences. The algorithm first orders the sentence list D; then, for each entity mention in the current sentence, it computes a candidate set from the coreference clusters of the previous sentences, predicts the coreference link between the mention and each candidate, and finally updates the predicted clusters and recomputes the candidates.
When the adaptive cross-sentence coreference resolution technique predicts coreference links, the number of possible candidates for each entity mention grows with the number of previous sentences, and the computational cost rises sharply. To reduce this cost, the invention considers only the previous sentences that are similar to the current one, where sentences with the same topic are considered similar. During training, standard entity clusters are used to compute the candidates and standard sentence topic clusters; during inference, by contrast, the currently predicted coreference clusters are used to compute the candidates, and the predicted topic clusters are computed with K-means. The model is trained by minimizing cross-entropy loss in batches: all M entity mentions in a single sentence form one batch, and the loss is computed after M sequential predictions. After each sentence has undergone coreference resolution, the memory-aware semantic graph construction method extracts triples of the form (head entity, relation, tail entity) from the sentence, where the head and tail entities are the subject and object and the relation corresponds to the predicate connecting them. The memory-aware method stores the extraction results produced in each round in an iterative memory, so that the next decoding iteration can access all previous extractions.
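Restricting candidates to topically similar previous sentences can be sketched as follows; topic labels are assumed given (e.g. from K-means over sentence vectors, as described).

```python
def similar_previous_sentences(topics, current_idx):
    """Return the indices of previous sentences that share the current
    sentence's topic cluster — only these contribute coreference
    candidates, reducing the candidate set's growth."""
    return [j for j in range(current_idx) if topics[j] == topics[current_idx]]

# Four sentences with alternating topic labels 0 and 1.
prev = similar_previous_sentences([0, 1, 0, 1], 3)
```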
Specifically, the sentence is fed into a sequence-to-sequence architecture to produce the first extraction; the extraction is then concatenated with the source sentence and fed into the architecture again to produce a new extraction, and the process repeats until the EndOfExtractions token is generated, which marks the end of extraction. Because the memory-aware semantic graph construction method uses a sequence-to-sequence model, the invention needs a set of sentence-extraction pairs as training data. Manually constructing such a dataset works well but is time-consuming and labor-intensive and cannot reach large scale, so the invention proposes a method for automatically constructing a sentence-extraction dataset. In general this takes two steps: first, all extraction results are sorted in descending order of the confidence output by the original systems; then, training data is constructed according to the model's input-output format. Simply pooling all extraction results, however, is not feasible, because of the following problems: 1) No calibration: the confidence scores assigned by different systems are not calibrated to a comparable scale. 2) Redundant extractions: beyond outright repetition, multiple systems produce similar extractions with low marginal utility. 3) Erroneous extractions: pooling inevitably contaminates the data and amplifies wrong instances, forcing the downstream open information extraction system to learn low-quality extractions. To solve these problems, the invention uses a score-filter framework to obtain high-quality extractions.
Scoring: the invention scores the pooled extractions with a model pre-trained on a random bootstrap dataset, which is generated by randomly sampling, for each sentence, an extraction from any one of the bootstrap systems being aggregated. The model assigns each extraction in the pool a score based on its confidence value; in general, good (correct, informative) extractions receive higher scores than bad (incorrect) and redundant ones. Filtering: redundant data is then filtered out of the extraction results. Given the set of ranked extractions, the invention seeks the subset with the best confidence scores (assigned by the random bootstrap model) and minimal similarity to the other selected extractions. To this end, a complete weighted graph is built over all ranked extraction results: each node corresponds to one extraction, each pair of nodes is connected by an edge whose weight is the similarity between the two corresponding extractions, and each node is assigned a score equal to the confidence given by the random bootstrap model. The best subgraph is then selected as the high-quality extraction result, and the remaining nodes are treated as redundant data and filtered out automatically. The process is expressed mathematically as follows:
where the first symbol denotes a node in the graph and the second denotes the similarity weight between a pair of nodes. The first term of the formula is the sum of the significance scores of all selected triples, and the second term is the redundant information shared between those triples. If the graph has n nodes, the objective may be set to:
where the score vector collects the per-node scores, the similarity weights form a symmetric matrix, and a binary decision vector indicates whether each particular node belongs to the selected subgraph. Through this score-filter framework, high-quality fact triples can be obtained; finally, a semantic structure graph is constructed with entities as nodes and relationships as edges connecting two entities.
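The selection objective above — maximize the summed scores of the chosen nodes minus the pairwise similarity among them — can be sketched by exhaustive search over the decision vector. The patent does not state which optimizer is used, so brute force over small pools is shown purely for illustration:

```python
from itertools import product

def select_extractions(scores, sim):
    """Pick the subset maximizing sum(scores) - sum(pairwise similarities).

    scores: per-extraction confidence from the scoring model.
    sim:    symmetric matrix, sim[i][j] = similarity of extractions i and j.
    """
    n = len(scores)
    best_subset, best_value = [], float("-inf")
    for mask in product([0, 1], repeat=n):  # the binary decision vector
        value = sum(s * x for s, x in zip(scores, mask))
        value -= sum(sim[i][j] * mask[i] * mask[j]
                     for i in range(n) for j in range(i + 1, n))
        if value > best_value:
            best_value, best_subset = value, [i for i in range(n) if mask[i]]
    return best_subset
```

For realistic pool sizes a greedy or integer-programming solver would replace the exhaustive loop; the objective being optimized is the same.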
3) For the fine-grained semantic structure graph finally obtained in step 2), the edges in the graph are treated as nodes, and the vector representation of the whole graph is obtained through a multi-head graph attention encoder. Specifically, for a given semantic structure graph, pre-trained word vectors are first used to initialize the node embedding vectors. Then, in order to capture the semantic relations between nodes, the invention adopts a relation-enhanced graph Transformer to encode the nodes. The method uses a relation-enhanced multi-head attention mechanism to obtain an embedding vector for each node, whose dimensionality is the node embedding size; the calculation formula is as follows:
where the weight matrices are model parameters. The function of the multi-head attention mechanism is that, when each node in the semantic structure graph is encoded, the result contains not only the encoding information of the current node but also the information of the other nodes in the graph; that is, the connection between the current node and the other nodes is preserved. This process is formulated as follows:
As the formula shows, the key point of the relation-enhanced multi-head attention mechanism is to integrate the semantic relationship between nodes into the query vector and the key vector. The relation terms are the encodings of the shortest relation path between a pair of nodes; each encoding is obtained by summing the embedding vectors of all the relation nodes on that path.
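The exact parameterization was lost in translation, so the following single-head NumPy sketch makes one common assumption: the relation-path encoding r_ij is added to both the query and the key before the dot product. Multi-head operation would repeat this with separate projections:

```python
import numpy as np

def relation_enhanced_attention(H, R, Wq, Wk, Wv):
    """Single-head sketch of relation-enhanced graph attention.

    H: (n, d) node embeddings.
    R: (n, n, d) shortest-relation-path encodings; R[i, j] is the sum of the
       embeddings of the relation nodes on the path from node i to node j.
    """
    n, d = H.shape
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    scores = np.empty((n, n))
    for i in range(n):
        for j in range(n):
            # fold the relation encoding into both the query and the key
            scores[i, j] = (Q[i] + R[i, j]) @ (K[j] + R[i, j]) / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights @ V  # updated node embeddings
```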
Finally, all the node vectors in the semantic structure graph are input into a fully connected feed-forward network (FFN) to obtain the final node semantic representation vectors, and residual connections are adopted to alleviate the degradation problem in deep learning. The calculation formula is as follows:
where all the weight matrices and biases are trainable parameters, and the inner layer is a linear network using the GELU activation function.
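The feed-forward sub-layer with its residual connection can be written compactly as below; the inner dimension is illustrative, not taken from the patent:

```python
import numpy as np

def gelu(x):
    """Tanh approximation of the GELU activation."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def ffn_with_residual(h, W1, b1, W2, b2):
    """FFN(h) = GELU(h W1 + b1) W2 + b2, added back to h (residual)."""
    return h + (gelu(h @ W1 + b1) @ W2 + b2)
```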
After the node semantic representation vectors are obtained, the node vectors belonging to the same fact triple in the graph are input into an average pooling layer, yielding the semantic vector representation of that fact triple within the semantic structure graph. Similarly, when computing the vector representation of a semantic structure graph, all the fact triple representation vectors contained in the graph are input into the average pooling layer to obtain the semantic vector representation of the graph. The calculation formula is as follows:
where the pooling function denotes average pooling, applied first to the embedding vectors of all the nodes in a given fact triple of the semantic structure graph, and then to all the fact triple vectors in that graph.
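The two-level average pooling just described — nodes into fact vectors, fact vectors into a graph vector — is short enough to state directly:

```python
import numpy as np

def encode_graph(node_vectors, fact_node_indices):
    """Average-pool node vectors into fact-triple vectors, then average-pool
    the fact vectors into a single graph vector.

    node_vectors:      (n, d) array of node semantic representations.
    fact_node_indices: list of index lists, one per fact triple.
    """
    fact_vectors = [node_vectors[idx].mean(axis=0) for idx in fact_node_indices]
    graph_vector = np.stack(fact_vectors).mean(axis=0)
    return fact_vectors, graph_vector
```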
4) The answer-aware text vector representation obtained in step 1) is sent into a Transformer model for decoding. At each decoding time step, dual planning (fact-level planning and semantic-graph-level planning) is used to select the semantic graph, and the fact triples within it, that require focused attention, assisting the generation of the current word. As shown in FIG. 3, specifically, a semantic structure graph is first selected; then the relevant fact triples are selected from that graph; finally, the hidden state of the decoder is updated based on the text vector and the selected fact triples, and the current word is generated. The calculation process is as follows:
Semantic-graph-level planning aims, at each decoding time step, to select the semantic structure graph that currently requires focused attention through an attention mechanism, based on the text semantic vector C and the words generated at previous time steps, yielding an attention-based semantic structure graph representation. In order to prevent the decoder from repeatedly selecting the same semantic graph, the invention integrates a coverage mechanism that encourages the decoder to cover all semantic structure graphs when generating words. The calculation process is as follows:
where the two quantities are, respectively, the attention and coverage over a semantic structure graph, and the remaining symbols are model parameters.
At each time step of model decoding, a coverage loss is calculated for the selected semantic graph. The attention-based semantic structure graph representation is then spliced with the text semantic vector C, the probability of each subgraph is calculated through a softmax layer, and the subgraph with the highest probability is selected to guide the generation of the current question.
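One decoding step of graph-level planning with coverage can be sketched as follows. The patent's learned scoring network is replaced here by a plain dot product, and the coverage loss is assumed to take the standard sum-of-minimums form; both are simplifying assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def coverage_attention(query, graph_vectors, coverage, cov_weight=1.0):
    """One planning step: attend over semantic-graph vectors, penalized by
    accumulated coverage so already-used graphs are discouraged."""
    scores = graph_vectors @ query - cov_weight * coverage
    attention = softmax(scores)
    # coverage loss: overlap between current attention and past coverage
    coverage_loss = float(np.minimum(attention, coverage).sum())
    new_coverage = coverage + attention
    selected = int(attention.argmax())  # graph chosen to guide this step
    return selected, attention, new_coverage, coverage_loss
```

The same step, run over fact-triple vectors instead of graph vectors, gives the fact-level planner described next.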
Fact-level planning aims, at each decoding time step, to select the fact triples that currently require focused attention through an attention mechanism, based on the text semantic vector C, the words generated at previous time steps, and the selected semantic structure graph, yielding an attention-based fact triple representation within that graph. As in semantic-graph-level planning, in order to prevent the decoder from repeatedly selecting the same fact triple, the invention incorporates a coverage mechanism that encourages the decoder to cover all fact triples when generating words.
where the two quantities are, respectively, the attention and coverage over a fact triple, and the remaining symbols are model parameters.
Similarly, at each time step of model decoding, a coverage loss is computed for the selected fact triplet, as follows:
The attention-based fact triple representation is then spliced with the text semantic vector C, the probability of each fact triple is calculated through a softmax layer, and the fact triple with the highest probability is selected to guide the generation of the current question.
5) A loss function is designed, and the question generation model is trained through multiple iterations. The loss function consists of three parts: cross entropy loss, supervision information loss, and coverage loss. The cross entropy loss refers to minimizing the negative log-likelihood over all model parameters; given a text D and an answer A, it is calculated as follows:
The invention also collects supervision information for the semantic structure graph and fact triples selected at each step of the reasoning process: the question and the answer are analyzed, the entities involved in the answer and the question are located in the text, and the gold-standard semantic graph and fact triples are determined. At each time step of question generation, a probability distribution over semantic structure graphs and a probability distribution over fact triples are produced; these are matched against the gold-standard semantic graph and fact triples that should be selected, and the corresponding loss is calculated as follows:
The coverage loss means that, when the coverage vectors of the semantic graphs and facts are calculated in step 4), an additional coverage loss is computed. This loss effectively prevents information in the same graph from being selected repeatedly, because the higher the probability assigned to a graph in past selections, the greater the loss it produces.
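Assembled, the training objective is cross entropy plus supervision loss plus coverage loss. The supervision term below is the negative log-likelihood of the gold graph/fact under the planner's distribution, and the mixing weights are hypothetical (the text does not state them):

```python
import math

def supervision_loss(predicted_dist, gold_index):
    """NLL of the gold semantic graph (or fact triple) under the planner."""
    return -math.log(predicted_dist[gold_index])

def total_loss(ce, sup_graph, sup_fact, cov, w_sup=1.0, w_cov=1.0):
    """Three-part objective: cross entropy + supervision + coverage."""
    return ce + w_sup * (sup_graph + sup_fact) + w_cov * cov
```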
The present example was evaluated using the following criteria. For automatic evaluation: Bilingual Evaluation Understudy (BLEU) measures the n-gram overlap between the generated result and the reference; Metric for Evaluation of Translation with Explicit ORdering (METEOR) evaluates the semantic relevance between the generated result and the reference; Recall-Oriented Understudy for Gisting Evaluation (ROUGE-L) evaluates the longest common subsequence between the generated result and the reference. For manual evaluation: fluency measures how fluently the generated result is expressed; relevance evaluates the degree of correlation between the generated result and the given input text; complexity refers to whether the generated question is complex, assessed by the number of clauses and modifiers the sentence contains and by the number of multi-hop reasoning steps needed to answer the question; correctness refers to whether the facts contained in the generated question are correct, i.e., whether the fact triples exist in the given source text and whether the entities and relationships match.
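As an illustration of the automatic metrics, ROUGE-L scores the longest common subsequence between a generated question and the reference; a minimal implementation follows (the β = 1.2 weighting is a common choice, assumed here, not taken from the patent):

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of token lists a and b."""
    dp = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, 1):
        for j, y in enumerate(b, 1):
            dp[i][j] = dp[i - 1][j - 1] + 1 if x == y else max(dp[i - 1][j], dp[i][j - 1])
    return dp[len(a)][len(b)]

def rouge_l(candidate, reference, beta=1.2):
    """ROUGE-L F-score over whitespace tokens."""
    c, r = candidate.split(), reference.split()
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return (1 + beta**2) * precision * recall / (recall + beta**2 * precision)
```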
In order to verify the effect of the invention, automatic and manual evaluations were carried out on the public data sets SQuAD and MSMARCO. The experimental results are as follows:
Table 1 Automatic evaluation results of different methods on SQuAD;
Table 2 Automatic evaluation results of different methods on MSMARCO;
Table 3 Manual evaluation results of different methods on MSMARCO;
the method achieves optimal performance on a plurality of data sets, and is greatly improved compared with other methods.
The above examples are only preferred embodiments of the present invention. It should be noted that various modifications and equivalents may occur to those skilled in the art without departing from the spirit of the invention, and all such modifications and equivalents are intended to fall within the scope of the appended claims.
Claims (9)
1. A chapter-level complex problem generation method based on dual planning, characterized by comprising the following steps:
1) Encoding a given article and answer by adopting the pre-trained language model BERT to obtain an answer-aware text vector representation,
2) For each sentence sequence in the given article, preliminarily processing the sentence sequence by using an adaptive cross-sentence coreference resolution technique, and then constructing a fine-grained semantic structure graph by adopting a memory-aware semantic graph construction method,
3) For the fine-grained semantic structure graph finally obtained in step 2), treating the edges in the graph as nodes, first obtaining the vector representation of each node through a multi-head graph attention encoder, then obtaining the vector representation of each single fact, and finally obtaining the vector representation of the whole graph,
4) Sending the answer-aware text vector representation obtained in step 1) into a Transformer model for decoding, and at each decoding time step, selecting the semantic graph requiring focused attention and the fact triples within it based on dual planning, namely fact-level planning and semantic-graph-level planning, to assist the generation of the current word,
5) Designing a loss function, and training a problem generation model through multiple iterations;
in step 4), based on the encoding results of the text and the semantic structure graphs, a Transformer is used as the decoder to generate the question; at each decoding time step, based on dual planning, namely fact-level planning and semantic-graph-level planning, the semantic graph and the fact triples within it that require focused attention are selected to assist the generation of the current word; specifically, a semantic structure graph is first selected, then the relevant fact triples are selected from that graph, and finally the hidden state of the decoder is updated based on the text vector and the selected fact triples and the current word is generated,
wherein semantic-graph-level planning aims, at each decoding time step, to select the semantic structure graph currently requiring focused attention through an attention mechanism, based on the text semantic vector and the words generated at previous time steps, obtaining an attention-based semantic structure graph representation; the attention-based semantic structure graph representation is then spliced with the text semantic vector, the probability of each subgraph is calculated through a softmax layer, and the subgraph with the highest probability is selected to guide the generation of the current question,
and fact-level planning aims, at each decoding time step, to select the fact triples currently requiring focused attention through an attention mechanism, based on the text semantic vector, obtaining an attention-based fact triple representation within the kth semantic structure graph; the attention-based fact triple representation is then spliced with the text semantic vector, the probability of each fact triple is calculated through a softmax layer, and the fact triple with the highest probability is selected to guide the generation of the current question.
2. The method for generating chapter-level complex problems based on dual planning according to claim 1, wherein in step 1), BERT is used to encode the given text and answer: the text sequence and the answer are spliced, a separator [SEP] is inserted between them to separate the text from the answer, and a specific classification identifier [CLS] is inserted at the beginning; after the pre-training process of BERT, the classification identifier learns the representation information of the fused text and answer and is represented by a vector C.
3. The method for generating chapter-level complex problems based on dual planning according to claim 1, wherein in step 2), a fine-grained semantic structure graph is constructed for each sentence sequence in the given article; an adaptive cross-sentence coreference resolution technique is first adopted to replace pronouns with the entities they refer to, so that the entities can be merged during subsequent graph construction; in the adaptive cross-sentence coreference resolution technique, entity mentions need to be resolved to entities in the real world, each entity mention is represented as a semantic vector, and similarity features are then input into a softmax layer to predict, for a query entity and a set of candidates, the candidate with the greatest coreference probability.
4. The method of claim 1, wherein in step 2), the adaptive cross-sentence coreference resolution technique is used as follows: in order to predict inter-sentence coreference links, an algorithm traverses the sentence list and predicts the coreference links between the entities mentioned in the current sentence and the candidate clusters computed across all previous sentences; the algorithm first orders the sentence list D, then, for each entity in the current sentence, computes a candidate set of coreference clusters from the previous sentences, predicts the coreference link between the entity and each candidate, and finally updates the predicted candidate set and recomputes the new candidates.
5. The method for generating chapter-level complex problems based on dual planning according to claim 4, wherein in step 2), when the adaptive cross-sentence coreference resolution technique predicts coreference links, the number of possible candidates for each entity grows with the number of previous sentences and the computation cost increases accordingly; therefore, only the previous sentences similar to the current sentence are considered in the calculation process.
6. The method for generating chapter-level complex problems based on dual planning according to claim 1, wherein in step 2), after each sentence has undergone coreference resolution, entity-relationship triples are extracted from the sentence by using a memory-aware semantic graph construction method to construct the semantic graph; in the memory-aware semantic graph construction method, an iterative memory stores the extraction results generated in each round so that the next decoding iteration can access all previous extractions; first, the sentence is input into a sequence-to-sequence architecture to generate a first extraction result, then the extraction result is spliced with the source sentence and input into the architecture again to generate a new extraction result, and the process is repeated until the symbol EndOfExtractions is generated, indicating that the extraction process is finished;
in step 2), the memory-aware semantic graph construction method uses a sequence-to-sequence model, and a score-filter framework is used to obtain high-quality extraction results: the collected extraction results are first scored, so that good extraction results obtain higher values than bad and redundant ones; redundant data is then filtered from the extraction results; the high-quality fact triples obtained through the score-filter framework are used to construct the semantic graph.
7. The method for generating chapter-level complex problems based on dual planning according to claim 1, wherein in step 3), when the semantic structure graph is encoded, the edges in the graph are also encoded as nodes; for a given semantic structure graph, pre-trained word vectors are used to initialize the node embedding vectors; then, in order to capture the semantic relations between nodes, a relation-enhanced graph Transformer is used to encode the nodes, using a relation-enhanced multi-head attention mechanism to obtain the embedding vector of each node, so that when each node in the semantic structure graph is encoded, the result contains not only the encoding information of the current node but also the information of the other nodes, i.e., the relation between the current node and the other nodes is preserved; finally, all the node vectors in the semantic structure graph are input into a fully connected feed-forward network (FFN) to obtain the final node semantic representation vectors, with residual connections adopted to alleviate the degradation problem in deep learning; the node vectors belonging to the same fact triple are then input into an average pooling layer to obtain the semantic vector representation of the fact triple, and the fact triple vectors are in turn averaged to obtain the semantic vector representation of the graph.
8. The method for generating chapter-level complex problems based on dual planning according to claim 1, wherein in step 4), during the decoding that generates the question, a coverage mechanism is incorporated to encourage the decoder to cover all semantic structure graphs and all fact triples when generating words.
9. The method for generating chapter-level complex problems based on dual planning according to claim 1, wherein in step 5), the loss function consists of three parts, namely cross entropy loss, supervision information loss, and coverage loss; the cross entropy loss refers to minimizing the negative log-likelihood over all model parameters; the supervision information loss refers to the deviation between the semantic graphs and facts selected by the dual planning and the gold-standard semantic graphs and facts; and the coverage loss refers to the additional coverage loss computed when the coverage vectors of the semantic graphs and facts are calculated in step 4), which constrains the model from repeatedly attending to the same semantic graph or fact.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211394785.6A CN115510814B (en) | 2022-11-09 | 2022-11-09 | Chapter-level complex problem generation method based on dual planning |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115510814A CN115510814A (en) | 2022-12-23 |
CN115510814B true CN115510814B (en) | 2023-03-14 |
Family
ID=84513613
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202211394785.6A Active CN115510814B (en) | 2022-11-09 | 2022-11-09 | Chapter-level complex problem generation method based on dual planning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115510814B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115795018B (en) * | 2023-02-13 | 2023-05-09 | 广州海昇计算机科技有限公司 | Multi-strategy intelligent search question-answering method and system for power grid field |
CN116662582B (en) * | 2023-08-01 | 2023-10-10 | 成都信通信息技术有限公司 | Specific domain business knowledge retrieval method and retrieval device based on natural language |
CN117151069B (en) * | 2023-10-31 | 2024-01-02 | 中国电子科技集团公司第十五研究所 | Security scheme generation system |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111538838B (en) * | 2020-04-28 | 2023-06-16 | 中国科学技术大学 | Problem generating method based on article |
CN113065336B (en) * | 2021-05-06 | 2022-11-25 | 清华大学深圳国际研究生院 | Text automatic generation method and device based on deep learning and content planning |
-
2022
- 2022-11-09 CN CN202211394785.6A patent/CN115510814B/en active Active
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||