CN116151375A - Event tracing reasoning method based on inverse facts and path mining - Google Patents
Event tracing reasoning method based on inverse facts and path mining
- Publication number
- CN116151375A (application CN202310426771.6A)
- Authority
- CN
- China
- Prior art keywords
- event
- path
- events
- representation
- chain
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06N5/042—Backward inferencing
- G06F16/3346—Query execution using probabilistic model
- G06F16/353—Clustering; Classification into predefined classes
- G06N3/084—Backpropagation, e.g. using gradient descent
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The invention discloses an event tracing reasoning method based on counterfactuals and path mining, comprising the following steps: a self-attention mechanism is adopted to obtain a causal effect matrix, which is fed into a graph neural network to produce event node representations; an attention mechanism forms an intermediate hidden state that guides the RoBERTa model to extract the key features h_att of the observed events; the events are projected onto an external event logic graph using cosine similarity, and logic chains between similar events are mined with reinforcement learning based on the intermediate hidden state; an attention mechanism yields the context vector q_path; h_att and q_path are concatenated to compute a reasonable value score for each hypothesis; the hypothesis with the highest reasonable value score is selected as the most likely reasonable hypothesis; and a counterfactual loss function is added to optimize the model, comparing different hypothesized events to mine the key tracing features. The reasoning results of the method are more accurate, and the key factors supporting tracing are captured according to counterfactual sensitivity.
Description
Technical Field
The invention belongs to the field of natural language processing, and particularly relates to an event tracing reasoning method based on inverse facts and path mining.
Background
The field of natural language processing contains many supervised models that are trained on labeled text in order to find a direct link between the input text and the result label. Such models are often black-box models with no interpretability. Natural language reasoning built on natural language understanding is therefore necessary; human beings have long regarded the behavior of tracing back to causes as the core of reading and understanding natural language. The αNLI task is selected here for traceability reasoning to reflect the reasoning capability of a model. The αNLI abductive reasoning task is to select the most reasonable explanation or hypothesis given incompletely observed situations.
For the αNLI task, Bhagavatula C, Le Bras R, Malaviya C, et al., "Abductive Commonsense Reasoning", encoded the events with the pre-trained language models BERT and GPT to train the inference capability of the models. "ExplanationLP: Abductive Reasoning for Explainable Science Question Answering" builds a graph structure on top of the text of the observed and hypothesized events, with words as nodes, and constructs an inference graph to judge the rationality of the hypothesized event. However, the accuracy of reasoning about hypothesized events from incompletely observed events alone is low, so many researchers have added an external knowledge base and attempted to integrate knowledge from external event graphs to enhance abductive reasoning, where the external knowledge includes social knowledge, causal knowledge, auxiliary evidence events and other information. For example, Mu F, Li W, Xie Z, "Effect Generation Based on Causal Reasoning", introduced an event-level external event graph to capture auxiliary knowledge related to a given event to support model reasoning; Du L, Ding X, Liu T, et al., "Learning Event Graph Knowledge for Abductive Reasoning", supplements the auxiliary evidence events between observed events by way of pre-training, so that the state changes captured between events are more fine-grained, further improving model reasoning performance.
Turing Award winner Pearl J., "Theoretical Impediments to Machine Learning With Seven Sparks from the Causal Revolution", argues that current machine learning systems operate almost entirely in a statistical, model-free mode, which severely limits their capability and performance at a theoretical level. He proposes three layers, association, intervention and counterfactual, and holds that existing models remain at the "association" layer, i.e., they infer outputs from input data but cannot reason about "intervention" and "counterfactual" questions, making it difficult for them to be the basis of strong AI. To address this problem, existing models have attempted to introduce counterfactuals. Paul D, Frank A, "Generating Hypothetical Events for Abductive Inference", defines the counterfactual as a task: intermediate events are replaced under the same premise, the model reasons about the content of subsequent events in text form, and the reasoning results are expected to be as consistent as possible with the true results. Zhang B, Guo X, Lin Q, et al., "Counterfactual inference graph network for disease prediction", tries to find similar event pairs from observed event pairs and trains the model based on the assumption that similar event pairs are reasonably trustworthy. Counterfactuals emphasize the influence on subsequent results when past intermediate events are changed while the current event is known; existing models lack a comparison between the current state and the hypothesized state and cannot capture the key factors that change the outcome of events.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: an event tracing reasoning method based on counterfactuals and path mining is provided. It adopts a dual-encoding structure based on an external event graph and logic chains, uses a self-attention mechanism to introduce structured hidden variables that discover potential causal relations among events, and uses an external event logic graph to introduce tracing knowledge, so that through learned reasoning the reasonable hypothesis obtains a larger reasonable value score, highlighting the traceability of causal reasoning.
In order to solve the technical problems, the invention adopts the following technical scheme:
An event tracing reasoning method based on inverse facts and path mining comprises the following steps:
S1, inputting the observed event text and the hypothesized event text, and giving a hypothesized event H.
S2, coding the observed event text and the hypothesized event text by using the RoBERTa pre-training language model to obtain shallow event features.
S3, a self-attention mechanism is applied to the observed events and the hypothesized event, and the obtained attention scores are taken as the causal effect matrix A* between the observed events and the hypothesized event; the matrix is a square matrix.
S4, according to the causal effect matrix A*, the shallow event features obtained in step S2 are encoded with a graph neural network to obtain event representations.
S5, the hypothesized event representation and the observed event representations are formed into an intermediate hidden state Z using an attention mechanism, guiding the RoBERTa pre-trained language model to further encode the observed events and extract the key features h_att affecting the causal effect.
Relying on information of observed events alone may lead to a lack of interpretability of the causal reasoning process and a low accuracy of the reasoning results. Considering that similar causal events may exist between similar reasoning logic, an external event logic diagram is introduced to supplement evidence events.
S6, projecting the observed event and the hypothesized event on an external event logic diagram by using cosine similarity, and obtaining a logic chain path set between the similar observed event and the similar hypothesized event by using a reinforcement learning algorithm.
S7, the probability transitions between logic events are iteratively calculated through a deep neural network and the preceding-event chain representation is updated, obtaining all event chain representations S.
S8, an attention mechanism is applied between the given hypothesis representation and all event chain representations S to obtain the context vector q_path based on the external event logic graph.
S9, the key features h_att are concatenated with the context vector q_path, and a reasonable value score is calculated for the hypothesized event H.
S10, hypotheses H are selected iteratively, bringing in only one hypothesized event H at a time, and the hypothesis with the highest reasonable value score is selected as the most likely reasonable hypothesis.
S11, the tracing reasoning method is optimized through four loss terms: the prediction loss function, the logic chain reasoning loss function, the counterfactual sensitivity loss function and the triplet loss function.
Further, in step S1, the observed events include a background event O1 and an outcome event O2, and the hypothesized events include hypothesized event H1 and hypothesized event H2; only one hypothesized event is given at a time.
Further, in step S2, the RoBERTa pre-trained language model is built with one click through the Huggingface website, [CLS] is added to each event text as the first token, and after the text is encoded by the RoBERTa pre-trained language model, the representation vector of [CLS] is used as the shallow feature representation of the given event.
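By way of illustration, the following is a minimal sketch of the step S2 encoding, assuming the Hugging Face transformers package; the model name roberta-base and the example event sentences are illustrative assumptions, not taken from the patent. In RoBERTa the tokenizer's leading special token plays the role of [CLS].

```python
# Minimal sketch of step S2, assuming Hugging Face `transformers`.
import torch
from transformers import RobertaTokenizer, RobertaModel

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
model = RobertaModel.from_pretrained("roberta-base")

events = [
    "Jane cleaned the house.",                      # background event O1 (example)
    "Returning home, she found the house a mess.",  # outcome event O2 (example)
    "A thief pried the window open and broke in.",  # hypothesis H1 (example)
]

# RoBERTa's tokenizer prepends its own <s> token, which plays the role of [CLS].
batch = tokenizer(events, padding=True, return_tensors="pt")
with torch.no_grad():
    out = model(**batch)

# Shallow feature of each event = representation of the first ([CLS]/<s>) position.
E = out.last_hidden_state[:, 0, :]   # shape: (num_events, D)
```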
Further, in step S4, the event representations are obtained as follows: the [CLS] representation of each event and the causal effect matrix A* are input into a graph convolutional neural network for encoding, yielding the event representations V_O1, V_O2, V_H1 and V_H2.
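A minimal sketch of steps S3 and S4 under stated assumptions: a single-head self-attention over the shallow event features E yields the square causal effect matrix A*, which then weights one GCN-style propagation step. The layer shapes and names (Wq, Wk, Wg) are ours; the patent does not specify them.

```python
# Sketch of S3 (self-attention -> A*) and S4 (GCN encoding), assumptions noted above.
import torch
import torch.nn as nn
import torch.nn.functional as F

D = 768
Wq, Wk, Wg = nn.Linear(D, D), nn.Linear(D, D), nn.Linear(D, D)

def encode_events(E: torch.Tensor):
    # Attention scores between all event pairs -> square causal effect matrix A*.
    scores = Wq(E) @ Wk(E).T / D ** 0.5          # (4, 4)
    A_star = F.softmax(scores, dim=-1)
    # GCN-style propagation: neighbours weighted by A* refine each event node.
    V = torch.relu(A_star @ Wg(E))               # event representations V
    return A_star, V

A_star, V = encode_events(torch.randn(4, D))     # 4 events: O1, O2, H1, H2
```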
Further, in step S5, the specific steps of extracting the key features h_att are as follows:

S501, V denotes the event representations obtained in step S4. For the background event O1, the corresponding attention score is obtained by:

α_O1 = softmax(w·V_O1)

where softmax(·) denotes a softmax operation; α_O1 is the attention score corresponding to the background event O1; w is the attention network coefficient.

The attention scores corresponding to the outcome event O2 and the hypothesized event H1 are likewise calculated by the above formula, and the intermediate state Z is obtained by fusing each event representation V with its corresponding attention score:

Z = α_O1·V_O1 + α_O2·V_O2 + α_H1·V_H1

where α_O2 is the attention score corresponding to the outcome event O2; α_H1 is the attention score corresponding to the hypothesized event H1; Z incorporates the information of the events O1, O2 and the hypothesized event H1.

S502, starting from the intermediate hidden state Z, a forward attention mechanism extracts the causal effect features relevant to the shallow event features E of the given observed events, denoted h_fw; a reverse attention mechanism attends to E in the opposite way to identify the irrelevant content, denoted h_rev.

S503, the relevant content h_fw of the forward attention mechanism and the irrelevant content h_rev of the reverse attention mechanism are added to obtain the key features h_att affecting the causal effect:

h_att = h_fw + h_rev
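One plausible reading of S502 and S503, sketched below: forward attention weights E by its affinity to Z, reverse attention re-normalizes the negated scores to pick up the ignored remainder, and the two outputs are summed into h_att. The exact attention formulas are an assumption.

```python
# Sketch of S502-S503 under our assumptions: Z is 1xD, E holds observed-event features.
import torch
import torch.nn.functional as F

def key_features(Z: torch.Tensor, E: torch.Tensor) -> torch.Tensor:
    scores = Z @ E.T / E.shape[1] ** 0.5       # affinity of Z to each feature row
    h_fwd = F.softmax(scores, dim=-1) @ E      # content Z attends to (relevant)
    h_rev = F.softmax(-scores, dim=-1) @ E     # content Z ignores (irrelevant)
    return h_fwd + h_rev                       # h_att, shape (1, D)

h_att = key_features(torch.randn(1, 768), torch.randn(9, 768))
```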
Further, in step S6, the specific steps of obtaining the logic chain path set are as follows:

S601, the external event graph is defined as G, an event is defined as a node N on the graph, and the causal relationships between events are defined as edges R. If event A causes event B to occur, the edge points from event A to event B.

S602, the adjacency matrix of the external event graph is denoted M, where M_ij = 1 indicates that event i causes event j, and M_ij = 0 indicates that event i does not cause event j.

S603, the observed events and the hypothesized event are projected onto the external event logic graph using cosine similarity:

N̂ = argmax_{N_i ∈ G} cos(V_e, V_{N_i})

where N̂ denotes the node on the external event graph with the highest similarity value to the given event e, i.e., the projection of the given event.

S604, the projections corresponding to the background event O1, the outcome event O2 and the hypothesized event H are expressed as Ô1, Ô2 and Ĥ, respectively.

S605, prior methods adopt breadth-first or depth-first traversal and perform path exploration in advance, so they cannot interact with the prediction process; existing graph traversal algorithms are also random and cannot take the semantic information between nodes on the graph into account. Reinforcement learning is therefore adopted for path searching.

The overall reinforcement learning environment consists of four parts: the action space Action, the state State, the Policy Network, and the reward function Reward. Combining the DeepPath method with the NLI prediction problem, the state State, the Policy Network and the reward function Reward are modified. The specific contents are as follows:

First, through the policy network, the agent selects a relation r_t from the action space Action at each time step as the path to reach the next node n_t. The action set Action consists of the relations between all entity pairs, for example the relations "receive", "turn" and "advance". The node n_t then has a new state, defined at time step t as:

s_t = (e_t, e_target − e_t, Z)

where e_t is the representation of the node associated with the relation (i.e., action) selected at time step t, e_target is the representation of the target node, and e_target − e_t measures the gap between the currently selected node and the target node. The intermediate hidden state Z is incorporated into the prediction process, so that paths with strong correlations are further extracted under the guidance of the background and hypothesized events.
Existing reinforcement learning is based on the Markov process and predicts depending only on the previous state; the information obtained is insufficient, which easily causes the reinforcement learning to jump back and forth between individual nodes. Therefore, an LSTM (Long Short-Term Memory) architecture is adopted to improve the Policy Network, strengthening the influence of all node information selected earlier on subsequent selections. The new state s_t is input to the LSTM to obtain the hidden state h_t at time t, and the long-term memory c_t is mapped to the action space to obtain a probability vector p_t with the same dimension as the action space, from which the relation (i.e., action) is selected. The specific formulas are:

h_t, c_t = LSTM(s_t, (h_{t−1}, c_{t−1}))
p_t = softmax(W·c_t)

where W is the network parameter of the fully connected layer; h_t is the short-term memory of the LSTM at time t and c_t is the long-term memory of the LSTM at time t, output together; h_{t−1} and c_{t−1} denote the short-term and long-term memory of the LSTM at time t−1. Since c_t contains more information about the selected nodes, c_t is used for the mapping to the action space that yields the probability vector p_t. If a repeated relation is selected, or the target node is reached, or the path reaches the preset maximum length limit, the agent stops acting and the total reward of this round of exploration is summarized; if not, the search continues until the maximum length limit is reached.
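A sketch of the LSTM-improved Policy Network follows, assuming the state s_t concatenates the current node embedding, its gap to the target and Z, and that the long-term memory c_t is projected onto the action space as the text describes; the dimensions are illustrative.

```python
# Sketch of the LSTM policy step in S605; shapes and names are assumptions.
import torch
import torch.nn as nn

class LSTMPolicy(nn.Module):
    def __init__(self, state_dim: int, hidden: int, num_actions: int):
        super().__init__()
        self.cell = nn.LSTMCell(state_dim, hidden)
        self.to_actions = nn.Linear(hidden, num_actions)

    def forward(self, s_t, h_prev, c_prev):
        h_t, c_t = self.cell(s_t, (h_prev, c_prev))
        # c_t accumulates all previously visited nodes, so it drives the choice.
        p_t = torch.softmax(self.to_actions(c_t), dim=-1)
        return p_t, h_t, c_t

policy = LSTMPolicy(state_dim=3 * 768, hidden=512, num_actions=32)
s = torch.randn(1, 3 * 768)          # (e_t, e_target - e_t, Z) concatenated
h = c = torch.zeros(1, 512)
p, h, c = policy(s, h, c)
action = torch.multinomial(p, 1)     # sample a relation (action)
```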
The reward function Reward is an important part of optimizing the reinforcement learning network. The network is guided toward reasonable selections through path accuracy R_acc, path diversity R_div, path efficiency R_eff and causal prediction accuracy R_pred:

R_eff = 1 / length(p)
R_div = −(1/|F|) Σ_{p_i ∈ F} cos(p, p_i)

where length(p) denotes the length of the explored path, and the diversity term measures the cosine similarity between every two exploration paths in the path set F. The whole reinforcement learning environment is optimized with gradient descent on the basis of the obtained rewards. The causal prediction accuracy R_pred ensures that the reinforcement learning learns the relevance between path nodes and prediction results.

Among the hypotheses above, H1 is defined as the reasonable hypothesized event and H2 as the unreasonable hypothesized event. If the reasonable value score of H1 is greater than that of H2, the extracted path is considered to have a positive effect on the prediction result, so R_pred = +1; otherwise, the extracted path is considered to have a negative effect on the prediction result, so R_pred = −1.
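A hedged sketch of the reward: the efficiency and diversity terms follow the common DeepPath-style definitions (inverse path length; negative mean cosine similarity to already-found paths), which match the descriptions above, and the causal-prediction term is the ±1 defined in the text. The mixing weights are illustrative assumptions.

```python
# Sketch of the Reward in S605; weights 0.2/0.2/0.6 are illustrative only.
import torch
import torch.nn.functional as F

def reward(path_len: int, path_emb: torch.Tensor, found: list,
           score_h1: float, score_h2: float) -> float:
    r_eff = 1.0 / path_len                     # shorter paths are more efficient
    r_div = 0.0
    if found:                                  # penalise similarity to found paths
        sims = torch.stack([F.cosine_similarity(path_emb, f, dim=0) for f in found])
        r_div = -sims.mean().item()
    r_pred = 1.0 if score_h1 > score_h2 else -1.0   # +1/-1 as defined in the text
    return 0.2 * r_eff + 0.2 * r_div + 0.6 * r_pred
```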
S606, the paths between Ô1 and Ô2 are divided into those passing through Ĥ and those not passing through Ĥ; reinforcement learning is used to perform path exploration between Ô1 and Ĥ and between Ĥ and Ô2 respectively, yielding the path lists P1 and P2.
S607, the paths in the two path lists are combined pairwise using the product function in Python's built-in itertools toolkit; all resulting paths take Ĥ as the relay node, and together they form the logic chain path set P between the observed events and the hypothesized event based on the external event graph.
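The pairwise combination in S607 can be sketched directly with itertools.product; the event-node ids below are placeholders.

```python
# Step S606-S607: paths O1->H and H->O2 are joined pairwise, H acting as relay node.
from itertools import product

paths_o1_h = [["o1", "a", "h"], ["o1", "b", "h"]]   # explored between O1 and H
paths_h_o2 = [["h", "c", "o2"]]                     # explored between H and O2

# Drop the duplicated relay node when concatenating the two halves.
P = [left + right[1:] for left, right in product(paths_o1_h, paths_h_o2)]
# P == [['o1', 'a', 'h', 'c', 'o2'], ['o1', 'b', 'h', 'c', 'o2']]
```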
Further, in step S7, the specific steps for calculating the probability transitions between logic events and updating the preceding-event chain representation are as follows:

S701, the j-th logic chain path contains K events, expressed as {e_1, e_2, …, e_K}; the events are embedded by the RoBERTa pre-trained language model as {v_1, v_2, …, v_K}.

S702, considering that the content of the preceding events affects the occurrence probability of the following event, the preceding events must be modeled, denoted o_{k−1}; by calculating the influence of o_{k−1} on the subsequent event e_k, the representation of the final logic chain path is indirectly influenced. At the initial time, the first event representation of the event chain path is taken as the initial preceding-event representation, i.e., o_1 = v_1.

S703, the representations v_{k−1} and v_k of the (k−1)-th and k-th events themselves are each combined with the preceding-event representation o_{k−1} to obtain the contextual representations u_{k−1} and u_k of the two events:

u_{k−1} = tanh(W_c·[v_{k−1}; o_{k−1}]),  u_k = tanh(W_c·[v_k; o_{k−1}])

where W_c is a network parameter, tanh is the hyperbolic tangent activation function, and [·;·] denotes a concatenation operation.

S704, the transition probability distribution between the two events is calculated from u_{k−1} and u_k using a fully connected network. Since the preceding events 1 to k−1 influence the probability of transitioning to the current-step event, the probability is written π(e_k | e_1, …, e_{k−1}) and abbreviated π_k^j:

π_k^j = σ(W_t·[u_{k−1}; u_k])

where W_t is a parameter of the fully connected network layer, σ is the sigmoid activation function, and π_k^j denotes the probability distribution of transitioning from the (k−1)-th event to the k-th event in the j-th logic chain, i.e., of selecting the k-th event in logic chain j.

S705, after the event transition probability π_k^j is calculated, the current event content is merged into the current event representation to obtain o_k, in preparation for the next event transition.

S706, the calculation is iterated over the events in the j-th event logic chain until the transition to the last event in the chain has been calculated, finally obtaining the preceding-event representation o_K. The probability that the j-th event logic chain occurs is the product of the transition probabilities between each pair of preceding and following events:

P_j = ∏_{k=2}^{K} π_k^j

The occurrence probability P_j of the j-th event logic chain is multiplied by the representation of the event logic chain to obtain the event logic chain context representation s_j:

s_j = P_j · o_K

The set of context representations of all event logic chains is denoted S; at this point, all possible occurrences in the event chains are fully considered.
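A compact sketch of the S701 to S706 iteration under stated assumptions: contextual representations, a sigmoid transition score, a running product P_j and a fused preamble o_k. The exact fusion layer (fuse) is our assumption, since the patent's formula is not reproduced here.

```python
# Sketch of step S7 for one logic chain; layer names and the fusion form are assumptions.
import torch
import torch.nn as nn

D = 768
ctx = nn.Linear(2 * D, D)      # builds contextual event representations u_k
trans = nn.Linear(2 * D, 1)    # scores the (k-1) -> k transition
fuse = nn.Linear(2 * D, D)     # folds the current event into the preamble o_k

def chain_context(v: torch.Tensor) -> torch.Tensor:
    o, p_chain = v[0], 1.0                          # o_1 = v_1
    for k in range(1, v.shape[0]):
        u_prev = torch.tanh(ctx(torch.cat([v[k - 1], o])))
        u_curr = torch.tanh(ctx(torch.cat([v[k], o])))
        pi_k = torch.sigmoid(trans(torch.cat([u_prev, u_curr])))
        p_chain = p_chain * pi_k                    # P_j = product of transitions
        o = torch.tanh(fuse(torch.cat([o, v[k]])))  # update preamble to o_k
    return p_chain * o                              # s_j = P_j * o_K

s_j = chain_context(torch.randn(4, D))              # a 4-event logic chain
```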
S707, the operations of steps S701 to S706 are performed for each logic chain path in the event logic chain path set P.
Further, in step S8, the event logic chain context representations s_j lie at the external event graph level. To highlight the relevance to the given events, an attention mechanism observes the event logic chains from the perspective of the given hypothesis to mine relevant tracing reasoning information. The attention score β_j obtained for the j-th event logic chain is multiplied with the corresponding event logic chain context representation s_j, and the results are accumulated to obtain the context information q_path based on the external event logic graph:

β_j = softmax_j(σ(V_H·W_q·s_j)),  q_path = Σ_j β_j·s_j

where V_H is the representation of the given hypothesized event, W_q is a network parameter, σ is the sigmoid activation function, and the softmax operation normalizes the scores over all logic chains.
Further, in step S9, the context vector q_path and the key features h_att are fused with a gating mechanism to obtain the comprehensive context vector d_att based on path and text, and a nonlinear transformation is used to calculate the reasonable value score Y_H corresponding to the hypothesis.
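A sketch of the S9 fusion, assuming a standard gating form g⊙h_att + (1−g)⊙q_path followed by a nonlinear scoring layer; the precise formula in the patent is not shown here, so this is illustrative.

```python
# Sketch of step S9; the gate and scoring layers are assumptions.
import torch
import torch.nn as nn

D = 768
gate = nn.Linear(2 * D, D)
score = nn.Linear(D, 1)

def plausibility(h_att: torch.Tensor, q_path: torch.Tensor) -> torch.Tensor:
    g = torch.sigmoid(gate(torch.cat([h_att, q_path], dim=-1)))
    d_att = g * h_att + (1.0 - g) * q_path          # comprehensive context vector
    return torch.sigmoid(score(torch.tanh(d_att)))  # reasonable value score Y_H

Y_H = plausibility(torch.randn(1, D), torch.randn(1, D))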
Further, in step S11, the specific steps of the optimization method using the loss functions are as follows:

S111, the prediction loss function L_pred is fitted with a cross-entropy loss to judge the reasonable value score y of a hypothesis:

L_pred = −[ŷ·log y + (1 − ŷ)·log(1 − y)]

where y is the reasonable value score calculated by the model and ŷ is the standard reasonable value given by the dataset. The reasonable value score y of a reasonable hypothesis is expected to be as close to 1 as possible, and that of an unreasonable hypothesis as close to 0 as possible.
S112, the event logic chain reasoning loss function L_chain targets the event chain encoding module: the event transition probabilities inferred by the model are fitted against the event transition probabilities given by the dataset as the standard, ensuring the model's correct inference capability over events:

L_chain = −(1/J) Σ_{j=1}^{J} Σ_{k=2}^{K_j} π̂_k^j · log π_k^j

where K_j denotes the total number of events contained in the j-th event logic chain, J denotes the total number of event logic chains, and π̂_k^j denotes the standard probability distribution.
S113, modeling is based on the idea of counterfactual sensitivity, creating the sensitivity loss function L_sen and the triplet loss function L_tri. As hypotheses are substituted, the factors that change and those that do not are observed, further analyzing the influencing factors of the effect event:

L_sen = ||f_θ(H1) − f_θ(H̃1)||_2 + ||f_θ(H1) − f_θ(H2)||_2 + ||f_θ(H̃1) − f_θ(H2)||_2
L_tri = max(0, μ − (Y_{H1} − Y_{H2})) + max(0, μ − (Y_{H̃1} − Y_{H2}))

where θ denotes the parameters of the Transformer encoding model; ||·||_2 refers to the L2 distance between two vectors, i.e., the sum of squares of the element-wise differences; μ is a deviation hyperparameter, a constant; Y is the reasonable value score given by the model; and H̃1 denotes an approximate description of the reasonable hypothesis.

The sensitivity loss function L_sen and the triplet loss function L_tri are constructed on the basis of counterfactual sensitivity: by finding the minimally changed hypothesis content (the sensitivity loss function L_sen brings the three hypothesis representations close), the model is enabled to find the classification plane (the triplet loss function L_tri ensures the respective true prediction results remain unchanged), improving the sensitivity of the prediction model.
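A sketch of the two counterfactual losses under our reading of S113: the sensitivity loss pulls the encodings of H1, its approximate description H̃1 and H2 together, while the triplet-style loss keeps the reasonable scores above the unreasonable one by a margin μ. Both forms are assumptions consistent with the description.

```python
# Sketch of the counterfactual sensitivity and triplet losses; forms are assumed.
import torch
import torch.nn.functional as F

def sensitivity_loss(f_h1, f_h1_approx, f_h2):
    # Bring the three hypothesis representations close (minimal-change inputs).
    return (F.mse_loss(f_h1, f_h1_approx)
            + F.mse_loss(f_h1, f_h2)
            + F.mse_loss(f_h1_approx, f_h2))

def triplet_loss(y_h1, y_h1_approx, y_h2, mu: float = 0.5):
    # Reasonable hypotheses must outscore the unreasonable one by at least mu.
    return (torch.clamp(mu - (y_h1 - y_h2), min=0.0)
            + torch.clamp(mu - (y_h1_approx - y_h2), min=0.0))
```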
To compare different hypotheses, one model is built per listed hypothesis, with identical model structures; the number of given hypothesized events is 3, so three identical network models are built. After the reasonable value scores of the different hypotheses are compared, the counterfactual loss functions are back-propagated to the corresponding prediction loss function L_pred and event logic chain loss function L_chain in each network model to update the network parameters, so there are 3 sets of model parameters. Borrowing the idea of federated learning, after every t epochs the parameters of the 3 networks are averaged and the synchronized parameters are broadcast back to the 3 networks, fusing the inference information learned by each model.
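The federated-style synchronization can be sketched as plain parameter averaging across the three identical networks; model construction is elided, and any three modules with identical parameter shapes would do.

```python
# Sketch of the periodic parameter averaging; assumes structurally identical models.
import torch

@torch.no_grad()
def synchronize(models):
    params = [list(m.parameters()) for m in models]
    for group in zip(*params):                       # same tensor across models
        mean = torch.stack([p.data for p in group]).mean(dim=0)
        for p in group:
            p.data.copy_(mean)                       # broadcast synced weights
```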
Compared with the prior art, the invention has the following beneficial effects:
By introducing structured hidden variables, the method considers the potential hidden events between the observed events and the hypothesized events, making its reasoning more accurate than prior methods that work only at the text level. When weighing the reasonable value of a given hypothesis, an external event graph is introduced; the influence of related potential events is considered at the graph level, and similar logic among similar events is taken into account, improving the selection accuracy among the proposed hypotheses and providing prior knowledge for model reasoning. Meanwhile, a self-attention mechanism is used to calculate the structured hidden variables, and the attention scores serve as potential causal links among events, providing interpretability for tracing reasoning. In addition, the invention adopts the idea of counterfactual sensitivity, so that through learned reasoning the reasonable hypothesis obtains a larger reasonable value score, and the key factors that keep the result unchanged as conditions change in a series of events are captured.
Drawings
FIG. 1 is a flowchart of the overall steps of the present invention.
FIG. 2 is an overall structure diagram of the present invention.
FIG. 3 is a diagram of the counterfactual-sensitivity-based training process of the present invention.
FIG. 4 shows the approximate descriptions of reasonable hypotheses provided by the αNLI dataset, used as data augmentation.
Detailed Description
The following describes the specific embodiments of the present invention in detail with reference to the accompanying drawings:

In order to achieve the above objective, the present invention provides an event tracing reasoning method based on inverse facts and path mining; the specific steps are shown in FIG. 1 and the overall structure diagram in FIG. 2:

S1, the observed event texts and the hypothesized event texts are input, and a hypothesized event H is given. The observed events include a background event O1 and an outcome event O2; the hypothesized events include hypothesized event H1 and hypothesized event H2, and only one hypothesized event is given at a time.
In this embodiment, the background observed event O1 is "Jane finished cleaning the house and went to work." The outcome observed event O2 is "Returning home, she found her home in a mess." Hypothesized event H1 is "A thief broke into the house by prying open the window." Hypothesized event H2 is "It was a day with a gentle breeze, and a bird flew into the house." The event texts are input into the tokenizer of the RoBERTa model for tokenization preprocessing, obtaining the token sequence of each event text. For example, O1 is decomposed into the token sequence [Jane, finished, cleaning, the, house, went, work].

S2, as shown in FIG. 3, an encoder structure is built with the large-scale pre-trained language model. A [CLS] tag is added before the token sequence of each event text as the first token, and an [END] tag is added after the token sequence to indicate the end of the current sentence, giving [[CLS], Jane, finished, cleaning, the, house, went, work, [END]] of length 9. The sequence is input into encoding layers 1 to N of the RoBERTa model, finally obtaining a 9×D vector matrix. The vector corresponding to [CLS] is taken as the shallow event feature of the corresponding event, i.e., the first row vector of the matrix is taken as the shallow feature representation of the event. The shallow feature set of all events is represented as a matrix E ∈ 4×D, where 4 is the total number of given events, including the background event, the outcome event and the two hypothesized events, and D is the dimension of the embedded representation.
S3, a self-attention mechanism is applied to the shallow feature set E of all events, including the observed events and hypothesized events, to obtain a 4×4 square matrix of attention scores serving as the causal effect matrix A* between the observed events and the hypothesized events.
S4, according to the causal effect matrix A*, the shallow event features obtained in step S2 are encoded with a graph neural network. The graph neural network adopts the structure of a graph convolutional network (GCN); the causal effect matrix A* and the shallow feature set E of all events are input, finally obtaining the event representations E* ∈ 4×D, including the background event representation V_O1 ∈ 1×D, the outcome event representation V_O2 ∈ 1×D, and the hypothesized event representations V_H1 ∈ 1×D and V_H2 ∈ 1×D.
S5, using an attention mechanism, the hypothesized event representation and the observed event representations are formed into the intermediate hidden state Z ∈ 1×D, guiding the RoBERTa model to further encode the observed events and extract the key features h_att affecting the causal effect. The specific steps are as follows:

S501, V denotes the event representations obtained in step S4. For the background event O1, the corresponding attention score is obtained by:

α_O1 = softmax(w·V_O1)

where softmax(·) denotes a softmax operation; α_O1 is the attention score corresponding to the background event O1; w is the attention network coefficient.

The attention scores corresponding to the outcome event O2 and the hypothesized event H1 are likewise calculated by the above formula, and the intermediate state Z is obtained by fusing each event representation V with its corresponding attention score:

Z = α_O1·V_O1 + α_O2·V_O2 + α_H1·V_H1

where α_O2 is the attention score corresponding to the outcome event O2; α_H1 is the attention score corresponding to the hypothesized event H1; Z incorporates the information of the events O1, O2 and the hypothesized event H1.

S502, starting from the intermediate hidden state Z, a forward attention mechanism extracts the causal effect features relevant to the shallow event features E of the given observed events, denoted h_fw ∈ 1×D; a reverse attention mechanism attends to E in the opposite way to identify the irrelevant content, denoted h_rev ∈ 1×D.

S503, the relevant content h_fw of the forward attention mechanism and the irrelevant content h_rev of the reverse attention mechanism are added to obtain the key features h_att ∈ 1×D affecting the causal effect:

h_att = h_fw + h_rev
Relying on information of observed events alone may lead to a lack of interpretability of the causal reasoning process and a low accuracy of the reasoning results. Considering that similar causal events may exist between similar reasoning logic, an external event logic diagram is introduced to supplement evidence events.
S6, the observed events and the hypothesized event are projected onto the external event logic graph using cosine similarity, and a logic chain path set between the similar observed events and the similar hypothesized event is obtained with a reinforcement learning algorithm. The specific steps are as follows:

S601, the external event graph is defined as G, an event is defined as a node N on the graph, and the causal relationships between events are defined as edges R.

S602, the adjacency matrix of the external event graph is denoted M, where M_ij = 1 indicates that event i causes event j, and M_ij = 0 indicates that event i does not cause event j.

S603, the observed events and the hypothesized event are projected onto the external event logic graph using cosine similarity:

N̂ = argmax_{N_i ∈ G} cos(V_e, V_{N_i})

where N̂ denotes the node on the external event graph with the highest similarity value to the given event e, i.e., the projection of the given event.

S604, the projections corresponding to the background event O1, the outcome event O2 and the hypothesized event H are expressed as Ô1, Ô2 and Ĥ, respectively.
In this embodiment, projecting the background event O1 onto the external event graph yields the node Ô1, "Andi cleaned the room and then went to the supermarket"; the outcome event correspondingly projects to Ô2, "Andi returned home to find the bedroom very messy and the money lost."
S605, path searching is performed with single-agent reinforcement learning. The overall reinforcement learning environment consists of four parts: the action space Action, the state State, the Policy Network and the reward function Reward. Combining the DeepPath method with the NLI prediction problem, the state State, the Policy Network and the reward function Reward are modified. The specific contents are as follows:

First, through the policy network, the agent selects a relation r_t from the action space Action at each time step as the path to reach the next node n_t. The action space Action consists of the relations between all entity pairs, for example the relations "receive", "turn" and "advance". The node n_t then has a new state, defined at time step t as:

s_t = (e_t, e_target − e_t, Z)

where e_t is the representation of the node associated with the relation (i.e., action) selected at time step t, e_target is the representation of the target node, and e_target − e_t measures the gap between the currently selected node and the target node. The intermediate hidden state Z is incorporated into the prediction process, so that paths with strong correlations are further extracted under the guidance of the background and hypothesized events.
Secondly, the Policy Network is improved with the long short-term memory (LSTM) architecture, strengthening the influence of all node information selected earlier on subsequent selections. The new state s_t is input to the LSTM to obtain the hidden state h_t at time t, and the long-term memory c_t is mapped to the action space to obtain a probability vector p_t with the same dimension as the action space, from which the relation (i.e., action) is selected. The specific formulas are:

h_t, c_t = LSTM(s_t, (h_{t−1}, c_{t−1}))
p_t = softmax(W·c_t)

where W is the network parameter of the fully connected layer; h_t is the short-term memory of the LSTM at time t and c_t is the long-term memory of the LSTM at time t, output together; h_{t−1} and c_{t−1} denote the short-term and long-term memory of the LSTM at time t−1. Since c_t contains more information about the selected nodes, c_t is used for the mapping to the action space that yields the probability vector p_t. If a repeated relation is selected, or the target node is reached, or the path reaches the preset maximum length limit, the agent stops acting and the total reward of this round of exploration is summarized; otherwise, the search continues until the maximum length limit is reached.
Finally, the reward function Reward is an important part of optimizing the reinforcement learning network. The network is guided toward reasonable selections through path accuracy R_acc, path diversity R_div, path efficiency R_eff and causal prediction accuracy R_pred:

R_eff = 1 / length(p)
R_div = −(1/|F|) Σ_{p_i ∈ F} cos(p, p_i)

where length(p) denotes the length of the explored path, and the diversity term measures the cosine similarity between every two exploration paths in the path set F. The whole reinforcement learning environment is optimized with gradient descent on the basis of the obtained rewards. The causal prediction accuracy R_pred ensures that the reinforcement learning learns the relevance between path nodes and prediction results.
S606, the paths between Ô1 and Ô2 are divided into those passing through Ĥ and those not passing through Ĥ; reinforcement learning is used to perform path exploration between Ô1 and Ĥ and between Ĥ and Ô2 respectively, yielding the path lists P1 (between Ô1 and Ĥ) and P2 (between Ĥ and Ô2). In this embodiment, an event logic chain is: [Andi cleaned the room and then went to the supermarket. The thief pried open the door. The thief pried open the safe in the bedroom and stole the money. Andi returned home to find the bedroom messy and the money lost.]
S607, the paths in the two path lists are combined pairwise using the product function in Python's built-in itertools toolkit; all resulting paths take Ĥ as the relay node, and together they form the logic chain path set P between the observed events and the hypothesized event based on the external event graph, i.e., {[Andi cleaned the room and then went to the supermarket. The thief pried open the door. The thief pried open the safe in the bedroom and stole the money. Andi returned home to find the bedroom messy and the money lost.], [The owner forgot to close the window when rushing out of the house. The thief slid in through the window. The thief took the money away. The owner returned home, found the money lost, and called the police.], [Anna was watching TV at home. A bird flew in, chirping, but was well-behaved. Anna's home stayed clean.]}
S7, the probability transitions between logic events are iteratively calculated through a deep neural network and the preceding-event chain representation is updated, obtaining all event chain representations S. The specific calculation steps are as follows:

S701, the j-th logic chain path contains K events, expressed as {e_1, e_2, …, e_K}; the events are embedded by the RoBERTa pre-trained language model as {v_1, v_2, …, v_K}.

S702, considering that the content of the preceding events affects the occurrence probability of the following event, the preceding events must be modeled, denoted o_{k−1}; by calculating the influence of o_{k−1} on the subsequent event e_k, the representation of the final logic chain path is indirectly influenced. At the initial time, the first event representation of the event chain path is taken as the initial preceding-event representation, i.e., o_1 = v_1.

S703, the representations v_{k−1} and v_k of the (k−1)-th and k-th events themselves are each combined with the preceding-event representation o_{k−1} to obtain the contextual representations u_{k−1} and u_k of the two events:

u_{k−1} = tanh(W_c·[v_{k−1}; o_{k−1}]),  u_k = tanh(W_c·[v_k; o_{k−1}])

where W_c is a network parameter, tanh is the hyperbolic tangent activation function, and [·;·] denotes a concatenation operation.

S704, the transition probability distribution between the two events is calculated from u_{k−1} and u_k using a fully connected network. Since the preceding events 1 to k−1 influence the probability of transitioning to the current-step event, the probability is written π(e_k | e_1, …, e_{k−1}) and abbreviated π_k^j:

π_k^j = σ(W_t·[u_{k−1}; u_k])

where W_t is a parameter of the fully connected network layer, σ is the sigmoid activation function, and π_k^j denotes the probability distribution of transitioning from the (k−1)-th event to the k-th event in the j-th logic chain, i.e., of selecting the k-th event in logic chain j.

S705, after the event transition probability π_k^j is calculated, the current event content is merged into the current event representation to obtain o_k, in preparation for the next event transition.

S706, the calculation is iterated over the events in the j-th event logic chain until the transition to the last event in the chain has been calculated, finally obtaining the preceding-event representation o_K. The probability that the j-th event logic chain occurs is the product of the transition probabilities between each pair of preceding and following events:

P_j = ∏_{k=2}^{K} π_k^j

The occurrence probability P_j of the j-th event logic chain is multiplied by the representation of the event logic chain to obtain the event logic chain context representation s_j:

s_j = P_j · o_K

The set of context representations of all event logic chains is denoted S ∈ N×D, where N is the total number of logic chains in the logic chain path set P. At this point, all possible occurrences in the event chains are fully considered.
S707, the operations of steps S701 to S706 are performed for each logic chain path in the event logic chain path set P.
S8, an attention mechanism is applied between the given hypothesis representation V_H ∈ 1×D and all event chain representations S ∈ N×D to obtain the context vector q_path ∈ 1×D based on the external event logic graph. The event logic chain context representations s_j lie at the external event graph level; to highlight the relevance to the given events, the attention mechanism observes the event logic chains from the perspective of the given hypothesis to mine relevant tracing reasoning information. The attention score β_j obtained for the j-th event logic chain is multiplied with the corresponding event logic chain context representation s_j, and the results are accumulated to obtain the context information q_path based on the external event logic graph:

β_j = softmax_j(σ(V_H·W_q·s_j)),  q_path = Σ_j β_j·s_j

where V_H is the representation of the given hypothesized event, W_q is a network parameter, σ is the sigmoid activation function, and the softmax operation normalizes the scores over all logic chains.
S9, the causal effect features h_att ∈ 1×D and the context vector q_path ∈ 1×D are concatenated into a 1×2D vector, and a nonlinear transformation is used to calculate the reasonable value score Y_H corresponding to the hypothesized event H:

g = σ(W_g·[h_att; q_path]),  d_att = g ⊙ h_att + (1 − g) ⊙ q_path,  Y_H = tanh(W_y·d_att)

where W_g is the network hyperparameter of the gating mechanism, W_y is the nonlinear transformation parameter, and d_att is the comprehensive context vector based on path and text.
S10, the hypotheses H are selected iteratively, bringing in only one hypothesized event H at a time; each hypothesis obtains a corresponding reasonable value score, and the hypothesis with the highest reasonable value score is selected and output as the most likely reasonable hypothesis.
S11, the event tracing reasoning method is jointly optimized through four parts: the prediction loss function, the logic chain reasoning loss function, the counterfactual sensitivity loss function and the triplet loss function. The specific steps are as follows:

S111, the prediction loss function L_pred is fitted with a cross-entropy loss to judge the reasonable value score y of a hypothesis:

L_pred = −[ŷ·log y + (1 − ŷ)·log(1 − y)]

where y is the reasonable value score calculated by the model and ŷ is the standard reasonable value given by the dataset. The reasonable value score y of a reasonable hypothesis is expected to be as close to 1 as possible, and that of an unreasonable hypothesis as close to 0 as possible.
S112, the event logic chain reasoning loss function L_chain targets the event chain encoding module: the event transition probabilities inferred by the model are fitted against the event transition probabilities given by the dataset as the standard, ensuring the model's correct inference capability over events:

L_chain = −(1/J) Σ_{j=1}^{J} Σ_{k=2}^{K_j} π̂_k^j · log π_k^j

where K_j denotes the total number of events contained in the j-th event logic chain, J denotes the total number of event logic chains, and π̂_k^j denotes the standard probability distribution.
S113, modeling is based on the idea of counterfactual sensitivity, using the sensitivity loss function L_sen and the triplet loss function L_tri. As hypotheses are substituted, the factors that change and those that do not are observed, further analyzing the influencing factors of the effect event:

L_sen = ||f_θ(H1) − f_θ(H̃1)||_2 + ||f_θ(H1) − f_θ(H2)||_2 + ||f_θ(H̃1) − f_θ(H2)||_2
L_tri = max(0, μ − (Y_{H1} − Y_{H2})) + max(0, μ − (Y_{H̃1} − Y_{H2}))

where θ denotes the parameters of the Transformer encoding model; ||·||_2 refers to the L2 distance between two vectors, i.e., the sum of squares of the element-wise differences; μ is a deviation hyperparameter, a constant; Y is the reasonable value score given by the model, where H1 is the reasonable hypothesis and H2 the unreasonable hypothesis; and H̃1 denotes an approximate description of the reasonable hypothesis. As shown in FIG. 4, the approximate description can be provided by the dataset from which the reasonable hypothesis is drawn and serves as data augmentation. In this example, the approximate description H̃1 is "A thief broke into the house by prying open the door."

The sensitivity loss function L_sen and the triplet loss function L_tri are constructed on the basis of counterfactual sensitivity: by finding the minimally changed hypothesis content (the sensitivity loss function L_sen brings the three hypothesis representations close), the model is enabled to find the classification plane (the triplet loss function L_tri ensures the respective true prediction results remain unchanged), improving the sensitivity of the prediction model. For example, through the cooperation of the sensitivity loss function L_sen and the triplet loss function L_tri, it can be learned that "prying open the door" versus "prying open the window" is not a key factor leading to a difference in the prediction result, while "thief" is the key factor leading to the causality.
To compare different hypotheses, one model is built per listed hypothesis, with identical model structures; the number of given hypothesized events is 3, so three identical network models are built. After the reasonable value scores of the different hypotheses are compared, the counterfactual loss functions are back-propagated to the corresponding prediction loss function L_pred and event logic chain loss function L_chain in each network model to update the network parameters, so there are 3 sets of model parameters. Borrowing the idea of federated learning, after every t epochs the parameters of the 3 networks are averaged and the synchronized parameters are broadcast back to the 3 networks, fusing the inference information learned by each model.
The above embodiments are only for illustrating the technical idea of the present invention, and the protection scope of the present invention is not limited thereto, and any modification made on the basis of the technical scheme according to the technical idea of the present invention falls within the protection scope of the present invention.
Claims (10)
1. An event tracing reasoning method based on counterfactuals and path mining, characterized by comprising the following steps:
S1, inputting the observed event text and the hypothesized event text, and giving a hypothesized event H;
S2, encoding the observed event text and the hypothesized event text with a RoBERTa pre-trained language model to obtain shallow event features;
S3, obtaining attention scores between the observed event text and the hypothesized event text with a self-attention mechanism, and using the attention scores as the causal effect matrix A* between the observed events and the hypothesized event;
S4, according to the causal effect matrix A*, encoding the shallow event features of step S2 with a graph neural network to obtain event representations;
S5, forming the hypothesized event representation and the observed event representations into an intermediate hidden state Z with an attention mechanism, guiding the RoBERTa pre-trained language model to further encode the observed events, and extracting the key features h_att affecting the causal effect;
S6, projecting the observed events and the hypothesized event onto an external event logic graph using cosine similarity, and obtaining the logic chain path set between the similar observed events and the similar hypothesized event with a reinforcement learning algorithm;
S7, iteratively calculating the probability transitions between logic events through a deep neural network and updating the preceding-event chain representation to obtain all event chain representations S;
S8, applying an attention mechanism between the given hypothesis representation V_H and all event chain representations S to obtain the context vector q_path based on the external event logic graph;
S9, concatenating the key features h_att with the context vector q_path, and calculating the reasonable value score for the hypothesized event H;
S10, iteratively selecting hypothesized events H, bringing in only one at a time, and selecting the hypothesis with the highest reasonable value score as the most likely reasonable hypothesis;
S11, optimizing the tracing reasoning method through four loss terms: the prediction loss function, the logic chain reasoning loss function, the counterfactual sensitivity loss function and the triplet loss function.
2. The event tracing reasoning method based on counterfactuals and path mining according to claim 1, wherein in step S1, the observed events include a background event O1 and an outcome event O2, the hypothesized events include hypothesized event H1 and hypothesized event H2, and only one hypothesized event is given at a time.
3. The event tracing reasoning method based on counterfactuals and path mining according to claim 1, wherein in step S2, the RoBERTa pre-trained language model is built with one click through the Huggingface website, [CLS] is added to each event text as the first token, and after the text is encoded by the RoBERTa pre-trained language model, the representation vector of [CLS] is used as the shallow feature representation of the given event.
4. The event tracing reasoning method based on counterfactuals and path mining according to claim 1, wherein in step S4, the event representations are obtained as follows: the [CLS] representation of each event and the causal effect matrix A* are input into a graph convolutional neural network for encoding, yielding the event representations V_O1, V_O2, V_H1 and V_H2.
5. The method for trace-back reasoning of events based on inverse facts and path mining according to claim 1, wherein in step S5, key features affecting causal effects are extracted h att The specific steps of (a) are as follows:
S501, V denotes the event representations obtained in step S4; for the background event, the corresponding attention score is obtained as:

a_B = softmax(w_a^T V_B)

where softmax(·) denotes the softmax operation, a_B is the attention score corresponding to the background event, and w_a is the attention network coefficient;

the attention scores corresponding to the outcome event and the hypothesis event H_1 are calculated with the same formula; weighting each event representation by its attention score and fusing yields the intermediate hidden state Z:

Z = a_B V_B + a_R V_R + a_H1 V_H1

where a_R is the attention score corresponding to the outcome event, a_H1 is the attention score corresponding to hypothesis event H_1, and Z incorporates the information of the background event, the outcome event, and hypothesis event H_1;
S502, using a forward attention mechanism starting from the intermediate hidden state Z, the causal-effect-related features are extracted from the observed shallow features of the given event, denoted h_fwd; using a reverse attention mechanism, the complementary content that the forward attention ignores is attended to, denoted h_rev;

S503, the relevant content h_fwd of the forward attention mechanism and the irrelevant content h_rev of the reverse attention mechanism are added to obtain the key features h_att that affect the causal effect:

h_att = h_fwd + h_rev
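A sketch of steps S501 through S503 under the notation above; the softmax-of-negated-scores form of the reverse attention is an assumption consistent with the claim's description of attending to what the forward mechanism ignores:

```python
import torch
import torch.nn.functional as F

def key_causal_features(z: torch.Tensor, shallow: torch.Tensor) -> torch.Tensor:
    # z:       (hidden,) intermediate hidden state Z from S501
    # shallow: (num_features, hidden) shallow features of the observed events
    scores = shallow @ z                  # relevance of each feature to Z
    fwd = F.softmax(scores, dim=0)        # forward attention weights
    rev = F.softmax(-scores, dim=0)       # reverse attention: emphasis flipped
    h_fwd = fwd @ shallow                 # causal-effect-related content
    h_rev = rev @ shallow                 # content the forward pass ignores
    return h_fwd + h_rev                  # key features h_att (S503)
```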
6. The event tracing reasoning method based on inverse facts and path mining according to claim 1, wherein in step S6, the specific steps of obtaining the logic-chain path set are as follows:
S601, the external event graph is defined as G, events are defined as nodes N on the graph, and causal relationships between events are defined as edges R;
S602, the causal effect matrix of the external event graph is denoted A_G, where A_G(i, j) = 1 indicates that event i causes event j, and A_G(i, j) = 0 indicates that event i does not cause event j;
S603, the observed events and the hypothesized event are projected onto the external event logic graph using cosine similarity:

n* = argmax_{n ∈ N} cos(v_e, v_n)

where n* denotes the node on the external event graph with the highest similarity value, i.e., the projection of the given event, v_e is the representation of the given event, and v_n is the representation of node n;

S604, the background event, the outcome event, and the hypothetical event H are projected to their corresponding nodes, denoted n_B, n_R, and n_H respectively;
S605, path search is performed with single-agent reinforcement learning; the reinforcement learning environment consists of four parts: the action space Action, the state State, the policy network, and the reward function Reward; combining the deep-path method with the NLI prediction problem, the state State, the policy network, and the reward function Reward are modified as follows:
using the policy network, at each time step the agent selects a relation from the action space Action as the path to the next node; after selecting relation r_t, the node reached gives a new state; the state at time step t is defined as:

s_t = (e_t, e_target − e_t)

where e_t is the representation of the node reached by the relation selected at time step t, e_target is the representation of the target node, and (e_target − e_t) measures the gap between the currently selected node and the target node;
the new state s_t is input into an LSTM to obtain the hidden state h_t at time t; through a fully connected layer, h_t is mapped into the action space, yielding a probability vector p_t with the same dimension as the action space, from which a relation is selected; the specific formulas are as follows:

(h_t, c_t) = LSTM(s_t, (h_{t−1}, c_{t−1}))
p_t = softmax(W_f h_t)

where W_f is the network parameter of the fully connected layer, h_t is the short-term memory of the LSTM at time t, c_t is the long-term memory of the LSTM at time t, and h_{t−1}, c_{t−1} denote the short-term and long-term memory of the LSTM at time t−1;
if a repeated relation is selected, the target node is reached, or the path reaches the preset maximum length, the agent stops acting and the total reward for this round of exploration is tallied; otherwise, the search continues until the maximum length is reached;
the policy network is optimized toward reasonable selection through four rewards: path accuracy R_acc, path diversity R_div, path efficiency R_eff, and causal prediction accuracy R_cau, where in particular:

R_eff = 1 / length(p)
R_div = −(1/|F|) Σ_{p_i ∈ F} cos(p, p_i)

where length(p) denotes the length of the explored path p, and cos(p, p_i) computes the similarity between the current path and each previously explored path in the obtained path set F;
S606, the paths between n_B and n_R are divided into two kinds: those passing through n_H and those not passing through n_H; reinforcement learning path exploration is performed separately between n_B and n_H, and between n_H and n_R;
S607, the paths in the two path lists are combined pairwise using the product function from Python's standard itertools toolkit; all resulting combinations are the paths through the relay node, and together they form the logic-chain path set P between the observed events and the hypothetical event based on the external event graph.
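Two concrete pieces of claim 6 lend themselves to short sketches: the cosine-similarity projection of S603 and the pairwise path combination of S607 (the latter uses `itertools.product` from the Python standard library, as the claim states). The representation shapes and the list-of-edges path encoding are illustrative assumptions:

```python
import itertools
import torch
import torch.nn.functional as F

def project_to_graph(event_rep: torch.Tensor, node_reps: torch.Tensor) -> int:
    """S603: index of the graph node most similar to the event under cosine similarity."""
    # event_rep: (hidden,); node_reps: (num_nodes, hidden)
    sims = F.cosine_similarity(event_rep.unsqueeze(0), node_reps, dim=1)
    return int(sims.argmax())

def combine_via_relay(paths_to_relay, paths_from_relay):
    """S607: pairwise-combine two explored path lists into relay-node paths."""
    combined = []
    for left, right in itertools.product(paths_to_relay, paths_from_relay):
        combined.append(left + right)  # concatenate the two path segments
    return combined
```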
7. The event tracing reasoning method based on inverse facts and path mining according to claim 1, wherein in step S7, the specific calculation steps for the transition probabilities between logic-chain events and for updating the antecedent event-chain representation are as follows:
S701, logic-chain path j contains n_j events, denoted x_1, x_2, …, x_{n_j}; the events are embedded by the RoBERTa pre-trained language model as e_1, e_2, …, e_{n_j};
S702, the antecedent events are modeled, denoted m, to calculate the influence of the antecedent events on the subsequent events; at time 0, the first event representation of the event-chain path is taken as the initial antecedent event representation, m_1 = e_1;
S703, the representations e_{k−1} and e_k of the (k−1)-th and k-th events themselves are combined with the antecedent event representation m_{k−1} to obtain the contextual representations c_{k−1} and c_k of the two events:

c_{k−1} = tanh(W [m_{k−1} ; e_{k−1}])
c_k = tanh(W [m_{k−1} ; e_k])

where W is the network parameter, tanh is the hyperbolic tangent activation function, and [ ; ] denotes the concatenation operation;
S704, according to c_{k−1} and c_k, the transition probability distribution between the two events is calculated with a fully connected network, denoted p^j_{k−1→k} and simplified as p^j_k:

p^j_{k−1→k} = sigmoid(W_p [c_{k−1} ; c_k])

where W_p is the parameter of the fully connected network layer, sigmoid is the sigmoid activation function, p^j_{k−1→k} denotes the transition probability distribution from the (k−1)-th event to the k-th event in the j-th logic chain, and p^j_k denotes the probability of selecting the k-th event in logic chain j;
S705, the content of the current event is integrated into the current antecedent event representation m_k;
S706, the calculation is iterated over the events in the j-th event logic chain until the last event has been transitioned to, yielding the antecedent event representation m_{n_j}; the probability that the chain occurs is the product of the transition probabilities between every pair of successive events:

P_j = Π_{k=2}^{n_j} p^j_k

where P_j denotes the probability that the j-th event logic chain may occur;

multiplying the occurrence probability P_j of the j-th event logic chain by the representation of the event logic chain gives the contextual representation of the event logic chain:

s_j = P_j · m_{n_j}

the set of contextual representations of all event logic chains is denoted S;
S707, the operations of steps S701 to S706 are performed for each logic-chain path in the event logic-chain path set P.
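A sketch of S701 through S706 under the reconstructed notation; the update that folds the current event into the antecedent representation (S705) is not legible in the text, so a simple probability-weighted blend is assumed here:

```python
import torch
import torch.nn as nn

class ChainEncoder(nn.Module):
    def __init__(self, hidden: int):
        super().__init__()
        self.ctx = nn.Linear(2 * hidden, hidden)   # W in S703
        self.trans = nn.Linear(2 * hidden, 1)      # W_p in S704

    def forward(self, events: torch.Tensor) -> torch.Tensor:
        # events: (n_j, hidden) RoBERTa embeddings of one logic-chain path (S701)
        m = events[0]                               # initial antecedent representation (S702)
        chain_prob = torch.tensor(1.0)
        for k in range(1, events.size(0)):
            c_prev = torch.tanh(self.ctx(torch.cat([m, events[k - 1]])))   # S703
            c_curr = torch.tanh(self.ctx(torch.cat([m, events[k]])))
            p_k = torch.sigmoid(self.trans(torch.cat([c_prev, c_curr]))).squeeze()  # S704
            m = p_k * c_curr + (1 - p_k) * m        # assumed S705 update
            chain_prob = chain_prob * p_k           # S706: product of transitions
        return chain_prob * m                       # contextual chain representation s_j
```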
8. The event tracing reasoning method based on inverse facts and path mining according to claim 1, wherein in step S8, the event logic chains are observed from the perspective of the given hypothesis using an attention mechanism; the attention score a_j of the j-th event logic chain is multiplied by the corresponding chain contextual representation s_j, and the products are accumulated to obtain the context information q_path based on the external event logic graph:

q_path = Σ_j a_j s_j
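A sketch of claim 8's attention pooling, assuming dot-product scores between the hypothesis representation V_H and each chain representation s_j:

```python
import torch
import torch.nn.functional as F

def path_context(v_h: torch.Tensor, chain_reps: torch.Tensor) -> torch.Tensor:
    # v_h:        (hidden,) representation of the given hypothesis
    # chain_reps: (J, hidden) contextual representations of all event logic chains S
    scores = chain_reps @ v_h          # one score per logic chain
    attn = F.softmax(scores, dim=0)    # attention weights a_j over the J chains
    return attn @ chain_reps           # context vector q_path
```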
9. The event tracing reasoning method based on inverse facts and path mining according to claim 1, wherein in step S9, the specific content of calculating the plausibility score of the hypothetical event H is as follows: the context vector q_path and the key features h_att are fused with a gating mechanism to obtain the comprehensive path- and text-based context vector d_att, and the plausibility score Y_H corresponding to the hypothesis is calculated with a nonlinear transformation.
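A sketch of claim 9, with a standard sigmoid gate as the assumed gating mechanism and a tanh-activated linear layer as the assumed nonlinear transformation:

```python
import torch
import torch.nn as nn

class PlausibilityScorer(nn.Module):
    def __init__(self, hidden: int):
        super().__init__()
        self.gate = nn.Linear(2 * hidden, hidden)
        self.score = nn.Linear(hidden, 1)

    def forward(self, q_path: torch.Tensor, h_att: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate(torch.cat([q_path, h_att], dim=-1)))
        d_att = g * q_path + (1 - g) * h_att   # gated path/text fusion
        return torch.tanh(self.score(d_att))   # plausibility score Y_H
```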
10. The event tracing reasoning method based on inverse facts and path mining according to claim 1, wherein in step S11, the specific steps of the optimization method using the loss function are as follows:
S111, the prediction loss function L_pred measures the gap between the calculated plausibility score y and the standard plausibility value y* given in the dataset;
S112, the event logic-chain reasoning loss function L_chain targets the event-chain encoding module: the inference model is fitted using the event transition probabilities given by the dataset as the standard, where n_j denotes the total number of events contained in the j-th event logic chain, J denotes the total number of event logic chains, and p* denotes the standard probability distribution;
S113, based on the inverse-fact sensitivity idea, through the sensitivity loss function L_sens and the triplet loss function L_tri, the changed and unchanged factors are observed as the hypothesis is substituted, further analyzing the factors that influence the outcome event; the triplet loss takes the form:

L_tri = max(0, ||f(H_1) − f(H_1')||_2 − ||f(H_1) − f(H_2)||_2 + γ)

where f denotes the Transformer encoding model; ||·||_2 refers to the L2 distance between two vectors, i.e., the sum of squared element-wise differences; γ is a bias hyperparameter, a constant; Y is the plausibility score given by the model; H_1 is the reasonable hypothesis, H_2 is the unreasonable hypothesis, and H_1' is an approximate description of the reasonable hypothesis.
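A sketch of the four-part objective of S11; the exact forms of the prediction, chain-reasoning, and sensitivity losses are not fully legible in the text, so binary cross-entropy terms and a paraphrase-invariance penalty are assumed alongside the standard triplet margin loss:

```python
import torch
import torch.nn.functional as F

def total_loss(y, y_true, p_chain, p_chain_true,
               f_h1, f_h1_para, f_h2, y_h1, y_h1_para, gamma=0.5):
    # y, y_true:             (batch,) scores mapped into (0, 1) and dataset standards
    # p_chain, p_chain_true: (batch,) predicted vs. standard transition probabilities
    # f_h1, f_h1_para, f_h2: (batch, dim) encodings of H1, its paraphrase H1', and H2
    l_pred = F.binary_cross_entropy(y, y_true)                 # assumed prediction loss
    l_chain = F.binary_cross_entropy(p_chain, p_chain_true)    # assumed chain-reasoning loss
    # Sensitivity loss (assumed): a paraphrased reasonable hypothesis should score the same.
    l_sens = (y_h1 - y_h1_para).pow(2).mean()
    # Triplet loss: the paraphrase stays closer to H1 than the unreasonable H2 does.
    l_tri = F.triplet_margin_loss(f_h1, f_h1_para, f_h2, margin=gamma)
    return l_pred + l_chain + l_sens + l_tri
```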
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310426771.6A CN116151375B (en) | 2023-04-20 | 2023-04-20 | Event tracing reasoning method based on inverse facts and path mining |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116151375A true CN116151375A (en) | 2023-05-23 |
CN116151375B CN116151375B (en) | 2023-07-14 |
Family
ID=86352849
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310426771.6A Active CN116151375B (en) | 2023-04-20 | 2023-04-20 | Event tracing reasoning method based on inverse facts and path mining |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116151375B (en) |
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20190073420A1 (en) * | 2017-09-04 | 2019-03-07 | Borislav Agapiev | System for creating a reasoning graph and for ranking of its nodes |
CN110471297A (en) * | 2019-07-30 | 2019-11-19 | 清华大学 | Multiple agent cooperative control method, system and equipment |
CN111626056A (en) * | 2020-04-11 | 2020-09-04 | 中国人民解放军战略支援部队信息工程大学 | Chinese named entity identification method and device based on RoBERTA-BiGRU-LAN model |
CN111581396A (en) * | 2020-05-06 | 2020-08-25 | 西安交通大学 | Event graph construction system and method based on multi-dimensional feature fusion and dependency syntax |
CN114580642A (en) * | 2022-03-17 | 2022-06-03 | 中国科学院自动化研究所 | Method, device, equipment and medium for constructing game AI model and processing data |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117350386A (en) * | 2023-12-04 | 2024-01-05 | 南京信息工程大学 | Event tracing reasoning method and system |
CN117350386B (en) * | 2023-12-04 | 2024-03-19 | 南京信息工程大学 | Event tracing reasoning method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |