CN116151375A - Event tracing reasoning method based on counterfactuals and path mining - Google Patents

Event tracing reasoning method based on counterfactuals and path mining

Info

Publication number
CN116151375A
CN116151375A (application CN202310426771.6A)
Authority
CN
China
Prior art keywords
event
path
events
representation
chain
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310426771.6A
Other languages
Chinese (zh)
Other versions
CN116151375B (en)
Inventor
孙圣杰
荣欢
马廷淮
杨毅
蒋永溢
汤子睿
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Information Science and Technology
Original Assignee
Nanjing University of Information Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Information Science and Technology filed Critical Nanjing University of Information Science and Technology
Priority to CN202310426771.6A priority Critical patent/CN116151375B/en
Publication of CN116151375A publication Critical patent/CN116151375A/en
Application granted granted Critical
Publication of CN116151375B publication Critical patent/CN116151375B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/04Inference or reasoning models
    • G06N5/042Backward inferencing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3346Query execution using probabilistic model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/353Clustering; Classification into predefined classes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02ATECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses an event tracing reasoning method based on counterfactuals and path mining, comprising the following steps: a self-attention mechanism is adopted to obtain a causal effect matrix, which is fed into a graph neural network to produce event node representations; an attention mechanism forms an intermediate hidden state that guides the RoBERTa model to extract the key features h_att of the observed events; the events are projected onto an external event logic graph using cosine similarity, and the logical links between similar events are computed with reinforcement learning based on the intermediate hidden state; a context vector q_path is obtained with an attention mechanism; h_att and q_path are concatenated to calculate a rationality score for each hypothesis; the hypothesis with the highest rationality score is selected as the most likely reasonable hypothesis; and a counterfactual loss function is added to optimize the model, comparing different hypothesized events to mine the key traceability features. The reasoning results of the method are more accurate, and the key factors supporting tracing are captured according to counterfactual sensitivity.

Description

Event tracing reasoning method based on counterfactuals and path mining
Technical Field
The invention belongs to the field of natural language processing, and particularly relates to an event tracing reasoning method based on counterfactuals and path mining.
Background
There are many supervised models in the field of natural language processing that are trained on labelled text in order to find a direct link between the input text and the result label. Such models are often black-box models with no interpretability. Natural language reasoning on top of natural language understanding is therefore necessary, and the behavior of "tracing back to causes" has long been regarded as the core of how humans read and understand natural language. The αNLI task is selected here for traceability reasoning to reflect the reasoning capability of the model. The αNLI traceability reasoning task is to select the most reasonable explanation or hypothesis according to incompletely observed situations.
For the αNLI task, Bhagavatula C, Le Bras R, Malaviya C, et al., "Abductive Commonsense Reasoning", encoded the task with the pre-trained language models BERT and GPT to train the reasoning capability of the models. "ExplanationLP: Abductive Reasoning for Explainable Science Question Answering" builds a graph structure on top of the texts of the observed events and the hypothesized events, with words as nodes, and constructs an inference graph to judge the rationality of the hypothesized events. However, the accuracy of reasoning about hypothesized events from incompletely observed events alone is low, so many researchers have added external knowledge bases and tried to integrate knowledge from external event graphs to enhance traceability reasoning, where the external knowledge includes social knowledge, causal knowledge, auxiliary evidence events and the like. For example, Mu F, Li W, Xie Z, "Effect Generation Based on Causal Reasoning", introduced an event-level external event graph to capture auxiliary knowledge related to a given event to support model reasoning; Du L, Ding X, Liu T, et al., "Learning Event Graph Knowledge for Abductive Reasoning", supplements auxiliary evidence events between observed events by way of pre-training, so that the state changes between events are captured at a finer granularity, further improving the reasoning performance of the model.
The Turing Award winner Pearl J., "Theoretical Impediments to Machine Learning With Seven Sparks from the Causal Revolution", argues that current machine learning systems operate almost entirely in a statistical or model-free mode, which severely limits their capability and performance at a theoretical level. He puts forward three levels: association, intervention and counterfactuals, and considers that existing models remain only at the "association" level, that is, they infer outputs from input data but cannot reason about "intervention" and "counterfactual" questions, so they can hardly form the basis of strong AI. To solve the above problems, existing models attempt to introduce counterfactuals. Paul D, Frank A, "Generating Hypothetical Events for Abductive Inference", defines the counterfactual as a task: intermediate events are replaced under the same premise, and the model reasons about the content of subsequent events in text form, with the reasoning results expected to be as consistent as possible with the true results. Zhang B, Guo X, Lin Q, et al., "Counterfactual Inference Graph Network for Disease Prediction", tried to find similar event pairs from observed event pairs and to train models on the assumption that similar event pairs can be reasonably trusted. Counterfactuals emphasize that, knowing the current event, changing a past intermediate event would influence the subsequent result; existing models lack a comparison between the current state and the hypothesized state and cannot capture the key factors that change the outcome of an event.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: an event traceability reasoning method based on counterfactuals and path mining is provided, which adopts a double-encoding structure based on an external event graph and logic chains, uses a self-attention mechanism to introduce structured hidden variables that discover potential causal relations among events, and uses an external event logic graph to introduce traceability knowledge, so that through learned reasoning the rationality score obtained by the reasonable hypothesis is larger, highlighting the traceability of causal reasoning.
In order to solve the technical problems, the invention adopts the following technical scheme:
An event tracing reasoning method based on counterfactuals and path mining comprises the following steps:
S1, inputting observed event texts and hypothesized event texts, and giving a hypothesized event H.
S2, encoding the observed event texts and the hypothesized event texts with the RoBERTa pre-trained language model to obtain shallow event features.
S3, applying a self-attention mechanism to the observed events and the hypothesized event, and taking the obtained attention scores as the causal effect matrix A* between the observed events and the hypothesized event; the matrix is square.
S4, encoding the shallow event features obtained in step S2 with a graph neural network according to the causal effect matrix A*, obtaining event representations.
S5, using an attention mechanism to fuse the hypothesized event representation and the observed event representations into an intermediate hidden state Z, guiding the RoBERTa pre-trained language model to further encode the observed events and extract the key features h_att affecting the causal effect.
Relying on the information of the observed events alone may lead to a lack of interpretability of the causal reasoning process and low accuracy of the reasoning results. Considering that similar causal events may share similar reasoning logic, an external event logic graph is introduced to supplement evidence events.
S6, projecting the observed events and the hypothesized event onto the external event logic graph using cosine similarity, and obtaining the set of logic-chain paths between the similar observed events and the similar hypothesized event with a reinforcement learning algorithm.
S7, iteratively calculating the transition probabilities between logic nodes through a deep neural network and updating the preceding-event chain representation to obtain all event chain representations S.
S8, performing an attention mechanism between the given hypothesis representation V_H and all event chain representations S to obtain the context vector q_path based on the external event logic graph.
S9, concatenating the key features h_att with the context vector q_path and calculating a rationality score for the hypothesized event H.
S10, iterating over the hypotheses H, bringing in only one hypothesized event H at a time, and selecting the hypothesis with the highest rationality score as the most likely reasonable hypothesis.
S11, jointly optimizing the event tracing reasoning method through four loss terms: the prediction loss function, the logic-chain reasoning loss function, the counterfactual sensitivity loss function and the triplet loss function.
Further, in step S1, the observed events include the background event O1 and the result event O2; the hypothesized events include hypothesis event H1 and hypothesis event H2, and only one hypothesized event is given at a time.
Further, in step S2, the RoBERTa pre-trained language model is obtained directly from the Huggingface hub; a [CLS] token is added to each event text as its first token, and after the text is encoded by the RoBERTa pre-trained language model, the representation vector of [CLS] is used as the shallow feature representation of the given event.
Further, in step S4, the event representations are obtained as follows: the [CLS] representation of each event and the causal effect matrix A* are input into a graph convolutional neural network for encoding, obtaining the event representations V_O1, V_O2 and V_H.
Further, in step S5, the specific steps for extracting the key features h_att are as follows:
S501, V denotes the event representations obtained in step S4. For the background event O1, the corresponding attention score is obtained as
α_O1 = softmax(W_a · V_O1)
where softmax denotes the softmax operation, α_O1 is the attention score corresponding to the background event O1, and W_a is the attention network coefficient.
The attention scores corresponding to the result event O2 and the hypothesis event H1 are likewise calculated by the above formula. Each event representation is then weighted by its corresponding attention score and fused to obtain the intermediate state Z:
Z = α_O1 · V_O1 + α_O2 · V_O2 + α_H1 · V_H1
where α_O2 is the attention score corresponding to the result event O2, α_H1 is the attention score corresponding to the hypothesis event H1, and Z carries the fused information of events O1, O2 and the hypothesis event H1.
S502, using a forward attention mechanism, starting from the intermediate hidden state Z, the causally relevant features of the shallow event features of the given observed events are extracted and denoted h+; using a reverse attention mechanism, the attention is inverted to identify the irrelevant content, denoted h-. (The detailed attention formulas are given as figures in the original filing.)
S503, the relevant content h+ of the forward attention mechanism and the irrelevant content h- of the reverse attention mechanism are added to obtain the key features h_att affecting the causal effect:
h_att = h+ + h-
Further, in step S6, the specific steps for capturing the set of logic-chain paths are:
S601, the external event graph is defined as G, an event is defined as a node N on the graph, and a causal relation between events is defined as an edge R. If event A leads to the occurrence of event B, the edge points from event A to event B.
S602, the adjacency matrix of the external event graph is denoted M, where M_ij = 1 indicates that event i leads to event j and M_ij = 0 indicates that event i does not lead to event j.
S603, the observed events and the hypothesized event are projected onto the external event logic graph using cosine similarity; the node on the external event graph with the highest similarity value is taken as the projection of the given event. (The projection formula is given as a figure in the original filing.)
S604, the background event O1, the result event O2 and the hypothesized event H are each projected in this way, and their projections are denoted Ō1, Ō2 and H̄ respectively.
S605, earlier methods adopt breadth-first or depth-first traversal, in which path exploration is carried out in advance and cannot interact with the prediction process; moreover, existing graph traversal algorithms are random and cannot take the semantic information between graph nodes into account, so reinforcement learning is adopted for the path search.
The whole reinforcement learning environment consists of four parts: the action space Action, the state State, the policy network Policy Network and the reward function Reward. Combining the DeepPath method with the NLI prediction problem, the state State, the policy network and the reward function Reward are modified. The specific contents are as follows:
First, through the policy network, the agent selects a relation from the action space Action at each time step as the path to reach the next node n_t. The action set Action consists of the relations between all entity pairs, for example the relations "receive", "turn to" and "advance". The node n_t reached at time step t has a new state, which combines the representation of the node reached by the relation (i.e. the action) selected at time step t with a term measuring the gap between the currently selected node and the target node; the intermediate hidden state Z is also incorporated into the prediction process, so that paths with strong correlation are further extracted under the guidance of the background event and the hypothesized event. (The exact state formula is given as a figure in the original filing.)
Existing reinforcement learning is based on the Markov process and predicts only from the previous state; the information it obtains is insufficient, which easily causes the reinforcement learning to jump back and forth between individual nodes. Therefore, an LSTM (Long Short-Term Memory) architecture is adopted to improve the policy network, strengthening the influence of all previously selected node information on subsequent selections. The new state state_t is input to the LSTM to obtain the hidden state at time t, which is mapped to the action space to obtain a probability vector p_t with the same dimension as the action space, from which the relation (i.e. the action) is selected:
h_t, c_t = LSTM(state_t, (h_{t-1}, c_{t-1}))
p_t = softmax(W_fc · h_t)
where W_fc is the network parameter of the fully connected layer, h_t is the short-term memory of the LSTM at time t, c_t is the long-term memory of the LSTM at time t, and the two are output together; (h_{t-1}, c_{t-1}) denote the short-term and long-term memory of the LSTM at time t-1. Since h_t contains more information about the selected nodes, h_t is mapped to the action space to obtain the probability vector p_t. If a repeated relation is selected, or the target node is reached, or the path reaches the preset maximum length, the agent stops acting and the total reward of this round of exploration is summarized; otherwise, the search continues until the maximum length is reached.
The reward function Reward is an important part of optimizing the reinforcement learning network. The network is optimized through a reasonable combination of path accuracy R_acc, path diversity R_div, path efficiency R_eff and causal prediction accuracy R_cause; the path efficiency term depends on the length of the explored path, and the path diversity term depends on the pairwise cosine similarity between the search paths in the obtained path set F. (The detailed reward formulas are given as figures in the original filing.) The whole reinforcement learning environment is optimized by gradient descent on the obtained reward. The causal prediction accuracy R_cause ensures that the reinforcement learning learns the relevance between the path nodes and the predicted result.
Among the above hypotheses, H1 is defined as the reasonable hypothesized event and H2 as the unreasonable hypothesized event. If the rationality score of hypothesis H1 is greater than that of hypothesis H2, the extracted path is considered to have a positive effect on the prediction result, so R_cause takes a positive value; otherwise, the extracted path is considered to have a negative effect on the prediction result, so R_cause takes a negative value.
S606, the relations from Ō1 and Ō2 are each divided into two classes: paths that pass through H̄ and paths that do not pass through H̄. Reinforcement learning path exploration is performed between Ō1 and H̄ and between H̄ and Ō2 respectively, yielding the path lists F_{Ō1→H̄} (the paths between Ō1 and H̄) and F_{H̄→Ō2} (the paths between H̄ and Ō2). (The original formula is given as a figure; the notation above follows the textual description.)
S607, the paths in the two path lists are combined pairwise with the product function in the Python standard-library toolkit itertools; all the obtained combined paths take H̄ as the relay node, and together they form the set P of logic-chain paths between the observed events and the hypothesized event based on the external event graph.
Further, in step S7, the specific steps for calculating the transition probabilities between logic nodes and updating the preceding-event chain representation are as follows (the original formulas are given as figures; the notation below follows the textual description):
S701, for logic-chain path j there are m_j events, denoted e_1, e_2, ..., e_{m_j}; each event is embedded with the RoBERTa pre-trained language model.
S702, considering that the content of the preceding events affects the occurrence probability of the following event, the preceding events need to be modelled and are recorded as the preamble representation c. By calculating the influence of c on the following event e_k, the transition probability to e_k is affected, which in turn indirectly influences the representation of the final logic-chain path s_j. At time 0, the representation of the first event of the chain is taken as the initial preamble representation c_1.
S703, the representations e_{k-1} and e_k of the (k-1)-th and k-th events themselves are combined with the preamble representation c to obtain the contextual representations g_{k-1} and g_k of the two events:
g_{k-1} = tanh(W_g [e_{k-1} ; c])
g_k = tanh(W_g [e_k ; c])
where W_g is a network parameter, tanh is the hyperbolic tangent activation function, and [· ; ·] denotes the concatenation operation.
S704, from g_{k-1} and g_k the transition probability distribution between the two events is calculated with a fully connected network. Because the preceding events 1 to k-1 influence the probability of transferring to the current event e_k, the probability is written p(e_k | e_1, ..., e_{k-1}) and abbreviated p_k:
p_k = sigmoid(W_p [g_{k-1} ; g_k])
where W_p is a parameter of the fully connected network layer, sigmoid is the sigmoid activation function, and p_k denotes the probability distribution of transferring from the (k-1)-th event to the k-th event in the j-th logic chain, i.e. the probability of selecting the k-th event in logic chain j.
S705, after the event transition probability p_k has been calculated, the content of the current event is merged into the preamble representation, denoted c_k, in preparation for the next event transition:
c_k = tanh(W_c [c_{k-1} ; e_k])
where W_c is a network parameter.
S706, the events in event logic chain j are processed iteratively until the transition to the last event of the chain has been calculated, finally yielding the preamble event representation c_{m_j}. The product of the transition probabilities between every pair of consecutive events is taken as the probability that the event chain may occur:
p(chain_j) = ∏_k p_k
where p(chain_j) denotes the probability that the j-th event logic chain may occur.
Multiplying the occurrence probability p(chain_j) of the j-th event logic chain by the representation of the event logic chain gives the event logic chain context representation s_j:
s_j = p(chain_j) · c_{m_j}
The set of context representations of all event logic chains is denoted S; at this point, all possible occurrences in the event chains have been fully considered.
S707, the operations of steps S701 to S706 are performed for every logic-chain path in the set P.
Further, in step S8, the event logic chain context representations s_j are at the external event graph level. To highlight the relevance to the given event, an attention mechanism observes the event logic chains from the angle of the given hypothesis and mines the relevant traceability reasoning information. The attention score β_j obtained by the j-th event logic chain is multiplied by the corresponding event logic chain context representation s_j, and the results are accumulated to obtain the context information q_path based on the external event logic graph:
q_path = Σ_j β_j · s_j
where β_j is computed from the given hypothesized event representation V_H and s_j through a network parameter, a sigmoid activation and a softmax normalization. (The exact attention formula is given as a figure in the original filing.)
Further, in step S9, the context vector q_path and the key features h_att are fused with a gating mechanism to obtain the comprehensive path-and-text context vector d_att, and a non-linear transformation calculates the rationality score Y_H corresponding to the hypothesis:
g = sigmoid(W_gate [h_att ; q_path])
d_att = g ⊙ h_att + (1 - g) ⊙ q_path
Y_H = sigmoid(W_Y · d_att)
where W_gate is the network hyperparameter of the gating mechanism and W_Y is the parameter of the non-linear transformation. (The gate form above follows the textual description; the original formula is given as a figure.)
Further, in step S11, the specific steps of the loss-function-based optimization method are as follows (the exact formulas are given as figures in the original filing; the forms below follow the textual description):
S111, the prediction loss function L_pred fits the model with the cross-entropy loss function and judges the rationality score y of the hypothesis:
L_pred = -( ŷ · log(y) + (1 - ŷ) · log(1 - y) )
where y is the rationality score calculated by the model and ŷ is the standard rationality value given by the dataset. The rationality score y of a reasonable hypothesis is expected to be as close to 1 as possible, and the rationality score of an unreasonable hypothesis as close to 0 as possible.
S112, the event logic chain reasoning loss function L_chain is aimed at the event chain encoding module: the model is fitted using the event transition probabilities given by the dataset as the standard, to ensure that the model reasons correctly over events:
L_chain = -(1/J) Σ_j (1/m_j) Σ_k ( p̂_k · log(p_k) + (1 - p̂_k) · log(1 - p_k) )
where m_j denotes the total number of events contained in the j-th event logic chain, J denotes the total number of event logic chains, and p̂_k denotes the standard probability distribution.
S113, modelling based on the idea of counterfactual sensitivity, the sensitivity loss function L_sens and the triplet loss function L_tri are created. As the hypothesis is substituted, the factors that change and those that do not are observed, and the factors influencing the effect event are further analysed:
L_sens = || f(H1) - f(H~) ||₂² + || f(H2) - f(H~) ||₂²
L_tri = max(0, Y_{H2} - Y_{H1} + ε)
where f denotes the encoding model (Transformer) parameters; ||·||₂² refers to the L2-distance loss between two vectors, i.e. the sum of squares of the element-wise differences; ε is a deviation hyperparameter (a constant); Y is the rationality score given by the model; and H~ denotes the approximate description of the reasonable hypothesis.
The sensitivity loss function L_sens and the triplet loss function L_tri constructed from counterfactual sensitivity aim to find the minimal change in hypothesis content (the sensitivity loss pulls the three hypothesis representations close to each other) while still enabling the model to find the classification plane (the triplet loss keeps the respective true prediction results unchanged), improving the sensitivity of the prediction model.
For the comparison of different hypotheses, as many models as the listed hypotheses are built, all with identical structure; the given number of hypothesized events is 3, so three identical network models are built. After the rationality scores of the different hypotheses are compared, the counterfactual loss functions are back-propagated to the corresponding prediction loss function L_pred and event logic chain loss function L_chain in each network model, and the network parameters are updated, so there are 3 sets of model parameters. Referring to the idea of federated learning, after every t epochs the parameters of the 3 networks are averaged, and the synchronized parameters are then broadcast back to the 3 networks, fusing the reasoning information learned by each model.
Compared with the prior art, the invention has the following beneficial effects:
By introducing structured hidden variables, the method takes account of the potential hidden events between the observed events and the hypothesized events, making its reasoning more accurate than existing methods that work only at the text level. When considering the rationality of a given hypothesis, an external event graph is introduced, the influence of related potential events is considered at the graph level, and the similar logic shared by similar events is exploited, improving the accuracy of selecting the proposed hypothesis and providing prior knowledge for model reasoning. Meanwhile, the structured hidden variable is calculated with a self-attention mechanism and the attention scores are used as the potential causal links between events, providing interpretability for the traceability reasoning. In addition, the invention adopts the idea of counterfactual sensitivity, so that through learned reasoning the rationality score obtained by the reasonable hypothesis is larger, and the key factors that keep the result unchanged as conditions change across a series of events are captured.
Drawings
FIG. 1 is a flowchart illustrating the overall steps of the present invention.
Fig. 2 is an overall structural view of the present invention.
FIG. 3 is a diagram of the training process of the present invention based on counterfactual sensitivity.
FIG. 4 illustrates the approximate description of a hypothesis provided by the αNLI dataset, which the present invention uses as data augmentation.
Detailed Description
The following describes specific embodiments of the present invention in detail with reference to the accompanying drawings:
In order to achieve the above objective, the present invention provides an event tracing reasoning method based on counterfactuals and path mining; the specific steps are shown in FIG. 1 and the overall structure diagram in FIG. 2:
S1, observed event texts and hypothesized event texts are input, and a hypothesized event H is given. The observed events include the background event O1 and the result event O2; the hypothesized events include hypothesis event H1 and hypothesis event H2, and only one hypothesized event is given at a time.
In the present embodiment, the background observation event O1 is "Jane finished cleaning the house." The result observation event O2 is "When she got back home, she found her home in a mess." Hypothesized event H1 is "A thief broke into the house by prying open the window." Hypothesized event H2 is "It was a day with a gentle breeze, and a bird flew into the house." The event texts are input into the tokenizer of the RoBERTa model for tokenization preprocessing, giving the token sequence of each event text. For example, O1 is decomposed into a token sequence such as [Jane, finished, cleaning, the, house, work, .].
S2, as shown in FIG. 3, an encoder structure is built with a large-scale pre-trained language model. A [CLS] tag is added before the token sequence of each event text as the first token, and an [END] tag is added after the token sequence to indicate the end of the current sentence, giving [[CLS], Jane, finished, cleaning, the, house, work, ., [END]] of length 9. The sequence is input into encoding layers 1 to N of the RoBERTa model, finally obtaining a 9 × D vector matrix. The vector corresponding to [CLS] is used as the shallow event feature of the corresponding event, i.e. the first vector of the matrix is taken as the shallow feature representation of event O1. The shallow feature set of all events is represented as a matrix E ∈ 4 × D, where 4 is the total number of given events, including the background event, the result event and the two hypothesized events, and D is the dimension of the embedded representation.
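A minimal sketch of step S2 follows, assuming the Huggingface "roberta-base" checkpoint; the event texts and the 4-event layout are illustrative only, and RoBERTa's own start token plays the role of the [CLS] tag described above.

```python
import torch
from transformers import RobertaTokenizer, RobertaModel

tokenizer = RobertaTokenizer.from_pretrained("roberta-base")
encoder = RobertaModel.from_pretrained("roberta-base")

events = [
    "Jane finished cleaning the house.",            # background event O1
    "When she got home, the house was a mess.",     # result event O2
    "A thief pried open a window and broke in.",    # hypothesis H1
    "A bird flew into the house on a breezy day.",  # hypothesis H2
]

# RoBERTa's tokenizer prepends its own <s> token, which serves as [CLS].
batch = tokenizer(events, padding=True, return_tensors="pt")
with torch.no_grad():
    hidden = encoder(**batch).last_hidden_state   # (4, L, D)
E = hidden[:, 0, :]                               # (4, D) shallow event features
```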
S3, a self-attention mechanism is applied to the shallow feature set E of all events, including the observed events and the hypothesized events, to obtain a 4 × 4 square matrix of attention scores, which serves as the causal effect matrix A* between the observed events and the hypothesized events.
S4, the shallow event features obtained in step S2 are encoded with a graph neural network according to the causal effect matrix A*. The graph neural network adopts the structure of a graph convolutional network (GCN); the causal effect matrix A* and the shallow feature set E of all events are input into it, finally obtaining the event representation matrix E* ∈ 4 × D, including the background event representation V_O1 ∈ 1 × D, the result event representation V_O2 ∈ 1 × D, and the hypothesized event representations V_H1 ∈ 1 × D and V_H2 ∈ 1 × D.
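A minimal sketch of steps S3 and S4, assuming D = 768, scaled dot-product self-attention and a single GCN layer; the layer shapes and the way A* is derived from the attention scores are illustrative readings of the description above.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

D = 768
E = torch.randn(4, D)                        # shallow features of the 4 events (from S2)

# S3: self-attention scores as the causal effect matrix A* (4 x 4)
Wq, Wk = nn.Linear(D, D), nn.Linear(D, D)
A_star = F.softmax(Wq(E) @ Wk(E).T / D ** 0.5, dim=-1)

# S4: one graph-convolution layer over the event graph defined by A*
W_gcn = nn.Linear(D, D)
V = torch.relu(A_star @ W_gcn(E))            # (4, D): V_O1, V_O2, V_H1, V_H2
```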
S5, using an attention mechanism, the hypothesized event representation and the observed event representations are fused into the intermediate hidden state Z ∈ 1 × D, which guides the RoBERTa model to further encode the observed events and extract the key features h_att affecting the causal effect. The specific steps are:
S501, V denotes the event representations obtained in step S4. For the background event O1, the corresponding attention score is obtained as
α_O1 = softmax(W_a · V_O1)
where softmax denotes the softmax operation, α_O1 is the attention score corresponding to the background event O1, and W_a is the attention network coefficient.
The attention scores corresponding to the result event O2 and the hypothesis event H1 are likewise calculated by the above formula. Each event representation is then weighted by its corresponding attention score and fused to obtain the intermediate state Z:
Z = α_O1 · V_O1 + α_O2 · V_O2 + α_H1 · V_H1
where α_O2 is the attention score corresponding to the result event O2, α_H1 is the attention score corresponding to the hypothesis event H1, and Z carries the fused information of events O1, O2 and the hypothesis event H1.
S502, using a forward attention mechanism, starting from the intermediate hidden state Z, the causally relevant features of the shallow event features of the given observed events are extracted and denoted h+ ∈ 1 × D; using a reverse attention mechanism, the attention is inverted to identify the irrelevant content, denoted h- ∈ 1 × D. (The detailed attention formulas are given as figures in the original filing.)
S503, the relevant content h+ of the forward attention mechanism and the irrelevant content h- of the reverse attention mechanism are added to obtain the key features h_att ∈ 1 × D affecting the causal effect:
h_att = h+ + h-
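A minimal sketch of step S5, assuming single-head dot-product attention; the exact parametrisation in the patent is given only as figures, so the forward/reverse split below is an illustrative reading in which the reverse branch simply inverts the forward attention weights.

```python
import torch
import torch.nn.functional as F

D = 768
V = torch.randn(4, D)                          # V_O1, V_O2, V_H1, V_H2 from S4
H_text = torch.randn(9, D)                     # token-level RoBERTa states of an observed event

# S501: attention scores over {O1, O2, H1} and fusion into the hidden state Z
w_a = torch.randn(D)
alpha = F.softmax(V[:3] @ w_a, dim=0)          # (3,) attention scores
Z = (alpha.unsqueeze(1) * V[:3]).sum(dim=0)    # (D,) intermediate hidden state

# S502: forward attention picks causally relevant tokens, reverse attention the rest
scores = F.softmax(H_text @ Z, dim=0)                     # (9,)
h_plus = (scores.unsqueeze(1) * H_text).sum(0)            # relevant content h+
h_minus = ((1 - scores).unsqueeze(1) * H_text).sum(0)     # irrelevant content h-

# S503: key causal-effect feature
h_att = h_plus + h_minus
```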
Relying on the information of the observed events alone may lead to a lack of interpretability of the causal reasoning process and low accuracy of the reasoning results. Considering that similar causal events may share similar reasoning logic, an external event logic graph is introduced to supplement evidence events.
S6, the observed events and the hypothesized event are projected onto the external event logic graph using cosine similarity, and the set of logic-chain paths between the similar observed events and the similar hypothesized event is obtained with a reinforcement learning algorithm. The specific steps are:
S601, the external event graph is defined as G, an event is defined as a node N on the graph, and a causal relation between events is defined as an edge R.
S602, the adjacency matrix of the external event graph is denoted M, where M_ij = 1 indicates that event i leads to event j and M_ij = 0 indicates that event i does not lead to event j.
S603, the observed events and the hypothesized event are projected onto the external event logic graph using cosine similarity; the node on the external event graph with the highest similarity value is taken as the projection of the given event. (The projection formula is given as a figure in the original filing.)
S604, the background event O1, the result event O2 and the hypothesized event H are each projected in this way, and their projections are denoted Ō1, Ō2 and H̄ respectively.
In the present embodiment, the background event O1 is projected onto the external event graph as "Andy cleaned the room and then went to the supermarket", and the result event O2 is projected as "Andy came back home, found the bedroom in a mess, and the money was gone."
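A minimal sketch of the cosine-similarity projection in step S603, assuming the external graph nodes have been pre-encoded into an embedding matrix; the node count and variable names are illustrative.

```python
import torch
import torch.nn.functional as F

def project(event_vec: torch.Tensor, node_embs: torch.Tensor) -> int:
    """Return the index of the external-graph node most similar to the given event."""
    sims = F.cosine_similarity(event_vec.unsqueeze(0), node_embs, dim=-1)
    return int(sims.argmax())

D = 768
node_embs = torch.randn(10_000, D)   # embeddings of all nodes N of the external graph G
V_O1 = torch.randn(D)                # background event representation
n_O1 = project(V_O1, node_embs)      # index of the projected node Ō1
```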
S605, the path search is performed with single-agent reinforcement learning. The whole reinforcement learning environment consists of four parts: the action space Action, the state State, the policy network Policy Network and the reward function Reward. Combining the DeepPath method with the NLI prediction problem, the state State, the policy network and the reward function Reward are modified. The specific contents are as follows:
First, through the policy network, the agent selects a relation from the action space Action at each time step as the path to reach the next node n_t. The action space Action consists of the relations between all entity pairs, for example the relations "receive", "turn to" and "advance". The node n_t reached at time step t has a new state, which combines the representation of the node reached by the relation (i.e. the action) selected at time step t with a term measuring the gap between the currently selected node and the target node; the intermediate hidden state Z is also incorporated into the prediction process, so that paths with strong correlation are further extracted under the guidance of the background event and the hypothesized event. (The exact state formula is given as a figure in the original filing.)
Secondly, the policy network is improved with the long short-term memory (LSTM) architecture, strengthening the influence of all previously selected node information on subsequent selections. The new state state_t is input to the LSTM to obtain the hidden state at time t, which is mapped to the action space to obtain a probability vector p_t with the same dimension as the action space, from which the relation (i.e. the action) is selected:
h_t, c_t = LSTM(state_t, (h_{t-1}, c_{t-1}))
p_t = softmax(W_fc · h_t)
where W_fc is the network parameter of the fully connected layer, h_t is the short-term memory of the LSTM at time t, c_t is the long-term memory of the LSTM at time t, and the two are output together; (h_{t-1}, c_{t-1}) denote the short-term and long-term memory of the LSTM at time t-1. Since h_t contains more information about the selected nodes, h_t is mapped to the action space to obtain the probability vector p_t. If a repeated relation is selected, or the target node is reached, or the path reaches the preset maximum length, the agent stops acting and the total reward of this round of exploration is summarized; otherwise, the search continues until the maximum length is reached.
Finally, the reward function Reward is an important part of optimizing the reinforcement learning network. The network is optimized through a reasonable combination of path accuracy R_acc, path diversity R_div, path efficiency R_eff and causal prediction accuracy R_cause; the path efficiency term depends on the length of the explored path, and the path diversity term depends on the pairwise cosine similarity between the search paths in the obtained path set F. (The detailed reward formulas are given as figures in the original filing.) The whole reinforcement learning environment is optimized by gradient descent on the obtained reward. The causal prediction accuracy R_cause ensures that the reinforcement learning learns the relevance between the path nodes and the predicted result.
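A minimal sketch of the LSTM-based policy network of step S605; the state layout (current node, gap to target, hidden state Z), the hidden size and the action-space size are assumptions, and the reward terms are omitted.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PathPolicy(nn.Module):
    def __init__(self, state_dim: int, n_actions: int, hidden: int = 256):
        super().__init__()
        self.lstm = nn.LSTMCell(state_dim, hidden)    # keeps memory of earlier hops
        self.to_action = nn.Linear(hidden, n_actions)

    def forward(self, state, hc):
        h_t, c_t = self.lstm(state, hc)                      # short- and long-term memory
        probs = F.softmax(self.to_action(h_t), dim=-1)       # distribution over relations
        return probs, (h_t, c_t)

D = 768
policy = PathPolicy(state_dim=3 * D, n_actions=50)
h0 = c0 = torch.zeros(1, 256)
# state_t = [current node ; node reached by the chosen relation ; gap-to-target term]
state_t = torch.randn(1, 3 * D)
probs, (h1, c1) = policy(state_t, (h0, c0))
action = torch.multinomial(probs, 1)          # sampled relation (action) for this step
```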
S606, the relations from Ō1 and Ō2 are each divided into two classes: paths that pass through H̄ and paths that do not pass through H̄. Reinforcement learning path exploration is performed between Ō1 and H̄ and between H̄ and Ō2 respectively, yielding the path lists F_{Ō1→H̄} (the paths between Ō1 and H̄) and F_{H̄→Ō2} (the paths between H̄ and Ō2). In this embodiment, one explored event logic chain is: [Andy cleaned the room and then went to the supermarket. The thief pried open the door. The thief pried open the safe in the bedroom and stole the money. Andy came back home, found the bedroom in a mess, and the money was gone.].
S607, the paths in the two path lists are combined pairwise with the product function in the Python standard-library toolkit itertools; all the obtained combined paths take H̄ as the relay node, and together they form the set P of logic-chain paths between the observed events and the hypothesized event based on the external event graph, i.e. {[Andy cleaned the room and then went to the supermarket. The thief pried open the door. The thief pried open the safe in the bedroom and stole the money. Andy came back home, found the bedroom in a mess, and the money was gone.], [The owner forgot to close the window when rushing out of the house. A thief slid in through the window. The thief took the money away. When the owner came back home, the money was found missing and the police were called.], [Anna is watching TV at home. A bird flew in, chirping, but was well-behaved. Anna's home is clean.]}.
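A minimal sketch of the itertools.product combination in step S607; the path lists below are illustrative, and each combined chain goes from Ō1 through the relay node H̄ to Ō2.

```python
from itertools import product

paths_O1_to_H = [["O1", "burglar pries the door open", "H"],
                 ["O1", "window left open", "thief climbs in", "H"]]
paths_H_to_O2 = [["H", "safe pried open, money stolen", "O2"]]

# Combine pairwise; drop the duplicated relay node from the second half of each pair.
P = [p1 + p2[1:] for p1, p2 in product(paths_O1_to_H, paths_H_to_O2)]
for chain in P:
    print(" -> ".join(chain))
```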
S7, the transition probabilities between logic nodes are calculated iteratively through a deep neural network and the preceding-event chain representation is updated, yielding all event chain representations S. The specific calculation steps are as follows (the original formulas are given as figures; the notation below follows the textual description):
S701, for logic-chain path j there are m_j events, denoted e_1, e_2, ..., e_{m_j}; each event is embedded with the RoBERTa pre-trained language model.
S702, considering that the content of the preceding events affects the occurrence probability of the following event, the preceding events need to be modelled and are recorded as the preamble representation c. By calculating the influence of c on the following event e_k, the transition probability to e_k is affected, which in turn indirectly influences the representation of the final logic-chain path s_j. At time 0, the representation of the first event of the chain is taken as the initial preamble representation c_1.
S703, the representations e_{k-1} and e_k of the (k-1)-th and k-th events themselves are combined with the preamble representation c to obtain the contextual representations g_{k-1} and g_k of the two events:
g_{k-1} = tanh(W_g [e_{k-1} ; c])
g_k = tanh(W_g [e_k ; c])
where W_g is a network parameter, tanh is the hyperbolic tangent activation function, and [· ; ·] denotes the concatenation operation.
S704, from g_{k-1} and g_k the transition probability distribution between the two events is calculated with a fully connected network. Because the preceding events 1 to k-1 influence the probability of transferring to the current event e_k, the probability is written p(e_k | e_1, ..., e_{k-1}) and abbreviated p_k:
p_k = sigmoid(W_p [g_{k-1} ; g_k])
where W_p is a parameter of the fully connected network layer, sigmoid is the sigmoid activation function, and p_k denotes the probability distribution of transferring from the (k-1)-th event to the k-th event in the j-th logic chain, i.e. the probability of selecting the k-th event in logic chain j.
S705, after the event transition probability p_k has been calculated, the content of the current event is merged into the preamble representation, denoted c_k, in preparation for the next event transition:
c_k = tanh(W_c [c_{k-1} ; e_k])
where W_c is a network parameter.
S706, the events in event logic chain j are processed iteratively until the transition to the last event of the chain has been calculated, finally yielding the preamble event representation c_{m_j}. The product of the transition probabilities between every pair of consecutive events is taken as the probability that the event chain may occur:
p(chain_j) = ∏_k p_k
where p(chain_j) denotes the probability that the j-th event logic chain may occur.
Multiplying the occurrence probability p(chain_j) of the j-th event logic chain by the representation of the event logic chain gives the event logic chain context representation s_j:
s_j = p(chain_j) · c_{m_j}
The set of context representations of all event logic chains is denoted S ∈ N × D, where N is the total number of logic chains in the path set P. At this point, all possible occurrences in the event chains have been fully considered.
S707, the operations of steps S701 to S706 are performed for every logic-chain path in the set P.
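A minimal sketch of steps S701 to S706 for one logic chain, assuming RoBERTa-sized embeddings; the exact parametrisation of the patent's figures is approximated with simple linear layers, so this is an illustrative reading rather than the filed formulas.

```python
import torch
import torch.nn as nn

D = 768
chain = torch.randn(5, D)            # embeddings e_1..e_m of the events in chain j

W_ctx = nn.Linear(2 * D, D)          # combines an event with the preamble state (S703)
W_trans = nn.Linear(2 * D, 1)        # scores the (k-1) -> k transition (S704)
W_upd = nn.Linear(2 * D, D)          # folds the current event into the preamble (S705)

pre = chain[0]                       # initial preamble representation c_1
log_prob = torch.zeros(1)
for k in range(1, chain.size(0)):
    g_prev = torch.tanh(W_ctx(torch.cat([chain[k - 1], pre])))   # context of event k-1
    g_curr = torch.tanh(W_ctx(torch.cat([chain[k], pre])))       # context of event k
    p_k = torch.sigmoid(W_trans(torch.cat([g_prev, g_curr])))    # transition probability
    log_prob = log_prob + torch.log(p_k)
    pre = torch.tanh(W_upd(torch.cat([pre, chain[k]])))          # updated preamble c_k

p_chain = log_prob.exp()             # probability that chain j occurs (product of p_k)
s_j = p_chain * pre                  # context representation of chain j (S706)
```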
S8, an attention mechanism is performed between the given hypothesis representation V_H ∈ 1 × D and all event chain representations S ∈ N × D to obtain the context vector q_path ∈ 1 × D based on the external event logic graph. Specifically, the event logic chain context representations s_j are at the external event graph level; to highlight the relevance to the given event, an attention mechanism observes the event logic chains from the angle of the given hypothesis and mines the relevant traceability reasoning information. The attention score β_j obtained by the j-th event logic chain is multiplied by the corresponding event logic chain context representation s_j, and the results are accumulated to obtain the context information q_path based on the external event logic graph:
q_path = Σ_j β_j · s_j
where β_j is computed from the given hypothesized event representation V_H and s_j through a network parameter, a sigmoid activation and a softmax normalization. (The exact attention formula is given as a figure in the original filing.)
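A minimal sketch of step S8, assuming N candidate logic chains and a plain bilinear attention score; the scoring form is an illustrative choice since the filed formula is given only as a figure.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

D, N = 768, 4
V_H = torch.randn(D)                 # representation of the given hypothesis
S = torch.randn(N, D)                # context representations s_1..s_N of all chains

W_p = nn.Linear(D, D, bias=False)
beta = F.softmax(S @ W_p(V_H), dim=0)           # attention score of each chain
q_path = (beta.unsqueeze(1) * S).sum(dim=0)     # context vector from the event graph
```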
S9, the causal-effect key feature h_att ∈ 1 × D and the context vector q_path ∈ 1 × D are concatenated into a 1 × 2D vector and fused with a gating mechanism into the comprehensive path-and-text context vector d_att, from which a non-linear transformation calculates the rationality score Y_H corresponding to the hypothesized event H:
g = sigmoid(W_gate [h_att ; q_path])
d_att = g ⊙ h_att + (1 - g) ⊙ q_path
Y_H = sigmoid(W_Y · d_att)
where W_gate is the network hyperparameter of the gating mechanism and W_Y is the parameter of the non-linear transformation. (The gate form above follows the textual description; the original formula is given as a figure.)
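A minimal sketch of step S9: a gate mixes the text feature h_att with the path context q_path, and a non-linear layer maps the fused vector to a score in (0, 1). The layer shapes are assumptions.

```python
import torch
import torch.nn as nn

D = 768
h_att, q_path = torch.randn(D), torch.randn(D)

W_g = nn.Linear(2 * D, D)            # gating network
W_y = nn.Linear(D, 1)                # scoring network

g = torch.sigmoid(W_g(torch.cat([h_att, q_path])))   # gate
d_att = g * h_att + (1 - g) * q_path                  # fused path-and-text context vector
Y_H = torch.sigmoid(W_y(d_att))                       # rationality score for hypothesis H
```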
S10, the hypotheses H are selected iteratively, only one hypothesized event H being brought in at a time; each hypothesis obtains a corresponding rationality score, and the hypothesis with the highest rationality score is selected and output as the most likely reasonable hypothesis.
S11, the event tracing reasoning method is jointly optimized through four loss terms: the prediction loss function, the logic-chain reasoning loss function, the counterfactual sensitivity loss function and the triplet loss function. The specific steps are as follows (the exact formulas are given as figures in the original filing; the forms below follow the textual description):
S111, the prediction loss function L_pred fits the model with the cross-entropy loss function and judges the rationality score y of the hypothesis:
L_pred = -( ŷ · log(y) + (1 - ŷ) · log(1 - y) )
where y is the rationality score calculated by the model and ŷ is the standard rationality value given by the dataset. The rationality score y of a reasonable hypothesis is expected to be as close to 1 as possible, and the rationality score of an unreasonable hypothesis as close to 0 as possible.
S112, the event logic chain reasoning loss function L_chain is aimed at the event chain encoding module: the model is fitted using the event transition probabilities given by the dataset as the standard, to ensure that the model reasons correctly over events:
L_chain = -(1/J) Σ_j (1/m_j) Σ_k ( p̂_k · log(p_k) + (1 - p̂_k) · log(1 - p_k) )
where m_j denotes the total number of events contained in the j-th event logic chain, J denotes the total number of event logic chains, and p̂_k denotes the standard probability distribution.
S113, modelling based on the idea of counterfactual sensitivity, the sensitivity loss function L_sens and the triplet loss function L_tri are used. As the hypothesis is substituted, the factors that change and those that do not are observed, and the factors influencing the effect event are further analysed:
L_sens = || f(H1) - f(H~) ||₂² + || f(H2) - f(H~) ||₂²
L_tri = max(0, Y_{H2} - Y_{H1} + ε)
where f denotes the encoding model (Transformer) parameters; ||·||₂² refers to the L2-distance loss between two vectors, i.e. the sum of squares of the element-wise differences; ε is a deviation hyperparameter (a constant); Y is the rationality score given by the model, with H1 the reasonable hypothesis and H2 the unreasonable hypothesis; and H~ denotes the approximate description of the reasonable hypothesis. As shown in FIG. 4, the approximate description can be provided by the dataset from which the reasonable hypothesis is drawn and is denoted H~, serving as data augmentation. In this example, the approximate description H~ is "A thief broke into the house by prying open the door."
The sensitivity loss function L_sens and the triplet loss function L_tri constructed from counterfactual sensitivity aim to find the minimal change in hypothesis content (the sensitivity loss pulls the three hypothesis representations close to each other) while still enabling the model to find the classification plane (the triplet loss keeps the respective true prediction results unchanged), improving the sensitivity of the prediction model. For example, through the cooperation of the sensitivity loss function L_sens and the triplet loss function L_tri, it can be inferred that "prying open the door" or "prying open the window" is not the key factor leading to the difference in prediction results, whereas "the thief" is the key factor driving the causality.
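A minimal sketch of the four loss terms of step S11; the margin value, the way the approximate hypothesis H~ is encoded and the weighting of the terms are assumptions, and the closed forms follow the hedged readings given above rather than the filed figures.

```python
import torch
import torch.nn.functional as F

# S111: prediction loss (binary cross-entropy on the rationality score)
def prediction_loss(y_pred, y_gold):
    return F.binary_cross_entropy(y_pred, y_gold)

# S112: logic-chain reasoning loss (cross-entropy against standard transition labels)
def chain_loss(p_trans, p_gold):
    return F.binary_cross_entropy(p_trans, p_gold)

# S113: the sensitivity loss pulls the encodings of H1, H2 and the approximate
# reasonable hypothesis H~ together ...
def sensitivity_loss(h1, h2, h_approx):
    return ((h1 - h_approx) ** 2).sum() + ((h2 - h_approx) ** 2).sum()

# ... while the triplet loss keeps the reasonable score above the unreasonable one.
def triplet_loss(y_h1, y_h2, margin=0.5):
    return F.relu(margin - (y_h1 - y_h2))

y1, y2 = torch.tensor(0.8), torch.tensor(0.3)            # scores of H1 (reasonable), H2
h1, h2, ha = torch.randn(768), torch.randn(768), torch.randn(768)
total = (prediction_loss(y1, torch.tensor(1.0)) + triplet_loss(y1, y2)
         + 0.1 * sensitivity_loss(h1, h2, ha))           # illustrative weighting
```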
For the comparison of different hypotheses, as many models as the listed hypotheses are built, all with identical structure; the given number of hypothesized events is 3, so three identical network models are built. After the rationality scores of the different hypotheses are compared, the counterfactual loss functions are back-propagated to the corresponding prediction loss function L_pred and event logic chain loss function L_chain in each network model, and the network parameters are updated, so there are 3 sets of model parameters. Referring to the idea of federated learning, after every t epochs the parameters of the 3 networks are averaged, and the synchronized parameters are then broadcast back to the 3 networks, fusing the reasoning information learned by each model.
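A minimal sketch of the federated-style synchronisation described above: after every t epochs the parameters of the three hypothesis-specific networks are averaged and broadcast back. The model construction is illustrative only.

```python
import copy
import torch
import torch.nn as nn

def sync_models(models):
    """Average the parameters of all models and broadcast the average back to each."""
    avg_state = copy.deepcopy(models[0].state_dict())
    for key in avg_state:
        stacked = torch.stack([m.state_dict()[key].float() for m in models])
        avg_state[key] = stacked.mean(dim=0)
    for m in models:
        m.load_state_dict(avg_state)

models = [nn.Linear(768, 1) for _ in range(3)]   # three identical scoring networks
sync_models(models)                               # call once every t epochs
```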
The above embodiments are only intended to illustrate the technical idea of the present invention and do not limit its protection scope; any modification made on the basis of the technical scheme according to the technical idea of the present invention falls within the protection scope of the present invention.

Claims (10)

1. An event tracing reasoning method based on inverse facts and path mining, characterized by comprising the following steps:
S1, inputting observed event text and hypothesized event text, and giving a hypothesis event H;
S2, encoding the observed event text and the hypothesized event text with a RoBERTa pre-training language model to obtain shallow event features;
S3, obtaining attention scores between the observed event text and the hypothesized event text with a self-attention mechanism, and using the attention scores as a causal effect matrix A* between the observed events and the hypothesis event;
S4, according to the causal effect matrix A*, encoding the shallow event features from step S2 with a graph neural network to obtain event representations;
S5, forming the hypothesis event representation and the observed event representations into an intermediate hidden state Z with an attention mechanism, guiding the RoBERTa pre-training language model to further encode the observed events, and extracting the key features h att affecting the causal effect;
S6, projecting the observed events and the hypothesis event onto an external event logic diagram using cosine similarity, and obtaining a logic chain path set between the similar observed events and the similar hypothesis event with a reinforcement learning algorithm;
S7, iteratively calculating the transition probabilities along the logic chains through a deep neural network and updating the preceding-event representation to obtain all event chain representations S;
S8, applying an attention mechanism between the given hypothesis representation V H and all event chain representations S to obtain a context vector q path based on the external event logic diagram;
S9, concatenating the key features h att and the context vector q path, and computing a reasonable value score for the hypothesis event H;
S10, iterating over the hypothesis events H, selecting only one at a time, and taking the hypothesis with the highest reasonable value score as the reasonable hypothesis most likely to occur;
S11, optimizing the tracing reasoning method through a loss function consisting of four parts: a prediction loss function, an event logic chain reasoning loss function, an inverse fact sensitivity loss function and a triplet loss function.
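As a rough illustration of step S11, the four loss terms might be combined into one training objective as a weighted sum; the equal default weights below are an assumption, not part of the claimed method.

def total_loss(l_pred, l_chain, l_sen, l_tri,
               w_pred=1.0, w_chain=1.0, w_sen=1.0, w_tri=1.0):
    """Weighted sum of the prediction loss, event logic chain loss,
    inverse fact sensitivity loss and triplet loss (weights assumed)."""
    return (w_pred * l_pred + w_chain * l_chain
            + w_sen * l_sen + w_tri * l_tri)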
2. The event tracing reasoning method based on inverse facts and path mining according to claim 1, wherein in step S1, the observed events include a background event and a result event; the hypothesis events include a hypothesis event H 1 and a hypothesis event H 2, and only one hypothesis event is given at a time for operation.
3. The event tracing reasoning method based on inverse facts and path mining according to claim 1, wherein in step S2, the RoBERTa pre-training language model is loaded with one click from the Huggingface website, [CLS] is added to the event text as the first token, and after the text is encoded by the RoBERTa pre-training language model, the representation vector of [CLS] is used as the shallow feature representation of the given event.
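A minimal sketch of this encoding using the Hugging Face transformers library, taking the first-token (the [CLS]-style <s> token in RoBERTa) vector as the shallow event feature; the "roberta-base" checkpoint is only an example.

import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("roberta-base")  # example checkpoint
encoder = AutoModel.from_pretrained("roberta-base")

def shallow_event_feature(event_text: str) -> torch.Tensor:
    """Encode one event text and return the first-token vector as its shallow feature."""
    inputs = tokenizer(event_text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        outputs = encoder(**inputs)
    return outputs.last_hidden_state[:, 0, :]  # shape: (1, hidden_size)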
4. The event tracing reasoning method based on inverse facts and path mining according to claim 1, wherein in step S4, the event representations are obtained as follows: the [CLS] representation of each event and the causal effect matrix A* are input into a graph convolutional neural network for encoding, and the event representation V of each event is obtained.
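A one-layer graph-convolution sketch of this encoding step, assuming the [CLS] vectors of all events are stacked row-wise and A* is the attention-derived causal effect matrix; the row normalization and the single layer are assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class EventGraphEncoder(nn.Module):
    """One GCN-style layer: V = ReLU(normalize(A*) @ H @ W)."""
    def __init__(self, hidden_size: int):
        super().__init__()
        self.linear = nn.Linear(hidden_size, hidden_size)

    def forward(self, cls_states: torch.Tensor, a_star: torch.Tensor) -> torch.Tensor:
        # cls_states: (num_events, hidden), a_star: (num_events, num_events)
        a_norm = a_star / (a_star.sum(dim=-1, keepdim=True) + 1e-9)  # row-normalize
        return F.relu(self.linear(a_norm @ cls_states))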
5. The event tracing reasoning method based on inverse facts and path mining according to claim 1, wherein in step S5, the specific steps of extracting the key features h att affecting the causal effect are as follows:
S501, V denotes the event representations obtained in step S4; the attention score corresponding to the background event is obtained by applying a softmax operation to the product of the attention network coefficient and the background event representation; the attention scores corresponding to the result event and to the hypothesis event H 1 are calculated by the same formula; each event representation is weighted by its corresponding attention score, and the weighted representations are fused to obtain the intermediate hidden state Z, which incorporates the information of the background event, the result event and the hypothesis event H 1;
S502, using a forward attention mechanism starting from the intermediate hidden state Z, the causally relevant features are extracted from the shallow event features of the given observed events; using a reverse attention mechanism, the content that the forward attention does not attend to is extracted instead;
S503, the relevant content obtained by the forward attention mechanism and the irrelevant content obtained by the reverse attention mechanism are added to obtain the key features h att affecting the causal effect.
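A sketch of steps S501 to S503 under stated assumptions: the intermediate state Z attends over token-level shallow features, and the reverse attention is approximated here by a softmax over negated scores, which is an assumption about the form of the formula.

import torch
import torch.nn.functional as F

def key_causal_features(z: torch.Tensor, shallow: torch.Tensor) -> torch.Tensor:
    """z: (hidden,) intermediate hidden state; shallow: (num_tokens, hidden).
    Returns h_att, the sum of forward-attended and reverse-attended content."""
    scores = shallow @ z                  # relevance of each token to Z
    fwd = F.softmax(scores, dim=0)        # forward attention weights
    rev = F.softmax(-scores, dim=0)       # reverse ("opposite") attention weights
    h_fwd = fwd @ shallow                 # causally relevant content
    h_rev = rev @ shallow                 # content the forward attention ignores
    return h_fwd + h_rev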
6. The event tracing reasoning method based on inverse facts and path mining according to claim 1, wherein in step S6, the specific steps of obtaining the logic chain path set are as follows:
S601, the external event graph is defined as G, events are defined as nodes N on the graph, and causal relationships between events are defined as edges R;
S602, a causal effect matrix of the external event graph is recorded, wherein each entry indicates whether event i causes event j or not;
S603, the observed events and the hypothesis event are projected onto the external event logic diagram by using cosine similarity, and the node on the external event graph with the highest similarity value is taken as the projection of the given event;
S604, the background event, the result event and the hypothesis event H are each projected in this way and represented by their corresponding nodes on the graph;
S605, path search is performed by single-agent reinforcement learning, the whole reinforcement learning environment being composed of four parts: an action space Action, a state State, a policy network Policy Network and a reward function Reward; combining the deep path method with the NLI prediction problem, the state State, the policy network Policy Network and the reward function Reward are modified as follows:
using the policy network, the agent selects, from the action space Action at each time step, a relation as the path to the next node; after the relation is selected, the node has a new state; the state at time step t is defined by the node representation of the current node, the node representation of the target node, the node representation reached through the relation selected at time step t, and a term measuring the gap between the currently selected node and the target node;
the new state is input into an LSTM to obtain the hidden state at time t, which is mapped into the action space through a fully connected layer, yielding a probability vector with the same dimension as the action space from which the relation is selected; wherein the fully connected layer has its own network parameters, and the LSTM maintains short-term and long-term memories at time t computed from its short-term and long-term memories at time t-1;
if a repeated relation is selected, or the target node is reached, or the path reaches the preset maximum length, the agent stops acting and the total reward for this round of exploration is aggregated; otherwise the search continues until the maximum length is reached;
the policy network is optimized through a reward that combines path accuracy, path diversity, path efficiency and causal prediction accuracy, wherein path efficiency is computed from the length of the explored path and path diversity from the similarity between every two exploration paths in the obtained path set F;
S606, the paths between the projected background event node and the projected result event node are divided into two kinds, those passing through the projected hypothesis event node and those not passing through it; reinforcement learning path exploration is then carried out separately between the projected background event node and the projected hypothesis event node, and between the projected hypothesis event node and the projected result event node, giving two path lists;
S607, the paths in the two path lists are combined two by two through the relay node by using the product function in the Python built-in toolkit itertools; the resulting set of paths is the logic chain path set P between the observed events and the hypothesis event based on the external event map.
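Step S607 explicitly names the product function of the Python itertools toolkit; a minimal sketch of the pairwise combination, where each path is a list of node identifiers and the shared relay node closes one segment and opens the next (the variable names are assumptions).

from itertools import product

def combine_paths(paths_to_relay, paths_from_relay):
    """Join every path ending at the relay node with every path starting from it,
    yielding the logic chain path set P between observed and hypothesized events."""
    combined = []
    for left, right in product(paths_to_relay, paths_from_relay):
        combined.append(left + right[1:])  # drop the duplicated relay node
    return combined

# example (toy node identifiers):
# combine_paths([["n_pre", "n3"]], [["n3", "n7", "n_hyp"]])
# -> [["n_pre", "n3", "n7", "n_hyp"]]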
7. The event tracing reasoning method based on inverse facts and path mining according to claim 1, wherein in step S7, the specific calculation steps of the transition probabilities between logic-chain events and of updating the preceding-event representation are as follows:
S701, the j-th logic chain path contains a number of events, and each event is embedded and represented through the RoBERTa pre-training language model;
S702, the preceding events are modeled, and the influence of the preceding events on the subsequent event is calculated; at time 0, the representation of the first event of the event chain path is taken as the initial preceding-event representation;
S703, the representations of the (k-1)-th event and the k-th event themselves are each combined with the preceding-event representation: the event representation and the preceding-event representation are spliced and passed through a network layer with a hyperbolic tangent activation function, yielding the contextual representations of the two events;
S704, from the two contextual representations, the transition probability distribution between the two events is calculated with a fully connected network using a sigmoid activation function; it represents the probability of transitioning from the (k-1)-th event to the k-th event in the j-th logic chain, that is, the probability of selecting the k-th event in the j-th logic chain;
S705, the content of the current event is merged into the current preceding-event representation through a network layer with its own parameters;
S706, the calculation is iterated over the events in the j-th event logic chain until the transition to the last event has been calculated, giving the final preceding-event representation; the probabilities between every two consecutive events are multiplied together as the probability that the j-th event logic chain may occur; this occurrence probability is then multiplied by the representation of the event logic chain to obtain the event logic chain context representation, and the set of the context representations of all event logic chains is denoted S;
S707, the operations of steps S701 to S706 are performed for each logic chain path in the event logic chain path set P.
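A compact sketch of steps S701 to S706, assuming each chain event already has an embedding vector; the layer shapes and the exact update of the preceding-event representation are assumptions consistent with the textual description.

import torch
import torch.nn as nn

class ChainEncoder(nn.Module):
    """Iterate over one event chain, computing step-wise transition probabilities
    and an evolving preceding-event representation."""
    def __init__(self, hidden: int):
        super().__init__()
        self.ctx = nn.Linear(2 * hidden, hidden)     # contextual representation
        self.trans = nn.Linear(2 * hidden, 1)        # transition score
        self.update = nn.Linear(2 * hidden, hidden)  # preceding-event update

    def forward(self, events: torch.Tensor) -> torch.Tensor:
        # events: (chain_len, hidden) embeddings of one logic chain
        pre = events[0]                               # initial preceding-event repr.
        chain_prob = 1.0
        for k in range(1, events.size(0)):
            prev_ctx = torch.tanh(self.ctx(torch.cat([events[k - 1], pre])))
            curr_ctx = torch.tanh(self.ctx(torch.cat([events[k], pre])))
            p_k = torch.sigmoid(self.trans(torch.cat([prev_ctx, curr_ctx])))
            chain_prob = chain_prob * p_k             # accumulate chain probability
            pre = torch.tanh(self.update(torch.cat([pre, events[k]])))
        return chain_prob * pre                       # chain context representation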
8. The event tracing reasoning method based on inverse facts and path mining according to claim 1, wherein in step S8, the event logic chains are observed from the viewpoint of the given hypothesis by using an attention mechanism: the attention score of the j-th event logic chain is multiplied by the corresponding event logic chain context representation, and the products are accumulated to obtain the context information q path based on the external event logic diagram; wherein the attention scores are calculated from the given hypothesis event representation and the chain representations through a network layer with a sigmoid activation function, followed by a softmax operation.
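A sketch of this chain-level attention, assuming a bilinear scoring between the hypothesis representation V H and each chain representation; the scoring form is an assumption.

import torch
import torch.nn as nn
import torch.nn.functional as F

class PathAttention(nn.Module):
    """q_path = sum_j softmax_j(score(V_H, S_j)) * S_j."""
    def __init__(self, hidden: int):
        super().__init__()
        self.bilinear = nn.Bilinear(hidden, hidden, 1)

    def forward(self, v_h: torch.Tensor, chains: torch.Tensor) -> torch.Tensor:
        # v_h: (hidden,) hypothesis representation; chains: (num_chains, hidden)
        scores = self.bilinear(v_h.expand_as(chains), chains).squeeze(-1)
        weights = F.softmax(scores, dim=0)
        return weights @ chains               # weighted sum over chain representations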
9. The event tracing reasoning method based on inverse facts and path mining according to claim 1, wherein in step S9, the specific content of calculating the reasonable value score of the hypothesis event H is as follows: the context vector q path and the key features h att are fused by a gating mechanism to obtain a comprehensive context vector d att based on paths and texts, and the reasonable value score Y H corresponding to the hypothesis is calculated by a nonlinear transformation; wherein the gating mechanism and the nonlinear transformation each have their own network parameters.
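A sketch of the gated fusion and scoring of this claim; the layer sizes and the tanh output nonlinearity are assumptions.

import torch
import torch.nn as nn

class PlausibilityScorer(nn.Module):
    """d_att = g * q_path + (1 - g) * h_att, then a scalar reasonable value score."""
    def __init__(self, hidden: int):
        super().__init__()
        self.gate = nn.Linear(2 * hidden, hidden)
        self.score = nn.Linear(hidden, 1)

    def forward(self, q_path: torch.Tensor, h_att: torch.Tensor) -> torch.Tensor:
        g = torch.sigmoid(self.gate(torch.cat([q_path, h_att], dim=-1)))
        d_att = g * q_path + (1 - g) * h_att      # path/text gated fusion
        return torch.tanh(self.score(d_att))      # nonlinear score Y_H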
10. The event tracing reasoning method based on inverse facts and path mining according to claim 1, wherein in step S11, the specific steps of the optimization method using the loss functions are as follows:
S111, the prediction loss function judges the reasonable value score y of the hypothesis, where y is the calculated reasonable value score and the standard reasonable value given in the dataset serves as the target;
S112, the event logic chain reasoning loss function is aimed at the event chain coding module: the inference model is fitted by using the event transition probabilities given by the dataset as the standard, summed over the events contained in each of the J event logic chains, where the standard probability distribution is provided by the dataset and the total number of events in the j-th event logic chain is taken into account;
S113, based on the inverse fact sensitivity idea, through the sensitivity loss function and the triplet loss function, the factors that change and those that do not change are observed as the hypothesis is replaced, so as to further analyze the factors influencing the effect event; wherein the parameters of the coding model Transformer are used, the L2 distance between two vectors, namely the sum of squared differences of the corresponding elements, is applied, a bias hyperparameter is a constant, Y is the reasonable value score given by the model, H 1 is the reasonable hypothesis, H 2 is the unreasonable hypothesis, and an approximate description of the reasonable hypothesis is used.
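Illustrative forms of the four losses of this claim, under stated assumptions: squared error for the prediction loss, a cross-entropy-style fit for the chain loss, an L2 pull between a reasonable hypothesis and its approximate description for the sensitivity loss, and a margin separating reasonable from unreasonable scores for the triplet loss; the margin value and the exact functional forms are not taken from the source.

import torch
import torch.nn.functional as F

def prediction_loss(y_pred, y_gold):
    """Fit the computed reasonable value score to the dataset's standard score."""
    return F.mse_loss(y_pred, y_gold)

def chain_loss(pred_probs, gold_probs):
    """Fit predicted event transition probabilities to the standard distribution."""
    return F.binary_cross_entropy(pred_probs, gold_probs)

def sensitivity_loss(h1_repr, h1_approx_repr):
    """Pull a reasonable hypothesis and its approximate description together (L2)."""
    return torch.sum((h1_repr - h1_approx_repr) ** 2)

def triplet_loss(score_h1, score_h2, margin=1.0):
    """Keep the reasonable hypothesis scored above the unreasonable one by a margin."""
    return torch.clamp(margin - (score_h1 - score_h2), min=0.0)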
CN202310426771.6A 2023-04-20 2023-04-20 Event tracing reasoning method based on inverse facts and path mining Active CN116151375B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310426771.6A CN116151375B (en) 2023-04-20 2023-04-20 Event tracing reasoning method based on inverse facts and path mining

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310426771.6A CN116151375B (en) 2023-04-20 2023-04-20 Event tracing reasoning method based on inverse facts and path mining

Publications (2)

Publication Number Publication Date
CN116151375A true CN116151375A (en) 2023-05-23
CN116151375B CN116151375B (en) 2023-07-14

Family

ID=86352849

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310426771.6A Active CN116151375B (en) 2023-04-20 2023-04-20 Event tracing reasoning method based on inverse facts and path mining

Country Status (1)

Country Link
CN (1) CN116151375B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117350386A (en) * 2023-12-04 2024-01-05 南京信息工程大学 Event tracing reasoning method and system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190073420A1 (en) * 2017-09-04 2019-03-07 Borislav Agapiev System for creating a reasoning graph and for ranking of its nodes
CN110471297A (en) * 2019-07-30 2019-11-19 清华大学 Multiple agent cooperative control method, system and equipment
CN111581396A (en) * 2020-05-06 2020-08-25 西安交通大学 Event graph construction system and method based on multi-dimensional feature fusion and dependency syntax
CN111626056A (en) * 2020-04-11 2020-09-04 中国人民解放军战略支援部队信息工程大学 Chinese named entity identification method and device based on RoBERTA-BiGRU-LAN model
CN114580642A (en) * 2022-03-17 2022-06-03 中国科学院自动化研究所 Method, device, equipment and medium for constructing game AI model and processing data

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190073420A1 (en) * 2017-09-04 2019-03-07 Borislav Agapiev System for creating a reasoning graph and for ranking of its nodes
CN110471297A (en) * 2019-07-30 2019-11-19 清华大学 Multiple agent cooperative control method, system and equipment
CN111626056A (en) * 2020-04-11 2020-09-04 中国人民解放军战略支援部队信息工程大学 Chinese named entity identification method and device based on RoBERTA-BiGRU-LAN model
CN111581396A (en) * 2020-05-06 2020-08-25 西安交通大学 Event graph construction system and method based on multi-dimensional feature fusion and dependency syntax
CN114580642A (en) * 2022-03-17 2022-06-03 中国科学院自动化研究所 Method, device, equipment and medium for constructing game AI model and processing data

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117350386A (en) * 2023-12-04 2024-01-05 南京信息工程大学 Event tracing reasoning method and system
CN117350386B (en) * 2023-12-04 2024-03-19 南京信息工程大学 Event tracing reasoning method and system

Also Published As

Publication number Publication date
CN116151375B (en) 2023-07-14

Similar Documents

Publication Publication Date Title
Naeem et al. Learning graph embeddings for compositional zero-shot learning
Zhang et al. Neural, symbolic and neural-symbolic reasoning on knowledge graphs
CN109934261B (en) Knowledge-driven parameter propagation model and few-sample learning method thereof
Roshanfekr et al. Sentiment analysis using deep learning on Persian texts
CN111651974B (en) Implicit discourse relation analysis method and system
CN111340661B (en) Automatic application problem solving method based on graph neural network
Zhang et al. Building interpretable interaction trees for deep nlp models
CN111291556A (en) Chinese entity relation extraction method based on character and word feature fusion of entity meaning item
CN116151375B (en) Event tracing reasoning method based on inverse facts and path mining
CN114911945A (en) Knowledge graph-based multi-value chain data management auxiliary decision model construction method
CN113535904A (en) Aspect level emotion analysis method based on graph neural network
CN116383399A (en) Event public opinion risk prediction method and system
CN116402066A (en) Attribute-level text emotion joint extraction method and system for multi-network feature fusion
Moreira et al. Distantly-supervised neural relation extraction with side information using BERT
CN112347245A (en) Viewpoint mining method and device for investment and financing field mechanism and electronic equipment
Zhang et al. Hierarchical representation and deep learning–based method for automatically transforming textual building codes into semantic computable requirements
Paria et al. A neural architecture mimicking humans end-to-end for natural language inference
Hao Evaluating attribution methods using white-box LSTMs
CN110889505A (en) Cross-media comprehensive reasoning method and system for matching image-text sequences
CN112560440B (en) Syntax dependency method for aspect-level emotion analysis based on deep learning
CN114818682B (en) Document level entity relation extraction method based on self-adaptive entity path perception
CN114357166B (en) Text classification method based on deep learning
Ji et al. LSTM based semi-supervised attention framework for sentiment analysis
CN113468884A (en) Chinese event trigger word extraction method and device
Xiao et al. A Recursive tree-structured neural network with goal forgetting and information aggregation for solving math word problems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant