WO2023091144A1 - Forecasting future events from current events detected by an event detection engine using a causal inference engine - Google Patents

Forecasting future events from current events detected by an event detection engine using a causal inference engine Download PDF

Info

Publication number
WO2023091144A1
WO2023091144A1 (PCT/US2021/060139, US2021060139W)
Authority
WO
WIPO (PCT)
Prior art keywords
events
interest
event
causal
target event
Prior art date
Application number
PCT/US2021/060139
Other languages
French (fr)
Inventor
Nam HUYN
Original Assignee
Hitachi, Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hitachi, Ltd. filed Critical Hitachi, Ltd.
Priority to PCT/US2021/060139 priority Critical patent/WO2023091144A1/en
Publication of WO2023091144A1 publication Critical patent/WO2023091144A1/en

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00Computing arrangements based on specific mathematical models
    • G06N7/01Probabilistic graphical models, e.g. probabilistic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks

Definitions

  • the present disclosure is generally related to storage systems, and more specifically, to systems and methods for using causal inference to forecast future events from current events detected in text data sources by an event detection engine.
  • example implementations described herein involve systems and methods that provide for event forecasting where events can be arbitrarily defined (also referred to herein as adhoc events) and are not readily extracted.
  • Systems and methods disclosed herein take advantage of causal relationships between events of interest to predict a target event from current events of interest by means of causal inference.
  • Related art implementations do not use causation; instead, they attempt to forecast events through event association or correlation, which leads to less accurate forecasting of future events than the causation approach disclosed herein. That is, correlation does not imply causation.
  • the forecasting of future events according to the example implementations disclosed herein may be fully explainable based on causal relationships.
  • example implementations disclosed herein detect and extract events of interest that are ad-hoc in nature. Where events are articulated with sufficient precision in either structured or unstructured data sources, example implementations disclosed herein are configured to automatically extract the events from these data sources.
  • causal reasoning may hinge on an ability to explicitly represent causal relationships between events of interests and/or the target event.
  • Example implementations disclosed herein may utilize Bayesian Networks as the framework to represent causal relationships between various events. Bayesian Networks can be explicitly provided by a subject matter expert, and/or discovered from event data made available, for example, through event detection and extraction, according to some example implementations disclosed herein.
  • example implementations disclosed herein automatically detect occurrences of the event in textual documents.
  • the textual documents need not be pre-labeled with event occurrences.
  • the implementations disclosed herein comprise bootstrapping an event extraction process by learning event detection with an initial weak classification model based on knowledge-driven natural language processing (NLP) techniques, followed by fine-tuning a pre-trained deep-learning-based language model.
  • Example implementations described herein also provide for discovery of a Bayesian Network from available event data based on analyzing historical event baskets (e.g., co-occurrences of events of interest elicited for a target event) and applying a structure discovery algorithm to establish causal relationships between events of the event baskets.
  • the historical event baskets may be supplied by event detection and extraction according to some example implementations disclosed herein and/or explicitly provided by a subject matter expert.
  • Example implementations disclosed herein formulate the forecasting of a target event as an estimation of a marginal posterior probability distribution for the target event given new evidence, such as occurrence of events of interest that are causally related to the target event.
  • aspects of the present disclosure can involve a method for forecasting a target event, the method involving receiving one or more events of interest; feeding the one or more events of interest into a causal inference engine comprising a trained Bayesian Network configured to compute posterior marginal probability distributions of the target event based on causal relationships between the one or more events of interest and the target event from the Bayesian Network; and outputting the posterior marginal probability distribution of the target event.
  • aspects of the present disclosure can involve a system for forecasting a target event, the system involving one or more memories configured to store instructions; and one or more processors coupled to the one or more memories.
  • the one or more processors configured to execute the instructions to: receive one or more events of interest; feed the one or more events of interest into a causal inference engine comprising a trained Bayesian Network configured to compute posterior marginal probability distributions of the target event based on causal relationship between the one or more events of interest and the target event from the Bayesian Network; and output the posterior marginal probability distribution of the target event.
  • aspects of the present disclosure can involve a non-transitory computer readable medium, storing instructions for forecasting a target event.
  • the instructions involving receiving one or more events of interest; feeding the one or more events of interest into a causal inference engine comprising a trained Bayesian Network configured to compute posterior marginal probability distributions of the target event based on causal relationship between the one or more events of interest and the target event from the Bayesian Network; and outputting the posterior marginal probability distribution of the target event.
  • aspects of the present disclosure can involve an apparatus for forecasting a target event, the apparatus involving a means for receiving one or more events of interest; a means for feeding the one or more events of interest into a causal inference engine comprising a trained Bayesian Network configured to compute posterior marginal probability distributions of the target event based on causal relationship between the one or more events of interest and the target event from the Bayesian Network; and a means for outputting the posterior marginal probability distribution of the target event.
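For illustration only, the core inference step shared by these aspects (computing a posterior marginal probability distribution for the target event given observed events of interest) can be sketched on a hypothetical toy network. The event names, priors, and conditional probability table below are assumptions for illustration, not values from the disclosure:

```python
from itertools import product

# Hypothetical toy network: two indicator events are parents of one target
# event. All probabilities are illustrative assumptions.
prior = {"acquisition": 0.3, "oil_surge": 0.2}  # P(indicator event occurs)
cpt_target = {                                   # P(target=1 | acquisition, oil_surge)
    (0, 0): 0.05, (0, 1): 0.40, (1, 0): 0.50, (1, 1): 0.90,
}

def posterior_marginal(evidence):
    """P(target=1 | evidence), by brute-force enumeration over the parents."""
    names = list(prior)
    num = den = 0.0
    for values in product((0, 1), repeat=len(names)):
        assignment = dict(zip(names, values))
        # Skip parent assignments inconsistent with the observed events.
        if any(assignment[k] != v for k, v in evidence.items()):
            continue
        weight = 1.0
        for name, v in assignment.items():
            weight *= prior[name] if v else 1.0 - prior[name]
        den += weight
        num += weight * cpt_target[values]
    return num / den

# Observing the acquisition event raises P(target=1) from ~0.258 (no
# evidence) to ~0.58.
print(posterior_marginal({"acquisition": 1}))
```

A production engine would use a dedicated inference algorithm (e.g., variable elimination) rather than enumeration, which scales exponentially in the number of hidden nodes.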
  • FIG. 1 illustrates a flow diagram of an example process for building a causal inference engine for target event prediction, in accordance with example implementations disclosed herein.
  • FIG. 2 illustrates a flow diagram of an example process for target event forecasting by means of causal inference, in accordance with example implementations disclosed herein.
  • FIG. 3 illustrates example direct causal relationships between example events of interest relevant to an example target event, in accordance with example implementations disclosed herein.
  • FIG. 4 illustrates a flow diagram of an example process for building an ad-hoc event detector, in accordance with example implementations disclosed herein.
  • FIG. 5 illustrates an example table of curated synonyms for an example root word, according to example implementations disclosed herein, ranked based on pre-trained embeddings.
  • FIG. 6 illustrates a flow diagram of an example process for fine-tuning a pre-trained language model to recognize events of interest, in accordance with example implementations disclosed herein.
  • FIG. 7 illustrates an example directed acyclic graph from a Bayesian Network that captures statistical independence among a plurality of variables represented by the nodes.
  • FIG. 8 illustrates an example conditional probability table, according to example implementations disclosed herein, used for carrying out one causal inference step.
  • FIG. 9 illustrates a flow diagram of a causal inference process for target event forecasting as computing posterior marginals, in accordance with example implementations disclosed herein.
  • FIG. 10 illustrates an example end-to-end system for forecasting future events starting from reading textual data sources, in accordance with example implementations disclosed herein.
  • FIG. 11 illustrates an example computing environment with an example computer device suitable for use in some example implementations.
  • the information is expressed in a table format, but the information may be expressed in any data structure. Further, in the following description, the configuration of each piece of information is an example, and one table may be divided into two or more tables, or a part or all of two or more tables may be combined into one table.
  • Future events of interest can often be forecasted using early indicators (e.g., current events and/or historical events). These early indicators can be recognized if present in structured and/or unstructured data sources. In many situations, early indicators may be located only in published unstructured data sources, such as, but not limited to, newsfeeds, emails, social media, business documents, newsletters, and the like. The ability to accurately forecast target events may be important because it allows planning for occurrence of the target event to avoid catastrophes, to reap profits, etc. Financial events such as future product demand or stock price movements, for example, can be linked to seemingly unrelated precursor events, such as corporate acquisitions or geopolitical events, which are typically first reported in news.
  • Ad-hoc events may refer to events that are not predefined events.
  • the systems and methods disclosed herein have broad uses in event forecasting, where events can be arbitrarily defined and are not readily extracted in structured form. Use cases range, for example, from forecasting demand as a result of events that are as unlikely as pandemic breakouts, to predicting major financial market corrections as a result of other financial events.
  • An example of a non-limiting advantage of the example implementations disclosed herein is an increased forecasting accuracy in inferring an occurrence of a target event, which is achieved by leveraging true semantic, causal relationships between events, whereas related art methods rely on event patterns that can be ambiguous.
  • Causation is able to represent the concept of “confounding”, which can introduce predictive biases if not corrected. Confounding cannot be described in terms of associations or correlations.
  • forecasting results according to the examples herein are more explainable (and may be fully explainable) than traditional forecast methods due to the use of causal relationships.
  • Forecasting methods and systems disclosed herein are also much more broadly applicable because the methods and systems are uniquely able to deal with ad-hoc events, given that information is predominantly contained in unstructured data. Because new information typically first appears in unstructured form, the systems and methods disclosed herein are able to perform forecasts well before related art implementations are able to act, due to the ability to extract events of interest from these unstructured sources.
  • Example implementations disclosed herein provide an end-to-end method for forecasting ad-hoc target events from ad-hoc indicator events (also referred to herein as events of interest).
  • the target events may be user-defined target events.
  • the implementations herein uniquely combine various artificial intelligence techniques to analyze events of interest (current and/or historical events) and forecast the occurrence of a target event.
  • Example implementations may start with eliciting, from an end user and/or a subject matter expert (SME), a target event the user wishes to forecast and all other events that may be relevant to the target event.
  • the end user may define the target event and the SME may supply events of interest relevant to the target event.
  • Example implementations disclosed herein propagate occurrences of observed indicator events (e.g., events of interest) through causal chains that lead to an accurate forecast of the target event. Accordingly, example implementations herein provide for building a causal inference engine that is configured to estimate a probability distribution of the target event from events of interest.
  • the processes and methods disclosed herein may be executed, individually or in combination, in a computing environment, for example, by one or more computing devices for executing programs stored in one or more storage devices.
  • the storage devices may be configured to store the programs and data for executing the programs.
  • the computing devices and storage devices may be connected via wired or wireless communication, such as via a network.
  • the computing environment may be implemented as one or more of computing environment 1100 of FIG. 11.
  • End users and subject matter experts (SMEs) may interface directly with the computing environment on which processes disclosed herein are executed and/or via separate computing environments (e.g., separate iterations of computing environment 1100 of FIG. 11) connected to the computing environment via a network or wired connection.
  • end users and/or SMEs may utilize respective user devices, each implemented as an iteration of computing environment 1100 of FIG. 11, to input data and receive output data.
  • User devices may include, but are not limited to, highly mobile devices (e.g., smartphones, devices in vehicles and other machines, devices carried by humans and animals, and the like), mobile devices (e.g., tablets, notebooks, laptops, personal computers, portable televisions, radios, and the like), and devices not designed for mobility (e.g., desktop computers, other computers, information kiosks, televisions with one or more processors embedded therein and/or coupled thereto, radios, and the like).
  • FIG. 1 illustrates a flow diagram of an example process 100 for building a causal inference engine for target event prediction, in accordance with example implementations disclosed herein.
  • Process 100 for building a causal inference engine for a target event comprises multiple phases.
  • process 100 executes an event elicitation process.
  • the end user has the opportunity to specify and define the target event of interest for forecasting, for example, an ad-hoc target event.
  • the end user may enter a target event via a user interface, for example, such as the input/user interface 1135 described in connection with FIG. 11.
  • the specified target event may then be stored in a storage device, for example, such as one or more of internal storage 1120, external storage 1145 and/or memory 1115 of the computing environment described in FIG. 11.
  • SMEs (e.g., domain experts) may input other events (e.g., events of interest) via a user interface, for example, such as the input/user interface 1135 described in connection with FIG. 11.
  • the identified event of interest may then be stored in a storage device, for example, such as one or more of internal storage 1120, external storage 1145 and/or memory 1115 of the computing environment described in FIG. 11.
  • the computing environment 1100 used by the end user may be the same computing environment as that used by the SMEs or a separate computing environment.
  • each SME may use the same or different computing environments to supply the events of interest.
  • An example event elicitation process is described in connection to FIG. 3 below.
  • process 100 executes an event extraction process.
  • event data sources include, but are not limited to, unstructured data sources (e.g., data not having a pre-defined data model or is not organized in a pre-defined manner, such as newsfeeds and the like) or structured data sources.
  • unstructured data sources include, but are not limited to, newspapers, newsfeeds, news reports, etc.
  • Event data may be extracted from textual sources as well as image and video data sources, audio data sources, etc., from which text may be extracted through a manual, automated or semiautomated process.
  • the audio may be transcribed into text (either external to the example systems disclosed herein or as part of the event extraction process), and an event extracted from the transcribed text.
  • a scanned image of a textual document may be converted to textual data using an optical character recognition (OCR) process, and events extracted from the recognized text.
  • process 100 retrieves digitized versions of the data and utilizes data scraping techniques to parse through identified data sources and extract events of interest and data representing the events of interest.
  • Event detection and extraction from textual documents may require building an event detection model, for the purpose of building a training set out of historical event data.
  • the event detection model may be started by developing a knowledge-driven weak classification model (also referred to herein as a weak sentence classifier), from which a labeled event dataset can be built, and gradually bootstrapping into a deep neural network language model, for example as described below in connection to FIGS. 4 and 6. Extracted event data will be collected into a training dataset, referred to herein as event baskets.
  • the process 100 determines whether data has been extracted for each event of interest identified by the SME.
  • data of all events can be extracted from some sources, regardless of whether the data confirms that the event of interest occurred, is directly related to the event of interest, is ancillary to the event of interest, etc. Thus, data or information of any type and semantic context is extracted for each event of interest. If data is not available for all identified events, the process returns to step 110. At this point, the user may confirm, revise, or update the target event, and/or SMEs may revise and/or update the identified events of interest such that at least one data source is located for each event. If data is available for all identified events, the process 100 proceeds to step 140.
  • process 100 executes an event causal structure discovery process.
  • An example of an event causal structure discovery process may include analyzing event baskets to discover causal relationship(s) between events in the dataset.
  • the event causal structure may take the form of a Bayesian Network, which may be illustrated, for example, as a directed acyclic graph (DAG) whose nodes represent events, whose sink node represents the target or future event, and whose edges represent direct causal relationships. From the discovered structure, a causal network between events can be learned, for example, as described in connection with FIGS. 7 and 9. An example of event causal structure discovery is described in connection with FIGS. 3 and 7 below.
  • process 100 executes a causal structure parameters estimation process.
  • a DAG by itself may not be sufficient to support causal inference.
  • the strength of a node's dependency on its parent node(s) may be quantified.
  • An example of such quantification may be referred to as Bayesian Network parameters, which are estimated and trained from the event baskets, as described below in connection with FIGS. 7 and 9.
  • causal relationships may be used to define a structure of a Bayesian Network.
  • the Bayesian Network may be completely trainable based on collecting training data (e.g., the causal relationships, among other data) into event baskets.
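As an illustrative (non-limiting) sketch of estimating Bayesian Network parameters from event baskets, the conditional probability table of a target node with a known parent set can be estimated by counting parent/target co-occurrences across historical baskets. The basket contents, event names, and the Laplace smoothing choice below are assumptions for illustration:

```python
from collections import Counter
from itertools import product

# Hypothetical event baskets: each records which events co-occurred in one
# historical period. Names and contents are illustrative assumptions.
baskets = [
    {"acquisition", "pipe_demand"},
    {"acquisition", "oil_surge", "pipe_demand"},
    {"oil_surge"},
    set(),
    {"acquisition"},
    {"oil_surge", "pipe_demand"},
]

def estimate_cpt(target, parents, baskets, alpha=1.0):
    """Estimate P(target=1 | parent configuration) by counting baskets,
    with Laplace smoothing (alpha) so unseen configurations stay defined."""
    seen, hits = Counter(), Counter()
    for basket in baskets:
        key = tuple(int(p in basket) for p in parents)
        seen[key] += 1
        hits[key] += int(target in basket)
    return {
        key: (hits[key] + alpha) / (seen[key] + 2 * alpha)
        for key in product((0, 1), repeat=len(parents))
    }

cpt = estimate_cpt("pipe_demand", ["acquisition", "oil_surge"], baskets)
# cpt[(1, 1)] == 2/3: the target occurred in the single basket where both
# parents occurred, smoothed by alpha=1.
```

Smoothing matters here because ad-hoc events are rare, so many parent configurations may never appear in the historical baskets.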
  • FIG. 2 illustrates a flow diagram of an example process 200 for target event forecasting by means of causal inference, in accordance with example implementations disclosed herein.
  • Process 200 may start, for example, based on input of a target event from a user.
  • process 200 executes an event monitoring process. For example, from the data sources identified in step 110 and 120 of FIG. 1, occurrences of early indicator events or other relevant events (e.g., events of interest) are detected and extracted from data sources.
  • Step 210 may be performed on a periodic basis, according to a preset period. Extracted events may be fed into the Bayesian inference process (e.g., FIG. 1) as new evidence. That is, while process 200 may utilize data sources and events of interest as determined from steps 110-120 of FIG. 1, the event monitoring may output extracted events that are fed into the causal inference engine as set forth in FIG. 1, to infer something about the target event.
  • process 200 executes a causal inference process.
  • an inference is executed on the Bayesian Network (for example, as described in FIG. 1) to compute a marginal probability distribution for the target event.
  • Example implementations provided herein forecast any future events (e.g., target events), as well as any early indicator events that can influence the target events and/or any other events that are otherwise relevant to the target events (collectively referred to as events of interest).
  • the end-user may specify and define target events and/or events of interest, for example, at step 110 of FIG. 1.
  • off-the-shelf event structured data sources may be utilized. But in most cases, events are arbitrarily defined and will be expressed only in unstructured data sources. Such events may be referred to as “ad-hoc” events.
  • Target events and/or events of interest may be ad-hoc events and/or well established events.
  • Events of interest may be elicited from the SME, which may provide a list of events that may impact or that are otherwise relevant to the Pipe Demand (e.g., the example target event).
  • This list need be neither exhaustive nor sound: if the relevance of an event is unknown, it may be included in the list during the first (e.g., brainstorming) stage, to the extent that it does not make the list excessively long. Irrelevant events may be filtered out at a later stage.
  • FIG. 3 illustrates a graph 300 of an inventory of events elicited by the hypothetical SMEs. That is, FIG. 3 illustrates graph 300 as an example of events of interest that may be relevant to the example target event, “Pipe Demand”.
  • Each node of the graph represents an event of interest, and each directed edge represents a direct causal relationship between event nodes; the edges are shown here for illustration purposes only. In general, causal relationships need not be obvious. So, instead of relying on the SME to specify them, a more scalable and objective approach may include discovering the relationships from the data extracted from the data sources (e.g., deriving them therefrom).
  • Traditional machine learning or deep learning may train a machine to recognize something (e.g., a cat) from data (e.g., image data) and report that the machine recognized the taught object in the data (e.g., report that a cat was recognized in the image data).
  • example implementations disclosed herein train a model to recognize the occurrence of a given event of interest in text data sources.
  • the machine learning starts with labeled training datasets that teach the machine how to recognize an object in the data.
  • the events need to be identified and extracted, for example, at step 120 of FIG. 1.
  • Data sources need to be identified, including structured and unstructured textual sources, where data of events of interest is expected to reside and where events can be extracted.
  • an event may not be in a structured data source, such as databases.
  • example implementations described herein provide for building a model to detect events in unstructured textual documents.
  • ad-hoc events of interest may be expressed at the sentence level in a text document. For these cases, given a textual document such as a news report or news feed, detecting an event in the document can be reduced to classifying sentences in the document as either expressing the event or not.
  • Training a machine learning model to classify sentences may require labeled data, which may not exist, especially when the event to be detected is ad-hoc in nature. This lack of labeled reference data may be referred to as the “Small Data ML Problem,” which requires innovative solutions such as those presented herein.
  • example implementations disclosed herein provide a hybrid method (also referred to as a hybrid model) that combines both knowledge-driven and data-driven techniques, for example, as shown in FIG. 4.
  • FIG. 4 illustrates a flow diagram of an example process 400 for building an ad-hoc event detector, in accordance with example implementations disclosed herein.
  • the process 400 may provide for building up a labeled data set for bootstrapping the construction of the event detector (e.g., step 120 of FIG. 1).
  • Process 400 may be performed for each event of interest. That is, execution of process 400 provides for building an event detector for a given event.
  • Process 400 starts with collecting a corpus of textual documents from a plurality of data sources, step 410.
  • the data sources may be those sources identified in step 120 of FIG. 1.
  • the textual documents may include historical datasets of documents where events of interest can be located (e.g., news reports, news feeds, articles, etc.), which may be stored in a digital format accessible via a web based interface.
  • segmentation into sentences may include pre-processing tasks to distinguish a “period” used as punctuation from a “period” used as a decimal point in numbers.
  • Example pre-processing for this distinction may include detection using, for example but not limited to, NER (named-entity recognition) supported by tools such as spaCy or the Stanza NLP Toolkit. Pre-processing may also be used to distinguish a “period” used as punctuation from a “period” used for abbreviations, such as “U.S.”
  • Example pre-processing for this distinction may include using, for example but not limited to, the NLTK Punkt sentence tokenizer.
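A minimal rule-based splitter along these lines (the abbreviation list is a hypothetical stand-in; a production system would more likely rely on the NLP tools named above) might look like:

```python
import re

# Illustrative abbreviation list; a real system would use a much larger
# lexicon or a learned sentence tokenizer.
ABBREVIATIONS = {"U.S.", "Inc.", "Corp.", "Mr."}

def split_sentences(text):
    """Split on periods that end a sentence, not decimals or abbreviations."""
    # A period followed by whitespace and an uppercase letter is a candidate
    # boundary, unless the token ending at the period is a known abbreviation
    # or ends in a digit (a decimal-style number).
    parts, start = [], 0
    for m in re.finditer(r"\.(?=\s+[A-Z])", text):
        token = text[start:m.end()].split()[-1]
        if token in ABBREVIATIONS or re.search(r"\d\.$", token):
            continue
        parts.append(text[start:m.end()].strip())
        start = m.end()
    tail = text[start:].strip()
    if tail:
        parts.append(tail)
    return parts

sents = split_sentences(
    "Crude rose 2.5 percent. Demand in the U.S. Midwest grew. Prices fell."
)
# Three sentences; "2.5" and "U.S." are not treated as boundaries.
```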
  • Step 430 may be performed by a weak classification model (also referred to herein a weak sentence classifier), and may be performed without labeled data.
  • the term “weak” as used herein refers to low accuracy in the classification process (e.g., sometimes the classifier may fail to detect an event in a sentence). Without a large quantity of labeled data, machine learning techniques may not be useful.
  • process 400 begins with developing the weak classification model, driven by knowledge about NLP (natural language processing) models of how an event could be expressed in a sentence.
  • the weak classification model in various examples, may be based on heuristics.
  • “Crude Oil Futures surge” may represent an event of interest. Components of this event can be expressed taking the following heuristics into account:
  • Crude oil is not the same as oil, so “crude” needs to be present in the sentence. Also, crude oil is often referred to as West Texas Intermediate (WTI) crude and/or Brent crude, both of which express the market where these commodity futures are traded.
  • Futures are contracts to deliver an asset in some future month at some price. Futures are often expressed in a sentence with mention of the delivery month.
  • a dictionary of synonyms for the concept can be utilized, which include for example words like surge, jump, go up, spike, rise, etc.
  • a dictionary of words that are similar in meaning to the root word is built. For example, building the dictionary may start with identifying words with similar embeddings (e.g., vector representations of other words that are closest to the analyzed word), for example using cosine similarity.
  • Pre-trained embeddings may be provided by, for example but not limited to, GloVe (Global Vectors for Word Representation) from Stanford and/or Google News embeddings from Google. However, this similarity sometimes reflects the co-occurrence of words and not necessarily synonymy.
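A sketch of the similarity-ranking step, using toy 3-dimensional vectors as stand-ins for real pre-trained embeddings (which are typically 100- to 300-dimensional):

```python
import math

def cosine(u, v):
    """Cosine similarity between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))

# Toy embeddings; values are illustrative assumptions only.
emb = {
    "rise":  [0.9, 0.1, 0.2],
    "surge": [0.8, 0.2, 0.3],
    "fall":  [-0.7, 0.1, 0.2],
}

# Rank candidate synonyms for the root word by similarity score, as in
# the curated-dictionary ranking of FIG. 5.
candidates = sorted(emb, key=lambda w: cosine(emb["rise"], emb[w]),
                    reverse=True)
```

As the text notes, a high cosine score is only a candidate signal; entries still need human curation because co-occurring words (e.g., antonyms) can also score highly.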
  • FIG. 5 illustrates an example of curated synonyms, which may be used to recognize the many different ways a concept can be expressed using words.
  • FIG. 5 depicts curated synonyms for the root word “rise” ranked according to a similarity score, based on Google News embeddings, where deleted entries (shown as strike throughs) are not similar to the root word in meaning even though the similarity score according to the pre-trained embeddings is relatively high.
  • Another choice is the pre-built synonym dictionary from the NLTK WordNet library, which originated at Princeton, but curation is still needed.
  • the curated dictionary is a list of entries that pair words together with a measure of similarity with the root word.
  • Event detection can then be formulated as semantic keyword matching according to a set of rules or conditions.
  • executable search queries may be utilized to detect events.
  • an example search query (e.g., a set of conditions) may be expressed in conjunctive normal form.
  • a subquery (e.g., [surge, 0.5] for instance) specifies to find a match in the curated dictionary (e.g., the table of FIG. 5 in this example) for “surge” that has a similarity score of at least 0.5.
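The semantic keyword matching described above can be sketched as follows; the curated dictionary entries, similarity scores, and query thresholds below are illustrative assumptions in the spirit of FIG. 5, not values from the disclosure:

```python
# Curated dictionary sketch: root word -> {synonym: similarity score}.
CURATED = {
    "surge":   {"surge": 1.0, "jump": 0.7, "spike": 0.65, "rise": 0.6},
    "crude":   {"crude": 1.0, "wti": 0.8, "brent": 0.8},
    "futures": {"futures": 1.0, "contracts": 0.5},
}

def matches(sentence, cnf_query):
    """CNF semantic keyword matching: every clause must be satisfied by at
    least one subquery. A subquery (root, min_score) is satisfied if some
    curated synonym of `root` with similarity >= min_score occurs in the
    sentence."""
    words = set(sentence.lower().split())
    return all(
        any(
            any(syn in words and score >= min_score
                for syn, score in CURATED[root].items())
            for root, min_score in clause
        )
        for clause in cnf_query
    )

# "Crude Oil Futures surge" as: crude AND futures AND (surge-like verb).
query = [[("crude", 0.8)], [("futures", 0.9)], [("surge", 0.5)]]
```

For example, `matches("wti crude futures jump on supply fears", query)` holds because "jump" matches the `[surge, 0.5]` subquery with a curated score of 0.7.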
  • event attributes such as how much (e.g., percentage figures) and when (e.g., dates) can be extracted using a NER pre-trained model.
  • a semantic keyword matching may be sufficient.
  • the model may be required to recognize more complex phrases, such as bigrams like “White House” (which, in this example, has a very different meaning than the combination of the color white with a dwelling), and analyze more complex sentence structures using part-of-speech (POS) tags and dependency tree techniques.
  • the process 400 may determine whether or not the weak classification model is accurate to a preset threshold.
  • the threshold may be 90% accuracy in the classification of sentences and at step 440 process 400 determines the weak classification model is sufficiently accurate if the classification is correct 90% of the time. Conversely, if the weak classifier is not accurate 90% of the time (e.g., 70%), step 440 may determine the weak classification model is not sufficiently accurate.
  • a threshold of 90% is provided as an example, and the actual threshold may be set as desired, for example, based on complexity of the event of interest and/or quantity of data sources for which the process is executed.
• if the weak classification model is not accurate enough, the process may return to step 430, where misclassified sentences are analyzed and this error analysis is used to fine-tune the weak classification model.
• the accuracy of the weak classifier may be determined by comparing its predictions with a labeled dataset. When starting with an unlabeled dataset, the classifier may be applied to predict labels and these predictions may be manually verified. Note that the goal is not only to improve a weak classifier but also to build up a labeled dataset. Bootstrapping may consist of alternating between the preceding tasks until a labeled dataset is generated that is large enough to fine-tune a pre-trained language model (e.g., classifier) at step 460.
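One bootstrapping round can be sketched as below. The rule-based weak classifier and the verify() function are hypothetical toy stand-ins (verify() plays the role of manual/SME review of predicted labels); they are not part of the original disclosure.

```python
def weak_classifier(sentence):
    # rule-based stand-in for the knowledge-driven weak classifier
    return "rise" in sentence or "surge" in sentence

def verify(sentence):
    # stand-in for manual verification of a predicted label
    return "up" in sentence or "rise" in sentence or "surge" in sentence

def bootstrap_round(unlabeled, threshold=0.9):
    """Label sentences, verify the predictions, grow the labeled
    dataset, and report whether the weak classifier meets the
    accuracy threshold (e.g., 90%)."""
    labeled, correct = [], 0
    for s in unlabeled:
        pred = weak_classifier(s)
        gold = verify(s)
        labeled.append((s, gold))
        correct += int(pred == gold)
    accuracy = correct / len(labeled)
    return labeled, accuracy, accuracy >= threshold
```

When the threshold is not met, error analysis on the misclassified (sentence, label) pairs drives the next round of fine-tuning the rules.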
  • Fine-tuning herein may include updating the rules (e.g., knowledge) used in implementing the weak classifier, with the goal to improve classification accuracy.
  • the model may be configured to determine how the errors are made, reclassify the sentence, and use this error analysis to fine tune the structure of the weak classification model (the rules used to implement the weak classifier for example) and improve classification accuracy.
• the knowledge-driven approach provides an improvement over data-driven techniques by making it easier to carry out error analysis and correction. Note that the goal of developing a sufficiently accurate weak classifier is to help create a labeled dataset that is sufficiently large to train a deep-learning-based classifier in step 460. Otherwise, if the weak classification model is sufficiently accurate, the process proceeds to step 450 and uses the weak classification model to label sentences of the textual document as including the event of interest or not.
  • recognition that a sentence has been misclassified may be performed by the end-user and/or SME upon reviewing the classifications output by process 400.
• Steps 420 through 440 are executed for each textual document extracted from data sources. Furthermore, for each textual document, steps 430 through 450 are executed on each sentence contained in each textual document. Steps 420 through 450 may be executed sequentially on each sentence and/or document, or executed in parallel. That is, for example, steps 430 through 450 may be executed on a given sentence of a given document in parallel with steps 430 through 450 being executed on one or more other sentences of the given document. Similarly, steps 420 through 450 may be executed on a given document in parallel with steps 420 through 450 being executed on one or more other documents.
• Step 450 may generate a labeled dataset that is of sufficient quality and output the labeled dataset to step 460.
  • the labeled dataset may comprise sentences that are labeled with their class (e.g., that the sentence expresses a given ad-hoc event or not).
  • this labeled dataset might not be large enough to train a deep neural network model from scratch, but may be large enough for transfer learning or fine-tuning a pre-trained language model (e.g., step 460) that can recognize sentence structures and concepts in sentences and perform a variety of natural language downstream tasks on sentences.
  • a pre-trained language model may also be referred to as a deep-learning based language model.
  • the labeled dataset from step 450 is used to fine-tune the pre-trained language model to perform a new downstream task of classifying a recognized structure, per the pre-trained language model, as expressing the event of interest for which the event detector is built according to process 400.
• Example pre-trained language models include, but are not limited to, the Bidirectional Encoder Representations from Transformers (BERT) model or the like.
• BERT, for example BERT-BASE, is an advanced language model pre-trained on unlabeled data (consisting of pairs of consecutive sentences) using self-supervised learning to perform two pre-training tasks: predicting tokens that have been masked and predicting the next sentence. BERT was pre-trained on a very large document corpus.
• BERT has been shown to outperform other state-of-the-art models on various downstream tasks. For a given downstream task, BERT needs to be fine-tuned (conceptually analogous to transfer learning) using labeled data for that task. Accordingly, some implementations disclosed herein provide for fine-tuning BERT to recognize events of interest using the generated labeled dataset output from step 450, while other implementations may fine-tune different pre-trained language models. That is, the example implementations disclosed herein are not limited to BERT, but can be used with any pre-trained language model, with BERT referred to herein as an illustrative example. That is, at step 460, the labeled dataset is used to fine-tune a pre-trained model (e.g., BERT) to classify an input sentence as expressing our ad-hoc event or not.
• Example implementations disclosed herein may be described under the conception that an event of interest can be expressed in a single sentence. However, the implementations disclosed herein are not limited to one sentence per label (e.g., expressing the event). Implementations disclosed herein may be configured to analyze one or more sentences as a grouping of sentences, determine that the group of sentences expresses the event of interest, and classify the group of sentences accordingly.
  • FIG. 6 is a flow diagram of process 600 for fine-tuning a pre-trained language model (such as BERT) to recognize a specific ad-hoc event, in accordance with example implementations disclosed herein.
• Process 600 is an example implementation of step 460 of FIG. 4.
  • FIG. 6 shows inputting an initial labeled dataset 610 (e.g., as output from step 450 of FIG. 4) into a pre-trained language model.
  • the initial labeled data set may comprise sentences and class label pairs, for example, (Sentence, 0), where the class is indicative of the sentence expressing the event or not.
• in some examples, the pre-trained language model is BERT or BERT-BASE; however, implementations disclosed herein are not limited to utilizing BERT or BERT-BASE, and any pre-trained language model may be applicable.
  • the initial labeled dataset 610 is fed to the pre-trained language model without the labels.
• the pre-trained language model then generates an internal representation of the structure of an unlabeled sentence, which is fed to a downstream machine-learning-based event classifier. This classifier is fine-tuned by comparing its predictions at step 640 with the gold standard labels that come with the labeled data 610.
  • the process 600 becomes our data-driven deep-learning-based event classifier that takes, as input, a new unlabeled sentence 610, generates an internal representation of the unlabeled sentence 610 at the pre-trained language model 620, classifies the internal representation at classifier 630, and outputs a label 640 for the unlabeled sentence 610.
• the label 640 indicates whether the input sentence expresses the given event of interest or not.
  • These newly computed labels from output 640 can be further validated, for example, using either weak classification models as set forth above or manually, thereby creating a larger curated labeled dataset.
• This resulting dataset can in turn be fed back into the pre-trained language model (having been fine-tuned using the initial labeled dataset) to further fine-tune the pre-trained language model in a second round and further subsequent rounds.
  • the repeated feeding back of validated labeled datasets gradually bootstraps the event detection model into an increasingly more accurate model.
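The training loop at the heart of step 460 can be sketched with stdlib stand-ins. In a real implementation the sentence representation would come from a pre-trained language model such as BERT and fine-tuning would update the model itself; here a bag-of-words vector stands in for the model's internal representation and a perceptron-style head stands in for the downstream classifier, purely to illustrate training against the labeled dataset.

```python
def build_vocab(labeled):
    # assign each training token an index
    vocab = {}
    for sentence, _ in labeled:
        for tok in sentence.lower().split():
            vocab.setdefault(tok, len(vocab))
    return vocab

def represent(sentence, vocab):
    # stand-in for the language model's internal sentence representation
    vec = [0.0] * len(vocab)
    for tok in sentence.lower().split():
        if tok in vocab:
            vec[vocab[tok]] += 1.0
    return vec

def train_head(labeled, vocab, epochs=20, lr=0.1):
    # perceptron-style updates against the gold-standard labels
    w, b = [0.0] * len(vocab), 0.0
    for _ in range(epochs):
        for sentence, label in labeled:
            x = represent(sentence, vocab)
            pred = sum(wi * xi for wi, xi in zip(w, x)) + b > 0
            if pred != label:
                sign = 1.0 if label else -1.0
                w = [wi + lr * sign * xi for wi, xi in zip(w, x)]
                b += lr * sign
    return w, b

def classify(sentence, vocab, w, b):
    x = represent(sentence, vocab)
    return sum(wi * xi for wi, xi in zip(w, x)) + b > 0
```

The bootstrapping described above then amounts to classifying new sentences with the trained head, validating the outputs, and retraining on the enlarged labeled dataset.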
• a purpose of building event detectors is to be able to detect events of interest in text documents, commonly sourced from unstructured sources. Detected events may be collected into “event baskets”, that is, tuples of the form: (ET, E1, E2, ..., EN) Eq. 1
  • ET represents the target event of interest
  • Ei represents an upstream event of interest elicited to have a potential causal impact on ET.
• ET and each Ei may be thought of as random variables, which can be binary (True or False) or categorical (e.g., Up, Down, or Unchanged).
  • an example event basket would look like:
• because example implementations herein are configured for learning causal relationships between events, there may be timing constraints that govern when events of interest are to take place. For example, the time window for when the target event should happen is expected to be after the time window from which values for each Ei are extracted, such that the target event occurs after the events of interest. Furthermore, because the model disclosed herein does not know how the Ei are causally related to each other, the time window for each Ei should not be too wide. That is, the target event is expected to occur within a set time window of each event of interest. However, the time windows may be based on the target event and events of interest, such that one target event may have a first time window that is appropriate for it while another target event may correspond to a different time window.
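The timing constraints above can be checked mechanically when assembling event baskets. This is a sketch under stated assumptions: the dates and the 30-day window below are illustrative, not values from the disclosure.

```python
from datetime import date, timedelta

def basket_timing_is_valid(target_date, event_dates,
                           window=timedelta(days=30)):
    """True if every upstream event of interest precedes the target
    event and falls within `window` of it (per-target window)."""
    return all(d < target_date and (target_date - d) <= window
               for d in event_dates)

print(basket_timing_is_valid(date(2021, 6, 30),
                             [date(2021, 6, 10), date(2021, 6, 20)]))  # True
print(basket_timing_is_valid(date(2021, 6, 30),
                             [date(2021, 1, 5)]))  # False: outside window
```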
  • FIG. 3 illustrates an example of a DAG (directed acyclic graph) structure corresponding to a Bayesian Network according to example implementations disclosed herein.
  • sink nodes at the bottom of the graph 300 represent potential target events ET (e.g., Pipe Demand in the running example) or other events that may be impacted (e.g., Permit Applications), and each directed edge represents a direct causal relationship between two event nodes (e.g., inter-dependence between nodes).
• Each edge may be interpreted in a number of different ways besides direct causal relationships. For example, edge node1 → node2 can be interpreted as:
• Node2 functionally depends on Node1 together with any other nodes that directly point to Node2.
  • a node, in Bayesian Networks, may be formally represented by a variable assumed to be categorical, e.g., one that may take two or more discrete values.
  • all the above interpretations of the DAG have a set of statistical independence assumptions that are in common, called Markovian assumptions, that state “Every variable is conditionally independent of its non-descendants given its parents.”
  • nodes A, B, C, E, and R represent nodes without descriptive labels and edges illustrating the independence assumptions therebetween.
  • each Vi may be substantially the same as a given Ei referred to above.
  • these baskets may be represented as rows in a relational table with each column having a value of a respective Vi.
  • each Vi is a categorical variable and example implementations herein discover how each Vi might be causally related to each other. That is, implementations disclosed herein construct a DAG that satisfies the Markovian assumptions of independence among all random variables defined in the event baskets.
  • a variant of the PC-algorithm may be used.
  • the PC-algorithm at its core, relies on testing conditional independence between variables.
  • the PC-algorithm has 2 phases: a skeleton phase and an orientation phase.
• the PC-algorithm starts with a complete graph G whose N nodes correspond to the N variables (e.g., the Vi's in the event basket).
• the algorithm removes any edge Vi - Vj where Vi and Vj are marginally independent.
• all edges that have been marked for removal are removed.
• the algorithm then returns the resulting undirected graph G, and SEPSET. Note that, in some examples, independence testing can be done using χ² or G² statistics.
• in the orientation phase, the goal is to assign a direction to every edge of the undirected graph G from the skeleton phase.
• the orientation phase may apply the following four rules:
1. For every triplet of variables (Vi, Vj, Vk) such that Vi and Vj are adjacent and Vj and Vk are adjacent in G, but Vi and Vk are not, orient Vi - Vj - Vk as Vi → Vj ← Vk if Vj ∉ SEPSET(Vi, Vk).
2. Orient Vj - Vk as Vj → Vk if there is a directed edge Vi → Vj such that Vi and Vk are not adjacent in G.
3. Orient Vi - Vk as Vi → Vk if there is a directed path Vi → Vj → Vk.
4.
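The skeleton phase and the collider rule (rule 1) of the orientation phase can be sketched as follows. The independence oracle here is a hypothetical stand-in: a real implementation would run χ² or G² tests on the event-basket data, and would also test conditional (not just marginal) independence.

```python
from itertools import combinations

def skeleton(variables, independent):
    """Start from a complete undirected graph and drop every edge
    (Vi, Vj) whose endpoints the oracle reports as independent.
    Returns the remaining edges and SEPSET (empty separating sets,
    since this stand-in oracle only tests marginal independence)."""
    edges = set(combinations(variables, 2))
    sepset = {}
    for vi, vj in list(edges):
        if independent(vi, vj):
            edges.discard((vi, vj))
            sepset[(vi, vj)] = sepset[(vj, vi)] = set()
    return edges, sepset

def orient_colliders(edges, sepset):
    """Rule 1: for Vi - Vj - Vk with Vi, Vk non-adjacent, orient
    Vi -> Vj <- Vk when Vj is not in SEPSET(Vi, Vk)."""
    def adjacent(a, b):
        return (a, b) in edges or (b, a) in edges
    directed = set()
    nodes = {n for e in edges for n in e}
    for vj in nodes:
        neighbors = [n for n in nodes if adjacent(n, vj)]
        for vi, vk in combinations(neighbors, 2):
            if not adjacent(vi, vk) and vj not in sepset.get((vi, vk), set()):
                directed.add((vi, vj))   # Vi -> Vj
                directed.add((vk, vj))   # Vk -> Vj
    return directed
```

On the classic three-variable collider (A and C marginally independent, both linked to B), the skeleton keeps A - B and B - C, and rule 1 orients them as A → B ← C.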
  • the strength of the causal relationships in the DAG may need to be quantified.
  • the quantification of causal relationship may be referred to as network parameters of the Bayesian Network which, together along with the graph structure and Markovian assumptions, allows execution of event causal inferences.
• network parameters may be provided, for every variable X (e.g., every node) in a given DAG and its parent nodes U, as conditional probabilities Pr(X | U).
• Conditional probability tables, such as the one in FIG. 8, may be needed for the inference for each node (e.g., inferring the occurrence of a node, given its parent nodes) according to the examples disclosed herein. They can be easily estimated from the data, such as the event baskets described herein.
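Estimating a conditional probability table from event baskets amounts to counting. A minimal sketch, where the variable names and rows follow the “Pipe Demand” running example and are illustrative:

```python
from collections import Counter

def estimate_cpt(rows, child, parents):
    """Pr(child | parents) estimated by counting over dict-shaped
    event baskets."""
    joint = Counter((tuple(r[p] for p in parents), r[child]) for r in rows)
    margin = Counter(tuple(r[p] for p in parents) for r in rows)
    return {(pa, val): n / margin[pa] for (pa, val), n in joint.items()}

rows = [
    {"CrudeFutures": "UP",   "PipeDemand": "UP"},
    {"CrudeFutures": "UP",   "PipeDemand": "UP"},
    {"CrudeFutures": "UP",   "PipeDemand": "DOWN"},
    {"CrudeFutures": "DOWN", "PipeDemand": "DOWN"},
]
cpt = estimate_cpt(rows, "PipeDemand", ["CrudeFutures"])
print(cpt[(("UP",), "UP")])  # 2/3
```

In practice, smoothing (e.g., add-one counts) may be applied so that unseen parent/child combinations do not get zero probability.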
  • Bayesian Network that comprises:
• This Bayesian Network model will be used for causal inference about the distribution of any network variable.
  • Bayesian Networks support computation of many types of inferences.
  • An example inference type computes Marginal Probability Distributions of a subset of variables in the network, called query variables (e.g., inferring something about the query variables).
• the Bayesian Networks described above may be used to compute a marginal probability distribution for the target event (e.g., “Pipe Demand”).
• there is one query variable, namely “Pipe Demand”.
  • Marginal probability distributions can be computed with or without additional evidence: in the latter case, a prior marginal probability distribution is computed on the query variables; in the former case, a posterior marginal probability distribution is computed on the query variables.
  • Example implementations disclosed herein are configured to estimate the probability distribution of the target event (e.g., “Pipe Demand” in the running example).
  • a prior marginal probability distribution for “Pipe Demand” can be inferred using a Bayesian Network.
  • additional information such as newly observed events, this new information can be leveraged to arrive at a better estimate for the “Pipe Demand” target event (e.g., the posterior marginal probability distribution).
  • the input to the prediction problem may be a collection of events. In some embodiments, the input may be the result of event detection and extraction as described in connection with FIGS. 1, 2, 4, and 6. Given new event observation information as inputs, referred to herein as evidence, the forecasting problem can be framed as computing posterior marginals, that is, the marginal probability distribution conditioned on the evidence.
  • FIG. 9 is a flow diagram of a causal inference process 900 for target event forecasting as computing posterior marginals, in accordance with example implementations disclosed herein.
  • Process 900 may be an example forecasting a target event by computing a marginal probability distribution as described above.
  • process 900 comprises building a causal inference engine 930 for a target event (e.g., query variable) from a Bayesian Network 920 (e.g., DAG and network parameters) and inputting evidence 910 into the causal inference engine 930, which calculates posterior marginal probability distributions 940 for the given query variable.
  • Parents(X) are the variables that directly point to X in the DAG.
• Pr(X | Parents(X)) can be recognized to be the CPT (e.g., network parameter) for X.
• the prior marginal probability distribution on the query variables, Pr(Q), can be obtained by summing out non-query variables from the joint probability distribution:
• the joint marginal probability distribution, Pr(Q, e), can be obtained as shown in Eq. 8 below.
• the posterior marginal probability distribution can be obtained from the joint marginal probability distribution Pr(Q, e) by normalizing the latter's entries to sum to 1:
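The sum-out-then-normalize computation can be sketched by enumeration on a tiny two-node network (CrudeFutures → PipeDemand). Real engines use variable elimination or Junction Trees; enumeration is shown only because it follows the equations above directly. All probabilities below are illustrative assumptions.

```python
from itertools import product

PRIOR = {"UP": 0.5, "DOWN": 0.5}                    # Pr(CrudeFutures)
CPT = {("UP", "UP"): 0.7, ("UP", "DOWN"): 0.3,      # Pr(PipeDemand | CF)
       ("DOWN", "UP"): 0.2, ("DOWN", "DOWN"): 0.8}

def posterior_pipe_demand(evidence):
    """Pr(PipeDemand | evidence), e.g. evidence={"CrudeFutures": "UP"}."""
    joint = {}
    for cf, pd in product(["UP", "DOWN"], repeat=2):
        assignment = {"CrudeFutures": cf, "PipeDemand": pd}
        # keep only assignments consistent with the evidence
        if all(assignment[k] == v for k, v in evidence.items()):
            joint[pd] = joint.get(pd, 0.0) + PRIOR[cf] * CPT[(cf, pd)]
    z = sum(joint.values())                         # normalize entries to 1
    return {pd: p / z for pd, p in joint.items()}

print(posterior_pipe_demand({"CrudeFutures": "UP"}))
# {'UP': 0.7, 'DOWN': 0.3}, matching the running example
```

Passing empty evidence yields the prior marginal distribution instead of the posterior.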
  • variable elimination algorithms may be used to compute prior marginal probability distributions.
• Junction Tree algorithms may be used, which are significantly more efficient than variable elimination algorithms and continue to evolve.
  • FIG. 10 illustrates an example system 1000 for forecasting future events, in accordance with example implementations disclosed herein.
  • System 1000 may be illustrative of an end-to-end system according to the example implementation disclosed herein.
  • System 1000 may comprise an event detection engine 1030 and a causal event inference engine 1050.
  • the event detection engine 1030 may comprise one or more fine-tuned language models 1020 configured to detect and extract events of interest from input textual documents based on a target event and events of interest, for example, as described in connection to FIGS. 1-6. That is, for each event of interest elicited for a given target event, a fine-tuned language model 1020 is built as described above in connection to FIGS. 1-6. Thus, for example, if there are 10 events of interest, 10 models 1020 would be built and used to construct the event detection engine 1030.
  • Each fine-tuned language model 1020 may be, for example, a pre-trained BERT that is fine-tuned for detecting a given event of interest, based on an initially labeled dataset output by a weak classification model.
  • the event detection engine 1030 may scrape through data sources (such as newsfeeds, blogs, etc.) to identify unlabeled documents and apply the fine-tuned language model(s) 1020 to identify and extract information relevant to event(s) of interest 1035.
  • the identified events of interest 1035 are output by the event detection engine 1030 and fed into the causal inference engine 1050.
  • the causal inference engine 1050 may be built from a trained Bayesian Network, for example, as described above in connection with FIGS. 7 and 9 according to elicited target event and events of interest.
  • Event(s) of interest 1035 are fed into the trained Bayesian Network as nodes of the DAG used in the causal inference engine 1050, which may then calculate a posterior marginal probability distribution 1060 for the given target event.
• the posterior marginal probability distribution for the running example indicates that Pipe Demand will be UP with a 0.7 probability and will be DOWN with a 0.3 probability (as in probabilistic logic) when the event of interest “Crude Futures UP” is detected.
  • the causal inference algorithm disclosed herein is capable of accepting evidence of one or more events detected. Furthermore, in some implementations, alone or in combination, queries that involve multiple variables are also supported by the causal inference algorithm disclosed herein.
• the causal inference engine 1050 may be implemented independent of the event detection engine 1030. For example, where events of interest are retrieved from structured data sources, events of interest are pre-known, and/or events of interest are recognized without a need for the event detection engine (e.g., an event of interest is readily recognized in a document), the event detection engine 1030 may be skipped and the events fed directly to the causal inference engine 1050 for use in calculating the posterior marginal probability distribution.
  • the event detection engine 1030 may be utilized independent of the causal inference engine.
  • the event detection engine 1030 may be utilized to identify events of interest separate from forecasting a target event. Events of interest that are detected by the event detection engine 1030 can even be used in other event forecasting engines that do not use causal reasoning.
  • the system 1000 may be implemented, for example, in a computing environment such as that described in FIG. 11.
  • the event detection engine 1030 and/or causal inference engine 1050 may be modules implemented as a collection of instructions (e.g., as software) stored in a memory and executable by a processor(s).
  • the event detection engine 1030 and causal inference engine 1050 may be separate modules or a single module, and may be broken down into a plurality of modules individually executable.
  • FIG. 11 illustrates an example computing environment with an example computer device suitable for use in some example implementations, such as the system 1000 of FIG. 10.
  • Computer device 1105 in computing environment 1100 can include one or more processing units, cores, or processors 1110, memory 1115 (e.g., RAM, ROM, and/or the like), internal storage 1120 (e.g., magnetic, optical, solid-state storage, and/or organic), and/or I/O interface 1125, any of which can be coupled on a communication mechanism or bus 1130 for communicating information or embedded in the computer device 1105.
  • I/O interface 1125 is also configured to receive images from cameras or provide images to projectors or displays, depending on the desired implementation.
• the I/O interface 1125 may be an example of a means for receiving elicited events, including target events and/or events of interest.
• the I/O interface 1125 may be an example of a means for feeding the elicited events into the processor(s) 1110, internal storage 1120, and/or memory 1115, which are then fed into a causal inference engine in accordance with the examples disclosed herein.
• the I/O interface 1125 may be an example of a means for outputting a posterior marginal probability distribution(s) of the target event determined by the processor(s) 1110 in accordance with the examples disclosed herein.
  • the bus 1130 (separately or in combination with the I/O interface 1125) may also be an example of means for receiving, feeding, and/or outputting as described above.
  • a computer device 1105 may be used to implement the event detection engine 1030 and another computer device 1105 used to implement the causal inference engine 1050.
  • a common computer device 1105 may be used to implement both engines.
  • Computer device 1105 can be communicatively coupled to input/user interface 1135 and output device/interface 1140. Either one or both of the input/user interface 1135 and output device/interface 1140 can be a wired or wireless interface and can be detachable.
  • Input/user interface 1135 may include any device, component, sensor, or interface, physical or virtual, that can be used to provide input (e.g., buttons, touch-screen interface, keyboard, a pointing/cursor control, microphone, camera, braille, motion sensor, optical reader, and/or the like).
  • the input/user interface 1135 may provide an interface for receiving inputs from users and SMEs, such as to input a target event and event elicitation at step 110 of FIG. 1.
  • Output device/interface 1140 may include a display, television, monitor, printer, speaker, braille, or the like.
  • input/user interface 1135 and output device/interface 1140 can be embedded with or physically coupled to the computer device 1105.
  • other computer devices may function as or provide the functions of input/user interface 1135 and output device/interface 1140 for a computer device 1105.
  • Examples of computer device 1105 may include, but are not limited to, highly mobile devices (e.g., smartphones, devices in vehicles and other machines, devices carried by humans and animals, and the like), mobile devices (e.g., tablets, notebooks, laptops, personal computers, portable televisions, radios, and the like), and devices not designed for mobility (e.g., desktop computers, other computers, information kiosks, televisions with one or more processors embedded therein and/or coupled thereto, radios, and the like).
• Computer device 1105 can be communicatively coupled (e.g., via I/O interface 1125) to external storage 1145 and network 1150 for communicating with any number of networked components, devices, and systems, including one or more computer devices of the same or different configuration.
  • Computer device 1105 or any connected computer device can be functioning as, providing services of, or referred to as a server, client, thin server, general machine, special-purpose machine, or another label.
• I/O interface 1125 can include but is not limited to, wired and/or wireless interfaces using any communication or I/O protocols or standards (e.g., Ethernet, 802.11x, Universal Serial Bus, WiMax, modem, a cellular network protocol, and the like) for communicating information to and/or from at least all the connected components, devices, and networks in computing environment 1100.
  • Network 1150 can be any network or combination of networks (e.g., the Internet, local area network, wide area network, a telephonic network, a cellular network, satellite network, and the like).
  • Computer device 1105 can use and/or communicate using computer-usable or computer readable media, including transitory media and non-transitory media.
  • Transitory media includes transmission media (e.g., metal cables, fiber optics), signals, carrier waves, and the like.
  • Non-transitory media includes magnetic media (e.g., disks and tapes), optical media (e.g., CD ROM, digital video disks, Blu-ray disks), solid-state media (e.g., RAM, ROM, flash memory, solid-state storage), and other non-volatile storage or memory.
  • Computer device 1105 can be used to implement techniques, methods, applications, processes, or computer-executable instructions in some example computing environments.
  • Computer-executable instructions can be retrieved from transitory media, and stored on and retrieved from non-transitory media.
  • the executable instructions can originate from one or more of any programming, scripting, and machine languages (e.g., C, C++, C#, Java, Visual Basic, Python, Perl, JavaScript, and others).
  • Memory 1115 can be configured to store one or more programs, such as Operating System (OS), Hypervisor, and applications. Memory 1115 may be configured to store instructions for executing the event detection engine 1030 and/or the causal inference engine 1050 of FIG. 10. In various implementations, the memory 1115 may be configured to store instructions for performing process 100 of FIG. 1; process 200 of FIG. 2; process 400 of FIG. 4; process 600 of FIG. 6; and process 900 of FIG. 9. One or more of internal storage 1120 and external storage (if applicable) may be configured to store the configuration table 115, mapping table 112, and the data 113 of FIG. 2.
  • Processor(s) 1110 can execute under any operating system (OS) (not shown), in a native or virtual environment.
  • One or more applications can be deployed that include logic unit 1160, application programming interface (API) unit 1165, input unit 1170, output unit 1175, and inter-unit communication mechanism 1195 for the different units to communicate with each other, with the OS, and with other applications (not shown).
• Processor(s) 1110 can be in the form of physical hardware processors (e.g., central processing units (CPUs), field-programmable gate arrays (FPGAs), application-specific integrated circuits (ASICs)) or a combination of software and hardware processors.
  • Processor(s) 1110 can be configured to fetch and execute programs stored in memory 1115. When processor(s) 1110 execute programs, processor(s) 1110 fetch instructions of the programs from memory 1115 and execute them, such as programs for performing process as illustrated in FIGS. 1, 2, 4, 6, and 9. When processor(s) 1110 execute programs, processor can load information such as illustrated in FIGS. 3, 5, 7 and 8 from memory. Processor(s) 1110 can pre-fetch and cache instruction of programs and information to improve performance.
• when information or an execution instruction is received by API unit 1165, it may be communicated to one or more other units (e.g., logic unit 1160, input unit 1170, output unit 1175).
• logic unit 1160 may be configured to control the information flow among the units and direct the services provided by API unit 1165, the input unit 1170, and the output unit 1175, in some example implementations described above.
  • the flow of one or more processes or implementations may be controlled by logic unit 1160 alone or in conjunction with API unit 1165.
• the input unit 1170 may be configured to obtain input for the calculations described in the example implementations.
  • the output unit 1175 may be configured to provide an output based on the calculations described in example implementations.
  • Processor(s) 1110 can be configured to receive one or more events of interest; feed the one or more events of interest into a causal inference engine comprising a trained Bayesian Network that is configured to compute posterior marginal probability distributions of a target event based on causal relationship between the one or more events and the target event from the Bayesian Network; and output the posterior marginal probability distribution of the target event, for example, as illustrated in FIGS. 7-10.
  • the processors(s) 1110 may also be configured to receive the one or more events of interest based on an event detection engine identifying at least one of the one or more events of interest, as illustrated in FIG. 10.
• the one or more events may be stored in a storage device, such as internal storage 1120, memory 1115, external storage 1145, etc.
• the processor(s) 1110 may be an example of means for receiving elicited events, feeding events of interest into a causal inference engine, and/or means for outputting posterior marginal probability distribution(s), in accordance with the example implementations disclosed herein.
• processor(s) 1110 may be configured to build an event detection engine based on receiving unlabeled data extracted from data sources (e.g., unstructured and/or structured data sources), determining that the unlabeled data pertains to an event of interest, and, in response to the determination, feed the resulting event into the causal inference engine as an event of interest of the one or more events of interest, as illustrated in FIGS. 1-6.
  • processor(s) 1110 may be configured to build the causal inference engine based on the target event and subject matter expert definitions identifying information relevant to the target event, use an event detection engine to identify a plurality of events of interest, analyze the events of interest based on a Bayesian Network structure discovery algorithm to discover causal relationships between the plurality of events of interest and the target event; and estimate a strength of dependency between the plurality of events of interest and the target event, as described in connection with FIGS. 7-10.
  • The Bayesian Network may comprise a DAG having nodes and edges connecting the nodes, where each node represents an event of the plurality of events of interest, a sink node represents the target event, and each edge represents a direct causal relationship between events represented by connected nodes.
  • Example implementations may also relate to an apparatus for performing the operations herein.
  • This apparatus may be specially constructed for the required purposes, or it may include one or more general-purpose computers selectively activated or reconfigured by one or more computer programs.
  • Such computer programs may be stored in a computer readable medium, such as a computer readable storage medium or a computer readable signal medium.
  • a computer readable storage medium may involve tangible mediums such as, but not limited to, optical disks, magnetic disks, read-only memories, random access memories, solid-state devices, and drives, or any other types of tangible or non-transitory media suitable for storing electronic information.
  • a computer readable signal medium may include mediums such as carrier waves.
  • the algorithms and displays presented herein are not inherently related to any particular computer or other apparatus.
  • Computer programs can involve pure software implementations comprising instructions that perform the operations of the desired implementation.
  • the operations described above can be performed by hardware, software, or some combination of software and hardware.
  • Various aspects of the example implementations may be implemented using circuits and logic devices (hardware), while other aspects may be implemented using instructions stored on a machine-readable medium (software), which if executed by a processor, would cause the processor to perform a method to carry out implementations of the present application.
  • some example implementations of the present application may be performed solely in hardware, whereas other example implementations may be performed solely in software.
  • the various functions described can be performed in a single unit, or can be spread across a number of components in any number of ways.
  • the methods may be executed by a processor, such as a general-purpose computer, based on instructions stored on a computer readable medium. If desired, the instructions can be stored on the medium in a compressed and/or encrypted format.

Abstract

Example implementations described herein provide systems and methods for forecasting a target event upon receiving one or more indicator events of interest; feeding the one or more indicator events of interest into a causal inference engine comprising a trained Bayesian Network configured to compute posterior marginal probability distributions of the target event based on causal relationships between the one or more indicator events of interest and the target event from the Bayesian Network; and outputting the posterior marginal probability distribution of the target event.

Description

FORECASTING FUTURE EVENTS FROM CURRENT EVENTS DETECTED BY AN EVENT DETECTION ENGINE USING A CAUSAL INFERENCE ENGINE
BACKGROUND
Field
[0001] The present disclosure is generally related to storage systems, and more specifically, to systems and methods for using causal inference to forecast future events from current events detected in text data sources by an event detection engine.
Related Art
[0002] In general, related art implementations of event prediction rely on analytics based on statistical associations or correlations between current events and a future event.
[0003] Additionally, related art implementations that provide for event extraction from text rely on using distant supervision. Such implementations rely on assumptions that labeled data sets are available for training detection models and that events to be detected belong to well-established event classes, which are not ad-hoc in nature.
SUMMARY
[0004] Given a future event of interest (also referred to as a target event), how can the future event be forecasted accurately using early indicator events? Events under consideration, both future and current events, can be ad-hoc in nature and are typically published in textual forms in a timely fashion in unstructured data sources, such as newsfeeds, emails, social media, business documents, newsletters, and the like. As such, these events need to be extracted from textual documents. The earlier indicator events can be detected and used to forecast target events, the more valuable the resulting forecast will be. High forecast accuracy may depend on a deep understanding of causal relationships between events, which can be learned from data of the events. When quantifying how events occur, it may be desirable to capture these uncertainties. So, instead of stating that an event occurs or not (as in propositional logic), it may be more informative to state that an event will occur, for example, with a 0.7 probability and will not occur with a 0.3 probability (as in probabilistic logic).
[0005] Accordingly, example implementations described herein involve systems and methods that provide for event forecasting where events can be arbitrarily defined (also referred to herein as ad-hoc events) and are not readily extracted. Systems and methods disclosed herein take advantage of causal relationships between events of interest to predict a target event from current events of interest by means of causal inference. Related art implementations do not use causation; instead, they attempt to forecast events through event association or correlation, which leads to less accurate forecasting of future events as compared to a causation approach as disclosed herein. That is, correlation does not imply causation. Furthermore, the forecasting of future events according to the example implementations disclosed herein may be fully explainable based on causal relationships.
[0006] Additionally, example implementations disclosed herein detect and extract events of interest that are ad-hoc in nature. Where events are articulated with sufficient precision in either structured or unstructured data sources, example implementations disclosed herein are configured to automatically extract the events from these data sources.
[0007] Furthermore, causal reasoning may hinge on an ability to explicitly represent causal relationships between events of interests and/or the target event. Example implementations disclosed herein may utilize Bayesian Networks as the framework to represent causal relationships between various events. Bayesian Networks can be explicitly provided by a subject matter expert, and/or discovered from event data made available, for example, through event detection and extraction, according to some example implementations disclosed herein.
[0008] In a case where an event of interest is available in unstructured data sources, example implementations disclosed herein automatically detect occurrences of the event in textual documents. The textual documents need not be pre-labeled with event occurrences. The implementations disclosed herein comprise bootstrapping an event extraction process of learning event detection with an initial weak classification model based on knowledge-driven natural language processing (NLP) based techniques, followed by fine-tuning a pre-trained deep-learning-based language model.
[0009] Example implementations described herein also provide for discovery of a Bayesian Network from available event data based on analyzing historical event baskets (e.g., co-occurrences of events of interest elicited for a target event) and applying a structure discovery algorithm to establish causal relationships between events of the event baskets. The historical event baskets may be supplied by event detection and extraction according to some example implementations disclosed herein and/or explicitly provided by a subject matter expert.
[0010] Example implementations disclosed herein formulate the forecasting of a target event as an estimation of a marginal posterior probability distribution for the target event given new evidence, such as occurrence of events of interest that are causally related to the target event.
[0011] Aspects of the present disclosure can involve a method for forecasting a target event, the method involving receiving one or more events of interest; feeding the one or more events of interest into a causal inference engine comprising a trained Bayesian Network configured to compute posterior marginal probability distributions of the target event based on causal relationships between the one or more events of interest and the target event from the Bayesian Network; and outputting the posterior marginal probability distribution of the target event.
[0012] Aspects of the present disclosure can involve a system for forecasting a target event, the system involving one or more memories configured to store instructions; and one or more processors coupled to the one or more memories. The one or more processors are configured to execute the instructions to: receive one or more events of interest; feed the one or more events of interest into a causal inference engine comprising a trained Bayesian Network configured to compute posterior marginal probability distributions of the target event based on causal relationships between the one or more events of interest and the target event from the Bayesian Network; and output the posterior marginal probability distribution of the target event.
[0013] Aspects of the present disclosure can involve a non-transitory computer readable medium, storing instructions for forecasting a target event. The instructions involve receiving one or more events of interest; feeding the one or more events of interest into a causal inference engine comprising a trained Bayesian Network configured to compute posterior marginal probability distributions of the target event based on causal relationships between the one or more events of interest and the target event from the Bayesian Network; and outputting the posterior marginal probability distribution of the target event.
[0014] Aspects of the present disclosure can involve an apparatus for forecasting a target event, the apparatus involving a means for receiving one or more events of interest; a means for feeding the one or more events of interest into a causal inference engine comprising a trained Bayesian Network configured to compute posterior marginal probability distributions of the target event based on causal relationships between the one or more events of interest and the target event from the Bayesian Network; and a means for outputting the posterior marginal probability distribution of the target event.
BRIEF DESCRIPTION OF DRAWINGS
[0015] FIG. 1 illustrates a flow diagram of an example process for building a causal inference engine for target event prediction, in accordance with example implementations disclosed herein.
[0016] FIG. 2 illustrates a flow diagram of an example process for target event forecasting by means of causal inference, in accordance with example implementations disclosed herein.
[0017] FIG. 3 illustrates example direct causal relationships between example events of interest relevant to an example target event, in accordance with example implementations disclosed herein.
[0018] FIG. 4 illustrates a flow diagram of an example process for building an ad-hoc event detector, in accordance with example implementations disclosed herein.
[0019] FIG. 5 illustrates an example table of curated synonyms for an example root word, according to example implementations disclosed herein, ranked based on pre-trained embeddings.
[0020] FIG. 6 illustrates a flow diagram of an example process for fine-tuning a pre-trained language model to recognize events of interest, in accordance with example implementations disclosed herein.
[0021] FIG. 7 illustrates an example directed acyclic graph from a Bayesian Network that captures statistical independence among a plurality of variables represented by the nodes.
[0022] FIG. 8 illustrates an example conditional probability table, according to example implementations disclosed herein, used for carrying out one causal inference step.
[0023] FIG. 9 illustrates a flow diagram of a causal inference process for target event forecasting as computing posterior marginals, in accordance with example implementations disclosed herein.
[0024] FIG. 10 illustrates an example end-to-end system for forecasting future events starting from reading textual data sources, in accordance with example implementations disclosed herein.
[0025] FIG. 11 illustrates an example computing environment with an example computer device suitable for use in some example implementations.
DETAILED DESCRIPTION
[0026] The following detailed description provides details of the figures and example implementations of the present application. Reference numerals and descriptions of redundant elements between figures are omitted for clarity. Terms used throughout the description are provided as examples and are not intended to be limiting. For example, the use of the term “automatic” may involve fully automatic or semi-automatic implementations involving user or administrator control over certain aspects of the implementation, depending on the desired implementation of one of ordinary skill in the art practicing implementations of the present application. Selection can be conducted by a user through a user interface or other input means, or can be implemented through a desired algorithm. Example implementations as described herein can be utilized either singularly or in combination, and the functionality of the example implementations can be implemented through any means according to the desired implementations.
[0027] Further, in the following description, the information is expressed in a table format, but the information may be expressed in any data structure. Further, in the following description, the configuration of each piece of information is an example, and one table may be divided into two or more tables, or a part or all of two or more tables may be combined into one table.
[0028] Future events of interest (e.g., target events) can often be forecasted using early indicators (e.g., current events and/or historical events). These early indicators can be recognized if present in structured and/or unstructured data sources. In many situations, early indicators may be located only in published unstructured data sources, such as, but not limited to, newsfeeds, emails, social media, business documents, newsletters, and the like. The ability to accurately forecast target events may be important because it allows planning for occurrence of the target event to avoid catastrophes, to reap profits, etc. Financial events such as future product demand or stock price movements, for example, can be linked to seemingly unrelated precursor events, such as corporate acquisitions or geopolitical events, which are typically first reported in news. Events are, in all likelihood, causally related to each other. To leverage natural causal dependencies, implementations disclosed herein provide for forecasting future events of interest using causal reasoning across early indicator events to draw inferences about target events. The example implementations disclosed herein overcome two technical hurdles, namely learning the causality structure between events, and extracting events from textual sources by machine reading. Since the events considered are not restricted to any well-established classes of events, they are referred to herein as “ad-hoc events.” Ad-hoc events may refer to events that are not predefined events.
[0029] The systems and methods disclosed herein have broad uses in event forecasting, where events can be arbitrarily defined and are not readily extracted in structured form. Use cases range, for example, from forecasting demand as a result of events that are as unlikely as pandemic breakouts, to predicting major financial market corrections as a result of other financial events. An example of a non-limiting advantage of the example implementations disclosed herein is an increased forecasting accuracy in inferring an occurrence of a target event, which is achieved by leveraging true semantic, causal relationships between events, whereas related art methods rely on event patterns that can be ambiguous. Causation is able to represent the concept of “confounding”, which can introduce predictive biases if not corrected. Confounding cannot be described in terms of associations or correlations. As another example non-limiting advantage, forecasting results according to the examples herein are more explainable (and may be fully explainable) than traditional forecast methods due to the use of causal relationships. Forecasting methods and systems disclosed herein are also much more broadly applicable because the methods and systems are uniquely able to deal with ad-hoc events, given that information is by and large contained in unstructured data. Because new information typically first appears in unstructured form and data sources, the systems and methods disclosed herein are able to perform forecasts well before related art implementations are able to act, due to the ability to extract events of interest from these unstructured sources.
[0030] Example implementations disclosed herein provide an end-to-end method for forecasting ad-hoc target events from ad-hoc indicator events (also referred to herein as events of interest). In various example implementations, the target events may be user-defined target events. The implementations herein uniquely combine various artificial intelligence techniques to analyze events of interest (current and/or historical events) and forecast the occurrence of a target event. Example implementations may start with eliciting, from an end user and/or a subject matter expert (SME), a target event the user wishes to forecast and all other events that may be relevant to the target event. For example, the end user may define the target event and the SME may supply events of interest relevant to the target event. Example implementations disclosed herein propagate observed occurrences of indicator events (e.g., events of interest) through causal chains that lead to an accurate forecasting of the target event. Accordingly, example implementations herein provide for building a causal inference engine that is configured to estimate a probability distribution of the target event from events of interest.
[0031] The processes and methods disclosed herein may be executed, individually or in combination, in a computing environment, for example, by one or more computing devices for executing programs stored in one or more storage devices. The storage devices may be configured to store the programs and data for executing the programs. The computing devices and storage devices may be connected via wired or wireless communication, such as via a network. In some examples, the computing environment may be implemented as one or more of computing environment 1100 of FIG. 11. End users and subject matter experts (SMEs) may interface directly with the computing environment on which processes disclosed herein are executed and/or via separate computing environments (e.g., separate iterations of computing environment 1100 of FIG. 11) connected to the computing environment via a network or wired connection. For example, end users and/or SMEs may utilize respective user devices, each implemented as an iteration of computing environment 1100 of FIG. 11, to input data and receive output data. Examples of a user device may include, but are not limited to, highly mobile devices (e.g., smartphones, devices in vehicles and other machines, devices carried by humans and animals, and the like), mobile devices (e.g., tablets, notebooks, laptops, personal computers, portable televisions, radios, and the like), and devices not designed for mobility (e.g., desktop computers, other computers, information kiosks, televisions with one or more processors embedded therein and/or coupled thereto, radios, and the like).
[0032] FIG. 1 illustrates a flow diagram of an example process 100 for building a causal inference engine for target event prediction, in accordance with example implementations disclosed herein. Process 100 for building a causal inference engine for a target event comprises multiple phases.
[0033] At step 110, process 100 executes an event elicitation process. At step 110, the end user has the opportunity to specify and define the target event of interest in forecasting, for example, an ad-hoc target event. For example, the end user may enter a target event via a user interface, for example, such as the input/user interface 1135 described in connection with FIG. 11. The specified target event may then be stored in a storage device, for example, such as one or more of internal storage 1120, external storage 1145 and/or memory 1115 of the computing environment described in FIG. 11. Additionally, SMEs (e.g., domain experts) may identify other events that may be used as early indicators or deemed otherwise relevant to forecasting (or predicting) the occurrence of the target event. For example, SMEs may input other events (e.g., events of interest) via a user interface, for example, such as the input/user interface 1135 described in connection with FIG. 11. The identified events of interest may then be stored in a storage device, for example, such as one or more of internal storage 1120, external storage 1145 and/or memory 1115 of the computing environment described in FIG. 11. The computing environment 1100 used by the end user may be the same computing environment as that used by the SMEs or a separate computing environment. Furthermore, each SME may use the same or different computing environments to supply the events of interest. An example event elicitation process is described in connection with FIG. 3 below.
[0034] At step 120, process 100 executes an event extraction process. For example, the target event has been specified and, based on identified events of interest, event data sources from which events of interest can be extracted are identified. Example event data sources include, but are not limited to, unstructured data sources (e.g., data not having a pre-defined data model or not organized in a pre-defined manner, such as newsfeeds and the like) or structured data sources. Examples of unstructured data sources include, but are not limited to, newspapers, newsfeeds, news reports, etc. Event data may be extracted from textual sources as well as image and video data sources, audio data sources, etc., from which text may be extracted through a manual, automated or semi-automated process. For example, in the case of audio data, the audio may be transcribed into text (either external to the example systems disclosed herein or as part of the event extraction process), and an event extracted from the transcribed text. Similarly, in the case of image data, a scanned image of a textual document may be converted to textual data using an optical character recognition process and events extracted from the recognized text. At step 120, based on identified events of interest, process 100 retrieves digitized versions of the data and utilizes data scraping techniques to parse through identified data sources and extract events of interest and data representing the events of interest.
[0035] Event detection and extraction from textual documents may require building an event detection model, for the purpose of building a training set out of historical event data.
The event detection model may be started by developing a knowledge-driven weak classification model (also referred to herein as a weak sentence classifier), from which a labeled event dataset can be built, and gradually bootstrapping into a deep neural network language model, for example, as described below in connection with FIGS. 4 and 6. Extracted event data will be collected into a training dataset, referred to herein as event baskets.
[0036] At step 130, the process 100 determines whether data has been extracted for each event of interest identified by the SME. In various examples, data of all events can be extracted from some sources, regardless of whether the data is confirmatory or not that the event of interest occurred, directly related to the event of interest, ancillary to the event of interest, etc. Thus, data or information of any type and semantic context is extracted for each event of interest. If data is not available for all identified events, the process returns to step 110. At this point, the user may confirm, revise, or update the target event and/or SMEs may revise and/or update the identified events of interest such that at least one data source is located for each event. If data is available for all identified events, the process 100 proceeds to step 140.
[0037] At step 140, process 100 executes an event causal structure discovery process. An example of an event causal structure discovery process may include analyzing event baskets to discover causal relationship(s) between events in the dataset. In an example implementation, the event causal structure may take the form of a Bayesian Network, which may be illustrated, for example, as a directed acyclic graph (DAG) whose nodes represent events, whose sink node represents the target or future event, and whose edges represent direct causal relationships. From the discovered structure, a causal network between events can be learned, for example, as described in connection with FIGS. 7 and 9. An example of an event causal structure discovery is described in connection with FIGS. 3 and 7 below.
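The DAG form of the causal structure described above can be sketched in a few lines of code. The sketch below is illustrative only and is not the claimed implementation: the event names and edges are hypothetical, and the helper functions merely verify that a candidate structure is acyclic and that the target event is a sink node.

```python
from collections import defaultdict, deque

def is_dag(edges):
    """Check that directed edges (cause -> effect) form a directed
    acyclic graph, using Kahn's topological sort."""
    indegree = defaultdict(int)
    adj = defaultdict(list)
    nodes = set()
    for cause, effect in edges:
        adj[cause].append(effect)
        indegree[effect] += 1
        nodes.update((cause, effect))
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        node = queue.popleft()
        visited += 1
        for child in adj[node]:
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)
    return visited == len(nodes)  # all nodes reached => no cycle

def sink_nodes(edges):
    """Nodes with no outgoing edges; the target event should be a sink."""
    causes = {c for c, _ in edges}
    effects = {e for _, e in edges}
    return (causes | effects) - causes

# Hypothetical causal structure for the Pipe Demand example of FIG. 3
edges = [
    ("Geopolitics", "OilPrice"),
    ("OilPrice", "DrillingActivity"),
    ("DrillingActivity", "PipeDemand"),
]
assert is_dag(edges)
assert sink_nodes(edges) == {"PipeDemand"}
```

A structure discovery algorithm would propose candidate edge sets from event baskets; checks such as these would reject cyclic candidates before parameter estimation.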
[0038] At step 150, process 100 executes a causal structure parameters estimation process. A DAG by itself may not be sufficient to support causal inference. For that, the strength of a node's dependency on its parent node(s) may be quantified. An example of such quantification may be referred to as Bayesian Network parameters, which are estimated and trained from the event baskets, as described below in connection with FIGS. 7 and 9. For example, as detailed below, causal relationships may be used to define a structure of a Bayesian Network. Thus, the Bayesian Network may be completely trainable based on collecting training data (e.g., the causal relationships, among other data) into event baskets.
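As a hedged illustration of the parameter estimation step, the sketch below computes maximum-likelihood conditional probability table (CPT) entries for a node given its parents by counting over binary event baskets. The event names and baskets are hypothetical; a production system would use a full estimation toolkit rather than this toy counting scheme.

```python
from collections import Counter

def estimate_cpt(baskets, node, parents):
    """Maximum-likelihood estimate of P(node | parents) from historical
    event baskets, where each basket maps event name -> 0/1 occurrence."""
    joint = Counter()    # counts of (parent values, node value)
    margin = Counter()   # counts of parent values alone
    for basket in baskets:
        pv = tuple(basket[p] for p in parents)
        joint[(pv, basket[node])] += 1
        margin[pv] += 1
    return {(pv, v): joint[(pv, v)] / margin[pv] for (pv, v) in joint}

# Hypothetical event baskets (co-occurrences of binary events)
baskets = [
    {"DrillingActivity": 1, "PipeDemand": 1},
    {"DrillingActivity": 1, "PipeDemand": 1},
    {"DrillingActivity": 1, "PipeDemand": 0},
    {"DrillingActivity": 0, "PipeDemand": 0},
]
cpt = estimate_cpt(baskets, "PipeDemand", ["DrillingActivity"])
# P(PipeDemand=1 | DrillingActivity=1) = 2/3 from the counts above
assert abs(cpt[((1,), 1)] - 2 / 3) < 1e-9
```

Each CPT quantifies the strength of a node's dependency on its parents, which is exactly the information a DAG alone lacks.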
[0039] FIG. 2 illustrates a flow diagram of an example process 200 for target event forecasting by means of causal inference, in accordance with example implementations disclosed herein. Process 200 may start, for example, based on input of a target event from a user.
[0040] At step 210, process 200 executes an event monitoring process. For example, from the data sources identified in steps 110 and 120 of FIG. 1, occurrences of early indicator events or other relevant events (e.g., events of interest) are detected and extracted from data sources. Step 210 may be performed on a periodic basis, according to a preset period. Extracted events may be fed into the Bayesian inference process (e.g., FIG. 1) as new evidence used as inputs to the Bayesian inference process. That is, while process 200 may utilize data sources and events of interest as determined from steps 110-120 of FIG. 1, the event monitoring may output extracted events that may be fed into the causal inference engine as set forth in FIG. 1, to draw inferences about the target event.
[0041] At step 220, process 200 executes a causal inference process. For example, an inference is executed on the Bayesian Network (for example, as described in FIG. 1) to compute a marginal probability distribution for the target event.
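The causal inference step at 220 can be illustrated with brute-force enumeration over a hypothetical two-node network (Indicator → Target). The variable names and probabilities below are assumptions for illustration only; a production inference engine would use variable elimination or belief propagation rather than enumeration.

```python
from itertools import product

# Hypothetical two-node chain: Indicator -> Target, binary events
p_indicator = {1: 0.4, 0: 0.6}                  # prior P(Indicator)
p_target = {(1, 1): 0.7, (1, 0): 0.3,           # P(Target | Indicator)
            (0, 1): 0.1, (0, 0): 0.9}

def posterior_target(evidence):
    """Posterior marginal P(Target | evidence) by enumerating all worlds.
    `evidence` maps observed variable names to 0/1 values."""
    scores = {0: 0.0, 1: 0.0}
    for ind, tgt in product((0, 1), repeat=2):
        world = {"Indicator": ind, "Target": tgt}
        if any(world[k] != v for k, v in evidence.items()):
            continue  # world inconsistent with observed evidence
        scores[tgt] += p_indicator[ind] * p_target[(ind, tgt)]
    z = scores[0] + scores[1]
    return {v: s / z for v, s in scores.items()}

# Observing the indicator event shifts the forecast of the target:
# prior marginal P(Target=1) = 0.4*0.7 + 0.6*0.1 = 0.34
assert abs(posterior_target({})[1] - 0.34) < 1e-9
assert abs(posterior_target({"Indicator": 1})[1] - 0.7) < 1e-9
```

The returned distribution is the posterior marginal probability distribution of the target event that the example implementations output.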
[0042] Additional details regarding the steps of processes 100 and 200 are provided below, for example, in connection with FIGS. 3-10.
Event Elicitation
[0043] Example implementations provided herein forecast any future events (e.g., target events), as well as any early indicator events that can influence the target events and/or any other events that are otherwise relevant to the target events (collectively referred to as events of interest). The end user, with the help of subject matter experts, may specify and define target events and/or events of interest, for example, at step 110 of FIG. 1. In cases where an event type is well established by the general community and its occurrences are curated in structured data sources, off-the-shelf structured event data sources may be utilized. But in most cases, events are arbitrarily defined and will be expressed only in unstructured data sources. Such events may be referred to as “ad-hoc” events. Target events and/or events of interest may be ad-hoc events and/or well-established events.
[0044] As an example, suppose a user wants to forecast demand for drilling pipes that can be used in the oil and gas industry. Events of interest may be elicited from the SME, who may provide a list of events that may impact or that are otherwise relevant to the Pipe Demand (e.g., the example target event). This list need not be exhaustive nor sound: in case the relevance of an event is unknown, it may be included in the list during the first (e.g., brainstorming) stage, to the extent it does not make the list excessively long. Irrelevant events may be filtered out at a later stage.
[0045] FIG. 3 illustrates a graph 300 of an inventory of events elicited by hypothetical SMEs. That is, FIG. 3 illustrates graph 300 as an example of events of interest that may be relevant to the example target event, “Pipe Demand”. Each node of the graph is illustrative of an event of interest, and the directed edges, which represent direct causal relationships between event nodes, are shown here for illustration purposes only. In general, causal relationships need not be obvious. So, instead of relying on the SME to specify them, a more scalable and objective approach may include discovering relationships between the events from the data extracted from the data sources (e.g., deriving the relationships therefrom).
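One hedged illustration of such a data-driven first pass is pairwise association screening over event baskets, e.g., computing a lift score; the event names and baskets below are hypothetical. Note that lift measures association, not causation, so a score such as this would serve only as a pre-filter for candidate edges before the causal structure discovery described in connection with FIG. 1.

```python
def dependence_score(baskets, a, b):
    """Lift between two binary events: P(a,b) / (P(a) * P(b)).
    Values far from 1 suggest a dependence worth testing further
    during causal structure discovery."""
    n = len(baskets)
    pa = sum(x[a] for x in baskets) / n
    pb = sum(x[b] for x in baskets) / n
    pab = sum(x[a] and x[b] for x in baskets) / n
    return pab / (pa * pb)

# Hypothetical event baskets
baskets = [
    {"RigCount": 1, "PipeDemand": 1},
    {"RigCount": 1, "PipeDemand": 1},
    {"RigCount": 0, "PipeDemand": 0},
    {"RigCount": 0, "PipeDemand": 1},
]
# RigCount and PipeDemand co-occur more than independence would predict
assert dependence_score(baskets, "RigCount", "PipeDemand") > 1.0
```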
Event Extraction
[0046] Traditional machine learning or deep learning may train a machine to recognize something (e.g., a cat) from data (e.g., image data) and report that the machine recognized the taught object in the data (e.g., report that a cat was recognized in the image data). In contrast, example implementations disclosed herein train a machine to recognize the occurrence of a given event of interest in text data sources. In the traditional examples, the machine learning starts with labeled training datasets that teach the machine how to recognize an object in the data; whereas, according to the implementations disclosed herein, there need not be a labeled training dataset to start with, since the events are ad-hoc and unknown prior to initiating the machine learning techniques disclosed herein.
[0047] Before analyzing events of interest, the events need to be identified and extracted, for example, at step 120 of FIG. 1. Data sources need to be identified, including structured and unstructured textual sources, where data of events of interest is expected to reside and from where events can be extracted. In various scenarios, an event may not be in a structured data source, such as a database. For such an event, example implementations described herein provide for building a model to detect events in unstructured textual documents. In some examples, ad-hoc events of interest may be expressed at the sentence level in a text document. For these cases, given a textual document such as a news report or news feed, detecting an event in the document can be reduced to classifying sentences in the document as either expressing the event or not.
[0048] Training a machine learning model to classify sentences may require labeled data, which may not exist, especially when the event to be detected is ad-hoc in nature. This lack of labeled reference data may be referred to as the “Small Data ML Problem,” which requires innovative solutions such as those presented herein. To detect ad-hoc events in text documents, example implementations disclosed herein provide a hybrid method (also referred to as a hybrid model) that combines both knowledge-driven and data-driven techniques, for example, as shown in FIG. 4.
[0049] FIG. 4 illustrates a flow diagram of an example process 400 for building an ad-hoc event detector, in accordance with example implementations disclosed herein. The process 400 may provide for building up a labeled data set for bootstrapping the construction of the event detector (e.g., step 120 of FIG. 1). Process 400 may be performed for each event of interest. That is, execution of process 400 provides for building an event detector for a given event.
[0050] Process 400 starts with collecting a corpus of textual documents from a plurality of data sources, step 410. The data sources may be those sources identified in step 120 of FIG. 1. The textual documents may include historical datasets of documents where events of interest can be located (e.g., news reports, news feeds, articles, etc.), which may be stored in a digital format accessible via a web based interface.
[0051] At step 420, given a textual document, the document is segmented into sentences. In various examples, segmentation into sentences may include pre-processing tasks to distinguish a “period” used as punctuation from a “period” used as a decimal point in numbers. Example pre-processing for this distinction may include detection using, for example but not limited to, NER (named-entity recognition) supported by tools such as spaCy or the Stanza NLP Toolkit. Pre-processing may also be used to distinguish a “period” used as punctuation from a “period” used for abbreviations, such as “U.S.” Example pre-processing for this distinction may include using, for example but not limited to, the NLTK Punkt sentence tokenizer. Output from step 420, for a given textual document, may be unlabeled sentences.

[0052] At step 430, each sentence is classified as including a given event of interest or not. Step 430 may be performed by a weak classification model (also referred to herein as a weak sentence classifier), and may be performed without labeled data. The term “weak” as used herein refers to low accuracy in the classification process (e.g., sometimes the classifier may fail to detect an event in a sentence). Without a large quantity of labeled data, machine learning techniques may not be useful. Thus, process 400 begins with developing the weak classification model, driven by knowledge about NLP (natural language processing) models of how an event could be expressed in a sentence. The weak classification model, in various examples, may be based on heuristics.
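As an illustration of the step-420 segmentation heuristics, the following is a minimal, purely illustrative Python sketch that splits on sentence-final periods while skipping decimal points and a small, hypothetical abbreviation list; a production implementation would instead use the NLTK Punkt tokenizer or spaCy/Stanza as noted above.

```python
import re

# Minimal illustrative segmenter for step 420. It treats a period as a
# sentence boundary only when followed by whitespace + a capital letter or
# the end of the text, which already skips decimal points ("3.5"); a small
# hypothetical abbreviation list handles cases like "U.S.".
ABBREVIATIONS = {"U.S.", "Inc.", "Corp.", "Mr.", "Dr."}

def segment_sentences(text):
    sentences, start = [], 0
    for match in re.finditer(r"\.(?=\s+[A-Z]|\s*$)", text):
        end = match.end()
        candidate = text[start:end].strip()
        tokens = candidate.split()
        # Skip the boundary if the period belongs to a known abbreviation.
        if tokens and tokens[-1] in ABBREVIATIONS:
            continue
        if candidate:
            sentences.append(candidate)
            start = end
    tail = text[start:].strip()
    if tail:
        sentences.append(tail)
    return sentences
```

The heuristic is deliberately simple; real newsfeed text would need the richer disambiguation (NER, trained tokenizers) described above.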
[0053] As an example, “Crude Oil Futures surge” may represent an event of interest. Components of this event can be expressed taking the following heuristics into account:
• Crude oil is not the same as oil, so “crude” needs to be present in the sentence. Also, crude oil is often referred to as West Texas Intermediate (WTI) crude and/or Brent crude, both of which express the market where these commodity futures are traded.
• Futures are contracts to deliver an asset in some future month at some price. Futures are often expressed in a sentence with mention of the delivery month.
• A surge in commodity price is often expressed as a percentage increase.
• An increase can be expressed in many forms, so a dictionary of synonyms for the concept can be utilized, including, for example, words like surge, jump, go up, spike, rise, etc. For each concept involved in an event, a dictionary of words that are similar in meaning to the root word is built. For example, building the dictionary may start with identifying words with similar embeddings (e.g., vector representations of other words that are closest to the word analyzed), for example using cosine similarity. Pre-trained embeddings may be provided by, for example but not limited to, GloVe (Global Vectors for Word Representation) from Stanford and/or Google News embeddings from Google. However, this similarity sometimes reflects the co-occurrence of words and not necessarily synonymy. Accordingly, the dictionary needs to be curated. The table shown in FIG. 5 illustrates an example of curated synonyms, which may be used to recognize the many different ways a concept can be expressed using words. For illustrative purposes, FIG. 5 depicts curated synonyms for the root word “rise” ranked according to a similarity score, based on Google News embeddings, where deleted entries (shown as strikethroughs) are not similar to the root word in meaning even though the similarity score according to the pre-trained embeddings is relatively high. Another choice is the pre-built synonyms dictionary from the NLTK WordNet library, which originated from Princeton; curation is still needed in that case as well. The curated dictionary is a list of entries that pair words together with a measure of similarity to the root word.
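The embedding-similarity ranking described above can be sketched as follows. The toy vectors are fabricated for illustration only; a real implementation would load pre-trained GloVe or Google News embeddings, and the ranked candidates would then be manually curated.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

# Toy embeddings, fabricated for illustration only.
EMBEDDINGS = {
    "rise":  [0.9, 0.1, 0.2],
    "surge": [0.8, 0.2, 0.3],
    "jump":  [0.7, 0.3, 0.2],
    "fall":  [-0.8, 0.1, 0.1],
}

def candidate_synonyms(root, embeddings, top_k=3):
    """Rank other words by cosine similarity to the root word; the ranked
    list is then manually curated to drop co-occurring non-synonyms."""
    root_vec = embeddings[root]
    scored = [(w, cosine_similarity(root_vec, v))
              for w, v in embeddings.items() if w != root]
    return sorted(scored, key=lambda p: p[1], reverse=True)[:top_k]
```
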
[0054] Event detection can then be formulated as semantic keyword matching according to a set of rules or conditions. For this, executable search queries may be utilized to detect events. For the running example of “Crude Oil Futures surge”, an example search query (e.g., set of conditions), in conjunctive normal form, may look like:
([crude, 1.0] OR [wti, 1.0] OR [brent, 1.0]) AND
([futures, 1.0] OR [contract, 0.9] OR [delivery, 0.8]) AND
[surge, 0.5]
[0055] Beyond the logical combinations illustrated above, a subquery (e.g., [surge, 0.5]) specifies finding a match in the curated dictionary (e.g., the table of FIG. 5 in this example) for “surge” that has a similarity score of at least 0.5.
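One possible reading of this query evaluation can be sketched as follows. Each literal [root, score] is taken to match when some sentence token appears in the curated dictionary for that root with at least the given similarity; the dictionary entries below are fabricated stand-ins for a curated table like the one in FIG. 5.

```python
# Hedged sketch of the semantic keyword matcher. A query is a list of
# OR-clauses (conjunctive normal form); each literal pairs a dictionary
# root word with a minimum similarity score.
CURATED = {
    "crude":    {"crude": 1.0},
    "wti":      {"wti": 1.0},
    "brent":    {"brent": 1.0},
    "futures":  {"futures": 1.0},
    "contract": {"contract": 1.0},
    "delivery": {"delivery": 1.0},
    "surge":    {"surge": 1.0, "jump": 0.87, "rise": 0.85, "climb": 0.6},
}

# ([crude,1.0] OR [wti,1.0] OR [brent,1.0]) AND
# ([futures,1.0] OR [contract,0.9] OR [delivery,0.8]) AND [surge,0.5]
QUERY = [
    [("crude", 1.0), ("wti", 1.0), ("brent", 1.0)],
    [("futures", 1.0), ("contract", 0.9), ("delivery", 0.8)],
    [("surge", 0.5)],
]

def matches(sentence, query=QUERY, dictionary=CURATED):
    tokens = set(sentence.lower().split())
    def literal_ok(root, min_score):
        # A literal matches when a dictionary entry for the root word, with
        # similarity at least min_score, appears among the sentence tokens.
        return any(word in tokens and score >= min_score
                   for word, score in dictionary.get(root, {}).items())
    return all(any(literal_ok(r, m) for r, m in clause) for clause in query)
```
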
[0056] Additionally, in various examples, event attributes such as how much (e.g., percentage figures) and when (e.g., dates) can be extracted using a NER pre-trained model.
[0057] For some simple events, semantic keyword matching may be sufficient. For others, the model may be required to recognize more complex phrases, such as bigrams like “White House” (which, in this example, has a very different meaning than the combination of white color with a dwelling), and analyze more complex sentence structures using part-of-speech (POS) tags and dependency tree techniques.
[0058] At step 440, the process 400 may determine whether or not the weak classification model meets a preset accuracy threshold. For example, the threshold may be 90% accuracy in the classification of sentences, and at step 440 process 400 determines the weak classification model is sufficiently accurate if the classification is correct 90% of the time. Conversely, if the weak classifier is not accurate 90% of the time (e.g., only 70%), step 440 may determine the weak classification model is not sufficiently accurate. A threshold of 90% is provided as an example, and the actual threshold may be set as desired, for example, based on complexity of the event of interest and/or quantity of data sources for which the process is executed.
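The step-440 accuracy gate can be sketched as a simple comparison against verified labels; the function and variable names here are illustrative, not from the original disclosure.

```python
# Minimal sketch of the step-440 accuracy check. `predictions` are the weak
# classifier's outputs and `gold` are manually verified labels.
def accuracy(predictions, gold):
    correct = sum(1 for p, g in zip(predictions, gold) if p == g)
    return correct / len(gold)

def is_sufficiently_accurate(predictions, gold, threshold=0.9):
    return accuracy(predictions, gold) >= threshold
```
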
[0059] As noted above, the weak classification model may not be accurate enough, in which case the process returns to step 430, where misclassified sentences may be analyzed and this error analysis used to fine-tune the weak classification model. In some implementations, the accuracy of the weak classifier may be determined by comparing its predictions with a labeled dataset. When starting with an unlabeled dataset, the classifier may be applied to predict labels and these predictions may be manually verified. Note that the goal is not only to improve a weak classifier but also to build up a labeled dataset. Bootstrapping may include alternating between the preceding tasks until a labeled dataset is generated that is large enough to fine-tune a pre-trained language model (e.g., classifier) at step 460. Note that between iterations, data that has been labeled need not be labeled again; only new data will need to be labeled. Fine-tuning herein may include updating the rules (e.g., knowledge) used in implementing the weak classifier, with the goal of improving classification accuracy.
[0060] At step 430, by analyzing sentences that are misclassified, the model may be configured to determine how the errors are made, reclassify the sentences, and use this error analysis to fine-tune the structure of the weak classification model (e.g., the rules used to implement the weak classifier) and improve classification accuracy. The knowledge-driven approach provides an improvement over data-driven techniques by making it easier to carry out error analysis and correction. Note that the goal of developing a sufficiently accurate weak classifier is to help create a labeled dataset that is sufficiently large to train a deep-learning-based classifier in step 460. Otherwise, if the weak classification model is sufficiently accurate, the process proceeds to step 450 and uses the weak classification model to label sentences of the textual document as including the event of interest or not. In some examples, recognition that a sentence has been misclassified may be performed by the end-user and/or SME upon reviewing the classifications output by process 400.
[0061] Steps 420 through 440 are executed for each textual document extracted from the data sources. Furthermore, for each textual document, steps 430 through 450 are executed on each sentence contained in the textual document. Steps 420 through 450 may be executed sequentially on each sentence and/or document, or executed in parallel. That is, for example, steps 430 through 450 may be executed on a given sentence of a given document in parallel with steps 430 through 450 being executed on one or more other sentences of the given document. Similarly, steps 420 through 450 may be executed on a given document in parallel with steps 420 through 450 being executed on one or more other documents.
[0062] Step 450 may generate a labeled dataset that is of sufficient quality and output the labeled dataset to step 460. The labeled dataset may comprise sentences that are labeled with their class (e.g., whether the sentence expresses a given ad-hoc event or not). In some implementations, this labeled dataset might not be large enough to train a deep neural network model from scratch, but may be large enough for transfer learning or fine-tuning a pre-trained language model (e.g., step 460) that can recognize sentence structures and concepts in sentences and perform a variety of natural language downstream tasks on sentences. As used herein, a pre-trained language model may also be referred to as a deep-learning-based language model. The labeled dataset from step 450 is used to fine-tune the pre-trained language model to perform a new downstream task of classifying a recognized structure, per the pre-trained language model, as expressing the event of interest for which the event detector is built according to process 400. Example pre-trained language models include, but are not limited to, the Bidirectional Encoder Representations from Transformers (BERT) model or the like. BERT, for example BERT-BASE, is an advanced language model pre-trained on unlabeled data (consisting of pairs of consecutive sentences) using self-supervised learning to perform two pre-training tasks: predicting tokens that have been masked and predicting the next sentence. BERT was pre-trained on a very large document corpus. BERT has been shown to outperform other state-of-the-art models on various downstream tasks. For a given downstream task, BERT needs to be fine-tuned (conceptually analogous to transfer learning) using labeled data for that task. Accordingly, some implementations disclosed herein provide for fine-tuning BERT to recognize events of interest using the generated labeled dataset output from step 450, while other implementations may fine-tune different pre-trained language models.
That is, the example implementations disclosed herein are not limited to BERT, but can be used with any pre-trained language model, with BERT used herein as an illustrative example. Thus, at step 460, the labeled dataset is used to fine-tune a pre-trained model (e.g., BERT) to classify an input sentence as expressing the ad-hoc event or not.
[0063] Example implementations disclosed herein may be described under the assumption that an event of interest can be expressed in a single sentence. However, the implementations disclosed herein are not limited to one sentence per label (e.g., expressing the event). Implementations disclosed herein may be configured to analyze one or more sentences as a grouping of sentences, determine that the group of sentences expresses the event of interest, and classify the group of sentences accordingly.
[0064] FIG. 6 is a flow diagram of process 600 for fine-tuning a pre-trained language model (such as BERT) to recognize a specific ad-hoc event, in accordance with example implementations disclosed herein. Process 600 is an example of step 460 of FIG. 4. FIG. 6 shows inputting an initial labeled dataset 610 (e.g., as output from step 450 of FIG. 4) into a pre-trained language model. The initial labeled dataset may comprise sentence and class label pairs, for example, (Sentence, 0), where the class is indicative of the sentence expressing the event or not. As described above, an illustrative example of a pre-trained language model is BERT or BERT-BASE; however, implementations disclosed herein are not to be limited to utilizing BERT or BERT-BASE, and any pre-trained language model may be applicable.
[0065] At step 620, the initial labeled dataset 610 is fed to the pre-trained language model without the labels. The pre-trained language model then generates an internal representation of the structure of an unlabeled sentence, which is fed to a downstream machine-learning-based event classifier. This classifier is fine-tuned by comparing its predictions at step 640 with the gold-standard labels that come with the labeled data 610. Once the fine-tuning is complete, the process 600 becomes the data-driven deep-learning-based event classifier that takes, as input, a new unlabeled sentence 610, generates an internal representation of the unlabeled sentence 610 at the pre-trained language model 620, classifies the internal representation at classifier 630, and outputs a label 640 for the unlabeled sentence 610. The label 640 indicates whether the input sentence expresses the given event of interest or not.
[0066] These newly computed labels from output 640 can be further validated, for example, using either weak classification models as set forth above or manually, thereby creating a larger curated labeled dataset. This resulting dataset can in turn be fed back into the pre-trained language model (having been fine-tuned using the initial labeled dataset) to further fine-tune pre-trained language model a second round and further subsequent rounds. The repeated feeding back of validated labeled datasets gradually bootstraps the event detection model into an increasingly more accurate model.
[0067] To step back a bit, a purpose of building event detectors is to be able to detect events of interest in text documents, commonly sourced from unstructured sources. Detected events may be collected into “event baskets”, that is, tuples of the form: (ET, E1, E2, ..., EN) Eq. 1
[0068] where ET represents the target event of interest, and, for each value of i from 1 to N, Ei represents an upstream event of interest elicited to have a potential causal impact on ET. E’s may be thought of as random variables, which can be binary (True or False) or categorical (e.g., Up, Down, or Unchanged). Returning to the running example, an example event basket would look like:
(pipe demand Up, ... , crude oil future Up, ... ) Eq. 2
[0069] where “pipe demand Up” represents an ET and “crude oil future Up” represents an Ei.
[0070] Because example implementations herein are configured for learning causal relationships between events, there may be timing constraints that govern when events of interest are to take place. For example, a time window for when the target event should happen is expected to be after a time window of when values for each Ei are extracted, such that the target event occurs after the events of interest. Furthermore, because the model disclosed herein does not know how each of the Ei are causally related to each other, the time window for each Ei should not be too wide. That is, the target event is expected to occur within a set time window of each event of interest. However, the time windows may be based on the target event and events of interest, such that one target event may have a first time window that is appropriate for the target event while another target event may correspond to a different time window.
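A sketch of assembling event baskets under such timing constraints follows; the field names, value defaults, and the 7-day window are illustrative assumptions, not from the disclosure.

```python
from datetime import datetime, timedelta

# Hedged sketch: for each upstream event of interest, take the value detected
# inside a lookback window that ends before the target event's time, so the
# target event occurs after the events of interest.
def build_basket(target_time, detected, events_of_interest, window_days=7):
    """detected: list of (event_name, value, timestamp) tuples."""
    basket = {}
    for name in events_of_interest:
        basket[name] = "Unchanged"  # default when nothing falls in the window
        for ev_name, value, ts in detected:
            if (ev_name == name and ts < target_time
                    and target_time - ts <= timedelta(days=window_days)):
                basket[name] = value
    return basket
```

Each basket then becomes one row of the relational table analyzed by the structure-discovery step described next.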
Event Causal Structure Discovery
[0071] Given a target event (ET), to predict events of interest (Ei) that may impact ET or that are otherwise relevant to ET, causal relationships among events may be modeled, for example, using Bayesian Networks, which are a class of probabilistic models that can support causal reasoning. FIG. 3 above illustrates an example of a DAG (directed acyclic graph) structure corresponding to a Bayesian Network according to example implementations disclosed herein. In the DAG shown in FIG. 3, sink nodes at the bottom of the graph 300 represent potential target events ET (e.g., Pipe Demand in the running example) or other events that may be impacted (e.g., Permit Applications), and each directed edge represents a direct causal relationship between two event nodes (e.g., inter-dependence between nodes).

[0072] Each edge may be interpreted in a number of different ways besides direct causal relationships. For example, an edge node1 → node2 can be interpreted as:
• Node1 directly causes node2;
• Understanding of node1 directly influences understanding about node2; and/or
• Node2 functionally depends on node1 together with any other nodes that directly point to node2.
[0073] A node, in Bayesian Networks, may be formally represented by a variable assumed to be categorical, i.e., one that may take two or more discrete values. In a Bayesian Network, all the above interpretations of the DAG have in common a set of statistical independence assumptions, called Markovian assumptions, that state “Every variable is conditionally independent of its non-descendants given its parents.”
[0074] For the sake of simplicity and illustrative purposes only, consider a subgraph of the DAG shown in FIG. 3, without descriptive labels, and assume that this simplified Bayesian Network satisfies all the Markovian assumptions. This simplified DAG 700 is shown in FIG. 7, where A, B, C, E, and R represent nodes without descriptive labels and the edges illustrate the independence assumptions therebetween.
[0075] The following lists all Markovian independence assumptions that implicitly hold in the example DAG 700:
a) C is independent of B, E, R given A
b) R is independent of A, B, C given E
c) A is independent of R given B, E
d) B is independent of E, R
e) E is independent of B
[0076] The statements a)-c) express conditional independence, and the statements d) and e) express marginal independence.

[0077] While SMEs can rely on their causal perceptions to construct a DAG, it would be tedious to do so in order to satisfy all the Markovian assumptions stated above, especially when the DAG involves a large number of nodes. A more scalable and objective approach would be to discover the edges from extracted data. The following disclosure provides algorithms capable of graph structure discovery by analyzing event baskets collected from historical data sources, such as historical newsfeeds, as described above in connection with FIGS. 1 and 2. Recall event baskets are of the form:
(VT, V1, V2, ..., VN) Eq. 3
[0078] where V’s are used in place of E’s for illustrative purposes only. Each Vi may be substantially the same as a given Ei referred to above. For illustrative purposes, these baskets may be represented as rows in a relational table, with each column holding the value of a respective Vi. Assume that each Vi is a categorical variable; example implementations herein discover how the Vi’s might be causally related to one another. That is, implementations disclosed herein construct a DAG that satisfies the Markovian assumptions of independence among all random variables defined in the event baskets. To systematically discover the underlying causal relationships, a variant of the PC-algorithm may be used. The PC-algorithm, at its core, relies on testing conditional independence between variables. The PC-algorithm has two phases: a skeleton phase and an orientation phase.
[0079] For the skeleton phase, the goal is to construct an undirected graph. To do this, the PC-algorithm starts with a complete graph G with N nodes that correspond to the variables (e.g., the Vi's in the event basket). At stage 0, the algorithm removes any edge Vi − Vj where Vi and Vj are marginally independent. At stage 1, the algorithm marks any edge Vi − Vj for removal when there is a set S of one node (other than Vi and Vj) that is connected to Vi such that Vi and Vj are conditionally independent given S, and records SEPSET(Vi, Vj) = SEPSET(Vj, Vi) = S. At the end of stage 1, all edges that have been marked for removal are removed. At stage 2, the same procedure as stage 1 is executed, except that S is now a set of two nodes. This process repeats until stage k, where no such set S of k nodes exists. In some examples, the process may be stopped at k = 3 to avoid monopolizing computation resources; however, any k value may be utilized as desired. The algorithm then returns the resulting undirected graph G and SEPSET. Note that, in some examples, independence testing can be done using X² or G² statistics.

[0080] For the orientation phase, the goal is to assign a direction to every edge of the undirected graph G from the skeleton phase. The orientation phase may apply the following four rules:
1. For every triplet of variables (Vi, Vj, Vk) such that Vi and Vj are adjacent in G and Vj and Vk are adjacent in G, but Vi and Vk are not, orient Vi − Vj − Vk as Vi → Vj ← Vk if Vj ∉ SEPSET(Vi, Vk).
2. Orient Vj − Vk as Vj → Vk if there is a directed edge Vi → Vj such that Vi and Vk are not adjacent in G.
3. Orient Vi − Vk as Vi → Vk if there is a directed path Vi → Vj → Vk.
4. Orient Vi − Vj as Vi → Vj whenever there are two paths Vi − Vk → Vj and Vi − Vl → Vj, such that Vk and Vl are not adjacent in G.
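The skeleton phase described above can be sketched as follows, with the conditional-independence test abstracted into an oracle `indep(i, j, S)`; in practice, that oracle would run an X² or G² test over the event-basket table. The function and parameter names are illustrative.

```python
from itertools import combinations

# Hedged sketch of the PC-algorithm skeleton phase. Nodes are 0..n-1;
# `indep(i, j, S)` reports whether variables i and j are (conditionally)
# independent given the set S of conditioning variables.
def pc_skeleton(n, indep, max_k=3):
    adj = {i: set(range(n)) - {i} for i in range(n)}  # start complete
    sepset = {}
    for k in range(max_k + 1):  # stage 0, stage 1, ..., stage max_k
        to_remove = []
        for i in range(n):
            for j in sorted(adj[i]):
                if i >= j:
                    continue
                # Condition on sets S of k nodes connected to i (excluding j).
                for S in combinations(sorted(adj[i] - {j}), k):
                    if indep(i, j, set(S)):
                        to_remove.append((i, j))
                        sepset[(i, j)] = sepset[(j, i)] = set(S)
                        break
        for i, j in to_remove:  # remove marked edges at the end of the stage
            adj[i].discard(j)
            adj[j].discard(i)
    return adj, sepset
```

The returned SEPSET feeds rule 1 of the orientation phase; a full implementation would then apply the four orientation rules to the surviving edges.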
Build Event Inference Model
[0081] A DAG constructed as set forth above, which satisfies all Markovian assumptions, represents the structure of a Bayesian Network (also referred to herein as an event causal structure). The strength of the causal relationships in the DAG may need to be quantified. This quantification may be referred to as the network parameters of the Bayesian Network, which, together with the graph structure and Markovian assumptions, allows execution of event causal inferences. In an example, network parameters may be provided, for every variable X (e.g., every node) in a given DAG and its parent nodes U, as conditional probabilities Pr(x | u) for every value x of the X node and every value combination u of its parent nodes U. That is, with reference to FIG. 7, the network parameter for C may be the probability of C occurring, given A; the network parameter for A may be the probability of A occurring, given B and E; etc., for each node in the DAG.

[0082] For the DAG 700 shown in FIG. 7, for example, the following conditional probabilities need to be provided:
Pr(C | A), Pr(R | E), Pr(A | B, E), Pr(B), Pr(E)
[0083] FIG. 8 gives an example of the conditional probabilities Pr(C | A) required for variable C.

[0084] Conditional probability tables (CPTs), such as the one shown in FIG. 8, may be needed for the inference at each node (e.g., inferring the occurrence of a node, given its parent nodes) according to the examples disclosed herein. They can be easily estimated from the data, such as the event baskets described herein.
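Estimating a CPT such as Pr(C | A) from event baskets can be sketched as simple conditional counting; the row format (dicts mapping variable name to observed value) is an illustrative assumption.

```python
from collections import Counter

# Hedged sketch: estimate the CPT Pr(X | Parents(X)) by conditional counting
# over event-basket rows.
def estimate_cpt(rows, child, parents):
    joint = Counter()          # counts of (child value, parent value combo)
    parent_counts = Counter()  # counts of each parent value combo
    for row in rows:
        u = tuple(row[p] for p in parents)
        joint[(row[child], u)] += 1
        parent_counts[u] += 1
    return {(x, u): count / parent_counts[u]
            for (x, u), count in joint.items()}
```
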
[0085] Based on the data, a Bayesian Network is provided that comprises:
• A DAG whose nodes represent the events of interest, denoted as variables; and
• Network parameters, which specify for each variable X, a CPT mathematically denoted as Pr(X | parents(X))
[0086] This Bayesian Network model will be used for causal inference about the distribution of any network variable.
Future Event Forecast By Means Of Causal Reasoning Over Detected Early Indicator Events
[0087] Bayesian Networks support computation of many types of inferences. An example inference type computes marginal probability distributions of a subset of variables in the network, called query variables (e.g., inferring something about the query variables). In the running example, the Bayesian Networks described above may be used to compute a marginal probability distribution for the target event (e.g., “Pipe Demand”). In this scenario, there is one query variable, namely “Pipe Demand”. Marginal probability distributions can be computed with or without additional evidence: in the latter case, a prior marginal probability distribution is computed on the query variables; in the former case, a posterior marginal probability distribution is computed on the query variables.
[0088] Example implementations disclosed herein are configured to estimate the probability distribution of the target event (e.g., “Pipe Demand” in the running example). As will be described herein, without using additional event observation data, a prior marginal probability distribution for “Pipe Demand” can be inferred using a Bayesian Network. By contrast, if additional information is provided, such as newly observed events, this new information can be leveraged to arrive at a better estimate for the “Pipe Demand” target event (e.g., the posterior marginal probability distribution). The input to the prediction problem may be a collection of events. In some embodiments, the input may be the result of event detection and extraction as described in connection with FIGS. 1, 2, 4, and 6. Given new event observation information as inputs, referred to herein as evidence, the forecasting problem can be framed as computing posterior marginals, that is, the marginal probability distribution conditioned on the evidence.
[0089] FIG. 9 is a flow diagram of a causal inference process 900 for target event forecasting by computing posterior marginals, in accordance with example implementations disclosed herein. Process 900 may be an example of forecasting a target event by computing a marginal probability distribution as described above. As described above, process 900 comprises building a causal inference engine 930 for a target event (e.g., query variable) from a Bayesian Network 920 (e.g., DAG and network parameters) and inputting evidence 910 into the causal inference engine 930, which calculates posterior marginal probability distributions 940 for the given query variable.
[0090] Before giving an example formulation for the different types of inferences involved, some notation may need to be defined:
• Let V be all the variables in the Bayesian Network.
• Let Q be the query variables, a subset of V.
• For any variable X in V, let Parents(X) be the variables that directly point to X in the DAG.
• Let E be the evidence variables, for which actual values e have been observed.
• The reduction of Pr(V) given evidence e, denoted Reduction(Pr(V), e), is defined to be the subset of entries of Pr(V) whose variable assignments agree with e.
[0091] From the above, example formulations for various types of inferences that are relevant are set forth as follows.
[0092] Joint probability distribution, Pr(V). By repeatedly applying the chain rule for Bayesian Networks, it can be shown that:
Pr(V) = Π_{X ∈ V} Pr(X | Parents(X)) Eq. 5
[0093] In this formula, Pr(X | Parents(X)) can be recognized to be the CPT (e.g., network parameter) for X.

[0094] The prior marginal probability distribution on the query variables, Pr(Q), can be obtained by summing out non-query variables from the joint probability distribution:
Pr(Q) = Σ_{V \ Q} Pr(V) Eq. 6
[0095] Joint marginal probability distribution, Pr(Q, e), can be obtained by:
Pr(Q, e) = Σ_{V \ Q} Reduction(Pr(V), e) Eq. 7
[0096] The posterior marginal probability distribution on the query variables, Pr(Q | e), can be obtained as shown in Eq. 8 below. The posterior marginal probability distribution can be obtained from the joint marginal probability distribution Pr(Q, e) by normalizing the latter's entries so they sum to 1:
Pr(Q | e) = Pr(Q, e) / Σ_{Q} Pr(Q, e) Eq. 8
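The joint, reduction, summing-out, and normalization steps above can be sketched end-to-end as a brute-force enumeration over a toy version of DAG 700; the CPT numbers are fabricated for illustration, and practical systems would use the variable elimination or Junction Trees algorithms instead.

```python
from itertools import product

# Toy Bayesian Network following DAG 700 of FIG. 7 (B and E are roots,
# A depends on B and E, C on A, R on E). All probabilities are made up.
DAG = {"B": [], "E": [], "A": ["B", "E"], "C": ["A"], "R": ["E"]}
CPT = {
    "B": {(): {True: 0.6, False: 0.4}},
    "E": {(): {True: 0.3, False: 0.7}},
    "A": {(b, e): {True: p, False: 1 - p}
          for (b, e), p in {(True, True): 0.9, (True, False): 0.6,
                            (False, True): 0.5, (False, False): 0.1}.items()},
    "C": {(a,): {True: 0.8 if a else 0.2, False: 0.2 if a else 0.8}
          for a in (True, False)},
    "R": {(e,): {True: 0.7 if e else 0.1, False: 0.3 if e else 0.9}
          for e in (True, False)},
}

def posterior(query, evidence):
    names = list(DAG)
    dist = {True: 0.0, False: 0.0}
    for values in product([True, False], repeat=len(names)):
        v = dict(zip(names, values))
        if any(v[var] != val for var, val in evidence.items()):
            continue  # reduction: keep only rows agreeing with the evidence
        p = 1.0
        for x in names:  # chain rule: product of CPT entries
            u = tuple(v[parent] for parent in DAG[x])
            p *= CPT[x][u][v[x]]
        dist[v[query]] += p  # sum out the non-query variables
    total = sum(dist.values())
    return {val: p / total for val, p in dist.items()}  # normalize
```

With an empty evidence dict, the same routine yields the prior marginal; conditioning on an observed downstream event (e.g., R = True) shifts the marginal for C, mirroring how detected early-indicator events update the target-event forecast.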
[0097] A class of algorithms, called variable elimination algorithms, may be used to compute prior marginal probability distributions. As another example, Junction Trees algorithms may be used, which are significantly more efficient than variable elimination algorithms and continue to evolve.
[0098] To summarize the implementations disclosed herein, FIG. 10 illustrates an example system 1000 for forecasting future events, in accordance with example implementations disclosed herein. System 1000 may be illustrative of an end-to-end system according to the example implementation disclosed herein. System 1000 may comprise an event detection engine 1030 and a causal event inference engine 1050.
[0099] According to an example implementation, the event detection engine 1030 may comprise one or more fine-tuned language models 1020 configured to detect and extract events of interest from input textual documents based on a target event and events of interest, for example, as described in connection to FIGS. 1-6. That is, for each event of interest elicited for a given target event, a fine-tuned language model 1020 is built as described above in connection to FIGS. 1-6. Thus, for example, if there are 10 events of interest, 10 models 1020 would be built and used to construct the event detection engine 1030. Each fine-tuned language model 1020 may be, for example, a pre-trained BERT that is fine-tuned for detecting a given event of interest, based on an initially labeled dataset output by a weak classification model. The event detection engine 1030 may scrape through data sources (such as newsfeeds, blogs, etc.) to identify unlabeled documents and apply the fine-tuned language model(s) 1020 to identify and extract information relevant to event(s) of interest 1035.
[0100] The identified events of interest 1035 (e.g., new events, such as Crude Futures UP for the running example) are output by the event detection engine 1030 and fed into the causal inference engine 1050. The causal inference engine 1050 may be built from a trained Bayesian Network, for example, as described above in connection with FIGS. 7 and 9, according to the elicited target event and events of interest. Event(s) of interest 1035 are fed into the trained Bayesian Network as nodes of the DAG used in the causal inference engine 1050, which may then calculate a posterior marginal probability distribution 1060 for the given target event. As shown in FIG. 10, the posterior marginal probability distribution for the running example indicates that Pipe Demand will be UP with a 0.7 probability and DOWN with a 0.3 probability (as in probabilistic logic) when the event of interest “Crude Futures UP” is detected.
[0101] In various implementations, the causal inference algorithm disclosed herein is capable of accepting evidence of one or more events detected. Furthermore, in some implementations, alone or in combination, queries that involve multiple variables are also supported by the causal inference algorithm disclosed herein.
[0102] In some examples, the causal inference engine 1050 may be implemented independently of the event detection engine 1030. For example, where events of interest are retrieved from structured data sources, are pre-known, and/or are recognized without a need for the event detection engine (e.g., an event of interest is readily recognized in a document), the event detection engine 1030 may be skipped and the events fed directly to the causal inference engine 1050 for use in calculating the posterior marginal probability distribution.
[0103] Similarly, the event detection engine 1030 may be utilized independent of the causal inference engine. For example, the event detection engine 1030 may be utilized to identify events of interest separate from forecasting a target event. Events of interest that are detected by the event detection engine 1030 can even be used in other event forecasting engines that do not use causal reasoning.
[0104] The system 1000 may be implemented, for example, in a computing environment such as that described in FIG. 11. The event detection engine 1030 and/or causal inference engine 1050 may be modules implemented as a collection of instructions (e.g., as software) stored in a memory and executable by a processor(s). The event detection engine 1030 and causal inference engine 1050 may be separate modules or a single module, and may be broken down into a plurality of modules individually executable.
Computing Environment
[0105] FIG. 11 illustrates an example computing environment with an example computer device suitable for use in some example implementations, such as the system 1000 of FIG. 10. Computer device 1105 in computing environment 1100 can include one or more processing units, cores, or processors 1110, memory 1115 (e.g., RAM, ROM, and/or the like), internal storage 1120 (e.g., magnetic, optical, solid-state storage, and/or organic), and/or I/O interface 1125, any of which can be coupled on a communication mechanism or bus 1130 for communicating information or embedded in the computer device 1105. I/O interface 1125 is also configured to receive images from cameras or provide images to projectors or displays, depending on the desired implementation. In some example implementations, the I/O interface 1125 may be an example of a means for receiving elicited events, including target events and/or events of interest. In some example implementations, alone or in combination, the I/O interface 1125 may be an example of a means for feeding the elicited events into the processor(s) 1110, internal storage 1120, and/or memory 1115, which are then fed into a causal inference engine in accordance with the examples disclosed herein. In some example implementations, alone or in combination, the I/O interface 1125 may be an example of a means for outputting a posterior marginal probability distribution(s) of the target event determined by the processor(s) 1110 in accordance with the examples disclosed herein. In some examples, the bus 1130 (separately or in combination with the I/O interface 1125) may also be an example of means for receiving, feeding, and/or outputting as described above. In some example implementations, a computer device 1105 may be used to implement the event detection engine 1030 and another computer device 1105 may be used to implement the causal inference engine 1050. In another example, a common computer device 1105 may be used to implement both engines.
[0106] Computer device 1105 can be communicatively coupled to input/user interface 1135 and output device/interface 1140. Either one or both of the input/user interface 1135 and output device/interface 1140 can be a wired or wireless interface and can be detachable. Input/user interface 1135 may include any device, component, sensor, or interface, physical or virtual, that can be used to provide input (e.g., buttons, touch-screen interface, keyboard, a pointing/cursor control, microphone, camera, braille, motion sensor, optical reader, and/or the like). For example, the input/user interface 1135 may provide an interface for receiving inputs from users and SMEs, such as to input a target event and event elicitation at step 110 of FIG. 1. Output device/interface 1140 may include a display, television, monitor, printer, speaker, braille, or the like. In some example implementations, input/user interface 1135 and output device/interface 1140 can be embedded with or physically coupled to the computer device 1105. In other example implementations, other computer devices may function as or provide the functions of input/user interface 1135 and output device/interface 1140 for a computer device 1105.
[0107] Examples of computer device 1105 may include, but are not limited to, highly mobile devices (e.g., smartphones, devices in vehicles and other machines, devices carried by humans and animals, and the like), mobile devices (e.g., tablets, notebooks, laptops, personal computers, portable televisions, radios, and the like), and devices not designed for mobility (e.g., desktop computers, other computers, information kiosks, televisions with one or more processors embedded therein and/or coupled thereto, radios, and the like).
[0108] Computer device 1105 can be communicatively coupled (e.g., via I/O interface 1125) to external storage 1145 and network 1150 for communicating with any number of networked components, devices, and systems, including one or more computer devices of the same or different configuration. Computer device 1105 or any connected computer device can be functioning as, providing services of, or referred to as a server, client, thin server, general machine, special-purpose machine, or another label.
[0109] I/O interface 1125 can include, but is not limited to, wired and/or wireless interfaces using any communication or I/O protocols or standards (e.g., Ethernet, 802.11x, Universal Serial Bus, WiMax, modem, a cellular network protocol, and the like) for communicating information to and/or from at least all the connected components, devices, and networks in computing environment 1100. Network 1150 can be any network or combination of networks (e.g., the Internet, local area network, wide area network, a telephonic network, a cellular network, satellite network, and the like).
[0110] Computer device 1105 can use and/or communicate using computer-usable or computer readable media, including transitory media and non-transitory media. Transitory media includes transmission media (e.g., metal cables, fiber optics), signals, carrier waves, and the like. Non-transitory media includes magnetic media (e.g., disks and tapes), optical media (e.g., CD ROM, digital video disks, Blu-ray disks), solid-state media (e.g., RAM, ROM, flash memory, solid-state storage), and other non-volatile storage or memory.
[0111] Computer device 1105 can be used to implement techniques, methods, applications, processes, or computer-executable instructions in some example computing environments. Computer-executable instructions can be retrieved from transitory media, and stored on and retrieved from non-transitory media. The executable instructions can originate from one or more of any programming, scripting, and machine languages (e.g., C, C++, C#, Java, Visual Basic, Python, Perl, JavaScript, and others).
[0112] Memory 1115 can be configured to store one or more programs, such as Operating System (OS), Hypervisor, and applications. Memory 1115 may be configured to store instructions for executing the event detection engine 1030 and/or the causal inference engine 1050 of FIG. 10. In various implementations, the memory 1115 may be configured to store instructions for performing process 100 of FIG. 1; process 200 of FIG. 2; process 400 of FIG. 4; process 600 of FIG. 6; and process 900 of FIG. 9. One or more of internal storage 1120 and external storage (if applicable) may be configured to store the configuration table 115, mapping table 112, and the data 113 of FIG. 2.
[0113] Processor(s) 1110 can execute under any operating system (OS) (not shown), in a native or virtual environment. One or more applications can be deployed that include logic unit 1160, application programming interface (API) unit 1165, input unit 1170, output unit 1175, and inter-unit communication mechanism 1195 for the different units to communicate with each other, with the OS, and with other applications (not shown). The described units and elements can be varied in design, function, configuration, or implementation and are not limited to the descriptions provided. Processor(s) 1110 can be in the form of hardware processors such as central processing units (CPUs) or in a combination of hardware and software units.
[0114] Processor(s) 1110 can be in the form of physical hardware processors (e.g., Central Processing Units (CPUs), field-programmable gate array (FPGA), application-specific integrated circuit (ASIC)) or a combination of software and hardware processors.
[0115] Processor(s) 1110 can be configured to fetch and execute programs stored in memory 1115. When processor(s) 1110 execute programs, processor(s) 1110 fetch instructions of the programs from memory 1115 and execute them, such as programs for performing the processes illustrated in FIGS. 1, 2, 4, 6, and 9. When processor(s) 1110 execute programs, the processor(s) can load information such as illustrated in FIGS. 3, 5, 7, and 8 from memory. Processor(s) 1110 can pre-fetch and cache instructions of programs and information to improve performance.
[0116] In some example implementations, when information or an execution instruction is received by API unit 1165, it may be communicated to one or more other units (e.g., logic unit 1160, input unit 1170, output unit 1175). In some instances, logic unit 1160 may be configured to control the information flow among the units and direct the services provided by API unit 1165, the input unit 1170, the output unit 1175, in some example implementations described above. For example, the flow of one or more processes or implementations may be controlled by logic unit 1160 alone or in conjunction with API unit 1165. The input unit 1170 may be configured to obtain input for the calculations described in the example implementations, and the output unit 1175 may be configured to provide an output based on the calculations described in example implementations.
[0117] Processor(s) 1110 can be configured to receive one or more events of interest; feed the one or more events of interest into a causal inference engine comprising a trained Bayesian Network that is configured to compute posterior marginal probability distributions of a target event based on causal relationships between the one or more events and the target event from the Bayesian Network; and output the posterior marginal probability distribution of the target event, for example, as illustrated in FIGS. 7-10. The processor(s) 1110 may also be configured to receive the one or more events of interest based on an event detection engine identifying at least one of the one or more events of interest, as illustrated in FIG. 10. In some examples, the one or more events may be stored in a storage device, such as internal storage 1120, memory 1115, external storage 1145, etc. In various example implementations, the processor(s) 1110 (or the components therein) may be an example of means for receiving elicited events, feeding events of interest into a causal inference engine, and/or means for outputting posterior marginal probability distribution(s), in accordance with the example implementations disclosed herein.
[0118] In example implementations, processor(s) 1110 may be configured to build an event detection engine based on receiving unlabeled data extracted from data sources (e.g., unstructured and/or structured data sources), determining that the unlabeled data pertains to an event of interest, and, in response to the determination, feed the resulting event into the causal inference engine as an event of interest of the one or more events of interest, as illustrated in FIGS. 1-6.

[0119] In example implementations, processor(s) 1110 may be configured to build the causal inference engine based on the target event and subject matter expert definitions identifying information relevant to the target event, use an event detection engine to identify a plurality of events of interest, analyze the events of interest based on a Bayesian Network structure discovery algorithm to discover causal relationships between the plurality of events of interest and the target event; and estimate a strength of dependency between the plurality of events of interest and the target event, as described in connection with FIGS. 7-10. The Bayesian Network may comprise a DAG having nodes and edges connecting the nodes, where each node represents an event of the plurality of events of interest, a sink node represents the target event, and each edge represents a direct causal relationship between events represented by connected nodes.
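The strength-of-dependency estimation described in paragraph [0119] can be sketched as a maximum-likelihood count over a collection of event baskets, assuming the DAG structure has already been discovered. This is an illustration only: the basket contents, event names, and the simple single-parent counting scheme are invented for the example, not taken from the document's trained network.

```python
from collections import Counter

def estimate_cpt(baskets, target, parent):
    """MLE of P(target present | parent present/absent) from basket co-occurrence counts."""
    counts = Counter()
    for basket in baskets:
        counts[(parent in basket, target in basket)] += 1
    cpt = {}
    for pv in (True, False):
        n = counts[(pv, True)] + counts[(pv, False)]
        cpt[pv] = counts[(pv, True)] / n if n else 0.0
    return cpt


# Each basket collects the events of interest observed together (contents invented).
baskets = [
    {"Crude Futures UP", "Pipe Demand UP"},
    {"Crude Futures UP", "Pipe Demand UP"},
    {"Crude Futures UP"},
    {"Pipe Demand UP"},
    {"Steel Price DOWN"},
]
cpt = estimate_cpt(baskets, "Pipe Demand UP", "Crude Futures UP")
# cpt[True] is 2/3: demand was UP in 2 of the 3 baskets where crude futures were UP.
```

A node with several parents would be handled the same way, with counts keyed on the joint presence pattern of all parent events; the resulting conditional probabilities are the edge strengths stored in the Bayesian Network's CPTs.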
[0120] Some portions of the detailed description are presented in terms of algorithms and symbolic representations of operations within a computer. These algorithmic descriptions and symbolic representations are the means used by those skilled in the data processing arts to convey the essence of their innovations to others skilled in the art. An algorithm is a series of defined steps leading to a desired end state or result. In example implementations, the steps carried out require physical manipulations of tangible quantities for achieving a tangible result.
[0121] Unless specifically stated otherwise, as apparent from the discussion, it is appreciated that throughout the description, discussions utilizing terms such as “processing,” “computing,” “calculating,” “determining,” “displaying,” or the like, can include the actions and processes of a computer system or other information processing device that manipulates and transforms data represented as physical (electronic) quantities within the computer system’s registers and memories into other data similarly represented as physical quantities within the computer system’s memories or registers or other information storage, transmission or display devices.
[0122] Example implementations may also relate to an apparatus for performing the operations herein. This apparatus may be specially constructed for the required purposes, or it may include one or more general-purpose computers selectively activated or reconfigured by one or more computer programs. Such computer programs may be stored in a computer readable medium, such as a computer readable storage medium or a computer readable signal medium. A computer readable storage medium may involve tangible mediums such as, but not limited to, optical disks, magnetic disks, read-only memories, random access memories, solid-state devices, and drives, or any other types of tangible or non-transitory media suitable for storing electronic information. A computer readable signal medium may include mediums such as carrier waves. The algorithms and displays presented herein are not inherently related to any particular computer or other apparatus. Computer programs can involve pure software implementations that involve instructions that perform the operations of the desired implementation.
[0123] Various general-purpose systems may be used with programs and modules in accordance with the examples herein, or it may prove convenient to construct a more specialized apparatus to perform desired method steps. In addition, the example implementations are not described with reference to any particular programming language. It will be appreciated that a variety of programming languages may be used to implement the teachings of the example implementations as described herein. The instructions of the programming language(s) may be executed by one or more processing devices, e.g., central processing units (CPUs), processors, or controllers.
[0124] As is known in the art, the operations described above can be performed by hardware, software, or some combination of software and hardware. Various aspects of the example implementations may be implemented using circuits and logic devices (hardware), while other aspects may be implemented using instructions stored on a machine-readable medium (software), which, if executed by a processor, would cause the processor to perform a method to carry out implementations of the present application. Further, some example implementations of the present application may be performed solely in hardware, whereas other example implementations may be performed solely in software. Moreover, the various functions described can be performed in a single unit, or can be spread across a number of components in any number of ways. When performed by software, the methods may be executed by a processor, such as a general-purpose computer, based on instructions stored on a computer readable medium. If desired, the instructions can be stored on the medium in a compressed and/or encrypted format.
[0125] Moreover, other implementations of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the teachings of the present application. Various aspects and/or components of the described example implementations may be used singly or in any combination. It is intended that the specification and example implementations be considered as examples only, with the true scope and spirit of the present application being indicated by the following claims.

Claims

What is claimed is:
1. A method for forecasting a target event, the method comprising: receiving one or more events of interest; feeding the one or more events of interest into a causal inference engine comprising a trained Bayesian Network configured to compute posterior marginal probability distributions of the target event based on causal relationships between the one or more events of interest and the target event from the Bayesian Network; and outputting the posterior marginal probability distribution of the target event.
2. The method of claim 1, wherein the one or more events of interest are received based on an event detection engine identifying at least one of the one or more events of interest.
3. The method of claim 2, wherein the event detection engine receives textual data extracted from an unstructured data source, determines that the textual data pertains to an event of interest, and in response to the determination, populates an event object with relevant attribute information extracted from the textual data, and feeds the resulting event object into the causal inference engine as an event of interest of the one or more events of interest.
4. The method of claim 3, wherein the one or more events of interest are ad hoc in nature, wherein the event detection engine is trained using a hybrid model that combines a natural language processing model with a deep-learning based language model.
5. The method of claim 1, further comprising: building the causal inference engine based on the target event and subject matter expert definitions identifying events of interest that are relevant to the target event; using an event detection engine to identify a plurality of events of interest that are relevant to the target event; analyzing the identified events of interest that are collected to discover direct causal relationships between the plurality of events of interest and the target event, wherein the discovered direct causal relationships define a structure of the Bayesian Network; and estimating a strength of dependency between the plurality of events of interest and the target event, wherein the Bayesian Network is completely trainable by forming a collection of event baskets comprising the identified events of interest.
6. The method of claim 5, wherein the Bayesian Network comprises a directed acyclic graph comprising nodes and edges connecting the nodes; wherein each of the nodes represents an event of the plurality of events of interest; wherein a sink node from the nodes represents the target event; wherein each edge represents a causal relationship between connected nodes; and wherein the strength of dependency is represented as a probability of the event of interest of a node conditioned on the event of interest of parent nodes.
7. The method of claim 1, wherein at least one of the one or more events of interest is expressed as written textual sentences from unstructured data sources.
8. A system for forecasting a target event, the system comprising: one or more memories configured to store instructions; and one or more processors coupled to the one or more memories and configured to execute the instructions to: receive one or more events of interest; feed the one or more events of interest into a causal inference engine comprising a trained Bayesian Network configured to compute posterior marginal probability distributions of the target event based on causal relationships between the one or more events of interest and the target event from the Bayesian Network; and output the posterior marginal probability distribution of the target event.
9. The system of claim 8, wherein the one or more events of interest are received based on an event detection engine identifying at least one of the one or more events of interest.
10. The system of claim 9, wherein the event detection engine receives textual data extracted from an unstructured data source, determines that the textual data pertains to an event of interest, and in response to the determination, populates an event causal structure with relevant information extracted from the textual data, and feeds the resulting event causal structure into the causal inference engine as an event of interest of the one or more events of interest.
11. The system of claim 9, wherein the one or more events of interest are ad hoc in nature, wherein the event detection engine is trained using a hybrid model that combines a natural language processing model with a deep-learning based language model.
12. The system of claim 8, wherein the one or more processors are further configured to execute the instructions to: build the causal inference engine based on the target event and subject matter expert definitions identifying events of interest that are relevant to the target event; use an event detection engine to identify a plurality of events of interest that are relevant to the target event; analyze the identified events of interest that are collected to discover direct causal relationships between the plurality of events of interest and the target event, wherein the discovered direct causal relationships define a structure of the Bayesian Network; and estimate a strength of dependency between the plurality of events of interest and the target event, wherein the Bayesian Network is completely trainable by forming an event basket comprising the identified events of interest.
13. The system of claim 12, wherein the Bayesian Network comprises a directed acyclic graph comprising nodes and edges connecting the nodes; wherein each of the nodes represents an event of the plurality of events of interest; wherein a sink node from the nodes represents the target event; wherein each edge represents a causal relationship between connected nodes; and wherein the strength of dependency is represented as a probability of the event of interest of a node conditioned on the event of interest of parent nodes.
14. The system of claim 8, wherein at least one of the one or more events of interest is expressed as written textual sentences from unstructured data sources.
15. A non-transitory computer readable medium, storing instructions for forecasting a target event, the instructions comprising: receiving one or more events of interest; feeding the one or more events of interest into a causal inference engine comprising a trained Bayesian Network configured to compute posterior marginal probability distributions of the target event based on causal relationships between the one or more events of interest and the target event from the Bayesian Network; and outputting the posterior marginal probability distribution of the target event.
16. The non-transitory computer readable medium of claim 15, wherein the one or more events of interest are received based on an event detection engine identifying at least one of the one or more events of interest.
17. The non-transitory computer readable medium of claim 16, wherein the event detection engine receives textual data extracted from an unstructured data source, determines that the textual data pertains to an event of interest, and in response to the determination, populates an event causal structure with relevant information extracted from the textual data, and feeds the resulting event causal structure into the causal inference engine as an event of interest of the one or more events of interest.
18. The non-transitory computer readable medium of claim 17, wherein the one or more events of interest are ad hoc in nature, wherein the event detection engine is trained using a hybrid model that combines a natural language processing model with a deep-learning based language model.
19. The non-transitory computer readable medium of claim 15, further comprising: building the causal inference engine based on the target event and subject matter expert definitions identifying events of interest that are relevant to the target event; using an event detection engine to identify a plurality of events of interest that are relevant to the target event; analyzing the identified events of interest that are collected to discover direct causal relationships between the plurality of events of interest and the target event, wherein the discovered direct causal relationships define a structure of the Bayesian Network; and estimating a strength of dependency between the plurality of events of interest and the target event, wherein the Bayesian Network is completely trainable by forming an event basket comprising the identified events of interest.
20. The non-transitory computer readable medium of claim 19, wherein the Bayesian Network comprises a directed acyclic graph comprising nodes and edges connecting the nodes; wherein each of the nodes represents an event of the plurality of events of interest; wherein a sink node from the nodes represents the target event; wherein each edge represents a causal relationship between connected nodes; and wherein the strength of dependency is represented as a probability of the event of interest of a node conditioned on the event of interest of parent nodes.
PCT/US2021/060139, filed 2021-11-19: Forecasting future events from current events detected by an event detection engine using a causal inference engine. Published as WO2023091144A1 on 2023-05-25. Family ID: 86397635.



Legal Events

121 — EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 21964955; Country of ref document: EP; Kind code of ref document: A1).