WO2019050968A1

WO2019050968A1 - Methods, apparatus, and systems for transforming unstructured natural language information into structured computer- processable data

Info

Publication number: WO2019050968A1
Application number: PCT/US2018/049579
Authority: WO
Inventors: Jack CROWLEY
Original assignee: Forgeai, Inc.
Priority date: 2017-09-05
Filing date: 2018-09-05
Publication date: 2019-03-14

Abstract

Systems, apparatus, and methods are disclosed for transforming unstructured natural language information into structured computer-processable data. A collection controller can comprise of an emitter to extract the unstructured data. A topological learning network can classify the extracted unstructured data. A transformation controller can model and encode the classified unstructured data as a knowledge graph thereby transforming the unstructured data as structured data. A syndicate controller can perform entity resolution and knowledge-based reasoning over the knowledge graph to resolve the structured data as computationally-relevant data.

Description

METHODS, APPARATUS, AND SYSTEMS FOR TRANSFORMING UNSTRUCTURED NATURAL LANGUAGE INFORMATION INTO STRUCTURED COMPUTER- PROCESSABLE DATA

CROSS-REFERENCE TO RELATED APPLICATIONS

[0001] This application claims the priority benefit, under 35 U.S.C. §119(e), of U.S. Application No. 62/554,508, entitled "Methods, Apparatus, and Systems for Transforming Unstructured Natural Language Information into Structured Computer-Processable Data," which was filed on September 5, 2017, and is incorporated herein by reference in its entirety.

BACKGROUND

[0002] Intelligent machines are the bedrock of Artificial Intelligence (AI). Intelligent machines can convert structured data into meaningful functions. For instance, as illustrated in FIG. 1, intelligent machines can process structured data to predict, recommend, optimize, and take specific actions. These functions are performed at machine speed and machine costs. However, 90% of data available today is in the form of unstructured data. Processing unstructured data to perform meaningful functions necessitates human intervention thereby increasing costs and decreasing speed. Thus, there exists an unmet need for systems that transform unstructured data into structured computer-processable data without human intervention at machine costs and machine speed.

SUMMARY

[0003] The present technology addresses problems associated with transforming unstructured natural language information into structured computer-processable data. Embodiments of this technology includes methods and systems that include a collection module that collects information from around the world (e.g., social media platforms, news platforms, web contents, and/or the like) and codifies the information into structured events. A transformation module processes the structured events based on language modeling and knowledge base technology to provide meaningful synthesized structured information. A syndicate module filters the synthesized structured information to provide services such as recommendations, predictions, optimizations, and actions to customers.

[0004] In one implementation, a system for collecting and transforming unstructured data into computationally relevant structured data comprises a collection controller. The collection controller can comprise of an emitter. The emitter can comprise a web crawler to extract the unstructured data. The collection controller can also comprise a topic monitor to monitor the emitter to determine a frequency of first information included in the extracted unstructured data. The collection controller can also comprise a topological learning network to classify the information in the extracted unstructured data. The system also comprises a transformation controller that is communicatively coupled to the collection controller and a knowledge graph database. The knowledge graph database stores a knowledge graph. The transformation controller can include a language modeling controller to model the classified first information and to stored the classified first information as first structured data within the knowledge graph. The knowledge graph can include a plurality of entities and can indicate relationships between the plurality of entities. The system also comprises a syndicate controller that is

communicatively coupled to the transformation controller and the knowledge graph database. The syndicate controller can perform entity resolution and knowledge-based reasoning over the knowledge graph to resolve the first structured data as first computationally-relevant structured data.

[0005] In some instances, the topic monitor includes at least one of a Twitter™ monitor, at least one website monitor, and/or a Rich Site Summary (RSS) monitor. In some instances, the topological learning network can include a word encoder and a sentence encode. In some instances, the topological learning network can be a bidirectional Long Short-Term Memory with Attention (BLSTM-A) network. The BLSTM-A network can generate context-aware word representations based at least in part on the first information in the extracted unstructured data. The BLSTM-A network can output a weighted sum of the context-aware word representations to provide at least one sentence representation based at least in part on the first information in the extracted unstructured data. The BLSTM-A network can score respective context-aware word representations with a word importance vector to provide a sentence vector. [0006] In some instances, the topological learning network embeds neurons modeled with axons and dendrites on a surface of a continuously differential manifold. In some instances, the continuously differential manifold can be a torus. In some instances, the system can further comprise the knowledge graph database where the knowledge graph database is integrated with prime encoding schemes to reduce complexity of operations on the knowledge graph. The syndicate controller can perform the entity resolution through integration of locality sensitive hash fuzzy matching. The syndicate controller can define a similarity metric between a candidate entity from the plurality of entities and at least one other known entity from the plurality of entities.

[0007] In one implementation, a method for extracting and transforming unstructured data into computationally legible structured data can comprise extracting the unstructured data via at least one web crawler from the World Wide Web. Information in the unstructured data can be classified using a bidirectional Long Short-Term Memory with Attention (BLSTM-A) network to provide classified information. The classified information in the unstructured data can be modeled to extract semantic information from the unstructured data. The extracted semantic information can be encoded in a knowledge graph. The unstructured data can be transformed to structured data based on the knowledge graph. Relationships between a plurality of entities in the knowledge graph can be determined. Entity resolution can be performed on the knowledge graph to resolve the structured data as computationally legible structured data. The computationally legible structured data can be transmitted to at least one recipient.

[0008] In some instances, the method can further comprise generating context-aware word representations using a BLSTM-A based at least in part on the information in the unstructured data. The method can further comprise outputting a weighted sum of the context-aware representations using BLSTM-A to provide at least one sentence representation. The method can further comprise scoring respective context-aware word representations with a word importance vector to provide a sentence vector. In some instances, modeling the classified information can include embedding neurons modeled with axons and dendrites on a surface of a continuously differential manifold. The continuously differential manifold can be a torus. In some instances, the method can further comprise integrating the knowledge graph with prime encoding scheme to reduce complexity of operations on the knowledge graph. [0009] In one implementation, a system for extracting and transforming unstructured data into structured data can comprise a first controller to extract unstructured data from the World Wide Web, a second controller to transform the unstructured data into a structured representation, a third controller to perform semantic resolution of events, entities, and concepts in the structured representation and thereby provide resolved structured information, and a fourth controller to transmit the resolved structured information to a recipient.

[0010] It should be appreciated that all combinations of the foregoing concepts and additional concepts discussed in greater detail below (provided such concepts are not mutually inconsistent) are contemplated as being part of the inventive subject matter disclosed herein. In particular, all combinations of claimed subject matter appearing at the end of this disclosure are contemplated as being part of the inventive subject matter disclosed herein. It should also be appreciated that terminology explicitly employed herein that also may appear in any disclosure incorporated by reference should be accorded a meaning most consistent with the particular concepts disclosed herein.

[0011] Other systems, processes, and features will become apparent to those skilled in the art upon examination of the following drawings and detailed description. It is intended that all such additional systems, processes, and features be included within this description, be within the scope of the present invention, and be protected by the accompanying claims.

BRIEF DESCRIPTION OF THE DRAWINGS

[0012] The skilled artisan will understand that the drawings primarily are for illustrative purposes and are not intended to limit the scope of the inventive subject matter described herein. The drawings are not necessarily to scale; in some instances, various aspects of the inventive subject matter disclosed herein may be shown exaggerated or enlarged in the drawings to facilitate an understanding of different features. In the drawings, like reference characters generally refer to like features (e.g., functionally similar and/or structurally similar elements).

[0013] FIG. 1 shows structured data vs. unstructured data and the unmet need to transform unstructured data into meaningful information. [0014] FIG. 2 is an illustration of a conventional system for transforming unstructured data into computer-processable format.

[0015] FIG. 3 is an illustration of present technology for transforming unstructured data into computer-processable format.

[0016] FIG. 4 shows a flow diagram for transforming unstructured natural language data into machine processable structured data.

[0017] FIG.5 is an illustration of an example collection controller used to collect unstructured data.

[0018] FIG. 6 is an illustration of Topological Neural Network modeled with axons and dendrites, on the surface of a continuously differential manifold, currently a torus.

[0019] FIG. 7 is a high-level illustration showing the main processing stages for a document before and within a topological neural network.

[0020] FIG. 8 illustrates an example implementation of directory hierarchy.

[0021] FIG. 9 illustrates a diagram of a Bidirectional Long Short-Term Memory with Attention being used at the word-encoder level.

[0022] FIG. 10 illustrates word importance and sentence importance vectors for an example.

[0023] FIG. 11 is an illustration of an example transformation and syndication controller.

[0024] FIG. 12 illustrates visualization of an example knowledge graph.

[0025] FIG. 13 is a visualized excerpt from an example knowledge graph that pertains to the entity Apple the fruit.

[0026] FIG. 14 is a visualized excerpt from an example knowledge graph that pertains to the entity Apple, the consumer electronics corporation.

[0027] FIG. 15 is a visualized excerpt from an example knowledge graph that pertains to the relationship between the Norilsk platinum group metals mine in Siberia, Russia and Samsung. [0028] FIG. 16 illustrates top-level event ontology.

[0029] FIG. 17 illustrates a process of structuring unstructured data.

[0030] FIG. 18 illustrates an example transformation of unstructured data into structured data.

[0031] FIG. 19 illustrates an example platform output.

DETAILED DESCRIPTION

[0032] Ninety percent of the data available in the world today is in the form of unstructured data. Structured data can be processed at machine speed and machine costs to provide meaningful information. For example, structured data as represented in Table 1 can be processed/modeled using intelligent machines to provide specific actions.

Table 1 [0033] However, most data available in the world today is in unstructured form. For example, a news article about an event, such as Verizon™ intending to buy Yahoo™, can include valuable information in unstructured format:

That strategy could suffer setbacks if Verizon concludes it cannot follow through with buying Yahoo. It may also be affected by a looming regulatory effort to force Verizon and other Internet providers to ask their customers' permission before using their personal data for advertising, according to business analysts.

At the same time, analysts say, th Yahoo breach may provide an opportunity for Verizon to abandon its current deal with Yahoo and seek a discount on the company, which could bolster Verizon's strategy in the long run,

"I can't think of a reason why Verizon wouldn't attempt to negotiate a lower price," said Walt.Piecyk_> an analyst at BUG, in an interview last week,

An organization investing in Verizon™ and Yahoo™ stocks has to process such unstructured data to make an informed decision.

[0034] Technological Solutions to Collect, Extract, Refine, and Transform Unstructured Data into Computationally Relevant Structured data

[0035] Massive volumes of raw data are created every second and the rate of growth is

exploding. As an example, market research firm IDC predicts that the data created and copied every year will reach 180 zettabytes (180 followed by 21 zeros) by 2025. However, the fact that a massive and growing amount of data exists, does not mean that an endless supply of "AI Fuel" is automatically there for the taking. Existence does not equal usability. It first needs to be made usable.

[0036] Just as coal was mined, pulverized and burned to generate the electricity that powered the industrial revolution, data has its own analogous complex assembly line and supply chain. Data first needs to be collected, extracted, refined and ultimately transformed from its raw form into "computationally relevant fuel" before it can power the AI engines of today. Doing that at scale, presents a significant and ever-growing challenge, particularly if you are trying to handle data that is unstructured. [0037] Analysts estimate that approximately 90% of the global data being produced is unstructured - generally meaning that the data has no predefined data format. Because businesses are designed to run on structured data inputs, the rich global corpus of ever changing unstructured data cannot be effectively used as data fuel. Instead, it is "walled off from the world's AI infrastructure. As a result, oceans of data, along with their information quotient, are being dropped on the floor, left untapped and inaccessible. That data being "left on the floor" is the lion's share of the world's data that contains some of the most valuable information in the world.

[0038] To make the vast majority of raw data usable, we must overcome the challenge of making unstructured data computationally legible and do so at scale. If unstructured data were as easily and as readily usable by a business' s intelligent machines as structured data is today, businesses could begin to look at the world, their customers and their competitors in entirely new ways.

[0039] The present technology transforms unstructured data (including conversations in the form of textual documents, audio streams, video streams, and/or the like) into structured, real-time machine consumable streams representing the ever-changing flow of activities and events that are occurring in the world. The structured data is computationally relevant or computationally legible. Put differently, a programmer or data scientist can use the structured data, or a portion of the structured data, in or as a part of a mathematical operation (e.g., to calculate distance and/or other measures). What was once unstructured and "walled off from machine intelligence can now directly fuel, and be consumed by, that same machine intelligence.

[0040] The present technology provides ways to codify events and ontologically anchor them in time and place so that computers can consume and perform mathematical operations over the event data similar to the manner in which they handle numeric data. An extracted event is a potentially consequential action or statement represented in a computationally efficient manner.

[0041] Technical Challenges to Capturing Language for Computational Processing

[0042] Some of the key technical challenges related to capturing language in a manner suitable for computational processing are data usability, knowledge enrichment, and temporal meaning.

[0043] Data Usability [0044] When developing a new algorithm or exploring a new technique, the representation of the information used by the algorithm is essential. Garbage in, garbage out. Numeric data or data over which a metric may be imposed is ideal. Examples of such data include:

• Money ($5.00)

• Time (10 seconds)

• Weight (20 grams)

• Power (60 watts)

[0045] Discrete data items, such as clicks, parking lots, and pencils are ideally incorporated in machine learning processes by transforming these quantities into rates and frequencies using time and space, such as:

• Clicks per hour

• Parking lots per square kilometer

• Pencils per student

[0046] The imposition of a metric supports multiple optimization techniques integral to most machine learning algorithms. Processing unstructured data requires a different approach because language does not have a direct translation onto a representation suitable for inclusion in machine learning, automated reasoning, or decision support applications.

[0047] Lack of Knowledge Enrichment

[0048] Some of the techniques mentioned above generate measurements that lend themselves to analytical processes such as sentiment analysis, entity extraction and document classification. These processes can identify what may be trending, enable information retrieval, and implement alerts, but much of the crucial information content being expressed in the source communication is lost. [0049] The machine representation of language is mostly done as sets of entities and objects, with directed relationships between them. An embedding space would allow for a natural means of measurement, but is not yet broadly used.

[0050] Words, sentences and documents can be embedded into spaces that maintain a sense of usage similarity, but work still needs to be done to integrate ontological meaning, which is necessary in order to support reasoning. The lack of a "measurable space" in which to computationally express assertions and events expressed in natural language introduces a challenge to the incorporation of speech and text into computational models.

[0051] Limited Representation of Temporal Meaning

[0052] Similar to text classification and measuring sentiment, "NLP processed" information is often "time stamped" with the time it was collected or published. NLP extracted dates are frequently left as a sequence of unresolved text tokens. The document date does support

"document level" time series analysis (e.g. are more people talking about IBM today than yesterday?), etc. this is just a proxy for what the author is conveying. For example, the author may be referring to an upcoming activity or something that happened in the past. There is a significant amount of information loss or misinformation that can occur by using document "meta dates" instead of extracting the dates expressed by the author, relative dates (e.g. next Monday), and time periods (e.g. the third quarter), transforming them from a sequence of tokens to a computational format such as a time t or a std:;chrono:: duration and associating them with the actions and events being conveyed.

[0053] The limited temporal extraction and resolution capabilities of most NLP systems is one of the reasons that they offer limited utility for mission critical solutions and are used instead to support filtering and routing information, helping to identify trends, or transforming information into an alternative, solution specific format for human interpretation such as news readers and alerting systems.

[0054] Conventional Systems vs. Present Technology

[0055] FIG. 2 is an illustration of a conventional system 200 for transforming unstructured data into computer-processable structured data. As illustrated, unstructured data 202 includes conversations in the form of textual documents, audio streams, video streams, and/or the like. In a conventional system 200, once the unstructured data 202 is collected, the conventional system 200 enables static/non-real time indexing of the data 204. Clustering and/or classification techniques 206, such as common Natural Language Processing (NLP) and similarity sentiment analysis, are applied to the indexed data to generate metadata. An analyst performs information retrieval and works on the metadata (see 208). Following the manual effort, feature extraction 210 by human intervention is preformed to create meaningful statistics that can be used to perform specific actions. Due to the human intervention, conventional systems 200 can be inconsistent and inefficient. In addition, such systems 200 are expensive, hard to sustain, messy, and error-prone.

[0056] The present technology 250 can include a platform that can be divided into four different sections: (1) the automated collection of (low latency) source information (performed by collection controller 212), (2) the transformation of the unstructured information into a structured representation (performed by transformation controller 214), (3) the semantic resolution of the events, entities, and concepts in the structured information (performed by syndicate controller 218), and (4) the transmission of the resolved, structured information to recipients. In some instances, the transmission of the resolved structured information can be performed by the syndicate controller 218.

[0057] As illustrated in FIG. 3, the present technology 250 includes a collection controller 212 that converts and codifies unstructured data into events. The collection controller 212 can include emitters (e.g., web crawlers) that extract information/the unstructured data. The unstructured data can be extracted from different channels including the World Wide Web, radio, social media, EDGAR™, and/or the like. The process of extraction can get the raw unstructured data free from ancillary content inputs. The collection controller 212 can monitor the emitters with topic vectors based on the frequency of information. In some instances, the collection controller 212 is modeled such that the emitters can dynamically shift lens based on the topic that is trending. Thus, the collection controller 212 includes an optimized model of emitters. In addition, the collection controller 212 includes the ability to track the similarity of documents and the ability to identify and extract events. Some examples of events include earnings, regulatory filings, analyst ratings, IPO pricing, product information (launches, ratings, and issues), personnel information (hiring, firing), supply chain/partner new, industry conversations/chatter, political events, legal events, economic events, internal research and documentation and/or the like.

[0058] The transformation controller 214 includes revolutionary AI techniques to distill a reasonable representation of the information. The transformation controller 214 has unique capability to integrate LP with probabilistic semantic graph with prime number encoding. The encoding is based upon number theory concepts to enable efficient subsumption and least common ancestor operation over the probabilistic semantic graph.

[0059] The transformation controller 214 interacts with a knowledge base database 216. The knowledge base database includes a knowledge graph where each vertex represents an entity and each edge is directed and represents a relationship between entities.

[0060] Syndicate controller 218 works on the processed data to support analysis, modeling and decision making.

[0061] Method and System for Collecting and Transforming Unstructured Natural Language Data

[0062] FIG. 4 shows a flow diagram for transforming unstructured natural language data into machine processable structured data. At step 412, unstructured data can be extracted via emitters from textual documents, audio streams, video streams, and/or the like. The unstructured data can be data that is open and available to the public, data on social networking and media platforms, published data, and/or proprietary data. The unstructured data can be classified using neural networks (e.g., topological recurrent neural networks, such as bidirectional Long Short-Term memory (BLSTM). At step 414, the classified information can be modeled, and the sematic information from the unstructured data can be encoded in a knowledge graph. Relationship between entities in the knowledge graph can be determined. At step 418, knowledge based reasoning and entity resolution are performed over the knowledge graph to resolve the structured representation as computationally legible structured data. The computationally legible structured data can then be transmitted to one of more recipients (e.g., customers) that can model and analyze the computationally legible structured data to make predictions, recommendations, and/or to take actions. [0063] Collection Controller

[0064] FIG. 5 is an illustration of an example collection controller (e.g., structurally and functionally similar to collection controller 212 in FIG. 3). The collection controller can include one or more emitters (e.g., Web crawlers) to extract unstructured data. The collection controller can monitor the emitters with topic vectors based on the frequency of information. The collection controller can also include twitter monitors/topic monitors 522a, site monitors 522b, RSS monitors 522c, and SCC monitors 522d (collectively, monitors 522) to monitor the emitters. In some implementations, the emitters can be integrated into the monitors 522. In some

implementations, the emitters can be communicably coupled with the monitors 522. The collection controller can also include one or more collection agents 524. The collection agents 524 can include neural networks such as topological neural networks to identify and classify the unstructured data. The collection agents 524 can implement fuzzy matching using permutation groups and Locality Sensitive Hashing (LSH) to identify, index, and classify the unstructured data.

[0065] Improving Fuzzy Matching using Permutation Groups and Locality Sensitive Hashing (LSH)

[0066] Approximate string matching is a component of entity resolution. Often people's names are transliterated incorrectly, abbreviated, misspelled, partial, or incorrect for other reasons. Common techniques for performing a "fuzzy match" to identify a set of candidate entities from a known domain for a given input string typically used soundex/metaphone techniques, edit distance techniques such as the well-known Levenshtein distance, ngram distance measures, such as Jaccard Similarity, prefix matching (using tries), etc. While these techniques have good utility for multiple domains including spell checking, they possess shortcomings with managing partial string matches common when dealing with names and difficult indexing schemes. They also have a high rate of false positives. The present technology includes a technique that develops ngram permutations of the source string (as opposed to traditional shingling operations) and applies a locality sensitive hash (LSH) technique to efficiently find candidate matches. LSH solves the approximate or exact Near Neighbor Search in high dimensional spaces. The implementation allows for run-time updating the candidate set of entities and is computationally efficient with respect to time.

[0067] Topological Learning Networks

[0068] The formalism of different layers of a neural network, from autoencoders, restricted Boltzmann and convolutional layers at the input, to feedback layers in different recurrent neural network strategies has the training algorithms applied to these networks.

[0069] The present technology leverages these techniques and also has a unique learning network architecture which removes the formalism of establishing weight matrices between the subsequent and feedback layers. The experimental technique is being optimized for processing sequential streams of information such as text and speech. The method leverages the

foundational tenets from deep learning neural networks, cognitive science, and studies of early childhood brain development.

[0070] The technique, referred to as Topological Learning Networks (TLN) or Topological Neural Networks (TNN), removes the a priori establishment of weights between neurons and instead embeds each neuron, modeled with axons and dendrites, on the surface of a continuously differential manifold, such as a torus as illustrated in FIG. 6.

[0071] Regions of the embedding space, as illustrated in FIG. 6, have the capacity to receive and hold an activation potential (e.g., a charge). This activation charge decays over time and as neurons transfer the charge between regions. Inputs to the network are modeled as external sensors that are activated by an (encoded) input element from the input sequence. The outputs are fixed points on the manifold. During the training or learning process a neuron's axons and dendrites are imbued with mobility and can change location. Like traditional neural architectures, a neuron has an activation function (tanh, sigmoid, etc.) As noted above the embedding surface maintains a decay rate (with respect to time) in addition to an activation potential. The cost function for optimization includes temporal and spatial dimensions. The spatial gradient used during training (analogous to back propagation) constrains the axons and dendrites to the surface of the manifold and the domain of the decay rate, modeled as an exponential decay, is the set of positive real numbers. The intended benefit of this technique will be to allow establishing an efficient learning epoch and a capability to organically encode the different semantic nuances of sequential communications similar to how vector term embedding (as demonstrated in

Word2Vec or Stanford's GloVe) encode word similarity.

[0072] One of the primary advantages of neural networks is their ability to automatically learn features in the data that are important for making accurate predictions. In one implementation, a neural network component for natural language task processing can include a bidirectional Long Short-Term Memory with attention layer (BLSTM-A). In order to process a document, as a preprocessing step documents can be segmented by sentence and then by words, before being fed to the neural network. A word encoder (e.g., BLSTM-A) can be applied over the work tokens for each sentence in a given document, resulting in a vector representation of each sentence. The sequence of sentence vectors can then be fed into a sentence encoder (e.g., BLSTM-A), to obtain a single vector representation of the document. FIG. 7 illustrates a high level processing for a document containing two sentences. The same word encoder component 702 can be used for processing the word tokens in sentence one and sentence two.

[0073] In one implementation, hierarchical attention network can include two hierarchies- (1) Data hierarchy: The labels for classification tasks can exist in a pre-defined hierarchy; (2) Model hierarchy: The architecture of the model can itself be hierarchical.

[0074] Data Hierarchy

[0075] In one implementation, training data can consist of documents that are organized into a directory hierarchy, where the name of a given directory can define a label for all documents beneath it. A document may have more than a single label (nested directory structure).

Specifically, the set of possible labels for a given document can consists of the directory names along the path from the root directory to the location of the document.

[0076] A unique model can be associated with each directory in the data hierarchy, including the root directory. The only exceptions are directories that do not contain at least two subdirectories, since there is no reason to train a model on one or zero labels. An example directory hierarchy is shown in FIG. 8. [0077] As illustrated in FIG. 8, the root-level model can be trained to label a document as either Finance 802, Politics 804, or one of the omitted remaining labels (denoted by the ellipsis 806). Depending on the level of confidence for the prediction, the document may be passed to models in the next level. If the root-level model predicts the document to belong under Finance 802, that document can be sent to the model trained exclusively on documents within Finance 802, which will output either Dividends 808, Earnings 810, or Finance Other. Beneath every category is a special label reserved for documents that could not be placed neatly into any further

subcategories. Please note that the labels and categories above and shown in FIG. 8 are full illustrative purposes only. For instance, Finance 802 could have other subcategories not shown in FIG. 8. For example, the documents placed directly under Finance in FIG. 8 would be labeled as Finance Other, where the parent name can be prepended to enforce the constraint that no two categories have the same name. This process can be continued down the prediction path until the stopping condition is met (e.g., receiving a prediction confidence lower than a given/specific percent).

[0078] Multi-topic documents can be explored by traversing, for example, the prediction paths for the top two predicted labels. Another option may be traversing down into any subcategory that receives a prediction confidence above some predefined threshold. If the preference is to increase the average number of paths taken, then the confidence threshold must be decreased. For example, if the threshold is above 0.5, then more than one path will not be traversed down.

[0079] Model Hierarchy

[0080] In the above sections, a high-level overview of BLSTM-A and how it is used in the hierarchical attention network are provided. In this section, a full model architecture is provided in more detail. FIG. 9 illustrates a diagram of a BLSTM-A being used at the word-encoder level. This is also the unrolled/unfolded representation where, instead of showing the self-recurrent LSTM loop pointing back on itself, identical copies of the network are shown at each step along the input sequence. Reference numerals also denote identity, so any components with the same reference numerals are exact attention mechanism operates by scoring each element of the BLSTM output sequence with a word importance vector. The word importance vector takes the place of what's often called the query in traditional attention mechanisms. The goal is to learn a representation for the importance vector such that its inner product with a context-aware (e.g., relation to other aspects of the document) word vector (output from the BLSTM) yields some measure of the word's importance. Since the importance vector can be trained jointly with the rest of the model parameters, the meaning of a given word' s importance is defined implicitly through the training task— important words are ones that appear predominantly for some particular label. Formally, the context-aware vector (BLSTM output state) for the t"¹ word in the 1^th sentence is denoted by h¹¹ and

(FC-TANH) ^uit = « < mh( H ^'„·/* ^;, b_w) (SOFTMAX)

• ···· V. ciithu

(SENTENCE VECTOR) i

[0081] with Si being the sentence vector representation output from BLSTM-A. This results in a sentence vector for each sentence in the document. The same process is then run over these vectors with a different BLSTM-A, called the sentence encoder (just to distinguish it from the word encoder), scoring the importance of each sentence relative to one another. Similar to word importance, important sentences are those containing a particular pattern of words that appear predominantly in documents of a particular label. This reduces the chances of the model getting thrown off by, say, some politically-charged word occurring in a document that's otherwise about home cooking. If an entire portion of the document goes off on a political rant, however, this is good reason for the model to increase its relative confidence that the document may be labeled under Politics.

[0082] In one implementation, the detection of important words/sentences occurs before prediction of the document label. Rather than using the label to figure out the important words, the model can first find objectively informative word patterns that will then be used when determining the label downstream. Mini-batch gradient descent updates the model weights by first averaging the gradients for each example in the batch. The effect of this average, and the incremental updates across batch averages, is that representations are learned for the importance vectors that give high scores to recurring patterns in the data that would have been generally useful for determining the document label. Noisy signals can be averaged out and important ones compounded. Indeed, the representations for the importance vector can take on very interesting values, as seen in FIG. 10. Image 1002 is the word importance vector and the Image 1004 is the sentence important vector. The horizontal axis corresponds to the values of the vector elements and the axis going into the page is the training step, the distribution of values in the vector in fifty step intervals can be seen.

[0083] Implementation Notes

[0084] In one implementation, word embeddings can be initialized to pretrained GloVe embeddings instead of training them from scratch. The pretrained embeddings can be trained by the Stanford NLP Group on six billion tokens from Wikipedia and Gigaword.

[0085] Individual documents can be converted into a matrix representation such that the entry in row i and column j corresponds to the j^th word of the 1^th sentence in the document. Zero-padding is inserted such that the matrix has a number of columns equal to the number of words in the longest sentence. Note, however, that there can be flexibility with respect to number of sentences and words in a batch of documents, allowing for variable-length number of sentences and lengths of longest sentence per-document. This can be accomplished by dynamically unfolding the sequences with TensorFlow's tfwhile loop function, in conjunction with dynamic

tf. Tensor Array obj ects .

[0086] In order to support fast training times and flexibility during serving, the model can consume serialized TFRecords protobufs for training and raw tensors during serving. Such a split is possible by utilizing the notion of model signatures in TensorFlow, which allow one to define input/output modalities and associate them with unique identifiers.

[0087] The models themselves can be serialized to binary protocol buffers. At serving time, these are loaded with the TensorFlow C++ API, using the minimal set of operations required to load a model into memory and feed tensors to input layers. At startup, the server loads copies of the full model tree into a thread-safe queue with one model tree for each available thread . A REST endpoint is exposed to allow clients to issue POST requests containing documents to be classified. Issuing a request triggers the request handler that will pop a model tree off the queue, query it for a hierarchy of predictions, and enqueue the model tree back into the queue when finished.

[0088] Transformation and Syndication controllers

[0089] FIG. 11 is an illustration of an example transformation and syndication controller (e.g., structurally and functionally similar to transformation controller 214 and syndication controller 218 in FIG. 3). The transformation controller can be communicatively coupled to the collection controller and a knowledge graph database. The transformation controller can include a language modeling controller to model the classified information and to encode the classified information as a knowledge graph. The knowledge graph database/knowledge base database can include a knowledge graph where each vertex represents an entity and each edge is directed and represents a relationship between entities. In some instances, entities can be proper nouns and concepts (e.g., Apple and Company, respectively), with the edges representing verbs (e.g., IS A).

Together, these form large networks that encode semantic information. For example, encoding the fact that "Apple is a Company" in the knowledge graph is done by storing two vertices, one for "Apple" and one for "Company," with a directed edge originating with Apple and pointing to Company of type "isA." This is visualized in FIG. 12.

[0090] A knowledge graph encodes many facts, each through the use of a directed edge. Each vertex can have many facts connected to it, making this ultimately a directed multigraph. This type of representation provides an intuitive way to reason about queries. For example, from the knowledge graph represented in Figure 12 one can reason about the question "Is apple a company?" by simply walking through the graph, starting at "Apple" and walking to "Company", testing edges and concepts along the way. In production, knowledge graphs tend to be quite large and complex with millions or billions of edges. Such a large amount of knowledge allows us to use these graphs to easily reason about semantic connections for tasks such as enriching business relevant data and resolving entities. In one implementation, these tasks are performed as part of the LP / LU pipeline for extracting individual events from unstructured text into a machine- readable format.

[0091] Entity Disambiguation using Knowledge Graphs [0092] To explore how entity disambiguation is handled with the knowledge base, consider the problem of determining which "Apple" is being referenced in the quote below:

[0093] "Though the company doesn't break out individual unit sales by model, Apple says it sold 77.3 million iPhones— a decrease from the 78.2 million iPhones it sold in the same period in 2017."

[0094] Obviously, this is "Apple" the corporation, not "apple" the type of fruit. In one implementation, the knowledge graph works in the same way as the brain that determines that this is "Apple" the corporation based on contextual clues. When an entity is to be disambiguated, the knowledge graph is provided with a set of co-located entities that provide the graph with the appropriate context. The knowledge graph then searches for all versions of "Apple" on the full graph and constructs small graphs that include contextual information as can be seen in Figures 13 and 14. Note, this is a noisy string search that is capable of finding versions of the initial search term that may differ from the original string or contain the search string as a substring. In one implementation, a look up table of known aliases for each of the entities can also be generated and kept, where aliases can be things like CIK codes or ticker symbols. With these small graphs in hand, the knowledge graph then uses machine reasoning to determine which of the entities is truly being referenced (e.g., by using a greedy algorithm which seeks to maximize the overlap between the contextual entities passed in and the small graphs under consideration).

[0095] Dependency Analysis Using Knowledge Graphs

[0096] Another major task that the knowledge graph can be useful for is dependency analysis. That is, to determine the relationship between two or more entities. This is most useful when attempting to determine whether an extracted event is something that a customer would care about, given their stated interests. To make this concrete, consider the following news story in regards to a customer that is interested in news events relating to Samsung:

[0097] "Russia's Norilsk Nickel has teamed up with Russian Platinum to invest $4.4bn to develop mining projects in Siberia, which contains some of the world's richest deposits of platinum and palladium. The two companies will form a joint venture to develop projects in the Taimyr Peninsula in Russia's far north with an aim to become the world's largest producer of the precious metals, they said Wednesday."

[0098] The question at hand is to determine whether this news event is related to Samsung and, if so, the nature of that relation so one can determine whether or not to pass this event to a customer. In one implementation, initial step involves constructing small graphs around each of the entities. With these graphs in hand, a path can then be computed (e.g., given Dijkstra's algorithm between each of the marked endpoints). An example of such a path is given in Figure 15.

[0099] As seen in FIG. 15, the knowledge graph believes that Iridium is a Platinum Group Metal, and that Platinum Group Metals are mined in Norilsk. As also seen, the Knowledge Graph believes that Iridium is used in Organic Light Emitting Diodes (or OLEDs), which just happen to be used in Samsung phones. Therefore, this news event is likely relevant to a costumer for instance. In fact, this event can be highly relevant to a customer's interest in Samsung because Iridium is incredibly important to the production of OLED screens due to its ability to make a blue LED. Indeed, Samsung has even funded researchers at MIT and Harvard to explore alternatives to Iridium for OLED screens. This type of dependency analysis is illustrative of the power of a well formed knowledge graph and it is critical for machine enabled semantic reasoning.

[0100] Other Features of Knowledge Graph

[0101] Probabilistic Reasoning: Giving the knowledge graph the ability to reason

probabilistically about the validity of facts allows it to hold conflicting facts or hypotheses and evaluate them later in the presence of more evidence. This can additionally be used to evaluate nuance of queries. This can be achieved by using techniques such as softening the axiomatic constraints that power the machine reasoning engine and building ontology-specific Bayesian models. Using these techniques can make the knowledge graph more resilient to internal errors.

[0102] Automatic Fact Checking: Of course, if there is a collection of facts that is intended to use as internal source of truth to augment business data, one should ensure that this set of facts is correct. In one implementation, this fact checking can be performed using a mix of manual spot checking and axiomatic constraint testing (e.g. a person can only be born in one country). This is the standard technique for evaluating the correctness of knowledge graphs. As with most machine learning tasks, this can be incredibly person intensive and, therefore, expensive. In one implementation, techniques related to hinge-loss Markov random fields that are directionally aware are used. In addition to being efficient, this allows us to look at a fact such as "Florida namedAfter Flo Rida" and swap the directionality, instead of having to first infer the need to delete this edge and then infer that the reverse edge should be present.

[0103] Automatic Graph Enrichment: Because it's simply not possible to have humans continually teach the knowledge graph, the present system is constructed to be capable of learning facts on its own. There are many ways to do this including: tracking unexplained queries, generalizing local and global graph features to infer new facts from patterns, and using semantic information. Intuitively, this might look like finding patterns such as "Companies tend to have a CEO" and one of the companies in our graph does not currently have a CEO.

Therefore, this region of the graph can be enriched specifically relating to the specific company and the existence of the CEO. To achieve this, modifications of techniques such as the path rank algorithm and graph embedding methods as well as information retrieval techniques from the internet and other sources can be used.

[0104] Graph Dynamics: Modeling the influence of specific edges on the connectivity of two marked vertices in a graph is fundamental to understanding network resilience. In the context of a knowledge graph, this provides information about the influence of this fact. Intuitively, if we imagine that the vertices in our graph are cities and the edges roads, with the edge weights corresponding to the width of those roads (e.g. 0.1 is a one lane road and 1.0 is a 6 lane super highway), then the time to travel between two different cities indicates the strength of their connection. With many alternative routes and many wide highways, we can say that those cities are tightly connected. Mathematically, the problem can be thought about in terms of a two point correlation function for a collection of random walks over the graph. These are discrete random walks whose dynamics can be modeled with a discrete Green's function. By taking advantage of the connection between discrete Green's functions on a graph of random topology and discrete Laplace equations, it can be possible to evaluate the influence of changing an edge. [0105] Graph Infrastructure

[0106] In addition to standard graph features, in one implementation, each fact that is stored in the knowledge graph can be endowed with the time at which the edge was added and a confidence for that edge. The time dependence intuitively follows from the observation that the totality of human knowledge grows and changes over time. Ultimately, this makes the graph dynamic, which is a natural feature of human knowledge itself.

[0107] In one implementation, the knowledge graph can be designed such that each edge has weights which can be interpreted as confidences. This data enables to capture the inherent uncertainty necessary to model a fast changing world and to reason about the validity of queries. By virtue of the graph being probabilistic, true Bayesian reasoning can be embraced in order to attempt to evaluate a query, as well as provide query specific priors to up or down weight an assertion based on the origin (e.g. a company's own statements about a new product release should be up-weighted over Twitter™ rumors).

[0108] Two complementary approaches can be taken to ensure that our graph algorithms are as quick as possible. First, because edges are interpreted as probabilities, it is possible to set a probability cutoff beyond which there is no interest in graph connections. This allows to only consider graph algorithms over highly restricted subsets of the graph, which provides with major algorithmic improvements. Second, the data structure can be engineered to remain as cache coherent as possible by representing the knowledge graph as a sparse three rank tensor in an attempt to optimize the per-fact throughput through the CPU.

[0109] Efficient parallelization by exploiting what can be termed the "galactic structure" of the graph can also be accomplished. While this is not a general feature of all graphs, there can be highly connected clusters of vertices that are only weakly connected to one another. Intuitively, this makes sense. For example, consider domains such as the Toronto Maple Leafs and modern particle physics— there is little overlap between these fields and therefore no need to reason over a graph that contains both clusters of highly interconnected vertices when reasoning about Dave Keon, the Toronto Maple Leafs legend. This galactic structure provides with a possible route towards efficient parallelization using commodity hardware. [0110] Integrating Prime Encoding Schemes in Knowledge Bases

[0111] Descriptive Logics (DL) provide the semantic reasoning foundation for a semantic graph (SG) or knowledge base (KB.) The computational complexity of reasoning is a major obstacle issues in the development of Description Logics limiting the efficient scalability of a knowledge base.

[0112] Simultaneously, the proper semantic framing of events being conveyed in communication streams requires ontological reasoning over the knowledge base, primarily for ontological subsumption and satisfiability operations. The present technology capability within its knowledge base that which integrates number theoretic concepts of prime and compound numbers to reduce the (time) complexity of essential KB operations to modulo operations and greatest common factor (GCF) operations that promotes KB scalability and low latency of essential functions.

[0113] Improving Entity Resolution through the integration of LSH Fuzzy Matching and

Knowledge Base Reasoning

[0114] Entity resolution is the process of identifying which entity (or concept) is being referenced in a communication. For example to be able to properly incorporate the information from the sentence: "Jack Crowley bought an Apple." the reader would need to know which "Jack Crowley" from the universe of "Jack Crowleys" and what Apple... the company, a piece of fruit, or something different. Knowledge base reasoning is the process of traversing the edges and nodes in the knowledge base to identify a topic that one is looking for and/or to eliminate those topics that one does not want. Knowledge base reasoning is also the process by which connections between entities can be identified (e.g. employer / employee or supply chain etc.).

[0115] Common methods for entity resolution involve defining a similarity metric between a candidate entity and corpus of known entities. This metric may be a very simple scheme that just uses string matching or a record based approach that attempts to use other, contextual information to facilitate matching. The present technology includes a method that resolves entities across a communication sequence by utilizing a locality sensitive hashing techniques to identify and score a set of candidate entities for each entity identified (sequentially) in the communication and then use an underlying semantic graph / knowledge base to assess a locality measure between each sequentially identified entity and other (sequentially) close candidate entities. The measures between a sequential set of candidate entities form a weighted fully connected graph. The resolved entities are those entities that are visited when performing the dual of a least cost traversal of the graph. This technique also has the ability to identify context shifts in a communication sequence (document) by examining areas where {ei-1 } set of candidate entities is not close to the {ei, ei+1, ... } sequence of candidate entity sets.

[0116] Entity resolution and knowledge based reasoning resolves the structured data by identifying connections between entities and associating specific edges with specific nodes. The resolved structured data is computationally relevant or computationally legible meaning that a programmer or data scientist can use the data or a portion of the data in or as a part of a mathematical calculation.

[0117] FIG. 16 illustrates top-level event ontology. The present technology further refines events by applying the appropriate domain ontology and reasoning over the Subject- Types, Object- Types, etc. For example an acquisition event by Pfizer is a "biotech acquisition event."

[0118] Example of Transforming Unstructured Data into Structured Stream Distribution

[0119] FIG. 17 illustrates the process of structuring unstructured data.

[0120] FIG. 18 illustrates an example transformation of unstructured data into structured data. [0121] FIG. 19 illustrates an example platform output.

[0122] In this manner the event streams disclosed herein can: 1) Automate data collection, structuring, and cleansing processes; 2) Feed and inform applications, model s/algorithms, and databases in real-time to provide: a) Any decision support, machine learning, or analytics systems; b) Risk management; c) RegTech, including Know Your Customer (KYC); d)

Research; 3) Proprietary knowledge graphs; 4) Trading - back testing and trading execution. Conclusion

[0123] While various inventive embodiments have been described and illustrated herein, those of ordinary skill in the art will readily envision a variety of other means and/or structures for performing the function and/or obtaining the results and/or one or more of the advantages described herein, and each of such variations and/or modifications is deemed to be within the scope of the inventive embodiments described herein. More generally, those skilled in the art will readily appreciate that all parameters, dimensions, materials, and configurations described herein are meant to be exemplary and that the actual parameters, dimensions, materials, and/or configurations will depend upon the specific application or applications for which the inventive teachings is/are used. Those skilled in the art will recognize, or be able to ascertain using no more than routine experimentation, many equivalents to the specific inventive embodiments described herein. It is, therefore, to be understood that the foregoing embodiments are presented by way of example only and that, within the scope of the appended claims and equivalents thereto, inventive embodiments may be practiced otherwise than as specifically described and claimed. Inventive embodiments of the present disclosure are directed to each individual feature, system, article, material, kit, and/or method described herein. In addition, any combination of two or more such features, systems, articles, materials, kits, and/or methods, if such features, systems, articles, materials, kits, and/or methods are not mutually inconsistent, is included within the inventive scope of the present disclosure.

[0124] The above-described embodiments can be implemented in any of numerous ways. For example, embodiments may be implemented using hardware, software or a combination thereof. When implemented in software, the software code can be executed on any suitable processor or collection of processors, whether provided in a single computer or distributed among multiple computers.

[0125] Further, it should be appreciated that a computer may be embodied in any of a number of forms, such as a rack-mounted computer, a desktop computer, a laptop computer, or a tablet computer. Additionally, a computer may be embedded in a device not generally regarded as a computer but with suitable processing capabilities, including a Personal Digital Assistant (PDA), a smart phone or any other suitable portable or fixed electronic device. [0126] Also, a computer may have one or more input and output devices. These devices can be used, among other things, to present a user interface. Examples of output devices that can be used to provide a user interface include printers or display screens for visual presentation of output and speakers or other sound generating devices for audible presentation of output.

Examples of input devices that can be used for a user interface include keyboards, and pointing devices, such as mice, touch pads, and digitizing tablets. As another example, a computer may receive input information through speech recognition or in other audible format.

[0127] Such computers may be interconnected by one or more networks in any suitable form, including a local area network or a wide area network, such as an enterprise network, and intelligent network (EST) or the Internet. Such networks may be based on any suitable technology and may operate according to any suitable protocol and may include wireless networks, wired networks or fiber optic networks.

[0128] The various methods or processes outlined herein may be coded as software that is executable on one or more processors that employ any one of a variety of operating systems or platforms. Additionally, such software may be written using any of a number of suitable programming languages and/or programming or scripting tools, and also may be compiled as executable machine language code or intermediate code that is executed on a framework or virtual machine.

[0129] Also, various inventive concepts may be embodied as one or more methods, of which an example has been provided. The acts performed as part of the method may be ordered in any suitable way. Accordingly, embodiments may be constructed in which acts are performed in an order different than illustrated, which may include performing some acts simultaneously, even though shown as sequential acts in illustrative embodiments.

[0130] All publications, patent applications, patents, and other references mentioned herein are incorporated by reference in their entirety.

[0131] All definitions, as defined and used herein, should be understood to control over dictionary definitions, definitions in documents incorporated by reference, and/or ordinary meanings of the defined terms. [0132] The indefinite articles "a" and "an," as used herein in the specification and in the claims, unless clearly indicated to the contrary, should be understood to mean "at least one."

[0133] The phrase "and/or," as used herein in the specification and in the claims, should be understood to mean "either or both" of the elements so conjoined, i.e., elements that are conjunctively present in some cases and disjunctively present in other cases. Multiple elements listed with "and/or" should be construed in the same fashion, i.e., "one or more" of the elements so conjoined. Other elements may optionally be present other than the elements specifically identified by the "and/or" clause, whether related or unrelated to those elements specifically identified. Thus, as a non-limiting example, a reference to "A and/or B", when used in conjunction with open-ended language such as "comprising" can refer, in one embodiment, to A only (optionally including elements other than B); in another embodiment, to B only (optionally including elements other than A); in yet another embodiment, to both A and B (optionally including other elements); etc.

[0134] As used herein in the specification and in the claims, "or" should be understood to have the same meaning as "and/or" as defined above. For example, when separating items in a list, "or" or "and/or" shall be interpreted as being inclusive, i.e., the inclusion of at least one, but also including more than one, of a number or list of elements, and, optionally, additional unlisted items. Only terms clearly indicated to the contrary, such as "only one of or "exactly one of," or, when used in the claims, "consisting of," will refer to the inclusion of exactly one element of a number or list of elements. In general, the term "or" as used herein shall only be interpreted as indicating exclusive alternatives {i.e. "one or the other but not both") when preceded by terms of exclusivity, such as "either," "one of," "only one of," or "exactly one of." "Consisting essentially of," when used in the claims, shall have its ordinary meaning as used in the field of patent law.

[0135] As used herein in the specification and in the claims, the phrase "at least one," in reference to a list of one or more elements, should be understood to mean at least one element selected from any one or more of the elements in the list of elements, but not necessarily including at least one of each and every element specifically listed within the list of elements and not excluding any combinations of elements in the list of elements. This definition also allows that elements may optionally be present other than the elements specifically identified within the list of elements to which the phrase "at least one" refers, whether related or unrelated to those elements specifically identified. Thus, as a non-limiting example, "at least one of A and B" (or, equivalently, "at least one of A or B," or, equivalently "at least one of A and/or B") can refer, in one embodiment, to at least one, optionally including more than one, A, with no B present (and optionally including elements other than B); in another embodiment, to at least one, optionally including more than one, B, with no A present (and optionally including elements other than A); in yet another embodiment, to at least one, optionally including more than one, A, and at least one, optionally including more than one, B (and optionally including other elements); etc.

[0136] In the claims, as well as in the specification above, all transitional phrases such as "comprising," "including," "carrying," "having," "containing," "involving," "holding,"

"composed of," and the like are to be understood to be open-ended, i.e., to mean including but not limited to. Only the transitional phrases "consisting of and "consisting essentially of shall be closed or semi-closed transitional phrases, respectively, as set forth in the United States Patent Office Manual of Patent Examining Procedures, Section 2111.03.

Claims

1. A system for collecting and transforming unstructured data into computationally relevant structured data, the system comprising:

a collection controller comprising:

an emitter comprising a web crawler to extract the unstructured data;

a topic monitor to monitor the emitter to determine a frequency of first information included in the extracted unstructured data; and

a topological learning network to classify the first information in the extracted unstructured data;

a transformation controller communicatively coupled to the collection controller and a knowledge graph database, the knowledge graph database to store a knowledge graph, the transformation controller including a language modeling controller to model the classified first information and store the classified first information as first structured data within the knowledge graph, the knowledge graph including a plurality of entities and indicating relationships between the plurality of entities; and

a syndicate controller, communicatively coupled to the transformation controller and the knowledge graph database, to perform entity resolution and knowledge-based reasoning over the knowledge graph to resolve the first structured data as first computationally-relevant structured data.

2. The system of claim 1, wherein the topic monitor includes at least one of a Twitter™

monitor, at least one website monitor, and/or a Rich Site Summary (RSS) monitor.

3. The system of claim 1, wherein the topological learning network includes a word encoder and a sentence encoder.

4. The system of claim 1, wherein the topological learning network is a bidirectional Long Short-Term Memory with Attention (BLSTM-A) network.

5. The system of claim 4, wherein the bidirectional Long Short-Term Memory with Attention (BLSTM-A) network generates context-aware word representations based at least in part on the first information in the extracted unstructured data.

6. The system of claim 5, wherein the BLSTM-A network outputs a weighted sum of the

context-aware word representations to provide at least one sentence representation based at least in part on the first information in the extracted unstructured data.

7. The system of claim 5, wherein the BLSTM-A network scores respective context-aware word representations with a word importance vector to provide a sentence vector.

8. The system of claim 1, wherein the topological learning network embeds neurons modeled with axons and dendrites on a surface of a continuously differential manifold.

9. The system of claim 8, wherein the continuously differential manifold is a torus.

10. The system of claim 1, further comprising the knowledge graph database wherein the

knowledge graph database is integrated with prime encoding schemes to reduce complexity of operations on the knowledge graph.

11. The system of claim 1, wherein the syndicate controller performs the entity resolution

through integration of locality sensitive hash fuzzy matching.

12. The system of claim 1, wherein the syndicate controller defines a similarity metric between a candidate entity from the plurality of entities and at least one other known entity from the plurality of entities.

13. A method for extracting and transforming unstructured data into computationally legible structured data, the method comprising:

extracting the unstructured data via at least one web crawler from the World Wide Web; classifying information in the unstructured data using a is a bidirectional Long Short-Term Memory with Attention (BLSTM-A) network to provide classified information; modeling the classified information in the unstructured data to extract semantic information from the unstructured data;

encoding the extracted semantic information in a knowledge graph;

transforming the unstructured data to structured data based on the knowledge graph;

determining relationships between a plurality of entities in the knowledge graph;

performing entity resolution on the knowledge graph to resolve the structured data as computationally legible structured data; and

transmitting the computationally legible structured data to at least one recipient.

14. The method of claim 13, further comprising:

generating context-aware word representations using the BLSTM-A based at least in part on the information in the unstructured data.

15. The method of claim 14, further comprising:

outputting a weighted sum of the context-aware word representations using BLSTM-A to provide at least one sentence representation.

16. The method of claim 14, further comprising:

scoring respective context-aware word representations with a word importance vector to provide a sentence vector.

17. The method of claim 13, wherein modeling the classified information includes embedding neurons modeled with axons and dendrites on a surface of a continuously differential manifold.

18. The method of claim 17, wherein the continuously differential manifold is a torus.

19. The method of claim 13, further comprising:

integrating the knowledge graph with prime encoding scheme to reduce complexity of operations on the knowledge graph.

20. A system for extracting and transforming unstructured data into structured data, the system comprising:

a first controller to extract unstructured data from the World Wide Web; a second controller to transform the unstructured data into a structured representation; a third controller to perform semantic resolution of events, entities, and concepts in the structured representation and thereby provide resolved structured information; and

a fourth controller to transmit the resolved structured information to a recipient.