GB2572760A - Method and system for generating insight

Info

Publication number: GB2572760A
Application number: GB1805793.5A
Authority: GB
Inventors: Zia Dar Nadim; Vernis Daniel
Original assignee: Machineos Ltd
Current assignee: Machineos Ltd
Priority date: 2018-04-06
Filing date: 2018-04-06
Publication date: 2019-10-16
Also published as: GB201805793D0

Abstract

A computer-implemented method for extracting information comprises receiving data 18, generating synthetic data 22 based on the received data and forming a network 26 of the received and synthetic data. The network comprises nodes for which one or more axioms are generated 30, describing the rules and/or behaviours of each node. In response to a query, information from the network is retrieved 34, preferably by running simulations on the nodes using the axioms. At least some of the information is output 40. Preferably, the received data is updated in real time and the network is updated in response. The network may represent a corporation or a telecommunications network, having nodes which represent staff, devices or property, and the simulations may represent future scenarios. Preferably, the received and synthetic data is represented by data graphs (fig. 4). Queries may be received in a natural language and translated to a symbolic language to search the network (fig. 5). Responses may then be translated from the symbolic language to the natural language for output.

Description

Method and System for Generating Insight

The present invention relates to systems and methods for generating insight into one or more systems and/or entities.

In large systems, such as telecommunications networks, or large entities, such as multinational corporations, it is difficult to understand how different areas intersect and collaborate with each other and how information such as trading transactions, telecommunications network events, cashflows, settlements etc. is processed by different parts. Organisations, for example, are in essence information-processing entities that drive value generation. Understanding the behaviour of the various parts of an organisation or system, at both a human and machine level, can aid in understanding how it can be optimised to achieve superior, sustainable, long-term value generation.

There is a need for a system which can generate a network to model how a system or entity processes information and how different parts interact with each other.

According to a first aspect of the present invention, there is provided a method for extracting information, wherein the method is implemented on a computing device, and the computing device is programmed to execute the steps of the method, the method comprising: receiving data; generating synthetic data based on the received data; forming a network of the received and synthetic data, wherein the network comprises nodes; generating one or more axioms for one or more nodes in the network; retrieving information from the network in response to a query; and outputting at least some of the information.

In one example, the method further comprises generating trust ratings for the received data and, optionally, for the synthetic data. In another example, the method further comprises updating the received data in real time and updating the network in response to the updated received data.

In one example, retrieving information from the network in response to a query comprises running one or more simulations on one or more nodes using the one or more axioms, wherein the information comprises one or more outputs of the one or more simulations.

In one example, the data relates to one or more organisations. In another example, the one or more simulations represent one or more potential future scenarios for the one or more organisations.

In one example, the method further comprises structuring the received data into a first set of one or more data graphs, and wherein the synthetic data is generated in a second set of one or more data graphs.

In one example, the method further comprises receiving one or more queries.

In one example, the method further comprises: receiving the one or more queries in a natural language; translating the one or more queries to a symbolic language; searching the network for one or more responses to the one or more queries in the symbolic language; translating the one or more responses from the symbolic language to the natural language; and outputting the one or more responses in the natural language.

In another example, one or more symbols of the symbolic language comprises one or more graphs comprising nodes.

According to a second aspect of the present invention, there is provided a computing system for extracting information, the computing system comprising: a receiver configured to receive data; a storage device configured to store data; a processor programmed to execute the steps of: receiving data; generating synthetic data based on the received data; forming a network of the received and synthetic data, wherein the network comprises nodes; generating one or more axioms for one or more nodes in the network; retrieving information from the network in response to a query; and outputting at least some of the information.

In one example, the processor is further programmed to execute the step of generating trust ratings for the received data and, optionally, for the synthetic data. In another example, the processor is further programmed to execute the steps of updating the received data in real time and updating the network in response to the updated received data.

In another example the data relates to one or more organisations. In a further example, the one or more simulations represent one or more potential future scenarios for the one or more organisations.

In one example, the processor is further configured to structure the received data into a first set of one or more data graphs, wherein the synthetic data is generated in a second set of one or more data graphs. In another example, the processor is further configured to receive one or more queries.

In one example, the processor is further programmed to execute the steps of: receiving the one or more queries in a natural language; translating the one or more queries to a symbolic language; searching the network for one or more responses to the one or more queries in the symbolic language; translating the one or more responses from the symbolic language to the natural language; and outputting the one or more responses in the natural language.

In another example, each symbol of the symbolic language comprises one or more graphs comprising nodes.

According to a third aspect of the present invention, there is provided a method of generating responses to queries, wherein the method is implemented on a computing device in communication with a network of data and the computing device is programmed to execute the steps of the method, the method comprising: receiving one or more queries in a natural language; translating the one or more queries to a symbolic language; searching the network of data for one or more responses to the one or more queries in the symbolic language; translating the one or more responses from the symbolic language to the natural language; outputting the one or more responses in the natural language.

In one example, the network of data comprises one or more graphs comprising one or more nodes; and wherein one or more symbols of the symbolic language comprises one or more graphs comprising nodes.

Embodiments of the present invention will now be described in detail with reference to the accompanying drawings, in which:

Figure 1 is a flowchart showing the components of the insight generation system in accordance with an embodiment of the present invention;

Figure 2 is a flowchart showing stages of the process carried out by the insight generation system in accordance with an embodiment of the present invention;

Figure 3 is a flowchart showing processes carried out by the data structuring engine in accordance with an embodiment of the present invention;

Figure 4 shows how the data structuring engine structures the data in accordance with an embodiment of the present invention; and

Figure 5 demonstrates a query database used by the insight detection engine in accordance with an embodiment of the present invention.

In embodiments, with reference to Figure 1, the insight generation system 2 comprises a data structuring engine 4, a network generator 6, an axiom generator 8, a simulation engine 10, and an insight detection engine 12. In some embodiments, the insight generation system 2 is connected to a computing device which provides a user interface. The insight generation system 2 receives external data 14 and outputs insight 16.

With reference to Figure 2, the insight generation system 2 allows the extraction of information 36. In an embodiment, the information 36 extracted is insight 16 generated by the method of the invention.

In an embodiment, the insight generation system 2 carries out the steps of receiving data 18, generating synthetic data 22, forming a network 26, generating axioms 30, retrieving information 34 and outputting information 40. Data 14 is received 18 from, for example, a system or entity. Synthetic data 24 is generated 22 based on the received data 20. A network 28 is formed 26 from the received data 20 and the synthetic data 24. Axioms 32 are generated 30 for nodes in the network 28. Information 36 is retrieved 34 in response to a query.

In some embodiments, the insight generation system 2 may be implemented on a computing system, including at least a receiver, storage device and processor. The receiver may receive data 18 and the storage device may store the received data 20 and may also store data produced by the insight generation system 2. The processor may be configured to execute functions of one or more of the data structuring engine 4, network generator 6, axiom generator 8, simulation engine 10, and insight detection engine 12.

The computing system may also comprise an input device and/or an output device as part of a user interface to allow a user to interact with the insight generation system 2. The input device may be a keyboard, a touchscreen, or a microphone, or any other input device. The output device may be a display screen, printer, loudspeaker, or any other output device.

The insight generation system 2 can gather, structure and analyse data 14 in order to provide insight 16 into the data. The term “insight” refers to information that can be deduced from data analysis, for example information 36 about systems and/or entities described by the data. In some embodiments, the insight 16 generated by the insight generation system 2 relates to future behaviours, events or effects that can be predicted from the received and synthetic data 20, 24.

The insight generation system 2 receives external data 14, which is then received data 20. The data 14 can come from any source. In embodiments, the data 14 is received from one or more entities and/or systems. Examples of entities include organisations, institutions, associations, companies, governments, corporations. Examples of systems include telecommunications networks, computer networks, software systems, electrical systems, economic systems, social systems, markets, industries. In a preferred embodiment, the insight generation system 2 receives data 14 from at least one company and generates insight 16 relating to the past, present or future activities of the company.

The received data 20 can be of any type or format. For example, the received data 20 can be one or more of text, audio, images, video, bits, articles, instant messages, emails, newsfeeds, databases, social media connections, transactional data or any other type of data. Preferably, the insight generation system 2 can receive data 14 continuously, and can particularly receive new data 14 whenever the data 14 changes. The flow of data 14 may be generated by, for example, activities of individuals in an organisation, price fluctuations, changes in device performance, or network usage variation.

In some embodiments, the insight generation system 2 receives data 14 directly from the source. In other embodiments, the insight generation system 2 receives data 14 from one or more data storage systems, which may be internal or external to the one or more systems and/or entities. The insight generation system 2 may receive data 14 by a wired and/or wireless connection to one or more data storage devices and/or input devices. The wired and/or wireless connection may be made via a computing device. The insight generation system 2 may receive data 14 from a network, such as a wireless local area network, which may receive data 14 from the internet.

In order to generate useful insight, it is preferable for the received data 20 to originate across a period of time of at least one year. It is also preferable for the received data 20 to be evenly spread throughout the period of time across which it originates. These preferred features each help to show any seasonality that may be present in the received data 20. It is even more preferable for the received data 20 to originate across a period of at least two years. This gives more confidence to the determined seasonality and further helps to establish patterns of behaviour. The longer the period of time spanned by the received data 20, the more useful the generated insight 16.

In embodiments, the insight generation system 2 comprises a data structuring engine 4. When the insight generation system receives data 14, the data is first processed by the data structuring engine 4. The data structuring engine 4 structures 42 the received data 20 and generates 22 synthetic data 24. Preferably, the data structuring engine also generates a trust rating 44 for each piece of received data 20, and optionally assigns meaning 48 to the received and synthetic data 20, 24, as set out in the flow chart of Figure 3.

First, the data structuring engine 4 receives data 14 and automatically structures 42 the received data 20 into one or more graphs 50. As demonstrated in Figure 4, each graph 50 comprises one or more nodes 52. A graph 50 may also comprise one or more edges 54 connecting various nodes 52. A first node 52 may be connected to any number of other nodes 52 by any number of edges 54. A graph 50 may further have a specific shape that depends on the data it represents. The shape of a graph 50 may be determined by the number of nodes 52 and/or edges 54 it contains, and/or by the connections between nodes 52 and edges 54.

Nodes 52 and edges 54 can represent the data in a number of ways. In some embodiments, nodes 52 may represent objects or entities. In other embodiments, edges 54 may represent relationships, actions, or behaviour. The way in which a graph 50 represents data may depend on the format or content of the data. Each identified node 52 and edge 54 in the data may be associated with one or more attributes. An attribute may define one or more of a state, descriptor or characteristic of a node 52 or an edge 54.

In a first example, wherein the received data 20 is in the form of written text or spoken word, nouns may be represented in a graph 50 by nodes 52, verbs may be represented by edges 54, and adjectives may be represented by attributes. If received data 20 comprises speech in audio format, the data structuring engine may translate the audio into text with a speech-to-text algorithm, such as that used in any known speech-to-text software. Machine learning can give such an algorithm the ability, through practice, to learn to understand a variety of accents.

In a second example, wherein the received data 20 is in the form of images or video, nodes 52 may represent, for instance, shapes, objects, people or animals. Edges 54 may represent, for instance, spatial relationships between the nodes 52 in each image or video frame, and/or spatial or temporal relationships between the nodes 52 across a plurality of images or video frames.

In a third example, wherein the received data 20 is in tabular form, such as in spreadsheets, nodes 52 may represent, for instance, columns and/or rows. Edges 54 may, for instance, represent correlations and/or relationships between columns and/or rows.

Non-limiting examples of types of nodes 52 are people, countries, property, natural resources, intellectual property, buildings, machinery, factories, and offices. Each of the nodes 52 can then have associated attributes. For example, a node 52 that represents a person could have attributes of job title, location, salary, department, etc.
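The graph structure described above can be illustrated with a minimal sketch, assuming a simple in-memory representation. The class and field names (Node, Edge, Graph, attributes) and the example values are hypothetical and do not come from the specification.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """An object or entity; attributes hold its state or descriptors."""
    node_id: str
    attributes: dict = field(default_factory=dict)


@dataclass
class Edge:
    """A relationship, action or behaviour linking two nodes."""
    source: str
    target: str
    label: str
    attributes: dict = field(default_factory=dict)


@dataclass
class Graph:
    nodes: dict = field(default_factory=dict)   # node_id -> Node
    edges: list = field(default_factory=list)   # list of Edge

    def add_node(self, node_id, **attributes):
        self.nodes[node_id] = Node(node_id, dict(attributes))

    def add_edge(self, source, target, label, **attributes):
        # Both endpoints must already exist as nodes.
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be nodes of the graph")
        self.edges.append(Edge(source, target, label, dict(attributes)))

    def neighbours(self, node_id):
        """Node ids reachable from node_id by one outgoing edge."""
        return [e.target for e in self.edges if e.source == node_id]


# Hypothetical example: a person node linked to an office node.
g = Graph()
g.add_node("alice", job_title="engineer", location="London")
g.add_node("office_1", kind="office")
g.add_edge("alice", "office_1", "works_at")
```

In this sketch the noun-like items (a person, an office) become nodes, the verb-like relationship becomes an edge, and descriptors such as job title are stored as attributes.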

In a preferred embodiment, the data structuring engine 4 generates trust ratings 44 for the received data 20. A trust rating is an expression of the reliability of the data and the confidence level. Trust ratings may typically be given as a percentage. For example, data with a trust rating of 100% would mean that the data is completely reliable, and the lower the percentage, the less trustworthy the data. One or more thresholds for the trust ratings may be set to signify how trustworthy the data is. For example, it may be preferred that having a trust rating of 95% or above means that the data is reliable and having a trust rating lower than 95% means that the data is unreliable.

In some embodiments, two thresholds may be set, for example a first threshold of 95% and a second threshold of 80%, to signify different levels of trustworthiness or reliability. For example: trust ratings below the second threshold may indicate that the data cannot be trusted at all; trust ratings between the first and second thresholds may indicate that the data can be trusted to an extent but must be viewed with caution; and trust ratings above the first threshold may indicate reliable data that can be trusted to produce useful insight. The extent and value of the insight provided by the insight generation system 2 is driven by how trustworthy the input received data 20 is.

The trust rating can be generated 44 by analysing the connections between systems which have processed the data and tracing the source of the data. If the data is associated with other systems that are not reliable, this could downgrade the trust rating of the data. Additionally, the reliability of the data source can be an important indicator of the reliability of the data itself.

For example, data from a conversation in an instant messaging application may be rated as less reliable than data from an official report. Furthermore, manual changes, breakages and bug fixes can also lessen the reliability of the data and therefore detecting the presence or occurrence of these events can contribute to the generation of the trust rating.

Preferably the trust rating may be 95% or above, for example 95-100%.

Preferably, if the generated trust rating for a piece of data is less than a threshold value, such as 95%, the data may still be used in the subsequent processing, but the data can be flagged as having a less than optimal trust rating. Alternatively, data with a trust rating below a threshold value, such as 95% or 80%, may be ignored or discarded and only data with a trust rating at the threshold value or higher, i.e. more than or equal to 95% or 80%, is used for the insight generation. In some embodiments, data with a trust rating between a first and a second threshold, such as a trust rating more than or equal to 80% and less than 95%, is used in the subsequent processing but flagged as having a less than optimal trust rating, and data with a trust rating below the second threshold, such as below 80%, is ignored in the subsequent processing.
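The two-threshold handling described above may be sketched as follows. This is an illustrative sketch only: trust ratings are assumed to be expressed as fractions between 0 and 1, the 95% and 80% values are the example thresholds given in the description, and the function names are hypothetical.

```python
FIRST_THRESHOLD = 0.95   # example value from the description
SECOND_THRESHOLD = 0.80  # example value from the description


def classify(trust_rating, first=FIRST_THRESHOLD, second=SECOND_THRESHOLD):
    """Return 'use', 'flag' or 'ignore' for a trust rating in [0, 1]."""
    if not 0.0 <= trust_rating <= 1.0:
        raise ValueError("trust rating must lie between 0 and 1")
    if trust_rating >= first:
        return "use"
    if trust_rating >= second:
        return "flag"      # used, but marked as less than optimal
    return "ignore"


def filter_by_trust(items):
    """Split (data, trust_rating) pairs into usable and flagged lists."""
    usable, flagged = [], []
    for data, rating in items:
        decision = classify(rating)
        if decision == "ignore":
            continue
        usable.append(data)
        if decision == "flag":
            flagged.append(data)
    return usable, flagged
```

For instance, filter_by_trust([("a", 0.99), ("b", 0.90), ("c", 0.50)]) keeps "a" and "b" for subsequent processing, flags "b", and ignores "c".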

In some embodiments, meaning can be assigned 48 to individual pieces of data 20, or to collections of data 20. Meaning can be assigned 48 manually, e.g. by a user, or automatically by an algorithm. Meaning can be assigned 48 before or after structuring 42 and before or after generating a trust rating 44. Data 14 received by the data structuring engine 4 may be associated with metadata, which can provide meaning, or context, about the received data 20 itself. If no metadata is present, or the metadata does not contain enough useful information, the data structuring engine 4 can assign meaning 48, or context, to the received data 20. Meaning can be deduced from other aspects of the data 20, such as the source and/or format of the data 20.

In an example of data as strings of numbers, the data structuring engine 4 may assign significance to the strings of numbers that would be meaningful to a user of the insight generation system 2. For example, strings of numbers could represent specific customers, or specific routers in a computer network. By assigning meaning 48, the data 20 is given context that facilitates a better understanding of the network generated by the network generator 6 and the insights extracted from the network.

The value of the insight provided by the insight generation system 2 depends on the amount of data 14 available to the system. The more data 14 provided to the insight generation system 2, the more useful the generated insights 16 are. Therefore, it is advantageous to supplement the received data 20 with synthetic data 24 in order to increase the total amount of data.

Synthetic data 24 is data that is created by the insight generation system 2, as opposed to “real” data 20 that is received by the insight generation system. In embodiments, aside from the source of the data, synthetic data 24 is indistinguishable from received data 20. Synthetic data 24 may be created by producing data with similarities to received data 20, such as a similar type, format and/or category, but that differs from the received data 20 such that there is a wider variety of total data.

The amount of synthetic data 24 needed depends on the amount of received data 20. For example, the shorter the period of time over which data 14 is received, or the fewer the sources from which data is received, the higher the amount of synthetic data 24 needed to be generated to make up the deficit. In another example, if the received data 20 does not contain enough data with high enough trust ratings, the data that does have high enough trust ratings can be used to make up for it by generating additional reliable synthetic data. The amount of synthetic data 24 is generally higher than the amount of received data 20 in order to increase the quality of the insight. In one embodiment, the amount of synthetic data is around ten times the amount of received data 20.

The data structuring engine 4 can generate synthetic data 24 based on the received data 20. The synthetic data 24 may be generated with the same graph structure of nodes 52, edges 54 and attributes as the received data 20. Trust ratings can also be generated for the synthetic data 24. Synthetic data 24 may be given the trust rating of the received data 20 from which it was generated. If received data 20 is combined to produce synthetic data 24 or if received data 20 is input to an algorithm, the trust ratings of the received data 20 can be compounded.
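The description does not state how trust ratings are compounded. One possible reading, offered purely as an assumption, is that the ratings of the contributing received data are multiplied together, so that the synthetic data is never rated higher than its least trusted input. The function name below is hypothetical.

```python
import math


def compound_trust(ratings):
    """Multiply trust ratings (fractions in [0, 1]) of contributing data.

    Assumption: 'compounded' is read here as a product; the source does
    not define the operation.
    """
    if not ratings:
        raise ValueError("at least one trust rating is required")
    if any(not 0.0 <= r <= 1.0 for r in ratings):
        raise ValueError("trust ratings must lie between 0 and 1")
    return math.prod(ratings)


# Two pieces of received data rated 95% and 90% are combined.
combined = compound_trust([0.95, 0.90])
```

Under this reading, the combined rating in the example is about 85.5%, which falls below a 95% threshold even though one of the inputs met it.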

Synthetic data 24 may be generated 22 by analysing received data 20 to determine one or more features, such as type, format, and/or category. The data structuring engine 4 can then generate new data 24 with the same features of the received data 20, such as fitting into the same category of data.

In a simplified example of generating synthetic data 24, the data 20 comprises the following list: red; green; yellow. The data structuring engine 4 would analyse this received data 20 and determine that the data is a list of colour words. The data structuring engine 4 can then supplement the received data 20 with synthetic data 24 of the same type, so for example the synthetic data could comprise the list: purple; blue; pink; green; orange.
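The colour example can be sketched in a few lines. The sketch assumes that categories are available as a fixed lookup table, which is a simplification: the description does not say how the category is determined. The table contents and function names are hypothetical.

```python
import random

# Hypothetical category table; the source does not specify how
# categories are learned or stored.
CATEGORIES = {
    "colour": ["red", "green", "yellow", "purple", "blue", "pink", "orange"],
    "weekday": ["monday", "tuesday", "wednesday", "thursday", "friday"],
}


def infer_category(received):
    """Return the category containing every received item, or None."""
    for name, members in CATEGORIES.items():
        if all(item in members for item in received):
            return name
    return None


def generate_synthetic(received, count, seed=0):
    """Draw `count` items from the category inferred for `received`."""
    category = infer_category(received)
    if category is None:
        return []
    rng = random.Random(seed)  # seeded so that runs are repeatable
    return [rng.choice(CATEGORIES[category]) for _ in range(count)]


synthetic = generate_synthetic(["red", "green", "yellow"], count=5)
```

As in the example above, the synthetic list may repeat items already present in the received data, since it is drawn from the whole category.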

For embodiments in which trust ratings have been generated for the received data 20, the generation of synthetic data 24 may be sensitive to the trust ratings of the received data 20. In some cases, the trust rating for certain pieces or groups of received data 20 may not be high enough to suggest that the data is trustworthy, or in other words, the trust rating of received data 20 is below a predetermined threshold, such as 95% or 80%. Consequently, the data structuring engine 4 may be configured not to use this low-trust received data 20 as a basis for synthetic data 24. In other words, the data structuring engine 4 would only generate synthetic data 24 based on received data 20 with a trust rating at or above a predetermined threshold, such as received data 20 with a trust rating at or above 95% or 80%.

However, in some embodiments, the decision to use or exclude one or more pieces of received data 20 from the generation of synthetic data 24 based on the trust rating may depend on the trust ratings of other received data 20. Received data 20 with a lower trust rating may be perceived to be less problematic if associated or combined with received data 20 with a higher trust rating. For example, the data structuring engine 4 may use received data 20 with a trust rating between a first and second threshold, such as between 80% and 95%, only in association or combination with received data 20 with a trust rating above a first threshold, such as 95%.
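The combination rule in this example may be sketched as a simple eligibility check. The sketch assumes ratings are fractions between 0 and 1, treats a rating equal to the first threshold as meeting it, and assumes that any item below the second threshold excludes the whole group; the function name is hypothetical.

```python
def eligible_basis(group, first=0.95, second=0.80):
    """Decide whether a group of received data may seed synthetic data.

    group: trust ratings of the data items that would be combined.
    Eligible when no item is below the second threshold and at least
    one item meets the first threshold.
    """
    if not group:
        return False
    if any(rating < second for rating in group):
        return False
    return any(rating >= first for rating in group)
```

With the example thresholds, a group rated [0.97, 0.85] is eligible, whereas a group rated [0.85, 0.90] is not, because it contains no data at or above 95%.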

The data structuring engine 4 performs its functions in real time as it receives new data 14. The order in which the functions are described here, subsequent to receiving data 14, is a preferred order. However, the functions may be carried out in any order.

In embodiments, the insight generation system 2 comprises a network generator 6 which creates 26 a network 28 of the structured received and synthetic data 20, 24 from the data structuring engine 4. The network 28 is a representation of the data 20, 24 and its interrelations. The network 28 comprises nodes 52 identified by the data structuring engine 4. The network 28 need not comprise all of the received data 20 and synthetic data 24. The network 28 may only comprise the data with trust ratings above a certain threshold, such as 80% or 95%. It is preferable for the network 28 to be generated automatically in real time and maintain a time series history. The network generator 6 allows monitoring of the status of the network 28 and the systems and entities represented by the data in real time.

Depending on the amount of data provided to the network generator 6, the network 28 could have hundreds of billions of nodes 52. Preferably, the network 28 comprises 1000 or more nodes 52; more preferably, the network 28 comprises 10,000 or more nodes 52; and even more preferably, the network 28 comprises 100,000 or more nodes 52. The more nodes 52 in the network 28, the more useful the insight will be due to the higher level of granularity of the data.

Networks with lower numbers of nodes 52, for example under one thousand nodes 52, can still be used to provide insight and make predictions relating to future scenarios, but the predictions may not be as reliable or useful as predictions from a network of, for example, millions of nodes 52. Having more nodes 52 can increase the confidence level of the insight.

The network generator 6 automatically generates 26 the network 28 and preferably updates the network 28 in real time as the received data 20 changes. A record or history of the network 28 is preferably retained as the network 28 changes in real time to allow a snapshot at a particular point in the past to be retrieved.
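Retaining a history so that a past snapshot can be retrieved may be sketched as below. The sketch assumes the network fits in memory and stores a full copy at each update, which would not scale to very large networks; the class name, the dictionary-based network and the integer timestamps are hypothetical.

```python
import bisect
import copy


class NetworkHistory:
    """Keeps timestamped copies of a network so past states can be retrieved."""

    def __init__(self):
        self._times = []      # timestamps in non-decreasing order
        self._snapshots = []  # network state at each timestamp

    def record(self, timestamp, network):
        # Assumes timestamps arrive in non-decreasing order.
        if self._times and timestamp < self._times[-1]:
            raise ValueError("timestamps must be non-decreasing")
        self._times.append(timestamp)
        self._snapshots.append(copy.deepcopy(network))

    def snapshot_at(self, timestamp):
        """Return the most recent state recorded at or before `timestamp`."""
        index = bisect.bisect_right(self._times, timestamp) - 1
        if index < 0:
            raise LookupError("no snapshot at or before that time")
        return self._snapshots[index]


history = NetworkHistory()
network = {"router_a": {"status": "ok"}}
history.record(1, network)
network["router_a"]["status"] = "down"
history.record(5, network)
```

Here history.snapshot_at(3) returns the state recorded at time 1, in which the router is still working, while history.snapshot_at(5) returns the later state.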

The network generator 6 examines the network 28 to detect and note structures with the same configuration of nodes 52 and edges 54, i.e. isomorphs. Isomorphs represent situational correlation, i.e. a similar or identical opportunity or issue arising. Isomorphs can be a useful indicator of certain occurrences, such as a company’s future collapse.

For example, a collection of key isomorphs may represent aspects of the function of an organisation or system that are essential to its success, or aspects which provide the foundations of an organisation or system. Changes to these key isomorphs can provide warning signals. For example, degradation of isomorphs may indicate that an organisation or system is about to fail. Therefore, the network and isomorphs are monitored as they change over time.
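Detecting that two structures share the same configuration of nodes 52 and edges 54 can be illustrated with a brute-force check. This sketch is not the method of the specification: it tries every node mapping, so it is only practical for structures with a handful of nodes, and it ignores attributes and edge labels. All names and example data are hypothetical.

```python
from itertools import permutations


def are_isomorphic(nodes_a, edges_a, nodes_b, edges_b):
    """Brute-force check that two small directed graphs share a configuration.

    nodes_*: list of node ids; edges_*: set of (source, target) pairs.
    """
    if len(nodes_a) != len(nodes_b) or len(edges_a) != len(edges_b):
        return False
    for perm in permutations(nodes_b):
        mapping = dict(zip(nodes_a, perm))
        # The mapping is valid if it carries every edge of A onto an edge of B.
        if {(mapping[s], mapping[t]) for s, t in edges_a} == set(edges_b):
            return True
    return False


# Hypothetical example: two "one node feeding two others" structures
# and a three-node chain.
team_1 = (["m1", "r1", "r2"], {("m1", "r1"), ("m1", "r2")})
team_2 = (["x", "y", "z"], {("y", "x"), ("y", "z")})
chain = (["a", "b", "c"], {("a", "b"), ("b", "c")})
```

In this example team_1 and team_2 are isomorphs of each other, whereas team_1 and chain are not, although all three have three nodes and two edges.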

The network generator 6 allows root cause analysis. The paths within the network 28 can be traced back through nodes 52 to identify the source of a change that has happened at a node 52. In an example, the network represents a telecommunication network and a node 52 represents a router with the attribute that the router has broken down. By working backwards through the chain of nodes, the root cause of the breakdown can be found. This increases the speed and efficiency with which problems in the network 28 can be solved by speedily and reliably detecting the cause of a problem.
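Working backwards through the chain of nodes may be sketched as a breadth-first walk over upstream nodes. The sketch assumes that each node's inputs are known and that a set of nodes currently showing a fault or change has already been identified; the function name and the telecommunications example are hypothetical.

```python
from collections import deque


def root_causes(predecessors, faulty, start):
    """Trace back from `start` through faulty upstream nodes.

    predecessors: dict mapping node -> list of nodes feeding into it.
    faulty: set of nodes whose attributes show a fault or change.
    Returns the faulty nodes upstream of `start` (or `start` itself)
    that have no faulty predecessor, i.e. candidate root causes.
    """
    if start not in faulty:
        return set()
    roots, seen, queue = set(), {start}, deque([start])
    while queue:
        node = queue.popleft()
        faulty_parents = [p for p in predecessors.get(node, []) if p in faulty]
        if not faulty_parents:
            roots.add(node)
        for parent in faulty_parents:
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return roots


# Hypothetical example: a power fault propagates to a router.
predecessors = {
    "router": ["switch", "firmware"],
    "switch": ["power_unit"],
    "power_unit": [],
    "firmware": [],
}
faulty = {"router", "switch", "power_unit"}
```

Tracing back from the broken router passes through the switch and stops at the power unit, which has no faulty input of its own and is therefore reported as the candidate root cause.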

It is advantageous for insights produced by the insight generation system 2 to be able to use the network 28 to make projections about the future of an organisation or system and to model potential outcomes based on hypothetical events. In order to do this, the behaviour of the nodes 52 needs to be determined.

In embodiments, the insight generation system 2 comprises an axiom generator 8 which can analyse the network 28 to determine axioms 32 and theorems for the nodes. An axiom 32 describes the function, i.e. the behaviour, of a node 52 in the network 28 and each theorem describes a potential outcome of the node 52.

The axiom generator 8 preferably generates axioms 32 and theorems for all nodes in the network 28 for which axioms can be generated 30. If axioms 32 and theorems are not generated 30 for all of the nodes 52, the network 28 can still be useful for providing insight. However, it is possible that this could miss out key nodes 52 that would have affected the insight. Therefore the insight generation system 2 may be less effective without axioms 32 for all nodes 52.

It is preferable for all of the nodes 52 to have input, processing and output functions. However some nodes 52 may not have all of these functions, or may develop or lose functions over time. For example, a node 52 may not have a processing function and therefore not have an axiom 32. This node 52 would not provide a theorem and so would essentially represent a dead-end in the network 28. A node 52 without an output function would also be a dead-end in the network 28.

The generated network 28 may change in real time as the received data 20 changes. In order to project the status of a node 52, the behaviour of that node 52 must be deduced. By identifying how the attributes of nodes 52 are changing, the axiom generator 8 can produce an axiom 32 describing the rules and/or behaviour of the nodes 52. The rules applied by the nodes 52 can be, for example a simple algorithm, a complex algorithm, or a neural network.

For example, by observing changes to attributes x and y as an input to a node 52 and a change to attribute z as an output from the node 52, the node’s behaviour can be expressed as f(x,y) = z and an axiom 32 for the function f can be deduced. The insight generation system 2 can generate axioms 30 to deduce theorems with at least a 95% confidence level.
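Deducing f from observed inputs and outputs can be illustrated under a strong simplifying assumption, namely that f is linear, z = a*x + b*y + c. The description allows f to be any algorithm or a neural network, so the least-squares fit below is only one possible sketch; the function names and observations are hypothetical.

```python
def solve_linear_system(matrix, rhs):
    """Solve a small square system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("observations do not determine a unique fit")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(n):
            if row != col:
                factor = aug[row][col] / aug[col][col]
                aug[row] = [v - factor * p for v, p in zip(aug[row], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_linear_axiom(observations):
    """Least-squares fit of z = a*x + b*y + c to (x, y, z) observations."""
    rows = [(float(x), float(y), 1.0) for x, y, _ in observations]
    targets = [float(z) for _, _, z in observations]
    # Normal equations: (A^T A) w = A^T z
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atz = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(3)]
    return solve_linear_system(ata, atz)


# Hypothetical observations generated by z = 2x + 3y + 1.
observations = [(0, 0, 1), (1, 0, 3), (0, 1, 4), (1, 1, 6), (2, 1, 8)]
a, b, c = fit_linear_axiom(observations)
```

The recovered coefficients can then serve as the axiom for the node, for example to predict z for values of x and y that have not yet been observed.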

Once axioms 32 and theorems have been established for the nodes 52, it is possible to predict the impact of changes to an organisation or system by working out what would happen to the network 28 if the nodes 52 were to be manipulated. In this way, the axiom generator 8 can be used to investigate, for example, the effect of removing a node 52, which in real life could represent the sale of property, loss of an employee, or failure of a device, for example.
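The effect of removing a node 52 may be explored, in a much simplified form, by finding which nodes lie downstream of it. The sketch assumes the network is held as a dictionary of outgoing connections and does not evaluate any axioms; the function names and the example network are hypothetical.

```python
def downstream_of(successors, removed):
    """Nodes reachable from `removed`, i.e. those its removal may affect."""
    affected, stack = set(), [removed]
    while stack:
        node = stack.pop()
        for nxt in successors.get(node, []):
            if nxt not in affected and nxt != removed:
                affected.add(nxt)
                stack.append(nxt)
    return affected


def remove_node(successors, removed):
    """Return a copy of the network without `removed` and its edges."""
    return {
        node: [n for n in targets if n != removed]
        for node, targets in successors.items()
        if node != removed
    }


# Hypothetical example: an employee and an office both feed a project.
successors = {
    "employee": ["project"],
    "office": ["project"],
    "project": ["revenue"],
    "revenue": [],
}
affected = downstream_of(successors, "employee")
reduced = remove_node(successors, "employee")
```

In this example the loss of the employee may affect the project and, through it, revenue, while the office is unaffected.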

Using the axioms 32, it is also possible to project what is likely to happen in each node 52 and across the whole organisation or system in the future. The projections can be made for any time in the future; however, the confidence level decreases the further into the future the projection goes. Preferably, the axiom generator 8 makes projections for up to a year in the future, but more preferably the projections are provided over a subsequent period of up to three months. In embodiments, this capability is provided with a user interface on which a user can indicate a specific point in time for which they would like to view the projected status of the organisation or system.

In embodiments, the insight generation system 2 comprises a simulation engine 10 which can run simulations on the network 28. The simulation engine 10 may simulate future, past and/or real time behaviour of the network. The simulation engine can therefore simulate potential future scenarios in relation to the one or more systems and/or entities represented by the network 28.

The simulation engine 10 can apply algorithms to the axioms 32 of the nodes 52 to simulate the behaviour of the network 28. In many cases, the network 28 may be too large for simulations to be run on the entire network 28 within a reasonable amount of time. It may therefore be preferable for the simulation engine 10 to divide the network 28 into smaller sub-networks and run simulations on the sub-networks. The sub-networks may overlap and interlink with each other.
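The description does not say how the network is divided. One simple possibility, shown here only as an assumption, is to take the neighbourhood within a fixed number of hops of each chosen seed node, which naturally yields sub-networks that overlap.

```python
def neighbourhood(adjacency, seed, hops):
    """Nodes within `hops` edges of `seed` in an undirected adjacency map."""
    frontier = {seed}
    members = {seed}
    for _ in range(hops):
        # Expand one hop outwards, keeping only nodes not yet collected.
        frontier = {n for f in frontier for n in adjacency.get(f, ())} - members
        members |= frontier
    return members


# Hypothetical five-node chain a - b - c - d - e.
adjacency = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d"],
    "d": ["c", "e"],
    "e": ["d"],
}
sub_networks = {seed: neighbourhood(adjacency, seed, hops=1) for seed in ("b", "d")}
shared = sub_networks["b"] & sub_networks["d"]
```

Node `c` belongs to both sub-networks, which is the kind of overlap the paragraph above refers to.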

In embodiments, the insight generation system 2 comprises an insight detection engine 12 which allows a user to interact with the insight generation system 2 and gain insight 16 about the data 20, 24 that makes up the network 28 and about the systems or entities represented by the data 20, 24.

“Insight” refers to information 36 that can be extracted from the network. Non-limiting examples of information 36 that can be extracted include past, present or projected performance of a system or entity with regard to value, efficiency, profit, loss or efficacy. The information 36 may describe the performance itself, or provide indicators as to performance. The information 36 may indicate weak or broken areas of the network, such as ineffective personnel or broken electronic components. The information 36 may suggest improvements to the network, such as more effective components, or cheaper materials, for example to increase efficiency.

The insight detection engine 12 may interact with the simulation engine 10 in order to simulate scenarios which can generate insight 16. The insight detection engine 12 may also interact with the network 28 to extract insight 16 without running any simulations; however, it is preferable to run simulations so that potential scenarios can be tested.

The simulation engine 10 may run simulations on parts of the network 28 in response to one or more queries received by the insight detection engine 12. A simulation tests one or more scenarios, and may output details relating to the scenario. The outputs may describe the results of the scenario at various points throughout the simulation. The insight provided by the insight detection engine 12 may comprise the entire output of a simulation, or a condensed version of the output. The insight may be tailored to comprise only the outputs that are directly related to a query. The insight detection engine 12 may also process the output so as to provide insight in a different format, for example converting an output in numbers to insight in words.

For example, in response to a query relating to projected variations in a company’s profit given certain conditions, the simulation engine 10 may run a simulation on one or more parts of the network 28 according to the conditions and continually output the projected profit as it changes throughout the simulation, or output the projected profit for certain future time intervals. The insight provided by the insight detection engine 12 may comprise, for example, the entire series of profit variations, the profit at certain key points in time, or the total change in profit between now and a future moment in time.
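The three forms of insight mentioned above (the entire series, values at key points, and the total change) can be sketched as follows. The function name `condense` and the profit figures are made up for illustration and do not come from any real simulation.

```python
def condense(series, key_times=None):
    """Summarise a simulated {time: profit} series in three ways."""
    times = sorted(series)
    full = [(t, series[t]) for t in times]
    key_points = [(t, series[t]) for t in (key_times or []) if t in series]
    total_change = series[times[-1]] - series[times[0]]
    return {"full": full, "key_points": key_points, "total_change": total_change}


# Illustrative projected profit at simulated months 0 to 3.
projected_profit = {0: 100.0, 1: 104.0, 2: 99.0, 3: 110.0}
summary = condense(projected_profit, key_times=[1, 3])
```

Which of the three views is returned to the user would depend on the query.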

In embodiments, the insight generation system 2 is connected to a computing device which provides a user interface. For example, interaction with the insight generation system 2 may be provided through a personal computing device such as a laptop, tablet, smartphone or desktop PC. The insight generation system 2, or part of the insight generation system 2, may be provided as a software application on the computing device.

The user interface may include software to allow the user to input queries and, preferably, to receive responses. The user interface may also include an input device, such as a keyboard, touchscreen or microphone. The user may input the query by, for example, typing or speaking the query, or selecting options on the input device. The response to the query may be output to the user on an output device, for example as text, graphics or images displayed on a screen, as pages from a printer, or as automatically generated speech output by a loudspeaker.

In embodiments, the insight detection engine 12 comprises a universal language generator which generates a symbolic language to facilitate the mapping of natural language to patterns in the network 28. The insight detection engine 12 can receive a natural language query and provide a natural language response based on data from the network. In some embodiments, the natural language query is converted by the universal language generator into a symbolic language to enable the query to be mapped to patterns in the network 28 that potentially provide a response to the query.

Queries and responses may comprise one or more sentences, words, phrases, numbers, tables, diagrams, graphs, or any other form of representing information. A query may be in the form of one or more questions, search terms, keywords, commands or statements, for example. A response to a query may be in the form of an answer to a question, matches to a search term, or any representation of requested data, for example.

In other embodiments, the natural language query is decomposed into linguistic features, such as nouns, verbs and adjectives, and represented in a structure which corresponds to the nodes 52, edges 54 and attributes of the received and synthetic data structured by the data structuring engine 4. Therefore, a query can be represented as a graph of a similar type to the graphs of received and synthetic data which make up the network 28 produced by the network generator 6. In this case, the symbolic language uses graphs as symbols to express the queries. The data structuring engine 4 may be used to perform the conversions of queries and responses between natural language and graph format.
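A toy illustration of representing a query as a graph is given below. It assumes the words have already been tagged as nouns, verbs or adjectives (the tagging itself is not shown), and it adopts one possible mapping: nouns become nodes, a verb becomes an edge between the nouns on either side of it, and an adjective becomes an attribute of the next noun. The mapping, the tag names and the function name are assumptions made for this sketch.

```python
def query_to_graph(tagged_tokens):
    """Build nodes, edges and attributes from (word, tag) pairs.

    Assumed tags: 'NOUN', 'VERB', 'ADJ'. Other tags are ignored.
    """
    nodes, edges, attributes = [], [], {}
    pending_adjectives = []
    pending_verb = None
    prev_noun = None
    for word, tag in tagged_tokens:
        if tag == "ADJ":
            pending_adjectives.append(word)
        elif tag == "VERB":
            pending_verb = word
        elif tag == "NOUN":
            if word not in nodes:
                nodes.append(word)
            if pending_adjectives:
                attributes.setdefault(word, []).extend(pending_adjectives)
                pending_adjectives = []
            if pending_verb is not None:
                # A verb only forms an edge if a noun came before it.
                if prev_noun is not None:
                    edges.append((prev_noun, pending_verb, word))
                pending_verb = None
            prev_noun = word
    return nodes, edges, attributes


tagged = [
    ("faulty", "ADJ"), ("router", "NOUN"), ("affects", "VERB"),
    ("London", "ADJ"), ("customers", "NOUN"),
]
nodes, edges, attributes = query_to_graph(tagged)
```

The resulting triple has the same node, edge and attribute structure as the data graphs, so it can be compared against the network directly.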

When the insight detection engine 12 has a query in a symbolic format, which may be a query graph, the insight detection engine 12 can search for places in the network 28 that have similarities, such as similar shapes, to the query graph. Similarities in the network 28 signify that the network 28 around the similarity could potentially provide the answer to the query.

Similarities between shapes can be due to structure only, or to structure and behaviour. For example, a query graph may have the structure of three nodes connected by edges in a triangle. Nodes in the network arranged with this structure would therefore have a similar shape to the query graph. If the nodes in the network with the same structure as the query graph also have similar behaviour to the nodes in the query graph, then this may be a further factor to consider when detecting similarities. A non-limiting example of the behaviour of a node may be intermittent failure of a network router.
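The triangle example can be sketched as a brute-force search for three mutually connected nodes, optionally narrowed to matches that contain a node with a given behaviour. This is an illustrative simplification, not the matching method of the described system; the router names and the behaviour label are hypothetical.

```python
from itertools import combinations


def find_triangles(adjacency):
    """Return every set of three mutually connected nodes (undirected)."""
    neighbours = {n: set(adj) for n, adj in adjacency.items()}
    triangles = []
    for a, b, c in combinations(sorted(neighbours), 3):
        if b in neighbours[a] and c in neighbours[a] and c in neighbours[b]:
            triangles.append((a, b, c))
    return triangles


def matches(adjacency, behaviour, wanted_behaviour=None):
    """Structural matches, optionally limited to triangles that contain
    at least one node exhibiting the wanted behaviour."""
    found = find_triangles(adjacency)
    if wanted_behaviour is None:
        return found
    return [t for t in found
            if any(behaviour.get(n) == wanted_behaviour for n in t)]


# Hypothetical network: two triangles joined by the edge r3 - r4.
adjacency = {
    "r1": ["r2", "r3"],
    "r2": ["r1", "r3"],
    "r3": ["r1", "r2", "r4"],
    "r4": ["r3", "r5", "r6"],
    "r5": ["r4", "r6"],
    "r6": ["r4", "r5"],
}
behaviour = {"r5": "intermittent failure"}

structural = matches(adjacency, behaviour)
behavioural = matches(adjacency, behaviour, "intermittent failure")
```

Matching on structure alone finds both triangles; adding the behavioural condition keeps only the one containing the intermittently failing router.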

The simulation engine 10 may then run simulations on sub-networks that encompass the areas of the network 28 that have similarities to the query graph. The sub-networks are still connected to the rest of the network and connections in the network may cause the simulation to draw in the surrounding network 28, essentially expanding the sub-network under simulation. The sub-networks may interlink and overlap, so that one node could belong to more than one sub-network. A node that belongs to more than one sub-network may have different levels of significance in different sub-networks.

The extent of simulations of the sub-networks may be limited by a number of factors. One factor would be achieving the goal of the query. Queries can often be interpreted as representing a goal that the simulation is looking to reach. For example, a goal could be one or more nodes 52 achieving certain attributes with a certain confidence level (worked out using the trust ratings which may be generated for the data) within a certain time frame.

If a sub-network under simulation achieves the goal, then that sub-network can provide the response to the query. The simulation engine may end the simulation once the goal is achieved. The insight detection engine 12 may extract the response from the sub-network that achieved the goal. The response extracted from the network 28 may initially be in network format, but this can be converted into natural language for comprehension by the user. Such conversion may be achieved through the use of a symbolic language.

Another factor which may limit the extent of the simulations is the time duration of the scenario being simulated, i.e. how long the events would take to play out in real time, as opposed to machine time. The real-time limit on the simulations may vary according to the type of scenario being simulated and could depend on whether the query specifies any time limits. As an example, it is unlikely to be useful to run a simulation that covers one hundred years in real time. Furthermore, the confidence level of the insight for such a long simulation may be too low for the insight 16 to be useful and/or reliable. It would be more common and useful to run simulations that cover around one to six months in real time. Preferably simulations cover a period of between two and five months in real time, for example three months or four months.
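The two limiting factors discussed so far, reaching the goal and exhausting the simulated time span, can be combined in a simple loop. The sketch below is illustrative only: the `step` and `goal` callables stand in for the axioms and the query goal, and the stock scenario is invented.

```python
def simulate(state, step, goal, max_steps):
    """Advance `state` with `step` until `goal(state)` holds or the
    simulated-time budget `max_steps` is used up.

    Returns (final_state, steps_taken, goal_reached).
    """
    for elapsed in range(max_steps + 1):
        if goal(state):
            return state, elapsed, True
        if elapsed < max_steps:
            state = step(state)
    return state, max_steps, False


# Illustrative scenario: stock falls by 12 units per simulated week.
def weekly_step(s):
    return {"stock": s["stock"] - 12}


def stock_low(s):
    return s["stock"] < 50


# Twelve weekly steps approximate a three-month scenario.
final, weeks, reached = simulate({"stock": 100}, weekly_step, stock_low, max_steps=12)

# The same scenario with a three-week limit stops before the goal is reached.
short_final, short_weeks, short_reached = simulate(
    {"stock": 100}, weekly_step, stock_low, max_steps=3
)
```

In the first run the simulation ends as soon as the goal is met; in the second it ends because the time limit is reached first.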

Simulations are also limited by the data available. For example, decisions within a simulation cannot be based on future decisions or actions. Also, it is preferable that the initial conditions for the simulation do not change, even though the data may be continually changing as the received data 20 is updated in real time. A simulation may start from a snapshot of the network 28 or a range of conditions.
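Keeping the initial conditions fixed while the live data continues to change can be pictured as taking a deep copy of the network state before the simulation starts. The dictionary layout below is a hypothetical stand-in for the network.

```python
import copy

# Hypothetical live network state, updated in real time.
live_network = {"node_a": {"load": 0.4}, "node_b": {"load": 0.7}}

# Frozen initial conditions for a simulation.
snapshot = copy.deepcopy(live_network)

# A later real-time update changes the live network only.
live_network["node_a"]["load"] = 0.9
```

A deep copy is used so that nested attribute values in the snapshot are not affected by later updates to the live network.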

To facilitate the query and response process and reduce the time to provide the response, the insight detection engine 12 may in some embodiments comprise a query database 56 containing natural language queries 58, 60 and/or the equivalent symbolic language queries 62, 64 matched to their associated natural language responses 66, 68 and/or the equivalent symbolic language responses 70, 72, as demonstrated in Figure 5.

Initially, the insight detection engine 12 is provided with a default query database 56 populated with common or expected queries 58, 60, 62, 64 and responses 66, 68, 70, 72. When the insight detection engine 12 receives a query 58, 60, 62, 64 already stored in the query database 56, it does not have to go through the process of translating and searching through the network. Rather, the insight detection engine 12 can look up the query 58, 60, 62, 64 and response 66, 68, 70, 72 in the query database 56 and provide the stored response 66, 68, 70, 72.
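A lookup of this kind behaves like a cache in front of the slower translate-and-search path. The sketch below is a minimal illustration: the class name, the normalisation of queries and the `resolver` callable are assumptions, and storing the result for a previously unseen query is one possible policy.

```python
class QueryDatabase:
    """Hypothetical stand-in for a query database: maps a query to its
    stored response and falls back to a resolver for unseen queries."""

    def __init__(self, resolver, defaults=None):
        self._resolver = resolver  # slow path, e.g. translate and search
        self._store = {self._key(q): r for q, r in (defaults or {}).items()}
        self.resolver_calls = 0

    @staticmethod
    def _key(query):
        # Normalise case and spacing so trivially different queries match.
        return " ".join(query.lower().split())

    def ask(self, query):
        key = self._key(query)
        if key not in self._store:
            self.resolver_calls += 1
            self._store[key] = self._resolver(key)
        return self._store[key]


db = QueryDatabase(
    resolver=lambda q: f"simulated answer to: {q}",
    defaults={"What is the current headcount?": "stored answer"},
)
stored = db.ask("what is the current headcount?")
first = db.ask("What is the projected profit?")
second = db.ask("what is the  projected profit?")
```

The default entry is answered without calling the resolver, and the repeated profit query calls it only once.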

In embodiments, when the insight detection engine 12 receives a new query that is not already stored in the query database 56, the query database 56 is updated with the new query and a response to the new query so that the response time for that query is quicker in the future. The insight detection engine 12 can also use permutations of variables within stored queries 58, 60, 62, 64 to create its own new internal queries and then find and store the response.

Using permutations of stored queries allows a shorter response time if the insight detection engine 12 receives the created internal query in the future. With each new query received, more internal queries can be created due to the increased possibilities of query permutations.

Finding the response to internal queries constructed by the insight detection engine 12 requires the simulation engine 10 to run simulations in the same way as for queries which originate from a user. Simulations for internal queries may be referred to as passive simulations, whereas simulations carried out in response to a received query may be referred to as active simulations.

For example, linguistic features of a natural language query can be exchanged for others from the same category. In a more specific example, the insight detection engine 12 could exchange the underlined features of the following query 58, 60 for other related features: “Tell me the impact of an increase in the price of crude oil by 5% in the next three months on the working capital for organisation X.”

For example: increase could be replaced with decrease or no change; crude oil could be replaced with another resource, such as natural gas; 5% could be replaced with any other percentage or price; next could be replaced with past; three months could be replaced with any other unit of time; working capital could be replaced with another financial metric, such as debt; organisation X could be replaced with another organisation, such as X’s parent company Y.

Replacing varying numbers of these features could give rise to a large number of related internal queries. The simulation engine 10 would run passive simulations on the internal queries and the responses to these queries may be stored by the insight generation system 2.
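The growth in the number of internal queries can be illustrated by treating each exchangeable feature as a slot and taking the Cartesian product of the alternatives. Only alternatives named in the example above are used; the template and slot names are assumptions made for this sketch.

```python
from itertools import product

TEMPLATE = (
    "Tell me the impact of {change} in the price of {resource} by "
    "{amount} in the {direction} {period} on the {metric} for {organisation}."
)

# Alternatives per feature, limited to those named in the example.
SLOTS = {
    "change": ["an increase", "a decrease", "no change"],
    "resource": ["crude oil", "natural gas"],
    "amount": ["5%"],
    "direction": ["next", "past"],
    "period": ["three months"],
    "metric": ["working capital", "debt"],
    "organisation": ["organisation X", "parent company Y"],
}


def internal_queries(template, slots):
    """Yield one query per combination of slot values."""
    names = list(slots)
    for values in product(*(slots[n] for n in names)):
        yield template.format(**dict(zip(names, values)))


queries = list(internal_queries(TEMPLATE, SLOTS))
print(len(queries))
```

Even with these few alternatives there are 3 x 2 x 1 x 2 x 1 x 2 x 2 = 48 queries. Some combinations may not be meaningful and would need filtering before any passive simulation is run.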

In some embodiments, the insight generation system 2 comprises a dictionary of graph shapes to facilitate quicker searching for responses. The dictionary may be populated by at least the network generator 6 and/or the axiom generator 8. The network generator 6 may populate the dictionary with structural shapes and link them to the data that they represent. The structural shapes in the dictionary are structural shapes that appear within the network, such as shapes of graphs 50 of received and synthetic data produced by the nodes 52 and/or edges 54.

The axiom generator 8 may populate the dictionary with behavioural shapes that appear within the network 28. Behavioural shapes are produced by the behaviour of the nodes 52 and/or edges 54. The dictionary may be populated with behavioural shapes linked to the behaviour of the data that they represent.

The dictionary of structural shapes and/or behavioural shapes facilitates the search of the network 28 by the insight detection engine 12 for matches to queries. The dictionary provides the insight detection engine 12 with prior knowledge of how specific pieces of data may be formed within the network 28. The insight detection engine 12 can therefore search the network 28 for shapes, rather than for specific data content.

This can improve the speed and ease with which the insight detection engine 12 locates matches to queries in the network 28.
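A dictionary of structural shapes can be pictured as an index from a shape signature to the places in the network where that shape occurs. The sketch below uses the sorted degree sequence of a segment as the signature. This is an assumption chosen for brevity: it is only a cheap proxy, since different shapes can share a degree sequence, and the segment labels are invented.

```python
def shape_signature(nodes, edges):
    """Order-independent signature: sorted degree sequence of the segment."""
    degree = {n: 0 for n in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    return tuple(sorted(degree.values()))


def build_dictionary(segments):
    """Index segment labels by their shape signature."""
    index = {}
    for label, (nodes, edges) in segments.items():
        index.setdefault(shape_signature(nodes, edges), []).append(label)
    return index


# Hypothetical network segments: two triangles and one chain.
segments = {
    "billing_team": (["a", "b", "c"], [("a", "b"), ("b", "c"), ("c", "a")]),
    "core_routers": (["x", "y", "z"], [("x", "y"), ("y", "z"), ("z", "x")]),
    "support_chain": (["p", "q", "r"], [("p", "q"), ("q", "r")]),
}
dictionary = build_dictionary(segments)

# A triangular query graph is looked up by shape, not by data content.
triangle_query = shape_signature(["1", "2", "3"], [("1", "2"), ("2", "3"), ("3", "1")])
hits = dictionary.get(triangle_query, [])
```

The lookup returns the two triangular segments without inspecting the data they hold, which is the kind of shortcut the dictionary is intended to provide.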

This method of receiving queries, finding a response in a network 28 and outputting the response can be applied to any data network and is not limited to the networks described herein.

The insight generation system 2 can be employed for a wide range of entities and/or systems in a wide range of situations. An entity and/or system can be of any size, for example a small local telecommunications network, an office, or a national or multinational corporation. However, the insight generation system 2 is likely to be more effective in larger entities and/or systems because they will usually be able to provide a greater volume of real data.

In one example, the insight generation system 2 is used to gain insight into the operation of production lines in factories. The data provided to the insight generation system 2 may include information about raw materials, specific pieces of machinery, deliveries, yield, costs, energy consumption, production sequences, product quality, interactions between operations, and any other data sources.

The network provided by the network generator 6 would mimic the configuration of the production lines, machinery, inputs and outputs. Scenarios which the simulation engine 10 could run include changes in price of raw materials, decrease in demand for a manufactured product, or the breakdown of one or more pieces of machinery. The insight generation system 2 may be used to find out what would happen if one of those events, or any other event, were to occur within a particular time frame, for instance.

An example of how the insight generation system 2 may be applied to a telecommunications network is provided below.

A. Data Structuring Engine 4

1. Import Network Traffic Data, Network Faults Data, Network Operations Centre Support Ticket Data, Emails, IMs, Voicemail.

2. Structure the data to find nodes, edges and attributes for customers, devices in the network and call centre staff.

3. Generate synthetic data to complement the real data streaming in to stress test the future operating modes of the client’s network.

B. Network Generator 6

1. Create a graph network visualizing connections between human nodes and systems and devices nodes in the client’s network.

2. Look for shapes in the graph that represent similar issues or opportunities, such as recurring network outage root causes, recurring solutions for network outages, and recurring automated solutions to prevent outages in the telecoms network.

C. Axiom Generator 8

1. Populate each node in the graph network with the behaviour of each human (such as call centre staff and their reactions to specific types of incidents) and the way that systems and devices react to outages.

D. Simulation Engine 10

1. Simulate potential outages in the telecoms network to discern the impact on the organization in terms of its ability to react or proactively mitigate outages.

E. Insight Detection Engine 12

1. Ask questions in natural language such as “If this server in the London Data Centre goes down, what is the impact on the telecoms network and which of my clients get impacted the most?” These questions are then run as scenarios by the simulation engine 10 and the responses are provided in natural language.

An example method of the present invention is provided below as pseudocode. The pseudocode sets out exemplary functions, inputs and outputs for each part of the insight generation system 2. In the pseudocode, ‘Xandra’ refers to the data structuring engine 4, ‘Genesis’ refers to the network generator 6, ‘Dyson’ refers to the axiom generator 8, ‘Trantor’ refers to the simulation engine 10, and ‘Cora’ refers to the insight detection engine 12.

A. Xandra - Data Structuring Engine 4

1. Xandra_Real_Fetch (Text)

a. Xandra_Identify (Text, Nouns, Verbs, Adjectives)

b. Xandra_Convert (Nodes, Edges, Attributes, Trust Ratings)

2. Xandra_Real_Fetch (DB Transactions, CSVs, Spreadsheets)

a. Xandra_Identify (Columns, Nouns, Verbs, Adjectives)

b. Xandra_Convert (Nodes, Edges, Attributes, Trust Ratings)

3. Xandra_Real_Fetch (Voice)

a. Xandra_Speech_To_Text (Voice, Noise Reduction Algorithms)

b. Xandra_Speech_To_Text (Voice, Word Signature Algorithms)

c. Xandra_Identify (Voice, Nouns, Verbs, Adjectives)

d. Xandra_Convert (Nodes, Edges, Attributes, Trust Ratings)

4. Xandra_Real_Fetch (Image)

a. Xandra_Image_To_Text (Image, Noise Reduction Algorithms)

b. Xandra_Image_To_Text (Image, Image Vector Recognition Algorithms)

c. Xandra_Identify (Image, Nouns, Verbs, Adjectives)

d. Xandra_Convert (Nodes, Edges, Attributes, Trust Ratings)

5. Xandra_Real_Fetch (Video)

a. Xandra_Video_To_Images (Video, Frame Separation)

b. Xandra_Image_To_Text (Image, Noise Reduction Algorithms)

c. Xandra_Image_To_Text (Image, Image Vector Recognition Algorithms)

d. Xandra_Identify (Image, Nouns, Verbs, Adjectives)

e. Xandra_Correlate_Images (Images, Nouns, Verbs, Adjectives)

f. Xandra_Convert (Nodes, Edges, Attributes, Trust Ratings)

6. Xandra_Synthetic_Data_Generator (Real_Data, Time_Interval)

a. Xandra_Identify (Text, Nouns, Verbs, Adjectives)

b. Xandra_Convert (Nodes, Edges, Attributes, Trust Ratings)

Using the Xandra_Real_Fetch function, the data structuring engine 4 can receive real data 14 in a number of different forms: text, database (DB) transactions, comma-separated values (CSV) files, spreadsheets, voice, image, and video. The data structuring engine 4 identifies different aspects of the data, including nouns, verbs and adjectives, and converts these to nodes, edges and attributes. Trust ratings are also generated.

Image and speech data is processed to reduce noise and converted to textual format. Video data is converted to image data by separating the video into individual frames. The cleaned images are processed to select and identify shapes. Image vector algorithms estimate views of shapes from different directions to create a 3D view and identify the object represented by the shape.

The Xandra_Synthetic_Data_Generator function takes the real data and produces synthetic data with nodes, edges, attributes and trust ratings.

B. Genesis - Network Generator 6

1. Genesis_Stream (Real_Data, Time_Interval)

a. Genesis_Delta_Draw_Graph (Real_Data)

b. Genesis_Time_Slice (Real_Data)

c. Genesis_Zoom (Real_Data)

d. Genesis_Visual_Query (Real_Data)

2. Genesis_Stream_Simulation (Synthetic_Data, Time_Interval)

a. Genesis_Delta_Draw_Graph (Synthetic_Data)

b. Genesis_Time_Slice (Synthetic_Data)

c. Genesis_Zoom (Synthetic_Data)

3. Genesis_Visual_Query (Real_Data, Synthetic_Data)

a. Cora_Ask (Text)

b. Xandra_Identify (Text, Nouns, Verbs, Adjectives)

c. Xandra_Convert (Nodes, Edges, Attributes)

d. Genesis_Draw_Graph (Real_Data, Synthetic_Data)

e. Genesis_Graph_Search (Real_Data, Synthetic_Data, Result, Confidence_Rating)

f. Xandra_Unconvert (Graph_Segment, Text)

g. Xandra_Identify (Text, Nouns, Verbs, Adjectives)

h. Xandra_Construct (Nouns, Verbs, Edges, Sentences)

i. Xandra_Text_To_Speech (Sentences)

4. Genesis_Isomorph_Identification (Nodes, Edges, Attributes, Trust Ratings)

a. Genesis_Graph_Segment_Identification (Node_Theorems, Edge_Theorems)

b. Genesis_Theorem_Matching (Theorem_Graph_Segment, Delta, Match_Rating)

c. Xandra_Unconvert (Theorem_Graph_Segment, Text)

d. Xandra_Identify (Text, Nouns, Verbs, Adjectives)

e. Xandra_Construct (Nouns, Verbs, Edges, Sentences)

The Genesis_Stream and Genesis_Stream_Simulation functions receive real-time feeds of received and synthetic data 20, 24 from the data structuring engine 4 and produce a network graph 28 of the data. These functions provide the ability to capture the network 28 at a particular moment in time and save it so that a user can go back through the network 28 and view the state of the network 28 throughout its history. They also provide a zooming capability for changing the level of detail the network provides, i.e. switching between high level data and low level data.

The Genesis_Visual_Query function interacts with the insight detection engine 12 to receive queries, convert the queries from natural language into graph format using the data structuring engine 4, search the network 28 for responses to the queries, and convert the responses from graph format to natural language using the data structuring engine 4.

The Genesis_Isomorph_Identification function matches similarly-shaped network segments.

C. Dyson - Axiom Generator 8

1. Dyson_Box (Algorithm, Structured_Algorithm)

a. Dyson_Import (Algorithm_Code)

b. Dyson_Convert (Algorithm_Code, Inputs, Processing, Outputs, Algorithm_Type)

2. Dyson_Cave (Algorithm, Algorithm_Type)

a. Dyson_Instantiate (Algorithm, Algorithm_Set, Algorithm_Type, ULA_Type)

b. Dyson_Test (Algorithm_Set, Real_Data, Synthetic_Data, Algorithm_Selected, Confidence_Rating)

3. Dyson_Controller (Nodes, Edges, Attributes, Trust Ratings, Algorithm_Set)

a. Dyson_Sphere (Edges_In, Edges_Out, Algorithm_Stack, Confidence_Rating)

b. Dyson_Test (Algorithm_Stack, Real_Data, Synthetic_Data, Tuned_Algorithm_Stack, Confidence_Rating)

4. Dyson_Node_Machine (Recommended_Algorithms)

a. Dyson_Create_Node_Machine (Tuned_Algorithm_Stack)

b. Dyson_Create_Meta_Node_Machine (Tuned_Algorithm_Stacks, Universal_Learner_Algorithm_Stack, ULA_Type)

c. Dyson_Test (Universal_Learner_Algorithm_Stack, Real_Data, Synthetic_Data, Tuned_Algorithm_Stack, Confidence_Rating)

The functions of the axiom generator 8 run through the nodes 52 of the network 28 to analyse inputs and outputs and to determine the processes happening within the nodes and potential future outputs, i.e. the axioms and theorems.

D. Trantor - Simulation Engine 10

1. Trantor_Scenario_Creation (Strange_Loop_Network, Initial_Conditions, Node_Machine_Set, Questions, Scenario_Objectives)

a. Trantor_Initial_Conditions (Node_Machine_State_Set)

b. Trantor_Scenario_Scope (Sub_Strange_Loop_Network_Sets)

c. Trantor_Objectives (Text, Graph_Segment)

2. Trantor_Boards_Identification (Questions, Objective_Graph_Segment, Node_Machine_Set)

a. Xandra_Identify (Text, Nouns, Verbs, Adjectives, Question)

b. Cora_Instantiate (Question, Question_Set)

c. Xandra_Convert (Nodes, Edges, Attributes)

d. Genesis_Draw_Graph (Real_Data, Synthetic_Data)

e. Genesis_Graph_Search (Real_Data, Synthetic_Data, Result, Confidence_Rating)

f. Xandra_Unconvert (Graph_Segment, Text)

g. Trantor_Pattern_Matching (Graph_Segment, Strange_Loop_Network, Target_Spaces)

3. Trantor_Game_Simulation (Target_Spaces, Real_Time_Length, Objective_Theorems_Set)

a. Trantor_Run_Game (Target_Spaces, Real_Time_Length, Target_Space_Intersects, Game_Constraints, Game_Results)

b. Trantor_Assess (Game_Results, Objective_Theorems_Set, Recommended_Paths, Confidence_Rating)

c. Trantor_Explain (Recommended_Paths, Confidence_Rating)

4. Trantor_Game_Axiom_Identification (Objective_Theorems_Set, Axiom_Set)

a. Trantor_Theorem_Generator (Full_Theorem_Set, True_Theorems, False_Theorems)

b. Trantor_Axioms (True_Theorems, Axiom_Set)

The simulation engine 10 function applies initial conditions, queries and scenario objectives to the network 28, referred to as a Strange Loop Network, and runs simulations of scenarios. The scenario objectives are obtained from the queries posed to the insight detection engine 12, which are converted from natural language to graph format. Responses are found in the Strange Loop Network by matching patterns.

E. Cora - Insight Detection Engine 12

1. Genesis_Visual_Query (Real_Data, Synthetic_Data)

a. Cora_Ask (Text)

b. Xandra_Identify (Text, Nouns, Verbs, Adjectives, Question)

c. Cora_Instantiate (Question, Question_Set)

d. Xandra_Convert (Nodes, Edges, Attributes)

e. Genesis_Draw_Graph (Real_Data, Synthetic_Data)

f. Genesis_Graph_Search (Real_Data, Synthetic_Data, Result, Confidence_Rating)

g. Xandra_Unconvert (Graph_Segment, Text)

h. Xandra_Identify (Text, Nouns, Verbs, Adjectives)

i. Xandra_Construct (Nouns, Verbs, Edges, Sentences)

j. Xandra_Text_To_Speech (Sentences)

k. Cora_Insight_Generation (Sentences, Insights)

l. Cora_Visualization (Real_Data, Synthetic_Data, Visualization_Set)

The insight detection engine 12 draws functions from the data structuring engine 4 and the network generator 6 to convert natural language queries into graph format and to convert the graph responses into natural language.

Embodiments of the present invention have been described with particular reference to the examples illustrated. However, it will be appreciated that variations and modifications may be made to the examples described within the scope of the present invention.

Claims

1. A method for extracting information, wherein the method is implemented on a computing device, and the computing device is programmed to execute the steps of the method, the method comprising:

receiving data;

generating synthetic data based on the received data;

forming a network of the received and synthetic data, wherein the network comprises nodes;

generating one or more axioms for one or more nodes in the network;

retrieving information from the network in response to a query; and

outputting at least some of the information.

2. The method of claim 1, further comprising generating trust ratings for the received data and, optionally, for the synthetic data.

3. The method of claim 1 or 2, further comprising updating the received data in real time and updating the network in response to the updated received data.

4. The method of any preceding claim, wherein retrieving information from the network in response to a query comprises running one or more simulations on one or more nodes using the one or more axioms, wherein the information comprises one or more outputs of the one or more simulations.

5. The method of any preceding claim, wherein the data relates to one or more organisations.

6. The method of claim 5, wherein the one or more simulations represent one or more potential future scenarios for the one or more organisations.

7. The method of any preceding claim, further comprising structuring the received data into a first set of one or more data graphs, and wherein the synthetic data is generated in a second set of one or more data graphs.

8. The method of any preceding claim, further comprising receiving one or more queries.

9. The method of claim 8, further comprising:

receiving the one or more queries in a natural language;

translating the one or more queries to a symbolic language;

searching the network for one or more responses to the one or more queries in the symbolic language;

translating the one or more responses from the symbolic language to the natural language; and

outputting the one or more responses in the natural language.

10. The method of claim 9, wherein one or more symbols of the symbolic language comprises one or more graphs comprising nodes.

11. A computing system for extracting information, the computing system comprising:

a receiver configured to receive data;

a storage device configured to store data;

a processor programmed to execute the steps of:

receiving data;

generating synthetic data based on the received data;

12. The system of claim 11, wherein the processor is further programmed to execute the step of generating trust ratings for the received data and, optionally, for the synthetic data.

13. The system of claim 11 or 12, wherein the processor is further programmed to execute the steps of updating the received data in real time and updating the network in response to the updated received data.

14. The system of any of claims 11 to 13, wherein retrieving information from the network in response to a query comprises running one or more simulations on one or more nodes using the one or more axioms, wherein the information comprises one or more outputs of the one or more simulations.

15. The system of any of claims 11 to 14, wherein the data relates to one or more organisations.

16. The system of claim 15, wherein the one or more simulations represent one or more potential future scenarios for the one or more organisations.

17. The system of any of claims 11 to 16, wherein the processor is further programmed to structure the received data into a first set of one or more data graphs, wherein the synthetic data is generated in a second set of one or more data graphs.

18. The system of any of claims 11 to 17, wherein the processor is further programmed to receive one or more queries.

19. The system of claim 18, wherein the processor is further programmed to execute the steps of:

receiving the one or more queries in a natural language;

translating the one or more queries to a symbolic language;

20. The system of claim 19, wherein each symbol of the symbolic language comprises one or more graphs comprising nodes.

21. A method of generating responses to queries, wherein the method is implemented on a computing device in communication with a network of data and the computing device is programmed to execute the steps of the method, the method comprising:

receiving one or more queries in a natural language;

translating the one or more queries to a symbolic language;

searching the network of data for one or more responses to the one or more queries in the symbolic language;

translating the one or more responses from the symbolic language to the natural language; and

outputting the one or more responses in the natural language.

22. The method of claim 21, wherein the network of data comprises one or more graphs comprising one or more nodes; and wherein one or more symbols of the symbolic language comprises one or more graphs comprising nodes.