WO2019026087A1 - An intelligent context based prediction system - Google Patents

An intelligent context based prediction system

Info

Publication number
WO2019026087A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
nodes
concept
context
vector
Prior art date
Application number
PCT/IN2018/050502
Other languages
French (fr)
Inventor
Hrishikesh KULKARNI
Original Assignee
Kulkarni Hrishikesh
Priority date
Filing date
Publication date
Application filed by Kulkarni Hrishikesh
Publication of WO2019026087A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00: Computing arrangements using knowledge-based models
    • G06N5/02: Knowledge representation; Symbolic representation
    • G06N5/022: Knowledge engineering; Knowledge acquisition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00: Computing arrangements based on specific mathematical models
    • G06N7/01: Probabilistic graphical models, e.g. probabilistic networks

Definitions

  • This invention relates to the field of computer engineering, computer architecture, and neural networks.
  • This invention relates to the field of machine learning.
  • This invention relates to an intelligent context based prediction system and method.
  • Any node is related to other nodes, and these relationships are defined with a pair of probabilities.
  • Continuous text may include different concepts. There can be a transition from one concept to another. This transition may be sudden, progressive, or smooth. The dominant concept at a particular location may carry contextual impact from the description that precedes that sentence. Thus context may progress with concept, and representing this association with typical graphs becomes difficult, as it does not follow standard graph rules. What can be done about it? Where is the context here? Many text artifacts relate to the same concept. Which one relates to the prime or dominant concept?
  • Text organization: organizing text with flow and indicators.
  • Text decoding and determination of missing text has been a challenging exercise over the years. It can help solve problems from different domains, such as completing a story or even solving a criminal case. Researchers are working extensively in this area.
  • TFIDF: Term Frequency and Inverse Document Frequency
  • MMO: Multi-Modal Optimization
  • The text flow comprises sentiment, feelings, and objectives. Although a story or a set of continuous text flows in a particular direction, it goes through emotional spikes. Negativity and positivity are generally identified using sentiment analysis.
  • Keyword and NLP based sentiment analysis is used for product ranking and movie ranking. Researchers have even used crowd sourcing along with sentiment analysis to rank products.
  • The literature shows a compelling intent among researchers to use Machine Learning and Artificial Intelligence to solve creativity problems. These problems range from story completion and poem completion to identifying relevant text.
  • The emotional tone and intent-action can play a major role.
  • The novel concept of representation of intent-action is supported with mathematical mapping to identify tone. The tone helps to select the best piece of text for a given context, emotional flow, and problem.
  • Term Frequency and Inverse Document Frequency was a first breakthrough in assigning meaning to unstructured scattered text. The similarities and differences among documents were used for clustering. The frequency of words was a decision-making driver in these clustering efforts.
  • The dictionary meaning of context points to the situation and circumstances in which something can be fully understood. Context can also mean whatever immediately precedes what is written or spoken.
  • Context is the situation associated with a social environment whereby knowledge is acquired and processed. One may need to collate multiple concepts in a given context to derive meaning, to provide meaningful impact, and to help decision-making. Further, there can be very personalized collation. But again, it is about context. Knowing the context allows information to be collated and processed as the situation demands.
  • In a fuzzy graph, fuzziness is associated with the edges of the graph.
  • AdCMa: Adversarial Concept Mapping
  • Adversarial typically means involving, presenting, or characterized by opposition, conflict, or a contrary view.
  • Adversarial concept mining does not focus on flow of concept or smoothness but is, typically, marked by peaks and valleys with reference to distortions. The peaks and valleys under observation build transition nodes for concept mapping, including in the fuzzy multi-edge graph.
  • Association uses the association and relationship among words as the deciding factor.
  • The association is further used to derive meaning and knowledge.
  • The association and mapping are used to derive symbolic knowledge from a set of documents and information artifacts. Many researchers have started working on defining the context. Context-based relationships are used to derive these associations.
  • An object of the invention is to provide a system and method aimed towards identifying gaps in data.
  • Another object of the invention is to provide a system and method aimed towards analyzing multidimensional intent and action relationships.
  • an intelligent context based prediction system comprising:
  • input nodes configured to accept user inputs
  • mapping device configured to read each node's vector details and to further map it into a network, in terms of node position, node working, association between nodes, flow between nodes, transitions of flow between nodes or in a networked environment of nodes, context vector associated with each node, and concept vector associated with each node, said network comprising a first node and a last node, associated with middle node, said middle node linking said first node and said last node in terms of a concept vector and a context vector, thereby forming a first concept vector link between a first node and a middle node, a second concept vector link between a middle node and a last node; a first context vector link between a first node and a middle node, a second context vector link between a middle node and a last node;
  • a concept determinator configured to extract at least a concept vector of each node associated with its position, content, association with other nodes in terms of associated nodes' position and content
  • a context determinator configured to extract at least a context vector of each node associated with its user, association with a node, and association of the node in the networked environment
  • a concept mapping device configured to derive a concept map, comprising concept vector links
  • a context mapping device configured to derive a context map, comprising context vector links
  • a 2-part determinator configured to determine a network of nodes ("determined network of nodes") in terms of intent-to-action relation between said first node and said last node to determine relative probability for selection of said middle node from a pool of middle nodes;
  • a 3 -part determinator configured to determine and confirm a determined network of nodes ("confirmed network of nodes") in terms of intent-to-action relation between said first node, said middle node, and said last node to confirm relative probability after selection of said middle node from a pool of middle nodes.
  • said mapping device comprises an order determinator to track order of nodes in terms of flow of data from start to end.
  • said concept mapping device comprises one or more processors to perform the steps of:
  • said concept determinator comprises:
  • tokenizer configured to locate independent meaningful tokens from each of said nodes
  • a remover mechanism configured to obtain important part of each node's content removing the least important part
  • a lemmatisation mechanism configured to identify relevant multi-forms of the content of a node with reference to its importance; in order to determine concept vectors associated with each networked node.
  • said context determinator comprises:
  • tokenizer configured to locate independent meaningful tokens from each of said nodes
  • a remover mechanism configured to obtain important part of each node's content removing the least important part
  • a lemmatisation mechanism configured to identify relevant multi-forms of the content of a node with reference to its importance
  • said system comprises a prime number route mapping mechanism configured to represent content of a node using a prime number route mapping method and to classify each node into one of at least three types: a) intent nodes, b) action nodes, and c) concept nodes; said classification being done by comparing content with pre-populated databases corresponding to content segregated with respect to each of said intent node, said action node, and said concept node.
  • said 2-part determinator comprises:
  • a comparator configured to compare content of said first node and said last node with standard ontology
  • an assignor configured to assign first concept vector link, second concept vector link, first context vector link, and second context vector link between said two nodes with certain specific attributes and quantifying those attributes;
  • a plotter configured to plot relative hyperbolic probability graph, concerning each of said links, in order to determine relative probability of each link and selecting a middle node based on highest probability of links;
  • a tone analyzer configured to determine a top pre-defined percentage of nodes selected for tone analysis, wherein a tone-matching algorithm determines deviation in flow, by plotting a matched curve, by matching the graph of said with ideally expected graph in terms of smoothness of said matched curve.
  • said 3-part determinator comprises:
  • a comparator configured to compare content of said first node, said selected middle node, and said last node with standard ontology; an assignor configured to assign first concept vector link, second concept vector link, first context vector link, and second context vector link between said three nodes with certain specific attributes and quantifying those attributes; a plotter configured to plot relative hyperbolic probability graph, concerning each of said links, in order to confirm highest probability of each link and confirming a middle node based on highest probability of links; and a tone analyzer, configured to confirm a top pre-defined percentage of nodes selected for tone analysis, wherein a tone-matching algorithm confirms deviation in flow, by plotting a matched curve, by matching the graph of said with ideally expected graph in terms of smoothness of said matched curve.
  • system is configured to form a plurality of determined network of nodes, said system comprising:
  • a node classifier further configured to classify each of said nodes into one of: a) an intent node; b) an action connected node; c) a concept node; and d) a border node, at least one node being a part of a plurality of confirmed network of nodes;
  • a prime number route mapping mechanism configured to represent each networked node upon classification by said classifier in order to confirm unidirectional flow of nodes in a determined network of nodes, said mechanism comprising a processor to perform the steps of:
  • the product of all numbers assigned to it in different routes is its unique identification number, so that each network of nodes being formed and represented uniquely;
  • an intelligent context based prediction method comprising the steps of:
  • determining a network of nodes in terms of intent-to-action relation between said first node and said last node to determine relative probability for selection of said middle node from a pool of middle nodes
  • confirming a determined network of nodes ("confirmed network of nodes") in terms of intent-to-action relation between said first node, said middle node, and said last node to confirm relative probability after selection of said middle node from a pool of middle nodes.
  • said step of reading each node's vector details comprises a further step of tracking order of nodes in terms of flow of data from start to end.
  • said step of deriving a concept map comprises the steps of:
  • said step of deriving a context map comprises the steps of:
  • said system comprises a prime number route mapping mechanism configured to represent content of a node using a prime number route mapping method and to classify each node into one of at least three types: a) intent nodes, b) action nodes, and c) concept nodes; said classification being done by comparing content with pre-populated databases corresponding to content segregated with respect to each of said intent node, said action node, and said concept node.
  • said step of determining a network of nodes comprises the steps of:
  • assigning first concept vector link, second concept vector link, first context vector link, and second context vector link between said two nodes with certain specific attributes and quantifying those attributes;
  • said step of confirming a network of nodes comprises the steps of:
  • assigning first concept vector link, second concept vector link, first context vector link, and second context vector link between said three nodes with certain specific attributes and quantifying those attributes;
  • a tone-matching algorithm confirms deviation in flow, by plotting a matched curve, by matching the graph of said with ideally expected graph in terms of smoothness of said matched curve.
  • said method configured to form a plurality of determined network of nodes, said method comprises the steps of:
  • classifying each of said nodes into one of: a) an intent node; b) an action connected node; c) a concept node; and d) a border node, at least one node being a part of a plurality of confirmed network of nodes;
  • said mechanism comprising a processor to perform the steps of:
  • the product of all numbers assigned to it in different routes is its unique identification number, so that each network of nodes being formed and represented uniquely;
  • FIGURE 1 illustrates a flow diagram of the method of this invention
  • FIGURE 2 illustrates flow and relationship among data points with a working example
  • FIGURE 3 represents actual representation of the example of Figure 2
  • FIGURE 4 illustrates a split path address naming method where it depicts one such associative relationship
  • FIGURE 5 illustrates the prime number route mapping method
  • FIGURE 6 depicts the behaviour of probability vs. index with typical range of x
  • FIGURE 7 illustrates sentiment trend
  • FIGURE 8 illustrates ranking wise success
  • FIGURES 9a and 9b illustrate success and emotional trends
  • FIGURE 10 illustrates relationships
  • FIGURE 11 illustrates core concept node demonetization. A context from the opposition is represented with a red edge;
  • FIGURE 12 illustrates success for concept context mapping
  • FIGURE 13 illustrates phases (or steps) for a thought mapper.
  • a 'node' is defined as a connected object or device or application or data in a network.
  • a node is defined by parameters such as its position, behaviour, and value.
  • a node's position and value define the network behaviour.
  • a node's position defines its relativity with connected nodes and cumulatively defines the network behaviour.
  • a 'node' is defined by means of a context.
  • a context vector assigns a specific behaviour, weight, direction, and associative capabilities to a node which means that the node's position in a network is defined, a node's association with its connected node is defined, a node's relative position relative to associated nodes is defined, a node's input is defined (thereby defining behaviour), and a node's output is defined (thereby defining behaviour) by this context vector.
  • a rule engine may define such context vectors as outputs.
  • a 'node' is defined by means of a concept.
  • a concept vector assigns a specific behaviour, weight, direction, and associative capabilities to a node which means that the node's position in a network is defined, a node's association with its connected node is defined, a node's relative position relative to associated nodes is defined, a node's input is defined (thereby defining behaviour), a node's output is defined (thereby defining behaviour) by this concept vector.
  • a rule engine may define such concept vectors as outputs.
  • Context is defined as a multi-faceted vector which affects at least a node in its networked environment.
  • Context is a function of user, a situation depicted in a particular application, and a situation in which the application is used. It is to be noted that these applications are also nodes and may be associated with a user, its actions, and a relative environment. Hence, perspective of the user with reference to the environment, action sequence, and flow of events represent a context. Thus, context is typically represented by the properties of situation, relationships among events, and properties of events.
  • the term 'concept' is defined as a multi-faceted vector which affects at least a node in its networked environment.
  • Concept is a function of keywords, associated concept, and position in which the application is used. It is to be noted that these applications are also nodes in an environment. Any portion of the application can have multiple concepts. Therefore, an entire application can be represented as a concept map of nodes.
  • a concept is an idea depicted in text, paragraph, or conversation. A text under observation may depict multiple ideas and, hence, can have multiple concepts. Generally, it is expected to represent associated concepts. Some researchers believe that a concept can have multiple meanings. Concept mining focuses on finding out keywords and relationships among them.
  • a 'concept' is defined as a flow across a set of nodes in a particular application of a networked environment.
  • a 'concept' is defined as a flow of an idea (behaviour of nodes) across a series of sentences (of an application which is a document) trying to convey a particular meaning (working or output of the network).
  • the relationships among relevant words (relevant nodes) are used along with information flow (association and flow of information between nodes) and intent mapping to mine the concepts for that particular environment.
  • an application is a document with text or data. Therefore, any paragraph in the document can have multiple concepts and it is important to derive meaning of a paragraph in relation to a particular context using the system and method of this invention.
  • a system and method to map multiple events on a time line by determining their interdependency to predict the most probable series of events that might have occurred is a Probabilistic Intent-Action Ontology and Tone Matching system and method.
  • Various influencing factors need to be considered which alter a node's context vectors and concept vectors.
  • various aspects of flow of events, continuity, negation, shift of emotions, and the like are considered.
  • the system and method of this invention is based on analyzing multidimensional intent and action relationships, application of naive Bayes theorem to text, plotting relative hyperbolic probability, plotting the tone matching graph, and calculating deviations. This approach can be used to solve many real life problems like solving criminal cases, completing stories, and identifying gaps in data.
  • a story writing competition is envisaged in which beginning of the story is known and end of the story is also available.
  • writers are expected to complete a middle part of the story by selecting most relevant piece amongst possible options.
  • the Probabilistic Intent-Action Ontology and Tone Matching Algorithm will study all the drafts in detail and select the most appropriate one. It will also give the relative positions of other drafts with respect to the chosen draft by determining closeness factor.
  • a crime investigation is considered. Assume that all the data before and after the incident is available, except for a missing link. This algorithm could analyze all the possibilities based on the statements, and the most appropriate possibility can be judged.
  • FIGURE 1 illustrates a flow diagram of the method of this invention.
  • a set of nodes in a network are defined as input nodes which accept user inputs.
  • these inputs may be text inputs such as text1, text3, and n drafts of text2, where text1, text2, and text3 are in logical or coherent sequence.
  • Step 1: The system takes text1 and text3 as input.
  • a mapping device is configured to read each node's vector details and to further map it into the network. This aids in mapping the entire network in terms of node position, node working, association between nodes, flow between nodes, transitions of flow between nodes or in a networked environment of nodes, and the like.
  • association across multiple text artifacts will present the collated concepts with reference to context. This can further be improved to map to a user's sensitivity index.
  • the mapping device comprises an order determinator which tracks the order of nodes or how data flows.
  • Step 2: The system takes n drafts of text2 as input to define order. Iteratively, an even larger number of text artifacts can be ordered.
  • a concept determinator is configured to focus on vectors of a node associated with its position, content, association with other nodes in terms of the associated nodes' position and content.
  • concepts are defined as a function of keywords, associated concepts, and positions.
  • the output of a concept determinator is to represent an application in terms of its concept vectors which are associated with nodes. This is further used in determination of context vectors associated with the application.
  • a context determinator is configured to focus on vectors of a node associated with its user, association with a node (application), and association of the node (application) in the networked environment.
  • contexts are defined as a function of user, situation depicted in the application, and situation in which the application is used.
  • a concept mapping device provides a concept map comprising concept vector links. It is configured to derive a concept map between nodes relating to the (input) application that is being examined.
  • a concept map is derived using the following five steps: 1. Data vector organization, in order to understand the flow of data and its indicators.
  • FIGURE 13 illustrates phases (or steps) for a thought mapper.
  • a concept map is derived using the following steps.
  • Text organization: organizing text with flow and indicators.
  • a context mapping device provides a context map comprising context vector links.
  • Associating concepts: two or more concepts are associated in a particular way. In many cases this forms a concept chain, which helps to build the overall context.
  • Core intent node: a node which, irrespective of its frequency of occurrence, impacts one or more actions in the region under observation.
  • [IA] refers to the set of impacted prominent actions.
  • Border node: a node that shows a relationship with more than one prominent action but does not directly impact any action.
  • the scenario is analyzed to prepare a context.
  • association between multiple context maps helps the system and method to find out whether two given persons can work together well in a given scenario.
  • the ranking of individuals with reference to a given scenario is performed.
  • the thought process mapping is carried out with reference to the topmost person suitable for a given scenario. It is mapped with all the remaining individuals in the context of the given scenario. Reinforcement learning is used to collect rewards and penalties to select the best member in any given scenario.
  • CSC: Contextual Sentiment Closeness
  • the Association A of the outcome SI can then be calculated as:
  • n: normalization factor
  • the algorithm is implemented using a closeness factor with reference to identification of the core member for a given scenario.
  • with reference to the core member, the other members are identified in context with a given scenario.
  • Deviation in behavior is used for selection of members. Thought-indicating words are plotted on the X-axis and responses are plotted on the Y-axis. These points are connected to represent the overall thought process of the candidate in a given scenario. This comparison and deviation give a first indicator for thought process association.
  • the graph connecting the text one and text three points in the form of a spline represents the thought process of the core candidate.
  • the deviation between the thought process of the core member and the thought process of an aspirant to the team is the sum of the magnitudes of the area differences between these curves.
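  • The area-based deviation described above can be sketched as follows. This is a minimal illustration and not the patented implementation: it assumes both curves are sampled at the same x positions and approximates the area between them with the trapezoidal rule applied to the absolute difference, rather than fitting splines. The function name `curve_deviation` and its parameters are hypothetical.

```python
def curve_deviation(xs, core, aspirant):
    """Approximate the area between two curves sampled at the same x positions.

    xs       -- increasing x positions (e.g. indices of thought-indicating words)
    core     -- responses of the core member at those positions
    aspirant -- responses of the aspirant at those positions

    Assumption: trapezoidal rule on |core - aspirant|; this is only an
    approximation when the curves cross between two sample points.
    """
    if not (len(xs) == len(core) == len(aspirant)):
        raise ValueError("all sequences must have the same length")
    total = 0.0
    for i in range(len(xs) - 1):
        width = xs[i + 1] - xs[i]
        left = abs(core[i] - aspirant[i])
        right = abs(core[i + 1] - aspirant[i + 1])
        total += 0.5 * (left + right) * width
    return total
```

  • A smaller value indicates that the aspirant's curve lies closer to the core member's curve; identical curves give a deviation of zero.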
  • a tokenizer is configured to locate independent meaningful tokens from the nodes (text paragraphs).
  • a remover mechanism is configured to obtain important part of the text removing the least important part.
  • a lemmatisation mechanism is configured to identify relevant multi-forms of the words with reference to their importance.
  • Step 3: The system performs tokenization, stop word removal, and lemmatization, and a list of words is obtained for each text. From the given text, concept vectors and context vectors are derived which define associated nodes.
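  • The preprocessing of Step 3 can be sketched as a three-stage pipeline. The sketch below is illustrative only: the stop-word list and the lemma lookup table are tiny hypothetical stand-ins, since the source does not say which tokenizer, stop-word list, or lemmatizer is used.

```python
import re

# Hypothetical, deliberately tiny stop-word list (assumption, not from the source).
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "to", "in", "is", "was", "it"}

# Hypothetical lookup table standing in for a real lemmatizer.
LEMMAS = {"ran": "run", "running": "run", "stories": "story", "went": "go"}


def tokenize(text):
    """Split text into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stop_words(tokens):
    """Drop the least informative tokens."""
    return [t for t in tokens if t not in STOP_WORDS]


def lemmatize(tokens):
    """Map each token to its base form when the table knows it."""
    return [LEMMAS.get(t, t) for t in tokens]


def preprocess(text):
    """Tokenize, remove stop words, then lemmatize, in that order."""
    return lemmatize(remove_stop_words(tokenize(text)))
```

  • For example, `preprocess("The thief was running to the stories")` yields `["thief", "run", "story"]`, the kind of word list from which concept and context vectors would then be derived.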
  • a prime number route mapping mechanism is configured to represent content of a node using a prime number route mapping method.
  • the nodes are classified into at least three types: a) intent nodes, b) action nodes, and c) concept nodes.
  • a node is an application with text
  • words of text1 and text2 are taken in order and compared with a database, and a UIC (Unique Identification Code) is associated with every word. This is done based on an INTENT-ACTION ONTOLOGY (IAO) database, which is represented using the PRIME NUMBER ROUTE MAPPING METHOD (PNRMM).
  • Step 4: The system takes the words of text1, text2, and text3 in order and compares them with the database, and a UIC (Unique Identification Code) is associated with every word.
  • two text items are provided: text1 and text3.
  • the objective is to determine a missing text2 of this sequence, context, or logic: a piece of continuous text which joins or provides meaning to text1 and text3.
  • a 2-part determinator is provided to determine the intent-to-action relation for two parts of the input text (text1, text3).
  • a comparator compares it with standard ontology.
  • An assignor assigns relation between these two parts of texts with certain specific attributes and quantifies those attributes.
  • a plotter plots relative hyperbolic probability graph which determines relative probability.
  • a 3-part determinator is provided to determine the intent-to-action relation for three parts of the input text (text1, text2, text3).
  • a comparator compares it with standard ontology.
  • An assignor assigns relation between the parts of texts with certain specific attributes and quantifies those attributes.
  • a plotter plots relative hyperbolic probability graph which determines relative probability to get final probability (based only on intent-action ontology) to determine a set of most probable 'text2' on the basis of intent-action ontology.
  • a tone analyser analyses these sets of texts in order to determine the most appropriate text on the basis of both: 1) the intent-action ontology (IAO) module; and 2) the tone analysis module.
  • The IAO module determines relevance while the tone analysis module analyses the flow of emotions. A relevant story has higher priority than a story with the right flow of emotions. Hence, relevance is judged first and then tone analysis is done.
  • FIGURE 2 illustrates flow and relationship among data points with a working example.
  • FIGURE 3 represents actual representation of the example of Figure 2.
  • a linear path representor represents each route or path between nodes of a given application by a straight line using 2-dimensional coordinate geometry.
  • the direction of flow is one along which X-coordinate increases.
  • the number of lines is determined.
  • the line (path) with maximum number of words is plotted first with any random slope.
  • the words in that flow are assigned with coordinates of points on that line with an increasing value of X.
  • the next longest sequence is considered which has a word common with the first flow. And the same process is repeated.
  • coordinates are easy to plot, process, store in database, and access whenever needed. Further, when new words are added i.e. the database is upgraded, only the new words are given coordinates.
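  • The linear path representation described above can be sketched as follows. This is one possible reading, not the source's implementation: the slopes are supplied by the caller (the source allows any slope for the first line), words are spaced one unit apart in x, and each later route is anchored at the first word it shares with routes already placed. The function name `place_routes` is hypothetical.

```python
def place_routes(routes, slopes):
    """Assign 2-D coordinates to words lying on straight-line routes.

    routes -- list of word sequences (each a unidirectional flow)
    slopes -- one slope per route (assumed distinct; chosen by the caller)

    The longest route is placed first, starting at the origin. Each later
    route must share a word with routes already placed and is drawn through
    that word's point, so x still increases along the direction of flow.
    Words that already have coordinates keep them.
    """
    order = sorted(range(len(routes)), key=lambda i: len(routes[i]), reverse=True)
    coords = {}
    for rank, i in enumerate(order):
        route, slope = routes[i], slopes[i]
        if rank == 0:
            anchor_index, (x0, y0) = 0, (0.0, 0.0)
        else:
            shared = [k for k, w in enumerate(route) if w in coords]
            if not shared:
                raise ValueError("route shares no word with routes already placed")
            anchor_index = shared[0]
            x0, y0 = coords[route[anchor_index]]
        for k, word in enumerate(route):
            if word not in coords:
                step = k - anchor_index
                coords[word] = (x0 + step, y0 + slope * step)
    return coords
```

  • Because existing words keep their coordinates, adding a new route later only assigns coordinates to its new words, which matches the upgrade behaviour described above.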
  • a periodic route intersection representor defines and represents each route or path by a periodic curve. As with the linear path representor, the longest sequence of words is selected first.
  • the most basic path equation, y = sin(nx), is selected.
  • the value of n can be increased and all paths can first be plotted. All words are then plotted in increasing order of x and positioned depending upon the routes which they traverse. Since routes intersect multiple times, there can be any number of common words. Further, the value of n specifies the specific route.
  • a split path address naming mechanism is configured to provide a unique identification code to every word in a tree structure.
  • FIGURE 4 illustrates a split path address naming method where it depicts one such associative relationship.
  • the data is divided into number of trees. Each tree has a number. The first word is assigned 0. Then, each branch is assigned a number. Now, there are multiple paths, which are numbered in order. If a certain word leads to next word and does not branch, then that particular path is assigned 1.
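  • The split path address naming described above can be sketched as follows. This is an interpretation under stated assumptions: each word has a single parent within its tree, an address is the tree number followed by 0 for the first word and then one branch number per step, and branches are numbered from 1 in the order given. The function name `assign_addresses` and the tuple format are hypothetical.

```python
def assign_addresses(tree_id, root, children):
    """Give every word in one tree a unique path address.

    tree_id  -- number of the tree
    root     -- first word of the tree (assigned 0)
    children -- dict mapping a word to the ordered list of words it leads to

    Assumption: each word appears once in the tree. A word that leads to a
    single next word without branching gives that path the number 1.
    """
    addresses = {root: (tree_id, 0)}
    stack = [root]
    while stack:
        word = stack.pop()
        for branch_number, child in enumerate(children.get(word, []), start=1):
            addresses[child] = addresses[word] + (branch_number,)
            stack.append(child)
    return addresses
```

  • With `children = {"thief": ["steals", "runs"], "steals": ["money"]}` and tree number 1, the word "money" receives the address `(1, 0, 1, 1)`: tree 1, first word, first branch, then the single non-branching path.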
  • I-A index can be computed.
  • Step 5: The associated UIC is split in such a way that it gives its powers along different routes.
  • Words from text1 and text2 are considered first for calculating I-A index 1 and then p1 (probability 1, which is the probability of occurrence of text2 when text1 has already occurred). Words from text2 and text3 are then considered for calculating I-A index 2 and then p2 (probability 2, which is the probability of occurrence of text3 when text2 has already occurred).
  • a prime number route mapping mechanism is configured to represent graphical data in tabular form.
  • Step 6: Every route is considered one by one and a list of primes is taken in sequence. Based on this, I-A index is calculated using the following database.
  • FIGURE 5 illustrates the prime number route mapping method.
  • Every unidirectional flow of words can be represented by a route.
  • a unique prime number is assigned to each route using the system and method of this invention. Starting from each node, the power of the prime number is increased. This power of the prime number is a unique identification for that word.
  • the product of all the numbers assigned to it in different routes is its unique identification number. The benefit of using this method is that from one number, all the routes in which the word is present, and also its position in each route, can be determined. These details are found by factorization of the unique identification number. Consider text1 and text2 first and then text2 and text3 for determining relationship attributes and quantifying them.
  • association of nodes, as represented by this prime number route mapping method, allows for forming a network of nodes in accordance with weights associated with associated prime numbers per node.
  • each network of nodes is formed and represented uniquely.
  • Intent-Action ontology and naive-Bayes are not applied to all 3 texts (text1, text2, text3) at once, as short stories take turns and twists in the middle part and new nodes start in text2. Analysis is done in two parts.
  • P1 and P2 are calculated by substituting I-A index 1 and I-A index 2, respectively, in the given equation of hyperbola.
  • a probability determinator is configured to determine probability of nodes.
  • Table 2 depicts occurrence of simple words. It becomes further complex as the dependency and association increases.
  • the Intent-Action index of the relationship between these two texts can be calculated from the above given table.
  • I-A index to relative probability function is defined as
  • I-A index is on x and relative probability is on y
  • FIGURE 6 depicts the behaviour of probability vs. index with typical range of x.
  • a top pre-defined percentage of drafts are selected for tone analysis.
  • the tone-matching algorithm determines deviation in flow of emotions by matching the graph of draft and ideally expected graph. This graph denotes the emotional tone behind the set of sentences. It is assumed that, in any good draft, there are no abrupt changes and hence a smooth curve is expected.
  • a Selector mechanism is configured to select nodes. Step 9:
  • the system and method plots p index versus number of words, from start, for all texts (text1, text2, text3).
  • the system and method considers plotted points of text1 and text3 only.
  • FIGURE 7 illustrates sentiment trend.
  • the tone analyzer first analyses all words in text1 and text3 and calculates a positivity index.
  • the output is a database of words which strongly suggest the emotional tone with magnitude (positive or negative).
  • DEVIATION is defined as sum of magnitudes of areas between the two graphs.
  • the delivered order can consider emotional flow and intent of delivering the message.
  • FIGURE 8 illustrates ranking wise success.
  • FIGURES 9a and 9b illustrate success and emotional trends.
  • a core intent node is a node which, irrespective of its frequency of occurrence, impacts one or more actions in the region under observation.
  • a border node is a node that shows a relationship with more than one prominent action but does not directly impact any action.
  • an action connected node is a set of nodes connected by one or more actions.
  • INTENT-ACTION ONTOLOGY implies that each node points into various directions. And if the story proceeds along any of those directions action nodes (words) related to that direction are found.
  • PNRMM is mathematical representation of this database.
  • Various flows of nodes (words) are predetermined; each flow is called a route and is represented by a prime number.
  • Nodes (words) in a flow are represented by consecutive powers of that specific prime number.
  • Nodes (words) present in multiple routes are represented by the product of powers of the primes associated with the different routes. This relationship is further represented as a multi edge graph context model and semi-graph.
  • this system and method is divided in 3 major phases:
  • the core nodes are identified. These are the nodes (words) associated with one or more action words.
  • Core intent node: a node which, irrespective of its frequency of occurrence, impacts one or more actions in the region under observation.
  • [IA] refers to set of impacted prominent actions
  • Border node: a node that shows a relationship with more than one prominent action but does not directly impact any action.
  • Action connected node: a set of nodes connected by one or more actions.
  • FIGURE 10 illustrates relationships.
  • Phase III: Here multiple concepts are associated to build the central concept and associated concepts. Every additional node/concept added to the graph results in information gain. This information gain depends on the action reachability of the particular node. Standard entropy formulation is used to define information gain. Hence, information gain is defined as:
  • conditional entropy will be calculated iteratively to represent overall information gain.
  • a concept path in given context is derived and for that the modified Bayesian is used.
  • Context helps to decide the context nodes to be traversed.
  • primary concept, secondary concept, and action reachable points, along with preceding information, contribute to this concept traversal.
  • simple Bayesian to decide the best path for given context.
  • context can be derived from core concept, supporting word, prelude and situational parameters.
  • context could be:
  • in a multi-graph, multiple concepts are connected by zero or more edges.
  • the association is represented through connection. The degree of a node depends on the action reachability of different concepts.
  • a representative multi-graph with contextual traversal is discussed ahead.
  • FIGURE 11 illustrates core concept node demonetization.
  • A context from the opposition is represented with a red edge. This context associates multiple concepts in a particular order. There are multiple edges connecting different concepts.
  • the demonetization is a core concept since there are five action-reachable and three direct-reachable concepts from this node.
  • the concept drift or adversarial point is identified based on degree transformation of nodes while traversing. These points are used for concept
  • A node is action reachable if there is an action edge between two nodes. There can be any number of action edges, and a node can be action reachable through other nodes. The same is true for direct reachability.
  • FIGURE 12 illustrates success for concept context mapping
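The Phase III description above states that a standard entropy formulation is used to define information gain. As a minimal sketch of that standard formulation only (entropy of a label set minus the size-weighted entropy of its partitions), and not of how the specification ties it to action reachability, one could write the following. The function names `entropy` and `information_gain` and the sample labels are illustrative assumptions.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a non-empty sequence of labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(labels, groups):
    """Entropy of `labels` minus the size-weighted entropy of each group.

    `groups` is assumed to be a partition of `labels`, for example the
    labels seen before and after a candidate node is added.
    """
    total = len(labels)
    conditional = sum(len(group) / total * entropy(group) for group in groups)
    return entropy(labels) - conditional


# Hypothetical labels: a partition that separates them perfectly gains 1 bit.
labels = ["intent", "intent", "action", "action"]
perfect_split = [["intent", "intent"], ["action", "action"]]
useless_split = [["intent", "action"], ["intent", "action"]]
```

Computing the conditional term again after each added node would correspond to the iterative calculation of conditional entropy that the text mentions.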

Abstract

An intelligent context based prediction system comprising: input nodes to accept user inputs; mapping device to read each node's vector details and forming a first concept vector link, a second concept vector link; a first context vector link, a second context vector link; a concept determinator; a context determinator; a concept mapping device; a context mapping device; a 2-part determinator to determine a network of nodes in terms of intent-to-action relation between said first node and said last node to determine relative probability for selection of said middle node from a pool of middle nodes; and a 3-part determinator to confirm a determined network of nodes in terms of intent-to-action relation between said first node, said middle node, and said last node to confirm relative probability after selection of said middle node from a pool of middle nodes.

Description

AN INTELLIGENT CONTEXT BASED PREDICTION SYSTEM
FIELD OF THE INVENTION:
This invention relates to the field of computer engineering, computer architecture, and neural networks.
Particularly, this invention relates to the field of machine learning.
Specifically, this invention relates to an intelligent context based prediction system and method.
BACKGROUND OF THE INVENTION:
Every day, we deal with a lot of information. It generally comes in unstructured form. Natural Language Processing tackles the mining of unstructured data very elegantly. Its tasks range from parsing and tokenization to disambiguation and more. Throughout a given text, concepts are flowing. These concepts are sometimes represented partially, while in other cases they go through many transitions. The concepts come and go: sometimes there is an abrupt change in concept, and in other cases there is a smooth transition from one concept to another. Concepts come in context. Context has more life in documents as compared to concept. Multiple concepts converge in a particular context and form a new concept. A particular concept in a given context may mean something completely different than it does in some other context.
The world is about relationships, and so is mathematics. Different data structures, such as graphs and trees, have emerged to represent these relationships. When it comes to unstructured data, these structures need to be modified. For this purpose mathematicians introduced Directed Acyclic Graphs, Fuzzy Graphs, Multi Edge Graphs and Semi-graphs. In this paper, a Multi-Edge fuzzy graph is used to represent this unstructured relationship.
In a Fuzzy Graph, any node is related to another node, and these relationships are defined with a pair of probabilities.
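As a minimal sketch of such a structure, a multi-edge fuzzy graph can be held as a mapping from node pairs to lists of parallel edges, each carrying a pair of probabilities. The class names `FuzzyEdge` and `MultiEdgeFuzzyGraph`, the field names `p1` and `p2`, the edge labels, and the sample words are all illustrative assumptions. The specification does not state what each of the two probabilities means, so the sketch does not interpret them.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FuzzyEdge:
    """One labelled edge; what p1 and p2 each mean is left open (assumption)."""
    label: str
    p1: float
    p2: float


class MultiEdgeFuzzyGraph:
    """Nodes are words or concepts; a node pair may hold several parallel edges."""

    def __init__(self):
        self._edges = defaultdict(list)  # (source, target) -> list of FuzzyEdge

    def add_edge(self, source, target, label, p1, p2):
        for p in (p1, p2):
            if not 0.0 <= p <= 1.0:
                raise ValueError("probabilities must lie in [0, 1]")
        self._edges[(source, target)].append(FuzzyEdge(label, p1, p2))

    def edges(self, source, target):
        """Return a copy of all parallel edges from source to target."""
        return list(self._edges.get((source, target), []))


# Hypothetical usage: two parallel edges between the same pair of concepts.
graph = MultiEdgeFuzzyGraph()
graph.add_edge("demonetization", "cash", "action", 0.8, 0.3)
graph.add_edge("demonetization", "cash", "context", 0.5, 0.6)
```

Keeping a list per node pair is what allows multiple edges between the same two concepts, which an ordinary adjacency matrix would not.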
Majority of the problems in the environment are partially observable. In this environment we need to use information and intelligence effectively to arrive at the best possible solution. The same analogy can be applied to problems represented in text form.
The continuous text may include different concepts. There can be transition from one concept to another. This transition may be sudden or progressive or smooth. The dominant concept at a particular location may have contextual impact due to description before the occurrence of that sentence. Thus context may progress with concept and representing this unique association with typical graphs becomes difficult, as it does not follow standard graph rules. What can we do about it? Where is the context here? There are many text artifacts floating related to same concept. Which one is related to prime or dominant concept?
There are five core steps in total to determine a concept map:
1. Text organization: Organizing text with flow and indicators.
2. Concept retrieval (first level): The multiple concepts are retrieved from text.
3. Associating concepts: Two or more concepts are associated; this is done using context.
4. Weaving concepts to determine a concept map: These associations and relationships determine concept maps.
5. Determine external parameters and concept
In many real life situations, end results and basic starting data are known. To deduce conclusive evidence or to build a holistic picture, one needs to find out hidden information and missing text.
There is a need to solve many real life problems by identifying gaps in data.
Text decoding and determination of missing text has been a challenging exercise over the years. It can help solve problems from different domains. It may be completion of story or even solving a criminal case. Researchers are working extensively in this area.
According to prior art, all focus was on resolving it through Term Frequency and Inverse Document Frequency (TFIDF). Term frequency based approaches were deployed to determine document clusters. TFIDF was extensively used to resolve document queries and to find associations among various queries. Early on, researchers also worked on analyzing large numbers of documents. The work included methods built around keyword and bigram identification. The objective was to cluster the documents.
The relevance between words is at the center of all these experimentations. Researchers worked on extracting symbolic knowledge from different knowledge sources.
For extracting symbolic knowledge, context and association are two key factors. Also, concept and association are two key factors.
According to prior art, different names, keywords and phrases are used for mining documents. Researchers even worked on finding textual similarity based on special names.
In recent times, researchers focused on determining context. For determining context, researchers used association rule mining and Bayes classification.
In case where additional information is made available in due course of time, incremental learning is recommended. Researchers worked on incremental learning concepts to optimize learning time and make effective use of already built knowledge.
For incremental learning closeness factor based technique and similar approaches are used.
Researchers have worked on unsupervised classification using absolute value inequalities.
Furthermore, Multi-Modal Optimization (MMO) is crucial for problem solving. Population based meta-heuristic is effective in solving MMO problems. Researchers presented updated surveys and presented comparison to select best path for problem solving.
The work on building systemic perspective and taking multi perspective view to identify holistic solution is also a part of prior art work. In this work, dependencies among sub parts of the system are used to understand impact of selection.
The text flow comprises sentiment, feelings, and objectives. Although a story or a set of continuous text flows in a particular direction, it goes through emotional spikes. Negativity and positivity are generally identified by using sentiment analysis. Keyword and NLP based sentiment analysis is used for product ranking and movie ranking. Researchers even used crowd sourcing along with sentiment analysis to rank products.
Also, according to the prior art, there are many research attempts reported to use Artificial Intelligence, for literary creativity.
According to prior art, researchers have used ε-distance based density clustering to cluster distributed data.
The literature shows a compelling intent of researchers to head in the direction of creativity, using Machine Learning and Artificial Intelligence to solve some creativity problems. These problems range from story completion and poem completion to identifying relevant text. The emotional tone and intent-action can play a major role. In this research, the novel concept of representation of intent-action is supported with mathematical mapping to identify tone. The tone helps to select the best piece of text for a given context, emotional flow and problem.
With reference to this discussion it becomes very necessary to understand these two terms: Concept and Context.
Concept and context have been the keywords of unstructured data research for many years. Researchers defined concepts and contexts in different ways and contributed to information decoding. When information is to be used, it needs to be organized and meaning needs to be derived from it with reference to the tasks at hand. One of the major driving factors remains the domain of use. Context is about the situation at hand, while concept is the idea or set of ideas depicted in the text. Keywords and key phrases are at the core of this research, and frequency of occurrence has played a key role in the research of text mining and concept determination. Typical preprocessing is used to remove the stop words from the text of interest after tokenizing it. Over the years, frequency based algorithms dominated text mining. Term Frequency and Inverse Document Frequency (TFIDF) was a first breakthrough in assigning meaning to unstructured scattered text. The similarities among documents and the differences among them were used for clustering. The frequency of words was a decision-making driver in these clustering efforts.
Many researchers have used these terms and even defined them. The dictionary meaning of context points to situation and circumstances, which is well understood. There is also the meaning of something immediately preceding what is written or spoken. The context is the situation associated with the social environment whereby knowledge is acquired and processed. One may need to collate multiple concepts in a given context to derive meaning, to provide meaningful impact, and to help decision-making. Further, there can be very personalized collation. But again, it is about context. Knowing the context allows us to collate and process information as per the demand of the situation.
In fuzzy graph, fuzziness is associated with edges of the graph.
There is a need for a new system and method for Adversarial Concept Mapping (AdCMa). Adversarial is typically meant as involving or presenting or characterized by opposition or conflict or contrary view. The adversarial concept mining does not focus on flow of concept or smoothness but is, typically, marked by peaks and valleys with reference to distortions. The peaks and valleys under observation build transition nodes for concept mapping and even in case of our fuzzy multi edge graph.
It is to be understood that it is not simply about frequently occurring words or missing information but about collation of scattered concepts and the association among them. This can be used for representation and collation of information that may come in the form of emails, text data, and messages. Here, concept making and breaking is closely associated with context. Frequent concept drift makes the task of decoding difficult. Simply using a few keywords or frequently occurring statements cannot determine these concepts. It requires deriving a concept map and transitions among multiple concepts with reference to context. The concept regions are identified using the valleys in the concept flow. The concept is determined based on intent and action association. There is a need to understand change in flow and to understand relationships between dips and peaks, which can be used for concept deriving.
The methods described above worked very well for structured documents and reasonably well for certain categories of unstructured ones. Extensions of this method to bi-gram based and tri-gram based methods were tested, and the evident conclusion restricted the majority of efforts to bigram-based approaches. This helped to achieve clustering of documents in relevant groups and classes. These methods evolved to weighted TFIDF based methods. Here, calculating weights was the most challenging task, and it was simplified by assigning weights based on frequency of occurrence. Later, some additional parameters were also included in weight calculation, like the position of the word in the document. This was used for concept determination and qualification. In these cases, researchers put efforts into defining a concept. Mathematically, a concept is defined as a corpus of words. It uses the association and relationship among words as the deciding factor. The association is further used to derive meaning and knowledge. The association and mapping are used to derive symbolic knowledge from a set of documents and information artifacts. Many researchers started working on defining the context. Context based relationships are used to derive these associations.
Researchers used dictionary based keyword search and bigrams for document mining. Textual association and disambiguation for deriving meaning was also used by many researchers. Then there was a trend of personalized delivery and personalized processing. Context gained importance and many researchers turned to context based processing and classification. Some researchers worked on associating multiple contexts to derive overall meaning. Traditionally, researchers were focused on place, location and time to derive the context. Slowly the definition of the context became broader. Some researchers deployed other algorithms like association rule mining and Bayes classification for context determination. World evolution and learning has always been incremental in nature. Building on what one has is the organic property of the world. This additional information coming from different sources makes it necessary to correct one's learning vectors. Hence, there is a pressing need for incremental learning. This began a hunt by researchers to learn incrementally and adaptively. The major incentive for this research is to minimize learning time. Slowly, researchers looking for knowledge augmentation joined this movement. With the motivation of protecting the system from losing already learnt and acquired knowledge, some serious research efforts on incremental learning were reported. In incremental learning, it is necessary to find out what to learn and what not to learn. Researchers used a closeness factor based approach for the same. Semi-supervised learning allows learning from labeled as well as unlabeled data. Use of absolute value inequalities for semi-supervised learning showed a lot of promise.
Researchers used Gaussian mixture models for multimodal optimization problems. Population based meta-heuristic is also used for best path selection. Information never comes as one unit. It becomes available in parts and over time. Deriving the meaning out of these parts of information is necessary to take the right decision. Systemic and multi-perspective learning is required to address this issue. In this work of the prior art, the focus was on building a systemic view. Words and series of words express emotions. Emotions can depend on context, or they can be like an impulse based on the intensity and positioning of the word. The simple sentiment for selection of a product is derived from feedback from users. The text flow also determines emotions. Sentiment analysis is strongly mapped to decisive and expressive keywords. Researchers and professionals used it for movie, product, and book rankings. For this purpose crowd sourcing and crowd intelligence were also used. Text is always associated with creativity. In this quest, a few researchers worked on creativity and learnability mapping. Researchers also used Artificial Intelligence and Machine Learning for different creative activities like expressing in poetic words, writing fiction, and creative assimilations. Learning can happen in compartments, and combining those learnings is a difficult task. Researchers used ε-distance based density clustering to cluster distributed data. The literature shows an impetus to create real and intelligent learning through association and inference. The compelling intent of researchers helped to solve some very difficult and challenging problems in the domains of creativity. The emotional tone and intent-action can play a major role while solving this problem of completing stories with effective mapping of hidden and missing text. Deriving a concept with reference to context remains a challenge. Researchers even made use of positional significance.
There is a need for a multi-edge graph based system and method to derive this mapping.
OBJECTS OF THE INVENTION:
An object of the invention is to provide a system and method aimed towards identifying gaps in data.
Another object of the invention is to provide a system and method aimed towards analyzing multidimensional intent and action relationships.
SUMMARY OF THE INVENTION:
According to this invention, there is provided an intelligent context based prediction system comprising:
input nodes configured to accept user inputs;
mapping device configured to read each node's vector details and to further map it into a network, in terms of node position, node working, association between nodes, flow between nodes, transitions of flow between nodes or in a networked environment of nodes, context vector associated with each node, and concept vector associated with each node, said network comprising a first node and a last node, associated with middle node, said middle node linking said first node and said last node in terms of a concept vector and a context vector, thereby forming a first concept vector link between a first node and a middle node, a second concept vector link between a middle node and a last node; a first context vector link between a first node and a middle node, a second context vector link between a middle node and a last node;
a concept determinator configured to extract at least a concept vector of each node associated with its position, content, association with other nodes in terms of associated nodes' position and content;
a context determinator configured to extract at least a context vector of each node associated with its user, association with a node, and association of the node in the networked environment;
a concept mapping device configured to derive a concept map, comprising concept vector links;
a context mapping device configured to derive a context map, comprising context vector links;
a 2-part determinator configured to determine a network of nodes ("determined network of nodes") in terms of intent-to-action relation between said first node and said last node to determine relative probability for selection of said middle node from a pool of middle nodes; and
a 3-part determinator configured to determine and confirm a determined network of nodes ("confirmed network of nodes") in terms of intent-to-action relation between said first node, said middle node, and said last node to confirm relative probability after selection of said middle node from a pool of middle nodes.
Typically, said mapping device comprises an order determinator to track order of nodes in terms of flow of data from start to end.
Typically, said concept mapping device comprises one or more processors to perform the steps of:
- organizing data vector in order to understand flow of data and indicators, thereof;
- retrieving concept using concept vector data per node;
- associating concept vectors of various nodes such that two or more concept vectors are associated using context vectors;
- weaving concept vectors to determine a concept map using data relating to the association of data relating to associating concept vectors relating to relationships of organized data vectors and retrieved data vectors; and
- determining external parameters which affect said concept vectors and, therefore, said concept map.
Typically, said concept determinator comprises:
tokenizer configured to locate independent meaningful tokens from each of said nodes;
a remover mechanism configured to obtain important part of each node's content removing the least important part; and
a lemmatisation mechanism configured to identify relevant multi-forms of the content of a node with reference to its importance;
in order to determine concept vectors associated with each networked node.
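The three stages named above (tokenizer, remover mechanism, lemmatisation mechanism) can be sketched as a small pipeline. This is a minimal illustration only. The stop-word list, the suffix rules standing in for a real lemmatiser, and the function names are assumptions that the specification does not provide.

```python
import re

# Hypothetical stop-word list; the specification does not enumerate one.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "on"}

# Hypothetical suffix rules standing in for a real lemmatiser.
SUFFIX_RULES = (("ies", "y"), ("ing", ""), ("ed", ""), ("s", ""))


def tokenize(text):
    """Split node content into lowercase alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def remove_unimportant(tokens):
    """Drop the least important tokens (here: stop words only)."""
    return [t for t in tokens if t not in STOP_WORDS]


def lemmatise(token):
    """Map a token to a crude base form by stripping one known suffix."""
    for suffix, replacement in SUFFIX_RULES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: len(token) - len(suffix)] + replacement
    return token


def concept_terms(text):
    """Tokenise, filter, and lemmatise the content of one node."""
    return [lemmatise(t) for t in remove_unimportant(tokenize(text))]


terms = concept_terms("The stories are linked to nodes")
```

The resulting terms are only the raw material for a concept vector. How the vector itself is built from position and association data is not covered by this sketch.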
Typically, said context determinator comprises:
tokenizer configured to locate independent meaningful tokens from each of said nodes;
a remover mechanism configured to obtain important part of each node's content removing the least important part; and
a lemmatisation mechanism configured to identify relevant multi-forms of the content of a node with reference to its importance;
in order to determine context vectors associated with each networked node.
Typically, said system comprises a prime number route mapping mechanism configured to represent content of a node using a prime number route mapping method and to classify each node into one of at least three types: a) intent nodes, b) action nodes, and c) concept nodes; said classification being done by comparing content with pre-populated databases corresponding to content segregated with respect to each of said intent node, said action node, and said concept node.
Typically, said 2-part determinator comprises:
a comparator configured to compare content of said first node and said last node with standard ontology;
an assignor configured to assign first concept vector link, second concept vector link, first context vector link, and second context vector link between said two nodes with certain specific attributes and quantifying those attributes;
a plotter configured to plot relative hyperbolic probability graph, concerning each of said links, in order to determine relative probability of each link and selecting a middle node based on highest probability of links; and
a tone analyzer, configured to determine a top pre-defined percentage of nodes selected for tone analysis, wherein a tone-matching algorithm determines deviation in flow, by plotting a matched curve, by matching the graph of said with ideally expected graph in terms of smoothness of said matched curve.
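The deviation that the tone-matching algorithm measures is described elsewhere in this document as the sum of magnitudes of the areas between the two graphs. A minimal sketch of that quantity follows, assuming both curves are sampled at the same x positions and approximating the area with the trapezoid rule. The function name `deviation` and the sample values are illustrative assumptions, and the approximation is coarse where the curves cross between two samples.

```python
def deviation(xs, draft, expected):
    """Approximate the total unsigned area between two sampled curves.

    xs       -- increasing sample positions (e.g. number of words from start)
    draft    -- tone values of the candidate text at those positions
    expected -- tone values of the ideally expected graph at those positions
    """
    if not (len(xs) == len(draft) == len(expected)):
        raise ValueError("all sequences must have the same length")
    total = 0.0
    for i in range(len(xs) - 1):
        width = xs[i + 1] - xs[i]
        left = abs(draft[i] - expected[i])
        right = abs(draft[i + 1] - expected[i + 1])
        total += 0.5 * (left + right) * width  # trapezoid on |difference|
    return total


# Hypothetical data: a single emotional spike against a flat expected curve.
spike = deviation([0, 1, 2], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0])
```

Under this reading, a candidate with a smaller deviation follows the expected emotional flow more closely.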
Typically, said 3-part determinator comprises:
a comparator configured to compare content of said first node, said selected middle node, and said last node with standard ontology;
an assignor configured to assign first concept vector link, second concept vector link, first context vector link, and second context vector link between said three nodes with certain specific attributes and quantifying those attributes;
a plotter configured to plot relative hyperbolic probability graph, concerning each of said links, in order to confirm highest probability of each link and confirming a middle node based on highest probability of links; and
a tone analyzer, configured to confirm a top pre-defined percentage of nodes selected for tone analysis, wherein a tone-matching algorithm confirms deviation in flow, by plotting a matched curve, by matching the graph of said with ideally expected graph in terms of smoothness of said matched curve.
Typically, system is configured to form a plurality of determined network of nodes, said system comprising:
a node classifier further configured to classify each of said nodes into one of: a) an intent node; b) an action connected node; c) a concept node; and d) a border node, at least one node being a part of a plurality of confirmed network of nodes;
a prime number route mapping mechanism configured to represent each networked node upon classification by said classifier in order to confirm unidirectional flow of nodes in a determined network of nodes, said mechanism comprising a processor to perform the steps of:
i) assigning a unique prime number to each route from start to end for each confirmed network of nodes;
ii) starting from each node, the power of prime number is increased, such that this power of prime number is a unique identification for that node;
iii) for nodes present in multiple routes, the product of all numbers assigned to it in different routes is its unique identification number, so that each network of nodes is formed and represented uniquely; and
iv) factorizing said unique identification number to find out all routes for a node and also its position in a network.
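Steps i) to iv) can be illustrated with a short sketch. It assumes that the first node of a route receives power 1 (power 0 would give every route start the identifier 1) and that a word occurs at most once per route. The function names and the sample routes are hypothetical and are not taken from the specification.

```python
def first_primes(count):
    """Return the first `count` primes by trial division."""
    primes = []
    candidate = 2
    while len(primes) < count:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def assign_ids(routes):
    """Give each route a prime and each word the product of prime**position."""
    primes = first_primes(len(routes))
    ids = {}
    for prime, route in zip(primes, routes):
        for position, word in enumerate(route, start=1):
            ids[word] = ids.get(word, 1) * prime ** position
    return primes, ids


def decode(uid, primes):
    """Factorise an identifier into {route_index: position_in_route}."""
    placement = {}
    for index, prime in enumerate(primes):
        power = 0
        while uid % prime == 0:
            uid //= prime
            power += 1
        if power:
            placement[index] = power
    return placement


# Hypothetical routes sharing the word "army".
routes = [["king", "sent", "army"], ["army", "won", "war"]]
primes, ids = assign_ids(routes)
army_placement = decode(ids["army"], primes)  # {0: 3, 1: 1}
```

Here "army" is third on route 0 (prime 2) and first on route 1 (prime 3), so its identifier is 2**3 * 3**1 = 24, and factorising 24 recovers both positions.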
According to this invention, there is also provided an intelligent context based prediction method comprising the steps of:
accepting user inputs through input nodes;
reading each node's vector details and further mapping it into a network, in terms of node position, node working, association between nodes, flow between nodes, transitions of flow between nodes or in a networked environment of nodes, context vector associated with each node, and concept vector associated with each node, said network comprising a first node and a last node, associated with middle node, said middle node linking said first node and said last node in terms of a concept vector and a context vector, thereby forming a first concept vector link between a first node and a middle node, a second concept vector link between a middle node and a last node; a first context vector link between a first node and a middle node, a second context vector link between a middle node and a last node;
extracting at least a concept vector of each node associated with its position, content, association with other nodes in terms of associated nodes' position and content;
extracting at least a context vector of each node associated with its user, association with a node, and association of the node in the networked environment;
deriving a concept map, comprising concept vector links;
deriving a context map, comprising context vector links;
determining a network of nodes ("determined network of nodes") in terms of intent-to-action relation between said first node and said last node to determine relative probability for selection of said middle node from a pool of middle nodes; and
confirming a determined network of nodes ("confirmed network of nodes") in terms of intent-to-action relation between said first node, said middle node, and said last node to confirm relative probability after selection of said middle node from a pool of middle nodes.
Typically, said step of reading each node's vector details comprises a further step of tracking order of nodes in terms of flow of data from start to end.
Typically, said step of deriving a concept map comprises the steps of:
- organizing data vector in order to understand flow of data and indicators, thereof;
- retrieving concept using concept vector data per node;
- associating concept vectors of various nodes such that two or more concept vectors are associated using context vectors;
- weaving concept vectors to determine a concept map using data relating to the association of concept vectors and data relating to the relationships of the organized data vectors and the retrieved concepts; and
- determining external parameters which affect said concept vectors and, therefore, said concept map.
Typically, said step of deriving a concept map comprises the steps of:
locating independent meaningful tokens from each of said nodes;
obtaining important parts of each node's content removing the least important part; and
identifying relevant multi-forms of the content of a node with reference to its importance;
in order to determine concept vectors associated with each networked node.
Typically, said step of deriving a context map comprises the steps of:
locating independent meaningful tokens from each of said nodes;
obtaining important part of each node's content removing the least important part; and
identifying relevant multi-forms of the content of a node with reference to its importance;
in order to determine context vectors associated with each networked node.
Typically, said system comprises a prime number route mapping mechanism configured to represent content of a node using a prime number route mapping method and to classify each node into one of at least three types: a) intent nodes, b) action nodes, and c) concept nodes; said classification being done by comparing content with pre-populated databases corresponding to content segregated with respect to each of said intent node, said action node, and said concept node.
Typically, said step of determining a network of nodes comprises the steps of:
comparing content of said first node and said last node with standard ontology;
assigning first concept vector link, second concept vector link, first context vector link, and second context vector link between said two nodes with certain specific attributes and quantifying those attributes;
plotting relative hyperbolic probability graph, concerning each of said links, in order to determine relative probability of each link and selecting a middle node based on highest probability of links; and
determining a top pre-defined percentage of nodes selected for tone analysis, wherein a tone-matching algorithm determines deviation in flow, by plotting a matched curve, by matching the graph of said with ideally expected graph in terms of smoothness of said matched curve.
Typically, said step of confirming a network of nodes comprises the steps of:
comparing content of said first node, said selected middle node, and said last node with standard ontology;
assigning first concept vector link, second concept vector link, first context vector link, and second context vector link between said three nodes with certain specific attributes and quantifying those attributes;
plotting relative hyperbolic probability graph, concerning each of said links, in order to confirm highest probability of each link and confirming a middle node based on highest probability of links; and
confirming a top pre-defined percentage of nodes selected for tone analysis, wherein a tone-matching algorithm confirms deviation in flow, by plotting a matched curve, by matching the graph of said with ideally expected graph in terms of smoothness of said matched curve.
Typically, said method configured to form a plurality of determined network of nodes, said method comprises the steps of:
classifying each of said nodes into one of: a) an intent node; b) an action connected node; c) a concept node; and d) a border node, at least one node being a part of a plurality of confirmed network of nodes;
representing each networked node upon classification by said classifier in order to confirm unidirectional flow of nodes in a determined network of nodes, said mechanism comprising a processor to perform the steps of:
i) assigning a unique prime number to each route from start to end for each confirmed network of nodes;
ii) starting from each node, the power of prime number is increased, such that this power of prime number is a unique identification for that node;
iii) for nodes present in multiple routes, the product of all numbers assigned to it in different routes is its unique identification number, so that each network of nodes is formed and represented uniquely; and
iv) factorizing said unique identification number to find out all routes for a node and also its position in a network.
BRIEF DESCRIPTION OF THE ACCOMPANYING DRAWINGS:
The invention will now be described in relation to the accompanying drawings, in which:
FIGURE 1 illustrates a flow diagram of the method of this invention;
FIGURE 2 illustrates flow and relationship among data points with a working example;
FIGURE 3 represents actual representation of the example of Figure 2;
FIGURE 4 illustrates a split path address naming method where it depicts one such associative relationship;
FIGURE 5 illustrates the prime number route mapping method;
FIGURE 6 depicts the behaviour of probability vs. index with typical range of x;
FIGURE 7 illustrates sentiment trend;
FIGURE 8 illustrates ranking wise success;
FIGURES 9a and 9b illustrate success and emotional trends;
FIGURE 10 illustrates relationships;
FIGURE 11 illustrates core concept node demonetization, where a context from the opposition is represented with a red edge;
FIGURE 12 illustrates success for concept context mapping; and
FIGURE 13 illustrates phases (or steps) for a thought mapper.
DETAILED DESCRIPTION OF THE ACCOMPANYING DRAWINGS:
For the purposes of this invention, a 'node' is defined as a connected object or device or application or data in a network. Typically, a node is defined by parameters such as its position, behaviour, and value. A node's position and value defines the network behaviour. A node's position defines its relativity with connected nodes and cumulatively defines the network behaviour.
In at least an embodiment, a 'node' is defined by means of a context. A context vector assigns a specific behaviour, weight, direction, and associative capabilities to a node which means that the node's position in a network is defined, a node's association with its connected node is defined, a node's relative position relative to associated nodes is defined, a node's input is defined (thereby defining behaviour), and a node's output is defined (thereby defining behaviour) by this context vector. A rule engine may define such context vectors as outputs.
In at least an embodiment, a 'node' is defined by means of a concept. A concept vector assigns a specific behaviour, weight, direction, and associative capabilities to a node which means that the node's position in a network is defined, a node's association with its connected node is defined, a node's relative position relative to associated nodes is defined, a node's input is defined (thereby defining behaviour), a node's output is defined (thereby defining behaviour) by this concept vector. A rule engine may define such concept vectors as outputs.
For the purposes of this invention, the term, 'context', is defined as a multi-faceted vector which affects at least a node in its networked environment. Context is a function of user, a situation depicted in a particular application, and a situation in which the application is used. It is to be noted that these applications are also nodes and may be associated with a user, its actions, and a relative environment. Hence, perspective of the user with reference to the environment, action sequence, and flow of events represent a context. Thus, context is typically represented by the properties of situation, relationships among events, and properties of events.
For the purposes of this invention, the term, 'concept', is defined as a multi-faceted vector which affects at least a node in its networked environment. Concept is a function of keywords, associated concept, and position in which the application is used. It is to be noted that these applications are also nodes in an environment. Any portion of the application can have multiple concepts. Therefore, an entire application can be represented as a concept map of nodes. Furthermore, a concept is an idea depicted in text, paragraph, or conversation. A text under observation may depict multiple ideas and, hence, can have multiple concepts. Generally, it is expected to represent associated concepts. Some researchers believe that a concept can have multiple meanings. Concept mining focuses on finding out keywords and relationships among them. For the purposes of this specification, a 'concept' is defined as a flow across a set of nodes in a particular application of a networked environment. According to a non-limiting exemplary embodiment, a 'concept' is defined as a flow of an idea (behaviour of nodes) across a series of sentences (of an application which is a document) trying to convey a particular meaning (working or output of the network). There can be multiple ideas in this flow. The relationships among relevant words (relevant nodes) are used along with information flow (association and flow of information between nodes) and intent mapping to mine the concepts for that particular environment.
The prominent concepts become nodes of the graph. Since concept relationships cannot be defined in a crisp way, we propose to use a fuzzy graph notion for this purpose. The relationship between two concepts can carry more than one route and also depends on context. To represent this, we propose the use of a Multi Edge Graph, which represents this context-action association. The flow of information across the multiple nodes contributes to the overall concept in the documentation, where the edges to follow are selected based on context.
According to a non-limiting exemplary embodiment, an application is a document with text or data. Therefore, any paragraph in the document can have multiple concepts and it is important to derive meaning of a paragraph in relation to a particular context using the system and method of this invention.
According to this invention, there is envisaged a system and method to map multiple events on a time line by determining their interdependency to predict the most probable series of events that might have occurred. This is a Probabilistic Intent-Action Ontology and Tone Matching system and method. Various influencing factors need to be considered which alter a node's context vectors and concept vectors. According to a non-limiting exemplary embodiment, in a document application, various aspects of flow of events, continuity, negation, shift of emotions, and the like are considered. The system and method of this invention is based on analyzing multidimensional intent and action relationships, application of naive Bayes theorem to text, plotting relative hyperbolic probability, plotting the tone matching graph, and calculating deviations. This approach can be used to solve many real life problems like solving criminal cases, completing stories, and identifying gaps in data.
According to a non-limiting exemplary embodiment, a story writing competition is envisaged in which beginning of the story is known and end of the story is also available. In this situation, writers are expected to complete a middle part of the story by selecting most relevant piece amongst possible options. The Probabilistic Intent-Action Ontology and Tone Matching Algorithm will study all the drafts in detail and select the most appropriate one. It will also give the relative positions of other drafts with respect to the chosen draft by determining closeness factor.
According to another non-limiting exemplary embodiment, a crime investigation is considered. Assume that all the data before and after the incident is available, except for a missing link. The algorithm can analyze all the possibilities based on the statements, and the most appropriate possibility can be judged.
In many such scenarios locating and deducing missing information remains the challenge. Hidden and missing information detection is not possible simply by determining a few keywords. There are different aspects like tone of the story, positivity and negativity variations, transition of information states along with language flow. Identifying the intent and locating the relevant actions certainly helps. The probabilistic intent action ontology and tone matching algorithm tries to deduce this flow to locate the most relevant piece of hidden information. It can help you to select relevant piece of information, it can even help in determining authenticity of available piece of information with the help of certainly known beginning and end of the text. The change in positivity and negativity across the text is also analyzed.
FIGURE 1 illustrates a flow diagram of the method of this invention.
In accordance with an embodiment of this invention, a set of nodes in a network are defined as input nodes which accept user inputs. In at least an embodiment, these inputs may be text inputs such as text1, text3, and n drafts of text2, where text1, text2, and text3 are in logical or coherent sequence. There is a need to find the order of multiple text artifacts based on a multi-graph.
Step 1: The system takes text1 and text3 as input.
In accordance with another embodiment of this invention, a mapping device is configured to read each node's vector details and to further map it into the network. This aids in mapping the entire network in terms of node position, node working, association between nodes, flow between nodes, transitions of flow between nodes or in a networked environment of nodes, and the like. In at least a non-limiting exemplary embodiment, where an application is a document, association across multiple text artifacts will present the collated concepts with reference to context. This can further be improved to map to a user's sensitivity index.
In accordance with yet another embodiment of this invention, the mapping device comprises an order determinator which tracks the order of nodes or how data flows.
Step 2: The system takes n drafts of text2 as input to define order. Iteratively we can order even larger number of text artifacts.
In accordance with still another embodiment of this invention, a concept determinator is configured to focus on vectors of a node associated with its position, content, association with other nodes in terms of the associated nodes' position and content. In at least a non-limiting exemplary embodiment, in an application comprising data, concepts are defined as a function of keywords, associated concepts, and positions. The output of a concept determinator is to represent an application in terms of its concept vectors which are associated with nodes. This is further used in determination of context vectors associated with the application.
In accordance with still another embodiment of this invention, a context determinator is configured to focus on vectors of a node associated with its user, association with a node (application), and association of the node (application) in the networked environment. In at least a non-limiting exemplary embodiment, in an application comprising data, contexts are defined as a function of user, situation depicted in the application, and situation in which the application is used.
In accordance with still another embodiment of this invention, a concept mapping device provides a concept map comprising concept vector links. It is configured to derive a concept map between nodes relating to the (input) application that is being examined. In at least one embodiment, a concept map is derived using the following five steps:
1. Data vector organization in order to understand flow of data and indicators, thereof;
2. Concept retrieval using concept vector data per node;
3. Associating concept vectors of various nodes such that two or more concept vectors are associated - this is done using context vectors;
4. Weaving concept vectors to determine a concept map using data relating to the association of Step 3 and data relating to the relationships of Steps 1 and 2;
5. Determining external parameters which affect the concept vectors and, therefore, the concept map.
FIGURE 13 illustrates phases (or steps) for a thought mapper.
In at least a non-limiting exemplary embodiment, in an application comprising data, a concept map is derived using the following steps.
1. Text organization: Organizing text with flow and indicators;
2. Concept retrieval - first level - The multiple concepts are retrieved from text;
3. Associating concepts: Two or more concepts are associated - this is done using context;
4. Weaving concept to determine concept map: These association and relationships determine concept maps;
5. Determine external parameters and concept maps.
In accordance with still another embodiment of this invention, a context mapping device provides a context map comprising context vector links.
There are total five steps to derive context map:
1. Local clustering: The iterative clustering of text refines the association to locate the core word;
2. Concept retrieval: The concept drift is determined and from each region one or more concepts are retrieved;
3. Associating concepts: Two or more concepts are associated in a particular way. In many cases it forms a concept chain. This chain helps us to build overall context;
4. The association of concepts and mapping them together with reference to context to represent a context map; and
5. Find out association between multiple contexts.
Core intent node (word) is a node which, irrespective of its frequency of occurrence, impacts one or more actions in the region under observation. Represented mathematically:
∀ w ∈ W: w → [IA], where [IA] ≠ Φ
[IA] refers to the set of impacted prominent actions.
Border node (word) is a node that shows a relationship with more than one prominent action but does not directly impact any action:
∀ w ∈ W: w is related to [IA], where [IA] ≠ Φ and w does not drive [IA]
When the system traverses through a network of nodes (or a paragraph) to find out intent action relationships, these relationships are derived based on impact of different intent nodes (words). This helps in location of core intent nodes (words). These core intent nodes (words) are used to define a cluster. Simple action relationships are used in this formation. Iterative usage helps to form clusters. A concept drift, in any document (or network), is determined using these clusters. Similarly, clusters are formed and concept drift is determined for an input from another person (for another network of nodes). This helps the system and method to build an overall context and that context is used for creating of a context map. This creates a context map for a user. Another input is scenario. This scenario is represented by parameters. The scenario is analyzed to prepare a context. With reference to this scenario association between multiple context maps help the system and method to find out whether given two persons can work together very well in a given scenario. Thus, there are multiple rankings possible. The ranking of individuals with reference to a given scenario is performed. The thought process mapping is carried out with reference to the topmost person suitable for a given scenario. It is mapped with all the remaining individuals in context of given scenario. Reinforcement learning is used to collect reward and penalties to select best member in any given scenario.
Contextual Sentiment Closeness (CSC) is derived from a series of responses by an individual for a given scenario. These responses are derived through story completion. At later stages, these responses are converted into a series. This data series is compared with reference to the core team member. Define a series:
Figure imgf000024_0001
The Association A of the outcome SI can then be calculated as:
Y XJ)
The expected value of Si (j) is then given by Series (J)) - A x £>( )
Data point closeness is defined as x QU) ~~ Serte f (J)
Figure imgf000024_0002
The CSC Value is now calculated as
Figure imgf000024_0003
esc- ::;. i™7
Here n is normalization factor
The algorithm is implemented using closeness factor with reference to identification of core member for given scenario. With reference to core member, the other members are identified in context with a given scenario. Deviation in behavior is used for selection of members. Thought indicating words are plotted on X-axis and responses are plotted on Y-axis. These points are connected to represent an overall thought process of the candidate in a given scenario. This comparison and deviation gives first indicator for thought process association. The graph connecting the text one and text three points in the form of a spline represents thought process of the core candidate. The deviation between thought process of core member and the thought process of aspirant in the team is the sum of magnitude of area differences between these curves.
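The deviation described above, the sum of the magnitudes of the area differences between two thought-process curves, can be sketched as follows. This is only an illustrative approximation: it assumes both curves are sampled at the same X positions and joins the points with straight segments rather than a spline, and the function name `curve_deviation` is hypothetical.

```python
def curve_deviation(xs, core_ys, aspirant_ys):
    """Approximate the area between two curves sampled at the same x values.

    Assumption: piecewise-linear curves (not splines), integrated with the
    trapezoidal rule applied to the absolute pointwise difference.
    """
    if not (len(xs) == len(core_ys) == len(aspirant_ys)):
        raise ValueError("all sequences must have the same length")
    total = 0.0
    for i in range(len(xs) - 1):
        width = xs[i + 1] - xs[i]
        left = abs(core_ys[i] - aspirant_ys[i])
        right = abs(core_ys[i + 1] - aspirant_ys[i + 1])
        total += 0.5 * (left + right) * width
    return total


# Hypothetical response values for a core member and an aspirant.
deviation = curve_deviation([0, 1, 2], [0.0, 1.0, 0.0], [0.0, 0.0, 0.0])
```

Under this sketch, a smaller deviation indicates a thought process closer to that of the core member.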
In accordance with an additional embodiment of this invention, a tokenizer is configured to locate independent meaningful tokens from the nodes (text paragraphs).
In accordance with an additional embodiment of this invention, a remover mechanism is configured to obtain important part of the text removing the least important part.
In accordance with an additional embodiment of this invention, a lemmatisation mechanism is configured to identify relevant multi-forms of the words with reference to their importance.
Step 3: The system performs tokenization, stop word removal, lemmatization; and a list of words is obtained for each text. From given text, concept vectors and context vectors are derived which define associated nodes.
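Step 3 can be illustrated with a minimal sketch. The stop-word list and the lemma dictionary below are tiny hypothetical stand-ins chosen for the fox-and-goat example; they are not the resources used by the system, and the function name `preprocess` is an assumption.

```python
import re

# Hypothetical, deliberately small resources for illustration only.
STOP_WORDS = {"a", "an", "the", "was", "in", "and", "of", "he", "she", "to", "into"}
LEMMAS = {"wandering": "wander", "slipped": "slip", "fell": "fall",
          "looked": "look", "saw": "see"}


def preprocess(text):
    """Tokenize, remove stop words, then map each remaining word to its lemma."""
    tokens = re.findall(r"[a-z']+", text.lower())       # tokenization
    kept = [t for t in tokens if t not in STOP_WORDS]   # stop word removal
    return [LEMMAS.get(t, t) for t in kept]             # lemmatization


words = preprocess("She looked into the well and saw the fox there.")
```

The resulting list of words per text is what the later steps compare against the ontology database.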
According to another aspect of this invention, a prime number route mapping mechanism is configured to represent content of a node using a prime number route mapping method. As an output, the nodes are classified into at least three types: a) intent nodes, b) action nodes, and c) concept nodes. In at least a non-limiting exemplary embodiment, where a node is an application with text, words of text1, text2, and text3 are taken in order and compared with a database, and a UIC (Unique Identification Code) is associated with every word. This is done based on an INTENT-ACTION ONTOLOGY (IAO) database, which is represented using the PRIME NUMBER ROUTE MAPPING METHOD (PNRMM).
The words in paragraph are divided in three types by a node (word) classifier:
1. Intent words;
2. Action words;
3. Concept words.
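The classification of words into the three types above can be sketched as a lookup against pre-populated word sets. The sets below are hypothetical examples drawn from the fox-and-goat story and merely stand in for the IAO database; the function name `classify_word` is likewise an assumption.

```python
# Hypothetical pre-populated word sets standing in for the IAO database.
INTENT_WORDS = {"thirsty", "search", "shrewd"}
ACTION_WORDS = {"drink", "slip", "fall", "jump", "climb"}
CONCEPT_WORDS = {"fox", "goat", "well", "water"}


def classify_word(word):
    """Return 'intent', 'action', 'concept', or None if the word is unknown."""
    if word in INTENT_WORDS:
        return "intent"
    if word in ACTION_WORDS:
        return "action"
    if word in CONCEPT_WORDS:
        return "concept"
    return None


labels = {w: classify_word(w) for w in ["thirsty", "fox", "jump", "tree"]}
```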
Step 4: The system takes words of text1, text2, text3 in order and compares them with the database, and a UIC (Unique Identification Code) is associated with every word. According to a non-limiting exemplary embodiment, two text items are provided: text1 and text3. The objective is to determine a missing text2 of this sequence or context or logic, or a piece of continuous text which joins or provides meaning to text1 and text3.
In accordance with an embodiment of this invention, a 2-part determinator is provided to determine intent-to-action relation for two parts of input text (text1, text3) portion. A comparator compares it with standard ontology. An assignor assigns relation between these two parts of texts with certain specific attributes and quantifies those attributes. A plotter plots relative hyperbolic probability graph which determines relative probability.
In accordance with another embodiment of this invention, a 3-part determinator is provided to determine intent-to-action relation for three parts of input text portion (text1, text2, text3). A comparator compares it with standard ontology. An assignor assigns relation between the parts of texts with certain specific attributes and quantifies those attributes. A plotter plots relative hyperbolic probability graph which determines relative probability to get final probability (based only on intent-action ontology) to determine a set of most probable 'text2' on the basis of intent-action ontology.
In accordance with yet another embodiment of this invention, a tone analyser analyses these sets of texts in order to determine the most appropriate text on the basis of both: 1) intent-action ontology (IAO) module; and 2) tone analysis module. Intent-action ontology module determines relevance while tone analysis module analyses flow of emotions. A relevant story has higher priority than a story with the right flow of emotions. Hence, relevance is judged first and then tone analysis is done.
According to a non-limiting exemplary embodiment of a story as a part of an application or a document:
Text1 is as follows:
One fine day thirsty fox was wandering in forest in the dark and in search of water. He could locate well. Unfortunately, while drinking water he slipped and fell into the well. He tried to come out but without success. So, he had no other option but to adjust with situation and remain there. The next day, a goat while searching for shelter came that way.
Text2 is as follows:
She looked into the well and saw the fox there. The goat asked, "Hay, How are you? And what are you doing there in water Mr. Fox?"
The shrewd fox replied, "I was thirsty and came here to drink sweet water. It is the tasty water, I have ever tasted. Come and taste it yourself." Without thinking at all, the goat jumped into the well, drank the water and looked for a way to escape. But she also found it difficult like fox.
Then the fox said, "Let me give you an idea. Please stand on your hind legs. Climbing on your head I will jump out. After going out I will arrange for your escape."
Text3 is as follows:
The goat was too innocent to understand the cunningness of the fox and followed what fox said and helped him get out of the well.
While walking away, the fox said, "If you were smart you wouldn't have believed me without looking at aftereffects."
FIGURE 2 illustrates flow and relationship among data points with a working example.
FIGURE 3 represents actual representation of the example of Figure 2.
In accordance with another embodiment of this invention, a linear path representor represents each route or path between nodes of a given application by a straight line using 2-dimensional coordinate geometry. The direction of flow is the one along which the X-coordinate increases. Based on data, the number of lines is determined. The line (path) with the maximum number of words is plotted first with any random slope. The words in that flow are assigned coordinates of points on that line with increasing values of X. Then, the next longest sequence which has a word in common with the first flow is considered, and the same process is repeated. Thus, using this method, coordinates are easy to plot, process, store in a database, and access whenever needed. Further, when new words are added, i.e. the database is upgraded, only the new words are given coordinates.
In accordance with another embodiment of this invention, a periodic route intersection representor defines and represents each route or path by a periodic curve. As with the linear path representor, the longest sequence of words is selected first. The most basic path equation y = sin(nx) is selected. The value of n can be increased and all paths can first be plotted. All words are now plotted in increasing order of x and also positioned depending upon the routes which they traverse. Since routes intersect multiple times, there can be any number of common words. Further, the value of n specifies the specific route.
In accordance with another embodiment of this invention, a split path address naming mechanism is configured to provide a unique identification code to every word in a tree structure.
FIGURE 4 illustrates a split path address naming method where it depicts one such associative relationship. The data is divided into number of trees. Each tree has a number. The first word is assigned 0. Then, each branch is assigned a number. Now, there are multiple paths, which are numbered in order. If a certain word leads to next word and does not branch, then that particular path is assigned 1.
To represent the word 'climb' in the above given data, it is assigned 10132 as:
1 - 1st tree
0 - start
1 - first branch
3 - third branch
2 - second branch
When two words, 'well' and 'climb', are considered, it can be concluded that 1013 is followed by 10132, ensuring the flow. By finding the value associated with this relation, the I-A index can be computed.
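The flow check between two split path addresses can be sketched as a prefix test. This sketch assumes each level of the address is a single digit, as in the 10132 example; the function names `decode_address` and `follows` are hypothetical.

```python
def decode_address(code):
    """Split an address such as '10132' into tree, start marker, and branches.

    Assumption: every level is encoded as exactly one digit.
    """
    tree, start, *branches = code
    return {"tree": int(tree), "start": int(start),
            "branches": [int(b) for b in branches]}


def follows(earlier, later):
    """True if `later` lies further along the same path as `earlier`."""
    return later != earlier and later.startswith(earlier)


climb = decode_address("10132")          # 'climb' in the worked example
well_then_climb = follows("1013", "10132")  # 'well' followed by 'climb'
```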
Step 5: The associated UIC is split in a given way that it powers along different routes.
Words from text1 and text2 are considered first for calculating I-A index1 and then p1 (probability1, which is the probability of occurrence of text2 when text1 has already occurred). Words from text2 and text3 are then considered for calculating I-A index2 and then p2 (probability2, which is the probability of occurrence of text3 when text2 has already occurred).
In accordance with another embodiment of this invention, a prime number route mapping mechanism is configured to represent graphical data in tabular form.
Step 6: Every route is considered one by one and a list of primes is taken in sequence. Based on this, I-A index is calculated using the following database.
FIGURE 5 illustrates the prime number route mapping method.
TABLE 1, below, also determines the prime number route mapping method.
Figure imgf000029_0001
TABLE 1
Note: fwd*3, bkwd*2, and so on are considered 0.
Every unidirectional flow of words can be represented by a route. A unique prime number is assigned to each route using the system and method of this invention. Starting from each node, the power of the prime number is increased. This power of the prime number is a unique identification for that word. For words present in multiple routes, the product of all the numbers assigned to it in different routes is its unique identification number. The benefit of using this method is that, from one number, all the routes in which the word is present, and also its position in each route, can be determined. These details are found out by factorization of the unique identification number. Consider text1 and text2 first and then text2 and text3 for determining relationship attributes and quantifying them. Since these words are one embodiment of nodes for this invention, the association of nodes, as represented by this prime number route mapping method, allows for forming a network of nodes in accordance with weights associated with associated prime numbers per node. Thus, each network of nodes is formed and represented uniquely.
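The prime number route mapping described above can be sketched as follows. The sketch assumes that the exponent starts at 1 for the first word of a route and that a word occurs at most once per route; the routes, the words placed on them, and the function names are hypothetical illustrations, not the contents of TABLE 1.

```python
def assign_ids(routes):
    """routes maps a prime to the ordered list of words on that route.

    Each word receives prime**position for every route it lies on; the
    product over all its routes is its unique identification number.
    """
    ids = {}
    for prime, words in routes.items():
        for position, word in enumerate(words, start=1):
            ids[word] = ids.get(word, 1) * prime ** position
    return ids


def decode_id(uid, primes):
    """Factorize a unique identification number into {route prime: position}."""
    positions = {}
    for p in primes:
        exponent = 0
        while uid % p == 0:
            uid //= p
            exponent += 1
        if exponent:
            positions[p] = exponent
    return positions


# Two hypothetical routes sharing the word 'well'.
routes = {2: ["fox", "well", "climb"], 3: ["goat", "well", "jump"]}
ids = assign_ids(routes)
well_positions = decode_id(ids["well"], list(routes))
```

Here 'well' is second on both routes, so its identification number is 2^2 * 3^2 = 36, and factorizing 36 recovers both routes and both positions.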
Intent-Action ontology and naive Bayes are not applied to all 3 texts (text1, text2, text3) at once, as short stories take turns and twists in the middle part and new nodes start in text2. Analysis is therefore done in two parts.
Firstly, words in a draft are converted into their lemma for more accuracy.
The words present in the considered texts (i.e. text1 and text2) can be represented in the order they occur with their factorized unique identification codes as follows:
Consider the order of occurrence of words:
• Goat
• Fox
• Without thinking
Step 7:
P1 and P2 are calculated by substituting I-A index1 and I-A index2 respectively in the given equation of hyperbola.
(x+1)*(y-1) = -1 is the same as y = x/(1+x)
In at least an embodiment, a probability determinator is configured to determine probability of nodes.
Step 8:
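For calculating the final probability for text1, text2, text3 to occur in sequence, the system and method uses Bayes theorem. With T2.k denoting the k-th of the n drafts of text2, the final probability for the first draft is:
P(T2.1/(T1+T3)) = (P(T2.1/T1) * P(T3/T2.1)) / Σ(k = 1 to n) (P(T2.k/T1) * P(T3/T2.k))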
TABLE 2, below, shows decoding unique number identification
Figure imgf000031_0001
TABLE 2
Table 2 depicts occurrence of simple words. It becomes further complex as the dependency and association increases. The Intent-Action index of the relationship between these two texts can be calculated from the above given table.
2: 2-0-3 = 2-3
= (fwd) 2-3
= 0.56
3: 3-2-4 = 3-2 + 2-4 +3-4
= 3-2(bkwd) + 2-4(fwd*2) +3-4(fwd)
= 0.32 + 0.21 + 0.51
= 1.04
5: 3-2-4 = 3-2 + 2-4 +3-4
= 3-2(bkwd) + 2-4(fwd*2) +3-4(fwd)
= 0.41 + 0.25 + 0.42
= 1.08
Total I-A index = 0.56 + 1.04 + 1.08
= 2.68
The I-A index to relative probability function is defined as
(x+1)*(y-1) = -1, which is an equation of a hyperbola,
where the I-A index is on x and the relative probability is on y.
(x+1)*(y-1) = -1 is the same as y = x/(1+x)
This equation satisfies 3 main conditions:
1. when x -> infinity, y -> 1
2. when x = 0, y = 0
3. it is a continuous one-to-one function on our domain [0, infinity)
This gives the output of the comparison of two texts. So the relative probability for occurrence of text2 when text1 has already occurred is:
y = (2.68)/(1+2.68)
= 2.68/3.68
= 0.7283
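The worked example above can be reproduced with a short sketch. The function and variable names are illustrative only; the three contributions are the per-row totals from the calculation above.

```python
def relative_probability(ia_index):
    """Map an I-A index x in [0, infinity) to y = x / (1 + x).

    This is the branch of the hyperbola (x + 1) * (y - 1) = -1, rewritten as
    y = 1 - 1 / (x + 1) = x / (1 + x).
    """
    if ia_index < 0:
        raise ValueError("the I-A index is defined on [0, infinity)")
    return ia_index / (1.0 + ia_index)


# Per-row I-A contributions from the worked example (0.56, 1.04 and 1.08).
link_contributions = [0.56, 1.04, 1.08]
total_index = sum(link_contributions)
probability = relative_probability(total_index)
print(round(total_index, 2), round(probability, 4))
```

The mapping is zero at x = 0, increases monotonically and approaches 1 as the index grows, which matches the three conditions listed above.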
FIGURE 6 depicts the behaviour of probability vs. index with typical range of x.
Similarly, the relative probability for occurrence of text3 when text2 has already occurred can be found.
In at least an embodiment of the 3-part determinator, the relative probabilities for all drafts are determined.
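As a minimal sketch of how the Bayes step of Step 8 could be applied across all candidate drafts of text2, the following normalizes the product of the two relative probabilities for each draft. The function name and the three pairs of probabilities are hypothetical, chosen only to illustrate the arithmetic.

```python
def rank_middle_drafts(p_draft_given_start, p_end_given_draft):
    """Return P(T2.k | T1, T3) for each candidate draft k.

    p_draft_given_start[k] plays the role of P(T2.k / T1) and
    p_end_given_draft[k] the role of P(T3 / T2.k); the denominator is the
    sum of their products over all n candidate drafts.
    """
    joint = [a * b for a, b in zip(p_draft_given_start, p_end_given_draft)]
    total = sum(joint)
    if total == 0:
        raise ValueError("all candidate drafts have zero probability")
    return [value / total for value in joint]


# Hypothetical relative probabilities for three drafts of text2.
posterior = rank_middle_drafts([0.7283, 0.5, 0.2], [0.6, 0.5, 0.9])
best_draft = max(range(len(posterior)), key=posterior.__getitem__)
```

The returned values sum to one, so they can be compared directly when ranking drafts.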
In at least an embodiment of the tone analyzer, a top pre-defined percentage of drafts is selected for tone analysis. The tone-matching algorithm determines deviation in the flow of emotions by matching the graph of the draft against the ideally expected graph. This graph denotes the emotional tone behind the set of sentences. It is assumed that, in any good draft, there are no abrupt changes and hence a smooth curve is expected. In at least an embodiment, a selector mechanism is configured to select nodes.
Step 9:
The top 10% of the drafts are selected for tone analysis.
positivity index = (Σp - Σn) / (number of words from start)
Same method is also used to calculate concept relationships.
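A minimal sketch of the positivity index follows. It assumes that Σp is the summed magnitude of the positive words and Σn the summed magnitude of the negative words among the words read so far; the lexicon, its scores and the sample sentence are hypothetical.

```python
def positivity_index(words, lexicon):
    """(sum of positive magnitudes - sum of negative magnitudes) / word count.

    `lexicon` maps a word to a signed emotional magnitude; words that are
    not in the lexicon are treated as neutral (assumption).
    """
    if not words:
        raise ValueError("need at least one word")
    positive = sum(lexicon[w] for w in words if lexicon.get(w, 0.0) > 0)
    negative = sum(-lexicon[w] for w in words if lexicon.get(w, 0.0) < 0)
    return (positive - negative) / len(words)


# Hypothetical lexicon and sentence.
lexicon = {"happy": 0.8, "clever": 0.5, "trap": -0.6}
words = ["the", "clever", "fox", "fell", "into", "a", "trap"]
index = positivity_index(words, lexicon)
```

Evaluating the function on successively longer prefixes of a text gives the points that are later plotted against the number of words from the start.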
Two graphs are plotted: 1. Actual 2. Expected
1. Actual
The system and method plots p index versus number of words, from start, for all texts (text1, text2, text3).
All these points are joined with a spline.
2. Expected
The system and method considers plotted points of text1 and text3 only.
All these points are joined with a spline.
FIGURE 7 illustrates sentiment trend.
The tone analyzer first analyses all words in text1 and text3 and calculates a positivity index. The output is a database of words which strongly suggest the emotional tone with magnitude (positive or negative).
While plotting the graph, p index is plotted on the Y axis and number of words from start on the X axis for text1 and text3. A spline is drawn passing through these points. This is the expected flow of emotions. The equation of the curve and the area under the graph are found. The system and method solves this determinant to give an equation of the curve, and integrating it from 0 to W gives the area under this curve, where W is the total number of words.
A point for each draft of text2 is plotted and a spline is drawn passing through all these points. This is the flow of emotions for the whole story considering this specific draft of text2. The area between these two curves is then found, which is the deviation of this draft from ideality. This process is repeated for all the drafts. The text2 draft with the least deviation is considered appropriate.
Step 10:
DEVIATION is defined as sum of magnitudes of areas between the two graphs. The delivered order can consider emotional flow and intent of delivering the message.
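The deviation defined above can be approximated numerically. The sketch below is not the spline fit of the description: it substitutes piecewise-linear interpolation for the spline and a trapezoid rule for the integral, and it assumes both curves are plotted from 0 to W. All names and sample points are hypothetical.

```python
def interpolate(points, x):
    """Piecewise-linear value at x for (x, y) points sorted by x.

    Stand-in for the spline of the description (assumption).
    """
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        if x0 <= x <= x1:
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
    raise ValueError("x lies outside the plotted range")


def deviation(actual, expected, total_words, steps=1000):
    """Area between the two curves on [0, total_words], counted as magnitudes."""
    gaps = []
    for i in range(steps + 1):
        x = min(total_words, total_words * i / steps)
        gaps.append(abs(interpolate(actual, x) - interpolate(expected, x)))
    width = total_words / steps
    # Trapezoid rule applied to the absolute gap between the curves.
    return width * (sum(gaps) - 0.5 * (gaps[0] + gaps[-1]))


# Hypothetical points: (number of words from start, p index).
expected = [(0, 0.0), (100, 0.4)]            # text1 and text3 only
actual = [(0, 0.0), (50, 0.5), (100, 0.4)]   # text1, one text2 draft, text3
draft_deviation = deviation(actual, expected, total_words=100)
```

Repeating the call for every text2 draft and keeping the smallest result mirrors the selection rule stated above.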
Testing creativity and measuring learnability is always a difficult task. To evaluate the performance of our model, a test set of 50 stories was created. Each sample has the beginning and end parts available. There are 5 options available for the middle part of each story, extended to 10 options in some cases. The appropriateness of the middle part of each story is judged based on ranking by two experts. The performance of the system and method is measured using this set of stories. All complete stories are initially ranked based on trends of emotions in three different classes. Class one is a positive to negative emotion trend, class two is a negative to positive emotion trend, and class three is a steady and vibrant emotion trend with many spikes and valleys.
The test was carried out with this system and method, and the ranking of each middle part was verified against manual ranking. The sample outcome of five stories from the set is given below:
TABLE 3, below, illustrates results
Figure imgf000034_0001
TABLE 3
FIGURE 8 illustrates ranking wise success.
The observations clearly suggest that in 82% of the cases the topmost ranking was correct. The second ranking is correct in 74% of the cases. The accuracy for the third, fourth and fifth rankings ranges from 62% to 78%. Figure 13 depicts the percentage accuracy with reference to human ranking for different ranks. Another observation worth noting is that the algorithm performs exceptionally well for stories with a transition from negative to positive emotions, where the success rate reaches 90%.
FIGURES 9a and 9b illustrate success and emotional trends.
For positive to negative emotions it also performs well, reaching 80%. For abrupt emotional transitions with many spikes and valleys along with outliers, the performance is 60%. The experimentation can be carried out on larger data sets to evaluate this trend further. The distance between different story parts can be mapped with reference to each other to build a story from small story parts. Figure 14 depicts success with reference to emotional transition.
Identifying the most relevant text, the most relevant route, the most relevant data and the most interesting product is what the world has been striving for over the years. This led to an AI movement in this direction, where multiple areas like NLP, AI, Data Mining, Machine Learning and Sentiment Analysis converged. Creative selection of stories, creative arrangement of parts of text and establishing association among different textual representations still remain a challenge. Solving it can help with problems in different domains: a criminal case with a missing thread, a document with missing information, or a story completion. It can be very helpful in literary and creative activities ranging from recreation to building a story. It can help in identifying gaps in the market to decide an initiative. This paper proposes a method for context-based prediction using a probabilistic intent-action ontology approach. Using this approach, it identifies multiple relevant routes leading from the first paragraph to the end paragraph. It identifies the missing link. It uses tone matching to rank possible options. A prime number route mapping methodology is proposed in this paper. This methodology helps to represent a word and a route in a unique way. The Intent-Action index helps in deciding the sequence. Tone analysis is also used in ranking these document artifacts. The experimentation is carried out on 50 stories, each with 10 options for the missing middle part, a data set built for creativity and association testing purposes. The said algorithm showed 92% accuracy. Here the human ranking is assumed to be the correct one. The promising results show the possibility of extending this technique to many applications. In future work, subspace clustering and this approach can be combined to handle larger inputs.
For the purposes of this invention, a core intent node (word) is a node which, irrespective of its frequency of occurrence, impacts one or more actions in the region under observation.
For the purposes of this invention, a border node (word) is a node that shows a relationship with more than one prominent action but does not directly impact any action.
For the purposes of this invention, an action connected node (word) is a set of nodes connected by one or more actions.
Any word in a given paragraph is classified into one of these types.
INTENT-ACTION ONTOLOGY implies that each node points in various directions. If the story proceeds along any of those directions, action nodes (words) related to that direction are found. PNRMM (prime number route mapping methodology) is a mathematical representation of this database. Various flows of nodes (words) are predetermined; each flow is called a route and is represented by a prime number. Nodes (words) in a flow are represented by consecutive powers of that specific prime number. Nodes (words) present in multiple routes are represented by the product of the powers of primes associated with the different routes. This relationship is further represented as a multi-edge graph context model and semi-graph.
According to the non-limiting exemplary embodiment, as continued from above, this system and method is divided in 3 major phases:
Phase I: Consider a paragraph. It is parsed through a standard parser and properties are extracted. The words in the paragraph are divided into three types:
1. Intent nodes (words)
2. Action nodes (words)
3. Concept nodes (words)
The core nodes (words) are identified. These are the nodes (words) associated with one or more action words.
Phase II:
Identify all words which are action reachable from concept nodes (words). The relationships among these nodes (words) build a concept corpus. There can be multiple concepts in a single paragraph.
Core intent node (word): A node which, irrespective of its frequency of occurrence, impacts one or more actions in the region under observation.
[IA] refers to the set of impacted prominent actions.
Border node (word): A node that shows a relationship with more than one prominent action but does not directly impact any action.
Action connected node (word): A set of nodes connected by one or more actions.
Any word in a given paragraph is classified into one of these types. Intent results in action. Even actions are mapped to intent.
FIGURE 10 illustrates relationships.
Phase III: Here multiple concepts are associated to build the central concept and associated concepts. Every additional node/concept added to the graph results in information gain. This information gain depends on the action reachability of the particular node. The standard entropy formulation is used to define information gain. Hence, information gain is defined as:
Joint entropy H(X, Y) of discrete random variables X and Y with probability distribution p(X, Y)
The conditional entropy will be calculated iteratively to represent overall information gain. A concept path in given context is derived and for that the modified Bayesian is used.
Figure imgf000038_0001
The mutual information due to occurrence of multiple concepts is defined as:
Figure imgf000038_0002
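The formulas referred to above are reproduced in the source only as figures. Assuming that the standard information-theoretic definitions are intended, the joint entropy, the conditional entropy and the mutual information of two discrete random variables X and Y can be written as below; these are the textbook forms and are not taken from the figures themselves.

```latex
\begin{align}
H(X, Y) &= -\sum_{x}\sum_{y} p(x, y)\,\log p(x, y) \\
H(Y \mid X) &= H(X, Y) - H(X) \\
I(X; Y) &= \sum_{x}\sum_{y} p(x, y)\,\log\frac{p(x, y)}{p(x)\,p(y)}
         = H(X) + H(Y) - H(X, Y)
\end{align}
```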
Context helps to decide the context nodes to be traversed. Interestingly, the primary concept, secondary concept and action reachable points, along with preceding information, contribute to this concept traversal. In our case, a simple Bayesian approach is used to decide the best path for the given context.
According to a non-limiting exemplary embodiment, the concepts are Demonetization, Cash Crunch, Corruption, Digitization and Black money. Many other tokens, like ATM, Reserve Bank and Finance Minister, can contribute to a concept but cannot be a concept in themselves.
Now context can be derived from the core concept, supporting words, prelude and situational parameters. In this case the context could be:
{Demonetization, Corruption, Cash Crunch, ATM, Opposition, Procession (Date, Time, Location) }
The same core concept with a different context, below, traverses a different multi-graph path: {Demonetization, Corruption, Finance Minister, ATM, Digitization, Arrested, Action taken (Date, Time, Location)}
In a multi-graph, multiple concepts are connected by zero or more edges. The association is represented through connection. The degree of a node depends on the action reachability of different concepts. A representative multi-graph with contextual traversal is discussed ahead.
The invention is further discussed in accordance with a non-limiting exemplary embodiment, as below, in paragraphs 1, 2, and 3.
Paragraph 1 :
RBI Governor told the Parliamentary Standing Committee on Finance that there is a reasonable stock of new 500 and 2000 currency notes. You are already aware of the announcement of demonetisation that was made on November 8, 2016. It is about discontinuing old notes from immediate effect.
Paragraph 2:
The International Monetary Fund trimmed India's growth forecast for 2017 by 0.4%, citing Demonetisation effects. Opposition mentioned that there are long queues at ATM and most of the ATM are without cash. Dry ATMs no cash and unavailability of digital resources is posing serious challenge. Is the digitization move working? Is a serious question.
Paragraph 3:
The Income Tax department carried out over 1100 searches and surveys immediately after demonetisation (claimed as one of the step towards digital economy) and detected undisclosed income of over Rs 5,400 crore, Finance Minister told the Rajya Sabha on Thursday. He added that more follow-up action was taken and 18 lakh people were identified whose tax profiles were not in line with the cash deposits made by them in the demonetisation period and on lines responses were sought. Ruling party gave these details during the Question Hour, where he claimed that no other government had taken so much action against black money as the present regime.
FIGURE 11 illustrates the core concept node, demonetization. A context from the opposition is represented with a red edge. This context associates multiple concepts in a particular order. There are multiple edges connecting different concepts. Demonetization is a core concept since there are five action-reachable and three direct-reachable concepts from this node. The concept drift or adversarial point is identified based on degree transformation of nodes while traversing. These points are used for concept
Figure imgf000040_0001
TABLE 4 CONCEPT NODES
A node is action reachable if there is an action edge between two nodes. There can be any number of action edges, and a node can also be action reachable through other nodes. The same is true for direct reachability.
Where ∑c is the number of concepts in the text
Testing creativity and measuring learnability is always a difficult task. To evaluate the performance of our model we have created a test set of 110 paragraphs. Frequency is used in the beginning to decide the relative importance of keywords. The concepts are derived based on reachability. Core, primary and secondary concepts are identified. Based on the intent-action relationship, these concepts are connected with one or more directed edges. Fuzzy edges with weights are used in case of uncertain relationships. Adversarial nodes are used for mapping concept drifts.
The concept association with the context, and the story built based on concept relationships, are verified using the 110-paragraph data. The observations clearly suggest that in 80% of the cases the concepts mapped to the context were correct. The context even determines the association among concepts, and in 90% of the cases the core concept mapping is correct.
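Action reachability and direct reachability on a multi-graph can be sketched as a breadth-first search restricted to one kind of edge. The class name, the edge labels "action" and "direct" and the example edges are hypothetical; they are not the contents of Table 4 or FIGURE 11.

```python
from collections import deque


class ConceptMultigraph:
    """Directed multi-graph whose edges carry a kind such as 'action' or 'direct'."""

    def __init__(self):
        # node -> list of (neighbour, edge kind); parallel edges are allowed
        self._edges = {}

    def add_edge(self, source, target, kind):
        self._edges.setdefault(source, []).append((target, kind))

    def reachable(self, start, kind):
        """Nodes reachable from `start` using only edges of the given kind."""
        seen = set()
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbour, edge_kind in self._edges.get(node, ()):
                if edge_kind == kind and neighbour != start and neighbour not in seen:
                    seen.add(neighbour)
                    queue.append(neighbour)
        return seen


# Hypothetical edges between the example concepts.
graph = ConceptMultigraph()
graph.add_edge("Demonetization", "Cash Crunch", "action")
graph.add_edge("Cash Crunch", "Digitization", "action")
graph.add_edge("Demonetization", "Corruption", "direct")
graph.add_edge("Corruption", "Black money", "direct")

action_reachable = graph.reachable("Demonetization", "action")
direct_reachable = graph.reachable("Demonetization", "direct")
```

Counting the two result sets for each node gives the kind of action-reachable and direct-reachable totals used above to argue that a node is a core concept.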
FIGURE 12 illustrates success for concept context mapping
The success is checked for three prominent concepts. When primary concept is at the center of the context the success is close to 90%. The success with reference to these three categories is depicted in Figure 6.
Identifying the most relevant text, the most relevant route, the most relevant data and the most interesting product is what the world has been striving for over the years. Relevance is never absolute and depends on the context. There can be context in the document, and there can be context associated with the user. The concept describes the idea explained, whereas the context depicts the perspective from which the document is looked upon. This paper proposed a method based on multi-graph and fuzzy graph to represent concept context mapping. This can help in solving problems in different domains. It can help to identify key concepts. It can be very helpful in literary and creative activities ranging from recreation to building a story. It can help in identifying gaps in the market to decide an initiative. The experimentation is carried out on 110 text paragraphs. The said algorithm showed close to 90% accuracy. Here the human association of concept to context is assumed to be the correct one. The promising results show the possibility of extending this technique to many applications. In future work, concept maps can be applied to larger documents.
While this detailed description has disclosed certain specific embodiments for illustrative purposes, various modifications will be apparent to those skilled in the art which do not constitute departures from the spirit and scope of the invention as defined in the following claims, and it is to be distinctly understood that the foregoing descriptive matter is to be interpreted merely as illustrative of the invention and not as a limitation.

Claims

1. An intelligent context based prediction system comprising:
- input nodes configured to accept user inputs;
- mapping device configured to read each node's vector details and to further map it into a network, in terms of node position, node working, association between nodes, flow between nodes, transitions of flow between nodes or in a networked environment of nodes, context vector associated with each node, and concept vector associated with each node, said network comprising a first node and a last node, associated with middle node, said middle node linking said first node and said last node in terms of a concept vector and a context vector, thereby forming a first concept vector link between a first node and a middle node, a second concept vector link between a middle node and a last node; a first context vector link between a first node and a middle node, a second context vector link between a middle node and a last node;
- a concept determinator configured to extract at least a concept vector of each node associated with its position, content, association with other nodes in terms of associated nodes' position and content;
- a context determinator configured to extract at least a context vector of each node associated with its user, association with a node, and association of the node in the networked environment;
- a concept mapping device configured to derive a concept map, comprising concept vector links;
- a context mapping device configured to derive a context map, comprising context vector links;
- a 2-part determinator configured to determine a network of nodes ("determined network of nodes") in terms of intent-to-action relation between said first node and said last node to determine relative probability for selection of said middle node from a pool of middle nodes; and
- a 3-part determinator configured to determine and confirm a determined network of nodes ("confirmed network of nodes") in terms of intent-to- action relation between said first node, said middle node, and said last node to confirm relative probability after selection of said middle node from a pool of middle nodes.
2. The intelligent context based prediction system as claimed in claim 1 wherein, said mapping device comprising an order determinator to track order of nodes in terms of flow of data from start to end.
3. The intelligent context based prediction system as claimed in claim 1 wherein, said concept mapping device comprising one or more processors to perform the steps of:
- organizing data vector in order to understand flow of data and indicators, thereof;
- retrieving concept using concept vector data per node;
- associating concept vectors of various nodes such that two or more concept vectors are associated using context vectors;
- weaving concept vectors to determine a concept map using data relating to the association of data relating to associating concept vectors relating to relationships of organized data vectors and retrieved data vectors; and
- determining external parameters which affect said concept vectors and, therefore, said concept map.
4. The intelligent context based prediction system as claimed in claim 1 wherein, said concept determinator comprising:
- tokenizer configured to locate independent meaningful tokens from each of said nodes;
- a remover mechanism configured to obtain important part of each node's content removing the least important part; and
- a lemmatisation mechanism configured to identify relevant multi-forms of the content of a node with reference to its importance;
in order to determine concept vectors associated with each networked node.
5. The intelligent context based prediction system as claimed in claim 1 wherein, said context determinator comprising:
- tokenizer configured to locate independent meaningful tokens from each of said nodes;
- a remover mechanism configured to obtain important part of each node's content removing the least important part; and
- a lemmatisation mechanism configured to identify relevant multi-forms of the content of a node with reference to its importance;
in order to determine context vectors associated with each networked node.
6. The intelligent context based prediction system as claimed in claim 1 wherein, said, system comprising a prime number route mapping mechanism configured to represent content of a node using a prime number route mapping method and to classify each node into one of at least three types: a) intent nodes, b) action nodes, and c) concept nodes; said classification being done by comparing content with pre-populated databases corresponding to content segregated with respect to each of said intent node, said action node, and said concept node.
7. The intelligent context based prediction system as claimed in claim 1 wherein, said 2-part determinator comprising:
- a comparator configured to compare content of said first node and said last node with standard ontology;
- an assignor configured to assign first concept vector link, second concept vector link, first context vector link, and second context vector link between said two nodes with certain specific attributes and quantifying those attributes;
- a plotter configured to plot relative hyperbolic probability graph, concerning each of said links, in order to determine relative probability of each link and selecting a middle node based on highest probability of links; and
- a tone analyzer, configured to determine a top pre-defined percentage of nodes selected for tone analysis, wherein a tone-matching algorithm determines deviation in flow, by plotting a matched curve, by matching the graph of said with ideally expected graph in terms of smoothness of said matched curve.
8. The intelligent context based prediction system as claimed in claim 1 wherein, said 3-part determinator comprising:
- a comparator configured to compare content of said first node, said selected middle node, and said last node with standard ontology;
- an assignor configured to assign first concept vector link, second concept vector link, first context vector link, and second context vector link between said three nodes with certain specific attributes and quantifying those attributes;
- a plotter configured to plot relative hyperbolic probability graph, concerning each of said links, in order to confirm highest probability of each link and confirming a middle node based on highest probability of links; and
- a tone analyzer, configured to confirm a top pre-defined percentage of nodes selected for tone analysis, wherein a tone-matching algorithm confirms deviation in flow, by plotting a matched curve, by matching the graph of said with ideally expected graph in terms of smoothness of said matched curve.
9. The intelligent context based prediction system as claimed in claim 1 wherein, system configured to form a plurality of determined network of nodes, said system comprising:
- a node classifier further configured to classify each of said nodes into one of: a) an intent node; b) an action connected node; c) a concept node; and d) a border node, at least one node being a part of a plurality of confirmed network of nodes;
- a prime number route mapping mechanism configured to represent each networked node upon classification by said classifier in order to confirm unidirectional flow of nodes in a determined network of nodes, said mechanism comprising a processor to perform the steps of:
i) assigning a unique prime number to each route from start to end for each confirmed network of nodes;
ii) starting from each node, the power of prime number is increased, such that this power of prime number is a unique identification for that node;
iii) for nodes present in multiple routes, the product of all numbers assigned to it in different routes is its unique identification number, so that each network of nodes being formed and represented uniquely; and
iv) factorizing said unique identification number to find out all routes for a node and also its position in a network.
10. An intelligent context based prediction method comprising the steps of:
- accepting user inputs through input nodes;
- reading each node's vector details and to further mapping it into a network, in terms of node position, node working, association between nodes, flow between nodes, transitions of flow between nodes or in a networked environment of nodes, context vector associated with each node, and concept vector associated with each node, said network comprising a first node and a last node, associated with middle node, said middle node linking said first node and said last node in terms of a concept vector and a context vector, thereby forming a first concept vector link between a first node and a middle node, a second concept vector link between a middle node and a last node; a first context vector link between a first node and a middle node, a second context vector link between a middle node and a last node;
- extracting at least a concept vector of each node associated with its position, content, association with other nodes in terms of associated nodes' position and content;
- extracting at least a context vector of each node associated with its user, association with a node, and association of the node in the networked environment;
- deriving a concept map, comprising concept vector links;
- deriving a context map, comprising context vector links;
- determining a network of nodes ("determined network of nodes") in terms of intent-to-action relation between said first node and said last node to determine relative probability for selection of said middle node from a pool of middle nodes; and
- confirming a determined network of nodes ("confirmed network of nodes") in terms of intent-to-action relation between said first node, said middle node, and said last node to confirm relative probability after selection of said middle node from a pool of middle nodes.
11. The intelligent context based prediction method as claimed in claim 10 wherein, said step of reading each node's vector details comprising a further step of tracking order of nodes in terms of flow of data from start to end.
12. The intelligent context based prediction method as claimed in claim 10 wherein, said step of deriving a concept map comprising the steps of:
- organizing data vector in order to understand flow of data and indicators, thereof;
- retrieving concept using concept vector data per node;
- associating concept vectors of various nodes such that two or more concept vectors are associated using context vectors;
- weaving concept vectors to determine a concept map using data relating to the association of data relating to associating concept vectors relating to relationships of organized data vectors and retrieved data vectors; and
- determining external parameters which affect said concept vectors and, therefore, said concept map.
13. The intelligent context based prediction method as claimed in claim 10 wherein, said step of deriving a concept map comprising the steps of:
- locating independent meaningful tokens from each of said nodes;
- obtaining important parts of each node's content removing the least important part; and
- identifying relevant multi-forms of the content of a node with reference to its importance;
in order to determine concept vectors associated with each networked node.
14. The intelligent context based prediction method as claimed in claim 1 wherein, said step of deriving a context map comprising the steps of:
- locating independent meaningful tokens from each of said nodes;
- obtaining important part of each node's content removing the least important part; and
- identifying relevant multi-forms of the content of a node with reference to its importance;
in order to determine context vectors associated with each networked node.
15. The intelligent context based prediction method as claimed in claim 10 wherein, said, system comprising a prime number route mapping mechanism configured to represent content of a node using a prime number route mapping method and to classify each node into one of at least three types: a) intent nodes, b) action nodes, and c) concept nodes; said classification being done by comparing content with pre-populated databases corresponding to content segregated with respect to each of said intent node, said action node, and said concept node.
16. The intelligent context based prediction method as claimed in claim 10 wherein, said step of determining a network of comprising the steps of:
- comparing content of said first node and said last node with standard ontology;
- assigning first concept vector link, second concept vector link, first context vector link, and second context vector link between said two nodes with certain specific attributes and quantifying those attributes;
- plotting relative hyperbolic probability graph, concerning each of said links, in order to determine relative probability of each link and selecting a middle node based on highest probability of links; and
- determining a top pre-defined percentage of nodes selected for tone analysis, wherein a tone-matching algorithm determines deviation in flow, by plotting a matched curve, by matching the graph of said with ideally expected graph in terms of smoothness of said matched curve.
17. The intelligent context based prediction system as claimed in claim 10 wherein, said step of confirming a network of comprising the steps of:
- comparing content of said first node, said selected middle node, and said last node with standard ontology;
- assigning first concept vector link, second concept vector link, first context vector link, and second context vector link between said three nodes with certain specific attributes and quantifying those attributes;
- plotting relative hyperbolic probability graph, concerning each of said links, in order to confirm highest probability of each link and confirming a middle node based on highest probability of links; and
- confirming a top pre-defined percentage of nodes selected for tone analysis, wherein a tone-matching algorithm confirms deviation in flow, by plotting a matched curve, by matching the graph of said with ideally expected graph in terms of smoothness of said matched curve.
18. The intelligent context based prediction method as claimed in claim 10 wherein, said method configured to form a plurality of determined network of nodes, said method comprising the steps of:
- classifying each of said nodes into one of: a) an intent node; b) an action connected node; c) a concept node; and d) a border node, at least one node being a part of a plurality of confirmed network of nodes;
- representing each networked node upon classification by said classifier in order to confirm unidirectional flow of nodes in a determined network of nodes, said mechanism comprising a processor to perform the steps of:
v) assigning a unique prime number to each route from start to end for each confirmed network of nodes;
vi) starting from each node, the power of prime number is increased, such that this power of prime number is a unique identification for that node;
vii) for nodes present in multiple routes, the product of all numbers assigned to it in different routes is its unique identification number, so that each network of nodes being formed and represented uniquely; and
viii) factorizing said unique identification number to find out all routes for a node and also its position in a network.
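The prime-number representation recited in steps v) to viii) can be illustrated with a minimal Python sketch. It is not the claimed implementation. The function names (`first_primes`, `encode_routes`, `decode_node`) and the example routes are hypothetical. The sketch assumes that routes receive consecutive primes in list order, that positions are counted from 1 at the start of a route, and that a node occurs at most once on any route, consistent with unidirectional flow.

```python
from collections import defaultdict


def first_primes(n):
    """Return the first n primes by trial division (adequate for a few routes)."""
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes):
            primes.append(candidate)
        candidate += 1
    return primes


def encode_routes(routes):
    """Step v)-vii): give each route a prime, each node prime**position,
    and multiply the values for nodes that lie on several routes."""
    primes = first_primes(len(routes))
    node_ids = defaultdict(lambda: 1)
    for prime, route in zip(primes, routes):
        # Assumption: positions start at 1 and a node appears once per route.
        for position, node in enumerate(route, start=1):
            node_ids[node] *= prime ** position
    return dict(node_ids), primes


def decode_node(node_id, primes):
    """Step viii): factorize an identifier into {route_prime: position}."""
    positions = {}
    for prime in primes:
        exponent = 0
        while node_id % prime == 0:
            node_id //= prime
            exponent += 1
        if exponent:
            positions[prime] = exponent
    return positions


# Hypothetical example: two routes that share their first and last two nodes.
routes = [
    ["intent", "search", "book", "done"],   # route prime 2
    ["intent", "compare", "book", "done"],  # route prime 3
]
node_ids, route_primes = encode_routes(routes)
print(node_ids["book"])                             # 216 = 2**3 * 3**3
print(decode_node(node_ids["book"], route_primes))  # {2: 3, 3: 3}
```

Because prime factorization is unique, the exponent of each route prime recovers the position of the node on that route, and a prime that does not divide the identifier marks a route the node does not lie on.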
PCT/IN2018/050502 2017-07-31 2018-07-31 An intelligent context based prediction system WO2019026087A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
IN201721027162 2017-07-31
IN201721027162 2017-07-31

Publications (1)

Publication Number Publication Date
WO2019026087A1 (en) 2019-02-07

Family

ID=65233634

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/IN2018/050502 WO2019026087A1 (en) 2017-07-31 2018-07-31 An intelligent context based prediction system

Country Status (1)

Country Link
WO (1) WO2019026087A1 (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7912700B2 (en) * 2007-02-08 2011-03-22 Microsoft Corporation Context based word prediction
US20120029910A1 (en) * 2009-03-30 2012-02-02 Touchtype Ltd System and Method for Inputting Text into Electronic Devices
US20130218876A1 (en) * 2012-02-22 2013-08-22 Nokia Corporation Method and apparatus for enhancing context intelligence in random index based system

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021051516A1 (en) * 2019-09-18 2021-03-25 平安科技(深圳)有限公司 Ancient poem generation method and apparatus based on artificial intelligence, and device and storage medium
CN112543419A (en) * 2019-09-20 2021-03-23 中国移动通信集团吉林有限公司 User trajectory prediction method and device based on density clustering
CN112543419B (en) * 2019-09-20 2022-08-12 中国移动通信集团吉林有限公司 User trajectory prediction method and device based on density clustering
CN116451785A (en) * 2023-06-16 2023-07-18 安徽思高智能科技有限公司 RPA knowledge graph construction and operation recommendation method oriented to operation relation
CN116451785B (en) * 2023-06-16 2023-09-01 安徽思高智能科技有限公司 RPA knowledge graph construction and operation recommendation method oriented to operation relation
CN117633328A (en) * 2024-01-25 2024-03-01 武汉博特智能科技有限公司 New media content monitoring method and system based on data mining
CN117633328B (en) * 2024-01-25 2024-04-12 武汉博特智能科技有限公司 New media content monitoring method and system based on data mining

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 18841944; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 EP: PCT application non-entry in European phase (Ref document number: 18841944; Country of ref document: EP; Kind code of ref document: A1)