WO2015178758A1 - A system and method for analyzing concept evolution using network analysis - Google Patents



Publication number
WO2015178758A1
Authority
WO
WIPO (PCT)
Prior art keywords
network
concepts
artifacts
concept
graphs
Application number
PCT/MY2015/050029
Other languages
French (fr)
Inventor
Arun Anand Sadanandan
Dickson Lukose
Klaus TOCHTERMANN
Norbaitiah AMBIAH
Original Assignee
Mimos Berhad
Application filed by Mimos Berhad

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists

Definitions

  • Neighbourhood graph matching 720 is illustrated in more detail with reference to Figure 8.
  • A network graph $g_{n+1}$ is selected 810, and concept c is searched for in network graph $g_{n+1}$ 820. If concept c is found, a neighbourhood graph for focal concept c is identified 830 and the neighbourhood graph is stored in graph $g_{n+1}$ 840. If concept c is not found, the process instead finds a neighbourhood graph that shares the same neighbours 825 and then stores that neighbourhood graph in graph $g_{n+1}$ 840. After the neighbourhood graph has been stored in graph $g_{n+1}$, it is determined whether there are more graphs; if there are, steps 810 to 840 are repeated.
  • Constructing evolution data 740 for concept c takes as input the set of neighbourhood graphs that were matched for concept c.
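The neighbourhood graph matching of Figure 8 and the construction of evolution data 740 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: each network graph is assumed to be a plain adjacency dictionary, a neighbourhood is reduced to the set of concepts within a fixed number of hops, and "sharing the same neighbours" (825) is approximated by a Jaccard overlap with a hypothetical `min_jaccard` threshold of 0.5. The function names are illustrative only.

```python
from typing import Dict, List, Optional, Set, Tuple

Graph = Dict[str, Dict[str, int]]  # assumed: concept -> {neighbour concept: edge weight}
Match = Optional[Tuple[str, Set[str]]]


def neighbourhood(graph: Graph, focal: str, levels: int = 1) -> Set[str]:
    """Concepts within `levels` hops of the focal concept (focal excluded)."""
    seen = {focal}
    frontier = [focal]
    for _ in range(levels):
        next_frontier = []
        for node in frontier:
            for nbr in graph.get(node, {}):
                if nbr not in seen:
                    seen.add(nbr)
                    next_frontier.append(nbr)
        frontier = next_frontier
    return seen - {focal}


def match_neighbourhoods(ng: Set[str], focal: str, later_graphs: List[Graph],
                         levels: int = 1, min_jaccard: float = 0.5) -> List[Match]:
    """For each later graph g_{n+1} .. g_N, return (matched concept, its neighbourhood) or None."""
    matches: List[Match] = []
    for g in later_graphs:                                  # select next graph (810)
        if focal in g:                                      # concept c found (820)
            matches.append((focal, neighbourhood(g, focal, levels)))  # identify and store (830, 840)
            continue
        # Concept c not found: pick the concept whose neighbourhood shares the
        # most neighbours with ng (825). Jaccard overlap and the min_jaccard
        # threshold are assumptions made for this sketch.
        best: Optional[str] = None
        best_score = 0.0
        for candidate in g:
            cand_ng = neighbourhood(g, candidate, levels)
            union = ng | cand_ng
            score = len(ng & cand_ng) / len(union) if union else 0.0
            if score > best_score:
                best, best_score = candidate, score
        if best is not None and best_score >= min_jaccard:
            matches.append((best, neighbourhood(g, best, levels)))
        else:
            matches.append(None)                            # no match in this graph
    return matches


def evolution_data(focal: str, matches: List[Match]) -> List[Optional[str]]:
    """Evolution data (740): the concept that focal corresponds to in each graph."""
    return [focal] + [m[0] if m is not None else None for m in matches]
```

With this representation, a concept that disappears from a later graph but whose former neighbours cluster around a new concept is reported as having evolved into that new concept, while graphs with no sufficiently similar neighbourhood yield `None`.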

Abstract

The present invention relates to systems and methods in which data is processed to identify important concepts, a network graph describing the concepts and the connections between them is produced, and evolving concepts are identified and analyzed. The system of the present invention comprises at least one data processing module (110) adapted to perform metadata analysis on inputted artifacts to identify temporal and geospatial information from each of the artifacts and to classify the artifacts as temporal and geospatial artifacts; at least one importance analysis module (120) in communication with a domain knowledge base (125) and adapted to identify domain specific important concepts from the artifacts using said domain knowledge base (125); at least one concept network generation module (150) adapted to construct a set of network graphs using the important concepts and the temporal and geospatial associations; and at least one concept evolution analysis module (140) adapted to analyse the network connections to discover evolution data for the important concepts from temporal network graphs and geospatial network graphs. The present invention automatically processes raw data and identifies the important concepts; creates a network graph that describes the concepts and the connections between them; and identifies and analyses the evolving concepts over time and space using network analysis.

Description

A SYSTEM AND METHOD FOR ANALYZING CONCEPT EVOLUTION USING NETWORK ANALYSIS
FIELD OF INVENTION
The present invention relates to a system and method for analyzing concept evolution using network analysis. In particular, the invention relates to systems and methods in which data is processed to identify important concepts, a network graph describing the concepts and the connections between them is produced, and evolving concepts are identified and analyzed.
BACKGROUND ART
As used herein, the term "Concept Evolution" refers to a phenomenon in which concepts evolve into newer or different concepts.
Discovering important concepts and their evolution over time and space from unstructured data is challenging for a number of reasons, including, for example, the obscure nature of the information and the sheer size of the data. It is considered that existing techniques lack the ability to discover Concept Evolution from unstructured data.
United States Patent Publication 2013/0151520 A1 describes a method, system and computer program product for inferring topic evolution and emergence in a set of documents. In one embodiment, the method comprises forming a group of matrices using text in the documents, and analyzing these matrices to identify a first group of topics as evolving topics and a second group of topics as emerging topics. The matrices include a first matrix X identifying a multitude of words in each of the documents, a second matrix W identifying a multitude of topics in each of the documents, and a third matrix H identifying a multitude of words for each of the multitude of topics. These matrices are analyzed to identify the evolving and emerging topics. In an embodiment, the documents form a streaming dataset, and two forms of temporal regularizers are used to help identify the evolving topics and the emerging topics in the streaming dataset. As such, this publication discloses a "bag of words" approach that does not consider relationships between words. This prior art does not disclose identifying important concepts using knowledge base concepts, which ensures richer meaning than a keyword-based approach to evolution. Also, in terms of evolution candidate selection, this prior art does not disclose generating a series of neighborhood graphs from temporal and geospatial graphs and using these in concept evolution. Further, concept evolution is determined based on an objective equation and two regularizer functions, and only temporal evolution is considered.
United States Patent No. 7,562,076 B2 describes systems and methods for processing search requests that include analyzing received queries in order to provide a more sophisticated understanding of the information being sought. In one embodiment, queries are parsed into units, which may comprise one or more words or tokens of the query, and the units are related in concept networks. Trend analysis is performed by sorting the queries into subsets along a dimension of interest and comparing the concept networks for different subsets. The trend information can be used to enhance the response of an automated search agent to a subsequently received query. This publication does not disclose identification of important concepts using knowledge base concepts. This prior art also does not disclose concept evolution using neighborhood graphs generated from temporal and geospatial graphs.
The paper "Web Science 2.0: Identifying Trends through Semantic Social Network Analysis", International Conference on Computational Science and Engineering (2009), describes a set of social network analysis based algorithms for mining the Web, blogs, and online forums to identify trends and to find the people launching these new trends. The algorithms have been implemented in Condor, a software system for predictive search and analysis of the Web and especially of social networks. The algorithms include the temporal computation of network centrality measures, the visualization of social networks as Cybermaps, a semantic process of mining and analyzing large amounts of text based on social network analysis, and sentiment analysis and information filtering methods. The temporal calculation of the betweenness of concepts makes it possible to extract and predict long-term trends in the popularity of relevant concepts such as brands, movies, and politicians. The authors illustrate their approach by qualitatively comparing Web buzz and Web betweenness for the 2008 US presidential elections, as well as by correlating the Web buzz index with share prices. It is considered that this publication does not disclose identification of important concepts using knowledge base concepts. This prior art also does not disclose concept evolution using neighborhood graphs generated from temporal and geospatial graphs.
The paper "An efficient approach to detecting concept-evolution in network data streams", Australasian Telecommunication Networks and Applications Conference (ATNAC), 2011, suggests that an important challenge in network management and intrusion detection is the problem of data stream classification to identify new and abnormal traffic flows. An open research issue in this context is said to be concept-evolution, which involves the emergence of a new class in the data stream. The authors suggest that most traditional data classification techniques are based on the assumption that the number of classes does not change over time. However, it is suggested that this is not the case in real-world networks, and that existing methods generally cannot identify the evolution of a new class in the data stream. In this paper, the authors present an approach to detecting classes in data streams that exhibit concept-evolution. In particular, it is suggested that the authors' approach is able to improve both accuracy and computational efficiency by eliminating "noise" clusters in the analysis of concept evolution. Through an evaluation on simulated and benchmark data sets, the authors demonstrate that their approach achieves accuracy comparable to an existing scheme from the literature with a significant reduction in computational complexity. It is considered that this publication does not disclose identification of important concepts using knowledge base concepts. This prior art also does not disclose concept evolution using neighborhood graphs generated from temporal and geospatial graphs.
The present invention provides a system and method for analyzing concept evolution using network analysis that automatically processes raw data and identifies the important concepts; creates a network graph that describes the concepts and the connections between them; and identifies and analyses the evolving concepts.
SUMMARY OF INVENTION
The present invention relates to a system and method for analyzing concept evolution using network analysis. In particular, the present invention discovers concept evolution from unstructured data by identifying concepts through importance analysis, constructing a network of the important concepts, and then discovering the evolution of those concepts over time and space using network analysis.
One aspect of the present invention provides a method for analyzing concept evolution using network analysis comprising inputting artifacts; performing metadata analysis to identify temporal and geospatial information from each of the artifacts and to classify the artifacts as temporal and geospatial artifacts; identifying domain specific important concepts from the artifacts using a domain knowledge base; constructing a set of network graphs using the important concepts and the temporal and geospatial associations; and analyzing the network connections to discover evolution data for the important concepts from temporal network graphs and geospatial network graphs.
Another aspect of the invention provides a method wherein the artifacts comprise unstructured data selected from articles, web postings, emails and the like.
A further aspect of the invention provides a method wherein the network graphs comprise nodes connected to each other by edges, wherein the nodes represent concepts and the edges represent linkage properties.
Yet another aspect of the invention provides a method wherein the Temporal Network Graphs (TNG) comprise concepts belonging to artifacts of the same temporal association.
Another aspect of the invention provides a method wherein the Temporal Network Graphs $\mathrm{TNG} = \{tnw_1, tnw_2, \ldots, tnw_n\}$, and each of $tnw_1$ to $tnw_n$ is a temporal network graph of a set of concepts $\{C_1, C_2, \ldots, C_n\}$.
Still another aspect of the invention provides a method wherein the Geospatial Network Graphs (GNG) comprise concepts belonging to artifacts of the same geospatial association.
Another aspect of the invention provides a method wherein the Geospatial Network Graphs $\mathrm{GNG} = \{gnw_1, gnw_2, \ldots, gnw_n\}$, and each of $gnw_1$ to $gnw_n$ is a geospatial network graph of a set of concepts $\{C_1, C_2, \ldots, C_n\}$.
A further aspect of the invention provides a method wherein identifying domain specific important concepts comprises identifying knowledge base concepts in the inputted artifacts; and ranking the concepts based on at least frequency of occurrences.
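The identification and ranking step described above can be illustrated with a short Python sketch. It is a simplification under stated assumptions: the domain knowledge base is modelled as a hypothetical dictionary from lowercase surface labels to concept IDs, concepts are spotted by whole-word regular-expression matching, and ranking uses raw frequency of occurrence only (the text says "at least" frequency, so a real system may combine further signals). `rank_important_concepts` and `top_k` are illustrative names, not taken from the source.

```python
import re
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def rank_important_concepts(
    texts: Iterable[str],
    knowledge_base: Dict[str, str],
    top_k: int = 10,
) -> List[Tuple[str, int]]:
    """Find knowledge base concepts in the texts and rank them by frequency.

    knowledge_base (assumed format) maps a lowercase surface label to a
    concept ID; several labels may map to the same ID (synonyms), in which
    case their counts are combined.
    """
    counts: Counter = Counter()
    for text in texts:
        lowered = text.lower()
        for label, concept_id in knowledge_base.items():
            # Whole-word match, so that "flu" does not match inside "influence".
            pattern = r"\b" + re.escape(label) + r"\b"
            counts[concept_id] += len(re.findall(pattern, lowered))
    # Keep only concepts that actually occur, most frequent first.
    ranked = [(cid, n) for cid, n in counts.most_common() if n > 0]
    return ranked[:top_k]
```

Because the lookup is driven by knowledge base labels rather than by every token, only concepts known to the domain are counted, which is the distinction the specification draws against a plain keyword approach.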
Another aspect of the invention provides a method wherein constructing a set of network graphs comprises: for each of the artifacts, identifying network linkage properties, including at least co-occurrence, using the important concepts; and creating multiple temporal and geospatial networks based on the network linkage properties.
Still another aspect of the invention provides a method wherein analyzing the network connections to discover evolution data for the important concepts comprises determining the popularity of all nodes in the network based on their linkages, at least by degree centrality; and performing concept evolution based on neighbourhood networks.
Yet another aspect of the invention provides a method wherein performing concept evolution based on neighbourhood networks comprises selecting a network graph $g_n$; selecting a concept c from the network graph $g_n$; performing neighbourhood generation to create a neighbourhood graph ng of the concept c; finding neighbourhood graphs that match the neighbourhood graph ng in the rest of the network graphs ($g_{n+1}, g_{n+2}, \ldots, g_N$); and constructing evolution data for concept c from the matched neighbourhood graphs.
Another aspect of the invention provides a method wherein finding neighbourhood graphs that match the neighbourhood graph ng in the rest of the network graphs ($g_{n+1}, g_{n+2}, \ldots, g_N$) comprises selecting network graph $g_{n+1}$; searching for concept c in network graph $g_{n+1}$; if concept c is found, identifying the neighbourhood graph for focal concept c; and storing the neighbourhood graph in graph $g_{n+1}$.
A further aspect of the invention provides a system for analyzing concept evolution using network analysis comprising a data processing module adapted to perform metadata analysis on inputted artifacts to identify temporal and geospatial information from each of the artifacts and to classify the artifacts as temporal and geospatial artifacts; an importance analysis module in communication with a domain knowledge base and adapted to identify domain specific important concepts from the artifacts using the domain knowledge base; a concept network generation module adapted to construct a set of network graphs using the important concepts and the temporal and geospatial associations; and a concept evolution analysis module adapted to analyse the network connections to discover evolution data for the important concepts from temporal network graphs and geospatial network graphs.
The present invention consists of features and a combination of parts hereinafter fully described and illustrated in the accompanying drawings, it being understood that various changes in the details may be made without departing from the scope of the invention or sacrificing any of the advantages of the present invention.
BRIEF DESCRIPTION OF ACCOMPANYING DRAWINGS
To further clarify various aspects of some embodiments of the present invention, a more particular description of the invention will be rendered by reference to specific embodiments thereof, which are illustrated in the appended drawings. It is appreciated that these drawings depict only typical embodiments of the invention and are therefore not to be considered limiting of its scope. The invention will be described and explained with additional specificity and detail through the accompanying drawings, in which:
FIG. 1 illustrates the general architecture of the method of the present invention.
FIG. 1 a is a flowchart illustrating the steps of the method of the present invention.
FIG. 2 is a flowchart illustrating the process flow of the data processing of the present invention.
FIG. 3 is a flowchart illustrating the process flow of the importance analysis of the present invention.
FIG. 4 is a flowchart illustrating the process flow of the concept network generation of the present invention.
FIG. 5 is a flowchart illustrating the process flow of the concept evolution analysis of the present invention.
FIG. 6 illustrates the neighbourhood network of the present invention.
FIG. 7 is a flowchart illustrating the process flow of the concept evolution of neighborhood network of the present invention.
FIG. 8 is a flowchart illustrating the neighbourhood graph matching of the present invention.
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
The present invention provides a system and method for analyzing concept evolution using network analysis. In particular, the invention relates to systems and methods in which data is processed to identify important concepts by utilizing network analysis.
Hereinafter, this specification will describe the present invention according to the preferred embodiments. It is to be understood that limiting the description to the preferred embodiments of the invention is merely to facilitate discussion of the present invention, and it is envisioned that various modifications may be made without departing from the scope of the appended claims.
Referring to Figure 1, a system 100 for analyzing concept evolution using network analysis is illustrated. The system includes a data processing module 110, an importance analysis module 120, a concept network generation module 150, and a concept evolution analysis module 140.
The data processing module 110 is adapted to perform metadata analysis on inputted artifacts 105 to identify temporal and geospatial information from each of the artifacts and to classify the artifacts as temporal and geospatial artifacts 115. The importance analysis module 120 is in communication with a domain knowledge base 125 and is adapted to identify domain specific important concepts 135 from the artifacts using the domain knowledge base 125. The concept network generation module 150 is adapted to construct a set of network graphs 145 using the important concepts 135 and the temporal and geospatial associations 115. The concept evolution analysis module 140 is adapted to analyse the network connections to discover evolution data 155 for the important concepts 135 from the temporal network graphs and geospatial network graphs in the set of network graphs 145.
The network graphs comprise nodes connected to each other by edges, wherein the nodes represent concepts and the edges represent linkage properties. The Temporal Network Graphs (TNG) in the set of network graphs comprise concepts belonging to artifacts of the same temporal association, wherein $\mathrm{TNG} = \{tnw_1, tnw_2, \ldots, tnw_n\}$ and each of $tnw_1$ to $tnw_n$ is a temporal network graph of a set of concepts $\{C_1, C_2, \ldots, C_n\}$. Likewise, the Geospatial Network Graphs (GNG) comprise concepts belonging to artifacts of the same geospatial association, wherein $\mathrm{GNG} = \{gnw_1, gnw_2, \ldots, gnw_n\}$ and each of $gnw_1$ to $gnw_n$ is a geospatial network graph of a set of concepts $\{C_1, C_2, \ldots, C_n\}$.
The methodology carried out by each of the respective modules is illustrated in more detail in Figures 2 to 8. Referring to Figure 2, content 210 of the artifacts 105 is processed in the data processing module 110, and temporal and geospatial metadata is extracted 220 from the content 210. The artifacts 105 are grouped 230 according to the extracted metadata and classified as temporal and geospatial artifacts 115, as previously described.
The artifacts include unstructured data selected from articles, web postings, and e-mails.
Referring to Figure 3, identification of domain-specific important concepts 300 includes identifying knowledge base concepts 310 in the inputted artifacts 105 and ranking the concepts based on at least their frequency of occurrence 320. The important concepts 135 are thereby identified.
The step of constructing a set of network graphs 400 is illustrated in more detail in Figure 4. Network linkage properties, including at least co-occurrence, are identified 410 for each of the artifacts 105 using the important concepts 135. Based on the network linkage properties identified, multiple temporal and geospatial networks are created 420.
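Steps 310 to 420 might look like the following minimal sketch. It assumes the domain knowledge base 125 is simply a set of single-word concept labels matched against lowercase tokens (a real knowledge base and concept matcher would be richer), and it assumes co-occurrence within the same artifact is the only linkage property. The function names are hypothetical.

```python
import re
from collections import Counter
from itertools import combinations


def tokens(text):
    """Lowercase alphabetic tokens (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def rank_important_concepts(texts, knowledge_base, top_k=10):
    """Steps 310-320: count knowledge-base concepts and rank them by frequency."""
    counts = Counter(t for text in texts for t in tokens(text) if t in knowledge_base)
    return [concept for concept, _ in counts.most_common(top_k)]


def build_cooccurrence_graph(texts, important):
    """Steps 410-420: edge weight = number of artifacts in which both concepts occur."""
    important = set(important)
    graph = {concept: {} for concept in important}
    for text in texts:
        present = sorted(set(tokens(text)) & important)
        for u, v in combinations(present, 2):
            graph[u][v] = graph[u].get(v, 0) + 1
            graph[v][u] = graph[v].get(u, 0) + 1
    return graph


kb = {"flood", "rain", "dengue", "haze"}
january = ["Flood after heavy rain", "Rain brings dengue fears", "Flood, rain and more rain"]
important = rank_important_concepts(january, kb, top_k=3)
tng_january = build_cooccurrence_graph(january, important)
print(important)             # ['rain', 'flood', 'dengue']
print(tng_january["flood"])  # {'rain': 2}
```

Calling `build_cooccurrence_graph` once per temporal group and once per geospatial group would yield the collections TNG and GNG described above.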
Figure 5 illustrates concept evolution analysis 500, in which network connections are analysed to discover evolution data for the important concepts. In particular, this process includes determining the popularity of all nodes in the network based on their linkages, at least by degree centrality 510, and performing concept evolution analysis based on neighbourhood networks 520. Concept evolution data 155 is thereby produced.
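Step 510 can be illustrated with normalised degree centrality, i.e. a node's number of neighbours divided by n - 1 for a graph of n nodes. This sketch assumes each network graph is stored as a dictionary mapping each concept to its neighbours (with edge weights ignored); the function names are hypothetical.

```python
def degree_centrality(graph):
    """Normalised degree centrality: neighbours / (n - 1) for a graph of n nodes."""
    n = len(graph)
    if n <= 1:
        return {node: 0.0 for node in graph}
    return {node: len(neighbours) / (n - 1) for node, neighbours in graph.items()}


def rank_by_popularity(graph):
    """Concepts ordered from most to least popular; ties broken alphabetically."""
    scores = degree_centrality(graph)
    return sorted(scores, key=lambda node: (-scores[node], node))


graph = {
    "flood": {"rain": 2, "relief": 1},
    "rain": {"flood": 2},
    "relief": {"flood": 1},
}
print(degree_centrality(graph))   # {'flood': 1.0, 'rain': 0.5, 'relief': 0.5}
print(rank_by_popularity(graph))  # ['flood', 'rain', 'relief']
```

A weighted variant could sum the co-occurrence weights instead of counting neighbours; the text only requires "at least" degree centrality.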
The neighbourhood network may be illustrated, for convenience, as seen in Figure 6. As illustrated, the network 600 includes a first level 610 and a second level 620 mapped together with a focal concept 630 and neighbouring concepts 640.
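The two-level neighbourhood of Figure 6 can be computed with a breadth-first search from the focal concept. This is a sketch assuming the same dictionary-of-neighbours graph representation as above; level 1 holds the focal concept's direct neighbours and level 2 their neighbours. The function names are hypothetical.

```python
from collections import deque


def neighbourhood_levels(graph, focal, levels=2):
    """Breadth-first search from `focal`, returning {concept: level} up to `levels` hops.

    Level 0 is the focal concept, level 1 its direct neighbours, level 2 their neighbours.
    """
    if focal not in graph:
        return {}
    level = {focal: 0}
    queue = deque([focal])
    while queue:
        node = queue.popleft()
        if level[node] == levels:
            continue  # do not expand beyond the outermost level
        for neighbour in graph[node]:
            if neighbour not in level:
                level[neighbour] = level[node] + 1
                queue.append(neighbour)
    return level


def neighbourhood_graph(graph, focal, levels=2):
    """Subgraph induced by the focal concept and its neighbours within `levels` hops."""
    members = neighbourhood_levels(graph, focal, levels)
    return {u: {v for v in graph[u] if v in members} for u in members}


chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(neighbourhood_levels(chain, "a"))  # {'a': 0, 'b': 1, 'c': 2}
```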
Referring to Figure 7, the process flow for concept evolution of a neighbourhood network 700 is illustrated. A network graph $g_n$ is selected 702 and a concept c is selected from the network graph $g_n$ 705. It is then determined whether the concept already exists in the concept evolution data 730. If the concept does not exist in the concept evolution data, neighbourhood generation 710 is performed to create a neighbourhood graph of the concept; if it does exist, the step of selecting a concept from the network graph is repeated. The neighbourhood of a concept within n levels is identified. Matching neighbourhoods are identified 720 within the rest of the graphs ($g_{n+1}, g_{n+2}, \ldots, g_N$). Evolution data for concept c is constructed from the matched neighbourhood graphs 740. The steps are repeated while more concepts 750 and more graphs 760 remain. In cases where c is identified on searching 730, a graph $g_{n+1}$ is selected and the above steps are repeated for all graphs.
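The outer loop of Figure 7 can be sketched as below. This is a simplified, self-contained illustration: the neighbourhood is limited to one level, graph indices are 0-based, and the matching step 720 only looks up concept c in the later graphs (the shared-neighbour fallback of Figure 8 is left out). All function names are hypothetical.

```python
def neighbourhood_graph(graph, focal):
    """Step 710, simplified to one level: the focal concept and its direct neighbours."""
    members = {focal} | set(graph[focal])
    return {u: {v for v in graph[u] if v in members} for u in members}


def find_matches(focal, later_graphs):
    """Step 720, simplified: the neighbourhood of `focal` in each later graph containing it.

    The shared-neighbour fallback of Figure 8 is deliberately left out here.
    """
    return [(index, neighbourhood_graph(g, focal)) for index, g in later_graphs if focal in g]


def concept_evolution(graphs):
    """Figure 7: iterate the graphs in order and build evolution data per concept."""
    evolution = {}
    for n, g_n in enumerate(graphs):                  # 702 / 760: next graph
        later = list(enumerate(graphs))[n + 1:]
        for c in g_n:                                 # 705 / 750: next concept
            if c in evolution:                        # 730: already has evolution data
                continue
            ng = neighbourhood_graph(g_n, c)          # 710
            evolution[c] = [(n, ng)] + find_matches(c, later)  # 720, 740
    return evolution


g1 = {"flood": {"rain"}, "rain": {"flood"}}
g2 = {"flood": {"relief"}, "relief": {"flood"}}
g3 = {"haze": set()}
evolution = concept_evolution([g1, g2, g3])
print([index for index, _ in evolution["flood"]])  # [0, 1]
```

Here the evolution data for a concept is just the ordered list of its neighbourhood graphs; scoring that list is a separate step.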
Neighbourhood graph matching 720 is illustrated in more detail with reference to Figure 8. In this figure, a network graph $g_{n+1}$ is selected 810 and concept c is searched for in the network graph $g_{n+1}$ 820. If concept c is found, a neighbourhood graph for focal concept c is identified 830 and the neighbourhood graph is stored in graph $g_{n+1}$ 840. If concept c is not found, neighbourhood graphs that share the same neighbours are found 825, and the neighbourhood graph is thereafter stored in graph $g_{n+1}$ 840. After the neighbourhood graph is stored in graph $g_{n+1}$, it is determined whether there are more graphs; if so, steps 810 to 840 are repeated.
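The Figure 8 loop, including the fallback 825, might be sketched as follows. The text does not say how "sharing the same neighbours" is measured, so this sketch swaps in Jaccard similarity between neighbour sets with a hypothetical `min_overlap` threshold; "storing in graph $g_{n+1}$" is modelled as storing the match under that graph's index. All names are hypothetical.

```python
def neighbourhood_graph(graph, focal):
    """One-level neighbourhood of `focal` as an induced subgraph."""
    members = {focal} | set(graph[focal])
    return {u: {v for v in graph[u] if v in members} for u in members}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def match_in_graph(c, c_neighbours, graph, min_overlap=0.5):
    """Steps 820-830 for a single later graph.

    If c is present, return (c, its neighbourhood). Otherwise (825) return the concept
    whose neighbours overlap most with c's neighbours, with its neighbourhood, or None
    if the best Jaccard overlap is below `min_overlap`.
    """
    if c in graph:
        return c, neighbourhood_graph(graph, c)
    candidates = [(jaccard(set(c_neighbours), set(nbrs)), x) for x, nbrs in graph.items()]
    best_score, best = max(candidates, default=(0.0, None))
    if best is None or best_score < min_overlap:
        return None
    return best, neighbourhood_graph(graph, best)


def match_neighbourhoods(c, c_neighbours, later_graphs):
    """Figure 8 loop (810-840): store one matched neighbourhood per later graph index."""
    stored = {}
    for index, graph in later_graphs:               # 810: select g_{n+1}, g_{n+2}, ...
        result = match_in_graph(c, c_neighbours, graph)
        if result is not None:
            stored[index] = result                  # 840: store against that graph
    return stored


g2 = {"deluge": {"rain", "relief"}, "rain": {"deluge"}, "relief": {"deluge"}}
g3 = {"flood": {"aid"}, "aid": {"flood"}}
g4 = {"haze": {"smoke"}, "smoke": {"haze"}}
stored = match_neighbourhoods("flood", {"rain", "relief"}, [(1, g2), (2, g3), (3, g4)])
print({i: focal for i, (focal, _) in stored.items()})  # {1: 'deluge', 2: 'flood'}
```

In graph 1 the concept "flood" is absent, but "deluge" has exactly the same neighbours, so its neighbourhood is taken as the match; graph 3 shares no neighbours and yields nothing.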
Constructing evolution data 740 for concept c from the neighbourhood graphs takes as input a set of matching neighbourhood graphs for concept c. An evolution score is assigned to concept c by analysing the neighbourhood graphs. Examples of graph matching rules to classify evolution may be summarized as follows:
1. High Evolution: If graphs A and B have different neighbouring concepts and the collective popularity index for graph B is high, the focal concept in graph A has evolved into the focal concept in graph B;
2. High Evolution: If graphs A and B have different focal concepts and the collective popularity index is high, the focal concept in graph A has evolved into the focal concept in graph B;
3. No Evolution: If graphs A and B share the same concept nodes as neighbours and the popularity is similar; and
4. Popularity Evolution: If graphs A and B share the same concept nodes as neighbours and the popularity for graph B is low, the focal concept has lost popularity.
Unless the context requires otherwise or it is specifically stated to the contrary, integers, steps or elements of the invention recited herein as singular integers, steps or elements clearly encompass both singular and plural forms of the recited integers, steps or elements. Throughout this specification, unless the context requires otherwise, the word "comprise", or variations such as "comprises" or "comprising", will be understood to imply the inclusion of a stated step or element or integer or group of steps or elements or integers, but not the exclusion of any other step or element or integer or group of steps, elements or integers. Thus, in the context of this specification, the term "comprising" is used in an inclusive sense and thus should be understood as meaning "including principally, but not necessarily solely".
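The four example graph matching rules can be expressed as a small classifier. This is a sketch under loud assumptions: the specification does not define the collective popularity index or what counts as "high", "similar" or "low", so here the index is assumed to be the mean node popularity (e.g. degree centrality) and the comparison is assumed to be a relative `tolerance` band around graph A's index. It returns a label rather than a numeric evolution score; `Neighbourhood`, `collective_popularity` and `classify_evolution` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Neighbourhood:
    focal: str
    neighbours: frozenset
    popularity: float  # collective popularity index of the neighbourhood graph


def collective_popularity(centrality, nodes):
    """Assumed definition: mean popularity score (e.g. degree centrality) of the nodes."""
    nodes = list(nodes)
    return sum(centrality.get(n, 0.0) for n in nodes) / len(nodes) if nodes else 0.0


def classify_evolution(a, b, tolerance=0.1):
    """Apply the example rules to an earlier graph A and a later graph B.

    Assumption: B's popularity is "high", "similar" or "low" when its ratio to A's
    popularity is above 1 + tolerance, within the band, or below 1 - tolerance.
    """
    if a.popularity == 0:
        ratio = float("inf") if b.popularity > 0 else 1.0
    else:
        ratio = b.popularity / a.popularity
    high, low = ratio > 1 + tolerance, ratio < 1 - tolerance
    same_neighbours = a.neighbours == b.neighbours
    if high and (not same_neighbours or a.focal != b.focal):
        return "high evolution"          # rules 1 and 2
    if same_neighbours and not high and not low:
        return "no evolution"            # rule 3
    if same_neighbours and low:
        return "popularity evolution"    # rule 4: the focal concept lost popularity
    return "unclassified"                # not covered by the example rules


a = Neighbourhood("flood", frozenset({"rain"}), 0.5)
print(classify_evolution(a, Neighbourhood("flood", frozenset({"relief"}), 0.8)))  # high evolution
print(classify_evolution(a, Neighbourhood("flood", frozenset({"rain"}), 0.52)))   # no evolution
print(classify_evolution(a, Neighbourhood("flood", frozenset({"rain"}), 0.2)))    # popularity evolution
```

Because the rules are given only as examples, combinations they do not cover fall through to "unclassified" instead of being forced into a category.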
It will be appreciated that the foregoing description has been given by way of illustrative example of the invention and that all such modifications and variations thereto as would be apparent to persons of skill in the art are deemed to fall within the broad scope and ambit of the invention as herein set forth.

Claims

1. A method for analyzing concept evolution using network analysis comprising:
inputting artifacts (105);
performing metadata analysis to identify temporal and geospatial information from each of the artifacts and to classify the artifacts as temporal and geospatial artifacts (200);
identifying domain specific important concepts from the artifacts using a domain knowledge base (300);
constructing a set of network graphs using the important concepts and the temporal and geospatial associations (400); and
analyzing the network connections to discover evolution data for the important concepts from Temporal Network Graphs and Geospatial Network Graphs (500).
2. The method of claim 1, wherein said artifacts include unstructured data selected from articles, web postings, and emails.
3. The method of claim 1, wherein said Network Graphs comprise nodes connected to each other by edges.
4. The method of claim 1, wherein said nodes represent concepts and said edges represent linkage properties.
5. The method of claim 1, wherein said Temporal Network Graphs (TNG) comprise concepts belonging to artifacts of the same temporal association.
6. The method of claim 5, wherein said $TNG = \{tnw_1, tnw_2, \ldots, tnw_n\}$ and $tnw_i$ is a Temporal Network Graph of a set of concepts $\{C_1, C_2, \ldots, C_n\}$.
7. The method of claim 1, wherein said Geospatial Network Graphs (GNG) comprise concepts belonging to artifacts of the same geospatial association.
8. The method of claim 7, wherein said $GNG = \{gnw_1, gnw_2, \ldots, gnw_n\}$ and $gnw_i$ is a Geospatial Network Graph of a set of concepts $\{C_1, C_2, \ldots, C_n\}$.
9. The method of claim 1, wherein identifying domain specific important concepts (300) comprises:
identifying knowledge base concepts in the inputted artifacts (310); and
ranking the concepts based on at least frequency of occurrences (320).
10. The method of claim 1, wherein constructing a set of network graphs (400) comprises:
identifying network linkage properties for each of the artifacts, including at least co-occurrence, using the important concepts (410); and
creating multiple temporal and geospatial networks based on the network linkage properties (420).
11. The method of claim 1, wherein analyzing the network connections to discover evolution data for the important concepts (500) comprises:
determining the popularity of all nodes in the network based on linkages, at least by degree of centrality (510); and
performing concept evolution based on neighbourhood networks (520).
12. The method of claim 10, wherein performing concept evolution based on neighbourhood networks comprises:
selecting a network graph $g_n$ (702);
selecting a concept c from said network graph $g_n$ (705);
performing neighbourhood generation to create a neighbourhood graph ng of the concept c (710);
finding neighbourhood graphs that match with neighbourhood graph ng from the rest of the network graphs ($g_{n+1}, g_{n+2}, \ldots, g_N$) (720); and
constructing evolution data for concept c from matched neighbourhood graphs (740).
13. The method of claim 11, wherein finding neighbourhood graphs that match with neighbourhood graph ng from the rest of the network graphs ($g_{n+1}, g_{n+2}, \ldots, g_N$) (720) comprises:
selecting network graph $g_{n+1}$ (810);
searching for concept c in network graph $g_{n+1}$ (820);
if concept c is found, identifying a neighbourhood graph for focal concept c (830); and
storing the neighbourhood graph in graph $g_{n+1}$ (840).
14. A system (100) for analyzing concept evolution using network analysis comprising:
at least one data processing module (110) adapted to perform metadata analysis on inputted artifacts to identify temporal and geospatial information from each of the artifacts and to classify the artifacts as temporal and geospatial artifacts;
at least one importance analysis module (120) in communication with a domain knowledge base (125) and adapted to identify domain specific important concepts from the artifacts using said domain knowledge base (125);
at least one concept network generation module (150) adapted to construct a set of network graphs using the important concepts and the temporal and geospatial associations; and
at least one concept evolution analysis module (140) adapted to analyse the network connections to discover evolution data for the important concepts from temporal network graphs and geospatial network graphs.

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
MYPI2014001434 2014-05-19
MYPI2014001434A MY184201A (en) 2014-05-19 2014-05-19 A system and method for analyzing concept evolution using network analysis

Publications (1)

Publication Number Publication Date
WO2015178758A1 (en) 2015-11-26

Family

ID=53398172

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/MY2015/050029 WO2015178758A1 (en) 2014-05-19 2015-05-07 A system and method for analyzing concept evolution using network analysis

Country Status (2)

Country Link
MY (1) MY184201A (en)
WO (1) WO2015178758A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019132648A1 (en) * 2017-12-26 2019-07-04 Mimos Berhad System and method for identifying concern evolution within temporal and geospatial windows
US11244013B2 (en) 2018-06-01 2022-02-08 International Business Machines Corporation Tracking the evolution of topic rankings from contextual data

Citations (3)

Publication number Priority date Publication date Assignee Title
US7562076B2 (en) 2003-11-12 2009-07-14 Yahoo! Inc. Systems and methods for search query processing using trend analysis
WO2012091539A1 (en) * 2010-12-28 2012-07-05 Mimos Berhad A semantic similarity matching system and a method thereof
US20130151520A1 (en) 2011-12-09 2013-06-13 International Business Machines Corporation Inferring emerging and evolving topics in streaming text

Non-Patent Citations (3)

Title
"An efficient approach to detecting concept-evolution in network data streams", AUSTRALASIAN TELECOMMUNICATION NETWORKS AND APPLICATIONS CONFERENCE (ATNAC, 2011
ERFANI S M ET AL: "An efficient approach to detecting concept-evolution in network data streams", 2011 IEEE AUSTRALASIAN TELECOMMUNICATION NETWORKS AND APPLICATIONS CONFERENCE (ATNAC), 9 November 2011 (2011-11-09), pages 1 - 7, XP032069636, ISBN: 978-1-4577-1711-6, DOI: 10.1109/ATNAC.2011.6096654 *
MASUD M M ET AL: "Addressing concept-evolution in concept-drifting data streams", 2010 IEEE 10TH INTERNATIONAL CONFERENCE ON DATA MINING (ICDM), 13 December 2010 (2010-12-13), pages 929 - 934, XP031854332, ISBN: 978-1-4244-9131-5 *

Also Published As

Publication number Publication date
MY184201A (en) 2021-03-25

Similar Documents

Publication Publication Date Title
Ferrara Measuring social spam and the effect of bots on information diffusion in social media
Garimella et al. Quantifying controversy on social media
Nouh et al. Understanding the radical mind: Identifying signals to detect extremist content on twitter
Al-Qurishi et al. Leveraging analysis of user behavior to identify malicious activities in large-scale social networks
Fernquist et al. Political bots and the Swedish general election
Marchetti-Bowick et al. Learning for microblogs with distant supervision: political forecasting with Twitter
Lin et al. Malicious URL filtering—A big data application
CN104809108B (en) Information monitoring analysis system
US10135723B2 (en) System and method for supervised network clustering
US20170109358A1 (en) Method and system of determining enterprise content specific taxonomies and surrogate tags
Aljabri et al. Machine learning-based social media bot detection: a comprehensive literature review
Sheu et al. An efficient incremental learning mechanism for tracking concept drift in spam filtering
Zhang et al. Tweetscore: Scoring tweets via social attribute relationships for twitter spammer detection
Zarrad et al. The evaluation of the public opinion-a case study: Mers-cov infection virus in ksa
Ellaky et al. Systematic literature review of social media bots detection systems
Zhu et al. Emotional community detection in social network
Amira et al. Detection and Analysis of Fake News Users’ Communities in Social Media
Alzahrani et al. Finding organizational accounts based on structural and behavioral factors on twitter
WO2015178758A1 (en) A system and method for analyzing concept evolution using network analysis
Elezaj et al. Criminal network community detection in social media forensics
Enoki et al. User community reconstruction using sampled microblogging data
Beskow et al. Bot-Match: Social Bot Detection with Recursive Nearest Neighbors Search
Kausar et al. Towards understanding trends manipulation in Pakistan Twitter
Sun et al. Modeling for user interaction by influence transfer effect in online social networks
Garg et al. Multilayer perceptron optimization approaches for detecting spam on social media based on recursive feature elimination

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15729252

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15729252

Country of ref document: EP

Kind code of ref document: A1