CN112328801A

CN112328801A - Method for predicting group events by event knowledge graph

Info

Publication number: CN112328801A
Application number: CN202011043065.6A
Authority: CN
Inventors: 潘磊; 代翔; 崔莹; 廖泓舟; 刘鑫; 丁洪丽
Original assignee: Southwest Electronic Technology Institute No 10 Institute of Cetc
Current assignee: Southwest Electronic Technology Institute No 10 Institute of Cetc
Priority date: 2020-09-28
Filing date: 2020-09-28
Publication date: 2021-02-05
Anticipated expiration: 2040-09-28
Also published as: CN112328801B

Abstract

The invention discloses a method for predicting group events using an event knowledge graph, and relates to event knowledge graph mining and application technology. The invention is realized by the following technical scheme: the method comprises two stages, prediction model training and real-time prediction. In the prediction model training stage, a historical event graph is constructed from historical structured event data, a vectorized representation of the event graph is learned by a graph embedding network, and whether an event occurs is predicted by a classification network model built on a deep neural network. In the real-time prediction stage, a real-time event graph is constructed from real-time structured event data and input into the trained event prediction model, which obtains a vectorized representation of the event graph, further mines deep semantic information from the event data, and converts the task into a binary classification problem of event occurrence versus non-occurrence; the outcome with the highest probability is taken as the prediction of whether the event occurs.

Description

Method for predicting group events by event knowledge graph

Technical Field

The invention relates to knowledge graph mining in the field of data mining, in particular to event knowledge graph mining and application technology, and more specifically to a group event prediction method based on event knowledge graph mining.

Background

Of the three terms "group", "sudden", and "event", the core is "event"; "group" and "sudden" are qualifiers that describe a state or property of the event. A sudden group event refers to behavior by actors sharing the same interest demands who, under certain temporal, spatial, and psychological conditions, conflict with or resist public order and public safety as a spontaneous or organized crowd. Some call such incidents "emergencies" because many of them occur suddenly; others, because the participants act as a group, combine "group" and "event" into the term "group event". The different expressions reflect different emphases. In the broad sense, a group event is collective conflict behavior that violates national laws and regulations, disturbs social order, and endangers public safety, arising from serious conflicts of interest that cannot be resolved through judicial channels. In the narrow sense, a group event is collective behavior that disturbs social order and endangers public safety, carried out by groups with similar or identical interests when their interest demands lack mechanisms for negotiation and protection. Characteristics of group events include the collective nature of the actors, the public nature of the behavior, the complexity of handling, and often serious harm. A particularly prominent characteristic is suddenness: because such events involve many uncontrollable factors and no effective monitoring and identification method exists at present, their occurrence cannot be controlled, and their location, scale, and form are difficult to predict.
Online public opinion forms in a specific way. In a visualization of the hot topics of a network domain, the size of a node is determined by the frequency of its keyword. In such a graph, node centrality intuitively describes the number of individuals directly connected to that node. The higher the centrality and frequency of a keyword, the more central its node is and the greater its influence on the whole network. Other nodes surround the high-centrality keywords and together form the clusters of the graph.
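The notion of centrality described above can be illustrated with a minimal sketch. It assumes degree centrality (the count of directly connected nodes) and uses made-up keyword co-occurrence edges and frequencies; none of these names or values come from the patent.

```python
from collections import Counter


def degree_centrality(edges):
    """Count, for each node, how many distinct nodes it is directly connected to."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    return {node: len(adjacent) for node, adjacent in neighbors.items()}


# Hypothetical keyword co-occurrence edges and keyword frequencies.
edges = [
    ("protest", "city"),
    ("protest", "police"),
    ("protest", "strike"),
    ("city", "police"),
]
frequency = Counter({"protest": 12, "city": 5, "police": 4, "strike": 2})

centrality = degree_centrality(edges)
# The keyword with the most direct connections sits at the center of the graph.
central_keyword = max(centrality, key=centrality.get)
# In a visualization, node size would follow keyword frequency.
largest_node = frequency.most_common(1)[0][0]
```

In this toy graph the most central keyword is also the most frequent one, which matches the situation the paragraph describes.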

Online public opinion is strongly double-edged. On the one hand, it can help the weaker party so that the demands of vulnerable groups are met, for example through online anti-corruption efforts; on the other hand, it can be exploited by others, amplifying the influence of an event and causing more serious consequences. At this stage, group events span many industries and fields. For the detection and prediction of group protest events, the prior art provides two large-scale parallel frequent subgraph mining algorithms: the PTrGraM algorithm for graph sets and the SSiGraM algorithm for a single large graph. Existing frequent subgraph mining algorithms run serially on a single machine and cannot effectively meet the demands of large-scale input graphs and low-threshold pattern mining, which is why parallel frequent subgraph mining was proposed. Because data in group events are often closely correlated, and the graph is a widely used data structure well suited to describing closely correlated data, such data are organized into a knowledge graph (KG). A knowledge graph describes concepts, entities, and events in the objective world and the relations among them. In scientometrics, the term also denotes a graph that takes scientific knowledge as its object of measurement and displays the development process and structural relations of that knowledge, having the dual properties of a "graph" and a "lineage". On the basis of expressing scientific knowledge units and their relations with mathematical models, a two-dimensional or three-dimensional figure, namely the knowledge graph, is then drawn in visual form.
Here, a concept is the conceptualized representation of objective things that people form in the process of understanding the world, such as person, animal, or organization. An entity is a specific thing in the objective world; an event is an activity that occurs in the objective world; a relation describes an objectively existing association among concepts, entities, and events. The knowledge graph describes the complex relations between concepts and entities in the objective world in structured form, expresses Internet information in a form closer to human cognition, and provides a better way to organize, manage, and understand massive Internet information. Knowledge graph technology, the technology for building knowledge graphs, is an interdisciplinary field integrating cognitive computing, knowledge representation and reasoning, information retrieval and extraction, natural language processing and the Semantic Web, data mining, machine learning, and more. Through semantic links, a knowledge graph helps in understanding big data, gaining insight from it, and supporting decisions. Knowledge graphs can be used to discover, describe, explain, predict, and evaluate scientific knowledge. They are especially significant for library and information science and also benefit information retrieval, information classification, information services, and the like. Knowledge graphs are an effective way to handle relational analysis problems. An event knowledge graph covers the whole process of occurrence and development of historical events and also describes relations among different events, such as succession, causality, and temporal order. If the event knowledge graph is carefully analyzed to extract the valuable event rules and patterns it contains, future events can be predicted from that information.
The knowledge graph as a new concept was proposed by Google in 2012. It is a large-scale semantic network comprising entities, concepts, and the various semantic relationships among them. A semantic network represents knowledge in graphical form; its basic constituent elements are points and edges. Points of the semantic network may be entities, concepts, or values. Edges in a knowledge graph fall into two categories: properties and relationships. A relationship can be regarded as a special kind of property: when the property value of an entity is itself an entity, the property is in essence a relationship. According to the types of the nodes at both ends, edges in a semantic network can be classified into subclass (subclassOf) relationships between concepts, instance (instanceOf) relationships between entities and concepts, and various property relationships between entities. Constructing a knowledge graph is an iterative updating process, and following the logic of knowledge acquisition each iteration comprises three stages. (1) Information extraction: entities, attributes, and the relations among entities are extracted from various types of data sources, and an ontological knowledge representation is formed on that basis. (2) Knowledge fusion: newly obtained knowledge must be integrated to eliminate contradictions and ambiguities; for example, an entity may have multiple expressions, and one name may correspond to several different entities. (3) Knowledge processing: the fused new knowledge undergoes quality evaluation (part of which requires manual screening), and only the qualified part is added to the knowledge base, to ensure the quality of the knowledge base.
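As an illustration of the edge categories just described, the following minimal sketch stores a tiny semantic network as (head, relation, tail) triples and classifies each edge by the types of its end nodes. The `Triple` type, the `edge_kind` helper, and the sample concepts are hypothetical and chosen only for the example.

```python
from collections import namedtuple

Triple = namedtuple("Triple", ["head", "relation", "tail"])

# Hypothetical node sets: concepts are categories, entities are specific things.
concepts = {"Organization", "Company"}
entities = {"Google", "Knowledge Graph"}

triples = [
    Triple("Company", "subclassOf", "Organization"),   # concept to concept
    Triple("Google", "instanceOf", "Company"),         # entity to concept
    Triple("Knowledge Graph", "proposedBy", "Google"), # entity to entity
    Triple("Knowledge Graph", "proposedIn", 2012),     # entity to plain value
]


def edge_kind(triple):
    """Classify an edge by the types of the nodes at its two ends."""
    if triple.head in concepts and triple.tail in concepts:
        return "subclass"
    if triple.head in entities and triple.tail in concepts:
        return "instance"
    if triple.head in entities and triple.tail in entities:
        # A property whose value is itself an entity is a relationship.
        return "relationship"
    return "property"


kinds = [edge_kind(t) for t in triples]
```

The last two triples show the distinction drawn in the text: the same kind of edge counts as a relationship when its value is an entity and as a plain property when its value is a literal.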

Knowledge graphs are widely applied in data analysis, intelligent recommendation, intelligent search, human-computer interaction, decision support, and other areas. How to express the graph-structured data of a knowledge graph, such as nodes and edges, as a mathematical model, in particular as the vectors that neural network models require for knowledge mining, so that downstream tasks such as entity classification and event prediction can be completed, is key to the wide application of knowledge graphs and is also a focus and hotspot of knowledge graph research.

Graph-based learning is a technique that combines graph models with machine learning methods. Two main categories are currently popular: graph embedding and graph neural networks (GNN). Graph models are applied very widely, for example in social networks and communication networks. In the security field, graph models are also finding ever wider application, such as mining black and gray market groups, security knowledge graphs, and fraud detection. Real graphs or networks are often high-dimensional and hard to process, and graph embedding was developed to reduce the dimensionality of such data: in essence, it maps high-dimensional graph data into a low-dimensional vector space while preserving the structural characteristics of the graph model as far as possible. Graph embedding has since developed beyond a dimensionality reduction method and, combined with deep learning, offers more sophisticated graph computation and graph mining capabilities. Graph embedding is a graph representation learning method: put simply, it maps a graph model into a low-dimensional vector space, and the vector representation should retain as much of the graph's structural information and latent characteristics as possible. Since the appearance of word2vec, a wave of enthusiasm for embedding has followed: algorithms for sentence and document representation such as doc2vec, the item2vec algorithm based on item sequences, graph embedding techniques based on graph models, and so on. Whether in recommendation, advertising, or anti-fraud, Internet companies have successively published papers that combine their own business with embedding.

Current graph embedding techniques can be broadly divided into three categories: factorization-based methods, random-walk-based methods, and deep-learning-based methods. Representative factorization-based methods include Graph Factorization and Structure Preserving Embedding. Representative random-walk-based methods include DeepWalk and node2vec. Deep-learning-based graph embedding originated with DeepWalk, which is itself a random-walk-based method: it generates node sequences by random walks, treats the nodes as words and the sequences as sentences, and then learns the network embedding with a language model such as SkipGram. node2vec shares the basic idea of DeepWalk but differs in how it walks: its random walk is biased and combines depth-first and breadth-first search. Subsequently, SDNE proposed using a deep autoencoder to preserve first-order and second-order network proximity by jointly optimizing the two, using highly nonlinear functions to obtain the embedding. DNGR combines random walks with a deep autoencoder. Its stacked denoising autoencoder makes the model robust to noise in the graph and helps capture the underlying structure needed for tasks such as link prediction and node classification.
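The first step of the DeepWalk idea, generating node sequences that are later treated as sentences, can be sketched as follows. This is a simplified illustration with uniform (unbiased) walks on a made-up toy graph; the function name and parameters are assumptions, and the subsequent SkipGram training and the biased walks of node2vec are not shown.

```python
import random


def random_walks(adjacency, walk_length, walks_per_node, seed=0):
    """Generate uniform random walks; each walk is a 'sentence' of node 'words'."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for start in adjacency:
            walk = [start]
            while len(walk) < walk_length:
                neighbors = adjacency[walk[-1]]
                if not neighbors:
                    break  # dead end: stop this walk early
                walk.append(rng.choice(neighbors))
            walks.append(walk)
    return walks


# Hypothetical undirected toy graph given as adjacency lists.
adjacency = {
    "a": ["b", "c"],
    "b": ["a", "c"],
    "c": ["a", "b", "d"],
    "d": ["c"],
}
walks = random_walks(adjacency, walk_length=5, walks_per_node=2)
```

The resulting list of walks would then be passed to a word-embedding model in place of a text corpus.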

The two embedding methods SDNE and DNGR take the global neighborhood of each node as input. For large sparse graphs, this can be computationally expensive and impractical. The graph convolutional network (GCN) addresses this problem by defining a convolution operator on the graph. The model iteratively aggregates the embeddings of a node's neighbors and obtains the node's new embedding as a function of its embedding from the previous iteration and the aggregated neighbor embeddings. Aggregating only local neighborhoods makes the method scalable, and multiple iterations allow the learned embedding of a node to describe its global neighborhood.
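The iterative neighborhood aggregation described above can be sketched in a few lines. This is a deliberately simplified assumption: each new embedding is the plain mean of the node's own previous embedding and those of its neighbors, whereas an actual GCN layer also applies learned weight matrices, a normalization, and a nonlinearity.

```python
def aggregate(adjacency, embeddings):
    """One iteration: average a node's previous embedding with its neighbors'."""
    updated = {}
    for node, neighbors in adjacency.items():
        group = [embeddings[node]] + [embeddings[n] for n in neighbors]
        updated[node] = [sum(column) / len(group) for column in zip(*group)]
    return updated


# Hypothetical path graph a - b - c; only node "a" starts with a nonzero signal.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
embeddings = {"a": [1.0, 0.0], "b": [0.0, 0.0], "c": [0.0, 0.0]}

one_hop = aggregate(adjacency, embeddings)
two_hop = aggregate(adjacency, one_hop)
```

After one iteration node "c" still carries no information from "a", because they are two hops apart; after the second iteration it does, which is how repeated local aggregation comes to describe a wider neighborhood.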

With the arrival of the artificial intelligence era, the actors behind events usually rely on high-tech means, their actions are more concealed and complex, precursor events are difficult to capture, and event information is fragmented. Existing prediction and early-warning systems cannot effectively perform correlation analysis and mining, which seriously hampers the timely and effective prevention of group events.

Disclosure of Invention

The invention aims to address the problems of current group event prediction methods, such as reliance on unstructured event data with limited information, weak event correlation analysis, insufficient mining of event features, long prediction response time, and low early-warning accuracy, by providing a group event prediction method based on an event graph that can effectively perform event correlation analysis and mine important information, and can predict and give early warning in real time.

The above object of the invention can be achieved by the following technical scheme: a method for predicting group events by an event knowledge graph, characterized in that the method comprises two stages, prediction model training and real-time prediction. In the prediction model training stage, a historical event graph is constructed from historical structured event data and used as input to the event prediction network model; the prediction network model is trained by having a graph embedding network learn a vectorized representation of the event graph, and whether an event occurs is predicted by a classification network model built on a deep neural network. In the real-time prediction stage, a real-time event graph is constructed from real-time structured event data and input into the trained event prediction model. The event prediction model consists of a graph embedding network and a deep classification network: the graph embedding network uses a graph convolutional neural network model to perform representation learning on the event graph, obtaining a vectorized representation of the event graph and mining hidden feature information of events; the deep classification network uses a convolutional neural network and a bidirectional long short-term memory (Bi-LSTM) network to mine deep semantic information from the event data. Finally, the task is converted into a binary classification problem of event occurrence versus non-occurrence, and the outcome with the highest probability is taken as the prediction of whether the event occurs.

The invention has the beneficial effects that:

Rich information. The method predicts group events by mining an event graph and works in two stages, prediction model training and real-time prediction. Unlike existing event prediction methods, which use unstructured event data, both stages use structured event data containing information such as the event subject and object, time, location, event type, and number of event mentions, associated in the form of an event graph. The data are clearly structured, explicit information is expressed accurately, and the organization is good, which facilitates deep mining of the implicit features of event data and better supports event prediction.

Short prediction response time. The method uses a group event prediction system consisting of an offline stage and an online stage, namely the model training stage and the real-time prediction stage. In the model training stage, historical event data are used to train the prediction model, providing model support for the real-time prediction stage; in the real-time prediction stage, the event graph constructed from real-time event data is input into the prediction model to obtain the prediction. The system's prediction response time is spent mainly in the real-time prediction stage, where the relationship discovery and prediction models are loaded onto a GPU, so a prediction can be produced within seconds.

Strong event feature mining capability. The method uses a graph convolutional neural network to mine the deep association information in the event knowledge graph. Compared with conventional prediction methods based on mining and matching frequent event patterns, it can mine the implicit features among event data in depth, requires no manual intervention, and offers a high degree of intelligence.

High prediction accuracy. The group event prediction system is built from a graph embedding network and a deep classification network. The graph embedding network uses a graph convolutional neural network model to perform representation learning on the event graph, obtaining a vectorized representation of the event graph and fully mining hidden feature information of events; the deep classification network uses a convolutional neural network and a bidirectional long short-term memory (Bi-LSTM) network to mine deep semantic information from the event data. Because the graph convolutional neural network can accurately represent the valuable information in the event graph, it provides a sound basis for judging whether an event will occur and effectively improves the accuracy of event occurrence prediction.

Drawings

FIG. 1 is a flow block diagram of the system for predicting group events by an event knowledge graph according to the present invention;

FIG. 2 is a schematic view of an event map visualization;

FIG. 3 is a schematic diagram of the structure of an event prediction model;

FIG. 4 is a flow chart of a real-time prediction phase;

In order to make the objects, technical solutions, and advantages of the present invention clearer, the invention is described in further detail below with reference to the embodiments and the accompanying drawings.

Detailed Description

See FIG. 1. The method for predicting group events by an event knowledge graph comprises two stages, prediction model training and real-time prediction. In the prediction model training stage, a historical event graph is constructed from historical structured event data and used as input to the event prediction network model; the prediction network model is trained by having a graph embedding network learn a vectorized representation of the event graph, and whether an event occurs is predicted by a classification network model built on a deep neural network. In the real-time prediction stage, a real-time event graph is constructed from real-time structured event data and input into the trained event prediction model. The event prediction model consists of a graph embedding network and a deep classification network: the graph embedding network uses a graph convolutional neural network model to perform representation learning on the event graph, obtaining a vectorized representation of the event graph and mining hidden feature information of events; the deep classification network uses a convolutional neural network and a bidirectional long short-term memory (Bi-LSTM) network to mine deep semantic information from the event data. Finally, the task is converted into a binary classification problem of event occurrence versus non-occurrence, and the outcome with the highest probability is taken as the prediction of whether the event occurs.

See FIG. 2. To better mine event graph information and complete event occurrence prediction, the event graph of this embodiment uses the GDELT event database of the Global Knowledge Graph (GKG) as its data source and is constructed with date nodes as root nodes and event-related nodes as child nodes. During construction, node information such as the event type, the number of event mentions, participant name 1, participant name 2, and the event location is added to each event-related node, and each participant name carries the participant's country, the participant's type, and so on. The participant type nodes reflect, to a certain extent, the importance of the event and the identity types of its two participants; the number of event mentions is the number of times the event is mentioned by other media.

The graph has 7 types of edges: event type-event, participant name-event, event mention count-event, location-event, participant type-participant, participant country-participant, and event-date. Each type of edge represents one mode of message passing.
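The graph construction described above can be sketched as a list of typed edges built from structured event records. The record layout and field names below (`event_id`, `actors`, `mentions`, and so on) are hypothetical and are not the actual GDELT column names; the sample record is invented for illustration.

```python
def build_event_graph(records):
    """Turn structured event records into (source, edge_type, target) edges."""
    edges = []
    for record in records:
        event = ("event", record["event_id"])
        edges.append((("event_type", record["event_type"]), "event_type-event", event))
        edges.append((("mention_count", record["mentions"]), "mention_count-event", event))
        edges.append((("location", record["location"]), "location-event", event))
        for actor in record["actors"]:
            participant = ("participant", actor["name"])
            edges.append((participant, "participant_name-event", event))
            edges.append((("participant_type", actor["type"]),
                          "participant_type-participant", participant))
            edges.append((("participant_country", actor["country"]),
                          "participant_country-participant", participant))
        # The date node is the root that the event finally reports to.
        edges.append((event, "event-date", ("date", record["date"])))
    return edges


# One made-up record with two participants.
records = [{
    "event_id": "e1",
    "date": "2020-09-28",
    "event_type": "protest",
    "mentions": 17,
    "location": "city_x",
    "actors": [
        {"name": "group_a", "type": "civilian", "country": "country_1"},
        {"name": "agency_b", "type": "government", "country": "country_1"},
    ],
}]
graph_edges = build_event_graph(records)
edge_types = {edge_type for _, edge_type, _ in graph_edges}
```

Keeping the edge type on every edge lets a later message-passing step apply a different rule to each of the seven types.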

See FIG. 3. The event prediction model comprises a graph embedding network and a classification network. The graph embedding network is a graph convolutional neural network that takes the event graph as input. It mines the hidden information of the event graph through convolution operations at different scales in its convolutional layers; the output of each convolutional layer is fed to a rectified linear unit (ReLU) layer, which applies a nonlinear transformation and outputs strongly discriminative nonlinear features; and the output of the last ReLU layer is the final output of the graph convolutional neural network, namely the required graph embedding vector. In this embodiment the graph embedding network is implemented as follows. The participant type node and the participant country node both correspond to one participant name node, and the two participant name nodes correspond to one event node; in each case the messages from the two nodes are passed through a message function and then averaged. The event node receives information from four types of nodes: participant name, event type, number of event mentions, and location; when the event node is updated, the update vectors from the different node types are accumulated. By the same rule, the event node passes information to the date node, and the representation of the date node is updated. The representation of each date node is an N-dimensional vector. The date nodes are divided into two classes according to whether a group event occurred on that day, and the date embeddings are obtained by training on this date node classification task.
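The update rules in this embodiment can be sketched as follows, under stated assumptions: messages from two nodes of the same type are averaged, update vectors from different node types are summed, and the message function is a fixed placeholder scaling rather than a learned transform. How several events on the same day are combined at the date node is not spelled out in the text; the sketch assumes they are summed. All function names and input vectors are hypothetical.

```python
def message(vector, weight=1.0):
    """Placeholder message function standing in for a learned transform."""
    return [weight * x for x in vector]


def mean(vectors):
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def accumulate(vectors):
    return [sum(column) for column in zip(*vectors)]


def participant_vector(type_vec, country_vec):
    # Type and country messages are averaged into the participant name node.
    return mean([message(type_vec), message(country_vec)])


def event_vector(participants, event_type, mentions, location):
    # The two participant messages are averaged, then the four node types are summed.
    participant_msg = mean([message(p) for p in participants])
    return accumulate([participant_msg, message(event_type),
                       message(mentions), message(location)])


def date_vector(events):
    # Assumption: messages from all events of the day are summed at the date node.
    return accumulate([message(e) for e in events])


# Made-up 2-dimensional inputs (N = 2).
p1 = participant_vector([1.0, 0.0], [0.0, 1.0])
p2 = participant_vector([0.0, 2.0], [2.0, 0.0])
event = event_vector([p1, p2], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0])
date = date_vector([event])
```

In a trained model the date vectors would then be fed to the date node classification task that separates days with and without a group event.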
When the graph convolutional neural network learns the representation of the knowledge graph, each entity draws on the information in other related entities, which breaks the isolation among entities and makes the representations of entities and relations more complete and richer. In this embodiment, the classification network is a combined model of a convolutional neural network and a bidirectional long short-term memory (Bi-LSTM) network. It takes the embedding vectors produced by the graph convolutional neural network as input, mines the local and global deep semantic information of the graph embedding representation through the convolutional neural network and the Bi-LSTM network, and outputs the probabilities that the event occurs and does not occur. The invention trains the event prediction network model on historical event data, and the trained prediction model supports occurrence prediction for real-time group events.

See FIG. 4. In the real-time prediction stage, the event graph constructed from real-time event data is input into the event prediction model obtained in the training stage. The model obtains vector representations of the network nodes through the graph convolutional neural network, then mines the deep semantic information of the event graph through the CNN + Bi-LSTM network, and finally produces the prediction of whether an event will occur given the current event data.
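The final decision step, choosing the outcome with the highest probability, can be sketched as follows. The sketch assumes the classification network ends in two raw scores that are normalized with a softmax; the patent does not name the normalization, and the scores below are invented rather than produced by a trained CNN + Bi-LSTM model.

```python
import math

LABELS = ("does not occur", "occurs")


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores):
    """Return the label with the highest probability, plus both probabilities."""
    probabilities = softmax(scores)
    best = max(range(len(probabilities)), key=probabilities.__getitem__)
    return LABELS[best], probabilities


# Hypothetical scores for (non-occurrence, occurrence).
label, probabilities = predict([0.3, 1.7])
```

With two classes, picking the highest probability is equivalent to thresholding the occurrence probability at 0.5.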

Claims

1. A method for predicting group events by an event knowledge graph, characterized in that: the method comprises two stages, prediction model training and real-time prediction; in the prediction model training stage, a historical event graph is constructed from historical structured event data and used as input to the event prediction network model, the prediction network model is trained by having a graph embedding network learn a vectorized representation of the event graph, and whether an event occurs is predicted by a classification network model built on a deep neural network; in the real-time prediction stage, a real-time event graph is constructed from real-time structured event data and input into the trained event prediction model, the event prediction model consisting of a graph embedding network and a deep classification network, wherein the graph embedding network uses a graph convolutional neural network model to perform representation learning on the event graph, obtaining a vectorized representation of the event graph and mining hidden feature information of events; the deep classification network uses a convolutional neural network and a bidirectional long short-term memory (Bi-LSTM) network to mine deep semantic information from the event data; finally, the task is converted into a binary classification problem of event occurrence versus non-occurrence, and the outcome with the highest probability is taken as the prediction of whether the event occurs.

2. The method for predicting group events by an event knowledge graph according to claim 1, wherein: the event graph uses the GDELT event database of the Global Knowledge Graph (GKG) as its data source and is constructed with date nodes as root nodes and event-related nodes as child nodes.

3. The method for predicting group events by an event knowledge graph according to claim 2, wherein: during construction of the graph, node information comprising the event type, the number of event mentions, participant name 1, participant name 2, and the event location is added to each event-related node, each participant name carrying the participant's country, the participant's type, and so on, wherein the participant type nodes reflect, to a certain extent, the importance of the event and the identity types of its two participants; the number of event mentions is the number of times the event is mentioned by other media.

4. The method for predicting group events by an event knowledge graph according to claim 3, wherein: the graph has at least 7 types of edges, namely event type-event, participant name-event, event mention count-event, location-event, participant type-participant, participant country-participant, and event-date, each type of edge representing one mode of message passing.

5. The method for predicting group events by an event knowledge graph according to claim 1, wherein: the event prediction model comprises a graph embedding network and a classification network, wherein the graph embedding network is a graph convolutional neural network that takes the event graph as input and mines the hidden information of the event graph through convolution operations at different scales in its convolutional layers; the output of each convolutional layer is fed to a rectified linear unit (ReLU) layer, which applies a nonlinear transformation and outputs strongly discriminative nonlinear features; and the output of the last ReLU layer is the final output of the graph convolutional neural network, namely the required graph embedding vector.

6. The method for predicting group events by an event knowledge graph according to claim 1, wherein the graph embedding network is implemented as follows: the participant type node and the participant country node both correspond to one participant name node, and the two participant name nodes correspond to one event node, the messages from the two nodes in each case being passed through a message function and then averaged; the event node receives information from four types of nodes, namely participant name, event type, number of event mentions, and location, and when the event node is updated the update vectors from the different node types are accumulated; by the same rule, the event node passes information to the date node and the representation of the date node is updated, the representation of each date node being an N-dimensional vector; the date nodes are divided into two classes according to whether a group event occurred on that day, and the date embeddings are obtained by training on this date node classification task.

7. The method for predicting group events by an event knowledge graph according to claim 1, wherein: when the graph convolutional neural network learns the representation of the knowledge graph, each entity draws on the information in other related entities, which breaks the isolation among entities and yields complete and rich representations of entities and relations.

8. The method for predicting group events by an event knowledge graph according to claim 1, wherein: the classification network is a combined model of a convolutional neural network and a bidirectional long short-term memory (Bi-LSTM) network, which takes the embedding vectors produced by the graph convolutional neural network as input, mines the local and global deep semantic information of the graph embedding representation through the convolutional neural network and the Bi-LSTM network, and outputs the probabilities that the event occurs and does not occur.

9. The method for predicting group events by an event knowledge graph according to claim 1, wherein: the event prediction network model is trained on historical event data, and the trained prediction model supports occurrence prediction for real-time group events.

10. The method for predicting group events by an event knowledge graph according to claim 1, wherein: in the real-time prediction stage, the event graph constructed from real-time event data is input into the event prediction model obtained in the training stage; the model obtains vector representations of the network nodes through the graph convolutional neural network, then mines the deep semantic information of the event graph through the CNN + Bi-LSTM network, and finally produces the prediction of whether an event will occur given the current event data.