CN111143576A - Event-oriented dynamic knowledge graph construction method and device - Google Patents

Event-oriented dynamic knowledge graph construction method and device

Info

Publication number
CN111143576A
CN111143576A (application number CN201911313473.6A)
Authority
CN
China
Prior art keywords
event
data
elements
knowledge graph
acquisition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911313473.6A
Other languages
Chinese (zh)
Inventor
吴琼
刘武雷
王元卓
周楠
常诚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Big Data Research Institute, Institute of Computing Technology, Chinese Academy of Sciences
Original Assignee
Big Data Research Institute, Institute of Computing Technology, Chinese Academy of Sciences
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Big Data Research Institute, Institute of Computing Technology, Chinese Academy of Sciences
Priority to CN201911313473.6A
Publication of CN111143576A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367: Ontology
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification

Abstract

The invention provides an event-oriented dynamic knowledge graph construction method, which comprises the following steps: data acquisition; event trigger word extraction, in which the trigger words in the first data to be processed are acquired through a trained first extraction model; event element identification, in which the event elements in event sentences containing trigger words are acquired through a second extraction model; event relation extraction, in which the event elements are processed through a third extraction model to obtain the event relations among them; and event ontology construction, in which the coreference relations among the event relations are identified and coreferent events are merged. The event-oriented dynamic knowledge graph construction method and device provided by the invention represent and process events; they not only take the entity knowledge base as a foundation and template, but also have their own unique constituent elements and architecture and can be associated with the entity knowledge base, thereby realizing more accurate construction of a dynamic knowledge graph.

Description

Event-oriented dynamic knowledge graph construction method and device
Technical Field:
The invention relates to the field of semantic analysis in natural language processing, and in particular to an event-oriented dynamic knowledge graph construction method and device.
Background Art:
Most current knowledge graphs focus on static, entity-centered knowledge and lack the description and construction of dynamic, event-centered knowledge. From the application perspective, simple entity knowledge bases and simple entity-relation or entity-attribute-value knowledge can no longer meet the increasingly complex requirements and rising expectations that application fields place on knowledge graphs. For example, in the public opinion field, an entity knowledge base built on knowledge graph technology will see many entity relations or entity attributes become distorted as events occur and evolve, which indirectly affects the accuracy of the established entity knowledge base; correctly capturing the influence of events and correctly analyzing the changes they cause in the entity knowledge base helps to calibrate the entity knowledge base in a timely manner. In addition, once an event graph has been constructed, analyzing the influence of multidimensional factors such as time, region and users on the public opinion tendency of an event by comparing it with similar historical events, and studying emergency models, helps to grasp the evolution trend of the event and issue accurate early warnings.
An event-oriented knowledge graph differs from an ordinary knowledge graph in that the object it describes is the event. In describing events it inevitably interacts with an entity knowledge base, forming a brand-new data structure and knowledge representation framework that includes entities, relations, attributes, events, event attributes, event participation roles (arguments), and special association relations between events. In addition, event relation extraction differs from entity relation extraction in the construction of a general knowledge graph: entity relation extraction only needs to consider entity-entity pairs, whereas event relation extraction must consider several more complex cases such as event-entity, event-spatiotemporal attribute and event-event. Consequently, complex networks that establish causal, sequential, subdivision, generalization and other association relations between events have attracted great attention from knowledge graph research institutions and artificial intelligence technology companies.
In summary, current research on knowledge graph construction focuses mainly on conventional knowledge graphs, and migrating conventional graph construction techniques to event graph construction yields relatively poor results. Therefore, analyzing the constituent elements and characteristics of events and designing a construction method specifically for event graphs is an urgent problem to be solved.
Therefore, there is a need in the art for an event-oriented dynamic knowledge graph construction method and device that solve at least one technical problem in the prior art.
Summary of the Invention:
the present invention has been made to solve at least one of the problems occurring in the prior art.
Specifically, in a first aspect of the present invention, an event-oriented dynamic knowledge graph construction method is provided, including the steps of:
the method comprises the steps of data acquisition, wherein first data from an internet data source are acquired, the first data comprise natural language, and the first data are preprocessed to generate first data to be processed;
extracting event trigger words, and training and acquiring the trigger words in the first data to be processed through a first extraction model;
identifying event elements, namely acquiring the event elements in an event sentence through a second extraction model, wherein the event sentence comprises trigger words;
extracting event relations, namely processing the event elements through a third extraction model to obtain the event relations among the event elements;
and constructing an event ontology, identifying the coreference relationship in the event relationship, and merging the coreference events.
By adopting this technical scheme, event relations are extracted by analyzing the constituent elements and characteristics of events, and the aggregation of coreferent events is completed, thereby realizing the construction of an event-oriented graph.
Preferably, in the data acquisition step, the internet data source is a multi-source heterogeneous data source.
Preferably, in the data acquisition step, the preprocessing includes at least one of noise removal, sentence segmentation and word segmentation.
Preferably, the data acquisition step comprises the steps of:
generating an acquisition task, and generating the acquisition task according to an acquisition data source and an acquisition rule;
and executing the acquisition task, wherein the scheduling program dynamically allocates acquisition resources according to the acquisition task amount, executes the acquisition task to acquire acquisition data and acquires first data.
By adopting the scheme, the resources can be expanded or reduced dynamically through the scheduling program according to the size of the task quantity without influencing the normal operation of the system, and the acquisition efficiency is ensured.
Preferably, the generating the collection task further comprises transmitting the collection task to a message middleware; the executing the collection task further comprises receiving a collection task of the message middleware.
Preferably, in the event trigger word extraction step, the method for establishing the first extraction model includes:
using the first corpus as the model training corpus and test corpus for event trigger word extraction;
selecting the language features of the trigger words to establish space feature vectors according to the co-occurrence features of the trigger words;
the first extraction model is obtained through a Support Vector Machine (SVM) algorithm.
Preferably, the first corpus is the Chinese Emergency Corpus (CEC).
Preferably, the linguistic features include at least one of word features, lexical features, syntactic features, semantic features, and associated text features.
Preferably, the spatial feature vector is:
V = {(w_{i-3}, f_1(w_{i-3}), ..., f_m(w_{i-3})), ..., (w_i, f_1(w_i), ..., f_m(w_i)), ..., (w_{i+2}, f_1(w_{i+2}), ..., f_m(w_{i+2}))}
wherein w represents the feature vector of the event trigger word, and f represents the linguistic feature of the trigger word.
Preferably, the event trigger word extraction step further includes the step of:
and screening the first data to be processed by using the first screening condition.
Preferably, the first screening condition is a word segmentation tool, and the word segmentation tool is used for labeling the first to-be-processed data.
Preferably, the word segmentation tool labels at least one of clauses, word segments and parts of speech of the text in the first data to be processed.
Preferably, the obtaining of the event element in the event sentence through the second extraction model comprises the steps of:
the event characteristics are selected according to the event characteristics,
a second extraction model is established, and the second extraction model,
and extracting event elements.
Preferably, the event features include a word vector, a location feature and a trigger part-of-speech type feature.
Preferably, the position feature is a relative position of the event trigger word in the event sentence.
Preferably, the feature vector of the event sentence is X = {x_1, x_2, ..., x_{n-1}, x_n}.
Preferably, the second extraction model is a Long Short-Term Memory (LSTM) network model based on an attention mechanism.
By adopting the attention-based long short-term memory network model, the advantages of the attention mechanism on sequence tasks can be utilized: during event element extraction, the influence of event trigger words and other event elements on the current candidate event element is amplified and high-value information in the event sentence is retained, thereby improving the efficiency and effect of event element extraction.
Preferably, in the event element identification step, the extracted content includes at least one of time, place, and participant.
Preferably, in the event element identification step, the classification function is a softmax function.
Preferably, in the event relation extracting step, the event relation includes at least one of a hierarchical relation, a composition relation, a causal relation, a following relation and a concurrent relation.
Preferably, the third extraction model includes a long-short term memory neural network and a convolutional neural network.
Preferably, the third extraction model processes the event elements, and the step of obtaining the event relationship between the event elements includes:
obtaining a corpus sequence and corpus representation characteristics;
adopting a first vector model to represent the linguistic data;
inputting the processed corpus into a long-short term memory network model for training;
passing the memory output of the long short-term memory network model through a convolutional neural network to obtain representations for at least two different input directions according to the forward and backward output results, and concatenating the obtained representations as vectors;
obtaining the final vector representation of the corpus using a max pooling operation;
and calculating the prediction category of the corpus.
Preferably, the step of obtaining the corpus sequence and corpus representation features includes performing dependency analysis and word segmentation on the text.
Preferably, the first vector model is a vector model trained with BERT or word2vec.
Preferably, the prediction category of the corpus is calculated by adopting an integrated softmax function.
Preferably, in the step of constructing an event ontology, the method further comprises the steps of: and merging the similar elements in the event relation.
Preferably, similar elements in coreferent events are obtained through a synonym discrimination algorithm and merged.
Preferably, the similarity judgment of at least one parameter of the time, the place and the participants of the event is performed through a similarity judgment algorithm, so as to obtain the co-reference event for event merging.
In a second aspect of the present invention, an electronic device is provided, where the electronic device includes a memory and a processor, and the memory has at least one instruction, and the at least one instruction is loaded and executed by the processor to implement the method.
In a third aspect of the present invention, a computer-readable storage medium is provided, on which at least one instruction is stored, the at least one instruction being loaded and executed by a processor to implement the above method.
In conclusion, the invention has the following beneficial effects:
1. The event-oriented dynamic knowledge graph construction method provided by the invention represents and processes events; it takes the entity knowledge base as a foundation and template, has unique constituent elements and architecture, and can be associated with the entity knowledge base, thereby realizing more accurate construction of a dynamic knowledge graph.
2. The event-oriented dynamic knowledge graph construction method provided by the invention accurately describes the various elements specific to an event, such as its spatiotemporal attributes, the associations among events, and the decomposition of an event into sub-events; it then statically associates the event with the corresponding entity knowledge base, for example through the event's role structure; finally, it represents and links the dynamic execution preconditions and results of the event, and jointly infers the event's own state, its subsequent states and the real change trajectory of the participating entities, using the event's execution script and its interpretation, thereby describing the event more accurately.
3. The event-oriented dynamic knowledge graph construction method provided by the invention adopts an attention-based long short-term memory network model, which exploits the advantages of the attention mechanism on sequence tasks: during event element extraction, the influence of event trigger words and other event elements on the current candidate event element is amplified and high-value information in the event sentence is retained, thereby improving the efficiency and effect of event element extraction.
Drawings
FIG. 1 is a flow chart of one embodiment of the present invention;
FIG. 2 is a schematic diagram of a data acquisition process according to an embodiment of the present invention;
FIG. 3 is a flowchart illustrating event trigger extraction according to an embodiment of the present invention;
FIG. 4 is a flow diagram of event element extraction according to an embodiment of the present invention;
FIG. 5 is a diagram of a bidirectional LSTM neural network based on an attention mechanism in accordance with an embodiment of the present invention;
FIG. 6 is a diagram of a long short term memory neural network and a convolutional neural network according to an embodiment of the present invention.
Detailed Description:
the exemplary embodiments will be described herein in detail, and the embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present invention. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
In order to solve at least one technical problem in the background art, the invention provides an event-oriented dynamic knowledge graph construction method, which comprises the following steps: the method comprises the steps of data acquisition, wherein first data from an internet data source are acquired, the first data comprise natural language, and the first data are preprocessed to generate first data to be processed; extracting event trigger words, and training and acquiring the trigger words in the first data to be processed through a first extraction model; identifying event elements, namely acquiring the event elements in an event sentence through a second extraction model, wherein the event sentence comprises trigger words; extracting event relations, namely processing the event elements through a third extraction model to obtain the event relations among the event elements; and constructing an event ontology, identifying the coreference relationship in the event relationship, and merging the coreference events.
Based on the above inventive concept, the present invention will be described in detail below by way of examples.
In some embodiments of the present invention, as shown in fig. 1, there is provided an event-oriented dynamic knowledge graph construction method, including the steps of:
S101, data acquisition, namely acquiring first data from an internet data source, wherein the first data comprises natural language, and the first data is preprocessed to generate the first data to be processed;
in a specific implementation process, data from multi-source heterogeneous data sources on the internet, such as specific data source websites and specific data source software, can be collected for processing; the collected data is passed through a preprocessing program for noise removal, sentence segmentation, word segmentation and the like, and then awaits further processing. The specific data source websites include but are not limited to news websites, the emergency columns of emergency websites, event encyclopedias and the like, and the specific data source software includes but is not limited to Weibo, WeChat and the like. Further, data acquisition may be performed by a distributed acquisition program, as shown in FIG. 2. Furthermore, Scrapy can be used as the framework of the acquisition program: the task acquisition module extracts tasks according to the initialized data sources and task extraction rules and writes the parsed tasks into an acquisition task queue in the message middleware, and the acquisition module reads tasks from the message middleware, performs data acquisition, and completes the preprocessing and storage work. The scheduling program can dynamically start and pause some of the task acquisition or execution nodes according to the task volume in the message middleware; in particular, the message middleware can be an open-source technology such as Kafka.
In an optional embodiment, the data acquiring step may further include the steps of:
s201, generating an acquisition task, and generating the acquisition task according to an acquisition data source and an acquisition rule;
and S204, executing the acquisition task, dynamically distributing acquisition resources by the scheduling program according to the acquisition task amount, executing the acquisition task to acquire acquisition data, and acquiring first data.
In an optional embodiment, the generating the collection task further includes step S202, transmitting the collection task to a message middleware; s203, the executing of the collection task further comprises receiving a collection task of the message middleware.
In a specific implementation process, the data acquisition can adopt a distributed architecture: acquisition tasks are generated by an acquisition task generation module and executed by an acquisition task execution module, and each module can dynamically expand or shrink its resources through the scheduling program according to the task volume without affecting normal system operation. Further, a message middleware can be arranged between the two modules, each of which communicates with the middleware to complete data transmission. With this scheme, resources can be dynamically expanded or reduced by the scheduling program according to the task volume without affecting normal system operation, ensuring acquisition efficiency.
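For illustration, the following is a minimal sketch of the producer/consumer split described above, using the kafka-python client; the broker address, topic name and task schema are assumptions, and the fetch/preprocess/store callables stand in for the crawler and the preprocessing program.

```python
# Hypothetical sketch only: broker address, topic name and task schema are
# assumptions; fetch/preprocess/store stand in for the crawler and preprocessor.
import json
from kafka import KafkaProducer, KafkaConsumer

BROKERS = ["localhost:9092"]     # assumed Kafka broker
TOPIC = "acquisition_tasks"      # assumed name of the acquisition task queue

def generate_tasks(data_sources, rules):
    """Task generation module: turn data sources and rules into queued tasks."""
    producer = KafkaProducer(
        bootstrap_servers=BROKERS,
        value_serializer=lambda v: json.dumps(v).encode("utf-8"),
    )
    for source in data_sources:
        task = {"url": source, "rule": rules.get(source, "default")}
        producer.send(TOPIC, task)          # write the task to the middleware
    producer.flush()

def execute_tasks(fetch, preprocess, store):
    """Task execution module: read tasks from the middleware and collect data."""
    consumer = KafkaConsumer(
        TOPIC,
        bootstrap_servers=BROKERS,
        value_deserializer=lambda v: json.loads(v.decode("utf-8")),
        group_id="collectors",              # nodes in one group share the tasks
    )
    for message in consumer:
        raw = fetch(message.value["url"])   # crawl the data source
        store(preprocess(raw))              # denoise / split sentences / segment, then store
```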
S103, extracting event trigger words, and acquiring the trigger words in the first data to be processed through training of a first extraction model;
in an optional implementation manner, the method for establishing the first extraction model may include:
S301, using the first corpus as the model training corpus and test corpus for event trigger word extraction;
S302, selecting the linguistic features of the trigger words to establish spatial feature vectors according to the co-occurrence features of the trigger words;
S303, obtaining the first extraction model through a Support Vector Machine (SVM) algorithm.
In a specific implementation, as shown in FIG. 3, the first corpus may be the Chinese Emergency Corpus (CEC), which was constructed and released by the Semantic Intelligence Laboratory of Shanghai University. Following the classification system of the National Overall Emergency Plan for Public Emergencies issued by the State Council, news reports on five types of emergencies (earthquakes, fires, traffic accidents, terrorist attacks and food poisoning) were collected from the internet as raw corpus; the raw corpus then underwent text preprocessing, text analysis, event annotation, consistency checking and other processing, and the annotation results were finally stored in the corpus. The corpus is downloaded and the texts are extracted by a program to form the training corpus and test corpus. Further, the linguistic features may include at least one of word features, lexical features, syntactic features, semantic features and associated text features, from which a training set is created, and a machine learning recognition model is obtained through an SVM (support vector machine) algorithm. Furthermore, the model can be tested with the test data, its various indicators evaluated, and the parameter optimization algorithm continuously adjusted; trigger words are then extracted from texts with the trained keyword classification recognition model and stored. With this technical scheme, the accuracy of trigger word acquisition can be ensured by the first extraction model established from the spatial feature vectors.
In an alternative embodiment, the spatial feature vector may be calculated by the following formula:
V = {(w_{i-3}, f_1(w_{i-3}), ..., f_m(w_{i-3})), ..., (w_i, f_1(w_i), ..., f_m(w_i)), ..., (w_{i+2}, f_1(w_{i+2}), ..., f_m(w_{i+2}))}
wherein w represents the feature vector of the event trigger word, and f represents the linguistic feature of the trigger word. By adopting the formula, the space characteristic vector of the trigger word can be more accurately obtained.
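As an illustration of the trigger-word classifier, the following sketch builds a window of contextual features around each candidate token and trains a linear SVM with scikit-learn; the window offsets (-3 to +2) mirror the spatial feature vector above, while the concrete feature functions and the preparation of (tokens, labels) pairs from the CEC corpus are assumptions.

```python
# Hypothetical sketch: the window offsets mirror the spatial feature vector
# (w_{i-3} ... w_{i+2}); the POS tags and the (tokens, labels) preparation from
# the CEC corpus are assumptions.
from sklearn.feature_extraction import DictVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def window_features(tokens, i, lo=-3, hi=2):
    """Contextual feature dict for candidate token i; tokens is a list of (word, pos)."""
    feats = {}
    for off in range(lo, hi + 1):
        j = i + off
        word, pos = tokens[j] if 0 <= j < len(tokens) else ("<PAD>", "<PAD>")
        feats[f"word[{off}]"] = word   # word feature
        feats[f"pos[{off}]"] = pos     # lexical (part-of-speech) feature
    return feats

def train_trigger_model(feature_dicts, labels):
    """Train the SVM trigger-word classifier on dict features (1 = trigger word)."""
    model = make_pipeline(DictVectorizer(sparse=True), SVC(kernel="linear"))
    model.fit(feature_dicts, labels)
    return model
```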
In an optional implementation manner, in the event triggering word extracting step, the method further includes the steps of:
and screening the first data to be processed by using the first screening condition.
In a specific implementation process, the first screening condition may use a word segmentation tool to label the first data to be processed; specifically, the word segmentation tool may be the jieba word segmentation tool. Further, the word segmentation tool labels at least one of the clauses, word segments and parts of speech of the text in the first data to be processed, and specifically, the corresponding nouns, verbs and verbal nouns are screened out through these labels. Screening the first data to be processed in this labeled way can improve the efficiency of trigger word acquisition.
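A short sketch of this screening step, assuming the jieba part-of-speech tagger and a simple rule that keeps nouns, verbs and verbal nouns as trigger-word candidates:

```python
# Hypothetical sketch assuming the jieba part-of-speech tagger; the rule of
# keeping tags starting with "n" or "v" (nouns, verbs, verbal nouns) is an
# illustrative simplification of the screening condition.
import jieba.posseg as pseg

def candidate_tokens(sentence):
    """Return (word, POS flag) pairs whose tag marks them as trigger candidates."""
    return [(pair.word, pair.flag)
            for pair in pseg.cut(sentence)
            if pair.flag.startswith(("n", "v"))]
```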
S105, identifying event elements, and acquiring the event elements in an event sentence through a second extraction model, wherein the event sentence comprises trigger words;
in an optional embodiment, the obtaining, by the second extraction model, an event element in an event sentence includes:
S401, selecting the event characteristics,
S402, establishing a second extraction model,
and S403, extracting event elements.
In a specific implementation process, as shown in FIG. 4, it is determined whether the text to be processed contains a trigger word. If it does not, other texts to be processed continue to be read; if it does, the text contains an event, and the following steps can be performed to extract the event elements. Specifically, the text to be processed can be read from the message middleware, and event element extraction is based on an attention-based long short-term memory neural network. The event features can include word vectors, position features and trigger-word part-of-speech type features; the position feature is the relative position of the event trigger word in the event sentence. In the event element identification step, the extracted content includes at least one of time, place and participants. The second extraction model may be a Long Short-Term Memory (LSTM) network model based on an attention mechanism. By adopting this model, the advantages of the attention mechanism on sequence tasks can be utilized: during event element extraction, the influence of event trigger words and other event elements on the current candidate event element is amplified and high-value information in the event sentence is retained, thereby improving the efficiency and effect of event element extraction.
In an optional embodiment, the establishing the second extraction model includes the steps of:
S501, text vectorization representation;
in a specific implementation process, the text of the event sentence is vectorized at the input layer. In event feature selection, the event features include word vectors, trigger word types and position features, where the relative position of the event trigger word in the event sentence is used as the position feature of the event elements. The feature vector of an event sentence can be expressed as X = {x_1, x_2, ..., x_{n-1}, x_n}.
S502, calculating the vector;
in a specific implementation, a bidirectional long short-term memory network (BiLSTM) may be used to compute the input-layer vectors, as shown in FIG. 5. At time t, a forward LSTM unit computes the state of the left context of the current word (the text before it) and a backward LSTM unit computes the state of its right context (the text after it); these two states are used to compute h_t, the output of the coding layer at time t. Further, a unidirectional long short-term memory network is adopted as the decoding layer, taking the output of the previous step as its input, with the calculation formula s_t = f(s_{t-1}, y_{t-1}, c_t), where s_t is the output state of the decoding layer at time t, f is a non-linear function, s_{t-1} is the output state of the decoding layer at time t-1, y_{t-1} is the result label at time t-1, and c_t is the result computed by the attention layer of the decoding layer at time t. Further, the attention layer is the core of event element extraction. Since the coding-layer information h_t is the core information for judging event elements, while the attention-layer context vector c_t is mainly used to capture the influence of the other parts of the event sentence on the candidate event element, directly using the coding layer's h_t as a prediction feature of the candidate event element can improve event extraction performance. The context vector c_t is therefore calculated as c_t = Σ_{i≠t} a_{t,i} h_i, where a_{t,i} is the assigned attention weight.
S503, outputting data;
in a specific implementation process, the features are processed by a softmax function at the output layer to obtain the classification prediction result of the event elements. The calculation formula is y_t = softmax(w_h h_t + w_s s_t + b), where y_t is the event element prediction for the current word, w_h and w_s are randomly initialized weight matrices, and b is a bias vector.
By adopting the technical scheme, the event elements can be extracted more efficiently and accurately through the steps of vectorization representation of texts, processing of long-term and short-term memory neural networks and the like, so that the construction of the knowledge graph can be more accurate and event-oriented.
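The following PyTorch sketch illustrates the general shape of such an attention-based BiLSTM element extractor: embeddings plus position/trigger features, a BiLSTM encoder, an attention context vector and a softmax output over element labels. It simplifies the decoding LSTM described above, and the layer sizes and label count are assumptions rather than values from the description.

```python
# Hypothetical sketch (PyTorch): embedding plus extra event features, a BiLSTM
# encoder, an attention context vector and a softmax output over element labels.
# The decoding LSTM of the description is omitted, and all sizes are assumptions.
import torch
import torch.nn as nn

class AttentionBiLSTMExtractor(nn.Module):
    def __init__(self, vocab_size, emb_dim=100, feat_dim=20, hidden=128, n_labels=5):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.encoder = nn.LSTM(emb_dim + feat_dim, hidden,
                               batch_first=True, bidirectional=True)
        self.attn = nn.Linear(2 * hidden, 1)        # produces attention scores a_{t,i}
        self.out = nn.Linear(4 * hidden, n_labels)  # classifies [h_t ; c_t]

    def forward(self, token_ids, extra_feats):
        # token_ids: (batch, seq); extra_feats: (batch, seq, feat_dim)
        # extra_feats carries the position and trigger-word part-of-speech features.
        x = torch.cat([self.embed(token_ids), extra_feats], dim=-1)
        h, _ = self.encoder(x)                       # coding layer h_t: (batch, seq, 2*hidden)
        a = torch.softmax(self.attn(h).squeeze(-1), dim=-1).unsqueeze(-1)
        c = (a * h).sum(dim=1, keepdim=True)         # attention context vector
        c = c.expand_as(h)                           # broadcast the context to every position
        return torch.log_softmax(self.out(torch.cat([h, c], dim=-1)), dim=-1)
```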
S107, extracting event relations, and processing event elements through a third extraction model to obtain the event relations among the event elements;
in an alternative embodiment, the third extraction model includes a long-short term memory neural network and a convolutional neural network.
In an optional implementation manner, the step of processing the event elements by the third extraction model to obtain the event relationship between the event elements includes:
S601, obtaining the corpus sequence and corpus representation features;
S602, representing the corpus with a first vector model;
S603, inputting the processed corpus into a long short-term memory network model for training;
S604, passing the memory output of the long short-term memory network model through a convolutional neural network to obtain representations for at least two different input directions according to the forward and backward output results, and concatenating the obtained representations as vectors;
S605, obtaining the final vector representation of the corpus using a max pooling operation;
S606, calculating the prediction category of the corpus.
In a specific implementation process, the third extraction model includes:
the input layer performs preprocessing such as dependency analysis and word segmentation on the text to obtain the required word sequence and vocabulary representation features;
the vector representation layer obtains word vectors by unsupervised pre-training on a large-scale corpus. Furthermore, the Word2Vec tool released by Google, or BERT, can be trained on the segmented Chinese Wikipedia corpus, and the trained vector model is used to represent the words;
in the recurrent neural network layer, since the model needs to extract the context of the event entity relation, this context can be modeled with an LSTM and used as the contextual semantic information of the entity relation. A bidirectional LSTM extends the capability of a unidirectional LSTM by adding a layer of hidden states running opposite to the normal sentence order, so that at time t the sequence information both before and after time t can be used simultaneously, allowing better parameter learning and fitting of the data distribution. The bidirectional LSTM layer comprises two LSTM sub-layer structures, as shown in FIG. 6. The bidirectional LSTM output at time t is calculated as h_t = →h_t ⊕ ←h_t, where ⊕ denotes the element-wise addition operation;
the convolutional neural network layer extracts a certain type of feature by specifying a window of a certain size, called the convolution kernel; a convolutional layer can have several convolution kernels and takes a matrix as input. Through the convolutional neural network, the memory output of the LSTM units is represented in two different input directions according to the forward and backward output results, and the resulting representations are then concatenated as vectors;
the pooling layer filters the features extracted by the model, removes redundant information, and reduces the number of network nodes and thus the number of training parameters; the final vector representation of the input corpus is obtained with a max pooling operation;
and the output layer calculates the prediction category of the corpus using the integrated softmax function.
By adopting the technical scheme, the event elements can be more effectively and accurately processed by the third extraction model, and more accurate event relations among the event elements can be obtained, so that the event ontology can be more effectively established.
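For illustration, a minimal PyTorch sketch of a BiLSTM-plus-CNN relation classifier along the lines described above (pretrained embeddings, a BiLSTM, one convolution per direction, max pooling and a softmax output); the layer sizes, kernel width and number of relation types are assumptions.

```python
# Hypothetical sketch (PyTorch): pretrained embeddings, a BiLSTM, one 1-D
# convolution per direction, max pooling, and a softmax over relation types.
# Layer sizes, kernel width and the number of relation types are assumptions.
import torch
import torch.nn as nn

class BiLSTMCNNRelation(nn.Module):
    def __init__(self, pretrained: torch.Tensor, hidden=128, kernels=100, win=3, n_rel=5):
        super().__init__()
        self.embed = nn.Embedding.from_pretrained(pretrained, freeze=False)  # word2vec/BERT-style vectors
        self.lstm = nn.LSTM(pretrained.size(1), hidden,
                            batch_first=True, bidirectional=True)
        self.conv_fw = nn.Conv1d(hidden, kernels, win, padding=win // 2)  # forward direction
        self.conv_bw = nn.Conv1d(hidden, kernels, win, padding=win // 2)  # backward direction
        self.out = nn.Linear(2 * kernels, n_rel)

    def forward(self, token_ids):
        h, _ = self.lstm(self.embed(token_ids))      # (batch, seq, 2*hidden)
        fw, bw = h.chunk(2, dim=-1)                  # split forward / backward states
        conv = torch.cat([self.conv_fw(fw.transpose(1, 2)),
                          self.conv_bw(bw.transpose(1, 2))], dim=1)
        pooled = torch.relu(conv).max(dim=-1).values # max pooling over the sequence
        return torch.log_softmax(self.out(pooled), dim=-1)
```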
In an optional embodiment, in the event relation extracting step, the event relation includes at least one of a hierarchical relation, a composition relation, a causal relation, a following relation, and a concurrent relation.
And S108, constructing an event ontology, identifying the coreference relations among the event relations, and merging the coreferent events.
In an optional embodiment, the step of constructing the event ontology further includes the steps of: and merging the similar elements in the event relation.
In a specific implementation process, the problem of missing elements is solved by an element reasoning method based on the acquired domain; further, events are classified by clustering: K centroids are obtained through rough clustering with the Canopy algorithm, and entities are then clustered with the K-means algorithm, thereby obtaining the classification and hierarchical structure of the events. With this technical scheme, elements of the same type can be merged, improving the efficiency and accuracy of event ontology construction. Furthermore, the construction of the knowledge base is a continuous loop of iterative updating; after the above steps are completed, the knowledge base continues to be constructed and updated through this loop.
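A rough sketch of the two-stage clustering, assuming event feature vectors as input: a simplified Canopy pass (using only the tight threshold) picks initial centers, whose count K seeds scikit-learn's K-means.

```python
# Hypothetical sketch: a simplified Canopy pass (tight threshold only) picks
# initial centers, whose count K seeds scikit-learn's K-means. The event
# feature vectors X and the threshold value are assumptions.
import numpy as np
from sklearn.cluster import KMeans

def canopy_centers(X, t2=1.0):
    """Rough clustering: greedily pick centers, dropping points within distance t2."""
    remaining = list(range(len(X)))
    centers = []
    while remaining:
        c = remaining.pop(0)
        centers.append(X[c])
        if remaining:
            dists = np.linalg.norm(X[remaining] - X[c], axis=1)
            remaining = [idx for idx, d in zip(remaining, dists) if d > t2]
    return np.array(centers)

def cluster_events(X):
    """Cluster event vectors X (n_events x dim) into the K classes found by Canopy."""
    centers = canopy_centers(X)
    km = KMeans(n_clusters=len(centers), init=centers, n_init=1)
    return km.fit_predict(X)
```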
In an optional implementation manner, similar elements in coreferent events are obtained through a synonym discrimination algorithm, and the elements are merged.
In a specific implementation process, the synonym discrimination algorithm is used to find similar elements in coreferent events and merge them; specifically, the extended edition of the Tongyici Cilin synonym forest maintained by the Social Computing and Information Retrieval Research Center of Harbin Institute of Technology (HIT-SCIR) can be used as the synonym dictionary for finding and merging similar elements. With this technical scheme, merging the elements of coreferent events further improves the efficiency of merging similar elements and thus the efficiency and accuracy of event ontology construction.
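A minimal sketch of synonym-based element merging; the dictionary file format and loader are assumptions, standing in for however the synonym forest entries are actually stored.

```python
# Hypothetical sketch: the dictionary file format (one synonym group per line,
# words separated by spaces) and the event structure are assumptions.
from collections import defaultdict

def load_synonym_groups(path):
    """Map every word in a synonym group to the group's first word (canonical form)."""
    canon = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            words = line.split()
            for w in words:
                canon[w] = words[0]
    return canon

def merge_elements(events, canon):
    """Replace each event element by its canonical synonym, then merge duplicates."""
    merged = defaultdict(set)
    for event_id, elements in events.items():
        for e in elements:
            merged[event_id].add(canon.get(e, e))
    return merged
```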
In an optional implementation manner, the similarity judgment of at least one parameter of the time, the place and the participants of the event is performed through a similarity judgment algorithm, so that the co-reference event is obtained for event merging.
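A small sketch of such a similarity judgment, assuming each event is a dict with time, place and participants fields; the weights and threshold are illustrative only.

```python
# Hypothetical sketch: each event is a dict with "time", "place" and
# "participants" fields; the weights and threshold are illustrative only.
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def is_coreferent(e1, e2, w_time=0.3, w_place=0.3, w_part=0.4, threshold=0.7):
    """Weighted similarity over time, place and participants of two events."""
    score = (w_time * (1.0 if e1["time"] == e2["time"] else 0.0)
             + w_place * (1.0 if e1["place"] == e2["place"] else 0.0)
             + w_part * jaccard(e1["participants"], e2["participants"]))
    return score >= threshold
```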
In an optional implementation manner, the method for constructing an event-oriented dynamic knowledge graph further includes the steps of:
and S109, storing the acquired knowledge graph.
In other embodiments of the present invention, the present invention provides an electronic device, which includes a memory and a processor, where the memory has at least one instruction, and the at least one instruction is loaded and executed by the processor to implement the method described in the above embodiments.
In these embodiments, the electronic device includes a memory and a processor, where the memory has at least one instruction, and the at least one instruction is loaded and executed by the processor to implement the method described in the above embodiments, so that the method has all the beneficial effects of the control method in any of the above embodiments, and details are not repeated here.
In other embodiments of the invention, the invention provides a computer-readable storage medium having stored thereon at least one instruction, which is loaded and executed by a processor to perform the above-described method.
In these embodiments, the computer readable storage medium stores a computer program, and when the computer program is executed by the processor, the steps of the control method in any of the above embodiments are implemented, so that all the beneficial effects of the control method in any of the above embodiments are achieved, and details are not described herein again.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the particular application of the solution and design constraints. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
It should be understood that the above technical problems can be solved by combining and recombining the features of the embodiments and of the claims.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and other media capable of storing program code.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. An event-oriented dynamic knowledge graph construction method, characterized by comprising the following steps:
the method comprises the steps of data acquisition, wherein first data from an internet data source are acquired, the first data comprise natural language, and the first data are preprocessed to generate first data to be processed;
extracting event trigger words, and training and acquiring the trigger words in the first data to be processed through a first extraction model;
identifying event elements, namely acquiring the event elements in an event sentence through a second extraction model, wherein the event sentence comprises trigger words;
extracting event relations, namely processing the event elements through a third extraction model to obtain the event relations among the event elements;
and constructing an event ontology, identifying the coreference relationship in the event relationship, and merging the coreference events.
2. The event-oriented dynamic knowledge graph building method according to claim 1, characterized in that: in the data acquisition step, the method comprises the following steps:
generating an acquisition task, and generating the acquisition task according to an acquisition data source and an acquisition rule;
and executing the acquisition task, wherein the scheduling program dynamically allocates acquisition resources according to the acquisition task amount, executes the acquisition task to acquire acquisition data and acquires first data.
3. The event-oriented dynamic knowledge graph building method according to claim 2, characterized in that: the generating of the collection task further comprises transmitting the collection task to a message middleware; the executing the collection task further comprises receiving a collection task of the message middleware.
4. The event-oriented dynamic knowledge graph building method according to any one of claims 1 to 3, characterized in that: in the step of extracting the event trigger word, the method for establishing the first extraction model comprises the following steps:
using the first corpus as the model training corpus and test corpus for event trigger word extraction;
selecting the language features of the trigger words to establish space feature vectors according to the co-occurrence features of the trigger words;
and obtaining a first extraction model through a support vector machine algorithm.
5. The event-oriented dynamic knowledge graph building method according to claim 4, wherein: in the event trigger word extraction step, the method further comprises the following steps:
and screening the first data to be processed by using the first screening condition.
6. The event-oriented dynamic knowledge graph building method according to claim 5, characterized in that: the spatial feature vector is:
V = {(w_{i-3}, f_1(w_{i-3}), ..., f_m(w_{i-3})), ..., (w_i, f_1(w_i), ..., f_m(w_i)), ..., (w_{i+2}, f_1(w_{i+2}), ..., f_m(w_{i+2}))}
wherein w represents the feature vector of the event trigger word, and f represents the linguistic feature of the trigger word.
7. The event-oriented dynamic knowledge graph building method according to claim 6, characterized in that: the step of obtaining the event elements in the event sentence through the second extraction model comprises the following steps:
the event characteristics are selected according to the event characteristics,
a second extraction model is established, and the second extraction model,
and extracting event elements.
8. The event-oriented dynamic knowledge graph building method according to any one of claims 5 to 7, characterized in that: the third extraction model processes the event elements, and the step of obtaining the event relation among the event elements comprises the following steps:
obtaining a corpus sequence and corpus representation characteristics;
adopting a first vector model to represent the linguistic data;
inputting the processed corpus into a long-short term memory network model for training;
passing the memory output of the long short-term memory network model through a convolutional neural network to obtain representations for at least two different input directions according to the forward and backward output results, and concatenating the obtained representations as vectors;
obtaining the final vector representation of the corpus using a max pooling operation;
and calculating the prediction category of the corpus.
9. The event-oriented dynamic knowledge graph building method according to claim 8, wherein: in the step of constructing the event ontology, the method further comprises the following steps: and merging the similar elements in the event relation.
10. An electronic device comprising a memory and a processor, the memory having at least one instruction thereon, the at least one instruction being loaded and executed by the processor to implement the method of any of claims 1-9.
CN201911313473.6A 2019-12-18 2019-12-18 Event-oriented dynamic knowledge graph construction method and device Pending CN111143576A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911313473.6A CN111143576A (en) 2019-12-18 2019-12-18 Event-oriented dynamic knowledge graph construction method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911313473.6A CN111143576A (en) 2019-12-18 2019-12-18 Event-oriented dynamic knowledge graph construction method and device

Publications (1)

Publication Number Publication Date
CN111143576A true CN111143576A (en) 2020-05-12

Family

ID=70518846

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911313473.6A Pending CN111143576A (en) 2019-12-18 2019-12-18 Event-oriented dynamic knowledge graph construction method and device

Country Status (1)

Country Link
CN (1) CN111143576A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107704637A (en) * 2017-11-20 2018-02-16 中国人民解放军国防科技大学 Knowledge graph construction method for emergency
CN109446513A (en) * 2018-09-18 2019-03-08 中国电子科技集团公司第二十八研究所 The abstracting method of event in a kind of text based on natural language understanding
CN109657074A (en) * 2018-09-28 2019-04-19 北京信息科技大学 News knowledge mapping construction method based on number of addresses
CN109597855A (en) * 2018-11-29 2019-04-09 北京邮电大学 Domain knowledge map construction method and system based on big data driving
CN110543574A (en) * 2019-08-30 2019-12-06 北京百度网讯科技有限公司 knowledge graph construction method, device, equipment and medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
缪学宁 et al.: "Construction and Application of Data Link Systems in Network Information Systems: Proceedings of the 2019 Data Link Technology Forum", 31 May 2019 *
郭正斌: "Research on Knowledge Graph Construction Method for Social Security Events", China Excellent Master's Theses Full-text Database, Information Science and Technology Series *

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111597328A (en) * 2020-05-27 2020-08-28 青岛大学 New event theme extraction method
CN111709243A (en) * 2020-06-19 2020-09-25 南京优慧信安科技有限公司 Knowledge extraction method and device based on deep learning
CN111709243B (en) * 2020-06-19 2023-07-07 南京优慧信安科技有限公司 Knowledge extraction method and device based on deep learning
CN111967256A (en) * 2020-06-30 2020-11-20 北京百度网讯科技有限公司 Event relation generation method and device, electronic equipment and storage medium
CN111967256B (en) * 2020-06-30 2023-08-04 北京百度网讯科技有限公司 Event relation generation method and device, electronic equipment and storage medium
CN111753093A (en) * 2020-07-02 2020-10-09 东北电力大学 Method and device for evaluating level of network public opinion crisis
CN112559756A (en) * 2020-08-07 2021-03-26 新华智云科技有限公司 Construction method and application method of seismic event knowledge graph
CN111985221B (en) * 2020-08-12 2024-03-26 北京百度网讯科技有限公司 Text event relationship identification method, device, equipment and storage medium
CN111985221A (en) * 2020-08-12 2020-11-24 北京百度网讯科技有限公司 Text affair relationship identification method, device, equipment and storage medium
CN112100156B (en) * 2020-09-15 2024-02-20 北京百度网讯科技有限公司 Method, device, medium and system for constructing knowledge base based on user behaviors
CN112100156A (en) * 2020-09-15 2020-12-18 北京百度网讯科技有限公司 Method, device, medium and system for constructing knowledge base based on user behaviors
CN112613305A (en) * 2020-12-27 2021-04-06 北京工业大学 Chinese event extraction method based on cyclic neural network
CN112613305B (en) * 2020-12-27 2024-04-09 北京工业大学 Chinese event extraction method based on cyclic neural network
CN113157993A (en) * 2021-02-08 2021-07-23 电子科技大学 Network water army behavior early warning model based on time sequence graph polarization analysis
CN113449116B (en) * 2021-06-22 2022-12-20 青岛海信网络科技股份有限公司 Map construction and early warning method, device and medium
CN113449116A (en) * 2021-06-22 2021-09-28 青岛海信网络科技股份有限公司 Map construction and early warning method, device and medium
CN113312500A (en) * 2021-06-24 2021-08-27 河海大学 Method for constructing event map for safe operation of dam
CN113434697A (en) * 2021-06-29 2021-09-24 平安科技(深圳)有限公司 Event element extraction method, computer device and storage medium
CN113468345B (en) * 2021-09-02 2021-12-07 中科雨辰科技有限公司 Entity co-reference detection data processing system based on knowledge graph
CN113468345A (en) * 2021-09-02 2021-10-01 中科雨辰科技有限公司 Entity co-reference detection data processing system based on knowledge graph
CN113868508A (en) * 2021-09-23 2021-12-31 北京百度网讯科技有限公司 Writing material query method and device, electronic equipment and storage medium
CN114281940A (en) * 2021-12-07 2022-04-05 江苏联著实业股份有限公司 Computer cognition method and system based on semantic engineering and case learning
CN114282534A (en) * 2021-12-30 2022-04-05 南京大峡谷信息科技有限公司 Meteorological disaster event aggregation method based on element information extraction
CN114706992A (en) * 2022-02-17 2022-07-05 中科雨辰科技有限公司 Event information processing system based on knowledge graph
CN114579675A (en) * 2022-05-05 2022-06-03 中科雨辰科技有限公司 Data processing system for determining common finger event
CN114860960A (en) * 2022-07-11 2022-08-05 南京师范大学 Method for constructing flood type Natech disaster event knowledge graph based on text mining
CN115203440B (en) * 2022-09-16 2023-02-03 北京大数据先进技术研究院 Event map construction method and device for time-space dynamic data and electronic equipment
CN115203440A (en) * 2022-09-16 2022-10-18 北京大数据先进技术研究院 Event map construction method and device for time-space dynamic data and electronic equipment
CN115827848A (en) * 2023-02-10 2023-03-21 天翼云科技有限公司 Method, device, equipment and storage medium for extracting knowledge graph events

Similar Documents

Publication Publication Date Title
CN111143576A (en) Event-oriented dynamic knowledge graph construction method and device
US11775760B2 (en) Man-machine conversation method, electronic device, and computer-readable medium
CN112131366B (en) Method, device and storage medium for training text classification model and text classification
Snyder et al. Interactive learning for identifying relevant tweets to support real-time situational awareness
CN110321563B (en) Text emotion analysis method based on hybrid supervision model
JP2022003537A (en) Method and device for recognizing intent of dialog, electronic apparatus, and storage medium
JP2022548215A (en) Progressive collocation for real-time conversations
Zhang et al. A multi-feature fusion model for Chinese relation extraction with entity sense
Banik et al. Gru based named entity recognition system for bangla online newspapers
CN112148881A (en) Method and apparatus for outputting information
CN113449204A (en) Social event classification method and device based on local aggregation graph attention network
Lee et al. Detecting suicidality with a contextual graph neural network
Sajeevan et al. An enhanced approach for movie review analysis using deep learning techniques
CN109977194B (en) Text similarity calculation method, system, device and medium based on unsupervised learning
CN117236676A (en) RPA process mining method and device based on multi-mode event extraction
Mahmoud et al. Arabic semantic textual similarity identification based on convolutional gated recurrent units
Lokman et al. A conceptual IR chatbot framework with automated keywords-based vector representation generation
Ullah et al. Unveiling the Power of Deep Learning: A Comparative Study of LSTM, BERT, and GRU for Disaster Tweet Classification
CN114357152A (en) Information processing method, information processing device, computer-readable storage medium and computer equipment
Wang et al. Natural language processing systems and Big Data analytics
Cui et al. Aspect level sentiment classification based on double attention mechanism
Roseline et al. PS-POS embedding target extraction using CRF and BiLSTM
Im et al. Multilayer CARU Model for Text Summarization
Kim Research on Text Classification Based on Deep Neural Network
Kulkarni et al. Deep Reinforcement-Based Conversational AI Agent in Healthcare System

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20200512)