CN109582958B - Disaster story line construction method and device - Google Patents

Disaster story line construction method and device

Info

Publication number
CN109582958B
Authority
CN
China
Prior art keywords
information
disaster
entity
extracting
appointed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811382046.9A
Other languages
Chinese (zh)
Other versions
CN109582958A (en
Inventor
周绮凤
倪进鑫
安超杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Research Institute of Xiamen University
Original Assignee
Shenzhen Research Institute of Xiamen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Research Institute of Xiamen University
Priority to CN201811382046.9A
Publication of CN109582958A
Application granted
Publication of CN109582958B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/06 Electricity, gas or water supply
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A10/00 TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE at coastal zones; at river basins
    • Y02A10/40 Controlling or monitoring, e.g. of flood or hurricane; Forecasting, e.g. risk assessment or mapping

Abstract

The invention relates to a disaster story line construction method and device, and belongs to the technical field of semantic networks. The method comprises the following steps: collecting relevant information on a specified disaster; extracting triplet entity information related to the specified disaster from the relevant information; extracting the relations among the triplet entity information; extracting attributes of the specified disaster entity; and constructing a story line of the specified disaster from the triplet entity information, the relations among the triplet entity information, and the attributes of the specified disaster entity. The disaster story line is generated with a knowledge graph: knowledge graph construction techniques such as entity recognition, relation extraction and attribute extraction extract useful information from news text, which solves the prior-art problem that useful information cannot be extracted from massive information to construct a disaster story line.

Description

Disaster story line construction method and device
Technical Field
The invention belongs to the technical field of semantic networks, and particularly relates to a disaster story line construction method and device.
Background
Responding to and handling disaster events has long been a major concern for society. If the evolution pattern of a disaster can be identified when it occurs, the losses it causes can be effectively reduced. At present, most information about disasters comes from media news reports. If effective information can be extracted from this news to reconstruct the complete course of a disaster's evolution, then when a disaster of the same type occurs again, targeted measures can be taken on the basis of that evolution, effectively reducing the losses caused.
At present, story line construction in the disaster field mainly relies on document summarization. As technology advances, the volume of network information is growing sharply and producing an information explosion, so summarization-based methods struggle to extract useful information from massive information quickly and accurately in order to construct a disaster story line.
Disclosure of Invention
The invention provides a disaster story line construction method and device to solve the prior-art technical problem that useful information cannot be extracted from massive information to construct a disaster story line.
To achieve the above purpose, the invention adopts the following technical scheme:
In one aspect, a disaster story line construction method comprises:
collecting relevant information on a specified disaster;
extracting triplet entity information related to the specified disaster from the relevant information;
extracting the relations among the triplet entity information;
extracting attributes of the specified disaster entity;
and constructing a story line of the specified disaster from the triplet entity information, the relations among the triplet entity information, and the attributes of the specified disaster entity.
Further optionally, collecting the relevant information on the specified disaster includes:
acquiring relevant information on the specified disaster from the Internet using web crawler technology;
and selecting preset target information from the crawled information.
Further optionally, selecting the preset target information from the crawled information includes: selecting the preset target information using a network node importance measurement method based on degree and clustering (aggregation) coefficient.
Further optionally, extracting the triplet entity information related to the specified disaster from the relevant information includes: extracting disaster-related triplet entity information from the preprocessed information using a bidirectional recurrent neural network model fused with a conditional random field.
Further optionally, extracting the relations among the triplet entity information includes: extracting the relations between disaster entities using a bidirectional recurrent neural network model with an attention mechanism.
Further optionally, extracting the attributes of the disaster entity includes: extracting the attributes of the disaster entity using a Bootstrapping model.
Further optionally, constructing the story line of the specified disaster from the triplet entity information, the relations among the triplet entity information, and the attributes of the specified disaster entity includes:
constructing local disaster story lines;
and generating a global disaster story line.
Further optionally, constructing the local disaster story lines includes:
classifying by place entity to obtain the information, disaster entity relations and disaster entity attributes of different places;
performing disaster entity disambiguation;
and performing disaster attribute fusion.
Further optionally, generating the global disaster story line includes:
constructing a cost function that describes the degree of similarity between at least two local maps;
determining, according to the cost function, whether directed edges exist among the at least two local maps;
fusing the local maps according to the cost function to construct the global story line;
when the number of local maps is two, the cost function includes: [formula not reproduced in the source text]
In yet another aspect, a disaster story line construction device comprises: an information collection module, an entity information extraction module, an entity relation extraction module, an entity attribute extraction module and a story line generation module;
the information collection module is used for collecting relevant information on the specified disaster;
the entity information extraction module is used for extracting triplet entity information related to the specified disaster from the relevant information;
the entity relation extraction module is used for extracting the relations among the triplet entity information;
the entity attribute extraction module is used for extracting the attributes of the specified disaster entity;
and the story line generation module is used for constructing the story line of the specified disaster from the triplet entity information, the relations among the triplet entity information, and the attributes of the specified disaster entity.
In the embodiment of the invention, the disaster story line is generated with a knowledge graph: knowledge graph construction techniques such as entity recognition, relation extraction and attribute extraction extract useful information from news text, which solves the prior-art problem that useful information cannot be extracted from massive information to construct a disaster story line.
Drawings
To illustrate the embodiments of the invention or the technical solutions in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention, and a person skilled in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a flowchart of an embodiment of the disaster story line construction method provided by the invention;
FIG. 2 is a diagram of the BLSTM-CRF model (a bidirectional recurrent neural network fused with a conditional random field) used for named entity recognition in an embodiment of the disaster story line construction method provided by the invention;
FIG. 3 is a diagram of the Att-BLSTM model (a bidirectional recurrent neural network with an attention mechanism) used for relation extraction in an embodiment of the disaster story line construction method provided by the invention;
FIG. 4 is a diagram of the Bootstrapping model used for attribute extraction in an embodiment of the disaster story line construction method provided by the invention;
FIG. 5 is a schematic diagram of a local story line in an embodiment of the disaster story line construction method provided by the invention;
FIG. 6 is a schematic diagram of a global story line in an embodiment of the disaster story line construction method provided by the invention;
FIG. 7 is a block diagram of an embodiment of the disaster story line construction device provided by the invention.
Detailed Description
To make the objects, technical solutions and advantages of the invention clearer, the technical solutions of the invention are described in detail below. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments derived from the examples herein fall within the scope of the invention as defined by the claims.
To illustrate the process and advantages of the method more clearly, the invention provides an embodiment of a disaster story line construction method.
Referring to FIG. 1, the method of the embodiment of the invention includes:
collecting relevant information on a specified disaster;
extracting triplet entity information related to the specified disaster from the relevant information;
extracting the relations among the triplet entity information;
extracting attributes of the specified disaster entity;
and constructing a story line of the specified disaster from the triplet entity information, the relations among the triplet entity information, and the attributes of the specified disaster entity.
Based on the above disaster story line construction method, an embodiment of the invention provides an alternative embodiment. Referring to FIG. 1, the disaster story line construction method of this embodiment may include the following steps:
S101: Collect relevant information on the specified disaster.
Google formally proposed the concept of the knowledge graph in 2012, and the knowledge graph then quickly became a popular research field. A knowledge graph, also called a scientific knowledge map and known in library and information science as knowledge domain visualization or a knowledge domain map, is a series of graphs that display the development and structural relationships of knowledge; it uses visualization techniques to describe knowledge resources and their carriers, and to mine, analyze, construct, draw and display knowledge and its interrelations. A knowledge graph is a semantic network that reveals relationships between entities and can formally describe things in the real world. A knowledge graph is represented by triples, i.e. $G = (E, R, S)$, where $E$ is the entity set, $R$ is the relation set, and $S$ is the triple set. Triples take two basic forms: (entity 1, relation, entity 2) and (entity, attribute, attribute value). Knowledge graph construction technology can extract useful information from a large amount of network information and present it as a graph, revealing how disasters develop and providing a reference for formulating targeted precautionary measures.
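As a non-limiting illustration, the two triple forms described above could be held in memory as in the following minimal Python sketch. The class name `Triple`, the helper `build_graph`, the predicate names and the sample facts are assumptions made for illustration and do not appear in the original disclosure.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical facts, for illustration only.
relation_triples = [Triple("Nepartak", "affects", "Taiwan")]   # (entity 1, relation, entity 2)
attribute_triples = [Triple("Nepartak", "type", "typhoon")]    # (entity, attribute, attribute value)


def build_graph(relations, attributes):
    """Index triples as adjacency lists plus per-entity attribute dictionaries."""
    edges = defaultdict(list)
    props = defaultdict(dict)
    for t in relations:
        edges[t.subject].append((t.predicate, t.obj))
    for t in attributes:
        props[t.subject][t.predicate] = t.obj
    return dict(edges), dict(props)


edges, props = build_graph(relation_triples, attribute_triples)
```

Keeping relation triples and attribute triples in separate indexes mirrors the two forms named in the text; a real system might instead store both in a graph database.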
The knowledge graph's strong semantic representation capability and simple form of expression make it well suited to constructing disaster story lines. However, story line construction in the disaster field is mainly based on document summarization, and research based on knowledge graphs is essentially nonexistent. It is therefore particularly important to construct an informative and concise disaster story line through a knowledge graph.
In this embodiment, relevant information on the specified disaster is collected, for example information on a typhoon disaster. In a specific implementation, news reports on different typhoons are acquired from the Internet using web crawler technology, specifically a Python crawler, and important news is selected from the crawled news by a node importance method based on degree and clustering (aggregation) coefficient, so as to remove redundant information.
Specifically, the degree index describes the number of neighbor nodes of a node:
$$k_i = \sum_{j \in G} \delta_{ij}$$
where $\delta_{ij} = 1$ if node $i$ is connected to node $j$, and $\delta_{ij} = 0$ otherwise.
The degree index reflects a node's ability to establish direct connections with surrounding nodes, but it cannot reflect how the node's neighbors are connected to one another.
The clustering (aggregation) coefficient describes the proportion of a node's neighbors that are also neighbors of each other, expressed as:
$$c_i = \frac{2 e_i}{k_i (k_i - 1)}$$
where $e_i$ is the number of triangles formed by node $i$ and any two of its neighbors. In contrast to the degree index, the clustering coefficient reflects, to some extent, the edges among a node's neighbors, but it cannot reflect the size of the neighborhood. We therefore combine node neighbor information with the clustering coefficient to propose a new node importance index $p_i$: [formula not reproduced in the source text]
where $f_i$ is the sum of node $i$'s own degree and the degrees of its neighbors:
$$f_i = k_i + \sum_{w \in \Gamma_i} k_w$$
where $k_w$ is the degree of node $w$ and $\Gamma_i$ is the set of neighbor nodes of node $i$. The function $g_i$ is expressed as: [formula not reproduced in the source text]
Using this network node importance measurement method based on degree and clustering (aggregation) coefficient, we select important news from the crawled news documents. This removes redundant information and yields the important typhoon-related news, which is taken as the relevant information on the specified disaster.
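The building blocks of this measure can be sketched as follows. This is a minimal sketch under stated assumptions: the news documents are assumed to have already been turned into an undirected graph stored as a dictionary of neighbor sets (the text does not say how edges between documents are defined), and the function names are hypothetical. Only the degree, the clustering coefficient and the neighborhood degree sum are computed; how they are combined into the final importance index is not reproduced here.

```python
def degree(adj, i):
    """k_i: number of neighbors of node i."""
    return len(adj[i])


def clustering(adj, i):
    """c_i = 2 * e_i / (k_i * (k_i - 1)), with e_i the number of triangles through i."""
    nbrs = adj[i]
    k = len(nbrs)
    if k < 2:
        return 0.0
    # Each connected pair of neighbors closes one triangle with node i.
    e = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
    return 2.0 * e / (k * (k - 1))


def neighborhood_degree(adj, i):
    """f_i: degree of node i plus the degrees of all its neighbors."""
    return len(adj[i]) + sum(len(adj[w]) for w in adj[i])


# Hypothetical toy graph: node "a" is linked to "b", "c" and "d"; "b" and "c" are linked.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
```

For node "a" in this toy graph the degree is 3, one of the three neighbor pairs is connected, so the clustering coefficient is 1/3, and the neighborhood degree sum is 3 + (2 + 2 + 1) = 8.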
S102: Extract triplet entity information related to the specified disaster from the relevant information.
Specifically, named entities are extracted from the selected important news text using a bidirectional recurrent neural network model fused with a conditional random field (the BLSTM-CRF model). The entities include, but are not limited to, person names, organization names and place names.
A knowledge graph is expressed as triples, and the core of a triple is the entity, so the first step of information extraction is named entity recognition. Named entity recognition has become a fundamental technology for many natural language processing applications. In recent years, with the rapid development of deep learning, recurrent neural networks have shown strong capabilities in natural language processing tasks. In this embodiment, we use the BLSTM-CRF model for named entity recognition.
Referring to FIG. 2, FIG. 2 is a schematic diagram of a BLSTM-CRF model structure.
Input layer: a given sentence is written $x = (x_1, x_2, \ldots, x_n)$, where $x_i$ is a one-hot vector representing the position of the $i$-th character in the character dictionary. Each one-hot vector is then projected into a word vector by the word2vec model, and these word vectors are the input to the BLSTM-CRF model.
LSTM layer: the bidirectional LSTM layer automatically extracts sentence features. The word vector of each word in the sentence is the input at the corresponding time step of the bidirectional LSTM, and the hidden state sequences produced by the forward LSTM and the backward LSTM are concatenated to obtain the complete hidden state sequence $(h_1, h_2, \ldots, h_n)$. The LSTM layer computes:
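$$i_t = \sigma(W_{xi} x_t + W_{hi} h_{t-1} + W_{ci} c_{t-1} + b_i) \quad \text{(input gate)}$$
$$f_t = \sigma(W_{xf} x_t + W_{hf} h_{t-1} + W_{cf} c_{t-1} + b_f) \quad \text{(forget gate)}$$
$$c_t = f_t \cdot c_{t-1} + i_t \cdot \tanh(W_{xc} x_t + W_{hc} h_{t-1} + b_c) \quad \text{(cell state)}$$
$$o_t = \sigma(W_{xo} x_t + W_{ho} h_{t-1} + W_{co} c_{t-1} + b_o) \quad \text{(output gate)}$$
$$h_t = o_t \cdot \tanh(c_t) \quad \text{(output)}$$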
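A single time step of these gate equations can be sketched in plain Python as follows. This is an illustrative sketch, not the trained model: the peephole weights (the terms multiplying the previous cell state) are assumed to act elementwise, the gate biases are named `b_i`, `b_f`, `b_c` and `b_o`, and the helper `zero_params` is hypothetical and only builds placeholder parameters.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(W, v):
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def lstm_step(x, h_prev, c_prev, p):
    """One LSTM time step; p maps names such as 'W_xi', 'W_ci', 'b_i' to parameters."""
    def pre(gate, peephole=True):
        a = matvec(p["W_x" + gate], x)
        b = matvec(p["W_h" + gate], h_prev)
        z = [ai + bi + bias for ai, bi, bias in zip(a, b, p["b_" + gate])]
        if peephole:  # assumed elementwise connection to the previous cell state
            z = [zi + wc * c for zi, wc, c in zip(z, p["W_c" + gate], c_prev)]
        return z

    i = [sigmoid(z) for z in pre("i")]                      # input gate
    f = [sigmoid(z) for z in pre("f")]                      # forget gate
    g = [math.tanh(z) for z in pre("c", peephole=False)]    # candidate cell value
    c = [ft * cp + it * gt for ft, cp, it, gt in zip(f, c_prev, i, g)]
    o = [sigmoid(z) for z in pre("o")]                      # output gate, uses c_{t-1} as in the text
    h = [ot * math.tanh(ct) for ot, ct in zip(o, c)]
    return h, c


def zero_params(n_in, n_hid):
    """Hypothetical helper: all-zero parameters of the right shapes."""
    p = {}
    for gate in "ifco":
        p["W_x" + gate] = [[0.0] * n_in for _ in range(n_hid)]
        p["W_h" + gate] = [[0.0] * n_hid for _ in range(n_hid)]
        p["b_" + gate] = [0.0] * n_hid
    for gate in "ifo":
        p["W_c" + gate] = [0.0] * n_hid
    return p


h, c = lstm_step([1.0], [0.0], [1.0], zero_params(1, 1))
```

With all-zero parameters every gate equals 0.5 and the candidate value is 0, so the new cell state is half the previous one. A bidirectional layer would run one such recurrence left to right, another right to left, and concatenate the two hidden states at each position.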
CRF layer: the third layer of the model is the CRF layer. For an input sentence $x = (x_1, x_2, \ldots, x_n)$, let $P$ be the score matrix output by the BLSTM network. $P$ has size $n \times k$, where $k$ is the number of distinct tags and $P_{i,j}$ is the score of the $j$-th tag for the $i$-th character of the sentence. For a sequence of predictions $y = (y_1, y_2, \ldots, y_n)$, we define its score as:
$$s(x, y) = \sum_{i=0}^{n} A_{y_i, y_{i+1}} + \sum_{i=1}^{n} P_{i, y_i}$$
where $A$ is a matrix of transition scores, with $A_{i,j}$ the score of a transition from tag $i$ to tag $j$. We add start and end tags to the set of possible tags; they are $y_0$ and $y_{n+1}$, marking the start and end of the sentence. $A$ is therefore a square matrix of size $k+2$. Applying a softmax over all possible tag sequences gives the probability of sequence $y$:
$$p(y \mid x) = \frac{e^{s(x, y)}}{\sum_{\tilde{y}} e^{s(x, \tilde{y})}}$$
In this way, triplet entity information related to the specified disaster, for example "typhoon", "Taiwan" and "Nepartak" in a typhoon disaster, can be extracted from the relevant information.
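The scoring performed by the CRF layer can be illustrated with the short sketch below. It assumes the usual BLSTM-CRF formulation, in which the score of a tag sequence is the sum of its transition scores (including transitions from a start tag and into an end tag) plus its per-character emission scores; the function names and the toy matrices are hypothetical. The normalization enumerates every tag sequence, which is only practical for tiny inputs; a real implementation would use dynamic programming.

```python
import math
from itertools import product


def sequence_score(P, A, y, start, end):
    """Transition scores along start -> y_1 ... y_n -> end, plus emission scores."""
    path = [start] + list(y) + [end]
    transitions = sum(A[a][b] for a, b in zip(path, path[1:]))
    emissions = sum(P[i][tag] for i, tag in enumerate(y))
    return transitions + emissions


def sequence_probability(P, A, y, k, start, end):
    """Softmax of the score over all k**n candidate tag sequences (brute force)."""
    n = len(P)
    z = sum(
        math.exp(sequence_score(P, A, cand, start, end))
        for cand in product(range(k), repeat=n)
    )
    return math.exp(sequence_score(P, A, y, start, end)) / z


# Toy setup: k = 2 real tags (0 and 1); indices 2 and 3 are the start and end tags.
k, START, END = 2, 2, 3
A = [[0.0] * (k + 2) for _ in range(k + 2)]   # (k + 2) x (k + 2) transition scores
P = [[1.0, 0.0], [0.0, 1.0]]                  # n x k emission scores from the BLSTM

prob = sequence_probability(P, A, (0, 1), k, START, END)
```

With zero transition scores the four candidate sequences score 1, 2, 0 and 1, so the sequence (0, 1) receives probability e^2 / (e + 1)^2.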
S103: Extract the relations among the triplet entity information.
In this embodiment, eight kinds of relations between entities are defined manually by analyzing the news text data. Relation extraction is then treated as a classification task, and the relations between entities are extracted using a bidirectional recurrent neural network model with an attention mechanism.
Relation extraction is the task of finding semantic relations between nouns, and recent studies often treat it as a classification task. In the present invention, we use the Att-BLSTM model to extract the relations between entities; its structure is shown in FIG. 3.
Compared with the BLSTM model, Att-BLSTM has an additional attention layer. Let $H = (h_1, h_2, \ldots, h_T)$ be the output of the BLSTM layer, where $T$ is the length of the input sentence. The sentence representation $r$ is a weighted sum of these output vectors:
$$M = \tanh(H)$$
$$\alpha = \mathrm{softmax}(w^{T} M)$$
$$r = H \alpha^{T}$$
$$h^{*} = \tanh(r)$$
where $H \in \mathbb{R}^{d_w \times T}$, $d_w$ is the dimension of the word vectors, and $w$ is a trained parameter vector.
At the last layer, we use the softmax function to predict the relation label. Writing $W$ and $b$ for the weights and bias of this output layer, the prediction is:
$$\hat{p}(y \mid S) = \mathrm{softmax}(W h^{*} + b), \qquad \hat{y} = \arg\max_{y} \hat{p}(y \mid S)$$
The output of the softmax is a vector whose length equals the number of labels, in which each element is the probability of the corresponding label; we take the label with the highest probability as the relation between the entities in the input sentence.
On this basis, the relation between "Taiwan" and "Nepartak", the relation between "typhoon" and "Taiwan", and so on can be extracted in this embodiment.
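The attention pooling step can be sketched in plain Python as follows. The sketch assumes that the BLSTM output is given as a list of T hidden vectors and that the parameter vector w has the same dimension as each hidden vector; the function name `attention_pool` and the toy inputs are hypothetical, and the final softmax classifier over relation labels is left out.

```python
import math


def attention_pool(H, w):
    """Return the attention weights alpha and the sentence vector h* = tanh(H alpha^T)."""
    M = [[math.tanh(v) for v in h_t] for h_t in H]                  # M = tanh(H)
    scores = [sum(wi * mi for wi, mi in zip(w, m_t)) for m_t in M]  # w^T M, one score per position
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]                      # numerically stable softmax
    total = sum(exps)
    alpha = [e / total for e in exps]
    dim = len(w)
    r = [sum(a * h_t[j] for a, h_t in zip(alpha, H)) for j in range(dim)]  # weighted sum
    h_star = [math.tanh(v) for v in r]
    return alpha, h_star


# Toy input: two identical hidden vectors and a zero parameter vector.
alpha, h_star = attention_pool([[1.0, 0.0], [1.0, 0.0]], [0.0, 0.0])
```

With a zero parameter vector every position receives the same weight, so the pooled vector is simply the average hidden state passed through tanh; training w is what lets the model emphasize the words that signal a relation.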
S104: Extract the attributes of the specified disaster entity.
Entity attributes in the text are extracted using the Bootstrapping semi-supervised learning method.
Bootstrapping is a semi-supervised machine learning technique widely used for knowledge acquisition, and is an incremental learning method. It needs only a small amount of labeled data or an initial seed set, which it expands effectively through iterative learning until a data set of the required size is reached. Bootstrapping-based attribute acquisition mainly involves selecting a small number of seed patterns and preparing a large amount of unlabeled text.
In attribute pattern acquisition based on the Bootstrapping algorithm, the evaluation of candidate patterns plays a key role. If a wrong pattern is fed back into the iteration as a seed pattern, the error is amplified and the whole pattern acquisition may even fail. The credibility of each candidate pattern therefore needs to be estimated with an evaluation function and the candidates ranked; the top n candidate patterns, or those scoring above a threshold, are selected to enter the next iteration. Calculating the similarity between a seed pattern and a candidate pattern is a good way to evaluate patterns. Common similarity calculation methods include the vector space model, the edit distance method and the query likelihood model. For two character strings S1 and S2, the edit distance is the minimum number of editing operations required to transform S1 into S2. The permitted editing operations are replacing one character with another, inserting a character, and deleting a character. The invention uses the edit distance to evaluate the similarity between a candidate pattern and a seed pattern.
New pattern acquisition process:
(1) Preprocess the text, including sentence splitting, word segmentation and part-of-speech tagging;
(2) Search the training corpus for sentences containing trigger words, and extract the syntactic patterns of descriptive sentences containing attribute trigger words as candidate patterns;
(3) Calculate the similarity between each candidate pattern and the seed patterns based on the edit distance;
(4) Compare the similarity calculated in step (3) with a given threshold, and retain the pattern if its similarity is greater than the threshold;
(5) Turn the patterns retained in step (4) into new seed patterns, and carry out the next iteration to obtain further new patterns.
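Steps (3) to (5) can be sketched as follows. This is a minimal sketch under stated assumptions: patterns are represented as tuples of tokens, the edit distance is normalized by the length of the longer pattern to give a similarity between 0 and 1 (the text does not specify a normalization), and the function names, the sample patterns and the threshold of 0.6 are all hypothetical.

```python
def edit_distance(s1, s2):
    """Minimum number of substitutions, insertions and deletions turning s1 into s2."""
    prev = list(range(len(s2) + 1))
    for i, a in enumerate(s1, 1):
        cur = [i]
        for j, b in enumerate(s2, 1):
            cur.append(min(prev[j] + 1,              # delete a
                           cur[j - 1] + 1,           # insert b
                           prev[j - 1] + (a != b)))  # substitute, or keep if equal
        prev = cur
    return prev[-1]


def similarity(a, b):
    """Assumed normalization: 1 - distance / length of the longer pattern."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - edit_distance(a, b) / longest


def bootstrap(seeds, candidates, threshold, max_iters=5):
    """Keep candidates similar enough to any current pattern, then reuse them as seeds."""
    patterns = set(seeds)
    for _ in range(max_iters):
        new = {c for c in candidates
               if c not in patterns
               and any(similarity(c, s) > threshold for s in patterns)}
        if not new:
            break
        patterns |= new
    return patterns


# Hypothetical token patterns around an attribute trigger word.
seed = ("NAME", "wind", "reach", "NUM")
candidates = [
    ("NAME", "wind", "reach", "about", "NUM"),
    ("NAME", "rain", "total", "NUM"),
    ("people", "evacuate"),
]
learned = bootstrap({seed}, candidates, threshold=0.6)
```

In this toy run only the first candidate is retained: it differs from the seed by one inserted token (similarity 0.8), while the others fall below the threshold against every accepted pattern.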
In this embodiment, the attributes and attribute values of "typhoon", "Taiwan" and "Nepartak" are obtained from the newly acquired patterns.
S105: Construct a story line of the specified disaster from the triplet entity information, the relations among the triplet entity information, and the attributes of the specified disaster entity.
A disaster story line is constructed from the information extracted from the news text. It comprises local disaster story lines and a global disaster story line. Local story lines are constructed by classifying by place entity to obtain the information for each place. For the global disaster story line, we construct a cost function to determine whether a directed edge exists between two local maps, and finally fuse the maps to obtain the global story line.
Local disaster story line construction:
According to the invention, all news documents are divided by time and information is extracted from each day's news. Information extraction achieves the aim of acquiring entities, relations and entity attributes from unstructured and semi-structured data; however, the results may contain a large amount of redundant and erroneous information, so they must be cleaned and integrated. Knowledge fusion eliminates conceptual ambiguity and removes redundant and erroneous concepts, thereby ensuring the quality of the knowledge.
For entity disambiguation, since the entities extracted here are place and organization names, the Baidu Baike pages to which the entities link are compared to disambiguate them.
For attribute fusion, attributes are classified by trigger word, and for each class of attribute the value that occurs most often is selected by voting.
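The voting step of attribute fusion can be sketched as below. The function name `fuse_attributes` and the sample observations are hypothetical; the sketch assumes the extracted attribute values have already been grouped under an attribute type derived from their trigger words.

```python
from collections import Counter, defaultdict


def fuse_attributes(observations):
    """observations: iterable of (attribute_type, value) pairs extracted for one entity."""
    by_type = defaultdict(Counter)
    for attr, value in observations:
        by_type[attr][value] += 1
    # Keep, for each attribute type, the value that was extracted most often.
    return {attr: counts.most_common(1)[0][0] for attr, counts in by_type.items()}


# Hypothetical extractions from several news reports about the same entity.
observations = [
    ("wind_force", "level 12"),
    ("wind_force", "level 12"),
    ("wind_force", "level 14"),
    ("rainfall", "200 mm"),
]
fused = fuse_attributes(observations)
```

A simple majority vote is robust when most reports agree; how ties between equally frequent values should be broken is not addressed in the text and is left open in this sketch.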
Based on the above rules, we obtain a daily map of the typhoon over the period of its occurrence, and then connect the maps along a timeline to form a local disaster story line, as shown in FIG. 5.
Global story line construction:
The nodes generated from the local story lines are used to construct a directed graph (an edge from i to j is allowed only if i occurs earlier than j). For this graph we construct a cost function describing the degree of similarity between two maps, as follows: [formula not reproduced in the source text]
where $d(i, j)$ is the normalized distance between the locations described by maps $i$ and $j$, and $N_j$ is the number of triples in map $j$. The cost considers both geographic location and the amount of information in the map. In general, map $i$ tends to transition to a map describing the same location as $i$; but a typhoon is always moving and more information is available at the location of the typhoon's center, so the amount of map information is also included in the cost.
Referring to FIG. 6, FIG. 6 shows the global disaster story line finally generated on the basis of the cost function.
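How a cost function of this kind could drive the linking of local maps is sketched below. The cost formula itself is not reproduced in this text, so the form used here, distance plus the reciprocal of the triple count, is purely an assumption chosen so that nearby and information-rich maps are cheap to reach. The rule of linking each map to its cheapest later map, the class `LocalMap` and the sample values are likewise hypothetical.

```python
import math
from typing import NamedTuple, Tuple


class LocalMap(NamedTuple):
    name: str
    day: int                   # time index of the local map
    xy: Tuple[float, float]    # normalized coordinates of the described location
    n_triples: int             # amount of information in the map (assumed > 0)


def cost(a, b):
    # ASSUMED form, not the patent's formula: small for close, information-rich maps.
    return math.dist(a.xy, b.xy) + 1.0 / b.n_triples


def link(maps):
    """Add a directed edge from each map to its lowest-cost later map, if any."""
    edges = []
    for a in maps:
        later = [b for b in maps if b.day > a.day]  # edges only run forward in time
        if later:
            best = min(later, key=lambda b: cost(a, b))
            edges.append((a.name, best.name))
    return edges


# Hypothetical local maps.
maps = [
    LocalMap("A", 1, (0.0, 0.0), 10),
    LocalMap("B", 2, (0.1, 0.0), 5),
    LocalMap("C", 2, (0.9, 0.0), 50),
    LocalMap("D", 3, (0.2, 0.0), 8),
]
edges = link(maps)
```

In this toy example map A links to B rather than to the richer but distant map C, which illustrates the trade-off between geographic proximity and information amount described above.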
Fig. 7 is a block diagram of an embodiment of a disaster storyline construction device according to the present invention.
Referring to FIG. 7, in this embodiment the disaster story line construction device includes: an information collection module 61, an entity information extraction module 62, an entity relation extraction module 63, an entity attribute extraction module 64, and a story line generation module 65.
Specifically, the information collection module 61 is configured to collect relevant information on a specified disaster; the entity information extraction module 62 is configured to extract information on at least two entities of the specified disaster from the relevant information; the entity relation extraction module 63 is configured to extract the relation between the at least two entities; the entity attribute extraction module 64 is configured to extract attributes of the specified disaster entity; and the story line generation module 65 is configured to construct a story line of the specified disaster from the information on the at least two entities, the relation between the at least two entities, and the attributes of the specified disaster entity.
In the embodiment of the invention, the disaster story line is generated with a knowledge graph: knowledge graph construction techniques such as entity recognition, relation extraction and attribute extraction extract useful information from news text, which solves the prior-art problem that useful information cannot be extracted from massive information to construct a disaster story line.
The foregoing merely illustrates the present invention, and the scope of the invention is not limited thereto; variations or substitutions that a person skilled in the art can readily conceive fall within the scope of the invention. The protection scope of the invention is therefore defined by the claims.
It is to be understood that the same or similar parts of the above embodiments may be cross-referenced, and that content not described in detail in one embodiment may be found in the same or similar parts of other embodiments.
It should be noted that in the description of the present invention, the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Furthermore, in the description of the present invention, unless otherwise indicated, the meaning of "plurality" means at least two.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and further implementations are included within the scope of the preferred embodiment of the present invention in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present invention.
It is to be understood that portions of the present invention may be implemented in hardware, software, firmware, or a combination thereof. In the above-described embodiments, the various steps or methods may be implemented in software or firmware stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented using any one or a combination of the following techniques, which are well known in the art: discrete logic circuits having logic gates for implementing logic functions on data signals, application-specific integrated circuits having suitable combinational logic gates, programmable gate arrays (PGAs), field-programmable gate arrays (FPGAs), and the like.
Those of ordinary skill in the art will appreciate that all or a portion of the steps of the methods in the above-described embodiments may be carried out by a program instructing related hardware, where the program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, each functional unit in the embodiments of the present invention may be integrated in one processing module, or each unit may exist alone physically, or two or more units may be integrated in one module. The integrated modules may be implemented in hardware or in software functional modules. The integrated modules may also be stored in a computer readable storage medium if implemented in the form of software functional modules and sold or used as a stand-alone product.
The above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, or the like.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described above, it will be understood that the above embodiments are illustrative and are not to be construed as limiting the invention, and that changes, modifications, substitutions, and variations may be made to the above embodiments by one of ordinary skill in the art within the scope of the invention.

Claims (7)

1. A method of disaster storyline construction, the method comprising:
collecting information about a specified disaster, comprising: acquiring relevant information of a specified disaster on the Internet by utilizing a web crawler technology; selecting preset target information from the crawled information as the related information;
specifically: selecting important information from the crawled information by a node importance method based on degree and aggregation coefficient, so as to remove redundant information and obtain the related information;
wherein the degree describes the number of neighbor nodes of a node:
$k_i = \sum_{j \in G} \delta_{ij}$
wherein i and j are different nodes;
the aggregation coefficient describes the proportion of a node's neighbors that are also neighbors of each other in the network, expressed as:
the node importance evaluation index $p_i$ is expressed as:
wherein $f_i$, the sum of the degree of node i and the degrees of its neighbors, is expressed as:
wherein $k_w$ represents the degree of node w, and $\Delta_i$ represents the set of neighbor nodes of node i; the function $g_i$ is expressed as:
selecting, according to the node importance evaluation index $p_i$, important information from the crawled information, so as to remove redundant information and obtain the related information;
extracting triplet entity information related to the specified disaster from the related information;
extracting the relations among the triplet entity information;
extracting attributes of the entities of the specified disaster;
constructing a story line of the specified disaster according to the triplet entity information, the relations among the triplet entity information, and the attributes of the entities of the specified disaster, which comprises: constructing a local disaster story line; and generating a global disaster story line; wherein the generating a global disaster story line comprises:
constructing a cost function, wherein the cost function is used for describing the degree of similarity between at least 2 local graphs;
judging, according to the cost function, whether a directed edge connection exists among the at least 2 local graphs;
fusing the cost function and the local graphs to construct a global story line;
when the number of the at least 2 local graphs is 2, the cost function includes:
wherein $d(i, j)$ represents the normalized distance between the locations described by graphs i and j, and $N_j$ represents the number of triples in graph j.
2. The method according to claim 1, wherein selecting preset target information from the crawled information comprises: selecting preset target information from the crawled information by using a network node importance measurement method based on degree and aggregation coefficient.
3. The method of claim 1, wherein said extracting triplet entity information related to the specified disaster from said related information comprises: extracting disaster-related triplet entity information from the preprocessed information by using a bidirectional recurrent neural network model fused with a conditional random field.
4. The method of claim 1, wherein said extracting the relations among said triplet entity information comprises: extracting the relations between disaster entities by using a bidirectional recurrent neural network model with an attention mechanism.
5. The method of claim 1, wherein the extracting attributes of the entities of the specified disaster comprises: extracting the attributes of the entities of the specified disaster by using a Bootstrapping model.
6. The method of claim 1, wherein constructing a local disaster storyline comprises:
classifying by location entity to obtain the disaster entity relations and disaster entity attributes of different locations;
performing disaster entity disambiguation;
and performing disaster attribute fusion.
7. A disaster storyline construction device, comprising: the system comprises an information collection module, an entity information extraction module, an entity relation extraction module, an entity attribute extraction module and a story line generation module;
the information collection module is configured to collect related information about the specified disaster, and specifically to: acquire information related to the specified disaster from the Internet by using web crawler technology; and select preset target information from the crawled information as the related information; and more specifically to: select important information from the crawled information by a node importance method based on degree and aggregation coefficient, so as to remove redundant information and obtain the related information;
wherein the degree describes the number of neighbor nodes of a node:
$k_i = \sum_{j \in G} \delta_{ij}$
wherein i and j are different nodes;
the aggregation coefficient describes the proportion of a node's neighbors that are also neighbors of each other in the network, expressed as:
the node importance evaluation index $p_i$ is expressed as:
wherein $f_i$, the sum of the degree of node i and the degrees of its neighbors, is expressed as:
wherein $k_w$ represents the degree of node w, and $\Delta_i$ represents the set of neighbor nodes of node i; the function $g_i$ is expressed as:
according to the node importance evaluation index $p_i$, important information is selected from the crawled information, so as to remove redundant information and obtain the related information;
the entity information extraction module is configured to extract triplet entity information related to the specified disaster from the related information;
the entity relation extraction module is configured to extract the relations among the triplet entity information;
the entity attribute extraction module is configured to extract the attributes of the entities of the specified disaster;
the story line generation module is configured to construct a story line of the specified disaster according to the triplet entity information, the relations among the triplet entity information, and the attributes of the entities of the specified disaster, and specifically to: construct a local disaster story line; and generate a global disaster story line; wherein the generating a global disaster story line comprises:
constructing a cost function, wherein the cost function is used for describing the degree of similarity between at least 2 local graphs;
judging, according to the cost function, whether a directed edge connection exists among the at least 2 local graphs;
fusing the cost function and the local graphs to construct a global story line;
when the number of the at least 2 local graphs is 2, the cost function includes:
wherein $d(i, j)$ represents the normalized distance between the locations described by graphs i and j, and $N_j$ represents the number of triples in graph j.
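The quantities recited in claims 1 and 7 for node importance can be illustrated with a small sketch. It computes the degree $k_i$, an aggregation coefficient, and the degree sum $f_i$ for an undirected graph stored as adjacency sets. Two points are assumptions rather than content of the claims: the aggregation coefficient formula is not reproduced in the text, so the standard local clustering coefficient (links among neighbors divided by the number of possible neighbor pairs) is used here, and the expressions for $p_i$ and $g_i$ are likewise unavailable, so they are not attempted. The function names and the sample graph are hypothetical.

```python
from itertools import combinations


def degree(adj, i):
    """k_i: the number of neighbor nodes of node i."""
    return len(adj[i])


def aggregation_coefficient(adj, i):
    """Share of neighbor pairs of i that are themselves linked (assumed standard form)."""
    neighbors = adj[i]
    k = len(neighbors)
    if k < 2:
        return 0.0  # no neighbor pairs exist, so the ratio is taken as zero
    links = sum(1 for u, v in combinations(neighbors, 2) if v in adj[u])
    return 2.0 * links / (k * (k - 1))


def degree_sum(adj, i):
    """f_i: the degree of node i plus the degrees of all its neighbors."""
    return degree(adj, i) + sum(degree(adj, w) for w in adj[i])


# Made-up undirected graph: a triangle a-b-c with a pendant node d attached to a.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
```

In this sample graph, node "a" has the largest degree sum, which matches the intuition that a node with many well-connected neighbors carries more information than a pendant node such as "d".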
CN201811382046.9A 2018-11-20 2018-11-20 Disaster story line construction method and device Active CN109582958B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811382046.9A CN109582958B (en) 2018-11-20 2018-11-20 Disaster story line construction method and device

Publications (2)

Publication Number Publication Date
CN109582958A CN109582958A (en) 2019-04-05
CN109582958B true CN109582958B (en) 2023-07-18

Family

ID=65922787

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811382046.9A Active CN109582958B (en) 2018-11-20 2018-11-20 Disaster story line construction method and device

Country Status (1)

Country Link
CN (1) CN109582958B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110083709B (en) * 2019-04-28 2021-09-24 宁波深擎信息科技有限公司 Method and system for automatically constructing knowledge graph based on description definition
CN110866190B (en) * 2019-11-18 2021-05-14 支付宝(杭州)信息技术有限公司 Method and device for training neural network model for representing knowledge graph

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106845474A (en) * 2015-12-07 2017-06-13 富士通株式会社 Image processing apparatus and method
CN107330125A (en) * 2017-07-20 2017-11-07 云南电网有限责任公司电力科学研究院 The unstructured distribution data integrated approach of magnanimity of knowledge based graphical spectrum technology
CN108776684A (en) * 2018-05-25 2018-11-09 华东师范大学 Optimization method, device, medium, equipment and the system of side right weight in knowledge mapping

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8909653B1 (en) * 2012-02-06 2014-12-09 Su-Kam Intelligent Education Systems, Inc. Apparatus, systems and methods for interactive dissemination of knowledge
US9892210B2 (en) * 2014-10-31 2018-02-13 Microsoft Technology Licensing, Llc Partial graph incremental update in a social network
WO2017040632A2 (en) * 2015-08-31 2017-03-09 Omniscience Corporation Event categorization and key prospect identification from storylines
CN106156365B (en) * 2016-08-03 2019-06-18 北京儒博科技有限公司 A kind of generation method and device of knowledge mapping
CN108664615A (en) * 2017-05-12 2018-10-16 华中师范大学 A kind of knowledge mapping construction method of discipline-oriented educational resource
CN107194422A (en) * 2017-06-19 2017-09-22 中国人民解放军国防科学技术大学 A kind of convolutional neural networks relation sorting technique of the forward and reverse example of combination
CN108763333B (en) * 2018-05-11 2022-05-17 北京航空航天大学 Social media-based event map construction method

Also Published As

Publication number Publication date
CN109582958A (en) 2019-04-05

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant