CN108897792B - Disaster monitoring and analyzing method for extracting multi-dimensional disaster-related information of Internet - Google Patents

Info

Publication number
CN108897792B
Authority
CN
China
Prior art keywords
disaster
words
information
index
knowledge base
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810597449.9A
Other languages
Chinese (zh)
Other versions
CN108897792A (en
Inventor
解吉波
杨腾飞
李国庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Remote Sensing and Digital Earth of CAS
Original Assignee
Institute of Remote Sensing and Digital Earth of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Remote Sensing and Digital Earth of CAS filed Critical Institute of Remote Sensing and Digital Earth of CAS
Priority to CN201810597449.9A priority Critical patent/CN108897792B/en
Publication of CN108897792A publication Critical patent/CN108897792A/en
Application granted granted Critical
Publication of CN108897792B publication Critical patent/CN108897792B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a disaster monitoring and analysis method that extracts multi-dimensional disaster-related information from the Internet, comprising the following steps: S1: acquiring multi-source data in real time and preprocessing it; S2: constructing a disaster information extraction knowledge base with the aid of the feature word multidimensional extension algorithm FWME, and using it to extract disaster loss information and emotional feedback information of the disaster-affected public, from which multi-dimensional disaster-related information is obtained; S3: jointly monitoring and analyzing the disaster area according to the multi-dimensional disaster-related information. Through the constructed knowledge base, the method accurately extracts the disaster loss information contained in Internet platforms and the emotional feedback of the affected public during a disaster, analyzes them jointly with the time and space dimensions, and describes the course of the disaster in detail, thereby assisting real-time disaster monitoring, disaster damage assessment, rescue feedback, follow-up impact assessment and similar work.

Description

Disaster monitoring and analyzing method for extracting multi-dimensional disaster-related information of Internet
Technical Field
The invention relates to the technical field of disaster monitoring and analysis, in particular to a method for extracting multi-dimensional disaster-related information for disaster monitoring and analysis based on internet multi-source text data.
Background
China is a disaster-prone country, and disasters of various kinds cause heavy casualties and property losses every year. Traditional disaster monitoring methods, such as satellite remote sensing and manual field investigation, often cannot respond in time because of their demanding implementation conditions and high cost. Moreover, some disaster information, such as the emotional feedback of the disaster-affected public, is difficult for traditional methods to capture, yet it is also extremely important in disaster monitoring and analysis.
At present, many scholars monitor and analyze disasters based on social media, for example by exploring typhoon tracks and patterns of human behavior along the space-time dimensions, or by mining text content for public attention hotspots and emotional attitudes when disasters such as typhoons and earthquakes occur. However, these methods monitor at a coarse granularity and are not well targeted: they cannot describe the details of the losses caused by a disaster, and they cannot reflect the public's emotional feedback on government rescue activities.
Extracting fine-grained disaster-related information from Internet text is difficult because the contextual features of such information are very sparse and a single text often covers several fine-grained disaster topics. Traditional supervised learning methods not only require large-scale manually labeled training corpora but are also mostly aimed at single-topic text classification. Traditional rule-based methods require a great deal of expert knowledge to summarize extraction rules and port poorly to new settings.
Disclosure of Invention
(I) Technical problem to be solved
The present disclosure provides a disaster monitoring and analyzing method for extracting multidimensional disaster-related information of the internet, so as to at least partially solve the above-mentioned technical problems.
(II) Technical scheme
According to one aspect of the disclosure, a disaster monitoring and analysis method for extracting multi-dimensional disaster-related information from the Internet is provided, which comprises:
S1: acquiring multi-source data in real time and preprocessing it;
S2: constructing a disaster information extraction knowledge base with the aid of the feature word multidimensional extension algorithm FWME, and using it to extract disaster loss information and emotional feedback information of the disaster-affected public, from which multi-dimensional disaster-related information is obtained;
S3: jointly monitoring and analyzing the disaster area according to the multi-dimensional disaster-related information.
In some embodiments of the present disclosure, the multi-source data in step S1 is obtained from multiple Internet platforms through search engines, using keywords related to a specified disaster; the preprocessing comprises text deduplication, traditional-to-simplified Chinese conversion and full-width to half-width character conversion, and the timestamp information and location information of the text data are extracted and stored separately.
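The preprocessing in step S1 can be illustrated with a minimal sketch. It assumes each post arrives as a dictionary with the hypothetical keys `text`, `time` (ISO 8601 string) and `place`; none of these names come from the disclosure. Traditional-to-simplified conversion is omitted because it needs a character mapping table (for example from a third-party converter), so only deduplication, full-width to half-width conversion and the separate storage of time and location are shown.

```python
from datetime import datetime


def to_half_width(text: str) -> str:
    """Map full-width ASCII variants and the ideographic space to half-width."""
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:                 # ideographic (full-width) space
            code = 0x20
        elif 0xFF01 <= code <= 0xFF5E:     # full-width ASCII block
            code -= 0xFEE0
        out.append(chr(code))
    return "".join(out)


def preprocess(posts):
    """Return (texts, meta): deduplicated texts plus separately stored time/place."""
    seen = set()
    texts, meta = [], []
    for post in posts:
        text = to_half_width(post["text"]).strip()
        if text in seen:                   # text deduplication
            continue
        seen.add(text)
        texts.append(text)
        meta.append({
            "time": datetime.fromisoformat(post["time"]),
            "place": post["place"],
        })
    return texts, meta
```

Keeping `meta` in a list parallel to `texts` is only one way to store the timestamp and location separately; a database table keyed by a post identifier would serve the same purpose.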
In some embodiments of the present disclosure, in step S2, constructing the disaster information extraction knowledge base with the aid of the feature word multidimensional extension algorithm FWME to extract disaster damage information and emotional feedback information of the disaster-affected public comprises:
S21: constructing the disaster information extraction knowledge base;
S22: expanding, longitudinally and transversely, the various feature words in the disaster information extraction knowledge base using the feature word multidimensional extension algorithm FWME;
S23: extracting disaster damage information and emotional feedback information of the disaster-affected public.
In some embodiments of the present disclosure, in step S21, the disaster information extraction knowledge base comprises a disaster damage information identification knowledge base and a public emotion feedback knowledge base; disaster damage feature words and public emotion feature words contained in the text are used for identification and classification, while negation words that counteract the text semantics and degree words that intensify it are also taken into account.
In some embodiments of the present disclosure, the disaster information extraction knowledge base is structured as a tree of height 4, where the top layer of the tree is the disaster information extraction knowledge base itself and layer 2 contains two subtrees, wherein:
the left subtree at layer 2 represents the disaster damage information identification knowledge base; its layer 3 nodes represent the individual disaster damage categories and the negation words; each disaster damage category node has a variable number of leaf nodes, and the entity information represented by each leaf node is independent of the others; each layer 4 leaf node stores a plurality of feature word pairs that represent the disaster damage events of its layer 3 category, each pair consisting of a feature word denoting the damaged object and a feature word denoting the damaging action, and being used to identify disaster damage events; the negation word node holds a series of negation words, stored in the layer 4 leaf node beneath it; storage space is dynamically reserved for the feature words and negation words in each leaf node to hold their index positions in the text to be extracted, namely the disaster damage feature word Index (Wi) and the negation word Index (N);
the right subtree at layer 2 represents the public emotion feedback knowledge base; the nodes under it represent emotion feature words, negation feature words and degree feature words; each of these nodes has one leaf node that stores the corresponding feature word lexicon; for each feature word, two storage spaces are dynamically reserved to hold the Score and Index of that feature word in the text to be extracted, namely the emotion feature word Score (E) and Index (E), the negation feature word Score (N) and Index (N), and the degree feature word Score (D) and Index (D).
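As an illustration of the four-layer tree described above, the following sketch builds a tiny knowledge base. The `Node` class, the `height` helper, the category name "power outage" and all words and scores are hypothetical placeholders chosen for the example, not the contents of the actual knowledge base; the per-text Index and Score slots would be filled at matching time and are not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    children: list = field(default_factory=list)
    entries: list = field(default_factory=list)   # filled only on layer 4 leaves

    def add(self, child):
        self.children.append(child)
        return child


def height(node):
    """Number of layers from this node down to its deepest leaf."""
    return 1 + max((height(c) for c in node.children), default=0)


root = Node("disaster information extraction knowledge base")     # layer 1
damage = root.add(Node("disaster damage identification"))         # layer 2, left
emotion = root.add(Node("public emotion feedback"))               # layer 2, right

# Left subtree: layer 3 categories, layer 4 leaves with (object, action) pairs.
power = damage.add(Node("power outage"))                          # hypothetical category
power.add(Node("leaf", entries=[("electricity", "cut off")]))
damage.add(Node("negation words")).add(Node("leaf", entries=["not", "no"]))

# Right subtree: each layer 3 node has one leaf holding (word, score) entries.
emotion.add(Node("emotion words")).add(
    Node("leaf", entries=[("happy", 1.0), ("sad", -1.0)]))
emotion.add(Node("negation words")).add(Node("leaf", entries=[("not", -1.0)]))
emotion.add(Node("degree words")).add(Node("leaf", entries=[("very", 2.0)]))
```

A category node may receive any number of leaves through repeated `add` calls, which mirrors the variable number of leaf nodes per disaster damage category.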
In some embodiments of the present disclosure, in step S22, the FWME algorithm expands each class of feature words in the disaster information extraction knowledge base along two dimensions, longitudinal and transverse, wherein:
the longitudinal expansion comprises using the word vector model integrated in the FWME algorithm, with Internet disaster-related texts as the corpus, to compute similarities for each class of feature words, and taking the similar words that meet a threshold as longitudinal supplementary words for the feature words to be expanded in the knowledge base;
the transverse expansion comprises, on the basis of the longitudinal expansion, using the synonymy relations among words for the existing feature words, and taking the words that meet the synonymy condition as synonym supplementary words for each feature word in the knowledge base.
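The two expansion dimensions can be sketched as follows. This is a simplified illustration rather than the FWME algorithm itself: the word vectors are a hand-written toy dictionary standing in for a trained word vector model, cosine similarity is assumed as the similarity measure, and the synonym table, threshold value and function names are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def expand_longitudinal(seeds, vectors, threshold):
    """Add every vocabulary word whose similarity to a seed meets the threshold."""
    expanded = set(seeds)
    for seed in seeds:
        if seed not in vectors:
            continue
        for word, vec in vectors.items():
            if cosine(vectors[seed], vec) >= threshold:
                expanded.add(word)
    return expanded


def expand_transverse(words, synonyms):
    """Add the listed synonyms of every word already in the set."""
    expanded = set(words)
    for word in words:
        expanded.update(synonyms.get(word, ()))
    return expanded


# Toy data: two-dimensional "embeddings" and a one-entry synonym table.
vectors = {"flood": (1.0, 0.0), "inundate": (0.9, 0.1), "happy": (0.0, 1.0)}
synonyms = {"inundate": ["submerge"]}

step1 = expand_longitudinal({"flood"}, vectors, threshold=0.8)
step2 = expand_transverse(step1, synonyms)
```

Running the transverse step on the output of the longitudinal step, as here, follows the order given in the text: synonyms are looked up for the words that longitudinal expansion has already added.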
In some embodiments of the present disclosure, in step S23, extracting disaster damage information through the disaster information extraction knowledge base comprises the following steps:
S231: splitting the text to be processed at punctuation marks to form a set of short sentences {S1, S2, ..., Si}, i ≥ 1, where Si denotes a short sentence formed after the text is split; segmenting each short sentence into words, removing stop words, and recording the index position of each remaining word within its short sentence;
S232: matching the words of each short sentence from S231 against the feature word pairs in the leaf nodes under each disaster damage category node of the disaster damage information identification knowledge base; when a feature word pair of some disaster damage category is fully matched, recording the index positions Index (w1) and Index (w2) of the matched words in the sentence and marking the sentence as a candidate sentence for that disaster damage category;
S233: judging whether any other word in the candidate sentence matches a negation word in the disaster damage information identification knowledge base and, if so, recording the index position Index (N) of the negation word;
S234: if and only if Index (N) < Index (w1) or Index (N) < Index (w2), the candidate sentence belongs to the corresponding damage category.
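Steps S231 to S234 can be sketched as below. Two points in the sketch are interpretations rather than statements from the disclosure. First, whitespace tokenization and a toy stop list stand in for a real Chinese word segmenter. Second, step S234 is worded as a condition for membership, but since negation words are described as counteracting the text semantics, the sketch assumes that a negation word preceding either feature word cancels the match, and that a candidate with no such negation word is kept. All names and example data are hypothetical.

```python
import re

STOP_WORDS = {"the", "a", "is", "was", "in"}          # toy stop list


def split_clauses(text):
    """S231: split text into short sentences at punctuation marks."""
    parts = re.split(r"[,.;!?,。;!?]", text)
    return [p.strip() for p in parts if p.strip()]


def tokenize(clause):
    """S231: segment into words and drop stop words (whitespace stand-in)."""
    return [w for w in clause.lower().split() if w not in STOP_WORDS]


def extract_damage(text, pairs_by_category, negation_words):
    results = []
    for clause in split_clauses(text):
        words = tokenize(clause)
        index = {w: i for i, w in enumerate(words)}   # Index position of each word
        for category, pairs in pairs_by_category.items():
            for w1, w2 in pairs:
                if w1 in index and w2 in index:       # S232: pair fully matched
                    neg = [index[n] for n in negation_words if n in index]  # S233
                    # S234 (assumed reading): a preceding negation cancels the match.
                    negated = any(i < index[w1] or i < index[w2] for i in neg)
                    if not negated:
                        results.append((category, clause))
                    break
    return results


pairs = {"power outage": [("power", "cut")]}
found = extract_damage("The power was cut in Zhuhai, power not cut here",
                       pairs, {"not", "no"})
```

In this example the first short sentence is kept as a "power outage" event, while the second is discarded because "not" precedes "cut".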
In some embodiments of the present disclosure, in step S23, extracting emotional feedback information of the disaster-affected public through the disaster information extraction knowledge base comprises the following steps:
S231': splitting the text to be processed at punctuation marks to form a set of short sentences {S1, S2, ..., Si}, i ≥ 1; segmenting each short sentence into words, removing stop words, and recording the index position of each remaining word within its short sentence;
S232': matching the words of each short sentence from S231' against the feature words in the leaf nodes under the emotion feature word, negation word and degree word nodes of the public emotion feedback knowledge base, and recording the index positions Index (E), Index (N), Index (D) and the scores Score (E), Score (N), Score (D) of the matched feature words;
S233': when the index of a negation word or degree word is smaller than the index of an emotion word, multiplying Score (N) and Score (D) respectively by the corresponding emotion value Score (E), and then summing the resulting scores to form the score Score (Si) of the short sentence;
S234': setting a threshold sequence and locating the Score (S) of each text, the sum of its short-sentence scores, within the threshold sequence, thereby obtaining the emotion category of the text.
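The short-sentence scoring of S232' and S233' can be sketched as follows. The sketch assumes that every negation or degree word appearing before an emotion word in the same short sentence multiplies that emotion word's score, which is one plausible reading of S233'; the lexicons and their score values are made up for illustration.

```python
def score_clause(words, emotion, negation, degree):
    """Return Score(Si) for one short sentence.

    words: tokens in order, so a token's list position is its Index.
    emotion, negation, degree: dicts mapping feature word -> score.
    """
    total = 0.0
    for i, word in enumerate(words):
        if word not in emotion:
            continue
        score = emotion[word]                 # Score(E)
        for prior in words[:i]:               # only words with a smaller Index
            if prior in negation:
                score *= negation[prior]      # Score(N)
            elif prior in degree:
                score *= degree[prior]        # Score(D)
        total += score
    return total


emotion = {"happy": 1.0, "sad": -1.0}
negation = {"not": -1.0}
degree = {"very": 2.0}
```

With these toy lexicons, `score_clause(["not", "very", "happy"], emotion, negation, degree)` evaluates to -2.0: the degree word doubles the emotion value and the negation word flips its sign.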
In some embodiments of the present disclosure, the multidimensional disaster-related information in step S3 is a four-dimensional disaster array [loss, observation, time, location], which is constructed by using the disaster information extraction knowledge base built in step S2 to extract the disaster damage information and public emotional feedback information contained in Internet text data related to a specified disaster event, and combining them with the timestamp information and location information of the Internet text.
In some embodiments of the present disclosure, in step S3, monitoring and analyzing the disaster-affected area according to the multidimensional disaster-related information comprises laying out the constructed four-dimensional disaster array [loss, observation, time, location] on a vector map and performing spatio-temporal analysis, so that the whole course of the disaster is monitored in detail and analyzed in real time; the final result is used for real-time disaster monitoring, disaster damage assessment, disaster rescue feedback and subsequent impact assessment.
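A minimal sketch of the four-dimensional array and a simple spatio-temporal aggregation is given below. It assumes that the `observation` element carries the emotion category extracted for the text, since the array is built from the damage and emotional feedback information; the six-hour time bucket, the `Record` type and the sample records are illustrative choices, and rendering on a vector map is left out.

```python
from collections import Counter, namedtuple
from datetime import datetime

# One element of the four-dimensional disaster array.
Record = namedtuple("Record", ["loss", "observation", "time", "location"])


def bucket_counts(records, hours=6):
    """Count records per (location, time bucket, loss category)."""
    counts = Counter()
    for r in records:
        slot = r.time.replace(hour=r.time.hour - r.time.hour % hours,
                              minute=0, second=0, microsecond=0)
        counts[(r.location, slot, r.loss)] += 1
    return counts


records = [
    Record("power outage", "negative", datetime(2017, 8, 23, 12, 30), "Zhuhai"),
    Record("power outage", "negative", datetime(2017, 8, 23, 13, 10), "Zhuhai"),
    Record("power outage", "neutral", datetime(2017, 8, 23, 19, 0), "Macau"),
]
counts = bucket_counts(records)
```

Each key of `counts` identifies a map location and a time window, so the resulting table can be joined to map features to show how each damage category evolves over time.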
(III) Advantageous effects
According to the above technical scheme, the disaster monitoring and analysis method for extracting multi-dimensional disaster-related information from the Internet has at least one of the following beneficial effects:
(1) through the constructed disaster information extraction knowledge base, the disaster loss information contained in Internet platforms such as microblogs and the emotional feedback of the affected public during a disaster are accurately extracted and analyzed jointly with the time and space dimensions, so that the course of the disaster is described in detail, assisting real-time disaster monitoring, disaster damage assessment, disaster rescue feedback, subsequent impact assessment and similar work;
(2) the feature word multidimensional extension algorithm overcomes the sparsity of contextual features in short texts and the resulting difficulty of extracting and classifying fine-grained disaster loss information, so that the disaster loss information contained in Internet text is mined automatically and disaster reduction work is assisted efficiently.
Drawings
Fig. 1 is a schematic flow chart of a disaster monitoring and analyzing method for extracting internet multidimensional disaster-related information according to the embodiment of the present disclosure.
Fig. 2 is a schematic structural diagram of a disaster information extraction knowledge base according to an embodiment of the present disclosure.
Fig. 3 is a schematic flow chart of a method for performing longitudinal and transverse expansion on each feature word in a knowledge base by using a feature word multidimensional extension algorithm FWME according to an embodiment of the disclosure.
Fig. 4 is a schematic view of spatial distribution of various types of disasters and losses in the typhoon passing process according to the embodiment of the disclosure.
Fig. 5 is a schematic diagram of a time-space distribution sequence and an emotion feedback change sequence of each category of disaster in the typhoon crossing process according to the embodiment of the disclosure.
Detailed Description
The invention provides a disaster monitoring and analysis method based on the joint analysis of multi-dimensional disaster-related information. To address the low extraction accuracy for fine-grained multi-dimensional disaster-related information and the shortcomings of existing disaster monitoring and analysis methods, it uses the FWME (Feature Words Multidimensional Extension) algorithm to assist in constructing a disaster information extraction knowledge base, extracts the multi-dimensional disaster-related information contained in Internet text through that knowledge base, and evaluates disaster losses and the government's post-disaster rescue activities by analyzing how disaster losses and the emotions of the affected public change over a spatio-temporal sequence.
The overall technical scheme adopted by the invention comprises the following steps:
S1: acquiring multi-source data in real time and preprocessing it;
S2: constructing a disaster information extraction knowledge base with the aid of the proposed feature word multidimensional extension algorithm FWME, to extract disaster loss information and emotional feedback information of the disaster-affected public;
S3: monitoring and analyzing the disaster area according to the multi-dimensional disaster information.
In step S1, the multi-source data is acquired in real time from Internet platforms such as forum posts, WeChat official accounts, news sites and Sina Weibo, using keywords related to the specified disaster as search conditions. The preprocessing includes text deduplication, traditional-to-simplified Chinese conversion, full-width to half-width character conversion and the like. At the same time, the temporal and spatial information of each text, including its timestamp and location information, is extracted separately and stored in a database.
In step S2, the multi-dimensional disaster information in the text is obtained by constructing a disaster information extraction knowledge base. The knowledge base stores a rich set of disaster damage feature word pairs that characterize various kinds of damage, together with emotion feature words that characterize the feedback of the affected public, and also stores negation words and degree words so that semantics are captured accurately. Every class of feature word in the knowledge base is enriched and supplemented by the feature word multidimensional extension algorithm FWME proposed by the invention.
In step S2, constructing the disaster information extraction knowledge base with the aid of the FWME algorithm to extract disaster loss information and emotional feedback information of the disaster-affected public comprises:
S21: constructing the disaster information extraction knowledge base.
The disaster information extraction knowledge base is used to extract disaster loss information and emotional feedback information of the affected public from Internet text, identifying and classifying them by the disaster damage and public emotion feature words contained in the text, for example disaster damage feature word pairs such as 'power off-electricity' and public emotion feedback words such as 'happy' and 'too hard'. Negation words that counteract the text semantics and degree words that intensify it are also taken into account.
The structure of the disaster information extraction knowledge base can be represented as a tree of height 4 with two subtrees at layer 2. The left subtree represents the disaster damage information identification knowledge base. The nodes beneath it represent the individual disaster damage categories and the negation words; each disaster damage category node has a variable number of leaf nodes, and the entity information represented by each leaf node is independent of the others. Each leaf node stores a plurality of feature word pairs representing disaster damage events. The negation word node holds a series of negation words stored in its leaf node. In addition, storage space is dynamically reserved for the feature words and negation words in each leaf node to hold their index positions in the text to be extracted, namely the disaster damage feature word Index (Wi) and the negation word Index (N). The right subtree at layer 2 represents the public emotion feedback knowledge base, whose structure is similar to that of the disaster damage information identification knowledge base. The nodes beneath it represent emotion categories, negation words and degree words; each of these nodes has one leaf node that stores the corresponding feature word lexicon, and two storage spaces are dynamically reserved for each feature word to hold its Score and Index in the text to be extracted, namely the emotion feature word Score (E) and Index (E), the negation feature word Score (N) and Index (N), and the degree feature word Score (D) and Index (D).
S22: expanding, longitudinally and transversely, the various feature words in the disaster information extraction knowledge base using the feature word multidimensional extension algorithm FWME.
The FWME algorithm expands the various disaster-related feature words in the disaster information extraction knowledge base along two dimensions, longitudinal and transverse. The longitudinal expansion uses the word vector model integrated in the FWME algorithm, with a large volume of Internet disaster-related text as the corpus (including text from forum posts, WeChat official accounts, news websites, Sina Weibo and the like), to compute similarities for each class of feature words; the similar words that meet a threshold (and are consistent with the feature word to be expanded) are taken as longitudinal supplementary words for that feature word in the knowledge base. The transverse expansion builds on the longitudinal expansion: it uses the synonymy relations among words for the existing feature words and takes the words that meet the synonymy condition as synonym supplementary words for each feature word in the knowledge base, with the synonym computation using the Tongyici Cilin (Synonym Forest) thesaurus as its corpus.
S23: extracting disaster damage information and emotional feedback information of the disaster-affected public.
In step S2, extracting disaster damage information through the disaster information extraction knowledge base comprises the following steps:
S231: the text to be processed is split at punctuation marks to form a set of short sentences {S1, S2, ..., Si}, i ≥ 1, where Si denotes a short sentence formed after the text is split; each short sentence is segmented into words, stop words are removed, and the index position of each remaining word within its short sentence is recorded.
S232: the words of each short sentence from S231 are matched against the feature word pairs in the leaf nodes under each disaster damage category node of the disaster damage knowledge base; when a feature word pair of some disaster damage category is fully matched, the index positions Index (w1) and Index (w2) of the matched words in the sentence are recorded and the sentence is marked as a candidate for that disaster damage category.
S233: it is then judged whether any other word in the candidate sentence matches a negation word in the disaster damage knowledge base; if so, the index position Index (N) of the negation word is recorded.
S234: if and only if Index (N) < Index (w1) or Index (N) < Index (w2), the candidate sentence belongs to the corresponding damage category.
In step S2, extracting emotional feedback information of the disaster-affected public through the disaster information extraction knowledge base comprises the following steps:
S231': the text to be processed is split at punctuation marks into a set of short sentences {S1, S2, ..., Si}, i ≥ 1; each short sentence is segmented into words, stop words are removed, and the index position of each remaining word within its short sentence is recorded.
S232': the words of each short sentence from S231' are matched against the feature words in the leaf nodes under the emotion feature word, negation word and degree word nodes of the public emotion feedback knowledge base, and the index positions Index (E), Index (N), Index (D) and the scores Score (E), Score (N), Score (D) of the matched feature words are recorded.
S233': when the index of a negation word or degree word is smaller than the index of an emotion word, Score (N) and Score (D) are respectively multiplied by the corresponding emotion value Score (E), and the resulting scores are then summed to form the score Score (Si) of the short sentence.
S234': the final score of the whole text, Score (S), is the sum of the scores of its short sentences: $\mathrm{Score}(S) = \sum_{i \ge 1} \mathrm{Score}(S_i)$.
A threshold sequence is set, and the Score (S) of each text is located within the threshold sequence, thereby obtaining the emotion category of the text.
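The summation and threshold lookup of S234' can be sketched with a sorted threshold list and a binary search. The threshold values and the three category labels below are hypothetical; the disclosure does not state how many emotion categories or which cut points are used.

```python
from bisect import bisect_right

THRESHOLDS = [-1.0, 1.0]                         # hypothetical cut points, ascending
LABELS = ["negative", "neutral", "positive"]     # one more label than thresholds


def classify(clause_scores, thresholds=THRESHOLDS, labels=LABELS):
    """Sum the short-sentence scores and map the total to an emotion category."""
    total = sum(clause_scores)                   # Score(S) = sum of Score(Si)
    return total, labels[bisect_right(thresholds, total)]
```

For example, `classify([2.0, -0.5])` gives a total of 1.5, which lies above the upper threshold and is therefore labeled "positive".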
The multidimensional disaster-related information in step S3 is obtained by using the disaster information extraction knowledge base constructed in step S2 to extract the disaster damage information and public emotional feedback information contained in Internet text data related to a specified disaster event, and combining them with the timestamp information and location information of the Internet text, thereby constructing a four-dimensional disaster array [loss, observation, time, location].
In step S3, the disaster area is monitored and analyzed according to the multidimensional disaster-related information: the constructed four-dimensional disaster array [loss, observation, time, location] is laid out on a vector map and subjected to spatio-temporal analysis, so that the whole course of the disaster is monitored in detail and analyzed in real time. The final result can be used for disaster damage assessment, disaster rescue feedback, subsequent impact assessment and the like.
The disaster information extraction knowledge base constructed by the method accurately extracts the disaster loss information contained in microblogs and the emotional feedback of the affected public during a disaster, analyzes them jointly with the time and space dimensions, and describes the course of the disaster in detail, thereby assisting real-time disaster monitoring, disaster damage assessment, disaster rescue feedback, subsequent impact assessment and similar work.
For the purpose of promoting a better understanding of the objects, aspects and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.
Certain embodiments of the present disclosure will now be described more fully hereinafter with reference to the accompanying drawings, in which some, but not all embodiments of the disclosure are shown. Indeed, various embodiments of the disclosure may be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will satisfy applicable legal requirements.
In a first exemplary embodiment of the present disclosure, a disaster monitoring and analysis method for extracting multi-dimensional disaster-related information from the Internet is provided. Taking Typhoon Hato (天鸽), which struck the city of Zhuhai on 23 August 2017, as an example, the area affected by the wind disaster is monitored and analyzed in real time at fine granularity. The invention acquires multi-source disaster-related text from multiple Internet platforms, and then uses the proposed feature word multidimensional extension algorithm FWME (Feature Words Multidimensional Extension) to assist in constructing a disaster information extraction knowledge base, which extracts the disaster loss information and the emotional feedback of the affected public contained in the Internet text. Finally, joint analysis is performed together with the temporal and spatial information of the text, assisting disaster assessment, disaster rescue, the collection of rescue feedback and the like.
Fig. 1 is a schematic flow chart of a disaster monitoring and analyzing method for extracting internet multidimensional disaster-related information according to the embodiment of the present disclosure. As can be seen from the figure, the method comprises the following processes:
S1: acquiring multi-source data in real time and preprocessing it;
S2: constructing a typhoon disaster information extraction knowledge base with the aid of the proposed feature word multidimensional extension algorithm FWME (Feature Words Multidimensional Extension), to extract wind disaster loss information and emotional feedback information of the public affected by the wind disaster;
S3: monitoring and analyzing the wind disaster area according to the multidimensional disaster information.
S1: real-time acquisition and preprocessing of multi-source data. Internet multi-source data for the monitored and analyzed area is acquired from platforms such as forum posts, WeChat official accounts, news sites and Sina Weibo. All data is then preprocessed, including deduplication, traditional-to-simplified Chinese conversion, full-width to half-width character conversion and the like; at the same time, the time information and upload location information contained in the data are parsed out, and the other fields are stored.
In step S2, the wind disaster loss information and the emotional feedback of the public affected by the wind disaster contained in the text are obtained by constructing the typhoon disaster information extraction knowledge base. The feature word pairs stored in the knowledge base to characterize different kinds of disaster damage, and the emotion feature words that characterize the feedback of the affected public, are enriched and supplemented by the feature word multidimensional extension algorithm FWME proposed by the invention; in addition, the knowledge base also contains negation words and degree words so that semantics are captured accurately.
Fig. 2 is a schematic structural diagram of the disaster information extraction knowledge base according to an embodiment of the present disclosure. As shown in Fig. 2, the knowledge base can be represented as a tree of height 4 whose second layer holds two subtrees. The left subtree represents the disaster damage information identification knowledge base. Its child nodes represent the individual disaster damage categories and the negative words. Each disaster damage category node has a variable number of leaf nodes, and the entity information represented by the leaf nodes is mutually independent. Each leaf node stores several feature word pairs that represent disaster damage events. The negative word node holds a series of negative words stored in its leaf nodes. In addition, storage space is dynamically reserved for each feature word and negative word in a leaf node to hold the index position of that word in the text to be extracted, namely the disaster damage feature word Index(W_i) and the negative feature word Index(N). The right subtree on the second layer represents the public emotion feedback knowledge base, whose structure is similar to that of the disaster damage information identification knowledge base. Its child nodes represent the emotion categories, the negative words and the degree words; each of these nodes has one leaf node that stores the corresponding feature word base. Two storage spaces are dynamically reserved for each feature word to hold its score and its index in the text to be extracted: Score(E) and Index(E) for emotion feature words, Score(N) and Index(N) for negative feature words, and Score(D) and Index(D) for degree feature words.
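One way to picture the four-layer tree is the following sketch. The class names `Entry` and `Node`, the helper functions and all example words are hypothetical and chosen only for illustration; the `index` field stands in for the dynamically reserved storage described above.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Entry:
    words: tuple                   # one word, or an (object, action) pair
    score: Optional[float] = None  # used only in the emotion subtree
    index: Optional[int] = None    # reserved slot, filled per processed text


@dataclass
class Node:
    name: str
    children: List["Node"] = field(default_factory=list)
    entries: List[Entry] = field(default_factory=list)


def leaf(*entries):
    """Layer-4 node holding the feature words themselves."""
    return Node("leaf", entries=list(entries))


def height(node):
    """Number of layers from this node down to its deepest leaf."""
    return 1 + max((height(c) for c in node.children), default=0)


def build_knowledge_base():
    damage = Node("disaster damage information identification", [
        Node("traffic jam", [leaf(Entry(("road", "blocked")))]),
        Node("damage to forest", [leaf(Entry(("tree", "fallen")))]),
        Node("negative words", [leaf(Entry(("not",)))]),
    ])
    emotion = Node("public emotion feedback", [
        Node("positive emotion", [leaf(Entry(("grateful",), score=1.0))]),
        Node("negative emotion", [leaf(Entry(("afraid",), score=-1.0))]),
        Node("negative words", [leaf(Entry(("not",), score=-1.0))]),
        Node("degree words", [leaf(Entry(("very",), score=1.5))]),
    ])
    return Node("disaster information extraction knowledge base",
                [damage, emotion])
```

In this sketch the damage subtree stores word pairs without scores, while the emotion subtree stores single words with scores, mirroring the difference between the two subtrees.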
In this embodiment, disaster damage is divided into 11 categories, as shown in Table 1.
TABLE 1
Serial number    Category of damage
1                Traffic jam
2                Vehicle destruction
3                Damage to forest
4                Casualty
5                Influence of water supply
6                Damage to buildings
7                Commercial impact
8                Communication impact
9                Influence of power supply
10               Damage to electric power facilities
11               Infrastructure damage
In this embodiment, public emotion is divided into 3 categories, as shown in Table 2.
TABLE 2
Serial number    Public group emotion
1                Positive emotion
2                Neutral emotion
3                Negative emotion
Scores are assigned to the feature words in the public emotion feedback knowledge base: each Positive emotion feature word has a score of 1, each Negative emotion feature word has a score of -1, each negative word has a score of -1, and each Degree word has a score of 1.5.
Fig. 3 is a flowchart of the method by which the feature word multidimensional extension algorithm FWME in S2 expands each feature word in the knowledge base longitudinally and laterally.
The longitudinal expansion uses a word vector model integrated in the FWME algorithm. With the collected multi-source Internet disaster-related texts as the corpus, similarity is calculated for each feature word to be expanded, and the words that satisfy the similarity threshold and have the same part of speech as the feature word are taken as its longitudinal supplementary words in the knowledge base.
The threshold is set so that the 4 words most similar to the target word are taken.
The longitudinal supplement increases the depth of the vocabulary by adding words that occur in the same contexts: for example, "telegraph pole" and "street lamp" share contexts with "tree", and "broken" and "blown down" share contexts with "fallen", as shown in Table 3.
TABLE 3
[Table 3 is reproduced as an image in the original: Figure BDA0001691808370000111]
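The longitudinal expansion can be sketched as a nearest-neighbour search over word vectors. The patent does not name the word vector model or the similarity measure, so the cosine similarity, the toy two-dimensional vectors and the part-of-speech tags below are all assumptions made for illustration; the sketch takes the 4 most similar words and then keeps those with the same part of speech.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def longitudinal_expand(target, vectors, pos_tags, top_k=4):
    """Return the words among the top_k nearest neighbours of `target`
    that share its part of speech, most similar first."""
    target_vec = vectors[target]
    scored = [(cosine(target_vec, vec), word)
              for word, vec in vectors.items() if word != target]
    scored.sort(reverse=True)
    nearest = [word for _, word in scored[:top_k]]
    return [w for w in nearest if pos_tags.get(w) == pos_tags[target]]


# Toy data standing in for vectors trained on the disaster corpus.
VECTORS = {
    "tree": [1.0, 0.0],
    "telegraph pole": [0.9, 0.1],
    "street lamp": [0.8, 0.2],
    "blown down": [0.7, 0.3],
    "billboard": [0.6, 0.4],
    "happy": [0.0, 1.0],
}
POS_TAGS = {
    "tree": "noun", "telegraph pole": "noun", "street lamp": "noun",
    "blown down": "verb", "billboard": "noun", "happy": "adjective",
}
```

With the toy data, `longitudinal_expand("tree", VECTORS, POS_TAGS)` drops "blown down" because it is not a noun, even though it is among the four nearest neighbours.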
The lateral expansion builds on the longitudinal expansion. For the existing feature words, it uses the synonymy relations between words and takes the set of words that satisfy the synonymy condition as a supplement to each feature word in the knowledge base. For example, the synonymy computation uses the "Synonym Forest" thesaurus (Tongyici Cilin) as the corpus, and the words that satisfy the synonymy condition are taken as the synonymous expansion words of the feature word.
The synonymy condition is defined as follows: the position of the existing feature word in the Synonym Forest is taken as the reference, and all words in the atomic word set that contains the feature word are taken as its synonymous expansion words.
Lateral expansion thus adds synonyms of a feature word; the expansion of "railing", for example, is shown in Table 4.
TABLE 4
[Table 4 is reproduced as images in the original: Figures BDA0001691808370000112 and BDA0001691808370000121]
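The synonymy condition amounts to a lookup of the atomic word set that contains each feature word. The sketch below assumes the thesaurus has already been loaded as a list of atomic word sets; the English words are invented stand-ins and do not reproduce the contents of Table 4.

```python
def lateral_expand(feature_words, atomic_sets):
    """For each feature word, collect every other word that shares an
    atomic word set with it in the thesaurus."""
    expanded = {}
    for word in feature_words:
        synonyms = set()
        for group in atomic_sets:
            if word in group:
                synonyms.update(group)
        synonyms.discard(word)  # the word is not its own expansion
        expanded[word] = sorted(synonyms)
    return expanded


# Hypothetical atomic word sets standing in for thesaurus entries.
ATOMIC_SETS = [
    {"railing", "handrail", "balustrade"},
    {"tree", "timber"},
    {"street lamp"},
]
```

A feature word that has no companions in its atomic word set, such as "street lamp" here, simply gets an empty expansion.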
The emotion feature words are expanded multidimensionally in the same way so as to enrich them.
Through this feature word expansion, both the disaster damage feature words and the emotional feedback feature words of disaster audience groups in the wind disaster information extraction knowledge base are enriched.
In S2, extracting disaster damage information by means of the wind disaster information extraction knowledge base includes the following steps:
(1) The text to be processed is split at punctuation marks into a set of short sentences {S_1, S_2, ..., S_i}, i ≥ 1; each short sentence is segmented into words, stop words are removed, and the index position of each remaining word within its short sentence is recorded.
(2) The words in each short sentence from step (1) are matched against the feature word pairs in the leaf nodes under each disaster damage category node of the disaster damage knowledge base. When both words of a feature word pair of some disaster damage category are matched, their index positions in the sentence, Index(w_1) and Index(w_2), are recorded, and the sentence is marked as a candidate sentence of that disaster damage category.
(3) It is then judged whether any other word in the candidate sentence matches a negative word in the disaster damage knowledge base; if so, the index position Index(N) of the negative word is recorded.
(4) If and only if Index(N) < Index(w_1) or Index(N) < Index(w_2), the candidate sentence belongs to the corresponding disaster damage category.
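Steps (1) to (4) can be sketched as follows. Several things here are assumptions: whitespace tokenization stands in for a real Chinese word segmenter, the stop word list and the feature word pairs are invented, and only the first negative word in a short sentence is recorded. Because the text does not spell out how a candidate without any negative word is handled, the sketch does not make the final category decision; it only reports the comparison of step (4) as the flag `neg_precedes`.

```python
import re

STOP_WORDS = {"the", "a", "was", "is", "by"}  # hypothetical stop word list


def split_clauses(text):
    """Step (1): split the text into short sentences at punctuation."""
    return [c.strip() for c in re.split(r"[,.;!?，。；！？]", text) if c.strip()]


def tokenize(clause):
    """Step (1): segment words and drop stop words (whitespace split is
    a stand-in for a proper word segmenter)."""
    return [w for w in clause.lower().split() if w not in STOP_WORDS]


def extract_damage(text, pair_kb, negative_words):
    """Steps (2)-(4): find candidate sentences and record the indices."""
    results = []
    for clause in split_clauses(text):
        tokens = tokenize(clause)
        position = {}
        for i, word in enumerate(tokens):
            position.setdefault(word, i)  # index of first occurrence
        index_n = next((i for i, w in enumerate(tokens)
                        if w in negative_words), None)
        for category, pairs in pair_kb.items():
            for w1, w2 in pairs:
                if w1 in position and w2 in position:
                    i1, i2 = position[w1], position[w2]
                    # Comparison exactly as worded in step (4).
                    neg_precedes = (None if index_n is None
                                    else index_n < i1 or index_n < i2)
                    results.append({
                        "clause": clause, "category": category,
                        "index_w1": i1, "index_w2": i2,
                        "index_n": index_n, "neg_precedes": neg_precedes,
                    })
    return results
```

A caller can then apply whatever decision rule is intended to the recorded indices, for example treating candidates with `index_n is None` separately from those where a negative word was found.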
In S2, extracting the emotional feedback information of disaster audience groups by means of the wind disaster information extraction knowledge base includes the following steps:
(1) The text to be processed is split at punctuation marks into a set of short sentences {S_1, S_2, ..., S_i}, i ≥ 1; each short sentence is segmented into words, stop words are removed, and the index position of each remaining word within its short sentence is recorded.
(2) The words in each short sentence from step (1) are matched against the feature words in the leaf nodes under the Positive emotion, Negative emotion, negative word and Degree word nodes of the public emotion feedback knowledge base, and the index positions Index(Positive), Index(Negative), Index(Negate), Index(Degree) and the scores Score(Positive), Score(Negative), Score(Negate), Score(Degree) of the matched feature words are recorded.
(3) When the index of a negative word or a Degree word is less than the index of an emotion word, Score(Negate) and Score(Degree) are each multiplied by the corresponding emotion score; the resulting positive and negative scores are then summed to form the score of the short sentence, Score(S_i).
(4) The final score of the whole sentence, Score(S), is the sum of the scores of its short sentences:
Score(S) = Σ_{i≥1} Score(S_i).
In this embodiment a single emotion threshold is used, with a value of 0:
when Score(S) < 0, the emotion of the sentence is negative; when Score(S) > 0, the emotion of the sentence is positive; when Score(S) = 0, the emotion of the sentence is neutral.
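The scoring rule can be sketched as follows, using the scores given in this embodiment (1, -1, -1 and 1.5) and the threshold 0. The word lists are invented, stop word removal is omitted, whitespace tokenization again stands in for word segmentation, and the rule that a negative or degree word modifies the next emotion word that follows it is an assumption, since the text only requires the modifier's index to be smaller than that of the emotion word.

```python
import re

# Hypothetical feature words with the scores from this embodiment.
EMOTION = {"grateful": 1.0, "safe": 1.0, "afraid": -1.0, "angry": -1.0}
NEGATE = {"not": -1.0}
DEGREE = {"very": 1.5}


def clause_score(tokens):
    """Score(S_i): modifiers seen before an emotion word multiply its score."""
    total, pending = 0.0, 1.0
    for word in tokens:
        if word in NEGATE:
            pending *= NEGATE[word]
        elif word in DEGREE:
            pending *= DEGREE[word]
        elif word in EMOTION:
            total += pending * EMOTION[word]
            pending = 1.0  # modifiers are consumed by this emotion word
    return total


def sentence_score(text):
    """Score(S): sum of the scores of the short sentences."""
    clauses = [c for c in re.split(r"[,.;!?，。；！？]", text) if c.strip()]
    return sum(clause_score(c.lower().split()) for c in clauses)


def classify(score, threshold=0.0):
    """Compare Score(S) with the single emotion threshold."""
    if score > threshold:
        return "positive"
    if score < threshold:
        return "negative"
    return "neutral"
```

For "I am very afraid, but not angry" the first short sentence scores 1.5 × (-1) = -1.5 and the second (-1) × (-1) = 1, so Score(S) = -0.5 and the sentence is classified as negative.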
In S3, the disaster area is monitored and analyzed according to the multidimensional disaster information: a four-dimensional information array [loss, observation, time, location] is constructed from the disaster damage information, the public emotional feedback information, the time information and the spatial location information. The array is plotted on a map and compared as a time sequence following the progress of the disaster, so that the disaster can be understood in detail.
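The four-dimensional array and its time-sequence comparison can be sketched as below. The dictionary keys, the use of a district name as the location and the 6-hour bin width are assumptions for illustration; the map display itself is outside the scope of the sketch.

```python
from collections import defaultdict
from datetime import datetime


def build_records(items):
    """Assemble [loss, observation, time, location] tuples, where
    `observation` holds the public emotional feedback category."""
    return [(it["loss"], it["observation"], it["time"], it["location"])
            for it in items]


def time_sequence(records, hours_per_bin=6):
    """Count damage categories per (location, start hour of time bin)."""
    bins = defaultdict(lambda: defaultdict(int))
    for loss, _observation, time, location in records:
        start = (time.hour // hours_per_bin) * hours_per_bin
        bins[(location, start)][loss] += 1
    return {key: dict(counts) for key, counts in sorted(bins.items())}


# Invented example data.
ITEMS = [
    {"loss": "traffic jam", "observation": "negative",
     "time": datetime(2018, 1, 1, 9, 30), "location": "district A"},
    {"loss": "traffic jam", "observation": "negative",
     "time": datetime(2018, 1, 1, 10, 0), "location": "district A"},
    {"loss": "influence of power supply", "observation": "positive",
     "time": datetime(2018, 1, 1, 19, 0), "location": "district A"},
]
```

Grouping by location and time bin gives one snapshot per area and period, which is the form needed to compare how damage categories change as the disaster progresses.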
Fig. 4 is a schematic diagram of the spatial distribution of the various types of disaster damage during the passage of the typhoon. As can be seen from Fig. 4, the winter area suffered heavy damage, mainly affecting water supply, power supply and traffic. Combined with the specific location information, rescue can be organized for each disaster event.
Fig. 5 is a schematic diagram of the spatio-temporal distribution sequence of each type of disaster damage and the sequence of emotional feedback changes during the passage of the typhoon. As can be seen from Fig. 5, as the typhoon moves (in a northwesterly direction), the disaster damage information of the areas it passes through increases in succession, and different areas show different disaster damage distributions and emotional feedback before and after the typhoon passes. For example, before the typhoon makes landfall, from 9:00 to 12:00, the main disaster damage category is traffic impact: in this period many people are travelling home, traffic is congested, and the emotional attitude is mostly negative. After the typhoon has passed, for example from 18:00 to 24:00, the related disaster damage information gradually decreases and the positive emotion of the public increases, particularly in the winter area (where the influence of the wind disaster is greatest). According to the text content, government rescue efforts are more intensive in this period after the typhoon has passed, and public emotion tends to be positive.
The method first uses Internet multi-source data as the corpus for disaster monitoring and analysis. Second, it provides the feature word multidimensional extension algorithm FWME to assist in constructing a typhoon disaster information extraction knowledge base, which is used to extract wind disaster damage information and the emotional feedback information of wind disaster audience groups. Finally, it performs joint multi-time-sequence analysis of the disaster by combining time and spatial location information. Real-time, comprehensive monitoring of the disaster situation in the disaster area is thereby achieved, which assists the implementation of disaster reduction and relief activities.
So far, the embodiments of the present disclosure have been described in detail with reference to the accompanying drawings. It is to be noted that, in the attached drawings or in the description, the implementation modes not shown or described are all the modes known by the ordinary skilled person in the field of technology, and are not described in detail. Further, the above definitions of the various elements and methods are not limited to the various specific structures, shapes or arrangements of parts mentioned in the examples, which may be easily modified or substituted by those of ordinary skill in the art.
It should also be noted that directional terms, such as "upper", "lower", "front", "rear", "left", "right", and the like, used in the embodiments are only directions referring to the drawings, and are not intended to limit the scope of the present disclosure. Throughout the drawings, like elements are represented by like or similar reference numerals. Conventional structures or constructions will be omitted when they may obscure the understanding of the present disclosure.
And the shapes and sizes of the respective components in the drawings do not reflect actual sizes and proportions, but merely illustrate the contents of the embodiments of the present disclosure. Furthermore, in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim.
Furthermore, the word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements.
In addition, unless steps are specifically described or must occur in sequence, the order of the steps is not limited to that listed above and may be changed or rearranged as desired by the desired design. The embodiments described above may be mixed and matched with each other or with other embodiments based on design and reliability considerations, i.e., technical features in different embodiments may be freely combined to form further embodiments.
The algorithms and displays presented herein are not inherently related to any particular computer, virtual machine, or other apparatus. Various general purpose systems may also be used with the teachings herein. The required structure for constructing such a system will be apparent from the description above. Moreover, this disclosure is not directed to any particular programming language. It is appreciated that a variety of programming languages may be used to implement the present disclosure as described herein, and any descriptions above of specific languages are provided for disclosure of enablement and best mode of the present disclosure.
The disclosure may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. Various component embodiments of the disclosure may be implemented in hardware, or in software modules running on one or more processors, or in a combination thereof. Those skilled in the art will appreciate that a microprocessor or Digital Signal Processor (DSP) may be used in practice to implement some or all of the functionality of some or all of the components in the relevant apparatus according to embodiments of the present disclosure. The present disclosure may also be embodied as apparatus or device programs (e.g., computer programs and computer program products) for performing a portion or all of the methods described herein. Such programs implementing the present disclosure may be stored on a computer-readable medium or may be in the form of one or more signals. Such a signal may be downloaded from an internet website or provided on a carrier signal or in any other form.
Those skilled in the art will appreciate that the modules in the device in an embodiment may be adaptively changed and disposed in one or more devices different from the embodiment. The modules or units or components of the embodiments may be combined into one module or unit or component, and furthermore they may be divided into a plurality of sub-modules or sub-units or sub-components. All of the features disclosed in this specification (including any accompanying claims, abstract and drawings), and all of the processes or elements of any method or apparatus so disclosed, may be combined in any combination, except combinations where at least some of such features and/or processes or elements are mutually exclusive. Each feature disclosed in this specification (including any accompanying claims, abstract and drawings) may be replaced by alternative features serving the same, equivalent or similar purpose, unless expressly stated otherwise. Also in the unit claims enumerating several means, several of these means may be embodied by one and the same item of hardware.
Similarly, it should be appreciated that in the foregoing description of exemplary embodiments of the disclosure, various features of the disclosure are sometimes grouped together in a single embodiment, figure, or description thereof for the purpose of streamlining the disclosure and aiding in the understanding of one or more of the various disclosed aspects. However, the disclosed method should not be interpreted as reflecting an intention that: that is, the claimed disclosure requires more features than are expressly recited in each claim. Rather, as the following claims reflect, disclosed aspects lie in less than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into this detailed description, with each claim standing on its own as a separate embodiment of this disclosure.
The above-mentioned embodiments are intended to illustrate the objects, aspects and advantages of the present disclosure in further detail, and it should be understood that the above-mentioned embodiments are only illustrative of the present disclosure and are not intended to limit the present disclosure, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.

Claims (9)

1. A disaster monitoring and analyzing method for extracting multi-dimensional disaster-related information from the Internet, comprising the following steps:
S1: acquiring and preprocessing multi-source data in real time;
S2: constructing a disaster information extraction knowledge base with the assistance of a feature word multidimensional extension algorithm FWME, so as to extract disaster damage information and emotional feedback information of disaster audience groups, the disaster damage information and the emotional feedback information being used to obtain multidimensional disaster-related information; this step includes longitudinally and laterally expanding the feature words of each category in the disaster information extraction knowledge base based on the FWME algorithm;
S3: performing joint monitoring and analysis of the disaster area according to the multidimensional disaster-related information;
wherein the FWME algorithm performs feature expansion on the feature words of each category in the disaster information extraction knowledge base in a longitudinal dimension and a lateral dimension respectively, wherein
the longitudinal expansion comprises using a word vector model integrated in the FWME algorithm, taking Internet disaster-related texts as the corpus, calculating similarity for each category of feature words, and taking the similar words that satisfy a threshold as longitudinal supplementary words of the feature words to be expanded in the knowledge base; and
the lateral expansion comprises, on the basis of the longitudinal expansion, using the synonymy relations between words for the existing feature words, and taking the words that satisfy the synonymy condition as synonymous supplementary words of each feature word in the knowledge base.
2. The disaster monitoring and analyzing method as claimed in claim 1, wherein the multi-source data in step S1 is obtained from a plurality of Internet platforms through a search engine using keywords related to the designated disaster; the preprocessing comprises text deduplication, traditional-to-simplified Chinese conversion and full-width to half-width character conversion, and the timestamp information and position information of the text data are extracted and stored separately.
3. The disaster monitoring and analyzing method as claimed in claim 1, wherein in step S2, constructing a disaster information extraction knowledge base with the assistance of the feature word multidimensional extension algorithm FWME to extract disaster damage information and emotional feedback information of disaster audience groups includes:
S21: constructing a disaster information extraction knowledge base;
S23: extracting disaster damage information and emotional feedback information of disaster audience groups.
4. The disaster monitoring and analyzing method according to claim 3, wherein,
in step S21, the disaster information extraction knowledge base includes a disaster damage information identification knowledge base and a public emotion feedback knowledge base, which are used to identify and classify the disaster damage information and public emotion feature words contained in the text, while taking into account negative words that counteract the text semantics and degree words that strengthen the text semantics.
5. The disaster monitoring and analyzing method according to claim 3, wherein,
the disaster information extraction knowledge base is structurally represented as a tree with a height of 4 layers, wherein the top layer of the tree is the disaster information extraction knowledge base and the 2nd layer comprises two subtrees, wherein:
the left subtree of the 2nd layer represents a disaster damage information identification knowledge base; the corresponding 3rd-layer nodes of the left subtree represent each disaster damage category and the negative words; each disaster damage category node comprises a variable number of leaf nodes, and the entity information represented by each leaf node is mutually independent; each leaf node of the 4th layer stores a plurality of feature word pairs which represent the disaster damage events and correspond to the respective node of the 3rd layer, wherein the feature word pairs comprise feature words representing disaster damage objects and feature words representing disaster damage actions and are used for identifying the disaster damage events; the negative word node comprises a series of negative word information stored in the 4th-layer leaf nodes below it; a storage space is dynamically reserved for the feature words and the negative words in each leaf node to store the index position information of the feature words or the negative words in the text to be extracted, namely the disaster damage feature word Index(W_i) and the negative feature word Index(N);
the right subtree of the 2nd layer represents a public emotion feedback knowledge base, whose nodes represent emotion feature words, negative feature words and degree feature words; each of these nodes is provided with a leaf node that stores the corresponding feature word base, and two storage spaces are dynamically reserved for each feature word to store the score information and index information of the corresponding feature word in the text to be extracted, comprising the emotion feature word Score(E) and its Index(E), the negative feature word Score(N) and its Index(N), and the degree feature word Score(D) and its Index(D).
6. The disaster monitoring and analyzing method according to claim 3, wherein,
in step S23, extracting the disaster damage information based on the knowledge base includes the following steps:
S231: the text to be processed is split at punctuation marks into a set of short sentences {S_1, S_2, ..., S_i}, i ≥ 1, where S_i denotes a short sentence formed after the text to be processed is split; each short sentence is segmented into words, stop words are removed, and the index position information of the remaining words in each short sentence is recorded;
S232: the words in each short sentence from S231 are matched against the feature word pairs in the leaf nodes under each disaster damage category node of the disaster damage knowledge base; when both words of a feature word pair of a certain disaster damage category are matched, their index positions in the sentence, Index(w_1) and Index(w_2), are recorded, and the sentence is marked as a candidate sentence of that disaster damage category;
S233: it is judged whether other words in the candidate sentence can be matched with negative words in the disaster damage information identification knowledge base, and if so, the index position Index(N) of the negative word is recorded;
S234: if and only if Index(N) < Index(w_1) or Index(N) < Index(w_2), the candidate sentence belongs to the corresponding disaster damage category.
7. The disaster monitoring and analyzing method according to claim 3, wherein,
in step S23, extracting the emotional feedback information of the disaster audience groups based on the knowledge base includes the following steps:
S231': the text to be processed is split at punctuation marks into a set of short sentences {S_1, S_2, ..., S_i}, i ≥ 1; each short sentence is segmented into words, stop words are removed, and the index position information of the remaining words in each short sentence is recorded;
S232': the words in the short sentences from S231' are matched against the feature words in the leaf nodes under the emotion feature word, negative word and degree word nodes of the public emotion feedback knowledge base, and the index position information Index(E), Index(N), Index(D) and the scores Score(E), Score(N), Score(D) of the matched feature words are recorded;
S233': when the index values of the negative words and the degree words are less than the index values of the emotion words, Score(N) and Score(D) are each multiplied by the corresponding emotion score Score(E), and the final scores are then summed to form the score of the short sentence, Score(S_i);
S234': a threshold sequence is set, and the Score(S) of each text is placed in the threshold sequence, thereby obtaining the emotion category of the text.
8. The disaster monitoring and analyzing method of claim 1, wherein the multidimensional disaster-related information in step S3 is a four-dimensional disaster array [loss, observation, time, location], which is constructed by using the disaster information extraction knowledge base built in step S2 to extract the disaster damage information and public emotional feedback information contained in Internet text data related to a specified disaster event, combined with the timestamp information of the Internet text and the location information it contains.
9. The disaster monitoring and analyzing method of claim 8, wherein monitoring and analyzing the disaster-affected area according to the multidimensional disaster-related information in step S3 includes plotting the constructed four-dimensional disaster array [loss, observation, time, location] on a vector map and performing spatio-temporal analysis, so as to monitor in detail and analyze in real time the entire course of the disaster, the final result being used for real-time disaster monitoring, disaster damage assessment and analysis, disaster rescue feedback and subsequent impact assessment.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810597449.9A CN108897792B (en) 2018-06-11 2018-06-11 Disaster monitoring and analyzing method for extracting multi-dimensional disaster-related information of Internet

Publications (2)

Publication Number Publication Date
CN108897792A CN108897792A (en) 2018-11-27
CN108897792B true CN108897792B (en) 2022-05-03

Family

ID=64344484

Country Status (1)

Country Link
CN (1) CN108897792B (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102044022A (en) * 2010-12-24 2011-05-04 中国科学院合肥物质科学研究院 Emergency rescue decision making system aiming at natural disasters and method thereof
CN103390039A (en) * 2013-07-17 2013-11-13 北京建筑工程学院 Urban disaster thematic map real-time generating method based on network information
CN104809108A (en) * 2015-05-20 2015-07-29 成都布林特信息技术有限公司 Information monitoring and analyzing system
CN107562814A (en) * 2017-08-14 2018-01-09 中国农业大学 A kind of earthquake emergency and the condition of a disaster acquisition of information sorting technique and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2017078986A1 (en) * 2014-12-29 2017-05-11 Cyence Inc. Diversity analysis with actionable feedback methodologies

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant