CN108897792B - Disaster monitoring and analysis method for extracting multi-dimensional disaster-related information from the Internet - Google Patents
- Publication number
- CN108897792B (CN201810597449.9A)
- Authority
- CN
- China
- Prior art keywords
- disaster
- words
- information
- index
- knowledge base
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present disclosure provides a disaster monitoring and analysis method that extracts multi-dimensional disaster-related information from the Internet, comprising: S1: acquiring multi-source data in real time and preprocessing it; S2: using the feature word multi-dimensional expansion algorithm FWME to assist in building a disaster information extraction knowledge base, and using that knowledge base to extract disaster loss information and the emotional feedback of the affected public, thereby obtaining multi-dimensional disaster-related information; S3: jointly monitoring and analyzing the disaster area according to the multi-dimensional disaster-related information. With the constructed knowledge base, the disaster loss information contained in Internet platforms and the emotional feedback of the affected public during the disaster are extracted accurately and analyzed jointly with the time and space dimensions. This describes the progress of the disaster in detail and supports real-time disaster monitoring, disaster damage assessment and analysis, disaster rescue feedback, and follow-up impact assessment.
Description
Technical Field
The invention relates to the technical field of disaster monitoring and analysis, and in particular to extracting multi-dimensional disaster-related information from multi-source Internet text data for disaster monitoring and analysis.
Background
China is a disaster-prone country, and every year disasters of various kinds cause heavy casualties and property losses. Traditional disaster monitoring methods, such as satellite remote sensing and manual surveys, often cannot take effect in time because of demanding implementation conditions and high cost. Moreover, some disaster information, such as the emotional feedback of the affected public, is difficult for traditional monitoring methods to capture, even though it is equally important for disaster monitoring and analysis.
Many researchers currently monitor and analyze disasters using social media, for example by exploring typhoon tracks and patterns of human behavior along the temporal and spatial dimensions, or by mining text content for the topics of public concern and the emotional attitudes expressed when typhoons, earthquakes, and other disasters occur. However, these methods monitor at a coarse granularity and are weakly targeted: they cannot describe the details of the losses caused by a disaster, nor can they reflect the public's emotional feedback on government rescue activities.
When fine-grained disaster-related information is extracted from Internet text, the context features surrounding that information are usually very sparse, and a single text often covers several fine-grained disaster topics. Traditional supervised learning methods require large-scale manually annotated training corpora and mostly target single-topic text classification. Traditional rule-based methods require a great deal of expert knowledge to derive extraction rules and port poorly to new settings.
Summary of the Invention
(1) Technical problem to be solved
The present disclosure provides a disaster monitoring and analysis method for extracting multi-dimensional disaster-related information from the Internet, so as to at least partially solve the technical problems set out above.
(2) Technical solution
According to one aspect of the present disclosure, a disaster monitoring and analysis method for extracting multi-dimensional disaster-related information from the Internet is provided, comprising:
S1: real-time acquisition and preprocessing of multi-source data;
S2: using the feature word multi-dimensional expansion algorithm FWME to assist in building a disaster information extraction knowledge base, and using that knowledge base to extract disaster loss information and the emotional feedback of the affected public, thereby obtaining multi-dimensional disaster-related information;
S3: joint monitoring and analysis of the disaster area according to the multi-dimensional disaster-related information.
In some embodiments of the present disclosure, the multi-source data in step S1 comes from multiple Internet platforms and is obtained through a search engine using keywords related to the specified disaster. The preprocessing includes text deduplication, conversion between traditional and simplified Chinese, and full-width/half-width conversion, as well as extracting the timestamp and location information of the text data and storing it separately.
In some embodiments of the present disclosure, using the feature word multi-dimensional expansion algorithm FWME in step S2 to assist in building the disaster information extraction knowledge base and extracting disaster loss information and the emotional feedback of the affected public comprises:
S21: building the disaster information extraction knowledge base;
S22: expanding the feature words of each type in the disaster information extraction knowledge base vertically and horizontally with the FWME algorithm;
S23: extracting the disaster loss information and the emotional feedback of the affected public.
In some embodiments of the present disclosure, in step S21 the disaster information extraction knowledge base comprises a disaster damage information identification knowledge base and a public sentiment feedback knowledge base. Identification and classification rely on the disaster damage and public sentiment feature words contained in the text, while also taking into account negation words, which reverse the meaning of the text, and degree words, which intensify it.
In some embodiments of the present disclosure, the structure of the disaster information extraction knowledge base is represented as a tree of height 4. The top layer of the tree is the disaster information extraction knowledge base itself, and layer 2 contains two subtrees, where:
The left subtree at layer 2 represents the disaster damage information identification knowledge base. Its layer-3 nodes represent the disaster damage categories and the negation words. Each damage category node has a variable number of leaf nodes, and the entities represented by the leaf nodes are independent of one another. Each layer-4 leaf node stores multiple feature word pairs that represent damage events for its layer-3 node; each pair consists of a feature word for the damaged object and a feature word for the damaging action, and is used to identify damage events. The negation word node holds a set of negation words, stored in the layer-4 leaf node beneath it. Every feature word and negation word in a leaf node dynamically reserves one storage slot for the index position of that word in the text being processed, namely the damage feature word index Index(Wi) and the negation word index Index(N);
The right subtree at layer 2 represents the public sentiment feedback knowledge base. The nodes under it represent emotion feature words, negation feature words, and degree feature words; each of these nodes has one leaf node that stores the corresponding feature lexicon. Each feature word in a lexicon dynamically reserves two storage slots for the score and the index of that word in the text being processed: the emotion feature word score Score(E) and its index Index(E); the negation feature word score Score(N) and its index Index(N); and the degree feature word score Score(D) and its index Index(D).
In some embodiments of the present disclosure, in step S22 the FWME algorithm expands the feature words of each type in the disaster information extraction knowledge base along two dimensions, vertical and horizontal, wherein:
Vertical expansion uses the word vector model integrated in the FWME algorithm, with Internet disaster-related text as the corpus, to compute similarities for each type of feature word. Similar words that meet the threshold are taken as vertical supplementary words for the feature word being expanded;
Horizontal expansion builds on the vertical expansion: for the feature words already present, it uses synonym relations between words and adds the words that satisfy the synonymy condition as synonym supplementary words for each feature word in the knowledge base.
In some embodiments of the present disclosure, in step S23 the extraction of disaster loss information based on the wind disaster knowledge base comprises the following steps:
S231: Split the text to be processed at punctuation marks into a set of short sentences {S1, S2, ..., Si}, i >= 1, where Si denotes a short sentence obtained from the split. Segment each short sentence into words, remove stop words, and record the index position of each remaining word within its short sentence;
S232: Match the words of each short sentence from S231 against the feature word pairs stored in the leaf nodes under each damage category node of the disaster damage knowledge base. When a short sentence contains both words of a feature word pair of some damage category, record the index positions of the two words in the sentence, Index(w1) and Index(w2), and mark the sentence as a candidate sentence for that damage category.
S233: Check whether any other word in the candidate sentence matches a negation word in the disaster damage information identification knowledge base. If so, record the index position of that negation word, Index(N).
S234: The candidate sentence belongs to the corresponding damage category if and only if Index(N) < Index(w1) or Index(N) < Index(w2).
In some embodiments of the present disclosure, in step S23 the extraction of the emotional feedback of the affected public with the wind disaster knowledge base comprises the following steps:
S231': Split the text to be processed at punctuation marks into a set of short sentences {S1, S2, ..., Si}, i >= 1. Segment each short sentence into words, remove stop words, and record the index position of each remaining word within its short sentence.
S232': Match the words of each short sentence from S231 against the feature words stored in the leaf nodes under the emotion feature word, negation word, and degree word nodes of the public sentiment feedback knowledge base. For each matched feature word, record its index position Index(E), Index(N), or Index(D) and its score Score(E), Score(N), or Score(D).
S233': When the index of a negation word or degree word is smaller than the index of an emotion word, multiply Score(N) and Score(D) by the corresponding emotion score Score(E). Sum the resulting scores to obtain the score of the short sentence, Score(Si).
S234': Define a sequence of thresholds and place each text score Score(S) within it to obtain the sentiment category of the text.
In some embodiments of the present disclosure, the multi-dimensional disaster-related information in step S3 is obtained by using the disaster information extraction knowledge base built in step S2 to extract the disaster damage information and public sentiment feedback contained in Internet text data related to the specified disaster event. This is combined with the timestamp of the Internet text and the location information it contains to build a four-dimensional disaster array [loss, emotion, time, location].
In some embodiments of the present disclosure, monitoring and analyzing the disaster area according to the multi-dimensional disaster-related information in step S3 includes plotting the four-dimensional disaster array [loss, emotion, time, location] on a vector map and performing spatiotemporal analysis, so that the whole course of the disaster is monitored in detail and analyzed in real time. The final results are used for real-time disaster monitoring, disaster damage assessment and analysis, disaster rescue feedback, and follow-up impact assessment.
(3) Beneficial effects
As the above technical solution shows, the disclosed disaster monitoring and analysis method for extracting multi-dimensional disaster-related information from the Internet has at least one of the following beneficial effects:
(1) With the constructed disaster information extraction knowledge base, the disaster loss information contained in Internet platforms such as Weibo and the emotional feedback of the affected public during the disaster are extracted accurately and analyzed jointly with the time and space dimensions. This describes the progress of the disaster in detail and supports real-time disaster monitoring, disaster damage assessment and analysis, disaster rescue feedback, and follow-up impact assessment;
(2) The feature word multi-dimensional expansion algorithm overcomes the sparse context features of short texts and the resulting difficulty of extracting and classifying fine-grained disaster loss information. It thereby automates the mining of disaster damage information contained in Internet text and efficiently supports disaster reduction work.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of the disaster monitoring and analysis method for extracting multi-dimensional disaster-related information from the Internet according to an embodiment of the present disclosure.
FIG. 2 is a schematic diagram of the structure of the disaster information extraction knowledge base according to an embodiment of the present disclosure.
FIG. 3 is a schematic flowchart of how the feature word multi-dimensional expansion algorithm FWME expands the feature words of the knowledge base vertically and horizontally according to an embodiment of the present disclosure.
FIG. 4 is a schematic diagram of the spatial distribution of each category of disaster damage during a typhoon's passage according to an embodiment of the present disclosure.
FIG. 5 is a schematic diagram of the spatiotemporal distribution sequence of each category of disaster damage and the sequence of emotional feedback changes during a typhoon's passage according to an embodiment of the present disclosure.
Detailed Description
The present disclosure provides a disaster monitoring and analysis method based on the joint analysis of multi-dimensional disaster-related information. It uses a feature word multi-dimensional expansion algorithm, FWME (Feature Words Multidimensional Extension), to assist in building a disaster information extraction knowledge base, in order to address the low precision of fine-grained multi-dimensional disaster information extraction and the shortcomings of existing disaster monitoring and analysis methods. The knowledge base is used to extract the multi-dimensional disaster-related information contained in Internet text, and disaster losses and the government's post-disaster rescue activities are evaluated by analyzing how the damage and the emotions of the affected public change over time and space.
The overall technical solution adopted by the invention comprises:
S1. Real-time acquisition and preprocessing of multi-source data.
S2. Using the proposed feature word multi-dimensional expansion algorithm FWME to assist in building a disaster information extraction knowledge base, which is used to extract disaster loss information and the emotional feedback of the affected public;
S3. Monitoring and analyzing the disaster area according to the multi-dimensional disaster-related information.
The real-time acquisition of multi-source data in S1 collects data from Internet platforms such as forums and Tieba, WeChat official accounts, news sites, and Sina Weibo, using keywords related to the specified disaster as search conditions. Preprocessing includes text deduplication, conversion between traditional and simplified Chinese, and full-width/half-width conversion. In addition, the temporal and spatial information of each text, including its timestamp and location information, is extracted separately and saved to a database.
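The preprocessing in S1 could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the record keys `text`, `timestamp`, and `location` and the function names are hypothetical, deduplication is done by exact match on the normalized text, and the conversion between traditional and simplified Chinese is omitted because it needs a character mapping table that the text does not specify.

```python
import hashlib


def to_half_width(text: str) -> str:
    """Map full-width ASCII variants (U+FF01..U+FF5E) and the
    ideographic space (U+3000) to their half-width counterparts."""
    out = []
    for ch in text:
        code = ord(ch)
        if code == 0x3000:
            out.append(" ")
        elif 0xFF01 <= code <= 0xFF5E:
            out.append(chr(code - 0xFEE0))
        else:
            out.append(ch)
    return "".join(out)


def preprocess(records):
    """Normalize and deduplicate texts; keep time and location separately.

    `records` is an iterable of dicts with the hypothetical keys
    'text', 'timestamp', and 'location'.
    """
    seen = set()
    texts, metadata = [], []
    for rec in records:
        text = to_half_width(rec["text"]).strip()
        digest = hashlib.sha1(text.encode("utf-8")).hexdigest()
        if digest in seen:  # exact duplicate after normalization
            continue
        seen.add(digest)
        texts.append(text)
        metadata.append(
            {"timestamp": rec.get("timestamp"), "location": rec.get("location")}
        )
    return texts, metadata
```

Storing the metadata in a list parallel to the texts stands in for the separate database table mentioned above; any persistent store keyed by a text identifier would serve the same purpose.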
In S2, the multi-dimensional disaster information in the text is obtained by building a disaster information extraction knowledge base. The knowledge base stores a rich set of damage feature word pairs that represent the various kinds of disaster damage, together with feature words for the emotional feedback of the affected public. It also stores negation words and degree words, which are needed to capture the semantics precisely. The feature words in the knowledge base are enriched and supplemented by the feature word multi-dimensional expansion algorithm FWME proposed by the invention.
Step S2, which uses the FWME algorithm to assist in building the disaster information extraction knowledge base and extracts disaster loss information and the emotional feedback of the affected public, comprises:
S21. Building the disaster information extraction knowledge base
The disaster information extraction knowledge base is used to extract the disaster loss information and the emotional feedback of the affected public from Internet text, using the disaster damage and public sentiment feature words contained in the text for identification and classification. Examples are the damage feature word pair "断-电" (cut, power) and public sentiment words such as "高兴" (happy) and "难过" (sad). Negation words, which reverse the meaning of the text, and degree words, which intensify it, are also taken into account.
The structure of the disaster information extraction knowledge base can be represented as a tree of height 4, whose layer 2 contains two subtrees. The left subtree represents the disaster damage information identification knowledge base. The nodes beneath it represent the damage categories and the negation words. Each damage category node has a variable number of leaf nodes, and the entities represented by the leaf nodes are independent of one another. Each leaf node stores multiple feature word pairs that represent damage events. The negation word node holds a set of negation words, stored in its leaf node. In addition, every feature word and negation word in a leaf node dynamically reserves one storage slot for the index position of that word in the text being processed, namely the damage feature word index Index(Wi) and the negation word index Index(N). The right subtree at layer 2 represents the public sentiment feedback knowledge base, whose structure is similar to that of the damage information identification knowledge base. The nodes under it represent sentiment categories, negation words, and degree words; each of these nodes has one leaf node that stores the corresponding feature lexicon. Each feature word in a lexicon dynamically reserves two storage slots for the score and the index of that word in the text being processed: the emotion feature word score Score(E) and index Index(E); the negation feature word score Score(N) and index Index(N); and the degree feature word score Score(D) and index Index(D).
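One way to picture this four-layer tree is the following data structure sketch. It is only an illustration: the class and field names are hypothetical, the pair "断-电" and the words "高兴" and "难过" come from the example above, and the category name, the entity name, the negation word "没有", the degree word "非常", and all scores are made-up placeholders.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class FeatureWord:
    word: str
    score: Optional[float] = None  # reserved slot, emotion subtree only
    index: Optional[int] = None    # reserved slot: position in the current sentence


@dataclass
class LossLeaf:
    """Layer-4 leaf: one entity with its (object word, action word) pairs."""
    entity: str
    pairs: List[Tuple[FeatureWord, FeatureWord]] = field(default_factory=list)


@dataclass
class LossKB:
    """Layer-2 left subtree; its layer-3 nodes are categories and negations."""
    categories: Dict[str, List[LossLeaf]] = field(default_factory=dict)
    negations: List[FeatureWord] = field(default_factory=list)


@dataclass
class EmotionKB:
    """Layer-2 right subtree; each layer-3 node owns one lexicon leaf."""
    emotions: List[FeatureWord] = field(default_factory=list)
    negations: List[FeatureWord] = field(default_factory=list)
    degrees: List[FeatureWord] = field(default_factory=list)


@dataclass
class DisasterKB:
    """Layer-1 root of the knowledge base tree."""
    loss: LossKB
    emotion: EmotionKB


def build_toy_kb() -> DisasterKB:
    # All entries below are illustrative placeholders.
    power = LossLeaf("electricity", [(FeatureWord("断"), FeatureWord("电"))])
    loss = LossKB(categories={"power": [power]}, negations=[FeatureWord("没有")])
    emotion = EmotionKB(
        emotions=[FeatureWord("高兴", score=1.0), FeatureWord("难过", score=-1.0)],
        negations=[FeatureWord("没有", score=-1.0)],
        degrees=[FeatureWord("非常", score=2.0)],
    )
    return DisasterKB(loss=loss, emotion=emotion)
```

The `index` and `score` fields play the role of the dynamically reserved storage slots: they stay empty in the stored lexicon and are filled per sentence during extraction.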
S22. Expanding the feature words of each type in the disaster information extraction knowledge base vertically and horizontally with the FWME algorithm.
The FWME algorithm is used to expand the disaster-related feature words of each type in the disaster information extraction knowledge base along two dimensions, vertical and horizontal. Vertical expansion uses the word vector model integrated in the FWME algorithm, with a large volume of Internet disaster-related text as the corpus (including text from forums and Tieba, WeChat official accounts, news websites, and Sina Weibo), to compute similarities for each type of feature word. Similar words that meet the threshold and have the same part of speech as the feature word being expanded are taken as its vertical supplementary words. Horizontal expansion builds on the vertical expansion: for the feature words already present, it uses synonym relations between words and adds the words that satisfy the synonymy condition as synonym supplementary words for each feature word in the knowledge base, with the synonym computation based on the Tongyici Cilin (《同义词词林》) thesaurus.
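The two expansion directions can be sketched as below. This is a simplified stand-in for FWME, not the algorithm itself: it assumes word vectors, part-of-speech tags, and a synonym table are already available as plain dictionaries, uses cosine similarity as the similarity measure (the text does not name one), and the function names, toy vectors, and threshold are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def expand_vertical(seeds, vectors, pos, threshold):
    """Add words whose similarity to a seed meets the threshold and whose
    part of speech matches that seed."""
    expanded = set(seeds)
    for seed in seeds:
        if seed not in vectors:
            continue
        for cand, vec in vectors.items():
            if cand in expanded or pos.get(cand) != pos.get(seed):
                continue
            if cosine(vectors[seed], vec) >= threshold:
                expanded.add(cand)
    return expanded


def expand_horizontal(words, synonyms):
    """Add the listed synonyms of every word already in the set."""
    expanded = set(words)
    for word in words:
        expanded.update(synonyms.get(word, ()))
    return expanded


# Toy data, purely illustrative.
vectors = {"停电": [1.0, 0.0], "断电": [0.9, 0.1], "高兴": [0.0, 1.0]}
pos = {"停电": "v", "断电": "v", "高兴": "a"}
synonyms = {"断电": ["掉电"]}

vertical = expand_vertical({"停电"}, vectors, pos, threshold=0.8)
horizontal = expand_horizontal(vertical, synonyms)
```

Running the horizontal step on the output of the vertical step mirrors the ordering described above, so that newly found similar words also receive their synonyms.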
S23. Extracting the disaster loss information and the emotional feedback of the affected public.
Extracting the disaster loss information with the disaster information extraction knowledge base in S2 comprises the following steps:
S231: Split the text to be processed at punctuation marks into a set of short sentences {S1, S2, ..., Si}, i >= 1, where Si denotes a short sentence obtained from the split. Segment each short sentence into words, remove stop words, and record the index position of each remaining word within its short sentence.
S232: Match the words of each short sentence from S231 against the feature word pairs stored in the leaf nodes under each damage category node of the disaster damage knowledge base. When a short sentence contains both words of a feature word pair of some damage category, record the index positions of the two words in the sentence, Index(w1) and Index(w2), and mark the sentence as a candidate for that damage category.
S233: Then check whether any other word in the candidate sentence matches a negation word in the disaster damage knowledge base. If so, record the index position of that negation word, Index(N).
S234: The candidate sentence belongs to the corresponding damage category if and only if Index(N) < Index(w1) or Index(N) < Index(w2).
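Steps S231 to S234 might be sketched as follows. The sketch makes several assumptions that go beyond the text: the input is already word-segmented with spaces (Chinese segmentation would normally need a dedicated tokenizer, not shown), the function names, lexicons, and stop-word list are hypothetical, and only the first negation word in a sentence is recorded. The text does not say how a candidate sentence without any negation word is treated, so the sketch records Index(N) as `None` in that case and exposes the S234 inequality as a separate predicate, exactly as worded.

```python
import re

# Split on common Chinese and ASCII sentence punctuation.
CLAUSE_SPLIT = re.compile(r"[,。!?;,.!?;]+")


def split_clauses(text, stop_words=frozenset()):
    """S231: split at punctuation, tokenize on spaces, drop stop words.
    A token's position in the returned list is its index in the clause."""
    clauses = []
    for raw in CLAUSE_SPLIT.split(text):
        tokens = [tok for tok in raw.split() if tok not in stop_words]
        if tokens:
            clauses.append(tokens)
    return clauses


def find_candidates(clauses, pairs_by_category, negations):
    """S232 and S233: mark candidate clauses and record word indices."""
    results = []
    for tokens in clauses:
        for category, pairs in pairs_by_category.items():
            for w1, w2 in pairs:
                if w1 in tokens and w2 in tokens:
                    index_n = next(
                        (i for i, tok in enumerate(tokens) if tok in negations),
                        None,
                    )
                    results.append({
                        "category": category,
                        "tokens": tokens,
                        "index_w1": tokens.index(w1),
                        "index_w2": tokens.index(w2),
                        "index_n": index_n,  # None when no negation word matched
                    })
    return results


def s234_condition(index_n, index_w1, index_w2):
    """The inequality of S234 as literally stated; call it only when a
    negation word was found (index_n is not None)."""
    return index_n < index_w1 or index_n < index_w2


candidates = find_candidates(
    split_clauses("小区 已经 断 电 了,树 没有 倒", stop_words={"已经", "了"}),
    {"power": [("断", "电")], "tree": [("树", "倒")]},
    negations={"没有"},
)
```

Keeping the S234 predicate separate from candidate detection leaves the decision for sentences without a negation word to the caller, since the text gives no rule for that case.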
Extracting, in S2, the emotional feedback information of the disaster audience by means of the disaster information extraction knowledge base includes the following steps:
S231': Split the text to be processed at punctuation marks to form a set of short sentences {S1, S2, ..., Si, i>=1}; segment each short sentence into words, remove stop words, and record the index position of each remaining word within its short sentence.
S232': Match the words of each short sentence from S231' against the feature words stored in the leaf nodes under the sentiment feature word, negation word, and degree word nodes of the public sentiment feedback knowledge base. For each matched feature word, record its index position, Index(E), Index(N), or Index(D), and its score, Score(E), Score(N), or Score(D).
S233': When the index of a negation word or degree word is smaller than the index of a sentiment word, multiply Score(N) and Score(D) by the corresponding sentiment score Score(E); then sum the resulting scores to obtain the score of the short sentence, Score(Si).
S234': The score of the whole text, Score(S), is the sum of the scores of its short sentences, $\mathrm{Score}(S)=\sum_{i\ge 1}\mathrm{Score}(S_i)$.
A threshold sequence is set, and the score Score(S) of each text is compared against the threshold sequence to obtain the sentiment category of the text.
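As a non-limiting illustration, the scoring of S232' to S234' might be sketched as follows. The lexicons are passed in as dictionaries mapping words to scores; the function names and the example words are hypothetical. The disclosure does not state which sentiment word a modifier "corresponds" to, so the sketch assumes that each negation or degree word applies to the next sentiment word that follows it in the same short sentence.

```python
def score_short_sentence(tokens, sentiment, negation, degree):
    """Score(Si): modifiers multiply the next sentiment word that follows them."""
    total = 0.0
    multiplier = 1.0
    for tok in tokens:
        if tok in negation:
            multiplier *= negation[tok]      # Score(N)
        elif tok in degree:
            multiplier *= degree[tok]        # Score(D)
        elif tok in sentiment:
            total += multiplier * sentiment[tok]  # modified Score(E)
            multiplier = 1.0                 # modifiers are consumed
    # Modifiers that follow the last sentiment word are ignored, since their
    # index is not smaller than that of any remaining sentiment word.
    return total


def score_text(short_sentences, sentiment, negation, degree):
    """Score(S): sum of the scores of all short sentences."""
    return sum(
        score_short_sentence(tokens, sentiment, negation, degree)
        for tokens in short_sentences
    )
```

With a degree word scored 1.5, a negation word scored -1, a positive word scored 1, and a negative word scored -1, the short sentences `["very", "sad"]` and `["not", "happy"]` score -1.5 and -1.0 respectively, giving a text score of -2.5.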
The multi-dimensional disaster-related information in step S3 is obtained by using the disaster information extraction knowledge base constructed in S2 to extract the disaster damage information and public sentiment feedback information contained in Internet text data related to a designated disaster event, and combining them with the timestamp of each text and the location information it contains, thereby constructing a four-dimensional disaster array [loss, emotion, time, location].
In step S3, monitoring and analyzing the disaster-affected area according to the multi-dimensional disaster-related information means plotting the constructed four-dimensional disaster array [loss, emotion, time, location] on a vector map and performing spatio-temporal analysis, so that the whole course of the disaster is monitored in detail and analyzed in real time. The results can be used for disaster damage assessment, disaster rescue feedback, assessment of the subsequent impact of the disaster, and the like.
The present disclosure proposes a disaster monitoring and analysis method for extracting multi-dimensional disaster-related information from the Internet. The disaster information extraction knowledge base constructed according to the present disclosure accurately extracts the disaster loss information contained in microblogs and the emotional feedback information of the audience during the disaster. Joint analysis with the time and space dimensions characterizes the progress of the disaster in detail and supports real-time disaster monitoring, disaster damage assessment, disaster rescue feedback, and subsequent impact assessment.
To make the objectives, technical solutions, and advantages of the present disclosure clearer, the present disclosure is described in further detail below with reference to specific embodiments and the accompanying drawings.
Certain embodiments of the present disclosure are described more fully below with reference to the accompanying drawings, in which some but not all embodiments are shown. Indeed, the various embodiments of the present disclosure may be implemented in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that the present disclosure satisfies applicable legal requirements.
In a first exemplary embodiment of the present disclosure, a disaster monitoring and analysis method for extracting multi-dimensional disaster-related information from the Internet is provided. Taking Typhoon Hato ("天鸽"), which struck Zhuhai on August 23, 2017, as the example, the areas affected by the wind disaster are monitored and analyzed in real time at fine granularity. The present invention acquires multi-source disaster-related texts from multiple Internet platforms, and then uses the feature word multidimensional extension algorithm FWME (Feature Words Multidimensional Extension) proposed herein to assist in constructing a disaster information extraction knowledge base, which extracts the disaster loss information contained in the Internet texts and the emotional feedback information of the disaster audience. Finally, the time and spatial location information of the texts is combined for joint analysis to support disaster damage assessment, disaster rescue, and the collection of rescue feedback.
FIG. 1 is a schematic flowchart of the disaster monitoring and analysis method for extracting multi-dimensional disaster-related information from the Internet according to an embodiment of the present disclosure. As shown in the figure, the method includes the following steps:
S1. Acquire and preprocess multi-source data in real time.
S2. Propose the feature word multidimensional extension algorithm FWME (Feature Words Multidimensional Extension) to assist in constructing a typhoon disaster information extraction knowledge base, which extracts wind disaster loss information and the emotional feedback information of the audience affected by the wind disaster.
S3. Monitor and analyze the wind-affected area according to the multi-dimensional disaster-related information.
In S1, real-time acquisition and preprocessing of multi-source data, Internet multi-source data for the monitored area are acquired from platforms such as forums and Tieba, WeChat official accounts, news sites, and Sina Weibo. All data are then preprocessed, including deduplication, conversion between traditional and simplified Chinese, and conversion between full-width and half-width characters. At the same time, the upload time and the location information contained in each text are parsed out and stored in separate fields.
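As a non-limiting illustration, two of the preprocessing operations of S1, full-width to half-width conversion and deduplication, might be sketched as follows. The sketch substitutes Unicode NFKC normalization for a dedicated width conversion; NFKC also folds other compatibility characters, which is an assumption of this sketch rather than a requirement of the disclosure. Conversion between traditional and simplified Chinese needs a mapping table from a third-party resource and is not shown. The function names are hypothetical.

```python
import unicodedata


def normalize_width(text):
    """Fold full-width letters, digits and punctuation to half-width (NFKC)."""
    return unicodedata.normalize("NFKC", text)


def preprocess(texts):
    """Normalize character width, drop empty texts, and remove duplicates.

    The order of first occurrences is preserved.
    """
    normalized = (normalize_width(t).strip() for t in texts)
    return list(dict.fromkeys(t for t in normalized if t))
```

Normalizing before deduplicating means that two posts differing only in character width are treated as the same text.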
In S2, a typhoon disaster information extraction knowledge base is constructed to obtain the wind disaster loss information contained in the texts and the emotional feedback information of the audience affected by the wind disaster. The feature word pairs that represent different kinds of disaster damage and the emotional feedback feature words of the audience stored in the knowledge base are enriched and supplemented by the feature word multidimensional extension algorithm FWME proposed by the present invention. The knowledge base also includes negation words and degree words used to express semantic features precisely.
FIG. 2 is a schematic structural diagram of the disaster information extraction knowledge base according to an embodiment of the present disclosure. As shown in FIG. 2, the structure can be represented as a tree of height 4 whose second layer contains two subtrees. The left subtree is the disaster damage information identification knowledge base. Its child nodes represent the disaster damage categories and the negation words. Each disaster damage category node has a variable number of leaf nodes, the entities represented by the leaf nodes are independent of one another, and each leaf node stores multiple feature word pairs that represent disaster damage events. The negation word node holds a set of negation words stored in its leaf nodes. In addition, each feature word and negation word in the leaf nodes dynamically reserves one storage slot for the index position of that word in the text being processed, namely the disaster damage feature word index Index(Wi) and the negation word index Index(N). The right subtree of the second layer is the public sentiment feedback knowledge base, whose structure is similar to that of the disaster damage information identification knowledge base. Its child nodes represent the sentiment categories, the negation words, and the degree words, and each of these nodes has one leaf node that stores the corresponding feature word lexicon.
Each feature word in these lexicons dynamically reserves two storage slots for the score and the index of the corresponding feature word in the text being processed: the sentiment feature word score Score(E) and index Index(E), the negation word score Score(N) and index Index(N), and the degree word score Score(D) and index Index(D).
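As a non-limiting illustration, the four-level tree described above might be represented as follows. The `Node` class, the `build_knowledge_base` function, and all category names and feature words in it are hypothetical placeholders and are not the contents of the knowledge base of this disclosure. The per-match storage slots for Index and Score are omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One node of the knowledge base tree; only leaf nodes carry entries."""
    name: str
    children: list = field(default_factory=list)
    entries: list = field(default_factory=list)


def height(node):
    """Number of levels from this node down to its deepest leaf."""
    return 1 + max((height(child) for child in node.children), default=0)


def build_knowledge_base():
    # Left subtree: damage categories (feature word pairs) and negation words.
    damage = Node("damage identification", [
        Node("traffic impact", [
            Node("leaf", entries=[("road", "flooded"), ("road", "blocked")]),
        ]),
        Node("negation words", [Node("leaf", entries=["not", "no"])]),
    ])
    # Right subtree: sentiment categories, negation words, degree words.
    sentiment = Node("public sentiment feedback", [
        Node("positive", [Node("leaf", entries=["relieved"])]),
        Node("negative", [Node("leaf", entries=["afraid"])]),
        Node("negation words", [Node("leaf", entries=["not"])]),
        Node("degree words", [Node("leaf", entries=["very"])]),
    ])
    return Node("disaster information extraction knowledge base",
                [damage, sentiment])
```

Counting levels from the root, `height(build_knowledge_base())` is 4, matching the layout of root, subtree, category node, and leaf.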
In this embodiment, disaster damage is divided into 11 categories, as shown in Table 1.
Table 1
In this embodiment, sentiment is divided into 3 categories, as shown in Table 2.
Table 2
The scores of the feature words in the public sentiment feedback knowledge base are set as follows: each positive sentiment feature word has Score(Positive)=1, each negative sentiment feature word has Score(Negative)=-1, each negation word has Score(Negation)=-1, and each degree word has Score(Degree)=1.5.
FIG. 3 is a flowchart of the method by which the feature word multidimensional extension algorithm FWME in S2 expands the feature words of the knowledge base vertically and horizontally.
The vertical expansion uses the word vector model integrated in the FWME algorithm. With the acquired multi-source Internet disaster-related texts as the corpus, similarity is computed for each feature word to be expanded, and the words that satisfy the similarity threshold and have the same part of speech are taken as the vertical supplementary words of that feature word.
The threshold is set to the 4 words most similar to the target word.
Vertical supplementation increases the depth of the vocabulary. For example, words sharing a context with "tree" include "utility pole" and "street lamp", and words sharing a context with "fall" include "snap" and "blow down", as shown in Table 3.
Table 3
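As a non-limiting illustration, the vertical expansion might be sketched as follows, assuming that word vectors have already been trained on the disaster-related corpus and that a part-of-speech tag is available for each word. The disclosure does not name the similarity measure; cosine similarity is an assumption of this sketch, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def vertical_expand(target, vectors, pos_tags, top_k=4):
    """Return the top_k words most similar to target with the same POS tag."""
    scored = [
        (cosine(vectors[target], vec), word)
        for word, vec in vectors.items()
        if word != target and pos_tags[word] == pos_tags[target]
    ]
    scored.sort(reverse=True)  # highest similarity first
    return [word for _, word in scored[:top_k]]
```

The part-of-speech filter is applied before ranking, so a highly similar verb is never returned as a supplementary word for a noun.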
The horizontal expansion builds on the vertical expansion. For the existing feature words, the synonymy relations between words are used, and the set of words that satisfy the synonymy condition is added as a supplement to each feature word in the knowledge base. For example, the synonym computation uses the Tongyici Cilin (《同义词词林》) thesaurus as its resource and takes the words that satisfy the synonymy condition as the synonym expansion words of the given word.
The synonymy condition is defined relative to the position of the existing feature word in the Tongyici Cilin: all words in the atomic word set that contains the feature word are taken as its synonym expansion words.
Horizontal expansion adds synonyms of the feature words. For example, the synonym expansion of "railing" is shown in Table 4.
Table 4
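As a non-limiting illustration, the horizontal expansion might be sketched as follows, assuming the thesaurus has already been loaded as a list of atomic word sets. The file format of the thesaurus is not assumed here, and the function name is hypothetical.

```python
def horizontal_expand(word, atomic_sets):
    """Return all other words found in the atomic word sets containing word."""
    expanded = set()
    for group in atomic_sets:
        if word in group:
            expanded.update(group)
    expanded.discard(word)  # the feature word itself is not an expansion
    return sorted(expanded)
```

A word that belongs to no atomic word set simply yields an empty list, so the knowledge base entry for that word is left unchanged.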
The sentiment feature words are expanded in multiple dimensions by the same method.
Through feature word expansion, the disaster damage feature words and the emotional feedback feature words of the disaster audience in the wind disaster information extraction knowledge base are enriched.
Extracting, in S2, the disaster loss information by means of the wind disaster information extraction knowledge base includes the following steps:
(1) Split the text to be processed at punctuation marks to form a set of short sentences {S1, S2, ..., Si, i>=1}; segment each short sentence into words, remove stop words, and record the index position of each remaining word within its short sentence.
(2) Match the words of each short sentence from (1) against the feature word pairs stored in the leaf nodes under each disaster damage category node of the disaster damage knowledge base. When both words of a feature word pair of a given category are matched, record their index positions in the sentence, Index(w1) and Index(w2), and mark the sentence as a candidate for that disaster damage category.
(3) Determine whether any other word in the candidate sentence matches a negation word in the disaster damage knowledge base; if so, record the index position of that negation word, Index(N).
(4) If and only if Index(N)<Index(w1) or Index(N)<Index(w2), the candidate sentence belongs to the corresponding disaster damage category.
Extracting, in S2, the emotional feedback information of the disaster audience by means of the wind disaster information extraction knowledge base includes the following steps:
(1) Split the text to be processed at punctuation marks to form a set of short sentences {S1, S2, ..., Si, i>=1}; segment each short sentence into words, remove stop words, and record the index position of each remaining word within its short sentence.
(2) Match the words of each short sentence from (1) against the feature words stored in the leaf nodes under the positive sentiment, negative sentiment, negation word, and degree word nodes of the public sentiment feedback knowledge base. For each matched feature word, record its index position, Index(Positive), Index(Negative), Index(Negation), or Index(Degree), and its score, Score(Positive), Score(Negative), Score(Negation), or Score(Degree).
(3) When the index of a negation word or degree word is smaller than the index of a sentiment word, multiply Score(Negation) and Score(Degree) by the corresponding sentiment score; then sum the resulting positive and negative scores to obtain the score of the short sentence, Score(Si).
(4) The score of the whole text, Score(S), is the sum of the scores of its short sentences:
$\mathrm{Score}(S)=\sum_{i\ge 1}\mathrm{Score}(S_i)$.
In this embodiment, the sentiment threshold sequence contains a single value, 0.
When Score(S)<0, the sentiment of the text is negative; when Score(S)>0, it is positive; when Score(S)=0, it is neutral.
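As a worked illustration of the scoring and threshold rule, consider a hypothetical text consisting of two short sentences, scored with the feature word scores set in this embodiment (positive 1, negative -1, negation -1, degree 1.5). In the first short sentence a degree word precedes a negative sentiment word; in the second a negation word precedes a positive sentiment word. The two short sentences are assumptions made for illustration and are not data from the embodiment.

```latex
\begin{aligned}
\mathrm{Score}(S_1) &= \mathrm{Score}(\mathrm{Degree}) \times \mathrm{Score}(\mathrm{Negative}) = 1.5 \times (-1) = -1.5 \\
\mathrm{Score}(S_2) &= \mathrm{Score}(\mathrm{Negation}) \times \mathrm{Score}(\mathrm{Positive}) = (-1) \times 1 = -1 \\
\mathrm{Score}(S) &= \mathrm{Score}(S_1) + \mathrm{Score}(S_2) = -2.5 < 0
\end{aligned}
```

Since Score(S)<0, the text would be classified as negative under the threshold of 0.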
In S3, the disaster-affected area is monitored and analyzed according to the multi-dimensional disaster-related information: the time information, spatial location information, disaster damage information, and public sentiment feedback information are assembled into a four-dimensional information array [loss, emotion, time, location], plotted on a map, and compared as a sequence along the progress of the disaster, so that the disaster situation can be understood in detail.
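As a non-limiting illustration, the four-dimensional array and a simple sequence comparison might be sketched as follows. The `DisasterRecord` type, the `damage_counts_by_period` function, and the six-hour bucket width are hypothetical choices of this sketch. Bucketing uses only the hour of day, which assumes that all records fall on the same day; map plotting is not shown.

```python
from collections import Counter
from datetime import datetime
from typing import NamedTuple


class DisasterRecord(NamedTuple):
    """One entry of the array [loss, emotion, time, location]."""
    loss: str
    emotion: str
    time: datetime
    location: str


def damage_counts_by_period(records, hours_per_bucket=6):
    """Count damage reports per (location, time bucket, loss category)."""
    counts = Counter()
    for record in records:
        bucket = record.time.hour // hours_per_bucket
        counts[(record.location, bucket, record.loss)] += 1
    return counts
```

Comparing the counts of successive buckets for one location gives the kind of time sequence that can then be laid out on a map.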
FIG. 4 is a schematic diagram of the spatial distribution of each category of disaster damage during the passage of the typhoon. As FIG. 4 shows, Xiangzhou District was hit hard, with damage mainly to water supply, power supply, and traffic. Combined with the specific location information, rescue can be organized for each damage event.
FIG. 5 is a schematic diagram of the spatio-temporal distribution sequence of each category of disaster damage and the sequence of changes in emotional feedback during the passage of the typhoon. As FIG. 5 shows, as the typhoon moved northwest, damage reports in the areas along its path increased one after another, and different areas showed different damage distributions and emotional feedback before and after the typhoon passed. For example, from 9:00 to 12:00, before the typhoon made landfall, the dominant damage category was traffic impact; during this period many people were traveling or returning home, traffic was congested, and sentiment was mostly negative. After the typhoon had passed, for example from 18:00 to 24:00, damage reports gradually decreased and positive public sentiment increased, especially in Xiangzhou District, which was the area most affected by the wind disaster. The text content indicates that during this period, after the typhoon, the government's rescue efforts were substantial and public sentiment feedback was positive.
The present invention first uses multi-source Internet data as the corpus for disaster monitoring and analysis. Second, it proposes the feature word multidimensional extension algorithm FWME to assist in constructing a typhoon disaster information extraction knowledge base, which extracts wind disaster loss information and the emotional feedback information of the audience affected by the wind disaster. Finally, time and spatial location information is combined for a joint multi-sequence analysis of the disaster situation. This achieves real-time, comprehensive monitoring of the disaster-affected area and supports disaster reduction and relief activities.
The embodiments of the present disclosure have now been described in detail with reference to the accompanying drawings. It should be noted that implementations not shown or described in the drawings or in the text of the specification are in forms known to those of ordinary skill in the art and are not described in detail. In addition, the above definitions of elements and methods are not limited to the specific structures, shapes, or manners mentioned in the embodiments, and those of ordinary skill in the art may simply modify or replace them.
It should also be noted that directional terms mentioned in the embodiments, such as "upper", "lower", "front", "rear", "left", and "right", refer only to directions in the drawings and are not intended to limit the scope of protection of the present disclosure. Throughout the drawings, the same elements are denoted by the same or similar reference numerals. Conventional structures or configurations are omitted where they might obscure the understanding of the present disclosure.
The shapes and sizes of the components in the figures do not reflect actual sizes and proportions, but merely illustrate the content of the embodiments of the present disclosure. Furthermore, in the claims, any reference sign placed between parentheses shall not be construed as limiting the claim.
Furthermore, the word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements.
In addition, unless steps are specifically described as sequential or must occur in sequence, the order of the above steps is not limited to that listed above and may be changed or rearranged according to the desired design. The above embodiments may also be mixed and matched with one another or with other embodiments based on design and reliability considerations; that is, technical features of different embodiments may be freely combined to form further embodiments.
The algorithms and displays provided herein are not inherently related to any particular computer, virtual system, or other device. Various general-purpose systems may also be used with the teachings herein. The structure required to construct such systems is apparent from the above description. Furthermore, the present disclosure is not directed to any particular programming language. It should be understood that the content of the present disclosure described herein may be implemented in various programming languages, and that any description of a specific language above is given to disclose the best mode of the present disclosure.
The present disclosure may be implemented by means of hardware comprising several distinct elements and by means of a suitably programmed computer. The component embodiments of the present disclosure may be implemented in hardware, in software modules running on one or more processors, or in a combination thereof. Those skilled in the art should understand that a microprocessor or a digital signal processor (DSP) may be used in practice to implement some or all of the functions of some or all of the components of the related apparatus according to the embodiments of the present disclosure. The present disclosure may also be implemented as a device or apparatus program (for example, a computer program or a computer program product) for performing part or all of the methods described herein. Such a program implementing the present disclosure may be stored on a computer-readable medium or may take the form of one or more signals. Such signals may be downloaded from an Internet website, provided on a carrier signal, or provided in any other form.
Those skilled in the art will understand that the modules of the device in an embodiment may be adaptively changed and arranged in one or more devices different from that embodiment. The modules, units, or components of the embodiments may be combined into one module, unit, or component, and may furthermore be divided into multiple sub-modules, sub-units, or sub-components. Except where at least some of such features and/or processes or units are mutually exclusive, all features disclosed in this specification (including the accompanying claims, abstract, and drawings) and all processes or units of any method or device so disclosed may be combined in any combination. Unless expressly stated otherwise, each feature disclosed in this specification (including the accompanying claims, abstract, and drawings) may be replaced by an alternative feature serving the same, an equivalent, or a similar purpose. Also, in a unit claim enumerating several means, several of these means may be embodied by one and the same item of hardware.
Similarly, it should be understood that, in order to streamline the present disclosure and aid understanding of one or more of the various disclosed aspects, in the above description of exemplary embodiments the features of the present disclosure are sometimes grouped together into a single embodiment, figure, or description thereof. However, this method of disclosure should not be interpreted as reflecting an intention that the claimed disclosure requires more features than are expressly recited in each claim. Rather, as the following claims reflect, the disclosed aspects lie in fewer than all features of a single foregoing disclosed embodiment. Thus, the claims following the detailed description are hereby expressly incorporated into the detailed description, with each claim standing on its own as a separate embodiment of the present disclosure.
The specific embodiments described above further explain the objectives, technical solutions, and beneficial effects of the present disclosure in detail. It should be understood that the above are merely specific embodiments of the present disclosure and are not intended to limit it. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present disclosure shall fall within the scope of protection of the present disclosure.
Claims (9)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810597449.9A CN108897792B (en) | 2018-06-11 | 2018-06-11 | Disaster monitoring and analysis method for extracting multi-dimensional disaster-related information from the Internet |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108897792A CN108897792A (en) | 2018-11-27 |
CN108897792B true CN108897792B (en) | 2022-05-03 |
Family
ID=64344484
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810597449.9A Active CN108897792B (en) | 2018-06-11 | 2018-06-11 | Disaster monitoring and analysis method for extracting multi-dimensional disaster-related information from the Internet |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108897792B (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102044022A (en) * | 2010-12-24 | 2011-05-04 | 中国科学院合肥物质科学研究院 | Emergency rescue decision making system aiming at natural disasters and method thereof |
CN103390039A (en) * | 2013-07-17 | 2013-11-13 | 北京建筑工程学院 | Urban disaster thematic map real-time generating method based on network information |
CN104809108A (en) * | 2015-05-20 | 2015-07-29 | 成都布林特信息技术有限公司 | Information monitoring and analyzing system |
CN107562814A (en) * | 2017-08-14 | 2018-01-09 | 中国农业大学 | A kind of earthquake emergency and the condition of a disaster acquisition of information sorting technique and system |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2017078986A1 (en) * | 2014-12-29 | 2017-05-11 | Cyence Inc. | Diversity analysis with actionable feedback methodologies |
Also Published As
Publication number | Publication date |
---|---|
CN108897792A (en) | 2018-11-27 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108763333B (en) | Social media-based event map construction method | |
CN108717408B (en) | A sensitive word real-time monitoring method, electronic equipment, storage medium and system | |
CN111460247B (en) | Automatic detection method for network picture sensitive characters | |
JP2023502827A (en) | How to acquire geographic knowledge | |
CN101777042B (en) | Neural network and tag library-based statement similarity algorithm | |
CN111143576A (en) | Event-oriented dynamic knowledge graph construction method and device | |
CN103226580B (en) | A kind of topic detection method of interaction text | |
CN110781670B (en) | Chinese place name semantic disambiguation method based on encyclopedic knowledge base and word vectors | |
CN106951438A (en) | A kind of event extraction system and method towards open field | |
CN106326212A (en) | Method for analyzing implicit type discourse relation based on hierarchical depth semantics | |
US10528662B2 (en) | Automated discovery using textual analysis | |
CN102110140A (en) | Network-based method for analyzing opinion information in discrete text | |
CN112148832B (en) | Event detection method of dual self-attention network based on label perception | |
CN108038205A (en) | For the viewpoint analysis prototype system of Chinese microblogging | |
CN105260488B (en) | A kind of text sequence alternative manner for semantic understanding | |
CN107844609A (en) | A kind of emergency information abstracting method and system based on style and vocabulary | |
CN114818717B (en) | Chinese named entity recognition method and system integrating vocabulary and syntax information | |
CN113988075B (en) | Entity relationship extraction method for text data in network security field based on multi-task learning | |
CN116029305A (en) | Chinese attribute-level emotion analysis method, system, equipment and medium based on multitask learning | |
CN111191413A (en) | Method, device and system for automatically marking event core content based on graph sequencing model | |
CN107957990B (en) | Trigger word expansion method and device and event extraction method and system | |
CN115017902A (en) | Construction method and device of Tibetan phrase structure recognition model based on deep learning | |
CN114398471A (en) | A Visual Question Answering Method Based on Deep Inference Attention Mechanism | |
CN108897792B (en) | Disaster monitoring and analysis method for extracting multi-dimensional disaster-related information from the Internet | |
Miller et al. | Digging into human rights violations: Data modelling and collective memory |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||