WO2022099927A1 - Information aggregation method for typhoon events - Google Patents

Information aggregation method for typhoon events

Info

Publication number
WO2022099927A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
aggregation
attribute
typhoon
behavior
Application number
PCT/CN2021/072796
Other languages
French (fr)
Chinese (zh)
Inventor
张雪英
怀安
叶鹏
Original Assignee
南京师范大学
Application filed by 南京师范大学 (Nanjing Normal University)
Priority to JP2022505249A (published as JP2023504961A)
Publication of WO2022099927A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2455 Query execution
    • G06F16/24553 Query execution of query operations
    • G06F16/24554 Unary operations; Data partitioning operations
    • G06F16/24556 Aggregation; Duplicate elimination
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 Indexing; Data structures therefor; Storage structures
    • G06F16/313 Selection or weighting of terms for indexing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking

Definitions

  • The invention belongs to the field of big data mining, and in particular relates to a typhoon event information aggregation method.
  • Typhoons can have a severe, destructive impact on the natural environment, on society and the economy, and even on sustainable human development.
  • Timely access to information on how a typhoon event evolves is an important basis and reference for disaster emergency response.
  • With its high update frequency, multi-source communication channels and broad participation, social media has shown great potential in disaster management and has gradually become a new way to obtain information on typhoon events.
  • However, because social media messages are short texts, the information is highly fragmented, its forms of expression are complex and varied, and its granularity is inconsistent.
  • Massive, scattered social media information not only fails to reflect the full picture of how a typhoon event evolves, but also hinders users from effectively tracking the course of the event.
  • Information aggregation methods describe information resources effectively in order to organize information more rationally and make access more efficient, so that users can obtain useful information resources conveniently.
  • Information aggregation methods for disaster events mainly include statistics-based methods, topic model-based methods and knowledge element-based methods. (1) Statistical methods use statistical features such as word frequency, TF-IDF, N-grams and mutual information to calculate keyword weights in an information unit, select the most representative keywords, and aggregate on that basis. Such methods are simple, subjective and easy to understand, but because keyword screening is not very accurate, a second screening pass with auxiliary information is generally needed.
  • (2) Probabilistic topic models assume that each document has a latent distribution over all topic words, so the topic word probability distribution can be used to represent the topics in an information unit.
  • However, the effectiveness of such methods depends on choosing the number of topics, while in practice the topics on social media change constantly. A single social media message may also cover several topics, which makes the interpretability of topic words contentious.
  • (3) A knowledge element defines the logical relationships and hierarchical structure between concepts. Common forms include ontologies, semantic networks and linked data. Knowledge element-based aggregation builds a conceptual model describing the structure of disaster events, then reorders and organizes information according to the semantic relationships defined in the model to reveal information features and their associations.
  • The purpose of the present invention is to provide a typhoon event information aggregation method that screens, organizes and integrates typhoon event information from scattered social media sources, provides an ordered information basis for detecting the development stage and situation of a typhoon event, and helps improve the service capability of social media resources in emergency management.
  • To achieve the above purpose, the present invention provides the following technical solution:
  • Step 1: Collect message texts related to the typhoon event from social media, extract typhoon event information from them, and convert it into structured information tuples;
  • Step 2: Object information aggregation based on multi-feature similarity. Whether information tuples belong to the same object is judged by the similarity between object names, and the information tuples describing the same object are aggregated;
  • Step 3: State information aggregation based on spatiotemporal features. From the object information aggregation result, the attribute values and behavior values that satisfy a single time and location condition are selected. The time information, the location information and the selected attribute and behavior values together form the state information aggregation result of the object at a specific time and place;
  • Step 4: State-based process information aggregation. From the object information aggregation result, the spatiotemporal nodes that fall within the required time and location ranges are selected, state information aggregation is performed for each of these nodes, and the resulting state information aggregation results are sorted to form a process information aggregation result that reflects dynamic characteristics.
  • The typhoon event information includes the object name, time information, location information, attribute information and behavior information.
  • In step 2, for different information tuples describing the same object, attribute items and behavior items of the same type also need to be further aggregated.
  • The extraction of typhoon event information includes at least two parts, information element identification and information element association:
  • Information element identification: define the objects that make up a typhoon event and build a classification system, then extract from social media texts the names and feature information describing the different types of objects. The feature information includes time, location, attributes and behaviors.
  • Attribute information can be further divided into attribute items and attribute values: the attribute item is the type of the attribute, and the attribute value is the data or quantity that an attribute of this type takes.
  • Behavior information is structured similarly to attribute information;
  • O_n is the object name
  • T is the time information
  • L is the location information
  • A is the attribute information
  • B is the behavior information.
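As an illustration of the structured information tuple, a minimal Python sketch is given below. The class and field names (`InfoTuple`, `Feature`, `obj_name` and so on) are hypothetical; the text only names the five components (object name, time, location, attribute and behavior information) and the split of a feature into an item and a value.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Feature:
    """An attribute or behavior: an item (its type) plus a value."""
    item: str   # e.g. "wind force"
    value: str  # e.g. "16"; may be empty for a bare behavior


@dataclass
class InfoTuple:
    """One structured record extracted from a social media message."""
    obj_name: str   # object name
    time: str       # time information
    location: str   # location information
    attributes: List[Feature] = field(default_factory=list)  # attribute information
    behaviors: List[Feature] = field(default_factory=list)   # behavior information


# Example built from the typhoon record quoted elsewhere in this document.
example = InfoTuple(
    obj_name="typhoon",
    time="August 10, 2019 1:45",
    location="Wenling City",
    attributes=[Feature("wind force", "16")],
    behaviors=[Feature("landfall", "")],
)
```

Keeping attributes and behaviors as lists of item/value pairs is only one possible layout; it is chosen here because later aggregation steps group values by item.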
  • In step 2, the similarity between object names, attribute items and behavior items is judged using word vector similarity, including the following steps:
  • The word segmentation result is used as the training set, and a Skip-gram model is used to train the word vectors;
  • In step 4, sorting multiple state information aggregation results includes the following steps:
  • By the attribute information and behavior information of the state, the results can be sorted by the magnitude or level of the feature value, or by similarity to the user's aggregation condition.
  • The present invention constructs a social media-based method for aggregating typhoon event process information.
  • It describes a multi-level aggregation model in terms of "object-state-process".
  • In the object layer, the scattered feature information of the same object is aggregated according to the similarity of multi-dimensional features;
  • In the state layer, the attribute information and behavior information of the object that match specific spatiotemporal characteristics are aggregated, unifying the spatiotemporal granularity of the information; finally, in the process layer, multiple states are sorted by their spatiotemporal relationships so that the information is organized in order.
  • This aggregation mode targets the scattered, multi-granularity and unordered way in which information is described in social media, and also takes full account of the dynamic evolution of typhoon events, yielding ordered information on the process characteristics of typhoon events. In practical applications it can help meet the emergency task needs of government agencies and the public's need to understand the event.
  • Figure 1 shows the multi-level typhoon event process information aggregation model
  • Figure 2 shows the spatiotemporal semantic units constructed from social media text
  • Figure 3 shows an example of typhoon event information extraction results from social media
  • Figure 4 shows the organizational structure and an example of the object information aggregation result
  • Figure 5 shows the organizational structure and an example of the state information aggregation result
  • Figure 6 shows the different stages of process information aggregation
  • Figure 7 shows the organizational structure and an example of the process information aggregation result.
  • The invention discloses a social media-based method for aggregating typhoon event process information, including:
  • Step 1: Collect message texts related to typhoon events from social media, extract typhoon event information from them (object name, time information, location information, attribute information and behavior information), and convert it into structured information tuples.
  • Step 2: Object information aggregation based on multi-feature similarity. Whether information tuples belong to the same object is judged by the similarity between object names, and the information tuples describing the same object are aggregated. For different information tuples describing the same object, attribute items and behavior items of the same type also need to be further aggregated.
  • Step 3: State information aggregation based on spatiotemporal features. From the object information aggregation result, the attribute values and behavior values that satisfy a single time and location condition are selected. The time information, the location information and the selected attribute and behavior values together form the state information aggregation result of the object at a specific time and place.
  • Step 4: State-based process information aggregation. From the object information aggregation result, the spatiotemporal nodes that fall within the required time and location ranges are selected, state information aggregation is performed for each of these nodes, and the resulting state aggregation results are sorted to form a process information aggregation result that reflects dynamic characteristics.
  • The typhoon event information extraction in step 1 includes:
  • Attribute information can be further divided into attribute items and attribute values: the attribute item is the type of the attribute, and the attribute value is the data or quantity that an attribute of this type takes. Behavior information is structured similarly to attribute information.
  • O_n is the object name
  • T is the time information
  • L is the location information
  • A is the attribute information
  • B is the behavior information.
  • The objects that make up a typhoon event are divided into subject objects and object objects.
  • The cyclone, as the hazard factor, is the subject object of the event; other objects that the cyclone damages, acts on or affects are the object objects of the event. Objects can be classified by their nature, mainly into people, infrastructure, transportation facilities, social activities and other types. Each object type can borrow classification schemes from related fields and be subdivided further as needed (Table 1).
  • Extracting the names and feature information describing different types of objects from social media texts includes:
  • A time information extraction model is built on a conditional random field model to automatically identify time information in social media text.
  • A location information extraction model is built on a deep belief network to automatically identify location information in social media text.
  • S1: Spatiotemporal semantic unit construction. Characters, words, phrases, clauses, sentences and paragraphs are all language units in a text, and different language units form the basic structure of the text through semantic relationships. A language unit, or a combination of language units, that expresses a complete meaning is a semantic unit. When a semantic unit contains temporal and spatial information, it clearly expresses the spatiotemporal characteristics of its content. This method defines such a semantic unit as a spatiotemporal semantic unit.
  • The distribution of spatiotemporal semantic units falls roughly into three categories: (1) texts that describe object information at a single time and location, which make up most social media texts; (2) texts that describe object information at different locations at the same time, which are relatively few; (3) texts that list and compare object information for multiple times and locations, which are comprehensive reports and are very rare.
  • This method divides social media texts into different spatiotemporal semantic units based on the extracted spatiotemporal information (Figure 2).
  • The position of the spatiotemporal information in the text is used as the basis for dividing it into spatiotemporal semantic units, including:
  • The association between a feature trigger word and a feature value constitutes the feature information of the object; here this refers specifically to attribute features and behavior features.
  • The feature trigger word represents the attribute item or behavior item
  • the feature value represents the attribute value or behavior value.
  • Feature trigger words and feature values follow an adjacency rule in text, forming a "feature trigger word-feature value" structure. Word-frequency statistics over the three words preceding an attribute value show that feature trigger words account for over 99% of them. Therefore, each feature value is associated with the closest feature trigger word that precedes it.
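The nearest-preceding-trigger rule can be sketched as follows. This is a minimal illustration that assumes the text has already been segmented into tokens and that the positions of feature values are known; the function name `associate_features` and the sample tokens are hypothetical, not taken from the patent.

```python
from typing import List, Optional, Set, Tuple


def associate_features(
    tokens: List[str],
    trigger_words: Set[str],
    value_positions: List[int],
) -> List[Tuple[Optional[str], str]]:
    """Pair each feature value with the closest trigger word before it."""
    pairs = []
    for pos in value_positions:
        trigger = None
        # Scan backwards from the value towards the start of the text.
        for i in range(pos - 1, -1, -1):
            if tokens[i] in trigger_words:
                trigger = tokens[i]
                break
        pairs.append((trigger, tokens[pos]))
    return pairs


# Illustrative, already-segmented text; positions 2 and 4 hold feature values.
tokens = ["typhoon", "wind force", "16", "rainfall", "200mm"]
pairs = associate_features(tokens, {"wind force", "rainfall"}, [2, 4])
```

A value with no trigger word in front of it is paired with `None`, which leaves the decision about unmatched values to the caller.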
  • The object information aggregation in step 2 includes:
  • The object name in the aggregation condition is set as N, and the similarity sim_n between the name of each O_n and N is judged in turn. If sim_n ≥ θ_n, where θ_n is the object similarity threshold, they are the same object, and the information tuples of the same object are merged.
  • Object name similarity is measured with the word vector similarity method. Using the word vector model trained with Skip-gram, the method first maps each object name to a vector in a multi-dimensional space, then uses a similarity measure to determine whether the vectors point in the same direction. Cosine similarity is used as the measure.
  • O(typhoon)<August 10, 2019 1:45, Wenling City, Zhejiang Province, wind force: 16, landfall> and
  • O(tropical cyclone)<August 11, 2019 20:50, Qingdao City, Shandong Province, wind force: level 9, landfall> are information tuples extracted from social media.
  • The object name in the aggregation condition is set to "typhoon", and its similarity to the object names "typhoon" and "tropical cyclone" in the information tuples is judged in turn. Both names refer to the cyclone itself, so both tuples are taken as the aggregated result.
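The name-matching step can be sketched with cosine similarity over word vectors. In the method described here the vectors come from a Skip-gram model trained on segmented text; the sketch below substitutes tiny hand-written vectors and an arbitrary threshold of 0.8, both invented purely for illustration, and the helper names are hypothetical.

```python
import math
from typing import Dict, List, Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def same_object(name: str, target: str,
                vectors: Dict[str, List[float]], threshold: float) -> bool:
    """True when the similarity of the two names reaches the threshold."""
    return cosine_similarity(vectors[name], vectors[target]) >= threshold


# Toy 3-dimensional vectors, NOT the output of a trained Skip-gram model.
toy_vectors = {
    "typhoon": [0.9, 0.1, 0.0],
    "tropical cyclone": [0.8, 0.2, 0.1],
    "power grid": [0.0, 0.2, 0.9],
}
tuples = [
    {"name": "typhoon", "time": "August 10, 2019 1:45"},
    {"name": "tropical cyclone", "time": "August 11, 2019 20:50"},
    {"name": "power grid", "time": "August 10, 2019 3:00"},
]
# Keep the tuples whose object name matches the aggregation condition "typhoon".
merged = [t for t in tuples
          if same_object(t["name"], "typhoon", toy_vectors, threshold=0.8)]
```

With real embeddings the threshold would need to be tuned on labeled name pairs; the value used here has no empirical basis.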
  • If sim_a ≥ θ_a, where θ_a is the attribute similarity threshold, the attribute items are the same and their information can be aggregated, with each attribute value and its spatiotemporal features retained after aggregation; otherwise they are different attribute items describing the same object and are not aggregated.
  • The word vector similarity method is likewise used to judge the similarity sim_b between the behavior item of O_n and B. If sim_b ≥ θ_b, where θ_b is the behavior similarity threshold, the behavior items are the same and their information can be aggregated, with each behavior value and its spatiotemporal features retained after aggregation; otherwise they are different behavior items describing the same object and are not aggregated.
  • For example, the "wind force" attribute information of the typhoon is further aggregated.
  • Both O(typhoon) and O(tropical cyclone) have an attribute item "wind force" that meets the similarity threshold, so <August 10, 2019 1:45, Wenling City, Zhejiang Province, wind force: 16> and <August 11, 2019 20:50, Qingdao City, Shandong Province, wind force: level 9> are taken as the aggregated result of the object features.
  • O(N) represents the aggregated object
  • a_l is an attribute item of the aggregated object
  • a_ls is a specific attribute value
  • b_n is a behavior item of the aggregated object
  • b_nu is a specific behavior value
  • <T, S> is the time and place at which the attribute value or behavior value occurs.
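One way to picture the aggregated object is as a mapping from each attribute item to its values, with every value keeping its own time and place stamp. The sketch below is an assumption about layout, not the patent's data format: it groups items by exact string match, whereas the method described here compares items by word vector similarity, and the name `group_attributes` is hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (time, place, attribute item, attribute value) records that have already
# been judged to describe the same object.
Record = Tuple[str, str, str, str]


def group_attributes(records: List[Record]) -> Dict[str, List[dict]]:
    """Group values under their attribute item, keeping each time/place stamp."""
    grouped = defaultdict(list)
    for time, place, item, value in records:
        grouped[item].append({"value": value, "time": time, "place": place})
    return dict(grouped)


# The two wind-force observations from the example in this document.
records = [
    ("2019-08-10 01:45:00", "Wenling City", "wind force", "16"),
    ("2019-08-11 20:50:00", "Qingdao City", "wind force", "9"),
]
o_n = group_attributes(records)
```

Behavior items could be grouped the same way in a second mapping, so that both kinds of feature retain their spatiotemporal stamps for the later state aggregation.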
  • The state information aggregation in step 3 includes:
  • Unifying the temporal and spatial references.
  • The spatiotemporal framework is the basis on which states exist, so a unified spatiotemporal reference must be established for state information aggregation.
  • Dates use the Gregorian calendar
  • times use Beijing time
  • the spatial reference uses the CGCS2000 coordinate system.
  • Time information and location information are the basis for judging whether the associated attribute information and behavior information describe the state of an object under specific spatiotemporal conditions.
  • Following everyday usage, time information is described in a standardized way using the Gregorian calendar year, calendar date and clock time.
  • The normalized time form is "date + time" in the format "YYYY-MM-DD HH:MM:SS", for example "2019-08-10 12:00:00".
  • Location information, including place names, addresses and spatial coordinates, should be converted to a normalized representation under the unified spatial reference.
  • Place names can follow the standard names, codes and categories issued by the state at a given time
  • the address element types and the way they are combined can follow the standards issued by the state or the industry
  • spatial coordinates should be transformed as required by the spatial datum.
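The time normalization described above can be sketched with Python's standard `datetime` module. The input pattern `"%B %d, %Y %H:%M"` matches the English rendering of times in this document's examples and is an assumption; real social media text would need a more tolerant parser. The function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

BEIJING = timezone(timedelta(hours=8))  # Beijing time is UTC+8


def normalize_time(raw: str, fmt: str = "%B %d, %Y %H:%M") -> str:
    """Parse a raw time string and render it as YYYY-MM-DD HH:MM:SS."""
    parsed = datetime.strptime(raw, fmt)
    return parsed.strftime("%Y-%m-%d %H:%M:%S")


def utc_to_beijing(dt_utc: datetime) -> str:
    """Convert a timezone-aware UTC datetime to normalized Beijing time."""
    return dt_utc.astimezone(BEIJING).strftime("%Y-%m-%d %H:%M:%S")


normalized = normalize_time("August 10, 2019 1:45")
converted = utc_to_beijing(datetime(2019, 8, 9, 17, 45, tzinfo=timezone.utc))
```

Because the normalized strings are zero-padded and ordered from year down to second, they sort chronologically as plain text, which simplifies later comparisons.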
  • attribute item or behavior item will not be aggregated. By traversing all attribute items and behavior items in O(N), at most one feature value that best fits the spatiotemporal conditions is selected for each attribute item and behavior item. This attribute information and behavior information is aggregated to form the state information aggregation result of the object under the specific spatiotemporal conditions.
  • State information is not limited to object features explicitly stated as belonging to the current time and place; it also includes the latest progress of all object features from earlier times up to the present, which ensures that the aggregated result is comprehensive and complete.
  • S represents the state of the object O(N) at time t and location l
  • a_l and a_ls are the attribute features describing the state
  • b_n and b_nu are the behavior features describing the state
  • <T, S> is the time and place at which the attribute and behavior features arise.
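A hedged sketch of the state selection follows. It assumes the aggregated object is stored as a mapping from feature item to time-stamped values, that times are already normalized to "YYYY-MM-DD HH:MM:SS", and that "best fits" means the most recent value at the queried place at or before the queried time. That last reading is an interpretation of the text, and exact string matching on place names is a further simplification; `aggregate_state` is a hypothetical name.

```python
from typing import Dict, List


def aggregate_state(o_n: Dict[str, List[dict]], t: str, place: str) -> Dict[str, str]:
    """Pick at most one value per feature item for the node (t, place)."""
    state = {}
    for item, entries in o_n.items():
        candidates = [e for e in entries
                      if e["place"] == place and e["time"] <= t]
        if candidates:
            # Normalized time strings sort chronologically, so max() is the latest.
            latest = max(candidates, key=lambda e: e["time"])
            state[item] = latest["value"]
    return state


# Aggregated "wind force" values from the example in this document.
o_n = {
    "wind force": [
        {"value": "16", "time": "2019-08-10 01:45:00", "place": "Wenling City"},
        {"value": "9", "time": "2019-08-11 20:50:00", "place": "Qingdao City"},
    ],
}
state = aggregate_state(o_n, "2019-08-10 12:00:00", "Wenling City")
```

Items with no qualifying value are simply left out of the state, mirroring the rule that such an item is not aggregated.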
  • The process information aggregation in step 4 includes two parts: state sequence aggregation and event process aggregation.
  • A process is the connection of different states across time and space, and its dynamics are reflected in the changes of attribute information and behavior information between states.
  • A typhoon event comprises the evolution of multiple objects during the event, so the process of a typhoon event is composed of the different states of multiple objects. Process-layer aggregation therefore uses step-by-step decomposition, abstracting the connection between state information and process information into three stages: object state, state sequence and event process (Figure 6).
  • The object state aggregates the attribute information and behavior information of an object at a particular time and place; the state sequence records the evolution of a single object by aggregating its different states; the event process is the joint evolution of multiple objects and consists of multiple state sequences.
  • Performing state sequence aggregation includes:
  • Sort all the state aggregation results. First, sort by the time information of the state, in chronological or reverse chronological order; second, sort by the location information of the state, by spatial scale from large to small or from small to large; finally, sort by the attribute information and behavior information of the state, either by the magnitude or level of the feature value or by similarity to the user's aggregation condition.
  • The sequence of states arranged by these three dimensions is the process aggregation result for a single object.
  • S3: Information organization of state sequence aggregation results.
  • The organizational form of the state sequence information aggregation results can be expressed as in Figure 5.
  • P represents the process experienced by the object O(N) over the temporal range tr and the spatial range sr
  • S represents the object state at the spatiotemporal node <t_n, l_n>.
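The three-level ordering of states (time, then location scale, then feature value) can be sketched with Python's stable sort. The `SCALE_RANK` table, the `wind_force` field and the province-level sample row are assumptions made up for this illustration; the patent does not prescribe a concrete scale ranking or field layout.

```python
from typing import List

# Hypothetical ranking of location scale, from large to small.
SCALE_RANK = {"province": 0, "city": 1, "district": 2}


def sort_states(states: List[dict], newest_first: bool = False) -> List[dict]:
    """Order states by time, then location scale, then feature value."""
    # Python's sort is stable, so sort by the lowest-priority key first.
    ordered = sorted(states, key=lambda s: s["wind_force"], reverse=True)
    ordered = sorted(ordered, key=lambda s: SCALE_RANK[s["scale"]])
    ordered = sorted(ordered, key=lambda s: s["time"], reverse=newest_first)
    return ordered


# The province-level row is invented to show the scale tie-break.
states = [
    {"time": "2019-08-11 20:50:00", "place": "Qingdao City",
     "scale": "city", "wind_force": 9},
    {"time": "2019-08-10 01:45:00", "place": "Wenling City",
     "scale": "city", "wind_force": 16},
    {"time": "2019-08-10 01:45:00", "place": "Zhejiang Province",
     "scale": "province", "wind_force": 16},
]
sequence = sort_states(states)
```

Sorting in separate passes lets each level choose its own direction, so reversing the time order does not also reverse the scale and feature-value tie-breaks.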
  • performing event process aggregation includes:

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Business, Economics & Management (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Economics (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Primary Health Care (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An information aggregation method for typhoon events comprises the following main steps: step 1, acquiring, from social media, message texts related to typhoon events, extracting typhoon event information therefrom, and converting the information into structured information tuples; step 2, performing object information aggregation on the basis of levels of similarity between multiple features; step 3, performing state information aggregation on the basis of spatio-temporal features; and step 4, performing process information aggregation on the basis of states, wherein the step of performing process information aggregation on the basis of states comprises: screening object information aggregation results to obtain information of spatio-temporal nodes that meet time and location range requirements, performing state information aggregation with respect to the spatio-temporal nodes respectively, and arranging multiple state information aggregation results in order so as to form a process information aggregation result having dynamic characteristics. The information aggregation method for typhoon events is used to screen, organize and integrate typhoon event information acquired from scattered sources in social media, thereby providing an ordered information basis for assessing development stages and situations in typhoon event processes.

Description

Typhoon event information aggregation method
Technical field
The invention belongs to the field of big data mining, and in particular relates to a typhoon event information aggregation method.
Background
Typhoons can have a severe, destructive impact on the natural environment, on society and the economy, and even on sustainable human development, so timely access to information on how a typhoon event evolves is an important basis and reference for disaster emergency response. In the current big data environment, social media, with its high update frequency, multi-source communication channels and broad participation, has shown great potential in disaster management and has gradually become a new way to obtain information on typhoon events. However, because social media messages are short texts, the information is highly fragmented, its forms of expression are complex and varied, and its granularity is inconsistent. Massive, scattered social media information not only fails to reflect the full picture of how a typhoon event evolves, but also hinders users from effectively tracking the course of the event.
信息聚合方法通过对信息资源的有效描述,来提高信息组织的合理性并优化访问效率,以满足用户获取有效信息资源的需求和便利性。面向灾害事件的信息聚合方式主要包括基于统计的方法、基于主题模型的方法和基于知识元的方法:(1)统计方法是利用词频、TF-IDF、N-gram、互信息等统计特征计算信息单元中的关键词权重,从中选取最具代表性的关键词并基于此进行聚合。该类方法简单主观、易于理解,但由于关键词筛选精度不高,一般需要结合辅助信息进行二次筛选。(2)概率主题模型假设每个文档在所有主题词上都存在一个潜在分布,可以利用主题词概率分布表示信息单元中的主题。然而,该类方法的效果依赖于主题个数的确定,在现实中社交媒体中不同主题一直处于动态变化。社交媒体的同一条消息中可能包含多个主题的内容,也使得主题词的可解释性存在较大争议。(3)知识元是对不同概念间的逻辑关系和层次结构进行定义,常见知识元形式有本体、语义网络、关联数据等。基于知识元的聚合是以知识元理论为基础,通过构建描述灾害事件结构的概念模型,根据模型中定义的语义关系进行信息重新序化和组织,以揭示信息特征及其关联。The information aggregation method improves the rationality of information organization and optimizes the access efficiency through the effective description of information resources, so as to meet the needs and convenience of users to obtain effective information resources. Information aggregation methods for disaster events mainly include statistical-based methods, topic model-based methods and knowledge element-based methods: (1) Statistical methods use statistical features such as word frequency, TF-IDF, N-gram, and mutual information to calculate information. The keyword weight in the unit, from which the most representative keywords are selected and aggregated based on this. This kind of method is simple, subjective and easy to understand, but due to the low accuracy of keyword screening, it is generally necessary to combine auxiliary information for secondary screening. (2) The probabilistic topic model assumes that each document has a latent distribution over all the topic words, and the topic word probability distribution can be used to represent the topics in the information unit. However, the effect of this type of method depends on the determination of the number of topics. In reality, different topics in social media are always changing dynamically. The same message on social media may contain content of multiple topics, which also makes the interpretability of topic words more controversial. 
(3) Knowledge element is to define the logical relationship and hierarchical structure between different concepts. Common knowledge element forms include ontology, semantic network, linked data and so on. Knowledge element-based aggregation is based on knowledge element theory. By constructing a conceptual model describing the structure of disaster events, information is reordered and organized according to the semantic relationships defined in the model to reveal information features and their associations.
At present, statistics-based and topic-model-based methods are the most common ways of aggregating disaster event information. However, the results they produce are coarse-grained and usually do no more than gather the various kinds of information related to a disaster event in one place. By comparison, knowledge-element-based aggregation can decompose and reorganize the original resources according to the conceptual system of the disaster domain, yielding deeply aggregated results with a definite knowledge structure. Existing knowledge models of typhoon events, however, focus mostly on the hierarchy and relationships of the concepts involved and neglect the description and representation of the dynamic process of a typhoon event. Because social media resources are massive, complex in type and widely scattered, an information aggregation method is needed that integrates typhoon event information in an orderly way according to the evolution of the event.
SUMMARY OF THE INVENTION
The purpose of the present invention is to provide a typhoon event information aggregation method that screens, organizes and integrates typhoon event information from scattered social media sources. The method provides an ordered information basis for detecting the development stage and situation of a typhoon event, and helps improve the service capability of social media resources in emergency management.
To achieve the above purpose, the present invention provides the following technical solution:
A typhoon event information aggregation method, the main steps of which are as follows:
Step 1. Collect message texts related to the typhoon event from social media, extract typhoon event information from them, and convert it into structured information tuples;
Step 2. Object information aggregation based on multi-feature similarity: whether information tuples belong to the same object is judged from the similarity between their object names, and the information tuples describing the same object are aggregated;
Step 3. State information aggregation based on spatiotemporal features: from the object information aggregation result, select the attribute values and behavior values that satisfy a single time and location condition; the time information, the location information and the selected attribute and behavior values together constitute the state information aggregation result of the object at that specific time and place;
Step 4. State-based process information aggregation: from the object information aggregation result, select the spatiotemporal nodes that fall within the required time and location ranges, perform state information aggregation for each of these nodes, and sort the resulting state information aggregation results to form a process information aggregation result that reflects the dynamic characteristics of the event.
Preferably, in step 1, the typhoon event information includes the object name, time information, location information, attribute information and behavior information.
Preferably, in step 2, for different information tuples describing the same object, attribute items and behavior items of the same type are further aggregated.
Preferably, in step 1, typhoon event information extraction includes at least two parts, information element identification and information element association:
Information element identification: determine the constituent objects of a typhoon event and build a classification system, then extract from social media texts the names and feature information describing the different types of objects, where the feature information includes time, location, attributes and behaviors. Attribute information can be further divided into attribute items and attribute values: an attribute item denotes the type of an attribute, and the attribute value is the data or quantity that an attribute of that type takes. Behavior information is structured in the same way;
Information element association: within a single social media text, associate the feature information with the name of the object it characterizes, forming an information tuple of the form O_n = <T, L, A, B>, where O_n is the object name, T is the time information, L is the location information, A is the attribute information and B is the behavior information.
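The tuple form O_n = <T, L, A, B> can be pictured as a small record type. The following is a minimal sketch only: the class name `InfoTuple`, its field names and the use of dictionaries for attribute and behavior items are illustrative assumptions, not part of the claimed method. The sample values are taken from the worked example given in the embodiment.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class InfoTuple:
    """Hypothetical container for one information tuple O_n = <T, L, A, B>."""
    name: str                       # O_n: object name
    time: Optional[str] = None      # T: time information
    location: Optional[str] = None  # L: location information
    # A: attribute item -> attribute value
    attributes: Dict[str, Optional[str]] = field(default_factory=dict)
    # B: behavior item -> behavior value (None when the text gives no value)
    behaviors: Dict[str, Optional[str]] = field(default_factory=dict)


# Values from the embodiment's example; the field layout is an assumption.
example = InfoTuple(
    name="typhoon",
    time="2019-08-10 01:45:00",
    location="Wenling City, Zhejiang Province",
    attributes={"wind force": "level 16"},
    behaviors={"landfall": None},
)
```

Keeping attribute and behavior items as mappings leaves room for tuples in which either A or B is missing, a case the description explicitly allows.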
Preferably, in step 2, word-vector similarity is used to judge the similarity between object names, between attribute items and between behavior items, including the following steps:
S1. Segment all of the social media text data into words;
S2. Use the word segmentation result as the training set and train word vectors with the Skip-gram model;
S3. Given object names O_n1 and O_n2, attribute items A_1 and A_2, and behavior items B_1 and B_2, obtain their word vectors E(O_n1), E(O_n2), E(A_1), E(A_2), E(B_1) and E(B_2) from the trained word vector model;
S4. Use cosine similarity to compute the similarity values sim_n, sim_a and sim_b between E(O_n1) and E(O_n2), between E(A_1) and E(A_2), and between E(B_1) and E(B_2), respectively. If sim_n ≥ ε_n, sim_a ≥ ε_a and sim_b ≥ ε_b, where ε_n, ε_a and ε_b are thresholds, then O_n1 and O_n2, A_1 and A_2, and B_1 and B_2 are the same object name, attribute item and behavior item, respectively, and the corresponding information can be aggregated.
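Step S4 can be sketched as follows. This is an illustration under stated assumptions: the three-dimensional vectors in `E` and the threshold `EPS_N` are made-up toy values standing in for the output of a trained Skip-gram model and for the threshold ε_n, and the helper names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def same_item(u: Sequence[float], v: Sequence[float], threshold: float) -> bool:
    """True when the similarity reaches the threshold (sim >= epsilon)."""
    return cosine_similarity(u, v) >= threshold


# Toy word vectors (assumed values, not produced by a real model).
E = {
    "typhoon": [0.9, 0.1, 0.3],
    "tropical cyclone": [0.8, 0.2, 0.35],
    "road": [0.1, 0.9, 0.0],
}
EPS_N = 0.9  # assumed object-name threshold

merge_cyclone = same_item(E["typhoon"], E["tropical cyclone"], EPS_N)
merge_road = same_item(E["typhoon"], E["road"], EPS_N)
```

The same function would be applied to attribute items and behavior items with their own thresholds ε_a and ε_b.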
Preferably, in step 4, the multiple state information aggregation results are sorted as follows:
A1. By the time information of the states, in chronological or reverse chronological order;
A2. By the location information of the states, from large scale to small or from small to large;
A3. By the attribute information and behavior information of the states, either by the magnitude or level of the feature values or by their similarity to the user's aggregation condition.
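The orderings in A1 and A3 amount to sorting state records with different keys. The sketch below assumes each state is a dictionary with a normalized `time` string and a numeric `wind_level` field; both the layout and the field names are hypothetical. Ordering by spatial scale (A2) would need a scale value per location and is not shown.

```python
from datetime import datetime

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

# Two states built from the embodiment's example values (layout assumed).
states = [
    {"time": "2019-08-11 20:50:00",
     "location": "Qingdao City, Shandong Province", "wind_level": 9},
    {"time": "2019-08-10 01:45:00",
     "location": "Wenling City, Zhejiang Province", "wind_level": 16},
]


def time_key(state: dict) -> datetime:
    """Parse the normalized time string so that sorting is chronological."""
    return datetime.strptime(state["time"], TIME_FORMAT)


# A1: chronological and reverse chronological order.
by_time = sorted(states, key=time_key)
by_time_desc = sorted(states, key=time_key, reverse=True)

# A3: order by the level of a feature value, strongest wind first.
by_wind = sorted(states, key=lambda s: s["wind_level"], reverse=True)
```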
The above technical solution achieves the following technical effects:
The present invention provides a social-media-based method for aggregating typhoon event process information. After identifying the information tuples of the different objects related to a typhoon event in social media texts, it defines a multi-level aggregation mode organized as "object - state - process". First, at the object layer, the scattered feature information of the same object is aggregated according to the similarity of multi-dimensional features. Second, at the state layer, the attribute and behavior information of an object that matches specific spatiotemporal features is aggregated, unifying the spatiotemporal granularity of the information. Finally, at the process layer, multiple states are sorted according to their spatiotemporal relationships, so that the information is organized in an orderly way. This aggregation mode addresses the scattered, multi-granular and unordered way in which information is described on social media and takes full account of the dynamic evolution of typhoon events. It can retrieve the feature information of different objects at any spatiotemporal node and produce ordered information that reflects the process characteristics of a typhoon event. In practical applications, it can play an important role in meeting both the emergency response needs of government agencies and the public's need to understand how the event unfolds.
BRIEF DESCRIPTION OF THE DRAWINGS
Figure 1 shows the multi-level aggregation mode for typhoon event process information;
Figure 2 shows the spatiotemporal semantic units constructed in social media text;
Figure 3 shows an example of typhoon event information extracted from social media;
Figure 4 shows the organizational structure of the object information aggregation result, with an example;
Figure 5 shows the organizational structure of the state information aggregation result, with an example;
Figure 6 shows the different stages of process information aggregation;
Figure 7 shows the organizational structure of the process information aggregation result, with an example.
DETAILED DESCRIPTION
The present invention is further described below with reference to the drawings and a specific embodiment.
Embodiment
The present invention discloses a social-media-based method for aggregating typhoon event process information, including:
Step 1. Collect message texts related to the typhoon event from social media, extract typhoon event information from them, including the object name, time information, location information, attribute information and behavior information, and convert it into structured information tuples.
Step 2. Object information aggregation based on multi-feature similarity. Whether information tuples belong to the same object is judged from the similarity between their object names, and the information tuples describing the same object are aggregated. For different information tuples describing the same object, attribute items and behavior items of the same type are further aggregated.
Step 3. State information aggregation based on spatiotemporal features. From the object information aggregation result, select the attribute values and behavior values that satisfy a single time and location condition. The time information, the location information and the selected attribute and behavior values together constitute the state information aggregation result of the object at that specific time and place.
Step 4. State-based process information aggregation. From the object information aggregation result, select the spatiotemporal nodes that fall within the required time and location ranges, perform state information aggregation for each of these nodes, and sort the resulting state aggregation results to form a process information aggregation result that reflects the dynamic characteristics of the event.
As a preferred technical solution, typhoon event information extraction in step 1 includes:
1. Determine the constituent objects of a typhoon event and build a classification system, then extract from social media texts the names and feature information describing the different types of objects, where the feature information includes time, location, attributes and behaviors. Attribute information can be further divided into attribute items and attribute values: an attribute item denotes the type of an attribute, and the attribute value is the data or quantity that an attribute of that type takes. Behavior information is structured in the same way.
2. Within a single social media text, associate the feature information with the name of the object it characterizes, forming an information tuple of the form O_n = <T, L, A, B>, where O_n is the object name, T is the time information, L is the location information, A is the attribute information and B is the behavior information.
As a preferred technical solution, the constituent objects of a typhoon event are divided into subject objects and affected objects. The cyclone, as the hazard-causing factor, is the subject object of the event, while the other objects that the cyclone damages, acts on or affects are the affected objects of the event. Affected objects can be classified by their nature; the main types include people, infrastructure, transportation facilities and social activities. The objects can be subdivided further as needed, drawing on the classification schemes of the relevant fields (Table 1).
Table 1 Main object types in typhoon events
Figure PCTCN2021072796-appb-000001
As a preferred technical solution, extracting from social media texts the names and feature information describing the different types of objects includes:
S1. Build an annotated corpus of typhoon event information in social media texts. The annotations cover the name, time, location, attribute and behavior information elements that describe the different types of objects.
S2. Using the annotated corpus, build a time information extraction model based on a conditional random field model to identify time information in social media texts automatically.
S3. Using the annotated corpus, build a location information extraction model based on a deep belief network to identify location information in social media texts automatically.
S4. Using the annotated corpus, derive rule models for object names, attribute information and behavior information, including trigger-word dictionaries and syntactic patterns, to identify object names, attribute information and behavior information in social media texts automatically.
As a preferred technical solution, associating the information elements extracted from social media includes:
S1. Construction of spatiotemporal semantic units. Characters, words, phrases, clauses, sentences and paragraphs are all linguistic units of a text, and the semantic relationships between linguistic units form the basic structure of the text. A linguistic unit, or a combination of linguistic units, that expresses a complete meaning is a semantic unit. When a semantic unit contains time information and spatial information, so that the spatiotemporal features of its content are stated explicitly, this method defines it as a spatiotemporal semantic unit.
An analysis of social media texts about typhoon events shows that the distribution of spatiotemporal semantic units falls roughly into three categories: (1) texts that describe object information for a single time and location, which make up the majority of social media texts; (2) texts that describe object information for different locations at the same time, which are comparatively few; (3) texts that list and compare object information for multiple times and locations, which are comprehensive reports and are very few.
Spatiotemporal information makes it possible to track how object features change within a text. This method therefore divides social media texts into spatiotemporal semantic units on the basis of the extracted spatiotemporal information (Figure 2). The positions at which time and location information occur in the text serve as the basis for the division, specifically:
(1) In the first case, because there is only one piece of time information and one piece of location information, the whole text is treated as a single spatiotemporal semantic unit.
(2) In the second and third cases, the text is first divided into time units according to the time information. When a time unit contains more than one piece of location information, it is divided further according to the location information, and the resulting spatiotemporal semantic units share the time information of their time unit.
S2. Association rules between object names and feature information. Once a social media text has been divided into spatiotemporal semantic units, the identified object names and feature information are distributed across those units, so the information elements can be organized according to the unit to which each belongs. Within each spatiotemporal semantic unit, the information elements are associated in the following order:
(1) Association of feature trigger words with feature values. A feature trigger word and a feature value together constitute the feature information of an object; here this refers specifically to attribute features and behavior features. Feature trigger words denote attribute items and behavior items, and feature values denote attribute values and behavior values. Trigger words and feature values follow a proximity pattern in text, forming a "feature trigger word - feature value" structure. Counting the words in the three positions preceding each attribute value shows that a feature trigger word appears there in more than 99% of cases. Each feature value is therefore associated with the nearest feature trigger word that precedes it.
(2) Association of attribute and behavior information with object names. In Chinese, the usual way of expressing things is to name the object first and then describe its various features. Within a spatiotemporal semantic unit, attribute information and behavior information are therefore each associated with the nearest object name that precedes them.
(3) Association of object names with time and location information. The time information and location information of the spatiotemporal semantic unit in which an object name occurs are associated with that object name.
The object names and feature information associated in this way are filled into tuples of the form O_n = <T, L, A, B> (Figure 3). Note that a spatiotemporal semantic unit may describe only one aspect of the typhoon event, so an object information tuple may lack either its attribute or its behavior information.
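The nearest-preceding rules (1) and (2) can be illustrated with a single pass over the information elements of one spatiotemporal semantic unit. This is a simplified sketch: the input format (a list of `(kind, text)` pairs in text order), the `kind` labels and the sample elements are assumptions, and attribute items and behavior items are not distinguished here.

```python
from typing import Dict, List, Optional, Tuple

Element = Tuple[str, str]  # (kind, text); kind is "object", "trigger" or "value"


def associate(elements: List[Element]) -> Dict[str, Dict[str, Optional[str]]]:
    """Attach each value to the nearest preceding trigger word, and each
    trigger word to the nearest preceding object name."""
    result: Dict[str, Dict[str, Optional[str]]] = {}
    current_object: Optional[str] = None
    current_trigger: Optional[str] = None
    for kind, text in elements:
        if kind == "object":
            current_object = text
            current_trigger = None  # a new object starts a fresh context
            result.setdefault(text, {})
        elif kind == "trigger" and current_object is not None:
            current_trigger = text
            result[current_object].setdefault(text, None)
        elif (kind == "value" and current_object is not None
              and current_trigger is not None):
            result[current_object][current_trigger] = text
    return result


# Hypothetical elements recognized in one spatiotemporal semantic unit.
unit = [
    ("object", "typhoon"),
    ("trigger", "wind force"),
    ("value", "level 16"),
    ("trigger", "landfall"),
    ("object", "road"),
    ("trigger", "closed"),
]
associated = associate(unit)
```

Rule (3) would then copy the unit's time and location onto every object found in `associated`.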
As a preferred technical solution, object information aggregation in step 2 includes:
1. Aggregation based on object names. Let N be the object name specified in the aggregation condition. For each information tuple, compute the similarity sim_n between its object name O_n and N. If sim_n ≥ ε_n, where ε_n is the object similarity threshold, the two names denote the same object, and the information tuples of that object are merged.
The similarity of object names is measured with the word-vector similarity method. Using a word vector model trained with the Skip-gram model, the method first maps each object name to a vector in a multi-dimensional space and then judges whether the vectors point in the same direction in that space, with cosine similarity as the measure.
For example, O(typhoon) = <1:45 on August 10, 2019; Wenling City, Zhejiang Province; wind force: level 16; landfall> and O(tropical cyclone) = <20:50 on August 11, 2019; Qingdao City, Shandong Province; wind force: level 9; landfall> are information tuples extracted from social media. With "typhoon" as the object name of the aggregation condition, the object names "typhoon" and "tropical cyclone" in the information tuples are each tested for similarity. Both refer to the cyclone itself, so both information tuples are included in the aggregation result.
2. Aggregation incorporating object features. After the information tuples of the same object have been aggregated, several pieces of attribute and behavior feature information of the same type will be present, and object information matching specific features can be aggregated further. Starting from the result of the name-based aggregation, specify an object attribute feature A and a behavior feature B as the aggregation condition. For attribute features, the word-vector similarity method is used to compute the similarity sim_a between an attribute item of O_n and A. If sim_a ≥ ε_a, where ε_a is the attribute similarity threshold, the attribute items are the same and their information can be aggregated, with each attribute value and its spatiotemporal features retained after aggregation. Otherwise they are different attribute items describing the same object and are not aggregated.
For behavior features, the word-vector similarity method is likewise used to compute the similarity sim_b between a behavior item of O_n and B. If sim_b ≥ ε_b, where ε_b is the behavior similarity threshold, the behavior items are the same and their information can be aggregated, with each behavior value and its spatiotemporal features retained after aggregation. Otherwise they are different behavior items describing the same object and are not aggregated.
For example, starting from the O(typhoon) and O(tropical cyclone) information tuples above, the "wind force" attribute information of the typhoon is aggregated further. Both O(typhoon) and O(tropical cyclone) contain the attribute item "wind force", which meets the similarity threshold, so <1:45 on August 10, 2019; Wenling City, Zhejiang Province; wind force: level 16> and <20:50 on August 11, 2019; Qingdao City, Shandong Province; wind force: level 9> form the aggregation result for this object feature.
3. Information organization of the object aggregation result. The object information aggregation result is organized as shown in Figure 4, where O(N) denotes the aggregated object, A_l is an attribute item of the aggregated object, a_ls is a specific attribute value, B_n is a behavior item of the aggregated object, b_nu is a specific behavior value, and <T, S> is the time and place at which the attribute value or behavior value occurred. The originally scattered fragments of information are all associated with the object they describe, identical attribute items and behavior items of the object are merged, and each attribute item and behavior item contains the different feature values observed under multiple spatiotemporal conditions.
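The organization described for Figure 4 can be approximated by a nested mapping from each attribute or behavior item to a list of (value, T, S) records. In this sketch the function `aggregate_object`, the dictionary layout and the `CYCLONE_NAMES` set are assumptions. The set stands in for the word-vector test sim_n ≥ ε_n, items are merged by exact string match rather than by sim_a or sim_b, and the "power line" tuple is an invented illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List


def aggregate_object(tuples: List[dict],
                     is_same_object: Callable[[str], bool]) -> Dict[str, dict]:
    """Merge the tuples of one object into item -> [(value, T, S), ...]."""
    attributes = defaultdict(list)
    behaviors = defaultdict(list)
    for tup in tuples:
        if not is_same_object(tup["name"]):
            continue  # tuple describes a different object
        for item, value in tup["attributes"].items():
            attributes[item].append((value, tup["time"], tup["location"]))
        for item, value in tup["behaviors"].items():
            behaviors[item].append((value, tup["time"], tup["location"]))
    return {"attributes": dict(attributes), "behaviors": dict(behaviors)}


tuples = [
    {"name": "typhoon", "time": "2019-08-10 01:45:00",
     "location": "Wenling City, Zhejiang Province",
     "attributes": {"wind force": "level 16"}, "behaviors": {"landfall": None}},
    {"name": "tropical cyclone", "time": "2019-08-11 20:50:00",
     "location": "Qingdao City, Shandong Province",
     "attributes": {"wind force": "level 9"}, "behaviors": {"landfall": None}},
    {"name": "power line", "time": "2019-08-10 03:00:00",  # hypothetical tuple
     "location": "Wenling City, Zhejiang Province",
     "attributes": {"status": "damaged"}, "behaviors": {}},
]

CYCLONE_NAMES = {"typhoon", "tropical cyclone"}  # stand-in for sim_n >= eps_n
aggregated = aggregate_object(tuples, lambda name: name in CYCLONE_NAMES)
```

Each record keeps its own time and place, which is what allows the later state aggregation to select values for a particular spatiotemporal node.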
As a preferred technical solution, state information aggregation in step 3 includes:
1. Unification of the spatiotemporal reference. The spatiotemporal framework is the basis on which states exist, so a unified spatiotemporal reference must be established for state information aggregation. In this method, dates follow the Gregorian calendar, times are given in Beijing time, and the spatial reference is the CGCS2000 coordinate system.
2. Normalization of spatiotemporal information. Time information and location information are the basis for deciding whether the associated attribute and behavior information describes the state of an object under specific spatiotemporal conditions. Time information is normalized using the Gregorian year, calendar date and clock time, in line with everyday usage. The normalized form is "date + time" in the format "YYYY-MM-DD HH:MM:SS", for example "2019-08-10 12:00:00". Location information should be converted into a normalized representation under the unified spatial reference, covering place names, addresses and spatial coordinates. Place names can follow the standard names, codes and categories published by the state at the relevant time; the types of address elements and the ways they are combined can follow national or industry standards; and spatial coordinates should be transformed as the spatial reference requires.
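The time normalization can be sketched for one common surface form. The regular expression below handles only expressions written like "2019年8月10日1:45", the form used in the embodiment's example; the function name and the decision to return `None` for unrecognized input are assumptions. Relative expressions and time zone conversion are outside the scope of this sketch.

```python
import re
from datetime import datetime
from typing import Optional

# Matches e.g. "2019年8月10日1:45" or "2019年8月11日20:50:30".
_TIME_PATTERN = re.compile(
    r"(\d{4})年(\d{1,2})月(\d{1,2})日\s*(\d{1,2}):(\d{2})(?::(\d{2}))?"
)


def normalize_time(text: str) -> Optional[str]:
    """Return the time in "YYYY-MM-DD HH:MM:SS" form, or None if not found."""
    match = _TIME_PATTERN.search(text)
    if match is None:
        return None
    year, month, day, hour, minute, second = match.groups()
    moment = datetime(int(year), int(month), int(day),
                      int(hour), int(minute), int(second or 0))
    return moment.strftime("%Y-%m-%d %H:%M:%S")


normalized = normalize_time("2019年8月10日1:45")
```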
3. State-oriented aggregation. Specify the time feature t and the location feature l of the aggregation. Based on the object-layer aggregation result O(N), check each attribute item and behavior item in O(N) for a feature value (attribute value or behavior value) with T = t and S = l; if one exists, take it as aggregated information. Otherwise, check for a feature value with S = l and T < t whose time is closest to t; if one exists, take it as aggregated information. If none exists, check for a feature value whose location S is near l, with T < t and closest to t; if one exists, take it as aggregated information. If there is still none, the attribute item or behavior item is not aggregated. After all attribute items and behavior items in O(N) have been traversed, at most one feature value, the one that best matches the spatiotemporal features, has been selected from each item. These pieces of attribute and behavior information are aggregated to form the state information aggregation result of the object under the specific spatiotemporal conditions.
For example, a social media message records that the cyclone's wind force reached level 16 in Wenling City, Zhejiang Province, at 1:45 on August 10. When the cyclone state at (2:00, Wenling City) is aggregated, there has been no update on wind force between 1:45 and 2:00, so "wind force level 16" is taken as one attribute feature of the cyclone's state at (2:00, Wenling City). With this mechanism, the state information in the aggregation result for any spatiotemporal node is not limited to the object features explicitly stated for that time and place. It also includes the most recent values of all object features reported at earlier times, which makes the aggregation result comprehensive and complete.
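The three-stage fallback can be sketched as follows. The input layout, item -> list of (value, T, S) records with normalized time strings, is an assumption, as are the function names. The description does not say how "near l" is decided, so this sketch takes a caller-supplied collection of neighboring place names, which is purely an illustrative choice.

```python
from datetime import datetime
from typing import Dict, Iterable, List, Optional, Tuple

Record = Tuple[Optional[str], str, str]  # (value, T, S)
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def pick_record(records: List[Record], t: str, l: str,
                nearby: Iterable[str] = ()) -> Optional[Record]:
    """Select at most one record for the node (t, l) using the fallback order:
    exact match, then latest earlier record at l, then latest earlier nearby."""
    target = datetime.strptime(t, TIME_FORMAT)
    timed = [(datetime.strptime(r[1], TIME_FORMAT), r) for r in records]
    for when, record in timed:
        if when == target and record[2] == l:
            return record
    near = set(nearby)
    for places in ({l}, near):
        earlier = [(when, rec) for when, rec in timed
                   if rec[2] in places and when < target]
        if earlier:
            return max(earlier, key=lambda pair: pair[0])[1]
    return None


def aggregate_state(obj: Dict[str, Dict[str, List[Record]]], t: str, l: str,
                    nearby: Iterable[str] = ()) -> Dict[str, Record]:
    """Build the state at (t, l): one selected record per item, where found."""
    state: Dict[str, Record] = {}
    for kind in ("attributes", "behaviors"):
        for item, records in obj.get(kind, {}).items():
            chosen = pick_record(records, t, l, nearby)
            if chosen is not None:
                state[item] = chosen
    return state


cyclone = {
    "attributes": {"wind force": [
        ("level 16", "2019-08-10 01:45:00", "Wenling City"),
        ("level 9", "2019-08-11 20:50:00", "Qingdao City"),
    ]},
    "behaviors": {},
}
state_at_two = aggregate_state(cyclone, "2019-08-10 02:00:00", "Wenling City")
```

Here `state_at_two` carries forward the 1:45 wind-force report, mirroring the example in the text.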
4、状态聚合结果的信息组织。状态信息聚合结果的组织形式可以表达为图5。其中,S表示对象O(N)在时间t和位置l上存在的状态,A l和a ls描述状态的属性特征,B n和bn u是 描述状态的行为特征,<T,S>则是属性和行为特征产生的时间和位置。 4. Information organization of state aggregation results. The organizational form of the state information aggregation results can be expressed as Figure 5. Among them, S represents the state of the object O(N) existing at time t and location l, A l and a ls describe the attribute characteristics of the state, B n and bn u are the behavioral characteristics describing the state, and <T, S> is When and where attributes and behavioral characteristics arise.
As a preferred technical solution, the process information aggregation in step 4 comprises two parts: state sequence aggregation and event process aggregation. A process is the connection of different states in time and space, and its dynamics show up as changes in the attribute and behavior information of those states. A typhoon event covers the evolution of multiple objects during the event, so the process of a typhoon event is made up of the different states of multiple objects. Process-layer aggregation therefore uses stepwise decomposition, abstracting the connection from state information to process information into three levels: object state, state sequence, and event process (Figure 6). An object state aggregates the attribute and behavior information of an object at one point in time and space; a state sequence records the evolution of a single object and is obtained by aggregating that object's different states; an event process is the joint evolution of multiple objects and consists of multiple state sequences.
As a preferred technical solution, state sequence aggregation includes:
S1. Set the time range tr and spatial range sr for the aggregation. Working from the object information aggregation result O(N), traverse all attribute items and behavior items in O(N) in turn. In each attribute item and behavior item, determine whether there is an attribute value or behavior value that satisfies both conditions
Figure PCTCN2021072796-appb-000002
and
Figure PCTCN2021072796-appb-000003
and collect every <T, S> that falls within tr and sr into a set of spatiotemporal nodes. For each node in the set, run the state aggregation of step 3 to obtain multiple state aggregation results.
S2. Sort all state aggregation results. First sort by the time information of the states, in chronological or reverse chronological order; next by their location information, from large scale to small or from small to large; finally by their attribute and behavior information, either by the magnitude or level of the feature values or by similarity to the user's aggregation condition. The state sequence ordered on these three dimensions is the process aggregation result for a single object.
S3. Information organization of the state sequence aggregation result. The state sequence aggregation result can be organized as shown in Figure 5, where P denotes the process undergone by object O(N) over the time range tr and spatial range sr, and S denotes the object state at the spatiotemporal node <t_n, l_n>.
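Steps S1 and S2 of state sequence aggregation can be sketched roughly as below. The data shapes are assumptions made for illustration only: each feature value is a `(value, T, S)` tuple, the spatial range sr is modeled as a set of place names, `scale_rank` is a hypothetical mapping from place name to spatial scale (smaller rank meaning larger scale), and each state carries a numeric `level` standing in for the magnitude of its feature values.

```python
def collect_nodes(obj, t_range, locations):
    """S1: gather every (T, S) node whose time lies in tr and whose place lies in sr."""
    t_min, t_max = t_range
    return {
        (t, loc)
        for values in obj.values()          # one list per attribute/behavior item
        for (_value, t, loc) in values
        if t_min <= t <= t_max and loc in locations
    }


def sort_states(states, newest_first=False, scale_rank=None):
    """S2: order states by time, then spatial scale, then feature level."""
    scale_rank = scale_rank or {}

    def key(state):
        t, loc = state["node"]
        t_key = -t if newest_first else t   # chronological or reverse order
        return (t_key, scale_rank.get(loc, 0), state.get("level", 0))

    return sorted(states, key=key)
```

Using a tuple as the sort key gives the time dimension priority, with spatial scale and feature level acting only as tie-breakers, which matches the order in which S2 lists the three criteria.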
As a preferred technical solution, event process aggregation includes:
S1. Set the time range tr and spatial range sr for the aggregation. Working from the aggregation results of multiple objects, O(N_s) through O(N_t), first traverse all attribute items and behavior items in O(N_s) to obtain the <T, S> that fall within tr and sr. Then continue with O(N_(s+1)), and so on until O(N_t) has been traversed. Collect every <T, S> that falls within tr and sr into a set of spatiotemporal nodes.
S2. The state sequences of the multiple objects must all use the same sorting mechanism so that the overall order of the aggregation result is consistent. For an event-process aggregation result, comparing the state features at successive time nodes reveals movement in the spatial features and differences in the attribute and behavior features, so the dynamic process of the whole typhoon event is recorded explicitly (Figure 7).
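The comparison of successive states mentioned in S2 might look like the sketch below. The state layout (a dict with `t`, `loc`, and a `features` mapping) and the reserved key `"moved"` are hypothetical choices for this example; the text does not prescribe a representation.

```python
def diff_states(prev, curr):
    """Report what changed between two states of the same object."""
    changes = {}
    # Movement of the spatial feature.
    if prev["loc"] != curr["loc"]:
        changes["moved"] = (prev["loc"], curr["loc"])
    # Differences in attribute and behavior features (None = not reported).
    items = set(prev["features"]) | set(curr["features"])
    for item in sorted(items):
        before = prev["features"].get(item)
        after = curr["features"].get(item)
        if before != after:
            changes[item] = (before, after)
    return changes


def process_changes(sequence):
    """Walk a state sequence in time order and diff each consecutive pair."""
    ordered = sorted(sequence, key=lambda s: s["t"])
    return [diff_states(a, b) for a, b in zip(ordered, ordered[1:])]
```

Each entry of the returned list describes one transition, so a sequence of n states produces n - 1 change records.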
The preferred embodiments of the invention have been described in detail above, but the invention is not limited to these embodiments. Those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the invention, and all such equivalent modifications or substitutions fall within the scope defined by the claims of this application.

Claims (6)

  1. A typhoon event information aggregation method, characterized in that the main steps are as follows:
    Step 1. Collect social media message texts related to the typhoon event, extract typhoon event information from them, and convert it into structured information tuples;
    Step 2. Object information aggregation based on multi-feature similarity: judge from the similarity between object names whether information tuples belong to the same object, and aggregate the information tuples that describe the same object;
    Step 3. State information aggregation based on spatiotemporal features: from the object information aggregation result, select the attribute values and behavior values that satisfy a single time and location condition; the time information, the location information, and the selected attribute and behavior values together form the state information aggregation result of the object at that time and place;
    Step 4. State-based process information aggregation: from the object information aggregation result, select the spatiotemporal nodes that fall within the required time and location ranges, perform state information aggregation on each of these nodes, and sort the resulting state information aggregation results to form a process information aggregation result that reflects the dynamic characteristics.
  2. The typhoon event information aggregation method according to claim 1, characterized in that, in step 1, the typhoon event information includes object name, time information, location information, attribute information, and behavior information.
  3. The typhoon event information aggregation method according to claim 1, characterized in that, in step 2, for different information tuples describing the same object, attribute items and behavior items of the same type are also further aggregated.
  4. The typhoon event information aggregation method according to claim 1, characterized in that, in step 1, typhoon event information extraction includes at least two parts, information element identification and information element association:
    Information element identification: determine the constituent objects of the typhoon event and build a classification system, then extract from the social media texts the names and feature information describing the different types of objects, where the feature information includes time, location, attributes, and behaviors. Attribute information is further divided into attribute items and attribute values: the attribute item denotes the type of attribute, and the attribute value is the data or quantity held by an attribute of that type. Behavior information is structured in the same way;
    Information element association: within a single social media text, associate the feature information with the name of the object it describes, forming an information tuple of the form O_n = <T, L, A, B>, where O_n is the object name, T is the time information, L is the location information, A is the attribute information, and B is the behavior information.
  5. The typhoon event information aggregation method according to claim 1, characterized in that, in step 2, word vector similarity is used to judge the similarity between object names, attribute items, and behavior items, comprising the following steps:
    S1. Perform word segmentation on all social media text data;
    S2. Use the word segmentation results as the training set and train word vectors with the Skip-gram model;
    S3. Given object names O_n1 and O_n2, attribute items A_1 and A_2, and behavior items B_1 and B_2, obtain their word vectors E(O_n1), E(O_n2), E(A_1), E(A_2), E(B_1), and E(B_2) from the trained word vector model;
    S4. Use cosine similarity to compute the similarity values sim_n, sim_a, and sim_b between E(O_n1) and E(O_n2), E(A_1) and E(A_2), and E(B_1) and E(B_2), respectively. If sim_n ≥ ε_n, sim_a ≥ ε_a, and sim_b ≥ ε_b, where ε_n, ε_a, and ε_b are thresholds, then O_n1 and O_n2, A_1 and A_2, and B_1 and B_2 are the same object name, attribute item, and behavior item, respectively, and the corresponding information can be aggregated.
  6. The typhoon event information aggregation method according to claim 1, characterized in that, in step 4, sorting the multiple state information aggregation results comprises the following steps:
    A1. Sort by the time information of the states, in chronological or reverse chronological order;
    A2. Sort by the location information of the states, from large scale to small or from small to large;
    A3. Sort by the attribute information and behavior information of the states, either by the magnitude or level of the feature values or by similarity to the user's aggregation condition.
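As an illustration of the similarity test recited in claim 5 (step S4), the following sketch computes cosine similarity between two vectors and compares it with a threshold. The vectors here are toy values; in the claimed method they would come from a trained Skip-gram model, which is not reproduced. The function names and the example threshold are assumptions for this example only.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def same_term(e1, e2, threshold):
    """Treat two names or items as the same when similarity reaches the threshold."""
    return cosine_similarity(e1, e2) >= threshold
```

For instance, `same_term([1.0, 0.2], [0.9, 0.3], 0.8)` holds for these toy vectors, while orthogonal vectors such as `[1, 0]` and `[0, 1]` fail any positive threshold.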
PCT/CN2021/072796 2020-11-10 2021-01-20 Information aggregation method for typhoon events WO2022099927A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2022505249A JP2023504961A (en) 2020-11-10 2021-01-20 Typhoon incident information convergence method

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011245204.3 2020-11-10
CN202011245204.3A CN112328794B (en) 2020-11-10 2020-11-10 Typhoon event information aggregation method

Publications (1)

Publication Number Publication Date
WO2022099927A1 true WO2022099927A1 (en) 2022-05-19

Family

ID=74317863

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/072796 WO2022099927A1 (en) 2020-11-10 2021-01-20 Information aggregation method for typhoon events

Country Status (3)

Country Link
JP (1) JP2023504961A (en)
CN (1) CN112328794B (en)
WO (1) WO2022099927A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114880498A (en) * 2022-07-11 2022-08-09 北京百度网讯科技有限公司 Event information display method and device, equipment and medium

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113903238B (en) * 2021-09-23 2023-10-03 成都信息工程大学 Typhoon virtual simulation experiment teaching system and typhoon virtual simulation experiment teaching method
CN114282534A (en) * 2021-12-30 2022-04-05 南京大峡谷信息科技有限公司 Meteorological disaster event aggregation method based on element information extraction
CN114003646A (en) * 2021-12-30 2022-02-01 南京师范大学 High-concurrency real-time multi-attribute aggregated map cluster service system

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100005260A1 (en) * 2008-07-02 2010-01-07 Shintaro Inoue Storage system and remote copy recovery method
CN110008355A (en) * 2019-04-11 2019-07-12 华北科技学院 The disaster scene information fusion method and device of knowledge based map
CN110009158A (en) * 2019-04-11 2019-07-12 中国水利水电科学研究院 Heavy Rain of Typhoon flood damage Life cycle monitoring method and system
CN111241311A (en) * 2020-01-09 2020-06-05 腾讯科技(深圳)有限公司 Media information recommendation method and device, electronic equipment and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102541886B (en) * 2010-12-20 2015-04-01 郝敬涛 System and method for identifying relationship among user group and users
CN106484767B (en) * 2016-09-08 2019-06-21 中国科学院信息工程研究所 A kind of event extraction method across media
US10229193B2 (en) * 2016-10-03 2019-03-12 Sap Se Collecting event related tweets
CN107220286B (en) * 2017-04-24 2020-03-17 深圳市龙岗远望软件技术有限公司 Emergency command information presentation method, emergency command system platform and server
KR20210086833A (en) * 2019-12-30 2021-07-09 동국대학교 산학협력단 System and method of providing disaster information using SNS database
CN111708879A (en) * 2020-05-11 2020-09-25 北京明略软件系统有限公司 Text aggregation method and device for event and computer-readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHANG CHUNJU: "Interpretation of Event Spatio-temporal and Attribute Information in Chinese Text", CHINESE DOCTORAL DISSERTATIONS FULL-TEXT DATABASE, 10 February 2013 (2013-02-10), pages 1 - 157, XP055929399 *

Also Published As

Publication number Publication date
CN112328794B (en) 2021-08-24
CN112328794A (en) 2021-02-05
JP2023504961A (en) 2023-02-08

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2022505249

Country of ref document: JP

Kind code of ref document: A

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21890465

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21890465

Country of ref document: EP

Kind code of ref document: A1

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 28.09.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21890465

Country of ref document: EP

Kind code of ref document: A1