WO2022099927A1 - Information aggregation method for typhoon events

Info

Publication number: WO2022099927A1
Application number: PCT/CN2021/072796
Authority: WO
Inventors: 张雪英; 怀安; 叶鹏
Original assignee: 南京师范大学
Priority date: 2020-11-10
Filing date: 2021-01-20
Publication date: 2022-05-19
Also published as: CN112328794B; CN112328794A; JP2023504961A

Abstract

An information aggregation method for typhoon events comprises the following main steps: step 1, acquiring, from social media, message texts related to typhoon events, extracting typhoon event information therefrom, and converting the information into structured information tuples; step 2, performing object information aggregation on the basis of levels of similarity between multiple features; step 3, performing state information aggregation on the basis of spatio-temporal features; and step 4, performing process information aggregation on the basis of states, wherein the step of performing process information aggregation on the basis of states comprises: screening object information aggregation results to obtain information of spatio-temporal nodes that meet time and location range requirements, performing state information aggregation with respect to the spatio-temporal nodes respectively, and arranging multiple state information aggregation results in order so as to form a process information aggregation result having dynamic characteristics. The information aggregation method for typhoon events is used to screen, organize and integrate typhoon event information acquired from scattered sources in social media, thereby providing an ordered information basis for assessing development stages and situations in typhoon event processes.

Description

Typhoon event information aggregation method

Technical field

The invention belongs to the field of big data mining, and in particular relates to a typhoon event information aggregation method.

Background art

Typhoons have a severe and destructive impact on the natural environment, the social economy and even sustainable human development. Timely acquisition of information on the evolution of typhoon events is an important basis and reference for disaster emergency response. In the current big data environment, social media has shown great application potential in disaster management owing to its high update frequency, multi-source communication channels and wide participation, and has gradually become a new way to obtain information on typhoon events. However, because social media messages are short texts, the information is highly fragmented, is expressed in complex and diverse forms, and varies in granularity. Massive and scattered social media information can hardly reflect the full picture of the evolution of a typhoon event, and it also hinders users from effectively following the course of the event.

Information aggregation methods improve the organization of information and the efficiency of access through effective description of information resources, making it easier for users to obtain useful information resources. Information aggregation methods for disaster events mainly include statistics-based methods, topic model-based methods and knowledge element-based methods: (1) Statistics-based methods use statistical features such as word frequency, TF-IDF, N-gram and mutual information to calculate keyword weights in an information unit, select the most representative keywords, and aggregate on that basis. Such methods are simple, intuitive and easy to understand, but because keyword screening has low accuracy, auxiliary information is generally needed for secondary screening. (2) Probabilistic topic models assume that each document has a latent distribution over all topic words, and the probability distribution of topic words can be used to represent the topics in an information unit. However, the effect of this type of method depends on how the number of topics is determined. In reality, topics in social media change dynamically, and a single message may contain content on multiple topics, which makes the interpretability of topic words more contentious. (3) A knowledge element defines the logical relationships and hierarchical structure between different concepts. Common forms of knowledge element include ontologies, semantic networks and linked data. Knowledge element-based aggregation rests on knowledge element theory: a conceptual model describing the structure of disaster events is constructed, and information is reordered and organized according to the semantic relationships defined in the model to reveal information features and their associations.

Currently, methods based on statistics and topic models are the most common ways to aggregate disaster event information. However, the aggregated results of these two types of methods are relatively coarse-grained, and usually just gather together all kinds of information related to a disaster event. In contrast, aggregation based on knowledge elements can decompose and reorganize the original resources according to the conceptual system of the disaster field, and obtain in-depth aggregation results with a definite knowledge structure. However, existing knowledge modeling of typhoon events mostly focuses on the hierarchical structure and relationships of the concepts involved, and ignores the description and expression of the dynamic process of typhoon events. Given the scattered distribution of massive and complex social media resources, an information aggregation method is needed that integrates typhoon event information in an orderly manner according to the evolution of the event.

SUMMARY OF THE INVENTION

The purpose of the present invention is to provide a typhoon event information aggregation method that can screen, organize and integrate typhoon event information from scattered sources in social media, provide an ordered information basis for detecting the development stage and situation of a typhoon event, and help improve the service capability of social media resources in emergency management.

To achieve the above object, the present invention provides the following technical solutions:

The main steps of the typhoon event information aggregation method are as follows:

Step 1. Collect message texts related to typhoon events from social media, extract typhoon event information from them, and convert it into structured information tuples;

Step 2. Object information aggregation based on multi-feature similarity: determine from the similarity between object names whether information tuples belong to the same object, and aggregate the information tuples that describe the same object;

Step 3. State information aggregation based on spatiotemporal features: in the object information aggregation result, screen the attribute values and behavior values that satisfy a single time and location condition. The time information, the location information, and the screened attribute values and behavior values together constitute the state information aggregation result of the object at a specific time and place;

Step 4. State-based process information aggregation: in the object information aggregation result, screen the space-time nodes that fall within the required time and location ranges, perform state information aggregation on each of these space-time nodes, and sort the resulting state information aggregation results to form a process information aggregation result that reflects dynamic characteristics.

Preferably, in step 1, the typhoon event information includes object name, time information, location information, attribute information and behavior information.

Preferably, in step 2, for different information tuples describing the same object, attribute items and behavior items of the same type also need to be further aggregated.

Preferably, in step 1, the extraction of typhoon event information includes at least two parts: information element identification and information element association:

Information element identification: determine the constituent objects of typhoon events and build a classification system, then extract from social media texts the names and feature information describing the different types of objects. The feature information includes time, location, attributes and behaviors. Attribute information can be further divided into attribute items and attribute values: the attribute item represents the type of the attribute, and the attribute value is the data or quantity that an attribute of this type takes. Behavior information is structured in the same way as attribute information;

Information element association: within the same social media text, the feature information is associated with the name of the object it describes to form an information tuple of the form O_n = <T, L, A, B>, where O_n is the object name, T is the time information, L is the location information, A is the attribute information, and B is the behavior information.

Preferably, in step 2, the similarity between object names, attribute items and behavior items is judged by using word vector similarity, including the following steps:

S1. Perform word segmentation on all social media text data;

S2. The word segmentation result is used as the training set, and the Skip-gram model is used for word vector training;

S3. Given object names O_n1 and O_n2, attribute items A_1 and A_2, and behavior items B_1 and B_2, obtain from the trained word vector model the corresponding word vectors E(O_n1), E(O_n2), E(A_1), E(A_2), E(B_1) and E(B_2);

S4. Use cosine similarity to calculate the similarity value sim_n between E(O_n1) and E(O_n2), sim_a between E(A_1) and E(A_2), and sim_b between E(B_1) and E(B_2). If sim_n ≥ ε_n, sim_a ≥ ε_a and sim_b ≥ ε_b, where ε_n, ε_a and ε_b are thresholds, then O_n1 and O_n2, A_1 and A_2, and B_1 and B_2 are respectively the same object name, attribute item and behavior item, and the corresponding information can be aggregated.
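The threshold test in S4 can be illustrated with a minimal sketch. The three-dimensional vectors and the threshold value 0.9 below are invented for illustration only; in the method the vectors would come from the trained Skip-gram model, and the patent does not state threshold values.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_u * norm_v)


def is_same_item(vec1, vec2, threshold):
    """True when the similarity reaches the threshold (sim >= epsilon)."""
    return cosine_similarity(vec1, vec2) >= threshold


# Toy vectors standing in for E(O_n1), E(O_n2), ... (assumed values).
E = {
    "typhoon": [0.9, 0.1, 0.3],
    "tropical cyclone": [0.8, 0.2, 0.35],
    "road": [0.1, 0.9, 0.0],
}
EPS_N = 0.9  # hypothetical object-name threshold

same = is_same_item(E["typhoon"], E["tropical cyclone"], EPS_N)
different = is_same_item(E["typhoon"], E["road"], EPS_N)
```

The same function would be applied to attribute items with ε_a and to behavior items with ε_b.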

Preferably, in step 4, when sorting multiple state information aggregation results, the following steps are included:

A1. According to the time information of the states, sort in chronological or reverse chronological order;

A2. According to the location information of the states, sort by spatial scale from large to small or from small to large;

A3. According to the attribute information and behavior information of the states, sort by the magnitude or level of the feature values, or by similarity to the user's aggregation condition.

By adopting the above technical solutions, the following technical effects can be achieved:

The present invention constructs a social media-based method for aggregating typhoon event process information. On the basis of identifying the information tuples of different objects related to typhoon events in social media texts, a multi-level aggregation model is described at the three levels of "object-state-process". First, in the object layer, the scattered feature information of the same object is aggregated according to the similarity of multi-dimensional features; second, in the state layer, the attribute information and behavior information of the object that match specific spatiotemporal characteristics are aggregated, which unifies the spatiotemporal granularity of the information; finally, in the process layer, multiple states are sorted according to their space-time relationship so that the information is organized in an orderly manner. This aggregation mode addresses the scattered, multi-granularity and unordered nature of information in social media, and also takes full account of the dynamic evolution of typhoon events, so that it yields ordered information on the process characteristics of typhoon events. In practical application scenarios, it can help meet the emergency task needs of government agencies and the information needs of the public.

Description of drawings

Figure 1 shows the multi-level typhoon event process information aggregation model;

Figure 2 shows the spatiotemporal semantic unit constructed in social media;

Figure 3 is an example of typhoon event information extraction results in social media;

Fig. 4 is the organizational structure and example of the object information aggregation result;

Fig. 5 is the organizational structure and example of the state information aggregation result;

Figure 6 shows the different stages of process information aggregation;

Figure 7 shows the organization structure and example of the process information aggregation result.

Detailed description

The present invention will be further described below with reference to the accompanying drawings and specific embodiments.

Example

The invention discloses a social media-based typhoon event process information aggregation method, including:

Step 1. Collect message texts related to typhoon events in social media, and extract typhoon event information from them, including object name, time information, location information, attribute information, and behavior information, and convert them into a structured information tuple form.

Step 2. Object information aggregation based on multi-feature similarity. Whether information tuples belong to the same object is determined from the similarity between their object names, and the information tuples describing the same object are aggregated. For different information tuples describing the same object, attribute items and behavior items of the same type also need to be further aggregated.

Step 3: Aggregate state information based on spatiotemporal features. Attribute values and behavior values that meet the requirements of a single time and location condition are filtered in the object information aggregation result. Time information, location information, and the filtered attribute values and behavior values together constitute the state information aggregation result of the object under a specific time and space.

Step 4, state-based process information aggregation. The space-time node information that meets the requirements of time and location range is screened in the object information aggregation result, the state information is aggregated for these space-time nodes respectively, and the multiple state aggregation results are sorted to form the process information aggregation result that reflects the dynamic characteristics.

As a preferred technical solution, the typhoon event information extraction in step 1 includes:

1. Identify the constituent objects of typhoon events and build a classification system, and extract the names and feature information describing different types of objects from social media texts, where the feature information includes time, location, attributes, and behavior. The attribute information can be further divided into attribute items and attribute values, the attribute items represent the type of the attribute, and the attribute value is the data or the amount of data possessed by the attribute of this type. Behavior information is similar to attribute information.

2. Within the same social media text, the feature information is associated with the name of the object it describes to form an information tuple of the form O_n = <T, L, A, B>, where O_n is the object name, T is the time information, L is the location information, A is the attribute information, and B is the behavior information.
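One possible in-memory representation of the tuple O_n = <T, L, A, B> is sketched below. The class name `InfoTuple` and its field names are assumptions made for illustration; the patent specifies only the tuple form. For brevity the behavior is kept as a single string, although the text notes that behavior information can be split into an item and a value in the same way as attributes.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class InfoTuple:
    """Hypothetical container for one information tuple O_n = <T, L, A, B>."""
    object_name: str                             # O_n
    time: Optional[str] = None                   # T
    location: Optional[str] = None               # L
    attribute: Optional[Tuple[str, str]] = None  # A as (attribute item, attribute value)
    behavior: Optional[str] = None               # B, simplified to one string


# The cyclone example used later in the description.
example = InfoTuple(
    object_name="typhoon",
    time="2019-08-10 01:45:00",
    location="Wenling City, Zhejiang Province",
    attribute=("wind force", "level 16"),
    behavior="landfall",
)
```

Leaving `attribute` or `behavior` as `None` covers the case, noted in the description, where one of the two is missing from a spatiotemporal semantic unit.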

As a preferred technical solution, the constituent objects of a typhoon event are divided into subject objects and affected objects. The cyclone, as the hazard factor, is the subject object of the event, and other objects that are damaged, acted upon or affected by the cyclone are the affected objects. Objects can be further classified by their properties, mainly into people, infrastructure, transportation facilities, social activities and other types. It should be noted that the classification of each object type may draw on classification schemes from related fields and be refined according to actual needs (Table 1).

Table 1 Main object types in typhoon events

As a preferred technical solution, extracting names and feature information describing different types of objects from social media texts includes:

S1. Build a social media text typhoon event information annotation corpus, and the annotated content includes name, time, location, attribute and behavior information elements describing different types of objects.

S2. According to the annotated corpus, a time information extraction model is constructed based on the conditional random field model, and the time information in the social media text is automatically identified.

S3. According to the labeled corpus, a location information extraction model is constructed based on the deep belief network, and the location information in the social media text is automatically identified.

S4. Summarize rule models of object names, attribute information and behavior information, including trigger word dictionaries and syntactic patterns, based on the labeled corpus, and automatically identify object names, attribute information, and behavior information in social media texts.

As a preferred technical solution, various types of information elements extracted from social media need to be correlated, including:

S1. Spatiotemporal semantic unit construction. Characters, words, phrases, clauses, sentences and paragraphs are all language units in text, and different language units form the basic structure of a text through semantic relationships. A language unit, or a combination of language units, that expresses a complete meaning is a semantic unit. When a semantic unit contains both time information and location information, it clearly expresses the spatiotemporal characteristics of its content. In this method, such a semantic unit is defined as a spatiotemporal semantic unit.

Analysis of social media texts about typhoon events shows that the distribution of spatiotemporal semantic units falls roughly into three categories: (1) the text describes object information at a single time and location only, which accounts for most social media texts; (2) the text describes object information at different locations at the same time, and such texts are relatively few; (3) the text lists and compares object information for multiple times and locations, as in a comprehensive report, and such texts are very few.

Using spatiotemporal information can track changes in object features in text. Therefore, this method divides social media texts into different spatiotemporal semantic units based on the extracted spatiotemporal information (Fig. 2). The location of the spatiotemporal information in the text is used as the basis for division into spatiotemporal semantic units, including:

(1) For the first case, since there is only unique time and location information, the entire text is divided into one spatiotemporal semantic unit.

(2) For the second and third cases, first divide the text into multiple time units according to the time information. When a time unit contains more than one piece of location information, the location information is used to divide it further, and the resulting spatiotemporal semantic units share the time information of that time unit.
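The division rules above can be sketched as follows. The sketch assumes that extraction has already produced a token list tagged `TIME`, `LOC` or `TEXT` (a hypothetical input format), and that in multi-time or multi-location texts each time or location expression precedes the content it scopes; neither assumption is stated in the patent.

```python
def split_units(tokens):
    """Split tagged tokens into spatiotemporal semantic units.

    tokens: list of (kind, text) with kind in {"TIME", "LOC", "TEXT"}.
    Returns a list of dicts with keys "time", "location", "content".
    """
    times = [t for k, t in tokens if k == "TIME"]
    locs = [t for k, t in tokens if k == "LOC"]

    # Case 1: at most one time and one location, so the whole text is one unit.
    if len(times) <= 1 and len(locs) <= 1:
        return [{
            "time": times[0] if times else None,
            "location": locs[0] if locs else None,
            "content": [t for k, t in tokens if k == "TEXT"],
        }]

    # Cases 2 and 3: split by time first, then by location inside each time unit.
    units = []
    current_time = None
    current = None
    for kind, text in tokens:
        if kind == "TIME":
            current_time = text
            current = None  # a new time unit begins
        elif kind == "LOC":
            current = {"time": current_time, "location": text, "content": []}
            units.append(current)
        else:
            if current is None:  # content before any location in this time unit
                current = {"time": current_time, "location": None, "content": []}
                units.append(current)
            current["content"].append(text)
    return units


report = [
    ("TIME", "2019-08-10 01:45"), ("LOC", "Wenling City"), ("TEXT", "wind force level 16"),
    ("TIME", "2019-08-11 20:50"), ("LOC", "Qingdao City"), ("TEXT", "wind force level 9"),
]
units = split_units(report)
```

Units created inside the same time unit all carry that unit's time, which mirrors the sharing of time information described in case (2).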

S2. Association rules between object names and feature information. On the basis of dividing social media text into multiple spatiotemporal semantic units, the recognized object names and various feature information are distributed in different units. Therefore, it can be structured according to the unit to which each information element belongs. In each spatiotemporal semantic unit, the following steps are followed to associate different information elements:

(1) Association between feature trigger words and feature values. A feature trigger word and a feature value together constitute the feature information of an object; here this refers specifically to attribute features and behavior features. The feature trigger word represents the attribute item or behavior item, and the feature value represents the attribute value or behavior value. In text, feature trigger words and feature values are adjacent, forming a "feature trigger word-feature value" structure. Statistics on the three words preceding each attribute value show that a feature trigger word appears among them in over 99% of cases. Therefore, each feature value is associated with the closest feature trigger word that precedes it.

(2) The association of attributes, behavior information and object names. In the basic expression habits of Chinese, the name of the object is usually mentioned first, and then the various characteristics of the object are described separately. Therefore, in the same spatiotemporal semantic unit, attribute information and behavior information are respectively associated with the closest object name before its location.

(3) The association of object names with time and location information. For the spatiotemporal semantic unit where the object name is located, its time information and location information are respectively associated with the object name.

The object names and the feature information associated with them are then filled into tuples of the form O_n = <T, L, A, B> (FIG. 3). It should be noted that the description of a typhoon event in a spatiotemporal semantic unit may be limited to one aspect, so either the attributes or the behaviors may be missing when an object information tuple is constructed.
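The three association rules can be sketched for a single spatiotemporal semantic unit as follows. The element tags `OBJ`, `ATTR_ITEM`, `BEH_ITEM` and `VALUE` and the output dictionary layout are assumptions for illustration. As a simplification, a trigger word that is not followed by a value is ignored.

```python
def associate(unit_time, unit_location, elements):
    """Build object tuples from the ordered elements of one unit.

    elements: list of (kind, text), kind in
    {"OBJ", "ATTR_ITEM", "BEH_ITEM", "VALUE"}, in text order.
    """
    tuples = []
    current_obj = None   # closest preceding object name
    current_item = None  # closest preceding trigger word as (kind, text)
    for kind, text in elements:
        if kind == "OBJ":
            # Rule (3): the unit's time and location attach to the object name.
            current_obj = {
                "object": text,
                "time": unit_time,
                "location": unit_location,
                "attributes": [],
                "behaviors": [],
            }
            tuples.append(current_obj)
            current_item = None
        elif kind in ("ATTR_ITEM", "BEH_ITEM"):
            current_item = (kind, text)
        elif kind == "VALUE" and current_obj is not None and current_item is not None:
            # Rule (1): value -> closest preceding trigger word.
            # Rule (2): the pair -> closest preceding object name.
            key = "attributes" if current_item[0] == "ATTR_ITEM" else "behaviors"
            current_obj[key].append((current_item[1], text))
            current_item = None
    return tuples


# Toy element sequence (invented for illustration).
elements = [
    ("OBJ", "typhoon"), ("ATTR_ITEM", "wind force"), ("VALUE", "level 16"),
    ("OBJ", "road"), ("BEH_ITEM", "closed"), ("VALUE", "3 sections"),
]
result = associate("2019-08-10 01:45:00", "Wenling City", elements)
```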

As a preferred technical solution, the object information aggregation in step 2 includes:

1. Aggregation based on object name. Set the object name of the aggregation condition to N, and judge in turn the similarity sim_n between each object name O_n and N. If sim_n ≥ ε_n, where ε_n is the object similarity threshold, O_n and N denote the same object, and the information tuples of that object are merged.

Object name similarity is measured by word vector similarity. Using a word vector model trained with the Skip-gram model, each object name is first mapped to a vector in a multi-dimensional space, and cosine similarity is then used to measure whether the directions of the vectors are consistent.

For example, O(typhoon) = <August 10, 2019 1:45, Wenling City, Zhejiang Province, wind force: level 16, landfall> and O(tropical cyclone) = <August 11, 2019 20:50, Qingdao City, Shandong Province, wind force: level 9, landfall> are information tuples extracted from social media. The object name of the aggregation condition is set to "typhoon", and its similarity to the object names "typhoon" and "tropical cyclone" in the information tuples is judged in turn. Both names refer to the cyclone itself, so both tuples are taken as the aggregated result.

2. Aggregation combined with object features. After the information tuples of the same object have been aggregated, there will be multiple pieces of attribute and behavior feature information of the same type, so object information matching specific features can be aggregated further. On the basis of the aggregation result obtained from the object name, set the object attribute feature A and behavior feature B of the aggregation condition. For attribute features, the word vector similarity method is used to judge the similarity sim_a between each attribute item of O_n and A. If sim_a ≥ ε_a, where ε_a is the attribute similarity threshold, the attribute items are the same and their information can be aggregated, with each attribute value and its space-time characteristics retained after aggregation; otherwise, they are different attribute items describing the same object and are not aggregated.

For behavior features, the word vector similarity method is used to judge the similarity sim_b between each behavior item of O_n and B. If sim_b ≥ ε_b, where ε_b is the behavior similarity threshold, the behavior items are the same and their information can be aggregated, with each behavior value and its space-time characteristics retained after aggregation; otherwise, they are different behavior items describing the same object and are not aggregated.

For example, starting from the O(typhoon) and O(tropical cyclone) object information tuples above, the "wind force" attribute feature information of the typhoon is further aggregated. Both O(typhoon) and O(tropical cyclone) have an attribute item "wind force" that meets the similarity threshold, so <August 10, 2019 1:45, Wenling City, Zhejiang Province, wind force: level 16> and <August 11, 2019 20:50, Qingdao City, Shandong Province, wind force: level 9> form the aggregated result of the object features.

3. Information organization of object aggregation results. The organizational form of the object information aggregation result is shown in Figure 4, where O(N) represents the aggregated object, A_l is an attribute item of the aggregated object, a_ls is a specific attribute value, B_n is a behavior item of the aggregated object, b_nu is a specific behavior value, and <T, S> is the time and place at which the attribute value or behavior value occurs. The originally scattered pieces of information are thus all associated with the objects they describe, identical attribute items and behavior items of an object are merged, and each attribute item and behavior item holds multiple feature values under different space-time conditions.
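A minimal sketch of building O(N) from information tuples is given below. The dictionary layout and the `same_name` callback are assumptions: in the method the name test would be the word vector similarity check, which is replaced here by a toy synonym set. Merging of attribute and behavior items is simplified to exact string equality instead of the similarity thresholds ε_a and ε_b.

```python
from collections import defaultdict


def aggregate_object(tuples, target_name, same_name):
    """Merge the tuples describing `target_name` into one O(N) structure.

    Each attribute or behavior item maps to a list of (value, T, S) records.
    """
    agg = {
        "object": target_name,
        "attributes": defaultdict(list),
        "behaviors": defaultdict(list),
    }
    for t in tuples:
        if not same_name(t["object"], target_name):
            continue  # describes a different object
        for item, value in t["attributes"]:
            agg["attributes"][item].append((value, t["time"], t["location"]))
        for item, value in t["behaviors"]:
            agg["behaviors"][item].append((value, t["time"], t["location"]))
    return agg


# Stand-in for the word vector similarity test (assumed synonym set).
SYNONYMS = {"typhoon", "tropical cyclone"}


def same_name(a, b):
    return a == b or {a, b} <= SYNONYMS


tuples = [
    {"object": "typhoon", "time": "2019-08-10 01:45:00",
     "location": "Wenling City, Zhejiang Province",
     "attributes": [("wind force", "level 16")], "behaviors": []},
    {"object": "tropical cyclone", "time": "2019-08-11 20:50:00",
     "location": "Qingdao City, Shandong Province",
     "attributes": [("wind force", "level 9")], "behaviors": []},
    {"object": "road", "time": "2019-08-10 03:00:00",
     "location": "Wenling City, Zhejiang Province",
     "attributes": [("depth of water", "0.5 m")], "behaviors": []},
]
typhoon = aggregate_object(tuples, "typhoon", same_name)
```

Keeping the <T, S> pair beside every value is what allows the later state and process layers to filter by time and place.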

As a preferred technical solution, the state information aggregation in step 3 includes:

1. Unification of the time and space references. The spatiotemporal framework is the basis on which states exist, so a unified spatiotemporal reference must be established for state information aggregation. In this method, the time reference uses the Gregorian calendar for dates and Beijing time for clock time, and the spatial reference uses the CGCS2000 coordinate system.

2. Standardization of spatiotemporal information. Time information and location information are the basis for judging whether the associated attribute information and behavior information describe the state of an object under specific space-time conditions. Time information is standardized, following everyday usage, as a Gregorian calendar date plus a clock time. The normalized form is defined as "date + time" in the format "YYYY-MM-DD HH:MM:SS", for example "2019-08-10 12:00:00". Location information, including place names, addresses and spatial coordinates, should be converted into a normalized representation according to the unified spatial reference. Place names can follow the standard names, codes and categories issued by the state for the period concerned; the element types and element combinations of addresses can follow the standards issued by the state or the industry; and spatial coordinates should be transformed as required by the spatial datum.
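The time normalization can be sketched with the Python standard library. The list of accepted input formats is an assumption chosen for illustration, and real social media text would need a far richer parser. Beijing time is modeled as a fixed UTC+8 offset.

```python
from datetime import datetime, timedelta, timezone

BEIJING = timezone(timedelta(hours=8))  # Beijing time as a fixed UTC+8 offset

# Hypothetical input formats; extend as needed.
INPUT_FORMATS = ["%Y-%m-%d %H:%M:%S", "%Y-%m-%d %H:%M", "%Y/%m/%d %H:%M"]


def normalize_time(raw, source_tz=BEIJING):
    """Return `raw` as 'YYYY-MM-DD HH:MM:SS' in Beijing time, or None."""
    for fmt in INPUT_FORMATS:
        try:
            parsed = datetime.strptime(raw.strip(), fmt)
        except ValueError:
            continue  # try the next format
        local = parsed.replace(tzinfo=source_tz).astimezone(BEIJING)
        return local.strftime("%Y-%m-%d %H:%M:%S")
    return None  # no known format matched


normalized = normalize_time("2019/8/10 1:45")
from_utc = normalize_time("2019-08-10 04:00", source_tz=timezone.utc)
```

Because the normalized strings are zero-padded and ordered from year to second, plain string comparison agrees with chronological order, which is convenient for the later filtering and sorting steps.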

3. State-oriented aggregation. Set the aggregation time feature t and location feature l. Based on the object layer information aggregation result O(N), determine for each attribute item and behavior item of O(N) whether there is a feature value (attribute value or behavior value) with T = t and S = l; if so, this feature value is used as aggregated information. Otherwise, determine whether there is a feature value with S = l and T < t whose T is closest to t; if so, this feature value is used as aggregated information. If none exists, determine whether there is a feature value with S close to l and T < t whose T is closest to t; if so, this feature value is used as aggregated information. If there is still none, the attribute item or behavior item is not aggregated. Traversing all attribute items and behavior items in O(N) in this way selects, for each item, at most one feature value that best fits the spatiotemporal condition. The selected attribute information and behavior information together form the state information aggregation result of the object under the specific spatiotemporal condition.

For example, social media reports that the cyclone's wind force reached level 16 in Wenling City, Zhejiang Province at 1:45 on August 10. When the cyclone state at (2:00, Wenling City) is aggregated, no updated wind force information exists between 1:45 and 2:00, so "wind force level 16" is used as an attribute feature of the cyclone object in the (2:00, Wenling City) state. With this aggregation mechanism, the state information obtained at any space-time node is not limited to the object features explicitly stated for that node, but also includes the latest known value of every object feature up to that time, which ensures the comprehensiveness and completeness of the aggregated result.
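The three-level fallback of the state-oriented aggregation can be sketched as below. The O(N) layout (item mapped to a list of (value, T, S) records) and the `is_near` predicate are assumptions; the patent does not define how "S close to l" is decided, so the caller must supply that test. Times are assumed to be normalized "YYYY-MM-DD HH:MM:SS" strings, so string comparison follows chronological order.

```python
def pick_value(records, t, l, is_near):
    """Choose at most one (value, T, S) record for the state at (t, l)."""
    # 1. Exact match: T = t and S = l.
    exact = [r for r in records if r[1] == t and r[2] == l]
    if exact:
        return exact[0]
    # 2. Same place, latest earlier time: S = l, T < t, T closest to t.
    same_place = [r for r in records if r[2] == l and r[1] < t]
    if same_place:
        return max(same_place, key=lambda r: r[1])
    # 3. Nearby place, latest earlier time.
    nearby = [r for r in records if r[2] != l and is_near(r[2], l) and r[1] < t]
    if nearby:
        return max(nearby, key=lambda r: r[1])
    return None  # this item is not aggregated


def aggregate_state(obj, t, l, is_near):
    """Build the state of O(N) at time t and location l."""
    state = {"time": t, "location": l, "attributes": {}, "behaviors": {}}
    for kind in ("attributes", "behaviors"):
        for item, records in obj[kind].items():
            chosen = pick_value(records, t, l, is_near)
            if chosen is not None:
                state[kind][item] = chosen
    return state


cyclone = {
    "attributes": {
        "wind force": [("level 16", "2019-08-10 01:45:00", "Wenling City")],
    },
    "behaviors": {},
}


def never_near(a, b):
    return False  # placeholder proximity test


state = aggregate_state(cyclone, "2019-08-10 02:00:00", "Wenling City", never_near)
```

For the example in the text, the 1:45 wind force record is carried forward to the 2:00 state because it is the latest record at the same place.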

4. Information organization of state aggregation results. The organizational form of the state information aggregation result is shown in Figure 5, where S represents the state of the object O(N) at time t and location l, A_l and a_ls describe the attribute features of the state, B_n and b_nu describe the behavior features of the state, and <T, S> is the time and place at which the attribute and behavior features occur.

As a preferred technical solution, the process information aggregation in step 4 includes two parts: state sequence aggregation and event process aggregation. A process is the connection of different states in time and space, and its dynamics are reflected in the changes of attribute information and behavior information between states. A typhoon event includes the evolution of multiple objects during the event, so the process of a typhoon event is composed of different states of multiple objects. Therefore, a step-by-step decomposition is adopted in process layer aggregation, and the connection between state information and process information is abstracted into three stages: object state, state sequence and event process (Figure 6). The object state aggregates the attribute information and behavior information of an object at a certain time and place; the state sequence records the evolution of a single object and aggregates the different states of that object; the event process is the joint evolution of multiple objects and consists of multiple state sequences.

As a preferred technical solution, performing state sequence aggregation includes:

S1. Set the time range tr and the spatial range sr of the aggregation and, based on the object information aggregation result O(N), traverse all attribute items and behavior items in O(N) in turn. In each attribute item and behavior item, determine whether there is an attribute value or behavior value with T ∈ tr and S ∈ sr, and collect all <T, S> pairs that fall within tr and sr into a set of space-time nodes. For every space-time node in the set, perform aggregation by the method of step 3 to obtain multiple state aggregation results.

S2. Sort all the state aggregation results. First, sort by the time information of the state, in chronological or reverse chronological order. Second, sort by the location information of the state, by spatial scale from large to small or from small to large. Third, sort by the attribute information and behavior information of the state, either by the magnitude or level of the feature value or by the similarity to the user's aggregation condition. The sequence of states arranged according to these three dimensions is the process aggregation result of a single object.

S3. Information organization of state sequence aggregation results. The organizational form of the state sequence information aggregation results is shown in Figure 5, where P represents the process experienced by the object O(N) over the time range tr and the spatial range sr, and S represents the object state at the space-time node <t_n, l_n>.
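As a non-limiting illustration, steps S1 to S3 for a single object may be sketched in Python as follows. The record layout, the function name state_sequence, the representation of sr as a set of place names and all sample values are assumptions made for illustration; only ordering by time is shown here.

```python
# Hypothetical records of one object: (time, location, feature item, value).
# Times are zero-padded "MM-DD HH:MM" strings, so string order is time order.
RECORDS = [
    ("08-10 01:00", "Wenling City", "wind force", "level 14"),
    ("08-10 01:45", "Wenling City", "wind force", "level 16"),
    ("08-10 01:45", "Wenling City", "movement", "landfall"),
    ("08-10 03:00", "Taizhou City", "wind force", "level 13"),
    ("08-10 09:00", "Shaoxing City", "wind force", "level 10"),
]


def state_sequence(records, tr, sr):
    """Return [(time, location, state)] for every space-time node inside tr and sr."""
    start, end = tr
    # S1: collect the space-time nodes that fall within both ranges.
    nodes = sorted(
        {(t, loc) for t, loc, _, _ in records if start <= t <= end and loc in sr}
    )
    sequence = []
    for t, loc in nodes:
        # State aggregation at one node: latest value per item up to time t.
        state = {}
        for rt, rloc, item, value in sorted(records):
            if rloc == loc and rt <= t:
                state[item] = value
        sequence.append((t, loc, state))
    # S2: the nodes were sorted by time, so the sequence is chronological.
    return sequence


sequence = state_sequence(
    RECORDS,
    tr=("08-10 01:30", "08-10 04:00"),
    sr={"Wenling City", "Taizhou City"},
)
for entry in sequence:
    print(entry)
```

In this sketch the 1:00 record lies outside tr and therefore creates no node of its own, but it remains available as history when the state at a later node is aggregated.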

As a preferred technical solution, performing event process aggregation includes:

S1. Set the time range tr and the spatial range sr of the aggregation. Based on the multi-object information aggregation results O(N_s) through O(N_t), first traverse all attribute items and behavior items in O(N_s) and obtain the <T, S> that fall within the ranges of tr and sr. Then continue with O(N_{s+1}), and so on until O(N_t) has been traversed. All <T, S> within the ranges of tr and sr form a set of space-time nodes.

S2. For the state sequences of multiple objects, the same sorting mechanism must be adopted to keep the overall order of the aggregation results consistent. In the aggregation results oriented to the event process, comparing the state features at successive time nodes makes it possible to analyze the movement of spatial features and the differences in attribute and behavior features, so that the dynamic process of the entire typhoon event is recorded explicitly (Figure 7).
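As a non-limiting illustration, the event process aggregation may be sketched in Python as follows: every object's state sequence is ordered with the same key, and consecutive states are compared to expose changes in location and features. The function names, the dictionary layout and the sample states are assumptions made for illustration.

```python
def diff_states(before, after):
    """Map each feature item whose value changed to its (old, new) pair."""
    items = set(before) | set(after)
    return {
        k: (before.get(k), after.get(k))
        for k in items
        if before.get(k) != after.get(k)
    }


def event_process(sequences):
    """sequences: {object name: [(time, location, state dict), ...]}."""
    process = {}
    for name, seq in sequences.items():
        # The same sort key is used for every object to keep the order consistent.
        ordered = sorted(seq, key=lambda s: (s[0], s[1]))
        changes = [
            (prev[0], cur[0], prev[1], cur[1], diff_states(prev[2], cur[2]))
            for prev, cur in zip(ordered, ordered[1:])
        ]
        process[name] = {"states": ordered, "changes": changes}
    return process


# Hypothetical state sequences of two objects of the same typhoon event.
SEQUENCES = {
    "cyclone": [
        ("08-10 03:00", "Taizhou City", {"wind force": "level 13"}),
        ("08-10 01:45", "Wenling City", {"wind force": "level 16"}),
    ],
    "rainfall": [
        ("08-10 02:00", "Wenling City", {"precipitation": "heavy"}),
    ],
}

process = event_process(SEQUENCES)
print(process["cyclone"]["changes"])
```

Each entry of "changes" holds the two times, the two locations and the feature differences between a state and its successor, which is one possible way to record the dynamics explicitly.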

The preferred embodiments of the present invention have been specifically described above, but the present invention is not limited to these embodiments. Those skilled in the art can also make various equivalent modifications or substitutions without departing from the spirit of the present invention, and these equivalent modifications or substitutions are all included in the scope defined by the claims of the present application.

Claims

The typhoon event information aggregation method is characterized in that the main steps are as follows:

Step 1. Collect the message text related to the typhoon event in the social media, extract the typhoon event information from it, and convert it into a structured information tuple form;

Step 2. Object information aggregation based on multi-feature similarity: determine from the similarity between object names whether information tuples belong to the same object, and aggregate the information tuples that describe the same object;

Step 3. State information aggregation based on spatiotemporal features: in the object information aggregation result, screen the attribute values and behavior values that meet a single time and location condition. The time information, the location information and the screened attribute values and behavior values together constitute the state information aggregation result of the object at a specific time and place;

Step 4. State-based process information aggregation: in the object information aggregation result, screen the space-time nodes that meet the time range and location range requirements, perform state information aggregation on each of these space-time nodes, and sort the multiple state information aggregation results to form a process information aggregation result that reflects dynamic characteristics.
The typhoon event information aggregation method according to claim 1, wherein in step 1, the typhoon event information includes object name, time information, location information, attribute information and behavior information.
The typhoon event information aggregation method according to claim 1, wherein in step 2, for different information tuples describing the same object, the attribute items and behavior items of the same type also need to be further aggregated.
The typhoon event information aggregation method according to claim 1, wherein in step 1, the typhoon event information extraction includes at least two parts: information element identification and information element association:

Identification of information elements: define the constituent objects of typhoon events and build a classification system, then extract from social media texts the names of the different types of objects and the feature information describing them. The feature information includes time, location, attributes and behaviors. Attribute information can be further divided into attribute items and attribute values: the attribute item represents the type of the attribute, and the attribute value is the value or quantity taken by an attribute of that type. Behavior information is divided in the same way;

Information element association: within the same social media text, each piece of feature information is associated with the name of the object it describes, forming an information tuple of the form O_n = <T, L, A, B>, where O_n is the object name, T is the time information, L is the location information, A is the attribute information, and B is the behavior information.
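As a non-limiting illustration, an information tuple of the form <T, L, A, B> keyed by an object name may be represented in Python as follows. The class name InfoTuple, the field names and the choice of dictionaries mapping items to values are assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class InfoTuple:
    """One information tuple extracted from a single social media text."""
    name: str       # object name
    time: str       # T: time information
    location: str   # L: location information
    # A: attribute item -> attribute value
    attributes: dict = field(default_factory=dict)
    # B: behavior item -> behavior value
    behaviors: dict = field(default_factory=dict)


# Hypothetical tuple built from a report of wind force at a given place and time.
tup = InfoTuple(
    name="cyclone",
    time="08-10 01:45",
    location="Wenling City, Zhejiang Province",
    attributes={"wind force": "level 16"},
)
print(tup)
```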
The typhoon event information aggregation method according to claim 1, wherein in step 2, word vector similarity is used to judge the similarity between object names, attribute items and behavior items, comprising the following steps:

S1. Perform word segmentation on all social media text data;

S2. The word segmentation result is used as the training set, and the Skip-gram model is used for word vector training;

S3. Given object names O_n1 and O_n2, attribute items A_1 and A_2, and behavior items B_1 and B_2, obtain their word vectors E(O_n1), E(O_n2), E(A_1), E(A_2), E(B_1) and E(B_2) from the trained word vector model;

S4. Use cosine similarity to calculate the similarity values sim_n between E(O_n1) and E(O_n2), sim_a between E(A_1) and E(A_2), and sim_b between E(B_1) and E(B_2). If sim_n ≥ ε_n, sim_a ≥ ε_a and sim_b ≥ ε_b, where ε_n, ε_a and ε_b are thresholds, then O_n1 and O_n2, A_1 and A_2, and B_1 and B_2 are regarded as the same object name, attribute item and behavior item respectively, and the corresponding information aggregation can be performed.
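As a non-limiting illustration, the threshold test of step S4 may be sketched in Python as follows. The three-dimensional vectors stand in for word vectors produced by a trained Skip-gram model, and both the vectors and the threshold value 0.9 are invented for illustration; the source does not state threshold values here.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def is_same(vec1, vec2, threshold):
    """True when the similarity reaches the threshold (sim >= epsilon)."""
    return cosine_similarity(vec1, vec2) >= threshold


# Hypothetical word vectors; a real model would supply many more dimensions.
E = {
    "typhoon": [0.9, 0.1, 0.3],
    "cyclone": [0.8, 0.2, 0.35],
    "rainfall": [0.1, 0.9, 0.2],
}
EPSILON_N = 0.9  # assumed threshold for object names

print(is_same(E["typhoon"], E["cyclone"], EPSILON_N))
print(is_same(E["typhoon"], E["rainfall"], EPSILON_N))
```

With these toy vectors the first pair would be treated as the same object name and the second pair would not; the same function applies to attribute items and behavior items with their own thresholds.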
The method for aggregating typhoon event information according to claim 1, wherein in step 4, when sorting a plurality of state information aggregation results, the following steps are included:

A1. Sort by the time information of the state, in chronological or reverse chronological order;

A2. Sort by the location information of the state, by spatial scale from large to small or from small to large;

A3. Sort by the attribute information and behavior information of the state, either by the magnitude or level of the feature value or by the similarity to the user's aggregation condition.
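As a non-limiting illustration, the three sorting criteria may be combined in Python by applying stable sorts from the least significant key to the most significant one. The sketch assumes that time has the highest priority, followed by spatial scale and then feature level; the scale ranks, field names and sample states are invented, and sorting by similarity to a user condition is not shown.

```python
# Hypothetical ranks for spatial scale, ordered from large to small.
SCALE = {"province": 0, "city": 1, "county": 2}

STATES = [
    {"time": "08-10 02:00", "scale": "city", "place": "Wenling City", "level": 16},
    {"time": "08-10 02:00", "scale": "province", "place": "Zhejiang Province", "level": 12},
    {"time": "08-10 01:00", "scale": "city", "place": "Wenling City", "level": 14},
    {"time": "08-10 02:00", "scale": "city", "place": "Taizhou City", "level": 13},
]


def sort_states(states, time_desc=False, scale_desc=False, level_desc=True):
    """Order states by time, then spatial scale, then feature level."""
    # sorted() is stable, so the least significant key is applied first.
    ordered = sorted(states, key=lambda s: s["level"], reverse=level_desc)
    ordered = sorted(ordered, key=lambda s: SCALE[s["scale"]], reverse=scale_desc)
    ordered = sorted(ordered, key=lambda s: s["time"], reverse=time_desc)
    return ordered


for s in sort_states(STATES):
    print(s["time"], s["place"], s["level"])
```

Each flag selects ascending or descending order for one criterion independently, which mirrors the "or" choices in A1 to A3.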