CN106021508A - Sudden event emergency information mining method based on social media - Google Patents

Publication number
CN106021508A
Authority
CN
China
Prior art keywords
social media
emergency
document
media data
classified
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610345293.6A
Other languages
Chinese (zh)
Inventor
王艳东
朱建奇
王腾
郭丰芹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University WHU
Original Assignee
Wuhan University WHU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University WHU
Priority to CN201610345293.6A
Publication of CN106021508A
Legal status: Pending

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention discloses a method for mining emergency information about sudden events from social media. The method comprises the steps of (S1) acquiring social media data through an open platform API or a web crawler, the social media data being a document set; (S2) storing the document set in a MongoDB cluster; (S3) preprocessing the document set; (S4) labeling the preprocessed document set using LDA to obtain known samples; (S5) forming a word feature set from all words in each document of the known samples, the frequency of each word feature in a document being the weight of that word feature in the document; (S6) constructing a short text real-time classification model; (S7) classifying real-time sudden events with the short text classification model and predicting the topics of the sudden events; and (S8) mining information from the social media data of the classified sudden events. Social media short texts can thus be classified automatically and rapidly, and emergency information about sudden events can be mined.

Description

Social-media-based method for mining emergency information on sudden events
Technical Field
The invention relates to the technical field of social media, and in particular to a method for mining emergency information about sudden events from social media.
Background
An emergency is a natural disaster, accident, public health event, or public security event that occurs suddenly, causes or may cause serious harm to society, and calls for emergency response measures. As industrialization and urbanization in China continue to accelerate, emergencies occur frequently. China is also among the countries most severely affected by natural disasters, which are diverse in type, occur frequently, and cause heavy losses every year.
According to data released by the national disaster reduction office of the Ministry of Civil Affairs, in 2014 alone natural disasters of all kinds affected 243.537 million people in China: 6.017 million people were urgently relocated, 235 went missing, and 1,583 died; 450,000 houses collapsed and 3.542 million were damaged to varying degrees; 2.983 million people needed emergency living assistance; crops were affected over 24,890.7 thousand hectares, of which 3,090.3 thousand hectares yielded no harvest; and direct economic losses reached 337.38 billion yuan. Natural disasters are only one type of emergency, yet they cause heavy casualties and huge economic losses, which shows how harmful emergencies are.
How to minimize the losses caused by emergencies is an urgent problem. On the one hand, in the initial stage of an emergency, improving early-warning capability is an effective way to reduce harm. It helps prevent emergencies at the source, or reduces the losses caused by a lack of time to respond. However, this is difficult to achieve for random emergencies such as natural disasters and accidents. On the other hand, in the development stage of an emergency, obtaining emergency information effectively and promptly is an important way to reduce losses. Once an emergency occurs, its emergency information must be acquired in time so that response measures can be taken and losses and harm reduced. How to acquire emergency information quickly, promptly, and effectively is therefore critical to whether an emergency can be handled well.
Traditional emergency information is collected, organized, and released by officials or authorities. Its main disadvantages are as follows: the acquisition process lacks public participation and feedback, and the information source is single; acquisition is slow, and after some emergencies, such as major natural disasters, no information about the incident can be obtained in time at all; and information flows in one direction, from officials to the public, without feedback or communication. These deficiencies make it difficult for traditional emergency information to support timely, effective, and reasonable handling of emergencies.
With the wide adoption of mobile devices and the rapid development of communication technologies, social media is becoming an important social networking tool. As the largest social media platform in China, the Sina microblog has accumulated a huge user base in China and in Chinese communities in more than 190 countries. By December 2013, the Sina microblog had 129.1 million monthly active users and 61.4 million daily active users. Social media is a tool with which people spontaneously write, share, rate, discuss, and communicate with each other. Using it, the public can promptly report what they see and hear and give their opinions on important events (such as earthquakes and urban waterlogging) through short texts and rich multimedia data.
As part of User Generated Content (UGC), social media data contains text, pictures, video, and geographic location data. Besides its varied content forms, social media data is huge in volume, fast to propagate, and wide in coverage. In December 2013, more than 2.8 billion items were shared on the Sina microblog, including 2.2 billion items with pictures, 81.7 million items with short videos, and 21.5 million items with songs. In October to December 2013 alone, Sina microblog users checked in more than 120 million times, that is, they attached their geographic location to microblog content through a mobile device.
Social media is increasingly seen as a sensor that moves with people: users sense events that occur nearby as well as incidents far away, and share and discuss them with each other over the network. Once an emergency occurs somewhere, people at the scene immediately broadcast the state of the event to the internet through text, pictures, and videos. Meanwhile, people elsewhere who see related reports or accounts on the social network respond one after another, and information about the event quickly floods the whole social network. Incident information thus spreads widely in social networks in the form of social media data.
In summary, social media is spontaneous, timely, widely participatory, and diverse in content, which makes up for the deficiencies of traditional emergency information. Faced with massive social media data, how to mine emergency information from it quickly, promptly, and accurately is a key research problem.
Disclosure of Invention
Aiming at the defects in the prior art, the invention provides an emergency information mining method based on social media.
In order to solve the technical problems, the invention adopts the following technical scheme:
First, a social-media-based emergency classification method, comprising the following steps:
S1: collecting social media data using an open platform API or a web crawler, the social media data being a document set;
S2: storing the document set in a MongoDB cluster;
S3: preprocessing the document set, including document deduplication, word segmentation, stop-word removal, and rare-word removal;
S4: labeling the preprocessed document set using LDA to obtain known samples, this step further comprising:
4.1 computing the topics of the documents in the preprocessed document set to obtain a document-topic probability matrix and a topic-word probability matrix;
4.2 traversing the document-topic probability matrix and taking each document, together with its topic, whose probability exceeds the topic probability threshold λ as a known sample, where λ is an empirical value and each document in the known samples consists of a series of words;
S5: forming a word feature set from all words in each document of the known samples, the frequency of each word feature in a document being the weight of that word feature in the document;
S6: constructing a short text real-time classification model, this step further comprising:
6.1 training an SVM on the word feature set and the weight of each word feature to obtain SVM classifiers, using grid search to enumerate a series of SVM model parameters;
6.2 validating the SVM classifier under each set of model parameters by K-fold cross-validation and taking the model parameters with the minimum prediction error as the optimal model parameters, the SVM classifier corresponding to the optimal model parameters being the short text real-time classification model;
S7: classifying the real-time emergency with the short text classification model based on the social media data of the real-time emergency, and predicting the topic of the emergency.
In step S1, collecting social media data through the open platform API specifically comprises:
performing buffer analysis around a plurality of search center points with a specified search radius, so that the buffers cover the whole area where the emergency occurs, and obtaining the social media data of that area.
In step S1, collecting social media data with a web crawler specifically comprises:
using a customized crawler to capture social media data from input keywords, region, and time range.
Second, a social-media-based emergency information mining method, comprising the following steps:
classifying real-time emergencies using the classification method described above, and mining information from the social media data of the classified emergencies.
The information mining from the social media data of the classified emergencies comprises:
obtaining, from the classified social media data, the trend over time of the number of social media users participating in discussion of the emergency.
The information mining from the social media data of the classified emergencies comprises:
analyzing, from the classified social media data, the trend over time of the number of social media users participating in each emergency topic.
The information mining from the social media data of the classified emergencies comprises:
analyzing, from the classified social media data, the spatial location information published with the social media data of each emergency topic.
The information mining from the social media data of the classified emergencies comprises:
analyzing, from the classified social media data, the spatial location information published with the social media data of each emergency topic, and clustering the emergency microblog points by a multi-level greedy clustering method according to the spatial location information.
The information mining from the social media data of the classified emergencies comprises:
analyzing, from the classified social media data, the spatial location information published with the social media data of each emergency topic; clustering the emergency microblog points by a multi-level greedy clustering method according to the spatial location information to obtain hot-spot dense areas; and performing kernel density estimation and detection on the hot-spot dense areas to obtain hot-spot areas.
Compared with the prior art, the invention has the following advantages and beneficial effects:
(1) the method supports real-time and quick acquisition of social media data related to the emergency;
(2) the classification of the short texts of the social media can be automatically and quickly realized, so that emergency information of the emergency is extracted;
(3) analysis from the temporal and spatial perspectives shows that social-media-based emergency information is closely related to the development of the emergency.
Drawings
FIG. 1 is a social media data collection area based on the city of Beijing;
FIG. 2 is a detailed flow diagram of a web crawler collecting social media data;
FIG. 3 is a schematic diagram of a MongoDB cluster storing social media data;
FIG. 4 is a schematic diagram of a short text real-time classification model framework;
FIG. 5 is a statistical curve and a trend curve of the number of microblog users participating in a rainstorm discussion and the number of forwarded microblogs;
FIG. 6 is the trend over time of the number of social media users participating in each emergency topic, wherein the ordinate is the ratio of the number of social media users of each emergency topic to the total number of social media users;
FIG. 7 is the trend over time of the number of social media users posting original and forwarded microblogs on the "rescue information" topic, wherein the ordinate is the ratio of the number of such users to the total number of social media users;
FIG. 8 is a distribution density map of microblog points containing location information under the "traffic condition" topic;
FIG. 9 shows the clustering result of microblog points on the "rainstorm" topic;
FIG. 10 is a heat map of rainstorm microblogs and a map of actual waterlogging points around the Capital Airport, wherein (a) is the heat map of rainstorm microblogs and (b) is the actual waterlogging point distribution provided by Sogou Maps;
FIG. 11 is a schematic flow chart of the present invention;
FIG. 12 is a block diagram of the system of the present invention;
FIG. 13 shows the information classification table and emergency information positioning.
The case shown in the figures is the Beijing rainstorm emergency of July 21, 2012.
Detailed Description
Theoretical basis
1. Social media data acquisition method
The prior art offers two main methods for collecting social media data.
One is to gather social media data through an open platform API. The Application Programming Interface (API) opened by the Sina microblog is a Web API that gives users a convenient channel for obtaining official microblog data. The developer sends an HTTP request, and the backend returns the microblog data that meets the conditions. The microblog open platform lists 25 types of interfaces, of which 24 are available. The social media data returned by each interface is packaged in JSON format. The open platform API has a limitation: it cannot collect data by "keyword + region + time".
The other is to crawl social media data with a web crawler. Web crawlers are divided into search engine crawlers and customized crawlers. The invention employs a customized crawler, which captures web pages within a specified range to meet a specific requirement. To search microblogs by "keyword + region + time", the general idea is as follows: construct a URL, crawl and download the web page, and parse the microblog information in the page; then construct the next URL, and repeat until the microblogs within the specified time range have been captured.
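The "construct a URL, then construct the next URL" loop can be sketched as below. This is only an illustration under stated assumptions: the endpoint `https://example.com/search`, the query parameter names, and the one-hour window are all hypothetical, and fetching and parsing the pages is left out.

```python
from datetime import datetime, timedelta
from urllib.parse import urlencode

# Hypothetical search endpoint and parameter names; a real platform's
# search page will use its own URL scheme.
BASE_URL = "https://example.com/search"


def build_search_urls(keyword, region, start, end, window_hours=1):
    """Yield one search URL per time window until `end` is reached."""
    step = timedelta(hours=window_hours)
    t = start
    while t < end:
        t_next = min(t + step, end)
        query = urlencode({
            "q": keyword,
            "region": region,
            "from": t.strftime("%Y-%m-%d-%H"),
            "to": t_next.strftime("%Y-%m-%d-%H"),
        })
        yield f"{BASE_URL}?{query}"
        t = t_next  # the next URL starts where this window ended


urls = list(build_search_urls("rainstorm", "beijing",
                              datetime(2012, 7, 21, 10),
                              datetime(2012, 7, 21, 13)))
```

Splitting the time range into short windows keeps each result list small, which is one plausible way to reach all microblogs in the specified period.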
2. Unsupervised learning method
Latent Dirichlet Allocation (LDA) is a semantics-based topic model. To a computer, each document is merely a collection of words whose topic is unknown. The latent topic model uses the frequency with which words appear in the documents to find the topics in the document set and the probability distribution of each document over those topics, where a topic is a group of words distributed according to probability. LDA is an unsupervised learning method with the following advantages: (1) training requires no manually labeled training set, only the document set and a specified number of topics as input; when the document set is large in particular, LDA greatly reduces manual labeling cost and training time and is highly efficient; (2) after training, the meaning of each topic can be identified from its group of words, and the topic of each document is given as a probability distribution.
3. Supervised learning method
A Support Vector Machine (SVM) is a supervised learning method. A training set containing N samples is written as $\{(x_k, y_k)\}_{k=1}^{N}$ with $x_k \in \mathbb{R}^n$ and $y_k \in \mathbb{R}$, where $\mathbb{R}$ is the set of real numbers, $x_k$ is the n-dimensional feature vector of the k-th sample, and $y_k$ is the output value, true value, or sample label corresponding to the k-th sample.
Technical difficulties
Social media data consists of short texts, and short text classification is difficult: the texts are very short and their words too sparse; the language is highly colloquial and hard to segment; and the information is noisy. To overcome these problems, the invention designs a short text real-time classification model. The model consists of a learning process and a prediction process. The learning process is divided into LDA-based labeling of short text samples and SVM-based training of the classification model; the prediction process infers the topic of each text in the real-time text stream.
Implementation process
The specific implementation steps are as follows:
Step 1. Collect social media data, i.e. the document set.
Social media data can be collected mainly through an open platform API or a web crawler.
When social media data is collected through the API opened by the Sina microblog, as shown in FIG. 1, buffer analysis is performed around a plurality of search center points with a specified search radius so that the buffers cover the whole area where the emergency occurs, thereby obtaining the social media data of that area.
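One way to choose the search center points is sketched below, assuming the emergency area is approximated by a longitude/latitude bounding box in the northern hemisphere and that a flat-earth conversion between kilometres and degrees is accurate enough. The grid layout, the bounding box, and the 11 km radius are illustrative assumptions, not values from the source.

```python
import math

KM_PER_DEG_LAT = 111.32  # approximate length of one degree of latitude


def search_centers(min_lon, min_lat, max_lon, max_lat, radius_km):
    """Return (lon, lat) centers whose radius_km circles cover the box.

    Each center sits in the middle of a square cell of side radius*sqrt(2),
    so the corners of the cell lie on the circle boundary.
    """
    side_km = radius_km * math.sqrt(2)
    dlat = side_km / KM_PER_DEG_LAT
    # Use the latitude where a degree of longitude is longest (the southern
    # edge, for a northern-hemisphere box) so that cells are never too wide.
    dlon = side_km / (KM_PER_DEG_LAT * math.cos(math.radians(min_lat)))
    rows = math.ceil((max_lat - min_lat) / dlat)
    cols = math.ceil((max_lon - min_lon) / dlon)
    return [(min_lon + (c + 0.5) * dlon, min_lat + (r + 0.5) * dlat)
            for r in range(rows) for c in range(cols)]


# Rough, assumed bounding box around Beijing; radius chosen for illustration.
centers = search_centers(115.4, 39.4, 117.5, 41.1, radius_km=11)
```

Each returned center would then be passed, with the search radius, to a nearby-posts API call.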
When a web crawler is used, microblog data (the social media data) can be collected by entering "keyword + region + time" criteria. FIG. 2 shows the specific crawler workflow.
Step 2. Store the document set in a MongoDB cluster.
The social media data collected in step 1 is in JSON format and is stored in a MongoDB cluster, as shown in FIG. 3. The MongoDB cluster can build a spatial index, so spatial data is well organized and spatio-temporal queries are convenient.
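A minimal sketch of how a post might be stored and queried by space and time follows. The field names (`id`, `text`, `user_id`, `created_at`, `lon`, `lat`) are hypothetical, and `collection` is assumed to be a pymongo `Collection`; the source does not specify a schema.

```python
from datetime import datetime


def to_mongo_doc(post):
    """Convert a raw post dict (hypothetical field names) into a document
    with a GeoJSON point, the shape a 2dsphere index expects."""
    return {
        "_id": post["id"],
        "text": post["text"],
        "user_id": post["user_id"],
        "created_at": post["created_at"],
        "loc": {"type": "Point", "coordinates": [post["lon"], post["lat"]]},
    }


def spatiotemporal_query(lon, lat, radius_km, start, end):
    """Build a filter for posts within radius_km of a point in [start, end)."""
    earth_radius_km = 6378.1  # $centerSphere takes the radius in radians
    return {
        "loc": {"$geoWithin": {
            "$centerSphere": [[lon, lat], radius_km / earth_radius_km]}},
        "created_at": {"$gte": start, "$lt": end},
    }


def ensure_indexes(collection):
    """Create the spatial and time indexes on an assumed pymongo Collection."""
    collection.create_index([("loc", "2dsphere")])
    collection.create_index("created_at")


doc = to_mongo_doc({"id": 1, "text": "heavy rain", "user_id": "u1",
                    "created_at": datetime(2012, 7, 21, 14),
                    "lon": 116.4, "lat": 39.9})
query = spatiotemporal_query(116.4, 39.9, 5,
                             datetime(2012, 7, 21), datetime(2012, 7, 22))
```

With a live connection, `collection.insert_one(doc)` and `collection.find(query)` would store and retrieve the posts.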
Step 3. Preprocess the social media data.
The social media data is the document set. Preprocessing in this step comprises document deduplication, word segmentation, stop-word removal, and rare-word removal, where deduplication removes repeated documents and stop-word removal removes words that do not contribute to text classification. Each preprocessed document consists of a series of words.
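The preprocessing chain can be sketched as follows. Whitespace splitting stands in for a real Chinese word segmenter, the stop-word list is a placeholder, and treating a word as rare when it occurs fewer than `min_count` times in the corpus is an assumption; the source does not give the criterion.

```python
from collections import Counter

STOP_WORDS = {"the", "a", "is", "in", "of"}  # placeholder stop-word list


def preprocess(docs, min_count=2, tokenize=str.split):
    """Deduplicate, tokenize, drop stop words, then drop rare words."""
    unique = list(dict.fromkeys(docs))  # order-preserving deduplication
    tokenized = [[w for w in tokenize(d.lower()) if w not in STOP_WORDS]
                 for d in unique]
    counts = Counter(w for doc in tokenized for w in doc)
    kept = [[w for w in doc if counts[w] >= min_count] for doc in tokenized]
    return [doc for doc in kept if doc]  # discard documents left empty


docs = ["Heavy rain in Beijing", "Heavy rain in Beijing",
        "Rain floods the airport road", "Airport road is closed"]
cleaned = preprocess(docs)
```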
Step 4. Label the preprocessed document set using LDA to obtain known samples.
This step further comprises:
4.1 Obtain the document-topic probability matrix and the topic-word probability matrix of the social media data from the preprocessed document set.
The topics of the documents in the preprocessed document set are computed to obtain a document-topic probability matrix and a topic-word probability matrix. The document-topic probability matrix represents the distribution of topics in each document, and the topic-word probability matrix represents the distribution of words in each topic.
The document-topic probability is the probability that a document corresponds to each topic: the probability that document d corresponds to the i-th topic is the ratio of the number of words in d assigned to the i-th topic to the total number of words in d.
The topic-word probability is the probability that a topic corresponds to each word: the probability that the i-th topic corresponds to the j-th word is the ratio of the number of occurrences of the j-th word in the i-th topic to the total number of words in the i-th topic.
Computing the document-topic and topic-word probability matrices of social media data is conventional in the art; for ease of understanding, the computation of the document-topic probability matrix is described in detail below.
(1) Traverse the words of every document in the document set and randomly assign a topic to each word; after all documents have been traversed, an initial document-topic probability matrix is obtained.
(2) Based on the document-topic probability matrix, update the topic of each word in each document using formula (1) to obtain a new document-topic probability matrix:
$$P(z = t \mid w, -t) = \frac{M_{tw}^{-w} + \beta}{M_{t}^{-w} + \beta V} \cdot \frac{M_{dt}^{-z} + \alpha_t}{L_d - 1 + \sum_t \alpha_t} \propto \frac{M_{tw}^{-w} + \beta}{M_{t}^{-w} + \beta V}\left(M_{dt}^{-z} + \alpha_t\right) \qquad (1)$$
In formula (1):
$\alpha_t$ is a prior parameter, i.e. a hyperparameter, of the document-topic probability distribution;
$\beta$ is a prior parameter of the topic-word probability distribution;
$V$ is the size of the dictionary, i.e. the vocabulary of the preprocessed documents;
$L_d$ is the length of document d;
$M_{tw}$ is the number of occurrences of word w in topic t of the corpus, and $M_{tw}^{-w}$ denotes $M_{tw}$ with the contribution of the currently sampled word w removed;
$M_t$ is the number of occurrences of topic t in the corpus, and $M_{t}^{-w}$ denotes $M_t$ with the contribution of the currently sampled word w removed;
$M_{dt}$ is the number of occurrences of topic t in document d, and $M_{dt}^{-z}$ denotes $M_{dt}$ with the contribution of the current topic z removed.
(3) Repeat step (2) until the document-topic probability matrix converges.
The probability of a document corresponding to each topic can be read from the document-topic probability matrix. The higher the probability of a topic, the more likely the document is about that topic, so the highest-probability topic can represent the true topic of the document.
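The three sampling steps can be sketched as a small collapsed Gibbs sampler. The sketch departs from the text in ways that are assumptions: it uses a single symmetric `alpha` instead of per-topic values, runs a fixed number of sweeps instead of testing convergence, and returns a smoothed document-topic matrix. Documents are lists of integer word ids.

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01,
              n_iter=200, seed=0):
    """Collapsed Gibbs sampling for LDA; returns the doc-topic matrix."""
    rng = random.Random(seed)
    M_tw = [[0] * vocab_size for _ in range(n_topics)]  # topic-word counts
    M_t = [0] * n_topics                                # words per topic
    M_dt = [[0] * n_topics for _ in docs]               # doc-topic counts
    z = []
    # (1) assign a random topic to every word
    for d, doc in enumerate(docs):
        z.append([rng.randrange(n_topics) for _ in doc])
        for w, t in zip(doc, z[d]):
            M_tw[t][w] += 1
            M_t[t] += 1
            M_dt[d][t] += 1
    # (2)-(3) resample each word's topic with the proportional form of (1)
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                t = z[d][i]
                M_tw[t][w] -= 1  # remove the current word's contribution
                M_t[t] -= 1
                M_dt[d][t] -= 1
                weights = [(M_tw[k][w] + beta) / (M_t[k] + beta * vocab_size)
                           * (M_dt[d][k] + alpha) for k in range(n_topics)]
                t = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = t
                M_tw[t][w] += 1
                M_t[t] += 1
                M_dt[d][t] += 1
    return [[(M_dt[d][k] + alpha) / (len(doc) + n_topics * alpha)
             for k in range(n_topics)] for d, doc in enumerate(docs)]


theta = lda_gibbs([[0, 1, 0, 1], [2, 3, 2, 3]], n_topics=2, vocab_size=4,
                  n_iter=20)
```

Each row of `theta` sums to 1 and plays the role of one row of the document-topic probability matrix.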
4.2 Determine the topic probability threshold λ, traverse the document-topic probability matrix, and extract each document whose probability exceeds λ, together with the corresponding topic, as a known sample. The topic probability threshold λ is an empirical value.
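Threshold-based sample selection might look like the sketch below. The value `lam=0.8` is an arbitrary illustration of the empirical threshold, and taking only each document's most probable topic is an assumption that matters only when λ is below 0.5.

```python
def select_known_samples(docs, doc_topic, lam=0.8):
    """Keep (document, topic index) pairs whose top probability exceeds lam."""
    samples = []
    for doc, probs in zip(docs, doc_topic):
        topic = max(range(len(probs)), key=probs.__getitem__)
        if probs[topic] > lam:
            samples.append((doc, topic))
    return samples


samples = select_known_samples(
    ["doc a", "doc b", "doc c"],
    [[0.9, 0.1], [0.5, 0.5], [0.15, 0.85]],
)
```

Documents with no dominant topic, such as the second one here, are left out of the known samples.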
Step 5. Select features to obtain the word feature set.
Step 4 yields the known samples, each comprising a document and its topic, where the document consists of a series of words. All words in the known samples form the word feature set. The frequency of a word feature is used as its weight, that is, the frequency of word feature j in document d is the weight of word feature j in document d. The weights give the word features appropriate degrees of importance.
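A term-frequency feature matrix can be built as in this sketch, which assumes that "word frequency" means the raw count of the word in the document; a normalized frequency would be an equally plausible reading.

```python
from collections import Counter


def build_tf_vectors(docs):
    """docs: lists of words. Returns (vocabulary, term-frequency vectors)."""
    vocab = sorted({w for doc in docs for w in doc})
    index = {w: j for j, w in enumerate(vocab)}
    vectors = []
    for doc in docs:
        vec = [0] * len(vocab)
        for w, n in Counter(doc).items():
            vec[index[w]] = n  # weight of word feature w in this document
        vectors.append(vec)
    return vocab, vectors


vocab, vectors = build_tf_vectors([["rain", "rain", "road"],
                                   ["road", "closed"]])
```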
Step 6. Construct the short text real-time classification model.
This step further comprises:
6.1 Train an SVM on the word feature set and the weight of each word feature to obtain SVM classifiers, using Grid Search to enumerate a series of SVM model parameters (C, gamma);
6.2 Validate the SVM classifier under each set of model parameters (C, gamma) by K-fold cross-validation, and take the parameters with the best cross-validation result as the optimal model parameters (C, gamma); the SVM classifier corresponding to the optimal parameters is the short text real-time classification model. The cross-validation result is measured by prediction error: the smaller the prediction error, the better the result.
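The grid search with K-fold cross-validation can be sketched in plain Python as below. To stay self-contained, the trainer is a toy RBF-weighted voting classifier that ignores C; it is a stand-in, not an SVM. In practice the `fit` argument would wrap a real RBF-kernel SVM trainer, and the candidate values for C and gamma shown here are arbitrary.

```python
import math
from itertools import product


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    folds, start = [], 0
    for i in range(k):
        size = n // k + (1 if i < n % k else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def grid_search_cv(X, y, fit, Cs, gammas, k=5):
    """Return ((C, gamma), error) with the smallest K-fold prediction error.

    fit(X_train, y_train, C, gamma) must return a predict(x) callable.
    """
    folds = k_fold_indices(len(X), k)
    best = None
    for C, gamma in product(Cs, gammas):
        wrong = 0
        for held_out in folds:
            held = set(held_out)
            train = [i for i in range(len(X)) if i not in held]
            predict = fit([X[i] for i in train], [y[i] for i in train],
                          C, gamma)
            wrong += sum(predict(X[i]) != y[i] for i in held_out)
        error = wrong / len(X)
        if best is None or error < best[1]:
            best = ((C, gamma), error)
    return best


def fit_kernel_vote(X_train, y_train, C, gamma):
    """Toy stand-in for an SVM trainer: RBF-weighted vote; ignores C."""
    def predict(x):
        score = {}
        for xi, yi in zip(X_train, y_train):
            d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
            score[yi] = score.get(yi, 0.0) + math.exp(-gamma * d2)
        return max(score, key=score.get)
    return predict


X = [[0, 0], [5, 5], [0, 1], [5, 6], [1, 0], [6, 5], [1, 1], [6, 6]]
y = [0, 1, 0, 1, 0, 1, 0, 1]
best_params, best_error = grid_search_cv(X, y, fit_kernel_vote,
                                         Cs=[1, 10], gammas=[0.1, 1], k=4)
```

Contiguous folds assume the samples are already shuffled; the demo data interleaves the two classes for that reason.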
Step 7. Preprocess the social media data of the real-time emergency, including removing advertisements, filtering out forwarded microblogs, and removing microblogs that contain URLs; then predict the topic of the emergency with the short text classification model based on the preprocessed data.
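The real-time filtering could be sketched as follows. The keys `text` and `is_repost` and the advertisement keywords are hypothetical; the source does not say how advertisements are detected.

```python
import re

AD_KEYWORDS = ("discount", "promotion", "buy now")  # placeholder ad markers
URL_PATTERN = re.compile(r"https?://\S+")


def keep_post(post):
    """post: dict with hypothetical keys 'text' and 'is_repost'."""
    text = post["text"]
    if post.get("is_repost"):
        return False  # forwarded microblog
    if URL_PATTERN.search(text):
        return False  # microblog containing a URL
    return not any(k in text.lower() for k in AD_KEYWORDS)


stream = [
    {"text": "Water is knee-deep at the east gate", "is_repost": False},
    {"text": "Umbrella discount today only", "is_repost": False},
    {"text": "Flooded road", "is_repost": True},
    {"text": "Photos here http://example.com/p", "is_repost": False},
]
filtered = [p["text"] for p in stream if keep_post(p)]
```

Only posts that pass the filter would be vectorized and passed to the classifier.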
Step 8. Mine information from the social media data of the classified emergency.
The information mining comprises the following:
(1) Obtain the trend over time of the number of social media users participating in discussion of the emergency. For example, if the emergency is a rainstorm, the trend over time of the number of users participating in the rainstorm discussion can be tallied in Excel, as shown in FIG. 5. Point A corresponds to 14:00 on July 21, 2012, when the Beijing meteorological observatory first issued a yellow rainstorm warning and a yellow thunder warning; the microblog trend line reached its highest point one hour in advance. Point B corresponds to the moment when the observatory issued an orange rainstorm warning, when the microblog trend line was about to reach a high point. With reference to point C at 20:00 on the same day, the downward trend at point B arises because the total number of microblog participants began to fall at 22:00 while users' forwarding behavior strengthened.
(2) Analyze the trend over time of the number of social media users participating in each emergency topic. For example, the number of social media users discussing the three topics "disaster information", "weather report", and "loss influence" is counted over time, as shown in FIG. 6; the number of social media users posting original and forwarded microblogs on the "rescue information" topic is counted over time, as shown in FIG. 7.
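The per-topic trend can be computed with a short aggregation such as the sketch below. Hourly bins, and dividing by the number of distinct users in the whole data set, are assumptions about how the ratio on the ordinate is formed.

```python
from collections import defaultdict
from datetime import datetime


def hourly_user_share(posts):
    """posts: iterable of (timestamp, user_id, topic).

    Returns {topic: {hour: distinct users on the topic in that hour
                            / distinct users overall}}.
    """
    users_by_topic_hour = defaultdict(set)
    all_users = set()
    for ts, user, topic in posts:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        users_by_topic_hour[(topic, hour)].add(user)
        all_users.add(user)
    share = defaultdict(dict)
    for (topic, hour), users in users_by_topic_hour.items():
        share[topic][hour] = len(users) / len(all_users)
    return share


posts = [
    (datetime(2012, 7, 21, 10, 5), "u1", "weather report"),
    (datetime(2012, 7, 21, 10, 40), "u2", "weather report"),
    (datetime(2012, 7, 21, 10, 50), "u1", "weather report"),
    (datetime(2012, 7, 21, 11, 10), "u3", "disaster information"),
]
share = hourly_user_share(posts)
```

Plotting each topic's values against the hour gives one curve per topic.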
(3) Analyze the spatial location information published with the emergency social media data.
For example, the spatial distribution of microblog points containing location information under the "traffic condition" topic is analyzed, and the spatial distribution density of these microblog points is drawn with the ArcGIS spatial analysis tools, as shown in FIG. 8.
(4) Cluster the emergency microblog points according to their spatial location information to reveal the spatial distribution pattern of the emergency.
The microblog points are clustered by a multi-level greedy clustering method:
(a) The initial default map zoom level is 0, and all microblog points are regarded as one cluster.
(b) Proceeding through the map zoom levels in ascending order, compute the clustering threshold of the next level.
(c) Take each cluster of the previous level in turn and re-cluster it with the clustering threshold from step (b): take each microblog point a in the cluster in turn and compute the distance between a and each new cluster; if the distance is smaller than the clustering threshold, add a to that cluster; otherwise, form a new cluster.
(d) Repeat steps (b) to (c) from the lowest level to the highest level to form the clusters of each level, compute the convex hull of each cluster, and store it in a tree structure.
Hot-spot dense areas are obtained from the clustering result of the microblog points; FIG. 9 shows the clustering result for microblog points on the "rainstorm" topic.
(5) Perform kernel density estimation and detection on the hot-spot dense areas to obtain hot-spot areas, which are displayed on the map as a heat map, as shown in FIG. 10.
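Steps (a) to (d) can be sketched as the tree construction below. Several details are assumptions: the threshold is halved at each zoom level (the source gives no formula), distances are planar and measured to the running centroid of each cluster, and the convex hull computation is omitted.

```python
import math


def greedy_cluster(points, threshold):
    """Single pass: join the first cluster whose centroid is closer than
    threshold, otherwise start a new cluster."""
    clusters = []  # each cluster: {"center": (x, y), "points": [...]}
    for p in points:
        for c in clusters:
            if math.dist(p, c["center"]) < threshold:
                c["points"].append(p)
                n = len(c["points"])
                c["center"] = (sum(q[0] for q in c["points"]) / n,
                               sum(q[1] for q in c["points"]) / n)
                break
        else:
            clusters.append({"center": p, "points": [p]})
    return clusters


def multilevel_cluster(points, base_threshold, max_level):
    """Build a tree: level 0 is one cluster; each deeper level re-clusters
    its parent's points with a halved threshold (an assumed schedule)."""
    def build(pts, level):
        node = {"level": level, "points": pts, "children": []}
        if level < max_level and len(pts) > 1:
            threshold = base_threshold / (2 ** level)
            for c in greedy_cluster(pts, threshold):
                node["children"].append(build(c["points"], level + 1))
        return node
    return build(list(points), 0)


tree = multilevel_cluster([(0, 0), (0.1, 0), (5, 5), (5.1, 5)],
                          base_threshold=1.0, max_level=2)
```

Clusters that keep many points at deep levels are candidates for the hot-spot dense areas.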
The kernel density estimation formula adopted by the invention is:
$$f(x) = \frac{1}{nh} \sum_{i=1}^{n} k\left(\frac{x - x_i}{h}\right) \qquad (2)$$
In formula (2), $f(x)$ is the kernel density; $x_i$ denotes the microblog points, which follow an unknown distribution; $n$ is the number of microblog points; the bandwidth $h$ corresponds to the radius of the heat map, and the larger the radius, the wider the influence range of microblog point $x_i$; $k(\cdot)$ is the kernel function, defined as a monotonic function of the Euclidean distance between any point in space and a center.
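Formula (2) can be evaluated directly, as in this sketch. It uses the one-dimensional form exactly as written, with a Gaussian kernel chosen as an assumption; for two-dimensional microblog points the argument would become a distance and the normalization would change accordingly.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, one common choice for k(.)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)


def kde(x, samples, h, kernel=gaussian_kernel):
    """f(x) = 1/(n h) * sum_i k((x - x_i) / h), evaluated at scalar x."""
    n = len(samples)
    return sum(kernel((x - xi) / h) for xi in samples) / (n * h)


samples = [0.0, 0.1, 5.0]
near = kde(0.0, samples, h=1.0)
far = kde(2.5, samples, h=1.0)
```

The density is higher near the two close samples than in the gap between the groups, which is the effect a heat map visualizes; a larger `h` spreads each point's influence more widely.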
Fig. 11 is a schematic diagram of the specific flow of the method of the present invention, and fig. 12 shows the system framework of the present invention. The framework mainly includes four parts, namely data acquisition, data storage, mining, and result visualization, which are divided into 4 explicit submodules and 1 implicit submodule: respectively, a social media data acquisition submodule, a social media data storage submodule, an emergency information mining submodule, a mining result visualization submodule, and a map submodule.
The submodules realize different functions. The social media data acquisition submodule comprises an open API data acquisition function, a web crawler data acquisition function, and a data storage function. The social media data storage submodule provides storage of the collected data, storage of the short text classification model, and updating of topic categories. The emergency information mining submodule comprises functions such as emergency information classification, trend analysis, cluster analysis, time sorting, and popularity sorting. The mining result visualization submodule offers visualization forms such as scene reproduction, information classification tables, emergency information positioning, trend curves, cluster maps, hot spot maps, and word clouds, as shown in fig. 13. The map submodule provides basic functions such as base map switching, zooming, and roaming.

Claims (9)

1. A social media-based emergency classification method, characterized by comprising the following steps:
S1: collecting social media data using an open platform API or a web crawler, wherein the social media data form a document set;
S2: storing the document set in a MongoDB cluster;
S3: preprocessing the document set, including document deduplication, word segmentation, stop-word removal, and removal of documents containing too few words;
S4: labeling the preprocessed document set using LDA to obtain known samples, this step further comprising:
4.1: computing the topics of the documents in the preprocessed document set to obtain a document-topic probability matrix and a topic-word probability matrix;
4.2: traversing the document-topic probability matrix and taking each document and topic whose probability exceeds the topic probability threshold λ as a known sample, wherein the topic probability threshold λ is an empirical value and each document in the known samples consists of a series of words;
S5: all words in each document of the known samples form a word feature set, and the frequency of each word feature in a document is the weight of that word feature in the document;
S6: constructing a short text real-time classification model, this step further comprising:
6.1: training an SVM on the word feature set and the weights of the word features to obtain SVM classifiers, wherein a series of SVM model parameters is enumerated using a grid search method;
6.2: validating the SVM classifier under each set of model parameters one by one using K-fold cross-validation, and taking the model parameters with the smallest prediction error as the optimal model parameters, wherein the SVM classifier corresponding to the optimal model parameters is the short text real-time classification model;
S7: classifying the real-time emergency using the short text real-time classification model based on the social media data of the real-time emergency, and predicting the topic of the emergency.
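As a non-limiting illustration of steps 4.2 and S5, the sketch below selects known samples from a document-topic probability matrix and builds term-frequency weights. It assumes the matrix has already been produced by some LDA implementation; the function names and the example threshold are hypothetical and not part of the claim.

```python
from collections import Counter

def select_known_samples(doc_topic, threshold):
    """Step 4.2: return (document index, topic index) pairs whose probability
    exceeds the topic probability threshold (lambda in the claim)."""
    samples = []
    for d, row in enumerate(doc_topic):
        for t, prob in enumerate(row):
            if prob > threshold:
                samples.append((d, t))
    return samples

def term_frequency_vectors(docs):
    """Step S5: build the word feature set from all words of the documents and
    weight each feature by its frequency in each document."""
    vocab = sorted({word for doc in docs for word in doc})
    index = {word: i for i, word in enumerate(vocab)}
    vectors = []
    for doc in docs:
        vec = [0] * len(vocab)
        for word, count in Counter(doc).items():
            vec[index[word]] = count
        vectors.append(vec)
    return vocab, vectors
```

For example, with a threshold of 0.7 a document whose topic probabilities are 0.5 and 0.5 is not labeled, so ambiguous documents stay out of the training set.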
2. The social media-based emergency classification method as set forth in claim 1, wherein:
collecting social media data using the open platform API specifically comprises:
performing buffer analysis with a specified search radius around a plurality of search center points, such that the buffers cover the entire area where the emergency occurred, and obtaining the social media data of that area.
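One possible way to choose the search center points is sketched below, under the assumptions that the emergency area is approximated by a rectangle in planar (projected) coordinates and that the centers are placed on a square grid; neither assumption comes from the claim, and the function name is hypothetical. With a grid spacing of radius × √2, each grid cell is the square inscribed in a search circle, so the circular buffers together cover the rectangle.

```python
import math

def covering_centers(min_x, min_y, max_x, max_y, radius):
    """Return centers of circles of the given radius that cover the rectangle."""
    if radius <= 0:
        raise ValueError("radius must be positive")
    # Side length of the square inscribed in a circle of this radius.
    step = radius * math.sqrt(2.0)
    nx = max(1, math.ceil((max_x - min_x) / step))
    ny = max(1, math.ceil((max_y - min_y) / step))
    centers = []
    for i in range(nx):
        for j in range(ny):
            # Each center sits in the middle of its grid cell.
            centers.append((min_x + (i + 0.5) * step, min_y + (j + 0.5) * step))
    return centers
```

For longitude and latitude input, the distances would need to be geodesic rather than planar, which this sketch does not handle.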
3. The social media-based emergency classification method as set forth in claim 1, wherein:
collecting social media data using the web crawler specifically comprises:
capturing social media data with a customized crawler by inputting keywords, regions, and time ranges.
4. A social media-based emergency information mining method, characterized in that:
real-time emergencies are classified using the method of claim 1, and information mining is performed on the social media data of the classified emergencies.
5. The social media-based emergency information mining method as claimed in claim 4, wherein:
the information mining on the social media data of the classified emergencies comprises:
obtaining, from the social media data of the classified emergencies, the trend over time in the number of social media users participating in discussion of the emergency.
6. The social media-based emergency information mining method as claimed in claim 4, wherein:
the information mining on the social media data of the classified emergencies comprises:
analyzing, from the social media data of the classified emergencies, the trend over time in the number of social media users participating in each emergency topic.
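The per-topic trend of claim 6 could, for instance, be computed as in the following sketch. The record layout (timestamp, user identifier, topic) and the hourly time bins are assumptions made for illustration only; the function name is hypothetical.

```python
from collections import defaultdict
from datetime import datetime

def users_per_hour_by_topic(posts):
    """posts: iterable of (timestamp, user_id, topic) tuples.
    Returns {topic: {hour: number of distinct users}} with hours in ascending order."""
    seen = defaultdict(set)
    for ts, user, topic in posts:
        # Truncate the timestamp to the start of its hour.
        hour = ts.replace(minute=0, second=0, microsecond=0)
        seen[(topic, hour)].add(user)
    trend = defaultdict(dict)
    for (topic, hour), users in seen.items():
        trend[topic][hour] = len(users)
    return {topic: dict(sorted(hours.items())) for topic, hours in trend.items()}
```

Counting distinct users per bin, rather than posts, prevents a single very active account from inflating the trend.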
7. The social media-based emergency information mining method as claimed in claim 4, wherein:
the information mining on the social media data of the classified emergencies comprises:
analyzing, from the social media data of the classified emergencies, the spatial position information attached to the published social media data of each emergency topic.
8. The social media-based emergency information mining method as claimed in claim 4, wherein:
the information mining on the social media data of the classified emergencies comprises:
analyzing, from the social media data of the classified emergencies, the spatial position information attached to the published social media data of each emergency topic, and clustering the emergency microblog points according to the spatial position information using a multi-level greedy clustering method.
9. The social media-based emergency information mining method as claimed in claim 4, wherein:
the information mining on the social media data of the classified emergencies comprises:
analyzing, from the social media data of the classified emergencies, the spatial position information attached to the published social media data of each emergency topic, clustering the emergency microblog points according to the spatial position information using a multi-level greedy clustering method to obtain hot spot dense areas, and performing kernel density estimation on the hot spot dense areas to detect the hot spot areas.
CN201610345293.6A 2016-05-23 2016-05-23 Sudden event emergency information mining method based on social media Pending CN106021508A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610345293.6A CN106021508A (en) 2016-05-23 2016-05-23 Sudden event emergency information mining method based on social media

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610345293.6A CN106021508A (en) 2016-05-23 2016-05-23 Sudden event emergency information mining method based on social media

Publications (1)

Publication Number Publication Date
CN106021508A (en) 2016-10-12

Family

ID=57095827

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610345293.6A Pending CN106021508A (en) 2016-05-23 2016-05-23 Sudden event emergency information mining method based on social media

Country Status (1)

Country Link
CN (1) CN106021508A (en)

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106933949A (en) * 2017-01-20 2017-07-07 浙江大学 The planing method of influence power outburst in a kind of control social networks
CN107229712A (en) * 2017-05-27 2017-10-03 中南大学 A kind of space-time clustering method towards occurred events of public safety acquisition of information
CN107908636A (en) * 2017-09-26 2018-04-13 武汉大学 A kind of method that mankind's activity spatiotemporal mode is excavated using social media
US10136294B2 (en) 2015-12-17 2018-11-20 Rapidsos, Inc. Devices and methods for efficient emergency calling
CN108959424A (en) * 2018-06-11 2018-12-07 长春电力设计有限公司 A kind of operating method of the city electricity consumption map for power system load monitoring
US10375558B2 (en) 2017-04-24 2019-08-06 Rapidsos, Inc. Modular emergency communication flow management system
US10419915B2 (en) 2016-02-26 2019-09-17 Rapidsos, Inc. Systems and methods for emergency communications amongst groups of devices based on shared data
US10425799B2 (en) 2014-07-08 2019-09-24 Rapidsos, Inc. System and method for call management
US10447865B2 (en) 2016-04-26 2019-10-15 Rapidsos, Inc. Systems and methods for emergency communications
CN110426735A (en) * 2019-07-02 2019-11-08 武汉大学 A kind of detection method of the earthquake disaster coverage based on social media
CN110555568A (en) * 2019-09-12 2019-12-10 重庆交通大学 Road traffic running state real-time perception method based on social network information
US10657799B2 (en) 2015-11-02 2020-05-19 Rapidsos, Inc. Method and system for situational awareness for emergency response
US10701542B2 (en) 2017-12-05 2020-06-30 Rapidsos, Inc. Social media content for emergency management
US10805786B2 (en) 2018-06-11 2020-10-13 Rapidsos, Inc. Systems and user interfaces for emergency data integration
US10820181B2 (en) 2018-02-09 2020-10-27 Rapidsos, Inc. Emergency location analysis system
US10861320B2 (en) 2016-08-22 2020-12-08 Rapidsos, Inc. Predictive analytics for emergency detection and response management
US10911926B2 (en) 2019-03-29 2021-02-02 Rapidsos, Inc. Systems and methods for emergency data integration
CN112396441A (en) * 2019-08-14 2021-02-23 腾讯科技(深圳)有限公司 Data processing method and device and readable storage medium
US10977927B2 (en) 2018-10-24 2021-04-13 Rapidsos, Inc. Emergency communication flow management and notification system
US11146680B2 (en) 2019-03-29 2021-10-12 Rapidsos, Inc. Systems and methods for emergency data integration
CN113821739A (en) * 2021-11-22 2021-12-21 南方科技大学 Local event detection method, device, equipment and storage medium
US11218584B2 (en) 2019-02-22 2022-01-04 Rapidsos, Inc. Systems and methods for automated emergency response
US11228891B2 (en) 2019-07-03 2022-01-18 Rapidsos, Inc. Systems and methods for emergency medical communications
US11330664B1 (en) 2020-12-31 2022-05-10 Rapidsos, Inc. Apparatus and method for obtaining emergency data and providing a map view
CN114637853A (en) * 2022-05-17 2022-06-17 天津卓朗科技发展有限公司 Grading method of emergency and model training method and device thereof
US11425529B2 (en) 2016-05-09 2022-08-23 Rapidsos, Inc. Systems and methods for emergency communications
US11641575B2 (en) 2018-04-16 2023-05-02 Rapidsos, Inc. Emergency data management and access system
US11917514B2 (en) 2018-08-14 2024-02-27 Rapidsos, Inc. Systems and methods for intelligently managing multimedia for emergency response
US12041525B2 (en) 2014-09-19 2024-07-16 Rapidsos, Inc. Method and system for emergency call management

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102411611A (en) * 2011-10-15 2012-04-11 西安交通大学 Instant interactive text oriented event identifying and tracking method
CN103176981A (en) * 2011-12-20 2013-06-26 中国科学院计算机网络信息中心 Event information mining and warning method
CN103176983A (en) * 2011-12-20 2013-06-26 中国科学院计算机网络信息中心 Event warning method based on Internet information
CN103744978A (en) * 2014-01-14 2014-04-23 清华大学 Parameter optimization method for support vector machine based on grid search technology
CN105260437A (en) * 2015-09-30 2016-01-20 陈一飞 Text classification feature selection method and application thereof to biomedical text classification

Non-Patent Citations (6)

* Cited by examiner, † Cited by third party
Title
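Liu Kun: "Research on Active Perception of Online Public Opinion Events Based on Microblogs", China Master's Theses Full-text Database *
Liu Xiaoxi: "Design and Implementation of a Microblog Public Opinion Information Detection System Based on Short Text Classification", China Master's Theses Full-text Database *
Zhu Jianqi et al.: "Design and Implementation of a Social Media-Based Emergency Information System", Geomatics & Spatial Information Technology *
Wang Jianfeng et al.: "SVM Parameter Optimization Based on an Improved Grid Search Method", Applied Science and Technology *
Wang Yandong et al.: "Mining and Analysis of Emergency Information for Sudden Events Based on Social Media", Journal of Wuhan University *
Ge Wenzhen: "Research on Multi-class Classification of Short Microblog Texts", China Master's Theses Full-text Database *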

Cited By (61)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11659375B2 (en) 2014-07-08 2023-05-23 Rapidsos, Inc. System and method for call management
US11153737B2 (en) 2014-07-08 2021-10-19 Rapidsos, Inc. System and method for call management
US12047858B2 (en) 2014-07-08 2024-07-23 Rapidsos, Inc. System and method for call management
US10425799B2 (en) 2014-07-08 2019-09-24 Rapidsos, Inc. System and method for call management
US12041525B2 (en) 2014-09-19 2024-07-16 Rapidsos, Inc. Method and system for emergency call management
US10657799B2 (en) 2015-11-02 2020-05-19 Rapidsos, Inc. Method and system for situational awareness for emergency response
US11605287B2 (en) 2015-11-02 2023-03-14 Rapidsos, Inc. Method and system for situational awareness for emergency response
US11580845B2 (en) 2015-11-02 2023-02-14 Rapidsos, Inc. Method and system for situational awareness for emergency response
US10701541B2 (en) 2015-12-17 2020-06-30 Rapidsos, Inc. Devices and methods for efficient emergency calling
US11832157B2 (en) 2015-12-17 2023-11-28 Rapidsos, Inc. Devices and methods for efficient emergency calling
US10136294B2 (en) 2015-12-17 2018-11-20 Rapidsos, Inc. Devices and methods for efficient emergency calling
US11140538B2 (en) 2015-12-17 2021-10-05 Rapidsos, Inc. Devices and methods for efficient emergency calling
US11445349B2 (en) 2016-02-26 2022-09-13 Rapidsos, Inc. Systems and methods for emergency communications amongst groups of devices based on shared data
US10419915B2 (en) 2016-02-26 2019-09-17 Rapidsos, Inc. Systems and methods for emergency communications amongst groups of devices based on shared data
US10771951B2 (en) 2016-02-26 2020-09-08 Rapidsos, Inc. Systems and methods for emergency communications amongst groups of devices based on shared data
US11665523B2 (en) 2016-02-26 2023-05-30 Rapidsos, Inc. Systems and methods for emergency communications amongst groups of devices based on shared data
US10447865B2 (en) 2016-04-26 2019-10-15 Rapidsos, Inc. Systems and methods for emergency communications
US11425529B2 (en) 2016-05-09 2022-08-23 Rapidsos, Inc. Systems and methods for emergency communications
US11790766B2 (en) 2016-08-22 2023-10-17 Rapidsos, Inc. Predictive analytics for emergency detection and response management
US10861320B2 (en) 2016-08-22 2020-12-08 Rapidsos, Inc. Predictive analytics for emergency detection and response management
CN106933949B (en) * 2017-01-20 2020-09-11 浙江大学 Planning method for controlling influence outbreak in social network
CN106933949A (en) * 2017-01-20 2017-07-07 浙江大学 The planing method of influence power outburst in a kind of control social networks
US11974207B2 (en) 2017-04-24 2024-04-30 Rapidsos, Inc. Modular emergency communication flow management system
US10375558B2 (en) 2017-04-24 2019-08-06 Rapidsos, Inc. Modular emergency communication flow management system
US11496874B2 (en) 2017-04-24 2022-11-08 Rapidsos, Inc. Modular emergency communication flow management system
CN107229712A (en) * 2017-05-27 2017-10-03 中南大学 A kind of space-time clustering method towards occurred events of public safety acquisition of information
CN107908636A (en) * 2017-09-26 2018-04-13 武汉大学 A kind of method that mankind's activity spatiotemporal mode is excavated using social media
US12063581B2 (en) 2017-12-05 2024-08-13 Rapidsos, Inc. Emergency registry for emergency management
US10701542B2 (en) 2017-12-05 2020-06-30 Rapidsos, Inc. Social media content for emergency management
US11197145B2 (en) 2017-12-05 2021-12-07 Rapidsos, Inc. Social media content for emergency management
US11818639B2 (en) 2018-02-09 2023-11-14 Rapidsos, Inc. Emergency location analysis system
US10820181B2 (en) 2018-02-09 2020-10-27 Rapidsos, Inc. Emergency location analysis system
US11641575B2 (en) 2018-04-16 2023-05-02 Rapidsos, Inc. Emergency data management and access system
CN108959424B (en) * 2018-06-11 2021-08-20 长春电力设计有限公司 Operation method of urban electricity utilization map for monitoring load of power system
US11310647B2 (en) 2018-06-11 2022-04-19 Rapidsos, Inc. Systems and user interfaces for emergency data integration
US10805786B2 (en) 2018-06-11 2020-10-13 Rapidsos, Inc. Systems and user interfaces for emergency data integration
US11871325B2 (en) 2018-06-11 2024-01-09 Rapidsos, Inc. Systems and user interfaces for emergency data integration
CN108959424A (en) * 2018-06-11 2018-12-07 长春电力设计有限公司 A kind of operating method of the city electricity consumption map for power system load monitoring
US11917514B2 (en) 2018-08-14 2024-02-27 Rapidsos, Inc. Systems and methods for intelligently managing multimedia for emergency response
US11741819B2 (en) 2018-10-24 2023-08-29 Rapidsos, Inc. Emergency communication flow management and notification system
US10977927B2 (en) 2018-10-24 2021-04-13 Rapidsos, Inc. Emergency communication flow management and notification system
US11218584B2 (en) 2019-02-22 2022-01-04 Rapidsos, Inc. Systems and methods for automated emergency response
US12074999B2 (en) 2019-02-22 2024-08-27 Rapidsos, Inc. Systems and methods for automated emergency response
US11689653B2 (en) 2019-02-22 2023-06-27 Rapidsos, Inc. Systems and methods for automated emergency response
US11695871B2 (en) 2019-03-29 2023-07-04 Rapidsos, Inc. Systems and methods for emergency data integration
US11943694B2 (en) 2019-03-29 2024-03-26 Rapidsos, Inc. Systems and methods for emergency data integration
US11558728B2 (en) 2019-03-29 2023-01-17 Rapidsos, Inc. Systems and methods for emergency data integration
US10911926B2 (en) 2019-03-29 2021-02-02 Rapidsos, Inc. Systems and methods for emergency data integration
US11146680B2 (en) 2019-03-29 2021-10-12 Rapidsos, Inc. Systems and methods for emergency data integration
CN110426735A (en) * 2019-07-02 2019-11-08 武汉大学 A kind of detection method of the earthquake disaster coverage based on social media
US11716605B2 (en) 2019-07-03 2023-08-01 Rapidsos, Inc. Systems and methods for victim identification
US11228891B2 (en) 2019-07-03 2022-01-18 Rapidsos, Inc. Systems and methods for emergency medical communications
CN112396441B (en) * 2019-08-14 2023-08-22 腾讯科技(深圳)有限公司 Data processing method, device and readable storage medium
CN112396441A (en) * 2019-08-14 2021-02-23 腾讯科技(深圳)有限公司 Data processing method and device and readable storage medium
CN110555568B (en) * 2019-09-12 2022-12-02 重庆交通大学 Road traffic running state real-time perception method based on social network information
CN110555568A (en) * 2019-09-12 2019-12-10 重庆交通大学 Road traffic running state real-time perception method based on social network information
US11956853B2 (en) 2020-12-31 2024-04-09 Rapidsos, Inc. Apparatus and method for obtaining emergency data and providing a map view
US11330664B1 (en) 2020-12-31 2022-05-10 Rapidsos, Inc. Apparatus and method for obtaining emergency data and providing a map view
US11528772B2 (en) 2020-12-31 2022-12-13 Rapidsos, Inc. Apparatus and method for obtaining emergency data related to emergency sessions
CN113821739A (en) * 2021-11-22 2021-12-21 南方科技大学 Local event detection method, device, equipment and storage medium
CN114637853A (en) * 2022-05-17 2022-06-17 天津卓朗科技发展有限公司 Grading method of emergency and model training method and device thereof

Similar Documents

Publication Publication Date Title
CN106021508A (en) Sudden event emergency information mining method based on social media
Resch et al. Combining machine-learning topic models and spatiotemporal analysis of social media data for disaster footprint and damage assessment
Laylavi et al. Event relatedness assessment of Twitter messages for emergency response
CN110472066B (en) Construction method of urban geographic semantic knowledge map
CN103955505B (en) A kind of event method of real-time and system based on microblogging
CN103544255B (en) Text semantic relativity based network public opinion information analysis method
CN104182389B (en) A kind of big data analyzing business intelligence service system based on semanteme
Zhou et al. Real world city event extraction from Twitter data streams
CN110533212A (en) Urban waterlogging public sentiment monitoring and pre-alarming method based on big data
Huang et al. Early detection of emergency events from social media: A new text clustering approach
CN103324666A (en) Topic tracing method and device based on micro-blog data
CN103020159A (en) Method and device for news presentation facing events
CN103064880B (en) A kind of methods, devices and systems providing a user with website selection based on search information
CN110162626A (en) A kind of calculation method of the public sentiment emotion temperature entropy based on two-way LSTM
CN109685153A (en) A kind of social networks rumour discrimination method based on characteristic aggregation
CN107193867A (en) Much-talked-about topic analysis method based on big data
Tang et al. Social media-based disaster research: Development, trends, and obstacles
CN109918648B (en) Rumor depth detection method based on dynamic sliding window feature score
CN103761286B (en) A kind of Service Source search method based on user interest
CN112905800A (en) Public character public opinion knowledge graph and XGboost multi-feature fusion emotion early warning method
Almehmadi et al. Language usage on Twitter predicts crime rates
CN109597926A (en) A kind of information acquisition method and system based on social media emergency event
CN108984514A (en) Acquisition methods and device, storage medium, the processor of word
CN111858924A (en) System with network public opinion monitoring and analyzing functions
Wu et al. Mining typhoon victim information based on multi-source data fusion using social media data in China: a case study of the 2019 Super Typhoon Lekima

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20161012

WD01 Invention patent application deemed withdrawn after publication