CN104536956A - A Microblog platform based event visualization method and system - Google Patents


Publication number
CN104536956A
Authority
CN
China
Prior art keywords
microblogging
microblog
event
word
time range
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410354273.6A
Other languages
Chinese (zh)
Inventor
曹娟
储达峰
周兴
张勇东
谢菲
苏宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
XINHUA NEWS AGENCY
Institute of Computing Technology of CAS
Original Assignee
XINHUA NEWS AGENCY
Institute of Computing Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by XINHUA NEWS AGENCY, Institute of Computing Technology of CAS
Priority to CN201410354273.6A
Publication of CN104536956A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/957 Browsing optimisation, e.g. caching or content distillation
    • G06F16/9577 Optimising the visualization of content, e.g. distillation of HTML documents
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention discloses an event visualization method and system based on a microblog platform, and relates to information extraction and visualization technology. The method includes: retrieving, through an event search interface of the microblog platform and according to a keyword and a time range of an event, the microblog posts related to the event within the time range; sorting the posts by time to generate a post set; generating a plurality of cluster subsets from the post set by a clustering algorithm; extracting keywords from the cluster subsets to generate a plurality of word clouds, and assigning the same color, position, and rotation to any keyword that appears in more than one of the word clouds; and visually presenting the event by displaying each cluster subset together with its corresponding word cloud. Because related posts are collected from the microblog platform by event keyword, the microblog information about a given event can be obtained comprehensively.

Description

An event visualization method and system based on a microblog platform
Technical field
The present invention relates to information extraction and visualization technology, and in particular to an event visualization method and system based on a microblog platform.
Background technology
With the rapid development of the Internet, a variety of social media have emerged in recent years, such as Facebook, Twitter, Sina Weibo, and Renren. Among them, microblog platforms represented by Twitter and Sina Weibo have become popular Internet applications thanks to their open information sharing and propagation.
A microblog (short for micro-blog) lets users publish text, pictures, video, and other information of up to 140 characters on the platform anytime and anywhere. Microblog posts are original, timely, fragmented, and repetitive. On a microblog platform, users can search for and view topics of interest, browse the related content, and join the discussion. However, the platform is flooded with posts about any given event, and posts are short texts, so the published information is fragmented and hard to digest. Uneven information quality is a pronounced phenomenon on microblog platforms. For these reasons, users find it difficult to grasp the development of an event in a short time, and the user experience suffers.
Existing microblog event visualization techniques generally sort the posts related to an event by time and show users the most recent ones, or sort by popularity and show the hottest ones; other methods select the posts within a given time range and then sort them by time or popularity. All of these display the original post content directly and fall short in several respects. First, because online information grows explosively, visualizing raw posts makes it hard for users to obtain event-related information quickly. Second, because posts are short texts of uneven quality and colloquial style, users struggle to understand them quickly, and mining the important information about an event from post text is like looking for a needle in a haystack.
One class of event visualization methods extracts keywords from all of the text about an event and displays them in a single word cloud. This lets users grasp the main topic of the event from the main keywords, but it gives them no intuitive view of the individual sub-events or of how the event evolves.
Another class of visualization methods extracts the people, places, and event summary sentences from an event, uses them as the nodes of the event's evolution, uses the associations between them as edges, and visualizes the event as a graph. This approach is severely limited for microblog events: unlike formal news reports, posts do not contain well-formed information about people, places, and organizations, so this information is hard to extract from them.
The invention patent "Microblog word cloud generation method and access support system based on user interest mining" discloses a microblog word cloud generation method and a microblog message access support system based on user interest mining. The method comprises: given the set of messages newly published by the users that the current logged-in user follows, extracting a keyword set from it; computing the current user's interest in each keyword of the set, once based on user relationships and once based on keyword similarity, and fusing the two values into a final interest score; selecting the K keywords with the highest interest scores from the keyword set; and displaying the selected K keywords in a region. The system comprises key modules such as a user information acquisition module and a word cloud generator. That invention lets users obtain the information they are interested in from microblog messages more efficiently. The present invention differs from it in two respects. Research object: that invention takes the microblog user as its research object, analyzes the user's post content, and displays a word cloud of extracted keywords, whereas the present invention takes news events as its research object. Visualization: that invention only extracts keywords from posts and shows them as a word cloud, whereas the present invention extracts keywords per sub-event and presents a multi-dimensional combined word cloud.
The invention patent "Event feature evolution mining method and system based on microblogs" discloses an event feature evolution mining method based on microblogs, comprising: selecting an initial document set for the evolution from the microblog time series, and constructing a graph model of the documents on the microblog document set based on word co-occurrence features, to obtain the knowledge network structure of the event; merging the microblog graph models according to the literal features and the tendency-compatibility features of the words, to construct a microscopic evolution graph of the event features; and pruning, cutting, and transforming the microscopic evolution graph to form a macroscopic evolution graph of the event features. In mining the evolution of event features, the method uses graph mining over the event's knowledge network, which improves knowledge inheritance across the whole mining process and makes the results more interpretable. The present invention differs from it in feature extraction: that invention extracts features mainly from word structure and shows the evolution of an event by building a knowledge network structure, whereas the present invention mainly clusters the event and mines the feature information of its sub-topics to show its evolution.
Summary of the invention
To address the deficiencies of the prior art, the present invention proposes an event visualization method and system based on a microblog platform that solve the above technical problems.
The present invention proposes an event visualization method based on a microblog platform, comprising:
Step 1: according to a keyword and a time range of the event, retrieving, through the event search interface of the microblog platform, the microblog posts related to the event within the time range;
Step 2: sorting the posts by time to generate a post set;
Step 3: generating a plurality of cluster subsets from the post set by a clustering algorithm;
Step 4: extracting keywords from the plurality of cluster subsets to generate a plurality of word clouds, and assigning the same color, position, and rotation to each keyword that recurs across the word clouds;
Step 5: visually presenting the event by displaying each cluster subset together with its corresponding word cloud.
In the event visualization method based on a microblog platform, the following steps precede Step 2:
Step 21: filtering out, from the posts within the time range, the posts whose character count is below a threshold;
Step 22: filtering out, from the posts within the time range, the posts whose heat is below a threshold;
Step 23: filtering out the non-text content of the posts within the time range;
Step 24: filtering out the "@username" mentions in the posts within the time range.
In the event visualization method based on a microblog platform, the heat in Step 22 is computed as:
$$\mathrm{Heat} = \frac{\mathrm{retweets} + \mathrm{comments}}{3}$$
where retweets is the number of times the post was forwarded, comments is the number of comments on the post, and Heat is the heat (popularity) of the post.
In the event visualization method based on a microblog platform, extracting keywords from each cluster subset in Step 4 and generating the combined word cloud specifically comprises:
Step 41: performing word segmentation on each cluster subset to generate a word set;
Step 42: merging the words of the word set using Wikipedia entries and Internet buzzwords, and generating the combined word cloud.
In the event visualization method based on a microblog platform, Step 4 further comprises: assigning a high transparency to the word according to its inverse document frequency.
The present invention also proposes an event visualization system based on a microblog platform, comprising:
a retrieval module, configured to retrieve, through the event search interface of the microblog platform and according to a keyword and a time range of the event, the microblog posts related to the event within the time range;
a sorting module, configured to sort the posts by time to generate a post set;
a clustering module, configured to generate a plurality of cluster subsets from the post set by a clustering algorithm;
a combined word cloud generation module, configured to extract keywords from the plurality of cluster subsets, generate a plurality of word clouds, and assign the same color, position, and rotation to each keyword that recurs across the word clouds;
a display module, configured to visually present the event by displaying each cluster subset together with its corresponding word cloud.
The event visualization system based on a microblog platform further comprises a filtering module, configured to: filter out, from the posts within the time range, the posts whose character count is below a threshold; filter out the posts whose heat is below a threshold; filter out the non-text content of the posts; and filter out the "@username" mentions in the posts.
In the event visualization system based on a microblog platform, the heat in the filtering module is computed as:
$$\mathrm{Heat} = \frac{\mathrm{retweets} + \mathrm{comments}}{3}$$
where retweets is the number of times the post was forwarded, comments is the number of comments on the post, and Heat is the heat (popularity) of the post.
In the event visualization system based on a microblog platform, extracting keywords from each cluster subset in the combined word cloud generation module and generating the combined word cloud specifically comprises: performing word segmentation on each cluster subset to generate a word set; and merging the words of the word set using Wikipedia entries and Internet buzzwords to generate the combined word cloud.
In the event visualization system based on a microblog platform, the display module is further configured to: assign a high transparency to the word according to its inverse document frequency.
From the above scheme, the advantages of the present invention are:
Relying on the microblog platform and collecting related posts by event keyword makes it possible to obtain the microblog information about an event comprehensively. Microblog information filtering yields high-quality, meaningful posts. Clustering the event's post data set along the time dimension produces cluster subsets that carry time information; these subsets both represent individual topics of the event and, taken together, outline its evolution. Keyword extraction picks representative keywords out of a group of posts, and the keywords of an event give users an intuitive understanding of the post content. Controlling the color and position of identical words across the word clouds keeps them highly consistent in the combined word cloud display, so that users can easily identify the main topic of the whole event and the topic of each sub-event, and can easily analyze each sub-event.
Brief description of the drawings
Fig. 1 is a flowchart of the event visualization method based on a microblog platform;
Fig. 2 is a flowchart of the combined word cloud visual presentation;
Fig. 3 is an example of the visual presentation of an event.
The reference numerals are:
Step 100: the specific steps of the event visualization method based on a microblog platform, comprising:
Steps 101, 102, 103, 104, 105, 106, and 107.
Embodiment
Specific embodiments of the present invention are described in detail below with reference to the drawings and examples.
The specific flow of the present invention comprises the following steps, as shown in Fig. 1:
Step 101: simulating login to the microblog platform;
Because the present invention visualizes news events on a microblog platform, a user login to the microblog website must be simulated before event information can be acquired.
To simulate the login, a batch of microblog accounts is first registered, and their account information forms a local user information table for simulated login. To log in, a request for the login page is then sent to the microblog website, and the parameters required for login, such as user name, password, and encryption method, are supplied from the local user information table, completing the simulated login.
Because the microblog platform limits the number of operations a user may perform within a given period, and overly frequent access can get an account blocked, once a logged-in user has accessed more than a set number of pages, another user is selected from the local user information table and logged in. In this way, the required service requests can be sent to the microblog platform and the needed news event data obtained.
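The account rotation described in Step 101 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Account` and `AccountPool` names, the `max_pages` budget, and the sample credentials are all hypothetical, and no real login request is made.

```python
from dataclasses import dataclass
from itertools import cycle
from typing import Iterator, List


@dataclass
class Account:
    username: str
    password: str


class AccountPool:
    """Rotate through registered accounts once a page budget is used up."""

    def __init__(self, accounts: List[Account], max_pages: int) -> None:
        if not accounts:
            raise ValueError("at least one account is required")
        self._accounts: Iterator[Account] = cycle(accounts)
        self._max_pages = max_pages
        self._pages_used = 0
        self.current = next(self._accounts)
        self.logins = [self.current.username]  # record of simulated logins

    def before_request(self) -> Account:
        """Return the account to use for the next page request."""
        if self._pages_used >= self._max_pages:
            # Budget exhausted: switch to the next account (a real crawler
            # would perform the simulated login here).
            self.current = next(self._accounts)
            self.logins.append(self.current.username)
            self._pages_used = 0
        self._pages_used += 1
        return self.current


# Hypothetical accounts; each may fetch two pages before the pool rotates.
pool = AccountPool([Account("user_a", "pw_a"), Account("user_b", "pw_b")], max_pages=2)
used = [pool.before_request().username for _ in range(5)]
print(used)
```

Keeping the page counter inside the pool means the crawling code only has to call `before_request()` before each page fetch and never tracks limits itself.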
Step 102: retrieving the related posts by event keyword;
An event is usually specified by two parts, a keyword and a time. By restricting the search to a time range, the posts within the specified time range can be obtained from the microblog platform.
In this step, the event search interface provided by the microblog platform is used with the event keyword and time range entered by the user to obtain the related microblog pages.
Step 103: preprocessing the microblog information;
In this step, the microblog information is preprocessed to obtain the data set to be analyzed. The processing comprises the following parts:
Short texts in the data set are filtered out, i.e., posts whose character count is below a threshold are removed;
Low-impact, unpopular posts in the data set (posts whose heat is below a threshold) are filtered out, where the heat of a post is computed as:
$$\mathrm{Heat} = \frac{\mathrm{retweets} + \mathrm{comments}}{3}$$
where retweets is the number of times the post was forwarded and comments is the number of comments on the post;
Non-text content such as emoticons and web link addresses is filtered out of the posts;
The "@username" mentions peculiar to microblogs are filtered out;
The posts are sorted by timestamp to obtain a post set that is continuous in time.
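The preprocessing of Step 103 can be sketched as below, under several assumptions that are not in the source: the `Post` fields, the regular expressions for mentions, links, and bracketed emoticon codes, the thresholds, and the choice to apply the length check after cleaning. The heat helper reads the flattened formula as (retweets + comments) / 3, which is itself an assumption about the original notation. The sample posts are invented.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import List

MENTION_RE = re.compile(r"@[\w\-]+")            # "@username" mentions
URL_RE = re.compile(r"https?://\S+")            # web link addresses
EMOTICON_RE = re.compile(r"\[[^\[\]]{1,8}\]")   # bracketed emoticon codes, e.g. [rain]


@dataclass
class Post:
    text: str
    created_at: datetime
    retweets: int
    comments: int


def heat(post: Post) -> float:
    # Assumed reading of the heat formula: (retweets + comments) / 3.
    return (post.retweets + post.comments) / 3


def clean_text(text: str) -> str:
    """Strip links, mentions and emoticon codes, then collapse whitespace."""
    for pattern in (URL_RE, MENTION_RE, EMOTICON_RE):
        text = pattern.sub("", text)
    return " ".join(text.split())


def preprocess(posts: List[Post], min_chars: int, min_heat: float) -> List[Post]:
    """Drop short or unpopular posts, clean the rest, and sort them by time."""
    kept = []
    for p in posts:
        cleaned = clean_text(p.text)
        if len(cleaned) < min_chars or heat(p) < min_heat:
            continue
        kept.append(Post(cleaned, p.created_at, p.retweets, p.comments))
    return sorted(kept, key=lambda p: p.created_at)


sample = [
    Post("@alice heavy rain floods the roads http://example.com/a [rain]",
         datetime(2014, 5, 11, 9, 0), 30, 12),
    Post("ok", datetime(2014, 5, 11, 8, 0), 100, 50),
    Post("rainstorm warning lowered to yellow", datetime(2014, 5, 11, 7, 0), 9, 3),
    Post("nobody saw this long post about rain", datetime(2014, 5, 11, 6, 0), 0, 1),
]
result = preprocess(sample, min_chars=5, min_heat=2.0)
print([p.text for p in result])
```

In this sample the second post is dropped for length and the fourth for low heat; the two survivors come back in chronological order with the mention, link, and emoticon code removed.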
Step 104: clustering the microblog event;
In this step, the sorted microblog data set is clustered to obtain cluster subsets that are continuous in time. So that each cluster subset represents one class of topic, a hierarchical clustering algorithm or a single-pass clustering algorithm is used. To keep the clustered events continuous in time, the first post in the data set is taken as the initial cluster subset; at each subsequent step, the document is assigned to the cluster subset most similar to it, and if its similarity to all current cluster subsets is below the set threshold, the document becomes a new cluster subset. Document similarity is measured by the following formulas:
$$\mathrm{sim}'(d, c) = \left(1 - \frac{i}{m}\right) \times \mathrm{sim}(d, c)$$
$$\mathrm{sim}(d, c) = \frac{\vec{d} \cdot \vec{c}}{\|\vec{d}\| \times \|\vec{c}\|}$$
where m is the number of documents that precede document d within the time window, and i is the positional distance, within the time window, between document d and the document of cluster c that is closest to d in time. Under this calculation, the closer a document is in time to a cluster, the higher its similarity. To compute document similarity, a vector space model is built: each document is represented as a vector whose components correspond to the words of the document, and the weight of each component is computed with normalized TF-IDF (term frequency-inverse document frequency), as follows:
$$w_i = \frac{TF_i \times IDF_i}{\sqrt{\sum_{j=1}^{n} \left[ TF_j \times IDF_j \right]^2}}$$
$$TF_{i,j} = \frac{n_{i,j}}{\sum_k n_{k,j}}, \qquad IDF_i = \log\left(\frac{D}{\left|\{ j : t_i \in d_j \}\right|}\right)$$
where $n_{i,j}$ is the number of occurrences of word $t_i$ in document $d_j$, $\sum_k n_{k,j}$ is the total number of occurrences of all words in document $d_j$, $D$ is the total number of documents, and $|\{ j : t_i \in d_j \}|$ is the number of documents that contain word $t_i$.
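The clustering of Step 104 can be sketched as below: normalized TF-IDF vectors, cosine similarity, the time-decay factor (1 - i/m), and a single-pass assignment loop. Several details are assumptions rather than statements of the source: i is taken as the number of documents lying between d and the cluster's most recent member, m is capped by a hypothetical `window` parameter, each cluster is represented by the sum of its member vectors, and the logarithm is natural.

```python
import math
from collections import Counter
from typing import Dict, List

Vector = Dict[str, float]


def tfidf_vectors(docs: List[List[str]]) -> List[Vector]:
    """Normalized weights: w_i = TF_i*IDF_i / sqrt(sum_j (TF_j*IDF_j)^2)."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = sum(counts.values())
        raw = {t: (c / total) * math.log(n_docs / doc_freq[t]) for t, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in raw.values()))
        vectors.append({t: v / norm for t, v in raw.items()} if norm else raw)
    return vectors


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def single_pass(vectors: List[Vector], threshold: float, window: int) -> List[List[int]]:
    """Cluster time-ordered documents; returns lists of document indices."""
    clusters: List[List[int]] = []
    centroids: List[Vector] = []
    for idx, vec in enumerate(vectors):
        best, best_sim = -1, 0.0
        m = min(idx, window)  # documents before this one inside the window
        for c, members in enumerate(clusters):
            i = idx - members[-1] - 1  # documents between d and the cluster (assumed meaning of i)
            if m == 0 or i >= m:
                continue  # the cluster lies outside the time window
            sim = (1 - i / m) * cosine(vec, centroids[c])
            if sim > best_sim:
                best, best_sim = c, sim
        if best >= 0 and best_sim >= threshold:
            clusters[best].append(idx)
            for t, w in vec.items():  # centroid is the running sum of member vectors
                centroids[best][t] = centroids[best].get(t, 0.0) + w
        else:
            clusters.append([idx])
            centroids.append(dict(vec))
    return clusters


# Hand-made unit vectors keep the behaviour easy to follow.
adjacent = single_pass([{"a": 1.0}, {"a": 1.0}, {"b": 1.0}, {"b": 1.0}], 0.5, 3)
decayed = single_pass([{"a": 1.0}, {"b": 1.0}, {"b": 1.0}, {"a": 1.0}], 0.5, 3)
print(adjacent, decayed)
```

In the second call the last document matches the first one perfectly by content, but two documents lie between them, so the decayed similarity (1 - 2/3) × 1 falls below the threshold and a new cluster is opened. This is the mechanism that keeps clusters continuous in time.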
Step 105: extracting keywords from the sub-event data sets;
Clustering all the related posts yields sub-data sets that are continuous in time, each representing one topic of the event. Extracting keywords from the posts of each sub-event yields the candidate keyword set to be displayed in the combined word cloud. Keywords are extracted as follows:
First, each document in the document set is segmented into words. The present invention uses the ICTCLAS word segmentation tool (Institute of Computing Technology, Chinese Lexical Analysis System; its main functions include Chinese word segmentation, part-of-speech tagging, named entity recognition, and new word recognition, and it supports user dictionaries) to segment the documents and obtain the processed word set. To enrich the semantic information of the words, two dictionaries, Wikipedia entries and Internet buzzwords, are used to merge words of the original word set into phrases, giving a word set with richer meaning. The phrase merging processes the original word set with a maximum-matching algorithm. The weight of each word is measured as follows:
$$w_i = tf_i \times df_i \times \mathrm{Heat} \times |w_i|$$
where $tf_i$ is the frequency with which word i occurs in the document, $df_i$ is the number of documents in the document set that contain word i, Heat is the heat of the post, and $|w_i|$ is the length of word i, i.e., its number of characters.
To highlight the words of popular posts, the present invention combines each word's occurrence weight with post heat: the heat of the post multiplied by the word's frequency serves as the word's weight, so the selected words are more meaningful. Because long words carry richer semantic information than short ones, the present invention also introduces a word length term that raises the weight of long words.
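The phrase merging and weighting of Step 105 can be sketched as follows. The sketch assumes the documents are already segmented (the ICTCLAS call is not reproduced), uses forward maximum matching against a small made-up dictionary standing in for the Wikipedia and buzzword dictionaries, and sums the per-post weight tf × df × Heat × length over the posts of a cluster. That summation, the `max_len` limit, and the sample heat values are assumptions.

```python
from collections import Counter
from typing import Dict, List, Set


def merge_phrases(tokens: List[str], dictionary: Set[str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: join the longest run of tokens found in the dictionary."""
    merged: List[str] = []
    pos = 0
    while pos < len(tokens):
        for size in range(min(max_len, len(tokens) - pos), 1, -1):
            candidate = "".join(tokens[pos:pos + size])
            if candidate in dictionary:
                merged.append(candidate)
                pos += size
                break
        else:
            # No multi-token phrase starts here: keep the single token.
            merged.append(tokens[pos])
            pos += 1
    return merged


def keyword_weights(docs: List[List[str]], heats: List[float]) -> Dict[str, float]:
    """Score each word as tf * df * Heat * length, summed over the posts of one cluster."""
    doc_freq = Counter(term for doc in docs for term in set(doc))
    scores: Dict[str, float] = {}
    for doc, post_heat in zip(docs, heats):
        for term, count in Counter(doc).items():
            tf = count / len(doc)
            weight = tf * doc_freq[term] * post_heat * len(term)
            scores[term] = scores.get(term, 0.0) + weight
    return scores


# Hypothetical dictionary entry: "红色预警" (red warning) merges two segmented tokens.
phrases = {"红色预警"}
doc_a = merge_phrases(["深圳", "暴雨", "红色", "预警"], phrases)
doc_b = merge_phrases(["暴雨", "买房"], phrases)
weights = keyword_weights([doc_a, doc_b], heats=[3.0, 1.0])
print(doc_a, sorted(weights, key=weights.get, reverse=True))
```

The length term is what lifts the merged four-character phrase above the shorter word "深圳" even though both occur once in the same post.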
Step 106: visualizing the event with the combined word cloud;
A simple way to generate a combined word cloud is to use tag cloud techniques to generate one word cloud for each sub-event of the event. The word clouds produced this way do not visualize well, however: even when two topics discuss very similar content, their word clouds can look very different. Therefore, when presenting the combined word cloud, the present invention optimizes the generated word clouds as follows, as shown in Fig. 2:
Words that appear in more than one word cloud are given the same color, position, and rotation, so that their attributes stay visually consistent and readers can quickly spot what the topics have in common.
The transparency of each word is controlled by its idf (inverse document frequency): words common to several word clouds are given higher transparency, and words with low document frequency are given lower transparency. This highlights the words unique to each word cloud and fades the frequent words that occur in many documents, so that readers quickly grasp what each topic discusses.
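One way to obtain the consistency described in Step 106 is to derive a word's color, position, and rotation from the word itself, so that every word cloud computes the same attributes without coordination. The sketch below does this with a hash; the hashing scheme, the palette, the canvas size, and the linear idf-to-opacity mapping are all assumptions, and a real layout would still need collision handling. Here idf is computed over word clouds rather than individual posts, and `alpha` denotes opacity, so higher transparency means lower `alpha`.

```python
import hashlib
import math
from typing import Dict, List

PALETTE = ["#1f77b4", "#d62728", "#2ca02c", "#9467bd", "#ff7f0e"]


def shared_style(word: str, width: int = 400, height: int = 300) -> Dict[str, object]:
    """Derive color, position and rotation from the word alone, so every cloud agrees."""
    digest = hashlib.md5(word.encode("utf-8")).digest()
    return {
        "color": PALETTE[digest[0] % len(PALETTE)],
        "x": int.from_bytes(digest[1:3], "big") % width,
        "y": int.from_bytes(digest[3:5], "big") % height,
        "rotation": 90 if digest[5] % 2 else 0,
    }


def layout_clouds(clouds: List[List[str]], min_alpha: float = 0.3) -> List[Dict[str, dict]]:
    """Style every word of every cloud; common words get lower opacity."""
    n = len(clouds)
    cloud_freq: Dict[str, int] = {}
    for words in clouds:
        for w in set(words):
            cloud_freq[w] = cloud_freq.get(w, 0) + 1
    max_idf = math.log(n) if n > 1 else 1.0
    result = []
    for words in clouds:
        styled = {}
        for w in set(words):
            idf = math.log(n / cloud_freq[w])
            # Low idf (word shared by many clouds) -> low alpha -> more transparent.
            alpha = min_alpha + (1 - min_alpha) * (idf / max_idf)
            styled[w] = {**shared_style(w), "alpha": round(alpha, 3)}
        result.append(styled)
    return result


clouds = [
    ["Shenzhen", "rainstorm", "flooding"],
    ["Shenzhen", "rainstorm", "housing"],
    ["Shenzhen", "rainstorm", "warning"],
]
styled = layout_clouds(clouds)
print(styled[0]["Shenzhen"]["alpha"], styled[0]["flooding"]["alpha"])
```

Because "Shenzhen" occurs in all three clouds, it receives the minimum opacity and identical attributes everywhere, while a word unique to one cloud is drawn fully opaque.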
Step 107: presenting the event visualization;
With the time nodes on the vertical axis (the time node of each cluster is the average time of all posts in the cluster), each cluster subset is displayed as text together with its word cloud. This represents the evolution of the particular event and lets readers quickly grasp the event's topics while also seeing the details of each sub-event.
Fig. 3 shows an embodiment of the event visualization. The whole figure is threaded by a time axis, and the dots on the left mark the time nodes. To the right of the time axis are two columns of display boxes: one shows the clustered posts of each sub-event, and the other shows the sub-event's word cloud. The embodiment takes the event "Shenzhen rainstorm" and presents three of its sub-events: first, the rainstorm floods the roads and disrupts travel; second, the rainstorm does not stop Shenzhen residents from buying homes; third, the rainstorm warning is lowered from red to yellow. The figure first shows the overall development of the event over a period of time. Looking at the three word clouds, words such as "Shenzhen" and "rainstorm" appear in all three, which shows that the sub-events share a common topic; because their document frequency is high, these words are given higher transparency.
Conversely, each word cloud has representative words, which typically occur frequently in their own data subset and rarely or never in the others: for example, "standing water" and "travel" in word cloud one, "home buying" and "sales launch" in word cloud two, and "citywide" and "warning" in word cloud three. Because their document frequency is low, these words have lower transparency and stand out more in the word cloud. From these words, readers can quickly understand the main content of each topic, and by checking whether a word appears at the same position in the other word clouds they can see how two topics differ.
The event visualization method of this embodiment helps readers quickly and comprehensively understand the main content of an event and how it develops, and lets them quickly see the differences between sub-events by comparing their word clouds.
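The time node of a cluster, as described above, is the average of its post times. A minimal sketch follows; the function names and the sample timestamps are illustrative only.

```python
from datetime import datetime, timedelta
from typing import List


def time_node(timestamps: List[datetime]) -> datetime:
    """Average of the post times in one cluster."""
    first = min(timestamps)
    offsets = [(t - first).total_seconds() for t in timestamps]
    return first + timedelta(seconds=sum(offsets) / len(offsets))


def timeline(clusters: List[List[datetime]]) -> List[datetime]:
    """Time nodes of all clusters, ordered along the time axis."""
    return sorted(time_node(c) for c in clusters)


morning = [datetime(2014, 5, 11, 8, 0), datetime(2014, 5, 11, 9, 0), datetime(2014, 5, 11, 13, 0)]
evening = [datetime(2014, 5, 11, 18, 0), datetime(2014, 5, 11, 20, 0)]
nodes = timeline([evening, morning])
print(nodes)
```

Averaging offsets from the earliest post avoids adding `datetime` objects directly, which Python does not support.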

Claims (10)

1. An event visualization method based on a microblog platform, characterized by comprising:
Step 1: according to a keyword and a time range of the event, retrieving, through the event search interface of the microblog platform, the microblog posts related to the event within the time range;
Step 2: sorting the posts by time to generate a post set;
Step 3: generating a plurality of cluster subsets from the post set by a clustering algorithm;
Step 4: extracting keywords from the plurality of cluster subsets to generate a plurality of word clouds, and assigning the same color, position, and rotation to each keyword that recurs across the word clouds;
Step 5: visually presenting the event by displaying each cluster subset together with its corresponding word cloud.
2. The event visualization method based on a microblog platform as claimed in claim 1, characterized in that the following steps precede Step 2:
Step 21: filtering out, from the posts within the time range, the posts whose character count is below a threshold;
Step 22: filtering out, from the posts within the time range, the posts whose heat is below a threshold;
Step 23: filtering out the non-text content of the posts within the time range;
Step 24: filtering out the "@username" mentions in the posts within the time range.
3. The event visualization method based on a microblog platform as claimed in claim 2, characterized in that the heat in Step 22 is computed as:
$$\mathrm{Heat} = \frac{\mathrm{retweets} + \mathrm{comments}}{3}$$
Wherein retweets represents microblogging and forwards quantity, and comments represents the comment number of microblogging, and Heat represents microblogging temperature.
4. as claimed in claim 1 based on the incident visualization method of microblog, it is characterized in that, carry out keyword abstraction in this step 4 to this cluster subset each, the concrete steps generating portmanteau word cloud comprise:
Step 41, carries out word segmentation processing to this cluster subset each, generates set of words;
Step 42, is merged this set of words by wikipedia entry and network boom word, generates this portmanteau word cloud.
5. as claimed in claim 1 based on the incident visualization method of microblog, it is characterized in that, this step 4 also comprises: according to inverse document frequency, gives the high grade of transparency by this word.
6. A microblog platform based event visualization system, characterized by comprising:
A retrieval module, configured to retrieve, through the event search interface of the microblog platform and according to the keyword and time range of the event, the microblog posts within the time range that are related to the event;
A sorting module, configured to sort the microblog posts by time to generate a microblog set;
A clustering module, configured to apply a clustering algorithm to the microblog set to generate multiple cluster subsets;
A combined word cloud generation module, configured to perform keyword extraction on the multiple cluster subsets to generate multiple word clouds, and to give any keyword repeated across the multiple word clouds the same color, position and rotation;
A display module, configured to present the event visually by displaying each cluster subset together with its corresponding word cloud.
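The requirement that a repeated keyword keep the same color, position and rotation in every word cloud can be illustrated by deriving the style deterministically from the word itself. The sketch below is one possible way to do this and is not taken from the patent: the functions `keyword_style` and `style_clouds`, the canvas size, and the hash-based assignment are all assumptions, and overlap between words is not handled.

```python
import hashlib


def keyword_style(word, width=800, height=600):
    """Derive a stable color, position and rotation from the word's hash."""
    digest = hashlib.md5(word.encode("utf-8")).digest()
    color = "#{:02x}{:02x}{:02x}".format(digest[0], digest[1], digest[2])
    x = int.from_bytes(digest[3:5], "big") % width
    y = int.from_bytes(digest[5:7], "big") % height
    rotation = 90 if digest[7] % 2 else 0  # horizontal or vertical
    return {"color": color, "x": x, "y": y, "rotation": rotation}


def style_clouds(clouds):
    """Style each word cloud, given as a list of keywords per cluster subset."""
    return [{word: keyword_style(word) for word in cloud} for cloud in clouds]
```

Because the style depends only on the word, a keyword such as "flood" that appears in two cluster subsets is drawn identically in both clouds, which makes it easy to follow across the timeline.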
7. The microblog platform based event visualization system as claimed in claim 6, characterized by further comprising a filtering module, configured to: filter out the microblog posts in the time range whose word count is less than a certain threshold; filter out the microblog posts in the time range whose heat is less than a certain threshold; filter out information in non-text format from the microblog posts in the time range; and filter out "@username" mentions from the microblog posts in the time range.
8. The microblog platform based event visualization system as claimed in claim 7, characterized in that, in the filtering module, the heat is calculated by the formula:
Heat = retweets + comments 3
where retweets denotes the number of times the microblog post has been forwarded, comments denotes the number of comments on the microblog post, and Heat denotes the heat of the microblog post.
9. The microblog platform based event visualization system as claimed in claim 6, characterized in that, in the combined word cloud generation module, the specific steps of performing keyword extraction on each cluster subset and generating the combined word cloud comprise: performing word segmentation on each cluster subset to generate a word set; and merging the word set by means of Wikipedia entries and Internet buzzwords to generate the combined word cloud.
10. The microblog platform based event visualization system as claimed in claim 6, characterized in that the display module is further configured to: assign each word a degree of transparency according to its inverse document frequency.
CN201410354273.6A 2014-07-23 2014-07-23 A Microblog platform based event visualization method and system Pending CN104536956A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410354273.6A CN104536956A (en) 2014-07-23 2014-07-23 A Microblog platform based event visualization method and system

Publications (1)

Publication Number Publication Date
CN104536956A true CN104536956A (en) 2015-04-22

Family

ID=52852484

Country Status (1)

Country Link
CN (1) CN104536956A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103324665A (en) * 2013-05-14 2013-09-25 亿赞普(北京)科技有限公司 Hot spot information extraction method and device based on micro-blog
US20140019119A1 (en) * 2012-07-13 2014-01-16 International Business Machines Corporation Temporal topic segmentation and keyword selection for text visualization
CN103631862A (en) * 2012-11-02 2014-03-12 中国人民解放军国防科学技术大学 Event characteristic evolution excavation method and system based on microblogs
CN103793481A (en) * 2014-01-16 2014-05-14 中国科学院软件研究所 Microblog word cloud generating method based on user interest mining and accessing supporting system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
单月光: "基于微博的网络舆情关键技术的研究与实现", 《中国优秀硕士学位论文全文数据库信息科技辑》 *
邱云飞等: "微博突发话题检测方法研究", 《计算机工程》 *
黄珊珊: "基于用户行为的微博信息聚合可视化系统设计和实现", 《中国优秀硕士学位论文全文数据库信息科技辑》 *

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104933129B (en) * 2015-06-12 2019-04-30 百度在线网络技术(北京)有限公司 Event train of thought acquisition methods and system based on microblogging
CN104933129A (en) * 2015-06-12 2015-09-23 百度在线网络技术(北京)有限公司 Event context acquisition method and system based on micro-blogs
US10324989B2 (en) 2015-06-12 2019-06-18 Baidu Online Network Technology (Beijing) Co., Ltd Microblog-based event context acquiring method and system
CN106484724A (en) * 2015-08-31 2017-03-08 富士通株式会社 Information processor and information processing method
CN105159998A (en) * 2015-09-08 2015-12-16 海南大学 Keyword calculation method based on document clustering
CN106528624A (en) * 2016-09-30 2017-03-22 财付通支付科技有限公司 Information display method and device
CN106874419A (en) * 2017-01-22 2017-06-20 北京航空航天大学 A kind of real-time focus polymerization of many granularities
CN106886576A (en) * 2017-01-22 2017-06-23 广东广业开元科技有限公司 It is a kind of based on the short text keyword extracting method presorted and system
CN106886576B (en) * 2017-01-22 2018-04-03 广东广业开元科技有限公司 It is a kind of based on the short text keyword extracting method presorted and system
CN106874419B (en) * 2017-01-22 2019-09-10 北京航空航天大学 A kind of real-time hot spot polymerization of more granularities
CN107741929A (en) * 2017-10-18 2018-02-27 网智天元科技集团股份有限公司 The analysis of public opinion method and device
CN107918644A (en) * 2017-10-31 2018-04-17 北京锐思爱特咨询股份有限公司 News subject under discussion analysis method and implementation system in reputation Governance framework
CN108170830B (en) * 2018-01-10 2020-07-31 华控清交信息科技(北京)有限公司 Group event data visualization method and system
CN108170830A (en) * 2018-01-10 2018-06-15 清华大学 Group event data visualization method and system
CN108415900A (en) * 2018-02-05 2018-08-17 中国科学院信息工程研究所 A kind of visualText INFORMATION DISCOVERY method and system based on multistage cooccurrence relation word figure
CN108376175A (en) * 2018-03-02 2018-08-07 成都睿码科技有限责任公司 Visualization method for displaying news events
CN108595388A (en) * 2018-04-23 2018-09-28 乐山师范学院 A kind of chronicle of events automatic generation method of network-oriented news report
CN108733791A (en) * 2018-05-11 2018-11-02 北京科技大学 network event detection method
CN109063198A (en) * 2018-09-10 2018-12-21 浙江广播电视集团 Melt the multidimensional visual search recommender system of media resource
CN109063198B (en) * 2018-09-10 2022-02-11 浙江广播电视集团 Multi-dimensional visual search recommendation system for fusing media resources
CN111858908A (en) * 2020-03-03 2020-10-30 北京市计算中心 Method and device for generating newspaper picking text, server and readable storage medium
CN112417026A (en) * 2020-09-23 2021-02-26 郑州大学 Urban waterlogging early warning rainstorm threshold dividing method based on crowd-sourcing waterlogging feedback
CN112417026B (en) * 2020-09-23 2022-10-25 郑州大学 Urban waterlogging early warning rainstorm threshold dividing method based on crowd-sourcing waterlogging feedback
CN113157908A (en) * 2021-03-22 2021-07-23 北京邮电大学 Text visualization method for displaying hot sub-topics of social media

Similar Documents

Publication Publication Date Title
CN104536956A (en) A Microblog platform based event visualization method and system
Agarwal et al. Applying social media intelligence for predicting and identifying on-line radicalization and civil unrest oriented threats
CN103390051B (en) A kind of topic detection and tracking method based on microblog data
CN109829089B (en) Social network user anomaly detection method and system based on associated graph
CN103324665B (en) Hot spot information extraction method and device based on micro-blog
CA3138730C (en) Public-opinion analysis method and system for providing early warning of enterprise risks
Das et al. Sense GST: Text mining & sentiment analysis of GST tweets by Naive Bayes algorithm
CN103577549A (en) Crowd portrayal system and method based on microblog label
CN103745000A (en) Hot topic detection method of Chinese micro-blogs
Chawla et al. Product opinion mining using sentiment analysis on smartphone reviews
Middleton et al. Geoparsing and geosemantics for social media: Spatiotemporal grounding of content propagating rumors to support trust and veracity analysis during breaking news
Lloret et al. A novel concept-level approach for ultra-concise opinion summarization
CN106484764A (en) User's similarity calculating method based on crowd portrayal technology
CN103020159A (en) Method and device for news presentation facing events
CN104504024B (en) Keyword method for digging based on content of microblog and system
CN103577404A (en) Microblog-oriented discovery method for new emergencies
CN104317784A (en) Cross-platform user identification method and cross-platform user identification system
CN105378730A (en) Social media content analysis and output
CN104077417A (en) Figure tag recommendation method and system in social network
TW201426360A (en) System and method of analysing text stream message
CN103150335A (en) Co-clustering-based coal mine public sentiment monitoring system
CN110287329A (en) A kind of electric business classification attribute excavation method based on commodity text classification
CN104978332A (en) UGC label data generating method, UGC label data generating device, relevant method and relevant device
Chowdury et al. A data mining based spam detection system for youtube
CN104408083A (en) Socialized media analyzing system

Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20150422
