CN103390051B - A topic detection and tracking method based on microblog data - Google Patents


Info

Publication number
CN103390051B
Authority
CN
China
Prior art keywords
topic
time window
window
microblogging
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201310316316.7A
Other languages
Chinese (zh)
Other versions
CN103390051A (en)
Inventor
孙国梓
黄斯琪
杨涛
杨一涛
陈国兰
仇呈燕
郑冬亚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Post and Telecommunication University
Original Assignee
Nanjing Post and Telecommunication University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a topic detection and tracking method based on microblog data. The method mines latent hidden topics in large-scale social network information. First, the massive, continuously growing microblog data is partitioned into blocks in chronological order and redundant information is filtered out. Next, the text content within each time window is analyzed, and semantically independent key topic descriptions are extracted and categorized, yielding the topics in the different time windows. Finally, the inheritance and homogeneity of topics between time windows are analyzed to summarize how microblog topics change over time. The method can represent the dynamic development of topic content, that is, the whole process of a topic's emergence, development, climax, and extinction, and so describes topics more accurately and comprehensively.

Description

A topic detection and tracking method based on microblog data
Technical field
The present invention relates to the field of data mining technology, and in particular to a topic detection and tracking method based on microblog data.
Background
With the rapid development of Web 2.0 and advances in the means of information dissemination, microblogs have in recent years grown into a fast-developing and highly influential form of online social media. As a new information carrier and channel of transmission, microblogs make it easier for internet users to comment on products and services and to take part in discussions of trending topics, and they play an increasingly important role in originating and spreading online public opinion. Not all of the microblog information that accumulates in real time and at scale is valuable to users, so topics likely to interest users must be extracted automatically from the mass of microblog information, and redundant data of no real value must be filtered out.
A topic is a set of reports related to an event. Information sources on the network are varied: they include trending topics of public interest and may also include sensitive topics related to public safety and social stability. The state of an event changes with time and under the influence of factors such as culture. Topic evolution reflects the process by which a topic emerges, rises, declines, and ends. As time passes, both the intensity and the content of a topic change; that is, the topic migrates. Public opinion analysis examines the mass of text data on the Internet to grasp how themes evolve and to make timely and accurate predictions for decision makers' reference.
At present, traditional topic evolution research mainly uses media such as newswire, broadcast, television, blogs, and community forums as data sources, and achieves topic detection through a series of data mining methods and similarity comparisons. In this line of research, the text of the source information is especially important. Microblog text is short text limited to 140 characters; it is produced continuously and in enormous volume. Because of the length limit, users usually write in a condensed style. The text is free in form and colloquial; abbreviations, internet slang, and misspellings are very common; and it is often accompanied by embedded hypertext elements such as emoticons, pictures, videos, and web links. If topics are analyzed in the traditional way by building a term-document feature matrix, these properties of microblog text make the feature matrix highly sparse, and the detection results suffer considerably. The present invention addresses these problems.
Summary of the invention
The object of the present invention is to provide a topic detection and tracking method based on microblog data. The method performs real-time data analysis on large-scale, incrementally growing microblog information, generates topics automatically by clustering through topic modeling, establishes how topics are associated and how they change along the time axis according to topic content and topic strength, and summarizes the dynamic trend of topic evolution.
The technical solution adopted by the present invention is as follows. The method partitions the massive, growing microblog data into blocks in chronological order, mines and analyzes the text content within each time window, extracts the topics in the different time windows, and finally summarizes the trend of microblog topic change by analyzing the inheritance and homogeneity of topics between time windows. The method consists mainly of three steps: data preprocessing, topic generation within time windows, and topic association analysis between time windows.
Method flow:
Step 1: data preprocessing
1. Ignore directed conversational messages. Microblog posts in the "@username" form are ignored. Such posts rarely reflect a general topic, so ignoring them removes as much as possible of the noise data that concerns only interaction between individuals.
2. Expand the original microblog data. Information is extracted from the URLs contained in the microblog text and added to the microblog information to support the description of the user's viewpoint.
3. Formalize the microblog text. The microblog text is segmented into words, and stop words, low-frequency words, and high-frequency words are removed. Comments, reposts, user-defined tags (hashtags of the form "#subject#"), and embedded external links (URLs) in the microblog text are taken into account by using a modified TF-IDF weighting algorithm. Each microblog post is formalized as a multidimensional term vector Wi.
4. Remove sparsity. Because microblog texts are short, they are clustered on the basis of their term vectors. That is, each microblog is first segmented and represented as a term vector, and the microblogs are then clustered with the K-means algorithm on those vectors. Assuming the clustering yields K classes, the microblog messages in each class are merged into a single document, giving K synthesized microblog documents D.
Step 2: topic generation within time windows
1. All preprocessed data is discretized, according to its time information, into the corresponding time windows t of the time series. The set in each time window is St={W1,W2,……WMt}, so the originally continuous text stream is divided into several time windows. The number of documents Mt in each time window may be the same or different.
2. Remove sparsity. Microblog data consists mostly of short sentences or even phrases. Because this content is sparse, it is clustered on the basis of term vectors.
3. For the microblog text divided into time slices, the text collection in each time window is processed in turn. An LDA model is used for topic modeling to extract several topics T and to obtain the topic content and topic strength of each. The number of topics generated in each window may be the same or different; the topic number N is determined dynamically by a model selection method from the microblog text content in each time window.
4. A topic that has already appeared may still appear with some probability in subsequent time windows. The posterior probability of the word distribution in the historical time window is therefore used as prior knowledge for topic mining in the current time window. A first discrete method based on unconditional dependence is adopted: for the current time window t, the word distribution in time window t-1, together with a weight w, serves as the prior of the word distribution in time window t.
Step 3: topic association analysis between time windows
Topic evolution generally refers to how topics with the same semantics change over time across different time periods, including the extinction of old topics and the emergence of new ones. The associations of topics between time windows are analyzed, including the inheritance and homogeneity between topics, to obtain the evolution path of each topic. Inheritance between topics is measured by semantic similarity, and homogeneity is measured by similarity of the microblog vector information. Window topics that are related in time and associated with one another are combined into a topic. Through the changes in the content and intensity of its window topics, a topic is divided into stages from emergence to extinction, which depicts the topic's evolutionary process and forms its trend of change.
Beneficial effects:
1. In the data preprocessing stage, the characteristics of microblog messages themselves are fully taken into account, including reposts, comments, and tags. Useless noise data is filtered out, data that helps describe the topic is weighted, and a vector that better reflects microblog features is constructed.
2. For embedded URLs contained in a microblog, the data the URL points to is filled into the original microblog content, enriching the information in the original text.
3. Microblog data differs from general text data: it is limited to 140 characters and is comparatively short. A clustering method is used to solve the problem of text sparsity.
4. Topics are extracted within local time windows, the number of topics is determined dynamically by a model selection method, and topics are described by window topics that are related in time and associated with one another, so the semantics of a topic can be described fairly accurately.
5. A weighted combination of similarities is used to measure the association between topics. It combines the ideas and perspectives of three different similarity measures and avoids the shortcomings of using any single method.
Brief description of the drawings
Fig. 1 is a flow chart of the microblog topic detection and tracking method of the present invention.
Fig. 2 is a schematic diagram of topic generation with the LDA model in the present invention.
Detailed description of the invention
The invention is described in further detail below with reference to the accompanying drawings.
Step 1: data preprocessing
1. Ignore directed conversational messages. Microblog posts in the "@username" form are ignored. Most such posts are directed dialogue between users and are unlikely to describe a general topic. Removing them eliminates as much noise data as possible.
2. Expand the original microblog data. Information is extracted from the embedded external links (URLs) contained in the microblog text and added to the microblog information to support the description of the user's viewpoint. The extracted data is used in the TF-IDF calculation of the next step.
3. Formalize the microblog text. To standardize the microblog data, it is first preprocessed: word segmentation, stop-word removal, removal of high- and low-frequency words, and calculation of the modified TF-IDF weights.
Microblogs differ from other traditional text data in that a post can be clearly divided into three parts: the source text being reposted, the text of the current microblog, and the comments. Although the theme of a post is what its own text expresses, analyzing the words that appear in the reposted source text and in the comments makes it possible to extract the vocabulary that characterizes the post more effectively and more accurately. For example, if a word appears in the reposted source text, in the microblog text, and in the comments, it is very likely a keyword that represents the post, whatever its TF-IDF value. In the body, a tag field of the form "#subject#" is also a summary of the theme and can often summarize what the current post intends to express.
To account for this, the traditional TF-IDF weighting method is modified to make it better suited to building the microblog text vector space. It is calculated as follows:
$tf_{ij} = \frac{n_{i,j}}{\sum_k n_{i,k}}$    Formula (1)
$n_{i,j} = n\_post_{i,j} + n\_hash_{i,j} \times w_{hashtag} + n\_url_{i,j} \times w_{url}$
In Formula (1), $tf_{ij}$ is the term frequency of feature word j in microblog i, and $n_{i,j}$ is the weighted number of occurrences of feature word j in microblog i. $n\_post_{i,j}$ is the number of times feature word j occurs in the text data of microblog i (including reposts and comments, excluding hashtags and URLs). $n\_hash_{i,j}$ and $n\_url_{i,j}$ are the numbers of times feature word j occurs in the hashtags and in the URL content of microblog i, and $w_{hashtag}$ and $w_{url}$ are their respective weights. $\sum_k n_{i,k}$ is the total number of words in microblog i.
$idf_j = \left(\frac{N}{n} + 0.01\right)$    Formula (2)
In Formula (2), N is the total number of microblogs, n is the number of microblogs in which feature word j occurs, and 0.01 is a constant that keeps the idf result from being 0.
$V_{ij} = tf_{ij} \times idf_j$    Formula (3)
This gives the formalized text. After formalization, each microblog corresponds to a multidimensional term vector $W_i$:
$W_i \sim (V_{i1}, V_{i2}, \ldots, V_{ik})$    Formula (4)
In Formula (4), k is the dimension of the term vector, and $V_{ij}$ is the TF-IDF weight of feature word j in microblog i, obtained from Formula (3).
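The weighting of Formulas (1) to (3) can be sketched in a few lines of Python. This is a minimal illustration, not the patented implementation: the input layout (token lists under the keys `text`, `hashtag`, and `url`), the function name `modified_tfidf`, and the default weights are assumptions made for the example. The idf term is computed exactly as Formula (2) is written here, N/n + 0.01, with no logarithm.

```python
from collections import Counter


def modified_tfidf(posts, w_hashtag=2.0, w_url=1.5):
    """Return one sparse term vector {word: V_ij} per microblog.

    posts: list of dicts holding token lists under the (hypothetical) keys
    'text', 'hashtag' and 'url'. The weight defaults are placeholders.
    """
    weighted_counts = []
    for post in posts:
        n = Counter()
        for word in post.get("text", []):
            n[word] += 1.0            # n_post: plain occurrence in the text
        for word in post.get("hashtag", []):
            n[word] += w_hashtag      # n_hash * w_hashtag
        for word in post.get("url", []):
            n[word] += w_url          # n_url * w_url
        weighted_counts.append(n)

    total_posts = len(posts)          # N in Formula (2)
    doc_freq = Counter()              # n in Formula (2), per word
    for n in weighted_counts:
        doc_freq.update(n.keys())

    vectors = []
    for n in weighted_counts:
        total = sum(n.values())       # denominator of Formula (1)
        vec = {}
        for word, n_ij in n.items():
            tf = n_ij / total
            idf = total_posts / doc_freq[word] + 0.01
            vec[word] = tf * idf      # Formula (3)
        vectors.append(vec)
    return vectors
```

A word that appears in a hashtag therefore counts for more than the same word in the body, which is the effect the modified weighting is meant to achieve.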
Step 2: topic generation within time windows
1. The preprocessed information is discretized by its time attribute into several data blocks that change over time, each corresponding to a time window in the time series. The set in time window t is St={W1,W2,……WMt}. The number of documents Mt in each time window depends on the actual information flow and may be the same or different.
2. Remove sparsity. Microblog data consists mostly of short sentences or even phrases. Because this content is sparse, it is clustered on the basis of term vectors. Within time window t, the term vectors Wj in St are clustered with the K-means algorithm. Assuming the clustering yields K classes, the microblog data in each class is merged into a single document, giving K synthesized microblog documents Dt.
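The windowing and sparsity-removal sub-steps can be illustrated as follows. The sketch assumes each post is a dict with hypothetical keys `time` (seconds), `vector` (a dense list of floats of equal length), and `tokens`. The K-means routine is a small hand-written version with a deterministic start (the first k vectors as centroids), chosen only to keep the example self-contained; a library implementation would normally be used.

```python
from collections import defaultdict


def split_into_windows(posts, window_seconds):
    """Group posts by time window index (timestamp // window length)."""
    windows = defaultdict(list)
    for post in posts:
        windows[int(post["time"] // window_seconds)].append(post)
    return dict(sorted(windows.items()))


def kmeans(vectors, k, iterations=20):
    """Plain K-means; returns one cluster label per input vector."""
    centroids = [list(v) for v in vectors[:k]]   # illustrative initialisation
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        for idx, v in enumerate(vectors):
            dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in centroids]
            labels[idx] = dists.index(min(dists))
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def merge_clusters(posts, labels, k):
    """Concatenate the tokens of each cluster into one synthesized document."""
    merged = [[] for _ in range(k)]
    for post, lab in zip(posts, labels):
        merged[lab].extend(post["tokens"])
    return merged
```

Calling `merge_clusters` on the posts of one window yields the K longer documents that are then passed to topic modeling in place of the original short posts.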
3. For the microblog text Dt divided into time slices, the text collection in each time window is processed in turn. The LDA (Latent Dirichlet Allocation) model proposed by D. M. Blei in 2003 is used for topic modeling to extract several topics T and to obtain the topic content and topic strength of each. The detailed process is shown in Fig. 2.
The number of topics generated in each window may be the same or different; the topic number N is determined dynamically by a model selection method from the microblog text content in each time window:
$P(w \mid z) = \left(\frac{\Gamma(V\beta)}{\Gamma(\beta)^{V}}\right)^{N} \prod_{j=1}^{N} \frac{\prod_{w} \Gamma(f_j^{w} + \beta)}{\Gamma(f_j + V\beta)}$    Formula (4)
Here Γ(·) is the standard Gamma function, $f_j^{w}$ is the frequency with which word w is assigned to topic j, and $f_j$ is the total number of words assigned to topic j. The N that minimizes P(w | z) in the formula above is taken as the optimal number of topics.
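Because the Gamma terms overflow quickly, the model selection score is easier to evaluate in log space. The sketch below computes log P(w | z) from a topic-word count matrix using `math.lgamma`. It assumes V is the vocabulary size (the row length) and β is a symmetric smoothing parameter; the text does not define either symbol, so both readings are assumptions. The function only returns the score; comparing scores across candidate values of N and picking the extremum named in the text is left to the caller.

```python
from math import lgamma


def log_p_w_given_z(topic_word_counts, beta):
    """Log of the model selection score for one candidate topic count N.

    topic_word_counts[j][w] is the number of times word w is assigned to
    topic j; every row must have length V (assumed: the vocabulary size).
    """
    n_topics = len(topic_word_counts)           # N
    vocab = len(topic_word_counts[0])           # V
    # log of (Gamma(V*beta) / Gamma(beta)^V)^N
    score = n_topics * (lgamma(vocab * beta) - vocab * lgamma(beta))
    for row in topic_word_counts:
        score += sum(lgamma(f + beta) for f in row)   # numerator product over w
        score -= lgamma(sum(row) + vocab * beta)      # denominator for topic j
    return score
```

The count matrix would come from fitting the topic model once per candidate N on the same window.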
4. The posterior probability of the previous time window is used to influence the prior probability of the current time window. This maintains continuity between topics and handles the case in which a topic that has already appeared occurs again, with some probability, in subsequent time windows. A first discrete method based on unconditional dependence is applied: for the current time window t, the word distribution in time window t-1, together with a weight w, serves as the prior of the word distribution in time window t, namely
Formula (5)
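Formula (5) itself is not reproduced in this text, so the following is only one plausible reading of the sentence above, not the patent's formula: the prior for each topic's word distribution in window t is taken as the posterior word distribution from window t-1 scaled by the weight w. The function name `carry_over_prior` and the optional `floor` term (which keeps every prior value strictly positive) are assumptions added for the illustration.

```python
def carry_over_prior(prev_word_dist, weight, floor=0.0):
    """Build per-topic word priors for window t from window t-1.

    prev_word_dist: one list of word probabilities per topic (posterior of
    window t-1). weight: the weight w. floor: hypothetical small constant
    added so that words unseen in t-1 keep a non-zero prior.
    """
    return [[floor + weight * p for p in topic] for topic in prev_word_dist]
```

For the first time window there is no history, so a symmetric prior would have to be used instead.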
Step 3: topic association analysis between time windows
Topic evolution generally refers to how topics with the same semantics change over time across different time periods, including the extinction of old topics and the emergence of new ones. The relationships between topic contents across time windows therefore need to be analyzed, including the inheritance and homogeneity between topics, to obtain the trend of topic change. Inheritance between topics is measured by semantic similarity, and homogeneity is measured by similarity of the microblog vector information.
1. Topic inheritance between windows: inheritance between topics shows up as similarity in topic content and is measured with a semantic similarity algorithm.
2. Topic homogeneity between windows: high semantic similarity between two topics does not by itself show that they form part of the same trend of topic change. To avoid pairing topics that are merely semantically close but do not actually describe the same topic, a weighted combination of similarities is used for the comparison. The algorithm combines the ideas and perspectives of two different similarity measures, cosine similarity and the Jaccard coefficient, and so avoids the shortcomings of using either method alone. It also guarantees that the similarity lies in the interval [0,1]; the larger the value, the higher the similarity.
$Sim_{inh}(T_1,T_2) = Sim_{cos}(T_1,T_2) \times \alpha + Sim_{jac}(T_1,T_2) \times \beta$    Formula (6)
In the formula, $Sim_{cos}(T_1,T_2)$ and $Sim_{jac}(T_1,T_2)$ are the similarities between topic $T_1$ in time window 1 and topic $T_2$ in time window 2 under the cosine similarity and Jaccard coefficient algorithms, respectively. α and β are weight coefficients that reflect how much each of the two similarities contributes to the overall similarity.
Taking both the inheritance and the homogeneity measures into account gives the combined similarity used to judge the association between topics:
$Sim_{com}(T_1,T_2) = Sim_{inh}(T_1,T_2) \times \lambda + Sim_{sem}(T_1,T_2) \times \mu$    Formula (7)
Here $Sim_{sem}(T_1,T_2)$ and $Sim_{inh}(T_1,T_2)$ are the measures of inheritance and homogeneity between topics, respectively, and λ and μ are weight coefficients.
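Formulas (6) and (7) translate directly into code. In this sketch a topic is represented as a dict mapping words to non-negative weights, which is an assumption about the data layout; the Jaccard coefficient is taken over the two word sets. The semantic similarity is passed in as a number because the text does not specify the algorithm behind it. The weight defaults are placeholders, and the result stays in [0, 1] only if each pair of weights is non-negative and sums to 1.

```python
from math import sqrt


def cosine_similarity(t1, t2):
    """Cosine similarity of two sparse word-weight vectors."""
    dot = sum(w * t2.get(word, 0.0) for word, w in t1.items())
    norm1 = sqrt(sum(w * w for w in t1.values()))
    norm2 = sqrt(sum(w * w for w in t2.values()))
    return dot / (norm1 * norm2) if norm1 and norm2 else 0.0


def jaccard_similarity(t1, t2):
    """Jaccard coefficient of the two topics' word sets."""
    words1, words2 = set(t1), set(t2)
    union = words1 | words2
    return len(words1 & words2) / len(union) if union else 0.0


def sim_inh(t1, t2, alpha=0.5, beta=0.5):
    """Formula (6): weighted combination of cosine and Jaccard similarity."""
    return cosine_similarity(t1, t2) * alpha + jaccard_similarity(t1, t2) * beta


def sim_com(t1, t2, sim_sem, lam=0.5, mu=0.5, alpha=0.5, beta=0.5):
    """Formula (7): combine Sim_inh with an externally supplied Sim_sem."""
    return sim_inh(t1, t2, alpha, beta) * lam + sim_sem * mu
```

Two identical topics score 1 on both component measures, so any weights that sum to 1 give a combined value of 1.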
3. Association analysis between window topics: window topics that are related in time and associated with one another are combined into a topic. Through the changes in the content and intensity of its window topics, a topic is divided into stages from emergence to extinction, which depicts the topic's evolutionary process.
In the association analysis, each window topic $T_i$ is compared with the topic $T_{i-1}$ of the preceding time window and with the topic $T_{i+1}$ of the following time window to identify newly emerging topics. $Sim_{com}(T_i,T_{i+1}) < \varepsilon$ indicates that $T_i$ is an old topic that disappears, and $Sim_{com}(T_i,T_{i-1}) \ge \varepsilon$ indicates that the topic is inherited. This yields the course of a topic from emergence to extinction. Applying the topic detection and tracking method to microblogs makes it possible to draw on collective input, track trending topics quickly, and update topic popularity, which makes up for the weakness of traditional media in following real-time trending topics.
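The threshold test can be expressed as a small labelling function. It takes the combined similarity of a window topic to its counterpart in the preceding and following windows and the threshold ε. Two points are assumptions made for this sketch: a topic whose similarity to the preceding window is below ε (or that has no preceding window) is labelled "new", and `None` stands for a missing neighbouring window. The label strings and the function name are illustrative.

```python
def classify_window_topic(sim_prev, sim_next, epsilon):
    """Label a window topic T_i from its combined similarities.

    sim_prev: Sim_com(T_i, T_{i-1}), or None for the first window.
    sim_next: Sim_com(T_i, T_{i+1}), or None for the last window.
    """
    labels = []
    if sim_prev is not None and sim_prev >= epsilon:
        labels.append("inherited")    # Sim_com(T_i, T_{i-1}) >= epsilon
    else:
        labels.append("new")          # assumption: no sufficiently similar predecessor
    if sim_next is not None and sim_next < epsilon:
        labels.append("dies out")     # Sim_com(T_i, T_{i+1}) < epsilon
    return labels
```

Chaining the labels of consecutive windows gives the stages of a topic from emergence through inheritance to extinction.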

Claims (8)

1. A topic detection and tracking method based on microblog data, characterized in that it comprises the following steps:
Step 1: data preprocessing;
1. ignore directed conversational messages;
2. expand the original microblog data;
3. formalize the microblog text: the microblog text is segmented into words, and stop words, low-frequency words, and high-frequency words are removed;
4. remove sparsity: because microblog texts are short, they are clustered on the basis of term vectors;
Step 2: topic generation within time windows;
1. all preprocessed data is discretized, according to its time information, into the corresponding time windows t of the time series;
2. remove sparsity: microblog data consists mostly of short sentences or even phrases, and because this content is sparse, it is clustered on the basis of term vectors;
3. for the microblog text divided into time slices, the text collection in each time window is processed in turn, and an LDA model is used for topic modeling to extract several topics T and to obtain the topic content and topic strength of each; the number of topics generated in each window may be the same or different, and the topic number N is determined dynamically by a model selection method from the microblog text content in each time window;
4. because a topic that has already appeared may still appear with some probability in subsequent time windows, the posterior probability of the word distribution in the historical time window is used as prior knowledge for topic mining in the current time window; a first discrete method based on unconditional dependence is adopted: for the current time window t, the word distribution in time window t-1, together with a weight w, serves as the prior of the word distribution in time window t;
Step 3: topic association analysis between time windows;
topic evolution generally refers to how topics with the same semantics change over time across different time periods, including the extinction of old topics and the emergence of new ones; the associations of topics between time windows are analyzed, including the inheritance and homogeneity between topics, to obtain the evolution path of each topic; inheritance between topics is measured by semantic similarity, and homogeneity is measured by similarity of the microblog vector information; window topics that are related in time and associated with one another are combined into a topic, and through the changes in the content and intensity of its window topics, a topic is divided into stages from emergence to extinction, which depicts the topic's evolutionary process and forms its trend of change.
2. The topic detection and tracking method based on microblog data according to claim 1, characterized in that: in step 1 of the method, directed conversational messages are ignored in the data preprocessing stage; that is, microblog posts in the "@username" form are ignored, since most such posts are directed dialogue between users and are unlikely to describe a general topic.
3. The topic detection and tracking method based on microblog data according to claim 1, characterized in that: in step 1 of the method, the originally sparse microblog data is expanded: information is extracted from the embedded external links (URLs) contained in the microblog text and added to the microblog information to support the description of the user's viewpoint; the extracted data is used in the TF-IDF calculation modified for microblog features, which assigns different weights to the text, comments, and reposts in the microblog information.
4. The topic detection and tracking method based on microblog data according to claim 1, characterized in that: in step 1 of the method, sparsity is removed from the microblog data; microblog data consists mostly of short sentences or even phrases, and because this content is sparse, it is clustered on the basis of term vectors; within time window t, the term vectors Wj in St are clustered with the K-means algorithm; assuming the clustering yields K classes, the microblog data in each class is merged into a single document, giving K synthesized microblog documents Dt.
5. The topic detection and tracking method based on microblog data according to claim 1, characterized in that: in step 2 of the method, for the microblog text Dt divided into time slices, the text collection in each time window is processed in turn; the LDA (Latent Dirichlet Allocation) model proposed by D. M. Blei in 2003 is applied for topic modeling to extract several topics T and to obtain the topic content and topic strength of each; the number of topics generated in each window may be the same or different, and the topic number N is determined dynamically by a model selection method from the microblog text content in each time window:
$P(w \mid z) = \left(\frac{\Gamma(V\beta)}{\Gamma(\beta)^{V}}\right)^{N} \prod_{j=1}^{N} \frac{\prod_{w} \Gamma(f_j^{w} + \beta)}{\Gamma(f_j + V\beta)}$.
6. The topic detection and tracking method based on microblog data according to claim 1, characterized in that: in step 3 of the method, the posterior probability of the previous time window is used to influence the prior probability of the current time window to maintain continuity between topics; a first discrete method based on unconditional dependence is applied: for the current time window t, the word distribution in time window t-1, together with a weight w, serves as the prior of the word distribution in time window t.
7. The topic detection and tracking method based on microblog data according to claim 1, characterized in that: in step 2 of the method, a weighted combination of similarities is used to measure the inheritance between topics; the method combines the ideas and perspectives of two different similarity measures, cosine similarity and the Jaccard coefficient, avoiding the shortcomings of using either method alone, and guarantees that the similarity lies in the interval [0,1], where a larger value indicates higher similarity;
$Sim_{inh}(T_1,T_2) = Sim_{cos}(T_1,T_2) \times \alpha + Sim_{jac}(T_1,T_2) \times \beta$
In the formula, $Sim_{cos}(T_1,T_2)$ and $Sim_{jac}(T_1,T_2)$ are the similarities between topic $T_1$ in time window 1 and topic $T_2$ in time window 2 under the cosine similarity and Jaccard coefficient algorithms, respectively, and α and β are weight coefficients that reflect how much each of the two similarities contributes to the overall similarity;
taking both the inheritance and the homogeneity measures into account gives the combined similarity used to judge the association between topics:
$Sim_{com}(T_1,T_2) = Sim_{inh}(T_1,T_2) \times \lambda + Sim_{sem}(T_1,T_2) \times \mu$
Here $Sim_{sem}(T_1,T_2)$ and $Sim_{inh}(T_1,T_2)$ are the measures of inheritance and homogeneity between topics, respectively, and λ and μ are weight coefficients;
inheritance between topics shows up as similarity in topic content and is measured with a semantic similarity algorithm.
8. The topic detection and tracking method based on microblog data according to claim 1, characterized in that: in step 3 of the method, the association analysis between window topics combines window topics that are related in time and associated with one another into a topic, and through the changes in the content and intensity of its window topics, a topic is divided into stages from emergence to extinction, which depicts the topic's evolutionary process;
in the association analysis, each window topic $T_i$ is compared with the topic $T_{i-1}$ of the preceding time window and with the topic $T_{i+1}$ of the following time window to identify newly emerging topics; $Sim_{com}(T_i,T_{i+1}) < \varepsilon$ indicates that $T_i$ is an old topic that disappears, and $Sim_{com}(T_i,T_{i-1}) \ge \varepsilon$ indicates that the topic is inherited, which yields the course of a topic from emergence to extinction.
CN201310316316.7A 2013-07-25 2013-07-25 A kind of topic detection and tracking method based on microblog data Expired - Fee Related CN103390051B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310316316.7A CN103390051B (en) 2013-07-25 2013-07-25 A kind of topic detection and tracking method based on microblog data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310316316.7A CN103390051B (en) 2013-07-25 2013-07-25 A kind of topic detection and tracking method based on microblog data

Publications (2)

Publication Number Publication Date
CN103390051A CN103390051A (en) 2013-11-13
CN103390051B true CN103390051B (en) 2016-07-20

Family

ID=49534323

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310316316.7A Expired - Fee Related CN103390051B (en) 2013-07-25 2013-07-25 A kind of topic detection and tracking method based on microblog data

Country Status (1)

Country Link
CN (1) CN103390051B (en)

Families Citing this family (37)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103678277A (en) * 2013-12-04 2014-03-26 东软集团股份有限公司 Theme-vocabulary distribution establishing method and system based on document segmenting
CN103699611B (en) * 2013-12-16 2017-01-11 浙江大学 Microblog flow information extracting method based on dynamic digest technology
CN104731811B (en) * 2013-12-20 2018-10-09 北京师范大学珠海分校 A kind of clustering information evolution analysis method towards extensive dynamic short text
CN103793478B (en) * 2014-01-14 2017-01-11 四川大学 Online theme modeling method on basis of theme heredity
CN103793501B (en) * 2014-01-20 2016-03-02 惠州学院 Based on the theme Combo discovering method of social networks
CN103970863B (en) * 2014-05-08 2017-12-19 清华大学 The method for digging and system of microblog users interest based on LDA topic models
CN103984729A (en) * 2014-05-19 2014-08-13 北京大学 Microblog information tracing method and microblog information tracing method
CN103984731B (en) * 2014-05-19 2017-03-08 北京大学 Self adaptation topic tracking method and apparatus under microblogging environment
CN104281653B (en) * 2014-09-16 2018-07-27 南京弘数信息科技有限公司 A kind of opining mining method for millions scale microblogging text
CN105760410B (en) * 2015-04-15 2019-04-19 北京工业大学 A kind of microblogging semanteme expansion model and method based on forwarding comment
CN106294405A (en) * 2015-05-22 2017-01-04 国家计算机网络与信息安全管理中心 A kind of microblogging topic evolution analysis method and device
CN105138684B (en) * 2015-09-15 2018-12-14 联想(北京)有限公司 A kind of information processing method and information processing unit
CN105260358A (en) * 2015-10-14 2016-01-20 上海大学 Short text-oriented unexpected incident development process representation method
CN106599002B (en) * 2015-10-19 2020-06-05 北京国双科技有限公司 Topic evolution analysis method and device
CN105354333B (en) * 2015-12-07 2018-11-06 天云融创数据科技(北京)有限公司 A kind of method for extracting topic based on newsletter archive
CN106055538B (en) * 2016-05-26 2019-03-08 达而观信息科技(上海)有限公司 The automatic abstracting method of the text label that topic model and semantic analysis combine
US10275444B2 (en) 2016-07-15 2019-04-30 At&T Intellectual Property I, L.P. Data analytics system and methods for text data
CN106354818B (en) * 2016-08-30 2020-01-10 电子科技大学 Social media-based dynamic user attribute extraction method
CN107918611A (en) * 2016-10-09 2018-04-17 郑州大学 A kind of model analyzed microblog topic and developed
CN106570088A (en) * 2016-10-20 2017-04-19 浙江大学 Discovering and evolution tracking method for scientific research document topics
CN106557551A (en) * 2016-10-27 2017-04-05 西南石油大学 Scale forecast method and system is propagated based on the microblogging that microblogging affair clustering is modeled
CN106570167A (en) * 2016-11-08 2017-04-19 南京理工大学 Knowledge-integrated subject model-based microblog topic detection method
CN106776503B (en) * 2016-12-22 2020-03-10 东软集团股份有限公司 Text semantic similarity determination method and device
CN106649726A (en) * 2016-12-23 2017-05-10 中山大学 Association-topic evolution mining method in social network
CN106934014B (en) * 2017-03-10 2021-03-19 山东省科学院情报研究所 Hadoop-based network data mining and analyzing platform and method thereof
CN107025299B (en) * 2017-04-24 2018-02-27 北京理工大学 A kind of financial public sentiment cognitive method based on weighting LDA topic models
CN107203513A (en) * 2017-06-06 2017-09-26 中国人民解放军国防科学技术大学 Microblogging text data fine granularity topic evolution analysis method based on probabilistic model
CN107835113B (en) * 2017-07-05 2020-09-08 中山大学 Method for detecting abnormal user in social network based on network mapping
CN108399162A (en) * 2018-03-21 2018-08-14 北京理工大学 The topic of phrase-based bag topic model finds method
CN108717421B (en) * 2018-04-23 2023-01-24 深圳市城市规划设计研究院有限公司 Social media text theme extraction method and system based on space-time change
CN108763208B (en) * 2018-05-22 2023-09-05 腾讯科技(上海)有限公司 Topic information acquisition method, topic information acquisition device, server and computer-readable storage medium
CN109543110A (en) * 2018-11-28 2019-03-29 南京航空航天大学 A kind of microblog emotional analysis method and system
CN110059225B (en) * 2019-03-11 2022-02-15 北京奇艺世纪科技有限公司 Video classification method and device, terminal equipment and storage medium
CN111125305A (en) * 2019-12-05 2020-05-08 东软集团股份有限公司 Hot topic determination method and device, storage medium and electronic equipment
CN111666268A (en) * 2020-05-20 2020-09-15 安徽火蓝数据有限公司 Microblog big data public opinion analysis method
CN112905751B (en) * 2021-03-19 2024-03-29 常熟理工学院 Topic evolution tracking method combining topic model and twin network model
CN113127643A (en) * 2021-05-11 2021-07-16 江南大学 Deep learning rumor detection method integrating microblog themes and comments

Citations (1)

Publication number Priority date Publication date Assignee Title
CN103116651A (en) * 2013-03-05 2013-05-22 南京理工大学常熟研究院有限公司 Public sentiment hot topic dynamic detection method

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US20120041953A1 (en) * 2010-08-16 2012-02-16 Microsoft Corporation Text mining of microblogs using latent topic labels

Patent Citations (1)

Publication number Priority date Publication date Assignee Title
CN103116651A (en) * 2013-03-05 2013-05-22 南京理工大学常熟研究院有限公司 Public sentiment hot topic dynamic detection method

Non-Patent Citations (2)

Title
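A topic evolution modeling and analysis method; Hu Yanli et al.; Acta Automatica Sinica; 20121031; Vol. 38 (No. 10); pp. 1690-1697 *
Research on topic evolution in scientific and technical literature; He Liang; New Technology of Library and Information Service; 20120430 (No. 4); pp. 61-67 *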

Also Published As

Publication number Publication date
CN103390051A (en) 2013-11-13

Similar Documents

Publication Publication Date Title
CN103390051B (en) A kind of topic detection and tracking method based on microblog data
Li et al. Filtering out the noise in short text topic modeling
Hu et al. Text analytics in social media
CN105069102B (en) Information push method and apparatus
CN103023714B (en) The liveness of topic Network Based and cluster topology analytical system and method
CN103793503B (en) Opinion mining and classification method based on web texts
Du et al. Extracting and tracking hot topics of micro-blogs based on improved Latent Dirichlet Allocation
Zhang et al. Encoding conversation context for neural keyphrase extraction from microblog posts
CN104484431B (en) A kind of multi-source Personalize News webpage recommending method based on domain body
Hou et al. Newsminer: Multifaceted news analysis for event search
CN104536956A (en) A Microblog platform based event visualization method and system
CN109670039A (en) Sentiment analysis method is commented on based on the semi-supervised electric business of tripartite graph and clustering
CN104965823A (en) Big data based opinion extraction method
Merhav et al. Extracting information networks from the blogosphere
CN103455487A (en) Extracting method and device for search term
CN102955853A (en) Method and device for generating cross-language abstract
Musaev et al. Fast text classification using randomized explicit semantic analysis
Huang et al. Topic detection from microblog based on text clustering and topic model analysis
Wang et al. Seeft: Planned social event discovery and attribute extraction by fusing twitter and web content
Zhao et al. Towards events detection from microblog messages
CN103970865B (en) Microblog text level subject finding method and system based on seed words
Lampos Detecting events and patterns in large-scale user generated textual streams with statistical learning methods
Bellaachia et al. Learning from twitter hashtags: Leveraging proximate tags to enhance graph-based keyphrase extraction
Luo et al. Structuring Tweets for improving Twitter search
CN106372147B (en) Heterogeneous topic network construction and visualization method based on text network

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20131113

Assignee: Jiangsu Nanyou IOT Technology Park Ltd.

Assignor: Nanjing University of Posts and Telecommunications

Contract record no.: 2016320000218

Denomination of invention: Topic detection and tracking method based on microblog data

Granted publication date: 20160720

License type: Common License

Record date: 20161118

LICC Enforcement, change and cancellation of record of contracts on the licence for exploitation of a patent or utility model
EC01 Cancellation of recordation of patent licensing contract

Assignee: Jiangsu Nanyou IOT Technology Park Ltd.

Assignor: Nanjing University of Posts and Telecommunications

Contract record no.: 2016320000218

Date of cancellation: 20170706

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20160720