CN103390051A - Topic detection and tracking method based on microblog data - Google Patents


Info

Publication number
CN103390051A
Authority
CN
China
Prior art keywords
topic
microblogging
time window
window
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013103163167A
Other languages
Chinese (zh)
Other versions
CN103390051B (en)
Inventor
孙国梓
黄斯琪
杨一涛
陈国兰
仇呈燕
郑冬亚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing Post and Telecommunication University
Nanjing University of Posts and Telecommunications
Original Assignee
Nanjing Post and Telecommunication University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing Post and Telecommunication University
Priority to CN201310316316.7A (patent CN103390051B)
Publication of CN103390051A
Application granted
Publication of CN103390051B
Legal status: Active
Anticipated expiration

Abstract

The invention discloses a topic detection and tracking method based on microblog data. The method mines the latent topics hidden in large-scale social network information. It comprises the following steps: first, partitioning the massively growing microblog data according to its temporal order and filtering out redundant information; second, analyzing and classifying the text content within each time window, returning key topic descriptions with independent semantics after extraction, and thereby extracting the topics in the different time windows; and last, analyzing the inheritance and identity of topics between time windows to derive the trend of microblog topics. The method can show the dynamic development of topic content, namely the emergence, development, climax and extinction of topics, and describes topics more accurately and fully.

Description

A topic detection and tracking method based on microblog data
Technical field
The present invention relates to the technical field of data mining, and in particular to a topic detection and tracking method based on microblog data.
Background art
With the rapid development of Web 2.0 and advances in the means of information dissemination, microblogging has in recent years grown rapidly into a highly influential form of online social media. As a new information carrier and transmission channel, microblogging makes it easier for internet users to comment on products and services and to join discussions of hot topics, and it plays an increasingly important role in how online public opinion originates and spreads. Not all of the massive, continuously growing volume of microblog information is valuable to users. The hot topics that interest users need to be extracted automatically from this mass of information, and redundant data of no practical value need to be filtered out.
A topic is a set of reports related to an event. Information sources on the network are diverse: they include hot topics that the public cares about, and may also include sensitive topics concerning public safety and social stability. The state of an event changes over time under the influence of factors such as culture. Topic evolution reflects the process by which a topic emerges, rises, declines and ends. As time passes, both the intensity and the content of a topic change, that is, the topic drifts. Public opinion analysis consists of analyzing the mass of text data on the internet to grasp the evolution trend of themes and to make timely, correct predictions for decision makers' reference.
At present, traditional topic evolution research mainly takes newswire, broadcast, television, blogs, forum communities and similar media as data sources, and detects topics by applying a series of data mining methods and similarity comparisons. In this research the text of the source information is very important. Microblog texts are short texts limited to 140 characters; they are produced at any moment and in enormous numbers. Because of the length limit, users usually post in a condensed style. The text is free-form and colloquial; abbreviations, internet slang and misspellings are very common; and hypertext such as emoticons, pictures, videos and web links is often embedded. If topics are analyzed in the traditional way, by constructing a term-document feature matrix, these properties of microblog text make the feature matrix highly sparse, and the detection results suffer accordingly. The present invention solves the above problems well.
Summary of the invention
The object of the present invention is to provide a topic detection and tracking method based on microblog data. The method performs real-time data analysis on large-scale incremental microblog information, generates topics by automatic clustering through topic modeling, establishes the association and change of topics along the time axis according to how topic content and topic intensity vary over time, and summarizes the dynamic trend of topic evolution.
The technical solution adopted by the present invention to solve its technical problem is as follows. The invention provides a topic detection and tracking method based on microblog data. The method partitions the massively growing microblog data into blocks according to their temporal order, mines and analyzes the text content within each time window, extracts the topics in the different time windows, and finally summarizes the trend of microblog topics by analyzing the inheritance and identity of topics between time windows. The method consists mainly of three steps: data preprocessing, topic generation within time windows, and topic association analysis between time windows.
Method flow:
Step 1: data preprocessing
1. Ignore interactive messages that are directed conversations. That is, microblog messages of the form "@username" are ignored. Such posts usually do not reflect a general topic, so ignoring them removes as much as possible of the noise data that serves only interaction between individuals.
2. Expand the original microblog data. Information is extracted from the URLs contained in the microblog text and added to the microblog message, to support the description of the user's viewpoint.
3. Formalize the microblog text: the text is segmented into words, and stop words, low-frequency words and high-frequency words are removed. Taking into account comments, forwards, user-defined tags (hashtags of the form "#topic#") and embedded external links (URLs) in the microblog text, a modified TF-IDF weighting algorithm is used. Each microblog post is formalized and corresponds to a multidimensional term vector $W_i$.
4. Reduce sparsity: because microblog texts are short, they are clustered on the basis of their term vectors. (That is, each microblog post is first segmented and represented as a term vector, and the posts are then clustered with the K-means algorithm on those vectors. Assuming the clustering yields K classes, the microblog messages in each class are merged into a single document, giving K synthetic microblog documents D.)
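The sparsity-reduction step in item 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `kmeans` and `merge_clusters`, the dense-list vector layout, the squared Euclidean distance and the choice of the first K vectors as initial centroids are all assumptions made for the example.

```python
def kmeans(vectors, k, iterations=20):
    """Cluster dense term vectors with plain K-means; return one label per vector."""
    # Assumption: the first k vectors serve as initial centroids (deterministic).
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = []
        for v in vectors:
            dists = [sum((a - b) ** 2 for a, b in zip(v, c)) for c in centroids]
            new_labels.append(dists.index(min(dists)))
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def merge_clusters(token_lists, labels, k):
    """Concatenate the tokens of all posts in a cluster into one synthetic document."""
    documents = [[] for _ in range(k)]
    for tokens, label in zip(token_lists, labels):
        documents[label].extend(tokens)
    return documents


# Toy data (hypothetical): four posts as 2-dimensional term vectors.
vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
tokens = [["rain"], ["goal"], ["storm"], ["match"]]
labels = kmeans(vectors, k=2)
synthetic_docs = merge_clusters(tokens, labels, k=2)
```

Merging each cluster into one longer document gives the later topic model more word co-occurrences per document than individual 140-character posts would.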
Step 2: topic generation within time windows
1. All preprocessed data are discretized, according to their time information, into the corresponding time windows t on the time series. The set in each time window is $S_t = \{W_1, W_2, \ldots, W_{M_t}\}$. The originally continuous text stream is thereby divided into several time windows, where the number of documents $M_t$ in each time window may be the same or different.
2. Reduce sparsity. Microblog data are mostly short sentences or even phrases; because their content is sparse, they are clustered on the basis of their term vectors.
3. For the microblog text cut into time slices, the text set in each time window is processed in turn. The LDA model is used for topic modeling, several topics T are extracted, and the topic content and topic intensity are obtained for each. The number of topics generated in each window may be the same or different; the number of topics N is determined dynamically by a model selection method according to the microblog text content in each time window.
4. Because a topic that has already appeared may still appear with some probability in the following time windows, the posterior probability of the word distribution in the historical time window is used as prior knowledge for topic mining in the current time window. A first-order discrete method based on non-conditional dependence is adopted: for the current time window t, the word distribution in time window t-1, together with a weight value w, serves as the prior of the word distribution in time window t.
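The discretization in item 1 of this step can be illustrated with a short Python sketch. The function name `partition_into_windows`, the use of timestamps in seconds and the fixed window length are assumptions for illustration; the source does not specify how the window length is chosen.

```python
from collections import defaultdict


def partition_into_windows(posts, window_seconds, start=None):
    """Group (timestamp, item) pairs into consecutive fixed-length time windows.

    Returns a dict mapping window index t to the list S_t of items in that window.
    """
    posts = sorted(posts, key=lambda p: p[0])
    if not posts:
        return {}
    if start is None:
        start = posts[0][0]  # assumption: the first post opens window 0
    windows = defaultdict(list)
    for timestamp, item in posts:
        windows[int((timestamp - start) // window_seconds)].append(item)
    return dict(windows)


# Hypothetical posts: (seconds since some origin, post identifier).
stream = [(0, "a"), (3599, "b"), (3600, "c"), (10000, "d")]
windows = partition_into_windows(stream, window_seconds=3600)
```

Each window may hold a different number of items, matching the statement that $M_t$ may differ between windows.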
Step 3: topic association analysis between time windows
Topic evolution mainly refers to how topics with the same semantics change over different time periods, and to the disappearance of old topics and the emergence of new ones. The topic associations between time windows are analyzed, including the inheritance and identity between topics, to obtain the evolution path of each topic. Inheritance between topics is measured by semantic similarity, and identity is measured by similarity in the microblog vector information. Window topics that have a temporal order and are associated are combined into one topic; through the changes in the content and intensity of the window topics, the topic is divided into several stages from emergence to extinction, which describes its evolution process and trend.
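As a rough illustration of how window topics might be combined into evolving topics, the Python sketch below chains topics across consecutive windows. It is an assumption-laden sketch, not the method of the source: the function `link_window_topics`, the greedy one-to-one matching, the caller-supplied `similarity` function and the threshold `eps` are all hypothetical choices.

```python
def link_window_topics(windows, similarity, eps):
    """Chain window topics over time.

    windows: list (in time order) of lists of topic objects.
    similarity: function (topic_a, topic_b) -> float.
    Returns a list of chains; each chain is a list of (window_index, topic).
    """
    chains = []
    open_chains = []  # chains whose last topic lies in the previous window
    for t, topics in enumerate(windows):
        next_open = []
        used = set()
        for topic in topics:
            best, best_sim = None, eps
            for idx, chain in enumerate(open_chains):
                if idx in used:
                    continue  # greedy assumption: one successor per chain
                score = similarity(chain[-1][1], topic)
                if score >= best_sim:
                    best, best_sim = idx, score
            if best is None:
                chain = [(t, topic)]  # no predecessor: a new topic emerges
                chains.append(chain)
            else:
                used.add(best)
                chain = open_chains[best]
                chain.append((t, topic))  # the topic is inherited
            next_open.append(chain)
        open_chains = next_open  # chains not continued here have died out
    return chains


def word_overlap(a, b):
    """Stand-in similarity for the example: Jaccard overlap of word sets."""
    return len(a & b) / len(a | b) if a | b else 0.0


example = [
    [{"rain", "flood"}],
    [{"rain", "flood", "rescue"}, {"goal", "match"}],
    [{"goal", "match", "final"}],
]
chains = link_window_topics(example, word_overlap, eps=0.5)
spans = [[t for t, _ in chain] for chain in chains]
```

Each chain's first window marks the emergence of a topic and its last window marks its extinction.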
Beneficial effects:
1. In the data preprocessing stage, the characteristics of microblog messages themselves are fully taken into account. Forwards, comments, tags and the like are considered, useless noise data are filtered out, and data that help describe the topic are weighted, so that a vector that better reflects microblog features is constructed.
2. For embedded URLs contained in a microblog post, the data that the URL points to are added to the original post content, enriching the information content of the original microblog text.
3. Because microblog data differ from ordinary text data, being limited to 140 characters and therefore short, a clustering method is used to solve the text sparsity problem.
4. Topics are extracted on the basis of local time windows, the number of topics is determined dynamically by a model selection method, and a topic is described by window topics that have a temporal order and are associated, so the semantics of the topic can be described fairly accurately.
5. A weighted combination of similarities is used to measure the association between topics. It combines the different ideas and perspectives of three similarity measures and avoids the shortcomings of relying on any single method.
Brief description of the drawings
Fig. 1 is a flow chart of the microblog data topic detection and tracking method of the present invention.
Fig. 2 is a schematic diagram of topic generation by the LDA model in the present invention.
Detailed description of the embodiments
The invention is described in further detail below with reference to the accompanying drawings.
Step 1: data preprocessing
1. Ignore interactive messages that are directed conversations. That is, microblog messages of the form "@username" are ignored. Such messages are mostly conversations directed at particular users and are less likely to describe a general topic. Removing them eliminates as much noise data as possible.
2. Expand the original microblog data. Information is extracted from the embedded external links (URLs) contained in the microblog text and added to the microblog message, to support the description of the user's viewpoint. The extracted data are used in the TF-IDF calculation of the next step.
3. Formalize the microblog text. To formalize the microblog data, the data are first preprocessed: word segmentation, removal of stop words and removal of low- and high-frequency words, followed by the modified TF-IDF weight calculation.
Unlike other, traditional text data, a microblog post can be divided clearly into three parts: the forwarded source text, the current microblog text, and the comments. Although the theme of a post is what its own text expresses, analyzing the words that appear in the forwarded source text and in the comments makes it possible to extract the words that characterize the post more effectively and more accurately. For example, if a word appears in the forwarded source text, in the microblog text and in the comments, it is very likely a keyword that represents the post, whatever its TF-IDF value. In the body text, a tag field of the form "#topic#" is also a summary of the theme and can often sum up the theme that the current post expresses.
To handle these cases, the traditional TF-IDF weighting method is modified so that it is better suited to constructing the vector space of microblog text. It is computed as follows:
$$tf_{ij} = \frac{n_{i,j}}{\sum_k n_{i,k}}$$ Formula (1)
$$n_{i,j} = n\_post_{i,j} + n\_hash_{i,j} \times w_{hashtag} + n\_url_{i,j} \times w_{url}$$
In Formula (1), $tf_{ij}$ is the term frequency of feature word j in microblog i, and $n_{i,j}$ is the weighted number of occurrences of feature word j in microblog i. $n\_post_{i,j}$ is the number of occurrences of feature word j in the text data of microblog i (including forwards and comments, excluding hashtags and URLs). $n\_hash_{i,j}$ and $n\_url_{i,j}$ are the numbers of occurrences of feature word j in the hashtags and in the URL content of microblog i respectively, and $w_{hashtag}$ and $w_{url}$ are the corresponding weights. $\sum_k n_{i,k}$ is the total number of words in microblog i.
$$idf_j = \log\left(\frac{N}{n} + 0.01\right)$$ Formula (2)
In Formula (2), N is the total number of microblogs, n is the number of microblogs in which feature word j appears, and 0.01 is a constant that prevents the idf result from being 0.
$$V_{ij} = tf_{ij} \times idf_j$$ Formula (3)
This yields the formalized text. After formalization, each microblog post corresponds to a multidimensional term vector $W_i$:
$$W_i \sim (V_{i1}, V_{i2}, \ldots, V_{ik})$$ Formula (4)
In Formula (4), k is the dimension of the term vector, and $V_{ij}$ is the TF-IDF weight of feature word j in microblog i, obtained from Formula (3).
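The weighting in Formulas (1) to (3) can be sketched in Python as below. Several details are assumptions rather than statements of the source: the dictionary keys `text`, `hashtags` and `url`, the example weights 2.0 and 1.5, the use of the natural logarithm in the idf term, and the use of the weighted counts as the denominator of the term frequency.

```python
import math
from collections import Counter


def weighted_counts(post, w_hashtag=2.0, w_url=1.5):
    """n_{i,j} = n_post + n_hash * w_hashtag + n_url * w_url for one post."""
    counts = Counter()
    for token in post["text"]:      # body text, forwards and comments
        counts[token] += 1.0
    for token in post["hashtags"]:  # words inside #topic# tags
        counts[token] += w_hashtag
    for token in post["url"]:       # words extracted from linked pages
        counts[token] += w_url
    return counts


def tfidf_vectors(posts, w_hashtag=2.0, w_url=1.5):
    """Return one sparse vector (dict word -> V_ij) per post."""
    all_counts = [weighted_counts(p, w_hashtag, w_url) for p in posts]
    n_posts = len(posts)
    doc_freq = Counter()
    for counts in all_counts:
        doc_freq.update(counts.keys())  # number of posts containing each word
    vectors = []
    for counts in all_counts:
        total = sum(counts.values())
        vector = {}
        for word, n_ij in counts.items():
            tf = n_ij / total
            # Assumption: natural logarithm; the source does not state the base.
            idf = math.log(n_posts / doc_freq[word] + 0.01)
            vector[word] = tf * idf
        vectors.append(vector)
    return vectors


# Hypothetical, already tokenized posts.
posts = [
    {"text": ["rain", "rain", "city"], "hashtags": ["rain"], "url": []},
    {"text": ["city", "goal"], "hashtags": [], "url": []},
]
vectors = tfidf_vectors(posts)
```

In the first post the hashtag raises the weighted count of "rain" from 2 to 4, so the word dominates that post's vector, which is the effect the modified weighting is meant to achieve.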
Step 2: topic generation within time windows
1. The preprocessed information is discretized by its time attribute into several time-varying message blocks, each corresponding to a time window on the time series. The set in time window t is $S_t = \{W_1, W_2, \ldots, W_{M_t}\}$. The number of documents $M_t$ in each time window depends on the actual information flow and may be the same or different.
2. Reduce sparsity. Microblog data are mostly short sentences or even phrases; because their content is sparse, they are clustered on the basis of their term vectors. In time window t, the term vectors $W_j$ in $S_t$ are clustered with the K-means algorithm. Assuming the clustering yields K classes, the microblog data in each class are merged into a single document, giving K synthetic microblog documents $D_t$.
3. For the microblog text $D_t$ cut into time slices, the text set in each time window is processed in turn. The LDA (Latent Dirichlet Allocation) model proposed by D. M. Blei in 2003 is used for topic modeling; several topics T are extracted, and the topic content and topic intensity are obtained for each. The detailed process is shown in Fig. 2.
The number of topics generated in each window may be the same or different. The number of topics N is determined dynamically by a model selection method according to the microblog text content in each time window:
$$P(w \mid z) = \left(\frac{\Gamma(V\beta)}{\Gamma(\beta)^{V}}\right)^{N} \prod_{j=1}^{N} \frac{\prod_{w} \Gamma(f_j^{w} + \beta)}{\Gamma(f_j + V\beta)}$$ Formula (4)
where $\Gamma(\cdot)$ is the standard Gamma function, $f_j^{w}$ is the number of times word w is assigned to topic j, and $f_j$ is the total number of words assigned to topic j. The N that minimizes $P(w \mid z)$ in the following formula is taken as the best number of topics.
[Formula image BDA00003567047000041 in the original; not reproduced in the text]
4. The posterior probability of the previous time window is used to influence the prior probability of the current time window, which maintains continuity between topics and handles topics that have already appeared and may appear again, with some probability, in the following time windows. A first-order discrete method based on non-conditional dependence is used: for the current time window t, the word distribution in time window t-1
[Formula image BDA00003567047000042 in the original; not reproduced in the text]
together with a weight value w, serves as the prior of the word distribution in time window t
[Formula image BDA00003567047000043 in the original; not reproduced in the text]
namely
[Formula image BDA00003567047000044 in the original; not reproduced in the text]
Formula (5)
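The model-selection quantity $P(w \mid z)$ given under item 3 above is numerically tiny for realistic corpora, so a practical sketch works with its logarithm. The Python function below evaluates $\log P(w \mid z)$ from topic-word assignment counts using `math.lgamma`. The function name, the input layout (one dict of counts per topic) and the idea of obtaining those counts from a fitted LDA model are assumptions; which extremum over candidate values of N to pick is left to the caller.

```python
import math


def log_p_w_given_z(topic_word_counts, vocab_size, beta):
    """Log of P(w|z) for N topics.

    topic_word_counts: list with one dict per topic j, mapping word w to the
        count f_j^w of tokens of w assigned to topic j.
    vocab_size: V, the number of distinct words.
    beta: the smoothing parameter appearing in the formula.
    """
    n_topics = len(topic_word_counts)
    # N * log( Gamma(V*beta) / Gamma(beta)^V )
    total = n_topics * (math.lgamma(vocab_size * beta)
                        - vocab_size * math.lgamma(beta))
    for counts in topic_word_counts:
        f_j = sum(counts.values())
        # Words with a recorded count for this topic.
        total += sum(math.lgamma(c + beta) for c in counts.values())
        # Words never assigned to this topic each contribute log Gamma(beta).
        total += (vocab_size - len(counts)) * math.lgamma(beta)
        total -= math.lgamma(f_j + vocab_size * beta)
    return total


# One topic, a two-word vocabulary, a single token assigned: P = 1/2.
score = log_p_w_given_z([{"a": 1}], vocab_size=2, beta=1.0)
```

In use, the score would be computed once per candidate N on the same time window and the candidates compared.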
Step 3: topic association analysis between time windows
Topic evolution mainly refers to how topics with the same semantics change over different time periods, and to the disappearance of old topics and the emergence of new ones. The relationships between the topic contents of different time windows therefore need to be analyzed, including the inheritance and identity between topics, to obtain the trend of each topic. Inheritance between topics is measured by semantic similarity, and identity is measured by similarity in the microblog vector information.
1. Topic inheritance between windows: inheritance between topics shows up as similarity of topic content and is measured with a semantic similarity algorithm.
2. Topic identity between windows: a high semantic similarity between two topics does not by itself show that they form a trend of one changing topic. To avoid pairs that merely match semantically without actually describing the same topic, a weighted combination of similarities is used to measure the inheritance between topics. The algorithm combines the different ideas and perspectives of two similarity measures, cosine similarity and the Jaccard coefficient, avoiding the shortcomings of relying on any single method. It also keeps the similarity within the interval [0, 1], a larger value indicating higher similarity.
$$Sim_{inh}(T_1, T_2) = Sim_{cos}(T_1, T_2) \times \alpha + Sim_{jac}(T_1, T_2) \times \beta$$ Formula (6)
In the formula, $Sim_{cos}(T_1, T_2)$ and $Sim_{jac}(T_1, T_2)$ are the similarities of topics $T_1$ and $T_2$, in time window 1 and time window 2, under the cosine similarity and the Jaccard coefficient algorithm respectively. $\alpha$ and $\beta$ are weighting coefficients that reflect how much each of the two similarities contributes to the overall similarity.
Taking both the inheritance and the identity measures between topics into account gives the combined similarity used to judge the association between topics:
$$Sim_{com}(T_1, T_2) = Sim_{inh}(T_1, T_2) \times \lambda + Sim_{sem}(T_1, T_2) \times \mu$$ Formula (7)
where $Sim_{sem}(T_1, T_2)$ and $Sim_{inh}(T_1, T_2)$ are the measures of inheritance and identity between topics respectively, and $\lambda$ and $\mu$ are weighting coefficients.
3. Association analysis between window topics: window topics that have a temporal order and are associated are combined into one topic; through the changes in the content and intensity of the window topics, the topic is divided into several stages from emergence to extinction, which describes its evolution process.
In the association analysis, each window topic $T_i$ is compared with the topic $T_{i-1}$ of the preceding time window and with the topics, including newly generated ones, of the following time window. $Sim_{com}(T_i, T_{i+1}) < \varepsilon$ indicates that $T_i$ is an old topic that disappears, and $Sim_{com}(T_i, T_{i-1}) \geq \varepsilon$ indicates that the topic has been inherited. In this way the course of a topic from emergence to extinction is obtained. Applying the topic detection and tracking method to a microblog platform makes it possible to draw on collective wisdom and effort, to track hot topics quickly and update their popularity, and to make up for the shortcomings of traditional media in the real-time tracking and analysis of hot topics.
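Formulas (6) and (7) and the threshold test can be sketched in Python as follows. The representation of a topic as a dict mapping words to weights, the default coefficient values of 0.5, the threshold value and the stand-in semantic similarity are assumptions for illustration; the source does not specify the semantic similarity algorithm or the coefficient values.

```python
import math


def sim_cos(u, v):
    """Cosine similarity of two sparse topic vectors (dict word -> weight)."""
    dot = sum(w * v.get(word, 0.0) for word, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def sim_jac(u, v):
    """Jaccard coefficient of the word sets of two topics."""
    a, b = set(u), set(v)
    return len(a & b) / len(a | b) if a | b else 0.0


def sim_inh(u, v, alpha=0.5, beta=0.5):
    """Formula (6): weighted combination of cosine and Jaccard similarity."""
    return sim_cos(u, v) * alpha + sim_jac(u, v) * beta


def sim_com(u, v, sim_sem, lam=0.5, mu=0.5, alpha=0.5, beta=0.5):
    """Formula (7): combine Sim_inh with a caller-supplied semantic similarity."""
    return sim_inh(u, v, alpha, beta) * lam + sim_sem(u, v) * mu


def is_inherited(u, v, sim_sem, eps):
    """True when the combined similarity reaches the threshold epsilon."""
    return sim_com(u, v, sim_sem) >= eps


# Hypothetical topics from two consecutive windows.
t1 = {"rain": 1.0, "flood": 1.0}
t2 = {"rain": 1.0, "rescue": 1.0}
# Stand-in for the unspecified semantic similarity: reuse the Jaccard overlap.
combined = sim_com(t1, t2, sim_sem=sim_jac)
```

With non-negative weights and coefficient pairs that each sum to 1, as in the defaults here, the combined score stays in [0, 1] whenever the semantic similarity does.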

Claims (8)

1. A topic detection and tracking method based on microblog data, characterized in that it comprises the following steps:
Step 1: data preprocessing;
1. ignoring interactive messages that are directed conversations;
2. expanding the original microblog data;
3. formalizing the microblog text: segmenting the microblog text into words and removing stop words, low-frequency words and high-frequency words;
4. reducing sparsity: because microblog texts are short, clustering them on the basis of their term vectors;
Step 2: topic generation within time windows;
1. discretizing all preprocessed data, according to their time information, into the corresponding time windows t on the time series;
2. reducing sparsity: microblog data are mostly short sentences or even phrases, and because their content is sparse they are clustered on the basis of their term vectors;
3. for the microblog text cut into time slices, processing the text set in each time window in turn, using the LDA model for topic modeling, extracting several topics T, and obtaining the topic content and topic intensity for each; the number of topics generated in each window may be the same or different, and the number of topics N is determined dynamically by a model selection method according to the microblog text content in each time window;
4. because a topic that has already appeared may still appear with some probability in the following time windows, using the posterior probability of the word distribution in the historical time window as prior knowledge for topic mining in the current time window; a first-order discrete method based on non-conditional dependence is adopted: for the current time window t, the word distribution in time window t-1, together with a weight value w, serves as the prior of the word distribution in time window t;
Step 3: topic association analysis between time windows;
topic evolution mainly refers to how topics with the same semantics change over different time periods, and to the disappearance of old topics and the emergence of new ones; the topic associations between time windows are analyzed, including the inheritance and identity between topics, to obtain the evolution path of each topic; inheritance between topics is measured by semantic similarity, and identity is measured by similarity in the microblog vector information; window topics that have a temporal order and are associated are combined into one topic, and through the changes in the content and intensity of the window topics the topic is divided into several stages from emergence to extinction, which describes its evolution process and trend.
2. The topic detection and tracking method based on microblog data according to claim 1, characterized in that: in step 1 of the method, in the data preprocessing stage, interactive messages that are directed conversations are ignored; that is, microblog messages of the form "@username" are ignored, since such messages are mostly conversations directed at particular users and are less likely to describe a general topic.
3. The topic detection and tracking method based on microblog data according to claim 1, characterized in that: in step 1 of the method, the originally sparse microblog data are expanded; information is extracted from the embedded external links (URLs) contained in the microblog text and added to the microblog message, to support the description of the user's viewpoint; the extracted data are used in calculating the TF-IDF value improved for microblog features, which assigns different weights to the body text, comments and forwards in a microblog message.
4. The topic detection and tracking method based on microblog data according to claim 1, characterized in that: in step 1 of the method, the microblog data undergo sparsity reduction; microblog data are mostly short sentences or even phrases, and because their content is sparse they are clustered on the basis of their term vectors; in time window t, the term vectors $W_j$ in $S_t$ are clustered with the K-means algorithm; assuming the clustering yields K classes, the microblog data in each class are merged into a single document, giving K synthetic microblog documents $D_t$.
5. The topic detection and tracking method based on microblog data according to claim 1, characterized in that: in step 2 of the method, for the microblog text $D_t$ cut into time slices, the text set in each time window is processed in turn; the LDA (Latent Dirichlet Allocation) model proposed by D. M. Blei in 2003 is used for topic modeling, several topics T are extracted, and the topic content and topic intensity are obtained for each; the number of topics generated in each window may be the same or different, and the number of topics N is determined dynamically by a model selection method according to the microblog text content in each time window:
$$P(w \mid z) = \left(\frac{\Gamma(V\beta)}{\Gamma(\beta)^{V}}\right)^{N} \prod_{j=1}^{N} \frac{\prod_{w} \Gamma(f_j^{w} + \beta)}{\Gamma(f_j + V\beta)}.$$
6. The topic detection and tracking method based on microblog data according to claim 1, characterized in that: in step 3 of the method, the posterior probability of the previous time window is used to influence the prior probability of the current time window, to maintain continuity between topics; a first-order discrete method based on non-conditional dependence is applied: for the current time window t, the word distribution in time window t-1
[Formula image FDA00003567046900021 in the original; not reproduced in the text]
together with a weight value w, serves as the prior of the word distribution in time window t
[Formula image FDA00003567046900022 in the original; not reproduced in the text]
namely [formula image missing from the source text].
7. The topic detection and tracking method based on microblog data according to claim 1, characterized in that: in step 2 of the method, a weighted combination of similarities is used to measure the inheritance between topics; the method combines the different ideas and perspectives of two similarity measures, cosine similarity and the Jaccard coefficient, avoiding the shortcomings of relying on any single method, and keeps the similarity within the interval [0, 1], a larger value indicating higher similarity;
$$Sim_{inh}(T_1, T_2) = Sim_{cos}(T_1, T_2) \times \alpha + Sim_{jac}(T_1, T_2) \times \beta$$
in the formula, $Sim_{cos}(T_1, T_2)$ and $Sim_{jac}(T_1, T_2)$ are the similarities of topics $T_1$ and $T_2$, in time window 1 and time window 2, under the cosine similarity and the Jaccard coefficient algorithm respectively; $\alpha$ and $\beta$ are weighting coefficients that reflect how much each of the two similarities contributes to the overall similarity;
taking both the inheritance and the identity measures between topics into account gives the combined similarity used to judge the association between topics:
$$Sim_{com}(T_1, T_2) = Sim_{inh}(T_1, T_2) \times \lambda + Sim_{sem}(T_1, T_2) \times \mu$$
where $Sim_{sem}(T_1, T_2)$ and $Sim_{inh}(T_1, T_2)$ are the measures of inheritance and identity between topics respectively, and $\lambda$ and $\mu$ are weighting coefficients;
inheritance between topics shows up as similarity of topic content and is measured with a semantic similarity algorithm.
8. The topic detection and tracking method based on microblog data according to claim 1, characterized in that: in step 3 of the method, the association analysis between window topics combines window topics that have a temporal order and are associated into one topic; through the changes in the content and intensity of the window topics, the topic is divided into several stages from emergence to extinction, which describes its evolution process;
in the association analysis, each window topic $T_i$ is compared with the topic $T_{i-1}$ of the preceding time window and with the topics, including newly generated ones, of the following time window; $Sim_{com}(T_i, T_{i+1}) < \varepsilon$ indicates that $T_i$ is an old topic that disappears, and $Sim_{com}(T_i, T_{i-1}) \geq \varepsilon$ indicates that the topic has been inherited, whereby the course of a topic from emergence to extinction is obtained.
CN201310316316.7A 2013-07-25 2013-07-25 A kind of topic detection and tracking method based on microblog data Active CN103390051B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310316316.7A CN103390051B (en) 2013-07-25 2013-07-25 A kind of topic detection and tracking method based on microblog data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310316316.7A CN103390051B (en) 2013-07-25 2013-07-25 A kind of topic detection and tracking method based on microblog data

Publications (2)

Publication Number Publication Date
CN103390051A true CN103390051A (en) 2013-11-13
CN103390051B CN103390051B (en) 2016-07-20

Family

ID=49534323

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310316316.7A Active CN103390051B (en) 2013-07-25 2013-07-25 A kind of topic detection and tracking method based on microblog data

Country Status (1)

Country Link
CN (1) CN103390051B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120041953A1 (en) * 2010-08-16 2012-02-16 Microsoft Corporation Text mining of microblogs using latent topic labels
CN103116651A (en) * 2013-03-05 2013-05-22 南京理工大学常熟研究院有限公司 Public sentiment hot topic dynamic detection method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
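Hu Yanli et al.: "A topic evolution modeling and analysis method", Acta Automatica Sinica *
He Liang: "Research on topic evolution in scientific and technical literature", New Technology of Library and Information Service *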

Cited By (56)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103678277A (en) * 2013-12-04 2014-03-26 东软集团股份有限公司 Theme-vocabulary distribution establishing method and system based on document segmenting
CN103699611A (en) * 2013-12-16 2014-04-02 浙江大学 Microblog flow information extracting method based on dynamic digest technology
CN103699611B (en) * 2013-12-16 2017-01-11 浙江大学 Microblog flow information extracting method based on dynamic digest technology
CN104731811A (en) * 2013-12-20 2015-06-24 北京师范大学珠海分校 Cluster information evolution analysis method for large-scale dynamic short texts
CN104731811B (en) * 2013-12-20 2018-10-09 北京师范大学珠海分校 A kind of clustering information evolution analysis method towards extensive dynamic short text
CN103793478A (en) * 2014-01-14 2014-05-14 四川大学 Online theme modeling method on basis of theme heredity
CN103793478B (en) * 2014-01-14 2017-01-11 四川大学 Online theme modeling method on basis of theme heredity
CN103793501B (en) * 2014-01-20 2016-03-02 惠州学院 Based on the theme Combo discovering method of social networks
CN103793501A (en) * 2014-01-20 2014-05-14 惠州学院 Theme community discovery method based on social network
CN103970863B (en) * 2014-05-08 2017-12-19 清华大学 The method for digging and system of microblog users interest based on LDA topic models
CN103970863A (en) * 2014-05-08 2014-08-06 清华大学 Method and system for excavating interest of microblog users based on LDA theme model
CN103984731A (en) * 2014-05-19 2014-08-13 北京大学 Self-adaption topic tracing method and device under microblog environment
CN103984729A (en) * 2014-05-19 2014-08-13 北京大学 Microblog information tracing method and microblog information tracing method
CN103984731B (en) * 2014-05-19 2017-03-08 北京大学 Self adaptation topic tracking method and apparatus under microblogging environment
CN104281653A (en) * 2014-09-16 2015-01-14 南京弘数信息科技有限公司 Viewpoint mining method for ten million microblog texts
CN104281653B (en) * 2014-09-16 2018-07-27 南京弘数信息科技有限公司 A kind of opining mining method for millions scale microblogging text
CN105760410B (en) * 2015-04-15 2019-04-19 北京工业大学 A kind of microblogging semanteme expansion model and method based on forwarding comment
CN105760410A (en) * 2015-04-15 2016-07-13 北京工业大学 Model and method for expanding microblog semanteme based on forwarding and commenting
CN106294405A (en) * 2015-05-22 2017-01-04 国家计算机网络与信息安全管理中心 A kind of microblogging topic evolution analysis method and device
CN105138684B (en) * 2015-09-15 2018-12-14 联想(北京)有限公司 A kind of information processing method and information processing unit
CN105138684A (en) * 2015-09-15 2015-12-09 联想(北京)有限公司 Information processing method and device
CN105260358A (en) * 2015-10-14 2016-01-20 上海大学 Short text-oriented unexpected incident development process representation method
CN106599002B (en) * 2015-10-19 2020-06-05 北京国双科技有限公司 Topic evolution analysis method and device
CN106599002A (en) * 2015-10-19 2017-04-26 北京国双科技有限公司 Topic evolution analysis method and device
CN105354333A (en) * 2015-12-07 2016-02-24 天云融创数据科技(北京)有限公司 Topic extraction method based on news text
CN105354333B (en) * 2015-12-07 2018-11-06 天云融创数据科技(北京)有限公司 A kind of method for extracting topic based on newsletter archive
CN106055538A (en) * 2016-05-26 2016-10-26 达而观信息科技(上海)有限公司 Automatic extraction method for text labels in combination with theme model and semantic analyses
CN106055538B (en) * 2016-05-26 2019-03-08 达而观信息科技(上海)有限公司 The automatic abstracting method of the text label that topic model and semantic analysis combine
US10275444B2 (en) 2016-07-15 2019-04-30 At&T Intellectual Property I, L.P. Data analytics system and methods for text data
US10642932B2 (en) 2016-07-15 2020-05-05 At&T Intellectual Property I, L.P. Data analytics system and methods for text data
US11010548B2 (en) 2016-07-15 2021-05-18 At&T Intellectual Property I, L.P. Data analytics system and methods for text data
CN106354818A (en) * 2016-08-30 2017-01-25 电子科技大学 Dynamic user attribute extraction method based on social media
CN106354818B (en) * 2016-08-30 2020-01-10 电子科技大学 Social media-based dynamic user attribute extraction method
CN107918611A (en) * 2016-10-09 2018-04-17 郑州大学 A kind of model analyzed microblog topic and developed
CN106570088A (en) * 2016-10-20 2017-04-19 浙江大学 Discovering and evolution tracking method for scientific research document topics
CN106557551A (en) * 2016-10-27 2017-04-05 西南石油大学 Scale forecast method and system is propagated based on the microblogging that microblogging affair clustering is modeled
CN106570167A (en) * 2016-11-08 2017-04-19 南京理工大学 Knowledge-integrated subject model-based microblog topic detection method
CN106776503A (en) * 2016-12-22 2017-05-31 东软集团股份有限公司 The determination method and device of text semantic similarity
CN106776503B (en) * 2016-12-22 2020-03-10 东软集团股份有限公司 Text semantic similarity determination method and device
CN106649726A (en) * 2016-12-23 2017-05-10 中山大学 Association-topic evolution mining method in social network
CN106934014B (en) * 2017-03-10 2021-03-19 山东省科学院情报研究所 Hadoop-based network data mining and analyzing platform and method thereof
CN106934014A (en) * 2017-03-10 2017-07-07 山东省科学院情报研究所 A kind of network data excavation based on Hadoop and analysis platform and its method
CN107025299A (en) * 2017-04-24 2017-08-08 北京理工大学 A kind of financial public sentiment cognitive method based on weighting LDA topic models
CN107203513A (en) * 2017-06-06 2017-09-26 中国人民解放军国防科学技术大学 Microblogging text data fine granularity topic evolution analysis method based on probabilistic model
CN107835113B (en) * 2017-07-05 2020-09-08 中山大学 Method for detecting abnormal user in social network based on network mapping
CN107835113A (en) * 2017-07-05 2018-03-23 中山大学 Abnormal user detection method in a kind of social networks based on network mapping
CN108399162A (en) * 2018-03-21 2018-08-14 北京理工大学 The topic of phrase-based bag topic model finds method
CN108717421A (en) * 2018-04-23 2018-10-30 深圳市城市规划设计研究院有限公司 A kind of social media text subject extracting method and system based on change in time and space
CN108763208A (en) * 2018-05-22 2018-11-06 腾讯科技(上海)有限公司 Topic information acquisition methods, device, server and computer readable storage medium
CN109543110A (en) * 2018-11-28 2019-03-29 南京航空航天大学 A kind of microblog emotional analysis method and system
CN110059225A (en) * 2019-03-11 2019-07-26 北京奇艺世纪科技有限公司 Video classification methods, device, terminal device and storage medium
CN111125305A (en) * 2019-12-05 2020-05-08 东软集团股份有限公司 Hot topic determination method and device, storage medium and electronic equipment
CN111666268A (en) * 2020-05-20 2020-09-15 安徽火蓝数据有限公司 Microblog big data public opinion analysis method
CN112905751A (en) * 2021-03-19 2021-06-04 常熟理工学院 Topic evolution tracking method combining topic model and twin network model
CN112905751B (en) * 2021-03-19 2024-03-29 常熟理工学院 Topic evolution tracking method combining topic model and twin network model
CN113127643A (en) * 2021-05-11 2021-07-16 江南大学 Deep learning rumor detection method integrating microblog themes and comments

Also Published As

Publication number Publication date
CN103390051B (en) 2016-07-20

Similar Documents

Publication Publication Date Title
CN103390051B (en) Topic detection and tracking method based on microblog data
Choi et al. Emerging topic detection in twitter stream based on high utility pattern mining
CN103514183B (en) Information search method and system based on interactive document clustering
Li et al. Filtering out the noise in short text topic modeling
Hai et al. Identifying features in opinion mining via intrinsic and extrinsic domain relevance
Bellaachia et al. Ne-rank: A novel graph-based keyphrase extraction in twitter
Kang et al. Modeling user interest in social media using news media and wikipedia
CN102200975B (en) Vertical search engine system using semantic analysis
CN103544255A (en) Text semantic relativity based network public opinion information analysis method
CN103455562A (en) Text orientation analysis method and product review orientation discriminator on basis of same
CN104536956A (en) A Microblog platform based event visualization method and system
CN102609427A (en) Public opinion vertical search analysis system and method
CN104516961A (en) Topic digging and topic trend analysis method and system based on region
CN106202065A (en) A kind of across language topic detecting method and system
CN101609445A (en) Crucial sub-method for extracting topic based on temporal information
CN105183765A (en) Big data-based topic extraction method
Mahalakshmi et al. Summarization of text and image captioning in information retrieval using deep learning techniques
Kim et al. Effective fake news detection using graph and summarization techniques
Kotlerman et al. Clustering small-sized collections of short texts
Campbell et al. Content+ context networks for user classification in twitter
Zhao et al. Towards events detection from microblog messages
Arafat et al. Analyzing public emotion and predicting stock market using social media
Lampos Detecting events and patterns in large-scale user generated textual streams with statistical learning methods
Ruichen The Basic Principles of Marxism with the Internet as a Carrier
Zhao et al. Expanding approach to information retrieval using semantic similarity analysis based on WordNet and Wikipedia

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20131113

Assignee: Jiangsu Nanyou IOT Technology Park Ltd.

Assignor: Nanjing Post & Telecommunication Univ.

Contract record no.: 2016320000218

Denomination of invention: Topic detection and tracking method based on microblog data

Granted publication date: 20160720

License type: Common License

Record date: 20161118

LICC Enforcement, change and cancellation of record of contracts on the licence for exploitation of a patent or utility model
EC01 Cancellation of recordation of patent licensing contract

Assignee: Jiangsu Nanyou IOT Technology Park Ltd.

Assignor: Nanjing Post & Telecommunication Univ.

Contract record no.: 2016320000218

Date of cancellation: 20170706