CN103279483B - Method and system for assessing the popularity scope of topics on microblogs

Publication number: CN103279483B
Application number: CN201310143846.6A
Other versions: CN103279483A
Authority: CN (China)
Other languages: Chinese (zh)
Legal status: Active (as listed by Google Patents; an assumption, not a legal conclusion)
Prior art keywords: topic, new, message, community, feature
Inventors: 程学旗, 李静远, 李佳, 王元卓, 刘悦
Current assignee: Institute of Computing Technology of CAS (as listed; may be inaccurate)
Original assignee: Institute of Computing Technology of CAS
Application filed by Institute of Computing Technology of CAS, with priority to CN201310143846.6A; published as CN103279483A; granted and published as CN103279483B.

Abstract

The invention provides a method and system for assessing the popularity scope of topics on microblogs. The method comprises: S1, collecting historical data from a microblog platform, extracting multiple topics and multiple messages, merging the messages to obtain multiple merged messages, grouping the users who posted or forwarded the same merged message into a community to obtain multiple communities, classifying the topics based on the degree of overlap among the communities, and extracting the features of the topics in each class; S2, obtaining real-time data from the microblog platform, extracting a new topic and multiple new messages, merging the new messages to obtain multiple new merged messages, grouping the users who posted or forwarded the same new merged message into a new community to obtain multiple new communities, classifying the new topic based on the degree of overlap among the new communities, and extracting the new features of the new topic in its class; S3, matching the features against the new features to obtain a target topic, and assessing the popularity scope of the target topic.

Description

Method and system for assessing the popularity scope of topics on microblogs
Technical field
The present invention relates to the field of Internet information management, and in particular to a method and system for assessing the popularity scope of topics on microblogs.
Background
With the rapid development of the Internet, and of Web 2.0 in particular, social networking services such as Facebook, Myspace and Twitter have become indispensable communication media for network users. These services provide functions such as updates from friends and from people or groups of interest, and information about the latest popular events, and these functions are gradually changing how users of social networking services acquire information. Microblogs, represented by Twitter abroad and Sina Weibo in China, are a new kind of social network that differs considerably from virtual communities such as Facebook, which grew out of traditional communities. The differences lie mainly in the follow mechanism, the way messages propagate, and the real-time nature of messages. Unlike general social networks, microblogs use a one-way follow mechanism, so any user can freely follow anyone of interest. Messages propagate by broadcast: a message posted by a user is pushed to all of that user's followers. A microblog is a network service that combines the web with mobile terminals; it limits the length of what a user can send and places more emphasis on the timeliness of messages. Microblog users typically describe news and events and express their own views in short texts (generally no more than 140 characters).
These characteristics, which set microblogs apart from traditional social networks, make the volume of data updated in real time on microblog platforms very large, and within this massive information flow users have an ever more urgent need for information acquisition. First, because microblog posts are short texts, topic discovery differs from that on traditional blogs; how to discover and summarize topics effectively and assign posts to meaningful topics is a challenging problem, and the intrinsic links between topics are overlooked. Second, the users of a social network form implicit communities, yet current community discovery still has no corresponding direct application. In addition, there is as yet no research on the relationship between communities and topics. These shortcomings are also where the problems worth studying lie.
First, microblogs are topic-driven. The life cycle of a topic comprises the stages of emergence, development and derivation, and extinction. Because microblogs are real-time, users want to obtain relevant information as soon as a topic emerges, so that they can join the discussion of topics of interest earlier. There is not yet a clear scheme for topic discovery on microblog platforms. The platform limits the length of what users send in order to guarantee timeliness, but as a result users often cannot state things completely in a single message. This lack of information further increases the difficulty of discovering bursty topics.
Second, once topics have been discovered on a microblog platform, discovering the relationships among them is a neglected research problem. How to discover and express the relationships between topics, and how to use them to assess the future popularity of topics, are all challenging problems.
Third, meaningful communities need to be discovered on microblog platforms. The definition of a community is still disputed: one view holds that closely connected users form a community, another that users with the same interests and topics form a community. How communities relate to topics, how to represent that relationship, whether it is meaningful, and how to use it to assess the possible popularity scope of a topic all still lack relevant research.
Summary of the invention
The object of the invention is to assess the popularity scope of messages by combining topic and community relationships: using the relationships between topics and communities, between communities, and between topics, the possible popularity scope of a new topic can be assessed effectively and in real time.
To achieve the above object, the invention provides a method for assessing the popularity scope of topics on microblogs, the method comprising:
Step 1, collecting historical data from a microblog platform; extracting multiple topics, and the multiple messages corresponding to those topics, from the historical data; merging the messages according to a merge condition on L1, L2, Lcom and threshold to obtain multiple merged messages; grouping the users who posted or forwarded the same merged message into a community, thereby obtaining multiple communities; classifying the topics based on the degree of overlap among the communities; and extracting the features of the topics in each class;
Step 2, obtaining real-time data from the microblog platform; extracting a new topic, and the multiple new messages corresponding to the new topic, from the real-time data; merging the new messages according to the same merge condition to obtain multiple new merged messages; grouping the users who posted or forwarded the same new merged message into a new community, thereby obtaining multiple new communities; classifying the new topic based on the degree of overlap among the new communities; and extracting the new features of the new topic in its class;
Step 3, matching the features against the new features to obtain a target topic, and assessing the popularity scope of the target topic;
wherein L1 and L2 are the lengths of any two messages, Lcom is the number of words the two messages have in common, and threshold lies in the interval [0.3, 0.4].
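The merge test on two messages can be sketched as follows. The merge formula itself is not reproduced in this text, so the ratio used here, Lcom / min(L1, L2), is only an assumption chosen for illustration; the source supplies just the variables L1, L2 and Lcom and the threshold interval [0.3, 0.4]. The function names are hypothetical.

```python
def common_word_count(words_a, words_b):
    """Lcom: number of distinct words that the two messages share."""
    return len(set(words_a) & set(words_b))


def should_merge(msg_a, msg_b, threshold=0.35):
    """Decide whether two messages are similar enough to merge.

    ASSUMPTION: the ratio Lcom / min(L1, L2) is illustrative only. The source
    names L1, L2, Lcom and a threshold in [0.3, 0.4] but not the exact formula.
    """
    words_a = msg_a.lower().split()
    words_b = msg_b.lower().split()
    l1, l2 = len(words_a), len(words_b)  # message lengths, counted in words
    if min(l1, l2) == 0:
        return False
    l_com = common_word_count(words_a, words_b)
    return l_com / min(l1, l2) >= threshold


def merge(msg_a, msg_b):
    """Merged message: plain concatenation, so word co-occurrences accumulate."""
    return msg_a + " " + msg_b
```

Concatenation is used for the merged message because the stated purpose of merging is to raise word co-occurrence counts for short texts.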
After the merge operation in step 1 and step 2, the following processing is carried out:
LDA is run on the merged result to obtain topics, and the divergence between topics is computed as $D_{KL}(P \| Q) = \sum_i P(i) \ln \frac{P(i)}{Q(i)}$, where P and Q are two vectors giving, for the two topics respectively, the probabilities with which all the messages appear in the topic. Let the previous $D_{KL}$ be D_KL_old and the current $D_{KL}$ be D_KL_new. If D_KL_new > D_KL_old, the merged result is retained and merging continues; otherwise the merged result is discarded and merging continues.
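A minimal sketch of the divergence check follows. It takes the divergence named D_KL in the text to be the standard Kullback-Leibler divergence. How the per-pair values are combined into a single score is not stated in the source, so the averaging in `mean_pairwise_kl` is an assumption, as are all function names.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) = sum_i P(i) * ln(P(i) / Q(i)) for two probability vectors."""
    if len(p) != len(q):
        raise ValueError("P and Q must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0.0:  # terms with P(i) = 0 contribute nothing
            total += p_i * math.log(p_i / max(q_i, eps))  # eps guards Q(i) = 0
    return total


def mean_pairwise_kl(topic_vectors):
    """Average D_KL over all ordered pairs of distinct topics.

    ASSUMPTION: the source does not say how pairwise values are aggregated.
    """
    n = len(topic_vectors)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    if not pairs:
        return 0.0
    total = sum(kl_divergence(topic_vectors[i], topic_vectors[j]) for i, j in pairs)
    return total / len(pairs)


def keep_merge(d_kl_old, d_kl_new):
    """Retain a merge only if the topics became more distinct."""
    return d_kl_new > d_kl_old
```

A larger score means the topics are further apart, which is the direction the merge procedure tries to move in.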
The classification operation in step 1 and step 2 is specifically:
Topics whose communities satisfy an overlap condition on U1, U2 and Ucom are placed in the same class, where C1 and C2 are any two communities, U1 is the set of all users in C1, U2 is the set of all users in C2, and Ucom is the set of users common to U1 and U2.
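The classification by community overlap might look like the sketch below. The overlap formula is not reproduced in this text, so the ratio |Ucom| / min(|U1|, |U2|) and the 0.5 cut-off are assumptions; so is the choice to treat the relation as transitive by grouping topics with a union-find structure. All names are hypothetical.

```python
def overlap_ratio(u1, u2):
    """ASSUMED overlap measure: |Ucom| / min(|U1|, |U2|)."""
    if not u1 or not u2:
        return 0.0
    return len(u1 & u2) / min(len(u1), len(u2))


def classify_topics(topic_communities, min_overlap=0.5):
    """Group topics that have at least one pair of strongly overlapping communities.

    topic_communities maps a topic name to a list of user sets (its communities).
    Returns a sorted list of topic classes, each a sorted list of topic names.
    """
    topics = sorted(topic_communities)
    parent = {t: t for t in topics}

    def find(t):
        # Union-find lookup with path halving.
        while parent[t] != t:
            parent[t] = parent[parent[t]]
            t = parent[t]
        return t

    for i, a in enumerate(topics):
        for b in topics[i + 1:]:
            linked = any(
                overlap_ratio(c1, c2) >= min_overlap
                for c1 in topic_communities[a]
                for c2 in topic_communities[b]
            )
            if linked:
                parent[find(a)] = find(b)

    classes = {}
    for t in topics:
        classes.setdefault(find(t), []).append(t)
    return sorted(classes.values())
```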
To achieve the above object, the invention also provides a system for assessing the popularity scope of topics on microblogs, the system comprising:
a historical data processing unit, configured to collect historical data from a microblog platform; extract multiple topics, and the multiple messages corresponding to those topics, from the historical data; merge the messages according to a merge condition on L1, L2, Lcom and threshold to obtain multiple merged messages; group the users who posted or forwarded the same merged message into a community, thereby obtaining multiple communities; classify the topics based on the degree of overlap among the communities; and extract the features of the topics in each class;
a real-time data processing unit, configured to obtain real-time data from the microblog platform; extract a new topic, and the multiple new messages corresponding to the new topic, from the real-time data; merge the new messages according to the same merge condition to obtain multiple new merged messages; group the users who posted or forwarded the same new merged message into a new community, thereby obtaining multiple new communities; classify the new topic based on the degree of overlap among the new communities; and extract the new features of the new topic in its class;
a topic scope assessment unit, configured to match the features against the new features to obtain a target topic, and to assess the popularity scope of the target topic;
wherein L1 and L2 are the lengths of any two messages, Lcom is the number of words the two messages have in common, and threshold lies in the interval [0.3, 0.4].
After the merge operation in the historical data processing unit and the real-time data processing unit, the following processing is carried out:
LDA is run on the merged result to obtain topics, and the divergence between topics is computed as $D_{KL}(P \| Q) = \sum_i P(i) \ln \frac{P(i)}{Q(i)}$, where P and Q are two vectors giving, for the two topics respectively, the probabilities with which all the messages appear in the topic. Let the previous $D_{KL}$ be D_KL_old and the current $D_{KL}$ be D_KL_new. If D_KL_new > D_KL_old, the merged result is retained and merging continues; otherwise the merged result is discarded and merging continues.
The classification operation in the historical data processing unit and the real-time data processing unit is specifically:
Topics whose communities satisfy an overlap condition on U1, U2 and Ucom are placed in the same class, where C1 and C2 are any two communities, U1 is the set of all users in C1, U2 is the set of all users in C2, and Ucom is the set of users common to U1 and U2.
The beneficial effects of the invention are:
1. To address the short texts typical of microblogs, the invention proposes a modification to LDA in which the data are merged; after merging, the LDA model finds more meaningful topics.
2. The invention uses topics to obtain different user groups: for each topic, community discovery is carried out not on all users but only on the users interested in that topic.
3. The invention uses community information to classify topics, which yields topic classes better suited to assessing topic propagation, and uses the correspondence between communities and topics to assess the popularity scope of topics effectively.
The invention is described below with reference to the drawings and specific embodiments, which are not to be taken as limiting the invention.
Brief description of the drawings
Fig. 1 is a flowchart of the microblog topic popularity scope assessment method of the invention;
Fig. 2 is a schematic diagram of the microblog topic popularity scope assessment system of the invention;
Fig. 3 is a schematic diagram of the microblog topic popularity scope assessment system according to one embodiment of the invention;
Fig. 4 is a flowchart of the preprocessing for topic discovery and feature extraction according to one embodiment of the invention;
Fig. 5 is a flowchart of the new-topic popularity scope assessment method according to one embodiment of the invention;
Fig. 6 is a diagram of the LDA model used in the invention;
Fig. 7 is a flowchart of the topic discovery module of the invention.
Detailed description
Fig. 1 is a flowchart of the microblog topic popularity scope assessment method of the invention. As shown in Fig. 1, the method comprises:
S1, collecting historical data from a microblog platform; extracting multiple topics, and the multiple messages corresponding to those topics, from the historical data; merging the messages according to a merge condition on L1, L2, Lcom and threshold to obtain multiple merged messages; grouping the users who posted or forwarded the same merged message into a community, thereby obtaining multiple communities; classifying the topics based on the degree of overlap among the communities; and extracting the features of the topics in each class;
S2, obtaining real-time data from the microblog platform; extracting a new topic, and the multiple new messages corresponding to the new topic, from the real-time data; merging the new messages according to the same merge condition to obtain multiple new merged messages; grouping the users who posted or forwarded the same new merged message into a new community, thereby obtaining multiple new communities; classifying the new topic based on the degree of overlap among the new communities; and extracting the new features of the new topic in its class;
S3, matching the features against the new features to obtain a target topic, and assessing the popularity scope of the target topic;
wherein L1 and L2 are the lengths of any two messages, Lcom is the number of words the two messages have in common, and threshold lies in the interval [0.3, 0.4].
After the merge operation in S1 and S2, the following processing is carried out:
LDA is run on the merged result to obtain topics, and the divergence between topics is computed as $D_{KL}(P \| Q) = \sum_i P(i) \ln \frac{P(i)}{Q(i)}$, where P and Q are two vectors giving, for the two topics respectively, the probabilities with which all the messages appear in the topic. Let the previous $D_{KL}$ be D_KL_old and the current $D_{KL}$ be D_KL_new. If D_KL_new > D_KL_old, the merged result is retained and merging continues; otherwise the merged result is discarded and merging continues.
The classification operation in S1 and S2 is specifically:
Topics whose communities satisfy an overlap condition on U1, U2 and Ucom are placed in the same class, where C1 and C2 are any two communities, U1 is the set of all users in C1, U2 is the set of all users in C2, and Ucom is the set of users common to U1 and U2.
Fig. 2 is a schematic diagram of the microblog topic popularity scope assessment system of the invention. As shown in Fig. 2, the system comprises:
a historical data processing unit 10, configured to collect historical data from a microblog platform; extract multiple topics, and the multiple messages corresponding to those topics, from the historical data; merge the messages according to a merge condition on L1, L2, Lcom and threshold to obtain multiple merged messages; group the users who posted or forwarded the same merged message into a community, thereby obtaining multiple communities; classify the topics based on the degree of overlap among the communities; and extract the features of the topics in each class;
a real-time data processing unit 20, configured to obtain real-time data from the microblog platform; extract a new topic, and the multiple new messages corresponding to the new topic, from the real-time data; merge the new messages according to the same merge condition to obtain multiple new merged messages; group the users who posted or forwarded the same new merged message into a new community, thereby obtaining multiple new communities; classify the new topic based on the degree of overlap among the new communities; and extract the new features of the new topic in its class;
a topic scope assessment unit 30, configured to match the features against the new features to obtain a target topic, and to assess the popularity scope of the target topic;
wherein L1 and L2 are the lengths of any two messages, Lcom is the number of words the two messages have in common, and threshold lies in the interval [0.3, 0.4].
After the merge operation in the historical data processing unit 10 and the real-time data processing unit 20, the following processing is carried out:
LDA is run on the merged result to obtain topics, and the divergence between topics is computed as $D_{KL}(P \| Q) = \sum_i P(i) \ln \frac{P(i)}{Q(i)}$, where P and Q are two vectors giving, for the two topics respectively, the probabilities with which all the messages appear in the topic. Let the previous $D_{KL}$ be D_KL_old and the current $D_{KL}$ be D_KL_new. If D_KL_new > D_KL_old, the merged result is retained and merging continues; otherwise the merged result is discarded and merging continues.
The classification operation in the historical data processing unit 10 and the real-time data processing unit 20 is specifically:
Topics whose communities satisfy an overlap condition on U1, U2 and Ucom are placed in the same class, where C1 and C2 are any two communities, U1 is the set of all users in C1, U2 is the set of all users in C2, and Ucom is the set of users common to U1 and U2.
An embodiment of the invention is now described. In the following embodiment, the method is explained for a microblog environment that provides basic functions. These comprise user functions and message functions. User functions comprise following and being followed. Message functions comprise sending, commenting and forwarding.
One embodiment of the invention provides a system for assessing the popularity scope of microblog topics. From the collected data, the system uses a suitable model to discover all topics within a period of time. After topic discovery, for each topic it extracts all users involved in that topic and applies a suitable model to perform community discovery on them. After community discovery, topics are classified according to the overlap of their communities, and features are extracted for each topic class. When a new topic appears, features are extracted for it and matched against the topic classes. The scope over which the topic may become popular is assessed from the matched class. The system comprises a microblog data acquisition module, a topic class discovery and feature extraction module, a new-topic popularity scope assessment module, and a data storage and collection module.
The topic discovery module discovers topics in the existing historical data. The historical data mainly comprise user data, which include the personal information of microblog users, their friend (follow) relationships, and the messages and comments they sent or forwarded within a given time interval: for example, a user's basic information, the user's friend relationships, the number of messages the user sent, forwarded and commented on, and the number of times the user's messages were forwarded and commented on within the collection period. The collected data can be stored on a log server. The raw data are usually collected with a web crawler or through a third-party API offered by the service provider. The model used for topic discovery on microblogs is an improvement on LDA. LDA is a topic model from machine learning that identifies the topic information hidden in large document collections by using the co-occurrence of words. The main problem with LDA on microblogs is that posts are short (within 140 characters), which greatly reduces the number of word co-occurrences. The invention proposes a merging approach that increases word co-occurrence and improves LDA results on short texts.
In the topic classification module, the invention proposes classifying topics according to which users follow them. The idea relating topics and communities is this: when the same group of people pays attention to different topics, those topics share some intrinsic connection and attributes, and topics with similar attributes are likely to attract the same group of people. For each topic, community analysis is carried out on the users involved in that topic rather than on all users. From the degree of overlap between communities under different topics, topic classes that have practical value for propagation can be found.
The topic class feature extraction module extracts features for each topic class and saves them in a feature database. After topic classification, the features of each class are extracted, such as the category the topics belong to and the setting of the events the topics concern.
The new-topic popularity scope assessment module handles newly emerging topics. After a new topic has existed for some time, its features are extracted and matched by similarity, using cosine similarity, against the features produced by the topic class discovery and feature extraction module. Matching yields the class to which the new topic probably belongs, and the popularity scope of the new topic is assessed from the scope over which topics of that class were popular in the past. As the topic spreads, more information about it becomes available; after further feature extraction, the estimated scope is revised.
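The cosine-similarity matching step can be sketched as follows. The text does not specify how topic features are encoded, so representing each topic class as a plain numeric vector is an assumption, and the function and class names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """cos(a, b) = (a . b) / (|a| * |b|); returns 0.0 if either vector is all zeros."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_matching_class(new_features, class_features):
    """Return (class_name, similarity) for the stored class most similar to the new topic.

    class_features maps a topic class name to its stored feature vector.
    """
    if not class_features:
        raise ValueError("no stored topic classes to match against")
    scored = (
        (name, cosine_similarity(new_features, vec))
        for name, vec in class_features.items()
    )
    return max(scored, key=lambda pair: pair[1])
```

Cosine similarity ignores vector magnitude, so a new topic with few messages can still match a class whose stored features were built from many.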
Because microblog platform data are time-sensitive and remain valid only briefly, the system must be able to use newly collected data adaptively for feature extraction and subsequent model training, which improves its stability; in other words, the system should update its model adaptively. In the invention, the data collected by the data acquisition module are saved in the data storage module, after which the features can be updated offline, completing the iterative model update.
Fig. 3 is a schematic diagram of the microblog topic popularity scope assessment system according to one embodiment of the invention. As shown in Fig. 3, the method first discovers topics in the historical data (S101). Next, for each discovered topic, it obtains the users who follow the topic and assigns topics with high user overlap to the same class (S102). Using the communities of users, it finds topic groups that are valuable for propagation, extracts features for each topic group (S103), and saves the features of each topic. It then assesses the popularity scope of new topics on the data stream collected in real time (S104). The data features include: 1) account registration time and most recent login time; 2) the number of friends followed and the number of followers; 3) the number of messages sent, forwarded and commented on; 4) the number of times the messages sent were commented on and forwarded; and so on. The features are updated continually while the system runs.
Fig. 4 is a flowchart of the preprocessing for topic discovery and feature extraction according to one embodiment of the invention. As shown in Fig. 4, the method first selects a suitable topic discovery method according to the characteristics of short microblog texts (S201). Unlike methods for long blog posts, it must work on short texts, and the number of topics is not known in advance. Candidate models include the LDA model from machine learning, improved for short texts. The method uses historical data to aggregate topics. Community discovery is then carried out for each topic found in the previous step (S202): all users involved in a topic are first obtained, and among these users, those connected to one another are placed in one community. This step yields a division into multiple communities under each topic. Topics are then classified according to the degree of overlap of users between communities (S203). After classification, features are extracted for each topic class (S204); the features include the category of the topic and the time and place of the events the topic concerns.
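Step S202 can be sketched as a connected-components search. Reading "connected to one another" as membership in the same connected component of the follow graph, restricted to the topic's users and with edge direction ignored, is an assumption; the text does not name a specific community discovery algorithm. All names are hypothetical.

```python
from collections import defaultdict, deque


def discover_communities(topic_users, follow_edges):
    """Split the users of one topic into communities of interconnected users.

    topic_users: iterable of user ids involved in the topic.
    follow_edges: iterable of (follower, followee) pairs.
    ASSUMPTION: a community is a connected component of the follow graph
    restricted to topic users, treating follow edges as undirected.
    """
    users = set(topic_users)
    neighbours = defaultdict(set)
    for follower, followee in follow_edges:
        if follower in users and followee in users:
            neighbours[follower].add(followee)
            neighbours[followee].add(follower)

    communities = []
    seen = set()
    for start in sorted(users):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = set()
        while queue:  # breadth-first search from an unvisited user
            user = queue.popleft()
            component.add(user)
            for other in neighbours[user] - seen:
                seen.add(other)
                queue.append(other)
        communities.append(component)
    return communities
```

Edges that touch users outside the topic are dropped first, which reflects the text's point that community analysis is done per topic rather than over all users.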
Fig. 5 is a flowchart of the new-topic popularity scope assessment method according to one embodiment of the invention. As shown in Fig. 5, the method first initializes the system (S301), which includes clearing the previously estimated popularity scopes of messages and persisting any data held in the cache to the database. Because the system runs on a real-time data stream, initialization is essential; without it, data contamination could degrade the method's results. After initialization, the system starts processing the real-time data stream obtained by the microblog data acquisition module (S302) and extracts topic features from the data collected in real time (S303); the features used in this step should be the same as those used in S204. The extracted features are then matched against the topic group features obtained in S204, the most similar topic group is selected, and the popularity scope of the topic is assessed according to the users among whom that topic group was previously popular (S305). After the assessment, as the topic spreads further, more features of the topic become available and the estimated scope can be revised. If the topic has reached the extinction stage, processing ends. This completes the topic popularity scope assessment method based on topic and community relationships on a microblog platform. The method is incorporated into the system, the features of topic classes are saved, and as time passes more topic classes are obtained and used to assess the popularity scope of topics.
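One way to picture the assessment in S305 is the sketch below. The text says only that the scope is assessed from the users among whom the matched topic group was previously popular; estimating the scope as the union of those users and the users already involved is an assumption, and the function name and result keys are hypothetical.

```python
def assess_scope(matched_class_users, current_users):
    """Estimate the set of users a new topic may reach.

    matched_class_users: users who took part in past topics of the matched class.
    current_users: users who have already posted or forwarded the new topic.
    ASSUMPTION: the estimate is simply the union of both sets.
    """
    past = set(matched_class_users)
    current = set(current_users)
    estimated = past | current
    return {
        "estimated_users": estimated,
        "estimated_size": len(estimated),
        "not_yet_reached": estimated - current,
    }
```

Calling the function again with a larger `current_users` set as the topic spreads corresponds to the revision step described above.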
Fig. 6 is a diagram of the LDA model used in the invention, and Fig. 7 is a flowchart of the topic discovery module of the invention. As shown in Figs. 6 and 7:
First, at S501, the system is cleared. Similar messages are then merged according to the rule described above (S502). On the merged data, topic discovery with the LDA model is carried out (S503). Then, at S504, the KL divergence between the discovered topics is computed in order to judge how similar the topics are to one another: the differences between topics should grow, since close topics should belong to the same topic. If the KL divergence increases (S505), merging continues until the KL divergence can no longer be increased, at which point the algorithm ends.
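The loop from S502 to S505 can be sketched as a greedy procedure. To keep the example self-contained, the LDA fit and divergence computation are passed in as a hypothetical `score_topics` callback rather than tied to any particular library; the single pass over candidate pairs, in which non-improving merges are discarded, is an assumption about the control flow.

```python
def greedy_merge(messages, candidate_pairs, score_topics):
    """Merge similar messages, keeping a merge only if the topic score increases.

    messages: list of message strings.
    candidate_pairs: (i, j) index pairs already judged similar enough to merge.
    score_topics: callable taking a list of documents and returning a float,
        e.g. fit LDA and return the divergence between topics (hypothetical hook).
    Returns (documents_after_merging, final_score).
    """
    docs = dict(enumerate(messages))
    owner = {i: i for i in docs}  # which document currently holds message i
    best = score_topics(list(docs.values()))
    for i, j in candidate_pairs:
        a, b = owner[i], owner[j]
        if a == b:
            continue  # already merged into the same document
        trial = dict(docs)
        trial[a] = docs[a] + " " + docs[b]
        del trial[b]
        score = score_topics(list(trial.values()))
        if score > best:  # D_KL_new > D_KL_old: retain the merge
            docs, best = trial, score
            for k in owner:
                if owner[k] == b:
                    owner[k] = a
        # otherwise the trial is discarded and the next pair is tried
    return list(docs.values()), best
```

Refitting the topic model after every trial merge is expensive, so a practical version would probably evaluate merges in batches.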
The method and system provided by the invention are applicable to network services with microblog characteristics, such as Twitter, Sina Weibo and Tencent Weibo.
The method is illustrated below with a small concrete example. First, the improvement to the topic discovery method in step 1 is described. We selected 50 microblog posts covering 5 topics to illustrate the method; the five topics are film, health, study, games and microblogging. LDA results are usually presented as the most probable words of each topic. Below are the five topics learned before the improvement to LDA was applied.
for topic 1: game awesome farm love town its fucking addictive lol games
for topic 2: inception movie year night studying amazing easily yesterday cool brilliant
for topic 3: tweet shopper account clarify meant looked recent flannel maggie season
for topic 4: game crispy healthcare listen game comics williams comic backward aaron
for topic 5: twitter facebook myspace text tweet people nope youtube late messaging
The topics above are poorly distinguished: for example, topic 3 and topic 5 both relate to microblogging, and topic 2 and topic 4 both relate to games, so the five topics are not well separated. We then merge messages. For example, "inception was easily the best movie i have seen." and "inception is the best movie of the year, so far." share many words, so they are merged. After a series of similar merges, the result is as follows.
for topic 1: game crispy team addictive lol rule love reasons bad town
for topic 2: twitter facebook myspace text account tweets late show tweet messaging
for topic 3: studying class sit today bio crib laying call low hours school night. bout iam dolley fierce life supposed attention pay ive
for topic 4: healthcare star warn flu hospital buddy publicity ill mask hoosiers
for topic 5: inception movie year night great watch yesterday cool enjoyed awesome
It can be seen that the topics are now more clearly distinguished, and they correspond broadly to the five topics listed above.
Of course, the invention may have various other embodiments. Without departing from the spirit and essence of the invention, those of ordinary skill in the art may make various corresponding changes and modifications according to the invention, but all such changes and modifications shall fall within the scope of protection of the claims appended to the invention.

Claims (6)

1. A topic prevalence range assessment method for micro-blogs, characterized in that it comprises:
Step 1: collecting historical data from a micro-blog platform; extracting multiple topics, and the multiple messages corresponding to those topics, from the historical data; performing a merge operation on the messages according to the merge condition (formula not reproduced in this text) to obtain multiple merged messages; grouping the users who published or forwarded the same merged message into a community, thereby obtaining multiple communities; classifying the topics based on the degree of overlap among the communities; and extracting the features of the topics in the same class;
Step 2: obtaining real-time data from the micro-blog platform; extracting a new topic, and the multiple new messages corresponding to the new topic, from the real-time data; performing a merge operation on the new messages according to the same merge condition to obtain multiple new merged messages; grouping the users who published or forwarded the same new merged message into a new community, thereby obtaining multiple new communities; classifying the new topics based on the degree of overlap among the new communities; and extracting the new features of the new topics in the same class;
Step 3: matching the features against the new features to obtain a target topic, and assessing the prevalence range of the target topic;
wherein L1 and L2 are the lengths of any two messages, Lcom is the number of words the two messages have in common, and the threshold lies in the interval [0.3, 0.4].
2. The topic prevalence range assessment method of claim 1, characterized in that the following processing is performed after the merge operation in Step 1 and Step 2:
running LDA machine learning on the merge result to obtain topics; computing the difference between topics using $D_{KL}(P \parallel Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)}$, where P and Q are two vectors giving the probabilities with which all corresponding messages appear in the respective topics; letting the previous $D_{KL}$ be D_KL_old and the current $D_{KL}$ be D_KL_new; if D_KL_new > D_KL_old, keeping the merge result and continuing with a new merge operation; otherwise discarding the merge result and continuing with a new merge operation.
3. The topic prevalence range assessment method of claim 1, characterized in that the classification operation in Step 1 and Step 2 is specifically:
classifying the topics under any two communities that satisfy the overlap condition (formula not reproduced in this text) into the same class, where U1 is the set of all users of one of the two communities, U2 is the set of all users of the other community, and Ucom is the set of users common to U1 and U2.
4. A topic prevalence range assessment system for micro-blogs, characterized in that it comprises:
a historical data processing unit, which collects historical data from a micro-blog platform; extracts multiple topics, and the multiple messages corresponding to those topics, from the historical data; performs a merge operation on the messages according to the merge condition (formula not reproduced in this text) to obtain multiple merged messages; groups the users who published or forwarded the same merged message into a community, thereby obtaining multiple communities; classifies the topics based on the degree of overlap among the communities; and extracts the features of the topics in the same class;
a real-time data processing unit, which obtains real-time data from the micro-blog platform; extracts a new topic, and the multiple new messages corresponding to the new topic, from the real-time data; performs a merge operation on the new messages according to the same merge condition to obtain multiple new merged messages; groups the users who published or forwarded the same new merged message into a new community, thereby obtaining multiple new communities; classifies the new topics based on the degree of overlap among the new communities; and extracts the new features of the new topics in the same class;
a topic range assessment unit, which matches the features against the new features to obtain a target topic and assesses the prevalence range of the target topic;
wherein L1 and L2 are the lengths of any two messages, Lcom is the number of words the two messages have in common, and the threshold lies in the interval [0.3, 0.4].
5. The topic prevalence range assessment system of claim 4, characterized in that the following processing is performed after the merge operation in the historical data processing unit and the real-time data processing unit:
running LDA machine learning on the merge result to obtain topics; computing the difference between topics using $D_{KL}(P \parallel Q) = \sum_i P(i) \log \frac{P(i)}{Q(i)}$, where P and Q are two vectors giving the probabilities with which all corresponding messages appear in the respective topics; letting the previous $D_{KL}$ be D_KL_old and the current $D_{KL}$ be D_KL_new; if D_KL_new > D_KL_old, keeping the merge result and continuing with a new merge operation; otherwise discarding the merge result and continuing with a new merge operation.
6. The topic prevalence range assessment system of claim 4, characterized in that the classification operation in the historical data processing unit and the real-time data processing unit is specifically:
classifying the topics under any two communities that satisfy the overlap condition (formula not reproduced in this text) into the same class, where U1 is the set of all users of one of the two communities, U2 is the set of all users of the other community, and Ucom is the set of users common to U1 and U2.
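The acceptance test of claims 2 and 5 and the community-based classification of claims 3 and 6 can be sketched as follows. This is a minimal illustration, not the patented implementation. Several points are assumptions because the corresponding formulas are not reproduced in this text: the mean-over-pairs aggregation in `topic_separation`, the overlap ratio |Ucom| / min(|U1|, |U2|), the 0.5 overlap threshold, the simplification that each topic is represented by a single community of user IDs, and all function names.

```python
import math
from itertools import combinations, permutations


def kl_divergence(p, q, eps=1e-12):
    """D_KL(P || Q) = sum_i P(i) * log(P(i) / Q(i)); eps guards zero entries."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def topic_separation(topic_vectors):
    """Assumed aggregate: mean KL divergence over all ordered topic pairs."""
    pairs = list(permutations(topic_vectors, 2))
    if not pairs:
        return 0.0
    return sum(kl_divergence(p, q) for p, q in pairs) / len(pairs)


def keep_merge(d_kl_old, d_kl_new):
    """Keep a merge only if it increased the difference between topics."""
    return d_kl_new > d_kl_old


def community_overlap(u1, u2):
    """Assumed overlap ratio: |Ucom| / min(|U1|, |U2|)."""
    if not u1 or not u2:
        return 0.0
    return len(u1 & u2) / min(len(u1), len(u2))


def classify_topics(communities, threshold=0.5):
    """Group topics whose user communities overlap above the threshold.

    `communities` maps a topic name to its set of user IDs (a hypothetical
    simplification). Overlapping topics are linked with a small union-find.
    """
    parent = {topic: topic for topic in communities}

    def find(topic):
        while parent[topic] != topic:
            parent[topic] = parent[parent[topic]]  # path halving
            topic = parent[topic]
        return topic

    for a, b in combinations(communities, 2):
        if community_overlap(communities[a], communities[b]) > threshold:
            parent[find(a)] = find(b)

    groups = {}
    for topic in communities:
        groups.setdefault(find(topic), set()).add(topic)
    return list(groups.values())
```

In this sketch a merge would be evaluated by running the topic model before and after it, computing `topic_separation` on each set of topic vectors, and passing the two values to `keep_merge`. Linking topics transitively through union-find is a design choice of the sketch; the claims only state the pairwise condition.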

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310143846.6A CN103279483B (en) 2013-04-23 2013-04-23 A kind of topic Epidemic Scope appraisal procedure towards micro-blog and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310143846.6A CN103279483B (en) 2013-04-23 2013-04-23 A kind of topic Epidemic Scope appraisal procedure towards micro-blog and system

Publications (2)

Publication Number Publication Date
CN103279483A (en) 2013-09-04
CN103279483B (en) 2016-04-13

Family

ID=49062003

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310143846.6A Active CN103279483B (en) 2013-04-23 2013-04-23 A kind of topic Epidemic Scope appraisal procedure towards micro-blog and system

Country Status (1)

Country Link
CN (1) CN103279483B (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105227425B (en) * 2014-05-26 2019-11-15 腾讯科技(北京)有限公司 Method, equipment and the network social intercourse system of syndication message
CN104111971B (en) * 2014-06-09 2018-03-13 合肥工业大学 Passing microblog data is collected and processing method
CN104834632B (en) * 2015-05-13 2017-09-29 北京工业大学 A kind of microblog topic detection expanded based on semanteme and temperature appraisal procedure
WO2017197566A1 (en) * 2016-05-16 2017-11-23 华为技术有限公司 Method, device, and system for journal displaying
CN107391705B (en) * 2017-07-28 2020-05-12 岳小玲 Network viewpoint propagation and prediction method
CN111694955B (en) * 2020-05-08 2023-09-12 中国科学院计算技术研究所 Early dispute message detection method and system for social platform

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102622443A (en) * 2012-03-13 2012-08-01 北京邮电大学 Customized screening system and method for microblog
CN102801657A (en) * 2012-09-03 2012-11-28 鲁赤兵 Composite microblog system and method
CN103023714A (en) * 2012-11-21 2013-04-03 上海交通大学 Activeness and cluster structure analyzing system and method based on network topics

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7613690B2 (en) * 2005-10-21 2009-11-03 Aol Llc Real time query trends with multi-document summarization

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
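Development and Research of Topic Detection and Tracking Technology; Luo Weihua et al.; Language Computing and Content-Based Text Processing: Proceedings of the 7th National Joint Conference on Computational Linguistics; 20030801; 560-566 *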

Also Published As

Publication number Publication date
CN103279483A (en) 2013-09-04

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
EE01 Entry into force of recordation of patent licensing contract

Application publication date: 20130904

Assignee: Branch DNT data Polytron Technologies Inc

Assignor: Institute of Computing Technology, Chinese Academy of Sciences

Contract record no.: 2018110000033

Denomination of invention: Topic prevalence range assessment method and system facing micro-blogs

Granted publication date: 20160413

License type: Common License

Record date: 20180807
