CN104298765B - Dynamic identification and tracking method for Internet public sentiment topics - Google Patents

Dynamic identification and tracking method for Internet public sentiment topics

Info

Publication number
CN104298765B
Authority
CN
China
Prior art keywords
topic
topics
public sentiment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201410574419.8A
Other languages
Chinese (zh)
Other versions
CN104298765A (en
Inventor
陈海汉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fuzhou University
Original Assignee
Fuzhou University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fuzhou University filed Critical Fuzhou University
Priority to CN201410574419.8A priority Critical patent/CN104298765B/en
Publication of CN104298765A publication Critical patent/CN104298765A/en
Application granted granted Critical
Publication of CN104298765B publication Critical patent/CN104298765B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/951Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention relates to a dynamic identification and tracking method for Internet public sentiment topics, comprising the following steps: 1. abstracting public sentiment topics into nodes, where a connecting arc between nodes indicates an association between the topics and the weight of the arc represents their degree of correlation; 2. assigning each public sentiment topic to its time slice according to its publication time, and constructing a dynamic evolution model of Internet public sentiment topics consisting of a topic information layer, a web page information layer and a netizen information layer; 3. performing feature extraction on newly arrived web pages related to the public sentiment topics to obtain feature items, converting each web page into a multivariate vector space formed by the feature items, and calculating its topic relevance to the original public sentiment topics; 4. using incremental clustering to process the newly arrived web pages in sequence, identifying new topics, and expanding and updating the tracked new public sentiment topics into the model. The method helps overcome topic drift and topic derivation during topic evolution and improves the tracking of Internet public sentiment topics.

Description

Dynamic identification and tracking method for Internet public sentiment topics
Technical Field
The invention relates to the technical field of Internet public sentiment, in particular to a dynamic identification and tracking method for Internet public sentiment topics.
Background
Network public opinion is the set of cognitions, attitudes, emotions and behavioral tendencies that the public expresses on the Internet toward a given event. Topic derivation is a main characteristic of how network public sentiment propagates and evolves. It is especially visible in the decline period of public sentiment: as netizens' attention shifts, the interests, appeals and needs tied to the original public sentiment topic fade, the original topic loses vitality and is replaced by newly derived topics, and the public sentiment exerts a secondary influence on society. Derived topics and original topics interweave into a dynamic derivation network, which prolongs the life cycle of the original event, lengthens its decline period, and makes the emergency harder to handle. Sometimes the social influence of a derived topic far exceeds that of the original event and causes great loss to the social environment. Tracking public sentiment topics is therefore very important: it helps in understanding how an event is developing, in preventing unbounded derivation and spread of the event, and in providing decision support for the emergency management of emergency events.
Research on topic identification and tracking methods falls mainly into three categories. The first is based on keyword matching and does not consider semantic correlation between topics; to take the semantic information of the text into account, latent semantic analysis is introduced to model the corpus, and the topics receiving the most attention on the network are found through a two-stage clustering strategy. The second discretizes time into time points and uses constraints on those time points to handle dynamic topic tracking over continuous time. The third extracts hot network topics with an LDA model and finds hot topics using time tags. Because Internet public sentiment is derivative and dynamic, it shows complex evolution characteristics, yet the topic models built by earlier researchers mostly describe the structured text data of a topic and cannot describe its dynamic changes. In fact, besides structured text, a public sentiment topic also carries other information such as web page links and associations between the publishers (i.e., users) of the topic, and the temporal ordering of topics is an important basis for describing how they evolve. Existing topic identification and tracking methods lack an effective description of the dynamic process and microstructure of topic evolution, so they cannot adequately reveal the evolution mechanism of public sentiment topics, and they suffer from topic drift and derivation problems that cannot be ignored in the later stages of public sentiment development. They therefore cannot meet practical application requirements.
Disclosure of Invention
The invention aims to provide a dynamic identification and tracking method for internet public sentiment topics, which is favorable for overcoming topic drift and derivation problems in topic evolution and improving the tracking effect of the internet public sentiment topics.
In order to achieve the purpose, the technical scheme of the invention is as follows: a dynamic identification and tracking method for Internet public sentiment topics comprises the following steps:
Step 1: public sentiment topics are abstracted into nodes; a connecting arc between nodes indicates an association between the topics, and the weight of the arc represents their degree of correlation;
Step 2: the time axis is divided into time slices of a certain length, each public sentiment topic is assigned to its time slice according to its publication time, and a dynamic evolution model of Internet public sentiment topics is constructed, consisting of a topic information layer, a web page information layer and a netizen information layer;
Step 3: features are extracted from each newly arrived web page related to the public sentiment topics to obtain feature items, the web page is described by the feature items whose weight is higher than the average, the web page is converted into a multivariate vector space formed by those feature items, and its topic relevance to the original public sentiment topics is calculated;
Step 4: new topics are identified by incremental clustering: the newly arrived web pages are processed in sequence, and if the topic relevance $R$ is greater than a set threshold $\theta$, a new topic is considered to have appeared in the web page; the tracked new public sentiment topic is then expanded and updated into the dynamic evolution model of Internet public sentiment topics.
Further, in step 1, the topic information layer is the structure formed by the topics corresponding to different time sequence information, and is represented as:
$$T = \begin{cases} (t_1 \mid e_{11}, \ldots, e_{1j}, \ldots, e_{1h}),\ e_{1j} \in E_1 \\ \cdots \\ (t_i \mid e_{i1}, \ldots, e_{ij}, \ldots, e_{in}),\ e_{ij} \in E_i \\ \cdots \\ (t_m \mid e_{m1}, \ldots, e_{mj}, \ldots, e_{mn}),\ e_{mj} \in E_m \end{cases}$$
where $T$ is an emergency event, $t_i$ is the corresponding time slice, $e_{ij}$ is a public sentiment topic related to the emergency event that is generated in time slice $t_i$ and is described in vector form, and $E_i$ is the set of public sentiment topics generated in time slice $t_i$;
the web page information layer is the web page set $P = \{P_1, P_2, \ldots, P_T\}$ corresponding to different time sequence information together with the set of link relationships between web pages $PR = \{PR_1, PR_2, \ldots, PR_T\}$, where $P_i$ is the set of web pages generated in time slice $t_i$ and $PR_t$ is the set of link relationships among the web pages in the first $t$ time slices, each recording that web page $p_i$ points to web page $p_j$ through a link;
the netizen information layer is the set of information and relationships of network users $UG = \{UG_1, UG_2, \ldots, UG_T\}$, where $UG_i$ is the set of relationships among the topic discussants in the $i$-th time slice, including the characteristics of the netizens.
Further, in step 3, the topic relevance is calculated as follows:
the topic relevance between web pages is calculated from the link relationships and the content similarity between them, as shown in formula (1):
$$R = R_L \oplus R_C \qquad (1)$$
where $R_C$ is the relevance calculated from the content of the web page; $R_L$ is the relevance between web page topics calculated from the link relationships between web pages, with the nature of each link distinguished; $\oplus$ denotes a generalized addition of $R_L$ and $R_C$, i.e. the topic relevance $R$ between web pages satisfies $\max(R_L, R_C) \le R \le \min(1, R_L + R_C)$, with an adjustment factor set according to the relative importance of $R_L$ and $R_C$;
the topic relevance $R_L(P_a)$ of a newly arrived web page $P_a$ to the original public sentiment topic is calculated as shown in formula (2):
$$R_L(P_a) = \frac{R_C(P_1) + R_C(P_2) + \cdots + R_C(P_n)}{N(a)} \qquad (2)$$
where $R_C(P_i)$ is the content similarity between the newly arrived web page $P_a$ and the original web page $P_i$, and $N(a)$ is the total number of links issued by the newly arrived web page $P_a$.
Further, the topic model is updated as follows:
$R_{newC}(S, T)$ is defined as the content similarity between the Internet public opinion report corpus $S$ and the public sentiment topic $T$, and represents the adjusted content similarity of a new public opinion report, as shown in formula (3):
$$R_{newC}(S, T) = \frac{\sum r(e_t^S, e_t^T)}{N} \qquad (3)$$
where $e_t^S$ is the vector space formed after feature extraction on the public opinion reports at time $t$; $e_t^T$ is the topic existing at time $t$; $N$ is the length of time that the Internet public opinion report corpus $S$ lasts; and $\sum r(e_t^S, e_t^T)$ is the sum of the similarities between the topics involved in $S$ and the topics existing in the same time slice;
$R_L$ is adjusted mainly according to the link relationship between the web page of the new public opinion report and the original web pages; if the newly arrived report page $P_a$ has a link pointing to the original topic $T$, $R_L$ is adjusted according to formula (4):
$$R_L(P_a) = R_L(P_a) + \frac{R_c(P_a)}{N(P_a)} \qquad (4)$$
where $R_c(P_a)$ is the content similarity calculated by formula (3);
after $R_L$ and $R_c$ of the new public opinion report are calculated, the topic relevance $R$ is adjusted.
Compared with the prior art, the invention has the beneficial effects that: according to the dynamic evolution characteristics and topological structure characteristics of the public sentiment topics, the quantity of the public sentiment topics and the dynamic change of the public sentiment topics along with time are fully considered, the problems of topic drift and derivation in topic evolution are solved, the recognition and tracking effects of the network public sentiment topics can be obviously improved, and therefore a decision basis is provided for emergency management of emergencies.
Drawings
FIG. 1 is a flow chart of an implementation of the method of the present invention.
FIG. 2 is a schematic structural diagram of a dynamic evolution model of Internet public sentiment topics in the method of the present invention.
Detailed Description
The invention is described in further detail below with reference to the figures and the embodiments.
The invention discloses a dynamic identification and tracking method of internet public sentiment topics, which comprises the following steps as shown in figure 1:
Step 1: public sentiment topics are abstracted into nodes; a connecting arc between nodes indicates an association between the topics, and the weight of the arc represents their degree of correlation.
Step 2: the time axis is divided into time slices of a certain length, each public sentiment topic is assigned to its time slice according to its publication time, and a dynamic evolution model of Internet public sentiment topics is constructed, consisting of a topic information layer, a web page information layer and a netizen information layer.
Step 3: features are extracted from the newly arrived web pages related to the public sentiment topics, yielding several feature items for each newly arrived web page; each web page is described by the feature items whose weight is higher than the average, the web page is converted into a multivariate vector space formed by those feature items, and its topic relevance to the original public sentiment topics is calculated.
Step 4: new topics are identified by incremental clustering: the newly arrived web pages are processed in sequence, and if the topic relevance $R$ is greater than a set threshold $\theta$, a new topic is considered to have appeared in the web page; the tracked new public sentiment topic is then expanded and updated into the dynamic evolution model of Internet public sentiment topics.
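The feature selection in step 3 can be pictured with a minimal Python sketch. The source does not name the weighting scheme or the similarity measure, so plain term frequency and cosine similarity are used here as assumptions, and the function names `select_features` and `cosine` are hypothetical.

```python
import math
from collections import Counter


def select_features(tokens):
    """Keep the feature items whose weight is above the mean weight.

    Weights are plain term frequencies (an assumption; the source does
    not say how feature weights are computed).
    """
    weights = Counter(tokens)
    if not weights:
        return {}
    mean = sum(weights.values()) / len(weights)
    return {term: w for term, w in weights.items() if w > mean}


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# A page reduced to the feature items that carry above-average weight.
page_tokens = "flood flood flood rescue rescue city rain".split()
page_vector = select_features(page_tokens)
```

In this example the mean weight is 7/4, so only `flood` and `rescue` survive as feature items; the resulting dict plays the role of the web page vector whose relevance to existing topics is then computed.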
In a preferred embodiment of the invention, step 1 determines the mapping relationship of the topic evolution model composition elements according to the analysis of the micro composition and the evolution characteristics of the public sentiment topics: the model abstracts topics into nodes, connection arcs between the nodes represent that associations exist among the topics, and weight values of the arcs represent correlation degrees of the topics. The topic evolution model determines that the topological structure of the topic evolution model is a hierarchical structure according to the composition of the multi-element information of the topic, and each hierarchy corresponds to one type of information of the topic. As shown in fig. 2.
Topic information layer: because topic evolution is ordered in time, the concept of time slices is introduced. Time slices are obtained by dividing the topic evolution process along the time axis. A dynamic topic model built on time slices better reflects how public sentiment topics evolve over time. The topic information layer is the structure formed by the topics corresponding to different time sequence information, and is expressed as follows:
$$T = \begin{cases} (t_1 \mid e_{11}, \ldots, e_{1j}, \ldots, e_{1h}),\ e_{1j} \in E_1 \\ \cdots \\ (t_i \mid e_{i1}, \ldots, e_{ij}, \ldots, e_{in}),\ e_{ij} \in E_i \\ \cdots \\ (t_m \mid e_{m1}, \ldots, e_{mj}, \ldots, e_{mn}),\ e_{mj} \in E_m \end{cases}$$
where $T$ is a particular emergency event, $t_i$ is the corresponding time slice, $e_{ij}$ is a public sentiment topic related to the emergency event that is generated in time slice $t_i$ and is described in vector form, and $E_i$ is the set of public sentiment topics generated in time slice $t_i$.
The web page information layer is the web page set $P = \{P_1, P_2, \ldots, P_T\}$ corresponding to different time sequence information together with the set of link relationships between web pages $PR = \{PR_1, PR_2, \ldots, PR_T\}$, where $P_i$ is the set of web pages generated in time slice $t_i$ and $PR_t$ is the set of link relationships among the web pages in the first $t$ time slices, each recording that web page $p_i$ points to web page $p_j$ through a link.
The netizen information layer is the set of information and relationships of network users $UG = \{UG_1, UG_2, \ldots, UG_T\}$, where $UG_i$ is the set of relationships among the topic discussants in the $i$-th time slice, including the characteristics of the netizens. The netizen information layer is included in the model because the interaction among network users plays a key role in how a user's views evolve: when most users take a negative attitude toward a user's view, that user is very likely to abandon it, and when most users take a supportive attitude toward the view, the user is more likely to hold to it.
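One way to hold the three layers in memory is sketched below in Python. The class and field names (`TimeSlice`, `EvolutionModel`, `links_up_to`) are hypothetical, and representing pages and users by plain identifiers is an assumption made only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class TimeSlice:
    """One time slice t_i holding its share of the three layers."""
    topics: list = field(default_factory=list)         # E_i: topic vectors e_ij
    pages: set = field(default_factory=set)            # P_i: page identifiers
    links: set = field(default_factory=set)            # (p_i, p_j): p_i links to p_j
    user_relations: set = field(default_factory=set)   # UG_i: (user, user) pairs


@dataclass
class EvolutionModel:
    """Time-ordered slices of the dynamic evolution model."""
    slices: list = field(default_factory=list)

    def links_up_to(self, t):
        """PR_t: link relationships among pages in the first t time slices."""
        result = set()
        for time_slice in self.slices[:t]:
            result |= time_slice.links
        return result
```

Keeping the link set per slice and accumulating it on demand mirrors the description of $PR_t$ as covering the first $t$ time slices, without storing each cumulative set separately.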
In a preferred embodiment of the present invention, in step 3, for the corpus of reports related to the public sentiment topics of an emergency event, calculating the relevance between the nodes of the topic information layer and a public sentiment topic requires considering both the link relationships and the content similarity between the node web pages. Based on the link relationships and the content relevance between web pages, the invention provides a method for calculating the topic relevance between web pages, as shown in formula (1):
$$R = R_L \oplus R_C \qquad (1)$$
where $R_C$ is the relevance calculated from the content of the web page, i.e. the similarity between the content space vector of the Internet news report corpus and the content space vector of the public sentiment topic. Some web page links serve only social purposes or are meant to attract attention, and the pages they connect are only weakly related in topic. If these differences are ignored and link types are not distinguished, the model cannot deal effectively with topic drift and cannot effectively detect derived topics. Therefore $R_L$ in formula (1) is the relevance between web page topics calculated from the link relationships between web pages, with the nature of each link distinguished. $\oplus$ denotes a generalized addition of $R_L$ and $R_C$, i.e. the topic relevance $R$ between web pages satisfies $\max(R_L, R_C) \le R \le \min(1, R_L + R_C)$, with an adjustment factor set according to the relative importance of $R_L$ and $R_C$.
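The text gives only the bounds that the generalized addition must respect, not its closed form. The Python sketch below shows one operator that stays inside those bounds by interpolating between them; it is an assumption offered for illustration, not the operator of the invention, and the parameter `lam` is a hypothetical stand-in for the adjustment factor.

```python
def generalized_add(r_l, r_c, lam=0.5):
    """Combine link relevance r_l and content relevance r_c.

    Returns a value between max(r_l, r_c) and min(1, r_l + r_c).
    lam = 0 gives the lower bound, lam = 1 gives the upper bound.
    All three arguments are expected to lie in [0, 1].
    """
    for name, value in (("r_l", r_l), ("r_c", r_c), ("lam", lam)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    lower = max(r_l, r_c)
    upper = min(1.0, r_l + r_c)
    return lower + lam * (upper - lower)


# With r_l = 0.3 and r_c = 0.5 the bounds are 0.5 and 0.8.
r = generalized_add(0.3, 0.5, lam=0.5)
```

Because the result is a convex combination of the two bounds, the constraint $\max(R_L, R_C) \le R \le \min(1, R_L + R_C)$ holds for every `lam` in the unit interval.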
The topic relevance $R_L(P_a)$ of a newly arrived web page $P_a$ to the original public sentiment topic is calculated as shown in formula (2):
$$R_L(P_a) = \frac{R_C(P_1) + R_C(P_2) + \cdots + R_C(P_n)}{N(a)} \qquad (2)$$
The original topic may involve several web pages, and the newly arrived report page may link to several pages of the original reports, so the similarity between the topic of the new page and the original topic is taken as the average of the relevance values over the original report pages. Here $R_C(P_i)$ is the content similarity between the newly arrived web page $P_a$ and the original report page $P_i$, and $N(a)$ is the total number of links issued by the newly arrived web page $P_a$.
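Formula (2) reduces to a sum divided by a link count, as the following Python sketch shows. The function name `link_relevance` is hypothetical, and returning 0.0 for a page that issues no links is an assumption, since the source does not cover that case.

```python
def link_relevance(content_sims, total_links):
    """Formula (2): R_L(P_a) = (R_C(P_1) + ... + R_C(P_n)) / N(a).

    content_sims: content similarities between the new page P_a and each
                  original page P_i that it links to.
    total_links:  N(a), the total number of links issued by P_a.
    """
    if total_links <= 0:
        return 0.0  # assumption: a page without links has no link relevance
    return sum(content_sims) / total_links


# Two of the four links issued by P_a reach original report pages.
r_l = link_relevance([0.8, 0.6], total_links=4)
```

Dividing by all issued links rather than by the number of matching links means that a page whose links mostly point elsewhere receives a lower $R_L$.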
In a preferred embodiment of the present invention, step 4 takes into account the timeliness of the public opinion report corpus and the changing information stream on the network. To identify newly derived topics, the topic model calculates relevance from the link relationships of the new web pages and adjusts the historical data accordingly; the invention provides a topic model updating strategy based on this topic relevance adjustment. The topic model is updated as follows:
$R_{newC}(S, T)$ is defined as the content similarity between the Internet public opinion report corpus $S$ and the public sentiment topic $T$, and represents the adjusted content similarity of a new public opinion report, as shown in formula (3):
$$R_{newC}(S, T) = \frac{\sum r(e_t^S, e_t^T)}{N} \qquad (3)$$
where $e_t^S$ is the vector space formed after feature extraction on the public opinion reports at time $t$; $e_t^T$ is the topic existing at time $t$; $N$ is the length of time that the Internet public opinion report corpus $S$ lasts; and $\sum r(e_t^S, e_t^T)$ is the sum of the similarities between the topics involved in $S$ and the topics existing in the same time slice. Topic derivation and drift usually occur between topics that are close in time, and topics separated by a long interval are less likely to be related by derivation, so only the topics in the same time slice need to be considered when calculating the topic similarity of a new public opinion report.
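Formula (3) can be sketched in Python as an average of per-slice similarities. The similarity function $r$ is not specified in the text, so cosine similarity is used as an assumption; measuring $N$ as the number of time slices covered by the corpus is also an assumption, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def adjusted_content_similarity(report_vecs, topic_vecs, r=cosine):
    """Formula (3): sum over t of r(e_t^S, e_t^T), divided by N.

    report_vecs[t] is the report vector e_t^S and topic_vecs[t] is the
    existing topic vector in the same time slice.
    """
    if len(report_vecs) != len(topic_vecs):
        raise ValueError("one report vector and one topic vector per time slice")
    n = len(report_vecs)
    if n == 0:
        return 0.0
    return sum(r(s, k) for s, k in zip(report_vecs, topic_vecs)) / n
```

Pairing vectors slice by slice reflects the statement that only topics within the same time slice are compared.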
$R_L$ is adjusted mainly according to the link relationship between the web page of the new public opinion report and the original web pages. If the newly arrived report page $P_a$ has a link pointing to the original topic $T$, $R_L$ is adjusted according to formula (4):
$$R_L(P_a) = R_L(P_a) + \frac{R_c(P_a)}{N(P_a)} \qquad (4)$$
where $R_c(P_a)$ is the content similarity calculated by formula (3).
After $R_L$ and $R_c$ of the new public opinion report are calculated, the topic relevance $R$ is adjusted according to formula (1).
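The adjustment in formula (4) is a single increment, sketched below in Python. The function name `adjust_link_relevance` is hypothetical, and leaving $R_L$ unchanged when the page issues no links is an assumption not stated in the text.

```python
def adjust_link_relevance(r_l, r_c_new, n_links):
    """Formula (4): R_L(P_a) <- R_L(P_a) + R_c(P_a) / N(P_a).

    r_l:      current link relevance of the new report page P_a
    r_c_new:  content similarity of P_a obtained from formula (3)
    n_links:  N(P_a), the number of links issued by P_a
    """
    if n_links <= 0:
        return r_l  # assumption: nothing to adjust without links
    return r_l + r_c_new / n_links


# A page with R_L = 0.35, adjusted content similarity 0.5 and four links.
updated = adjust_link_relevance(0.35, 0.5, 4)
```

The updated value then feeds back into formula (1) together with the content relevance to give the adjusted topic relevance $R$.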
To decide whether a new topic has been generated, a threshold $\theta$ is preset. When $R$ exceeds $\theta$, a new topic is considered to have appeared in the report; otherwise, the report is considered a repeated report of an existing topic.
In a preferred embodiment of the present invention, the topic tracking method in step 4 captures the dynamic changes of public opinion reports from two aspects: on one hand, topic information at the current moment is stored in a topic information layer of the model, and mainly clustering results obtained through topic mining are stored; on the other hand, the relevance of the newly-entered reports is calculated according to a topic model updating strategy, and new information is dynamically expanded to the topic model by using the tracked topic mining results of the public opinion reports. The incremental topic clustering process is equivalent to a clustering algorithm for the whole report set, the algorithm performs incremental clustering on the report set according to the sequence of time slices, and sequentially processes the report webpages in the public opinion report information stream, and the specific algorithm is realized as shown below.
The algorithm is as follows:
Input: (public opinion report set)  Output: (topic set)
1 Take $R_1$ as the seed report, extract its features to obtain the seed topic, and initialize the topic model;
2 // $R_i$ is the web page of a subsequent public opinion report;
3 // judge whether $R_i$ is a report related to the content of the original topic;
4 if $R_i$ is a related report, add $R_i$ to the topic model and update the topic model;
5 // distinguish the types of the web page links issued by $R_i$, and remove friend links and advertisement links;
6
7 // link $L_j$ points to web page $P_j$, and $P_j$ is not in the existing topic set;
8 // add web page $P_j$ to the topic model;
9 // update the web page information layer of the topic model, adding the link information of $R_i$ pointing to $P_j$;
10 // analyze the similarity of report $R_i$ based on the link relationships;
11
12 // according to formula (4), adjust the relevance of all web pages $P_j$ that have a link relationship with report $R_i$;
13
14
15
16 // the relevance of report $R_i$ exceeds the preset threshold, so a new topic is considered to have appeared in report $R_i$, and the topic set is updated;
17 // return the tracked topic set;
18
The algorithm shows that the topic model is adjusted continuously as new public opinion reports arrive. When an emergency event occurs, the initial public opinion report is used as the seed report, the topics it contains are the seed topics, and the topic model is gradually constructed and updated on that basis.
Step (1) of the algorithm is the model initialization process, which determines the seed report and the seed topic. Steps (2) to (4) judge whether a newly arrived report is related to the seed report and, if it is, add the report to the topic model and update the report set. Steps (5) to (13) calculate, from the link relationships, the relevance of the report and of the web pages its links point to, and update the topic model according to the result. Steps (14) to (15) judge whether a new topic has been generated in the report, and the topic set within a given time slice is finally returned.
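The overall loop can be sketched in Python under several simplifying assumptions: reports are dicts with hypothetical keys `id`, `vector` and `links`; relatedness is judged by a hypothetical content floor `related_floor`; relevance values are combined by interpolating between the bounds of formula (1); and the adjustment of formula (4) is left out for brevity. The description treats a relevance above the threshold as a new topic, while claim 1 words the comparison the other way round, so the direction is exposed as the parameter `new_when_above` rather than fixed.

```python
import math


def cosine(u, v):
    """Cosine similarity between sparse vectors stored as dicts."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def combine(r_l, r_c, lam=0.5):
    """Hypothetical generalized addition kept within the bounds of formula (1)."""
    lower, upper = max(r_l, r_c), min(1.0, r_l + r_c)
    return lower + lam * (upper - lower)


def track_topics(reports, theta, related_floor=0.1, new_when_above=True):
    """Process reports in time order and return the list of topic vectors.

    Each report is a dict with 'id', 'vector' and 'links'; 'links' is a
    list of (target_id, kind) pairs, and kinds 'friend' and 'ad' are dropped.
    """
    seed = reports[0]
    topics = [seed["vector"]]                 # seed topic from the seed report
    pages = {seed["id"]: seed["vector"]}      # pages known to the model
    for report in reports[1:]:
        r_c = max(cosine(report["vector"], t) for t in topics)
        if r_c < related_floor:               # unrelated reports are skipped
            continue
        pages[report["id"]] = report["vector"]
        # Keep only content links, dropping friend and advertisement links.
        targets = [t for t, kind in report["links"] if kind not in ("friend", "ad")]
        # Formula (2): similarities to known pages over the links issued.
        sims = [cosine(report["vector"], pages[t]) for t in targets if t in pages]
        r_l = sum(sims) / len(targets) if targets else 0.0
        r = combine(r_l, r_c)                 # formula (1)
        is_new = r > theta if new_when_above else r <= theta
        if is_new:
            topics.append(report["vector"])   # extend the topic set
    return topics
```

Called with a time-ordered list of reports and a threshold such as `theta=0.8`, the function returns the seed topic followed by every report vector that the chosen comparison marks as a new topic.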
The above are preferred embodiments of the present invention, and all changes made according to the technical scheme of the present invention that produce functional effects do not exceed the scope of the technical scheme of the present invention belong to the protection scope of the present invention.

Claims (1)

1. A dynamic identification and tracking method for Internet public sentiment topics is characterized by comprising the following steps:
step 1: the public sentiment topics are abstracted into nodes, the nodes represent the association among the public sentiment topics through connecting arcs, and the weight values of the connecting arcs represent the correlation degree of the public sentiment topics;
step 2: dividing a time axis into time slices with a certain length, classifying the public sentiment topics into corresponding time slices according to the time for publishing the public sentiment topics, and constructing an internet public sentiment topic dynamic evolution model consisting of a topic information layer, a webpage information layer and a netizen information layer;
step 3: extracting features of a newly entered web page related to the public sentiment topics to obtain feature items, describing the web page by the feature items whose weight is higher than the average value, converting the web page into a multivariate vector space formed by the feature items, and calculating the topic correlation degree between the web page and the original public sentiment topics;
step 4: identifying new topics by incremental clustering, processing the newly entered web pages in sequence: if the topic correlation degree R is greater than a set threshold θ, the web page is considered a repeated report of an existing topic and is discarded; otherwise, a new topic is considered to have appeared in the web page, and the tracked new public sentiment topic is expanded and updated into the dynamic evolution model of Internet public sentiment topics;
in step 2, the topic information layer is the structure formed by the topics corresponding to different time sequence information, and is represented as:
$$T = \begin{cases} (t_1 \mid e_{11}, \ldots, e_{1j}, \ldots, e_{1h}),\ e_{1j} \in E_1 \\ \cdots \\ (t_i \mid e_{i1}, \ldots, e_{ij}, \ldots, e_{in}),\ e_{ij} \in E_i \\ \cdots \\ (t_m \mid e_{m1}, \ldots, e_{mj}, \ldots, e_{mn}),\ e_{mj} \in E_m \end{cases}$$
wherein $T$ is an emergency event, $t_i$ is the corresponding time slice, $e_{ij}$ is a public sentiment topic related to the emergency event that is generated in time slice $t_i$ and is described in vector form, and $E_i$ is the set of public sentiment topics generated in time slice $t_i$;
the web page information layer is the web page set $P = \{P_1, P_2, \ldots, P_T\}$ corresponding to different time sequence information and the set of link relationships between web pages $PR = \{PR_1, PR_2, \ldots, PR_T\}$, wherein $P_i$ is the set of web pages generated in time slice $t_i$ and $PR_t$ is the set of link relationships among the web pages in the first $t$ time slices, each recording that web page $p_i$ points to web page $p_j$ through a link;
the netizen information layer is the set of information and relationships of network users $UG = \{UG_1, UG_2, \ldots, UG_T\}$, wherein $UG_i$ is the set of relationships among the topic discussants in the $i$-th time slice, including the characteristics of the netizens;
in step 3, the relevance of the topic is calculated as follows:
calculating topic correlation degree between the webpages based on the link relation and the content similarity between the webpages, wherein the topic correlation degree is shown in formula (1):
<mrow> <mi>R</mi> <mo>=</mo> <msub> <mi>R</mi> <mi>L</mi> </msub> <mo>&amp;CirclePlus;</mo> <msub> <mi>R</mi> <mi>C</mi> </msub> <mo>-</mo> <mo>-</mo> <mo>-</mo> <mrow> <mo>(</mo> <mn>1</mn> <mo>)</mo> </mrow> </mrow>
wherein R isCThe relevancy is calculated according to the content of the webpage; rLThe correlation degree between the web page topics is calculated on the premise of distinguishing the link properties according to the link relation between the web pages;represents a pair of RLAnd RCThe operation between the pages is generalized addition operation, namely that the topic correlation R between the web pages satisfies max (R)L,RC)≤R≤min(1,RL+RC) Is according to RLAnd RCThe relative importance of the adjustment factor;
the topic relevance $R_L(P_a)$ of a newly entered web page $P_a$ to the original public sentiment topic is calculated as shown in formula (2):
$$R_L(P_a) = \frac{R_C(P_1) + R_C(P_2) + \dots + R_C(P_n)}{N(a)} \tag{2}$$
wherein $R_C(P_i)$ is the content similarity between the newly entered web page $P_a$ and the original web page $P_i$, and $N(a)$ is the total number of links issued by the newly entered web page $P_a$;
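Formulas (1) and (2) can be illustrated with a minimal Python sketch. The claim only bounds the generalized addition by $\max(R_L, R_C) \le R \le \min(1, R_L + R_C)$ and mentions an adjustment factor without giving its form, so the linear interpolation between the two bounds used here, the parameter `weight`, and the function names `link_relevance` and `generalized_add` are all assumptions made for illustration, not the operator defined by the patent.

```python
from typing import Sequence


def link_relevance(linked_similarities: Sequence[float], total_out_links: int) -> float:
    """Formula (2): sum of R_C(P_i) over the linked original pages, divided by N(a).

    linked_similarities holds R_C(P_1)..R_C(P_n); total_out_links is N(a).
    """
    if total_out_links <= 0:
        # Assumption: a page that issues no links has zero link relevance.
        return 0.0
    return sum(linked_similarities) / total_out_links


def generalized_add(r_l: float, r_c: float, weight: float = 0.5) -> float:
    """One possible generalized addition for formula (1).

    Assumption: interpolate between the claimed lower bound max(R_L, R_C)
    and upper bound min(1, R_L + R_C); `weight` in [0, 1] is a hypothetical
    stand-in for the adjustment factor.
    """
    if not 0.0 <= weight <= 1.0:
        raise ValueError("weight must lie in [0, 1]")
    lower = max(r_l, r_c)
    upper = min(1.0, r_l + r_c)
    return lower + weight * (upper - lower)


if __name__ == "__main__":
    # A new page issues 4 links; 2 of them reach original topic pages.
    r_l = link_relevance([0.8, 0.6], total_out_links=4)
    r = generalized_add(r_l, r_c=0.5, weight=0.5)
    print(r_l, r)
```

For relevance values in [0, 1], any `weight` in [0, 1] keeps the result inside the bounds stated in the claim, which is the only property of the operator the text fixes.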
updating the topic model as follows:
define $\mathit{Rnewc}(S, K)$ as the content similarity between the Internet public opinion report corpus $S$ and the public opinion topic $K$, representing the adjusted content similarity of the new public opinion report, as shown in formula (3):
$$\mathit{Rnewc}(S, K) = \frac{\sum r(e_t^S, e_t^K)}{N} \tag{3}$$
wherein $e_t^S$ represents the vector space formed after feature extraction is carried out on the public opinion reports at time $t$; $e_t^K$ represents the existing topic at time $t$; $N$ is the duration of the Internet public opinion report corpus $S$; and $\sum r(e_t^S, e_t^K)$ represents the sum of the similarities between the topics covered in the corpus $S$ and the existing topics within the time slices;
$R_L$ is adjusted mainly according to the link relations between the web page of the new public opinion report and the original web pages; if the newly entered public opinion report web page $P_a$ has links to the original topic $K$, $R_L$ is adjusted according to formula (4):
$$R_L(P_a) = R_L(P_a) + \frac{R_c(P_a)}{N(P_a)} \tag{4}$$
wherein $R_c(P_a)$ is the content similarity calculated by formula (3);
after $R_L$ and $R_c$ of the new public opinion report have been calculated, the topic relevance $R$ is adjusted accordingly.
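The update in formulas (3) and (4) can be sketched as follows. The claim does not state which similarity function $r$ is used, so cosine similarity is an assumption here. Reading $N$ as the number of time slices covered by the corpus and $N(P_a)$ as the number of links issued by $P_a$ are also assumptions, and the function names are hypothetical.

```python
import math
from typing import Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity, used as an assumed stand-in for r(e_t^S, e_t^K)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def updated_content_similarity(report_vecs: Sequence[Vector],
                               topic_vecs: Sequence[Vector]) -> float:
    """Formula (3): mean per-time-slice similarity between corpus S and topic K.

    report_vecs[t] plays the role of e_t^S and topic_vecs[t] of e_t^K;
    N is taken to be the number of time slices (an assumption).
    """
    n = len(report_vecs)
    if n == 0 or n != len(topic_vecs):
        raise ValueError("need one report vector and one topic vector per time slice")
    return sum(cosine(s, k) for s, k in zip(report_vecs, topic_vecs)) / n


def adjusted_link_relevance(r_l: float, r_c: float, total_out_links: int) -> float:
    """Formula (4): R_L(P_a) <- R_L(P_a) + R_c(P_a) / N(P_a)."""
    if total_out_links <= 0:
        # Assumption: without outgoing links there is nothing to adjust.
        return r_l
    return r_l + r_c / total_out_links


if __name__ == "__main__":
    reports = [[1.0, 0.0], [1.0, 1.0]]   # e_t^S for two time slices
    topic = [[1.0, 0.0], [1.0, 0.0]]     # e_t^K for the same slices
    r_c = updated_content_similarity(reports, topic)
    print(r_c, adjusted_link_relevance(0.35, r_c, total_out_links=4))
```

The adjusted $R_L$ and $R_c$ would then be recombined through the generalized addition of formula (1) to obtain the updated topic relevance $R$.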
CN201410574419.8A 2014-10-24 2014-10-24 The Dynamic Recognition and method for tracing of a kind of internet public feelings topic Expired - Fee Related CN104298765B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410574419.8A CN104298765B (en) 2014-10-24 2014-10-24 The Dynamic Recognition and method for tracing of a kind of internet public feelings topic

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410574419.8A CN104298765B (en) 2014-10-24 2014-10-24 The Dynamic Recognition and method for tracing of a kind of internet public feelings topic

Publications (2)

Publication Number Publication Date
CN104298765A CN104298765A (en) 2015-01-21
CN104298765B true CN104298765B (en) 2017-09-15

Family

ID=52318490

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410574419.8A Expired - Fee Related CN104298765B (en) 2014-10-24 2014-10-24 The Dynamic Recognition and method for tracing of a kind of internet public feelings topic

Country Status (1)

Country Link
CN (1) CN104298765B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104965930B (en) * 2015-07-30 2019-03-26 成都信息工程大学 A kind of emergency event evolution analysis method based on big data
CN104965931A (en) * 2015-07-30 2015-10-07 成都布林特信息技术有限公司 Big data based public opinion analysis method
CN106874292B (en) * 2015-12-11 2020-05-05 北京国双科技有限公司 Topic processing method and device
CN106649726A (en) * 2016-12-23 2017-05-10 中山大学 Association-topic evolution mining method in social network
CN107066567B (en) * 2017-04-05 2021-08-31 竹间智能科技(上海)有限公司 Topic detection-based user portrait modeling method and system in text conversation
CN107391660B (en) * 2017-07-18 2021-05-11 太原理工大学 Induced division method for subtopic division
CN108021651B (en) * 2017-11-30 2020-07-28 中科金联(北京)科技有限公司 Network public opinion risk assessment method and device
CN109871434B (en) * 2019-02-25 2019-12-10 内蒙古工业大学 Public opinion evolution tracking method based on dynamic incremental probability graph model
CN110968668B (en) * 2019-11-29 2023-03-14 中国农业科学院农业信息研究所 Method and device for calculating similarity of network public sentiment topics based on hyper-network
CN111475732B (en) * 2020-04-13 2023-07-14 深圳市雅阅科技有限公司 Information processing method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101695011B1 (en) * 2011-08-24 2017-01-10 한국전자통신연구원 System for Detecting and Tracking Topic based on Topic Opinion and Social-influencer and Method thereof

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
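"Research on Text-Based Tracking of Internet Public Opinion Topics"; 廖秀玲; China Master's Theses Full-text Database, Information Science and Technology; 20121215 (No. 12); thesis p. 22, paragraph 2; pp. 28-29, section 4.3; p. 32, section 4.4.2 *
"A Topic Tracking Method Based on Semantics and Links"; 宋丹; China Master's Theses Full-text Database, Information Science and Technology; 20080515 (No. 05); pp. 18-26, sections 3.3.3 and 3.5 *
"Research on Techniques for Predicting Hot Microblog Topics"; 张思龙; China Master's Theses Full-text Database, Information Science and Technology; 20140215 (No. 2); pp. 15-16, section 3.1, Fig. 7 *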

Also Published As

Publication number Publication date
CN104298765A (en) 2015-01-21

Similar Documents

Publication Publication Date Title
CN104298765B (en) The Dynamic Recognition and method for tracing of a kind of internet public feelings topic
CN104008203B (en) A kind of Users' Interests Mining method for incorporating body situation
CN107122455A (en) A kind of network user's enhancing method for expressing based on microblogging
CN107577665B (en) Text emotional tendency judging method
CN110889282B (en) Text emotion analysis method based on deep learning
Liu et al. Context-aware social media user sentiment analysis
CN102521420B (en) Socialized filtering method on basis of preference model
CN103488637B (en) A kind of method carrying out expert Finding based on dynamics community's excavation
Claypo et al. Opinion mining for thai restaurant reviews using K-Means clustering and MRF feature selection
CN112765480A (en) Information pushing method and device and computer readable storage medium
CN108228867A (en) A kind of theme collaborative filtering recommending method based on viewpoint enhancing
CN107944911A (en) A kind of recommendation method of the commending system based on text analyzing
CN111626050B (en) Microblog emotion analysis method based on expression dictionary and emotion general knowledge
CN105740382A (en) Aspect classification method for short comment texts
Kaur et al. Social issues sentiment analysis using python
CN104573070A (en) Text clustering method special for mixed length text sets
CN104572915B (en) One kind is based on the enhanced customer incident relatedness computation method of content environment
CN105184654A (en) Public opinion hotspot real-time acquisition method and acquisition device based on community division
CN104572623B (en) A kind of efficient data analysis and summary method of online LDA models
Shukla et al. Role of hybrid optimization in improving performance of sentiment classification system
Zhu et al. MMLUP: Multi-Source & Multi-Task Learning for User Profiles in Social Network.
Dhande et al. Review of sentiment analysis using naive bayes and neural network classifier
İş et al. A Profile Analysis of User Interaction in Social Media Using Deep Learning.
CN111797235B (en) Text real-time clustering method based on time attenuation factor
Zhang et al. Analysis and Research on Factors Affecting Information Dissemination of Emergencies in Social Media Environment

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20170915

Termination date: 20201024

CF01 Termination of patent right due to non-payment of annual fee