CN105809554A - Prediction method of hot topics participated by users in social networks - Google Patents


Info

Publication number
CN105809554A
Authority
CN
China
Application number
CN201610083734.XA
Other languages
Chinese (zh)
Other versions
CN105809554B (en
Inventor
肖云鹏
赖佳伟
刘宴兵
叶青
王宇航
黄恺
李露
李松阳
Original Assignee
重庆邮电大学 (Chongqing University of Posts and Telecommunications)
Application filed by 重庆邮电大学
Priority to CN201610083734.XA
Publication of CN105809554A
Application granted
Publication of CN105809554B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06Q DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING; COUNTING
    • G06Q DATA PROCESSING SYSTEMS OR METHODS, SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL, SUPERVISORY OR FORECASTING PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation, e.g. linear programming, "travelling salesman problem" or "cutting stock problem"

Abstract

The invention belongs to the field of computer network information technology analysis and discloses a method for predicting user participation in hot topics in social networks. Starting from online users and their friend-relationship networks, the method takes into account the three-way relationship among followers' personal interests, the users they follow, and their communities, and uses time discretization and time slicing to add the influence of the timeliness of topic information. To address the uneven data distribution and the sparse network structure at each phase of a hot topic's life cycle, a hot-topic participation prediction model is constructed and fitted. Given the input data, the model analyzes the factors that drive followers to join a topic discussion, dynamically predicts whether the followers of users who have already joined a topic will also join it, and reveals the driving factors of participation and the topic's popularity trend.

Description

Method for predicting user participation in hot topics in social networks

Technical field

The invention belongs to the field of computer information technology analysis, and relates in particular to the predictive analysis of user participation in hot topics.

Background

With the rapid spread of social network applications, users spend more and more time on social networks. At the same time, the information that users leave behind turns social networks into a huge information platform. Using this platform, one can understand user behavior and the laws of information propagation, which helps in estimating network traffic and capacity and in allocating and using network resources sensibly.

At present, hot topics and trending events on social networks generate more and more online public opinion, and the analysis of hot topics has become a research focus. Existing work explores user influence, information propagation, and user behavior to varying degrees. The methods used include text-based analysis, user-influence-based analysis, and network-structure-based analysis. Text-based analysis mainly estimates a hot topic's spread from the information strength of the topic itself. User-influence-based analysis mainly studies the influence a user exerts on other users in the social network and combines it with the behavioral factors that drive users to repost and comment, in order to uncover how information propagates through the network. For example, Sen Wu et al., in "Confluence: Conformity Influence in Large Social Networks", analyzed the historical behavior of users across the whole network and built an influence model on that basis. Network-structure-based analysis mainly uses small-world theory, user in-degree and out-degree, and related theory to describe how a topic's spread reaches a stable equilibrium in the network structure, and uses this to predict the spread of hot topics. For example, Jing Zhang et al., in "Who Influenced You? Predicting Retweet via Social Influence Locality", studied a prediction model based on users' friend circles.

However, none of the above prior art fully considers the deeper dynamic causes of user behavior, nor the uneven data distribution across the stages of a hot topic's life cycle and the sparsity of the network structure.

Summary of the invention

Problems addressed by the invention: the deeper dynamic causes of user behavior, the uneven data distribution across the phases of a hot topic's life cycle, and the sparsity of the network structure. The invention proposes a method for predicting user participation in hot topics. The method treats the followers of users who have participated in a topic as candidate users, and the study focuses mainly on these candidate users. It considers three aspects, namely the candidate user's own drive, the influence of the candidate user's friends, and the influence of the community the candidate user belongs to, adds the effect of topic timeliness, and builds a model for predicting user participation in hot topics using Markov random field theory.

A method for predicting user participation in hot topics comprises the following. A data source module uses the interaction data of hot topics in a social network and takes the followers of users who have participated in a topic as candidate users. An attribute extraction module extracts the candidate user's own attributes, the attributes of the candidate user's friends, and the attributes of the candidate user's community, and obtains the factor functions for these three parts. A hot-topic participation prediction model is built and its parameters are fitted. The fitted parameters and the topic participation at any time t are input into the prediction model to predict whether each candidate user will take part in the topic discussion at the next time step. Based on the prediction result, the social networks and traffic information that the candidate users will belong to at the next time step are obtained and the network structure is adjusted; for example, more network resources are allocated to the social networks involved in topics that many candidate users will join at the next time step.

The candidate user's own attributes include: whether candidate user v_i is an active user, isActivity(v_i); whether the tags of v_i contain a keyword matching the hot topic, isSameTag(v_i); how many of the users that v_i follows have participated in the topic, countOfHF(v_i); and the topic driving force of the users that v_i follows, inf(v_i). For ease of description, the invention writes these four attributes uniformly as x_ik, the k-th attribute of candidate user v_i: x_i1 = isActivity(v_i); x_i2 = isSameTag(v_i); x_i3 = countOfHF(v_i); x_i4 = inf(v_i). The factor function related to the candidate user itself is f_k(x_ik, y_i), which represents the correlation between the candidate user's participation behavior and its own attributes; y_i = 1 means that v_i will take part in the topic discussion in the next time period, and x_ik ≠ 0 ∩ y_i = 1 means that the k-th attribute value of v_i is nonzero and v_i takes part in the topic in the next time period. The factor function related to the candidate user's friends is g_l(y_i, y_j, paf_l(v_j)), which represents the correlation between the candidate user's participation behavior and the friend attribute paf_l(v_j), the l-th attribute value of friend v_j. The factor function of the candidate user's community is h(y_i, gaf(v_i, C_m)), which represents the correlation between the candidate user's participation behavior and the community attribute gaf(v_i, C_m); gaf(v_i, C_m) indicates whether the community C_m to which v_i belongs is a τ-community, and C_m denotes the m-th community. Communities are obtained from the topic's user relationship network by a community detection algorithm.

Determining whether a candidate user is an active user further comprises judging

$$x_{i1}(v_i)=\begin{cases}1, & activity(v_i)\ \text{ranks within the top } \kappa\\ 0, & \text{otherwise}\end{cases}$$

where x_i1(v_i) is the 1st attribute of candidate user v_i, activity(v_i) is the activity index of user v_i, and κ is the ranking threshold on the activity index. The activity index is

$$activity(v_i)=\rho\cdot E[origNum(v_i)]+E[retwNum(v_i)]$$

where E[origNum(v_i)] and E[retwNum(v_i)] are the average daily numbers of original microblog posts and of reposted posts by user v_i over a predetermined period before the topic was initiated, and ρ is a tunable parameter.

Determining the topic driving force of the users that the candidate user follows further comprises computing the total topic driving force that the followees u_k exert on candidate user v_i:

$$inf(v_i)=\sum_{k=1}^{n} inf(u_k)$$

where inf(u_k) is the topic driving force of followee u_k and n is the number of users followed by v_i who have participated in the topic. The driving force of followee u_k is

$$inf(u_k)=\ln\big(E[readNum(u_k)]+1\big)+E[retNum(u_k)]+E[comNum(u_k)]$$

where E[readNum(u_k)], E[retNum(u_k)], and E[comNum(u_k)] are the expected numbers of views, reposts, and comments received by u_k's original and reposted microblog posts within a certain period before the topic was initiated.

The candidate user's friend attributes are: whether the friend is a verified user, and whether the friend is an opinion leader. Likewise, for ease of description, the invention writes these friend attributes uniformly as paf_k(v_i), the k-th attribute of v_i. paf_1(v_i) indicates whether v_i is a verified user: paf_1(v_i) = 1 if it is and paf_1(v_i) = 0 otherwise. paf_2(v_i) indicates whether v_i is an opinion leader: paf_2(v_i) = 1 if it is and paf_2(v_i) = 0 otherwise. Whether friend v_i is an opinion leader is determined by

$$paf_2(v_i)=\begin{cases}1, & deg(v_i)\ \text{ranks within the top } \zeta\\ 0, & \text{otherwise}\end{cases}$$

where deg(v_i) is the follower feature value of v_i:

$$deg(v_i)=\sigma\times\big[fans(v_i)-mutfans(v_i)\big]+mutfans(v_i)$$

Here fans(v_i) and mutfans(v_i) are the number of followers of friend v_i and the number of users who follow each other with v_i, ζ is the ranking threshold on the follower feature value, and σ is a tunable parameter.

The attribute gaf(v_i, C_m) of the candidate user's community indicates whether the community C_m to which candidate user v_i belongs is a τ-community, where C_m denotes the m-th community:

$$gaf(v_i,C_m)=\begin{cases}1, & \tau(C_m)>\psi\\ 0, & \text{otherwise}\end{cases}$$

where τ(C_m) > ψ means that the proportion of C_m's users who have participated in the topic exceeds the set constant ψ.

Predicting whether candidate users will participate in the topic in the next time period specifically comprises: from time period t, extract the candidate users V_t, the friend relationship set E_t among candidate users, and the attribute sets X, paf(V), gaf(V) of the candidate users themselves, their friends, and their communities, and build the input network for period t, G_t = (V_t, E_t, X, paf(V), gaf(V)); find the values of the parameter set θ = ({α}, {β}, γ) that maximize the objective function; and obtain whether each candidate user will participate in the topic in the next time period. The concrete model and the prediction procedure are described in detail in the detailed description.

The invention proposes a method for predicting user participation in hot topics. First, from the network-structure perspective, the spread of a hot topic relies mainly on an information-propagation structure of the form "a user reposts and joins the topic, the user's followers repost and join, then the followers' followers repost and join". The study therefore focuses on the followers of users who have participated in the topic, and treats these followers as candidate users. Second, from the influence perspective, user participation in hot topics manifests mainly in three aspects: personal interest, the drive of the users they follow, and the push of their community. Relevant attributes are therefore extracted from three aspects (the candidate user itself, the candidate user's friends, and the candidate user's community), and the effect of topic timeliness is introduced. Finally, because information propagation between users has the Markov property, that is, a node's behavior is affected only by itself and a limited number of nearby nodes, the basic ideas and methods of Markov random field theory are used to build the prediction model of user participation in hot topics. Using the interaction data of hot topics in social networks, the invention treats the followers of participating users as candidate users and predicts whether they will also join the topic discussion in the following time period.

Because user participation in hot topics manifests mainly as individual drive, friend drive, and community drive, relevant attributes are extracted from three aspects: the candidate user itself, the candidate user's friends, and the candidate user's community. For the candidate user itself, attributes such as whether the user is active and whether the user's tags contain a keyword matching the topic are extracted; for the friends, attributes such as whether a friend is a verified user and whether a friend is an opinion leader; for the community, the number of community members who have already joined the topic discussion is extracted as an attribute. The timeliness of topic information is incorporated when the factor functions for these three aspects are determined. In addition, because information propagation has the Markov property, that is, a node's behavior is affected only by itself and a limited number of nearby nodes, the basic ideas and methods of Markov random field theory are used to build the hot-topic participation prediction model. The model dynamically predicts candidate users' participation behavior and analyzes the future participation heat of the hot topic, that is, how many people will join the topic discussion in the next time period.

Addressing the deeper dynamic causes of user behavior, the uneven data distribution across the phases of a hot topic's life cycle, and the sparsity of the network structure, the invention can accurately predict users' participation behavior and the future participation heat of hot topics. From these predictions one can anticipate which hot topics candidate users will join, obtain network traffic statistics for the next time step, and adjust the network structure and bandwidth allocation in real time.

Brief description of the drawings

Fig. 1 is the overall flowchart of the invention;

Fig. 2 is the factor graph of the prediction model of the invention;

Fig. 3 is the parameter fitting flowchart.

Detailed description of the invention

To better illustrate the content of the invention, its specific implementation is explained further below with reference to the drawings and examples.

Because user participation in hot topics manifests mainly in three aspects (individual drive, friend drive, and community drive), the invention addresses the three-way relationship among followers' personal interests, the users they follow, and their communities. Based on time discretization and time slicing, it adds the effect of the timeliness of topic information and determines the factor functions for the three aspects. To address the uneven data distribution across the phases of a hot topic's life cycle and the sparsity of the network structure, it builds a dynamic prediction model of hot-topic participation that can dynamically predict whether the followers of users who have joined a topic will also join its discussion, and that uncovers the driving factors behind participation and the topic's popularity trend.

Formally: for a given hot topic and time period t, the social network is G_U^t = (U_t, E_U, A_U), where U_t is the set of users related to the topic in period t, comprising the users who joined the topic during period t and the followers of these users, E_U represents the relationships among all users, and A_U, the given series of topic propagation actions, represents how topic information spreads between users. From the existing network G_U^t, find the candidate network G_V^t = (V_t, E_V, A_V), where V_t is the set of candidate users. The task is to predict Y_{t+1} = {y_1, y_2, ..., y_n}, where y_i indicates whether candidate user v_i will join the topic discussion in period t+1:

$$y_i=\begin{cases}1, & v_i\ \text{joins the topic discussion in period } t+1\\ 0, & \text{otherwise}\end{cases}$$

Based on the topic participation under a given topic in period t, the invention predicts whether certain users will join the topic discussion in period t+1. Fig. 1 shows the overall flowchart of the invention, which consists of four modules: data acquisition, attribute extraction, model construction, and prediction analysis.

The detailed implementation process of the invention is described below.

S1: Obtain the data source. The data can be downloaded directly from existing web-based research recommendation systems or obtained through the public APIs of mature social platforms. The following steps may be used:

The data obtained include the participation of the hot topic's participants over its life cycle and the followers of those participants. Topic participation includes the time, the basic information of participating users, and their historical behavior data of reposting and commenting on the topic; follower data include the followers' basic information, their historical behavior (reposts and original microblog posts), and the follow and followed-by relationships among followers.

The data source acquisition module collects users' basic information, the basic information of users' followers, the friend relationships among followers, and the followers' historical behavior. The following method may be used (conventional prior-art methods may also be used):

S11: Obtain raw data. Raw data can be obtained through the public API of the social network or by downloading existing data sources; data obtained through the public API can be supplemented with methods such as web crawling.

S111: Obtain all participants of a given hot topic and their basic information.

S112: Obtain the basic information of the followers of all participants of this hot topic, and the follow relationships among these followers.

S113: Obtain the historical behavior data of all participants of this hot topic and of their followers.

S12: Basic data cleaning. Simple cleaning makes most of the data easier to analyze, for example by removing duplicate records and invalid nodes.

S13: Slice the data by time and find candidate users. Because hot topics spread quickly, the data are sliced into time periods of predetermined length (for example 8 hours). For a time period t, find the users who joined the topic within that period, take their followers as candidate users, and build a network from the friend relationships among the candidate users.
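Step S13 can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the input layout (a list of `(user, hour)` participation events, a `followers` dict, and friend pairs stored as `frozenset`s) and the function names are hypothetical, and dropping users who have already joined the topic from the candidate set is an assumption not stated in the text.

```python
SLICE_HOURS = 8  # example slice length from S13


def slice_index(hour: float, start_hour: float, slice_hours: float = SLICE_HOURS) -> int:
    """Index of the time period that an event at `hour` falls into."""
    return int((hour - start_hour) // slice_hours)


def candidates_for_slice(participations, followers, friend_pairs, t,
                         start_hour=0.0, slice_hours=SLICE_HOURS):
    """Return (V_t, E_t) for period t (hypothetical data layout).

    participations: list of (user, hour) topic participation events
    followers:      dict mapping a user to the set of that user's followers
    friend_pairs:   set of frozenset({a, b}) friend relationships
    """
    joined_now = {u for u, h in participations
                  if slice_index(h, start_hour, slice_hours) == t}
    joined_so_far = {u for u, h in participations
                     if slice_index(h, start_hour, slice_hours) <= t}
    # Followers of this period's participants become candidates ...
    v_t = set()
    for u in joined_now:
        v_t |= followers.get(u, set())
    # ... minus anyone who has already joined the topic (assumption)
    v_t -= joined_so_far
    # Keep only friend edges whose two endpoints are both candidates
    e_t = {pair for pair in friend_pairs if pair <= v_t}
    return v_t, e_t
```

With 8-hour slices, an event at hour 9.5 falls into period 1, so its participant's followers become candidates for period 1 only.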

S2: Extract relevant attributes. A candidate user's participation in a hot topic is considered to stem mainly from three aspects: personal interest, the influence of the users they follow, and the influence of their community. The invention therefore extracts relevant attributes from three aspects: the candidate user itself, the candidate user's friends, and the candidate user's community. The attributes can be adjusted to suit the characteristics of the data; specific examples follow.

S21: Extract the candidate user's own attributes. These mainly include: 1. whether the candidate user is an active user; 2. whether the candidate user's tags contain a keyword matching the hot topic; 3. how many of the users the candidate user follows have already participated in the topic; 4. the topic driving force of the candidate user's followees who have participated in the topic. X_i = {x_i1, x_i2, ..., x_im} denotes the attribute set of candidate user v_i. For example, x_i1 = 1 means that the 1st attribute value of v_i is 1, i.e., v_i is an active user, and x_i1 = 0 means that v_i is not an active user. The other attributes are defined analogously.

Whether user v_i is an active user is judged by:

$$x_{i1}=\begin{cases}1, & activity(v_i)\ \text{ranks within the top } \kappa\\ 0, & \text{otherwise}\end{cases}$$

activity(v_i) is the activity index of user v_i, and κ is a threshold taken at 10% to 15% of the ranking of all users' activity indices. activity(v_i) is computed as follows:

$$activity(v_i)=\rho\cdot E[origNum(v_i)]+E[retwNum(v_i)]$$

where E[origNum(v_i)] and E[retwNum(v_i)] are the daily numbers of original microblog posts and of reposted posts by user v_i over a predetermined period before the topic was initiated (for example one month). ρ is a discount rate for the number of original posts; for example, ρ = 0.8.
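A sketch of the activity attribute x_i1, assuming ρ = 0.8 as in the example above. The ranking helper `top_fraction_flags` is a hypothetical name, and reading "top κ" as the first ⌈κ·N⌉ users after sorting by descending score (ties broken by sort order) is an assumption.

```python
import math


def activity(orig_per_day: float, retw_per_day: float, rho: float = 0.8) -> float:
    """activity(v_i) = rho * E[origNum(v_i)] + E[retwNum(v_i)]."""
    return rho * orig_per_day + retw_per_day


def top_fraction_flags(scores: dict, fraction: float) -> dict:
    """1 for users ranked in the top `fraction` by score, else 0 (hypothetical helper)."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    cutoff = max(1, math.ceil(fraction * len(ranked)))  # keep at least one user
    top = set(ranked[:cutoff])
    return {user: int(user in top) for user in scores}
```

In use, one would compute `activity` for every user from their pre-topic posting history and pass the resulting dict to `top_fraction_flags` with κ between 0.10 and 0.15.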

The total topic driving force x_i4 = inf(v_i) of the candidate user's followees (who have participated in the topic) is:

$$inf(v_i)=\sum_{k=1}^{n} inf(u_k)$$

where u_k is a user who has participated in the topic and whom candidate user v_i follows, inf(u_k) is the topic driving force of u_k, and n is the number of users followed by v_i who have participated in the topic. inf(u_k) is obtained by the following formula:

$$inf(u_k)=\ln\big(E[readNum(u_k)]+1\big)+E[retNum(u_k)]+E[comNum(u_k)]$$

where E[readNum(u_k)], E[retNum(u_k)], and E[comNum(u_k)] are the expected number of views, the expected number of reposts, and the expected number of comments of u_k's original and reposted microblog posts within a predetermined period before the topic was initiated.
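The driving-force attribute can be sketched as below. The per-followee formula follows the text; treating the total inf(v_i) as a plain sum over the n participating followees is an assumption based on the word "total", and the input layout is hypothetical.

```python
import math


def followee_drive(read_mean: float, ret_mean: float, com_mean: float) -> float:
    """inf(u_k) = ln(E[readNum(u_k)] + 1) + E[retNum(u_k)] + E[comNum(u_k)]."""
    return math.log(read_mean + 1) + ret_mean + com_mean


def candidate_drive(followee_stats) -> float:
    """inf(v_i), assumed to be the sum of inf(u_k) over the participating followees.

    followee_stats: iterable of (E[readNum], E[retNum], E[comNum]) tuples,
    one per followee of v_i who has joined the topic (hypothetical layout).
    """
    return sum(followee_drive(*stats) for stats in followee_stats)
```

The logarithm damps view counts, which are typically orders of magnitude larger than repost and comment counts, so that views do not dominate the score.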

S22: Extract the candidate user's friend attributes. Here the candidate user's friends are themselves candidate users. Two attributes are mainly considered: 1. whether the friend is a verified user; 2. whether the friend is an opinion leader. paf_k(v_i) is defined as the k-th attribute of friend v_i. For example, paf_1(v_i) = 1 means that the 1st attribute value of v_i is 1, i.e., v_i is a verified user, and paf_1(v_i) = 0 means that v_i is not a verified user.

Whether friend v_i of the candidate user is an opinion leader, paf_2(v_i), can be determined by:

$$paf_2(v_i)=\begin{cases}1, & deg(v_i)\ \text{ranks within the top } \zeta\\ 0, & \text{otherwise}\end{cases}$$

where deg(v_i) is the follower feature value of v_i and ζ is the ranking threshold on the follower feature value (the top 10% to 20%). The follower feature value deg(v_i) is:

$$deg(v_i)=\sigma\times\big[fans(v_i)-mutfans(v_i)\big]+mutfans(v_i)$$

where fans(v_i) and mutfans(v_i) are the number of followers of user v_i and the number of users who follow each other with v_i. σ is a tunable parameter that narrows the numerical gap in the follower-count attribute.
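A sketch of the opinion-leader attribute paf_2. The σ = 0.5 default is an assumed value (the text only says σ is tunable), and "top ζ" is read here as the first ⌈ζ·N⌉ users by descending deg(v_i), which is also an assumption.

```python
import math


def follower_feature(fans: int, mutfans: int, sigma: float) -> float:
    """deg(v_i) = sigma * (fans(v_i) - mutfans(v_i)) + mutfans(v_i)."""
    return sigma * (fans - mutfans) + mutfans


def opinion_leader_flags(counts: dict, sigma: float = 0.5, zeta: float = 0.2) -> dict:
    """paf_2(v_i): 1 if deg(v_i) ranks in the top `zeta` of users, else 0.

    counts maps a user to (fans, mutfans); sigma = 0.5 is an assumed value.
    """
    deg = {v: follower_feature(f, m, sigma) for v, (f, m) in counts.items()}
    ranked = sorted(deg, key=deg.get, reverse=True)
    k = max(1, math.ceil(zeta * len(ranked)))  # keep at least one leader
    leaders = set(ranked[:k])
    return {v: int(v in leaders) for v in deg}
```

With σ < 1, one-way followers count less than mutual followers, which shrinks the gap between accounts with huge passive audiences and accounts with many reciprocal ties.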

S23: Extract the attribute of the candidate user's community. gaf(v_i, C_m) indicates whether the community C_m to which candidate user v_i belongs is a τ-community. It is defined as follows:

$$gaf(v_i,C_m)=\begin{cases}1, & \tau(C_m)>\psi\\ 0, & \text{otherwise}\end{cases}$$

A τ-community is a community in which the number of users who have participated in the topic, as a percentage of the community's size, exceeds a certain threshold. τ(C_m) > ψ means that the proportion of users in C_m who have participated in the topic exceeds ψ. ψ is a manually set factor and can take values between 1% and 5%.

S24: After the attributes of the three aspects have been extracted, obtain the corresponding factor functions. A factor function represents the correlation between an attribute and the candidate user's participation. They are defined as follows.

1. Factor function related to the candidate user itself:

Here f_k(x_ik, y_i) represents the correlation between candidate user attribute x_ik and the candidate user's participation. y_i = 1 means that candidate user v_i will participate in the topic in period t+1, and x_ik ≠ 0 ∩ y_i = 1 means that the k-th attribute value of user v_i is nonzero and v_i participates in the topic in period t+1. The factor includes a half-life function, which expresses that the topic's influence decreases over time from its initiation to period t, that is, the timeliness effect of topic information. T denotes the current time period, and ξ is set manually from experimental data; its value is 2 in the experiments of the invention.

2. Factor function related to the candidate user's friends:

Here g_l(y_i, y_j, paf_l(v_j)) represents the correlation among the friend attribute paf_l(v_j), the candidate user, and the candidate user's friend.

3. Factor function of the candidate user's community:

Here h(y_i, gaf(v_i, C_m)) represents the correlation between the community attribute gaf(v_i, C_m) and the candidate user.

Using these definitions, compute the factor function f(·) related to the candidate user itself, the influence g(·) of the candidate user's friends, and the influence h(·) of the candidate user's community.
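The exact half-life function and the exact form of f_k are not preserved in this text, so the following is only an illustrative guess. It assumes a decay weight of 0.5^(t/ξ), so the topic's influence halves every ξ periods after initiation (ξ = 2 as in the experiments), and assumes that f_k equals this weight when x_ik ≠ 0 and y_i = 1 and is 0 otherwise. Both forms, and the function names, are hypothetical.

```python
def half_life_weight(t: float, xi: float = 2.0) -> float:
    """Assumed decay: the topic's influence halves every xi periods after initiation."""
    return 0.5 ** (t / xi)


def self_factor(x_ik: float, y_i: int, t: float, xi: float = 2.0) -> float:
    """Assumed f_k(x_ik, y_i): the decay weight if x_ik != 0 and y_i == 1, else 0."""
    if x_ik != 0 and y_i == 1:
        return half_life_weight(t, xi)
    return 0.0
```

Under this assumption, an attribute observed two periods after the topic starts carries half the weight it would carry at the start.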

S3: Build the model. Fig. 2 shows the factor graph of the prediction model. Whether candidate user v_i will participate in the topic in period t+1 is mainly determined by the factor function f(·) related to the candidate user itself, the drive g(·) from the candidate user's friends, and the push h(·) from the candidate user's community. From the candidate users V_t extracted in period t, the friend relationship set E_t among candidate users, and the attribute sets X, paf(V), gaf(V) of the three aspects (candidate user itself, friends, and community), build the input network for period t, G_t = (V_t, E_t, X, paf(V), gaf(V)), and construct the factor graph of the prediction model. Compute the marginal probability of each candidate user and, from these marginals, predict whether each candidate user will participate in the topic in the next time period.

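Our goal is to obtain the values of the set Y_{t+1} given the input network G_t, where Y_{t+1} indicates whether each candidate user will join the topic discussion at t+1. This is done through the conditional probability P(Y_{t+1} | G_t). Following the Hammersley-Clifford theorem and the definition of a Markov random field, the probability model is

$$P_\theta(Y\mid G)=\frac{1}{Z}\exp\Big\{\sum_{i=1}^{N}\sum_{k=1}^{4}\alpha_k f_k(x_{ik},y_i)+\sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{l=1}^{2}\beta_l\,g_l\big(y_i,y_j,paf_l(v_j)\big)\,\mathbb{I}[e_{ij}]+\sum_{i=1}^{N}\sum_{C_m\in C}\gamma\,h\big(y_i,gaf(v_i,C_m)\big)\,\mathbb{I}[v_i,C_m]\Big\}$$

and the log-likelihood objective function is O(θ) = log P_θ(Y | G).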

For brevity, the time subscripts are omitted. Here f_k(x_ik, y_i), g_l(y_i, y_j, paf_l(v_j)) and h(y_i, gaf(v_i, C_m)) denote, respectively, the factor functions related to the candidate user itself, to the candidate user's friends, and to the communities the candidate user belongs to, and N is the total number of candidate users. e_ij denotes the friend relation between candidate users v_i and v_j: e_ij = 1 if v_i and v_j are friends, and e_ij = 0 if they are not. 𝕀[e_ij] is an indicator function of whether v_i and v_j are friends: 𝕀[e_ij] = 1 if they are, and 𝕀[e_ij] = 0 otherwise. E is the set of friend relations. Likewise, 𝕀[v_i, C_m] indicates whether candidate user v_i belongs to community C_m: 𝕀[v_i, C_m] = 1 if it does, and 𝕀[v_i, C_m] = 0 otherwise. C is the set of communities. P_θ(Y | G) is the probability of the next-stage topic participation given the network structure. α_k is the degree of correlation between a candidate user's participation behavior and the k-th self-attribute function f_k(x_ik, y_i); likewise, β_l is the degree of correlation between the participation behavior and the friend-related function g_l(y_i, y_j, paf_l(v_j)), and γ is the degree of correlation between the participation behavior and the community-related factor function. For convenience, the invention denotes these parameters collectively by θ, i.e.
θ = ({α}, {β}, γ), where {α} is the set of α_k, k = 1, 2, 3, 4, and {β} is the set of β_l, l = 1, 2. Z is a normalization factor that ensures the probabilities sum to 1. Predicting the next-stage topic participation of candidate users thus reduces to finding the parameter set θ = ({α}, {β}, γ) that maximizes the objective function. How the model is solved and how the next-stage topic participation of candidate users is predicted are described in detail in the following part.
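The probability model referred to above does not survive in this text. The following is a sketch of the usual log-linear (factor graph) form that the symbol definitions describe; the exact way the indicator terms enter, and taking the objective O(θ) as the logarithm of this probability, are assumptions rather than the patent's verbatim formula:

```latex
P_\theta(Y \mid G) = \frac{1}{Z}\exp\Bigg\{
    \sum_{i=1}^{N}\sum_{k=1}^{4}\alpha_k\, f_k(x_{ik}, y_i)
  + \sum_{i=1}^{N}\sum_{j=1}^{N}\sum_{l=1}^{2}\beta_l\,\mathbb{I}[e_{ij}]\, g_l\big(y_i, y_j, paf_l(v_j)\big)
  + \sum_{i=1}^{N}\sum_{C_m \in C}\gamma\,\mathbb{I}[v_i, C_m]\, h\big(y_i, gaf(v_i, C_m)\big)
\Bigg\},
\qquad
\mathcal{O}(\theta) = \log P_\theta(Y \mid G)
```

Here Z is the same exponential term summed over all 2^N assignments of Y.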

Figure 3 shows the parameter-fitting flowchart.

S31: Input the network of time period t, G_t = (V_t, E_t, X, paf(V), gaf(V)), together with the initial values of θ = ({α}, {β}, γ) and of the learning rate η.

S32: In the prediction-model factor graph, compute the marginal probability p(y_i) of each candidate user and make predictions from these marginal probabilities. Many algorithms can compute p(y_i); commonly used ones include the Junction Tree algorithm, BP (Belief Propagation) and LBP (Loopy Belief Propagation). The following steps further improve the prediction accuracy:

S33: Compute the change (gradient) of each parameter from the marginal probabilities. Taking α_k as an example:
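The formula itself is missing from this text. Assuming the log-linear model with objective O(θ) = log P_θ(Y | G), the change for α_k is the standard difference between the data expectation and the model expectation of the factor function, which matches the two terms explained next:

```latex
\frac{\partial \mathcal{O}(\theta)}{\partial \alpha_k}
  = \mathbb{E}\big[f_k(x_{ik}, y_i)\big]
  - \mathbb{E}_{P_\theta(Y \mid G)}\big[f_k(x_{ik}, y_i)\big]
```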

Here E[f_k(x_ik, y_i)] is the expected value of the factor function f_k(x_ik, y_i) computed from the real data of time period t+1, and E_{P_θ(Y|G)}[f_k(x_ik, y_i)] is its expected value under the marginal probabilities computed by the model. The changes of β and γ are computed in the same way.

S34: Update the values of α_k, β_l and γ with the gradient descent algorithm. Taking α_k as an example, the formula is as follows:
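The update formula is likewise not reproduced here. Because the objective is maximized, this sketch assumes a step along the gradient with learning rate η (ascent on O(θ), equivalently descent on -O(θ)); β_l and γ would be updated the same way:

```latex
\alpha_k \leftarrow \alpha_k + \eta\,\frac{\partial \mathcal{O}(\theta)}{\partial \alpha_k}
  = \alpha_k + \eta\Big(\mathbb{E}\big[f_k(x_{ik}, y_i)\big]
  - \mathbb{E}_{P_\theta(Y \mid G)}\big[f_k(x_{ik}, y_i)\big]\Big)
```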

S35: After α_k, β_l and γ have been updated, check for convergence. Different convergence criteria are possible; this invention treats the parameters as converged when the change in every parameter is below a threshold. If they have converged, go to step S36; otherwise, recompute the marginal probabilities.

S36: Output the converged value of θ = ({α}, {β}, γ).
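Steps S31 to S36 can be sketched in Python for a toy network small enough to enumerate every assignment of Y, so exact marginals stand in for the Junction Tree, BP or LBP computation of S32. f_k follows claim 3 (1 when x_ik ≠ 0 and y_i = 1); the forms of g_l and h, the single community per user, and all function names are hypothetical illustrations, not the patent's definitions. The update is gradient ascent on the log-likelihood, and convergence is declared when every parameter change falls below tol, as in S35.

```python
import itertools
import math

# Toy sketch of steps S31-S36. Hypothetical assumptions (not from the patent):
#   * each user belongs to one community, gaf[i] = gaf(v_i, C_m) for that community;
#   * g_l fires when friends i and j both participate and paf_l(v_j) != 0;
#   * h contributes gaf[i] when y_i = 1;
#   * exact brute-force marginals replace Junction Tree / BP / LBP (tiny N only).


def features(y, x, friends, paf, gaf):
    """Total feature vector F(y): K self factors, L friend factors, 1 community factor."""
    K, L = len(x[0]), len(paf[0])
    F = [0.0] * (K + L + 1)
    for i, y_i in enumerate(y):
        for k in range(K):
            if x[i][k] != 0 and y_i == 1:  # f_k(x_ik, y_i) as in claim 3
                F[k] += 1.0
        F[K + L] += gaf[i] * y_i  # assumed community factor h
    for i, j in friends:  # ordered pairs (i, j) with I[e_ij] = 1
        for l in range(L):
            if y[i] == 1 and y[j] == 1 and paf[j][l] != 0:  # assumed g_l
                F[K + l] += 1.0
    return F


def model_stats(theta, x, friends, paf, gaf):
    """Model expectations E_P[F], marginals p(y_i = 1) and log Z by enumeration."""
    configs = list(itertools.product((0, 1), repeat=len(x)))
    feats = [features(y, x, friends, paf, gaf) for y in configs]
    scores = [sum(t * f for t, f in zip(theta, F)) for F in feats]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]  # shift by max for stability
    total = sum(weights)
    probs = [w / total for w in weights]
    expect = [sum(p * F[d] for p, F in zip(probs, feats)) for d in range(len(theta))]
    marginals = [sum(p for p, y in zip(probs, configs) if y[i] == 1) for i in range(len(x))]
    return expect, marginals, top + math.log(total)


def log_likelihood(theta, y_obs, x, friends, paf, gaf):
    """O(theta) = log P_theta(y_obs | G) = theta . F(y_obs) - log Z."""
    F_obs = features(y_obs, x, friends, paf, gaf)
    _, _, log_z = model_stats(theta, x, friends, paf, gaf)
    return sum(t * f for t, f in zip(theta, F_obs)) - log_z


def fit(y_obs, x, friends, paf, gaf, theta0, eta=0.1, tol=1e-4, max_iter=2000):
    """S31-S36: returns (theta, converged)."""
    theta = list(theta0)  # S31: initial theta; eta is the learning rate
    F_obs = features(y_obs, x, friends, paf, gaf)  # feature counts on observed period t+1
    for _ in range(max_iter):
        expect, _, _ = model_stats(theta, x, friends, paf, gaf)  # S32: model marginals
        step = [eta * (fo - fe) for fo, fe in zip(F_obs, expect)]  # S33: eta * gradient
        theta = [t + s for t, s in zip(theta, step)]  # S34: ascent on O(theta)
        if max(abs(s) for s in step) < tol:  # S35: every change below threshold
            return theta, True  # S36: converged theta
    return theta, False
```

Whether fit converges within max_iter depends on the data: with a single observed period the maximum-likelihood estimate can be unbounded when an observed feature count sits at an extreme, so the converged flag should be checked rather than assumed.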

S4: Using the fitted θ = ({α}, {β}, γ) and the network G_t = (V_t, E_t, X, paf(V), gaf(V)) of any time period t, adjust the prediction-model factor graph to the given network, compute the marginal probabilities p(y_i), and obtain the prediction result.

The prediction result can be used to analyze the participation popularity of the topic in the next time period, that is, how many people will join the discussion of the topic in that period.
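On a realistic network, enumerating all assignments of Y is infeasible, so p(y_i) in step S4 has to come from one of the algorithms named in S32. Below is a minimal Python sketch of sum-product loopy belief propagation on a binary pairwise Markov random field, followed by the participant count described above. It assumes the fitted factors have already been folded into a log-potential unary[i] for y_i = 1 (for instance Σ_k α_k·[x_ik ≠ 0] + γ·gaf(v_i, C_m)) and one log-weight per friendship for y_i = y_j = 1; the 0.5 decision threshold and the function names are hypothetical.

```python
import math
from collections import defaultdict


def loopy_bp(unary, edges, n_iter=100, tol=1e-9):
    """Sum-product loopy BP on a binary pairwise MRF.

    unary[i]: log-potential added when y_i = 1 (y_i = 0 contributes 0).
    edges: {(i, j): w}, one entry per friendship; w is added when y_i = y_j = 1.
    Returns approximate marginals p(y_i = 1); exact when the graph is a tree.
    """
    nbrs = defaultdict(list)
    weight = {}
    for (i, j), w in edges.items():
        nbrs[i].append(j)
        nbrs[j].append(i)
        weight[(i, j)] = weight[(j, i)] = w
    # msg[(i, j)] = normalised message from i to j as [m(y_j = 0), m(y_j = 1)]
    msg = {(i, j): [0.5, 0.5] for i in list(nbrs) for j in nbrs[i]}
    for _ in range(n_iter):
        new, delta = {}, 0.0
        for i, j in msg:
            # unary potential of i times all incoming messages except the one from j
            pre = [1.0, math.exp(unary[i])]
            for k in nbrs[i]:
                if k != j:
                    pre[0] *= msg[(k, i)][0]
                    pre[1] *= msg[(k, i)][1]
            out0 = pre[0] + pre[1]  # y_j = 0: pairwise factor is 1
            out1 = pre[0] + pre[1] * math.exp(weight[(i, j)])  # y_j = 1
            new[(i, j)] = [out0 / (out0 + out1), out1 / (out0 + out1)]
            delta = max(delta, abs(new[(i, j)][1] - msg[(i, j)][1]))
        msg = new
        if delta < tol:
            break
    marginals = []
    for i in range(len(unary)):
        belief = [1.0, math.exp(unary[i])]
        for k in nbrs[i]:
            belief[0] *= msg[(k, i)][0]
            belief[1] *= msg[(k, i)][1]
        marginals.append(belief[1] / (belief[0] + belief[1]))
    return marginals


def participation_heat(marginals, threshold=0.5):
    """Predicted participants (p > threshold) and expected participant count."""
    predicted = [i for i, p in enumerate(marginals) if p > threshold]
    return predicted, sum(marginals)
```

On a tree-shaped friendship graph the result equals the exact marginals; on graphs with cycles LBP is an approximation and may not converge, in which case the loop stops after n_iter rounds. Summing the marginals gives the expected number of participants by linearity of expectation, which is one way to read the next-period participation popularity.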

The invention uses the interaction data of hot topics in a social network to analyze the related attributes of users, takes the followers of users who participate in a topic as candidate users, and predicts whether these candidate users will also join the discussion of the topic in future time periods. First, to capture the underlying dynamic causes of user behavior, time discretization and time slicing are applied, the effect of the timeliness of topic information is taken into account, and correlation factor functions are defined from three aspects: the candidate user itself, the candidate user's friends, and the communities the candidate user belongs to. Then, since information propagation has the Markov property, the basic ideas and methods of Markov random field theory are used to build a prediction model for participation in hot topics, so that the participation behavior of candidate users can be predicted dynamically and the future participation popularity of the hot topic can be analyzed.

It should be understood that the specific embodiments above are intended to help those skilled in the art and readers better understand how the invention is implemented, and that the scope of protection of the invention is not limited to these particular statements and embodiments. Therefore, although the invention has been described in detail with reference to the drawings and embodiments, those skilled in the art will understand that the invention can still be modified or equivalently substituted; in short, all technical solutions and improvements that do not depart from the spirit and scope of the invention shall fall within the scope of protection of this patent.

Claims (8)

1. A method for predicting user participation in hot topics, characterized in that: a data-source acquisition module uses the interaction data of hot topics in a social network and takes the followers of users who participate in a topic as candidate users; an attribute extraction module obtains the correlation factor functions of three parts from the candidate user's own attributes, the attributes of the candidate user's friends, and the attributes of the communities the candidate user belongs to; a prediction model for participation in hot topics is built and its parameters are fitted; the fitted parameters and the topic participation at any time t are input into the prediction model to predict whether the candidate users will join the discussion of the topic at the next time step; and, according to the prediction result, the social network and traffic information of the candidate users at the next time step are obtained and the network structure is adjusted.
2. The prediction method according to claim 1, characterized in that the candidate user's own attributes include: whether the candidate user is an active user; whether the candidate user's tags contain keywords matching the hot topic; how many of the users followed by the candidate user have already participated in the topic; and the total topic driving force of the users followed by the candidate user.
3. The prediction method according to claim 1, characterized in that the correlation factor functions represent the dependency between a candidate user and the related attributes. The factor function related to the candidate user's own attributes is determined by f_k(x_ik, y_i) = 1 if x_ik ≠ 0 ∩ y_i = 1, and f_k(x_ik, y_i) = 0 otherwise, where f_k(x_ik, y_i) represents the dependency between the candidate user's participation behavior and its own related attributes, y_i = 1 means that candidate user v_i will join the discussion of the topic in the next time period, and x_ik ≠ 0 ∩ y_i = 1 means that the k-th attribute value of v_i is nonzero and v_i joins the topic in the next time period. The factor function related to the candidate user's friends is determined by a formula in which g_l(y_i, y_j, paf_l(v_j)) represents the dependency between the participation behavior of candidate user v_i and the attribute paf_l(v_j) of its friend v_j, and paf_l(v_j) is the l-th attribute value of v_j. The factor function of the communities the candidate user belongs to is determined by a formula in which h(y_i, gaf(v_i, C_m)) represents the dependency between the candidate user's participation behavior and the attribute gaf(v_i, C_m) of its community.
4. The prediction method according to claim 2, characterized in that determining whether a candidate user is an active user further comprises judging, according to a formula, whether candidate user v_i is an active user, where x_i1(v_i) is the 1st attribute of v_i, activity(v_i) is the activity index of user v_i, and κ is the ranking threshold of the user activity index.
5. The prediction method according to claim 4, characterized in that the activity index activity(v_i) of user v_i is determined by activity(v_i) = ρ * E[origNum(v_i)] + E[retwNum(v_i)], where E[origNum(v_i)] and E[retwNum(v_i)] are, respectively, the average daily number of original microblog posts and the average daily number of reposted microblog posts of user v_i over a predetermined period before the topic is initiated, and ρ is the conversion factor for original microblog posts.
6. The prediction method according to claim 2, characterized in that determining the total topic driving force of the users followed by the candidate user further comprises determining the total topic driving force of the users followed by candidate user v_i as the sum of inf(u_k) over k = 1, ..., n, where inf(u_k) is the topic driving force of followed user u_k and n is the number of users followed by v_i who have participated in the topic; inf(u_k) is obtained from
inf(u_k) = ln(E[readNum(u_k)] + 1) + E[retNum(u_k)] + E[comNum(u_k)], where E[readNum(u_k)], E[retNum(u_k)] and E[comNum(u_k)] are, respectively, the expected view count, expected repost count and expected comment count of the original and reposted microblog posts of user u_k within a predetermined period before the topic is initiated.
7. The prediction method according to claim 1, characterized in that the candidate user's friend attributes include whether the friend is a verified user and whether the friend is an opinion leader. Whether friend v_i is an opinion leader is determined by a formula based on deg(v_i), the follower characteristic value of v_i, given by deg(v_i) = σ × [fans(v_i) - mutfans(v_i)] + mutfans(v_i), where fans(v_i) and mutfans(v_i) are, respectively, the number of followers of friend v_i and the number of users who follow each other with v_i; ζ is the ranking threshold of the follower characteristic value, and σ narrows the gap between follower-count attribute values.
8. The prediction method according to claim 1, characterized in that the prediction model's prediction of whether a candidate user will join the topic in the next time period specifically comprises: from the candidate users V_t extracted in time period t, the set E_t of friend relations between candidate users, and the attribute sets X, paf(V) and gaf(V) of the candidate users, their friends and their communities, establishing the input network G_t = (V_t, E_t, X, paf(V), gaf(V)) of time period t and building the prediction-model factor graph; computing the marginal probability of each candidate user; and predicting, according to the marginal probabilities, whether each candidate user will join the topic in the next time period.
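Claims 4 to 7 define several candidate-user and friend attributes arithmetically. The sketch below implements those definitions in Python. The ranking reading of κ and ζ (a flag of 1 when a score ranks within the top κ or ζ users), tie-breaking by user index, and treating the total topic driving force as a plain sum are assumptions, and the function names are hypothetical.

```python
import math


def activity(orig_per_day, retw_per_day, rho):
    """Claim 5: activity(v_i) = rho * E[origNum(v_i)] + E[retwNum(v_i)]."""
    return rho * orig_per_day + retw_per_day


def follower_value(fans, mutfans, sigma):
    """Claim 7: deg(v_i) = sigma * (fans(v_i) - mutfans(v_i)) + mutfans(v_i)."""
    return sigma * (fans - mutfans) + mutfans


def top_rank_flags(scores, rank_threshold):
    """Assumed reading of claims 4 and 7: 1 if a user's score ranks in the top
    `rank_threshold` (kappa for activity, zeta for deg), else 0.
    sorted() is stable even with reverse=True, so ties go to the lower index."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = set(order[:rank_threshold])
    return [1 if i in top else 0 for i in range(len(scores))]


def influence(read_mean, ret_mean, com_mean):
    """Claim 6: inf(u_k) = ln(E[readNum(u_k)] + 1) + E[retNum(u_k)] + E[comNum(u_k)]."""
    return math.log(read_mean + 1) + ret_mean + com_mean


def total_driving_force(followees):
    """Claim 6: total topic driving force of the n followed users who joined the
    topic, taken here as the sum of inf(u_k); each entry is (read, ret, com) means."""
    return sum(influence(r, t, c) for r, t, c in followees)
```

For example, the active-user attribute x_i1 of claim 4 could be obtained as top_rank_flags([activity(o, r, rho) for o, r in daily_means], kappa), where daily_means is a hypothetical list of per-user (original, repost) daily averages.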
CN201610083734.XA 2016-02-07 2016-02-07 Prediction method for user participating in hot topics in social network CN105809554B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610083734.XA CN105809554B (en) 2016-02-07 2016-02-07 Prediction method for user participating in hot topics in social network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610083734.XA CN105809554B (en) 2016-02-07 2016-02-07 Prediction method for user participating in hot topics in social network

Publications (2)

Publication Number Publication Date
CN105809554A true CN105809554A (en) 2016-07-27
CN105809554B CN105809554B (en) 2020-03-17

Family

ID=56466277

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610083734.XA CN105809554B (en) 2016-02-07 2016-02-07 Prediction method for user participating in hot topics in social network

Country Status (1)

Country Link
CN (1) CN105809554B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106557552A (en) * 2016-10-27 2017-04-05 国家计算机网络与信息安全管理中心 Network topic popularity prediction method
CN106651016A (en) * 2016-12-13 2017-05-10 重庆邮电大学 System and method for dynamically predicting user behaviors under hot topics
CN106649714A (en) * 2016-12-21 2017-05-10 重庆邮电大学 topN recommendation system and method for data non-uniformity and data sparsity
CN106682770A (en) * 2016-12-14 2017-05-17 重庆邮电大学 Friend circle-based dynamic microblog forwarding behavior prediction system and method
CN107358534A (en) * 2017-06-29 2017-11-17 浙江理工大学 Unbiased data collection system and collection method for social networks

Citations (6)

Publication number Priority date Publication date Assignee Title
CN103092921A (en) * 2012-12-26 2013-05-08 中国科学院深圳先进技术研究院 Dynamic prediction method and system for microblog hot-topic communities
CN104216954A (en) * 2014-08-20 2014-12-17 北京邮电大学 Prediction device and prediction method for state of emergency topic
CN104408108A (en) * 2014-11-18 2015-03-11 重庆邮电大学 Hot topic group influence analysis system and method based on grey system theory
US20150356571A1 (en) * 2014-06-05 2015-12-10 Adobe Systems Incorporated Trending Topics Tracking
CN105224608A (en) * 2015-09-06 2016-01-06 华南理工大学 Hot news prediction method and system based on microblog data analysis
CN105243448A (en) * 2015-10-13 2016-01-13 北京交通大学 Method and device for predicting evolution trend of internet public opinion

Also Published As

Publication number Publication date
CN105809554B (en) 2020-03-17

Similar Documents

Publication Publication Date Title
Tang et al. Exploiting homophily effect for trust prediction
Bourigault et al. Representation learning for information diffusion through social networks: an embedded cascade model
Squartini et al. Reciprocity of weighted networks
Pinto et al. Using early view patterns to predict the popularity of youtube videos
Perra et al. Activity driven modeling of time varying networks
US8909646B1 (en) Pre-processing of social network structures for fast discovery of cohesive groups
Wang et al. Opportunity model for e-commerce recommendation: right product; right time
Abel et al. Analyzing user modeling on twitter for personalized news recommendations
Duan et al. Motivating smartphone collaboration in data acquisition and distributed computing
CN104254852B (en) Method and system for mixed information inquiry
Chaoji et al. Recommendations to boost content spread in social networks
Bedi et al. Trust based recommender system using ant colony for trust computation
Schneider et al. Unravelling daily human mobility motifs
Maia et al. Identifying user behavior in online social networks
Galuba et al. Outtweeting the twitterers-predicting information cascades in microblogs.
Liu et al. Personalized travel package recommendation
Wang et al. SentiView: Sentiment analysis and visualization for internet popular topics
Yang et al. Predicting the speed, scale, and range of information diffusion in twitter
Wang et al. Diffusive logistic model towards predicting information diffusion in online social networks
CN103177090B (en) Topic detection method and device based on big data
CN104471571B (en) System and method for indexing, sequencing and analyzing web activities under an event-driven framework
Yin et al. Structural link analysis and prediction in microblogs
Zhang et al. A collective bayesian poisson factorization model for cold-start local event recommendation
CN105260474B (en) Method for computing microblog user influence based on an information exchange network
Pham et al. S3g2: A scalable structure-correlated social graph generator

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant