CN105809554B - Prediction method for user participating in hot topics in social network - Google Patents

Info

Publication number
CN105809554B
Authority
CN
China
Legal status
Active
Application number
CN201610083734.XA
Other languages
Chinese (zh)
Other versions
CN105809554A (en)
Inventor
肖云鹏
赖佳伟
刘宴兵
叶青
王宇航
黄恺
李露
李松阳
Current Assignee
Chongqing University of Post and Telecommunications
Original Assignee
Chongqing University of Post and Telecommunications
Application filed by Chongqing University of Post and Telecommunications
Priority to CN201610083734.XA
Publication of CN105809554A
Application granted
Publication of CN105809554B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9535 Search customisation based on user profiles and personalisation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Human Resources & Organizations (AREA)
  • Marketing (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Quality & Reliability (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Engineering & Computer Science (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Data Mining & Analysis (AREA)
  • Development Economics (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Primary Health Care (AREA)
  • Information Transfer Between Computers (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the technical field of computer network information analysis. Working from online users and their friend-relationship network, it considers the three-way relationship among a fan's personal interests, the users the fan follows, and the fan's community, and it incorporates the timeliness of topic information through time discretization and time slicing. It also addresses the uneven data distribution and sparse network structure found at every stage of a hot topic's life cycle. A hot-topic participation prediction model is constructed and fitted, and data are fed into the model to analyze the driving factors behind fans' participation in topic discussion. This makes it possible to dynamically predict whether the fans of users who have joined a topic will also join the discussion, and to mine both the driving factors of participation and the popularity trend of the topic.

Description

Prediction method for user participating in hot topics in social network
Technical Field
The invention belongs to the field of computer information technology analysis, and in particular relates to predicting users' participation in hot topics.
Background
With the rapid spread of social networking applications, users spend more and more time on social networks, and the information they leave behind turns each network into a huge information platform. Analyzing this platform reveals user behavior and the rules by which information propagates, which supports a more rational allocation and use of network traffic, network capacity and network resources.
At present, public opinion generated by hot topics and hot events on social networks is growing, and hot-topic analysis has gradually become a research hotspot. Existing work explores, to varying degrees, user influence, information propagation and user behavior, using text-based analysis, user-influence-based analysis and network-structure-based analysis, among other methods. Text-based analysis mainly estimates the propagation range of a hot topic from the intensity of its information. User-influence-based analysis studies how users in a social network influence one another and combines that influence with the behavioral factors behind forwarding and commenting, in order to mine how information propagates through the network. For example, Sen Wu et al., in "Confluence: Conformity Influence in Large Social Networks", analyzed the past behavior of users across the whole network and built an influence model from it. Network-structure-based analysis mainly uses small-world theory, node in-degree and out-degree and similar properties, treating topic propagation as reaching a stable, balanced state within the network structure, and thereby predicts how a hot topic spreads. For example, Jing Zhang et al., in their work on predicting forwarded microblogs from a user's circle of friends ("Who Influenced You?"), studied a prediction model based on the user's friend circle.
However, the prior art does not fully account for the complex, dynamic causes of user behavior, nor for the uneven data distribution and sparse network structure that hot topics exhibit at each stage of their life cycle.
Disclosure of Invention
Problem to be solved: user behavior has complex, dynamic causes, and hot topics suffer from uneven data distribution and sparse network structure at every stage of their life cycle. To address this, the invention provides a method for predicting users' participation in hot topics. The method takes the fans of users who have already participated in a topic as alternative users and focuses the analysis on them. It considers three sources of influence (the alternative user's own attributes, the drive from the alternative user's friends, and the community to which the alternative user belongs), adds the effect of topic timeliness, and builds the participation prediction model using random field theory.
The prediction method comprises the following steps. A data source acquisition module uses the interaction data of a hot topic in the social network and takes the fans of users who have already participated in the topic as alternative users. An attribute extraction module derives three groups of correlation factor functions, from the alternative user's own attributes, the attributes of the alternative user's friends, and the attributes of the community to which the alternative user belongs. A prediction model for hot-topic participation is then constructed and its parameters are fitted. Finally, the fitted parameters and the topic participation at any time t are fed into the prediction model to predict whether each alternative user will join the topic discussion at the next time step. Based on the prediction, the social network and data-stream information associated with the alternative users at the next time step are obtained and the network structure is adjusted, for example by allocating more network resources to the parts of the social network involved in topics that alternative users are predicted to join.
The alternative user's own attributes are: whether alternative user v_i is an active user, isActivity(v_i); whether the tags of v_i contain a keyword consistent with the hot topic, isSameTag(v_i); how many of the users followed by v_i have already participated in the topic, countOfHF(v_i); and the topic driving force of the users followed by v_i, inf(v_i). For convenience, the invention writes these four attributes uniformly as x_ik, the k-th attribute of alternative user v_i: x_i1 = isActivity(v_i); x_i2 = isSameTag(v_i); x_i3 = countOfHF(v_i); x_i4 = inf(v_i). The factor function associated with the alternative user is determined according to the formula:
[Equation image not reproduced: factor function f_k(x_ik, y_i)]
where f_k(x_ik, y_i) represents the correlation between the alternative user's participation behavior and its own attributes; y_i = 1 denotes that alternative user v_i will participate in the topic discussion in the next time period, and x_ik ≠ 0 ∩ y_i = 1 denotes that the k-th attribute of v_i is nonzero and that v_i will participate in the topic in the next time period. The factor function related to the alternative user's friends is determined according to the formula:
[Equation image not reproduced: factor function g_l(y_i, y_j, paf_l(v_j))]
where g_l(y_i, y_j, paf_l(v_j)) represents the correlation between the alternative user's participation behavior and the friend attribute paf_l(v_j), the l-th attribute value of friend v_j. The factor function of the community to which the alternative user belongs is determined according to the formula:
[Equation image not reproduced: factor function h(y_i, gaf(v_i, C_m))]
where h(y_i, gaf(v_i, C_m)) represents the correlation between the alternative user's participation behavior and the community attribute gaf(v_i, C_m), which indicates whether the community C_m to which v_i belongs is a τ-community; C_m denotes the m-th community. Communities are obtained by applying a community detection algorithm to the topic's user-relationship network.
Determining whether the alternative user is an active user further comprises judging, according to the formula:
[Equation image not reproduced: definition of x_i1 = isActivity(v_i)]
whether alternative user v_i is an active user, where x_i1 is the 1st attribute of v_i, activity(v_i) is the activity index of user v_i, and k is the ranking threshold on the user activity index; activity(v_i) is computed as activity(v_i) = ρ*E[origNum(v_i)] + E[retwNum(v_i)], where E[origNum(v_i)] and E[retwNum(v_i)] are, respectively, the daily average number of original microblogs and of forwarded microblogs posted by v_i within a preset period before the topic was launched, and ρ is a tunable parameter.
Determining the topic driving force of the users followed by the alternative user further comprises determining, according to the formula:
[Equation image not reproduced: total topic driving force inf(v_i)]
the total topic driving force inf(v_i) exerted on alternative user v_i by its followed users u_k, where inf(u_k) is the topic driving force of followed user u_k and n is the total number of users followed by v_i who have already participated in the topic.
The topic driving force of followed user u_k is obtained according to the formula:
inf(u_k) = ln(E[readNum(u_k)] + 1) + E[retNum(u_k)] + E[comNum(u_k)]
where E[readNum(u_k)], E[retNum(u_k)] and E[comNum(u_k)] are, respectively, the expected number of views, forwards and comments of the original and forwarded microblogs created by u_k within a certain period before the topic was launched.
The friend attributes of the alternative user comprise whether the friend is an authenticated user and whether the friend is an opinion leader. For ease of description, the invention writes these attributes uniformly as paf_k(v_i), the k-th attribute of friend v_i. paf_1(v_i) indicates whether v_i is an authenticated user: paf_1(v_i) = 1 if it is and paf_1(v_i) = 0 otherwise. paf_2(v_i) indicates whether v_i is an opinion leader: paf_2(v_i) = 1 if it is and paf_2(v_i) = 0 otherwise. Whether friend v_i is an opinion leader is determined by the formula:
[Equation image not reproduced: definition of paf_2(v_i)]
where deg(v_i) is the fan characteristic value of v_i, computed as deg(v_i) = σ×[fans(v_i) - mutfans(v_i)] + mutfans(v_i); fans(v_i) and mutfans(v_i) are, respectively, the number of fans of friend v_i and the number of its mutual friends; ζ is the ranking threshold on the fan characteristic value; and σ is a tunable parameter.
The attribute gaf(v_i, C_m) of the community to which the alternative user belongs indicates whether the community C_m of alternative user v_i is a τ-community, where C_m denotes the m-th community. It is determined according to the formula:
[Equation image not reproduced: definition of gaf(v_i, C_m)]
where τ(C_m) > ψ means that the percentage of users in community C_m who have participated in the topic exceeds a set factor ψ.
Predicting whether an alternative user will participate in the topic in the next time period specifically comprises: extracting, for time period t, the set of alternative users V_t and the set of friend relationships among them
[Equation image not reproduced: definition of the friend-relationship set E_t]
together with the attribute sets X, paf(V) and gaf(V) of the alternative users, their friends and their communities; building from them the input network G_t = (V_t, E_t, X, paf(V), gaf(V)) for period t; and estimating the parameter set θ = ({α}, {β}, γ) that maximizes the objective function, which yields the prediction of whether each alternative user will participate in the topic in the next time period.
The invention thus provides a method for predicting users' participation in hot topics. First, from the network-structure perspective: in the information propagation network, a hot topic spreads mainly because the fans of participating users forward it, and their fans in turn forward it again. The analysis therefore concentrates on the fans of users who have already participated in the topic, and these fans are taken as alternative users. Second, from the influence perspective: a user's participation in a hot topic is driven mainly by three factors, namely personal interest, the users the person follows, and promotion by the community. Relevant attributes are therefore extracted for the alternative user, the alternative user's friends and the alternative user's community, and the effect of topic timeliness is added. Finally, because information propagation between users has the Markov property, i.e. a node's behavior is influenced only by itself and a limited number of surrounding nodes, the prediction model is built using the ideas and methods of Markov random field theory. Using the interaction data of a hot topic in the social network, the method takes the fans of users who have participated as alternative users and predicts whether they will also join the discussion in a future time period.
A user's participation in a hot topic is driven by personal motivation, friend motivation and community motivation, so attributes of the alternative user, the alternative user's friends and the alternative user's community are extracted separately. For the alternative user, attributes such as whether the user is active and whether the user has tag keywords matching the topic are extracted; for the alternative user's friends, attributes such as whether the friend is an authenticated user and whether the friend is an opinion leader; and for the community, the number of members who have already joined the topic discussion. Timeliness factors of topic information are added when the three groups of correlation factor functions are determined. In addition, because information propagation has the Markov property, the prediction model is built using the ideas and methods of Markov random field theory. The method can dynamically predict alternative users' participation and analyze the future popularity of a hot topic, that is, how many people will join the discussion in the next time period.
The method addresses the complex, dynamic causes of user behavior and the uneven data distribution and sparse network structure of hot topics at each stage of their life cycle, in order to predict user participation and the future popularity of hot topics accurately. From these predictions, the hot topics that alternative users are about to join can be identified, network-traffic statistics for the next time step can be obtained, and the network structure, bandwidth allocation and so on can be adjusted in real time.
Drawings
FIG. 1 is a general flow diagram of the present invention;
FIG. 2 is a prediction model factor graph of the present invention;
FIG. 3 is a flowchart of parameter fitting.
Detailed Description
To better explain the invention, its concrete implementation is described below by way of an example, with reference to the accompanying drawings.
A user's participation in a hot topic is driven by individual, friend and community motivation. Accordingly, based on the three-way relationship among a fan's personal interests, the users the fan follows, and the fan's community, the correlation factor functions for the three aspects are determined with the timeliness of topic information incorporated through time discretization and time slicing. To address the uneven data distribution and sparse network structure at each stage of a hot topic's life cycle, a dynamic prediction model for hot-topic participation is constructed, so that whether the fans of participating users will also join the discussion can be predicted dynamically, and the driving factors of participation and the popularity trend of the topic can be mined.
Specifically, given a hot topic, let G_U^t = (U_t, E_U, A_U) be its social network in time period t, where U_t is the set of users related to the topic in period t (the users who participated during period t and their fans) and
[Equation image not reproduced: definition of E_U]
represents the relationships among all users. A series of topic propagation behaviors
[Equation image not reproduced: set of topic propagation behaviors]
represents how topic information propagates among users. From the existing network G_U^t, an alternative network G_V^t = (V_t, E_V, A_V) is extracted, where V_t is the set of alternative users. The task is to predict Y_{t+1} = {y_1, y_2, ..., y_n}, where y_i indicates whether alternative user v_i will join the topic discussion in the next time period t+1; this can be expressed as:
[Equation image not reproduced: prediction function from the period-t network to Y_{t+1}]
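The problem setup can be held in a small data structure. The sketch below is a minimal illustration only, assuming that friend relationships among alternative users are undirected pairs and that the attributes A_V are stored per user as a name-to-value map; the class and method names (AlternativeNetwork, add_friendship, neighbours, Prediction) are hypothetical and not part of the original method.

```python
from dataclasses import dataclass, field


@dataclass
class AlternativeNetwork:
    """Hypothetical container for G_V^t = (V_t, E_V, A_V) in time period t."""

    period: int
    users: set[str]  # V_t: alternative users
    # E_V: friend relationships, stored as undirected pairs (an assumption)
    edges: set[frozenset[str]] = field(default_factory=set)
    # A_V: per-user attribute values, e.g. {"isActivity": 1.0}
    attributes: dict[str, dict[str, float]] = field(default_factory=dict)

    def add_friendship(self, a: str, b: str) -> None:
        """Record a friendship only if both endpoints are distinct alternative users."""
        if a != b and a in self.users and b in self.users:
            self.edges.add(frozenset((a, b)))

    def neighbours(self, v: str) -> set[str]:
        """Friends of v inside the alternative network."""
        return {w for e in self.edges if v in e for w in e if w != v}


# Y_{t+1}: one 0/1 label per alternative user (1 = joins the topic in period t+1)
Prediction = dict[str, int]
```

Keeping E_V restricted to alternative users mirrors the text: only relationships among alternative users enter the prediction network.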
The invention predicts, from the participation observed under a topic in time period t, whether certain users will join the discussion of that topic in period t+1. FIG. 1 shows the general flow of the invention, which comprises a data acquisition module, an attribute analysis module, a model construction module and a prediction analysis module.
The implementation is described in detail below.
S1: Acquire the data source. The data may be downloaded directly from existing Web-based research recommendation systems or collected through the public API of a mature social platform, as follows.
The acquired data comprise, over the life cycle of the hot topic, the participation of its participants and information about the participants' fans. Topic participation covers the times at which the topic was forwarded and commented on, basic information about the participating users, and their past behavior data. Fan information covers the fans' basic information, their past behavior (forwarded and original microblogs), and the follow/followed relationships among fans.
The data source acquisition module collects user basic information, basic information about the users' fans, the friend relationships among fans, and the fans' historical behavior, specifically as follows (conventional prior-art methods may also be used):
S11: Acquire raw data. Raw data can be obtained through the social network's public API or by directly downloading an existing data source; the public API exposes the data the network makes public, and this can be supplemented by methods such as web crawling.
S111: Acquire all participants of a given hot topic and their basic information.
S112: Acquire basic information about the fans of all participants of the hot topic, together with the follow and followed relationships among the fans.
S113: Acquire past behavior data of all participants of the hot topic and of their fans.
S12: Simple data cleaning. Most of the data can be made usable for analysis through simple cleaning, such as deleting duplicate records and removing invalid nodes.
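Step S12 leaves the exact cleaning rules open. As a hedged illustration, the sketch below assumes that user records are dictionaries keyed by a hypothetical `user_id` field, that an "invalid node" is a record without an identifier, and that follow relationships pointing at unknown users (or at the user itself) are discarded; the function name `clean` is likewise hypothetical.

```python
def clean(records: list[dict], follows: list[tuple[str, str]]):
    """Deduplicate user records and drop follow edges touching invalid nodes.

    Assumptions: each record carries a 'user_id'; the first occurrence wins.
    Returns (unique_records, sorted_unique_valid_edges).
    """
    by_id: dict[str, dict] = {}
    for rec in records:
        uid = rec.get("user_id")
        if not uid:  # invalid node: missing or empty identifier
            continue
        by_id.setdefault(uid, rec)  # keep the first copy of a duplicated user
    valid = {
        (a, b)
        for a, b in follows
        if a != b and a in by_id and b in by_id  # no self-loops, no dangling ends
    }
    return list(by_id.values()), sorted(valid)
```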
S13: Slice the data by time to find alternative users. Because hot topics spread rapidly, the data are sliced into time periods of a preset length (for example, 8 hours). For a given time period t, the users who participated during that period are identified, their fans are taken as alternative users, and a network is built from the friend relationships among the alternative users.
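The slicing and candidate-selection step can be sketched as follows. This is a minimal illustration, assuming timestamps in seconds measured against a known topic start, a `fans_of` mapping from each user to that user's fans, and friend relationships given as pairs; excluding users who themselves participated in period t is an added assumption, and all function names are hypothetical.

```python
SLICE_SECONDS = 8 * 3600  # example period length from the text: 8 hours


def slice_index(ts: float, topic_start: float, width: float = SLICE_SECONDS) -> int:
    """Map a timestamp (seconds) to its 0-based time-period index t."""
    return int((ts - topic_start) // width)


def alternative_users(participations, fans_of, topic_start, t, width=SLICE_SECONDS):
    """Fans of the users who participated during period t.

    participations: iterable of (user, timestamp) pairs
    fans_of: mapping user -> set of that user's fans
    Assumption: users who already participated in period t are excluded.
    """
    participants = {
        u for u, ts in participations if slice_index(ts, topic_start, width) == t
    }
    candidates: set[str] = set()
    for u in participants:
        candidates |= fans_of.get(u, set())
    return candidates - participants


def alternative_friend_edges(candidates, friend_pairs):
    """Keep only friend pairs whose two distinct endpoints are both alternative users."""
    return {
        frozenset(p) for p in friend_pairs if len(set(p)) == 2 and set(p) <= candidates
    }
```

For example, with participations `[("a", 0), ("c", 7200)]` and period t = 0, the candidates are the union of the fans of "a" and "c", minus "a" and "c" themselves.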
S2: Extract the relevant attributes. An alternative user's participation in a hot topic is driven mainly by personal interest, the influence of the users they follow, and the influence of their community, so the invention extracts attributes for the alternative user, the alternative user's friends, and the alternative user's community. The attributes may be adjusted to suit the characteristics of the data; specific examples follow.
S21: Extract the alternative user's own attributes. These mainly consider: ① whether the alternative user is an active user; ② whether the alternative user's tags contain keywords consistent with the hot topic; ③ how many of the users the alternative user follows have already participated in the topic; and ④ the topic driving force of the followed users who have already participated. X_i = {x_i1, x_i2, ..., x_im} denotes the attributes of alternative user v_i. For example, x_i1 = 1 means the 1st attribute of v_i has the value 1, i.e. v_i is an active user, and x_i1 = 0 means v_i is not an active user. The other attributes work similarly.
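Attributes ② and ③ are simple set operations, and the attribute vector X_i is just their concatenation with ① and ④. The sketch below is a hedged illustration: the matching rule for "keyword consistent with the hot topic" is not specified in the text, so a case-insensitive exact match on trimmed tags is assumed, and the function names are hypothetical.

```python
def is_same_tag(user_tags, topic_keywords) -> int:
    """x_i2: 1 if any of the user's tags equals a topic keyword.

    Assumption: case-insensitive exact match after trimming whitespace.
    """
    tags = {t.strip().lower() for t in user_tags}
    return int(any(k.strip().lower() in tags for k in topic_keywords))


def count_of_hf(followees, participants) -> int:
    """x_i3: number of users followed by v_i who have already joined the topic."""
    return len(set(followees) & set(participants))


def self_attributes(x_i1, tags, keywords, followees, participants, x_i4):
    """Assemble X_i = [x_i1, x_i2, x_i3, x_i4]; x_i1 and x_i4 are computed elsewhere."""
    return [x_i1, is_same_tag(tags, keywords), count_of_hf(followees, participants), x_i4]
```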
Whether user v_i is active is judged by the following formula:
[Equation image not reproduced: definition of x_i1 = isActivity(v_i)]
where activity(v_i) is the activity index of user v_i and k is a threshold set at the top 10% to 15% of the ranking of all users' activity indices. activity(v_i) is obtained as:
activity(v_i) = ρ*E[origNum(v_i)] + E[retwNum(v_i)]
where E[origNum(v_i)] and E[retwNum(v_i)] are, respectively, the daily average number of original microblogs and of forwarded microblogs posted by user v_i during a preset period (for example, one month) before the topic was launched. ρ is the weakening (discount) rate applied to the number of original microblogs; for example, ρ = 0.8.
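The activity index and the top-k rule can be sketched as follows. This is a minimal illustration, assuming that the expectations E[·] are plain daily means over the preset window and that the ranking threshold is applied as "the top fraction of all users" (the text suggests 10% to 15%); because the isActivity formula itself is not reproduced, the exact cut-off rule (rounding up, selecting at least one user) is an assumption. ρ = 0.8 follows the example value in the text; the function names are hypothetical.

```python
import math
from statistics import mean


def activity(daily_orig: list[float], daily_retw: list[float], rho: float = 0.8) -> float:
    """activity(v_i) = rho * E[origNum(v_i)] + E[retwNum(v_i)].

    Expectations are taken as plain daily means over the preset window.
    """
    return rho * mean(daily_orig) + mean(daily_retw)


def is_active(scores: dict[str, float], top_fraction: float = 0.10) -> dict[str, int]:
    """x_i1 = isActivity(v_i): 1 if the user's activity ranks in the top fraction.

    Assumption: the threshold k is applied as "top ceil(top_fraction * N) users",
    with at least one user selected; ties keep their input order (stable sort).
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    cutoff = max(1, math.ceil(top_fraction * len(ranked)))
    top = set(ranked[:cutoff])
    return {u: int(u in top) for u in scores}
```

Ranking rather than a fixed absolute cut-off keeps the "active" label comparable across topics whose audiences differ in overall activity.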
The total topic driving force x_i4 = inf(v_i) of the users followed by the alternative user (who have already participated in the topic) is:
[Equation image not reproduced: total topic driving force inf(v_i)]
where the users u_k who have already participated in the topic are users followed by alternative user v_i, inf(u_k) is the topic driving force of participating user u_k, and n is the total number of users followed by v_i who have participated in the topic. inf(u_k) is obtained by the following formula:
inf(u_k) = ln(E[readNum(u_k)] + 1) + E[retNum(u_k)] + E[comNum(u_k)]
where E[readNum(u_k)], E[retNum(u_k)] and E[comNum(u_k)] are, respectively, the expected number of views, forwards and comments of the original and forwarded microblogs of user u_k in a preset time period before the topic was initiated.
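The per-followee formula for inf(u_k) is fully stated and translates directly into code; the aggregation into inf(v_i) is not, because that formula is not reproduced. The sketch below therefore assumes that the "total" driving force is the sum of inf(u_k) over the n participating followees (dividing by n would give an average instead). Function names are hypothetical.

```python
import math


def topic_drive(exp_reads: float, exp_retweets: float, exp_comments: float) -> float:
    """inf(u_k) = ln(E[readNum(u_k)] + 1) + E[retNum(u_k)] + E[comNum(u_k)]."""
    return math.log(exp_reads + 1) + exp_retweets + exp_comments


def total_drive(
    followee_stats: dict[str, tuple[float, float, float]], participants: set[str]
) -> float:
    """x_i4 = inf(v_i), ASSUMED here to be the sum of inf(u_k) over the followees
    of v_i who have already joined the topic (the original aggregation formula
    is not reproduced).

    followee_stats maps each followee u_k to (E[readNum], E[retNum], E[comNum]).
    """
    return sum(
        topic_drive(*stats) for u, stats in followee_stats.items() if u in participants
    )
```

The logarithm damps view counts, which are typically orders of magnitude larger than forward and comment counts, so that they do not dominate the sum.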
S22: Extract the attributes of the alternative user's friends. A friend of an alternative user is itself an alternative user. The following attributes are mainly considered: ① whether the friend is an authenticated user; ② whether the friend is an opinion leader. paf_k(v_i) is defined as the k-th attribute of friend v_i. For example, paf_1(v_i) = 1 means the 1st attribute of v_i has the value 1, i.e. v_i is an authenticated user, and paf_1(v_i) = 0 means v_i is not an authenticated user.
Whether friend v_i of an alternative user is an opinion leader, paf_2(v_i), can be determined by the following formula:
[Equation image not reproduced: definition of paf_2(v_i)]
where deg(v_i) is the fan characteristic value of alternative user v_i and ζ is its ranking threshold (for example, the top 10% to 20%). The fan characteristic value deg(v_i) is:
deg(v_i) = σ×[fans(v_i) - mutfans(v_i)] + mutfans(v_i)
where fans(v_i) and mutfans(v_i) are, respectively, the number of fans and the number of mutual friends of user v_i. σ is a tunable parameter that scales down the contribution of non-mutual fans to the characteristic value.
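The fan characteristic value and the top-ζ opinion-leader rule can be sketched as below. This is a hedged illustration: the text gives no value for σ, so σ = 0.5 is a hypothetical default, and as with the activity rule the threshold is assumed to mean "the top fraction of users ranked by deg". Function names are hypothetical.

```python
import math


def fan_value(fans: int, mutual: int, sigma: float) -> float:
    """deg(v_i) = sigma * [fans(v_i) - mutfans(v_i)] + mutfans(v_i)."""
    return sigma * (fans - mutual) + mutual


def opinion_leaders(
    counts: dict[str, tuple[int, int]], sigma: float = 0.5, top_fraction: float = 0.1
) -> dict[str, int]:
    """paf_2(v_i): 1 if deg(v_i) ranks within the top fraction (threshold zeta).

    counts maps each user to (fans, mutual friends); sigma = 0.5 is hypothetical.
    """
    deg = {u: fan_value(f, m, sigma) for u, (f, m) in counts.items()}
    ranked = sorted(deg, key=deg.get, reverse=True)
    top = set(ranked[: max(1, math.ceil(top_fraction * len(ranked)))])
    return {u: int(u in top) for u in counts}
```

With σ < 1, a one-way fan counts less than a mutual friend, so a user with many reciprocal ties can outrank one with more but purely one-way followers.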
S23: Extract the attribute of the community to which the alternative user belongs. gaf(v_i, C_m) indicates whether the community C_m to which alternative user v_i belongs is a τ-community. It is defined as:
[Equation image not reproduced: definition of gaf(v_i, C_m)]
where a τ-community is a community in which the percentage of members who have participated in the topic exceeds a certain threshold: τ(C_m) > ψ means that the percentage of the population of community C_m that has participated in the topic is greater than ψ. ψ is a manually set factor and can be chosen between 1% and 5%.
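The τ-community test reduces to a participation share per community. The sketch below is a minimal illustration, assuming communities are given as sets of user IDs from some earlier community-detection step, that each user belongs to exactly one community, and that ψ is expressed as a fraction (1% to 5% becomes 0.01 to 0.05). Function names are hypothetical.

```python
def tau_flags(
    communities: dict[str, set[str]], participants: set[str], psi: float = 0.03
) -> dict[str, int]:
    """1 for each community C_m whose participation share tau(C_m) exceeds psi."""
    flags = {}
    for cid, members in communities.items():
        share = len(members & participants) / len(members) if members else 0.0
        flags[cid] = int(share > psi)
    return flags


def gaf(user: str, membership: dict[str, str], flags: dict[str, int]) -> int:
    """gaf(v_i, C_m): 1 if the community v_i belongs to is a tau-community.

    Assumption: non-overlapping communities; unknown users get 0.
    """
    return flags.get(membership.get(user, ""), 0)
```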
S24: After the attributes of the three aspects are extracted, obtain their correlation factor functions. A correlation factor function represents the correlation between an attribute and the alternative user's participation. They are defined as follows.
① Factor function related to the alternative user itself:
[Equation image not reproduced: definition of f_k(x_ik, y_i)]
where f_k(x_ik, y_i) represents the correlation between the alternative user's attribute x_ik and the user's participation. y_i = 1 denotes that alternative user v_i will participate in the topic in time period t+1, and x_ik ≠ 0 ∩ y_i = 1 denotes that the k-th attribute of v_i is nonzero and that v_i will participate in the topic in time period t+1.
[Equation image not reproduced: timeliness term of the factor function]
Here t denotes the time period in which the topic currently is, and ξ is set manually according to experimental data; in the inventors' experiments ξ = 2.
② Factor function related to the alternative user's friends:
[Equation image not reproduced: definition of g_l(y_i, y_j, paf_l(v_j))]
where g_l(y_i, y_j, paf_l(v_j)) represents the correlation between the friend attribute paf_l(v_j) and the participation of the alternative user and of that friend.
③ Factor function of the community to which the alternative user belongs:
[Equation image not reproduced: definition of h(y_i, gaf(v_i, C_m))]
where h(y_i, gaf(v_i, C_m)) represents the correlation between the community attribute gaf(v_i, C_m) and the alternative user's participation.
Using these definitions, the factor function f(·) related to the alternative user, the friend influence g(·), and the community influence h(·) are computed respectively.
S3: Establish the model; the factor graph of the prediction model is shown in Fig. 2. Whether alternative user v_i participates in the topic during time period t+1 is driven mainly by three factors: the factor function f(·) related to the alternative user, the factor function g(·) driven by the alternative user's friends, and the factor function h(·) influenced by the community to which the alternative user belongs. From the set V_t of alternative users extracted for time period t, the set E_t of friend relationships between alternative users, and the attribute sets X, paf(V) and gaf(V) of the alternative users, their friends and their communities, the input network for time period t, G_t = (V_t, E_t, X, paf(V), gaf(V)), is established. A prediction-model factor graph is constructed on this network, the marginal probability of each alternative user is computed, and whether each alternative user will participate in the topic in the next time period is predicted from that marginal probability.
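One possible in-memory layout for the input network G_t = (V_t, E_t, X, paf(V), gaf(V)) is sketched below in Python. The class name `InputNetwork`, its field names and the choice to store gaf as a mapping keyed by (user, community), so that membership II[v_i, C_m] = 1 exactly when a key exists, are illustrative assumptions rather than part of the method.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass
class InputNetwork:
    """Hypothetical container for G_t = (V_t, E_t, X, paf(V), gaf(V))."""
    users: List[str]                        # V_t: alternative users in slice t
    friend_edges: Set[Tuple[str, str]]      # E_t: (v_i, v_j) means v_j is a friend of v_i
    x: Dict[str, List[float]]               # X: own attributes x_ik, k = 1..4
    paf: Dict[str, List[float]]             # paf(V): friend attributes paf_l, l = 1..2
    gaf: Dict[Tuple[str, str], float]       # gaf(V): (v_i, C_m) -> community attribute

    def friends_of(self, user: str) -> List[str]:
        """Users v_j with II[e_ij] = 1 for v_i = user."""
        return sorted(j for (i, j) in self.friend_edges if i == user)

    def communities_of(self, user: str) -> List[str]:
        """Communities C_m with II[v_i, C_m] = 1 for v_i = user."""
        return sorted(c for (u, c) in self.gaf if u == user)
```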
Our goal is, given the input network G_t, to obtain the values of the set Y_{t+1}, which indicates whether each alternative user will take part in the topic discussion at t+1. This requires the conditional probability P(Y_{t+1} | G_t); following the Hammersley-Clifford theorem and Markov random field theory, the log-likelihood objective function is defined as follows:
[Formula image: log-likelihood objective function]
Superscripts are omitted for brevity. Here f_k(x_ik, y_i), g_l(y_i, y_j, paf_l(v_j)) and h(y_i, gaf(v_i, C_m)) represent, respectively, the correlation of the alternative users themselves, of their friends, and of the communities to which they belong, and N is the total number of alternative users. e_ij represents the friend relationship between alternative users v_i and v_j: e_ij = 1 means that v_i and v_j are friends, and e_ij = 0 means that they are not. II[e_ij] is an indicator function of whether v_i and v_j are friends: II[e_ij] = 1 if they are, and II[e_ij] = 0 otherwise. E is the set of friend relationships. Similarly, II[v_i, C_m] indicates whether alternative user v_i belongs to community C_m: II[v_i, C_m] = 1 if it does, and II[v_i, C_m] = 0 otherwise. C is the set of communities.
P_θ(Y | G) represents the topic participation in the next stage given the current network structure. α_k represents the degree of correlation between the participation behavior of the alternative user and the k-th self-attribute function f_k(x_ik, y_i); likewise, β_l represents the degree of correlation between the participation behavior of the alternative user and the friend-attribute function g_l(y_i, y_j, paf_l(v_j)), and γ plays the same role for the community function h(y_i, gaf(v_i, C_m)). For convenience, these parameters are collectively denoted θ = ({α}, {β}, γ), where {α} = {α_k | k = 1, 2, 3, 4} and {β} = {β_l | l = 1, 2}. Z is a normalization factor that ensures the probabilities sum to 1. Predicting the next-stage topic participation of alternative users therefore reduces to finding the values of θ = ({α}, {β}, γ) that maximize the objective function. How the model is solved and used to predict the next-stage topic participation of alternative users is described in detail in the next section.
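The objective itself survives only as an image. Assuming the standard log-linear Markov random field form implied by the definitions above (factor functions f_k, g_l and h, indicators II[e_ij] and II[v_i, C_m], weights θ = ({α}, {β}, γ) and normalization factor Z), it can plausibly be written as below. This is a reconstruction for readability, not the patent's verbatim formula; Z is the sum of the exponentiated bracketed score over all assignments of Y.

```latex
O(\theta) = \log P_\theta(Y \mid G)
  = \sum_{i=1}^{N} \sum_{k=1}^{4} \alpha_k \, f_k(x_{ik}, y_i)
  + \sum_{i=1}^{N} \sum_{j=1}^{N} \mathbb{I}[e_{ij}] \sum_{l=1}^{2} \beta_l \, g_l\big(y_i, y_j, paf_l(v_j)\big)
  + \sum_{i=1}^{N} \sum_{C_m \in C} \gamma \, \mathbb{I}[v_i, C_m] \, h\big(y_i, gaf(v_i, C_m)\big)
  - \log Z
```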
Fig. 3 shows a flow chart of parameter fitting.
S31: Input the network of time period t, G_t = (V_t, E_t, X, paf(V), gaf(V)), together with an initial value of θ = ({α}, {β}, γ) and an initial value of the learning factor η.
S32: in the prediction model factor graph, the marginal probability of each candidate user, i.e., p (y), is computedi) The value of (c). And predicting according to the edge probability. There are many algorithms for calculating p (y)i) Commonly used are the Junction Tree algorithm (joint Tree algorithm), BP algorithm (Belief Propagation algorithm), LBP algorithm (Loopy Belief Propagation algorithm). The following steps further modify the accuracy of the prediction:
S33: Compute the change (gradient) of each parameter from the marginal probabilities. Taking the parameter α_k as an example:
[Formula image: gradient of the objective function with respect to α_k]
where E[f_k(x_ik, y_i)] is the expected value of the factor function f_k(x_ik, y_i) obtained by substituting the real data of time period t+1, and E_{p(y_i|G)}[f_k(x_ik, y_i)] is its expected value obtained by substituting the marginal probabilities computed in step S32. The changes of β_l and γ are computed in the same way.
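The gradient formula is also only an image. From the two expectations defined in S33, and assuming the usual log-linear model, it reads as the difference between the data expectation and the model expectation of the factor function:

```latex
\frac{\partial O(\theta)}{\partial \alpha_k}
  = \mathbb{E}\big[f_k(x_{ik}, y_i)\big]
  - \mathbb{E}_{P_\theta(y_i \mid G)}\big[f_k(x_{ik}, y_i)\big]
```

The gradient vanishes exactly when the model reproduces the observed expectation of each factor function, which is the stationarity condition of the maximum-likelihood fit.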
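S34: Update α_k, β_l and γ with the gradient algorithm (since the objective function is maximized, each parameter moves along its gradient, which is equivalent to gradient descent on the negative log-likelihood). Taking α_k as an example, the update formula is:
[Formula image: update rule for α_k using the learning factor η]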
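Assuming η is the learning factor input in S31 and ∂O(θ)/∂α_k is the gradient computed in S33, the S34 update presumably takes the following form; β_l and γ are updated analogously.

```latex
\alpha_k^{\text{new}} = \alpha_k + \eta \, \frac{\partial O(\theta)}{\partial \alpha_k}
```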
S35: After α_k, β_l and γ have been updated, check for convergence. Convergence criteria can vary; in the method of the invention, convergence is reached when the change of every parameter is smaller than a threshold. If the parameters have converged, go to step S36; otherwise, recompute the marginal probabilities.
S36: Output the converged value of θ = ({α}, {β}, γ).
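The S31 to S36 loop can be sketched in Python as follows, using the same hypothetical factor functions as a stand-in for the image-only formulas (f_k contributes x_ik when y_i = 1, g_l contributes paf_l(v_j) when both friends participate, h contributes v_i's community attributes when y_i = 1). The model expectation is computed by exact enumeration rather than BP or LBP, so it only suits very small graphs. The names `features`, `model_expectation` and `fit`, and the defaults η = 0.1 and threshold 1e-6, are illustrative assumptions.

```python
import itertools
import math
from typing import List, Sequence, Tuple


def features(y: Sequence[int], x, edges, paf, gaf) -> List[float]:
    """Feature totals phi(y) = (sum f_k for each k, sum g_l for each l, sum h)."""
    num_k, num_l = len(x[0]), len(paf[0])
    phi = [0.0] * (num_k + num_l + 1)
    for i, yi in enumerate(y):
        if yi == 1:
            for k in range(num_k):
                phi[k] += x[i][k]
            phi[num_k + num_l] += sum(gaf[i])
    for i, j in edges:
        if y[i] == 1 and y[j] == 1:
            for l in range(num_l):
                phi[num_k + l] += paf[j][l]
    return phi


def model_expectation(theta: Sequence[float], x, edges, paf, gaf) -> List[float]:
    """E_{P_theta}[phi], summed over all 2^N assignments (S32 stand-in)."""
    total = [0.0] * len(theta)
    z = 0.0
    for y in itertools.product((0, 1), repeat=len(x)):
        phi = features(y, x, edges, paf, gaf)
        w = math.exp(sum(t * p for t, p in zip(theta, phi)))
        z += w
        for d, p in enumerate(phi):
            total[d] += w * p
    return [v / z for v in total]


def fit(y_true: Sequence[int], x, edges: Sequence[Tuple[int, int]], paf, gaf,
        eta: float = 0.1, tol: float = 1e-6, max_iter: int = 10000) -> List[float]:
    theta = [0.0] * (len(x[0]) + len(paf[0]) + 1)        # S31: initial theta = (alpha, beta, gamma)
    observed = features(y_true, x, edges, paf, gaf)       # E[f] from the real t+1 data
    for _ in range(max_iter):
        expected = model_expectation(theta, x, edges, paf, gaf)        # S32
        step = [eta * (o - e) for o, e in zip(observed, expected)]     # S33: eta * gradient
        theta = [t + s for t, s in zip(theta, step)]                   # S34: move along the gradient
        if max(abs(s) for s in step) < tol:                            # S35: every change below threshold
            break
    return theta                                                        # S36
```

For example, four users with no friend links and a single attribute x_i1 = 1, of whom three participate, drive α_1 toward ln 3: the value at which the model's expected participation count, 4·σ(α_1), equals the observed count of 3.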
S4: With the fitted θ = ({α}, {β}, γ) and the network G_t = (V_t, E_t, X, paf(V), gaf(V)) of any time period t, update the prediction-model factor graph with the fitted parameters and compute the marginal probabilities p(y_i); these values give the prediction result.
From the prediction result, the participation heat of the topic in the next time period can be analyzed, i.e., how many people will join the discussion of the topic in that period.
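A small Python sketch of turning marginals into a participation-heat estimate. It assumes, as illustrative choices not stated in the text, that a user is predicted to participate when p(y_i = 1) exceeds 0.5, and it also reports the sum of the marginals, which is the model's expected number of participants. The function name `participation_heat` is hypothetical.

```python
from typing import List, Sequence, Tuple


def participation_heat(marginals: Sequence[float], threshold: float = 0.5) -> Tuple[List[int], float]:
    """Return (indices of users predicted to participate, expected participant count)."""
    predicted = [i for i, p in enumerate(marginals) if p > threshold]
    expected = sum(marginals)  # expectation of the number of y_i = 1
    return predicted, expected
```

The thresholded count gives a hard forecast of who joins, while the expected count uses the full marginal probabilities and is usually the smoother heat estimate.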
The invention uses the interaction data of hot topics in a social network to analyze the relevant attributes of users, takes the fans of users who have already participated in a topic as alternative users, and predicts whether these alternative users will also join the discussion of the topic in a future time period. First, to address the complex and dynamic causes of user behavior, time discretization and time slicing are applied, the effect of the timeliness of topic information is incorporated, and correlation factor functions are defined for three aspects: the alternative user, the alternative user's friends, and the community to which the alternative user belongs. Then, exploiting the Markov property of information propagation, a prediction model for participation in hot topics is built using the basic ideas and methods of Markov random field theory, so that the participation behavior of alternative users can be predicted dynamically and the future participation heat of the hot topic can be analyzed.
It should be noted that the specific examples above are intended to help those skilled in the art and readers understand more fully how the invention may be practiced, and the invention is not limited to these specific statements and examples. Therefore, although the invention has been described in detail with reference to the drawings and examples, those skilled in the art will understand that various changes and modifications can be made without departing from the spirit and scope of the invention.

Claims (6)

1. A prediction method for users participating in hot topics, characterized in that: a data-source acquisition module obtains, through a public social-network API or a downloaded data source, the information and past behavior data of all participants and their fans, together with the follow relationships among the fans; deletes duplicate data; cleans invalid nodes; time-slices the data to identify alternative users; and builds a network from the friend relationships among the alternative users; an attribute extraction module extracts the attributes and calculates the factor function f_k(x_ik, y_i) related to the alternative user according to the formula:
[Formula image: f_k(x_ik, y_i)]
calculates the factor function g_l(y_i, y_j, paf_l(v_j)) related to the alternative user's friends according to the formula:
[Formula image: g_l(y_i, y_j, paf_l(v_j))]
and calculates the factor function h(y_i, gaf(v_i, C_m)) of the community to which the alternative user belongs according to the formula:
[Formula image: h(y_i, gaf(v_i, C_m))]
wherein x_ik ≠ 0 ∩ y_i = 1 denotes that the k-th attribute value x_ik of alternative user v_i is not 0 and the alternative user participates in the topic during time period t+1; y_i represents whether alternative user v_i participates in the discussion of the topic in the next time period; the time-decay term (formula image not reproduced) is a half-life function; paf_l(v_j) represents the l-th attribute of the alternative user's friend v_j; gaf(v_i, C_m) represents the attribute of the community to which the alternative user belongs; and C_m represents the m-th community; from the set V_t of alternative users extracted for time period t, the set E_t of friend relationships between alternative users, and the attribute sets X, paf(V) and gaf(V) of the alternative users, their friends and their communities, the input network G_t = (V_t, E_t, X, paf(V), gaf(V)) for time period t is established; a prediction-model factor graph is constructed, the marginal probability of each alternative user is calculated, and whether the alternative user will participate in the topic in the next time period is predicted from the marginal probability; and, according to the prediction result, the social network and data-flow information to which the alternative user belongs at the next time are obtained and the network structure is adjusted.
2. The prediction method according to claim 1, wherein the alternative user's own attributes comprise: whether the alternative user is an active user; whether the alternative user's tags contain keywords matching the hot topic; whether some of the users followed by the alternative user have already participated in the topic; and the topic driving force of the users followed by the alternative user.
3. The prediction method according to claim 1, wherein the alternative user's own attributes comprise: whether the alternative user is an active user, whether the alternative user's tags contain keywords matching the hot topic, whether several of the users followed by the alternative user have already participated in the topic, and the topic driving force of the users followed by the alternative user; and determining whether the alternative user is an active user further comprises: according to the formula:
[Formula image: x_i1(v_i) as a function of activity(v_i) and κ]
judging whether alternative user v_i is an active user, where x_i1(v_i) represents the 1st attribute of alternative user v_i, activity(v_i) represents the activity index of alternative user v_i, and κ is the user activity-index ranking threshold.
4. The prediction method according to claim 3, characterized in that the activity index activity(v_i) of alternative user v_i is determined according to the formula activity(v_i) = ρ · E[origNum(v_i)] + E[retwNum(v_i)], where E[origNum(v_i)] and E[retwNum(v_i)] are, respectively, the average daily number of original microblogs and the average daily number of forwarded microblogs of alternative user v_i within a preset time period before the topic is launched, and ρ is the weakening rate of the number of original microblogs.
5. The prediction method according to claim 2 or 3, wherein determining the topic driving force of the users followed by the alternative user further comprises: according to the formula:
[Formula image: topic driving force of the users followed by v_i]
determining the topic driving force of the users followed by alternative user v_i, where inf(u_k) represents the driving force of followed user u_k and n denotes the number of followed users of alternative user v_i; and obtaining the driving force of followed user u_k according to the formula:
inf(u_k) = ln(E[readNum(u_k)] + 1) + E[retNum(u_k)] + E[comNum(u_k)]
where E[readNum(u_k)], E[retNum(u_k)] and E[comNum(u_k)] are, respectively, the expected numbers of views, forwards and comments of the original and forwarded microblogs created by followed user u_k within a preset time period before the topic is launched.
6. The prediction method according to any one of claims 1 to 4, wherein the attributes of the alternative user's friends comprise: whether the friend of the alternative user is an authenticated user and whether the friend of the alternative user is an opinion leader; whether alternative user v_i is an opinion leader is determined by the formula:
[Formula image: opinion-leader test based on deg(v_i) and ζ]
where deg(v_i) represents the fan characteristic value of alternative user v_i, given by deg(v_i) = σ × [fans(v_i) - mutfans(v_i)] + mutfans(v_i), in which fans(v_i) and mutfans(v_i) respectively represent the number of fans of alternative user v_i and the number of those fans who are mutual friends; ζ is the fan-characteristic-value ranking threshold; and σ is the coefficient that discounts non-mutual fans in the characteristic value.
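The explicit formulas in claims 4 to 6 can be sketched in Python as below. `activity_index`, `driving_force` and `fan_value` follow the formulas stated in the claims; `top_ranked` is a hypothetical reading of the "ranking threshold" tests for κ and ζ (flag a user when their score ranks within the top κ or ζ), because the exact test formulas survive only as images. All function names are illustrative.

```python
import math
from typing import List, Sequence


def activity_index(avg_orig_per_day: float, avg_retw_per_day: float, rho: float) -> float:
    """Claim 4: activity(v_i) = rho * E[origNum(v_i)] + E[retwNum(v_i)]."""
    return rho * avg_orig_per_day + avg_retw_per_day


def driving_force(avg_reads: float, avg_forwards: float, avg_comments: float) -> float:
    """Claim 5: inf(u_k) = ln(E[readNum(u_k)] + 1) + E[retNum(u_k)] + E[comNum(u_k)]."""
    return math.log(avg_reads + 1) + avg_forwards + avg_comments


def fan_value(fans: float, mutual_fans: float, sigma: float) -> float:
    """Claim 6: deg(v_i) = sigma * [fans(v_i) - mutfans(v_i)] + mutfans(v_i)."""
    return sigma * (fans - mutual_fans) + mutual_fans


def top_ranked(scores: Sequence[float], threshold_rank: int) -> List[bool]:
    """Hypothetical ranking test: True for users whose score is in the top threshold_rank."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    top = set(order[:threshold_rank])
    return [i in top for i in range(len(scores))]
```

The ln(· + 1) term in the driving force keeps very large view counts from dominating forwards and comments, and adding 1 keeps the logarithm defined when a user's posts have no views.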
CN201610083734.XA 2016-02-07 2016-02-07 Prediction method for user participating in hot topics in social network Active CN105809554B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610083734.XA CN105809554B (en) 2016-02-07 2016-02-07 Prediction method for user participating in hot topics in social network

Publications (2)

Publication Number Publication Date
CN105809554A CN105809554A (en) 2016-07-27
CN105809554B true CN105809554B (en) 2020-03-17

Family

ID=56466277

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610083734.XA Active CN105809554B (en) 2016-02-07 2016-02-07 Prediction method for user participating in hot topics in social network

Country Status (1)

Country Link
CN (1) CN105809554B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11775813B2 (en) * 2019-06-19 2023-10-03 Adobe Inc. Generating a recommended target audience based on determining a predicted attendance utilizing a machine learning approach

Families Citing this family (12)

Publication number Priority date Publication date Assignee Title
CN107870957A (en) * 2016-09-28 2018-04-03 郑州大学 Popular microblog prediction method based on information gain and BP neural network
CN106557552B (en) * 2016-10-27 2020-08-21 国家计算机网络与信息安全管理中心 Network topic heat prediction method
CN106651016B (en) * 2016-12-13 2020-08-04 重庆邮电大学 System and method for dynamically predicting user behavior under hot topics
CN106682770B (en) * 2016-12-14 2020-08-04 重庆邮电大学 Dynamic microblog forwarding behavior prediction system and method based on friend circle
CN106649714B (en) * 2016-12-21 2020-08-04 重庆邮电大学 TopN recommendation system and method for data nonuniformity and data sparsity
CN107358534A (en) * 2017-06-29 2017-11-17 浙江理工大学 Unbiased data collection system and collection method for social networks
CN110134788B (en) * 2019-05-16 2021-05-11 杭州师范大学 Microblog release optimization method and system based on text mining
CN110825980B (en) * 2019-11-05 2022-07-01 重庆邮电大学 Microblog topic pushing method based on generative adversarial network
CN110851684B (en) * 2019-11-12 2022-10-04 重庆邮电大学 Social topic influence recognition method and device based on ternary association graph
CN110825972B (en) * 2019-11-12 2022-10-25 重庆邮电大学 Hot topic key user discovery method based on field differentiation
CN111130996A (en) * 2019-12-16 2020-05-08 深圳市微购科技有限公司 View information sharing method and device and computer readable storage medium
CN111143566A (en) * 2019-12-27 2020-05-12 北京工业大学 Method for predicting the outbreak of hot events on Twitter

Citations (5)

Publication number Priority date Publication date Assignee Title
CN103092921A (en) * 2012-12-26 2013-05-08 中国科学院深圳先进技术研究院 Dynamic prediction method and system for microblog hot-topic communities
CN104216954A (en) * 2014-08-20 2014-12-17 北京邮电大学 Prediction device and prediction method for state of emergency topic
CN104408108A (en) * 2014-11-18 2015-03-11 重庆邮电大学 Hot topic group influence analysis system and method based on grey system theory
CN105224608A (en) * 2015-09-06 2016-01-06 华南理工大学 Hot news prediction method and system based on microblog data analysis
CN105243448A (en) * 2015-10-13 2016-01-13 北京交通大学 Method and device for predicting evolution trend of internet public opinion

Family Cites Families (1)

Publication number Priority date Publication date Assignee Title
US20150356571A1 (en) * 2014-06-05 2015-12-10 Adobe Systems Incorporated Trending Topics Tracking

Also Published As

Publication number Publication date
CN105809554A (en) 2016-07-27

Similar Documents

Publication Publication Date Title
CN105809554B (en) Prediction method for user participating in hot topics in social network
Xia et al. Reciprocal recommendation system for online dating
CN106682991B (en) Information propagation model based on online social network and propagation method thereof
CN106682770B (en) Dynamic microblog forwarding behavior prediction system and method based on friend circle
Pham et al. A general graph-based model for recommendation in event-based social networks
CN106651030B (en) Improved RBF neural network hot topic user participation behavior prediction method
Liu et al. C-RBFNN: A user retweet behavior prediction method for hotspot topics based on improved RBF neural network
CN103064917B (en) Method for discovering high-impact user groups with specific tendencies on microblogs
CN110795641B (en) Network rumor propagation control method based on representation learning
CN102646122B (en) Automatic building method of academic social network
CN111222029A (en) Method for selecting key nodes in network public opinion information dissemination
CN106780071B (en) Online social network information propagation modeling method based on multi-mode hybrid model
Zhi et al. Dynamic truth discovery on numerical data
Lin et al. Steering information diffusion dynamically against user attention limitation
Xiao et al. User behavior prediction of social hotspots based on multimessage interaction and neural network
Pérez-Rosés et al. Synthetic generation of social network data with endorsements
Dai et al. ICS-SVM: A user retweet prediction method for hot topics based on improved SVM
Huang et al. Information fusion oriented heterogeneous social network for friend recommendation via community detection
Bródka A method for group extraction and analysis in multilayer social networks
Lu et al. Collective human behavior in cascading system: discovery, modeling and applications
Aylani et al. Community detection in social network based on user's social activities
Kadge et al. Graph based forecasting for social networking site
Lin et al. Analysis and comparison of interaction patterns in online social network and social media
Zygmunt Role identification of social networkers
Li et al. Influence maximization in multiagent systems by a graph embedding method: dealing with probabilistically unstable links

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant