WO2016041376A1 - Method and device for predicting information propagation in a social network - Google Patents

Method and device for predicting information propagation in a social network

Info

Publication number
WO2016041376A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
clusters
probability distribution
determining
feature
Prior art date
Application number
PCT/CN2015/079877
Other languages
English (en)
French (fr)
Inventor
杨洋
梁颖琪
唐杰
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN201510131640.0A external-priority patent/CN106156030A/zh
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to EP15841322.9A priority Critical patent/EP3159809A4/en
Publication of WO2016041376A1 publication Critical patent/WO2016041376A1/zh
Priority to US15/460,247 priority patent/US10860941B2/en

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/01 Social networking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0241 Advertisements

Definitions

  • Embodiments of the present invention relate to the field of information processing, and, more particularly, to a method and apparatus for predicting information propagation in a social network.
  • An information dissemination model uses the propagation records of different information on a network to calculate how different factors help or influence information dissemination, including its path, scope, and/or speed, and then predicts the propagation path of new or existing information.
  • The most important factor in information dissemination is the influence of different network nodes on the dissemination process. The model is widely used in different types of networks, such as social networks, communication networks, computer networks, and the Internet.
  • In social networks, one of the most important applications of the information dissemination model is to find the most valuable users (network nodes), such as the users with the greatest influence on other users, the most influential users, and the users who spread messages the fastest, among others.
  • the embodiment of the invention provides a method for predicting information propagation in a social network, which has high computational efficiency.
  • a method for predicting information dissemination in a social network comprising:
  • the influence of the K clusters includes the information transmission success rate of the K clusters, and K is a positive integer;
  • the target information is advertised or forwarded by the first user at an initial time
  • the method further includes: outputting the account of the second user that meets a preset condition
  • the preset condition is that the probability of forwarding the target information is greater than a preset probability threshold.
  • before the obtaining of the target information to be predicted and the acquiring of the influence of the K clusters, the method further includes:
  • the user feature database includes feature attributes of existing users
  • the influence of the K clusters is obtained by using a learning method.
  • the determining a role probability distribution of the first user includes:
  • before the obtaining of the target information to be predicted and the acquiring of the influence of the K clusters, the method further includes:
  • the first user belongs to the existing user
  • Determining the role probability distribution of the first user including:
  • the first user does not belong to the existing user
  • Determining the role probability distribution of the first user including:
  • the determining of the role probability distribution of the first user according to the role probability distributions of the N third users includes:
  • the role probability distribution of the first user is an arithmetic mean of the role probability distributions of the N third users.
  • the method of learning is a method of machine learning or a method of statistical learning.
  • the determining, according to the influence of the K clusters and the role probability distribution of the first user, of a probability that the second user forwards the target information from the first user includes:
  • the expected value of the propagation probability is used as a probability that the second user forwards the target information from the first user.
  • the influence of the K clusters further includes the information propagation time delay rate of the K clusters, and the method further includes:
  • the second user is a user, among the first user's followers, who has not propagated the target information.
  • the determining of the second user who has not propagated the target information includes:
  • a device for predicting information dissemination in a social network comprising:
  • an acquiring unit, configured to acquire target information to be predicted and to obtain the influence of K clusters, where the target information is published or forwarded by the first user, the K clusters represent K categories of users' feature attributes, the influence of the K clusters includes the information propagation success rate of the K clusters, and K is a positive integer;
  • a determining unit, configured to determine a role probability distribution of the first user and to determine a second user who has not propagated the target information acquired by the acquiring unit, where the role probability distribution of the first user represents the probabilities that the first user belongs to each of the K clusters;
  • the determining unit is further configured to determine, according to the influence of the K clusters acquired by the acquiring unit and the role probability distribution of the first user, the probability that the second user forwards the target information from the first user.
  • the target information is advertised or forwarded by the first user at an initial time
  • the device further includes:
  • An output unit configured to output an account of the second user that meets a preset condition
  • the preset condition is that the probability of forwarding the target information is greater than a preset probability threshold.
  • the acquiring unit is further configured to obtain an information propagation record, a user relationship database, and a user feature database from the social network, wherein the information propagation record includes historical propagation records of existing information, the user relationship database includes the follow relationships between existing users, and the user feature database includes the feature attributes of the existing users;
  • the determining unit is further configured to obtain, according to the user feature database and by using a soft clustering algorithm, the K clusters and the feature attributes of the K clusters, where the K clusters are K categories determined according to the feature attributes of the existing users;
  • the determining unit is further configured to obtain the influence of the K clusters by using a learning method according to the information propagation record and the user relationship database.
  • the determining unit is specifically configured to:
  • the feature attribute of the first user is represented as an AT
  • the determining unit is specifically configured to:
  • the determining unit is further configured to:
  • the first user belongs to the existing user
  • the determining unit is specifically configured to:
  • the first user does not belong to the existing user
  • the determining unit is specifically configured to:
  • the determining unit is specifically configured to:
  • the role probability distribution of the first user is an arithmetic mean of the role probability distributions of the N third users.
  • the method of learning is a method of machine learning or a method of statistical learning.
  • the determining unit is specifically configured to:
  • the expected value of the propagation probability is used as a probability that the second user forwards the target information from the first user.
  • the influence of the K clusters further includes the information propagation time delay rate of the K clusters.
  • the determining unit is further configured to:
  • the second user is a user, among the first user's followers, who has not propagated the target information.
  • the determining unit is specifically configured to:
  • the influence of the K clusters can be used to predict the propagation of the target information in the social network; the method requires little computation and is therefore computationally efficient.
  • FIG. 1 is a flow chart of a method for predicting information dissemination in a social network according to an embodiment of the present invention.
  • FIG. 2 is a flow chart of a method for predicting information dissemination in a social network according to another embodiment of the present invention.
  • FIG. 3 is a block diagram of an apparatus for predicting information propagation in a social network in accordance with an embodiment of the present invention.
  • FIG. 4 is a block diagram of an apparatus for predicting information propagation in a social network in accordance with another embodiment of the present invention.
  • a social network can be understood as an online community.
  • the number of users of a social network is huge.
  • the number of users may be hundreds or thousands, or millions or even more.
  • examples include Weibo (microblog), WeChat, MiTalk, Facebook, Twitter, and LinkedIn.
  • A social network can use a "user relationship database" to record the relationships between users.
  • the "user relationship database" includes the follow relationships between existing users. Specifically, users establish relationships by following one another. For example, if User A follows User B, then User A is a follower of User B. Alternatively, User A may be referred to as a fan of User B.
  • in a social network such as Weibo
  • User A being a follower of User B does not necessarily mean that User B is a follower of User A.
  • in a social network such as WeChat, if User A is a follower of User B, then User B must also be a follower of User A, and User A and User B may be referred to as friends.
  • a triple or a pair may be used in a social network to represent relationships between users.
  • the first item of the triple may be the first user ID
  • the second item may be the second user ID
  • the third item may indicate whether the first user follows the second user.
  • a third item of 1 means following
  • a third item of 0 means not following.
  • every two users can be represented by two triples, for example <A, B, 1> and <B, A, 0>.
  • <A, B, 1> indicates that user A follows user B
  • <B, A, 0> indicates that user B does not follow user A.
  • for M users, the "user relationship database" in a social network such as Weibo can be represented by M × (M-1) triples.
  • the "user relationship database" may include only the triples whose third item is 1, so the number of stored triples may be much smaller than M × (M-1).
  • the "user relationship database" may include only pairs, each consisting of the first two items of a triple whose third item is 1, so the number of stored pairs may be much smaller than M × (M-1). This saves storage space.
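  • The sparse storage described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the user IDs, the `follows` set, and the helper names are hypothetical, and each stored pair is assumed to mean (follower, followee).

```python
# Hypothetical sketch: store only the (follower, followee) pairs that exist,
# i.e. the first two items of every triple whose third item would be 1.
follows = {("A", "B"), ("C", "B"), ("B", "C")}
users = ["A", "B", "C"]


def is_following(pairs, follower, followee):
    """Return True if `follower` follows `followee`."""
    return (follower, followee) in pairs


def followers_of(pairs, user):
    """Return the set of users who follow `user`."""
    return {f for (f, t) in pairs if t == user}


def full_triples(all_users, pairs):
    """Expand the sparse pairs into all M x (M-1) <first, second, flag> triples."""
    return [(a, b, 1 if (a, b) in pairs else 0)
            for a in all_users for b in all_users if a != b]


triples = full_triples(users, follows)
```

  • With M = 3 users the full expansion holds 6 triples, while the sparse form stores only the 3 pairs that actually exist.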
  • the first item of the triplet may be the first user ID
  • the second item may be the second user ID
  • the third item may indicate whether the first user and the second user are friends.
  • a third item of 1 means friends
  • a third item of 0 means not friends.
  • the triple <A, B, 1> indicates that User A and User B are friends. That is, user A is a follower of user B, and user B is also a follower of user A.
  • the "user relationship database" in a social network such as WeChat can be represented by M!/(2 × (M-2)!) triples.
  • the "user relationship database" may include only the triples whose third item is 1, so the number of stored triples can be much smaller than M!/(2 × (M-2)!).
  • the "user relationship database" may include only pairs, each consisting of the first two items of a triple whose third item is 1, so the number of stored pairs may be much smaller than M!/(2 × (M-2)!). This saves storage space.
  • a relationship between users can also be represented by a four-tuple in a social network.
  • the first item of the four-tuple may be the first user ID
  • the second item may be the second user ID
  • the third item may indicate whether the first user follows the second user
  • the fourth item may indicate whether the second user follows the first user.
  • the third item and the fourth item can each be 0 or 1.
  • <A, B, 1, 0> indicates that User A follows User B, but User B does not follow User A
  • <A, B, 1, 1> indicates that user A follows user B, and user B follows user A.
  • the "user relationship database" in the social network can be represented by M!/(2 × (M-2)!) four-tuples.
  • the "user relationship database" may include only the four-tuples in which at least one of the third item and the fourth item is 1, so the number of stored four-tuples may be much smaller than M!/(2 × (M-2)!).
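  • A four-tuple store of this kind might be built as in the sketch below. The function name and sample data are hypothetical, and keeping only pairs in which at least one side follows the other is an assumed reading of the storage rule. The upper bound M!/(2 × (M-2)!) is the number of unordered user pairs.

```python
from itertools import combinations
from math import comb, factorial


def to_four_tuples(users, follows):
    """Build one <a, b, a_follows_b, b_follows_a> tuple per unordered pair,
    keeping only pairs in which at least one direction is followed."""
    quads = []
    for a, b in combinations(sorted(users), 2):
        ab = 1 if (a, b) in follows else 0
        ba = 1 if (b, a) in follows else 0
        if ab or ba:
            quads.append((a, b, ab, ba))
    return quads


users = ["A", "B", "C", "D"]
follows = {("A", "B"), ("B", "C"), ("C", "B")}  # (follower, followee), hypothetical
quads = to_four_tuples(users, follows)

M = len(users)
# Upper bound on the number of four-tuples: M! / (2 * (M-2)!) unordered pairs.
max_quads = factorial(M) // (2 * factorial(M - 2))
```

  • Here only 2 of the 6 possible four-tuples need to be stored.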
  • A social network can use "information propagation records" to record the propagation of existing information.
  • "Information propagation records" include historical propagation records of existing information. Historical propagation records can include historical propagation paths and times. Specifically, they can record that a certain user posted (post or tweet) certain information at a certain time, or that a certain user forwarded (forward, repost, or retweet) certain information from another user at a certain time.
  • the "information propagation record" can be represented by a four-tuple in the social network.
  • the first item of the four-tuple may be the first user ID
  • the second item may be the second user ID
  • the third item may be the time
  • the fourth item may be the information ID.
  • <A, B, t1, m1> indicates that user A forwarded the information whose ID is m1 from user B at time t1.
  • for a post, the first item of the four-tuple can be the first user ID
  • the second item can be empty or negative
  • the third item can be the time
  • the fourth item can be the information ID.
  • <A, , t1, m1> or <A, -100, t1, m1> indicates that user A posted the information whose ID is m1 at time t1.
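  • The two record shapes above (forward and post) can be held in one structure, as in this sketch. The class name, field names, and the use of `None` for the empty second item are assumptions made for illustration.

```python
from typing import NamedTuple, Optional


class PropagationRecord(NamedTuple):
    """One information propagation record as a four-tuple."""
    user_id: str
    source_id: Optional[str]  # None stands for the empty second item (a post)
    time: int
    info_id: str


records = [
    PropagationRecord("B", None, 1, "m1"),  # B posted m1 at time 1
    PropagationRecord("A", "B", 2, "m1"),   # A forwarded m1 from B at time 2
    PropagationRecord("C", "A", 3, "m1"),   # C forwarded m1 from A at time 3
]


def is_post(record):
    """Return True if the record is an original post rather than a forward."""
    return record.source_id is None


def users_who_propagated(all_records, info_id, up_to_time):
    """Return the users who posted or forwarded `info_id` no later than `up_to_time`."""
    return {r.user_id for r in all_records
            if r.info_id == info_id and r.time <= up_to_time}
```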
  • the embodiment of the present invention does not limit the form of information.
  • the information may be in the form of text, or the information may be in the form of audio or video, or the information may be in the form of a web page link, and the like.
  • A social network can use a "user feature database" to record the feature attributes of users.
  • the "user feature database" includes the feature attributes of existing users.
  • the feature attributes may include personal attributes, network attributes, and behavior attributes.
  • Personal attributes may include basic attributes of the user, such as age, gender, place of birth, occupation, and the like.
  • Network attributes may include the importance, centrality, structural hole characteristics, and the like of the user in the social network. For example, the importance can be expressed by the PageRank value, the centrality can be expressed by the out-degree and the in-degree, and the structural hole characteristic can be represented by the network constraint index.
  • Behavioral attributes may include the activity of the user's behavior on the social network, where behavior on the social network may include posting, forwarding, commenting, and the like.
  • network attributes are related to the "user relationship database.” Specifically, the network attribute can be calculated according to the “user relationship database”.
  • in a social network such as Weibo, the centrality can be expressed by two values, the out-degree and the in-degree, and generally the out-degree is not equal to the in-degree.
  • in a social network such as WeChat, the centrality can be expressed by a single value, the out-degree or the in-degree, because the out-degree is equal to the in-degree.
  • both the out-degree and the in-degree are equal to the number of friends; that is, the centrality can also be expressed by the number of friends.
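  • As a rough illustration of how the degree values could be derived from the user relationship database, the sketch below counts them from (follower, followee) pairs. The edge direction and the sample data are assumptions, not taken from the source.

```python
from collections import Counter


def degrees(follow_pairs):
    """Return (out_degree, in_degree) counters for (follower, followee) pairs.

    Out-degree counts how many users someone follows; in-degree counts
    how many followers someone has.
    """
    out_degree = Counter(follower for follower, _ in follow_pairs)
    in_degree = Counter(followee for _, followee in follow_pairs)
    return out_degree, in_degree


# One-directional network: the two values generally differ.
directed = {("A", "B"), ("C", "B"), ("B", "C")}
out_d, in_d = degrees(directed)

# Mutual (friend) network: every edge is stored in both directions,
# so the out-degree equals the in-degree and both equal the number of friends.
mutual = {("A", "B"), ("B", "A"), ("A", "C"), ("C", "A")}
out_m, in_m = degrees(mutual)
```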
  • the behavior attribute is related to the "information propagation record”. Specifically, the behavior attribute can be calculated according to the "information propagation record”.
  • the activity is related to the number of behaviors of the user in a unit time, and the activity may be a value. The larger the value, the higher the activity.
  • the activity may be represented by one of the 5 integers from 1 to 5: 5 means very active, 4 means generally active, 3 means active, 2 means inactive, and 1 means very inactive. Or, for example, the activity may be expressed as a value from 0 to 1 written as a percentage: 80% means active, 50% means generally active, and 20% means inactive.
  • the feature attribute of each user in the "user feature database” can be represented by an H-dimensional feature vector.
  • the user feature database may include M H-dimensional feature vectors.
  • for example, the age of user A is 20, the gender is female, the birthplace is Beijing, the occupation is doctor, the importance is 0.65, the number of friends is 50, the posting activity is 4, the forwarding activity is 2, and the commenting activity is 4.
  • H in the "user feature database” may be larger or smaller, that is, the dimension of the feature attribute of the user may be larger or smaller, which is not limited by the present invention.
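  • The example of user A could be encoded as an H-dimensional vector (here H = 9) along the lines of the sketch below. The integer codes for categorical attributes, the field names, and the reading of the first activity value as posting activity are all assumptions made for illustration.

```python
# Hypothetical integer codes for categorical attributes (not from the source).
GENDER_CODES = {"male": 0, "female": 1}
BIRTHPLACE_CODES = {"Beijing": 0, "Shanghai": 1}
OCCUPATION_CODES = {"doctor": 0, "teacher": 1}

# Order of the H = 9 components of the feature vector.
FIELDS = ["age", "gender", "birthplace", "occupation", "importance",
          "friends", "post_activity", "forward_activity", "comment_activity"]


def to_feature_vector(profile):
    """Encode one user's feature attributes as an H-dimensional list of floats."""
    encoded = dict(profile)
    encoded["gender"] = GENDER_CODES[profile["gender"]]
    encoded["birthplace"] = BIRTHPLACE_CODES[profile["birthplace"]]
    encoded["occupation"] = OCCUPATION_CODES[profile["occupation"]]
    return [float(encoded[name]) for name in FIELDS]


user_a = {
    "age": 20, "gender": "female", "birthplace": "Beijing",
    "occupation": "doctor", "importance": 0.65, "friends": 50,
    "post_activity": 4, "forward_activity": 2, "comment_activity": 4,
}
vector_a = to_feature_vector(user_a)
```

  • In practice the components would likely need scaling before distances are computed, since age and friend count have much larger ranges than the other values.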
  • the method includes:
  • K clusters are used to represent K categories of feature attributes of the user.
  • the influence of the K clusters includes the information propagation success rate of the K clusters, and K is a positive integer.
  • the influence of the K clusters can be used to predict the propagation of the target information in the social network; the method requires little computation and is therefore computationally efficient.
  • the target information to be predicted in 101 may be posted or forwarded by the first user at an initial moment, and may be expressed in the form of a four-tuple.
  • the initial moment may be denoted as the first moment.
  • <ID of the first user, , first moment, ID of the target information> indicates that the first user posts the target information at the first moment.
  • <ID of the first user, ID of the source user, first moment, ID of the target information> indicates that the first user forwards the target information from the source user at the first moment.
  • the second user may be a user to be predicted.
  • the second user may be user A.
  • the second user may be a user whose age attribute is 30 years old and has not propagated the target information.
  • the second user may be a user, among the followers of the first user, who has not propagated the target information. In that case, the followers of the first user may be determined according to the user relationship database, and the second user, who has not propagated the target information, is then determined from among those followers.
  • the number of second users is not limited in the embodiment of the present invention.
  • there can be one or more second users.
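  • Selecting the second users from the first user's followers might look like the following sketch. The function name and sample data are hypothetical, and follow pairs are assumed to be stored as (follower, followee).

```python
def candidate_second_users(follow_pairs, propagated_users, first_user):
    """Return the followers of `first_user` who have not yet propagated the
    target information, sorted for stable output."""
    followers = {f for (f, t) in follow_pairs if t == first_user}
    return sorted(followers - set(propagated_users))


follow_pairs = {("B", "A"), ("C", "A"), ("D", "A"), ("A", "C")}
already_propagated = {"A", "C"}  # users who have posted or forwarded the target information
second_users = candidate_second_users(follow_pairs, already_propagated, "A")
```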
  • the influence of the K clusters may be obtained through training. Then, in 101, the influence of K clusters can be obtained according to the training result.
  • the method further includes: acquiring an information propagation record, a user relationship database, and a user feature database from the social network, wherein the information propagation record includes historical propagation records of existing information, the user relationship database includes the follow relationships between existing users, and the user feature database includes the feature attributes of the existing users; obtaining, according to the user feature database and by using a soft clustering algorithm, the K clusters and the feature attributes of the K clusters; and obtaining, according to the information propagation record and the user relationship database and by using a learning method, the influence of the K clusters.
  • the K clusters are K categories determined according to the feature attributes of the existing user. K is a positive integer.
  • the information dissemination record, the user relationship database, and the user feature database are as described above. To avoid repetition, details are not described herein.
  • the soft clustering algorithm may also be referred to as a fuzzy clustering algorithm, and may be, for example, the fuzzy C-means algorithm (FCMA or FCM) or a probabilistic mixture model.
  • the K clusters can also be called K classes or K roles.
  • the number of existing users is M
  • the user feature database includes the feature attributes of M existing users.
  • the K clusters are obtained by using the soft clustering algorithm, and the M existing users can be clustered into K clusters according to the similarity of the characteristic attributes of the M existing users.
  • K is much smaller than M.
  • the method of the embodiment of the present invention has a small amount of calculation, and thus the calculation efficiency of the method is high.
  • the feature attribute of one of the K clusters may be a representative feature attribute of the one cluster.
  • the representative feature attribute of the one cluster may be a feature attribute of a center point of the one cluster, or the representative feature attribute of the one cluster may be a feature attribute of a user closest to the center point in the one cluster.
  • the center point of the one cluster may be defined as the mean value of the feature attributes of all users belonging to the one cluster.
  • the feature attributes of the K clusters may be represented by K H-dimensional feature vectors.
  • the feature attributes of the K clusters can be represented by a matrix of K x H.
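  • As one possible instance of the soft clustering step, the sketch below runs the standard fuzzy C-means updates in plain Python. It is not the patent's implementation: the explicit `init_centers` argument, the fuzzifier `m = 2`, the fixed iteration count, and the toy 2-dimensional points are all assumptions. The returned centers play the role of the K × H matrix of cluster feature attributes.

```python
import math


def fuzzy_c_means(points, init_centers, m=2.0, iters=20):
    """Standard fuzzy C-means with explicit initial centers.

    Returns (centers, memberships). memberships[i][j] is the degree to which
    point i belongs to cluster j, computed against the centers of the last
    iteration before their final update; every row sums to 1.
    """
    centers = [list(c) for c in init_centers]
    dim = len(points[0])
    memberships = []
    for _ in range(iters):
        # Membership update: closer centers receive higher membership.
        memberships = []
        for p in points:
            dists = [max(math.dist(p, c), 1e-12) for c in centers]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(inv)
            memberships.append([v / total for v in inv])
        # Center update: membership-weighted mean of all points.
        for j in range(len(centers)):
            weights = [row[j] ** m for row in memberships]
            wsum = sum(weights)
            centers[j] = [
                sum(w * p[d] for w, p in zip(weights, points)) / wsum
                for d in range(dim)
            ]
    return centers, memberships


points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
centers, memberships = fuzzy_c_means(points, [(0.0, 0.0), (10.0, 10.0)])
```

  • Each row of `memberships` is a soft assignment over the K clusters, which is what distinguishes soft clustering from a hard assignment of each user to exactly one cluster.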
  • determining the role probability distribution of the first user in 102 may include: acquiring a feature attribute of the first user; determining, according to the feature attribute of the first user and the feature attribute of the K clusters, The role probability distribution of the first user.
  • the role probability distribution of the first user may be determined according to a distance between a feature attribute of the first user and a feature attribute of the K clusters.
  • the role probability distribution of the first user can be represented by a K-dimensional vector form composed of the above K values.
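  • The source states only that the role probability distribution depends on the distance between the user's feature attribute and the cluster feature attributes. The sketch below assumes one simple choice, normalized inverse distance, and supports either the Euclidean norm or the infinity norm; the weighting rule and all names are assumptions.

```python
import math


def distance(a, b, norm="l2"):
    """Distance between two feature vectors under the chosen norm."""
    if norm == "inf":
        return max(abs(x - y) for x, y in zip(a, b))
    return math.dist(a, b)


def role_distribution(user_vec, cluster_vecs, norm="l2", eps=1e-12):
    """Return a K-dimensional probability vector over the clusters.

    Assumed rule: the weight of a cluster is the inverse of its distance to
    the user, and the weights are normalized to sum to 1.
    """
    inv = [1.0 / (distance(user_vec, c, norm) + eps) for c in cluster_vecs]
    total = sum(inv)
    return [v / total for v in inv]


# A user at distance 1 from the first cluster and 3 from the second.
theta = role_distribution((1.0, 0.0), [(0.0, 0.0), (4.0, 0.0)])
```

  • With distances 1 and 3, the inverse weights are 1 and 1/3, which normalize to roughly 0.75 and 0.25.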
  • the method further includes: determining, according to the user feature database and the feature attributes of the K clusters, a role probability distribution of each existing user, where the role probability distribution of an existing user indicates the probabilities that the existing user belongs to each of the K clusters.
  • the role probability distribution of the existing user may be determined according to the feature attributes of the existing user and the feature attributes of the K clusters.
  • the feature attributes of the K clusters can be obtained according to the method in the foregoing embodiment. To avoid repetition, details are not described herein again.
  • the feature attribute of the user B in the existing user is the feature vector BT
  • represents a modulus or a norm.
  • may be an infinite norm
  • determining the role probability distribution of the first user in the method may include: acquiring the first user from the role probability distribution of the existing user. Role probability distribution.
  • determining the role probability distribution of the first user in the method may include: acquiring a feature attribute of the first user; obtaining, according to the feature attribute of the first user, the feature attributes of N third users from the user feature database, wherein the N third users belong to the existing users, the distance between the feature attribute of each of the N third users and the feature attribute of the first user is less than a preset distance threshold, and N is a positive integer; obtaining the role probability distributions of the N third users from the role probability distributions of the existing users; and determining the role probability distribution of the first user according to the role probability distributions of the N third users.
  • determining the role probability distribution of the first user according to the role probability distribution of the N third users may include: determining that the role probability distribution of the first user is the N third users The arithmetic mean of the role probability distribution.
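  • The handling of a first user who is not an existing user can be sketched as follows: find the existing users whose feature vectors lie within the preset distance threshold, then take the arithmetic mean of their role probability distributions. The function name, the Euclidean distance, and the sample data are assumptions.

```python
import math


def cold_start_role_distribution(new_vec, known_vecs, known_roles, threshold):
    """Average the role distributions of existing users within `threshold`.

    known_vecs maps user ID to feature vector; known_roles maps user ID to
    a K-dimensional role probability distribution. Returns None when no
    existing user is close enough.
    """
    near = [uid for uid, vec in known_vecs.items()
            if math.dist(new_vec, vec) < threshold]
    if not near:
        return None
    k = len(known_roles[near[0]])
    return [sum(known_roles[uid][j] for uid in near) / len(near)
            for j in range(k)]


known_vecs = {"u1": (0.0, 0.0), "u2": (1.0, 0.0), "u3": (10.0, 10.0)}
known_roles = {"u1": [0.8, 0.2], "u2": [0.4, 0.6], "u3": [0.1, 0.9]}
new_user_roles = cold_start_role_distribution((0.5, 0.0), known_vecs, known_roles, 1.0)
```

  • Here u1 and u2 are the N = 2 third users, so the new user's distribution is the mean of [0.8, 0.2] and [0.4, 0.6].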
  • embodiments of the present invention are capable of solving the cold start problem.
  • the prediction process can then be carried out; that is, the embodiment of the present invention can solve the data sparsity problem.
  • the influence of the K clusters is determined through training. That is, obtaining the influence of the K clusters by using a learning method according to the information propagation record and the user relationship database may be: according to the information propagation record and the user relationship database, and in combination with the role probability distributions of the existing users, calculating the influence that existing users belonging to each cluster have on the forwarding behavior of their followers, and thereby learning the influence of the K clusters in the information propagation process.
  • the method for learning may be a method of machine learning or a method of statistical learning, which is not limited by the present invention.
  • the influence of the K clusters may include information transmission success rates of the K clusters.
  • the information propagation success rate can be expressed by an impact factor, that is, the influence of the K clusters can include the impact factors of the K clusters.
  • the impact factor of K clusters can be understood as the success rate of K clusters in the information dissemination process.
  • the influence of K clusters can be represented by a vector of K dimensions.
  • the influence of the K clusters may include an information propagation success rate and an information propagation time delay rate of the K clusters.
  • the influence includes an impact factor and a time delay. That is, the influence of the K clusters may include the impact factor of the K clusters and the time delay of the K clusters.
  • the impact factor of K clusters can be understood as the success rate of K clusters in the process of information dissemination.
  • the time delay of K clusters can be understood as the probability of delay of K clusters to one moment in the information propagation process. Then, the influence of K clusters can be represented by a K × 2 matrix.
  • the influence factor and the time delay may be values between 0 and 1, the larger the value, the greater the influence.
  • the impact factor and the time delay may be an integer value between 1 and 5, and the larger the integer value, the greater the influence. The invention is not limited thereto.
  • it may be: setting an approximate function of the information dissemination record data, and optimizing the approximate function according to the information propagation record, the user relationship database, and the user feature database, thereby determining The influence of K clusters.
  • the approximate function can be defined as the following formula (1):
  • I represents the total number of information
  • T represents the largest moment
  • H represents the dimension of the user's feature vector
  • K represents the number of clusters
  • V represents a collection of all users.
  • A_it represents the set of users that have propagated information i at time t
  • D_iT represents the set of users that have propagated information i at time T
  • x_uh represents the value of the hth component of the feature vector of the user u
  • ⁇ uk represents the probability that user u belongs to the kth cluster during information propagation.
  • ⁇ k and ⁇ k represent the influence of the kth cluster
  • ⁇ k represents the influence factor (success rate) of the kth cluster
  • ⁇ k represents the time delay of the kth cluster.
  • ⁇ kh represents the mean (mean) of the hth component of the feature vector of all users belonging to the kth cluster
  • ⁇ kh represents the accuracy of the hth component of the feature vector of all users belonging to the kth cluster ( Precision).
  • log P(v ∈ A_it) represents the log-probability that user v participates in the propagation of information i at time t
  • log P(x_uh) represents the log-probability of the hth component of the feature vector of user u.
  • the approximate function can be optimized, and ⁇ uk , ⁇ k , ⁇ k , ⁇ kh , and ⁇ kh can be determined, by an existing parameter learning method for generative models.
  • the parameter learning method for generative models may be Gibbs sampling or a variational method.
  • the expected value of the propagation probability of the second user forwarding the target information may be calculated according to the influence of the K clusters and the role probability of the first user by using a statistical method. And taking the expected value of the propagation probability as a probability that the second user forwards the target information from the first user.
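  • One plausible reading of this expected value is a weighted sum: each cluster's information propagation success rate is weighted by the probability that the first user belongs to that cluster. The sketch below assumes that reading; the function name and the numbers are hypothetical, and the patent's exact estimator is not reproduced here.

```python
def expected_forward_probability(role_dist, success_rates):
    """Expected forwarding probability under the first user's role distribution.

    role_dist[k] is the probability that the first user belongs to cluster k;
    success_rates[k] is the information propagation success rate of cluster k.
    """
    if len(role_dist) != len(success_rates):
        raise ValueError("both vectors must have K entries")
    return sum(p * r for p, r in zip(role_dist, success_rates))


role_dist = [0.5, 0.3, 0.2]      # first user's role probability distribution (K = 3)
success_rates = [0.1, 0.4, 0.8]  # learned influence factors of the K clusters
probability = expected_forward_probability(role_dist, success_rates)
```

  • Because only K terms are summed, the cost per prediction depends on the number of clusters rather than on the number of users.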
  • the method of FIG. 1 may further include: determining, according to the influence of the K clusters and the role probability distribution of the first user, the second user from the first user The time at which the target information is forwarded.
  • the moment at which the second user forwards the target information from the first user may be determined by using Bayesian theory, according to the influence of the K clusters and the role probability distribution of the first user.
  • the expected value of the propagation time of the second user to forward the target information may be calculated according to the influence of the K clusters and the role probability of the first user by using a statistical method. And the expected value of the propagation time is used as a time when the second user forwards the target information from the first user.
  • determining the time at which the second user forwards the target information from the first user may be performed before or after 103, or may be performed simultaneously with 103, which is not limited by the present invention.
  • a step of determining a probability that the second user forwards the target information from the first user and a step of determining a time at which the second user forwards the target information from the first user, They can be executed independently or cross-coupled.
  • the prior probability of the sample can be expressed as:
  • The embodiment of the present invention approximates the Γ function by Stirling's formula.
  • the function ⁇ ( ⁇ ) is defined as:
  • ⁇ 0 , ⁇ 1 , ⁇ 2 and ⁇ 3 are normal-gamma prior parameters.
  • the model parameters can be estimated based on the sample results. Specifically, the model parameters can be updated to:
  • the probability of forwarding the target information and the time at which the target information is forwarded can be estimated:
  • ⁇ kh here is a time interval: the time at which the second user forwards the target information is the initial time at which the first user publishes or forwards the target information plus the time interval ⁇ kh .
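The relation "forwarding time = initial time + time interval" can be sketched as follows. The sketch assumes that each cluster contributes a mean delay in hours and that the expected interval is the role-probability-weighted mean of those delays; `mean_delays_hours` and this averaging rule are hypothetical, since the estimation formulas themselves are not reproduced in the text.

```python
from datetime import datetime, timedelta

def expected_forward_time(initial_time, role_dist, mean_delays_hours):
    """Estimate when a second user forwards the target information.

    initial_time         -- datetime at which the first user published or forwarded
    role_dist[k]         -- probability that the first user belongs to cluster k
    mean_delays_hours[k] -- assumed mean delay (in hours) for cluster k
    """
    # Expected time interval under the assumed weighted-mean rule.
    interval = sum(p * d for p, d in zip(role_dist, mean_delays_hours))
    # Forwarding time is the initial time plus the time interval.
    return initial_time + timedelta(hours=interval)
```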
  • The target information is published or forwarded by the first user at an initial time.
  • The method may include: outputting the account of the second user that meets a preset condition, where the preset condition is that the probability of forwarding the target information is greater than a preset probability threshold.
  • The influence of the K clusters may further include an information propagation time delay rate of the K clusters, in which case the time at which the second user forwards the target information from the first user may also be determined.
  • The preset condition may further include: the duration between the time at which the target information is forwarded and the initial time is less than a preset duration threshold.
  • The embodiment of the present invention does not limit the form of the account; for example, the account may be an ID or a name.
  • The embodiment of the present invention does not limit the values of the preset probability threshold and the preset duration threshold.
  • For example, the preset probability threshold may be 0.3, and the preset duration threshold may be 12 hours.
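The preset condition amounts to a filter over the predicted second users. The sketch below assumes that each prediction is a hypothetical tuple of (account, probability, hours after the initial time); the default thresholds are the example values 0.3 and 12 hours.

```python
def select_second_users(predictions, prob_threshold=0.3, max_hours=12.0):
    """Return accounts of second users that meet the preset condition.

    predictions -- iterable of (account, probability, hours_after_initial_time);
                   this tuple layout is an assumption made for illustration.
    """
    return [
        account
        for account, probability, hours in predictions
        # Probability must exceed the threshold and the delay must be short enough.
        if probability > prob_threshold and hours < max_hours
    ]
```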
  • For the target information to be predicted, the method shown in FIG. 1 predicts the propagation of the target information by the followers of the first user. Further, propagation of the target information by the followers of those followers may also be predicted, and so on, as shown in FIG. 2.
  • Assume that user X publishes information m at time t0. This can be represented by the quadruple <X, , t0, m>.
  • The method shown in FIG. 2 predicts the propagation of the information m within a preset duration threshold starting from t0.
  • The method shown in FIG. 2 includes:
  • The initial condition is that the first user publishes or forwards the information m at the first time. It can be represented by the quadruple <X, , t0, m>.
  • The first user is user X, and the first time is t0.
  • For 203, refer to 102 in the foregoing embodiment. To avoid repetition, details are not described herein again.
  • The stop condition may include: the probability that the second user forwards the information m from the first user is less than a preset probability threshold.
  • The stop condition may include: the duration from the time t0 to the second time is greater than a preset duration threshold, and the probability that the second user forwards the information m from the first user is less than the preset probability threshold.
  • The stop condition is: the duration from the time t0 to the second time is greater than the duration threshold, and the probability that the second user forwards the information m from the first user is less than the preset probability threshold.
  • For example, the duration threshold may be 24 hours and the probability threshold may be 0.2.
  • the invention is not limited thereto.
  • At least one second user is determined in 204; assume that there are M1 second users. Then, in 205, it is necessary to judge whether each of the M1 second users satisfies the preset stop condition. If every one of the M1 second users satisfies the preset stop condition, the determination result of 205 is YES. If at least one of the M1 second users does not satisfy the preset stop condition, the determination result of 205 is NO. Further, it can be understood that 206 and 207 are executed for those of the M1 second users that do not satisfy the preset stop condition.
  • For example, assume that the probabilities that Y1 and Y2 forward m, and the corresponding second times, do not satisfy the preset stop condition. The probability that Y1 forwards m from X is P1, and the second time at which Y1 forwards m from X is t1. The probability that Y2 forwards m from X is P2, and the second time at which Y2 forwards m from X is t2.
  • The output of 206 is the second users that do not satisfy the preset stop condition in the judgment of 205.
  • The account of the second user may be output.
  • The probability that the second user forwards m may also be output; or both the probability that the second user forwards m and the second time may be output.
  • For example, 206 can output Y1 and Y2.
  • Alternatively, 206 can output two vectors, (Y1, P1, t1) and (Y2, P2, t2). Each of the two vectors output by 206 includes three components: the first component represents the account, the second component represents the probability of forwarding, and the third component represents the time of forwarding.
  • In 207, the second replaces the first, and n is increased by 1.
  • That is, each second user that does not satisfy the preset stop condition in the judgment of 205 becomes the first user, and the second time at which that second user forwards the information becomes the first time.
  • The initial condition of 202 performed after 207 may be that the first user forwards the information m at the first time.
  • For example, the initial condition generated by 202 can be represented by the quadruples <Y1, X, t1, m> and <Y2, X, t2, m>.
  • the prediction process is stopped.
  • The users whose probability of propagating the information m within the preset duration threshold is greater than the preset probability threshold can be obtained from the output of 206.
  • the embodiment of the present invention does not limit the stopping condition.
  • The stop condition may be that the number of iterations is greater than or equal to a preset iteration threshold, that is, the value of n is greater than or equal to the preset iteration threshold.
  • The stop condition may be that the number of users output is greater than a preset number threshold, that is, the number of second users output by 206 is greater than the preset number threshold.
  • the embodiment of the present invention does not limit the preset iteration threshold and the preset number threshold.
  • For example, the preset iteration threshold may be 10, and the preset number threshold may be 1000.
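The iterative procedure of FIG. 2 can be sketched as a level-by-level traversal of the follower graph. This is a minimal sketch under stated assumptions: `followers` is a hypothetical mapping from a user to that user's followers, `predict` is a hypothetical callable returning a (probability, delay in hours) pair for a first/second user pair, and a second user is treated as stopped when either the probability is below the threshold or the elapsed time exceeds the duration threshold. The embodiment allows other stop conditions.

```python
def predict_cascade(seed, t0, followers, predict, prob_threshold=0.2,
                    max_hours=24.0, max_iterations=10, max_users=1000):
    """Predict which users forward a message, level by level.

    Returns a list of (account, probability, forwarding_time_in_hours) vectors.
    """
    reached = {seed}            # users that already propagated the message
    output = []                 # vectors emitted by the output step (206)
    frontier = [(seed, t0)]     # current (first user, first time) pairs
    n = 0                       # iteration counter
    while frontier and n < max_iterations and len(output) <= max_users:
        next_frontier = []
        for first, first_time in frontier:
            for second in followers.get(first, []):
                if second in reached:
                    continue    # only users that have not propagated yet
                prob, delay = predict(first, second)
                second_time = first_time + delay
                if prob < prob_threshold or second_time - t0 > max_hours:
                    continue    # assumed stop condition holds for this user
                reached.add(second)
                output.append((second, prob, second_time))
                # The second user becomes a first user in the next round (207).
                next_frontier.append((second, second_time))
        frontier = next_frontier
        n += 1
    return output
```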
  • In the embodiment of the present invention, the influence of the K clusters can be used to predict the propagation of information in the social network; the prediction method requires little computation and is computationally efficient.
  • Based on the prediction, an enterprise can efficiently make various business decisions. For example, an enterprise may want to reach a certain advertising volume, such as having a piece of information spread to at least 1,000 people within a day. The enterprise can then set the stop condition according to this expectation and, using the method shown in FIG. 2 with user X assumed to be Zhang San, perform information propagation prediction. If the prediction meets the enterprise's expectation, the enterprise can publish the information through Zhang San. For example, the published information can be product introduction information for a new product.
  • Risk management decisions and the like can also be made in a timely manner according to the prediction result of information propagation.
  • FIG. 3 is a block diagram of an apparatus for predicting information propagation in a social network in accordance with an embodiment of the present invention.
  • the device 300 shown in FIG. 3 includes an obtaining unit 301 and a determining unit 302.
  • The obtaining unit 301 is configured to obtain target information to be predicted and obtain the influence of K clusters, where the target information is published or forwarded by a first user, the K clusters are used to represent K categories of feature attributes of users, the influence of the K clusters includes the information propagation success rate of the K clusters, and K is a positive integer.
  • The determining unit 302 is configured to determine a role probability distribution of the first user, and determine a second user that has not propagated the target information obtained by the obtaining unit 301, where the role probability distribution of the first user is used to represent the probabilities that the first user belongs to each of the K clusters.
  • The determining unit 302 is further configured to determine, according to the influence of the K clusters obtained by the obtaining unit 301 and the role probability distribution of the first user, the probability that the second user forwards the target information from the first user.
  • In the embodiment of the present invention, the influence of the K clusters can be used to predict the propagation of the target information in the social network; the prediction method requires little computation and is computationally efficient.
  • The target information is published or forwarded by the first user at an initial time, and the device further includes an output unit configured to output the second user that meets a preset condition, where the preset condition is that the probability of forwarding the target information is greater than a preset probability threshold.
  • The obtaining unit 301 is further configured to obtain, from the social network, an information propagation record, a user relationship database, and a user feature database, where the information propagation record includes historical propagation records of existing information, the user relationship database includes follow relationships between existing users, and the user feature database includes feature attributes of the existing users.
  • The determining unit 302 is further configured to obtain, according to the user feature database, the K clusters and the feature attributes of the K clusters by using a soft clustering algorithm, where the K clusters are K categories determined according to the feature attributes of the existing users.
  • The determining unit 302 is further configured to obtain the influence of the K clusters by using a learning method according to the information propagation record and the user relationship database.
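  • The determining unit 302 is specifically configured to: obtain the feature attribute of the first user; and determine the role probability distribution of the first user according to the feature attribute of the first user and the feature attributes of the K clusters.
  • The feature attribute of the first user is represented as AT, and the feature attributes of the K clusters are represented as KT j , j = 1, 2, ..., K. The determining unit 302 is specifically configured to: determine that the role probability distribution of the first user is K values corresponding to the K clusters.
  • The determining unit 302 is further configured to: determine the role probability distributions of the existing users according to the user feature database and the feature attributes of the K clusters, where the role probability distribution of an existing user is used to represent the probabilities that the existing user belongs to each of the K clusters.
  • If the first user belongs to the existing users, the determining unit 302 is specifically configured to: obtain the role probability distribution of the first user from the role probability distributions of the existing users.
  • If the first user does not belong to the existing users, the determining unit 302 is specifically configured to: obtain the feature attribute of the first user; obtain, according to the feature attribute of the first user, the feature attributes of N third users from the user feature database, where the N third users belong to the existing users, the distance between the feature attributes of the N third users and the feature attribute of the first user is less than a preset distance threshold, and N is a positive integer; obtain the role probability distributions of the N third users from the role probability distributions of the existing users; and determine the role probability distribution of the first user according to the role probability distributions of the N third users.
  • The determining unit 302 is specifically configured to: determine that the role probability distribution of the first user is the arithmetic average of the role probability distributions of the N third users.
  • The learning method is a machine learning method or a statistical learning method.
  • The determining unit 302 is specifically configured to: calculate, by a statistical method according to the influence of the K clusters and the role probability distribution of the first user, the expected value of the propagation probability with which the second user forwards the target information; and use the expected value of the propagation probability as the probability that the second user forwards the target information from the first user.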
  • The influence of the K clusters further includes an information propagation time delay rate of the K clusters, and the determining unit 302 is further configured to: determine, according to the influence of the K clusters and the role probability distribution of the first user, the time at which the second user forwards the target information from the first user.
  • The foregoing preset condition may further include: the duration between the time at which the target information is forwarded and the initial time is less than a preset duration threshold.
  • The second user is a user, among the followers of the first user, that has not propagated the target information.
  • The determining unit 302 is specifically configured to: determine the followers of the first user according to the user relationship database; and determine the second user from among the followers of the first user, where the second user has not propagated the target information.
  • the device 300 shown in FIG. 3 may be a server of a social network.
  • the device 300 shown in FIG. 3 can implement various processes in the methods shown in FIG. 1 and FIG. 2, and details are not described herein again to avoid repetition.
  • FIG. 4 is a block diagram of an apparatus for predicting information propagation in a social network in accordance with another embodiment of the present invention.
  • the apparatus 400 shown in FIG. 4 includes a processor 401, a receiving circuit 402, a transmitting circuit 403, and a memory 404.
  • The receiving circuit 402 is configured to obtain target information to be predicted and obtain the influence of K clusters, where the target information is published or forwarded by a first user, the K clusters are used to represent K categories of feature attributes of users, the influence of the K clusters includes the information propagation success rate of the K clusters, and K is a positive integer.
  • The processor 401 is configured to determine a role probability distribution of the first user, and determine a second user that has not propagated the target information, where the role probability distribution of the first user is used to represent the probabilities that the first user belongs to each of the K clusters.
  • The processor 401 is further configured to determine, according to the obtained influence of the K clusters and the role probability distribution of the first user, the probability that the second user forwards the target information from the first user.
  • In the embodiment of the present invention, the influence of the K clusters can be used to predict the propagation of the target information in the social network; the prediction method requires little computation and is computationally efficient.
  • The components of the device 400 are coupled together by a bus system 405, which includes a power bus, a control bus, and a status signal bus in addition to a data bus.
  • For clarity of description, the various buses are all labeled as the bus system 405 in FIG. 4.
  • Processor 401 may be an integrated circuit chip with signal processing capabilities. In the implementation process, each step of the foregoing method may be completed by an integrated logic circuit of hardware in the processor 401 or an instruction in a form of software.
  • The processor 401 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present invention may be directly implemented by the hardware decoding processor, or may be performed by a combination of hardware and software modules in the decoding processor.
  • the software module can be located in a conventional storage medium such as random access memory, flash memory, read only memory, programmable read only memory or electrically erasable programmable memory, registers, and the like.
  • the storage medium is located in the memory 404, and the processor 401 reads the information in the memory 404 and completes the steps of the above method in combination with its hardware.
  • the memory 404 in the embodiments of the present invention may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory.
  • The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory.
  • the volatile memory can be a Random Access Memory (RAM) that acts as an external cache.
  • Many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DR RAM).
  • the embodiments described herein can be implemented in hardware, software, firmware, middleware, microcode, or a combination thereof.
  • For a hardware implementation, the processing unit can be implemented in one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units for performing the functions described herein, or a combination thereof.
  • A code segment can represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements.
  • A code segment can be coupled to another code segment or a hardware circuit by transmitting and/or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, and the like can be passed, forwarded, or transmitted using any suitable means, including memory sharing, message passing, token passing, and network transmission.
  • For a software implementation, the techniques described herein can be implemented by modules (for example, procedures and functions) that perform the functions described herein.
  • The software code can be stored in a memory unit and executed by the processor.
  • The memory unit can be implemented in the processor or external to the processor; in the latter case, the memory unit can be communicatively coupled to the processor via various means known in the art.
  • The target information is published or forwarded by the first user at an initial time, and the sending circuit 403 of the device 400 is configured to output the second user that meets a preset condition, where the preset condition is that the probability of forwarding the target information is greater than a preset probability threshold.
  • The receiving circuit 402 is further configured to obtain, from the social network, an information propagation record, a user relationship database, and a user feature database, where the information propagation record includes historical propagation records of existing information, the user relationship database includes follow relationships between existing users, and the user feature database includes feature attributes of the existing users.
  • The processor 401 is further configured to obtain the K clusters and the feature attributes of the K clusters by using a soft clustering algorithm according to the user feature database, where the K clusters are K categories determined according to the feature attributes of the existing users; the processor 401 is further configured to obtain the influence of the K clusters by using a learning method according to the information propagation record and the user relationship database.
  • the memory 404 can be used to store an information dissemination record, a user relationship database, and a user feature database.
  • the memory 404 is also used to store the feature attributes of the K clusters and the influence of the K clusters.
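  • The processor 401 is specifically configured to: obtain the feature attribute of the first user; and determine the role probability distribution of the first user according to the feature attribute of the first user and the feature attributes of the K clusters.
  • The feature attribute of the first user is represented as AT, and the feature attributes of the K clusters are represented as KT j , j = 1, 2, ..., K. The processor 401 is specifically configured to: determine that the role probability distribution of the first user is K values corresponding to the K clusters.
  • The processor 401 is further configured to: determine the role probability distributions of the existing users according to the user feature database and the feature attributes of the K clusters, where the role probability distribution of an existing user is used to represent the probabilities that the existing user belongs to each of the K clusters.
  • If the first user belongs to the existing users, the processor 401 is specifically configured to: obtain the role probability distribution of the first user from the role probability distributions of the existing users.
  • If the first user does not belong to the existing users, the processor 401 is specifically configured to: obtain the feature attribute of the first user; obtain, according to the feature attribute of the first user, the feature attributes of N third users from the user feature database, where the N third users belong to the existing users, the distance between the feature attributes of the N third users and the feature attribute of the first user is less than a preset distance threshold, and N is a positive integer; obtain the role probability distributions of the N third users from the role probability distributions of the existing users; and determine the role probability distribution of the first user according to the role probability distributions of the N third users.
  • The processor 401 is specifically configured to: determine that the role probability distribution of the first user is the arithmetic average of the role probability distributions of the N third users.
  • The learning method is a machine learning method or a statistical learning method.
  • The processor 401 is specifically configured to: calculate, by a statistical method according to the influence of the K clusters and the role probability distribution of the first user, the expected value of the propagation probability with which the second user forwards the target information; and use the expected value of the propagation probability as the probability that the second user forwards the target information from the first user.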
  • The influence of the K clusters further includes an information propagation time delay rate of the K clusters, and the processor 401 is further configured to: determine, according to the influence of the K clusters and the role probability distribution of the first user, the time at which the second user forwards the target information from the first user.
  • The foregoing preset condition may further include: the duration between the time at which the target information is forwarded and the initial time is less than a preset duration threshold.
  • The second user is a user, among the followers of the first user, that has not propagated the target information.
  • The processor 401 is specifically configured to: determine the followers of the first user according to the user relationship database; and determine the second user from among the followers of the first user, where the second user has not propagated the target information.
  • the apparatus 400 shown in FIG. 4 can implement the various processes in the methods shown in FIG. 1 and FIG. 2, and details are not described herein again to avoid repetition.
  • the disclosed systems, devices, and methods may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • The division into units is only a logical function division.
  • In actual implementation there may be another division manner; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • The mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be in an electrical, mechanical, or other form.
  • The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solution of this embodiment.
  • Each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • If the functions are implemented in the form of a software functional unit and sold or used as a standalone product, they may be stored in a computer-readable storage medium.
  • The technical solution of the present invention essentially, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions.
  • The instructions are used to cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the various embodiments of the present invention.
  • the foregoing storage medium includes various media that can store program codes, such as a USB flash drive, a mobile hard disk, a read-only memory ROM, a RAM, a magnetic disk, or an optical disk.

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Strategic Management (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • General Business, Economics & Management (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Health & Medical Sciences (AREA)
  • Tourism & Hospitality (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Human Resources & Organizations (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Primary Health Care (AREA)
  • Game Theory and Decision Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

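A method for predicting information propagation in a social network, including: obtaining target information to be predicted and obtaining the influence of K clusters, where the target information is published or forwarded by a first user at a first time, and K is a positive integer (101); determining a role probability distribution of the first user, and determining a second user that has not propagated the target information, where the role probability distribution of the first user is used to represent the probabilities that the first user belongs to each of the K clusters (102); and determining, according to the influence of the K clusters and the role probability distribution of the first user, the probability that the second user forwards the target information from the first user (103). By using the influence of the K clusters, the prediction method can predict the propagation of target information in a social network with little computation and high computational efficiency, and can solve the cold-start problem for new users.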

Description

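Method and device for predicting information propagation in a social network
This application claims priority to Chinese Patent Application No. 201410478217.3, filed with the Chinese Patent Office on September 18, 2014 and entitled "Method and device for predicting information propagation in a social network", and to Chinese Patent Application No. 201510131640.0, filed with the Chinese Patent Office on March 24, 2015 and entitled "Method and device for predicting information propagation in a social network", both of which are incorporated herein by reference in their entireties.
Technical Field
Embodiments of the present invention relate to the field of information processing, and more specifically, to a method and a device for predicting information propagation in a social network.
Background
An information propagation model uses records of how different pieces of information propagate over a network, including path, range, and/or speed, to calculate how different factors help or affect information propagation, and then predicts the propagation path of new or existing information. The most important factor in information propagation is the influence of different network nodes on the propagation process, and such models are widely used on different types of networks, such as social networks, communication networks, computer networks, and the Internet. On a social network, one of the most important applications of an information propagation model is to find the most valuable users (network nodes), for example, the users with the greatest influence on other users, the users with the widest reach, and the users who spread messages fastest.
Current information propagation models generally model the influence of individual users. Taking message forwarding on a social network as an example, features such as the number of a user's followers (fans) and the number of times the user's messages are forwarded are used to generate a user influence ranking, and the edges (social relationships) between users are weighted to represent the influence of one user on another. However, such a model requires the weights of the edges between all users to be given or learned in advance. For a social network with a huge amount of user data, learning the weights of the edges between all users is highly complex, which makes the model computationally inefficient.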
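Summary
Embodiments of the present invention provide a method for predicting information propagation in a social network that is computationally efficient.
According to a first aspect, a method for predicting information propagation in a social network is provided, the method including:
obtaining target information to be predicted and obtaining the influence of K clusters, where the target information is published or forwarded by a first user, the K clusters are used to represent K categories of feature attributes of users, the influence of the K clusters includes the information propagation success rate of the K clusters, and K is a positive integer;
determining a role probability distribution of the first user, and determining a second user that has not propagated the target information, where the role probability distribution of the first user is used to represent the probabilities that the first user belongs to each of the K clusters; and
determining, according to the influence of the K clusters and the role probability distribution of the first user, the probability that the second user forwards the target information from the first user.
With reference to the first aspect, in a first possible implementation of the first aspect, the target information is published or forwarded by the first user at an initial time, and the method further includes: outputting the account of the second user that meets a preset condition,
where the preset condition is that the probability of forwarding the target information is greater than a preset probability threshold.
With reference to the first aspect or the first possible implementation of the first aspect, in a second possible implementation of the first aspect, before the obtaining target information to be predicted and obtaining the influence of K clusters, the method further includes:
obtaining, from the social network, an information propagation record, a user relationship database, and a user feature database, where the information propagation record includes historical propagation records of existing information, the user relationship database includes follow relationships between existing users, and the user feature database includes feature attributes of the existing users;
obtaining, according to the user feature database, the K clusters and the feature attributes of the K clusters by using a soft clustering algorithm, where the K clusters are K categories determined according to the feature attributes of the existing users; and
obtaining the influence of the K clusters by using a learning method according to the information propagation record and the user relationship database.
With reference to the second possible implementation of the first aspect, in a third possible implementation of the first aspect, the determining a role probability distribution of the first user includes:
obtaining the feature attribute of the first user; and
determining the role probability distribution of the first user according to the feature attribute of the first user and the feature attributes of the K clusters.
With reference to the third possible implementation of the first aspect, in a fourth possible implementation of the first aspect, the feature attribute of the first user is represented as AT, and the feature attributes of the K clusters are represented as KTj, j = 1, 2, ..., K;
the role probability distribution of the first user is determined to be K values corresponding to the K clusters, where the K values are respectively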
Figure PCTCN2015079877-appb-000001
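With reference to the second possible implementation of the first aspect, in a fifth possible implementation of the first aspect, before the obtaining target information to be predicted and obtaining the influence of K clusters, the method further includes:
determining role probability distributions of the existing users according to the user feature database and the feature attributes of the K clusters, where the role probability distribution of an existing user is used to represent the probabilities that the existing user belongs to each of the K clusters.
With reference to the fifth possible implementation of the first aspect, in a sixth possible implementation of the first aspect, the first user belongs to the existing users,
and the determining a role probability distribution of the first user includes:
obtaining the role probability distribution of the first user from the role probability distributions of the existing users.
With reference to the fifth possible implementation of the first aspect, in a seventh possible implementation of the first aspect, the first user does not belong to the existing users,
and the determining a role probability distribution of the first user includes:
obtaining the feature attribute of the first user;
obtaining, according to the feature attribute of the first user, the feature attributes of N third users from the user feature database, where the N third users belong to the existing users, the distance between the feature attributes of the N third users and the feature attribute of the first user is less than a preset distance threshold, and N is a positive integer;
obtaining the role probability distributions of the N third users from the role probability distributions of the existing users; and
determining the role probability distribution of the first user according to the role probability distributions of the N third users.
With reference to the seventh possible implementation of the first aspect, in an eighth possible implementation of the first aspect, the determining the role probability distribution of the first user according to the role probability distributions of the N third users includes:
determining that the role probability distribution of the first user is the arithmetic average of the role probability distributions of the N third users.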
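The cold-start procedure for a first user that is not an existing user can be sketched as follows. The sketch assumes numeric feature vectors and Euclidean distance; the distance measure is not specified in the text, so that choice and the names `existing_features` and `existing_roles` are assumptions made for illustration.

```python
import math

def cold_start_role_distribution(new_features, existing_features,
                                 existing_roles, distance_threshold):
    """Role probability distribution for a user with no history.

    existing_features -- dict: user id -> feature vector (hypothetical layout)
    existing_roles    -- dict: user id -> list of K role probabilities
    """
    # The "third users": existing users whose features are close enough.
    neighbours = [
        user for user, features in existing_features.items()
        if math.dist(new_features, features) < distance_threshold
    ]
    if not neighbours:
        raise ValueError("no existing user within the distance threshold")
    k = len(existing_roles[neighbours[0]])
    # Arithmetic average of the neighbours' role probability distributions.
    return [
        sum(existing_roles[user][j] for user in neighbours) / len(neighbours)
        for j in range(k)
    ]
```

Since each neighbour's distribution sums to 1, their arithmetic average also sums to 1 and is therefore a valid role probability distribution.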
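With reference to any one of the second to the eighth possible implementations of the first aspect, in a ninth possible implementation of the first aspect, the learning method is a machine learning method or a statistical learning method.
With reference to the first aspect or any one of the foregoing possible implementations of the first aspect, in a tenth possible implementation of the first aspect, the determining, according to the influence of the K clusters and the role probability distribution of the first user, the probability that the second user forwards the target information from the first user includes:
calculating, by a statistical method according to the influence of the K clusters and the role probability distribution of the first user, the expected value of the propagation probability with which the second user forwards the target information; and
using the expected value of the propagation probability as the probability that the second user forwards the target information from the first user.
With reference to the first aspect or any one of the foregoing possible implementations of the first aspect, in an eleventh possible implementation of the first aspect, the influence of the K clusters further includes an information propagation time delay rate of the K clusters, and the method further includes:
determining, according to the influence of the K clusters and the role probability distribution of the first user, the time at which the second user forwards the target information from the first user.
With reference to the first aspect or any one of the foregoing possible implementations of the first aspect, in a twelfth possible implementation of the first aspect, the second user is a user, among the followers of the first user, that has not propagated the target information.
With reference to the twelfth possible implementation of the first aspect, in a thirteenth possible implementation of the first aspect, the determining a second user that has not propagated the target information includes:
determining the followers of the first user according to the user relationship database; and
determining the second user from among the followers of the first user, where the second user has not propagated the target information.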
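Determining the second users is a set difference between the followers of the first user and the users that have already propagated the target information. The sketch below assumes that follow relationships are available as hypothetical (follower, followee) pairs.

```python
def find_second_users(first_user, follow_pairs, already_propagated):
    """Followers of first_user that have not propagated the target information.

    follow_pairs       -- iterable of (follower, followee) pairs (assumed layout)
    already_propagated -- collection of users that already published or forwarded
    """
    # Followers of the first user, read from the relationship data.
    followers = {a for a, b in follow_pairs if b == first_user}
    # Keep only followers that have not propagated; sort for stable output.
    return sorted(followers - set(already_propagated))
```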
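According to a second aspect, a device for predicting information propagation in a social network is provided, the device including:
an obtaining unit, configured to obtain target information to be predicted and obtain the influence of K clusters, where the target information is published or forwarded by a first user, the K clusters are used to represent K categories of feature attributes of users, the influence of the K clusters includes the information propagation success rate of the K clusters, and K is a positive integer; and
a determining unit, configured to determine a role probability distribution of the first user, and determine a second user that has not propagated the target information obtained by the obtaining unit, where the role probability distribution of the first user is used to represent the probabilities that the first user belongs to each of the K clusters;
where the determining unit is further configured to determine, according to the influence of the K clusters obtained by the obtaining unit and the role probability distribution of the first user, the probability that the second user forwards the target information from the first user.
With reference to the second aspect, in a first possible implementation of the second aspect, the target information is published or forwarded by the first user at an initial time, and the device further includes:
an output unit, configured to output the account of the second user that meets a preset condition,
where the preset condition is that the probability of forwarding the target information is greater than a preset probability threshold.
With reference to the second aspect or the first possible implementation of the second aspect, in a second possible implementation of the second aspect, the obtaining unit is further configured to obtain, from the social network, an information propagation record, a user relationship database, and a user feature database, where the information propagation record includes historical propagation records of existing information, the user relationship database includes follow relationships between existing users, and the user feature database includes feature attributes of the existing users;
the determining unit is further configured to obtain, according to the user feature database, the K clusters and the feature attributes of the K clusters by using a soft clustering algorithm, where the K clusters are K categories determined according to the feature attributes of the existing users; and
the determining unit is further configured to obtain the influence of the K clusters by using a learning method according to the information propagation record and the user relationship database.
With reference to the second possible implementation of the second aspect, in a third possible implementation of the second aspect, the determining unit is specifically configured to:
obtain the feature attribute of the first user; and
determine the role probability distribution of the first user according to the feature attribute of the first user and the feature attributes of the K clusters.
With reference to the third possible implementation of the second aspect, in a fourth possible implementation of the second aspect, the feature attribute of the first user is represented as AT, and the feature attributes of the K clusters are represented as KTj, j = 1, 2, ..., K;
the determining unit is specifically configured to:
determine that the role probability distribution of the first user is K values corresponding to the K clusters, where the K values are respectively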
Figure PCTCN2015079877-appb-000002
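With reference to the second possible implementation of the second aspect, in a fifth possible implementation of the second aspect, the determining unit is further configured to:
determine role probability distributions of the existing users according to the user feature database and the feature attributes of the K clusters, where the role probability distribution of an existing user is used to represent the probabilities that the existing user belongs to each of the K clusters.
With reference to the fifth possible implementation of the second aspect, in a sixth possible implementation of the second aspect, the first user belongs to the existing users,
and the determining unit is specifically configured to:
obtain the role probability distribution of the first user from the role probability distributions of the existing users.
With reference to the fifth possible implementation of the second aspect, in a seventh possible implementation of the second aspect, the first user does not belong to the existing users,
and the determining unit is specifically configured to:
obtain the feature attribute of the first user;
obtain, according to the feature attribute of the first user, the feature attributes of N third users from the user feature database, where the N third users belong to the existing users, the distance between the feature attributes of the N third users and the feature attribute of the first user is less than a preset distance threshold, and N is a positive integer;
obtain the role probability distributions of the N third users from the role probability distributions of the existing users; and
determine the role probability distribution of the first user according to the role probability distributions of the N third users.
With reference to the seventh possible implementation of the second aspect, in an eighth possible implementation of the second aspect, the determining unit is specifically configured to:
determine that the role probability distribution of the first user is the arithmetic average of the role probability distributions of the N third users.
With reference to any one of the second to the eighth possible implementations of the second aspect, in a ninth possible implementation of the second aspect, the learning method is a machine learning method or a statistical learning method.
With reference to the second aspect or any one of the foregoing possible implementations of the second aspect, in a tenth possible implementation of the second aspect, the determining unit is specifically configured to:
calculate, by a statistical method according to the influence of the K clusters and the role probability distribution of the first user, the expected value of the propagation probability with which the second user forwards the target information; and
use the expected value of the propagation probability as the probability that the second user forwards the target information from the first user.
With reference to the second aspect or any one of the foregoing possible implementations of the second aspect, in an eleventh possible implementation of the second aspect, the influence of the K clusters further includes an information propagation time delay rate of the K clusters, and the determining unit is further configured to:
determine, according to the influence of the K clusters and the role probability distribution of the first user, the time at which the second user forwards the target information from the first user.
With reference to the second aspect or any one of the foregoing possible implementations of the second aspect, in a twelfth possible implementation of the second aspect, the second user is a user, among the followers of the first user, that has not propagated the target information.
With reference to the twelfth possible implementation of the second aspect, in a thirteenth possible implementation of the second aspect, the determining unit is specifically configured to:
determine the followers of the first user according to the user relationship database; and
determine the second user from among the followers of the first user, where the second user has not propagated the target information.
In the embodiments of the present invention, the influence of the K clusters can be used to predict the propagation of target information in a social network; the prediction method requires little computation and is computationally efficient.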
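Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Apparently, the accompanying drawings in the following description show only some embodiments of the present invention, and a person of ordinary skill in the art may derive other drawings from these drawings without creative effort.
FIG. 1 is a flowchart of a method for predicting information propagation in a social network according to an embodiment of the present invention.
FIG. 2 is a flowchart of a method for predicting information propagation in a social network according to another embodiment of the present invention.
FIG. 3 is a block diagram of a device for predicting information propagation in a social network according to an embodiment of the present invention.
FIG. 4 is a block diagram of a device for predicting information propagation in a social network according to another embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments of the present invention. Apparently, the described embodiments are some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.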
社交网络(social network)可以理解为在线社区,社交网络的用户数量巨大,例如用户数量可能为成百上千,也可能为百万千万甚至更多。
例如,比较常用的社交网络有微博(Weibo或MicroBlog)、微信(WeChat)、易信、米聊(MiTalk)、脸书(Facebook)、推特(Twitter)和领英(LinkedIn)等。
社交网络中可以用“用户关系数据库”记录用户之间的关系。“用户关系数据库”包括已有用户之间的关注关系。具体地,用户之间可以通过关注(follow)建立关系。例如,用户A关注了用户B,那么用户A为用户B的关注者(follower)。或者,也可以称为用户A为用户B的粉丝。
应注意,在诸如微博的社交网络中,用户A为用户B的关注者,但是用户B不一定为用户A的关注者。在诸如微信的社交网络中,用户A为用户B的关注者,同时用户B也一定为用户A的关注者,也可以称为用户A和用户B为朋友。
可选地,社交网络中可以用三元组或二元组表示用户之间的关系。
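In a triple, the first item may be the ID of a first user, the second item may be the ID of a second user, and the third item may indicate whether the first user follows the second user: 1 means following and 0 means not following. For example, in a social network such as Weibo, the relationship between every two users may be represented by two triples such as <A,B,1> and <B,A,0>, where <A,B,1> indicates that user A follows user B and <B,A,0> indicates that user B does not follow user A. Assuming that the social network has M users, the "user relationship database" of a social network such as Weibo can then be represented by M×(M-1) triples. Alternatively, the "user relationship database" may include only the triples whose third item is 1, in which case the number of stored triples can be far less than M×(M-1). Alternatively, the "user relationship database" may include only 2-tuples, each of which can be understood as the first two items of a triple whose third item is 1, so the number of stored 2-tuples can likewise be far less than M×(M-1). Either option saves storage space.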
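Alternatively, the first item of a triple may be the ID of a first user, the second item may be the ID of a second user, and the third item may indicate whether the first user and the second user are friends: 1 means they are friends and 0 means they are not. For example, in a social network such as WeChat, the triple <A,B,1> indicates that user A and user B are friends, that is, user A is a follower of user B and user B is also a follower of user A. Assuming that the social network has M users, the "user relationship database" of a social network such as WeChat can then be represented by M!/(2×(M-2)!) = M×(M-1)/2 triples. Alternatively, the "user relationship database" may include only the triples whose third item is 1, in which case the number of stored triples can be far less than M×(M-1)/2. Alternatively, the "user relationship database" may include only 2-tuples, each of which can be understood as the first two items of a triple whose third item is 1, so the number of stored 2-tuples can likewise be far less than M×(M-1)/2. Either option saves storage space.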
可选地,社交网络中可以用四元组表示用户之间的关系。
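In a 4-tuple, the first item may be the ID of a first user, the second item may be the ID of a second user, the third item may indicate whether the first user follows the second user, and the fourth item may indicate whether the second user follows the first user. The third and fourth items may each be 0 or 1. For example, <A,B,1,0> indicates that user A follows user B but user B does not follow user A, and <A,B,1,1> indicates that user A follows user B and user B follows user A. Assuming that the social network has M users, the "user relationship database" can then be represented by M!/(2×(M-2)!) = M×(M-1)/2 4-tuples. Alternatively, the "user relationship database" may include only the 4-tuples in which at least one of the third and fourth items is 1, in which case the number of stored 4-tuples can be far less than M×(M-1)/2.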
应注意,本发明实施例对“用户关系数据库”的表示形式不作限定。
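The storage trade-off between the full triple form and the pair form can be illustrated with a minimal Python sketch. The user names, the `follows` set, and the helper functions are hypothetical and chosen only for illustration; the embodiment does not prescribe any particular data structure.

```python
from itertools import permutations

# Hypothetical toy data: (follower, followee) pairs, i.e. the first two
# items of the triples whose third item is 1.
follows = {("A", "B"), ("C", "B"), ("B", "C")}
users = ["A", "B", "C"]


def followers_of(user, follows):
    """Return the set of users who follow `user`."""
    return {src for (src, dst) in follows if dst == user}


def full_triples(users, follows):
    """Expand the pair set into the full M*(M-1) triple form <u, v, 0/1>."""
    return [(u, v, 1 if (u, v) in follows else 0)
            for u, v in permutations(users, 2)]


triples = full_triples(users, follows)
```

With M = 3 users the full form holds 3×2 = 6 triples, while the pair form stores only the 3 follow relationships that actually exist.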
社交网络中可以用“信息传播记录”记录已有信息的传播。“信息传播记录”包括已有信息的历史传播记录。历史传播记录可以包括历史传播路径和时间。具体地,可以记录某一用户在某一时刻发布(post或tweet)了某一信息,或者某一用户在某一时刻从另一用户转发(forward或repost或retweet)了某一信息。
可选地,社交网络中可以用四元组表示“信息传播记录”。
其中,四元组的第一项可以为第一个用户ID,第二项可以为第二个用户ID,第三项可以为时刻,第四项可以为信息ID。例如,<A,B,t1,m1>表示用户A在t1时刻从用户B转发了信息ID为m1的信息。
其中,四元组的第一项可以为第一个用户ID,第二项可以为空或负数,第三项可以为时刻,第四项可以为信息ID。例如,<A,,t1,m1>或<A,-100,t1,m1>表示用户A在t1时刻发布了信息ID为m1的信息。
应注意,本发明实施例对“信息传播记录”的表示形式不作限定。
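As a rough illustration of the 4-tuple form of the information propagation record, the following sketch stores each record as a named tuple and uses `None` for the empty second item of a post. The `Record` type and the sample records are assumptions made for this example, not part of the described method.

```python
from typing import NamedTuple, Optional


class Record(NamedTuple):
    user: str              # user who posted or forwarded
    source: Optional[str]  # user forwarded from; None means an original post
    time: float            # time of the action
    msg: str               # message ID


# Hypothetical history: B posts m1, A forwards it from B, C forwards it from A.
records = [
    Record("B", None, 0.0, "m1"),
    Record("A", "B", 2.0, "m1"),
    Record("C", "A", 5.0, "m1"),
]


def is_post(record):
    """A record with an empty second item is a post rather than a forward."""
    return record.source is None


def propagators(records, msg):
    """Return every user who has posted or forwarded message `msg`."""
    return {r.user for r in records if r.msg == msg}
```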
应注意,本发明实施例对信息的形式不作限定。例如,信息可以是文本的形式,或者,信息也可以是音频或视频的形式,或者,信息也可以是网页链接的形式,等等。
社交网络中可以用“用户特征数据库”记录用户的特征属性。“用户特征数据库”包括已有用户的特征属性。其中,特征属性可包括个人属性、网络属性和行为属性。个人属性可以包括用户的基本属性,例如,年龄、性别、出生地、职业等。网络属性可以包括用户在社交网络中的重要性、中心性、结构洞特性等。例如,重要性可以用PageRank值表示,中心性(centrality)可以用出度和入度表示,结构洞(Structural Hole)特性可以用网络约束系数(Network Constraint index)表示。行为属性可以包括用户在社交网络上的行为的活跃度,这里,在社交网络上的行为可包括发布、转发、评论(comment)等。
可理解,网络属性与“用户关系数据库”有关。具体地,可根据“用户关系数据库”计算得到网络属性。其中,在诸如微博的社交网络中,中心性可以用出度和入度两个值表示,且一般地出度不等于入度。在诸如微信的社交网络中,中心性可以用出度或入度一个值表示,且出度等于入度,此时出度、入度均等于朋友数量,也就是说,中心性也可以用朋友数量表示。
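The out-degree and in-degree described above can be derived directly from the follow relationships. The sketch below assumes the relationship database is held as a set of hypothetical (follower, followee) pairs; the function name is illustrative.

```python
from collections import Counter


def degree_centrality(follows):
    """Map each user to (out_degree, in_degree).

    out_degree counts the users this user follows;
    in_degree counts this user's followers.
    """
    out_deg = Counter(src for src, _dst in follows)
    in_deg = Counter(dst for _src, dst in follows)
    users = set(out_deg) | set(in_deg)
    return {u: (out_deg[u], in_deg[u]) for u in users}


# Hypothetical Weibo-style (directed) relationships.
centrality = degree_centrality({("A", "B"), ("C", "B"), ("B", "C")})
```

In a WeChat-style network every friendship appears in both directions, so the two numbers coincide and a single value (the number of friends) is enough.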
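It can be understood that the behavior attribute is related to the "information propagation record"; specifically, the behavior attribute can be computed from the "information propagation record". The activity level depends on the number of actions the user performs per unit time and may be a numeric value, with a larger value indicating higher activity. For example, the activity level may be one of the five integers from 1 to 5, where 5 means very active, 4 means active, 3 means moderately active, 2 means inactive, and 1 means very inactive. Alternatively, the activity level may be a value between 0 and 1 expressed as a percentage, for example, 80% for active, 50% for moderately active, and 20% for inactive.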
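It can be understood that the feature attribute of each user in the "user feature database" may be represented by an H-dimensional feature vector. Assuming there are M users, the user feature database may include M H-dimensional feature vectors or, equivalently, may be represented by an M×H matrix, where M and H are positive integers. For example, assume H=9 and the components of a user's 9-dimensional feature vector represent, in order, age, gender, birthplace, occupation, importance, centrality, posting activity, forwarding activity, and commenting activity. If the feature attribute of user A is the feature vector AT={20,F,BJ,Doc,0.65,50,4,2,4}, then user A is 20 years old, female, born in Beijing, and a doctor, with an importance of 0.65, 50 friends, a posting activity of 4, a forwarding activity of 2, and a commenting activity of 4.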
应注意,“用户特征数据库”中H的值可以更大或更小,也就是说用户的特征属性的维度可以更大或更小,本发明对此不作限定。
图1是本发明一个实施例的社交网络中预测信息传播的方法。该方法包括:
101,获取待预测的目标信息并获取K个集群的影响力,其中,所述目标信息是由第一用户发布或转发的,所述K个集群用于表示用户的特征属性的K个分类,所述K个集群的影响力包括所述K个集群的信息传播成功率,K为正整数。
102,确定所述第一用户的角色概率分布,并确定未传播所述目标信息的第二用户,其中,所述第一用户的角色概率分布用于表示所述第一用户分别属于所述K个集群的概率。
103,根据所述K个集群的影响力和所述第一用户的角色概率分布,确定所述第二用户从所述第一用户转发所述目标信息的概率。
本发明实施例中,利用K个集群的影响力,能够预测社交网络中的目标信息的传播,该预测方法的计算量小,计算效率高。
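Optionally, in this embodiment of the present invention, the target information to be predicted in 101 may be posted or forwarded by the first user at an initial time and may be represented as a 4-tuple. For example, the initial time may be denoted as the first time. Then <ID of the first user,,first time,ID of the target information> indicates that the first user posts the target information at the first time, and <ID of the first user,ID of the source user,first time,ID of the target information> indicates that the first user forwards the target information from the source user at the first time.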
可选地,本发明实施例中,在102中,第二用户可以是待预测的用户。
例如,如果期望获知用户A将会对该目标信息的传播情况,那么第二用户可以是用户A。再例如,如果期望获知年龄为30岁的用户将会对该目标信息的传播情况,那么第二用户可以是特征属性中年龄为30岁的,并且还未传播该目标信息的用户。
或者,可选地,本发明实施例中,在102中,第二用户可以是所述第一用户的关注者中未传播所述目标信息的用户。那么,可以根据所述用户关系数据库确定所述第一用户的关注者;并从所述第一用户的关注者中确定所述第二用户,其中所述第二用户未传播所述目标信息。
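Selecting the second users from the first user's followers amounts to a set difference. The sketch below assumes follow relationships stored as (follower, followee) pairs and propagation records stored as (user, source, time, message ID) tuples; both layouts and all names are hypothetical.

```python
def candidate_second_users(first_user, follows, records, msg):
    """Followers of `first_user` who have not yet propagated message `msg`."""
    followers = {src for (src, dst) in follows if dst == first_user}
    already = {user for (user, _source, _time, m) in records if m == msg}
    return followers - already


# Hypothetical data: Y1, Y2 and Y3 follow X; Y3 has already forwarded m.
follows = {("Y1", "X"), ("Y2", "X"), ("Y3", "X"), ("X", "Y1")}
records = [("X", None, 0.0, "m"), ("Y3", "X", 1.0, "m")]
second_users = candidate_second_users("X", follows, records, "m")
```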
可理解,本发明实施例对第二用户的数量不作限定。例如,第二用户可以为一个或多个。
本发明实施例中,K个集群的影响力可以是通过训练得到的。那么,在101中,可以根据训练的结果获取K个集群的影响力。可理解,在101之前,还可包括:从所述社交网络获取信息传播记录、用户关系数据库和用户特征数据库,其中,所述信息传播记录包括已有信息的历史传播记录,所述用户关系数据库包括已有用户之间的关注关系,所述用户特征数据库包括已有用户的特征属性;根据所述用户特征数据库,采用软聚类算法,得到所述K个集群以及所述K个集群的特征属性;根据所述信息传播记录和所述用户关系数据库,采用学习的方法,得到所述K个集群的影响力。其中,所述K个集群是根据所述已有用户的特征属性所确定的K个分类。K为正整数。
其中,信息传播记录、用户关系数据库和用户特征数据库如前所述,为避免重复,这里不再赘述。
其中,软聚类算法也可以称为模糊聚类算法,例如可以为模糊C-均值算法(Fuzzy C-Means Algorithm,FCMA或FCM)和概率混合模型等。其中,K个集群也可以称为K类或K个角色。
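Assume that there are M existing users, that is, the user feature database includes the feature attributes of M existing users. Obtaining K clusters from the user feature database by using a soft clustering algorithm can then mean grouping the M existing users into K clusters according to the similarity of their feature attributes. Generally, K is much smaller than M; for example, K = 10^(-3)×M, or even K = 10^(-8)×M. The present invention does not limit this.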
可见,由于集群的数量K远小于用户的数量M,这样本发明实施例的方法计算量小,因此该方法的计算效率高。
其中,K个集群中的一个集群的特征属性可以为该一个集群的代表特征属性。例如,该一个集群的代表特征属性可以为该一个集群的中心点的特征属性,或者,该一个集群的代表特征属性可以为该一个集群中距离中心点最近的用户的特征属性。其中,该一个集群的中心点可以定义为属于该一个集群的所有用户的特征属性的均值。
具体地,K个集群的特征属性可以用K个H维特征向量表示。或者,可理解,K个集群的特征属性可以用一个K×H的矩阵表示。
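One way to obtain the K cluster centres and soft memberships is fuzzy c-means. The following is a minimal pure-Python sketch under several assumptions: all features are already numeric (categorical attributes such as gender or occupation would need to be encoded first), initial centres are supplied by the caller, and a fixed number of iterations replaces a convergence test. Function and parameter names are illustrative, not taken from the embodiment.

```python
import math


def memberships(points, centers, m=2.0):
    """Fuzzy c-means membership of every point in every cluster."""
    exponent = 2.0 / (m - 1.0)
    rows = []
    for x in points:
        # Guard against a zero distance when a point sits on a centre.
        d = [max(math.dist(x, c), 1e-12) for c in centers]
        rows.append([1.0 / sum((dj / di) ** exponent for di in d) for dj in d])
    return rows


def fuzzy_c_means(points, centers, m=2.0, iters=50):
    """Return (K centre vectors, M x K membership matrix)."""
    dims = len(points[0])
    for _ in range(iters):
        u = memberships(points, centers, m)
        new_centers = []
        for j in range(len(centers)):
            w = [row[j] ** m for row in u]  # fuzzified weights for cluster j
            total = sum(w)
            new_centers.append(
                [sum(wi * p[h] for wi, p in zip(w, points)) / total
                 for h in range(dims)]
            )
        centers = new_centers
    return centers, memberships(points, centers, m)


# Hypothetical 2-dimensional feature vectors for M = 4 users, K = 2 clusters.
points = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0]]
centers, u = fuzzy_c_means(points, [[0.0, 0.0], [10.0, 10.0]])
```

The returned centres play the role of the K×H matrix of cluster feature attributes, and each membership row sums to 1, so it can be read as a soft assignment of a user to the K clusters.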
这样,102中确定所述第一用户的角色概率分布,可以包括:获取所述第一用户的特征属性;根据所述第一用户的特征属性与所述K个集群的特征属性,确定所述第一用户的角色概率分布。
具体地,可以根据所述第一用户的特征属性与所述K个集群的特征属性之间的距离,确定所述第一用户的角色概率分布。
例如,所述第一用户的特征属性表示为AT,所述K个集群的特征属性表示为KTj,j=1,2,...,K;
确定所述第一用户的角色概率分布为与所述K个集群对应的K个值,其中,所述K个值分别为
Figure PCTCN2015079877-appb-000003
其中,||·||表示模或范数。
可理解,第一用户的角色概率分布可以由上述K个值组成的一个K维的向量形式来表示。
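The exact formula for the K values is given in the formula image, which is not reproduced here. As an illustration only, the sketch below assumes one plausible distance-based choice: the probability of each cluster is proportional to the inverse of the distance between AT and KTj, normalised so that the K values sum to 1. The 2-norm and the small `eps` guard are likewise assumptions.

```python
import math


def role_distribution(at, cluster_features, eps=1e-12):
    """Assumed inverse-distance weighting; not the patent's own formula.

    at: feature vector of the first user (AT).
    cluster_features: list of K cluster feature vectors (KTj).
    """
    inverse = [1.0 / (math.dist(at, kt) + eps) for kt in cluster_features]
    total = sum(inverse)
    return [value / total for value in inverse]


# Hypothetical example: the user is 1 unit from cluster 1 and 3 units from cluster 2.
theta = role_distribution([0.0, 0.0], [[1.0, 0.0], [3.0, 0.0]])
```

Under this assumption the closer cluster receives the larger share (0.75 versus 0.25 in the example), and the result is a K-dimensional vector as described above.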
可选地,作为另一个实施例,在101之前,还可进一步包括:根据所述用户特征数据库和所述K个集群的特征属性,确定所述已有用户的角色概率分布,其中,所述已有用户的角色概率分布用于表示所述已有用户分别属于所述K个集群的概率。
其中,可以根据已有用户的特征属性和K个集群的特征属性,确定已有用户的角色概率分布。其中,K个集群的特征属性可以按照前述实施例的方法得到,为避免重复,这里不再赘述。例如,假设已有用户中的用户B的特征属性为特征向量BT,K个集群的特征属性为K个特征向量,分别为KTj,j=1,2,...,K。那么,用户B的角色概率分布可包括K个值,分别为
Figure PCTCN2015079877-appb-000004
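j=1,2,...,K, where ||·|| denotes a modulus or norm. It should be noted that ||·|| may be the infinity norm ||·||_∞, the 2-norm ||·||_2, or another form of norm; the present invention does not limit this.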
这样,若第一用户属于所述已有用户,那么,102中确定所述第一用户的角色概率分布,可以包括:从所述已有用户的角色概率分布中,获取所述第一用户的角色概率分布。
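If the first user does not belong to the existing users, determining the role probability distribution of the first user in 102 may include: obtaining the feature attribute of the first user; obtaining, from the user feature database according to the feature attribute of the first user, the feature attributes of N third users, where the N third users belong to the existing users, the distance between the feature attribute of each of the N third users and the feature attribute of the first user is less than a preset distance threshold, and N is a positive integer; obtaining the role probability distributions of the N third users from the role probability distributions of the existing users; and determining the role probability distribution of the first user according to the role probability distributions of the N third users.
It can be understood that the N third users are users whose feature attributes are similar to those of the first user. Optionally, determining the role probability distribution of the first user according to the role probability distributions of the N third users may include: determining that the role probability distribution of the first user is the arithmetic mean of the role probability distributions of the N third users.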
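The new-user case can be sketched as a neighbour lookup followed by an arithmetic mean. The dictionaries, the use of the Euclidean distance, and the `None` return value when no neighbour is found are assumptions made for this example.

```python
import math


def cold_start_distribution(new_features, feature_db, role_db, threshold):
    """Average the role distributions of existing users within `threshold`.

    feature_db: user ID -> numeric feature vector of an existing user.
    role_db: user ID -> role probability distribution (length K).
    Returns None when no existing user is close enough.
    """
    neighbors = [uid for uid, feats in feature_db.items()
                 if math.dist(new_features, feats) < threshold]
    if not neighbors:
        return None
    k = len(role_db[neighbors[0]])
    return [sum(role_db[uid][j] for uid in neighbors) / len(neighbors)
            for j in range(k)]


# Hypothetical databases with three existing users and K = 2 clusters.
feature_db = {"u1": [0.0, 0.0], "u2": [1.0, 0.0], "u3": [10.0, 10.0]}
role_db = {"u1": [0.8, 0.2], "u2": [0.6, 0.4], "u3": [0.1, 0.9]}
theta_new = cold_start_distribution([0.5, 0.0], feature_db, role_db, 1.0)
```

Here u1 and u2 are the N = 2 third users, and the new user inherits the mean of their two distributions.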
应注意,本发明实施例中,若第一用户不属于已有用户,可理解,该第一用户为新用户。这样,本发明实施例能够解决冷启动问题。
换个角度,即使在训练过程中,已有用户的数量不够多,也可以在后续实现预测的过程,也就是说,本发明实施例能够解决数据稀疏问题。
可选地,本发明实施例中,通过训练确定K个集群的影响力,即:根据 所述信息传播记录和所述用户关系数据库,采用学习的方法,得到所述K个集群的影响力,可以是:根据所述信息传播记录和所述用户关系数据库,结合已有用户的角色概率分布,计算属于每一个集群的已有用户对已有用户的关注者的转发行为的影响,进而可以学习K个集群在信息传播过程中的影响力。
可选地,其中,所述学习的方法可以为机器学习的方法或统计学习的方法,本发明对此不作限定。
可选地,本发明实施例中,所述K个集群的影响力可以包括所述K个集群的信息传播成功率。其中,信息传播成功率可以用影响因子表示,即K个集群的影响力可以包括K个集群的影响因子。其中,K个集群的影响因子可以理解为K个集群在信息传播过程中的成功率。那么,K个集群的影响力可以用一个K维的向量表示。
可选地,本发明实施例中,所述K个集群的影响力可以包括所述K个集群的信息传播成功率和信息传播时间延迟率。可选地,其中,所述影响力包括影响因子和时间延迟。即,K个集群的影响力可以包括K个集群的影响因子和K个集群的时间延迟。其中,K个集群的影响因子可以理解为K个集群在信息传播过程中的成功率,K个集群的时间延迟可以理解为K个集群在信息传播过程中对一个时刻的延迟的概率。那么,K个集群的影响力可以用一个K×2的矩阵表示。
可选地,影响因子和时间延迟可以为0至1之间的数值,数值越大,表示影响力越大。可选地,影响因子和时间延迟可以为1至5之间的整数值,整数值越大,表示影响力越大。本发明对此不作限定。
或者,可选地,也可以是:设定信息传播记录数据的概似函数,根据所述信息传播记录、所述用户关系数据库和所述用户特征数据库,对概似函数进行最优化,从而确定K个集群的影响力。
例如,概似函数可以定义为如下的公式(1):
Figure PCTCN2015079877-appb-000005
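Here, I is the total number of messages, T is the largest time, H is the dimension of a user's feature vector, K is the number of clusters, and V is the set of all users.
A_it denotes the set of users who propagated message i at time t, and D_iT denotes the set of users who have propagated message i at time T. x_uh is the value of the h-th component of the feature vector of user u. θ_uk is the probability that user u belongs to the k-th cluster during information propagation. ρ_k and λ_k represent the influence of the k-th cluster: ρ_k is the influence factor (success rate) of the k-th cluster, and λ_k is the time delay of the k-th cluster. μ_kh is the mean of the h-th component of the feature vectors of all users belonging to the k-th cluster, and δ_kh is the precision of the h-th component of the feature vectors of all users belonging to the k-th cluster.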
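It can then be understood that log P(v∈A_it) is the logarithm of the probability that a user participates in propagating message i at time t,
Figure PCTCN2015079877-appb-000006
is the logarithm of the probability that a user has not participated in propagating message i at time T, and log P(x_uh) is the logarithm of the probability of the h-th component of the feature vector of user u.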
并且,
Figure PCTCN2015079877-appb-000007
Figure PCTCN2015079877-appb-000008
Figure PCTCN2015079877-appb-000009
其中,
Figure PCTCN2015079877-appb-000010
Figure PCTCN2015079877-appb-000011
进一步地,对概似函数进行最优化,可以通过现有的生成模型参数学习方法确定θuk、ρk、λk、μkh和δkh。其中,生成模型参数学习方法可以为吉布斯采样(Gibbs Sampling)方法或者变分方法(variational method)。
应注意,本发明实施例中,概似函数也可以为其他的形式,本发明对此不作限定。
可选地,在103中,可以根据所述K个集群的影响力和所述第一用户的角色概率分布,利用贝叶斯理论(Bayesian Theory),确定所述第二用户从所述第一用户转发所述目标信息的概率。
可选地,在103中,可以根据所述K个集群的影响力和所述第一用户的角色概率,采用统计的方法,计算所述第二用户转发所述目标信息的传播概率的期望值。并将所述传播概率的期望值作为所述第二用户从所述第一用户转发所述目标信息的概率。
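A simple way to read the expected value above is as a mixture over the first user's roles. The sketch below assumes that the expected forwarding probability is the sum, over the K clusters, of the probability that the first user belongs to a cluster multiplied by that cluster's information propagation success rate; this weighting is an illustrative assumption, and the names are hypothetical.

```python
def expected_forward_probability(role_dist, success_rates):
    """Expected forwarding probability: sum over k of theta_k * rho_k.

    role_dist: role probability distribution of the first user (length K).
    success_rates: information propagation success rate of each cluster.
    """
    if len(role_dist) != len(success_rates):
        raise ValueError("both inputs must have length K")
    return sum(theta * rho for theta, rho in zip(role_dist, success_rates))


# Hypothetical values for K = 2 clusters.
probability = expected_forward_probability([0.75, 0.25], [0.4, 0.2])
```

Because only K products are summed, the cost per prediction depends on the number of clusters rather than on the number of users.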
可选地,若所述K个集群的影响力还包括所述K个集群的信息传播时间延迟率,也就是说,若所述K个集群的影响力包括所述K个集群的信息传播成功率和信息传播时间延迟率,那么,图1的方法还可以包括:根据所述K个集群的影响力和所述第一用户的角色概率分布,确定所述第二用户从所述第一用户转发所述目标信息的时刻。
具体地,可以根据所述K个集群的影响力和所述第一用户的角色概率分布,利用贝叶斯理论(Bayesian Theory),确定所述第二用户从所述第一用户转发所述目标信息的时刻。
可选地,可以根据所述K个集群的影响力和所述第一用户的角色概率,采用统计的方法,计算所述第二用户转发所述目标信息的传播时刻的期望值。并将所述传播时刻的期望值作为所述第二用户从所述第一用户转发所述目标信息的时刻。
应注意,确定所述第二用户从所述第一用户转发所述目标信息的时刻的 步骤可以在103之前或之后执行,也可以与103同时执行,本发明对此不作限定。
应注意,本发明实施例中,确定第二用户从所述第一用户转发所述目标信息的概率的步骤,与确定第二用户从所述第一用户转发所述目标信息的时刻的步骤,可以分别独立执行;也可以是相互耦合交叉执行。
例如,对用户u的一个特征属性h的潜在变量k,其样本的先验概率可以表示为:
Figure PCTCN2015079877-appb-000012
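Here, the embodiment of the present invention uses Stirling's formula to approximate the Γ function, and the function η(·) is defined as:
Figure PCTCN2015079877-appb-000013
where τ0, τ1, τ2, and τ3 are the parameters of the Normal-Gamma prior.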
对于潜在变量(t,k,z),有下式:
Figure PCTCN2015079877-appb-000014
根据样本结果可以估计模型参数,具体地,模型参数可以更新为:
Figure PCTCN2015079877-appb-000015
Figure PCTCN2015079877-appb-000016
Figure PCTCN2015079877-appb-000017
这样,便可以估计转发目标信息的概率和转发目标信息的时刻:
Figure PCTCN2015079877-appb-000018
Figure PCTCN2015079877-appb-000019
其中,E(·)表示期望。
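It can be understood that δ_kh here is a time interval: the time at which the second user forwards the target information is the initial time at which the first user posted or forwarded it plus the time interval δ_kh.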
可选地,所述目标信息是由所述第一用户在初始时刻发布或转发的,在图1所示的方法之后,即在103之后,可以包括:输出满足预设条件的所述第二用户的账号,其中,所述预设条件为:所述转发所述目标信息的概率大于预设的概率阈值。
可选地,若所述K个集群的影响力还包括所述K个集群的信息传播时间延迟率,还可以确定所述第二用户从所述第一用户转发所述目标信息的时刻。那么相应地,所述预设条件还可以包括:所述转发所述目标信息的时刻与所述初始时刻之间的时长小于预设的时长阈值。
应注意,本发明实施例对账号的形式不作限定,例如可以为ID,或者也可以为姓名。
应注意,本发明实施例对预设的概率阈值和预设的时长阈值的大小不作限定。例如,预设的概率阈值可以为0.3,预设的时长阈值可以为12小时。
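The preset condition can be applied as a simple filter over the predictions. In the sketch below, each prediction is a hypothetical (account, probability, predicted forwarding time) tuple with times in hours; the default thresholds reuse the example values of 0.3 and 12 hours given above.

```python
def select_accounts(predictions, initial_time,
                    prob_threshold=0.3, duration_threshold=12.0):
    """Return accounts whose forwarding probability exceeds the probability
    threshold and whose predicted forwarding time falls within the duration
    threshold of the initial time."""
    return [account
            for account, prob, forward_time in predictions
            if prob > prob_threshold
            and (forward_time - initial_time) < duration_threshold]


# Hypothetical predictions for three second users, times in hours.
predictions = [("Y1", 0.5, 3.0), ("Y2", 0.2, 1.0), ("Y3", 0.6, 20.0)]
selected = select_accounts(predictions, initial_time=0.0)
```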
可理解,图1所示的方法针对待预测的目标信息,将第一用户的关注者对该目标信息的传播作出了预测。进一步地,也可以将第一用户的关注者的关注者对该目标信息的传播作出预测,以此类推。如图2所示。
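Specifically, in FIG. 2, assume that user X posts message m at time t0, which can be represented by the 4-tuple <X,,t0,m>. The method shown in FIG. 2 predicts the propagation of message m within the preset duration threshold starting from t0. The method shown in FIG. 2 includes: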
201,开始。具体地,确定用户X在t0时刻发布了信息m。且给定n=1。
202,生成第n次的初始条件。
当n=1时,该初始条件为第一用户在第一时刻发布/转发了信息m。可以用四元组<X,,t0,m>表示。
Specifically, after 201, the first user is user X and the first time is t0.
203,确定第一用户的角色概率分布,并确定第一用户的关注者中未传播信息m的第二用户。
具体地,203可以参见前述实施例中的102,为避免重复,这里不再赘述。
204,根据K个集群的影响力和第一用户的角色概率分布,确定第二用户从第一用户转发信息m的概率;或者,确定第二用户从第一用户转发信息m的概率和从第一用户转发信息m的第二时刻。
具体地,204可以参见前述实施例中的103,为避免重复,这里不再赘述。
205,判断是否满足预设的停止条件。若是,执行208;若否,执行206和207。
可选地,停止条件可包括:第二用户从第一用户转发信息m的概率小于预设的概率阈值。或者,停止条件可包括:t0时刻至第二时刻的时长大于时长阈值,且第二用户从第一用户转发信息m的概率小于预设的概率阈值。
以下实施例中,假设停止条件为:t0时刻至第二时刻的时长大于时长阈值,且第二用户从第一用户转发信息m的概率小于预设的概率阈值。
例如,时长阈值可以等于24小时,概率阈值可以等于0.2。本发明对此不作限定。
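It can be understood that at least one second user is determined in 204; assume there are M1 of them. In 205, each of the M1 second users is then checked against the preset stop condition. If every one of the M1 second users meets the preset stop condition, the result of 205 is yes. If at least one of the M1 second users does not meet the preset stop condition, the result of 205 is no. Further, 206 and 207 are performed for those of the M1 second users that do not meet the preset stop condition.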
作为一例,假设用户X的关注者中,Y1和Y2转发m的概率和第二时刻均不满足预设的停止条件。且,Y1从X转发m的概率为P1,Y1从X转发m的第二时刻为t1。Y2从X转发m的概率为P2,Y2从X转发m的第二时刻为t2。
206,输出第二用户。
可理解,206中输出的为在205判断中不满足预设的停止条件的第二用户。
可选地,可以输出第二用户的账号。或者,可选地,还可以输出第二用户转发m的概率;或者输出第二用户转发m的概率和第二时刻。
作为一例,假设用户X的关注者中,Y1和Y2转发m的概率和第二时刻均不满足预设的停止条件。那么,206可以输出Y1和Y2。或者,206可以输出两个向量,分别为(Y1,P1,t1)和(Y2,P2,t2)。可理解,206输出的两个向量包括三个分量,第一个分量代表账号,第二个分量代表转发的概率,第三个分量代表转发的时刻。
207,第二替换第一,n增加1。
具体地,将在205判断中不满足预设的停止条件的第二用户替换为第一用户,将第二用户转发的第二时刻替换为第一时刻。
那么,相应地,在207之后执行的202可以为:第一用户在第一时刻转发了信息m。
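As an example, assume that among the followers of user X, neither Y1 nor Y2 meets the preset stop condition in terms of forwarding probability and second time. Because n has been increased by 1 in 207, when n=2 the initial conditions generated in 202 can be represented by the 4-tuples <Y1,X,t1,m> and <Y2,X,t2,m>.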
208,停止。
具体地,当在205判断中所有的第二用户均满足预设的停止条件时,该预测过程停止。
这样,便可以从206中获取在预设的时长阈值内,传播信息m的概率大于预设的概率阈值的用户。
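It should be noted that the embodiment of the present invention does not limit the stop condition. For example, the stop condition may be that the number of iterations is greater than or equal to a preset iteration threshold, that is, the value of n is greater than or equal to the preset iteration threshold. As another example, the stop condition may be that the number of output users is greater than a preset quantity threshold, that is, the number of second users output in 206 is greater than the preset quantity threshold.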
应注意,本发明实施例对预设的迭代阈值和预设的数量阈值的大小不作限定。例如,预设的迭代阈值的大小可以为10。例如,预设的数量阈值的大小可以为1000。
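The loop of FIG. 2 (steps 202 to 207) can be sketched as a round-by-round expansion over followers. This is a simplified reading under stated assumptions: a second user is kept (and becomes a first user in the next round) when the forwarding probability exceeds the probability threshold and the predicted time stays within the duration threshold of t0; `predict` is a hypothetical callback standing in for steps 203 and 204; and `max_rounds` plays the role of the iteration threshold. None of these names come from the embodiment.

```python
def predict_cascade(seed, t0, follows, predict,
                    prob_threshold=0.2, duration_threshold=24.0,
                    max_rounds=10):
    """Iteratively predict who forwards a message first posted by `seed`.

    follows: set of (follower, followee) pairs.
    predict(first, second) -> (probability, delay) for second forwarding
    from first; a stand-in for steps 203 and 204.
    Returns a list of (account, probability, predicted time).
    """
    reached = {seed}
    output = []
    frontier = [(seed, t0)]
    rounds = 0
    while frontier and rounds < max_rounds:
        rounds += 1
        next_frontier = []
        for first, t_first in frontier:
            followers = sorted(src for (src, dst) in follows if dst == first)
            for second in followers:
                if second in reached:
                    continue  # has already propagated the message
                prob, delay = predict(first, second)
                t_second = t_first + delay
                if prob > prob_threshold and (t_second - t0) <= duration_threshold:
                    reached.add(second)
                    output.append((second, prob, t_second))
                    next_frontier.append((second, t_second))  # step 207
        frontier = next_frontier
    return output


# Hypothetical network and predictions, times in hours.
follows = {("Y1", "X"), ("Y2", "X"), ("Z", "Y1")}
table = {("X", "Y1"): (0.5, 2.0), ("X", "Y2"): (0.1, 1.0), ("Y1", "Z"): (0.4, 30.0)}
result = predict_cascade("X", 0.0, follows, lambda a, b: table[(a, b)])
```

In this toy run Y2 is dropped because its probability is below the threshold, and Z is dropped because its predicted time falls outside the 24-hour window, so the process stops after the second round.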
这样,本发明实施例中,利用K个集群的影响力,能够预测社交网络中的信息的传播,并且,该预测方法的计算量小,计算效率高。
这样,利用本发明所示的实施例,企业可以有效率地进行各种商业决策。举例来说,企业想要达到一定的广告推销效果,例如,企业期望某一条信息在一天的时间内传播到至少一千人。那么,企业可以根据期望设置停止条件,通过图2所示的方法,假设用户X为张三,进行信息传播预测。如果通过预测能够满足企业的期望,那么该企业可以针对张三发布信息。例如,发布的信息可以是新产品的产品介绍信息。
另外,可理解,通过本发明的方法,也可以根据信息传播的预测结果,及时地作出风险管理的决策等。
图3是本发明一个实施例的社交网络中预测信息传播的设备的框图。图3所示的设备300包括获取单元301和确定单元302。
获取单元301,用于获取待预测的目标信息并获取K个集群的影响力,其中,所述目标信息是由第一用户发布或转发的,所述K个集群用于表示用户的特征属性的K个分类,所述K个集群的影响力包括所述K个集群的信息传播成功率,K为正整数。
确定单元302,用于确定所述第一用户的角色概率分布,并确定未传播获取单元301获取的所述目标信息的第二用户,其中,所述第一用户的角色概率分布用于表示所述第一用户分别属于所述K个集群的概率。
确定单元302,还用于根据所述获取单元301获取的所述K个集群的影响力和所述第一用户的角色概率分布,确定所述第二用户从所述第一用户转发所述目标信息的概率。
本发明实施例中,利用K个集群的影响力,能够预测社交网络中的目标信息的传播,该预测方法的计算量小,计算效率高。
可选地,作为一个实施例,所述目标信息是由所述第一用户在初始时刻发布或转发的,所述设备还包括:输出单元,用于输出满足预设条件的所述第二用户的账号,
其中,所述预设条件为:所述转发所述目标信息的概率大于预设的概率阈值。
可选地,作为另一个实施例,
获取单元301,还用于从所述社交网络获取信息传播记录、用户关系数据库和用户特征数据库,其中,所述信息传播记录包括已有信息的历史传播记录,所述用户关系数据库包括已有用户之间的关注关系,所述用户特征数据库包括已有用户的特征属性。
确定单元302,还用于根据所述用户特征数据库,采用软聚类算法,得到所述K个集群以及所述K个集群的特征属性,其中,所述K个集群是根据所述已有用户的特征属性所确定的K个分类。
确定单元302,还用于根据所述信息传播记录和所述用户关系数据库,采用学习的方法,得到所述K个集群的影响力。
可选地,作为另一个实施例,确定单元302,具体用于:
获取所述第一用户的特征属性;
根据所述第一用户的特征属性与所述K个集群的特征属性,确定所述第一用户的角色概率分布。
可选地,作为另一个实施例,所述第一用户的特征属性表示为AT,所述K个集群的特征属性表示为KTj,j=1,2,...,K;
所述确定单元,具体用于:
确定所述第一用户的角色概率分布为与所述K个集群对应的K个值,其中,所述K个值分别为
Figure PCTCN2015079877-appb-000020
可选地,作为另一个实施例,确定单元302,还用于:
根据所述用户特征数据库和所述K个集群的特征属性,确定所述已有用户的角色概率分布,其中,所述已有用户的角色概率分布用于表示所述已有用户分别属于所述K个集群的概率。
可选地,作为另一个实施例,所述第一用户属于所述已有用户,确定单元302,具体用于:
从所述已有用户的角色概率分布中,获取所述第一用户的角色概率分 布。
可选地,作为另一个实施例,所述第一用户不属于所述已有用户,确定单元302,具体用于:
获取所述第一用户的特征属性;
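obtain, from the user feature database according to the feature attribute of the first user, the feature attributes of N third users, where the N third users belong to the existing users, the distance between the feature attribute of each of the N third users and the feature attribute of the first user is less than a preset distance threshold, and N is a positive integer;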
从所述已有用户的角色概率分布中,获取所述N个第三用户的角色概率分布;
根据所述N个第三用户的角色概率分布,确定所述第一用户的角色概率分布。
可选地,作为另一个实施例,确定单元302,具体用于:确定所述第一用户的角色概率分布为所述N个第三用户的角色概率分布的算术平均。
可选地,作为另一个实施例,所述学习的方法为机器学习的方法或统计学习的方法。
可选地,作为另一个实施例,确定单元302,具体用于:
根据所述K个集群的影响力和所述第一用户的角色概率分布,采用统计的方法,计算所述第二用户转发所述目标信息的传播概率的期望值;
将所述传播概率的期望值作为所述第二用户从所述第一用户转发所述目标信息的概率。
可选地,作为另一个实施例,所述K个集群的影响力还包括所述K个集群的信息传播时间延迟率,确定单元302,还用于:根据所述K个集群的影响力和所述第一用户的角色概率分布,确定所述第二用户从所述第一用户转发所述目标信息的时刻。
相应地,前述的预设条件还可以进一步包括:所述转发所述目标信息的时刻与所述初始时刻之间的时长小于预设的时长阈值。
可选地,作为另一个实施例,所述第二用户为所述第一用户的关注者中未传播所述目标信息的用户。
可选地,作为另一个实施例,所述确定单元,具体用于:根据用户关系数据库,确定所述第一用户的关注者;从所述第一用户的关注者中确定所述 第二用户,其中,所述第二用户未传播所述目标信息。
可选地,本发明实施例中,图3所示的设备300可以为社交网络的服务器。
图3所示的设备300能够实现图1和图2所示的方法中的各个过程,为避免重复,这里不再赘述。
图4是本发明另一个实施例的社交网络中预测信息传播的设备的框图。图4所示的设备400包括处理器401、接收电路402、发送电路403和存储器404。
接收电路402,用于获取待预测的目标信息并获取K个集群的影响力,其中,所述目标信息是由第一用户发布或转发的,所述K个集群用于表示用户的特征属性的K个分类,所述K个集群的影响力包括所述K个集群的信息传播成功率,K为正整数。
处理器401,用于确定所述第一用户的角色概率分布,并确定未传播所述目标信息的第二用户,其中,所述第一用户的角色概率分布用于表示所述第一用户分别属于所述K个集群的概率。
处理器401,还用于根据获取的所述K个集群的影响力和所述第一用户的角色概率分布,确定所述第二用户从所述第一用户转发所述目标信息的概率。
本发明实施例中,利用K个集群的影响力,能够预测社交网络中的目标信息的传播,该预测方法的计算量小,计算效率高。
设备400中的各个组件通过总线系统405耦合在一起,其中总线系统405除包括数据总线之外,还包括电源总线、控制总线和状态信号总线。但是为了清楚说明起见,在图4中将各种总线都标为总线系统405。
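The methods disclosed in the foregoing embodiments of the present invention may be applied to, or implemented by, the processor 401. The processor 401 may be an integrated circuit chip with signal processing capability. In an implementation process, the steps of the foregoing methods may be completed by an integrated logic circuit of hardware in the processor 401 or by instructions in software form. The processor 401 may be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed with reference to the embodiments of the present invention may be directly executed and completed by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may be located in a storage medium that is mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 404; the processor 401 reads information from the memory 404 and completes the steps of the foregoing methods in combination with its hardware.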
可以理解,本发明实施例中的存储器404可以是易失性存储器或非易失性存储器,或可包括易失性和非易失性存储器两者。其中,非易失性存储器可以是只读存储器(Read-Only Memory,ROM)、可编程只读存储器(Programmable ROM,PROM)、可擦除可编程只读存储器(Erasable PROM,EPROM)、电可擦除可编程只读存储器(Electrically EPROM,EEPROM)或闪存。易失性存储器可以是随机存取存储器(Random Access Memory,RAM),其用作外部高速缓存。通过示例性但不是限制性说明,许多形式的RAM可用,例如静态随机存取存储器(Static RAM,SRAM)、动态随机存取存储器(Dynamic RAM,DRAM)、同步动态随机存取存储器(Synchronous DRAM,SDRAM)、双倍数据速率同步动态随机存取存储器(Double Data Rate SDRAM,DDR SDRAM)、增强型同步动态随机存取存储器(Enhanced SDRAM,ESDRAM)、同步连接动态随机存取存储器(Synchlink DRAM,SLDRAM)和直接内存总线随机存取存储器(Direct Rambus RAM,DR RAM)。应注意,本文描述的系统和方法的存储器404旨在包括但不限于这些和任意其它适合类型的存储器。
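It can be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or a combination thereof. For a hardware implementation, the processing unit may be implemented in one or more application-specific integrated circuits (Application Specific Integrated Circuits, ASIC), digital signal processors (Digital Signal Processor, DSP), digital signal processing devices (DSP Device, DSPD), programmable logic devices (Programmable Logic Device, PLD), field-programmable gate arrays (Field-Programmable Gate Array, FPGA), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units for performing the functions described in this application, or a combination thereof.
When the embodiments are implemented in software, firmware, middleware, microcode, program code, or code segments, they may be stored in a machine-readable medium such as a storage component. A code segment may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment may be coupled to another code segment or a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, and the like may be passed, forwarded, or transmitted by any suitable means, including memory sharing, message passing, token passing, and network transmission.
For a software implementation, the techniques described herein may be implemented with modules (for example, procedures and functions) that perform the functions described herein. Software code may be stored in a memory unit and executed by a processor. The memory unit may be implemented within the processor or external to the processor, in which case it may be communicatively coupled to the processor by various means known in the art.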
可选地,作为一个实施例,所述目标信息是由所述第一用户在初始时刻发布或转发的,所述设备400的发送电路403,用于输出满足预设条件的所述第二用户的账号,其中,所述预设条件为:所述转发所述目标信息的概率大于预设的概率阈值。
可选地,作为另一个实施例,接收电路402,还用于从所述社交网络获取信息传播记录、用户关系数据库和用户特征数据库,其中,所述信息传播记录包括已有信息的历史传播记录,所述用户关系数据库包括已有用户之间的关注关系,所述用户特征数据库包括已有用户的特征属性。
处理器401,还用于根据所述用户特征数据库,采用软聚类算法,得到所述K个集群以及所述K个集群的特征属性,其中,所述K个集群是根据所述已有用户的特征属性所确定的K个分类;处理器401,还用于根据所述信息传播记录和所述用户关系数据库,采用学习的方法,得到所述K个集群的影响力。
可理解,本发明实施例中,存储器404可用于存储信息传播记录、用户关系数据库、用户特征数据库。存储器404还用于存储K个集群的特征属性和K个集群的影响力。
可选地,作为另一个实施例,处理器401,具体用于:
获取所述第一用户的特征属性;
根据所述第一用户的特征属性与所述K个集群的特征属性,确定所述第一用户的角色概率分布。
可选地,作为另一个实施例,所述第一用户的特征属性表示为AT,所述K个集群的特征属性表示为KTj,j=1,2,...,K;
The processor 401 is specifically configured to:
确定所述第一用户的角色概率分布为与所述K个集群对应的K个值,其中,所述K个值分别为
Figure PCTCN2015079877-appb-000021
可选地,作为另一个实施例,处理器401,还用于:
根据所述用户特征数据库和所述K个集群的特征属性,确定所述已有用户的角色概率分布,其中,所述已有用户的角色概率分布用于表示所述已有用户分别属于所述K个集群的概率。
可选地,作为另一个实施例,所述第一用户属于所述已有用户,处理器401,具体用于:从所述已有用户的角色概率分布中,获取所述第一用户的角色概率分布。
可选地,作为另一个实施例,所述第一用户不属于所述已有用户,处理器401,具体用于:
获取所述第一用户的特征属性;
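obtain, from the user feature database according to the feature attribute of the first user, the feature attributes of N third users, where the N third users belong to the existing users, the distance between the feature attribute of each of the N third users and the feature attribute of the first user is less than a preset distance threshold, and N is a positive integer;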
从所述已有用户的角色概率分布中,获取所述N个第三用户的角色概率分布;
根据所述N个第三用户的角色概率分布,确定所述第一用户的角色概率分布。
可选地,作为另一个实施例,处理器401,具体用于:确定所述第一用户的角色概率分布为所述N个第三用户的角色概率分布的算术平均。
可选地,作为另一个实施例,所述学习的方法为机器学习的方法或统计学习的方法。
可选地,作为另一个实施例,所述处理器401具体用于:
根据所述K个集群的影响力和所述第一用户的角色概率分布,采用统计的方法,计算所述第二用户转发所述目标信息的传播概率的期望值;
将所述传播概率的期望值作为所述第二用户从所述第一用户转发所述目标信息的概率。
可选地,作为另一个实施例,所述K个集群的影响力还包括所述K个集群的信息传播时间延迟率,处理器401还用于:根据所述K个集群的影响力和所述第一用户的角色概率分布,确定所述第二用户从所述第一用户转发所述目标信息的时刻。
相应地,前述的预设条件还可以进一步包括:所述转发所述目标信息的时刻与所述初始时刻之间的时长小于预设的时长阈值。
可选地,作为另一个实施例,所述第二用户为所述第一用户的关注者中未传播所述目标信息的用户。
可选地,作为另一个实施例,所述处理器401具体用于:根据用户关系数据库,确定所述第一用户的关注者;从所述第一用户的关注者中确定所述第二用户,其中,所述第二用户未传播所述目标信息。
图4所示的设备400能够实现图1和图2所示的方法中的各个过程,为避免重复,这里不再赘述。
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本发明的范围。
所属领域的技术人员可以清楚地了解到,为描述的方便和简洁,上述描述的系统、装置和单元的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。
在本申请所提供的几个实施例中,应该理解到,所揭露的系统、装置和方法,可以通过其它的方式实现。例如,以上所描述的装置实施例仅仅是示意性的,例如,所述单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口,装置或单元的间接耦合或通信连接,可以是电性,机械或其它的形式。
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或 者全部单元来实现本实施例方案的目的。
另外,在本发明各个实施例中的各功能单元可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中。
所述功能如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本发明的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质中,包括若干指令用以使得一台计算机设备(可以是个人计算机,服务器,或者网络设备等)执行本发明各个实施例所述方法的全部或部分步骤。而前述的存储介质包括:U盘、移动硬盘、只读存储器ROM、RAM、磁盘或者光盘等各种可以存储程序代码的介质。
以上所述,仅为本发明的具体实施方式,但本发明的保护范围并不局限于此,任何熟悉本技术领域的技术人员在本发明揭露的技术范围内,可轻易想到变化或替换,都应涵盖在本发明的保护范围之内。因此,本发明的保护范围应以权利要求的保护范围为准。

Claims (28)

  1. 一种社交网络中预测信息传播的方法,其特征在于,所述方法包括:
    获取待预测的目标信息并获取K个集群的影响力,其中,所述目标信息是由第一用户发布或转发的,所述K个集群用于表示用户的特征属性的K个分类,所述K个集群的影响力包括所述K个集群的信息传播成功率,K为正整数;
    确定所述第一用户的角色概率分布,并确定未传播所述目标信息的第二用户,其中,所述第一用户的角色概率分布用于表示所述第一用户分别属于所述K个集群的概率;
    根据所述K个集群的影响力和所述第一用户的角色概率分布,确定所述第二用户从所述第一用户转发所述目标信息的概率。
  2. 根据权利要求1所述的方法,其特征在于,所述目标信息是由所述第一用户在初始时刻发布或转发的,所述方法还包括:输出满足预设条件的所述第二用户的账号,
    其中,所述预设条件为:所述转发所述目标信息的概率大于预设的概率阈值。
  3. 根据权利要求1或2所述的方法,其特征在于,在所述获取待预测的目标信息并获取K个集群的影响力之前,所述方法还包括:
    从所述社交网络获取信息传播记录、用户关系数据库和用户特征数据库,其中,所述信息传播记录包括已有信息的历史传播记录,所述用户关系数据库包括已有用户之间的关注关系,所述用户特征数据库包括已有用户的特征属性;
    根据所述用户特征数据库,采用软聚类算法,得到所述K个集群以及所述K个集群的特征属性,其中,所述K个集群是根据所述已有用户的特征属性所确定的K个分类;
    根据所述信息传播记录和所述用户关系数据库,采用学习的方法,得到所述K个集群的影响力。
  4. 根据权利要求3所述的方法,其特征在于,所述确定所述第一用户的角色概率分布,包括:
    获取所述第一用户的特征属性;
    根据所述第一用户的特征属性与所述K个集群的特征属性,确定所述第 一用户的角色概率分布。
  5. 根据权利要求4所述的方法,其特征在于,所述第一用户的特征属性表示为AT,所述K个集群的特征属性表示为KTj,j=1,2,…,K;
    确定所述第一用户的角色概率分布为与所述K个集群对应的K个值,其中,所述K个值分别为
    Figure PCTCN2015079877-appb-100001
  6. 根据权利要求3所述的方法,其特征在于,在所述获取待预测的目标信息并获取K个集群的影响力之前,所述方法还包括:
    根据所述用户特征数据库和所述K个集群的特征属性,确定所述已有用户的角色概率分布,其中,所述已有用户的角色概率分布用于表示所述已有用户分别属于所述K个集群的概率。
  7. 根据权利要求6所述的方法,其特征在于,所述第一用户属于所述已有用户,
    所述确定所述第一用户的角色概率分布,包括:
    从所述已有用户的角色概率分布中,获取所述第一用户的角色概率分布。
  8. 根据权利要求6所述的方法,其特征在于,所述第一用户不属于所述已有用户,
    所述确定所述第一用户的角色概率分布,包括:
    获取所述第一用户的特征属性;
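    obtaining, from the user feature database according to the feature attribute of the first user, the feature attributes of N third users, wherein the N third users belong to the existing users, the distance between the feature attribute of each of the N third users and the feature attribute of the first user is less than a preset distance threshold, and N is a positive integer;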
    从所述已有用户的角色概率分布中,获取所述N个第三用户的角色概率分布;
    根据所述N个第三用户的角色概率分布,确定所述第一用户的角色概率分布。
  9. 根据权利要求8所述的方法,其特征在于,所述根据所述N个第三 用户的角色概率分布,确定所述第一用户的角色概率分布,包括:
    确定所述第一用户的角色概率分布为所述N个第三用户的角色概率分布的算术平均。
  10. 根据权利要求3至9任一项所述的方法,其特征在于,所述学习的方法为机器学习的方法或统计学习的方法。
  11. 根据权利要求1至10任一项所述的方法,其特征在于,所述根据所述K个集群的影响力和所述第一用户的角色概率分布,确定所述第二用户从所述第一用户转发所述目标信息的概率,包括:
    根据所述K个集群的影响力和所述第一用户的角色概率分布,采用统计的方法,计算所述第二用户转发所述目标信息的传播概率的期望值;
    将所述传播概率的期望值作为所述第二用户从所述第一用户转发所述目标信息的概率。
  12. 根据权利要求1至11任一项所述的方法,其特征在于,所述K个集群的影响力还包括所述K个集群的信息传播时间延迟率,所述方法还包括:
    根据所述K个集群的影响力和所述第一用户的角色概率分布,确定所述第二用户从所述第一用户转发所述目标信息的时刻。
  13. 根据权利要求1至12任一项所述的方法,其特征在于,所述第二用户为所述第一用户的关注者中未传播所述目标信息的用户。
  14. 根据权利要求13所述的方法,其特征在于,所述确定未传播所述目标信息的第二用户,包括:
    根据用户关系数据库,确定所述第一用户的关注者;
    从所述第一用户的关注者中确定所述第二用户,其中,所述第二用户未传播所述目标信息。
  15. 一种社交网络中预测信息传播的设备,其特征在于,所述设备包括:
    获取单元,用于获取待预测的目标信息并获取K个集群的影响力,其中,所述目标信息是由第一用户发布或转发的,所述K个集群用于表示用户的特征属性的K个分类,所述K个集群的影响力包括所述K个集群的信息传播成功率,K为正整数;
    确定单元,用于确定所述第一用户的角色概率分布,并确定未传播所述获取单元获取的所述目标信息的第二用户,其中,所述第一用户的角色概率 分布用于表示所述第一用户分别属于所述K个集群的概率;
    所述确定单元,还用于根据所述获取单元获取的所述K个集群的影响力和所述第一用户的角色概率分布,确定所述第二用户从所述第一用户转发所述目标信息的概率。
  16. 根据权利要求15所述的设备,其特征在于,所述目标信息是由所述第一用户在初始时刻发布或转发的,所述设备还包括:
    输出单元,用于输出满足预设条件的所述第二用户的账号,
    其中,所述预设条件为:所述转发所述目标信息的概率大于预设的概率阈值。
  17. 根据权利要求15或16所述的设备,其特征在于,
    所述获取单元,还用于从所述社交网络获取信息传播记录、用户关系数据库和用户特征数据库,其中,所述信息传播记录包括已有信息的历史传播记录,所述用户关系数据库包括已有用户之间的关注关系,所述用户特征数据库包括已有用户的特征属性;
    所述确定单元,还用于根据所述用户特征数据库,采用软聚类算法,得到所述K个集群以及所述K个集群的特征属性,其中,所述K个集群是根据所述已有用户的特征属性所确定的K个分类;
    所述确定单元,还用于根据所述信息传播记录和所述用户关系数据库,采用学习的方法,得到所述K个集群的影响力。
  18. 根据权利要求17所述的设备,其特征在于,所述确定单元,具体用于:
    获取所述第一用户的特征属性;
    根据所述第一用户的特征属性与所述K个集群的特征属性,确定所述第一用户的角色概率分布。
  19. 根据权利要求18所述的设备,其特征在于,所述第一用户的特征属性表示为AT,所述K个集群的特征属性表示为KTj,j=1,2,…,K;
    所述确定单元,具体用于:
    确定所述第一用户的角色概率分布为与所述K个集群对应的K个值, 其中,所述K个值分别为
    Figure PCTCN2015079877-appb-100002
  20. 根据权利要求17所述的设备,其特征在于,所述确定单元,还用于:
    根据所述用户特征数据库和所述K个集群的特征属性,确定所述已有用户的角色概率分布,其中,所述已有用户的角色概率分布用于表示所述已有用户分别属于所述K个集群的概率。
  21. 根据权利要求20所述的设备,其特征在于,所述第一用户属于所述已有用户,所述确定单元,具体用于:
    从所述已有用户的角色概率分布中,获取所述第一用户的角色概率分布。
  22. 根据权利要求20所述的设备,其特征在于,所述第一用户不属于所述已有用户,所述确定单元,具体用于:
    获取所述第一用户的特征属性;
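    obtain, from the user feature database according to the feature attribute of the first user, the feature attributes of N third users, wherein the N third users belong to the existing users, the distance between the feature attribute of each of the N third users and the feature attribute of the first user is less than a preset distance threshold, and N is a positive integer;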
    从所述已有用户的角色概率分布中,获取所述N个第三用户的角色概率分布;
    根据所述N个第三用户的角色概率分布,确定所述第一用户的角色概率分布。
  23. 根据权利要求22所述的设备,其特征在于,所述确定单元,具体用于:
    确定所述第一用户的角色概率分布为所述N个第三用户的角色概率分布的算术平均。
  24. 根据权利要求17至23任一项所述的设备,其特征在于,所述学习的方法为机器学习的方法或统计学习的方法。
  25. 根据权利要求15至24任一项所述的设备,其特征在于,所述确定单元,具体用于:
    根据所述K个集群的影响力和所述第一用户的角色概率分布,采用统计的方法,计算所述第二用户转发所述目标信息的传播概率的期望值;
    将所述传播概率的期望值作为所述第二用户从所述第一用户转发所述目标信息的概率。
  26. 根据权利要求15至25任一项所述的设备,其特征在于,所述K个集群的影响力还包括所述K个集群的信息传播时间延迟率,所述确定单元,还用于:
    根据所述K个集群的影响力和所述第一用户的角色概率分布,确定所述第二用户从所述第一用户转发所述目标信息的时刻。
  27. 根据权利要求15至26任一项所述的设备,其特征在于,所述第二用户为所述第一用户的关注者中未传播所述目标信息的用户。
  28. 根据权利要求27所述的设备,其特征在于,所述确定单元,具体用于:
    根据用户关系数据库,确定所述第一用户的关注者;
    从所述第一用户的关注者中确定所述第二用户,其中,所述第二用户未传播所述目标信息。
PCT/CN2015/079877 2014-09-18 2015-05-27 社交网络中预测信息传播的方法及设备 WO2016041376A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP15841322.9A EP3159809A4 (en) 2014-09-18 2015-05-27 Method and device for predicting information propagation in social network
US15/460,247 US10860941B2 (en) 2014-09-18 2017-03-16 Method and device for predicting information propagation in social network

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201410478217 2014-09-18
CN201410478217.3 2014-09-18
CN201510131640.0A CN106156030A (zh) 2014-09-18 2015-03-24 社交网络中预测信息传播的方法及设备
CN201510131640.0 2015-03-24

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/460,247 Continuation US10860941B2 (en) 2014-09-18 2017-03-16 Method and device for predicting information propagation in social network

Publications (1)

Publication Number Publication Date
WO2016041376A1 true WO2016041376A1 (zh) 2016-03-24

Family

ID=55532530

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/079877 WO2016041376A1 (zh) 2014-09-18 2015-05-27 社交网络中预测信息传播的方法及设备

Country Status (1)

Country Link
WO (1) WO2016041376A1 (zh)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130151330A1 (en) * 2011-12-09 2013-06-13 Audax Health Solutions, Inc. Methods and system for predicting influence-basis outcomes in a social network using directed acyclic graphs
CN103258248A (zh) * 2013-05-21 2013-08-21 中国科学院计算技术研究所 一种微博流行趋势预测方法、装置及系统
CN103699650A (zh) * 2013-12-26 2014-04-02 清华大学 消息传播预测方法及装置

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3159809A4 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110995485A (zh) * 2019-12-02 2020-04-10 黑龙江大学 一种无拓扑结构的社交消息传播范围预测方法
CN110995485B (zh) * 2019-12-02 2022-03-04 黑龙江大学 一种无拓扑结构的社交消息传播范围预测方法

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15841322

Country of ref document: EP

Kind code of ref document: A1

REEP Request for entry into the european phase

Ref document number: 2015841322

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2015841322

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE