CN113254719B - Online social network information propagation method based on status theory - Google Patents
- Publication number: CN113254719B (application CN202110465358.1A)
- Authority: CN (China)
- Prior art keywords: node, information, topic, nodes, user
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G06F16/9024: Graphs; Linked lists
- G06F16/9027: Trees
- G06F16/951: Indexing; Web crawling techniques
- G06Q10/06393: Score-carding, benchmarking or key performance indicator [KPI] analysis
- G06Q50/01: Social networking
Abstract
The invention provides an online social network information propagation method based on status theory. A tree index structure with the maximum spanning tree attribute is constructed to rank users by comprehensive influence, and the social status level of each user is determined from the hierarchical structure of the tree. By analyzing key factors in the propagation process, such as changes in users' perception of the information, interest time windows, and parallel propagation, the method determines the rule for updating a user's degree of identification with the information and, combined with the user's social status level, the strategy for updating the user's state during propagation. Finally, the method selects the initial seed nodes for propagation through the tree index structure and, using the status-theory-based propagation model, maximizes the propagation scale of the information while accurately predicting the propagation scale of specific topic information in the network at different times.
Description
Technical Field
The invention belongs to the field of online social networks, relates to information propagation in online social networks, and in particular relates to an online social network information propagation method based on status theory.
Background
Scenarios in online social networks such as public opinion analysis, real-time data analysis, and product advertising and marketing have strict timeliness requirements. When a new product is launched, a company needs to promote it to the widest possible user group in the shortest possible time in order to obtain sufficient returns. Research on how to accurately predict the propagation scale of hot-topic information, and on the general rules governing how topic information circulates, can help government agencies curb the uncontrolled spread of public opinion in the network and better control the process and outcome of information propagation, and therefore has important research value.
With the rapid development of computer science and mobile computing, the way information is disseminated has changed greatly: users in an online social network are not only creators of information but also its sharers and propagators. Besides factors such as information content, the social influence of the users who create or propagate information is an important factor in how strongly the information spreads. Highly influential users can use their influence to accelerate propagation and attract more attention. At the same time, posts by users with low influence often provoke a large response within a short time because of high-quality content or well-chosen timing. Because content features are complex and hard to subdivide further, most studies estimate the probability that an information propagation link is established from the influence between users. When social relationships are formed, ordinary users mostly interact with experts and scholars in a field by asking for advice, whereas communication among experts and leading figures in the field mostly takes the form of discussion and sharing. Differences in social status therefore determine how users communicate and how effective information is as it propagates. Social status refers to the rank or grade of a user's influence within a social network group, together with the honor or social prestige attached to it.
Topic propagation aims to get as many users in a social network as possible to accept a piece of information. Its function is similar to advertisement recommendation; for example, recommendation strategies based on users' purchase history can provide personalized recommendation services. User influence has a marked effect on information propagation, and identifying the important factors that affect interaction among users of an online social network helps to better understand the specific process of propagation. The spread of rumors and the turnover of information in online social networks are time-sensitive. In particular, when a crisis occurs, news media must report quickly, and an information propagation algorithm should likewise exploit timeliness and push specific topic information to the whole social network in the shortest time.
Currently, the most widely used topic propagation models are:
(a) Independent cascade model: node states are divided into active and inactive. A node in the active state has accepted the information and acts as a propagation source that promotes it to other nodes; in the initial time slot only the seed nodes are active. When a node receives propagation attempts from several neighbors in the same time slot, the order in which those attempts take effect is random. Each node has one and only one opportunity to propagate the topic information and, whether or not it succeeds, is removed from the network in the next time slot.
(b) Linear threshold model: the model assigns each node of the online social network a threshold representing how difficult the node is to influence. If the summed influence of a node's active neighbors exceeds the node's predefined threshold, the node becomes active in the next time slot and spontaneously recommends the information to its neighbors. Each node can only change from the inactive state to the active state, with no other transitions; unlike in the independent cascade model, however, an active neighbor can propagate topic information to the same node repeatedly until that node becomes active.
(c) SIR epidemic model: the model simulates the general process by which a disease spreads through a population, which serves as an abstraction of the information propagation process. The SIR model divides users into three states: susceptible, infected, and removed, where removed refers to individuals who have recovered from the disease and acquired immunity, so that they are no longer affected. The model assumes that, per unit time, a susceptible node randomly contacts all nodes in the network with probability b, while an infected node recovers and acquires immunity with probability a.
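As an illustration of model (a), the following minimal Python sketch simulates one run of an independent cascade. The adjacency-list format, the function name `independent_cascade`, and the single uniform activation probability `p` are assumptions made for this example and are not taken from the patent.

```python
import random

def independent_cascade(graph, seeds, p, rng=None):
    """Simulate one run of an independent cascade.

    graph: dict mapping each node to a list of out-neighbors (assumed format).
    seeds: iterable of initially active nodes.
    p:     activation probability applied to every edge (assumed uniform).
    Returns the set of nodes that were ever active.
    """
    rng = rng or random.Random()
    activated = set(seeds)
    frontier = list(activated)      # nodes that became active in the last slot
    while frontier:
        next_frontier = []
        for node in frontier:
            # an active node tries each inactive neighbor exactly once
            for neighbor in graph.get(node, []):
                if neighbor not in activated and rng.random() < p:
                    activated.add(neighbor)
                    next_frontier.append(neighbor)
        frontier = next_frontier    # earlier nodes never try again
    return activated
```

Because a node leaves the frontier after one slot, it cannot retry a failed attempt, which is the restriction the background section attributes to this model.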
Although the propagation of information and hot topics in online social networks is complex and diverse, it still follows regular patterns. A cascade of one piece of information usually spreads outward from the seed nodes, layer by layer in a tree structure, through the social relationships among users, and this cascade pattern is highly general. Existing propagation models such as the independent cascade model, the epidemic model, and the linear threshold model simulate the spread of information in a social network from different angles, but each has shortcomings. The independent cascade model allows each node to propagate to its neighbors only once, and the epidemic model considers only propagation with a fixed infection probability, ignoring the diversity of user groups and the complexity of interaction among users in online social networks.
Disclosure of Invention
In view of the shortcomings of the prior art, the invention aims to provide an online social network information propagation method based on status theory, so as to solve the technical problem that the similarity between prior-art methods and real propagation trends needs further improvement.
To solve the above technical problem, the invention adopts the following technical solution:
An online social network information propagation method based on status theory, comprising the following steps:
Step one: crawl the historical information data of all users in the online social network, calculate each user's activity index in the topics it participates in from that data, and comprehensively evaluate each user's influence in information propagation from activity and influence;
Step two: rank the users by comprehensive influence using a tree index structure with the maximum spanning tree attribute, and divide the users into social status levels according to the hierarchical structure of the tree;
Step three: traverse the tree index structure formed by the user nodes from top to bottom in order of influence using a level-order traversal algorithm, determine the initial seed nodes for propagating topic information, and have the seed nodes propagate the information to all ordinary nodes based on social status level;
Step four: when an ordinary node is influenced by a neighbor node, it updates its own identification degree in the current time slot according to the information identification update rule;
Step five: determine, for every node in the network that has not yet accepted the information, whether it chooses to identify with the topic information in the current time slot;
Step six: determine, for every node in the network that has accepted the information, whether it loses interest in the information;
Step seven: when no remaining node in the network can still be influenced by a neighbor node, or the current time slot exceeds the specified time range, the algorithm ends.
The invention also has the following technical features:
Specifically, step one is implemented as follows:
Step S10: define the social network as a weighted directed graph $G(V, E)$, and represent the users' identification with the information by a matrix $R$ of size $1 \times N$;
$V$ is the set of nodes in the network, $E$ is the set of relationships among the nodes, and $N$ is the total number of nodes in the network;
in the identification matrix, $r_{i,t}$ denotes the degree to which node $v_i$ identifies with the information at time $t$;
Step S11: calculate each user's activity index in the topics it participates in from the historical information data. Let $g$ denote a specific topic category in the total set of user topic categories $C$, and let $m$ denote the total number of topics in which users in the network participate, i.e. $C = \{g_1, g_2, \ldots, g_m\}$, $|C| = m$;
Create a one-dimensional intensity vector $S$ describing the total intensity of all users under each topic category; the length of $S$ equals the number of specific topic categories $g$ in $C$, i.e. $|S| = m$;
Represent the distribution of the users' topic activity by a two-dimensional matrix $A$, whose row index $i$ corresponds to a specific user $v_i$ in the user node set $V$ and whose column index $j$ corresponds to a specific topic category $g_j$ in the set $C$; any element $a_{ij}$ of $A$ is the activity index of user $v_i$ under the specific topic $g_j$;
When each element $a_{ij}$ is calculated, all users participating in the topic share its activity equally, i.e. the activity is divided by the total number of users participating under topic $g_m$;
The total activity value $I_a(v_i \mid c_i)$ of user $v_i$ is the sum of its activity over all topics it participates in, and is calculated as follows:
in the formula:
$\mu_m$ denotes the number of information records under the specific topic $g_m$;
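The activity calculation of step S11 can be sketched in Python as follows. The equations themselves are not reproduced in this text, so the sketch rests on two assumptions: the intensity of a topic is taken to be its number of information records $\mu_j$, and that intensity is split evenly among the users participating in the topic. The function names and the input format are hypothetical.

```python
def activity_matrix(participation, records_per_topic):
    """Build A, where A[i][j] is user i's activity share under topic j.

    participation:     list of sets; participation[i] holds the indices of
                       the topics user i takes part in (assumed format).
    records_per_topic: number of information records mu_j per topic, used
                       here as the topic's total intensity (an assumption).
    """
    m = len(records_per_topic)
    # number of users participating under each topic
    counts = [sum(1 for topics in participation if j in topics) for j in range(m)]
    matrix = []
    for topics in participation:
        row = [
            records_per_topic[j] / counts[j] if j in topics else 0.0
            for j in range(m)
        ]
        matrix.append(row)
    return matrix

def total_activity(matrix):
    """Total activity of each user: the sum of its shares over all topics."""
    return [sum(row) for row in matrix]
```

For example, with three users where user 0 joins both topics, user 1 only topic 0, and user 2 only topic 1, `activity_matrix([{0, 1}, {0}, {1}], [10, 6])` gives each participant half of each topic's records.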
Step S12: comprehensively evaluate the user's influence in information propagation from activity and influence. The static influence of the participating users during topic propagation is calculated with the PageRank algorithm, using the following formula:
in the formula:
$PR_{i,\gamma}$ denotes the PageRank value of user $v_i$ in the network after the $\gamma$-th iteration;
the two neighbor sets in the formula denote, respectively, the set of out-degree neighbors of user node $v_j$ and the set of in-degree neighbors of user node $v_i$;
$w_{i,j}$ is the potential influence of user $v_i$ on user $v_j$;
$n$ denotes the total number of nodes in the network;
$d$ is the damping coefficient, which prevents node ranks from growing without bound during the calculation, with $d \in [0, 1]$;
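Since the PageRank formula of step S12 is not reproduced in this text, the following Python sketch shows a common weighted PageRank power iteration rather than the patent's exact equation. Normalizing each edge weight by the source node's total out-weight, spreading the rank of nodes without out-edges evenly, and the default `d = 0.85` are all assumptions.

```python
def weighted_pagerank(weights, d=0.85, iterations=100, tol=1e-10):
    """Power iteration for a weighted PageRank.

    weights: dict {source: {target: w}} with w > 0 (assumed format);
             weights[i][j] plays the role of w_{i,j}.
    d:       damping coefficient in [0, 1].
    Returns a dict node -> rank; the ranks sum to 1.
    """
    nodes = set(weights)
    for targets in weights.values():
        nodes.update(targets)
    nodes = list(nodes)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    out_sum = {v: sum(weights.get(v, {}).values()) for v in nodes}
    for _ in range(iterations):
        # rank held by nodes without out-edges is spread evenly (assumption)
        dangling = sum(rank[v] for v in nodes if out_sum[v] == 0)
        new_rank = {v: (1 - d) / n + d * dangling / n for v in nodes}
        for src in nodes:
            for dst, w in weights.get(src, {}).items():
                new_rank[dst] += d * rank[src] * w / out_sum[src]
        converged = max(abs(new_rank[v] - rank[v]) for v in nodes) < tol
        rank = new_rank
        if converged:
            break
    return rank
```

The converged value for each user would then serve as the static influence $I_p(v_i)$.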
Combining the user's influence with topic activity gives the following evaluation formula for a user's comprehensive influence in the online social network:
$I(v_i \mid c_i) = F(v_i) = \beta I_a(v_i \mid c_i) + (1 - \beta) I_p(v_i)$ (Formula IV)
In the formula:
$I_a(v_i \mid c_i)$ denotes the activity level of user $v_i$ given the set of topics $c_i$ in which it participates;
$I_p(v_i)$ denotes the influence value of user $v_i$ obtained after the PageRank algorithm converges on the network;
$\beta$ is the activity correlation coefficient, which determines the relative contributions of activity and influence to the user's comprehensive influence;
$I(v_i \mid c_i)$ denotes the resulting comprehensive influence of the user.
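Formula IV is a weighted average of the two scores, and a short Python sketch makes the ranking step concrete. The dictionary-based inputs and the function names are hypothetical; the patent does not say whether the two scores are rescaled before mixing, so none is applied here.

```python
def comprehensive_influence(activity, pagerank, beta):
    """Apply Formula IV per user: I = beta * I_a + (1 - beta) * I_p."""
    return {u: beta * activity[u] + (1 - beta) * pagerank[u] for u in activity}

def rank_users(influence):
    """Return user ids sorted by comprehensive influence, highest first."""
    return sorted(influence, key=influence.get, reverse=True)
```

With $\beta = 0.5$, a user with activity 8.0 and PageRank 0.25 scores 4.125 and outranks a user with activity 2.0 and PageRank 0.75, who scores 1.375. Because the raw scales differ, activity dominates in this example.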
Specifically, step two is implemented as follows:
Step S20: using a tree index structure with the maximum spanning tree attribute, rank the users in the social network by comprehensive influence and build a tree index structure with a total depth of $K$ layers, where each layer of the tree index corresponds to one of $K$ levels of user social status; the set of levels is expressed as:
$U = \{u_1, u_2, \ldots, u_K\}$;
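One way to picture step S20 is the Python sketch below, which lays users out in a complete tree in descending order of influence so that every parent outranks its children, and reads the status level off the layer. The text does not describe how the tree is built, so the array layout, the branching factor of 2, and the function name are assumptions.

```python
def build_status_levels(influence, branching=2):
    """Arrange users in a complete tree ordered by influence.

    Returns (order, level_of):
    order:    user ids in level order, root first; because the list is
              sorted by descending influence, parents outrank children.
    level_of: dict user -> status level, with 1 for the root layer.
    The layout and the branching factor are assumptions.
    """
    order = sorted(influence, key=influence.get, reverse=True)
    level_of = {}
    level, start, width = 1, 0, 1
    while start < len(order):
        for user in order[start:start + width]:
            level_of[user] = level
        start += width
        width *= branching      # each layer holds `branching` times more nodes
        level += 1
    return order, level_of
```

The total depth $K$ is then the largest value in `level_of`.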
Step S21: assign a weight $\eta_{ij}$ to each directed edge $e_{ij} \in E$ in the online social network. $\eta_{ij}$ denotes the potential status influence of node $v_i$ on node $v_j$. Because nodes have different social status levels, the potential influence they exert on one another over the whole life cycle of information propagation differs, i.e. $\eta_{ij} \neq \eta_{ji}$;
The potential influence of node $v_i$ on node $v_j$ is calculated with the following formula:
in the formula:
$u_i$ is the influence status level of node $v_i$;
$u_j$ is the influence status level of node $v_j$.
Specifically, step three is implemented as follows:
Step S30: following the definition of the tree structure, traverse the tree index structure from top to bottom with a level-order traversal algorithm and determine the initial $k$ seed nodes for propagating topic information; the remaining $N - k$ nodes are ordinary nodes that have not yet been influenced;
Step S31: the seed nodes identify with the topic information content and propagate the information to all ordinary nodes based on social status level. In the initial time slot, the identification degree of a seed node is the maximum value 1, and the initial identification degree of an ordinary node is the minimum value 0.
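Steps S30 and S31 amount to a breadth-first walk over the tree followed by an initialization, which the following Python sketch illustrates. The child-list representation of the tree and the function names are assumptions for this example.

```python
from collections import deque

def select_seeds(children, root, k):
    """Level-order (breadth-first) traversal returning the first k nodes.

    children: dict node -> list of child nodes in the tree index
              (assumed representation).
    """
    seeds, queue = [], deque([root])
    while queue and len(seeds) < k:
        node = queue.popleft()
        seeds.append(node)
        queue.extend(children.get(node, []))
    return seeds

def initial_identification(nodes, seeds):
    """Step S31: seed nodes start at 1.0, ordinary nodes at 0.0."""
    seed_set = set(seeds)
    return {v: 1.0 if v in seed_set else 0.0 for v in nodes}
```

Because upper layers hold the users with the highest comprehensive influence, the first $k$ nodes visited are the $k$ highest-status users.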
Specifically, step four is implemented as follows:
Step S40: divide the node states into active and inactive. Active nodes actively participate in forwarding information, and inactive nodes passively receive topic information. The state of each node in the network is represented by a matrix $D$ of size $1 \times N$, in which each element $d_{i,t} \in \{0, 1\}$ is a Boolean variable: $d_{i,t} = 1$ means the node is active in time slot $t$, and $d_{i,t} = 0$ means the node is inactive in time slot $t$;
Step S41: in time slot $t$, when node $v_i$ is influenced simultaneously by nodes of higher and lower status level, the topic information identification degree of node $v_i$ is updated by the following formula:
in the formula:
$\eta_{ij}$ is the potential status influence of node $v_i$ on node $v_j$, satisfying $0 < \eta_{ij} < 1$;
$V_{seed}$ is the seed node set in time slot $t - 1$;
The identification degree of node $v_i$ with the topic can change only when a neighbor of $v_i$ is in the active state;
Step S42: the propagation of hot-topic information is time-sensitive. For any node $v_i$ in the node set $V$, define a topic propagation time window for that node in the network;
the window is bounded by the start time slot and the termination time slot of node $v_i$ within its propagation period $t_i$;
Only in time slots inside its time window can a node be influenced by neighbor nodes or act as a seed node influencing other nodes;
When node $v_i$, as an ordinary node, is influenced by a seed node in a time slot outside its time window, the value of its identification degree $r_{i,t}$ does not change;
When the system time $t$ exceeds the upper limit of the propagation time window of $v_i$, node $v_i$ is automatically removed from the network and the value of $r_{i,t}$ no longer changes.
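The gating effect of the time window in step S42 can be sketched in Python as follows. The update formula of step S41 is not reproduced in this text, so the proposed new value is passed in as `r_proposed`; the inclusive window bounds and the function names are assumptions.

```python
def in_window(t, window):
    """True if time slot t lies inside the (start, end) window, inclusive."""
    start, end = window
    return start <= t <= end

def gated_update(r_current, r_proposed, t, window, neighbor_active):
    """Return the identification degree to use after slot t.

    r_proposed is whatever value the update formula of step S41 would
    give; it is supplied by the caller because that formula is not
    available here. The value changes only if t is inside the node's
    window and at least one neighbor is active; otherwise it is kept.
    """
    if neighbor_active and in_window(t, window):
        return r_proposed
    return r_current
```

For a node with window `(2, 5)`, a proposed update at slot 3 is accepted, while the same proposal at slot 6 leaves the value unchanged.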
Specifically, step five is implemented as follows:
Step S50: determine, for every node in the network that has not accepted the information, whether it chooses to identify with the topic information in the current time slot. The nodes in the online social network are subdivided into three different phases: a susceptible phase, an infected phase, and an immune phase;
The infected phase corresponds to the active node state, and the susceptible and immune phases correspond to the inactive node state;
When a node $v_i$ is in the susceptible phase and the time slot $t$ lies within its topic propagation time window, $v_i$ is influenced by neighboring seed nodes and changes its topic identification degree $r_{i,t}$;
Step S51: during topic propagation, a node $v_i$ in the susceptible phase changes its own state according to the outcome of a binomial trial whose success probability is its topic identification degree $r_{i,t}$. There are two cases:
When the outcome $d_{i,t} \sim B(1, r_{i,t})$ equals 0, the node does not identify with the topic information and remains in the susceptible phase in time slot $t + 1$;
When the outcome $d_{i,t} \sim B(1, r_{i,t})$ equals 1, the node identifies with the topic information, joins the seed node set in time slot $t + 1$, and spontaneously propagates the topic information to other nodes.
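The draw $d_{i,t} \sim B(1, r_{i,t})$ in step S51 is a single Bernoulli trial, which the following Python sketch illustrates. The function names and the set-based bookkeeping are assumptions for this example.

```python
import random

def bernoulli_state(r, rng):
    """Draw d ~ B(1, r): 1 with probability r, otherwise 0."""
    return 1 if rng.random() < r else 0

def update_susceptible(identification, susceptible, seed_set, rng=None):
    """Move susceptible nodes whose draw is 1 into the seed set for slot t+1.

    identification: dict node -> r_{i,t} in [0, 1].
    Returns (still_susceptible, new_seed_set) without mutating the inputs.
    """
    rng = rng or random.Random()
    still, seeds = set(), set(seed_set)
    for node in susceptible:
        if bernoulli_state(identification[node], rng) == 1:
            seeds.add(node)     # identifies with the topic, becomes a seed
        else:
            still.add(node)     # stays in the susceptible phase
    return still, seeds
```

A node with $r_{i,t} = 1$ always joins the seed set and a node with $r_{i,t} = 0$ never does; values in between make the transition probabilistic.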
Specifically, step six is implemented as follows:
Any node $v_j$ in the seed node set loses interest in the topic information with probability $b$, i.e. it acquires immunity and moves to the immune state; the state of an immune node cannot change afterwards;
Meanwhile, when the system time slot exceeds the upper limit of the time window of node $v_i$ and the node is still in the susceptible phase, i.e. $d_{i,t} = 0$, the node has no further opportunity to receive new topic information; node $v_i$ is removed from the network and moves to the immune phase.
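The two routes into the immune phase described in step six can be sketched in Python as follows. The sketch assumes that the window-expiry rule applies to nodes that are still inactive (susceptible), which is one reading of the text; the function name and the data layout are hypothetical.

```python
import random

def update_immunity(seed_set, susceptible, window_end, t, b, rng=None):
    """Return (seed_set, susceptible, newly_immune) after slot t.

    seed_set:    set of nodes currently in the infected phase.
    susceptible: set of nodes currently in the susceptible phase.
    window_end:  dict node -> last slot of its propagation window.
    b:           probability that an infected node loses interest.
    """
    rng = rng or random.Random()
    # route 1: an infected node loses interest with probability b
    lost_interest = {v for v in seed_set if rng.random() < b}
    # route 2: an inactive node whose window has passed is removed (assumed reading)
    expired = {v for v in susceptible if t > window_end[v]}
    return seed_set - lost_interest, susceptible - expired, lost_interest | expired
```

Nodes in `newly_immune` take no further part in the propagation.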
Specifically, step seven is implemented as follows:
As the time slot $t$ increases, once all user nodes in the network are in the infected phase or the immune phase, no target remains to be influenced and the propagation process ends; otherwise the time slot advances to $t + 1$ and the method returns to step four. The number of nodes that identify with the information in each time slot is the predicted propagation scale of the topic information in the specific network.
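The loop formed by steps four to seven can be sketched end to end in Python. The identification update formula is not reproduced in this text, so it is supplied by the caller as the hypothetical callable `update_identity`; the data layout, the inclusive window bounds, and the rule that expired susceptible nodes leave the network are assumptions.

```python
import random

def simulate(neighbors, seeds, windows, update_identity, b, max_slots, rng=None):
    """Run the slot-by-slot loop of steps four to seven.

    neighbors:       dict node -> list of in-neighbors that can influence it.
    seeds:           initial seed nodes (infected phase).
    windows:         dict node -> (start, end) propagation window.
    update_identity: callable(node, r, active_neighbors) -> new r in [0, 1];
                     stands in for the update formula of step S41.
    b:               probability that an infected node becomes immune.
    Returns the number of infected nodes recorded at each slot.
    """
    rng = rng or random.Random()
    infected = set(seeds)
    susceptible = set(neighbors) - infected
    r = {v: 1.0 if v in infected else 0.0 for v in neighbors}
    history = []
    for t in range(max_slots):
        history.append(len(infected))
        if not susceptible:
            break                               # step seven: nobody left to influence
        newly_infected, expired = set(), set()
        for v in susceptible:
            start, end = windows[v]
            if t > end:
                expired.add(v)                  # window passed: leaves the network
                continue
            active = [u for u in neighbors[v] if u in infected]
            if t >= start and active:
                r[v] = update_identity(v, r[v], active)     # step four
                if rng.random() < r[v]:                     # step five: d ~ B(1, r)
                    newly_infected.add(v)
        recovered = {v for v in infected if rng.random() < b}   # step six
        infected = (infected - recovered) | newly_infected
        susceptible -= newly_infected | expired
    return history
```

The returned list gives one count per time slot, which corresponds to the propagation scale the method is meant to predict.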
Compared with the prior art, the invention has the following technical effects:
(I) To achieve high similarity with real propagation trends, the method divides users into social status levels based on comprehensive influence and measures the strength of the potential influence exerted by information through the differences in social status level among users.
(II) The method selects the initial seed nodes for information propagation from the tree index structure, traversed and ranked from top to bottom, and builds a status-theory-based information propagation model that takes into account important factors such as perception of information identification, interest time windows, and parallel propagation, so that the propagation scale of specific topic information in the network is maximized while its propagation scale at different times is accurately predicted.
(III) Extensive experiments verify the important role of status theory throughout the full life cycle of information propagation in the method, and comparative analysis against other classical models shows that the model predicts the propagation range accurately, which verifies the accuracy of the method.
(IV) The method divides users into different social status levels by comprehensive influence and measures the effect and influence of information propagation by the status-level characteristics of the users along propagation paths, thereby simulating the real evolution of information propagation trends in the network.
(V) The method treats the propagation period, the identification perception strategy, and the differences in influence of user nodes during propagation as key factors, so that it realistically simulates information propagation in online social networks and accurately predicts the scale and influence of propagation.
Drawings
FIG. 1 is a schematic flow chart of the status-theory-based information propagation algorithm in an online social network.
FIG. 2 is a schematic diagram of the information propagation topology of an online social network.
FIG. 3 is a schematic diagram of parallel information propagation in an online social network.
FIG. 4 is an example diagram of users' perception of information identification.
FIG. 5 is a graph of the trend of the experimental data over propagation time and propagation depth.
FIG. 6 is a graph of the experimental results of the propagation model predicting the information propagation trend.
FIG. 7 is a graph of time window size versus number of propagation iterations.
FIG. 8 is a graph of time window size versus the final number of infected nodes for each propagation model.
The invention is explained in further detail below with reference to an embodiment.
Detailed Description
The status-theory-based information propagation method for online social networks can help government departments, news media, and related enterprises and organizations improve the promotion of news, products, or advertisements, and plays an important role in specific fields such as public crisis management, word-of-mouth marketing, advertisement and product recommendation, and online public opinion monitoring. Given the dynamics and complexity of information propagation on social platforms, the method studies how differences in influence among users affect propagation. To match the propagation trend of hot topics in real life as closely as possible, it introduces important features such as information identification, interest time windows, and parallel propagation, and determines the update rule for users' information identification and the state update strategy during propagation. The static influence of the participating users in the network is calculated with the classic PageRank algorithm and combined with the user activity index computed from historical information data to comprehensively evaluate each user's role in propagating specific information. The initial seed nodes are selected according to the evaluation result, and the status-theory-based propagation model maximizes the propagation scale of the information while accurately predicting the propagation scale of specific topic information in the network at different times.
The invention is not limited to the following embodiment; all equivalent changes based on the technical solution of the invention fall within its scope of protection.
Embodiment:
This embodiment provides an online social network information propagation method based on status theory. As shown in FIGS. 1 to 3, the method comprises the following steps:
Step one: crawl the historical information data of all users in the online social network, calculate each user's activity index in the topics it participates in from that data, and comprehensively evaluate each user's influence in information propagation from activity and influence;
Step S10: define the social network as a weighted directed graph $G(V, E)$, and represent the users' identification with the information by a matrix $R$ of size $1 \times N$;
$V$ is the set of nodes in the network, $E$ is the set of relationships among the nodes, and $N$ is the total number of nodes in the network;
in the identification matrix, $r_{i,t}$ denotes the degree to which node $v_i$ identifies with the information at time $t$;
Step S11: calculate each user's activity index in the topics it participates in from the historical information data. Let $g$ denote a specific topic category in the total set of user topic categories $C$, and let $m$ denote the total number of topics in which users in the network participate, i.e. $C = \{g_1, g_2, \ldots, g_m\}$, $|C| = m$;
Create a one-dimensional intensity vector $S$ describing the total intensity of all users under each topic category; the length of $S$ equals the number of specific topic categories $g$ in $C$, i.e. $|S| = m$;
Represent the distribution of the users' topic activity by a two-dimensional matrix $A$, whose row index $i$ corresponds to a specific user $v_i$ in the user node set $V$ and whose column index $j$ corresponds to a specific topic category $g_j$ in the set $C$; any element $a_{ij}$ of $A$ is the activity index of user $v_i$ under the specific topic $g_j$;
When each element $a_{ij}$ is calculated, all users participating in the topic share its activity equally, i.e. the activity is divided by the total number of users participating under topic $g_m$;
The total activity value $I_a(v_i \mid c_i)$ of user $v_i$ is the sum of its activity over all topics it participates in, and is calculated as follows:
in the formula:
$\mu_m$ denotes the number of information records under the specific topic $g_m$;
Step S12: comprehensively evaluate the user's influence in information propagation from activity and influence. The static influence of the participating users during topic propagation is calculated with the PageRank algorithm, using the following formula:
in the formula:
$PR_{i,\gamma}$ denotes the PageRank value of user $v_i$ in the network after the $\gamma$-th iteration;
the two neighbor sets in the formula denote, respectively, the set of out-degree neighbors of user node $v_j$ and the set of in-degree neighbors of user node $v_i$;
$w_{i,j}$ is the potential influence of user $v_i$ on user $v_j$;
$n$ denotes the total number of nodes in the network;
$d$ is the damping coefficient, which prevents node ranks from growing without bound during the calculation, with $d \in [0, 1]$;
Combining the user's influence with topic activity gives the following evaluation formula for a user's comprehensive influence in the online social network:
$I(v_i \mid c_i) = F(v_i) = \beta I_a(v_i \mid c_i) + (1 - \beta) I_p(v_i)$ (Formula IV)
In the formula:
$I_a(v_i \mid c_i)$ denotes the activity level of user $v_i$ given the set of topics $c_i$ in which it participates;
$I_p(v_i)$ denotes the influence value of user $v_i$ obtained after the PageRank algorithm converges on the network;
$\beta$ is the activity correlation coefficient, which determines the relative contributions of activity and influence to the user's comprehensive influence;
$I(v_i \mid c_i)$ denotes the resulting comprehensive influence of the user.
Step two: rank the users by comprehensive influence using a tree index structure with the maximum-spanning-tree property, and divide the users' social status levels by means of the tree's hierarchy;
Step S20: using a tree index structure with the maximum-spanning-tree property, rank the users in the social network by comprehensive influence and build a tree index with a total depth of $K$ layers, where the layers of the tree index correspond to the $K$ levels of user social status; the set of levels is expressed as:
$U = \{u_1, u_2, \ldots, u_K\}$;
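One way to picture the level assignment is sketched below. The text does not fix the branching factor of the tree index, so this sketch assumes a binary tree filled in heap order from a descending ranking; that choice, the function name, and the sample scores are assumptions for illustration only.

```python
def status_levels(influence):
    """Map each user to a level 1..K, where level 1 is the root of the tree."""
    ranked = sorted(influence, key=influence.get, reverse=True)
    # In a heap-ordered binary tree, 0-based position p sits on level
    # (p + 1).bit_length(): position 0 -> 1, positions 1-2 -> 2, positions 3-6 -> 3.
    return {user: (pos + 1).bit_length() for pos, user in enumerate(ranked)}


# Hypothetical comprehensive-influence scores.
levels = status_levels({"a": 0.9, "b": 0.7, "c": 0.6, "d": 0.3})
K = max(levels.values())
```

Because the ranking is descending, every parent in this layout has influence at least as large as its children, so higher layers hold higher-status users.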
Step S21: each directed edge $e_{ij} \in E$ in the online social network is assigned a weight $\eta_{ij}$, which represents the potential influence of node $v_i$ on node $v_j$ over the whole life cycle of information propagation. Because nodes have different social status levels, the influence they exert on each other differs, i.e., $\eta_{ij} \neq \eta_{ji}$;
The potential influence of node $v_i$ on node $v_j$ is calculated using the following formula:
In the formula:
$u_i$ is the influence status level of node $v_i$;
$u_j$ is the influence status level of node $v_j$.
Step three: using a level-order traversal algorithm, traverse the tree index structure formed by the user nodes from top to bottom in order of influence, determine the initial seed nodes that propagate the topic information, and have the seed nodes propagate information to all ordinary nodes based on social status level;
Step S30: following the definition of the tree structure, a level-order traversal algorithm traverses the tree index structure from top to bottom to determine the initial $k$ seed nodes that propagate the topic information; the remaining $N-k$ nodes are unaffected ordinary nodes;
Step S31: the seed nodes identify with the topic information content and propagate it to all ordinary nodes based on social status level. In the initial time slot, the identity of each seed node takes the maximum value 1, and the initial identity of each ordinary node takes the minimum value 0.
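Steps S30 and S31 can be sketched as a breadth-first walk followed by an initialization pass. The child-list representation of the tree index, the function names, and the sample tree are assumptions made for this illustration.

```python
from collections import deque


def select_seeds(root, children, k):
    """Level-order (breadth-first) walk; the first k visited nodes become seeds."""
    order, queue = [], deque([root])
    while queue and len(order) < k:
        node = queue.popleft()
        order.append(node)
        queue.extend(children.get(node, []))
    return order


def initial_identity(nodes, seeds):
    """Seeds start with identity 1.0, ordinary nodes with 0.0."""
    seed_set = set(seeds)
    return {v: 1.0 if v in seed_set else 0.0 for v in nodes}


# Hypothetical tree index: "a" is the root, "b" and "c" its children, "d" below "b".
tree = {"a": ["b", "c"], "b": ["d"]}
seeds = select_seeds("a", tree, k=2)
r0 = initial_identity(["a", "b", "c", "d"], seeds)
```

Because the walk proceeds layer by layer, the k seeds are always drawn from the highest status levels first.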
Step four: when an ordinary node is influenced by its neighbor nodes, it changes its identity in the current time slot according to the information identity update rule;
Step S40: node states are divided into an active state and an inactive state; an active node actively participates in information forwarding, while an inactive node passively receives topic information. The state attribute of each node in the network is represented by a matrix $D$ of size $1 \times N$, in which every element $d_{i,t} \in \{0,1\}$ is a Boolean variable: $d_{i,t} = 1$ denotes that the node is active in time slot $t$, and $d_{i,t} = 0$ denotes that it is inactive in time slot $t$;
Step S41: in time slot $t$, when node $v_i$ is influenced simultaneously by nodes of higher and lower status, the topic information identity of node $v_i$ is updated by the following formula:
In the formula:
$\eta_{ij}$ is the potential influence of node $v_i$ on node $v_j$ and satisfies $0 < \eta_{ij} < 1$;
$V_{seed}$ is the seed node set in time slot $t-1$;
the topic identity of node $v_i$ can change only when its neighbor node is in the active state;
Step S42: the propagation of hot topic information is time-sensitive. For any node $v_i$ in the node set $V$, a topic propagation time window is defined for the node in the network;
where the lower bound of the window is the initial time slot of node $v_i$ in propagation period $t_i$, and the upper bound is the termination time slot of node $v_i$ in propagation period $t_i$;
only in time slots inside the time window can the node be influenced by neighbor nodes, or act as a seed node and influence other nodes;
when node $v_i$, as an ordinary node, is influenced by a seed node in a time slot outside the time window, the corresponding identity $r_{i,t}$ of node $v_i$ does not change;
when the system time $t$ exceeds the upper limit of the propagation time window of $v_i$, node $v_i$ is automatically removed from the network, and the value of $r_{i,t}$ no longer changes.
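The time-window rules of step S42 amount to a three-way check on the current slot. The sketch below is an illustration only: the function name, the status labels, and the choice of inclusive window bounds are assumptions.

```python
def window_status(t, start_slot, end_slot):
    """Classify time slot t relative to one node's propagation time window."""
    if t > end_slot:
        return "removed"      # past the upper limit: the node leaves the network
    if t < start_slot:
        return "not_started"  # before the window: identity stays unchanged
    return "open"             # inside the window: can be influenced or act as a seed


# Hypothetical window covering slots 2..5.
statuses = [window_status(t, 2, 5) for t in range(1, 7)]
```

Only nodes whose status is "open" take part in the identity update for the current slot.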
Specifically, in this embodiment:
An example of user information identity is shown in FIG. 4. In example (a), user $v_3$ in an online social network is influenced simultaneously by four nodes of higher social status, i.e., $l_{13} = l_{23} = l_{43} = l_{53} = 1$; the information identity of user $v_3$ in the next time slot is then calculated as:
$r_{3,t+1} = 0.1 + (0.5 + 0.2 + 0.4 + 0.1) \times 0.6 = 0.82$;
Similarly, in example (b), the four neighbor nodes of $v_3$ have relatively low social influence, and the identity of $v_3$ in the next time slot is updated to:
$r_{3,t+1} = 0.1 + (0.5 + 0.2 + 0.1 + 0.4) \times 0.25 = 0.4$.
Likewise, in example (c), $v_3$ is influenced by nodes of different social status:
$r_{3,t+1} = 0.1 + (0.5 + 0.2) \times 0.6 + 0.4 \times 0.5 + 0.1 \times 0.3 = 0.75$.
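The arithmetic of the three examples can be checked with a short script. The sketch reads 0.1 as the current identity of $v_3$, the numbers 0.5, 0.2, 0.4, and 0.1 as the values contributed by the four neighbors, and the multipliers as the corresponding potential influences; this reading is inferred from the worked numbers and is an assumption, as is the function name.

```python
def next_identity(current, contributions):
    """contributions: list of (neighbor_value, eta) pairs (assumed reading)."""
    return current + sum(value * eta for value, eta in contributions)


# Example (a): all four neighbors act with influence 0.6.
example_a = next_identity(0.1, [(0.5, 0.6), (0.2, 0.6), (0.4, 0.6), (0.1, 0.6)])
# Example (b): all four neighbors act with influence 0.25.
example_b = next_identity(0.1, [(0.5, 0.25), (0.2, 0.25), (0.1, 0.25), (0.4, 0.25)])
# Example (c): neighbors of mixed status act with influences 0.6, 0.6, 0.5, and 0.3.
example_c = next_identity(0.1, [(0.5, 0.6), (0.2, 0.6), (0.4, 0.5), (0.1, 0.3)])
print(round(example_a, 2), round(example_b, 2), round(example_c, 2))
```

Under this reading the same four neighbors raise the identity of $v_3$ most when they all hold higher status, which is the effect the three examples contrast.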
Step five: judge whether each node in the network that has not accepted the information chooses to identify with the topic information in the current time slot;
Step S50: judge whether each node in the network that has not accepted the information chooses to identify with the topic information in the current time slot; the nodes in the online social network are subdivided into three different phases: a susceptible phase, an infected phase, and an immune phase;
the infected phase corresponds to the active node state, while the susceptible phase and the immune phase correspond to the inactive node state;
for a node $v_i$ in the susceptible phase, when time slot $t$ lies within its topic propagation time window, node $v_i$ is influenced by neighboring seed nodes and its topic identity $r_{i,t}$ changes;
Step S51: during topic propagation, a node $v_i$ in the susceptible phase changes its own state according to the outcome of a binomial distribution with the topic identity $r_{i,t}$ as the probability; two cases arise:
when the binomial outcome $d_{i,t} \sim B(1, r_{i,t})$ equals 0, the node does not identify with the topic information and remains in the susceptible phase in time slot $t+1$;
when the binomial outcome $d_{i,t} \sim B(1, r_{i,t})$ equals 1, the node identifies with the topic information, joins the seed node set in time slot $t+1$, and spontaneously propagates the topic information to other nodes.
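A draw from $B(1, r_{i,t})$ is a single Bernoulli trial, which can be sketched with the standard library. The function name and the explicit random generator argument are assumptions for illustration.

```python
import random


def susceptible_step(identity, rng):
    """Draw d ~ B(1, identity); 1 means the node joins the seed set at t + 1."""
    return 1 if rng.random() < identity else 0


rng = random.Random(0)
# Empirical frequency of infection for a node with identity 0.3.
draws = [susceptible_step(0.3, rng) for _ in range(10000)]
frequency = sum(draws) / len(draws)
```

Over many draws the share of 1s approaches the identity value, so a node with a higher topic identity is proportionally more likely to become a seed in the next slot.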
Step six: judge whether each node in the network that has accepted the information loses interest in it;
any node $v_j$ in the seed node set loses interest in the topic information with probability $b$, that is, it becomes immune and switches to the immune state, after which the state of the immune node can no longer change;
meanwhile, when the system time slot exceeds the upper limit of the time window of node $v_i$ while the node is still in the susceptible phase, i.e., $d_{i,t} = 0$, the node no longer has the opportunity to accept new topic information, and node $v_i$ is removed from the network and transferred to the immune phase.
Step seven: when none of the remaining nodes in the network can still be influenced by neighbor nodes, or the current time slot exceeds the specified time range, the algorithm ends;
as the time slot $t$ increases, once all user nodes in the network are in the infected phase or the immune phase, the propagation process ends because no target remains to be influenced; otherwise the process returns to step four at time slot $t+1$. The number of nodes identifying with the information in each time slot is the predicted propagation scale of the topic information in the specific network.
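Steps four to seven can be tied together in a compact simulation loop. This is a sketch under stated assumptions rather than the method itself: the identity update adds the influence-weighted identities of active in-neighbors and clips the result at 1, time windows are omitted, and the function name and data layout are hypothetical.

```python
import random


def simulate(in_neighbors, seeds, b, max_slots, rng):
    """in_neighbors[v]: list of (u, eta_uv) pairs. Returns infected count per slot."""
    state = {v: "S" for v in in_neighbors}       # S = susceptible, I = infected, R = immune
    identity = {v: 0.0 for v in in_neighbors}
    for s in seeds:
        state[s], identity[s] = "I", 1.0
    history = []
    for _ in range(max_slots):
        active = {v for v, st in state.items() if st == "I"}
        newly_infected = []
        for v, st in state.items():
            if st != "S":
                continue
            # Step four (assumed rule): add eta-weighted identity of active neighbors.
            gain = sum(eta * identity[u] for u, eta in in_neighbors[v] if u in active)
            identity[v] = min(1.0, identity[v] + gain)
            # Step five: Bernoulli draw with the identity as probability.
            if rng.random() < identity[v]:
                newly_infected.append(v)
        # Step six: each previously infected node loses interest with probability b.
        for v in active:
            if rng.random() < b:
                state[v] = "R"
        for v in newly_infected:
            state[v] = "I"
        history.append(sum(st == "I" for st in state.values()))
        # Step seven: stop once no susceptible node remains.
        if all(st != "S" for st in state.values()):
            break
    return history


# Hypothetical chain a -> b -> c with full influence and no interest loss.
chain = {"a": [], "b": [("a", 1.0)], "c": [("b", 1.0)]}
history = simulate(chain, ["a"], b=0.0, max_slots=10, rng=random.Random(1))
```

The returned list gives the number of nodes in the infected phase after each slot, which is one way to read off the propagation scale per time slot.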
Performance analysis:
The performance of the proposed prediction model was evaluated experimentally on a real data set for the status-theory-based online social network information propagation method of this embodiment. The method was also compared with three existing classical propagation models to demonstrate the importance of status theory in information propagation. The empirical results show that the method accurately predicts the propagation scale of specific topic information in the network at different times while maximizing the propagation scale of the information.
(A) Experimental setup:
The experiment uses large-scale real information propagation data with fixed timestamps, collected from online social platforms such as microblogs. The data comprise 46026 user nodes participating in topics, 154979 propagation link relations, and 1084853 important feature attributes, including user attributes and text content. The data reflect the full life cycle of the propagation of a microblog hot topic; the initial time of information propagation is 2021-01-10, 28, and topic information collection ended on 2021-01-31. FIG. 5 shows how the experimental data vary in propagation time and propagation depth: the X axis represents the number of days the information has been propagating on the social platform, and the Y axis represents the depth to which the information has propagated through users. The propagation of the hot topic in the online social network is clearly highly time-sensitive, and this pattern is also representative of the general trend of many types of information propagation in the real world.
To demonstrate the accuracy of the algorithm's predictions, an IC (independent cascade) model and an SIR epidemic model were used as controls. The SIR model uses an infection probability for susceptible nodes of a = 0.3 and an interest-loss probability of b = 0.1 but does not include social status as an influencing factor; all user nodes in the IC model follow status theory, but each node has only one opportunity to propagate information to its neighbor nodes. To ensure a sound comparison, each of the three algorithms was run 500 times, the number of user nodes identifying with the topic information in each iteration was recorded, and the average was taken as the final result.
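Averaging a stochastic model over repeated runs can be sketched as follows. The helper name, the callback signature, and the toy model are assumptions for illustration and do not reproduce the experiment.

```python
import random
from statistics import mean


def average_over_runs(run_once, runs=500, seed=0):
    """Call run_once(rng) `runs` times and return the mean of the results."""
    rng = random.Random(seed)
    return mean(run_once(rng) for _ in range(runs))


def toy_model(rng):
    """Hypothetical model: 100 nodes, each infected independently with probability 0.3."""
    return sum(rng.random() < 0.3 for _ in range(100))


average_infected = average_over_runs(toy_model)
```

Using one seeded generator for all runs makes the averaged figure reproducible while still varying the individual runs.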
(B) Performance results:
(B1) Model accuracy: to verify the effectiveness of the status-theory-based information propagation method, the experiment compares it with the classical SIR epidemic model, the IC model, and the real propagation data. FIG. 6 shows the information propagation trends given by the four curves. The RD curve in FIG. 6 is the propagation curve of the full-life-cycle hot topic collected in real time, with data computed once every six hours. To verify the importance of status theory, the experiment assumes by default that participating users are not constrained by the size of the interest time window, so each node can be infected by other seed nodes at any time during the system's run. To keep the seed selection in the initial time slot consistent with the real data set, only the user ranked first in comprehensive influence is selected as the seed node responsible for creating and propagating the topic information. The results show that the IC model performs worst: each participant has one and only one chance to propagate information, and a user who propagates information to its neighbor nodes in time slot t is removed from the network in the next time slot. This is also why the IC trend curve does not reach all nodes even without the time window constraint.
The final prediction of the SIR model is close to the real data. However, information propagation in online social networks is highly time-sensitive, and a hot topic has sufficient influence for only a few days. Because the SIR model uses a relatively fixed infection probability, its rising curve is relatively flat and does not correctly reflect the real propagation trend during the critical period; the SIR model also clearly overestimates in the later stage and deviates substantially from the real situation. The STB model (Status Theory Based Model) proposed by the invention introduces status theory while preserving the freedom of node propagation. As a result, its predicted scale curve changes faster than that of the SIR model in the early stage, and the growth in the number of infected nodes slows appropriately in the later stage, so that the overall prediction curve closely resembles the curve of the real data. The comparison of the propagation results of the different models demonstrates the important role of status theory in the information propagation process and verifies the accuracy of the proposed method in predicting the breadth and depth of topic information propagation.
(B2) Information propagation time window:
The period during which a user in an online social network is interested in specific topic information is short and limited, and the size of the interest time window is an important factor in determining the range of information propagation. Taking the time at which each user joined the propagation in the real data set as the reference, the experiment sets different time window sizes in units of days and compares the method, the SIR model, and the IC model in terms of the number of iterations and the number of finally infected users. FIG. 7 and FIG. 8 show how the time window size relates to the number of propagation iterations and to the number of finally infected nodes for each propagation model. As the time window grows, the number of users that seed nodes can influence in the same time slot increases markedly, and so do the propagation depth and the range of influence of the information. Among the three models, the IC model limits the influence of seed nodes, so its final number of infections and number of iterations stop increasing once the time window size reaches a certain range. Compared with the SIR model, the proposed algorithm influences more nodes in fewer iterations and maintains prediction accuracy when the time window size is 1, which supports the effectiveness of the algorithm in predicting the propagation scale of specific topic information in the network at different times and in maximizing that scale.
Claims (7)
1. An online social network information propagation method based on status theory, characterized by comprising the following steps:
step one: crawl the historical information data of all users in the online social network, calculate from the historical information data each user's activity index in the topics in which the user participates, and comprehensively evaluate each user's influence in information propagation according to activity and influence;
the implementation steps of step one are as follows:
step S10: define the social network as a weighted directed graph $G(V, E)$, and represent the users' identity with the information by a matrix $R$ of size $1 \times N$;
$V$ is the set of nodes in the network, $E$ is the set of relationships among the nodes, and $N$ is the total number of nodes in the network;
in the identity matrix, $r_{i,t}$ denotes the identity of node $v_i$ with the information at time $t$;
step S11: calculate each user's activity index in the participated topics from the historical information data; $g$ denotes a specific topic category in the total set $C$ of user topic categories, and $m$ denotes the total number of topics in which users in the network participate, i.e., $C = \{g_1, g_2, \ldots, g_m\}$, $|C| = m$;
a one-dimensional intensity vector $S$ is created to describe the total intensity of all users under each topic category, and the length of $S$ equals the number of specific topic categories $g$ in $C$, i.e., $|S| = m$;
the users' topic activity distribution is represented by a two-dimensional matrix $A$, in which the abscissa $i$ corresponds to a specific user $v_i$ in the user node set $V$, the ordinate $j$ corresponds to a specific topic category $g_j$ in the set $C$, and any element $a_{ij}$ of $A$ represents the activity index of user $v_i$ on the specific topic $g_j$;
when each element $a_{ij}$ is calculated, all users participating in a topic share the activity equally, the share being determined by the total number of users participating under topic $g_m$;
the total activity value $I_a(v_i \mid C_i)$ of user $v_i$ is the sum of the user's activity over all participated topics, calculated as follows:
in the formula:
$\mu_m$ denotes the number of information records on the specific topic $g_m$;
step S12: comprehensively evaluate each user's influence in information propagation according to activity and influence, and calculate the static influence of each participating user in the topic propagation process using the PageRank algorithm, with the following calculation formula:
in the formula:
$PR_{i,\gamma}$ represents the PageRank value of user $v_i$ in the network after the $\gamma$-th iteration;
the two neighbor-set symbols respectively represent the out-degree neighbor node set of user node $v_j$ and the in-degree neighbor node set of user node $v_i$;
$w_{i,j}$ is the potential influence that user $v_i$ has on user $v_j$;
$n$ represents the total number of nodes in the network;
$d$ is the damping coefficient, which prevents node rank from growing without bound during the calculation, with $d \in [0,1]$;
combining the user's influence with the user's topic activity gives the following evaluation formula for the comprehensive influence of an online social network user:
$I(v_i \mid c_i) = F(v_i) = \beta I_a(v_i \mid c_i) + (1-\beta) I_p(v_i)$ (Formula IV)
in the formula:
$I_a(v_i \mid c_i)$ represents the activity of user $v_i$ over the set of topics $c_i$ in which the user participates;
$I_p(v_i)$ represents the influence of user $v_i$ obtained after the PageRank algorithm converges on the network;
$\beta$ is the activity correlation coefficient, which determines the relative contributions of activity and influence to the user's comprehensive influence;
$I(v_i \mid c_i)$ represents the user's final comprehensive influence;
step two: rank the users by comprehensive influence using a tree index structure with the maximum-spanning-tree property, and divide the users' social status levels by means of the tree's hierarchy;
step three: using a level-order traversal algorithm, traverse the tree index structure formed by the user nodes from top to bottom in order of influence, determine the initial seed nodes that propagate the topic information, and have the seed nodes propagate information to all ordinary nodes based on social status level;
step four: when an ordinary node is influenced by its neighbor nodes, it changes its identity in the current time slot according to the information identity update rule;
step five: judge whether each node in the network that has not accepted the information chooses to identify with the topic information in the current time slot;
step six: judge whether each node in the network that has accepted the information loses interest in it;
step seven: when none of the remaining nodes in the network can still be influenced by neighbor nodes, or the current time slot exceeds the specified time range, the algorithm ends.
2. The online social network information propagation method based on status theory as claimed in claim 1, wherein step two is implemented as follows:
step S20: using a tree index structure with the maximum-spanning-tree property, rank the users in the social network by comprehensive influence and build a tree index with a total depth of $K$ layers, where the layers of the tree index correspond to the $K$ levels of user social status; the set of levels is expressed as:
$U = \{u_1, u_2, \ldots, u_K\}$;
step S21: each directed edge $e_{ij} \in E$ in the online social network is assigned a weight $\eta_{ij}$, which represents the potential influence of node $v_i$ on node $v_j$ over the whole life cycle of information propagation; because nodes have different social status levels, the influence they exert on each other differs, i.e., $\eta_{ij} \neq \eta_{ji}$;
the potential influence of node $v_i$ on node $v_j$ is calculated using the following formula:
in the formula:
$u_i$ is the influence status level of node $v_i$;
$u_j$ is the influence status level of node $v_j$.
3. The online social network information propagation method based on status theory as claimed in claim 1, wherein step three is implemented as follows:
step S30: following the definition of the tree structure, a level-order traversal algorithm traverses the tree index structure from top to bottom to determine the initial $k$ seed nodes that propagate the topic information; the remaining $N-k$ nodes are unaffected ordinary nodes;
step S31: the seed nodes identify with the topic information content and propagate it to all ordinary nodes based on social status level; in the initial time slot, the identity of each seed node takes the maximum value 1, and the initial identity of each ordinary node takes the minimum value 0.
4. The online social network information propagation method based on status theory as claimed in claim 1, wherein step four is implemented as follows:
step S40: node states are divided into an active state and an inactive state; an active node actively participates in information forwarding, while an inactive node passively receives topic information; the state attribute of each node in the network is represented by a matrix $D$ of size $1 \times N$, in which every element $d_{i,t} \in \{0,1\}$ is a Boolean variable: $d_{i,t} = 1$ denotes that the node is active in time slot $t$, and $d_{i,t} = 0$ denotes that it is inactive in time slot $t$;
step S41: in time slot $t$, when node $v_i$ is influenced simultaneously by nodes of higher and lower status, the topic information identity of node $v_i$ is updated by the following formula:
in the formula:
$\eta_{ij}$ is the potential influence of node $v_i$ on node $v_j$ and satisfies $0 < \eta_{ij} < 1$;
$V_{seed}$ is the seed node set in time slot $t-1$;
the topic identity of node $v_i$ can change only when its neighbor node is in the active state;
step S42: the propagation of hot topic information is time-sensitive; for any node $v_i$ in the node set $V$, a topic propagation time window is defined for the node in the network;
where the lower bound of the window is the initial time slot of node $v_i$ in propagation period $t_i$, and the upper bound is the termination time slot of node $v_i$ in propagation period $t_i$;
only in time slots inside the time window can the node be influenced by neighbor nodes, or act as a seed node and influence other nodes;
when node $v_i$, as an ordinary node, is influenced by a seed node in a time slot outside the time window, the corresponding identity $r_{i,t}$ of node $v_i$ does not change;
5. The online social network information propagation method based on status theory as claimed in claim 1, wherein step five is implemented as follows:
step S50: judge whether each node in the network that has not accepted the information chooses to identify with the topic information in the current time slot; the nodes in the online social network are subdivided into three different phases: a susceptible phase, an infected phase, and an immune phase;
the infected phase corresponds to the active node state, while the susceptible phase and the immune phase correspond to the inactive node state;
for a node $v_i$ in the susceptible phase, when time slot $t$ lies within its topic propagation time window, node $v_i$ is influenced by neighboring seed nodes and its topic identity $r_{i,t}$ changes;
step S51: during topic propagation, a node $v_i$ in the susceptible phase changes its own state according to the outcome of a binomial distribution with the topic identity $r_{i,t}$ as the probability; two cases arise:
when the binomial outcome $d_{i,t} \sim B(1, r_{i,t})$ equals 0, the node does not identify with the topic information and remains in the susceptible phase in time slot $t+1$;
when the binomial outcome $d_{i,t} \sim B(1, r_{i,t})$ equals 1, the node identifies with the topic information, joins the seed node set in time slot $t+1$, and spontaneously propagates the topic information to other nodes.
6. The online social network information propagation method based on status theory as claimed in claim 1, wherein step six is implemented as follows:
any node $v_j$ in the seed node set loses interest in the topic information with probability $b$, that is, it becomes immune and switches to the immune state, after which the state of the immune node can no longer change;
7. The online social network information propagation method based on status theory as claimed in claim 1, wherein step seven is implemented as follows:
as the time slot $t$ increases, once all user nodes in the network are in the infected phase or the immune phase, the propagation process ends because no target remains to be influenced; otherwise the process returns to step four at time slot $t+1$; the number of nodes identifying with the information in each time slot is the predicted propagation scale of the topic information in the specific network.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110465358.1A CN113254719B (en) | 2021-04-28 | 2021-04-28 | Online social network information propagation method based on status theory |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113254719A CN113254719A (en) | 2021-08-13 |
CN113254719B true CN113254719B (en) | 2023-04-07 |
Family
ID=77221998
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110465358.1A Active CN113254719B (en) | 2021-04-28 | 2021-04-28 | Online social network information propagation method based on status theory |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113254719B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114465893B (en) * | 2022-02-28 | 2023-04-11 | 武汉大学 | Propagation network reconstruction method, device, equipment and storage medium |
CN116319379B (en) * | 2023-05-17 | 2023-08-01 | 云目未来科技(湖南)有限公司 | Network information guiding intervention method and system based on propagation chain |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8312056B1 (en) * | 2011-09-13 | 2012-11-13 | Xerox Corporation | Method and system for identifying a key influencer in social media utilizing topic modeling and social diffusion analysis |
CN104598605A (en) * | 2015-01-30 | 2015-05-06 | 福州大学 | Method for user influence evaluation in social network |
CN106127590A (en) * | 2016-06-21 | 2016-11-16 | 重庆邮电大学 | A kind of information Situation Awareness based on node power of influence and propagation management and control model |
CN110136015A (en) * | 2019-03-27 | 2019-08-16 | 西北大学 | A kind of information dissemination method that online social networks interior joint similitude is laid equal stress on cohesiveness |
CN111898041A (en) * | 2020-07-20 | 2020-11-06 | 电子科技大学 | Social network combined circle layer user comprehensive influence evaluation and counterfeiting discrimination method |
CN112380455A (en) * | 2020-11-06 | 2021-02-19 | 中国电子科技集团公司电子科学研究院 | Method for directionally and covertly acquiring data of international and foreign internet based on backtracking security controlled network access channel |
Non-Patent Citations (3)
Title |
---|
A Novel Social Recommendation Method Fusing User's Social Status and Homophily Based on Matrix Factorization Techniques; Rui Chen; IEEE; 20190113; pp. 18783-18798 * |
Signed-PageRank: An Efficient Influence Maximization Framework for Signed Social Networks; Xiaoyan Yin; IEEE; 20190816; pp. 2208-2222 * |
Research on Social Network Analysis and Information Propagation; Guo Chen; China Master's Theses Full-text Database; 20130315; pp. I139-225 * |
Also Published As
Publication number | Publication date |
---|---|
CN113254719A (en) | 2021-08-13 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||