CN112380456A

CN112380456A - Condensation entropy based dynamic influence maximization method

Info

Publication number: CN112380456A
Application number: CN202011338087.5A
Authority: CN
Inventors: 李卫民; 钟克欣; 王钊; 刘艳霞
Original assignee: University of Shanghai for Science and Technology
Current assignee: University of Shanghai for Science and Technology
Priority date: 2020-11-25
Filing date: 2020-11-25
Publication date: 2021-02-19

Abstract

The invention discloses a dynamic influence maximization method based on condensation entropy, comprising the following steps: 1) a CeCOPRA algorithm is proposed to discover overlapping communities in the social network; 2) potential nodes in the aggregation areas are selected to construct a candidate seed set; 3) a selectable dynamic influence propagation algorithm is proposed, in which the cohesion between adjacent nodes is obtained through several entropy calculations and used to determine whether a node can become a propagable precursor of another node, so that information continues to diffuse effectively; 4) finally, multiple experiments on multiple data sets verify whether the DEIM algorithm can influence the desired number of users in different scenarios. The method filters out edge nodes in the network and narrows the selection range of seed nodes, which greatly improves efficiency, while preserving the autonomy of individuals and yielding a more realistic information propagation process.

Description

Condensation entropy based dynamic influence maximization method

Technical Field

The invention relates to the technical field of social networks, in particular to a dynamic influence maximization method based on condensation entropy.

Background

With the development of network technology, social software such as Facebook, YouTube, and Twitter has become the mainstream form of online communication. The massive network data generated as a result makes research on influence maximization both more promising and more important. Influence maximization is the problem of selecting a group of seed nodes in a social network so that, under a specific diffusion model, their overall influence on the other nodes in the network is maximized. A widely used marketing strategy is to trigger a chain reaction through word of mouth so that more people purchase a product; how to obtain the best publicity effect at minimum cost, that is, how to select the initial user set, is the challenge posed by the influence maximization problem.

The goal of influence maximization is to determine K influential seed nodes, a task that the properties of complex networks make very difficult. A node's own attributes and its structural characteristics in the network play a key role in seed selection. These characteristics give rise to community structure, and the topological characteristics of communities can be used to characterize users and select seeds reasonably; in addition, particular users within a community can provide a good starting point for information diffusion. The accuracy of community division therefore directly affects the quality of the seed set. Existing work related to community division includes judging the division result according to the membership degree of a node to the communities of its neighbor nodes, dividing the community structure according to the attraction among self-organizing nodes, and determining communities through budget allocation and seed nodes through budget transfer. However, these algorithms do not quantify the social distance between users, which degrades the community division results, and they produce non-overlapping communities, which is clearly unrealistic. How to obtain an accurate community structure and integrate it into the influence maximization process is a direction worth studying.

The classical propagation models in the field of influence maximization, the independent cascade model and the linear threshold model, have given rise to many derived models. However, most algorithms have limitations: they do not consider the uncertainty of the diffusion process in real social networks, and they ignore the individual's right to choose with whom to share. In reality, a user subjectively selects the people with whom to share information; for example, a user may talk freely with some contacts but have only work-related communication with a colleague. A user's decision about whom to share resources with is the starting point of information dissemination. Viewed spatially, each user in the social network is the center of a set of paths radiating outward, and information flows to other users along these paths. Because the user autonomously selects the nodes through which information flows, the length and direction of the propagation paths starting from that user are uncertain. Modeling this dynamic behavior of propagation paths caused by individual autonomy is a challenge.

Disclosure of Invention

The invention aims to provide a dynamic influence maximization method based on condensation entropy that solves the above problems in the prior art: edge nodes in the network are filtered out, the selection range of seed nodes is narrowed, the autonomy of individuals is preserved, and the information propagation process is more realistic.

In order to achieve the purpose, the invention provides the following scheme:

The invention provides a dynamic influence maximization method based on condensation entropy, which comprises the following steps:

S1, constructing the CeCOPRA algorithm: using the local topological information of nodes, the degree of closeness between users is defined through the concept of condensation entropy, and overlapping communities are divided;

S2, in order to narrow the selection range of seed nodes, screening out a candidate seed set using the community structure, wherein the candidate seed set is the set of nodes that have the potential to become seeds, specifically:

selecting aggregation bridges in the overall network; and selecting an aggregation focus in each community;

S3, constructing a selectable dynamic influence propagation algorithm: a propagation control factor α is added to represent the lower limit of the propagation condition; the cohesion, which combines the self-information entropy and the condensation entropy, is used to judge whether a user can become a propagable precursor and influence others; when the cohesion reaches the threshold, the propagator has the opportunity to express its own view, and otherwise the influence diffusion ends;

S4, verifying through multiple experiments on multiple data sets whether the DEIM algorithm can influence the desired number of users in different scenarios.

Further, the condensation entropy in step S1 measures the similarity between two nodes with respect to their neighborhood information distributions: the node's own attribute is placed first, the closeness of the connecting edges between nodes in the local area serves as an auxiliary attribute, and the resulting neighborhood structure information is used to calculate the condensation entropy between nodes. The condensation entropy CE_ij of node i and node j is calculated as follows:

where r_ij is the sum of the relative entropies of the neighborhood information distributions of node i and node j, namely the degree of dispersion.

Further, the aggregation bridge in step S2 is defined as follows: each community is regarded as an aggregation area, the overlapping nodes lie in the aggregation intersection areas, and aggregation bridges arise in these areas. The aggregation bridge N_hinge is the set of representative users spanning multiple areas, defined as:

wherein

represents the node or set of nodes in community i that are simultaneously located in six or more communities. Such nodes are tightly connected with several aggregation areas, and this number of communities ensures that users in the aggregation bridge have enough opportunities to try to influence others, which guarantees a certain number of influence diffusion paths.

Further, the aggregation focus in step S2 is defined as follows: the non-overlapping nodes of each community form the concentrated aggregation area of that community, in which the node with the highest degree centrality has the closest relation with the other nodes in the area; this node is called the aggregation focus and is expressed as:

wherein

represents the node v that maximizes D(v).

Further, the self-information entropy in step S3 is defined as follows: the amount of information carried by a node is positively correlated with its amount of diffusion, and the formula is:

where M is the total number of edges in the entire network and D_u is the degree of node u. Information entropy quantifies information, and the self-information entropy measures the amount of information carried by a node through the ratio of the node's degree to the total number of edges.

Further, the propagable precursor in step S3 is defined as follows: in the network G(V, E), V is the node set and E is the edge set; for an edge (u, v) ∈ E, when the cohesion of node u and node v reaches the value of the propagation control factor α, node u has the ability to attempt to influence node v, that is, node u becomes a propagable precursor of node v and then attempts to influence node v.

The invention discloses the following technical effects:

1. The CeCOPRA algorithm is proposed: using the local topological information of nodes, the degree of closeness between users is defined through the concept of condensation entropy, and overlapping communities are divided. This eliminates, to a certain extent, the randomness caused by ignoring the relationships among users and by inappropriate threshold selection. Aggregation bridges and aggregation focuses are selected as potential seed nodes, which greatly improves efficiency.

2. A selectable dynamic influence propagation algorithm is proposed, which adds a propagation control factor α that represents the lower limit of the propagation condition and thus regulates the process. The cohesion, which combines the self-information entropy and the condensation entropy, is proposed to judge whether a user can become a propagable precursor and thereby influence others: when the cohesion reaches the threshold, the propagator has the opportunity to express its own view, and otherwise the influence diffusion ends. The resulting propagation paths are more realistic, and the condition improves efficiency by avoiding the time spent on unnecessary diffusion attempts.

3. Multiple tests on four data sets show that conditional propagation using the community structure significantly improves time efficiency with an acceptable loss of precision.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.

FIG. 1 shows a network structure, where (a) is an example network and (b) is the neighborhood structure of a node;

FIG. 2 is an example of selecting a candidate seed set based on community structure;

FIG. 3 is an example of a propagable precursor;

FIG. 4 is a graph of the influence spread of different algorithms on four data sets, where (a) is DBLP, (b) is Facebook, (c) is wiki-Vote, and (d) is CA-HepPh;

FIG. 5 is a graph of the run times of different algorithms on four datasets, where (a) is DBLP, (b) is Facebook, (c) is wiki-Vote, and (d) is CA-HepPh;

FIG. 6 is a graph of the influence spread for different propagation control factors on four data sets, where (a) is DBLP, (b) is Facebook, (c) is wiki-Vote, and (d) is CA-HepPh;

FIG. 7 shows the run times for different propagation control factors on four data sets, where (a) is DBLP, (b) is Facebook, (c) is wiki-Vote, and (d) is CA-HepPh.

Detailed Description

Reference will now be made in detail to various exemplary embodiments of the invention. The detailed description should not be construed as limiting the invention, but rather as a more detailed description of certain aspects, features, and embodiments of the invention.

It is to be understood that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. Further, for numerical ranges in this disclosure, it is understood that each intervening value, between the upper and lower limit of that range, is also specifically disclosed. Every smaller range between any stated value or intervening value in a stated range and any other stated or intervening value in a stated range is encompassed within the invention. The upper and lower limits of these smaller ranges may independently be included or excluded in the range.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. Although only preferred methods and materials are described herein, any methods and materials similar or equivalent to those described herein can be used in the practice or testing of the present invention. All documents mentioned in this specification are incorporated by reference herein for the purpose of disclosing and describing the methods and/or materials associated with the documents. In case of conflict with any incorporated document, the present specification will control.

It will be apparent to those skilled in the art that various modifications and variations can be made in the specific embodiments of the present disclosure without departing from the scope or spirit of the disclosure. Other embodiments will be apparent to those skilled in the art from consideration of the specification. The specification and examples are exemplary only.

As used herein, the terms "comprising," "including," "having," "containing," and the like are open-ended terms that mean including, but not limited to.

The "parts" in the present invention are all parts by mass unless otherwise specified.

Example 1

Influence maximization has a wide range of application scenarios, including viral marketing, recommendation systems, information diffusion, time detection, expert discovery, and link prediction. Given a social network graph G = (V, E), V is the set of nodes in the graph, representing individual users, and E is the set of edges in the graph, representing the relationships between users.

1.1.1 Calculation of the neighborhood condensation entropy

Users in a social network have their own characteristics, so users differ from one another and, correspondingly, resemble one another to varying degrees. The greater the similarity, the more closely users are likely to be connected, and these varying degrees of connection allow communities to emerge in the network. Relative entropy measures the difference between probability distributions, so it is well suited to measuring the difference between nodes and thereby obtaining their similarity, which is important for improving the accuracy of community division. The literature has proposed using relative entropy to calculate node similarity in networks, studying the structural similarity between nodes through their local topology, namely the degree distribution of a node and all of its neighbor nodes, but without distinguishing the importance of the node's own attributes from that of its neighbors' attributes. A node is characterized mainly by its own attributes, with the neighbor nodes playing an auxiliary role. In addition, the degree of a neighbor node cannot accurately represent the characteristics of the local structure, because some of the neighbor's edges have no relation to the node being characterized. The present application constructs a new node neighborhood structure and provides a method for calculating the similarity between nodes.

First, the specific composition of the node neighborhood structure is given. The complete neighborhood structure of node i consists of two parts: the set formed by node i and its neighbors, and the distribution of structural information within the neighborhood, that is, for each node in this set, the proportion of edges connecting it to the other nodes of the set. Fig. 1(b) shows a specific neighborhood structure example. The information distribution of the node neighborhood structure is as in formula (1):

where m = D(i) + 1, and l ∈ {1, …, b, …, m} is the index of each node in the neighborhood.

Since the relative entropy formula processes the elements of the two probability sets in one-to-one correspondence and accumulates the results, the order of the elements in the information distribution affects the accuracy of the difference measure; the elements in the neighborhood of the node are therefore sorted in descending order. The sorted neighborhood information distribution is as in formula (2):

where p(i,1) = p'(i,1) represents the node's own attribute, which is the main attribute in the similarity calculation; its position is unchanged and it remains first. In the example of Fig. 1(b), the degrees of the nodes within the local region are [2, 1, 4, 2, 1].
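The construction of the sorted neighborhood information distribution can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: the graph is given as an adjacency dictionary `adj`, the proportion for each node is taken to be its number of edges inside the neighborhood divided by the sum of those counts over the neighborhood, and the function name `neighborhood_distribution` and the toy graph (built so that its local degrees are [2, 1, 4, 2, 1]) are hypothetical.

```python
def neighborhood_distribution(adj, i):
    """Sorted neighborhood information distribution of node i (assumed form).

    adj maps each node to the list of its neighbors (undirected graph).
    """
    # The neighborhood is node i together with its direct neighbors.
    region = {i} | set(adj[i])
    # Local degree: edges from each node to other nodes inside the region only.
    local_deg = {v: len(set(adj[v]) & region) for v in region}
    total = sum(local_deg.values())
    # The node's own share stays first; the neighbors follow in descending order.
    own = local_deg[i] / total
    others = sorted((local_deg[v] / total for v in region if v != i), reverse=True)
    return [own] + others


# Hypothetical toy graph: node 3 is the center, nodes 1 and 4 share one extra edge.
adj = {1: [3, 4], 2: [3], 3: [1, 2, 4, 5], 4: [1, 3], 5: [3]}
dist = neighborhood_distribution(adj, 3)
print(dist)
```

Here node 3 is the node being characterized, so its own share is placed first and the shares of its four neighbors follow in descending order.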

In the present application, the similarity analysis of node topology is based on the relative entropy formula rather than on the previously used Euclidean distance formula. The reason is that, in a social network, the similarity between users is the similarity of the information they carry, and the input of the relative entropy formula is an information distribution, so it is better suited to similarity calculation on social network information. Quantifying node similarity can be regarded as calculating the difference between node topologies, that is, finding the difference between local structure information. The relative entropy formula is used to quantify the difference between each pair of nodes: if the difference between two nodes is small, their similarity is large, and vice versa. The specific concept of obtaining the degree of closeness between nodes using relative entropy is given next.

Definition 1 (condensation entropy): the condensation entropy measures the similarity between two nodes with respect to their neighborhood information distributions. The node's own attribute is placed first, the closeness of the connecting edges between nodes in the local area serves as an auxiliary attribute, and together they form the neighborhood structure information used to calculate the condensation entropy between nodes. The condensation entropy CE_ij of node i and node j is calculated as follows:

where r_ij is the sum of the relative entropies of the neighborhood information distributions of node i and node j, namely the degree of dispersion. Since relative entropy is an asymmetric measure, whereas the degree of similarity should be the same in both directions for each pair of nodes, the variable r_ij makes the difference between the two nodes a symmetric value. The larger the value, the larger the difference between the local structures of the two nodes. It is calculated as follows:

where D_kl is the relative entropy, expressed as follows:

where B = min(D(i) + 1, D(j) + 1) ensures that the two information distributions are on the same scale.
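Read together with the surrounding definitions, the degree of dispersion and the relative entropy can plausibly be written as below. This is a reconstruction under assumptions rather than the original formulas: it assumes the standard Kullback-Leibler form, assumes the sum runs over the first B elements of the sorted distributions, and uses P'(i) as a hypothetical name for the sorted distribution whose elements are p'(i,l).

```latex
r_{ij} = D_{kl}\bigl(P'(i)\,\|\,P'(j)\bigr) + D_{kl}\bigl(P'(j)\,\|\,P'(i)\bigr),
\qquad
D_{kl}\bigl(P'(i)\,\|\,P'(j)\bigr) = \sum_{l=1}^{B} p'(i,l)\,\log\frac{p'(i,l)}{p'(j,l)}
```

Under this form, r_ij equals the sum over l of (p'(i,l) - p'(j,l)) log(p'(i,l) / p'(j,l)), which is symmetric in i and j and never negative.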

The relation between users is measured using the condensation entropy: the similarity between nodes is converted into a difference calculation, and the neighborhood information distributions of the nodes are used as the input of the relative entropy to obtain the neighborhood information difference of the nodes, namely the degree of dispersion. If the degree of dispersion is small, the condensation entropy is large, and vice versa. When the neighborhood structure information of two nodes is the same, their degree of dispersion is 0 and their condensation entropy is 1; when the neighborhood structure information of two nodes differs greatly, their degree of dispersion is close to 1 and their condensation entropy is close to 0.
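The degree of dispersion described above can be illustrated with a short Python sketch. It assumes the standard relative entropy with natural logarithm, truncates both distributions to the shorter length B without renormalizing, and takes the distributions as plain lists of positive values; the function names are hypothetical. The mapping from the degree of dispersion to the condensation entropy is not part of this sketch.

```python
import math


def relative_entropy(p, q):
    """Standard relative entropy sum(p_l * log(p_l / q_l)) over aligned elements."""
    return sum(pl * math.log(pl / ql) for pl, ql in zip(p, q))


def dispersion(p_i, p_j):
    """Symmetric degree of dispersion between two sorted neighborhood distributions.

    Assumption: both lists are cut to B = min(len(p_i), len(p_j)) elements
    so that they are compared on the same scale.
    """
    b = min(len(p_i), len(p_j))
    p, q = p_i[:b], p_j[:b]
    # Summing both directions makes the asymmetric relative entropy symmetric.
    return relative_entropy(p, q) + relative_entropy(q, p)


a = [0.4, 0.2, 0.2, 0.1, 0.1]
b = [0.5, 0.5]
same = dispersion(a, a)   # identical neighborhood information
diff = dispersion(a, b)   # different neighborhood information
print(same, diff)
```

Identical distributions give a dispersion of zero, and swapping the two arguments does not change the result.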

1.1.2 Overlapping community discovery based on condensation entropy

In the existing label propagation approach, the community to which a node belongs is determined mainly by how many of its neighbor nodes fall into each community, which implicitly treats the closeness and the degree of influence between the node and all of its neighbors as the same. In reality, however, the amount of information sent and received differs between users, the probability of sharing information between closely related users is clearly higher than between ordinary users, and influenced users also rely more on users with similar preferences. In addition, because the numbers of neighbor nodes belonging to different communities may well be equal, the random strategy in that algorithm greatly reduces the accuracy of the result; it is therefore more reasonable to use the condensation entropy to distinguish the influence of different neighbors. The calculation of the condensation entropy considers both the attributes of the node itself and the surrounding factors closely related to it, and feeding the information distributions into the relative entropy formula reflects the information differences of these factors one by one and accumulates them, making the result more accurate. The CeCOPRA algorithm is therefore proposed.

The time complexity of the CeCOPRA algorithm depends on the number of nodes in the network and the number of neighbors of each node. The number of neighbors is given by the degree of the node, so the time complexity of the algorithm is O(ND), where N is the total number of nodes in the network and D is the maximum degree in the network.

1.2 Selection of candidate seed nodes

The candidate seed set is constructed to select, from the node set of the whole network, the individuals that have the potential to become seed nodes; unimportant nodes are removed to narrow the search range for seed nodes. The community structure helps to evaluate the importance of nodes: by jointly considering the position of a node within a community or between communities and the relationships among nodes, the influence of the node can be obtained. The present application selects potential nodes in the network based on the community structure, considering both the attributes and the positions of the nodes, to form the candidate node set.

Definition 2 (aggregation bridge): each community is regarded as an aggregation area, the overlapping nodes lie in the aggregation intersection areas, and aggregation bridges arise in these areas. The aggregation bridge N_hinge is the set of representative users spanning multiple areas, defined as:

wherein

represents the node or set of nodes in community i that are simultaneously located in six or more communities. These nodes are closely connected with several aggregation areas, and specifying the number of communities ensures that users in the aggregation bridge have enough opportunities to try to influence others, which guarantees a certain number of influence diffusion paths. To avoid having too few communities of too large a scale, each node is allowed to belong to at most six communities when communities are divided, in accordance with the small-world property; the nodes in the aggregation bridge therefore belong to at most six communities simultaneously.

Definition 3 (aggregation focus): the non-overlapping nodes of each community form the concentrated aggregation area of that community, in which the node with the highest degree centrality has the closest relation with the other nodes in the area; this node is called the aggregation focus and is expressed as:

wherein

represents the node v that maximizes D(v). Fig. 2 is an example of candidate seed selection. The social network in the figure has been divided into three communities, and nodes 8 and 16 both have degree 6 (because the example network is small, the criterion for the aggregation bridge is relaxed to the nodes with the greatest degree among the nodes spanning multiple communities), so they form the aggregation bridge {8, 16}. An aggregation focus is then selected for each community separately. For the third community, the degrees of all nodes other than the overlapping nodes are 1, which is uncommon in large networks; when this situation occurs, the node with the highest degree centrality among the overlapping nodes is selected as the aggregation focus. The aggregation focus set of the whole network is N_core = {3, 4, 12, 16}, and the final candidate seed set of the social network is {3, 4, 8, 12, 16}.

Based on the above concepts, the candidate seed set is generated by the Candidate Seeds Set Based on Two Key Regions (TKRCS) algorithm.

The time complexity of the algorithm is O(NC), where N is the total number of nodes in the network and C is the number of communities.
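The two selection rules above can be sketched in Python as follows. This is a hedged illustration of Definitions 2 and 3, not the TKRCS listing itself: the function name `candidate_seeds`, the parameter `min_memberships`, the tie-breaking rule (smallest node id) and the toy graph are assumptions made for the example, and the membership threshold is relaxed to 2 because the toy network is small.

```python
from collections import Counter


def candidate_seeds(adj, communities, min_memberships=6):
    """Return (aggregation bridge, aggregation focus set) for overlapping communities."""
    # Number of communities each node belongs to.
    membership = Counter(v for members in communities for v in members)
    degree = {v: len(nbrs) for v, nbrs in adj.items()}

    # Aggregation bridge: nodes that span enough communities.
    bridge = {v for v, count in membership.items() if count >= min_memberships}

    # Aggregation focus: highest-degree non-overlapping node of each community.
    focus = set()
    for members in communities:
        pool = [v for v in members if membership[v] == 1]
        # Fallback from the text: if the non-overlapping nodes all have degree 1
        # (or there are none), choose among the overlapping nodes instead.
        if not pool or max(degree[v] for v in pool) <= 1:
            pool = [v for v in members if membership[v] > 1]
        if pool:
            # sorted() makes ties deterministic: the smallest id wins (assumption).
            focus.add(max(sorted(pool), key=degree.get))
    return bridge, focus


# Hypothetical toy network: two communities that overlap in node 4.
adj = {1: [2, 3, 4], 2: [1, 3], 3: [1, 2], 4: [1, 5],
       5: [4, 6, 7], 6: [5, 7], 7: [5, 6]}
communities = [{1, 2, 3, 4}, {4, 5, 6, 7}]
bridge, focus = candidate_seeds(adj, communities, min_memberships=2)
candidates = bridge | focus
print(sorted(candidates))
```

The candidate seed set is the union of the aggregation bridge and the aggregation focus set, so later influence evaluation only has to consider these few nodes.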

1.3 Selectable dynamic influence propagation algorithm

Once the candidate seed set has been selected, the nodes with the potential to become seeds are known. On this basis, a selectable dynamic influence propagation algorithm (the ODP algorithm) is constructed, and the most influential node set is determined based on the greedy idea and the IC model. Given their autonomy, users are entitled to choose with whom to share; they tend to contact close friends and are less likely to choose users at a greater social distance. The algorithm therefore adds a propagation control factor α to the propagation process to represent the lower limit of the propagation condition: only when the condition is met does a node have the ability to spread influence and then try to activate a neighbor node. In the IC model, by contrast, an active user needs no precondition to try to influence others. Such an attempt is meaningless if the relationship between the users is distant and they do not trust each other.

Definition 4 (self-information entropy): the amount of information carried by a node is positively correlated with its amount of diffusion, and the formula is:

where M is the total number of edges in the entire network and D_u is the degree of node u. Information entropy quantifies information, and the self-information entropy measures the amount of information carried by a node through the ratio of the node's degree to the total number of edges.

Definition 5 (cohesion): for a node u ∈ V and its neighbor node w, the cohesion between the two is given by formula (9):

where H_u is the self-information entropy of the propagating node u and CE_uw is the condensation entropy of node u and node w. The larger the cohesion, the more closely the two are related.

Definition 6 (propagable precursor): in the network G(V, E), for an edge (u, v) ∈ E, when the cohesion of node u and node v reaches the value of the propagation control factor α, node u has the ability to attempt to influence node v, that is, node u becomes a propagable precursor of node v and then attempts to influence node v. An example is shown in Fig. 3, where node 12 has been successfully activated and five nodes may be influenced by it; the cohesion with each is calculated, assuming α = 0.1. CP_{12,10}, CP_{12,11}, and CP_{12,16} exceed α, so node 12 continues to attempt to activate these nodes with the inter-user influence probability. The cohesion of nodes 8 and 13 with node 12 does not reach α, so these two propagation paths terminate.

Unlike previous propagation algorithms, the threshold α is added here to indicate the lower bound for allowable propagation: when the cohesion between two users reaches the value α, one user can become a propagable precursor of the other, that is, it gains the ability to try to influence the other node, and the influence continues to spread. In the present application, the activation probability between users is the reciprocal of the degree of the node being activated.
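The gating role of α can be sketched in Python as follows. This is a minimal illustration under assumptions, not the ODP listing: the cohesion values are supplied as a precomputed dictionary `cohesion` keyed by (u, v) rather than computed from formula (9), a missing entry is treated as cohesion 0, the activation probability is taken as the reciprocal of the degree of the node being activated, and the function name and toy star graph are hypothetical.

```python
import random


def gated_cascade(adj, seeds, cohesion, alpha, rng):
    """One cascade in which u may try to activate v only if cohesion(u, v) >= alpha."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                if v in active:
                    continue
                # Gate: below alpha, u is not a propagable precursor of v,
                # so this propagation path ends without an activation attempt.
                if cohesion.get((u, v), 0.0) < alpha:
                    continue
                # Activation attempt with probability 1 / degree(v).
                if rng.random() < 1.0 / len(adj[v]):
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


# Hypothetical star graph: every leaf has degree 1, so an allowed attempt always succeeds.
adj = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
cohesion = {(0, 1): 0.3, (0, 2): 0.05, (0, 3): 0.2, (0, 4): 0.01}
active = gated_cascade(adj, [0], cohesion, alpha=0.1, rng=random.Random(0))
print(sorted(active))
```

In this toy run only the neighbors whose cohesion with node 0 reaches α = 0.1 are ever attempted, which is the behavior the definition of a propagable precursor describes.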

The time taken by the algorithm depends on the number of neighbors of the current node, that is, its degree. The time complexity of the algorithm is O(D²), where D is the maximum degree in the network.

1.4 Dynamic influence maximization algorithm based on condensation entropy

The DEIM algorithm is a community-structure-based influence maximization algorithm that incorporates the process by which users dynamically select the people with whom they share. First, to improve efficiency, the present application builds on the community structure: it proposes the condensation entropy to quantify the social distance between users, combines it with a label propagation algorithm to obtain an overlapping community discovery algorithm based on condensation entropy, and narrows the seed selection range using node position information. Then, to reflect the dynamic process in which users autonomously select whom to share with, the present application proposes the selectable dynamic influence propagation algorithm to evaluate the influence of nodes; it analyzes the difference in influence caused by different degrees of closeness between users and thereby determines the seed set. The method not only effectively reduces the time overhead but also reflects the autonomy and dynamics of user propagation.

Combining all stages, the total time complexity of the algorithm is O(ND + NC + C'D²), where N is the total number of nodes in the network, C is the number of communities, D is the maximum degree in the network, and C' is the number of nodes in the candidate seed set, which is much smaller than N.

2 Experiments

2.1 Experimental setup

Experiments were performed on four data sets of different sizes, and the following three questions were studied on each data set; the experimental results are shown in section 4.2. All evaluation experiments use the IC model as the diffusion model, with the influence probability of each edge set to the reciprocal of the degree of the edge's end node.

a) Seed influence diffusion range comparison with other algorithms

b) Time comparison for seed selection by different algorithms

c) Setting of the propagation control factor α
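The IC evaluation setup described above can be sketched as follows. This is a minimal illustration, assuming an undirected graph stored as an adjacency dictionary, so that the probability of activating v over an edge (u, v) is 1 / degree(v). The names `simulate_ic` and `estimate_spread` and the run count are hypothetical.

```python
import random


def simulate_ic(adj, seeds, rng):
    """Run one independent cascade and return the set of activated nodes.

    Each newly activated node u gets a single chance to activate each
    inactive neighbour v, with probability 1 / degree(v).
    """
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in adj[u]:
                if v not in active and rng.random() < 1.0 / len(adj[v]):
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return active


def estimate_spread(adj, seeds, runs=1000, seed=0):
    """Monte Carlo estimate of the expected number of activated nodes."""
    rng = random.Random(seed)
    total = sum(len(simulate_ic(adj, seeds, rng)) for _ in range(runs))
    return total / runs


# Path graph 0 - 1 - 2: the end nodes have degree 1, the middle node degree 2.
adj = {0: [1], 1: [0, 2], 2: [1]}
print(estimate_spread(adj, [1]))  # both neighbours have degree 1, so both always activate
print(estimate_spread(adj, [0]))  # node 1 is activated with probability 1/2
```

For a directed network such as wiki-Vote, the in-degree of v would take the place of `len(adj[v])`; that variant is not shown here.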

Data set:

1) The DBLP dataset is a comprehensive list of computer science research papers provided by the DBLP computer science bibliography. A co-authorship network is built from it, with 954 nodes and 3798 edges in total. Two authors are connected if they have published at least one paper together.

2) The Facebook dataset is derived from friend lists of the social network Facebook, with 4024 users in total and 87887 edges representing mutual friendships.

3) wiki-Vote: Wikipedia is a free encyclopedia written collaboratively by volunteers around the world. The nodes in the network represent Wikipedia users and the edges represent votes among users; the network has 7115 nodes and 103689 edges.

4) The CA-HepPh collaboration network is derived from the arXiv e-print archive and covers scientific collaborations between authors of papers submitted to the High Energy Physics - Phenomenology category.

The attributes of all data sets are shown in Table 1.

TABLE 1

The DEIM algorithm is compared with heuristic algorithms and a greedy algorithm to show that it combines the effectiveness of the greedy algorithm with the efficiency of the heuristic algorithms. The comparison algorithms are as follows:

1) Greedy: a classical seed selection strategy with a known approximation guarantee relative to the optimal solution, which can serve as one of the benchmarks for influence maximization algorithms. At each step the algorithm adds the node with the largest marginal gain to the seed set, using Monte Carlo simulation to estimate the influence of each node, so it achieves high accuracy;

2) Degree: a classical heuristic based on node centrality that selects the nodes with the largest degree in the network as seed nodes; degree is the most intuitive and simple index of node influence;

3) PageRank: another classical heuristic that ranks the importance of each node in the network, with the damping coefficient set to 0.85. Originally used by Google for web page ranking, it can also be used to find influential seed nodes in social networks;

4) IMM: an advanced sampling-based method that finds seed nodes using reverse reachable sets.
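The two heuristic baselines in the list above can be sketched in a few lines. This is an illustrative version only, assuming an undirected graph stored as an adjacency dictionary and a plain power iteration for PageRank with the damping coefficient 0.85 mentioned above. The function names and the iteration count are hypothetical, not taken from the experiments.

```python
def degree_seeds(adj, k):
    """Pick the k nodes with the largest degree (ties broken by node id)."""
    return sorted(adj, key=lambda node: (-len(adj[node]), node))[:k]


def pagerank(adj, damping=0.85, iterations=100):
    """PageRank by power iteration; adj maps node -> list of neighbours."""
    n = len(adj)
    rank = {node: 1.0 / n for node in adj}
    for _ in range(iterations):
        new_rank = {node: (1.0 - damping) / n for node in adj}
        for u, neighbours in adj.items():
            if neighbours:
                # Split the damped rank of u evenly among its neighbours.
                share = damping * rank[u] / len(neighbours)
                for v in neighbours:
                    new_rank[v] += share
            else:
                # Isolated node: spread its damped rank over all nodes.
                for v in adj:
                    new_rank[v] += damping * rank[u] / n
        rank = new_rank
    return rank


def pagerank_seeds(adj, k, damping=0.85):
    """Pick the k nodes with the largest PageRank score."""
    rank = pagerank(adj, damping)
    return sorted(rank, key=lambda node: (-rank[node], node))[:k]


# Star graph: node 0 is connected to four leaves.
adj = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}
print(degree_seeds(adj, 1), pagerank_seeds(adj, 1))
```

Both heuristics rank nodes by a single structural score and never simulate diffusion, which is why they run quickly but give no guarantee on the resulting spread.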

2.2 Seed influence diffusion range comparison with other algorithms

The DEIM algorithm was compared with the four comparison algorithms in terms of seed propagation effect on the four data sets, with α set to 0.001 for DEIM. The results are shown in Fig. 4: the algorithm performs well overall, and its influence spread is better than that of the other algorithms.

On the DBLP data set, as shown in Fig. 4(a), the influence obtained by each algorithm increases steadily as the number of seeds increases, and the DEIM algorithm stands out, staying above the other algorithms throughout. On the Facebook, wiki-Vote and CA-HepPh data sets, as shown in Figs. 4(b), (c) and (d), the DEIM algorithm finds a seed set with global reach even when the number of seeds is small, and its effect is consistently better than that of the other algorithms. This is because DEIM discards network edge nodes before determining the seeds and, by taking into account that users share information selectively, generates only high-probability propagation paths, so these paths perform significantly better than those of the comparison algorithms when the seed users disseminate information. The IMM algorithm behaves unstably, possibly because it selects nodes at random to generate the reverse reachable sets. The simple heuristics PageRank and Degree work well on small data sets, but as the data sets grow the scale-free property of the network becomes more pronounced, the seeds they select may cluster together, and their effect gradually declines. The DEIM algorithm performs stably, which indicates that it generalizes across networks of different types and sizes.

2.3 Comparison of time taken by different algorithms for seed selection

Fig. 5 shows the run times of the different algorithms when selecting different numbers of seeds on the four data sets, with α set to 0.001 for DEIM.

Fig. 5 shows that the DEIM algorithm has a significant efficiency advantage when the target number of seeds is small, with run times comparable to the heuristic algorithms. The reason is that the candidate seed set greatly narrows the seed selection range by removing network edge nodes with little influence. As the number of seeds increases, the run time increases but remains lower than that of the Greedy and IMM algorithms. In algorithms based on the greedy idea, influence propagation is simulated for every node in the candidate seed set to obtain its influence, and as the number of seeds grows the number of simulation runs grows too, so the algorithm becomes more and more time-consuming. In general, the time efficiency of the DEIM algorithm is clearly higher than that of the Greedy algorithm but offers no advantage over the two heuristic algorithms, because the Degree and PageRank algorithms consider only a single feature of the network, ignore the issues that arise in actual propagation, and cannot provide a seed set with a theoretical guarantee. However, as the size of the data set increases, the run time of these two algorithms also increases considerably, as shown in Figs. 5(b) and 5(c).
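Why the size of the candidate set drives the cost of greedy selection can be seen in a small sketch. It counts how many influence evaluations a greedy loop performs. The influence estimator is passed in as a function, and the deterministic `neighbourhood_cover` used here is only a stand-in for a Monte Carlo spread estimate. All names are hypothetical.

```python
def greedy_seeds(candidates, k, spread_fn):
    """Greedy seed selection restricted to a candidate set.

    spread_fn(seed_list) returns the estimated influence of a seed list.
    Returns the chosen seeds and the number of influence evaluations made.
    """
    chosen, evaluations = [], 0
    for _ in range(k):
        best, best_spread = None, float("-inf")
        for node in candidates:
            if node in chosen:
                continue
            spread = spread_fn(chosen + [node])
            evaluations += 1
            if spread > best_spread:
                best, best_spread = node, spread
        if best is None:  # fewer candidates than requested seeds
            break
        chosen.append(best)
    return chosen, evaluations


adj = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0, 4], 4: [3, 5], 5: [4]}


def neighbourhood_cover(seeds):
    """Stand-in estimator: seeds plus their direct neighbours."""
    covered = set(seeds)
    for s in seeds:
        covered.update(adj[s])
    return len(covered)


print(greedy_seeds(list(adj), 2, neighbourhood_cover))  # every node is a candidate
print(greedy_seeds([0, 4], 2, neighbourhood_cover))     # reduced candidate set
```

On this toy graph both calls return the same two seeds, but the full candidate list needs 11 evaluations and the reduced one only 3. With a Monte Carlo estimator each evaluation is itself many simulated cascades, so the saving is multiplied accordingly.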

2.4 Setting of the propagation control factor α

The propagation control factor α is the parameter that determines whether a user shares a message during the influence diffusion stage, and hence the length of the influence diffusion path. α restricts the region over which a node propagates its influence, which affects the spread of the final seed set and directly affects the run time of the algorithm. The values of α are chosen for each network according to the distribution of node cohesion in that network. Figs. 6 and 7 show the results for different values of α, compared in terms of effect and efficiency respectively.

In Fig. 6(a), α takes the values 0.01, 0.001, 0.0001 and 0.00001. The results are better for α = 0.01 and α = 0.001, where propagation between nodes requires a relatively high cohesion between users. The range of influence is relatively low for α = 0.0001 and α = 0.00001, where redundant activation attempts may be made. In Figs. 6(b), (c) and (d), α takes the values 0.001, 0.0001, 0.00001 and 0.000001. As can be seen from Figs. 6(b) and (c), the results on these data sets are again better when α is larger, similar to the case in (a). In Fig. 6(d), the effect is pronounced at the largest and smallest values of α and poor at the intermediate values 0.0001 and 0.00001. When α = 0.000001, the cohesion requirement is lowest, so influence attempts spread widely and there are many activation opportunities, but unnecessary attempts are likely and the run time increases.

The threshold α directly affects the diffusion path length of a node and therefore also produces marked differences in run time, as shown in Fig. 7. A larger α noticeably shortens the diffusion path, avoids repeated activation attempts along some paths, and greatly reduces the run time; the effect is more pronounced in dense networks.
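How α bounds the diffusion path can be illustrated with a small sketch. It keeps only the edges whose cohesion reaches α and then collects the nodes that influence attempts starting from one source could ever reach. The cohesion values below are invented for illustration, the function name is hypothetical, and the sketch models only which attempts are permitted, not whether they succeed.

```python
from collections import deque


def reachable_under_alpha(cohesion, source, alpha):
    """Nodes that influence attempts starting at source can reach.

    cohesion: dict mapping a directed pair (u, v) -> cohesion value
    An attempt u -> v is allowed only if cohesion[(u, v)] >= alpha,
    i.e. only if u counts as a propagable precursor of v.
    """
    successors = {}
    for (u, v), value in cohesion.items():
        if value >= alpha:
            successors.setdefault(u, []).append(v)

    # Breadth-first search over the permitted attempts.
    seen = {source}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in successors.get(u, []):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen


# Hypothetical cohesion values along a chain 0 -> 1 -> 2 -> 3 -> 4.
cohesion = {(0, 1): 0.02, (1, 2): 0.005, (2, 3): 0.0005, (3, 4): 0.00005}
for alpha in (0.01, 0.001, 0.0001, 0.00001):
    print(alpha, sorted(reachable_under_alpha(cohesion, 0, alpha)))
```

Each tenfold reduction of α admits one more hop in this chain, so the number of activation attempts, and with it the run time, grows as α shrinks.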

The embodiments described above merely illustrate preferred embodiments of the present invention and do not limit its scope. Various modifications and improvements made to the technical solutions of the present invention by those skilled in the art without departing from its spirit fall within the scope of protection defined by the claims.

Claims

1. A dynamic influence maximization method based on condensation entropy is characterized in that: the method comprises the following steps:

2. The dynamic influence maximization method based on condensation entropy as claimed in claim 1, wherein: the condensation entropy in step S1 measures the similarity between two nodes with respect to the distribution of their neighborhood information; the attributes of the node itself are given priority, the closeness of the edges between nodes in the local area is used as an auxiliary attribute, and the condensation entropy between nodes is calculated from the neighborhood structure information of the nodes; the condensation entropy CE_ij of nodes i and j is defined by the following formula:

3. The dynamic influence maximization method based on condensation entropy as claimed in claim 1, wherein: the aggregation bridge in step S2 is defined as follows: each community is regarded as an aggregation area, the location of the overlapping nodes is an aggregation intersection area, and aggregation bridges arise in this area; an aggregation bridge N_hinge is a set of representative users spanning multiple areas, defined as:

wherein

4. The dynamic influence maximization method based on condensation entropy as claimed in claim 1, wherein: the aggregation focus in step S2 is defined as follows: the non-overlapping nodes of each community form the concentrated aggregation area of that community, in which the node with the highest centrality has the closest relationship with the other nodes in the area; this node is called the aggregation focus and is expressed as:

where $\arg\max_{v} D(v)$ denotes the node v that maximizes D(v).

5. The dynamic influence maximization method based on condensation entropy as claimed in claim 1, wherein: the self-information entropy in step S3 is defined as follows: the amount of information carried by a node is positively correlated with its diffusion amount, and the formula is:

6. The dynamic influence maximization method based on condensation entropy as claimed in claim 1, wherein: the propagable precursor in step S3 is defined as follows: in the network G(V, E), V is the node set and E is the edge set; for an edge (u, v) ∈ E, when the cohesion between node u and node v reaches the value of the propagation control factor α, node u acquires the ability to attempt to influence node v, that is, node u becomes a propagable precursor of node v and then attempts to influence node v.