CN115878908B - Social network influence maximization method and system based on a graph attention mechanism - Google Patents
- Publication number
- CN115878908B (application CN202310025466.6A)
- Authority
- CN
- China
- Prior art keywords
- node
- graph
- nodes
- social network
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention relates to a social network influence maximization method and system based on a graph attention mechanism. The method comprises the following steps. S1: collecting social network data and constructing graph sequence data of the social network, wherein the graph sequence data includes graph adjacency matrix data and node representation feature data. S2: extracting features from the graph sequence data using a combination of a graph attention network and the Node2Vec algorithm. S3: heuristically selecting candidate seeds from the graph sequence data after feature extraction, and using a greedy algorithm to select, from the candidate seeds, the nodes with the largest propagation degree gain as final seed nodes, thereby forming the seed node set with the largest propagation degree gain. By using a graph attention network to learn the graph structure of the social network, the method and system effectively learn more complex graph topology, achieve influence maximization in the social network, and have good usability.
Description
Technical Field
The invention relates to the field of social network information propagation research, and in particular to a social network influence maximization method and system based on a graph attention mechanism.
Background
With the arrival of the 5G era and the continuing development of new media technologies, online social networks have become increasingly popular. Over the last few years they have come to play an important role as virtual communities: users are connected through everyday personal activities such as communication and content sharing, and through the word-of-mouth mechanism these networks have become the largest and most effective propagation platforms, allowing information to reach a large number of people in a short time. Because of its potential commercial value, influence maximization has been studied extensively in recent years as a key algorithmic problem in information propagation research. Its goal is to select k users in a social network as seed nodes and then spread information through them to influence other users, so that the number of influenced users is maximized during the propagation process. Influence maximization has many well-known applications, such as viral marketing, personalized recommendation, cascade detection and information monitoring.
Many approaches to the influence maximization problem in social networks currently exist. For example, Kempe et al. proved that influence maximization is NP-hard under the independent cascade model and the linear threshold model, and proposed a greedy algorithm that evaluates the influence of a seed node set to obtain an approximately optimal solution. However, this algorithm has two drawbacks: (1) it uses Monte Carlo simulation to estimate the influence gain of each node in the network, and a large number of simulations (typically 10000) is needed to ensure accuracy; (2) after each node is selected, the influence gain of every remaining node must be recalculated. The computational cost is therefore very high, and it is difficult to find the seed node set with the largest influence quickly and efficiently on a huge social network data set. In 2013, Cheng et al. proposed the static greedy algorithm. They showed that only a small number of Monte Carlo simulations are needed in each iteration to guarantee a certain approximation quality; the algorithm stores a set of random information propagation graphs generated by Monte Carlo simulation during the first iteration and uses this set to estimate the influence gain of nodes in subsequent iterations. Although this avoids a large amount of unnecessary computation in the greedy algorithm and greatly improves efficiency, selecting even a small number of seed nodes in a large-scale network still takes a long time.
Many heuristic algorithms have therefore been proposed, in which researchers use structural features of the network and properties of the information propagation model to find highly influential nodes. Wang et al. proposed the MIA algorithm based on the independent cascade model. It assumes that a node can only influence neighbors within its surrounding local tree structure and considers only the propagation path with the largest probability, which simplifies the calculation of influence propagation. However, it leads to substantial overlap in the influence coverage of the selected nodes, so the overall influence spread is smaller.
Although the above research addresses the influence maximization problem, effective and accurate solutions are still lacking. Conventional influence maximization methods can exploit only the shallow topology of a network, and they also suffer from overlapping influence coverage. Existing approaches to influence maximization in social networks therefore need to be improved.
Disclosure of Invention
To address the shortcomings of the prior art, the invention provides a social network influence maximization method and system based on a graph attention network. The multi-layer attention mechanism of the graph attention network is used to learn the graph structure of the social network, so that more complex graph topology is learned effectively, the most influential seed nodes are selected, influence maximization is achieved in the social network, and the method has good usability.
To achieve this purpose, the invention provides a social network influence maximization method based on a graph attention mechanism, which comprises the following steps:
S1: collecting social network data and constructing graph sequence data of the social network;
wherein the graph sequence data includes graph adjacency matrix data and node representation feature data;
S2: extracting features from the graph sequence data using a combination of a graph attention network and the Node2Vec algorithm;
S3: heuristically selecting candidate seeds from the graph sequence data after feature extraction, and using a greedy algorithm to select, from the candidate seeds, the nodes with the largest propagation degree gain as final seed nodes, thereby forming the seed node set with the largest propagation degree gain.
Preferably, the specific steps of step S2 include:
processing the node representation features $X_i$ with Node2Vec by sampling random walk sequences and applying negative sampling; then processing the resulting node representation features $X_i$ with the graph attention layers of the graph attention network to compute the node feature outputs, and concatenating the K output node feature vectors to obtain the final node feature vector.
Preferably, in step S2, the specific steps of processing the node representation features $X_i$ with Node2Vec by sampling random walk sequences and applying negative sampling further comprise:
processing the node representation features $X_i$ with Node2Vec by sampling random walk sequences and applying negative sampling, and using the skip-gram algorithm to maximize the probability that the central node co-occurs with the context nodes within a window of length w to its left and right, wherein the calculation formula is as follows:
[formula not preserved in the source text]
The final target loss function is minimized by taking the logarithm of the above formula, and is optimized to convergence with the stochastic gradient descent algorithm to obtain the node representation features, wherein the calculation formula of the target loss function is as follows:
[formula not preserved in the source text]
Preferably, in step S2, the specific steps of processing the resulting node representation features $X_i$ with the graph attention layers of the graph attention network include:
based on the graph attention network, taking $h = \{h_1, h_2, \dots, h_n\}$ as the input features of the nodes and calculating the attention coefficient between two nodes, wherein the calculation formula is as follows:
$$e_{ij} = a(W h_i, W h_j)$$
wherein $W$ represents a weight matrix that applies a linear transformation to the node features $h_i$, $a$ represents a shared attention mechanism, and $e_{ij}$ represents the importance of node $j$ to node $i$;
when the attention coefficient is normalized through the activation function, the calculation formula is as follows:
$$\alpha_{ij} = \frac{\exp\left(\mathrm{LeakyReLU}\left(a^{T}[W h_i \,\Vert\, W h_j]\right)\right)}{\sum_{k \in \mathcal{N}_i} \exp\left(\mathrm{LeakyReLU}\left(a^{T}[W h_i \,\Vert\, W h_k]\right)\right)}$$
where $a$ represents the weight of the single-layer neural network that calculates attention, $^{T}$ represents the matrix transpose operation, $\Vert$ represents the concatenation of the two matrices, and $\mathcal{N}_i$ denotes the set of neighbors of node $i$.
Preferably, in step S2, the specific steps of processing the resulting node representation features $X_i$ with the graph attention layers of the graph attention network, computing the node feature outputs, and concatenating the K output node feature vectors to obtain the final node feature vector further comprise:
based on the graph attention network, calculating the node feature outputs with the K attention layers of a multi-head attention mechanism and concatenating the K output node feature vectors to obtain the final node feature vector, wherein the calculation formula of the multi-head attention mechanism is as follows:
$$h_i' = \Big\Vert_{k=1}^{K} \sigma\Big(\sum_{j \in \mathcal{N}_i} \alpha_{ij}^{k} W^{k} h_j\Big)$$
wherein $\Vert$ represents the concatenation operation, $\alpha_{ij}^{k}$ represents the attention coefficient calculated by the $k$-th attention layer, and $W^{k}$ represents the $k$-th learned parameter matrix that linearly transforms the node features.
Preferably, in step S2, the specific steps of processing the resulting node representation features $X_i$ with the graph attention layers of the graph attention network, computing the node feature outputs, and concatenating the K output node feature vectors to obtain the final node feature vector further comprise:
the last graph attention layer is calculated as follows: the average of the K features is computed and a nonlinear transformation is applied through a nonlinear activation function $\sigma$, wherein the node calculation formula of the last layer is as follows:
$$h_i' = \sigma\Big(\frac{1}{K}\sum_{k=1}^{K}\sum_{j \in \mathcal{N}_i} \alpha_{ij}^{k} W^{k} h_j\Big)$$
Preferably, the specific steps of heuristically selecting candidate seeds from the graph sequence data after feature extraction in step S3 include:
taking the feature vectors of the np nodes that are first-order or second-order neighbors of a network user as the related vectors of each node; calculating the similarity between two nodes using the Euclidean norm of the vectors; selecting the r × np nodes with the greatest similarity as strongly related nodes, where r is the strongly related node coefficient and r ∈ (0, 1); obtaining the node frequency by counting the number of times each node appears in the strongly related node sets of other nodes; sorting the nodes by frequency; and selecting the c × k nodes with the highest frequency as candidate seed nodes, where c and k are the candidate seed node coefficient and the number of seed nodes, respectively;
the similarity is calculated by the Euclidean norm formula:
[formula not preserved in the source text]
Preferably, the specific steps of determining the final seed nodes from the candidate seeds with a greedy algorithm in step S3 include:
calculating the influence propagation degree of the candidate seed nodes, wherein the calculation formula is as follows:
[formula not preserved in the source text]
wherein the two quantities in the formula are the size of the set $S$ and the number of non-activated neighbor nodes of the set $S$;
sorting the nodes by influence propagation degree and selecting the node with the largest influence propagation degree as a seed node, thereby forming the seed node set with the largest propagation degree gain.
Preferably, the specific steps of step S1 include:
collecting social network data, wherein the social network is represented as:
$$G = (V, E)$$
wherein $V$ represents the node set, each node $v \in V$ represents a user in the social network, and $E$ represents the set of edges.
Preferably, the invention further provides a social network influence maximization system based on a graph attention mechanism, including:
a network data module, configured to collect social network data and construct graph sequence data of the social network, wherein the graph sequence data includes graph adjacency matrix data and node representation feature data;
a feature extraction module, configured to extract features from the graph sequence data using a combination of a graph attention network and the Node2Vec algorithm;
a candidate seed selection module, configured to heuristically select candidate seeds from the graph sequence data after feature extraction;
a seed node selection module, configured to use a greedy algorithm to select, from the candidate seeds, the nodes with the largest propagation degree gain as final seed nodes, thereby forming the seed node set with the largest propagation degree gain.
The beneficial effects of the invention are as follows: the social network influence maximization method and system based on the graph attention mechanism use the multi-layer attention mechanism of the graph attention network to learn the graph structure of the social network, so that more complex graph topology is learned effectively, the most influential seed nodes are selected, influence maximization is achieved in the social network, and good usability is obtained.
Drawings
The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular description of preferred embodiments, as illustrated in the accompanying drawings. Like reference numerals refer to like parts throughout the drawings. The drawings are not necessarily drawn to scale; the emphasis is on illustrating the principles of the invention.
FIG. 1 is a flowchart of the social network influence maximization method and system based on a graph attention mechanism according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an embodiment of the social network influence maximization method and system based on a graph attention mechanism according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of feature extraction based on the combination of a graph attention network and the Node2Vec algorithm according to an embodiment of the present invention.
Detailed Description
The technical solution of the present invention is described in further detail below with reference to the accompanying drawings and specific embodiments, so that those skilled in the art can better understand and implement the invention; the embodiments do not limit the invention.
In a first embodiment, referring to FIGS. 1-3, the present invention provides a social network influence maximization method based on a graph attention mechanism, which includes the following steps:
S1: collecting social network data and constructing graph sequence data of the social network;
wherein the graph sequence data includes graph adjacency matrix data and node representation feature data;
S2: inputting the graph structure data into an influence maximization model that combines a graph attention network with Node2Vec, and performing feature embedding learning to extract information relevant to the influence maximization problem;
S3: heuristically selecting candidate seeds from the graph sequence data after feature extraction, and using a greedy algorithm to select, from the candidate seeds, the nodes with the largest propagation degree gain as final seed nodes, thereby forming the seed node set with the largest propagation degree gain and avoiding influence overlap. In this way, an influence maximization model based on the graph attention mechanism is constructed, comprising a node feature extraction method that combines a graph attention network with Node2Vec, a heuristic candidate seed selection method, and a greedy seed selection method.
The beneficial effects of the invention are as follows: for complex graph data, the method makes effective use of the graph topology and mitigates the problem of suboptimal seed sets. In the node feature processing stage, Node2Vec learns the shallow graph structure and the graph attention mechanism processes the deeper graph topology. For seed selection, a heuristic algorithm selects candidate seed nodes and the greedy algorithm CELF selects the most influential seed nodes, which alleviates the influence overlap problem. Because the influence propagation function is monotone and submodular, a near-optimal solution to the influence maximization problem is guaranteed, and the method has good usability.
Referring to FIGS. 2-3, in a preferred embodiment, the specific steps of step S2 include:
to obtain the feature embedding of each node, the node representation features $X_i$ are processed with Node2Vec by sampling random walk sequences and applying negative sampling (the random walk sequences are generated according to the second-order random walk hyperparameters $p$ and $q$); the resulting node representation features $X_i$ are then processed by the graph attention layers of the graph attention network to compute the node feature outputs, and the K output node feature vectors are concatenated to obtain the final node feature vector. In the preferred embodiment, Node2Vec generates node features of dimension 512 with a sampling sequence length of 6 and 200 sampling sequences per node; in the graph attention network model, the learning rate is 0.0001, the output dimension of the first graph attention layer is 256, and the output dimension of the second layer is 16.
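The second-order random walk controlled by $p$ and $q$ can be illustrated with a minimal sketch. It follows the usual Node2Vec convention (weight 1/p for returning to the previous node, 1 for a common neighbor, 1/q for moving further away), which is assumed here rather than taken from the patent text. The function name `node2vec_walk`, the adjacency dictionary and the toy graph are hypothetical.

```python
import random

def node2vec_walk(neighbors, start, length, p=1.0, q=1.0, rng=None):
    """Sample one biased second-order random walk of `length` nodes."""
    rng = rng or random.Random(0)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(neighbors[cur])      # sorted for reproducibility
        if not nbrs:
            break                          # dead end: stop early
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))  # first step is unbiased
            continue
        prev = walk[-2]
        weights = []
        for nxt in nbrs:
            if nxt == prev:
                weights.append(1.0 / p)    # return to the previous node
            elif nxt in neighbors[prev]:
                weights.append(1.0)        # stay close to the previous node
            else:
                weights.append(1.0 / q)    # move further away
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk

# Toy undirected graph (hypothetical); walk length 6 as in the embodiment.
neighbors = {0: {1, 2}, 1: {0, 2, 3}, 2: {0, 1}, 3: {1}}
walk = node2vec_walk(neighbors, start=0, length=6, p=1.0, q=2.0)
```

In the embodiment's setting, such a walk would be sampled 200 times per node, and the resulting sequences would be fed to the skip-gram training step.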
Referring to FIGS. 2-3, in a preferred embodiment, the specific steps in step S2 of processing the node representation features $X_i$ with Node2Vec by sampling random walk sequences and applying negative sampling further comprise:
the Node2Vec processing of the node representation features $X_i$ by random walk sequence sampling and negative sampling takes into account the similarity of both low-order and high-order neighbors, so it can flexibly capture the homophily and the structural equivalence of nodes in the graph. The skip-gram algorithm is used to maximize the probability that the central node co-occurs with the context nodes within a window of length w to its left and right, wherein the calculation formula is as follows:
[formula not preserved in the source text]
the final target loss function is minimized by taking the logarithm of the above formula (the reconstruction loss of the graph structure is calculated and the graph is trained without supervision to obtain a low-dimensional node feature representation), and is optimized to convergence with the stochastic gradient descent algorithm to obtain the node representation features, wherein the calculation formula of the target loss function is as follows:
[formula not preserved in the source text]
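Since the patent's own formulas are not available in this text, the following is only a sketch of one common way to realize a skip-gram objective with negative sampling: for a (central node, context node) pair, the log-sigmoid of their embedding dot product is increased, while that of randomly drawn negative nodes is decreased. The function name `sgns_loss` and the example vectors are hypothetical, and the exact loss used in the patent may differ.

```python
import numpy as np

def sgns_loss(center, context, negatives):
    """Skip-gram negative-sampling loss for one (center, context) pair.

    center, context: embedding vectors of shape (d,)
    negatives: embeddings of sampled negative nodes, shape (m, d)
    """
    def log_sigmoid(x):
        # Numerically stable log(1 / (1 + exp(-x)))
        return -np.logaddexp(0.0, -x)

    positive_term = log_sigmoid(center @ context)
    negative_term = log_sigmoid(-(negatives @ center)).sum()
    return -(positive_term + negative_term)

center = np.array([1.0, 0.0])
negatives = np.array([[0.0, 1.0], [0.0, -1.0]])
loss_aligned = sgns_loss(center, np.array([1.0, 0.0]), negatives)
loss_opposed = sgns_loss(center, np.array([-1.0, 0.0]), negatives)
```

A context vector aligned with the central node gives a lower loss than an opposed one, which is the behavior stochastic gradient descent exploits when updating the embeddings.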
Referring to FIGS. 2-3, in a further preferred embodiment, the specific steps in step S2 of processing the resulting node representation features $X_i$ with the graph attention layers of the graph attention network include:
for graph-structured data, the attention mechanism lets a user assign different amounts of attention to its neighbor nodes, with larger attention coefficients assigned to neighbors that have similar interests or similar topological structure, so that nodes can learn more complex graph topology features. Therefore, based on the graph attention network, $h = \{h_1, h_2, \dots, h_n\}$ is taken as the input features of the nodes and the attention coefficient between two nodes is calculated, wherein the calculation formula is as follows:
$$e_{ij} = a(W h_i, W h_j)$$
wherein $W$ represents a weight matrix that applies a linear transformation to the node features $h_i$, $a$ represents a shared attention mechanism, and $e_{ij}$ represents the importance of node $j$ to node $i$. To make the differences among the attention coefficients of the neighbors of node $i$ more pronounced, the attention coefficients can be normalized using the softmax function:
$$\alpha_{ij} = \mathrm{softmax}_j(e_{ij}) = \frac{\exp(e_{ij})}{\sum_{k \in \mathcal{N}_i} \exp(e_{ik})}$$
in which a single-layer neural network is used to compute $a$ and LeakyReLU is then used as the nonlinear activation function, so that the normalized attention coefficient is calculated as follows:
$$\alpha_{ij} = \frac{\exp\left(\mathrm{LeakyReLU}\left(a^{T}[W h_i \,\Vert\, W h_j]\right)\right)}{\sum_{k \in \mathcal{N}_i} \exp\left(\mathrm{LeakyReLU}\left(a^{T}[W h_i \,\Vert\, W h_k]\right)\right)}$$
where $a$ represents the weight of the single-layer neural network that calculates attention, $^{T}$ represents the matrix transpose operation, $\Vert$ represents the concatenation of the two matrices, and $\mathcal{N}_i$ denotes the set of neighbors of node $i$.
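The attention computation for a single head can be sketched in NumPy as below. This is an illustrative sketch, not the patent's implementation. The function names, the LeakyReLU slope of 0.2 and the inclusion of a self-loop in each neighborhood are assumptions. The product $a^{T}[W h_i \Vert W h_j]$ is split into a source part and a target part, which is algebraically equivalent.

```python
import numpy as np

def leaky_relu(x, slope=0.2):
    return np.where(x > 0, x, slope * x)

def attention_coefficients(h, adj, W, a):
    """Return alpha with alpha[i, j] = normalized attention of node i on node j."""
    z = h @ W                                   # linear transformation, shape (n, F')
    f_out = z.shape[1]
    src = z @ a[:f_out]                         # contribution of W h_i
    dst = z @ a[f_out:]                         # contribution of W h_j
    e = leaky_relu(src[:, None] + dst[None, :])
    mask = (adj + np.eye(adj.shape[0])) > 0     # neighbors plus self-loop (assumption)
    e = np.where(mask, e, -np.inf)              # non-neighbors get zero weight
    e = e - e.max(axis=1, keepdims=True)        # stabilize the softmax
    w = np.exp(e)
    return w / w.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]])                  # path graph 0-1-2-3
h = rng.normal(size=(4, 3))                     # 4 nodes, 3 input features
W = rng.normal(size=(3, 2))                     # projects to 2 features
a = rng.normal(size=4)                          # attention vector of length 2 * F'
alpha = attention_coefficients(h, adj, W, a)
```

Each row of `alpha` sums to 1 over the node's neighborhood, and pairs of nodes that are not connected receive a coefficient of exactly zero.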
Referring to FIGS. 2-3, in a further preferred embodiment, the specific steps in step S2 of processing the resulting node representation features $X_i$ with the graph attention layers of the graph attention network, computing the node feature outputs, and concatenating the K output node feature vectors to obtain the final node feature vector further comprise:
to make the self-attention mechanism more stable, based on the graph attention network, the K attention layers of a multi-head attention mechanism each compute a different node feature output, and the K output node feature vectors are concatenated to form the final node feature vector, wherein the calculation formula of the multi-head attention mechanism is as follows:
$$h_i' = \Big\Vert_{k=1}^{K} \sigma\Big(\sum_{j \in \mathcal{N}_i} \alpha_{ij}^{k} W^{k} h_j\Big)$$
wherein $\Vert$ represents the concatenation operation, $\alpha_{ij}^{k}$ represents the attention coefficient calculated by the $k$-th attention layer, and $W^{k}$ represents the $k$-th learned parameter matrix that linearly transforms the node features.
Referring to FIGS. 2-3, in a preferred embodiment, the specific steps in step S2 of processing the resulting node representation features $X_i$ with the graph attention layers of the graph attention network, computing the node feature outputs, and concatenating the K output node feature vectors to obtain the final node feature vector further comprise:
the last graph attention layer is calculated as follows: in the last layer of the model, the K features are normally not concatenated; instead, the average of the K features is computed and a nonlinear transformation is applied through a nonlinear activation function $\sigma$, wherein the node update formula of the last layer is as follows:
$$h_i' = \sigma\Big(\frac{1}{K}\sum_{k=1}^{K}\sum_{j \in \mathcal{N}_i} \alpha_{ij}^{k} W^{k} h_j\Big)$$
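The difference between concatenating the K heads in a hidden layer and averaging them in the last layer can be sketched as follows. The attention matrices are taken as given inputs here. The function name `multi_head` and the choice of ELU as the activation $\sigma$ are assumptions for illustration only.

```python
import numpy as np

def elu(x):
    # ELU activation, used here as a stand-in for the unspecified sigma
    return np.where(x > 0, x, np.exp(np.minimum(x, 0.0)) - 1.0)

def multi_head(h, alphas, weights, concat=True):
    """Combine K attention heads.

    h: node features, shape (n, F)
    alphas: list of K attention matrices, each of shape (n, n)
    weights: list of K weight matrices, each of shape (F, F')
    """
    heads = [alpha @ (h @ W) for alpha, W in zip(alphas, weights)]
    if concat:
        # Hidden layer: activate each head, then concatenate -> (n, K * F')
        return np.concatenate([elu(x) for x in heads], axis=1)
    # Last layer: average the heads, then activate -> (n, F')
    return elu(np.mean(heads, axis=0))

h = np.ones((3, 2))
alphas = [np.full((3, 3), 1.0 / 3.0)] * 2          # two heads with uniform attention
weights = [np.ones((2, 4)), 2.0 * np.ones((2, 4))]
hidden = multi_head(h, alphas, weights, concat=True)
final = multi_head(h, alphas, weights, concat=False)
```

With K heads of width F', concatenation yields K × F' features per node, while averaging keeps the width at F', which is why it suits the output layer.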
The graph attention layers process the nodes and learn deeper graph structure. The method ignores node degree during network search, generates a fixed number of paths for each node, and learns complex graph topology, which improves the effectiveness of the algorithm.
Referring to FIGS. 2-3, in a further preferred embodiment, the specific steps of heuristically selecting candidate seeds from the graph sequence data after feature extraction in step S3 include:
the Euclidean norm of the vectors is used to measure the similarity between node pairs. If two nodes are highly similar, they are considered more likely to influence each other; otherwise, they are considered less likely to influence each other.
Propagation paths of information on a social network are generally short, and the propagation range is limited to the second-order neighbors of a node. The feature vectors of a user's first-order and second-order neighbors are therefore selected as the related vectors of each node in the network. The similarity between node pairs is calculated from the feature vectors, and the pairs are then sorted.
When selecting candidate seeds, the social network influence maximization method and system based on the graph attention mechanism use a heuristic method. After the node feature representations are obtained, the Euclidean distance between nodes is used to evaluate their similarity, and the strongly related node set of each node is computed. Finally, the number of times each node appears in the strongly related node sets of other nodes is counted, the nodes are sorted, and the nodes with high frequency are selected as the candidate seed node set.
Selecting seed nodes from the candidate seed node set improves the time efficiency of the whole algorithm. To avoid influence overlap between nodes, the optimized greedy algorithm CELF is used at this stage to select the final seed node set from the candidate seed nodes.
The feature vectors of the np nodes that are first-order or second-order neighbors of a network user are taken as the related vectors of each node, and the similarity between two nodes is calculated using the Euclidean norm of the vectors. The r × np nodes with the greatest similarity are selected as strongly related nodes, where r is the strongly related node coefficient, r ∈ (0, 1), and the number of nodes is rounded down. The node frequency is obtained by counting the number of times each node appears in the strongly related node sets of other nodes, the nodes are sorted by frequency, and the c × k nodes with the highest frequency are selected as candidate seed nodes, where c and k are the candidate seed node coefficient (generally set to 10) and the number of seed nodes (set to 50), respectively;
the similarity is calculated by the Euclidean norm formula:
[formula not preserved in the source text]
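The candidate selection heuristic described here can be sketched as follows. Because the similarity formula itself is not available in this text, the sketch assumes that a smaller Euclidean distance between feature vectors means a higher similarity. The function names, the tie-breaking by node id and the toy data are hypothetical.

```python
import math
from collections import Counter
import numpy as np

def two_hop_neighbors(neighbors, u):
    """First-order and second-order neighbors of u, excluding u itself."""
    hop1 = set(neighbors[u])
    hop2 = set().union(*(neighbors[v] for v in hop1))
    return (hop1 | hop2) - {u}

def candidate_seeds(neighbors, features, r, c, k):
    """Return the c * k nodes that appear most often in strongly related sets."""
    freq = Counter()
    for u in neighbors:
        related = sorted(two_hop_neighbors(neighbors, u))
        # Assumption: smaller Euclidean distance means higher similarity.
        related.sort(key=lambda v: np.linalg.norm(features[u] - features[v]))
        strong = related[: math.floor(r * len(related))]   # rounded down
        freq.update(strong)
    ranked = sorted(neighbors, key=lambda v: (-freq[v], v))
    return ranked[: c * k]

# Toy star graph with hypothetical 2-dimensional node features.
neighbors = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
features = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
candidates = candidate_seeds(neighbors, features, r=0.5, c=1, k=2)
```

With the embodiment's settings (c = 10, k = 50), the same procedure would keep 500 candidate nodes for the subsequent greedy stage.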
Referring to FIGS. 2-3, in a preferred embodiment, the specific steps of determining the final seed nodes from the candidate seeds with a greedy algorithm in step S3 include:
after the candidate nodes are selected, the final seed nodes are selected from the candidate node set. The selected candidate nodes may have overlapping influence:
if node u alone or node v alone can, in the ideal case, influence all surrounding neighbors, then once node u has been selected into the seed node set, adding node v to the seed node set does not increase its influence propagation degree. A heuristic formula is therefore used to measure the information propagation degree of a seed set S.
The influence propagation degree of the candidate seed nodes is calculated heuristically, wherein the calculation formula is as follows:
[formula not preserved in the source text]
wherein the two quantities in the formula are the size of the set $S$ and the number of non-activated neighbor nodes of the set $S$;
the nodes are sorted by influence propagation degree, and the node with the largest influence propagation degree is selected as a seed node, thereby forming the seed node set with the largest propagation degree gain.
Under the independent cascade propagation model, the influence propagation function is monotone and submodular. The marginal influence of nodes satisfies submodularity, and the CELF algorithm exploits this property for optimization when computing the seed node set. After the first node A is added to the seed set according to its marginal influence, the marginal influence of node B, which had the next largest marginal influence in the first round, is recalculated. If the new marginal influence of node B is greater than or equal to the previous-round marginal influence of node C, the node ranked immediately after B, then node B is taken directly as the new seed node without recalculating the marginal influence of the later nodes. Otherwise, the marginal influence of each node is recalculated one by one in order, and the node with the largest value is selected as the seed node and added to the seed set.
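The lazy evaluation idea behind CELF can be sketched with a priority queue: a node's stale marginal gain is an upper bound on its current gain (by submodularity), so only the top entry needs to be re-evaluated. This is a generic sketch, not the patent's implementation. The `coverage` function below is a hypothetical stand-in for the influence propagation degree, whose exact formula is not available in this text.

```python
import heapq

def celf(candidates, k, spread):
    """Lazy greedy selection of k seeds; `spread(S)` should be monotone submodular."""
    seeds, current = [], 0.0
    # Max-heap via negated gains: (-gain, node, round in which the gain was computed)
    heap = [(-spread([v]), v, 0) for v in candidates]
    heapq.heapify(heap)
    while heap and len(seeds) < k:
        neg_gain, v, rnd = heapq.heappop(heap)
        if rnd == len(seeds):
            # Gain is up to date for the current seed set: accept the node.
            seeds.append(v)
            current += -neg_gain
        else:
            # Stale gain: recompute against the current seed set and reinsert.
            gain = spread(seeds + [v]) - current
            heapq.heappush(heap, (-gain, v, len(seeds)))
    return seeds

# Toy graph and a simple coverage-style spread function (both hypothetical).
neighbors = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}, 4: {5}, 5: {4}}

def coverage(seed_list):
    covered = set(seed_list)
    for s in seed_list:
        covered |= neighbors[s]
    return len(covered)

seeds = celf(list(neighbors), k=2, spread=coverage)
```

In this example node 0 is chosen first. Nodes 1, 2 and 3 then add nothing new, so node 4 from the other component is selected second, which illustrates how overlap between seeds is avoided.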
Referring to fig. 2-3, in a preferred embodiment, the specific steps of step S1 include:
collecting social network data, wherein the social network is as follows:
$G = (V, E)$
where $V$ denotes the node set and each node in $V$ represents a user in the social network; $E$ denotes the edge set, and an edge represents the influence that may arise between nodes. The social network is modeled as graph sequence data: users propagate information through the social network and thereby influence other users, and the goal is to maximize the number of influenced users during information propagation.
In this embodiment, the collected network data, provided by the arXiv platform, record collaborations between authors in the high-energy physics theory field: if two authors have co-written at least one paper, an undirected edge is created between them. The collected data contain 31376 edges and 15229 nodes. Algorithm performance is measured by the influence spread under an independent cascade model with probability p = 0.1, estimated by repeating 10000 Monte Carlo simulations.
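The evaluation procedure above can be sketched in Python as follows. This is a minimal sketch that assumes the graph is stored as an adjacency dictionary and that every edge shares the same activation probability p; the function names `simulate_ic` and `estimate_spread` and the fixed random seed are illustrative choices, not part of the source.

```python
import random

def simulate_ic(adj, seeds, p=0.1, rng=random):
    """Run one independent cascade from `seeds`; return the number of activated nodes."""
    active = set(seeds)
    frontier = list(active)
    while frontier:
        newly_active = []
        for u in frontier:
            for v in adj.get(u, ()):
                # Each newly activated node gets one chance to activate each inactive neighbor.
                if v not in active and rng.random() < p:
                    active.add(v)
                    newly_active.append(v)
        frontier = newly_active
    return len(active)

def estimate_spread(adj, seeds, p=0.1, runs=10000, seed=0):
    """Average the cascade size over `runs` Monte Carlo simulations."""
    rng = random.Random(seed)
    return sum(simulate_ic(adj, seeds, p, rng) for _ in range(runs)) / runs

# Hypothetical undirected path graph 0-1-2-3, stored with both edge directions.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(estimate_spread(path, [0], p=0.1, runs=10000))
```

For an undirected collaboration graph, each edge is listed in both directions so that influence can travel either way.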
In a second embodiment, the present application further provides an academic citation network influence maximization method based on a graph attention mechanism.
Step S1: collecting academic citation network data and constructing graph sequence data of the academic citation network. In this embodiment, the data set consists of 2708 nodes and 5429 edges. Algorithm performance is measured by the influence spread under an independent cascade model with probability p = 0.1, estimated by repeating 10000 Monte Carlo simulations.
Step S2: inputting the graph sequence data of the academic citation network into the Node2Vec algorithm, which learns the shallow graph topology into the node representation. A multi-head attention mechanism is then used to learn the preliminarily processed graph sequence features: two graph attention network layers process the node representation and compute different node feature outputs, and the K resulting node feature vectors are concatenated as the final node feature vector. Finally, the reconstruction loss of the graph structure is calculated, and the graph is trained without supervision to obtain a low-dimensional node feature representation.
Step S3: inputting the graph sequence data with extracted features, selecting candidate seeds by a heuristic method, and re-selecting among the candidate seeds with an optimized greedy algorithm to determine the final seed nodes.
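The multi-head attention computation used in step S2 can be sketched in pure Python as follows. The formulas are not legible in this text, so the sketch assumes the standard graph attention network formulation: a shared weight matrix W, an attention vector a applied to the concatenated transformed features, LeakyReLU scoring, softmax normalization over each node's neighborhood (with a self-loop), concatenation of the K heads in hidden layers, and averaging in the last layer. The ELU activation and all function names are assumptions.

```python
import math

def elu(x):
    return x if x > 0 else math.exp(x) - 1.0

def gat_head(h, neighbors, W, a, slope=0.2):
    """One attention head. h: n x f features, W: f x d weights, a: vector of length 2d."""
    d = len(W[0])
    z = [[sum(x[k] * W[k][j] for k in range(len(W))) for j in range(d)] for x in h]
    out, attn = [], []
    for i, zi in enumerate(z):
        nbrs = sorted(set(neighbors[i]) | {i})          # neighborhood plus self-loop
        src = sum(a[k] * zi[k] for k in range(d))
        scores = []
        for j in nbrs:
            s = src + sum(a[d + k] * z[j][k] for k in range(d))
            scores.append(s if s > 0 else slope * s)    # LeakyReLU
        m = max(scores)
        w = [math.exp(s - m) for s in scores]           # numerically stable softmax
        total = sum(w)
        alpha = {j: wj / total for j, wj in zip(nbrs, w)}
        attn.append(alpha)
        out.append([sum(alpha[j] * z[j][k] for j in nbrs) for k in range(d)])
    return out, attn

def multi_head(h, neighbors, heads, concat=True):
    """heads: list of (W, a) pairs. Concatenate the K heads, or average them (last layer)."""
    outs = [gat_head(h, neighbors, W, a)[0] for W, a in heads]
    n, K = len(h), len(heads)
    if concat:
        return [[elu(x) for o in outs for x in o[i]] for i in range(n)]
    d = len(outs[0][0])
    return [[elu(sum(o[i][k] for o in outs) / K) for k in range(d)] for i in range(n)]

# Hypothetical 3-node graph with 2-dimensional input features and K = 2 heads.
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
neighbors = {0: [1], 1: [0, 2], 2: [1]}
heads = [([[1.0, 0.0], [0.0, 1.0]], [0.1, 0.2, 0.3, 0.4]),
         ([[0.5, 0.0], [0.0, 0.5]], [0.0, 0.0, 0.0, 0.0])]
hidden = multi_head(h, neighbors, heads, concat=True)    # each row has K * d entries
final = multi_head(h, neighbors, heads, concat=False)    # each row has d entries
```

A trained model would learn W and a by minimizing the graph reconstruction loss; here they are fixed by hand only to show the data flow.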
In a third embodiment, the present application further provides a Twitter network influence maximization method based on a graph attention mechanism.
Step S1: collecting data of a Twitter network and constructing graph sequence data of the Twitter network. In this embodiment, the data set consists of 3312 nodes and 4732 edges. Algorithm performance is measured by the influence spread under an independent cascade model with probability p = 0.1, estimated by repeating 10000 Monte Carlo simulations.
Step S2: inputting the graph sequence data of the Twitter network into the Node2Vec algorithm, which learns the shallow graph topology into the node representation. A multi-head attention mechanism is then used to learn the preliminarily processed graph sequence features: two graph attention network layers process the node representation and compute different node feature outputs, and the K resulting node feature vectors are concatenated as the final node feature vector. Finally, the reconstruction loss of the graph structure is calculated, and the graph is trained without supervision to obtain a low-dimensional node feature representation.
Step S3: inputting the graph sequence data with extracted features, selecting candidate seeds by a heuristic method, and re-selecting among the candidate seeds with an optimized greedy algorithm to determine the final seed nodes.
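The heuristic candidate-seed selection in step S3 can be sketched in Python as follows, following the wording of the claims: each node's first-order and second-order neighbors form its related nodes, the r·np most similar ones are its strongly related nodes, and the c·k nodes that appear most often in other nodes' strongly related sets become candidates. Several details are assumptions made here because the formulas are not legible in this text: "most similar" is read as smallest Euclidean distance, np is taken as the size of the two-hop neighborhood, products are rounded to the nearest integer, and ties are broken by node id.

```python
import math
from collections import Counter

def two_hop(adj, u):
    """First-order and second-order neighbors of u, excluding u itself."""
    first = set(adj.get(u, ()))
    second = {w for v in first for w in adj.get(v, ())}
    return (first | second) - {u}

def candidate_seeds(adj, feats, r, c, k):
    """Return about c*k candidate seeds ranked by how often they are strongly related."""
    freq = Counter()
    for u in adj:
        related = sorted(two_hop(adj, u))
        if not related:
            continue
        m = max(1, round(r * len(related)))
        # Smaller Euclidean distance between embeddings is treated as higher similarity.
        related.sort(key=lambda v: math.dist(feats[u], feats[v]))
        freq.update(related[:m])
    ranked = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [v for v, _ in ranked[:max(1, round(c * k))]]

# Hypothetical 5-node graph with 1-dimensional node embeddings.
adj = {0: [1, 2, 3], 1: [0, 4], 2: [0], 3: [0], 4: [1]}
feats = {0: [0.0], 1: [0.1], 2: [1.0], 3: [2.0], 4: [0.3]}
print(candidate_seeds(adj, feats, r=0.5, c=1.0, k=2))
```

The resulting candidate list is what the greedy stage then re-ranks by marginal influence spread.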
Referring to fig. 1-3, in a preferred embodiment, the present invention further provides a social network influence maximization system based on a graph attention mechanism, including:
a network data module, used for collecting social network data and constructing graph sequence data of the social network, wherein the graph sequence data comprise graph adjacency matrix data and node representation feature data;
a feature extraction module, used for extracting features of the graph sequence data based on a combination of a graph attention network and the Node2Vec algorithm;
a candidate seed selection module, used for heuristically selecting candidate seeds from the graph sequence data after feature extraction;
a seed node selection module, used for selecting, with a greedy algorithm, the node with the largest spread gain from the candidate seeds as a final seed node, thereby forming the seed node set with the largest spread gain.
The social network influence maximization system based on a graph attention mechanism provided by this embodiment corresponds to the social network influence maximization method based on a graph attention mechanism provided by the embodiment above, and the descriptions of the two apply to each other.
The beneficial effects of the invention are as follows: the invention provides a social network influence maximization method and system based on a graph attention mechanism, which use the multi-layer attention mechanism of a graph attention network to learn the graph structure of the social network, effectively learn more complex graph topologies, and select the most influential seed nodes, thereby achieving influence maximization in the social network with better usability.
The foregoing describes only the preferred embodiments of the present invention and is not intended to limit the scope of the invention; any equivalent structure or equivalent process based on the disclosure herein, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of the invention.
Claims (6)
1. A social network influence maximization method based on a graph attention mechanism, characterized by comprising the following steps:
S1: collecting, as social network data, network data of collaborations between authors in the high-energy physics theory field, of academic citations, or of a Twitter network, and constructing graph sequence data of the social network;
wherein the graph sequence data comprise graph adjacency matrix data and node representation feature data;
S2: using Node2Vec, generating random walk sequences for the node representation features $X_i$ through sampling and negative sampling operations, and using the skip-gram algorithm to maximize the probability of co-occurrence of a central node and the context nodes within a window of length w to its left and right; the calculation formula is as follows:
taking the logarithm of the formula to obtain the final target loss function to be minimized, and optimizing the target loss function to convergence with a stochastic gradient descent algorithm to obtain the node representation features;
the calculation formula of the target loss function is as follows:
based on a graph attention network, calculating node feature outputs using K attention layers of a multi-head attention mechanism, and concatenating the K output node feature vectors to obtain the final node feature vector;
S3: taking the feature vectors of the np nodes among the first-order and second-order neighbors of a network user as the related vectors of each node; calculating the similarity between two nodes using the Euclidean norm of the vectors; selecting the r·np nodes with the greatest similarity as strongly related nodes, wherein r is the strongly related node coefficient and r ∈ (0, 1); obtaining the node frequency by counting the number of times each node appears in the strongly related node sets of other nodes; sorting by the obtained node frequency; and selecting the c·k nodes with the most occurrences as candidate seed nodes, wherein c and k are the candidate seed node coefficient and the number of seed nodes, respectively;
the similarity is calculated by the Euclidean norm formula:
calculating the influence spread of the candidate seed nodes, ranking the influence spread of each node, and selecting the node with the largest influence spread as a seed node, thereby forming the seed node set with the largest spread gain;
the calculation formula of the influence spread of the candidate seed nodes is as follows:
2. The method for maximizing the influence of a social network as recited in claim 1, wherein the specific steps by which the graph attention network layer of the graph attention network in said step S2 processes the processed node representation features $X_i$ include:
based on the graph attention network, taking $h = \{h_1, h_2, \ldots, h_n\}$ as the input features of the nodes and calculating the attention coefficient between two nodes, wherein the calculation formula is as follows:
$e_{ij} = a(W h_i, W h_j)$
where $W$ denotes the weight matrix that applies a linear transformation to the node features $h$, $a$ denotes the shared attention mechanism, and $e_{ij}$ denotes the importance of node $j$ to node $i$;
the attention coefficient is normalized through the activation function, and the calculation formula is as follows:
3. The method for maximizing the influence of a social network as recited in claim 1, wherein the step in S2 of calculating node feature outputs based on the graph attention network using K attention layers of the multi-head attention mechanism and concatenating the K output node feature vectors to obtain the final node feature vector further comprises:
the calculation formula of the multi-head attention mechanism is as follows:
4. The method for maximizing the influence of a social network as recited in claim 3, wherein the step in S2 of calculating node feature outputs based on the graph attention network using K attention layers of the multi-head attention mechanism and concatenating the K output node feature vectors to obtain the final node feature vector further comprises:
the last graph attention network layer is calculated as follows: the average of the K features is computed and passed through a nonlinear activation function, and the node calculation formula of the last layer is as follows:
5. The method for maximizing the influence of a social network as recited in claim 1, wherein the specific steps of step S1 include:
collecting social network data, wherein the social network is as follows:
6. A social network influence maximization system based on a graph attention mechanism, comprising:
a network data module, used for collecting, as social network data, network data of collaborations between authors in the high-energy physics theory field, of academic citations, or of a Twitter network, and constructing graph sequence data of the social network, wherein the graph sequence data comprise graph adjacency matrix data and node representation feature data;
a feature extraction module, used for: using Node2Vec, generating random walk sequences for the node representation features $X_i$ through sampling and negative sampling operations, and using the skip-gram algorithm to maximize the probability of co-occurrence of a central node and the context nodes within a window of length w to its left and right;
the calculation formula is as follows:
taking the logarithm of the formula to obtain the final target loss function to be minimized, and optimizing the target loss function to convergence with a stochastic gradient descent algorithm to obtain the node representation features;
the calculation formula of the target loss function is as follows:
based on a graph attention network, calculating node feature outputs using K attention layers of a multi-head attention mechanism, and concatenating the K output node feature vectors to obtain the final node feature vector;
a candidate seed selection module, used for: taking the feature vectors of the np nodes among the first-order and second-order neighbors of a network user as the related vectors of each node; calculating the similarity between two nodes using the Euclidean norm of the vectors; selecting the r·np nodes with the greatest similarity as strongly related nodes, wherein r is the strongly related node coefficient and r ∈ (0, 1); obtaining the node frequency by counting the number of times each node appears in the strongly related node sets of other nodes; sorting by the obtained node frequency; and selecting the c·k nodes with the most occurrences as candidate seed nodes, wherein c and k are the candidate seed node coefficient and the number of seed nodes, respectively;
the similarity is calculated by the Euclidean norm formula:
a seed node selection module, used for: calculating the influence spread of the candidate seed nodes, ranking the influence spread of each node, and selecting the node with the largest influence spread as a seed node, thereby forming the seed node set with the largest spread gain;
the calculation formula of the influence spread of the candidate seed nodes is as follows:
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310025466.6A CN115878908B (en) | 2023-01-09 | 2023-01-09 | Social network influence maximization method and system based on a graph attention mechanism |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115878908A | 2023-03-31 |
CN115878908B | 2023-06-02 |
Family
ID=85758315
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310025466.6A (Active, CN115878908B) | 2023-01-09 | 2023-01-09 | Social network influence maximization method and system based on a graph attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115878908B (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||