CN115878908B - Social network influence maximization method and system based on a graph attention mechanism - Google Patents

Social network influence maximization method and system based on a graph attention mechanism

Info

Publication number
CN115878908B
CN115878908B (application CN202310025466.6A)
Authority
CN
China
Prior art keywords
node
graph
nodes
social network
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310025466.6A
Other languages
Chinese (zh)
Other versions
CN115878908A (en)
Inventor
Li Yuanxin
Wang Zhenyu
Han Liu
Li Ping
Liang Chaokai
Zhong Weijie
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Post Consumer Finance Co ltd
South China University of Technology SCUT
Original Assignee
China Post Consumer Finance Co ltd
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Post Consumer Finance Co., Ltd. and South China University of Technology (SCUT)
Priority to CN202310025466.6A priority Critical patent/CN115878908B/en
Publication of CN115878908A publication Critical patent/CN115878908A/en
Application granted granted Critical
Publication of CN115878908B publication Critical patent/CN115878908B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Images

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention relates to a social network influence maximization method and system based on a graph attention mechanism, the method comprising the following steps: S1: collecting social network data and constructing graph sequence data of the social network, where the graph sequence data comprises graph adjacency matrix data and node representation feature data; S2: extracting features from the graph sequence data based on an algorithm combining a graph attention network and Node2Vec; S3: heuristically selecting candidate seeds from the feature-extracted graph sequence data, and using a greedy algorithm to select the nodes with the greatest propagation gain from the candidate seeds as the final seed nodes, thereby forming the seed node set with the greatest propagation gain. The method and system adopt the graph attention network to learn the graph structure of the social network, effectively learn more complex graph topologies, realize influence maximization in the social network, and have good usability.

Description

Social network influence maximization method and system based on a graph attention mechanism
Technical Field
The invention relates to the field of social network information propagation research, in particular to a social network influence maximization method and system based on a graph attention mechanism.
Background
With the advent of the 5G era and the continuous development of new media technologies, online social networks have become increasingly popular. In recent years, online social networks have played an important role as virtual communities: users are connected through everyday personal activities such as communication and content sharing, and through a word-of-mouth mechanism these networks have become the most effective and far-reaching propagation platforms, allowing information to affect a large number of people in a short time. Owing to its potential commercial value, influence maximization has been studied extensively in recent years as a key algorithmic problem in information dissemination research. Its goal is to select k users in a social network as seed nodes and then spread information through these users so that, during propagation, the number of influenced users is maximized. Influence maximization has many well-known applications such as viral marketing, personalized recommendation, cascade detection and information monitoring.
Currently, there are a number of approaches to the influence maximization problem in social networks. For example, Kempe et al. proved that influence maximization is NP-hard under the independent cascade model and the linear threshold model, and proposed a greedy algorithm that computes the influence of a seed node set to obtain an approximately optimal solution. However, this algorithm has drawbacks: (1) it uses Monte Carlo simulation to estimate the influence gain of each node in the network, and the number of Monte Carlo simulations must be high (generally set to 10000) to guarantee accuracy; (2) after a node is selected, the influence gain still has to be recomputed for every remaining node, so the amount of computation is very large; in huge social network data sets it is therefore difficult to find the most influential seed node set quickly and efficiently. In 2013, Cheng et al. proposed a static greedy algorithm and showed that only a small number of Monte Carlo simulations are needed in each iteration to guarantee a certain approximation; the algorithm stores a set of information propagation graphs generated by Monte Carlo simulation during the first iteration and uses this set of random graphs to estimate node influence gains in subsequent iterations. Although this avoids a large amount of unnecessary computation in the greedy algorithm and greatly improves efficiency, a long time is still required to select even a small number of seed nodes in a large-scale network.
Consequently, many heuristic algorithms have been proposed, in which researchers use structural features of the network and characteristics of the information propagation model to find highly influential nodes. Wang et al. proposed the MIA algorithm based on the independent cascade model; it assumes that a node can only influence neighbours within a surrounding local tree structure and considers only the propagation path with the largest probability, thereby simplifying the calculation of node influence propagation. However, this causes the influence ranges of the selected nodes to overlap heavily, so the overall influence spread is smaller.
Although the above research addresses the influence maximization problem, effective and accurate solutions are still lacking. Existing influence maximization methods can only exploit the shallow topology of the network and also suffer from overlapping influence coverage. Accordingly, improvements are needed to solve the influence maximization problem in existing social networks.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a social network influence maximization method and system based on a graph attention network, which adopt the multi-layer attention mechanism of the graph attention network to learn the graph structure of the social network, effectively learn more complex graph topologies, select the most influential seed nodes, realize influence maximization in the social network, and have good usability.
In order to achieve the purpose of the invention, the invention provides a social network influence maximization method based on a graph attention mechanism, which comprises the following steps:
s1: collecting social network data and constructing graph sequence data of a social network;
wherein the graph sequence data includes: graph adjacency matrix data and node representation feature data;
s2: extracting features of the graph sequence data based on a graph attention network and a Node2Vec combination algorithm;
s3: and heuristically selecting candidate seeds from the graph sequence data after feature extraction, and selecting a node with the maximum propagation degree gain from the candidate seeds by adopting a greedy algorithm as a final seed node, so as to form a seed node set with the maximum propagation degree gain.
Preferably, the specific steps of the step S2 include:
processing the node representation features X_i with the random-walk sequence sampling and negative-sampling operations of Node2Vec, then processing the processed node representation features X_i with the graph attention network layers of the graph attention network to calculate the node feature outputs, and concatenating the K output node feature vectors to obtain the final node feature vector.
Preferably, in step S2, the step of processing the node representation features X_i with the random-walk sequence sampling and negative-sampling operations of Node2Vec further comprises:
processing the node representation features X_i with Node2Vec random-walk sampling and negative sampling, and using the skip-gram algorithm to maximize the probability that the centre node v_i co-occurs with the context nodes within the left and right windows of length w, with the calculation formula:
\max_{\Phi} \Pr\big(\{v_{i-w}, \ldots, v_{i+w}\} \setminus v_i \mid \Phi(v_i)\big)
where \Phi(v_i) is the latent node representation feature of node v_i;
minimizing the final target loss function by taking the logarithm of the formula, and optimizing the target loss function to convergence with stochastic gradient descent to obtain the node representation features, where the calculation formula of the target loss function is:
J(\Phi) = -\log \Pr\big(\{v_{i-w}, \ldots, v_{i+w}\} \setminus v_i \mid \Phi(v_i)\big)
preferably, the graph attention network layer based on the graph attention network in the step S2 represents the feature X to the processed node i The specific steps of the treatment include:
based on the graph attention network, h= { H 1 ,h 2 ……h n As input features of the nodes, and calculates the attention coefficient between two nodes, wherein the calculation formula is as follows:
Figure 743484DEST_PATH_IMAGE005
wherein ,
Figure DEST_PATH_IMAGE006
representing a weight matrix for the node characteristics +.>
Figure 996611DEST_PATH_IMAGE007
Performing linear transformation>
Figure 955339DEST_PATH_IMAGE008
Representing a shared attention mechanism, +.>
Figure DEST_PATH_IMAGE009
The node +.>
Figure 344732DEST_PATH_IMAGE010
Node->
Figure DEST_PATH_IMAGE011
Is of importance;
when the attention coefficient is normalized through the activation function, the calculation formula is as follows:
Figure 691400DEST_PATH_IMAGE012
where a represents the weight of a single-layer neural network that calculates attention,
Figure DEST_PATH_IMAGE013
representing transpose operations in a matrix,/->
Figure 431823DEST_PATH_IMAGE014
Representing the operation of the connection to the two matrices.
Preferably, the step in step S2 of processing the processed node representation features X_i with the graph attention network layers, calculating the node feature outputs and concatenating the K output node feature vectors to obtain the final node feature vector further comprises:
based on the graph attention network, calculating the node feature outputs with the K attention layers of the multi-head attention mechanism, and concatenating the K output node feature vectors to obtain the final node feature vector, where the calculation formula of the multi-head attention mechanism is:
h_i' = \big\Vert_{k=1}^{K} \sigma\Big(\sum_{j \in \mathcal{N}_i} \alpha_{ij}^{k} W^{k} h_j\Big)
where ‖ denotes the concatenation operation, \alpha_{ij}^{k} denotes the attention coefficient computed by the k-th attention layer, and W^{k} denotes the learnable parameters that linearly transform the node features in the k-th layer.
Preferably, the step in step S2 of processing the processed node representation features X_i with the graph attention network layers, calculating the node feature outputs and concatenating the K output node feature vectors to obtain the final node feature vector further comprises:
the calculation method of the last graph attention network layer is: the average of the K features is taken and a nonlinear transformation is applied through a nonlinear activation function; the node calculation formula of the last layer is:
h_i' = \sigma\Big(\frac{1}{K}\sum_{k=1}^{K}\sum_{j \in \mathcal{N}_i} \alpha_{ij}^{k} W^{k} h_j\Big)
preferably, the specific step of heuristically selecting candidate seeds for the feature extracted graph sequence data in step S2 includes:
the method comprises the steps of taking the characteristic vectors of np nodes of a first-order neighbor and a second-order neighbor of a network user as related vectors of each node, calculating the similarity between the two nodes by utilizing Euclidean norms of the vectors, selecting an rnp node with the maximum similarity as a strong related node, wherein r is a strong related node coefficient, r is E (0, 1), obtaining node frequency by calculating the occurrence times of each node in the strong related node set of other nodes, sorting according to the obtained node frequency, and selecting ck node with the maximum occurrence times as a candidate seed node, wherein ck is the candidate seed node related coefficient and the seed node number respectively;
the similarity is calculated by Euclidean norm formula:
Figure 624907DEST_PATH_IMAGE020
wherein ,
Figure 785630DEST_PATH_IMAGE021
,/>
Figure 880625DEST_PATH_IMAGE022
respectively represent nodesiSum nodej
Preferably, the specific step of determining the final seed nodes from the candidate seeds with a greedy algorithm in step S3 comprises:
calculating the influence propagation degree of the candidate seed nodes with the formula:
\sigma(S) = |S| + |N(S)|
where |S| denotes the size of the set S and N(S) denotes the set of not-yet-activated neighbour nodes of S;
sorting the influence propagation degree of each node and selecting the node with the greatest influence propagation degree as a seed node, thereby forming the seed node set with the greatest propagation gain.
Preferably, the specific steps of step S1 comprise:
collecting social network data, where the social network is denoted:
G = (V, E)
where V denotes the node set, each node v_i ∈ V represents a user in the social network, and E denotes the set of edges.
Preferably, the present invention further provides a social network influence maximization system based on a graph attention mechanism, comprising:
and a network data module: graph sequence data for collecting social network data and constructing a social network, wherein the graph sequence data comprises: graph adjacency matrix data and node representation feature data;
and the feature extraction module is used for: extracting features of the graph sequence data based on a graph attention network and a Node2Vec combination algorithm;
candidate seed selection module: heuristically selecting candidate seeds from the graph sequence data after feature extraction;
seed node selection module: and selecting the node with the maximum propagation degree gain from the candidate seeds by adopting a greedy algorithm as a final seed node, thereby forming a seed node set with the maximum propagation degree gain.
The beneficial effects of the invention are as follows: the social network influence maximization method and system of the graph attention mechanism adopt the multi-layer attention mechanism of the graph attention network to learn the graph structure of the social network, effectively learn more complex graph topologies, select the most influential seed nodes, realize influence maximization in the social network, and have good usability.
Drawings
The foregoing and other objects, features and advantages of the invention will be apparent from the following more particular description of preferred embodiments of the invention, as illustrated in the accompanying drawings. Like reference numerals refer to like parts throughout the drawings; the drawings are not necessarily drawn to scale, the emphasis instead being placed on illustrating the principles of the invention.
FIG. 1 is a schematic flow chart of the social network influence maximization method and system of the graph attention mechanism according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an embodiment of a method and a system for maximizing social network impact of a graph attention mechanism according to an embodiment of the present invention;
fig. 3 is a schematic diagram of feature extraction based on a graph attention network and a Node2Vec combining algorithm according to an embodiment of the present invention.
Detailed Description
The technical solution of the present invention will be further described in detail below with reference to the accompanying drawings and specific examples, so that those skilled in the art can better understand and implement the present invention; the embodiments, however, are not limiting.
In a first embodiment, please refer to fig. 1-3; an embodiment of the present invention provides a social network influence maximization method based on a graph attention mechanism, which includes the following steps:
s1: collecting social network data and constructing graph sequence data of a social network;
wherein the graph sequence data includes: graph adjacency matrix data and node representation feature data;
s2: inputting the graph sequence data into an influence maximization model combining a graph attention network and Node2Vec for feature embedding learning, so as to extract information relevant to the influence maximization problem;
s3: heuristically selecting candidate seeds from the feature-extracted graph sequence data, and using a greedy algorithm to select the nodes with the greatest propagation gain from the candidate seeds as the final seed nodes, thereby forming the seed node set with the greatest propagation gain and avoiding influence overlap. An influence maximization model based on a graph attention mechanism is thus constructed (comprising a node feature extraction method combining a graph attention network and Node2Vec, a heuristic candidate seed selection method, and a greedy-algorithm-based seed selection method).
The beneficial effects of the invention are as follows: for complex graph data, the method can effectively exploit the graph topology and alleviates the problem of suboptimal seed sets. In the node feature processing stage, Node2Vec is used to learn the shallow graph structure, and the graph attention mechanism is used to process the deeper graph topology. For seed selection, a heuristic algorithm selects candidate seed nodes and the greedy algorithm CELF selects the most influential seed nodes, which alleviates the influence overlap problem. Because the influence function is monotone and submodular, an approximately optimal solution to the influence maximization problem is guaranteed, and the method has good usability.
Referring to fig. 2-3, in a preferred embodiment, the specific steps of step S2 include:
to obtain feature embedding of a Node, feature X is represented to the Node by Node2Vec i Random walk sequence with sampling and processing of negative sampling operations (based on second order random walk super parameters
Figure 7533DEST_PATH_IMAGE031
and />
Figure DEST_PATH_IMAGE032
To generate a random walk sequence), the graph attention network layer of the graph attention network representing the feature X to the processed nodes based on the graph attention network i And then processing and calculating node characteristic output, and splicing the K output node characteristic vectors to obtain the final node characteristic vector. (in the preferred embodiment, node2Vec generates Node feature dimension is 512, sampling sequence length is 6, and the number of sampling sequences per Node is 200, wherein the learning rate in the graph attention network model is 0.0001, the first layer graph attention network output dimension is 256, and the second layer network output dimension is 16).
Referring to FIGS. 2-3, in a preferred embodiment, the step in step S2 of processing the node representation features X_i with the random-walk sequence sampling and negative-sampling operations of Node2Vec further comprises:
the Node2Vec random-walk sampling and negative-sampling processing of the node representation features X_i takes into account the similarity of both low-order and high-order neighbours, so it can flexibly capture the homophily and structural equivalence of nodes in the graph; the skip-gram algorithm is used to maximize the probability that the centre node v_i co-occurs with the context nodes within the left and right windows of length w, with the calculation formula:
\max_{\Phi} \Pr\big(\{v_{i-w}, \ldots, v_{i+w}\} \setminus v_i \mid \Phi(v_i)\big)
where \Phi(v_i) is the latent node representation feature of node v_i;
the final target loss function is minimized by taking the logarithm of the formula (the reconstruction loss of the graph structure is computed and the graph is trained without supervision to obtain a low-dimensional node feature representation), and the target loss function is optimized to convergence with stochastic gradient descent to obtain the node representation features, where the calculation formula of the target loss function is:
J(\Phi) = -\log \Pr\big(\{v_{i-w}, \ldots, v_{i+w}\} \setminus v_i \mid \Phi(v_i)\big)
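As an illustration of this embedding step, the following is a minimal sketch that generates random walks over a NetworkX graph and fits a skip-gram model with negative sampling via gensim. It is a simplification under stated assumptions: the walks are uniform rather than biased by the second-order parameters p and q, the window size 3 is illustrative, and the karate-club graph stands in for the collected social network; only the dimension 512, walk length 6 and 200 walks per node follow the values quoted above.

```python
import random
import networkx as nx
from gensim.models import Word2Vec  # skip-gram with negative sampling

def random_walks(G, walk_length=6, num_walks=200, seed=0):
    """Generate uniform random walks (a simplification of the biased
    second-order Node2Vec walk controlled by p and q)."""
    rng = random.Random(seed)
    walks, nodes = [], list(G.nodes())
    for _ in range(num_walks):          # num_walks walks starting from every node
        rng.shuffle(nodes)
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                nbrs = list(G.neighbors(walk[-1]))
                if not nbrs:
                    break
                walk.append(rng.choice(nbrs))
            walks.append([str(v) for v in walk])
    return walks

G = nx.karate_club_graph()              # stand-in for the collected social graph
walks = random_walks(G, walk_length=6, num_walks=200)

# Skip-gram (sg=1) with negative sampling maximizes the co-occurrence
# probability of a centre node and its context nodes within window w.
model = Word2Vec(walks, vector_size=512, window=3, sg=1,
                 negative=5, min_count=0, epochs=5, workers=2)

X = {int(v): model.wv[v] for v in model.wv.index_to_key}  # node -> features X_i
```

The dictionary X would then be fed to the graph attention layers described next.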
referring to fig. 2-3, in a further preferred embodiment, the graph attention network layer based on the graph attention network in step S2 represents the feature X to the processed node i The specific steps of the treatment include:
in the aspect of graph structure data, through the action of an attention mechanism, a user can allocate different attention to the neighbor nodes, and a larger attention coefficient is allocated to the neighbor nodes with similar hobbies or similar topological structures, so that the nodes can learn more complex graph topological structure characteristics. Therefore, based on the graph attention network, h= { H 1 ,h 2 ……h n As input features of the nodes, and calculates the attention coefficient between two nodes, wherein the calculation formula is as follows:
Figure 370567DEST_PATH_IMAGE005
wherein ,
Figure 824682DEST_PATH_IMAGE006
representing a weight matrix for the node characteristics +.>
Figure 868861DEST_PATH_IMAGE007
Performing linear transformation>
Figure 533061DEST_PATH_IMAGE008
Representing a shared attention mechanism, +.>
Figure 320888DEST_PATH_IMAGE009
The node +.>
Figure 754144DEST_PATH_IMAGE010
Node->
Figure 969224DEST_PATH_IMAGE011
Is of importance; for the purpose of node->
Figure 996086DEST_PATH_IMAGE011
A larger difference is made in their neighbor attention coefficients, where the attention coefficients can be normalized using the softmax function:
Figure DEST_PATH_IMAGE035
in which a single layer neural network is used for computation
Figure 921360DEST_PATH_IMAGE009
Then use LeakyReLUThe method is characterized in that the attention coefficient is normalized through the activation function as a nonlinear activation function, and the calculation formula is as follows:
Figure 350067DEST_PATH_IMAGE036
where a represents the weight of a single-layer neural network that calculates attention,
Figure 860683DEST_PATH_IMAGE013
representing transpose operations in a matrix,/->
Figure 374841DEST_PATH_IMAGE014
Representing the operation of the connection to the two matrices.
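The following is a compact PyTorch sketch of a single attention head implementing the computation above: e_{ij} = LeakyReLU(a^T [W h_i ‖ W h_j]) followed by a softmax over each node's neighbourhood. The dense adjacency mask, tensor shapes and initialization are illustrative assumptions rather than the patent's exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GraphAttentionLayer(nn.Module):
    """One attention head: scores each edge with the shared mechanism a(.,.)
    and normalizes the coefficients over every node's neighbourhood."""
    def __init__(self, in_dim, out_dim, alpha=0.2):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)      # linear transform W h_i
        self.a = nn.Parameter(torch.empty(2 * out_dim, 1))   # shared attention vector a
        nn.init.xavier_uniform_(self.a)
        self.leaky_relu = nn.LeakyReLU(alpha)

    def forward(self, h, adj):
        # h: [N, in_dim] node features, adj: [N, N] adjacency (with self-loops)
        Wh = self.W(h)                                        # [N, out_dim]
        src = Wh @ self.a[: Wh.size(1)]                       # [N, 1], a_1^T W h_i
        dst = Wh @ self.a[Wh.size(1):]                        # [N, 1], a_2^T W h_j
        e = self.leaky_relu(src + dst.T)                      # e_ij for every pair
        e = e.masked_fill(adj == 0, float("-inf"))            # keep only neighbours
        alpha = F.softmax(e, dim=1)                           # normalized coefficients
        return alpha @ Wh                                      # attention-weighted features
```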
Referring to fig. 2-3, in a further preferred embodiment, the step in step S2 of processing the processed node representation features X_i with the graph attention network layers, calculating the node feature outputs and concatenating the K output node feature vectors to obtain the final node feature vector further comprises:
to make the self-attention mechanism more stable, based on the graph attention network, the K attention layers of the multi-head attention mechanism are used to calculate different node feature outputs, and the K output node feature vectors are concatenated to form the final node feature vector, where the calculation formula of the multi-head attention mechanism is:
h_i' = \big\Vert_{k=1}^{K} \sigma\Big(\sum_{j \in \mathcal{N}_i} \alpha_{ij}^{k} W^{k} h_j\Big)
where ‖ denotes the concatenation operation, \alpha_{ij}^{k} denotes the attention coefficient computed by the k-th attention layer, and W^{k} denotes the learnable parameters that linearly transform the node features in the k-th layer.
Referring to fig. 2-3, in a preferred embodiment, the step in step S2 of processing the processed node representation features X_i with the graph attention network layers, calculating the node feature outputs and concatenating the K output node feature vectors to obtain the final node feature vector further comprises:
the calculation method of the last graph attention network layer is: for the last layer of the model, the K head features are not concatenated; instead the average of the K features is taken and a nonlinear transformation is applied through a nonlinear activation function. The node update formula of the last layer is:
h_i' = \sigma\Big(\frac{1}{K}\sum_{k=1}^{K}\sum_{j \in \mathcal{N}_i} \alpha_{ij}^{k} W^{k} h_j\Big)
The graph attention network layers process the nodes and learn further graph structure. The method ignores the degree of the nodes during the network search and generates a fixed number of paths for each node, learning a complex graph topological structure and improving the effectiveness of the algorithm.
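As a sketch of this two-layer arrangement, the snippet below assumes PyTorch Geometric is available (the library choice is not stated in the patent): the first layer concatenates the K head outputs and the last layer averages them (concat=False). The 256- and 16-dimensional outputs follow the values quoted earlier; heads=4 is illustrative.

```python
import torch
import torch.nn.functional as F
from torch_geometric.nn import GATConv

class TwoLayerGAT(torch.nn.Module):
    """First layer concatenates the K head outputs; the last layer averages
    them (concat=False) and applies a nonlinear activation."""
    def __init__(self, in_dim, heads=4):
        super().__init__()
        self.gat1 = GATConv(in_dim, 256, heads=heads, concat=True)
        self.gat2 = GATConv(256 * heads, 16, heads=heads, concat=False)

    def forward(self, x, edge_index):
        x = F.elu(self.gat1(x, edge_index))      # concatenated multi-head features
        return F.elu(self.gat2(x, edge_index))   # averaged heads on the final layer

# model = TwoLayerGAT(in_dim=512)
# optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)  # learning rate from the text
```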
Referring to fig. 2-3, in a further preferred embodiment, the step in step S3 of heuristically selecting candidate seeds from the feature-extracted graph sequence data comprises:
the Euclidean norm of the vectors is used to measure the similarity between node pairs; if two nodes have a high similarity, they are considered more likely to influence each other, otherwise they are considered less likely to influence each other.
The propagation path of information on a social network is generally fairly short, and the propagation range is largely limited to the second-order neighbours of a node, so the feature vectors of the first-order and second-order neighbours of a user are selected as the related vectors of each node in the network. The similarity between node pairs is calculated with these feature vectors and the nodes are then sorted.
When selecting candidate seeds, the social network influence maximization method and system of the graph attention mechanism select candidate seed nodes with a heuristic method. After the node feature representations in the network have been obtained, the Euclidean distance between nodes is used to evaluate the similarity between nodes, and the strongly related node set of each node is computed. Finally, the frequency with which each node appears in the strongly related node sets of the other nodes is counted, the nodes are sorted, and the high-frequency nodes are selected as the candidate seed node set.
Selecting seed nodes only within the candidate seed node set speeds up the whole algorithm. Meanwhile, to avoid the problem of influence overlap between nodes, the optimized greedy algorithm CELF is used at this stage to select the final seed node set from the candidate seed nodes (a sketch of the selection heuristic follows this description).
Specifically, the feature vectors of the np nodes among the first-order and second-order neighbours of each network user are taken as the related vectors of that node, the similarity between two nodes is calculated with the Euclidean norm of the vectors, and the r·np nodes with the greatest similarity are selected as strongly related nodes, where r is the strongly related node coefficient, r ∈ (0, 1), and the node count is rounded down; the node frequency is obtained by counting how many times each node appears in the strongly related node sets of the other nodes, the nodes are sorted by the obtained frequency, and the c·k nodes that appear most often are selected as candidate seed nodes, where c and k are the candidate seed coefficient (generally set to 10) and the number of seed nodes (set to 50) respectively;
the similarity is calculated with the Euclidean norm formula:
d(i, j) = \lVert x_i - x_j \rVert_2
where x_i and x_j denote the feature vectors of node i and node j respectively.
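The sketch below illustrates this candidate-selection heuristic under stated assumptions: X is the dictionary of learned node embeddings, np_related caps the number of related vectors per node, and the values of np_related, r, c and k are illustrative (only c = 10 and k = 50 follow the text).

```python
import numpy as np
import networkx as nx
from collections import Counter

def candidate_seeds(G, X, np_related=20, r=0.5, c=10, k=50):
    """Rank nodes by how often they appear in other nodes' strongly
    related sets and return the c*k most frequent ones as candidates."""
    freq = Counter()
    for u in G.nodes():
        # first- and second-order neighbours of u (at most np_related of them)
        hop1 = set(G.neighbors(u))
        hop2 = {w for v in hop1 for w in G.neighbors(v)} - hop1 - {u}
        related = list(hop1 | hop2)[:np_related]
        if not related:
            continue
        # similarity measured by the Euclidean norm of the feature difference
        by_dist = sorted(related, key=lambda v: np.linalg.norm(X[u] - X[v]))
        strong = by_dist[: max(1, int(r * len(by_dist)))]  # r*np strongly related nodes
        freq.update(strong)
    return [v for v, _ in freq.most_common(c * k)]         # c*k candidate seed nodes
```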
Referring to fig. 2-3, in a preferred embodiment, the step in step S3 of determining the final seed nodes from the candidate seeds with a greedy algorithm comprises:
after the candidate nodes have been selected, the final k seed nodes are chosen from the candidate node set. There may be an influence overlap problem among the chosen candidate nodes: if node u alone or node v alone could ideally influence all of its surrounding neighbours, then once node u has been selected into the seed node set, adding node v to the set no longer improves its influence propagation degree. A heuristic formula is therefore used here to measure the information propagation degree of a seed set S.
The influence propagation degree of the candidate seed nodes is calculated heuristically with the formula:
\sigma(S) = |S| + |N(S)|
where |S| denotes the size of the set S and N(S) denotes the set of not-yet-activated neighbour nodes of S;
the influence propagation degree of each node is sorted, and the node with the greatest influence propagation degree is selected as a seed node, thereby forming the seed node set with the greatest propagation gain.
Under the independent cascade propagation model, \sigma(S) is monotone and submodular. The marginal influence of the nodes satisfies submodularity, and this property is used to optimize the CELF algorithm when computing the seed node set S: after the first node A has been added to the seed set according to its marginal influence, the node B with the next largest marginal influence from the first round is re-evaluated; if the new marginal influence of node B is still greater than or equal to the previously computed marginal influence of node C (the node ranked just below B), node B is taken directly as the next seed node without recomputing the marginal influence of any later node. If the new marginal influence of node B is not greater than or equal to the last-round marginal influence of node C, the marginal influences of the nodes are recomputed one by one, and at each step the node with the largest marginal influence is selected as a seed node and placed into the seed set.
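A lazy-greedy (CELF) sketch over the candidate set is shown below. It assumes the heuristic spread reconstructed above, σ(S) = |S| + |N(S)\S|; a full implementation could plug in the Monte Carlo influence estimate instead, and the heap-based re-evaluation is the standard CELF trick rather than the patent's exact code.

```python
import heapq
import networkx as nx

def spread(G, S):
    """Heuristic influence spread: seed-set size plus the number of
    not-yet-activated neighbours of the set (monotone and submodular)."""
    S = set(S)
    neighbours = {v for u in S for v in G.neighbors(u)} - S
    return len(S) + len(neighbours)

def celf(G, candidates, k=50):
    # initial marginal gains over the empty set, kept in a max-heap (negated for heapq)
    heap = [(-spread(G, [v]), v, 0) for v in candidates]
    heapq.heapify(heap)
    S = []
    while len(S) < k and heap:
        neg_gain, v, last = heapq.heappop(heap)
        if last == len(S):              # gain already valid for the current seed set
            S.append(v)
        else:                           # lazily re-evaluate the marginal gain
            new_gain = spread(G, S + [v]) - spread(G, S)
            heapq.heappush(heap, (-new_gain, v, len(S)))
    return S
```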
Referring to fig. 2-3, in a preferred embodiment, the specific steps of step S1 include:
collecting social network data, where the social network is denoted:
G = (V, E)
where V denotes the node set, each node v_i ∈ V represents a user in the social network, and E denotes the set of edges (an edge represents the influence that may arise between nodes; the social network is modeled as a graph sequence, a user propagates information in the social network and influences other users through information propagation, and the goal is to maximize the number of influenced users during the propagation process).
In this embodiment, the collected network data, provided by the arXiv platform, records collaborations between authors in the field of high-energy physics theory: if two authors have co-written at least one paper, an undirected edge is created between them. The collected data contains 31376 edges and 15229 nodes. Algorithm performance is measured with the influence spread under an independent cascade model with probability p = 0.1, computed by repeating the Monte Carlo simulation 10000 times.
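The evaluation protocol quoted above can be sketched as follows: a Monte Carlo estimate of the influence spread under the independent cascade model with activation probability p = 0.1. The number of runs shown is illustrative; the patent repeats the simulation 10000 times.

```python
import random
import networkx as nx

def ic_spread(G, seeds, p=0.1, runs=1000, seed=0):
    """Monte Carlo estimate of influence spread under the independent cascade
    model: each newly activated node gets one chance to activate each
    inactive neighbour with probability p."""
    rng = random.Random(seed)
    total = 0
    for _ in range(runs):
        active = set(seeds)
        frontier = list(seeds)
        while frontier:
            nxt = []
            for u in frontier:
                for v in G.neighbors(u):
                    if v not in active and rng.random() < p:
                        active.add(v)
                        nxt.append(v)
            frontier = nxt
        total += len(active)
    return total / runs
```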
In a second embodiment, the present application further provides an academic citation network influence maximization method based on a graph attention mechanism;
Step S1: collecting academic citation network data and constructing graph sequence data of the academic citation network; in this embodiment the data set consists of 2708 nodes and 5429 edges. Algorithm performance is measured with the influence spread under an independent cascade model with probability p = 0.1, computed by repeating the Monte Carlo simulation 10000 times.
Step S2: inputting the academic citation graph sequence data into the Node2Vec algorithm to learn the shallow graph topology into the node representations X_i; then using the multi-head attention mechanism to learn from the preliminarily processed graph sequence features, processing X_i with two graph attention network layers to calculate different node feature outputs, and concatenating the K node feature vectors as the final node feature vector. Finally, the reconstruction loss of the graph structure is calculated, and the graph is trained without supervision to obtain a low-dimensional node feature representation.
Step S3: inputting the feature-extracted graph sequence data, selecting candidate seeds with the heuristic method, and re-screening the candidate seeds with the optimized greedy algorithm to determine the final seed nodes.
in a third embodiment, the present application further provides a twitter network impact maximizing method based on a graph attention mechanism;
step S1: collecting data of a twitter network, and constructing graph sequence data of the twitter network, wherein a data set consists of 3312 nodes and 4732 edges in the embodiment; algorithm performance was measured using the influence spread, performed under an independent cascade model with probability p=0.1, calculated by repeating 10000 monte carlo simulations.
Step S2: inputting graph sequence data of the twitter network into a Node2Vec algorithm to learn a shallow graph topological structure into a Node representation
Figure 748294DEST_PATH_IMAGE041
In the method, a multi-head attention mechanism is utilized to learn the graph sequence characteristics after preliminary processing, and a two-layer graph attention network layer pair is used for +.>
Figure 142366DEST_PATH_IMAGE041
Processing, calculating different node characteristic outputs, and then adding this +.>
Figure 225729DEST_PATH_IMAGE042
The individual node feature vectors are stitched together as the final node feature vector. And finally, calculating the reconstruction loss of the graph structure, and performing unsupervised training on the graph to obtain a low-dimensional node characteristic representation.
Step S3: inputting graph sequence data with extracted features, selecting candidate seeds by using a heuristic method, and re-selecting the candidate seeds by using an optimized greedy algorithm to determine final seed nodes;
referring to fig. 1-3, in a preferred embodiment, the present invention further provides a social network impact maximizing system of a graph annotation mechanism, including:
and a network data module: graph sequence data for collecting social network data and constructing a social network, wherein the graph sequence data comprises: graph adjacency matrix data and node representation feature data;
and the feature extraction module is used for: extracting features of the graph sequence data based on a graph attention network and a Node2Vec combination algorithm;
candidate seed selection module: heuristically selecting candidate seeds from the graph sequence data after feature extraction;
seed node selection module: and selecting the node with the maximum propagation degree gain from the candidate seeds by adopting a greedy algorithm as a final seed node, thereby forming a seed node set with the maximum propagation degree gain.
The social network influence maximization system of the graph attention mechanism provided in this embodiment is based on the same inventive concept as the social network influence maximization method of the graph attention mechanism described above, and the description of the method applies equally to the system.
The beneficial effects of the invention are as follows: the invention provides a social network influence maximization method and a social network influence maximization system for a graph attention mechanism, which adopt a multi-layer attention mechanism of the graph attention network to learn a graph structure of the social network, effectively learn a more complex graph topological structure, select a seed node with the most influence, realize influence maximization in the social network and have better usability.
The foregoing description is only of the preferred embodiments of the present invention and is not intended to limit the scope of the invention; any equivalent structural or process transformation made using the content of this specification and the accompanying drawings, or any direct or indirect application in other related technical fields, likewise falls within the scope of protection of the present invention.

Claims (6)

1. A social network influence maximization method based on a graph attention mechanism, characterized by comprising the following steps:
s1: collecting, from the social network data, network data of author collaborations in the field of high-energy physics theory, or of academic citation or Twitter networks, and constructing graph sequence data of the social network;
wherein the graph sequence data includes: graph adjacency matrix data and node representation feature data;
s2: processing the node representation features X_i with the random-walk sequence sampling and negative-sampling operations of Node2Vec, and using the skip-gram algorithm to maximize the probability that the centre node v_i co-occurs with the context nodes within the left and right windows of length w; the calculation formula is:
\max_{\Phi} \Pr\big(\{v_{i-w}, \ldots, v_{i+w}\} \setminus v_i \mid \Phi(v_i)\big)
where \Phi(v_i) is the latent node representation feature of node v_i;
minimizing the final target loss function by taking the logarithm of the formula, and optimizing the target loss function to convergence with stochastic gradient descent to obtain the node representation features;
the calculation formula of the target loss function is:
J(\Phi) = -\log \Pr\big(\{v_{i-w}, \ldots, v_{i+w}\} \setminus v_i \mid \Phi(v_i)\big)
based on the graph attention network, calculating the node feature outputs with the K attention layers of the multi-head attention mechanism, and concatenating the K output node feature vectors to obtain the final node feature vector;
s3: taking the feature vectors of the np nodes among the first-order and second-order neighbours of each network user as the related vectors of that node, calculating the similarity between two nodes with the Euclidean norm of the vectors, and selecting the r·np nodes with the greatest similarity as strongly related nodes, where r is the strongly related node coefficient and r ∈ (0, 1); obtaining the node frequency by counting how many times each node appears in the strongly related node sets of the other nodes, sorting by the obtained node frequency, and selecting the c·k nodes that appear most often as candidate seed nodes, where c and k are the candidate seed coefficient and the number of seed nodes respectively;
the similarity is calculated with the Euclidean norm formula:
d(i, j) = \lVert x_i - x_j \rVert_2
where x_i and x_j denote the feature vectors of node i and node j respectively;
calculating the influence propagation degree of the candidate seed nodes, sorting the influence propagation degree of each node, and selecting the node with the greatest influence propagation degree as a seed node, thereby forming the seed node set with the greatest propagation gain;
the calculation formula of the influence propagation degree of the candidate seed nodes is:
\sigma(S) = |S| + |N(S)|
where |S| denotes the size of the set S and N(S) denotes the set of not-yet-activated neighbour nodes of S.
2. The method for maximizing the influence of a social network as recited in claim 1, wherein the step in step S2 of processing the processed node representation features X_i with the graph attention network layers of the graph attention network specifically comprises:
based on the graph attention network, taking H = {h_1, h_2, …, h_n} as the input features of the nodes, and calculating the attention coefficient between two nodes with the formula:
e_{ij} = a(W h_i, W h_j)
where W is the weight matrix that linearly transforms the node features h_i, a denotes the shared attention mechanism, and e_{ij} denotes the importance of node j to node i;
the attention coefficients normalized through the activation function are calculated as:
\alpha_{ij} = \frac{\exp(\mathrm{LeakyReLU}(a^{T}[W h_i \,\|\, W h_j]))}{\sum_{k \in \mathcal{N}_i} \exp(\mathrm{LeakyReLU}(a^{T}[W h_i \,\|\, W h_k]))}
where ^T denotes the matrix transpose operation and ‖ denotes the concatenation of the two matrices.
3. The method for maximizing the influence of a social network as recited in claim 1, wherein the step in step S2 of calculating the node feature outputs based on the graph attention network with the K attention layers of the multi-head attention mechanism and concatenating the K output node feature vectors to obtain the final node feature vector further comprises:
the calculation formula of the multi-head attention mechanism is:
h_i' = \big\Vert_{k=1}^{K} \sigma\Big(\sum_{j \in \mathcal{N}_i} \alpha_{ij}^{k} W^{k} h_j\Big)
where ‖ denotes the concatenation operation, \alpha_{ij}^{k} denotes the attention coefficient computed by the k-th attention layer, and W^{k} denotes the learnable parameters that linearly transform the node features in the k-th layer.
4. The method for maximizing the influence of a social network as recited in claim 3, wherein the step in step S2 of calculating the node feature outputs based on the graph attention network with the K attention layers of the multi-head attention mechanism and concatenating the K output node feature vectors to obtain the final node feature vector further comprises:
the calculation method of the last graph attention network layer is: the average of the K features is taken and a nonlinear transformation is applied through a nonlinear activation function; the node calculation formula of the last layer is:
h_i' = \sigma\Big(\frac{1}{K}\sum_{k=1}^{K}\sum_{j \in \mathcal{N}_i} \alpha_{ij}^{k} W^{k} h_j\Big)
5. The method for maximizing the influence of a social network as recited in claim 1, wherein the specific step of step S1 comprises:
collecting social network data, where the social network is denoted:
G = (V, E)
where V denotes the node set, each node v_i ∈ V represents a user in the social network, and E denotes the set of edges.
6. A social network influence maximization system based on a graph attention mechanism, characterized by comprising:
a network data module: for collecting, from the social network data, network data of author collaborations in the field of high-energy physics theory, or of academic citation or Twitter networks, where the graph sequence data comprises: graph adjacency matrix data and node representation feature data;
a feature extraction module: for processing the node representation features X_i with the random-walk sequence sampling and negative-sampling operations of Node2Vec, and using the skip-gram algorithm to maximize the probability that the centre node v_i co-occurs with the context nodes within the left and right windows of length w;
the calculation formula is:
\max_{\Phi} \Pr\big(\{v_{i-w}, \ldots, v_{i+w}\} \setminus v_i \mid \Phi(v_i)\big)
where \Phi(v_i) is the latent node representation feature of node v_i;
minimizing the final target loss function by taking the logarithm of the formula, and optimizing the target loss function to convergence with stochastic gradient descent to obtain the node representation features;
the calculation formula of the target loss function is:
J(\Phi) = -\log \Pr\big(\{v_{i-w}, \ldots, v_{i+w}\} \setminus v_i \mid \Phi(v_i)\big)
based on the graph attention network, calculating the node feature outputs with the K attention layers of the multi-head attention mechanism, and concatenating the K output node feature vectors to obtain the final node feature vector;
a candidate seed selection module: for taking the feature vectors of the np nodes among the first-order and second-order neighbours of each network user as the related vectors of that node, calculating the similarity between two nodes with the Euclidean norm of the vectors, and selecting the r·np nodes with the greatest similarity as strongly related nodes, where r is the strongly related node coefficient and r ∈ (0, 1); obtaining the node frequency by counting how many times each node appears in the strongly related node sets of the other nodes, sorting by the obtained node frequency, and selecting the c·k nodes that appear most often as candidate seed nodes, where c and k are the candidate seed coefficient and the number of seed nodes respectively;
the similarity is calculated with the Euclidean norm formula:
d(i, j) = \lVert x_i - x_j \rVert_2
where x_i and x_j denote the feature vectors of node i and node j respectively;
a seed node selection module: for calculating the influence propagation degree of the candidate seed nodes, sorting the influence propagation degree of each node, and selecting the node with the greatest influence propagation degree as a seed node, thereby forming the seed node set with the greatest propagation gain;
the calculation formula of the influence propagation degree of the candidate seed nodes is:
\sigma(S) = |S| + |N(S)|
where |S| denotes the size of the set S and N(S) denotes the set of not-yet-activated neighbour nodes of S.
CN202310025466.6A 2023-01-09 2023-01-09 Social network influence maximization method and system based on a graph attention mechanism Active CN115878908B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310025466.6A CN115878908B (en) Social network influence maximization method and system based on a graph attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310025466.6A CN115878908B (en) Social network influence maximization method and system based on a graph attention mechanism

Publications (2)

Publication Number Publication Date
CN115878908A CN115878908A (en) 2023-03-31
CN115878908B true CN115878908B (en) 2023-06-02

Family

ID=85758315

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310025466.6A Active CN115878908B (en) 2023-01-09 2023-01-09 Social network influence maximization method and system of graph annotation meaning force mechanism

Country Status (1)

Country Link
CN (1) CN115878908B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111898041A (en) * 2020-07-20 2020-11-06 电子科技大学 Social network combined circle layer user comprehensive influence evaluation and counterfeiting discrimination method
CN111898040A (en) * 2020-07-20 2020-11-06 电子科技大学 Circle layer user influence evaluation method combined with social network
CN112214689A (en) * 2020-10-22 2021-01-12 上海交通大学 Method and system for maximizing influence of group in social network
CN112330136A (en) * 2020-11-02 2021-02-05 国网江苏省电力有限公司电力科学研究院 Relevance mining method and device for abnormal electricity utilization analysis data set of large user
CN112446634A (en) * 2020-12-03 2021-03-05 兰州大学 Method and system for detecting influence maximization node in social network

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Research on Social Influence Prediction Based on Preference Propagation; Chen Hongfei; China Master's Theses Full-text Database, Basic Sciences (No. 8); A002-81 *
A Social Network Influence Maximization Algorithm Based on Heuristic and Greedy Strategies; Cao Jiuxin et al.; Journal of Southeast University; Vol. 46, No. 5; pp. 950-956 *

Also Published As

Publication number Publication date
CN115878908A (en) 2023-03-31

Similar Documents

Publication Publication Date Title
Ma et al. Adaptive-step graph meta-learner for few-shot graph classification
CN108009575A (en) A kind of community discovery method for complex network
CN109766710B (en) Differential privacy protection method of associated social network data
CN112446634B (en) Method and system for detecting influence maximization node in social network
CN114064627A (en) Knowledge graph link completion method and system for multiple relations
CN110866134A (en) Image retrieval-oriented distribution consistency keeping metric learning method
Panagopoulos et al. Influence maximization using influence and susceptibility embeddings
CN109948242A (en) Network representation learning method based on feature Hash
CN109783805A (en) A kind of network community user recognition methods and device
Zhou et al. Approximate deep network embedding for mining large-scale graphs
Yu et al. Unsupervised euclidean distance attack on network embedding
Wickman et al. A Generic Graph Sparsification Framework using Deep Reinforcement Learning
Wei et al. Auto-prox: Training-free vision transformer architecture search via automatic proxy discovery
Wang et al. A multi-agent genetic algorithm for local community detection by extending the tightest nodes
CN112231579B (en) Social video recommendation system and method based on implicit community discovery
CN113989544A (en) Group discovery method based on deep map convolution network
CN116955846B (en) Cascade information propagation prediction method integrating theme characteristics and cross attention
CN109472712A (en) A kind of efficient Markov random field Combo discovering method strengthened based on structure feature
CN115878908B (en) Social network influence maximization method and system based on a graph attention mechanism
Gialampoukidis et al. Community detection in complex networks based on DBSCAN* and a Martingale process
CN117272195A (en) Block chain abnormal node detection method and system based on graph convolution attention network
CN115661861A (en) Skeleton behavior identification method based on dynamic time sequence multidimensional adaptive graph convolution network
CN112256756B (en) Influence discovery method based on ternary association diagram and knowledge representation
CN114722920A (en) Deep map convolution model phishing account identification method based on map classification
Ibrahim et al. Under-counted tensor completion with neural incorporation of attributes

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant