CN116932938A - Link prediction method and system based on topological structure and attribute information - Google Patents

Link prediction method and system based on topological structure and attribute information

Info

Publication number
CN116932938A
Authority
CN
China
Prior art keywords
node
attribute
representation
nodes
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310697038.8A
Other languages
Chinese (zh)
Inventor
汤庸
李伟生
汤非易
陈国华
袁成哲
林荣华
常超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Yifei Information Technology Co ltd
South China Normal University
Original Assignee
Guangzhou Yifei Information Technology Co ltd
South China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Yifei Information Technology Co ltd and South China Normal University
Priority to CN202310697038.8A
Publication of CN116932938A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G06F16/9558 Details of hyperlinks; Management of linked annotations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a link prediction method and system based on topological structure and attribute information. A biased random walk calculation is performed on an original attribute graph to obtain a first node sequence set, and the first nodes are screened by frequency to determine a neighbor node set. Attribute vector representations of the nodes are obtained, from which a similar attribute node set is determined. The neighbor node set and the similar attribute node set are fused to generate a first attribute graph, and the probability that a link exists between a source node and a target node is then predicted. The invention can be widely applied in the technical field of the Internet.

Description

Link prediction method and system based on topological structure and attribute information
Technical Field
The invention relates to the technical field of Internet, in particular to a link prediction method and a link prediction system based on topological structure and attribute information.
Background
In recent years, with the continuous development of internet applications such as social networks, electronic commerce and search engines, the connection relationship between nodes in the network becomes more and more complex and diversified, so the accuracy and efficiency of link prediction are particularly important. The purpose of the link prediction method is to identify potential connection relationships, thereby promoting more intensive exploration and research, such as recommending related content or finding more friends or partners, etc. In addition, the link prediction can be applied to the fields of financial fraud detection, disease propagation analysis, knowledge graph construction and the like.
Traditional link prediction methods are mainly based on topological structure. They generally use techniques from graph theory, machine learning and network science to extract structural features from the network topology, such as centrality, closeness centrality and betweenness centrality, in order to predict the link relationships among nodes. However, these methods generally consider only the topology of the network and ignore the attribute information of the nodes themselves. They therefore do not address challenging cases such as missing topology information, network noise and missing node attributes, in which the accuracy of traditional link prediction methods is reduced.
Disclosure of Invention
In view of this, the embodiment of the invention provides a highly accurate link prediction method and system based on topology and attribute information.
In order to achieve the above object, an aspect of an embodiment of the present invention provides a link prediction method based on topology and attribute information, where the method includes:
determining a root node from the first nodes of an original attribute graph, performing a biased random walk calculation on the original attribute graph according to the root node to obtain a first node sequence set, and performing frequency screening on the first nodes according to the first node sequence set to obtain a neighbor node set;
learning the attribute representation of the original attribute map by adopting a BERT model to obtain attribute vector representations of all the first nodes in the original attribute map, and generating a high-order similar attribute list;
calculating attribute similarity among the first nodes according to the high-order similar attribute list, and further determining a similar attribute node set of the root node;
merging the neighbor node set and the similar attribute node set to obtain a first attribute graph, and determining a final node representation of the first attribute graph;
and predicting the probability of a link between the source node and the target node according to the final node representation.
Optionally, performing the biased random walk calculation on the original attribute graph according to the root node to obtain a first node sequence set, and performing frequency screening on the first nodes according to the first node sequence set to obtain a neighbor node set, includes:
calculating, according to the weight of the edges between the first nodes and the influence of the first nodes, the walk probability that one first node takes another first node as the next node of the walk;
taking the root node as a starting node of random walk calculation, and carrying out biased random walk calculation according to the walk probability to obtain a first node sequence set; wherein the first node sequence set comprises a plurality of strings of first node sequences;
and determining a neighbor node set according to the frequency of the first node in the first node sequence set.
Optionally, the calculation formula of the walk probability is:
wherein $P(v_t \mid v_{t-1})$ represents the walk probability that the (t-1)-th first node $v_{t-1}$ takes the t-th first node $v_t$ as the next node of the walk; $\alpha(v_{t-1}, v_t)$ is the transfer weight function between $v_{t-1}$ and $v_t$; $Z$ is a normalization constant; $E$ is the edge set of the original attribute graph;
the calculation formula of the frequency is as follows:
$$\mathrm{freq}(v, S) = \frac{\mathrm{count}(v, S)}{\sum_{r} \mathrm{count}(v_r, S)}$$
wherein $\mathrm{freq}(v, S)$ represents the frequency with which the current node $v$ occurs in the first node sequence set $S$; $\mathrm{count}(v, S)$ represents the number of times the current node $v$ appears in $S$; $\sum_{r} \mathrm{count}(v_r, S)$ represents the sum of the numbers of occurrences of all the first nodes in $S$; $v_r$ represents the r-th first node;
the function expression for determining the neighbor node set is:
$$S' = F(\{\, v \mid v \in S,\ \mathrm{freq}(v, S) > \tau \,\})$$
wherein $S'$ represents the neighbor node set; $F$ is the function that selects the neighbor nodes; $v$ represents the current node; $\tau$ is the frequency threshold that controls the frequency of the selected nodes.
Optionally, learning the attribute representation of the original attribute map by using a BERT model to obtain attribute vector representations of all the first nodes in the original attribute map, and generating a high-order similar attribute list includes:
inputting the original attribute map into a BERT model for calculation to obtain a hidden layer representation of the first node;
encoding structural information between the first nodes into a text sequence;
determining an attribute vector representation of the first node in combination with the hidden layer representation by taking the text sequence as an input of an NSP task;
and integrating the attribute vector representations to form a high-order similar attribute list.
Optionally, the calculating attribute similarity between the first nodes according to the high-order similar attribute list, and further determining a similar attribute node set of the root node includes:
determining a dependency relationship in the original attribute map according to the high-order similar attribute list;
calculating attribute similarity between the first nodes by a cosine similarity calculation method according to the dependency relationship;
and determining a similar attribute node set of the root node according to the attribute similarity.
Optionally, the merging the neighboring node set and the similar attribute node set to obtain a first attribute map, and determining a final node representation of the first attribute map, including:
combining the neighbor node set and the similar attribute node set to obtain a first attribute graph;
splicing the structural representation and the attribute representation of the second node in the first attribute graph, and further determining a first neighbor node of the root node;
and carrying out fusion calculation on the root node and the first neighbor node by adopting a multi-head attention mechanism to obtain a final node representation.
Optionally, the predicting, according to the final node representation, a probability that a link exists between a source node and a target node includes:
acquiring a source node and a target node to be predicted;
determining a first embedded representation of the source node and a second embedded representation of the target node from the final node representation;
splicing the first embedded representation and the second embedded representation into a vector and inputting the vector into a multi-layer perceptron;
determining a link score according to a first layer weight matrix, a second layer weight matrix, a first layer bias vector and a second layer bias vector of the multi-layer perceptron;
and determining the probability that a link exists between the source node and the target node according to the link score.
Another aspect of an embodiment of the present invention provides a link prediction system based on topology and attribute information, including:
the first module is used for determining a root node from the first nodes of an original attribute graph, performing a biased random walk calculation on the original attribute graph according to the root node to obtain a first node sequence set, and performing frequency screening on the first nodes according to the first node sequence set to obtain a neighbor node set;
the second module is used for learning the attribute representation of the original attribute graph by adopting a BERT model to obtain attribute vector representations of all the first nodes in the original attribute graph and generate a high-order similar attribute list;
a third module, configured to calculate, according to the high-order similarity attribute list, attribute similarity between the first nodes, thereby determining a similar attribute node set of the root node;
a fourth module, configured to combine the neighboring node set and the similar attribute node set to obtain a first attribute map, and determine a final node representation of the first attribute map;
and a fifth module, configured to predict a probability that a link exists between the source node and the target node according to the final node representation.
The embodiment of the invention also provides electronic equipment, which comprises a processor and a memory; the memory is used for storing programs; the processor executes the program to implement the method as described above.
The embodiment of the invention also provides a computer-readable storage medium storing a program that is executed by a processor to implement the method as described above.
The embodiment of the invention has the following beneficial effects. A biased random walk calculation is performed on the original attribute graph to obtain a first node sequence set, the first nodes are screened by frequency to determine a neighbor node set, attribute vector representations of the nodes are obtained to determine a similar attribute node set, the two sets are fused to generate the first attribute graph, and the probability that a link exists between a source node and a target node is then predicted. The method thus uses both the topological structure and the attribute information to learn node representations and captures higher-order neighbor information. The random walk and frequency screening make full use of the relevance among the nodes in the original attribute graph and capture its structural adjacency, so that more comprehensive information is obtained, a more accurate link prediction result is provided, and the accuracy of link prediction between nodes in the attribute graph is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for the description of the embodiments will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of steps of a link prediction method based on topology and attribute information provided by an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a link prediction method based on topology and attribute information according to an embodiment of the present invention;
FIG. 3 is a system block diagram of a link prediction system based on topology and attribute information provided by an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
It should be noted that although functional block division is performed in a system diagram and a logic sequence is shown in a flowchart, in some cases, the steps shown or described may be performed in a different order than block division in an apparatus or in a flowchart. The terms first, second and the like in the description and in the claims and in the above-described figures, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein is for the purpose of describing embodiments of the invention only and is not intended to be limiting of the invention.
Aiming at the problem of lower accuracy of the current link prediction method, the embodiment of the invention provides a link prediction method and a link prediction system based on topological structure and attribute information.
The method and system for predicting links based on topology and attribute information provided by the embodiment of the invention are specifically described by the following embodiment, and the method for predicting links based on topology and attribute information in the embodiment of the invention is described first.
Referring to fig. 1 and 2, a link prediction method based on topology and attribute information according to an embodiment of the present invention may include, but is not limited to, the following steps S100 to S500.
S100, determining a root node from the first nodes of an original attribute graph, performing a biased random walk calculation on the original attribute graph according to the root node to obtain a first node sequence set, and performing frequency screening on the first nodes according to the first node sequence set to obtain a neighbor node set.
Specifically, the original attribute graph represents the initial data set and has a graph structure: it contains nodes, edges, and the attributes of the nodes. For example, scholar A and scholar B are nodes in the attribute graph; if scholar A and scholar B are friends, there is an edge between them. Both scholar A and scholar B have attributes, which may be research interests, affiliation, gender, age, and so on.
Step S100 includes steps S110 to S130.
S110, calculating, according to the weight of the edges between the first nodes and the influence of the first nodes, the walk probability that the root node takes one of the first nodes as the next node of the walk.
In the embodiment of the invention, the probability that the root node selects the next first node to walk is calculated according to the weight of the edges between the first nodes and the influence of each first node, and the calculation formula of the walk probability is as follows:
wherein $P(v_t \mid v_{t-1})$ represents the probability that the (t-1)-th first node $v_{t-1}$ walks to the t-th first node $v_t$; $\alpha(v_{t-1}, v_t)$ is the transfer weight function between $v_{t-1}$ and $v_t$; $Z$ is a normalization constant; $E$ is the edge set of the original attribute graph.
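A minimal Python sketch of one walk probability that is consistent with the definitions above. It rests on assumptions that the text does not confirm: the transfer weight α is taken to be the edge weight multiplied by the influence of the candidate node, Z is the sum of α over the neighbors of the current node, and node pairs that are not joined by an edge in E get probability 0. The names `walk_probabilities`, `edges` and `influence` are hypothetical.

```python
def walk_probabilities(current, edges, influence):
    """Return {next_node: P(next_node | current)} for a weighted graph.

    edges: dict mapping node -> {neighbor: edge weight}; a pair that is
           absent from this mapping is treated as not being in E.
    influence: dict mapping node -> non-negative influence score
               (an assumed stand-in for "the influence of the first node").
    """
    neighbors = edges.get(current, {})
    # Assumed transfer weight: alpha = edge weight * influence of the candidate.
    alpha = {v: w * influence.get(v, 1.0) for v, w in neighbors.items()}
    z = sum(alpha.values())  # normalization constant Z
    if z == 0:
        return {}  # no reachable neighbor
    return {v: a / z for v, a in alpha.items()}


edges = {"A": {"B": 1.0, "C": 3.0}, "B": {"A": 1.0}, "C": {"A": 3.0}}
influence = {"A": 1.0, "B": 2.0, "C": 1.0}
probs = walk_probabilities("A", edges, influence)
print(probs)
```

In this toy graph the walk from A favors C because of its heavier edge, while the higher influence of B partly offsets that.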
S120, taking a root node as a starting node of random walk calculation, and carrying out biased random walk calculation according to the walk probability to obtain a first node sequence set; wherein the first node sequence set comprises a number of strings of first node sequences.
Parameters such as a suitable walk length l, number of walks n and window size w are set. With the root node as the starting node, the random walk calculation is performed over the first nodes of the original attribute graph according to the walk probability, yielding a first node sequence set of n sequences, each of length l.
A certain amount of noise can be introduced during the random walk calculation, for example by adding random nodes or deleting some edges; the resulting first node sequence set helps the model handle noise and improves its robustness. In other embodiments, the generalization ability and robustness of the model can also be increased by data augmentation of the node representations, such as randomly perturbing the node representation vectors or adding noise.
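The walk generation in S120 can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: the walk probabilities are assumed to be precomputed and passed in as a hypothetical `transition` dictionary, the window size w and the optional noise injection are omitted, and a walk simply stops early if it reaches a node with no outgoing options.

```python
import random


def biased_random_walks(root, transition, walk_length, num_walks, seed=0):
    """Generate num_walks node sequences of at most walk_length nodes from root.

    transition: dict mapping node -> {next_node: probability}; assumed to
                hold the precomputed walk probabilities.
    """
    rng = random.Random(seed)  # seeded so that runs are repeatable
    walks = []
    for _ in range(num_walks):
        walk = [root]
        while len(walk) < walk_length:
            options = transition.get(walk[-1], {})
            if not options:
                break  # dead end: stop this walk early
            nodes = list(options)
            weights = [options[v] for v in nodes]
            # Sample the next node in proportion to its walk probability.
            walk.append(rng.choices(nodes, weights=weights, k=1)[0])
        walks.append(walk)
    return walks


transition = {
    "A": {"B": 0.4, "C": 0.6},
    "B": {"A": 1.0},
    "C": {"A": 1.0},
}
walks = biased_random_walks("A", transition, walk_length=5, num_walks=3)
print(walks)
```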
S130, determining a neighbor node set according to the frequency of the first node in the first node sequence set.
The first nodes in the original attribute graph are screened according to how often they occur in the first node sequence set; a first node is selected into the neighbor node set when its frequency of occurrence exceeds a set frequency threshold τ. For example, one of the first nodes is taken as the current node and its frequency is calculated as follows:
$$\mathrm{freq}(v, S) = \frac{\mathrm{count}(v, S)}{\sum_{r} \mathrm{count}(v_r, S)}$$
wherein $\mathrm{freq}(v, S)$ represents the frequency with which the current node $v$ occurs in the first node sequence set $S$; $\mathrm{count}(v, S)$ represents the number of times the current node $v$ appears in $S$; $\sum_{r} \mathrm{count}(v_r, S)$ represents the sum of the numbers of occurrences of all the first nodes in $S$; $v_r$ represents the r-th first node.
The result of this frequency calculation measures the relative frequency of occurrence of the current node v in S, and further samples the first node sequence set according to the obtained frequency to determine the neighboring node set.
The function expression for determining the neighbor node set is:
$$S' = F(\{\, v \mid v \in S,\ \mathrm{freq}(v, S) > \tau \,\})$$
wherein $S'$ represents the neighbor node set; $F$ is the function that selects the neighbor nodes, used to select the high-frequency second nodes in the first node sequences as the neighbor nodes; $v$ represents the current node; $\tau$ is the frequency threshold that controls the frequency of the selected nodes.
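The frequency screening in S130 can be sketched in a few lines of Python. The sketch assumes that the selection function F simply returns every node whose relative frequency exceeds τ; the function name `select_neighbors` and the toy walks are hypothetical.

```python
from collections import Counter


def select_neighbors(walks, tau):
    """Return (neighbor set, frequency table) for a list of node sequences."""
    counts = Counter(v for walk in walks for v in walk)  # count(v, S)
    total = sum(counts.values())  # sum of count(v_r, S) over all nodes
    freq = {v: c / total for v, c in counts.items()}  # freq(v, S)
    # Assumed F: keep every node whose frequency exceeds the threshold.
    neighbors = {v for v, f in freq.items() if f > tau}
    return neighbors, freq


walks = [["A", "B", "A", "C"], ["A", "C", "A", "C"]]
neighbors, freq = select_neighbors(walks, tau=0.2)
print(freq)       # A: 4/8, B: 1/8, C: 3/8
print(neighbors)  # A and C pass the threshold, B does not
```

Because the frequencies are relative, they sum to 1, so a useful τ depends on how many distinct nodes the walks visit.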
S200, learning the attribute representation of the original attribute graph by adopting the BERT model to obtain attribute vector representations of all first nodes in the original attribute graph, and generating a high-order similar attribute list.
Specifically, the BERT (Bidirectional Encoder Representations from Transformers) model is a pre-trained model that can be trained on a large-scale unlabeled corpus, learns semantic representations of the corpus content, and outputs vector representations. Step S200 includes steps S210 to S240.
S210, inputting the original attribute graph into the BERT model for calculation to obtain the hidden layer representation of the first node.
S220, the structure information between the first nodes is encoded into a text sequence.
The network structure information between the first nodes is encoded into a vector form used to represent the first nodes. Specifically, the attribute text of each first node is tokenized, and the resulting words are encoded into sub-word units using the WordPiece tokenization of BERT. This process decomposes the attribute text of each first node into smaller word units that the BERT model can process.
S230, using the text sequence as input of an NSP task, and determining attribute vector representation of the first node by combining hidden layer representation.
NSP (Next Sentence Prediction) is a task in the BERT model that judges whether one sentence follows another in context. In the embodiment of the invention, NSP is used to judge whether nodes are adjacent or consecutive. Specifically, the text sequence generated by the encoding is used as the input of the NSP task, and the NSP calculation yields the context representation of each sub-word unit; this is then combined with the hidden layer representation of the first node to obtain the attribute vector representation of the first node.
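How node pairs might be turned into NSP-style inputs can be illustrated as follows. This is only a sketch under assumptions: the text does not specify the exact encoding, so each pair is written here as the attribute texts of two nodes joined in the usual BERT sentence-pair layout, with a flag recording whether the nodes are adjacent. No BERT model is called; in practice a tokenizer would insert the special tokens. The names `build_nsp_pairs`, `attr_text` and `edges` are hypothetical.

```python
from itertools import combinations


def build_nsp_pairs(attr_text, edges):
    """Return (sequence, is_adjacent) for every unordered pair of nodes."""
    edge_set = {frozenset(e) for e in edges}
    pairs = []
    for u, v in combinations(sorted(attr_text), 2):
        # Sentence-pair layout: [CLS] text of u [SEP] text of v [SEP]
        sequence = f"[CLS] {attr_text[u]} [SEP] {attr_text[v]} [SEP]"
        pairs.append((sequence, frozenset((u, v)) in edge_set))
    return pairs


attr_text = {
    "A": "graph mining, social networks",
    "B": "social networks, recommendation",
    "C": "computer vision",
}
edges = [("A", "B")]
pairs = build_nsp_pairs(attr_text, edges)
for sequence, is_adjacent in pairs:
    print(is_adjacent, sequence)
```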
S240, integrating the attribute vector representations to form a high-order similar attribute list.
In the embodiment of the invention, the high-order similar attribute list consists of Top-N attribute similar nodes of k-order neighborhood of the root node.
S300, calculating attribute similarity among the first nodes according to the high-order similar attribute list, and further determining a similar attribute node set of the root node.
Specifically, step S300 includes the following steps S310 to S330.
S310, determining the dependency relationship in the original attribute map according to the high-order similar attribute list.
S320, calculating attribute similarity between the first nodes through a cosine similarity calculation method according to the dependency relationship.
In the embodiment of the invention, the attribute similarity between the first nodes is calculated through cosine similarity as follows:
$$\mathrm{Sim}(v_i, v_j) = \cos(\mathrm{Attribute}(v_i), \mathrm{Attribute}(v_j))$$
wherein $\mathrm{Sim}(v_i, v_j)$ represents the attribute similarity between the i-th first node $v_i$ and the j-th first node $v_j$; $\mathrm{Attribute}(\cdot)$ represents the attribute representation of a node; $\cos(\cdot)$ represents the cosine similarity calculation.
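For reference, cosine similarity between two attribute vectors can be computed as in the sketch below. The function name and the choice to return 0.0 for a zero vector are illustrative assumptions; the vectors stand in for the attribute representations of two nodes.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 2.0], [2.0, 4.0]))  # parallel vectors
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # orthogonal vectors
```

The value depends only on the direction of the vectors, not on their length, so nodes with long and short attribute texts remain comparable.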
S330, determining a similar attribute node set of the root node according to the attribute similarity.
First nodes whose attribute similarity is greater than a set similarity threshold ρ are selected into a first similar attribute node set; the N first nodes with the highest attribute similarity (Top-N) are then screened out of the first similar attribute node set, and all the Top-N nodes are combined into the similar attribute node set. The expression for determining the similar attribute node set is as follows:
wherein the left-hand side of the expression represents the set of the Top-N nodes most similar to the root node u in its k-order neighborhood; dist(u, v) represents the shortest distance between the root node u and the current node v in the graph; the function Top-N(C, f) selects the Top-N nodes from the set C according to the function f; V denotes the first similar attribute node set.
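One way to read the selection described above is sketched here in Python. It assumes that the k-order neighborhood means all nodes within k hops of the root (found by breadth-first search), that candidates must also exceed the threshold ρ, and that ties keep discovery order. The names `k_hop_distances`, `top_n_similar`, `adj` and `scores` are hypothetical, and the similarity values are made up for the example.

```python
from collections import deque


def k_hop_distances(adj, root, k):
    """Breadth-first search: hop distance to every node within k hops."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        u = queue.popleft()
        if dist[u] == k:
            continue  # do not expand beyond the k-order neighborhood
        for v in adj.get(u, ()):
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def top_n_similar(adj, sim, root, k, n, rho):
    """Top-N nodes within k hops of root whose similarity exceeds rho."""
    dist = k_hop_distances(adj, root, k)
    candidates = [v for v in dist if v != root and sim(root, v) > rho]
    candidates.sort(key=lambda v: sim(root, v), reverse=True)
    return candidates[:n]


adj = {"u": ["a", "b"], "a": ["u", "c"], "b": ["u"], "c": ["a", "d"], "d": ["c"]}
scores = {"a": 0.9, "b": 0.4, "c": 0.8, "d": 0.95}  # made-up Sim(u, v) values


def sim(root, v):
    return scores[v]


print(top_n_similar(adj, sim, "u", k=2, n=2, rho=0.5))
print(top_n_similar(adj, sim, "u", k=3, n=2, rho=0.5))
```

With k=2 the highly similar node d is out of reach; raising k to 3 lets it displace c from the Top-2.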
S400, merging the neighbor node set and the similar attribute node set to obtain a first attribute graph, and determining the final node representation of the first attribute graph.
Specifically, step S400 includes the following steps S410 to S430.
S410, merging the neighbor node set and the similar attribute node set to obtain a first attribute graph.
S420, splicing the structural representation and the attribute representation of the second node in the first attribute graph, and further determining a first neighbor node of the root node.
The structural representation and the attribute representation of the same node, as it appears in the neighbor node set and the similar attribute node set, are spliced together, and the first neighbor nodes of the root node are determined.
S430, performing fusion calculation on the root node and the first neighbor node by adopting a multi-head attention mechanism to obtain a final node representation.
In the embodiment of the invention, the step of performing fusion calculation on the root node and the first neighbor node by adopting a multi-head attention mechanism comprises the following steps of:
Step one, feature representation. The input root node and first neighbor nodes are converted into feature representations; in the first attribute graph, each node represents a feature or element of the data through its feature representation.
Step two, similarity calculation. The similarity or correlation between each node and the other nodes is calculated, for example by dot product or cosine similarity.
Step three, weight calculation. The weight of each first node is calculated from the similarity results, using a normalization function (such as Softmax) to convert the similarities into a probability distribution.
Step four, weighted summation. The corresponding features are summed, weighted by the weight of each first node, to obtain the final node representation. Nodes with higher weights contribute more to the final node representation.
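Steps two to four can be sketched for a single attention head as follows. The sketch assumes that step one has already produced plain feature vectors, uses the dot product as the similarity, and omits learned projections; the function name `attention_fuse` and the toy vectors are hypothetical.

```python
import math


def attention_fuse(root_vec, neighbor_vecs):
    """Fuse a root feature vector with its neighbors' feature vectors."""
    # Step two: similarity of the root to each neighbor (dot product).
    scores = [sum(r * x for r, x in zip(root_vec, vec)) for vec in neighbor_vecs]
    # Step three: Softmax turns the scores into a probability distribution.
    peak = max(scores)  # subtracted for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    # Step four: weighted sum of the neighbor features.
    dim = len(root_vec)
    fused = [sum(w * vec[i] for w, vec in zip(weights, neighbor_vecs))
             for i in range(dim)]
    return weights, fused


weights, fused = attention_fuse([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
print(weights)
print(fused)
```

The neighbor that points in the same direction as the root receives the larger weight and therefore dominates the fused vector.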
The calculation formula of the multi-head attention mechanism in the embodiment of the invention is as follows:
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \mathrm{head}_2, \ldots, \mathrm{head}_h)\, W^O$$
wherein MultiHead represents the multi-head attention calculation; $Q$, $K$ and $V$ represent the input query, key and value, respectively; $W^O$ is a weight matrix; $h$ represents the number of attention heads; the dimension is $n \times d$, where $n$ is the number of nodes and $d$ is the representation dimension.
Each attention head $\mathrm{head}_i$ applies the scaled dot-product attention mechanism to $Q$, $K$ and $V$. The attention head formula is: $\mathrm{head}_i = \mathrm{Attention}(Q W_i^Q, K W_i^K, V W_i^V)$, wherein $W_i^Q$, $W_i^K$ and $W_i^V$ are learnable parameters. The attention mechanism formula is: $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{\top}}{\sqrt{d_k}}\right) V$, wherein $d_k$ is the dimension of the key vectors $K$.
Finally, the output of the multi-head attention is concatenated with the attribute vector representation of each first node to obtain the final node representation, calculated as: $h_v = \mathrm{Concat}(\mathrm{MultiHead}(Q_v, K_v, V_v), \mathrm{Attribute}(v))$.
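The multi-head formulas can be traced with a small pure-Python sketch. It is a teaching aid under assumptions, not the embodiment's implementation: matrices are lists of rows, the projection matrices are fixed identity matrices rather than learned parameters, and the output matrix is chosen so that it averages the two heads. All function and variable names are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    peak = max(row)  # subtracted for numerical stability
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(k[0])
    k_t = [list(col) for col in zip(*k)]
    scores = matmul(q, k_t)
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v)


def multi_head(q, k, v, heads, w_o):
    """heads: list of (W_q, W_k, W_v) projection matrices, one triple per head."""
    outputs = [attention(matmul(q, wq), matmul(k, wk), matmul(v, wv))
               for wq, wk, wv in heads]
    # Concat(head_1, ..., head_h): join the head outputs row by row.
    concat = [sum((out[i] for out in outputs), []) for i in range(len(q))]
    return matmul(concat, w_o)


identity = [[1.0, 0.0], [0.0, 1.0]]
x = [[1.0, 0.0], [0.0, 1.0]]  # n = 2 nodes, d = 2
heads = [(identity, identity, identity), (identity, identity, identity)]
w_o = [[0.5, 0.0], [0.0, 0.5], [0.5, 0.0], [0.0, 0.5]]  # averages the two heads
out = multi_head(x, x, x, heads, w_o)

# h_v = Concat(MultiHead(...), Attribute(v)) with made-up attribute vectors.
attributes = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
final = [row + attr for row, attr in zip(out, attributes)]
print(final)
```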
S500, predicting the probability that a link exists between the source node and the target node according to the final node representation.
Specifically, step S500 includes the following steps S510 to S550.
S510, acquiring a source node and a target node to be predicted.
S520, determining a first embedded representation of the source node and a second embedded representation of the target node according to the final node representation.
And finding a node representation corresponding to the source node in the final node representation as a first embedded representation, and finding a node representation corresponding to the target node as a second embedded representation.
S530, splicing the first embedded representation and the second embedded representation into a vector and inputting the vector into the multi-layer perceptron.
S540, determining a link score according to the first layer weight matrix, the second layer weight matrix, the first layer bias vector and the second layer bias vector of the multi-layer perceptron.
S550, determining the probability of existence of the link between the source node and the target node according to the link score.
In the embodiment of the invention, the link score of the link between the source node and the target node is calculated as follows:
$$P(u_i, u_j) = \sigma\left(W_2\, f\left(W_1 [u_i, u_j] + b_1\right) + b_2\right)$$
wherein $P(u_i, u_j)$ represents the probability that a link exists between the source node and the target node; $u_i$ is the first embedded representation of the source node; $u_j$ is the second embedded representation of the target node; $[u_i, u_j]$ denotes the concatenation of $u_i$ and $u_j$ into one vector; $W_1$ is the first layer weight matrix of the multi-layer perceptron; $W_2$ is the second layer weight matrix of the multi-layer perceptron; $b_1$ is the first layer bias vector of the multi-layer perceptron; $b_2$ is the second layer bias vector of the multi-layer perceptron; $f$ is the activation function applied after the first layer; and $\sigma$ is the sigmoid function. The resulting link score measures the probability of a link between the nodes.
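Steps S530 to S550 can be sketched as a two-layer perceptron over the concatenated embeddings. This is only an illustration under assumptions: the text does not name the activation $f$, so ReLU is used here, and the embedding size, hidden size and random weights are hypothetical.

```python
import numpy as np


def sigmoid(x):
    """Logistic function mapping a real score to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def link_probability(u_i, u_j, W1, b1, W2, b2):
    """Compute sigma(W2 f(W1 [u_i, u_j] + b1) + b2) for one node pair."""
    x = np.concatenate([u_i, u_j])            # S530: splice the two embeddings
    hidden = np.maximum(0.0, W1 @ x + b1)     # f is assumed to be ReLU
    score = W2 @ hidden + b2                  # S540: link score, shape (1,)
    return float(sigmoid(score)[0])           # S550: probability of a link


rng = np.random.default_rng(0)
d, hidden_dim = 12, 16                        # hypothetical sizes
u_i, u_j = rng.normal(size=d), rng.normal(size=d)
W1 = rng.normal(size=(hidden_dim, 2 * d)) * 0.1
b1 = np.zeros(hidden_dim)
W2 = rng.normal(size=(1, hidden_dim)) * 0.1
b2 = np.zeros(1)

p = link_probability(u_i, u_j, W1, b1, W2, b2)
print(p)
```

In practice the weights would be learned from known links and non-links; the random values above only exercise the formula.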
The embodiment of the invention also provides a link prediction system based on the topological structure and the attribute information. Referring to FIG. 3, the link prediction system comprises:
a first module, configured to determine a root node from the first nodes of the original attribute graph, perform a biased random walk calculation on the original attribute graph according to the root node to obtain a first node sequence set, and perform frequency screening on the second nodes in the first node sequence set to obtain a neighbor node set;
a second module, configured to learn the attribute representation of the original attribute graph by using a BERT model to obtain attribute vector representations of all the first nodes in the original attribute graph, and to generate a high-order similar attribute list;
a third module, configured to calculate the attribute similarity between the first nodes according to the high-order similar attribute list, and thereby determine a similar attribute node set of the root node;
a fourth module, configured to merge the neighbor node set and the similar attribute node set to obtain a first attribute graph, and determine a final node representation of the first attribute graph;
a fifth module, configured to predict the probability that a link exists between the source node and the target node according to the final node representation.
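The similarity computation performed by the third module can be sketched with cosine similarity over attribute vectors. The selection rule is an assumption: the text only states that the similar attribute node set is determined from the attribute similarity, so a fixed `threshold` is used here for illustration, and the function name and toy vectors are hypothetical.

```python
import numpy as np


def similar_attribute_nodes(attr, root, threshold):
    """Return indices of nodes whose attribute vector has cosine similarity
    of at least `threshold` with the root node's vector (root excluded).

    Assumes every row of `attr` is non-zero.
    """
    unit = attr / np.linalg.norm(attr, axis=1, keepdims=True)
    sims = unit @ unit[root]                  # cosine similarity to the root
    return [int(v) for v in np.flatnonzero(sims >= threshold) if v != root]


# Toy attribute vectors standing in for BERT outputs (hypothetical values).
attr = np.array([[1.0, 0.0],
                 [2.0, 0.1],
                 [0.0, 1.0],
                 [-1.0, 0.0]])
similar = similar_attribute_nodes(attr, root=0, threshold=0.9)
print(similar)  # → [1]
```

A top-k rule would be an equally plausible reading; the threshold form is chosen only because it mirrors the frequency threshold used for structural neighbors.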
The embodiment of the invention also provides an electronic device, referring to FIG. 4, comprising a processor and a memory; the memory is used for storing a program; the processor executes the program to implement the method as described above.
The embodiment of the invention also provides a computer-readable storage medium storing a program that is executed by a processor to implement the method as described above.
The embodiment of the invention has the following beneficial effects:
1. A first node sequence set is obtained by performing a biased random walk calculation on the original attribute graph, and frequency screening of the first nodes determines the neighbor node set. Attribute vector representations of the nodes are then obtained and used to determine the similar attribute node set. The neighbor node set and the similar attribute node set are fused to generate the first attribute graph, from which the probability of a link between the source node and the target node is predicted. Node representations are thus learned from both the topological structure and the attribute information, and higher-order neighbor information is captured as well. Because the random walk and frequency screening exploit the relevance among the nodes of the original attribute graph, the structural adjacency of the graph is captured and more comprehensive information is obtained, which improves the accuracy of link prediction between nodes in the attribute graph;
2. The random walk calculation introduces a certain amount of noise, for example by adding random nodes or deleting some of the edges. The first node sequence set formed in this way helps the model handle noise and enhances its robustness. In other embodiments, the generalization ability and robustness of the model can also be increased by applying data augmentation to the node representations.
The following is an application example provided by the embodiment of the present invention:
A root node is determined from the first nodes of the original attribute graph, a biased random walk calculation is performed on the original attribute graph according to the root node to obtain a first node sequence set, and frequency screening is performed on the first nodes according to the first node sequence set to obtain a neighbor node set. The attribute representation of the original attribute graph is learned by using a BERT model to obtain attribute vector representations of all the first nodes in the original attribute graph, and a high-order similar attribute list is generated. According to the high-order similar attribute list, the attribute similarity between the first nodes is calculated, and the similar attribute node set of the root node is determined. The neighbor node set and the similar attribute node set are merged to obtain a first attribute graph, and the final node representation of the first attribute graph is determined. The probability of a link between the source node and the target node is predicted from the final node representation.
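The first step of this example, a biased random walk followed by frequency screening, can be sketched in plain Python. Several details are assumptions: the transfer weight is taken as edge weight times the influence of the candidate node, the root is excluded from its own neighbor set, and the toy graph, `influence` values, walk length and threshold `tau` are hypothetical.

```python
import random
from collections import Counter


def biased_walk(graph, influence, root, length, rng):
    """One walk from `root`; the next node is drawn with probability
    proportional to edge weight times the influence of the candidate node
    (this form of the transfer weight is an assumption)."""
    walk = [root]
    for _ in range(length - 1):
        cur = walk[-1]
        nbrs = list(graph[cur])
        if not nbrs:
            break
        weights = [graph[cur][v] * influence[v] for v in nbrs]
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


def frequent_neighbors(walks, root, tau):
    """Keep nodes whose share of all visits in the walks exceeds tau.
    Excluding the root itself is an assumption of this sketch."""
    counts = Counter(v for walk in walks for v in walk)
    total = sum(counts.values())
    return {v for v, c in counts.items() if v != root and c / total > tau}


# Toy weighted graph as an adjacency dict (hypothetical data).
graph = {
    "a": {"b": 1.0, "c": 0.5},
    "b": {"a": 1.0, "c": 1.0, "d": 0.2},
    "c": {"a": 0.5, "b": 1.0},
    "d": {"b": 0.2},
}
influence = {"a": 1.0, "b": 2.0, "c": 1.0, "d": 0.5}

rng = random.Random(0)
walks = [biased_walk(graph, influence, "a", 6, rng) for _ in range(50)]
neighbors = frequent_neighbors(walks, "a", tau=0.1)
print(sorted(neighbors))
```

Raising `tau` shrinks the neighbor set toward the most frequently visited nodes, which is how the threshold controls which structural neighbors are kept.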
In some alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may in fact be executed substantially concurrently or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flowcharts of the present invention are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed, and in which sub-operations described as part of a larger operation are performed independently.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.
Logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions for implementing logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). In addition, the computer readable medium may even be paper or other suitable medium on which the program is printed, as the program may be electronically captured, via, for instance, optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner, if necessary, and then stored in a computer memory.
In the description of the present specification, a description referring to terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present invention have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the invention, the scope of which is defined by the claims and their equivalents.
While the preferred embodiment of the present invention has been described in detail, the present invention is not limited to the embodiments described above, and those skilled in the art can make various equivalent modifications or substitutions without departing from the spirit of the present invention, and these equivalent modifications or substitutions are included in the scope of the present invention as defined in the appended claims.

Claims (10)

1. The link prediction method based on the topological structure and the attribute information is characterized by comprising the following steps:
determining a root node from a first node of an original attribute graph, performing a biased random walk calculation on the original attribute graph according to the root node to obtain a first node sequence set, and performing frequency screening on the first node according to the first node sequence set to obtain a neighbor node set;
learning the attribute representation of the original attribute graph by adopting a BERT model to obtain attribute vector representations of all the first nodes in the original attribute graph, and generating a high-order similar attribute list;
calculating attribute similarity among the first nodes according to the high-order similar attribute list, and further determining a similar attribute node set of the root node;
merging the neighbor node set and the similar attribute node set to obtain a first attribute graph, and determining a final node representation of the first attribute graph;
and predicting the probability of a link between the source node and the target node according to the final node representation.
2. The method for predicting a link based on a topology and attribute information according to claim 1, wherein performing the biased random walk calculation on the original attribute graph according to the root node to obtain the first node sequence set, and performing frequency screening on the first node according to the first node sequence set to obtain the neighbor node set, comprises:
calculating, according to the weights of the edges between the first nodes and the influence of the first nodes, the walk probability that one first node walks to another first node as the next node;
taking the root node as the starting node of the random walk calculation, and performing the biased random walk calculation according to the walk probability to obtain the first node sequence set; wherein the first node sequence set comprises a plurality of first node sequences; and determining the neighbor node set according to the frequency of the first node in the first node sequence set.
3. The method for link prediction based on topology and attribute information of claim 2, wherein,
the calculation formula of the walk probability is as follows:
$$P(v_t \mid v_{t-1}) = \begin{cases} \dfrac{\alpha(v_{t-1}, v_t)}{Z}, & (v_{t-1}, v_t) \in E \\ 0, & \text{otherwise} \end{cases}$$
wherein $P(v_t \mid v_{t-1})$ represents the walk probability that the $(t-1)$-th said first node $v_{t-1}$ walks to the $t$-th said first node $v_t$ as the next node; $\alpha(v_{t-1}, v_t)$ is the transfer weight function between $v_{t-1}$ and $v_t$; $Z$ is a normalization constant; and $E$ is the edge set in the original attribute graph;
the calculation formula of the frequency is as follows:
$$\mathrm{freq}(v, S) = \frac{\mathrm{count}(v, S)}{\sum_{r} \mathrm{count}(v_r, S)}$$
wherein $\mathrm{freq}(v, S)$ represents the frequency of occurrence of the current node $v$ in said first node sequence set $S$; $\mathrm{count}(v, S)$ represents the number of times the current node $v$ appears in the first node sequence set $S$; $\sum_{r} \mathrm{count}(v_r, S)$ represents the sum of the number of occurrences of all the first nodes in the first node sequence set $S$; and $v_r$ represents the $r$-th said first node;
the function expression for determining the neighbor node set is:
$$S' = F\left(\{\, v \mid v \in S,\ \mathrm{freq}(v, S) > \tau \,\}\right)$$
wherein $S'$ represents the neighbor node set; $F$ is a function that selects neighbor nodes; $v$ represents the current node; and $\tau$ is a frequency threshold that controls the frequency of the selected nodes.
4. The method for predicting a link based on a topology and attribute information according to claim 1, wherein learning the attribute representation of the original attribute graph using a BERT model to obtain attribute vector representations of all the first nodes in the original attribute graph, and generating a high-order similar attribute list, comprises:
inputting the original attribute graph into a BERT model for calculation to obtain a hidden layer representation of the first node;
encoding structural information between the first nodes into a text sequence;
taking the text sequence as the input of a next sentence prediction (NSP) task, and determining an attribute vector representation of the first node in combination with the hidden layer representation;
and integrating the attribute vector representations to form a high-order similar attribute list.
5. The method for predicting links based on topology and attribute information according to claim 1, wherein calculating attribute similarity between each of the first nodes according to the high-order similar attribute list, and further determining a set of similar attribute nodes of the root node, comprises:
determining a dependency relationship in the original attribute map according to the high-order similar attribute list;
calculating attribute similarity between the first nodes by a cosine similarity calculation method according to the dependency relationship;
and determining a similar attribute node set of the root node according to the attribute similarity.
6. The method of claim 1, wherein merging the set of neighbor nodes and the set of similar attribute nodes to obtain a first attribute graph and determining a final node representation of the first attribute graph comprises:
combining the neighbor node set and the similar attribute node set to obtain a first attribute graph;
splicing the structural representation and the attribute representation of the second node in the first attribute graph, and further determining a first neighbor node of the root node;
and carrying out fusion calculation on the root node and the first neighbor node by adopting a multi-head attention mechanism to obtain a final node representation.
7. The method for predicting a link based on topology and attribute information as recited in claim 1, wherein the predicting a probability that a link exists between a source node and a target node based on the final node representation comprises:
acquiring a source node and a target node to be predicted;
determining a first embedded representation of the source node and a second embedded representation of the target node from the final node representation;
splicing the first embedded representation and the second embedded representation into a vector and inputting the vector into a multi-layer perceptron;
determining a link score according to a first layer weight matrix, a second layer weight matrix, a first layer bias vector and a second layer bias vector of the multi-layer perceptron;
and determining the probability of the existence of a link between the source node and the target node according to the link score.
8. A link prediction system based on topology and attribute information, comprising:
a first module, configured to determine a root node from a first node of an original attribute graph, perform a biased random walk calculation on the original attribute graph according to the root node to obtain a first node sequence set, and perform frequency screening on the first node according to the first node sequence set to obtain a neighbor node set;
a second module, configured to learn the attribute representation of the original attribute graph by adopting a BERT model to obtain attribute vector representations of all the first nodes in the original attribute graph, and to generate a high-order similar attribute list;
a third module, configured to calculate, according to the high-order similar attribute list, attribute similarity between the first nodes, thereby determining a similar attribute node set of the root node;
a fourth module, configured to merge the neighbor node set and the similar attribute node set to obtain a first attribute graph, and determine a final node representation of the first attribute graph;
and a fifth module, configured to predict a probability that a link exists between the source node and the target node according to the final node representation.
9. An electronic device comprising a processor and a memory;
the memory is used for storing programs;
the processor executing the program implements the method of any one of claims 1 to 7.
10. A computer-readable storage medium, characterized in that the storage medium stores a program that is executed by a processor to implement the method of any one of claims 1 to 7.
CN202310697038.8A 2023-06-12 2023-06-12 Link prediction method and system based on topological structure and attribute information Pending CN116932938A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310697038.8A CN116932938A (en) 2023-06-12 2023-06-12 Link prediction method and system based on topological structure and attribute information

Publications (1)

Publication Number Publication Date
CN116932938A true CN116932938A (en) 2023-10-24

Family

ID=88376487

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310697038.8A Pending CN116932938A (en) 2023-06-12 2023-06-12 Link prediction method and system based on topological structure and attribute information

Country Status (1)

Country Link
CN (1) CN116932938A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117235032A (en) * 2023-11-08 2023-12-15 支付宝(杭州)信息技术有限公司 Distributed link prediction method and device
CN117235032B (en) * 2023-11-08 2024-01-05 支付宝(杭州)信息技术有限公司 Distributed link prediction method and device

Similar Documents

Publication Publication Date Title
CN111488734B (en) Emotional feature representation learning system and method based on global interaction and syntactic dependency
Xu Understanding graph embedding methods and their applications
CN108984724B (en) Method for improving emotion classification accuracy of specific attributes by using high-dimensional representation
CN109376222B (en) Question-answer matching degree calculation method, question-answer automatic matching method and device
CN111506714A (en) Knowledge graph embedding based question answering
Ezaldeen et al. A hybrid E-learning recommendation integrating adaptive profiling and sentiment analysis
CN112417097B (en) Multi-modal data feature extraction and association method for public opinion analysis
CN110245238B (en) Graph embedding method and system based on rule reasoning and syntax mode
Wang et al. SentiRelated: A cross-domain sentiment classification algorithm for short texts through sentiment related index
CN115391570A (en) Method and device for constructing emotion knowledge graph based on aspects
Wang et al. Scholar2vec: vector representation of scholars for lifetime collaborator prediction
CN116383399A (en) Event public opinion risk prediction method and system
Zhu et al. Joint visual-textual sentiment analysis based on cross-modality attention mechanism
CN116932938A (en) Link prediction method and system based on topological structure and attribute information
Yang et al. Social tag embedding for the recommendation with sparse user-item interactions
CN116244446A (en) Social media cognitive threat detection method and system
Zhou et al. Rank2vec: learning node embeddings with local structure and global ranking
CN114880427A (en) Model based on multi-level attention mechanism, event argument extraction method and system
Wei et al. Sentiment classification of tourism reviews based on visual and textual multifeature fusion
Chen et al. Dag-based long short-term memory for neural word segmentation
CN112434512A (en) New word determining method and device in combination with context
Zhang et al. An attentive memory network integrated with aspect dependency for document-level multi-aspect sentiment classification
CN113869034B (en) Aspect emotion classification method based on reinforced dependency graph
CN115730232A (en) Topic-correlation-based heterogeneous graph neural network cross-language text classification method
CN114491041A (en) Patent classification method and system based on network representation learning and hierarchical label embedding

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination