CN114037008A - Node classification method and system for multi-granularity attribute network embedding based on attribute continuous edges - Google Patents
- Publication number: CN114037008A
- Application number: CN202111305905.6A
- Authority
- CN
- China
- Prior art keywords
- network
- attribute
- node
- nodes
- granularity
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a node classification method and system for multi-granularity attribute network embedding based on attribute edge connection. A structure network is constructed from the input network; node pairs with highly similar attributes are selected, and attribute edges are added between them to construct an attribute network. The weights of the network structure and the attributes are adjusted and the two networks are fused. The fused network is partitioned according to its topology and node attribute information to obtain a coarsened network, and the partitioning is repeated to obtain a series of coarsened attribute networks. Matrix factorization yields an initial low-dimensional node vector representation for the attribute network at each granularity, and spectral propagation refines this initial representation into the node feature representation at that granularity. The node feature representations of all granularities are concatenated and sent to a classifier to complete node classification. For attribute edge connection, nodes are first clustered and node pairs are then filtered with a threshold, which speeds up attribute processing; in addition, fast partitioning yields a granulated model that reduces the network scale before node representations are learned, so node classification as a whole is faster.
Description
Technical Field
The invention relates to the technical field of network embedding, and in particular to a node classification method and system for multi-granularity attribute network embedding based on attribute edge connection.
Background
Network data naturally expresses the relationships between objects and is ubiquitous in daily life and work. In recent years, networks have developed rapidly as an important data structure for modeling complex real-world systems; for citation networks, social networks and the like, constructing a complex network model enables a range of data mining analyses. Nodes and edges are the essential components of a network. For example, in a citation network each article can be represented by a node and the citation relationships between articles by edges; by predicting the labels of articles, highly similar literature can be recommended to readers. In a social network, each user can be represented by a node and the relationships among users by edges; a node classification model that predicts user types enables personalized recommendation. Research on node classification therefore plays a crucial role in network analysis.
Network embedding maps each node to a low-dimensional vector representation while preserving the network structure and its inherent characteristics, and it has attracted great attention as networks have continued to develop. In practice, node classification is mainly performed with network embedding methods: a low-dimensional vector representation of each node is learned, and the similarity between nodes is computed from these vectors in order to predict node labels. Various network embedding methods can produce such node vectors, for example node classification methods that preserve single-granularity structure information, methods that preserve single-granularity structure and attribute information, and methods that preserve multi-granularity structure information. However, each of these has limitations. Methods that preserve only single-granularity structure information generally cannot retain node attribute information, which can lead to low classification accuracy. Methods that preserve single-granularity structure and attribute information generally cannot capture features at multiple granularities; they reflect the network at one granularity only and therefore show limited advantage on node classification.
Methods that preserve multi-granularity structure information retain the multi-granularity information of the network but ignore node attribute information, so they also fall short of better classification results. The classification accuracy of all these methods remains to be improved.
Disclosure of Invention
The technical problem to be solved by the invention is that existing node classification methods based on network embedding cannot obtain more effective classification results.
The invention solves this technical problem through the following technical means:
A node classification method for multi-granularity attribute network embedding based on attribute edge connection comprises the following steps:
S1, numbering the nodes of the input network and obtaining the label information of the nodes, then constructing a structure network $G_{topo}$;
S2, capturing the attribute information of all nodes: first clustering to preliminarily screen out nodes with similar attributes, then comparing node pairs within each cluster against a threshold to screen out the node pairs whose attributes are highly similar across the whole network, and finally constructing an attribute network $G_{feat}$ with the proposed attribute edge connection method;
S3, for the input network, adjusting the weights of the network structure and the attributes, and fusing the structure network and the attribute network into a fused network;
S4, partitioning the fused network $G^{i}$ according to its topology and node attribute information to obtain a network $G^{i+1}$, and repeating the partitioning to obtain a series of attribute networks of gradually decreasing scale, $G^{0}, G^{1}, \dots, G^{k}$, which represent the different granularities of the network, where i and k are integers and $0 \le i \le k$;
S5, performing matrix factorization to obtain an initial low-dimensional node vector representation for the attribute network at each granularity, and refining the initial representation through spectral propagation to obtain the node feature representation at that granularity;
S6, concatenating the node feature representations of all granularities to obtain a node feature representation that reflects the multiple granularities of the network;
S7, sending the node feature representation of the multi-granularity attribute network and the labels to a classifier, predicting the labels of nodes of unknown class, and assigning nodes with the same label to the same class to complete node classification.
The invention first constructs the structure network $G_{topo}$, then constructs the attribute network $G_{feat}$ with the proposed attribute edge connection method, and fuses the structure network and the attribute network into a fused network. The network $G^{i}$ is partitioned according to its topology and node attribute information to obtain the network $G^{i+1}$, and the partitioning is repeated to obtain a series of hierarchical attribute networks of gradually decreasing scale. The multi-granularity attribute network embedding method based on attribute edge connection therefore retains both the attribute information of the nodes and the features of the network at different granularities, which improves node classification performance and addresses the problem that existing node classification methods based on network embedding cannot obtain more effective classification results.
As a further scheme of the invention, step S1 includes:
S11, processing the input network, including:
Step A, numbering the entities in the network; the network contains $n_1$ entities, each entity serves as a node of the network, the relationships between the $n_1$ entities serve as the edges of the network, and the number of edges is $n_2$;
Step B, dividing the entities of the input network into several categories, the label of each node being its category number;
S12, constructing the structure network $G_{topo}$ from the processed network data:
$G_{topo} = (V, E)$, where V denotes the set of $n_1$ nodes and E denotes the set of $n_2$ edges.
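Steps S11 and S12 can be illustrated with a minimal sketch. The function name `build_structure_network`, the toy entity names, and the choice to treat edges as undirected are assumptions made for illustration; they are not specified in the text.

```python
def build_structure_network(relations):
    """Number entities and build G_topo = (V, E) from (entity, entity) relations.

    Hypothetical helper for illustration; edges are treated as undirected.
    """
    index = {}                      # entity -> node number, in order of first appearance
    edges = set()
    for a, b in relations:
        for entity in (a, b):
            if entity not in index:
                index[entity] = len(index)
        u, v = index[a], index[b]
        if u != v:                  # ignore self-loops
            edges.add((min(u, v), max(u, v)))
    nodes = list(range(len(index)))
    return index, nodes, sorted(edges)


index, V, E = build_structure_network(
    [("paper_a", "paper_b"), ("paper_b", "paper_c"), ("paper_b", "paper_a")]
)
```

Here the repeated relation between `paper_a` and `paper_b` collapses into a single edge, so V has $n_1 = 3$ nodes and E has $n_2 = 2$ edges.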
As a further scheme of the invention, step S2 includes:
S21, screening node pairs with highly similar attributes and performing attribute edge connection:
The node attribute information is acquired, and the nodes of the network are clustered by their attributes with the KMeans algorithm.
Within each cluster, the cosine similarity AttiSim is computed for every pair of nodes. Similarity detection and assignment are performed on the node pairs with formula (1), in which $A_f(i, j)$ denotes the similarity between the i-th node and the j-th node in matrix form.
If the attribute similarity AttiSim of a node pair is greater than a preset threshold $\gamma$, the pair is selected as a node pair whose attributes are highly similar across the whole network.
For each node pair selected by the threshold comparison, an edge is added between the two nodes and its weight is set to the cosine similarity of the pair, which represents how similar the attributes of the two nodes are. This process is the attribute edge connection.
S22, constructing the attribute network $G_{feat}$ from the result of the attribute edge connection:
$G_{feat} = (V, E, X)$, where V denotes the set of $n_1$ nodes, E denotes the set of $n_2$ edges, and X is an $n_1 \times l$ matrix, l being the dimension of the node attributes; the attribute network $G_{feat}$ is constructed from the existing nodes and the newly generated edges.
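The attribute edge connection of step S21 can be sketched as follows. This is one reading of the text, not the patented implementation: cluster labels are taken as given (assumed to come from KMeans), and the helper names `cosine` and `attribute_edges` are hypothetical.

```python
import math
from itertools import combinations


def cosine(x, y):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0


def attribute_edges(X, cluster_of, gamma):
    """Return {(i, j): weight} for same-cluster pairs whose cosine similarity exceeds gamma.

    X          -- list of attribute vectors, one per node
    cluster_of -- cluster label per node (assumed to come from KMeans)
    gamma      -- similarity threshold
    """
    members = {}
    for node, c in enumerate(cluster_of):
        members.setdefault(c, []).append(node)
    edges = {}
    for nodes in members.values():
        for i, j in combinations(nodes, 2):   # compare only inside a cluster
            sim = cosine(X[i], X[j])
            if sim > gamma:
                edges[(i, j)] = sim           # edge weight = cosine similarity
    return edges


X = [[1.0, 0.0], [2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
edges = attribute_edges(X, cluster_of=[0, 0, 1, 1], gamma=0.9)
```

Restricting comparisons to pairs inside the same cluster is what avoids the full pairwise comparison over all $n_1$ nodes; the threshold then keeps only the strongest pairs.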
As a further scheme of the invention, step S3 includes:
S31, for the given input network, adjusting the weights of the network structure and the attributes, and fusing the structure network and the attribute network into a fused network:
The structure network $G_{topo}$ and the attribute network $G_{feat}$ are represented by the adjacency matrices $A_{topo}$ and $A_{feat}$ respectively. Each matrix is given a weight suited to the given network, as shown in formula (2), and the two weighted matrices are then added to obtain the fused network, which is represented by its adjacency matrix.
The edge weights between nodes in the fused network then represent the similarity between nodes under the combined effect of network structure and attributes.
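The fusion in step S31 can be sketched as a weighted sum of the two adjacency matrices. The exact form of formula (2) is not reproduced in the text, so the two scalar weights `alpha` and `beta` below, and the sparse dictionary representation, are assumptions for illustration only.

```python
def fuse(A_topo, A_feat, alpha, beta):
    """Weighted sum of two sparse adjacency maps {(i, j): weight}.

    alpha and beta are hypothetical names for the structure and attribute weights.
    """
    fused = {}
    for A, w in ((A_topo, alpha), (A_feat, beta)):
        for pair, weight in A.items():
            fused[pair] = fused.get(pair, 0.0) + w * weight
    return fused


A_topo = {(0, 1): 1.0, (1, 2): 1.0}
A_feat = {(0, 1): 0.5, (0, 2): 0.8}
A = fuse(A_topo, A_feat, alpha=1.0, beta=0.5)
```

An edge present in both networks, such as (0, 1), accumulates both contributions, while an attribute-only edge such as (0, 2) enters the fused network with its scaled similarity.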
As a further scheme of the invention, step S4 includes:
S41, for the fused network $G^{i}$ at any granularity, performing community division on $G^{i}$ with the Louvain community division method to obtain a community division result based on structure and attribute information,
where $V^{i}$ denotes the result of dividing $G^{i}$ into communities of similar structure and attributes, and its first element denotes the first community obtained from $G^{i}$ according to the similarity of structure and attributes;
S42, acquiring a supernode set from the community division result based on structure and attributes:
The communities of similar structure and attributes obtained from $G^{i}$ are taken, and each community becomes a node of the network $G^{i+1}$, which gives the supernode set $V^{i+1}$;
S43, for the supernode set $V^{i+1}$, obtaining the attribute information $X^{i+1}$ of each supernode by averaging the attributes of the nodes of the previous granularity that make up the supernode;
S44, for the supernode set $V^{i+1}$, merging the edges of the nodes within the supernodes into edges between supernodes and summing their weights to obtain superedges, which gives the superedge set $E^{i+1}$;
S45, building a new network $G^{i+1} = (V^{i+1}, E^{i+1}, X^{i+1})$ from the supernode set $V^{i+1}$, the superedge set $E^{i+1}$ and the supernode attribute information $X^{i+1}$;
S46, iterating to obtain a series of attribute networks of gradually decreasing scale, $G^{0}, G^{1}, \dots, G^{k}$, where $G^{k}$ is the coarsest attribute network and each $G^{i}$ is finer-grained than $G^{i+1}$.
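Steps S42 to S45 can be sketched as a single coarsening function. The community assignment is taken as given (assumed to come from Louvain), the name `coarsen` is hypothetical, and dropping edges that fall inside a community is a simplification of this sketch rather than something the text states.

```python
def coarsen(edges, X, community_of):
    """One granulation step: return (super_edges, super_X) for the coarser network.

    edges        -- {(u, v): weight} of the current network
    X            -- attribute vector per node
    community_of -- community id per node, assumed to be 0..m-1 with no empty community
    """
    m = max(community_of) + 1
    dim = len(X[0])
    sums = [[0.0] * dim for _ in range(m)]
    counts = [0] * m
    for node, c in enumerate(community_of):
        counts[c] += 1
        for d in range(dim):
            sums[c][d] += X[node][d]
    # S43: supernode attributes are the mean of the member attributes
    super_X = [[s / counts[c] for s in sums[c]] for c in range(m)]

    super_edges = {}
    for (u, v), w in edges.items():
        cu, cv = community_of[u], community_of[v]
        if cu == cv:
            continue                      # intra-community edge: dropped in this sketch
        key = (min(cu, cv), max(cu, cv))
        super_edges[key] = super_edges.get(key, 0.0) + w   # S44: sum the weights
    return super_edges, super_X


edges = {(0, 1): 1.0, (1, 2): 0.5, (2, 3): 1.0, (0, 3): 0.25}
X = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]]
super_edges, super_X = coarsen(edges, X, community_of=[0, 0, 1, 1])
```

Calling `coarsen` repeatedly, each time with a fresh community assignment for the current network, would produce the sequence of progressively smaller networks described in S46.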
As a further scheme of the invention, step S5 includes:
S51, expressing each of the coarsened networks of different granularity in matrix form and learning an approximate initial representation with a fast matrix factorization method, Randomized SVD, as follows:
Step 1: find a matrix Q that approximates the range of the original matrix A. The original matrix A is repeatedly multiplied by a randomly initialized low-dimensional matrix and the product is decomposed, which finally yields a stable vector matrix Q satisfying formula (3):
$A \approx QQ^{T}A$ (3)
Step 2: construct the matrix B:
$B = Q^{T}A$ (4)
Step 3: decompose the matrix B with the SVD method:
$B = S\Sigma V^{T}$ (5)
S52, using spectral propagation to add local smoothing information and clustering information at each layer, refining the initial node representation into the node representation at that granularity.
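Formulas (3) to (5) of S51 combine into an approximate SVD of A. The short derivation below assumes that Q has orthonormal columns, which is the usual choice in Randomized SVD but is not stated explicitly in the text.

```latex
A \;\approx\; Q Q^{T} A \;=\; Q B \;=\; Q \left( S \Sigma V^{T} \right) \;=\; (Q S)\, \Sigma\, V^{T}
```

Under that assumption, $QS$ plays the role of the approximate left singular vectors of A, with $\Sigma$ and $V$ taken directly from the decomposition of B. Because B has only as many rows as Q has columns, the SVD in formula (5) is much cheaper than a direct SVD of A.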
As a further scheme of the invention, step S6 includes:
S61, concatenating the vector representation of the network at each granularity with the original attribute information according to formula (6) to obtain a network representation that reflects multiple granularities.
As a further scheme of the invention, step S7 includes:
The multi-granularity attribute network node features and the labels are sent to a classifier, the labels of nodes of unknown class are predicted, and nodes with the same label are assigned to the same class, which completes the node classification.
A node classification system for multi-granularity attribute network embedding based on attribute edge connection comprises:
a construction module, which numbers the citation network library and obtains the node labels from it, and then constructs the structure network $G_{topo}$, the attribute network $G_{feat}$ and the fused network $G^{i}$, where i is an integer;
a partitioning module, which partitions and granulates the network $G^{i}$ based on structure and attribute information to obtain a coarsened network $G^{i+1}$, and repeats the granulation to obtain a series of attribute networks of gradually decreasing scale, $G^{0}, G^{1}, \dots, G^{k}$, where i and k are integers;
an attribute network node feature module, which learns the initial low-dimensional vector feature representation of the nodes of the attribute network at each granularity;
an optimization module, which refines the initial vector representation with spectral propagation;
a concatenation module, which concatenates the vector representation and the attribute information of the nodes at each granularity to obtain the network representation;
a classification module, which sends the multi-granularity attribute network node features and the labels to a classifier, predicts the labels of nodes of unknown class, and assigns nodes with the same label to the same class to complete node classification.
The advantages of the invention are as follows:
1. The invention first constructs the structure network $G_{topo}$, then constructs the attribute network $G_{feat}$ through attribute edge connection, and finally forms the fused network, so that the similarity relationships of both structure and attributes in the network are preserved. The network $G^{i}$ is partitioned based on both structure and attribute information to obtain the network $G^{i+1}$, and repeating the partitioning yields a series of attribute networks of gradually decreasing scale. Network structure and attribute information are thus retained at different granularities, and attribute network information can be exchanged and propagated between granularities, which improves node classification performance and addresses the problem that existing node classification methods based on network embedding cannot obtain more effective classification results.
2. Attribute edge connection quickly preserves the attribute information of the network by adding attribute similarity relationships to the original structure network. Fusing the structure and attribute networks effectively retains both network structure and attribute information, and adjusting the weights given to structure and attribute information can further improve the classification result.
3. The invention obtains network embeddings at several granularities, each of which approximately represents the original network, and the embeddings of the different granularities can also be concatenated to describe the multi-granularity information of the original network.
4. For attribute edge connection, nodes are first clustered and node pairs are then filtered with a threshold, which speeds up attribute processing; in addition, fast partitioning yields a granulated model that reduces the network scale before node representations are learned, so node classification as a whole is faster.
Drawings
Fig. 1 is a schematic flowchart of the node classification method and system for multi-granularity attribute network embedding based on attribute edge connection provided in Embodiment 1 of the present invention.
Fig. 2 is a framework diagram of the node classification method and system for multi-granularity attribute network embedding based on attribute edge connection provided in Embodiment 1 of the present invention.
Fig. 3 is a schematic structural diagram of the node classification method and system for multi-granularity attribute network embedding based on attribute edge connection provided in Embodiment 2 of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions are described clearly and completely below with reference to the embodiments. The described embodiments are some, but not all, of the embodiments of the present invention. All other embodiments that a person skilled in the art can derive from the embodiments given herein without creative effort fall within the protection scope of the present invention.
First, a brief introduction to Pubmed: it is a free-to-search database that provides search and abstracts for biomedical papers. Its database source is MEDLINE. Its core subject is medicine, but it also covers other medically related fields such as nursing and other health disciplines.
Example 1
Referring to Fig. 1 and Fig. 2, Fig. 1 is a schematic flowchart and Fig. 2 is a framework diagram of the node classification method and system for multi-granularity attribute network embedding based on attribute edge connection provided in Embodiment 1 of the present invention. The method includes the following steps:
S1, constructing the structure network $G_{topo}$ from the citation network library;
S11, processing the citation network library, specifically as follows:
A. Numbering the articles in the Pubmed data set. The data set contains $n_1$ articles, each article is a node of the network, the citing or cited relationships between the $n_1$ articles are the edges of the network, and the number of edges is $n_2$.
The articles in the citation network library are numbered in the sequence 0, 1, 2, …, $n_1 - 1$.
Specifically, the invention takes Pubmed as an example. Pubmed is a database providing biomedical literature search and abstracts. Its main data sources are MEDLINE, OLDMEDLINE, in-process records, publisher-supplied records, and others. Its data types are journal articles, reviews, and links to other databases. Its core subject is medicine, but it also covers other medically related fields. Each article contains a title and an abstract; each article in Pubmed is a node of the network, the nodes are numbered 0, 1, 2, …, 19716 in sequence, and the citing or cited relationships are the edges of the network, whose number is $n_2$.
B. Classifying the articles in the Pubmed data set.
In this embodiment, the categories of the articles in Pubmed are numbered 0, 1, 2 and serve as the label of each node. The title and abstract of each article are extracted and the 500 most frequent words are selected as the attribute vocabulary; a TF-IDF vector then gives a 500-dimensional representation of each article's attribute information.
Stop words and low-frequency words are removed from the attribute information of each entity before it is converted to the TF-IDF vector representation, and this vector serves as the attribute information of the node.
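The attribute extraction described above can be sketched in plain Python. The text does not say which TF-IDF variant is used, so the weighting below (term frequency normalized by document length, times the logarithm of inverse document frequency), the whitespace tokenizer, and the name `tfidf_vectors` are all assumptions for illustration.

```python
import math
from collections import Counter


def tfidf_vectors(docs, vocab_size, stop_words=frozenset()):
    """Return (vocab, vectors): TF-IDF over the vocab_size most frequent words."""
    tokenized = [[w for w in d.lower().split() if w not in stop_words] for d in docs]
    freq = Counter(w for toks in tokenized for w in toks)
    # most frequent words first; ties broken alphabetically for determinism
    ranked = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
    vocab = [w for w, _ in ranked[:vocab_size]]
    n = len(docs)
    df = {w: sum(1 for toks in tokenized if w in toks) for w in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1
        vectors.append([counts[w] / total * math.log(n / df[w]) for w in vocab])
    return vocab, vectors


docs = ["insulin and diabetes", "diabetes and obesity", "insulin therapy"]
vocab, vecs = tfidf_vectors(docs, vocab_size=2, stop_words={"and"})
```

In the embodiment `vocab_size` would be 500 and each document would be the concatenated title and abstract of one article, giving the 500-dimensional attribute vector of that node.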
S12, constructing the structure network $G_{topo}$ from the processed data of the citation network library:
$G_{topo} = (V, E)$, where V denotes the set of $n_1$ nodes and E denotes the set of $n_2$ edges.
In this embodiment, Pubmed contains 19717 nodes and 44338 edges, so V denotes the set of 19717 nodes, $V = \{v_0, v_1, \dots\}$;
E denotes the set of 44338 edges of the network, $E = \{e_1, e_2, \dots\}$, where $e = (u, v) \in E$ indicates that there is an edge between node v and node u (that is, a citation relationship), with $u, v \in V$;
S2, capturing the attribute information of all nodes, screening node pairs with highly similar attributes, and building the attribute network $G_{feat}$ with the proposed attribute edge connection method;
$G_{feat} = (V, E, X)$, where V denotes the set of $n_1$ nodes, E denotes the set of $n_2$ edges, and X is an $n_1 \times l$ matrix in which l, the dimension of the node attributes, is generally known. Here X is a 19717 × 500 matrix, 500 being the dimension of the node attributes in Pubmed, and the i-th row of the matrix, denoted $x_i$, holds the attribute information of the i-th node $v_i$;
S21, screening node pairs with highly similar attributes and performing attribute edge connection:
The node attribute information is acquired, and the nodes of the network are clustered by their attributes with the KMeans algorithm.
Within each cluster, the cosine similarity AttiSim is computed for every pair of nodes. Similarity detection and assignment are performed on the node pairs with formula (1), in which $A_f(i, j)$ denotes the similarity between the i-th node and the j-th node in matrix form.
If the attribute similarity AttiSim of a node pair is greater than a preset threshold $\gamma$, the pair is selected as a node pair with highly similar attributes.
For each node pair selected by the threshold comparison, an edge is added between the two nodes and its weight is set to the cosine similarity of the pair, which represents how similar the attributes of the two nodes are. This process is the attribute edge connection.
S22, constructing the attribute network $G_{feat}$ from the result of the attribute edge connection:
$G_{feat} = (V, E, X)$, where V denotes the set of $n_1$ nodes, E denotes the set of $n_2$ edges, and X is an $n_1 \times l$ matrix, l being the dimension of the node attributes; the attribute network $G_{feat}$ is constructed from the existing nodes and the newly generated edges.
S3, for the given input network, adjusting the weights of the network structure and the attributes, and fusing the structure network and the attribute network into a fused network;
S31, for the given input network, adjusting the weights of the network structure and the attributes, and fusing the structure network and the attribute network into a fused network:
The structure network $G_{topo}$ and the attribute network $G_{feat}$ are represented by the adjacency matrices $A_{topo}$ and $A_{feat}$ respectively. Each matrix is given a weight suited to the given network, as shown in formula (2), and the two weighted matrices are then added to obtain the fused network, which is represented by its adjacency matrix.
In the fused network, the edge weights between nodes represent their similarity under the joint effect of structure and attributes. The network at each granularity can therefore keep the backbone structure and attribute information of the original network and yield an approximate solution for the node representation of the original network. Because the structure and attribute information of the original network are fully used, the representations of similar nodes become closer, and similar nodes are more easily assigned to the same category.
S4, partitioning the network $G^{i}$ according to its topology and node attribute information to obtain the network $G^{i+1}$, and repeating the partitioning to obtain a series of hierarchical attribute networks of gradually decreasing scale, $G^{0}, G^{1}, \dots, G^{k}$, which represent the different granularities of the network, where i and k are integers and $0 \le i \le k$;
S41, for any network $G^{i}$, performing community division on $G^{i}$ with the Louvain community division method to obtain a community division result based on structure and attributes,
where $V^{i}$ denotes the result of dividing $G^{i}$ into communities of similar structure and attributes, and its first element denotes the first community obtained from $G^{i}$ according to the similarity of structure and attributes;
S42, acquiring a supernode set from the community division result based on structure and attributes:
The communities of similar structure and attributes obtained from $G^{i}$ are taken, and each community becomes a node of the network $G^{i+1}$, which gives the supernode set $V^{i+1}$;
S43, for the supernode set $V^{i+1}$, obtaining the attribute information $X^{i+1}$ of each supernode by averaging the attributes of the nodes of the previous granularity that make up the supernode;
S44, for the supernode set $V^{i+1}$, merging the edges of the nodes within the supernodes into edges between supernodes and summing their weights to obtain superedges, which gives the superedge set $E^{i+1}$;
S45, building a new network $G^{i+1} = (V^{i+1}, E^{i+1}, X^{i+1})$ from the supernode set $V^{i+1}$, the superedge set $E^{i+1}$ and the supernode attribute information $X^{i+1}$;
S46, iterating to obtain a series of attribute networks of gradually decreasing scale, $G^{0}, G^{1}, \dots, G^{k}$, where k is the number of granulation layers, $G^{k}$ is the coarsest attribute network, and each $G^{i}$ is finer-grained than $G^{i+1}$. Taking three layers of granulation as an example:
$|V^{0}| = 19717 > |V^{1}| = 9614 > |V^{2}| = 4938 > |V^{3}| = 2235$,
$|E^{0}| = 44338 > |E^{1}| = 31338 > |E^{2}| = 21843 > |E^{3}| = 10137$.
S5, performing matrix factorization to obtain an initial low-dimensional node vector representation for the attribute network at each granularity, and refining the initial representation through spectral propagation to obtain the node feature representation at that granularity;
S51, expressing each of the coarsened networks of different granularity in matrix form and learning an approximate initial representation with a fast matrix factorization method, Randomized SVD, as follows:
Step 1: find a matrix Q that approximates the range of the original matrix A. The original matrix A is repeatedly multiplied by a randomly initialized low-dimensional matrix and the product is decomposed, which finally yields a stable vector matrix Q satisfying formula (3):
$A \approx QQ^{T}A$ (3)
Step 2: construct the matrix B:
$B = Q^{T}A$ (4)
Step 3: decompose the matrix B with the SVD method:
$B = S\Sigma V^{T}$ (5)
S52, using spectral propagation to add local smoothing information and clustering information at each layer, refining the initial node representation into the node representation at that granularity;
S6, concatenating the node feature representations of all granularities to obtain a node feature representation that reflects the multiple granularities of the network;
S61, concatenating the vector representation of the network at each granularity with the original attribute information according to formula (6) to obtain a network representation that reflects multiple granularities,
where the operator in formula (6) denotes concatenation. This further fuses structure information and attribute information and preserves both the network structure and the attribute information. For the Pubmed example, the network representation obtained is concatenated with the original attribute information to give the final vector representation of each node in the network.
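The concatenation in S61 can be sketched as follows. Formula (6) is not reproduced in the text, so the way a coarse-layer embedding is assigned back to an original node (by following its supernode membership from layer to layer) and the name `multi_granularity_features` are assumptions of this sketch.

```python
def multi_granularity_features(embeddings, memberships, X):
    """Concatenate, for every original node, its embedding at each granularity and its attributes.

    embeddings  -- embeddings[g][s] is the vector of (super)node s at granularity g
    memberships -- memberships[g][s] is the supernode at granularity g+1 that contains s
    X           -- original attribute vector per node
    """
    features = []
    for node, attrs in enumerate(X):
        vec, current = [], node
        for g, emb in enumerate(embeddings):
            vec.extend(emb[current])
            if g < len(memberships):
                current = memberships[g][current]   # climb to the coarser layer
        features.append(vec + list(attrs))
    return features


embeddings = [
    [[0.1], [0.2], [0.3]],   # granularity 0: three nodes
    [[1.0], [2.0]],          # granularity 1: two supernodes
]
memberships = [[0, 0, 1]]    # nodes 0 and 1 -> supernode 0; node 2 -> supernode 1
X = [[5.0], [6.0], [7.0]]
Z = multi_granularity_features(embeddings, memberships, X)
```

Each row of `Z` then holds the fine-grained embedding, the coarse-grained embedding of the containing supernode, and the original attributes of one node.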
S7, sending the node feature representation and the labels of the hierarchical attribute network into a classifier, predicting the labels of the nodes of unknown class, and assigning the nodes with the same label to the same class to complete node classification.
It should be noted that the node classification model mentioned in the present invention is a commonly used classification model, such as an SVM classifier; the classification model itself is not improved in the present application.
In step S7, an SVM classifier uses the low-dimensional vector representation Z of the network and the labels of the labeled nodes in G to predict the labels of the nodes whose labels are unknown, and the nodes with the same label are assigned to the same class.
Illustratively, in order to verify the effectiveness and the advancement of the technical scheme provided by the invention, several existing node classification methods are selected for comparison: DeepWalk, Node2vec, AANE, ASNE, HARP and MILE. DeepWalk and Node2vec are single-granularity methods that preserve only the structure information of the network; AANE and ASNE are single-granularity methods that preserve both structure and attribute information; HARP and MILE are multi-granularity methods that preserve only structure information. Each of these methods learns a low-dimensional representation of every node and then performs node classification. The method of the invention (denoted AMG) fuses attribute information, learns embedded representations of the nodes at multiple granularities, and then performs node classification. In the invention, the number of granulation layers is set to 1 and 2, and the node classification results on the Pubmed citation network under different training-set proportions are evaluated by Micro-F1 and Macro-F1; the best results are shown in bold, and the results are given in Table 1. It can be seen that the invention achieves the best classification results at all training-set proportions.
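The classification and evaluation protocol described above (an SVM classifier trained on a given proportion of labeled nodes and scored by Micro-F1 and Macro-F1) can be sketched with scikit-learn as follows. The function name `classify_nodes`, the linear kernel, and the stratified random split are assumptions of this sketch; the data in the usage note is synthetic and is not a result of the invention.

```python
import numpy as np
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

def classify_nodes(Z, y, train_ratio=0.5, seed=0):
    """Train an SVM on a fraction of labeled nodes and score the rest."""
    Z_tr, Z_te, y_tr, y_te = train_test_split(
        Z, y, train_size=train_ratio, random_state=seed, stratify=y
    )
    # Linear kernel is an assumption; the text only names "an SVM classifier".
    clf = SVC(kernel="linear").fit(Z_tr, y_tr)
    pred = clf.predict(Z_te)
    micro = f1_score(y_te, pred, average="micro")
    macro = f1_score(y_te, pred, average="macro")
    return micro, macro
```

For example, calling `classify_nodes(Z, y, train_ratio=0.1)` for several values of `train_ratio` reproduces the kind of sweep over training-set proportions that the comparison refers to.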
2. In order to verify the speed of the technical scheme provided by the invention, the method is used to classify nodes on the Cora, Citeseer and Pubmed citation network data sets, and the same existing node classification methods are selected for comparison: DeepWalk, Node2vec, AANE, ASNE, HARP and MILE. In the invention (AMG), the number of layers k is set to 1 and 2, Randomized SVD is used to learn the initial representation of the nodes, spectral propagation is used to optimize the node representation, and the representations of all granularities are integrated by splicing to obtain the final node representation. The node classification results are given in Table 1 and the time consumed by node classification is given in Table 2, with the best results shown in bold. As can be seen from Table 2, the improvement in running time is significant, and the method is the fastest on the listed data sets.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (9)
1. A node classification method for multi-granularity attribute network embedding based on attribute continuous edges, characterized by comprising the following steps:
S1, numbering the nodes in the input network and obtaining the label information of the nodes, and then constructing a structure network $G_{topo}$;
S2, capturing the attribute information of all nodes: first clustering the nodes to preliminarily screen out nodes with similar attributes, then comparing the node pairs within each cluster against a threshold to screen out the node pairs whose attributes are highly similar across the whole network, and finally constructing an attribute network $G_{feat}$ using the proposed attribute edge-connection method;
S3, for the input network, adjusting the weights of the network structure and the attributes, and fusing the structure network and the attribute network into a fused network;
S4, dividing the fused network according to its topological structure and the attribute information of its nodes to obtain a coarsened network, and repeating the division process to obtain a series of attribute networks of gradually decreasing scale, which respectively represent different granularities of the network and are indexed by i, wherein i and k are integers and i is between 0 and k;
S5, performing matrix decomposition to obtain an initial low-dimensional vector representation of the nodes of the attribute network at each granularity, and optimizing the initial representation through spectral propagation to obtain the node feature representation at that granularity;
S6, splicing the node feature representations of all granularities to obtain a node feature representation reflecting a plurality of granularities of the network;
S7, sending the node feature representation and the labels of the multi-granularity attribute network into a classifier, predicting the labels of the nodes of unknown class, and assigning the nodes with the same label to the same class to complete node classification.
2. The method for classifying nodes embedded in a multi-granularity attribute network based on attribute edge connection according to claim 1, wherein the step S1 comprises:
S11, processing the input network, including:
step A, numbering the entities in the network; the network comprises $n_1$ entities, each entity serving as a node of the network, and the relations between the $n_1$ entities serving as the edges of the network, the number of edges being $n_2$;
step B, dividing the entities of the input network into a plurality of categories, the label of each node being its category number;
S12, constructing the structure network $G_{topo}$ according to the processed network data:
the network $G_{topo} = (V, E)$, where V represents the set of $n_1$ nodes and E represents the set of $n_2$ edges.
3. The method for classifying nodes embedded in a multi-granularity attribute network based on attribute edge connection according to claim 2, wherein the attribute edge connection step S2 includes:
S21, screening node pairs with highly similar attributes and connecting them with attribute edges:
acquiring the node attribute information, and clustering the nodes in the network by attribute using the KMeans algorithm;
for the nodes in each cluster, performing cosine similarity (AttiSim) detection between every two nodes; similarity detection and assignment are performed on the node pairs using formula (1), where $A_f(i, j)$ represents the degree of similarity between the i-th node and the j-th node, expressed in matrix form:
if the attribute similarity AttiSim of a node pair is larger than a preset threshold γ, the node pair is screened as a node pair with highly similar attributes;
the node pairs obtained by this threshold comparison are those whose attributes are highly similar. An edge is connected between the two nodes of each such pair, and its weight is set to the cosine similarity of the pair to represent the degree of attribute similarity between the two nodes; this process is the attribute edge connection;
S22, constructing the attribute network $G_{feat}$ according to the result of the attribute edge connection:
the network $G_{feat} = (V, E, X)$, where V represents the set of $n_1$ nodes, E represents the set of $n_2$ edges, and X is an $n_1 \times l$ matrix, l being the dimension of the node attributes; the attribute network $G_{feat}$ is constructed from the existing nodes and the newly generated edges.
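By way of non-limiting illustration, the attribute edge connection of step S21 can be sketched as follows. The sketch assumes the cluster labels have already been produced by a KMeans run (for example with scikit-learn) and only shows the per-cluster cosine similarity test against the threshold γ; the function name `attribute_edges` and its argument names are hypothetical.

```python
import numpy as np

def attribute_edges(X, labels, gamma):
    """Build the attribute adjacency matrix A_f from attributes X.

    X: (n, l) node attribute matrix.
    labels: length-n cluster assignment (assumed to come from KMeans).
    gamma: similarity threshold; only pairs with similarity > gamma get an edge.
    """
    n = X.shape[0]
    # Normalize rows so that a dot product equals the cosine similarity.
    norms = np.linalg.norm(X, axis=1, keepdims=True)
    Xn = X / np.maximum(norms, 1e-12)
    A_f = np.zeros((n, n))
    for c in np.unique(labels):
        members = np.flatnonzero(labels == c)
        # Pairwise cosine similarity restricted to one cluster.
        sim = Xn[members] @ Xn[members].T
        np.fill_diagonal(sim, 0.0)   # no self-edges
        sim[sim <= gamma] = 0.0      # keep only highly similar pairs
        A_f[np.ix_(members, members)] = sim
    return A_f
```

Restricting the pairwise comparison to nodes of the same cluster avoids computing similarities for all node pairs of the network, which is where the speed-up of the attribute processing comes from.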
4. The method for classifying nodes embedded in a multi-granularity attribute network based on attribute edge connection according to claim 3, wherein the step S3 comprises:
S31, for the given input network, adjusting the weights of the network structure and the attributes, and fusing the structure network and the attribute network into a fused network;
The structure network $G_{topo}$ and the attribute network $G_{feat}$ are represented by the adjacency matrices $A_{topo}$ and $A_{feat}$ respectively. Weights are assigned to each of them to suit the given network, as shown in formula (2), and the two weighted matrices are then added to obtain the fused network graph, which is expressed as an adjacency matrix.
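As a minimal sketch of the weighted fusion described in this claim, the two adjacency matrices can be combined as a weighted sum. The weight names `alpha` and `beta`, their default values and the function name `fuse_networks` are assumptions of this sketch; formula (2) itself is not reproduced here.

```python
import numpy as np

def fuse_networks(A_topo, A_feat, alpha=1.0, beta=1.0):
    """Weighted sum of the structure and attribute adjacency matrices."""
    A_topo = np.asarray(A_topo, dtype=float)
    A_feat = np.asarray(A_feat, dtype=float)
    # alpha weights the structural edges, beta the attribute edges (assumed names).
    return alpha * A_topo + beta * A_feat
```

Raising `beta` relative to `alpha` would make the later community division rely more on attribute similarity than on the original topology.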
5. The method for classifying nodes embedded in a multi-granularity attribute network based on attribute edge connection according to claim 4, wherein the dividing step S4 comprises:
S41, for any network in the series, performing community division on the network using the Louvain community division method to obtain a community division result based on structure and attributes:
where $V_i$ denotes the community division result with similar structure and attributes, whose first element is the 1st community obtained by dividing the network according to the similarity of its structure and attributes;
S42, acquiring a supernode set according to the community division result based on structure and attributes;
acquiring the division result of the communities with similar structure and attributes in the network, and taking each community as a node of the new, coarser network to obtain the supernode set;
S43, according to the supernode set, obtaining the attribute information $X_{i+1}$ of each supernode by averaging the attributes of the nodes of the previous granularity that form the supernode;
S44, according to the supernode set, merging the edges of the nodes within the supernodes to form edges between supernodes, and summing their weights to obtain super-edges, thereby acquiring the super-edge set $E_{i+1}$;
S45, building a new network according to the supernode set $V_{i+1}$, the super-edge set $E_{i+1}$ and the supernode attribute information $X_{i+1}$;
S46, iterating to obtain a series of attribute networks of gradually decreasing scale, where k is the number of network granulation layers, the last network of the series is the coarsest layer of the attribute network, and the ordering relation between the networks expresses the coarseness of their granules, each network having finer granules than the next network in the series.
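By way of non-limiting illustration, one coarsening step (S42 to S45) can be sketched as follows. The community assignment is assumed to have been computed beforehand by a Louvain implementation and is passed in as an integer array; the function name `coarsen` is hypothetical, and discarding the weight of edges that fall inside a community (rather than keeping it as a self-loop) is an assumption of this sketch.

```python
import numpy as np

def coarsen(A, X, community):
    """One granulation step from level i to level i + 1.

    A: (n, n) weighted adjacency matrix of the current network.
    X: (n, l) node attribute matrix of the current network.
    community: length-n integer array with values 0..c-1, every value used
               (assumed to be the Louvain division result).
    """
    community = np.asarray(community)
    n, c = A.shape[0], int(community.max()) + 1
    # Membership matrix: M[v, s] = 1 if node v belongs to supernode s.
    M = np.zeros((n, c))
    M[np.arange(n), community] = 1.0
    # Super-edges: sum the weights of all edges between two communities.
    A_next = M.T @ A @ M
    np.fill_diagonal(A_next, 0.0)  # assumption: intra-community weight is dropped
    # Supernode attributes: average the attributes of the member nodes.
    X_next = (M.T @ X) / M.sum(axis=0)[:, None]
    return A_next, X_next
```

Applying `coarsen` k times, each time with a fresh community division of the current network, yields the series of progressively smaller attribute networks referred to in S46.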
6. The method for classifying nodes embedded in a multi-granularity attribute network based on attribute edge connection according to claim 5, wherein the step S5 comprises:
S51, representing the series of coarsened networks in matrix form, and learning an approximate initial representation with the fast matrix decomposition method Randomized SVD; the method comprises the following steps:
Step 1: solving for a matrix Q that approximates the range of the original matrix A. A randomly initialized low-dimensional matrix is repeatedly multiplied by the original matrix A and then decomposed, finally yielding a stable matrix Q that satisfies formula (3):
$$A \approx QQ^{T}A \tag{3}$$
Step 2: constructing the matrix B:
$$B = Q^{T}A \tag{4}$$
Step 3: decomposing the matrix B using the SVD method:
$$B = S\Sigma V^{T} \tag{5}$$
S52, using spectral propagation to incorporate the local smoothing information and the clustering information of each layer, thereby optimizing the initial node representation and obtaining the node representation at that granularity.
7. The method for classifying nodes embedded in a multi-granularity attribute network based on attribute edge connection according to claim 6, wherein the step S6 comprises:
S61, splicing the vector representation of each granularity network with the original attribute information according to formula (6) to obtain a network representation reflecting a plurality of granularities.
8. The method for classifying nodes embedded in a multi-granularity attribute network based on attribute edge connection according to claim 7, wherein the step S7 comprises:
sending the node feature representation and the labels of the hierarchical attribute network into a classifier, predicting the labels of the nodes of unknown class, and assigning the nodes with the same label to the same class to complete node classification.
9. A node classification system for multi-granularity attribute network embedding based on attribute continuous edges, characterized by comprising:
a construction module for numbering the citation network base and obtaining the node labels based on the citation network base, and then constructing the structure network $G_{topo}$, the attribute network $G_{feat}$ and the fused network, wherein i is an integer;
a partitioning module for dividing and granulating the network based on structure and attribute information to obtain a coarsened network, and repeating the granulation process to obtain a series of attribute networks of gradually decreasing scale, wherein i and k are integers;
an attribute network node feature module for learning the initial low-dimensional vector feature representation of the nodes of the attribute network at each granularity;
an optimization module for optimizing the initial vector representation using spectral propagation;
a splicing module for splicing the vector representation and the attribute information of the nodes at each granularity to obtain the network representation;
and a classification module for sending the multi-granularity attribute network node features and the labels into the classifier, predicting the labels of the nodes of unknown class, and assigning the nodes with the same label to the same class to complete node classification.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111305905.6A CN114037008A (en) | 2021-11-05 | 2021-11-05 | Node classification method and system for multi-granularity attribute network embedding based on attribute continuous edges |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114037008A (en) | 2022-02-11 |
Family
ID=80142904
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111305905.6A (Pending; published as CN114037008A (en)) | 2021-11-05 | 2021-11-05 | Node classification method and system for multi-granularity attribute network embedding based on attribute continuous edges |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114037008A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN115174450A (en) * | 2022-07-05 | 2022-10-11 | 中孚信息股份有限公司 | Unknown equipment identification method and system based on network node representation |
CN115174450B (en) * | 2022-07-05 | 2023-10-03 | 中孚信息股份有限公司 | Unknown equipment identification method and system based on network node characterization |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||