CN114037008A - Node classification method and system for multi-granularity attribute network embedding based on attribute continuous edges

Publication number
CN114037008A
Authority
CN
China
Legal status
Pending
Application number
CN202111305905.6A
Other languages
Chinese (zh)
Inventor
赵姝
姚诚
杜紫维
段震
陈洁
张燕平
Current Assignee
Anhui University
Original Assignee
Anhui University
Application filed by Anhui University
Priority to CN202111305905.6A
Publication of CN114037008A
Pending (current legal status)

Classifications

    • G: Physics
    • G06: Computing; calculating or counting
    • G06F: Electric digital data processing
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches


Abstract

The invention discloses a node classification method and system for multi-granularity attribute network embedding based on attribute edges. A structure network is constructed from the input network. Node pairs with highly similar attributes are selected and connected by attribute edges to construct an attribute network. The weights of the network structure and the attributes are adjusted and the two networks are fused. The fused network is then partitioned according to its topological structure and node attribute information to obtain a coarsened network, and the partitioning is repeated to obtain a series of coarsened attribute networks. Matrix factorization yields an initial low-dimensional node representation of the attribute network at each granularity, and spectral propagation refines it into the node feature representation for that granularity. The node feature representations of all granularities are concatenated and fed to a classifier to complete node classification. Candidate node pairs for attribute edges are first clustered and only then filtered by a threshold, which markedly speeds up attribute processing. In addition, fast partitioning produces a granulation model that reduces the network scale before node representations are learned, which speeds up node classification overall.

Description

Node classification method and system for multi-granularity attribute network embedding based on attribute continuous edges
Technical Field
The invention relates to the technical field of network embedding, in particular to a node classification method and a node classification system for multi-granularity attribute network embedding based on attribute edge connection.
Background
Network data naturally express relations between objects and are ubiquitous in daily life and work. In recent years, networks have developed rapidly as an important data structure for modeling complex real-world systems. Citation networks and social networks, for example, support a range of data mining analyses once a complex network model has been constructed. Nodes and edges are the essential components of a network. In a citation network, each article can be represented by a node and the citation relations between articles by edges; by predicting article labels, similar articles can be recommended to readers. In a social network, each user can be represented by a node and the relations between users by edges; a node classification model that predicts user types enables personalized recommendation. Node classification therefore plays a crucial role in network analysis.
Network embedding maps each node to a low-dimensional vector while preserving the network structure and its inherent properties, and it has attracted considerable attention as networks have grown. In practice, node classification is mainly performed with network embedding. A low-dimensional vector is learned for each node, the similarity between nodes is computed from these vectors, and node labels are predicted accordingly. Existing network embedding approaches for node classification include:
- methods that preserve single-granularity structural information;
- methods that preserve single-granularity structural and attribute information;
- methods that preserve multi-granularity structural information.

Each approach yields low-dimensional node vectors for the classification task, but each has limitations. Methods that preserve only single-granularity structural information generally cannot retain node attribute information, which can lower classification accuracy. Methods that preserve single-granularity structural and attribute information generally cannot capture the characteristics of the network at multiple granularities. They reflect the network at a single granularity only, and therefore show limited advantage on the node classification task.
Methods that preserve multi-granularity structural information retain the multi-granularity character of the network but ignore node attribute information, so they also fall short of better classification results. Node classification accuracy therefore still leaves room for improvement.
Disclosure of Invention
The technical problem to be solved by the invention is that existing node classification methods based on network embedding cannot obtain sufficiently effective classification results.
The invention solves this technical problem through the following technical means:
a node classification method for multi-granularity attribute network embedding based on attribute edges, comprising the following steps:
S1, numbering the nodes of the input network, obtaining the label information of the nodes, and then constructing a structure network $G_{topo}$;
S2, capturing the attribute information of all nodes. The nodes are first clustered to pre-select nodes with similar attributes. Node pairs within each cluster are then compared against a threshold to select node pairs with highly similar attributes across the whole network. Finally, an attribute network $G_{feat}$ is constructed with the proposed attribute edge connection method;
S3, adjusting the weights of the network structure and the attributes for the input network, and fusing the structure network and the attribute network into a fused network $G_0$;
S4, partitioning the fused network $G_i$ according to its topological structure and node attribute information to obtain the network $G_{i+1}$. The partitioning is repeated to obtain a series of attribute networks of gradually decreasing scale, $G_0, G_1, \ldots, G_k$, which represent different granularities of the network, where i and k are integers and $0 \le i \le k$;
S5, performing matrix factorization to obtain an initial low-dimensional node representation of the attribute network at each granularity, and refining the initial representation by spectral propagation to obtain the node feature representation at that granularity;
S6, concatenating the node feature representations of all granularities to obtain a node feature representation that reflects multiple granularities of the network;
S7, feeding the multi-granularity node feature representation and the node labels into a classifier, predicting the labels of nodes of unknown class, and assigning nodes with the same label to the same class to complete node classification.
The invention first constructs the structure network $G_{topo}$ and then constructs the attribute network $G_{feat}$ with the proposed attribute edge connection method. The two are fused into the fused network $G_0$. The network $G_i$ is partitioned according to its topological structure and node attribute information to obtain $G_{i+1}$, and the partitioning is repeated to obtain a hierarchy of attribute networks of gradually decreasing scale. Multi-granularity attribute network embedding based on attribute edges therefore retains both the node attribute information and the characteristics of the network at different granularities. This improves node classification performance and addresses the problem that current network-embedding-based node classification methods cannot obtain more effective classification results.
As a further scheme of the invention, step S1 includes:
S11, processing the input network, including:
Step A, numbering the entities in the network. The network contains $n_1$ entities, and each entity serves as a node of the network. The relations between the entities serve as the edges of the network, of which there are $n_2$;
Step B, dividing the entities of the input network into several categories, the label of each node being its category number;
S12, constructing the structure network $G_{topo}$ from the processed network data:
$G_{topo} = (V, E)$, where V is the set of $n_1$ nodes and E is the set of $n_2$ edges.
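The following is a minimal sketch of step S12. It assumes nodes are already numbered 0 to $n_1 - 1$ as in step A, and that edges are treated as undirected, since the text does not say whether citation direction is kept. The helper name `build_structure_adjacency` is hypothetical. A dense NumPy array is used only for readability; a network the size of Pubmed would normally be stored as a sparse matrix.

```python
import numpy as np


def build_structure_adjacency(n_nodes, edges):
    """Dense symmetric 0/1 adjacency matrix A_topo of G_topo = (V, E).

    Assumes nodes are numbered 0 .. n_nodes - 1 and edges are undirected.
    """
    a_topo = np.zeros((n_nodes, n_nodes), dtype=float)
    for u, v in edges:
        if u == v:
            continue  # assumption: self-loops are ignored
        a_topo[u, v] = 1.0
        a_topo[v, u] = 1.0
    return a_topo


# Toy citation network: 4 articles, 3 citation relations.
A = build_structure_adjacency(4, [(0, 1), (1, 2), (2, 3)])
print(int(A.sum()) // 2)  # number of undirected edges: 3
```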
As a further scheme of the invention, step S2 includes:
S21, selecting node pairs with highly similar attributes and connecting them by attribute edges:
node attribute information is acquired, and the nodes in the network are clustered by their attributes with the K-means algorithm.
For the nodes within each cluster, the pairwise cosine similarity AttiSim is computed. Formula (1) gives the similarity of each node pair, where $A_f(i,j)$ denotes the degree of similarity between the i-th node and the j-th node, expressed in matrix form:
$$A_f(i,j) = \frac{x_i \cdot x_j}{\lVert x_i \rVert \, \lVert x_j \rVert} \quad (1)$$
if the attribute similarity AttiSim of a node pair exceeds a preset threshold γ, the pair is selected as a node pair with highly similar attributes across the whole network;
each node pair selected by this threshold comparison has highly similar attributes. An edge is added between the two nodes, and its weight is set to the cosine similarity of the pair to express their degree of attribute similarity. This process is the attribute edge connection;
S22, constructing the attribute network $G_{feat}$ from the result of the attribute edge connection:
$G_{feat} = (V, E, X)$, where V is the set of $n_1$ nodes, E is the set of attribute edges, and X is an $n_1 \times l$ matrix, l being the dimension of the node attributes. The attribute network $G_{feat}$ is completed from the existing nodes and the newly generated edges.
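The two-stage selection in S21 (cluster first, then threshold within each cluster) can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation. A minimal Lloyd's k-means written in NumPy stands in for a library KMeans. The number of clusters `k` and the threshold `gamma` are free parameters that the text does not fix, and all function names are hypothetical. Only pairs inside the same cluster are compared, which is where the saving over an all-pairs comparison comes from.

```python
import numpy as np


def kmeans_labels(x, k, n_iter=20, seed=0):
    """Minimal Lloyd's k-means (stand-in for a library KMeans); one label per row."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].copy()
    labels = np.zeros(len(x), dtype=int)
    for _ in range(n_iter):
        # Squared Euclidean distance from every node to every center.
        dist = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = x[labels == c].mean(axis=0)
    return labels


def attribute_edges(x, k, gamma, seed=0):
    """Return (i, j, AttiSim) for same-cluster pairs whose cosine similarity exceeds gamma."""
    x = np.asarray(x, dtype=float)
    labels = kmeans_labels(x, k, seed=seed)
    norms = np.linalg.norm(x, axis=1)
    norms[norms == 0] = 1.0  # keep all-zero attribute rows from dividing by zero
    x_unit = x / norms[:, None]
    edges = []
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        sim = x_unit[idx] @ x_unit[idx].T  # formula (1) for every pair in the cluster
        for a in range(len(idx)):
            for b in range(a + 1, len(idx)):
                if sim[a, b] > gamma:
                    edges.append((int(idx[a]), int(idx[b]), float(sim[a, b])))
    return edges


def attribute_adjacency(n_nodes, edges):
    """Weighted adjacency matrix A_feat of G_feat built from the attribute edges."""
    a_feat = np.zeros((n_nodes, n_nodes))
    for i, j, w in edges:
        a_feat[i, j] = a_feat[j, i] = w
    return a_feat


X = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
feat_edges = attribute_edges(X, k=2, gamma=0.9)
A_feat = attribute_adjacency(len(X), feat_edges)
```

On this toy input the two nearly parallel pairs fall into separate clusters, and each pair yields one attribute edge with weight about 0.994.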
As a further scheme of the invention, step S3 includes:
S31, for the given input network, adjusting the weights assigned to the network structure and the attributes, and fusing the structure network and the attribute network into the fused network $G_0$:
the structure network $G_{topo}$ and the attribute network $G_{feat}$ are represented by their adjacency matrices $A_{topo}$ and $A_{feat}$. Each matrix is given a weight suited to the given network, as in formula (2), and the two weighted matrices are added to obtain the fused network $G_0$, represented by the adjacency matrix $A_0$:
$$A_0 = \alpha A_{topo} + \beta A_{feat} \quad (2)$$
where α and β are the weights assigned to the structure network and the attribute network. The edge weights between nodes in the fused network $G_0$ then express the degree of similarity between nodes under the combined effect of network structure and attributes;
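Formula (2) amounts to a weighted sum of two adjacency matrices over the same node set. The following is a minimal NumPy sketch. Here `alpha` and `beta` are hypothetical names for the structure and attribute weights, and their default values are placeholders rather than values from the text.

```python
import numpy as np


def fuse(a_topo, a_feat, alpha=1.0, beta=1.0):
    """Fused adjacency alpha * A_topo + beta * A_feat (formula (2)).

    alpha and beta are hypothetical names; their defaults are placeholders.
    """
    if a_topo.shape != a_feat.shape:
        raise ValueError("A_topo and A_feat must be defined over the same node set V")
    return alpha * a_topo + beta * a_feat


a_topo = np.array([[0, 1, 0], [1, 0, 0], [0, 0, 0]], dtype=float)
a_feat = np.array([[0, 0, 0.95], [0, 0, 0], [0.95, 0, 0]])
A0 = fuse(a_topo, a_feat, alpha=1.0, beta=0.5)
print(A0[0, 2])  # attribute edge weight after scaling: 0.475
```

Down-weighting `beta` makes attribute edges count less than citation edges in the fused network. This is the kind of per-dataset adjustment the text describes.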
As a further scheme of the invention, step S4 includes:
S41, for the fused network $G_i$ at any granularity, partitioning $G_i$ into communities with the Louvain community detection method to obtain a community partition based on structure and attribute information:
$$V_i = \{C_i^{1}, C_i^{2}, \ldots\}$$
where $V_i$ denotes the partition into communities with similar structure and attributes, and $C_i^{1}$ denotes the 1st community obtained from network $G_i$ according to the degree of similarity of structure and attributes;
S42, obtaining the supernode set from the community partition based on structure and attributes:
the partition of network $G_i$ into communities with similar structure and attributes is obtained. Each community $C_i^{j}$ is taken as a new node (supernode) of network $G_{i+1}$, giving the supernode set $V_{i+1}$;
S43, for the supernode set $V_{i+1}$, the attribute information of each supernode is obtained by averaging the attributes of the nodes of the previous granularity that form it, giving the supernode attribute information $X_{i+1}$;
S44, for the supernode set $V_{i+1}$, the edges of the nodes inside the supernodes are merged into edges between supernodes, and their weights are summed to obtain superedges, giving the superedge set $E_{i+1}$;
S45, building the new network $G_{i+1} = (V_{i+1}, E_{i+1}, X_{i+1})$ from the supernode set $V_{i+1}$, the superedge set $E_{i+1}$ and the supernode attribute information $X_{i+1}$, with $|V_{i+1}| < |V_i|$;
S46, iterating to obtain a series of attribute networks of gradually decreasing size, $G_0 \prec G_1 \prec \cdots \prec G_k$, with $|V_0| > |V_1| > \cdots > |V_k|$. Here $G_k$ is the coarsest attribute network, and $\prec$ denotes the coarseness relation between granularities: $G_i \prec G_{i+1}$ means that $G_i$ has a finer granularity than $G_{i+1}$.
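Steps S42 to S45 can be expressed with a node-to-community membership matrix P. With P, the superedge weights become $P^{T} A P$, and the supernode attributes become $P^{T} X$ divided row-wise by community size. The sketch below makes two assumptions. First, the Louvain partition of S41 is already available as one community label per node; Louvain itself is not implemented. Second, edges inside a community are dropped rather than kept as supernode self-loops, since the text does not say which. Function names are hypothetical.

```python
import numpy as np


def coarsen(adj, x, community):
    """Build (A_{i+1}, X_{i+1}) of G_{i+1} from G_i and one community label per node.

    community[v] is the community of node v, e.g. from Louvain (not implemented here).
    """
    # Relabel communities to 0 .. m-1 so every supernode id is used.
    _, labels = np.unique(np.asarray(community), return_inverse=True)
    labels = labels.ravel()
    n, m = len(labels), labels.max() + 1
    p = np.zeros((n, m))
    p[np.arange(n), labels] = 1.0  # membership matrix: node -> supernode
    # S44: superedge weight = sum of weights of all edges between two communities.
    adj_next = p.T @ adj @ p
    np.fill_diagonal(adj_next, 0.0)  # assumption: intra-community edges are dropped
    # S43: supernode attributes = mean attributes of the member nodes.
    x_next = (p.T @ x) / p.sum(axis=0)[:, None]
    return adj_next, x_next, labels


adj = np.zeros((4, 4))
for u, v, w in [(0, 1, 1.0), (2, 3, 1.0), (1, 2, 1.0), (0, 3, 0.5)]:
    adj[u, v] = adj[v, u] = w
x = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0], [0.0, 4.0]])
adj1, x1, labels = coarsen(adj, x, community=[7, 7, 9, 9])
print(adj1[0, 1])  # 1.5 = 1.0 (edge 1-2) + 0.5 (edge 0-3)
```

Calling `coarsen` again on `(adj1, x1)` with a new partition gives the next, coarser level. Repeating this k times produces the series $G_0, \ldots, G_k$.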
As a further scheme of the invention, step S5 includes:
S51, representing each coarsened network of a different granularity in matrix form, and approximating its initial representation with the fast matrix factorization method Randomized SVD, as follows:
Step 1: find a matrix Q with orthonormal columns whose range approximates the range of the original matrix A. A is multiplied, repeatedly if necessary, by a randomly initialized matrix of small dimension, and the product is orthonormalized to obtain a stable Q satisfying formula (3):
$$A \approx Q Q^{T} A \quad (3)$$
Step 2: construct the matrix
$$B = Q^{T} A \quad (4)$$
Step 3: decompose B with SVD:
$$B = S \Sigma V^{T} \quad (5)$$
S52, adding the local smoothing information and clustering information of each layer through spectral propagation, refining the initial node representation and obtaining the node representation at that granularity;
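Formulas (3) to (5) correspond to the standard randomized SVD scheme: random projection, orthonormalization, then an SVD of the small matrix B. The NumPy sketch below follows that scheme. The oversampling and power-iteration counts are assumed defaults. Scaling the left singular vectors by $\sqrt{\Sigma}$ to form the initial embedding is also an assumption, since the text does not state how the embedding is read off the factorization. The spectral propagation of S52 is not shown. Note that the left singular vectors of A are $QS$, not S itself.

```python
import numpy as np


def randomized_svd(a, rank, n_oversample=10, n_power_iter=2, seed=0):
    """Truncated SVD of A via random projection, following formulas (3)-(5)."""
    rng = np.random.default_rng(seed)
    omega = rng.standard_normal((a.shape[1], rank + n_oversample))  # random small-dimension matrix
    y = a @ omega
    for _ in range(n_power_iter):
        # Multiply by A A^T again (re-orthonormalizing first) to sharpen the captured range.
        q, _ = np.linalg.qr(y)
        y = a @ (a.T @ q)
    q, _ = np.linalg.qr(y)          # orthonormal Q with A ≈ Q Q^T A      (3)
    b = q.T @ a                     # B = Q^T A                           (4)
    s, sigma, vt = np.linalg.svd(b, full_matrices=False)  # B = S Σ V^T   (5)
    u = q @ s                       # left singular vectors of A are Q S
    return u[:, :rank], sigma[:rank], vt[:rank]


def initial_embedding(a, dim, seed=0):
    """Initial node vectors; scaling by sqrt(Σ) is an assumption."""
    u, sigma, _ = randomized_svd(a, dim, seed=seed)
    return u * np.sqrt(sigma)


rng = np.random.default_rng(1)
low_rank = rng.standard_normal((50, 3)) @ rng.standard_normal((3, 40))  # exact rank 3
U, S_vals, Vt = randomized_svd(low_rank, rank=3)
print(np.allclose((U * S_vals) @ Vt, low_rank))  # True
```

Only the thin matrices Y, Q and B are factorized, never A itself. This is why the method is fast on large adjacency matrices.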
As a further scheme of the invention, step S6 includes:
S61, concatenating the vector representation of the network at each granularity with the original attribute information according to formula (6), to obtain a network representation that reflects multiple granularities:
$$Z = X \oplus Z_0 \oplus Z_1 \oplus \cdots \oplus Z_k \quad (6)$$
where $Z_i$ is the node vector representation at granularity i and $\oplus$ denotes concatenation;
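Concatenation in S61 needs one row per original node, while a coarse network $G_i$ has only $|V_i|$ rows. This sketch assumes one way to align them; the text does not state it. Every original node receives the vector of the supernode it was merged into at each level. The sketch tracks that mapping through the per-level community labels, and all names are hypothetical.

```python
import numpy as np


def splice(x, embeddings, memberships):
    """Concatenate X with the representation of every granularity for each original node.

    embeddings[i]  : |V_i| x d_i array, the representation at granularity i (i = 0..k)
    memberships[i] : for each node of G_i, the index of its supernode in G_{i+1}
    """
    level_index = np.arange(x.shape[0])  # at granularity 0 each node is itself
    parts = [x]
    for i, z in enumerate(embeddings):
        parts.append(z[level_index])  # copy the (super)node vector to its original nodes
        if i < len(memberships):
            level_index = np.asarray(memberships[i])[level_index]
    return np.hstack(parts)


x = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]])
z0 = np.array([[0.1], [0.2], [0.3], [0.4]])   # granularity 0: 4 nodes
z1 = np.array([[10.0], [20.0]])               # granularity 1: 2 supernodes
Z = splice(x, [z0, z1], memberships=[[0, 0, 1, 1]])
print(Z.shape)  # (4, 4)
```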
As a further scheme of the invention, step S7 includes:
feeding the node features and labels of the multi-granularity attribute network into a classifier, predicting the labels of nodes of unknown class, and assigning nodes with the same label to the same class to complete node classification.
A node classification system for multi-granularity attribute network embedding based on attribute edges comprises:
a construction module, which numbers the nodes of the citation network dataset and obtains the node labels from it. It then constructs the structure network $G_{topo}$, the attribute network $G_{feat}$ and the fused network $G_i$, where i is an integer;
a partitioning module, which partitions and granulates the network $G_i$ based on structure and attribute information to obtain the coarsened network $G_{i+1}$. It repeats the granulation to obtain a series of attribute networks of gradually decreasing size, $G_0, G_1, \ldots, G_k$, where i and k are integers;
an attribute network node feature module, which learns an initial low-dimensional vector representation of the nodes of the attribute network at each granularity;
an optimization module, which refines the initial vector representation by spectral propagation;
a splicing module, which concatenates the vector representations and the attribute information of the nodes at each granularity to obtain the network representation;
a classification module, which feeds the multi-granularity attribute network node features and labels into a classifier, predicts the labels of nodes of unknown class, and assigns nodes with the same label to the same class to complete node classification.
The advantages of the invention are:
1. The invention first constructs the structure network $G_{topo}$, then constructs the attribute network $G_{feat}$ by attribute edge connection, and finally forms the fused network $G_0$. This preserves the similarity relations of both structure and attributes in the network. Partitioning the network $G_i$ based on both structure and attribute information yields the network $G_{i+1}$, and repeating the partitioning yields a series of attribute networks of gradually decreasing scale. The network structure and attribute information are thus retained at different granularities, and information can be exchanged and propagated between the attribute networks of different granularities. This improves node classification performance and addresses the problem that existing network-embedding-based node classification methods cannot obtain more effective classification results.
2. Attribute edge connection quickly retains the attribute information of the network by adding attribute similarity relations to the original structure network. Fusing the structure network and the attribute network effectively retains both the network structure and the attribute information, and adjusting the weights assigned to structure and attributes further improves the classification results.
3. The invention obtains network embeddings at multiple granularities, each of which approximately represents the original network, and concatenates them to describe the multi-granularity information of the original network.
In the invention, candidate node pairs for attribute edges are first clustered and then filtered by a threshold, which markedly speeds up attribute processing. Meanwhile, fast partitioning produces a granulation model that reduces the network scale before node representations are learned, which speeds up node classification overall.
Drawings
Fig. 1 is a schematic flowchart of a node classification method and system for attribute-edge-based multi-granularity attribute network embedding according to embodiment 1 of the present invention.
Fig. 2 is a framework diagram of a node classification method and system based on attribute edge-connected multi-granularity attribute network embedding provided in embodiment 1 of the present invention.
Fig. 3 is a schematic structural diagram of a node classification method and system for attribute edge-based multi-granularity attribute network embedding according to embodiment 2 of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the embodiments of the present invention, and it is obvious that the described embodiments are some embodiments of the present invention, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
First, a brief introduction to Pubmed: Pubmed is a free database for searching biomedical papers and their abstracts. Its data come from MEDLINE, and its core subject is medicine, but it also covers other medically related fields such as nursing and other health disciplines.
Example 1
Referring to Fig. 1 and Fig. 2, where Fig. 1 is a schematic flowchart and Fig. 2 is a framework diagram of the node classification method and system for multi-granularity attribute network embedding based on attribute edge connection provided in embodiment 1 of the present invention, the method includes the following steps:
S1, constructing the structure network $G_{topo}$ from the citation network dataset;
S11, processing the citation network dataset, specifically as follows:
A. numbering the articles in the Pubmed dataset. The dataset contains $n_1$ articles, and each article is a node of the network. The citing and cited relations between articles are the edges of the network, of which there are $n_2$;
the articles in the citation network dataset are numbered in sequence 0, 1, 2, …, $n_1 - 1$;
specifically, the invention takes Pubmed as an example; Pubmed is a database providing biomedical literature search and abstracts. Its main data sources are MEDLINE, OLDMEDLINE, in-process records and publisher-supplied records. Its data types include journal articles, reviews, and links to other databases. Its core subject is medicine, but other medically related fields are also included. Each article contains a title and an abstract. Each article in Pubmed is a node in the network, numbered 0, 1, 2, …, 19716 in sequence. Each citing or cited relation is an edge of the network, and there are $n_2$ such edges;
B. Classifying the articles in the Pubmed data set;
In this embodiment, the categories of the articles in Pubmed are numbered 0, 1, 2 and used as the node labels. The title and abstract of each article are extracted and sorted to obtain 500 high-frequency words as the attribute information of the article. A 500-dimensional TF-IDF vector representation of this attribute information is then obtained;
that is, stop words and low-frequency words are removed from the attribute information of each entity, and the result is converted into a TF-IDF vector, which serves as the attribute information of the node.
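The attribute construction described above (top high-frequency words, then TF-IDF weighting) can be sketched in plain Python. The text does not specify the TF-IDF variant. This sketch assumes term frequency normalized by document length and an unsmoothed $\log(N / \mathrm{df})$ inverse document frequency. It also assumes tokenization and stop-word removal have already been done. The function name is hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs, vocab_size):
    """Return (vocab, vectors) for tokenized documents (stop words already removed)."""
    # Keep the vocab_size most frequent words in the corpus (500 for Pubmed).
    freq = Counter(tok for doc in docs for tok in doc)
    vocab = [w for w, _ in freq.most_common(vocab_size)]
    n_docs = len(docs)
    # Document frequency: number of documents containing each vocabulary word.
    df = {w: sum(1 for doc in docs if w in doc) for w in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        length = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append([counts[w] / length * math.log(n_docs / df[w]) for w in vocab])
    return vocab, vectors


docs = [["gene", "cell", "gene"], ["cell", "insulin"], ["insulin", "glucose"]]
vocab, vectors = tfidf_vectors(docs, vocab_size=4)
print(len(vocab), len(vectors[0]))  # 4 4
```

Stacking the vectors of all articles row by row gives the attribute matrix X, one row per node.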
S12, constructing the structure network $G_{topo}$ from the processed citation network data:
$G_{topo} = (V, E)$, where V is the set of $n_1$ nodes and E is the set of $n_2$ edges.
In this embodiment, Pubmed has 19717 nodes and 44338 edges. V is therefore the set of 19717 nodes, $V = \{v_0, v_1, \ldots\}$.
E is the set of 44338 edges in the network, $E = \{e_1, e_2, \ldots\}$, where $e = (u, v) \in E$ indicates that there is an edge (i.e., a citation relation) between node v and node u, with $u, v \in V$;
S2, capturing the attribute information of all nodes, selecting node pairs with highly similar attributes, and constructing the attribute network $G_{feat}$ with the proposed attribute edge connection method:
$G_{feat} = (V, E, X)$, where V is the set of $n_1$ nodes and E is the set of attribute edges. $X \in \mathbb{R}^{n_1 \times l}$ is the attribute matrix, l being the dimension of the node attributes, which is generally known.
In Pubmed, X is a 19717 × 500 matrix, 500 being the dimension of the node attributes, and the i-th row of X, denoted $x_i$, is the attribute information of the i-th node $v_i$;
S21, selecting node pairs with highly similar attributes and connecting them by attribute edges:
node attribute information is acquired, and the nodes in the network are clustered by their attributes with the K-means algorithm.
For the nodes within each cluster, the pairwise cosine similarity AttiSim is computed. Formula (1) gives the similarity of each node pair, where $A_f(i,j)$ denotes the degree of similarity between the i-th node and the j-th node, expressed in matrix form:
$$A_f(i,j) = \frac{x_i \cdot x_j}{\lVert x_i \rVert \, \lVert x_j \rVert} \quad (1)$$
if the attribute similarity AttiSim of a node pair exceeds a preset threshold γ, the pair is selected as a node pair with high attribute similarity;
each node pair selected by this threshold comparison has highly similar attributes. An edge is added between the two nodes, and its weight is set to the cosine similarity of the pair to express their degree of attribute similarity. This process is the attribute edge connection;
S22, constructing the attribute network $G_{feat}$ from the result of the attribute edge connection:
$G_{feat} = (V, E, X)$, where V is the set of $n_1$ nodes, E is the set of attribute edges, and X is an $n_1 \times l$ matrix, l being the dimension of the node attributes. The attribute network $G_{feat}$ is completed from the existing nodes and the newly generated edges.
S3, for the given input network, adjusting the weights assigned to the network structure and the attributes, and fusing the structure network and the attribute network into the fused network $G_0$;
S31, for the given input network, adjusting the weights assigned to the network structure and the attributes, and fusing the structure network and the attribute network into the fused network $G_0$:
the structure network $G_{topo}$ and the attribute network $G_{feat}$ are represented by their adjacency matrices $A_{topo}$ and $A_{feat}$. Each matrix is given a weight suited to the given network, as in formula (2), and the two weighted matrices are added to obtain the fused network $G_0$, represented by the adjacency matrix $A_0$:
$$A_0 = \alpha A_{topo} + \beta A_{feat} \quad (2)$$
where α and β are the weights assigned to the structure network and the attribute network.
The edge weights between nodes in the fused network $G_0$ then express the degree of similarity under the combined effect of structure and attributes. As a result, the network at each granularity keeps the backbone structure and attribute information of the original network, and its node representation approximates that of the original network. Making full use of the structure and attribute information of the original network makes the representations of similar nodes more alike, so that similar nodes are more easily assigned to the same class.
S4, network
Figure BDA0003340096290000092
The network is obtained by dividing the topological structure and the attribute information of the nodes
Figure BDA0003340096290000093
Repeating the division process to obtain a series of hierarchical attribute networks with gradually reduced network scale:
Figure BDA0003340096290000094
respectively representing different granularities in the network, wherein i and k are integers, and i is an integer between 0 and k;
s41, based on any network
Figure BDA0003340096290000095
Network pair by using Louvain community division method
Figure BDA0003340096290000096
Carrying out community division, and obtaining a community division result based on the structure and the attribute:
Figure BDA0003340096290000097
wherein, ViRepresenting communities of similar structure and attributesAs a result of the division, the result of the division,
Figure BDA0003340096290000098
representing according to a network
Figure BDA0003340096290000099
The 1 st community divided by the similarity degree of the structure and the attribute of the community;
S42. A supernode set is obtained from the structure-and-attribute-based community partition: each community in the partition V_i of network G_i is taken as one node (a supernode) of the network G_{i+1}, giving the supernode set V_{i+1}.
S43. For each supernode in V_{i+1}, the attribute information is obtained by averaging the attributes of the nodes at the previous granularity that make up the supernode, giving the supernode attribute matrix X_{i+1}.
S44. For the supernode set V_{i+1}, the edges of the member nodes are merged into edges between supernodes, and the weights of merged edges are summed to form super-edges, giving the super-edge set E_{i+1}.
S45. A new network G_{i+1} = (V_{i+1}, E_{i+1}, X_{i+1}) is built from the supernode set V_{i+1}, the super-edge set E_{i+1} and the supernode attributes X_{i+1}.
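Steps S42 to S45 amount to one coarsening pass, which can be expressed with a node-to-supernode assignment matrix P. The sketch below is a minimal illustration under stated assumptions: the community labels are taken as given (standing in for the Louvain result of S41), the network is a dense symmetric NumPy adjacency matrix, and edges inside a community are dropped rather than kept as self-loops, a detail the text does not specify. The function name `coarsen` is hypothetical.

```python
import numpy as np


def coarsen(A, X, communities):
    """Build G_{i+1} = (V_{i+1}, E_{i+1}, X_{i+1}) from G_i and a community partition of G_i.

    A           : (n, n) symmetric weighted adjacency matrix of G_i
    X           : (n, l) node attribute matrix of G_i
    communities : length-n sequence, communities[v] = community of node v
                  (assumed to come from a Louvain run on G_i)
    Returns (A_next, X_next, P) with P[v, c] = 1 if node v is merged into supernode c.
    """
    A = np.asarray(A, dtype=float)
    X = np.asarray(X, dtype=float)
    # S42: relabel communities as 0..m-1; each community becomes one supernode
    _, labels = np.unique(np.asarray(communities), return_inverse=True)
    labels = labels.ravel()
    n, m = A.shape[0], labels.max() + 1
    P = np.zeros((n, m))
    P[np.arange(n), labels] = 1.0
    # S44: edges between communities are merged and their weights summed into super-edges
    A_next = P.T @ A @ P
    np.fill_diagonal(A_next, 0.0)  # assumption: intra-community edges are not kept as self-loops
    # S43: supernode attributes are the mean of the member nodes' attributes
    X_next = (P.T @ X) / P.sum(axis=0)[:, None]
    return A_next, X_next, P
```

For a path 0-1-2-3 with communities {0, 1} and {2, 3}, the single cross-community edge (1, 2) becomes the only super-edge, and each supernode's attribute vector is the mean of its two members.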
S46. Iterating this process yields a series of attribute networks of progressively smaller size, G_0, G_1, ..., G_k, with |V_0| > |V_1| > ... > |V_k|, where k is the number of granulation layers and G_k is the coarsest attribute network. The networks are ordered from fine to coarse: G_i is finer-grained than G_{i+1}. For example, granulating the Pubmed citation network into three layers gives:
|V_0| = 19717 > |V_1| = 9614 > |V_2| = 4938 > |V_3| = 2235,
|E_0| = 44338 > |E_1| = 31338 > |E_2| = 21843 > |E_3| = 10137.
S5. Matrix factorization is performed to obtain an initial low-dimensional vector representation of the nodes of each granularity attribute network, and the initial representation is refined by spectral propagation to obtain the node feature representation at that granularity.
S51. Each coarsened network, at every granularity, is expressed in matrix form, and its initial representation is learned approximately with Randomized SVD, a fast matrix factorization method, in the following steps:
Step 1: Find a matrix Q with orthonormal columns that approximates the range of the original matrix A. A randomly initialized low-dimensional matrix is repeatedly multiplied by A and the result orthogonalized, until a stable basis Q satisfying formula (3) is obtained:
$$A \approx QQ^{T}A \qquad (3)$$
Step 2: Construct the matrix
$$B = Q^{T}A \qquad (4)$$
Step 3: Decompose B by SVD:
$$B = S\Sigma V^{T} \qquad (5)$$
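A minimal NumPy sketch of the three steps above, assuming A is a dense matrix for the network at one granularity (the text does not say exactly which matrix is factorized). The oversampling of 10 and the two power iterations are illustrative choices. The helper `initial_embedding`, which takes U·Σ^{1/2} as the node vectors, follows a common convention (used, for example, in ProNE) and is an assumption rather than the patent's stated formula.

```python
import numpy as np


def randomized_svd(A, d, oversample=10, n_power_iter=2, seed=0):
    """Rank-d randomized SVD: returns U (m x d), sigma (d,), Vt (d x n) with A ~ U diag(sigma) Vt."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    r = min(d + oversample, m, n)              # width of the random test matrix
    omega = rng.standard_normal((n, r))        # randomly initialized low-dimensional matrix
    Q, _ = np.linalg.qr(A @ omega)             # Step 1: orthonormal basis of the sampled range
    for _ in range(n_power_iter):              # power iterations stabilize the basis
        Q, _ = np.linalg.qr(A.T @ Q)
        Q, _ = np.linalg.qr(A @ Q)
    B = Q.T @ A                                # Step 2, formula (4): small r x n matrix
    S_hat, sigma, Vt = np.linalg.svd(B, full_matrices=False)  # Step 3, formula (5)
    U = Q @ S_hat                              # lift left singular vectors back to A's space
    return U[:, :d], sigma[:d], Vt[:d, :]


def initial_embedding(A, d, **kwargs):
    """Hypothetical helper: node vectors U * sqrt(sigma)."""
    U, sigma, _ = randomized_svd(A, d, **kwargs)
    return U * np.sqrt(sigma)
```

For large sparse networks, `sklearn.utils.extmath.randomized_svd` provides a tested implementation of the same algorithm.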
S52. Spectral propagation is used to incorporate the local smoothing information and clustering information of each layer into the initial node representation, yielding the node representation at that granularity.
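The text does not spell out its spectral propagation filter. As a simplified stand-in, the sketch below applies a low-pass graph filter that repeatedly mixes each node's vector with the degree-normalized average of its neighbourhood, using D^-1/2 (A + I) D^-1/2. This captures only the local-smoothing effect; the band-pass filter that ProNE-style spectral propagation uses to also inject clustering information is not reproduced. The function name and the parameters `steps` and `alpha` are hypothetical.

```python
import numpy as np


def low_pass_propagate(A, R, steps=2, alpha=0.5):
    """Smooth node vectors R (n x d) over a weighted graph A (n x n, non-negative)."""
    n = A.shape[0]
    A_tilde = A + np.eye(n)                              # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(A_tilde.sum(axis=1))      # degrees are >= 1 thanks to self-loops
    A_hat = d_inv_sqrt[:, None] * A_tilde * d_inv_sqrt[None, :]  # D^-1/2 (A + I) D^-1/2
    for _ in range(steps):
        # keep part of each vector and blend in the neighbourhood average;
        # components along high-frequency eigenvectors of A_hat are attenuated
        R = (1.0 - alpha) * R + alpha * (A_hat @ R)
    return R
```

On a 4-cycle, a constant vector passes through unchanged, while the alternating vector (1, -1, 1, -1), the highest-frequency signal on that graph, is scaled by 1/3 per step when alpha = 0.5.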
S6. The node feature representations of all granularities are concatenated to obtain a node feature representation that reflects multiple granularities of the network.
S61. The vector representations of all granularity networks and the original attribute information are concatenated according to formula (6) to obtain a network representation that reflects multiple granularities:
$$Z = Z_0 \oplus Z_1 \oplus \cdots \oplus Z_k \oplus X \qquad (6)$$
where ⊕ denotes concatenation, Z_i is the representation obtained at granularity i and X is the original attribute matrix. Concatenation further fuses the structural and attribute information while preserving both. For the Pubmed example, the network representations at all granularities are concatenated with the original attribute information to obtain the final vector representation of each node in the network.
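The vectors of a coarse network G_i belong to supernodes, so before concatenation each original node needs a vector at every granularity. A minimal sketch, assuming each original node simply inherits the vector of the supernode that contains it, is shown below. The per-level assignment matrices `P_list` (P[v, c] = 1 when node v of G_i is merged into supernode c of G_{i+1}) and the function name are hypothetical.

```python
import numpy as np


def multi_granularity_concat(Z_list, P_list, X):
    """Concatenate per-granularity vectors and original attributes for every original node.

    Z_list[i] : (|V_i|, d_i) node representation of G_i, i = 0..k
    P_list[i] : (|V_i|, |V_{i+1}|) 0/1 assignment of G_i nodes to G_{i+1} supernodes
    X         : (|V_0|, l) original attribute matrix
    """
    M = np.eye(X.shape[0])          # maps original nodes to nodes of the current granularity
    blocks = []
    for i, Z in enumerate(Z_list):
        if i > 0:
            M = M @ P_list[i - 1]   # compose assignments: original node -> supernode of G_i
        blocks.append(M @ Z)        # each original node inherits its supernode's vector
    blocks.append(X)                # finally append the original attribute information
    return np.concatenate(blocks, axis=1)
```

The dense identity matrix keeps the sketch short; a sparse matrix or an index array would be used for networks the size of Pubmed.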
S7. The node feature representation and the labels of the hierarchical attribute network are fed into a classifier to predict the labels of nodes of unknown class; nodes with the same label are assigned to the same class, completing node classification.
It should be noted that the node classification model used in the present invention is a common classification model, such as an SVM classifier; the classification model itself is not improved in this application.
In step S7, an SVM classifier predicts the labels of the nodes whose labels are unknown, using the low-dimensional vector representation Z of the network and the labels of the nodes of G; nodes with the same label are grouped into the same class.
1. Illustratively, to verify the effectiveness and advancement of the proposed technical solution, several existing node classification methods are selected for comparison: DeepWalk, Node2vec, AANE, ASNE, HARP and MILE. DeepWalk and Node2vec are single-granularity methods that preserve only the structural information of the network; AANE and ASNE are single-granularity methods that preserve both structural and attribute information; HARP and MILE are multi-granularity methods that preserve structural information. Each of them learns a low-dimensional representation of every node and then classifies the nodes. The method of the invention (denoted AMG) fuses attribute information, learns node embeddings at multiple granularities, and then classifies the nodes. The number of granulation layers is set to 1 and 2, and the node classification results on the Pubmed citation network under different training-set proportions are evaluated with Micro-F1 and Macro-F1. The results are shown in Table 1, with the best results in bold: the invention achieves the best classification results at all training-set proportions.
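Micro-F1 and Macro-F1 differ in where they average. Micro-F1 pools true positives, false positives and false negatives over all classes (for single-label classification it equals accuracy), while Macro-F1 computes F1 per class and takes the unweighted mean, so rare classes weigh as much as frequent ones. A dependency-free sketch; the function name is hypothetical.

```python
from collections import Counter


def micro_macro_f1(y_true, y_pred):
    """Return (micro_f1, macro_f1) for single-label multi-class predictions."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted class p was wrong
            fn[t] += 1   # true class t was missed

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    # Macro: average the per-class F1 scores with equal weight
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    # Micro: compute F1 once from counts pooled over all classes
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro
```

For `y_true = [0, 0, 1, 1, 2]` and `y_pred = [0, 1, 1, 1, 2]`, Micro-F1 is 0.8 (4 of 5 correct) and Macro-F1 is (2/3 + 4/5 + 1) / 3 = 37/45 ≈ 0.822.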
2. To verify the speed of the technical solution, the method is used to classify nodes on the Cora, Citeseer and Pubmed citation network data sets and is compared with the same existing node classification methods: DeepWalk, Node2vec, AANE, ASNE, HARP and MILE. In the invention (AMG), the number of layers k is set to 1 and 2, Randomized SVD is used to learn the initial node representations, spectral propagation is used to refine them, and the representations at all granularities are concatenated to obtain the final node representation. The node classification results are given in Table 1 and the time consumed by node classification in Table 2, with the best results in bold. Table 2 shows that the method is the fastest on all listed data sets and that the average reduction in running time is substantial.
The above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (9)

1. A node classification method based on multi-granularity attribute network embedding with attribute edge connection, characterized by comprising the following steps:
S1. Numbering the nodes of the input network, obtaining the label information of the nodes, and then constructing the structure network G_topo;
S2. Capturing the attribute information of all nodes: first clustering the nodes to make a preliminary selection of nodes with similar attributes, then, within each cluster, comparing pairwise similarities with a threshold to select the node pairs with highly similar attributes across the whole network, and finally constructing the attribute network G_feat with the proposed attribute edge connection method;
S3. For the input network, adjusting the weights of the network structure and the attributes, and fusing the structure network and the attribute network into the fused network G_0;
S4. Partitioning the network G_0 according to the topological structure and attribute information of its nodes to obtain the network G_1, and repeating the partitioning process to obtain a series of attribute networks of progressively smaller scale, G_0, G_1, ..., G_k, where G_i denotes the network at the i-th granularity, i and k are integers, and 0 ≤ i ≤ k;
S5. Performing matrix factorization to obtain an initial low-dimensional vector representation of the nodes of each granularity attribute network, and refining the initial representation by spectral propagation to obtain the node feature representation at that granularity;
S6. Concatenating the node feature representations of all granularities to obtain a node feature representation that reflects multiple granularities of the network;
S7. Feeding the node feature representation and the labels of the multi-granularity attribute network into a classifier, predicting the labels of nodes of unknown class, and assigning nodes with the same label to the same class to complete node classification.
2. The node classification method based on multi-granularity attribute network embedding with attribute edge connection according to claim 1, wherein step S1 comprises:
S11. Processing the input network, including:
Step A: numbering the entities in the network, where the network contains n_1 entities, each entity serves as a node of the network, and the relations between entities serve as the edges of the network, of which there are n_2;
Step B: dividing the entities of the input network into several categories, the label of each node being its category number;
S12. Constructing the structure network G_topo from the processed network data:
G_topo = (V, E), where V is the set of n_1 nodes and E is the set of n_2 edges.
3. The node classification method based on multi-granularity attribute network embedding with attribute edge connection according to claim 2, wherein the attribute edge connection step S2 comprises:
S21. Selecting node pairs with highly similar attributes and connecting them with attribute edges:
obtaining the node attribute information and clustering the nodes of the network by attribute using the KMeans algorithm;
within each cluster, computing the cosine similarity AttiSim between every pair of nodes. Similarity is measured and assigned to node pairs by formula (1), where A_f(i, j), the (i, j) entry of the matrix A_f, is the similarity between the i-th and j-th nodes and x_i is the attribute vector of node i:
$$A_f(i,j) = \frac{x_i \cdot x_j}{\lVert x_i \rVert \, \lVert x_j \rVert} \qquad (1)$$
if the attribute similarity AttiSim of a node pair exceeds a preset threshold γ, selecting the pair as a node pair with high attribute similarity;
for each such pair, whose attributes are thus highly similar, adding an edge between the two nodes with its weight set to their cosine similarity, which represents the degree of attribute similarity between them; this process is called attribute edge connection;
S22. Constructing the attribute network G_feat from the result of attribute edge connection:
G_feat = (V, E, X), where V is the set of n_1 nodes, E is the set of n_2 edges, and X is an n_1 × l matrix, l being the dimension of the node attributes; the attribute network G_feat is built from the existing nodes and the newly generated edges.
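A sketch of the attribute edge connection of S21 and S22, assuming dense NumPy inputs and taking the KMeans cluster labels as given (any KMeans implementation could produce them). Cosine similarity per formula (1) is computed only within each cluster, and a pair is linked, with its similarity as the edge weight, when the similarity exceeds γ. The function name `attribute_edges` is hypothetical.

```python
import numpy as np


def attribute_edges(X, cluster_labels, gamma):
    """Return the weighted adjacency matrix A_feat of the attribute network."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(cluster_labels)
    norms = np.linalg.norm(X, axis=1)
    norms[norms == 0] = 1.0                  # avoid division by zero for all-zero attribute rows
    Xn = X / norms[:, None]                  # unit-length attribute vectors
    n = X.shape[0]
    A_feat = np.zeros((n, n))
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)    # nodes in cluster c
        S = Xn[idx] @ Xn[idx].T              # formula (1): pairwise cosine similarity in the cluster
        rows, cols = np.nonzero(np.triu(S > gamma, k=1))  # each pair once, no self-loops
        for r, s in zip(rows, cols):
            i, j = idx[r], idx[s]
            A_feat[i, j] = A_feat[j, i] = S[r, s]  # edge weight = cosine similarity
    return A_feat
```

With attributes [[1, 0], [1, 0.1], [0, 1], [1, 0]], clusters [0, 0, 1, 1] and γ = 0.9, only nodes 0 and 1 are linked. Nodes 0 and 3 have identical attributes but fall in different clusters, so they are never compared: the clustering step restricts which pairs are tested.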
4. The node classification method based on multi-granularity attribute network embedding with attribute edge connection according to claim 3, wherein step S3 comprises:
S31. For the given input network, assigning weights to the network structure and the attributes, and fusing the structure network and the attribute network into the fused network G_0:
the structure network G_topo and the attribute network G_feat are represented by their adjacency matrices A_topo and A_feat, respectively; each is given a weight suited to the given network, and the two weighted matrices are added, as in formula (2), to obtain the adjacency matrix A_0 of the network fusion graph G_0;
in the fusion graph G_0, the weight between two nodes represents their degree of similarity under the joint action of network structure and attributes;
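As described, formula (2) reduces to a weighted sum of the two adjacency matrices. A minimal sketch; the weight names `w_topo` and `w_feat` and their default values are hypothetical, since the text only says that each network receives a weight suited to the given network.

```python
import numpy as np


def fuse(A_topo, A_feat, w_topo=1.0, w_feat=1.0):
    """Adjacency matrix of the fused network G_0 as a weighted sum of A_topo and A_feat."""
    A_topo = np.asarray(A_topo, dtype=float)
    A_feat = np.asarray(A_feat, dtype=float)
    if A_topo.shape != A_feat.shape:
        raise ValueError("both networks must be defined on the same node set")
    # entry (i, j) mixes structural and attribute similarity of nodes i and j
    return w_topo * A_topo + w_feat * A_feat
```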
5. The node classification method based on multi-granularity attribute network embedding with attribute edge connection according to claim 4, wherein the partitioning step S4 comprises:
S41. For any network G_i, applying the Louvain community detection method to G_i to obtain a community partition based on both structure and attributes:
V_i = {V_i^1, V_i^2, ...}
where V_i denotes the partition into communities with similar structure and attributes, and V_i^1 denotes the first community obtained from G_i according to the similarity of its structure and attributes;
S42. Obtaining a supernode set from the structure-and-attribute-based community partition: each community in the partition V_i of network G_i is taken as one supernode of the network G_{i+1}, giving the supernode set V_{i+1};
S43. For each supernode in V_{i+1}, obtaining its attribute information by averaging the attributes of the nodes at the previous granularity that make up the supernode, giving the supernode attribute matrix X_{i+1};
S44. For the supernode set V_{i+1}, merging the edges of the member nodes into edges between supernodes and summing the weights of merged edges to form super-edges, giving the super-edge set E_{i+1};
S45. Building a new network G_{i+1} = (V_{i+1}, E_{i+1}, X_{i+1}) from the supernode set V_{i+1}, the super-edge set E_{i+1} and the supernode attributes X_{i+1};
S46. Iterating this process to obtain a series of attribute networks of progressively smaller size, G_0, G_1, ..., G_k, with |V_0| > |V_1| > ... > |V_k|, where k is the number of granulation layers and G_k is the coarsest attribute network; the networks are ordered from fine to coarse, G_i being finer-grained than G_{i+1}.
6. The node classification method based on multi-granularity attribute network embedding with attribute edge connection according to claim 5, wherein step S5 comprises:
S51. Expressing each coarsened network in matrix form and learning its initial representation approximately with Randomized SVD, a fast matrix factorization method, in the following steps:
Step 1: finding a matrix Q with orthonormal columns that approximates the range of the original matrix A, by repeatedly multiplying A by a randomly initialized low-dimensional matrix and orthogonalizing the result until a stable basis Q satisfying formula (3) is obtained:
$$A \approx QQ^{T}A \qquad (3)$$
Step 2: constructing the matrix
$$B = Q^{T}A \qquad (4)$$
Step 3: decomposing B by SVD:
$$B = S\Sigma V^{T} \qquad (5)$$
S52. Using spectral propagation to incorporate the local smoothing information and clustering information of each layer into the initial node representation, yielding the node representation at that granularity.
7. The node classification method based on multi-granularity attribute network embedding with attribute edge connection according to claim 6, wherein step S6 comprises:
S61. Concatenating the vector representations of all granularity networks and the original attribute information according to formula (6) to obtain a network representation reflecting multiple granularities:
$$Z = Z_0 \oplus Z_1 \oplus \cdots \oplus Z_k \oplus X \qquad (6)$$
where ⊕ denotes the concatenation operation.
8. The node classification method based on multi-granularity attribute network embedding with attribute edge connection according to claim 7, wherein step S7 comprises:
feeding the node feature representation and the labels of the hierarchical attribute network into a classifier, predicting the labels of nodes of unknown class, and assigning nodes with the same label to the same class to complete node classification.
9. A node classification system based on multi-granularity attribute network embedding with attribute edge connection, characterized by comprising:
a construction module, which numbers the citation network data, obtains the node labels from it, and then constructs the structure network G_topo, the attribute network G_feat and the fused network G_0;
a partitioning module, which partitions and granulates a network G_i based on structure and attribute information to obtain the coarsened network G_{i+1}, and repeats the granulation process to obtain a series of attribute networks of progressively smaller size, G_0, G_1, ..., G_k, with |V_0| > |V_1| > ... > |V_k|, where i and k are integers;
an attribute network node feature module, which learns the initial low-dimensional vector representation of the nodes of each granularity attribute network;
an optimization module, which refines the initial vector representation using spectral propagation;
a concatenation module, which concatenates the vector representation and the attribute information of the nodes at each granularity to obtain the network representation; and
a classification module, which feeds the multi-granularity attribute network node features and labels into the classifier, predicts the labels of nodes of unknown class, and assigns nodes with the same label to the same class to complete node classification.
CN202111305905.6A 2021-11-05 2021-11-05 Node classification method and system for multi-granularity attribute network embedding based on attribute continuous edges Pending CN114037008A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111305905.6A CN114037008A (en) 2021-11-05 2021-11-05 Node classification method and system for multi-granularity attribute network embedding based on attribute continuous edges

Publications (1)

Publication Number Publication Date
CN114037008A true CN114037008A (en) 2022-02-11

Family

ID=80142904

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111305905.6A Pending CN114037008A (en) 2021-11-05 2021-11-05 Node classification method and system for multi-granularity attribute network embedding based on attribute continuous edges

Country Status (1)

Country Link
CN (1) CN114037008A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115174450A (en) * 2022-07-05 2022-10-11 中孚信息股份有限公司 Unknown equipment identification method and system based on network node representation
CN115174450B (en) * 2022-07-05 2023-10-03 中孚信息股份有限公司 Unknown equipment identification method and system based on network node characterization

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination