CN110956199A - Node classification method based on sampling subgraph network - Google Patents

Node classification method based on sampling subgraph network

Info

Publication number
CN110956199A
CN110956199A CN201911068473.4A CN201911068473A CN110956199A CN 110956199 A CN110956199 A CN 110956199A CN 201911068473 A CN201911068473 A CN 201911068473A CN 110956199 A CN110956199 A CN 110956199A
Authority
CN
China
Prior art keywords
network
sgn
graph
node
order
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911068473.4A
Other languages
Chinese (zh)
Inventor
宣琦
王金焕
裘坤锋
单雅璐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University of Technology ZJUT
Original Assignee
Zhejiang University of Technology ZJUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University of Technology ZJUT
Priority to CN201911068473.4A
Publication of CN110956199A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/29 Graphical models, e.g. Bayesian networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A node classification method based on a sampling subgraph network comprises the following steps: S1, random-walk sampling; S2, constructing the simple subgraph SGN_0; S3, constructing the first-order subgraph network SGN_1; S4, constructing the second-order subgraph network SGN_2; S5, extracting features from the network graphs; S6, averaging the feature vectors; S7, forming the characterization matrices; S8, expanding the feature vector space; S9, classifying the characterization results of all nodes of the original network graph with an extremely randomized trees (extra-trees) classifier and evaluating them with ten-fold cross-validation to obtain the classification accuracy. The method converts the node classification problem into a graph classification problem by using the sequences generated by random walks, makes full use of latent structural information, strengthens the classification performance of traditional walk-based methods, and further introduces the subgraph network (SGN) to expand the network feature space and improve classification accuracy.

Description

Node classification method based on sampling subgraph network
Technical Field
The invention relates to network science, data mining and data analysis technologies, in particular to a node classification method based on a sampling subgraph network.
Background
In real life, networks such as social networks, citation networks and biological networks can be analysed purposefully by building complex network models. Many of these analysis tasks concern the nodes of the network, for example node classification. In a typical node classification task, the main concern is the label a node most likely belongs to. For example, in a social network, different nodes represent different users; by predicting node labels, an advertiser can infer the interests and hobbies of users and target promotions accordingly. Studying node classification on network graphs is therefore of practical importance.
Current node classification algorithms include walk-based network representation learning algorithms such as DeepWalk and Node2vec. Following the theoretical basis of Word2vec, these algorithms treat nodes in the network as analogous to words in natural language processing and use a word-vector generation method to represent the nodes as low-dimensional feature vectors. Such algorithms map network nodes to low-dimensional vectors in Euclidean space, which makes it convenient to carry out downstream network analysis tasks such as node classification with machine learning methods.
At present, node classification methods for social networks do not make full use of the local structural information around a node, neglect the multilevel character of node features in the network, and cannot capture node information in deeper network structures, so their classification performance is limited.
Therefore, the invention uses a walk strategy to construct the local ego network of each node and converts the node classification problem into a graph classification problem. In addition, based on the subgraph network (SGN), the invention applies higher-order mappings to the local network obtained from each node's walk in order to capture deeper topological structure, expands the structural feature space of the nodes by combining multiple feature extraction methods, and effectively supplements the node features of the original network so as to improve node classification accuracy.
Disclosure of Invention
In order to overcome the defect that existing walk-based node classification methods cannot make full use of the feature information of nodes in a network, the invention provides a node classification method based on a sampling subgraph network, which converts the node classification problem into a network graph classification problem, makes full use of the existing structural information of the network graph, and enriches research on node classification.
The technical scheme adopted by the invention to achieve this aim is as follows:
A node classification method based on a sampling subgraph network comprises the following steps:
S1: Random-walk sampling. Based on the random walk of DeepWalk or the second-order biased walk strategy of Node2vec, walk sampling is performed from each node of the original network graph G = (V, E) to obtain a corresponding sampling sequence of length L, and the sampling is repeated M times for each node;
S2: Constructing the simple subgraph SGN_0. Each sampling sequence is considered in turn: the nodes it contains, together with the edges between those nodes, are extracted from the original network graph to form a network graph, namely a simple subgraph, so that each sampling sequence yields one simple subgraph SGN_0;
S3: Constructing the first-order subgraph network SGN_1. Each simple subgraph SGN_0 from step S2 is considered in turn and a first-order mapping is applied to it: every edge of SGN_0 is mapped to a distinct node of SGN_1, and if two edges of SGN_0 share a node, the two corresponding nodes of SGN_1 are connected, which forms the complete first-order subgraph network SGN_1; each SGN_0 yields one SGN_1;
S4: Constructing the second-order subgraph network SGN_2. Each first-order subgraph network SGN_1 from step S3 is considered in turn and a second-order mapping is applied to it: every edge of SGN_1 is regarded as a distinct node, and if two edges of SGN_1 share a node, an edge is added between the two nodes converted from those edges; these nodes and edges form the second-order subgraph network SGN_2, and each SGN_1 yields one SGN_2;
S5: Extracting features from the network graphs. A feature extraction method is applied separately to all simple subgraphs SGN_0, first-order subgraph networks SGN_1 and second-order subgraph networks SGN_2, yielding |V| × M K-dimensional feature vectors for each of the three kinds, where |V| is the number of nodes;
S6: Averaging the feature vectors. Separately for the feature vectors extracted in step S5 from the simple subgraphs SGN_0, the first-order subgraph networks SGN_1 and the second-order subgraph networks SGN_2, the M K-dimensional feature vectors belonging to the same node of the original network graph G = (V, E) are averaged, so that each node of the original network graph finally obtains one K-dimensional feature vector under each of the three kinds of subgraph;
S7: Forming the characterization matrices. For each of the three cases (simple subgraph SGN_0, first-order subgraph network SGN_1 and second-order subgraph network SGN_2), the characterization vectors of all nodes of the original network graph G = (V, E) form the characterization matrices $\Phi_0 \in \mathbb{R}^{|V| \times K}$, $\Phi_1 \in \mathbb{R}^{|V| \times K}$ and $\Phi_2 \in \mathbb{R}^{|V| \times K}$;
S8: Expanding the feature vector space. The characterization matrices learned from each kind of subgraph network are concatenated horizontally to expand the feature space, giving the network characterization matrix of all nodes, $\Phi = \mathrm{merge}(\Phi_0, \Phi_1, \Phi_2) \in \mathbb{R}^{|V| \times 3K}$;
S9: Classification. An extremely randomized trees (extra-trees) classifier from machine learning is applied to the characterization results of all nodes of the original network graph and evaluated with ten-fold cross-validation to obtain the classification accuracy.
Further, in step S1, the elements of each sampling sequence are node labels of the original network graph. The same node label may appear multiple times, and repeated occurrences of a node count towards the sequence length.
The invention provides a node classification method based on a sampling subgraph network. The method introduces the idea of constructing a network graph from a walk-sampled sequence and thereby converts the node classification task into a graph classification task. The feature space of the original network is then expanded through the subgraph network (SGN), which improves classification accuracy.
The beneficial effects of the invention are as follows. Walk sampling yields a sampling sequence for each node of the original network graph, and each sequence is converted into a network graph, which turns the node classification problem into a network graph classification problem. Constructing the SGNs then expands the network feature space, makes full use of latent structural information, and strengthens the performance of traditional walk-based methods. In addition, the invention can fuse multiple feature extraction methods and classifies the nodes with the extremely randomized trees algorithm from machine learning, thereby improving node classification accuracy compared with the prior art.
Drawings
Fig. 1 is a design flow chart of the node classification method based on a sampling subgraph network according to the invention.
Fig. 2 illustrates the extraction of the simple subgraph SGN_0, the first-order subgraph network SGN_1 and the second-order subgraph network SGN_2 according to the invention, wherein (a) shows the original network, (b) shows the walk sampling starting from node 1, (c) shows a sequence obtained from the walk, (d) shows the simple subgraph SGN_0 extracted from the original network graph, (e) shows the first-order subgraph network SGN_1 obtained from the simple subgraph SGN_0, and (f) shows the second-order subgraph network SGN_2 obtained from the first-order subgraph network SGN_1.
Detailed Description
The following detailed description of embodiments of the invention is provided in connection with the accompanying drawings.
Referring to Fig. 1 and Fig. 2, the node classification method based on a sampling subgraph network is described using a social network as an example, in which a node represents a member of the social network and an edge represents a friendship between members. All node members can be divided into two categories according to the community they belong to, so a node classification algorithm can identify community members. The invention models the Karate dataset as a social network G = (V, E), where V is the node set, each node representing a member, and E is the edge set, each edge representing an interaction between two members. Each node is then converted into a network graph and the subgraph networks are extracted for analysis.
The invention comprises the following nine steps:
Step 1: random-walk sampling;
Step 2: constructing the simple subgraph SGN_0;
Step 3: constructing the first-order subgraph network SGN_1;
Step 4: constructing the second-order subgraph network SGN_2;
Step 5: extracting features from the network graphs;
Step 6: averaging the feature vectors;
Step 7: forming the characterization matrices;
Step 8: expanding the feature vector space;
Step 9: classifying the characterization results of all nodes of the original network graph with an extremely randomized trees (extra-trees) classifier and evaluating them with ten-fold cross-validation to obtain the classification accuracy.
In step S1, the walk sampling process is as follows: based on the random walk of DeepWalk or the second-order biased walk strategy of Node2vec, walk sampling is performed from each node of the original social network graph G = (V, E) to obtain a corresponding sampling sequence of length L. Because the walk may move from the current node to several different next nodes with the same probability, each walk can generate a different sequence, and different sequences carry different information. For this reason the sampling is repeated M times for each node.
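The walk sampling of step S1 can be sketched as follows. This is a minimal illustration that assumes the uniform (DeepWalk-style) random walk only; the Node2vec second-order bias is not implemented. The function names `random_walk` and `sample_walks` and the toy adjacency list are hypothetical and do not come from the patent.

```python
import random

def random_walk(adj, start, length, rng):
    """Uniform random walk returning up to `length` node labels, starting at `start`."""
    walk = [start]
    while len(walk) < length:
        neighbors = adj[walk[-1]]
        if not neighbors:  # isolated node: the walk cannot continue
            break
        # Every neighbor is chosen with the same probability.
        walk.append(rng.choice(neighbors))
    return walk

def sample_walks(adj, length, repeats, seed=0):
    """Return {node: [walk_1, ..., walk_M]} with M = repeats walks per node."""
    rng = random.Random(seed)
    return {v: [random_walk(adj, v, length, rng) for _ in range(repeats)] for v in adj}

# Toy undirected graph as an adjacency list (hypothetical example data).
adj = {1: [2, 3], 2: [1, 3], 3: [1, 2, 4], 4: [3]}
walks = sample_walks(adj, length=5, repeats=3)
```

Repeated node labels count towards the length L, so a walk such as 1, 2, 1, 3, 1 is a valid sequence of length 5.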
In step S2, the simple subgraph SGN_0 is constructed, referring to Fig. 2(a), (b), (c) and (d), as follows: after a sampling sequence is obtained by walking, the nodes in the sequence are extracted from the original network graph, and then the edges that these nodes form in the original network graph are extracted, which together form a simple subgraph. Each sampling sequence yields one simple subgraph SGN_0.
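A minimal sketch of this extraction is given below. It assumes that "the edges these nodes form in the original network graph" means the induced subgraph, that is, every original edge whose two endpoints both occur in the sequence; this reading is an assumption rather than a confirmed detail of the patent. The name `induced_subgraph` and the example data are hypothetical.

```python
def induced_subgraph(edges, walk):
    """Return (nodes, edges) of the subgraph induced by the distinct nodes in `walk`."""
    nodes = set(walk)  # repeated labels in the walk collapse to one node
    sub_edges = {
        frozenset((u, v))
        for u, v in edges
        if u in nodes and v in nodes and u != v
    }
    return nodes, sub_edges

# Hypothetical original graph and one sampled sequence of length 5.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]
walk = [1, 2, 3, 1, 3]
nodes, sub_edges = induced_subgraph(edges, walk)
```

Here the sequence visits only nodes 1, 2 and 3, so the edges (3, 4) and (4, 5) are left out of the simple subgraph.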
In step S3, the first-order subgraph network SGN_1 is constructed, referring to Fig. 2(d) and (e), as follows: each simple subgraph SGN_0 from step S2 is considered in turn and a first-order mapping is applied to it. Every edge of SGN_0 is mapped to a distinct node of SGN_1, and if two edges of SGN_0 share a node, the two corresponding nodes of SGN_1 are connected, which forms the complete first-order subgraph network SGN_1. Each SGN_0 yields one SGN_1.
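The first-order mapping described here matches the standard line-graph construction. A minimal sketch follows, in which each new node is labelled by the edge it came from; the function name `line_graph` and the example edges are hypothetical.

```python
from itertools import combinations

def line_graph(edges):
    """Map each edge to a node; connect two such nodes when the edges share an endpoint."""
    nodes = sorted({tuple(sorted(e)) for e in edges})
    links = {
        (a, b)
        for a, b in combinations(nodes, 2)
        if set(a) & set(b)  # the two original edges have a common endpoint
    }
    return nodes, links

# Hypothetical simple subgraph SGN_0: a triangle 1-2-3 with a pendant edge 3-4.
sgn0_edges = [(1, 2), (1, 3), (2, 3), (3, 4)]
sgn1_nodes, sgn1_edges = line_graph(sgn0_edges)
```

In this example the edges (1, 2) and (3, 4) have no common endpoint, so the corresponding nodes of SGN_1 stay unconnected, while every other pair of edges is linked.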
In step S4, the second-order subgraph network SGN_2 is constructed, referring to Fig. 2(e) and (f), as follows: each first-order subgraph network SGN_1 from step S3 is considered in turn and a second-order mapping is applied to it. Every edge of SGN_1 is regarded as a distinct node, and if two edges of SGN_1 share a node, an edge is added between the two nodes converted from those edges. These nodes and edges form the second-order subgraph network SGN_2. Each SGN_1 yields one SGN_2.
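Since the second-order mapping applies the same edge-to-node rule to SGN_1, it can be sketched as a repeated application of one mapping function. The helper names `line_graph` and `sgn` and the path-graph example are hypothetical and chosen only for illustration.

```python
from itertools import combinations

def line_graph(edges):
    """Map each edge to a node; connect nodes whose source edges share an endpoint."""
    nodes = sorted({tuple(sorted(e)) for e in edges})
    links = {(a, b) for a, b in combinations(nodes, 2) if set(a) & set(b)}
    return nodes, links

def sgn(edges, order):
    """Apply the edge-to-node mapping `order` times; order 0 returns the input edges."""
    current = {tuple(sorted(e)) for e in edges}
    for _ in range(order):
        _, current = line_graph(current)
    return current

# Hypothetical SGN_0: a path 1-2-3-4 with three edges.
path_edges = [(1, 2), (2, 3), (3, 4)]
sgn1 = sgn(path_edges, 1)  # edge set of SGN_1
sgn2 = sgn(path_edges, 2)  # edge set of SGN_2
```

For a path, each mapping removes one edge: three edges in SGN_0 give two in SGN_1 and one in SGN_2. Denser subgraphs instead grow under the mapping, which is one reason the sampled subgraphs are kept small.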
In step S5, features are extracted from the network graphs as follows: the graph2vec model is applied separately to all simple subgraphs SGN_0, first-order subgraph networks SGN_1 and second-order subgraph networks SGN_2, yielding |V| × M K-dimensional feature vectors for each of the three kinds, where |V| is the number of nodes.
In step S6, the feature vectors are averaged as follows: separately for the feature vectors extracted in step S5 from the simple subgraphs SGN_0, the first-order subgraph networks SGN_1 and the second-order subgraph networks SGN_2, the M K-dimensional feature vectors belonging to the same node of the original network graph G = (V, E) are averaged, so that each node of the original network graph finally obtains one K-dimensional feature vector under each of the three kinds of subgraph.
In step S7, the characterization matrices are formed as follows: for each of the three cases (simple subgraph SGN_0, first-order subgraph network SGN_1 and second-order subgraph network SGN_2), the characterization vectors of all nodes of the original network graph G = (V, E) form the characterization matrices $\Phi_0 \in \mathbb{R}^{|V| \times K}$, $\Phi_1 \in \mathbb{R}^{|V| \times K}$ and $\Phi_2 \in \mathbb{R}^{|V| \times K}$.
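Steps S6 and S7 can be illustrated with a short sketch. It assumes the graph embeddings have already been computed (for example by graph2vec, which is not reproduced here) and are stored per node as a list of M vectors. The names `average_vectors` and `characterization_matrix` and the numeric values are hypothetical.

```python
def average_vectors(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    count = len(vectors)
    return [sum(column) / count for column in zip(*vectors)]

def characterization_matrix(embeddings, node_order):
    """Stack one averaged K-dimensional row per node, in `node_order`."""
    return [average_vectors(embeddings[v]) for v in node_order]

# Hypothetical embeddings: M = 2 vectors of dimension K = 3 for each of 2 nodes.
embeddings = {
    "a": [[1.0, 2.0, 3.0], [3.0, 4.0, 5.0]],
    "b": [[0.0, 0.0, 0.0], [2.0, 2.0, 2.0]],
}
phi0 = characterization_matrix(embeddings, ["a", "b"])
```

The same routine would be run three times, once per kind of subgraph, to obtain the three matrices with |V| rows and K columns each.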
In step S8, the feature vector space is expanded as follows: the characterization matrices learned from each kind of subgraph network are concatenated horizontally to expand the feature space, giving the network characterization matrix of all nodes, $\Phi = \mathrm{merge}(\Phi_0, \Phi_1, \Phi_2) \in \mathbb{R}^{|V| \times 3K}$.
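The horizontal concatenation can be sketched in a few lines. The function is named `merge` after the operator in the formula, but its implementation and the small matrices below are illustrative assumptions only.

```python
from itertools import chain

def merge(*matrices):
    """Concatenate matrices row by row (horizontal stacking)."""
    row_counts = {len(m) for m in matrices}
    if len(row_counts) != 1:
        raise ValueError("all matrices must have the same number of rows")
    # Row i of the result is row i of each input matrix, joined end to end.
    return [list(chain.from_iterable(rows)) for rows in zip(*matrices)]

# Hypothetical matrices with |V| = 2 nodes and K = 2 features each.
phi0 = [[1, 2], [3, 4]]
phi1 = [[5, 6], [7, 8]]
phi2 = [[9, 10], [11, 12]]
phi = merge(phi0, phi1, phi2)
```

Each node keeps a single row, and its feature dimension grows from K to 3K.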
In step S9, an extremely randomized trees (extra-trees) classifier from machine learning is applied to the characterization results of all nodes of the original network graph and evaluated with ten-fold cross-validation: the data are randomly divided into 10 parts, each part is used in turn as the test sample while the remaining 9 parts serve as training samples, and the classification accuracy is obtained.
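The ten-fold split itself can be sketched without any external library. The classifier is deliberately left out: in practice an extra-trees implementation such as scikit-learn's `ExtraTreesClassifier` could be trained on each training split, but that pairing is an assumption and not something the patent specifies. The helper name `k_fold_indices` is hypothetical, and the sample count of 34 is the commonly cited size of the karate club network.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_indices, test_indices) for each of k shuffled folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of nearly equal size.
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

splits = list(k_fold_indices(34, k=10))
```

Each node appears in exactly one test fold, and the reported accuracy would be the mean over the ten folds.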
The above describes an example of the node classification method based on a sampling subgraph network applied to the Karate social network. By converting sampling sequences into network graphs, the method turns the node classification problem into a graph classification problem, and it further introduces the subgraph network (SGN) to expand the network feature space, thereby improving classification accuracy. In addition, once the node classification task has been converted into a graph classification task, latent structural information is fully utilized and the classification performance of traditional walk-based methods is strengthened. The node classification method based on a sampling subgraph network thus provides a new scheme for the node classification task. The present invention is to be considered as illustrative and not restrictive. It will be understood by those skilled in the art that various changes, modifications and equivalents may be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (2)

1. A node classification method based on a sampling subgraph network, characterized by comprising the following steps:
S1: Random-walk sampling. Based on the random walk of DeepWalk or the second-order biased walk strategy of Node2vec, walk sampling is performed from each node of the original network graph G = (V, E) to obtain a corresponding sampling sequence of length L, and the sampling is repeated M times for each node;
S2: Constructing the simple subgraph SGN_0. Each sampling sequence is considered in turn: the nodes it contains, together with the edges between those nodes, are extracted from the original network graph to form a network graph, namely a simple subgraph, so that each sampling sequence yields one simple subgraph SGN_0;
S3: Constructing the first-order subgraph network SGN_1. Each simple subgraph SGN_0 from step S2 is considered in turn and a first-order mapping is applied to it: every edge of SGN_0 is mapped to a distinct node of SGN_1, and if two edges of SGN_0 share a node, the two corresponding nodes of SGN_1 are connected, which forms the complete first-order subgraph network SGN_1; each SGN_0 yields one SGN_1;
S4: Constructing the second-order subgraph network SGN_2. Each first-order subgraph network SGN_1 from step S3 is considered in turn and a second-order mapping is applied to it: every edge of SGN_1 is regarded as a distinct node, and if two edges of SGN_1 share a node, an edge is added between the two nodes converted from those edges; these nodes and edges form the second-order subgraph network SGN_2, and each SGN_1 yields one SGN_2;
S5: Extracting features from the network graphs. A feature extraction method is applied separately to all simple subgraphs SGN_0, first-order subgraph networks SGN_1 and second-order subgraph networks SGN_2, yielding |V| × M K-dimensional feature vectors for each of the three kinds, where |V| is the number of nodes;
S6: Averaging the feature vectors. Separately for the feature vectors extracted in step S5 from the simple subgraphs SGN_0, the first-order subgraph networks SGN_1 and the second-order subgraph networks SGN_2, the M K-dimensional feature vectors belonging to the same node of the original network graph G = (V, E) are averaged, so that each node of the original network graph finally obtains one K-dimensional feature vector under each of the three kinds of subgraph;
S7: Forming the characterization matrices. For each of the three cases (simple subgraph SGN_0, first-order subgraph network SGN_1 and second-order subgraph network SGN_2), the characterization vectors of all nodes of the original network graph G = (V, E) form the characterization matrices $\Phi_0 \in \mathbb{R}^{|V| \times K}$, $\Phi_1 \in \mathbb{R}^{|V| \times K}$ and $\Phi_2 \in \mathbb{R}^{|V| \times K}$;
S8: Expanding the feature vector space. The characterization matrices learned from each kind of subgraph network are concatenated horizontally to expand the feature space, giving the network characterization matrix of all nodes, $\Phi = \mathrm{merge}(\Phi_0, \Phi_1, \Phi_2) \in \mathbb{R}^{|V| \times 3K}$;
S9: Classification. An extremely randomized trees (extra-trees) classifier from machine learning is applied to the characterization results of all nodes of the original network graph and evaluated with ten-fold cross-validation to obtain the classification accuracy.
2. The method of claim 1, wherein in step S1 the elements of each sampling sequence are node labels of the original network graph, the same node label may appear multiple times, and repeated occurrences of a node count towards the sequence length.
CN201911068473.4A 2019-11-05 2019-11-05 Node classification method based on sampling subgraph network Pending CN110956199A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911068473.4A CN110956199A (en) 2019-11-05 2019-11-05 Node classification method based on sampling subgraph network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911068473.4A CN110956199A (en) 2019-11-05 2019-11-05 Node classification method based on sampling subgraph network

Publications (1)

Publication Number Publication Date
CN110956199A true CN110956199A (en) 2020-04-03

Family

ID=69976563

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911068473.4A Pending CN110956199A (en) 2019-11-05 2019-11-05 Node classification method based on sampling subgraph network

Country Status (1)

Country Link
CN (1) CN110956199A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112380931A (en) * 2020-10-30 2021-02-19 Zhejiang University of Technology Modulation signal classification method and system based on sub-graph network
CN112380931B (en) * 2020-10-30 2024-02-20 Zhejiang University of Technology Modulation signal classification method and system based on sub-graph network
CN114023375A (en) * 2021-03-12 2022-02-08 Zhejiang University of Technology Width learning enzyme protein detection method and system based on global sampling subgraph

Similar Documents

Publication Publication Date Title
Wang et al. Zero-shot recognition via semantic embeddings and knowledge graphs
Zhang et al. User profile preserving social network embedding
CN112700056B (en) Complex network link prediction method, device, electronic equipment and medium
CN108170765B (en) Poverty-stricken and living fund assisting recommendation method based on multidimensional analysis of on-school behavior data
CN107368534B (en) Method for predicting social network user attributes
CN106991617B (en) Microblog social relationship extraction algorithm based on information propagation
CN107480213B (en) Community detection and user relation prediction method based on time sequence text network
CN114172688B (en) Method for automatically extracting key nodes of network threat of encrypted traffic based on GCN-DL (generalized traffic channel-DL)
CN112559764A (en) Content recommendation method based on domain knowledge graph
CN105938608A (en) Label-influence-driven semi-synchronous community discovery method
Gebhart et al. Characterizing the shape of activation space in deep neural networks
CN111008337A (en) Deep attention rumor identification method and device based on ternary characteristics
CN113314188B (en) Graph structure enhanced small sample learning method, system, equipment and storage medium
CN110956199A (en) Node classification method based on sampling subgraph network
CN111783879A (en) Hierarchical compression map matching method and system based on orthogonal attention mechanism
CN110136017A (en) A kind of group's discovery method based on data enhancing and nonnegative matrix sparse decomposition
CN108446605A (en) Double interbehavior recognition methods under complex background
CN111091005A (en) Meta-structure-based unsupervised heterogeneous network representation learning method
Blenn et al. Crawling and detecting community structure in online social networks using local information
CN108647334A (en) A kind of video social networks homology analysis method under spark platforms
CN116720975A (en) Local community discovery method and system based on structural similarity
CN113111914A (en) Graph width learning classification method and system based on global sampling subgraph
CN115544307A (en) Directed graph data feature extraction and expression method and system based on incidence matrix
CN114722920A (en) Deep map convolution model phishing account identification method based on map classification
Kavurucu A comparative study on network motif discovery algorithms

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20200403)