CN110830291B - Node classification method of heterogeneous information network based on meta-path - Google Patents

Info

Publication number
CN110830291B
CN110830291B CN201911043848.1A CN201911043848A CN110830291B CN 110830291 B CN110830291 B CN 110830291B CN 201911043848 A CN201911043848 A CN 201911043848A CN 110830291 B CN110830291 B CN 110830291B
Authority
CN
China
Prior art keywords
meta
path
nodes
node
heterogeneous information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911043848.1A
Other languages
Chinese (zh)
Other versions
CN110830291A (en)
Inventor
姜彦吉
郭羽含
张家欣
张琪虹
孙涵莆
胡鑫泽
王嘉宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Liaoning Technical University
Original Assignee
Liaoning Technical University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-10-30
Filing date
2019-10-30
Publication date
2023-01-10
Application filed by Liaoning Technical University
Priority to CN201911043848.1A
Publication of CN110830291A
Application granted
Publication of CN110830291B
Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/142 Network analysis or design using statistical or mathematical methods
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network

Abstract

The invention provides a node classification method of a heterogeneous information network based on a meta-path, and relates to the technical field of deep learning and network embedding. The method first acquires all meta-paths in a heterogeneous information network to obtain a meta-path set, and increases the number of meta-paths between nodes in the obtained meta-path set; it then determines a feature vector of each meta-path; finally, it derives a feature vector representation of the nodes in the heterogeneous information network from the feature vectors obtained from the meta-paths, and classifies the nodes in the meta-paths by using a convolutional neural network. By using meta-paths to obtain the paths between nodes, the method simplifies the training process of the heterogeneous information network to a certain extent and improves the accuracy of the final classification result.

Description

Node classification method of heterogeneous information network based on meta-path
Technical Field
The invention relates to the technical field of deep learning and network embedding, in particular to a node classification method of a heterogeneous information network based on a meta-path.
Background
With the rapid development of social networks and knowledge networks, network model structures have attracted increasing attention. A heterogeneous information network is a newer approach to network modeling and analysis: it has a more complex structure, carries richer information, and can describe a problem with more features. Existing heterogeneous information networks mainly include the conference-paper-author data set (DBLP), merchant-commodity-purchaser data sets, and the like. In a heterogeneous information network, the heterogeneity and complexity of the nodes and connections make node classification more difficult. Methods designed for homogeneous networks cannot be applied directly to classify the nodes; doing so loses some of the information in the network, and the integrity of the network cannot be guaranteed. Therefore, how to consider and quantitatively analyze different types of nodes and connections simultaneously, and thereby model a heterogeneous information network more accurately, has become a difficult research problem.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a node classification method for heterogeneous information networks based on meta-paths, aiming at the defects of the prior art, so as to realize the classification of nodes in the heterogeneous information networks.
In order to solve the above technical problem, the technical solution adopted by the invention is a node classification method of a heterogeneous information network based on meta-paths, which includes:
acquiring all meta-paths in a heterogeneous information network to obtain a meta-path set;
increasing the number of meta-paths between the nodes in the obtained meta-path set to obtain an extended meta-path set;
determining a feature vector of each meta-path according to the obtained extended meta-path set with the increased number of meta-paths;
deriving a feature vector representation of the nodes in the heterogeneous information network from the feature vectors obtained from the meta-paths, and classifying the nodes in the meta-paths by using a convolutional neural network;
the meta-paths in the heterogeneous information network are acquired through a breadth-first traversal algorithm;
acquiring the meta-paths through the breadth-first traversal algorithm further comprises: setting a fixed path length so that the source node and the last node in a meta-path are nodes of the same type, the set path length being 3 or 5, to obtain the meta-path set;
increasing the number of meta-paths comprises: adding virtual connecting edges and weights between nodes of similar types, determining the similarity of two nodes, and comparing the similarity with a threshold to determine whether the two nodes can be connected by a connecting edge, thereby increasing the number of meta-paths between the nodes.
The beneficial effect of the above technical solution is as follows: the node classification method of the heterogeneous information network based on meta-paths provided by the invention obtains the paths between nodes in a simple manner by using meta-paths, which simplifies the network training process to a certain extent and improves the accuracy of the final classification result.
Drawings
FIG. 1 is a flowchart of a node classification method for a heterogeneous information network based on a meta-path according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating meta-paths of different types in a meta-path set according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of increasing the number of meta-paths between nodes according to an embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention, but are not intended to limit the scope of the invention.
A node classification method of heterogeneous information network based on meta path, as shown in fig. 1, includes the following steps:
Step 1: acquiring all meta-paths in the heterogeneous information network to obtain a meta-path set;
In this embodiment, the heterogeneous information network is the existing DBLP (DataBase systems and Logic Programming) network, an English-language bibliography of computer science literature organized around research results and their authors. As shown in FIG. 2, it contains three types of nodes: journal and conference nodes V, author nodes A, and paper nodes P; there are no connecting edges between nodes of the same type. Authors write papers and submit them to conferences and journals. In this embodiment, the author nodes are classified according to the papers the authors wrote and the conferences or journals to which they submitted, thereby dividing the authors into classes.
Assume a heterogeneous information network G = <A, P, V, E, W>, where A denotes the set of authors, P denotes the set of papers, V denotes the set of conferences or journals, E denotes the set of all edges, and W denotes the weights of the edges in the heterogeneous information network. Meta-paths are obtained by breadth-first traversal: the source node is taken from the author set and the tail node is also taken from the author set, the nodes passed between the source node and the tail node are regarded as a path, and the breadth-first traversal algorithm is used to traverse the paths between authors. In addition, a path length must be set; when the path length between two authors exceeds the set fixed length, traversal of that path stops. In this embodiment, meta-paths of length 3 or 5 are taken as examples, which yields paths such as "A1-P1-V1-P2-A3", "A3-P2-V2-P3-A5" and "A1-P1-A2". In effect, this establishes the patterns author-paper-conference-paper-author and author-paper-author, which express that different papers by different authors may be submitted to the same conference or journal and that one paper may have several authors. These meta-path relationships form the meta-path set.
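The traversal described in Step 1 can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the toy edge list, the convention that the first character of a node name encodes its type, and the function names are all hypothetical. Path length is counted in nodes, which matches the examples "A1-P1-A2" (3) and "A1-P1-V1-P2-A3" (5).

```python
from collections import deque

# Toy DBLP-like network (assumed data): authors A*, papers P*, venues V*.
EDGES = [
    ("A1", "P1"), ("A2", "P1"), ("P1", "V1"), ("P2", "V1"),
    ("A3", "P2"), ("P2", "V2"), ("P3", "V2"), ("A5", "P3"),
]


def build_adjacency(edges):
    """Undirected adjacency sets built from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def node_type(node):
    """Assumed naming convention: the first character encodes the type."""
    return node[0]


def bfs_meta_paths(adj, source_type="A", lengths=(3, 5)):
    """Breadth-first enumeration of simple paths whose first and last nodes
    both have type `source_type` and whose node count is in `lengths`."""
    max_len = max(lengths)
    found = []
    queue = deque([n] for n in sorted(adj) if node_type(n) == source_type)
    while queue:
        path = queue.popleft()
        if len(path) in lengths and node_type(path[-1]) == source_type:
            found.append(tuple(path))
        if len(path) == max_len:
            continue  # stop extending once the fixed length is reached
        for nxt in sorted(adj[path[-1]]):
            if nxt not in path:  # keep paths simple (no repeated nodes)
                queue.append(path + [nxt])
    return found


paths = bfs_meta_paths(build_adjacency(EDGES))
# Type patterns (schemas) of the path instances that were found.
schemas = sorted({"-".join(node_type(n) for n in p) for p in paths})
print(schemas)
```

The sketch filters only on path length and endpoint type, so on a denser graph it would also return patterns such as author-paper-author-paper-author; restricting the result to specific schemas would be an additional design choice.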
Step 2: increasing the number of paths between nodes in the obtained meta-path set;
The obtained path set is expanded, which alleviates the sparsity of conventional meta-paths and enriches the path information and the learnable strategies.
In this step, edges are added between nodes of the same type to increase the number of paths; generally, virtual edges are added between nodes of similar type to increase the number of fixed-length paths, yielding an enlarged meta-path set, as shown in FIG. 3. In FIG. 3, similarity matching is performed on two nodes against a manually set threshold, and the similarity between nodes is matched as follows:
Let the meta-path be
[Formula image BDA0002253587880000031]
where A, P and V represent different types of nodes, and
[Formula image BDA0002253587880000032]
is the path connecting source node P_g and end node P_h. The similarity of the two nodes is calculated by the following two equations:
[Formula image BDA0002253587880000033]
[Formula image BDA0002253587880000034]
where
[Formula image BDA0002253587880000035]
is the similarity of nodes P_g and P_h on the path
[Formula image BDA0002253587880000036]
deg(P_g) is the degree of node P_g in the heterogeneous information network; x(P_g, P_h) is a feature vector of the feature matrix
[Formula image BDA0002253587880000037]
and
[Formula image BDA0002253587880000038]
is the feature matrix of the inter-node paths
[Formula image BDA0002253587880000039]
established in the heterogeneous information network from path weights or statistical information based on the depth-first traversal algorithm, as shown by the following formula:
[Formula image BDA00022535878800000310]
where x_{i,j} represents the number of meta-paths
[Formula image BDA00022535878800000311]
between node i and node j; i = 1, 2, ..., n; j = 1, 2, ..., n; and n is the total number of nodes in the heterogeneous information network.
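The feature matrix defined above, whose entry x_{i,j} counts the meta-paths between node i and node j, can be illustrated with a short sketch. The node list, the hard-coded path instances and the function name are assumptions made for the example; paths are treated as directed from their first to their last node.

```python
def path_count_matrix(nodes, paths):
    """Return x where x[i][j] is the number of given meta-path instances
    that start at nodes[i] and end at nodes[j]."""
    index = {node: k for k, node in enumerate(nodes)}
    n = len(nodes)
    x = [[0] * n for _ in range(n)]
    for path in paths:
        i, j = index[path[0]], index[path[-1]]
        x[i][j] += 1
    return x


# Hypothetical author nodes and meta-path instances between them.
authors = ["A1", "A2", "A3"]
instances = [
    ("A1", "P1", "A2"),
    ("A2", "P1", "A1"),
    ("A1", "P1", "V1", "P2", "A3"),
    ("A1", "P3", "V1", "P2", "A3"),
]
x = path_count_matrix(authors, instances)
for row in x:
    print(row)
```

Row i of the resulting matrix is the count vector of node i, which is the row vector that Step 3 of the description later combines with connection weights.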
In this embodiment, the similarity between two nodes is compared with a threshold, which is set to 0.65. When the similarity is greater than the threshold, the two nodes are connected by a connecting edge; otherwise, no connecting edge is added. The added connecting edges are then used to obtain the meta-paths passing through the newly connected nodes, yielding a new meta-path set.
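The thresholding rule can be sketched as below. The similarity formulas of the patent are only available as images in this text, so the sketch substitutes a stand-in similarity (path count normalized by the two node degrees); that formula, the sample counts and degrees, and all function names are assumptions. Only the threshold of 0.65 and the "greater than" comparison come from the embodiment.

```python
from itertools import combinations

THRESHOLD = 0.65  # threshold value chosen in the embodiment


def stand_in_similarity(count, deg_g, deg_h):
    """ASSUMED similarity: path count normalized by the two node degrees.
    A placeholder, not the formula from the patent figures."""
    total = deg_g + deg_h
    return 0.0 if total == 0 else 2.0 * count / total


def virtual_edges(nodes, counts, degree, threshold=THRESHOLD):
    """Return (g, h, similarity) for each node pair above the threshold."""
    edges = []
    for g, h in combinations(nodes, 2):
        sim = stand_in_similarity(counts.get((g, h), 0), degree[g], degree[h])
        if sim > threshold:  # strictly greater, as in the embodiment
            edges.append((g, h, sim))
    return edges


# Hypothetical same-type nodes with meta-path counts and degrees.
papers = ["P1", "P2", "P3"]
counts = {("P1", "P2"): 2, ("P1", "P3"): 1, ("P2", "P3"): 1}
degree = {"P1": 3, "P2": 3, "P3": 2}
new_edges = virtual_edges(papers, counts, degree)
print(new_edges)
```

With these sample values only the pair (P1, P2) exceeds the threshold, so a single virtual edge is added; the other pairs stay unconnected.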
Step 3: determining a feature vector of each node in the heterogeneous information network according to the obtained extended meta-path set with the increased number of meta-paths;
The feature matrix of the meta-paths between nodes is updated according to the generated extended meta-path set, and the row of the feature matrix corresponding to a node is taken as the row vector of that node; weight information is added for the nodes connected to that node to represent the importance of each connection. The weight feature vector of the node is obtained from the weight matrix of the heterogeneous information network, giving the weight-based feature vector of each node, as shown in the following formula:
[Formula image BDA00022535878800000312]
where X_i is the weight-based feature vector of the i-th node in the heterogeneous information network, W_i is the weight vector formed by the connection weights between that node and the other nodes, and
[Formula image BDA00022535878800000313]
is the row vector associated with the node in the feature matrix of the meta-path; multiplying W_i^T by
[Formula image BDA00022535878800000314]
gives the feature vector of the node.
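As a rough illustration of the weighting in Step 3, the sketch below scales each entry of a node's meta-path count row by the corresponding connection weight. The formula itself is only available as an image here, so reading the product of W_i^T and the row vector as an element-wise product is an assumption, as are the sample values and the function name.

```python
def weighted_feature_vector(row, weights):
    """Element-wise product of a node's meta-path count row and its
    connection weights (ASSUMED reading of the W_i^T product)."""
    if len(row) != len(weights):
        raise ValueError("row and weights must have the same length")
    return [w * x for w, x in zip(weights, row)]


row_i = [1, 0, 2]             # hypothetical meta-path counts from node i
weights_i = [0.5, 1.0, 0.25]  # hypothetical connection weights of node i
x_i = weighted_feature_vector(row_i, weights_i)
print(x_i)
```

Under this reading, a node reached by many meta-paths but connected with a low weight contributes less to the feature vector than its raw count would suggest.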
Step 4: the heterogeneous information network is represented in vector form using the feature vector of each node obtained from the meta-paths. Part of the nodes are labeled, dividing them into fixed classes; the feature vectors of the labeled nodes form the training set, and the remaining unlabeled nodes form the test set. The training set is input into a convolutional neural network model, and a softmax classifier is used to classify nodes of the same type in the heterogeneous information network, giving a trained model; the test set is then input into the trained model to obtain the final classification result. The convolutional neural network model comprises three convolutional layers and two pooling layers.
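The network shape named in Step 4 (three convolutional layers, two pooling layers, softmax output) can be sketched as an untrained forward pass in plain Python. Everything beyond those layer counts is an assumption made for illustration: the use of 1-D convolutions over the node feature vector, the layer ordering, the kernel width, the channel counts, the final linear layer before softmax, and the random weights. Training is not shown.

```python
import math
import random


def conv1d(signal, kernels, bias, relu=True):
    """Valid 1-D convolution. signal[c] is one input channel;
    kernels[o][c] holds the taps linking input channel c to output o."""
    width = len(kernels[0][0])
    out_len = len(signal[0]) - width + 1
    output = []
    for taps_per_input, b in zip(kernels, bias):
        channel = []
        for t in range(out_len):
            acc = b
            for c, taps in enumerate(taps_per_input):
                acc += sum(w * signal[c][t + d] for d, w in enumerate(taps))
            channel.append(max(0.0, acc) if relu else acc)
        output.append(channel)
    return output


def max_pool(signal, size=2):
    """Non-overlapping max pooling applied to every channel."""
    return [[max(ch[t:t + size]) for t in range(0, len(ch) - size + 1, size)]
            for ch in signal]


def softmax(scores):
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def init_params(rng, n_features, n_classes, width=3):
    """Random, untrained weights; channel counts are assumptions."""
    def kernels(n_out, n_in):
        return [[[rng.uniform(-0.5, 0.5) for _ in range(width)]
                 for _ in range(n_in)] for _ in range(n_out)]
    length = (n_features - width + 1) // 2  # after conv 1 + pool 1
    length = (length - width + 1) // 2      # after conv 2 + pool 2
    length = length - width + 1             # after conv 3
    flat = 4 * length
    return {
        "conv1": (kernels(4, 1), [0.0] * 4),
        "conv2": (kernels(8, 4), [0.0] * 8),
        "conv3": (kernels(4, 8), [0.0] * 4),
        "dense": ([[rng.uniform(-0.5, 0.5) for _ in range(flat)]
                   for _ in range(n_classes)], [0.0] * n_classes),
    }


def forward(feature_vector, params):
    x = [list(feature_vector)]                 # one input channel
    x = max_pool(conv1d(x, *params["conv1"]))  # conv 1 + pool 1
    x = max_pool(conv1d(x, *params["conv2"]))  # conv 2 + pool 2
    x = conv1d(x, *params["conv3"])            # conv 3
    flat = [v for ch in x for v in ch]
    weights, bias = params["dense"]
    scores = [b + sum(w * v for w, v in zip(row, flat))
              for row, b in zip(weights, bias)]
    return softmax(scores)                     # class probabilities


rng = random.Random(0)
features = [rng.random() for _ in range(32)]   # stand-in node feature vector
probs = forward(features, init_params(rng, n_features=32, n_classes=3))
print(probs)
```

In practice a deep learning framework would supply these layers along with training; the plain-Python version only makes the data flow from a node feature vector to class probabilities explicit.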
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications or substitutions do not depart from the spirit of the invention, which is defined by the claims.

Claims (1)

1. A node classification method of heterogeneous information network based on meta-path is characterized in that: the specific method comprises the following steps:
acquiring all meta paths in a heterogeneous information network to obtain a meta path set;
increasing the number of meta-paths between the nodes in the obtained meta-path set to obtain an extended meta-path set;
determining a feature vector of each node in the heterogeneous information network according to the obtained extended meta-path set with the increased meta-path number;
representing the heterogeneous information network into a vector form according to the feature vector of each node obtained by the meta-path, performing label processing on part of node features, taking the feature vector of the node containing a label as a training set, taking the rest of nodes without the label as a test set, inputting the training set into a convolutional neural network model, classifying the same type of nodes of the heterogeneous information network by using a softmax classifier to obtain a trained model, and then inputting the test set into the trained model to obtain a final classification result; the convolutional neural network model comprises three convolutional layers and two pooling layers;
increasing the number of meta-paths between the nodes in the obtained meta-path set comprises: adding virtual connecting edges and weights between nodes of similar types, determining the similarity of two nodes, and comparing the similarity with a threshold to determine whether the two nodes can be connected by a connecting edge, thereby increasing the number of meta-paths between the nodes;
the meta-paths in the heterogeneous information network are obtained through a breadth-first traversal algorithm;
obtaining the meta-paths through the breadth-first traversal algorithm further comprises: setting a fixed path length so that the source node and the tail node in a meta-path are nodes of the same type, the set path length being 3 or 5, to obtain the meta-path set.
CN201911043848.1A 2019-10-30 2019-10-30 Node classification method of heterogeneous information network based on meta-path Active CN110830291B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911043848.1A CN110830291B (en) 2019-10-30 2019-10-30 Node classification method of heterogeneous information network based on meta-path

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911043848.1A CN110830291B (en) 2019-10-30 2019-10-30 Node classification method of heterogeneous information network based on meta-path

Publications (2)

Publication Number Publication Date
CN110830291A CN110830291A (en) 2020-02-21
CN110830291B true CN110830291B (en) 2023-01-10

Family

ID=69551222

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911043848.1A Active CN110830291B (en) 2019-10-30 2019-10-30 Node classification method of heterogeneous information network based on meta-path

Country Status (1)

Country Link
CN (1) CN110830291B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112232492B (en) * 2020-10-30 2022-04-12 北京邮电大学 Decoupling-based heterogeneous network embedding method and device and electronic equipment
CN113869461B (en) * 2021-07-21 2024-03-12 中国人民解放军国防科技大学 Author migration classification method for scientific cooperation heterogeneous network
CN115314398B (en) * 2022-09-29 2022-12-23 南昌航空大学 Method for evaluating key nodes of heterogeneous information network

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107145527A (en) * 2017-04-14 2017-09-08 东南大学 Meta-path-based link prediction method in aligned heterogeneous social networks
CN109558494A (en) * 2018-10-29 2019-04-02 中国科学院计算机网络信息中心 Scholar name disambiguation method based on heterogeneous network embedding

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10656979B2 (en) * 2016-03-31 2020-05-19 International Business Machines Corporation Structural and temporal semantics heterogeneous information network (HIN) for process trace clustering

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
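刘京旋 (Liu Jingxuan). 基于元路径的异质网分类与计算方法研究 [Research on meta-path-based classification and computation methods for heterogeneous networks]. 《中国优秀博硕士学位论文全文数据库(硕士)基础科学辑》 [China Excellent Doctoral and Master's Theses Full-text Database (Master's), Basic Sciences]. 2019-05-15, (No. 5), pp. 9-37. *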

Also Published As

Publication number Publication date
CN110830291A (en) 2020-02-21

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant