CN110830291B - Node classification method of heterogeneous information network based on meta-path - Google Patents
Node classification method of heterogeneous information network based on meta-path
- Publication number
- CN110830291B (application number CN201911043848.1A)
- Authority
- CN
- China
- Prior art keywords
- meta
- path
- nodes
- node
- heterogeneous information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
- H04L41/142—Network analysis or design using statistical or mathematical methods
- H04L41/145—Network analysis or design involving simulating, designing, planning or modelling of a network
Abstract
The invention provides a node classification method of a heterogeneous information network based on a meta path, and relates to the technical field of deep learning and network embedding. The method comprises the steps of firstly, acquiring all meta-paths in a heterogeneous information network to obtain a meta-path set, and increasing the number of meta-paths between nodes in the obtained meta-path set; then determining a feature vector of each meta path; and finally, acquiring a characteristic vector representation mode of the nodes in the heterogeneous information network according to the characteristic vector obtained by the meta-path, and classifying the nodes in the meta-path by using a convolutional neural network. The method of the invention obtains the paths between the nodes by using the meta-path, simplifies the training process of the heterogeneous information network to a certain extent, and improves the accuracy of the final classification result.
Description
Technical Field
The invention relates to the technical field of deep learning and network embedding, in particular to a node classification method of a heterogeneous information network based on a meta-path.
Background
With the rapid development of social networks and knowledge networks, network model structures are receiving increasing attention. A heterogeneous information network is a newer approach to network modeling and analysis: it has a more complex structure, carries richer information, and can describe a problem with more features. Existing heterogeneous information networks mainly include the conference-paper-author data set (DBLP), merchant-commodity-purchaser data sets, and the like. In a heterogeneous information network, the heterogeneity and complexity of nodes and connections make node classification more difficult. Methods designed for homogeneous networks cannot be applied directly to classify the nodes of such a network; doing so loses part of the information in the network, and the integrity of the network cannot be guaranteed. Therefore, how to consider and quantitatively analyze different types of nodes and connections simultaneously, and to establish a more accurate heterogeneous information network, has become a difficult point of research.
Disclosure of Invention
The technical problem to be solved by the present invention is to provide a node classification method for heterogeneous information networks based on meta-paths, aiming at the defects of the prior art, so as to realize the classification of nodes in the heterogeneous information networks.
In order to solve the technical problems, the technical scheme adopted by the invention is as follows: a node classification method of heterogeneous information network based on meta-path includes:
acquiring all meta paths in a heterogeneous information network to obtain a meta path set;
increasing the number of meta-paths between the nodes in the obtained meta-path set to obtain an extended meta-path set;
determining a feature vector of each meta path according to the obtained extended meta path set with the increased meta path number;
acquiring a characteristic vector representation mode of nodes in a heterogeneous information network according to the characteristic vector obtained by the meta-path, and classifying the nodes in the meta-path by using a convolutional neural network;
the meta-path in the heterogeneous information network is acquired through a breadth-first traversal algorithm;
the method for obtaining the meta-path by the breadth-first traversal algorithm further comprises the following steps: setting a fixed path length to enable a source node and a last node in the meta-path to be nodes of the same type, wherein the set path length is 3 or 5, and obtaining a meta-path set;
the number of meta-paths between the nodes is increased by adding virtual connecting edges and weights between nodes of similar types to determine the similarity of two nodes, and comparing the similarity with a threshold value to determine whether the two nodes can be connected to form a connecting edge.
The beneficial effects of the above technical scheme are as follows: the node classification method of the heterogeneous information network based on the meta-path provided by the invention obtains the paths between nodes in the simple manner afforded by meta-paths, which simplifies the network training process to a certain extent and improves the accuracy of the final classification result.
Drawings
Fig. 1 is a flowchart of a node classification method for a heterogeneous information network based on a meta-path according to an embodiment of the present invention;
Fig. 2 is a diagram illustrating meta-paths of different types in a meta-path set according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of increasing the number of meta-paths between nodes according to an embodiment of the present invention.
Detailed Description
The following detailed description of embodiments of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention, but are not intended to limit the scope of the invention.
A node classification method of heterogeneous information network based on meta path, as shown in fig. 1, includes the following steps:
Step 1: acquiring all meta-paths in the heterogeneous information network to obtain a meta-path set;
In this embodiment, the heterogeneous information network is the existing DBLP (DataBase systems and Logic Programming) network. DBLP is an English-language bibliography system for the computer science field, organized around research results and their authors. As shown in Fig. 2, it contains three types of nodes: periodical and conference nodes V, author nodes A, and paper nodes P; there is no connecting edge between nodes of the same type. Authors write papers and submit them to conferences and periodicals. In this embodiment, the author nodes are classified according to the papers written by the authors and the conferences or periodicals to which they were submitted, thereby dividing the authors into classes.
Assume a heterogeneous information network G = <A, P, V, E, W>, where A denotes the set of authors, P denotes the set of papers, V denotes the set of conferences or periodicals, E denotes the set of all edges, and W denotes the weights of the edges in the heterogeneous information network. The meta-paths are obtained by breadth-first traversal: the source node is taken from the author set, the end node is also taken from the author set, and the nodes passed between the source node and the end node are regarded as a path; the breadth-first traversal algorithm is used to traverse the paths between authors. In addition, a path length must be set: when the length of a path between two authors exceeds the set fixed length, traversal of that path stops. This embodiment takes meta-paths with an actual length of 3 or 5 as an example, so that paths such as "A1-P1-V1-P2-A3", "A3-P2-V2-P3-A5" and "A1-P1-A2" can be obtained. This in effect establishes the patterns author-paper-conference-paper-author and author-paper-author, which express that different papers by different authors may be submitted to the same conference or periodical, and that the same paper may have several authors. Establishing these meta-path relationships yields the meta-path set.
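The traversal described in Step 1 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the toy edge list, the function names, the convention that the first letter of a node name encodes its type, and the choice to count path length in nodes (so that "A1-P1-A2" has length 3) are all assumptions made for the example.

```python
from collections import deque

# Hypothetical toy DBLP-style network. The first letter of each node name
# encodes its type: A = author, P = paper, V = conference or periodical.
EDGES = [
    ("A1", "P1"), ("A2", "P1"), ("A3", "P2"), ("A5", "P3"),
    ("P1", "V1"), ("P2", "V1"), ("P2", "V2"), ("P3", "V2"),
]

def build_adjacency(edges):
    """Build an undirected adjacency list from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj

def meta_paths_from(adj, source, lengths=(3, 5)):
    """Breadth-first enumeration of simple paths that start at `source`,
    contain 3 or 5 nodes, and end at a node of the same type as `source`."""
    max_len = max(lengths)
    found = []
    queue = deque([[source]])
    while queue:
        path = queue.popleft()
        if len(path) in lengths and path[-1][0] == source[0]:
            found.append(path)
        if len(path) == max_len:
            continue  # stop extending once the fixed length is reached
        for nxt in adj[path[-1]]:
            if nxt not in path:  # keep paths simple (no repeated node)
                queue.append(path + [nxt])
    return found

adj = build_adjacency(EDGES)
for p in meta_paths_from(adj, "A1"):
    print("-".join(p))
```

On this toy graph, starting from A1 yields the author-paper-author path A1-P1-A2 and the author-paper-conference-paper-author path A1-P1-V1-P2-A3, matching the pattern of the examples given in the embodiment.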
Step 2: increasing the number of paths between nodes in the obtained meta-path set;
The obtained path set is expanded, which alleviates the sparsity of conventional meta-paths and enriches the path information and the learnable strategies.
In this step, edges are added between nodes of the same type to increase the number of paths; in general, virtual edges are added between nodes of similar type to increase the number of fixed-length paths, giving an enlarged meta-path set, as shown in Fig. 3. As Fig. 3 shows, similarity matching is performed on two nodes using a manually set threshold. The similarity matching method for nodes is as follows:
Let a meta-path be given [meta-path symbol not reproduced in the source text], where A, P and V represent different types of nodes and the meta-path connects a source node $P_g$ and an end node $P_h$. The similarity of the two nodes is calculated by two equations [equations not reproduced in the source text],
where the computed quantity is the similarity of nodes $P_g$ and $P_h$ on the meta-path; $\deg(P_g)$ is the degree of node $P_g$ in the heterogeneous information network; $x(P_g, P_h)$ is a feature vector of the feature matrix; and the feature matrix of the inter-node paths in the heterogeneous information network is established from path weights or statistical information using a depth-first traversal algorithm. The feature matrix has the following form:
$$\begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,n} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,n} \end{pmatrix}$$
where $x_{i,j}$ denotes the number of meta-paths between node $i$ and node $j$, $i = 1, 2, \ldots, n$, $j = 1, 2, \ldots, n$, and $n$ is the total number of nodes in the heterogeneous information network.
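As an illustration of the feature matrix whose entry for nodes i and j is the number of meta-paths between them, the following sketch counts meta-path endpoints. It is only a sketch under stated assumptions: the function name `count_matrix` and the sample paths are hypothetical, each listed path is treated as undirected and listed once, and the matrix is restricted to author nodes for brevity, whereas the text indexes all n nodes of the network.

```python
def count_matrix(nodes, paths):
    """Return a symmetric matrix whose (i, j) entry is the number of
    listed meta-paths whose two end nodes are nodes[i] and nodes[j]."""
    index = {name: k for k, name in enumerate(nodes)}
    n = len(nodes)
    x = [[0] * n for _ in range(n)]
    for path in paths:
        i, j = index[path[0]], index[path[-1]]
        x[i][j] += 1
        if i != j:
            x[j][i] += 1  # each path is undirected, so mirror the count
    return x

# Hypothetical meta-path set, each path listed once.
author_nodes = ["A1", "A2", "A3", "A5"]
sample_paths = [
    ["A1", "P1", "A2"],
    ["A1", "P1", "V1", "P2", "A3"],
    ["A2", "P1", "V1", "P2", "A3"],
    ["A3", "P2", "V2", "P3", "A5"],
]

matrix = count_matrix(author_nodes, sample_paths)
for name, row in zip(author_nodes, matrix):
    print(name, row)
```

Each row of the resulting matrix is the row vector of the corresponding node, which is the quantity used later when the node feature vectors are formed.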
In this embodiment, the similarity between two nodes is compared with a threshold, and the threshold is selected to be 0.65. When the similarity value is larger than the threshold, the two nodes are connected to form a connecting edge; otherwise, no connecting edge is added. The added connecting edges are then used to acquire the meta-paths that pass through the newly connected nodes, which yields a new meta-path set.
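The thresholding rule can be sketched as below. Because the similarity equations themselves are not reproduced in this text, the sketch takes precomputed similarity scores as input; the scores, node names and the function name `add_virtual_edges` are hypothetical, and only the threshold value 0.65 and the "strictly larger than the threshold" rule come from the embodiment.

```python
THRESHOLD = 0.65  # threshold value chosen in the embodiment

def add_virtual_edges(edges, similarities, threshold=THRESHOLD):
    """Return a new edge set that also contains a virtual edge for every
    node pair whose similarity is strictly larger than the threshold."""
    extended = {tuple(sorted(e)) for e in edges}
    for (u, v), score in similarities.items():
        if score > threshold:
            extended.add(tuple(sorted((u, v))))
    return extended

# Hypothetical existing edges and precomputed similarity scores
# between paper nodes (nodes of the same type).
existing = {("P1", "V1"), ("P2", "V1"), ("P3", "V2")}
scores = {("P1", "P2"): 0.80, ("P1", "P3"): 0.40, ("P2", "P3"): 0.65}

extended = add_virtual_edges(existing, scores)
print(sorted(extended))
```

Only the pair (P1, P2) gains a virtual edge here: 0.40 is below the threshold, and 0.65 is not strictly larger than it. Re-running the meta-path traversal on the extended edge set would then produce the enlarged meta-path set.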
Step 3: determining a feature vector of each node in the heterogeneous information network according to the obtained extended meta-path set with the increased number of meta-paths;
The feature matrix of the meta-paths between nodes is updated according to the generated extended meta-path set, and the row of the feature matrix corresponding to a node is taken as the row vector of that node. Weight information is added for the nodes connected to that node to represent the importance of each connection. A weight vector of the node is obtained from the weight matrix of the heterogeneous information network, from which the weight-based feature vector of each node is obtained by a formula [formula not reproduced in the source text],
where $X_i$ is the weight-based feature vector of the $i$-th node in the heterogeneous information network, $W_i$ is the weight vector formed by the connection weights between that node and the other nodes, and the remaining factor is the row vector associated with the node in the feature matrix of the meta-paths; $W_i^{T}$ and this row vector are multiplied to obtain the feature vector of the node.
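One possible reading of this weighting step is sketched below. The text states only that the transposed weight vector and the node's row vector are multiplied to give a feature vector; since the result must itself be a vector, this sketch assumes an element-wise product, which is an interpretation and not something the source confirms. The function name and sample values are hypothetical.

```python
def weighted_feature_vector(weights, row):
    """Element-wise product of a node's connection weights and its row of
    meta-path counts (assumed interpretation of the weighting step)."""
    if len(weights) != len(row):
        raise ValueError("weights and row must have the same length")
    return [w * x for w, x in zip(weights, row)]

# Hypothetical row of meta-path counts for one node, and the weights of
# its connections to the same four nodes.
row_vector = [0, 1, 1, 0]
weight_vector = [0.0, 0.5, 2.0, 1.0]

feature = weighted_feature_vector(weight_vector, row_vector)
print(feature)  # [0.0, 0.5, 2.0, 0.0]
```

Under this reading, a connection with many meta-paths but a low weight contributes little to the node's feature vector, and a connection with no meta-path contributes nothing regardless of its weight.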
Step 4: representing the heterogeneous information network in vector form according to the feature vector of each node obtained from the meta-paths; labeling part of the nodes and dividing them into fixed classes; taking the feature vectors of the labeled nodes as a training set and the remaining unlabeled nodes as a test set; inputting the training set into a convolutional neural network model and classifying nodes of the same type in the heterogeneous information network with a softmax classifier to obtain a trained model; and then inputting the test set into the trained model to obtain the final classification result. The convolutional neural network model comprises three convolutional layers and two pooling layers.
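The layer arrangement named in Step 4 (three convolutional layers, two pooling layers, softmax output) can be illustrated with a small forward pass in plain Python. This is an untrained sketch under many assumptions not found in the source: one-dimensional single-channel convolutions, kernel size 3, ReLU activations, max pooling of size 2, a final dense layer, random weights, and an input length of 24. Training, for example by gradient descent on the labeled nodes, is omitted.

```python
import math
import random

def conv1d_relu(signal, kernel):
    """Valid 1-D convolution (cross-correlation) followed by ReLU."""
    k = len(kernel)
    return [max(0.0, sum(signal[s + t] * kernel[t] for t in range(k)))
            for s in range(len(signal) - k + 1)]

def max_pool1d(signal, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(signal[i:i + size])
            for i in range(0, len(signal) - size + 1, size)]

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def init_params(input_len, num_classes, rng, kernel_size=3):
    """Random weights for conv-pool-conv-pool-conv-dense (assumed layout)."""
    n = input_len
    for layer in ("conv", "pool", "conv", "pool", "conv"):
        n = n - kernel_size + 1 if layer == "conv" else n // 2
    if n < 1:
        raise ValueError("input too short for this layer stack")
    kernels = [[rng.uniform(-1, 1) for _ in range(kernel_size)]
               for _ in range(3)]
    dense = [[rng.uniform(-1, 1) for _ in range(n)]
             for _ in range(num_classes)]
    return {"kernels": kernels, "dense": dense}

def forward(x, params):
    """Three convolutional layers, two pooling layers, softmax output."""
    k1, k2, k3 = params["kernels"]
    h = max_pool1d(conv1d_relu(x, k1))
    h = max_pool1d(conv1d_relu(h, k2))
    h = conv1d_relu(h, k3)
    logits = [sum(w * v for w, v in zip(row, h)) for row in params["dense"]]
    return softmax(logits)

rng = random.Random(0)
params = init_params(input_len=24, num_classes=3, rng=rng)
node_features = [rng.random() for _ in range(24)]  # hypothetical feature vector
probs = forward(node_features, params)
print(probs)
```

The output is a probability distribution over the (here three) node classes; the predicted class of a node would be the index of the largest entry. In practice a deep learning framework would replace these hand-written layers.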
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some or all of the technical features may be equivalently replaced; such modifications or substitutions do not depart from the spirit of the invention, which is defined by the claims.
Claims (1)
1. A node classification method of heterogeneous information network based on meta-path is characterized in that: the specific method comprises the following steps:
acquiring all meta paths in a heterogeneous information network to obtain a meta path set;
increasing the number of meta-paths between the nodes in the obtained meta-path set to obtain an extended meta-path set;
determining a feature vector of each node in the heterogeneous information network according to the obtained extended meta-path set with the increased meta-path number;
representing the heterogeneous information network into a vector form according to the feature vector of each node obtained by the meta-path, performing label processing on part of node features, taking the feature vector of the node containing a label as a training set, taking the rest of nodes without the label as a test set, inputting the training set into a convolutional neural network model, classifying the same type of nodes of the heterogeneous information network by using a softmax classifier to obtain a trained model, and then inputting the test set into the trained model to obtain a final classification result; the convolutional neural network model comprises three convolutional layers and two pooling layers;
the number of meta-paths between the nodes in the obtained meta-path set is increased as follows: virtual connecting edges and weights are added between nodes of similar types to determine the similarity of two nodes, and the similarity is then compared with a threshold value to determine whether the two nodes can be connected to form a connecting edge, so that the number of meta-paths between the nodes is increased;
the meta path in the heterogeneous information network is obtained through a breadth-first traversal algorithm;
the method for obtaining the meta-path by the breadth-first traversal algorithm further comprises the following steps: and setting the fixed path length to ensure that the source node and the tail node in the meta-path are the same type of node, and the set path length is 3 or 5 to obtain a meta-path set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911043848.1A CN110830291B (en) | 2019-10-30 | 2019-10-30 | Node classification method of heterogeneous information network based on meta-path |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911043848.1A CN110830291B (en) | 2019-10-30 | 2019-10-30 | Node classification method of heterogeneous information network based on meta-path |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110830291A (en) | 2020-02-21 |
CN110830291B (en) | 2023-01-10 |
Family
ID=69551222
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911043848.1A Active CN110830291B (en) | 2019-10-30 | 2019-10-30 | Node classification method of heterogeneous information network based on meta-path |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110830291B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112232492B (en) * | 2020-10-30 | 2022-04-12 | 北京邮电大学 | Decoupling-based heterogeneous network embedding method and device and electronic equipment |
CN113869461B (en) * | 2021-07-21 | 2024-03-12 | 中国人民解放军国防科技大学 | Author migration classification method for scientific cooperation heterogeneous network |
CN115314398B (en) * | 2022-09-29 | 2022-12-23 | 南昌航空大学 | Method for evaluating key nodes of heterogeneous information network |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107145527A (en) * | 2017-04-14 | 2017-09-08 | 东南大学 | Link prediction method based on meta-paths in aligned heterogeneous social networks |
CN109558494A (en) * | 2018-10-29 | 2019-04-02 | 中国科学院计算机网络信息中心 | Scholar name disambiguation method based on heterogeneous network embedding |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10656979B2 (en) * | 2016-03-31 | 2020-05-19 | International Business Machines Corporation | Structural and temporal semantics heterogeneous information network (HIN) for process trace clustering |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107145527A (en) * | 2017-04-14 | 2017-09-08 | 东南大学 | Link prediction method based on meta-paths in aligned heterogeneous social networks |
CN109558494A (en) * | 2018-10-29 | 2019-04-02 | 中国科学院计算机网络信息中心 | Scholar name disambiguation method based on heterogeneous network embedding |
Non-Patent Citations (2)
Title |
---|
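Liu Jingxuan. Research on classification and computation methods for heterogeneous networks based on meta-paths. China Excellent Doctoral and Master's Dissertations Full-text Database (Master's), Basic Sciences. 2019, (No. 5), pp. 9-37. * |
Research on classification and computation methods for heterogeneous networks based on meta-paths; Liu Jingxuan; China Excellent Doctoral and Master's Dissertations Full-text Database (Master's), Basic Sciences; 2019-05-15 (No. 5); pp. 9-37 * |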
Also Published As
Publication number | Publication date |
---|---|
CN110830291A (en) | 2020-02-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109615014B (en) | | KL divergence optimization-based 3D object data classification system and method |
CN110830291B (en) | | Node classification method of heterogeneous information network based on meta-path |
CN112990280B (en) | | Class increment classification method, system, device and medium for image big data |
CN111488734A (en) | | Emotional feature representation learning system and method based on global interaction and syntactic dependency |
CN112417157B (en) | | Emotion classification method of text attribute words based on deep learning network |
CN108932950A (en) | | It is a kind of based on the tag amplified sound scenery recognition methods merged with multifrequency spectrogram |
CN112685504B (en) | | Production process-oriented distributed migration chart learning method |
WO2022042297A1 (en) | | Text clustering method, apparatus, electronic device, and storage medium |
CN110737805A (en) | | Method and device for processing graph model data and terminal equipment |
CN113743474A (en) | | Digital picture classification method and system based on cooperative semi-supervised convolutional neural network |
CN114821299B (en) | | Remote sensing image change detection method |
CN110289987B (en) | | Multi-agent system network anti-attack capability assessment method based on characterization learning |
WO2020147259A1 (en) | | User portait method and apparatus, readable storage medium, and terminal device |
CN105809200B (en) | | Method and device for autonomously extracting image semantic information in bioauthentication mode |
CN114519107A (en) | | Knowledge graph fusion method combining entity relationship representation |
CN111984790B (en) | | Entity relation extraction method |
TWI452477B (en) | | Multi-label text categorization based on fuzzy similarity and k nearest neighbors |
CN114373093A (en) | | Fine-grained image classification method based on direct-push type semi-supervised deep learning |
CN112286996A (en) | | Node embedding method based on network link and node attribute information |
CN116578708A (en) | | Paper data name disambiguation algorithm based on graph neural network |
WO2023273171A1 (en) | | Image processing method and apparatus, device, and storage medium |
CN114265954B (en) | | Graph representation learning method based on position and structure information |
CN114611668A (en) | | Vector representation learning method and system based on heterogeneous information network random walk |
CN109829500B (en) | | Position composition and automatic clustering method |
CN114548297A (en) | | Data classification method, device, equipment and medium based on domain self-adaption |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||