CN111598223A - Network embedding method based on attribute and structure deep fusion and model thereof - Google Patents
- Publication number
- CN111598223A CN111598223A CN202010410196.7A CN202010410196A CN111598223A CN 111598223 A CN111598223 A CN 111598223A CN 202010410196 A CN202010410196 A CN 202010410196A CN 111598223 A CN111598223 A CN 111598223A
- Authority
- CN
- China
- Prior art keywords
- node
- attribute
- layer
- network
- node attribute
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Abstract
The invention discloses a network embedding method based on attribute and structure deep fusion, which comprises the following steps: S1, obtaining the node attribute features of a reconstructed encoding layer; S2, obtaining a node attribute information sequence carrying the node attribute features; S3, translating the node attribute information sequence into a node identity sequence to obtain node embedding vector representations that retain the network structure and node attribute information of the original network. Also disclosed is a network embedding model based on attribute and structure deep fusion, comprising: a multi-modal attribute perception module, an attribute embedding layer and a multi-hop structure perception module, wherein the multi-modal attribute perception module is connected with the multi-hop structure perception module through the attribute embedding layer. The invention captures not only the similarity between the low-order structure and node attributes but also the similarity between high-order structural semantics and node attributes, so that the finally obtained node embedding vectors represent deeply fused network structure and node attribute information.
Description
Technical Field
The invention relates to the technical field of network embedding, in particular to a network embedding method based on attribute and structure deep fusion and a model thereof.
Background
The field of data mining has long attracted broad attention, and with the recent emergence of large amounts of social network data, such as microblog networks and transportation networks, the research and analysis of social networks has become particularly important. Because network data are large-scale, highly non-linear and rich in node attribute information, traditional methods that analyse the network adjacency matrix directly easily incur prohibitive computing and storage costs, which prevents effective data analysis tasks such as network node classification and clustering, recommendation, and visualization. Network embedding technology was therefore developed to address this problem: it aims to find a suitable mapping function that maps large-scale, high-dimensional network data into a low-dimensional vector space while preserving the inherent properties of the original network, i.e. to learn embedding vector representations for all network nodes. These node embedding vector representations not only retain the structural and attribute characteristics of the original network effectively, but can also be used directly by machine learning techniques, thereby facilitating further research and analysis of network data.
Considerable good work already exists in network embedding research. For capturing network structure information, the DeepWalk method proposes obtaining node sequences by truncated random walks and then learning embedding vector representations of network nodes with the skip-gram model from natural language processing. The LINE method models the first-order and second-order similarity of the network structure: two directly connected nodes should have similar vector representations, and even two unconnected nodes should have similar vector representations if they share a common neighbor; a targeted loss function is designed accordingly to obtain effective node embedding vector representations. To address the high non-linearity of network data, the SDNE method uses a deep auto-encoder to mine network structure information while preserving the high non-linearity of the network structure. However, in real network data, network nodes carry abundant attribute information, so methods that jointly capture network structure and node attribute information have also been proposed: the SNE method learns the structure and attribute similarity of nodes in a social network; the ANRL method designs an attribute-aware skip-gram module based on an auto-encoder to capture the similarity of network structures and node attributes; and the STNE method proposes a self-translation network embedding model that deeply fuses network structure and node attribute information to obtain effective node embedding vector representations.
All of the above network embedding methods, although performing well on some network analysis tasks, have the following problems. First, most methods model network-structure similarity and node-attribute similarity separately and only merge them when learning the final node embedding vector representations. Second, real networks exhibit high-order structural semantic similarity: even if two nodes are far apart, the structural semantics of their respective neighborhoods may be similar, making the two nodes semantically similar as well. Third, most existing methods only consider how to merge network structure and node attribute information in an overall framework, ignoring the similarity of node-attribute semantic relations within low-order structures and high-order structural semantics.
In summary, existing network embedding methods have not yet solved the above problems. Therefore, how to provide a network embedding method and model based on attribute and structure deep fusion that effectively overcomes these shortcomings is a problem that needs to be solved by those skilled in the art.
Disclosure of Invention
In view of the above, the invention provides a network embedding method and model based on attribute and structure deep fusion. The network node embedding vector representations learned by the method and model can effectively retain the rich attributes of the original network and can be used effectively in numerous network data analysis tasks to improve the accuracy of network analysis.
In order to achieve the purpose, the invention adopts the following technical scheme:
a network embedding method based on attribute and structure deep fusion comprises the following steps:
s1, fusing the node attribute characteristics of all neighbor nodes of the current node through a heuristic method to obtain fused node attribute characteristics, and encoding and decoding the node attribute characteristics of each node in the network with a deep auto-encoder, so as to reconstruct the decoded node attribute characteristics against the fused node attribute characteristics and obtain the node attribute characteristics of the reconstructed encoding layer;
s2, fusing the node attribute characteristics of the reconstructed coding layer with a node sequence generated by random walk to obtain a node attribute information sequence with the node attribute characteristics;
s3, inputting the node attribute information sequence into a self-translation frame, and constructing a self-translation process from the node attribute information sequence to a node identity sequence to obtain node embedded vector representation capable of retaining the network structure and the node attribute information of the original network;
and S4, verifying through the real network data set.
Preferably, in S1, the fused node attribute features are obtained as follows: each node's attribute features are updated to the element-wise median of the attribute features of all neighbor nodes of the current node.
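The median-fusion heuristic described above can be sketched as follows. This is a minimal illustration only; the function and variable names are not from the patent, and the handling of isolated nodes is an assumption.

```python
# Hedged sketch of the S1 heuristic: each node's fused attribute vector is
# the element-wise median of its neighbors' attribute vectors.
from statistics import median

def fuse_neighbor_attributes(adjacency, attrs):
    """adjacency: {node: [neighbor ids]}, attrs: {node: [float, ...]}."""
    fused = {}
    for node, neighbors in adjacency.items():
        if not neighbors:
            fused[node] = list(attrs[node])   # assumption: isolated nodes keep their own attributes
            continue
        dim = len(attrs[node])
        fused[node] = [
            median(attrs[n][d] for n in neighbors)   # element-wise median over neighbors
            for d in range(dim)
        ]
    return fused

adj = {0: [1, 2], 1: [0], 2: [0]}
feats = {0: [1.0, 0.0], 1: [0.0, 2.0], 2: [4.0, 2.0]}
print(fuse_neighbor_attributes(adj, feats)[0])  # → [2.0, 2.0]
```

The fused vectors are then used as the reconstruction target of the auto-encoder, rather than each node's own raw attributes.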
Preferably, in S3, the self-translation process from the node attribute information sequence to the node identity sequence specifically includes the following steps:
and coding the node attribute information sequence, introducing an attention mechanism to perform weight distribution on the feature information learned after coding, further decoding, and finally realizing the self-translation process from the whole node attribute information sequence to the node identity sequence.
Preferably, the hidden-layer representation output after encoding the node attribute information sequence serves as the network node embedding vector representation.
A network embedding model based on attribute and structure depth fusion, comprising: the system comprises a multi-mode attribute sensing module, an attribute embedding layer and a multi-hop structure sensing module; the multi-mode attribute sensing module is connected with the multi-hop structure sensing module through the attribute embedding layer;
the multi-mode attribute sensing module is used for fusing the node attribute characteristics of all neighbor nodes of the current node by a heuristic method to obtain fused node attribute characteristics, encoding and decoding the node attribute characteristics of each node in the network with a deep auto-encoder, and further reconstructing the decoded node attribute characteristics against the fused node attribute characteristics to obtain the node attribute characteristics of the reconstructed encoding layer;
the attribute embedding layer is used for fusing the node attribute characteristics of the reconstructed coding layer obtained by the multi-mode attribute perception module with a node sequence generated by random walk to obtain a node attribute information sequence with the node attribute characteristics;
and the multi-hop structure sensing module is used for inputting the node attribute information sequence obtained by the attribute embedding layer into a self-translation frame, constructing a self-translation process from the node attribute information sequence to the node identity sequence, and obtaining node embedding vector representation capable of retaining the network structure and the node attribute information of the original network.
Preferably, the multi-modal attribute perception module comprises a neighbor node fusion unit, a deep auto-encoder and a reconstruction unit; the neighbor node fusion unit, the deep auto-encoder and the reconstruction unit are connected in sequence;
the neighbor node fusion unit is used for acquiring the fused node attribute characteristics;
the deep auto-encoder comprises a first encoding layer and a first decoding layer, wherein the first encoding layer is used for encoding the node attribute characteristics of each node in the network, and the first decoding layer is used for decoding the encoded node attribute characteristics to obtain the decoded node attribute characteristics;
and the reconstruction unit is used for reconstructing the decoded node attribute characteristics and the fused node attribute characteristics.
Preferably, the multi-hop structure sensing module includes: a self-translation framework and attention layer;
the self-translation framework comprises a second coding layer, a second decoding layer, a translation layer and a softmax layer;
the second coding layer is used for coding the node attribute information sequence;
the attention layer is arranged between the second coding layer and the second decoding layer and is used for introducing an attention mechanism into the self-translation framework so as to perform weight distribution on the characteristic information learned after coding;
the second decoding layer is used for decoding the node attribute information sequence which is subjected to weight distribution after being encoded;
the translation layer is used for translating the node semantic feature vector obtained after the second decoding layer decodes into a node identity sequence;
and the softmax layer is used for converting the feature vectors obtained by the translation layer into probability values.
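The attention layer's weight distribution over the encoded features can be sketched as follows. The patent does not specify the scoring function, so the dot-product form below, and all names in it, are assumptions for illustration.

```python
# Hedged sketch of the attention layer between the second coding layer and the
# second decoding layer: scores over the encoder states are normalized by a
# softmax into weights, which form a context vector for decoding.
import math

def dot_product_attention(query, encoder_states):
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]      # numerically stable softmax
    total = sum(exps)
    weights = [e / total for e in exps]           # weight distribution over time steps
    dim = len(encoder_states[0])
    context = [sum(w * state[d] for w, state in zip(weights, encoder_states))
               for d in range(dim)]
    return weights, context

states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w, ctx = dot_product_attention([1.0, 0.0], states)
print(round(sum(w), 6))  # → 1.0 (the weights form a probability distribution)
```

The same softmax normalization is what the model's final softmax layer applies over the translation layer's output to obtain probability values per node identity.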
According to the above technical scheme, compared with the prior art, the invention discloses a network embedding method and model based on attribute and structure deep fusion. Aiming at the problems in the prior art, the node embedding vector representations obtained by the disclosed method and model can capture not only low-order structure and attribute similarity but also high-order structural semantic and node attribute similarity, while effectively retaining the structural and attribute characteristics of the original network, thereby effectively ensuring accuracy and practicability.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.
FIG. 1 is a diagram illustrating a motivational structure of the network embedding method based on attribute and structure depth fusion according to the present invention;
FIG. 2 is a diagram illustrating an overall architecture of a network embedded model based on attribute and structure depth fusion according to the present invention;
FIG. 3 is a bar graph illustrating the impact of node classification tasks on three datasets for different attribute feature embedding dimensions input into a multi-hop structural awareness module;
FIG. 4 is a bar graph illustrating the effect of a node classification task on three data sets taking into account different walk lengths in random walks;
FIG. 5 is a line graph illustrating the differences in the node classification task on three data sets when different node embedding representations are considered;
FIG. 6 is a line graph illustrating the effect on the node classification task, over three data sets, of ablation experiments performed on different modules of the model according to the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention discloses a network embedding method based on attribute and structure deep fusion and a model thereof.
FIG. 1 shows a real paper citation network. The motivation for the model has two parts. First, it models the low-order network structure: paper 3 and paper 5 cite each other directly, a relation belonging to the first-order network structure, while paper 4 and paper 5 do not cite each other directly but both cite papers 1, 2 and 3, so this shared-neighbor structure belongs to the second-order network structure. In particular, the attribute semantics of the paper nodes, i.e. the abstract content of the papers, are also modelled. Second, it models high-order structural semantics: for example, paper 5 and paper 8, even though they are far apart and share no second-order structural relation, can still be captured as related, because low-order structure and node-attribute semantics are modelled jointly as described above. That is, papers 1, 2 and 3 cited by paper 5 and papers 9, 10 and 11 cited by paper 8 are all research papers in the same field, so paper 5 and paper 8 should belong to the same research field. Such high-order structural semantic information can thus be captured, while the attribute semantics of papers 5 and 8 themselves are also considered.
Fig. 2 is an overall architecture diagram of the model. As shown in fig. 2, the specific steps are as follows:
the multi-modal attribute awareness module encodes and decodes the node attribute information using a depth self-encoder. Then, a heuristic method is provided, and a new node attribute feature is obtained by integrating the attributes of the node neighbors. And finally, reconstructing the attribute characteristics of the node neighbors, but not the original attributes of the node neighbors. In the attribute embedding layer, the learned node attribute features are fused with a node sequence generated after the random walk is cut off, and a node attribute sequence is constructed. For the multi-hop structure awareness module, the embodiment uses an attention-enhancing seq2seq framework to translate the network from the node attribute sequence to the node identity sequence.
Table 1 real network data set statistical table for verifying validity of model
| Data set | Number of nodes | Number of edges | Attribute feature dimension | Number of categories |
| --- | --- | --- | --- | --- |
| Cora | 2708 | 5429 | 1433 | 7 |
| Citeseer | 3327 | 4732 | 3703 | 6 |
| Wiki | 2405 | 17981 | 4973 | 17 |
As shown in table 1, the contents of the various data sets are as follows:
Cora and Citeseer are two paper citation networks in which nodes are articles and edges represent citations between articles. The attributes associated with the nodes are extracted from the title and abstract of each article and represented as TF-IDF vectors. Wiki is a network of web pages in which nodes represent web pages and edges represent hyperlinks between them; the node attributes are likewise represented as TF-IDF vectors.
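The TF-IDF attribute construction can be sketched as follows. This is a minimal whitespace-tokenized variant with the common log(n/df)+1 idf; the actual preprocessing used for these data sets is not specified in the patent, so every detail below is an assumption.

```python
# Hedged sketch: building TF-IDF node attribute vectors from document text,
# as described for the Cora/Citeseer/Wiki node attributes.
import math

def tfidf_vectors(docs):
    vocab = sorted({w for d in docs for w in d.split()})
    n = len(docs)
    df = {w: sum(w in d.split() for d in docs) for w in vocab}   # document frequency
    idf = {w: math.log(n / df[w]) + 1.0 for w in vocab}
    vecs = []
    for d in docs:
        words = d.split()
        # term frequency (count / doc length) times idf, one weight per vocab term
        vecs.append([words.count(w) / len(words) * idf[w] for w in vocab])
    return vocab, vecs

vocab, vecs = tfidf_vectors(["network embedding", "network analysis"])
print(len(vecs[0]) == len(vocab))  # → True (one weight per vocabulary term)
```

The resulting per-node vectors have the attribute feature dimensions listed in Table 1 (e.g. 1433 for Cora).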
TABLE 2 framework table of multi-modal attribute perception module in this model
| Data set | Number of neurons per layer |
| --- | --- |
| Cora | 1433-1300-1200 |
| Citeseer | 3703-2500-1200 |
| Wiki | 4973-2500-1200 |
As shown in table 2, the specific architecture is as follows:
since the depth self-encoder is designed in the present embodiment in the multi-modal attribute sensing module, different depth self-encoder structures are designed for different data sets. For the Cora data set, because the attribute characteristic dimension is 1433, the coding layer is designed into three layers, the number of the neurons of each layer is 1433, 1300 and 1200 in sequence, and then for the decoding layer, the reverse is true, and the three layers are also designed, and the number of the neurons of each layer is 1200,1300 and 1433 in sequence. The self-encoder structure for the other two data sets is the same.
Table 3 architecture table of multi-hop structure sensing module in this model
| | Cora | Citeseer | Wiki |
| --- | --- | --- | --- |
| Embedding dimension | 1200 | 1200 | 1200 |
| Forward coding layer dimension | 500 | 500 | 500 |
| Number of forward coding layers | 1 | 2 | 1 |
| Backward coding layer dimension | 500 | 500 | 500 |
| Number of backward coding layers | 1 | 2 | 1 |
| Bi-LSTM output dimension | 1000 | 2000 | 1000 |
| Decoding layer dimension | 1000 | 2000 | 1000 |
| Prediction layer dimension | 2708 | 3327 | 2405 |
As shown in table 3, the details are as follows:
in the multi-hop structure sensing module, an attention-enhanced seq2seq framework is designed in this embodiment, so the structure is customized for each data set. For the Cora data set, the output dimension of the final encoding layer of the previous module is 1200, which serves as the input dimension of this module; the forward encoding layer is set to dimension 500 with 1 layer and the backward encoding layer to dimension 500 with 1 layer, giving a final Bi-LSTM output dimension of 1000; the decoding layer output dimension is designed as 1000; and the prediction layer output dimension is designed as 2708, i.e. the number of nodes in the Cora data set. The structures for the other two data sets, shown in Table 3, are designed in the same way.
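The dimension flow just described can be sketched as follows. Plain linear maps stand in for the LSTM cells (only the dimension bookkeeping of Table 3 is reproduced), and the weights are random untrained stand-ins; this is not the actual recurrent computation.

```python
# Hedged sketch of the multi-hop module's dimension flow for Cora: 1200-d
# attribute embeddings -> Bi-LSTM encoder (500 forward + 500 backward = 1000)
# -> 1000-d decoding layer -> softmax over the 2708 node identities.
import numpy as np

rng = np.random.default_rng(1)

def linear(x, d_out):
    return x @ rng.normal(scale=0.01, size=(x.shape[-1], d_out))

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

seq = rng.random((10, 1200))                 # a walk of length 10, 1200-d attribute inputs
fwd = linear(seq, 500)                       # stand-in for forward LSTM states
bwd = linear(seq[::-1], 500)[::-1]           # stand-in for backward LSTM states
enc = np.concatenate([fwd, bwd], axis=-1)    # 1000-d encoder output per step
dec = linear(enc, 1000)                      # decoding layer
probs = softmax(linear(dec, 2708))           # prediction layer: one distribution per step
print(probs.shape)                           # → (10, 2708)
```

Each row of `probs` is the model's probability distribution over the 2708 Cora node identities for one step of the walk, which is what the translation-to-identity objective supervises.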
Table 4 is a comparison table of node classification tasks of the model (hereinafter, DASE is used as an abbreviation of the model) and the existing baseline model in three real network data sets (table 4.1, table 4.2, and table 4.3 respectively show experimental results on Cora, Citeseer, Wiki data sets), and the specific results are as follows:
TABLE 4.1 (Cora)
TABLE 4.2 (Citeseer)
TABLE 4.3 (Wiki)
To comprehensively evaluate the performance of the model, the percentage of labeled nodes used for training is randomly varied from 10% to 50% in this embodiment, with the remainder used as the test set. This process is repeated 10 times and the average performance is reported as Micro-F1. It can be seen that, first, DASE always maintains the best performance among all methods. Second, it is worth noting that DASE also achieves significant gains when few labeled nodes are used for training: for example, on Cora and Citeseer, DASE improves by 2.34% and 3.92%, respectively.
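The evaluation protocol above can be sketched as follows. For single-label multi-class prediction, Micro-F1 reduces to accuracy, which the sketch exploits; the function names and the toy oracle classifier are illustrative, not from the patent.

```python
# Hedged sketch of the protocol: hold out a random labeled training split,
# repeat 10 times, and report the average Micro-F1 on the remaining nodes.
import random

def micro_f1(y_true, y_pred):
    # For single-label multi-class output, micro-averaged F1 equals accuracy.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def repeated_eval(labels, train_ratio, classify, repeats=10, seed=0):
    rng = random.Random(seed)
    nodes = list(labels)
    scores = []
    for _ in range(repeats):
        rng.shuffle(nodes)
        cut = int(train_ratio * len(nodes))
        train, test = nodes[:cut], nodes[cut:]
        preds = classify(train, test)                      # any downstream classifier
        scores.append(micro_f1([labels[n] for n in test], preds))
    return sum(scores) / len(scores)

labels = {i: i % 2 for i in range(20)}
oracle = lambda train, test: [labels[n] for n in test]     # perfect toy classifier
print(repeated_eval(labels, 0.1, oracle))  # → 1.0
```

In the reported experiments, `classify` would be a standard classifier (e.g. logistic regression) trained on the learned node embedding vectors.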
Table 5 is a table of significant analysis of classification task results of this model and other baseline models on three real datasets:
TABLE 5
To demonstrate that DASE is indeed statistically superior to the baseline models, a paired t-test (confidence level α = 0.05) was performed in this embodiment to verify the statistical significance of the differences from the 6 baseline algorithms. In Table 5, each entry is the p-value of a t-test between two algorithms; a p-value below 0.05 indicates that the difference is statistically significant. As the table shows, the model in this embodiment differs significantly from the other models on all three data sets, except for the F1 score comparison between DASE and STNE on Citeseer (p-value 0.1661).
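The paired t-test statistic used for this significance analysis can be sketched as follows. The per-split scores below are made up for illustration; computing the p-value from the statistic requires the t distribution's CDF, which is omitted here, so the sketch compares against the two-sided 5% critical value for 4 degrees of freedom instead.

```python
# Hedged sketch: paired t statistic on per-split Micro-F1 scores of two methods.
from math import sqrt
from statistics import mean, stdev

def paired_t_statistic(a, b):
    diffs = [x - y for x, y in zip(a, b)]
    # mean difference divided by its standard error (sample stdev / sqrt(n))
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))

dase  = [0.82, 0.84, 0.85, 0.86, 0.87]   # hypothetical Micro-F1 per split
other = [0.78, 0.80, 0.82, 0.83, 0.84]   # hypothetical baseline scores
t = paired_t_statistic(dase, other)
print(t > 2.776)  # → True: |t| exceeds the 5% two-sided critical value (df = 4)
```

A statistic above the critical value corresponds to a p-value below 0.05, matching the significance criterion stated above.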
Fig. 3 is a histogram (from left to right, results in Cora, Citeseer, Wiki datasets) considering the effect of different attribute feature embedding dimensions (d) on node classification tasks on three datasets input into the multi-hop structure awareness module, with the following specific analysis:
to investigate the effect of attribute feature embedding dimensions on the three datasets when the percentage (r) of label training nodes is from 10% to 50%, this embodiment ranges from 200 to 1200 or 1600. As can be seen from the figure, the DASE is stable to the performance of Cora in different dimensions.
However, the performance on Wiki and cineseer is fluctuating and the optimal value of the embedding dimension is 1200. The reason for this is that the embedding dimension is the input to the Bi-LSTM encoder layer, and when the embedding dimension is too low, the characteristic information is lost. Conversely, when the embedding dimension is too high, a large amount of noise is generated.
Fig. 4 is a histogram (from left to right, results in Cora, Citeseer, Wiki data sets in order) that considers the effect of the node classification task on three data sets for different walk lengths (l) in random walks, with the following detailed analysis:
to investigate the effect of random walk length on the three data sets when the percentage (r) of label training nodes is from 10% to 50%, this embodiment ranges from 6 to 20. As can be seen from the figure, DASE of different lengths is stable to Cora performance. For Citeseer and Wiki, initial performance may improve as the length increases. However, when the length of the random walk is greater than 10, the performance begins to slowly decline.
FIG. 5 is a line graph (results in Cora, Citeseer, Wiki datasets from left to right) showing the difference in node classification tasks on three datasets considered for different node embeddings, with the following detailed analysis:
the model has three node representations, the output of the Bi-LSTM encoder (e), the output layer of the LSTM decoder (d), and the combination of the encoder and decoder outputs (ed). In order to explore the influence of the above different node representations, a node classification experiment was performed on the above three data sets in the present embodiment. As shown, it can be seen that the node representation experiment results at the encoder layer output are better than the experiment results represented by other nodes on all data sets.
FIG. 6 is a line graph of the effect, on the node classification task over the three data sets, of ablation experiments performed on different modules of the model (results on Cora, Citeseer and Wiki data sets from left to right), analysed as follows.
To fully analyse the effectiveness and efficiency of the model, three ablated variants were tested on the three data sets, obtained by removing the attribute enhancement matrix, the auto-encoder, and the attention mechanism from the model, respectively. The classification results shown in the figure indicate that DASE achieves the best performance compared with the other variants.
The invention provides a network embedding method based on attribute and structure deep fusion and a model thereof. The model consists of two modules: the multi-modal attribute sensing module initializes and reconstructs node attribute feature representations so as to capture the similarity of the low-order structure and node attributes, while the multi-hop structure perception module translates the attribute information sequence into the node identity sequence so as to capture high-order structural semantics and node attribute similarity. Owing to the tight coupling of the two modules, the finally obtained node embedding vectors represent deeply fused network structure and node attribute information. Compared with prior-art network embedding technology, the method has practical significance, deeply fuses network structure and node attribute information, has a robust and easily implemented model design, and runs more efficiently.
The embodiments in the present description are described in a progressive manner; each embodiment focuses on its differences from the other embodiments, and identical or similar parts of the embodiments may be cross-referenced. Since the device disclosed in an embodiment corresponds to the method disclosed in that embodiment, its description is kept brief, and relevant points can be found in the description of the method.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (7)
1. A network embedding method based on attribute and structure deep fusion is characterized by comprising the following steps:
S1, fusing node attribute characteristics of all neighbor nodes of a current node through a heuristic method to obtain fused node attribute characteristics, and encoding and decoding the node attribute characteristics of each node in a network by using a depth automatic encoder, so as to reconstruct the decoded node attribute characteristics and the fused node attribute characteristics to obtain node attribute characteristics of a reconstructed encoding layer;
s2, fusing the node attribute characteristics of the reconstructed coding layer with a node sequence generated by random walk to obtain a node attribute information sequence with the node attribute characteristics;
s3, inputting the node attribute information sequence into a self-translation frame, and constructing a self-translation process from the node attribute information sequence to a node identity sequence to obtain node embedded vector representation capable of retaining the network structure and the node attribute information of the original network;
and S4, verifying through the real network data set.
2. The network embedding method based on attribute and structure deep fusion of claim 1, wherein in S1, the specific content of obtaining the fused node attribute features is: updating each node attribute feature to the median of all neighbor node attribute features of the current node, the fused node attribute features being expressed as follows:
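The formula referenced by this claim is not reproduced in this text. As a hedged illustration of the described heuristic (each node's attribute vector replaced by the element-wise median of its neighbors' attribute vectors), a minimal NumPy sketch with hypothetical names:

```python
import numpy as np

def fuse_neighbor_attributes(attrs, neighbors):
    """Replace each node's attribute vector with the element-wise
    median of its neighbors' attribute vectors (the claimed heuristic)."""
    fused = np.empty_like(attrs, dtype=float)
    for v, nbrs in neighbors.items():
        if nbrs:  # median over the neighbor rows, feature by feature
            fused[v] = np.median(attrs[nbrs], axis=0)
        else:     # illustrative choice: an isolated node keeps its own attributes
            fused[v] = attrs[v]
    return fused

# Toy graph: node 0 has neighbors 1 and 2.
attrs = np.array([[0.0, 0.0],
                  [1.0, 3.0],
                  [2.0, 5.0]])
neighbors = {0: [1, 2], 1: [0], 2: [0]}
fused = fuse_neighbor_attributes(attrs, neighbors)
print(fused[0])  # element-wise median of rows 1 and 2 -> [1.5, 4.0]
```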
3. The network embedding method based on attribute and structure deep fusion of claim 1, wherein in S3, the self-translation process from the node attribute information sequence to the node identity sequence specifically includes the following steps:
and coding the node attribute information sequence, introducing an attention mechanism to perform weight distribution on the feature information learned after coding, further decoding, and finally realizing the self-translation process from the whole node attribute information sequence to the node identity sequence.
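The weight-distribution step above can be illustrated with a minimal dot-product attention sketch. The patent does not specify the scoring function, so the dot-product score and all names here are illustrative assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attend(decoder_state, encoder_states):
    """Dot-product attention: distribute weights over the feature
    information learned by the encoder before decoding."""
    scores = encoder_states @ decoder_state   # one score per encoded position
    weights = softmax(scores)                 # the weight distribution
    context = weights @ encoder_states        # weighted sum of encoder features
    return context, weights

rng = np.random.default_rng(1)
enc = rng.random((4, 3))   # 4 encoded positions, 3-dim features
dec = rng.random(3)        # current decoder state
context, weights = attend(dec, enc)
print(weights.sum())       # the weights form a probability distribution
```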
4. The network embedding method based on attribute and structure deep fusion of claim 3, wherein the hidden layer representation output after encoding the node attribute information sequence is the network node embedding vector representation.
5. A network embedding model based on attribute and structure deep fusion, characterized by comprising: a multi-mode attribute sensing module, an attribute embedding layer and a multi-hop structure sensing module; the multi-mode attribute sensing module is connected with the multi-hop structure sensing module through the attribute embedding layer;
the multi-mode attribute sensing module is used for fusing node attribute characteristics of all neighbor nodes of a current node by a heuristic method to obtain fused node attribute characteristics, coding and decoding the node attribute characteristics of each node in the network by using a depth automatic coder, and further reconstructing the decoded node attribute characteristics and the fused node attribute characteristics to obtain node attribute characteristics of a reconstructed coding layer;
the attribute embedding layer is used for fusing the node attribute characteristics of the reconstructed coding layer obtained by the multi-mode attribute perception module with a node sequence generated by random walk to obtain a node attribute information sequence with the node attribute characteristics;
and the multi-hop structure sensing module is used for inputting the node attribute information sequence obtained by the attribute embedding layer into a self-translation frame, constructing a self-translation process from the node attribute information sequence to the node identity sequence, and obtaining node embedding vector representation capable of retaining the network structure and the node attribute information of the original network.
6. The network embedding model based on attribute and structure deep fusion of claim 5, wherein the multi-mode attribute sensing module comprises a neighbor node fusion unit, a depth automatic encoder and a reconstruction unit; the neighbor node fusion unit, the depth automatic encoder and the reconstruction unit are connected in sequence;
the neighbor node fusion unit is used for acquiring the fused node attribute characteristics;
the depth automatic encoder comprises a first encoding layer and a first decoding layer, wherein the first encoding layer is used for encoding the node attribute characteristics of each node in the network, and the first decoding layer is used for decoding the encoded node attribute characteristics to obtain decoded node attribute characteristics;
and the reconstruction unit is used for reconstructing the decoded node attribute characteristics and the fused node attribute characteristics.
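A toy sketch of the encode-decode-reconstruct flow of this module, with a single linear layer standing in for each of the first encoding and first decoding layers. This is an illustrative simplification of the deep architecture, and the random "fused" attributes merely stand in for the output of the neighbor node fusion unit:

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy setup: 5 nodes with 4-dim attributes, a 2-dim code layer.
X = rng.random((5, 4))            # raw node attribute features
X_fused = rng.random((5, 4))      # stand-in for the fused neighbor attributes

W_enc = rng.random((4, 2)) * 0.1  # first encoding layer (linear stand-in)
W_dec = rng.random((2, 4)) * 0.1  # first decoding layer (linear stand-in)

H = np.tanh(X @ W_enc)            # encoded node attribute characteristics
X_hat = H @ W_dec                 # decoded node attribute characteristics

# Reconstruction unit: the decoded attributes are reconstructed against
# the *fused* attributes, injecting low-order structural similarity.
loss = np.mean((X_hat - X_fused) ** 2)
print(round(float(loss), 4))
```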
7. The network embedding model based on attribute and structure deep fusion of claim 5, wherein the multi-hop structure perception module comprises: a self-translation framework and an attention layer;
the self-translation framework comprises a second coding layer, a second decoding layer, a translation layer and a softmax layer;
the second coding layer is used for coding the node attribute information sequence;
the attention layer is arranged between the second coding layer and the second decoding layer and is used for introducing an attention mechanism into the self-translation framework so as to perform weight distribution on the characteristic information learned after coding;
the second decoding layer is used for decoding the node attribute information sequence which is subjected to weight distribution after being encoded;
the translation layer is used for translating the node semantic feature vector obtained after the second decoding layer decodes into a node identity sequence;
and the softmax layer is used for converting the feature vectors obtained by the translation layer into probability values.
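A minimal sketch of the translation layer and softmax layer of this claim: a decoded semantic feature vector is projected onto the node-identity vocabulary and converted into probability values. The projection matrix, dimensions and names are illustrative assumptions:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(3)
n_nodes, d = 6, 4

# Translation layer: project a decoded feature vector onto node identities.
W_translate = rng.random((d, n_nodes))

feature = rng.random(d)             # output of the second decoding layer
logits = feature @ W_translate      # one score per node identity
probs = softmax(logits)             # softmax layer: probability values
predicted_node = int(np.argmax(probs))

print(probs.sum(), predicted_node)  # probabilities sum to 1
```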
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010410196.7A CN111598223B (en) | 2020-05-15 | 2020-05-15 | Network embedding method based on attribute and structure depth fusion and model thereof |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111598223A true CN111598223A (en) | 2020-08-28 |
CN111598223B CN111598223B (en) | 2023-10-24 |
Family
ID=72182421
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010410196.7A Active CN111598223B (en) | 2020-05-15 | 2020-05-15 | Network embedding method based on attribute and structure depth fusion and model thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111598223B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20110289063A1 (en) * | 2010-05-21 | 2011-11-24 | Microsoft Corporation | Query Intent in Information Retrieval |
CN107516110A (en) * | 2017-08-22 | 2017-12-26 | 华南理工大学 | A kind of medical question and answer Semantic Clustering method based on integrated convolutional encoding |
CN110598061A (en) * | 2019-09-20 | 2019-12-20 | 东北大学 | Multi-element graph fused heterogeneous information network embedding method |
US20200134428A1 (en) * | 2018-10-29 | 2020-04-30 | Nec Laboratories America, Inc. | Self-attentive attributed network embedding |
CN111104797A (en) * | 2019-12-17 | 2020-05-05 | 南开大学 | Paper network representation learning method based on dual sequence-to-sequence generation |
CN111127146A (en) * | 2019-12-19 | 2020-05-08 | 江西财经大学 | Information recommendation method and system based on convolutional neural network and noise reduction self-encoder |
Non-Patent Citations (2)
Title |
---|
LIU JIE, ET AL: "Content to Node: Self-Translation Network Embedding", KDD '18: PROCEEDINGS OF THE 24TH ACM SIGKDD INTERNATIONAL CONFERENCE ON KNOWLEDGE DISCOVERY & DATA MINING, pages 1794 - 1802 *
LIU ZHENGMING, ET AL: "Network representation learning algorithm fusing node description attribute information", JOURNAL OF COMPUTER APPLICATIONS (计算机应用), vol. 39, no. 4, pages 1012 - 1020 *
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112541340A (en) * | 2020-12-18 | 2021-03-23 | 昆明理工大学 | Weak supervision involved microblog evaluation object identification method based on variation double-theme representation |
CN112541340B (en) * | 2020-12-18 | 2021-11-23 | 昆明理工大学 | Weak supervision involved microblog evaluation object identification method based on variation double-theme representation |
WO2022160431A1 (en) * | 2021-01-26 | 2022-08-04 | 中山大学 | Attribute heterogeneous network embedding method, apparatus, and device, and medium |
CN116094952A (en) * | 2023-01-04 | 2023-05-09 | 中国联合网络通信集团有限公司 | Method, device, equipment and storage medium for determining network structure similarity |
CN116094952B (en) * | 2023-01-04 | 2024-05-14 | 中国联合网络通信集团有限公司 | Method, device, equipment and storage medium for determining network structure similarity |
Also Published As
Publication number | Publication date |
---|---|
CN111598223B (en) | 2023-10-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103544255B (en) | Text semantic relativity based network public opinion information analysis method | |
CN111598223B (en) | Network embedding method based on attribute and structure depth fusion and model thereof | |
CN103514183B (en) | Information search method and system based on interactive document clustering | |
CN107229668B (en) | Text extraction method based on keyword matching | |
CN102591988B (en) | Short text classification method based on semantic graphs | |
Fournier‐Viger et al. | A survey of pattern mining in dynamic graphs | |
CN106909643A (en) | The social media big data motif discovery method of knowledge based collection of illustrative plates | |
CN111651198B (en) | Automatic code abstract generation method and device | |
CN111460818A (en) | Web page text classification method based on enhanced capsule network and storage medium | |
CN115017299A (en) | Unsupervised social media summarization method based on de-noised image self-encoder | |
CN103995804A (en) | Cross-media topic detection method and device based on multimodal information fusion and graph clustering | |
CN111966827A (en) | Conversation emotion analysis method based on heterogeneous bipartite graph | |
CN116756690A (en) | Cross-language multi-mode information fusion method and device | |
CN113988075A (en) | Network security field text data entity relation extraction method based on multi-task learning | |
CN116467438A (en) | Threat information attribution method based on graph attention mechanism | |
CN108595466B (en) | Internet information filtering and internet user information and network card structure analysis method | |
CN116958997B (en) | Graphic summary method and system based on heterogeneous graphic neural network | |
Lee et al. | Detecting suicidality with a contextual graph neural network | |
Zhuo et al. | Context attention heterogeneous network embedding | |
CN116775855A (en) | Automatic TextRank Chinese abstract generation method based on Bi-LSTM | |
Sajid et al. | Sequential pattern finding: A survey | |
Yu et al. | Mining hidden interests from twitter based on word similarity and social relationship for OLAP | |
CN114118058A (en) | Emotion analysis system and method based on fusion of syntactic characteristics and attention mechanism | |
CN108256055B (en) | Topic modeling method based on data enhancement | |
CN116136866B (en) | Knowledge graph-based correction method and device for Chinese news abstract factual knowledge |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||