CN111598223A - Network embedding method based on attribute and structure deep fusion and model thereof - Google Patents


Info

Publication number
CN111598223A
CN111598223A (application CN202010410196.7A)
Authority
CN
China
Prior art keywords
node
attribute
layer
network
node attribute
Prior art date
Legal status
Granted
Application number
CN202010410196.7A
Other languages
Chinese (zh)
Other versions
CN111598223B (en)
Inventor
张贤坤
罗学雄
马蕴玢
Current Assignee
Tianjin University of Science and Technology
Original Assignee
Tianjin University of Science and Technology
Priority date
Filing date
Publication date
Application filed by Tianjin University of Science and Technology filed Critical Tianjin University of Science and Technology
Priority to CN202010410196.7A priority Critical patent/CN111598223B/en
Publication of CN111598223A publication Critical patent/CN111598223A/en
Application granted granted Critical
Publication of CN111598223B publication Critical patent/CN111598223B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/049 Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/25 Fusion techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a network embedding method based on attribute and structure deep fusion, comprising the following steps: S1, obtaining node attribute features of a reconstructed coding layer; S2, obtaining a node attribute information sequence carrying those node attribute features; S3, translating the node attribute information sequence into a node identity sequence to obtain node embedding vector representations that preserve the network structure and node attribute information of the original network. Also disclosed is a network embedding model based on attribute and structure deep fusion, comprising a multi-modal attribute perception module, an attribute embedding layer, and a multi-hop structure perception module, the multi-modal attribute perception module being connected with the multi-hop structure perception module through the attribute embedding layer. The invention captures not only the similarity between the low-order structure and node attributes but also the similarity between high-order structural semantics and node attributes, so that the resulting node embedding vectors deeply fuse network structure and node attribute information.

Description

Network embedding method based on attribute and structure deep fusion and model thereof
Technical Field
The invention relates to the technical field of network embedding, in particular to a network embedding method based on attribute and structure deep fusion and a model thereof.
Background
Data mining has long attracted broad attention, and with the recent explosion of social network data of many kinds, such as microblog networks and transportation networks, the study and analysis of social networks has become especially important. Because network data are large-scale, highly nonlinear, and rich in node attribute information, traditional analysis methods based on the network adjacency matrix easily incur prohibitive computational and storage costs, which hinders effective data analysis tasks such as node classification and clustering, recommendation, and visualization. Network embedding techniques were therefore developed to address this problem: the aim is to find a suitable mapping function that maps large-scale, high-dimensional network data into a low-dimensional vector space while preserving the inherent properties of the original network, i.e., to learn embedding vector representations for all network nodes. Such node embedding vectors not only effectively retain the structural and attribute characteristics of the original network but can also be consumed directly by machine learning techniques, facilitating further study and analysis of network data.
There is already excellent work on network embedding. Among methods that capture network structure information, DeepWalk obtains node sequences by truncated random walks and then learns embedding vector representations of network nodes with the skip-gram model from natural language processing. LINE models first-order and second-order similarity of the network structure: two directly connected nodes should have similar vector representations, and two unconnected nodes that share common neighbors should also have similar vector representations; a dedicated loss function is designed accordingly to obtain effective node embedding vectors. To address the high nonlinearity of network data, SDNE uses a deep autoencoder to mine network structure information while preserving the highly nonlinear network structure. In real network data, however, nodes carry rich attribute information, so methods that jointly capture network structure and node attribute information have also been proposed: SNE learns the structural and attribute similarity of nodes in a social network; ANRL designs an attribute-aware skip-gram module on top of an autoencoder to capture the similarity of network structure and node attributes; and STNE proposes a self-translation network embedding model that deeply fuses network structure and node attribute information to obtain effective node embedding vectors.
Although the above network embedding methods perform well on some network analysis tasks, they suffer from the following problems. First, most methods model network structural similarity and node attribute similarity separately and only merge them afterwards when learning the final node embedding vectors. Second, real networks exhibit high-order structural-semantic similarity: even if two nodes are far apart, the structural semantics of their respective neighborhoods may be similar, making the two nodes semantically similar as well. Third, most existing methods only consider how to merge network structure and node attribute information in an overall framework, ignoring the similarity of node attribute semantics within both low-order structures and high-order structural semantics.
In summary, existing network embedding methods have not yet solved the above problems; how to provide a network embedding method and model based on attribute and structure deep fusion that effectively remedies these defects is therefore a problem urgently awaiting solution by those skilled in the art.
Disclosure of Invention
In view of the above, the invention provides a network embedding method and model based on attribute and structure deep fusion. The network node embedding vectors learned by the method and model effectively retain the rich properties of the original network and can be applied to numerous network data analysis tasks to improve the accuracy of network analysis.
To achieve the above purpose, the invention adopts the following technical scheme:
a network embedding method based on attribute and structure deep fusion comprises the following steps:
s1, fusing node attribute characteristics of all neighbor nodes of a current node through a heuristic method to obtain fused node attribute characteristics, and encoding and decoding the node attribute characteristics of each node in a network by using a depth automatic encoder so as to reconstruct the decoded node attribute characteristics and the fused node attribute characteristics to obtain node attribute characteristics of a reconstructed encoding layer;
s2, fusing the node attribute characteristics of the reconstructed coding layer with a node sequence generated by random walk to obtain a node attribute information sequence with the node attribute characteristics;
s3, inputting the node attribute information sequence into a self-translation frame, and constructing a self-translation process from the node attribute information sequence to a node identity sequence to obtain node embedded vector representation capable of retaining the network structure and the node attribute information of the original network;
and S4, verifying through the real network data set.
Preferably, in S1, the fused node attribute features are obtained as follows: each node's attribute features are updated to the median of the attribute features of all neighbor nodes of the current node, the fused node attribute features being expressed as

$$\hat{x}_i^{(k)} = \operatorname{median}\left\{ x_j^{(k)} \mid u_j \in N(i) \right\}$$

where $N(i)$ is the neighbor set of node $u_i$, $|N(i)|$ is the number of neighbors of node $u_i$, $x_j^{(k)}$ is the $k$-th dimension of the attribute feature vector of neighbor $u_j$, and $k$ indexes the dimensions of the node attribute feature vector.
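A minimal sketch of this fusion step, assuming a NetworkX graph with integer node ids 0..n-1 and a NumPy attribute matrix (all names are illustrative; the element-wise median over neighbors follows the prose above):

```python
import networkx as nx
import numpy as np

def fuse_neighbor_attributes(graph: nx.Graph, attrs: np.ndarray) -> np.ndarray:
    """Replace each node's attribute vector with the element-wise median
    of its neighbors' attribute vectors (isolated nodes keep their own)."""
    fused = attrs.astype(float).copy()
    for i in graph.nodes():
        neighbors = list(graph.neighbors(i))
        if neighbors:  # |N(i)| > 0
            fused[i] = np.median(attrs[neighbors], axis=0)
    return fused

# Toy usage: 4 nodes with 3-dimensional attributes.
g = nx.Graph([(0, 1), (1, 2), (2, 3)])
a = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]], dtype=float)
print(fuse_neighbor_attributes(g, a))
```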
Preferably, in S3, the self-translation process from the node attribute information sequence to the node identity sequence specifically comprises the following steps:
encoding the node attribute information sequence, introducing an attention mechanism to assign weights to the feature information learned by the encoder, then decoding, and finally realizing the self-translation process from the entire node attribute information sequence to the node identity sequence.
Preferably, the hidden layer representation output after encoding the node attribute information sequence is taken as the network node embedding vector representation.
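To illustrate such a self-translation process, the following is a minimal PyTorch sketch of an attention-augmented encoder-decoder that maps a sequence of node attribute vectors to per-step distributions over node identities. It is a simplified stand-in, not the patent's exact architecture (the embodiment's layer counts and dimensions appear in Tables 2 and 3 later in the description); the dot-product attention, the feeding of encoder states directly into the decoder, and all names are assumptions:

```python
import torch
import torch.nn as nn

class SelfTranslator(nn.Module):
    """Sketch: encode a walk of node attribute embeddings with a Bi-LSTM,
    attend over the encoder states, and predict node identities per step."""
    def __init__(self, attr_dim: int, hidden: int, num_nodes: int):
        super().__init__()
        self.encoder = nn.LSTM(attr_dim, hidden, bidirectional=True, batch_first=True)
        self.decoder = nn.LSTM(2 * hidden, 2 * hidden, batch_first=True)
        self.out = nn.Linear(4 * hidden, num_nodes)  # [decoder state; context]

    def forward(self, attr_seq: torch.Tensor) -> torch.Tensor:
        enc, _ = self.encoder(attr_seq)                      # (B, T, 2H), node embeddings
        dec, _ = self.decoder(enc)                           # (B, T, 2H)
        scores = torch.bmm(dec, enc.transpose(1, 2))         # (B, T, T) attention scores
        context = torch.bmm(torch.softmax(scores, -1), enc)  # (B, T, 2H) weighted sum
        return self.out(torch.cat([dec, context], dim=-1))   # (B, T, num_nodes) logits

# Toy usage: 2 walks of length 5, 16-dim attribute embeddings, 100 nodes.
model = SelfTranslator(attr_dim=16, hidden=32, num_nodes=100)
logits = model(torch.randn(2, 5, 16))
loss = nn.CrossEntropyLoss()(logits.reshape(-1, 100), torch.randint(0, 100, (10,)))
loss.backward()
```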
A network embedding model based on attribute and structure deep fusion, comprising: a multi-modal attribute perception module, an attribute embedding layer, and a multi-hop structure perception module, the multi-modal attribute perception module being connected with the multi-hop structure perception module through the attribute embedding layer;
the multi-modal attribute perception module is used for fusing the node attribute features of all neighbor nodes of the current node by a heuristic method to obtain fused node attribute features, encoding and decoding the node attribute features of each node in the network with a deep autoencoder, and then reconstructing the decoded node attribute features against the fused node attribute features to obtain node attribute features of a reconstructed coding layer;
the attribute embedding layer is used for fusing the node attribute features of the reconstructed coding layer obtained by the multi-modal attribute perception module with a node sequence generated by random walks to obtain a node attribute information sequence carrying the node attribute features;
and the multi-hop structure perception module is used for inputting the node attribute information sequence obtained by the attribute embedding layer into a self-translation framework and constructing a self-translation process from the node attribute information sequence to the node identity sequence, so as to obtain node embedding vector representations that preserve the network structure and node attribute information of the original network.
Preferably, the multi-modal attribute perception module comprises a neighbor node fusion unit, a deep autoencoder, and a reconstruction unit, connected in sequence;
the neighbor node fusion unit is used for obtaining the fused node attribute features;
the deep autoencoder comprises a first encoding layer and a first decoding layer, where the first encoding layer is used for encoding the node attribute features of each node in the network, and the first decoding layer is used for decoding the encoded node attribute features to obtain decoded node attribute features;
and the reconstruction unit is used for reconstructing the decoded node attribute features against the fused node attribute features.
Preferably, the multi-hop structure perception module comprises a self-translation framework and an attention layer;
the self-translation framework comprises a second coding layer, a second decoding layer, a translation layer, and a softmax layer;
the second coding layer is used for encoding the node attribute information sequence;
the attention layer is arranged between the second coding layer and the second decoding layer and is used for introducing an attention mechanism into the self-translation framework to assign weights to the feature information learned after encoding;
the second decoding layer is used for decoding the weighted, encoded node attribute information sequence;
the translation layer is used for translating the node semantic feature vectors produced by the second decoding layer into a node identity sequence;
and the softmax layer is used for converting the feature vectors produced by the translation layer into probability values.
According to the above technical scheme, and compared with the prior art, the invention discloses a network embedding method and model based on attribute and structure deep fusion. Addressing the problems of the prior art, the node embedding vector representations obtained by the disclosed method and model capture not only low-order structural and attribute similarity but also high-order structural-semantic and node attribute similarity, while effectively retaining the structural and attribute characteristics of the original network, thereby ensuring the accuracy and practicality of network analysis.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a diagram illustrating the motivation of the network embedding method based on attribute and structure deep fusion according to the present invention;
FIG. 2 is a diagram illustrating the overall architecture of the network embedding model based on attribute and structure deep fusion according to the present invention;
FIG. 3 is a bar graph illustrating the impact on the node classification task, across three datasets, of different attribute feature embedding dimensions input into the multi-hop structure perception module;
FIG. 4 is a bar graph illustrating the effect on the node classification task, across three data sets, of different walk lengths in the random walks;
FIG. 5 is a line graph illustrating the differences in the node classification task on three data sets when different node embedding representations are considered;
FIG. 6 is a line graph illustrating the effect on the node classification task, across three data sets, of ablation experiments performed on different modules of the model.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiment of the invention discloses a network embedding method based on attribute and structure deep fusion and a model thereof.
As shown, FIG. 1 depicts a real paper citation network, and the motivation for the model has two parts. First, the model captures the low-order network structure: paper 3 and paper 5 cite each other directly, a relationship belonging to the first-order network structure, while paper 4 and paper 5 do not cite each other directly but both cite papers 1, 2, and 3, so sharing common neighbors gives them a second-order structural relationship. In particular, the attribute semantics of the paper nodes, namely the abstract content of each paper, are also modeled. Second, the model captures high-order structural semantics: for example, paper 5 and paper 8 are far apart and share no second-order structural relationship, but because low-order structure and node attribute semantics are modeled jointly as described above, their structural semantics can still be captured. That is, papers 1, 2, and 3 cited by paper 5 and papers 9, 10, and 11 cited by paper 8 are all research papers in the same field, so paper 5 and paper 8 should belong to the same research field as well; such high-order structural-semantic information is thereby captured, while the attribute semantics of papers 5 and 8 themselves are also taken into account.
Fig. 2 is an overall architecture diagram of the model. As shown in fig. 2, the specific steps are as follows:
the multi-modal attribute awareness module encodes and decodes the node attribute information using a depth self-encoder. Then, a heuristic method is provided, and a new node attribute feature is obtained by integrating the attributes of the node neighbors. And finally, reconstructing the attribute characteristics of the node neighbors, but not the original attributes of the node neighbors. In the attribute embedding layer, the learned node attribute features are fused with a node sequence generated after the random walk is cut off, and a node attribute sequence is constructed. For the multi-hop structure awareness module, the embodiment uses an attention-enhancing seq2seq framework to translate the network from the node attribute sequence to the node identity sequence.
Table 1. Statistics of the real network data sets used to verify the validity of the model

Data set   Nodes   Edges   Attribute feature dimension   Classes
Cora       2708    5429    1433                          7
Citeseer   3327    4732    3703                          6
Wiki       2405    17981   4973                          17
As shown in Table 1, the data sets are as follows:
Cora and Citeseer are two paper citation networks in which nodes are articles and edges represent citations between articles. The attributes of each node are extracted from the title and abstract of the corresponding article and represented as TF-IDF vectors. Wiki is a web-page network in which nodes represent web pages and edges represent hyperlinks between pages; node attributes are likewise represented as TF-IDF vectors.
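For illustration, node attribute vectors of this kind are commonly built with scikit-learn's TfidfVectorizer; the snippet below is a generic sketch with made-up documents, not the patent's preprocessing pipeline:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["graph embedding with deep autoencoders",       # one document per node,
        "attention based sequence to sequence models",  # e.g. a paper's title
        "random walks capture network structure"]       # plus abstract
attrs = TfidfVectorizer().fit_transform(docs)  # one row of TF-IDF weights per node
print(attrs.shape)  # (3, vocabulary size)
```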
Table 2. Architecture of the multi-modal attribute perception module in this model

Data set   Neurons per encoder layer
Cora       1433-1300-1200
Citeseer   3703-2500-1200
Wiki       4973-2500-1200
As shown in Table 2, the specific architecture is as follows:
Because a deep autoencoder is designed in the multi-modal attribute perception module of this embodiment, a different autoencoder structure is designed for each data set. For the Cora data set, whose attribute feature dimension is 1433, the encoding side has three layers with 1433, 1300, and 1200 neurons in sequence; the decoding side mirrors it, also with three layers, of 1200, 1300, and 1433 neurons in sequence. The autoencoder structures for the other two data sets follow the same pattern.
Table 3. Architecture of the multi-hop structure perception module in this model

Data set                             Cora   Citeseer   Wiki
Embedding dimension                  1200   1200       1200
Forward encoder layer dimension      500    500        500
Number of forward encoder layers     1      2          1
Backward encoder layer dimension     500    500        500
Number of backward encoder layers    1      2          1
Context vector dimension             1000   2000       1000
Decoder layer dimension              1000   2000       1000
Prediction layer dimension           2708   3327       2405
As shown in Table 3, the details are as follows:
In the multi-hop structure perception module, this embodiment designs an attention-augmented seq2seq framework, so the structure is customized for each data set. For the Cora data set, the output dimension of the final encoding layer of the previous module is 1200, which serves as this module's input dimension; the forward encoder dimension is set to 500 with one layer, and the backward encoder dimension to 500 with one layer, so the final Bi-LSTM output dimension is 1000. The decoder output dimension is accordingly designed to be 1000, and the prediction layer output dimension to be 2708, i.e., the number of nodes in the Cora data set. The structures for the other two data sets, shown in Table 3, follow the same reasoning.
Table 4 compares the node classification performance of this model (hereinafter abbreviated DASE) with existing baseline models on the three real network data sets (Tables 4.1, 4.2, and 4.3 give the experimental results on Cora, Citeseer, and Wiki, respectively):
Tables 4.1 (Cora), 4.2 (Citeseer), and 4.3 (Wiki): node classification results of DASE and the baseline models. The numerical tables are provided as images in the original publication and are not reproduced here.
To evaluate the performance of the model comprehensively, this embodiment randomly selects between 10% and 50% of the labeled nodes for training, with the remainder as the test set. The process is repeated 10 times, and the average performance is reported as Micro-F1. The results show, first, that DASE consistently attains the best performance among all methods. Second, it is worth noting that DASE achieves significant gains even when few labeled nodes are used for training: for example, on Cora and Citeseer, DASE improves by 2.34% and 3.92%, respectively.
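A sketch of this evaluation protocol, assuming node embeddings and labels as NumPy arrays; the choice of logistic regression as the downstream classifier is an assumption, as the patent does not name the classifier:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

def evaluate(embeddings, labels, train_ratio=0.1, repeats=10, seed=0):
    """Sample a fraction of labeled nodes for training, test on the rest,
    repeat, and report the average Micro-F1."""
    scores = []
    for r in range(repeats):
        X_tr, X_te, y_tr, y_te = train_test_split(
            embeddings, labels, train_size=train_ratio, random_state=seed + r)
        clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
        scores.append(f1_score(y_te, clf.predict(X_te), average="micro"))
    return float(np.mean(scores))

# Toy usage with random embeddings and labels.
emb, lab = np.random.rand(200, 16), np.random.randint(0, 7, 200)
print(evaluate(emb, lab, train_ratio=0.1))
```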
Table 5 presents a significance analysis of the classification results of this model against the other baseline models on the three real datasets:
Table 5: significance analysis (paired t-test p-values) between DASE and the baseline models on the three datasets. The table is provided as an image in the original publication and is not reproduced here.
To demonstrate that DASE is indeed statistically superior to the baseline models, this embodiment computes paired t-tests (confidence level α = 0.05) to verify statistical significance against the 6 baseline algorithms. In Table 5, each entry is the p-value of a paired t-test between two algorithms, and a p-value below 0.05 indicates that the difference is statistically significant. As the table shows, the model of this embodiment differs significantly from the other models on all three data sets, except for the F1 comparison between DASE and STNE on Citeseer (p-value 0.1661).
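The significance test can be reproduced with SciPy's paired t-test; the scores below are placeholder numbers, not the reported results:

```python
from scipy import stats

# Paired t-test between per-split Micro-F1 scores of two methods
# (illustrative numbers only).
dase = [0.81, 0.83, 0.80, 0.82, 0.84]
stne = [0.78, 0.80, 0.79, 0.80, 0.81]
t, p = stats.ttest_rel(dase, stne)
print(f"t = {t:.3f}, p = {p:.4f}",
      "-> significant at alpha = 0.05" if p < 0.05 else "-> not significant")
```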
Fig. 3 shows bar graphs (results on the Cora, Citeseer, and Wiki datasets, from left to right) of the effect on the node classification task of different attribute feature embedding dimensions (d) input into the multi-hop structure perception module. The analysis is as follows:
To investigate the effect of the attribute feature embedding dimension on the three datasets as the percentage (r) of labeled training nodes varies from 10% to 50%, this embodiment varies the dimension from 200 up to 1200 or 1600. As the figure shows, the performance of DASE on Cora is stable across dimensions.
On Wiki and Citeseer, however, performance fluctuates, and the optimal embedding dimension is 1200. The reason is that the embedding dimension is the input size of the Bi-LSTM encoder layer: when it is too low, feature information is lost; conversely, when it is too high, a large amount of noise is introduced.
Fig. 4 shows bar graphs (results on the Cora, Citeseer, and Wiki data sets, from left to right) of the effect on the node classification task of different walk lengths (l) in the random walks. The analysis is as follows:
To investigate the effect of the random walk length on the three data sets as the percentage (r) of labeled training nodes varies from 10% to 50%, this embodiment varies the length from 6 to 20. As the figure shows, the performance of DASE on Cora is stable across lengths. For Citeseer and Wiki, performance initially improves as the length increases; however, once the walk length exceeds 10, performance begins to decline slowly.
FIG. 5 shows line graphs (results on the Cora, Citeseer, and Wiki datasets, from left to right) of the differences in the node classification task on the three datasets when different node embedding representations are considered. The analysis is as follows:
The model offers three node representations: the output of the Bi-LSTM encoder (e), the output layer of the LSTM decoder (d), and the combination of the encoder and decoder outputs (ed). To explore the influence of these different node representations, node classification experiments were performed on the three data sets. As shown, the representation taken from the encoder output outperforms the other node representations on all data sets.
FIG. 6 shows line graphs (results on the Cora, Citeseer, and Wiki datasets, from left to right) of ablation experiments on different modules of the model for the node classification task. The analysis is as follows:
To fully analyze the effectiveness and efficiency of the model, three variants were tested on the three data sets, denoted enhancement matrix, self-encoder, and attention, indicating respectively that the attribute enhancement matrix, the autoencoder, and the attention mechanism were removed from the model. The classification results shown in the figure demonstrate that DASE achieves the best performance compared with all variants.
In summary, the invention provides a network embedding method based on attribute and structure deep fusion, and a model thereof. The model consists of two modules: the multi-modal attribute perception module initializes and reconstructs node attribute feature representations to capture the similarity between the low-order structure and node attributes, and the multi-hop structure perception module translates the attribute information sequence into the node identity sequence to capture the similarity between high-order structural semantics and node attributes. Owing to the tight coupling of the two modules, the resulting node embedding vectors deeply fuse network structure and node attribute information. Compared with prior network embedding techniques, the method is practically meaningful, deeply fuses network structure and node attribute information, is robust by design, is easy to implement, and runs more efficiently.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The device disclosed by the embodiment corresponds to the method disclosed by the embodiment, so that the description is simple, and the relevant points can be referred to the method part for description.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (7)

1. A network embedding method based on attribute and structure deep fusion is characterized by comprising the following steps:
s1, fusing node attribute characteristics of all neighbor nodes of a current node through a heuristic method to obtain fused node attribute characteristics, and encoding and decoding the node attribute characteristics of each node in a network by using a depth automatic encoder so as to reconstruct the decoded node attribute characteristics and the fused node attribute characteristics to obtain node attribute characteristics of a reconstructed encoding layer;
s2, fusing the node attribute characteristics of the reconstructed coding layer with a node sequence generated by random walk to obtain a node attribute information sequence with the node attribute characteristics;
s3, inputting the node attribute information sequence into a self-translation frame, and constructing a self-translation process from the node attribute information sequence to a node identity sequence to obtain node embedded vector representation capable of retaining the network structure and the node attribute information of the original network;
and S4, verifying through the real network data set.
2. The network embedding method based on attribute and structure deep fusion of claim 1, wherein in S1 the fused node attribute features are obtained as follows: each node's attribute features are updated to the median of the attribute features of all neighbor nodes of the current node, the fused node attribute features being expressed as

$$\hat{x}_i^{(k)} = \operatorname{median}\left\{ x_j^{(k)} \mid u_j \in N(i) \right\}$$

where $N(i)$ is the neighbor set of node $u_i$, $|N(i)|$ is the number of neighbors of node $u_i$, and $k$ indexes the $k$-th dimension of the node attribute feature vector.
3. The network embedding method based on attribute and structure deep fusion of claim 1, wherein in S3 the self-translation process from the node attribute information sequence to the node identity sequence specifically comprises:
encoding the node attribute information sequence, introducing an attention mechanism to assign weights to the feature information learned by the encoder, then decoding, and finally realizing the self-translation process from the entire node attribute information sequence to the node identity sequence.
4. The network embedding method based on attribute and structure deep fusion of claim 3, wherein the hidden layer representation output after encoding the node attribute information sequence is taken as the network node embedding vector representation.
5. A network embedding model based on attribute and structure deep fusion, characterized by comprising: a multi-modal attribute perception module, an attribute embedding layer, and a multi-hop structure perception module, the multi-modal attribute perception module being connected with the multi-hop structure perception module through the attribute embedding layer;
the multi-modal attribute perception module is used for fusing the node attribute features of all neighbor nodes of the current node by a heuristic method to obtain fused node attribute features, encoding and decoding the node attribute features of each node in the network with a deep autoencoder, and then reconstructing the decoded node attribute features against the fused node attribute features to obtain node attribute features of a reconstructed coding layer;
the attribute embedding layer is used for fusing the node attribute features of the reconstructed coding layer obtained by the multi-modal attribute perception module with a node sequence generated by random walks to obtain a node attribute information sequence carrying the node attribute features;
and the multi-hop structure perception module is used for inputting the node attribute information sequence obtained by the attribute embedding layer into a self-translation framework and constructing a self-translation process from the node attribute information sequence to the node identity sequence, so as to obtain node embedding vector representations that preserve the network structure and node attribute information of the original network.
6. The network embedding model based on attribute and structure deep fusion of claim 5, wherein the multi-modal attribute perception module comprises a neighbor node fusion unit, a deep autoencoder, and a reconstruction unit, connected in sequence;
the neighbor node fusion unit is used for obtaining the fused node attribute features;
the deep autoencoder comprises a first encoding layer and a first decoding layer, where the first encoding layer is used for encoding the node attribute features of each node in the network, and the first decoding layer is used for decoding the encoded node attribute features to obtain decoded node attribute features;
and the reconstruction unit is used for reconstructing the decoded node attribute features against the fused node attribute features.
7. The network embedding model based on attribute and structure deep fusion of claim 6, wherein the multi-hop structure perception module comprises a self-translation framework and an attention layer;
the self-translation framework comprises a second coding layer, a second decoding layer, a translation layer, and a softmax layer;
the second coding layer is used for encoding the node attribute information sequence;
the attention layer is arranged between the second coding layer and the second decoding layer and is used for introducing an attention mechanism into the self-translation framework to assign weights to the feature information learned after encoding;
the second decoding layer is used for decoding the weighted, encoded node attribute information sequence;
the translation layer is used for translating the node semantic feature vectors produced by the second decoding layer into a node identity sequence;
and the softmax layer is used for converting the feature vectors produced by the translation layer into probability values.
CN202010410196.7A 2020-05-15 2020-05-15 Network embedding method based on attribute and structure depth fusion and model thereof Active CN111598223B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010410196.7A CN111598223B (en) 2020-05-15 2020-05-15 Network embedding method based on attribute and structure depth fusion and model thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010410196.7A CN111598223B (en) 2020-05-15 2020-05-15 Network embedding method based on attribute and structure depth fusion and model thereof

Publications (2)

Publication Number Publication Date
CN111598223A true CN111598223A (en) 2020-08-28
CN111598223B CN111598223B (en) 2023-10-24

Family

ID=72182421

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010410196.7A Active CN111598223B (en) 2020-05-15 2020-05-15 Network embedding method based on attribute and structure depth fusion and model thereof

Country Status (1)

Country Link
CN (1) CN111598223B (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110289063A1 (en) * 2010-05-21 2011-11-24 Microsoft Corporation Query Intent in Information Retrieval
CN107516110A (en) * 2017-08-22 2017-12-26 华南理工大学 A kind of medical question and answer Semantic Clustering method based on integrated convolutional encoding
US20200134428A1 (en) * 2018-10-29 2020-04-30 Nec Laboratories America, Inc. Self-attentive attributed network embedding
CN110598061A (en) * 2019-09-20 2019-12-20 东北大学 Multi-element graph fused heterogeneous information network embedding method
CN111104797A (en) * 2019-12-17 2020-05-05 南开大学 Paper network representation learning method based on dual sequence-to-sequence generation
CN111127146A (en) * 2019-12-19 2020-05-08 江西财经大学 Information recommendation method and system based on convolutional neural network and noise reduction self-encoder

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Liu Jie, et al., "Content to Node: Self-Translation Network Embedding," KDD '18: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1794-1802. *
Liu Zhengming, et al., "Network representation learning algorithm integrating node description attribute information," Computer Applications (计算机应用), vol. 39, no. 4, pp. 1012-1020. *

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112541340A (en) * 2020-12-18 2021-03-23 昆明理工大学 Weak supervision involved microblog evaluation object identification method based on variation double-theme representation
CN112541340B (en) * 2020-12-18 2021-11-23 昆明理工大学 Weak supervision involved microblog evaluation object identification method based on variation double-theme representation
WO2022160431A1 (en) * 2021-01-26 2022-08-04 中山大学 Attribute heterogeneous network embedding method, apparatus, and device, and medium
CN116094952A (en) * 2023-01-04 2023-05-09 中国联合网络通信集团有限公司 Method, device, equipment and storage medium for determining network structure similarity
CN116094952B (en) * 2023-01-04 2024-05-14 中国联合网络通信集团有限公司 Method, device, equipment and storage medium for determining network structure similarity

Also Published As

Publication number Publication date
CN111598223B (en) 2023-10-24


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant