CN110674922A - Network representation obtaining method based on deep learning - Google Patents
- Publication number
- CN110674922A (application CN201910747332.9A)
- Authority
- CN
- China
- Prior art keywords
- vector
- network
- sequence
- node
- time
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
Abstract
The invention provides a network representation obtaining method based on deep learning, comprising the following steps. Step 1: obtain a network containing node content, the network comprising |V| nodes; select one of the |V| nodes as the current root node and perform a random walk over the network starting from it, obtaining a content sequence and a node identification sequence. Step 2: input the content sequence into a deep learning model based on an attention mechanism to obtain a predicted identification vector sequence, from which the network characterization vector is obtained. The invention applies research results of deep learning in the machine translation direction to network characterization learning, fusing the content and the structure of the network from a machine-translation perspective to obtain a suitable network characterization vector.
Description
Technical Field
The invention belongs to the field of machine learning, and particularly relates to a network representation acquisition method based on deep learning.
Background
In real social life, many complex systems can be represented as networks: nodes in a network represent data samples, and edges represent the relationships between them. In a social network, for example, people are the nodes and interpersonal relationships are the edges; in a citation network, documents are the nodes and the citing/cited relationships between them are the edges; in the Internet of Things, interconnected devices such as sensors and controllers are the nodes and the connections between devices are the edges. With this flexible yet powerful representation capability, networks have become a direct and primary means of representing the many complex systems of the big-data era. To better apply the data produced by such network-structured systems in various industry fields, how to represent a network effectively and accurately has become a hot spot of current research. Traditional representations based on the network topology show obvious shortcomings as network scale grows exponentially in the big-data era: the large number of iterative and combinatorial computation steps greatly complicates network analysis and processing, and the strong coupling between nodes makes distributed computing, parallel computing, and grid computing difficult to apply to network data.
Following the continuous development of network characterization learning, its methods can be divided, from the viewpoint of their development, into two main categories: traditional graph-structure-based network characterization learning methods and graph embedding (Network Embedding) methods.
Traditional graph-structure-based network characterization learning methods, such as those based on the adjacency matrix, have obvious disadvantages. First, strong coupling: storing a network in the traditional representation leaves its nodes strongly coupled, making distributed computation on individual nodes difficult. Second, high computational complexity and wasted storage: with the arrival of the big-data era, data scale has grown rapidly and the relationships between data have become more complex, so the complexity of iterative and combinatorial computation increases exponentially, making network processing and analysis difficult. Meanwhile, storing all edges of massive numbers of nodes in an edge set wastes storage resources.
Among graph-embedding network characterization learning methods, Bryan Perozzi et al. proposed DeepWalk, the first graph embedding method to apply deep learning. In essence, it uses random walks to capture the local context of nodes in the network; since the distribution of nodes in short random walks resembles the distribution of words in natural language, a Skip-Gram model is used to learn a representation vector for each node that captures its local structural information. Although the algorithm applies deep learning, it has two shortcomings: it lacks a clear optimization objective, and its short-distance walks over the network capture only local structure and lack globality. Considering that such learned characterization vectors capture only local structure, Tang et al. proposed LINE, an algorithm that combines the global and local structure of the network: it is suitable for larger networks, preserves first-order and second-order similarities, and optimizes the Skip-Gram model with negative sampling. Its disadvantage is that LINE learns the local and global information of the network separately and finally simply concatenates the two representation vectors, which cannot fuse the local and global structure well. A later graph embedding method based on a semi-supervised model (attributed in the original to Trepeng et al.) maintains the proximity of a node's 2-hop neighbors through an encoder architecture, rather than simply merging local and global structure by concatenation. But its effect is less than ideal when the network characterization vector also needs to represent the content of the node itself.
When both the structure and the content of the network are considered, the STNE algorithm adopts an Encoder-Decoder model based on a recurrent neural network (LSTM) and can obtain a characterization vector covering both the structure and the content of the network. Its disadvantage is that the algorithm cannot accurately translate the node content vectors (content vectors) into node identification vectors (identity vectors).
Therefore, the above-mentioned conventional graph embedding method cannot well represent an accurate token vector under the condition of covering the network structure and content.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a network characterization acquisition method based on deep learning, to solve the technical problem that the prior art cannot express an accurate characterization vector well.
In order to solve the technical problem, the application adopts the following technical scheme:
compared with the prior art, the invention has the beneficial technical effects that:
1. the invention applies the research result of the deep learning technology in the machine translation direction to the network characterization learning, and fuses the content and the structure of the network from the machine translation angle to obtain a proper network characterization vector;
2. the invention adds an attention mechanism, so that the network characterization vector is more accurate.
Drawings
FIG. 1 is an overall flow chart of the present invention;
FIG. 2 is a schematic diagram of an Encoder-Decoder model with attention mechanism;
FIG. 3 is a comparison graph of the results of MRR experiments in the prior art and the method provided by the present invention.
The present invention is explained in further detail below with reference to the drawings and embodiments.
Detailed Description
The following embodiments of the present invention are provided, and it should be noted that the present invention is not limited to the following embodiments, and all equivalent changes based on the technical solutions of the present invention are within the protection scope of the present invention.
Random walk: the conservation quantities carried by any irregular walker correspond to a diffusion transport law respectively, are close to the Brownian motion, and are ideal mathematical states of the Brownian motion.
Long short-term memory (LSTM) network: a gated recurrent neural network capable of processing sequence data. Gates with different functions control how the recurrent state is updated, which avoids the vanishing-gradient and exploding-gradient problems of a plain RNN. The LSTM memory cell stores information, so context can be preserved well over long distances.
Attention vector: from the attention mechanism, which allows the decoder to consider the entire encoder output hidden state sequence at each time step, the encoder keeps more information scattered across all the hidden state vectors, and the decoder can decide which vectors are of more interest when using these hidden vectors.
Example one
The embodiment provides a network representation obtaining method based on deep learning, as shown in fig. 1, including the following steps:
step 1, acquiring a network containing node content, wherein the network containing the node content comprises | V | nodes, and | V | ≠ 0;
step 2, selecting one node from the |V| nodes as the current root node, and performing a random walk over the network containing node content starting from it, obtaining N random walk sequences, where N is a positive integer. Each random walk sequence comprises a content sequence and a node identification sequence: the nth content sequence is {c^n_1, c^n_2, ..., c^n_q, ..., c^n_T} and the nth node identification sequence is {i^n_1, i^n_2, ..., i^n_q, ..., i^n_T}, where T denotes the total number of steps of the random walk, n = 1, 2, ..., N, c^n_q denotes the content vector of the qth node, and i^n_q denotes the identification element of the qth node;
Taking each of the |V| nodes in the network as a root node in turn, the random walks jointly produce the N random walk sequences.
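The walk generation described above can be sketched as follows. This is a minimal illustration under assumptions the patent does not fix (uniform neighbor sampling, restart at the root on a dead end); all function and variable names are ours.

```python
import random

random.seed(42)  # reproducible toy walk

def random_walk(adj, root, T):
    """Walk T steps from root, sampling a uniformly random neighbor at each step."""
    walk = [root]
    node = root
    for _ in range(T - 1):
        neighbors = adj[node]
        node = random.choice(neighbors) if neighbors else root  # restart on a dead end
        walk.append(node)
    return walk

# toy network: 4 nodes with adjacency lists
adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
walk = random_walk(adj, root=0, T=5)        # this is the node identification sequence
contents = {0: "title a", 1: "title b", 2: "title c", 3: "title d"}
content_seq = [contents[v] for v in walk]   # the paired content sequence
```

In the method, one such pair of sequences is produced per root node, giving the N walks of step 2.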
Since the nodes in the content sequence are all raw data, such as the abstract and the title of an article, these raw nodes need to be processed to convert them into vector representations.
The content sequence {c^n_1, ..., c^n_T} is vectorized to obtain the content characterization vectors, each of dimension m;
In this embodiment, the mapping function used to vectorize the content sequence is either a fully connected layer or the Hash Trick method; when the dimension of the data set is larger than 3000, the Hash Trick is recommended, otherwise the fully connected layer suffices.
At the content embedding layer in FIG. 2, the initial input is the content sequence {c^n_1, ..., c^n_T}, where each c^n_q holds the original text content (the abstract and title extracted from an article), e.g.: [3, [Neural Collaborative Filtering In recent years, deep neural networks have yielded immense success on speech recognition……]].
Using the Hash Trick method, each node in the content sequence is mapped to a 200-dimensional (d-dimensional) vector representation, which vectorizes the content sequence; the form is the same as above except that the content part becomes [id_j, dimension_d] values, e.g.: [3, 0.1, 0.2, 0.4……].
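The Hash Trick mapping can be sketched as follows. This is a minimal signed feature-hashing variant; the hash function (`zlib.crc32`) and the signing scheme are illustrative assumptions, since the patent does not specify them.

```python
import zlib

def hash_trick(tokens, dim=200):
    """Signed feature hashing: map a bag of tokens to a fixed dim-dimensional vector."""
    vec = [0.0] * dim
    for tok in tokens:
        h = zlib.crc32(tok.encode("utf-8"))          # deterministic 32-bit hash
        idx = h % dim                                # bucket index in the output vector
        sign = 1.0 if (h >> 31) & 1 == 0 else -1.0   # sign bit reduces collision bias
        vec[idx] += sign
    return vec

v = hash_trick("neural collaborative filtering deep neural networks".split())
```

The output dimension is fixed regardless of vocabulary size, which is why the method suits data sets with very high raw dimensionality.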
Let time t equal to 1;
step 3, input the content characterization vectors into a bidirectional long short-term memory recurrent neural network (Bi-LSTM) to obtain the sequence of hidden state vectors for forward propagation at time t and the sequence of hidden state vectors for backward propagation at time t; linking the forward-propagation and backward-propagation hidden state vector sequences at time t gives the time-t hidden state vector sequence {h_t1, h_t2, ..., h_tq, ..., h_tT}, where t denotes the time and h_tq denotes the qth element of the hidden state vector sequence. The forward and backward hidden vectors each have dimension d, so h_tq has dimension 2d;
The machine translation module containing the attention mechanism constructed in the invention comprises a Content Embedding Layer, an Encoder Layer, an Attention Layer, and a Decoder Layer, connected in sequence.
In the invention, in order to obtain the node identification sequence better, a bidirectional LSTM (Bi-LSTM) is used as the Encoder layer.
The content characterization vectors are input to the Bi-LSTM, whose built-in gating functions produce the hidden state vector sequence for forward propagation and the hidden state vector sequence for backward propagation, as shown in FIG. 2. Here, linking means directly concatenating two vectors: e.g. for a = [1,2,3] and b = [4,5,6], the operation yields [1,2,3,4,5,6].
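The forward pass, backward pass, and linking of hidden states can be sketched as below. For brevity a plain tanh-RNN cell stands in for the LSTM cell; the toy dimensions and all names are our assumptions.

```python
import math
import random

random.seed(0)
d = 4  # hidden size per direction
m = 3  # content-vector dimension

def rnn_step(x, h, Wx, Wh):
    """One tanh-RNN step (stands in for the LSTM cell to keep the sketch short)."""
    return [math.tanh(sum(Wx[i][k] * x[k] for k in range(len(x))) +
                      sum(Wh[i][k] * h[k] for k in range(d)))
            for i in range(d)]

def rand_mat(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

Wx_f, Wh_f = rand_mat(d, m), rand_mat(d, d)   # forward-direction weights
Wx_b, Wh_b = rand_mat(d, m), rand_mat(d, d)   # backward-direction weights

xs = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]  # T = 3 content vectors

h = [0.0] * d; fwd = []
for x in xs:                        # forward pass over the sequence
    h = rnn_step(x, h, Wx_f, Wh_f); fwd.append(h)
h = [0.0] * d; bwd = []
for x in reversed(xs):              # backward pass over the reversed sequence
    h = rnn_step(x, h, Wx_b, Wh_b); bwd.append(h)
bwd.reverse()                       # realign with the original order

H = [f + b for f, b in zip(fwd, bwd)]   # link: each h_tq has dimension 2d
```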
To make the node vector representations in the node identification sequence more accurate, and because different node vectors influence the output sequence of the decoding layer to different degrees, an attention layer is added to the model.
step 4, randomly initialize a global vector u_ω of dimension d, and input the global vector u_ω together with the hidden state vector sequence {h_t1, h_t2, ..., h_tq, ..., h_tT} into the attention layer to obtain the attention vector c_t;
The attention vector c_t in step 4 is obtained by formula (2):

u_tq = tanh(W_ω h_tq + b_ω),  α_tq = exp(u_tqᵀ u_ω) / Σ_p exp(u_tpᵀ u_ω),  c_t = Σ_q α_tq h_tq   (2)

In formula (2), α_tq denotes the degree of contribution of the qth vector h_tq at time t to the attention vector c_t; W_ω denotes the weight matrix of the attention layer and b_ω its bias term.
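The attention computation of formula (2) can be sketched as follows. The softmax normalization over u_tqᵀu_ω is the standard global-context-vector form and is a reconstruction consistent with the terms the text defines; all names are ours.

```python
import math

def attention(H, u_w, W_w, b_w):
    """Formula (2) sketch: u_tq = tanh(W_w h_tq + b_w); alpha = softmax(u_tq . u_w); c_t = sum_q alpha_tq h_tq."""
    d2 = len(H[0])                       # hidden-state dimension (2d in the patent)
    us = [[math.tanh(sum(W_w[i][k] * h[k] for k in range(d2)) + b_w[i])
           for i in range(len(b_w))] for h in H]
    scores = [sum(u[i] * u_w[i] for i in range(len(u_w))) for u in us]
    mx = max(scores)                     # stabilized softmax
    exps = [math.exp(s - mx) for s in scores]
    Z = sum(exps)
    alphas = [e / Z for e in exps]
    c = [sum(alphas[q] * H[q][k] for q in range(len(H))) for k in range(d2)]
    return c, alphas

H = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # toy hidden states, T = 3
u_w = [0.5, -0.2]                         # global context vector
W_w = [[0.1, 0.0], [0.0, 0.1]]
b_w = [0.0, 0.0]
c_t, alphas = attention(H, u_w, W_w, b_w)
```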
step 5, when t = 1, the attention vector c_t and an initial input vector w are input into a long short-term memory recurrent neural network (LSTM) to obtain the node identification vector d_1, where w is formed by linking the last hidden state of the forward pass with the first hidden state of the backward pass, and d_1 has dimension d;
when t > 1, the attention vector c_t and d_{t-1} are input into the long short-term memory recurrent neural network to obtain the node identification vector d_t, which has dimension d;
The LSTM serves as the Decoder layer in the invention. After the preceding operations, the Decoder layer's initial input vector w and the attention vector c_t corresponding to the Decoder layer's output vector d_t at each time have been obtained. The initial input vector w contains information from the content sequence, so using it as the decoder layer's initial input makes the final output more accurate. In this embodiment, at every time step the Decoder layer takes its input vector together with the attention vector c_t as a joint high-level representation, yielding the decoder layer output sequence D = {d_1, d_2, ..., d_t, ..., d_T}.
step 6, let t = t + 1 and repeat steps 3 to 5 until t = T, obtaining the decoder layer output sequence D = {d_1, d_2, ..., d_t, ..., d_T};
Each node identification vector d_t in the output sequence D = {d_1, d_2, ..., d_t, ..., d_T} is mapped to a |V|-dimensional node identification vector d′_t, and d′_t is normalized to obtain the probability vector p_t, whose jth element is p_t(j), j = 1, 2, ..., |V|;
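The normalization into a probability vector is ordinarily done with a softmax; a minimal sketch (the softmax choice is an assumption, since the text only says "normalization"):

```python
import math

def softmax(z):
    """Normalize a |V|-dimensional score vector into a probability vector."""
    mx = max(z)                       # subtract the max for numerical stability
    exps = [math.exp(v - mx) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

scores = [2.0, 1.0, 0.1]              # d'_t mapped to |V| = 3 dimensions
p = softmax(scores)                   # p_t(j): probability that the step's node is node j
```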
step 7, take each of the |V| nodes in turn as the current root node and repeat steps 1 to 6, then compute the loss value L by formula (1):

L = − Σ_{n=1}^{N} Σ_{q=1}^{T} Σ_{j=1}^{|V|} 1(i^n_q = j) · log p_q(j)   (1)

In formula (1), v_q denotes the qth node in the nth random walk sequence S_n, and 1(i^n_q = j) is a binary function that outputs 1 when the identification element i^n_q of the qth node in the nth random walk sequence equals j and outputs 0 otherwise.
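The loss of formula (1) reduces to a per-step cross-entropy over the walks, since only the j matching the true identification element survives the inner sum; a minimal numeric sketch (names and toy numbers are ours):

```python
import math

def walk_loss(prob_seqs, id_seqs):
    """Formula (1) as cross-entropy: -sum over walks n and steps q of log p_q(true id)."""
    total = 0.0
    for probs, ids in zip(prob_seqs, id_seqs):   # one walk at a time
        for p_q, true_id in zip(probs, ids):     # one step at a time
            total -= math.log(p_q[true_id])      # only the j = i^n_q term survives
    return total

# one walk of T = 2 steps over |V| = 3 nodes
probs = [[[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]]
ids = [[0, 1]]
L_val = walk_loss(probs, ids)
```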
If the loss value L is less than or equal to the preset threshold P, then D = {d_1, d_2, ..., d_t, ..., d_T} is the network characterization vector; specifically, the preset threshold P may be 0.001.
Otherwise, adjust the weight matrices in the bidirectional long short-term memory recurrent neural network and the long short-term memory recurrent neural network, and repeat steps 1 to 7 until the loss value L is less than or equal to the preset threshold P; at that point D = {d_1, d_2, ..., d_t, ..., d_T} is the network characterization vector.
Example two
In this embodiment, the deep-learning-based network representation acquisition method provided by the invention is verified experimentally on the AAN data set, which contains 17667 articles and 107879 citation relations (the edges between nodes in the network); each element of the data set is a processed article consisting of the abstract and title of the original. For each query article, the nodes directly connected to it are randomly split at a ratio of 1:9 into hidden articles and seed articles; after removing 584 query articles and the isolated articles, a new citation network with 16791 nodes and 88617 edges is formed, and the Hash Trick method is used to reduce dimensionality and vectorize the nodes.
The method provided by the invention is compared with four classical algorithms on the AAN data set: HOPE, Node2Vec, SDNE, and GraRep.
FIG. 3 shows the comparison of the average MRR scores of the five methods; MRR (mean reciprocal rank) measures the ranking quality of a recommendation system. As FIG. 3 shows, the MRR score of the method provided by the invention is higher than those of the four classical algorithms, and the gap between the lowest-scoring algorithm, SDNE, and the proposed method is 0.16; that is, the network characterization acquisition method provided by the invention acquires the most accurate network characterization vector.
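The MRR metric used in the comparison can be computed as follows (a standard mean-reciprocal-rank sketch with toy data; the experiment's actual queries and rankings are not reproduced here):

```python
def mrr(ranked_lists, relevant_sets):
    """Mean reciprocal rank: average over queries of 1/rank of the first relevant item."""
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for pos, item in enumerate(ranked, start=1):
            if item in relevant:
                total += 1.0 / pos
                break                  # only the first relevant hit counts
    return total / len(ranked_lists)

# two toy queries: first hit at rank 2 (0.5) and at rank 1 (1.0), mean = 0.75
score = mrr([["a", "b", "c"], ["x", "y"]], [{"b"}, {"x"}])
```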
Claims (3)
1. A network characterization acquisition method based on deep learning is used for acquiring a characterization vector of a network to be characterized, and is characterized by comprising the following steps:
step 1, obtaining a network to be characterized, wherein the network to be characterized comprises |V| nodes and |V| ≠ 0; selecting any one node from the |V| nodes as the current root node, and executing step 2;
step 2, carrying out random walk on the network to be characterized by taking the current root node as a starting point to obtain N random walk sequences, wherein N is a positive integer;
wherein the nth random walk sequence S_n comprises the nth content sequence {c^n_1, c^n_2, ..., c^n_q, ..., c^n_T} and the nth node identification sequence {i^n_1, i^n_2, ..., i^n_q, ..., i^n_T}, wherein q = 1, 2, ..., T, T denotes the total number of nodes in the random walk, n = 1, 2, ..., N, c^n_q denotes the content vector of the qth node, c denoting content, and i^n_q denotes the identification element of the qth node, i denoting identification;
vectorizing the nth content sequence {c^n_1, ..., c^n_T} to obtain the content characterization vectors;
step 3, inputting the content characterization vectors into a bidirectional long short-term memory recurrent neural network to obtain the hidden state vector sequence for forward propagation at time t and the hidden state vector sequence for backward propagation at time t; linking the two sequences to obtain the hidden state vector sequence at time t, wherein t denotes the time, and t = 1 when step 3 is executed for the first time;
step 4, randomly setting a global vector and inputting the global vector together with the time-t hidden state vector sequence obtained in step 3 into the attention layer to obtain the attention vector c_t;
step 5, when t = 1, inputting the attention vector c_t and an initial input vector into a long short-term memory recurrent neural network to obtain the node identification vector d_1; the initial input vector is obtained by linking the last element of the forward-propagation hidden state vector sequence at time t = 1 with the first element of the backward-propagation hidden state vector sequence at time t = 1;
when t > 1, inputting the attention vector c_t and the node identification vector d_{t-1} obtained in the previous execution of step 5 into the long short-term memory recurrent neural network to obtain the node identification vector d_t;
step 6, repeating steps 3 to 5 T−1 times until T node identification vectors are obtained, giving the decoder layer output sequence D = {d_1, d_2, ..., d_t, ..., d_T};
mapping each node identification vector in the output sequence D = {d_1, d_2, ..., d_t, ..., d_T} to a node identification vector of dimension |V| and normalizing it to obtain a probability vector, the jth element of which is p_t(j), j = 1, 2, ..., |V|;
step 7, after each of the |V| nodes has been taken as the current root node, repeating steps 2 to 6 N−1 times and computing the loss value L by formula (1):

L = − Σ_{n=1}^{N} Σ_{q=1}^{T} Σ_{j=1}^{|V|} 1(i^n_q = j) · log p_q(j)   (1)

wherein v_q denotes the qth node in the nth random walk sequence, and 1(i^n_q = j) is a binary function that outputs 1 when the identification element i^n_q of the qth node in the nth random walk sequence equals j and outputs 0 otherwise;
if the loss value L is less than or equal to the preset threshold P, then D = {d_1, d_2, ..., d_t, ..., d_T} is the network characterization vector;
otherwise, adjusting the weight matrices in the bidirectional long short-term memory recurrent neural network and the long short-term memory recurrent neural network, and repeating steps 1 to 7 until the loss value L is less than or equal to the preset threshold P, at which point D = {d_1, d_2, ..., d_t, ..., d_T} is the network characterization vector.
2. The deep-learning-based network characterization acquisition method according to claim 1, wherein the attention vector c_t in step 4 is obtained by formula (2):

u_tq = tanh(W_ω h_tq + b_ω),  α_tq = exp(u_tqᵀ u_ω) / Σ_p exp(u_tpᵀ u_ω),  c_t = Σ_q α_tq h_tq   (2)

wherein α_tq denotes the degree of contribution of the qth element of the time-t hidden state vector sequence to the attention vector c_t, h_tq denotes the qth element of the hidden state vector sequence, W_ω denotes the weight matrix of the attention layer, b_ω its bias term, and u_ω the global vector;
3. The deep learning-based network representation acquisition method according to claim 1, wherein the preset threshold P is 0.001.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910747332.9A CN110674922A (en) | 2019-08-14 | 2019-08-14 | Network representation obtaining method based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910747332.9A CN110674922A (en) | 2019-08-14 | 2019-08-14 | Network representation obtaining method based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110674922A true CN110674922A (en) | 2020-01-10 |
Family
ID=69068853
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910747332.9A Pending CN110674922A (en) | 2019-08-14 | 2019-08-14 | Network representation obtaining method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110674922A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111708881A (en) * | 2020-05-22 | 2020-09-25 | 国网天津市电力公司 | Text representation learning method introducing incidence relation |
CN113033104A (en) * | 2021-03-31 | 2021-06-25 | 浙江大学 | Lithium battery state of charge estimation method based on graph convolution |
CN113033104B (en) * | 2021-03-31 | 2022-05-24 | 浙江大学 | Lithium battery state of charge estimation method based on graph convolution |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||