CN110674922A - Network representation obtaining method based on deep learning - Google Patents
- Publication number
- CN110674922A (application CN201910747332.9A)
- Authority
- CN
- China
- Prior art keywords
- vector
- network
- sequence
- node
- time
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Image Analysis (AREA)
Abstract
The invention provides a network representation obtaining method based on deep learning, comprising the following steps. Step 1: obtain a network containing node content, the network comprising |V| nodes; select one of the |V| nodes as the current root node and perform a random walk over the network starting from it, obtaining a content sequence and a node identification sequence. Step 2: input the content sequence into a deep learning model based on an attention mechanism to obtain a predicted identification vector sequence, from which the network characterization vector is obtained. The invention applies research results of deep learning in the machine translation direction to network characterization learning, fusing the content and the structure of the network from a machine-translation perspective to obtain a suitable network characterization vector.
Description
Technical Field
The invention belongs to the field of machine learning, and particularly relates to a network representation acquisition method based on deep learning.
Background
In real social life, many complex systems can be represented as networks: nodes in a network represent data samples, and edges represent the relationships between them. In a social network, for example, people are the nodes and interpersonal relationships are the edges; in a citation network, documents are the nodes and the citing/cited relationships between them are the edges; in the Internet of Things, interconnected devices such as sensors and controllers are the nodes and the connections between devices are the edges. With this flexible yet powerful representation capability, networks have become a direct and primary means of representing the many complex systems of the big-data era. To better apply the data produced by such network-structured systems in various industry fields, how to represent a network effectively and accurately has become a hot spot of current research. Traditional representations based on the network topology show obvious shortcomings as network scale grows exponentially in the big-data era: the large number of iterative and combinatorial computation steps greatly complicates network analysis and processing, and the strong coupling between nodes makes distributed computing, parallel computing, and grid computing difficult to apply to network data.
Following the continuous development of network characterization learning, its methods can be divided, from the viewpoint of their development, into two main categories: traditional graph-structure-based network characterization learning methods and graph embedding (Network Embedding) methods.
Traditional graph-structure-based network characterization learning methods, such as those based on the adjacency matrix, have obvious disadvantages. First, strong coupling: storing a network in the traditional representation leaves its nodes strongly coupled, making distributed computation on individual nodes difficult. Second, high computational complexity and wasted storage: with the arrival of the big-data era, data scale has grown rapidly and the relationships between data have become more complex, so the complexity of iterative and combinatorial computation increases exponentially, making network processing and analysis difficult. Meanwhile, storing all edges of massive numbers of nodes in an edge set wastes storage resources.
Among graph-embedding network characterization learning methods, Bryan Perozzi et al. proposed DeepWalk, the first graph embedding method to apply deep learning. In essence, it uses random walks to capture the local context of nodes in the network; since the distribution of nodes in short random walks resembles the distribution of words in natural language, a Skip-Gram model is used to learn a representation vector for each node that captures its local structural information. Although the algorithm applies deep learning, it has two shortcomings: it lacks a clear optimization objective, and its short-distance walks over the network capture only local structure and lack globality. Considering that such learned characterization vectors capture only local structure, Tang et al. proposed LINE, an algorithm that combines the global and local structure of the network: it is suitable for larger networks, preserves first-order and second-order similarities, and optimizes the Skip-Gram model with negative sampling. Its disadvantage is that LINE learns the local and global information of the network separately and finally simply concatenates the two representation vectors, which cannot fuse the local and global structure well. A later graph embedding method based on a semi-supervised model (attributed in the original to Trepeng et al.) maintains the proximity of a node's 2-hop neighbors through an encoder architecture, rather than simply merging local and global structure by concatenation. But its effect is less than ideal when the network characterization vector also needs to represent the content of the node itself.
When both the structure and the content of the network are considered, the STNE algorithm adopts an Encoder-Decoder model based on a recurrent neural network (LSTM) and can obtain a characterization vector covering both the structure and the content of the network. Its disadvantage is that the algorithm cannot accurately translate the node content vectors (content vectors) into node identification vectors (identity vectors).
Therefore, the above-mentioned conventional graph embedding method cannot well represent an accurate token vector under the condition of covering the network structure and content.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a network characterization acquisition method based on deep learning, to solve the technical problem that the prior art cannot express an accurate characterization vector well.
In order to solve the technical problem, the application adopts the following technical scheme:
compared with the prior art, the invention has the beneficial technical effects that:
1. the invention applies the research result of the deep learning technology in the machine translation direction to the network characterization learning, and fuses the content and the structure of the network from the machine translation angle to obtain a proper network characterization vector;
2. the invention adds an attention mechanism, so that the network characterization vector is more accurate.
Drawings
FIG. 1 is an overall flow chart of the present invention;
FIG. 2 is a schematic diagram of an Encoder-Decoder model with attention mechanism;
FIG. 3 is a comparison graph of the results of MRR experiments in the prior art and the method provided by the present invention.
The present invention is explained in further detail below with reference to the drawings and embodiments.
Detailed Description
The following embodiments of the present invention are provided, and it should be noted that the present invention is not limited to the following embodiments, and all equivalent changes based on the technical solutions of the present invention are within the protection scope of the present invention.
Random walk: the conservation quantities carried by any irregular walker correspond to a diffusion transport law respectively, are close to the Brownian motion, and are ideal mathematical states of the Brownian motion.
Long short-term memory (LSTM) network: a gated recurrent neural network capable of processing sequence data. Gates with different functions control how the recurrent state is updated, which avoids the vanishing-gradient and exploding-gradient problems of a plain RNN. The LSTM memory cell stores information, so context can be preserved well over long distances.
Attention vector: from the attention mechanism, which allows the decoder to consider the entire encoder output hidden state sequence at each time step, the encoder keeps more information scattered across all the hidden state vectors, and the decoder can decide which vectors are of more interest when using these hidden vectors.
Example one
The embodiment provides a network representation obtaining method based on deep learning, as shown in fig. 1, including the following steps:
step 1, acquiring a network containing node content, wherein the network containing the node content comprises | V | nodes, and | V | ≠ 0;
step 2, selecting one node from the |V| nodes as the current root node, and performing a random walk over the network containing node content starting from it, obtaining N random walk sequences, where N is a positive integer. Each random walk sequence comprises a content sequence and a node identification sequence: the nth content sequence is {c^n_1, c^n_2, ..., c^n_q, ..., c^n_T} and the nth node identification sequence is {i^n_1, i^n_2, ..., i^n_q, ..., i^n_T}, where T denotes the total number of steps of the random walk, n = 1, 2, ..., N, c^n_q denotes the content vector of the qth node, and i^n_q denotes the identification element of the qth node;
Taking each of the |V| nodes in the network as a root node in turn, the random walks jointly produce the N random walk sequences.
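The walk generation described above can be sketched as follows. This is a minimal illustration under assumptions the patent does not fix (uniform neighbor sampling, restart at the root on a dead end); all function and variable names are ours.

```python
import random

random.seed(42)  # reproducible toy walk

def random_walk(adj, root, T):
    """Walk T steps from root, sampling a uniformly random neighbor at each step."""
    walk = [root]
    node = root
    for _ in range(T - 1):
        neighbors = adj[node]
        node = random.choice(neighbors) if neighbors else root  # restart on a dead end
        walk.append(node)
    return walk

# toy network: 4 nodes with adjacency lists
adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
walk = random_walk(adj, root=0, T=5)        # this is the node identification sequence
contents = {0: "title a", 1: "title b", 2: "title c", 3: "title d"}
content_seq = [contents[v] for v in walk]   # the paired content sequence
```

In the method, one such pair of sequences is produced per root node, giving the N walks of step 2.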
Since the nodes in the content sequence are all raw data, such as the abstract and the title of an article, these raw nodes need to be processed to convert them into vector representations.
The content sequence {c^n_1, ..., c^n_T} is vectorized to obtain the content characterization vectors, each of dimension m;
In this embodiment, the mapping function used to vectorize the content sequence is either a fully connected layer or the Hash Trick method; when the dimension of the data set is larger than 3000, the Hash Trick is recommended, otherwise the fully connected layer suffices.
At the content embedding layer in FIG. 2, the initial input is the content sequence {c^n_1, ..., c^n_T}, where each c^n_q holds the original text content (the abstract and title extracted from an article), e.g.: [3, [Neural Collaborative Filtering In recent years, deep neural networks have yielded immense success on speech recognition……]].
Using the Hash Trick method, each node in the content sequence is mapped to a 200-dimensional (d-dimensional) vector representation, which vectorizes the content sequence; the form is the same as above except that the content part becomes [id_j, dimension_d] values, e.g.: [3, 0.1, 0.2, 0.4……].
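The Hash Trick mapping can be sketched as follows. This is a minimal signed feature-hashing variant; the hash function (`zlib.crc32`) and the signing scheme are illustrative assumptions, since the patent does not specify them.

```python
import zlib

def hash_trick(tokens, dim=200):
    """Signed feature hashing: map a bag of tokens to a fixed dim-dimensional vector."""
    vec = [0.0] * dim
    for tok in tokens:
        h = zlib.crc32(tok.encode("utf-8"))          # deterministic 32-bit hash
        idx = h % dim                                # bucket index in the output vector
        sign = 1.0 if (h >> 31) & 1 == 0 else -1.0   # sign bit reduces collision bias
        vec[idx] += sign
    return vec

v = hash_trick("neural collaborative filtering deep neural networks".split())
```

The output dimension is fixed regardless of vocabulary size, which is why the method suits data sets with very high raw dimensionality.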
Let time t equal to 1;
step 3, input the content characterization vectors into a bidirectional long short-term memory recurrent neural network (Bi-LSTM) to obtain the sequence of hidden state vectors for forward propagation at time t and the sequence of hidden state vectors for backward propagation at time t; linking the forward-propagation and backward-propagation hidden state vector sequences at time t gives the time-t hidden state vector sequence {h_t1, h_t2, ..., h_tq, ..., h_tT}, where t denotes the time and h_tq denotes the qth element of the hidden state vector sequence. The forward and backward hidden vectors each have dimension d, so h_tq has dimension 2d;
The machine translation module containing the attention mechanism constructed in the invention comprises a Content Embedding Layer, an Encoder Layer, an Attention Layer, and a Decoder Layer, connected in sequence.
In the invention, in order to obtain the node identification sequence better, a bidirectional LSTM (Bi-LSTM) is used as the Encoder layer.
The content characterization vectors are input to the Bi-LSTM, whose built-in gating functions produce the hidden state vector sequence for forward propagation and the hidden state vector sequence for backward propagation, as shown in FIG. 2. Here, linking means directly concatenating two vectors: e.g. for a = [1,2,3] and b = [4,5,6], the operation yields [1,2,3,4,5,6].
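The forward pass, backward pass, and linking of hidden states can be sketched as below. For brevity a plain tanh-RNN cell stands in for the LSTM cell; the toy dimensions and all names are our assumptions.

```python
import math
import random

random.seed(0)
d = 4  # hidden size per direction
m = 3  # content-vector dimension

def rnn_step(x, h, Wx, Wh):
    """One tanh-RNN step (stands in for the LSTM cell to keep the sketch short)."""
    return [math.tanh(sum(Wx[i][k] * x[k] for k in range(len(x))) +
                      sum(Wh[i][k] * h[k] for k in range(d)))
            for i in range(d)]

def rand_mat(rows, cols):
    return [[random.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

Wx_f, Wh_f = rand_mat(d, m), rand_mat(d, d)   # forward-direction weights
Wx_b, Wh_b = rand_mat(d, m), rand_mat(d, d)   # backward-direction weights

xs = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6], [0.7, 0.8, 0.9]]  # T = 3 content vectors

h = [0.0] * d; fwd = []
for x in xs:                        # forward pass over the sequence
    h = rnn_step(x, h, Wx_f, Wh_f); fwd.append(h)
h = [0.0] * d; bwd = []
for x in reversed(xs):              # backward pass over the reversed sequence
    h = rnn_step(x, h, Wx_b, Wh_b); bwd.append(h)
bwd.reverse()                       # realign with the original order

H = [f + b for f, b in zip(fwd, bwd)]   # link: each h_tq has dimension 2d
```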
To make the node vector representations in the node identification sequence more accurate, and because different node vectors influence the output sequence of the decoding layer to different degrees, an attention layer is added to the model.
step 4, randomly initialize a global vector u_ω of dimension d, and input the global vector u_ω together with the hidden state vector sequence {h_t1, h_t2, ..., h_tq, ..., h_tT} into the attention layer to obtain the attention vector c_t;
The attention vector c_t in step 4 is obtained by formula (2):

u_tq = tanh(W_ω h_tq + b_ω),  α_tq = exp(u_tqᵀ u_ω) / Σ_p exp(u_tpᵀ u_ω),  c_t = Σ_q α_tq h_tq   (2)

In formula (2), α_tq denotes the degree of contribution of the qth vector h_tq at time t to the attention vector c_t; W_ω denotes the weight matrix of the attention layer and b_ω its bias term.
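The attention computation of formula (2) can be sketched as follows. The softmax normalization over u_tqᵀu_ω is the standard global-context-vector form and is a reconstruction consistent with the terms the text defines; all names are ours.

```python
import math

def attention(H, u_w, W_w, b_w):
    """Formula (2) sketch: u_tq = tanh(W_w h_tq + b_w); alpha = softmax(u_tq . u_w); c_t = sum_q alpha_tq h_tq."""
    d2 = len(H[0])                       # hidden-state dimension (2d in the patent)
    us = [[math.tanh(sum(W_w[i][k] * h[k] for k in range(d2)) + b_w[i])
           for i in range(len(b_w))] for h in H]
    scores = [sum(u[i] * u_w[i] for i in range(len(u_w))) for u in us]
    mx = max(scores)                     # stabilized softmax
    exps = [math.exp(s - mx) for s in scores]
    Z = sum(exps)
    alphas = [e / Z for e in exps]
    c = [sum(alphas[q] * H[q][k] for q in range(len(H))) for k in range(d2)]
    return c, alphas

H = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # toy hidden states, T = 3
u_w = [0.5, -0.2]                         # global context vector
W_w = [[0.1, 0.0], [0.0, 0.1]]
b_w = [0.0, 0.0]
c_t, alphas = attention(H, u_w, W_w, b_w)
```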
step 5, when t = 1, the attention vector c_t and an initial input vector w are input into a long short-term memory recurrent neural network (LSTM) to obtain the node identification vector d_1, where w is formed by linking the last hidden state of the forward pass with the first hidden state of the backward pass, and d_1 has dimension d;
when t > 1, the attention vector c_t and d_{t-1} are input into the long short-term memory recurrent neural network to obtain the node identification vector d_t, which has dimension d;
The LSTM serves as the Decoder layer in the invention. After the preceding operations, the Decoder layer's initial input vector w and the attention vector c_t corresponding to the Decoder layer's output vector d_t at each time have been obtained. The initial input vector w contains information from the content sequence, so using it as the decoder layer's initial input makes the final output more accurate. In this embodiment, at every time step the Decoder layer takes its input vector together with the attention vector c_t as a joint high-level representation, yielding the decoder layer output sequence D = {d_1, d_2, ..., d_t, ..., d_T}.
step 6, let t = t + 1 and repeat steps 3 to 5 until t = T, obtaining the decoder layer output sequence D = {d_1, d_2, ..., d_t, ..., d_T};
Each node identification vector d_t in the output sequence D = {d_1, d_2, ..., d_t, ..., d_T} is mapped to a |V|-dimensional node identification vector d′_t, and d′_t is normalized to obtain the probability vector p_t, whose jth element is p_t(j), j = 1, 2, ..., |V|;
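The normalization into a probability vector is ordinarily done with a softmax; a minimal sketch (the softmax choice is an assumption, since the text only says "normalization"):

```python
import math

def softmax(z):
    """Normalize a |V|-dimensional score vector into a probability vector."""
    mx = max(z)                       # subtract the max for numerical stability
    exps = [math.exp(v - mx) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

scores = [2.0, 1.0, 0.1]              # d'_t mapped to |V| = 3 dimensions
p = softmax(scores)                   # p_t(j): probability that the step's node is node j
```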
step 7, take each of the |V| nodes in turn as the current root node and repeat steps 1 to 6, then compute the loss value L by formula (1):

L = − Σ_{n=1}^{N} Σ_{q=1}^{T} Σ_{j=1}^{|V|} 1(i^n_q = j) · log p_q(j)   (1)

In formula (1), v_q denotes the qth node in the nth random walk sequence S_n, and 1(i^n_q = j) is a binary function that outputs 1 when the identification element i^n_q of the qth node in the nth random walk sequence equals j and outputs 0 otherwise.
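The loss of formula (1) reduces to a per-step cross-entropy over the walks, since only the j matching the true identification element survives the inner sum; a minimal numeric sketch (names and toy numbers are ours):

```python
import math

def walk_loss(prob_seqs, id_seqs):
    """Formula (1) as cross-entropy: -sum over walks n and steps q of log p_q(true id)."""
    total = 0.0
    for probs, ids in zip(prob_seqs, id_seqs):   # one walk at a time
        for p_q, true_id in zip(probs, ids):     # one step at a time
            total -= math.log(p_q[true_id])      # only the j = i^n_q term survives
    return total

# one walk of T = 2 steps over |V| = 3 nodes
probs = [[[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]]
ids = [[0, 1]]
L_val = walk_loss(probs, ids)
```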
If the loss value L is less than or equal to the preset threshold P, then D = {d_1, d_2, ..., d_t, ..., d_T} is the network characterization vector; specifically, the preset threshold P may be 0.001.
Otherwise, adjust the weight matrices in the bidirectional long short-term memory recurrent neural network and the long short-term memory recurrent neural network, and repeat steps 1 to 7 until the loss value L is less than or equal to the preset threshold P; at that point D = {d_1, d_2, ..., d_t, ..., d_T} is the network characterization vector.
Example two
In this embodiment, the deep-learning-based network representation acquisition method provided by the invention is verified experimentally on the AAN data set, which contains 17667 articles and 107879 citation relations (the edges between nodes in the network); each element of the data set is a processed article consisting of the abstract and title of the original. For each query article, the nodes directly connected to it are randomly split at a ratio of 1:9 into hidden articles and seed articles; after removing 584 query articles and the isolated articles, a new citation network with 16791 nodes and 88617 edges is formed, and the Hash Trick method is used to reduce dimensionality and vectorize the nodes.
The method provided by the invention is compared with four classical algorithms on the AAN data set: HOPE, Node2Vec, SDNE, and GraRep.
FIG. 3 shows the comparison of the average MRR scores of the five methods; MRR (mean reciprocal rank) measures the ranking quality of a recommendation system. As FIG. 3 shows, the MRR score of the method provided by the invention is higher than those of the four classical algorithms, and the gap between the lowest-scoring algorithm, SDNE, and the proposed method is 0.16; that is, the network characterization acquisition method provided by the invention acquires the most accurate network characterization vector.
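The MRR metric used in the comparison can be computed as follows (a standard mean-reciprocal-rank sketch with toy data; the experiment's actual queries and rankings are not reproduced here):

```python
def mrr(ranked_lists, relevant_sets):
    """Mean reciprocal rank: average over queries of 1/rank of the first relevant item."""
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for pos, item in enumerate(ranked, start=1):
            if item in relevant:
                total += 1.0 / pos
                break                  # only the first relevant hit counts
    return total / len(ranked_lists)

# two toy queries: first hit at rank 2 (0.5) and at rank 1 (1.0), mean = 0.75
score = mrr([["a", "b", "c"], ["x", "y"]], [{"b"}, {"x"}])
```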
Claims (3)
1. A network characterization acquisition method based on deep learning is used for acquiring a characterization vector of a network to be characterized, and is characterized by comprising the following steps:
step 1, obtaining a network to be characterized, wherein the network to be characterized comprises |V| nodes and |V| ≠ 0; selecting any one node from the |V| nodes as the current root node, and executing step 2;
step 2, carrying out random walk on the network to be characterized by taking the current root node as a starting point to obtain N random walk sequences, wherein N is a positive integer;
wherein the nth random walk sequence S_n comprises the nth content sequence {c^n_1, c^n_2, ..., c^n_q, ..., c^n_T} and the nth node identification sequence {i^n_1, i^n_2, ..., i^n_q, ..., i^n_T}, wherein q = 1, 2, ..., T, T denotes the total number of nodes in the random walk, n = 1, 2, ..., N, c^n_q denotes the content vector of the qth node, c denoting content, and i^n_q denotes the identification element of the qth node, i denoting identification;
vectorizing the nth content sequence {c^n_1, ..., c^n_T} to obtain the content characterization vectors;
step 3, inputting the content characterization vectors into a bidirectional long short-term memory recurrent neural network to obtain the hidden state vector sequence for forward propagation at time t and the hidden state vector sequence for backward propagation at time t; linking the two sequences to obtain the hidden state vector sequence at time t, wherein t denotes the time, and t = 1 when step 3 is executed for the first time;
step 4, randomly setting a global vector and inputting the global vector together with the time-t hidden state vector sequence obtained in step 3 into the attention layer to obtain the attention vector c_t;
step 5, when t = 1, inputting the attention vector c_t and an initial input vector into a long short-term memory recurrent neural network to obtain the node identification vector d_1; the initial input vector is obtained by linking the last element of the forward-propagation hidden state vector sequence at time t = 1 with the first element of the backward-propagation hidden state vector sequence at time t = 1;
when t > 1, inputting the attention vector c_t and the node identification vector d_{t-1} obtained in the previous execution of step 5 into the long short-term memory recurrent neural network to obtain the node identification vector d_t;
step 6, repeating steps 3 to 5 T−1 times until T node identification vectors are obtained, giving the decoder layer output sequence D = {d_1, d_2, ..., d_t, ..., d_T};
mapping each node identification vector in the output sequence D = {d_1, d_2, ..., d_t, ..., d_T} to a node identification vector of dimension |V| and normalizing it to obtain a probability vector, the jth element of which is p_t(j), j = 1, 2, ..., |V|;
step 7, after each of the |V| nodes has been taken as the current root node, repeating steps 2 to 6 N−1 times and computing the loss value L by formula (1):

L = − Σ_{n=1}^{N} Σ_{q=1}^{T} Σ_{j=1}^{|V|} 1(i^n_q = j) · log p_q(j)   (1)

wherein v_q denotes the qth node in the nth random walk sequence, and 1(i^n_q = j) is a binary function that outputs 1 when the identification element i^n_q of the qth node in the nth random walk sequence equals j and outputs 0 otherwise;
if the loss value L is less than or equal to the preset threshold P, then D = {d_1, d_2, ..., d_t, ..., d_T} is the network characterization vector;
otherwise, adjusting the weight matrices in the bidirectional long short-term memory recurrent neural network and the long short-term memory recurrent neural network, and repeating steps 1 to 7 until the loss value L is less than or equal to the preset threshold P, at which point D = {d_1, d_2, ..., d_t, ..., d_T} is the network characterization vector.
2. The deep-learning-based network characterization acquisition method according to claim 1, wherein the attention vector c_t in step 4 is obtained by formula (2):

u_tq = tanh(W_ω h_tq + b_ω),  α_tq = exp(u_tqᵀ u_ω) / Σ_p exp(u_tpᵀ u_ω),  c_t = Σ_q α_tq h_tq   (2)

wherein α_tq denotes the degree of contribution of the qth element of the time-t hidden state vector sequence to the attention vector c_t, h_tq denotes the qth element of the hidden state vector sequence, W_ω denotes the weight matrix of the attention layer, b_ω its bias term, and u_ω the global vector;
3. The deep learning-based network representation acquisition method according to claim 1, wherein the preset threshold P is 0.001.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910747332.9A CN110674922A (en) | 2019-08-14 | 2019-08-14 | Network representation obtaining method based on deep learning |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910747332.9A CN110674922A (en) | 2019-08-14 | 2019-08-14 | Network representation obtaining method based on deep learning |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110674922A true CN110674922A (en) | 2020-01-10 |
Family
ID=69068853
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910747332.9A Pending CN110674922A (en) | 2019-08-14 | 2019-08-14 | Network representation obtaining method based on deep learning |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110674922A (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111708881A (en) * | 2020-05-22 | 2020-09-25 | 国网天津市电力公司 | Text representation learning method introducing incidence relation |
CN113033104A (en) * | 2021-03-31 | 2021-06-25 | 浙江大学 | Lithium battery state of charge estimation method based on graph convolution |
CN113033104B (en) * | 2021-03-31 | 2022-05-24 | 浙江大学 | Lithium battery state of charge estimation method based on graph convolution |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||