WO2021196240A1 - Representation learning algorithm oriented to cross-network application - Google Patents

Representation learning algorithm oriented to cross-network application Download PDF

Info

Publication number
WO2021196240A1
WO2021196240A1 PCT/CN2020/083378 CN2020083378W WO2021196240A1 WO 2021196240 A1 WO2021196240 A1 WO 2021196240A1 CN 2020083378 W CN2020083378 W CN 2020083378W WO 2021196240 A1 WO2021196240 A1 WO 2021196240A1
Authority
WO
WIPO (PCT)
Prior art keywords
network
layer
node
feature
expression
Prior art date
Application number
PCT/CN2020/083378
Other languages
French (fr)
Chinese (zh)
Inventor
王朝坤
严本成
Original Assignee
清华大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 清华大学 filed Critical 清华大学
Priority to PCT/CN2020/083378 priority Critical patent/WO2021196240A1/en
Priority to CN202080005540.2A priority patent/CN113228059A/en
Publication of WO2021196240A1 publication Critical patent/WO2021196240A1/en

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Definitions

  • the present disclosure belongs to the field of computer technology, and in particular relates to a representation learning algorithm oriented across networks.
  • Network structure data is widely used in many application scenarios because of its ability to naturally express the relationship between objects.
  • the friendly relationship between users and users can be expressed in the form of social networks; in the field of scientific research, the relationship between authors and papers, and the relationship between papers and papers, can be used separately Express with the reference network; in the field of e-commerce, the network formed by the click relationship between users and commodities.
  • the vectorization of nodes refers to the hope that the nodes in the network can be mapped to a low-dimensional space through algorithms. In this low-dimensional vector space, the distance between nodes can reflect the relationship between each other in the original network.
  • the learned node vector can be applied to multiple tasks, such as recommendation, link prediction, and so on.
  • the existing network embedding representation algorithms can be divided into two categories: one is the direct inference representation learning algorithm. Given a target network, the direct expression algorithm directly optimizes the expression vector of each node through the attributes of the node and the network relationship, such as DeepWalk and Node2vec. The second is the inductive representation learning algorithm.
  • the inductive representation algorithm is often to learn a mapping function. As long as the attributes of the input node and its neighbors are given, the expression vector of the node can be inferred through the mapping function, such as GCN, GraphSAGE and GAT.
  • the direct inference algorithm because the direct inference algorithm directly optimizes the expression vector of the nodes in the network, for a new network, the direct inference algorithm cannot directly infer the expression vector of the nodes in the new network. . Therefore, the direct algorithm does not have any available knowledge that can be used for cross-network learning.
  • inductive algorithm Although it considers learning a mapping function of node attributes and structural information when modeling, so that cross-network inference can be made naturally, but the inductive algorithm does not consider The data distribution between the network and the network is different, and the patterns or knowledge summarized from one network may not be well applicable to another network. Therefore, inductive algorithms also exist in the problem of cross-network representation learning. Certain flaws.
  • the present disclosure proposes a cross-network-oriented representation learning algorithm.
  • a cross-network-oriented representation learning algorithm including:
  • S1 Generate network data including a source network and a target network, each network data includes network topology information and node attribute information, and the target network is a network to be inferred and characterized;
  • S4 Perform classification prediction probability calculation on the expression vector of the source network node obtained from the neural network of the L layer, calculate the classification loss through the cross-entropy loss function, and combine the distance loss to update the network parameters through the back propagation algorithm;
  • step S3 after obtaining the input data of the source network and the target network, they are inputted into an L-layer neural network respectively, and each The layer calculates the structural characteristics and expression characteristics of the source network and the target network respectively, and the distance loss between the corresponding characteristics of the source network and the target network includes:
  • the structural feature obtains the new expression feature vector of the current node through the message aggregation module
  • S33 Calculate the structural feature distance loss and the expression feature distance loss between the source network and the target network in the current layer through the cross-network alignment module;
  • step S31 in each layer of the L-layer neural network, the node feature expression vector of each network generates structural features through a message routing module include:
  • the message routing module of each layer is expressed as:
  • Is the structural feature vector of the source network and the target network calculated by the node i in the l-th layer of the L-layer neural network Is the expression feature vector of the source network and the target network of the l-1th layer in the L-layer neural network
  • the expression feature vector of the 0th layer is represented by the original feature vector x i of the node
  • a (l)T is the parameter vector involved in the message routing module of the first layer
  • is the activation function
  • is the direct connection operation of the two vectors
  • N(v ) Is the set of neighbors directly connected to node v
  • step S32 the structural feature obtaining the new expression feature vector of the current node through the message aggregation module includes:
  • the message aggregation module of each layer is expressed as:
  • I the parameter matrix involved in the message aggregation module
  • I a vector showing the aggregation level of the node.
  • the step S33 is to calculate the structural feature distance loss and the expression feature distance loss from the source network and the target network in the current layer through the cross-network alignment module include:
  • the current layer from the structural feature distance loss between the source network and the target network is:
  • P r , Q r are the structural feature vectors of the source network and the target network with Distribution, Is a distance function used to calculate the structural feature vector with The expected distance.
  • the distance loss of the current layer from the expression feature between the source network and the target network is:
  • P a , Q a are the node expression feature vectors of the source network and the target network with Distribution, As a distance function, used to calculate the node expression feature vector with The expected distance.
  • steps S31 to S33 are repeated L times to obtain the node feature vectors of the final source network and the target network and the accumulated structural features of the L layer Distance loss and express feature distance loss include:
  • the node feature vectors of the source network and the target network and the accumulated structural feature distance loss of the L layer are:
  • the node feature vectors of the source network and the target network and the accumulated expression feature distance loss of the L layer are:
  • the expression vector of the source network node obtained from the neural network of the L layer is calculated by the classification prediction probability, and the cross-entropy loss function is used Calculate the classification loss, and combine the distance loss to update the network parameters through the backpropagation algorithm, including:
  • the cross entropy loss function is expressed as:
  • L S is the cross-entropy loss function
  • W Z is the weight parameter matrix
  • Is the feature expression vector of the node
  • z i is the classification prediction probability of the node category
  • yi is the true category of the node
  • V S is the set of nodes with category information in the source network.
  • the present disclosure has the following advantages:
  • the cross-network-oriented representation learning algorithm of the present disclosure can extract structural information and node attribute information in the network.
  • the algorithm takes into account the inconsistency of distribution between different network data, and compensates for the inconsistency by minimizing the feature distance.
  • the resulting information loss effectively solves the problem of cross-network representation learning, and has a broad application space in reality.
  • FIG. 1 is a flowchart of an embodiment of the cross-network-oriented representation learning algorithm proposed by the present disclosure
  • Fig. 2 is a flowchart of another embodiment of the cross-network-oriented representation learning algorithm proposed by the present disclosure.
  • FIG. 1 is a flowchart of an embodiment of the cross-network-oriented representation learning algorithm proposed by the present disclosure. As shown in FIG. 1, the cross-network-oriented representation learning algorithm:
  • each network data includes network topology information and node attribute information
  • the target network is the network to be inferred and characterized
  • the expression of the source network is G S
  • the target network is the network to be inferred and characterized
  • the expression of the source network is G S
  • the target network The expression of is G t
  • S4 Perform classification prediction probability calculation on the expression vector of the source network node obtained from the neural network of the L layer, calculate the classification loss through the cross-entropy loss function, and combine the distance loss to update the network parameters through the back propagation algorithm;
  • Figure 2 is a flowchart of another embodiment of the cross-network-oriented representation learning algorithm proposed by the present disclosure.
  • the step S3 is to obtain the input data of the source network and the target network, and then input them to An L-layer neural network, and calculate the structural characteristics and expression characteristics of the source network and the target network separately for each layer, and calculate the distance loss between the corresponding characteristics of the source network and the target network including:
  • the node feature expression vector of each network generates structural features through a message routing module; the structural feature expression is
  • the structural feature obtains the new expression feature vector of the current node through the message aggregation module, and the expression feature vector expression is
  • S33 Calculate the structural feature distance loss and the expression feature distance loss between the source network and the target network in the current layer through the cross-network alignment module;
  • step S31 in each layer of the L-layer neural network, the node feature expression vector of each network generates structural features through a message routing module including:
  • the message routing module of each layer is expressed as:
  • Is the structural feature vector of the source network and the target network calculated at the lth layer of the L-layer neural network for node i Is the expression feature vector of the source network and the target network of the l-1th layer in the L-layer neural network
  • the expression feature vector of the 0th layer is represented by the original feature vector x i of the node
  • a (l)T is the parameter vector involved in the message routing module of the first layer
  • is the activation function
  • is the direct connection operation of the two vectors
  • N(v ) Is the set of neighbors directly connected to node v
  • step S32 the structure feature obtaining the new expression feature vector of the current node through the message aggregation module includes:
  • the message aggregation module of each layer is expressed as:
  • I the parameter matrix involved in the message aggregation module
  • I a vector showing the aggregation level of the node.
  • step S33 calculating the structural feature distance loss and expression feature distance loss between the source network and the target network in the current layer through the cross-network alignment module includes:
  • the current layer from the structural feature distance loss between the source network and the target network is:
  • P r , Q r are the structural feature vectors of the source network and the target network with Distribution, Is a distance function used to calculate the structural feature vector with The expected distance.
  • the distance loss of the current layer from the expression feature between the source network and the target network is:
  • O a , Q a are the node expression feature vectors of the source network and the target network with Distribution, As a distance function, used to calculate the node expression feature vector with The expected distance.
  • the step S34 repeating steps S31 to S33 for L times, to obtain the node feature vectors of the final source network and the target network and the accumulated structural feature distance loss and expression feature distance loss of the L layer include:
  • the node feature vectors of the source network and the target network and the accumulated structural feature distance loss of the L layer are:
  • the node feature vectors of the source network and the target network and the accumulated expression feature distance loss of the L layer are:
  • step S4 the classification prediction probability calculation is performed on the expression vector of the source network node obtained from the neural network of the L layer, the classification loss is calculated by the cross-entropy loss function, and the distance loss is combined, and the network parameters are updated by the backpropagation algorithm.
  • the cross entropy loss function is expressed as:
  • L S is the cross-entropy loss function
  • W z is the weight parameter matrix
  • Is the feature expression vector of the node
  • z i is the classification prediction probability of the node category
  • yi is the true category of the node
  • V S is the set of nodes with category information in the source network.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Provided is a representation learning algorithm oriented to cross-network application, the algorithm comprising: S1, generating data of networks comprising a source network and a target network, wherein data of each network data includes topological structure information and node attribute information of the network, and the target network is a network to be inferred and characterized; S2, randomly sampling a set number of nodes from the source network and the target network, and sorting same into a data format that meets an input for the algorithm; S3, after input data from the source network and the target network is received, respectively inputting same into an L-layer neural network, respectively calculating structural features and expression features of the source network and the target network for each layer, and calculating the distance loss between corresponding features of the source network and the target network; S4, performing classification prediction probability calculation on an expression vector, obtained from the L-layer neural network, of a source network node, calculating a classification loss using a cross entropy loss function, and updating, in combination with the distance loss, neural network parameters by means of a back-propagation algorithm; and S5, repeating steps S2-S4 until the entire algorithm converges. Therefore, the invention effectively solves the problem of cross-network representation learning, and has a broad application space in reality.

Description

面向跨网络的表示学习算法Representation learning algorithm for cross-network 技术领域Technical field
本公开属于计算机技术领域,尤其涉及一种面向跨网络的表示学习算法。The present disclosure belongs to the field of computer technology, and in particular relates to a representation learning algorithm oriented across networks.
背景技术Background technique
网络结构数据因其能够自然地表达对象与对象之间的关系而广泛存在于众多的应用场景中。比如在社交领域(微信或者微博),用户与用户的友好关系能够以社交网络的形式表达;在科研领域,作者和论文之间的关系,论文与论文之间的关系,可以分别用发表网络和引用网络进行表达;在电商领域,用户与商品之间的点击关系形成的网络。正因为网络结构数据的普遍性以及重要性,近年来,针对网络中的节点如何进行有效的向量化表达(即网络嵌入表达)成为一个重要的研究问题。节点的向量化,指的是希望通过算法将网络中的节点映射到一个低维空间。在这个低维的向量空间,节点与节点之间的距离能够反映彼此之间在原始网络中的关系。学习得到的节点向量可以被应用到多个任务,比如推荐、链路预测等等。Network structure data is widely used in many application scenarios because of its ability to naturally express the relationship between objects. For example, in the social field (WeChat or Weibo), the friendly relationship between users and users can be expressed in the form of social networks; in the field of scientific research, the relationship between authors and papers, and the relationship between papers and papers, can be used separately Express with the reference network; in the field of e-commerce, the network formed by the click relationship between users and commodities. Because of the universality and importance of network structure data, in recent years, how to effectively vectorize the expression (ie network embedding expression) for the nodes in the network has become an important research problem. The vectorization of nodes refers to the hope that the nodes in the network can be mapped to a low-dimensional space through algorithms. In this low-dimensional vector space, the distance between nodes can reflect the relationship between each other in the original network. The learned node vector can be applied to multiple tasks, such as recommendation, link prediction, and so on.
现有的网络嵌入表示算法主要可以分为两类:一是直推式的表示学习算法。给定一个目标网络,直推式的表示算法直接通过节点的属性以及网络关系去优化每个节点的表达向量,比如DeepWalk和Node2vec。二是归纳式的表示学习算法。归纳式的表示算法往往是学习出一个映射函数,只要给定输入节点的属性以及其邻居,就可以通过映射函数推断出节点的表达向量,比如GCN,GraphSAGE和GAT。The existing network embedding representation algorithms can be divided into two categories: one is the direct inference representation learning algorithm. Given a target network, the direct expression algorithm directly optimizes the expression vector of each node through the attributes of the node and the network relationship, such as DeepWalk and Node2vec. The second is the inductive representation learning algorithm. The inductive representation algorithm is often to learn a mapping function. As long as the attributes of the input node and its neighbors are given, the expression vector of the node can be inferred through the mapping function, such as GCN, GraphSAGE and GAT.
在现实的应用中,我们面临的可能是多个网络,每个网络可能来自不同的时刻或者不同的数据源。这些网络数据的分布可能不同。我们往往是希望从已知的网络中总结出有用的知识,将总结的知识应用到未知的网络中去。比如,在论文的引用网络中,即使不同时刻发表的论文主题热点不同,我们仍然可以借助由过去多年发表的论文形成的网络去帮助推断最近发表的论文 与论文之间的关系。因此,在面对多个不同网络的时候,如何解决网络与网络之间分布不同的问题,使得算法能够充分利用已知的网络数据,来提升未知网络数据的表示学习向量的质量是本技术研究的重点。In real applications, we may be faced with multiple networks, each of which may come from different moments or different data sources. The distribution of these network data may be different. We often hope to summarize useful knowledge from known networks and apply the summarized knowledge to unknown networks. For example, in the citation network of papers, even if the themes of the papers published at different times are different, we can still use the network formed by the papers published in the past years to help infer the relationship between the recently published papers and the papers. Therefore, when facing multiple different networks, how to solve the problem of different distributions between networks, so that the algorithm can make full use of the known network data to improve the quality of the learning vector of the unknown network data is the research of this technology. the key of.
然而,现有的算法都不能够很好地解决跨网络的表示学习问题。具体来讲:However, none of the existing algorithms can well solve the problem of cross-network representation learning. Specifically:
(1)对于直推式的算法而言,由于直推式算法是直接去优化网络中的节点表达向量,因此对于一个新的网络,直推式算法无法直接去推断新网络中节点的表达向量。因此直推式算法没有任何可用的知识可以被用来进行跨网络的学习。(1) For the direct inference algorithm, because the direct inference algorithm directly optimizes the expression vector of the nodes in the network, for a new network, the direct inference algorithm cannot directly infer the expression vector of the nodes in the new network. . Therefore, the direct algorithm does not have any available knowledge that can be used for cross-network learning.
(2)对于归纳式的算法而言,尽管其在建模的时候考虑的是学习一个节点属性和结构信息的映射函数,这样可以自然地进行跨网络的推断,但是归纳式的算法并没有考虑到网络与网络之间的数据分布是不同的,从一个网络中归纳出来的模式或者知识可能并不能很好地适用于另一个网络,因此归纳式算法在跨网络表示学习的问题上也存在着一定的缺陷。(2) For the inductive algorithm, although it considers learning a mapping function of node attributes and structural information when modeling, so that cross-network inference can be made naturally, but the inductive algorithm does not consider The data distribution between the network and the network is different, and the patterns or knowledge summarized from one network may not be well applicable to another network. Therefore, inductive algorithms also exist in the problem of cross-network representation learning. Certain flaws.
因此,现有技术需要改进。Therefore, the existing technology needs to be improved.
上述背景技术内容仅用于帮助理解本公开,而并不代表承认或认可所提及的任何内容属于相对于本公开的公知常识的一部分。The foregoing background technical content is only used to help understand the present disclosure, and does not represent an acknowledgement or recognition that any content mentioned is part of the common general knowledge relative to the present disclosure.
发明内容Summary of the invention
为解决上述技术问题,本公开提出了一种面向跨网络的表示学习算法。In order to solve the above technical problems, the present disclosure proposes a cross-network-oriented representation learning algorithm.
基于本公开实施例的一个方面,公开一种面向跨网络的表示学习算法,包括:Based on one aspect of the embodiments of the present disclosure, a cross-network-oriented representation learning algorithm is disclosed, including:
S1,生成包括源网络和目标网络的网络数据,每个网络数据包含网络的拓扑结构信息和节点属性信息,所述目标网络为所要推断表征的网络;S1: Generate network data including a source network and a target network, each network data includes network topology information and node attribute information, and the target network is a network to be inferred and characterized;
S2,分别从源网络和目标网络随机采样设定数量的节点,并整理成满足算法输入的数据格式;S2, randomly sample a set number of nodes from the source network and the target network, and sort them into a data format that meets the input of the algorithm;
S3,得到源网络和目标网络的输入数据后,分别将其输入到一个L层的神经网络,并对每一层分别计算源网络和目标网络的结构特征和表达特征,计算源网络和目标网络的对应特征之间的距离损失;S3, after obtaining the input data of the source network and the target network, input them into an L-layer neural network, and calculate the structural characteristics and expression characteristics of the source network and the target network for each layer, and calculate the source network and the target network The distance loss between the corresponding features;
S4,将从L层的神经网络中得到的源网络节点的表达向量进行分类预测概率计算,通过交叉熵损失函数计算分类损失,并且结合距离损失,通过反向传播算法更新网络参数;S4: Perform classification prediction probability calculation on the expression vector of the source network node obtained from the neural network of the L layer, calculate the classification loss through the cross-entropy loss function, and combine the distance loss to update the network parameters through the back propagation algorithm;
S5,重复步骤S2-S4,直至整个算法收敛。S5, repeat steps S2-S4 until the entire algorithm converges.
基于本公开的面向跨网络的表示学习算法的另一个实施例中,所述步骤S3,得到源网络和目标网络的输入数据后,分别将其输入到一个L层的神经网络,并对每一层分别计算源网络和目标网络的结构特征和表达特征,计算源网络和目标网络的对应特征之间的距离损失包括:In another embodiment of the cross-network-oriented representation learning algorithm based on the present disclosure, in step S3, after obtaining the input data of the source network and the target network, they are inputted into an L-layer neural network respectively, and each The layer calculates the structural characteristics and expression characteristics of the source network and the target network respectively, and the distance loss between the corresponding characteristics of the source network and the target network includes:
S30,将源网络和目标网络的节点特征输入到L层的神经网络中;S30: Input the node characteristics of the source network and the target network into the neural network of the L layer;
S31,在L层神经网络的每一层中,每个网络的节点特征表达向量经过一个消息路由模块产生结构特征;S31: In each layer of the L-layer neural network, the node feature expression vector of each network generates structural features through a message routing module;
S32,结构特征经过消息聚合模块得到当前节点的新的表达特征向量;S32, the structural feature obtains the new expression feature vector of the current node through the message aggregation module;
S33,通过跨网络对齐模块,计算当前层来自源网络和目标网络之间的结构特征距离损失和表达特征距离损失;S33: Calculate the structural feature distance loss and the expression feature distance loss between the source network and the target network in the current layer through the cross-network alignment module;
S34,重复步骤S31至S33进行L次,得到最终源网络和目标网络的节点特征向量和L层累积的结构特征距离损失和表达特征距离损失。S34: Repeat steps S31 to S33 for L times to obtain the node feature vectors of the final source network and the target network and the accumulated structural feature distance loss and expression feature distance loss of the L layer.
基于本公开的面向跨网络的表示学习算法的另一个实施例中,所述步骤S31,在L层神经网络的每一层中,每个网络的节点特征表达向量经过一个消息路由模块产生结构特征包括:In another embodiment of the cross-network-oriented representation learning algorithm based on the present disclosure, in step S31, in each layer of the L-layer neural network, the node feature expression vector of each network generates structural features through a message routing module include:
每一层的消息路由模块表示为:The message routing module of each layer is expressed as:
Figure PCTCN2020083378-appb-000001
Figure PCTCN2020083378-appb-000001
Figure PCTCN2020083378-appb-000002
Figure PCTCN2020083378-appb-000002
式中,
Figure PCTCN2020083378-appb-000003
为节点i在L层神经网络中第l层计算的源网络和目标网络的结构特征向量,
Figure PCTCN2020083378-appb-000004
为L层神经网络中第l-1层的源网络和目标网络的表达特征向量,第0层的表达特征向量由节点的原始特征向量x i表示,
Figure PCTCN2020083378-appb-000005
为第l层的消息路由模块涉及的参数矩阵,a (l)T为第l层的消息路由模块涉及的参数向量,σ为激活函数,||为两个向量的直接相连操作,N(v)为节点v直接相连的邻居集合,
Figure PCTCN2020083378-appb-000006
为节点u传向节点v的消息权重。
Where
Figure PCTCN2020083378-appb-000003
Is the structural feature vector of the source network and the target network calculated by the node i in the l-th layer of the L-layer neural network,
Figure PCTCN2020083378-appb-000004
Is the expression feature vector of the source network and the target network of the l-1th layer in the L-layer neural network, the expression feature vector of the 0th layer is represented by the original feature vector x i of the node,
Figure PCTCN2020083378-appb-000005
Is the parameter matrix involved in the message routing module of the first layer, a (l)T is the parameter vector involved in the message routing module of the first layer, σ is the activation function, || is the direct connection operation of the two vectors, N(v ) Is the set of neighbors directly connected to node v,
Figure PCTCN2020083378-appb-000006
Is the weight of the message sent from node u to node v.
基于本公开的面向跨网络的表示学习算法的另一个实施例中,所述步骤S32,结构特征经过消息聚合模块得到当前节点的新的表达特征向量包括:In another embodiment of the cross-network-oriented representation learning algorithm based on the present disclosure, in step S32, the structural feature obtaining the new expression feature vector of the current node through the message aggregation module includes:
每一层的消息聚合模块表示为:The message aggregation module of each layer is expressed as:
Figure PCTCN2020083378-appb-000007
Figure PCTCN2020083378-appb-000007
Figure PCTCN2020083378-appb-000008
Figure PCTCN2020083378-appb-000008
式中,
Figure PCTCN2020083378-appb-000009
Figure PCTCN2020083378-appb-000010
为消息聚合模块涉及的参数矩阵,
Figure PCTCN2020083378-appb-000011
为示节点聚合层面的向量。
Where
Figure PCTCN2020083378-appb-000009
with
Figure PCTCN2020083378-appb-000010
Is the parameter matrix involved in the message aggregation module,
Figure PCTCN2020083378-appb-000011
Is a vector showing the aggregation level of the node.
基于本公开的面向跨网络的表示学习算法的另一个实施例中,所述步骤S33,通过跨网络对齐模块,计算当前层来自源网络和目标网络之间的结构特征距离损失和表达特征距离损失包括:In another embodiment of the cross-network-oriented representation learning algorithm based on the present disclosure, the step S33 is to calculate the structural feature distance loss and the expression feature distance loss from the source network and the target network in the current layer through the cross-network alignment module include:
当前层来自源网络和目标网络之间的结构特征距离损失为:The current layer from the structural feature distance loss between the source network and the target network is:
Figure PCTCN2020083378-appb-000012
Figure PCTCN2020083378-appb-000012
式中,P r,Q r为源网络和目标网络的结构特征向量
Figure PCTCN2020083378-appb-000013
Figure PCTCN2020083378-appb-000014
的分布,
Figure PCTCN2020083378-appb-000015
为一个距离函数,用来计算结构特征向量
Figure PCTCN2020083378-appb-000016
Figure PCTCN2020083378-appb-000017
的期望距离。
In the formula, P r , Q r are the structural feature vectors of the source network and the target network
Figure PCTCN2020083378-appb-000013
with
Figure PCTCN2020083378-appb-000014
Distribution,
Figure PCTCN2020083378-appb-000015
Is a distance function used to calculate the structural feature vector
Figure PCTCN2020083378-appb-000016
with
Figure PCTCN2020083378-appb-000017
The expected distance.
当前层来自源网络和目标网络之间的表达特征距离损失为:The distance loss of the current layer from the expression feature between the source network and the target network is:
Figure PCTCN2020083378-appb-000018
Figure PCTCN2020083378-appb-000018
式中,P a,Q a为源网络和目标网络的节点表达特征向量
Figure PCTCN2020083378-appb-000019
Figure PCTCN2020083378-appb-000020
的分布,
Figure PCTCN2020083378-appb-000021
为一个距离函数,用来计算节点表达特征向量
Figure PCTCN2020083378-appb-000022
Figure PCTCN2020083378-appb-000023
的期望距离。
In the formula, P a , Q a are the node expression feature vectors of the source network and the target network
Figure PCTCN2020083378-appb-000019
with
Figure PCTCN2020083378-appb-000020
Distribution,
Figure PCTCN2020083378-appb-000021
As a distance function, used to calculate the node expression feature vector
Figure PCTCN2020083378-appb-000022
with
Figure PCTCN2020083378-appb-000023
The expected distance.
基于本公开的面向跨网络的表示学习算法的另一个实施例中,所述步骤S34,重复步骤S31至S33进行L次,得到最终源网络和目标网络的节点特征向量和L层累积的结构特征距离损失和表达特征距离损失包括:In another embodiment of the cross-network-oriented representation learning algorithm based on the present disclosure, in the step S34, steps S31 to S33 are repeated L times to obtain the node feature vectors of the final source network and the target network and the accumulated structural features of the L layer Distance loss and express feature distance loss include:
源网络和目标网络的节点特征向量和L层累积的结构特征距离损失为:The node feature vectors of the source network and the target network and the accumulated structural feature distance loss of the L layer are:
Figure PCTCN2020083378-appb-000024
Figure PCTCN2020083378-appb-000024
源网络和目标网络的节点特征向量和L层累积的表达特征距离损失为:The node feature vectors of the source network and the target network and the accumulated expression feature distance loss of the L layer are:
Figure PCTCN2020083378-appb-000025
Figure PCTCN2020083378-appb-000025
基于本公开的面向跨网络的表示学习算法的另一个实施例中,所述步骤S4,将从L层的神经网络中得到的源网络节点的表达向量进行分类预测概率计算,通过交叉熵损失函数计算分类损失,并且结合距离损失,通过反向传播算法更新网络参数包括:In another embodiment of the cross-network-oriented representation learning algorithm based on the present disclosure, in the step S4, the expression vector of the source network node obtained from the neural network of the L layer is calculated by the classification prediction probability, and the cross-entropy loss function is used Calculate the classification loss, and combine the distance loss to update the network parameters through the backpropagation algorithm, including:
交叉熵损失函数表示为:The cross entropy loss function is expressed as:
Figure PCTCN2020083378-appb-000026
Figure PCTCN2020083378-appb-000026
Figure PCTCN2020083378-appb-000027
Figure PCTCN2020083378-appb-000027
其中,L S为交叉熵损失函数,W Z为权重参数矩阵,
Figure PCTCN2020083378-appb-000028
为节点的特征表达向量,z i为节点类别的分类预测概率,y i为节点真实的类别,V S为源网络中有类别信息的节点集合。
Among them, L S is the cross-entropy loss function, W Z is the weight parameter matrix,
Figure PCTCN2020083378-appb-000028
Is the feature expression vector of the node, z i is the classification prediction probability of the node category, yi is the true category of the node, and V S is the set of nodes with category information in the source network.
与现有技术相比,本公开具有如下优点:Compared with the prior art, the present disclosure has the following advantages:
采用本公开的面向跨网络的表示学习算法可以提取出网络中的结构信息以及节点的属性信息,同时该算法又考虑到了不同网络数据之间分布不一致的问题,通过最小化特征距离来弥补由于不一致导致的信息损失,有效地解决了跨网络表示学习问题,在现实中有着广阔的应用空间。The cross-network-oriented representation learning algorithm of the present disclosure can extract structural information and node attribute information in the network. At the same time, the algorithm takes into account the inconsistency of distribution between different network data, and compensates for the inconsistency by minimizing the feature distance. The resulting information loss effectively solves the problem of cross-network representation learning, and has a broad application space in reality.
附图说明Description of the drawings
构成说明书的一部分的附图描述了本公开的实施例,并且连同描述一起用于解释本公开的原理。The drawings constituting a part of the specification describe the embodiments of the present disclosure, and together with the description, serve to explain the principle of the present disclosure.
参照附图,根据下面的详细描述,可以更加清楚地理解本公开,其中:With reference to the accompanying drawings, the present disclosure can be understood more clearly according to the following detailed description, in which:
图1为本公开提出的面向跨网络的表示学习算法的一个实施例的流程图;FIG. 1 is a flowchart of an embodiment of the cross-network-oriented representation learning algorithm proposed by the present disclosure;
图2为本公开提出的面向跨网络的表示学习算法的另一个实施例的流程图。Fig. 2 is a flowchart of another embodiment of the cross-network-oriented representation learning algorithm proposed by the present disclosure.
具体实施方式Detailed ways
为使本公开实施例的目的、技术方案和优点更加清楚,下面将结合本公开实施例中的附图,对本公开实施例中的技术方案进行清楚、完整的描述。显然,所描述的实施例只是本公开一部分实施例,而不是全部的实施例。基于本公开中的实施例,本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例,都属于本公开保护的范围。In order to make the objectives, technical solutions, and advantages of the embodiments of the present disclosure clearer, the technical solutions in the embodiments of the present disclosure will be described clearly and completely in conjunction with the accompanying drawings in the embodiments of the present disclosure. Obviously, the described embodiments are only a part of the embodiments of the present disclosure, rather than all the embodiments. Based on the embodiments in the present disclosure, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present disclosure.
下面结合附图和实施例对本公开提供的一种面向跨网络的表示学习算法进行更详细的说明。In the following, a cross-network-oriented representation learning algorithm provided by the present disclosure will be described in more detail with reference to the accompanying drawings and embodiments.
图1为本公开提出的面向跨网络的表示学习算法的一个实施例的流程图,如图1所示,所述面向跨网络的表示学习算法:FIG. 1 is a flowchart of an embodiment of the cross-network-oriented representation learning algorithm proposed by the present disclosure. As shown in FIG. 1, the cross-network-oriented representation learning algorithm:
S1,生成包括源网络和目标网络的网络数据,每个网络数据包含网络的拓扑结构信息和节点属性信息,所述目标网络为所要推断表征的网络;源网络的表达式为G S,目标网络的表达式为G t,拓扑结构信息的表达式为G=(V,E),式中,V表示节点,E表示边,节点属性信息的表达式为x v,v∈V; S1, generating network data including a source network and a target network, each network data includes network topology information and node attribute information, the target network is the network to be inferred and characterized; the expression of the source network is G S , the target network The expression of is G t , the expression of topological structure information is G=(V,E), where V represents a node, E represents an edge, and the expression of node attribute information is x v ,v∈V;
S2,分别从源网络和目标网络随机采样设定数量的节点,并整理成满足算法输入的数据格式;将采集到的节点对应的节点属性x v作为算法的输入数据; S2, randomly sample a set number of nodes from the source network and the target network, and sort them into a data format that meets the input of the algorithm; use the node attributes x v corresponding to the collected nodes as the input data of the algorithm;
S3,得到源网络和目标网络的输入数据后,分别将其输入到一个L层的 神经网络,并对每一层分别计算源网络和目标网络的结构特征和表达特征,计算源网络和目标网络的对应特征之间的距离损失;S3, after obtaining the input data of the source network and the target network, input them into an L-layer neural network, and calculate the structural characteristics and expression characteristics of the source network and the target network for each layer, and calculate the source network and the target network The distance loss between the corresponding features;
S4,将从L层的神经网络中得到的源网络节点的表达向量进行分类预测概率计算,通过交叉熵损失函数计算分类损失,并且结合距离损失,通过反向传播算法更新网络参数;S4: Perform classification prediction probability calculation on the expression vector of the source network node obtained from the neural network of the L layer, calculate the classification loss through the cross-entropy loss function, and combine the distance loss to update the network parameters through the back propagation algorithm;
S5,重复步骤S2-S4,直至整个算法收敛。S5, repeat steps S2-S4 until the entire algorithm converges.
图2为本公开提出的面向跨网络的表示学习算法的另一个实施例的流程图,如图2所示,所述步骤S3,得到源网络和目标网络的输入数据后,分别将其输入到一个L层的神经网络,并对每一层分别计算源网络和目标网络的结构特征和表达特征,计算源网络和目标网络的对应特征之间的距离损失包括:Figure 2 is a flowchart of another embodiment of the cross-network-oriented representation learning algorithm proposed by the present disclosure. As shown in Figure 2, the step S3 is to obtain the input data of the source network and the target network, and then input them to An L-layer neural network, and calculate the structural characteristics and expression characteristics of the source network and the target network separately for each layer, and calculate the distance loss between the corresponding characteristics of the source network and the target network including:
S30,将源网络和目标网络的节点特征输入到L层的神经网络中;源网络和目标网络的节点特征分别为
Figure PCTCN2020083378-appb-000029
Figure PCTCN2020083378-appb-000030
Figure PCTCN2020083378-appb-000031
Figure PCTCN2020083378-appb-000032
输入到一个L层的神经网络;
S30: Input the node characteristics of the source network and the target network into the neural network of the L layer; the node characteristics of the source network and the target network are respectively
Figure PCTCN2020083378-appb-000029
with
Figure PCTCN2020083378-appb-000030
will
Figure PCTCN2020083378-appb-000031
with
Figure PCTCN2020083378-appb-000032
Input to an L-layer neural network;
S31,在L层神经网络的每一层中,每个网络的节点特征表达向量经过一个消息路由模块产生结构特征;结构特征表达式为
Figure PCTCN2020083378-appb-000033
S31, in each layer of the L-layer neural network, the node feature expression vector of each network generates structural features through a message routing module; the structural feature expression is
Figure PCTCN2020083378-appb-000033
S32,结构特征经过消息聚合模块得到当前节点的新的表达特征向量,表达特征向量表达式为
Figure PCTCN2020083378-appb-000034
S32, the structural feature obtains the new expression feature vector of the current node through the message aggregation module, and the expression feature vector expression is
Figure PCTCN2020083378-appb-000034
S33,通过跨网络对齐模块,计算当前层来自源网络和目标网络之间的结构特征距离损失和表达特征距离损失;S33: Calculate the structural feature distance loss and the expression feature distance loss between the source network and the target network in the current layer through the cross-network alignment module;
S34,重复步骤S31至S33进行L次,得到最终源网络和目标网络的节点特征向量和L层累积的结构特征距离损失和表达特征距离损失。最终源网络和目标网络的节点特征向量为
Figure PCTCN2020083378-appb-000035
Figure PCTCN2020083378-appb-000036
L层累积的结构特征距离损失值为L mra,表达特征距离损失值为L maa
S34: Repeat steps S31 to S33 for L times to obtain the node feature vectors of the final source network and the target network and the accumulated structural feature distance loss and expression feature distance loss of the L layer. Finally, the node feature vectors of the source network and the target network are
Figure PCTCN2020083378-appb-000035
with
Figure PCTCN2020083378-appb-000036
The cumulative structural feature distance loss value of the L layer is L mra , and the expression feature distance loss value is L maa .
所述步骤S31,在L层神经网络的每一层中,每个网络的节点特征表达向量经过一个消息路由模块产生结构特征包括:In step S31, in each layer of the L-layer neural network, the node feature expression vector of each network generates structural features through a message routing module including:
每一层的消息路由模块表示为:The message routing module of each layer is expressed as:
Figure PCTCN2020083378-appb-000037
Figure PCTCN2020083378-appb-000037
Figure PCTCN2020083378-appb-000038
Figure PCTCN2020083378-appb-000038
式中,
Figure PCTCN2020083378-appb-000039
为节点i在L层神经网络中第l层计算的源网络和目标网络的结构特征向量,
Figure PCTCN2020083378-appb-000040
为L层神经网络中第l-1层的源网络和目标网络的表达特征向量,第0层的表达特征向量由节点的原始特征向量x i表示,
Figure PCTCN2020083378-appb-000041
为第l层的消息路由模块涉及的参数矩阵,a (l)T为第l层的消息路由模块涉及的参数向量,σ为激活函数,||为两个向量的直接相连操作,N(v)为节点v直接相连的邻居集合,
Figure PCTCN2020083378-appb-000042
为节点u传向节点v的消息权重。
Where
Figure PCTCN2020083378-appb-000039
Is the structural feature vector of the source network and the target network calculated at the lth layer of the L-layer neural network for node i,
Figure PCTCN2020083378-appb-000040
Is the expression feature vector of the source network and the target network of the l-1th layer in the L-layer neural network, the expression feature vector of the 0th layer is represented by the original feature vector x i of the node,
Figure PCTCN2020083378-appb-000041
Is the parameter matrix involved in the message routing module of the first layer, a (l)T is the parameter vector involved in the message routing module of the first layer, σ is the activation function, || is the direct connection operation of the two vectors, N(v ) Is the set of neighbors directly connected to node v,
Figure PCTCN2020083378-appb-000042
Is the weight of the message sent from node u to node v.
所述步骤S32,结构特征经过消息聚合模块得到当前节点的新的表达特征向量包括:In step S32, the structure feature obtaining the new expression feature vector of the current node through the message aggregation module includes:
每一层的消息聚合模块表示为:The message aggregation module of each layer is expressed as:
Figure PCTCN2020083378-appb-000043
Figure PCTCN2020083378-appb-000043
Figure PCTCN2020083378-appb-000044
Figure PCTCN2020083378-appb-000044
式中,
Figure PCTCN2020083378-appb-000045
Figure PCTCN2020083378-appb-000046
为消息聚合模块涉及的参数矩阵,
Figure PCTCN2020083378-appb-000047
为示节点聚合层面的向量。
Where
Figure PCTCN2020083378-appb-000045
with
Figure PCTCN2020083378-appb-000046
Is the parameter matrix involved in the message aggregation module,
Figure PCTCN2020083378-appb-000047
Is a vector showing the aggregation level of the node.
所述步骤S33,通过跨网络对齐模块,计算当前层来自源网络和目标网络之间的结构特征距离损失和表达特征距离损失包括:In step S33, calculating the structural feature distance loss and expression feature distance loss between the source network and the target network in the current layer through the cross-network alignment module includes:
当前层来自源网络和目标网络之间的结构特征距离损失为:The current layer from the structural feature distance loss between the source network and the target network is:
Figure PCTCN2020083378-appb-000048
Figure PCTCN2020083378-appb-000048
式中,P r,Q r为源网络和目标网络的结构特征向量
Figure PCTCN2020083378-appb-000049
Figure PCTCN2020083378-appb-000050
的分布,
Figure PCTCN2020083378-appb-000051
为一个距离函数,用来计算结构特征向量
Figure PCTCN2020083378-appb-000052
Figure PCTCN2020083378-appb-000053
的期望距离。
In the formula, P r , Q r are the structural feature vectors of the source network and the target network
Figure PCTCN2020083378-appb-000049
with
Figure PCTCN2020083378-appb-000050
Distribution,
Figure PCTCN2020083378-appb-000051
Is a distance function used to calculate the structural feature vector
Figure PCTCN2020083378-appb-000052
with
Figure PCTCN2020083378-appb-000053
The expected distance.
当前层来自源网络和目标网络之间的表达特征距离损失为:The distance loss of the current layer from the expression feature between the source network and the target network is:
Figure PCTCN2020083378-appb-000054
Figure PCTCN2020083378-appb-000054
式中,O a,Q a为源网络和目标网络的节点表达特征向量
Figure PCTCN2020083378-appb-000055
Figure PCTCN2020083378-appb-000056
的分布,
Figure PCTCN2020083378-appb-000057
为一个距离函数,用来计算节点表达特征向量
Figure PCTCN2020083378-appb-000058
Figure PCTCN2020083378-appb-000059
的期望距离。
In the formula, O a , Q a are the node expression feature vectors of the source network and the target network
Figure PCTCN2020083378-appb-000055
with
Figure PCTCN2020083378-appb-000056
Distribution,
Figure PCTCN2020083378-appb-000057
As a distance function, used to calculate the node expression feature vector
Figure PCTCN2020083378-appb-000058
with
Figure PCTCN2020083378-appb-000059
The expected distance.
所述步骤S34,重复步骤S31至S33进行L次,得到最终源网络和目标网络的节点特征向量和L层累积的结构特征距离损失和表达特征距离损失包括:The step S34, repeating steps S31 to S33 for L times, to obtain the node feature vectors of the final source network and the target network and the accumulated structural feature distance loss and expression feature distance loss of the L layer include:
源网络和目标网络的节点特征向量和L层累积的结构特征距离损失为:The node feature vectors of the source network and the target network and the accumulated structural feature distance loss of the L layer are:
Figure PCTCN2020083378-appb-000060
Figure PCTCN2020083378-appb-000060
源网络和目标网络的节点特征向量和L层累积的表达特征距离损失为:The node feature vectors of the source network and the target network and the accumulated expression feature distance loss of the L layer are:
Figure PCTCN2020083378-appb-000061
Figure PCTCN2020083378-appb-000061
所述步骤S4,将从L层的神经网络中得到的源网络节点的表达向量进行分类预测概率计算,通过交叉熵损失函数计算分类损失,并且结合距离损失,通过反向传播算法更新网络参数包括:In step S4, the classification prediction probability calculation is performed on the expression vector of the source network node obtained from the neural network of the L layer, the classification loss is calculated by the cross-entropy loss function, and the distance loss is combined, and the network parameters are updated by the backpropagation algorithm. :
交叉熵损失函数表示为:The cross entropy loss function is expressed as:
Figure PCTCN2020083378-appb-000062
Figure PCTCN2020083378-appb-000062
Figure PCTCN2020083378-appb-000063
Figure PCTCN2020083378-appb-000063
其中,L S为交叉熵损失函数,W z为权重参数矩阵,
Figure PCTCN2020083378-appb-000064
为节点的特征表达向量,z i为节点类别的分类预测概率,y i为节点真实的类别,V S为源网络中有类别信息的节点集合。
Among them, L S is the cross-entropy loss function, W z is the weight parameter matrix,
Figure PCTCN2020083378-appb-000064
Is the feature expression vector of the node, z i is the classification prediction probability of the node category, yi is the true category of the node, and V S is the set of nodes with category information in the source network.
对于本领域技术人员而言,显然本公开实施例不限于上述示范性实施例的细节,而且在不背离本公开实施例的精神或基本特征的情况下,能够以其他的具体形式实现本公开实施例。因此,无论从哪一点来看,均应将实施例看作是示范性的,而且是非限制性的,本公开实施例的范围由所附权利要求 而不是上述说明限定,因此旨在将落在权利要求的等同要件的含义和范围内的所有变化涵括在本公开实施例内。不应将权利要求中的任何附图标记视为限制所涉及的权利要求。此外,显然“包括”一词不排除其他单元或步骤,单数不排除复数。系统、装置或终端权利要求中陈述的多个单元、模块或装置也可以由同一个单元、模块或装置通过软件或者硬件来实现。第一,第二等词语用来表示名称,而并不表示任何特定的顺序。For those skilled in the art, it is obvious that the embodiments of the present disclosure are not limited to the details of the above exemplary embodiments, and the embodiments of the present disclosure can be implemented in other specific forms without departing from the spirit or basic characteristics of the embodiments of the present disclosure. example. Therefore, from any point of view, the embodiments should be regarded as exemplary and non-limiting. The scope of the embodiments of the present disclosure is defined by the appended claims rather than the foregoing description, and therefore it is intended to fall on All changes within the meaning and scope of equivalent elements of the claims are included in the embodiments of the present disclosure. Any reference signs in the claims should not be regarded as limiting the claims involved. In addition, it is obvious that the word "including" does not exclude other units or steps, and the singular does not exclude the plural. Multiple units, modules or devices stated in the claims of a system, device or terminal can also be implemented by the same unit, module or device through software or hardware. Words such as first and second are used to denote names, but do not denote any specific order.
最后应说明的是,以上实施方式仅用以说明本公开实施例的技术方案而非限制,尽管参照以上较佳实施方式对本公开实施例进行了详细说明,本领域的普通技术人员应当理解,可以对本公开实施例的技术方案进行修改或等同替换都不应脱离本公开实施例的技术方案的精神和范围。Finally, it should be noted that the above implementation manners are only used to illustrate the technical solutions of the embodiments of the present disclosure and not to limit them. Although the embodiments of the present disclosure have been described in detail with reference to the above preferred implementation manners, those of ordinary skill in the art should understand that Modifications or equivalent replacements to the technical solutions of the embodiments of the present disclosure should not depart from the spirit and scope of the technical solutions of the embodiments of the present disclosure.

Claims (7)

  1. 一种面向跨网络的表示学习算法,其特征在于,包括:A cross-network-oriented representation learning algorithm, which is characterized in that it includes:
    S1,生成包括源网络和目标网络的网络数据,每个网络数据包含网络的拓扑结构信息和节点属性信息,所述目标网络为所要推断表征的网络;S1: Generate network data including a source network and a target network, each network data includes network topology information and node attribute information, and the target network is a network to be inferred and characterized;
    S2,分别从源网络和目标网络随机采样设定数量的节点,并整理成满足算法输入的数据格式;S2, randomly sample a set number of nodes from the source network and the target network, and sort them into a data format that meets the input of the algorithm;
    S3,得到源网络和目标网络的输入数据后,分别将其输入到一个L层的神经网络,并对每一层分别计算源网络和目标网络的结构特征和表达特征,计算源网络和目标网络的对应特征之间的距离损失;S3, after obtaining the input data of the source network and the target network, input them into an L-layer neural network, and calculate the structural characteristics and expression characteristics of the source network and the target network for each layer, and calculate the source network and the target network The distance loss between the corresponding features;
    S4,将从L层的神经网络中得到的源网络节点的表达向量进行分类预测概率计算,通过交叉熵损失函数计算分类损失,并且结合距离损失,通过反向传播算法更新网络参数;S4: Perform classification prediction probability calculation on the expression vector of the source network node obtained from the neural network of the L layer, calculate the classification loss through the cross-entropy loss function, and combine the distance loss to update the network parameters through the back propagation algorithm;
    S5,重复步骤S2-S4,直至整个算法收敛。S5, repeat steps S2-S4 until the entire algorithm converges.
  2. 根据权利要求1所述的面向跨网络的表示学习算法,其特征在于,所述步骤S3,得到源网络和目标网络的输入数据后,分别将其输入到一个L层的神经网络,并对每一层分别计算源网络和目标网络的结构特征和表达特征,计算源网络和目标网络的对应特征之间的距离损失包括:The cross-network-oriented representation learning algorithm according to claim 1, characterized in that, in step S3, after obtaining the input data of the source network and the target network, they are respectively input to an L-layer neural network, and each The first layer calculates the structural characteristics and expression characteristics of the source network and the target network respectively, and calculates the distance loss between the corresponding characteristics of the source network and the target network, including:
    S30,将源网络和目标网络的节点特征输入到L层的神经网络中;S30: Input the node characteristics of the source network and the target network into the neural network of the L layer;
    S31,在L层神经网络的每一层中,每个网络的节点特征表达向量经过一个消息路由模块产生结构特征;S31: In each layer of the L-layer neural network, the node feature expression vector of each network generates structural features through a message routing module;
    S32,结构特征经过消息聚合模块得到当前节点的新的表达特征向量;S32, the structural feature obtains the new expression feature vector of the current node through the message aggregation module;
    S33,通过跨网络对齐模块,计算当前层来自源网络和目标网络之间的结构特征距离损失和表达特征距离损失;S33: Calculate the structural feature distance loss and the expression feature distance loss between the source network and the target network in the current layer through the cross-network alignment module;
    S34,重复步骤S31至S33进行L次,得到最终源网络和目标网络的节点特征向量和L层累积的结构特征距离损失和表达特征距离损失。S34: Repeat steps S31 to S33 for L times to obtain the node feature vectors of the final source network and the target network and the accumulated structural feature distance loss and expression feature distance loss of the L layer.
  3. 根据权利要求2所述的面向跨网络的表示学习算法,其特征在于,所 述步骤S31,在L层神经网络的每一层中,每个网络的节点特征表达向量经过一个消息路由模块产生结构特征包括:The cross-network-oriented representation learning algorithm according to claim 2, characterized in that, in step S31, in each layer of the L-layer neural network, the node feature expression vector of each network generates a structure through a message routing module Features include:
    每一层的消息路由模块表示为:The message routing module of each layer is expressed as:
    Figure PCTCN2020083378-appb-100001
    Figure PCTCN2020083378-appb-100001
    Figure PCTCN2020083378-appb-100002
    Figure PCTCN2020083378-appb-100002
    式中,
    Figure PCTCN2020083378-appb-100003
    为节点i在L层神经网络中第l层计算的源网络和目标网络的结构特征向量,
    Figure PCTCN2020083378-appb-100004
    为L层神经网络中第l-1层的源网络和目标网络的表达特征向量,第0层的表达特征向量由节点的原始特征向量x i表示,
    Figure PCTCN2020083378-appb-100005
    为第l层的消息路由模块涉及的参数矩阵,a (l)T为第l层的消息路由模块涉及的参数向量,σ为激活函数,||为两个向量的直接相连操作,N(v)为节点v直接相连的邻居集合,
    Figure PCTCN2020083378-appb-100006
    为节点u传向节点v的消息权重。
    Where
    Figure PCTCN2020083378-appb-100003
    Is the structural feature vector of the source network and the target network calculated at the lth layer of the L-layer neural network for node i,
    Figure PCTCN2020083378-appb-100004
    Is the expression feature vector of the source network and the target network of the l-1th layer in the L-layer neural network, the expression feature vector of the 0th layer is represented by the original feature vector x i of the node,
    Figure PCTCN2020083378-appb-100005
    Is the parameter matrix involved in the message routing module of the first layer, a (l)T is the parameter vector involved in the message routing module of the first layer, σ is the activation function, || is the direct connection operation of the two vectors, N(v ) Is the set of neighbors directly connected to node v,
    Figure PCTCN2020083378-appb-100006
    Is the weight of the message sent from node u to node v.
  4. 根据权利要求3所述的面向跨网络的表示学习算法,其特征在于,所述步骤S32,结构特征经过消息聚合模块得到当前节点的新的表达特征向量包括:The cross-network-oriented representation learning algorithm according to claim 3, characterized in that in step S32, the structure feature obtaining the new expression feature vector of the current node through the message aggregation module comprises:
    每一层的消息聚合模块表示为:The message aggregation module of each layer is expressed as:
    Figure PCTCN2020083378-appb-100007
    Figure PCTCN2020083378-appb-100007
    Figure PCTCN2020083378-appb-100008
    Figure PCTCN2020083378-appb-100008
    式中,
    Figure PCTCN2020083378-appb-100009
    Figure PCTCN2020083378-appb-100010
    为消息聚合模块涉及的参数矩阵,
    Figure PCTCN2020083378-appb-100011
    为示节点聚合层面的向量。
    Where
    Figure PCTCN2020083378-appb-100009
    with
    Figure PCTCN2020083378-appb-100010
    Is the parameter matrix involved in the message aggregation module,
    Figure PCTCN2020083378-appb-100011
    Is a vector showing the aggregation level of the node.
  5. 根据权利要求4所述的面向跨网络的表示学习算法,其特征在于,所述步骤S33,通过跨网络对齐模块,计算当前层来自源网络和目标网络之间的结构特征距离损失和表达特征距离损失包括:The cross-network-oriented representation learning algorithm according to claim 4, characterized in that, in step S33, through the cross-network alignment module, the current layer is calculated from the structural feature distance loss and the expression feature distance between the source network and the target network. The losses include:
    当前层来自源网络和目标网络之间的结构特征距离损失为:The current layer from the structural feature distance loss between the source network and the target network is:
    Figure PCTCN2020083378-appb-100012
    Figure PCTCN2020083378-appb-100012
    式中,P r,Q r为源网络和目标网络的结构特征向量
    Figure PCTCN2020083378-appb-100013
    Figure PCTCN2020083378-appb-100014
    的分布,
    Figure PCTCN2020083378-appb-100015
    为一个距离函数,用来计算结构特征向量
    Figure PCTCN2020083378-appb-100016
    Figure PCTCN2020083378-appb-100017
    的期望距离。
    In the formula, P r , Q r are the structural feature vectors of the source network and the target network
    Figure PCTCN2020083378-appb-100013
    with
    Figure PCTCN2020083378-appb-100014
    Distribution,
    Figure PCTCN2020083378-appb-100015
    Is a distance function used to calculate the structural feature vector
    Figure PCTCN2020083378-appb-100016
    with
    Figure PCTCN2020083378-appb-100017
    The expected distance.
    当前层来自源网络和目标网络之间的表达特征距离损失为:The distance loss of the current layer from the expression feature between the source network and the target network is:
    Figure PCTCN2020083378-appb-100018
    Figure PCTCN2020083378-appb-100018
    式中,P a,Q a为源网络和目标网络的节点表达特征向量
    Figure PCTCN2020083378-appb-100019
    Figure PCTCN2020083378-appb-100020
    的分布,
    Figure PCTCN2020083378-appb-100021
    为一个距离函数,用来计算节点表达特征向量
    Figure PCTCN2020083378-appb-100022
    Figure PCTCN2020083378-appb-100023
    的期望距离。
    In the formula, P a , Q a are the node expression feature vectors of the source network and the target network
    Figure PCTCN2020083378-appb-100019
    with
    Figure PCTCN2020083378-appb-100020
    Distribution,
    Figure PCTCN2020083378-appb-100021
    As a distance function, used to calculate the node expression feature vector
    Figure PCTCN2020083378-appb-100022
    with
    Figure PCTCN2020083378-appb-100023
    The expected distance.
  6. 根据权利要求5所述的面向跨网络的表示学习算法,其特征在于,所述步骤S34,重复步骤S31至S33进行L次,得到最终源网络和目标网络的节点特征向量和L层累积的结构特征距离损失和表达特征距离损失包括:The cross-network-oriented representation learning algorithm according to claim 5, wherein the step S34 is to repeat steps S31 to S33 for L times to obtain the node feature vectors of the final source network and the target network and the L-layer accumulation structure Feature distance loss and expression feature distance loss include:
    源网络和目标网络的节点特征向量和L层累积的结构特征距离损失为:The node feature vectors of the source network and the target network and the accumulated structural feature distance loss of the L layer are:
    Figure PCTCN2020083378-appb-100024
    Figure PCTCN2020083378-appb-100024
    源网络和目标网络的节点特征向量和L层累积的表达特征距离损失为:The node feature vectors of the source network and the target network and the accumulated expression feature distance loss of the L layer are:
    Figure PCTCN2020083378-appb-100025
    Figure PCTCN2020083378-appb-100025
  7. 根据权利要求6所述的面向跨网络的表示学习算法,其特征在于,所述步骤S4,将从L层的神经网络中得到的源网络节点的表达向量进行分类预测概率计算,通过交叉熵损失函数计算分类损失,并且结合距离损失,通过反向传播算法更新网络参数包括:The cross-network-oriented representation learning algorithm according to claim 6, wherein the step S4 is to perform classification prediction probability calculation on the expression vector of the source network node obtained from the neural network of the L layer, through cross-entropy loss The function calculates the classification loss, and combines the distance loss to update the network parameters through the backpropagation algorithm including:
    交叉熵损失函数表示为:The cross entropy loss function is expressed as:
    Figure PCTCN2020083378-appb-100026
    Figure PCTCN2020083378-appb-100026
    Figure PCTCN2020083378-appb-100027
    Figure PCTCN2020083378-appb-100027
    其中,L s为交叉熵损失函数,W z为权重参数矩阵,
    Figure PCTCN2020083378-appb-100028
    为节点的特征表达向量,z i为节点类别的分类预测概率,y i为节点真实的类别,V s为源网络中有类别信息的节点集合。
    Among them, L s is the cross-entropy loss function, W z is the weight parameter matrix,
    Figure PCTCN2020083378-appb-100028
    Is the feature expression vector of the node, z i is the classification prediction probability of the node category, yi is the true category of the node, and V s is the set of nodes with category information in the source network.
PCT/CN2020/083378 2020-04-03 2020-04-03 Representation learning algorithm oriented to cross-network application WO2021196240A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2020/083378 WO2021196240A1 (en) 2020-04-03 2020-04-03 Representation learning algorithm oriented to cross-network application
CN202080005540.2A CN113228059A (en) 2020-04-03 2020-04-03 Cross-network-oriented representation learning algorithm

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/083378 WO2021196240A1 (en) 2020-04-03 2020-04-03 Representation learning algorithm oriented to cross-network application

Publications (1)

Publication Number Publication Date
WO2021196240A1 true WO2021196240A1 (en) 2021-10-07

Family

ID=77086007

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/083378 WO2021196240A1 (en) 2020-04-03 2020-04-03 Representation learning algorithm oriented to cross-network application

Country Status (2)

Country Link
CN (1) CN113228059A (en)
WO (1) WO2021196240A1 (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114826921A (en) * 2022-05-05 2022-07-29 苏州大学应用技术学院 Network resource dynamic allocation method, system and medium based on sampling subgraph
CN115913971A (en) * 2022-03-09 2023-04-04 中国人民解放军63891部队 Network DNA feature representation and extraction method
CN117151279A (en) * 2023-08-15 2023-12-01 哈尔滨工业大学 Isomorphic network link prediction method and system based on line graph neural network

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109241321A (en) * 2018-07-19 2019-01-18 杭州电子科技大学 The image and model conjoint analysis method adapted to based on depth field
CN110489567A (en) * 2019-08-26 2019-11-22 重庆邮电大学 A kind of node information acquisition method and its device based on across a network Feature Mapping
US20200034687A1 (en) * 2012-03-29 2020-01-30 International Business Machines Corporation Multi-compartment neurons with neural cores
CN110751214A (en) * 2019-10-21 2020-02-04 山东大学 Target detection method and system based on lightweight deformable convolution

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200034687A1 (en) * 2012-03-29 2020-01-30 International Business Machines Corporation Multi-compartment neurons with neural cores
CN109241321A (en) * 2018-07-19 2019-01-18 杭州电子科技大学 The image and model conjoint analysis method adapted to based on depth field
CN110489567A (en) * 2019-08-26 2019-11-22 重庆邮电大学 A kind of node information acquisition method and its device based on across a network Feature Mapping
CN110751214A (en) * 2019-10-21 2020-02-04 山东大学 Target detection method and system based on lightweight deformable convolution

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115913971A (en) * 2022-03-09 2023-04-04 中国人民解放军63891部队 Network DNA feature representation and extraction method
CN115913971B (en) * 2022-03-09 2024-05-03 中国人民解放军63891部队 Network DNA characteristic representation and extraction method
CN114826921A (en) * 2022-05-05 2022-07-29 苏州大学应用技术学院 Network resource dynamic allocation method, system and medium based on sampling subgraph
CN114826921B (en) * 2022-05-05 2024-05-17 苏州大学应用技术学院 Dynamic network resource allocation method, system and medium based on sampling subgraph
CN117151279A (en) * 2023-08-15 2023-12-01 哈尔滨工业大学 Isomorphic network link prediction method and system based on line graph neural network

Also Published As

Publication number Publication date
CN113228059A (en) 2021-08-06

Similar Documents

Publication Publication Date Title
WO2023000574A1 (en) Model training method, apparatus and device, and readable storage medium
WO2021196240A1 (en) Representation learning algorithm oriented to cross-network application
CN112508085B (en) Social network link prediction method based on perceptual neural network
US20230039182A1 (en) Method, apparatus, computer device, storage medium, and program product for processing data
CN109902274A (en) A kind of method and system converting json character string to thrift binary stream
WO2022120997A1 (en) Distributed slam system and learning method therefor
US20230049817A1 (en) Performance-adaptive sampling strategy towards fast and accurate graph neural networks
WO2021184367A1 (en) Social network graph generation method based on degree distribution generation model
CN112231592A (en) Network community discovery method, device, equipment and storage medium based on graph
US20210012196A1 (en) Peer-to-peer training of a machine learning model
WO2023207790A1 (en) Classification model training method and device
CN112541575A (en) Method and device for training graph neural network
WO2023207013A1 (en) Graph embedding-based relational graph key personnel analysis method and system
Bi et al. Knowledge transfer for out-of-knowledge-base entities: Improving graph-neural-network-based embedding using convolutional layers
WO2023000165A1 (en) Method and apparatus for classifying nodes of a graph
CN114254738A (en) Double-layer evolvable dynamic graph convolution neural network model construction method and application
CN116090504A (en) Training method and device for graphic neural network model, classifying method and computing equipment
Wu et al. A federated deep learning framework for privacy-preserving consumer electronics recommendations
WO2021217933A1 (en) Community division method and apparatus for homogeneous network, and computer device and storage medium
Chiang et al. Optimal Transport based one-shot federated learning for artificial intelligence of things
CN116976461A (en) Federal learning method, apparatus, device and medium
WO2023035526A1 (en) Object sorting method, related device, and medium
Wang et al. Heterogeneous defect prediction algorithm combined with federated sparse compression
Chen et al. TranGAN: Generative adversarial network based transfer learning for social tie prediction
CN112765489B (en) Social network link prediction method and system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20928640

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20928640

Country of ref document: EP

Kind code of ref document: A1