CN112989199B

CN112989199B - A Collaborative Network Link Prediction Method Based on Multi-Dimensional Proximity Attribute Network

Info

Publication number: CN112989199B
Application number: CN202110343021.3A
Authority: CN
Inventors: 吴江; 贺超城; 欧桂燕; 左任衔
Original assignee: Wuhan University WHU
Current assignee: Wuhan University WHU
Priority date: 2021-03-30
Filing date: 2021-03-30
Publication date: 2023-05-30
Anticipated expiration: 2041-03-30
Also published as: CN112989199A

Abstract

The invention provides a collaboration network link prediction method based on a multidimensional proximity attribute network, which belongs to the field of collaboration recommendation and comprises the following steps: using an autoencoder model, a joint probability model and an attribute Skip-Gram model to preserve multidimensional proximity features, local network features and global network features, respectively, where the multidimensional proximity features include cognitive, geographic and institutional proximity features; combining the loss function of the autoencoder model, the loss functions of the local and global network features and an L2-norm regularization term into an overall objective function, and optimizing that function with stochastic gradient descent to learn representations of the network nodes; and predicting collaboration network links from the cosine similarity of the vectors corresponding to the network nodes. The method jointly considers network features and node attribute information and improves the accuracy of collaboration network link prediction.

Description

A collaborative network link prediction method based on multi-dimensional proximity attribute network

Technical Field

The present invention belongs to the field of collaboration recommendation and, more specifically, relates to a collaboration network link prediction method based on a multidimensional proximity attribute network.

Background Art

Collaborator recommendation is of great significance for promoting scientific research collaboration. Existing studies mainly focus on recommending collaboration among all co-authors. Existing collaborator recommendation methods are mainly based on network models, content models or hybrid models. Network-based methods incorporate local network features (e.g., common neighbors) or global network features (e.g., random walk with restart, RWR). Content-based methods recommend authors by extracting content features (e.g., LDA-based similarity). Hybrid methods combine network features and content features. Attributed network embedding models, which combine network features with node attribute information, have shown good performance. The existing literature identifies five dimensions of proximity in scientific research collaboration. However, existing collaborator recommendation methods include only social proximity, which is a network feature, or cognitive proximity, which is a textual feature.

Patent document CN104573103B proposes a collaborator recommendation method for heterogeneous networks of scientific literature, but that method recommends collaborators only according to the mutual willingness of a pair of authors to collaborate, rather than from the perspective of the main author.

Summary of the Invention

In view of the defects of the prior art, the purpose of the present invention is to provide a collaboration network link prediction method based on a multidimensional proximity attribute network. It aims to solve the problem that existing collaborator recommendation methods neither take multidimensional proximity features into account nor transform those features into measures of research similarity and of the probability of collaboration between authors, which results in relatively low accuracy when predicting author collaboration.

To achieve the above object, the present invention provides a collaboration network link prediction method based on a multidimensional proximity attribute network, comprising the following steps:

An autoencoder model, a joint probability model and an attribute Skip-Gram model are used to preserve the multidimensional proximity features, local network features and global network features, respectively, where the multidimensional proximity features include cognitive, geographic and institutional proximity features;

The loss function of the autoencoder model, the loss functions of the local and global network features, and the L2-norm regularization term are combined into an overall objective function, which is optimized with stochastic gradient descent to learn representations of the network nodes;

Collaboration network links are predicted from the cosine similarity of the vectors corresponding to the network nodes;

Here, network nodes represent authors; cognitive proximity features represent an author's cognitive level in the research field; geographic proximity features represent the locational relationships between authors; institutional proximity features represent the similarity of the languages of the authors' locations; local network features represent the probability of collaboration between authors; and global network features represent research similarity through the likelihood values of the authors' proximity vectors.


Preferably, the cognitive proximity feature is expressed as:

where $Cp_{i,y}$ is the cumulative sum of the cognitive vectors of the papers published by author $a_i$ in year $y$; $y_0$ is the base year; and $Y$ is the year interval;

The geographic proximity feature is learned from a geographic proximity network $G_G = (V_G, E_G)$, where $V_G$ is the set of geographic nodes and $E_G$ is the set of geographic edges;

Preferably, institutional proximity is measured by a continuous aggregate index of common language.

Preferably, the autoencoder model is:

$$h_i = \sigma_1(W^{(1)} x_i + b^{(1)})$$

where $x_i$ is the author's proximity feature vector; $h_i$ is the hidden-layer representation of the encoder; $\hat{x}_i$ is the reconstruction produced by the decoder; $\theta = \{W^{(1)}, b^{(1)}, W^{(2)}, b^{(2)}\}$ are the model parameters; and $\sigma_1(\cdot)$ is the tanh activation function;

The loss function of the autoencoder model is:

$$L_{ae} = \sum_{i=1}^{n} \lVert \hat{x}_i - x_i \rVert_2^2$$

where $n$ is the total number of authors.

Preferably, the loss function of the local network feature is:

$$L_f = -\sum_{e_{ij} > 0} \log p_{ij}$$

where $p_{ij}$ is the joint probability of authors $a_i$ and $a_j$, and $e_{ij}$ is the edge between authors $a_i$ and $a_j$.

Preferably, the loss function of the global network feature is:

$$L_h = -\sum_{a_i \in G} \sum_{c \in C} \sum_{-w \le j \le w,\; j \ne 0} \log p(a_{i+j} \mid x_i)$$

where $a_{i+j}$ is a context node in the generated node sequence; $w$ is the window size; the conditional probability $p(a_{i+j} \mid x_i)$ is the likelihood of the context $a_{i+j}$ given the proximity vector of node $i$; $G$ is the set of all network nodes; $C$ is the set of all random-walk sequences; and $a_i$ represents an author.

In general, compared with the prior art, the above technical solution conceived by the present invention has the following beneficial effects:

The present invention comprehensively covers the attributes of research collaborators from a multidimensional proximity perspective. By pre-training cognitive, geographic and institutional proximity and representing them as low-dimensional vectors, the method fully accounts for each author's distinctive attributes without destroying the network features. The retained network features include local and global network features: the local network features accurately reflect each author's willingness to collaborate, while the global network features better capture research similarity. On this basis, the model is optimized by minimizing the sum of the network-feature and node-attribute loss functions, which improves the accuracy of collaboration network link prediction.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1(a) is a schematic diagram of the distance-weighted network of the GPN provided in an embodiment of the present invention;

FIG. 1(b) is a schematic diagram of the geographic proximity network (GPN) provided in an embodiment of the present invention;

FIG. 1(c) is a schematic diagram of the transition probabilities of node D in the GPN provided in an embodiment of the present invention;

FIG. 2(a) is a schematic diagram of the institutional proximity network (IPN) provided in an embodiment of the present invention;

FIG. 2(b) is a schematic diagram of the transition probabilities of node d in the IPN provided in an embodiment of the present invention;

FIG. 3 is a schematic diagram of the construction of the research collaboration network provided in an embodiment of the present invention;

FIG. 4 is a schematic diagram of the joint optimization framework provided in an embodiment of the present invention.

DETAILED DESCRIPTION

In order to make the purpose, technical solution and advantages of the present invention more clearly understood, the present invention is further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the present invention and are not intended to limit the present invention.

The present invention provides a collaboration network link prediction method based on a multidimensional proximity attribute network, comprising the following steps:

Here, network nodes represent authors; cognitive proximity features represent an author's cognitive level in the research field; geographic proximity features represent the locational relationships between authors; institutional proximity features represent the similarity of the languages of the authors' locations; local network features represent the existing collaboration relationships between authors; and global network features represent the neighborhood similarity between authors.

The detailed description proceeds as follows: first, the representation learning method for multidimensional proximity used in the present invention is introduced; then the framework of the ARCR model (attribute-aware research collaboration recommendation) provided by the present invention is explained.

1. The multidimensional proximity feature representation provided by the present invention includes cognitive, geographic and institutional proximity feature representations.

(1) Cognitive proximity feature representation:

Scientific papers are linked texts that contain not only text but also citation links, and both text content and link information are essential for measuring the similarity of scientific papers. In the present invention, P2V (paper-to-vector) is used to represent text content features and citation link features, yielding the cognitive proximity representation of researchers. Because a researcher's cognitive base changes over time, a time-weighted decay factor is applied: for author $a_i$, the cognitive proximity feature representation is obtained from the cognitive vectors of the papers published in each year:

where $Cp_{i,y}$ is the cumulative sum of the cognitive vectors of the papers published by author $a_i$ in year $y$; $y_0$ is the base year; and $Y$ is the year interval.
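The text does not reproduce the decay formula itself. As a minimal sketch only, assuming an exponential decay weight `decay ** (y0 - y)` for each year y in the Y-year window ending at the base year y0, followed by normalization by the total weight (the decay form, the `decay` parameter and the normalization are all hypothetical), a time-weighted cognitive vector could be computed like this:

```python
from typing import Dict, List


def cognitive_proximity(
    yearly_vectors: Dict[int, List[float]],  # year y -> Cp_{i,y}, summed paper vectors for that year
    y0: int,                                  # base year (assumed to be the latest year in the window)
    Y: int,                                   # length of the year interval
    decay: float = 0.8,                       # hypothetical decay rate in (0, 1]
) -> List[float]:
    """Time-decayed average of an author's yearly cognitive vectors (illustrative assumption)."""
    if not yearly_vectors:
        raise ValueError("author has no papers")
    dim = len(next(iter(yearly_vectors.values())))
    result = [0.0] * dim
    total_weight = 0.0
    for y in range(y0 - Y + 1, y0 + 1):      # the Y years ending at y0
        vec = yearly_vectors.get(y)
        if vec is None:
            continue                          # no papers published that year
        weight = decay ** (y0 - y)            # older years contribute less
        total_weight += weight
        for k, value in enumerate(vec):
            result[k] += weight * value
    if total_weight == 0.0:
        return result                         # nothing inside the window
    return [v / total_weight for v in result]
```

With `decay = 1.0` every year in the window counts equally; smaller values emphasize recent work, which is the intent the text attributes to the decay factor.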

(2) Geographic proximity feature representation:

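The Geographical Proximity Network (GPN) is defined as $G_G = (V_G, E_G)$, where $V_G$ is the set of geographic nodes, each $v_1 \in V_G$ representing a city, and $E_G$ is the set of geographic edges; each edge $e_1 = (u_1, v_1) \in E_G$ represents the geographic relationship between two cities, where $u_1$ is a city other than $v_1$. The weight of edge $e_1$ is related to the distance between cities $u_1$ and $v_1$. A biased random walk is performed on the GPN, with transition probabilities proportional to the edge weights; city node sequences are generated according to these transition probabilities, and the attribute Skip-Gram model is then applied to them. In the sampling results, two cities that are geographically closer are more likely to co-occur and therefore obtain similar vector representations.

(3) Institutional proximity feature representation:

The Institutional Proximity Network (IPN) is defined as $G_I = (V_I, E_I)$, where $V_I$ is the set of institutional nodes, each $v_2 \in V_I$ representing a specific country, and $E_I$ is the set of institutional edges; each institutional edge $e_2 = (u_2, v_2) \in E_I$ represents the institutional proximity between two countries, where $u_2$ is a country other than $v_2$. The weight of an institutional edge is related to the continuous aggregate index of common language between the country pair.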

FIG. 2(a) shows the institutional proximity network, and FIG. 2(b) shows the transition probabilities of network node d. Node sequences are generated on the IPN by random walks, and the attribute Skip-Gram algorithm is then applied to these sequences. In the sampling results, two countries with greater institutional proximity are more likely to co-occur and therefore obtain similar representations.
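Both the GPN and the IPN are sampled with walks whose transition probability is proportional to edge weight. A minimal pure-Python sketch of that sampling step follows; the adjacency-dictionary format and the names `weighted_walk` and `generate_walks` are hypothetical, and for the GPN the weights are assumed to grow as cities get closer (for example, an inverse-distance weight), which the text implies but does not specify.

```python
import random
from typing import Dict, List

Graph = Dict[str, Dict[str, float]]  # node -> {neighbour: edge weight}


def weighted_walk(graph: Graph, start: str, length: int, rng: random.Random) -> List[str]:
    """One walk in which each step picks a neighbour with probability proportional to edge weight."""
    walk = [start]
    while len(walk) < length:
        neighbours = graph.get(walk[-1], {})
        if not neighbours:
            break  # dead end: stop this walk early
        nodes = list(neighbours)
        weights = [neighbours[n] for n in nodes]
        walk.append(rng.choices(nodes, weights=weights, k=1)[0])
    return walk


def generate_walks(graph: Graph, walks_per_node: int, length: int, seed: int = 0) -> List[List[str]]:
    """City or country node sequences to be fed to the attribute Skip-Gram step."""
    rng = random.Random(seed)
    walks = []
    for _ in range(walks_per_node):
        for node in graph:
            walks.append(weighted_walk(graph, node, length, rng))
    return walks
```

The number of walks per node and the walk length are free parameters here; the text does not give values for them.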

2. Coauthorship network

The research collaboration network reflects social proximity and is built from the co-authorship information of researchers' papers. FIG. 3 is a schematic diagram of its construction: when two researchers have co-authored one paper, there is an edge of weight 1 between them; when they have co-authored k papers, the edge between them has weight k.
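The edge-weighting rule (one undirected edge whose weight equals the number of co-authored papers) can be sketched as follows; the input format, a list of author-name lists with one list per paper, is an assumption.

```python
from collections import Counter
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List


def build_coauthorship_network(papers: Iterable[List[str]]) -> Dict[FrozenSet[str], int]:
    """Undirected weighted edges: weight k means the two authors co-authored k papers."""
    edges = Counter()
    for authors in papers:
        unique = sorted(set(authors))           # ignore duplicate author entries on one paper
        for a, b in combinations(unique, 2):    # every co-author pair on this paper
            edges[frozenset((a, b))] += 1
    return dict(edges)
```

Using a `frozenset` as the key makes the edge undirected, so the pair (A, B) and the pair (B, A) accumulate into the same weight.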

3. Problem Statement

$G = (A, E, X)$ is an attributed research collaboration network, where $A = \{a_1, a_2, \ldots, a_n\}$ is the set of authors; each edge $e_{ij} = (a_i, a_j) \in E$ represents a research collaboration between authors $a_i$ and $a_j$; $X \in \mathbb{R}^{n \times m}$ is the node attribute matrix, and $x_i$ is the proximity feature vector of author $a_i$; the node attribute information is the union of the cognitive, geographic and institutional proximity vectors. The goal is to learn a mapping function $f: a_i \mapsto h_i \in \mathbb{R}^d$ that represents each author $a_i$ as a low-dimensional vector $h_i$, where $d \ll n$, while preserving both the network features and the node attribute information. The network features comprise local and global network features: the local network features indicate whether an edge exists between two authors, and the global network features capture the high-order neighborhood similarity of nodes.
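The problem statement describes $x_i$ as the union of the cognitive, geographic and institutional proximity vectors. A sketch of building $X$, assuming that "union" means concatenation and that each author is mapped to the embedding of their city (from the GPN) and country (from the IPN); the mapping names are hypothetical:

```python
from typing import Dict, List


def build_attribute_matrix(
    authors: List[str],
    cognitive: Dict[str, List[float]],    # author -> cognitive proximity vector
    author_city: Dict[str, str],          # author -> city (hypothetical mapping)
    author_country: Dict[str, str],       # author -> country (hypothetical mapping)
    city_vec: Dict[str, List[float]],     # city -> GPN embedding
    country_vec: Dict[str, List[float]],  # country -> IPN embedding
) -> List[List[float]]:
    """Row i is x_i = [cognitive | geographic | institutional] for author a_i (an n x m matrix)."""
    return [
        cognitive[a] + city_vec[author_city[a]] + country_vec[author_country[a]]
        for a in authors
    ]
```

Under this assumption $m$ is simply the sum of the three embedding sizes, and authors who share a city or country share those parts of their attribute vectors.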

4. Autoencoder based on proximity features

Proximity features are node attributes, so an autoencoder is used to preserve node attribute information. The autoencoder model has three layers: an input layer, a hidden layer and an output layer. The representation function of the autoencoder is:

$$h_i = \sigma_1(W^{(1)} x_i + b^{(1)})$$

where $x_i$ is the author's proximity feature vector; $h_i \in \mathbb{R}^d$ is the hidden-layer representation of the encoder; $\hat{x}_i$ is the reconstruction produced by the decoder; $\theta = \{W^{(1)}, b^{(1)}, W^{(2)}, b^{(2)}\}$ are the model parameters; and $\sigma_1(\cdot)$ is the tanh activation function.

The model parameters are learned by minimizing the following loss function:

$$L_{ae} = \sum_{i=1}^{n} \lVert \hat{x}_i - x_i \rVert_2^2$$

To preserve the high nonlinearity in the attribute information, the encoder uses K hidden layers:

$$h_i^{(1)} = \sigma_1(W^{(1)} x_i + b^{(1)}), \quad \ldots, \quad h_i^{(K)} = \sigma_1(W^{(K)} h_i^{(K-1)} + b^{(K)})$$

where $h_i^{(K)}$ is the desired low-dimensional hidden representation of author $a_i$. Correspondingly, the decoder also uses K hidden layers.
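A compact pure-Python sketch of the stacked encoder, the mirrored decoder and the reconstruction loss summed over the n authors. It assumes that the decoder layers also use tanh and that each layer is a plain fully connected layer with weights `W` and bias `b`; the helper names are hypothetical, and a practical implementation would use a tensor library with automatic differentiation.

```python
import math
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]
Layer = Tuple[Matrix, Vector]  # (W, b), W has n_out rows of n_in entries


def init_layer(n_in: int, n_out: int, rng: random.Random) -> Layer:
    """Small random weights and a zero bias for one fully connected layer."""
    W = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return W, [0.0] * n_out


def dense_tanh(layer: Layer, x: Vector) -> Vector:
    """sigma_1(W x + b) with sigma_1 = tanh."""
    W, b = layer
    return [math.tanh(sum(w * v for w, v in zip(row, x)) + bias) for row, bias in zip(W, b)]


def forward(encoder: List[Layer], decoder: List[Layer], x: Vector) -> Tuple[Vector, Vector]:
    """Return (h_i^(K), x_hat_i): the K-layer code and the decoder's reconstruction."""
    h = x
    for layer in encoder:
        h = dense_tanh(layer, h)
    x_hat = h
    for layer in decoder:
        x_hat = dense_tanh(layer, x_hat)
    return h, x_hat


def autoencoder_loss(encoder: List[Layer], decoder: List[Layer], X: Matrix) -> float:
    """L_ae = sum over the n authors of ||x_hat_i - x_i||^2."""
    total = 0.0
    for x in X:
        _, x_hat = forward(encoder, decoder, x)
        total += sum((a - b) ** 2 for a, b in zip(x_hat, x))
    return total
```

For example, with layer sizes `[6, 4, 2]` the encoder is `[init_layer(6, 4, rng), init_layer(4, 2, rng)]` and the mirrored decoder is `[init_layer(2, 4, rng), init_layer(4, 6, rng)]`. Because tanh outputs lie in (-1, 1), the input features are assumed to be scaled into that range.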

5. Local Network Characteristics

Network features include local network features and global network features. The local network features are handled as follows:

The following likelihood is maximized to preserve the local network features:

$$L_f = \prod_{e_{ij} > 0} p_{ij}$$

where $p_{ij}$ is the joint probability of $a_i$ and $a_j$:

Therefore, the negative log-likelihood is minimized instead:

$$L_f = -\sum_{e_{ij} > 0} \log p_{ij}$$
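The exact form of $p_{ij}$ is not reproduced in this text. Purely as an assumption, the sketch below uses the first-order proximity form popularized by LINE, $p_{ij} = \sigma(h_i \cdot h_j)$, and evaluates the negative log-likelihood over the observed edges with a numerically stable log-sigmoid:

```python
import math
from typing import Dict, Iterable, List, Tuple


def log_sigmoid(z: float) -> float:
    """Numerically stable log(1 / (1 + exp(-z)))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def log_joint_probability(h_i: List[float], h_j: List[float]) -> float:
    """log p_ij under the assumed form p_ij = sigmoid(h_i . h_j)."""
    return log_sigmoid(sum(a * b for a, b in zip(h_i, h_j)))


def local_loss(H: Dict[str, List[float]], edges: Iterable[Tuple[str, str]]) -> float:
    """L_f = -sum over observed edges e_ij > 0 of log p_ij."""
    return -sum(log_joint_probability(H[a], H[b]) for a, b in edges)
```

Working in log space turns the product of probabilities into a sum, which avoids underflow when there are many edges.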

6. Global Network Characteristics

To preserve the global network features, an attribute-aware Skip-Gram model is adopted. Given the current node $a_i$ and its proximity features $x_i$ for every random-walk sequence $c \in C$, the following negative log-likelihood is minimized:

$$L_h = -\sum_{a_i \in G} \sum_{c \in C} \sum_{-w \le j \le w,\; j \ne 0} \log p(a_{i+j} \mid x_i)$$

where $a_{i+j}$ is a context node in the generated node sequence; $w$ is the window size; the conditional probability $p(a_{i+j} \mid x_i)$ is the likelihood of the context $a_{i+j}$ given the proximity vector of node $i$; $G$ is the set of all network nodes; and $C$ is the set of all random-walk sequences.
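The triple sum in the global loss runs over each author position in each walk and each offset j inside the window. The pair generation below is a minimal sketch of that enumeration (the function name is hypothetical):

```python
from typing import List, Tuple


def skipgram_pairs(walk: List[str], w: int) -> List[Tuple[str, str]]:
    """(a_i, a_{i+j}) pairs for -w <= j <= w, j != 0, inside one random-walk sequence."""
    pairs = []
    for i, center in enumerate(walk):
        for j in range(-w, w + 1):
            if j == 0 or not 0 <= i + j < len(walk):
                continue  # skip the centre itself and offsets that fall outside the walk
            pairs.append((center, walk[i + j]))
    return pairs
```

Each resulting pair contributes one $\log p(a_{i+j} \mid x_i)$ term, where $x_i$ is the proximity vector of the centre author.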

Here $p(a_{i+j} \mid x_i)$ is computed from $f(x_i)$, where $f(\cdot)$ is the function of the encoder part of the autoencoder model, and from the corresponding representation of the context node $a_{i+j}$. However, this formula is computationally expensive, so negative sampling is used to replace $p(a_{i+j} \mid x_i)$ with:

where $\sigma_2(\cdot)$ is the sigmoid activation function; $|neg|$ is the number of negative samples; $\mathbb{E}$ denotes the expectation; and $d_a$ is the degree of node $a$.
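Since the negative-sampling expression is missing from this text, the sketch below follows the standard word2vec form: one log-sigmoid term for the true context plus |neg| log-sigmoid terms for nodes drawn from a degree-based noise distribution. The 3/4 exponent on $d_a$, the separate `context_vecs` table and all function names are assumptions, and the expectation is approximated by sampling.

```python
import math
import random
from typing import Dict, List


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigma_2(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dot(u: List[float], v: List[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def noise_distribution(degrees: Dict[str, int], power: float = 0.75) -> Dict[str, float]:
    """P_n(a) proportional to d_a ** power (power = 0.75 is an assumed word2vec-style choice)."""
    scaled = {a: d ** power for a, d in degrees.items()}
    total = sum(scaled.values())
    return {a: s / total for a, s in scaled.items()}


def neg_sampling_log_prob(
    f_xi: List[float],                     # encoder output f(x_i) for the centre author
    context: str,                          # context node a_{i+j}
    context_vecs: Dict[str, List[float]],  # hypothetical per-node context representations
    noise: Dict[str, float],               # noise distribution over nodes
    num_neg: int,                          # |neg|
    rng: random.Random,
) -> float:
    """Sampled surrogate for log p(a_{i+j} | x_i)."""
    value = log_sigmoid(dot(context_vecs[context], f_xi))
    nodes = list(noise)
    negatives = rng.choices(nodes, weights=[noise[n] for n in nodes], k=num_neg)
    for neg in negatives:
        value += log_sigmoid(-dot(context_vecs[neg], f_xi))  # push negatives away
    return value
```

Each call touches only 1 + |neg| context vectors instead of normalizing over every node, which is the cost saving the text refers to.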

7. Joint Optimization Framework

FIG. 4 shows the joint optimization framework. Because the autoencoder model, the joint probability model and the attribute-aware Skip-Gram model share the same encoder layers, the three models are tightly coupled, and the final representation of each author captures both network features and node attribute information.

The three objective functions are combined with the L2-norm regularization term to obtain the overall objective function of the joint model:

$$L = L_h + \alpha L_f + \beta L_{ae} + \gamma L_{reg}$$

The stochastic gradient descent algorithm is used to optimize this loss function by iteratively optimizing its two coupled components, $\alpha L_f + \beta L_{ae} + \gamma L_{reg}$ and $L_h$, where $L_{reg}$ is the L2-norm regularization term.
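A small helper that assembles the overall objective and its two alternately optimized parts. $L_{reg}$ is sketched here, as an assumption, as half the sum of squared Frobenius norms of the weight matrices, since the text only calls it an L2-norm term; the default weights follow the embodiment (α = 1, β = 10, γ = 10).

```python
from typing import List, Tuple

Matrix = List[List[float]]


def l2_regularizer(weights: List[Matrix]) -> float:
    """Assumed L_reg: half the sum of squared Frobenius norms of all weight matrices."""
    return 0.5 * sum(v * v for W in weights for row in W for v in row)


def total_objective(L_h: float, L_f: float, L_ae: float, L_reg: float,
                    alpha: float = 1.0, beta: float = 10.0, gamma: float = 10.0) -> float:
    """L = L_h + alpha * L_f + beta * L_ae + gamma * L_reg."""
    return L_h + alpha * L_f + beta * L_ae + gamma * L_reg


def coupled_components(L_h: float, L_f: float, L_ae: float, L_reg: float,
                       alpha: float = 1.0, beta: float = 10.0,
                       gamma: float = 10.0) -> Tuple[float, float]:
    """The two parts updated in turn: (alpha * L_f + beta * L_ae + gamma * L_reg, L_h)."""
    return alpha * L_f + beta * L_ae + gamma * L_reg, L_h
```

In an alternating scheme, one gradient step would be taken on the first component, then one on $L_h$, and so on; both steps update the shared encoder weights.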

8. Research Collaboration Recommendation

Based on the low-dimensional hidden representation $h_i$ of author $a_i$ and the low-dimensional hidden representation $h_j$ of author $a_j$, the cosine similarity between $h_i$ and $h_j$ is calculated; finally, the top k most similar authors are recommended to the target author as potential research collaborators.
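The recommendation step reduces to cosine similarity plus a top-k sort. A minimal sketch; the optional `exclude` argument (for example, to skip authors who are already collaborators) is a hypothetical addition not described in the text:

```python
import math
from typing import Dict, FrozenSet, List, Tuple


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)


def recommend(target: str, H: Dict[str, List[float]], k: int,
              exclude: FrozenSet[str] = frozenset()) -> List[Tuple[str, float]]:
    """Top-k authors ranked by cosine similarity to the target's embedding."""
    scores = [(a, cosine(H[target], h)) for a, h in H.items()
              if a != target and a not in exclude]
    scores.sort(key=lambda item: item[1], reverse=True)
    return scores[:k]
```

A full sort is fine for small author sets; for large ones, `heapq.nlargest` would avoid sorting every score.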

Example

Data source: papers published from 2010 to 2019 in the pharmaceutical categories ("Biochemistry & Molecular Biology", "Medicine, Research & Experimental", "Pharmacology & Pharmacy" and "Toxicology") were collected from the Web of Science Core Collection with the search query "WC = A AND PY = B AND LANGUAGE = 'English'", where A denotes the pharmaceutical categories and B is 2010-2019. Single-author papers and non-journal papers were excluded, leaving 528,118 papers. After author name disambiguation, 162,196 authors who had published more than 5 papers were retained. Geographic information was obtained with the Google Maps API, and language information from the CEPII language data. The dataset was split by publication year: data before 2018 served as the training set, and data from 2018 to 2019 served as the test set.
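The filtering and the temporal split can be sketched as follows. The `Paper` record and the function name are hypothetical, author disambiguation is assumed to have been done already, and the author threshold follows the text's "more than 5 papers" literally (count > 5):

```python
from collections import Counter
from typing import List, NamedTuple, Set, Tuple


class Paper(NamedTuple):
    year: int
    authors: List[str]


def preprocess(papers: List[Paper], min_count: int = 5,
               test_start: int = 2018) -> Tuple[Set[str], List[Paper], List[Paper]]:
    """Drop single-author papers, keep prolific authors, split train/test by year."""
    multi = [p for p in papers if len(set(p.authors)) > 1]       # exclude single-author papers
    counts = Counter(a for p in multi for a in set(p.authors))
    kept = {a for a, c in counts.items() if c > min_count}        # authors with > min_count papers
    train = [p for p in multi if p.year < test_start]             # before 2018
    test = [p for p in multi if test_start <= p.year <= 2019]     # 2018-2019
    return kept, train, test
```

Splitting by year rather than at random keeps future collaborations out of training, so the test set measures genuine link prediction.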

Implementation: a three-layer autoencoder was built, with hidden dimensions $d^{(1)} = 600$, $d^{(2)} = 512$ and $d^{(3)} = 256$ for the first, second and third layers, respectively. The geographic and institutional proximity vectors both had dimension 64, and the cognitive proximity vector had dimension 256. The weight of the local network feature loss was set to α = 1, the weight of the autoencoder loss to β = 10, and the weight γ of the L2-norm regularization to 10. One hundred authors were randomly selected as target nodes, and ARCR was run.
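The embodiment's hyperparameters can be collected in one configuration object. Assuming $x_i$ concatenates the three proximity vectors, the input dimension is 256 + 64 + 64 = 384, so the encoder maps 384 → 600 → 512 → 256 and the final embedding dimension is d = 256. The class name and fields are hypothetical:

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ARCRConfig:
    cognitive_dim: int = 256
    geographic_dim: int = 64
    institutional_dim: int = 64
    hidden_dims: List[int] = field(default_factory=lambda: [600, 512, 256])
    alpha: float = 1.0    # weight of the local network loss L_f
    beta: float = 10.0    # weight of the autoencoder loss L_ae
    gamma: float = 10.0   # weight of the L2 regularization L_reg
    num_targets: int = 100

    @property
    def input_dim(self) -> int:
        """m: length of x_i, assuming it concatenates the three proximity vectors."""
        return self.cognitive_dim + self.geographic_dim + self.institutional_dim

    def encoder_shapes(self) -> List[Tuple[int, int]]:
        """(n_in, n_out) for each encoder layer; the decoder mirrors these in reverse."""
        dims = [self.input_dim] + self.hidden_dims
        return list(zip(dims[:-1], dims[1:]))
```

Note that the first hidden layer (600) is wider than the assumed 384-dimensional input, so compression happens only in the later layers.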

In summary, the present invention has the following advantages:

It will be easily understood by those skilled in the art that the above description is only a preferred embodiment of the present invention and is not intended to limit the present invention. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the protection scope of the present invention.

Claims

1. A cooperative network link prediction method based on a multidimensional proximity attribute network, characterized by comprising the following steps:

preserving multidimensional proximity features, local network features and global network features by using an autoencoder model, a joint probability model and an attribute Skip-Gram model, respectively; wherein the multidimensional proximity features include cognitive proximity features, geographic proximity features and institutional proximity features;

combining the loss function of the autoencoder model, the loss function of the local network features, the loss function of the global network features and the L2-norm regularization term into an overall objective function, and optimizing the overall objective function by stochastic gradient descent, thereby achieving representation learning of the network nodes;

predicting cooperative network links through the cosine similarity of the vectors corresponding to the network nodes;

wherein each network node represents an author; the multidimensional proximity feature representation includes a cognitive proximity feature representation, a geographic proximity feature representation and an institutional proximity feature representation; the cognitive proximity features characterize the cognitive level of authors in the scientific domain; the geographic proximity features characterize the locational relationships between authors; the institutional proximity features represent the similarity of the languages of the authors' locations; the local network features represent the existing collaboration relationships between authors; and the global network features represent research similarity through the likelihood values of the authors' proximity vectors;

wherein the geographic proximity network underlying the geographic proximity feature is defined as $G_G = (V_G, E_G)$, where $V_G$ is a set of geographic nodes, each $v_1 \in V_G$ representing a city, and $E_G$ is a set of geographic edges, each $e_1 = (u_1, v_1) \in E_G$ representing the geographic relationship between two cities, where $u_1$ is a city other than $v_1$; the weight of edge $e_1$ is related to the distance between city $u_1$ and city $v_1$; a biased random walk is performed on the geographic proximity network, in which the transition probability is proportional to the edge weight, city node sequences are generated according to the transition probabilities, and the attribute Skip-Gram model is finally executed; the closer two cities are geographically, the more likely they are to co-occur in the sampling results and to obtain similar vector representations; wherein the institutional proximity network underlying the institutional proximity feature is defined as $G_I = (V_I, E_I)$, where $V_I$ is the set of institutional nodes, each $v_2 \in V_I$ representing a specific country, and $E_I$ is the set of institutional edges, each institutional edge $e_2 = (u_2, v_2) \in E_I$ representing the institutional proximity between two countries, where $u_2$ is a country other than $v_2$; the weight of the institutional edge is related to the continuous aggregate index of common language between the country pair; a random walk is performed on the institutional proximity network to generate country node sequences, and the attribute Skip-Gram algorithm is performed on these country node sequences; in the sampling results, two countries with greater institutional proximity are more likely to co-occur, resulting in similar representations;

the autoencoder model, the joint probability model and the attribute-aware Skip-Gram model share the same encoder layer;

the overall objective function is:

$$L = L_h + \alpha L_f + \beta L_{ae} + \gamma L_{reg}$$

the overall objective function is optimized with a stochastic gradient descent algorithm by iteratively optimizing the two coupled components $\alpha L_f + \beta L_{ae} + \gamma L_{reg}$ and $L_h$; $L_f$ is the loss function of the local network features; $L_h$ is the loss function of the global network features; $L_{ae}$ is the loss function of the autoencoder model; and $L_{reg}$ is the L2-norm regularization term.

2. The cooperative network link prediction method according to claim 1, wherein the institutional proximity is measured by a continuous aggregate index of common language.

3. The cooperative network link prediction method according to claim 2, wherein the autoencoder model is:

$$h_i = \sigma_1(W^{(1)} x_i + b^{(1)})$$

where $x_i$ is the author's proximity feature vector; $h_i$ is the hidden-layer representation of the encoder; $\hat{x}_i$ is the reconstruction produced by the decoder; $\theta = \{W^{(1)}, b^{(1)}, W^{(2)}, b^{(2)}\}$ are the model parameters; and $\sigma_1(\cdot)$ is the tanh activation function.

4. The cooperative network link prediction method according to claim 3, wherein the loss function of the autoencoder model is:

where n is the total number of authors.

5. The cooperative network link prediction method according to claim 2, wherein the loss function of the local network characteristic is:

wherein $p_{ij}$ is the joint probability of author $a_i$ and author $a_j$, and $e_{ij}$ is the edge between author $a_i$ and author $a_j$.

6. The cooperative network link prediction method according to claim 2, wherein the loss function of the global network feature is:

wherein $a_{i+j}$ is a context node in the generated node sequence; $w$ is the window size; the conditional probability $p(a_{i+j} \mid x_i)$ is the likelihood of the context $a_{i+j}$ given the proximity vector of node $i$; $G$ is the set of all network nodes; $C$ is the set of all random-walk sequences; and $a_i$ represents an author.