CN112148876A - Paper classification and recommendation method - Google Patents
Paper classification and recommendation method
- Publication number
- CN112148876A (application CN202011009122.9A)
- Authority
- CN
- China
- Prior art keywords
- representation
- attribute
- node
- neighbor
- paper
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
- G06F16/337—Profile generation, learning or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/194—Calculation of difference between files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Databases & Information Systems (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a paper classification and recommendation method, which comprises the following steps: sampling each paper node to obtain candidate attribute neighbor nodes, and taking all structure neighbors of each paper node as candidate structure neighbor nodes; generating attribute representations and structure representations for the paper nodes and their neighbor nodes; fusing the structure representation of a paper node with the structure representations of its attribute neighbors to obtain a fused structure representation, and fusing the attribute representation of the paper node with the attribute representations of its final structure neighbors to obtain a fused attribute representation; training with a reconstruction loss function during neural network training and continuously updating the representations; concatenating the updated fused structure representation and fused attribute representation to obtain a low-dimensional vector representation of the paper node; inputting the low-dimensional representations of the category-labeled paper nodes in the paper network into a support vector machine and training it to obtain a classification model; and predicting the categories of paper nodes with the classification model, then using the predicted categories to classify papers and to recommend papers of the same category.
Description
Technical Field
The invention belongs to the field of data mining, and particularly relates to a paper classification and recommendation method.
Background
At present, network representation learning is a popular research direction in the field of data mining and has very wide applications in real life, such as node classification, node clustering, and recommendation systems. Paper network representation learning studies how to learn a low-dimensional representation for every node of a paper network whose nodes carry attribute information, and then applies the learned representations to downstream tasks. The mainstream methods include representation learning methods that use only structural information and multi-source information fusion methods that combine structural information and attribute information. Representations learned by the latter, which fuse structural and attribute information, generally achieve better performance than those learned by methods that use only structural information.
Although current multi-source information fusion methods have made great progress, they usually ignore the problem of outliers in attributed networks. Specifically, a paper node may not belong to the same category as its structural neighbors or as nodes with similar attributes, yet many current models rest on exactly this similarity assumption. In addition, the structural information and the attribute information are often distributed inconsistently, i.e., the two sources do not describe a node in a consistent way. The similarity and consistency assumptions underlying current models therefore do not match the distribution of real paper networks, which limits the representation capability of these models.
Network data is widely available in real life, such as a paper reference network, a social media communication network and the like. How to learn the nodes in the paper network to obtain a low-dimensional representation and apply the low-dimensional representation to offline tasks (such as paper classification, paper recommendation and the like) is a research hotspot. Some methods of network representation learning are mainly described below.
Perozzi et al. ([Perozzi et al., 2014] DeepWalk: Online learning of social representations) proposed the DeepWalk model, which was the first to traverse network data with random walks, producing multiple walk sequences that are treated as pseudo-sentences. These walk sequences are then trained with the word-vector learning method Skip-gram and hierarchical softmax optimization to obtain low-dimensional node representations.
Grover et al. ([Grover et al., 2016] node2vec: Scalable feature learning for networks) improved the random walk strategy of DeepWalk by proposing biased random walks whose depth and breadth are controlled by the parameters p and q. After the biased walk sequences are obtained, the low-dimensional vector representations of the nodes are learned with the Skip-gram algorithm and negative sampling optimization.
Tang et al. ([Tang et al., 2015] LINE: Large-scale information network embedding) proposed the LINE model and were the first to introduce the concepts of first-order and second-order proximity for measuring the similarity between nodes. The low-dimensional vector representations of the nodes are obtained by training with an edge-sampling optimization scheme.
Yang et al. ([Yang et al., 2015] Network representation learning with rich text information) proposed TADW, the first model to incorporate text information into paper network representation learning; it uses structural information and attribute information simultaneously through matrix factorization. Because it exploits multi-source information, TADW is generally more effective than models that use only structural information.
The TriDNR model proposed by Pan et al. ([Pan et al., 2016] Tri-Party Deep Network Representation) is a semi-supervised attributed network representation learning model based on a simple neural network. TriDNR is the first to consider the deep associations among the node-node, node-attribute, and label-attribute relations, and it constructs two training networks: a structure training network and an attribute-label training network, which share the node representations. Through alternating training, the representation of a node in the structure graph and its representation in the attribute-label network are updated in turn, so the low-dimensional representation fuses structural, attribute, and supervision information simultaneously, and the two networks influence and reinforce each other during training.
Yang et al. ([Yang et al., 2017] From properties to links: Deep network embedding on incomplete graphs) rely on a consistency assumption between structure and attributes. For example, in a paper network, a node (paper) belonging to the natural language processing category is more likely to cite several papers in the same domain, and the contents of papers that cite each other are more likely to be similar. In theory there is therefore, to some extent, a mutual mapping between structure and attributes. Based on this assumption, the method keeps the two consistent in the deep embedding space in order to enhance the robustness of the representation and obtain a higher-quality representation. However, analysis shows that a gap exists between real-world data and such a consistency assumption, which limits the quality of the representations learned by these methods.
Disclosure of Invention
The purpose of the invention is as follows: in paper network representation learning methods, gaps exist between the commonly used similarity and consistency assumptions and the real data distribution, which limits the representation quality. The problem to be solved by the invention is how to mine more hidden information under the real data distribution and reduce the difference between the assumptions and the real data, so that a high-quality low-dimensional paper node representation can be learned and applied to downstream tasks such as paper classification and paper recommendation.
The invention specifically provides a low-dimensional vector representation learning method for a paper network that fuses sampled neighbor information, which comprises the following steps:
step 1, obtaining the truncated attribute neighbor candidates of each paper node in the paper network G by truncated neighbor sampling, and using the set symbol C_ap to denote the truncated attribute neighbor candidate set of a paper node p;

step 2, calculating the final truncated attribute neighbors based on the truncated attribute neighbor candidate set: a cosine similarity measure is applied to paper node p and its truncated attribute neighbor candidate set C_ap = {c_1, ..., c_m}, where the attribute one-hot representation of paper node p is x_a^p and the set of attribute one-hot representations of the truncated attribute neighbor candidates is {x_a^{c_1}, ..., x_a^{c_m}}; the K attribute neighbors with the highest similarity are calculated and retained as the final truncated attribute neighbors; here m denotes the number of truncated attribute neighbor candidates of paper node p, c_i denotes the i-th truncated attribute neighbor in the truncated attribute neighbor candidate set of paper node p, x_a^{c_i} denotes the attribute one-hot representation of the i-th truncated attribute neighbor, and i ranges from 1 to m;
step 3, retaining T structure neighbors of the paper node p as the final structure neighbors;
step 4, establishing a structural multilayer autoencoder AE_s, which takes the structure one-hot representations of the paper nodes as input and produces, through the structural multilayer autoencoder, the preliminary structure representation of each paper node in the attributed network and the structure representations of its attribute neighbors; and establishing an attribute multilayer autoencoder AE_a, which takes the attribute one-hot representations of the paper nodes as input and produces, through the attribute multilayer autoencoder, the preliminary attribute representation of each paper node in the attributed network and the attribute representations of its structure neighbors;
step 5, fusing the preliminary structure representation of the paper node with the structure representations of its attribute neighbors to obtain the fused structure representation of the paper node, and fusing the preliminary attribute representation of the paper node with the attribute representations of its structure neighbors to obtain the fused attribute representation of the paper node;
step 6, making the fused structure representation of step 5 interact with the attribute multilayer autoencoder through a reconstruction operation, and making the fused attribute representation of step 5 interact with the structural multilayer autoencoder;

step 7, concatenating the fused structure representation obtained through the interaction in step 6 and the fused attribute representation obtained through the interaction in step 6 to obtain the low-dimensional representation of the paper node;

step 8, inputting the low-dimensional representations of the labeled paper nodes in the paper network into a support vector machine classifier for training to obtain a trained classifier;

and step 9, inputting the low-dimensional representations of the unlabeled paper nodes in the paper network into the classifier, obtaining the predicted labels of the paper nodes as the classification results, and performing paper classification and same-category paper recommendation.
In step 1, the paper network refers to a network in which the nodes are papers and the nodes carry attribute information (title text, abstract text, body text, etc.), such as the paper network Cora ([Sen et al., 2008] Collective classification in network data), which contains title text and abstract text information. The truncated neighbor sampling is: a random walk is run from each paper node with a given walk length L (set to 10) and a number of walks per node γ (set to 80); the walks yield a walk sequence for each paper node, and the paper nodes in the sequences are counted and de-duplicated to obtain the truncated attribute neighbor candidate set C_ap of each paper node.
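For illustration, the following is a minimal Python sketch of the truncated neighbor sampling described above, assuming the paper network is given as an adjacency list (a dict mapping a node id to the ids of its cited/citing papers); the function and variable names are illustrative and not taken from the text above.

```python
import random
from collections import defaultdict

def truncated_neighbor_sampling(adj, walk_length=10, num_walks=80, seed=0):
    """Run random walks from every paper node and collect the de-duplicated
    set of visited nodes as its truncated attribute neighbor candidates."""
    rng = random.Random(seed)
    candidates = defaultdict(set)
    for p in adj:
        for _ in range(num_walks):
            current = p
            for _ in range(walk_length):
                neighbors = adj.get(current, [])
                if not neighbors:
                    break
                current = rng.choice(neighbors)
                if current != p:
                    candidates[p].add(current)
    return {p: sorted(c) for p, c in candidates.items()}

# toy usage: a 4-node citation graph
adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
print(truncated_neighbor_sampling(adj, walk_length=3, num_walks=5))
```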
The step 2 comprises the following steps:
step 2-1, let the truncated attribute neighbor candidate set of the paper node p be C_ap = {c_1, ..., c_m}, where c_i denotes the i-th truncated attribute neighbor in the truncated attribute neighbor candidate set of paper node p; the set of attribute one-hot representations of the truncated attribute neighbor candidate set C_ap is {x_a^{c_1}, ..., x_a^{c_m}}, where x_a^{c_i} denotes the attribute one-hot representation of c_i, and i ranges from 1 to m;

the cosine similarity between the attribute one-hot representation x_a^p of paper node p and each member of the attribute one-hot representation set of the truncated attribute neighbor candidate set is calculated as

$$\mathrm{sim}(p, c_i) = \frac{x_a^p \cdot x_a^{c_i}}{\left\| x_a^p \right\| \, \left\| x_a^{c_i} \right\|}, \qquad i = 1, \ldots, m;$$
Step 2-2, the cosine similarity results of the truncated attribute neighbor candidates of paper node p are sorted in descending order, and the largest K (for example, the first 20) are taken as the final attribute neighbors of paper node p; if m < K, random sampling with replacement is performed to obtain the set C_Ap = {a_1, ..., a_K}, where a_i denotes the i-th sampled attribute neighbor and C_Ap denotes the set of these attribute neighbors.
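A sketch of the top-K selection in step 2, assuming the one-hot attribute vectors are stored as numpy arrays; the names and the tie-breaking behaviour are illustrative.

```python
import numpy as np

def select_attribute_neighbors(x_p, candidate_vecs, candidate_ids, k=20, seed=0):
    """Keep the K candidates whose attribute vectors are most cosine-similar
    to paper node p; sample with replacement if fewer than K candidates exist."""
    rng = np.random.default_rng(seed)
    sims = candidate_vecs @ x_p / (
        np.linalg.norm(candidate_vecs, axis=1) * np.linalg.norm(x_p) + 1e-12)
    order = np.argsort(-sims)                       # descending similarity
    if len(candidate_ids) >= k:
        return [candidate_ids[i] for i in order[:k]]
    # fewer than K candidates: random sampling with replacement
    return list(rng.choice(candidate_ids, size=k, replace=True))

# toy usage with binary attribute vectors (5-dim for brevity)
x_p = np.array([1, 0, 1, 0, 1], dtype=float)
cands = np.array([[1, 0, 1, 0, 0], [0, 1, 0, 1, 0], [1, 0, 0, 0, 1]], dtype=float)
print(select_attribute_neighbors(x_p, cands, candidate_ids=[7, 8, 9], k=2))
```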
In step 3, T (for example, set to 20) final structure neighbors of the paper node are obtained by random sampling; if the number of structure neighbor candidates of the paper node is less than T, random sampling with replacement is performed to obtain the set C_sp = {s_1, ..., s_T}, where s_i denotes the i-th structure neighbor of node p and i ranges from 1 to T.
Step 4 comprises the following steps:
step 4-1, a structural multilayer autoencoder AE_s is built ([Hinton et al., 2006] A fast learning algorithm for deep belief nets). The structural multilayer autoencoder AE_s comprises an encoding module and a decoding module, where the encoding module is:

$$y_1 = \sigma(W_1 x_s + b_1)$$

$$y_m = \sigma(W_m y_{m-1} + b_m)$$

where y_i denotes the output of the i-th fully connected layer, W_i and b_i respectively denote the projection matrix and the bias vector of the i-th layer, x_s denotes the structure one-hot representation of the input, d_s is the dimension of the structure one-hot representation, and σ is the sigmoid function. W_i and b_i are trained along with the representation learning method, and x_s lies in the d_s-dimensional real space R^{d_s}.

The decoding module inverts the encoding module to obtain the output \hat{x}_s; its parameters are trained along with the learning method through reconstruction, and the loss function of the structural multilayer autoencoder is

$$L_s = \sum_{i=1}^{n_s} \left\| (\hat{x}_s^i - x_s^i) \odot b_i \right\|_2^2$$

where n_s is the number of nodes, x_s^i is the structure one-hot representation input of the i-th node, \hat{x}_s^i is the structure representation reconstructed by the decoder, ⊙ denotes element-wise multiplication, and b_i is a penalty vector built from the penalty parameter β (for example, set to 10) so that more penalty is given to the non-zero values in the representation.

The structure one-hot representations of paper node p and of its attribute neighbors C_Ap are input into the structural multilayer autoencoder AE_s, and the hidden layer yields the preliminary structure representation h_s^p of the paper node and the structure representations {h_s^{a_1}, ..., h_s^{a_K}} of the attribute neighbors, where h_s^{a_i} is the preliminary structure representation of the i-th attribute neighbor obtained from the hidden layer;

step 4-2, an attribute multilayer autoencoder AE_a is established in the same way as step 4-1. The attribute one-hot representations of paper node p and of its structure neighbors C_sp are input into the attribute multilayer autoencoder AE_a, and the hidden layer yields the preliminary attribute representation h_a^p of the paper node and the attribute representations {h_a^{s_1}, ..., h_a^{s_T}} of the structure neighbors, where h_a^{s_l} is the preliminary attribute representation of the l-th structure neighbor obtained from the hidden layer, and l ranges from 1 to T;
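A minimal PyTorch sketch of the multilayer autoencoder and the penalized reconstruction loss described in step 4; the 64-dimensional hidden layer and β = 10 follow the description above, while everything else (names, depth, toy data) is an illustrative assumption.

```python
import torch
import torch.nn as nn

class MultiLayerAutoencoder(nn.Module):
    """Sigmoid MLP encoder/decoder; the last hidden layer is the node representation."""
    def __init__(self, input_dim, hidden_dims=(256, 64)):
        super().__init__()
        dims = (input_dim,) + tuple(hidden_dims)
        enc = []
        for d_in, d_out in zip(dims[:-1], dims[1:]):
            enc += [nn.Linear(d_in, d_out), nn.Sigmoid()]
        self.encoder = nn.Sequential(*enc)
        rdims = tuple(reversed(dims))
        dec = []
        for d_in, d_out in zip(rdims[:-1], rdims[1:]):
            dec += [nn.Linear(d_in, d_out), nn.Sigmoid()]
        self.decoder = nn.Sequential(*dec)

    def forward(self, x):
        h = self.encoder(x)           # preliminary (hidden-layer) representation
        return h, self.decoder(h)     # reconstruction

def penalized_reconstruction_loss(x, x_hat, beta=10.0):
    """Squared error with a larger penalty (beta) on the non-zero entries of x."""
    b = torch.where(x != 0, torch.full_like(x, beta), torch.ones_like(x))
    return (((x_hat - x) * b) ** 2).sum()

# toy usage: 8 nodes with a 1433-dim structure one-hot input
ae_s = MultiLayerAutoencoder(input_dim=1433)
x_s = torch.rand(8, 1433).round()
h_s, x_s_hat = ae_s(x_s)
print(h_s.shape, penalized_reconstruction_loss(x_s, x_s_hat).item())
```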
the step 5 comprises the following steps:
step 5-1, following the dot-product attention mechanism ([Luong et al., 2015] Effective Approaches to Attention-based Neural Machine Translation), a structural similarity coefficient is calculated by dot product between the preliminary structure representation h_s^p of paper node p and the structure representation h_s^{a_j} of each attribute neighbor:

$$e_{p,j} = h_s^p \cdot h_s^{a_j}, \qquad j = 1, \ldots, K$$

where e_{p,j} is the dot-product similarity between paper node p and its j-th attribute neighbor; the K dot-product similarities are normalized to obtain the similarity coefficient α_{p,j}:

$$\alpha_{p,j} = \frac{\exp(e_{p,j})}{\sum_{k=1}^{K} \exp(e_{p,k})}$$

where exp denotes the natural exponential; normalization by the reverse mechanism then yields a new reverse attention coefficient \tilde{α}_{p,j},

where the reverse attention coefficient indicates the degree of attention of the paper node to each neighbor;
Step 5-2 comprises: the importance of the neighbors is controlled using a weighted-sum mechanism, in which the reverse-attention-weighted structure representations of the attribute neighbors are summed and combined with the preliminary structure representation h_s^p under the weight λ_s,

obtaining the fused structure representation of the paper node, where the parameter λ_s takes a value between 0 and 1 and is used to control the importance of the neighbor information in the representation.
Step 5-4 comprises: the importance of the neighbors is controlled using a weighted-sum mechanism, in which the reverse-attention-weighted attribute representations of the structure neighbors are summed and combined with the preliminary attribute representation h_a^p under the weight λ_a,

obtaining the fused attribute representation of the paper node, where the parameter λ_a takes a value between 0 and 1 and is used to control the importance of the neighbor information in the representation.
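A sketch of the reverse-attention fusion of step 5, written as a single helper used for both the structure and the attribute view. Two details are assumptions, since the text above only states that a reverse normalization and a weighted sum are used: the reverse coefficients are taken as the renormalized complement of the softmax weights, and the fusion is a convex combination weighted by λ.

```python
import numpy as np

def reverse_attention_fusion(h_p, h_neigh, lam=0.5):
    """Fuse a node's hidden representation with its neighbors' representations.

    h_p:     (d,) preliminary representation of the paper node
    h_neigh: (K, d) preliminary representations of its K sampled neighbors
    lam:     weight in [0, 1] controlling the importance of neighbor information
    """
    # dot-product similarity and softmax normalization
    e = h_neigh @ h_p                                   # (K,)
    alpha = np.exp(e - e.max())
    alpha = alpha / alpha.sum()
    # assumed reverse mechanism: renormalize the complement of alpha
    rev = (1.0 - alpha) / (1.0 - alpha).sum()
    neighbor_info = rev @ h_neigh                       # weighted neighbor representation
    # assumed weighted-sum fusion controlled by lam
    return lam * h_p + (1.0 - lam) * neighbor_info

# toy usage: 64-dim hidden representations, K = 20 neighbors
h_p = np.random.rand(64)
h_neigh = np.random.rand(20, 64)
print(reverse_attention_fusion(h_p, h_neigh, lam=0.7).shape)
```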
The step 6 comprises the following steps:
step 6-1, a structure-to-attribute decoder is built, and the fused structure representation of step 5-2 is passed through the structure-to-attribute decoder to obtain a reconstructed attribute representation \hat{x}_a^p; the interaction is realized through this reconstruction operation, and the loss function of the structure-to-attribute decoder is

$$L_{sa} = \sum_{p} \left\| (\hat{x}_a^p - x_a^p) \odot b_a^p \right\|_2^2$$

where the sum runs over the paper nodes, n_a is the number of attributes (the dimension of x_a^p), x_a^p is the attribute one-hot representation of node p, and b_a^p is the penalty vector built from the penalty coefficient so that more penalty is given to the non-zero values;

step 6-2, an attribute-to-structure decoder is built, and the fused attribute representation of step 5-4 is passed through the attribute-to-structure decoder to obtain a reconstructed structure representation \hat{x}_s^p; the interaction is realized through this reconstruction operation, and the loss function of the attribute-to-structure decoder is

$$L_{as} = \sum_{p} \left\| (\hat{x}_s^p - x_s^p) \odot b_s^p \right\|_2^2$$

where n_s is the number of nodes, x_s^p is the structure one-hot representation of node p, and b_s^p is the penalty vector built from the penalty coefficient. The parameters are trained along with the method.
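A PyTorch sketch of the two cross decoders of step 6, reusing the penalized loss from the autoencoder sketch above; the decoder depth, layer sizes, and toy dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CrossDecoder(nn.Module):
    """Maps a fused hidden representation of one view (structure or attribute)
    to a reconstruction of the other view's one-hot input."""
    def __init__(self, hidden_dim, output_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_dim, 256), nn.Sigmoid(),
            nn.Linear(256, output_dim), nn.Sigmoid())

    def forward(self, h):
        return self.net(h)

def penalized_loss(x, x_hat, beta=10.0):
    b = torch.where(x != 0, torch.full_like(x, beta), torch.ones_like(x))
    return (((x_hat - x) * b) ** 2).sum()

# toy usage: fused structure reps reconstruct attribute one-hots, and vice versa
struct_to_attr = CrossDecoder(hidden_dim=64, output_dim=1433)
attr_to_struct = CrossDecoder(hidden_dim=64, output_dim=2708)
h_s_fused, h_a_fused = torch.rand(8, 64), torch.rand(8, 64)
x_a, x_s = torch.rand(8, 1433).round(), torch.rand(8, 2708).round()
loss = (penalized_loss(x_a, struct_to_attr(h_s_fused))
        + penalized_loss(x_s, attr_to_struct(h_a_fused)))
print(loss.item())
```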
In step 7, the fused structure representation and the fused attribute representation obtained after the interactions of step 6-1 and step 6-2 are concatenated and used as the final representation output of the paper nodes.
In the method provided by the invention, the information of each node is further supplemented by sampling the structure neighbors and the attribute neighbors of the paper nodes, which alleviates the mismatch between the theoretical assumptions and the real data distribution and improves the node representation quality in paper network representation learning. In addition, the method of the invention proposes a truncated sampling method that addresses the sampling efficiency problem, reduces the complexity, and makes the method easier to scale to large paper networks.
The invention has the following beneficial effects:
the technical level is as follows:
1 original thesis network representation learning method is basically based on two basic assumptions: similarity assumptions (nodes that are similar in structure or similar in attributes are more similar) and consistency assumptions (predicted preferences for structure information and attribute information are consistent). However, data analysis finds that a gap exists between the hypothesis and the real data distribution, and the phenomenon limits the representation quality learned by the current model. In order to alleviate the problem, the method of the invention excavates and fuses additional attribute information and structure information, reduces the difference between the assumed data distribution and the real data distribution, and improves the representation quality.
The truncation neighbor sampling method provided by the invention avoids high-complexity calculation for each thesis node, reduces the complexity of the method, and is suitable for large-scale attribute network data sets.
3 the reverse attention mechanism provided by the invention enables each thesis node to fuse more diverse neighbor information, and the reverse attention mechanism has a better effect through experimental verification.
The application level is as follows: the low-dimensional vector representation learning method for the paper network structure by fusing the sampled neighbor information can be applied to the paper network with any attribute, is not limited to the network with the text attribute, can be conveniently expanded to a large-scale paper network representation learning task, and performs paper classification and paper recommendation on the basis.
Drawings
The foregoing and/or other advantages of the invention will become further apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings.
Fig. 1 is a flowchart of a paper network representation learning method based on a deep autoencoder according to an embodiment.

Fig. 2 is a flowchart of truncated neighbor sampling.
FIG. 3 is a flow chart of a reverse attention mechanism.
FIG. 4 is a flowchart of the method of the present invention combined with the deep-autoencoder-based paper network representation learning method.
FIG. 5 is a graph of the effect of the invention and other methods on node classification on the Cora dataset and the DBLP dataset.
Detailed Description
The flowchart is shown in Fig. 1. As shown in the figure, the invention provides a paper classification and recommendation method; first, a baseline model based on a deep autoencoder is established, which comprises the following steps:
step 1, a deep structural multilayer autoencoder AE_s is established, comprising an encoder module and a decoder module; the input is the structure one-hot representation x_s of a paper node, the encoder output is a low-dimensional hidden-layer vector h_s, and the decoder output is the reconstructed structure representation x'_s.

step 2, a deep attribute multilayer autoencoder AE_a is established, comprising an encoder module and a decoder module; the input is the attribute one-hot representation x_a of a paper node, the encoder output is a low-dimensional hidden-layer vector h_a, and the decoder output is the reconstructed attribute representation x'_a.

step 3, an attribute-to-structure decoder is established, whose input is the attribute hidden-layer vector h_a of a paper node and whose output is the attribute-to-structure reconstructed structure representation.

step 4, a structure-to-attribute decoder is established, whose input is the structure hidden-layer vector h_s of a paper node and whose output is the structure-to-attribute reconstructed attribute representation.
In this flow, the autoencoders of steps 1 and 2 are implemented by extending the deep autoencoder method ([Hinton et al., 2006] A fast learning algorithm for deep belief nets), and the encoder module can be expressed as:

$$y_1 = \sigma(W_1 x + b_1)$$

$$y_m = \sigma(W_m y_{m-1} + b_m)$$

where W_i and b_i respectively denote the projection matrix and the bias vector of the i-th layer, x denotes the input structure/attribute one-hot representation, and σ is the sigmoid function. W_i and b_i are trained along with the representation learning method. The inputs of the encoders are the structure one-hot representations and the attribute one-hot representations of the paper nodes, which summarize the structural information and the attribute information of the paper network, and the output dimension of the encoder is set to 64.

The decoder modules of steps 1 to 4 can be regarded as the inversion of the encoder module: the input dimension of a decoder module is 64, its output dimension is consistent with the input dimension of the encoder, and the decoded output x' is obtained; the parameters are obtained along with the network training.
To train the baseline model, a reconstruction loss function is adopted, and the reconstruction losses of steps 1 to 4 are summed; for example, the reconstruction loss of step 1 is

$$L_s = \sum_{i=1}^{n_s} \left\| (\hat{x}_s^i - x_s^i) \odot b_i \right\|_2^2$$

where \hat{x}_s^i denotes the output of the decoder module of the autoencoder, x_s^i denotes the input of the encoder module, ⊙ denotes element-wise multiplication, and b_i is a penalty vector built from the penalty parameter β (for example, set to 10) so that more penalty is given to the non-zero values in the representation.
Next, the truncated neighbor sampling method proposed by the present invention is described. The flow is shown in FIG. 2; the set symbol C_ap is used to denote the truncated attribute neighbor candidate set of node p. The method comprises the following steps:
step 1, a random walk is run from each paper node with a given walk length L (for example, set to 10) and a number of walks per paper node γ (for example, set to 80); the walks yield a walk sequence for each paper node, and the paper nodes in the sequences are counted and de-duplicated to obtain the truncated attribute neighbor candidate set C_ap of each paper node p.
Step 2, the cosine similarity between the attribute one-hot representation x_a^p of paper node p and the attribute one-hot representation set {x_a^{c_1}, ..., x_a^{c_m}} of the truncated attribute neighbor candidate set is calculated, yielding the set of m cosine similarity results {sim(p, c_1), ..., sim(p, c_m)}.
Step 3, the cosine similarity results of the truncated attribute neighbor candidates are sorted in descending order, and the largest top K (for example, set to 20) are taken as the final attribute neighbors of paper node p; if m < K, random sampling with replacement is performed to obtain the set C_Ap = {a_1, ..., a_K}.
In this flow, the cosine similarity calculation formula of step 2 is

$$\mathrm{sim}(p, c_i) = \frac{x_a^p \cdot x_a^{c_i}}{\left\| x_a^p \right\| \, \left\| x_a^{c_i} \right\|}.$$
The reverse attention mechanism proposed by the present invention is described next. The flow is shown in FIG. 3; the notation includes the preliminary structure/attribute representation of a paper node and the preliminary structure/attribute representations of its attribute/structure neighbors. The method comprises the following steps:
step 1, following the dot-product attention mechanism ([Luong et al., 2015] Effective Approaches to Attention-based Neural Machine Translation), a structural similarity coefficient is calculated by dot product between the preliminary structure representation of the paper node and the preliminary structure representations of its attribute neighbors; the K (for example, set to 20) dot-product similarities are calculated and normalized to obtain the similarity coefficients.

step 2, a new reverse attention coefficient is obtained through normalization by the reverse mechanism.
In this flow, the dot-product similarity calculation formula of step 1 is

$$e_{p,j} = h_s^p \cdot h_s^{a_j}$$

and the normalization formula is

$$\alpha_{p,j} = \frac{\exp(e_{p,j})}{\sum_{k=1}^{K} \exp(e_{p,k})}$$

where exp denotes the natural exponential.

In step 2, normalization by the reverse mechanism converts the similarity coefficients into the reverse attention coefficients \tilde{α}_{p,j}, where the reverse attention coefficient indicates the degree of attention of the paper node to each neighbor. The weighted neighbor information representation is then calculated from the attention coefficients as

$$m_s^p = \sum_{j=1}^{K} \tilde{\alpha}_{p,j} \, h_s^{a_j}.$$
Next, it is described how the low-dimensional vector representation learning method for the paper network that fuses sampled neighbor information is applied to the baseline model based on the deep autoencoder. The flow is shown in FIG. 4; the set symbol C_ap is used to denote the truncated attribute neighbor candidate set. The method comprises the following steps:
step 1, obtaining the truncated attribute neighbor candidates of each node in the paper network G by truncated neighbor sampling, and using the set symbol C_ap to denote the truncated attribute neighbor candidate set;

step 2, applying the cosine similarity measure to paper node p and its truncated attribute neighbor candidate set C_ap = {c_1, ..., c_m}, where the attribute one-hot representation of paper node p is x_a^p and the set of attribute one-hot representations of the truncated attribute neighbor candidate set is {x_a^{c_1}, ..., x_a^{c_m}}; the K attribute neighbors with the highest similarity are calculated and retained as the final truncated attribute neighbors; here m denotes the number of truncated attribute neighbor candidates of paper node p, c_i denotes the i-th truncated attribute neighbor in the truncated attribute neighbor candidate set of paper node p, x_a^{c_i} denotes the attribute one-hot representation of the i-th truncated attribute neighbor, and i ranges from 1 to m;

step 3, retaining, through random sampling, T (for example, set to 20) structure neighbors of paper node p as the final structure neighbors, the set being denoted C_sp = {s_1, ..., s_T}, where s_i denotes the i-th structure neighbor of node p and i ranges from 1 to T;

step 4, establishing a structural multilayer autoencoder AE_s, which takes the structure one-hot representations of the paper nodes as input and produces, through the structural multilayer autoencoder, the preliminary structure representation of each paper node in the paper network and the preliminary structure representations of its attribute neighbors; and establishing an attribute multilayer autoencoder AE_a, which takes the attribute one-hot representations of the paper nodes as input and produces, through the attribute multilayer autoencoder, the preliminary attribute representation of each paper node in the attributed network and the preliminary attribute representations of its structure neighbors;

step 5, providing a reverse attention mechanism and a weighted-sum method: the preliminary structure representation of the paper node and the preliminary structure representations of its attribute neighbors are fused to obtain the fused structure representation of the paper node, and the preliminary attribute representation of the paper node and the preliminary attribute representations of its structure neighbors are fused to obtain the fused attribute representation of the paper node, both through the reverse attention mechanism and the weighted-sum method;

step 6, making the fused structure representation of the paper node from step 5 interact with the attribute multilayer autoencoder through a reconstruction operation, and making the fused attribute representation of the paper node from step 5 interact with the structural multilayer autoencoder;

step 7, concatenating the fused structure representation obtained through the interaction in step 6 and the fused attribute representation obtained through the interaction in step 6 to obtain the final representation of the paper node.
In this flow, step 1 is the same as step 1 of the truncated neighbor sampling method; the default walk length L is set to 10 steps, and the number of walks γ of each paper node is set to 80 by default.
Step 2 adopts the same method as step 2 and step 3 of the truncated neighbor sampling method.
In step 3, T final structure neighbors of the paper node are obtained through random sampling; if the number of structure neighbor candidates of the paper node is less than T, random sampling with replacement is performed to obtain the set C_sp. In particular, T is set to 20.
The step 4 comprises the following steps:
Step 4-1 is consistent with the step of establishing the structural multilayer autoencoder in the deep-autoencoder-based baseline model described above: the structural multilayer autoencoder AE_s is established, the input is the structure one-hot representation of a paper node, and the output is obtained through the decoding module.

The structure one-hot representations of paper node p and of its attribute neighbors C_Ap are input into the structural multilayer autoencoder AE_s, and the hidden layer yields the preliminary structure representation h_s^p of the paper node and the preliminary structure representations of the attribute neighbors.

Step 4-2 is consistent with the step of establishing the attribute multilayer autoencoder in the deep-autoencoder-based baseline model described above: the attribute multilayer autoencoder AE_a is established, the input is the attribute one-hot representation of a paper node, and the output is obtained through the decoding module.

The attribute one-hot representations of paper node p and of its structure neighbors C_sp are input into the attribute multilayer autoencoder AE_a, and the hidden layer yields the preliminary attribute representation h_a^p of the paper node and the preliminary attribute representations of the structure neighbors.
The step 5 comprises the following steps:
step 5-1, following the dot-product attention mechanism ([Luong et al., 2015] Effective Approaches to Attention-based Neural Machine Translation), a structural similarity coefficient is calculated by dot product between the preliminary structure representation h_s^p of the paper node and the preliminary structure representations h_s^{a_j} of its attribute neighbors:

$$e_{p,j} = h_s^p \cdot h_s^{a_j}, \qquad j = 1, \ldots, K$$

and the K dot-product similarities are normalized to obtain the similarity coefficients:

$$\alpha_{p,j} = \frac{\exp(e_{p,j})}{\sum_{k=1}^{K} \exp(e_{p,k})}$$

where exp denotes the natural exponential. The new reverse attention coefficient \tilde{α}_{p,j} is obtained through normalization by the reverse mechanism,

where the reverse attention coefficient indicates the degree of attention of the paper node to each neighbor. The weighted neighbor information representation is then calculated from the attention coefficients as

$$m_s^p = \sum_{j=1}^{K} \tilde{\alpha}_{p,j} \, h_s^{a_j}.$$
step 5-2, the importance of the neighbors is controlled by a weighted-sum mechanism, in which the weighted neighbor information representation m_s^p is combined with the preliminary structure representation h_s^p under the weight λ_s,

obtaining the fused structure representation of the paper node, where λ_s takes a value between 0 and 1 and is used to control the importance of the neighbor information in the representation.

step 5-3, similarly to step 5-1, the weighted neighbor information representation of the structure neighbors is calculated in the attribute space:

$$m_a^p = \sum_{l=1}^{T} \tilde{\alpha}_{p,l} \, h_a^{s_l}.$$

step 5-4, similarly to step 5-2, a weighted-sum mechanism is used to control the importance of the neighbors,

obtaining the fused attribute representation of the paper node, where λ_a takes a value between 0 and 1 and is used to control the importance of the neighbor information in the representation.
The step 6 comprises the following steps:
step 6-1, a structure-to-attribute decoder is built, and the fused structure representation of step 5-2 is passed through the structure-to-attribute decoder to obtain a reconstructed attribute representation \hat{x}_a^p; the interaction is realized through this reconstruction operation, and the loss function of the structure-to-attribute decoder is

$$L_{sa} = \sum_{p} \left\| (\hat{x}_a^p - x_a^p) \odot b_a^p \right\|_2^2$$

where the sum runs over the paper nodes, n_a is the number of attributes (the dimension of x_a^p), x_a^p is the attribute one-hot representation of node p, and b_a^p is the penalty vector built from the penalty coefficient.

step 6-2, an attribute-to-structure decoder is built, and the fused attribute representation of step 5-4 is passed through the attribute-to-structure decoder to obtain a reconstructed structure representation \hat{x}_s^p; the interaction is realized through this reconstruction operation, and the loss function of the attribute-to-structure decoder is

$$L_{as} = \sum_{p} \left\| (\hat{x}_s^p - x_s^p) \odot b_s^p \right\|_2^2$$

where n_s is the number of nodes, x_s^p is the structure one-hot representation of node p, and b_s^p is the penalty vector built from the penalty coefficient. The decoder parameters are trained along with the method.
In step 7, the fused structure representation of the paper node obtained through the interaction in step 6 and the fused attribute representation of the paper node obtained through the interaction in step 6 are concatenated, and the result is used as the final output representation of the paper node.
In step 8, the low-dimensional representations of the labeled paper nodes in the paper network are input into a support vector machine classifier for training, yielding a trained classifier.

In step 9, the low-dimensional representations of the unlabeled paper nodes in the paper network are input into the classifier, the predicted labels of the paper nodes are obtained as the classification results, and paper classification and same-category paper recommendation are performed.
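A scikit-learn sketch of steps 8 and 9, assuming the learned node representations are stacked in a numpy array; the linear kernel and the way same-category papers are returned as recommendations are illustrative choices, not prescribed by the description above.

```python
import numpy as np
from sklearn.svm import SVC

def classify_and_recommend(emb, labels, labeled_idx, unlabeled_idx):
    """Train an SVM on labeled paper embeddings, predict categories for the
    rest, and recommend papers that share the predicted category."""
    clf = SVC(kernel="linear")
    clf.fit(emb[labeled_idx], labels[labeled_idx])
    pred = clf.predict(emb[unlabeled_idx])
    recommendations = {}
    for node, cat in zip(unlabeled_idx, pred):
        same_cat = [i for i in labeled_idx if labels[i] == cat]
        recommendations[node] = same_cat          # papers of the same category
    return pred, recommendations

# toy usage: 10 nodes with 128-dim embeddings and 3 categories
emb = np.random.rand(10, 128)
labels = np.array([0, 1, 2, 0, 1, 2, 0, 1, -1, -1])   # -1 marks unlabeled nodes
pred, recs = classify_and_recommend(emb, labels, labeled_idx=list(range(8)),
                                    unlabeled_idx=[8, 9])
print(pred, recs)
```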
Example 1
This example uses the Cora paper citation data set and the DBLP paper citation data set for experimental and validation work.
Cora is a paper citation dataset containing 2708 paper nodes and 5429 edges expressing citation relationships; the attribute information of each node is a one-hot representation extracted from the title and abstract text, with 1433 dimensions. Each node belongs to one paper domain category, with 7 categories in total.

DBLP is a paper citation dataset containing 18448 paper nodes and 45526 edges expressing citation relationships; the attribute information of each node is a one-hot representation extracted from the paper text, with 2476 dimensions. Each node belongs to one paper domain category, with 4 categories in total.
In order to verify the effectiveness of the invention, the invention mainly compares the performances of different methods in the node classification task:
DeepWalk ([Perozzi et al., 2014] DeepWalk: Online learning of social representations): a method based on random walks, which generates paper node sequences through the walks and learns low-dimensional representation vectors of the paper nodes with the skip-gram algorithm and the hierarchical softmax optimization method from the word2vec toolkit.

Node2vec ([Grover et al., 2016] node2vec: Scalable feature learning for networks): a method based on biased random walks, which controls the breadth and depth of the walks through the parameters p and q; after the biased walk sequences are obtained, the low-dimensional vector representations of the nodes are learned with the Skip-gram algorithm and negative sampling optimization.

TADW ([Yang et al., 2015] Network representation learning with rich text information): a paper network representation learning method that fuses structural and attribute information simultaneously through matrix factorization. Because it exploits multi-source information, TADW is generally more effective than models that use only structural information.

TriDNR ([Pan et al., 2016] Tri-Party Deep Network Representation): a semi-supervised attributed network representation learning model based on a simple neural network, which constructs two training networks: a structure training network and an attribute-label training network that share the node representations. The method can fuse structural, attribute, and supervision information simultaneously. For fairness, the supervision (label) information used by this method is removed from the comparison.

MVC_DNE ([Yang et al., 2017] From properties to links: Deep network embedding on incomplete graphs): a deep learning method based on multilayer autoencoders, which assumes that to some extent a mutual mapping exists between structure and attributes and keeps the two consistent in the deep space during training.
Node classification is selected to compare the quality of the paper node vector representations. In the experiments, cross-validation is adopted, and an SVM classifier is used to evaluate the classification performance of each method on each dataset.
The invention adopts two measures of Micro-average (Micro-F1) and Macro-average (Macro-F1).
Micro-F1 is calculated as

$$\text{Micro-F1} = \frac{2 \cdot P_{micro} \cdot R_{micro}}{P_{micro} + R_{micro}}$$

where the precision P_{micro} and the recall R_{micro} are computed globally over all categories, and Macro-F1 is calculated as

$$\text{Macro-F1} = \frac{1}{|C|} \sum_{c \in C} \frac{2 \cdot P_c \cdot R_c}{P_c + R_c}$$

where P and R denote precision and recall, respectively, and C is the set of categories.
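For reference, both measures can be computed with scikit-learn; the label arrays below are illustrative.

```python
from sklearn.metrics import f1_score

y_true = [0, 1, 2, 2, 1, 0, 1]
y_pred = [0, 1, 2, 1, 1, 0, 2]
print("Micro-F1:", f1_score(y_true, y_pred, average="micro"))
print("Macro-F1:", f1_score(y_true, y_pred, average="macro"))
```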
The node classification results of the present invention and the other methods on the Cora dataset and the DBLP dataset are shown in FIG. 5, where 10% and 50% of the paper nodes are taken as training data, respectively, and the rest as test data. The process was repeated 30 times, and the averages of Micro-F1 (Mi-F1 in the figure) and Macro-F1 (Ma-F1 in the figure) were used as the final indices for comparison. Compared with DeepWalk and node2vec, which do not use the attribute information of the paper network, the methods that use attribute information perform markedly better. Compared with TADW, TriDNR, and MVC-DNE, which use the structural and attribute information of the paper network simultaneously, the method of the invention combines the deep autoencoder with truncated neighbor sampling and achieves better performance. On both datasets and both measures, the method of the invention improves on the second-best method by about 2%, a considerable improvement.
The invention provides a paper classification and recommendation method, and there are many ways to implement the technical solution; the above is only a preferred embodiment of the invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principles of the invention, and such improvements and modifications shall also be regarded as falling within the protection scope of the invention. All components not specified in this embodiment can be realized by the prior art.
Claims (10)
1. A method for paper classification and recommendation, comprising the steps of:
step 1, obtaining the truncated attribute neighbor candidates of each paper node in the paper network G by truncated neighbor sampling, and using the set symbol C_ap to denote the truncated attribute neighbor candidate set of a paper node p;

step 2, calculating the final truncated attribute neighbors based on the truncated attribute neighbor candidate set;

step 3, retaining T structure neighbors of the paper node p as the final structure neighbors;

step 4, establishing a structural multilayer autoencoder AE_s, which takes the structure one-hot representations of the paper nodes as input and produces, through the structural multilayer autoencoder, the preliminary structure representation of each paper node in the attributed network and the structure representations of its attribute neighbors; and establishing an attribute multilayer autoencoder AE_a, which takes the attribute one-hot representations of the paper nodes as input and produces, through the attribute multilayer autoencoder, the preliminary attribute representation of each paper node in the attributed network and the attribute representations of its structure neighbors;

step 5, fusing the preliminary structure representation of the paper node with the structure representations of its attribute neighbors to obtain the fused structure representation of the paper node, and fusing the preliminary attribute representation of the paper node with the attribute representations of its structure neighbors to obtain the fused attribute representation of the paper node;

step 6, making the fused structure representation of step 5 interact with the attribute multilayer autoencoder through a reconstruction operation, and making the fused attribute representation of step 5 interact with the structural multilayer autoencoder;

step 7, concatenating the fused structure representation obtained through the interaction in step 6 and the fused attribute representation obtained through the interaction in step 6 to obtain the low-dimensional representation of the paper node;

step 8, inputting the low-dimensional representations of the labeled paper nodes in the paper network into a support vector machine classifier for training to obtain a trained classifier;

and step 9, inputting the low-dimensional representations of the unlabeled paper nodes in the paper network into the classifier, obtaining the predicted labels of the paper nodes as the classification results, and performing paper classification and same-category paper recommendation.
2. The method of claim 1, wherein the truncated neighbor sampling in step 1 is: a random walk is run from each paper node with a given walk length L and a number of walks per node γ; the walks yield a walk sequence for each paper node, and the paper nodes in the sequences are counted and de-duplicated to obtain the truncated attribute neighbor candidate set C_ap of each paper node.
3. The method of claim 2, wherein step 2 comprises:
step 2-1, letting the truncated attribute neighbor candidate set of the paper node p be C_ap = {c_1, ..., c_m}, where c_i denotes the i-th truncated attribute neighbor in the truncated attribute neighbor candidate set of paper node p; the set of attribute one-hot representations of the truncated attribute neighbor candidate set C_ap is {x_a^{c_1}, ..., x_a^{c_m}}, where x_a^{c_i} denotes the attribute one-hot representation of c_i, and i ranges from 1 to m;

the cosine similarity between the attribute one-hot representation x_a^p of paper node p and each member of the attribute one-hot representation set of the truncated attribute neighbor candidate set is calculated as

$$\mathrm{sim}(p, c_i) = \frac{x_a^p \cdot x_a^{c_i}}{\left\| x_a^p \right\| \, \left\| x_a^{c_i} \right\|};$$

step 2-2, sorting the cosine similarity results of the truncated attribute neighbor candidates of paper node p in descending order, and taking the top K as the final attribute neighbors of paper node p; if m is less than K, random sampling with replacement is performed to obtain the set C_Ap = {a_1, ..., a_K}, where a_i denotes the i-th sampled attribute neighbor of node p and C_Ap denotes the set of these attribute neighbors.
4. The method according to claim 3, wherein in step 3, T final structure neighbors of a paper node are obtained by random sampling; if the number of structure neighbor candidates of the paper node is less than T, random sampling with replacement is performed to obtain the set C_sp = {s_1, ..., s_T}, where s_i denotes the i-th structure neighbor of node p and i ranges from 1 to T.
5. The method of claim 4, wherein step 4 comprises:
step 4-1, building a structural multilayer autoencoder AE_s, the structural multilayer autoencoder AE_s comprising an encoding module and a decoding module, wherein the encoding module is:

$$y_1 = \sigma(W_1 x_s + b_1)$$

$$y_m = \sigma(W_m y_{m-1} + b_m)$$

where y_i denotes the output of the i-th fully connected layer, W_i and b_i respectively denote the projection matrix and the bias vector of the i-th layer, x_s denotes the structure one-hot representation of the input, d_s is the dimension of the structure one-hot representation, σ is the sigmoid function, and R^{d_s} denotes the d_s-dimensional real space;

the decoding module inverts the process of the encoding module to obtain the output \hat{x}_s, its parameters are trained along with the learning method through reconstruction, and the loss function of the structural multilayer autoencoder is

$$L_s = \sum_{i=1}^{n_s} \left\| (\hat{x}_s^i - x_s^i) \odot b_i \right\|_2^2$$

where n_s is the number of nodes, x_s^i is the structure one-hot representation input of the i-th node, \hat{x}_s^i is the structure representation obtained by decoding, and b_i is a penalty vector built from the penalty parameter;

the structure one-hot representations of paper node p and of its attribute neighbors C_Ap are input into the structural multilayer autoencoder AE_s, and the hidden layer yields the preliminary structure representation h_s^p of the paper node and the structure representations {h_s^{a_1}, ..., h_s^{a_K}} of the attribute neighbors, where h_s^{a_i} is the preliminary structure representation of the i-th attribute neighbor obtained from the hidden layer;

step 4-2, establishing an attribute multilayer autoencoder AE_a; the attribute one-hot representations of paper node p and of its structure neighbors C_sp are input into the attribute multilayer autoencoder AE_a, and the hidden layer yields the preliminary attribute representation h_a^p of paper node p and the attribute representations {h_a^{s_1}, ..., h_a^{s_T}} of the structure neighbors, where h_a^{s_l} is the preliminary attribute representation of the l-th structure neighbor obtained from the hidden layer, and l ranges from 1 to T.
6. The method of claim 5, wherein step 5 comprises:
step 5-1, calculating a structural similarity coefficient by dot product following a dot-product attention mechanism, and, from the preliminary structure representation h_s^p of paper node p and the structure representation h_s^{a_j} of each attribute neighbor, calculating the dot-product similarity:

$$e_{p,j} = h_s^p \cdot h_s^{a_j}, \qquad j = 1, \ldots, K$$

where e_{p,j} is the dot-product similarity between paper node p and its j-th attribute neighbor; the K dot-product similarities are normalized to obtain the similarity coefficient α_{p,j}:

$$\alpha_{p,j} = \frac{\exp(e_{p,j})}{\sum_{k=1}^{K} \exp(e_{p,k})}$$

where exp denotes the natural exponential; normalization by the reverse mechanism yields a new reverse attention coefficient \tilde{α}_{p,j},

where the reverse attention coefficient indicates the degree of attention of the paper node to each neighbor;
7. The method of claim 6, wherein step 5-2 comprises: controlling the importance of the neighbors using a weighted-sum mechanism to obtain the fused structure representation of the paper node, wherein the parameter λ_s takes a value between 0 and 1 and is used to control the importance of the neighbor information in the representation.

9. The method of claim 8, wherein step 5-4 comprises: controlling the importance of the neighbors using a weighted-sum mechanism to obtain the fused attribute representation of the paper node, wherein the parameter λ_a takes a value between 0 and 1 and is used to control the importance of the neighbor information in the representation.
10. The method of claim 9, wherein step 6 comprises:
step 6-1, building a structure-to-attribute decoder, passing the fused structure representation of step 5-2 through the structure-to-attribute decoder to obtain a reconstructed attribute representation \hat{x}_a^p, and interacting through the reconstruction operation to obtain the loss of the structure-to-attribute decoder

$$L_{sa} = \sum_{p} \left\| (\hat{x}_a^p - x_a^p) \odot b_a^p \right\|_2^2$$

where n_a is the number of attributes, x_a^p is the attribute one-hot representation of node p, and b_a^p is the penalty vector built from the penalty coefficient;

step 6-2, building an attribute-to-structure decoder, passing the fused attribute representation of step 5-4 through the attribute-to-structure decoder to obtain a reconstructed structure representation \hat{x}_s^p, and interacting through the reconstruction operation to obtain the loss of the attribute-to-structure decoder

$$L_{as} = \sum_{p} \left\| (\hat{x}_s^p - x_s^p) \odot b_s^p \right\|_2^2.$$
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011009122.9A CN112148876B (en) | 2020-09-23 | 2020-09-23 | Paper classification and recommendation method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011009122.9A CN112148876B (en) | 2020-09-23 | 2020-09-23 | Paper classification and recommendation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112148876A true CN112148876A (en) | 2020-12-29 |
CN112148876B CN112148876B (en) | 2023-10-13 |
Family
ID=73897877
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011009122.9A Active CN112148876B (en) | 2020-09-23 | 2020-09-23 | Paper classification and recommendation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112148876B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112749757A (en) * | 2021-01-21 | 2021-05-04 | 厦门大学 | Paper classification model construction method and system based on gated graph attention network |
CN112836050A (en) * | 2021-02-04 | 2021-05-25 | 山东大学 | Citation network node classification method and system aiming at relation uncertainty |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108228728A (en) * | 2017-12-11 | 2018-06-29 | 北京航空航天大学 | A kind of paper network node of parametrization represents learning method |
CN109376857A (en) * | 2018-09-03 | 2019-02-22 | 上海交通大学 | A kind of multi-modal depth internet startup disk method of fusion structure and attribute information |
US20200082531A1 (en) * | 2018-09-07 | 2020-03-12 | 3Mensio Medical Imaging B.V. | Method, Device and System for Dynamic Analysis from Sequences of Volumetric Images |
- 2020-09-23: CN application CN202011009122.9A granted as patent CN112148876B (active)
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108228728A (en) * | 2017-12-11 | 2018-06-29 | 北京航空航天大学 | A kind of paper network node of parametrization represents learning method |
CN109376857A (en) * | 2018-09-03 | 2019-02-22 | 上海交通大学 | A kind of multi-modal depth internet startup disk method of fusion structure and attribute information |
US20200082531A1 (en) * | 2018-09-07 | 2020-03-12 | 3Mensio Medical Imaging B.V. | Method, Device and System for Dynamic Analysis from Sequences of Volumetric Images |
Non-Patent Citations (3)
Title |
---|
CHUNYANG TAN: "MSGE: A Multi-step Gated Model for Knowledge Graph Completion", 《SPRINGER》 * |
- 余传明; 林奥琛; 钟韵辞; 安璐: "Research on scientific research collaboration recommendation based on network representation learning", 情报学报 (Journal of the China Society for Scientific and Technical Information), no. 05
- 谭春阳: "Research on representation learning of complex network structures", 硕士电子期刊 (Master's theses electronic journal) *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112749757A (en) * | 2021-01-21 | 2021-05-04 | 厦门大学 | Paper classification model construction method and system based on gated graph attention network |
CN112749757B (en) * | 2021-01-21 | 2023-09-12 | 厦门大学 | Thesis classification model construction method and system based on gating graph annotation force network |
CN112836050A (en) * | 2021-02-04 | 2021-05-25 | 山东大学 | Citation network node classification method and system aiming at relation uncertainty |
CN112836050B (en) * | 2021-02-04 | 2022-05-17 | 山东大学 | Citation network node classification method and system aiming at relation uncertainty |
Also Published As
Publication number | Publication date |
---|---|
CN112148876B (en) | 2023-10-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110321494B (en) | Socialized recommendation method based on matrix decomposition and network embedding combined model | |
CN112380435B (en) | Document recommendation method and system based on heterogeneous graph neural network | |
CN109753589A (en) | A kind of figure method for visualizing based on figure convolutional network | |
CN111709518A (en) | Method for enhancing network representation learning based on community perception and relationship attention | |
Qu et al. | Curriculum learning for heterogeneous star network embedding via deep reinforcement learning | |
CN111079409A (en) | Emotion classification method by using context and aspect memory information | |
CN112100485B (en) | Comment-based scoring prediction article recommendation method and system | |
CN109447261B (en) | Network representation learning method based on multi-order proximity similarity | |
CN112597296A (en) | Abstract generation method based on plan mechanism and knowledge graph guidance | |
CN112148876A (en) | Paper classification and recommendation method | |
CN112559764A (en) | Content recommendation method based on domain knowledge graph | |
Song et al. | Session-based recommendation with hierarchical memory networks | |
CN110781271A (en) | Semi-supervised network representation learning model based on hierarchical attention mechanism | |
CN107491782A (en) | Utilize the image classification method for a small amount of training data of semantic space information | |
Luo et al. | ResumeNet: A learning-based framework for automatic resume quality assessment | |
Jin et al. | Deepwalk-aware graph convolutional networks | |
CN112784118A (en) | Community discovery method and device in graph sensitive to triangle structure | |
Cheng et al. | Dynamic embedding on textual networks via a gaussian process | |
CN109815335A (en) | A kind of paper domain classification method suitable for document network | |
Li et al. | Neural architecture search via proxy validation | |
CN117150041A (en) | Small sample knowledge graph completion method based on reinforcement learning | |
Bao et al. | HTRM: a hybrid neural network algorithm based on tag-aware | |
CN116821371A (en) | Method for generating scientific abstracts of multiple documents by combining and enhancing topic knowledge graphs | |
CN116610874A (en) | Cross-domain recommendation method based on knowledge graph and graph neural network | |
CN117009674A (en) | Cloud native API recommendation method integrating data enhancement and contrast learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |