CN112148876A - Paper classification and recommendation method - Google Patents
Paper classification and recommendation method
- Publication number
- CN112148876A (application CN202011009122.9A)
- Authority
- CN
- China
- Prior art keywords
- representation
- attribute
- node
- neighbor
- paper
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/335—Filtering based on additional data, e.g. user or group profiles
- G06F16/337—Profile generation, learning or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/194—Calculation of difference between files
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Computational Linguistics (AREA)
- Databases & Information Systems (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a paper classification and recommendation method, which comprises the following steps: sampling each paper node to obtain candidate attribute neighbor nodes, and taking all structure neighbors of each paper node as candidate structure neighbor nodes; generating attribute representations and structure representations for the paper nodes and their neighbor nodes; fusing the structure representation of a paper node with the structure representations of its attribute neighbors to obtain a fused structure representation, and fusing the attribute representation of the paper node with the attribute representations of its final structure neighbors to obtain a fused attribute representation; training with a reconstruction loss function during neural network training and continuously updating the representations; concatenating the updated fused structure representation and fused attribute representation to obtain a low-dimensional vector representation of the paper node; inputting the low-dimensional representations of the category-labeled paper nodes in the paper network into a support vector machine and training it to obtain a classification model; and predicting the categories of paper nodes with the classification model, then using the predicted categories to classify papers and to recommend papers of the same category.
Description
Technical Field
The invention belongs to the field of data mining, and particularly relates to a paper classification and recommendation method.
Background
At present, network representation learning is a popular research direction in the field of data mining and has very wide applications in real life, such as node classification, node clustering, and recommendation systems. Paper network representation learning studies how to learn a low-dimensional representation for every node of a paper network whose nodes carry attribute information, and then applies the learned representations to downstream tasks. The mainstream methods include representation learning methods that use only structural information and multi-source information fusion methods that combine structural information and attribute information. Representations learned by the latter, which fuse structural and attribute information, generally achieve better performance than those learned by methods that use only structural information.
Although current multi-source information fusion methods have made great progress, they usually ignore the problem of outliers in attributed networks. Specifically, a paper node may not belong to the same category as its structural neighbors or as nodes with similar attributes, yet many current models rest on exactly this similarity assumption. In addition, the structural information and the attribute information are often distributed inconsistently, i.e., the two sources do not describe a node in a consistent way. The similarity and consistency assumptions underlying current models therefore do not match the distribution of real paper networks, which limits the representation capability of these models.
Network data is widely available in real life, such as a paper reference network, a social media communication network and the like. How to learn the nodes in the paper network to obtain a low-dimensional representation and apply the low-dimensional representation to offline tasks (such as paper classification, paper recommendation and the like) is a research hotspot. Some methods of network representation learning are mainly described below.
Perozzi et al. ([Perozzi et al., 2014] DeepWalk: Online learning of social representations) proposed the DeepWalk model, which was the first to traverse network data with random walks, producing multiple walk sequences that are treated as pseudo-sentences. These walk sequences are then trained with the word-vector learning method Skip-gram and hierarchical softmax optimization to obtain low-dimensional node representations.
Grover et al. ([Grover et al., 2016] node2vec: Scalable feature learning for networks) improved the random walk strategy of DeepWalk by proposing biased random walks whose depth and breadth are controlled by the parameters p and q. After the biased walk sequences are obtained, the low-dimensional vector representations of the nodes are learned with the Skip-gram algorithm and negative sampling optimization.
Tang et al. ([Tang et al., 2015] LINE: Large-scale information network embedding) proposed the LINE model and were the first to introduce the concepts of first-order and second-order proximity for measuring the similarity between nodes. The low-dimensional vector representations of the nodes are obtained by training with an edge-sampling optimization scheme.
Yang et al. ([Yang et al., 2015] Network representation learning with rich text information) proposed TADW, the first model to incorporate text information into paper network representation learning; it uses structural information and attribute information simultaneously through matrix factorization. Because it exploits multi-source information, TADW is generally more effective than models that use only structural information.
The TriDNR model proposed by Pan et al. ([Pan et al., 2016] Tri-Party Deep Network Representation) is a semi-supervised attributed network representation learning model based on a simple neural network. TriDNR is the first to consider the deep associations among the node-node, node-attribute, and label-attribute relations, and it constructs two training networks: a structure training network and an attribute-label training network, which share the node representations. Through alternating training, the representation of a node in the structure graph and its representation in the attribute-label network are updated in turn, so the low-dimensional representation fuses structural, attribute, and supervision information simultaneously, and the two networks influence and reinforce each other during training.
Yang et al. ([Yang et al., 2017] From properties to links: Deep network embedding on incomplete graphs) rely on a consistency assumption between structure and attributes. For example, in a paper network, a node (paper) belonging to the natural language processing category is more likely to cite several papers in the same domain, and the contents of papers that cite each other are more likely to be similar. In theory there is therefore, to some extent, a mutual mapping between structure and attributes. Based on this assumption, the method keeps the two consistent in the deep embedding space in order to enhance the robustness of the representation and obtain a higher-quality representation. However, analysis shows that a gap exists between real-world data and such a consistency assumption, which limits the quality of the representations learned by these methods.
Disclosure of Invention
The purpose of the invention is as follows: in paper network representation learning methods, gaps exist between the commonly used similarity and consistency assumptions and the real data distribution, which limits the representation quality. The problem to be solved by the invention is how to mine more hidden information under the real data distribution and reduce the difference between the assumptions and the real data, so that a high-quality low-dimensional paper node representation can be learned and applied to downstream tasks such as paper classification and paper recommendation.
The invention specifically provides a low-dimensional vector representation learning method for a paper network that fuses sampled neighbor information, which comprises the following steps:
step 1, obtaining the truncated attribute neighbor candidates of each paper node in the paper network G by truncated neighbor sampling, and using the set symbol C_ap to denote the truncated attribute neighbor candidate set of a paper node p;

step 2, calculating the final truncated attribute neighbors based on the truncated attribute neighbor candidate set: a cosine similarity measure is applied to paper node p and its truncated attribute neighbor candidate set C_ap = {c_1, ..., c_m}, where the attribute one-hot representation of paper node p is x_a^p and the set of attribute one-hot representations of the truncated attribute neighbor candidates is {x_a^{c_1}, ..., x_a^{c_m}}; the K attribute neighbors with the highest similarity are calculated and retained as the final truncated attribute neighbors; here m denotes the number of truncated attribute neighbor candidates of paper node p, c_i denotes the i-th truncated attribute neighbor in the truncated attribute neighbor candidate set of paper node p, x_a^{c_i} denotes the attribute one-hot representation of the i-th truncated attribute neighbor, and i ranges from 1 to m;
step 3, retaining T structure neighbors of the paper node p as the final structure neighbors;
step 4, establishing a structural multilayer autoencoder AE_s, which takes the structure one-hot representations of the paper nodes as input and produces, through the structural multilayer autoencoder, the preliminary structure representation of each paper node in the attributed network and the structure representations of its attribute neighbors; and establishing an attribute multilayer autoencoder AE_a, which takes the attribute one-hot representations of the paper nodes as input and produces, through the attribute multilayer autoencoder, the preliminary attribute representation of each paper node in the attributed network and the attribute representations of its structure neighbors;
step 5, fusing the preliminary structure representation of the paper node with the structure representations of its attribute neighbors to obtain the fused structure representation of the paper node, and fusing the preliminary attribute representation of the paper node with the attribute representations of its structure neighbors to obtain the fused attribute representation of the paper node;
step 6, making the fused structure representation of step 5 interact with the attribute multilayer autoencoder through a reconstruction operation, and making the fused attribute representation of step 5 interact with the structural multilayer autoencoder;

step 7, concatenating the fused structure representation obtained through the interaction in step 6 and the fused attribute representation obtained through the interaction in step 6 to obtain the low-dimensional representation of the paper node;

step 8, inputting the low-dimensional representations of the labeled paper nodes in the paper network into a support vector machine classifier for training to obtain a trained classifier;

and step 9, inputting the low-dimensional representations of the unlabeled paper nodes in the paper network into the classifier, obtaining the predicted labels of the paper nodes as the classification results, and performing paper classification and same-category paper recommendation.
In step 1, the paper network refers to a network in which the nodes are papers and the nodes carry attribute information (title text, abstract text, body text, etc.), such as the paper network Cora ([Sen et al., 2008] Collective classification in network data), which contains title text and abstract text information. The truncated neighbor sampling is: a random walk is run from each paper node with a given walk length L (set to 10) and a number of walks per node γ (set to 80); the walks yield a walk sequence for each paper node, and the paper nodes in the sequences are counted and de-duplicated to obtain the truncated attribute neighbor candidate set C_ap of each paper node.
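For illustration, the following is a minimal Python sketch of the truncated neighbor sampling described above, assuming the paper network is given as an adjacency list (a dict mapping a node id to the ids of its cited/citing papers); the function and variable names are illustrative and not taken from the text above.

```python
import random
from collections import defaultdict

def truncated_neighbor_sampling(adj, walk_length=10, num_walks=80, seed=0):
    """Run random walks from every paper node and collect the de-duplicated
    set of visited nodes as its truncated attribute neighbor candidates."""
    rng = random.Random(seed)
    candidates = defaultdict(set)
    for p in adj:
        for _ in range(num_walks):
            current = p
            for _ in range(walk_length):
                neighbors = adj.get(current, [])
                if not neighbors:
                    break
                current = rng.choice(neighbors)
                if current != p:
                    candidates[p].add(current)
    return {p: sorted(c) for p, c in candidates.items()}

# toy usage: a 4-node citation graph
adj = {0: [1, 2], 1: [0, 3], 2: [0], 3: [1]}
print(truncated_neighbor_sampling(adj, walk_length=3, num_walks=5))
```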
The step 2 comprises the following steps:
step 2-1, let the truncated attribute neighbor candidate set of the paper node p be C_ap = {c_1, ..., c_m}, where c_i denotes the i-th truncated attribute neighbor in the truncated attribute neighbor candidate set of paper node p; the set of attribute one-hot representations of the truncated attribute neighbor candidate set C_ap is {x_a^{c_1}, ..., x_a^{c_m}}, where x_a^{c_i} denotes the attribute one-hot representation of c_i, and i ranges from 1 to m;

the cosine similarity between the attribute one-hot representation x_a^p of paper node p and each member of the attribute one-hot representation set of the truncated attribute neighbor candidate set is calculated as

$$\mathrm{sim}(p, c_i) = \frac{x_a^p \cdot x_a^{c_i}}{\left\| x_a^p \right\| \, \left\| x_a^{c_i} \right\|}, \qquad i = 1, \ldots, m;$$
Step 2-2, the cosine similarity results of the truncated attribute neighbor candidates of paper node p are sorted in descending order, and the largest K (for example, the first 20) are taken as the final attribute neighbors of paper node p; if m < K, random sampling with replacement is performed to obtain the set C_Ap = {a_1, ..., a_K}, where a_i denotes the i-th sampled attribute neighbor and C_Ap denotes the set of these attribute neighbors.
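A sketch of the top-K selection in step 2, assuming the one-hot attribute vectors are stored as numpy arrays; the names and the tie-breaking behaviour are illustrative.

```python
import numpy as np

def select_attribute_neighbors(x_p, candidate_vecs, candidate_ids, k=20, seed=0):
    """Keep the K candidates whose attribute vectors are most cosine-similar
    to paper node p; sample with replacement if fewer than K candidates exist."""
    rng = np.random.default_rng(seed)
    sims = candidate_vecs @ x_p / (
        np.linalg.norm(candidate_vecs, axis=1) * np.linalg.norm(x_p) + 1e-12)
    order = np.argsort(-sims)                       # descending similarity
    if len(candidate_ids) >= k:
        return [candidate_ids[i] for i in order[:k]]
    # fewer than K candidates: random sampling with replacement
    return list(rng.choice(candidate_ids, size=k, replace=True))

# toy usage with binary attribute vectors (5-dim for brevity)
x_p = np.array([1, 0, 1, 0, 1], dtype=float)
cands = np.array([[1, 0, 1, 0, 0], [0, 1, 0, 1, 0], [1, 0, 0, 0, 1]], dtype=float)
print(select_attribute_neighbors(x_p, cands, candidate_ids=[7, 8, 9], k=2))
```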
In step 3, T (for example, set to 20) final structure neighbors of the paper node are obtained by random sampling; if the number of structure neighbor candidates of the paper node is less than T, random sampling with replacement is performed to obtain the set C_sp = {s_1, ..., s_T}, where s_i denotes the i-th structure neighbor of node p and i ranges from 1 to T.
Step 4 comprises the following steps:
step 4-1, a structural multilayer autoencoder AE_s is built ([Hinton et al., 2006] A fast learning algorithm for deep belief nets). The structural multilayer autoencoder AE_s comprises an encoding module and a decoding module, where the encoding module is:

$$y_1 = \sigma(W_1 x_s + b_1)$$

$$y_m = \sigma(W_m y_{m-1} + b_m)$$

where y_i denotes the output of the i-th fully connected layer, W_i and b_i respectively denote the projection matrix and the bias vector of the i-th layer, x_s denotes the structure one-hot representation of the input, d_s is the dimension of the structure one-hot representation, and σ is the sigmoid function. W_i and b_i are trained along with the representation learning method, and x_s lies in the d_s-dimensional real space R^{d_s}.

The decoding module inverts the encoding module to obtain the output \hat{x}_s; its parameters are trained along with the learning method through reconstruction, and the loss function of the structural multilayer autoencoder is

$$L_s = \sum_{i=1}^{n_s} \left\| (\hat{x}_s^i - x_s^i) \odot b_i \right\|_2^2$$

where n_s is the number of nodes, x_s^i is the structure one-hot representation input of the i-th node, \hat{x}_s^i is the structure representation reconstructed by the decoder, ⊙ denotes element-wise multiplication, and b_i is a penalty vector built from the penalty parameter β (for example, set to 10) so that more penalty is given to the non-zero values in the representation.

The structure one-hot representations of paper node p and of its attribute neighbors C_Ap are input into the structural multilayer autoencoder AE_s, and the hidden layer yields the preliminary structure representation h_s^p of the paper node and the structure representations {h_s^{a_1}, ..., h_s^{a_K}} of the attribute neighbors, where h_s^{a_i} is the preliminary structure representation of the i-th attribute neighbor obtained from the hidden layer;

step 4-2, an attribute multilayer autoencoder AE_a is established in the same way as step 4-1. The attribute one-hot representations of paper node p and of its structure neighbors C_sp are input into the attribute multilayer autoencoder AE_a, and the hidden layer yields the preliminary attribute representation h_a^p of the paper node and the attribute representations {h_a^{s_1}, ..., h_a^{s_T}} of the structure neighbors, where h_a^{s_l} is the preliminary attribute representation of the l-th structure neighbor obtained from the hidden layer, and l ranges from 1 to T;
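A minimal PyTorch sketch of the multilayer autoencoder and the penalized reconstruction loss described in step 4; the 64-dimensional hidden layer and β = 10 follow the description above, while everything else (names, depth, toy data) is an illustrative assumption.

```python
import torch
import torch.nn as nn

class MultiLayerAutoencoder(nn.Module):
    """Sigmoid MLP encoder/decoder; the last hidden layer is the node representation."""
    def __init__(self, input_dim, hidden_dims=(256, 64)):
        super().__init__()
        dims = (input_dim,) + tuple(hidden_dims)
        enc = []
        for d_in, d_out in zip(dims[:-1], dims[1:]):
            enc += [nn.Linear(d_in, d_out), nn.Sigmoid()]
        self.encoder = nn.Sequential(*enc)
        rdims = tuple(reversed(dims))
        dec = []
        for d_in, d_out in zip(rdims[:-1], rdims[1:]):
            dec += [nn.Linear(d_in, d_out), nn.Sigmoid()]
        self.decoder = nn.Sequential(*dec)

    def forward(self, x):
        h = self.encoder(x)           # preliminary (hidden-layer) representation
        return h, self.decoder(h)     # reconstruction

def penalized_reconstruction_loss(x, x_hat, beta=10.0):
    """Squared error with a larger penalty (beta) on the non-zero entries of x."""
    b = torch.where(x != 0, torch.full_like(x, beta), torch.ones_like(x))
    return (((x_hat - x) * b) ** 2).sum()

# toy usage: 8 nodes with a 1433-dim structure one-hot input
ae_s = MultiLayerAutoencoder(input_dim=1433)
x_s = torch.rand(8, 1433).round()
h_s, x_s_hat = ae_s(x_s)
print(h_s.shape, penalized_reconstruction_loss(x_s, x_s_hat).item())
```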
the step 5 comprises the following steps:
step 5-1, following the dot-product attention mechanism ([Luong et al., 2015] Effective Approaches to Attention-based Neural Machine Translation), a structural similarity coefficient is calculated by dot product between the preliminary structure representation h_s^p of paper node p and the structure representation h_s^{a_j} of each attribute neighbor:

$$e_{p,j} = h_s^p \cdot h_s^{a_j}, \qquad j = 1, \ldots, K$$

where e_{p,j} is the dot-product similarity between paper node p and its j-th attribute neighbor; the K dot-product similarities are normalized to obtain the similarity coefficient α_{p,j}:

$$\alpha_{p,j} = \frac{\exp(e_{p,j})}{\sum_{k=1}^{K} \exp(e_{p,k})}$$

where exp denotes the natural exponential; normalization by the reverse mechanism then yields a new reverse attention coefficient \tilde{α}_{p,j},

where the reverse attention coefficient indicates the degree of attention of the paper node to each neighbor;
Step 5-2 comprises: the importance of the neighbors is controlled using a weighted-sum mechanism, in which the reverse-attention-weighted structure representations of the attribute neighbors are summed and combined with the preliminary structure representation h_s^p under the weight λ_s,

obtaining the fused structure representation of the paper node, where the parameter λ_s takes a value between 0 and 1 and is used to control the importance of the neighbor information in the representation.
Step 5-4 comprises: the importance of the neighbors is controlled using a weighted-sum mechanism, in which the reverse-attention-weighted attribute representations of the structure neighbors are summed and combined with the preliminary attribute representation h_a^p under the weight λ_a,

obtaining the fused attribute representation of the paper node, where the parameter λ_a takes a value between 0 and 1 and is used to control the importance of the neighbor information in the representation.
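A sketch of the reverse-attention fusion of step 5, written as a single helper used for both the structure and the attribute view. Two details are assumptions, since the text above only states that a reverse normalization and a weighted sum are used: the reverse coefficients are taken as the renormalized complement of the softmax weights, and the fusion is a convex combination weighted by λ.

```python
import numpy as np

def reverse_attention_fusion(h_p, h_neigh, lam=0.5):
    """Fuse a node's hidden representation with its neighbors' representations.

    h_p:     (d,) preliminary representation of the paper node
    h_neigh: (K, d) preliminary representations of its K sampled neighbors
    lam:     weight in [0, 1] controlling the importance of neighbor information
    """
    # dot-product similarity and softmax normalization
    e = h_neigh @ h_p                                   # (K,)
    alpha = np.exp(e - e.max())
    alpha = alpha / alpha.sum()
    # assumed reverse mechanism: renormalize the complement of alpha
    rev = (1.0 - alpha) / (1.0 - alpha).sum()
    neighbor_info = rev @ h_neigh                       # weighted neighbor representation
    # assumed weighted-sum fusion controlled by lam
    return lam * h_p + (1.0 - lam) * neighbor_info

# toy usage: 64-dim hidden representations, K = 20 neighbors
h_p = np.random.rand(64)
h_neigh = np.random.rand(20, 64)
print(reverse_attention_fusion(h_p, h_neigh, lam=0.7).shape)
```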
The step 6 comprises the following steps:
step 6-1, a structure-to-attribute decoder is built, and the fused structure representation of step 5-2 is passed through the structure-to-attribute decoder to obtain a reconstructed attribute representation \hat{x}_a^p; the interaction is realized through this reconstruction operation, and the loss function of the structure-to-attribute decoder is

$$L_{sa} = \sum_{p} \left\| (\hat{x}_a^p - x_a^p) \odot b_a^p \right\|_2^2$$

where the sum runs over the paper nodes, n_a is the number of attributes (the dimension of x_a^p), x_a^p is the attribute one-hot representation of node p, and b_a^p is the penalty vector built from the penalty coefficient so that more penalty is given to the non-zero values;

step 6-2, an attribute-to-structure decoder is built, and the fused attribute representation of step 5-4 is passed through the attribute-to-structure decoder to obtain a reconstructed structure representation \hat{x}_s^p; the interaction is realized through this reconstruction operation, and the loss function of the attribute-to-structure decoder is

$$L_{as} = \sum_{p} \left\| (\hat{x}_s^p - x_s^p) \odot b_s^p \right\|_2^2$$

where n_s is the number of nodes, x_s^p is the structure one-hot representation of node p, and b_s^p is the penalty vector built from the penalty coefficient. The parameters are trained along with the method.
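A PyTorch sketch of the two cross decoders of step 6, reusing the penalized loss from the autoencoder sketch above; the decoder depth, layer sizes, and toy dimensions are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CrossDecoder(nn.Module):
    """Maps a fused hidden representation of one view (structure or attribute)
    to a reconstruction of the other view's one-hot input."""
    def __init__(self, hidden_dim, output_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(hidden_dim, 256), nn.Sigmoid(),
            nn.Linear(256, output_dim), nn.Sigmoid())

    def forward(self, h):
        return self.net(h)

def penalized_loss(x, x_hat, beta=10.0):
    b = torch.where(x != 0, torch.full_like(x, beta), torch.ones_like(x))
    return (((x_hat - x) * b) ** 2).sum()

# toy usage: fused structure reps reconstruct attribute one-hots, and vice versa
struct_to_attr = CrossDecoder(hidden_dim=64, output_dim=1433)
attr_to_struct = CrossDecoder(hidden_dim=64, output_dim=2708)
h_s_fused, h_a_fused = torch.rand(8, 64), torch.rand(8, 64)
x_a, x_s = torch.rand(8, 1433).round(), torch.rand(8, 2708).round()
loss = (penalized_loss(x_a, struct_to_attr(h_s_fused))
        + penalized_loss(x_s, attr_to_struct(h_a_fused)))
print(loss.item())
```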
In step 7, the fused structure representation and the fused attribute representation obtained after the interactions of step 6-1 and step 6-2 are concatenated and used as the final representation output of the paper nodes.
In the method provided by the invention, the information of each node is further supplemented by sampling the structure neighbors and the attribute neighbors of the paper nodes, which alleviates the mismatch between the theoretical assumptions and the real data distribution and improves the node representation quality in paper network representation learning. In addition, the method of the invention proposes a truncated sampling method that addresses the sampling efficiency problem, reduces the complexity, and makes the method easier to scale to large paper networks.
The invention has the following beneficial effects:
the technical level is as follows:
1 original thesis network representation learning method is basically based on two basic assumptions: similarity assumptions (nodes that are similar in structure or similar in attributes are more similar) and consistency assumptions (predicted preferences for structure information and attribute information are consistent). However, data analysis finds that a gap exists between the hypothesis and the real data distribution, and the phenomenon limits the representation quality learned by the current model. In order to alleviate the problem, the method of the invention excavates and fuses additional attribute information and structure information, reduces the difference between the assumed data distribution and the real data distribution, and improves the representation quality.
The truncation neighbor sampling method provided by the invention avoids high-complexity calculation for each thesis node, reduces the complexity of the method, and is suitable for large-scale attribute network data sets.
3 the reverse attention mechanism provided by the invention enables each thesis node to fuse more diverse neighbor information, and the reverse attention mechanism has a better effect through experimental verification.
The application level is as follows: the low-dimensional vector representation learning method for the paper network structure by fusing the sampled neighbor information can be applied to the paper network with any attribute, is not limited to the network with the text attribute, can be conveniently expanded to a large-scale paper network representation learning task, and performs paper classification and paper recommendation on the basis.
Drawings
The foregoing and/or other advantages of the invention will become further apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings.
Fig. 1 is a flowchart of a paper network representation learning method based on a deep autoencoder according to an embodiment.

Fig. 2 is a flowchart of truncated neighbor sampling.
FIG. 3 is a flow chart of a reverse attention mechanism.
FIG. 4 is a flowchart of the method of the present invention combined with the deep-autoencoder-based paper network representation learning method.
FIG. 5 is a graph of the effect of the invention and other methods on node classification on the Cora dataset and the DBLP dataset.
Detailed Description
The flowchart is shown in Fig. 1. As shown in the figure, the invention provides a paper classification and recommendation method; first, a baseline model based on a deep autoencoder is established, which comprises the following steps:
step 1, a deep structural multilayer autoencoder AE_s is established, comprising an encoder module and a decoder module; the input is the structure one-hot representation x_s of a paper node, the encoder output is a low-dimensional hidden-layer vector h_s, and the decoder output is the reconstructed structure representation x'_s.

step 2, a deep attribute multilayer autoencoder AE_a is established, comprising an encoder module and a decoder module; the input is the attribute one-hot representation x_a of a paper node, the encoder output is a low-dimensional hidden-layer vector h_a, and the decoder output is the reconstructed attribute representation x'_a.

step 3, an attribute-to-structure decoder is established, whose input is the attribute hidden-layer vector h_a of a paper node and whose output is the attribute-to-structure reconstructed structure representation.

step 4, a structure-to-attribute decoder is established, whose input is the structure hidden-layer vector h_s of a paper node and whose output is the structure-to-attribute reconstructed attribute representation.
In this flow, the autoencoders of steps 1 and 2 are implemented by extending the deep autoencoder method ([Hinton et al., 2006] A fast learning algorithm for deep belief nets), and the encoder module can be expressed as:

$$y_1 = \sigma(W_1 x + b_1)$$

$$y_m = \sigma(W_m y_{m-1} + b_m)$$

where W_i and b_i respectively denote the projection matrix and the bias vector of the i-th layer, x denotes the input structure/attribute one-hot representation, and σ is the sigmoid function. W_i and b_i are trained along with the representation learning method. The inputs of the encoders are the structure one-hot representations and the attribute one-hot representations of the paper nodes, which summarize the structural information and the attribute information of the paper network, and the output dimension of the encoder is set to 64.

The decoder modules of steps 1 to 4 can be regarded as the inversion of the encoder module: the input dimension of a decoder module is 64, its output dimension is consistent with the input dimension of the encoder, and the decoded output x' is obtained; the parameters are obtained along with the network training.
To train the baseline model, a reconstruction loss function is adopted, and the reconstruction losses of steps 1 to 4 are summed; for example, the reconstruction loss of step 1 is

$$L_s = \sum_{i=1}^{n_s} \left\| (\hat{x}_s^i - x_s^i) \odot b_i \right\|_2^2$$

where \hat{x}_s^i denotes the output of the decoder module of the autoencoder, x_s^i denotes the input of the encoder module, ⊙ denotes element-wise multiplication, and b_i is a penalty vector built from the penalty parameter β (for example, set to 10) so that more penalty is given to the non-zero values in the representation.
Next, the truncated neighbor sampling method proposed by the present invention is described. The flow is shown in FIG. 2; the set symbol C_ap is used to denote the truncated attribute neighbor candidate set of node p. The method comprises the following steps:
step 1, a random walk is run from each paper node with a given walk length L (for example, set to 10) and a number of walks per paper node γ (for example, set to 80); the walks yield a walk sequence for each paper node, and the paper nodes in the sequences are counted and de-duplicated to obtain the truncated attribute neighbor candidate set C_ap of each paper node p.
Step 2, the cosine similarity between the attribute one-hot representation x_a^p of paper node p and the attribute one-hot representation set {x_a^{c_1}, ..., x_a^{c_m}} of the truncated attribute neighbor candidate set is calculated, yielding the set of m cosine similarity results {sim(p, c_1), ..., sim(p, c_m)}.
Step 3, the cosine similarity results of the truncated attribute neighbor candidates are sorted in descending order, and the largest top K (for example, set to 20) are taken as the final attribute neighbors of paper node p; if m < K, random sampling with replacement is performed to obtain the set C_Ap = {a_1, ..., a_K}.
In this flow, the cosine similarity calculation formula of step 2 is

$$\mathrm{sim}(p, c_i) = \frac{x_a^p \cdot x_a^{c_i}}{\left\| x_a^p \right\| \, \left\| x_a^{c_i} \right\|}.$$
The reverse attention mechanism proposed by the present invention is described next. The flow is shown in FIG. 3; the notation includes the preliminary structure/attribute representation of a paper node and the preliminary structure/attribute representations of its attribute/structure neighbors. The method comprises the following steps:
step 1, following the dot-product attention mechanism ([Luong et al., 2015] Effective Approaches to Attention-based Neural Machine Translation), a structural similarity coefficient is calculated by dot product between the preliminary structure representation of the paper node and the preliminary structure representations of its attribute neighbors; the K (for example, set to 20) dot-product similarities are calculated and normalized to obtain the similarity coefficients.

step 2, a new reverse attention coefficient is obtained through normalization by the reverse mechanism.
In this flow, the dot-product similarity calculation formula of step 1 is

$$e_{p,j} = h_s^p \cdot h_s^{a_j}$$

and the normalization formula is

$$\alpha_{p,j} = \frac{\exp(e_{p,j})}{\sum_{k=1}^{K} \exp(e_{p,k})}$$

where exp denotes the natural exponential.

In step 2, normalization by the reverse mechanism converts the similarity coefficients into the reverse attention coefficients \tilde{α}_{p,j}, where the reverse attention coefficient indicates the degree of attention of the paper node to each neighbor. The weighted neighbor information representation is then calculated from the attention coefficients as

$$m_s^p = \sum_{j=1}^{K} \tilde{\alpha}_{p,j} \, h_s^{a_j}.$$
Next, it is described how the low-dimensional vector representation learning method for the paper network that fuses sampled neighbor information is applied to the baseline model based on the deep autoencoder. The flow is shown in FIG. 4; the set symbol C_ap is used to denote the truncated attribute neighbor candidate set. The method comprises the following steps:
step 1, obtaining the truncated attribute neighbor candidates of each node in the paper network G by truncated neighbor sampling, and using the set symbol C_ap to denote the truncated attribute neighbor candidate set;

step 2, applying the cosine similarity measure to paper node p and its truncated attribute neighbor candidate set C_ap = {c_1, ..., c_m}, where the attribute one-hot representation of paper node p is x_a^p and the set of attribute one-hot representations of the truncated attribute neighbor candidate set is {x_a^{c_1}, ..., x_a^{c_m}}; the K attribute neighbors with the highest similarity are calculated and retained as the final truncated attribute neighbors; here m denotes the number of truncated attribute neighbor candidates of paper node p, c_i denotes the i-th truncated attribute neighbor in the truncated attribute neighbor candidate set of paper node p, x_a^{c_i} denotes the attribute one-hot representation of the i-th truncated attribute neighbor, and i ranges from 1 to m;

step 3, retaining, through random sampling, T (for example, set to 20) structure neighbors of paper node p as the final structure neighbors, the set being denoted C_sp = {s_1, ..., s_T}, where s_i denotes the i-th structure neighbor of node p and i ranges from 1 to T;

step 4, establishing a structural multilayer autoencoder AE_s, which takes the structure one-hot representations of the paper nodes as input and produces, through the structural multilayer autoencoder, the preliminary structure representation of each paper node in the paper network and the preliminary structure representations of its attribute neighbors; and establishing an attribute multilayer autoencoder AE_a, which takes the attribute one-hot representations of the paper nodes as input and produces, through the attribute multilayer autoencoder, the preliminary attribute representation of each paper node in the attributed network and the preliminary attribute representations of its structure neighbors;

step 5, providing a reverse attention mechanism and a weighted-sum method: the preliminary structure representation of the paper node and the preliminary structure representations of its attribute neighbors are fused to obtain the fused structure representation of the paper node, and the preliminary attribute representation of the paper node and the preliminary attribute representations of its structure neighbors are fused to obtain the fused attribute representation of the paper node, both through the reverse attention mechanism and the weighted-sum method;

step 6, making the fused structure representation of the paper node from step 5 interact with the attribute multilayer autoencoder through a reconstruction operation, and making the fused attribute representation of the paper node from step 5 interact with the structural multilayer autoencoder;

step 7, concatenating the fused structure representation obtained through the interaction in step 6 and the fused attribute representation obtained through the interaction in step 6 to obtain the final representation of the paper node.
In this flow, step 1 is the same as step 1 of the truncated neighbor sampling method; the default walk length L is set to 10 steps, and the number of walks γ of each paper node is set to 80 by default.
Step 2 adopts the same method as step 2 and step 3 of the truncated neighbor sampling method.
In step 3, T final structure neighbors of the paper node are obtained through random sampling; if the number of structure neighbor candidates of the paper node is less than T, random sampling with replacement is performed to obtain the set C_sp. In particular, T is set to 20.
The step 4 comprises the following steps:
Step 4-1 is consistent with the step of establishing the structural multilayer autoencoder in the deep-autoencoder-based baseline model described above: the structural multilayer autoencoder AE_s is established, the input is the structure one-hot representation of a paper node, and the output is obtained through the decoding module.

The structure one-hot representations of paper node p and of its attribute neighbors C_Ap are input into the structural multilayer autoencoder AE_s, and the hidden layer yields the preliminary structure representation h_s^p of the paper node and the preliminary structure representations of the attribute neighbors.

Step 4-2 is consistent with the step of establishing the attribute multilayer autoencoder in the deep-autoencoder-based baseline model described above: the attribute multilayer autoencoder AE_a is established, the input is the attribute one-hot representation of a paper node, and the output is obtained through the decoding module.

The attribute one-hot representations of paper node p and of its structure neighbors C_sp are input into the attribute multilayer autoencoder AE_a, and the hidden layer yields the preliminary attribute representation h_a^p of the paper node and the preliminary attribute representations of the structure neighbors.
The step 5 comprises the following steps:
step 5-1, following the dot-product attention mechanism ([Luong et al., 2015] Effective Approaches to Attention-based Neural Machine Translation), a structural similarity coefficient is calculated by dot product between the preliminary structure representation h_s^p of the paper node and the preliminary structure representations h_s^{a_j} of its attribute neighbors:

$$e_{p,j} = h_s^p \cdot h_s^{a_j}, \qquad j = 1, \ldots, K$$

and the K dot-product similarities are normalized to obtain the similarity coefficients:

$$\alpha_{p,j} = \frac{\exp(e_{p,j})}{\sum_{k=1}^{K} \exp(e_{p,k})}$$

where exp denotes the natural exponential. The new reverse attention coefficient \tilde{α}_{p,j} is obtained through normalization by the reverse mechanism,

where the reverse attention coefficient indicates the degree of attention of the paper node to each neighbor. The weighted neighbor information representation is then calculated from the attention coefficients as

$$m_s^p = \sum_{j=1}^{K} \tilde{\alpha}_{p,j} \, h_s^{a_j}.$$
step 5-2, the importance of the neighbors is controlled by a weighted-sum mechanism, in which the weighted neighbor information representation m_s^p is combined with the preliminary structure representation h_s^p under the weight λ_s,

obtaining the fused structure representation of the paper node, where λ_s takes a value between 0 and 1 and is used to control the importance of the neighbor information in the representation.

step 5-3, similarly to step 5-1, the weighted neighbor information representation of the structure neighbors is calculated in the attribute space:

$$m_a^p = \sum_{l=1}^{T} \tilde{\alpha}_{p,l} \, h_a^{s_l}.$$

step 5-4, similarly to step 5-2, a weighted-sum mechanism is used to control the importance of the neighbors,

obtaining the fused attribute representation of the paper node, where λ_a takes a value between 0 and 1 and is used to control the importance of the neighbor information in the representation.
The step 6 comprises the following steps:
step 6-1, a structure-to-attribute decoder is built, and the fused structure representation of step 5-2 is passed through the structure-to-attribute decoder to obtain a reconstructed attribute representation \hat{x}_a^p; the interaction is realized through this reconstruction operation, and the loss function of the structure-to-attribute decoder is

$$L_{sa} = \sum_{p} \left\| (\hat{x}_a^p - x_a^p) \odot b_a^p \right\|_2^2$$

where the sum runs over the paper nodes, n_a is the number of attributes (the dimension of x_a^p), x_a^p is the attribute one-hot representation of node p, and b_a^p is the penalty vector built from the penalty coefficient.

step 6-2, an attribute-to-structure decoder is built, and the fused attribute representation of step 5-4 is passed through the attribute-to-structure decoder to obtain a reconstructed structure representation \hat{x}_s^p; the interaction is realized through this reconstruction operation, and the loss function of the attribute-to-structure decoder is

$$L_{as} = \sum_{p} \left\| (\hat{x}_s^p - x_s^p) \odot b_s^p \right\|_2^2$$

where n_s is the number of nodes, x_s^p is the structure one-hot representation of node p, and b_s^p is the penalty vector built from the penalty coefficient. The decoder parameters are trained along with the method.
In step 7, the fused structure representation of the paper node obtained through the interaction in step 6 and the fused attribute representation of the paper node obtained through the interaction in step 6 are concatenated, and the result is used as the final output representation of the paper node.
In step 8, the low-dimensional representations of the labeled paper nodes in the paper network are input into a support vector machine classifier for training, yielding a trained classifier.

In step 9, the low-dimensional representations of the unlabeled paper nodes in the paper network are input into the classifier, the predicted labels of the paper nodes are obtained as the classification results, and paper classification and same-category paper recommendation are performed.
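A scikit-learn sketch of steps 8 and 9, assuming the learned node representations are stacked in a numpy array; the linear kernel and the way same-category papers are returned as recommendations are illustrative choices, not prescribed by the description above.

```python
import numpy as np
from sklearn.svm import SVC

def classify_and_recommend(emb, labels, labeled_idx, unlabeled_idx):
    """Train an SVM on labeled paper embeddings, predict categories for the
    rest, and recommend papers that share the predicted category."""
    clf = SVC(kernel="linear")
    clf.fit(emb[labeled_idx], labels[labeled_idx])
    pred = clf.predict(emb[unlabeled_idx])
    recommendations = {}
    for node, cat in zip(unlabeled_idx, pred):
        same_cat = [i for i in labeled_idx if labels[i] == cat]
        recommendations[node] = same_cat          # papers of the same category
    return pred, recommendations

# toy usage: 10 nodes with 128-dim embeddings and 3 categories
emb = np.random.rand(10, 128)
labels = np.array([0, 1, 2, 0, 1, 2, 0, 1, -1, -1])   # -1 marks unlabeled nodes
pred, recs = classify_and_recommend(emb, labels, labeled_idx=list(range(8)),
                                    unlabeled_idx=[8, 9])
print(pred, recs)
```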
Example 1
This example uses the Cora paper citation data set and the DBLP paper citation data set for experimental and validation work.
Cora is a paper citation dataset containing 2708 paper nodes and 5429 edges expressing citation relationships; the attribute information of each node is a one-hot representation extracted from the title and abstract text, with 1433 dimensions. Each node belongs to one paper domain category, with 7 categories in total.

DBLP is a paper citation dataset containing 18448 paper nodes and 45526 edges expressing citation relationships; the attribute information of each node is a one-hot representation extracted from the paper text, with 2476 dimensions. Each node belongs to one paper domain category, with 4 categories in total.
In order to verify the effectiveness of the invention, the invention mainly compares the performances of different methods in the node classification task:
DeepWalk ([Perozzi et al., 2014] DeepWalk: Online learning of social representations): a method based on random walks, which generates paper node sequences through the walks and learns low-dimensional representation vectors of the paper nodes with the skip-gram algorithm and the hierarchical softmax optimization method from the word2vec toolkit.

Node2vec ([Grover et al., 2016] node2vec: Scalable feature learning for networks): a method based on biased random walks, which controls the breadth and depth of the walks through the parameters p and q; after the biased walk sequences are obtained, the low-dimensional vector representations of the nodes are learned with the Skip-gram algorithm and negative sampling optimization.

TADW ([Yang et al., 2015] Network representation learning with rich text information): a paper network representation learning method that fuses structural and attribute information simultaneously through matrix factorization. Because it exploits multi-source information, TADW is generally more effective than models that use only structural information.

TriDNR ([Pan et al., 2016] Tri-Party Deep Network Representation): a semi-supervised attributed network representation learning model based on a simple neural network, which constructs two training networks: a structure training network and an attribute-label training network that share the node representations. The method can fuse structural, attribute, and supervision information simultaneously. For fairness, the supervision (label) information used by this method is removed from the comparison.

MVC_DNE ([Yang et al., 2017] From properties to links: Deep network embedding on incomplete graphs): a deep learning method based on multilayer autoencoders, which assumes that to some extent a mutual mapping exists between structure and attributes and keeps the two consistent in the deep space during training.
Node classification is selected to compare the quality of the paper node vector representations. In the experiments, cross-validation is adopted, and an SVM classifier is used to evaluate the classification performance of each method on each dataset.
The invention adopts two measures of Micro-average (Micro-F1) and Macro-average (Macro-F1).
Micro-F1 is calculated as

$$\text{Micro-F1} = \frac{2 \cdot P_{micro} \cdot R_{micro}}{P_{micro} + R_{micro}}$$

where the precision P_{micro} and the recall R_{micro} are computed globally over all categories, and Macro-F1 is calculated as

$$\text{Macro-F1} = \frac{1}{|C|} \sum_{c \in C} \frac{2 \cdot P_c \cdot R_c}{P_c + R_c}$$

where P and R denote precision and recall, respectively, and C is the set of categories.
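For reference, both measures can be computed with scikit-learn; the label arrays below are illustrative.

```python
from sklearn.metrics import f1_score

y_true = [0, 1, 2, 2, 1, 0, 1]
y_pred = [0, 1, 2, 1, 1, 0, 2]
print("Micro-F1:", f1_score(y_true, y_pred, average="micro"))
print("Macro-F1:", f1_score(y_true, y_pred, average="macro"))
```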
The node classification results of the present invention and the other methods on the Cora dataset and the DBLP dataset are shown in FIG. 5, where 10% and 50% of the paper nodes are taken as training data, respectively, and the rest as test data. The process was repeated 30 times, and the averages of Micro-F1 (Mi-F1 in the figure) and Macro-F1 (Ma-F1 in the figure) were used as the final indices for comparison. Compared with DeepWalk and node2vec, which do not use the attribute information of the paper network, the methods that use attribute information perform markedly better. Compared with TADW, TriDNR, and MVC-DNE, which use the structural and attribute information of the paper network simultaneously, the method of the invention combines the deep autoencoder with truncated neighbor sampling and achieves better performance. On both datasets and both measures, the method of the invention improves on the second-best method by about 2%, a considerable improvement.
The invention provides a paper classification and recommendation method, and there are many ways to implement the technical solution; the above is only a preferred embodiment of the invention. It should be noted that those skilled in the art can make various improvements and modifications without departing from the principles of the invention, and such improvements and modifications shall also be regarded as falling within the protection scope of the invention. All components not specified in this embodiment can be realized by the prior art.
Claims (10)
1. A method for paper classification and recommendation, comprising the steps of:
step 1, obtaining the truncated attribute neighbor candidates of each paper node in the paper network G by truncated neighbor sampling, and using the set symbol C_ap to denote the truncated attribute neighbor candidate set of a paper node p;

step 2, calculating the final truncated attribute neighbors based on the truncated attribute neighbor candidate set;

step 3, retaining T structure neighbors of the paper node p as the final structure neighbors;

step 4, establishing a structural multilayer autoencoder AE_s, which takes the structure one-hot representations of the paper nodes as input and produces, through the structural multilayer autoencoder, the preliminary structure representation of each paper node in the attributed network and the structure representations of its attribute neighbors; and establishing an attribute multilayer autoencoder AE_a, which takes the attribute one-hot representations of the paper nodes as input and produces, through the attribute multilayer autoencoder, the preliminary attribute representation of each paper node in the attributed network and the attribute representations of its structure neighbors;

step 5, fusing the preliminary structure representation of the paper node with the structure representations of its attribute neighbors to obtain the fused structure representation of the paper node, and fusing the preliminary attribute representation of the paper node with the attribute representations of its structure neighbors to obtain the fused attribute representation of the paper node;

step 6, making the fused structure representation of step 5 interact with the attribute multilayer autoencoder through a reconstruction operation, and making the fused attribute representation of step 5 interact with the structural multilayer autoencoder;

step 7, concatenating the fused structure representation obtained through the interaction in step 6 and the fused attribute representation obtained through the interaction in step 6 to obtain the low-dimensional representation of the paper node;

step 8, inputting the low-dimensional representations of the labeled paper nodes in the paper network into a support vector machine classifier for training to obtain a trained classifier;

and step 9, inputting the low-dimensional representations of the unlabeled paper nodes in the paper network into the classifier, obtaining the predicted labels of the paper nodes as the classification results, and performing paper classification and same-category paper recommendation.
2. The method of claim 1, wherein the truncated neighbor sampling in step 1 is: a random walk is run from each paper node with a given walk length L and a number of walks per node γ; the walks yield a walk sequence for each paper node, and the paper nodes in the sequences are counted and de-duplicated to obtain the truncated attribute neighbor candidate set C_ap of each paper node.
3. The method of claim 2, wherein step 2 comprises:
step 2-1, letting the truncated attribute neighbor candidate set of the paper node p be C_ap = {c_1, ..., c_m}, where c_i denotes the i-th truncated attribute neighbor in the truncated attribute neighbor candidate set of paper node p; the set of attribute one-hot representations of the truncated attribute neighbor candidate set C_ap is {x_a^{c_1}, ..., x_a^{c_m}}, where x_a^{c_i} denotes the attribute one-hot representation of c_i, and i ranges from 1 to m;

the cosine similarity between the attribute one-hot representation x_a^p of paper node p and each member of the attribute one-hot representation set of the truncated attribute neighbor candidate set is calculated as

$$\mathrm{sim}(p, c_i) = \frac{x_a^p \cdot x_a^{c_i}}{\left\| x_a^p \right\| \, \left\| x_a^{c_i} \right\|};$$

step 2-2, sorting the cosine similarity results of the truncated attribute neighbor candidates of paper node p in descending order, and taking the top K as the final attribute neighbors of paper node p; if m is less than K, random sampling with replacement is performed to obtain the set C_Ap = {a_1, ..., a_K}, where a_i denotes the i-th sampled attribute neighbor of node p and C_Ap denotes the set of these attribute neighbors.
4. The method according to claim 3, wherein in step 3, T final structure neighbors of a paper node are obtained by random sampling; if the number of structure neighbor candidates of the paper node is less than T, random sampling with replacement is performed to obtain the set C_sp = {s_1, ..., s_T}, where s_i denotes the i-th structure neighbor of node p and i ranges from 1 to T.
5. The method of claim 4, wherein step 4 comprises:
step 4-1, building a structural multilayer autoencoder AE_s, the structural multilayer autoencoder AE_s comprising an encoding module and a decoding module, wherein the encoding module is:

$$y_1 = \sigma(W_1 x_s + b_1)$$

$$y_m = \sigma(W_m y_{m-1} + b_m)$$

where y_i denotes the output of the i-th fully connected layer, W_i and b_i respectively denote the projection matrix and the bias vector of the i-th layer, x_s denotes the structure one-hot representation of the input, d_s is the dimension of the structure one-hot representation, σ is the sigmoid function, and R^{d_s} denotes the d_s-dimensional real space;

the decoding module inverts the process of the encoding module to obtain the output \hat{x}_s, its parameters are trained along with the learning method through reconstruction, and the loss function of the structural multilayer autoencoder is

$$L_s = \sum_{i=1}^{n_s} \left\| (\hat{x}_s^i - x_s^i) \odot b_i \right\|_2^2$$

where n_s is the number of nodes, x_s^i is the structure one-hot representation input of the i-th node, \hat{x}_s^i is the structure representation obtained by decoding, and b_i is a penalty vector built from the penalty parameter;

the structure one-hot representations of paper node p and of its attribute neighbors C_Ap are input into the structural multilayer autoencoder AE_s, and the hidden layer yields the preliminary structure representation h_s^p of the paper node and the structure representations {h_s^{a_1}, ..., h_s^{a_K}} of the attribute neighbors, where h_s^{a_i} is the preliminary structure representation of the i-th attribute neighbor obtained from the hidden layer;

step 4-2, establishing an attribute multilayer autoencoder AE_a; the attribute one-hot representations of paper node p and of its structure neighbors C_sp are input into the attribute multilayer autoencoder AE_a, and the hidden layer yields the preliminary attribute representation h_a^p of paper node p and the attribute representations {h_a^{s_1}, ..., h_a^{s_T}} of the structure neighbors, where h_a^{s_l} is the preliminary attribute representation of the l-th structure neighbor obtained from the hidden layer, and l ranges from 1 to T.
6. The method of claim 5, wherein step 5 comprises:
step 5-1, calculating a structural similarity coefficient by dot product following a dot-product attention mechanism, and, from the preliminary structure representation h_s^p of paper node p and the structure representation h_s^{a_j} of each attribute neighbor, calculating the dot-product similarity:

$$e_{p,j} = h_s^p \cdot h_s^{a_j}, \qquad j = 1, \ldots, K$$

where e_{p,j} is the dot-product similarity between paper node p and its j-th attribute neighbor; the K dot-product similarities are normalized to obtain the similarity coefficient α_{p,j}:

$$\alpha_{p,j} = \frac{\exp(e_{p,j})}{\sum_{k=1}^{K} \exp(e_{p,k})}$$

where exp denotes the natural exponential; normalization by the reverse mechanism yields a new reverse attention coefficient \tilde{α}_{p,j},

where the reverse attention coefficient indicates the degree of attention of the paper node to each neighbor;
7. The method of claim 6, wherein step 5-2 comprises: controlling the importance of the neighbors using a weighted-sum mechanism to obtain the fused structure representation of the paper node, wherein the parameter λ_s takes a value between 0 and 1 and is used to control the importance of the neighbor information in the representation.

9. The method of claim 8, wherein step 5-4 comprises: controlling the importance of the neighbors using a weighted-sum mechanism to obtain the fused attribute representation of the paper node, wherein the parameter λ_a takes a value between 0 and 1 and is used to control the importance of the neighbor information in the representation.
10. The method of claim 9, wherein step 6 comprises:
step 6-1, building a structure-to-attribute decoder, passing the fused structure representation of step 5-2 through the structure-to-attribute decoder to obtain a reconstructed attribute representation \hat{x}_a^p, and interacting through the reconstruction operation to obtain the loss of the structure-to-attribute decoder

$$L_{sa} = \sum_{p} \left\| (\hat{x}_a^p - x_a^p) \odot b_a^p \right\|_2^2$$

where n_a is the number of attributes, x_a^p is the attribute one-hot representation of node p, and b_a^p is the penalty vector built from the penalty coefficient;

step 6-2, building an attribute-to-structure decoder, passing the fused attribute representation of step 5-4 through the attribute-to-structure decoder to obtain a reconstructed structure representation \hat{x}_s^p, and interacting through the reconstruction operation to obtain the loss of the attribute-to-structure decoder

$$L_{as} = \sum_{p} \left\| (\hat{x}_s^p - x_s^p) \odot b_s^p \right\|_2^2.$$
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011009122.9A CN112148876B (en) | 2020-09-23 | 2020-09-23 | Paper classification and recommendation method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011009122.9A CN112148876B (en) | 2020-09-23 | 2020-09-23 | Paper classification and recommendation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112148876A true CN112148876A (en) | 2020-12-29 |
CN112148876B CN112148876B (en) | 2023-10-13 |
Family
ID=73897877
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202011009122.9A Active CN112148876B (en) | 2020-09-23 | 2020-09-23 | Paper classification and recommendation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112148876B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112749757A (en) * | 2021-01-21 | 2021-05-04 | 厦门大学 | Paper classification model construction method and system based on gated graph attention network |
CN112836050A (en) * | 2021-02-04 | 2021-05-25 | 山东大学 | Citation network node classification method and system aiming at relation uncertainty |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108228728A (en) * | 2017-12-11 | 2018-06-29 | 北京航空航天大学 | A kind of paper network node of parametrization represents learning method |
CN109376857A (en) * | 2018-09-03 | 2019-02-22 | 上海交通大学 | A kind of multi-modal depth internet startup disk method of fusion structure and attribute information |
US20200082531A1 (en) * | 2018-09-07 | 2020-03-12 | 3Mensio Medical Imaging B.V. | Method, Device and System for Dynamic Analysis from Sequences of Volumetric Images |
- 2020-09-23: CN application CN202011009122.9A granted as patent CN112148876B (active)
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108228728A (en) * | 2017-12-11 | 2018-06-29 | 北京航空航天大学 | A kind of paper network node of parametrization represents learning method |
CN109376857A (en) * | 2018-09-03 | 2019-02-22 | 上海交通大学 | A kind of multi-modal depth internet startup disk method of fusion structure and attribute information |
US20200082531A1 (en) * | 2018-09-07 | 2020-03-12 | 3Mensio Medical Imaging B.V. | Method, Device and System for Dynamic Analysis from Sequences of Volumetric Images |
Non-Patent Citations (3)
Title |
---|
CHUNYANG TAN: "MSGE: A Multi-step Gated Model for Knowledge Graph Completion", 《SPRINGER》 * |
- 余传明; 林奥琛; 钟韵辞; 安璐: "Research on scientific research collaboration recommendation based on network representation learning", 情报学报 (Journal of the China Society for Scientific and Technical Information), no. 05
- 谭春阳: "Research on representation learning of complex network structures", 硕士电子期刊 (Master's theses electronic journal) *
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112749757A (en) * | 2021-01-21 | 2021-05-04 | 厦门大学 | Paper classification model construction method and system based on gated graph attention network |
CN112749757B (en) * | 2021-01-21 | 2023-09-12 | 厦门大学 | Thesis classification model construction method and system based on gating graph annotation force network |
CN112836050A (en) * | 2021-02-04 | 2021-05-25 | 山东大学 | Citation network node classification method and system aiming at relation uncertainty |
CN112836050B (en) * | 2021-02-04 | 2022-05-17 | 山东大学 | Citation network node classification method and system aiming at relation uncertainty |
Also Published As
Publication number | Publication date |
---|---|
CN112148876B (en) | 2023-10-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110321494B (en) | Socialized recommendation method based on matrix decomposition and network embedding combined model | |
CN112380435B (en) | Document recommendation method and system based on heterogeneous graph neural network | |
CN109753589A (en) | A kind of figure method for visualizing based on figure convolutional network | |
CN111709518A (en) | Method for enhancing network representation learning based on community perception and relationship attention | |
Qu et al. | Curriculum learning for heterogeneous star network embedding via deep reinforcement learning | |
CN111079409A (en) | Emotion classification method by using context and aspect memory information | |
CN112100485B (en) | Comment-based scoring prediction article recommendation method and system | |
CN109447261B (en) | Network representation learning method based on multi-order proximity similarity | |
CN112597296A (en) | Abstract generation method based on plan mechanism and knowledge graph guidance | |
CN112148876A (en) | Paper classification and recommendation method | |
CN112559764A (en) | Content recommendation method based on domain knowledge graph | |
Song et al. | Session-based recommendation with hierarchical memory networks | |
CN110781271A (en) | Semi-supervised network representation learning model based on hierarchical attention mechanism | |
CN107491782A (en) | Utilize the image classification method for a small amount of training data of semantic space information | |
Luo et al. | ResumeNet: A learning-based framework for automatic resume quality assessment | |
Jin et al. | Deepwalk-aware graph convolutional networks | |
CN112784118A (en) | Community discovery method and device in graph sensitive to triangle structure | |
Cheng et al. | Dynamic embedding on textual networks via a gaussian process | |
CN109815335A (en) | A kind of paper domain classification method suitable for document network | |
Li et al. | Neural architecture search via proxy validation | |
CN117150041A (en) | Small sample knowledge graph completion method based on reinforcement learning | |
Bao et al. | HTRM: a hybrid neural network algorithm based on tag-aware | |
CN116821371A (en) | Method for generating scientific abstracts of multiple documents by combining and enhancing topic knowledge graphs | |
CN116610874A (en) | Cross-domain recommendation method based on knowledge graph and graph neural network | |
CN117009674A (en) | Cloud native API recommendation method integrating data enhancement and contrast learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |