CN108228728B - Parameterized thesis network node representation learning method - Google Patents

Parameterized thesis network node representation learning method

Info

Publication number
CN108228728B
CN108228728B (application CN201711308050.6A)
Authority
CN
China
Prior art keywords
paper
node
queue
neighbor
thesis
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201711308050.6A
Other languages
Chinese (zh)
Other versions
CN108228728A (en)
Inventor
蒲菊华
陈虞君
刘伟
班崟峰
杜佳鸿
熊璋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Beihang Emerging Industrial Technology Research Institute
Beihang University
Original Assignee
Shenzhen Beihang Emerging Industrial Technology Research Institute
Beihang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Beihang Emerging Industrial Technology Research Institute, Beihang University filed Critical Shenzhen Beihang Emerging Industrial Technology Research Institute
Priority to CN201711308050.6A priority Critical patent/CN108228728B/en
Publication of CN108228728A publication Critical patent/CN108228728A/en
Application granted granted Critical
Publication of CN108228728B publication Critical patent/CN108228728B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36Creation of semantic tools, e.g. ontology or thesauri
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3347Query execution using vector based model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/048Activation functions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00Systems or methods specially adapted for specific business sectors, e.g. utilities or tourism
    • G06Q50/01Social networking

Abstract

The invention discloses a parameterized thesis network node representation learning method. First, an empty paper node queue is constructed, and then the neighbor nodes of any paper node, together with the neighbors of those neighbors, are sampled by random walk; the selected paper node is used as the first element of the paper node queue, and the other elements of the queue are then obtained according to the hop probabilities. After all paper nodes have been traversed, a set of paper node queues is obtained. Next, training data for the multilayer perceptron neural network are generated by a positive and negative sampling method. Finally, a neural network paper probability model is used to obtain the nonlinear transformation from paper node semantic information to paper node vector representations, from which the vector representations of the paper nodes are obtained.

Description

Parameterized thesis network node representation learning method
Technical Field
The present invention relates to a representation learning method of a thesis network, and more particularly, to a parameterized representation learning method of a thesis network node.
Background
Social networks are an Internet concept, covering social networking service platforms such as blogs, wikis, tagging, SNS, and RSS. The Internet has brought entirely new forms of human social organization and ways of living, building an enormous group, the network community, that transcends physical geography; human society in the 21st century is gradually taking on new forms and characteristics, and individuals in the networked era are converging into new social groups. A paper network is the networked form of the relations between papers; it expresses, on a network, the mutual citations and shared authorship between papers.
Representation learning for paper networks currently relies mostly on non-parametric models, for example "DeepWalk: Online Learning of Social Representations" (Bryan Perozzi et al., 26 Mar 2014), in which the non-parametric word2vec method is used to learn the representation of the network.
Network structure refers to the physical connectivity of a network. Network topologies take many forms, such as two-dimensional structures including rings, stars, trees, nearest-neighbor meshes, systolic arrays, and the like; see "Analysis of Interconnection Network Structures", edited by Wang Dingxing and Chen Guoliang, October 1990, pages 36-38. As networks have developed, further structures such as honeycomb structures have also appeared.
Existing representation learning methods for paper networks must traverse all papers in the paper network in order to learn the representations of the papers. When a new paper is added to the paper network, representation learning for the new paper cannot be performed, and the classification and analysis of the new paper therefore cannot be completed.
Disclosure of Invention
In order to solve the problem that a newly added paper cannot undergo representation learning, the invention provides a parameterized paper network node representation learning method. In the invention, a star-shaped paper network structure is sampled by means of a random-walk statistical model to obtain paper node information: each sampled paper node queue consists of a series of paper nodes, and the selection of the next paper node at each step is random. After the paper network sampling step, a deep neural network based on a twin (Siamese) network framework is constructed, in which the two identical sub-networks of the twin network are multilayer perceptrons (MLP). The learned multilayer perceptron serves as a nonlinear mapping function, and network node representation vectors are obtained by constructing this nonlinear mapping from the rich text information of a network node to its representation vector.
The invention provides a parameterized thesis network node representation learning method, which is characterized by comprising the following steps of:
The method comprises the following steps: first, the neighbor paper node set of any paper node and the neighbor paper node set of each such neighbor are obtained by sampling based on a random walk method;
Step 101: construct an empty paper node queue, denoted V, which is used for storing a sequence of paper nodes; the maximum number of queue element positions of the empty paper node queue V is mv, with mv taking a value of 10-20; then execute step 102;
Step 102: select any paper node paper_a, and place paper_a at position 1 of the paper node queue V; then execute step 103;
Step 103: obtain the set of all neighbor paper nodes belonging to the paper node paper_a, denoted NB^{paper_a} = {nb_1^{paper_a}, nb_2^{paper_a}, ..., nb_b^{paper_a}, ..., nb_B^{paper_a}}; the neighbor paper nodes are the set of paper nodes that have an edge connecting them to the paper node paper_a; then execute step 104;
Step 104: according to the total number B of neighbor nodes in the neighbor paper node set NB^{paper_a}, determine the first hop probability P^1_c of jumping to each neighbor paper node, where c denotes the hop count; then execute step 105;
Step 105: using the alias sampling algorithm (alias sampling), based on the current first hop probability P^1_c, obtain from NB^{paper_a} the neighbor paper node nb_b^{paper_a} of the next hop, and at the same time place nb_b^{paper_a} at position 2 of the paper node queue V; then execute step 106;
Step 106: obtain the set of all neighbor paper nodes belonging to the neighbor paper node nb_b^{paper_a}, i.e. the neighbor paper node set of the neighbor, denoted NB^{nb_b^{paper_a}} = {nb_1, nb_2, ..., nb_e, ..., nb_E}; then execute step 107;
Step 107: compute the shortest hop count d between each neighbor-of-neighbor paper node nb_e and the paper node paper_a; then execute step 108;
here d represents the minimum hop distance from any such neighbor paper node to the previous paper node;
Step 108: according to d, determine the second hop probability P^2_c with which nb_b^{paper_a} jumps to each of its neighbor paper nodes; then execute step 109;
the second hop probability is P^2_c, where c denotes the hop count;
Step 109: once P^2_c has been determined, select, according to P^2_c and alias sampling, a node nb_e as the next-hop paper node, and at the same time place nb_e at position 3 of the paper node queue V; then execute step 110;
Step 110: execute step 106 to step 109 in a loop until the number of positions filled in the paper node queue V is mv, and stop the random walk; then execute step 111;
Step 111: repeat step 101 to step 110 for every paper node in the whole paper network to complete the neighbor node sampling of the paper nodes, giving a set of paper node queues denoted VF = {V_1, V_2, ..., V_f, ..., V_F}; then execute step 201;
V_1 represents the first paper node queue;
V_2 represents the second paper node queue;
V_f represents any paper node queue, and f represents the identification number of the paper node queue;
V_F represents the last paper node queue, F represents the total number of paper node queues in the set, f ∈ F;
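For illustration, this sampling stage can be sketched in Python as follows. The adjacency-list representation, the uniform first-hop weights, and the node2vec-style second-hop weighting driven by the hop-out parameter p and the hop-in parameter q introduced later in the description are assumptions made here for concreteness; the patent states the hop probabilities only symbolically, and the function names are illustrative.

    import random

    def random_walk_queue(graph, start, mv=10, p=1.0, q=1.0):
        """Sample one paper node queue V of length mv starting from `start`.

        graph maps each paper node to the set of its neighbor paper nodes;
        p and q adjust the second hop probabilities (assumed node2vec-style).
        """
        V = [start]
        while len(V) < mv:
            prev, cur = (V[-2], V[-1]) if len(V) >= 2 else (None, V[-1])
            neighbors = list(graph[cur])
            if not neighbors:
                break
            if prev is None:
                weights = [1.0] * len(neighbors)      # first hop: uniform over the B neighbors
            else:
                weights = []
                for nb in neighbors:
                    if nb == prev:                    # shortest hop count d = 0
                        weights.append(1.0 / p)
                    elif nb in graph[prev]:           # d = 1
                        weights.append(1.0)
                    else:                             # d = 2
                        weights.append(1.0 / q)
            # the weighted draw plays the role of the alias sampling step
            V.append(random.choices(neighbors, weights=weights, k=1)[0])
        return V

    def sample_all_queues(graph, mv=10, p=1.0, q=1.0):
        """Step 111: one queue per paper node, giving the queue set VF."""
        return [random_walk_queue(graph, node, mv, p, q) for node in graph]

In practice the weighted draw would be backed by a precomputed alias table so that each hop costs O(1); the behaviour is otherwise the same.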
Generating the neural network training data of the multilayer perceptron by adopting a negative sampling method;
Step 201: establish a positive sample queue Q_p and a negative sample queue Q_n, which store, respectively, the positive and negative sampling data required for training the neural network; then execute step 202;
Step 202: set a neighbor window size hyperparameter WD; for a paper node queue V_f, each paper node belonging to the queue V_f is denoted nd_g^{V_f}, so that V_f = {nd_1^{V_f}, nd_2^{V_f}, ..., nd_g^{V_f}, ..., nd_G^{V_f}}; then execute step 203;
nd_1^{V_f} denotes the first paper node belonging to any paper node queue V_f;
nd_2^{V_f} denotes the second paper node belonging to any paper node queue V_f;
nd_g^{V_f} denotes the g-th paper node belonging to any paper node queue V_f, where g denotes the identification number of the neighbor paper node;
nd_G^{V_f} denotes the last paper node belonging to any paper node queue V_f, where G denotes the length of the paper node queue V_f, g ∈ G;
For any node in a paper node queue, all nodes whose distance to that node within the queue is smaller than WD are regarded as positive sample nodes. Each time, for any paper node nd_g^{V_f}, the invention first obtains the set of the 2 × WD paper nodes adjacent to it in the queue, denoted W_g = {w_{g-WD}, ..., w_l, ..., w_{g+WD}};
w_{g-WD} denotes the node with the smallest identification number in the set W_g of adjacent paper nodes;
w_{g+WD} denotes the node with the largest identification number in the set W_g of adjacent paper nodes;
w_l denotes a queue-adjacent paper node in W_g other than w_{g-WD} and w_{g+WD}; the subscript l denotes the identification number of a paper node that is neither the largest nor the smallest;
Step 203: for any queue paper node nd_g^{V_f}, sample in order of neighbor identification number from small to large; the sampling process combines each node in W_g with the queue paper node nd_g^{V_f} to form a triple, and then executes step 204;
the node w_{g-WD} and the queue paper node nd_g^{V_f} form a triple, i.e. (w_{g-WD}, nd_g^{V_f}, +1), where +1 denotes that the triple is a positive sample and, conversely, -1 denotes that the triple is a negative sample; the triple (w_{g-WD}, nd_g^{V_f}, +1) is inserted into the positive sample queue Q_p;
the node w_l and the queue paper node nd_g^{V_f} form a triple, i.e. (w_l, nd_g^{V_f}, +1), where +1 denotes that the triple is a positive sample and, conversely, -1 denotes that the triple is a negative sample; the triple (w_l, nd_g^{V_f}, +1) is inserted into the positive sample queue Q_p;
the node w_{g+WD} and the queue paper node nd_g^{V_f} form a triple, i.e. (w_{g+WD}, nd_g^{V_f}, +1), where +1 denotes that the triple is a positive sample and, conversely, -1 denotes that the triple is a negative sample; the triple (w_{g+WD}, nd_g^{V_f}, +1) is inserted into the positive sample queue Q_p;
Step 204: execute step 202 and step 203 in a loop until all the paper nodes in all the paper node queues of the paper node queue set VF = {V_1, V_2, ..., V_f, ..., V_F} have completed the sampling of their neighboring paper nodes, yielding the positive sample queue Q_p; then execute step 205;
Step 205: sample over all paper nodes in the network, each time selecting any two paper nodes from the network, namely a first paper node paper_a and a second paper node paper_o; if a connecting edge exists between the two paper nodes, or the two randomly selected paper nodes are identical, repeat this step; otherwise, form the triple (paper_a, paper_o, -1) from the two paper nodes paper_a and paper_o and store it in the negative sample queue Q_n; then execute step 206;
Step 206: execute step 205 in a loop, and establish a positive/negative sample ratio parameter μ; assuming the positive sample queue Q_p contains np triples, stop when the number of triples in Q_n equals μ × np, and then execute step 207;
Step 207: combine the positive sample queue Q_p obtained in step 204 and the negative sample queue Q_n obtained in step 206 to obtain a new sample queue Q_New = {Q_1, ..., Q_{(1+μ)×np}}; execute step 208;
Q_1 denotes the triple with the smallest identification number in the new sample queue Q_New;
Q_{(1+μ)×np} denotes the triple with the largest identification number in the new sample queue Q_New; the subscript (1+μ) × np indicates that the sample queue Q_New contains (1+μ) × np triples;
Step 208: shuffle the order of all elements of the new sample queue Q_New = {Q_1, ..., Q_{(1+μ)×np}} to obtain the shuffled sample queue Q_Sorting = {Q_1, ..., Q_{(1+μ)×np}}; then execute step 301;
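Steps 201 to 208 can be illustrated with a short Python sketch under the same assumptions as the previous listing; the helper names and the clipping of the window at the queue ends are illustrative, not the patent's exact specification.

    import random

    def positive_triples(queues, WD):
        """Steps 202-204: window sampling inside each paper node queue V_f."""
        Qp = []
        for Vf in queues:
            for g, nd_g in enumerate(Vf):
                # the up to 2*WD queue-adjacent nodes of nd_g
                window = Vf[max(0, g - WD):g] + Vf[g + 1:g + 1 + WD]
                for w in window:
                    Qp.append((w, nd_g, +1))
        return Qp

    def negative_triples(graph, np_count, mu):
        """Steps 205-206: draw mu * np unconnected, distinct node pairs."""
        nodes = list(graph)
        Qn = []
        while len(Qn) < mu * np_count:
            a, o = random.choice(nodes), random.choice(nodes)
            if a == o or o in graph[a]:
                continue                      # identical or connected: redraw
            Qn.append((a, o, -1))
        return Qn

    def build_training_queue(graph, queues, WD=2, mu=1):
        """Steps 207-208: merge Q_p and Q_n and shuffle into Q_Sorting."""
        Qp = positive_triples(queues, WD)
        Qn = negative_triples(graph, len(Qp), mu)
        Q = Qp + Qn
        random.shuffle(Q)
        return Q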
Processing in a neural network paper probability model based on a multilayer perceptron;
Step 301: for the Q_Sorting = {Q_1, ..., Q_{(1+μ)×np}} obtained in step 208, take one triple (paper_a, paper_o, ±1) at a time and put the pair of paper nodes into the neural network paper probability model for learning; execute step 302;
Step 302: for the two paper nodes paper_a and paper_o in each triple, map them through the model f_θ to obtain the two corresponding transformed vectors f_θ(c_{paper_a}) and f_θ(c_{paper_o}); execute step 303;
f_θ(c_{paper_a}) is the multilayer perceptron function belonging to paper_a;
f_θ(c_{paper_o}) is the multilayer perceptron function belonging to paper_o;
Step 303: compute the Euclidean distances of the two paper nodes, and execute step 304;
the Euclidean distances are the terms E_pos and E_neg, where E_pos represents the Euclidean shortest distance, E_neg represents the Euclidean longest distance, and c represents the hop count;
Step 304: merge the positive and negative samples, put them into the loss function of the Euclidean distance over the distributed paper representations, and perform the loss function calculation that balances the positive and negative samples to obtain the overall loss function L; execute step 305;
Step 305: determine the nonlinear transformation function f_θ by using a stochastic gradient descent algorithm, completing the representation learning of any two paper nodes paper_a and paper_o.
Network node representation describes each node in the network with a vector. In order to process the vast and complex information and neighbor node relations in a social network, the invention provides a parameterized network node representation learning method. The network node representation learning method learns a nonlinear mapping function, so that the vector representation of a network node can be obtained simply from the content information of the network node. For the vector representation of a node, a random walk is used to obtain the nodes surrounding the node, and the relation between the node and its neighbor nodes is then constructed according to the twin network, so that the nonlinear mapping function is learned and determined. In simulation experiments, when the network node representation vectors obtained by the method are used with the same SVM classifier, the classification results are clearly better than those of other methods, which verifies that the method is effective for network node representation of paper networks.
Drawings
FIG. 1 is a flow chart of parameterized paper network node representation learning in accordance with the present invention.
FIG. 2 shows the results of the evaluation of the Micro-F1 index in the Cora data set.
FIG. 3 shows the results of evaluation of the Macro-F1 index in the Cora data set.
FIG. 4 shows the results of the evaluation of the Micro-F1 index in the Wiki data set.
FIG. 5 shows the results of the evaluation of the Macro-F1 index in the Wiki data set.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples.
In the present invention, a paper is referred to as paper; multiple papers form a paper set, referred to as AP, with AP = {paper_1, paper_2, ..., paper_a, ..., paper_o, ..., paper_A}; any one of the papers in the paper set AP is called a paper node in the star-shaped paper network structure;
paper_1 represents the first paper node;
paper_2 represents the second paper node;
paper_a represents the a-th paper node, where a represents the identification number of the paper node;
paper_A represents the last paper node, A represents the total number of papers, a ∈ A.
For convenience of explanation, paper_a is also called any paper node, and paper_o is another arbitrary paper node other than paper_a; hereinafter paper_a is referred to as the first arbitrary paper node and paper_o as the second arbitrary paper node.
The set of all neighbor paper nodes belonging to any paper node paper_a is denoted NB^{paper_a} = {nb_1^{paper_a}, nb_2^{paper_a}, ..., nb_b^{paper_a}, ..., nb_B^{paper_a}}, also referred to for short as the neighbor paper node set;
nb_1^{paper_a} represents the first neighbor node belonging to any paper node paper_a;
nb_2^{paper_a} represents the second neighbor node belonging to any paper node paper_a;
nb_b^{paper_a} represents the b-th neighbor node belonging to any paper node paper_a, where b represents the identification number of the neighbor node;
nb_B^{paper_a} represents the last neighbor node belonging to any paper node paper_a, B represents the total number of neighbor nodes belonging to paper_a, b ∈ B.
The set of all neighbor paper nodes belonging to any neighbor node nb_b^{paper_a} is denoted NB^{nb_b^{paper_a}} = {nb_1, nb_2, ..., nb_e, ..., nb_E}, also referred to for short as the neighbor paper node set of the neighbor.
nb_1 represents the first neighbor node belonging to any neighbor paper node nb_b^{paper_a};
nb_2 represents the second neighbor node belonging to any neighbor paper node nb_b^{paper_a};
nb_e represents the e-th neighbor node belonging to any neighbor paper node nb_b^{paper_a}, where e represents the identification number of the neighbor node belonging to nb_b^{paper_a};
nb_E represents the last neighbor node belonging to any neighbor paper node nb_b^{paper_a}, E represents the total number of neighbor nodes belonging to nb_b^{paper_a} (the neighbors of the neighbor for short), e ∈ E.
In the present invention, the star-shaped paper network structure adopts the structure of FIG. 1.19(c) on page 37 of "Analysis of Interconnection Network Structures", edited by Wang Dingxing and Chen Guoliang, first edition, October 1990.
In the invention, the semantic information of a paper node refers to the vector representation, obtained through lexical processing, of the words contained in the title, abstract and body of the paper. The lexical processing performs 0/1 binarization coding according to whether each word occurs in the content of the paper, so as to obtain a 0/1 vector corresponding to the paper content: "0" indicates absence and "1" indicates presence. Applying this lexical processing to all the paper nodes belonging to the star-shaped paper network structure yields a two-dimensional matrix associating the number of words with the number of paper nodes, referred to for short as the paper binary matrix.
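For illustration, the binarization described above can be sketched as follows; the vocabulary construction and the function name are illustrative assumptions.

    import numpy as np

    def paper_binary_matrix(paper_texts):
        """Build the 0/1 paper binary matrix: one row per paper, one column per word."""
        vocab = sorted({w for text in paper_texts for w in text.lower().split()})
        index = {w: j for j, w in enumerate(vocab)}
        matrix = np.zeros((len(paper_texts), len(vocab)), dtype=np.int8)
        for i, text in enumerate(paper_texts):
            for w in set(text.lower().split()):
                matrix[i, index[w]] = 1       # 1 = word occurs, 0 = word absent
        return matrix, vocab

Each row of the returned matrix is the semantic information vector of one paper node.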
Constructing the neural network paper probability model of the multilayer perceptron by using the paper node semantic information
In the invention, the construction of the paper probability model comprises the following steps: (A) setting the neural network paper probability model expression f_θ(c_{paper_a}); (B) choosing any two paper nodes paper_a and paper_o from AP = {paper_1, paper_2, ..., paper_a, ..., paper_o, ..., paper_A} and passing paper_a and paper_o through f_θ, respectively obtaining the multilayer perceptron function f_θ(c_{paper_a}) belonging to paper_a and the multilayer perceptron function f_θ(c_{paper_o}) belonging to paper_o; (C) computing, from f_θ(c_{paper_a}) and f_θ(c_{paper_o}), the Euclidean distance between paper_a and paper_o and performing the loss function processing that balances the positive and negative samples; (D) processing the WEIGHT parameter WEIGHT and the BIAS parameter BIAS of f_θ with a stochastic gradient descent algorithm to obtain the nonlinear transformation function f_θ of the learning target, and traversing all the triples to complete the neural network training, based on the multilayer perceptron, over the semantic information of the paper nodes.
In the invention, the constructed neural network paper probability model expression based on the multilayer perceptron is denoted f_θ, a nonlinear mapping function applied to the semantic information c_{paper_a} of the paper node paper_a. The parameter θ of the nonlinear mapping is determined by learning the nonlinear mapping function f_θ. Based on the nonlinear mapping function f_θ, the paper probability model expression f_θ(c_{paper_a}) of any paper paper_a can be obtained.
In the present invention, each paper_a has corresponding rich text information c_{paper_a}, and a multilayer perceptron neural network is used to obtain the nonlinear transformation of c_{paper_a}. Assuming the multilayer perceptron has H layers in total, the neural network based on the multilayer perceptron has a WEIGHT parameter WEIGHT and a BIAS parameter BIAS for each layer.
In the present invention, the WEIGHT parameter WEIGHT = {weight_1, weight_2, ..., weight_h, ..., weight_H}.
In the present invention, the BIAS parameter BIAS = {bias_1, bias_2, ..., bias_h, ..., bias_H}.
weight_1 represents the weight parameter of the first-layer network in the neural network;
weight_2 represents the weight parameter of the second-layer network in the neural network;
weight_h represents the weight parameter of any layer of the network in the neural network, where h represents the layer identification number of the perceptron;
weight_H represents the weight parameter of the last-layer network in the neural network, where H represents the total number of layers of the perceptron;
bias_1 represents the bias parameter of the first-layer network in the neural network;
bias_2 represents the bias parameter of the second-layer network in the neural network;
bias_h represents the bias parameter of any layer of the network in the neural network;
bias_H represents the bias parameter of the last-layer network in the neural network.
The first-layer output of the multilayer perceptron is denoted c^1_{paper_a} = f_1(weight_1 · c_{paper_a} + bias_1), where c^1_{paper_a} represents the output of the first layer of the multilayer perceptron and f_1 represents the activation function of the first-layer neural network.
Similarly, the second-layer output of the multilayer perceptron is denoted c^2_{paper_a} = f_2(weight_2 · c^1_{paper_a} + bias_2), where c^2_{paper_a} represents the output of the second layer of the multilayer perceptron and f_2 represents the activation function of the second-layer neural network.
Any layer output of the multilayer perceptron, denoted c^h_{paper_a}, is c^h_{paper_a} = f_h(weight_h · c^{h-1}_{paper_a} + bias_h), where f_h represents the activation function of that layer of the neural network.
The last output of the multilayer perceptron is denoted c^H_{paper_a}.
In the present invention, the activation function f_h of any layer of the multilayer perceptron is usually chosen to be a nonlinear function, such as the sigmoid or tanh function. The last-layer output c^H_{paper_a} of the multilayer perceptron is the result of applying multiple nonlinear functions to the input c_{paper_a}, and can therefore simply be written as f_θ(c_{paper_a}), where θ denotes all the parameters of the parameterized functions taken together. The final output of the neural network paper probability model based on the multilayer perceptron is therefore f_θ(c_{paper_a}).
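For illustration, the H-layer forward pass can be sketched as follows, assuming NumPy weight matrices and tanh activations (the activation choice is an assumption; the text only requires a nonlinear function such as sigmoid or tanh).

    import numpy as np

    def f_theta(c_paper, WEIGHT, BIAS, activations=None):
        """Apply the H-layer multilayer perceptron to a paper's semantic vector.

        WEIGHT = [weight_1, ..., weight_H], BIAS = [bias_1, ..., bias_H].
        """
        H = len(WEIGHT)
        activations = activations or [np.tanh] * H
        out = c_paper
        for weight_h, bias_h, f_h in zip(WEIGHT, BIAS, activations):
            out = f_h(weight_h @ out + bias_h)   # c^h = f_h(weight_h * c^(h-1) + bias_h)
        return out

Because the same WEIGHT and BIAS are applied to both papers of a triple, the two sub-networks of the twin network share all of their parameters.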
In the present invention, the aim is to make the Euclidean distance between similar points in the representation space as short as possible and the Euclidean distance between dissimilar points as long as possible. The basic form consists of two distance terms, E_pos and E_neg, computed from the two transformed vectors f_θ(c_{paper_a}) and f_θ(c_{paper_o}): E_pos represents the Euclidean shortest distance, E_neg represents the Euclidean longest distance, and c represents the hop count.
In the present invention, each triple (paper_a, paper_o, ±1) carries a flag indicating whether it is a positive or a negative sample; positive samples can be regarded as requiring the two points to be close in the space, while negative samples require the two points to be as far apart as possible. For this application, the invention can therefore merge the positive and negative samples into a single loss function over the Euclidean distance of the distributed paper representations, where m represents the identification number of any triple in Q_Sorting, the first and second arbitrary paper nodes of the triple m are paper_a^(m) and paper_o^(m), and the flag of the triple m indicates whether it is a positive or a negative sample; L represents the overall loss function, which is the sum of the loss functions of all elements in the out-of-order sample queue Q_Sorting.
In the invention, the proportions of positive and negative samples differ, and so does the similarity between them: positive samples tend to be more similar because a connecting edge exists, while negative samples differ more, so the losses produced by the positive and the negative samples are not on the same scale. For this application, the invention therefore needs a harmonic parameter γ to balance the loss functions of the positive and negative samples, and γ is added into the loss function L.
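One possible concrete form of such a balanced loss is sketched below; the squared-distance expression for E_pos, the margin form for E_neg and the margin value are assumptions, since the text introduces these terms only symbolically.

    import numpy as np

    def pair_loss(vec_a, vec_o, label, gamma=1.0, margin=1.0):
        """Loss of one triple (paper_a, paper_o, label), label being +1 or -1.

        Assumed forms: E_pos pulls positive pairs together, E_neg pushes
        negative pairs beyond a margin, and gamma is the harmonic parameter
        balancing the two contributions.
        """
        dist = np.linalg.norm(vec_a - vec_o)
        if label == +1:
            return dist ** 2                              # E_pos term
        return gamma * max(0.0, margin - dist) ** 2       # E_neg term

    def total_loss(triples, embed, gamma=1.0):
        """Overall loss L: sum over every triple m in the shuffled queue Q_Sorting."""
        return sum(pair_loss(embed[a], embed[o], lbl, gamma) for a, o, lbl in triples)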
In the invention, the purpose of training the neural network is to reduce the value of the loss function to a minimum; in order to train the neural network and determine the weights and bias values of the neural network, the invention adopts a stochastic gradient descent algorithm to learn the network parameters.
In the present invention, the model is trained by determining the nonlinear transformation function f_θ through the stochastic gradient descent algorithm. Since the nonlinear transformation function f_θ mainly comprises the WEIGHT parameter WEIGHT and the BIAS parameter BIAS, the update value of each gradient descent step is the partial derivative of L with respect to the WEIGHT parameter WEIGHT and the BIAS parameter BIAS, so that at each iteration WEIGHT and BIAS are updated at a learning rate η according to the parameter update values:
WEIGHT_after = WEIGHT_before + η · ΔWEIGHT
BIAS_after = BIAS_before + η · ΔBIAS
WEIGHT_before is the weight parameter of a perceptron layer before the update (the previous iteration), WEIGHT_after is the weight parameter of that layer after the update (the current iteration), and ΔWEIGHT is the update value given at each gradient descent step by the partial derivative of L with respect to the WEIGHT parameter WEIGHT.
BIAS_before is the bias parameter of a perceptron layer before the update, BIAS_after is the bias parameter of that layer after the update, and ΔBIAS is the update value given at each gradient descent step by the partial derivative of L with respect to the BIAS parameter BIAS.
When stochastic gradient descent is used, overfitting can occur if the number of training iterations is too large; the early stopping method is therefore adopted, and training is stopped when the loss function L no longer decreases, preventing the overfitting phenomenon that arises during training.
In the invention, the WEIGHT parameter WEIGHT and the BIAS parameter BIAS of each layer of the perceptron are saved to obtain the nonlinear transformation function f_θ of the learning target, thereby completing the neural network training based on the multilayer perceptron; finally, according to the learned target f_θ, the representation vector of each paper paper_a is generated, which is the neural network paper probability model of the multilayer perceptron constructed for the semantic information of the paper nodes, f_θ(c_{paper_a}).
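A schematic training loop with this early stopping rule might look as follows; the loss_and_gradients helper is hypothetical (in practice it would be supplied by an automatic differentiation framework), and the learning rate and epoch limit are illustrative.

    def train(WEIGHT, BIAS, triples, loss_and_gradients, lr=0.01, max_epochs=200):
        """Stochastic gradient descent with early stopping on the overall loss L."""
        best = float("inf")
        for epoch in range(max_epochs):
            total = 0.0
            for triple in triples:
                # hypothetical helper returning (L_m, dL/dWEIGHT, dL/dBIAS) for one triple
                L_m, dW, dB = loss_and_gradients(WEIGHT, BIAS, triple)
                total += L_m
                for h in range(len(WEIGHT)):              # gradient descent update per layer
                    WEIGHT[h] -= lr * dW[h]
                    BIAS[h] -= lr * dB[h]
            if total >= best:                             # early stopping: L no longer decreases
                break
            best = total
        return WEIGHT, BIAS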
The invention provides a parameterized thesis network node representation learning method, which specifically comprises the following steps:
The method comprises the following steps: first, the neighbor paper node set of any paper node and the neighbor paper node set of each such neighbor are obtained by sampling based on a random walk method;
In the present invention, for the paper set AP = {paper_1, paper_2, ..., paper_a, ..., paper_o, ..., paper_A} in the star-shaped paper network structure, the sampling of the neighbor paper nodes of each paper node is carried out by a random walk whose hop probabilities take both the previous hop and the next hop into account. For any paper node paper_a, the random walk method is used to sample the neighbor paper node set NB^{paper_a} belonging to paper_a.
Step 101: construct an empty paper node queue, denoted V, which is used for storing a sequence of paper nodes; the maximum number of queue element positions of the empty paper node queue V is mv, with mv taking a value of 10-20; then execute step 102;
Step 102: select any paper node paper_a, and place paper_a at position 1 of the paper node queue V; then execute step 103;
Step 103: obtain the set of all neighbor paper nodes belonging to the paper node paper_a, denoted NB^{paper_a} = {nb_1^{paper_a}, nb_2^{paper_a}, ..., nb_b^{paper_a}, ..., nb_B^{paper_a}};
In the invention, the neighbor paper nodes are the set of paper nodes that have an edge connecting them to the paper node paper_a; then execute step 104;
Step 104: according to the total number B of neighbor nodes in the neighbor paper node set NB^{paper_a}, determine the probability P^1_c of jumping to each neighbor paper node (the first hop probability for short), where c denotes the hop count; then execute step 105;
Step 105: using the alias sampling algorithm (alias sampling), based on the current hop probability P^1_c, obtain from NB^{paper_a} the neighbor paper node nb_b^{paper_a} of the next hop, and at the same time place nb_b^{paper_a} at position 2 of the paper node queue V; then execute step 106;
Step 106: obtain the set of all neighbor paper nodes belonging to the neighbor paper node nb_b^{paper_a}, i.e. the neighbor paper node set of the neighbor, denoted NB^{nb_b^{paper_a}} = {nb_1, nb_2, ..., nb_e, ..., nb_E}; then execute step 107;
Step 107: compute the shortest hop count d between each neighbor-of-neighbor paper node nb_e and the paper node paper_a; then execute step 108;
In the present invention, d represents the minimum hop distance from any such neighbor paper node to the previous paper node; for example, if a neighbor paper node nb_e requires a minimum of 1 hop to reach the paper node paper_a, then d = 1; if the neighbor paper node nb_e is the paper node paper_a itself, then d = 0; and so on.
Step 108: according to d, determine the probability P^2_c (the second hop probability for short) with which nb_b^{paper_a} jumps to each of its neighbor paper nodes; then execute step 109;
the second hop probability is P^2_c, where c denotes the hop count.
In the present invention, the shortest hop count refers to the minimum number of hops required between two paper nodes.
In the invention, p is the parameter that adjusts, in the random walk method, the second hop probability P^2_c of paper nodes that are not in the paper node queue V (the hop-out parameter for short), and q is the parameter that adjusts the second hop probability P^2_c of paper nodes that are in the paper node queue V (the hop-in parameter for short); p and q control the hop probability: if the walk is to move more randomly beyond the local neighborhood, p needs to be set larger; conversely, q needs to be set larger.
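Steps 105 and 109 draw the next hop with alias sampling once these probabilities are fixed. A compact, generic sketch of the alias table construction (Vose's method) and the O(1) draw is given below for illustration; it is not taken from the patent text.

    import random

    def build_alias_table(probs):
        """Preprocess a discrete distribution (probs sum to 1) for O(1) draws."""
        n = len(probs)
        scaled = [p * n for p in probs]
        alias, prob = [0] * n, [0.0] * n
        small = [i for i, s in enumerate(scaled) if s < 1.0]
        large = [i for i, s in enumerate(scaled) if s >= 1.0]
        while small and large:
            s, l = small.pop(), large.pop()
            prob[s], alias[s] = scaled[s], l
            scaled[l] = scaled[l] + scaled[s] - 1.0
            (small if scaled[l] < 1.0 else large).append(l)
        for i in small + large:
            prob[i] = 1.0
        return prob, alias

    def alias_draw(prob, alias):
        """Draw one index from the preprocessed distribution in O(1) time."""
        i = random.randrange(len(prob))
        return i if random.random() < prob[i] else alias[i]

Given the second hop probabilities P^2_c over the neighbors of the current node, build_alias_table is called once per node and alias_draw then selects each next hop, which is what keeps long walks cheap.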
Step 109: once P^2_c has been determined, select, according to P^2_c and alias sampling, a node nb_e as the next-hop paper node, and at the same time place nb_e at position 3 of the paper node queue V; then execute step 110;
Step 110: execute step 106 to step 109 in a loop until the number of positions filled in the paper node queue V is mv, and stop the random walk; then execute step 111;
Step 111: in the present invention, step 101 to step 110 are repeated for every paper node in the whole paper network to complete the neighbor node sampling of the paper nodes, giving a set of paper node queues denoted VF = {V_1, V_2, ..., V_f, ..., V_F}; then execute step 201.
V_1 represents the first paper node queue;
V_2 represents the second paper node queue;
V_f represents any paper node queue, and f represents the identification number of the paper node queue;
V_F represents the last paper node queue, F represents the total number of paper node queues in the set, f ∈ F.
Generating neural network training data of the multilayer perceptron by adopting a negative sampling method;
In the invention, the paper node queue set VF = {V_1, V_2, ..., V_f, ..., V_F} obtained in step one is used for generating training data usable by the neural network; in addition to the training data from the paper node queue set, the invention generates the data required for training the model by means of a negative sampling algorithm.
Step 201: establish a positive sample queue Q_p and a negative sample queue Q_n, which store, respectively, the positive and negative sampling data required for training the neural network; then execute step 202;
Step 202: set a neighbor window size hyperparameter WD; for a paper node queue V_f, each paper node belonging to the queue V_f is denoted nd_g^{V_f}, so that V_f = {nd_1^{V_f}, nd_2^{V_f}, ..., nd_g^{V_f}, ..., nd_G^{V_f}}; then execute step 203;
nd_1^{V_f} denotes the first paper node belonging to any paper node queue V_f;
nd_2^{V_f} denotes the second paper node belonging to any paper node queue V_f;
nd_g^{V_f} denotes the g-th paper node belonging to any paper node queue V_f, where g denotes the identification number of the neighbor paper node;
nd_G^{V_f} denotes the last paper node belonging to any paper node queue V_f, where G denotes the length of the paper node queue V_f, g ∈ G.
For any node in a paper node queue, all nodes whose distance to that node within the queue is smaller than WD are regarded as positive sample nodes. Each time, for any paper node nd_g^{V_f}, the invention first obtains the set of the 2 × WD paper nodes adjacent to it in the queue, denoted W_g = {w_{g-WD}, ..., w_l, ..., w_{g+WD}};
w_{g-WD} denotes the node with the smallest identification number in the set W_g of adjacent paper nodes.
w_{g+WD} denotes the node with the largest identification number in the set W_g of adjacent paper nodes.
w_l denotes any adjacent paper node in W_g other than w_{g-WD} and w_{g+WD}, referred to for short as a queue-adjacent paper node; the subscript l denotes the identification number of a paper node that is neither the largest nor the smallest, i.e. an identification number other than those of these 2 paper nodes.
Step 203: for any queue paper node nd_g^{V_f}, sample in order of neighbor identification number from small to large; the sampling process combines each node in W_g with the queue paper node nd_g^{V_f} to form a triple, and then executes step 204;
the node w_{g-WD} and the queue paper node nd_g^{V_f} form a triple, i.e. (w_{g-WD}, nd_g^{V_f}, +1), where +1 denotes that the triple is a positive sample and, conversely, -1 denotes that the triple is a negative sample; the triple (w_{g-WD}, nd_g^{V_f}, +1) is inserted into the positive sample queue Q_p.
the node w_l and the queue paper node nd_g^{V_f} form a triple, i.e. (w_l, nd_g^{V_f}, +1), where +1 denotes that the triple is a positive sample and, conversely, -1 denotes that the triple is a negative sample; the triple (w_l, nd_g^{V_f}, +1) is inserted into the positive sample queue Q_p.
the node w_{g+WD} and the queue paper node nd_g^{V_f} form a triple, i.e. (w_{g+WD}, nd_g^{V_f}, +1), where +1 denotes that the triple is a positive sample and, conversely, -1 denotes that the triple is a negative sample; the triple (w_{g+WD}, nd_g^{V_f}, +1) is inserted into the positive sample queue Q_p.
Step 204: execute step 202 and step 203 in a loop until all the paper nodes in all the paper node queues of the paper node queue set VF = {V_1, V_2, ..., V_f, ..., V_F} have completed the sampling of their neighboring paper nodes, yielding the positive sample queue Q_p; then execute step 205;
Step 205: sample over all paper nodes in the network, each time selecting any two paper nodes from the network (the two selected paper nodes may be adjacent or non-adjacent), namely a first arbitrary paper node paper_a and a second paper node paper_o. If a connecting edge exists between the two paper nodes ((paper_a, paper_o) ∈ E), or the two randomly chosen paper nodes are identical (paper_a = paper_o), repeat this step; otherwise, form the triple (paper_a, paper_o, -1) from the two paper nodes paper_a and paper_o and store it in the negative sample queue Q_n; then execute step 206;
Step 206: execute step 205 in a loop, and establish a positive/negative sample ratio parameter μ; assuming the positive sample queue Q_p contains np triples, stop when the number of triples in Q_n equals μ × np, and then execute step 207;
Step 207: combine the positive sample queue Q_p obtained in step 204 and the negative sample queue Q_n obtained in step 206 to obtain a new sample queue Q_New = {Q_1, ..., Q_{(1+μ)×np}}; execute step 208;
Q_1 denotes the triple with the smallest identification number in the new sample queue Q_New.
Q_{(1+μ)×np} denotes the triple with the largest identification number in the new sample queue Q_New; the subscript (1+μ) × np indicates that the sample queue Q_New contains (1+μ) × np triples.
Step 208: shuffle the order of all elements of the new sample queue Q_New = {Q_1, ..., Q_{(1+μ)×np}} to obtain the shuffled sample queue Q_Sorting = {Q_1, ..., Q_{(1+μ)×np}}, and then execute step 301.
Processing in a neural network paper probability model based on a multilayer perceptron;
Step 301: for the Q_Sorting = {Q_1, ..., Q_{(1+μ)×np}} obtained in step 208, take one triple (paper_a, paper_o, ±1) at a time and put the pair of paper nodes into the neural network paper probability model for learning; execute step 302;
Step 302: for the two paper nodes paper_a and paper_o in each triple, map them through the model f_θ to obtain the two corresponding transformed vectors f_θ(c_{paper_a}) and f_θ(c_{paper_o}); execute step 303;
f_θ(c_{paper_a}) is the multilayer perceptron function belonging to paper_a;
f_θ(c_{paper_o}) is the multilayer perceptron function belonging to paper_o;
Step 303: compute the Euclidean distances of the two paper nodes, and execute step 304;
In the present invention, the twin network aims to make the Euclidean distance between similar points in the representation space as short as possible and the Euclidean distance between dissimilar points as long as possible. The basic form consists of the two distance terms E_pos and E_neg, where E_pos represents the Euclidean shortest distance, E_neg represents the Euclidean longest distance, and c represents the hop count.
Step 304: merge the positive and negative samples, put them into the loss function of the Euclidean distance over the distributed paper representations, and perform the loss function calculation that balances the positive and negative samples to obtain the overall loss function L; execute step 305;
Step 305: determine the nonlinear transformation function f_θ by a stochastic gradient descent algorithm, completing the representation learning of any two paper nodes paper_a and paper_o.
Example 1
The embodiment adopts the Cora paper data set and the Pubmed knowledge network data set to carry out learning and experimental work.
Cora is a paper data set containing 2708 paper nodes and 5429 edges; each node corresponds to a paper rich text information vector of length 1433, in which 0/1 indicates the absence or presence of a word. Each node is also associated with a category attribute, with 7 category attribute values in total.
Pubmed is a knowledge network data set containing 19717 paper nodes and 44338 edges; each node corresponds to a paper rich text information vector of length 500, in which 0/1 indicates the absence or presence of a word. Each node is also associated with a category attribute, with 3 category attribute values in total.
In order to verify the effectiveness, the invention mainly compares the performances of different methods in the classification task of the paper nodes:
deepwalk: the network is sampled by adopting a common random walk algorithm, and then the representation of each node in the network is obtained by using a word2vec algorithm. (2014deep walk of online learning of social representation [ J ]. Perozzi B, Alrfou R, Skiiena S.KDD:701-710.)
TADW: and decomposing the random walk in the Deepwalk, skillfully adding rich text information of the nodes, and obtaining the representation of each node in the network by adopting a matrix multiplication mode. (2015, Network representation with rich text information [ C ] YangC, ZHao D, et al. International conference on Intelligent Association. AAAI Press: 2111-
Node2Vec, an upgraded version of Deepwalk, employs a second-order random walk algorithm to sample the network, and then obtains a representation of each Node in the network using a word2Vec algorithm (2016, Node2Vec: Scalable Feature L earning for Networks [ C ]// Grover A, L eskovec J.KDD:855.)
The invention selects a node classification (prediction) task to compare the effect of the vector representations. In the experiments, cross-validation is adopted, and the same SVM classifier is used for classification with each of the compared prediction methods.
The invention adopts two evaluation indexes, Micro-F1 and Macro-F1.
The Macro-F1 calculation method is:
Macro-F1 = 2 × P_macro × R_macro / (P_macro + R_macro)
where P_macro and R_macro respectively represent the macro precision and the macro recall.
The Micro-F1 calculation method is:
Micro-F1 = 2 × P_micro × R_micro / (P_micro + R_micro)
where P_micro and R_micro respectively represent the micro precision and the micro recall.
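For reference, this evaluation protocol can be reproduced with scikit-learn; the kernel and split parameters below are illustrative assumptions, and the input vectors are the paper node representations produced by f_θ.

    from sklearn.svm import SVC
    from sklearn.metrics import f1_score
    from sklearn.model_selection import train_test_split

    def evaluate(vectors, labels, train_ratio=0.5, seed=0):
        """Train an SVM on the learned vectors and report (Micro-F1, Macro-F1)."""
        X_train, X_test, y_train, y_test = train_test_split(
            vectors, labels, train_size=train_ratio, random_state=seed)
        pred = SVC().fit(X_train, y_train).predict(X_test)
        return (f1_score(y_test, pred, average="micro"),
                f1_score(y_test, pred, average="macro"))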
The effect on the Cora data set is shown in FIGS. 2 and 3, which compare the invention with the other methods: FIG. 2 shows the performance of each method on the Micro-F1 evaluation index and FIG. 3 on the Macro-F1 evaluation index. The horizontal axis of both figures is the percentage of the total data used as training data for the classifier. The figures show that, under both the Micro-F1 and the Macro-F1 evaluation indexes, the method has a better effect than the other network representation learning methods. In particular, compared with the Deepwalk and Node2vec algorithms, which use only network information and no network node semantic information, the algorithm of the invention improves by more than 5% at every training data proportion under both Micro-F1 and Macro-F1; after fusing network node information with the network topology structure, the obtained network node representation vectors are significantly better than those obtained from network topology information alone. Compared with the TADW method, which also combines network node information with network topology information, the method of the present invention still improves by 3% on both evaluation indexes.
The effect on the Wiki data set is shown in FIGS. 4 and 5; under both the Micro-F1 and Macro-F1 evaluation indexes, the invention again performs better than the other network representation learning methods. Since the number of categories in the Wiki data set is far greater than in the Cora data set, the Deepwalk and Node2vec algorithms, which do not use network node semantic information, classify poorly, far below the results of TADW; this indicates that semantic information dominates in this data set. Under the Micro-F1 and Macro-F1 evaluation indexes, the method improves on the result of the TADW method by 2%, which shows that the network node representation vectors obtained by fusing network node information with the network topology structure are better than those obtained by direct matrix multiplication. This demonstrates that the invention fuses network information and semantic information more effectively in the network node representation and obtains better representation vectors.
The analysis of FIGS. 2-5 shows that the invention can naturally fuse the network structure and the semantic information to obtain better network node representation vectors, which verifies the effectiveness of the invention.

Claims (3)

1. A parameterized thesis network node representation learning method is characterized by comprising the following steps:
the method comprises the following steps that firstly, a neighbor thesis node set of any one thesis node and a neighbor thesis node set of a neighbor are obtained through sampling based on a random walk method;
step 101: constructing a paper node empty queue marked as V, wherein the V is used for storing a paper node sequence; the maximum queue element bit number of the paper node empty queue V is mv, and the value of mv is 10-20; then, step 102 is executed;
step 102: selecting any paper node paper_a, and placing paper_a at position 1 of the paper node queue V; then executing step 103;
step 103: obtaining the set of all neighbor paper nodes belonging to the paper node paper_a, denoted NB^{paper_a} = {nb_1^{paper_a}, nb_2^{paper_a}, ..., nb_b^{paper_a}, ..., nb_B^{paper_a}}; the neighbor paper node set refers to the set of neighbor paper nodes that have a connecting edge with the paper node paper_a; then executing step 104;
nb_1^{paper_a} represents the first neighbor node belonging to any paper node paper_a, i.e. the first neighbor paper node;
nb_2^{paper_a} represents the second neighbor node belonging to any paper node paper_a, i.e. the second neighbor paper node;
nb_b^{paper_a} represents the b-th neighbor node belonging to any paper node paper_a, where b represents the identification number of the neighbor node;
nb_B^{paper_a} represents the last neighbor node belonging to any paper node paper_a, i.e. the last neighbor paper node, where B represents the total number of neighbor nodes belonging to paper_a, b ∈ B;
step 104: according to the total number B of neighbor nodes in the neighbor paper node set NB^{paper_a}, determining the first hop probability P^1_c of jumping, where c represents the hop count; then executing step 105;
step 105: using the alias sampling algorithm, according to the current first hop probability P^1_c, obtaining from NB^{paper_a} the neighbor paper node nb_b^{paper_a} of the next hop, and at the same time placing nb_b^{paper_a} at position 2 of the paper node queue V; then executing step 106;
step 106: obtaining the set of all neighbor paper nodes belonging to any neighbor paper node nb_b^{paper_a}, i.e. the neighbor paper node set of the neighbor, denoted NB^{nb_b^{paper_a}} = {nb_1, nb_2, ..., nb_e, ..., nb_E}; then executing step 107;
nb_1 represents the first neighbor node of the neighbor belonging to any neighbor paper node nb_b^{paper_a}, i.e. the first neighbor paper node of the neighbor;
nb_2 represents the second neighbor node of the neighbor belonging to any neighbor paper node nb_b^{paper_a}, i.e. the second neighbor paper node of the neighbor;
nb_e represents the e-th neighbor node of the neighbor belonging to any neighbor paper node nb_b^{paper_a}, where e represents the identification number of the neighbor node of nb_b^{paper_a};
nb_E represents the last neighbor node of the neighbor belonging to any neighbor paper node nb_b^{paper_a}, i.e. the last neighbor paper node of the neighbor, where E represents the total number of neighbor nodes of nb_b^{paper_a}, e ∈ E;
step 107: calculating the shortest hop count d between each neighbor paper node nb_e of the neighbor and the paper node paper_a; then executing step 108;
wherein d represents the minimum hop distance from any such neighbor paper node to the previous paper node paper_a;
step 108: according to d, determining the second hop probability P^2_c with which nb_b^{paper_a} jumps to each neighbor paper node; then executing step 109;
the second hop probability is P^2_c, where c represents the hop count; p is the parameter for adjusting, in the random walk method, the second hop probability P^2_c of paper nodes that are not in the paper node queue V, i.e. the hop-out parameter; q is the parameter for adjusting, in the random walk method, the second hop probability P^2_c of paper nodes that are in the paper node queue V, i.e. the hop-in parameter;
step 109: warp beam
Figure FDA00024253740800000220
After determination, according to
Figure FDA00024253740800000221
And alias sampling, selecting
Figure FDA00024253740800000222
As a next-hop thesis node, will simultaneously
Figure FDA00024253740800000223
Bit 3 placed in paper node queue V; then, step 110 is executed;
step 110: circularly executing the step 106 and the step 109 until the digit in the paper node queue V is mv, and stopping the random walk; then step 111 is executed;
step 111: repeat steps 101 to 109 for every paper node in the whole paper network to complete the neighbor node sampling of the paper nodes, yielding a set of paper node queues denoted VF = {V_1, V_2, ..., V_f, ..., V_F}; then step 201 is executed;
V_1 represents the first paper node queue;
V_2 represents the second paper node queue;
V_f represents any paper node queue, f being the identification number of the paper node queue;
V_F represents the last paper node queue, F being the total number of paper node queues in the set, f ∈ F;
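Taken together, steps 101 to 111 amount to running one truncated, biased random walk of length mv from every paper node and collecting the resulting queues into VF. A compact sketch of that outer loop follows; the graph representation, the helper name biased_next_hop (standing in for steps 106 to 109), and the uniform first hop are assumptions made for illustration.

import random

def generate_walks(graph, mv, biased_next_hop):
    # graph: dict mapping each paper node to the list of its neighbor paper nodes.
    vf = []
    for paper_a in graph:                        # step 111: one walk per paper node
        v = [paper_a]                            # bit 1 of the paper node queue V
        if not graph[paper_a]:                   # isolated node: nothing to walk to
            vf.append(v)
            continue
        v.append(random.choice(graph[paper_a]))  # bit 2: first hop (steps 103-105)
        while len(v) < mv:                       # steps 106-110: extend until mv elements
            nxt = biased_next_hop(graph, v)      # alias draw over the second-hop probabilities
            if nxt is None:
                break
            v.append(nxt)
        vf.append(v)
    return vf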
generating training data for the multilayer perceptron neural network by means of a negative sampling method;
step 201: establish a positive sample queue Q_p and a negative sample queue Q_n to store, respectively, the positive and negative sampling data required for training the neural network; then execute step 202;
step 202: set a neighbor window size hyperparameter WD; within any paper node queue V_f, each paper node belonging to the queue V_f is denoted v^f_g; then step 203 is executed;
v^f_1 denotes the first paper node belonging to any paper node queue V_f;
v^f_2 denotes the second paper node belonging to any paper node queue V_f;
v^f_g denotes any paper node belonging to the paper node queue V_f, g being the identification number of the paper node;
v^f_G denotes the last paper node belonging to the paper node queue V_f, G being the length of the paper node queue V_f, g ∈ G;
for any node v^f_g in a paper node queue, all nodes in the queue whose distance to v^f_g is smaller than WD are regarded as positive sample nodes; accordingly, for any paper node v^f_g, first obtain its set of 2 × WD neighbor paper nodes within the queue, denoted N(v^f_g);
n_min denotes the node with the smallest identification number in the neighbor paper node set N(v^f_g);
n_max denotes the node with the largest identification number in the neighbor paper node set N(v^f_g);
n_l denotes a queue-adjacent paper node in N(v^f_g) other than n_min and n_max, the subscript l being the identification number of a node that is neither the largest nor the smallest;
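The 2 × WD window of step 202 is the usual skip-gram-style context: up to WD queue positions on each side of v^f_g. A minimal sketch, with placeholder names and 0-based indices:

def window_neighbors(queue_vf, g, wd):
    # Return the up-to-2*WD queue neighbors of the node at index g (0-based).
    left = queue_vf[max(0, g - wd):g]
    right = queue_vf[g + 1:g + 1 + wd]
    return left + right

# Example: WD = 2 around the third node of a hypothetical queue V_f.
vf_queue = ["p1", "p5", "p9", "p2", "p7", "p4"]
context = window_neighbors(vf_queue, 2, 2)   # -> ["p1", "p5", "p2", "p7"]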
step 203: for any queue paper node v^f_g, sample its neighbors in increasing order of identification number; the sampling process forms a triple from each node in N(v^f_g) together with the queue paper node v^f_g, and then step 204 is executed;
the node n_min and the queue paper node v^f_g form a triple, i.e. (v^f_g, n_min, +1), where +1 denotes that the triple is a positive sample, whereas -1 would denote a negative sample, and (v^f_g, n_min, +1) is inserted into the positive sample queue Q_p;
the node n_l and the queue paper node v^f_g form a triple, i.e. (v^f_g, n_l, +1), where +1 denotes that the triple is a positive sample, whereas -1 would denote a negative sample, and (v^f_g, n_l, +1) is inserted into the positive sample queue Q_p;
the node n_max and the queue paper node v^f_g form a triple, i.e. (v^f_g, n_max, +1), where +1 denotes that the triple is a positive sample, whereas -1 would denote a negative sample, and (v^f_g, n_max, +1) is inserted into the positive sample queue Q_p;
step 204: execute steps 202 and 203 in a loop until all paper nodes in all paper node queues of the set VF = {V_1, V_2, ..., V_f, ..., V_F} have completed the sampling of their neighbor paper nodes, yielding the positive sample queue Q_p; then step 207 is performed;
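Steps 202 to 204 thus turn every (center node, window neighbor) pair into a +1 triple. A sketch of that loop, reusing the window_neighbors helper from the previous sketch:

def build_positive_queue(vf_set, wd):
    # Collect (center, neighbor, +1) triples over every queue V_f in VF.
    q_p = []
    for queue_vf in vf_set:                        # step 204: iterate over all queues
        for g, center in enumerate(queue_vf):      # step 202: every node v^f_g
            for neighbor in window_neighbors(queue_vf, g, wd):
                q_p.append((center, neighbor, +1)) # step 203: positive triple into Q_p
    return q_p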
step 205: sample over all paper nodes in the network: each time, select any two paper nodes from the network, namely a first paper node paper_a and a second paper node paper_o; if a connecting edge exists between the two paper nodes, or the two randomly selected paper nodes are identical, this step is repeated; otherwise, the two paper nodes paper_a and paper_o form the triple (paper_a, paper_o, -1), which is stored in the negative sample queue Q_n; then step 206 is performed;
step 206: execute step 205 in a loop; establish a positive/negative sample ratio parameter μ; given that the number of triples in the positive sample queue Q_p is np, stop when the number of triples in Q_n equals μ × np, and then perform step 207;
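Steps 205 and 206 therefore draw μ × np pairs of distinct, unconnected paper nodes and label them -1. A sketch, assuming graph maps each paper node to the set of its neighbors:

import random

def build_negative_queue(graph, np_count, mu):
    # Draw mu * np_count triples (paper_a, paper_o, -1) of distinct, unconnected nodes.
    nodes = list(graph)
    q_n = []
    target = int(mu * np_count)
    while len(q_n) < target:                          # step 206: stop at the target ratio
        paper_a, paper_o = random.choice(nodes), random.choice(nodes)
        if paper_a == paper_o or paper_o in graph[paper_a]:
            continue                                  # step 205: resample identical or connected pairs
        q_n.append((paper_a, paper_o, -1))
    return q_n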
step 207: merge the positive sample queue Q_p obtained in step 204 and the negative sample queue Q_n obtained in step 206 to obtain a new sample queue Q_New = {Q_1, ..., Q_(1+μ)×np}; then go to step 208;
Q_1 represents the triple with the smallest identification number in the new sample queue Q_New;
Q_(1+μ)×np represents the triple with the largest identification number in the new sample queue Q_New; the subscript (1+μ) × np indicates that the sample queue Q_New contains (1+μ) × np triples in total;
step 208: shuffle the order of all elements in the new sample queue Q_New = {Q_1, ..., Q_(1+μ)×np} to obtain the shuffled sample queue Q_Sorting = {Q_1, ..., Q_(1+μ)×np}; then step 301 is performed;
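Steps 207 and 208 simply concatenate the two queues and randomize their order so that positive and negative triples are interleaved during training; continuing the earlier sketches (all inputs below are the hypothetical variables used there):

import random

q_p = build_positive_queue(vf_set, wd)            # positive triples from steps 202-204
q_n = build_negative_queue(graph, np_count, mu)   # negative triples from steps 205-206
q_new = q_p + q_n                                 # step 207: (1 + mu) * np triples in Q_New
random.shuffle(q_new)                             # step 208: Q_Sorting, order randomized in place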
processing in the neural network paper probability model based on a multilayer perceptron;
step 301: from the Q_Sorting = {Q_1, ..., Q_(1+μ)×np} obtained in step 208, take one triple (paper_a, paper_o, b) at a time and put the pair of paper nodes into the neural network paper probability model for learning, then execute step 302;
step 302: map the two paper nodes paper_a and paper_o of each triple through the model function f_θ to obtain the two corresponding transformed vectors f_θ(paper_a) and f_θ(paper_o), then execute step 303;
f_θ(paper_a) is the multilayer perceptron function belonging to paper_a;
f_θ(paper_o) is the multilayer perceptron function belonging to paper_o;
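Step 302 pushes each paper node's input feature vector through the shared perceptron f_θ to obtain its embedding. The claim does not fix the layer sizes or the activation, so the sketch below is only one plausible shape: the class name, dimensions, tanh activation, and dummy feature vectors are all assumptions.

import numpy as np

class PaperMLP:
    # A toy two-layer perceptron standing in for the non-linear transformation f_theta.
    def __init__(self, in_dim, hidden_dim, out_dim, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, (in_dim, hidden_dim))
        self.b1 = np.zeros(hidden_dim)
        self.w2 = rng.normal(0.0, 0.1, (hidden_dim, out_dim))
        self.b2 = np.zeros(out_dim)

    def forward(self, x):
        h = np.tanh(x @ self.w1 + self.b1)   # hidden layer with tanh non-linearity
        return h @ self.w2 + self.b2         # linear output: the transformed vector

f_theta = PaperMLP(in_dim=128, hidden_dim=64, out_dim=32)
vec_a = f_theta.forward(np.ones(128))        # f_theta(paper_a), from dummy input features
vec_o = f_theta.forward(np.zeros(128))       # f_theta(paper_o)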
step 303: calculate the Euclidean distance between the two transformed paper node vectors f_θ(paper_a) and f_θ(paper_o), and execute step 304;
E_pos represents the Euclidean distance for the positive samples, which should be the shortest; E_neg represents the Euclidean distance for the negative samples, which should be the longest; c represents the hop count;
step 304: merge the positive and negative samples, put them into the Euclidean-distance loss function for the distributed representation of papers, and compute the loss in a way that balances the positive and negative samples to obtain the overall loss function L, then execute step 305;
γ represents a harmonic parameter of the loss function, used to balance the positive and negative samples;
m represents the identification number of any triple in Q_Sorting;
in a triple (paper_a, paper_o, b), the flag b indicates whether the triple is a positive or a negative sample; positive samples are treated as points that should be close to each other in the representation space, while negative samples are treated as points that should lie as far apart in space as possible;
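The overall loss L of step 304 combines a term that pulls the positive pairs together and a term, balanced by the harmonic parameter γ, that pushes the negative pairs apart; the exact expression is given by the claim's formula and is not reproduced here. The sketch below uses a common contrastive form (squared distance for positives, a hinge with an assumed margin for negatives) purely to make the balancing role of γ concrete; f_theta can be the PaperMLP from the previous sketch.

import numpy as np

def overall_loss(triples, features, f_theta, gamma, margin=1.0):
    # triples:  list of (paper_a, paper_o, flag) with flag in {+1, -1}, as in Q_Sorting
    # features: dict mapping a paper node to its input feature vector
    # The squared-distance / hinge form and the margin are assumptions; only the
    # gamma-weighted balance of positive and negative terms mirrors step 304.
    loss = 0.0
    for paper_a, paper_o, flag in triples:
        dist = np.linalg.norm(f_theta.forward(features[paper_a]) -
                              f_theta.forward(features[paper_o]))   # Euclidean distance, step 303
        if flag == +1:
            loss += dist ** 2                             # E_pos: positive pairs drawn together
        else:
            loss += gamma * max(0.0, margin - dist) ** 2  # E_neg: negatives pushed beyond the margin
    return loss / len(triples)

In practice the minimization of step 305 would be carried out with an autograd framework over mini-batches of Q_Sorting rather than the hand-rolled forward pass above.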
step 305: determine the non-linear transformation function f_θ by a stochastic gradient descent algorithm, thereby completing the representation learning of any two paper nodes paper_a and paper_o.
2. The parameterized thesis network node representation learning method according to claim 1, characterized in that: steps 103, 104 and 105 realize the acquisition of the 2nd element of the paper node queue V.
3. The parameterized thesis network node representation learning method according to claim 1, characterized in that: steps 106 to 110 realize the acquisition of the elements following the 2nd element of the paper node queue V, until the maximum number mv of queue elements of the paper node queue V is reached.
CN201711308050.6A 2017-12-11 2017-12-11 Parameterized thesis network node representation learning method Active CN108228728B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711308050.6A CN108228728B (en) 2017-12-11 2017-12-11 Parameterized thesis network node representation learning method


Publications (2)

Publication Number Publication Date
CN108228728A CN108228728A (en) 2018-06-29
CN108228728B true CN108228728B (en) 2020-07-17

Family

ID=62653503

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711308050.6A Active CN108228728B (en) 2017-12-11 2017-12-11 Parameterized thesis network node representation learning method

Country Status (1)

Country Link
CN (1) CN108228728B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109213831A (en) * 2018-08-14 2019-01-15 阿里巴巴集团控股有限公司 Event detecting method and device calculate equipment and storage medium
CN109376864A (en) * 2018-09-06 2019-02-22 电子科技大学 A kind of knowledge mapping relation inference algorithm based on stacking neural network
CN109558494A (en) * 2018-10-29 2019-04-02 中国科学院计算机网络信息中心 A kind of scholar's name disambiguation method based on heterogeneous network insertion
CN110322021B (en) * 2019-06-14 2021-03-30 清华大学 Hyper-parameter optimization method and device for large-scale network representation learning
CN112559734B (en) * 2019-09-26 2023-10-17 中国科学技术信息研究所 Brief report generating method, brief report generating device, electronic equipment and computer readable storage medium
CN111292062B (en) * 2020-02-10 2023-04-25 中南大学 Network embedding-based crowd-sourced garbage worker detection method, system and storage medium
CN112148876B (en) * 2020-09-23 2023-10-13 南京大学 Paper classification and recommendation method
CN117648670B (en) * 2024-01-24 2024-04-12 润泰救援装备科技河北有限公司 Rescue data fusion method, electronic equipment, storage medium and rescue fire truck

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106250438A (en) * 2016-07-26 2016-12-21 上海交通大学 Based on random walk model zero quotes article recommends method and system
CN106777339A (en) * 2017-01-13 2017-05-31 深圳市唯特视科技有限公司 A kind of method that author is recognized based on heterogeneous network incorporation model
CN107451596A (en) * 2016-05-30 2017-12-08 清华大学 A kind of classified nodes method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8918431B2 (en) * 2011-09-09 2014-12-23 Sri International Adaptive ontology


Also Published As

Publication number Publication date
CN108228728A (en) 2018-06-29

Similar Documents

Publication Publication Date Title
CN108228728B (en) Parameterized thesis network node representation learning method
US11544535B2 (en) Graph convolutional networks with motif-based attention
Liu et al. Principled multilayer network embedding
Suthaharan et al. Decision tree learning
Tran et al. On filter size in graph convolutional networks
CN112508085B (en) Social network link prediction method based on perceptual neural network
Kundu et al. Fuzzy-rough community in social networks
CN110147911B (en) Social influence prediction model and prediction method based on content perception
Amin A novel classification model for cotton yarn quality based on trained neural network using genetic algorithm
Venturelli et al. A Kriging-assisted multiobjective evolutionary algorithm
Nasiri et al. A node representation learning approach for link prediction in social networks using game theory and K-core decomposition
Kepner et al. Mathematics of Big Data
US11669727B2 (en) Information processing device, neural network design method, and recording medium
CN112905906B (en) Recommendation method and system fusing local collaboration and feature intersection
Jenny Li et al. Evaluating deep learning biases based on grey-box testing results
Coscia et al. The node vector distance problem in complex networks
Lokhande et al. Accelerating column generation via flexible dual optimal inequalities with application to entity resolution
CN109697511B (en) Data reasoning method and device and computer equipment
CN113159976B (en) Identification method for important users of microblog network
Kim et al. Network analysis for active and passive propagation models
Jayachitra Devi et al. Link prediction model based on geodesic distance measure using various machine learning classification models
Javaheripi et al. Swann: Small-world architecture for fast convergence of neural networks
Van Tran et al. On filter size in graph convolutional networks
Montiel et al. Reducing the size of combinatorial optimization problems using the operator vaccine by fuzzy selector with adaptive heuristics
Ferdaus et al. A genetic algorithm approach using improved fitness function for classification rule mining

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant