CN113807543A - Network embedding algorithm and system based on direction perception - Google Patents

Info

Publication number: CN113807543A
Authority: CN (China)
Prior art keywords: embedding, network, directed, node, nodes
Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Application number: CN202110983059.7A
Other languages: Chinese (zh)
Other versions: CN113807543B (en)
Inventors: 周晟, 刘劭荣, 卜佳俊
Current Assignee: Zhejiang University ZJU (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Original Assignee: Zhejiang University ZJU
Application filed by: Zhejiang University ZJU
Priority to: CN202110983059.7A
Publication of CN113807543A
Application granted
Publication of CN113807543B
Current legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Physics & Mathematics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A directed network embedding algorithm based on direction perception, comprising: S1, calculating asymmetric proximity, specifically: defining the single-step probability of a random walk strategy in a directed network, storing the direction and proximity information of each step of the random walk in a weight, and calculating scores between nodes; S2, establishing a directed network embedding, specifically: after the asymmetric proximity between the nodes has been calculated, establishing a qualitative directed network embedding, DNE-L, which preserves the discrete asymmetric proximity between the nodes in the embedding; establishing a quantitative directed network embedding, DNE-T, which preserves the continuous asymmetric proximity between the nodes in the embedding; and optimizing the model. The invention also includes a system for implementing the directed network embedding algorithm based on direction perception. The invention offers better interpretability for practical problems in real networks and effectively preserves both discrete and continuous asymmetric proximity in the embedding space.

Description

Network embedding algorithm and system based on direction perception
Technical Field
The invention relates to machine learning, and in particular to a directed network embedding algorithm and system based on direction perception.
Background
The purpose of a network embedding algorithm is to embed the nodes of an existing network into a low-dimensional vector space in order to better capture the semantic relationships between the nodes. Existing network embedding algorithms focus primarily on undirected networks and preserve similarity through deterministic metrics or random walks. For a directed network, the usual workaround is to ignore the direction of the edges and apply an undirected network embedding algorithm to the transformed network. However, this discards information and is likely to produce incorrect embeddings.
Because edges in real networks often carry direction, directed network embedding algorithms have received attention. Directed edges represent asymmetric proximity between nodes; this latent asymmetric proximity is a key characteristic of a directed network and needs to be preserved by the network embedding algorithm. Although some existing methods attempt to preserve asymmetric proximity in a directed graph, the meaning of the asymmetric proximity they capture is ambiguous. Obtaining asymmetric proximity, preserving it efficiently in the embedding space, and making it useful on real networks therefore remain significant challenges.
Disclosure of Invention
The present invention provides a directed network embedding algorithm based on direction perception, and a system thereof.
The invention aims to capture asymmetric proximity in a directed network, preserve it effectively in the embedding space, and achieve better results in link prediction and node classification tasks on real networks.
To achieve this purpose, the invention adopts the following technical scheme: a directed network embedding algorithm based on direction perception, comprising:
S1: calculating asymmetric proximity;
S1a: defining the single-step probability of the random walk strategy in the directed network, wherein the single-step probability formula is as follows:
$$P\left(w_{v_i}^{k+1} = v_b \mid w_{v_i}^{k} = v_a\right) = \begin{cases} \dfrac{1}{d_a^{in} + d_a^{out}}, & E_{ab} = 1 \text{ or } E_{ba} = 1 \\ 0, & \text{otherwise} \end{cases}$$
wherein P represents the single-step probability of the random walk, $w_{v_i}^{k}$ represents the k-th step of the random walk starting from $v_i$, $d_a^{in}$ represents the number of in-neighbors of node a, $d_a^{out}$ represents the number of out-neighbors of node a, and $E_{ab} = 1$ means that there is a directed edge from a to b;
S1b: storing the direction and proximity information of each step of the random walk in a weight, wherein the single-step weight formula is as follows:
$$r_{i,i+1} = \begin{cases} 1, & E_{v_i v_{i+1}} = 1 \text{ and } E_{v_{i+1} v_i} = 0 \\ -1, & E_{v_i v_{i+1}} = 0 \text{ and } E_{v_{i+1} v_i} = 1 \\ 0, & E_{v_i v_{i+1}} = 1 \text{ and } E_{v_{i+1} v_i} = 1 \end{cases}$$
wherein $r_{i,i+1} = 1$ denotes a step along the edge direction, $r_{i,i+1} = -1$ denotes a step against the edge direction, and $r_{i,i+1} = 0$ denotes that directed edges exist in both directions between nodes $v_i$ and $v_{i+1}$;
S1c: calculating scores between the nodes to express the asymmetric proximity between them, wherein the formula is as follows:
$$s_{i,i+k} = \frac{1}{k} \sum_{j=i}^{i+k-1} r_{j,j+1}$$
wherein $r_{j,j+1}$ is the weight of step j, and the factor 1/k normalizes for the number of steps.
S2: establishing directed network embedding;
S21: after the asymmetric proximity between the nodes has been calculated, establishing a qualitative directed network embedding, DNE-L, which preserves the discrete asymmetric proximity between the nodes in the embedding:
S21a: defining the probability of observing the directed context, i.e. the probability of observing node v in the directed context of node u given the asymmetric proximity $s_{u,v}$. Different probability formulas are selected according to the direction between the nodes:
Figure BDA0003229842410000027
Figure BDA0003229842410000028
Figure BDA0003229842410000029
wherein $h^s$ is the source embedding and $h^t$ is the target embedding. The probability of an observation is computed from the dot product between the source embedding of node u and the target embedding of node v. When $s_{u,v} = 0$, node u and node v tend to form a bidirectional edge, so the probability is the sum of the probabilities obtained from both directions.
S21b: preserving asymmetric proximity in the network embedding by maximizing the probability of observing the directed context nodes:
Figure BDA00032298424100000210
wherein $DC_u$ is the directed context of node u, $s_{u,v}$ is the score computed by the random walk strategy of S1, and $P(v \mid u, s_{u,v})$ is the probability of observing node v in the directed context of node u given the score $s_{u,v}$.
S22: after the asymmetric proximity between the nodes has been calculated, establishing a quantitative directed network embedding, DNE-T, which preserves the continuous asymmetric proximity between the nodes in the embedding:
S22a: defining a weight conversion formula that maps the asymmetric proximity score calculated in step S1 to a new weight through a weighting function:
Figure BDA0003229842410000031
wherein $s_{u,v}$ is the sum of the scores computed by InfoWalk above, and b is an offset that ensures the weight is positive.
S22b: defining the quantitative directed network embedding model, and learning the source embedding and the target embedding through weighted Skip-Gram optimization:
Figure BDA0003229842410000032
wherein $h^s$ is the source embedding, $h^t$ is the target embedding, and $\pi_{u,v}$ is the weight obtained by converting the score in the quantitative directed network.
S23: optimizing the model: negative sampling and stochastic gradient descent are adopted to improve training efficiency:
Figure BDA0003229842410000033
Figure BDA0003229842410000034
where $\sigma$ denotes the activation function, $h_u^s$ denotes the source embedding of node u, $h_v^t$ denotes the target embedding of node v, and $\pi_{u,v}$ denotes the weight between nodes u and v.
Preferably, in S22a, the weighting function should satisfy the following requirements: (1) $\pi_0 > 0$; (2)
Figure BDA0003229842410000037
Figure BDA0003229842410000038
$\pi_m > \pi_n$; (3)
Figure BDA0003229842410000039
where
Figure BDA00032298424100000310
denotes the value of the weighting function for walk length i and asymmetric proximity score m.
Further, the random walk strategy InfoWalk effectively captures the hierarchical structure of the directed network and the asymmetric proximity between its nodes, producing a weighted node sequence that represents the asymmetric proximity between nodes and is used for directed embedding learning; the qualitative directed network embedding DNE-L and the quantitative directed network embedding DNE-T effectively preserve this asymmetric proximity in the embedding space, achieving excellent results on real-world benchmark datasets.
The system for implementing the direction-perception-based directed network embedding algorithm of the invention comprises a memory, a processor, and a program stored in the memory and executed on the processor, characterized in that the program comprises an asymmetric proximity calculation module and a directed network embedding establishment module connected in sequence.
The invention has the following advantages: 1) a new information random walk strategy is provided that effectively captures asymmetric proximity in the directed network structure, giving a better explanation of practical problems in real networks; 2) a directed network embedding method with two variants, the qualitative DNE-L and the quantitative DNE-T, is provided, which effectively preserves discrete and continuous asymmetric proximity in the embedding space.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1a to Fig. 1f are schematic diagrams of the InfoWalk information walk strategy on a directed network according to an embodiment of the present invention, where Fig. 1a shows a backward step followed by a forward step among three nodes, Fig. 1b shows a forward step followed by a backward step among three nodes, Fig. 1c shows a backward step followed by two forward steps among four nodes, Fig. 1d shows alternating forward and backward steps among four nodes, Fig. 1e shows forward walking in a directed ring graph, and Fig. 1f shows backward walking in a directed ring graph;
Fig. 2 is an overall framework diagram of the directed network embedding method according to an embodiment of the present invention;
Fig. 3a and Fig. 3b compare the scores of the invention with those of existing algorithms in the user profiling experiment of the user recommendation scenario, according to an embodiment of the present invention, where Fig. 3a compares the Micro-F1 scores of the different algorithms and Fig. 3b compares the Macro-F1 scores.
Detailed Description
The technical solutions in the embodiments of the present invention are clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments of the present invention without making any creative effort, shall fall within the protection scope of the present invention.
The invention provides a directed network embedding algorithm based on direction perception, in which: (1) a new information random walk strategy effectively captures the asymmetric proximity between nodes of a directed network and applies well to practical problems in real networks; (2) a directed network embedding method with two variants, the qualitative DNE-L and the quantitative DNE-T, preserves discrete and continuous asymmetric proximity in the latent embedding space.
The core method proposed in the present invention is explained in detail below.
S1: calculating asymmetric proximity;
The invention provides an information random walk strategy, InfoWalk, for calculating the asymmetric proximity between nodes. The basic idea of InfoWalk is to ignore the direction of the edges at first, so that the random walk can reach nodes from either direction. At each step of the random walk, the direction and the asymmetric proximity are stored in a specially designed weight. After the random walk reaches the specified length, InfoWalk yields a step-weighted node sequence that represents the asymmetric proximity between nodes and can be used for directed embedding learning.
S1a: defining the single-step probability of the random walk strategy in the directed network:
Given a directed network G, a random walk starting from node $v_i$ can be represented as $w_{v_i} = v_i \to v_j \to \dots \to v_k$, the sequence of nodes visited so far, where $w_{v_i}^{k}$ denotes the node visited by the walk $w_{v_i}$ at the k-th step. Suppose that at the k-th step the random walk has reached node $v_a$, i.e. $w_{v_i}^{k} = v_a$. At step k+1, the random walk moves uniformly to one of the in-neighbors or out-neighbors of $v_a$:
$$P\left(w_{v_i}^{k+1} = v_b \mid w_{v_i}^{k} = v_a\right) = \begin{cases} \dfrac{1}{d_a^{in} + d_a^{out}}, & E_{ab} = 1 \text{ or } E_{ba} = 1 \\ 0, & \text{otherwise} \end{cases}$$
where P represents the single-step probability of the random walk, $w_{v_i}^{k}$ represents the k-th step of the random walk starting from $v_i$, $d_a^{in}$ represents the number of in-neighbors of node a, $d_a^{out}$ represents the number of out-neighbors of node a, and $E_{ab} = 1$ means that there is a directed edge from a to b.
This random walk can be viewed as a walk on the undirected network obtained by ignoring the edge directions of the directed graph G. Such a walk can reach nodes to which no directed path exists in the directed network, and thereby captures asymmetric proximity.
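The direction-ignoring walk described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `neighbor_candidates` and `info_walk`, the edge-list representation, and the toy graph are all hypothetical, and a neighbor connected by edges in both directions is counted once as an out-neighbor and once as an in-neighbor.

```python
import random


def neighbor_candidates(edges, node):
    """Out-neighbors followed by in-neighbors of `node`, edge direction ignored.

    Assumption: a neighbor joined by edges in both directions appears twice.
    """
    out_neighbors = [b for (a, b) in edges if a == node]
    in_neighbors = [a for (a, b) in edges if b == node]
    return out_neighbors + in_neighbors


def info_walk(edges, start, length, rng):
    """Walk up to `length` steps from `start`, choosing uniformly among candidates."""
    walk = [start]
    for _ in range(length):
        candidates = neighbor_candidates(edges, walk[-1])
        if not candidates:  # isolated node: nothing to step to
            break
        walk.append(rng.choice(candidates))
    return walk


# Toy directed graph (hypothetical): a -> b and c -> a.
edges = [("a", "b"), ("c", "a")]
walk = info_walk(edges, "a", 5, random.Random(0))
```

In the toy graph, node c has no directed path from a, yet the walk can still reach it by stepping against the edge c -> a, which is the behavior the paragraph above describes.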
S1b: storing the direction and proximity information of each step of the random walk in a weight:
To capture the direction and proximity between nodes, the invention introduces a direction-aware step weight $r_{i,i+1}$ for each step from $v_i$ to $v_{i+1}$ according to the following rule:
$$r_{i,i+1} = \begin{cases} 1, & E_{v_i v_{i+1}} = 1 \text{ and } E_{v_{i+1} v_i} = 0 \\ -1, & E_{v_i v_{i+1}} = 0 \text{ and } E_{v_{i+1} v_i} = 1 \\ 0, & E_{v_i v_{i+1}} = 1 \text{ and } E_{v_{i+1} v_i} = 1 \end{cases}$$
where $r_{i,i+1} = 1$ denotes a step along the edge direction, $r_{i,i+1} = -1$ denotes a step against the edge direction, and $r_{i,i+1} = 0$ denotes that directed edges exist in both directions between nodes $v_i$ and $v_{i+1}$.
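The three-way rule for the step weight can be written as a small Python function. The sketch below assumes the directed edges are stored as a set of `(source, target)` pairs; the name `step_weight` and the toy edge set are illustrative only.

```python
def step_weight(edges, a, b):
    """Direction-aware weight of a walk step from node `a` to node `b`.

    Returns 1 for a step along an edge, -1 for a step against an edge,
    and 0 when edges exist in both directions.
    """
    forward = (a, b) in edges
    backward = (b, a) in edges
    if forward and backward:
        return 0
    if forward:
        return 1
    if backward:
        return -1
    raise ValueError("nodes are not connected in either direction")


# Toy edge set (hypothetical): a -> b, plus a bidirectional edge between b and c.
edges = {("a", "b"), ("b", "c"), ("c", "b")}
weights = [step_weight(edges, "a", "b"),
           step_weight(edges, "b", "a"),
           step_weight(edges, "b", "c")]
```

Raising an error for unconnected pairs is a design choice of this sketch; a walk produced by the strategy of S1a never takes such a step.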
S1c: calculating scores between the nodes to represent the asymmetric proximity between them:
Given the weight $r_{i,i+1}$ of each step, the result of InfoWalk can be represented as a sequence of nodes with weighted edges:
$$v_i \xrightarrow{r_{i,i+1}} v_{i+1} \xrightarrow{r_{i+1,i+2}} v_{i+2} \cdots$$
Based on this sequence of nodes with weighted edges, the invention defines the score $s_{i,i+k}$ between nodes $v_i$ and $v_{i+k}$ in the sequence as the normalized sum of the step weights between them:
$$s_{i,i+k} = \frac{1}{k} \sum_{j=i}^{i+k-1} r_{j,j+1}$$
where $r_{j,j+1}$ is the weight of step j, and the factor 1/k normalizes for the number of steps.
Fig. 1 is a schematic diagram of the InfoWalk information walk strategy on a directed network according to an embodiment of the present invention, where solid arrows indicate steps along the edge direction and dashed arrows indicate steps against the edge direction. $s_{i,j} > 0$ indicates that node $v_i$ tends to have a directed edge pointing to $v_j$; $s_{i,j} < 0$ indicates that node $v_j$ tends to have a directed edge pointing to $v_i$; $s_{i,j} = 0$ indicates that a bidirectional edge tends to be observed between $v_i$ and $v_j$.
InfoWalk captures asymmetric proximity easily because it ignores the direction of the edges in the directed network, so nodes with higher in-degree and out-degree are visited more frequently. These nodes are therefore also more likely to appear in the context windows of other nodes.
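As a worked illustration of the score in S1c, the following Python sketch averages the step weights between two positions of a walk. It assumes `step_weights[j]` holds the weight of the step from position j to position j+1; the function name and the sample weights are hypothetical.

```python
def proximity_score(step_weights, i, k):
    """Score between walk positions i and i+k: mean of the k step weights between them."""
    if k <= 0 or i < 0 or i + k > len(step_weights):
        raise ValueError("positions i and i+k must both lie on the walk")
    return sum(step_weights[i:i + k]) / k


# Hypothetical walk of five nodes: two forward steps, one backward step,
# then one step over a bidirectional edge.
step_weights = [1, 1, -1, 0]

two_forward = proximity_score(step_weights, 0, 2)    # (1 + 1) / 2
mixed = proximity_score(step_weights, 0, 3)          # (1 + 1 - 1) / 3
backward_then_both = proximity_score(step_weights, 2, 2)  # (-1 + 0) / 2
```

A positive result suggests a directed relation from the earlier node to the later one, a negative result the reverse, matching the sign interpretation given above.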
S2: establishing directed network embedding;
The invention proposes two variants of directed network embedding: the qualitative directed network embedding (DNE-L) and the quantitative directed network embedding (DNE-T). For each variant, two independent embeddings, referred to as the source embedding and the target embedding, are learned to preserve asymmetric proximity. The two variants differ in how they preserve asymmetric proximity. FIG. 2 shows the basic architecture of DNE-L and DNE-T, where DNE-L preserves discrete asymmetric proximity and DNE-T preserves continuous asymmetric proximity. DNE defines the directed context according to the score $s_{u,v}$ and preserves the directed relations between nodes through the source embedding and the target embedding of each node.
The directed context is the result of the information random walk on the directed network G and is divided into a source context, a target context, and an ambiguous context. The source context consists of the nodes reached by the walk from which a directed edge to the current node is likely; the target context consists of the nodes reached by the walk to which a directed edge from the current node is likely; the ambiguous context consists of the nodes reached by the walk whose direction relative to the current node is not explicit.
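One way to picture how a weighted walk yields the three kinds of context is to group the node pairs inside a window by the sign of their score. The Python sketch below is an assumption-laden illustration: the function name, the choice to pair each node only with later nodes in the walk, and the neutral group names `forward`, `backward` and `ambiguous` (rather than a fixed mapping onto source and target context) are all hypothetical.

```python
def directed_contexts(walk, step_weights, window):
    """Group (center, context, score) triples by the sign of the score.

    `step_weights[j]` is the weight of the step from walk[j] to walk[j + 1].
    Only context nodes that appear after the center node are considered.
    """
    contexts = {"forward": [], "backward": [], "ambiguous": []}
    for i, center in enumerate(walk):
        last = min(i + window, len(walk) - 1)
        for j in range(i + 1, last + 1):
            score = sum(step_weights[i:j]) / (j - i)
            if score > 0:
                key = "forward"
            elif score < 0:
                key = "backward"
            else:
                key = "ambiguous"
            contexts[key].append((center, walk[j], score))
    return contexts


# Hypothetical walk a -> b (along an edge) -> c (against an edge).
contexts = directed_contexts(["a", "b", "c"], [1, -1], window=2)
```

Here the pair (a, c) lands in the ambiguous group because one forward step and one backward step cancel out, which gives no explicit direction between the two nodes.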
S21: after the asymmetric proximity between the nodes has been calculated, establishing a qualitative directed network embedding, DNE-L, which preserves the discrete asymmetric proximity between the nodes in the embedding:
S21a: defining the probability of observing the directed context, i.e. the probability of observing node v in the directed context of node u given the asymmetric proximity $s_{u,v}$. Different probability formulas are selected according to the direction between the nodes:
Figure BDA0003229842410000071
Figure BDA0003229842410000072
Figure BDA0003229842410000073
where $h^s$ is the source embedding and $h^t$ is the target embedding. The probability of an observation is computed from the dot product between the source embedding of node u and the target embedding of node v. When $s_{u,v} = 0$, node u and node v tend to form a bidirectional edge, so the probability is the sum of the probabilities obtained from both directions.
S21b: preserving asymmetric proximity in the network embedding by maximizing the probability of observing the directed context nodes:
Figure BDA0003229842410000074
where $DC_u$ is the directed context of node u, $s_{u,v}$ is the score computed by the random walk strategy of S1, and $P(v \mid u, s_{u,v})$ is the probability of observing node v in the directed context of node u given the score $s_{u,v}$.
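The case distinction of S21a can be sketched in Python as follows. The exact formulas are given only as images in the source, so this sketch makes several assumptions that should be read as such: probabilities are normalized with a softmax over all nodes, the negative-score case swaps the roles of the source and target embeddings, and the names `softmax_prob` and `dne_l_prob` and the toy embeddings are hypothetical.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def softmax_prob(query, keys, target):
    """Softmax probability of `target` among `keys`, scored by dot product with `query`."""
    logits = {node: dot(query, vec) for node, vec in keys.items()}
    top = max(logits.values())  # subtract the maximum for numerical stability
    norm = sum(math.exp(value - top) for value in logits.values())
    return math.exp(logits[target] - top) / norm


def dne_l_prob(u, v, score, src, tgt):
    """Direction-dependent probability of observing v in the context of u (assumed form)."""
    forward = softmax_prob(src[u], tgt, v)   # u acts as source, v as target
    backward = softmax_prob(tgt[u], src, v)  # u acts as target, v as source
    if score > 0:
        return forward
    if score < 0:
        return backward
    return forward + backward  # score == 0: sum over both directions


# Hypothetical two-dimensional embeddings for two nodes.
src = {"u": [1.0, 0.0], "v": [0.0, 1.0]}
tgt = {"u": [0.0, 1.0], "v": [1.0, 0.0]}
p_forward = dne_l_prob("u", "v", 1.0, src, tgt)
p_both = dne_l_prob("u", "v", 0.0, src, tgt)
```

Under this assumed form the zero-score value is a sum of two probabilities and can exceed 1, so it is better read as an unnormalized likelihood term inside the training objective than as a probability in its own right.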
S22: after the asymmetric proximity between the nodes has been calculated, establishing a quantitative directed network embedding, DNE-T, which preserves the continuous asymmetric proximity between the nodes in the embedding:
S22a: since the context nodes of the directed graph are visited by InfoWalk with different probabilities relative to the central node, it is reasonable to weight each context node according to its relative score. However, using the score directly as the weight harms accuracy, because 1) a context node with score $s_{u,v} = 0$ would receive a weight of 0 rather than a positive weight; 2) context nodes with score $s_{u,v} = 0$ would receive the same weight even when the random walk lengths differ.
To solve these problems, the quantitative directed network embedding redefines the weighting of $s_{u,v}$; the weighting function must satisfy the following requirements: (1) $\pi_0 > 0$; (2)
Figure BDA0003229842410000075
$\pi_m > \pi_n$; (3)
Figure BDA0003229842410000076
Figure BDA0003229842410000077
A weight conversion formula is defined, which maps the asymmetric proximity score calculated in step S1 to a new weight through the weighting function:
Figure BDA0003229842410000078
where $s_{u,v}$ is the sum of the scores computed by InfoWalk above, and b is an offset that ensures the weight is positive. This conversion ensures that the weights have the following properties: (1) nodes with higher scores receive larger weights; (2) nodes at longer distances in the random walk receive smaller weights.
S22b: defining the quantitative directed network embedding model, and learning the source embedding and the target embedding through weighted Skip-Gram optimization:
Figure BDA0003229842410000081
where $h^s$ is the source embedding, $h^t$ is the target embedding, and $\pi_{u,v}$ is the weight obtained by converting the score in the quantitative directed network.
S23: optimizing the model: negative sampling and stochastic gradient descent are adopted to improve training efficiency, and the formulas are as follows:
Figure BDA0003229842410000082
Figure BDA0003229842410000083
where $\sigma$ denotes the activation function, $h_u^s$ denotes the source embedding of node u, $h_v^t$ denotes the target embedding of node v, and $\pi_{u,v}$ denotes the weight between nodes u and v.
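A weighted negative-sampling objective of the kind named in S23 can be sketched as below. Because the patent's formulas are only available as images, the concrete form here is an assumption: the activation is taken to be the logistic sigmoid, the weight multiplies both the positive and the negative terms, and the function names are hypothetical.

```python
import math


def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def weighted_ns_loss(h_src_u, h_tgt_v, h_tgt_negatives, weight):
    """Weighted negative-sampling loss for one (u, v) pair (assumed form).

    Pulls the source embedding of u towards the target embedding of v and
    pushes it away from the target embeddings of the sampled negatives.
    """
    positive = math.log(sigmoid(dot(h_src_u, h_tgt_v)))
    negative = sum(math.log(sigmoid(-dot(h_src_u, h_neg)))
                   for h_neg in h_tgt_negatives)
    return -weight * (positive + negative)


# With all-zero embeddings every sigmoid equals 0.5.
zero = [0.0, 0.0]
loss_zero = weighted_ns_loss(zero, zero, [zero, zero], weight=2.0)

# An aligned positive pair yields a lower loss than an opposed one.
loss_aligned = weighted_ns_loss([1.0, 0.0], [1.0, 0.0], [], weight=1.0)
loss_opposed = weighted_ns_loss([1.0, 0.0], [-1.0, 0.0], [], weight=1.0)
```

In a training loop this scalar would be minimized with stochastic gradient descent over mini-batches of (u, v) pairs; for DNE-L the weight could simply be set to 1.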
The system for implementing the direction-perception-based directed network embedding algorithm comprises a memory, a processor, and a program stored in the memory and executed on the processor, wherein the program comprises an asymmetric proximity calculation module and a directed network embedding establishment module connected in sequence. The asymmetric proximity calculation module performs step S1 of the method of the invention, and the directed network embedding establishment module performs step S2.
To illustrate a specific application of the invention more clearly, this embodiment takes user recommendation on a microblog platform as an example and describes the implementation process in detail.
The specific scenario of this embodiment is recommending users of interest for microblog users to follow.
A method for recommending users of interest to microblog users comprises the following steps:
Step one: the technician collects the follow relationships between users and builds a directed user relationship network. Each node of the directed network represents an individual user, and each directed edge represents a follow action: the node at the tail of the edge is the follower, and the node at the head of the edge is the followed user.
Step two: after the directed network has been built, the technician obtains the asymmetric proximity between the nodes, i.e. between the users, by using the random walk strategy proposed in step S1.
Step three: the technician can choose either the qualitative directed network embedding DNE-L of step S21 or the quantitative directed network embedding DNE-T of step S22 to preserve the asymmetric proximity between users in the network embedding. In this process, the technician uses the negative sampling and stochastic gradient descent strategy of step S23 to improve training efficiency and optimize the network model.
Step four: after the directed network embedding model has been learned, each user is represented by the learned embedding for the downstream task, namely user matching. The technician computes the similarity between the user representations, and users with similar representations are grouped together for user recommendation.
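Step four can be illustrated with a short Python sketch that ranks candidate users for one user. It assumes the similarity is the inner product between the user's source embedding and each candidate's target embedding, which is one plausible reading of the inner-product proximity mentioned in the experiments; the function name `recommend` and the toy embeddings are hypothetical.

```python
def dot(x, y):
    return sum(a * b for a, b in zip(x, y))


def recommend(user, src, tgt, already_followed, k):
    """Return the k candidates with the highest source-to-target inner product."""
    scored = []
    for candidate, vector in tgt.items():
        if candidate == user or candidate in already_followed:
            continue  # skip the user and accounts they already follow
        scored.append((dot(src[user], vector), candidate))
    # Highest score first; ties broken by candidate name for determinism.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [candidate for _, candidate in scored[:k]]


# Hypothetical learned embeddings for a four-user network.
src = {"a": [1.0, 0.0]}
tgt = {"a": [0.0, 0.0], "b": [0.9, 0.0], "c": [0.1, 1.0], "d": [0.5, 0.0]}
suggestions = recommend("a", src, tgt, already_followed={"d"}, k=2)
```

Using the source embedding for the querying user and the target embedding for candidates keeps the ranking asymmetric, so a high score for (a, b) does not imply a high score for (b, a).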
The scheme provided by the embodiment of the invention has the following main beneficial effects: 1. asymmetric proximity is captured effectively in real directed networks; 2. the discrete or continuous embeddings preserved by the DNE method perform better than existing embedding methods in tasks such as link prediction and node classification. The experiments below illustrate these effects.
First, experimental data.
Extensive experiments were performed on several real social network datasets and on citation networks with a label on each node. The social networks with directed edges are used to evaluate user recommendation, and the citation networks are used for user profiling. Because it is difficult to collect large-scale real social networks with ground-truth labels, the experiments adopt citation networks with directed edges. Table 1 shows the statistics of the datasets.
Dataset    #Nodes    #Edges      #Labels  %Dangling nodes  %Bi-directional edges
Wiki       7,115     103,689     -        0.141            0.0565
Epinions   75,879    508,837     -        0.204            0.4052
Slashdot   77,360    905,468     -        0.271            0.8783
Twitter    90,908    443,399     -        0.087            0.6066
LastFM     136,409   1,685,524   -        0.439            0.0009
Pubmed     19,717    44,338      3        0.803            0.0001
Cocit      44,034    195,361     15       0.451            0.0001
Table 1. Statistics of the datasets
Second, experimental conclusions.
1. By preserving the proximity between nodes, the DNE method achieves better results on most network datasets.
In the experiments, the method of the invention was compared with several state-of-the-art directed network embedding methods and user recommendation methods to evaluate the proposed DNE. No comparison was made with social-network-based user recommendation methods, because the experiments focus on evaluating how well the embeddings of users/nodes in the directed graph are learned.
Among the baseline methods, Node2Vec, DeepWalk, APP and NERD are all random-walk-based methods; for a fair comparison, the experiments set their random walk parameters to the same values as the DNE method of the invention, specifically: random walk length l = 10, window size k = 4, and number of walks per node r = 10. For Node2Vec, the probability of breadth-first sampling is set to 0.25 and the probability of depth-first sampling is set to 0.5. The inner product of the embedding vectors is used to estimate the proximity between nodes. APP, ATP, NERD and HOPE preserve asymmetric proximity by learning two independent embeddings, a source embedding and a target embedding. For the node classification task, both kinds of embedding are tested and the best result is reported. LINE learns two embeddings per node, namely a context embedding and a node embedding. The DNE method of the invention is implemented with PyTorch and TensorFlow; the model parameters are randomly initialized with Xavier initialization and optimized with the Adam optimizer, with the learning rate set to 0.0005 and the batch size set to 512. The embedding dimension of all methods is 128.
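For reference, the settings reported above can be gathered into a single Python configuration. The values are taken from the text; the dictionary name and key names are hypothetical and do not correspond to any published code.

```python
# Experimental settings as reported in the text; key names are illustrative only.
DNE_EXPERIMENT_CONFIG = {
    "walk_length": 10,        # random walk length l
    "window_size": 4,         # context window size k
    "walks_per_node": 10,     # number of walks r started from each node
    "embedding_dim": 128,     # embedding dimension used for all methods
    "initializer": "xavier",  # random Xavier initialization
    "optimizer": "adam",
    "learning_rate": 0.0005,
    "batch_size": 512,
}

# Number of walk steps generated per node under these settings.
steps_per_node = (DNE_EXPERIMENT_CONFIG["walk_length"]
                  * DNE_EXPERIMENT_CONFIG["walks_per_node"])
```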
Table 2 shows the results of general user recommendation on five real-world social network datasets. NA indicates that a method could not be run on the hardware because of memory limitations or a run time exceeding one week, and
Figure BDA0003229842410000102
indicates that the result is significant under a paired difference test at p < 0.05.
Figure BDA0003229842410000101
Figure BDA0003229842410000111
Table 2. Performance comparison of the invention and existing algorithms on general user recommendation
Table 2 shows that both variants of the proposed DNE method achieve better results than existing methods on most network datasets in terms of preserving asymmetric proximity, which demonstrates the effectiveness of the invention in capturing asymmetric proximity in directed social networks.
2. The DNE method better preserves the direction between nodes in the user recommendation scenario.
The experiment further evaluates direction-aware user recommendation, to simulate real-world scenarios in which the direction of a recommendation matters. The general user recommendation task only predicts whether an edge exists and cannot guarantee that its direction is predicted well. For example, if there is a directed edge from $v_i$ to $v_j$ but no edge from $v_j$ to $v_i$, a method that predicts an edge in both directions would still score well, because the pair $(v_i, v_j)$ is sampled as a positive while the pair $(v_j, v_i)$ is not sampled as a negative. Following the experimental setup of existing methods, the experiment therefore also tests the methods on the direction-aware user recommendation task. 30% of the links are randomly sampled from the original network as positive links; the negative links consist of randomly sampled edges that do not exist in the original network and the reverses of positive edges that do not themselves exist in the original network. Table 3 shows the results of direction-aware user recommendation and classic user recommendation on the real datasets.
Figure BDA0003229842410000112
Figure BDA0003229842410000121
Table 3. Performance comparison of the present invention and existing algorithms on direction-aware user recommendation
From Table 3 it can be seen that, among all evaluated methods, DNE-L and DNE-T of the present invention achieve the best results on all datasets and improve significantly over the existing methods. Comparing Tables 2 and 3, all methods perform worse on direction-aware user recommendation, which illustrates the necessity of considering the direction and asymmetric proximity of edges. Comparing DNE-L and DNE-T, an improvement can be observed on both tasks, and it is more pronounced in direction-aware user recommendation than in classic user recommendation. This further indicates the importance of taking direction into account when predicting directed links between nodes.
3. The method achieves better results in user profiling.
User profiling is another important task in user modeling, especially in directed social networks. The goal of user profiling is to find the user group to which a user belongs, which is the same as the classic node classification task. In the experiment, 30% of the labeled nodes are randomly sampled for training and the remaining nodes are used for testing. The learned embeddings are input into the same SVM classifier, and the results are evaluated using Micro-F1 and Macro-F1 scores. For methods that learn two independent embeddings for each node, the two embeddings are concatenated for evaluation. The evaluation results are shown in Fig. 3.
As can be seen from Fig. 3, the basic observations are similar to those of the user recommendation task, and the DNE method outperforms the existing methods on both evaluation metrics.
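For reference, the two evaluation metrics and the concatenation of source and target embeddings can be sketched in plain Python as follows. The helper names `concat_embeddings` and `f1_scores` are hypothetical, the classifier itself is omitted, and the sketch assumes single-label classification.

```python
from collections import Counter

def concat_embeddings(h_s, h_t, node):
    """Feature vector of a node: its source embedding followed by its target embedding."""
    return list(h_s[node]) + list(h_t[node])

def f1_scores(y_true, y_pred):
    """Return (Micro-F1, Macro-F1) for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # predicted class gains a false positive
            fn[t] += 1   # true class gains a false negative

    def f1(tp_, fp_, fn_):
        denom = 2 * tp_ + fp_ + fn_
        return 2 * tp_ / denom if denom else 0.0

    labels = sorted(set(y_true) | set(y_pred))
    # Macro-F1 averages per-class F1; Micro-F1 pools the counts over all classes.
    macro = sum(f1(tp[c], fp[c], fn[c]) for c in labels) / len(labels)
    micro = f1(sum(tp.values()), sum(fp.values()), sum(fn.values()))
    return micro, macro

micro, macro = f1_scores([0, 0, 0, 1, 1, 2], [0, 0, 1, 1, 1, 0])
print(round(micro, 4), round(macro, 4))
```

Macro-F1 weights every user group equally, so it is more sensitive than Micro-F1 to errors on small groups.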
The embodiments described in this specification are merely illustrative of implementations of the inventive concept and the scope of the present invention should not be considered limited to the specific forms set forth in the embodiments but rather by equivalents thereof as may occur to those skilled in the art upon consideration of the present inventive concept.

Claims (4)

1. A directed network embedding algorithm based on direction awareness, comprising:
S1: calculating asymmetric proximity;
S1a: defining the single-step probability of the random walk strategy in the directed network, the single-step probability formula being:
Figure FDA0003229842400000011
wherein, P represents the single step probability of random walk,
Figure FDA0003229842400000012
represents the k-th step of the random walk starting from v_i,
Figure FDA0003229842400000013
indicating the number of neighbors of node a,
Figure FDA0003229842400000014
represents the number of neighbors of node a, and E_ab = 1 means that there is a directed edge from a to b;
S1b: storing the single-step direction and proximity information of the random walk in a weight, the single-step weight formula being:
Figure FDA0003229842400000015
where r_{i,i+1} = 1 denotes a random-walk step along the edge direction, r_{i,i+1} = -1 denotes a random-walk step against the edge direction, and r_{i,i+1} = 0 denotes that directed edges exist in both directions between nodes v_i and v_{i+1};
S1c: calculating the scores between nodes to express the asymmetric proximity between them, the formula being:
Figure FDA0003229842400000016
where r_{j,j+1} is the weight of step j, and the factor 1/k normalizes the effect of the number of steps.
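As a non-limiting illustration of steps S1a to S1c, the following Python sketch walks over a small directed graph and accumulates step weights. The formulas of the claim are given only as figures, so two points are assumptions rather than the claimed formulas: the walk moves uniformly at random to any in-neighbor or out-neighbor, and the score after k steps is taken as 1/k times the sum of the first k step weights. All function names are hypothetical.

```python
import random

def step_weight(edges, a, b):
    """Single-step weight r for a move from a to b, following the rule of step S1b."""
    forward, backward = (a, b) in edges, (b, a) in edges
    if forward and backward:
        return 0    # directed edges exist in both directions
    if forward:
        return 1    # step along the edge direction
    if backward:
        return -1   # step against the edge direction
    raise ValueError(f"no edge between {a!r} and {b!r}")

def random_walk(edges, start, length, rng):
    """Assumed transition rule: uniform choice among in- and out-neighbors."""
    walk = [start]
    for _ in range(length):
        a = walk[-1]
        nbrs = sorted({y for (x, y) in edges if x == a} | {x for (x, y) in edges if y == a})
        if not nbrs:
            break
        walk.append(rng.choice(nbrs))
    return walk

def proximity_scores(edges, walk):
    """Assumed score: (1/k) * sum of the first k step weights, for each node reached."""
    scores, total = [], 0
    for k in range(1, len(walk)):
        total += step_weight(edges, walk[k - 1], walk[k])
        scores.append((walk[k], total / k))
    return scores

edges = {(1, 2), (2, 3), (3, 2), (4, 3)}
print(proximity_scores(edges, [1, 2, 3, 4]))
print(random_walk(edges, 1, 5, random.Random(0)))
```

Under these assumptions a positive score suggests that the reached node lies downstream of the start node, a negative score suggests that it lies upstream, and a score near zero suggests a reciprocal relationship.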
S2: establishing directed network embedding;
S21: after the asymmetric proximity between nodes has been calculated, establishing the qualitative directed network embedding DNE-L, which preserves the discrete asymmetric proximity between nodes in the embedding:
S21a: defining the probability of observing the directed-graph context, i.e. the probability of observing node v in the directed-graph context of node u given the asymmetric proximity s_{u,v}. Different probability formulas are selected according to the directionality between the nodes:
Figure FDA0003229842400000017
Figure FDA0003229842400000018
Figure FDA0003229842400000021
where h_s is the source embedding and h_t is the target embedding. The observation score is the dot product between the source embedding of node u and the target embedding of node v. When s_{u,v} = 0, node u and node v tend to form a bidirectional edge, so the probability is the sum of the probabilities obtained from the two directions.
S21b: preserving asymmetric proximity in the network embedding by maximizing the probability of observing the context nodes of the directed graph:
Figure FDA0003229842400000022
where DC_u is the directed context of node u, s_{u,v} is the score computed by the random walk strategy of S1, and P(v | u, s_{u,v}) is the probability of observing node v in the directed-graph context of node u given the score s_{u,v}.
S22: after the asymmetric proximity between nodes has been calculated, establishing the quantitative directed network embedding DNE-T, which preserves the discrete asymmetric proximity between nodes in the embedding:
S22a: defining a weight conversion formula, whereby the asymmetric proximity score calculated in step S1 is converted into a new weight by a weighting function:
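As a non-limiting illustration of step S21a, the sketch below selects the context probability according to the sign of s_{u,v}. The probability formulas of the claim are given only as figures, so this is an assumption: the forward case uses a softmax over dot products of the source embedding of u with all target embeddings, the backward case is assumed to mirror it with the roles of the two embeddings exchanged, and the zero case sums the two terms as the claim text states. The names `softmax_prob` and `context_prob` are hypothetical.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def softmax_prob(query, keys, target):
    """exp(query . keys[target]) normalized over all nodes in `keys`."""
    logits = {w: dot(query, k) for w, k in keys.items()}
    m = max(logits.values())                      # subtract the max for numerical stability
    z = sum(math.exp(l - m) for l in logits.values())
    return math.exp(logits[target] - m) / z

def context_prob(u, v, s_uv, h_s, h_t):
    """Probability of observing v in the directed context of u, selected by the sign of s_uv."""
    forward = softmax_prob(h_s[u], h_t, v)        # u acts as source, v as target
    backward = softmax_prob(h_t[u], h_s, v)       # assumed mirror case: u as target, v as source
    if s_uv > 0:
        return forward
    if s_uv < 0:
        return backward
    return forward + backward                     # s_uv == 0: sum of both directions

# Toy embeddings for two nodes.
h_s = {"a": [1.0, 0.0], "b": [0.0, 0.0]}
h_t = {"a": [0.0, 0.0], "b": [math.log(3.0), 0.0]}
print(context_prob("a", "b", 1, h_s, h_t))
```

Keeping separate source and target embeddings is what allows the model to assign different probabilities to the pairs (u, v) and (v, u).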
Figure FDA0003229842400000023
where s_{u,v} is the sum of the scores computed by the Infowalk above, and b is an offset value used to ensure that the weight is positive.
S22b: defining a quantitative directed network embedding model, and learning the source embedding and the target embedding through weighted Skip-Gram optimization:
Figure FDA0003229842400000024
where h_s is the source embedding, h_t is the target embedding, and π_{u,v} is the weight obtained by converting the score in the quantitative directed network.
S23: optimizing the model: negative sampling and stochastic gradient descent are adopted to improve training efficiency:
Figure FDA0003229842400000025
Figure FDA0003229842400000026
where, σ denotes the activation function,
Figure FDA0003229842400000031
represents the source embedding of node u,
Figure FDA0003229842400000032
represents the target embedding of node v, and π_{u,v} represents the weight between nodes u and v.
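As a non-limiting illustration of steps S22b and S23, the sketch below computes a weighted Skip-Gram loss with negative sampling for one node pair and applies one stochastic gradient descent update. The objective of the claim is given only as figures, so the standard weighted negative-sampling form is assumed here, with σ taken to be the logistic sigmoid. The function names, the learning rate, and the example weight are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def weighted_sgns_loss(h_s_u, h_t_v, negatives, weight):
    """Weighted negative log-likelihood of one (u, v) pair and its negative samples."""
    loss = -math.log(sigmoid(dot(h_s_u, h_t_v)))
    for h_t_n in negatives:
        loss -= math.log(sigmoid(-dot(h_s_u, h_t_n)))
    return weight * loss

def sgd_step(h_s_u, h_t_v, negatives, weight, lr=0.025):
    """One in-place gradient step on the source vector of u and all target vectors."""
    grad_u = [0.0] * len(h_s_u)
    for vec, label in [(h_t_v, 1.0)] + [(n, 0.0) for n in negatives]:
        g = weight * (sigmoid(dot(h_s_u, vec)) - label)   # d loss / d (h_s_u . vec)
        for i in range(len(vec)):
            grad_u[i] += g * vec[i]        # accumulate with the old target vector
            vec[i] -= lr * g * h_s_u[i]    # update the target vector
    for i in range(len(h_s_u)):
        h_s_u[i] -= lr * grad_u[i]

u, v, negs = [0.1, 0.2], [0.3, -0.1], [[0.2, 0.1]]
before = weighted_sgns_loss(u, v, negs, weight=1.5)
sgd_step(u, v, negs, weight=1.5)
after = weighted_sgns_loss(u, v, negs, weight=1.5)
print(before, after)
```

Scaling the gradient by the pair weight makes pairs with a larger converted proximity score contribute more strongly to the learned embeddings.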
2. The directional-awareness-based directed network embedding algorithm of claim 1, wherein: in step S22a, the weighting function should satisfy the following requirements: (1) π_0 > 0; (2)
Figure FDA0003229842400000033
(3)
Figure FDA0003229842400000034
wherein
Figure FDA0003229842400000035
represents the result of the weighting function for a walk of length i and an asymmetric proximity score m.
3. The directional-awareness-based directed network embedding algorithm of claim 2, wherein: the random walk strategy Infowalk is used to effectively capture the hierarchical structure and the asymmetric proximity among nodes in the directed network, yielding weighted node sequences that represent the asymmetric proximity among nodes and are used for directed embedding learning; and the qualitative directed network embedding DNE-L and the quantitative directed network embedding DNE-T effectively preserve the network in the embedding space, achieving excellent task results on real-world benchmark datasets.
4. A system for implementing the directional-awareness-based directed network embedding algorithm of claim 1, comprising a memory, a processor, and a program stored in the memory and executed on the processor, wherein: the program comprises an asymmetric proximity calculation module and a directed network embedding building module which are connected in sequence.
CN202110983059.7A 2021-08-25 2021-08-25 Network embedding method and system based on direction sensing Active CN113807543B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110983059.7A CN113807543B (en) 2021-08-25 2021-08-25 Network embedding method and system based on direction sensing

Publications (2)

Publication Number Publication Date
CN113807543A CN113807543A (en) 2021-12-17
CN113807543B CN113807543B (en) 2023-12-08

Family

ID=78894107

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110983059.7A Active CN113807543B (en) 2021-08-25 2021-08-25 Network embedding method and system based on direction sensing

Country Status (1)

Country Link
CN (1) CN113807543B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20100121792A1 (en) * 2007-01-05 2010-05-13 Qiong Yang Directed Graph Embedding
US20180053400A1 (en) * 2016-08-22 2018-02-22 General Electric Company Method and Apparatus For Determination Of Sensor Health
CN111008447A (en) * 2019-12-21 2020-04-14 Hangzhou Normal University Link prediction method based on graph embedding method
CN111292197A (en) * 2020-01-17 2020-06-16 Fuzhou University Community discovery method based on convolutional neural network and self-encoder
CN111581445A (en) * 2020-05-08 2020-08-25 Yang Yang Graph embedding learning method based on graph elements
CN112633314A (en) * 2020-10-15 2021-04-09 Zhejiang University of Technology Active learning source tracing attack method based on multi-layer sampling

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
CHANG ZHOU et al.: "Scalable Graph Embedding for Asymmetric Proximity", Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence, pages 2942 - 2948 *
ZHANG Wentao et al.: "Distributed optimization and implementation of graph embedding algorithms", Journal of Software (软件学报), pages 636 - 649 *

Also Published As

Publication number Publication date
CN113807543B (en) 2023-12-08

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant