CN111209611A - Hyperbolic geometry-based directed network space embedding method - Google Patents

Publication number
CN111209611A
CN111209611A CN202010016003.XA CN202010016003A CN111209611A CN 111209611 A CN111209611 A CN 111209611A CN 202010016003 A CN202010016003 A CN 202010016003A CN 111209611 A CN111209611 A CN 111209611A
Authority
CN
China
Prior art keywords
network
node
nodes
hyperbolic
space
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202010016003.XA
Other languages
Chinese (zh)
Inventor
吴宗柠
狄增如
樊瑛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Normal University
Original Assignee
Beijing Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Normal University
Priority to CN202010016003.XA
Publication of CN111209611A
Legal status: Withdrawn

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a hyperbolic geometry-based method for embedding a directed network in space. Taking the bipartite structure of the directed network, the multiplexing of node information, and the formation mechanism of the directed network as the basis for embedding, the method defines and quantifies the link probability of directed edges and a node similarity measure, and calculates the spatial coordinates of the nodes from that measure. Compared with the prior art, the method can map large volumes of high-dimensional directed network data into a hyperbolic space with reduced dimensionality, and it yields spatial coordinates that explain the network formation mechanism at multiple scales. It is therefore of practical significance for research on effective distance measurement for asymmetric network data, on network formation mechanisms, and on visualization. The method also has broad application prospects in quantitative analysis, link prediction, and node classification for directed networks with the non-standard property.

Description

Hyperbolic geometry-based directed network space embedding method
Technical Field
The invention belongs to interdisciplinary research between computer science and systems science. By extracting features from large-scale network data, it provides a method for dimensionality-reduced storage and representation learning of asymmetric (directed) network data based on the properties of hyperbolic geometric space. On this basis, the node distance and node similarity of a directed network are quantified, which improves the visual presentation of the directed network and supports applications in related fields. Technically, the invention relates to graph network space embedding theory, hyperbolic geometry, and complex network analysis methods.
Background
A complex network can greatly simplify a real system while retaining the basic information of its interaction structure, which makes it an ideal tool for studying complex systems. However, a complex network is a non-geometric model, so geometric frameworks, tools, and methods cannot be applied to it directly. Machine learning, a current focus of data mining, is good at processing structured vector data. Growing interest in machine learning and deep learning has made network embedding a leading topic in network science. Network embedding aims to map most of the observed real complex systems to latent quantities (i.e., hidden variables, Gaussian latent variables, etc.) or low-dimensional quantities (i.e., in Euclidean or hyperbolic space) according to the network topology and the connection rules in a hidden metric space.
The main idea of prior-art network embedding based on machine learning is to embed a complex network into Euclidean space by methods such as random walks, DeepWalk, and word vectors, so that the representation of the network is reduced from an n x n matrix to an m x n matrix. However, the resulting spatial coordinates have no practical meaning and cannot identify the structural features of the nodes. Recent advances in network geometry indicate that the structural features observed in scale-free networks can appear as geometric features in hyperbolic space. Hyperbolic geometry is a branch of non-Euclidean geometry with many applications in engineering practice. The node degree distribution of a real complex network follows a power law, and the ability of hyperbolic space to represent scale-free networks has been proved theoretically and applied successfully in fields such as brain science, international trade, Internet routing, and protein formation mechanisms. More importantly, the stochastic geometric model and the growing hyperbolic embedding model not only explain the hierarchy, heterogeneity, high clustering, and other characteristics of scale-free networks, but also give the spatial coordinates of the nodes a clear practical meaning.
This research framework has drawn wide attention, and a series of models and algorithms have been developed to embed complex networks into hyperbolic space. These models perform well in studying the underlying geometry of a network. The classical method for embedding a complex network in hyperbolic space is popularity-similarity optimization, which estimates the spatial coordinates of nodes through statistical inference: the two coordinates of each node in hyperbolic space jointly define the probability of a link between nodes, as a trade-off between node popularity, abstracted by the radial coordinate, and similarity, represented by angular distance. However, existing models cannot fully embed real complex systems, and their biggest shortcoming is that they ignore the direction of links. In most real networks the relationships between nodes may be unequal, and this inequality is reflected in the directionality of the edges between nodes, also referred to as link asymmetry. Although asymmetry poses many challenges for detecting and predicting links in the underlying space, a spatial mapping that ignores direction loses a large amount of important information and may not fully represent the structure and function of a real system. Hyperbolic embedding of directed networks is therefore an important research problem.
The invention provides a method for embedding a directed network in hyperbolic space, namely an asymmetric popularity-similarity optimization algorithm for directed networks. Unlike previous research, the method takes the asymmetry of relationships in a complex network into account and thereby remedies the shortcomings of the original theory and algorithms. First, the intrinsic relationship between directed links and network topology is explored. The asymmetry of edges in a complex network is implicit in the topological structure and plays an important role in the function and evolution of the system: a hidden bipartite structure exists in the directed network. We verified that a variety of directed networks have this structure, so it is a general property of directed networks. Second, the bipartite structure and node information multiplexing of the directed network are identified and used as the basis for embedding it, which offers a new approach to reducing the dimensionality of directed network data. Specifically, the hyperbolic embedding of a directed network results from a trade-off among four quantities, namely similarity and popularity in the out-degree and in-degree directions: nodes with higher popularity and similarity in the out-direction tend to connect to nodes with higher popularity and similarity in the in-direction. This completely describes the process of embedding the directed network in hyperbolic space. In addition, since the similarity coordinate has no analytic solution, a maximum likelihood estimation algorithm is coupled with a sampling method to estimate the node similarity coordinates. Finally, based on the spatial coordinates of the nodes, the invention also provides a visualization technique for directed networks and a node similarity measurement scheme. The visualization method displays the importance of nodes and the macroscopic structure and function of the network more intuitively, and the node similarity measure can be used to study the effective distance between nodes in a real system.
Compared with traditional embedding methods, the asymmetric popularity-similarity hyperbolic embedding optimization method for directed networks describes the connection probability of directed edges better by considering both the global and the local information of nodes, and it gives the node spatial coordinates a network meaning. The corresponding measurement and visualization techniques can be obtained simply by mining and analyzing network data, which is an advance over prior-art methods.
Disclosure of Invention
Purpose of the invention: the invention studies the spatial embedding of directed networks and finds that related research has paid little attention to dimensionality reduction of directed networks in space, or to how the resulting spatial coordinates reveal the network formation mechanism at multiple scales. The invention therefore provides a method for spatial embedding of a directed network based on hyperbolic geometry and complex network analysis. The method finds the mapping between an asymmetric (directed) network and a low-dimensional space, provides dimensionality-reduced storage and visualization of large-scale directed network data, makes the meaning of the reduced spatial coordinates clear, and thereby supports node classification, node importance evaluation, and link prediction for directed networks.
Based on this idea, the method solves the spatial embedding problem for directed links and gives the node spatial coordinates a practical meaning, greatly improving the feasibility and accuracy of spatially embedding directed network data. It also provides a node similarity measure based on geometric space and a visualization technique for directed networks. Specifically, we model node relationships with asymmetric properties as a directed network: if node A has a certain relationship to node B, there is a directed edge from node A to node B. For example, in the international trade directed network a node represents a country, and if country A exports products to country B (i.e., there is an import-export relationship), country A has a directed edge pointing to country B.
The invention first identifies the bipartite topological structure in the directed network and, starting from the formation mechanism of the directed network, establishes a dimensionality-reducing spatial embedding method for asymmetric networks. In the technical scheme, an expression for the asymmetric node connection probability is defined by identifying the structure of the directed network, and the spatial coordinates of the nodes are estimated using hyperbolic geometry and the topological properties of the directed network, which yields a quantitative distance measure and a visualization method for the directed network. To define the asymmetric link probability, node multiplexing is used to divide the nodes into two subsets with different properties, set A and set B. There are no edges within a subset; edges exist only between the subsets, and each represents a link that actually exists between a pair of nodes in the directed network. For the international trade network, the two subsets represent exporting countries (set A) and importing countries (set B), and an edge from set A to set B represents country i in set A exporting products to country j in set B. To estimate the node spatial coordinates, the invention combines hyperbolic geometry with a maximum likelihood estimation algorithm and develops an embedding method that couples an asymmetric popularity (radial coordinate)-similarity (angular coordinate) algorithm with theoretical derivation.
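The node-multiplexing step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `to_bipartite`, the `("a", node)` / `("b", node)` labelling of the two copies, and the country codes are all hypothetical.

```python
def to_bipartite(edges):
    """Split every node of a directed graph into an out-copy (set A)
    and an in-copy (set B); each directed edge u -> v becomes an edge
    between the A-copy of u and the B-copy of v."""
    set_a = sorted({u for u, _ in edges})  # nodes that start at least one edge
    set_b = sorted({v for _, v in edges})  # nodes that end at least one edge
    bipartite_edges = [(("a", u), ("b", v)) for u, v in edges]

    # Out-degree sequence lives on set A, in-degree sequence on set B.
    out_degree = {u: 0 for u in set_a}
    in_degree = {v: 0 for v in set_b}
    for u, v in edges:
        out_degree[u] += 1
        in_degree[v] += 1
    return set_a, set_b, bipartite_edges, out_degree, in_degree


# Hypothetical trade links: exporter -> importer.
trade = [("CN", "US"), ("US", "CN"), ("CN", "DE")]
set_a, set_b, bip, out_deg, in_deg = to_bipartite(trade)
```

A country that both exports and imports appears once in each set, which is what lets the two degree sequences be analyzed separately.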
The invention provides a hyperbolic geometry-based directed network space embedding method, comprising the following steps:
1. Checking the basis for hyperbolic space embedding according to the bipartite structure of the directed network. Step 1 comprises:
1-1) After the bipartite structure is obtained by node multiplexing (shown in Fig. 2), the out-degree and in-degree sequences of the network nodes are obtained, and the power-law exponent gamma of each degree distribution is then estimated in double-logarithmic coordinates.
1-2) According to hyperbolic theory, hyperbolic space embedding is suited to networks whose degree distribution has a power-law exponent gamma > 2.1. Most networks are heterogeneous (in node degrees and weights), and this heterogeneity can be used to filter out a sparse sub-network: if the condition is not satisfied, a skeleton-extraction method can be used to obtain the key links of the network, which are then embedded.
2. Constructing a model according to the bipartite structure of the directed network. Step 2 comprises:
2-1) Constructing a directed network embedding model based on the formation mechanism of the directed network, i.e., defining the connection probability of asymmetric links and the embedding model.
2-2) According to the isomorphism between the hidden metric space and the hyperbolic space, representing the node coordinates in hyperbolic space by hidden-metric quantities that are easier to solve for.
3. Using a maximum expectation algorithm for parameter estimation. Step 3 comprises:
3-1) Taking the log-likelihood of the embedding model via Bayes' rule to obtain a linear regression equation;
3-2) Because the likelihood function of the node angular coordinate has no analytic solution, the invention estimates the angular coordinate of each node with an approximate maximization method (LMH: localized Metropolis-Hastings).
4. Defining node distances and directed network visualization based on the node spatial coordinates. Step 4 comprises:
4-1) Obtaining the hyperbolic distance between nodes in the directed network from the node coordinates and the definition of hyperbolic distance.
4-2) For each node, drawing a visualization of the connection structure of the directed network using the angular and radial coordinates of the node.
Advantageous effects
1) Compared with existing theoretical models, the method offers better explanatory power because it takes into account the geometric properties of the directed graph network and the actual meaning of the node spatial coordinates. Existing maximum likelihood estimation algorithms cannot describe the link probability of asymmetric edges in a complex network. By identifying the bipartite structure and topological properties of the directed network, the method describes the link probability of the different types of links in the directed network more reasonably and accurately, and successfully embeds the directed network in space with reduced dimensionality.
2) Traditional theoretical models can only embed undirected networks with reduced dimensionality and cannot explain the actual meaning of the node spatial coordinates, in particular the specific meaning of the coordinate in each dimension. The hyperbolic-geometry-based embedding method for asymmetric graph networks not only describes the link probability of directed edges but also reveals, at multiple scales, how node coordinates in hyperbolic space influence the formation mechanism of the directed network.
3) Through the network spatial coordinates, the method opens up room for development in quantifying node similarity, node classification, link prediction, and visualization of large-scale directed network data.
Drawings
FIG. 1 is a flow chart of the present invention.
Fig. 2 is a schematic diagram of the bipartite structure of a directed network. The edge between x1 in point set A and y4 in point set B represents the directed edge from point a to point d. Point set A and point set B contain the start points and end points, respectively, of the directed edges in the directed network.
Fig. 3 shows the stopping condition for extracting the directed network skeleton: deletion of low-weight edges stops at the point on the curve whose distance to the diagonal is largest. Different colors represent the skeleton extraction process for different networks. The abscissa is the ratio of the number of edges remaining after deletion to the number of edges in the initial network, and the ordinate is the ratio of the number of nodes remaining after deletion to the number of nodes in the initial network.
FIG. 4 is a flow chart of modeling and application of the present invention, including embedded model, main technology, visualization schematic, etc.
Fig. 5 compares network topology information before and after embedding the directed network in space, focusing on a first-order statistical feature of the directed network: the cumulative distributions of out-degree and in-degree. Panels (a) and (b) show the cumulative degree distributions of the 2011 international trade network before and after embedding. Panels (c) and (d) show the same comparison for the 2016 international trade network.
Fig. 6 is the evolution of the national export popularity measure.
Fig. 7 is the evolution of the national import popularity measure.
Fig. 8 shows the relationship between angular distance, hyperbolic distance, and the true distance between nodes in a directed hyperbolic network, using the Caenorhabditis elegans neural network as an example to analyze the distances between neurons. Panel (a) compares the frequency histograms of angular distance and spatial (position) distance between nodes in the network. Panel (b) compares the frequency histograms of hyperbolic distance and spatial (position) distance. In both panels the abscissa is the distance and the ordinate is its frequency.
Detailed Description
The technical scheme of the invention is explained in detail below with reference to the accompanying drawings.
Referring to Figs. 1 and 4, the invention is embodied as follows.
Step 1, checking the basis for hyperbolic space embedding according to the bipartite structure of the directed network
1-1) Hyperbolic geometric theory indicates that a complex network admits a geometric description only if the power-law exponent of its degree distribution satisfies the condition $p(k) \sim k^{-\gamma}$ with $\gamma > 2.1$, where $k$ denotes the degree of a node in the network
Figure BDA0002358902110000061
Here $p(k)$ is the probability distribution of node degree and $\gamma$ is the power-law exponent of the degree distribution. For real systems, the power-law exponents of the out-degree and in-degree distributions of the international trade network are 3.5 and 2.5, respectively; those of the biological neural network are 3.42 and 3, respectively.
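The check in 1-1) can be illustrated with a short sketch. It assumes the simplest possible estimator, an ordinary least-squares line through the empirical degree distribution in log-log coordinates; the function name `fit_power_law_exponent` and the synthetic degree sequence are hypothetical, and the patent does not specify which fitting procedure it uses.

```python
import math
from collections import Counter


def fit_power_law_exponent(degrees):
    """Estimate gamma in p(k) ~ k^(-gamma) by a least-squares fit of
    log p(k) against log k (assumed estimator, for illustration only)."""
    counts = Counter(k for k in degrees if k > 0)
    if len(counts) < 2:
        raise ValueError("need at least two distinct positive degrees")
    total = sum(counts.values())
    xs = [math.log(k) for k in counts]
    ys = [math.log(c / total) for c in counts.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx  # the slope in log-log coordinates is -gamma


# Synthetic degree sequence whose counts follow 1000 * k^(-3) exactly.
degrees = [1] * 1000 + [2] * 125 + [5] * 8 + [10] * 1
gamma = fit_power_law_exponent(degrees)
suitable_for_embedding = gamma > 2.1  # threshold stated in step 1-1)
```

The same function would be applied separately to the out-degree and in-degree sequences. A least-squares fit on raw frequencies is known to be noisy for real data, so a production check might prefer a cumulative distribution or a likelihood-based estimator.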
1-2) If the degree distribution of the network does not satisfy the power-law condition, a network skeleton extraction method is used to identify the key connections in the network. The basic idea is to sort the edges in descending order of weight and delete the low-weight edges one by one from the end of the list, ensuring that no isolated node remains in the network. Following previous studies, this step is repeated, and at each step the ratio of the current number of edges to the original number of edges ($L_B/L_0$) and the ratio of the current number of nodes to the original number of nodes ($N_B/N_0$) are recorded. A scatter plot is drawn with $L_B/L_0$ and $N_B/N_0$ as the coordinate axes; the step corresponding to the point farthest from the diagonal gives the edge-deletion threshold, marked by the five-pointed star in Fig. 3.
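The skeleton-extraction rule in 1-2) can be sketched as below. Two points are assumptions made for this illustration: "no isolated node remains" is read as dropping a node from the count once its last edge is deleted, and ties in the distance to the diagonal are broken in favor of the earlier step. The function name `skeleton_threshold` and the toy network are hypothetical.

```python
import math


def skeleton_threshold(weighted_edges):
    """Delete edges from lightest to heaviest, track (L_B/L_0, N_B/N_0),
    and stop at the step whose point is farthest from the diagonal.
    weighted_edges: list of (u, v, weight) tuples."""
    edges = sorted(weighted_edges, key=lambda e: e[2], reverse=True)
    l0 = len(edges)
    degree = {}
    for u, v, _ in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    n0 = len(degree)

    remaining_nodes = n0
    curve = [(1.0, 1.0)]            # point before any deletion
    best_step, best_dist = 0, 0.0
    for step in range(1, l0):       # always keep at least one edge
        u, v, _ = edges[l0 - step]  # lightest edge not yet deleted
        for node in (u, v):
            degree[node] -= 1
            if degree[node] == 0:   # node lost its last edge
                remaining_nodes -= 1
        x = (l0 - step) / l0        # L_B / L_0
        y = remaining_nodes / n0    # N_B / N_0
        curve.append((x, y))
        dist = abs(y - x) / math.sqrt(2.0)  # distance to the line y = x
        if dist > best_dist:
            best_step, best_dist = step, dist
    return edges[: l0 - best_step], best_step, curve


toy = [("a", "b", 10), ("b", "c", 9), ("a", "c", 8),
       ("a", "d", 1), ("b", "d", 0.5)]
skeleton, step, curve = skeleton_threshold(toy)
```

In the toy network the three lightest edges are removed and the two heaviest edges form the skeleton.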
Step 2, constructing a model according to the bipartite structure of the directed network
2-1) Fig. 2 shows a schematic diagram of the bipartite structure of a directed network, according to which the invention constructs the network model. In hyperbolic space, a node has four coordinates $(r_{ai}, r_{bj}, \theta_{ai}, \theta_{bj})$, which represent the popularity $r$ and the similarity $\theta$ in the outgoing and incoming directions, respectively. According to the formation mechanism of the directed network, the greater the similarity and popularity in the outgoing and incoming directions, the more readily a directed edge forms. The model is therefore given by equation (1):
$f(x_{ai,bj}) = \left(1 + x_{ai,bj}^{\beta}\right)^{-1}$ (1)
Weight of a directed edge: $x_{ai,bj} = r_{ai} + r_{bj} + 2 d_{ai,bj}$ (2)
Angular similarity of a directed edge: $d_{ai,bj} = \min\left(\left|\pi - \left|\theta_{ai} - \theta_{bj}\right|\right|\right)$ (3)
In a specific system, the weight of a directed edge represents the spatial topological distance between the nodes, for example the topological distance between countries in the international trade system or between neurons in a biological system.
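A small numerical sketch of equations (1) to (3) may help. Equations (1) and (2) are implemented as written. For the angular term, the sketch assumes the usual wrap-around separation $\pi - |\pi - |\theta_{ai} - \theta_{bj}||$, which is an interpretation of equation (3) rather than something the text confirms. All function names are hypothetical.

```python
import math


def angular_distance(theta_a, theta_b):
    """Wrap-around angular separation in [0, pi] (assumed reading of eq. 3)."""
    return math.pi - abs(math.pi - abs(theta_a - theta_b))


def edge_weight(r_a, r_b, theta_a, theta_b):
    """Equation (2): x = r_a + r_b + 2 * d."""
    return r_a + r_b + 2.0 * angular_distance(theta_a, theta_b)


def link_probability(x, beta):
    """Equation (1): f(x) = (1 + x^beta)^(-1), for x >= 0."""
    return 1.0 / (1.0 + x ** beta)


# Out-copy of node i and in-copy of node j, both with radial coordinate 1.
x_near = edge_weight(1.0, 1.0, 0.0, 0.1)  # angularly close pair
x_far = edge_weight(1.0, 1.0, 0.0, 1.0)   # angularly distant pair
p_near = link_probability(x_near, beta=2.0)
p_far = link_probability(x_far, beta=2.0)
```

With these definitions a smaller weight $x$ gives a higher link probability, so pairs that are close in angle and have small radial coordinates are the most likely to be connected.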
2-2) According to the isomorphism between the hidden metric space and the hyperbolic space, the popularity coordinate $r$ of a node can be characterized by the expected node degrees $\kappa_{ai}$ and $\kappa_{bi}$, i.e.
$r_{ai} = R - \ln(\kappa_{ai} / \kappa_{a0})$ (4)
Minimum expected degree:
Figure BDA0002358902110000071
wherein R is the radius of the hyperbolic Poincare disc,
Figure BDA0002358902110000072
and $\gamma_i$ respectively represent the average out-degree (or average in-degree) of node i and the power-law exponent of the out-degree or in-degree distribution, i.e., $p(k) \sim k^{-\gamma}$. In particular, the popularity measure $r$ of a country in the international trade network characterizes the importance and status of that country. The results were fitted on a log-log scale, as shown in Fig. 5.
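The mapping from expected degree to radial coordinate in equation (4) can be sketched as follows. The sketch reads the equation as $r = R - \ln(\kappa/\kappa_0)$, where $\kappa_0$ is the minimum expected degree; this reading, the function name `radial_coordinate`, and the example values are assumptions made for illustration.

```python
import math


def radial_coordinate(kappa, kappa_min, disc_radius):
    """Assumed reading of equation (4): r = R - ln(kappa / kappa_min).
    Nodes with a larger expected degree sit closer to the disc center."""
    if kappa_min <= 0 or kappa < kappa_min:
        raise ValueError("require 0 < kappa_min <= kappa")
    return disc_radius - math.log(kappa / kappa_min)


R = 10.0                                           # hypothetical disc radius
r_min_degree = radial_coordinate(2.0, 2.0, R)      # least popular node
r_hub = radial_coordinate(2.0 * math.e ** 3, 2.0, R)  # much higher expected degree
```

Under this reading the node with the minimum expected degree lies on the boundary of the disc (r = R), and each e-fold increase in expected degree moves a node one unit closer to the center.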
Step 3, performing parameter estimation by using a maximum expectation algorithm
3-1) By the isomorphism between the hidden metric space and the hyperbolic space, the task of solving for the popularity and similarity coordinates is converted into inference of the hidden variables and the angular coordinates. Parameter estimation seeks the hyperbolic space coordinates that best match the topological information of the given adjacency matrix, and the process follows Bayes' formula:
Figure BDA0002358902110000073
Taking the logarithm of the above formula gives the likelihood function of the model:
Figure BDA0002358902110000074
Figure BDA0002358902110000075
3-2) The radial (popularity) coordinate of each node is solved from the likelihood function. Taking partial derivatives to maximize the conditional log-likelihood function $\ln L$ gives:
Figure BDA0002358902110000081
Figure BDA0002358902110000082
where the subscript indicates that a quantity is a statistical property of the nodes in the corresponding set, such as the average degree or the power-law exponent of the degree distribution, and $\beta$ is a parameter controlling the clustering coefficient of the network. The invention estimates $\beta$ as the exponent of the distribution of the number of common neighbors of nodes, i.e., $p(m) \sim m^{-\beta}$.
3-3) Because maximizing the conditional likelihood function by partial derivatives does not yield an analytic solution for the angular coordinate, the invention uses an approximate maximization method (LMH, localized Metropolis-Hastings) to solve for the angular (similarity) coordinate of each node.
Define the likelihood function of node $*i$ as $\ln L_{*i}$. The likelihood of the whole hyperbolic network is then:
Figure BDA0002358902110000083
$\ln L_{*i} = \sum_{*j \neq *i} a_{*i*j} \ln p(x_{*i*j}) + \sum_{*j \neq *i} (1 - a_{*i*j}) \ln\left[1 - p(x_{*i*j})\right]$ (12)
Node angular coordinates take values in $[0, 2\pi]$. Nodes are selected at random and visited in turn. When node $*i$ is visited, it is moved to the angular coordinate $\theta_{*i}$ that maximizes its likelihood $\ln L_{*i}$, and it stays fixed at $\theta_{*i}$ while the other nodes are visited. Moving a node changes the likelihood values of other nodes, but a node's likelihood is affected mainly by the coordinates of its neighbors.
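The node-by-node update described above can be sketched as follows. To stay short, the sketch replaces the sampling step of LMH with a plain grid search over candidate angles, and it updates only the out-copies (set A) while holding the in-copies (set B) fixed; both simplifications are assumptions of this illustration. It also assumes the link probability $(1 + x^{\beta})^{-1}$ with $x = r_a + r_b + 2d$ and a wrap-around angular distance. All names are hypothetical.

```python
import math
import random


def angular_distance(t1, t2):
    return math.pi - abs(math.pi - abs(t1 - t2))


def local_log_likelihood(theta_i, r_i, others, beta):
    """Equation (12) for one node. others: list of (r_j, theta_j, a_ij)."""
    total = 0.0
    for r_j, theta_j, a_ij in others:
        x = r_i + r_j + 2.0 * angular_distance(theta_i, theta_j)
        p = 1.0 / (1.0 + x ** beta)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # guard against log(0)
        total += a_ij * math.log(p) + (1 - a_ij) * math.log(1.0 - p)
    return total


def best_angle(r_i, others, beta, grid=360):
    """Grid search (swapped in for LMH sampling) over [0, 2*pi)."""
    candidates = [2.0 * math.pi * k / grid for k in range(grid)]
    return max(candidates,
               key=lambda t: local_log_likelihood(t, r_i, others, beta))


def sweep_out_copies(r_a, theta_a, r_b, theta_b, adj, beta, rng):
    """Visit the out-copies in random order; move each to its best angle."""
    order = list(range(len(r_a)))
    rng.shuffle(order)
    for i in order:
        others = [(r_b[j], theta_b[j], adj[i][j]) for j in range(len(r_b))]
        theta_a[i] = best_angle(r_a[i], others, beta)
    return theta_a


# Two out-copies, two in-copies placed on opposite sides of the disc.
theta_b = [1.0, 1.0 + math.pi]
adj = [[1, 0], [0, 1]]  # out-copy 0 links to in-copy 0, out-copy 1 to in-copy 1
theta_a = sweep_out_copies([1.0, 1.0], [0.0, 0.0], [1.0, 1.0],
                           theta_b, adj, beta=2.0, rng=random.Random(0))
```

Each out-copy ends up next to the in-copy it links to and opposite the one it does not link to. A full embedding would alternate such sweeps over both sets and repeat them until the angles stop changing.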
Step 4, defining node distance and directed network visualization based on the node spatial coordinates
4-1) Symmetric hyperbolic distance. Our model considers the intrinsic drivers of link generation more carefully: nodes with high popularity $r_{ai}$ tend to link to nodes with high popularity $r_{bj}$. This raises a new problem, however. The hyperbolic distance of the directed network is not symmetric, so strictly speaking it does not meet the definition of a distance, while some practical tasks, such as node clustering analysis and node centrality calculation, require a symmetric distance measure. The invention therefore combines $x_{ai,bj}$ and $x_{bj,ai}$ to define a symmetric hyperbolic distance, i.e.
$x_{ij} = x_{ai,bj} + x_{bj,ai}$ (13)
4-2) a hyperbolic space coordinate-based directed network visualization method, as shown in fig. 6. The distribution of nodes in space has the following characteristics due to model definition:
4-2-1) The distance of a node from the center represents its popularity r: the closer a node is to the center, the more important it is and the more likely it is to link with other nodes.
Taking energy trade as an example, the method of the invention is used to compute the evolution paths of countries and the long-term evolution of their status; the results are shown in Fig. 6. From the perspective of national export capacity, the popularity measure r reflects the unbalanced, diversified, and multipolar development of world trade in energy commodities. The United States and Russia have always been central, reflecting the fact that energy is a determining factor in export capacity. The export position of the United Kingdom in energy trade has gradually been marginalized as the North Sea oil fields have been depleted. Interestingly, the energy export position of Saudi Arabia has also gradually been marginalized, owing to a change in the direction of Saudi energy policy. Asia, Africa, and continental Europe have become active regions and have injected new vitality into the energy trade market. As their imports have become increasingly dominant, the European Community, China, and India have moved to central positions. India in particular has progressed towards a more central location and, in the past few years, has become a new region with superpower status. Notably, the United States has remained the core region of world trade and the leader among the major trading countries. The results from the perspective of the national import popularity measure are shown in Fig. 7. The United States always occupies the core position, at the very center of the disc. China and India are slowly approaching the core of world energy trade, while European countries such as the United Kingdom and Germany fluctuate widely and are unstable.
Furthermore, the national popularity measure r has a significant negative correlation with GDP (about -0.5), which indicates that the spatial location of a country reflects its economic size, and in particular its export capacity: the correlation between the export measure and GDP is stronger than that between the import measure and GDP. In other words, a country's export position can serve as a coarse-grained measure of the scale of its economy.
4-2-2) the relative positions of the nodes indicate similarity between the nodes, i.e., the closer the spatial positions between the nodes, the more similar the nodes are and the higher the likelihood of linking.
Thus, any two nodes in the network have an angular distance, a hyperbolic topological distance, and a true distance (for example, the geographical distance between countries or the spatial distance between neurons in an organism). Taking the neural network formed by the interactions among Caenorhabditis elegans neurons as an example, we show how the different distances apply to problems such as node classification, which offers a new perspective on the topology and visualization of neural networks.
First, we compare the distance measures obtained by embedding the nematode neural network in hyperbolic space with the actual distances between neurons; the results are shown in Fig. 8. The angular distance is similar to the actual distance, whereas the hyperbolic distance differs considerably from the positional distance. This indicates that the effective distance between neurons is not just a positional distance but also carries information from other dimensions: it results from a nonlinear mixture of topological and spatial information.
In addition, we use community detection and disk partitioning to obtain the community structure of the neural network and compare it with the actual neural functional areas. Angular distance on the hyperbolic disk reflects the similarity of nodes, and the partitions of the disk are potential biochemical modules defined in geometric space. The partition method is as follows:
(1) The theoretical framework of hyperbolic embedding shows that the angular coordinates are randomly distributed over [0°, 360°]. Thus, we divide the hyperbolic disk into 10 sectors of 36 degrees each, corresponding to the 10 neural functional regions. An important issue here is that the choice of initial position may affect the resulting partition.
(2) To solve this problem, we use the concept of community structure to determine the partition of the disk. One definition of community structure is that links within communities are dense and links between communities are sparse. Corresponding to the hyperbolic embedding model, we define a parameter η, the ratio of the internal distances to the external distances of the communities, to describe the optimal partition. A larger η represents a more reasonable classification of the nodes.
Based on this, we obtain the optimal partition of the neurons in hyperbolic space by trying different starting positions for the partition, using the minimum distance between nodes as the reference step for shifting the starting position and selecting the partition with the maximum η. An interesting finding is that some neurons with higher popularity, such as RIAL, RIAR, SAAVR, RMDR and SMDVR, belong to the same neural functional area, the lateral ganglia, but are more scattered across the partitions in hyperbolic space.
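The partition search described above can be illustrated with a minimal Python sketch. It is not the implementation of the application: the text does not give the exact form of η, so here η is assumed to be the mean angular gap between nodes in different sectors divided by the mean angular gap between nodes in the same sector, an orientation chosen so that a larger η corresponds to tighter sectors. The function names and the `step` parameter (the shift applied to the starting position) are hypothetical.

```python
from itertools import combinations

def angular_gap(t1, t2):
    """Smallest angle between two directions, in degrees (0 to 180)."""
    d = abs(t1 - t2) % 360.0
    return min(d, 360.0 - d)

def sector_labels(angles, start, n_sectors=10):
    """Label each angle with the index of the equal-width sector it falls in."""
    width = 360.0 / n_sectors
    # The final modulo guards against a rounding edge case at the 360-degree wrap.
    return [int(((a - start) % 360.0) // width) % n_sectors for a in angles]

def eta(angles, labels):
    """Assumed form of eta: mean inter-sector gap / mean intra-sector gap."""
    intra, inter = [], []
    for i, j in combinations(range(len(angles)), 2):
        gap = angular_gap(angles[i], angles[j])
        (intra if labels[i] == labels[j] else inter).append(gap)
    if not intra or not inter:
        return float("-inf")  # undefined for this partition, so never selected
    mean_intra = sum(intra) / len(intra)
    mean_inter = sum(inter) / len(inter)
    return float("inf") if mean_intra == 0 else mean_inter / mean_intra

def best_start(angles, n_sectors=10, step=1.0):
    """Scan starting offsets within one sector width and keep the largest eta."""
    width = 360.0 / n_sectors
    starts = [k * step for k in range(int(width / step))]
    scored = [(eta(angles, sector_labels(angles, s, n_sectors)), s) for s in starts]
    best_eta, best_s = max(scored, key=lambda pair: pair[0])
    return best_s, best_eta
```

For two tight groups of angles that a sector boundary at 36 degrees would cut in half, the scan shifts the starting position until each group falls inside a single sector.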
The above description is only a preferred embodiment of the application and an illustration of the technical principles employed. A person skilled in the art will appreciate that the scope of the invention referred to in the present application is not limited to embodiments with the specific combination of the above-mentioned features, but also covers other embodiments formed by any combination of the above-mentioned features or their equivalents without departing from the inventive concept, for example, technical solutions formed by replacing the above features with (but not limited to) technical features having similar functions disclosed in the present application.

Claims (2)

1. A hyperbolic geometry-based directed network space embedding method is characterized by comprising the following steps:
step 1, verifying the basis for hyperbolic space embedding according to the binary structure of the directed network:
1-1) according to hyperbolic geometric theory, a complex network is represented by a matrix $A = \{a_{ij}\}_{n \times n}$; for the geometric description to apply, the power exponent of the degree distribution must satisfy the condition $p(k) \sim k^{-\gamma}$ with $\gamma > 2.1$, where k represents the degree of a node in the network
Figure FDA0002358902100000011
the node degree probability distribution function is p(k), and γ is the power exponent of the degree distribution;
1-2) if the degree distribution of the network does not follow a power law, a network backbone extraction method is used to identify the key connections in the network: the connecting edges are sorted in descending order of weight, and the edges with small weights are deleted in sequence from back to front while ensuring that no isolated node appears in the network; the above steps are repeated, and at each step the ratio of the current number of edges ($L_B$) to the original number of edges ($L_0$), $L_B/L_0$, and the ratio of the current number of nodes ($N_B$) to the original number of nodes ($N_0$), $N_B/N_0$, are recorded, and a scatter diagram is drawn with these two ratios as the coordinate axes;
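The edge-pruning procedure of step 1-2) can be sketched in Python as follows. This is only one reading of the step: an edge is skipped whenever deleting it would leave one of its endpoints without any edge, and the two ratios are recorded after every deletion. The function name `prune_backbone` and the parameter `n_remove` (how many edges to delete) are hypothetical. Under this reading the node ratio stays at 1, because no node is ever isolated.

```python
from collections import Counter

def prune_backbone(edges, n_remove):
    """edges: list of (source, target, weight). Returns (kept_edges, history).

    history holds one (L_B / L_0, N_B / N_0) pair per recorded step.
    """
    kept = sorted(edges, key=lambda e: e[2], reverse=True)  # descending weight
    degree = Counter()
    for u, v, _ in kept:
        degree[u] += 1
        degree[v] += 1
    n0, l0 = len(degree), len(kept)
    history = [(1.0, 1.0)]
    removed = 0
    idx = len(kept) - 1  # start from the lightest edge and move to the front
    while idx >= 0 and removed < n_remove:
        u, v, _ = kept[idx]
        degree[u] -= 1
        degree[v] -= 1
        if degree[u] == 0 or degree[v] == 0:
            # Deleting this edge would isolate a node, so undo and skip it.
            degree[u] += 1
            degree[v] += 1
        else:
            kept.pop(idx)
            removed += 1
            n_b = sum(1 for d in degree.values() if d > 0)
            history.append((len(kept) / l0, n_b / n0))
        idx -= 1
    return kept, history
```

The recorded pairs are the points of the scatter diagram mentioned in the step; plotting them is left out of the sketch.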
step 2, constructing a model according to the binary structure of the directed network:
2-1) constructing a network model: in hyperbolic space, a node has four coordinates $r_{ai}$, $r_{bj}$, $\theta_{ai}$, $\theta_{bj}$, which respectively represent the popularity r and the similarity θ in the outgoing and incoming directions; according to the formation mechanism of directed networks, the greater the similarity and popularity of the outgoing and incoming directions, the more easily a directed edge is formed; thus, the model is given by equation (1):
$f(x_{ai,bj}) = (1 + x_{ai,bj}^{\beta})^{-1}$ (1)
distinct weight of a directed edge: $x_{ai,bj} = r_{ai} + r_{bj} + 2 d_{ai,bj}$ (2)
angular similarity of a directed edge: $d_{ai,bj} = \min(|\pi - |\theta_{ai} - \theta_{bj}||)$ (3)
β is a parameter controlling network clustering, and the subscript ai denotes node i of node set A; in a specific system, the distinct weight of a directed edge represents the spatial topological distance between nodes, and the angular similarity provides a measure of the similarity between nodes;
2-2) according to the isomorphism between the hidden space and the hyperbolic space, the popularity coordinate r of a node is characterized by the expected node degrees $\kappa_{ai}$ and $\kappa_{bi}$, namely:
$r_{ai} = R - \ln(\kappa_{ai}/\kappa_{a0})$ (4)
minimum expected degree:
Figure FDA0002358902100000021
wherein R is the radius of the hyperbolic Poincare disc,
Figure FDA0002358902100000022
and $\gamma_i$ are, respectively, the average out-degree or in-degree and the power-law exponent of the out-degree or in-degree distribution for the i nodes, the exponent being obtained by fitting $p(k) \sim k^{-\gamma}$ in log-log coordinates;
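The quantities of step 2 can be put together in a short Python sketch. Several points are assumptions rather than statements of the application: equation (4) is read as r = R - ln(κ/κ0), equation (1) as 1/(1 + x^β), and, because equation (3) is ambiguous as printed, the angular term uses the common wrap-around form π - |π - |θ_a - θ_b||. All function names are hypothetical.

```python
import math

def radial_coordinate(kappa, kappa_min, disk_radius):
    """Popularity coordinate from an expected degree (assumed reading of eq. (4))."""
    return disk_radius - math.log(kappa / kappa_min)

def angular_distance(theta_a, theta_b):
    """Wrap-around angular separation in [0, pi] for angles in [0, 2*pi]."""
    return math.pi - abs(math.pi - abs(theta_a - theta_b))

def directed_distance(r_out, r_in, theta_out, theta_in):
    """x = r_a + r_b + 2 d, following eq. (2) as written."""
    return r_out + r_in + 2.0 * angular_distance(theta_out, theta_in)

def connection_probability(x, beta):
    """f(x) = 1 / (1 + x**beta), the assumed reading of eq. (1); requires x >= 0."""
    return 1.0 / (1.0 + x ** beta)
```

With these definitions, a pair of nodes that is close in angle and has small radial coordinates receives a higher connection probability than a distant pair, which matches the qualitative behaviour described for the model.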
step 3, performing parameter estimation using an expectation-maximization algorithm:
3-1) according to the isomorphism between the hidden metric space and the hyperbolic space, the task of solving for the popularity and similarity coordinates is converted into the inference of hidden variables and angular coordinates; parameter estimation consists in finding the hyperbolic space coordinates that best match the topological information given by the adjacency matrix, and the process obeys the Bayes formula:
Figure FDA0002358902100000023
taking the logarithm of the above formula gives the likelihood function of the model:
Figure FDA0002358902100000024
Figure FDA0002358902100000025
3-2) solving for the radial (popularity) coordinates of the nodes according to the likelihood function, maximizing the conditional log-likelihood function ln L by taking partial derivatives:
Figure FDA0002358902100000026
Figure FDA0002358902100000027
wherein the subscript indicates a statistical property of the nodes in the set, such as the average degree; β is the parameter controlling the network clustering coefficient and is estimated by the exponent of the distribution of the number of common neighbors of the nodes, namely $p(m) \sim m^{-\beta}$;
3-3) solving for the angular (similarity) coordinates of the nodes by an approximate maximization method, because maximizing the conditional likelihood function by partial differentiation does not yield an analytic solution for the angular coordinates:
defining the likelihood function of node i:
Figure FDA0002358902100000028
The likelihood value of the whole hyperbolic network is:
Figure FDA0002358902100000031
Figure FDA0002358902100000032
the angular coordinates of the nodes are initialized at random in [0, 2π], and the nodes are visited in sequence; when node i is visited, it is moved to the angular coordinate that makes its likelihood value
Figure FDA0002358902100000033
maximal, namely
Figure FDA0002358902100000034
and it remains fixed at the angular coordinate
Figure FDA0002358902100000035
while the other nodes are visited; moving a node changes the likelihood values of other nodes, but the likelihood value of each node is influenced by the coordinates of its neighbors;
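The sequential angular search of step 3-3) can be illustrated with the Python sketch below. The per-node likelihood appears only as an image in the source, so the sketch assumes a standard Bernoulli log-likelihood built from the connection probability 1/(1 + x^β) with x = r_a + r_b + 2d and the wrap-around angular separation; a grid search over candidate angles stands in for the maximization. Rows of `adj` are treated as outgoing roles and columns as incoming roles, radial coordinates are assumed non-negative, and all function names, `rounds` and `n_grid` are hypothetical.

```python
import math
import random

def link_prob(r_a, r_b, th_a, th_b, beta):
    d = math.pi - abs(math.pi - abs(th_a - th_b))  # wrap-around angular separation
    x = r_a + r_b + 2.0 * d
    return 1.0 / (1.0 + x ** beta)

def node_loglik(row, r_i, th_i, r_other, th_other, beta):
    """Assumed local log-likelihood of one node given its row of the adjacency matrix."""
    total = 0.0
    for a_ij, r_j, th_j in zip(row, r_other, th_other):
        p = link_prob(r_i, r_j, th_i, th_j, beta)
        p = min(max(p, 1e-12), 1.0 - 1e-12)  # keep the logarithms finite
        total += math.log(p) if a_ij else math.log(1.0 - p)
    return total

def sweep(adj, r_row, r_col, th_row, th_col, beta, grid):
    """Visit nodes in order; move each to the candidate angle with the best local value."""
    for i in range(len(th_row)):
        candidates = grid + [th_row[i]]  # the current angle is kept as a candidate
        th_row[i] = max(
            candidates,
            key=lambda t: node_loglik(adj[i], r_row[i], t, r_col, th_col, beta),
        )

def total_loglik(adj, r_out, r_in, th_out, th_in, beta):
    return sum(
        node_loglik(adj[i], r_out[i], th_out[i], r_in, th_in, beta)
        for i in range(len(th_out))
    )

def random_angles(n, seed=0):
    rng = random.Random(seed)
    return [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n)]

def fit_angles(adj, r_out, r_in, th_out, th_in, beta, rounds=3, n_grid=72):
    """Alternate sweeps over outgoing and incoming angles, starting from the given angles."""
    th_out, th_in = list(th_out), list(th_in)
    grid = [2.0 * math.pi * g / n_grid for g in range(n_grid)]
    adj_t = [list(col) for col in zip(*adj)]  # columns become rows for the in-roles
    for _ in range(rounds):
        sweep(adj, r_out, r_in, th_out, th_in, beta, grid)
        sweep(adj_t, r_in, r_out, th_in, th_out, beta, grid)
    return th_out, th_in
```

Because each move only changes the terms that involve the visited node and the current angle is always among the candidates, the total log-likelihood under these assumptions cannot decrease from one move to the next.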
step 4, defining node distance and directed network visualization based on the node space coordinates:
4-1) symmetric hyperbolic distance: nodes with a high $r_{ai}$ tend to form connecting edges with nodes having a high $r_{bj}$; combining $x_{ai,bj}$ and $x_{bj,ai}$ defines a symmetric hyperbolic distance, i.e.
$x_{ij} = x_{ai,bj} + x_{bj,ai}$ (13)
4-2) directed network visualization method based on hyperbolic space coordinates, the distribution of nodes in the space has the following characteristics due to model definition:
4-2-1) the distance of a node from the center represents the popularity r of the node;
4-2-2) the relative positions of the nodes indicate similarity between the nodes, i.e., the closer the spatial positions between the nodes, the more similar the nodes are and the higher the likelihood of linking.
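As an illustration of step 4, the Python sketch below computes a symmetric distance and the position of a node on the disk. Equation (13) is read here as the sum of the distance from the outgoing role of node i to the incoming role of node j and the distance from the outgoing role of node j to the incoming role of node i; this reading, the wrap-around angular separation, the dictionary layout of a node and the function names are all assumptions.

```python
import math

def angular_distance(t1, t2):
    """Wrap-around angular separation in [0, pi] for angles in [0, 2*pi]."""
    return math.pi - abs(math.pi - abs(t1 - t2))

def directed_distance(source, target):
    """x from the out-role of `source` to the in-role of `target` (eq. (2) as written)."""
    r_a, th_a = source["out"]
    r_b, th_b = target["in"]
    return r_a + r_b + 2.0 * angular_distance(th_a, th_b)

def symmetric_distance(u, v):
    """Assumed reading of eq. (13): sum of the two directed distances."""
    return directed_distance(u, v) + directed_distance(v, u)

def disk_position(r, theta):
    """Cartesian position for plotting: distance r from the centre, angle theta."""
    return (r * math.cos(theta), r * math.sin(theta))

# Each node stores (r, theta) for its outgoing and incoming roles.
u = {"out": (1.0, 0.0), "in": (2.0, 1.0)}
v = {"out": (0.5, 3.0), "in": (1.5, 0.5)}
d_uv = symmetric_distance(u, v)
```

The positions returned by `disk_position` can be passed to any plotting library to draw the nodes, so that the distance from the centre shows r and the angular position shows similarity.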
2. The method as claimed in claim 1, wherein in step 1-1), the power exponents of the out-degree and in-degree distributions of the international trade network are 3.5 and 2.5, respectively, and the power exponents of the out-degree and in-degree distributions of the biological neural network are 3.42 and 3, respectively.
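A power exponent such as those quoted above can be estimated, in a rough way, by a straight-line fit of the empirical degree distribution in log-log coordinates and then compared with the threshold of 2.1 from step 1-1). The Python sketch below uses an ordinary least-squares fit, which is a simple choice made here for illustration and not necessarily the estimator used in the application; the function name is hypothetical.

```python
import math
from collections import Counter

def fit_power_law_exponent(degrees):
    """Estimate gamma in p(k) ~ k**(-gamma) by least squares on (ln k, ln p(k))."""
    counts = Counter(k for k in degrees if k > 0)  # zero degrees have no logarithm
    if len(counts) < 2:
        raise ValueError("need at least two distinct positive degrees")
    n = sum(counts.values())
    xs = [math.log(k) for k in counts]
    ys = [math.log(c / n) for c in counts.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    return -slope

# Synthetic degrees whose counts follow k**(-3) exactly for k = 1, 2, 4.
degrees = [1] * 64 + [2] * 8 + [4] * 1
gamma = fit_power_law_exponent(degrees)
meets_condition = gamma > 2.1
```

The same function would be applied separately to the out-degree and in-degree sequences of a directed network.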
CN202010016003.XA 2020-01-08 2020-01-08 Hyperbolic geometry-based directed network space embedding method Withdrawn CN111209611A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010016003.XA CN111209611A (en) 2020-01-08 2020-01-08 Hyperbolic geometry-based directed network space embedding method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010016003.XA CN111209611A (en) 2020-01-08 2020-01-08 Hyperbolic geometry-based directed network space embedding method

Publications (1)

Publication Number Publication Date
CN111209611A true CN111209611A (en) 2020-05-29

Family

ID=70785646

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010016003.XA Withdrawn CN111209611A (en) 2020-01-08 2020-01-08 Hyperbolic geometry-based directed network space embedding method

Country Status (1)

Country Link
CN (1) CN111209611A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170352061A1 (en) * 2016-06-03 2017-12-07 University Of Maryland, College Park Optimal social network ad allocation using hyperbolic embedding
CN109471995A (en) * 2018-10-26 2019-03-15 武汉大学 A kind of hyperbolic embedding grammar of complex network
CN109522953A (en) * 2018-11-13 2019-03-26 北京师范大学 The method classified based on internet startup disk algorithm and CNN to graph structure data
CN109800504A (en) * 2019-01-21 2019-05-24 北京邮电大学 A kind of embedding grammar and device of heterogeneous information network
CN109857457A (en) * 2019-01-29 2019-06-07 中南大学 A kind of function level insertion representation method learnt in source code in the hyperbolic space

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
FRAGKISKOS PAPADOPOULOS ET AL.: "Network Mapping by Replaying Hyperbolic Growth", 《IEEE/ACM TRANSACTIONS ON NETWORKING》 *
ZONGNING WU ET AL.: "A hyperbolic Embedding Model for Directed Networks", 《ARXIV》 *
吴宗柠等: "双曲空间下国际贸易网络建模与分析 ———以小麦国际贸易为例", 《复杂系统与复杂性科学》 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112989189A (en) * 2021-03-08 2021-06-18 武汉大学 Structural hole node searching method based on hyperbolic geometric space
CN115034026A (en) * 2022-06-30 2022-09-09 河南理工大学 Quantitative characterization method for double complex fractal water system network
CN115034026B (en) * 2022-06-30 2023-11-21 河南理工大学 Dual complex fractal water system network quantitative characterization method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20200529
