WO2016138836A1 - Method and apparatus for similarity measurement - Google Patents

Publication number
WO2016138836A1
Authority
WO
WIPO (PCT)
Prior art keywords
matrix
nodes
represented
correction
correction vector
Prior art date
Application number
PCT/CN2016/074728
Other languages
English (en)
French (fr)
Inventor
李震国
成杰峰
范伟
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司
Priority to EP16758452.3A (patent EP3258368B1)
Publication of WO2016138836A1
Priority to US15/694,559 (patent US10579703B2)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/11 Complex mathematical operations for solving equations, e.g. nonlinear equations, general mathematical optimization problems
    • G06F17/12 Simultaneous equations, e.g. systems of linear equations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F7/00 Methods or arrangements for processing data by operating upon the order or content of the data handled
    • G06F7/02 Comparing digital values

Definitions

  • Embodiments of the present invention relate to the field of data processing, and more particularly, to a method and apparatus for similarity metrics.
  • Large graphs and large networks are common representations of data and information, such as social networks, the Internet, e-commerce, communication networks, and so on.
  • Graph-based applications can include retrieval and recommendation.
  • Retrieval is performed, for example, by the Google search engine.
  • Recommendation includes, for example, Facebook friend recommendation, LinkedIn professional recommendation, Netflix movie recommendation, eBay and Amazon product recommendation, and Twitter message recommendation.
  • retrieval and recommendation are based on similarities between nodes in the graph.
  • social networking is an important platform for sharing information between friends.
  • the more friends, the more frequent information sharing and communication. Therefore, an important function of maintaining a social network is to make friend recommendations based on the similarity between nodes.
  • One way to measure the similarity between nodes is to collect various attributes of all nodes, such as age, occupation, income, hobbies, etc., and then measure the similarity between nodes according to the similarity of various attributes.
  • this method not only requires a large amount of customer information to be collected, but also has high storage requirements, and this method may involve the customer's personal privacy information.
  • Another, more effective way to measure the similarity between nodes is SimRank.
  • SimRank has been widely used in various scenarios, such as recommendation systems, information retrieval, link prediction, citation networks, student course networks, and the like.
  • the method based on the SimRank similarity measure in the prior art is directly calculated according to the definition, resulting in high time and space complexity, and is not suitable for a large network.
  • Embodiments of the present invention provide a method for similarity measurement that has low time and space complexity and is suitable for large networks.
  • a method of similarity metrics comprising:
  • the attenuation factor is an attenuation factor defined in a SimRank similarity method, and a dimension of the constraint matrix is n × n;
  • a diagonal element of the diagonal correction matrix is a component of the correction vector, and a dimension of the diagonal correction matrix is n × n;
  • determining the correction vector by iteratively solving the system of linear equations using the Jacobi method includes:
  • the Jacobi method is used to iteratively solve the linear equations, and the solution at the time of convergence is determined as the correction vector, or the solution when the preset maximum number of iterations is reached is determined as the correction vector.
  • the constraint matrix is represented as A
  • the correction vector is represented as x
  • determining the correction vector by iteratively solving the system of linear equations using the Jacobi method includes:
  • x i represents the i-th element of the correction vector x
  • x j represents the j-th element of the correction vector x
  • a ij represents the element of the i-th row and the j-th column of the constraint matrix A
  • the attenuation factor is represented as c
  • the transfer matrix is represented as P
  • the constraint matrix is represented as A
  • the constraint matrix is calculated according to the transition matrix and the attenuation factor, including:
  • e i and e j are orthogonal unit vectors, and t is a preset positive integer.
  • the correction vector is represented as x
  • the diagonal correction matrix is represented as D
  • the generating a diagonal correction matrix according to the correction vector includes:
  • Determining the element D ij of the diagonal correction matrix D is:
  • D ij represents the element of the i-th row and the j-th column of the diagonal correction matrix D
  • the attenuation factor is represented as c
  • the transfer matrix is represented as P
  • the diagonal correction matrix is denoted as D
  • the similarity between the nodes is denoted as S
  • calculating the similarity between the n nodes according to the transfer matrix, the attenuation factor, and the diagonal correction matrix includes:
  • T represents transposition
  • t is a preset positive integer
  • the element s ij of the i-th row and the j-th column of the matrix represented by S represents the similarity between the i-th node and the j-th node.
  • acquiring the pointing relationship between the n nodes in the network and determining the transfer matrix according to the pointing relationship includes:
  • the first-order transfer matrix on the inverse graph of the graph is taken as the transfer matrix.
  • the transfer matrix is represented as P, and
  • P ij represents an element of the i-th row and the j-th column of the transfer matrix P
  • In(j) represents the set of all nodes pointing to the node j
  • E represents the set of node pairs having a pointing relationship.
  • an apparatus for similarity metrics comprising:
  • an obtaining unit, configured to acquire a pointing relationship between every two of n nodes in the network and to obtain an attenuation factor, wherein the attenuation factor is an attenuation factor defined in a SimRank similarity method, and n is a positive integer greater than or equal to 2;
  • a processing unit configured to determine a transfer matrix according to the pointing relationship acquired by the obtaining unit, and calculate a constraint matrix according to the transfer matrix and the attenuation factor acquired by the obtaining unit, where a dimension of the transfer matrix is n × n, and the dimension of the constraint matrix is n × n;
  • the processing unit is further configured to construct a system of linear equations according to the constraint matrix, wherein a coefficient matrix of the system of linear equations is the constraint matrix, and a variable of the system of linear equations is a correction vector;
  • the processing unit is further configured to iteratively solve the system of linear equations by using a Jacobi method to determine the correction vector;
  • the processing unit is further configured to generate a diagonal correction matrix according to the correction vector, wherein a diagonal element of the diagonal correction matrix is a component of the correction vector, and a dimension of the diagonal correction matrix is n × n;
  • the processing unit is further configured to calculate a similarity between the n nodes according to the transition matrix, the diagonal correction matrix, and the attenuation factor acquired by the acquiring unit.
  • the processing unit is specifically configured to:
  • the Jacobi method is used to iteratively solve the linear equations, and the solution at the time of convergence is determined as the correction vector, or the solution when the preset maximum number of iterations is reached is determined as the correction vector.
  • the constraint matrix is represented as A
  • the correction vector is represented as x
  • the processing unit is specifically configured to:
  • x i represents the i-th element of the correction vector x
  • x j represents the j-th element of the correction vector x
  • a ij represents the element of the i-th row and the j-th column of the constraint matrix A
  • the attenuation factor is represented as c
  • the transfer matrix is represented as P
  • the constraint matrix is represented as A
  • the processing unit is specifically configured to:
  • e i and e j are orthogonal unit vectors, and t is a preset positive integer.
  • the correction vector is represented as x
  • the diagonal correction matrix is represented as D
  • the processing unit is specifically configured to:
  • Determining the element D ij of the diagonal correction matrix D is:
  • D ij represents the element of the i-th row and the j-th column of the diagonal correction matrix D
  • the attenuation factor is represented as c
  • the transfer matrix is represented as P
  • the diagonal correction matrix is represented as D
  • the similarity between the n nodes is represented as S
  • the processing unit is specifically configured to:
  • T represents transposition
  • t is a preset positive integer
  • the element s ij of the i-th row and the j-th column of the matrix represented by S represents the similarity between the i-th node and the j-th node.
  • the processing unit is specifically configured to:
  • the first-order transfer matrix on the inverse graph of the graph is taken as the transfer matrix.
  • the transfer matrix is represented as P, and
  • P ij represents an element of the i-th row and the j-th column of the transfer matrix P
  • In(j) represents the set of all nodes pointing to the node j
  • E represents the set of node pairs having a pointing relationship.
  • an apparatus for similarity metrics comprising:
  • a receiver configured to acquire a pointing relationship between every two of n nodes in the network and to obtain an attenuation factor, wherein the attenuation factor is an attenuation factor defined in a SimRank similarity method, and n is a positive integer greater than or equal to 2;
  • a processor configured to determine a transfer matrix according to the pointing relationship acquired by the receiver, and calculate a constraint matrix according to the transfer matrix and the attenuation factor acquired by the receiver, where a dimension of the transfer matrix is n × n, and the dimension of the constraint matrix is n × n;
  • the processor is further configured to construct a system of linear equations according to the constraint matrix, wherein a coefficient matrix of the system of linear equations is the constraint matrix, and a variable of the system of linear equations is a correction vector;
  • the processor is further configured to iteratively solve the system of linear equations by using a Jacobi method to determine the correction vector;
  • the processor is further configured to generate a diagonal correction matrix according to the correction vector, wherein a diagonal element of the diagonal correction matrix is a component of the correction vector, and a dimension of the diagonal correction matrix is n × n;
  • the processor is further configured to calculate a similarity between the n nodes according to the transition matrix, the diagonal correction matrix, and the attenuation factor acquired by the receiver.
  • the processor is specifically configured to:
  • the Jacobi method is used to iteratively solve the linear equations, and the solution at the time of convergence is determined as the correction vector, or the solution when the preset maximum number of iterations is reached is determined as the correction vector.
  • the constraint matrix is represented as A
  • the correction vector is represented as x
  • the processor is specifically configured to:
  • x i represents the i-th element of the correction vector x
  • x j represents the j-th element of the correction vector x
  • a ij represents the element of the i-th row and the j-th column of the constraint matrix A
  • the attenuation factor is represented as c
  • the transfer matrix is represented as P
  • the constraint matrix is represented as A
  • the processor is specifically configured to:
  • e i and e j are orthogonal unit vectors, and t is a preset positive integer.
  • the correction vector is represented as x
  • the diagonal correction matrix is represented as D
  • the processor is specifically configured to:
  • Determining the element D ij of the diagonal correction matrix D is:
  • D ij represents the element of the i-th row and the j-th column of the diagonal correction matrix D
  • the attenuation factor is represented as c
  • the transfer matrix is represented as P
  • the diagonal correction matrix is represented as D
  • the similarity between the n nodes is represented as S
  • the processor is specifically configured to:
  • T represents transposition
  • t is a preset positive integer
  • the element s ij of the i-th row and the j-th column of the matrix represented by S represents the similarity between the i-th node and the j-th node.
  • the processor is specifically configured to:
  • the first-order transfer matrix on the inverse graph of the graph is taken as the transfer matrix.
  • the transfer matrix is represented as P, and
  • P ij represents an element of the i-th row and the j-th column of the transfer matrix P
  • In(j) represents the set of all nodes pointing to the node j
  • E represents the set of node pairs having a pointing relationship.
  • the correction vector is determined by the Jacobi method, and the similarity between the nodes can be further calculated.
  • the elements of the computed correction vector are independent of each other, so that they can be calculated in parallel, thereby enabling the use of computer clusters to effectively reduce computation time and to reduce time complexity and space complexity, so that the method is suitable for large networks.
  • FIG. 1 is a flow chart of a method of similarity metrics in accordance with one embodiment of the present invention.
  • FIG. 2 is a schematic diagram of a graph according to one embodiment of the present invention.
  • FIG. 3 is a structural block diagram of an apparatus for similarity metrics according to an embodiment of the present invention.
  • FIG. 4 is a structural block diagram of an apparatus for similarity metrics according to another embodiment of the present invention.
  • SimRank is a model based on graph topology information to measure the degree of similarity between any two nodes.
  • V is a vertex set representing a set of nodes in the graph
  • E is an arc set representing the set of node pairs having a pointing relationship, that is, E is a subset of V × V.
  • the node has the highest similarity to itself, which is 1; the similarity between the two nodes is the mean of the similarity of the nodes pointing to them, multiplied by an attenuation factor.
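  • The definition above can be illustrated with a minimal sketch that iterates it directly. The function name `naive_simrank`, the `in_neighbors` list layout, and the 3-node graph are hypothetical choices made for illustration; this is the direct, definition-based computation that the text describes as costly, not the method of the embodiments.

```python
def naive_simrank(in_neighbors, c=0.6, iterations=10):
    """Iterate the SimRank definition directly.

    in_neighbors[v] lists the nodes that point to node v (hypothetical layout).
    c is the attenuation factor.
    """
    n = len(in_neighbors)
    # Start from the identity: every node is fully similar to itself.
    sim = [[1.0 if a == b else 0.0 for b in range(n)] for a in range(n)]
    for _ in range(iterations):
        new = [[0.0] * n for _ in range(n)]
        for a in range(n):
            for b in range(n):
                if a == b:
                    new[a][b] = 1.0
                elif in_neighbors[a] and in_neighbors[b]:
                    # Mean similarity over all pairs of in-neighbors, times c.
                    total = sum(sim[u][v]
                                for u in in_neighbors[a]
                                for v in in_neighbors[b])
                    new[a][b] = c * total / (len(in_neighbors[a]) * len(in_neighbors[b]))
        sim = new
    return sim


# Hypothetical 3-node graph: node 0 points to node 1 and to node 2.
example = naive_simrank([[], [0], [0]], c=0.6, iterations=5)
```

  • Every iteration touches all node pairs and all pairs of their in-neighbors, which is why this direct form does not scale to large networks.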
  • the SimRank matrix representation can be:
  • I is the identity matrix
  • the element P ij of the first-order transfer matrix P can be expressed as
  • D is a diagonal matrix and can be called a diagonal correction matrix.
  • S can be decomposed into:
  • the key to calculating the similarity of SimRank is to calculate the diagonal correction matrix D.
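  • For reference, the matrix relations described above are commonly written as follows in the linearized SimRank formulation. This is a reconstruction under that assumption, not a transcription of the source's formulas; ∨ denotes the element-wise maximum, and truncating the series after t terms is assumed to correspond to the preset positive integer t used elsewhere in the text.

```latex
P_{ij} =
\begin{cases}
  \dfrac{1}{|\mathrm{In}(j)|} & \text{if } (i,j) \in E \\[4pt]
  0 & \text{otherwise}
\end{cases}

S = \left( c\, P^{T} S P \right) \vee I

S = c\, P^{T} S P + D

S = \sum_{l=0}^{\infty} c^{l} \left(P^{T}\right)^{l} D\, P^{l}
  \;\approx\; D + c\, P^{T} D P + \dots + c^{t} \left(P^{T}\right)^{t} D\, P^{t}
```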
  • the Gauss-Seidel algorithm is used to calculate the diagonal correction matrix D. The calculation of each step depends on the result of the previous step, which results in long time and low computational efficiency.
  • FIG. 1 is a flow chart of a method of similarity metrics in accordance with one embodiment of the present invention. The method shown in Figure 1 includes:
  • the attenuation factor is an attenuation factor defined in a SimRank similarity method, and a dimension of the constraint matrix is n × n.
  • 105 Generate a diagonal correction matrix according to the correction vector, wherein a diagonal element of the diagonal correction matrix is a component of the correction vector, and a dimension of the diagonal correction matrix is n × n.
  • the correction vector is determined by the Jacobi method, and the similarity between the nodes can be further calculated.
  • the elements of the computed correction vector are independent of each other, so that they can be calculated in parallel, thereby enabling the use of computer clusters to effectively reduce computation time and to reduce time complexity and space complexity, so that the method is suitable for large networks.
  • n can be on the order of millions or even billions.
  • Facebook has more than 2.2 billion registered users, and Facebook users constitute nodes in its network. Therefore, the number of nodes will be greater than 2.2 billion.
  • the n nodes in the 101 may be all nodes in the network, or may be partial nodes in the network.
  • n nodes may refer to all of the more than 2.2 billion registered users, or may refer to the approximately 1 billion users whose gender is female, or may refer to users whose most recent login location is India. The invention is not limited thereto.
  • the specific scenario of the network is not limited in the embodiment of the present invention.
  • the embodiment of the present invention does not limit the manner in which the pointing relationship between the n nodes in the network is obtained.
  • the pointing relationship may be determined according to the mutual attention relationship between the n nodes, or the pointing relationship may be determined according to the call record between the n nodes, and the like.
  • the network in the embodiment of the present invention may be a social network, and nodes in the network may be used to represent users in the social network. Then, the pointing relationship between every two nodes may refer to the relationship between the two corresponding users in the social network.
  • commonly used social networks include Weibo (microblog), WeChat, MiTalk, Facebook, Twitter, and LinkedIn. In a social network such as Weibo, if the user U1 is a follower of the user U2, it can be understood that there is a pointing relationship from the user U1 to the user U2. In a social network such as WeChat, if the user U1 is a follower of the user U2, the user U2 must also be a follower of the user U1; it can be understood that there is a pointing relationship from the user U1 to the user U2 and also a pointing relationship from the user U2 to the user U1.
  • the network in the embodiment of the present invention may be a communication network (as in the Huawei off-network analysis described above), and nodes in the network may be used to represent users in the communication network. Then, the pointing relationship between every two nodes may refer to the relationship between the two corresponding users in the communication network.
  • if the user U1 makes a call to the user U2, it can be understood that there is a pointing relationship from the user U1 to the user U2.
  • the pointing relationship has directivity.
  • the pointing relationship between the node N1 and the node N2 may be: the node N1 points to the node N2; or the node N2 points to the node N1; or the node N1 points to the node N2 and the node N2 points to the node N1.
  • 101 may include: constructing a graph according to the pointing relationship between the n nodes in the network; and using the first-order transfer matrix on the inverse graph of the graph as the transfer matrix.
  • the n nodes constitute nodes in the graph
  • the pointing relationship constitutes a directed edge between nodes in the graph.
  • the constructed graph is a directed graph.
  • the first-order transfer matrix on the inverse graph of the graph is related to the in-degree of each node in the graph, that is, the number of nodes pointing to that node.
  • the number of directed edges pointing to each node can be determined, and the transfer matrix is further calculated based on the number of directed edges pointing to each node.
  • In this example, the graph includes five nodes, N1, N2, N3, N4, and N5, as well as directed edges between the nodes. It can then readily be determined that the number of nodes pointing to node N1 is 2; the number of nodes pointing to node N2 is 1; the number of nodes pointing to node N3 is 3; the number of nodes pointing to node N4 is 1; and the number of nodes pointing to node N5 is 2.
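  • A minimal sketch of building the transfer matrix from directed edges is given below, assuming the element P_ij equals 1/|In(j)| when node i points to node j and 0 otherwise. The edge list is hypothetical: it was chosen only so that the in-degrees match the counts 2, 1, 3, 1, 2 stated above, and it is not the edge set of the figure. The function name `transfer_matrix` is likewise an illustrative choice.

```python
def transfer_matrix(n, edges):
    """Build P with P[i][j] = 1/|In(j)| if (i, j) is an edge, else 0.

    edges is a list of distinct (source, target) pairs, 0-indexed.
    Returns the dense matrix and the in-degree of every node.
    """
    in_degree = [0] * n
    for _, dst in edges:
        in_degree[dst] += 1
    P = [[0.0] * n for _ in range(n)]
    for src, dst in edges:
        P[src][dst] = 1.0 / in_degree[dst]
    return P, in_degree


# Hypothetical edges for nodes N1..N5 (indices 0..4).
edges = [(1, 0), (4, 0), (2, 1), (0, 2), (3, 2), (4, 2), (2, 3), (0, 4), (3, 4)]
P, in_degree = transfer_matrix(5, edges)
```

  • A dense list of lists keeps the sketch short; for a network with millions of nodes a sparse representation keyed by edge would be needed instead.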
  • the transfer matrix is represented as P
  • the attenuation factor is represented as c
  • the constraint matrix is represented as A
  • the correction vector is represented as x
  • the diagonal correction matrix is represented as D
  • the similarity between the nodes is represented as S.
  • the dimension of the transfer matrix P is n × n
  • the dimension of the constraint matrix A is n × n
  • the dimension of the diagonal correction matrix D is n × n
  • the dimension of the similarity S between the nodes is n × n.
  • the dimension of the correction vector x is n, where n is a positive integer.
  • P ij represents an element of the i-th row and the j-th column of the transfer matrix P
  • a ij represents an element of the i-th row and the j-th column of the constraint matrix A
  • x_i represents the i-th element of the correction vector x
  • D_ij represents the element of the i-th row and the j-th column of the diagonal correction matrix D.
  • i, j = 1, 2, ..., n.
  • In(j) represents the set of all nodes pointing to node j
  • V represents the set of nodes in the graph
  • E represents the set of node pairs having a pointing relationship.
  • the pointing relationship between nodes can be determined in the process of constructing the graph. For example, in the aforementioned Huawei off-network analysis, the pointing relationship between nodes can be constructed according to the call records between customers. Suppose customer A corresponds to node A in the graph and customer B corresponds to node B in the graph. Then, if customer A makes a call to customer B, a directed edge from node A to node B can be established when constructing the graph; that is, node A points to node B.
  • the variable of the linear equation is the correction vector.
  • the correction vector in 104 may be determined by iteration based on the initial value of the correction vector.
  • the initial value of the correction vector is an initial correction vector, denoted as x^(0).
  • 104 may include iteratively solving the linear system of equations by using the Jacobi method, and determining the solution at the time of convergence as the correction vector, or determining the solution when the preset maximum number of iterations is reached as the correction vector.
  • A is called the constraint matrix, and the element a_ij of A is obtained by formula (7):
  • a_ij = e_i · e_j + c (P e_i) · (P e_j) + ... + c^t (P^t e_i) · (P^t e_j).
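  • The following sketch shows one way to obtain a constraint matrix for the system Ax = b with b equal to the all-ones vector. It does not transcribe the formula above; instead it derives the coefficients from the requirement that each node's self-similarity in the series D + c P^T D P + ... + c^t (P^T)^t D P^t equal 1, which gives row i as the sum over l of c^l times the element-wise square of P^l e_i. That derivation, and the names `mat_vec` and `constraint_matrix`, are assumptions made for illustration.

```python
def mat_vec(P, v):
    """Dense matrix-vector product P v."""
    return [sum(p * x for p, x in zip(row, v)) for row in P]


def constraint_matrix(P, c, t):
    """Row i is sum_{l=0..t} c^l * (P^l e_i) squared element-wise.

    Assumed derivation: e_i^T S e_i = 1 with S truncated after t terms.
    Each row depends only on its own index i, so rows can be built in parallel.
    """
    n = len(P)
    A = []
    for i in range(n):
        v = [1.0 if k == i else 0.0 for k in range(n)]  # unit vector e_i
        row = [x * x for x in v]                        # l = 0 term
        for l in range(1, t + 1):
            v = mat_vec(P, v)                           # v is now P^l e_i
            row = [r + (c ** l) * x * x for r, x in zip(row, v)]
        A.append(row)
    return A


# Hypothetical 3-node graph: node 0 points to node 1 and to node 2.
P_demo = [[0.0, 1.0, 1.0],
          [0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0]]
A_demo = constraint_matrix(P_demo, c=0.6, t=2)
```

  • For this small graph the system has the solution x = (1, 0.4, 0.4), which makes every self-similarity equal to 1 under the assumed series.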
  • the correction vector can first be initialized to obtain an initialized correction vector x^(0), and the iterative calculation is then performed using the initialized correction vector x^(0).
  • the method for initializing the correction vector is not limited in the embodiment of the present invention, and the value of the initialization correction vector is not limited.
  • a random function can be used for initialization; alternatively, every element of the initialized correction vector can be set to 1.
  • 104 may specifically include calculating the correction vector by the Jacobi update x_i^(k) = (1 / a_ii) (b_i − Σ_{j≠i} a_ij x_j^(k−1)), for i = 1, 2, ..., n.
  • x i represents the i-th element of the correction vector x
  • x j represents the j-th element of the correction vector x
  • a ij represents the element of the i-th row and the j-th column of the constraint matrix A
  • a_ii represents the element of the i-th row and the i-th column of the constraint matrix A
  • b_i = 1
  • the correction vector determined by 104 may be a solution after convergence of the linear equations.
  • the value after the k-th iteration can be taken as the solution of the system of linear equations.
  • the correction vector determined by 104 may be a value when the linear equation system reaches a preset maximum number of iterations.
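  • in the Jacobi method, the k-th iteration depends only on the result of the (k−1)-th iteration, and the elements computed within the k-th iteration do not depend on one another. That is, the calculation of x_i^(k) is related to x_j^(k−1) but not to x_j^(k).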
  • the calculations can be performed in parallel. Thereby, the calculation time can be shortened and the calculation efficiency can be improved.
  • the parallel computing can be performed independently by a plurality of CPUs or in parallel using a high-performance computer cluster, which can fully utilize the resources of the computer cluster, improve the utilization of the computer, and reduce the space complexity and time complexity.
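  • The Jacobi iteration described above can be sketched as follows. The function name `jacobi`, the tolerance-based stopping test, and the 2 × 2 demonstration matrix are assumptions made for illustration; the source only states that the solution at convergence, or at a preset maximum number of iterations, is used.

```python
def jacobi(A, b, x0=None, max_iter=100, tol=1e-10):
    """Solve A x = b by Jacobi iteration.

    Every new component uses only the previous iterate, so the n updates
    inside one sweep are independent and could be computed in parallel.
    """
    n = len(A)
    x = list(x0) if x0 is not None else [1.0] * n  # initialized correction vector
    for _ in range(max_iter):
        x_new = [
            (b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
            for i in range(n)
        ]
        # Stop once successive iterates agree to within the tolerance.
        if max(abs(x_new[i] - x[i]) for i in range(n)) < tol:
            return x_new
        x = x_new
    return x  # value reached at the preset maximum number of iterations


# Hypothetical diagonally dominant system with an all-ones right-hand side.
A_demo = [[2.0, 0.5],
          [0.25, 1.5]]
x_demo = jacobi(A_demo, [1.0, 1.0])
```

  • By contrast, a Gauss-Seidel sweep would reuse components already updated in the same sweep, which forces the updates to run one after another.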
  • 102, 103, and 104 can be performed in parallel.
  • the first row of the constraint matrix A can be calculated first at 102, then a linear equation can be constructed using the first row of the constraint matrix A at 103, and the linear equation is computed at 104.
  • the second row of the constraint matrix A, ..., and the like can be calculated at 102 while calculating 103 and 104.
  • 105 may include determining that the element D_ij of the diagonal correction matrix D is: D_ij = x_i if i = j, and D_ij = 0 if i ≠ j.
  • D ij represents an element of the i-th row and the j-th column of the diagonal correction matrix D
  • the dimension of the diagonal correction matrix D is n × n, and n is a positive integer.
  • that is, the diagonal correction matrix D is: D = diag(x_1, x_2, ..., x_n).
  • the diagonal correction matrix D calculated by the Jacobi method can then be used to calculate the similarity between the nodes by using formula (5). That is, 106 may include calculating the similarity between the nodes according to the following formula:
  • T is a transpose and t is a preset positive integer.
  • the element s ij of the i-th row and the j-th column of the matrix S represents the similarity between the i-th node and the j-th node.
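  • As an illustration, the sketch below evaluates the similarity matrix as the truncated series S = D + c P^T D P + ... + c^t (P^T)^t D P^t. Reading the similarity formula this way is an assumption based on the symbols defined above (T for transposition, t a preset positive integer); the helper names and the 3-node inputs are hypothetical.

```python
def transpose(X):
    """Transpose a dense matrix given as a list of rows."""
    return [list(col) for col in zip(*X)]


def mat_mul(X, Y):
    """Dense matrix product X Y."""
    cols = list(zip(*Y))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in X]


def similarity_matrix(P, c, x, t):
    """Sum c^l (P^T)^l D P^l for l = 0..t, with D = diag(x)."""
    n = len(P)
    term = [[x[i] if i == j else 0.0 for j in range(n)] for i in range(n)]  # D
    S = [row[:] for row in term]
    Pt = transpose(P)
    for l in range(1, t + 1):
        term = mat_mul(mat_mul(Pt, term), P)  # (P^T)^l D P^l
        S = [[s + (c ** l) * m for s, m in zip(s_row, m_row)]
             for s_row, m_row in zip(S, term)]
    return S


# Hypothetical inputs: node 0 points to nodes 1 and 2; x is a correction
# vector consistent with self-similarity 1 for this graph.
P_demo = [[0.0, 1.0, 1.0],
          [0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0]]
S_demo = similarity_matrix(P_demo, c=0.6, x=[1.0, 0.4, 0.4], t=2)
```

  • Forming the full n × n matrix is only practical for small n; for large networks the per-pair and per-source computations avoid materializing S.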
  • the similarity between each of the n nodes can be calculated. That is, the similarity between the two nodes can be calculated.
  • the i-th node among the n nodes is the node i
  • the j-th node among the n nodes is the node j.
  • the similarity between node i and node j can be achieved by the following code 1 (Algorithm 1), which can be called SinglePairSimRank(i,j):
  • the calculation can be performed by the above Algorithm 1. Moreover, if the number of directed edges in the network is assumed to be Q, the time complexity of SinglePairSimRank is O(MQ) and the space complexity is O(Q).
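  • The listing of Algorithm 1 is not reproduced here. As a stand-in, the following deterministic sketch computes a single-pair similarity as s_ij = Σ_{l=0..t} c^l (P^l e_i)^T D (P^l e_j), which follows from the truncated series form of S assumed for illustration. It is not the source's SinglePairSimRank, it uses dense rows, and it therefore does not achieve the complexity stated above; the function names are hypothetical.

```python
def mat_vec(P, v):
    """Dense matrix-vector product P v."""
    return [sum(p * x for p, x in zip(row, v)) for row in P]


def single_pair_similarity(P, c, x, t, i, j):
    """Similarity of nodes i and j without forming the full matrix S.

    x holds the diagonal of the correction matrix D.
    """
    n = len(P)
    u = [1.0 if k == i else 0.0 for k in range(n)]  # P^0 e_i
    v = [1.0 if k == j else 0.0 for k in range(n)]  # P^0 e_j
    total = 0.0
    for l in range(t + 1):
        # (P^l e_i)^T D (P^l e_j), weighted by c^l.
        total += (c ** l) * sum(a * d * b for a, d, b in zip(u, x, v))
        u = mat_vec(P, u)
        v = mat_vec(P, v)
    return total


# Hypothetical inputs: node 0 points to nodes 1 and 2.
P_demo = [[0.0, 1.0, 1.0],
          [0.0, 0.0, 0.0],
          [0.0, 0.0, 0.0]]
s_12 = single_pair_similarity(P_demo, 0.6, [1.0, 0.4, 0.4], 2, 1, 2)
s_11 = single_pair_similarity(P_demo, 0.6, [1.0, 0.4, 0.4], 2, 1, 1)
```

  • Only two length-n vectors are carried from step to step, which is the reason a per-pair query needs far less memory than the full similarity matrix.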
  • the i-th node among the n nodes is the node i, and for any node i, the similarity between the node i and every other node (that is, each of the n−1 nodes other than the node i among the n nodes) can be calculated.
  • this can be achieved by the following code 2 (Algorithm 2), which can be called SingleSourceSimRank(i):
  • the similarity between all the nodes may be calculated by using the above code 2 (Algorithm 2), and may be implemented by the following code 3 (Algorithm 3), which may be called AllPairsSimRank:
  • the similarity between all nodes can be calculated through Algorithm 3, and then it can be determined which type of information is recommended for each customer separately. Moreover, if it is assumed that the number of directed edges in the network is Q, the time complexity of AllPairsSimRank is O(M^2 Q n), and the space complexity is O(Q).
  • FIG. 3 is a structural block diagram of an apparatus for similarity metrics according to an embodiment of the present invention.
  • the apparatus 200 shown in FIG. 3 includes an acquisition unit 201 and a processing unit 202.
  • the obtaining unit 201 is configured to acquire a pointing relationship between the n nodes in the network, and obtain an attenuation factor, where the attenuation factor is an attenuation factor defined in the SimRank similarity method, and n is a positive integer greater than or equal to 2;
  • the processing unit 202 is configured to determine a transfer matrix according to the pointing relationship acquired by the obtaining unit 201, and calculate a constraint matrix according to the transfer matrix and the attenuation factor acquired by the obtaining unit 201, where the dimension of the transfer matrix is n × n, and the dimension of the constraint matrix is n × n;
  • the processing unit 202 is further configured to construct a system of linear equations according to the constraint matrix, where a coefficient matrix of the system of linear equations is the constraint matrix, and a variable of the system of linear equations is a correction vector;
  • the processing unit 202 is further configured to iteratively solve the system of linear equations by using a Jacobi method to determine the correction vector;
  • the processing unit 202 is further configured to generate a diagonal correction matrix according to the correction vector, wherein a diagonal element of the diagonal correction matrix is a component of the correction vector, and a dimension of the diagonal correction matrix is n × n;
  • the processing unit 202 is further configured to calculate a similarity between the n nodes according to the transition matrix, the diagonal correction matrix, and the attenuation factor acquired by the acquiring unit 201.
  • the correction vector is determined by the Jacobi method, and the similarity between the nodes can be further calculated.
  • the elements of the computed correction vector are independent of each other, so that they can be calculated in parallel, thereby enabling the use of computer clusters to effectively reduce computation time, reduce computational time complexity and space complexity, and Suitable for large networks.
  • Specifically, in this embodiment of the present invention, the transition matrix is denoted P, the attenuation factor c, the constraint matrix A, the correction vector x, the diagonal correction matrix D, and the similarity between the nodes S.
  • The dimensions of the transition matrix P, the constraint matrix A, the diagonal correction matrix D, and the similarity S between the nodes are all n × n, and the dimension of the correction vector x is n, where n is a positive integer related to the number of nodes.
  • Correspondingly, $P_{ij}$ denotes the element in row i and column j of the transition matrix P, $a_{ij}$ denotes the element in row i and column j of the constraint matrix A, $x_i$ denotes the i-th element of the correction vector x, and $D_{ij}$ denotes the element in row i and column j of the diagonal correction matrix D, where $i, j = 1, 2, \dots, n$.
  • Optionally, in one embodiment, the constraint matrix is denoted A, the correction vector is denoted x, and the system of linear equations is written $Ax = b$. The processing unit 202 is specifically configured to solve the system $Ax = b$ using the Jacobi method, where b is a vector whose elements are all 1.
  • When iteratively solving the system of linear equations using the Jacobi method to determine the correction vector, the processing unit 202 is specifically configured to take as the correction vector either the solution at convergence or the solution obtained when a preset maximum number of iterations is reached.
  • Optionally, in another embodiment, the processing unit 202 is specifically configured to calculate the correction vector by $x_i^{(k)} = \frac{1}{a_{ii}}\bigl(b_i - \sum_{j \neq i} a_{ij}\,x_j^{(k-1)}\bigr)$, where $x_i$ denotes the i-th element of the correction vector x, $x_j$ denotes the j-th element of the correction vector x, $a_{ij}$ denotes the element in row i and column j of the constraint matrix A, $a_{ii}$ denotes the element in row i and column i of the constraint matrix A, $b_i = 1$, k denotes the iteration number of the Jacobi method, and $i, j = 1, 2, \dots, n$.
  • Optionally, in another embodiment, the attenuation factor is denoted c, the transition matrix P, and the constraint matrix A. The processing unit 202 is specifically configured to determine the elements of the constraint matrix A as $a_{ij} = e_i \cdot e_j + c\,Pe_i \cdot Pe_j + \dots + c^{t}P^{t}e_i \cdot P^{t}e_j$, where $e_i$ and $e_j$ are orthogonal unit vectors and t is a preset positive integer.
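  • Optionally, in another embodiment, the correction vector is denoted x and the diagonal correction matrix D. The processing unit 202 is specifically configured to determine the elements of D as $D_{ij} = x_i$ if $i = j$ and $D_{ij} = 0$ otherwise.
  • Optionally, in another embodiment, the attenuation factor is denoted c, the transition matrix P, the diagonal correction matrix D, and the similarity between the nodes S. The processing unit 202 is specifically configured to calculate the similarity between the nodes according to the formula $S = D + cP^{T}DP + c^{2}(P^{T})^{2}DP^{2} + \dots + c^{t}(P^{T})^{t}DP^{t}$, where T denotes the transpose, t is a preset positive integer, and the element $s_{ij}$ in row i and column j of the matrix S is the similarity between the i-th node and the j-th node.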
  • Optionally, in another embodiment, the processing unit 202 is specifically configured to: construct a graph according to the pointing relationship, acquired by the acquisition unit 201, between every two of the n nodes in the network, where the n nodes constitute the n nodes of the graph and the pointing relationships constitute the directed edges between nodes of the graph; and use the first-order transition matrix on the reverse graph of the graph as the transition matrix.
  • Optionally, in another embodiment, the transition matrix is denoted P, with $P_{ij} = 1/|\mathrm{In}(j)|$ if $(i, j) \in E$ and $P_{ij} = 0$ otherwise, where $P_{ij}$ denotes the element in row i and column j of the transition matrix P, $\mathrm{In}(j)$ denotes the set of all nodes pointing to node j, and E denotes the set of node pairs that have a pointing relationship.
  • The device 200 may be a server for processing data; for example, it can be a server of a social network.
  • The device 200 can be used to implement the method in the foregoing embodiment of FIG. 1. To avoid repetition, details are not described here again.
  • The device 300 shown in FIG. 4 includes a processor 301, a receiver 302, a transmitter 303, and a memory 304.
  • The receiver 302 is configured to acquire the pointing relationship between every two of n nodes in a network, and to acquire an attenuation factor, where the attenuation factor is the attenuation factor defined in the SimRank similarity method and n is a positive integer greater than or equal to 2.
  • The processor 301 is configured to determine a transition matrix according to the pointing relationship acquired by the receiver 302, and to calculate a constraint matrix according to the transition matrix and the attenuation factor acquired by the receiver 302, where the dimension of the transition matrix is n × n and the dimension of the constraint matrix is n × n.
  • The processor 301 is further configured to construct a system of linear equations according to the constraint matrix, where the coefficient matrix of the system of linear equations is the constraint matrix and the variable of the system of linear equations is a correction vector.
  • The processor 301 is further configured to iteratively solve the system of linear equations using the Jacobi method to determine the correction vector.
  • The processor 301 is further configured to generate a diagonal correction matrix according to the correction vector, where the diagonal elements of the diagonal correction matrix are the components of the correction vector and the dimension of the diagonal correction matrix is n × n.
  • The processor 301 is further configured to calculate the similarity between the n nodes according to the transition matrix, the diagonal correction matrix, and the attenuation factor acquired by the receiver 302.
  • In this embodiment of the present invention, the correction vector is determined by the Jacobi method, and the similarity between the nodes can then be calculated. In each iteration of the Jacobi method, the elements of the correction vector are computed independently of one another, so they can be computed in parallel. A computer cluster can therefore be used to reduce computation time effectively, the time and space complexity of the computation are reduced, and the method is suitable for large networks.
  • The components of the device 300 are coupled together by a bus system 305, which includes, in addition to a data bus, a power bus, a control bus, and a status signal bus.
  • The processor 301 may be an integrated circuit chip with signal processing capability. In implementation, each step of the foregoing method may be completed by an integrated logic circuit of hardware in the processor 301 or by instructions in the form of software.
  • The processor 301 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • The general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
  • The steps of the method disclosed in the embodiments of the present invention may be executed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor.
  • The software module can be located in a conventional storage medium such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or a register.
  • The storage medium is located in the memory 304; the processor 301 reads the information in the memory 304 and completes the steps of the above method in combination with its hardware.
  • The memory 304 in the embodiments of the present invention may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory.
  • The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory.
  • The volatile memory may be a random access memory (RAM), which acts as an external cache.
  • Many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), and synchronous dynamic random access memory (SDRAM).
  • The memory 304 of the systems and methods described herein is intended to include, without being limited to, these and any other suitable types of memory.
  • The embodiments described herein can be implemented in hardware, software, firmware, middleware, microcode, or a combination thereof.
  • For a hardware implementation, the processing unit can be implemented in one or more application-specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), field-programmable gate arrays (FPGAs), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units for performing the functions described herein, or a combination thereof.
  • Code segments can represent procedures, functions, subprograms, programs, routines, subroutines, modules, software packages, classes, or any combination of instructions, data structures, or program statements.
  • A code segment can be coupled to another code segment or a hardware circuit by transmitting and/or receiving information, data, arguments, parameters, or memory contents.
  • Information, arguments, parameters, data, and the like can be passed, forwarded, or transmitted using any suitable means, including memory sharing, message passing, token passing, and network transmission.
  • The techniques described herein can be implemented by modules (e.g., procedures, functions, and so on) that perform the functions described herein.
  • The software code can be stored in a memory unit and executed by the processor.
  • The memory unit can be implemented in the processor or external to the processor; in the latter case the memory unit can be communicatively coupled to the processor by various means known in the art.
  • Specifically, in this embodiment of the present invention, the transition matrix is denoted P, the attenuation factor c, the constraint matrix A, the correction vector x, the diagonal correction matrix D, and the similarity between the nodes S.
  • The dimensions of the transition matrix P, the constraint matrix A, the diagonal correction matrix D, and the similarity S between the nodes are all n × n, and the dimension of the correction vector x is n, where n is a positive integer related to the number of nodes.
  • Correspondingly, $P_{ij}$ denotes the element in row i and column j of the transition matrix P, $a_{ij}$ denotes the element in row i and column j of the constraint matrix A, $x_i$ denotes the i-th element of the correction vector x, and $D_{ij}$ denotes the element in row i and column j of the diagonal correction matrix D, where $i, j = 1, 2, \dots, n$.
  • Optionally, in one embodiment, the constraint matrix is denoted A, the correction vector is denoted x, and the system of linear equations is written $Ax = b$. The processor 301 is specifically configured to solve the system $Ax = b$ using the Jacobi method, where b is a vector whose elements are all 1.
  • The processor 301 is specifically configured to iteratively solve the system of linear equations using the Jacobi method, and to take as the correction vector either the solution at convergence or the solution obtained when a preset maximum number of iterations is reached.
  • Optionally, in another embodiment, the processor 301 is specifically configured to calculate the correction vector by $x_i^{(k)} = \frac{1}{a_{ii}}\bigl(b_i - \sum_{j \neq i} a_{ij}\,x_j^{(k-1)}\bigr)$, where $x_i$ denotes the i-th element of the correction vector x, $x_j$ denotes the j-th element of the correction vector x, $a_{ij}$ denotes the element in row i and column j of the constraint matrix A, $a_{ii}$ denotes the element in row i and column i of the constraint matrix A, $b_i = 1$, and k denotes the iteration number of the Jacobi method.
  • Optionally, in another embodiment, the attenuation factor is denoted c, the transition matrix P, and the constraint matrix A. The processor 301 is specifically configured to determine the elements of the constraint matrix A as $a_{ij} = e_i \cdot e_j + c\,Pe_i \cdot Pe_j + \dots + c^{t}P^{t}e_i \cdot P^{t}e_j$, where $e_i$ and $e_j$ are orthogonal unit vectors and t is a preset positive integer.
  • Optionally, in another embodiment, the correction vector is denoted x and the diagonal correction matrix D. The processor 301 is specifically configured to determine the elements of D as $D_{ij} = x_i$ if $i = j$ and $D_{ij} = 0$ otherwise.
  • Optionally, in another embodiment, the attenuation factor is denoted c, the transition matrix P, the diagonal correction matrix D, and the similarity between the nodes S. The processor 301 is specifically configured to calculate the similarity between the nodes according to the formula $S = D + cP^{T}DP + c^{2}(P^{T})^{2}DP^{2} + \dots + c^{t}(P^{T})^{t}DP^{t}$, where T denotes the transpose, t is a preset positive integer, and the element $s_{ij}$ in row i and column j of the matrix S is the similarity between the i-th node and the j-th node.
  • Optionally, in another embodiment, the processor 301 is specifically configured to: construct a graph according to the acquired pointing relationship between every two of the n nodes in the network, where the n nodes constitute the n nodes of the graph and the pointing relationships constitute the directed edges between nodes of the graph; and use the first-order transition matrix on the reverse graph of the graph as the transition matrix.
  • Optionally, in another embodiment, the transition matrix is denoted P, with $P_{ij} = 1/|\mathrm{In}(j)|$ if $(i, j) \in E$ and $P_{ij} = 0$ otherwise, where $P_{ij}$ denotes the element in row i and column j of the transition matrix P, $\mathrm{In}(j)$ denotes the set of all nodes pointing to node j, and E denotes the set of node pairs that have a pointing relationship.
  • The transmitter 303 can be used to output the similarity values calculated by the processor 301; for example, they can be output to the display screen of the device 300, or to another apparatus or device connected to the device 300.
  • The memory 304 can be used to store preset values required for the calculation (such as the values of c and t), to store code executed by the processor 301 (for example, Algorithm 1, Algorithm 2, and Algorithm 3 in the embodiment shown in FIG. 1), and to store intermediate results of the calculation and the like.
  • The device 300 may be a server for processing data; for example, it can be a server of a social network.
  • The device 300 can be used to implement the method in the foregoing embodiment of FIG. 1. To avoid repetition, details are not described here again.
  • The disclosed systems, devices, and methods may be implemented in other manners.
  • The device embodiments described above are merely illustrative.
  • For example, the division into units is only a division by logical function. In actual implementation there may be other ways of dividing them: multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed.
  • The mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device, or unit, and may be electrical, mechanical, or of another form.
  • The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • The functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
  • If the functions are implemented in the form of a software functional unit and sold or used as a standalone product, they may be stored in a computer-readable storage medium.
  • The technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product that is stored in a storage medium and includes several instructions used to cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention.
  • The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Mathematical Analysis (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computational Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Algebra (AREA)
  • Software Systems (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Computing Systems (AREA)
  • Operations Research (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Computational Linguistics (AREA)
  • Complex Calculations (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

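A method for similarity measurement, including: acquiring the pointing relationships between nodes in a network and determining a transition matrix (101); calculating a constraint matrix according to the transition matrix and an acquired attenuation factor (102); constructing a system of linear equations, where the coefficient matrix of the system of linear equations is the constraint matrix and the variable of the system of linear equations is a correction vector (103); iteratively solving the system of linear equations by the Jacobi method to determine the correction vector (104); and calculating the similarity between the nodes according to the transition matrix, the attenuation factor, and a diagonal correction matrix generated from the correction vector (106). The method uses the Jacobi method to determine the correction vector, from which the similarity between nodes can then be calculated. In each iteration of the Jacobi method, the elements of the correction vector are computed independently of one another, so they can be computed in parallel; a computer cluster can therefore be used to reduce computation time effectively, the time and space complexity of the computation are reduced, and the method is applicable to large networks.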

Description

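Method and Device for Similarity Measurement
Technical Field
Embodiments of the present invention relate to the field of data processing, and more specifically, to a method and a device for similarity measurement.
Background
In today's big-data Internet era, large graphs and large networks are common ways of representing data and information, for example social networks, the Internet, e-commerce, and communication networks. Graph-based applications include retrieval and recommendation. Retrieval is exemplified by the Google search engine. Recommendation is exemplified by Facebook friend recommendation, LinkedIn job recommendation, Netflix movie recommendation, eBay and Amazon product recommendation, Twitter message recommendation, and so on. In general, both retrieval and recommendation are based on the similarity between nodes in a graph.
For example, a social network is an important platform for sharing information among friends. The more friends a user has, the more frequently information is shared and exchanged. An important function in maintaining a social network is therefore to recommend friends according to the similarity between nodes.
As another example, in Huawei's churn (off-network) analysis, suppose customer A abandons China Unicom services and switches to China Mobile services. China Unicom then needs to identify the customers most "similar" to customer A, treat them as customers who may potentially be lost, and pay close attention to them.
One way to measure the similarity between nodes is to collect various attributes of all nodes, such as age, occupation, income, and hobbies, and then measure the similarity between nodes according to the similarity of those attributes. However, this approach requires collecting a large amount of customer information and places high demands on storage, and it may also involve customers' private personal information.
Another, more effective, way to measure the similarity between nodes is SimRank. SimRank is already widely used in various scenarios, for example recommendation systems, information retrieval, link prediction, citation networks, and student-course networks. However, the SimRank-based similarity measurement methods in the prior art compute similarity directly from the definition, which leads to high time and space complexity and makes them unsuitable for large networks.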
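Summary
Embodiments of the present invention provide a method for similarity measurement that has low time and space complexity and is suitable for large networks.
According to a first aspect, a method for similarity measurement is provided, including:
acquiring the pointing relationship between every two of n nodes in a network, and determining a transition matrix according to the pointing relationship, where the dimension of the transition matrix is n × n and n is a positive integer greater than or equal to 2;
acquiring an attenuation factor, and calculating a constraint matrix according to the transition matrix and the attenuation factor, where the attenuation factor is the attenuation factor defined in the SimRank similarity method and the dimension of the constraint matrix is n × n;
constructing a system of linear equations according to the constraint matrix, where the coefficient matrix of the system of linear equations is the constraint matrix and the variable of the system of linear equations is a correction vector;
iteratively solving the system of linear equations using the Jacobi method to determine the correction vector;
generating a diagonal correction matrix according to the correction vector, where the diagonal elements of the diagonal correction matrix are the components of the correction vector and the dimension of the diagonal correction matrix is n × n; and
calculating the similarity between the n nodes according to the transition matrix, the attenuation factor, and the diagonal correction matrix.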
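With reference to the first aspect, in a first possible implementation of the first aspect, the iteratively solving the system of linear equations using the Jacobi method to determine the correction vector includes:
iteratively solving the system of linear equations using the Jacobi method, and determining the solution at convergence as the correction vector, or determining the solution obtained when a preset maximum number of iterations is reached as the correction vector.
With reference to the first aspect or the first possible implementation of the first aspect, in a second possible implementation of the first aspect, the constraint matrix is denoted A, the correction vector is denoted x, and the system of linear equations is written $Ax = b$,
where b is a vector whose elements are all 1.
With reference to the second possible implementation of the first aspect, in a third possible implementation of the first aspect, the iteratively solving the system of linear equations using the Jacobi method to determine the correction vector includes:
calculating the correction vector by
$$x_i^{(k)} = \frac{1}{a_{ii}}\Bigl(b_i - \sum_{j \neq i} a_{ij}\,x_j^{(k-1)}\Bigr)$$
where $x_i$ denotes the i-th element of the correction vector x, $x_j$ denotes the j-th element of the correction vector x, $a_{ij}$ denotes the element in row i and column j of the constraint matrix A, $a_{ii}$ denotes the element in row i and column i of the constraint matrix A, $b_i = 1$, k denotes the iteration number of the Jacobi method, $i, j = 1, 2, \dots, n$, and k is a positive integer.
With reference to the first aspect or any one of the foregoing possible implementations of the first aspect, in a fourth possible implementation of the first aspect, the attenuation factor is denoted c, the transition matrix is denoted P, the constraint matrix is denoted A, and the calculating a constraint matrix according to the transition matrix and the attenuation factor includes:
determining the elements of the constraint matrix A as $a_{ij} = e_i \cdot e_j + c\,Pe_i \cdot Pe_j + \dots + c^{t}P^{t}e_i \cdot P^{t}e_j$,
where $e_i$ and $e_j$ are orthogonal unit vectors and t is a preset positive integer.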
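With reference to the first aspect or any one of the foregoing possible implementations of the first aspect, in a fifth possible implementation of the first aspect, the correction vector is denoted x, the diagonal correction matrix is denoted D, and the generating a diagonal correction matrix according to the correction vector includes:
determining the elements $D_{ij}$ of the diagonal correction matrix D as:
$$D_{ij} = \begin{cases} x_i, & i = j \\ 0, & i \neq j \end{cases}$$
where $D_{ij}$ denotes the element in row i and column j of the diagonal correction matrix D, $x_i$ denotes the i-th element of the correction vector x, and $i, j = 1, 2, \dots, n$.
With reference to the first aspect or any one of the foregoing possible implementations of the first aspect, in a sixth possible implementation of the first aspect, the attenuation factor is denoted c, the transition matrix is denoted P, the diagonal correction matrix is denoted D, the similarity between the nodes is denoted S, and the calculating the similarity between the n nodes according to the transition matrix, the attenuation factor, and the diagonal correction matrix includes:
calculating the similarity between the n nodes according to the following formula:
$$S = D + cP^{T}DP + c^{2}(P^{T})^{2}DP^{2} + \dots + c^{t}(P^{T})^{t}DP^{t}$$
where T denotes the transpose, t is a preset positive integer, and the element $s_{ij}$ in row i and column j of the matrix S is the similarity between the i-th node and the j-th node.
With reference to the first aspect or any one of the foregoing possible implementations of the first aspect, in a seventh possible implementation of the first aspect, the acquiring the pointing relationship between every two of n nodes in a network and determining a transition matrix according to the pointing relationship includes:
constructing a graph according to the pointing relationship between every two of the n nodes in the network, where the n nodes constitute the n nodes of the graph and the pointing relationships constitute the directed edges between nodes of the graph; and
using the first-order transition matrix on the reverse graph of the graph as the transition matrix.
With reference to the first aspect or any one of the foregoing possible implementations of the first aspect, in an eighth possible implementation of the first aspect, the transition matrix is denoted P, and
$$P_{ij} = \begin{cases} \dfrac{1}{|\mathrm{In}(j)|}, & (i, j) \in E \\ 0, & \text{otherwise} \end{cases}$$
where $P_{ij}$ denotes the element in row i and column j of the transition matrix P, $\mathrm{In}(j)$ denotes the set of all nodes pointing to node j, and E denotes the set of node pairs that have a pointing relationship.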
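According to a second aspect, a device for similarity measurement is provided, including:
an acquisition unit, configured to acquire the pointing relationship between every two of n nodes in a network, and to acquire an attenuation factor, where the attenuation factor is the attenuation factor defined in the SimRank similarity method and n is a positive integer greater than or equal to 2; and
a processing unit, configured to determine a transition matrix according to the pointing relationship acquired by the acquisition unit, and to calculate a constraint matrix according to the transition matrix and the attenuation factor acquired by the acquisition unit, where the dimension of the transition matrix is n × n and the dimension of the constraint matrix is n × n;
the processing unit being further configured to construct a system of linear equations according to the constraint matrix, where the coefficient matrix of the system of linear equations is the constraint matrix and the variable of the system of linear equations is a correction vector;
the processing unit being further configured to iteratively solve the system of linear equations using the Jacobi method to determine the correction vector;
the processing unit being further configured to generate a diagonal correction matrix according to the correction vector, where the diagonal elements of the diagonal correction matrix are the components of the correction vector and the dimension of the diagonal correction matrix is n × n; and
the processing unit being further configured to calculate the similarity between the n nodes according to the transition matrix, the diagonal correction matrix, and the attenuation factor acquired by the acquisition unit.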
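With reference to the second aspect, in a first possible implementation of the second aspect, the processing unit is specifically configured to:
iteratively solve the system of linear equations using the Jacobi method, and determine the solution at convergence as the correction vector, or determine the solution obtained when a preset maximum number of iterations is reached as the correction vector.
With reference to the second aspect or the first possible implementation of the second aspect, in a second possible implementation of the second aspect, the constraint matrix is denoted A, the correction vector is denoted x, and the system of linear equations is written $Ax = b$,
where b is a vector whose elements are all 1.
With reference to the second possible implementation of the second aspect, in a third possible implementation of the second aspect, the processing unit is specifically configured to:
calculate the correction vector by
$$x_i^{(k)} = \frac{1}{a_{ii}}\Bigl(b_i - \sum_{j \neq i} a_{ij}\,x_j^{(k-1)}\Bigr)$$
where $x_i$ denotes the i-th element of the correction vector x, $x_j$ denotes the j-th element of the correction vector x, $a_{ij}$ denotes the element in row i and column j of the constraint matrix A, $a_{ii}$ denotes the element in row i and column i of the constraint matrix A, $b_i = 1$, k denotes the iteration number of the Jacobi method, $i, j = 1, 2, \dots, n$, and k is a positive integer.
With reference to the second aspect or any one of the foregoing possible implementations of the second aspect, in a fourth possible implementation of the second aspect, the attenuation factor is denoted c, the transition matrix is denoted P, the constraint matrix is denoted A, and the processing unit is specifically configured to:
determine the elements of the constraint matrix A as $a_{ij} = e_i \cdot e_j + c\,Pe_i \cdot Pe_j + \dots + c^{t}P^{t}e_i \cdot P^{t}e_j$,
where $e_i$ and $e_j$ are orthogonal unit vectors and t is a preset positive integer.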
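With reference to the second aspect or any one of the foregoing possible implementations of the second aspect, in a fifth possible implementation of the second aspect, the correction vector is denoted x, the diagonal correction matrix is denoted D, and the processing unit is specifically configured to:
determine the elements $D_{ij}$ of the diagonal correction matrix D as:
$$D_{ij} = \begin{cases} x_i, & i = j \\ 0, & i \neq j \end{cases}$$
where $D_{ij}$ denotes the element in row i and column j of the diagonal correction matrix D, $x_i$ denotes the i-th element of the correction vector x, and $i, j = 1, 2, \dots, n$.
With reference to the second aspect or any one of the foregoing possible implementations of the second aspect, in a sixth possible implementation of the second aspect, the attenuation factor is denoted c, the transition matrix is denoted P, the diagonal correction matrix is denoted D, the similarity between the n nodes is denoted S, and the processing unit is specifically configured to:
calculate the similarity between the nodes according to the following formula:
$$S = D + cP^{T}DP + c^{2}(P^{T})^{2}DP^{2} + \dots + c^{t}(P^{T})^{t}DP^{t}$$
where T denotes the transpose, t is a preset positive integer, and the element $s_{ij}$ in row i and column j of the matrix S is the similarity between the i-th node and the j-th node.
With reference to the second aspect or any one of the foregoing possible implementations of the second aspect, in a seventh possible implementation of the second aspect, the processing unit is specifically configured to:
construct a graph according to the pointing relationship, acquired by the acquisition unit, between every two of the n nodes in the network, where the n nodes constitute the n nodes of the graph and the pointing relationships constitute the directed edges between nodes of the graph; and
use the first-order transition matrix on the reverse graph of the graph as the transition matrix.
With reference to the second aspect or any one of the foregoing possible implementations of the second aspect, in an eighth possible implementation of the second aspect, the transition matrix is denoted P, and
$$P_{ij} = \begin{cases} \dfrac{1}{|\mathrm{In}(j)|}, & (i, j) \in E \\ 0, & \text{otherwise} \end{cases}$$
where $P_{ij}$ denotes the element in row i and column j of the transition matrix P, $\mathrm{In}(j)$ denotes the set of all nodes pointing to node j, and E denotes the set of node pairs that have a pointing relationship.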
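According to a third aspect, a device for similarity measurement is provided, including:
a receiver, configured to acquire the pointing relationship between every two of n nodes in a network, and to acquire an attenuation factor, where the attenuation factor is the attenuation factor defined in the SimRank similarity method and n is a positive integer greater than or equal to 2; and
a processor, configured to determine a transition matrix according to the pointing relationship acquired by the receiver, and to calculate a constraint matrix according to the transition matrix and the attenuation factor acquired by the receiver, where the dimension of the transition matrix is n × n and the dimension of the constraint matrix is n × n;
the processor being further configured to construct a system of linear equations according to the constraint matrix, where the coefficient matrix of the system of linear equations is the constraint matrix and the variable of the system of linear equations is a correction vector;
the processor being further configured to iteratively solve the system of linear equations using the Jacobi method to determine the correction vector;
the processor being further configured to generate a diagonal correction matrix according to the correction vector, where the diagonal elements of the diagonal correction matrix are the components of the correction vector and the dimension of the diagonal correction matrix is n × n; and
the processor being further configured to calculate the similarity between the n nodes according to the transition matrix, the diagonal correction matrix, and the attenuation factor acquired by the receiver.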
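With reference to the third aspect, in a first possible implementation of the third aspect, the processor is specifically configured to:
iteratively solve the system of linear equations using the Jacobi method, and determine the solution at convergence as the correction vector, or determine the solution obtained when a preset maximum number of iterations is reached as the correction vector.
With reference to the third aspect or the first possible implementation of the third aspect, in a second possible implementation of the third aspect, the constraint matrix is denoted A, the correction vector is denoted x, and the system of linear equations is written $Ax = b$,
where b is a vector whose elements are all 1.
With reference to the first possible implementation of the third aspect, in a third possible implementation of the third aspect, the processor is specifically configured to:
calculate the correction vector by
$$x_i^{(k)} = \frac{1}{a_{ii}}\Bigl(b_i - \sum_{j \neq i} a_{ij}\,x_j^{(k-1)}\Bigr)$$
where $x_i$ denotes the i-th element of the correction vector x, $x_j$ denotes the j-th element of the correction vector x, $a_{ij}$ denotes the element in row i and column j of the constraint matrix A, $a_{ii}$ denotes the element in row i and column i of the constraint matrix A, $b_i = 1$, k denotes the iteration number of the Jacobi method, $i, j = 1, 2, \dots, n$, and k is a positive integer.
With reference to the third aspect or any one of the foregoing possible implementations of the third aspect, in a fourth possible implementation of the third aspect, the attenuation factor is denoted c, the transition matrix is denoted P, the constraint matrix is denoted A, and the processor is specifically configured to:
determine the elements of the constraint matrix A as $a_{ij} = e_i \cdot e_j + c\,Pe_i \cdot Pe_j + \dots + c^{t}P^{t}e_i \cdot P^{t}e_j$,
where $e_i$ and $e_j$ are orthogonal unit vectors and t is a preset positive integer.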
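With reference to the third aspect or any one of the foregoing possible implementations of the third aspect, in a fifth possible implementation of the third aspect, the correction vector is denoted x, the diagonal correction matrix is denoted D, and the processor is specifically configured to:
determine the elements $D_{ij}$ of the diagonal correction matrix D as:
$$D_{ij} = \begin{cases} x_i, & i = j \\ 0, & i \neq j \end{cases}$$
where $D_{ij}$ denotes the element in row i and column j of the diagonal correction matrix D, $x_i$ denotes the i-th element of the correction vector x, and $i, j = 1, 2, \dots, n$.
With reference to the third aspect or any one of the foregoing possible implementations of the third aspect, in a sixth possible implementation of the third aspect, the attenuation factor is denoted c, the transition matrix is denoted P, the diagonal correction matrix is denoted D, the similarity between the n nodes is denoted S, and the processor is specifically configured to:
calculate the similarity between the nodes according to the following formula:
$$S = D + cP^{T}DP + c^{2}(P^{T})^{2}DP^{2} + \dots + c^{t}(P^{T})^{t}DP^{t}$$
where T denotes the transpose, t is a preset positive integer, and the element $s_{ij}$ in row i and column j of the matrix S is the similarity between the i-th node and the j-th node.
With reference to the third aspect or any one of the foregoing possible implementations of the third aspect, in a seventh possible implementation of the third aspect, the processor is specifically configured to:
construct a graph according to the acquired pointing relationship between every two of the n nodes in the network, where the n nodes constitute the n nodes of the graph and the pointing relationships constitute the directed edges between nodes of the graph; and
use the first-order transition matrix on the reverse graph of the graph as the transition matrix.
With reference to the third aspect or any one of the foregoing possible implementations of the third aspect, in an eighth possible implementation of the third aspect, the transition matrix is denoted P, and
$$P_{ij} = \begin{cases} \dfrac{1}{|\mathrm{In}(j)|}, & (i, j) \in E \\ 0, & \text{otherwise} \end{cases}$$
where $P_{ij}$ denotes the element in row i and column j of the transition matrix P, $\mathrm{In}(j)$ denotes the set of all nodes pointing to node j, and E denotes the set of node pairs that have a pointing relationship.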
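In the embodiments of the present invention, the correction vector is determined by the Jacobi method, and the similarity between the nodes can then be calculated. In each iteration of the Jacobi method, the elements of the correction vector are computed independently of one another, so they can be computed in parallel. A computer cluster can therefore be used to reduce computation time effectively, the time and space complexity of the computation are reduced, and the method is applicable to large networks.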
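Brief Description of the Drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of a method for similarity measurement according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of a "graph" according to an embodiment of the present invention.
FIG. 3 is a structural block diagram of a device for similarity measurement according to an embodiment of the present invention.
FIG. 4 is a structural block diagram of a device for similarity measurement according to another embodiment of the present invention.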
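Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
SimRank is a model that measures the degree of similarity between any two nodes on the basis of the topological structure of a graph.
In a graph $G = (V, E)$, V is the vertex set, that is, the set of nodes in the graph; E is the arc set, that is, the set of node pairs that have a pointing relationship, so E is a subset of $V \times V$.
Let $\mathrm{In}(i)$ denote the set of all nodes pointing to node i (the set of in-neighbors), and let $s(i, j)$ denote the SimRank similarity between two nodes i and j. The mathematical definition of SimRank can then be expressed as follows:
1. $s(i, j) = 0$ when $\mathrm{In}(i) = \varnothing$ or $\mathrm{In}(j) = \varnothing$;
2. otherwise,
$$s(i, j) = \begin{cases} 1, & i = j \\ \dfrac{c}{|\mathrm{In}(i)|\,|\mathrm{In}(j)|} \displaystyle\sum_{i' \in \mathrm{In}(i)} \sum_{j' \in \mathrm{In}(j)} s(i', j'), & i \neq j \end{cases} \qquad (1)$$
where $c \in (0, 1)$ is the attenuation factor and $\varnothing$ denotes the empty set.
It can be seen from the definition that, in the SimRank similarity measure, a node's similarity to itself is the highest, namely 1, and the similarity between two nodes is the mean of the similarities between the nodes pointing to them, multiplied by an attenuation factor.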
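The definition can be evaluated directly as a fixed-point iteration. The following is a minimal sketch, not the method of this document: it assumes nodes are numbered 0 to n-1, that edges are given as (source, target) pairs, and that definition (1) has the usual SimRank form with s(i, i) = 1. The name `simrank_naive` and its parameters are illustrative only. It shows why direct evaluation is expensive: every pass visits every node pair and sums over all pairs of their in-neighbors.

```python
def simrank_naive(n, edges, c=0.6, iterations=10):
    """Iterate the SimRank definition on a directed graph with nodes 0..n-1."""
    in_nbrs = [[] for _ in range(n)]
    for u, v in edges:              # edge (u, v) means u points to v
        in_nbrs[v].append(u)
    # Start from the identity: each node is fully similar to itself.
    s = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(iterations):
        nxt = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j:
                    nxt[i][j] = 1.0  # self-similarity is always 1
                elif in_nbrs[i] and in_nbrs[j]:
                    total = sum(s[a][b] for a in in_nbrs[i] for b in in_nbrs[j])
                    nxt[i][j] = c * total / (len(in_nbrs[i]) * len(in_nbrs[j]))
                # otherwise one in-neighbor set is empty and s(i, j) stays 0
        s = nxt
    return s
```

For a graph in which node 0 points to both node 1 and node 2, nodes 1 and 2 share their only in-neighbor, so their similarity equals the attenuation factor c.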
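According to the above definition, SimRank can be written in matrix form as:
$$S = (cP^{T}SP) \vee I \qquad (2)$$
where I is the identity matrix, P is the first-order transition matrix on the reverse graph $G^{T}$ of the original graph $G = (V, E)$, and $\vee$ denotes the element-wise maximum of two matrices.
The elements $P_{ij}$ of the first-order transition matrix P can be expressed as
$$P_{ij} = \begin{cases} \dfrac{1}{|\mathrm{In}(j)|}, & (i, j) \in E \\ 0, & \text{otherwise} \end{cases} \qquad (3)$$
From the matrix form $S = (cP^{T}SP) \vee I$, S can be decomposed as:
$$S = cP^{T}SP + D$$
where D is a diagonal matrix, which may be called the diagonal correction matrix. Further, S can be expanded as:
$$S = D + cP^{T}DP + c^{2}(P^{T})^{2}DP^{2} + \dots \qquad (4)$$
It can be seen that the key to calculating the SimRank similarity is calculating the diagonal correction matrix D. At present, D is calculated with the Gauss-Seidel algorithm, in which each step of the calculation depends on the result of the previous step; this makes the calculation time-consuming and inefficient.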
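For reference, the standard textbook update rules of the two solvers for a generic system $Ax = b$ are shown below. These are the general forms, not formulas quoted from this document. In Gauss-Seidel, component i of sweep k needs components 1 to i-1 of the same sweep, which forces sequential evaluation; in Jacobi, every component of sweep k uses only values from sweep k-1.

```latex
% Gauss-Seidel: uses already-updated components of the current sweep
x_i^{(k)} = \frac{1}{a_{ii}}\Bigl(b_i - \sum_{j < i} a_{ij}\,x_j^{(k)} - \sum_{j > i} a_{ij}\,x_j^{(k-1)}\Bigr)

% Jacobi: uses only components of the previous sweep
x_i^{(k)} = \frac{1}{a_{ii}}\Bigl(b_i - \sum_{j \neq i} a_{ij}\,x_j^{(k-1)}\Bigr)
```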
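FIG. 1 is a flowchart of a method for similarity measurement according to an embodiment of the present invention. The method shown in FIG. 1 includes:
101. Acquire the pointing relationship between every two of n nodes in a network, and determine a transition matrix according to the pointing relationship, where the dimension of the transition matrix is n × n and n is a positive integer greater than or equal to 2.
102. Acquire an attenuation factor, and calculate a constraint matrix according to the transition matrix and the attenuation factor, where the attenuation factor is the attenuation factor defined in the SimRank similarity method and the dimension of the constraint matrix is n × n.
103. Construct a system of linear equations according to the constraint matrix, where the coefficient matrix of the system of linear equations is the constraint matrix and the variable of the system of linear equations is a correction vector.
104. Iteratively solve the system of linear equations using the Jacobi method to determine the correction vector.
105. Generate a diagonal correction matrix according to the correction vector, where the diagonal elements of the diagonal correction matrix are the components of the correction vector and the dimension of the diagonal correction matrix is n × n.
106. Calculate the similarity between the n nodes according to the transition matrix, the attenuation factor, and the diagonal correction matrix.
In this embodiment of the present invention, the correction vector is determined by the Jacobi method, and the similarity between the nodes can then be calculated. In each iteration of the Jacobi method, the elements of the correction vector are computed independently of one another, so they can be computed in parallel. A computer cluster can therefore be used to reduce computation time effectively, the time and space complexity of the computation are reduced, and the method is applicable to large networks.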
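In general, a network contains a very large number of nodes, and the order of magnitude of n is correspondingly large. For example, n may be on the order of millions, or even hundreds of millions. For instance, the number of registered Facebook users exceeds 2.2 billion; Facebook users constitute the nodes of its network, so the number of nodes n also exceeds 2.2 billion.
In this embodiment of the present invention, the n nodes in 101 may be all of the nodes in the network, or only some of them. For example, for Facebook, the n nodes may be all of the more than 2.2 billion registered users, or the approximately 1 billion users whose gender is female, or the users whose most recent login location is India. The present invention does not limit this.
It should be noted that the embodiments of the present invention do not limit the specific network scenario, nor the way in which the pointing relationship between every two of the n nodes in the network is acquired. For example, the pointing relationship may be determined from the follow relationships among the n nodes, or from the call records among the n nodes, and so on.
For example, the network in this embodiment may be a social network, and the nodes in the network may represent users of the social network; the pointing relationship between two nodes may then be the follow relationship between two users of the social network.
Commonly used social networks include Weibo (MicroBlog), WeChat, YiXin, MiTalk, Facebook, Twitter, and LinkedIn. In a social network such as Weibo, if user U1 is a follower of user U2, this can be understood as a pointing relationship from user U1 to user U2. In a social network such as WeChat, if user U1 is a follower of user U2, then user U2 is necessarily also a follower of user U1; this can be understood as a pointing relationship from user U1 to user U2 together with a pointing relationship from user U2 to user U1.
As another example, the network in this embodiment may be a communication network (as in the Huawei churn analysis mentioned above), and the nodes in the network may represent users of the communication network; the pointing relationship between two nodes may then be the call relationship between two users of the communication network.
For example, if user U1 has called user U2, this can be understood as a pointing relationship from user U1 to user U2.
It can be seen that, in this embodiment of the present invention, the pointing relationship is directional. For example, the pointing relationship between node N1 and node N2 may be: node N1 points to node N2; or node N2 points to node N1; or node N1 points to node N2 and node N2 points to node N1.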
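Optionally, 101 may include: constructing a graph according to the pointing relationship between every two of the n nodes in the network, and using the first-order transition matrix on the reverse graph of the graph as the transition matrix, where the n nodes constitute the nodes of the graph and the pointing relationships constitute the directed edges between nodes of the graph.
It can be understood that the constructed graph is a directed graph. The first-order transition matrix on the reverse graph of the graph is related to the number of times each node in the graph is pointed to. Here, the number of directed edges pointing to each node can be determined, and the transition matrix can then be calculated from the number of directed edges pointing to each node.
For example, the "graph" shown in FIG. 2 includes five nodes, N1, N2, N3, N4, and N5, as well as directed edges between the nodes. It is then easy to determine that the number of nodes pointing to node N1 is 2, the number pointing to node N2 is 1, the number pointing to node N3 is 3, the number pointing to node N4 is 1, and the number pointing to node N5 is 2.
It should be noted that, for a specific description of graphs, reference may be made to the relevant definitions and descriptions in prior-art graph theory; to avoid repetition, details are not described here.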
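Specifically, in this embodiment of the present invention, the transition matrix is denoted P, the attenuation factor c, the constraint matrix A, the correction vector x, the diagonal correction matrix D, and the similarity between the nodes S.
The dimensions of the transition matrix P, the constraint matrix A, the diagonal correction matrix D, and the similarity S between the nodes are all n × n, and the dimension of the correction vector x is n, where n is a positive integer.
Correspondingly, $P_{ij}$ denotes the element in row i and column j of the transition matrix P, $a_{ij}$ denotes the element in row i and column j of the constraint matrix A, $x_i$ denotes the i-th element of the correction vector x, and $D_{ij}$ denotes the element in row i and column j of the diagonal correction matrix D, where $i, j = 1, 2, \dots, n$.
In this embodiment of the present invention, the transition matrix P is the first-order transition matrix on the reverse graph $G^{T}$ of the original graph $G = (V, E)$, and in 101 it can be determined by the following formula:
$$P_{ij} = \begin{cases} \dfrac{1}{|\mathrm{In}(j)|}, & (i, j) \in E \\ 0, & \text{otherwise} \end{cases}$$
where $\mathrm{In}(j)$ denotes the set of all nodes pointing to node j, V denotes the set of nodes in the graph, and E denotes the set of node pairs that have a pointing relationship.
The pointing relationships between nodes can be determined while the graph is being constructed. For example, in the Huawei churn analysis mentioned above, the pointing relationships between nodes can be built from the call records between customers. Suppose customer A corresponds to node A in the graph and customer B corresponds to node B. If customer A has called customer B, a directed edge from node A to node B can be created when the graph is constructed; that is, node A points to node B.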
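Building the transition matrix from a list of directed edges can be sketched as follows. This is an illustration under stated assumptions: nodes are numbered 0 to n-1, an edge (i, j) means node i points to node j, and the transition-matrix formula is taken to have the usual form in which entry (i, j) is 1/|In(j)| when (i, j) is an edge and 0 otherwise. The function name `transition_matrix` and the dense list-of-lists layout are hypothetical; a real deployment on a large network would need a sparse representation.

```python
def transition_matrix(n, edges):
    """Return P with P[i][j] = 1/|In(j)| if (i, j) is an edge, else 0."""
    edge_set = set(edges)                 # drop duplicate edges
    in_degree = [0] * n
    for _, j in edge_set:
        in_degree[j] += 1                 # |In(j)|: number of nodes pointing to j
    p = [[0.0] * n for _ in range(n)]
    for i, j in edge_set:
        p[i][j] = 1.0 / in_degree[j]
    return p
```

With this layout, every column j that has at least one in-neighbor sums to 1, and a column for a node that nothing points to is all zeros.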
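In this embodiment of the present invention, 104 may include: solving the system of linear equations $Ax = b$ using the Jacobi method, where b is a vector whose elements are all 1. Here, the variable of the system of linear equations is the correction vector.
Specifically, the correction vector in 104 may be determined by iteration starting from an initial value of the correction vector, where the initial value is the initialized correction vector, denoted $x^{(0)}$. 104 may include: iteratively solving the system of linear equations using the Jacobi method, and determining the solution at convergence as the correction vector, or determining the solution obtained when a preset maximum number of iterations is reached as the correction vector.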
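Specifically, the theoretical analysis may proceed as follows.
Since the attenuation factor $c \in (0, 1)$, according to formula (4) above, the similarity S between the nodes can be approximated as:
$$S \approx S_t = D + cP^{T}DP + c^{2}(P^{T})^{2}DP^{2} + \dots + c^{t}(P^{T})^{t}DP^{t} \qquad (5)$$
where t is a positive integer, for example t = 5.
Further, according to formula (1), since the similarity of a node to itself is 1, that is, $s(i, i) = 1$, we have:
$$1 = e_i^{T}S_t\,e_i = e_i^{T}De_i + c\,(Pe_i)^{T}D\,(Pe_i) + \dots + c^{t}(P^{t}e_i)^{T}D\,(P^{t}e_i) \qquad (6)$$
where $e_i$ is an orthogonal unit vector that specifically satisfies
Figure PCTCN2016074728-appb-000017
If we let $x = (D_{11}, D_{22}, \dots, D_{nn})^{T}$, then on the basis of formula (6) we obtain:
$$1 = x^{T}(e_i \cdot e_i + c\,Pe_i \cdot Pe_i + \dots + c^{t}P^{t}e_i \cdot P^{t}e_i) \qquad (7)$$
In this way, the diagonal correction matrix D can be calculated by solving the system of linear equations $Ax = b$, where $b = (b_1, b_2, \dots, b_n)^{T}$ with $b_1 = b_2 = \dots = b_n = 1$, A is called the constraint matrix, and the elements of A follow from formula (7):
$$a_{ij} = e_i \cdot e_j + c\,Pe_i \cdot Pe_j + \dots + c^{t}P^{t}e_i \cdot P^{t}e_j \qquad (8)$$
From the above analysis, 102 may include: determining the elements of the constraint matrix A as $a_{ij} = e_i \cdot e_j + c\,Pe_i \cdot Pe_j + \dots + c^{t}P^{t}e_i \cdot P^{t}e_j$, where $e_i$ and $e_j$ are orthogonal unit vectors and t is a preset positive integer.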
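One row of the constraint matrix can be computed with t matrix-vector products, as sketched below. This is an illustration under an explicit assumption: the product "·" in formula (7) is read element-wise, so that the bracketed sum is a vector and row i of A is the sum over l = 0..t of $c^{l}$ times the element-wise square of $P^{l}e_i$, with $e_i$ taken as the i-th standard basis vector. That reading is motivated by the requirement that the product with $x^{T}$ in (7) be a scalar; formula (8) is written with both $e_i$ and $e_j$, and this sketch does not claim to settle that notation. The names `mat_vec` and `constraint_row` are hypothetical.

```python
def mat_vec(p, v):
    """Dense matrix-vector product p @ v for lists of lists."""
    n = len(p)
    return [sum(p[r][k] * v[k] for k in range(n)) for r in range(n)]


def constraint_row(p, i, c, t):
    """Row i of A under the element-wise reading of formula (7)."""
    n = len(p)
    v = [0.0] * n
    v[i] = 1.0                      # v starts as e_i, i.e. P^0 e_i
    row = [0.0] * n
    weight = 1.0                    # holds c**l for the current term
    for _ in range(t + 1):          # terms l = 0, 1, ..., t
        for j in range(n):
            row[j] += weight * v[j] * v[j]
        v = mat_vec(p, v)           # advance to P^(l+1) e_i
        weight *= c
    return row
```

For the graph in which node 0 points to nodes 1 and 2, P has ones at positions (0, 1) and (0, 2), and with c = 0.6 the row for node 1 is [0.6, 1.0, 0.0]. The corresponding equation 0.6·x_0 + x_1 = 1 expresses that the similarity of node 1 to itself is 1.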
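Further, in 103, the constraint matrix A can be used to construct the system of linear equations $Ax = b$, and in 104 the correction vector x is then obtained by iteratively solving $Ax = b$.
For example, in 104, the correction vector may first be initialized to obtain an initialized correction vector $x^{(0)}$, with
Figure PCTCN2016074728-appb-000018
The initialized correction vector $x^{(0)}$ is then used for the iterative calculation.
It should be noted that the embodiments of the present invention do not limit the method of initializing the correction vector or the value of the initialized correction vector. For example, a random function may be used for initialization; as another example, the initialized correction vector may be defined as equal to 1; and so on.
104 may then specifically include: calculating the correction vector by
$$x_i^{(k)} = \frac{1}{a_{ii}}\Bigl(b_i - \sum_{j \neq i} a_{ij}\,x_j^{(k-1)}\Bigr)$$
where $x_i$ denotes the i-th element of the correction vector x, $x_j$ denotes the j-th element of the correction vector x, $a_{ij}$ denotes the element in row i and column j of the constraint matrix A, $a_{ii}$ denotes the element in row i and column i of the constraint matrix A, $b_i = 1$, k denotes the iteration number of the Jacobi method, $i, j = 1, 2, \dots, n$, the dimension of the constraint matrix A is n × n, and k and n are both positive integers.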
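Optionally, in one embodiment, the correction vector determined in 104 may be the solution of the system of linear equations after convergence.
For example, if
Figure PCTCN2016074728-appb-000020
the solution is considered to have converged, and the value after the k-th iteration, $x^{(k)}$, can be taken as the solution of the system of linear equations. Here $\varepsilon$ is a predefined value, for example $\varepsilon = 10^{-6}$.
Optionally, in one embodiment, the correction vector determined in 104 may be the value obtained when the system of linear equations reaches a preset maximum number of iterations.
For example, if the preset maximum number of iterations is N and convergence has still not been reached when k = N, the value after the N-th iteration, $x^{(N)}$, can be taken as the solution of the system of linear equations.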
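The two stopping rules can be combined in one solver loop, as in the following sketch. The convergence test used here (largest absolute change between consecutive iterates below eps) is an assumption, since the exact criterion is given only in a figure; the default initial vector of all ones, the function name `jacobi_solve`, and the returned tuple are likewise illustrative.

```python
def jacobi_solve(a, b, x0=None, eps=1e-6, max_iter=1000):
    """Solve a x = b by Jacobi iteration.

    Returns (x, iterations_used, converged).
    """
    n = len(a)
    x = list(x0) if x0 is not None else [1.0] * n
    for k in range(1, max_iter + 1):
        # Every component of the new iterate uses only the previous iterate x.
        new_x = [
            (b[i] - sum(a[i][j] * x[j] for j in range(n) if j != i)) / a[i][i]
            for i in range(n)
        ]
        change = max(abs(new_x[i] - x[i]) for i in range(n))
        x = new_x
        if change < eps:            # assumed convergence criterion
            return x, k, True
    return x, max_iter, False       # fall back to the iterate at the iteration cap
```

A caller can inspect the third return value to tell whether the result is a converged solution or the iterate reached at the preset maximum number of iterations.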
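Meanwhile, in the iteration of the Jacobi method, it can be seen from
$$x_i^{(k)} = \frac{1}{a_{ii}}\Bigl(b_i - \sum_{j \neq i} a_{ij}\,x_j^{(k-1)}\Bigr)$$
that the k-th iteration depends on the results of the (k-1)-th iteration, while the components within the k-th iteration do not depend on one another. That is, the calculation of $x_i^{(k)}$ depends on $x_j^{(k-1)}$ but not on $x_j^{(k)}$ for $j \neq i$. Thus, in the k-th iteration, the n values $x_i^{(k)}$ can be calculated in parallel, which shortens computation time and improves computational efficiency.
Moreover, the parallel computation may be carried out independently by multiple CPUs, or in parallel on a high-performance computer cluster, which makes full use of the cluster's resources, improves computer utilization, and reduces space and time complexity.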
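The independence of the components within one Jacobi sweep can be illustrated by handing each component to a worker, as below. This is only a sketch of the data dependence: it uses a thread pool from the Python standard library, which in CPython does not speed up CPU-bound arithmetic, whereas the text refers to multiple CPUs or a cluster. The names `update_component` and `jacobi_sweep_parallel` and the `workers` parameter are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def update_component(a, b, x_prev, i):
    """Compute component i of the next iterate from the previous iterate only."""
    off_diag = sum(a[i][j] * x_prev[j] for j in range(len(x_prev)) if j != i)
    return (b[i] - off_diag) / a[i][i]


def jacobi_sweep_parallel(a, b, x_prev, workers=4):
    """One Jacobi sweep in which each component is an independent task."""
    n = len(x_prev)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() returns results in index order, so the output is the new vector.
        return list(pool.map(lambda i: update_component(a, b, x_prev, i), range(n)))
```

Because no task reads a value written by another task in the same sweep, the tasks can run in any order or on separate machines, with one synchronization point per sweep.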
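Further, it can be seen from
Figure PCTCN2016074728-appb-000028
that when the constraint matrix A is calculated in 102, it is not necessary to construct the entire constraint matrix A explicitly; it suffices to compute one row of A at a time, online.
That is, in this embodiment of the present invention, 102, 103, and 104 can be performed in parallel. For example, the first row of the constraint matrix A may first be calculated in 102; in 103 a linear equation is then constructed from the first row of A, and in 104 that linear equation is calculated. While 103 and 104 are being calculated, the second row of the constraint matrix A can be calculated in 102, and so on.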
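Computing rows on demand can be sketched by passing the sweep a callback instead of a stored matrix. The following is an illustration under assumptions: `row_of_a` is a hypothetical caller-supplied function that returns row i of A when asked, so that only one row is held in memory at a time; the name `jacobi_sweep_streaming` is also illustrative.

```python
def jacobi_sweep_streaming(row_of_a, b, x_prev):
    """One Jacobi sweep that requests each row of A only when it is needed."""
    new_x = []
    for i in range(len(x_prev)):
        row = row_of_a(i)           # row i is produced on demand and then discarded
        off_diag = sum(row[j] * x_prev[j] for j in range(len(row)) if j != i)
        new_x.append((b[i] - off_diag) / row[i])
    return new_x
```

The trade-off is that each sweep recomputes the rows, exchanging extra computation for memory that grows with n rather than with n squared.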
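Further, 105 may include: determining the elements $D_{ij}$ of the diagonal correction matrix D as:
$$D_{ij} = \begin{cases} x_i, & i = j \\ 0, & i \neq j \end{cases}$$
where $D_{ij}$ denotes the element in row i and column j of the diagonal correction matrix D, $x_i$ denotes the i-th element of the correction vector x, $i, j = 1, 2, \dots, n$, the dimension of the diagonal correction matrix D is n × n, and n is a positive integer.
That is, the diagonal correction matrix D is:
$$D = \begin{pmatrix} x_1 & & & \\ & x_2 & & \\ & & \ddots & \\ & & & x_n \end{pmatrix}$$
In this way, in this embodiment of the present invention, the diagonal correction matrix D is obtained using the Jacobi method, and the similarity between nodes can then be calculated using formula (5). That is, 106 may include: calculating the similarity between the nodes according to the following formula:
$$S = D + cP^{T}DP + c^{2}(P^{T})^{2}DP^{2} + \dots + c^{t}(P^{T})^{t}DP^{t}$$
where T denotes the transpose and t is a preset positive integer.
It can be understood that the element $s_{ij}$ in row i and column j of the matrix S is the similarity between the i-th node and the j-th node. In this way, the similarity between every two of the n nodes can be calculated.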
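The truncated series for S can be accumulated term by term, since each term is the previous one multiplied by the transpose of P on the left and by P on the right. The sketch below uses dense lists of lists for clarity; the helper names `mat_mul`, `transpose`, and `similarity_matrix` are hypothetical, and forming the full n × n matrix S this way is only practical for small graphs.

```python
def mat_mul(a, b):
    """Dense matrix product for lists of lists."""
    inner, cols = len(b), len(b[0])
    return [[sum(row[k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for row in a]


def transpose(a):
    return [list(col) for col in zip(*a)]


def similarity_matrix(p, x, c, t):
    """S = D + c P^T D P + ... + c^t (P^T)^t D P^t with D = diag(x)."""
    n = len(p)
    term = [[x[i] if i == j else 0.0 for j in range(n)] for i in range(n)]  # D
    s = [row[:] for row in term]
    p_t = transpose(p)
    weight = 1.0
    for _ in range(t):
        term = mat_mul(mat_mul(p_t, term), p)   # next (P^T)^l D P^l
        weight *= c                             # next c**l
        for i in range(n):
            for j in range(n):
                s[i][j] += weight * term[i][j]
    return s
```

For the graph in which node 0 points to nodes 1 and 2, with c = 0.6 and x = (1, 0.4, 0.4), the result has ones on the diagonal and 0.6 as the similarity between nodes 1 and 2.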
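In this embodiment of the present invention, the value of t is not limited; for example, t = 5 or t = 20. It can be understood that the larger the value of t, the higher the accuracy of the calculation, but also the higher the time cost.
In this embodiment of the present invention, the value of the attenuation factor $c \in (0, 1)$ may be preset, for example c = 0.6; the present invention does not limit this.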
本发明实施例中,假设n个节点中的第i个节点为节点i,n个节点中的第j个节点为节点j。对于给定的节点i和节点j,那么,节点i和节点j之间的相似度可以通过如下的代码1(Algorithm 1)实现,可以称为SinglePairSimRank(i,j):
Figure PCTCN2016074728-appb-000031
举例来说,针对社交网络,如果只期望计算客户A与客户B之间的相似度,那么可以通过上述的Algorithm 1进行计算。并且,若假设网络中有 向边的数量为Q,那么SinglePairSimRank的时间复杂度为O(MQ),空间复杂度为O(Q)。
本发明实施例中,假设n个节点中的第i个节点为节点i,对于给定的节点i,可以计算其他所有的节点(即n个节点中除节点i之外的其他n-1个节点)与该节点i之间的相似度。并且可以通过如下的代码2(Algorithm 2)实现,可以称为SingleSourceSimRank(i):
Figure PCTCN2016074728-appb-000032
举例来说,在华为离网分析中,如果期望判断与客户A“相似”的客户,那么可以通过上述的Algorithm 2进行计算。并且,若假设网络中有向边的数量为Q,那么SingleSourceSimRank的时间复杂度为O(M2Q),空间复杂度为O(Q)。
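In this embodiment of the present invention, Algorithm 2 above can be used to compute the similarity between every pair of nodes. This can be implemented with Algorithm 3 below, which may be called AllPairsSimRank: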
Figure PCTCN2016074728-appb-000033
Figure PCTCN2016074728-appb-000034
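For example, in information recommendation, Algorithm 3 can be used to compute the similarity between all nodes and thereby decide which category of information to recommend to each customer. If the number of directed edges in the network is Q, AllPairsSimRank has time complexity $O(M^{2}Qn)$ and space complexity O(Q).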
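FIG. 3 is a structural block diagram of a device for similarity measurement according to an embodiment of the present invention. The device 200 shown in FIG. 3 includes an obtaining unit 201 and a processing unit 202.
The obtaining unit 201 is configured to obtain the pairwise directional relationships among n nodes in a network and to obtain a decay factor, where the decay factor is the decay factor defined in the SimRank similarity method, and n is a positive integer greater than or equal to 2.
The processing unit 202 is configured to determine a transition matrix according to the directional relationships obtained by the obtaining unit 201, and to compute a constraint matrix according to the transition matrix and the decay factor obtained by the obtaining unit 201, where the transition matrix has dimension n×n and the constraint matrix has dimension n×n.
The processing unit 202 is further configured to construct a system of linear equations according to the constraint matrix, where the coefficient matrix of the system of linear equations is the constraint matrix and the variable of the system of linear equations is a correction vector.
The processing unit 202 is further configured to solve the system of linear equations iteratively by using the Jacobi method, to determine the correction vector.
The processing unit 202 is further configured to generate a diagonal correction matrix according to the correction vector, where the diagonal elements of the diagonal correction matrix are the components of the correction vector, and the diagonal correction matrix has dimension n×n.
The processing unit 202 is further configured to compute the similarity between the n nodes according to the transition matrix, the diagonal correction matrix, and the decay factor obtained by the obtaining unit 201.
In this embodiment of the present invention, the correction vector is determined by using the Jacobi method, after which the similarity between nodes can be computed. In each iteration of the Jacobi method, the elements of the correction vector are computed independently of one another, so they can be computed in parallel. A computer cluster can therefore be used to reduce the computation time effectively and to lower the time complexity and space complexity of the computation, and the approach is applicable to large networks.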
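Specifically, in this embodiment of the present invention, the transition matrix is denoted P, the decay factor c, the constraint matrix A, the correction vector x, the diagonal correction matrix D, and the similarity between the nodes S.
The transition matrix P, the constraint matrix A, the diagonal correction matrix D, and the similarity S between the nodes each have dimension n×n, and the correction vector x has dimension n, where n is a positive integer related to the number of nodes.
Correspondingly, $P_{ij}$ denotes the element in row i, column j of the transition matrix P, $a_{ij}$ denotes the element in row i, column j of the constraint matrix A, $x_i$ denotes the i-th element of the correction vector x, and $D_{ij}$ denotes the element in row i, column j of the diagonal correction matrix D, where i, j = 1, 2, …, n.
Optionally, in an embodiment, the constraint matrix is denoted A, the correction vector x, and the system of linear equations Ax = b, and the processing unit 202 is specifically configured to solve the system Ax = b by using the Jacobi method, where b is a vector whose elements are all 1.
Specifically, the correction vector may first be initialized as $x^{(0)}$, and the system Ax = b is then solved iteratively.
When solving the system of linear equations iteratively by using the Jacobi method to determine the correction vector, the processing unit 202 is specifically configured to: solve the system iteratively by using the Jacobi method and take the solution at convergence as the correction vector, or take the solution obtained when a preset maximum number of iterations is reached as the correction vector.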
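Optionally, in another embodiment, the processing unit 202 is specifically configured to compute the correction vector by
$$x_i^{(k)} = \frac{1}{a_{ii}}\Bigl(b_i - \sum_{j \ne i} a_{ij}\,x_j^{(k-1)}\Bigr),$$
where $x_i$ denotes the i-th element of the correction vector x, $x_j$ denotes the j-th element of x, $a_{ij}$ denotes the element in row i, column j of the constraint matrix A, $a_{ii}$ denotes the element in row i, column i of A, $b_i = 1$, k is the iteration count of the Jacobi method, i, j = 1, 2, …, n, A has dimension n×n, and k and n are positive integers.
Optionally, in another embodiment, the decay factor is denoted c, the transition matrix P, and the constraint matrix A, and the processing unit 202 is specifically configured to determine the elements of the constraint matrix A as $a_{ij} = e_i \cdot e_j + c\,(Pe_i)\cdot(Pe_j) + \dots + c^{t}(P^{t}e_i)\cdot(P^{t}e_j)$, where $e_i$ and $e_j$ are orthonormal unit vectors and t is a preset positive integer.
Optionally, in another embodiment, the correction vector is denoted x and the diagonal correction matrix D, and the processing unit 202 is specifically configured to determine the element $D_{ij}$ of the diagonal correction matrix D as:
$$D_{ij} = \begin{cases} x_i, & i = j \\ 0, & i \ne j \end{cases}$$
where $D_{ij}$ denotes the element in row i, column j of the diagonal correction matrix D, $x_i$ denotes the i-th element of the correction vector x, i, j = 1, 2, …, n, D has dimension n×n, and n is a positive integer.
Optionally, in another embodiment, the decay factor is denoted c, the transition matrix P, the diagonal correction matrix D, and the similarity between the nodes S, and the processing unit 202 is specifically configured to compute the similarity between the nodes according to:
$S = D + cP^{T}DP + c^{2}(P^{T})^{2}DP^{2} + \dots + c^{t}(P^{T})^{t}DP^{t}$, where T denotes transposition, t is a preset positive integer, and the element $s_{ij}$ in row i, column j of the matrix S is the similarity between the i-th node and the j-th node.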
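Optionally, in another embodiment, after the obtaining unit 201 obtains the directional relationships, the processing unit 202 is specifically configured to: construct a graph according to the pairwise directional relationships among the n nodes in the network obtained by the obtaining unit 201, where the n nodes form the n nodes of the graph and the directional relationships form the directed edges between the nodes of the graph; and use the first-order transition matrix on the reverse graph of the graph as the transition matrix.
Optionally, in another embodiment, the transition matrix is denoted P, and
$$P_{ij} = \begin{cases} \dfrac{1}{|\mathrm{In}(j)|}, & (i, j) \in E \\ 0, & \text{otherwise} \end{cases}$$
where $P_{ij}$ denotes the element in row i, column j of the transition matrix P, In(j) denotes the set of all nodes that point to node j, and E denotes the set of node pairs that have a directional relationship.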
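A small Python sketch of building the transition matrix from a list of directed edges. The entry formula used here, $P_{ij} = 1/|\mathrm{In}(j)|$ when node i points to node j and 0 otherwise, is an assumption inferred from the definitions of In(j) and E in the text, because the formula itself survives only as a figure. The function name, the 0-based node numbering, and the dense output are also hypothetical.

```python
def transition_matrix(n, edges):
    """Build P with P[i][j] = 1/|In(j)| if (i, j) is an edge, else 0.

    n is the number of nodes, numbered 0..n-1; edges is an iterable of
    (source, target) pairs meaning "source points to target".
    """
    # In(j): the set of nodes that point to node j (duplicates collapse).
    in_nbrs = [set() for _ in range(n)]
    for i, j in edges:
        in_nbrs[j].add(i)
    P = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for i in in_nbrs[j]:
            P[i][j] = 1.0 / len(in_nbrs[j])
    return P
```

Under this assumed formula, every column of P that belongs to a node with at least one incoming edge sums to 1, and columns of nodes without incoming edges are all zero.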
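Optionally, in this embodiment of the present invention, the device 200 may be a server for processing data, for example a server of a social network.
The device 200 can be used to implement the method in the foregoing embodiment of FIG. 1. To avoid repetition, details are not described here again.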
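FIG. 4 is a structural block diagram of a device for similarity measurement according to another embodiment of the present invention. The device 300 shown in FIG. 4 includes a processor 301, a receiver 302, a transmitter 303, and a memory 304.
The receiver 302 is configured to obtain the pairwise directional relationships among n nodes in a network and to obtain a decay factor, where the decay factor is the decay factor defined in the SimRank similarity method, and n is a positive integer greater than or equal to 2.
The processor 301 is configured to determine a transition matrix according to the directional relationships obtained by the receiver 302, and to compute a constraint matrix according to the transition matrix and the decay factor obtained by the receiver 302, where the transition matrix has dimension n×n and the constraint matrix has dimension n×n.
The processor 301 is further configured to construct a system of linear equations according to the constraint matrix, where the coefficient matrix of the system of linear equations is the constraint matrix and the variable of the system of linear equations is a correction vector.
The processor 301 is further configured to solve the system of linear equations iteratively by using the Jacobi method, to determine the correction vector.
The processor 301 is further configured to generate a diagonal correction matrix according to the correction vector, where the diagonal elements of the diagonal correction matrix are the components of the correction vector, and the diagonal correction matrix has dimension n×n.
The processor 301 is further configured to compute the similarity between the n nodes according to the transition matrix, the diagonal correction matrix, and the decay factor obtained by the receiver 302.
In this embodiment of the present invention, the correction vector is determined by using the Jacobi method, after which the similarity between nodes can be computed. In each iteration of the Jacobi method, the elements of the correction vector are computed independently of one another, so they can be computed in parallel. A computer cluster can therefore be used to reduce the computation time effectively and to lower the time complexity and space complexity of the computation, and the approach is applicable to large networks.
The components of the device 300 are coupled together by a bus system 305, which includes a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity, however, all of these buses are labeled as the bus system 305 in FIG. 4.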
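The methods disclosed in the foregoing embodiments of the present invention may be applied to, or implemented by, the processor 301. The processor 301 may be an integrated circuit chip with signal processing capability. During implementation, the steps of the foregoing methods may be completed by integrated logic circuits of hardware in the processor 301 or by instructions in the form of software. The processor 301 may be a general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field Programmable Gate Array, FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor or any conventional processor. The steps of the methods disclosed with reference to the embodiments of the present invention may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may reside in a storage medium that is mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory 304; the processor 301 reads the information in the memory 304 and completes the steps of the foregoing methods in combination with its hardware.
It can be understood that the memory 304 in this embodiment of the present invention may be a volatile memory or a non-volatile memory, or may include both. The non-volatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable ROM, PROM), an erasable programmable read-only memory (Erasable PROM, EPROM), an electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or a flash memory. The volatile memory may be a random access memory (Random Access Memory, RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (Static RAM, SRAM), dynamic random access memory (Dynamic RAM, DRAM), synchronous dynamic random access memory (Synchronous DRAM, SDRAM), double data rate synchronous dynamic random access memory (Double Data Rate SDRAM, DDR SDRAM), enhanced synchronous dynamic random access memory (Enhanced SDRAM, ESDRAM), synchlink dynamic random access memory (Synchlink DRAM, SLDRAM), and direct Rambus random access memory (Direct Rambus RAM, DR RAM). The memory 304 of the systems and methods described herein is intended to include, without being limited to, these and any other suitable types of memory.
It can be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or a combination thereof. For a hardware implementation, the processing unit may be implemented in one or more application-specific integrated circuits (Application Specific Integrated Circuits, ASIC), digital signal processors (Digital Signal Processing, DSP), digital signal processing devices (DSP Device, DSPD), programmable logic devices (Programmable Logic Device, PLD), field-programmable gate arrays (Field-Programmable Gate Array, FPGA), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units for performing the functions described in this application, or a combination thereof.
When the embodiments are implemented in software, firmware, middleware or microcode, program code, or code segments, they may be stored in a machine-readable medium such as a storage component. A code segment may represent a procedure, a function, a subprogram, a program, a routine, a subroutine, a module, a software package, a class, or any combination of instructions, data structures, or program statements. A code segment may be coupled to another code segment or to a hardware circuit by passing and/or receiving information, data, arguments, parameters, or memory contents. Information, arguments, parameters, data, and the like may be passed, forwarded, or sent by any suitable means, including memory sharing, message passing, token passing, and network transmission.
For a software implementation, the techniques described herein may be implemented with modules (for example, procedures and functions) that perform the functions described herein. The software code may be stored in a memory unit and executed by a processor. The memory unit may be implemented within the processor or external to it; in the latter case the memory unit can be communicatively coupled to the processor by various means known in the art.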
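Specifically, in this embodiment of the present invention, the transition matrix is denoted P, the decay factor c, the constraint matrix A, the correction vector x, the diagonal correction matrix D, and the similarity between the nodes S.
The transition matrix P, the constraint matrix A, the diagonal correction matrix D, and the similarity S between the nodes each have dimension n×n, and the correction vector x has dimension n, where n is a positive integer related to the number of nodes.
Correspondingly, $P_{ij}$ denotes the element in row i, column j of the transition matrix P, $a_{ij}$ denotes the element in row i, column j of the constraint matrix A, $x_i$ denotes the i-th element of the correction vector x, and $D_{ij}$ denotes the element in row i, column j of the diagonal correction matrix D, where i, j = 1, 2, …, n.
Optionally, in an embodiment, the constraint matrix is denoted A, the correction vector x, and the system of linear equations Ax = b, and the processor 301 is specifically configured to solve the system Ax = b by using the Jacobi method, where b is a vector whose elements are all 1.
Specifically, the correction vector may first be initialized as $x^{(0)}$, and the system Ax = b is then solved iteratively.
When solving the system of linear equations iteratively by using the Jacobi method to determine the correction vector, the processor 301 is specifically configured to: solve the system iteratively by using the Jacobi method and take the solution at convergence as the correction vector, or take the solution obtained when a preset maximum number of iterations is reached as the correction vector.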
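Optionally, in another embodiment, the processor 301 is specifically configured to compute the correction vector by
$$x_i^{(k)} = \frac{1}{a_{ii}}\Bigl(b_i - \sum_{j \ne i} a_{ij}\,x_j^{(k-1)}\Bigr),$$
where $x_i$ denotes the i-th element of the correction vector x, $x_j$ denotes the j-th element of x, $a_{ij}$ denotes the element in row i, column j of the constraint matrix A, $a_{ii}$ denotes the element in row i, column i of A, $b_i = 1$, k is the iteration count of the Jacobi method, i, j = 1, 2, …, n, A has dimension n×n, and k and n are positive integers.
Optionally, in another embodiment, the decay factor is denoted c, the transition matrix P, and the constraint matrix A, and the processor 301 is specifically configured to determine the elements of the constraint matrix A as $a_{ij} = e_i \cdot e_j + c\,(Pe_i)\cdot(Pe_j) + \dots + c^{t}(P^{t}e_i)\cdot(P^{t}e_j)$, where $e_i$ and $e_j$ are orthonormal unit vectors and t is a preset positive integer.
Optionally, in another embodiment, the correction vector is denoted x and the diagonal correction matrix D, and the processor 301 is specifically configured to determine the element $D_{ij}$ of the diagonal correction matrix D as:
$$D_{ij} = \begin{cases} x_i, & i = j \\ 0, & i \ne j \end{cases}$$
where $D_{ij}$ denotes the element in row i, column j of the diagonal correction matrix D, $x_i$ denotes the i-th element of the correction vector x, i, j = 1, 2, …, n, D has dimension n×n, and n is a positive integer.
Optionally, in another embodiment, the decay factor is denoted c, the transition matrix P, the diagonal correction matrix D, and the similarity between the nodes S, and the processor 301 is specifically configured to compute the similarity between the nodes according to:
$S = D + cP^{T}DP + c^{2}(P^{T})^{2}DP^{2} + \dots + c^{t}(P^{T})^{t}DP^{t}$, where T denotes transposition, t is a preset positive integer, and the element $s_{ij}$ in row i, column j of the matrix S is the similarity between the i-th node and the j-th node.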
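Optionally, in another embodiment, after the receiver 302 obtains the directional relationships, the processor 301 is specifically configured to: construct a graph according to the obtained pairwise directional relationships among the n nodes in the network, where the n nodes form the n nodes of the graph and the directional relationships form the directed edges between the nodes of the graph; and use the first-order transition matrix on the reverse graph of the graph as the transition matrix.
Optionally, in another embodiment, the transition matrix is denoted P, and
$$P_{ij} = \begin{cases} \dfrac{1}{|\mathrm{In}(j)|}, & (i, j) \in E \\ 0, & \text{otherwise} \end{cases}$$
where $P_{ij}$ denotes the element in row i, column j of the transition matrix P, In(j) denotes the set of all nodes that point to node j, and E denotes the set of node pairs that have a directional relationship.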
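It can be understood that, in this embodiment of the present invention, the transmitter 303 may be used to output the similarity values computed by the processor 301, for example to a display screen of the device 300 or to another device or apparatus connected to the device 300.
It can be understood that, in this embodiment of the present invention, the memory 304 may be used to store the preset values needed for the computation (such as the values of c and t), the code executed by the processor 301 (for example, Algorithm 1, Algorithm 2, and Algorithm 3 in the embodiment shown in FIG. 1), and intermediate results produced during the computation.
Optionally, in this embodiment of the present invention, the device 300 may be a server for processing data, for example a server of a social network.
The device 300 can be used to implement the method in the foregoing embodiment of FIG. 1. To avoid repetition, details are not described here again.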
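A person of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described with reference to the embodiments disclosed herein can be implemented in electronic hardware, or in a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the particular application and the design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such an implementation should not be considered to go beyond the scope of the present invention.
A person skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may be found in the corresponding processes of the foregoing method embodiments and are not described here again.
In the several embodiments provided in this application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is only a division by logical function, and other divisions are possible in an actual implementation; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or of another form.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific implementations of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement that a person skilled in the art can readily conceive of within the technical scope disclosed in the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.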

Claims (18)

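  1. A similarity measurement method, comprising:
    obtaining pairwise directional relationships among n nodes in a network, and determining a transition matrix according to the directional relationships, wherein the transition matrix has dimension n×n, and n is a positive integer greater than or equal to 2;
    obtaining a decay factor, and computing a constraint matrix according to the transition matrix and the decay factor, wherein the decay factor is the decay factor defined in the SimRank similarity method, and the constraint matrix has dimension n×n;
    constructing a system of linear equations according to the constraint matrix, wherein the coefficient matrix of the system of linear equations is the constraint matrix, and the variable of the system of linear equations is a correction vector;
    solving the system of linear equations iteratively by using the Jacobi method, to determine the correction vector;
    generating a diagonal correction matrix according to the correction vector, wherein the diagonal elements of the diagonal correction matrix are the components of the correction vector, and the diagonal correction matrix has dimension n×n; and
    computing the similarity between the n nodes according to the transition matrix, the decay factor, and the diagonal correction matrix.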
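  2. The method according to claim 1, wherein the solving the system of linear equations iteratively by using the Jacobi method, to determine the correction vector, comprises:
    solving the system of linear equations iteratively by using the Jacobi method, and taking the solution at convergence as the correction vector, or taking the solution obtained when a preset maximum number of iterations is reached as the correction vector.
  3. The method according to claim 1 or 2, wherein the constraint matrix is denoted A, the correction vector is denoted x, and the system of linear equations is denoted Ax = b,
    wherein b is a vector whose elements are all 1.
  4. The method according to claim 3, wherein the solving the system of linear equations iteratively by using the Jacobi method, to determine the correction vector, comprises:
    computing the correction vector by
    $$x_i^{(k)} = \frac{1}{a_{ii}}\Bigl(b_i - \sum_{j \ne i} a_{ij}\,x_j^{(k-1)}\Bigr);$$
    wherein $x_i$ denotes the i-th element of the correction vector x, $x_j$ denotes the j-th element of the correction vector x, $a_{ij}$ denotes the element in row i, column j of the constraint matrix A, $a_{ii}$ denotes the element in row i, column i of the constraint matrix A, $b_i = 1$, k is the iteration count of the Jacobi method, i, j = 1, 2, …, n, and k is a positive integer.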
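  5. The method according to any one of claims 1 to 4, wherein the decay factor is denoted c, the transition matrix is denoted P, the constraint matrix is denoted A, and the computing a constraint matrix according to the transition matrix and the decay factor comprises:
    determining the elements of the constraint matrix A as $a_{ij} = e_i \cdot e_j + c\,(Pe_i)\cdot(Pe_j) + \dots + c^{t}(P^{t}e_i)\cdot(P^{t}e_j)$,
    wherein $e_i$ and $e_j$ are orthonormal unit vectors, and t is a preset positive integer.
  6. The method according to any one of claims 1 to 5, wherein the correction vector is denoted x, the diagonal correction matrix is denoted D, and the generating a diagonal correction matrix according to the correction vector comprises:
    determining the element $D_{ij}$ of the diagonal correction matrix D as:
    $$D_{ij} = \begin{cases} x_i, & i = j \\ 0, & i \ne j \end{cases}$$
    wherein $D_{ij}$ denotes the element in row i, column j of the diagonal correction matrix D, $x_i$ denotes the i-th element of the correction vector x, and i, j = 1, 2, …, n.
  7. The method according to any one of claims 1 to 6, wherein the decay factor is denoted c, the transition matrix is denoted P, the diagonal correction matrix is denoted D, the similarity between the nodes is denoted S, and the computing the similarity between the n nodes according to the transition matrix, the decay factor, and the diagonal correction matrix comprises:
    computing the similarity between the n nodes according to:
    $S = D + cP^{T}DP + c^{2}(P^{T})^{2}DP^{2} + \dots + c^{t}(P^{T})^{t}DP^{t}$,
    wherein T denotes transposition, t is a preset positive integer, and the element $s_{ij}$ in row i, column j of the matrix S is the similarity between the i-th node and the j-th node.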
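  8. The method according to any one of claims 1 to 7, wherein the obtaining pairwise directional relationships among n nodes in a network, and determining a transition matrix according to the directional relationships, comprises:
    constructing a graph according to the pairwise directional relationships among the n nodes in the network, wherein the n nodes form the n nodes of the graph, and the directional relationships form the directed edges between the nodes of the graph; and
    using the first-order transition matrix on the reverse graph of the graph as the transition matrix.
  9. The method according to claim 8, wherein the transition matrix is denoted P, and
    $$P_{ij} = \begin{cases} \dfrac{1}{|\mathrm{In}(j)|}, & (i, j) \in E \\ 0, & \text{otherwise} \end{cases}$$
    wherein $P_{ij}$ denotes the element in row i, column j of the transition matrix P, In(j) denotes the set of all nodes that point to node j, and E denotes the set of node pairs that have a directional relationship.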
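  10. A device for similarity measurement, comprising:
    an obtaining unit, configured to obtain pairwise directional relationships among n nodes in a network and to obtain a decay factor, wherein the decay factor is the decay factor defined in the SimRank similarity method, and n is a positive integer greater than or equal to 2; and
    a processing unit, configured to determine a transition matrix according to the directional relationships obtained by the obtaining unit, and to compute a constraint matrix according to the transition matrix and the decay factor obtained by the obtaining unit, wherein the transition matrix has dimension n×n, and the constraint matrix has dimension n×n;
    wherein the processing unit is further configured to construct a system of linear equations according to the constraint matrix, wherein the coefficient matrix of the system of linear equations is the constraint matrix, and the variable of the system of linear equations is a correction vector;
    the processing unit is further configured to solve the system of linear equations iteratively by using the Jacobi method, to determine the correction vector;
    the processing unit is further configured to generate a diagonal correction matrix according to the correction vector, wherein the diagonal elements of the diagonal correction matrix are the components of the correction vector, and the diagonal correction matrix has dimension n×n; and
    the processing unit is further configured to compute the similarity between the n nodes according to the transition matrix, the diagonal correction matrix, and the decay factor obtained by the obtaining unit.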
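  11. The device according to claim 10, wherein the processing unit is specifically configured to:
    solve the system of linear equations iteratively by using the Jacobi method, and take the solution at convergence as the correction vector, or take the solution obtained when a preset maximum number of iterations is reached as the correction vector.
  12. The device according to claim 10 or 11, wherein the constraint matrix is denoted A, the correction vector is denoted x, and the system of linear equations is denoted Ax = b,
    wherein b is a vector whose elements are all 1.
  13. The device according to claim 12, wherein the processing unit is specifically configured to:
    compute the correction vector by
    $$x_i^{(k)} = \frac{1}{a_{ii}}\Bigl(b_i - \sum_{j \ne i} a_{ij}\,x_j^{(k-1)}\Bigr);$$
    wherein $x_i$ denotes the i-th element of the correction vector x, $x_j$ denotes the j-th element of the correction vector x, $a_{ij}$ denotes the element in row i, column j of the constraint matrix A, $a_{ii}$ denotes the element in row i, column i of the constraint matrix A, $b_i = 1$, k is the iteration count of the Jacobi method, i, j = 1, 2, …, n, and k is a positive integer.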
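  14. The device according to any one of claims 10 to 13, wherein the decay factor is denoted c, the transition matrix is denoted P, the constraint matrix is denoted A, and the processing unit is specifically configured to:
    determine the elements of the constraint matrix A as $a_{ij} = e_i \cdot e_j + c\,(Pe_i)\cdot(Pe_j) + \dots + c^{t}(P^{t}e_i)\cdot(P^{t}e_j)$,
    wherein $e_i$ and $e_j$ are orthonormal unit vectors, and t is a preset positive integer.
  15. The device according to any one of claims 10 to 14, wherein the correction vector is denoted x, the diagonal correction matrix is denoted D, and the processing unit is specifically configured to:
    determine the element $D_{ij}$ of the diagonal correction matrix D as:
    $$D_{ij} = \begin{cases} x_i, & i = j \\ 0, & i \ne j \end{cases}$$
    wherein $D_{ij}$ denotes the element in row i, column j of the diagonal correction matrix D, $x_i$ denotes the i-th element of the correction vector x, and i, j = 1, 2, …, n.
  16. The device according to any one of claims 10 to 15, wherein the decay factor is denoted c, the transition matrix is denoted P, the diagonal correction matrix is denoted D, the similarity between the n nodes is denoted S, and the processing unit is specifically configured to:
    compute the similarity between the nodes according to:
    $S = D + cP^{T}DP + c^{2}(P^{T})^{2}DP^{2} + \dots + c^{t}(P^{T})^{t}DP^{t}$,
    wherein T denotes transposition, t is a preset positive integer, and the element $s_{ij}$ in row i, column j of the matrix S is the similarity between the i-th node and the j-th node.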
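  17. The device according to any one of claims 10 to 16, wherein the processing unit is specifically configured to:
    construct a graph according to the pairwise directional relationships, obtained by the obtaining unit, among the n nodes in the network, wherein the n nodes form the n nodes of the graph, and the directional relationships form the directed edges between the nodes of the graph; and
    use the first-order transition matrix on the reverse graph of the graph as the transition matrix.
  18. The device according to claim 17, wherein the transition matrix is denoted P, and
    $$P_{ij} = \begin{cases} \dfrac{1}{|\mathrm{In}(j)|}, & (i, j) \in E \\ 0, & \text{otherwise} \end{cases}$$
    wherein $P_{ij}$ denotes the element in row i, column j of the transition matrix P, In(j) denotes the set of all nodes that point to node j, and E denotes the set of node pairs that have a directional relationship.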
PCT/CN2016/074728 2015-03-03 2016-02-26 相似性度量的方法及设备 WO2016138836A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
EP16758452.3A EP3258368B1 (en) 2015-03-03 2016-02-26 Similarity measurement method and equipment
US15/694,559 US10579703B2 (en) 2015-03-03 2017-09-01 Similarity measurement method and device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510093574.2A CN105989154B (zh) 2015-03-03 2015-03-03 相似性度量的方法及设备
CN201510093574.2 2015-03-03

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/694,559 Continuation US10579703B2 (en) 2015-03-03 2017-09-01 Similarity measurement method and device

Publications (1)

Publication Number Publication Date
WO2016138836A1 true WO2016138836A1 (zh) 2016-09-09

Family

ID=56848352

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/074728 WO2016138836A1 (zh) 2015-03-03 2016-02-26 相似性度量的方法及设备

Country Status (4)

Country Link
US (1) US10579703B2 (zh)
EP (1) EP3258368B1 (zh)
CN (1) CN105989154B (zh)
WO (1) WO2016138836A1 (zh)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108052485B (zh) * 2017-12-15 2021-05-07 东软集团股份有限公司 向量相似度的分布式计算方法和装置,存储介质和节点
CN110751161B (zh) * 2018-07-23 2023-08-22 阿里巴巴(中国)有限公司 基于Spark的节点相似度计算方法、装置及终端

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101576903B (zh) * 2009-03-03 2011-03-30 杜小勇 一种文档相似度衡量方法
US8311950B1 (en) * 2009-10-01 2012-11-13 Google Inc. Detecting content on a social network using browsing patterns
JP5315291B2 (ja) * 2010-04-30 2013-10-16 インターナショナル・ビジネス・マシーンズ・コーポレーション グラフにおけるノードの間の類似度を計算するための方法、プログラム、およびシステム
WO2012118087A1 (ja) * 2011-03-03 2012-09-07 日本電気株式会社 レコメンダシステム、レコメンド方法、及びプログラム
US8582554B2 (en) 2011-04-21 2013-11-12 International Business Machines Corporation Similarity searching in large disk-based networks
US20130346386A1 (en) * 2012-06-22 2013-12-26 Microsoft Corporation Temporal topic extraction
CN103412872B (zh) * 2013-07-08 2017-04-26 西安交通大学 一种基于有限节点驱动的微博社会网络信息推荐方法
US10115115B2 (en) * 2014-09-16 2018-10-30 Microsoft Technology Licensing, Llc Estimating similarity of nodes using all-distances sketches
CN104361062B (zh) * 2014-11-03 2017-10-31 百度在线网络技术(北京)有限公司 一种关联信息的推荐方法及装置

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090262664A1 (en) * 2008-04-18 2009-10-22 Bonnie Berger Leighton Method for identifying network similarity by matching neighborhood topology
CN101894123A (zh) * 2010-05-11 2010-11-24 清华大学 基于子图的链接相似度的快速近似计算系统和方法
JP2013196201A (ja) * 2012-03-16 2013-09-30 Nippon Telegr & Teleph Corp <Ntt> 類似ノード検索装置及び方法及びプログラム
CN103177414A (zh) * 2013-03-27 2013-06-26 天津大学 一种基于结构的图节点相似度并行计算方法

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
MAEHARA, T. ET AL.: "Efficient SimRank Computation via Linearization", vol. 1411, 26 November 2014 (2014-11-26), pages 7729, XP055417598 *
WU, DANYU: "the Comparison between Jacobi Iteration and Gauss-Seidel Iteration", JOURNAL OF ZHONGKAI UNIVERSITY OF AGRICULTURE AND TECHNOLOGY, vol. 18, no. 3, 31 December 2005 (2005-12-31), pages 48 - 50, XP009500981 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109508727A (zh) * 2018-04-23 2019-03-22 北京航空航天大学 一种基于加权欧氏距离的度量功能间相似性的方法
CN109508727B (zh) * 2018-04-23 2021-07-16 北京航空航天大学 一种基于加权欧氏距离的度量功能间相似性的方法

Also Published As

Publication number Publication date
EP3258368B1 (en) 2020-01-29
US10579703B2 (en) 2020-03-03
CN105989154A (zh) 2016-10-05
CN105989154B (zh) 2020-07-14
EP3258368A4 (en) 2018-03-21
US20170364478A1 (en) 2017-12-21
EP3258368A1 (en) 2017-12-20

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16758452

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

REEP Request for entry into the european phase

Ref document number: 2016758452

Country of ref document: EP