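WO2020192289A1 - Method and device for determining graph node vectors in a relational network graph - Google Patents

Method and device for determining graph node vectors in a relational network graph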

Info

Publication number
WO2020192289A1
WO2020192289A1 PCT/CN2020/075012 CN2020075012W WO2020192289A1 WO 2020192289 A1 WO2020192289 A1 WO 2020192289A1 CN 2020075012 W CN2020075012 W CN 2020075012W WO 2020192289 A1 WO2020192289 A1 WO 2020192289A1
Authority
WO
WIPO (PCT)
Prior art keywords
node
association
account
degree
vector
Application number
PCT/CN2020/075012
Inventor
曹绍升
Original Assignee
阿里巴巴集团控股有限公司
Application filed by 阿里巴巴集团控股有限公司
Publication of WO2020192289A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data

Definitions

  • One or more embodiments of this specification relate to the field of computer information processing technology, and in particular to a computer-executed method and device for determining a graph node vector in a relational network graph.
  • A relational network graph describes the relationships between entities in the real world and is widely used in many kinds of computer information processing.
  • In general, a relational network graph includes a set of nodes and a set of edges.
  • The nodes represent entities in the real world.
  • The edges represent the connections between those entities. For example, in a social network, people are the entities, and the relationships or connections between people are the edges.
  • In many cases, it is desirable to represent each node (entity) in the relational network graph with a vector of the same dimension, that is, to generate a node vector for each node.
  • The generated node vectors can be used to calculate the similarity between nodes, discover the community structure in the graph, predict edges that may form in the future, visualize the graph, and so on.
  • Node vector generation has become a basic algorithm of graph computing. According to one solution, an unsupervised generation method can be used to generate node vectors for the nodes in the relational network graph. However, existing unsupervised generation methods cannot meet the accuracy requirements for node vectors.
  • One or more embodiments of this specification describe a computer-executed method and device for determining node vectors in a relational network graph. Through this method, the accuracy of the generated node vector can be effectively improved.
  • The relational network graph includes N nodes and connecting edges between the nodes, and the N nodes include an arbitrary first node.
  • The method includes: acquiring adjacency information of the relational network graph, the adjacency information being used to record the connection relationships between nodes in the relational network graph; determining, according to the adjacency information, a first degree of association between the first node and each of the N nodes, to obtain N first degrees of association, wherein the N nodes include a second node, and the first degree of association between the first node and the second node is related to the paths by which the first node reaches the second node through no more than a predetermined number K of connecting edges; and determining, based on the N first degrees of association, a second degree of association between the first node and each node, to obtain N second degrees of association, wherein the second degree of association between the first node and the second node is based on the first degree ...
  • the N nodes correspond to N users, and the connecting edge between the nodes indicates that there is an association relationship between the two correspondingly connected users.
  • The adjacency information is an adjacency matrix.
  • Determining the first degree of association between the first node and each of the N nodes includes: determining the symmetric matrix corresponding to the adjacency matrix; and summing the powers of the symmetric matrix from the 1st power to the K-th power to obtain a first matrix. The first matrix includes a first element whose row and column correspond to the first node and the second node respectively, and the value of the first element represents the first degree of association between the first node and the second node.
  • The relational network graph is a directed graph.
  • Determining the symmetric matrix corresponding to the adjacency matrix includes: summing the adjacency matrix and its transpose to obtain the symmetric matrix.
  • Determining the second degree of association between the first node and each node includes: dividing the first degree of association between the first node and the second node by the sum of the N first degrees of association; and determining the second degree of association between the first node and the second node based on the resulting quotient.
  • Determining the second degree of association between the first node and the second node based on the quotient includes: using the quotient as the second degree of association between the first node and the second node; or using the quotient as the input of a preset increasing function and taking the output as the second degree of association between the first node and the second node.
  • The N-dimensional data is an N-dimensional vector.
  • Constructing the N-dimensional data based on at least the N second degrees of association includes: forming the N second degrees of association into the N-dimensional vector of the first node. Performing dimensionality reduction on the N-dimensional data to obtain the node vector of the first node includes: inputting the N-dimensional vector into a restricted Boltzmann machine to obtain the node vector of the first node.
  • The N-dimensional data is an N-dimensional matrix.
  • Constructing the N-dimensional data based on at least the N second degrees of association includes: using the N second degrees of association corresponding to each of the N nodes as the row data corresponding to that node, to obtain an N-dimensional matrix.
  • Performing dimensionality reduction on the N-dimensional data to obtain the node vector of the first node includes: performing singular value decomposition on the N-dimensional matrix to obtain the corresponding left singular matrix; and using the vector formed by each row of the left singular matrix as the node vector of the corresponding node.
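The singular value decomposition route above can be sketched in a few lines of NumPy. This is a minimal illustration, not the patent's implementation: the function name `node_vectors_by_svd`, the target dimension `d`, and the choice to keep only the first `d` columns of the left singular matrix are assumptions made here so that the result actually has reduced dimension.

```python
import numpy as np

def node_vectors_by_svd(P, d):
    """Return an (N, d) array whose i-th row is the node vector of node i.

    P is the N x N matrix of second degrees of association.
    Keeping only the first d columns of U is an illustrative assumption.
    """
    # U is the left singular matrix; singular values come back in descending order
    U, s, Vt = np.linalg.svd(P, full_matrices=False)
    return U[:, :d]

# Toy 3-node matrix of second degrees of association (made-up values)
P = np.array([[0.50, 0.25, 0.25],
              [0.25, 0.50, 0.25],
              [0.25, 0.25, 0.50]])
vectors = node_vectors_by_svd(P, d=2)  # one 2-dimensional vector per node
```

Each row of `vectors` plays the role of one node vector; how many columns to keep is a tuning choice that the text does not fix.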
  • An apparatus for determining a node vector in a relational network graph, wherein the relational network graph includes N nodes and connecting edges between the nodes, and the N nodes include an arbitrary first node.
  • The apparatus includes: an acquiring unit configured to acquire adjacency information of the relational network graph, where the adjacency information is used to record the connection relationships between nodes in the relational network graph; a first determining unit configured to determine, according to the adjacency information, the first degree of association between the first node and each of the N nodes, to obtain N first degrees of association, wherein the N nodes include a second node, and the first degree of association between the first node and the second node is related to the paths by which the first node reaches the second node through no more than a predetermined number K of connecting edges; and a second determining unit configured to determine, based on the N first degrees of association, the second degree of association between the first node and each node, to obtain N second degrees of association, wherein the second degree of association between the first node and ...
  • A method for determining the risk status of an account, comprising: obtaining adjacency information of an account network graph, wherein the account network graph includes N accounts and connecting edges between accounts, and the adjacency information is used to record the connection relationships between accounts in the account network graph.
  • A first vector corresponding to a first account to be tested among the N accounts, and a second vector corresponding to a known account whose risk status is known, are determined through vector embedding processing.
  • The vector embedding processing includes: determining the first degree of association between any first account among the N accounts and each of the N accounts, to obtain N first degrees of association, wherein the N accounts include a second account, and the first degree of association between the first account and the second account is related to the paths by which the first account reaches the second account through no more than a predetermined number K of connecting edges; and determining, based on the N first degrees of association, the second degree of association between the first account and each account, to obtain N second degrees of association, wherein the second degree of association between the first account and the second account is based on ...
  • A device for determining the risk status of an account, comprising: an obtaining unit configured to obtain adjacency information of an account network graph, wherein the account network graph includes N accounts and connecting edges between accounts.
  • The adjacency information is used to record the connection relationships between accounts in the account network graph.
  • A first determining unit is configured to determine, through vector embedding processing according to the adjacency information, a first vector corresponding to a first account to be tested among the N accounts and a second vector corresponding to a known account whose risk status is known. The first determining unit specifically includes: a first determining subunit configured to determine the first degree of association between any first account among the N accounts and each of the N accounts, to obtain N first degrees of association, wherein the N accounts include a second account, and the first degree of association between the first account and the second account is related to the paths by which the first account reaches the second account through no more than a predetermined number K of connecting edges; a second determining subunit configured to determine, based on the N first degrees of association, the second degree of association between the first account and each account, to obtain N second degrees of association, wherein the second degree of association between the first account and the second account is determined based on the first degree of association between the first account and the second account and on the sum of the N first degrees of association; and a subunit configured to construct N-dimensional data based on at least ...
  • a computer-readable storage medium having a computer program stored thereon, and when the computer program is executed in a computer, the computer is caused to execute the method of the first aspect or the third aspect.
  • a computing device including a memory and a processor, where the memory stores executable code, and when the processor executes the executable code, the method of the first aspect or the third aspect is implemented.
  • the method for determining the node vector in the relational network graph disclosed in the embodiment of this specification can effectively improve the accuracy of the generated node vector.
  • Fig. 1 shows a schematic diagram of a relational network graph
  • Fig. 2 shows a flowchart of a method for determining a node vector in a relational network graph according to an embodiment
  • Fig. 3 shows a schematic diagram of an undirected graph according to an embodiment
  • Fig. 4 shows a schematic diagram of a directed graph according to an embodiment
  • Fig. 5 shows a schematic diagram of a directed graph whose connecting edges have weight values according to an embodiment
  • Fig. 6 shows a structural diagram of an apparatus for determining a node vector in a relational network graph according to an embodiment
  • Fig. 7 shows a flowchart of a method for determining the risk status of an account according to an embodiment
  • Fig. 8 shows a structural diagram of a device for determining the risk status of an account according to an embodiment.
  • the relational network graph can be abstracted to include a set of nodes and a set of edges, where nodes represent entities in the real world, and edges represent association relationships between entities.
  • Fig. 1 shows a schematic diagram of a relational network graph in which users serve as the nodes. As shown in the figure, users that have an association relationship are connected by edges.
  • A supervised algorithm or an unsupervised algorithm can be used to generate the node vectors of the nodes in the above-mentioned relational network graph.
  • Existing unsupervised generation algorithms struggle to meet the accuracy requirements for node vectors.
  • the embodiments of this specification provide an unsupervised generation method, which can generate node vectors with higher accuracy. In the following, the method will be introduced in conjunction with specific embodiments.
  • Fig. 2 shows a flowchart of a method for determining a node vector in a relational network graph according to an embodiment.
  • The method may be executed by any apparatus, device, platform, or device cluster with computing and processing capabilities.
  • The relational network graph to which the method applies includes multiple nodes and connecting edges between the nodes.
  • For ease of description, the multiple nodes are collectively referred to below as N nodes, where N is the number of nodes. Specifically, N may be an integer greater than 2, such as 1 million or 100 million. The first node refers to any one of the N nodes. The method is described below mainly from the perspective of determining the node vector of the first node.
  • The method includes the following steps: step S210, acquiring adjacency information of the relational network graph, where the adjacency information is used to record the connection relationships between nodes in the relational network graph; step S220, determining, according to the adjacency information, the first degree of association between the first node and each of the N nodes, to obtain N first degrees of association, wherein the N nodes include a second node, and the first degree of association between the first node and the second node is related to the paths by which the first node reaches the second node through no more than a predetermined number K of connecting edges; step S230, determining, based on the N first degrees of association, the second degree of association between the first node and each node, to obtain N second degrees of association, wherein the second degree of association between the first node and the second node is determined based on the first degree of association between the first node and the second node and on the sum of the N first degrees of association; step S240, constructing N-dimensional data based on at least the N ...
  • the method in FIG. 2 can determine the node vector in the relationship network graph, which includes nodes representing entities and edges representing association relationships between entities.
  • In one embodiment, the relational network graph is an undirected graph; that is, the association relationships between entities are not directional, or can be understood as bidirectional. Accordingly, an edge representing an association relationship between entities is an undirected edge, which can be represented by a line without an arrow.
  • the relational network diagram shown in Figure 3 is an undirected graph.
  • the nodes in the relationship network graph correspond to users, and the users can be identified by the user's ID or account number.
  • The connecting edges between nodes correspond to non-directional association relationships between users, which may include one or more of, for example, social relationships, media relationships, and kinship relationships.
  • In a social network formed based on social relationships, if two users follow a common object (for example, their Weibo accounts follow the same person), have previously been in contact, have joined a common group (for example, a QQ group or a WeChat group), or have interacted in activities such as red envelopes or lotteries, a social relationship can be considered to exist between the two corresponding nodes, and an undirected edge can be established to connect them.
  • In a media network formed based on media relationships, if two users have used the same medium, such as an encrypted bank card, ID card, mailbox, account number, mobile phone number, physical address (such as a MAC address), or terminal device number (such as a UMID, TID, or UTDID), a media relationship exists between the two users, and an undirected edge can be established to connect them.
  • In a kinship network formed based on kinship relationships, if two users have activated the intimate payment function on the payment platform, or their mobile phone numbers belong to the same kinship number combination, an undirected edge can be established to connect them.
  • nodes in the relationship network graph may correspond to commodities, and commodities may be identified by commodity IDs.
  • the connecting edges between nodes correspond to non-directional association relationships between commodities, and may specifically include one or more of the following, such as buyer association relationships, seller association relationships, category association relationships, etc.
  • For the buyer association relationship: if two products have been purchased by the same buyer, the two products can be considered to have a buyer association relationship.
  • For the seller association relationship: if two products are sold by the same seller, the two products can be considered to have a seller association relationship.
  • For the category association relationship: if the system platform classifies two commodities into the same category of a predetermined level, the two commodities can be considered to have a category association relationship.
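As a small illustration of the buyer association relationship described above, the sketch below derives undirected product-to-product edges from purchase records. The function name `buyer_edges` and the shape of the `purchases` input (buyer ID mapped to product IDs) are hypothetical and chosen only for this example.

```python
from itertools import combinations

def buyer_edges(purchases):
    """Return the set of undirected edges (a, b), a < b, between products
    that were bought by at least one common buyer.

    purchases: dict mapping a buyer ID to an iterable of product IDs
    (a hypothetical input format).
    """
    edges = set()
    for items in purchases.values():
        # every pair of distinct products bought by the same buyer is linked
        for a, b in combinations(sorted(set(items)), 2):
            edges.add((a, b))
    return edges

purchases = {"u1": ["p1", "p2", "p3"], "u2": ["p2", "p4"]}
edges = buyer_edges(purchases)
```

Seller and category associations could be built the same way by grouping products by seller or by category instead of by buyer.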
  • In another embodiment, the relational network graph is a directed graph; that is, the association relationships between entities are directional.
  • Accordingly, an edge representing an association relationship between entities is a directed edge, which can be represented by a line with an arrow.
  • For example, the relational network graph shown in Fig. 4 is a directed graph.
  • In one example, the nodes in the relational network graph correspond to users, and the connecting edges between nodes correspond to directional association relationships between users, such as transfer relationships, lending relationships, and subordinate relationships.
  • For example, if user A has transferred money to user B but user B has not transferred money to user A, the two users can be considered to have a one-way transfer relationship from user A to user B. Accordingly, a directed edge from user A to user B can be established.
  • As another example, if user C and user D have transferred money to each other, the two users can be considered to have a two-way transfer relationship. Accordingly, a directed edge from user C to user D and a directed edge from user D to user C can both be established.
  • the nodes in the relationship network graph may also represent other entities, and according to one or more relationships between entities, the nodes are connected by connecting edges.
  • In one embodiment, the relational network graph is a weighted graph; that is, its edges have corresponding weight values.
  • The specific weight values can be determined from the historical data corresponding to the association relationship represented by each edge, together with predetermined rules or algorithms.
  • For example, the directed edges in the directed graph shown in Fig. 5 have weights.
  • In one example, each node may represent a user, and the weight corresponds to the number of historical transfers from one user to another.
  • In another embodiment, the relational network graph is an unweighted graph; that is, the edges in the graph carry no weights, or equivalently all edges in the same relational network graph have the same weight value.
  • the undirected edges in the undirected graph shown in FIG. 3 and the directed edges in the directed graph shown in FIG. 4 do not have weights.
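The transfer example above can be made concrete with a short sketch that turns a transfer log into weighted directed edges, using the number of historical transfers as the weight. The record format (a list of `(payer, payee)` pairs) and the function name are assumptions for illustration only.

```python
from collections import Counter

def transfer_edge_weights(transfers):
    """Map each directed edge (payer, payee) to its number of historical transfers."""
    return Counter(transfers)

# Hypothetical transfer log: A paid B twice; C and D paid each other once
transfers = [("A", "B"), ("A", "B"), ("C", "D"), ("D", "C")]
weights = transfer_edge_weights(transfers)

# The unweighted view of the same graph gives every existing edge the same value
unweighted = {edge: 1 for edge in weights}
```

Here the one-way relationship from A to B yields a single directed edge of weight 2, while the two-way relationship between C and D yields two directed edges, one in each direction.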
  • In step S210, the adjacency information of the relational network graph is obtained; the adjacency information is used to record the connection relationships between nodes in the relational network graph.
  • The adjacency information can correspond to any of several storage methods for the relational network graph, including the adjacency matrix, edge array, adjacency list, cross-linked list, and adjacency multi-list, all of which can be used to record the connection relationships between nodes in the graph.
  • The connection relationships of the nodes may include whether there are connecting edges between the nodes, and the directionality and weight values of the connecting edges. For details, refer to the description above, which is not repeated here.
  • In one embodiment, the relational network graph is pre-stored in the form of an adjacency matrix, and accordingly the acquired adjacency information is the corresponding adjacency matrix.
  • In a specific embodiment, the relational network graph includes N nodes, and the corresponding adjacency matrix is an N-order square matrix X, where the element $X_{i,j}$ represents the value corresponding to the connecting edge from the node numbered i to the node numbered j.
  • In one example, the relational network graph is an unweighted undirected graph; in this case, an element value of 0 in the adjacency matrix means that there is no connecting edge between the corresponding two nodes, and a value of 1 means that there is a connecting edge between them.
  • In another example, the relational network graph is a weighted graph; in this case, an element value of 5 in the adjacency matrix means that the weight value of the corresponding connecting edge is 5.
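The adjacency matrix conventions just described can be sketched as follows. This is an illustrative construction, assuming nodes are numbered from 0 and edges are supplied as `(i, j, weight)` triples; the function name `adjacency_matrix` is not from the source.

```python
import numpy as np

def adjacency_matrix(n, edges, directed=False):
    """Build an N-order square adjacency matrix X from (i, j, weight) triples."""
    X = np.zeros((n, n))
    for i, j, w in edges:
        X[i, j] = w
        if not directed:
            X[j, i] = w  # an undirected edge is stored in both directions
    return X

# Unweighted undirected graph: 1 marks an edge, 0 marks no edge
X_unweighted = adjacency_matrix(3, [(0, 1, 1), (1, 2, 1)])

# Weighted directed graph: X[0, 1] == 5 means the edge 0 -> 1 has weight 5
X_weighted = adjacency_matrix(3, [(0, 1, 5), (1, 2, 2)], directed=True)
```

For the very large N mentioned in the text, a sparse representation would be more practical than a dense array; the dense form is used here only for readability.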
  • In another embodiment, the relational network graph is pre-stored in the form of an adjacency list, and accordingly the acquired adjacency information is the corresponding adjacency list.
  • In step S220, according to the adjacency information, the first degree of association between the first node and each of the N nodes is determined, to obtain N first degrees of association. The N nodes include a second node, and the first degree of association between the first node and the second node is related to the paths by which the first node reaches the second node through no more than a predetermined number K of connecting edges.
  • The predetermined number K is a positive integer that can be preset by staff according to actual needs; for example, it can be set to 2, 3, or 4.
  • In one embodiment, the relational network graph is an unweighted graph. Accordingly, the first degree of association between the first node and the second node is positively correlated with the number of paths by which the first node reaches the second node through no more than K connecting edges. In a specific embodiment, that number of paths may itself be used as the first degree of association between the first node and the second node. In a specific embodiment, the number of paths may be determined from the adjacency information, for example by traversing the adjacency list.
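Counting paths by traversing an adjacency list, as mentioned above, might look like the sketch below. It assumes that a "path" may revisit nodes (which is what matrix powers of an adjacency matrix count) and that the adjacency list is a dict from node to neighbour list; both are assumptions of this example, not statements from the source.

```python
def count_paths_within_k(adj, src, dst, k):
    """Count paths from src to dst that use between 1 and k connecting edges.

    adj: dict mapping each node to a list of its neighbours.
    Paths are allowed to revisit nodes (an assumption of this sketch).
    """
    total = 0
    frontier = {src: 1}  # node -> number of paths of the current length ending there
    for _ in range(k):
        nxt = {}
        for node, count in frontier.items():
            for nb in adj.get(node, []):
                nxt[nb] = nxt.get(nb, 0) + count
        total += nxt.get(dst, 0)  # paths of exactly this length that end at dst
        frontier = nxt
    return total

# Triangle graph: every node is connected to the other two
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
first_degree_0_2 = count_paths_within_k(adj, 0, 2, k=2)
```

With K = 2, node 0 reaches node 2 either directly or via node 1, so the count is 2.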
  • In another embodiment, the relational network graph is a weighted graph.
  • Accordingly, the first degree of association between the first node and the second node is related to the weight values of the edges on each of the paths by which the first node reaches the second node through no more than K connecting edges.
  • In a specific embodiment, the product of the weight values of the connecting edges included in each path may be used as the path weight of that path; the path weights of all the paths are then summed, and the sum is used as the first degree of association between the first node and the second node.
  • In another specific embodiment, the sum of the weight values of the connecting edges included in each path may be used as the path weight of that path; the path weights of all the paths are then summed, and the sum is used as the first degree of association between the first node and the second node.
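The two path-weight conventions above differ only in how the edge weights along one path are combined. The sketch below shows both on made-up data; the function names and the representation of a path as a list of its edge weights are illustrative assumptions.

```python
from math import prod

def path_weight(edge_weights, mode="product"):
    """Combine the edge weights of one path by multiplying or by adding them."""
    return prod(edge_weights) if mode == "product" else sum(edge_weights)

def first_degree(paths, mode="product"):
    """Sum the path weights over all paths between two nodes."""
    return sum(path_weight(p, mode) for p in paths)

# Two hypothetical paths between the same pair of nodes:
# one with two edges (weights 2 and 3) and one with a single edge (weight 4)
paths = [[2, 3], [4]]
by_product = first_degree(paths, "product")  # 2*3 + 4
by_sum = first_degree(paths, "sum")          # (2+3) + 4
```

The product convention is the one that matrix powers of a weighted adjacency matrix compute, which is why it pairs naturally with the matrix formulation.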
  • In one embodiment, the powers of the adjacency matrix from the 1st power to the K-th power may be summed to obtain the first matrix.
  • The relational network graph includes N nodes, so the corresponding adjacency matrix is an N-order square matrix, and the corresponding first matrix is also an N-order square matrix.
  • The values of the N elements in the row of the first matrix corresponding to the first node are the N first degrees of association of the first node.
  • The first matrix includes a first element whose row and column correspond to the first node and the second node respectively, and the value of the first element represents the first degree of association between the first node and the second node.
  • The above-mentioned adjacency matrix is an N-order square matrix X, and the first matrix M can be calculated by the following formula (1):

    $$M = \sum_{n=1}^{K} X^{n} \qquad (1)$$

  • where n = 1, 2, ..., K, and $X^{n}$ denotes the n-th power of X.
  • In the first matrix M, the row and column of the element $M_{i,j}$ correspond, respectively, to the node numbered i (hereinafter node i) and the node numbered j (hereinafter node j) among the N nodes.
  • The value of the element $M_{i,j}$ represents the first degree of association between node i and node j. Equivalently, the values of the N elements in the i-th row of M are the N first degrees of association between node i and the N nodes.
  • If the relational network graph is an unweighted graph, the element $x_{i,j}$ of the matrix $X^{n}$ represents the number of paths by which node i reaches node j through exactly n connecting edges.
  • Accordingly, $M_{i,j}$ represents the number of paths by which node i reaches node j through no more than the predetermined number K of connecting edges.
  • If the relational network graph is a weighted graph, the element $x_{i,j}$ of the matrix $X^{n}$ represents the sum of the path weights of the paths by which node i reaches node j through exactly n connecting edges (the path weight of each path here is the product of the weight values of the connecting edges included in that path).
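The sum of matrix powers from the 1st to the K-th power can be computed with a running product, as in this minimal NumPy sketch. The function name `first_matrix` and the 3-node example graph are assumptions made for illustration.

```python
import numpy as np

def first_matrix(X, K):
    """Return M = X^1 + X^2 + ... + X^K for a square adjacency matrix X."""
    M = np.zeros_like(X, dtype=float)
    power = np.eye(X.shape[0])
    for _ in range(K):
        power = power @ X  # power now holds the next power of X
        M += power
    return M

# Unweighted path graph 0 - 1 - 2
X = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
M = first_matrix(X, K=2)
```

In this example `M[0, 2]` is 1, because node 0 reaches node 2 only via node 1, and `M[1, 1]` is 2, because node 1 can return to itself through either neighbour in two edges.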
  • the relational network graph when the relational network graph is a directed graph, the corresponding path above also has directionality.
  • the directionality in the directed graph can be weakened. For example, as long as there is a connecting edge between the two nodes of the directed graph, two-way connectivity can be realized. This can avoid the calculation by the above formula (1).
  • When the relational network graph is a directed graph, the obtained adjacency matrix is first converted into a corresponding symmetric matrix, and the powers of the symmetric matrix from the 1st power to the K-th power are then summed to obtain the above first matrix.
  • the adjacency matrix of the undirected graph is originally a symmetric matrix, no additional conversion is required.
  • The adjacency matrix X can be converted into a symmetric matrix A by the following formula (2):

    $$A = X + X^{T} \qquad (2)$$

  • where T represents the matrix transposition operation. Further, the powers of the matrix A from the 1st power to the K-th power can be summed to obtain the above-mentioned first matrix.
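The effect of symmetrizing a directed adjacency matrix before summing its powers can be seen on a tiny example. The sketch assumes the symmetric matrix is the sum of the adjacency matrix and its transpose, as the text describes; the helper name `power_sum` and the 3-node chain are illustrative.

```python
import numpy as np

def power_sum(X, K):
    """Sum of the 1st through K-th powers of a square matrix."""
    return sum(np.linalg.matrix_power(X, n) for n in range(1, K + 1))

# Directed chain 0 -> 1 -> 2
X = np.array([[0, 1, 0],
              [0, 0, 1],
              [0, 0, 0]], dtype=float)

A = X + X.T                  # symmetric matrix: every edge now works in both directions
M_directed = power_sum(X, 2)
M_symmetric = power_sum(A, 2)
```

Without symmetrization node 2 cannot reach node 0, so `M_directed[2, 0]` is 0; after symmetrization `M_symmetric[2, 0]` is 1. Note that a pair of nodes joined by edges in both directions would get the value 2 in `A`.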
  • In step S230, based on the N first degrees of association, the second degree of association between the first node and each node is determined, to obtain N second degrees of association. The second degree of association between the first node and the second node is determined based on the first degree of association between the first node and the second node and on the sum of the N first degrees of association.
  • The following takes the second degree of association between the first node and the second node as an example to describe how the second degree of association between any two of the N nodes is determined.
  • The second degree of association between the first node and the second node is determined based on the first degree of association between the first node and the second node and on the sum of the N first degrees of association of the first node.
  • the second degree of association between the first node and the second node is positively related to the relative magnitude of the first degree of association between the two nodes and the above-mentioned sum.
  • the first degree of association between the first node and the second node is first divided by the above sum to obtain the corresponding quotient. Based on the obtained quotient, the second degree of association between the first node and the second node is determined. Further, in an example, the above quotient may be directly used as the second degree of association between the first node and the second node. In another example, the above quotient may be used as the input of the preset increasing function, and the obtained output result may be determined as the second degree of association between the first node and the second node.
  • the type of the preset increasing function and the constant value therein can be preset by the staff according to actual needs. For example, the type of the preset increasing function can be a logarithmic function or a linear function, and so on.
  • The following formula (3) can be used to determine the second degree of association $P_{i,j}$ between node i and node j:

    $$P_{i,j} = \log\left(\frac{M_{i,j}}{\sum_{t=1}^{N} M_{i,t}}\right) + C \qquad (3)$$

  • where t = 1, 2, ..., N;
  • $M_{i,j}$ represents the first degree of association between node i and node j;
  • $\sum_{t=1}^{N} M_{i,t}$ represents the sum of the elements in the i-th row of the matrix M (the row corresponding to node i).
  • The base of the log function and C are both hyperparameters.
  • The base can be set to a value greater than 1, such as 2 or 10, and C can be set to 0, 1, or -1, and so on. The base and the value of C can be combined in various ways.
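A row-wise version of the logarithmic second degree of association can be sketched as below. This assumes the quotient of each first degree of association by its row sum is passed through a logarithm and that C is added as a constant offset; the placement of C, the function name, and the requirement that all entries of M be strictly positive are assumptions of this sketch.

```python
import numpy as np

def second_degree(M, base=2.0, C=0.0):
    """Compute P[i, j] = log_base(M[i, j] / sum_t M[i, t]) + C.

    Assumes every entry of M is strictly positive; how zero entries
    are handled is not specified in the text.
    """
    row_sums = M.sum(axis=1, keepdims=True)  # sum of the N first degrees for each node
    quotient = M / row_sums
    return np.log(quotient) / np.log(base) + C

# Made-up first matrix in which every row sums to 4
M = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 1.0],
              [2.0, 1.0, 1.0]])
P = second_degree(M, base=2.0, C=0.0)
```

With base 2 and C = 0, the first row of `P` is approximately (-2, -2, -1), since the quotients are 1/4, 1/4, and 1/2.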
  • The above sum can also be input into a preset decreasing function to obtain a corresponding function value, and the first degree of association between the first node and the second node is then multiplied by that function value to obtain a product. Based on the product, the second degree of association between the first node and the second node is determined. In one example, the product can be directly used as the second degree of association between the first node and the second node.
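The variant that multiplies each first degree of association by a decreasing function of the row sum can be sketched as follows. The text does not name a specific decreasing function, so the reciprocal used as the default here is purely an illustrative assumption, as is the function name.

```python
def second_degree_via_decreasing(first_degrees, decreasing=lambda s: 1.0 / s):
    """Scale each first degree of association by decreasing(sum of all of them).

    The default reciprocal is a hypothetical choice of decreasing function.
    """
    total = sum(first_degrees)
    factor = decreasing(total)
    return [m * factor for m in first_degrees]

row = [1.0, 1.0, 2.0]  # made-up first degrees of association for one node
scaled = second_degree_via_decreasing(row)
```

With the reciprocal, the result coincides with dividing each first degree of association by the sum, which shows how this variant relates to the quotient-based one; other decreasing functions would weight large row sums differently.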
  • the second degree of association between the first node and each node can be determined, and N second degree of association can be obtained.
  • step S240 construct N-dimensional data based on at least the N second degree of association of the first node; and, in step S250, perform dimensionality reduction processing on the N-dimensional data to obtain the first node The node vector.
  • the N-dimensional data can be either an N-dimensional vector or an N-dimensional matrix; whether it is constructed as a vector or as a matrix depends on the selected dimensionality reduction algorithm.
  • the magnitude of N can be relatively large.
  • N can be in the tens of millions or hundreds of millions, and the generated node vector usually needs to be used in subsequent calculations. If the N-dimensional vector were used directly as the node vector of the corresponding node, the subsequent calculations could become very costly in computation and computing resources. Therefore, the N-dimensional data needs to be reduced in dimension, and the node vector is then determined from the reduced data.
  • the N second degrees of association of the first node may be formed into the N-dimensional vector of the first node.
  • for $P_{i,j}$ obtained from the above formula (4), $(P_{i,1}, P_{i,2}, \dots, P_{i,N})$ can be constructed as the N-dimensional vector of node i.
  • the N second degrees of association corresponding to each of the N nodes may be used as the row data for that node, to obtain an N-dimensional matrix.
  • $P_{i,j}$ may be used as the value of the element in the i-th row and j-th column, thereby giving the corresponding N-dimensional matrix P.
  • the dimensionality reduction processing is to perform linear or non-linear operations on the feature data in the original high-dimensional samples to obtain processed samples with reduced dimensionality.
  • the feature value in the processed sample does not directly correspond to a certain feature in the original sample, but is the result of a common operation of multiple features in the original sample.
  • a restricted Boltzmann machine (RBM) may be used to perform the dimensionality reduction processing.
  • the above-mentioned N-dimensional vector of the first node is input into the RBM to obtain the node vector of the first node.
  • RBM includes two layers of neural networks.
  • the first layer of RBM is called the visible layer or input layer, and the second layer is called the hidden layer.
  • Neurons in the same layer are independent of each other, while neurons in different network layers are connected to each other (two-way connection).
  • the number of nodes in the hidden layer needs to be set in advance, and the set value d is usually much smaller than the number of input nodes in the input layer (which corresponds, for example, to the above-mentioned dimension N).
  • for example, when N is 100 million, d can be set to 100 or 50.
  • the N-dimensional vector corresponding to the first node is input into the RBM, and the corresponding d-dimensional vector can be obtained as the node vector of the first node.
  • Singular Value Decomposition can be used to perform dimensionality reduction processing.
  • the above-mentioned N-dimensional matrix is first subjected to singular value decomposition to obtain the corresponding left singular matrix, and the vector composed of each row of data in the left singular matrix is then used as the node vector of the corresponding node.
  • the left singular matrix U of order N*d can be obtained, and the vector composed of its i-th row, namely $(U_{i,1}, U_{i,2}, \dots, U_{i,d})$, is then used as the node vector of node i.
  • the dimensionality reduction method corresponding to the above-mentioned dimensionality reduction processing may further include a principal component analysis (PCA) method.
  • the PCA method transforms the original N-dimensional data, through a linear orthogonal transformation, into a representation whose dimensions are linearly independent. In the transformed result, the first principal component has the largest variance, and each subsequent component has the largest variance subject to being orthogonal to the preceding principal components.
  • the dimensionality reduction method includes the LASSO (least absolute shrinkage and selection operator) method.
  • This method is a shrinkage estimation, and its basic idea is to minimize the residual sum of squares under the constraint that the sum of the absolute values of the regression coefficients is less than a constant.
  • some transform operations used in wavelet analysis can eliminate some interference data and can also reduce dimensionality. Therefore, they can also be used as a dimensionality reduction method.
  • the dimensionality reduction method may also include the Linear Discriminant Analysis (LDA) method, Laplacian eigenmaps, locally linear embedding (LLE), and so on.
  • the node vector corresponding to the first node can be determined. It can be understood that the first node is any node among the N nodes in the relational network graph. Therefore, the node vector corresponding to each node in the N nodes can be determined by the method.
  • the method for determining node vectors in the relational network graph disclosed in the embodiments of this specification can effectively improve the accuracy of the generated node vectors.
  • a device for determining a node vector in a relational network graph is provided, and the device can be deployed in any device, platform, or device cluster with computing and processing capabilities.
  • Fig. 6 shows a structural diagram of an apparatus for determining a node vector in a relational network graph according to an embodiment.
  • the device 600 in FIG. 6 is used to determine a node vector in a relational network graph, where the relational network graph includes N nodes and connecting edges between nodes, and the N nodes include any first node.
  • the device 600 includes:
  • the obtaining unit 610 is configured to obtain adjacency information of the relationship network graph, where the adjacency information is used to record the connection relationship between nodes in the relationship network graph.
  • the first determining unit 620 is configured to determine, according to the adjacency information, a first degree of association between the first node and each of the N nodes, to obtain N first degrees of association; wherein the nodes include a second node, and the first degree of association between the first node and the second node is related to the paths by which the first node reaches the second node through no more than a predetermined number K of connecting edges.
  • the second determining unit 630 is configured to determine a second degree of association between the first node and each node based on the N first degrees of association, to obtain N second degrees of association; wherein the second degree of association between the first node and the second node is determined based on the first degree of association between the first node and the second node, and the sum of the N first degrees of association.
  • the constructing unit 640 is configured to construct N-dimensional data based on at least the N second degrees of association.
  • the dimension reduction unit 650 is configured to perform dimension reduction processing on the N-dimensional data to obtain the node vector of the first node.
  • the N nodes correspond to N users, and the connecting edge between the nodes indicates that there is an association relationship between the two correspondingly connected users.
  • the adjacency information is an adjacency matrix
  • the first determining unit 620 specifically includes: a first determining subunit 621, configured to determine a symmetric matrix corresponding to the adjacency matrix; and a first calculating subunit 622, configured to sum the powers of the symmetric matrix from the 1st power to the K-th power, K being the predetermined number, to obtain a first matrix. The first matrix includes a first element whose row and column correspond to the first node and the second node respectively, and the value of the first element represents the first degree of association between the first node and the second node.
  • the relationship network graph is an undirected graph; the first determining subunit 621 is specifically configured to determine the adjacency matrix as the symmetric matrix.
  • the relationship network graph is a directed graph
  • the first determining subunit 621 is specifically configured to: sum the adjacency matrix and the transpose of the adjacency matrix to obtain the symmetric matrix.
  • the second determining unit 630 specifically includes: a second calculating subunit 631, configured to divide the first degree of association between the first node and the second node by the sum of the N first degrees of association; and a second determining subunit 632, configured to determine the second degree of association between the first node and the second node based on the obtained quotient.
  • the second determining subunit 632 is specifically configured to: use the quotient as the second degree of association between the first node and the second node; or use the quotient as the input of a preset increasing function, and determine the obtained output as the second degree of association between the first node and the second node.
  • the N-dimensional data is an N-dimensional vector
  • the construction unit 640 is specifically configured to: form the N-dimensional vector of the first node by the N second association degrees; and the dimensionality reduction unit 650 is specifically configured to: input the N-dimensional vector into a restricted Boltzmann machine to obtain the node vector of the first node.
  • the N-dimensional data is an N-dimensional matrix
  • the construction unit 640 is specifically configured to: use the N second degrees of association corresponding to each of the N nodes as the row data for that node, to obtain an N-dimensional matrix; the dimensionality reduction unit 650 is specifically configured to: perform singular value decomposition on the N-dimensional matrix to obtain the corresponding left singular matrix; and use the vector composed of each row of data in the left singular matrix as the node vector of the corresponding node.
  • FIG. 7 shows a flowchart of a method for determining the risk status of an account according to an embodiment. The method may be executed by any apparatus, device, platform, or device cluster with computing and processing capabilities. As shown in FIG. 7, the method specifically includes the following steps:
  • in step S710, the adjacency information of the account network diagram is obtained.
  • the account network diagram includes N accounts and the connecting edges between the accounts.
  • the adjacency information is used to record the connection relationships between the accounts in the account network diagram.
  • step S710 refers to the foregoing description of step S210, which is not repeated here.
  • in step S720, according to the adjacency information, the first vector corresponding to the first account to be tested among the N accounts and the second vector corresponding to a known account whose account risk status is known are determined through vector embedding processing.
  • the vector embedding processing includes: step S721, determining a first degree of association between any first account of the N accounts and each of the N accounts, to obtain N first degrees of association; wherein the accounts include a second account, and the first degree of association between the first account and the second account is related to the paths by which the first account reaches the second account through no more than a predetermined number K of connecting edges.
  • step S722, determining a second degree of association between the first account and each account based on the N first degrees of association, to obtain N second degrees of association; wherein the second degree of association between the first account and the second account is determined based on the first degree of association between the first account and the second account and the sum of the N first degrees of association.
  • step S723, constructing N-dimensional data based on at least the N second degrees of association; step S724, performing dimensionality reduction processing on the N-dimensional data to obtain the embedding vector of the first account.
  • steps S721 to S724 reference may be made to the foregoing description of steps S220 to S250, which will not be repeated here.
  • the first vector corresponding to the first account to be tested and the second vector corresponding to the known account whose account risk status is known can be obtained.
  • the account risk status may include multiple types.
  • the account risk status may include normal and abnormal.
  • the account risk status may include low risk, medium risk, high risk, and so on. It should be noted that a known account whose account risk status is known can be obtained by pre-calibration by staff based on feedback from users such as complaints and freezing requests.
  • step S730 based on the first vector and the second vector, the account risk status of the first account to be tested is determined.
  • the similarity between the first vector and the second vector is determined first. Further, in a case where the similarity is greater than a predetermined threshold, it is determined that the account risk status of the first account to be tested is consistent with the known account.
  • the predetermined threshold may be preset by the staff based on actual experience, for example, it may be set to 0.8 or 0.9, and so on.
  • the account risk status of the aforementioned known account is abnormal. Further, in an example, assuming that the predetermined threshold is 0.85, and the determined similarity is 0.9, it can be determined that the first account to be detected is an abnormal account.
  • a device for determining the risk status of an account is provided, which can be deployed in any device, platform, or device cluster with computing and processing capabilities.
  • Fig. 8 shows a structural diagram of a device for determining the risk status of an account according to an embodiment. As shown in Fig. 8, the device 800 includes:
  • the obtaining unit 810 is configured to obtain adjacency information of the account network diagram, the account network diagram includes N accounts and connection edges between accounts, and the adjacency information is used to record the connections between accounts in the account network diagram relationship.
  • the first determining unit 820 is configured to determine, through vector embedding processing according to the adjacency information, the first vector corresponding to the first account to be tested among the N accounts and the second vector corresponding to a known account whose account risk status is known.
  • the first determining unit 820 specifically includes: a first determining subunit 821, configured to determine a first degree of association between any first account among the N accounts and each of the N accounts, to obtain N first degrees of association; wherein the accounts include a second account, and the first degree of association between the first account and the second account is related to the paths by which the first account reaches the second account through no more than a predetermined number K of connecting edges; and a second determining subunit 822, configured to determine the second degree of association between the first account and each account based on the N first degrees of association, to obtain N second degrees of association; wherein the second degree of association between the first account and the second account is determined based on the first degree of association between the first account and the second account, and the sum of the N first degrees of association;
  • the constructing subunit 823 is configured to construct N-dimensional data based on at least the N second degrees of association; the dimensionality reduction subunit 824 is configured to perform dimensionality reduction processing on the N-dimensional data to obtain the embedding vector of the first account.
  • the second determining unit 830 is configured to determine the account risk status of the first account to be tested based on the first vector and the second vector.
  • the second determining unit 830 is specifically configured to: determine the similarity between the first vector and the second vector; and, if the similarity is greater than a predetermined threshold, determine that the account risk status of the first account to be tested is consistent with that of the known account.
  • a computer-readable storage medium having a computer program stored thereon, and when the computer program is executed in a computer, the computer is caused to execute the method described in conjunction with FIG. 2 or FIG. 7.
  • a computing device including a memory and a processor, where the memory stores executable code, and when the processor executes the executable code, the method described in conjunction with FIG. 2 or FIG. 7 is implemented.


Abstract

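The embodiments of this specification provide a computer-executed method for determining node vectors in a relational network graph, where the relational network graph includes N nodes and connecting edges between the nodes, and the N nodes include an arbitrary first node. The method includes: first, obtaining adjacency information of the relational network graph, which records the connection relationships between the nodes in the graph; next, determining, according to the adjacency information, N first degrees of association between the first node and the N nodes, where the first degree of association between the first node and a second node among the N nodes is related to the paths by which the first node reaches the second node through no more than a predetermined number K of connecting edges; then, determining, based on the N first degrees of association, a second degree of association between the first node and each node, to obtain N second degrees of association; next, constructing N-dimensional data based on at least the N second degrees of association; and finally, performing dimensionality reduction on the N-dimensional data to obtain the node vector of the first node.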

Description

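Method and device for determining graph node vectors in a relational network graph
Technical Field
One or more embodiments of this specification relate to the field of computer information processing technology, and in particular to a computer-executed method and device for determining graph node vectors in a relational network graph.
Background
A relational network graph describes the relationships between entities in the real world and is widely used in many kinds of computer information processing. Generally, a relational network graph contains a node set and an edge set, where nodes represent real-world entities and edges represent the connections between those entities. For example, in a social network, people are the entities, and the relationships or contacts between people are the edges.
In many cases, it is desirable to analyze the topological characteristics of the nodes, edges, and so on in a relational network graph and to extract useful information from them; the computational methods that implement this kind of process are called graph computing. Typically, it is desirable to represent each node (entity) in the relational network graph by a vector of the same dimension, that is, to generate a node vector for each node. The generated node vectors can then be used to compute the similarity between nodes, discover community structure in the graph, predict edges that may form in the future, visualize the graph, and so on.
Generating node vectors has become a basic algorithm of graph computing. According to one approach, an unsupervised generation method can be used to generate the node vectors of the nodes in a relational network graph. However, existing unsupervised generation methods struggle to meet the accuracy requirements for node vectors.
Therefore, a reasonable solution is needed that can generate graph node vectors with higher accuracy.
Summary
One or more embodiments of this specification describe a computer-executed method and device for determining node vectors in a relational network graph. With such a method, the accuracy of the generated node vectors can be effectively improved.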
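According to a first aspect, a computer-executed method for determining node vectors in a relational network graph is provided, where the relational network graph includes N nodes and connecting edges between the nodes, and the N nodes include an arbitrary first node. The method includes: obtaining adjacency information of the relational network graph, the adjacency information being used to record the connection relationships between the nodes in the relational network graph; determining, according to the adjacency information, a first degree of association between the first node and each of the N nodes, to obtain N first degrees of association, where the nodes include a second node, and the first degree of association between the first node and the second node is related to the paths by which the first node reaches the second node through no more than a predetermined number K of connecting edges; determining, based on the N first degrees of association, a second degree of association between the first node and each node, to obtain N second degrees of association, where the second degree of association between the first node and the second node is determined based on the first degree of association between the first node and the second node and on the sum of the N first degrees of association; constructing N-dimensional data based on at least the N second degrees of association; and performing dimensionality reduction on the N-dimensional data to obtain the node vector of the first node.
In one embodiment, the N nodes correspond to N users, and a connecting edge between nodes indicates that the two correspondingly connected users have an association relationship.
In one embodiment, the adjacency information is an adjacency matrix, and determining the first degree of association between the first node and each of the N nodes includes: determining a symmetric matrix corresponding to the adjacency matrix; and summing the powers of the symmetric matrix from the 1st power to the K-th power, K being the predetermined number, to obtain a first matrix, where the first matrix includes a first element whose row and column correspond to the first node and the second node respectively, and the value of the first element represents the first degree of association between the first node and the second node.
Further, in a specific embodiment, the relational network graph is an undirected graph, and determining the symmetric matrix corresponding to the adjacency matrix includes: determining the adjacency matrix itself as the symmetric matrix.
In another specific embodiment, the relational network graph is a directed graph, and determining the symmetric matrix corresponding to the adjacency matrix includes: summing the adjacency matrix and the transpose of the adjacency matrix to obtain the symmetric matrix.
In one embodiment, determining the second degree of association between the first node and each node includes: dividing the first degree of association between the first node and the second node by the sum of the N first degrees of association; and determining the second degree of association between the first node and the second node based on the resulting quotient.
Further, in a specific embodiment, determining the second degree of association between the first node and the second node based on the resulting quotient includes: using the quotient as the second degree of association between the first node and the second node; or using the quotient as the input of a preset increasing function and determining the resulting output as the second degree of association between the first node and the second node.
In one embodiment, the N-dimensional data is an N-dimensional vector; constructing the N-dimensional data based on at least the N second degrees of association includes: forming the N second degrees of association into the N-dimensional vector of the first node; and performing dimensionality reduction on the N-dimensional data to obtain the node vector of the first node includes: inputting the N-dimensional vector into a restricted Boltzmann machine to obtain the node vector of the first node.
In one embodiment, the N-dimensional data is an N-dimensional matrix; constructing the N-dimensional data based on at least the N second degrees of association includes: using the N second degrees of association corresponding to each of the N nodes as the row data for that node, to obtain the N-dimensional matrix; and performing dimensionality reduction on the N-dimensional data to obtain the node vector of the first node includes: performing singular value decomposition on the N-dimensional matrix to obtain the corresponding left singular matrix; and using the vector formed by each row of the left singular matrix as the node vector of the corresponding node.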
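According to a second aspect, a device for determining node vectors in a relational network graph is provided, where the relational network graph includes N nodes and connecting edges between the nodes, and the N nodes include an arbitrary first node. The device includes: an obtaining unit, configured to obtain adjacency information of the relational network graph, the adjacency information being used to record the connection relationships between the nodes in the relational network graph; a first determining unit, configured to determine, according to the adjacency information, a first degree of association between the first node and each of the N nodes, to obtain N first degrees of association, where the nodes include a second node, and the first degree of association between the first node and the second node is related to the paths by which the first node reaches the second node through no more than a predetermined number K of connecting edges; a second determining unit, configured to determine, based on the N first degrees of association, a second degree of association between the first node and each node, to obtain N second degrees of association, where the second degree of association between the first node and the second node is determined based on the first degree of association between the first node and the second node and on the sum of the N first degrees of association; a constructing unit, configured to construct N-dimensional data based on at least the N second degrees of association; and a dimensionality reduction unit, configured to perform dimensionality reduction on the N-dimensional data to obtain the node vector of the first node.
According to a third aspect, a method for determining the risk status of an account is provided. The method includes: obtaining adjacency information of an account network diagram, where the account network diagram includes N accounts and connecting edges between the accounts, and the adjacency information is used to record the connection relationships between the accounts in the account network diagram. According to the adjacency information, a first vector corresponding to a first account to be tested among the N accounts and a second vector corresponding to a known account whose account risk status is known are determined through vector embedding processing, where the vector embedding processing includes: determining a first degree of association between any first account of the N accounts and each of the N accounts, to obtain N first degrees of association, where the accounts include a second account, and the first degree of association between the first account and the second account is related to the paths by which the first account reaches the second account through no more than a predetermined number K of connecting edges; determining, based on the N first degrees of association, a second degree of association between the first account and each account, to obtain N second degrees of association, where the second degree of association between the first account and the second account is determined based on the first degree of association between the first account and the second account and on the sum of the N first degrees of association; constructing N-dimensional data based on at least the N second degrees of association; and performing dimensionality reduction on the N-dimensional data to obtain the embedding vector of the first account. Based on the first vector and the second vector, the account risk status of the first account to be tested is determined.
According to a fourth aspect, a device for determining the risk status of an account is provided. The device includes: an obtaining unit, configured to obtain adjacency information of an account network diagram, where the account network diagram includes N accounts and connecting edges between the accounts, and the adjacency information is used to record the connection relationships between the accounts in the account network diagram. A first determining unit is configured to determine, through vector embedding processing according to the adjacency information, a first vector corresponding to a first account to be tested among the N accounts and a second vector corresponding to a known account whose account risk status is known, where the first determining unit specifically includes: a first determining subunit, configured to determine a first degree of association between any first account of the N accounts and each of the N accounts, to obtain N first degrees of association, where the accounts include a second account, and the first degree of association between the first account and the second account is related to the paths by which the first account reaches the second account through no more than a predetermined number K of connecting edges; a second determining subunit, configured to determine, based on the N first degrees of association, a second degree of association between the first account and each account, to obtain N second degrees of association, where the second degree of association between the first account and the second account is determined based on the first degree of association between the first account and the second account and on the sum of the N first degrees of association; a constructing subunit, configured to construct N-dimensional data based on at least the N second degrees of association; and a dimensionality reduction subunit, configured to perform dimensionality reduction on the N-dimensional data to obtain the embedding vector of the first account. A second determining unit is configured to determine the account risk status of the first account to be tested based on the first vector and the second vector.
According to a fifth aspect, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to execute the method of the first aspect or the third aspect.
According to a sixth aspect, a computing device is provided, including a memory and a processor, where the memory stores executable code, and when the processor executes the executable code, the method of the first aspect or the third aspect is implemented.
The method for determining node vectors in a relational network graph disclosed in the embodiments of this specification can effectively improve the accuracy of the generated node vectors.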
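Brief Description of the Drawings
To explain the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 shows a schematic diagram of a relational network graph;
FIG. 2 shows a flowchart of a method for determining node vectors in a relational network graph according to an embodiment;
FIG. 3 shows a schematic diagram of an undirected graph according to an embodiment;
FIG. 4 shows a schematic diagram of a directed graph according to an embodiment;
FIG. 5 shows a schematic diagram of a directed graph whose connecting edges have weight values according to an embodiment;
FIG. 6 shows a structural diagram of a device for determining node vectors in a relational network graph according to an embodiment;
FIG. 7 shows a flowchart of a method for determining the risk status of an account according to an embodiment;
FIG. 8 shows a structural diagram of a device for determining the risk status of an account according to an embodiment.
Detailed Description
The solutions provided in this specification are described below with reference to the drawings.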
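As mentioned above, a relational network graph can be abstracted as a node set and an edge set, where nodes represent entities in the real world and edges represent the association relationships between entities. FIG. 1 shows a schematic diagram of a relational network graph, using users as the nodes. As shown in the figure, users that have an association relationship are connected by edges.
At present, supervised or unsupervised algorithms can be used to generate the node vectors of the nodes in such a relational network graph. However, existing unsupervised generation algorithms struggle to meet the accuracy requirements for node vectors. Based on this, the embodiments of this specification provide an unsupervised generation method that can generate node vectors with higher accuracy. The method is introduced below with reference to specific embodiments.
FIG. 2 shows a flowchart of a method for determining node vectors in a relational network graph according to an embodiment. The method may be executed by any apparatus, device, platform, or device cluster with computing and processing capabilities. The relational network graph to which the method applies includes multiple nodes and connecting edges between the nodes.
To describe the method more clearly, the multiple nodes are referred to below as N nodes, where N is the number of nodes; specifically, N may be an integer greater than 2, such as 1 million or 100 million. The term first node refers to any one of the N nodes. The method is described below mainly from the perspective of determining the node vector of the first node.
As shown in FIG. 2, the method includes the following steps: step S210, obtaining adjacency information of the relational network graph, the adjacency information being used to record the connection relationships between the nodes in the relational network graph; step S220, determining, according to the adjacency information, a first degree of association between the first node and each of the N nodes, to obtain N first degrees of association, where the nodes include a second node, and the first degree of association between the first node and the second node is related to the paths by which the first node reaches the second node through no more than a predetermined number K of connecting edges; step S230, determining, based on the N first degrees of association, a second degree of association between the first node and each node, to obtain N second degrees of association, where the second degree of association between the first node and the second node is determined based on the first degree of association between the first node and the second node and on the sum of the N first degrees of association; step S240, constructing N-dimensional data based on at least the N second degrees of association; and step S250, performing dimensionality reduction on the N-dimensional data to obtain the node vector of the first node.
The specific execution of each of the above steps is described below with reference to concrete examples.
As described above, the method of FIG. 2 can determine node vectors in a relational network graph, where the relational network graph includes nodes representing entities and edges representing the association relationships between entities.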
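In one embodiment, the relational network graph is an undirected graph; that is, the association relationships between entities have no direction, or can be understood as bidirectional. Accordingly, the edges representing the association relationships are undirected edges, which can be drawn as lines without arrows. The relational network graph shown in FIG. 3 is an undirected graph.
Further, in a specific embodiment, the nodes in the relational network graph correspond to users, and a user can be identified by a user ID, an account, or the like. The connecting edges between nodes correspond to non-directional association relationships between users, which may include one or more of the following: social relationships, medium relationships, kinship relationships, and so on. In one example, in a social network formed from social relationships, if two users follow the same object (for example, two Weibo accounts follow the same person), or have previously been in contact, or have joined the same group (for example, a QQ group or WeChat group), or have interacted in activities such as red envelopes or lotteries, the two nodes can be considered to have a social relationship, and an undirected edge can be created to connect them. In one example, in a medium network formed from medium relationships, if two users have used the same medium, such as an encrypted bank card, ID card, e-mail address, household account number, mobile phone number, physical address (for example, a MAC address), or terminal device number (for example, UMID, TID, UTDID), the two users are associated by a medium relationship, and an undirected edge can be created to connect them. In one example, in a kinship network formed from kinship relationships, if two users have enabled the 亲密付 (family payment) feature on a payment platform, or their mobile phone numbers belong to the same 亲情号 (family number) group, an undirected edge can be created to connect them.
In another specific embodiment, the nodes in the relational network graph may correspond to commodities, and a commodity can be identified by a commodity ID. The connecting edges between nodes correspond to non-directional association relationships between commodities, which may include one or more of the following: buyer association, seller association, category association, and so on. In one example, if two commodities have been purchased by the same buyer, they can be considered to have a buyer association. In one example, if two commodities have been sold by the same seller, they can be considered to have a seller association. In one example, if two commodities are assigned by the system platform to the same category at a predetermined level, they can be considered to have a category association.
In another embodiment, the relational network graph is a directed graph; that is, the association relationships between entities have direction. Accordingly, the edges representing the association relationships are directed edges, which can be drawn as lines with arrows. The relational network graph shown in FIG. 4 is a directed graph.
Further, in a specific embodiment, the nodes in the relational network graph correspond to users, and the connecting edges between nodes correspond to directional association relationships between users, such as transfer relationships, lending relationships, and superior-subordinate relationships. In one example, if user A has transferred money to user B but user B has never transferred money to user A, the two users can be considered to have a one-way transfer relationship from user A to user B, and accordingly a directed edge from user A to user B can be created. In one example, if user C and user D have transferred money to each other, the two users can be considered to have a two-way transfer relationship, and accordingly a directed edge from user C to user D and a directed edge from user D to user C can be created.
It can be understood that the nodes in a relational network graph can also represent other entities, with nodes connected by connecting edges according to one or more kinds of relationships between the entities.
On the other hand, in one embodiment, the relational network graph is a weighted graph; that is, its edges have corresponding weight values, and a specific weight value can be determined from the historical data of the association relationship represented by the edge together with a predetermined rule or algorithm. In one example, the directed edges in the directed graph shown in FIG. 5 have weights. Further, the nodes may represent users, and a weight corresponds to the historical number of transfers from one user to another. In another embodiment, the relational network graph is an unweighted graph; that is, the weights of its edges are not considered, or all edges in the same relational network graph can be understood to have the same weight value. In one example, the undirected edges in the undirected graph of FIG. 3 and the directed edges in the directed graph of FIG. 4 have no weights.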
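The relational network graph has been introduced above. To determine the node vectors in the relational network graph, first, in step S210, adjacency information of the relational network graph is obtained, the adjacency information being used to record the connection relationships between the nodes in the relational network graph.
It should be noted that the adjacency information can correspond to any of several storage formats for a relational network graph, including an adjacency matrix, an edge array, an adjacency list, an orthogonal list, and an adjacency multilist, all of which can be used to record the connection relationships between the nodes in the graph. In addition, as follows from the description of the relational network graph above, the connection relationship between nodes can include whether a connecting edge exists between the nodes, and the direction and weight value of the connecting edge; see above for details, which are not repeated here.
In one embodiment, the relational network graph is stored in advance as an adjacency matrix, and accordingly the obtained adjacency information is the corresponding adjacency matrix. In a specific embodiment, the relational network graph includes N nodes, and its adjacency matrix is a square matrix X of order N, where the element $X_{i,j}$ is the value corresponding to the connecting edge from the node numbered i to the node numbered j among the N nodes. In one example, the relational network graph is an unweighted undirected graph; in that case, an element value of 0 in the adjacency matrix indicates that there is no connecting edge between the two corresponding nodes, and a value of 1 indicates that there is a connecting edge between them. In another example, the relational network graph is a weighted graph; in that case, an element value of 5 in the adjacency matrix indicates that the weight of the corresponding connecting edge is 5. In another embodiment, the relational network graph is stored in advance as an adjacency list, and accordingly the obtained adjacency information is the corresponding adjacency list.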
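The adjacency matrix described for step S210 can be sketched as follows. This is a minimal illustration only: the function name `build_adjacency_matrix`, the 0-based node numbering, and the edge-tuple layout are assumptions made for the example and do not come from the specification.

```python
def build_adjacency_matrix(n, edges, directed=False):
    """Build an n x n adjacency matrix X as a list of lists.

    edges: iterable of (i, j) or (i, j, weight) tuples with 0-based node ids
    (a hypothetical input format). A missing weight defaults to 1, which
    corresponds to the unweighted case where X[i][j] is 0 or 1.
    """
    X = [[0] * n for _ in range(n)]
    for edge in edges:
        i, j = edge[0], edge[1]
        w = edge[2] if len(edge) > 2 else 1
        X[i][j] = w
        if not directed:
            # An undirected edge is recorded in both directions.
            X[j][i] = w
    return X


# Unweighted undirected path graph 0 - 1 - 2.
undirected = build_adjacency_matrix(3, [(0, 1), (1, 2)])
# Weighted directed graph with one edge 0 -> 1 of weight 5.
weighted = build_adjacency_matrix(2, [(0, 1, 5)], directed=True)
```

A dense list-of-lists is used only to keep the sketch short; for N in the millions a sparse representation would be needed.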
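With the above, the adjacency information of the relational network graph can be obtained. Next, in step S220, a first degree of association between the first node and each of the N nodes is determined according to the adjacency information, to obtain N first degrees of association, where the nodes include a second node, and the first degree of association between the first node and the second node is related to the paths by which the first node reaches the second node through no more than a predetermined number K of connecting edges.
First, taking the first degree of association between the first node and the second node as an example, the meaning of the first degree of association between any two nodes and the way to determine it are explained. The first degree of association between the first node and the second node is related to the paths by which the first node reaches the second node through no more than the predetermined number K of connecting edges. Here the predetermined number K is a positive integer that can be preset by staff according to actual needs; for example, it can be set to 2, 3, or 4, and so on.
In one embodiment, the relational network graph is an unweighted graph, and accordingly the first degree of association between the first node and the second node can be positively correlated with the number of paths by which the first node reaches the second node through no more than the predetermined number K of connecting edges. In a specific embodiment, this number of paths can be determined as the first degree of association between the first node and the second node. In a specific embodiment, the number of paths can be determined from the adjacency information, for example by traversing an adjacency list.
In another embodiment, the relational network graph is a weighted graph, and accordingly the first degree of association between the first node and the second node can be related to the weight values of the edges on each of the paths by which the first node reaches the second node through no more than the predetermined number K of connecting edges. In a specific embodiment, the product of the weight values of the connecting edges included in each path can be used as the path weight of that path, and the path weights of all the paths are then summed, with the resulting sum used as the first degree of association between the first node and the second node. In another specific embodiment, the sum of the weights of the connecting edges included in each path can be used as the path weight of that path, and the path weights of all the paths are then summed, with the resulting sum used as the first degree of association between the first node and the second node.
The way to determine the first degree of association is further described below with reference to specific embodiments and examples.
In a specific embodiment, the powers of the adjacency matrix from the 1st power to the K-th power, K being the predetermined number, can be summed to obtain a first matrix. It can be understood that the relational network graph includes N nodes, the corresponding adjacency matrix is a square matrix of order N, and the resulting first matrix is also a square matrix of order N; in addition, the values of the N elements in the row of the first matrix corresponding to the first node are the N degrees of association of the first node. Specifically, the first matrix includes a first element whose row and column correspond to the first node and the second node respectively, and the value of the first element represents the first degree of association between the first node and the second node.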
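In one example, the adjacency matrix is a square matrix X of order N, and the first matrix M can be calculated by the following formula (1):
$$M = X + X^{2} + X^{3} + \dots + X^{K} = \sum_{n=1}^{K} X^{n} \tag{1}$$
where n = 1, 2, ..., K. The matrix M obtained from formula (1) includes the element $M_{i,j}$, whose row and column correspond to the node numbered i (hereinafter node i) and the node numbered j (hereinafter node j) among the N nodes, respectively; the value of $M_{i,j}$ represents the first degree of association between node i and node j. It can likewise be understood that the values of the N elements in the i-th row of M correspond to the N degrees of association between node i and the N nodes.
It should be noted that, in one case, the relational network graph is an unweighted graph; then the element $x_{i,j}$ of the matrix $X^{n}$ represents the number of paths by which node i reaches node j through exactly n connecting edges, and accordingly $M_{i,j}$ represents the number of paths by which node i reaches node j through no more than the predetermined number K of connecting edges. In another case, the relational network graph is a weighted graph; then the element $x_{i,j}$ of the matrix $X^{n}$ represents the sum of the path weights of the paths by which node i reaches node j through exactly n connecting edges, where the path weight of a path here is the product of the weight values of the edges that the path contains.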
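Formula (1) can be illustrated with a small pure-Python sketch. The helper names `matmul` and `first_association_matrix` are hypothetical, and the dense list-of-lists representation is chosen only for readability, not because the specification prescribes it.

```python
def matmul(A, B):
    """Multiply two matrices given as lists of lists."""
    inner, cols = len(B), len(B[0])
    return [[sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(len(A))]


def first_association_matrix(X, K):
    """Return M = X + X^2 + ... + X^K as in formula (1)."""
    n = len(X)
    power = [row[:] for row in X]  # current power X^p, starting at p = 1
    M = [row[:] for row in X]      # running sum of the powers
    for _ in range(2, K + 1):
        power = matmul(power, X)
        M = [[M[i][j] + power[i][j] for j in range(n)] for i in range(n)]
    return M


# Unweighted path graph 0 - 1 - 2 with K = 2.
X = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
M = first_association_matrix(X, 2)

# Weighted example: entries of X^n sum the products of edge weights.
W = first_association_matrix([[0, 2], [3, 0]], 2)
```

In the path-graph example, `M[0][2]` is 1 because node 0 reaches node 2 only through the two-edge route via node 1. Note that the matrix powers count routes that may revisit a node, so `M[0][0]` is also 1 (the route 0 → 1 → 0).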
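In addition, because the connecting edges of a directed graph have direction, when the relational network graph is a directed graph the corresponding paths also have direction. In actual computation, the directionality of the directed graph can be weakened, for example by treating any two nodes joined by a connecting edge as connected in both directions; this avoids mathematical or performance problems that might arise when computing with formula (1). In a specific embodiment, when the relational network graph is a directed graph, the obtained adjacency matrix is first converted into a corresponding symmetric matrix, and the powers of the symmetric matrix from the 1st power to the K-th power are then summed to obtain the first matrix. It should also be noted that the adjacency matrix of an undirected graph is already symmetric, so no conversion is needed. In one example, the adjacency matrix X can be converted into a symmetric matrix A by the following formula (2):
$$A = X + X^{T} \tag{2}$$
where T denotes the matrix transpose. Further, the powers of matrix A from the 1st power to the predetermined K-th power can be summed to obtain the first matrix.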
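A brief sketch of formula (2) follows, assuming the adjacency matrix is held as a list of lists; the function name `symmetrize` is illustrative only.

```python
def symmetrize(X):
    """Return A = X + X^T for a square matrix given as a list of lists."""
    n = len(X)
    return [[X[i][j] + X[j][i] for j in range(n)] for i in range(n)]


# Directed chain 0 -> 1 -> 2.
A = symmetrize([[0, 1, 0],
                [0, 0, 1],
                [0, 0, 0]])

# A pair of nodes with edges in both directions.
B = symmetrize([[0, 1],
                [1, 0]])
```

One consequence worth noting: where the directed graph already has edges in both directions between two nodes, the corresponding entry of A becomes 2 rather than 1, so mutual links carry more weight in the later power sum.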
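With the above, the first degree of association between the first node and each of the N nodes can be determined from the obtained adjacency information, to obtain N first degrees of association.
Then, in step S230, a second degree of association between the first node and each node is determined based on the N first degrees of association, to obtain N second degrees of association, where the second degree of association between the first node and the second node is determined based on the first degree of association between the first node and the second node and on the sum of the N first degrees of association.
First, taking the second degree of association between the first node and the second node as an example, the way to determine the second degree of association between any two of the N nodes is explained. Specifically, the second degree of association between the first node and the second node is determined based on the first degree of association between the first node and the second node and on the sum of the N first degrees of association of the first node. In one embodiment, the second degree of association between the first node and the second node is positively correlated with the size of the first degree of association between these two nodes relative to that sum.
In a specific embodiment, the first degree of association between the first node and the second node is first divided by the above sum to obtain the corresponding quotient. The second degree of association between the first node and the second node is then determined based on the quotient. Further, in one example, the quotient can be used directly as the second degree of association between the first node and the second node. In another example, the quotient can be used as the input of a preset increasing function, and the resulting output is determined as the second degree of association between the first node and the second node. In a specific example, the type of the preset increasing function and the constants in it can be preset by staff according to actual needs; for example, the preset increasing function can be a logarithmic function or a linear function, and so on.
According to a specific example, based on the first matrix M obtained from formula (1), the following formula (3) can be used to determine the second degree of association $P_{i,j}$ between node i and node j: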
Figure PCTCN2020075012-appb-000001
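where t = 1, 2, ..., K; $M_{i,j}$ represents the first degree of association between node i and node j; and $\sum_{t} M_{i,t}$ represents the sum of the values of all elements in the i-th row of the matrix (the row corresponding to node i). In addition, the base of the log function and C are both hyperparameters. Generally, the base can be set to a value greater than 1, such as 2 or 10, and C can be set to 0, 1, or -1, and so on. It can be understood that the base and the value of C can be combined in various ways.
It should be noted that when the matrix X in formula (1) is a symmetric matrix, M is also a symmetric matrix, and accordingly $M_{i,t} = M_{t,i}$; that is to say, formula (3) is equivalent to the following formula (4):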
Figure PCTCN2020075012-appb-000002
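The quotient-based second degree of association can be sketched as below. Formulas (3) and (4) are images in the original publication and are not reproduced in this text, so the logarithmic function used here, `log_base(q) + C`, is only an assumed example of a "preset increasing function" and should not be read as the patent's exact formula. The names `second_association` and `log_increasing`, and the handling of zero values, are likewise assumptions.

```python
import math


def log_increasing(base=2.0, c=0.0):
    """Return an assumed increasing function q -> log_base(q) + c."""
    def fn(q):
        # log(0) is undefined; mapping it to -inf is an assumption of this sketch.
        return math.log(q, base) + c if q > 0 else float("-inf")
    return fn


def second_association(M, increasing_fn=None):
    """Divide each M[i][j] by the sum of row i, then optionally apply increasing_fn."""
    P = []
    for row in M:
        total = sum(row)
        # A row summing to zero (an isolated node) is left as zeros; assumption.
        quotients = [m / total if total else 0.0 for m in row]
        if increasing_fn is not None:
            quotients = [increasing_fn(q) for q in quotients]
        P.append(quotients)
    return P


M = [[1, 1],
     [1, 3]]
P_plain = second_association(M)
P_log = second_association(M, log_increasing(base=2.0, c=1.0))
```

With the quotient used directly, each row of `P_plain` sums to 1, so a node's second degrees of association express how its first degrees of association are distributed across the other nodes.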
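In another specific embodiment, the above sum can be input into a preset decreasing function to obtain the corresponding function value, and the first degree of association between the first node and the second node is then multiplied by that function value to obtain the corresponding product. The second degree of association between the first node and the second node is then determined based on the product. In one example, the product can be directly determined as the second degree of association between the first node and the second node.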
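The product-based variant can be sketched in the same way. The specification does not name a particular decreasing function, so the default `1 / (1 + s)` below is purely a hypothetical choice, as is the function name `second_association_product`.

```python
def second_association_product(M, decreasing_fn=lambda s: 1.0 / (1.0 + s)):
    """Multiply each M[i][j] by decreasing_fn(sum of row i).

    decreasing_fn is a placeholder for the "preset decreasing function";
    1 / (1 + s) is an assumption chosen only because it decreases in s.
    """
    P = []
    for row in M:
        factor = decreasing_fn(sum(row))
        P.append([m * factor for m in row])
    return P


P = second_association_product([[1, 1],
                                [1, 3]])
```

Passing `decreasing_fn=lambda s: 1.0 / s` for nonzero row sums reproduces the plain quotient, which shows how the two embodiments relate.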
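With the above, the second degree of association between the first node and each node can be determined, to obtain N second degrees of association. Next, in step S240, N-dimensional data is constructed based on at least the N second degrees of association of the first node; and, in step S250, dimensionality reduction is performed on the N-dimensional data to obtain the node vector of the first node.
It should be noted that the N-dimensional data can be either an N-dimensional vector or an N-dimensional matrix; whether it is constructed as a vector or as a matrix depends on the selected dimensionality reduction algorithm. In one case, the magnitude of N is relatively large; for example, N can be in the tens of millions or hundreds of millions, and the generated node vectors usually need to be used in subsequent calculations. If the N-dimensional vector were used directly as the node vector of the corresponding node, the subsequent calculations could become very costly in computation and computing resources. Therefore, the N-dimensional data needs to be reduced in dimension, and the node vector is then determined from the reduced data.
Specifically, regarding the construction of the N-dimensional data, in one embodiment the N second degrees of association of the first node can be formed into the N-dimensional vector of the first node. In one example, for $P_{i,j}$ obtained from formula (4), $(P_{i,1}, P_{i,2}, \dots, P_{i,N})$ can be constructed as the N-dimensional vector of node i.
In another embodiment, the N second degrees of association corresponding to each of the N nodes can be used as the row data for that node, to obtain an N-dimensional matrix. In one example, for $P_{i,j}$ obtained from formula (4), $P_{i,j}$ can be used as the value of the element in the i-th row and j-th column, giving the corresponding N-dimensional matrix P.
On the other hand, in one embodiment, there are many methods for the dimensionality reduction processing. Dimensionality reduction performs linear or non-linear operations on the feature data of the original high-dimensional samples to obtain processed samples of lower dimension. Generally, a feature value in a processed sample does not correspond directly to a single feature in the original sample, but is the result of an operation on multiple features of the original sample together.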
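In a specific embodiment, a restricted Boltzmann machine (RBM) can be used for the dimensionality reduction processing; specifically, the N-dimensional vector of the first node is input into the RBM to obtain the node vector of the first node. It should be noted that an RBM is a two-layer neural network: the first layer is called the visible layer or input layer, and the second layer is called the hidden layer. Neurons within the same layer are independent of each other, while neurons in different layers are connected to each other (bidirectionally). The number of nodes in the hidden layer needs to be set in advance, and the set value d is usually much smaller than the number of input nodes in the input layer (which corresponds, for example, to the dimension N); for example, when N is 100 million, d can be set to 100 or 50. In one example, the N-dimensional vector corresponding to the first node is input into the RBM, and the corresponding d-dimensional vector is obtained as the node vector of the first node.
In another specific embodiment, singular value decomposition (SVD) can be used for the dimensionality reduction processing; specifically, singular value decomposition is first performed on the N-dimensional matrix to obtain the corresponding left singular matrix, and the vector formed by each row of the left singular matrix is then used as the node vector of the corresponding node. In one example, singular value decomposition of the N-dimensional matrix P yields a left singular matrix U of order N*d, and the vector formed by its i-th row, namely $(U_{i,1}, U_{i,2}, \dots, U_{i,d})$, is used as the node vector of node i.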
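The SVD step can be sketched with NumPy as follows. This is an illustration under stated assumptions: the function name `svd_node_vectors` is hypothetical, and keeping the first d columns of U (those belonging to the d largest singular values) is one reasonable reading of "a left singular matrix of order N*d", not something the text spells out.

```python
import numpy as np


def svd_node_vectors(P, d):
    """Return an N x d array whose i-th row is the node vector of node i.

    np.linalg.svd returns singular values in descending order, so the first
    d columns of U correspond to the d largest singular values.
    """
    P = np.asarray(P, dtype=float)
    U, singular_values, Vt = np.linalg.svd(P, full_matrices=False)
    return U[:, :d]


# Small hypothetical matrix of second degrees of association for 3 nodes.
P = [[0.50, 0.50, 0.00],
     [0.25, 0.50, 0.25],
     [0.00, 0.50, 0.50]]
node_vectors = svd_node_vectors(P, d=2)
vector_of_node_0 = node_vectors[0]
```

A dense decomposition like this is only practical for small N. At the scale the description mentions (tens or hundreds of millions of nodes), a sparse truncated solver would be needed instead.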
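In yet another specific embodiment, the dimensionality reduction method may also include the principal component analysis (PCA) method. The PCA method transforms the original N-dimensional data, through a linear orthogonal transformation, into a representation whose dimensions are linearly independent; in the transformed result, the first principal component has the largest variance, and each subsequent component has the largest variance subject to being orthogonal to the preceding principal components. In still another specific embodiment, the dimensionality reduction method includes the LASSO (least absolute shrinkage and selection operator) method. This method is a shrinkage estimation, and its basic idea is to minimize the residual sum of squares under the constraint that the sum of the absolute values of the regression coefficients is less than a constant. In still another specific embodiment, some transform operations used in wavelet analysis can eliminate some interference data and can also reduce dimensionality, so they can also be used as a dimensionality reduction method.
In addition, the dimensionality reduction method may also include the Linear Discriminant Analysis (LDA) method, Laplacian eigenmaps, locally linear embedding (LLE), and so on.
From the above, the node vector corresponding to the first node can be determined. It can be understood that the first node is any one of the N nodes in the relational network graph, so the method can determine the node vector corresponding to each of the N nodes.
In summary, the method for determining node vectors in a relational network graph disclosed in the embodiments of this specification can effectively improve the accuracy of the generated node vectors.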
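According to an embodiment of another aspect, a device for determining node vectors in a relational network graph is provided; the device can be deployed in any device, platform, or device cluster with computing and processing capabilities. FIG. 6 shows a structural diagram of a device for determining node vectors in a relational network graph according to an embodiment. The device 600 in FIG. 6 is used to determine node vectors in a relational network graph, where the relational network graph includes N nodes and connecting edges between the nodes, and the N nodes include an arbitrary first node. As shown in FIG. 6, the device 600 includes:
an obtaining unit 610, configured to obtain adjacency information of the relational network graph, the adjacency information being used to record the connection relationships between the nodes in the relational network graph;
a first determining unit 620, configured to determine, according to the adjacency information, a first degree of association between the first node and each of the N nodes, to obtain N first degrees of association, where the nodes include a second node, and the first degree of association between the first node and the second node is related to the paths by which the first node reaches the second node through no more than a predetermined number K of connecting edges;
a second determining unit 630, configured to determine, based on the N first degrees of association, a second degree of association between the first node and each node, to obtain N second degrees of association, where the second degree of association between the first node and the second node is determined based on the first degree of association between the first node and the second node and on the sum of the N first degrees of association;
a constructing unit 640, configured to construct N-dimensional data based on at least the N second degrees of association; and
a dimensionality reduction unit 650, configured to perform dimensionality reduction on the N-dimensional data to obtain the node vector of the first node.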
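In one embodiment, the N nodes correspond to N users, and a connecting edge between nodes indicates that the two correspondingly connected users have an association relationship.
In one embodiment, the adjacency information is an adjacency matrix, and the first determining unit 620 specifically includes: a first determining subunit 621, configured to determine a symmetric matrix corresponding to the adjacency matrix; and a first calculating subunit 622, configured to sum the powers of the symmetric matrix from the 1st power to the K-th power, K being the predetermined number, to obtain a first matrix, where the first matrix includes a first element whose row and column correspond to the first node and the second node respectively, and the value of the first element represents the first degree of association between the first node and the second node.
Further, in a specific embodiment, the relational network graph is an undirected graph, and the first determining subunit 621 is specifically configured to determine the adjacency matrix as the symmetric matrix.
In another specific embodiment, the relational network graph is a directed graph, and the first determining subunit 621 is specifically configured to sum the adjacency matrix and the transpose of the adjacency matrix to obtain the symmetric matrix.
In one embodiment, the second determining unit 630 specifically includes: a second calculating subunit 631, configured to divide the first degree of association between the first node and the second node by the sum of the N first degrees of association; and a second determining subunit 632, configured to determine the second degree of association between the first node and the second node based on the resulting quotient.
Further, in a specific embodiment, the second determining subunit 632 is specifically configured to: use the quotient as the second degree of association between the first node and the second node; or use the quotient as the input of a preset increasing function and determine the resulting output as the second degree of association between the first node and the second node.
In one embodiment, the N-dimensional data is an N-dimensional vector; the constructing unit 640 is specifically configured to form the N second degrees of association into the N-dimensional vector of the first node; and the dimensionality reduction unit 650 is specifically configured to input the N-dimensional vector into a restricted Boltzmann machine to obtain the node vector of the first node.
In one embodiment, the N-dimensional data is an N-dimensional matrix; the constructing unit 640 is specifically configured to use the N second degrees of association corresponding to each of the N nodes as the row data for that node, to obtain the N-dimensional matrix; and the dimensionality reduction unit 650 is specifically configured to perform singular value decomposition on the N-dimensional matrix to obtain the corresponding left singular matrix, and to use the vector formed by each row of the left singular matrix as the node vector of the corresponding node.
With the above device, node vectors with higher accuracy can be generated.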
根据又一方面的实施例,本说明书实施例还提供一种账号风险状态的确定方法。具体地,图7示出根据一个实施例的账号风险状态的确定方法流程图,所述方法的执行主体可以为任何具有计算、处理能力的装置或设备或平台或设备集群。如图7所示,所述方法具体包括以下步骤:
首先,步骤S710,获取账号网络图的邻接信息,所述账号网络图中包括N个账号以及账号之间的连接边,所述邻接信息用于记录所述账号网络图中账号之间的连接关系。
需要说明的是,对步骤S710的描述,可以参见前述对步骤S210的描述,在此不作赘述。
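Next, in step S720, according to the adjacency information, a first vector corresponding to a first account to be tested among the N accounts, and a second vector corresponding to a known account whose account risk status is known, are determined through vector embedding processing, where the vector embedding processing includes: step S721, determining a first degree of association between an arbitrary first account among the N accounts and each of the N accounts, to obtain N first degrees of association, where the accounts include a second account, and the first degree of association between the first account and the second account is related to the paths by which the first account reaches the second account through no more than a predetermined number K of connecting edges; step S722, determining, based on the N first degrees of association, a second degree of association between the first account and each account, to obtain N second degrees of association, where the second degree of association between the first account and the second account is determined based on the first degree of association between the first account and the second account and on the sum of the N first degrees of association; step S723, constructing N-dimensional data based at least on the N second degrees of association; and step S724, performing dimension reduction on the N-dimensional data to obtain the embedding vector of the first account.
It should be noted that, for descriptions of steps S721 to S724, reference may be made to the foregoing descriptions of steps S220 to S250; details are not repeated here. In this way, based on the determined embedding vectors of the N accounts, the first vector corresponding to the first account to be tested and the second vector corresponding to the known account whose account risk status is known can be obtained.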
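In addition, in an embodiment, there may be multiple account risk statuses. In a specific embodiment, the account risk statuses may include normal and abnormal. In another specific embodiment, the account risk statuses may include low risk, medium risk, and high risk, and so on. It should be noted that known accounts whose account risk status is known may be obtained through labeling in advance by staff based on feedback such as user complaints and freeze requests.
Then, in step S730, the account risk status of the first account to be tested is determined based on the first vector and the second vector.
In an embodiment, the similarity between the first vector and the second vector is determined first. Further, if the similarity is greater than a predetermined threshold, the account risk status of the first account to be tested is determined to be the same as that of the known account. In a specific embodiment, the predetermined threshold may be set in advance by staff based on practical experience, for example to 0.8 or 0.9. In a specific embodiment, the account risk status of the known account is abnormal. Further, in an example, assuming the predetermined threshold is 0.85 and the determined similarity is 0.9, the first account to be tested can be determined to be an abnormal account.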
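The threshold comparison in step S730 can be sketched as follows. The specification does not state which similarity measure is used; cosine similarity is assumed here purely for illustration, and the names `cosine_similarity` and `infer_risk_status` are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # assumption: a zero vector is similar to nothing
    return dot / (norm_u * norm_v)


def infer_risk_status(first_vector, second_vector, known_status, threshold=0.85):
    """Return the known account's status if the vectors are similar enough.

    Returns None when the similarity does not exceed the threshold, meaning
    this known account gives no verdict for the account to be tested.
    """
    if cosine_similarity(first_vector, second_vector) > threshold:
        return known_status
    return None


# Illustrative embedding vectors (made-up values, not from the specification).
status = infer_risk_status([1.0, 0.0], [0.9, 0.1], known_status="abnormal")
no_verdict = infer_risk_status([1.0, 0.0], [0.0, 1.0], known_status="abnormal")
```

Returning `None` below the threshold reflects that the text only defines the outcome for the case where the similarity exceeds the threshold; what happens otherwise is left open.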
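Thus, because the node vectors are highly accurate, the accuracy of account risk status detection can be improved accordingly.
According to an embodiment of still another aspect, an apparatus for determining an account risk status is provided. The apparatus may be deployed in any device, platform, or device cluster having computing and processing capabilities. FIG. 8 is a structural diagram of an apparatus for determining an account risk status according to an embodiment. As shown in FIG. 8, the apparatus 800 includes:
An acquiring unit 810, configured to acquire adjacency information of an account network graph, where the account network graph includes N accounts and connecting edges between the accounts, and the adjacency information is used to record the connection relationships between accounts in the account network graph.
A first determining unit 820, configured to determine, according to the adjacency information and through vector embedding processing, a first vector corresponding to a first account to be tested among the N accounts, and a second vector corresponding to a known account whose account risk status is known.
The first determining unit 820 specifically includes: a first determining subunit 821, configured to determine a first degree of association between an arbitrary first account among the N accounts and each of the N accounts, to obtain N first degrees of association, where the accounts include a second account, and the first degree of association between the first account and the second account is related to the paths by which the first account reaches the second account through no more than a predetermined number K of connecting edges; a second determining subunit 822, configured to determine, based on the N first degrees of association, a second degree of association between the first account and each account, to obtain N second degrees of association, where the second degree of association between the first account and the second account is determined based on the first degree of association between the first account and the second account and on the sum of the N first degrees of association; a constructing subunit 823, configured to construct N-dimensional data based at least on the N second degrees of association; and a dimension reduction subunit 824, configured to perform dimension reduction on the N-dimensional data to obtain the embedding vector of the first account.
A second determining unit 830, configured to determine the account risk status of the first account to be tested based on the first vector and the second vector.
In an embodiment, the second determining unit 830 is specifically configured to: determine the similarity between the first vector and the second vector; and, if the similarity is greater than a predetermined threshold, determine that the account risk status of the first account to be tested is the same as that of the known account.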
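According to an embodiment of another aspect, a computer-readable storage medium is further provided, on which a computer program is stored; when the computer program is executed in a computer, the computer is caused to perform the method described with reference to FIG. 2 or FIG. 7.
According to an embodiment of yet another aspect, a computing device is further provided, including a memory and a processor, where the memory stores executable code, and the processor, when executing the executable code, implements the method described with reference to FIG. 2 or FIG. 7.
Those skilled in the art should appreciate that, in one or more of the above examples, the functions described in the present invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored in a computer-readable medium or transmitted as one or more instructions or code on a computer-readable medium.
The specific implementations described above further explain in detail the objectives, technical solutions, and beneficial effects of the present invention. It should be understood that the above are merely specific implementations of the present invention and are not intended to limit its scope of protection; any modification, equivalent replacement, improvement, or the like made on the basis of the technical solutions of the present invention shall fall within the scope of protection of the present invention.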

Claims (24)

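  1. A computer-executed method for determining node vectors in a relational network graph, the relational network graph including N nodes and connecting edges between the nodes, the N nodes including an arbitrary first node; the method comprising:
    acquiring adjacency information of the relational network graph, the adjacency information being used to record the connection relationships between nodes in the relational network graph;
    determining, according to the adjacency information, a first degree of association between the first node and each of the N nodes, to obtain N first degrees of association; wherein the nodes include a second node, and the first degree of association between the first node and the second node is related to the paths by which the first node reaches the second node through no more than a predetermined number K of connecting edges;
    determining, based on the N first degrees of association, a second degree of association between the first node and each node, to obtain N second degrees of association; wherein the second degree of association between the first node and the second node is determined based on the first degree of association between the first node and the second node and on the sum of the N first degrees of association;
    constructing N-dimensional data based at least on the N second degrees of association;
    performing dimension reduction on the N-dimensional data to obtain the node vector of the first node.
  2. The method according to claim 1, wherein the N nodes correspond to N users, and a connecting edge between nodes indicates that the two correspondingly connected users have an association relationship.
  3. The method according to claim 1, wherein the adjacency information is an adjacency matrix;
    the determining a first degree of association between the first node and each of the N nodes comprises:
    determining a symmetric matrix corresponding to the adjacency matrix;
    summing the powers of the symmetric matrix from the first power to the K-th power, K being the predetermined number, to obtain a first matrix, wherein the first matrix includes a first element, the row and column of the first element correspond to the first node and the second node respectively, and the value of the first element represents the first degree of association between the first node and the second node.
  4. The method according to claim 3, wherein the relational network graph is an undirected graph; the determining a symmetric matrix corresponding to the adjacency matrix comprises:
    determining the adjacency matrix as the symmetric matrix.
  5. The method according to claim 3, wherein the relational network graph is a directed graph, and the determining a symmetric matrix corresponding to the adjacency matrix comprises:
    summing the adjacency matrix and the transpose of the adjacency matrix to obtain the symmetric matrix.
  6. The method according to claim 1, wherein the determining a second degree of association between the first node and each node comprises:
    dividing the first degree of association between the first node and the second node by the sum of the N first degrees of association;
    determining the second degree of association between the first node and the second node based on the resulting quotient.
  7. The method according to claim 6, wherein the determining the second degree of association between the first node and the second node based on the resulting quotient comprises:
    using the quotient as the second degree of association between the first node and the second node; or
    using the quotient as the input of a preset increasing function, and determining the resulting output as the second degree of association between the first node and the second node.
  8. The method according to claim 1, wherein the N-dimensional data is an N-dimensional vector, and the constructing N-dimensional data based at least on the N second degrees of association comprises:
    forming the N second degrees of association into an N-dimensional vector of the first node;
    the performing dimension reduction on the N-dimensional data to obtain the node vector of the first node comprises:
    inputting the N-dimensional vector into a restricted Boltzmann machine to obtain the node vector of the first node.
  9. The method according to claim 1, wherein the N-dimensional data is an N-dimensional matrix, and the constructing N-dimensional data based at least on the N second degrees of association comprises:
    using the N second degrees of association corresponding to each of the N nodes as the row data corresponding to that node, to obtain an N-dimensional matrix;
    the performing dimension reduction on the N-dimensional data to obtain the node vector of the first node comprises:
    performing singular value decomposition on the N-dimensional matrix to obtain the corresponding left singular matrix;
    using the vector formed by each row of the left singular matrix as the node vector of the corresponding node.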
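  10. An apparatus for determining node vectors in a relational network graph, the relational network graph including N nodes and connecting edges between the nodes, the N nodes including an arbitrary first node; the apparatus comprising:
    an acquiring unit, configured to acquire adjacency information of the relational network graph, the adjacency information being used to record the connection relationships between nodes in the relational network graph;
    a first determining unit, configured to determine, according to the adjacency information, a first degree of association between the first node and each of the N nodes, to obtain N first degrees of association; wherein the nodes include a second node, and the first degree of association between the first node and the second node is related to the paths by which the first node reaches the second node through no more than a predetermined number K of connecting edges;
    a second determining unit, configured to determine, based on the N first degrees of association, a second degree of association between the first node and each node, to obtain N second degrees of association; wherein the second degree of association between the first node and the second node is determined based on the first degree of association between the first node and the second node and on the sum of the N first degrees of association;
    a constructing unit, configured to construct N-dimensional data based at least on the N second degrees of association;
    a dimension reduction unit, configured to perform dimension reduction on the N-dimensional data to obtain the node vector of the first node.
  11. The apparatus according to claim 10, wherein the N nodes correspond to N users, and a connecting edge between nodes indicates that the two correspondingly connected users have an association relationship.
  12. The apparatus according to claim 10, wherein the adjacency information is an adjacency matrix;
    the first determining unit specifically comprises:
    a first determining subunit, configured to determine a symmetric matrix corresponding to the adjacency matrix;
    a first calculating subunit, configured to sum the powers of the symmetric matrix from the first power to the K-th power, K being the predetermined number, to obtain a first matrix, wherein the first matrix includes a first element, the row and column of the first element correspond to the first node and the second node respectively, and the value of the first element represents the first degree of association between the first node and the second node.
  13. The apparatus according to claim 12, wherein the relational network graph is an undirected graph; the first determining subunit is specifically configured to:
    determine the adjacency matrix as the symmetric matrix.
  14. The apparatus according to claim 12, wherein the relational network graph is a directed graph, and the first determining subunit is specifically configured to:
    sum the adjacency matrix and the transpose of the adjacency matrix to obtain the symmetric matrix.
  15. The apparatus according to claim 10, wherein the second determining unit specifically comprises:
    a second calculating subunit, configured to divide the first degree of association between the first node and the second node by the sum of the N first degrees of association;
    a second determining subunit, configured to determine the second degree of association between the first node and the second node based on the resulting quotient.
  16. The apparatus according to claim 15, wherein the second determining subunit is specifically configured to:
    use the quotient as the second degree of association between the first node and the second node; or
    use the quotient as the input of a preset increasing function, and determine the resulting output as the second degree of association between the first node and the second node.
  17. The apparatus according to claim 10, wherein the N-dimensional data is an N-dimensional vector, and the constructing unit is specifically configured to:
    form the N second degrees of association into an N-dimensional vector of the first node;
    the dimension reduction unit is specifically configured to:
    input the N-dimensional vector into a restricted Boltzmann machine to obtain the node vector of the first node.
  18. The apparatus according to claim 10, wherein the N-dimensional data is an N-dimensional matrix, and the constructing unit is specifically configured to:
    use the N second degrees of association corresponding to each of the N nodes as the row data corresponding to that node, to obtain an N-dimensional matrix;
    the dimension reduction unit is specifically configured to:
    perform singular value decomposition on the N-dimensional matrix to obtain the corresponding left singular matrix;
    use the vector formed by each row of the left singular matrix as the node vector of the corresponding node.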
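  19. A method for determining an account risk status, the method comprising:
    acquiring adjacency information of an account network graph, the account network graph including N accounts and connecting edges between the accounts, the adjacency information being used to record the connection relationships between accounts in the account network graph;
    determining, according to the adjacency information and through vector embedding processing, a first vector corresponding to a first account to be tested among the N accounts, and a second vector corresponding to a known account whose account risk status is known, wherein the vector embedding processing comprises:
    determining a first degree of association between an arbitrary first account among the N accounts and each of the N accounts, to obtain N first degrees of association; wherein the accounts include a second account, and the first degree of association between the first account and the second account is related to the paths by which the first account reaches the second account through no more than a predetermined number K of connecting edges;
    determining, based on the N first degrees of association, a second degree of association between the first account and each account, to obtain N second degrees of association; wherein the second degree of association between the first account and the second account is determined based on the first degree of association between the first account and the second account and on the sum of the N first degrees of association;
    constructing N-dimensional data based at least on the N second degrees of association;
    performing dimension reduction on the N-dimensional data to obtain the embedding vector of the first account;
    determining the account risk status of the first account to be tested based on the first vector and the second vector.
  20. The method according to claim 19, wherein the determining the account risk status of the first account to be tested based on the first vector and the second vector comprises:
    determining the similarity between the first vector and the second vector;
    if the similarity is greater than a predetermined threshold, determining that the account risk status of the first account to be tested is the same as that of the known account.
  21. An apparatus for determining an account risk status, the apparatus comprising:
    an acquiring unit, configured to acquire adjacency information of an account network graph, the account network graph including N accounts and connecting edges between the accounts, the adjacency information being used to record the connection relationships between accounts in the account network graph;
    a first determining unit, configured to determine, according to the adjacency information and through vector embedding processing, a first vector corresponding to a first account to be tested among the N accounts, and a second vector corresponding to a known account whose account risk status is known, wherein the first determining unit specifically comprises:
    a first determining subunit, configured to determine a first degree of association between an arbitrary first account among the N accounts and each of the N accounts, to obtain N first degrees of association; wherein the accounts include a second account, and the first degree of association between the first account and the second account is related to the paths by which the first account reaches the second account through no more than a predetermined number K of connecting edges;
    a second determining subunit, configured to determine, based on the N first degrees of association, a second degree of association between the first account and each account, to obtain N second degrees of association; wherein the second degree of association between the first account and the second account is determined based on the first degree of association between the first account and the second account and on the sum of the N first degrees of association;
    a constructing subunit, configured to construct N-dimensional data based at least on the N second degrees of association;
    a dimension reduction subunit, configured to perform dimension reduction on the N-dimensional data to obtain the embedding vector of the first account;
    a second determining unit, configured to determine the account risk status of the first account to be tested based on the first vector and the second vector.
  22. The apparatus according to claim 21, wherein the second determining unit is specifically configured to:
    determine the similarity between the first vector and the second vector;
    if the similarity is greater than a predetermined threshold, determine that the account risk status of the first account to be tested is the same as that of the known account.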
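  23. A computer-readable storage medium on which a computer program is stored, wherein when the computer program is executed in a computer, the computer is caused to perform the method according to any one of claims 1-9 and 19-20.
  24. A computing device, comprising a memory and a processor, characterized in that the memory stores executable code, and the processor, when executing the executable code, implements the method according to any one of claims 1-9 and 19-20.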
PCT/CN2020/075012 2019-03-25 2020-02-13 Method and apparatus for determining graph node vectors in a relational network graph WO2020192289A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910228862.2A CN110032665B (zh) 2019-03-25 2019-03-25 Method and apparatus for determining graph node vectors in a relational network graph
CN201910228862.2 2019-03-25

Publications (1)

Publication Number Publication Date
WO2020192289A1 true WO2020192289A1 (zh) 2020-10-01

Family

ID=67236643

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/075012 WO2020192289A1 (zh) 2019-03-25 2020-02-13 Method and apparatus for determining graph node vectors in a relational network graph

Country Status (2)

Country Link
CN (1) CN110032665B (zh)
WO (1) WO2020192289A1 (zh)

Also Published As

Publication number Publication date
CN110032665B (zh) 2023-11-17
CN110032665A (zh) 2019-07-19


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20777907

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20777907

Country of ref document: EP

Kind code of ref document: A1