CN113742430B - Method and system for determining number of triangle structures formed by nodes in graph data

Info

Publication number: CN113742430B
Application number: CN202110893175.XA
Authority: CN
Inventors: 邹磊; 林殷年
Original assignee: Peking University
Current assignee: Peking University
Priority date: 2021-08-04
Filing date: 2021-08-04
Publication date: 2024-03-29
Anticipated expiration: 2041-08-04
Also published as: CN113742430A

Abstract

The application discloses a method and a system for determining the number of triangle structures formed by nodes in graph data, and belongs to the technical field of the Internet. The method comprises the following steps: assigning a number to each node based on the number of first neighbor nodes of that node; determining a first direction of each edge based on the relative sizes of the numbers of the two nodes that the edge connects; for any node, determining the other nodes pointed to by the first direction of the edges connected to that node, and determining those nodes as second neighbor nodes of that node; for the two nodes connected by any edge, determining whether both nodes have corresponding second neighbor nodes and, if so, determining the two nodes as a task node pair; and determining the target number of identical second neighbor nodes shared by the two nodes in each task node pair, and determining the sum of the target numbers over all task node pairs as the number of triangle structures formed by nodes in the graph data. The method and the system can improve the efficiency of determining the number of triangle structures formed by nodes in graph data.

Description

Method and system for determining number of triangle structures formed by nodes in graph data

Technical Field

The present invention relates to the field of internet technologies, and in particular, to a method and system for determining the number of triangle structures formed by nodes in graph data.

Background

With the development of internet technology and big data, graph data (a graph) is widely used in various fields. A graph is a data structure comprising nodes and edges, where the nodes store data of different objects depending on the application scenario, and the edges represent the relationships among the objects. For example, graph data may be used to store the data of each account in social software: different nodes may store attribute information (such as gender and age) of the corresponding accounts, an edge between two connected nodes may represent a friend relationship between the two accounts, and an edge may also have a direction to represent, for example, a friend-adding relationship.

In graph data, three nodes form a triangle structure if an edge exists between every two of them. The number of triangles in the graph data reflects how tightly the nodes are connected. For example, in graph data storing the account information of social software, three nodes forming a triangle structure indicate that the three corresponding accounts are friends with one another, or that each account is a common friend of the other two. Thus, the more triangle structures the nodes in the graph data form, the closer the relationships between the accounts. The number of triangle structures formed by nodes in the graph data is therefore commonly calculated to support different downstream tasks. For example, the number of triangle structures in a subgraph of a social relation graph may be calculated to determine whether the users in the subgraph share a social relationship (for example, the users may be classmates or colleagues), so that advertisements, content and the like can be recommended to the users in the subgraph.

In the related art, the number of triangles is generally calculated by traversing the edges: for the two nodes connected by each edge, the common neighbor nodes of the two nodes are determined, and the number of common neighbor nodes is the number of triangle structures formed by those two nodes and other nodes. However, each triangle structure has three edges, so the same triangle is counted three times, which lowers the efficiency of determining the number of triangle structures formed by nodes in the graph data.
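As a minimal sketch of the related-art edge traversal described above, the following Python code intersects the full neighbor sets of the two endpoints of every edge. The function name `count_triangles_naive` and the sample edge list are illustrative assumptions and do not come from the application. Each triangle is found once per edge, so the raw total is three times the true count.

```python
from collections import defaultdict

def count_triangles_naive(edges):
    """Related-art style count: intersect full neighbor sets for every edge."""
    neighbors = defaultdict(set)
    for u, v in edges:
        neighbors[u].add(v)
        neighbors[v].add(u)
    # Each triangle is discovered once from each of its three edges.
    raw_total = sum(len(neighbors[u] & neighbors[v]) for u, v in edges)
    return raw_total, raw_total // 3

# Hypothetical sample: one triangle (0, 1, 2) plus two pendant edges.
sample_edges = [(0, 1), (0, 2), (1, 2), (1, 3), (2, 4)]
raw, triangles = count_triangles_naive(sample_edges)
```

Here `raw` holds the threefold count and `triangles` the corrected count; the redundant intersections are the overhead that the application seeks to avoid.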

Disclosure of Invention

The embodiment of the application provides a method and a system for determining the number of triangle structures formed by nodes in graph data, which can improve the efficiency of determining the number of triangle structures formed by the nodes in the graph data. The technical scheme is as follows:

in a first aspect, a method for determining the number of triangle structures formed by nodes in graph data is provided, where the method includes:

determining the number of first neighbor nodes corresponding to each node based on the edges between the nodes in the graph data, and assigning a number to each node based on its number of first neighbor nodes, wherein a node with more first neighbor nodes is assigned a smaller number;

determining a first direction of each edge based on the relative sizes of the numbers of the two nodes connected by the edge;

for any node, determining the other nodes pointed to by the first direction of the edges connected to the node, and determining those other nodes as second neighbor nodes of the node;

for the two nodes connected by any edge in the graph data, determining whether both nodes have corresponding second neighbor nodes, and if both nodes have corresponding second neighbor nodes, determining the two nodes as a task node pair;

and determining the target number of identical second neighbor nodes corresponding to the two nodes in each task node pair, and determining the sum of the target numbers corresponding to the task node pairs as the number of triangle structures formed by nodes in the graph data.

Optionally, before determining the target number of the same second neighbor nodes corresponding to two nodes in each task node pair, the method further includes:

dividing each node into a first class node and a second class node based on the number of second neighbor nodes corresponding to each node, wherein the number of the second neighbor nodes corresponding to the first class node is larger than the number of the second neighbor nodes corresponding to the second class node.

Optionally, the determining the target number of the same second neighbor nodes corresponding to two nodes in each task node pair includes:

determining a task node pair only comprising the first type of nodes as a first type of task node pair; determining a task node pair comprising a first class node and a second class node as a second class task node pair; determining a task node pair only comprising the second class of nodes as a third class of task node pair;

determining first numbers of identical second neighbor nodes corresponding to two nodes in each first class task node pair; determining the second number of the same second neighbor nodes corresponding to two nodes in each second class task node pair; determining the third number of the same second neighbor nodes corresponding to two nodes in each third class task node pair;

and the determining the sum of the target numbers corresponding to the task node pairs as the number of triangle structures formed by nodes in the graph data includes:

and determining the sum of the first number, the second number and the third number as the number of triangle structures consisting of nodes in the graph data.

Optionally, the method further comprises:

determining adjacency list information corresponding to the first class nodes, wherein the adjacency list information comprises a binary array, and the binary array is formed by concatenating the binary sub-arrays corresponding to the first class nodes in the order of the numbers of the first class nodes; in the binary sub-array corresponding to any first class node, if the value of the N-th bit is 1, the node numbered N is a second neighbor node of that first class node, and if the value of the N-th bit is 0, the node numbered N is not a second neighbor node of that first class node, wherein N is an integer greater than or equal to 0.
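The binary array described above can be sketched as follows. This is only an illustration: the helper name `to_bit_row`, the sample first class nodes and the total node count are assumptions, and a plain Python list of 0/1 values stands in for whatever bit-level storage an implementation would use.

```python
def to_bit_row(second_neighbors, num_nodes):
    """Return a list of 0/1 values; position N is 1 when node N is a second neighbor."""
    row = [0] * num_nodes
    for n in second_neighbors:
        row[n] = 1
    return row

# Hypothetical first class nodes (keyed by number) and their second neighbors.
first_class = {0: {1, 2, 5}, 1: {2, 3}}
num_nodes = 6

# Concatenate the per-node sub-arrays in ascending order of node number.
binary_array = []
for node in sorted(first_class):
    binary_array.extend(to_bit_row(first_class[node], num_nodes))
```

With six nodes, the first six entries of `binary_array` describe node 0 and the next six describe node 1.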

Optionally, the method further comprises:

for a binary sub-array corresponding to any first type node in the binary array, dividing the binary sub-array into a plurality of array blocks according to a preset dividing number, and determining the sequence identifier of each array block; removing array blocks with all elements of 0, and forming the rest array blocks into compressed binary subarrays corresponding to any first type node;

and forming an array block indication array from the sequence identifiers of the array blocks in the compressed binary sub-array, and forming the compressed binary sub-arrays into a compressed binary array.
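A minimal sketch of this block compression is given below. It assumes that the preset dividing number is the number of bits per array block and that the sequence identifier of a block is its zero-based position; the function name `compress_row` and the sample row are illustrative only.

```python
def compress_row(row, block_size):
    """Split a 0/1 row into fixed-size blocks and keep only blocks containing a 1."""
    block_ids, blocks = [], []
    for start in range(0, len(row), block_size):
        block = row[start:start + block_size]
        if any(block):
            # Record the block's position so it can be located after compression.
            block_ids.append(start // block_size)
            blocks.append(block)
    return block_ids, blocks

# Hypothetical 16-bit sub-array split into blocks of 4 bits.
row = [0, 0, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0]
block_ids, blocks = compress_row(row, block_size=4)
```

In this sample, `block_ids` plays the role of the array block indication array and `blocks` the compressed binary sub-array; the two all-zero blocks are dropped.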

Optionally, the determining the first number of the same second neighbor nodes corresponding to two nodes in each first class of task node pair includes:

for each first class task node pair, acquiring the array block indication arrays respectively corresponding to the two nodes in the first class task node pair; if the two array block indication arrays contain the same target sequence identifier, acquiring a first array block and a second array block corresponding to the target sequence identifier from the binary sub-arrays respectively corresponding to the two nodes in the first class task node pair, and performing a bitwise AND operation on the first array block and the second array block to obtain the number of identical second neighbor nodes of the two nodes in the first class task node pair; and summing the numbers of identical second neighbor nodes of the two nodes in each first class task node pair to obtain the first number.
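The block-wise comparison above might look like the following sketch, in which each array block is held as a small Python integer so that the bitwise AND is a single operation. The function name `common_count`, the 4-bit block width and the sample data are assumptions made for illustration.

```python
def common_count(ids_a, blocks_a, ids_b, blocks_b):
    """Count shared set bits, comparing only blocks whose identifiers match."""
    lookup_b = dict(zip(ids_b, blocks_b))
    total = 0
    for block_id, block_a in zip(ids_a, blocks_a):
        block_b = lookup_b.get(block_id)
        if block_b is not None:
            # Bitwise AND keeps the bits set in both blocks; count them.
            total += bin(block_a & block_b).count("1")
    return total

# Hypothetical compressed rows; each block is a 4-bit integer.
ids_u, blocks_u = [0, 2], [0b0110, 0b1001]
ids_v, blocks_v = [2, 3], [0b1011, 0b0100]
shared = common_count(ids_u, blocks_u, ids_v, blocks_v)
```

Only block 2 appears in both indication arrays, so only one AND operation is performed for this pair; blocks without a matching identifier are skipped entirely.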

Optionally, determining the second number of the same second neighbor nodes corresponding to the two nodes in each second class of task node pair includes:

for each second class task node pair, acquiring a target array block indication array corresponding to a first class node in the second class task node pair, and acquiring a target number corresponding to each second neighbor of the second class node in the second class task node pair;

for each target number, converting the target number into a first sequence identifier of the corresponding array block based on the target number and the dividing number; if the target array block indication array contains the first sequence identifier, determining that the node corresponding to the target number is an identical second neighbor node of the two nodes in the second class task node pair;

and determining the number of identical second neighbor nodes corresponding to each second class task node pair as the second number.
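A hedged sketch of this lookup follows. It assumes the sequence identifier is the integer quotient of the node number by the dividing number (taken here as the block width in bits). The text states only the identifier check; the sketch additionally tests the bit inside the matched block, which is an added assumption so that the membership test is exact. The function name and data are hypothetical.

```python
def count_shared_with_list(ids, blocks, neighbor_numbers, block_size):
    """Count numbers in neighbor_numbers whose bit is set in the compressed row."""
    lookup = dict(zip(ids, blocks))
    shared = 0
    for number in neighbor_numbers:
        # Block identifier and position inside the block for this node number.
        block_id, offset = divmod(number, block_size)
        block = lookup.get(block_id)
        # Assumption: bit `offset` (least significant first) marks node `number`.
        if block is not None and (block >> offset) & 1:
            shared += 1
    return shared

# Hypothetical first class node with second neighbors {4, 7, 13}, 4 bits per block.
ids, blocks = [1, 3], [0b1001, 0b0010]
# Hypothetical second class node with second neighbors 5, 7, 13 and 14.
shared = count_shared_with_list(ids, blocks, [5, 7, 13, 14], block_size=4)
```

The second class node keeps a plain list of neighbor numbers, so the cost of the lookup scales with the length of that short list rather than with the width of the bitmap.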

Optionally, determining the first number of the same second neighbor nodes corresponding to two nodes in each first-class task node pair; determining the second number of the same second neighbor nodes corresponding to two nodes in each second class task node pair; determining a third number of identical second neighbor nodes corresponding to two nodes in each third class of task node pair, including:

based on a first calculation module, executing the processing of determining the first number of identical second neighbor nodes corresponding to the two nodes in each first class task node pair; based on a second calculation module, executing the processing of determining the second number of identical second neighbor nodes corresponding to the two nodes in each second class task node pair; and based on a third calculation module, executing the processing of determining the third number of identical second neighbor nodes corresponding to the two nodes in each third class task node pair;

and detecting the task processing status of the first calculation module, the second calculation module and the third calculation module respectively, and, when any calculation module completes its own processing, controlling that calculation module to execute processing corresponding to the other calculation modules.

Optionally, the graph data is graph data storing user information, nodes in the graph data are attribute information of corresponding users, and the edge is used for indicating that users corresponding to two nodes connected by the edge are friends;

after the number of triangle structures formed by nodes in the graph data is determined, the method further comprises:

and inputting the number of triangle structures formed by nodes in the graph data into an aggregation coefficient analysis model to obtain the degree of tightness of the relationship among users corresponding to each node of the graph data.

In a second aspect, a system for determining the number of triangle structures formed by nodes in graph data is provided, where the system includes a CPU and an FPGA, and where:

the CPU is used for: determining the number of first neighbor nodes corresponding to each node based on the edges between the nodes in the graph data, and assigning a number to each node based on its number of first neighbor nodes, wherein a node with more first neighbor nodes is assigned a smaller number; determining a first direction of each edge based on the relative sizes of the numbers of the two nodes connected by the edge; for any node, determining the other nodes pointed to by the first direction of the edges connected to the node, and determining those other nodes as second neighbor nodes of the node; for the two nodes connected by any edge in the graph data, determining whether both nodes have corresponding second neighbor nodes, and if both nodes have corresponding second neighbor nodes, determining the two nodes as a task node pair;

the FPGA is used for determining the target number of identical second neighbor nodes corresponding to the two nodes in each task node pair, and determining the sum of the target numbers corresponding to the task node pairs as the number of triangle structures formed by nodes in the graph data.

Optionally, the CPU is configured to:

Optionally, the method is characterized in that,

the CPU is used for: determining a task node pair only comprising the first type of nodes as a first type of task node pair; determining a task node pair comprising a first class node and a second class node as a second class task node pair; determining a task node pair only comprising the second class of nodes as a third class of task node pair;

the FPGA is used for: determining first numbers of identical second neighbor nodes corresponding to two nodes in each first class task node pair; determining the second number of the same second neighbor nodes corresponding to two nodes in each second class task node pair; and determining the third number of the same second neighbor nodes corresponding to two nodes in each third type of task node pair, and determining the sum of the first number, the second number and the third number as the number of triangle structures consisting of the nodes in the graph data.

Optionally, the CPU is configured to:

Optionally, the FPGA is configured to:

Optionally, the FPGA includes a first computing unit, a second computing unit, and a third computing unit;

the first computing unit is used for executing the processing of determining the first number of identical second neighbor nodes corresponding to the two nodes in each first class task node pair;

the second computing unit is used for executing the processing of determining the second number of identical second neighbor nodes corresponding to the two nodes in each second class task node pair;

the third computing unit is used for executing the processing of determining the third number of identical second neighbor nodes corresponding to the two nodes in each third class task node pair;

the CPU is used for: and respectively detecting the task processing conditions of the first computing unit, the second computing unit and the third computing unit, and controlling any computing unit to execute the processing corresponding to the other computing units when any computing unit completes the corresponding processing.

The CPU is further configured to: and inputting the number of triangle structures formed by nodes in the graph data into an aggregation coefficient analysis model to obtain the degree of tightness of the relationship among users corresponding to each node of the graph data.

In a third aspect, there is provided an apparatus for determining the number of triangle structures formed by nodes in graph data, the apparatus comprising:

a first determining module, configured to determine the number of first neighbor nodes corresponding to each node based on the edges between the nodes in the graph data, and assign a number to each node based on its number of first neighbor nodes, wherein a node with more first neighbor nodes is assigned a smaller number;

a second determining module, configured to determine a first direction of each edge based on the relative sizes of the numbers of the two nodes connected by the edge;

a third determining module, configured to determine, for any node, the other nodes pointed to by the first direction of the edges connected to the node, and determine those other nodes as second neighbor nodes of the node;

a fourth determining module, configured to determine, for the two nodes connected by any edge in the graph data, whether both nodes have corresponding second neighbor nodes, and if both nodes have corresponding second neighbor nodes, determine the two nodes as a task node pair;

and a fifth determining module, configured to determine the target number of identical second neighbor nodes corresponding to the two nodes in each task node pair, and determine the sum of the target numbers corresponding to the task node pairs as the number of triangle structures formed by nodes in the graph data.

Optionally, the apparatus further includes a dividing module, configured to:

Optionally, the fifth determining module is configured to:

Optionally, the apparatus further includes a sixth determining module configured to:

Optionally, the sixth determining module is further configured to:

Optionally, the fifth determining module is configured to:

for each first class task node pair, acquiring the array block indication arrays respectively corresponding to the two nodes in the first class task node pair; if the two array block indication arrays contain the same target sequence identifier, acquiring a first array block and a second array block corresponding to the target sequence identifier from the binary sub-arrays respectively corresponding to the two nodes in the first class task node pair, and performing a bitwise AND operation on the first array block and the second array block to obtain the number of identical second neighbor nodes of the two nodes in the first class task node pair; summing the numbers of identical second neighbor nodes of the two nodes in each first class task node pair to obtain the first number;

optionally, the fifth determining module is configured to:

Optionally, the fifth determining module is configured to:

the apparatus further comprises an analysis module for: and inputting the number of triangle structures formed by nodes in the graph data into an aggregation coefficient analysis model to obtain the degree of tightness of the relationship among users corresponding to each node of the graph data.

In a fourth aspect, there is provided a computer device comprising a processor and a memory having stored therein at least one instruction that is loaded and executed by the processor to perform the operations performed by the method of determining the number of triangle structures formed by nodes in graph data according to the first aspect.

In a fifth aspect, a computer readable storage medium has stored therein at least one instruction that is loaded and executed by a processor to implement operations performed by a method for determining the number of triangle structures formed by nodes in graph data according to the first aspect.

The technical solutions provided by the embodiments of the present application bring the following beneficial effects:

According to the embodiments of the present application, each node is numbered according to its number of first neighbor nodes, a first direction is assigned to each edge according to the sizes of the numbers, and the second neighbor nodes of each node are determined according to the first direction. The edges can then be filtered according to the second neighbor nodes of the nodes they connect (that is, only edges whose two connected nodes both have second neighbor nodes are retained), so that fewer edges need to be processed when determining the number of triangle structures formed by nodes in the graph data, which improves the efficiency of that determination. In addition, because the first direction of each edge is determined by the node numbers and the second neighbor nodes of each node are determined by the first direction, among the three edges forming a triangle structure only one edge connects two nodes that share a second neighbor node. Counting the same triangle three times over its three edges is thus avoided, which further improves the efficiency of determining the number of triangle structures formed by nodes in the graph data.

Drawings

In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.

FIG. 1 is a schematic illustration of an implementation environment provided by embodiments of the present application;

FIG. 2 is a schematic diagram of a method for determining the number of triangle structures formed by nodes in graph data according to an embodiment of the present application;

FIG. 3 is a flowchart of a method for determining the number of triangle structures formed by nodes in graph data according to an embodiment of the present application;

FIG. 4 is a schematic diagram of a method for determining the number of triangle structures formed by nodes in graph data according to an embodiment of the present application;

FIG. 5 is a schematic diagram of a method for determining the number of triangle structures formed by nodes in graph data according to an embodiment of the present application;

FIG. 6 is a schematic diagram of a method for determining the number of triangle structures formed by nodes in graph data according to an embodiment of the present application;

FIG. 7 is a flowchart of a method for determining the number of triangle structures formed by nodes in graph data according to an embodiment of the present application;

FIG. 8 is a schematic diagram of a device for determining the number of triangle structures formed by nodes in graph data according to an embodiment of the present application;

FIG. 9 is a schematic structural diagram of a computer device according to an embodiment of the present application.

Detailed Description

For the purpose of making the objects, technical solutions and advantages of the present application more apparent, the embodiments of the present application will be described in further detail below with reference to the accompanying drawings.

FIG. 1 is a schematic diagram of an implementation environment provided by embodiments of the present application. Referring to FIG. 1, the method for determining the number of triangle structures formed by nodes in graph data provided by the present application may be implemented by a computer device, which may be a terminal, a server or the like. The computer device may include processing modules such as a CPU (central processing unit) and an FPGA (field programmable gate array), and storage modules such as a mechanical hard disk (HDD) and a solid state disk (SSD). The storage module may store program instructions (such as program code) corresponding to the method for determining the number of triangle structures formed by nodes in graph data, the corresponding execution data (such as the graph data), and the like. The processing module executes the program instructions in the storage module and processes the execution data in the storage module, thereby implementing the method for determining the number of triangle structures formed by nodes in graph data.

In this application, graph data refers to stored data having a graph structure. As shown in FIG. 2, the graph includes a plurality of nodes, and the relationship between any two nodes can be represented by a connecting edge. For example, the graph data may be a social relationship graph stored by a background server of a social application, in which each node may store attribute information of a user, such as gender, age and preferences. An edge between nodes may represent a friend relationship, a follow relationship or the like between the users corresponding to the two connected nodes. If an edge exists between every two of three nodes, the three nodes form a triangle structure, such as triangle ABC formed by node A, node B and node C in FIG. 2. If a subgraph of the social relationship graph contains a large number of triangle structures, the users in that subgraph may share a social relationship, for example, they may be colleagues. Therefore, the degree of tightness among the users corresponding to the nodes in the social relationship graph can be determined by counting the triangle structures formed by the nodes in the social relationship graph.

FIG. 3 is a flowchart of a method for determining the number of triangle structures formed by nodes in graph data according to an embodiment of the present application. Referring to FIG. 3, the method includes:

Step 301, determining the number of first neighbor nodes corresponding to each node based on the edges between the nodes in the graph data, and assigning a number to each node based on its number of first neighbor nodes.

The more first neighbor nodes a node has, the smaller the number assigned to the node. In the embodiments of the present application, the graph data may be a social network graph stored in a background server of a social application, that is, the nodes in the graph data may store attribute information of users, such as gender, age and preferences. An edge between two nodes can represent a friend relationship between the users corresponding to the two nodes.

In implementation, a neighbor node (first neighbor node) corresponding to each node may be determined according to edges between nodes in graph data (which may be referred to as a graph hereinafter). When the graph is an undirected graph, for any node, the first neighbor node of the node can be the node directly connected with the node through the edge; when the graph is a directed graph, for any node, the first neighbor node of the node may be the node to which the node points by the outgoing edge.

After the number of first neighbor nodes corresponding to each node is determined, each node may be numbered according to its number of first neighbor nodes, with a node having more first neighbor nodes receiving a smaller number. For example, the node with the largest number of first neighbor nodes is numbered 0, the node with the second largest number of first neighbor nodes is numbered 1, and so on, until the number corresponding to each node is determined. Nodes having the same number of first neighbor nodes may be ordered randomly among themselves.
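The numbering step can be sketched in Python as follows. The function name `assign_numbers` and the sample edge list are illustrative assumptions; ties are broken by node label here only to keep the result reproducible, whereas the text allows any order among nodes with equal neighbor counts.

```python
from collections import defaultdict

def assign_numbers(edges):
    """Map each original node label to a number; higher degree gets a smaller number."""
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    # Sort by descending degree; equal degrees fall back to label order.
    ordered = sorted(degree, key=lambda node: (-degree[node], node))
    return {node: number for number, node in enumerate(ordered)}

# Hypothetical undirected edge list with string labels.
sample_edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "c")]
numbers = assign_numbers(sample_edges)
```

Node "a" has three first neighbor nodes and therefore receives number 0, while the pendant node "d" receives the largest number.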

Step 302, determining a first direction of each edge based on the relative sizes of the numbers of the two nodes connected by the edge.

After each node in the graph is assigned a number, a direction (the first direction) may be reassigned to each edge according to the sizes of the node numbers. For two nodes connected by an edge, the first direction of the edge may be from the node with the smaller number to the node with the larger number.

Step 303, for any node, determining other nodes pointed by the first direction of the edge connected by any node, and determining the other nodes as second neighbor nodes of any node.

After numbering the nodes in the graph and allocating a first direction to each edge, a second neighbor node corresponding to each node can be determined according to the first direction of each edge. That is, for any node, the node pointed to by the first direction of the edge connected to that node may be determined as the second neighbor node of that node.
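A minimal sketch of steps 302 and 303 is shown below, assuming the nodes are already identified by their assigned numbers. The function name `second_neighbors` and the sample edges are illustrative and not taken from the application.

```python
def second_neighbors(numbered_edges):
    """Build second neighbor sets after orienting each edge from smaller to larger number."""
    out = {}
    for u, v in numbered_edges:
        # The first direction points from the smaller number to the larger one.
        low, high = (u, v) if u < v else (v, u)
        out.setdefault(low, set()).add(high)
        out.setdefault(high, set())  # make sure every node has an entry
    return out

# Hypothetical edges given as pairs of node numbers, in arbitrary endpoint order.
sample_edges = [(1, 0), (0, 2), (2, 1), (1, 3), (4, 2)]
out = second_neighbors(sample_edges)
```

Because every edge is stored only at its smaller-numbered endpoint, each node's second neighbor set contains only nodes with larger numbers.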

Step 304, for the two nodes connected by any edge in the graph data, determining whether both nodes have corresponding second neighbor nodes, and if both nodes have corresponding second neighbor nodes, determining the two nodes as a task node pair.

In this application, before the number of triangle structures formed by nodes in the graph data is determined (hereinafter also referred to as determining the number of triangles in the graph), the edges may be filtered according to the two nodes that each edge connects. Any edge can be represented by the numbers of the two nodes it connects. For example, the edge connecting the nodes numbered 2 and 3 can be represented by (2, 3).

The filtering of edges is as follows. For the two nodes connected by any edge, it is determined whether both nodes have second neighbor nodes; if either node has no second neighbor node, the edge can be filtered out. This is because, if one of the two nodes connected by an edge has no second neighbor node, that node cannot share a second neighbor node with the other node, that is, the two nodes cannot form a triangle structure with a third node through this edge. Conversely, if both nodes have second neighbor nodes, the two nodes may share a second neighbor node and may form a triangle structure with another node through the edge. Therefore, an edge whose two connected nodes both have second neighbor nodes is retained, and a retained edge is indicated by recording the numbers of its two nodes; the recorded pair of node numbers is a task node pair.

FIG. 4 shows a graph composed of nodes numbered 0, 1, 2, 3 and 4. As can be seen from the figure, the second neighbor nodes of node 0 (the node numbered 0) are node 1 and node 2, the second neighbor nodes of node 1 are node 2 and node 3, the second neighbor node of node 2 is node 4, and nodes 3 and 4 have no second neighbor nodes. Edges (1, 3) and (2, 4) can therefore be filtered out, leaving edges (0, 1), (0, 2) and (1, 2); that is, nodes 0 and 1, nodes 0 and 2, and nodes 1 and 2 are determined as task node pairs.
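The filtering in this example can be reproduced with a short sketch. The second neighbor sets below are read from the description of FIG. 4; the function name `task_node_pairs` is an illustrative assumption.

```python
def task_node_pairs(out):
    """Keep an edge (u, v) only if both endpoints have at least one second neighbor."""
    pairs = []
    for u in sorted(out):
        for v in sorted(out[u]):
            # out[u] is non-empty because it contains v, so only v needs checking.
            if out[v]:
                pairs.append((u, v))
    return pairs

# Second neighbor sets as read from the description of FIG. 4.
fig4_out = {0: {1, 2}, 1: {2, 3}, 2: {4}, 3: set(), 4: set()}
pairs = task_node_pairs(fig4_out)
```

Edges (1, 3) and (2, 4) are dropped because nodes 3 and 4 have no second neighbor nodes, which matches the three task node pairs named in the text.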

Because the edges are filtered in advance, part of the computation can be avoided when the number of triangles in the graph is subsequently determined based on the edges, which improves the efficiency of determining the number of triangles in the graph.
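The filtering step can be sketched in Python as follows. This is only a minimal illustration: the function name `task_node_pairs` and the dictionary layout (node number mapped to its list of second neighbor nodes) are assumptions made for the example and are not taken from the application.

```python
def task_node_pairs(second_neighbors):
    """Keep only the edges whose two endpoints both have second neighbor nodes.

    second_neighbors maps a node number to the list of nodes that its
    first-direction edges point to (hypothetical layout for illustration).
    """
    pairs = []
    for u in sorted(second_neighbors):
        for v in second_neighbors[u]:
            # u has at least one second neighbor (v itself), so the edge
            # is retained only if v also has a second neighbor node.
            if second_neighbors.get(v):
                pairs.append((u, v))
    return pairs


# Graph of fig. 4: edges (0, 1), (0, 2), (1, 2), (1, 3) and (2, 4).
fig4 = {0: [1, 2], 1: [2, 3], 2: [4], 3: [], 4: []}
print(task_node_pairs(fig4))  # [(0, 1), (0, 2), (1, 2)]
```

Edges (1, 3) and (2, 4) are dropped because nodes 3 and 4 have no second neighbor node, which matches the example of fig. 4.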

Step 305, determining the target number of the same second neighbor nodes corresponding to two nodes in each task node pair, and determining the sum of the target numbers corresponding to each task node pair as the number of triangle structures composed of the nodes in the graph data.

After the edges in the graph data are filtered, the retained edges (the task node pairs) are traversed, the number (target number) of common second neighbor nodes of the two nodes in each task node pair is determined, and the target numbers of all task node pairs are summed to obtain the number of triangle structures formed by the nodes in the graph data.

In this application, the second neighbor nodes of each node are the nodes pointed to by the first direction of its edges, that is, the number of each node is smaller than the numbers of its second neighbor nodes. Thus, any triangle structure in the graph has the form shown in fig. 5, in which the numbers of node A and node B are smaller than the number of node C. For triangle ABC, the triangle can therefore only be determined through edge AB, that is, by determining the common second neighbor node C of node A and node B. For the edge BC, node A is not a common second neighbor node of nodes B and C; for the edge AC, node B is not a common second neighbor node of nodes A and C. The triangle therefore cannot be determined through edge BC or edge AC, and the same triangle is never counted repeatedly in this application.
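As a rough illustration of why each triangle is counted exactly once, the following Python sketch sums the number of common second neighbor nodes over the retained edges. The name `count_triangles` and the dictionary layout are hypothetical, and a plain set intersection stands in for the specialised intersection methods of the application.

```python
def count_triangles(second_neighbors):
    """Sum, over the retained edges (u, v), the common second neighbors of u and v."""
    total = 0
    for u, nbrs_u in second_neighbors.items():
        for v in nbrs_u:
            nbrs_v = second_neighbors.get(v, [])
            if not nbrs_v:
                continue  # edge filtered out: v has no second neighbor node
            # Every common second neighbor w closes one triangle (u, v, w).
            total += len(set(nbrs_u) & set(nbrs_v))
    return total


# Graph of fig. 4: the only triangle is formed by nodes 0, 1 and 2.
fig4 = {0: [1, 2], 1: [2, 3], 2: [4], 3: [], 4: []}
print(count_triangles(fig4))  # 1

# Complete graph on four nodes, with edges pointing from smaller to larger number.
k4 = {0: [1, 2, 3], 1: [2, 3], 2: [3], 3: []}
print(count_triangles(k4))  # 4
```

In the fig. 4 graph the triangle is found only through edge (0, 1), whose endpoints share the second neighbor node 2; edges (0, 2) and (1, 2) contribute nothing.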

In one implementation, before step 305 is executed, the task node pairs may be classified, and the target numbers of task node pairs in different classes may be determined in different ways. The corresponding processing is as follows:

Step 3051, dividing each node into a first class node and a second class node based on the number of second neighbor nodes corresponding to each node, wherein the number of the second neighbor nodes corresponding to the first class node is greater than the number of the second neighbor nodes corresponding to the second class node.

In implementation, the nodes may be classified according to the number of second neighbor nodes corresponding to each node. In this application the nodes are divided into two classes: nodes with a larger number of second neighbor nodes form one class (referred to in this application as first class nodes or heavy nodes), and nodes with a smaller number of second neighbor nodes form the other class (referred to in this application as second class nodes or light nodes). For example, a technician may preset a threshold on the number of second neighbor nodes to distinguish the two classes, and each node is assigned to the first class or the second class according to this threshold and its number of second neighbor nodes.

Alternatively, the nodes may be traversed in ascending order of number, and each traversed node is marked as a first class node in turn. The traversal stops when the sum of the numbers of second neighbor nodes of the marked first class nodes is greater than or equal to ρ times the total number of edges in the graph data, and the remaining untraversed nodes are the second class nodes. Here 0 < ρ < 1, and ρ is an empirical parameter that may be set to 0.5, for example.
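The ρ-based division can be sketched as follows. This is a minimal sketch under the assumption that the node numbers are 0, 1, 2, ... and that the number of second neighbor nodes of each node is available as a list; the name `split_nodes` is hypothetical.

```python
def split_nodes(second_neighbor_counts, rho=0.5):
    """Split node numbers into first class (heavy) and second class (light) nodes.

    second_neighbor_counts[i] is the number of second neighbor nodes of node i,
    so the sum of the list equals the total number of edges.
    """
    total_edges = sum(second_neighbor_counts)
    covered = 0
    heavy = []
    for node, count in enumerate(second_neighbor_counts):
        if covered >= rho * total_edges:
            break  # enough edges are covered by the marked first class nodes
        heavy.append(node)
        covered += count
    light = list(range(len(heavy), len(second_neighbor_counts)))
    return heavy, light


# Graph of fig. 6: nodes 0..4 have 4, 3, 1, 1 and 0 second neighbor nodes.
print(split_nodes([4, 3, 1, 1, 0], rho=0.5))  # ([0, 1], [2, 3, 4])
```

With ρ = 0.5 and 9 edges, the traversal stops after node 1 because 4 + 3 = 7 ≥ 4.5, which agrees with the later example in which node 0 and node 1 are first class nodes.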

Step 3052, determining adjacency list information corresponding to the first class node and adjacency list information corresponding to the second class node.

For convenience of distinction, the adjacency list information corresponding to the first type node may be referred to as first adjacency list information, and the adjacency list information corresponding to the second type node may be referred to as second adjacency list information in the present application.

Before the first and second adjacency list information is determined, the graph may be represented by a CSR (Compressed Sparse Row) structure. As shown in fig. 6, the CSR structure consists of an edge list and an offset list. The edge list is formed by concatenating the numbers of the second neighbor nodes of each node, in ascending order of node number, and the offset list indicates the position in the edge list of the first second neighbor node of each node. For the graph data in fig. 6, the edge list is L1 = 1, 2, 3, 4, 2, 3, 4, 3, 4 and the offset list is L2 = 0, 4, 7, 8, 9, 9.
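The construction of the edge list and the offset list can be sketched as follows. The function names and the dictionary input are assumptions for illustration; the trailing entry of the offset list (the total length of the edge list) follows the fig. 6 example.

```python
def build_csr(second_neighbors, num_nodes):
    """Build the CSR edge list and offset list from per-node second neighbor lists."""
    edge_list, offset_list = [], []
    for node in range(num_nodes):
        offset_list.append(len(edge_list))  # where this node's neighbors start
        edge_list.extend(sorted(second_neighbors.get(node, [])))
    offset_list.append(len(edge_list))      # end marker for the last node
    return edge_list, offset_list


def neighbors_of(edge_list, offset_list, node):
    """Read the second neighbor nodes of one node back out of the CSR structure."""
    return edge_list[offset_list[node]:offset_list[node + 1]]


fig6 = {0: [1, 2, 3, 4], 1: [2, 3, 4], 2: [3], 3: [4], 4: []}
l1, l2 = build_csr(fig6, 5)
print(l1)  # [1, 2, 3, 4, 2, 3, 4, 3, 4]
print(l2)  # [0, 4, 7, 8, 9, 9]
print(neighbors_of(l1, l2, 1))  # [2, 3, 4]
```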

In the present application, the first adjacency list information and the second adjacency list information may be obtained on the basis of the CSR structure corresponding to the graph. The determination process for the first adjacency list information is as follows:

The first adjacency list information includes a binary array and an array block indication array.

The binary array is obtained by concatenating the binary subarrays corresponding to the first class nodes in the order of the numbers of the first class nodes. In the binary subarray of any first class node, a value of 1 at the N-th bit indicates that the node numbered N is a second neighbor node of that first class node, and a value of 0 at the N-th bit indicates that the node numbered N is not a second neighbor node of that first class node, where N is an integer greater than or equal to 0.

In implementation, for the binary subarray corresponding to any first class node, the numbers of the second neighbor nodes of that first class node may be determined, the elements at the positions corresponding to those numbers are set to 1, and the elements at all other positions are set to 0. In this way, the second neighbor nodes of a first class node can be represented by a binary subarray. For example, in the graph data in fig. 6, node 0 and node 1 are first class nodes, the binary subarray corresponding to node 0 is 01111000, and the binary subarray corresponding to node 1 is 00111000. The binary subarrays corresponding to the first class nodes can then be concatenated in ascending order of the numbers of the first class nodes to obtain the binary array included in the first adjacency list information; that is, the binary array corresponding to the first class nodes in the graph data in fig. 6 is 0111100000111000.
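The binary subarrays of the fig. 6 example can be reproduced with a short sketch. The bit strings are kept as Python strings for readability, and the width of 8 bits is taken from the example; the function name is hypothetical.

```python
def binary_subarray(neighbors, width):
    """Return a bit string whose N-th character is '1' if node N is a second neighbor."""
    bits = ["0"] * width
    for number in neighbors:
        bits[number] = "1"
    return "".join(bits)


# First class nodes of fig. 6 and their second neighbor nodes.
heavy_neighbors = {0: [1, 2, 3, 4], 1: [2, 3, 4]}
subarrays = [binary_subarray(heavy_neighbors[n], 8) for n in sorted(heavy_neighbors)]
print(subarrays)           # ['01111000', '00111000']
print("".join(subarrays))  # 0111100000111000
```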

In practical applications, the number of nodes in the graph may be very large, for example tens of thousands or even hundreds of millions. Each binary subarray may therefore be compressed as follows: for the binary subarray corresponding to any first class node in the binary array, the binary subarray is divided into a plurality of array blocks according to a preset dividing number, and the sequence identifier of each array block is determined; the array blocks whose elements are all 0 are removed, and the remaining array blocks form the compressed binary subarray corresponding to that first class node. The sequence identifiers of the array blocks retained in the compressed binary subarray form the array block indication array, and the compressed binary subarrays form the compressed binary array. That is, the first adjacency list information may include the compressed binary array and the array block indication array.

In implementation, the binary subarray may be divided into a plurality of array blocks according to the preset dividing number, and the array blocks whose elements are all 0 may be removed. For example, with a dividing number of 4, the binary subarray 0001001100000001 is divided into the four array blocks 0001, 0011, 0000 and 0001, and each array block is assigned a sequence identifier: the first block 0001 has sequence identifier 0, 0011 has sequence identifier 1, 0000 has sequence identifier 2, and the last block 0001 has sequence identifier 3. Removing the array block whose elements are all 0 yields 000100110001, and the corresponding array block indication array is 0, 1, 3.
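The compression can be sketched as follows. The sketch assumes that the dividing number is the number of bits per array block (in the example above, 16 bits split with a dividing number of 4 gives four blocks of four bits, so the example alone does not distinguish this from a block count); the function name is hypothetical.

```python
def compress_subarray(subarray, block_size):
    """Split a bit string into blocks, drop all-zero blocks, keep the block identifiers."""
    kept_blocks, indication = [], []
    for start in range(0, len(subarray), block_size):
        block = subarray[start:start + block_size]
        if "1" in block:
            kept_blocks.append(block)
            indication.append(start // block_size)  # sequence identifier of the block
    return "".join(kept_blocks), indication


compressed, indication = compress_subarray("0001001100000001", 4)
print(compressed)  # 000100110001
print(indication)  # [0, 1, 3]
```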

In addition, the array block indication arrays of the first class nodes may be concatenated, in the order of the numbers of the first class nodes, into a sequence identifier list, and a sequence identifier offset list may be generated based on the sequence identifier list. In the sequence identifier offset list, the value of the M-th element indicates the position, in the sequence identifier list, of the first sequence identifier of the array block indication array corresponding to the first class node numbered M. The first adjacency list information may further include the sequence identifier list and the sequence identifier offset list.
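A possible layout of the sequence identifier list and the sequence identifier offset list is sketched below. The trailing end marker in the offset list and both function names are assumptions, chosen to mirror the CSR offset list used elsewhere in the description.

```python
def build_sequence_lists(indication_arrays):
    """Concatenate per-node array block indication arrays and record their start positions.

    indication_arrays[m] is the indication array of the first class node numbered m.
    """
    sequence_ids, offsets = [], []
    for ids in indication_arrays:
        offsets.append(len(sequence_ids))
        sequence_ids.extend(ids)
    offsets.append(len(sequence_ids))  # assumed end marker, as in a CSR offset list
    return sequence_ids, offsets


def indication_array_of(sequence_ids, offsets, m):
    """Recover the array block indication array of first class node m."""
    return sequence_ids[offsets[m]:offsets[m + 1]]


# Hypothetical indication arrays of two first class nodes.
seq, off = build_sequence_lists([[0, 1, 3], [1]])
print(seq, off)                         # [0, 1, 3, 1] [0, 3, 4]
print(indication_array_of(seq, off, 1))  # [1]
```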

The determination process for the second adjacency list information is as follows:

The second adjacency list information includes an offset list and an edge list that are determined for the second class nodes after the numbers of the second class nodes are mapped. The numbers are mapped by subtracting a value L from the number of each second class node, so that the numbers of the second class nodes start from 0. For example, 2 is subtracted from the numbers of the second class nodes 2, 3 and 4 in fig. 6, so that their numbers are mapped from 2, 3 and 4 to 0, 1 and 2. The offset list and the edge list of the CSR structure formed by the second class nodes are then determined according to the mapped numbers. For the second class nodes 2, 3 and 4 in fig. 6, the corresponding edge list is L3 = 3, 4 and the offset list is L4 = 0, 1, 2, 2.
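The second adjacency list information of the fig. 6 example can be reproduced with the following sketch. It assumes that the second class nodes are numbered contiguously and that L is the smallest second class number; only the row index is mapped, while the edge list keeps the original neighbor numbers, as in the example. The function names are hypothetical.

```python
def build_light_csr(second_neighbors, light_nodes):
    """Build the edge list and offset list for the second class (light) nodes only."""
    shift = light_nodes[0]  # the value L: smallest second class node number (assumption)
    edge_list, offset_list = [], []
    for node in light_nodes:
        offset_list.append(len(edge_list))
        edge_list.extend(sorted(second_neighbors.get(node, [])))
    offset_list.append(len(edge_list))
    return edge_list, offset_list, shift


def light_neighbors(edge_list, offset_list, shift, node):
    """Look up the second neighbor nodes of a light node by its original number."""
    mapped = node - shift  # mapped number starts from 0
    return edge_list[offset_list[mapped]:offset_list[mapped + 1]]


fig6 = {0: [1, 2, 3, 4], 1: [2, 3, 4], 2: [3], 3: [4], 4: []}
l3, l4, shift = build_light_csr(fig6, [2, 3, 4])
print(l3, l4, shift)                       # [3, 4] [0, 1, 2, 2] 2
print(light_neighbors(l3, l4, shift, 3))  # [4]
```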

Step 3053, determining a task node pair only comprising the first type of nodes as the first type of task node pair; determining a task node pair comprising a first class node and a second class node as a second class task node pair; and determining the task node pair only comprising the second class of nodes as a third class of task node pair.

In this application, the task node pairs (which may be referred to below as edge tasks) can be classified in advance according to the classes of their nodes, and the number of triangle structures formed by task node pairs of different classes with other nodes is calculated by different processing methods. The task node pairs may be divided into first class task node pairs (first class edge tasks) that include only first class nodes, second class task node pairs (second class edge tasks) that include one first class node and one second class node, and third class task node pairs (third class edge tasks) that include only second class nodes. Because first class nodes have more neighbor nodes and second class nodes have fewer, classifying the edges according to the classes of the nodes they connect, and determining the number of triangles formed by different edges with other nodes by different methods, can improve the efficiency of determining the number of triangles in the graph.
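The division of task node pairs into three classes can be sketched as follows; the function name and the dictionary keyed by class number are illustrative assumptions.

```python
def classify_task_pairs(pairs, heavy_nodes):
    """Group task node pairs by how many of their nodes are first class (heavy) nodes."""
    heavy = set(heavy_nodes)
    classes = {1: [], 2: [], 3: []}
    for u, v in pairs:
        heavy_count = (u in heavy) + (v in heavy)
        # two heavy nodes -> class 1, one -> class 2, none -> class 3
        classes[3 - heavy_count].append((u, v))
    return classes


# Task node pairs of the fig. 6 graph, with nodes 0 and 1 as first class nodes.
pairs = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(classify_task_pairs(pairs, [0, 1]))
# {1: [(0, 1)], 2: [(0, 2), (0, 3), (1, 2), (1, 3)], 3: [(2, 3)]}
```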

Step 3054, determining first numbers of identical second neighbor nodes corresponding to two nodes in each first class task node pair; determining the second number of the same second neighbor nodes corresponding to two nodes in each second class task node pair; and determining the third number of the same second neighbor nodes corresponding to the two nodes in each third class task node pair.

After determining task node pairs of different categories, the number of triangle structures formed by the task node pairs of different categories and other nodes can be calculated through different processing methods, as follows:

1) Determining the first number of the same second neighbor nodes corresponding to two nodes in each first-class task node pair comprises the following steps:

For each first class task node pair, the array block indication arrays respectively corresponding to the two nodes in the pair are acquired. If the two array block indication arrays include the same target sequence identifier, the first array block and the second array block corresponding to the target sequence identifier are acquired from the binary subarrays corresponding to the two nodes, and a bitwise AND operation is performed on the first array block and the second array block to obtain the number of identical second neighbor nodes of the two nodes in the pair. The numbers of identical second neighbor nodes of the two nodes in each first class task node pair are summed to obtain the first number.

In implementation, since a first class task node pair includes two first class nodes, the array block indication arrays corresponding to the two first class nodes are determined according to their numbers and the sequence identifier list and sequence identifier offset list in the first adjacency list information. If the same target sequence identifier exists in the two array block indication arrays, the first array block and the second array block can be acquired according to the target sequence identifier; the first array block and the second array block correspond to the two different nodes of the first class task node pair. A bitwise AND operation is then performed on the first array block and the second array block, and the number of 1s in the result is determined. The numbers of 1s corresponding to all target sequence identifiers are summed to obtain the number of identical second neighbor nodes of the two nodes in the first class task node pair. The numbers of identical second neighbor nodes of all first class task node pairs are then summed to obtain the first number.

To find the identical target sequence identifiers in the array block indication arrays corresponding to the two first class nodes, two pointers can be maintained and the sequence identifiers they point to in the two array block indication arrays can be compared in turn, as follows:

1. Two pointers are maintained that point to the first element of the two array block indication arrays, respectively, before the iteration begins.

2. Start the iteration and compare the sequence identifiers pointed to by the two pointers. If the two sequence identifiers are equal, perform step 3; otherwise, perform step 4.

3. If the sequence identifiers are equal, there may be 0 to k common elements (k is the number of elements in an array block). Perform a bitwise AND operation on the array blocks corresponding to the two sequence identifiers, count the number of 1 bits in the result, increase the triangle count accordingly, and move both pointers to the next position.

4. If the sequence identifiers are not equal, move the pointer that points to the smaller sequence identifier forward by one position. The iteration ends when at least one pointer reaches the end of its array block indication array.
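The two-pointer procedure above can be sketched in Python as follows. Array blocks are represented as small integers, the function name is hypothetical, and advancing both pointers after a match is an assumption needed for the loop to make progress.

```python
def count_common_heavy(ids_a, blocks_a, ids_b, blocks_b):
    """Count common second neighbors of two first class nodes.

    ids_x is the array block indication array of a node (ascending sequence
    identifiers) and blocks_x[i] is the array block, as an integer, for ids_x[i].
    """
    i = j = count = 0
    while i < len(ids_a) and j < len(ids_b):
        if ids_a[i] == ids_b[j]:
            # Same block position in both nodes: AND the blocks and count the 1 bits.
            count += bin(blocks_a[i] & blocks_b[j]).count("1")
            i += 1
            j += 1
        elif ids_a[i] < ids_b[j]:
            i += 1
        else:
            j += 1
    return count


# Nodes 0 and 1 of fig. 6 with 4-bit blocks: 0111|1000 and 0011|1000.
print(count_common_heavy([0, 1], [0b0111, 0b1000], [0, 1], [0b0011, 0b1000]))  # 3
```

The result 3 corresponds to the common second neighbor nodes 2, 3 and 4 of node 0 and node 1.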

2) Determining the second number of the same second neighbor nodes corresponding to the two nodes in each second class of task node pair comprises the following steps:

For each second class task node pair, the target array block indication array corresponding to the first class node in the pair is acquired, and the target number corresponding to each second neighbor node of the second class node in the pair is acquired. Each target number is converted into the first sequence identifier of the corresponding array block based on the target number and the dividing number. If the first sequence identifier exists in the target array block indication array, the node corresponding to the target number is determined to be an identical second neighbor node of the two nodes in the second class task node pair. The sum of the numbers of identical second neighbor nodes corresponding to all second class task node pairs is determined as the second number.

In implementation, a second class task node pair includes one first class node and one second class node. For the second class node, the numbers of its second neighbor nodes can be obtained from the second adjacency list information. For example, the offset list in the second adjacency list information gives the first starting position, in the edge list, of the second neighbor nodes of the second class node, and the second starting position, in the edge list, of the second neighbor nodes of the second class node with the next number; the numbers of the second neighbor nodes of the second class node are determined from the first starting position and the second starting position. Then, according to the previously recorded value L, the number of each second neighbor node is mapped back to the number originally assigned to it. Each number is then converted into a sequence identifier according to its value: the number is divided by the dividing number and rounded down to obtain the first sequence identifier of the array block corresponding to the target number.

Then, according to the number of the first class node, the array block indication array corresponding to the first class node is acquired from the first adjacency list information. If the corresponding first sequence identifier exists in this array block indication array, the first class node and the second class node are determined to have an identical second neighbor node. In this way, each second neighbor node of the second class node can be checked in turn against the second neighbor nodes of the first class node, the number of identical second neighbor nodes of the two nodes in each second class task node pair can be determined, and the sum of these numbers is determined as the second number.

The processing for determining whether the corresponding first sequence identifier exists in the array block indication array corresponding to the first class node is as follows:

1. Traverse the numbers of the second neighbor nodes of the second class node, take the first sequence identifier converted from each number in turn as the target value, and execute the following steps 2-5 for each target value.

2. Maintain two registers that record element positions in the array block indication array, namely a left index and a right index, and initialize them to the first and last element positions of the array block indication array respectively.

3. Calculate the average of the two registers, round it to an integer, and record it as the middle index. Compare the element of the array block indication array of the first class node at the middle index with the target value. If they are equal, go to step 4; otherwise, go to step 5.

4. If they are equal, a common element is found: increase the triangle count by one, exit the loop, take the next number as the target value, and return to step 2.

5. If the element at the middle index is larger than the target value, assign the middle index minus one to the right index; otherwise, assign the middle index plus one to the left index. Then repeat step 3 until the left index exceeds the right index, in which case the target value is not present.
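The binary search above can be sketched as follows. Two points go beyond the text and are assumptions of this sketch: the dividing number is treated as the number of bits per array block, and after the block is found the sketch also tests the individual bit inside it, so that a neighbor is counted only when it is really shared. All names are hypothetical.

```python
def count_common_mixed(heavy_ids, heavy_blocks, light_neighbors, block_size):
    """Count common second neighbors of a first class node and a second class node.

    heavy_ids / heavy_blocks describe the compressed binary subarray of the
    first class node; light_neighbors lists the original numbers of the second
    neighbor nodes of the second class node.
    """
    count = 0
    for number in light_neighbors:
        target = number // block_size           # first sequence identifier
        left, right = 0, len(heavy_ids) - 1
        while left <= right:
            middle = (left + right) // 2
            if heavy_ids[middle] == target:
                # Assumption: check the exact bit, counted from the left of the block.
                shift = block_size - 1 - number % block_size
                if (heavy_blocks[middle] >> shift) & 1:
                    count += 1
                break
            if heavy_ids[middle] > target:
                right = middle - 1
            else:
                left = middle + 1
    return count


# Node 1 of fig. 6 (0011|1000) against node 3, whose only second neighbor is node 4.
print(count_common_mixed([0, 1], [0b0011, 0b1000], [4], 4))  # 1
```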

3) The process of determining the third number of the same second neighbor nodes corresponding to two nodes in each third class task node pair is as follows:

The numbers of the second neighbor nodes respectively corresponding to the two nodes in the third class task node pair are obtained, and the count of numbers that appear in both sets of second neighbor numbers is determined. The counts of identical second neighbor nodes corresponding to all third class task node pairs are summed to obtain the third number.

In implementation, a third class task node pair includes two second class nodes, and the numbers of the second neighbor nodes corresponding to each second class node can be obtained from the second adjacency list information. For example, according to the number of a second class node, the offset list in the second adjacency list information gives the first starting position, in the edge list, of the second neighbor nodes of that second class node, and the second starting position, in the edge list, of the second neighbor nodes of the second class node with the next number. From the first starting position and the second starting position, the numbers of the second neighbor nodes of each of the two second class nodes in the third class task node pair can be determined. The count of numbers that appear among the second neighbor numbers of both nodes is then determined, and the counts corresponding to all third class task node pairs are summed to obtain the third number.

The process of determining the number of the numbers of the same second neighbor nodes in the numbers of the second neighbor nodes corresponding to the two nodes respectively may be as follows:

1. Maintain two pointers. Before the iteration starts, the two pointers point to the first elements of the two neighbor lists respectively, where the two neighbor lists consist of the numbers of the second neighbor nodes corresponding to the two second class nodes.

2. Start the iteration and compare the neighbor list elements pointed to by the two pointers: if the two elements are equal, a common element is found, the triangle count is increased by one, and both pointers move to the next element in their lists; if the two elements are not equal, the pointer that points to the smaller element moves to the next element and the other pointer stays unchanged.

3. The above step 2 is repeated until at least one pointer has pointed to the end of the neighbour list, at which point no new common element can be found anymore and the calculation is ended.
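The three steps above amount to a merge-style intersection of two sorted lists, which can be sketched as follows (the function name is hypothetical, and both lists are assumed to be sorted in ascending order, as they are in the CSR edge list).

```python
def count_common_sorted(neighbors_a, neighbors_b):
    """Count the numbers that appear in both sorted second neighbor lists."""
    i = j = count = 0
    while i < len(neighbors_a) and j < len(neighbors_b):
        if neighbors_a[i] == neighbors_b[j]:
            count += 1          # common second neighbor found
            i += 1
            j += 1
        elif neighbors_a[i] < neighbors_b[j]:
            i += 1              # advance the pointer at the smaller element
        else:
            j += 1
    return count


print(count_common_sorted([1, 3, 5, 7], [3, 4, 5, 8]))  # 2
print(count_common_sorted([3], [4]))                    # 0
```

The second call corresponds to the third class task node pair (2, 3) of fig. 6, whose nodes share no second neighbor node.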

Step 3055, determining the sum of the first number, the second number and the third number as the number of triangle structures composed of nodes in the graph data.

In implementation, after determining the first number corresponding to the first type of edge task, the second number corresponding to the second type of edge task, and the third number corresponding to the third type of edge task, the sum of the first number, the second number, and the third number may be determined as the number of triangle structures composed of nodes in the graph data.

In addition, this application also provides a method for dynamically scheduling tasks, in which the first class edge tasks, the second class edge tasks and the third class edge tasks can be processed by different computing modules. A computing module may be a virtual computing module, such as a separate process, or a physical processing unit, such as a separate computing unit in an FPGA.

Determining the first number of identical second neighbor nodes corresponding to the two nodes in each first class task node pair may be executed by a first computing module; determining the second number of identical second neighbor nodes corresponding to the two nodes in each second class task node pair may be executed by a second computing module; and determining the third number of identical second neighbor nodes corresponding to the two nodes in each third class task node pair may be executed by a third computing module. While the first, second and third computing modules execute their corresponding processing, the efficiency with which each module determines the number of triangle structures formed by the nodes in the graph data can be detected separately. When any computing module finishes its corresponding processing, that computing module can be controlled, according to the detected efficiency, to execute processing corresponding to the other computing modules. Dynamic scheduling of the processing tasks of the different computing modules is thereby realized, which improves the efficiency of determining the number of triangle structures formed by the nodes in the graph data.

In this application, the graph data is graph data storing user information, such as a social network graph stored in a background server of a social application. The nodes in the graph data store attribute information of the corresponding users, such as gender, age and preferences, and an edge indicates that the users corresponding to the two nodes it connects are friends. After the number of triangle structures in the social network graph is determined, the number of triangle structures formed by the nodes in the graph data can be input into an aggregation coefficient analysis model to obtain the closeness of the relationships among the users corresponding to the nodes of the graph data.

In practice, relationships between users in social media are typically represented using graph data, commonly referred to as a social network graph. In such graph data, nodes typically represent users, and edges typically represent various relationships between users, such as friendship, following and blocking. Mining properties of the social network graph can help upper-layer applications such as advertising and content recommendation to optimize their results. For example, the number of triangles in the social network graph can be determined and the aggregation coefficient of the nodes in the social network graph can then be calculated, in order to judge how tightly the neighbor nodes of a given node are connected. The more tightly the neighbor nodes are connected, the more likely they form a community with that node, and the more likely the interests and concerns of these users are similar, so that recommendations can be made more accurately.

The specific implementation steps of community discovery and recommendation applied to social media can be as follows:

For a social network graph that changes infrequently, the number of triangles of the whole social network graph can be calculated offline and stored in memory or on a hard disk, and when the aggregation coefficient of a subgraph of the whole social network graph needs to be calculated, the precomputed data can be read directly. For a social network graph that changes more frequently, a method of calculating the local aggregation coefficient can be adopted, in which a subgraph of the social network graph is taken as input each time and the number of triangles related to the edges in that subgraph is calculated.

There are two types of aggregation coefficients, the local aggregation coefficient and the global aggregation coefficient. The local aggregation coefficient is calculated as follows:

$$C_u = \frac{2R_u}{K_u\,(K_u - 1)}$$

where R_u is the number of triangles passing through the given node u and its neighbor nodes, and K_u is the degree of the given node. The global aggregation coefficient may be obtained by a normalized summation of the local aggregation coefficients.
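As a small numeric illustration, the following sketch computes local aggregation coefficients from per-node triangle counts and degrees. It assumes the standard local clustering coefficient, 2R/(K(K-1)), and interprets the "normalized summation" as the arithmetic mean over all nodes; both the interpretation and the function names are assumptions, not statements of the application.

```python
def local_coefficient(triangles, degree):
    """Local aggregation coefficient of one node: 2R / (K * (K - 1))."""
    if degree < 2:
        return 0.0  # a node with fewer than two neighbors cannot close a triangle
    return 2.0 * triangles / (degree * (degree - 1))


def mean_coefficient(triangles_per_node, degrees):
    """Average of the local coefficients (assumed meaning of 'normalized summation')."""
    values = [local_coefficient(r, k) for r, k in zip(triangles_per_node, degrees)]
    return sum(values) / len(values) if values else 0.0


# Undirected view of the fig. 4 graph: one triangle through nodes 0, 1 and 2.
triangles = [1, 1, 1, 0, 0]
degrees = [2, 3, 3, 1, 1]
print([round(local_coefficient(r, k), 3) for r, k in zip(triangles, degrees)])
print(round(mean_coefficient(triangles, degrees), 3))
```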

The stability of communities and the closeness of their connections can be judged according to the aggregation coefficient, so as to guide specific applications such as an upper-layer recommendation system. If the input social network graph has a higher aggregation coefficient, the users represented by the nodes in the social network graph are more likely to be in a tight community. For example, the users may be colleagues, colleagues in the same department, close friends, and so forth. The user groups in tight communities generally have stronger common tendencies, common preferences and common concerns, so that when advertisements and recommended content are presented to users in such a community, the information of other users in the same community can help the upper-layer advertisement and content recommendation system to extract more accurate features.

Fig. 7 is a flowchart of a method for determining the number of triangle structures formed by nodes in graph data according to an embodiment of the present application, where the method is applied to a system composed of a CPU and an FPGA. In the system, the CPU establishes a connection with one or more FPGAs, and the CPU and the FPGA may each be responsible for different steps described below. Referring to fig. 7, the specific process is as follows:

In step 701, the CPU determines the number of the first neighbor nodes corresponding to each node based on each edge between each node in the graph data, and assigns numbers to each node based on the number of the first neighbor nodes corresponding to each node.

Step 702, the CPU determines a first direction of each edge based on the size of the corresponding number of each edge connecting two nodes in the graph data.

In step 703, the CPU determines, for any node, other nodes pointed by the first direction of the edge connected to any node, and determines the other nodes as second neighbor nodes of any node.

Step 704, the CPU determines, for two nodes connected on any side in the graph data, whether both nodes have corresponding second neighbor nodes, and if both nodes have corresponding second neighbor nodes, determines the two nodes as a task node pair.

In the embodiment of the present application, the processing of steps 701 to 704 is the same as the processing of steps 301 to 302 in the above embodiment, and the specific process is not repeated. The processing in steps 701-704 may be referred to as data preprocessing: the CPU processes the graph data to obtain the edge tasks (task node pairs) to be processed by the FPGA, and then sends the graph data and the task node pairs to the FPGA for processing. The FPGA then determines the number of nodes that form a triangle with each task node pair.

Step 705, the FPGA determines the target number of the same second neighbor nodes corresponding to two nodes in each task node pair, and determines the sum of the target numbers corresponding to each task node pair as the number of triangle structures composed of the nodes in the graph data.

In addition, the present application further provides a task scheduling method used when the FPGA executes step 705. The specific method is as follows:

Step 7051, the CPU classifies the edge tasks and, based on the number of FPGAs and/or the number of computing modules in the FPGAs, determines the class of edge task to be executed by each FPGA and/or each computing module in the FPGA.

In the present application, the edge tasks may be divided into three types: a task node pair comprising only first class nodes is determined to be a first class task node pair; a task node pair comprising one first class node and one second class node is determined to be a second class task node pair; and a task node pair comprising only second class nodes is determined to be a third class task node pair. For the specific classification method, refer to steps 3051-3053 in the foregoing embodiment. In this embodiment of the present application, the first class task node pair may be referred to as a first type of edge task, the second class task node pair as a second type of edge task, and the third class task node pair as a third type of edge task.
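The three-way classification can be sketched as follows. The claims state only that first class nodes have more second neighbor nodes than second class nodes, so the explicit `threshold` parameter, the function name, and the sample data are assumptions for illustration.

```python
def classify_edge_tasks(second_neighbors, task_pairs, threshold):
    """Split task node pairs into the three edge task types.

    ASSUMPTION: a node is a first class node when it has at least
    `threshold` second neighbors; otherwise it is a second class node.
    """
    first_class = {v for v, nbrs in second_neighbors.items() if len(nbrs) >= threshold}
    tables = {1: [], 2: [], 3: []}
    for u, w in task_pairs:
        hits = (u in first_class) + (w in first_class)
        # Two first class nodes -> type 1, one of each -> type 2, none -> type 3.
        edge_type = {2: 1, 1: 2, 0: 3}[hits]
        tables[edge_type].append((u, w))
    return first_class, tables


# Hypothetical data: nodes 2 and 3 have at least two second neighbors.
sample_neighbors = {0: set(), 1: {0}, 2: {0, 1}, 3: {0, 1, 2}}
sample_pairs = [(2, 1), (3, 1), (3, 2)]
first_class, tables = classify_edge_tasks(sample_neighbors, sample_pairs, threshold=2)
print(first_class, tables)
```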

If the CPU is connected to multiple FPGAs, the type of edge task executed by each FPGA may be determined according to the number of FPGAs. For example, if 3 FPGAs are connected to the CPU, each FPGA may be assigned one type of edge task. Alternatively, when the CPU is connected to only one FPGA, the type of edge task executed by each computing module may be determined according to the number of computing modules in the FPGA. For example, if the FPGA includes 9 computing modules, it may be determined that every three computing modules execute the same type of edge task.

The following takes as an example the case in which the CPU is connected to one FPGA and the different types of edge tasks are executed by that FPGA; other cases are similar and are not described again. After determining the type of edge task corresponding to each computing unit in the FPGA, the CPU may generate a corresponding task table for each computing unit. The task table includes the edge tasks (that is, the task node pairs) that the corresponding computing unit needs to execute, and is used to track the execution of edge tasks by each computing unit in the FPGA. In addition, the CPU may send the first adjacency list information corresponding to the first class nodes and the second adjacency list information corresponding to the second class nodes to a shared storage area of the FPGA, which can be accessed by all of the computing units, and may send the task table corresponding to each computing unit to that computing unit.

Step 7052, a first computing unit in the FPGA determines the first number of identical second neighbor nodes corresponding to the two nodes in each first class task node pair; a second computing unit determines the second number of identical second neighbor nodes corresponding to the two nodes in each second class task node pair; and a third computing unit determines the third number of identical second neighbor nodes corresponding to the two nodes in each third class task node pair.

That is, the first computing unit executes the first type of edge task, the second computing unit executes the second type of edge task, and the third computing unit executes the third type of edge task.

In implementation, the first computing unit may, according to the first class task node pairs recorded in its task table, obtain the array block indication arrays corresponding to the two first class nodes in each task node pair from the first adjacency list information corresponding to the first class nodes stored in the FPGA shared memory, so as to determine the first number of identical second neighbor nodes corresponding to the two nodes in each first class task node pair. For the specific processing, refer to step 3054, which is not repeated here.
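A rough sketch of how two first class nodes might be intersected using compressed binary sub-arrays and their array block indication arrays, following the block compression described in the claims (split into blocks, drop all-zero blocks, record the sequence identifiers of the remaining blocks). The block width `BLOCK_BITS`, the function names, and the use of Python integers as blocks are assumptions; the actual layout used in step 3054 is not reproduced here.

```python
BLOCK_BITS = 8  # hypothetical "preset dividing number"


def compress(neighbor_numbers, block_bits=BLOCK_BITS):
    """Return (array block indication array, non-zero blocks) for one node."""
    blocks = {}
    for n in neighbor_numbers:
        index, bit = divmod(n, block_bits)
        # Bit `bit` of block `index` is 1 when node n is a second neighbor.
        blocks[index] = blocks.get(index, 0) | (1 << bit)
    indication = sorted(blocks)  # sequence identifiers of the kept blocks
    return indication, [blocks[i] for i in indication]


def count_common(a, b):
    """Count common second neighbors of two compressed first class nodes."""
    ind_a, blk_a = a
    ind_b, blk_b = b
    i = j = common = 0
    while i < len(ind_a) and j < len(ind_b):
        if ind_a[i] == ind_b[j]:
            # Same block position: AND the blocks and count the set bits.
            common += bin(blk_a[i] & blk_b[j]).count("1")
            i += 1
            j += 1
        elif ind_a[i] < ind_b[j]:
            i += 1
        else:
            j += 1
    return common


node_x = compress([1, 3, 20])
node_y = compress([3, 20, 21, 40])
print(node_x, node_y, count_common(node_x, node_y))
```

Only blocks whose sequence identifiers appear in both indication arrays are compared, so all-zero regions cost nothing.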

The second computing unit may, according to the second class task node pairs recorded in its task table, obtain from the first adjacency list information and the second adjacency list information stored in the FPGA shared memory the array block indication array corresponding to the first class node in each second class task node pair and the numbers of the second neighbor nodes of the second class node, so as to determine the second number of identical second neighbor nodes corresponding to the two nodes in each second class task node pair. For the specific processing, refer to step 3054, which is not repeated here.

The third computing unit may, according to the third class task node pairs recorded in its task table, obtain the numbers corresponding to the two second class nodes in each third class task node pair from the second adjacency list information corresponding to the second class nodes stored in the FPGA shared memory, so as to determine the third number of identical second neighbor nodes corresponding to the two nodes in each third class task node pair. For the specific processing, refer to step 3054, which is not repeated here.
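For the second and third types of edge task, the intersections can be sketched as below. For brevity the first class node is represented by an uncompressed bitmap held in a Python integer and each second class node by a sorted list of second neighbor numbers; both representations and both function names are simplifying assumptions, not the data layout of the embodiment.

```python
def count_second_type(bitmap, neighbor_numbers):
    """First class node (bitmap) against second class node (list of numbers)."""
    # Test the bit of every listed neighbor in the first class node's bitmap.
    return sum((bitmap >> n) & 1 for n in neighbor_numbers)


def count_third_type(numbers_a, numbers_b):
    """Two second class nodes, each given as a sorted list of neighbor numbers."""
    i = j = common = 0
    while i < len(numbers_a) and j < len(numbers_b):
        if numbers_a[i] == numbers_b[j]:
            common += 1
            i += 1
            j += 1
        elif numbers_a[i] < numbers_b[j]:
            i += 1
        else:
            j += 1
    return common


bitmap = (1 << 0) | (1 << 1) | (1 << 5)          # second neighbors {0, 1, 5}
print(count_second_type(bitmap, [1, 2, 5]))       # 2
print(count_third_type([0, 2, 4, 7], [2, 3, 7, 9]))  # 2
```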

Step 7053, the CPU monitors the task processing status of the first computing unit, the second computing unit, and the third computing unit, and when any computing unit completes its corresponding processing, controls that computing unit to execute the processing corresponding to the other computing units.

In implementation, the CPU may send task execution instructions to the computing units in the FPGA according to the task table corresponding to each computing unit, where each task execution instruction may carry a task node pair. The computing unit may execute the corresponding edge task according to the task node pair carried by the instruction. After a computing unit in the FPGA finishes executing the edge task, that is, after it determines the number of nodes forming a triangle with the task node pair, it may send a corresponding execution completion notification to the CPU. The CPU may then send the next task node pair to that computing unit according to its task table. When the CPU determines that the computing units executing one type of edge task have executed all edge tasks of that type, it may obtain task node pairs from the task tables corresponding to the other types of edge tasks and send them to those computing units for execution.

In addition, tasks may also be scheduled among multiple computing units that execute the same type of edge task.

For example, among multiple computing units executing the same type of edge task, when the CPU determines that one of them (the target computing unit) has executed all edge tasks in its task table, the CPU may determine the number of edge tasks not yet executed in the task tables corresponding to the other computing units. The CPU then obtains the next edge task to be executed (the target edge task) from the task table with the largest number of remaining edge tasks and sends an execution instruction for the target edge task to the target computing unit, thereby achieving dynamic task scheduling among the computing units executing the same type of edge task.
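The scheduling rule can be sketched on the CPU side as follows. The names `next_task`, `tables`, and `unit_type` are hypothetical, and choosing the fullest table when borrowing from a different edge task type is an assumption; the description specifies the "largest remaining table" rule only for units of the same type.

```python
from collections import deque


def next_task(unit, tables, unit_type):
    """Pick the next task node pair for `unit`, or None if no work remains."""
    if tables[unit]:
        return tables[unit].popleft()
    # The unit's own table is empty: first look at units of the same edge
    # task type, then (once those are all empty) at units of other types.
    same = [u for u in tables if unit_type[u] == unit_type[unit] and tables[u]]
    other = [u for u in tables if unit_type[u] != unit_type[unit] and tables[u]]
    for candidates in (same, other):
        if candidates:
            # Take the next task from the table with the most remaining tasks.
            donor = max(candidates, key=lambda u: len(tables[u]))
            return tables[donor].popleft()
    return None


# Hypothetical task tables: u0, u1, u2 run type 1 tasks, u3 runs type 2 tasks.
tables = {
    "u0": deque(),
    "u1": deque([(3, 2)]),
    "u2": deque([(5, 1), (5, 4)]),
    "u3": deque([(9, 8), (9, 7), (9, 6)]),
}
unit_type = {"u0": 1, "u1": 1, "u2": 1, "u3": 2}
print(next_task("u0", tables, unit_type))  # (5, 1), taken from u2
```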

In this way, dynamic task scheduling among computing units executing the same type of edge task improves the efficiency with which those units execute that type of edge task, and dynamic task scheduling among computing units executing different types of edge tasks improves the overall efficiency with which the computing units execute the edge tasks.

In this embodiment of the present application, the graph data is graph data storing user information, such as a social network graph stored in a background server of a social application. Each node in the graph data stores attribute information of the corresponding user, such as gender, age, and preferences, and each edge indicates that the users corresponding to the two nodes it connects are friends. After the number of triangle structures in the social network graph is determined, that number may be input into an aggregation coefficient analysis model to obtain the degree of closeness of the relationships among the users corresponding to the nodes of the graph data.

Community discovery and recommendation for social media may be implemented by the CPU, and the specific steps may be as follows:

There are two types of aggregation coefficient: the local aggregation coefficient and the global aggregation coefficient. In its standard form, the local aggregation coefficient of a given node $u$ is calculated as follows:

$$C_u = \frac{2R_u}{K_u(K_u - 1)}$$

where $R_u$ is the number of triangles passing through the given node and its neighbor nodes, and $K_u$ is the degree of the given node. The global aggregation coefficient may be obtained by a normalized summation of the local aggregation coefficients.
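As a small numerical illustration, the sketch below assumes the standard local clustering coefficient, 2·R_u / (K_u·(K_u − 1)), and reads "normalized summation" as the average of the local coefficients over all nodes. Both the formula and this reading are assumptions, as are the function names and the sample graph.

```python
def local_coefficient(triangles_u, degree_u):
    """Local aggregation coefficient, ASSUMING 2*R_u / (K_u * (K_u - 1))."""
    if degree_u < 2:
        return 0.0  # a node with fewer than two neighbors closes no triangle
    return 2.0 * triangles_u / (degree_u * (degree_u - 1))


def global_coefficient(triangles, degrees):
    """Average of the local coefficients (one reading of 'normalized summation')."""
    values = [local_coefficient(triangles[v], degrees[v]) for v in degrees]
    return sum(values) / len(values) if values else 0.0


# Hypothetical graph: triangle a-b-c with a pendant node d attached to c.
triangles = {"a": 1, "b": 1, "c": 1, "d": 0}
degrees = {"a": 2, "b": 2, "c": 3, "d": 1}
print(local_coefficient(triangles["c"], degrees["c"]))
print(global_coefficient(triangles, degrees))
```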

Fig. 8 is a schematic diagram of an apparatus for determining the number of triangle structures formed by nodes in graph data according to an embodiment of the present application, where the apparatus may be a computer device in the foregoing embodiment, and the apparatus includes:

A first determining module 810, configured to determine the number of first neighbor nodes corresponding to each node based on the edges between the nodes in the graph data, and allocate a number to each node based on the number of first neighbor nodes corresponding to that node, where a node with more first neighbor nodes is allocated a smaller number;

a second determining module 820, configured to determine, based on the size of the corresponding number of each edge connecting two nodes in the graph data, a first direction of each edge;

a third determining module 830, configured to determine, for any node, other nodes pointed by a first direction of an edge connected to the any node, and determine the other nodes as second neighbor nodes of the any node;

a fourth determining module 840, configured to determine, for the two nodes connected by any edge in the graph data, whether both nodes have corresponding second neighbor nodes, and if so, determine the two nodes as a task node pair;

and a fifth determining module 850, configured to determine the target number of identical second neighbor nodes corresponding to the two nodes in each task node pair, and determine the sum of the target numbers corresponding to the task node pairs as the number of triangle structures formed by the nodes in the graph data.

Optionally, the apparatus further includes a dividing module, configured to:

Optionally, the fifth determining module 850 is configured to:

Optionally, the sixth determining module is further configured to:

Optionally, the fifth determining module is configured to:

optionally, the fifth determining module is configured to:

Optionally, the fifth determining module 850 is configured to:

It should be noted that: the device for determining the number of triangle structures formed by the nodes in the graph data provided in the above embodiment is only exemplified by the division of the above functional modules when determining the number of triangle structures formed by the nodes in the graph data, and in practical application, the above functional allocation may be completed by different functional modules according to needs, i.e. the internal structure of the device is divided into different functional modules, so as to complete all or part of the functions described above. In addition, the device for determining the number of triangle structures formed by the nodes in the graph data provided in the above embodiment belongs to the same concept as the method embodiment for determining the number of triangle structures formed by the nodes in the graph data, and the specific implementation process is detailed in the method embodiment and will not be described herein.

Fig. 9 shows a block diagram of a computer device 900 provided in an exemplary embodiment of the present application. The computer device 900 may be a terminal or a server in the above embodiments. When the computer device 900 is a terminal, it may be, for example, a smart phone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a notebook computer, or a desktop computer. The computer device 900 may also be referred to by other names, such as user device, portable terminal, laptop terminal, or desktop terminal. When the computer device 900 is a server, the server may be a single server or a server group formed by multiple servers, and different servers may be responsible for different processes in the above embodiments, which may be set by a technician according to the actual scenario.

In general, the computer device 900 includes: a processor 901 and a memory 902.

The processor 901 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 901 may be implemented in at least one hardware form of DSP (digital signal processing), FPGA (field-programmable gate array), and PLA (programmable logic array). The processor 901 may also include a main processor and a coprocessor: the main processor, also referred to as a CPU (central processing unit), processes data in the awake state, and the coprocessor is a low-power processor that processes data in the standby state. In some embodiments, the processor 901 may integrate a GPU (graphics processing unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 901 may also include an AI (artificial intelligence) processor for processing computing operations related to machine learning.

The memory 902 may include one or more computer-readable storage media, which may be non-transitory. The memory 902 may also include high-speed random access memory, as well as non-volatile memory, such as one or more magnetic disk storage devices, flash memory storage devices. In some embodiments, a non-transitory computer readable storage medium in memory 902 is configured to store at least one instruction for execution by processor 901 to implement a method for determining the number of triangle structures formed by nodes in graph data provided by a method embodiment in the present application.

In some embodiments, the computer device 900 may also optionally include: a peripheral interface 903, and at least one peripheral. The processor 901, memory 902, and peripheral interface 903 may be connected by a bus or signal line. The individual peripheral devices may be connected to the peripheral device interface 903 via buses, signal lines, or circuit boards. Specifically, the peripheral device includes: at least one of radio frequency circuitry 904, a display 905, a camera assembly 906, audio circuitry 907, a positioning assembly 908, and a power source 909.

The peripheral interface 903 may be used to connect at least one input/output (I/O) related peripheral to the processor 901 and the memory 902. In some embodiments, the processor 901, memory 902, and peripheral interface 903 are integrated on the same chip or circuit board; in some other embodiments, either or both of the processor 901, the memory 902, and the peripheral interface 903 may be implemented on separate chips or circuit boards, which is not limited in this embodiment.

The radio frequency circuit 904 is configured to receive and transmit RF (radio frequency) signals, also known as electromagnetic signals. The radio frequency circuit 904 communicates with a communication network and other communication devices via electromagnetic signals. The radio frequency circuit 904 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 904 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency circuit 904 may communicate with other terminals via at least one wireless communication protocol. The wireless communication protocol includes, but is not limited to: the world wide web, metropolitan area networks, intranets, mobile communication networks of each generation (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (wireless fidelity) networks. In some embodiments, the radio frequency circuit 904 may also include NFC (near field communication) related circuits, which is not limited in this application.

The display 905 is used to display a UI (user interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display 905 is a touch display, the display 905 also has the ability to capture touch signals at or above its surface. The touch signal may be input to the processor 901 as a control signal for processing. In this case, the display 905 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display 905, disposed on the front panel of the computer device 900; in other embodiments, there may be at least two displays 905, disposed on different surfaces of the computer device 900 or in a folded design; in still other embodiments, the display 905 may be a flexible display disposed on a curved surface or a folded surface of the computer device 900. The display 905 may even be arranged in a non-rectangular irregular shape, that is, a shaped screen. The display 905 may be made of materials such as LCD (liquid crystal display) or OLED (organic light-emitting diode).

The camera assembly 906 is used to capture images or video. Optionally, the camera assembly 906 includes a front camera and a rear camera. Typically, the front camera is disposed on the front panel of the terminal and the rear camera is disposed on the rear surface of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting, VR (virtual reality) shooting, or other fusion shooting functions. In some embodiments, the camera assembly 906 may also include a flash. The flash may be a single color temperature flash or a dual color temperature flash. A dual color temperature flash is a combination of a warm light flash and a cold light flash, and can be used for light compensation at different color temperatures.

The audio circuit 907 may include a microphone and a speaker. The microphone is used for collecting sound waves of users and the environment, converting the sound waves into electric signals, and inputting the electric signals to the processor 901 for processing, or inputting the electric signals to the radio frequency circuit 904 for voice communication. For purposes of stereo acquisition or noise reduction, the microphone may be multiple, each disposed at a different location of the computer device 900. The microphone may also be an array microphone or an omni-directional pickup microphone. The speaker is used to convert electrical signals from the processor 901 or the radio frequency circuit 904 into sound waves. The speaker may be a conventional thin film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, not only the electric signal can be converted into a sound wave audible to humans, but also the electric signal can be converted into a sound wave inaudible to humans for ranging and other purposes. In some embodiments, the audio circuit 907 may also include a headphone jack.

The positioning component 908 is used to locate the current geographic location of the computer device 900 to enable navigation or LBS (location-based services). The positioning component 908 may be a positioning component based on the GPS (global positioning system) of the United States, the Beidou system of China, or the Galileo system of the European Union.

The power supply 909 is used to power the various components in the computer device 900. The power supply 909 may be an alternating current, a direct current, a disposable battery, or a rechargeable battery. When the power source 909 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery or a wireless rechargeable battery. The wired rechargeable battery is a battery charged through a wired line, and the wireless rechargeable battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charge technology.

In some embodiments, computer device 900 also includes one or more sensors 910. The one or more sensors 910 include, but are not limited to: acceleration sensor 911, gyroscope sensor 912, pressure sensor 913, fingerprint sensor 914, optical sensor 915, and proximity sensor 916.

The acceleration sensor 911 can detect the magnitudes of accelerations on three coordinate axes of the coordinate system established by the computer device 900. For example, the acceleration sensor 911 may be used to detect components of gravitational acceleration in three coordinate axes. The processor 901 may control the display 905 to display the user interface in a landscape view or a portrait view according to the gravitational acceleration signal acquired by the acceleration sensor 911. The acceleration sensor 911 may also be used for the acquisition of motion data of a game or a user.

The gyro sensor 912 may detect a body direction and a rotation angle of the computer device 900, and the gyro sensor 912 may collect a 3D motion of the user on the computer device 900 in cooperation with the acceleration sensor 911. The processor 901 may implement the following functions according to the data collected by the gyro sensor 912: motion sensing (e.g., changing UI according to a tilting operation by a user), image stabilization at shooting, game control, and inertial navigation.

The pressure sensor 913 may be disposed on a side frame of the computer device 900 and/or on an underside of the display 905. When the pressure sensor 913 is disposed on the side frame of the computer device 900, a holding signal of the computer device 900 by the user may be detected, and the processor 901 performs left-right hand recognition or quick operation according to the holding signal collected by the pressure sensor 913. When the pressure sensor 913 is provided at the lower layer of the display 905, the processor 901 performs control of the operability control on the UI interface according to the pressure operation of the user on the display 905. The operability controls include at least one of a button control, a scroll bar control, an icon control, and a menu control.

The fingerprint sensor 914 is used to collect the fingerprint of the user, and either the processor 901 identifies the user according to the fingerprint collected by the fingerprint sensor 914, or the fingerprint sensor 914 itself identifies the user according to the collected fingerprint. Upon recognizing that the user's identity is a trusted identity, the processor 901 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, and changing settings. The fingerprint sensor 914 may be disposed on the front, back, or side of the computer device 900. When a physical key or vendor logo is provided on the computer device 900, the fingerprint sensor 914 may be integrated with the physical key or vendor logo.

The optical sensor 915 is used to collect the intensity of ambient light. In one embodiment, the processor 901 may control the display brightness of the display 905 based on the intensity of ambient light collected by the optical sensor 915. Specifically, when the ambient light intensity is high, the display brightness of the display 905 is turned up; when the ambient light intensity is low, the display brightness of the display 905 is turned down. In another embodiment, the processor 901 may also dynamically adjust the shooting parameters of the camera assembly 906 based on the ambient light intensity collected by the optical sensor 915.

A proximity sensor 916, also referred to as a distance sensor, is typically provided on the front panel of the computer device 900. Proximity sensor 916 is used to capture the distance between the user and the front of computer device 900. In one embodiment, when the proximity sensor 916 detects that the distance between the user and the front of the computer device 900 gradually decreases, the processor 901 controls the display 905 to switch from the bright screen state to the off screen state; when the proximity sensor 916 detects that the distance between the user and the front surface of the computer device 900 gradually increases, the display 905 is controlled by the processor 901 to switch from the off-screen state to the on-screen state.

Those skilled in the art will appreciate that the architecture shown in fig. 9 is not limiting of the computer device 900, and may include more or fewer components than shown, or may combine certain components, or employ a different arrangement of components.

In an exemplary embodiment, a computer readable storage medium, such as a memory comprising instructions, is also provided, where the instructions are executable by a processor in a terminal to perform the method for determining the number of triangle structures formed by nodes in graph data in the above embodiments. The computer readable storage medium may be non-transitory. For example, the computer readable storage medium may be a ROM (read-only memory), a RAM (random access memory), a CD-ROM, a magnetic tape, a floppy disk, an optical data storage device, and the like.

It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program for instructing relevant hardware, where the program may be stored in a computer readable storage medium, and the storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.

The foregoing describes only preferred embodiments of the present application and is not intended to limit the scope of the present application to these particular embodiments.

Claims

1. A method for determining the number of triangle structures formed by nodes in graph data, the method comprising:

determining the number of the first neighbor nodes corresponding to each node based on each edge between nodes in graph data, and allocating numbers to each node based on the number of the first neighbor nodes corresponding to each node, wherein the numbers corresponding to nodes with more numbers of the first neighbor nodes are smaller, the graph data are graph data for storing user information, the nodes in the graph data are attribute information of corresponding users, and the edges are used for indicating that the users corresponding to two nodes connected by the edges are friends;

Determining the target number of the same second neighbor nodes corresponding to two nodes in each task node pair, and determining the sum of the target numbers corresponding to each task node as the number of triangle structures formed by the nodes in the graph data;

2. The method of claim 1, wherein prior to determining the target number of identical second neighbor nodes corresponding to two nodes in each pair of task nodes, the method further comprises:

dividing each node into a first class node and a second class node based on the number of second neighbor nodes corresponding to each node, wherein the number of the second neighbor nodes corresponding to the first class node is larger than that of the second neighbor nodes corresponding to the second class node;

the determining the target number of the same second neighbor nodes corresponding to the two nodes in each task node pair comprises the following steps:

3. The method according to claim 2, wherein the method further comprises:

determining adjacency list information corresponding to the first class nodes, wherein the adjacency list information comprises a binary array, the binary array is formed by splicing the binary sub-arrays corresponding to the first class nodes in the order of the numbers corresponding to the first class nodes, and in the binary sub-array corresponding to any first class node, if the value of the N-th bit is 1, the node numbered N is a second neighbor node of that first class node, and if the value of the N-th bit is 0, the node numbered N is not a second neighbor node of that first class node, wherein N is an integer greater than or equal to 0;

The method further comprises the steps of:

4. A method according to claim 3, wherein determining the first number of identical second neighbor nodes corresponding to two nodes in each pair of first class task nodes comprises:

The determining the second number of the same second neighbor nodes corresponding to the two nodes in each second class task node pair comprises the following steps:

and determining the number of the same second neighbor nodes corresponding to each second type of task node as the second number.

5. The method of claim 2, wherein the determining a first number of identical second neighbor nodes corresponding to two nodes in each first class task node pair, determining a second number of identical second neighbor nodes corresponding to two nodes in each second class task node pair, and determining a third number of identical second neighbor nodes corresponding to two nodes in each third class task node pair comprises:

6. A system for determining the number of triangle structures formed by nodes in graph data, comprising a CPU and an FPGA, wherein:

the CPU is used for determining the number of first neighbor nodes corresponding to each node based on the edges between the nodes in the graph data, and allocating a number to each node based on the number of first neighbor nodes corresponding to that node, wherein a node with more first neighbor nodes is allocated a smaller number, the graph data are graph data for storing user information, the nodes in the graph data are attribute information of corresponding users, and the edges are used for indicating that the users corresponding to the two nodes connected by the edge are friends; determining a first direction of each edge based on the magnitudes of the numbers of the two nodes connected by that edge in the graph data; for any node, determining other nodes pointed to by the first direction of the edges connected with the any node, and determining the other nodes as second neighbor nodes of the any node; for two nodes connected by any edge in the graph data, determining whether the two nodes both have corresponding second neighbor nodes, and if the two nodes both have corresponding second neighbor nodes, determining the two nodes as a task node pair;

the FPGA is used for determining the target number of identical second neighbor nodes corresponding to the two nodes in each task node pair, and determining the sum of the target numbers corresponding to all task node pairs as the number of triangle structures formed by the nodes in the graph data;

and the CPU is further used for inputting the number of triangle structures formed by the nodes in the graph data into an aggregation coefficient analysis model to obtain the degree of closeness of the relationships between the users corresponding to the nodes of the graph data.
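The division of work recited in claim 6 can be illustrated in software with a minimal sketch. It is not the claimed CPU/FPGA implementation: everything runs in one Python function, the name `count_triangles` is hypothetical, ties between nodes with equal neighbor counts are broken by node label, and the first direction is assumed to point from the smaller number to the larger number (the claim only says the direction depends on the relative size of the numbers).

```python
from collections import defaultdict


def count_triangles(edges):
    """Count triangles in an undirected graph given as (u, v) pairs."""
    # First neighbor nodes: the ordinary undirected adjacency sets.
    first = defaultdict(set)
    for u, v in edges:
        if u != v:  # ignore self-loops
            first[u].add(v)
            first[v].add(u)

    # Assign numbers: more first neighbors -> smaller number.
    # Assumption: ties are broken by node label, so labels must be sortable.
    order = sorted(first, key=lambda n: (-len(first[n]), n))
    number = {n: i for i, n in enumerate(order)}

    # Assumption: the first direction goes from the smaller number to the
    # larger number, so second neighbors are the higher-numbered neighbors.
    second = {n: {m for m in first[n] if number[m] > number[n]}
              for n in first}

    total = 0
    for u in first:
        for v in second[u]:  # each undirected edge is visited exactly once
            # Task node pair: both endpoints have second neighbor nodes.
            if second[u] and second[v]:
                # Target number: identical second neighbors of the pair.
                total += len(second[u] & second[v])
    return total
```

Under this orientation each triangle is counted exactly once, at the edge joining its two lowest-numbered nodes, which is why the sum of the target numbers equals the triangle count.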

7. The system of claim 6, wherein the CPU is configured to:

the FPGA is used for: determining a first number of identical second neighbor nodes corresponding to the two nodes in each first-class task node pair; determining a second number of identical second neighbor nodes corresponding to the two nodes in each second-class task node pair; determining a third number of identical second neighbor nodes corresponding to the two nodes in each third-class task node pair; and determining the sum of the first number, the second number and the third number as the number of triangle structures formed by the nodes in the graph data;

the CPU is used for: determining adjacency list information corresponding to the first type of nodes, wherein the adjacency list information comprises a binary array, the binary array is formed by concatenating the binary sub-arrays corresponding to the first type of nodes in the order of the numbers corresponding to the first type of nodes, and, in the binary sub-array corresponding to any first type node, a value of 1 at the N-th bit indicates that the node numbered N is a second neighbor node of that first type node, and a value of 0 at the N-th bit indicates that the node numbered N is not a second neighbor node of that first type node, wherein N is an integer greater than or equal to 0;

the CPU is used for: for the binary sub-array corresponding to any first type node in the binary array, dividing the binary sub-array into a plurality of array blocks according to a preset division number, and determining the sequence identifier of each array block; removing the array blocks whose elements are all 0, and forming the remaining array blocks into the compressed binary sub-array corresponding to that first type node; and forming an array block indication array from the sequence identifiers of the array blocks in the compressed binary sub-array, and forming the compressed binary array from the compressed binary sub-arrays.
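The block compression described in claim 7 can be sketched as follows. This is an illustration under stated assumptions rather than the claimed implementation: the function names `compress` and `common_count` are hypothetical, the "preset division number" is modeled as a block length in bits, and the counting step on the compressed form is an assumed use of the indication array that this claim does not itself recite.

```python
def compress(bits, block_size):
    """Split a 0/1 list into blocks and drop the all-zero blocks.

    Returns (block_ids, blocks): block_ids plays the role of the array
    block indication array, blocks is the compressed binary sub-array.
    """
    block_ids, blocks = [], []
    for start in range(0, len(bits), block_size):
        block = bits[start:start + block_size]
        if any(block):  # keep only blocks containing at least one 1
            block_ids.append(start // block_size)  # sequence identifier
            blocks.append(block)
    return block_ids, blocks


def common_count(a, b):
    """Count positions set to 1 in both compressed sub-arrays.

    Assumption: only blocks with the same sequence identifier can share
    a neighbor, so blocks missing on either side are skipped.
    """
    ids_a, blocks_a = a
    ids_b, blocks_b = b
    lookup = dict(zip(ids_b, blocks_b))
    total = 0
    for block_id, block in zip(ids_a, blocks_a):
        other = lookup.get(block_id)
        if other is not None:
            total += sum(x & y for x, y in zip(block, other))
    return total
```

For sparse neighbor sets most blocks are all zero, so the compressed form stores and compares far fewer bits than the full sub-array while the sequence identifiers preserve each remaining block's original position.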

8. The system of claim 7, wherein:

the FPGA is used for:

The FPGA is used for:

9. The system of claim 7, wherein the FPGA comprises a first computing unit, a second computing unit, and a third computing unit;

10. A device for determining the number of triangle structures formed by nodes in graph data, the device comprising:

a first determining module, configured to determine the number of first neighbor nodes corresponding to each node based on the edges between the nodes in the graph data, and to assign a number to each node based on the number of first neighbor nodes corresponding to that node, wherein a node with more first neighbor nodes is assigned a smaller number, the graph data is graph data for storing user information, the nodes in the graph data store attribute information of the corresponding users, and each edge indicates that the users corresponding to the two nodes connected by the edge are friends;

a fifth determining module, configured to determine the target number of identical second neighbor nodes corresponding to the two nodes in each task node pair, and to determine the sum of the target numbers corresponding to all task node pairs as the number of triangle structures formed by the nodes in the graph data;

and an analysis module, configured to input the number of triangle structures formed by the nodes in the graph data into an aggregation coefficient analysis model to obtain the degree of closeness of the relationships between the users corresponding to the nodes of the graph data.
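The claims do not define the aggregation coefficient analysis model. As one possible reading only, the sketch below computes the standard global clustering coefficient (transitivity), which is three times the triangle count divided by the number of connected node triples. The function name `global_clustering` is hypothetical, and the edge list is assumed to contain each undirected edge once, with no self-loops.

```python
def global_clustering(edges, triangle_count):
    """Return 3 * triangles / connected triples for an undirected graph.

    Assumption: `edges` lists each undirected edge exactly once and
    contains no self-loops; `triangle_count` was computed separately.
    """
    degree = {}
    for u, v in edges:
        degree[u] = degree.get(u, 0) + 1
        degree[v] = degree.get(v, 0) + 1
    # A node of degree d is the center of d * (d - 1) / 2 connected triples.
    triples = sum(d * (d - 1) // 2 for d in degree.values())
    return 3 * triangle_count / triples if triples else 0.0
```

A value near 1 indicates that most pairs of a user's friends are themselves friends, while a value near 0 indicates a loosely connected graph.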

11. A computer device comprising a processor and a memory having stored therein at least one instruction that is loaded and executed by the processor to perform the operations performed by the method of determining the number of triangle structures formed by nodes in graph data as claimed in any one of claims 1 to 5.

12. A computer readable storage medium having stored therein at least one instruction that is loaded and executed by a processor to implement operations performed by the method of determining the number of triangle structures formed by nodes in graph data of any one of claims 1 to 5.