WO2021027331A1 - Graph data-based full relationship calculation method and apparatus, device, and storage medium

Info

Publication number: WO2021027331A1
Application number: PCT/CN2020/087619
Authority: WO
Inventors: 邓强; 张娟; 屠宁; 赵之砚; 施奕明
Original assignee: 深圳壹账通智能科技有限公司
Priority date: 2019-08-15
Filing date: 2020-04-28
Publication date: 2021-02-18
Also published as: CN110609924A

Abstract

A graph data-based full relationship calculation method and apparatus, a device, and a storage medium, used for merging node attributes into node identifiers using bit operations, avoiding the duplication of node data, reducing the consumption of memory resources, and increasing calculation efficiency. The method comprises: acquiring preprocessed graph data, the preprocessed graph data comprising node data and edge data of each node (101); performing bit operations on the node data to generate a synthesised node identifier of each node (102); dividing the node data and edge data with each node data at the centre to generate multiple data groups, each data group comprising the synthesised node identifier of a current node and edge data connected to the current node (103); sending a single node identifier list of each node to all of the adjacent nodes, the single node identifier list being used for storing the synthesised node identifiers of the adjacent nodes (104); and, on the basis of the single node identifier list received by each node, generating the second-degree relationships of each node (105).

Description

Full relationship calculation method, device, equipment and storage medium based on graph data

This application claims priority to Chinese patent application No. 201910751784.4, filed with the Chinese Patent Office on August 15, 2019 and entitled "Method, device, equipment and storage medium for calculating the full relationship based on graph data", the entire content of which is incorporated in this application by reference.

Technical field

This application relates to the field of big data technology, and in particular to a method, apparatus, device, and storage medium for calculating a full relationship based on graph data.

Background technique

Graph data mining is an important method in relationship mining and group profiling. Graph data is composed of nodes and edges. The nodes in the graph data are used to represent the connected subjects, and the edges in the graph data are used to represent the association between the subjects. A node is associated with other nodes through the edges connected to it. One of the typical applications in graph calculation is to find the full relationship of a certain node and analyze the statistical characteristics. Among them, the calculation of the second-degree relationship and the third-degree relationship has become a difficult point in the graph calculation due to the large amount of calculation and computational resource consumption.

The typical environment for graph computing at present is GraphX, part of the Spark project open-sourced by the Apache Software Foundation. GraphX uses an in-memory computing strategy to achieve fast iterative calculations; however, the inventors realized that in-memory computing consumes a huge amount of memory and struggles to support the processing of massive data. For example, when calculating indicators for second-degree and third-degree associated nodes, running GraphX on a graph of 50 million nodes and 400 million edges consumes 2000 GB of memory, so the calculation efficiency of the full relationship is low. Such efficiency cannot meet the needs of social network applications with hundreds of millions to billions of nodes.

Summary of the invention

This application provides a method, apparatus, device, and storage medium for calculating a full relationship based on graph data, which merge node attributes into node identifiers using bit operations to avoid duplication of node data, reduce memory resource consumption, and improve computing efficiency.

The first aspect of the embodiments of the present application provides a full relationship calculation method based on graph data, including: obtaining preprocessed graph data, the preprocessed graph data including node data and edge data of each node; performing a bit operation on the node data to generate a synthetic node identifier of each node; dividing the node data and the edge data with each node data as the center to generate multiple data groups, each data group including the synthetic node identifier of a current node and the edge data connected to the current node; sending a single node identifier list of each node to all adjacent nodes, the single node identifier list being used to store the synthetic node identifiers of the adjacent nodes; and generating the second-degree relationship of each node according to the single node identifier list received by each node.

A second aspect of the embodiments of the present application provides a full relationship calculation device based on graph data, including: a first obtaining unit configured to obtain preprocessed graph data, the preprocessed graph data including node data and edge data of each node; an operation generating unit configured to perform a bit operation on the node data to generate a synthetic node identifier of each node; a division generating unit configured to divide the node data and the edge data with each node data as the center to generate multiple data groups, each data group including the synthetic node identifier of a current node and the edge data connected to the current node; a first sending unit configured to send a single node identifier list of each node to all adjacent nodes, the single node identifier list being used to store the synthetic node identifiers of the adjacent nodes; and a first generating unit configured to generate the second-degree relationship of each node according to the single node identifier list received by each node.

The third aspect of the embodiments of the present application provides a computer device, including: one or more processors; a memory; and one or more computer programs, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, and the one or more computer programs are configured to execute a full relationship calculation method based on graph data, wherein the full relationship calculation method based on graph data includes:

Acquiring preprocessed graph data, the preprocessed graph data including node data and edge data of each node;

Performing bit operations on the node data to generate a synthetic node identifier for each node;

Dividing the node data and the edge data with each node data as the center to generate a plurality of data groups, each data group including a synthetic node identifier of the current node and the edge data connected to the current node;

Sending a single node identification list of each node to all adjacent nodes, where the single node identification list is used to store the composite node identification of the adjacent nodes;

According to the single node identification list received by each node, the second-degree relationship of each node is generated.

A fourth aspect of the embodiments of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements a method for calculating a full relationship based on graph data, wherein the method includes the same steps as those listed for the third aspect above.

In the technical solution provided by the embodiments of the present application, preprocessed graph data is obtained, the preprocessed graph data including node data and edge data of each node; bit operations are performed on the node data to generate a synthetic node identifier for each node; the node data and the edge data are divided with each node data as the center to generate multiple data groups, each data group including the synthetic node identifier of a current node and the edge data connected to the current node; the single node identifier list of each node is sent to all adjacent nodes, the single node identifier list being used to store the synthetic node identifiers of the adjacent nodes; and the second-degree relationship of each node is generated according to the single node identifier list received by each node. Bit operations merge node attributes into node identifiers and self-connections are eliminated, which avoids node data duplication, reduces memory resource consumption, and improves calculation efficiency.

Description of the drawings

FIG. 1 is a schematic diagram of an embodiment of a method for calculating a full relationship based on graph data in an embodiment of the application;

FIG. 2 is a schematic diagram of another embodiment of a method for calculating a full relationship based on graph data in an embodiment of the application;

FIG. 3 is a schematic diagram of an embodiment of a full relationship calculation device based on graph data in an embodiment of the application;

FIG. 4 is a schematic diagram of another embodiment of a full relationship calculation device based on graph data in an embodiment of the application;

FIG. 5 is a schematic diagram of an embodiment of a full relationship calculation device based on graph data in an embodiment of the application.

Detailed description

Please refer to Fig. 1, a flowchart of a method for calculating a full relationship based on graph data provided by an embodiment of the present application, which specifically includes:

101. Obtain preprocessed graph data, where the preprocessed graph data includes node data and edge data of each node.

The server obtains the preprocessed graph data, and the preprocessed graph data includes node data and edge data of each node. Specifically, the server loads the preprocessed graph data through Spark. The preprocessed graph data is composed of node data and edge data. The node data is used to indicate the connected subjects, and the edge data is used to indicate the associations between the subjects. A node is related to other nodes through the edges connected to it. The node data of a target node includes the target node attribute and a target node identifier, and the target edge data includes a target edge identifier and the two node identifiers connected by the target edge.

The node attribute contains several items of label data for the node. For example, the node label data may include an ID number, a mobile phone number, and three Boolean variables A, B, and C. Variable A indicates the gender of the user, where 1 indicates male and 0 indicates female; variable B indicates whether the user is dishonest, where 1 indicates dishonest and 0 indicates not dishonest; variable C indicates whether the user has a university degree, where 1 indicates a university degree and 0 indicates no university degree.

It should be noted that, in this embodiment of the application, the preprocessed graph data means that a large amount of original graph data has been deduplicated, and the graph data that meets the requirements have been selected.

It is understandable that the execution subject of this application may be a full-scale relational computing device based on graph data, or may be a terminal or a server, which is not specifically limited here. The embodiment of the present application takes the server as the execution subject as an example for description.

102. Perform a bit operation on the node data to generate a synthetic node identifier for each node.

The server performs bit operations on the node data to generate a synthetic node identifier for each node. Specifically, the server determines the multiple nodes in the node data; the server obtains the node attributes of each node and the initial node identifier corresponding to each node; the server obtains preset rules; the server performs bit operations on the node attributes and the initial node identifier of each node according to the preset rules; and the server generates a synthetic node identifier for each node. The preset rules include: (1) the total number of storage bits for each node identifier; and (2) the starting and ending ordinal numbers of the storage bits occupied by each variable. In this embodiment, storing the ID number, the mobile phone number, and the aforementioned three Boolean variables A, B, and C is taken as an example for description.

For example, the server calculates that the ID number occupies 61 bits, the mobile phone number occupies 37 bits, and the 3 Boolean variables occupy 3 bits in total, giving 61+37+3=101 bits overall. In this case, the preset rules are: (1) each node is allocated 101 bits of storage; (2) bits 1 to 61 store the ID number, bits 62 to 98 store the mobile phone number, bit 99 stores variable A, bit 100 stores variable B, and bit 101 stores variable C. By using bit storage instead of allocating a separate double-integer storage space to each variable, the server saves a large amount of memory.
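The 101-bit layout above can be illustrated with a minimal Python sketch. The field names, the `pack_node_id` helper, the sample values, and the choice of treating bit 1 as the most significant bit are all assumptions made for illustration; the source does not specify them. Python integers are arbitrary-precision, so 101 bits fit in one value here, whereas a Spark job on the JVM would need something wider than a single 64-bit long.

```python
# Hypothetical field layout following the 101-bit example in the text.
# Each entry is (name, width in bits). Fields are laid out in this order,
# with "bit 1" of the description taken as the most significant bit (an assumption).
FIELDS = [("id_number", 61), ("phone", 37), ("a", 1), ("b", 1), ("c", 1)]
TOTAL_BITS = sum(width for _, width in FIELDS)  # 61 + 37 + 3 = 101


def pack_node_id(values: dict) -> int:
    """Merge all attribute values into one integer using shifts and ORs."""
    packed = 0
    for name, width in FIELDS:
        value = int(values[name])
        if value < 0 or value >= (1 << width):
            raise ValueError(f"{name}={value} does not fit in {width} bits")
        # Make room for the next field, then OR its bits into the low end.
        packed = (packed << width) | value
    return packed


# Made-up sample values, for illustration only.
node = {"id_number": 110101199003071234, "phone": 13800138000, "a": 1, "b": 0, "c": 1}
synthetic_id = pack_node_id(node)
```

The range check matters in practice: a value wider than its field would silently overwrite the neighbouring field if it were ORed in unchecked.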

It should be noted that the initial node identifier is collected by the user in the preprocessing stage of the graph data and stored on the external storage (hard disk) of the computer. After the calculation starts, the server loads the initial node identifier into internal memory. The initial node identifier is part of the original data, and its collection takes place before the bit operation.

103. Divide the node data and the edge data with each node data as the center to generate multiple data groups, each data group including a synthetic node identifier of the current node and edge data connected to the current node.

The server divides the node data and the edge data with each node data as the center, and generates multiple data groups. Each data group includes a synthetic node identifier of the current node and the edge data connected to the current node.

It should be noted that the graph data contains node data and edge data. After being loaded into memory, the graph data needs to be divided into small processing units for distributed calculation; these units are referred to here as "data groups". Building node-centric data groups ensures that each node has only one copy and appears only once, which avoids replicating node data multiple times.

For example, a data group contains the synthetic node identifier of a node and all the edge data on that node. Each item of edge data contains the synthetic node identifiers of the node and of the other node connected to it, but does not include any attribute data of the connected node.
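A node-centric data group of this kind could look like the following Python sketch. The function name `build_data_groups`, the dictionary representation, and the treatment of edges as undirected pairs of identifiers are assumptions made for illustration; short letters stand in for real synthetic node identifiers.

```python
def build_data_groups(node_ids, edges):
    """Return one group per node: {node_id: [(node_id, neighbour_id), ...]}.

    Only identifiers are stored on each edge; no attribute data of the
    connected node is copied into the group.
    """
    groups = {node_id: [] for node_id in node_ids}
    for u, v in edges:
        # Assumption: edges are undirected, so each edge is recorded
        # in the groups of both of its endpoints.
        groups[u].append((u, v))
        groups[v].append((v, u))
    return groups


node_ids = ["a", "b", "c", "d"]
edges = [("a", "b"), ("a", "c"), ("a", "d")]
groups = build_data_groups(node_ids, edges)
```

Each node appears exactly once as a key, which is the property the text relies on to avoid duplicated node data.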

104. Send the single node identification list of each node to all adjacent nodes, and the single node identification list is used to store the composite node identification of the adjacent nodes.

The server sends the single node identification list of each node to all adjacent nodes, and the single node identification list is used to store the composite node identification of each node's adjacent nodes.

It should be noted that the single node identifier list is collected from the edge data in the data group. Because each item of edge data contains the synthetic node identifiers of the current node and of a first-degree associated node connected to the current node, all connected nodes can be collected by traversing all the edge data. The single node identifier list contains only the synthetic node identifiers of adjacent nodes.

For example, suppose node a is adjacent to nodes b, c, and d. First, the identifier list of all adjacent nodes (that is, the single node identifier list) is collected, where the node identifiers are the four letters a, b, c, and d. (This is a simplified example; real graphs contain hundreds of millions of nodes and need more complex node identifiers. For example, when a device is a node, production information of the device, such as its production date and serial number, can be used as the node identifier.) For node a, the resulting node identifier list is [b, c, d]; node a then passes the list [b, c, d] to each of the three nodes b, c, and d.
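The collect-and-send step in this example can be sketched in plain Python as follows. The helper names `neighbor_lists` and `send_to_neighbors` and the in-memory "inbox" dictionary are hypothetical stand-ins for whatever message passing the distributed job actually uses; edges are assumed to be undirected.

```python
def neighbor_lists(edges):
    """Build the single node identifier list (adjacent node ids) of every node."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def send_to_neighbors(lists):
    """Deliver each node's list to all of its adjacent nodes.

    Returns {receiver: [(sender, list_sent_by_sender), ...]}.
    """
    inbox = {node: [] for node in lists}
    for sender, lst in lists.items():
        for receiver in lst:
            inbox[receiver].append((sender, list(lst)))
    return inbox


edges = [("a", "b"), ("a", "c"), ("a", "d")]
lists = neighbor_lists(edges)
inbox = send_to_neighbors(lists)
# Node b has received the list [b, c, d] from node a.
```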

105. According to the single node identification list received by each node, generate the second-degree relationship of each node.

The server generates the second-degree relationship of each node according to the single node identifier list received by each node. Specifically, the server receives the single node identifier lists of each node; the server determines each node's own synthetic node identifier; the server deletes, from the single node identifier lists received by each node, any node identifier identical to that node's own synthetic node identifier; and the server generates the second-degree relationship of each node, where a second-degree relationship indicates that one first-degree associated node lies between the second-degree associated node and the current node.

It should be noted that a node's own synthetic node identifier belongs to a zero-degree relationship, not a second-degree relationship. Deleting the synthetic node identifier identical to the node's own from the list therefore excludes the node itself. For example, in the scenario described above, node a is connected to nodes b, c, and d, and nodes b, c, and d are not directly connected to each other. After node b receives the list [b, c, d] sent by node a, it deletes b from the list, leaving the two nodes [c, d]. These two nodes are the second-degree relationship of b.

It can be understood that a first-degree relationship means that two nodes are directly connected, that is, they are adjacent nodes; a second-degree relationship means that two nodes are separated by one node. For example, in the above example, node b and node c are separated by node a, so node b and node c have a second-degree relationship.
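As a rough illustration of step 105, the sketch below computes second-degree relationships for the a, b, c, d example. The function name `second_degree` and the use of Python sets are assumptions; the code runs on a single machine and only mimics the message exchange described in the text.

```python
def second_degree(edges):
    """Return {node: set of second-degree associated nodes}."""
    # Single node identifier list of every node (edges assumed undirected).
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    result = {node: set() for node in adj}
    for sender, sent_list in adj.items():
        for receiver in sent_list:
            # The receiver gets the sender's list and deletes its own identifier.
            result[receiver] |= sent_list - {receiver}
    return result


relations = second_degree([("a", "b"), ("a", "c"), ("a", "d")])
# relations["b"] is {"c", "d"}: b reaches c and d through a.
```

Following the text, this step only removes the receiving node's own identifier. If two neighbours of a node were also directly connected to each other, they would still appear in each other's result here; filtering out first-degree neighbours would be an additional step.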

Optionally, after step 105, the method further includes: the server obtains the node attributes of each node according to the second-degree relationship of each node, and performs statistical analysis according to the node attributes of each node to generate an analysis result. Specifically, the server reads the second-degree relationship of each node; the server determines the node attributes of each node from the second-degree relationship; the server separates the node attributes of each node from the synthetic node identifier according to the preset rules; and the server performs statistical analysis on the node attributes of each node to generate the analysis result.

In the embodiment of the present application, by dividing the minimum processing unit with node data as the center, a large amount of repeated node data is avoided, memory resources occupied by node data are reduced, a large amount of computing resources are saved, and computing efficiency is improved.

Please refer to FIG. 2, another flowchart of the full relationship calculation method based on graph data provided by the embodiment of the present application, which specifically includes:

201. Obtain preprocessed graph data. The preprocessed graph data includes node data and edge data of each node.

The server obtains the preprocessed graph data, and the preprocessed graph data includes the node data and edge data of each node. Specifically, the server loads the preprocessed graph data through Spark. The preprocessed graph data is composed of node data and edge data. The node data is used to indicate the connected subjects, and the edge data is used to indicate the associations between the subjects. A node is related to other nodes through the edges connected to it. The node data of a target node includes the target node attribute and a target node identifier, and the target edge data includes a target edge identifier and the two node identifiers connected by the target edge.

202. Perform a bit operation on the node data to generate a synthetic node identifier for each node.

203. Divide the node data and the edge data with each node data as the center, and generate multiple data groups, each data group including a synthetic node identifier of the current node and the edge data connected to the current node.

The server divides the node data and edge data with each node data as the center, and generates multiple data groups. Each data group includes a synthetic node identifier of the current node and the edge data connected to the current node.

204. Send the single node identification list of each node to all adjacent nodes, where the single node identification list is used to store the composite node identification of the adjacent nodes.

The server sends the single node identification list of each node to all adjacent nodes, and the single node identification list is used to store the composite node identification of the adjacent nodes of each node.

205. According to the single node identification list received by each node, generate a second-degree relationship of each node.

It can be understood that a first-degree relationship means that two nodes are directly connected, that is, they are adjacent nodes; a second-degree relationship means that two nodes are separated by one node. For example, if node b and node c are separated by node a, then node b and node c have a second-degree relationship.

206. Generate a second-degree relationship identifier list according to the second-degree relationship of each node, where the second-degree relationship identifier list is used to store the second-degree relationship of each node.

The server generates a second-degree relationship identifier list according to the second-degree relationship of each node, and the second-degree relationship identifier list is used to store the second-degree relationship of each node.

207. Send the second-degree relationship identifier list of each node to all adjacent nodes.

The server sends the second-degree relationship identifier list of each node to all neighboring nodes, and the second-degree relationship identifier list is used to store the synthetic node identifiers of the second-degree associated nodes of each node.

208. Generate a third-degree relationship identifier list for each node according to the second-degree relationship identifier lists received by each node from its neighboring nodes. The third-degree relationship identifier list is used to store the third-degree relationship of each node, and a third-degree relationship indicates that a first-degree associated node and a second-degree associated node lie between the third-degree associated node and the current node.

The server generates a third-degree relationship identifier list for each node according to the second-degree relationship identifier lists that the node receives from its neighboring nodes. The third-degree relationship identifier list is used to store the third-degree relationship of each node, and a third-degree relationship indicates that a first-degree associated node and a second-degree associated node lie between the third-degree associated node and the current node. Specifically, the server obtains the second-degree relationship identifier lists received by each node; the server determines each node's own synthetic node identifier; the server deletes, from the second-degree relationship identifier lists received by each node, any node identifier identical to that node's own synthetic node identifier; the server determines the third-degree relationship of each node, in which two nodes (one first-degree associated node and one second-degree associated node) separate the third-degree associated node from the current node; and the server generates the third-degree relationship identifier list, which stores the third-degree relationship of each node.

It should be noted that a node's own synthetic node identifier belongs to a zero-degree relationship, not a third-degree relationship, so deleting the node identifier identical to its own from the list excludes the node itself. First-degree and second-degree relationships must also be excluded from the third-degree relationship list. For example, suppose node a is connected to nodes b, c, and d, node e is connected to node b, and nodes b, c, and d are not directly connected to each other. Node b receives the list [b, c, d] sent by node a and deletes b from it, leaving the two nodes [c, d], which are the second-degree relationship of b. Node b then sends the list [c, d] to node e, which yields the third-degree relationship list of node e: node e forms a third-degree relationship with node c and with node d. Node e's own synthetic node identifier does not appear in this list, so nothing has to be removed. Node b also sends the list [c, d] to node a; however, since c and d already appear in the first-degree relationship list [b, c, d] of node a, they are excluded and do not form a third-degree relationship with a.
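The two rounds of list passing in this example can be sketched in Python as below. The helper names `build_adjacency`, `propagate`, and `degree_relations` are hypothetical, edges are assumed undirected, and the code runs on one machine rather than on Spark, so it illustrates the filtering logic only.

```python
def build_adjacency(edges):
    """First-degree relationship (adjacent node ids) of every node."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def propagate(adj, lists):
    """Each node sends lists[node] to all neighbours; return the union received."""
    received = {node: set() for node in adj}
    for sender, neighbours in adj.items():
        for receiver in neighbours:
            received[receiver] |= lists[sender]
    return received


def degree_relations(edges):
    adj = build_adjacency(edges)
    # Round 1: neighbours exchange first-degree lists; each node drops itself.
    second = {n: ids - {n} for n, ids in propagate(adj, adj).items()}
    # Round 2: neighbours exchange second-degree lists; each node drops itself
    # and anything already in its first-degree or second-degree relationship.
    third = {n: ids - {n} - adj[n] - second[n]
             for n, ids in propagate(adj, second).items()}
    return second, third


edges = [("a", "b"), ("a", "c"), ("a", "d"), ("b", "e")]
second, third = degree_relations(edges)
# third["e"] is {"c", "d"} and third["a"] is empty, as in the example.
```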

Optionally, after step 208, the method further includes: the server obtains the node attributes of each node according to the third-degree relationship identifier list of each node, and performs statistical analysis according to the node attributes of each node. Specifically, the server separates the node attributes from the synthetic node identifier, that is, it reads the node attributes back out of the synthetic node identifier. This corresponds to the earlier data reading process and follows the same preset rules for node attributes. In the aforementioned example, 101 bits store the ID number, the mobile phone number, and the three Boolean variables A, B, and C. After the calculation is completed, the server reads bits 1 to 61 as the ID number, bits 62 to 98 as the mobile phone number, bit 99 as variable A, bit 100 as variable B, and bit 101 as variable C, which yields the node attributes.
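Reading the attributes back out of a synthetic node identifier can be sketched as follows. As before, the field names, the `unpack_node_id` helper, the sample values, and the convention that bit 1 is the most significant bit are illustrative assumptions rather than details given in the source.

```python
# Hypothetical layout matching the 101-bit example: (name, width in bits),
# listed from bit 1 (taken here as most significant) to bit 101.
FIELDS = [("id_number", 61), ("phone", 37), ("a", 1), ("b", 1), ("c", 1)]


def unpack_node_id(packed: int) -> dict:
    """Split a synthetic node identifier back into its attribute values."""
    values = {}
    # Walk the fields from the least significant end: mask off the low
    # `width` bits, then shift them away to expose the next field.
    for name, width in reversed(FIELDS):
        values[name] = packed & ((1 << width) - 1)
        packed >>= width
    return values


# Build a sample identifier by hand (made-up values, for illustration only).
sample = 110101199003071234          # ID number, bits 1 to 61
sample = (sample << 37) | 13800138000  # phone, bits 62 to 98
sample = (sample << 1) | 1           # variable A, bit 99
sample = (sample << 1) | 0           # variable B, bit 100
sample = (sample << 1) | 1           # variable C, bit 101

attributes = unpack_node_id(sample)
```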

In the embodiments of the present application, the separated node attributes are statistically analyzed according to business needs, which improves the calculation efficiency. For example, count all the friends worth recommending in the three-degree relationship, or count all the users with good credit, and so on.
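One such statistic, counting the users with good credit in a relationship list, might be computed directly from the synthetic node identifiers. The sketch below assumes that variables A, B, and C occupy the three lowest bits of the identifier with C lowest, and that "good credit" simply means variable B equals 0; both are assumptions for illustration, as is the function name `count_good_credit`.

```python
B_SHIFT = 1  # assumed position of variable B: second-lowest bit (C is lowest)


def count_good_credit(relation_ids):
    """Count identifiers whose variable B bit is 0 (not dishonest)."""
    return sum(1 for sid in relation_ids if (sid >> B_SHIFT) & 1 == 0)


# Toy identifiers carrying only the A, B, C bits, written as 0bABC.
third_degree_ids = [0b101, 0b111, 0b000, 0b010]
good = count_good_credit(third_degree_ids)
```

Because the flag is read with a shift and a mask, no separate attribute table has to be joined in for this kind of count.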

Optionally, before step 201, the method further includes: the server obtains the original graph data of each node; the server performs deduplication and verification processing on the original graph data; and the server generates preprocessed graph data that meets the requirements.
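The source does not say what the deduplication and verification consist of. The following Python sketch shows one plausible reading under stated assumptions: duplicate nodes and duplicate undirected edges are dropped, and an edge is rejected if it is a self-loop or refers to an unknown node. The function name `preprocess` and these specific rules are hypothetical.

```python
def preprocess(node_ids, edges):
    """Deduplicate nodes and edges, and drop edges that fail basic checks."""
    nodes = list(dict.fromkeys(node_ids))  # remove duplicates, keep order
    known = set(nodes)
    seen = set()
    clean_edges = []
    for u, v in edges:
        # Assumption: edges are undirected, so (u, v) and (v, u) are the same edge.
        key = (u, v) if u <= v else (v, u)
        if u == v or u not in known or v not in known or key in seen:
            continue
        seen.add(key)
        clean_edges.append(key)
    return nodes, clean_edges


raw_nodes = ["a", "b", "a", "c"]
raw_edges = [("a", "b"), ("b", "a"), ("a", "a"), ("a", "z"), ("c", "a")]
nodes, clean_edges = preprocess(raw_nodes, raw_edges)
```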

It should be noted that the embodiments of this application operate purely on node-centric lists, which support all Spark optimizations for resilient distributed datasets (RDDs), such as memory parameters, storage methods, and map calculation strategies on RDDs.

The above describes the full relationship calculation method based on graph data in the embodiments of this application. The following describes the full relationship calculation device based on graph data in the embodiments of this application. Referring to FIG. 3, an embodiment of the full relationship calculation device based on graph data includes:

The first obtaining unit 301 is configured to obtain preprocessed graph data, where the preprocessed graph data includes node data and edge data of each node;

The operation generating unit 302 is configured to perform a bit operation on the node data to generate a synthetic node identifier of each node;

The dividing and generating unit 303 is configured to divide the node data and the edge data with each node data as the center to generate multiple data groups, each data group including the synthetic node identifier of a current node and the edge data connected to the current node;

The first sending unit 304 is configured to send a single node identification list of each node to all adjacent nodes, and the single node identification list is used to store the composite node identification of the adjacent nodes;

The first generating unit 305 is configured to generate the second degree relationship of each node according to the single node identification list received by each node.

Please refer to FIG. 4, another embodiment of the full relation calculation device based on graph data in the embodiment of the present application includes:

Optionally, the operation generating unit 302 is specifically configured to:

Determine multiple nodes in the node data; obtain the node attributes of each node and the initial node identifier corresponding to each node; obtain preset rules, the preset rules including the total number of storage bits for each node identifier and the starting and ending ordinal numbers of the storage bits occupied by each variable; perform bit operations on the node attributes and the initial node identifier of each node according to the total number of storage bits for each node identifier and the starting and ending ordinal numbers of the storage bits occupied by each variable; and generate the synthetic node identifier of each node.

Optionally, the full relationship calculation device based on graph data further includes:

The statistical analysis unit 306 is configured to obtain the node attribute of each node according to the second-degree relationship of each node, and perform statistical analysis according to the node attribute of each node to generate an analysis result.

Optionally, the statistical analysis unit 306 is specifically used to:

Read the second-degree relationship of each node; determine the node attributes of each node from the second-degree relationship; separate the node attributes of each node from the synthetic node identifier according to the preset rules; and perform statistical analysis on the node attributes of each node to generate analysis results.

Optionally, the first generating unit 305 is specifically configured to:

Receive the single node identifier list of each node; determine the connection status of each node; delete, from the received single node identifier list, the synthetic node identifier identical to the node's own; and generate the second-degree relationship of each node, where the second-degree relationship indicates that one first-degree associated node lies between the second-degree associated node and the current node.

The second generating unit 307 is configured to generate a second-degree relationship identifier list according to the second-degree relationship of each node, and the second-degree relationship identifier list is used to store the second-degree relationship of each node;

The second sending unit 308 is configured to send the second-degree relationship identifier list of each node to all adjacent nodes;

The third generating unit 309 is configured to generate a third-degree relationship identifier list of each node according to the second-degree relationship identifier lists received by each node from its neighboring nodes, where the third-degree relationship identifier list is used to store the third-degree relationship of each node, and the third-degree relationship indicates that a first-degree associated node and a second-degree associated node lie between the third-degree associated node and the current node.

The second obtaining unit 310 is configured to obtain the original graph data of each node;

The processing unit 311 is configured to perform deduplication processing and verification processing on the original graph data;

The fourth generating unit 312 is configured to generate preprocessed graph data that meets the requirements.
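A minimal sketch of such preprocessing is given below. The text does not say what the verification checks, so the rules used here (drop self-loops and edges whose endpoints are missing from the node data, and treat edges as undirected when deduplicating) are assumptions made for illustration, as is the function name `preprocess`.

```python
from typing import Iterable, List, Set, Tuple


def preprocess(
    nodes: Iterable[int], edges: Iterable[Tuple[int, int]]
) -> Tuple[Set[int], List[Tuple[int, int]]]:
    """Return deduplicated nodes and verified, deduplicated edges."""
    unique_nodes = set(nodes)  # deduplicate node data
    seen: Set[Tuple[int, int]] = set()
    clean_edges: List[Tuple[int, int]] = []
    for src, dst in edges:
        # Assumed verification: reject self-loops and dangling endpoints.
        if src == dst or src not in unique_nodes or dst not in unique_nodes:
            continue
        key = (min(src, dst), max(src, dst))  # undirected edge key
        if key in seen:
            continue  # duplicate edge
        seen.add(key)
        clean_edges.append(key)
    return unique_nodes, clean_edges


clean_nodes, clean_edges = preprocess(
    [1, 2, 2, 3], [(1, 2), (2, 1), (2, 3), (3, 3), (3, 9)]
)
```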

Figures 3 and 4 above describe in detail the full relationship calculation device based on graph data in the embodiments of the present application from the perspective of modular functional entities. The following describes the full relationship calculation device based on graph data in the embodiments of the present application in detail from the perspective of hardware processing.

FIG. 5 is a schematic structural diagram of a full relationship computing device based on graph data provided by an embodiment of the present application. The full relationship computing device 500 based on graph data may vary considerably depending on its configuration or performance, and may include one or more processors (central processing units, CPU) 501, a memory 509, and one or more storage media 508 (for example, one or more mass storage devices) storing application programs 507 or data 506. The memory 509 and the storage medium 508 may be short-term storage or persistent storage. The program stored in the storage medium 508 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the full relationship computing device based on graph data. Further, the processor 501 may be configured to communicate with the storage medium 508 and execute, on the full relationship computing device 500 based on graph data, the series of instruction operations in the storage medium 508.

The full relationship computing device 500 based on graph data may also include one or more power supplies 502, one or more wired or wireless network interfaces 503, one or more input/output interfaces 504, and/or one or more operating systems 505, such as Windows Server, Mac OS X, Unix, Linux, or FreeBSD. Those skilled in the art can understand that the structure shown in FIG. 5 does not constitute a limitation on the full relationship computing device based on graph data, which may include more or fewer components than shown in the figure, combine certain components, or use a different arrangement of components. The processor 501 can execute the functions of the first acquisition unit 301, the operation generation unit 302, the division generation unit 303, the first generation unit 305, the statistical analysis unit 306, the second generation unit 307, the third generation unit 309, the second acquisition unit 310, the processing unit 311, and the fourth generation unit 312 in the above embodiments.

In the following, the components of the full relationship computing device based on graph data are described in detail with reference to FIG. 5:

The processor 501 is the control center of the full relationship computing device based on graph data, and can perform processing in accordance with the configured full relationship calculation method based on graph data. The processor 501 uses various interfaces and lines to connect the various parts of the entire device, and performs the various functions of the device and processes data by running or executing software programs and/or modules stored in the memory 509 and calling data stored in the memory 509. Dividing the data into minimum processing units centered on node data avoids generating a large amount of repeated node data, reduces the memory resources occupied by node data, and saves computing resources, thereby improving computing efficiency. The storage medium 508 and the memory 509 are both carriers for storing data. In the embodiment of the present application, the storage medium 508 may refer to an internal memory with a small storage capacity but high speed, and the memory 509 may be an external memory with a large storage capacity but a slow storage speed.

The memory 509 may be used to store software programs and modules. The processor 501 executes the various functional applications and data processing of the full relationship computing device 500 based on graph data by running the software programs and modules stored in the memory 509. The memory 509 may mainly include a program storage area and a data storage area. The program storage area may store an operating system and at least one application program required by a function (for example, sending a single node identifier list of each node to all adjacent nodes, the single node identifier list being used to store the synthetic node identifiers of the adjacent nodes); the data storage area may store data created according to the use of the full relationship computing device based on graph data (such as the synthetic node identifier of each node). In addition, the memory 509 may include a high-speed random access memory, and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or other volatile solid-state storage devices. The program for the full relationship calculation method based on graph data and the received data stream provided in the embodiment of the present application are stored in the memory 509, and the processor 501 calls them from the memory 509 when needed.

When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium, the storage medium being a volatile storage medium or a non-volatile storage medium. For example, the computer instructions can be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (such as coaxial cable, optical fiber, or twisted pair) or wireless (such as infrared, radio, or microwave) means. The computer-readable storage medium may be any available medium that a computer can store data on, or a data storage device, such as a server or data center, that integrates one or more available media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, an optical disc), or a semiconductor medium (for example, a solid state disk (SSD)).

Those skilled in the art can clearly understand that, for the convenience and conciseness of description, the specific working process of the above-described system, device, and unit can refer to the corresponding process in the foregoing method embodiment, which will not be repeated here.

In the several embodiments provided in this application, it should be understood that the disclosed system, device, and method may be implemented in other ways. For example, the device embodiments described above are only illustrative. The division of the units is only a division by logical function, and there may be other divisions in actual implementation; for example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not implemented. In addition, the displayed or discussed mutual coupling, direct coupling, or communication connection may be indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.

The units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units, that is, they may be located in one place, or they may be distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.

In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The above-mentioned integrated unit can be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this application in essence, or the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to make a computer device (which can be a personal computer, a server, a network device, etc.) execute all or part of the steps of the method described in each embodiment of the present application. The aforementioned storage media include a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and other media that can store program code.

As mentioned above, the above embodiments are only used to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments can still be modified, or some of their technical features can be equivalently replaced, and that these modifications or replacements do not cause the essence of the corresponding technical solutions to deviate from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims

A method for calculating a full relationship based on graph data, comprising:

Acquiring preprocessed graph data, the preprocessed graph data including node data and edge data of each node;

Performing bit operations on the node data to generate a synthetic node identifier for each node;

Dividing the node data and the edge data with each node data as the center to generate a plurality of data groups, each data group including a synthetic node identifier of the current node and the edge data connected to the current node;

Sending a single node identification list of each node to all adjacent nodes, where the single node identification list is used to store the synthetic node identifiers of the adjacent nodes;

Generating the second-degree relationship of each node according to the single node identification list received by each node.
The method for calculating a full relationship based on graph data according to claim 1, wherein said performing bit operation on said node data to generate a synthetic node identifier for each node comprises:

Determine multiple nodes in the node data;

Obtain the node attributes of each node and the initial node ID corresponding to each node;

Acquiring a preset rule, the preset rule including a total number of storage bits for each node identifier and the start and end sequence numbers of the storage bits occupied by each variable;

Performing a bit operation on the node attributes of each node and the initial node identifier according to the total number of storage bits for each node identifier and the start and end sequence numbers of the storage bits occupied by each variable;

Generate a synthetic node ID for each node.
The method for calculating a full relationship based on graph data according to claim 1, wherein, after said generating the second-degree relationship of each node according to the single node identification list received by each node, the method further comprises:

Obtaining the node attributes of each node according to the second-degree relationship of each node, and performing statistical analysis according to the node attributes of each node to generate an analysis result.
The method for calculating a full relationship based on graph data according to claim 3, wherein said obtaining the node attributes of each node according to the second-degree relationship of each node, and performing statistical analysis according to the node attributes of each node to generate an analysis result comprises:

Read the second-degree relationship of each node;

Determine the node attribute of each node from the second-degree relationship;

Separating the node attribute of each node from the synthetic node identifier according to a preset rule;

Perform statistical analysis on the node attributes of each node to generate analysis results.
The method for calculating a full relationship based on graph data according to claim 1, wherein said generating the second-degree relationship of each node according to the single node identification list received by each node comprises:

Receive a single node identification list of each node;

Determine the synthetic node ID of each node;

Respectively delete the node ID that is the same as its own synthetic node ID from the single node ID list received by each node;

Generate a second-degree relationship for each node, where the second-degree relationship indicates that a second-degree associated node is separated from the current node by one first-degree associated node.
The method for calculating a full relationship based on graph data according to claim 1, wherein, after said generating the second-degree relationship of each node according to the single node identification list received by each node, the method further comprises:

Generating a second-degree relationship identifier list according to the second-degree relationship of each node, where the second-degree relationship identifier list is used to store the second-degree relationship of each node;

Sending the second-degree relationship identifier list of each node to all adjacent nodes;

Generating a third-degree relationship identifier list for each node according to the second-degree relationship identifier lists that each node receives from its adjacent nodes, where the third-degree relationship identifier list is used to store the third-degree relationship of each node, and the third-degree relationship indicates that a third-degree associated node is separated from the current node by one first-degree associated node and one second-degree associated node.
The method for calculating a full relationship based on graph data according to any one of claims 1-6, wherein, before said acquiring preprocessed graph data, the preprocessed graph data including node data and edge data of each node, the method further comprises:

Obtaining the original graph data of each node;

Performing deduplication processing and verification processing on the original graph data;

Generating preprocessed graph data that meets the requirements.
A device for calculating a full relationship based on graph data, comprising:

A first acquiring unit, configured to acquire preprocessed graph data, where the preprocessed graph data includes node data and edge data of each node;

An operation generating unit, configured to perform bit operations on the node data to generate a synthetic node identifier for each node;

A dividing and generating unit, configured to divide the node data and the edge data with each node data as the center to generate a plurality of data groups, each data group including the synthetic node identifier of the current node and the edge data connected to the current node;

A first sending unit, configured to send a single node identification list of each node to all adjacent nodes, where the single node identification list is used to store the synthetic node identifiers of the adjacent nodes;

A first generating unit, configured to generate the second-degree relationship of each node according to the single node identification list received by each node.
A computer device, which includes:

One or more processors;

A memory; and

One or more computer programs, wherein the one or more computer programs are stored in the memory and configured to be executed by the one or more processors, and the one or more computer programs are configured to execute a method for calculating a full relationship based on graph data, wherein the method for calculating a full relationship based on graph data includes:

Acquiring preprocessed graph data, the preprocessed graph data including node data and edge data of each node;

Performing bit operations on the node data to generate a synthetic node identifier for each node;

Dividing the node data and the edge data with each node data as the center to generate a plurality of data groups, each data group including a synthetic node identifier of the current node and the edge data connected to the current node;

Sending a single node identification list of each node to all adjacent nodes, where the single node identification list is used to store the synthetic node identifiers of the adjacent nodes;

Generating the second-degree relationship of each node according to the single node identification list received by each node.
The computer device according to claim 9, wherein said performing a bit operation on said node data to generate a synthetic node identifier of each node comprises:

Determine multiple nodes in the node data;

Obtain the node attributes of each node and the initial node ID corresponding to each node;

Acquiring a preset rule, the preset rule including a total number of storage bits for each node identifier and the start and end sequence numbers of the storage bits occupied by each variable;

Performing a bit operation on the node attributes of each node and the initial node identifier according to the total number of storage bits for each node identifier and the start and end sequence numbers of the storage bits occupied by each variable;

Generate a synthetic node ID for each node.
The computer device according to claim 9, wherein, after said generating the second-degree relationship of each node according to the single node identification list received by each node, the method further comprises:

Obtaining the node attributes of each node according to the second-degree relationship of each node, and performing statistical analysis according to the node attributes of each node to generate an analysis result.
The computer device according to claim 11, wherein said obtaining the node attributes of each node according to the second-degree relationship of each node, and performing statistical analysis according to the node attributes of each node to generate an analysis result comprises:

Read the second-degree relationship of each node;

Determine the node attribute of each node from the second-degree relationship;

Separating the node attribute of each node from the synthetic node identifier according to a preset rule;

Perform statistical analysis on the node attributes of each node to generate analysis results.
The computer device according to claim 9, wherein the generating the second-degree relationship of each node according to the single node identification list received by each node comprises:

Receive a single node identification list of each node;

Determine the synthetic node ID of each node;

Respectively delete the node ID that is the same as its own synthetic node ID from the single node ID list received by each node;

Generate a second-degree relationship for each node, where the second-degree relationship indicates that a second-degree associated node is separated from the current node by one first-degree associated node.
The computer device according to claim 9, wherein, after said generating the second-degree relationship of each node according to the single node identification list received by each node, the method further comprises:

Generating a second-degree relationship identifier list according to the second-degree relationship of each node, where the second-degree relationship identifier list is used to store the second-degree relationship of each node;

Sending the second-degree relationship identifier list of each node to all adjacent nodes;

Generating a third-degree relationship identifier list for each node according to the second-degree relationship identifier lists that each node receives from its adjacent nodes, where the third-degree relationship identifier list is used to store the third-degree relationship of each node, and the third-degree relationship indicates that a third-degree associated node is separated from the current node by one first-degree associated node and one second-degree associated node.
The computer device according to any one of claims 9-14, wherein, before said acquiring preprocessed graph data, the preprocessed graph data including node data and edge data of each node, the method further comprises:

Obtaining the original graph data of each node;

Performing deduplication processing and verification processing on the original graph data;

Generating preprocessed graph data that meets the requirements.
A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, a method for calculating a full relationship based on graph data is implemented, wherein the method for calculating a full relationship based on graph data includes the following steps:

Acquiring preprocessed graph data, the preprocessed graph data including node data and edge data of each node;

Performing bit operations on the node data to generate a synthetic node identifier for each node;

Dividing the node data and the edge data with each node data as the center to generate a plurality of data groups, each data group including a synthetic node identifier of the current node and the edge data connected to the current node;

Sending a single node identification list of each node to all adjacent nodes, where the single node identification list is used to store the synthetic node identifiers of the adjacent nodes;

Generating the second-degree relationship of each node according to the single node identification list received by each node.
The computer-readable storage medium according to claim 16, wherein the performing a bit operation on the node data to generate a synthetic node identifier for each node comprises:

Determine multiple nodes in the node data;

Obtain the node attributes of each node and the initial node ID corresponding to each node;

Acquiring a preset rule, the preset rule including a total number of storage bits for each node identifier and the start and end sequence numbers of the storage bits occupied by each variable;

Performing a bit operation on the node attributes of each node and the initial node identifier according to the total number of storage bits for each node identifier and the start and end sequence numbers of the storage bits occupied by each variable;

Generate a synthetic node ID for each node.
The computer-readable storage medium according to claim 16, wherein, after said generating the second-degree relationship of each node according to the single node identification list received by each node, the method further comprises:

Obtaining the node attributes of each node according to the second-degree relationship of each node, and performing statistical analysis according to the node attributes of each node to generate an analysis result.
The computer-readable storage medium according to claim 18, wherein said obtaining the node attributes of each node according to the second-degree relationship of each node, and performing statistical analysis according to the node attributes of each node to generate an analysis result comprises:

Read the second-degree relationship of each node;

Determine the node attribute of each node from the second-degree relationship;

Separating the node attribute of each node from the synthetic node identifier according to a preset rule;

Perform statistical analysis on the node attributes of each node to generate analysis results.
The computer-readable storage medium according to claim 16, wherein the generating the second-degree relationship of each node according to the single node identification list received by each node comprises:

Receive a single node identification list of each node;

Determine the synthetic node ID of each node;

Respectively delete the node ID that is the same as its own synthetic node ID from the single node ID list received by each node;

Generate a second-degree relationship for each node, where the second-degree relationship indicates that a second-degree associated node is separated from the current node by one first-degree associated node.