CN115774736B - NUMA architecture time-varying graph processing method and device for data delay transmission - Google Patents


Info

Publication number
CN115774736B
CN115774736B (Application No. CN202310095934.7A)
Authority
CN
China
Prior art keywords
numa
data
vertex
graph
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310095934.7A
Other languages
Chinese (zh)
Other versions
CN115774736A (en)
Inventor
程永利
陈�光
曾令仿
程宏才
陈兰香
李勇
朱健
张云云
张丽颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202310095934.7A priority Critical patent/CN115774736B/en
Publication of CN115774736A publication Critical patent/CN115774736A/en
Application granted granted Critical
Publication of CN115774736B publication Critical patent/CN115774736B/en

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 — Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a NUMA architecture time-varying graph processing method and device for data delay transmission. An initial time-varying graph data representation is established from a baseline snapshot; the representation is then updated according to the update snapshots, and a snapshot union is constructed. Based on the snapshot union, iterative computation is carried out within each NUMA node, updating and accumulating the vertex data; the accumulated vertex data are then propagated to the other NUMA nodes to update their vertex data. These steps are repeated until no computable active vertex remains in any NUMA node, after which the results output by the NUMA nodes are aggregated to complete the processing of the NUMA architecture time-varying graph. By focusing on the NUMA structure of the server, the invention achieves reasonable data distribution and flexible transmission of data packets, reduces the communication frequency between NUMA nodes, improves the utilization of computing resources, and significantly improves the computing efficiency of time-varying graphs.

Description

NUMA architecture time-varying graph processing method and device for data delay transmission
Technical Field
The invention belongs to the technical field of time-varying graph processing, and particularly relates to a NUMA architecture time-varying graph processing method and device for data delay transmission.
Background
As a data structure that effectively describes big data, the graph plays a major role in fields such as Internet analysis, social network analysis, and recommendation network analysis; many complex real-world computational problems can be converted into graph-based problems and then solved easily with graph algorithms. However, the real world changes constantly, so processing only static graphs does not satisfy practical needs well; time-varying graphs must also be analyzed quickly. A so-called time-varying graph, also called a temporal graph, consists of multiple snapshots that are consecutive in time, each snapshot representing the state of the graph structure at a certain moment during its evolution. Quickly analyzing the internal connections between the snapshots of a time-varying graph can help people predict future trends in the real world and provides decision support for fields such as e-commerce and social networking.
The NUMA (Non-Uniform Memory Access) architecture is a computer system architecture composed of multiple nodes: each node contains several CPUs that share a common memory controller, and the nodes are connected and exchange information through an interconnect module. All memory within a node is therefore equivalent for the CPUs of that node, but not for the CPUs of other nodes. In other words, every CPU can access the entire system memory, but access to the memory of the local node is fastest while access to a non-local node is somewhat slower; that is, the speed at which a CPU accesses memory depends on the distance to the node holding it. Although this characteristic can have a significant impact on the efficiency of graph analysis, most existing graph processing systems, such as GraphChi, Ligra, and X-Stream, are NUMA-oblivious and focus on other aspects, such as improving memory access, supporting complex task schedulers, and reducing random accesses to edges.
A small number of systems do pay attention to the NUMA architecture, such as Polymer and HyGN. Polymer improves the node access pattern, converting a large number of remote accesses into local accesses and a large number of random accesses into sequential accesses, thereby optimizing the locality of data access and improving computing efficiency. HyGN exploits the characteristics of the synchronous and asynchronous processing modes, combining both within the same graph computation task; the system can switch computing modes as appropriate according to the algorithm, the execution stage, and the graph topology, supports a complex task scheduler, and improves computing efficiency. However, these systems only target the computation of static graphs and cannot support time-varying graphs: to compute a time-varying graph, they must execute a static graph algorithm on each snapshot separately, so the execution time tends to be proportional to the number of snapshots, which makes it excessively long.
In summary, most of the above graph processing systems ignore the influence of the NUMA architecture, and there is a lack of methods for computing time-varying graphs under the NUMA architecture; a large-scale time-varying graph processing method based on the NUMA architecture is therefore needed.
Disclosure of Invention
In view of the deficiencies of the prior art, the invention aims to provide a NUMA architecture large-scale time-varying graph processing method and device for data delay transmission.
In order to achieve the above purpose, the invention adopts the following technical scheme. A first aspect of an embodiment of the present invention provides a NUMA architecture time-varying graph processing method for data delay transmission, including the following steps:
(1) Establishing an initial time-varying graph data representation based on the baseline snapshot;
(2) Updating the time-varying graph data representation constructed in the step (1) according to the updated snapshot, and constructing a snapshot union;
(3) Based on the snapshot union set constructed in the step (2), carrying out iterative computation in NUMA nodes, updating and accumulating vertex data;
(4) Propagating the vertex data updated and accumulated in step (3) to other NUMA nodes to update other vertex data;
(5) Steps (3) to (4) are executed in a loop until no computable active vertex remains in any NUMA node, and the results output by the NUMA nodes are aggregated to complete the processing of the NUMA architecture time-varying graph.
Further, the step (1) specifically includes the following substeps:
(1.1) creating a thread pool, the capacity of which is the number of CPUs in the server, and evenly distributing and binding each thread in the thread pool to a corresponding NUMA node;
(1.2) dividing the graph into mutually disjoint graph partitions, the number of which is the number of NUMA nodes in the server; reading the baseline snapshot file, calculating the graph partition to which the source vertex of each sequentially read edge belongs by taking its source vertex ID modulo the number of partitions, and adding the edge to the task queue of the least-loaded thread in the NUMA node corresponding to that graph partition;
(1.3) after the baseline snapshot file has been read, all threads in the thread pool execute the tasks in their own task queues and build the corresponding graph partition in each NUMA node, obtaining the initial time-varying graph data representation.
Further, the step (2) specifically includes the following substeps:
(2.1) reading a subsequent update snapshot, calculating the graph partition to which the source vertex of each sequentially read edge belongs by taking its source vertex ID modulo the number of partitions, and adding the edge to the task queue of the least-loaded thread in the NUMA node corresponding to that graph partition;
(2.2) looping step (2.1) until all update snapshots have been read, then starting the threads in the thread pool, each thread executing the tasks in its own task queue to update each graph partition and build the snapshot union.
Further, the snapshot union contains all vertices and edges that appear in the multiple snapshots of the time-varying graph, and each vertex or edge is stored only once, never repeatedly.
Further, the step (3) specifically comprises:
based on the snapshot union constructed in step (2), during the iterative computation within each NUMA node, each NUMA node uses a counter to count the number of active vertices to be iterated in that node in the next round, and the vertex data are updated and accumulated after the computation of the currently active vertices is completed.
Further, the step (4) comprises a vertex data propagation process within the same partition and a vertex data propagation process between different partitions;
the vertex data propagation process within the same partition comprises the following steps: after a round of computation, each vertex passes its updated value to its neighboring vertices along its edges, and the neighbors update themselves upon receiving the value;
the vertex data propagation process between different partitions comprises the following steps: storing the updated values computed in each partition in a message array and delaying message transmission between partitions, thereby reducing the communication frequency between them; the message array held by the current NUMA node is then packed and sent to the other NUMA nodes to update the vertex data in their graph partitions and to start a new round of iterative computation on those nodes.
Further, the vertex data propagation process between different partitions specifically includes the following steps:
during the iterative computation, each NUMA node counts the number of its active vertices for the next round; if, during this counting, the current number of active vertices in a NUMA node is larger than its next-round number of active vertices, the propagation threshold of that node is set to the next-round number of active vertices; the next-round number of active vertices is then compared with the propagation threshold; after the iterative computation finishes, the accumulated message array held by the current NUMA node is packed and transmitted to the other NUMA nodes; message propagation between NUMA nodes is performed only when the next-round number of active vertices is less than or equal to the propagation threshold;
after receiving the data packet, the other NUMA nodes start updating the vertex data in their partitions and count the number of next-round active vertices again.
Further, the vertex data propagation process between different partitions further includes:
regarding a NUMA node whose number of active vertices is greater than the propagation threshold as a NUMA node that sends data packets frequently, and applying a penalty mechanism to the propagation threshold of that node, wherein the penalty mechanism modifies the propagation threshold so as to reduce the frequency at which data packets are sent.
A second aspect of an embodiment of the present invention provides a NUMA architecture time-varying graph processing apparatus for data delay transmission, including a memory and a processor coupled to each other; the memory is used to store program data, and the processor is used to execute the program data to implement the above NUMA architecture time-varying graph processing method for data delay transmission.
A third aspect of an embodiment of the present invention provides a computer readable storage medium having stored thereon a computer program, which when executed by a processor, implements the above-mentioned NUMA architecture time-varying map processing method for data delay transmission.
Compared with the prior art, the invention has the beneficial effects that:
the NUMA architecture time-varying graph processing method for data delay transmission provided by the invention focuses on NUMA structural features of a server, realizes reasonable distribution of data and flexible transmission of data packets by setting a punishment mechanism, reduces communication frequency among NUMA nodes, improves the utilization rate of computing resources, has simple and convenient implementation method and flexible means, and obviously improves the computing efficiency of a time-varying graph algorithm.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort to a person skilled in the art.
FIG. 1 is a flow chart of a NUMA architecture large-scale time-varying graph processing method for data delay transmission;
FIG. 2 is a block diagram of a NUMA architecture time-varying graph processing system for data delay transmission in accordance with the present invention;
FIG. 3 is a schematic diagram of a time-varying graph loading subsystem provided by an embodiment of the present invention;
FIG. 4 is a schematic diagram of a time-varying graph computing subsystem provided by an embodiment of the present invention;
FIG. 5 is a schematic diagram of a NUMA architecture time-varying graph processing apparatus for data delay transmission according to the present invention.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the invention. Rather, they are merely examples of apparatus and methods consistent with aspects of the invention as detailed in the accompanying claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, the information should not be limited by these terms, which are only used to distinguish information of the same type from each other. For example, without departing from the scope of the invention, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
The present invention will be described in detail with reference to the accompanying drawings. The features of the examples and embodiments described below may be combined with each other without conflict.
As shown in FIG. 1 and FIG. 2, an embodiment of the invention provides a NUMA architecture time-varying graph processing method for data delay transmission, which not only focuses on the internal structure of the server and improves the utilization of computing resources, but also helps reduce the communication frequency between NUMA nodes and improve the computing efficiency of the time-varying graph; the method comprises the following steps:
(1) Establishing an initial time-varying graph data representation based on the baseline snapshot;
the step (1) specifically comprises the following substeps:
(1.1) creating a thread pool, wherein the capacity of the thread pool is the number of CPUs in a server, and each thread in the thread pool is evenly distributed and bound to a corresponding NUMA node;
(1.2) dividing the graph into mutually exclusive graph partitions, the number of the graph partitions being the number of NUMA nodes in the server. Reading a baseline snapshot file, calculating a graph partition to which the source vertex belongs by carrying out a residual operation on the source vertex ID of the sequentially read edges, and adding the edges to task queues corresponding to threads with smaller task numbers in NUMA nodes corresponding to the graph partition;
and (1.3) after the baseline snapshot file is read, all threads in the thread pool start to execute tasks in the task queue, corresponding graph partitions are constructed in each NUMA node, initial time-varying graph data representation is obtained, and user-defined data can be added to the vertexes or edges in the process.
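As a concrete illustration of substeps (1.1) to (1.3), the following minimal Python sketch partitions edges by source-vertex ID modulo the number of NUMA nodes and enqueues each edge on the least-loaded thread queue of that node. The node and thread counts, the sample edge list, and all names are assumptions for illustration; actual thread creation and NUMA binding (e.g., via libnuma on Linux) are omitted.

```python
# Sketch of baseline-snapshot loading (substeps 1.1-1.3): each edge goes to
# the graph partition given by its source-vertex ID modulo the number of
# NUMA nodes, and within that node to the least-loaded thread queue.

NUM_NODES = 2            # NUMA nodes in the server (assumed)
THREADS_PER_NODE = 2     # threads bound to each NUMA node (assumed)

# one task queue per thread, grouped by NUMA node
queues = [[[] for _ in range(THREADS_PER_NODE)] for _ in range(NUM_NODES)]

def enqueue_edge(src, dst):
    node = src % NUM_NODES          # partition = source ID modulo node count
    q = min(queues[node], key=len)  # least-loaded thread queue in that node
    q.append((src, dst))

baseline = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]  # toy baseline snapshot
for src, dst in baseline:
    enqueue_edge(src, dst)

# each thread would now build its share of the partition from its own queue
sizes = [[len(q) for q in node_queues] for node_queues in queues]
print(sizes)  # edge counts per thread queue, grouped by NUMA node
```

The modulo rule makes the partition of any vertex computable without a lookup table, which is why both the baseline load in step (1) and the snapshot updates in step (2) can route edges the same way.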
(2) Updating the time-varying graph data representation constructed in the step (1) according to the updated snapshot, and constructing a snapshot union.
The step (2) specifically comprises the following substeps:
(2.1) reading a subsequent update snapshot, calculating the graph partition to which the source vertex of each sequentially read edge belongs by taking its source vertex ID modulo the number of partitions, and adding the edge to the task queue of the least-loaded thread in the NUMA node corresponding to that graph partition;
(2.2) looping step (2.1) until all update snapshots have been read, then starting the threads in the thread pool, each thread executing the tasks in its own task queue to update each graph partition and build the snapshot union; user-defined data on the vertices or edges can be added or modified during this process.
As shown in FIG. 3, the time-varying graph consists of several snapshots, each representing the state of the time-varying graph at a certain point in time; the snapshot union contains all vertices and edges that appear in the multiple snapshots of the time-varying graph, and each object, i.e., each vertex or edge, is stored only once and never repeatedly.
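The snapshot union just described can be sketched as follows: each edge is stored exactly once and tagged with the snapshots in which it appears. The tagging scheme and the toy snapshots are illustrative assumptions, not the patent's concrete storage layout.

```python
# Sketch of a snapshot union: every vertex and edge is stored exactly once,
# tagged with the indices of the snapshots containing it.

snapshots = [
    {(1, 2), (2, 3)},          # baseline snapshot S0 (toy data)
    {(1, 2), (3, 4)},          # update snapshot S1
    {(2, 3), (3, 4), (4, 1)},  # update snapshot S2
]

union_edges = {}        # edge -> set of snapshot indices containing it
union_vertices = set()  # every vertex seen in any snapshot

for i, snap in enumerate(snapshots):
    for edge in snap:
        union_edges.setdefault(edge, set()).add(i)  # stored once, never duplicated
        union_vertices.update(edge)

print(len(union_edges), len(union_vertices))
```

Because every object is stored once, an algorithm iterating over the union touches each edge a single time instead of once per snapshot, which is the source of the speed-up over running a static-graph algorithm on every snapshot separately.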
(3) Based on the snapshot union constructed in step (2), iterative computation is carried out within each NUMA node, and the vertex data are updated and accumulated;
based on the snapshot union constructed in step (2), during the iterative computation within each NUMA node, each NUMA node uses its own independent counter to count the number of active vertices that will participate in the next round of iterative computation, and obtains updated vertex data after the computation of the currently active vertices is completed.
(4) Propagating the vertex data updated and accumulated in step (3) to other NUMA nodes to update other vertex data.
Because the graph was divided into several mutually disjoint partitions in step (1), the vertex data can be divided into data inside a partition and data outside it. Within the same partition, after a round of computation ends, a vertex immediately passes its updated value to its neighboring vertices along its edges, and the neighbors update immediately upon receiving it. For vertices in different partitions, although connections exist between them, the updated values computed in each partition are not transmitted to the vertices of the other partitions immediately; instead they are stored in a message array, delaying message transmission between partitions and reducing the communication frequency. Message propagation between partitions packs the message array held by the current NUMA node through an adaptive packet propagation algorithm and sends the packet to the other NUMA nodes, updating the vertex data in their graph partitions and starting a new round of iterative computation on those nodes.
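The contrast drawn above, immediate updates inside a partition versus buffered updates across partitions, can be sketched as follows. The vertex values, the min-combine operation, and the message-array layout are illustrative assumptions (here the values behave like tentative shortest-path distances):

```python
# Sketch: intra-partition updates are applied immediately; cross-partition
# updates are accumulated in a per-node message array and flushed later as
# one packet. All names and the min-combine are illustrative assumptions.

NUM_NODES = 2

def part(v):
    return v % NUM_NODES  # partition of a vertex (modulo rule from step 1)

value = {0: 0, 1: 9, 2: 9, 3: 9}         # vertex data; 0,2 live in node 0, 1,3 in node 1
edges = [(0, 1), (0, 2), (0, 3), (1, 2)]  # toy edge list

msg_array = [dict() for _ in range(NUM_NODES)]  # per node: {dst: accumulated value}

def propagate(src, dst):
    new = value[src] + 1
    if part(src) == part(dst):
        value[dst] = min(value[dst], new)       # same node: update immediately
    else:
        buf = msg_array[part(src)]
        buf[dst] = min(buf.get(dst, new), new)  # other node: delay and accumulate

for s, d in edges:
    propagate(s, d)

# later: node 0 packs its buffered messages into a single packet for node 1
packet_from_node0 = msg_array[part(0)]
print(value, packet_from_node0)
```

Accumulating cross-partition messages with min means the packet carries one combined value per destination vertex, no matter how many local updates targeted it.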
In this embodiment, the vertex data accumulated in step (3) are sent to the other NUMA nodes by an adaptive packet propagation algorithm to update the vertex data of the other partitions; the specific steps are as follows:
(4.1) during the iterative computation, each NUMA node counts the number of its active vertices for the next round; if, during this counting, the current number of active vertices is larger than the next-round number, the propagation threshold of the NUMA node is set to the next-round number of active vertices, and the next-round number is compared with the threshold; after the iteration finishes, the accumulated messages are packed and sent to the other NUMA nodes; message propagation between NUMA nodes occurs only when the next-round number of active vertices is less than or equal to the threshold;
(4.2) after the other NUMA nodes receive the data packet, they start updating the vertex data in their partitions and count the number of next-round active vertices again.
In particular, for a NUMA node that sends data packets frequently, i.e., whose number of active vertices often exceeds the threshold, a penalty mechanism is applied to the threshold setting of that node: the penalty halves the propagation threshold used in the adaptive packet propagation algorithm of the frequently sending node, thereby lowering the frequency at which it sends data packets.
(5) Steps (3) to (4) are executed in a loop until each NUMA node has converged (i.e., no computable active vertex remains); the convergence results output by the NUMA nodes are aggregated, completing the processing of the NUMA architecture time-varying graph.
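The outer control flow of steps (3) to (5) can be summarized by a loop sketch in which a per-node active-vertex count stands in for real vertex computation; the counts, the convergence rule, and the aggregation by summation are purely illustrative assumptions.

```python
# Sketch of the outer loop of steps (3)-(5): iterate inside each NUMA node,
# exchange packed messages between rounds, stop when no node has a computable
# active vertex, then aggregate the per-node outputs.

active = [3, 1]        # active-vertex counts per NUMA node (assumed toy values)
results = [0, 0]       # per-node partial results

while any(active):
    for node, count in enumerate(active):
        if count:
            results[node] += count     # stand-in for one round of vertex updates
            active[node] = count - 1   # stand-in for convergence progress
    # (here, packed message arrays would be exchanged between the NUMA nodes)

final = sum(results)   # aggregate the outputs of all NUMA nodes
print(final)
```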
Correspondingly, the invention provides a NUMA architecture time-varying graph processing system for data delay transmission, used to implement the above NUMA architecture time-varying graph processing method for data delay transmission. The system comprises a time-varying graph loading subsystem and a time-varying graph computing subsystem. The loading subsystem distributes the topology of the graph, the user-defined data, and the running state to all NUMA nodes in the server; the computing subsystem controls the communication frequency between NUMA nodes during time-varying graph computation and makes the NUMA nodes transmit messages in the form of data packets when the penalty mechanism is triggered; finally, the converged computation results in each NUMA node are aggregated and output.
Example 1: based on the above-mentioned NUMA architecture time-varying graph processing system with data delay transmission, embodiment 1 is described in detail, as shown in FIG. 4, in this example, it is assumed that the computer has 2 NUMA nodes, and therefore, the first baseline snapshot is divided into two graph partitions, where the first partition includes a first vertex V1 and a second vertex V2, and the first partition includes a third vertex V3 and a fourth vertex V4. And then reading and updating the second baseline snapshot and the third baseline snapshot, and adding the edges of the two snapshots to the corresponding partitions according to the partition where the source vertex id is positioned to construct a snapshot union. And then performing iterative computation on the snapshot union, wherein the first vertex V1 and the second vertex V2 are respectively computed in a first NUMA node, the third vertex V3 and the fourth vertex V4 are respectively computed in a second NUMA node, in the computation process, the first NUMA node and the second NUMA node count the number of active vertices at the moment T1 and the moment T2, the first NUMA node finds that the number of active vertices at the moment T1 and the moment T2 is 2, and the number of active vertices at the moment T1 and the moment T2 of the second NUMA node is 2 and 0 respectively. 
Since the number of active vertices of the first NUMA node at time T2 equals that at time T1 (its active set is not shrinking), it does not yet send a message to the second NUMA node. Thus at time T1, in the first NUMA node, the first vertex V1 and the second vertex V2 store the update value for the third vertex V3 in the message array, and the second vertex V2 also updates the first vertex V1 directly; in the second NUMA node, the third vertex V3 directly updates the fourth vertex V4, and no data packet is sent because there is no message for the first NUMA node at this moment. Then, at time T2, the first vertex V1 and the second vertex V2 store the update value for the fourth vertex V4 in the message array and find that the number of active vertices at time T3 is 0; immediately after time T2 ends, the message array holding the update values of times T1 and T2 is packed and sent to the second NUMA node to update the third vertex V3 and the fourth vertex V4. The second NUMA node then counts its active vertices again and finds 2 active vertices at time T3 and 0 at time T4; so, immediately after time T3 ends, the message array holding the update value of the third vertex V3 for the second vertex V2 and the update value of the fourth vertex V4 for the first vertex V1 is packed and sent to the first NUMA node to update the first vertex V1 and the second vertex V2.
Corresponding to the embodiment of the NUMA architecture time-varying graph processing method for data delay transmission, the invention also provides an embodiment of the NUMA architecture time-varying graph processing device for data delay transmission.
Referring to fig. 5, a NUMA architecture time-varying graph processing apparatus for data delay transmission provided by an embodiment of the present invention includes one or more processors configured to implement the NUMA architecture time-varying graph processing method for data delay transmission in the foregoing embodiment.
The embodiment of the NUMA architecture time-varying graph processing apparatus for data delay transmission can be applied to any device with data processing capability, such as a computer. The apparatus embodiments may be implemented by software, or by hardware, or by a combination of hardware and software. Taking software implementation as an example, the apparatus in a logical sense is formed by the processor of the device reading the corresponding computer program instructions from nonvolatile storage into memory. In terms of hardware, FIG. 5 shows a hardware structure diagram of a device with data processing capability in which the NUMA architecture time-varying graph processing apparatus for data delay transmission of the present invention is located; in addition to the processor, memory, network interface, and nonvolatile storage shown in FIG. 5, the device usually also includes other hardware according to its actual function, which is not described here again.
For the implementation process of the functions and roles of each unit in the above apparatus, refer to the implementation process of the corresponding steps in the above method, which will not be described here again.
For the apparatus embodiments, since they essentially correspond to the method embodiments, reference is made to the description of the method embodiments for the relevant points. The apparatus embodiments described above are merely illustrative: the units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purposes of the present invention; those of ordinary skill in the art can understand and implement them without inventive effort.
The embodiment of the invention also provides a computer readable storage medium, on which a program is stored, which when executed by a processor, implements the NUMA architecture time-varying map processing method of data delay transmission in the above embodiment.
The computer readable storage medium may be an internal storage unit, such as a hard disk or a memory, of any of the aforementioned devices with data processing capability. It may also be an external storage device of that device, such as a plug-in hard disk, a Smart Media Card (SMC), an SD card, or a Flash memory card (Flash Card) provided on the device. Further, the computer readable storage medium may include both an internal storage unit and an external storage device. The computer readable storage medium is used to store the computer program and the other programs and data required by the device, and may also be used to temporarily store data that has been or will be output.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This application is intended to cover any variations, uses, or adaptations that follow the general principles of the application, including such departures from the present disclosure as come within known or customary practice in the art to which the application pertains. The specification and examples are to be regarded as illustrative only.
It is to be understood that the present application is not limited to the precise arrangements described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope.

Claims (6)

1. A NUMA architecture time-varying graph processing method for data delay transmission, the method comprising the steps of:
(1) Establishing an initial time-varying graph data representation based on the baseline snapshot;
the step (1) specifically comprises the following substeps:
(1.1) creating a thread pool, wherein the capacity of the thread pool equals the number of CPUs in the server, and the threads in the thread pool are evenly distributed and bound to corresponding NUMA nodes;
(1.2) dividing the graph into mutually exclusive graph partitions, wherein the number of graph partitions equals the number of NUMA nodes in the server; reading the baseline snapshot file, computing the graph partition to which each edge belongs by performing a modulo operation on the source vertex ID of each sequentially read edge, and adding the edge to the task queue of the thread with the fewest tasks in the NUMA node corresponding to that graph partition;
after the baseline snapshot file has been read, all threads in the thread pool execute the tasks in their own task queues, and the corresponding graph partition is built in each NUMA node to obtain the initial time-varying graph data representation;
(2) Updating the time-varying graph data representation constructed in step (1) according to the updated snapshots, and constructing a snapshot union;
(3) Based on the snapshot union constructed in step (2), performing iterative computation within the NUMA nodes, and updating and accumulating vertex data;
(4) Propagating the vertex data updated and accumulated in step (3) to the other NUMA nodes to update other vertex data;
step (4) comprises a vertex data propagation process within the same partition and a vertex data propagation process between different partitions;
the vertex data propagation process within the same partition comprises: after one round of computation, each vertex propagates its updated value to its adjacent vertices along its edges, and the adjacent vertices update themselves upon receiving the updated value;
the vertex data propagation process between different partitions comprises: storing the updated values computed by each partition into a message array and delaying message transmission between partitions to reduce the communication frequency between partitions; packaging the message array held by the current NUMA node and transmitting it to the other NUMA nodes to update the vertex data in the other graph partitions, whereupon the other NUMA nodes start a new round of iterative computation;
the vertex data propagation process between different partitions specifically comprises:
during the iterative computation, counting the number of active vertices for the next iteration in each NUMA node; if, during this counting, the number of currently active vertices in the current NUMA node is greater than the number of active vertices for the next iteration, setting the propagation threshold of the current NUMA node to the number of active vertices for the next iteration; the counting also compares the number of active vertices for the next iteration against the propagation threshold; after the iterative computation finishes, packaging the accumulated message array held by the current NUMA node and transmitting it to the other NUMA nodes; message propagation between NUMA nodes is performed only when the number of active vertices for the next iteration is less than or equal to the propagation threshold;
after the other NUMA nodes receive the data packet, they update the vertex data in their partitions and recount the number of active vertices for the next iteration;
the vertex data propagation process between different partitions further comprises:
treating NUMA nodes whose number of active vertices is greater than the propagation threshold as NUMA nodes that frequently send data packets, and applying a penalty mechanism to the propagation threshold of such NUMA nodes, wherein the penalty mechanism modifies the propagation threshold to reduce the frequency at which data packets are sent;
(5) Cyclically executing steps (3) to (4) until no computable active vertex remains in any NUMA node, and aggregating the results output by each NUMA node to complete the NUMA architecture time-varying graph processing.
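By way of illustration only (not part of the claimed subject matter), the modulo-based edge placement of steps (1.1)-(1.2) might be sketched as follows; the function name `assign_edges`, the nested-list queue layout, and the sample edges are assumptions introduced for the example:

```python
def assign_edges(edges, num_numa_nodes, threads_per_node):
    """Distribute edges to per-thread task queues, one partition per NUMA node."""
    # queues[node][thread] models the task queue of one thread bound to that node
    queues = [[[] for _ in range(threads_per_node)]
              for _ in range(num_numa_nodes)]
    for src, dst in edges:
        # graph partition = source vertex ID modulo the number of NUMA nodes
        node = src % num_numa_nodes
        # pick the thread with the fewest queued tasks on that node
        thread = min(range(threads_per_node),
                     key=lambda t: len(queues[node][t]))
        queues[node][thread].append((src, dst))
    return queues

queues = assign_edges([(0, 1), (1, 2), (2, 3), (4, 0)],
                      num_numa_nodes=2, threads_per_node=2)
# edges with even source IDs land on node 0, odd source IDs on node 1
```

In the patented method the threads would additionally be pinned to their NUMA nodes (step (1.1)); the sketch omits that binding and models only the queue assignment.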
2. The NUMA architecture time-varying graph processing method for data delay transmission according to claim 1, wherein step (2) specifically comprises the following substeps:
(2.1) reading a subsequent updated snapshot, computing the graph partition to which each edge belongs by performing a modulo operation on the source vertex ID of each sequentially read edge, and adding the edge to the task queue of the thread with the fewest tasks in the NUMA node corresponding to that graph partition;
(2.2) repeating step (2.1) until all updated snapshots have been read, then starting the threads in the thread pool, each thread executing the tasks in its own task queue to update each graph partition and build the snapshot union.
3. The NUMA architecture time-varying graph processing method for data delay transmission according to claim 1 or 2, wherein the snapshot union contains all vertices and edges appearing in multiple snapshots of the time-varying graph, and each vertex or edge is stored only once rather than repeatedly.
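By way of illustration only, the snapshot union of claim 3 can be sketched as follows; `build_snapshot_union` and the edge-list representation of a snapshot are assumptions made for the example, not the patented data structure:

```python
def build_snapshot_union(snapshots):
    """Merge several snapshots so each vertex and each edge is stored once."""
    vertices, edges = set(), set()
    for snapshot in snapshots:            # each snapshot: an iterable of (src, dst) edges
        for src, dst in snapshot:
            vertices.update((src, dst))
            edges.add((src, dst))         # sets deduplicate items seen in many snapshots
    return vertices, edges

# edge (1, 2) appears in both snapshots but is stored only once in the union
v, e = build_snapshot_union([[(0, 1), (1, 2)], [(1, 2), (2, 3)]])
```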
4. The NUMA architecture time-varying graph processing method for data delay transmission according to claim 1, wherein step (3) specifically comprises:
based on the snapshot union constructed in step (2), during the iterative computation within each NUMA node, each NUMA node uses a counter to count the number of active vertices to be iteratively computed next in that NUMA node, and the vertex data is updated and accumulated after the currently active vertices have been computed.
5. A NUMA architecture time-varying graph processing device for data delay transmission, comprising a memory and a processor, wherein the memory is coupled to the processor; the memory is configured to store program data, and the processor is configured to execute the program data to implement the NUMA architecture time-varying graph processing method for data delay transmission according to any one of claims 1 to 4.
6. A computer readable storage medium having a computer program stored thereon, wherein the program, when executed by a processor, implements the NUMA architecture time-varying graph processing method for data delay transmission according to any one of claims 1 to 4.
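By way of illustration only, the propagation-threshold logic of claim 1, step (4), might be sketched as follows; the function names, the 0.5 penalty factor, and the sample counts are assumptions introduced for the example, not values from the patent:

```python
def should_propagate(next_active, threshold):
    """Inter-node messages are sent only once activity has dropped to the threshold."""
    return next_active <= threshold

def update_threshold(current_active, next_active, threshold, penalty=0.5):
    # If activity is shrinking, lower the threshold to the next iteration's
    # active-vertex count, as in claim 1 step (4).
    if current_active > next_active:
        threshold = next_active
    # Penalty mechanism: a node whose activity still exceeds its threshold would
    # send packets too often, so shrink the threshold to cut its send frequency.
    if next_active > threshold:
        threshold = int(threshold * penalty)
    return threshold

# activity drops 100 -> 40: the threshold tracks the drop
t = update_threshold(current_active=100, next_active=40, threshold=80)  # -> 40
```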
CN202310095934.7A 2023-02-10 2023-02-10 NUMA architecture time-varying graph processing method and device for data delay transmission Active CN115774736B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310095934.7A CN115774736B (en) 2023-02-10 2023-02-10 NUMA architecture time-varying graph processing method and device for data delay transmission

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310095934.7A CN115774736B (en) 2023-02-10 2023-02-10 NUMA architecture time-varying graph processing method and device for data delay transmission

Publications (2)

Publication Number Publication Date
CN115774736A CN115774736A (en) 2023-03-10
CN115774736B true CN115774736B (en) 2023-05-09

Family

ID=85393465

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310095934.7A Active CN115774736B (en) 2023-02-10 2023-02-10 NUMA architecture time-varying graph processing method and device for data delay transmission

Country Status (1)

Country Link
CN (1) CN115774736B (en)

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
ITMI20012536A1 (en) * 2001-11-30 2003-05-30 Marconi Comm Spa METHOD FOR THE CONTROL OF A TELECOMMUNICATION NETWORK AND NETWORK WITH SUCH METHOD
US7334102B1 (en) * 2003-05-09 2008-02-19 Advanced Micro Devices, Inc. Apparatus and method for balanced spinlock support in NUMA systems
US8140817B2 (en) * 2009-02-24 2012-03-20 International Business Machines Corporation Dynamic logical partition management for NUMA machines and clusters
US8775988B2 (en) * 2011-06-01 2014-07-08 International Business Machines Corporation Decentralized dynamically scheduled parallel static timing analysis
CN107315760B (en) * 2012-04-05 2021-01-26 微软技术许可有限责任公司 Platform for continuum map updates and computations
CN108718251B (en) * 2018-05-10 2021-01-29 西安电子科技大学 Spatial information network connectivity analysis method based on resource time-varying graph
CN109145121B (en) * 2018-07-16 2021-10-29 浙江大学 Rapid storage query method for time-varying graph data
CN112328922A (en) * 2020-11-30 2021-02-05 联想(北京)有限公司 Processing method and device
CN114064982A (en) * 2021-11-18 2022-02-18 福州大学 Large-scale time-varying graph storage method and system based on snapshot similarity

Also Published As

Publication number Publication date
CN115774736A (en) 2023-03-10

Similar Documents

Publication Publication Date Title
US8209690B2 (en) System and method for thread handling in multithreaded parallel computing of nested threads
US8065503B2 (en) Iteratively processing data segments by concurrently transmitting to, processing by, and receiving from partnered process
US7802025B2 (en) DMA engine for repeating communication patterns
WO2016210014A1 (en) Memory bandwidth management for deep learning applications
CN106951926A (en) The deep learning systems approach and device of a kind of mixed architecture
US9805140B2 (en) Striping of directed graphs and nodes with improved functionality
US8447954B2 (en) Parallel pipelined vector reduction in a data processing system
US20150067695A1 (en) Information processing system and graph processing method
US20120297216A1 (en) Dynamically selecting active polling or timed waits
CN110659278A (en) Graph data distributed processing system based on CPU-GPU heterogeneous architecture
US20220043675A1 (en) Graph computing method and apparatus
CN107070709A (en) A kind of NFV implementation methods based on bottom NUMA aware
US11941528B2 (en) Neural network training in a distributed system
CN114327399A (en) Distributed training method, apparatus, computer device, storage medium and product
Jayaram et al. Adaptive aggregation for federated learning
CN115774736B (en) NUMA architecture time-varying graph processing method and device for data delay transmission
CN115391053B (en) Online service method and device based on CPU and GPU hybrid calculation
CN104270437A (en) Mass data processing and visualizing system and method of distributed mixed architecture
CN111770173B (en) Reduction method and system based on network controller
CN110515729B (en) Graph computing node vector load balancing method and device based on graph processor
Ravikumar et al. Staleness and stragglers in distributed deep image analytics
CN105138289A (en) Storage management method and device for computation module
Tardieu et al. X10 for productivity and performance at scale
Sessink Parallelising a probabilistic model checker
Luo Towards More Efficient Communication for Distributed Learning Systems

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant