CN116680296A - Large-scale graph data processing system based on single machine


Info

Publication number
CN116680296A
CN116680296A
Authority
CN
China
Prior art keywords
data
module
graph
subgraph
sub
Prior art date
Legal status
Pending
Application number
CN202310695465.2A
Other languages
Chinese (zh)
Inventor
Xiaoke Zhu
Yang Liu
Shuhao Liu
Wenfei Fan
Current Assignee
Shenzhen Institute of Computing Sciences
Original Assignee
Shenzhen Institute of Computing Sciences
Priority date
Filing date
Publication date
Application filed by Shenzhen Institute of Computing Sciences filed Critical Shenzhen Institute of Computing Sciences
Priority to CN202310695465.2A priority Critical patent/CN116680296A/en
Publication of CN116680296A publication Critical patent/CN116680296A/en
Pending legal-status Critical Current


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/24569Query processing with adaptation to specific hardware, e.g. adapted for using GPUs or SSDs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2453Query optimisation
    • G06F16/24532Query optimisation of parallel queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/901Indexing; Data structures therefor; Storage structures
    • G06F16/9024Graphs; Linked lists
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/903Querying
    • G06F16/90335Query processing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/544Buffers; Shared memory; Pipes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication
    • G06F9/546Message passing systems or structures, e.g. queues
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application provides a single-machine large-scale graph data processing system comprising a data loading module, a data computation module, a data release module, a storage management module, and a disk. The data loading module acquires subgraphs whose state is active from the disk and transmits them to the data computation module; the data computation module updates each subgraph and transmits the messages generated by the update to the storage management module; the data computation module also transmits the subgraph to the data release module; the data release module writes the subgraph to the disk; and the storage management module sets the state of the subgraph to converged once it has been written to disk. By applying a subgraph-centric computation model to a single-machine system and building a unique pipeline processing architecture that overlaps data I/O with CPU operations, the system reduces the I/O cost of the conventional vertex-centric computation model, improves CPU utilization, and promotes sequential disk access.

Description

Large-scale graph data processing system based on single machine
Technical Field
The application relates to the technical field of data processing, in particular to a large-scale graph data processing system based on a single machine.
Background
In recent years, graph data has become an important topic in data science and engineering because it readily abstracts real-world entities and relations, and it is widely applied in fields such as social network analysis, recommendation systems, financial fraud detection, and drug discovery. Graph data is also highly flexible: many problems originally modeled with matrices, relations, or other data structures can be recast as graph data processing, which further highlights its importance. With the growth of social media and mobile internet applications, the scale of graph data generated or collected by computer systems is increasing rapidly, and this growth poses an acute challenge to the large-scale data storage, analysis, and mining capabilities of modern computer systems.
Conventional large-scale graph computing systems parallelize by partitioning the data, i.e., they integrate the resources of multiple computers to complete a graph computation task. While such systems play an important role in large-scale graph processing, their high construction and maintenance costs mean that only a few companies with large computer clusters can perform large-scale graph computation. Moreover, distributed computing systems generally rest on the assumption that using more computing nodes reduces computation time; in practice this assumption does not always hold, because additional nodes may incur greater communication costs and thus fail to significantly improve system performance.
To meet the practical requirements of large-scale graph analysis in resource-constrained scenarios, a series of single-machine large-scale graph processing systems have been proposed. These systems treat external storage as an extension of main memory to process large graphs, and adopt a vertex-centric computation model (which restricts information during computation to be passed between vertices) to improve data locality and lighten the user's burden. Although the vertex-centric computation model is simple and easy to understand, it suffers from high communication or I/O overhead.
Disclosure of Invention
In view of the above problems, the present application provides a single-machine large-scale graph data processing system that overcomes, or at least partially solves, those problems, including:
a large-scale graph data processing system based on a single machine comprises a data loading module, a data calculating module, a data releasing module, a storage management module and a magnetic disk; the disk stores large-scale graph data composed of a plurality of subgraphs; the storage management module stores state information corresponding to each sub-graph; in the initial state, the state of the subgraph is active;
the data loading module is used for acquiring the subgraph with active state from the disk and transmitting the subgraph to the data calculating module;
the data calculation module is used for updating the subgraph and transmitting a message generated by updating to the storage management module;
when the updated sub-graph has a change, the data calculation module is further used for transmitting the sub-graph to the data release module;
when the subgraph is not the last in the current round of updating, the data release module is used for writing the subgraph into the disk;
the storage management module is configured to set a state of the subgraph to be converged when the subgraph is written to the disk.
Preferably, the data calculation module is further configured to write the sub-graph to the disk when there is no change in the updated sub-graph.
Preferably, the data release module is further configured to transmit the sub-graph to the data loading module when the sub-graph is the last in the current round of updating.
Preferably, the storage management module is further configured to set the state of the sub-graph of the received message to active when the current round of updating is finished.
Preferably, when the current round of updating is finished and there is no message buffer in the storage management module, the data calculation module is further configured to aggregate all the subgraphs to obtain updated large-scale graph data.
Preferably, the storage management module comprises a message storage unit and a state management unit; the state management unit stores the state information;
the data calculation module is used for transmitting the message generated by updating to the message storage unit;
when the current round of updating is finished, the state management unit is used for setting the state of the sub-graph of the received message to be active.
Preferably, the data calculation module comprises an aggregation calculation unit;
and when the current round of updating is finished and the message cache does not exist in the message storage unit, the aggregation calculation unit is used for aggregating all the subgraphs to obtain updated large-scale graph data.
Preferably, when the sub-graph is acquired by the data loading module, the storage management module is further configured to set a state of the sub-graph to wait for calculation.
Preferably, the storage management module is further configured to set the state of the sub-graph to be being calculated when the sub-graph is transferred to the data calculation module.
Preferably, the storage management module is further configured to set the state of the sub-graph to be released when the sub-graph is transferred to the data release module.
The application has the following advantages:
in the embodiment of the application, compared with the problem that the communication cost or the I/O cost of the existing large-scale graph processing system based on a single machine is higher, the application provides a solution for applying a calculation model based on a sub-graph center to the single machine system and establishing a set of pipeline processing architecture, which comprises the following specific steps: a large-scale image data processing system based on a single machine comprises a data loading module, a data calculating module, a data releasing module, a storage management module and a magnetic disk; the disk stores large-scale graph data composed of a plurality of subgraphs; the storage management module stores state information corresponding to each sub-graph; in the initial state, the state of the subgraph is active; the data loading module is used for acquiring the subgraph with active state from the disk and transmitting the subgraph to the data calculating module; the data calculation module is used for updating the subgraph and transmitting a message generated by updating to the storage management module; when the updated sub-graph has a change, the data calculation module is further used for transmitting the sub-graph to the data release module; when the subgraph is not the last in the current round of updating, the data release module is used for writing the subgraph into the disk; the storage management module is to set the state of the subgraph to be converging when the subgraph is written to the disk. By applying the computation model based on the sub-graph center to a stand-alone system and establishing a set of unique pipeline processing architecture, the architecture can overlap data I/O and CPU operations, thereby reducing the I/O cost of the traditional vertex center computation model, improving the CPU utilization rate and promoting sequential access to a disk.
Drawings
In order to more clearly illustrate the technical solutions of the present application, the drawings that are needed in the description of the present application will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic illustration of a connected component computation on a vertex-centric model and a subgraph-centric model;
FIG. 2 is a schematic diagram of a large-scale graph data processing system according to an embodiment of the present application;
FIG. 3 is a schematic diagram of the state management and optimization strategies of a large-scale graph data processing system according to one embodiment of the present application.
Detailed Description
In order that the manner in which the above recited objects, features and advantages of the present application are obtained will become more readily apparent, a more particular description of the application briefly described above will be rendered by reference to the appended drawings. It will be apparent that the described embodiments are some, but not all, embodiments of the application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
By analyzing the prior art, the inventors found that the vertex-centric computation model inevitably incurs additional communication or I/O costs because messages are passed between vertices. As shown in FIG. 1, when computing connected components on an input graph G, the subgraph-centric computation model (which allows information to flow freely within a subgraph during computation) requires noticeably fewer steps than the vertex-centric computation model.
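To make the contrast concrete, below is a minimal Python sketch (illustrative names only; this is not code from the application) of connected-component label propagation under both models. The vertex-centric step exchanges messages across vertex boundaries once per superstep, while the subgraph-centric step runs to a local fixed point inside the subgraph before any message would cross its boundary, which is why it needs fewer global steps:

```python
def vertex_centric_step(adj, labels):
    """One superstep: every vertex adopts the smallest label among
    itself and its neighbors; information crosses one edge per step."""
    new_labels = dict(labels)
    for v, neighbors in adj.items():
        for u in neighbors:
            new_labels[v] = min(new_labels[v], labels[u])
    return new_labels, new_labels != labels

def subgraph_centric_step(adj, labels):
    """One step: propagate labels to a fixed point *inside* the
    subgraph before emitting any cross-subgraph message."""
    changed = True
    while changed:
        labels, changed = vertex_centric_step(adj, labels)
    return labels

adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}   # a path graph
labels = {v: v for v in adj}

# The vertex-centric model needs several supersteps (each with message
# exchange); the subgraph-centric model finishes in one local pass.
print(subgraph_centric_step(adj, labels))       # {0: 0, 1: 0, 2: 0, 3: 0}
```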
However, conventional subgraph-centric computation models are designed for multi-machine systems, and no prior work has introduced the subgraph-centric model into a single-machine environment, so several questions remain open. For example: after communication costs are potentially converted into I/O costs, can introducing a subgraph-centric computation model systematically reduce the I/O cost of an out-of-core graph system and improve multi-core parallelism? Traditional subgraph-centric models require finer-grained partitioning of the graph to raise parallelism, at the cost of more redundant control information, such as the mapping from global vertex IDs to local vertex IDs; in a distributed environment this can be absorbed by allocating enough memory on each compute node, but in a single-machine multi-core environment such fine-grained partitioning consumes valuable memory resources.
The inventors observe that extending the subgraph-centric computation model to a single-machine system faces the following challenges: when the input graph exceeds memory capacity, the single-machine system must use auxiliary storage (such as a hard disk or SSD) as a memory extension, so the scheduling of subgraphs between memory and disk must be managed carefully; traditional subgraph-centric models synchronize by passing information between compute units over a computer network, but under shared memory the synchronization logic of a single-machine system changes, so message synchronization must be realized more efficiently; the subgraph-centric model exploits only data-partition parallelism, which under limited memory may leave CPU cores underutilized or fragment the graph excessively, so inter-subgraph and intra-subgraph parallelism must be balanced; and because a single machine has a shared-memory architecture, the cost of migrating jobs between cores is low, so flexible resource scheduling is needed to improve system performance.
In this embodiment, a large-scale graph data processing system based on a single machine is provided, including a data loading module, a data calculating module, a data releasing module, a storage management module and a disk; the disk stores large-scale graph data composed of a plurality of subgraphs; the storage management module stores state information corresponding to each sub-graph; in the initial state, the state of the subgraph is active;
the data loading module is used for acquiring the subgraph with active state from the disk and transmitting the subgraph to the data calculating module;
the data calculation module is used for updating the subgraph and transmitting a message generated by updating to the storage management module;
when the updated sub-graph has a change, the data calculation module is further used for transmitting the sub-graph to the data release module;
when the subgraph is not the last in the current round of updating, the data release module is used for writing the subgraph into the disk;
the storage management module is configured to set a state of the subgraph to be converged when the subgraph is written to the disk.
In the embodiments of the application, to address the high communication or I/O overhead of existing single-machine large-scale graph processing systems, the application applies a subgraph-centric computation model to a single-machine system and establishes a pipeline processing architecture. Referring to FIG. 2, given a large graph G (initially stored on disk), the pipeline processing architecture takes the subgraphs F0, F1, F2 and F3 of G as the minimum input/output units and iteratively updates G through the pipeline. Specifically, the architecture breaks the out-of-core processing of a subgraph Fi into three successive stages: reading Fi into memory, computing and updating Fi, and, if necessary, writing the updated Fi back to external storage. These stages are completed by three modules, namely the data loading module, the data computation module and the data release module, which work asynchronously through two task queues in the pipeline processing architecture: an input queue and an output queue.
The pipeline processing architecture effectively overlaps subgraph I/O and CPU operations: while one subgraph is being computed in memory, pending subgraphs are loaded from disk. This reduces the I/O cost of the traditional vertex-centric computation model, improves CPU utilization by reducing idle waiting, and enables sequential disk access. In addition, the architecture uses shared-memory data structures for message passing and efficient synchronization, and separates computation from memory management and scheduling, which opens new opportunities for optimization.
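A minimal sketch of the three-stage pipeline described above, assuming an in-memory stand-in for the disk and simplified module bodies (none of the names below come from the application):

```python
import queue
import threading

# In-memory stand-in for the disk holding subgraphs F0..F3; in the real
# system these would be files in external storage.
DISK = {f"F{i}": {"id": f"F{i}", "round": 0} for i in range(4)}

input_q = queue.Queue()   # data loading module -> data computation module
output_q = queue.Queue()  # data computation module -> data release module
STOP = object()           # sentinel marking the end of a round

def loader(active_ids):
    """Data loading module: fetch active subgraphs from 'disk'."""
    for sg_id in active_ids:
        input_q.put(dict(DISK[sg_id]))
    input_q.put(STOP)

def computer():
    """Data computation module: update each subgraph and hand it to the
    release stage (message generation is omitted in this sketch)."""
    while (sg := input_q.get()) is not STOP:
        sg["round"] += 1              # stand-in for the real update
        output_q.put(sg)
    output_q.put(STOP)

def releaser():
    """Data release module: write updated subgraphs back to 'disk'; the
    storage management module would mark them converged here."""
    while (sg := output_q.get()) is not STOP:
        DISK[sg["id"]] = sg           # stand-in for a sequential write

threads = [threading.Thread(target=loader, args=(list(DISK),)),
           threading.Thread(target=computer),
           threading.Thread(target=releaser)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(DISK)   # every subgraph has completed one round
```

Because each stage runs in its own thread and the stages communicate only through the two queues, disk reads, CPU updates, and disk writes overlap in time, which is the point of the architecture.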
The single-machine large-scale graph data processing system of the present exemplary embodiment is further described below.
In this embodiment, the system exposes APIs based on a hybrid computation model. The APIs use a unified PIE+ interface that integrates the vertex-centric and subgraph-centric programming models: a user can parallelize a sequential graph algorithm under the subgraph-centric model to simplify parallel programming (inter-subgraph parallelism), and can further exploit vertex-centric parallelism inside a subgraph through a new interface. Notably, the hybrid model supports the inter-subgraph parallelism of the subgraph-centric model and the intra-subgraph parallelism of the vertex-centric model simultaneously, so multi-core resources are better utilized within limited memory and fragmentation of the input graph is avoided; moreover, it provides a unified interface from which users can select whatever best suits their application and graph.
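The application names the PIE+ interface but does not spell out its signatures; the sketch below is therefore an assumption modeled on the PIE (PEval/IncEval/Assemble) shape, extended with a vertex-level hook for intra-subgraph parallelism:

```python
from abc import ABC, abstractmethod

class PIEPlusProgram(ABC):
    """Assumed shape of a PIE+-style program; all names hypothetical."""

    @abstractmethod
    def peval(self, subgraph, messages_out):
        """Partial evaluation: run the sequential algorithm on one
        subgraph; emit messages for its boundary vertices."""

    @abstractmethod
    def inc_eval(self, subgraph, messages_in, messages_out):
        """Incremental evaluation: fold received messages into the
        subgraph and propagate any further changes."""

    @abstractmethod
    def assemble(self, subgraphs):
        """Aggregate all converged subgraphs into the final result."""

    def vertex_compute(self, vertex, subgraph):
        """Optional vertex-centric hook so the scheduler can also
        parallelize *within* a subgraph (assumed extension)."""
```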
In this embodiment, the system further includes a scheduler. The scheduler tracks and allocates threads in a thread pool, where each thread corresponds to a physical CPU core. It decides how to assign physical threads to virtual worker threads that perform (parallel) computation on subgraphs, and it makes proactive adjustments to support two levels of parallelism: when a thread becomes available, the scheduler either assigns it to a new compute unit by consuming the input queue, increasing inter-subgraph parallelism, or adds it to a running compute unit, increasing intra-subgraph parallelism.
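One way such a scheduler could look, sketched under stated assumptions (the thread accounting is simplified to a counter per running unit; the application does not give this code):

```python
import os
import queue
from concurrent.futures import ThreadPoolExecutor

class Scheduler:
    """Two-level scheduler sketch: a freed physical thread either starts
    a new compute unit (inter-subgraph parallelism) or is lent to a
    running unit (intra-subgraph parallelism)."""

    def __init__(self):
        self.n_cores = os.cpu_count() or 1          # one thread per core
        self.pool = ThreadPoolExecutor(max_workers=self.n_cores)
        self.workers = {}            # running unit id -> worker count

    def on_thread_free(self, input_q: queue.Queue):
        try:
            sg = input_q.get_nowait()
        except queue.Empty:
            sg = None
        if sg is not None:
            # A subgraph is pending: spend the thread on a new unit.
            self.pool.submit(self.run_unit, sg)
        elif self.workers:
            # Nothing pending: give the core to a running unit so it
            # can process more vertices in parallel.
            unit = max(self.workers, key=self.workers.get)
            self.workers[unit] += 1

    def run_unit(self, subgraph):
        self.workers[id(subgraph)] = 1
        # ... vertex-centric computation over `subgraph` goes here ...
        self.workers.pop(id(subgraph), None)
```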
In this embodiment, the storage management module includes a message storage unit; the data computation module transmits the messages generated by updating to the message storage unit. The message storage unit realizes message synchronization among the parallel compute units. Specifically, it is implemented as an in-memory data structure, which may be a compact variable-length array to improve space efficiency. Note that the space complexity of the message storage unit is closely tied to the partitioning strategy: the more boundary vertices/edges there are, the more space it consumes. Compared with the message-passing strategy of a multi-machine system, a message storage unit works more efficiently in a shared-memory environment.
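A hedged sketch of what the message storage unit could look like as a shared-memory structure; a per-subgraph list stands in for the compact variable-length array, and all names are assumptions:

```python
import threading
from collections import defaultdict

class MessageStore:
    """Shared-memory message store: one pending-message list per
    destination subgraph (standing in for the compact array)."""

    def __init__(self):
        self._buf = defaultdict(list)   # subgraph id -> [(vertex, value)]
        self._lock = threading.Lock()

    def put(self, dst_subgraph, vertex, value):
        with self._lock:
            self._buf[dst_subgraph].append((vertex, value))

    def drain(self, dst_subgraph):
        """Hand all pending messages to a subgraph and clear them."""
        with self._lock:
            return self._buf.pop(dst_subgraph, [])

    def empty(self):
        """Global stopping test helper: no cached messages anywhere."""
        with self._lock:
            return not self._buf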
In this embodiment, the storage management module further includes a state management unit; the state management unit stores the state information and updates it at specific times. The state management unit maintains a state machine that models the state of each subgraph. Specifically, it is implemented as a lightweight data structure: only a few states are maintained per subgraph, so its memory footprint is negligible.
Recording subgraph state with the state management unit is a low-cost convergence detection method: it needs only a flag list M to track message exchange among compute units, plus a lightweight state machine to model each compute unit's progress. Specifically, the state management unit maintains a flag list M with one flag per subgraph indicating whether that subgraph received any message in the current round of iteration: if at least one pending update for subgraph i is to be extracted from the message storage unit, M[i] is true; otherwise M[i] is false. In operation, a finite state machine models the progress of each subgraph, and the flags M[i] trigger its state transitions.
As shown in FIG. 3, the states of a subgraph are "active", "waiting for computation", "computing", "releasing" and "converged"; at any time a subgraph is in exactly one of these five states, where the first two mean the subgraph is on disk and the remaining three mean it is in memory. The initial state of each subgraph is "active", meaning it is waiting for the data loading module to load it into memory. When the subgraph is acquired by the data loading module, the state management unit sets its state to "waiting for computation", meaning the subgraph is resident in memory and waiting to be assigned a processing core. When the subgraph is transmitted to the data computation module, the state management unit sets its state to "computing", meaning it is being processed by a processing core. When the subgraph is transmitted to the data release module (i.e., the subgraph generated messages to be sent to other subgraphs in the current round of updating), the state management unit sets its state to "releasing". When the subgraph is written to the disk, the state management unit sets its state to "converged". When the current round of updating ends, the state management unit also sets the state of every subgraph participating in the next round back to "active" so that those subgraphs begin the next round. The whole system stops updating if and only if the current round of updating has ended and no message remains cached in the message storage unit.
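A sketch of the five-state machine and the flag list M, with method names assumed; end_of_round anticipates the "shortcut A" described below:

```python
from enum import Enum, auto

class SgState(Enum):
    ACTIVE = auto()        # on disk, waiting to be loaded
    WAIT_COMPUTE = auto()  # in memory, waiting for a processing core
    COMPUTING = auto()     # being processed by a processing core
    RELEASING = auto()     # being written back to disk
    CONVERGED = auto()     # on disk, no pending work

class StateManager:
    def __init__(self, n_subgraphs):
        self.state = [SgState.ACTIVE] * n_subgraphs
        self.M = [False] * n_subgraphs  # M[i]: subgraph i got messages

    def on_message(self, i):
        self.M[i] = True

    def end_of_round(self):
        """Reactivate exactly the subgraphs that received messages; the
        others stay CONVERGED (anticipating 'shortcut A' below)."""
        for i, got_msg in enumerate(self.M):
            if got_msg:
                self.state[i] = SgState.ACTIVE
                self.M[i] = False

    def finished(self, message_store_empty):
        """Global stopping test: round over, no cached messages."""
        return (message_store_empty
                and all(s == SgState.CONVERGED for s in self.state))
```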
Under certain conditions, the system can skip certain states in a round of computation without affecting correctness; that is, it can take "shortcuts" in the state transitions and avoid unnecessary computation and I/O.
As shown in FIG. 3, in this embodiment, when the current round of updating ends, the storage management module sets the state of every subgraph that received messages to "active" ("shortcut A"). To start a new round of incremental computation, subgraphs in the "converged" state would normally all be reset to "active"; but if M[i] of a subgraph is false at that moment (it received no messages), the subgraph can be kept in the "converged" state and excluded from the next round, entirely skipping the processing of subgraphs with no pending updates without affecting program correctness. Shortcut A is most useful when the input graph is poorly connected and some subgraphs are "isolated", and it effectively reduces I/O costs.
In this embodiment, when the subgraph is the last in the current round of updating, the data release module transmits it directly to the data loading module ("shortcut B"). When all subgraphs have completed the current round and a new round begins, a subgraph that is still "releasing", i.e., not yet fully saved to disk, can have its state set directly to "waiting for computation", so its next round of updating starts without a round trip through the disk. Shortcut B applies at the end of every round and effectively reduces I/O costs.
In this embodiment, when there is no change in the updated subgraph, the data computation module handles the subgraph's disposition itself rather than passing it through the data release module ("shortcut C"). When a subgraph finishes updating, if it is identical to its pre-computation state, the "releasing" state can be skipped and the subgraph set directly to "converged", since the on-disk copy is already up to date, effectively eliminating redundant disk writes.
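Pulling the three shortcuts together, a hedged transition helper reusing the SgState enum from the earlier sketch (the logic paraphrases the text above; nothing here is code from the application):

```python
def next_state(current, *, changed, last_in_round, got_messages):
    """Assumed transition helper; paraphrases shortcuts A, B and C."""
    if current is SgState.COMPUTING:
        if not changed:
            return SgState.CONVERGED      # shortcut C: skip "releasing"
        return SgState.RELEASING
    if current is SgState.RELEASING:
        if last_in_round:
            return SgState.WAIT_COMPUTE   # shortcut B: skip the disk
        return SgState.CONVERGED          # normal write-back path
    if current is SgState.CONVERGED:
        if got_messages:
            return SgState.ACTIVE         # reactivated for next round
        return SgState.CONVERGED          # shortcut A: stay converged
    return current
```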
In this embodiment, the data computation module includes an aggregation computation unit. When the current round of updating has ended and no message remains cached in the message storage unit, the aggregation computation unit calls a preset aggregation function to aggregate all the subgraphs, obtaining the updated large-scale graph data.
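A tiny sketch of this final aggregation step under an assumed signature; the combine callback stands in for the preset aggregation function:

```python
from functools import reduce

def assemble(subgraphs, combine):
    """Fold all converged subgraphs into one result; `combine` stands
    in for the preset aggregation function."""
    return reduce(combine, subgraphs, {})

# Usage: merge per-subgraph vertex labels into one mapping.
merged = assemble([{"a": 0}, {"b": 1}],
                  lambda acc, sg: {**acc, **sg})
print(merged)   # {'a': 0, 'b': 1}
```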
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiment and all such alterations and modifications as fall within the scope of the embodiments of the application.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or terminal that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or terminal. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article or terminal device comprising the element.
The foregoing has described in detail a single-machine-based large-scale graph data processing system provided by the present application. Specific examples are used herein to illustrate the principles and implementations of the application and to assist in understanding its method and core concepts. Meanwhile, since those skilled in the art may vary the specific embodiments and application scope in accordance with the ideas of the application, the content of this description should not be construed as limiting the application.

Claims (10)

1. The large-scale graph data processing system based on the single machine is characterized by comprising a data loading module, a data calculating module, a data releasing module, a storage management module and a magnetic disk; the disk stores large-scale graph data composed of a plurality of subgraphs; the storage management module stores state information corresponding to each sub-graph; in the initial state, the state of the subgraph is active;
the data loading module is used for acquiring the subgraph with active state from the disk and transmitting the subgraph to the data calculating module;
the data calculation module is used for updating the subgraph and transmitting a message generated by updating to the storage management module;
when the updated sub-graph has a change, the data calculation module is further used for transmitting the sub-graph to the data release module;
when the subgraph is not the last in the current round of updating, the data release module is used for writing the subgraph into the disk;
the storage management module is configured to set a state of the subgraph to be converged when the subgraph is written to the disk.
2. The system of claim 1, wherein the data computation module is further configured to write the subgraph to the disk when there is no change in the updated subgraph.
3. The system of claim 1, wherein the data release module is further configured to transmit the sub-graph to the data loading module when the sub-graph is the last of the current round of updates.
4. The system of claim 1, wherein the storage management module is further configured to set the state of the sub-graph of the received message to active when the current round of updating is completed.
5. The system of claim 1, wherein the data computation module is further configured to aggregate all of the subgraphs to obtain updated large-scale graph data when a current round of updating is completed and there is no message buffering in the storage management module.
6. The system of claim 1, wherein the storage management module comprises a message storage unit and a status management unit; the state management unit stores the state information;
the data calculation module is used for transmitting the message generated by updating to the message storage unit;
when the current round of updating is finished, the state management unit is used for setting the state of the sub-graph of the received message to be active.
7. The system of claim 6, wherein the data computing module comprises an aggregate computing unit;
and when the current round of updating is finished and the message cache does not exist in the message storage unit, the aggregation calculation unit is used for aggregating all the subgraphs to obtain updated large-scale graph data.
8. The system of claim 1, wherein the storage management module is further configured to set a state of the subgraph to wait for computation when the subgraph is acquired by the data loading module.
9. The system of claim 1, wherein the storage management module is further configured to set the state of the subgraph as being computed when the subgraph is transmitted to the data computation module.
10. The system of claim 1, wherein the storage management module is further configured to set the state of the sub-graph to be released when the sub-graph is transferred to the data release module.
CN202310695465.2A 2023-06-12 2023-06-12 Large-scale graph data processing system based on single machine Pending CN116680296A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310695465.2A CN116680296A (en) 2023-06-12 2023-06-12 Large-scale graph data processing system based on single machine

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310695465.2A CN116680296A (en) 2023-06-12 2023-06-12 Large-scale graph data processing system based on single machine

Publications (1)

Publication Number Publication Date
CN116680296A true CN116680296A (en) 2023-09-01

Family

ID=87790518

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310695465.2A Pending CN116680296A (en) 2023-06-12 2023-06-12 Large-scale graph data processing system based on single machine

Country Status (1)

Country Link
CN (1) CN116680296A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination