CN117149795A - Adaptive graph calculation updating method and system based on hybrid memory - Google Patents


Info

Publication number
CN117149795A
CN117149795A (application number CN202311197360.0A)
Authority
CN
China
Prior art keywords
graph
data
sub
vertex
edge
Prior art date
Legal status
Pending
Application number
CN202311197360.0A
Other languages
Chinese (zh)
Inventor
刘燕兵
李保珂
曹聪
袁方方
王大魁
张啸梁
Current Assignee
Institute of Information Engineering of CAS
Original Assignee
Institute of Information Engineering of CAS
Priority date
Filing date
Publication date
Application filed by Institute of Information Engineering of CAS filed Critical Institute of Information Engineering of CAS
Priority to CN202311197360.0A priority Critical patent/CN117149795A/en
Publication of CN117149795A publication Critical patent/CN117149795A/en
Pending legal-status Critical Current

Classifications

    • G06F16/23 Updating (Information retrieval; database structures for structured data)
    • G06F15/7821 Tightly coupled to memory, e.g. computational memory, smart memory, processor in memory (system on chip)
    • G06F9/544 Buffers; shared memory; pipes (interprogram communication)
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention discloses an adaptive graph computation updating method and system based on hybrid memory. When the directed graph data are stored in out-edge form, the vertex intervals and the corresponding out-edge data blocks are obtained by uniformly dividing the vertex range; when they are stored in in-edge form, the vertex intervals and the corresponding in-edge data blocks are obtained by the same uniform division. When a sub-graph is constructed, the corresponding in-edge and out-edge data blocks are used directly, avoiding a traversal of all data blocks. To improve the access efficiency of sub-graph data, the invention stores the in-edge data and the out-edge data of the graph in two NUMA nodes respectively. Further, a data-driven push-pull adaptive update strategy is adopted to optimize the message-update flow during iterative graph computation. The method solves the sub-graph construction and update-mode problems of hybrid-memory graph computation models and greatly improves graph computation efficiency.

Description

Adaptive graph calculation updating method and system based on hybrid memory
Technical Field
The invention belongs to the technical field of artificial intelligence, big data, and graph computation, and relates to an adaptive graph computation updating method and system based on hybrid memory.
Background
In computer science, the graph is one of the most complex and widely used data structures. It uses vertices V and edges E to represent relationships between objects and is a native expression of how things relate. In general, a graph is written as G = (V, E), where the vertex set V is a finite, non-empty set that can represent various objects and may contain each vertex's ID, value information, and other user-defined attributes, and the edge set E is a finite (possibly empty) set that can represent various association relationships between objects. Much real-world data can be represented naturally as graph data. For example, in the Web, pages can serve as vertex data and the hyperlinks within them as edges; in a traffic network, stations can be regarded as vertices and the routes between them as edges; in the social domain, user data can be regarded as vertices, and friendship and follow relationships between users as edges. With the support of graph theory, many practical problems can be solved effectively by means of graph algorithms. For example, web-page value assessment can be implemented with centrality algorithms; route planning with path-search algorithms; and social-group discovery with community-discovery algorithms. However, with the rapid development of the Internet and digitization technology, the scale of graph data grows exponentially and often follows a power-law distribution. Conventional graph algorithms struggle with load imbalance, frequent data exchange, inefficient caching, and similar problems, which makes it hard for them to meet ever-growing performance and functional requirements.
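As a minimal illustration of the G = (V, E) abstraction above (a sketch of ours, not part of the patent), a small web graph can be held as a vertex set with attributes plus an adjacency list of out-edges:

```python
# Web pages as vertices (with a user-defined attribute), hyperlinks as directed edges.
pages = {0: "home.html", 1: "about.html", 2: "blog.html"}   # vertex set V
links = [(0, 1), (0, 2), (2, 0)]                             # edge set E

# Adjacency-list view: out-neighbours of each vertex.
adj = {v: [] for v in pages}
for src, dst in links:
    adj[src].append(dst)

print(adj[0])  # [1, 2] -> home.html links to about.html and blog.html
```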
In recent years, with the rapid development of digital technologies such as artificial intelligence, high-performance graph computing technology has gained widespread attention from research personnel in domestic and foreign industries and academic circles.
Graph computation is the abstract execution of classical graph algorithms on a given hardware environment, using various iteration paradigms and optimization techniques, so as to process large-scale graph data efficiently. It plays an extremely important role in fields ranging from relationship analysis and product recommendation to fraud detection. In recent years, with the rapid development of artificial-intelligence technology, graph data have grown exponentially in scale, so storing and processing large-scale graph data incurs a huge memory footprint. Besides their sheer size, graph data tend to exhibit a power-law distribution. Kumar P et al. point out that this inherent imbalance leads to a high memory-to-computation ratio, load imbalance, and parallel inefficiency in graph computation. To cope with these problems, research on improving the efficiency of graph computation models has flourished. Zhang et al. built a single-machine in-memory graph computation model with large-capacity memory to avoid high disk I/O overhead; distributed in-memory graph computation models use graph-partitioning techniques in clusters and process multiple sub-graphs in parallel to improve the scalability of single-machine graph computation systems. With significant progress in persistent memory (PMEM) research, this new memory device together with conventional memory (DRAM) constitutes the hybrid memory system (HMS). HMS has become a viable platform for data-centric, efficient graph computation.
In summary, current memory-based graph computation models can be divided into: single-machine graph computation models based on traditional memory, graph computation models based on distributed shared memory, and graph computation models based on hybrid memory.
(I) Single-machine graph calculation model based on traditional memory
A single-machine graph computation model based on traditional memory builds a lightweight graph computation model on top of large-capacity memory. Because the whole graph fits in memory, such models are convenient to program efficiently and make it easy to parallelize graph algorithms. The Ligra model proposed by Shun et al. provides two typical programming interfaces, for edge mapping and vertex mapping respectively, which makes it well suited to implementing graph-traversal algorithms over vertex subsets. In addition, it applies parallel traversal to breadth-first-search (BFS)-based algorithms, where the programming interface can adaptively switch between Push and Pull computing modes according to the density of the graph data. GraphMat maps vertex-centric programs onto efficient and scalable sparse-matrix operations; under multi-core, multi-threaded parallelism, this further improves the efficiency of the graph computation model. The GraphIt model proposed by Zhang et al. separates graph computation from graph scheduling and can process graph data of different structures and sizes. Because their DRAM is large enough, these systems avoid costly disk I/O overhead.
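The adaptive Push-Pull switching that Ligra popularized can be sketched as a direction-adaptive BFS (illustrative code of ours; the function names and the `threshold` parameter are assumptions, not Ligra's actual API):

```python
def bfs_adaptive(adj_out, adj_in, n, source, threshold=0.05):
    """BFS that picks push (top-down) for sparse frontiers and pull
    (bottom-up) for dense ones, in the spirit of adaptive edge mapping."""
    dist = [-1] * n
    dist[source] = 0
    frontier = {source}
    level = 0
    while frontier:
        level += 1
        nxt = set()
        if len(frontier) / n < threshold:          # sparse frontier: push along out-edges
            for u in frontier:
                for v in adj_out[u]:
                    if dist[v] == -1:
                        dist[v] = level
                        nxt.add(v)
        else:                                       # dense frontier: each vertex pulls from in-edges
            for v in range(n):
                if dist[v] == -1 and any(u in frontier for u in adj_in[v]):
                    dist[v] = level
                    nxt.add(v)
        frontier = nxt
    return dist
```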
(II) Graph calculation model based on distributed shared memory
The design goal of this class of models is a high-level abstraction like MapReduce, so that iterative graph algorithms with sparse computational dependencies can be executed efficiently while guaranteeing strong consistency and efficient parallel computation over the graph data. The open-source graph computing framework proposed by the Select laboratory of CMU (Carnegie Mellon University) is developed in C++; it is a large-scale parallel computing framework for streaming graph data that runs efficiently in multi-processor cluster environments. Rong Chen et al. analyzed the power-law character of graph data and adopted a hybrid partitioning strategy: vertex-cut partitioning for high-degree vertices reduces the generation of mirror vertices and so relieves their heavy computation load, while edge-cut partitioning for low-degree vertices keeps their computation as local as possible. Tim et al. proposed the Grade model, which uses a resource-attribution method to build fine-grained, uniform workload-level and system-level performance views from monitoring logs and the application, and can automatically identify resource bottlenecks and common performance problems. Such models can process multiple sub-graphs in parallel, improving parallel computing capability beyond a single-machine graph computation system.
(III) Graph calculation model based on hybrid memory
This class of models aims to build an efficient single-machine in-memory graph computation model on the emerging hybrid memory system, reducing dependence on traditional DRAM and its cost while improving the scalability of traditional in-memory graph computation models. Huang et al. process evolving graphs with a hybrid storage format that combines edge lists and adjacency lists: it first appends updates to an edge log in DRAM and stores the updated graph data in edge-list format, and it uses many adjacency lists to store old data (edge data periodically archived from the edge log) to support efficient graph storage and querying. Wang R et al. introduced XPgraph, an efficient PMEM-based graph storage model, proposing a large-scale dynamic graph storage model with hierarchical vertex-buffer management and NUMA-friendly graph data access built on a PMEM-friendly XPline data access scheme. Li et al. built the PMEM-based graph computation model EPGraph, placing vertices and edges in DRAM and PMEM respectively. To reduce random accesses to vertices, they use DRAM as a data buffer; they also adopt a degree-aware graph-data tiering method to improve graph data access efficiency.
Prior technical schemes mainly comprise single-machine graph computation models based on traditional memory, graph computation models based on distributed shared memory, and graph computation models based on the emerging hybrid memory. The existing mainstream models of each kind have certain defects, as follows:
1. Single-machine graph computation model based on traditional memory: such a model avoids high disk I/O overhead and makes it convenient to optimize graph-algorithm execution with multithreading. It can process graph data of a certain scale, but it depends heavily on large-capacity DRAM, which makes exponentially growing graph data hard to process; and although multi-threaded parallelism can optimize graph algorithms, single-machine computing power is limited after all. These factors severely limit the application of resource-constrained single-machine in-memory graph computation systems.
2. Graph computation model based on distributed shared memory: such a model can process ultra-large-scale graph data with distributed clusters, reduces the hardware requirements of a single computing node, and improves scalability and parallel computing capability over single-machine in-memory models. However, the inherent power-law distribution of graph data causes load imbalance during graph partitioning and task imbalance during graph computation, and the communication overhead between distributed computing nodes becomes the model's performance bottleneck.
3. Graph computation model based on hybrid memory: such a model exploits PMEM's large capacity, durability, and byte addressability to reduce dependence on traditional DRAM. However, the read/write performance gap between DRAM and PMEM leads to low graph data access efficiency in HMS; in particular, existing hybrid-memory graph computation models do not consider the memory heterogeneity between NUMA nodes, namely that local PMEM data access (Local PMEM Access) is far less efficient than remote memory data access (Remote Memory Access), so optimal performance cannot be obtained.
Disclosure of Invention
Aiming at the problems in the prior art, the invention provides an adaptive graph computation updating method and system based on hybrid memory.
The method mainly solves the sub-graph construction and update-mode problems of existing hybrid-memory graph computation models. Sub-graph construction is a fundamental step in a graph computation model, and the update mode of the graph data G directly affects the efficiency of graph computation. The method focuses on accelerating sub-graph construction with a Dual-Block storage scheme, which effectively supports a data-driven adaptive Push-Pull update model. Specifically, in HMS, the in-edge (InBlock) and out-edge (OutBlock) data of the sub-graphs are stored using the large capacity of PMEM. When the directed graph data are first stored in out-edge form, the vertex range is uniformly divided into intervals, and the out-edge data block corresponding to vertex interval [i] is OutBlock[i]; similarly, when the directed graph data are stored in in-edge form, the in-edge data block corresponding to vertex interval [i] is InBlock[i]. When constructing sub-graph G_i, the in-edge data block InBlock[i] and out-edge data block OutBlock[i] are used directly, which avoids traversing all InBlocks or OutBlocks. To improve the access efficiency of sub-graph data, the invention stores the in-edge data InBlock and out-edge data OutBlock of the graph on two NUMA (Non-Uniform Memory Access) nodes (Node0, Node1) respectively. Second, a data-driven push-pull adaptive update strategy (Data-Driven Adaptive Push-Pull Updating Model) is adopted to optimize the message-update flow in the iterative graph computation process.
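The Dual-Block layout described above can be sketched as follows (an illustrative Python sketch; the function name and the uniform interval width are our assumptions, not the patent's prescribed implementation):

```python
def build_dual_blocks(edges, num_vertices, P):
    """Group each directed edge into OutBlock[i] by the interval of its
    source vertex and into InBlock[i] by the interval of its destination
    vertex, over P uniform vertex intervals."""
    span = (num_vertices + P - 1) // P        # uniform interval width
    interval = lambda v: v // span
    out_block = [[] for _ in range(P)]
    in_block = [[] for _ in range(P)]
    for src, dst in edges:
        out_block[interval(src)].append((src, dst))
        in_block[interval(dst)].append((src, dst))
    return out_block, in_block
```

Sub-graph G_i is then assembled directly from OutBlock[i] and InBlock[i], with no scan over the other P-1 blocks.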
Finally, a degree-aware dynamic data-migration strategy performs swap-in/swap-out operations between DRAM and PMEM for the sub-graphs likely to be used in the next iteration round.
In general, the method accelerates sub-graph construction by storing each sub-graph's InBlock[i] and OutBlock[i] as dual-block sub-graph data. This graph storage scheme effectively supports the data-driven adaptive Push-Pull update model, and the adaptive computation mode directly improves graph computation efficiency, making the method a key enabler for efficiently processing large-scale graph data.
The method can significantly improve the efficiency of sub-graph construction, improving the model's overall efficiency at an acceptable cost in data storage space. Unlike previous work, the proposed model combines a Dual-Block graph storage representation with a data-driven adaptive push-pull strategy, and optimizes graph data access efficiency in the HMS through graph-data tiering and dynamic graph-data migration strategies. Extensive experimental results also indicate that the proposed NPGraph model provides better performance.
The technical scheme of the invention is as follows:
An adaptive graph computation updating method based on hybrid memory comprises the following steps:
1) Two NUMA nodes, Node0 and Node1, are set up, each adopting a hybrid memory system (HMS) comprising persistent memory (PMEM) and DRAM; Node0 hosts a preprocessing module, a sub-graph construction module, and a sub-graph update module;
2) The preprocessing module divides the vertex set V and edge set E of the graph data G=(V,E) into P disjoint vertex intervals V_1~V_P and edge blocks E_1~E_P, used to generate P sub-graphs G_1~G_P, where for i=1~P the i-th vertex interval V_i contains the vertex subset of the sub-graph G_i to be generated, and the edge block E_i contains the out-edge data block OutBlock[i] of the sub-graph G_i to be generated, whose source vertices lie in vertex interval V_i, and the in-edge data block InBlock[i], whose destination vertices lie in vertex interval V_i; the sub-graph data in out-edge form are then loaded into Node0, and the sub-graph data in in-edge form into Node1;
3) The sub-graph construction module computes the density of each sub-graph to be generated from the out-edge data of each sub-graph read from Node0, sets a proportion parameter δ from a statistical analysis of the vertex-degree information of graph data G, and migrates the δ·P densest sub-graphs' data to be generated from the PMEM of Node0's HMS to the DRAM of Node0's HMS; it reads the in-edge data of the corresponding sub-graphs from Node1, computes the density of each sub-graph to be generated, and, according to the proportion parameter δ, migrates the δ·P densest sub-graphs' data to be generated from the PMEM of Node1's HMS to the DRAM of Node1's HMS; the sub-graph construction module then generates the i-th sub-graph G_i=(V_i,E_i) from the i-th sub-graph data in Node0's local DRAM and the i-th sub-graph data in Node1's DRAM, accessed remotely;
4) The sub-graph update module iteratively updates each sub-graph G_i as follows: at the current iteration, the activity ε[i] of sub-graph G_i is computed to determine the update mode adopted by G_i; a threshold function based on ε[i] and θ is set, which returns the Push mode when the activity ε[i] of sub-graph G_i is smaller than the set threshold θ and the Pull mode otherwise; in Push mode, each vertex v of sub-graph G_i holds its target vertex set D_v; in Pull mode, each vertex v of sub-graph G_i holds its source vertex set S_v; a Push-mode graph algorithm pushes the updated value of vertex v to D_v, while a Pull-mode graph algorithm pulls source-vertex values from S_v to update the value of vertex v.
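Step 4's threshold-driven mode selection could look like the following sketch (our illustration; the update rule shown is a generic shortest-path-style relaxation, not the patent's specific algorithm, and select_model is an assumed name for the threshold function):

```python
def select_model(activity, theta):
    # Push when the fraction of active vertices is below theta, else Pull.
    return "push" if activity < theta else "pull"

def iterate(sub, values, active, theta=0.3):
    """One update round over sub-graph G_i.  sub["out"][v] plays the role
    of the target set D_v, sub["in"][v] the role of the source set S_v."""
    activity = len(active) / len(sub["vertices"])
    nxt_values, nxt_active = dict(values), set()
    if select_model(activity, theta) == "push":
        for v in active:                       # push v's updated value to D_v
            for d in sub["out"][v]:
                if values[v] + 1 < nxt_values[d]:
                    nxt_values[d] = values[v] + 1
                    nxt_active.add(d)
    else:
        for v in sub["vertices"]:              # pull source values from S_v
            for s in sub["in"][v]:
                if values[s] + 1 < nxt_values[v]:
                    nxt_values[v] = values[s] + 1
                    nxt_active.add(v)
    return nxt_values, nxt_active
```

A sparse active set thus touches only its own out-edges (push), while a dense one lets every vertex scan its in-edges once (pull), avoiding contention on the targets.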
Further, the specific implementation of step 1) is as follows: first, the out-edge-form data of graph data G are stored in a first storage unit and partitioned by vertex interval, yielding the vertex intervals and the out-edge data block corresponding to each, where the out-edge data block corresponding to the i-th vertex interval [i] is OutBlock[i]; the in-edge-form data of graph data G are stored in a second storage unit and partitioned by vertex interval, yielding the vertex intervals and the in-edge data block corresponding to each, where the in-edge data block corresponding to the i-th vertex interval [i] is InBlock[i].
Further, the first storage unit and the second storage unit are solid-state drives or mechanical hard disks.
Further, in step 2), the method for constructing the sub-graphs comprises: 1) Node0 and Node1 simultaneously create the state data and attribute data of all sub-graph vertices in DRAM; 2) the out-edge data blocks OutBlock of all sub-graphs to be generated are read into Node0's local PMEM, and the in-edge data blocks InBlock of all sub-graphs to be generated are read into Node1's local PMEM; 3) Node0 and Node1 each compute the density R_i of every sub-graph to be generated; 4) Node0 and Node1 each set the proportion parameter δ from a statistical analysis of the vertex-degree information of graph data G and migrate the δ·P densest sub-graphs' data to be generated from the PMEM of their HMS to its DRAM; 5) the i-th sub-graph G_i=(V_i,E_i) is generated from the i-th sub-graph data in Node0's DRAM or PMEM and the i-th sub-graph data in Node1's DRAM or PMEM.
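Sub-steps 3) and 4) above, ranking the candidate sub-graphs by density and keeping the δ·P densest resident in DRAM, can be sketched as follows (illustrative; the function name and the tie-breaking of equal densities are our assumptions):

```python
def pick_dram_resident(blocks, degrees, delta):
    """Rank the P candidate sub-graphs by density
    R_i = sum(d_v for v in V_i) / |V_i| and return the indices of the
    delta*P densest ones, which are migrated from PMEM to DRAM."""
    def density(vertices):
        return sum(degrees[v] for v in vertices) / len(vertices)
    ranked = sorted(range(len(blocks)), key=lambda i: density(blocks[i]), reverse=True)
    k = int(delta * len(blocks))
    return set(ranked[:k])
```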
Further, the density of sub-graph G_i is R_i = (Σ_{v∈V_i} d_v) / |V_i|, where d_v denotes the degree of vertex v and |V_i| denotes the number of vertices in sub-graph G_i.
Further, the DRAM access rate is δ = N_D / (N_D + N_P), where N_D is the number of DRAM accesses and N_P the number of PMEM accesses, with 0 ≤ δ ≤ 1.
Further, the data of each sub-graph include graph structure data, attribute data, and state data; in the graph structure data, vertices are marked as Row and edges as Col; attribute data and state data are created for each vertex; the attribute data of a vertex are marked D_curr and D_next, and its state data S_curr and S_next.
Further, sub-graph G_i's out-edge data block OutBlock[i] is stored in Node0 after CSR compression, and sub-graph G_i's in-edge data block InBlock[i] is stored in Node1 after CSR compression.
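The CSR (compressed sparse row) compression applied to each block can be sketched as follows (our illustrative code; the Row/Col names follow the vertex and edge marking above):

```python
def to_csr(edges, num_vertices):
    """Build CSR arrays: Row[v] is the offset of vertex v's first out-edge
    in Col, so v's out-neighbours are Col[Row[v]:Row[v+1]]."""
    row = [0] * (num_vertices + 1)
    for src, _ in edges:
        row[src + 1] += 1                    # count out-degree of each vertex
    for v in range(num_vertices):
        row[v + 1] += row[v]                 # prefix sums -> offsets
    col = [0] * len(edges)
    cursor = list(row[:-1])                  # next free slot per vertex
    for src, dst in edges:
        col[cursor[src]] = dst
        cursor[src] += 1
    return row, col
```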
An adaptive graph computation updating system based on hybrid memory, characterized by comprising two NUMA nodes, Node0 and Node1; each NUMA node adopts a hybrid memory system (HMS) comprising persistent memory (PMEM) and DRAM; Node0 hosts a preprocessing module, a sub-graph construction module, and a sub-graph update module;
Node0 is used to store the sub-graph data in out-edge form, and Node1 to store the sub-graph data in in-edge form;
the preprocessing module is configured to segment a vertex set V and an edge set E of graph data g= (V, E) into P disjoint vertex intervals V 1 ~V p Sum edge block E 1 ~E p For generating P sub-graphs G 1 ~G P The method comprises the steps of carrying out a first treatment on the surface of the Wherein i=1 to P, and the ith vertex section V i The included vertex subset is the sub-graph G to be generated i Vertex set, edge block E i Including sub-graph G to be generated i Out block i]And an in-edge data block InBlock i];
The sub-graph construction module is configured to compute the density of each sub-graph to be generated from the out-edge data of each sub-graph read from Node0, set a proportion parameter δ from a statistical analysis of the vertex-degree information of graph data G, and migrate the δ·P densest sub-graphs' data to be generated from the PMEM of Node0's HMS to the DRAM of Node0's HMS; it reads the in-edge data of the corresponding sub-graphs from Node1, computes the density of each sub-graph to be generated, and, according to the proportion parameter δ, migrates the δ·P densest sub-graphs' data to be generated from the PMEM of Node1's HMS to the DRAM of Node1's HMS; it then generates the i-th sub-graph G_i=(V_i,E_i) from the i-th sub-graph data in Node0's DRAM and the i-th sub-graph data in Node1's DRAM;
The sub-graph update module is used to iteratively update each sub-graph G_i as follows: at the current iteration, the activity ε[i] of sub-graph G_i is computed to determine the update mode adopted by G_i; a threshold function based on ε[i] and θ is set, which returns the Push mode when the activity ε[i] of sub-graph G_i is smaller than the set threshold θ and the Pull mode otherwise; in Push mode, each vertex v of sub-graph G_i holds its target vertex set D_v; in Pull mode, each vertex v of sub-graph G_i holds its source vertex set S_v; a Push-mode graph algorithm pushes the updated value of vertex v to D_v, while a Pull-mode graph algorithm pulls source-vertex values from S_v to update the value of vertex v.
The invention has the following advantages:
The invention is a new and competitive hybrid-memory graph computation model following the current mainstream models, namely single-machine graph computation models based on traditional memory, graph computation models based on distributed shared memory, and graph computation models based on hybrid memory, and it effectively addresses the problems of all three. In short, single-machine models based on traditional memory depend heavily on large-capacity DRAM and have limited computing power, making large-scale graph data hard to process; models based on distributed shared memory easily suffer load imbalance during graph partitioning, and communication overhead between distributed computing nodes becomes their performance bottleneck; and in models based on the emerging hybrid memory, the read/write performance gap between DRAM and PMEM makes remote data access overhead the main performance bottleneck. Against these problems, the proposed scheme is a novel hybrid-memory graph computation model that resolves low data-processing capacity, low computing capacity, low overall cost-effectiveness, and low deployment efficiency. In terms of advantages, the invention effectively combines the Dual-Block graph storage representation with the data-driven adaptive push-pull strategy, and optimizes graph data access efficiency in the HMS through graph-data tiering and dynamic graph-data migration strategies; experimentally, it achieves optimal performance and a better balance between efficient graph data access and efficient graph computation compared with the current state-of-the-art models.
Drawings
FIG. 1 is a flow chart of a Dual-Block diagram storage representation method.
FIG. 2 is a flow chart of sub-graph construction of a sub-graph data layering strategy based on Dual-Block graph storage representation.
Fig. 3 is a Push mode and a Pull mode;
(a) A Push mode, and (b) a Pull mode.
FIG. 4 is a diagram of a forward update mode and a backward update mode;
(a) the original graph G, (b) the CSR-based push mode (forward update mode), and (c) the CSR-based pull mode (backward update mode).
Fig. 5 is a specific flow chart of an adaptive update strategy.
FIG. 6 is a diagram computation model framework for NUMA-based efficient hybrid memory.
FIG. 7 is a graph showing execution times of different update policies;
(a) is the execution time of the different update modes in the facebook, (b) is the execution time of the different update modes in the soc-LiveJournal, (c) is the execution time of the different update modes in the Twitter-2010, (d) is the execution time of the different update modes in the Friendster, and (e) is the execution time of the different update modes in the Yahoo Web.
FIG. 8 is a graph of the multithreaded execution time of PageRank and WCC on Friendster.
FIG. 9 is a graph of execution time of different models on five different graph datasets;
(a) is the execution time of the different models in the facebook, (b) is the execution time of the different models in the soc-LiveJournal, (c) is the execution time of the different models in the Twitter-2010, (d) is the execution time of the different models in the Friendster, and (e) is the execution time of the different models in the Yahoo Web.
Detailed Description
The invention will now be described in further detail with reference to the accompanying drawings, which are given by way of illustration only and are not intended to limit the scope of the invention.
The invention provides an adaptive graph computation updating method and system based on hybrid memory, a highly competitive model following the single-machine graph computation model on traditional memory, the graph computation model on distributed shared memory, and the graph computation model on emerging hybrid memory. First, a Dual-Block storage mode is constructed based on the large capacity, persistence, and byte addressability of PMEM in the HMS to accelerate the subgraph construction process; at the same time, this storage mode effectively supports a data-driven adaptive push-pull (Push-Pull) update model. Next, an adaptive update strategy (Adaptive Push-Pull Update Strategy, Algorithm 1) selects the corresponding computing mode: the push mode (Push Model, Algorithm 2) or the pull mode (Pull Model, Algorithm 3). Then, with the help of these two modes, the push-based graph algorithm (Algorithm 4) is executed in push mode, or the pull-based graph algorithm (Algorithm 5) is executed in pull mode. The details of the specific technical scheme are described in four parts: (1) the Dual-Block graph storage representation method, (2) the sub-graph data layering strategy based on the Dual-Block graph storage representation, (3) the adaptive push-pull update strategy, and (4) the NPGraph system framework.
(1) First, the Dual-Block graph storage representation method. Similar to the scheme proposed by Li et al., NPGraph partitions graph G=(V,E) into P subgraphs G_1~G_P. That is, vertex set V and edge set E are partitioned into P disjoint vertex intervals V_1~V_P and edge blocks E_1~E_P. To reduce storage space, graph data are typically stored in the Compressed Sparse Row (CSR) format. The Dual-Block graph representation method is implemented as follows: 1) in the preprocessing stage, both the out-edge form and the in-edge form of the graph data are stored on a solid-state or mechanical hard disk; 2) the graph data in these two storage formats are then partitioned into subgraphs according to the vertex intervals and the corresponding edge data; 3) the sub-graph data are then compressed with the CSR method and stored on the solid-state or mechanical hard disk. The flow of the Dual-Block graph storage representation method is shown in FIG. 1.
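The Dual-Block preprocessing keeps the same graph in two CSR layouts, one keyed by out-edges (OutBlock) and one by in-edges (InBlock). A minimal Python sketch of that idea follows; the function names and the toy 4-vertex graph are illustrative, not the patent's actual code:

```python
def build_csr(num_vertices, edges):
    """Build a CSR (Row, Col) pair from a directed edge list.

    Col[Row[v] : Row[v+1]) lists the out-neighbours of vertex v.
    """
    row = [0] * (num_vertices + 1)
    for src, _dst in edges:
        row[src + 1] += 1
    for v in range(num_vertices):          # prefix sums -> offsets
        row[v + 1] += row[v]
    col = [0] * len(edges)
    cursor = list(row)                     # next free slot per vertex
    for src, dst in edges:
        col[cursor[src]] = dst
        cursor[src] += 1
    return row, col

def dual_block(num_vertices, edges):
    """Store the graph twice: out-edge CSR (push) and in-edge CSR (pull)."""
    out_row, out_col = build_csr(num_vertices, edges)                      # OutBlock
    in_row, in_col = build_csr(num_vertices, [(d, s) for s, d in edges])   # InBlock
    return (out_row, out_col), (in_row, in_col)

# Tiny example: 4 vertices, edges 0->1, 0->2, 2->3
(out_row, out_col), (in_row, in_col) = dual_block(4, [(0, 1), (0, 2), (2, 3)])
```

Keeping both orientations costs twice the edge storage but lets push updates scan OutBlock and pull updates scan InBlock sequentially.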
(2) Next, the sub-graph data layering strategy based on the Dual-Block graph storage representation. Constrained by the asymmetric scalability between DRAM and PMEM in the HMS, NPGraph analyzes the data access rate of the DRAM. According to the execution flow, the graph attribute data and state data of the in-memory graph structure mentioned in FIG. 1 (D_curr, D_next, S_curr and S_next) should be placed in DRAM. Suppose the number of accesses to DRAM is N_D and the number of accesses to PMEM is N_P; the DRAM access rate R_D can then be expressed as:

R_D = N_D / (N_D + N_P)    (1)

where, according to the execution flow of the in-memory graph structure mentioned in FIG. 2, N_D and N_P can be expressed in terms of the vertex degrees as equations (2) and (3), in which d_{v_i} denotes the degree of vertex v_i, N_D denotes the number of DRAM accesses in one iteration, and N_P likewise denotes the number of PMEM accesses.

The density of a subgraph G_i=(V_i,E_i) is defined as

R_i = (Σ_{v∈V_i} d_v) / |V_i|    (4)

where R_i denotes the density of subgraph G_i, d_v denotes the degree of vertex v, and |V_i| denotes the number of vertices in subgraph G_i.
According to formula (4), from the degree of each vertex v in subgraph G_i=(V_i,E_i) and the vertex count |V_i|, the density R_i of every subgraph can be calculated, and the subgraphs are then sorted by R_i in descending order. Based on the statistical analysis of vertex-degree information of graph data by Li et al., the proportion parameter loaded into memory is set to δ=0.2; that is, the densest 20% of the sub-graph data in PMEM are loaded into DRAM, while the remaining 80% of the sub-graph data stay in PMEM.
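The density ranking and the δ-based split described above can be sketched as follows. The density formula used here (average degree, i.e. sum of vertex degrees divided by vertex count) is this sketch's reading of equation (4), and all names and numbers are illustrative:

```python
def density(degrees):
    """Density of a sub-graph: sum of vertex degrees / vertex count."""
    return sum(degrees) / len(degrees)

def split_by_density(subgraph_degrees, delta=0.2):
    """Rank sub-graphs by density (descending) and place the densest
    `delta` fraction in DRAM; the rest stay in PMEM."""
    order = sorted(range(len(subgraph_degrees)),
                   key=lambda i: density(subgraph_degrees[i]),
                   reverse=True)
    k = int(delta * len(order))
    return order[:k], order[k:]    # (dram_ids, pmem_ids)

# Five sub-graphs described by per-vertex degree lists (illustrative numbers)
subs = [[1, 1], [9, 8, 7], [2, 2, 2], [5, 5], [1, 2, 3]]
dram_ids, pmem_ids = split_by_density(subs, delta=0.2)
```

With δ=0.2 only the single densest of the five sub-graphs is promoted to DRAM, matching the 20%/80% split in the text.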
With sub-graph data layering, sub-graph structure data are selectively migrated from PMEM to DRAM according to the density R_i and the proportion parameter δ. The DRAM access rate of equation (1) then becomes

R'_D = (N_D + δ·N_P) / (N_D + N_P)    (5)

where δ (0 ≤ δ ≤ 1) denotes the fraction of sub-graph data loaded into DRAM, and R'_D denotes the DRAM access ratio when graph data are placed hierarchically between DRAM and PMEM.
Equations (2)–(5) demonstrate the effectiveness of the density-based hierarchical placement strategy in the HMS. Based on the Dual-Block graph representation and the hierarchical placement strategy, NPGraph can achieve better performance in the HMS. Furthermore, since remote memory access is faster than local PMEM access, NPGraph converts local PMEM data accesses into remote memory accesses when data are accessed across NUMA nodes.
In the sub-graph construction stage, the out-edge sub-graph data are loaded into Node0 and the in-edge sub-graph data are loaded into Node1; the out-edge data of each sub-graph are then read from Node0 and the in-edge data from Node1 to complete the construction of the sub-graph. The specific process is as follows: 1) Node0 and Node1 simultaneously create the state data and attribute data of all sub-graph vertices in DRAM. 2) If creation succeeds, Node0 reads the OutBlock data of all subgraphs to be generated into its local PMEM; meanwhile, Node1 reads all InBlock data of the subgraphs to be generated into its local PMEM. 3) On Node0 and Node1, the density R_i of each subgraph to be generated is calculated according to formula (4) and sorted in descending order. 4) On Node0 and Node1, according to formula (4) and the memory-loading proportion parameter δ, the densest subgraphs stored in PMEM are migrated to the local DRAM. 5) The construction of each sub-graph G_i can then be performed iteratively: the OutBlock[i] data block is read from the DRAM or PMEM of Node0, and the InBlock[i] data block is read from the DRAM or PMEM of Node1 via remote memory access, completing the construction of sub-graph G_i. The flow of the sub-graph data layering strategy based on the Dual-Block graph storage representation in the sub-graph construction stage is shown in FIG. 2.
(3) Third, the adaptive push-pull update strategy. Due to the power-law distribution of graph data, asymmetric convergence commonly exists in iterative graph computation: sparse subgraphs typically converge quickly while dense subgraphs converge slowly. As shown in FIG. 3, in the push model each vertex scatters (writes) its changes to its neighbors through its outgoing edges. In contrast, in the pull model each vertex collects (reads) information from its in-edge neighbors and then updates its own value with the collected information.
In DRAM, the constructed sub-graph data comprise graph structure data, attribute data, and state data. The vertices and edges of the graph structure data are labeled Row and Col; to facilitate execution of the graph algorithm, attribute data and state data are also created for the vertices. Since the attribute and state data of the vertices correspond one-to-one with the vertices, plain arrays can represent them directly. The attribute data of a vertex before the current iteration are denoted D_curr and after the current iteration D_next; the state data of a vertex before the current iteration are denoted S_curr and after the current iteration S_next. Different graph algorithms, such as PageRank or WCC, can thus be executed iteratively on each constructed subgraph.
As shown in FIG. 4, each subgraph G_i=(V_i,E_i) is associated with both a forward and a backward mode (Forward and Backward Manner). After CSR compression of subgraph G_i, the out-edge storage form suits push (Push) mode updates and is denoted the forward mode (Forward Manner), as shown in FIG. 4(b); the in-edge storage form suits pull (Pull) mode updates and is denoted the backward mode (Backward Manner), as shown in FIG. 4(c).
As shown in FIG. 4(a), the original graph G with 6 vertices and 13 edges is partitioned into G_0=(V_0,E_0) and G_1=(V_1,E_1), where V_0={0,1,2} and V_1={3,4,5}. Its forward (Forward Manner) and backward (Backward Manner) modes are shown in FIG. 4(b) and (c). The execution flow starts from the active vertices, each of which either updates its out-neighbors' data or updates its own data. In the forward mode of FIG. 4(b), there are two execution flows during one iteration, represented by solid and dashed arrows respectively. Specifically, since S_curr[0]=1 and S_curr[5]=1, v_0 and v_5 are active vertices. Vertex v_0 has the out-edge set E_0=Col[Row[0],Row[1])={(0,1),(0,2)}, and vertex v_5 has the out-edge set E_5=Col[Row[5],sizeof(Row))={(5,2),(5,3)}. In the push model, the entries D_next[1,2,3] and S_next[1,2,3] corresponding to v_1, v_2 and v_3 will be updated in the subsequent process. In the backward mode, however, vertices v_0 and v_5 must update their own data by collecting information from their in-edge neighbors. Specifically, v_0 needs to collect the information D_curr[3] and S_curr[3] from v_3 and update its own D_next[0] and S_next[0]; similarly, vertex v_5 needs to collect information from the source vertices v_2 and v_4 and then update its D_next[5] and S_next[5].
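The accesses in this worked example all follow E_v = Col[Row[v], Row[v+1]). A small sketch using only the edges the passage names (the full 13-edge graph is not listed, so these Row/Col arrays cover just that fragment):

```python
# CSR arrays for just the edges the example names (a fragment of the
# 13-edge graph): 0->1, 0->2, 2->5, 3->0, 4->5, 5->2, 5->3.
fwd_row = [0, 2, 2, 3, 4, 5, 7]   # Forward (out-edge) offsets, Row
fwd_col = [1, 2, 5, 0, 5, 2, 3]   # out-neighbour targets, Col
bwd_row = [0, 1, 2, 4, 5, 5, 7]   # Backward (in-edge) offsets
bwd_col = [3, 0, 0, 5, 5, 2, 4]   # in-neighbour sources

def out_edges(v):
    """E_v = Col[Row[v] : Row[v+1]) -- forward / push direction."""
    return fwd_col[fwd_row[v]:fwd_row[v + 1]]

def in_edges(v):
    """Sources feeding v -- backward / pull direction."""
    return bwd_col[bwd_row[v]:bwd_row[v + 1]]
```

As in the text, the active vertex v_0 pushes along out_edges(0) = [1, 2] in forward mode, while in backward mode it pulls from in_edges(0) = [3].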
Obviously, the push and pull update models suit different scenarios, depending on the number of active vertices and edges in subgraph G_i:

sum[i] = Σ_{v∈V_i} d_v    (6)

where sum[i] denotes the total degree of all vertices in subgraph G_i, and d_v denotes the out-degree or in-degree of vertex v. The number of active vertices in subgraph G_i describes its overall activity, which can be expressed as ε[i]:

ε[i] = Σ_{v∈V_i} S_curr[v] × d_v    (7)

where ε[i] denotes the activity of subgraph G_i, S_curr[v] marks the active vertices of subgraph G_i, and d_v denotes the out-degree or in-degree of the active vertex v. The activity of subgraph G_i can then be normalized by sum[i] to obtain its relative activity.
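Equations (6) and (7), together with the normalization step, can be sketched directly; this assumes the relative activity is ε[i]/sum[i], which is how the normalization remark is read here, and the degree and state arrays are illustrative:

```python
def subgraph_activity(degrees, s_curr):
    """sum[i] = Σ d_v over V_i (eq. 6);
    ε[i] = Σ S_curr[v]·d_v over V_i, i.e. active vertices only (eq. 7)."""
    total = sum(degrees)                                          # sum[i]
    eps = sum(d for d, active in zip(degrees, s_curr) if active)  # ε[i]
    return total, eps, eps / total                                # relative activity

# A sub-graph with 4 vertices; vertices 0 and 2 are currently active.
total, eps, rel = subgraph_activity([3, 1, 2, 4], [1, 0, 1, 0])
```

Here sum[i]=10 and ε[i]=5, giving a relative activity of 0.5 for this sub-graph.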
A specific flow of the adaptive update strategy is shown in fig. 5.
Algorithm 1: the adaptive push-pull update procedure of the present invention
Algorithm 1 describes the adaptive push-pull update strategy within one iteration of graph computation. select_model() is a threshold function based on ε[i] and a preset threshold θ: when ε[i] is less than θ it returns Push, otherwise it returns Pull. For each sub-graph, NPGraph adaptively selects the push or pull update model to accommodate different graph computation tasks. The selection between push mode and pull mode is based on the number of active vertices and a data-access performance prediction method.
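The threshold function itself takes only a few lines; the θ value below is illustrative:

```python
def select_model(epsilon_i, theta):
    """Threshold function from Algorithm 1: Push below θ, Pull otherwise."""
    return "Push" if epsilon_i < theta else "Pull"

# Sub-graphs with low activity are pushed; active ones are pulled.
modes = [select_model(e, theta=0.4) for e in (0.1, 0.4, 0.9)]
```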
Algorithm 2 describes the execution of the data-driven push model. It processes the out-edges of vertex interval V_i in the Forward Manner. Once a vertex v appears in the worklist, the model traverses all of its out-neighbors and pushes updates to each out-neighbor D_v using a user-defined update function. In this process it reads vertex values from D_curr and writes updates to D_next. If target vertices D_v are activated, they are added to the worklist and the computation continues in the next iteration. Algorithm 3 describes the execution of the data-driven pull model; likewise, it processes the in-edges of vertex interval V_i in the corresponding Backward Manner.
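A minimal sketch of one such data-driven push pass with a worklist. The user-defined update is assumed here to be a simple max-propagation (WCC-style); the function names and data are illustrative, not the patent's Algorithm 2 verbatim:

```python
def push_iteration(row, col, d_curr, worklist, update):
    """One data-driven push pass: each vertex v in the worklist pushes
    update(d_curr[v]) along its out-edges; targets whose value changes
    are activated for the next iteration."""
    d_next = list(d_curr)
    next_worklist = set()
    for v in worklist:
        contrib = update(d_curr[v])           # user-defined update function
        for u in col[row[v]:row[v + 1]]:      # out-neighbours of v
            if contrib > d_next[u]:           # max-propagation, WCC-style
                d_next[u] = contrib
                next_worklist.add(u)          # target vertex activated
    return d_next, next_worklist

# Label propagation on a 3-vertex chain 0 -> 1 -> 2, starting from vertex 0.
row, col = [0, 1, 2, 2], [1, 2]
vals, wl = push_iteration(row, col, [5, 0, 0], {0}, update=lambda x: x)
```

After one pass, vertex 0's label has reached vertex 1, which becomes the next worklist; a second pass would propagate it on to vertex 2.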
Algorithms 2 and 3: the Forward/Backward Manner update modes of the present invention
Depending on how data are activated, graph algorithms can be divided into two categories: topology-driven and data-driven. A topology-driven graph algorithm must process all vertices in every iteration. In contrast, in a data-driven graph algorithm vertices are activated dynamically by their neighbors, i.e., a user-defined function makes vertices active or inactive. A data-driven graph algorithm lets us focus on the "hot" vertices in the graph that need more frequent updates, and is therefore more efficient than a topology-driven one in many applications. Taking the PageRank algorithm as an example, NPGraph comprehensively analyzes the data-driven push-pull model; the execution processes are shown in Algorithm 4 and Algorithm 5. For each vertex v of subgraph G_i, Algorithm 4 and Algorithm 5 keep a target vertex set D_v and a source vertex set S_v respectively. As shown in Algorithm 4, push-based PageRank pushes the value r_v of vertex v to the target vertex set D_v for use in the next iteration. Pull-based PageRank, when computing the current value of vertex v, reads the values of the source vertex set S_v to complete its own data update.
Algorithms 4 and 5: the push-based/pull-based graph algorithms of the present invention (PageRank as an example)
The benefit of filtering active vertices typically outweighs the overhead of computing all vertices, and the execution order of the active vertices is critical to the graph algorithm. For example, in push-mode PageRank, each vertex v has a new residual r_v with which its pr_v is updated, and the total residual decreases by (1-α)·r_v. This suggests that the PageRank algorithm may converge faster if vertices with larger residuals are processed first.
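The residual behavior described here matches a common residual ("push") formulation of PageRank, sketched below under that assumption: settling a vertex moves (1-α)·r_v into its rank and spreads α·r_v across its out-edges, and the largest residual is processed first, as the passage suggests. This is an illustrative sketch, not the patent's Algorithm 4:

```python
def pagerank_push(row, col, alpha=0.85, tol=1e-10):
    """Residual-style push PageRank on a CSR graph: processing vertex v
    adds (1-α)·r_v to pr[v] and pushes α·r_v/deg(v) to each out-neighbour,
    so the total residual shrinks by (1-α)·r_v per push."""
    n = len(row) - 1
    pr = [0.0] * n
    res = [1.0 / n] * n            # initial residual mass, uniform
    active = set(range(n))
    while active:
        v = max(active, key=lambda x: res[x])   # largest residual first
        r, res[v] = res[v], 0.0
        active.discard(v)
        pr[v] += (1 - alpha) * r
        deg = row[v + 1] - row[v]
        if deg:                                  # dangling vertices drop their share
            share = alpha * r / deg
            for u in col[row[v]:row[v + 1]]:
                res[u] += share
                if res[u] > tol:
                    active.add(u)                # re-activate for another push
    return pr

# Two vertices in a cycle 0 <-> 1: ranks converge to the uniform 0.5 each.
prs = pagerank_push([0, 1, 2], [1, 0])
```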
(4) Finally, the NPGraph system framework. In the HMS architecture, the PMEM operating mode can be switched between Memory mode and App Direct mode. In Memory mode, PMEM is used as volatile memory extending the DRAM, while the DRAM becomes a hardware-managed cache with read/write latency on the order of 10 ns; all DRAM cache misses and page faults then flow to PMEM, incurring read/write latency on the order of microseconds. This means the Memory mode is not suitable for all application scenarios, especially large-scale graph data processing. In App Direct mode, however, PMEM is byte-addressable like DRAM, and it provides two memory access functions: DMA (direct memory access) and RDMA (remote direct memory access). More importantly, it can read/write data in blocks (e.g., 512 B, 4 KB) like a hard disk. This mode offers better performance, lower latency, and better durability than conventional HDDs and SSDs. The App Direct mode is therefore particularly suited to large-scale graph processing, and it provides a more cost-effective solution for graph computation.
The model framework is designed for a NUMA-based HMS, as shown in FIG. 6. It consists of 2 NUMA nodes, denoted Node0 and Node1, each containing 16 GB of memory (DRAM) and 256 GB of PMEM, with external storage composed of HDD or SSD. The 2 NUMA nodes store the out-edge (OutBlock) and in-edge (InBlock) graph data respectively. Specifically, in the preprocessing stage, subgraphs are divided according to the vertex intervals, and the out-edge (OutBlock) and in-edge (InBlock) data are compressed with the Compressed Sparse Row (CSR) method; the external storage (HDD or SSD) then holds both the Forward and Backward compressed graph data (the OutBlock-based push mode and the InBlock-based pull mode). In the data loading stage, Node0 loads all Forward Manner edge data and Node1 loads all Backward Manner edge data. In each node, the DRAM mainly stores all vertices, vertex state data, and the edge-block data of some dense subgraphs, while the PMEM mainly stores the edge-block data of all remaining subgraphs.
Furthermore, the execution time of a graph algorithm is proportional to the number of memory accesses to active vertices and edges, so live migration of graph data between DRAM and PMEM should be considered in the HMS. S_curr[v], D_v and S_v reflect the activity of the different subgraphs. During the iterations of graph computation, the migration strategy computes the relative activity ε of each subgraph in DRAM and PMEM respectively. Using the memcpy() function, NPGraph performs swap-in and swap-out operations between DRAM and PMEM. Because the adaptive push-pull update strategy and the dynamic data migration strategy are both adopted, NPGraph maximizes the cache hit rate in DRAM. In addition, processes are bound to specific cores and NUMA balancing is disabled, improving graph data access efficiency across NUMA nodes.
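A toy sketch of the swap-in/swap-out decision driven by relative activity ε: a PMEM-resident sub-graph trades places with a colder DRAM-resident one whenever its ε is higher. In the real system the block moves are done with memcpy(); the IDs and ε values here are illustrative:

```python
def plan_migration(dram_eps, pmem_eps):
    """Return (swap_in, swap_out) pairs of sub-graph IDs: a PMEM sub-graph
    replaces a colder DRAM sub-graph whenever its relative activity ε is
    higher. The actual block copies would be done with memcpy()."""
    dram = dict(dram_eps)
    swaps = []
    for p_id, p_eps in sorted(pmem_eps.items(), key=lambda kv: -kv[1]):
        d_id = min(dram, key=dram.get)       # coldest sub-graph in DRAM
        if p_eps <= dram[d_id]:
            break                            # remaining PMEM sub-graphs are colder
        swaps.append((p_id, d_id))           # swap-in p_id, swap-out d_id
        del dram[d_id]
        dram[p_id] = p_eps
    return swaps

# DRAM holds sub-graphs 0 (ε=0.9) and 1 (ε=0.1); PMEM holds 2 (ε=0.7), 3 (ε=0.05).
swaps = plan_migration({0: 0.9, 1: 0.1}, {2: 0.7, 3: 0.05})
```

Here the hot PMEM sub-graph 2 replaces the cold DRAM sub-graph 1, while the cold sub-graph 3 stays in PMEM.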
The following is a description in connection with experimental data. The experimental environment configuration, datasets, and comparison systems used for the NPGraph experiments are introduced first, followed by the experimental results and analysis.
(1) First, the experimental environment configuration. All experiments were performed on 2 NUMA nodes. Each node includes an Intel Xeon Gold 5218R CPU with 20 cores and 40 threads, a 32 KB L2 cache, and a 27.5 MB L3 cache. To create the experimental condition of limited DRAM resources, each NUMA node was equipped with 16 GB of DRAM and a 256 GB Optane DC PMEM module. The experimental platform runs on Ubuntu 18.04 LTS for large-scale graph data processing experiments.
Table 1. Brief description of the experimental environment configuration
Device: Specific parameters
CPU: 2 × Intel Xeon Gold 5218R @ 2.10 GHz
L1 Cache: 32 KB, 64 B cache block
L2 Cache: 32 KB, 64 B cache block
L3 Cache: 28160 KB, 64 B cache block
DRAM: 4 × 8 GB DDR4, 2666 MHz
PMEM: 4 × 128 GB Intel Optane DC, 2666 MHz
(2) Next, the datasets and comparison systems. All graph datasets used in the experiments are published real-world graph data. The social-network graph datasets Facebook and soc-LiveJournal are selected to validate NPGraph. The large-scale graph datasets Twitter-2010, Friendster, and Yahoo Web are a social network, a gaming website, and a web graph respectively; they consist of billions of vertices and edges and have larger diameters. Specifically, under the Dual-Block graph representation these last three datasets occupy 1.56×, 1.94×, and 6.81× the capacity of the DRAM respectively.
Table 2. Profile of the benchmark datasets
Dataset: Vertex count | Edge count | Type
Facebook: 4,039 | 88,234 | Social graph
soc-LiveJournal: 4,847,571 | 68,993,773 | Social graph
Twitter-2010: 61,578,416 | 246,313,664 | Social graph
Friendster: 65,608,366 | 1,806,067,135 | Game graph
Yahoo Web: 1,413,511,424 | 5,654,045,696 | Web graph
The basic graph algorithms used in the evaluation are PageRank, a representative sparse matrix multiplication algorithm, and WCC, which is based on graph traversal. In the experiments, PageRank was set to run ten iterations, while WCC was run to convergence. These two algorithm types exhibit different computational characteristics and evaluate both the computation and the traversal capability of NPGraph. Finally, NPGraph was compared with the state-of-the-art systems GraphOne and XPGraph.
(3) Influence of the adaptive push-pull strategy: on the five public datasets, the WCC and PageRank algorithms are used to comprehensively compare the Forward-Push, Backward-Pull, and Dual-Adaptive models. The single-threaded execution times of the three update models are shown in FIG. 7.
Obviously, the overall performance of the Dual-Adaptive model is superior to that of the Forward-Push and Backward-Pull models. More specifically, for WCC the Dual-Adaptive model performs 15.3%–27.6% better than the Forward-Push model; for PageRank it performs 14.7%–28.9% better.
It was also found that the Forward-Push model performs better than the Backward-Pull model in this experiment. In general, the Backward-Pull model is read-dominated and friendlier to caching graph data. However, because real-world graph data tend to exhibit the small-world property, a large amount of random data access occurs during the iterative computation. As a result, optimizing cache behavior (the Backward-Pull model) is not necessarily more efficient than optimizing the propagation speed of updates (the Forward-Push model). That is, the Forward-Push model has the advantage over the Backward-Pull model in propagating updated information. The extra write operations of the Forward-Push model are not merely another implementation of vertex updates; they also affect task scheduling. Switching the state of a vertex becomes more convenient, and this improved scheduling compensates for the time overhead of the write load.
(4) Influence of the data layering strategy: to verify the effectiveness of the data layering strategy in the HMS, NPGraph performed a multithreaded comparison experiment (1 to 64 threads) on Friendster. As described above, a graph G is divided into P subgraphs, and the sparseness of a subgraph is easily determined from the Row offsets and the size of Col, so preferentially loading dense subgraphs into DRAM is a natural approach. The comparison covers two cases. In the first, all sub-graph structure data are loaded into PMEM; PageRank and WCC in this case are labeled PR and WCC, as shown in FIG. 8. The second uses the layered data placement strategy as a comparison: NPGraph loads the top 20% densest subgraphs into memory according to the DRAM and dataset sizes; PageRank and WCC in this case are labeled PR-L and WCC-L respectively. More importantly, NPGraph employs the adaptive push-pull strategy for optimal performance, and exchange primitives such as atomic_compare_exchange() are used in the code.
As can be seen from the comparison of the two data placement strategies in FIG. 8, PR-L and WCC-L perform 1.51–1.83× better than PR and WCC. As the number of threads increases, the execution time decreases steadily, which demonstrates that the graph data layering strategy can fully utilize the DRAM and PMEM in the HMS; the data layering strategy therefore significantly improves the overall performance of NPGraph. From these observations we can draw two conclusions: (1) PageRank obtains a larger performance improvement than WCC because of its sensitivity to heterogeneous memory; (2) the trend of multithreaded execution time shows that NPGraph has good parallel scalability.
(5) Influence of the dynamic data migration strategy: to evaluate the effectiveness of the data migration strategy in the HMS, NPGraph runs the WCC and PageRank algorithms on the five datasets described above. The variants with the adaptive push-pull mode, the data layering mode, and the data migration mode are denoted NPGraph-A, NPGraph-L, and NPGraph-M respectively. For optimal performance, 64 worker threads are used to accelerate the NPGraph computation model in parallel.
TABLE 3 execution time (seconds) of WCC and PageRank
As shown in Table 3, NPGraph-M outperforms NPGraph-L overall, and NPGraph-L outperforms NPGraph-A. More specifically, across both WCC and PageRank, NPGraph-M performs 14.3%–34.6% better than NPGraph-A. Furthermore, on WCC NPGraph-M performs 14.1%–23.5% better than NPGraph-L, and on PageRank 12.9%–18.5% better.
Analyzing the experimental results in Table 3 confirms the effectiveness of the dynamic data migration strategy in NPGraph: based on this strategy, the performance of NPGraph is further improved. This further demonstrates that combining data layering with dynamic data migration promotes both the spatial and the temporal locality of graph computation in the HMS.
(6) Comparison with other models: NPGraph is compared with the state-of-the-art memory models GraphOne and XPGraph, both of which support parallel graph computation. NPGraph is provided with 16 GB DRAM, 256 GB PMEM, and 2 NUMA nodes for optimal performance. For fairness, GraphOne, XPGraph, and NPGraph run in the same environment with the thread count set to 64. FIG. 9 shows the execution times on the five graph datasets.
As shown in FIG. 9, NPGraph achieves significant speedups over GraphOne and XPGraph. Specifically, on the PageRank and WCC algorithms NPGraph improves by 27.36%–43.8% relative to GraphOne and by 21.67%–32.03% relative to XPGraph. The performance improvement of NPGraph benefits from the constructed Dual-Block graph storage representation and the data-driven adaptive push-pull (Push-Pull) update model it effectively supports.
To store very-large-scale graph data, GraphOne uses a hybrid storage format combining the two most common in-memory graph formats, the edge list and the adjacency list, and traverses graph structure data with a vertex-centric random access pattern. However, because it ignores the dynamic nature of graph computation, it must traverse the graph data in all memories, which is inefficient; it also generates a large number of intermediate-result write operations, causing significant I/O overhead. The NUMA-based XPGraph develops an XPLine-friendly graph access model with vertex-centric graph buffering, but it mainly addresses large-scale dynamic graph storage and ignores the load imbalance among multiple threads in each iteration. This is precisely where NPGraph has its greatest advantage: in a NUMA-based hybrid memory system, NPGraph reduces graph data access cost and improves the parallel computing efficiency of graph data through the Dual-Block graph storage representation and the data-driven adaptive push-pull strategy.
Although specific embodiments of the invention have been disclosed for illustrative purposes, it will be appreciated by those skilled in the art that various alternatives, variations and modifications are possible without departing from the spirit and scope of the invention and the appended claims. Therefore, the invention is not to be limited to the particular embodiments disclosed, but is to have the full scope defined by the appended claims.

Claims (9)

1. A self-adaptive graph calculation updating method based on a hybrid memory comprises the following steps:
1) Setting two NUMA nodes, Node0 and Node1, each NUMA node adopting a hybrid memory system HMS comprising a persistent memory PMEM and a memory DRAM; Node0 is provided with a preprocessing module, a sub-graph construction module, and a sub-graph update module;
2) The preprocessing module divides the vertex set V and the edge set E of the graph data G=(V,E) into P disjoint vertex intervals V_1~V_P and edge blocks E_1~E_P for generating P subgraphs G_1~G_P, where i=1~P, the ith vertex interval V_i contains the vertex subset of the sub-graph G_i to be generated, and the edge block E_i comprises the out-edge data block OutBlock[i], whose source vertices lie in the vertex interval V_i, and the in-edge data block InBlock[i], whose destination vertices lie in the vertex interval V_i, of the sub-graph G_i to be generated; the sub-graph data in out-edge form are then loaded into Node0 and the sub-graph data in in-edge form into Node1;
3) The sub-graph construction module calculates the density of each sub-graph to be generated from the out-edge data of each sub-graph read from Node0, sets a proportion parameter δ according to statistical analysis of the vertex-degree information of the graph data G, and migrates the densest δ·P sub-graphs' data to be generated from the persistent memory PMEM of the hybrid memory system HMS of Node0 to the memory DRAM of the HMS of Node0; from Node1, it reads the in-edge data of the corresponding subgraphs, calculates the density of each sub-graph to be generated, and according to the proportion parameter δ migrates the densest δ·P sub-graphs' data to be generated from the PMEM of the HMS of Node1 to the DRAM of the HMS of Node1; the sub-graph construction module then generates the ith sub-graph G_i=(V_i,E_i) from the ith sub-graph data in the local DRAM of Node0 and the ith sub-graph data in the DRAM of Node1 accessed remotely;
4) The sub-graph update module updates each sub-graph G_i iteratively by: computing the activity ε[i] of sub-graph G_i in the current iteration to determine the update mode adopted by sub-graph G_i, wherein a threshold function select_model() based on ε[i] and a threshold θ is set: when the activity ε[i] of sub-graph G_i is smaller than the set threshold θ, select_model() returns the Push mode, otherwise the Pull mode; in Push mode, each vertex v of sub-graph G_i holds a target vertex set D_v, while in Pull mode each vertex v of sub-graph G_i holds a source vertex set S_v; the Push-mode graph algorithm pushes the updated value of vertex v to D_v, and the Pull-mode graph algorithm pulls the source vertex values from S_v to update the value of vertex v.
2. The method according to claim 1, wherein step 1) is implemented as follows: first, the out-edge-mode data of the graph data G is stored in a first storage unit and partitioned by vertex intervals, yielding the vertex intervals and the out-edge data block corresponding to each vertex interval, the out-edge data block corresponding to the i-th vertex interval being OutBlock[i]; the in-edge-mode data of the graph data G is stored in a second storage unit and partitioned by vertex intervals, yielding the vertex intervals and the in-edge data block corresponding to each vertex interval, the in-edge data block corresponding to the i-th vertex interval being InBlock[i].
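The interval partitioning of claim 2 can be sketched as follows; the edge-list input format and all names are illustrative assumptions:

```python
# Sketch of splitting graph data into p vertex intervals with per-interval
# out-edge blocks (grouped by source) and in-edge blocks (grouped by target).
# Input format and names are illustrative assumptions.

def partition(edges, num_vertices, p):
    """Split [0, num_vertices) into p equal-width intervals; bucket each
    edge (u, v) into OutBlock[i] by its source interval and into
    InBlock[j] by its target interval."""
    size = (num_vertices + p - 1) // p          # interval width (ceiling)
    interval = lambda v: v // size              # interval index of a vertex
    out_block = [[] for _ in range(p)]
    in_block = [[] for _ in range(p)]
    for u, v in edges:
        out_block[interval(u)].append((u, v))   # grouped by source vertex
        in_block[interval(v)].append((u, v))    # grouped by target vertex
    return out_block, in_block
```

In the patent's layout the `out_block` buckets would reside on Node0 and the `in_block` buckets on Node1.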
3. The method of claim 2, wherein the first storage unit and the second storage unit are solid-state drives or mechanical hard disks.
4. The method according to claim 1, 2 or 3, wherein in step 2) the sub-graphs are constructed as follows: 1) Node0 and Node1 simultaneously create the state data and attribute data of the vertices of all sub-graph data in DRAM;
2) The out-edge data blocks OutBlock of all sub-graphs to be generated are read into the local PMEM on Node0, and the in-edge data blocks InBlock of all sub-graphs to be generated are read into the local PMEM on Node1; 3) Node0 and Node1 each calculate the density R_i of every sub-graph to be generated; 4) Node0 and Node1 each set the proportion parameter δ based on a statistical analysis of the vertex-degree information of the graph data G and migrate the δ·P densest sub-graph data sets to be generated from the PMEM of their hybrid memory system HMS to the DRAM of the HMS;
5) The i-th sub-graph G_i = (V_i, E_i) is generated from the i-th sub-graph data in the DRAM or PMEM of Node0 and the i-th sub-graph data in the DRAM or PMEM of Node1.
5. The method according to claim 1, 2 or 3, wherein the density of sub-graph G_i is R_i = (Σ_{v∈V_i} d_v) / |V_i|, where d_v denotes the degree of vertex v and |V_i| denotes the number of vertices in sub-graph G_i.
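Under the density measure of claim 5, selecting the densest δ·P sub-graphs for DRAM placement can be sketched as below. All names are illustrative, and rounding δ·P up with `ceil` is an assumption (the patent does not state the rounding rule):

```python
import math

def density(degrees):
    """R_i = (sum of vertex degrees in V_i) / |V_i| for one sub-graph."""
    return sum(degrees) / len(degrees)

def pick_for_dram(subgraph_degrees, delta):
    """Return the indices of the densest ceil(delta * P) sub-graphs,
    i.e. those migrated from PMEM to DRAM; ceil rounding is an assumption."""
    p = len(subgraph_degrees)
    k = math.ceil(delta * p)
    ranked = sorted(range(p), key=lambda i: density(subgraph_degrees[i]),
                    reverse=True)
    return set(ranked[:k])
```

The rationale is that dense sub-graphs are touched most often during iteration, so placing them in the faster DRAM tier maximizes the benefit of the limited DRAM capacity.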
6. The method according to claim 1, 2 or 3, wherein the access rate of the DRAM is δ = N_D / (N_D + N_P), where N_D is the number of accesses to the DRAM, N_P is the number of accesses to the PMEM, and 0 ≤ δ ≤ 1.
7. The method according to claim 1, 2 or 3, wherein the data of each sub-graph comprises graph structure data, attribute data and state data; vertices in the graph structure data are marked as Row and edges as Col; attribute data and state data are created for each vertex; the attribute data of a vertex are marked as D_curr and D_next, and the state data of a vertex are marked as S_curr and S_next.
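The curr/next pairs of claim 7 (D_curr/D_next, S_curr/S_next) suggest a double-buffered update, sketched below with illustrative names; the swap semantics shown are an assumption based on common practice in iterative graph engines:

```python
# Double-buffered vertex data: updates are written to the "next" buffers and
# swapped in at the end of an iteration, so reads within one iteration always
# see consistent "curr" values. Class and field names are illustrative.

class VertexData:
    def __init__(self, n, init_value=0.0):
        self.d_curr = [init_value] * n   # attribute data read this iteration
        self.d_next = [init_value] * n   # attribute data written this iteration
        self.s_curr = [False] * n        # state (active flag) read this iteration
        self.s_next = [False] * n        # state written this iteration

    def swap(self):
        """End of iteration: next buffers become current; next state resets."""
        self.d_curr, self.d_next = self.d_next, self.d_curr
        self.s_curr, self.s_next = self.s_next, [False] * len(self.s_next)
```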
8. The method of claim 1, wherein the out-edge data block OutBlock[i] of sub-graph G_i is stored on Node0 after CSR compression, and the in-edge data block InBlock[i] of sub-graph G_i is stored on Node1 after CSR compression.
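A minimal CSR compression of an edge block, as referenced in claim 8; the row-offset plus column-index layout shown is the standard CSR form, and all function names are illustrative:

```python
# Sketch of CSR (Compressed Sparse Row) compression of an edge block.
# row_ptr[v]..row_ptr[v+1] delimits the slice of col holding v's neighbors.

def to_csr(edges, num_vertices):
    """Compress an edge list into CSR arrays (row_ptr, col)."""
    col_lists = [[] for _ in range(num_vertices)]
    for u, v in edges:
        col_lists[u].append(v)
    row_ptr = [0]
    col = []
    for nbrs in col_lists:
        col.extend(sorted(nbrs))     # neighbors of each vertex, contiguous
        row_ptr.append(len(col))
    return row_ptr, col

def neighbors(row_ptr, col, v):
    """Read back the neighbor list of vertex v from the CSR arrays."""
    return col[row_ptr[v]:row_ptr[v + 1]]
```

CSR stores |V|+1 offsets plus |E| column indices, which is why it is a natural fit for keeping the out-edge and in-edge blocks compact in PMEM.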
9. An adaptive graph computation updating system based on hybrid memory, characterized by comprising two NUMA nodes, Node0 and Node1; each NUMA node adopts a hybrid memory system HMS comprising a persistent memory PMEM and a DRAM; Node0 is provided with a preprocessing module, a sub-graph construction module and a sub-graph update module;
Node0 is used to store sub-graph data in out-edge mode, and Node1 is used to store sub-graph data in in-edge mode;
the preprocessing module is configured to partition the vertex set V and the edge set E of the graph data G = (V, E) into P disjoint vertex intervals V_1 ~ V_P and edge blocks E_1 ~ E_P for generating P sub-graphs G_1 ~ G_P; where, for i = 1 ~ P, the vertex subset contained in the i-th vertex interval V_i is the vertex set of the sub-graph G_i to be generated, and the edge block E_i comprises the out-edge data block OutBlock[i] and the in-edge data block InBlock[i] of the sub-graph G_i to be generated;
the sub-graph construction module is configured to calculate the density of each sub-graph to be generated from the out-edge data of each sub-graph read from Node0, set the proportion parameter δ based on a statistical analysis of the vertex-degree information of the graph data G, and migrate the δ·P densest sub-graph data sets to be generated from the persistent memory PMEM of the hybrid memory system HMS of Node0 to the DRAM of the HMS of Node0; read the in-edge data of the corresponding sub-graphs from Node1, calculate the density of each sub-graph to be generated, and, according to the proportion parameter δ, migrate the δ·P densest sub-graph data sets to be generated from the PMEM of the HMS of Node1 to the DRAM of the HMS of Node1; and then generate the i-th sub-graph G_i = (V_i, E_i) from the i-th sub-graph data in the DRAM of Node0 and the i-th sub-graph data in the DRAM of Node1;
the sub-graph update module is configured to iteratively update each sub-graph G_i as follows: at the current iteration, the activity ε^[i] of sub-graph G_i is calculated to determine the update mode of G_i; a threshold function selectmodel() based on ε^[i] and θ is set, which returns Push mode when the activity ε^[i] of sub-graph G_i is smaller than a set threshold θ and Pull mode otherwise; in Push mode, each vertex v in sub-graph G_i holds a target vertex set D_v; in Pull mode, each vertex v in sub-graph G_i holds a source vertex set S_v; in Push mode the graph algorithm pushes the updated value of vertex v to D_v, and in Pull mode it pulls source-vertex values from S_v to update the value of vertex v.
CN202311197360.0A 2023-09-15 2023-09-15 Adaptive graph calculation updating method and system based on hybrid memory Pending CN117149795A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311197360.0A CN117149795A (en) 2023-09-15 2023-09-15 Adaptive graph calculation updating method and system based on hybrid memory


Publications (1)

Publication Number Publication Date
CN117149795A true CN117149795A (en) 2023-12-01

Family

ID=88909946

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311197360.0A Pending CN117149795A (en) 2023-09-15 2023-09-15 Adaptive graph calculation updating method and system based on hybrid memory

Country Status (1)

Country Link
CN (1) CN117149795A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118277624A (en) * 2024-05-31 2024-07-02 杭州海康威视数字技术股份有限公司 Data processing method, device, system, electronic equipment and storage medium


Similar Documents

Publication Publication Date Title
Sariyüce et al. Betweenness centrality on GPUs and heterogeneous architectures
Ediger et al. Massive streaming data analytics: A case study with clustering coefficients
CN110737804B (en) Graph processing access optimization method and system based on activity degree layout
Almasri et al. Update on k-truss decomposition on gpu
CN117149795A (en) Adaptive graph calculation updating method and system based on hybrid memory
CN110795213B (en) Active memory prediction migration method in virtual machine migration process
Jaiyeoba et al. Graphtinker: A high performance data structure for dynamic graph processing
US20230281157A1 (en) Post-exascale graph computing method, system, storage medium and electronic device thereof
Cao et al. Scaling graph traversal to 281 trillion edges with 40 million cores
CN112799597A (en) Hierarchical storage fault-tolerant method for stream data processing
CN114117150B (en) Graphic analysis algorithm general optimization method based on GPU
CN111429974A (en) Molecular dynamics simulation short-range force parallel optimization method on super computer platform
CN107370807A (en) The service end and its cache optimization method accessed based on transparent service platform data
Chen et al. DBSCAN-PSM: an improvement method of DBSCAN algorithm on Spark
Mirsadeghi et al. PTRAM: A parallel topology-and routing-aware mapping framework for large-scale HPC systems
Sun et al. GraphMP: I/O-efficient big graph analytics on a single commodity machine
Qiu et al. Parallelizing big de bruijn graph construction on heterogeneous processors
CN116841762A (en) Fixed-length type edge point combined sampling mechanism in graph neural network training
Ediger et al. Computational graph analytics for massive streaming data
CN112817982B (en) Dynamic power law graph storage method based on LSM tree
Laili et al. Parallel transfer evolution algorithm
CN108256694A (en) Based on Fuzzy time sequence forecasting system, the method and device for repeating genetic algorithm
CN113065035A (en) Single-machine out-of-core attribute graph calculation method
Lee et al. File Access Characteristics of Deep Learning Workloads and Cache-Friendly Data Management
Li et al. Optimizing Data Layout for Training Deep Neural Networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination