CN109522428B - External memory access method of graph computing system based on index positioning - Google Patents

External memory access method of graph computing system based on index positioning

Info

Publication number
CN109522428B
CN109522428B · CN201811082365.8A · CN201811082365A
Authority
CN
China
Prior art keywords
edge
data
file
vertex
index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811082365.8A
Other languages
Chinese (zh)
Other versions
CN109522428A (en
Inventor
王芳
冯丹
陈静
蒋子威
王子毅
刘上
杨蕾
杨文鑫
陈硕
曹孟媛
戴凯航
施展
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huazhong University of Science and Technology
Original Assignee
Huazhong University of Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huazhong University of Science and Technology
Priority to CN201811082365.8A
Publication of CN109522428A
Application granted
Publication of CN109522428B

Abstract

The invention discloses an external memory access method for a graph computing system based on index positioning, which comprises the following steps: dividing the complete graph data into a plurality of subgraphs; sorting the edges of each subgraph by source vertex number and by target vertex number; writing the sorted subgraphs into external memory files and building indexes on the source vertex numbers and target vertex numbers respectively; selecting the optimal loading mode between index-positioned loading and complete-data loading; and loading each subgraph from external memory into memory using the optimal mode. The invention redesigns the external memory data structure and improves the data loading procedure, so that the system can identify the valid data in external memory before loading, significantly reducing the I/O data volume and the number of random accesses. By analyzing the time overhead of the complete-data access mode and the index-positioned mode, the system dynamically selects the optimal data loading mode, reducing the time overhead of data loading.

Description

External memory access method of graph computing system based on index positioning
Technical Field
The invention belongs to the field of external-memory-based graph computation, and particularly relates to an external memory access method for a graph computing system based on index positioning.
Background
Graph computation proceeds by iteratively executing an update function. A common approach in external-memory-based graph computing systems is to organize the graph data into multiple subgraph data files on disk, such that each subgraph file fits in memory. Each subgraph contains the vertex information needed to compute updates, and one complete iteration processes all subgraph data. The key is how to manage the computation state of all subgraphs to guarantee correct results: graph data is loaded from external memory into memory, and intermediate results are written back to external memory so that subsequent computation sees the updated values. Each iteration therefore accesses a large amount of data, which generates heavy I/O overhead and becomes the bottleneck of external-memory-based approaches.
During preprocessing, the GraphChi system divides the vertices of the graph into disjoint intervals and partitions the edges into multiple data shards, one shard per vertex interval; the target vertex of every edge in a shard falls in the corresponding vertex interval. Using a vertex-centric processing model, it collects data from neighboring vertices, executes an update function on each vertex, and computes and updates vertex values, applying a parallel sliding window technique to reduce random external memory I/O. Keval Vora et al. propose ADS, a general optimized access method for external-memory graph computing systems that reads only active data: at the end of each iteration, new sub-partitions are regenerated from the data that will be active in the next iteration, and only these newly generated sub-partitions are read in the next iteration. Meanwhile, a DELAY_BUFFER is kept in memory to hold vertices that need updating but have not yet been processed; every other iteration the original subgraphs are read in full to apply the updates held in the DELAY_BUFFER.
In summary, most current external memory data access methods in graph computing systems load complete subgraph partitions. Whether GraphChi with its parallel sliding window or ADS with its dynamically created sub-partitions, neither divides the external memory data finely enough to precisely locate and access only the data required by the computation. Meanwhile, if a naive positioning method is used to load external memory data, the data loading volume is reduced and resource utilization improves, but the original sequential access is split into many random accesses, incurring extra time overhead.
Disclosure of Invention
Aiming at the defects of the prior art, the invention aims to solve the technical problems of the large I/O data volume, the large number of random accesses, and the high time overhead of external memory access in the prior art.
In order to achieve the above object, in a first aspect, an embodiment of the present invention provides an external memory access method for a graph computing system based on index positioning, where the method includes the following steps:
S0. dividing the complete graph data into multiple subgraphs that the memory can hold;
S1. sorting the edges of each subgraph by source vertex number and by target vertex number;
S2. writing the sorted subgraphs into external memory files, and building indexes on the source vertex numbers and target vertex numbers respectively;
S3. selecting the optimal loading mode between index-positioned loading and complete-data loading;
S4. loading each subgraph in external memory into memory using the optimal loading mode.
Specifically, writing the sorted subgraphs into external memory files comprises the following steps:
When each edge-sorted subgraph is written to an external memory file, edges with the same source vertex, and edges with the same target vertex, are stored in contiguous external memory data blocks. The data of an edge comprises the edge itself and its edge value, stored in external memory as two separate files: an edge file and an edge-value file. The edge file stores the topology of the graph in adjacency-list format; the order of the edge values in the edge-value file corresponds to the order of the edges in the edge file.
Specifically, the edge file comprises an out-edge file organized by source vertex number and an in-edge file organized by target vertex number. The out-edge file stores, contiguously and in sequence, each source vertex number, its out-degree, and the target vertex numbers of its out-edges; the in-edge file stores, contiguously and in sequence, each target vertex number, its in-degree, and the source vertex numbers of its in-edges.
Specifically, the index records the offset address, in the external memory file, of the edges corresponding to each vertex.
Specifically, the index-positioned loading mode is as follows:
(1) locating, according to the index, the data blocks in the out-edge file that need to be loaded into memory;
(2) according to the offset addresses of the out-edges in those data blocks within the out-edge file, finding the edge-value data of the out-edges in the edge-value file and loading it into memory;
(3) constructing the out-edge topology of the vertices in memory;
(4) when the application needs in-edges, locating, according to the index, the in-edge data blocks in the in-edge file that need to be loaded into memory;
(5) according to the offset addresses of the in-edges in those data blocks within the in-edge file, finding in the edge-value offset file the offset addresses of the real edge values of the in-edges, and loading the corresponding edge-value data into memory;
(6) constructing the in-edge topology of the vertices in memory.
Specifically, step S3 is as follows:
S30. determining the active vertices of each subgraph and judging whether the proportion of active vertices exceeds a threshold; if so, proceeding to step S31; otherwise, taking complete-data loading as the optimal loading mode and ending step S3;
S31. recording the numbers of the data blocks corresponding to the active vertex data;
S32. calculating, based on the numbers of the data blocks corresponding to the active vertex data, the overhead cost(index) of index-positioned loading and the overhead cost(all) of complete-data loading;
S33. judging whether cost(all) is greater than cost(index); if so, index-positioned loading is the optimal loading mode; otherwise, complete-data loading is the optimal loading mode.
Specifically, the value range of the threshold is 20% -30%.
Specifically, the overhead of complete-data loading is

cost(all) = D·|E| / B

and the overhead of index-positioned loading is

cost(index) = K · (b/B + r)

where E is the set of all edges in the graph; D is the storage space occupied by a single edge; B is the data block size of a single I/O; b is the size of the data block pointed to by each index entry; K is the number of index data blocks corresponding to all active vertices in the current system; and r is the random access overhead.
In a second aspect, an embodiment of the present invention provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the external memory access method of the first aspect.
Generally, compared with the prior art, the above technical solution contemplated by the present invention has the following beneficial effects:
1. The invention redesigns the organization of the external memory data around the access characteristics of graph applications, so that the data of the same vertex is stored in contiguous external memory space for convenient reading, and builds an index on the offset addresses of the data blocks corresponding to each vertex in the file for fast access to those blocks. This improves the system's data loading procedure, allowing the system to analyze which external memory data is valid before the loading stage, so that only the vertex data required by the computation is loaded, significantly reducing the I/O data volume and the number of random accesses.
2. The invention analyzes the time overhead of the original complete-data access mode and of the index-positioned mode, and designs a decision function that dynamically selects the optimal data loading mode in each iteration, effectively reducing the time overhead of data loading.
Drawings
FIG. 1 is a schematic diagram of the correspondence between the edge files and the edge-value offset address file according to the present invention;
FIG. 2 is a schematic diagram of an out-edge index file and an out-edge file according to an embodiment of the present invention;
FIG. 3 is a flowchart of step S3 according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical means and advantages of the present invention more apparent, the present invention will be described in further detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention provides an external memory access method of a graph computing system based on index positioning, which comprises the following steps:
S0. dividing the complete graph data into multiple subgraphs that the memory can hold;
S1. sorting the edges of each subgraph by source vertex number and by target vertex number;
S2. writing the sorted subgraphs into external memory files, and building indexes on the source vertex numbers and target vertex numbers respectively;
S3. selecting the optimal loading mode between index-positioned loading and complete-data loading;
S4. loading each subgraph in external memory into memory using the optimal loading mode.
The graph computing system comprises two main stages: a preprocessing stage and a computation execution stage. The preprocessing stage reads the original graph data into memory, processes it into the format required for computation, and writes it to external memory files. The computation execution stage loads external memory data into memory, constructs subgraphs in memory, executes the update computation, and writes the results back to external memory. To realize index-based selective loading of external memory data, subgraph data sorting and index construction are added to the preprocessing stage, decision judgment is added to the computation execution stage, and the edge data and edge-value data loading procedures are modified accordingly.
The preprocessing stage of the present invention includes steps S0-S2.
Step S0. Divide the complete graph data into multiple subgraphs that the memory can hold.
According to the available memory in the computing resources, the complete original graph data is divided into a plurality of subgraphs whose data volume does not exceed the available memory capacity, ensuring that memory can hold all data of a single subgraph.
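As a minimal illustration of this step (the bucketing strategy and all names below are assumptions for the sketch, not taken from the patent), splitting an edge list into memory-sized subgraphs over consecutive vertex intervals might look like:

```python
def partition_graph(edges, num_vertices, mem_budget, edge_bytes=8):
    """Split edges into subgraphs (vertex intervals) small enough for memory.

    edges: list of (src, dst) pairs; vertices are numbered 0..num_vertices-1.
    mem_budget: bytes available for one subgraph's edge data (assumed).
    Returns a list of ((first_vertex, last_vertex), edge_list) pairs.
    """
    max_edges = max(1, mem_budget // edge_bytes)  # edges that fit in memory
    # Bucket edges by target vertex, then greedily grow vertex intervals
    # until adding the next vertex's edges would overflow the budget.
    by_target = [[] for _ in range(num_vertices)]
    for src, dst in edges:
        by_target[dst].append((src, dst))
    subgraphs, current, start = [], [], 0
    for v in range(num_vertices):
        if current and len(current) + len(by_target[v]) > max_edges:
            subgraphs.append(((start, v - 1), current))
            current, start = [], v
        current.extend(by_target[v])
    subgraphs.append(((start, num_vertices - 1), current))
    return subgraphs
```

Each resulting interval plays the role of one subgraph whose data can be fully resident in memory during an iteration.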
Step S1. Sort the edges of each subgraph by source vertex number and by target vertex number.
Regarding the vertices as numbered consecutively from 0 to |V|-1, the subgraph data is sorted once by source vertex number and once by target vertex number. The vertex set V is the set of all vertices in the graph; a vertex consists of a vertex number and the vertex's own value.
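This double sort can be sketched as follows (names assumed): each subgraph's edge list is produced in two orders, one feeding the out-edge file and one feeding the in-edge file:

```python
def sort_subgraph(edges):
    """Return the two edge orders written to external memory.

    edges: list of (src, dst, value) triples.
    by_src feeds the out-edge file; by_dst feeds the in-edge file.
    A secondary key keeps each vertex's neighbor list ordered as well.
    """
    by_src = sorted(edges, key=lambda e: (e[0], e[1]))
    by_dst = sorted(edges, key=lambda e: (e[1], e[0]))
    return by_src, by_dst
```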
Step S2. Write the sorted subgraphs into external memory files and build indexes on the source vertex numbers and the target vertex numbers respectively; the indexes record the offset addresses, in the external memory files, of the edges corresponding to each vertex.
When an edge-sorted subgraph is written to an external memory file, edges with the same source vertex, and edges with the same target vertex, are stored in contiguous external memory data blocks. The data of an edge comprises the edge itself and its edge value, stored in external memory as two separate files: an edge file and an edge-value file. The edge file stores the topology of the graph in adjacency-list format, and comprises an out-edge file organized by source vertex number and an in-edge file organized by target vertex number. The out-edge file stores, contiguously and in sequence, each source vertex number, its out-degree, and the target vertex numbers of its out-edges; the in-edge file stores, contiguously and in sequence, each target vertex number, its in-degree, and the source vertex numbers of its in-edges. The order of the edge values in the edge-value file corresponds to the order of the edges in the edge file; because the two files store information in the same order, data in the edge-value file can conveniently be located from the edge file.
Since saving two copies of the edge data would inevitably double the data to synchronize and write back, the write-back process is optimized. Because the two copies of edge values differ only in storage order, an improved edge-value file structure is designed that stores only one copy of the edge-value data plus one copy of the offset addresses of those edge values in the file. FIG. 1 is a schematic diagram of the correspondence between the edge files and the edge-value offset address file according to the present invention. As shown in FIG. 1, the out-edge data file keeps the original edge-value order, the edge values corresponding one-to-one to the edges in the out-edge file, for compatibility with the system's original data loading mode; for the in-edge data file, a companion "edge value" file of the same format is constructed, whose entries are not real edge values but file offset addresses pointing to the positions of the real edge-value data in the file.
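A sketch of this shared edge-value layout, under the assumptions above (one real value array in out-edge order, and an offset array in in-edge order pointing back into it; all names hypothetical):

```python
def build_edge_value_files(by_src, by_dst):
    """Build one real edge-value array (out-edge order) plus an offset
    array (in-edge order) that points into it, instead of storing the
    edge values twice.

    by_src / by_dst: the same (src, dst, value) triples in the two orders.
    """
    values = [v for _, _, v in by_src]              # real edge values
    pos = {(s, d): i for i, (s, d, _) in enumerate(by_src)}
    offsets = [pos[(s, d)] for s, d, _ in by_dst]   # in-edge -> value slot
    return values, offsets

def in_edge_value(values, offsets, i):
    """Fetch the real edge value of the i-th edge in in-edge order."""
    return values[offsets[i]]
```

Only `values` and `offsets` need to be written back, roughly halving the edge-value data compared with storing both orders in full.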
The index records the offset address of each vertex's edges in the external memory file. The invention builds sparse indexes on the source vertex numbers and the target vertex numbers respectively, and uses redundant storage, saving each edge once as an out-edge and once as an in-edge, trading doubled external memory cost for improved time performance. The details are as follows:
When building an index, an index file is created over the vertex numbers of the edge file; the index points to the offset address of each vertex's out-edges/in-edges in the file. Each row of the index file corresponds to a pair: a vertex number plus the offset address of its out-edge/in-edge information in the file. The edge file is divided into blocks at a given interval, and each index entry points to the first record of its block, comprising that record's vertex number and file offset address. In this way a large amount of vertex data can be skipped. When loading data, although reading a whole data block at a time inevitably brings in some useless data, each block read may contain the data of several required vertices, so the number of I/Os is reduced and efficiency improves.
FIG. 2 is a schematic diagram of an out-edge index file and an out-edge file according to an embodiment of the present invention. As shown in FIG. 2, each line of the out-edge file represents the out-edge information of one source vertex: source vertex number + out-degree + target vertex numbers. The given interval is 3, so when the index is built the edge file is divided into blocks of 3 vertices each: vertices 1-3 form one block and vertices 4-6 the next. Each index entry points to the first vertex of its block; for example, the entry for vertex V1 points to the first address 0 of the first block, and the entry for vertex 4 points to the first address 32 of the second block (assuming each value occupies 4 bytes).
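A minimal sketch of such a sparse index over an out-edge file (the record layout, interval, and 4-byte value size are assumptions taken from the example above; names are hypothetical):

```python
import bisect

VALUE_BYTES = 4  # assumed size of each stored number

def build_sparse_index(out_edge_records, interval=3):
    """Index every `interval`-th vertex record of the out-edge file.

    out_edge_records: list of (src, out_degree, [targets]) in file order.
    Returns [(vertex_number, byte_offset_of_its_block), ...].
    """
    index, offset = [], 0
    for i, (src, deg, targets) in enumerate(out_edge_records):
        if i % interval == 0:            # first record of a new block
            index.append((src, offset))
        offset += VALUE_BYTES * (2 + len(targets))  # src + degree + targets
    return index

def locate_block(index, vertex):
    """Binary-search the sparse index for the block holding `vertex`."""
    firsts = [v for v, _ in index]
    i = max(bisect.bisect_right(firsts, vertex) - 1, 0)
    return index[i][1]
```

Because only every third vertex appears in the index, a lookup lands on the block's first address and the block is then scanned sequentially for the wanted vertex.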
The topology of the graph directly determines the organization of the external memory data and hence how that data is located. Without changing the topology of the graph, the invention builds indexes over the external memory data to locate it, which reduces implementation complexity while also reducing the overhead of extra external memory accesses.
The calculation execution phase of the present invention includes steps S3-S4.
Step S3. Select the optimal loading mode between index-positioned loading and complete-data loading. FIG. 3 is a flowchart of step S3 according to an embodiment of the present invention. As shown in FIG. 3, step S3 is as follows:
S30. Determine the active vertices of each subgraph and judge whether the proportion of active vertices exceeds a threshold; if so, proceed to step S31; otherwise, complete-data loading is the optimal loading mode and step S3 ends.
Vertices participating in the update computation are called active vertices; vertices not participating in the computation are called inactive vertices.
Iterations with a low proportion of active vertices gain performance from index-positioned loading, while iterations with a high proportion perform better with the original complete-data access; combining the two loading modes yields an overall optimal scheme, i.e., the decision module selects the data loading mode that gives the greatest performance improvement. The threshold ranges from 20% to 30%, preferably 30%.
S31. Record the numbers of the data blocks corresponding to the active vertex data.
One data block usually contains the data of several vertices, so the data blocks corresponding to all active vertex data are counted. This captures the distribution of active vertices across data blocks: determine which data blocks contain active vertices, and count those blocks.
S32. Based on the numbers of the data blocks corresponding to the active vertex data, calculate the overhead cost(index) of index-positioned loading and the overhead cost(all) of complete-data loading.
There are two ways to load data from external memory into memory, and they load different content. Complete-data access loads the complete subgraph data from the source-vertex-ordered data file and does not load the target-vertex-ordered data file; index-positioned loading of valid data loads a vertex's out-edges from the source-vertex-ordered out-edge file and its in-edges from the target-vertex-ordered in-edge file.
The complete-subgraph loading mode is typically used when the application needs a large amount of data: little invalid data would be skipped, so the benefit of selective loading is low, and the high performance of sequential external memory access yields better access efficiency. Note that preprocessing the original graph data produces two external memory data files, each containing the complete subgraph information, so the system only needs to access one of them when loading. When constructing the subgraph, each edge in the data file is processed in sequence, judging whether its source vertex and target vertex are data required by the application: if the source vertex participates in the computation, the edge is added to that vertex's out-edge sequence; if the target vertex participates, the edge is added to that vertex's in-edge sequence.
The complete-subgraph data loading mode is as follows:
(1) sequentially load the source-vertex-ordered data file;
(2) process each edge in the data file in sequence, judging whether its source vertex and target vertex are data required by the application; if the source vertex participates in the computation, add the edge to that vertex's out-edge sequence; if the target vertex participates, add the edge to that vertex's in-edge sequence.
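The two complete-loading steps above can be sketched as one sequential pass (the data layout and names are assumptions for illustration):

```python
def load_complete(by_src_edges, needed):
    """Build per-vertex out-edge and in-edge sequences from one sequential
    pass over the source-vertex-ordered file.

    by_src_edges: iterable of (src, dst) pairs in file order.
    needed: set of vertex numbers the application must update.
    """
    out_edges, in_edges = {}, {}
    for src, dst in by_src_edges:
        if src in needed:                       # source participates
            out_edges.setdefault(src, []).append(dst)
        if dst in needed:                       # target participates
            in_edges.setdefault(dst, []).append(src)
    return out_edges, in_edges
```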
The index-positioned loading mode is typically used when the application needs little data: much invalid data can be skipped, so the optimization is significant, and using the index to access only the data the application needs reduces the I/O volume and yields higher access efficiency. Before accessing external memory, the system obtains the vertex state information, records the numbers of the data blocks corresponding to the active vertex data (one data block usually contains several vertices' data, so the blocks for all active vertices are collected), and locates the corresponding file positions via the index to access those blocks. Ideally only the data of the active vertices would be loaded, but since one external memory access per vertex would be inefficient, the data of several vertices is read at once by accessing the data block containing them; although a block inevitably contains some useless data, this greatly reduces the number of I/Os and improves I/O efficiency. Note that the loaded data comes from two separate files: a vertex's out-edges are loaded from the source-vertex-ordered file and its in-edges from the target-vertex-ordered file. When constructing the subgraph, out-edges are appended to the out-edge sequence of their source vertex and in-edges to the in-edge sequence of their target vertex, so the amount of data to process is reduced compared with complete-data access.
The index-positioned data loading mode is as follows:
(1) locate, according to the index, the data blocks in the out-edge file that need to be loaded into memory;
(2) according to the offset addresses of the out-edges in those data blocks within the out-edge file, find the edge-value data of the out-edges in the edge-value file and load it into memory;
(3) construct the out-edge topology of the vertices in memory;
(4) when the application needs in-edges, locate, according to the index, the in-edge data blocks in the in-edge file that need to be loaded into memory;
(5) according to the offset addresses of the in-edges in those data blocks within the in-edge file, find in the edge-value offset file the offset addresses of the real edge values of the in-edges, and load the corresponding edge-value data into memory;
(6) construct the in-edge topology of the vertices in memory.
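Under the same assumptions as the sparse-index sketch, the index-positioned path might look like the following (all names are hypothetical; `read_block` stands in for a disk read at a byte offset):

```python
import bisect

def load_by_index(index, read_block, active):
    """Load only the data blocks that contain active vertices.

    index: sparse index [(first_vertex, byte_offset), ...].
    read_block: function(byte_offset) -> list of (vertex, neighbors)
                records of that block (stands in for a disk read).
    active: set of active vertex numbers.
    Returns {vertex: neighbors} for the active vertices found.
    """
    # Collect the distinct blocks covering the active vertices.
    firsts = [v for v, _ in index]
    needed_offsets = set()
    for v in active:
        i = max(bisect.bisect_right(firsts, v) - 1, 0)
        needed_offsets.add(index[i][1])
    # One random access per needed block, then filter to active vertices.
    topology = {}
    for off in sorted(needed_offsets):
        for vertex, neighbors in read_block(off):
            if vertex in active:
                topology[vertex] = neighbors
    return topology
```

Each needed block costs one random access, which is exactly the trade-off the decision function of step S3 weighs against a full sequential scan.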
When data is loaded, a vertex's position in the file can be located quickly from its number, improving the efficiency of data lookup and access at run time.
The overhead of complete-data loading is

cost(all) = D·|E| / B

and the overhead of index-positioned loading is

cost(index) = K · (b/B + r)

where E is the set of all edges in the graph; D is the storage space occupied by a single edge; B is the data block size of a single I/O; b is the size of the data block pointed to by each index entry; K is the number of index data blocks corresponding to all active vertices in the current system; and r is the random access overhead. The constants D, |E|, B, b and r are obtained and assigned before the computation runs, where D·|E| is the space occupied by the data file.
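The decision of steps S32-S33 then reduces to comparing the two cost estimates. A sketch under these definitions (the exact form of the cost formulas is reconstructed from the variable definitions above, so treat it as an assumption):

```python
def choose_loading_mode(num_edges, edge_bytes, io_block, index_block,
                        active_blocks, random_cost):
    """Return 'index' if index-positioned loading is estimated cheaper,
    else 'all' (complete-data loading).

    cost(all)   = D*|E| / B     -- one sequential scan of the whole file
    cost(index) = K * (b/B + r) -- K random block reads
    """
    cost_all = edge_bytes * num_edges / io_block
    cost_index = active_blocks * (index_block / io_block + random_cost)
    return 'index' if cost_all > cost_index else 'all'
```

With few active blocks on a large graph the function picks index positioning; on a small graph, the sequential scan wins despite reading everything.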
S33. Judge whether cost(all) is greater than cost(index); if so, index-positioned loading is the optimal loading mode; otherwise, complete-data loading is the optimal loading mode.
Comparing the two data loading modes shows that index-positioned loading suits the case of few active vertices: the computation then needs little data, and accessing the complete data would read much invalid data and waste CPU resources processing it during subgraph construction. Complete-subgraph loading, in contrast, fully exploits the sequential access performance of the external memory disk; when the proportion of active vertices is large, loading all data at once performs better. Using the two loading modes together lets the system always load data in the optimal mode, greatly improving overall efficiency.
Step S4. Load each subgraph in external memory into memory using the optimal loading mode.
The above description is only for the preferred embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions that can be easily conceived by those skilled in the art within the technical scope of the present application should be covered within the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (7)

1. An external memory access method for a graph computing system based on index positioning, the method comprising the steps of:
s0. dividing the complete graph data into multiple sub-graphs that the memory can hold;
s1, sequencing edges of each subgraph according to a source vertex number and a target vertex number;
s2, writing the sequenced subgraphs into an external storage file, and respectively establishing indexes for a source vertex number and a target vertex number;
s3, selecting an optimal loading mode from the loading modes of index positioning and the data loading modes of accessing the complete subgraph, wherein the step S3 is as follows:
s30, determining active vertexes of all sub-graphs, judging whether the proportion of the active vertexes exceeds a threshold value, if so, entering a step S31, otherwise, accessing the data loading mode of the complete sub-graph to be an optimal loading mode, and ending the step S3;
s31, recording a data block number corresponding to the active vertex data;
s32, calculating the overhead cost (index) of a loading mode for index positioning and the overhead cost (all) of a data loading mode for accessing a complete subgraph based on the number of a data block corresponding to the active vertex data;
s33, judging whether cost (all) is greater than cost (index), if so, determining the loading mode of index positioning as the optimal loading mode, otherwise, determining the data loading mode of accessing the complete subgraph as the optimal loading mode;
the loading mode of the index positioning is as follows: (1) finding out data blocks needing to be loaded into the memory in the edge file according to the index positioning; (2) according to the offset address of the data block output edge which needs to be loaded in the output edge file, finding out the edge value data of the output edge in the edge value file, and loading the edge value data into the memory; (3) constructing an edge-out topological structure of a vertex in a memory; (4) when the application needs to enter the edge, locating an edge entering data block which needs to be loaded into the memory in the edge entering file according to the index; (5) according to the edge entering offset address of the data block to be loaded in the edge entering file, finding the edge value offset address of the edge entering in the edge value index file, and loading the edge value data offset address to the memory; (6) constructing an edge-entering topological structure of a vertex in a memory;
the data loading mode for accessing the complete subgraph is as follows: (1) sequentially loading data files with ordered source vertexes; (2) sequentially processing each edge in the data file, respectively judging whether a source vertex and a target vertex of the edge are data required by application, and adding a current edge into an edge outlet sequence of the vertex if the source vertex needs to participate in calculation; if the target vertex needs to participate in calculation, adding the current edge into the edge entering sequence of the vertex;
S4, loading each subgraph in the external memory into the internal memory in the optimal loading mode.
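The complete-subgraph loading mode above can be sketched as a single sequential pass. This is an illustrative assumption, not the claimed implementation: edges are modeled as in-memory `(src, dst)` tuples and the out-edge and in-edge sequences as plain dictionaries.

```python
def load_full_subgraph(edges, active):
    """Sequentially scan all (src, dst) edges of a subgraph and build
    out-edge and in-edge adjacency lists, keeping only the edges whose
    endpoints the application needs (the 'active' vertex set)."""
    out_edges, in_edges = {}, {}
    for src, dst in edges:                    # one sequential pass
        if src in active:                     # source participates: out-edge
            out_edges.setdefault(src, []).append(dst)
        if dst in active:                     # target participates: in-edge
            in_edges.setdefault(dst, []).append(src)
    return out_edges, in_edges
```

Note that every edge is touched even when few vertices are active, which is exactly the waste the index positioning mode avoids.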
2. The method according to claim 1, wherein the writing of the sorted subgraphs into the external memory file is as follows:
when each edge-sorted subgraph is written into the external memory file, edges sharing the same source vertex, and edges sharing the same target vertex, are stored in continuous external memory data blocks; the data of an edge comprises the edge itself and its edge value, which are respectively stored as two files in the external memory: an edge file and an edge value file; the edge file stores the topological structure of the graph in adjacency list format; the order of the edge values in the edge value file corresponds to the order of the edges in the edge file.
3. The external memory access method of claim 2, wherein the edge file comprises an out-edge file organized by source vertex number and an in-edge file organized by target vertex number; the out-edge file sequentially and continuously stores, for each source vertex, the source vertex number, the out-degree, and the target vertex numbers of its out-edges; the in-edge file sequentially and continuously stores, for each target vertex, the target vertex number, the in-degree, and the source vertex numbers of its in-edges.
4. The method as claimed in claim 2, wherein, when creating the index, an edge value index file is created using the vertex numbers of the edge file, and each row of the edge value index file corresponds to one tuple: a vertex number of the edge file and the offset address of the corresponding out-edge or in-edge in the external memory file.
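Claims 2-4 describe an out-edge file of consecutive (source vertex, out-degree, target vertices...) records plus an index of (vertex, offset) tuples. A minimal sketch of producing both over an in-memory byte layout follows; the fixed 4-byte little-endian integer encoding and the function name are assumptions for illustration only.

```python
import struct

def build_out_edge_file(adjacency):
    """Serialize {src: [dst, ...]} as consecutive records of the form
    (src, out_degree, dst, dst, ...), each field a 4-byte little-endian
    unsigned int, and return (edge_file_bytes, index) where index maps
    each source vertex number to its record's byte offset."""
    blob, index, offset = bytearray(), {}, 0
    for src in sorted(adjacency):             # edges sorted by source vertex
        dsts = adjacency[src]
        record = struct.pack("<II", src, len(dsts)) + struct.pack(
            "<%dI" % len(dsts), *dsts)
        index[src] = offset                   # one (vertex, offset) tuple
        blob += record
        offset += len(record)
    return bytes(blob), index
```

With the index in hand, loading the out-edges of one active vertex is a single seek to `index[src]` followed by one small read, instead of a scan of the whole file.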
5. The method according to claim 1, wherein the threshold value ranges from 20% to 30%.
6. The method of claim 1, wherein the overhead of the data loading mode of accessing the complete subgraph is

cost(all) = |E| · D / B

and the overhead of the loading mode of index positioning is

cost(index) = K · (b / B + R)

wherein E is the set of all edges in the graph and |E| its size; D is the storage space occupied by a single edge; B is the data block size of one I/O; b is the size of the data block pointed to by each index entry; K is the number of indexed data blocks corresponding to all active vertices in the current system; and R is the random access overhead.
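A numeric sketch of the two overhead estimates, assuming (as the symbol definitions suggest) that a full scan pays |E|·D/B sequential block reads while index positioning pays, for each of the K indexed blocks, a random-access penalty R plus a transfer of b/B; all concrete values below are illustrative, not from the patent.

```python
def cost_all(num_edges, edge_bytes, io_block_bytes):
    # Sequential scan: total edge data volume divided by the I/O block size.
    return num_edges * edge_bytes / io_block_bytes

def cost_index(k_blocks, indexed_block_bytes, io_block_bytes, random_overhead):
    # K indexed blocks: each pays a random-access penalty plus its transfer.
    return k_blocks * (indexed_block_bytes / io_block_bytes + random_overhead)

# Illustrative values: 10**6 edges of 8 bytes, 4 KiB I/O blocks,
# 4 KiB indexed blocks, random access costing 20 block-transfer times.
full = cost_all(10**6, 8, 4096)        # 1953.125
few = cost_index(50, 4096, 4096, 20)   # 1050.0  -> index positioning wins
many = cost_index(500, 4096, 4096, 20) # 10500.0 -> full scan wins
```

The crossover depends on K: below roughly cost_all/(b/B + R) indexed blocks, index positioning is cheaper, which matches the claim that it suits a small active-vertex ratio.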
7. A computer-readable storage medium, having stored thereon a computer program which, when executed by a processor, implements a method of external memory access according to any one of claims 1 to 6.
CN201811082365.8A 2018-09-17 2018-09-17 External memory access method of graph computing system based on index positioning Active CN109522428B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811082365.8A CN109522428B (en) 2018-09-17 2018-09-17 External memory access method of graph computing system based on index positioning


Publications (2)

Publication Number Publication Date
CN109522428A CN109522428A (en) 2019-03-26
CN109522428B true CN109522428B (en) 2020-11-24

Family

ID=65771279

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811082365.8A Active CN109522428B (en) 2018-09-17 2018-09-17 External memory access method of graph computing system based on index positioning

Country Status (1)

Country Link
CN (1) CN109522428B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111523000B (en) * 2020-04-23 2023-06-23 北京百度网讯科技有限公司 Method, apparatus, device and storage medium for importing data
CN112287182B (en) * 2020-10-30 2023-09-19 杭州海康威视数字技术股份有限公司 Graph data storage and processing method and device and computer storage medium
CN112799845A (en) * 2021-02-02 2021-05-14 深圳计算科学研究院 Graph algorithm parallel acceleration method and device based on GRAPE framework
CN112988064B (en) * 2021-02-09 2022-11-08 华中科技大学 Concurrent multitask-oriented disk graph processing method
CN113448964B (en) * 2021-06-29 2022-10-21 四川蜀天梦图数据科技有限公司 Hybrid storage method and device based on graph-KV
CN114282073B (en) * 2022-03-02 2022-07-15 支付宝(杭州)信息技术有限公司 Data storage method and device and data reading method and device
CN114756483A (en) * 2022-03-31 2022-07-15 深圳清华大学研究院 Subgraph segmentation optimization method based on inter-core storage access and application
CN115391341A (en) * 2022-08-23 2022-11-25 抖音视界有限公司 Distributed graph data processing system, method, device, equipment and storage medium

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107122248A (en) * 2017-05-02 2017-09-01 华中科技大学 A kind of distributed figure processing method of storage optimization
CN107491495A (en) * 2017-07-25 2017-12-19 南京师范大学 Storage method of the preferential space-time trajectory data file of space attribute in auxiliary storage device
CN107957962A (en) * 2017-12-19 2018-04-24 重庆大学 It is a kind of to calculate efficient figure division methods and system towards big figure

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107122248A (en) * 2017-05-02 2017-09-01 华中科技大学 A kind of distributed figure processing method of storage optimization
CN107491495A (en) * 2017-07-25 2017-12-19 南京师范大学 Storage method of the preferential space-time trajectory data file of space attribute in auxiliary storage device
CN107957962A (en) * 2017-12-19 2018-04-24 重庆大学 It is a kind of to calculate efficient figure division methods and system towards big figure

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Large-scale graph data processing based on Multi-GPU platforms; Zhang Heng et al.; Journal of Computer Research and Development (计算机研究与发展); 2018-01-15; Vol. 55, No. 2, pp. 273-288 *

Also Published As

Publication number Publication date
CN109522428A (en) 2019-03-26

Similar Documents

Publication Publication Date Title
CN109522428B (en) External memory access method of graph computing system based on index positioning
CN106777351B (en) Computing system and its method are stored based on ART tree distributed system figure
US8140585B2 (en) Method and apparatus for partitioning and sorting a data set on a multi-processor system
CN107122248B (en) Storage optimization distributed graph processing method
CN112287182A (en) Graph data storage and processing method and device and computer storage medium
CN107015868B (en) Distributed parallel construction method of universal suffix tree
US8990492B1 (en) Increasing capacity in router forwarding tables
CN106599091B (en) RDF graph structure storage and index method based on key value storage
US11210343B2 (en) Graph data processing method and apparatus thereof
Jaiyeoba et al. Graphtinker: A high performance data structure for dynamic graph processing
CN104778077A (en) High-speed extranuclear graph processing method and system based on random and continuous disk access
KR20100004605A (en) Method for selecting node in network system and system thereof
US20230281157A1 (en) Post-exascale graph computing method, system, storage medium and electronic device thereof
CN110688055B (en) Data access method and system in large graph calculation
CN112699134A (en) Distributed graph database storage and query method based on graph subdivision
CN110222055B (en) Single-round kernel value maintenance method for multilateral updating under dynamic graph
US9507794B2 (en) Method and apparatus for distributed processing of file
Falchi et al. Nearest neighbor search in metric spaces through content-addressable networks
CN112988064B (en) Concurrent multitask-oriented disk graph processing method
CN112817982B (en) Dynamic power law graph storage method based on LSM tree
CN115391341A (en) Distributed graph data processing system, method, device, equipment and storage medium
CN110851178B (en) Inter-process program static analysis method based on distributed graph reachable computation
CN109240600B (en) Disk map processing method based on mixed updating strategy
CN110377601B (en) B-tree data structure-based MapReduce calculation process optimization method
CN111737347B (en) Method and device for sequentially segmenting data on Spark platform

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant