CN117349477A - Graph data heterogeneous hierarchical storage structure based on persistent memory and method thereof - Google Patents

Info

Publication number
CN117349477A
Authority
CN
China
Prior art keywords
node
persistent memory
hierarchical
heterogeneous
key
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311270531.8A
Other languages
Chinese (zh)
Inventor
段翰聪
刘益辰
王瀚生
王智
明宇
陈铎汝
沈飞旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202311270531.8A
Publication of CN117349477A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9014 Indexing; Data structures therefor; Storage structures hash tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2246 Trees, e.g. B+trees
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2255 Hash tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2358 Change logging, detection, and notification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9027 Trees

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a graph data heterogeneous hierarchical storage structure based on a persistent memory and a method thereof.

Description

Graph data heterogeneous hierarchical storage structure based on persistent memory and method thereof
Technical Field
The invention belongs to the technical field of computer software, and particularly relates to a graph data heterogeneous hierarchical storage structure based on a persistent memory and a method thereof.
Background
In existing graph storage and query technology, data is generally persisted to disk and then loaded into memory, either lazily or in full, for computation before the results are returned to the client. Reading data from disk inevitably incurs long waits. Lazy loading spreads the data I/O time across individual queries, which helps memory management and limits memory use. However, because of the nature of graph computation, each expand operation increases the number of accessed edges exponentially, so a few expand operations may traverse the entire graph, and lazy loading then severely reduces query speed. Full loading is an important way to reduce graph query latency, but it must load all data into memory when the program initializes, which severely slows start-up and may exhaust memory, since memory capacity is usually small. Migrating data to a persistent device with higher read and write speed has therefore become a trend. However, if in-memory data structures are moved unchanged into persistent memory, the read/write asymmetry of persistent memory means that a large number of writes creates a significant performance bottleneck.
Disclosure of Invention
In order to solve the problems of low query speed, limited memory space, serious performance bottleneck possibly caused by a large number of writes in a persistent memory and the like in the prior art, the invention provides a graph data heterogeneous hierarchical storage structure based on the persistent memory and a method thereof.
The invention is realized by the following technical scheme:
a graph data heterogeneous hierarchical storage structure based on a persistent memory, wherein the graph data heterogeneous hierarchical storage structure comprises a hierarchical heterogeneous hash table and a hierarchical heterogeneous B+ tree;
wherein the topology data of the graph data and the equality query data structure in the index data are each stored in a hierarchical heterogeneous hash table;
the range query data structure in the index data of the graph data is stored by adopting a hierarchical heterogeneous B+ tree.
In existing graph storage and query technology, lazy loading severely reduces query speed; full loading must load all data into memory at program initialization, which severely slows start-up and may exhaust memory because memory capacity is limited; and a large number of writes to persistent memory creates a serious performance bottleneck. Based on the characteristics of graph data, the heterogeneous hierarchical storage structure of the invention uses a hierarchical heterogeneous hash table to store the topology data and the equality query data structure of the index data, and a hierarchical heterogeneous B+ tree to store the range query data structure of the index data, thereby reducing the number of writes to data structures in persistent memory and improving query speed.
As a preferred embodiment, the hierarchical heterogeneous hash table is divided into an upper layer and a lower layer, wherein the lower layer is half the size of the upper layer, that is, every two buckets of the upper layer correspond to one bucket of the lower layer;
the hierarchical heterogeneous hash table uses a WAL log for insert, delete and update operations.
As a preferred embodiment, each bucket contains 8 key-value pairs. When a key is written into the hash table, the corresponding upper-layer bucket is first checked for the key; if the key is not there, the corresponding lower-layer bucket is checked; if the key is in neither, it is written directly. If both the upper-layer and lower-layer buckets are full, a re-hash operation starts: the lower layer becomes a staging layer, the upper layer becomes the new lower layer, a new upper layer twice the size of the original upper layer is allocated, and the keys of the staging layer are re-hashed into the new upper layer.
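The insert and re-hash rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the patented implementation: the class name `TwoLevelHash`, the use of Python's built-in `hash`, and the choice of `hash(key) % len(layer)` to pick a bucket in each layer are all hypothetical. With that indexing, two upper buckets share one lower bucket, and keys in the old upper layer keep valid positions once it becomes the new lower layer.

```python
SLOTS = 8  # key-value pairs per bucket, as stated in the text


class TwoLevelHash:
    """Toy in-memory model of the upper/lower layer layout (no persistence)."""

    def __init__(self, upper_buckets=4):
        # upper_buckets must be even so the lower layer is exactly half its size
        self.upper = [dict() for _ in range(upper_buckets)]
        self.lower = [dict() for _ in range(upper_buckets // 2)]

    def _buckets(self, key):
        h = hash(key)
        return self.upper[h % len(self.upper)], self.lower[h % len(self.lower)]

    def get(self, key):
        for bucket in self._buckets(key):
            if key in bucket:
                return bucket[key]
        return None

    def put(self, key, value):
        while True:
            candidates = self._buckets(key)
            for bucket in candidates:      # key already present: overwrite it
                if key in bucket:
                    bucket[key] = value
                    return
            for bucket in candidates:      # upper layer first, then lower layer
                if len(bucket) < SLOTS:
                    bucket[key] = value
                    return
            self._resize()                 # both buckets are full: re-hash, retry

    def _resize(self):
        staging = self.lower               # lower layer becomes the staging layer
        self.lower = self.upper            # old upper layer becomes the lower layer
        self.upper = [dict() for _ in range(2 * len(self.lower))]
        for bucket in staging:             # only staging keys are re-hashed
            for k, v in bucket.items():
                self.upper[hash(k) % len(self.upper)][k] = v
```

Only the keys that sat in the old lower layer move during a resize; the old upper layer is reused untouched, which is where the saving in rewritten data comes from.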
As a preferred embodiment, the hierarchical heterogeneous B+ tree is stored hierarchically across DRAM and persistent memory: the leaf layer of the B+ tree, that is, all leaf nodes, is stored in persistent memory, and the upper nodes are stored in DRAM;
each leaf node stores 16 key-value pairs of 256 B each, so each leaf node is 4 KB in size;
and the hierarchical heterogeneous B+ tree uses a WAL log for insert, delete and update operations.
When an insert, delete or update is performed, the invention first writes the modified leaf node into DRAM and points the parent node's pointer to that DRAM copy, then records the operation in the WAL log in persistent memory. A thread is started that periodically applies the log: it applies the logged operations to the leaf nodes stored in persistent memory through transactions, and after a log record has been committed to persistent memory, the pointer of the parent node in DRAM is pointed to the updated node in persistent memory.
As a preferred embodiment, each WAL log record of the present invention is 256 bytes in size, with the following fields: a 2-byte CRC16 check code, a 1-byte operation type OPTYPE, a 1-byte value length VALUE_LEN, a 4-byte sequence number SERIAL_NUM, an 8-byte key content KEY_CONTENT, and a value content VALUE_CONTENT of at most 240 bytes; each record is appended to the tail of the log.
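To make the record layout concrete, the following Python sketch packs and checks one 256-byte record. Several details are assumptions not fixed by the text: little-endian byte order, zero padding of the value up to 240 bytes, the CRC-CCITT variant provided by `binascii.crc_hqx`, and computing the checksum over every byte after the CRC field. The function names are hypothetical.

```python
import binascii
import struct

# CRC16 | OPTYPE | VALUE_LEN | SERIAL_NUM | KEY_CONTENT | VALUE_CONTENT (zero padded)
LAYOUT = struct.Struct("<HBBIQ240s")   # 2 + 1 + 1 + 4 + 8 + 240 = 256 bytes
MAX_VALUE = 240
OP_WRITE = 1                           # the worked example in this document uses 1 for a write


def encode_record(optype, serial, key, value):
    """Build one fixed-size log record; value is a bytes object of at most 240 bytes."""
    if len(value) > MAX_VALUE:
        raise ValueError("value exceeds 240 bytes")
    body = struct.pack("<BBIQ240s", optype, len(value), serial, key, value)
    crc = binascii.crc_hqx(body, 0)    # assumption: CRC covers all bytes after the CRC field
    return struct.pack("<H", crc) + body


def decode_record(record):
    """Return (optype, serial, key, value), or None if the checksum does not match."""
    crc, optype, value_len, serial, key, padded = LAYOUT.unpack(record)
    if binascii.crc_hqx(record[2:], 0) != crc:
        return None                    # torn or corrupted record
    return optype, serial, key, padded[:value_len]
```

A recovery pass could read the log in 256-byte steps and stop at the first record for which `decode_record` returns `None`.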
In a second aspect, the present invention further provides a data storage method based on the graph data heterogeneous hierarchical storage structure, where the data storage method includes:
when the graph database process is started, the hierarchical heterogeneous B+ tree firstly reads leaf nodes in the persistent memory, and builds upper nodes of the B+ tree in the DRAM, wherein a bottom pointer points to the leaf nodes in the persistent memory;
when a graph topology and node attributes thereof are imported, firstly, two hierarchical heterogeneous hash tables are constructed, and mapping of node IDs and source edges and destination edges thereof and mapping of edge IDs and source nodes and destination nodes thereof are respectively stored;
and, according to the attribute type of the node, building a hierarchical heterogeneous hash table to store an equality index or building a hierarchical heterogeneous B+ tree to store a range query index.
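The start-up step, in which the upper nodes are rebuilt in DRAM from the leaf nodes found in persistent memory, can be sketched as a bottom-up bulk load. This is a minimal sketch under assumptions: leaves are represented only by their smallest key and an opaque reference, inner nodes are plain dictionaries, and the `fanout` parameter and the function name `build_upper` are hypothetical.

```python
def build_upper(leaves, fanout=4):
    """Build inner nodes over leaves given as (min_key, leaf_ref) pairs sorted by min_key.

    Returns the root. Each inner node holds separator keys and child references;
    the lowest level of child references points at the persistent leaves.
    """
    level = leaves
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            node = {
                "keys": [min_key for min_key, _ in group[1:]],  # separators
                "children": [child for _, child in group],
            }
            parents.append((group[0][0], node))
        level = parents
    return level[0][1]
```

Because only the leaves are persistent, nothing above them has to be logged or recovered; the upper levels are derived data.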
As a preferred embodiment, when writing data to the hierarchical heterogeneous hash table:
the key-value pair is written into the DRAM hash table for temporary storage;
at the same time, the key-value pair is written into the WAL log stored in persistent memory;
a background thread periodically parses the WAL log and applies the modifications to the persistent memory hash table;
the space of the temporarily stored key-value pair in the DRAM hash table is released;
when writing data to the hierarchical heterogeneous B+ tree:
if the leaf node has enough space, so that no node split or merge is triggered, the leaf node to be written is copied from persistent memory into DRAM, the value is written into the copy, and the upper-layer pointer is pointed to the copy; if the leaf node does not have enough space, two new leaf nodes are created in DRAM and the upper-layer pointers are pointed to the two new leaf nodes;
the key-value pair is written into the WAL log stored in persistent memory;
a background thread periodically parses the WAL log and applies the modifications to the leaf nodes in persistent memory;
and the space of the leaf nodes temporarily stored for the B+ tree in DRAM is released.
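The four hash table write steps listed above can be modelled with ordinary Python containers. In this sketch a list stands in for the append-only WAL in persistent memory and a dictionary stands in for the persistent memory hash table; the class name `TieredHashTable` and its methods are hypothetical, and the background thread is reduced to an explicit `apply_log` call with no concurrency control.

```python
class TieredHashTable:
    """Toy model: DRAM staging table + WAL + persistent table."""

    def __init__(self):
        self.dram = {}     # temporary DRAM hash table
        self.wal = []      # stands in for the WAL log in persistent memory
        self.pmem = {}     # stands in for the persistent memory hash table
        self.applied = 0   # number of WAL records already applied

    def put(self, key, value):
        self.dram[key] = value            # step 1: stage in DRAM
        self.wal.append((key, value))     # step 2: append to the WAL

    def apply_log(self):
        # step 3: what the periodic background thread would do
        for key, value in self.wal[self.applied:]:
            self.pmem[key] = value
        self.applied = len(self.wal)
        # step 4: release the staged entries (safe here because nothing runs concurrently)
        self.dram.clear()

    def get(self, key):
        if key in self.dram:              # freshest data is in DRAM
            return self.dram[key]
        return self.pmem.get(key)
```

The caller only waits for one DRAM update and one sequential log append; the random writes into the persistent table happen later, off the request path.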
In a third aspect, the present invention further provides a data query method based on the heterogeneous hierarchical storage structure of graph data, where the data query method includes:
when the hierarchical heterogeneous hash table needs to be accessed:
a hash_key is calculated through a hash function;
the DRAM hash table is searched: if the hash_key does not exist, the next step is taken; if the hash_key exists, the entry is checked for an empty mark, and null is returned if it is marked empty, otherwise the value is returned;
and the persistent memory hash table is searched: the value is returned if the hash_key exists, otherwise null is returned.
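The lookup order above (DRAM first, then persistent memory) can be sketched as follows. The sketch assumes that the "empty mark" is a tombstone for a key whose deletion has not yet been applied to persistent memory; that reading, the `TOMBSTONE` sentinel and the function name `lookup` are assumptions. Python dictionaries compute the hash internally, which stands in for the explicit hash_key step.

```python
TOMBSTONE = object()  # hypothetical marker: entry is staged in DRAM as empty


def lookup(key, dram_table, pmem_table):
    """Return the value for key, or None if it is absent or marked empty."""
    if key in dram_table:
        staged = dram_table[key]
        return None if staged is TOMBSTONE else staged
    return pmem_table.get(key)            # falls back to the persistent table
```

Checking DRAM first matters for correctness, not just speed: a staged entry shadows the older copy that is still in persistent memory.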
As a preferred embodiment, the data query method of the present invention further includes:
when the hierarchical heterogeneous B+ tree needs to be accessed:
a binary search in the root node finds the first value greater than the key; if such a value exists, the child node corresponding to it is accessed, otherwise the last child pointer of the root node is followed;
that child node becomes the current node, and a binary search in the current node finds the first value greater than the key; if such a value exists, the child node corresponding to it is accessed, otherwise the last child pointer of the current node is followed; this step is repeated until a leaf node is reached;
when the pointer to the leaf node is obtained, the leaf node is read from DRAM if the pointer refers to DRAM, otherwise it is read from persistent memory; a binary search determines whether the key exists, and the value is returned if it does, otherwise null is returned.
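The descent described above maps directly onto Python's `bisect` module: `bisect_right` returns the position of the first separator greater than the key, or the number of separators if there is none, which is exactly the index of the child to follow. The node classes and the `in_dram` flag below are hypothetical, and the sketch assumes that a separator equal to the key sends the search to the right-hand child.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass


@dataclass
class Inner:
    keys: list        # sorted separator keys
    children: list    # len(children) == len(keys) + 1


@dataclass
class Leaf:
    keys: list        # sorted keys
    values: list
    in_dram: bool = False   # hypothetical flag: DRAM copy or persistent leaf


def search(node, key):
    """Return the value stored for key, or None if the key is absent."""
    while isinstance(node, Inner):
        # first separator greater than key, else the last child pointer
        node = node.children[bisect_right(node.keys, key)]
    i = bisect_left(node.keys, key)       # binary search inside the leaf
    if i < len(node.keys) and node.keys[i] == key:
        return node.values[i]
    return None
```

The search code is the same whether the leaf lives in DRAM or in persistent memory; only the pointer it follows differs.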
Compared with the prior art, the invention has the following advantages and beneficial effects:
1. the hierarchical heterogeneous hash table and the hierarchical heterogeneous B+ tree exploit the heterogeneous characteristics of the storage media, reducing the write latency and the number of writes of persistent memory, while the read speed of DRAM and persistent memory is used to accelerate queries;
2. the invention records modified content in DRAM, writes the persistent WAL log in units of 256 bytes, and performs modifications to persistent memory in a background thread. Using the data recorded in DRAM greatly reduces the blocking caused by simultaneous reads and writes to persistent memory, and the properties of hierarchical hashing reduce the number of writes to persistent memory and extend its service life.
Drawings
The accompanying drawings, which are included to provide a further understanding of embodiments of the invention and are incorporated in and constitute a part of this application, illustrate embodiments of the invention. In the drawings:
FIG. 1 is a schematic diagram of a graph topology and the hash representation of its topology data according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a hierarchical heterogeneous B+ tree boot-up according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a hierarchical heterogeneous hash table write data according to an embodiment of the present invention;
FIG. 4 is a diagram of hierarchical heterogeneous B+ tree write data according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a hierarchical heterogeneous hash table query data according to an embodiment of the present invention;
FIG. 6 is a flow chart of a hierarchical heterogeneous hash table lookup according to an embodiment of the present invention.
Detailed Description
For the purpose of making apparent the objects, technical solutions and advantages of the present invention, the present invention will be further described in detail with reference to the following examples and the accompanying drawings, wherein the exemplary embodiments of the present invention and the descriptions thereof are for illustrating the present invention only and are not to be construed as limiting the present invention.
Examples:
In existing graph storage and query technology, lazy loading spreads the data I/O time across individual queries; although this helps memory management and limits memory use, it severely reduces query speed. Full loading must load all data into memory at program initialization, which severely slows start-up and may exhaust memory because memory capacity is limited. A large number of writes to persistent memory, in turn, creates a serious performance bottleneck. On this basis, this embodiment provides a graph data heterogeneous hierarchical storage structure based on persistent memory. Following the characteristics of graph data, it stores the topology data and the equality query data structure of the index data in a hierarchical heterogeneous hash table, and stores the range query data structure of the index data in a hierarchically stored B+ tree, while attribute data is still persisted to disk with an LSM-Tree. This addresses the limits on query speed and memory space. Write latency is reduced by converting random writes to persistent memory into sequential writes, and the hierarchical storage architecture reduces the number of writes to data structures in persistent memory, which improves query speed.
Graph data queries fall into three kinds. The first is the topology data query, which computes over graph topology information; topology data is stored in a hash table, and the topology data of a node or edge is looked up by its node or edge ID. The second is the query of graph node attribute information; because attribute information occupies a large amount of storage space and is only accessed at the end of a graph query, this embodiment persists graph attribute information to disk with an LSM-Tree. The third is the index query, which has two types: equality queries, for which this embodiment uses hash table storage, and range queries, for which it uses B+ tree storage. Index and topology queries are used frequently in graph queries, and the edges and nodes involved grow exponentially with the number of expand operations, so the hash table and the B+ tree are well suited to being stored in persistent memory. This embodiment therefore focuses on optimizing the storage of the hash table and the B+ tree in persistent memory.
To address the high random-write latency of persistent memory, this embodiment uses a hierarchical heterogeneous hash table to reduce random writes, and the table uses a log to reduce the write latency of persistent content. Specifically, insert, delete and update operations on the persistent memory hash table go through a WAL log: the operation is first recorded in the WAL log, and the new data is then written into the DRAM hash table (DRAM HashMap). Each WAL log record is 256 bytes in size, with the following fields: CRC16 check code (2 bytes), operation type OPTYPE (1 byte), value length VALUE_LEN (1 byte), sequence number SERIAL_NUM (4 bytes), key content KEY_CONTENT (8 bytes), and value content VALUE_CONTENT (variable length, at most 240 bytes). Each record is appended to the tail of the log. Because persistent memory (PMEM) guarantees 8-byte atomic writes, the 8 bytes made up of the CRC16 check code, operation type, value length and sequence number are guaranteed to be written to persistent memory together, so the CRC16 check code can be used to determine whether a record was written successfully and completely. The 256-byte record size makes the fullest use of the sequential write performance of persistent memory. At the same time, a thread is started that periodically applies the log, using transactions to apply log records to the hash table in persistent memory (PMEM HashMap). In addition, the hierarchical heterogeneous hash table borrows the idea of hierarchical hashing (Pengfei Zuo et al.) to reduce the amount of data modified when the hash table is re-hashed. Specifically, the hash table is divided into an upper layer and a lower layer, the lower layer is 1/2 the size of the upper layer, and every two buckets of the upper layer correspond to one bucket of the lower layer. Each bucket contains 8 key-value pairs. When a key is written into the hash table, the corresponding upper-layer bucket is first checked for the key; if the key is not there, the corresponding lower-layer bucket is checked; if the key is in neither, it is written directly. If both the upper-layer and lower-layer buckets are full, a re-hash operation starts: the lower layer becomes a staging layer, the upper layer becomes the new lower layer, a new upper layer twice the size of the original upper layer is allocated, and the keys of the staging layer are re-hashed into the new upper layer. In this way, the key-value pairs changed by a re-hash are reduced to 1/3 of the whole hash table.
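The 1/3 figure follows from the layer sizes alone. As a quick check, assume the upper layer has N buckets (N is a symbol introduced here for illustration) and every bucket holds the same number of slots; only the staging layer, which is the old lower layer, is re-hashed:

```latex
\frac{\text{re-hashed buckets}}{\text{total buckets}}
  = \frac{N/2}{N + N/2}
  = \frac{N/2}{3N/2}
  = \frac{1}{3}
```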
In a traditional B+ tree, adding or deleting nodes requires modifying a large part of the upper structure of the tree, which produces many write operations. To reduce write operations in persistent memory, this embodiment uses a hierarchical heterogeneous B+ tree. Specifically, DRAM and persistent memory are used for hierarchical storage: the leaf layer of the B+ tree, that is, all leaf nodes, is stored in persistent memory (the PMEM part of the B+ tree), and the upper nodes are stored in DRAM (the DRAM part of the B+ tree). Each leaf node stores 16 key-value pairs of 256 B each, so each leaf node is 4 KB in size. On a write, update or delete, the modified leaf node is first written into DRAM and the pointer of its parent node is pointed to the DRAM copy; the operation is then recorded in the WAL log in persistent memory, whose format is the same as that of the hierarchical heterogeneous hash table. A thread is started that periodically applies the log: it applies the logged operations to the leaf nodes stored in persistent memory through transactions, and after a log record has been committed to persistent memory, the pointer of the parent node in DRAM is pointed to the updated node in persistent memory.
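The pointer handling in this paragraph can be sketched as a copy-on-write of one leaf. The sketch below is a simplification under assumptions: leaves never split, a `location` string replaces real DRAM and persistent memory addresses, the WAL is a Python list, and all names (`Leaf`, `write`, `apply_log`) are hypothetical.

```python
import bisect
import copy


class Leaf:
    def __init__(self, keys, values, location):
        self.keys, self.values, self.location = keys, values, location


def insert_sorted(leaf, key, value):
    i = bisect.bisect_left(leaf.keys, key)
    leaf.keys.insert(i, key)
    leaf.values.insert(i, value)


def write(children, slot, key, value, wal):
    """Foreground path: modify a DRAM copy, swing the parent pointer, log the operation."""
    current = children[slot]
    if current.location == "PMEM":
        current = copy.deepcopy(current)   # copy the persistent leaf into DRAM
        current.location = "DRAM"
        children[slot] = current           # parent now points at the DRAM copy
    insert_sorted(current, key, value)
    wal.append((slot, key, value))         # stands in for the WAL in persistent memory


def apply_log(children, pmem_leaves, wal):
    """Background path: replay the log on the persistent leaves, then swing pointers back."""
    for slot, key, value in wal:
        insert_sorted(pmem_leaves[slot], key, value)
    wal.clear()
    for slot, leaf in enumerate(pmem_leaves):
        children[slot] = leaf              # the DRAM copy becomes garbage
```

Readers always follow the parent pointer, so they see the new value immediately even though the persistent leaf is updated later.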
The process of storing and querying the graph database data based on the hierarchical heterogeneous storage structure specifically includes the following steps:
the data storage (data writing) process is:
when the machine is started (i.e. when the graph database process is started), the hierarchical heterogeneous B+ tree needs to read leaf nodes in the persistent memory first, and an upper node of the B+ tree is built in the DRAM, and a bottom pointer points to the leaf nodes in the persistent memory, as shown in FIG. 2;
when a graph topology and its node attributes are imported, two hierarchical heterogeneous hash tables are first constructed, and the mappings of the node IDs and their source and destination edges, and the mappings of the edge IDs and their source and destination nodes are stored, respectively, as shown in fig. 1.
Indexes on node attributes are built as required. Depending on the attribute type, an index is either an equality index or a range query index; a hierarchical heterogeneous hash table can be built to store an equality index, and a hierarchical heterogeneous B+ tree to store a range query index.
As shown in FIG. 3, when writing data to the hierarchical heterogeneous hash table:
the key-value pair is written into the DRAM hash table for temporary storage;
at the same time, the key-value pair is written into the WAL log stored in persistent memory;
a background thread periodically parses the WAL log and applies the modifications to the persistent memory hash table;
and the space of the temporarily stored key-value pair in the DRAM hash table is released.
As shown in FIG. 4, when writing data to the hierarchical heterogeneous B+ tree:
if the leaf node has enough space, so that no node split or merge is triggered, the leaf node to be written is copied from persistent memory into DRAM, the value is written into the copy, and the upper-layer pointer is pointed to the copy; if the leaf node does not have enough space, two new leaf nodes are created in DRAM and the upper-layer pointers are pointed to the two new leaf nodes;
the key-value pair is written into the WAL log stored in persistent memory;
a background thread periodically parses the WAL log and applies the modifications to the leaf nodes in persistent memory;
and the space of the leaf nodes temporarily stored for the B+ tree in DRAM is released.
The data query process comprises the following steps:
When an expand operation is performed, that is, when the set of destination nodes of all outgoing or incoming edges of a node set is obtained, the hierarchical heterogeneous hash table must be accessed, as shown in detail in FIGS. 5 and 6:
a hash_key is calculated through a hash function hash(x);
the DRAM hash table is searched: if the hash_key does not exist, the next step is taken; if the hash_key exists, the entry is checked for an empty mark, and null is returned if it is marked empty, otherwise the value is returned;
and the persistent memory hash table is searched: the value is returned if the hash_key exists, otherwise null is returned.
When a node needs to be looked up by attribute, the hierarchical heterogeneous B+ tree of the attribute index must be accessed:
a binary search in the root node finds the first value greater than the key; if such a value exists, the child node corresponding to it is accessed, otherwise the last child pointer of the root node is followed;
that child node (the one determined in the previous step) becomes the current node, and a binary search in the current node finds the first value greater than the key; if such a value exists, the child node corresponding to it is accessed, otherwise the last child pointer of the current node is followed; this step is repeated until a leaf node is reached;
when the pointer to the leaf node is obtained, the leaf node is read from DRAM if the pointer refers to DRAM, otherwise it is read from persistent memory; a binary search determines whether the key exists, and the value is returned if it does, otherwise null is returned.
The graph database storage and query flows are illustrated below with a specific example:
the storage process (writing process) specifically includes:
when the machine is started (that is, when the graph database process is started), the hierarchical heterogeneous B+ tree first reads the leaf nodes in persistent memory to construct the upper nodes;
when a graph topology and its node attributes are imported, two hierarchical heterogeneous hash tables are first constructed, which store, respectively, the mapping from node IDs to their source and destination edges and the mapping from edge IDs to their source and destination nodes. Indexes on node attributes are built as required; depending on the attribute type, an index is either an equality index or a range query index, for which a hierarchical heterogeneous hash table or a hierarchical heterogeneous B+ tree can be built. Assume here that a range query index is built on the age attribute.
For the two nodes, the key-value pairs {78562323330406344, {"out_edge_id": 1}} and {43455993040678557, {"in_edge_id": 1}} must be added to the topology data hash table, and the key-value pairs {19, 78562323330406344} and {25, 43455993040678557} must be added to the age attribute index.
When writing data to the topology data hash table:
(1) The key-value pair {78562323330406344, {"out_edge_id": 1}} is written into the DRAM hash table for temporary storage;
(2) The operation is recorded in the WAL log: the operation code OPTYPE is set to 1 (write), the KEY is set to 78562323330406344, and the VALUE is set to the serialized string of {"out_edge_id": 1}, giving a log record of 256 bytes, which is written into the WAL log stored in persistent memory (PMEM);
(3) The background thread periodically parses the WAL log and applies the change to the hash table in persistent memory;
(4) The space of the temporarily stored key-value pair {78562323330406344, {"out_edge_id": 1}} in the DRAM hash table is released.
When {19, 78562323330406344} is written to the age attribute index B+ tree:
(1) If the leaf node has enough space, so that no node split or merge is triggered, copy the target leaf node from persistent memory into the DRAM, write the value into the copy, and point the upper-layer pointer to it;
if the leaf node space is insufficient, generate two new leaf nodes in the DRAM and point the upper-layer pointers to the two leaf nodes.
(2) Write the operation to the WAL log: set OPTY to 1 (write), set KEY to 19, and set VALUE to the serialized string of 78562323330406344, giving a log record of at most 256 bytes, which is written to the WAL log stored in persistent memory.
(3) A background thread periodically parses the WAL log, applies the changes to the leaf nodes of the B+ tree in persistent memory, and redirects the upper-layer pointer in the DRAM to the leaf node in persistent memory.
(4) The space of the leaf node temporarily stored for the B+ tree in the DRAM is released.
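The log record written in step (2) can be sketched as a fixed 256-byte structure using the field widths listed in claim 6 (2-byte CRC16, 1-byte operation type, 1-byte value length, 4-byte serial number, 8-byte key, up to 240 bytes of value). The function names, the little-endian byte order, the zero padding to a fixed 256 bytes, and the choice of the CRC-CCITT variant provided by Python's binascii.crc_hqx are assumptions; the source does not specify them.

```python
import binascii
import struct

RECORD_SIZE = 256
MAX_VALUE = 240
OP_WRITE = 1
# OPTYPE (1 byte), VALUE_LEN (1 byte), SERIAL_NUM (4 bytes), KEY_CONTENT (8 bytes)
HEADER = struct.Struct("<BBIQ")


def encode_record(op: int, serial: int, key: int, value: bytes) -> bytes:
    """Pack one WAL record; the CRC covers everything after the CRC field."""
    if len(value) > MAX_VALUE:
        raise ValueError("value exceeds 240 bytes")
    body = HEADER.pack(op, len(value), serial, key) + value.ljust(MAX_VALUE, b"\0")
    crc = binascii.crc_hqx(body, 0)
    return struct.pack("<H", crc) + body


def decode_record(record: bytes):
    """Verify the CRC and return (op, serial, key, value)."""
    if len(record) != RECORD_SIZE:
        raise ValueError("record must be 256 bytes")
    (crc,) = struct.unpack_from("<H", record)
    body = record[2:]
    if binascii.crc_hqx(body, 0) != crc:
        raise ValueError("CRC mismatch")
    op, value_len, serial, key = HEADER.unpack_from(body)
    value = body[HEADER.size:HEADER.size + value_len]
    return op, serial, key, value


# The age index write from the example: KEY = 19, VALUE = serialized node ID.
record = encode_record(OP_WRITE, 1, 19, b"78562323330406344")
```

With these widths the header is 14 bytes, so 2 + 14 + 240 gives exactly 256 bytes per record.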
When {25, 43455993040678557} is written to the age attribute index B+ tree:
(1) If the leaf node has enough space, so that no node split or merge is triggered, copy the target leaf node from persistent memory into the DRAM, write the value into the copy, and point the upper-layer pointer to it;
if the leaf node space is insufficient, generate two new leaf nodes in the DRAM and point the upper-layer pointers to the two leaf nodes;
(2) Write the operation to the WAL log: set the operation code OPTY to 1 (write), set KEY to 25, and set VALUE to the serialized string of 43455993040678557, giving a log record of at most 256 bytes, which is written to the WAL log stored in persistent memory.
(3) A background thread periodically parses the WAL log, applies the changes to the leaf nodes of the B+ tree in persistent memory, and redirects the upper-layer pointer in the DRAM to the leaf node in persistent memory.
(4) The space of the two leaf nodes temporarily stored for the B+ tree in the DRAM is released.
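The leaf write path described for the two age index entries can be sketched as below. Only the case without a split is modeled, with a single leaf and a single parent pointer; the class name TieredLeafStore, the ("dram" / "pmem", leaf_id) pointer encoding, and the explicit apply_wal() call standing in for the timed background thread are assumptions made for illustration.

```python
import bisect

LEAF_CAPACITY = 16  # key-value pairs per leaf, as stated in claim 4


class TieredLeafStore:
    """Toy model of one B+ tree leaf that lives in PMEM and is edited in DRAM."""

    def __init__(self):
        self.pmem = {0: []}            # leaf id -> sorted (key, value) pairs
        self.dram = {}                 # temporary leaf copies
        self.parent_ptr = ("pmem", 0)  # upper-layer pointer kept in DRAM
        self.wal = []                  # stand-in for the WAL in persistent memory

    def read_leaf(self):
        tier, leaf_id = self.parent_ptr
        return (self.dram if tier == "dram" else self.pmem)[leaf_id]

    def insert(self, key, value):
        tier, leaf_id = self.parent_ptr
        current = self.read_leaf()
        if len(current) >= LEAF_CAPACITY:
            raise NotImplementedError("split path is not modeled in this sketch")
        if tier == "pmem":
            self.dram[leaf_id] = list(current)              # step (1): copy to DRAM
        bisect.insort(self.dram[leaf_id], (key, value))     # write into the copy
        self.parent_ptr = ("dram", leaf_id)                 # repoint the parent
        self.wal.append((1, key, str(value)))               # step (2): log the write

    def apply_wal(self):
        _, leaf_id = self.parent_ptr
        for op, key, payload in self.wal:                   # step (3): apply to PMEM
            if op == 1:
                bisect.insort(self.pmem[leaf_id], (key, int(payload)))
        self.wal.clear()
        self.parent_ptr = ("pmem", leaf_id)                 # swing pointer back
        self.dram.pop(leaf_id, None)                        # step (4): free DRAM copy


store = TieredLeafStore()
store.insert(19, 78562323330406344)
store.insert(25, 43455993040678557)
store.apply_wal()
```

Readers always follow parent_ptr, so they see the DRAM copy while a change is pending and the persistent leaf once the log has been applied.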
The query process specifically includes:
When an expansion operation is performed, that is, when a node set is given and the set of all destination nodes reached through the outgoing or incoming edges of that node set is to be obtained, the topology data hash table must be accessed.
For example, when node 78562323330406344 looks up its outgoing edge 1, and outgoing edge 1 looks up its destination node 43455993040678557, the topology hash table is accessed as follows:
(1) Calculate the hash_key with the hash function hash(x).
(2) Search the DRAM hash table. If the hash_key does not exist, continue to the next step; if it exists, check whether it is marked as empty: if so, return null, otherwise return the value.
(3) Search the persistent memory hash table. If the hash_key exists, return the value; otherwise return null.
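The lookup order above (DRAM first, then persistent memory) can be sketched as follows. The "marked as empty" state is read here as a deletion marker that hides an older value still present in persistent memory; that reading, the TOMBSTONE sentinel, the lookup function name, and the "src" and "dst" field names of the edge table are assumptions. Python dicts compute the hash internally, so step (1) is implicit.

```python
TOMBSTONE = object()  # hypothetical marker for an entry flagged as empty in DRAM


def lookup(key, dram, pmem):
    """Return the value for key, checking DRAM before persistent memory."""
    if key in dram:                       # step (2): search the DRAM hash table
        value = dram[key]
        return None if value is TOMBSTONE else value
    return pmem.get(key)                  # step (3): search the PMEM hash table


# Expansion from the example: node -> outgoing edge -> destination node.
node_pmem = {78562323330406344: {"out_edge_id": 1}}
edge_pmem = {1: {"src": 78562323330406344, "dst": 43455993040678557}}

edge_id = lookup(78562323330406344, {}, node_pmem)["out_edge_id"]
destination = lookup(edge_id, {}, edge_pmem)["dst"]
```

Checking DRAM first means a pending write or deletion always takes precedence over the older state in persistent memory.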
When a node needs to be looked up by attribute, for example a node with age = 19, the B+ tree index of the age attribute must be accessed:
(1) Binary-search the root node for the first value greater than 19. If such a value exists, access the child node corresponding to it; otherwise, access the child node referenced by the last child pointer of the root node.
(2) Take that child node as the current node and binary-search it for the first value greater than 19. If such a value exists, access the corresponding child node; otherwise, access the child node referenced by the last child pointer of the current node. Repeat this step until a leaf node is reached.
(3) Once the leaf node pointer is obtained, read the leaf node from the DRAM if the pointer refers to the DRAM, and from persistent memory otherwise. Use binary search to determine whether key = 19 exists; if it does, return the value, namely node ID 78562323330406344; otherwise return null.
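The descent above maps directly onto binary search: finding the first separator greater than the key is what bisect_right returns, and falling through to the last child is the case where it returns the number of separators. The sketch below assumes hypothetical Inner and Leaf classes and does not model the DRAM versus persistent memory placement of leaves, since both are plain Python objects here.

```python
import bisect
from dataclasses import dataclass


@dataclass
class Leaf:
    keys: list    # sorted keys
    values: list  # values[i] belongs to keys[i]


@dataclass
class Inner:
    keys: list      # sorted separator keys
    children: list  # len(children) == len(keys) + 1


def search(node, key):
    """Return the value stored under key, or None if it is absent."""
    while isinstance(node, Inner):
        # Index of the first separator greater than key; equals len(keys),
        # i.e. the last child, when no such separator exists.
        node = node.children[bisect.bisect_right(node.keys, key)]
    i = bisect.bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.values[i]
    return None


root = Inner(
    keys=[25],
    children=[
        Leaf(keys=[19], values=[78562323330406344]),
        Leaf(keys=[25], values=[43455993040678557]),
    ],
)
result = search(root, 19)
```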
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
The foregoing description of the embodiments is provided to illustrate the general principles of the invention and is not intended to limit the scope of the invention to the particular embodiments described; any modifications, equivalents, and improvements that fall within the spirit and principles of the invention are intended to be included within its scope.

Claims (10)

1. The graph data heterogeneous hierarchical storage structure based on the persistent memory is characterized by comprising a hierarchical heterogeneous hash table and a hierarchical heterogeneous B+ tree;
wherein the topology data of the graph data and the equality query data structure in the index data are each stored in a hierarchical heterogeneous hash table;
the range query data structure in the index data of the graph data is stored by adopting a hierarchical heterogeneous B+ tree.
2. The heterogeneous hierarchical storage structure of graph data based on persistent memory according to claim 1, wherein the hierarchical heterogeneous hash table is divided into an upper layer and a lower layer, and the size of the lower layer is one half of the size of the upper layer, namely, every two buckets of the upper layer correspond to one bucket of the lower layer;
the hierarchical heterogeneous hash table adopts WAL logs to perform addition, deletion and modification operations.
3. The graph data heterogeneous hierarchical storage structure based on persistent memory according to claim 2, wherein each bucket contains 8 key-value pairs; when a key is written into the hash table, the corresponding upper-layer bucket is first queried for the key; if the key is not found, the corresponding lower-layer bucket is queried; if the key is not found there either, the key is written directly; if both the upper-layer and lower-layer buckets are full, a rehash operation is performed, in which the original lower layer becomes the temporary storage layer, the original upper layer becomes the lower layer, the new upper layer is twice as large as the original upper layer, and the keys of the temporary storage layer are rehashed into the new upper layer.
4. The graph data heterogeneous hierarchical storage structure based on the persistent memory according to claim 1, wherein the hierarchical heterogeneous B+ tree is stored in a DRAM and the persistent memory in a hierarchical manner, leaf layers of the B+ tree, namely all leaf nodes, are stored in the persistent memory, and upper nodes are stored in the DRAM;
each leaf node stores 16 256B key-value pairs, i.e., each leaf node is 4KB in size;
and the hierarchical heterogeneous B+ tree adopts WAL logs to perform addition, deletion and modification operations.
5. The graph data heterogeneous hierarchical storage structure based on persistent memory according to claim 4, wherein, when an add, delete, or modify operation is performed, the modified leaf node is written into the DRAM and the pointer of its parent node is pointed to the DRAM; the operation is then written into the WAL log in the persistent memory; a thread that periodically applies the log is started and applies the logged operations, through transactions, to the leaf nodes stored in the persistent memory; and when a log is committed to the persistent memory, the pointer of the parent node in the DRAM is pointed to the updated node in the persistent memory.
6. The graph data heterogeneous hierarchical storage structure based on persistent memory according to any one of claims 2 to 5, wherein the WAL log size is 256 bytes and the fields include: a 2-byte CRC16 check code, a 1-byte operation type OPTYPE, a 1-byte value length VALUE_LEN, a 4-byte sequence code SERIAL_NUM, an 8-byte key content KEY_CONTENT, and a value content VALUE_CONTENT of at most 240 bytes; each log is appended to the tail of the log.
7. A data storage method based on a graph data heterogeneous hierarchical storage structure based on a persistent memory according to any one of claims 1 to 6, wherein the data storage method comprises:
when the graph database process is started, the hierarchical heterogeneous B+ tree firstly reads leaf nodes in the persistent memory, and builds upper nodes of the B+ tree in the DRAM, wherein a bottom pointer points to the leaf nodes in the persistent memory;
when a graph topology and node attributes thereof are imported, firstly, two hierarchical heterogeneous hash tables are constructed, and mapping of node IDs and source edges and destination edges thereof and mapping of edge IDs and source nodes and destination nodes thereof are respectively stored;
and, according to the attribute type of the node, establishing a hierarchical heterogeneous hash table to store an equality index or establishing a hierarchical heterogeneous B+ tree to store a range query index.
8. The data storage method of claim 7, wherein when writing data to the hierarchical heterogeneous hash table:
writing the key-value pair into the DRAM hash table for temporary storage;
simultaneously writing the key-value pair into the WAL log stored in the persistent memory;
a background thread periodically parses the WAL log and applies the modifications to the persistent memory hash table;
releasing the space of the temporarily stored key-value pair in the DRAM hash table;
when writing data to the hierarchical heterogeneous B+ tree:
if the leaf node has enough space, so that no node split or merge is triggered, copying the target leaf node from the persistent memory into the DRAM, writing the value into the copy, and pointing the upper-layer pointer to it; if the leaf node space is insufficient, generating two new leaf nodes in the DRAM and pointing the upper-layer pointers to the two new leaf nodes;
writing the key-value pair into the WAL log stored in the persistent memory;
a background thread periodically parses the WAL log and applies the modifications to the leaf nodes in the persistent memory;
and releasing the space of the leaf node temporarily stored for the B+ tree in the DRAM.
9. A data query method based on a graph data heterogeneous hierarchical storage structure based on persistent memory according to any one of claims 1 to 6, wherein the data query method comprises:
when access to the hierarchical heterogeneous hash table is required:
calculating a hash_key through a hash function;
searching the DRAM hash table; if the hash_key does not exist, continuing to the next step; if the hash_key exists, judging whether it is marked as empty, and if so returning null, otherwise returning the value;
and searching the persistent memory hash table, returning the value if the hash_key exists, and otherwise returning null.
10. The data query method of claim 9, wherein the data query method further comprises:
when access to the hierarchical heterogeneous b+ tree is required:
binary-searching the root node for the first value larger than the key value; if such a value exists, accessing the child node corresponding to it, otherwise accessing the child node referenced by the last child pointer of the root node;
taking the child node as the current node and binary-searching the current node for the first value larger than the key value; if such a value exists, accessing the corresponding child node, otherwise accessing the child node referenced by the last child pointer of the current node; and repeating this operation until a leaf node is reached;
when the pointer of the leaf node is obtained, reading the leaf node from the DRAM if the pointer is located in the DRAM, and otherwise reading the leaf node from the persistent memory; judging through binary search whether the key value exists; and returning the value if it exists, and otherwise returning null.
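As an illustration of the two-layer bucket layout recited in claims 2 and 3, the following sketch keeps a lower layer half the size of the upper layer, with 8 slots per bucket, and doubles the upper layer on rehash. It is one possible reading of the claims, not the patented implementation: the class name TwoLevelHash, the multiplicative hash h64, the use of the high hash bits as the bucket index (chosen so that the old upper layer can serve unchanged as the new lower layer), integer-only keys, and Python dicts as buckets are all assumptions.

```python
BUCKET_SLOTS = 8
MASK64 = (1 << 64) - 1


def h64(key: int) -> int:
    """Hypothetical 64-bit multiplicative hash for non-negative integer keys."""
    return (key * 0x9E3779B97F4A7C15) & MASK64


class TwoLevelHash:
    def __init__(self, upper_bits: int = 1):
        self.bits = upper_bits  # the upper layer has 2**bits buckets (bits >= 1)
        self.upper = [dict() for _ in range(1 << upper_bits)]
        self.lower = [dict() for _ in range(1 << (upper_bits - 1))]

    def _upper_idx(self, key: int) -> int:
        return h64(key) >> (64 - self.bits)  # high bits select the bucket

    def get(self, key: int):
        i = self._upper_idx(key)
        if key in self.upper[i]:
            return self.upper[i][key]
        return self.lower[i >> 1].get(key)   # two upper buckets share one lower

    def put(self, key: int, value) -> None:
        while True:
            i = self._upper_idx(key)
            up, low = self.upper[i], self.lower[i >> 1]
            if key in up or len(up) < BUCKET_SLOTS and key not in low:
                up[key] = value
                return
            if key in low or len(low) < BUCKET_SLOTS:
                low[key] = value
                return
            self._rehash()                   # both buckets are full

    def _rehash(self) -> None:
        staging = self.lower                 # old lower layer: temporary storage
        self.lower = self.upper              # old upper layer becomes the lower
        self.bits += 1
        self.upper = [dict() for _ in range(1 << self.bits)]  # twice the size
        for bucket in staging:
            for k, v in bucket.items():
                self.upper[self._upper_idx(k)][k] = v


table = TwoLevelHash()
for node_id in range(1000):
    table.put(node_id, node_id * 2)
```

Because the bucket index is taken from the high bits of the hash, dropping the lowest index bit of a bucket in the new upper layer gives exactly the index that key had in the old upper layer, so entries already stored there do not need to move during a rehash.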

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311270531.8A CN117349477A (en) 2023-09-28 2023-09-28 Graph data heterogeneous hierarchical storage structure based on persistent memory and method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311270531.8A CN117349477A (en) 2023-09-28 2023-09-28 Graph data heterogeneous hierarchical storage structure based on persistent memory and method thereof

Publications (1)

Publication Number Publication Date
CN117349477A true CN117349477A (en) 2024-01-05

Family

ID=89364310

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311270531.8A Pending CN117349477A (en) 2023-09-28 2023-09-28 Graph data heterogeneous hierarchical storage structure based on persistent memory and method thereof

Country Status (1)

Country Link
CN (1) CN117349477A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination