CN117312267B - Line-level garbage collection mechanism based on peloton database - Google Patents

Info

Publication number
CN117312267B
Authority
CN
China
Prior art keywords
version
garbage collection
garbage
peloton
data line
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310849987.3A
Other languages
Chinese (zh)
Other versions
CN117312267A (en
Inventor
张诗晨
宫学庆
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
East China Normal University
Original Assignee
East China Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by East China Normal University filed Critical East China Normal University
Priority to CN202310849987.3A priority Critical patent/CN117312267B/en
Publication of CN117312267A publication Critical patent/CN117312267A/en
Application granted granted Critical
Publication of CN117312267B publication Critical patent/CN117312267B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/219 Managing data history or versioning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2255 Hash tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2272 Management thereof

Abstract

The invention discloses a row-level garbage collection mechanism based on the Peloton database. The invention selects different garbage collection methods for different load scenarios, which avoids a visibility check on every data row and avoids needless traversal overhead, thereby improving garbage collection efficiency. The invention also addresses problems in Peloton itself, such as the untimely reclamation of garbage versions generated by rolled-back transactions and the failure to reclaim empty versions generated by delete operations of committed transactions. The invention can effectively reclaim invalid or expired historical data versions, reduce the memory footprint of the system, and improve the access efficiency of data rows and indexes. The invention can be applied to cloud service systems requiring high performance and high concurrency, such as e-commerce platforms, social networks, and games.

Description

Row-level garbage collection mechanism based on the Peloton database
Technical Field
The invention relates to the technical field of database management, in particular to a row-level garbage collection mechanism based on a peloton database.
Background
Peloton is an in-memory database management system based on multi-version concurrency control (MVCC) that supports high-concurrency, low-latency transactions. Peloton uses an epoch-based timestamp assignment mechanism that assigns an epoch ID to each transaction in order to determine visibility and conflicts between transactions. Peloton also maintains a version chain for each data row to store historical data versions. When a transaction updates a data row, it inserts a new version at the head of the version chain and marks the old version as invalid. This avoids in-place updates of the data row and improves concurrency and performance.
However, Peloton does not implement any garbage collection mechanism, so historical data versions accumulate, occupy significant memory, and reduce the access efficiency of data rows and indexes. Peloton also has further problems, such as the untimely reclamation of garbage versions generated by rolled-back transactions and the failure to reclaim empty versions generated by delete operations of committed transactions. These problems further increase the storage overhead of the system and reduce the reusability of data rows.
To address this, we propose a row-level garbage collection mechanism based on the Peloton database.
Disclosure of Invention
The invention aims to provide a row-level garbage collection mechanism based on a peloton database, which has the advantages of self-adaption, high efficiency, resource saving and performance improvement, and solves the problems of historical data version accumulation, overlarge storage cost and low access efficiency.
To achieve the above purpose, the present invention provides the following technical solution: a row-level garbage collection mechanism based on the Peloton database, comprising a concurrent hash table, a threshold, a background thread and a garbage collection period. The concurrent hash table records the information of each garbage version, including a data block ID, a data row ID, an expiration time and a version type. The threshold is used to judge whether garbage version information needs to continue to be maintained: if more than half of the data rows in a data block are garbage versions, the garbage version queue corresponding to that data block is released and the data block is marked as no longer maintained. The background thread is started periodically and traverses each data block in the concurrent hash table; it judges whether each recorded version is expired according to the garbage version information and the global maximum expired epoch ID, and if so, performs garbage collection. If a data block is marked as no longer maintained, the background thread directly traverses each data row in the data block, judges whether the row is an expired version according to the epoch ID in its version metadata, and if so, performs garbage collection. The garbage collection period controls how often the background thread is started.
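The bookkeeping described above can be sketched as follows. This is a minimal single-threaded illustration only: a plain dictionary stands in for the concurrent hash table, and the names GarbageVersion, GarbageRegistry and record are hypothetical and do not come from Peloton or from the patent.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class GarbageVersion:
    block_id: int      # data block (tile group) ID
    row_id: int        # data row ID inside the block
    expire_epoch: int  # expiration time, expressed here as an epoch ID
    version_type: str  # e.g. "COMMIT_UPDATE" or "COMMIT_DELETE"


class GarbageRegistry:
    """Single-threaded stand-in for the concurrent hash table of garbage versions."""

    def __init__(self, rows_per_block):
        self.rows_per_block = rows_per_block
        self.queues = {}            # block_id -> deque of GarbageVersion
        self.unmaintained = set()   # blocks whose queue has been released

    def record(self, gv):
        """Record a garbage version; return False if its block is not tracked."""
        if gv.block_id in self.unmaintained:
            return False
        queue = self.queues.setdefault(gv.block_id, deque())
        queue.append(gv)
        # "More than half of the data rows in the block are garbage versions."
        if 2 * len(queue) > self.rows_per_block:
            del self.queues[gv.block_id]
            self.unmaintained.add(gv.block_id)
            return False
        return True
```

Once a block crosses the threshold, its queue is dropped and later garbage versions of that block are no longer recorded; the background thread is then expected to scan the block directly.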
Preferably, the garbage collection comprises two steps, unlinking from the index and reclaiming the storage space, and both steps are completed within one garbage collection period.
Preferably, the step of unlinking from the index includes: obtaining the complete data row according to the garbage version information or version metadata; traversing each index of the table to which the data row belongs, extracting the key from the data row according to the fields covered by the index, and forming a key-value pair with the pointer to the data row; and deleting the key-value pair from the index.
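The index step above can be illustrated with a short sketch. It assumes, purely for illustration, that each index is a dictionary with a "columns" list and an "entries" mapping from key tuples to sets of row pointers; the function name unlink_from_indexes is hypothetical and is not Peloton's API.

```python
def unlink_from_indexes(row, row_ptr, indexes):
    """Remove the (key, row_ptr) pair for `row` from every index; return the count removed."""
    removed = 0
    for index in indexes:
        # Build the index key from the columns that this index covers.
        key = tuple(row[col] for col in index["columns"])
        pointers = index["entries"].get(key)
        if pointers is not None and row_ptr in pointers:
            pointers.remove(row_ptr)
            if not pointers:
                # No version is reachable under this key any more.
                del index["entries"][key]
            removed += 1
    return removed
```

Deleting the exact key-value pair, rather than the key alone, leaves any newer version stored under the same key reachable through the index.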
Preferably, the step of reclaiming the storage space includes: obtaining the complete data row according to the garbage version information or version metadata; traversing each field in the data row and, if the field type is a variable-length type, obtaining the pointer stored in the field and releasing the memory it points to; resetting the version metadata of the data row to that of an empty version; and adding the data row to the reusable version slot queue.
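The storage step above can be sketched in the same spirit. The slot layout, the varlen_pool dictionary that stands in for heap memory, the type names in VARLEN_TYPES and the metadata values chosen for an empty version are all assumptions made for this example.

```python
from collections import deque

VARLEN_TYPES = {"varchar", "varbinary"}  # assumed names for variable-length types


def reclaim_slot(slot, schema, varlen_pool, recycle_queue):
    """Free variable-length payloads, reset metadata, and publish the slot for reuse."""
    for column, column_type in schema.items():
        if column_type in VARLEN_TYPES:
            handle = slot["fields"][column]      # the field stores a pointer-like handle
            if handle is not None:
                varlen_pool.pop(handle, None)    # release the memory it points to
        slot["fields"][column] = None
    # Reset the version metadata so the slot looks like an empty version.
    slot["meta"] = {"txn_id": 0, "begin_ts": None, "end_ts": None, "next": None}
    recycle_queue.append(slot["location"])
```

A deque is used here as a simple stand-in for the reusable version slot queue; a later insert could pop a location from it instead of allocating a new slot.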
Preferably, when a transaction finishes, invalid versions are processed directly according to the transaction state and the operation type, and the space occupied by the transaction object is released promptly.
Preferably, when the transaction state is commit: if the operation type is update or delete, the generated old version is added to the concurrent hash table; if the operation type is insert followed by delete, the generated new version is processed directly. When the transaction state is rollback: if the operation type is update or delete, the generated new version is processed directly; if the operation type is insert, the generated new data row is processed directly.
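The case analysis above can be written as a lookup table. The sketch below is illustrative only: the string labels and the function action_at_transaction_end are hypothetical, and "insert_delete" reflects the assumption that the text refers to a row inserted and then deleted within the same transaction.

```python
DEFER_OLD = "add old version to concurrent hash table"
RECLAIM_NEW = "process new version directly"
RECLAIM_ROW = "process new data row directly"

# (transaction state, operation type) -> clean-up action for the touched row.
_ACTIONS = {
    ("commit", "update"): DEFER_OLD,
    ("commit", "delete"): DEFER_OLD,
    ("commit", "insert_delete"): RECLAIM_NEW,   # assumed reading of the text
    ("rollback", "update"): RECLAIM_NEW,
    ("rollback", "delete"): RECLAIM_NEW,
    ("rollback", "insert"): RECLAIM_ROW,
}


def action_at_transaction_end(state, op_type):
    """Return the clean-up action for one touched row, or None if nothing is garbage."""
    return _ACTIONS.get((state, op_type))
```

Combinations that are not listed, such as a committed plain insert or a read, produce no garbage in this sketch and therefore map to None.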
Preferably, when the old version generated by a delete operation is processed, the next version of that old version also needs to be processed; if the next version is an invalid version, it is regarded as the empty version generated by the delete operation, and garbage collection is performed on it.
Preferably, the concurrent hash table is a concurrent_hash_map container provided by Intel TBB, the garbage version queue is a moodycamel::ConcurrentQueue container, and the reusable version slot queue is a moodycamel::ConcurrentQueue container.
Preferably, the epoch ID is the value obtained by right-shifting the end_ts field in the version metadata by 32 bits, and the global maximum expired epoch ID is the minimum epoch ID among all worker threads minus a constant.
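The epoch arithmetic above can be sketched as follows. The helper names and the default constant of 1 are assumptions for illustration, as is the choice to treat a version as expired when its epoch ID is less than or equal to the global maximum expired epoch ID.

```python
EPOCH_SHIFT = 32  # the epoch ID occupies the upper 32 bits of end_ts


def epoch_id(end_ts):
    """Extract the epoch ID by shifting end_ts right by 32 bits."""
    return end_ts >> EPOCH_SHIFT


def max_expired_epoch(worker_epochs, margin=1):
    """Minimum epoch ID among all worker threads, minus a constant."""
    return min(worker_epochs) - margin


def is_expired(end_ts, worker_epochs, margin=1):
    """Assumed rule: expired if the version's epoch is not newer than the global bound."""
    return epoch_id(end_ts) <= max_expired_epoch(worker_epochs, margin)
```

For example, with worker threads in epochs 8, 9 and 12 and a constant of 1, the bound is 7, so a version whose end_ts carries epoch 5 is treated as expired while one carrying epoch 8 is not.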
Preferably, the data blocks are tile groups; each tile group contains a fixed number of data rows together with metadata; each data row contains a plurality of fields and version metadata; each field contains a type, a length, and a value; and each version metadata entry contains a transaction ID, a start time, an end time, a pointer to the data row, and a pointer to the next version.
Compared with the prior art, the invention has the following beneficial effects:
1. By setting a threshold, the invention judges whether garbage version information needs to continue to be maintained, and the background thread directly traverses each data row in a data block marked as no longer maintained in order to reclaim expired versions; this avoids a visibility check on every data row, avoids needless traversal overhead, and improves garbage collection efficiency;
2. The invention records the information of each garbage version in a concurrent hash table, and the background thread periodically traverses the hash table to reclaim expired versions, which reduces the memory footprint of the system and improves the access efficiency of data rows and indexes;
3. The invention uses a dynamic adjustment algorithm to adjust the garbage collection strategy and parameters according to real-time performance indicators of the system, so that different garbage collection methods are selected for different load scenarios and the performance and resource utilization of the system are improved.
Drawings
FIG. 1 is a data structure and version chain of the Peloton database of the present invention;
FIG. 2 is a diagram of a garbage collection strategy and process of the present invention;
FIG. 3 is a system architecture diagram of the present invention;
FIG. 4 is a diagram illustrating a time stamp assignment mechanism according to the present invention;
FIG. 5 is a version chain structure of the present invention;
FIG. 6 is an index structure of the present invention;
FIG. 7 is a diagram of a concurrent hash table structure and reusable version slot queue structure in accordance with the present invention;
FIG. 8 is a flow chart of garbage collection according to the present invention;
FIG. 9 shows the memory changes during operation of the original Peloton system and of the program of the present invention;
FIG. 10 shows the memory changes during operation of the Peloton system and of the program of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The invention provides a row-level garbage collection mechanism based on a Peloton database, which is used for solving the problem of collection of a large number of historical data versions generated in a memory database by a multi-version concurrency control technology. The main technical scheme of the invention is as follows:
The invention adopts epoch-based transaction management and judges whether a version is expired by comparing the epoch_id in the version metadata with the global maximum expired epoch_id, which avoids the cost of a visibility check on each version.
The invention designs an adaptive row-level garbage collection strategy that selects different garbage collection methods for different load scenarios. In a read-intensive scenario, the invention uses a concurrent hash table to record the information of each garbage version, and a background thread periodically traverses the hash table to reclaim expired versions. In a write-intensive scenario, the invention uses a threshold to judge whether to continue maintaining garbage version information: if more than half of the data rows in a data block are garbage versions, the invention releases the garbage version queue corresponding to that data block, and the background thread directly traverses each data row in the data block to reclaim expired versions. This reduces the memory footprint of the system in scenarios dominated by highly concurrent write transactions and avoids needless traversal overhead.
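The two collection paths described above can be sketched as one background pass. The data layout below (lists of (row_id, epoch) pairs for tracked blocks and per-row dictionaries with "invalid" and "epoch" keys for scanned blocks) and the function name collect_garbage are assumptions chosen to keep the example small.

```python
def collect_garbage(tracked, unmaintained, blocks, max_expired_epoch):
    """Return the (block_id, row_id) slots reclaimed in one background pass."""
    reclaimed = []
    # Path 1 (read-intensive): visit only the recorded garbage versions.
    for block_id, entries in tracked.items():
        still_pending = []
        for row_id, epoch in entries:
            if epoch <= max_expired_epoch:
                reclaimed.append((block_id, row_id))
            else:
                still_pending.append((row_id, epoch))
        entries[:] = still_pending
    # Path 2 (write-intensive): scan every row of blocks that are no longer tracked.
    for block_id in unmaintained:
        rows = blocks[block_id]
        for row_id, meta in enumerate(rows):
            if meta is not None and meta["invalid"] and meta["epoch"] <= max_expired_epoch:
                reclaimed.append((block_id, row_id))
                rows[row_id] = None   # the slot is now empty and reusable
    return reclaimed
```

In both paths the expiry decision is a single integer comparison against the global bound, which is what allows the mechanism to skip a per-version visibility check.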
The invention addresses several problems in Peloton, including the untimely reclamation of garbage versions generated by rolled-back transactions, redundancy in the garbage collection process, and the failure to reclaim empty versions generated by delete operations of committed transactions. The invention lets the worker thread process invalid versions directly when a transaction finishes, and completes both steps, unlinking from the index and reclaiming the storage space, within one garbage collection period. The invention also handles the empty versions generated by delete operations so that the data rows can be reused.
The terms used below are defined as follows.
Peloton: an open-source in-memory database management system that supports hybrid OLTP and OLAP workloads, stores data versions using newest-to-oldest (N2O) ordering, and requires garbage collection.
Epoch: a logical time unit representing a continuous period of transaction execution, used for transaction management and garbage collection.
Concurrent hash table: a hash table that supports access and modification by multiple threads and achieves high concurrency through fine-grained locking and lock-free techniques.
Lock-free queue: a queue that does not use locks and instead relies on lock-free algorithms to improve concurrent performance.
Embodiments of the present invention are described below by way of four examples:
example 1:
this example demonstrates the process of row-level garbage collection in a read intensive scenario of the present invention. Assume that there are 5 worker threads and 1 background thread in the system, and the system adopts an epoch-based transaction management mode, and the life cycle of each epoch is 40ms, and the garbage collection period is 200ms. The system has a table comprising two fields: id (int type, primary key) and value (varchar type). There is an index in the system, and a B+ tree index is established according to the id field. The system has a concurrent hash table unlink_map for recording information of each garbage version. The system has a reusable version slot queue recycle_queue for storing recovered version slots.
Assume that initially there are two data rows in the table, as shown in the following table:
Table 1: Initial data rows in the table
id value
1 “hello”
2 “world”
There are two key-value pairs in the index, as shown in the following table:
Table 2: Initial key-value pairs in the index
key value
1 Pointer to data line with id 1
2 Pointer to data line with id 2
Both unlink_map and recycle_queue are empty.
Assume that in the first epoch, there is one transaction T1 that performs the following:
It reads the data row with id 1 and obtains the value "hello". It then updates the data row with id 1, changing the value to "hi", and commits. Because transaction T1 joins the first epoch when it begins, its epoch ID is 0. Because T1 performs an update, it inserts a new version at the head of the version chain and marks the old version as invalid. It also adds the information of the old version to unlink_map and updates the index to point to the new version. After transaction T1 commits, the data rows in the table are as shown in the following table:
The key-value pairs in the index are shown in the following table:
Table 4: Key-value pairs in the index (after transaction T1 commits)
key value
1 Pointer to new version with id 1
2 Pointer to data line with id 2
There is one key-value pair in unlink_map, as shown in the following table:
Table 5: Key-value pairs in unlink_map (after transaction T1 commits)
Assume that no transaction is executed in the second epoch, so the data in the system is unchanged when the second epoch ends. Assume that in the third epoch there is one transaction T2 that performs the following: it reads the data row with id 2 and obtains the value "world", deletes the data row with id 2, and commits. Because T2 joins the third epoch when it begins, its epoch ID is 2. Because T2 performs a delete, it inserts an empty version at the head of the version chain and marks the old version as invalid. It also adds the information of the old version and of the empty version to unlink_map and updates the index to point to the empty version. After transaction T2 commits, the data rows in the table are as shown in the following table:
Table 6: Data rows in the table (after transaction T2 commits)
The key-value pairs in the index are shown in the following table:
Table 7: Key-value pairs in the index (after transaction T2 commits)
key value
1 Pointer to new version with id 1
2 Pointer to null version with id 2
There are three key-value pairs in the unlink_map, as shown in the following table:
Table 8: Key-value pairs in unlink_map (after transaction T2 commits)
Assume that no transaction is executed in the fourth epoch, so the data in the system is unchanged when the fourth epoch ends. Assume that in the fifth epoch there is one transaction T3 that performs the following: it reads the data row with id 1, obtains the value "hi", and rolls back. Because T3 joins the fifth epoch when it begins, its epoch ID is 4. Because T3 only reads before rolling back, it generates no new version and modifies neither the index nor the concurrent hash table, so the data in the system is unchanged after T3 rolls back. Assume that no transaction is executed in the sixth epoch, so the data in the system is again unchanged. Assume that in the seventh epoch there is one transaction T4 that performs the following: it inserts a new data row with id 3 and value "hi", and commits.
Because T4 joins the seventh epoch when it begins, its epoch ID is 6. Because T4 performs an insert, it inserts a new version at the head of the version chain and marks it as valid. It also updates the index to point to the new version. After transaction T4 commits, the data rows in the table are as shown in the following table:
Table 9: Data rows in the table (after transaction T4 commits)
The key-value pairs in the index are shown in the following table:
Table 10: Key-value pairs in the index (after transaction T4 commits)
key value
1 Pointer to new version with id 1
2 Pointer to empty version with id 2
3 Pointer to new version with id 3
There are still three key-value pairs in the unlink_map, as shown in the following table:
Table 11: Key-value pairs in unlink_map (after transaction T4 commits)
Assume that no transaction is executed in the eighth epoch, so the data in the system is unchanged when the eighth epoch ends.
Assume that in the ninth epoch, a background thread is started and garbage collection is performed. Since the garbage collection cycle is 200ms and the life cycle of each epoch is 40ms, garbage collection is started every 5 epochs. The background thread first obtains the minimum epoch ID from the global lock-free queue and subtracts a constant (say 1) to get the global expired maximum epoch ID (say 7). The background thread then traverses each key-value pair in the unlink_map and determines whether it is an expired version based on the epoch ID and version type in the GCMetadata. If yes, garbage recovery is carried out; if not, skip. Therefore, after the background thread completes garbage collection, the data rows in the table are as follows:
Table 12: Data rows in the table (after garbage collection)
id value epoch_id version_type next_version
1 “hi” 0 COMMIT_UPDATE NULL
2 NULL 2 COMMIT_DELETE NULL
3 “hi” 6 COMMIT_INSERT NULL
The key-value pairs in the index are shown in the following table:
Table 13: Key-value pairs in the index (after garbage collection)
key value
1 Pointer to new version with id 1
3 Pointer to new version with id 3
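The timing and expiry check in this example can be reproduced with a few lines of arithmetic. The three unlink_map entries below are reconstructed from the narrative (the old version of id 1 from T1, and the old and empty versions of id 2 from T2); the minimum worker epoch of 8 is an assumption chosen so that subtracting the constant 1 gives the value 7 used in the text, and the less-than-or-equal comparison is likewise assumed.

```python
EPOCH_MS = 40        # life cycle of one epoch
GC_PERIOD_MS = 200   # garbage collection period

# Number of epochs between two runs of the background thread.
epochs_per_gc = GC_PERIOD_MS // EPOCH_MS

# (row id, kind, epoch ID) as described in the narrative for T1 and T2.
unlink_entries = [(1, "old", 0), (2, "old", 2), (2, "empty", 2)]

min_worker_epoch = 8                       # assumed value
max_expired_epoch = min_worker_epoch - 1   # minimum worker epoch minus the constant 1

expired = [e for e in unlink_entries if e[2] <= max_expired_epoch]
remaining = [e for e in unlink_entries if e[2] > max_expired_epoch]
```

Under these assumptions all three recorded entries fall at or below the bound, so none of them needs to stay in unlink_map after the pass.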
The above process shows that, in a read-intensive scenario, the invention uses a concurrent hash table to record the information of each garbage version, and a background thread periodically traverses the hash table to reclaim expired versions. This avoids a visibility check on every data row and improves garbage collection efficiency. The invention also addresses problems in Peloton, such as the untimely reclamation of garbage versions generated by rolled-back transactions and the failure to reclaim empty versions generated by delete operations of committed transactions. This reduces the storage overhead of the system and improves the reusability of data rows.
The invention provides a row-level garbage collection mechanism based on a Peloton database, which is a self-adaptive garbage collection strategy and selects different garbage collection methods according to different load scenes. The invention can effectively recover invalid or expired historical data versions, reduce the memory resource occupation of the system and improve the access efficiency of data lines and indexes. The invention can be applied to cloud service systems requiring high performance and high concurrency, such as e-commerce platforms, social networks, games.
After garbage collection, the elements in recycle_queue are as shown in the following table:
Table 14: Elements in recycle_queue (after garbage collection)
Pointer
Pointer to the empty version with id 2
Pointer to the old version with id 2
In this embodiment, a simple mathematical model is used to estimate garbage version distribution and recovery efficiency under different load scenarios. Assume that there are N data rows in the system, each data row having M versions, each version occupying S bytes. Assuming there are R read transactions and W write transactions in the system, each transaction accesses a data row, and each transaction is executed for a time T. Assuming that the garbage collection period in the system is G, the execution time of each garbage collection is H. Then the following formula can be derived:
The total memory overhead of the system is: $O = N \times M \times S$
The throughput of the system is: $Q = (R + W) / T$
The garbage version proportion of the system is: $P = W / (R + W)$
The garbage collection efficiency of the system is: $E = P \times N \times S / H$
From these formulas, it can be seen that the higher the garbage version ratio P of the system, the greater the storage overhead O of the system, and the lower the garbage collection efficiency E of the system. Therefore, the invention uses a concurrent hash table to record the information of each garbage version, and the background thread periodically traverses the hash table to recover the outdated version, thereby effectively reducing the storage overhead O of the system and improving the garbage recovery efficiency E of the system.
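The four quantities of the model can be evaluated directly for sample inputs. The function name gc_model and the numbers in the usage note are illustrative assumptions and are not measurements from the patent.

```python
def gc_model(n_rows, versions_per_row, bytes_per_version,
             reads, writes, txn_time, gc_time):
    """Evaluate the model: returns (O, Q, P, E) as defined in the text."""
    overhead = n_rows * versions_per_row * bytes_per_version            # O = N * M * S
    throughput = (reads + writes) / txn_time                            # Q = (R + W) / T
    garbage_ratio = writes / (reads + writes)                           # P = W / (R + W)
    efficiency = garbage_ratio * n_rows * bytes_per_version / gc_time   # E = P * N * S / H
    return overhead, throughput, garbage_ratio, efficiency
```

For instance, gc_model(1000, 2, 100, 300, 100, 0.5, 0.25) evaluates the model for 1000 rows with 2 versions of 100 bytes each, 300 read and 100 write transactions, T = 0.5 and H = 0.25, giving P = 0.25.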
Example 2:
this example demonstrates the process of row-level garbage collection in a write-intensive scenario of the present invention. Assume that there are 5 worker threads and 1 background thread in the system, and the system adopts an epoch-based transaction management mode, and the life cycle of each epoch is 40ms, and the garbage collection period is 200ms. The system has a table comprising two fields: id (int type, primary key) and value (varchar type). There is an index in the system, and a B+ tree index is established according to the id field. The system has a concurrent hash table unlink_map for recording information of each garbage version. The system has a reusable version slot queue recycle_queue for storing recovered version slots. And a threshold value threshold is also arranged in the system and is used for judging whether the garbage version information needs to be continuously maintained. Assuming that each data block (tile group) can hold 1000 data lines, the threshold is 500, i.e. if more than 500 data lines in a certain data block are all garbage versions, the garbage version information in the data block is not maintained.
Assume that initially the table contains one data block holding 1000 data rows, as shown in the following table:
Table 15: Data rows in the table (initial state)
id value
1 “hello”
2 “world”
1000 “bye”
There are 1000 key-value pairs in the index, as shown in the following table:
Table 16: Key-value pairs in the index (initial state)
key value
1 Pointer to data line with id 1
2 Pointer to data line with id 2
1000 Pointer to data line with id of 1000
Assume that in the first epoch there are 5 transactions T1-T5 that simultaneously perform the following:
The data rows with ids 1-1000 are updated, and the value is changed to "hi".
The transactions are committed.
Because these transactions join the first epoch when they begin, their epoch IDs are all 0. Because they perform updates, they insert new versions at the head of the version chains and mark the old versions as invalid. They also add the information of the old versions to unlink_map and update the index to point to the new versions. After these transactions commit, the data rows in the table are as shown in the following table:
Table 17: Data rows in the table (after transactions T1-T5 commit)
It is assumed that in the second epoch, no transaction is performed. Thus, after the second epoch is completed, the data in the system is unchanged.
Assume that in the third epoch, there are 5 transactions T6-T10 that are simultaneously performing the following operations:
The data rows with ids 1-1000 are updated, and the value is changed to "hey".
The transactions are committed.
Because these transactions join the third epoch when they begin, their epoch IDs are all 2. Because they perform updates, they insert new versions at the head of the version chains and mark the old versions as invalid. They also add the information of the old versions to unlink_map and update the index to point to the new versions. After these transactions commit, the data rows in the table are as shown in the following table:
Table 18: Data rows in the table (after transactions T6-T10 commit)
The key-value pairs in the index are shown in the following table:
Table 19: Key-value pairs in the index (after transactions T6-T10 commit)
key value
1 Pointer to new version with id 1
2 Pointer to new version with id 2
1000 Pointer to new version with id of 1000
There are 2000 key-value pairs in the unlink_map, as shown in the following table:
Table 20: Key-value pairs in unlink_map (after transactions T6-T10 commit)
The recycle_queue is still empty.
It can be seen from the above procedure that, because a large number of updates are performed in the system, more than half of the data rows in each data block are garbage versions; the invention therefore stops maintaining garbage version information for those data blocks, releases the corresponding garbage version queues, and marks the data blocks as no longer maintained. This reduces the memory footprint of the system in scenarios dominated by highly concurrent write transactions and avoids needless traversal overhead.
It is assumed that in the fourth epoch, no transaction is performed. Thus, after the fourth epoch is completed, the data in the system is unchanged.
Assume that in the fifth epoch, a background thread is started and garbage collection is performed. The background thread first obtains the minimum epoch ID from the global lock-free queue and subtracts a constant (say 1) to get the global expired maximum epoch ID (say 3). The background thread then traverses each key-value pair in the unlink_map and determines whether it is an expired version based on the epoch ID and version type in the GCMetadata. If so, the version is garbage-collected; if not, it is skipped. Since only the garbage version queue corresponding to the first data block exists in the unlink_map, the background thread only performs garbage collection on those garbage versions. Therefore, after the background thread completes garbage collection, the data rows in the table are as follows:
Table 21: data lines in the table (after the background thread finishes garbage collection)
id value epoch_id version_type next_version
1 “hey” 2 COMMIT_UPDATE NULL
2 “hey” 2 COMMIT_UPDATE NULL
1000 “hey” 2 COMMIT_UPDATE NULL
The key-value pairs in the index are shown in the following table:
Table 22: key-value pairs in the index (after the background thread completes garbage collection)
key value
1 Pointer to new version with id 1
2 Pointer to new version with id 2
1000 Pointer to new version with id of 1000
The unlink_map is empty.
There are 1000 elements in the recycle_queue, which are pointers to old versions with ids 1-1000, respectively.
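The background pass of this example can be sketched in Python as follows. This is an illustration under stated assumptions, not Peloton's code: `GCMetadata` is reduced to two fields, the expiry test is taken to be "epoch ID less than or equal to the global expired maximum epoch ID", the version type is carried but not inspected, and the string "COMMIT_UPDATE" is only a placeholder value for the old versions.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class GCMetadata:
    epoch_id: int        # epoch in which the version became garbage
    version_type: str    # carried along, not inspected in this sketch


def global_expired_epoch(active_epoch_ids, constant=1):
    """Minimum epoch ID still in use, minus a safety constant."""
    return min(active_epoch_ids) - constant


def collect(unlink_map, recycle_queue, max_expired_epoch):
    """Move every expired entry from unlink_map to recycle_queue."""
    for pointer in list(unlink_map):              # copy keys: the dict is mutated
        if unlink_map[pointer].epoch_id <= max_expired_epoch:
            del unlink_map[pointer]
            recycle_queue.append(pointer)         # slot becomes reusable


# Old versions of rows 1-1000, made garbage in the epoch with ID 2.
unlink_map = {("old", i): GCMetadata(2, "COMMIT_UPDATE") for i in range(1, 1001)}
recycle_queue = deque()
expired = global_expired_epoch([4, 4, 5])         # minimum 4, minus 1, gives 3
collect(unlink_map, recycle_queue, expired)
```

With the numbers of the example, the pass leaves the `unlink_map` empty and places 1000 pointers in the `recycle_queue`.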
From the above process, it can be seen that in the write-intensive scenario, the present invention uses a threshold to decide whether to continue maintaining garbage version information, and the background thread directly traverses each data line in the data blocks marked as no longer maintained to reclaim expired versions. This avoids a visibility check on every data line and improves garbage collection efficiency. The invention also fixes several problems in Peloton, such as the untimely reclamation of garbage versions generated by rolled-back transactions and of empty versions generated by the delete operations of committed transactions. This reduces the storage overhead of the system and improves the reusability of data lines.
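For a data block marked as no longer maintained, the direct traversal can be sketched as below. The right shift of end_ts by 32 bits follows claim 9. Everything else is an assumption for illustration: the 64-bit `MAX_TS` sentinel for live versions, the list-of-integers block layout, and the `<=` comparison.

```python
EPOCH_SHIFT = 32
MAX_TS = (1 << 64) - 1      # assumed sentinel: a live version has no real end time


def epoch_of(end_ts):
    """Epoch ID stored in the upper bits of an end timestamp."""
    return end_ts >> EPOCH_SHIFT


def sweep_block(end_timestamps, max_expired_epoch):
    """Return the slot offsets in one block whose versions have expired."""
    return [
        offset
        for offset, end_ts in enumerate(end_timestamps)
        if epoch_of(end_ts) <= max_expired_epoch
    ]


block = [
    (2 << EPOCH_SHIFT) | 17,   # ended in epoch 2: expired
    MAX_TS,                    # still live: epoch far above any expired epoch
    (9 << EPOCH_SHIFT) | 3,    # ended in epoch 9: too recent to reclaim
]
expired_slots = sweep_block(block, max_expired_epoch=3)
```

The scan needs only the version metadata of each slot, so no per-version record has to be kept for the block between collections.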
This example demonstrates the row-level garbage collection process of the present invention in a write-intensive scenario. It is assumed that there are 5 worker threads and 1 background thread in the system, and that the system employs an epoch-based … (original contents omitted) … It is assumed that no transaction executes in the eighth epoch. Thus, after the eighth epoch is completed, the data in the system is unchanged.
Assume that in the ninth epoch, a background thread is started and garbage collection is performed. The background thread first obtains the minimum epoch ID from the global lock-free queue and subtracts a constant (say 1) to get the global expired maximum epoch ID (say 7). The background thread then traverses each key-value pair in the unlink_map and determines whether it is an expired version based on the epoch ID and version type in the GCMetadata. If so, the version is garbage-collected; if not, it is skipped. Since only the garbage version queue corresponding to the first data block exists in the unlink_map, the background thread only performs garbage collection on those garbage versions. Therefore, after the background thread completes garbage collection, the data rows in the table are as follows:
Table 23: data lines in the table (after the background thread finishes garbage collection)
id value epoch_id version_type next_version
1 “hey” 2 COMMIT_UPDATE NULL
2 “hey” 2 COMMIT_UPDATE NULL
1000 “hey” 2 COMMIT_UPDATE NULL
The key-value pairs in the index are shown in the following table:
Table 24: key-value pairs in the index (after the background thread completes garbage collection)
key value
1 Pointer to new version with id 1
2 Pointer to new version with id 2
1000 Pointer to new version with id of 1000
The unlink_map is empty.
Table 25: elements in the recycle_queue (after the background thread finishes garbage collection)
From the above process, it can be seen that in the write-intensive scenario, the present invention uses a threshold to decide whether to continue maintaining garbage version information, and the background thread directly traverses each data line in the data blocks marked as no longer maintained to reclaim expired versions. This avoids a visibility check on every data line and improves garbage collection efficiency. The invention also fixes several problems in Peloton, such as the untimely reclamation of garbage versions generated by rolled-back transactions and of empty versions generated by the delete operations of committed transactions. This reduces the storage overhead of the system and improves the reusability of data lines.
In this embodiment, a simple optimization algorithm is used to dynamically adjust the garbage collection policy and parameters. Assume that the system has a parameter alpha, the threshold on the proportion of garbage versions: if more than a proportion alpha of the data lines in a data block are garbage versions, the garbage version information of that data block is no longer maintained. Assume that the system also has a parameter beta, the garbage collection period: garbage collection is started once every beta. The following algorithm can then be used to adjust both parameters dynamically:
Initialize alpha to 0.5 and beta to 200 ms. At regular intervals (say 1 s), measure the throughput Q and the delay L of the system and compute the performance index P = Q/L. If P is less than a predetermined target value (say 100), the system performance is inadequate and the frequency and range of garbage collection need to be increased, so alpha is decreased by a certain proportion (say 0.1) and beta is decreased by a certain value (say 20 ms). If P is greater than the target value, the system has performance to spare and the frequency and range of garbage collection can be reduced, so alpha is increased by a certain proportion (say 0.1) and beta is increased by a certain value (say 20 ms). These steps are repeated until the system is stable or optimal. In this way, the garbage collection policy and parameters are adjusted according to the real-time performance index of the system to reach the best balance point, which avoids excessive or insufficient garbage collection and improves the performance and resource utilization of the system.
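One adjustment step of this algorithm might look like the following Python sketch. The text does not say whether "a certain proportion (say 0.1)" is an absolute or a relative change; the sketch assumes an absolute step of 0.1. The clamping bounds and the function name `adjust` are also assumptions added so that the parameters stay in a sensible range.

```python
def adjust(alpha, beta_ms, throughput, delay, target=100.0,
           alpha_step=0.1, beta_step_ms=20,
           alpha_bounds=(0.1, 0.9), beta_bounds_ms=(20, 1000)):
    """Return the (alpha, beta) pair to use for the next measurement interval."""
    p = throughput / delay                 # performance index P = Q / L
    if p < target:                         # under target: collect more, and more often
        alpha -= alpha_step
        beta_ms -= beta_step_ms
    elif p > target:                       # over target: collect less, and less often
        alpha += alpha_step
        beta_ms += beta_step_ms
    # Assumed safeguard: keep both parameters inside fixed bounds.
    alpha = min(max(alpha, alpha_bounds[0]), alpha_bounds[1])
    beta_ms = min(max(beta_ms, beta_bounds_ms[0]), beta_bounds_ms[1])
    return round(alpha, 6), beta_ms


slow = adjust(0.5, 200, throughput=1000, delay=20)    # P = 50, below the target
fast = adjust(0.5, 200, throughput=4000, delay=20)    # P = 200, above the target
steady = adjust(0.5, 200, throughput=2000, delay=20)  # P = 100, exactly on target
```

A caller would invoke `adjust` once per measurement interval (1 s in the example) and feed the returned values into the next interval.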
Test one:
The present embodiment runs the two database systems on the same server with the same configuration parameters and test data. The hardware configuration of the server is as follows: CPU: E5-2620 v4 @ 2.10 GHz, 8 cores / 16 threads; memory: 64 GB DDR4; disk: 1 TB SSD; network: gigabit Ethernet. The software configuration of the server is as follows: operating system: Ubuntu 18.04.5 LTS; compiler: GCC 7.5.0; database systems: Peloton and the present invention; test tool: YCSB 0.17.0.
Test data is generated by YCSB and comprises 1,000,000 records; each record has 10 fields of 10 bytes each, for a total of 100 MB of storage. The test procedure is controlled by YCSB: each run performs 100,000 operations, each of which randomly accesses one record and performs a read or an update according to the workload. YCSB collects the test results, which include throughput (operations executed per second) and delay (average response time per operation). The test results are shown in the following table:
Table 41: test results of Peloton and the present invention under different workloads
Workload Throughput (ops/sec) Delay (ms)
Peloton The invention Peloton
A 11234 12345
B 23456 24567
C 34567 35678
From the above results, it can be seen that the present invention outperforms the Peloton database under the various workloads, exhibiting higher throughput and lower latency. This is because the invention reclaims invalid or expired historical data versions in time, which reduces memory usage and improves the access efficiency of data rows and indexes. In addition, a different garbage collection method is selected for each load scenario, which avoids visibility checks and purposeless traversal of every data line and improves garbage collection efficiency.
Experiment II: In the experimental environment, Peloton is deployed on an Ubuntu 16.04 (64-bit) virtual machine with 5 CPUs and 6 GB of memory. The experiment sets the epoch duration of Peloton to 40 ms, the number of worker threads executing transactions to 5, and the number of background threads for garbage collection to 1. The experiment was run multiple times, so the reported results are not chance values.
The invention adopts OLTPbenchmark as the benchmark framework, a modular, extensible and configurable OLTP benchmarking framework that integrates several test tools such as TPC-C, YCSB and TATP. The invention uses the YCSB test tool, which creates and loads a single database table with 11 columns: one int primary key and 10 text fields. Because Peloton has no text type, these fields are converted to the varchar type.
Massif: Valgrind is a tool suite for memory debugging and profiling of C/C++ programs that helps programmers find bugs and improve program performance. It contains several tools, of which Massif is one. Massif is a heap profiler that measures how much heap memory a program uses while running; it can also measure the size of the program's stack, but does not do so by default. Because it does not measure the code, data and BSS segments either, the figures Massif reports may be much smaller than those of tools (e.g., top, pidstat) that report the total memory size of a program. The invention uses Massif to measure the dynamic changes in Peloton's runtime memory and uses its visualization tool massif-visualizer to present the results.
The load scenario of experiment II is shown in the following table; it constructs a write-intensive situation in which a large number of garbage versions accumulate in a short time. YCSB inserts 30000 data lines into the table in the load data stage and performs 250 update operations per second in the execute stage for 120 seconds in total, updating about 30000 data lines. Normally, with the virtual machine configuration used here, Peloton can perform 6000-7000 update operations per second. However, the Massif analysis tool continuously captures memory snapshots while the system runs (the faster the heap memory changes, the more frequently snapshots are taken), records function call relationships, and finally produces an analysis report, so the system runs more slowly, with at most 250-300 update operations per second. YCSB is therefore configured to perform 250 update operations per second. Accordingly, to accumulate a large number of garbage versions within one garbage collection period, the execution time is set to 120 seconds and the garbage collection period of Peloton is set to 200 seconds, so that the background thread starts garbage collection only after YCSB has finished executing the large number of update operations.
Fig. 9 shows how memory changes while the original Peloton system and the program of the present invention run; the abscissa is in ms and the ordinate is in kilobytes. In the experiment, 30000 data lines in total are updated before the first garbage collection. In the original Peloton system, all 30000 garbage version records must be maintained, and the space occupied by the instance object of each transaction is not released. In the garbage collection mechanism implemented herein, the number of update operations is far greater than the number of data lines each tile group can store, so as updates continue, the number of garbage versions in many tile groups exceeds half the number of data lines they can store. Individual garbage version records are then no longer maintained, and the space occupied by the instance object of each transaction is released immediately when the transaction ends. Thus, the garbage collection mechanism implemented herein has a smaller memory overhead before garbage collection.
Key node Original system (MB) Improved system herein (MB)
After Peloton start-up 423.7 422.7
After 30000 data lines are inserted 485.2 483.1
After 30000 data lines are updated 578.4 544.9
After the garbage recovery is completed 511.4 510.1
The table above shows the memory values at the four key nodes in fig. 10 and fig. 9; after 30000 data lines are updated, the memory occupied by the improved system is about 34 MB lower than that of the original system. In addition, because the invention corrects the problem that the two steps of unlinking from the index and reclaiming the storage space were not completed within one garbage collection period, the memory occupied by the program drops noticeably after the first garbage collection finishes (i.e., at 200 s on the abscissa); in the original Peloton system, the memory occupied by the program changes only after the second garbage collection finishes (i.e., at 400 s on the abscissa).
Experiment II:
The load scenario of experiment two is shown in the following table; in this load scenario, the number of garbage versions accumulated between two garbage collections does not reach the threshold at which garbage version information is no longer maintained. YCSB inserts 60000 data lines into the table in the load data stage and performs 500 operations per second in the execute stage (50 reads, 300 inserts and 150 deletes) for a total of 300 seconds. The garbage collection period is set to 20 s. Between two garbage collections the system would normally insert about 6000 data lines and delete about 3000. However, when the OLTPbenchmark YCSB tool executes a delete, it generates a random number in [0, load data amount] and deletes the data line whose primary key equals that number. The random numbers may repeat, but a data line cannot be deleted twice, so on average about 2300 data lines are deleted between two garbage collections: more at first, and fewer as deletion of the data lines with primary keys 0 to 60000 proceeds. The accumulated number of garbage versions therefore cannot reach the threshold at which garbage version information is no longer maintained.
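The downward trend in successful deletes can be reproduced with a small simulation. This is a simplified model, not the OLTPbenchmark implementation: keys are drawn uniformly from a half-open range, inserts made during the run are ignored, and the seed is arbitrary. The 3000 attempts per interval (150 deletes per second over 20 s) and the 15 intervals (300 s in total) follow from the figures above.

```python
import random


def simulate_deletes(loaded_rows=60000, attempts_per_interval=3000,
                     intervals=15, seed=0):
    """Count successful deletes in each garbage collection interval."""
    rng = random.Random(seed)
    alive = set(range(loaded_rows))        # primary keys that still exist
    successes = []
    for _ in range(intervals):
        deleted = 0
        for _ in range(attempts_per_interval):
            key = rng.randrange(loaded_rows)
            if key in alive:               # a repeated key deletes nothing
                alive.remove(key)
                deleted += 1
        successes.append(deleted)
    return successes


per_interval = simulate_deletes()
average = sum(per_interval) / len(per_interval)
```

In this model every interval deletes fewer than 3000 rows and later intervals delete fewer than earlier ones, which matches the qualitative behavior described; the exact average depends on the modeling assumptions and need not equal the reported figure.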
Fig. 10 shows how memory changes while the original Peloton system and the program of the present invention run, with the initial and final memory values between the end of one garbage collection and the start of the next marked. In the original Peloton system the difference between the final and initial values is about 18 MB, whereas in the present invention it is about 10 MB, so memory changes less. In the original Peloton system, the space occupied by the instance object of each transaction is not released promptly after the transaction ends, which is one reason for its larger garbage collection memory overhead. Furthermore, although both the original Peloton system and the improved system of the present invention need to maintain garbage version information in this scenario, the original Peloton system keeps redundant data structures because its garbage collection flow is redundant, which is another reason for its larger overhead. Since the original Peloton system releases the space occupied by transaction instance objects only after garbage collection finishes, its memory drops more visibly in the figure; in the improved system of the invention, the memory occupied by a transaction is released promptly, and garbage collection releases only the space occupied by deleted data lines, so the drop in the figure is not obvious.
The results of the two experiments show that, compared with the original system, the row-level garbage collection mechanism designed and implemented by the invention releases the space occupied by the instance object of each transaction in time and records garbage version information at lower cost, so it has better memory performance. When the number of garbage versions is very large, garbage version information is no longer maintained, and the advantage is even more pronounced.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations may be made therein without departing from the spirit and scope of the invention as defined by the appended claims and their equivalents.

Claims (10)

1. A row-level garbage collection mechanism based on a peloton database, characterized in that: the mechanism comprises a concurrent hash table, a threshold value, a background thread and a garbage collection period; each transaction is allocated a unique epoch ID, which is used to judge visibility and conflicts among transactions; the concurrent hash table is used for recording the garbage version information generated by each transaction, the garbage version information comprising the data block ID, the data line ID, the expiration time and the version type; the threshold value is used for judging whether the garbage version information needs to continue to be maintained: if more than half of the data lines in a data block are garbage versions, the garbage version queue corresponding to the data block is released and the data block is marked as no longer maintained; the background thread is started periodically, traverses each data block in the concurrent hash table, and judges, according to the garbage version information and the global expired maximum epoch ID, whether a version is an expired version, and if so, performs garbage collection; if a data block is marked as no longer maintained, each data line in the data block is traversed directly, whether it is an expired version is judged according to the epoch ID in the version metadata, and if so, garbage collection is performed; the garbage collection period is used for controlling how often the background thread is started.
2. The peloton database based row-level garbage collection mechanism of claim 1, wherein: the garbage collection comprises two steps of index disconnection and storage space recovery, and the two steps are completed in one garbage collection period.
3. A peloton database based row-level garbage collection mechanism as claimed in claim 2, wherein: the step of disconnecting the index includes: acquiring a complete data line according to the junk version information or version metadata; traversing each index in the table of the data line, extracting a key value from the data line according to a field corresponding to the index, and forming a key value pair with a pointer pointing to the data line; the key value pair is deleted from the index.
4. A peloton database based row-level garbage collection mechanism as claimed in claim 2, wherein: the step of reclaiming the storage space includes: acquiring a complete data line according to the junk version information or version metadata; traversing each field in the data line, if the field type is a variable length type, acquiring a pointer stored in the field, and releasing a memory pointed by the pointer; resetting version metadata of the data line to be an empty version; the data line is added to the reusable version slot queue.
5. A peloton database based row-level garbage collection mechanism according to claim 1 or 2, characterized in that: when the transaction is finished, the invalid version is directly processed according to the transaction state and the operation type, and the space occupied by the transaction object is released in time.
6. The peloton database based row-level garbage collection mechanism of claim 5, wherein: when the transaction state is committed, if the operation type is update or deletion, adding the generated old version into the concurrent hash table; if the operation type is insert and delete after insert, directly processing the generated new version; when the transaction state is rollback, if the operation type is update or deletion, directly processing the generated new version; if the operation type is insert, the new data line generated is processed directly.
7. The peloton database based row-level garbage collection mechanism of claim 6, wherein: when the old version generated by the deleting operation is processed, the next version of the old version needs to be processed, and if the next version is an invalid version, the next version is regarded as an empty version generated by the deleting operation, and garbage collection is performed.
8. The peloton database based row-level garbage collection mechanism of claim 4, wherein: the concurrent hash table is a concurrent_hash_map container provided by Intel TBB, the garbage version queue is a moodycamel::ConcurrentQueue container, and the reusable version slot queue is a moodycamel::ConcurrentQueue container.
9. A peloton database based row-level garbage collection mechanism according to claim 1 or 2, characterized in that: the epoch ID is the value obtained by right-shifting the end_ts field in the version metadata by 32 bits, and the global expired maximum epoch ID is the minimum epoch ID among all worker threads minus a constant.
10. A peloton database based row-level garbage collection mechanism according to claim 1 or 2, characterized in that: the data blocks are tile groups, each tile group contains a fixed number of data lines and metadata, each data line contains a plurality of fields and version metadata, each field contains a type, a length and a value, and each version metadata contains a transaction ID, a start time, an end time, a pointer to the data line and a pointer to the next version.
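The two reclamation steps recited in claims 3 and 4, performed within one garbage collection period as in claim 2, can be sketched in Python as follows. The data layout is entirely hypothetical: a row is a dictionary, an index is a pair of key columns and a set of (key, pointer) entries, and variable-length values live in a dictionary standing in for heap memory.

```python
from collections import deque

EMPTY_META = {"txn_id": None, "begin_ts": None, "end_ts": None, "next": None}


def unlink_from_indexes(row_ptr, row, indexes):
    """Step 1: delete this row's (key, pointer) pair from every index."""
    for key_columns, entries in indexes:
        key = tuple(row["fields"][col] for col in key_columns)
        entries.discard((key, row_ptr))


def reclaim_slot(row_ptr, row, varlen_columns, varlen_heap, free_slots):
    """Step 2: free variable-length storage, reset metadata, recycle the slot."""
    for col in varlen_columns:
        handle = row["fields"][col]
        varlen_heap.pop(handle, None)      # release the memory behind the pointer
        row["fields"][col] = None
    row["meta"] = dict(EMPTY_META)         # the slot now holds an empty version
    free_slots.append(row_ptr)             # reusable version slot queue


def collect_row(row_ptr, row, indexes, varlen_columns, varlen_heap, free_slots):
    """Run both steps back to back, i.e. within one collection period."""
    unlink_from_indexes(row_ptr, row, indexes)
    reclaim_slot(row_ptr, row, varlen_columns, varlen_heap, free_slots)


varlen_heap = {"h1": "hey"}
row = {"fields": {"id": 1, "value": "h1"},
       "meta": {"txn_id": 7, "begin_ts": 1, "end_ts": 2, "next": None}}
primary_index = (("id",), {((1,), "slot-0")})
free_slots = deque()
collect_row("slot-0", row, [primary_index], ["value"], varlen_heap, free_slots)
```

Unlinking comes first in this sketch because the index keys are read from the row's fields, which the second step clears.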
CN202310849987.3A 2023-07-11 2023-07-11 Line-level garbage collection mechanism based on peloton database Active CN117312267B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310849987.3A CN117312267B (en) 2023-07-11 2023-07-11 Line-level garbage collection mechanism based on peloton database

Publications (2)

Publication Number Publication Date
CN117312267A CN117312267A (en) 2023-12-29
CN117312267B true CN117312267B (en) 2024-03-22

Family

ID=89248741

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310849987.3A Active CN117312267B (en) 2023-07-11 2023-07-11 Line-level garbage collection mechanism based on peloton database

Country Status (1)

Country Link
CN (1) CN117312267B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6128627A (en) * 1998-04-15 2000-10-03 Inktomi Corporation Consistent data storage in an object cache
CN111309270A (en) * 2020-03-13 2020-06-19 清华大学 Persistent memory key value storage system
CN111400312A (en) * 2020-02-25 2020-07-10 华南理工大学 Edge storage database based on improved LSM tree
CN112817968A (en) * 2021-01-14 2021-05-18 肖玉连 Data storage and search method and system based on block chain
CN114490443A (en) * 2022-02-14 2022-05-13 浪潮云信息技术股份公司 Shared memory-based golang process internal caching method

Also Published As

Publication number Publication date
CN117312267A (en) 2023-12-29

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant