CN116821127A - Method for realizing hash index of kv stored distributed database - Google Patents

Method for realizing hash index of kv stored distributed database

Info

Publication number
CN116821127A
Authority
CN
China
Prior art keywords
hash
index
partition
hash index
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310739518.6A
Other languages
Chinese (zh)
Inventor
柴毅
徐佳庆
牟冠学
蒋家超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Yunxi Technology Co ltd
Original Assignee
Shanghai Yunxi Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2023-06-21
Filing date
2023-06-21
Publication date
2023-09-29
Application filed by Shanghai Yunxi Technology Co ltd
Priority to CN202310739518.6A
Publication of CN116821127A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2228 Indexing structures
    • G06F16/2255 Hash tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2471 Distributed queries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28 Databases characterised by their database models, e.g. relational or object models
    • G06F16/284 Relational databases

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for implementing a hash index in a KV-storage distributed database and relates to the technical field of distributed KV storage. The method is based on a distributed database that uses RocksDB for KV storage. It covers creating, deleting and modifying the hash index; the insert, update, delete and data backfill operations associated with the hash index; and query plan support for the hash index. Hash index information is stored in the index metadata, and data belonging to different partitions is kept apart by modifying the key prefix of the hash index. A query on the hash index is normally distributed to every partition; when conditions allow a specific partition to be located, part of the plan is modified so that the exact-match query is sent only to that partition. This addresses the database write hot-spot problem and improves AP performance under certain exact-match query conditions.

Description

Method for realizing hash index of kv stored distributed database
Technical Field
The invention relates to the technical field of distributed KV storage, and in particular to a method for implementing a hash index in a KV-storage distributed database.
Background
A distributed relational database is designed primarily for scalability, strong consistency and high reliability. To improve scalability, it adopts a fully decentralized architecture in which all nodes in the cluster are peers. The underlying storage organizes data into ordered key-value pairs that form a KV map; the map is logically divided by key range into a large number of key spaces, each called a Range. Each Range is replicated to several nodes to keep it highly available. Queries run concurrently as distributed tasks on the data nodes, and the replicas of a Range use the Raft consensus protocol among themselves. Only one replica of a Range is the leader, which is responsible for executing the corresponding KV operations. Leader election also relies on the Raft protocol.
The key of a user table is an arbitrary byte array, composed as follows:
Primary key: globally unique table ID + primary key ID + primary key encoding.
Ordinary index: globally unique table ID + index ID + index key encoding + primary key encoding.
The distributed relational database described here uses range partitioning by default. For common range-scan queries, range partitioning performs better than hash partitioning. For some workloads, however, the load across ranges becomes unbalanced because the operations concentrate on a small portion of the ranges. This is especially pronounced for sequential inserts, for which hash partitioning is the better fit.
At the bottom layer, the distributed relational database stores key-value pairs in a single global key space onto which all tables and indexes are mapped. The key space is divided into ordered, contiguous blocks called ranges, each with a configurable maximum size. As data is added or deleted, ranges split or merge into more or fewer ranges.
When a range is accessed frequently, the database tries to split it into several smaller ranges (load-based splitting) and to redistribute ranges across the cluster according to load, so that load is spread evenly. If a sequential insert workload always lands on one boundary of a range, however, the range cannot be split in this way, and a single-range hot spot results: data is appended to the end of one range until that range reaches its maximum size threshold, and then to the end of the new range. Insert and query performance is therefore limited by the performance of a single range. The hash index is introduced to solve this problem.
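The hot spot described above can be illustrated with a small sketch. It is not taken from the patent: the split points, the bucket count and the helper names are hypothetical, and SHA-256 from the standard library stands in for the real hash function so that the example runs without extra packages.

```python
from bisect import bisect_right
from collections import Counter
import hashlib

# Hypothetical split points: four ranges over integer keys.
SPLIT_POINTS = [1000, 2000, 3000]
BUCKETS = 4  # hypothetical number of hash buckets


def range_of(key: int) -> int:
    """Index of the range that holds `key` under plain range partitioning."""
    return bisect_right(SPLIT_POINTS, key)


def bucket_of(key: int) -> int:
    """Stand-in hash (SHA-256, an assumption) reduced modulo the bucket count."""
    digest = hashlib.sha256(str(key).encode()).digest()
    return int.from_bytes(digest[:8], "big") % BUCKETS


# Sequential insert workload: monotonically increasing keys.
new_keys = range(3000, 3500)

by_range = Counter(range_of(k) for k in new_keys)
by_bucket = Counter(bucket_of(k) for k in new_keys)

print("writes per range: ", dict(by_range))    # all writes hit the last range
print("writes per bucket:", dict(by_bucket))   # writes spread over the buckets
```

With range partitioning every new key lands in the last range, while the hashed bucket number spreads the same keys over all buckets.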
Disclosure of Invention
In view of the needs and shortcomings of the prior art, the invention provides a method for implementing a hash index in a KV-storage distributed database.
To solve the above technical problem, the method adopts the following technical scheme:
A method for implementing a hash index in a KV-storage distributed database, based on a distributed database that uses RocksDB for KV storage, comprises: creating, deleting and modifying the hash index; performing the insert, update, delete and data backfill operations associated with the hash index; and adding query plan support for the hash index. This improves the performance of large-volume data insertion and import and the efficiency of exact-match queries.
Optionally, when the hash index is created, items of information such as ColumnNames, columnIDs, name and ID are added according to the hash index columns and hash buckets that the user defines in the creation syntax; if the table already contains data, a data backfill of the hash index is triggered.
When the hash index is created, the database pre-splits the corresponding ranges by partition ID in advance; as data is added or deleted, range splits and merges then take place within each pre-split range.
Preferably, the key encoding of the hash index is:
globally unique table ID + index ID + partition ID + index key encoding + primary key encoding.
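A minimal sketch of the two key layouts follows. The patent does not specify the byte-level encoding, so the fixed-width big-endian integers, the function names and the sample IDs are assumptions chosen only to show how the partition ID changes the sort order of keys.

```python
import struct


def encode_uint(value: int) -> bytes:
    """Fixed-width big-endian encoding: byte order matches numeric order."""
    return struct.pack(">Q", value)


def ordinary_index_key(table_id, index_id, index_cols, pk_cols) -> bytes:
    """table ID + index ID + index key + primary key."""
    parts = (table_id, index_id, *index_cols, *pk_cols)
    return b"".join(encode_uint(p) for p in parts)


def hash_index_key(table_id, index_id, partition_id, index_cols, pk_cols) -> bytes:
    """table ID + index ID + partition ID + index key + primary key."""
    parts = (table_id, index_id, partition_id, *index_cols, *pk_cols)
    return b"".join(encode_uint(p) for p in parts)


# Hypothetical table 52, index 2. Entries of partition 0 sort before entries
# of partition 1 whatever the indexed value; inside one partition, entries
# sort by indexed value.
k_a = hash_index_key(52, 2, 0, [900], [7])
k_b = hash_index_key(52, 2, 1, [100], [8])
k_c = hash_index_key(52, 2, 1, [200], [9])
print(k_a < k_b < k_c)
```

Because the partition ID precedes the index key, each partition occupies its own contiguous slice of the key space.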
Optionally, when the hash index is deleted, the hash index data in every partition must be deleted.
Optionally, modifying the hash index includes changing its number of partitions or its partition names, and converting it to another partitioning mode.
The operations involved in modifying the hash index are: the index metadata is modified according to the information in the statement, and if the modification involves the number of partitions or the partitioning mode, a data backfill is required.
Optionally, for insert, update, delete and data backfill operations on the hash index, a hash value is computed from the key values of the hash distribution columns and then reduced by a modulo operation to obtain the ID of the hash partition that holds the data; the key prefix of the index entry is modified according to that partition ID.
Further optionally, when an update on the hash index modifies the value of a hash column, the partition ID must be recalculated using the xxhash hash algorithm; if the partition changes, the update must be converted into an insert plus a delete.
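The partition calculation and the conversion of an update can be sketched as follows. This is an illustration under stated assumptions, not the patented implementation: SHA-256 stands in for xxhash (which is a third-party package in Python), the modulus is assumed to be the number of hash buckets, and the operation tuples are a hypothetical representation of KV writes.

```python
import hashlib


def partition_id(hash_cols: tuple, buckets: int) -> int:
    """Hash the distribution columns and take the result modulo `buckets`.

    SHA-256 is a stand-in here; the patent names xxhash.
    """
    digest = hashlib.sha256(repr(hash_cols).encode()).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def plan_index_update(old_cols: tuple, new_cols: tuple, buckets: int) -> list:
    """Return the index writes needed when the hash columns may change."""
    old_p = partition_id(old_cols, buckets)
    new_p = partition_id(new_cols, buckets)
    if old_p == new_p:
        # Same partition: the entry can be updated in place.
        return [("update", old_p, old_cols, new_cols)]
    # Partition changed: write the entry under the new prefix and remove
    # the entry stored under the old prefix.
    return [("insert", new_p, new_cols), ("delete", old_p, old_cols)]


print(plan_index_update((1,), (1,), 8))
print(plan_index_update((1,), (2,), 8))
```

The first call always yields a single in-place update; the second yields either an update or an insert plus delete, depending on whether the two values hash to the same bucket.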
Further optionally, the hash index scatters the originally ordered index data across the hash partitions; the data within each partition remains ordered.
With plan support for the hash index added, an index query must be executed in every partition; if the result set must be ordered, a merge-sort operation is added above the per-partition queries.
With plan support for the hash index added, when the WHERE expression of an index query contains an exact-match filter on the hash column, the query can be converted into a query on a single partition.
Preferably, when a lookup join is performed on the hash index, the query request must be distributed to every partition when the index query ranges are generated; when a specific partition can be located from the value of the hash column, the query request can be sent only to that partition.
Compared with the prior art, the method for implementing a hash index in a KV-storage distributed database has the following beneficial effects:
(1) The method stores hash index information in the index metadata and keeps data of different partitions apart by modifying the key prefix of the hash index. A query on the hash index normally has to be distributed to every partition; when conditions allow a specific partition to be located, part of the plan is modified so that the exact-match query is sent only to that partition. This addresses the database write hot-spot problem and improves AP performance under certain exact-match query conditions.
(2) When incremental data is inserted continuously, creating a hash index improves performance by about 40% compared with an ordinary index. For queries under certain specific conditions the hash index can outperform the ordinary index, and in other cases its performance does not differ greatly from that of the ordinary index.
Drawings
Fig. 1 is a flow chart of a method according to a first embodiment of the present invention.
Detailed Description
To make the technical scheme, the technical problems to be solved and the technical effects of the invention clearer, the technical scheme is described clearly and completely below with reference to specific embodiments.
Embodiment one:
With reference to Fig. 1, this embodiment proposes a method for implementing a hash index in a KV-storage distributed database. Based on a distributed database that uses RocksDB for KV storage, the method creates, deletes and modifies the hash index, performs the insert, update, delete and data backfill operations associated with the hash index, and adds query plan support for the hash index, so as to improve the performance of large-volume data insertion and import and the efficiency of exact-match queries.
(I) Creating, deleting and modifying the hash index.
(1) When the hash index is created, items of information such as ColumnNames, columnIDs, name and ID are added according to the hash index columns and hash buckets that the user defines in the creation syntax; if the table already contains data, a data backfill of the hash index is triggered.
When the hash index is created, the database pre-splits the corresponding ranges by partition ID in advance; as data is added or deleted, range splits and merges then take place within each pre-split range.
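One way to picture the pre-splitting step is to compute one split key per partition boundary. The sketch below assumes a fixed-width big-endian key prefix of table ID, index ID and partition ID; the function names and sample IDs are hypothetical and the patent does not describe how split keys are actually chosen.

```python
from bisect import bisect_right
import struct


def partition_prefix(table_id: int, index_id: int, partition_id: int) -> bytes:
    """Key prefix shared by every entry of one hash partition (assumed layout)."""
    return struct.pack(">QQQ", table_id, index_id, partition_id)


def presplit_points(table_id: int, index_id: int, buckets: int) -> list:
    """Split keys at the start of partitions 1 .. buckets-1."""
    return [partition_prefix(table_id, index_id, p) for p in range(1, buckets)]


def range_slot(points: list, key: bytes) -> int:
    """Which pre-split range a key falls into."""
    return bisect_right(points, key)


points = presplit_points(52, 2, 4)
key = partition_prefix(52, 2, 3) + struct.pack(">Q", 12345)
print(range_slot(points, key))  # slot of an entry stored in partition 3
```

Each partition starts out in its own range, so later splits and merges driven by data volume stay inside that partition's slice of the key space.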
Preferably, the key encoding of the hash index is:
globally unique table ID + index ID + partition ID + index key encoding + primary key encoding.
(2) When the hash index is deleted, the hash index data in every partition must be deleted.
(3) Modifying the hash index includes changing its number of partitions or its partition names, and converting it to another partitioning mode.
The operations involved in modifying the hash index are: the index metadata is modified according to the information in the statement, and if the modification involves the number of partitions or the partitioning mode, a data backfill is required.
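The need for a backfill after a change in the number of partitions can be seen in a short sketch: the partition ID depends on the bucket count, so existing entries generally end up under a different key prefix. The in-memory dictionary, the sample rows and SHA-256 as the hash function are all assumptions made for illustration.

```python
import hashlib


def partition_id(value: int, buckets: int) -> int:
    """Stand-in hash (SHA-256 instead of xxhash) modulo the bucket count."""
    digest = hashlib.sha256(str(value).encode()).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def backfill(rows, buckets: int) -> dict:
    """Rebuild the index as partition ID -> sorted (indexed value, primary key)."""
    index = {p: [] for p in range(buckets)}
    for pk, value in rows:
        index[partition_id(value, buckets)].append((value, pk))
    for entries in index.values():
        entries.sort()  # data inside each partition stays ordered
    return index


# Hypothetical table rows: (primary key, indexed column value).
rows = [(pk, pk * 10) for pk in range(100)]
old_index = backfill(rows, 4)
new_index = backfill(rows, 8)

moved = sum(1 for _, v in rows if partition_id(v, 4) != partition_id(v, 8))
print(f"{moved} of {len(rows)} entries change partition")
```

Entries whose partition ID changes must be rewritten under the new prefix, which is why the metadata change alone is not sufficient.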
(II) Performing the insert, update, delete and data backfill operations associated with the hash index.
For insert, update, delete and data backfill operations on the hash index, a hash value is computed from the key values of the hash distribution columns and then reduced by a modulo operation to obtain the ID of the hash partition that holds the data; the key prefix of the index entry is modified according to that partition ID.
When an update on the hash index modifies the value of a hash column, the partition ID must be recalculated using the xxhash hash algorithm; if the partition changes, the update must be converted into an insert plus a delete.
(III) Adding plan support for the hash index.
The hash index scatters the originally ordered index data across the hash partitions; the data within each partition remains ordered.
With plan support for the hash index added, an index query must be executed in every partition; if the result set must be ordered, a merge-sort operation is added above the per-partition queries.
With plan support for the hash index added, when the WHERE expression of an index query contains an exact-match filter on the hash column, the query can be converted into a query on a single partition.
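The two plan shapes described above can be sketched with in-memory lists standing in for partitions. The names `build`, `scan_ordered` and `point_lookup` are hypothetical, SHA-256 stands in for xxhash, and `heapq.merge` plays the role of the merge-sort operator placed above the per-partition scans.

```python
import hashlib
import heapq


def partition_id(value: int, buckets: int) -> int:
    """Stand-in hash (an assumption) modulo the bucket count."""
    digest = hashlib.sha256(str(value).encode()).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def build(values, buckets: int) -> list:
    """Scatter values into partitions; each partition is kept sorted."""
    partitions = [[] for _ in range(buckets)]
    for v in values:
        partitions[partition_id(v, buckets)].append(v)
    for part in partitions:
        part.sort()
    return partitions


def scan_ordered(partitions, lo: int, hi: int) -> list:
    """Range query: scan every partition, then merge the sorted streams."""
    streams = ([v for v in part if lo <= v <= hi] for part in partitions)
    return list(heapq.merge(*streams))


def point_lookup(partitions, value: int):
    """Exact-match filter on the hash column: only one partition is read."""
    p = partition_id(value, len(partitions))
    return p, [v for v in partitions[p] if v == value]


partitions = build(range(100), 4)
print(scan_ordered(partitions, 10, 20))
print(point_lookup(partitions, 42))
```

The range query touches all partitions and pays for a merge, whereas the equality lookup recomputes the partition ID from the filter value and reads a single partition.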
When a lookup join is performed on the hash index, the query request must be distributed to every partition when the index query ranges are generated; when a specific partition can be located from the value of the hash column, the query request can be sent only to that partition.
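Span generation for a lookup join might look like the following sketch. The tuple representation of a span, the function names and the optional `hash_col_value` argument are hypothetical; SHA-256 again stands in for xxhash.

```python
import hashlib


def partition_id(value, buckets: int) -> int:
    """Stand-in hash (an assumption) modulo the bucket count."""
    digest = hashlib.sha256(repr(value).encode()).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def lookup_spans(index_id: int, buckets: int, lookup_key: tuple, hash_col_value=None) -> list:
    """Index spans to probe for one row of the join's input side.

    Each span is modeled as (index ID, partition ID, lookup key).
    """
    if hash_col_value is not None:
        # The hash column value is known: probe only its partition.
        return [(index_id, partition_id(hash_col_value, buckets), lookup_key)]
    # Otherwise the lookup fans out to every partition.
    return [(index_id, p, lookup_key) for p in range(buckets)]


fan_out = lookup_spans(7, 4, (5,))
pinned = lookup_spans(7, 4, (5,), hash_col_value=5)
print(len(fan_out), len(pinned))
```

Without the hash column value each input row costs one probe per partition, so being able to pin the partition reduces the number of spans from the bucket count to one.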
In summary, with the method for implementing a hash index in a KV-storage distributed database described here, data stored in different partitions is kept apart by modifying the key prefix of the hash index, and an exact-match query can be sent to the specific partition that holds the data by modifying part of the plan. This addresses the database write hot-spot problem and improves AP performance under certain exact-match query conditions.
The foregoing describes the principles and embodiments of the invention to aid understanding. Any improvement or modification made by those skilled in the art on the basis of the above embodiments, without departing from the principles of the invention, falls within the scope of protection of the invention.

Claims (9)

  1. A method for implementing a hash index in a KV-storage distributed database, characterized in that the method is based on a distributed database that uses RocksDB for KV storage and comprises: creating, deleting and modifying the hash index; performing the insert, update, delete and data backfill operations associated with the hash index; and adding query plan support for the hash index, thereby improving the performance of large-volume data insertion and import and the efficiency of exact-match queries.
  2. The method for implementing a hash index in a KV-storage distributed database according to claim 1, wherein, when the hash index is created, items of information such as ColumnNames, columnIDs, name and ID are added according to the hash index columns and hash buckets that the user defines in the creation syntax, and if the table already contains data, a data backfill of the hash index is triggered;
    when the hash index is created, the database pre-splits the corresponding ranges by partition ID in advance, and as data is added or deleted, range splits and merges then take place within each pre-split range.
  3. The method for implementing a hash index in a KV-storage distributed database according to claim 2, wherein the key encoding of the hash index is:
    globally unique table ID + index ID + partition ID + index key encoding + primary key encoding.
  4. The method for implementing a hash index in a KV-storage distributed database according to claim 2, wherein, when the hash index is deleted, the hash index data in every partition must be deleted.
  5. The method for implementing a hash index in a KV-storage distributed database according to claim 2, wherein modifying the hash index includes changing its number of partitions or its partition names, and converting it to another partitioning mode;
    the operations involved in modifying the hash index are: the index metadata is modified according to the information in the statement, and if the modification involves the number of partitions or the partitioning mode, a data backfill is required.
  6. The method for implementing a hash index in a KV-storage distributed database according to claim 5, wherein, for insert, update, delete and data backfill operations on the hash index, a hash value is computed from the key values of the hash distribution columns and then reduced by a modulo operation to obtain the ID of the hash partition that holds the data, and the key prefix of the index entry is modified according to that partition ID.
  7. The method for implementing a hash index in a KV-storage distributed database according to claim 6, wherein, when an update on the hash index modifies the value of a hash column, the partition ID must be recalculated using the xxhash hash algorithm, and if the partition changes, the update must be converted into an insert plus a delete.
  8. The method for implementing a hash index in a KV-storage distributed database according to claim 7, wherein the hash index scatters the originally ordered index data across the hash partitions, and the data within each partition remains ordered;
    with plan support for the hash index added, an index query must be executed in every partition, and if the result set must be ordered, a merge-sort operation is added above the per-partition queries;
    with plan support for the hash index added, when the WHERE expression of an index query contains an exact-match filter on the hash column, the query can be converted into a query on a single partition.
  9. The method for implementing a hash index in a KV-storage distributed database according to claim 8, wherein, when a lookup join is performed on the hash index, the query request must be distributed to every partition when the index query ranges are generated, and when a specific partition can be located from the value of the hash column, the query request can be sent only to that partition.
CN202310739518.6A 2023-06-21 2023-06-21 Method for realizing hash index of kv stored distributed database Pending CN116821127A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310739518.6A CN116821127A (en) 2023-06-21 2023-06-21 Method for realizing hash index of kv stored distributed database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310739518.6A CN116821127A (en) 2023-06-21 2023-06-21 Method for realizing hash index of kv stored distributed database

Publications (1)

Publication Number Publication Date
CN116821127A true CN116821127A (en) 2023-09-29

Family

ID=88116097

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310739518.6A Pending CN116821127A (en) 2023-06-21 2023-06-21 Method for realizing hash index of kv stored distributed database

Country Status (1)

Country Link
CN (1) CN116821127A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117633024A (en) * 2024-01-23 2024-03-01 天津南大通用数据技术股份有限公司 Database optimization method based on preprocessing optimization join
CN117633024B (en) * 2024-01-23 2024-04-23 天津南大通用数据技术股份有限公司 Database optimization method based on preprocessing optimization join

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination