CN115858467A - File processing method and device for key value database, electronic equipment and medium - Google Patents

Info

Publication number
CN115858467A
Authority
CN
China
Prior art keywords
string table
character string
candidate
table file
file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211498375.6A
Other languages
Chinese (zh)
Inventor
胡永超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Construction Bank Corp
CCB Finetech Co Ltd
Original Assignee
China Construction Bank Corp
CCB Finetech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Construction Bank Corp, CCB Finetech Co Ltd filed Critical China Construction Bank Corp
Priority to CN202211498375.6A priority Critical patent/CN115858467A/en
Publication of CN115858467A publication Critical patent/CN115858467A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure provides a file processing method, apparatus, device, storage medium and program product for a key-value database, relates to the field of computer technology, and can be applied in the field of financial technology. The method comprises: in response to receiving a merge operation instruction, determining a first target layer of a storage component of the key-value database; determining a candidate first sorted string table file stored in the first target layer; determining, according to the candidate first sorted string table file, a candidate second sorted string table file stored in a second target layer, wherein the second target layer is the storage layer one level below the first target layer; performing a merge operation on the candidate first sorted string table file and the candidate second sorted string table file to obtain a merged target sorted string table file; and storing the target sorted string table file to the storage component.

Description

File processing method and device for key value database, electronic equipment and medium
Technical Field
The present disclosure relates to the field of computer technologies, and more particularly, to a method, an apparatus, a device, a medium, and a program product for processing a file in a key-value store.
Background
A blockchain is a peer-to-peer network formed by connecting the nodes that make up the chain, and it has the attributes of a distributed transaction database. A blockchain system requires each node to replicate the complete transaction database so that new transactions can be validated independently. In a typical blockchain system (e.g., the Bitcoin system), the nodes do not rely on any central agent; each node maintains a complete copy of the transaction database, kept in sync by the distributed consensus algorithm itself. This characteristic, however, causes the blockchain transaction database to grow rapidly. As the system continues to operate, the storage required by each node must therefore also be expanded to support it. In the big data era in particular, the ever-increasing transaction volume and data traffic of banks pose a greater challenge to data storage. For this reason, blockchain systems handling massive data commonly use a key-value database for storage, taking advantage of its flexible scalability and data model.
The storage database traditionally used by blockchain systems is LevelDB, a key-value database based on the LSM tree (log-structured merge tree) structure. It offers relatively high immediate write speed, but as the transaction volume keeps growing and the data volume reaches a certain order of magnitude, the performance of LevelDB degrades noticeably, and under write-intensive loads in particular this causes serious delay problems.
Disclosure of Invention
In view of the foregoing, the present disclosure provides a file processing method, apparatus, device, medium, and program product for a key-value database. When a merge operation instruction is executed, a candidate first sorted string table file stored in a first target layer is determined first, and a candidate second sorted string table file stored in a second target layer is then determined according to the candidate first sorted string table file. This controls which sorted string table files participate in the merge operation: screening the files limits the number of files merged each time, and the resulting reduction in data volume lowers the I/O overhead and write stalls incurred during file processing, improves write performance, and alleviates the delay problem.
According to a first aspect of the present disclosure, there is provided a file processing method for a key-value store, including: in response to receiving a merge operation instruction, determining a first target tier of a storage component of the key-value store; determining a candidate first ordered string table file stored at the first target layer; determining a candidate second sorting character string table file stored in a second target layer according to the candidate first sorting character string table file; wherein the second target layer is a storage layer one level lower than the first target layer; executing a merging operation according to the candidate first sorting character string table file and the candidate second sorting character string table file to obtain a merged target sorting character string table file; and storing the target sorting character string table file to the storage component.
According to an embodiment of the present disclosure, the determining a candidate first sorted string table file stored at the first target layer includes: determining a first key value range corresponding to each first sorting character string table file based on all the first sorting character string table files stored in the first target layer; determining a second key value range corresponding to each second sorting character string table file based on all second sorting character string table files stored in the second target layer; determining a key value overlapping rate between any first sorting character string table file and any second sorting character string table file according to the first key value range and the second key value range, and obtaining a plurality of key value overlapping rate calculation results corresponding to the first sorting character string table file; and determining the candidate first sorting character string table file according to the calculation results of the plurality of key value overlapping rates.
According to an embodiment of the present disclosure, determining, according to the first key value range and the second key value range, a key value overlapping rate between any first sorted string table file and any second sorted string table file to obtain a plurality of key value overlapping rate calculation results corresponding to the first sorted string table file includes: determining the key value overlapping rate between any first sorted string table file and any second sorted string table file using a probabilistic cardinality estimation method, according to the first key value range and the second key value range, to obtain the plurality of key value overlapping rate calculation results corresponding to the first sorted string table file.
According to an embodiment of the present disclosure, determining the candidate first sorted character string table file according to the calculation result of the multiple key value overlapping rates includes: and taking the first sorted character string table file with the smallest key value overlapping rate in the multiple key value overlapping rate calculation results as the candidate first sorted character string table file.
According to an embodiment of the present disclosure, the method further comprises, prior to the determining the candidate first ordered string table file stored at the first target layer: determining whether the first target layer is a predetermined layer; and setting the number of files of the candidate first sorted string table file for performing the merge operation in a case where it is determined that the first target layer is the predetermined layer.
According to the embodiment of the present disclosure, the executing a merge operation according to the candidate first sorted string table file and the candidate second sorted string table file to obtain a merged target sorted string table file includes: and loading the candidate first sorting character string table file and the candidate second sorting character string table file into a memory of the key value database for sorting and merging to obtain a merged target sorting character string table file.
According to an embodiment of the present disclosure, the storing the target sorting string table file to the storage component includes: and storing the target sorting character string table file to the second target layer.
According to an embodiment of the present disclosure, the method further comprises: deleting the candidate first sorted string table file and the candidate second sorted string table file from the key-value store in response to the target sorted string table file being stored to the storage component.
A second aspect of the present disclosure provides a file processing apparatus for a key-value store, including: a first determination module to determine a first target tier of a storage component of the key-value store in response to receiving a merge operation instruction; a second determination module for determining a candidate first ordered string table file stored at the first target layer; a third determining module, configured to determine, according to the candidate first sorted string table file, a candidate second sorted string table file stored in a second target layer; wherein the second target layer is a storage layer one level lower than the first target layer; the merging operation module is used for executing merging operation according to the candidate first sorting character string list file and the candidate second sorting character string list file to obtain a merged target sorting character string list file; and the storage operation module is used for storing the target sorting character string table file to the storage component.
A third aspect of the present disclosure provides an electronic device, comprising: one or more processors; a memory for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the above-described file processing method for a key-value store.
A fourth aspect of the present disclosure also provides a computer-readable storage medium having stored thereon executable instructions that, when executed by a processor, cause the processor to perform the above-described file processing method for a key-value store.
A fifth aspect of the present disclosure also provides a computer program product comprising a computer program which, when executed by a processor, implements the above-described file processing method for a key-value store.
In the file processing method for the key-value database provided by this embodiment, when a merge operation instruction is executed, a candidate first sorted string table file stored in the first target layer is determined first, and a candidate second sorted string table file stored in the second target layer is then determined according to the candidate first sorted string table file. This controls which sorted string table files participate in the merge operation: screening the files limits the number of files merged each time, and the resulting reduction in data volume lowers the I/O overhead and write stalls incurred during file processing, improves write performance, and alleviates the delay problem.
Drawings
The foregoing and other objects, features and advantages of the disclosure will be apparent from the following description of embodiments of the disclosure, which proceeds with reference to the accompanying drawings, in which:
FIG. 1 schematically illustrates an application scenario diagram of a file processing method, apparatus, device, medium and program product for a key-value store, in accordance with an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow diagram of a file processing method for a key-value store in accordance with an embodiment of the present disclosure;
FIG. 3 schematically illustrates a flow diagram of a method of file processing for a key-value store in accordance with another embodiment of the present disclosure;
FIG. 4 is a block diagram schematically illustrating a structure of a file processing apparatus for a key-value store according to an embodiment of the present disclosure; and
FIG. 5 schematically illustrates a block diagram of an electronic device adapted to implement a file processing method for a key-value store in accordance with an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is illustrative only and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. Moreover, in the following description, descriptions of well-known structures and techniques are omitted so as to not unnecessarily obscure the concepts of the present disclosure.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the disclosure. The terms "comprises," "comprising," and the like, as used herein, specify the presence of stated features, steps, operations, and/or components, but do not preclude the presence or addition of one or more other features, steps, operations, or components.
All terms (including technical and scientific terms) used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs, unless otherwise defined. It is noted that the terms used herein should be interpreted as having a meaning that is consistent with the context of this specification and should not be interpreted in an idealized or overly formal sense.
Where a convention analogous to "at least one of A, B, and C, etc." is used, in general such a construction is intended in the sense one having skill in the art would understand the convention (e.g., "a system having at least one of A, B, and C" would include but not be limited to systems that have A alone, B alone, C alone, A and B together, A and C together, B and C together, and/or A, B, and C together, etc.).
The embodiment of the disclosure provides a file processing method and device for a key-value database, wherein a first target layer of a storage component of the key-value database is determined in response to receiving a merge operation instruction; determining a candidate first ordered string table file stored at the first target layer; determining a candidate second sorting character string list file stored in a second target layer according to the candidate first sorting character string list file; wherein the second target layer is a storage layer one level lower than the first target layer; executing a merging operation according to the candidate first sorting character string list file and the candidate second sorting character string list file to obtain a merged target sorting character string list file; and storing the target sorting character string table file to the storage component.
Fig. 1 schematically illustrates an application scenario diagram of a file processing method, apparatus, device, medium, and program product for a key-value store according to an embodiment of the present disclosure.
As shown in fig. 1, the application scenario 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104 and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
A user may use terminal devices 101, 102, 103 to interact with a server 105 over a network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have installed thereon various communication client applications, such as shopping-like applications, web browser applications, search-like applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (for example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and perform other processing on the received data such as the user request, and feed back a processing result (e.g., a webpage, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that the file processing method for the key-value store provided by the embodiment of the present disclosure may be generally executed by the server 105. Accordingly, the file processing apparatus for key-value store provided by the embodiment of the present disclosure may be generally disposed in the server 105. The file processing method for the key-value database provided by the embodiment of the present disclosure may also be executed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the file processing apparatus for the key-value store provided by the embodiment of the present disclosure may also be disposed in a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
The following describes a file processing method for a key-value store according to the disclosed embodiment in detail with reference to fig. 2 based on the scenario described in fig. 1.
FIG. 2 schematically illustrates a flow chart of a file processing method for a key-value store according to an embodiment of the disclosure.
As shown in fig. 2, the embodiment includes operations S210 to S250, and the file processing method for the key-value store may be performed by a server.
In the technical solution of the present disclosure, the acquisition, collection, storage, use, processing, transmission, provision, disclosure, and application of data all comply with relevant laws and regulations, necessary security measures are taken, and public order and good morals are not violated.
In operation S210, a first target tier of a storage component of a key-value store is determined in response to receiving a merge operation instruction.
In operation S220, a candidate first sorted string table file stored in the first target layer is determined.
In operation S230, a candidate second sorted string table file stored in the second target layer is determined according to the candidate first sorted string table file, wherein the second target layer is the storage layer one level below the first target layer.
In operation S240, a merging operation is performed according to the candidate first sorted string table file and the candidate second sorted string table file to obtain a merged target sorted string table file.
In operation S250, the target sorting string table file is stored to the storage component.
A database storage system may handle updates with strategies such as In-Place Update and Out-of-Place Update. However, given the large data volumes and strong scalability demands of emerging financial technologies such as blockchain, each of the two strategies has its own shortcomings in read and write performance. The LSM tree structure addresses these problems: it provides high write performance, and the merge process built into the structure also places certain bounds on query performance and space utilization.
Consider an LSM-tree-based key-value database such as LevelDB. The database is divided mainly into a memory part and a disk part. Transaction data in the blockchain is first written into memory as a cache, where the Memtable is the in-memory structure that organizes and maintains the data. When the Memtable reaches a set size threshold, the system converts it into an immutable in-memory file. The disk component of the key-value database is hierarchical: once transaction data is stored on disk it is divided into a plurality of levels, and the size of each level is set to a multiple of the size of the previous level. Each level contains a plurality of SSTable (sorted string table) files; the key-value pairs in each sorted string table file are sorted by key, and the key ranges of the sorted string table files do not overlap one another. When the sorted string table files of a level exceed a predetermined size, the system invokes the compaction (merge) operation.
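The trigger condition described above can be illustrated with a minimal Python sketch. It is not the patented implementation: the `Level` class, the helper names, the 10 MiB base budget, the growth factor of 10, and the four-file trigger for level 0 are all assumptions chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Level:
    index: int
    file_sizes: list = field(default_factory=list)  # size in bytes of each SSTable file


def level_capacity(index, base_bytes=10 * 2**20, multiplier=10):
    """Assumed size budget: level 1 holds base_bytes, each deeper level holds multiplier times more."""
    return base_bytes * multiplier ** max(index - 1, 0)


def compaction_score(level, l0_file_trigger=4):
    """A score of 1.0 or more means the level has outgrown its (assumed) budget."""
    if level.index == 0:
        # Level 0 is scored by file count in this sketch, deeper levels by total bytes.
        return len(level.file_sizes) / l0_file_trigger
    return sum(level.file_sizes) / level_capacity(level.index)


def pick_level_to_compact(levels):
    """Return the index of the level with the highest score, or None if no level is over budget."""
    best = max(levels, key=compaction_score)
    return best.index if compaction_score(best) >= 1.0 else None
```

With these assumed budgets, a level 1 holding 12 MiB scores 1.2 and would be chosen ahead of a half-full level 0 or level 2.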
It can be appreciated that the merge operation is one of the performance bottlenecks of the key-value database, and its performance impact is particularly pronounced given the massive amounts of transaction data in a blockchain system. The merge operation re-sorts and merges the sorted string table files in adjacent levels whose key ranges overlap, and stores the newly generated sorted string table files, whose key ranges do not overlap, to a storage component such as the disk component. Taking the L0 layer as an example, the merge operation can be divided into three parts: 1. Reloading: after the L0 layer triggers the merge operation, the candidate sorted string table files are first reloaded into memory. 2. Sorting and merging: the CPU performs a multi-way merge sort on the files participating in the merge. After merging, a given criterion may be used to judge whether each key needs to be kept; a key with no storage value is discarded directly, and keys that need to be kept are reorganized into a new series of sorted string table files. 3. Writing back: the newly generated sorted string table files are written back to the L1 layer. Through this process, the candidate transaction data, that is, the candidate sorted string table files, are processed one by one to form a series of new sorted string table files in the L1 layer, and the files that originally participated in the merge operation are deleted.
To prevent redundant read and write operations during the merge operation from causing system performance fluctuations and serious delays in the blockchain system, this embodiment controls which sorted string table files participate in the merge operation. Screening the files in this way limits the number of files merged each time, and the resulting reduction in data volume lowers the I/O overhead and write stalls incurred during file processing and improves write performance.
For example, when a layer meets the condition that triggers the merge operation and the background thread starts the compaction thread, the layer of the storage component from which the merge operation instruction originates, for example the Li layer, is first determined by calculating the score of each layer. The files participating in the merge operation are then determined: first the candidate sorted string table file stored in the Li layer, that is, the candidate first sorted string table file stored in the first target layer. Next, the candidate files of the Li+1 layer are selected, that is, the candidate second sorted string table file stored in the second target layer is determined according to the candidate first sorted string table file, where the second target layer is the storage layer one level below the first target layer. The candidate sorted string table files of the Li layer and the Li+1 layer are added to a candidate set. The merge operation is performed on the files in the candidate set, namely the candidate first sorted string table file and the candidate second sorted string table file, to obtain a merged target sorted string table file. The newly generated target sorted string table file is then written back to the storage component.
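The sorting-and-merging part of this process can be sketched in Python as a multi-way merge over files that have already been loaded into memory. This is a simplified illustration under stated assumptions: each file is a list of `(key, value)` pairs sorted by key with unique keys, files earlier in the combined list are treated as newer, a value of `None` stands in for a key that has "no storage value", and `max_entries` is a hypothetical output file size. None of these names come from the source.

```python
import heapq

TOMBSTONE = None  # assumed marker for a key that no longer needs to be kept


def _tag(run, age):
    """Attach the run's age to every entry so that ties on key are resolved newest first."""
    for key, value in run:
        yield key, age, value


def merge_sstables(upper_files, lower_files, max_entries=4):
    """Merge sorted runs into new sorted, non-overlapping runs of at most max_entries pairs."""
    runs = list(upper_files) + list(lower_files)       # smaller index = newer data
    streams = [_tag(run, age) for age, run in enumerate(runs)]
    merged, current, last_key = [], [], object()
    for key, _age, value in heapq.merge(*streams):     # multi-way merge by (key, age)
        if key == last_key:
            continue                                   # an older version of a key already handled
        last_key = key
        if value is TOMBSTONE:
            continue                                   # assumed criterion: drop keys marked as deleted
        current.append((key, value))
        if len(current) == max_entries:
            merged.append(current)                     # close the current output file
            current = []
    if current:
        merged.append(current)
    return merged
```

Dropping a marked key outright is only a simplification; a real store would also have to consider whether deeper levels still hold older versions of that key.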
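The step of choosing the lower-layer files from the upper-layer candidate can be sketched as a key-range intersection test. The `SSTable` tuple, its field names, and `build_candidate_set` are hypothetical names introduced for this illustration; the sketch assumes each file records its smallest and largest key.

```python
from typing import NamedTuple


class SSTable(NamedTuple):
    name: str
    smallest: str  # smallest key stored in the file
    largest: str   # largest key stored in the file


def overlaps(a, b):
    """Two closed key ranges overlap when each one starts before the other ends."""
    return a.smallest <= b.largest and b.smallest <= a.largest


def build_candidate_set(first_candidate, lower_layer):
    """Return the upper-layer candidate followed by every lower-layer file whose range it overlaps."""
    second_candidates = [f for f in lower_layer if overlaps(first_candidate, f)]
    return [first_candidate] + second_candidates
```

For a candidate covering keys "g" to "m", only the lower-layer files whose ranges touch that interval join the candidate set; files entirely before "g" or after "m" are left out of the merge.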
In the file processing method for the key-value database provided by this embodiment, when a merge operation instruction is executed, a candidate first sorted string table file stored in the first target layer is determined first, and a candidate second sorted string table file stored in the second target layer is then determined according to the candidate first sorted string table file. This controls which sorted string table files participate in the merge operation: screening the files limits the number of files merged each time, and the resulting reduction in data volume lowers the I/O overhead and write stalls incurred during file processing, improves write performance, and alleviates the delay problem.
Determining candidate first ordered string table files stored at the first target layer, including: determining a first key value range corresponding to each first sorting character string table file based on all the first sorting character string table files stored in the first target layer; determining a second key value range corresponding to each second sorting character string table file based on all second sorting character string table files stored in a second target layer; determining a key value overlapping rate between any first sorting character string table file and any second sorting character string table file according to the first key value range and the second key value range, and obtaining a plurality of key value overlapping rate calculation results corresponding to the first sorting character string table files; and determining a candidate first sorting character string table file according to a plurality of key value overlapping rate calculation results.
This embodiment provides a method for selecting the candidate first sorted string table file restrictively; selecting it purposefully further reduces the I/O overhead generated by the merge operation.
For example, the key value range of each sorted string table file in the first target layer and the second target layer is determined first. Then, determining the key value overlapping rate between any first sorting character string table file and any second sorting character string table file according to the first key value range and the second key value range; after the result of the calculation of the overlap ratio is obtained, the sorted character string table file with a smaller overlap range with the second target layer can be selected from the first target layer as the object participating in the merging operation.
According to the file processing method for the key-value database, the key value overlapping rate between any first sorted string table file and any second sorted string table file is determined according to the first key value range and the second key value range, yielding a plurality of key value overlapping rate calculation results corresponding to the first sorted string table file. The candidate first sorted string table file is then determined according to these calculation results. The candidate first sorted string table file is thus selected restrictively, so that only a subset of the overlapping sorted string table files is reloaded into memory as candidates, which further reduces the I/O overhead generated by the merge operation.
Determining the key value overlapping rate between any first sorted string table file and any second sorted string table file according to the first key value range and the second key value range, to obtain a plurality of key value overlapping rate calculation results corresponding to the first sorted string table file, includes: determining the key value overlapping rate between any first sorted string table file and any second sorted string table file using a probabilistic cardinality estimation method, according to the first key value range and the second key value range, to obtain the plurality of key value overlapping rate calculation results corresponding to the first sorted string table file.
For example, when the Li layer triggers the merge operation, all sorted string table files in the layer need to be queried to obtain a file whose key value range overlaps with the minimum number of key value ranges of the Li +1 layer file, and the file is used as a candidate. In order to calculate the key value overlapping rate between files, a HyperLogLog (HLL) probability base estimator is adopted in the present embodiment. HLL is an algorithm that can estimate the number of different elements in multiple sets. Further, in the present embodiment, a metric or (overlap ratio) may be defined, which is used to represent the overlap ratio. The calculation formula of the overlap ratio may be as follows:
[Formula image: definition of the overlap ratio OR, not reproduced in this text.]
wherein file_i denotes the i-th file, and UniK represents the number of non-overlapping unique keys produced, when the i-th file is used as the candidate file of the merge operation, after the merge operation with the corresponding series of lower-layer files is completed. SumK represents the total number of files participating in the merge operation process. UniK and K(file_i) are estimated using probabilistic cardinality estimation. For a merge operation occurring at layer Li, an HLL structure is associated with each layer-Li sorted string table file in the disk component. The overlap rate of every sorted string table file is calculated in turn, and the file with the smallest overlap rate is found by sorting and used as the final candidate file.
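The estimation described above can be sketched as follows. Because the formula image is not reproduced in this text, the sketch assumes the form OR = 1 - UniK / SumK and interprets SumK as the sum of the per-file key-count estimates K(file_i); both are assumptions, not the disclosed formula. The `HyperLogLog` class, its register count, and the helper names `sketch_of` and `overlap_ratio` are likewise hypothetical and only illustrate how one HLL structure per file could be combined.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog sketch with 2**p registers (illustrative only)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, key):
        # Use 64 bits of a SHA-1 digest as the hash value.
        h = int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest()[:8], "big")
        index = h >> (64 - self.p)                      # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)           # remaining 64 - p bits
        rank = (64 - self.p) - rest.bit_length() + 1    # position of leftmost 1-bit
        self.registers[index] = max(self.registers[index], rank)

    def union(self, other):
        # The register-wise maximum is the sketch of the set union.
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            return self.m * math.log(self.m / zeros)    # small-range correction
        return raw


def sketch_of(keys):
    """Build the HLL structure associated with one file's keys."""
    hll = HyperLogLog()
    for key in keys:
        hll.add(key)
    return hll


def overlap_ratio(candidate, lower_files):
    """Assumed form: OR = 1 - UniK / SumK, with every term estimated from HLL sketches."""
    sketches = [candidate] + list(lower_files)
    sum_k = sum(s.count() for s in sketches)    # assumed SumK: sum of per-file estimates
    merged = sketches[0]
    for s in sketches[1:]:
        merged = merged.union(s)
    uni_k = merged.count()                      # UniK: unique keys after the merge
    return 1.0 - uni_k / sum_k
```

Under this assumed form, a value near 0 means the candidate shares almost no keys with the lower-layer files, so the file with the smallest value would be the cheapest one to merge.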
In the file processing method for the key-value database provided in this embodiment, a probabilistic cardinality estimation method may be used to determine the key value overlap rate between any first sorted string table file and any second sorted string table file, so as to obtain a plurality of key value overlap rate calculation results corresponding to the first sorted string table files.
Determining the candidate first sorted string table file according to the plurality of key value overlap rate calculation results includes: taking the first sorted string table file with the smallest key value overlap rate among the plurality of calculation results as the candidate first sorted string table file.
To further reduce the I/O overhead generated by the merge operation, the first sorted string table file with the smallest overlap ratio of key values may be selected as the candidate first sorted string table file.
For example, in S1, two sorted key range lists are maintained in the background: P denotes the key ranges of layer Li, Q denotes the key ranges of layer Li+1, and p denotes the key range of a first sorted string table file in layer Li. It will be appreciated that p is a subsequence of P.
In S2, the whole of Q is traversed to find the subsequence q of Q whose range overlaps p, and the overlap rate is calculated and recorded using the probabilistic cardinality estimation method.
In S3, each p in P is queried in turn and step S2 is repeated, with the overlap rate records updated synchronously. A sorting algorithm is then used to compare all overlap rates and find the first sorted string table file with the smallest overlap rate; that file is the final file meeting the selection criterion and is added to the set of candidate objects participating in the merge operation.
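Steps S1 to S3 can be sketched as a small selection routine. The function names, the closed integer key ranges, and the `covered_fraction` metric are illustrative assumptions; in the embodiment the overlap rate would come from the probabilistic cardinality estimate rather than from range arithmetic.

```python
def ranges_overlap(a, b):
    """Closed key ranges (lo, hi) overlap when neither lies wholly before the other."""
    return a[0] <= b[1] and b[0] <= a[1]


def pick_candidate(P, Q, overlap_rate):
    """Return (index of the p in P with the smallest overlap rate, all rates).

    P, Q: sorted lists of (lo, hi) key ranges for layer Li and layer Li+1 (S1).
    overlap_rate(p, q_sub): hypothetical callback that scores one file.
    """
    rates = []
    for p in P:                                          # S3: visit each p in P
        q_sub = [q for q in Q if ranges_overlap(p, q)]   # S2: subsequence q of Q
        rates.append(overlap_rate(p, q_sub))             # record the overlap rate
    best = min(range(len(P)), key=rates.__getitem__)     # smallest rate wins
    return best, rates


def covered_fraction(p, q_sub):
    """Stand-in metric for integer keys: share of p's range covered by q_sub.

    Assumes the ranges in q_sub do not overlap one another.
    """
    covered = sum(min(p[1], q[1]) - max(p[0], q[0]) + 1 for q in q_sub)
    return covered / (p[1] - p[0] + 1)
```

For instance, with P = [(0, 9), (20, 29), (40, 49)] and Q = [(0, 14), (15, 24), (48, 60)], the third file is selected because only two of its ten keys fall inside a range of the lower layer.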
In the file processing method for the key value database provided by this embodiment, the first sorted string table file with the smallest key value overlap rate is used as the candidate first sorted string table file. Selecting the sorted string table file that overlaps least with the next layer as the object of the merge operation helps further reduce the cost of each merge operation, thereby improving the overall average write performance of the blockchain system.
Before determining the candidate first sorted string table file stored in the first target layer, the method further includes: determining whether the first target layer is a predetermined layer; and, in a case where it is determined that the first target layer is the predetermined layer, setting the number of candidate first sorted string table files for performing the merge operation.
In the present embodiment, the predetermined layer may be an L0 layer, for example.
It can be understood that the merge operation process has the most significant influence on the performance of the key-value storage database, and the merge operation of the L0 layer is the more significant factor behind high system latency.
For example, to address the latency caused by the merge operation of the L0 layer and to reduce the performance loss from the resulting write stalls, each time the L0 layer triggers a merge operation, only a portion of the overlapping sorted string table files may be selected as candidates and loaded back into memory during the merge operation.
Fig. 3 schematically shows a flowchart of a file processing method for a key-value store according to another embodiment of the present disclosure. Referring to fig. 3, in operation S301, a first target layer of a storage component of the key-value store is determined.
In operation S302, it is determined whether the first target layer is a predetermined layer. For example, the predetermined layer may be the L0 layer, and whether the merge operation takes place at L0 or at another layer is determined by calculating a score for each layer.
If the first target layer is the predetermined layer, the number of files needs to be limited; that is, in operation S303, the number of candidate first sorted string table files for performing the merge operation is set, and then operation S304 is performed. If the first target layer is not the predetermined layer, operation S304 is performed directly.
In operation S304, the candidate second sorted string table file stored in the second target layer is then selected.
In operation S305, the merge operation is performed; for example, the candidate first sorted string table file and the candidate second sorted string table file are loaded into the memory of the key-value database for sorting and merging, so as to obtain a merged target sorted string table file.
In operation S306, the target sorted string table file is stored to the storage component, which includes storing the target sorted string table file to the second target layer.
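The flow of operations S301 to S306 can be sketched over a toy in-memory model. Everything named here is hypothetical: files are modeled as non-empty dicts, each layer lists its files oldest first, `l0_file_limit` stands for the file count set in S303, and at layers other than L0 the first file stands in for whichever candidate the overlap-rate selection would pick. Removing the input files at the end mirrors the optional deletion step described elsewhere in the disclosure.

```python
def key_ranges_overlap(f, g):
    """Two non-empty files overlap when their [min key, max key] ranges intersect."""
    return min(f) <= max(g) and min(g) <= max(f)


def run_merge(levels, first_layer, l0_file_limit=2):
    """One pass of S301 to S306.

    levels: list of layers; each layer is a list of files; each file is a dict
    mapping key -> value (a stand-in for a sorted string table file).
    """
    upper = levels[first_layer]                   # S301: first target layer
    lower = levels[first_layer + 1]               # second target layer, one level lower
    if first_layer == 0:                          # S302: is it the predetermined layer?
        candidates = upper[:l0_file_limit]        # S303: cap the number of files
    else:
        candidates = upper[:1]                    # placeholder for min-overlap choice
    lower_candidates = [                          # S304: overlapping lower-layer files
        g for g in lower if any(key_ranges_overlap(f, g) for f in candidates)
    ]
    merged = {}
    for f in lower_candidates + candidates:       # S305: later (newer) writes win
        merged.update(f)
    merged = dict(sorted(merged.items()))         # keep the output key-sorted
    # Drop the merged inputs (identity comparison, so equal-looking files survive).
    levels[first_layer] = [f for f in upper if not any(f is c for c in candidates)]
    levels[first_layer + 1] = [
        g for g in lower if not any(g is c for c in lower_candidates)
    ]
    levels[first_layer + 1].append(merged)        # S306: store in the second layer
    return merged
```

Capping the L0 candidates keeps each merge small, at the price of leaving the remaining L0 files for a later pass.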
According to the file processing method for the key value database, setting the number of candidate first sorted string table files for performing the merge operation limits the number of files participating in the merge operation and reduces the data volume.
Performing the merge operation according to the candidate first sorted string table file and the candidate second sorted string table file to obtain a merged target sorted string table file includes: loading the candidate first sorted string table file and the candidate second sorted string table file into a memory of the key value database for sorting and merging, to obtain the merged target sorted string table file.
It can be understood that, in the process of the merge operation, the file participating in the merge operation may be loaded into the memory of the key-value database, and the specific operation process may include sorting and merging the data to obtain data whose key ranges are not overlapped, that is, the merged target sorting string table file.
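A minimal sketch of the sort-and-merge step, assuming each file is already loaded into memory as a key-sorted list of (key, value) pairs with unique keys, and assuming that on a duplicate key the entry from the higher layer is the newer one and is kept. The function name and the priority tagging are illustrative choices, not part of the disclosure.

```python
import heapq


def merge_sorted_tables(upper, lower):
    """Merge two key-sorted lists of (key, value) pairs into one sorted list.

    When a key appears in both inputs, the entry from `upper` (assumed newer)
    is kept and the entry from `lower` is dropped.
    """
    tagged_upper = ((key, 0, value) for key, value in upper)   # priority 0: newer
    tagged_lower = ((key, 1, value) for key, value in lower)   # priority 1: older
    merged = []
    for key, _priority, value in heapq.merge(tagged_upper, tagged_lower):
        if merged and merged[-1][0] == key:
            continue                  # older duplicate of a key already emitted
        merged.append((key, value))
    return merged
```

Because both inputs are already sorted, the merge is a single linear pass, and the output contains each key exactly once.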
According to the file processing method for the key-value database, the candidate first sorted string table file and the candidate second sorted string table file are loaded into the memory of the key-value database for sorting and merging, so that the file data is reordered and merged, which helps maintain query performance and space utilization.
Storing the target sorted string table file to the storage component includes: storing the target sorted string table file to the second target layer.
In the file processing method for the key value database provided by this embodiment, the target sorted string table file is stored in the second target layer, that is, in the storage layer one level lower.
The file processing method for the key value database further comprises the following steps: in response to the target sorted string table file being stored to the storage component, the candidate first sorted string table file and the candidate second sorted string table file are deleted from the key-value store.
It will be appreciated that, to further free up storage space, the candidate first sorted string table file and the candidate second sorted string table file may be deleted from the key-value store after the target sorted string table file is stored to the storage component.
Based on the file processing method for the key-value database, the disclosure also provides a file processing device for the key-value database. The apparatus will be described in detail below with reference to fig. 4.
Fig. 4 schematically shows a block diagram of a file processing apparatus for a key-value store according to an embodiment of the present disclosure.
As shown in fig. 4, the file processing apparatus 400 for a key-value store of this embodiment includes a first determining module 410, a second determining module 420, a third determining module 430, a merging operation module 440, and a storage operation module 450.
The first determining module 410 is configured to determine a first target layer of a storage component of a key-value store in response to receiving a merge operation instruction; the second determining module 420 is configured to determine a candidate first sorted string table file stored in the first target layer; the third determining module 430 is configured to determine, according to the candidate first sorted string table file, a candidate second sorted string table file stored in a second target layer, wherein the second target layer is a storage layer one level lower than the first target layer; the merging operation module 440 is configured to perform a merge operation according to the candidate first sorted string table file and the candidate second sorted string table file to obtain a merged target sorted string table file; and the storage operation module 450 is configured to store the target sorted string table file to the storage component.
In some embodiments, the second determining module comprises: a first determining submodule, configured to determine, based on all first sorted string table files stored in the first target layer, a first key value range corresponding to each first sorted string table file; a second determining submodule, configured to determine, based on all second sorted string table files stored in the second target layer, a second key value range corresponding to each second sorted string table file; a third determining submodule, configured to determine, according to the first key value range and the second key value range, the key value overlap rate between any first sorted string table file and any second sorted string table file to obtain a plurality of key value overlap rate calculation results corresponding to the first sorted string table files; and a fourth determining submodule, configured to determine the candidate first sorted string table file according to the plurality of key value overlap rate calculation results.
In some embodiments, the third determining submodule comprises: a calculating unit, configured to determine the key value overlap rate between any first sorted string table file and any second sorted string table file by using a probabilistic cardinality estimation method according to the first key value range and the second key value range, to obtain the plurality of key value overlap rate calculation results corresponding to the first sorted string table files.
In some embodiments, the fourth determining submodule comprises: a sorting unit, configured to take the first sorted string table file with the smallest key value overlap rate among the plurality of key value overlap rate calculation results as the candidate first sorted string table file.
In some embodiments, the apparatus further comprises a setting module configured to, before the candidate first sorted string table file stored in the first target layer is determined: determine whether the first target layer is a predetermined layer; and, in a case where it is determined that the first target layer is the predetermined layer, set the number of candidate first sorted string table files for performing the merge operation.
In some embodiments, the merging operation module is configured to load the candidate first sorted string table file and the candidate second sorted string table file into a memory of the key value database for sorting and merging, to obtain a merged target sorted string table file.
In some embodiments, the storage operation module is configured to store the target sorted string table file to the second target layer.
In some embodiments, the apparatus further includes a deletion module configured to delete the candidate first sorted string table file and the candidate second sorted string table file from the key-value store in response to the target sorted string table file being stored to the storage component.
According to an embodiment of the present disclosure, any plurality of the first determining module 410, the second determining module 420, the third determining module 430, the merging operation module 440, and the storing operation module 450 may be combined in one module to be implemented, or any one of the modules may be split into a plurality of modules. Alternatively, at least part of the functionality of one or more of these modules may be combined with at least part of the functionality of the other modules and implemented in one module. According to an embodiment of the present disclosure, at least one of the first determining module 410, the second determining module 420, the third determining module 430, the merging operation module 440, and the storage operation module 450 may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging a circuit, or in any one of three implementations of software, hardware, and firmware, or in any suitable combination of any of them. Alternatively, at least one of the first determining module 410, the second determining module 420, the third determining module 430, the merging operation module 440 and the storing operation module 450 may be at least partially implemented as a computer program module, which when executed, may perform a corresponding function.
Fig. 5 schematically illustrates a block diagram of an electronic device adapted to implement a file processing method for a key-value store in accordance with an embodiment of the present disclosure.
As shown in fig. 5, an electronic device 500 according to an embodiment of the present disclosure includes a processor 501 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 502 or a program loaded from a storage section 508 into a Random Access Memory (RAM) 503. The processor 501 may comprise, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or associated chipset, and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), among others. The processor 501 may also include onboard memory for caching purposes. Processor 501 may include a single processing unit or multiple processing units for performing different actions of a method flow according to embodiments of the disclosure.
In the RAM 503, various programs and data necessary for the operation of the electronic device 500 are stored. The processor 501, the ROM 502, and the RAM 503 are connected to each other by a bus 504. The processor 501 performs various operations of the method flows according to the embodiments of the present disclosure by executing programs in the ROM 502 and/or the RAM 503. Note that the programs may also be stored in one or more memories other than the ROM 502 and the RAM 503. The processor 501 may also perform various operations of method flows according to embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the electronic device 500 may also include an input/output (I/O) interface 505, which is also connected to the bus 504. The electronic device 500 may also include one or more of the following components connected to the I/O interface 505: an input section 506 including a keyboard, a mouse, and the like; an output section 507 including a display such as a Cathode Ray Tube (CRT) or a Liquid Crystal Display (LCD), and a speaker; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card, a modem, or the like. The communication section 509 performs communication processing via a network such as the internet. A drive 510 is also connected to the I/O interface 505 as necessary. A removable medium 511 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 510 as necessary, so that a computer program read out therefrom is installed into the storage section 508 as necessary.
The present disclosure also provides a computer-readable storage medium, which may be contained in the apparatus/device/system described in the above embodiments; or may exist alone without being assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement a method according to an embodiment of the disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example but is not limited to: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. For example, according to embodiments of the present disclosure, a computer-readable storage medium may include ROM502 and/or RAM 503 and/or one or more memories other than ROM502 and RAM 503 described above.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for performing the method illustrated in the flow chart. When the computer program product runs in a computer system, the program code is used for causing the computer system to realize the file processing method for the key-value database provided by the embodiment of the disclosure.
The computer program performs the above-described functions defined in the system/apparatus of the embodiments of the present disclosure when executed by the processor 501. The systems, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
In one embodiment, the computer program may be hosted on a tangible storage medium such as an optical storage device, a magnetic storage device, or the like. In another embodiment, the computer program may also be transmitted, distributed in the form of a signal on a network medium, downloaded and installed through the communication section 509, and/or installed from the removable medium 511. The computer program containing program code may be transmitted using any suitable network medium, including but not limited to: wireless, wired, etc., or any suitable combination of the foregoing.
In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 509, and/or installed from the removable medium 511. The computer program, when executed by the processor 501, performs the above-described functions defined in the system of the embodiments of the present disclosure. The systems, devices, apparatuses, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the present disclosure.
In accordance with embodiments of the present disclosure, program code for executing computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages, and in particular, these computer programs may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. The programming languages include, but are not limited to, Java, C++, Python, the "C" language, or the like. The program code may execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Those skilled in the art will appreciate that the features recited in the various embodiments and/or claims of the present disclosure can be combined and/or incorporated in various ways, even if such combinations or incorporations are not expressly recited in the present disclosure. In particular, the features recited in the various embodiments and/or claims of the present disclosure may be combined and/or incorporated in various ways without departing from the spirit or teaching of the present disclosure. All such combinations and/or incorporations are within the scope of the present disclosure.
The embodiments of the present disclosure have been described above. However, these examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments are described separately above, this does not mean that the measures in the embodiments cannot be used in advantageous combination. The scope of the disclosure is defined by the appended claims and equivalents thereof. Various alternatives and modifications can be devised by those skilled in the art without departing from the scope of the present disclosure, and such alternatives and modifications are intended to be within the scope of the present disclosure.

Claims (12)

1. A file processing method for a key-value database, comprising:
in response to receiving a merge operation instruction, determining a first target layer of a storage component of the key-value database;
determining a candidate first sorted string table file stored in the first target layer;
determining, according to the candidate first sorted string table file, a candidate second sorted string table file stored in a second target layer, wherein the second target layer is a storage layer one level lower than the first target layer;
performing a merge operation according to the candidate first sorted string table file and the candidate second sorted string table file to obtain a merged target sorted string table file; and
storing the target sorted string table file to the storage component.
2. The method of claim 1, wherein the determining a candidate first sorted string table file stored in the first target layer comprises:
determining, based on all first sorted string table files stored in the first target layer, a first key value range corresponding to each first sorted string table file;
determining, based on all second sorted string table files stored in the second target layer, a second key value range corresponding to each second sorted string table file;
determining, according to the first key value range and the second key value range, a key value overlap rate between any first sorted string table file and any second sorted string table file to obtain a plurality of key value overlap rate calculation results corresponding to the first sorted string table files; and
determining the candidate first sorted string table file according to the plurality of key value overlap rate calculation results.
3. The method of claim 2, wherein the determining, according to the first key value range and the second key value range, a key value overlap rate between any first sorted string table file and any second sorted string table file to obtain a plurality of key value overlap rate calculation results corresponding to the first sorted string table files comprises:
determining the key value overlap rate between any first sorted string table file and any second sorted string table file by using a probabilistic cardinality estimation method according to the first key value range and the second key value range, to obtain the plurality of key value overlap rate calculation results corresponding to the first sorted string table files.
4. The method of claim 2, wherein the determining the candidate first sorted string table file according to the plurality of key value overlap rate calculation results comprises:
taking the first sorted string table file with the smallest key value overlap rate among the plurality of key value overlap rate calculation results as the candidate first sorted string table file.
5. The method of claim 1, further comprising, prior to the determining a candidate first sorted string table file stored in the first target layer:
determining whether the first target layer is a predetermined layer; and
setting, in a case where it is determined that the first target layer is the predetermined layer, the number of candidate first sorted string table files for performing the merge operation.
6. The method of claim 1, wherein the performing a merge operation according to the candidate first sorted string table file and the candidate second sorted string table file to obtain a merged target sorted string table file comprises:
loading the candidate first sorted string table file and the candidate second sorted string table file into a memory of the key-value database for sorting and merging, to obtain the merged target sorted string table file.
7. The method of claim 1, wherein the storing the target sorted string table file to the storage component comprises:
storing the target sorted string table file to the second target layer.
8. The method of claim 1, further comprising:
deleting the candidate first sorted string table file and the candidate second sorted string table file from the key-value database in response to the target sorted string table file being stored to the storage component.
9. A file processing apparatus for a key-value database, comprising:
a first determining module, configured to determine a first target layer of a storage component of the key-value database in response to receiving a merge operation instruction;
a second determining module, configured to determine a candidate first sorted string table file stored in the first target layer;
a third determining module, configured to determine, according to the candidate first sorted string table file, a candidate second sorted string table file stored in a second target layer, wherein the second target layer is a storage layer one level lower than the first target layer;
a merging operation module, configured to perform a merge operation according to the candidate first sorted string table file and the candidate second sorted string table file to obtain a merged target sorted string table file; and
a storage operation module, configured to store the target sorted string table file to the storage component.
10. An electronic device, comprising:
one or more processors;
a storage device to store one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to perform the method of any of claims 1-8.
11. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method according to any one of claims 1 to 8.
12. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 8.
CN202211498375.6A 2022-11-28 2022-11-28 File processing method and device for key value database, electronic equipment and medium Pending CN115858467A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211498375.6A CN115858467A (en) 2022-11-28 2022-11-28 File processing method and device for key value database, electronic equipment and medium

Publications (1)

Publication Number Publication Date
CN115858467A true CN115858467A (en) 2023-03-28

Family

ID=85666953

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211498375.6A Pending CN115858467A (en) 2022-11-28 2022-11-28 File processing method and device for key value database, electronic equipment and medium

Country Status (1)

Country Link
CN (1) CN115858467A (en)

US11341113B1 (en) Hybrid locking mechanism in relational database management systems
US11940975B2 (en) Database distribution to avoid contention
US20230132173A1 (en) Data reading method, device and storage medium
AU2021390717B2 (en) Batch job performance improvement in active-active architecture
CN113127238B (en) Method and device for exporting data in database, medium and equipment
CN108984450B (en) Data transmission method, device and equipment
CN115422188A (en) Table structure online changing method and device, electronic equipment and storage medium
KR20230017329A (en) Method of responding to operation, apparatus of responding to operation, electronic device, storage medium, and computer program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination