CN117149081A

CN117149081A - Time sequence database storage engine construction method based on ZNS solid state disk

Info

Publication number: CN117149081A
Application number: CN202311150886.3A
Authority: CN
Inventors: 刘烈超; 刘兴斌
Original assignee: Wuhan Lugu Technology Co ltd
Current assignee: Wuhan Lugu Technology Co ltd
Priority date: 2023-09-07
Filing date: 2023-09-07
Publication date: 2023-12-01
Anticipated expiration: 2043-09-07
Also published as: CN117149081B

Abstract

The application provides a method for constructing a time-series database storage engine based on a ZNS solid state disk. The Zones of the ZNS solid state disk are divided into two areas, a Meta-Zone and an Io-Zone: the Meta-Zone stores logs, and the Io-Zone stores the time-series data files (TsFile). Time-stamped time-series data is first written into a memory buffer. While the system is running, an object writer (ObjWriter) is allocated in object form for each device and each dimension. When records are appended to a file in sequence, the data in the memory buffer is written into the TsFile, a time index is built in the in-memory File Meta Data, and all operations other than the time record are recorded in the in-memory File System Meta Data log.

Description

Time sequence database storage engine construction method based on ZNS solid state disk

Technical Field

The application relates to the field of data storage, and in particular to a method for constructing a time-series database storage engine based on a ZNS solid state disk.

Background

With the advance of Internet of Things technology, demand for time-series data applications is growing rapidly. Data writes in the Internet of Things are steady, continuous, highly concurrent and high-throughput. The workload is typically write-heavy and read-light: recently generated data is written in timestamp order, and data is updated rarely or not at all. Queries usually read data over a time range and must support query and analysis across multiple dimensions. Time-series databases have been designed specifically for these characteristics.

At present, a time-series database generally uses a magnetic disk as its persistent storage medium, but in some high-performance, high-concurrency scenarios the write bandwidth and query throughput of the magnetic disk become a bottleneck. Adopting a solid state disk can effectively improve IO access performance. However, a time-series database generally has to serve calls from a large number of devices and write time-series data of multiple dimensions concurrently, and the storage engine of a conventional time-series database causes multiple data streams to be stored mixed together in the solid state disk. The solid state disk then has to perform a large amount of garbage collection, which lowers overall read and write performance, and the write amplification inside the solid state disk also shortens the service life of the product. If received data is stored directly on disk without first being buffered in memory (that is, with the disk as the persistent storage medium), the system becomes heavily IO-bound, because disk access is far slower than memory access. Frequent disk read and write calls also slow down data storage, leaving storage performance far from optimal.

Therefore, current time-series database storage engine schemes cannot take full advantage of the solid state disk.

Disclosure of Invention

The application provides a method for constructing a time-series database storage engine based on a ZNS solid state disk. Exploiting the fact that time-series data is recorded in timestamp order and rarely updated, the method matches the file write strategy of the time-series database storage engine to the Zone write strategy of the ZNS solid state disk. For the small number of data update operations, garbage collection of the solid state disk is combined with the reordering of out-of-order data by time, so that the performance and service-life impact of write amplification inside the solid state disk is eliminated.

Specifically, the method uses the ZNS solid state disk as the persistent storage device and divides the Zones of the ZNS solid state disk into two areas, a Meta-Zone and an Io-Zone. It is characterized in that the Meta-Zone stores logs and the Io-Zone stores the time-series data files (TsFile); time-stamped time-series data is first written into a memory buffer; while the system is running, an object writer (ObjWriter) is allocated in object form for each device and each dimension; when records are appended to a file in sequence, the data in the memory buffer is written into the TsFile, a time index is built in the in-memory File Meta Data, and all operations other than the time record are recorded in the in-memory File System Meta Data log; and when the buffer is full and/or reaches a certain threshold, the data in memory is persisted into the TsFile, space is allocated sequentially within a Zone, and the file metadata is modified during the write.

Further, data from multiple devices and multiple dimensions share one file. The write point is accessed according to the type and number of write buffers configured when the file system is initialized (the configuration includes which devices are combined into one file, which dimensions each device contains, the size range of a single file, and so on), and the data is written into an Active Open Zone of the ZNS solid state disk. The configured number of concurrently written files is smaller than the number of Active Open Zones supported by the ZNS solid state disk.

Further, when the current write point has no corresponding open write file, a write file is created: a file descriptor corresponding to the Open Zone is allocated, the file system metadata is modified and recorded in the Journal, the Journal is persisted, the metadata of the current TsFile is initialized, and the TsFile Header is written, which completes the creation of the write file.

Further, when the length of a file reaches a certain threshold, the file is closed: the File Meta Data is appended to the end of the file, the File System Meta Data for the closed file is generated, and the Journal is persisted.

Further, a time index is built for each file, so that the required time-series data can be located quickly by timestamp during a query. An index of the devices in the file is also built, so that the time-series data of a given device can be located quickly. Each device additionally has a data dimension index directory in the file, so that the time-series data of a given dimension of a given device can be looked up directly.

Further, the TsFile files are organized in time order into sequential files and random files, and file sorting and index building are performed once a file has been written. Based on life-cycle information and the valid-data information of each Zone, a merge/delete executor applies a minimum-cost merge strategy: it selects TsFile files that match in time order and merges them into a sequential file, deletes files whose life cycle has expired, and performs file sorting and index building on the merged sequential file.

Further, the time-series data file TsFile comprises a Header, a data area and a metadata area. The data area is organized in the order in which the time-series data is persisted; in a single persistence pass, the time-series data is organized into objects (Obj) by device and dimension. An inverted index P2O is written at the end of each segment of time-series data belonging to a specific dimension of a specific device, so that the device and dimension to which the data belongs can be determined when the index is rebuilt. The P2O also contains statistics for that time series, which are used to accelerate time-series data queries.

Further, the metadata area includes a time sequence index Time series Index, a time sequence index is established for each dimension of each device, and an inverted index P2O is written at the end of the time sequence index for reconstructing the index and accelerating the time sequence data query; the metadata area also contains a secondary index Index Of Time series Index for fast locating time series indexes, a Bloom Filter for fast locating whether time series data is likely in the present file, and file level statistics.

Further, when time-series data is queried, the files to be searched are first located from the global index and the Bloom Filter. The time-series data blocks of the corresponding device and dimension are then found through the index information in each file, the data blocks are read, and the matching time-series data is retrieved from them. Finally, all matching time-series data from the qualifying TsFiles is assembled into a set and returned to the application side.

Compared with the background art, the application has the following beneficial effects. Because the time-series data storage files are recorded sequentially in the Zones of the ZNS solid state disk, data and files with similar life cycles are written into the same Zone, which eliminates the write amplification caused by mixed storage of multiple data streams inside the solid state disk. In addition, the storage engine selects the data that is cheapest to merge according to the valid-data information of each Zone, which reduces the write amplification of the storage engine, improves write performance, extends the service life of the product, and sustains stable, high write performance. Furthermore, a time index, a device index and a dimension index are built for the record files, which speeds up time-series data query and analysis, and the write amplification of the whole storage system is greatly reduced.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the related art, the drawings required for the description of the embodiments or the prior art will be briefly described below, and it is apparent that the drawings in the following description are only embodiments of the present application, and other drawings may be obtained according to the provided drawings without inventive effort to those skilled in the art.

The structures, proportions, sizes, etc. shown in the drawings are shown only in connection with the present disclosure, and are not intended to limit the scope of the application, since any modification, variation in proportions, or adjustment of the size, etc. of the structures, proportions, etc. should be considered as falling within the spirit and scope of the application, without affecting the effect or achievement of the objective.

FIG. 1 is a schematic diagram of a time series database storage engine according to an embodiment of the present application;

FIG. 2 is an organizational chart of a time sequence file according to an embodiment of the application;

FIG. 3 is a diagram illustrating a write and merge mechanism of a storage engine according to an embodiment of the present application;

FIG. 4 is a timing data query flow chart according to an embodiment of the present application.

Detailed Description

Embodiments of the present application will now be described more fully with reference to the accompanying drawings, in which some, but not all, embodiments of the application are shown. All other embodiments obtained by those skilled in the art based on the embodiments of the application without inventive effort fall within the scope of the application.

Time-series data is a set of metric values carrying timestamps and multiple tags, typically used to record how data changes over time. Its workload has the following characteristics: writes make up the majority of operations and can exceed 95%; writes are sequential appends, and updates are rare; most writes concern recent data, and metric data generated at some past time is rarely written; deletion is therefore generally done in batches, removing the data of a metric over a historical time range, and random deletion is rare.

The application provides a method for constructing a time-series database storage engine that is designed specifically for the ZNS solid state disk and for the way time-series data is operated on. Time-series data is written in time order and deleted according to its life cycle, while a small portion is written out of order because of network conditions and similar causes. Based on these characteristics, a dedicated time-series database file system and a matching file operation strategy are designed. The file system provides the storage engine with life-cycle indication information for each Zone and each file, and the storage engine writes data with similar life cycles into the same Zone or file, which reduces data movement during merging. Garbage collection of the solid state disk is moved into the storage engine and performed together with merging, which reduces write amplification and maintains the efficient storage performance of the time-series database.

In a scenario where many devices record concurrently, the number of Active Open Zones in the ZNS solid state disk is usually far smaller than the number of devices, and a single device usually needs to record data of several dimensions. A scheme is therefore designed in which data from multiple devices and multiple dimensions is organized into one shared file. As long as the configured number of concurrently written files is smaller than the number of Active Open Zones supported by the ZNS solid state disk, concurrent recording of multi-dimensional data from a large number of devices can be supported.

A time index is established for each file, so that the required time-series data can be located quickly by timestamp during a query. An index of the devices in the file is also built, so that the time-series data of a given device can be located quickly. A data dimension index directory is established in the file for each device, so that the time-series data of a given dimension of a given device can be looked up directly.

All files are divided into sequential files and random files and are organized in time order into two separate linked lists. When time-series data is queried, the corresponding files are located; to optimize file lookup performance, data structures such as a Bloom Filter or a B+ tree can be used to accelerate indexing.

When a record file is written, it is written sequentially into the corresponding Active Open Zone of the ZNS solid state disk, and file statistics are maintained for each Zone. When the storage engine looks for files to merge, it uses the per-Zone file statistics to select the data that is cheapest to merge, which reduces write amplification and maintains system performance.

FIG. 1 is a schematic diagram of the time-series database storage engine. The storage engine uses a ZNS solid state disk as the persistent storage device and divides the Zones of the ZNS solid state disk into two areas, a Meta-Zone and an Io-Zone: the Meta-Zone stores logs, and the Io-Zone stores the time-series data files (TsFile). Time-stamped time-series data is first written into a memory buffer. While the system is running, an object writer (ObjWriter) is allocated in object form for each device and each dimension. When records are appended to a file in sequence, the data in the memory buffer is written into the TsFile, a time index is built in the in-memory File Meta Data, and all operations other than the time record are recorded in the in-memory File System Meta Data log.
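The division of Zones and the per-device, per-dimension object writers described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment itself: the function `partition_zones`, the class `MemoryBuffer`, the method names, and the choice of giving the lowest-numbered Zones to the Meta-Zone are all hypothetical; only the names Meta-Zone, Io-Zone and ObjWriter come from the text.

```python
class ObjWriter:
    """Collects the points of one (device, dimension) series in memory."""

    def __init__(self, device, dimension):
        self.device = device
        self.dimension = dimension
        self.points = []  # (timestamp, value) pairs in arrival order

    def append(self, timestamp, value):
        self.points.append((timestamp, value))


class MemoryBuffer:
    """In-memory staging area that time-stamped data reaches first."""

    def __init__(self):
        self.writers = {}  # (device, dimension) -> ObjWriter

    def write(self, device, dimension, timestamp, value):
        key = (device, dimension)
        if key not in self.writers:
            # One writer is allocated per device and per dimension.
            self.writers[key] = ObjWriter(device, dimension)
        self.writers[key].append(timestamp, value)

    def drain(self):
        """Hand over all buffered series and start with an empty buffer."""
        drained, self.writers = self.writers, {}
        return drained


def partition_zones(zone_ids, meta_zone_count):
    """Split the Zones into a log area and a data area.

    Assumption: the first `meta_zone_count` Zones form the Meta-Zone.
    """
    if not 0 < meta_zone_count < len(zone_ids):
        raise ValueError("need at least one Zone in each area")
    return {"meta": zone_ids[:meta_zone_count], "io": zone_ids[meta_zone_count:]}
```

For example, `partition_zones(list(range(8)), 2)` reserves Zones 0 and 1 for logs and leaves Zones 2 to 7 for TsFile data.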

When data is written into the memory buffer, the write point is accessed according to the type and number of write buffers configured when the file system is initialized (including which devices are combined into one file, which dimensions each device contains, the size range of a single file, and so on), and the data is written into an Active Open Zone of the ZNS solid state disk.
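The write-point configuration described above might be represented and checked as in the following sketch. The `WritePoint` fields and the `build_routing` function are hypothetical names introduced for illustration; the only rule taken from the text is that the number of concurrently written files must be smaller than the number of active open Zones the disk supports.

```python
from dataclasses import dataclass


@dataclass
class WritePoint:
    """Hypothetical configuration of one write point (one shared TsFile)."""
    name: str
    devices: dict          # device name -> list of its dimensions
    min_file_bytes: int    # lower bound of the single-file size range
    max_file_bytes: int    # upper bound of the single-file size range


def build_routing(write_points, max_active_open_zones):
    """Validate the configuration and map each device to its write point."""
    # The text requires strictly fewer concurrently written files than
    # active open Zones supported by the disk.
    if len(write_points) >= max_active_open_zones:
        raise ValueError("too many concurrently written files for this disk")
    routing = {}
    for wp in write_points:
        if wp.min_file_bytes > wp.max_file_bytes:
            raise ValueError(f"invalid file size range in {wp.name}")
        for device in wp.devices:
            if device in routing:
                raise ValueError(f"{device} is assigned to two write points")
            routing[device] = wp.name
    return routing
```

With such a table, an incoming record only needs its device name to find the shared file it belongs to, which is how a large number of devices can be served by a small number of open Zones.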

When the current write point has no corresponding open write file, a write file is created: a file descriptor corresponding to the Open Zone is allocated, the file system metadata is modified and recorded in the log (Journal), the Journal is persisted, the metadata of the current TsFile is initialized, and the TsFile Header is written, which completes the creation of the write file.

When the buffer is full and/or reaches a certain threshold, the data in memory is persisted into the TsFile, with space allocated sequentially within a Zone as it is written. Persisting a file is a multi-step process: the TsFile is written in batches, and the file metadata is modified as the writes proceed.

When the length of a file reaches a certain threshold, the file is closed: the File Meta Data is appended to the end of the file, the File System Meta Data for the closed file is generated, and the log is persisted.
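The create, persist and close steps above can be pictured with the following sketch. It rests on several assumptions made only for illustration: a Zone is modeled as an append-only byte array, the Journal in the Meta-Zone is modeled as a Python list, batches are serialized with `repr`, and the header bytes, class names and threshold are hypothetical.

```python
class Zone:
    """Append-only stand-in for one Zone of the Io-Zone area."""

    def __init__(self, zone_id, capacity):
        self.zone_id = zone_id
        self.capacity = capacity
        self.data = bytearray()

    def append(self, payload):
        if len(self.data) + len(payload) > self.capacity:
            raise IOError("zone is full")
        offset = len(self.data)  # writes always land at the write pointer
        self.data += payload
        return offset


class TsFileWriter:
    HEADER = b"TSFILE\x00"  # hypothetical header bytes

    def __init__(self, name, zone, journal, close_threshold):
        self.name, self.zone, self.journal = name, zone, journal
        self.close_threshold = close_threshold
        self.length = 0
        self.closed = False
        # Creation: record and persist the Journal entry first, then
        # initialize the file metadata, then write the Header.
        journal.append(("create", name, zone.zone_id))
        self.file_meta = []  # time index: (min_ts, max_ts, offset, size)
        self._write(self.HEADER)

    def _write(self, payload):
        offset = self.zone.append(payload)
        self.length += len(payload)
        return offset

    def flush(self, records):
        """Persist one batch of (timestamp, value) records sorted by time."""
        if self.closed:
            raise ValueError("file is closed")
        payload = repr(records).encode()
        offset = self._write(payload)
        # File metadata is modified as part of every batch write.
        self.file_meta.append((records[0][0], records[-1][0], offset, len(payload)))
        if self.length >= self.close_threshold:
            self.close()

    def close(self):
        # File Meta Data goes to the end of the file, then the Journal.
        self._write(repr(self.file_meta).encode())
        self.journal.append(("close", self.name, self.length))
        self.closed = True
```

Because every write is an append at the Zone's write pointer, the file occupies a contiguous, sequentially written region, which is the property a ZNS device requires.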

As shown in FIG. 2, the time-series file comprises a Header, a data area and a metadata area. The data area is organized in the order in which the time-series data is persisted; in a single persistence pass, the time-series data is organized into objects (Obj) by device and dimension. An inverted index P2O is written at the end of each segment of time-series data belonging to a specific dimension of a specific device, so that the device and dimension to which the data belongs can be determined when the index is rebuilt; the P2O also contains statistics for that time series, which are used to accelerate time-series data queries. The metadata area contains a time-series index (Time series Index): a time-series index is established for each dimension of each device, and an inverted index P2O is written at the end of the time-series index for rebuilding the index and accelerating time-series data queries. The metadata area also contains a secondary index (Index Of Time series Index) for quickly locating time-series indexes, a Bloom Filter for quickly determining whether time-series data may be present in this file, and file-level statistics.
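One way to picture the role of the trailing P2O record is the sketch below, which encodes each Obj as a payload followed by a fixed-size footer and rebuilds a per-series index by scanning the data area backward from its end. The byte layout, the field set of the footer and the backward scan are assumptions chosen for illustration; the text specifies only that P2O is written at the end of a segment, identifies its device and dimension, and carries statistics.

```python
import struct

# Hypothetical P2O footer: device id, dimension id, point count,
# min timestamp, max timestamp, payload length (little-endian, unpadded).
P2O = struct.Struct("<HHIqqI")
POINT = struct.Struct("<qd")  # one point: timestamp, value


def encode_obj(device_id, dim_id, points):
    """Serialize one Obj: the points followed by their P2O footer."""
    payload = b"".join(POINT.pack(ts, value) for ts, value in points)
    stamps = [ts for ts, _ in points]
    footer = P2O.pack(device_id, dim_id, len(points),
                      min(stamps), max(stamps), len(payload))
    return payload + footer


def rebuild_index(data_area):
    """Recover {(device, dimension): [(offset, size, min_ts, max_ts)]}.

    Each footer gives the length of the payload in front of it, so the
    whole data area can be walked from the end without any other index.
    """
    index = {}
    end = len(data_area)
    while end > 0:
        device_id, dim_id, _count, min_ts, max_ts, size = P2O.unpack_from(
            data_area, end - P2O.size)
        start = end - P2O.size - size
        index.setdefault((device_id, dim_id), []).append(
            (start, size, min_ts, max_ts))
        end = start
    for entries in index.values():
        entries.reverse()  # restore ascending file order
    return index
```

The min and max timestamps kept in the footer also let a query skip a segment without decoding its points, which is one plausible use of the statistics mentioned in the text.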

FIG. 3 illustrates the write and merge mechanism of the storage engine. TsFile files are organized in time order into sequential files and random files. After a file has been written, file sorting and index building are performed. Based on life-cycle information and the valid-data information of each Zone, the merge/delete executor applies a minimum-cost merge strategy: it selects TsFile files that match in time order and merges them into a sequential file, and at the same time deletes files whose life cycle has expired. File sorting and index building are then performed on the sequential file produced by the merge.
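A minimum-cost selection of this kind could look like the following sketch. The cost model is an assumption: the cost of reclaiming a Zone is taken to be the number of valid bytes in its unexpired files, since that is the data that would have to be rewritten. `FileInfo`, `plan_merge` and their fields are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class FileInfo:
    name: str
    zone_id: int
    min_ts: int        # earliest timestamp stored in the file
    valid_bytes: int   # bytes that are still valid
    expire_at: int     # end of the file's life cycle


def plan_merge(files, now):
    """Return (expired file names, Zone to reclaim, files to merge)."""
    expired = [f.name for f in files if f.expire_at <= now]
    cost = {}
    for f in files:
        cost.setdefault(f.zone_id, 0)
        if f.expire_at > now:
            # Only unexpired data has to be moved out of the Zone.
            cost[f.zone_id] += f.valid_bytes
    if not cost:
        return expired, None, []
    # Pick the cheapest Zone; ties are broken by the lower Zone id.
    victim = min(cost, key=lambda zone_id: (cost[zone_id], zone_id))
    survivors = [f for f in files if f.zone_id == victim and f.expire_at > now]
    survivors.sort(key=lambda f: f.min_ts)  # merge in time order
    return expired, victim, [f.name for f in survivors]
```

A Zone holding only expired files has a cost of zero under this model, so it is reclaimed first without moving any data.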

As for random files: a time-series database generally has to serve calls from a large number of devices and write time-series data of multiple dimensions concurrently. Because the data from different devices spans multiple dimensions, it arrives out of order when stored, and multiple data streams are stored mixed together in the solid state disk. This produces a large amount of garbage in the solid state disk, so the data in random files needs to be identified and restored.

The application designs a random file processing engine, which identifies the data of a random file, decodes and restores it, and writes the restored data into the Io-Zone area of the ZNS solid state disk.

The random file processing engine comprises an out-of-order detector and a processor. The metadata sequence recording the random file times within a given period is decoded into a reference data sequence, which is sent to the out-of-order detector. The detector reads the time index of each metadata item in the metadata sequence and passes the processing time and event time of each item to the corresponding entry of the reference data sequence. Whether an item is out-of-order data is judged from the event times of sequentially adjacent items, and out-of-order data is deleted directly. The reference data sequence, with the out-of-order data removed, is then handled by the processor: its files are rearranged and the time-series index is rebuilt. Through the secondary index (Index Of Time series Index) of the metadata time-series index, the time-series index is located quickly and the time file record is inserted at the corresponding position in time order, until all metadata in the period is covered. The metadata for the interval of the out-of-order data is then stored in the Io-Zone area.
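The detection and re-insertion steps can be illustrated with the sketch below. It reflects one reading of the paragraph above, stated here as an assumption: a record is treated as out of order when its event time is earlier than that of the last in-order record before it, it is removed from the sequence, and it is later inserted back at the position given by its event time. The function names and the record layout are hypothetical.

```python
import bisect


def split_out_of_order(records):
    """Separate records by comparing adjacent event times.

    Each record is (event_time, processing_time, payload), listed in
    arrival (processing) order.
    """
    in_order, late = [], []
    for record in records:
        if in_order and record[0] < in_order[-1][0]:
            late.append(record)      # arrived after newer data
        else:
            in_order.append(record)
    return in_order, late


def reinsert(in_order, late):
    """Insert late records at their event-time position."""
    merged = list(in_order)
    keys = [record[0] for record in merged]   # event times, ascending
    for record in late:
        position = bisect.bisect_right(keys, record[0])
        keys.insert(position, record[0])
        merged.insert(position, record)
    return merged
```

In this sketch the sorted list of event times plays the part that the time-series index and its secondary index play in the text: it is what makes the insertion position quick to find.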

FIG. 4 shows the time-series data query flow. When time-series data is queried, the files to be searched are first located from the global index and the Bloom Filter. The time-series data blocks of the corresponding device and dimension are then found through the index information in each file, the data blocks are read, and the matching time-series data is retrieved from them. Finally, all matching time-series data from the qualifying TsFiles is assembled into a set and returned to the application side.
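The query flow can be sketched as below. The sketch makes several assumptions for illustration: the global index is reduced to a per-file time range, the Bloom Filter is a small hand-written one keyed by a "device/dimension" string, and the in-file index is a dictionary. The class and function names are hypothetical.

```python
import hashlib


class BloomFilter:
    """Small Bloom filter; it may report false positives, never false negatives."""

    def __init__(self, bits=1024, hashes=3):
        self.bits, self.hashes, self.array = bits, hashes, 0

    def _positions(self, key):
        for i in range(self.hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, key):
        for position in self._positions(key):
            self.array |= 1 << position

    def might_contain(self, key):
        return all((self.array >> p) & 1 for p in self._positions(key))


class TsFile:
    def __init__(self, name, blocks):
        # blocks: {(device, dimension): [(timestamp, value), ...]}
        self.name, self.blocks = name, blocks
        self.bloom = BloomFilter()
        for device, dimension in blocks:
            self.bloom.add(f"{device}/{dimension}")
        stamps = [ts for points in blocks.values() for ts, _ in points]
        self.min_ts, self.max_ts = min(stamps), max(stamps)


def query(files, device, dimension, start, end):
    """Return all points of one series with start <= timestamp <= end."""
    key = f"{device}/{dimension}"
    result = []
    for f in files:
        if f.max_ts < start or f.min_ts > end:
            continue                      # pruned by the time range
        if not f.bloom.might_contain(key):
            continue                      # series is certainly absent
        block = f.blocks.get((device, dimension), [])  # in-file index lookup
        result.extend((ts, v) for ts, v in block if start <= ts <= end)
    return sorted(result)
```

The final dictionary lookup also absorbs Bloom Filter false positives: a file that passes the filter but does not hold the series simply contributes nothing.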

In this specification, the embodiments are described in a progressive manner, a parallel manner, or a combination of the two. Each embodiment focuses on its differences from the other embodiments, and identical or similar parts of the embodiments may be understood by reference to one another. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief, and the relevant points can be found in the description of the method.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method for constructing a time-series database storage engine based on a ZNS solid state disk, in which the ZNS solid state disk is used as the persistent storage device and the Zones of the ZNS solid state disk are divided into two areas, a Meta-Zone and an Io-Zone, characterized in that the Meta-Zone is used for storing logs and the Io-Zone is used for storing time-series data files TsFile; time-stamped time-series data is first written into a memory buffer; while the system is running, an object writer ObjWriter is allocated in object form for each device and each dimension; when records are appended to a file in sequence, the data in the memory buffer is written into the TsFile, a time index is built in the in-memory File Meta Data, and all operations other than the time record are recorded in the in-memory File System Meta Data log; and when the buffer is full and/or reaches a certain threshold, the data in memory is persisted into the TsFile, space is allocated sequentially within a Zone, and the file metadata is modified during the write.

2. The method for constructing a time-series database storage engine based on a ZNS solid state disk according to claim 1, characterized in that data from multiple devices and multiple dimensions share one file; the write point is accessed according to the type and number of write buffers configured when the file system is initialized, the configuration including which devices are combined into one file, which dimensions each device contains, the size range of a single file, and so on; the data is written into an Active Open Zone of the ZNS solid state disk; and the configured number of concurrently written files is smaller than the number of Active Open Zones supported by the ZNS solid state disk.

3. The method for constructing the time-series database storage engine based on the ZNS solid state disk according to claim 1, wherein when the corresponding Open write file does not exist at the current write point, creating the write file, distributing a file descriptor corresponding to the Open Zone, modifying file system metadata and recording Journal, persisting Journal, initializing the current TsFile file metadata, writing the TsFile Header, and completing the operation of creating the write file.

4. The method for constructing a time-series database storage engine based on a ZNS solid state disk according to claim 1, wherein when a File length reaches a certain threshold value, closing the File, adding File Meta Data to the end of the File, generating File System Meta Data for closing the File, and persisting Journal.

5. The method for constructing a time-series database storage engine based on a ZNS solid state disk according to claim 1, characterized in that a time index is established for each file, and the required time-series data is located quickly by timestamp during a query; an index of the devices in the file is also built, so that the time-series data of the corresponding device can be located quickly; and each device has a data dimension index directory in the file, so that the time-series data of the corresponding dimension of the corresponding device can be looked up directly.

6. The method for constructing the time sequence database storage engine based on the ZNS solid state disk according to claim 5, wherein the TsFile files are organized into sequential files and random files according to time sequence, file sorting and index establishment are performed after file writing, and a merging/deleting executor selects TsFile files matched in time sequence to merge into sequential files according to life cycle information and effective data information of the Zones by adopting a cost minimum merging strategy, and deletes expired files due to the life cycle, and file sorting and index establishment are performed on the merged sequential files.

7. The method for constructing the time sequence database storage engine based on the ZNS solid state disk as set forth in claim 5, wherein the time sequence data file TsFile comprises a Header, a data area and a metadata area, wherein the data area is organized according to the sequence of the time sequence data during persistence, the time sequence data are organized into objects Obj according to equipment and dimensions during primary persistence, the end of a piece of time sequence data with specific dimensions of a specific equipment is written into an inverted index P2O, the equipment and dimension information to which the time sequence data belong are determined during index reconstruction, and the statistical information of the specific time sequence is also contained in the P2O and used for accelerating time sequence data query.

8. The method for constructing a time-series database storage engine based on a ZNS solid state disk according to claim 7, wherein the metadata area comprises a time-series index Time series Index, wherein a time-series index is established for each dimension of each device, and an inverted index P2O is written at the end of the time-series index for reconstructing the index and accelerating the time-series data query; the metadata area also contains a secondary index Index Of Time series Index for fast locating time series indexes, a Bloom Filter for fast locating whether time series data is likely in the present file, and file level statistics.

9. The method for constructing a time-series database storage engine based on a ZNS solid state disk according to claim 1, characterized in that, when time-series data is queried, the files to be searched are first located from the global index and the Bloom Filter; the time-series data blocks of the corresponding device and dimension are then found through the index information in each file; the data blocks are read and the matching time-series data is retrieved from them; and finally all matching time-series data from the qualifying TsFiles is assembled into a set and returned to the application side.