CN116561124A - File merging method and device for time sequence database - Google Patents

File merging method and device for time sequence database

Info

Publication number
CN116561124A
Authority
CN
China
Prior art keywords
merging
file
space
files
task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310468738.XA
Other languages
Chinese (zh)
Inventor
王建民
黄向东
乔嘉林
侯昊男
张金瑞
刘旭鑫
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianmou Technology Beijing Co ltd
Tsinghua University
Original Assignee
Tianmou Technology Beijing Co ltd
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianmou Technology Beijing Co ltd, Tsinghua University
Priority to CN202310468738.XA
Publication of CN116561124A

Classifications

    • G06F16/2282 Tablespace storage structures; Management thereof
    • G06F16/2246 Indexing structures; Trees, e.g. B+trees
    • G06F16/2272 Indexing structures; Management thereof
    • G06F16/24569 Query processing with adaptation to specific hardware, e.g. adapted for using GPUs or SSDs
    • G06F16/24578 Query processing with adaptation to user needs using ranking
    • G06F16/2474 Sequence data queries, e.g. querying versioned data
    • G06F9/5038 Allocation of resources, e.g. of the central processing unit [CPU] to service a request, the resource being a machine, e.g. CPUs, Servers, Terminals, considering the execution order of a plurality of tasks, e.g. taking priority or time dependency constraints into consideration
    • G06F2209/5018 Thread allocation
    • G06F2209/5021 Priority
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Software Systems (AREA)
  • Computational Linguistics (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a file merging method and device for a time-series database, in which files are stored in either a sequential space or an unordered (out-of-order) space. The method comprises: after a file merging thread of the time-series database is triggered, determining the file merging tasks that the database can execute; caching the file merging tasks in a global queue in descending order of priority; and executing the file merging tasks in the global queue in sequence. The invention selects all executable file merging tasks at once and caches them in the global queue, and keeps executing tasks as long as the global queue holds unexecuted file merging tasks, thereby avoiding the fluctuation in foreground read/write performance caused by executing a large number of file merging tasks at specific moments.

Description

File merging method and device for time sequence database
Technical Field
The present invention relates to the field of computer data management technologies, and in particular, to a method and an apparatus for merging files in a time-series database.
Background
With the development of the Internet of Things, large numbers of sensors are deployed in production and daily life, so both the volume of time-series data and the demand for processing it have grown. Conventional relational databases cannot adequately meet the write, storage and query requirements of time-series data, which has led to time-series databases designed specifically for storing and analyzing such data. Currently popular time-series databases include Apache IoTDB, InfluxDB and OpenTSDB; to sustain the heavy write load of time-series scenarios, they typically use an LSM tree as the storage engine.
The LSM tree storage structure comprises a MemTable held in memory and SSTables held on disk. The MemTable stores data just written by the user and is persisted to an SSTable once it has accumulated to a certain size. To speed up queries, reduce the number of files and improve the compression ratio, databases using an LSM tree periodically merge the data of several files into one file in the background, a process called file merging. Because file merging consumes several system resources such as CPU, I/O and memory, existing time-series databases wait for a specific moment (for example, when queries become too slow or when data is flushed to disk) before executing it, and this approach easily causes fluctuations in the foreground read/write performance of the database.
Therefore, it is desirable to provide a new time-series database file merging method.
Disclosure of Invention
To solve the above problems, the invention provides a file merging method and device for a time-series database. All executable file merging tasks are selected at once and cached in a global queue; as long as the global queue holds unexecuted file merging tasks, tasks keep being executed, which avoids the fluctuation in foreground read/write performance caused by executing a large number of file merging tasks at specific moments.
In a first aspect, the present invention provides a method for merging files in a time-series database, the files in the time-series database being stored in a sequential space or an unordered space, the method comprising:
after a file merging thread of the time sequence database is triggered, determining a file merging task executable by the time sequence database;
caching the file merging tasks in a global queue in descending order of priority;
and executing the file merging tasks in the global queue in sequence.
According to the file merging method for a time-series database provided by the invention, the file merging thread of the time-series database is triggered on a timer.
According to the file merging method for a time-series database provided by the invention, the sequential space stores sequential files and the out-of-order space stores out-of-order files; determining the file merging tasks executable by the time-series database comprises:
generating a sequential-space merge candidate file list, an out-of-order-space merge candidate file list and a cross-space merge candidate file list from the sequential files and the out-of-order files;
inputting the sequential-space merge candidate file list into a pre-stored selector, so that the selector generates one or more sequential-space merging tasks according to its built-in merge selection strategy;
inputting the out-of-order-space merge candidate file list into the selector, so that the selector generates one or more out-of-order-space merging tasks according to its built-in merge selection strategy;
inputting the cross-space merge candidate file list into the selector, so that the selector generates one or more cross-space merging tasks according to its built-in merge selection strategy;
and taking the sequential-space merging tasks, the out-of-order-space merging tasks and the cross-space merging tasks as the file merging tasks executable by the time-series database.
According to the file merging method for a time-series database provided by the invention, generating the cross-space merge candidate file list from the sequential files and the out-of-order files comprises:
for any out-of-order file in the out-of-order space, searching the sequential space for the sequential files whose data overlaps with that out-of-order file;
when the capacity of the cross-space file formed by the out-of-order file and all sequential files whose data overlaps with it is not larger than a first preset capacity threshold, writing the cross-space file into a first list that is initially empty;
traversing all out-of-order files in the out-of-order space to obtain the final first list;
and taking the final first list as the cross-space merge candidate file list.
According to the file merging method for a time-series database provided by the invention, generating the sequential-space merge candidate file list (or, correspondingly, the out-of-order-space merge candidate file list) from the sequential files and the out-of-order files comprises:
writing the sequential files in the sequential space (or the out-of-order files in the out-of-order space) that are not contained in the cross-space merge candidate file list into an initially empty second list, in their existing order;
and taking the resulting second list as the sequential-space merge candidate file list (or the out-of-order-space merge candidate file list).
According to the file merging method for a time-series database provided by the invention, the merge selection strategy requires that the files to be merged under one file merging task are contiguous in the corresponding candidate file list, that their total capacity is smaller than a second preset capacity threshold, and that their number is smaller than a preset number threshold.
According to the file merging method for a time-series database provided by the invention, determining the descending priority order of all file merging tasks executable by the time-series database comprises:
sorting all cross-space merging tasks, all sequential-space merging tasks and all out-of-order-space merging tasks separately in descending priority, ordered by data write time from latest to earliest, to obtain a first, a second and a third descending-priority sequence respectively;
and concatenating these sequences with the first sequence at the front, the second in the middle and the third at the end, to obtain the descending-priority sequence of all file merging tasks.
According to the file merging method for a time-series database provided by the invention, executing a sequential-space merging task comprises:
extracting the files to be merged under the sequential-space merging task and splicing their data to obtain the merged file corresponding to the sequential-space merging task;
executing an out-of-order-space merging task or a cross-space merging task comprises:
extracting the files to be merged under the out-of-order-space merging task or the cross-space merging task, and decompressing, sorting and recompressing their data to obtain the merged file corresponding to that task.
According to the file merging method for a time-series database provided by the invention, the merged files produced by sequential-space merging tasks and by cross-space merging tasks are written into the sequential space, and the merged files produced by out-of-order-space merging tasks are written into the out-of-order space.
In a second aspect, the present invention provides a file merging apparatus for a time-series database in which files are stored in a sequential space or an unordered space, the apparatus comprising:
the determining module is used for determining executable file merging tasks of the time sequence database after the file merging threads of the time sequence database are triggered;
the caching module is used for caching the file merging tasks in a global queue in descending order of priority;
and the execution module is used for sequentially executing the file merging tasks in the global queue.
The invention provides a file merging method and device for a time-series database, in which files are stored in either a sequential space or an unordered (out-of-order) space. The method comprises: after a file merging thread of the time-series database is triggered, determining the file merging tasks that the database can execute; caching the file merging tasks in a global queue in descending order of priority; and executing the file merging tasks in the global queue in sequence. The invention selects all executable file merging tasks at once and caches them in the global queue, and keeps executing tasks as long as the global queue holds unexecuted file merging tasks, thereby avoiding the fluctuation in foreground read/write performance caused by executing a large number of file merging tasks at specific moments.
Drawings
To illustrate the technical solutions of the invention or of the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the invention; a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of a method for merging files in a time-series database according to the present invention;
FIG. 2 is a schematic diagram of a time-series database file merging task scheduling framework provided by the invention;
FIG. 3 is a schematic diagram of a file merging apparatus for a time-series database according to the present invention;
FIG. 4 is a schematic structural diagram of an electronic device provided by the present invention.
Reference numerals:
410: processor; 420: communication interface; 430: memory; 440: communication bus.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be clearly and completely described below with reference to the accompanying drawings, and it is apparent that the described embodiments are some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The file merging method and apparatus for a time series database of the present invention are described below with reference to fig. 1 to 4.
In a first aspect, the present invention provides a file merging method for a time-series database. Here the time-series database is one that has two storage spaces for time-series data, a sequential space and an unordered (out-of-order) space. Time-series data in the sequential space does not overlap, and it is kept ordered both within each file and between files. Time-series data in the unordered space may overlap with data in the sequential space or with other data in the unordered space; it is kept ordered within each file but is unordered between files. As shown in FIG. 1, the method includes:
S11: after a file merging thread of the time-series database is triggered, determining the file merging tasks executable by the time-series database;
S12: caching the file merging tasks in a global queue in descending order of priority;
S13: executing the file merging tasks in the global queue in sequence.
In this file merging method, all executable file merging tasks are selected at once and cached in the global queue, and tasks keep being executed as long as the global queue holds unexecuted file merging tasks. This avoids the fluctuation in foreground read/write performance caused by executing a large number of file merging tasks at specific moments.
Specifically, the file merging thread of the time-series database is triggered on a timer.
In existing databases (such as LevelDB), the scheduling of merging tasks is mostly triggered by factors such as file size and query count. Because the moment at which merging is triggered is therefore unstable, database performance often jitters. The invention instead triggers the file merging thread of the time-series database on a timer, which avoids this problem.
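The timed trigger can be illustrated with a minimal Python sketch. The function name `run_merge_scheduler`, the `max_rounds` parameter and the use of `threading.Event` are illustrative assumptions; the patent only states that the merging thread is triggered on a timer.

```python
import threading

def run_merge_scheduler(trigger, interval_s, stop_event, max_rounds=None):
    """Call trigger() once per interval until stop_event is set.

    trigger stands in for "select, enqueue and execute merge tasks";
    max_rounds is a hypothetical cap that makes the loop easy to test.
    """
    rounds = 0
    while max_rounds is None or rounds < max_rounds:
        # Event.wait returns True if the event was set during the wait,
        # so shutdown is noticed without waiting for a full interval.
        if stop_event.wait(interval_s):
            break
        trigger()
        rounds += 1
    return rounds
```

Because the interval is fixed, the moment at which merging starts does not depend on file sizes or query counts, which is the property the paragraph above relies on.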
Specifically, for a time-series database having two storage spaces for time-series data, a sequential space and an out-of-order space, the files on disk comprise sequential files located in the sequential space and out-of-order files located in the out-of-order space. Each kind of time-series data in the sequential space is kept ordered both within files and between files. Each kind of time-series data in the out-of-order space is kept ordered within each file but is unordered between files (it may overlap with sequential files or with other out-of-order files).
In view of this, file merging in the present invention is divided into three types: merging within the sequential space, merging within the out-of-order space, and cross-space merging. Sequential-space merging involves only sequential files, out-of-order-space merging involves only out-of-order files, and cross-space merging involves both sequential files and out-of-order files.
For the three file merging types, the invention determines the executable file merging task of the time sequence database according to the step S11.
The step S11 includes:
S11.1: generating a sequential-space merge candidate file list, an out-of-order-space merge candidate file list and a cross-space merge candidate file list from the files in the time-series database;
S11.2: inputting the sequential-space merge candidate file list into a pre-stored selector, so that the selector generates one or more sequential-space merging tasks according to its built-in merge selection strategy;
S11.3: inputting the out-of-order-space merge candidate file list into the selector, so that the selector generates one or more out-of-order-space merging tasks according to its built-in merge selection strategy;
S11.4: inputting the cross-space merge candidate file list into the selector, so that the selector generates one or more cross-space merging tasks according to its built-in merge selection strategy;
S11.5: taking the sequential-space merging tasks, the out-of-order-space merging tasks and the cross-space merging tasks as the file merging tasks executable by the time-series database.
Further, in S11.1, generating the cross-space merge candidate file list from the files in the time-series database includes:
S11.1-A: for any out-of-order file in the out-of-order space, searching the sequential space for the sequential files whose data overlaps with that out-of-order file;
For example, suppose an out-of-order file contains time-series data A, B, C and D. Searching the sequential space for sequential files whose data overlaps with time-series data A yields sequential files X and Y; searching for files overlapping with time-series data B yields sequential file Z; searching for files overlapping with time-series data C yields sequential file B; and searching for files overlapping with time-series data D yields sequential files R and U. The sequential files whose data overlaps with the out-of-order file are therefore X, Y, Z, B, R and U.
Take the case where the timestamps of time-series data A in the out-of-order file span 2-10. If the timestamps of time-series data A in a sequential file include any time point within 2-10, for example a span of 1-5, 3-9 or 7-11, then the out-of-order file and that sequential file have overlapping data. If the timestamps of time-series data A in the sequential file include no time point within 2-10, for example a span of 11-13, then the two files have no overlapping data.
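The overlap test in this example reduces to an intersection check on two closed timestamp ranges. The following Python sketch reproduces the numbers above; the function name `ranges_overlap` is an illustrative assumption, not a name used by the patent.

```python
def ranges_overlap(a_start, a_end, b_start, b_end):
    """True when closed ranges [a_start, a_end] and [b_start, b_end] share a time point."""
    return a_start <= b_end and b_start <= a_end

# Out-of-order file: series A spans 2-10. Sequential files: the spans from the example.
results = [ranges_overlap(2, 10, start, end)
           for start, end in [(1, 5), (3, 9), (7, 11), (11, 13)]]
# results == [True, True, True, False]
```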
S11.1-B: when the capacity of the cross-space file formed by the out-of-order file and all sequential files whose data overlaps with it is not larger than a first preset capacity threshold, writing the cross-space file into a first list that is initially empty;
Here, the cross-space file formed by an out-of-order file and all sequential files whose data overlaps with it is a file group, for example the group made up of the out-of-order file and the sequential files X, Y, Z, B, R and U that overlap with it. At this step the files are only grouped; they are not yet merged.
Cross-space merging is always performed on the basis of cross-space files. A cross-space file is accepted as a candidate for merging only when its capacity is not larger than the first preset capacity threshold, which avoids cross-space files that are too large to merge. Because the group is accepted or rejected as a whole, this also avoids merging only part of the files in a cross-space file.
S11.1-C: traversing all the disordered files in the disordered space to obtain a final first list;
S11.1-D: and taking the final first list as the cross-space merging candidate file list.
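Steps S11.1-A to S11.1-D can be sketched in Python as follows. This is a simplified illustration under stated assumptions: each file carries a single time range rather than one range per time series, and the names `TsFile`, `overlaps`, `cross_space_candidates` and `max_group_size` are hypothetical. The sketch also does not decide what happens when two groups share a sequential file or when an out-of-order file overlaps nothing, since the text above does not specify this.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TsFile:
    name: str
    start: int   # earliest timestamp in the file
    end: int     # latest timestamp in the file
    size: int    # capacity, in arbitrary units

def overlaps(a, b):
    return a.start <= b.end and b.start <= a.end

def cross_space_candidates(seq_files, unseq_files, max_group_size):
    first_list = []                                   # initially empty first list
    for u in unseq_files:                             # S11.1-C: traverse all out-of-order files
        group = [u] + [s for s in seq_files if overlaps(u, s)]   # S11.1-A
        if sum(f.size for f in group) <= max_group_size:         # S11.1-B: "not larger than"
            first_list.append(group)
    return first_list                                 # S11.1-D: candidate file list
```

Each entry of the returned list is one cross-space file, that is, one file group that is later accepted or rejected as a whole.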
Further, in S11.1, generating the sequential-space merge candidate file list (or, correspondingly, the out-of-order-space merge candidate file list) from the files in the time-series database includes:
S11.1-I: writing the sequential files in the sequential space (or the out-of-order files in the out-of-order space) that are not contained in the cross-space merge candidate file list into an initially empty second list, in their existing order;
S11.1-II: taking the resulting second list as the sequential-space merge candidate file list (or the out-of-order-space merge candidate file list).
Because a file on disk can participate in only one merging task at a time, a sequential or out-of-order file that takes part in sequential-space or out-of-order-space merging cannot be a file already contained in a candidate cross-space file in the cross-space merge candidate file list. In addition, the files in the sequential-space and out-of-order-space merge candidate file lists are arranged in the same order as in the sequential space and the out-of-order space respectively. This keeps the data of the merged file produced by a sequential-space merging task ordered, and keeps the merged file produced by an out-of-order-space merging task as ordered as possible.
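Steps S11.1-I and S11.1-II amount to an order-preserving filter. A minimal Python sketch, in which files are represented only by name and the function name `in_space_candidates` is an illustrative assumption:

```python
def in_space_candidates(space_files, cross_space_groups):
    """Keep the files of one space that no cross-space group has claimed.

    space_files: file names in their existing order within the space.
    cross_space_groups: the cross-space merge candidate list, one list of names per group.
    """
    claimed = {name for group in cross_space_groups for name in group}
    # A list comprehension preserves the existing arrangement order.
    return [name for name in space_files if name not in claimed]
```

Calling the function once with the sequential space and once with the out-of-order space yields the two in-space candidate lists.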
Further, the merging selection policy is that files to be merged under one file merging task are continuous in the corresponding candidate file list, the total capacity of the files to be merged under one file merging task is smaller than a second preset capacity threshold, and the number of the files to be merged under one file merging task is smaller than a preset number threshold.
In addition, the invention reserves an open interface in the selector, and allows future users or developers to formulate a merging selection strategy according to own needs.
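One way a selector could apply this strategy is a greedy left-to-right scan, sketched below in Python. The greedy scan, the rule of discarding runs with fewer than two files, and the names `select_tasks`, `max_total_size` and `max_file_count` are assumptions made for illustration; the patent fixes only the three constraints (contiguity, total capacity below a threshold, file count below a threshold).

```python
def select_tasks(candidates, max_total_size, max_file_count):
    """candidates: (name, size) pairs in candidate-list order.

    Returns a list of tasks; each task is a list of contiguous file names.
    """
    tasks, current, current_size = [], [], 0
    for name, size in candidates:
        fits = (current_size + size < max_total_size
                and len(current) + 1 < max_file_count)
        if not fits:
            if len(current) >= 2:          # assumption: a one-file run is not worth merging
                tasks.append(current)
            current, current_size = [], 0
            if not (size < max_total_size and 1 < max_file_count):
                continue                   # file cannot join any task and breaks contiguity
        current.append(name)
        current_size += size
    if len(current) >= 2:
        tasks.append(current)
    return tasks
```

For example, with a capacity threshold of 10 and a count threshold of 4, the candidates a(4), b(4), c(4), d(20), e(3), f(3) produce the tasks [a, b] and [e, f]: c cannot join [a, b] without reaching the capacity threshold, and d is too large for any task.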
It can be seen that the whole step S11 can be regarded as the first stage of file merging task scheduling, also referred to as the task selection stage, and the stage selects all executable file merging tasks at one time, thereby laying a foundation for file merging.
Specifically, in step S12, all file merging tasks are submitted to a global queue. The global queue is a priority queue, and the descending priority order of all file merging tasks is given by a task priority comparator, as follows:
S12.1: sorting all cross-space merging tasks, all sequential-space merging tasks and all out-of-order-space merging tasks separately in descending priority, ordered by data write time from latest to earliest, to obtain a first, a second and a third descending-priority sequence respectively;
S12.2: concatenating these sequences with the first sequence at the front, the second in the middle and the third at the end, to obtain the descending-priority sequence of all file merging tasks.
The task priority comparator thus generates the descending-priority sequence of all file merging tasks according to three principles: recently written data has higher priority than data written earlier, cross-space merging has higher priority than merging within a space, and merging within the sequential space has higher priority than merging within the out-of-order space.
In addition, the invention reserves an open interface in the task priority comparator, and allows future users or developers to realize new ordering rules according to own needs.
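The ordering rule of S12.1 and S12.2 is equivalent to sorting on a two-part key: merge type first, then write time from latest to earliest. A minimal Python sketch, in which `MergeTask`, `TYPE_RANK` and the integer `write_time` field are hypothetical names chosen for illustration:

```python
from dataclasses import dataclass

# Lower rank means higher priority: cross-space, then sequential, then out-of-order.
TYPE_RANK = {"cross": 0, "seq": 1, "unseq": 2}

@dataclass
class MergeTask:
    kind: str          # "cross", "seq" or "unseq"
    write_time: int    # assumed: write time of the data the task covers
    files: tuple = ()

def priority_key(task):
    # Negating write_time makes later writes sort first within one merge type.
    return (TYPE_RANK[task.kind], -task.write_time)

def order_tasks(tasks):
    return sorted(tasks, key=priority_key)
```

Sorting once on this key gives the same result as sorting the three groups separately and concatenating them. A custom ordering rule, as allowed by the open interface, would correspond to passing a different key function.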
It can be seen that step S12 can be regarded as a second stage of file merging task scheduling, also called a task ordering stage, which submits all executable file merging tasks to a priority buffer queue, thereby laying a foundation for file merging.
Specifically, in step S13, file merging tasks are executed by a fixed-size thread pool. Each thread in the pool follows the producer-consumer model: it repeatedly takes the highest-priority file merging task from the priority queue and uses an executor (Performer) to execute it.
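The consumer side of this stage can be sketched in Python with the standard `queue.PriorityQueue` and a fixed number of worker threads. The names `run_pool` and `perform` are illustrative assumptions, and the sketch fills the queue before the workers start, whereas in the described scheme the selection stage acts as the producer and keeps submitting tasks.

```python
import queue
import threading

def run_pool(tasks, perform, num_threads=2):
    """tasks: (priority, task_id) pairs; a lower priority value is executed first.

    perform stands in for the executor (Performer) applied to each task.
    """
    q = queue.PriorityQueue()
    for item in tasks:
        q.put(item)
    done, lock = [], threading.Lock()

    def worker():
        while True:
            try:
                _, task_id = q.get_nowait()   # always the highest-priority task left
            except queue.Empty:
                return                        # queue drained: this worker is finished
            result = perform(task_id)
            with lock:
                done.append(result)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return done
```

With a single worker the completion order equals the priority order; with several workers tasks are still started in priority order, but they may finish in a different order.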
Executing a file merging task means reading the contents of several data files, reorganizing them, and writing them into a new data file.
The executor executes sequential space merging tasks, disordered space merging tasks and cross-space merging tasks according to a preset rule, wherein:
the execution process of the sequential space merging task comprises the following steps:
extracting files to be combined under the sequential space combining task and performing data splicing to obtain a combined file corresponding to the sequential space combining task;
That is, the data merged within the sequential space are already ordered, so the binary data can be spliced directly without decompression and the spliced data then written to disk. Compared with the conventional flow of reading files, decompressing data, sorting data and recompressing, this saves CPU resources.
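The splicing idea can be illustrated with the following sketch. The chunk layout (a dictionary with `start`, `end` and a zlib-compressed JSON payload) is purely an assumption for demonstration and is not the file format of the time-series database; the point is only that the merge copies compressed payloads as opaque bytes.

```python
import json
import zlib


def make_chunk(points):
    """Compress a time-sorted list of (timestamp, value) points into one chunk."""
    payload = zlib.compress(json.dumps(points).encode())
    return {"start": points[0][0], "end": points[-1][0], "data": payload}


def splice_sequential(files):
    """Concatenate the chunks of sequential files without decompressing them."""
    merged = []
    for chunks in files:       # files are assumed to be in time order already
        merged.extend(chunks)  # compressed payloads are reused as-is
    return merged


def read_points(chunks):
    """Decompress chunks; used here only to check the merged result."""
    points = []
    for chunk in chunks:
        points.extend(tuple(p) for p in json.loads(zlib.decompress(chunk["data"])))
    return points
```

Because the inputs do not overlap and are already ordered, the spliced chunk list is itself ordered, so no sorting or recompression step is needed.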
The execution process of the out-of-order space merging task/the cross-space merging task comprises the following steps:
extracting the files to be merged under the out-of-order space merging task/the cross-space merging task, and performing data decompression, sequencing and recompression to obtain merging files corresponding to the out-of-order space merging task/the cross-space merging task.
That is, the data handled by cross-space merging and by merging within the out-of-order space are unordered, and can be decompressed selectively according to the data range of each data block during execution: for example, data blocks whose ranges overlap are decompressed and sorted, while data blocks that do not overlap are left compressed, which saves CPU resources. It should be noted that the files to be merged under a cross-space merging task are the sequential files and out-of-order files that make up the cross-space file of that task.
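Selective decompression can be sketched as below, under the same assumed chunk layout (`start`, `end`, zlib-compressed JSON payload), which is hypothetical and not the database's real format. Chunks are grouped by overlapping time range; only groups with more than one chunk are decompressed, sorted and recompressed. How points with identical timestamps are resolved is not covered by this sketch.

```python
import json
import zlib


def pack(points):
    """Compress time-sorted (timestamp, value) points into one chunk."""
    return {"start": points[0][0], "end": points[-1][0],
            "data": zlib.compress(json.dumps(points).encode())}


def unpack(chunk):
    return [tuple(p) for p in json.loads(zlib.decompress(chunk["data"]))]


def merge_selective(chunks):
    """Return (merged chunks, number of chunks that had to be decompressed)."""
    groups = []
    for chunk in sorted(chunks, key=lambda c: c["start"]):
        if groups and chunk["start"] <= groups[-1]["end"]:
            # Time range overlaps the current group: merge it in.
            groups[-1]["members"].append(chunk)
            groups[-1]["end"] = max(groups[-1]["end"], chunk["end"])
        else:
            groups.append({"end": chunk["end"], "members": [chunk]})

    merged, decompressed = [], 0
    for group in groups:
        if len(group["members"]) == 1:
            merged.append(group["members"][0])  # no overlap: keep compressed
            continue
        points = []
        for chunk in group["members"]:
            points.extend(unpack(chunk))
            decompressed += 1
        points.sort(key=lambda p: p[0])          # sort by timestamp
        merged.append(pack(points))              # recompress the sorted data
    return merged, decompressed
```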
In addition, the invention reserves an open interface in the executor, and allows future users or developers to realize new merging execution rules according to own needs.
It can be seen that the step S13 as a whole can be regarded as a third stage of file merging task scheduling, also referred to as a task execution stage, which will execute the file merging tasks in the queue one by one, so as to reduce the number of files, increase the size of the data block, and improve the query efficiency.
It should be noted that, in the above embodiment, a single selector and a single executor are shared by sequential space merging, out-of-order space merging and cross-space merging. In practice, a separate selector and a separate executor may instead be configured for each of the three merge types, so as to control task selection and task execution for each type independently, as in the time-series database file merging task scheduling framework illustrated in fig. 2. This allows different types of merging tasks to be scheduled flexibly and improves merging efficiency under different loads.
The invention is further illustrated by the following example. The time-series database contains 20 sequential files, numbered 1-20, and 20 out-of-order files, numbered 21-40; both the first preset capacity threshold and the second preset capacity threshold are set to 2 GB, and the preset number threshold is set to 30.
After the file merge thread is triggered, the file merge task schedule is as follows:
(1) An executable file merge task is determined using the selector.
Files 11-20 form a sequential space merging task, denoted task 1;
files 31-40 form an out-of-order space merging task, denoted task 2;
files 1-10 and files 21-30 form a cross-space merging task, denoted task 3.
(2) Tasks 1, 2, 3 are prioritized and then submitted to the global priority queue.
Assuming that the task priority comparator ranks cross-space merging above sequential space merging, and sequential space merging above out-of-order space merging, task 3 has a higher priority than task 1, and task 1 has a higher priority than task 2.
(3) The merge execution thread takes the task with the highest priority out of the global priority queue and uses the executor to execute it.
Assume that one merge execution thread is configured. The thread first takes task 3 out of the priority queue, selectively decompresses and sorts the data during merging, and obtains a new merged file when merging finishes. It then takes task 1 out of the queue and obtains a new merged file without decompressing any data during execution. Finally, it takes task 2 out of the queue, selectively decompresses and sorts the data during execution, and obtains a new merged file when merging finishes.
Specifically, the merging files corresponding to the merging tasks of the sequential space and the merging files corresponding to the merging tasks of the cross-space are written into the sequential space, and the merging files corresponding to the merging tasks of the disordered space are written into the disordered space.
In other words, the merged files produced by merging within the sequential space and by cross-space merging are sequential files and are written into the sequential space; the merged files produced by merging within the out-of-order space are out-of-order files and are written into the out-of-order space.
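The worked example above, together with the rule on where merged files are written, can be replayed with the short sketch below. The table names `RANK`, `NEEDS_DECOMPRESSION` and `TARGET_SPACE` and the function `run_example` are hypothetical; the single `while` loop stands in for the one configured merge execution thread, and `heapq` stands in for the global priority queue.

```python
import heapq

RANK = {"cross": 0, "seq": 1, "unseq": 2}  # smaller rank = higher priority
NEEDS_DECOMPRESSION = {"cross": True, "seq": False, "unseq": True}
TARGET_SPACE = {"cross": "sequential", "seq": "sequential", "unseq": "out-of-order"}

# Tasks from the example: task name -> (merge type, file numbers).
TASKS = {
    "task 1": ("seq", list(range(11, 21))),
    "task 2": ("unseq", list(range(31, 41))),
    "task 3": ("cross", list(range(1, 11)) + list(range(21, 31))),
}


def run_example():
    """Pop tasks in priority order and record how each one is executed."""
    heap = [(RANK[kind], name) for name, (kind, _) in TASKS.items()]
    heapq.heapify(heap)
    log = []
    while heap:
        _, name = heapq.heappop(heap)
        kind, files = TASKS[name]
        log.append((name, len(files), NEEDS_DECOMPRESSION[kind], TARGET_SPACE[kind]))
    return log
```

Each log entry lists the task, the number of input files, whether any decompression may be needed, and the space that receives the merged file.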
In a second aspect, the file merging apparatus for a time-series database provided by the present invention is described, and the file merging apparatus for a time-series database described below and the file merging method for a time-series database described above may be referred to correspondingly to each other. Fig. 3 illustrates a schematic structure of a file merging apparatus for a time-series database, in which files are stored in a sequential space or an unordered space, as shown in fig. 3, the apparatus comprising:
a determining module 21, configured to determine a file merging task executable by the time sequence database after a file merging thread of the time sequence database is triggered;
the buffer module 22 is configured to buffer the file merging task to a global queue in a descending priority manner;
and the execution module 23 is used for sequentially executing the file merging tasks in the global queue.
Based on the above embodiments, as an optional embodiment, the file merging thread of the time sequence database is triggered regularly.
On the basis of the above embodiments, as an optional embodiment, the files in the time sequence database include sequential files in a sequential space and out-of-order files in an out-of-order space; the determining module includes:
the generating unit is used for generating a sequential space merging candidate file list, an out-of-order space merging candidate file list and a cross-space merging candidate file list according to the files in the time sequence database;
the sequential space merging task generating unit is used for inputting the sequential space merging candidate file list into a pre-stored selector so that the selector generates one or more sequential space merging tasks according to a built-in merging selection strategy;
the out-of-order space merging task generating unit is used for inputting the out-of-order space merging candidate file list into the selector so that the selector generates one or more out-of-order space merging tasks according to a built-in merging selection strategy;
the cross-space merging task generating unit is used for inputting the cross-space merging candidate file list into the selector so that the selector generates one or more cross-space merging tasks according to a built-in merging selection strategy;
and the file merging task determining unit is used for taking the sequential space merging task, the disordered space merging task and the cross-space merging task as file merging tasks which can be executed by the time sequence database.
On the basis of the above embodiments, as an alternative embodiment, the generating unit includes: a cross-space merge candidate file list generation sub-module, the cross-space merge candidate file list generation sub-module comprising:
the searching subunit is used for searching, for any out-of-order file in the out-of-order space, the sequential space for sequential files whose data overlap with that out-of-order file;
a first writing subunit, configured to write, in a first list that is initially empty, a cross-space file that is formed by the arbitrary disordered file and all sequence files that have data overlapping with the arbitrary disordered file, when a capacity of the cross-space file is not greater than a first preset capacity threshold;
the traversing subunit is used for traversing all the disordered files in the disordered space to obtain a final first list;
and the cross-space merging candidate file list generation subunit is used for taking the final first list as the cross-space merging candidate file list.
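The construction of the cross-space merging candidate file list can be sketched as follows. The `DataFile` class with its `start`, `end` and `size` fields is a hypothetical model of a file's time range and capacity. Skipping out-of-order files that overlap no sequential file is an assumption of this sketch, since the embodiment does not state how that case is handled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataFile:
    name: str
    start: int  # earliest timestamp in the file (hypothetical field)
    end: int    # latest timestamp in the file (hypothetical field)
    size: int   # file capacity in bytes (hypothetical field)


def overlaps(a, b):
    """True when the closed time ranges of two files intersect."""
    return a.start <= b.end and b.start <= a.end


def cross_space_candidates(seq_files, unseq_files, max_bytes):
    """Build the first list: one entry per qualifying out-of-order file."""
    first_list = []  # initially empty
    for unseq in unseq_files:  # traverse every out-of-order file
        hits = [s for s in seq_files if overlaps(s, unseq)]
        if not hits:
            continue  # assumption: nothing to merge across spaces
        total = unseq.size + sum(s.size for s in hits)
        if total <= max_bytes:  # "not greater than" the first capacity threshold
            first_list.append((unseq, hits))
    return first_list
```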
On the basis of the foregoing embodiments, as an optional embodiment, the generating unit includes a sequential spatial merge candidate file list generating sub-module, where the sequential spatial merge candidate file list generating sub-module includes:
a second writing subunit, configured to write, in accordance with an existing arrangement order, a sequential file in the sequential space that is not included in the cross-space merge candidate file list into a second list that is initially empty;
and the sequence space merging candidate file list generating subunit is used for taking the written second list as the sequence space merging candidate file list.
The generating unit includes an unordered spatial merge candidate file list generating sub-module, which includes:
a third writing subunit, configured to write, according to an existing arrangement order, an unordered file in the unordered space that is not included in the cross-space merge candidate file list into a second list that is initially empty;
and the unordered space merging candidate file list generating subunit is used for taking the written second list as the unordered space merging candidate file list.
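The two in-space candidate lists amount to an order-preserving filter, which the following sketch illustrates with the file numbering of the worked example (sequential files 1-20, out-of-order files 21-40, cross-space list holding files 1-10 and 21-30). The function name `in_space_candidates` is hypothetical.

```python
def in_space_candidates(space_files, cross_space_files):
    """Keep files not in the cross-space list, preserving the existing order."""
    taken = set(cross_space_files)
    return [f for f in space_files if f not in taken]


sequential = list(range(1, 21))
out_of_order = list(range(21, 41))
cross = list(range(1, 11)) + list(range(21, 31))

seq_candidates = in_space_candidates(sequential, cross)      # files 11-20
unseq_candidates = in_space_candidates(out_of_order, cross)  # files 31-40
```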
On the basis of the foregoing embodiments, as an optional embodiment, the merging selection policy is that files to be merged under one file merging task are continuous in the corresponding candidate file list, a total capacity of the files to be merged under one file merging task is smaller than a second preset capacity threshold, and a number of the files to be merged under one file merging task is smaller than a preset number threshold.
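One possible selector that satisfies this merging selection policy is sketched below. It walks the candidate list greedily and cuts a new task whenever adding the next file would reach the capacity threshold or the number threshold. The greedy strategy, the function name `select_tasks`, and the decision to drop runs of fewer than two files are assumptions of this sketch; the policy itself only constrains what a valid task looks like.

```python
def select_tasks(candidates, max_bytes, max_count):
    """Split (name, size) candidates into tasks of consecutive files.

    Every task has total size < max_bytes and file count < max_count.
    """
    tasks, current, total = [], [], 0
    for name, size in candidates:
        fits = total + size < max_bytes and len(current) + 1 < max_count
        if not fits:
            if len(current) >= 2:  # assumption: a single file is not merged
                tasks.append(current)
            current, total = [], 0
            if size >= max_bytes or max_count <= 1:
                continue  # this file can never fit; it also breaks the run
        current.append(name)
        total += size
    if len(current) >= 2:
        tasks.append(current)
    return tasks
```

Because a task is only ever extended with the next file in the list, the files inside each task are consecutive in the candidate list, as the policy requires.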
On the basis of the above embodiments, as an optional embodiment, the buffer module includes a priority determining unit for determining a descending order of priority of all file merging tasks executable by the time-series database;
the priority determining unit includes:
the descending order sorting subunit is used for respectively carrying out priority descending order on all cross-space merging tasks, all sequence space merging tasks and all disorder space merging tasks in the file merging tasks according to the sequence from late to early of the data writing time so as to correspondingly obtain a first priority descending order sequence, a second priority descending order sequence and a third priority descending order sequence;
and the splicing subunit is used for splicing the priority ordering sequences in a mode that the first priority ordering sequence is in front, the second priority ordering sequence is centered and the third priority ordering sequence is in rear, so as to obtain the priority ordering sequences of all file merging tasks.
Based on the above embodiments, as an optional embodiment, the merge file corresponding to the sequential space merge task and the merge file corresponding to the cross-space merge task are written into the sequential space, and the merge file corresponding to the out-of-order space merge task is written into the out-of-order space.
In a third aspect, fig. 4 illustrates a physical schematic diagram of an electronic device, as shown in fig. 4, where the electronic device may include: processor 410, communication interface (Communications Interface) 420, memory 430 and communication bus 440, wherein processor 410, communication interface 420 and memory 430 communicate with each other via communication bus 440. Processor 410 may invoke logic instructions in memory 430 to perform a file merge method for a time-sequential database, where files are stored in sequential or out-of-order space, the method comprising: after a file merging thread of the time sequence database is triggered, determining a file merging task executable by the time sequence database; buffering the file merging tasks into a global queue according to a descending priority mode; and sequentially executing file merging tasks in the global queue.
Further, the logic instructions in the memory 430 described above may be implemented in the form of software functional units and, when sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product stored in a storage medium, comprising several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program code.
In a fourth aspect, the present invention also provides a computer program product comprising a computer program storable on a non-transitory computer readable storage medium, the computer program when executed by a processor being capable of performing a method of merging files for a time-series database provided by the methods described above, the files in the time-series database being stored in a sequential space or an out-of-order space, the method comprising: after a file merging thread of the time sequence database is triggered, determining a file merging task executable by the time sequence database; buffering the file merging tasks into a global queue according to a descending priority mode; and sequentially executing file merging tasks in the global queue.
In a fifth aspect, the present invention also provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, is implemented to perform the method for merging files for a time-series database provided by the above methods, the files in the time-series database being stored in a sequential space or an out-of-order space, the method comprising: after a file merging thread of the time sequence database is triggered, determining a file merging task executable by the time sequence database; buffering the file merging tasks into a global queue according to a descending priority mode; and sequentially executing file merging tasks in the global queue.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment. Those of ordinary skill in the art can understand and implement the present invention without creative effort.
From the above description of the embodiments, it will be apparent to those skilled in the art that the embodiments may be implemented by means of software plus necessary general hardware platforms, or of course may be implemented by means of hardware. Based on this understanding, the foregoing technical solution may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as ROM/RAM, a magnetic disk, an optical disk, etc., including several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the method described in the respective embodiments or some parts of the embodiments.
Finally, it should be noted that: the above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; although the invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical scheme described in the foregoing embodiments can be modified or some technical features thereof can be replaced by equivalents; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for merging files in a time-sequential database, the files in the time-sequential database being stored in a sequential or out-of-order space, the method comprising:
after a file merging thread of the time sequence database is triggered, determining a file merging task executable by the time sequence database;
buffering the file merging tasks into a global queue according to a descending priority mode;
and sequentially executing file merging tasks in the global queue.
2. The method of claim 1, wherein the file merge thread of the time series database is triggered periodically.
3. The file merging method for a time series database according to claim 1 or 2, wherein the sequential space stores sequential files, and the out-of-order space stores out-of-order files; the determining the executable file merging task of the time sequence database comprises the following steps:
generating a sequential space merging candidate file list, an out-of-order space merging candidate file list and a cross-space merging candidate file list according to the sequential files and the out-of-order files;
inputting the sequence space merging candidate file list into a pre-stored selector, so that the selector generates one or more sequence space merging tasks according to a built-in merging selection strategy;
inputting the out-of-order space merging candidate file list into the selector, so that the selector generates one or more out-of-order space merging tasks according to a built-in merging selection strategy;
inputting the cross-space merging candidate file list into the selector, so that the selector generates one or more cross-space merging tasks according to a built-in merging selection strategy;
and taking the sequential space merging task, the out-of-order space merging task and the cross-space merging task as file merging tasks executable by the time sequence database.
4. The file merging method for a time series database according to claim 3, wherein said generating a cross-space merging candidate file list from said sequential files and said out-of-order files comprises:
for any out-of-order file in the out-of-order space, searching the sequential space for sequential files whose data overlap with that out-of-order file;
writing the cross-space file into a first list which is initially empty under the condition that the capacity of the cross-space file formed by any one of the disordered files and all sequence files overlapped with the data of the disordered files is not larger than a first preset capacity threshold;
traversing all the disordered files in the disordered space to obtain a final first list;
and taking the final first list as the cross-space merging candidate file list.
5. The file merging method for a time series database according to claim 4, wherein the generating a sequential spatial merge candidate file list/an out-of-order spatial merge candidate file list from the sequential file and the out-of-order file comprises:
writing the sequential files/disordered files which are not contained in the cross-space merging candidate file list in the sequential space/disordered space into an initially empty second list according to the existing arrangement sequence;
and taking the written second list as the sequence space merging candidate file list/the disordered space merging candidate file list.
6. A method of merging files for a time series database according to claim 3, wherein the merging selection policy is that files to be merged under one file merging task are consecutive in the corresponding candidate file list, the total capacity of files to be merged under one file merging task is smaller than a second preset capacity threshold and the number of files to be merged under one file merging task is smaller than a preset number threshold.
7. A method for file merging according to claim 3, characterized in that the determining of the descending order of priority sequence of all file merging tasks executable by the time series database comprises:
respectively carrying out priority descending sequencing on all cross-space merging tasks, all sequential space merging tasks and all disordered space merging tasks in the file merging tasks according to the sequence from late to early of the data writing time so as to correspondingly obtain a first priority descending sequence, a second priority descending sequence and a third priority descending sequence;
and splicing the priority ordering sequences in a mode that the first priority ordering sequence is in front, the second priority ordering sequence is centered and the third priority ordering sequence is in rear, so as to obtain the priority ordering sequences of all file merging tasks.
8. A method for merging files in a temporal database according to claim 3, wherein the execution of the sequential spatial merging task comprises:
extracting files to be combined under the sequential space combining task and performing data splicing to obtain a combined file corresponding to the sequential space combining task;
the execution process of the out-of-order space merging task/the cross-space merging task comprises the following steps:
extracting the files to be merged under the out-of-order space merging task/the cross-space merging task, and performing data decompression, sequencing and recompression to obtain merging files corresponding to the out-of-order space merging task/the cross-space merging task.
9. The method according to claim 8, wherein the merge file corresponding to the sequential space merge task and the merge file corresponding to the cross-space merge task are written into a sequential space, and the merge file corresponding to the out-of-order space merge task is written into an out-of-order space.
10. A file merging apparatus for a time-series database, the files in the time-series database being stored in a sequential space or an out-of-order space, the apparatus comprising:
the determining module is used for determining executable file merging tasks of the time sequence database after the file merging threads of the time sequence database are triggered;
the buffer module is used for buffering the file merging tasks into a global queue in a descending priority mode;
and the execution module is used for sequentially executing the file merging tasks in the global queue.
CN202310468738.XA 2023-04-26 2023-04-26 File merging method and device for time sequence database Pending CN116561124A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310468738.XA CN116561124A (en) 2023-04-26 2023-04-26 File merging method and device for time sequence database

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310468738.XA CN116561124A (en) 2023-04-26 2023-04-26 File merging method and device for time sequence database

Publications (1)

Publication Number Publication Date
CN116561124A true CN116561124A (en) 2023-08-08

Family

ID=87493998

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310468738.XA Pending CN116561124A (en) 2023-04-26 2023-04-26 File merging method and device for time sequence database

Country Status (1)

Country Link
CN (1) CN116561124A (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114020713A (en) * 2021-10-12 2022-02-08 清华大学 File merging method and device of log structure merging tree, electronic equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LINGZHE ZHANG ETC.: "合并机制总体流程" (Overall process of the merge mechanism), pages 1 - 9, Retrieved from the Internet <URL:https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=181308383> *

Similar Documents

Publication Publication Date Title
CN103729480B (en) Method for rapidly finding and scheduling multiple ready tasks of multi-kernel real-time operating system
EP3432157B1 (en) Data table joining mode processing method and apparatus
CN103218455A (en) Method of high-speed concurrent processing of user requests of Key-Value database
JP2003044296A5 (en)
US20080196030A1 (en) Optimizing memory accesses for multi-threaded programs in a non-uniform memory access (numa) system
CN107273200B (en) Task scheduling method for heterogeneous storage
CN110413776B (en) High-performance calculation method for LDA (text-based extension) of text topic model based on CPU-GPU (Central processing Unit-graphics processing Unit) collaborative parallel
CN106909554B (en) Method and device for loading database text table data
WO2013032436A1 (en) Parallel operation on b+ trees
US20110023044A1 (en) Scheduling highly parallel jobs having global interdependencies
CN114217966A (en) Deep learning model dynamic batch processing scheduling method and system based on resource adjustment
CN117234710A (en) Method for realizing memory optimization of AI model training by reinforcement learning
CN117251275B (en) Multi-application asynchronous I/O request scheduling method, system, equipment and medium
Xie et al. Adaptive preshuffling in Hadoop clusters
CN116069480B (en) Processor and computing device
CN113407343A (en) Service processing method, device and equipment based on resource allocation
CN116561124A (en) File merging method and device for time sequence database
JP2018132948A (en) Loading program, loading method, and information processing device
CN108027727A (en) Dispatching method, device and the computer system of internal storage access instruction
Tang et al. A network load perception based task scheduler for parallel distributed data processing systems
CN105573834B (en) A kind of higher-dimension vocabulary tree constructing method based on heterogeneous platform
CN111756802B (en) Method and system for scheduling data stream tasks on NUMA platform
CN112068948B (en) Data hashing method, readable storage medium and electronic device
Wang et al. OPTAS: Optimal data placement in MapReduce
CN110059378B (en) Automatic manufacturing system Petri network state generation method based on GPU parallel computing

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination