CN113495871A - File management method and device based on LSM-Tree storage engine - Google Patents


Info

Publication number
CN113495871A
Authority
CN
China
Prior art keywords
file
blob
sst
marking
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010261689.9A
Other languages
Chinese (zh)
Other versions
CN113495871B (en
Inventor
周越
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Xiamen Wangsu Co Ltd
Original Assignee
Xiamen Wangsu Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xiamen Wangsu Co Ltd
Priority to CN202010261689.9A (patent CN113495871B)
Priority to PCT/CN2021/085631 (patent WO2021197493A1)
Publication of CN113495871A
Application granted
Publication of CN113495871B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/11 File system administration, e.g. details of archiving or snapshots
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/1873 Versioning file systems, temporal file systems, e.g. file system supporting different historic versions of files
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608 Saving storage space on storage systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Human Computer Interaction (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a file management method and device based on an LSM-Tree storage engine, belonging to the technical field of computers. The method comprises the following steps: monitoring the version registration state of an SST file in the process of generating a Blob file and the SST file; marking the validated Blob file corresponding to the SST file with different stage information according to the different version registration states of the SST file; and periodically performing abnormal file processing on the validated Blob files according to the stage information. With the method and the device, leakage of the storage space of an SMR disk can be effectively avoided.

Description

File management method and device based on LSM-Tree storage engine
Technical Field
The invention relates to the technical field of computers, in particular to a file management method and device based on an LSM-Tree storage engine.
Background
A Shingled Magnetic Recording (SMR) disk is a storage medium in which tracks are partially overlapped to increase the storage density, and may include a sequential storage area composed of a plurality of overlapped tracks and a general random access area inside. The SMR disk is applied to the cloud storage platform, so that the product cost of the cloud storage platform can be effectively reduced, but the performance of an original operating system of the cloud storage platform cannot meet the IO throughput requirement of the SMR disk at present.
The LSM-Tree storage engine is an embedded storage system that can efficiently support SMR disks. When storing data, the LSM-Tree storage engine stores the written service data in Key-Value form in a memory table, flushes (Flush) the data in the memory table to an SST file once the memory table is full, and finally commits and validates the SST file by registering the version of the SST file. When the number of SST files reaches a fixed value, compact is further executed on a plurality of SST files to compress them into one file, and the SST file newly generated by the compact is likewise committed and validated by version registration. The LSM-Tree storage engine can then query the validated SST files.
The LSM-Tree storage engine also supports a KV-separated data storage mode: a Value with a large data volume in the service data is separated from the LSM-Tree and stored independently in a Blob file, while the SST file stores the Key value of the service data and the storage location of the Value in the Blob file, so that the Key value and the Value remain associated.
A Blob file also takes effect through version registration, and the versions of Blob files are managed separately. To ensure the safety of data queries, the version of a Blob file must be registered before the version of its SST file; otherwise data inconsistency may occur. However, when a system exception occurs, the Blob file version may have taken effect while the SST file version has not, so that the Blob file is in an abnormal phase because no validated SST file corresponds to it, which may cause leakage of the storage space of the SMR disk.
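The KV-separated layout described above can be illustrated with a minimal Python sketch. All names here (BlobPointer, BlobFile, SstFile, put, get) and the (blob_id, offset, length) pointer format are hypothetical and chosen only for illustration; they are not taken from the patent or from any particular storage engine.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class BlobPointer:
    """Storage location of a Value inside a Blob file (assumed format)."""
    blob_id: int
    offset: int
    length: int


@dataclass
class BlobFile:
    """Append-only file that holds the large Values."""
    blob_id: int
    data: bytearray = field(default_factory=bytearray)

    def append(self, value: bytes) -> BlobPointer:
        pointer = BlobPointer(self.blob_id, len(self.data), len(value))
        self.data.extend(value)
        return pointer

    def read(self, pointer: BlobPointer) -> bytes:
        end = pointer.offset + pointer.length
        return bytes(self.data[pointer.offset:end])


@dataclass
class SstFile:
    """Holds each Key together with the storage location of its Value."""
    index: Dict[str, BlobPointer] = field(default_factory=dict)


def put(sst: SstFile, blob: BlobFile, key: str, value: bytes) -> None:
    # The Value goes to the Blob file; the SST file keeps Key + location only.
    sst.index[key] = blob.append(value)


def get(sst: SstFile, blob: BlobFile, key: str) -> Optional[bytes]:
    pointer = sst.index.get(key)
    return None if pointer is None else blob.read(pointer)
```

A lookup first consults the SST file for the storage location and only then touches the Blob file, which is why a Blob file without a validated SST file pointing into it can never be reached by a query.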
Disclosure of Invention
In order to solve the problems in the prior art, embodiments of the present invention provide a file management method and apparatus based on an LSM-Tree storage engine. The technical scheme is as follows:
in a first aspect, a file management method based on an LSM-Tree storage engine is provided, where the method includes:
monitoring the version registration state of the SST file in the process of generating a Blob file and an SST file;
marking the valid Blob file corresponding to the SST file by using different stage information according to different version registration states of the SST file;
and processing the abnormal files of the validated Blob files regularly according to the stage information.
In a second aspect, an LSM-Tree storage engine-based file management apparatus is provided, the apparatus including:
the file generation module is used for monitoring the version registration state of the SST file in the process of generating the Blob file and the SST file;
the phase marking module is used for marking the valid Blob file corresponding to the SST file by using different phase information according to different version registration states of the SST file;
and the file processing module is used for processing the abnormal files of the effective Blob files periodically according to the stage information.
In a third aspect, a network device is provided, which includes a processor and a memory, where at least one instruction, at least one program, a set of codes, or a set of instructions is stored in the memory, and the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by the processor to implement the LSM-Tree storage engine-based file management method according to the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, in which at least one instruction, at least one program, a set of codes, or a set of instructions is stored, which is loaded and executed by a processor to implement the LSM-Tree storage engine based file management method according to the first aspect.
The technical scheme provided by the embodiment of the invention has the following beneficial effects:
in the embodiment of the invention, in the process of generating the Blob file and the SST file, the version registration state of the SST file is monitored; marking the valid Blob file corresponding to the SST file by using different stage information according to different version registration states of the SST file; and regularly processing the abnormal files of the validated Blob files according to the phase information. In this way, under the different version registration states of the SST file, the Blob files which are valid are marked by different stage information respectively, so that when abnormal files are processed regularly, which Blob files are in an abnormal stage can be easily identified through the stage information, invalid Blob files in an SMR disk can be released through processing the abnormal files, and the storage space of the SMR disk can be effectively prevented from being leaked.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
FIG. 1 is a flowchart of a file management method based on an LSM-Tree storage engine according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a file management process based on an LSM-Tree storage engine according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a phase change of a Blob file according to an embodiment of the present invention;
FIG. 4 is a structural diagram of a file management apparatus based on an LSM-Tree storage engine according to an embodiment of the present invention;
FIG. 5 is a schematic structural diagram of a network device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, embodiments of the present invention will be described in detail with reference to the accompanying drawings.
The embodiment of the invention provides a file management method based on an LSM-Tree storage engine, which can be applied to a storage cluster configured with the LSM-Tree storage engine and is mainly executed by the LSM-Tree storage engine. The storage cluster can be a cloud storage platform, the storage cluster can store service data by using an SMR disk, and data reading, writing, updating, deleting and the like in the SMR disk are managed by an LSM-Tree storage engine. The LSM-Tree storage engine can support a KV separation storage mode, and respectively stores a Value and a Key Value of service data through a Blob file and an SST file. After the Blob file and the SST file are generated by the LSM-Tree storage engine, the Blob file and the SST file can be validated in a mode of registering file versions, and the validated Blob file and the validated SST file can be respectively stored in a sequential storage area and a random storage area in an SMR disk. Each validated Blob file corresponds to a validated SST file, and after the compact of a plurality of SST files is executed, a plurality of validated Blob files can correspond to a validated SST file. In this embodiment, a storage cluster is taken as an example of a cloud storage platform for description, and other situations are similar and are not illustrated one by one.
The process flow shown in fig. 1 will be described in detail below with reference to specific embodiments, and the contents may be as follows:
Step 101, in the process of generating a Blob file and an SST file, monitoring the version registration state of the SST file.
In implementation, when the cloud storage platform provides cloud storage service to the outside, the LSM-Tree storage engine may acquire service data uploaded by a user and parse the data content of the service data to read the Key-Value form data therein. The LSM-Tree storage engine can then store the Key-Value data in the memory table, and when the memory table is full, Flush processing can be performed on the data in the memory table, so that an SST file and a Blob file corresponding to the service data are generated. Specifically, the LSM-Tree storage engine may store the Key value and the Value of the service data in the Blob file, and store the Key value of the service data together with the storage location of the corresponding Value in the Blob file in the SST file. Because Blob files are stored in the sequential storage areas of the SMR disk, with one Blob file corresponding to one sequential storage area, the content recorded in the SST file may be the Key value together with the number of the sequential storage area. In addition to the foregoing Flush processing of service data, the LSM-Tree storage engine also generates a new SST file from a plurality of SST files when it executes compact on SST files, so compact is likewise a process in which an SST file is generated. In the process of generating the Blob file and the SST file, the LSM-Tree storage engine can monitor the version registration state of the SST file.
Step 102, marking the validated Blob file corresponding to the SST file with different stage information according to the different version registration states of the SST file.
In implementation, different version registration states of the SST file can correspond to different phase information of the Blob file. In the process of Flush of service data, the Blob file is generated before the SST file and, correspondingly, the Blob file is registered and takes effect before the SST file. In the process of compact, the Blob file generally remains in the validated state throughout. Therefore, in both processes, while a Blob file is in the validated state, the corresponding SST file can pass through multiple version registration states. Further, the phase information of the Blob file corresponding to each version registration state of the SST file can be set in advance; for example, the version registration state "registered" corresponds to phase information A, and the version registration state "unregistered" corresponds to phase information B. The LSM-Tree storage engine can thus mark the validated Blob file corresponding to the SST file with different phase information according to the different version registration states of the SST file. The marking may be done by adding the phase information to the meta information of the Blob file.
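As an illustration of this marking step, the following Python sketch maps the version registration state of the SST file to phase information and records it in the meta information of the Blob file. The enum names, the "phase" meta key, and the dictionary used for the meta information are assumptions made for the example; the text only states that phase information A and B exist and that the mark may be stored in the meta information.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class SstVersionState(Enum):
    UNREGISTERED = "unregistered"
    REGISTERED = "registered"


class Phase(Enum):
    A = "A"  # used when the SST file version is registered
    B = "B"  # used when the SST file version is not registered


# Assumed mapping from SST version registration state to Blob phase information.
PHASE_FOR_STATE = {
    SstVersionState.REGISTERED: Phase.A,
    SstVersionState.UNREGISTERED: Phase.B,
}


@dataclass
class BlobFile:
    name: str
    meta: Dict[str, str] = field(default_factory=dict)  # meta information


def mark_blob(blob: BlobFile, sst_state: SstVersionState) -> None:
    """Record the phase information in the Blob file's meta information."""
    blob.meta["phase"] = PHASE_FOR_STATE[sst_state].value
```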
Step 103, periodically performing abnormal file processing on the validated Blob files according to the stage information.
In implementation, after the Blob file and the SST file are registered and validated, the LSM-Tree storage engine may periodically detect the phase information of the validated Blob files in the SMR disk and then perform abnormal file processing on them according to the detected phase information. Specifically, a Blob file whose marked phase information does not belong to the specified phase information is determined to be an abnormal file to be processed, and abnormal file processing is then performed on it. Further, a GC Loop program dedicated to file cleaning may run in the LSM-Tree storage engine, and the GC Loop program executes the abnormal file processing in step 103.
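One pass of such a cleaning program might look like the Python sketch below. It is only a sketch under stated assumptions: the function names, the string-valued phase field, and the handler callback are hypothetical, and the scheduling of the pass (for example on a timer) is left out.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class BlobFile:
    name: str
    phase: str  # phase information read from the Blob file's meta information


def find_abnormal(blobs: Iterable[BlobFile], specified_phase: str) -> List[BlobFile]:
    """A validated Blob file is abnormal if its phase is not the specified one."""
    return [blob for blob in blobs if blob.phase != specified_phase]


def run_gc_pass(
    blobs: Iterable[BlobFile],
    specified_phase: str,
    handle: Callable[[BlobFile], None],
) -> int:
    """Run one detection pass and hand every abnormal file to the handler."""
    abnormal = find_abnormal(blobs, specified_phase)
    for blob in abnormal:
        handle(blob)
    return len(abnormal)
```

A periodic GC Loop would call run_gc_pass at a fixed interval; the handler is where the delete-or-recreate decision described later would be made.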
Optionally, the Blob file may be marked as an intermediate or final state according to whether the SST file is registered or not, and accordingly, the processing in step 102 may be as follows: and if the version of the SST file is not registered, marking the Blob file corresponding to the SST file as an intermediate stage, otherwise, marking the Blob file as a final stage.
In implementation, when the LSM-Tree storage engine generates a Blob file, since the Blob file is generally generated before the SST file, the version of the SST file corresponding to the Blob file has not yet been registered at this time. Thus, the LSM-Tree storage engine may mark the Blob file as the intermediate stage. Meanwhile, the LSM-Tree storage engine monitors the version registration state of the SST file, and once the version of the SST file has been registered, the Blob file can be marked as the final stage.
Optionally, based on the marking manners of the intermediate stage and the final stage, the Flush process and the compact process will be described below, and specific contents may be as follows:
(I) Flush process: when Flush processing of the service data is completed and a Blob file is generated, marking the Blob file as the intermediate stage; after the SST file corresponding to the Blob file is generated and registered, switching the mark of the Blob file to the final stage.
In implementation, when the LSM-Tree storage engine performs Flush processing on the service data, it may extract the Key values and Values of the service data one by one from the memory table and store them correspondingly in the Blob file. After the storage is completed, the LSM-Tree storage engine may record the storage location of each Value in the Blob file, using the Key value as an identifier, and then store the Key value and the storage location in the SST file corresponding to the Blob file. For example, if the service data is "cell number 1-15011112222", where the Key value is "cell number 1" and the Value is "15011112222", then the content stored in the Blob file is "cell number 1-15011112222"; assuming the data is stored at the offset range 50M-51M of the first Blob file, the content stored in the SST file may be "cell number 1-Blob file 1.50M-51M". It can be understood that one Blob file may store a large amount of Key-Value data, and the SST file may likewise store a large amount of "Key-storage location" data.
Thus, during the Flush processing, the LSM-Tree storage engine may generate a Blob file first, and mark the Blob file as an intermediate stage at this time. And then, when the generation of the SST file corresponding to the Blob file is monitored and the registration is completed, the LSM-Tree storage engine can switch the mark of the Blob file from the intermediate stage to the final stage.
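The Flush sequence above can be sketched in Python as follows. This is a simplified model, not the engine's implementation: the record-index location format, the fail_before_sst switch that simulates a system exception, and all identifiers are assumptions introduced for the example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

INTERMEDIATE = "intermediate"
FINAL = "final"


@dataclass
class BlobFile:
    blob_id: int
    records: List[Tuple[str, str]] = field(default_factory=list)  # (Key, Value)
    phase: str = ""  # unmarked until the Blob file has been generated


@dataclass
class SstFile:
    # Key -> (blob_id, record index), a stand-in for the storage location
    entries: Dict[str, Tuple[int, int]] = field(default_factory=dict)
    registered: bool = False


def flush(
    memtable: Dict[str, str], blob_id: int, fail_before_sst: bool = False
) -> Tuple[BlobFile, Optional[SstFile]]:
    blob = BlobFile(blob_id)
    for key, value in memtable.items():
        blob.records.append((key, value))
    blob.phase = INTERMEDIATE  # Blob file generated, SST file not yet registered

    if fail_before_sst:
        # Simulated system exception: the Blob file stays in the intermediate stage.
        return blob, None

    sst = SstFile()
    for position, (key, _value) in enumerate(blob.records):
        sst.entries[key] = (blob_id, position)
    sst.registered = True  # version registration of the SST file
    blob.phase = FINAL     # switch the mark only after registration
    return blob, sst
```

If the process stops between the two marks, the Blob file is left in the intermediate stage, which is exactly the condition the periodic abnormal file processing looks for.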
(II) compact process: when compact of SST files starts to be executed, switching the marks of the Blob files corresponding to the SST files to the intermediate stage; after the compact processing of the SST files is completed and a new SST file is generated and registered, switching the marks of the Blob files corresponding to the new SST file to the final stage.
In implementation, the LSM-Tree storage engine may begin executing compact on SST files when the number of generated SST files reaches a certain threshold. At this time, the LSM-Tree storage engine may determine all SST files on which compact needs to be executed and switch the marks of the Blob files corresponding to these SST files from the final stage to the intermediate stage. Thereafter, the LSM-Tree storage engine may perform the compact processing on the SST files, generate a new SST file, and register its version. At this time, since the Blob files corresponding to the new SST file are the Blob files corresponding to all the SST files on which compact was executed, the LSM-Tree storage engine may mark those Blob files as the final stage.
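A minimal Python sketch of the marking around compact is given below. The merge rule (entries from later SST files override earlier ones), the crash_before_register switch, and all names are assumptions for illustration only; real compaction also handles deletions, levels, and file I/O.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

INTERMEDIATE = "intermediate"
FINAL = "final"


@dataclass
class BlobFile:
    blob_id: int
    phase: str = FINAL


@dataclass
class SstFile:
    entries: Dict[str, Tuple[int, int]]  # Key -> (blob_id, offset)
    registered: bool = True


def compact(
    ssts: List[SstFile],
    blobs: Dict[int, BlobFile],
    crash_before_register: bool = False,
) -> SstFile:
    """Merge SST files (ordered oldest to newest) and maintain the Blob marks."""
    involved = {blob_id for sst in ssts for (blob_id, _off) in sst.entries.values()}
    for blob_id in involved:
        blobs[blob_id].phase = INTERMEDIATE  # before compact is executed

    merged: Dict[str, Tuple[int, int]] = {}
    for sst in ssts:
        merged.update(sst.entries)  # newer entries override older ones
    new_sst = SstFile(entries=merged, registered=False)

    if crash_before_register:
        raise RuntimeError("simulated system exception before registration")

    new_sst.registered = True  # version registration of the new SST file
    for blob_id in involved:
        blobs[blob_id].phase = FINAL
    return new_sst
```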
Optionally, based on the marking manners of the intermediate stage and the final stage, the processing of step 103 may be: and regularly processing the abnormal files of the validated target Blob files marked as the intermediate stage.
In implementation, if the SST file is not registered, and a Blob file corresponding to the SST file is marked as an intermediate stage, the Blob file marked as the intermediate stage can be considered as a junk file, and a GC Loop program in the LSM-Tree storage engine can periodically perform exception file processing on a valid target Blob file marked as the intermediate stage.
Optionally, when processing the exception file, it needs to consider whether to retain the data content of the Blob file, and the corresponding processing may be as follows: and if the valid target SST file corresponding to the target Blob file does not exist, deleting the target Blob file, otherwise, recreating the target Blob file.
In implementation, if the GC Loop program detects that the target Blob file is marked as the intermediate stage, it may first determine whether a validated target SST file corresponding to the target Blob file exists in the LSM-Tree storage engine. Specifically, the Key value in the target Blob file is read first, and the LSM-Tree storage engine is then searched for a storage location corresponding to that Key value. If such a storage location exists, a validated SST file corresponding to the Key value can be considered to exist in the LSM-Tree storage engine, but it must be further determined whether that SST file corresponds to the target Blob file, that is, whether the retrieved storage location is the storage location of the Key value in the target Blob file. If so, the SST file is the validated target SST file corresponding to the target Blob file. In this case, the phase information of the target Blob file was recorded incorrectly, no other Blob file corresponding to the target SST file exists in the LSM-Tree storage engine, and the GC Loop program can recreate the target Blob file.
If the retrieved storage location does not point to the target Blob file, the SST file can be considered to correspond to another validated Blob file that records the Key-Value data of the target Blob file, so the target Blob file can be judged to be a redundant file, and the GC Loop program can delete it directly without recreating it.
If no storage location corresponding to the Key value exists in the LSM-Tree storage engine, the Key value can be considered to be still stored in the memory table, that is, the Flush processing of the related service data has not been completed, so the LSM-Tree storage engine can continue to execute the corresponding Flush processing and regenerate the Blob file and the corresponding SST file. Therefore, the target Blob file can likewise be judged to be a redundant file, and the GC Loop program can delete it directly without recreating it.
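The three cases above reduce to a small decision function, sketched here in Python. The lookup dictionary stands in for a query over the validated SST files, and the rule that a single Key still pointing into the target Blob file is enough to recreate it is an assumption of this sketch; the names Action, decide, and Location are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Tuple

Location = Tuple[int, int]  # (blob_id, offset), an assumed location format


class Action(Enum):
    RECREATE = "recreate"
    DELETE = "delete"


@dataclass
class BlobFile:
    blob_id: int
    records: Dict[str, int]  # Key -> offset of its Value in this Blob file


def decide(target: BlobFile, lookup: Dict[str, Location]) -> Action:
    """Decide how to process a target Blob file marked as the intermediate stage.

    `lookup` maps each Key to the storage location recorded by the validated
    SST files; a missing Key means no validated SST file knows about it.
    """
    for key, offset in target.records.items():
        location = lookup.get(key)
        if location == (target.blob_id, offset):
            # A validated SST file still points into the target Blob file.
            return Action.RECREATE
    # Either the Keys point to another Blob file or they are not found at all:
    # in both cases the target Blob file is redundant.
    return Action.DELETE
```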
Optionally, the process of reconstructing the Blob file may exist in at least two ways:
The first method is as follows: creating a new Blob file and a new SST file according to the target Blob file, and deleting the target Blob file and the target SST file.
In implementation, when the GC Loop program recreates the target Blob file, the GC Loop program can read Key-Value data in the target Blob file, write the Key-Value data into a new Blob file, and create a new SST file at the same time. It should be noted that after the new Blob file is created and before the new SST file is created, the new Blob file may be marked as an intermediate stage, and after the new SST file is created, the new Blob file may be switched and marked as a final stage. In this way, the GC Loop program can store the new Blob file and the new SST file in the sequential storage area and the random storage area of the SMR disk, respectively, and can delete the target Blob file and the target SST file at the same time.
The second method is as follows: creating a new Blob file according to the target Blob file, and marking the new Blob file as the intermediate stage; establishing an association between the new Blob file and the target SST file, marking the new Blob file as the final stage, and deleting the target Blob file.
In implementation, when the GC Loop program recreates the target Blob file, the Key-Value data in the target Blob file may be read, the Key-Value data is written into the new Blob file, and the new Blob file is marked as an intermediate stage. And then, the GC Loop program can re-establish the association between the new Blob file and the target SST file, switch and mark the new Blob file as a final stage, and simultaneously delete the target Blob file in the SMR disk to release the storage space.
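The recreation that keeps the existing target SST file can be sketched as follows in Python. The in-memory dictionaries, the Key-to-blob_id association, and the function name are assumptions made to keep the example short; an actual engine would rewrite storage locations and persist every step.

```python
from dataclasses import dataclass
from typing import Dict

INTERMEDIATE = "intermediate"
FINAL = "final"


@dataclass
class BlobFile:
    blob_id: int
    records: Dict[str, str]  # Key -> Value
    phase: str


@dataclass
class SstFile:
    entries: Dict[str, int]  # Key -> blob_id of the Blob file holding the Value


def recreate_with_existing_sst(
    target: BlobFile,
    target_sst: SstFile,
    blobs: Dict[int, BlobFile],
    new_blob_id: int,
) -> BlobFile:
    # 1. Copy the Key-Value data into a new Blob file marked as intermediate.
    new_blob = BlobFile(new_blob_id, dict(target.records), INTERMEDIATE)
    blobs[new_blob_id] = new_blob

    # 2. Re-associate the target SST file with the new Blob file.
    for key, blob_id in list(target_sst.entries.items()):
        if blob_id == target.blob_id:
            target_sst.entries[key] = new_blob_id

    # 3. Switch the mark to final, then delete the target Blob file.
    new_blob.phase = FINAL
    del blobs[target.blob_id]
    return new_blob
```

The ordering matters in this sketch: the new Blob file only becomes final after the association is in place, so an interruption at any point leaves at most one more intermediate-stage file for the next pass to handle.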
It is worth mentioning that the data size of the SST file generated by the LSM-Tree storage engine is small and is stored in a random storage area in the SMR disk, and the Blob file is stored in a sequential storage area in the SMR disk, and the data size of the Blob file can be set to the space size of one sequential storage area, so that one Blob file can correspond to one sequential storage area. When abnormal file processing is carried out, processing can be carried out by taking the sequential storage area as a unit, and storage space recovery of the SMR disk is facilitated.
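To illustrate why a one-to-one correspondence between Blob files and sequential storage areas simplifies space recovery, here is a small Python sketch of a zone bookkeeping structure. ZoneAllocator and its methods are hypothetical and do not model any real SMR or zoned-device interface.

```python
from typing import Dict, List


class ZoneAllocator:
    """Tracks which sequential storage area (zone) each Blob file occupies."""

    def __init__(self, zone_count: int) -> None:
        self.free_zones: List[int] = list(range(zone_count))
        self.zone_of_blob: Dict[str, int] = {}

    def allocate(self, blob_name: str) -> int:
        # One Blob file occupies exactly one whole zone.
        zone = self.free_zones.pop(0)
        self.zone_of_blob[blob_name] = zone
        return zone

    def release(self, blob_name: str) -> int:
        # Deleting the Blob file frees its whole zone at once.
        zone = self.zone_of_blob.pop(blob_name)
        self.free_zones.append(zone)
        return zone
```

Because a zone never holds data from two Blob files in this model, deleting an abnormal Blob file makes the whole zone reusable without moving any other data.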
The above overall process flow may be as shown in fig. 2:
1. An SST file and an associated Blob file are generated when Flush or compact is executed;
2. The Blob file version is registered and takes effect, and the Blob file is marked as the intermediate stage;
3. The SST file version is registered and takes effect;
4. The Blob file is marked as the final stage;
5. The GC Loop program periodically processes abnormal Blob files.
For ease of understanding, fig. 3 illustrates the phase change principle of the Blob file in the present embodiment, in which:
1. The initialization phase of the Blob file is a temporary state and is not marked;
2. Flush processing generates a Blob file, and the Blob file is switched from the initialization phase to the intermediate stage;
3. After Flush is finished and the SST file is generated and its registration takes effect, the Blob file is switched from the intermediate stage to the final stage;
4. When the LSM-Tree storage engine meets the compact condition, the Blob file is switched from the final stage to the intermediate stage before compact is executed;
5. After compact is completed and the new SST file is generated and its registration takes effect, the Blob file is switched from the intermediate stage to the final stage;
6. The GC Loop program finds an abnormal Blob file and generates a new Blob file from it; the new Blob file is in the intermediate stage, and the stage of the abnormal Blob file is unchanged;
7. After the GC Loop program associates the new Blob file with the SST file, the new Blob file is switched from the intermediate stage to the final stage, and the Blob file in the abnormal stage is deleted.
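The phase changes listed above form a small state machine, which can be written down as a transition table. The Python sketch below is illustrative only: the event names are invented for the example, and DELETED is a hypothetical terminal state added here to represent the removal of an abnormal Blob file.

```python
from enum import Enum


class Phase(Enum):
    INIT = "initialization"
    INTERMEDIATE = "intermediate"
    FINAL = "final"
    DELETED = "deleted"  # hypothetical terminal state, not named in the text


class Event(Enum):
    BLOB_GENERATED = "blob generated"
    SST_REGISTERED = "SST file registered"
    COMPACT_STARTED = "compact about to start"
    GC_DELETED = "deleted by GC Loop"


TRANSITIONS = {
    (Phase.INIT, Event.BLOB_GENERATED): Phase.INTERMEDIATE,
    (Phase.INTERMEDIATE, Event.SST_REGISTERED): Phase.FINAL,
    (Phase.FINAL, Event.COMPACT_STARTED): Phase.INTERMEDIATE,
    (Phase.INTERMEDIATE, Event.GC_DELETED): Phase.DELETED,
}


def next_phase(phase: Phase, event: Event) -> Phase:
    """Return the phase after `event`, or raise if the change is not allowed."""
    try:
        return TRANSITIONS[(phase, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {phase.value} on {event.value}") from None
```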
in the embodiment of the invention, in the process of generating the Blob file and the SST file, the version registration state of the SST file is monitored; marking the valid Blob file corresponding to the SST file by using different stage information according to different version registration states of the SST file; and regularly processing the abnormal files of the validated Blob files according to the phase information. In this way, under the different version registration states of the SST file, the Blob files which are valid are marked by different stage information respectively, so that when abnormal files are processed regularly, which Blob files are in an abnormal stage can be easily identified through the stage information, invalid Blob files in an SMR disk can be released through processing the abnormal files, and the storage space of the SMR disk can be effectively prevented from being leaked.
Based on the same technical concept, an embodiment of the present invention further provides a file management apparatus based on an LSM-Tree storage engine, as shown in fig. 4, the apparatus includes:
a file generating module 401, configured to monitor a version registration state of the SST file in a process of generating a Blob file and an SST file;
a phase marking module 402, configured to mark, according to different version registration states of the SST file, a Blob file that has been validated and corresponds to the SST file by using different phase information;
and a file processing module 403, configured to perform exception file processing on the Blob file that has been validated periodically according to the stage information.
Optionally, the phase marking module 402 is specifically configured to:
and if the version of the SST file is not registered, marking the Blob file corresponding to the SST file as an intermediate stage, otherwise, marking the Blob file as a final stage.
Optionally, the file processing module 403 is specifically configured to:
and regularly processing the abnormal files of the validated target Blob files marked as the intermediate stage.
Fig. 5 is a schematic structural diagram of a network device according to an embodiment of the present invention. Such a network device 500 may vary widely in configuration or performance and may include one or more central processors 522 (e.g., one or more processors) and memory 532, one or more storage media 530 (e.g., one or more mass storage devices) storing applications 542 or data 544. Memory 532 and storage media 530 may be, among other things, transient storage or persistent storage. The program stored on the storage medium 530 may include one or more modules (not shown), each of which may include a sequence of instructions for operating on the network device 500. Still further, central processor 522 may be configured to communicate with storage medium 530 to execute a series of instruction operations in storage medium 530 on network device 500.
The network device 500 may also include one or more power supplies 529, one or more wired or wireless network interfaces 550, one or more input-output interfaces 558, one or more keyboards 556, and/or one or more operating systems 541, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, etc.
Network device 500 may include memory and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the one or more processors, and include instructions for performing the above-described file management method based on the LSM-Tree storage engine.
It will be understood by those skilled in the art that all or part of the steps of implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing associated hardware, where the program may be stored in a computer-readable storage medium, where the above-mentioned storage medium may be a hard disk, a magnetic disk (e.g., an SMR magnetic disk), an optical disk, or the like.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like that fall within the spirit and principle of the present invention are intended to be included therein.

Claims (15)

1. A file management method based on an LSM-Tree storage engine, characterized by comprising the following steps:
monitoring the version registration state of an SST file in the process of generating a Blob file and the SST file;
marking the validated Blob file corresponding to the SST file with different stage information according to the version registration state of the SST file;
and periodically performing abnormal file processing on the validated Blob file according to the stage information.
2. The method as claimed in claim 1, wherein the marking the validated Blob file corresponding to the SST file with different stage information according to the version registration state of the SST file comprises:
if the version of the SST file is not registered, marking the Blob file corresponding to the SST file as being in an intermediate stage, and otherwise marking the Blob file as being in a final stage.
3. The method as claimed in claim 2, wherein the marking the Blob file corresponding to the SST file as being in an intermediate stage if the version of the SST file is not registered, and otherwise as being in a final stage, comprises:
when Flush processing of service data is completed and a Blob file is generated, marking the Blob file as being in the intermediate stage;
and after the SST file corresponding to the Blob file is generated and registered, switching the marking of the Blob file to the final stage.
4. The method as claimed in claim 2, wherein the marking the Blob file corresponding to the SST file as being in an intermediate stage if the version of the SST file is not registered, and otherwise as being in a final stage, comprises:
when Compact processing of the SST file starts, switching the marking of the Blob file corresponding to the SST file to the intermediate stage;
and after the Compact processing of the SST file is completed and a new SST file is generated and registered, marking the Blob file corresponding to the SST file as being in the final stage.
5. The method of claim 2, wherein the periodically performing abnormal file processing on the validated Blob file according to the stage information comprises:
periodically performing abnormal file processing on a validated target Blob file marked as being in the intermediate stage.
6. The method of claim 5, wherein the performing abnormal file processing comprises:
if no valid target SST file corresponding to the target Blob file exists, deleting the target Blob file, and otherwise recreating the target Blob file.
7. The method of claim 6, wherein the recreating the target Blob file comprises:
creating a new Blob file and a new SST file according to the target Blob file, and deleting the target Blob file and the target SST file.
8. The method of claim 6, wherein the recreating the target Blob file comprises:
creating a new Blob file according to the target Blob file, and marking the new Blob file as being in the intermediate stage;
and establishing an association between the new Blob file and the target SST file, marking the new Blob file as being in the final stage, and deleting the target Blob file.
9. The method of claim 1, wherein the abnormal file processing is performed by a GC Loop program within the LSM-Tree storage engine.
10. The method of claim 1, wherein the SST file is stored in a random storage area in an SMR disk, wherein the Blob file is stored in a sequential storage area in the SMR disk, and wherein one sequential storage area corresponds to one Blob file.
11. A file management apparatus based on an LSM-Tree storage engine, the apparatus comprising:
a file generation module, configured to monitor the version registration state of an SST file in the process of generating a Blob file and the SST file;
a stage marking module, configured to mark the validated Blob file corresponding to the SST file with different stage information according to the version registration state of the SST file;
and a file processing module, configured to periodically perform abnormal file processing on the validated Blob file according to the stage information.
12. The apparatus according to claim 11, wherein the stage marking module is specifically configured to:
if the version of the SST file is not registered, mark the Blob file corresponding to the SST file as being in an intermediate stage, and otherwise mark the Blob file as being in a final stage.
13. The apparatus of claim 12, wherein the file processing module is specifically configured to:
periodically perform abnormal file processing on a validated target Blob file marked as being in the intermediate stage.
14. A network device comprising a processor and a memory, wherein the memory stores at least one instruction, at least one program, a set of codes, or a set of instructions, which is loaded and executed by the processor to implement the LSM-Tree storage engine based file management method according to any of claims 1 to 10.
15. A computer-readable storage medium, wherein at least one instruction, at least one program, a set of codes, or a set of instructions is stored in the storage medium, and the at least one instruction, the at least one program, the set of codes, or the set of instructions is loaded and executed by a processor to implement the LSM-Tree storage engine-based file management method according to any one of claims 1 to 10.
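
The stage marking and periodic abnormal file processing recited in claims 2 to 6 and 8 can be illustrated with a minimal in-memory sketch. It is not the patented implementation: the names `Stage`, `BlobFile`, `Engine`, `flush`, `begin_compact`, `gc_pass`, the `crash_before_register` flag, and the `.new` suffix are all hypothetical, files are modeled as dictionary entries rather than on-disk objects, and a "valid" SST file is assumed to mean one whose version has been registered.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Set, Tuple


class Stage(Enum):
    INTERMEDIATE = "intermediate"  # corresponding SST version not (yet) registered
    FINAL = "final"                # corresponding SST version registered


@dataclass
class BlobFile:
    name: str
    sst_name: str                      # SST file this Blob file corresponds to
    stage: Stage = Stage.INTERMEDIATE  # a new Blob file starts in the intermediate stage


@dataclass
class Engine:
    blobs: Dict[str, BlobFile] = field(default_factory=dict)
    registered_ssts: Set[str] = field(default_factory=set)

    def flush(self, blob_name: str, sst_name: str,
              crash_before_register: bool = False) -> BlobFile:
        """Claim 3: mark intermediate on Blob creation, final after SST registration."""
        blob = BlobFile(blob_name, sst_name)
        self.blobs[blob_name] = blob
        if crash_before_register:
            # Simulated failure: the SST version is never registered,
            # so the Blob file stays in the intermediate stage.
            return blob
        self.registered_ssts.add(sst_name)
        blob.stage = Stage.FINAL
        return blob

    def begin_compact(self, sst_name: str) -> None:
        """Claim 4: switch the Blob files of the SST being compacted to intermediate."""
        for blob in self.blobs.values():
            if blob.sst_name == sst_name:
                blob.stage = Stage.INTERMEDIATE

    def gc_pass(self) -> Tuple[List[str], List[str]]:
        """Claims 5, 6 and 8: one pass over the Blob files in the intermediate stage."""
        deleted: List[str] = []
        recreated: List[str] = []
        for blob in list(self.blobs.values()):  # snapshot, the dict is mutated below
            if blob.stage is not Stage.INTERMEDIATE:
                continue
            if blob.sst_name not in self.registered_ssts:
                # No valid SST file refers to this Blob file: delete it.
                del self.blobs[blob.name]
                deleted.append(blob.name)
            else:
                # A valid SST file exists: recreate the Blob file (claim 8 variant).
                new_blob = BlobFile(blob.name + ".new", blob.sst_name)
                self.blobs[new_blob.name] = new_blob  # created in intermediate stage
                new_blob.stage = Stage.FINAL          # associated with the target SST
                del self.blobs[blob.name]
                recreated.append(new_blob.name)
        return deleted, recreated
```

In this sketch a single call to `gc_pass` stands in for one iteration of the GC Loop program of claim 9; a real engine would run it on a timer and would have to persist the stage information so that it survives a restart.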

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010261689.9A CN113495871B (en) 2020-04-04 2020-04-04 File management method and device based on LSM-Tree storage engine
PCT/CN2021/085631 WO2021197493A1 (en) 2020-04-04 2021-04-06 File management method and apparatus based on lsm-tree storage engine

Publications (2)

Publication Number Publication Date
CN113495871A (en) 2021-10-12
CN113495871B (en) 2023-06-23

Family

ID=77995177

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114817263A (en) * 2022-04-28 2022-07-29 北京达佳互联信息技术有限公司 Data processing method and device, electronic equipment and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105138622A (en) * 2015-08-14 2015-12-09 中国科学院计算技术研究所 Append operation method for LSM tree storage system, and read and merge methods for the load of the append operation
CN106156070A (en) * 2015-03-31 2016-11-23 华为技术有限公司 Query method, file merging method and related apparatus
CN108052643A (en) * 2017-12-22 2018-05-18 北京奇虎科技有限公司 Data storage method, device and storage engine based on LSM Tree structure
CN109032501A (en) * 2017-06-12 2018-12-18 爱思开海力士有限公司 Storage system and operating method thereof
CN109271343A (en) * 2018-07-24 2019-01-25 华为技术有限公司 Data merging method and device applied to a key-value storage system
US20200019331A1 (en) * 2017-03-22 2020-01-16 Huawei Technologies Co., Ltd. File Merging Method and Controller
US20200097498A1 (en) * 2018-09-24 2020-03-26 Salesforce.Com, Inc. Upgrading a database from a first version to a second version

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
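梁君健: "Design and Implementation of a Distributed Storage System Based on Spatio-Temporal Characteristic Data", China Master's Theses Full-text Database, Information Science and Technology *
陈陆: "Research and Implementation of a Distributed Key-Value Storage Engine", China Master's Theses Full-text Database, Information Science and Technology *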

Also Published As

Publication number Publication date
CN113495871B (en) 2023-06-23

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant