CN116069681A

CN116069681A - Disk space recovery method and device, electronic equipment and storage medium

Info

Publication number: CN116069681A
Application number: CN202211715651.XA
Authority: CN
Inventors: 刘海军; 杨光
Original assignee: Wuhan Os Easy Cloud Computing Co ltd
Current assignee: Wuhan Os Easy Cloud Computing Co ltd
Priority date: 2022-12-29
Filing date: 2022-12-29
Publication date: 2023-05-05

Abstract

The application relates to a disk space recovery method and device, electronic equipment, and a storage medium, in the technical field of distributed storage optimization. The method comprises the following steps: acquiring a data block failure instruction issued by a file system layer; generating a corresponding failure key value based on the sector address and the sector length; and traversing the b+tree in the cache disk and, if the failure key value exists on a node of the b+tree, recovering the failed data block corresponding to the failure key value on the cache disk. With this method, the front-end cache disk can be notified in advance, under normal working conditions, of which of its data blocks have failed, so that the garbage collection driver can recover the failed data blocks in the cache disk ahead of time. This avoids copying data that the operating system has already defined as invalid when free space in the cache disk is merged, thereby reducing the amount of data copied during garbage collection and improving the efficiency of recovering failed data on the disk.

Description

Disk space recovery method and device, electronic equipment and storage medium

Technical Field

The application relates to the technical field of distributed storage optimization, in particular to a disk space recycling method, a device, electronic equipment and a storage medium.

Background

Bcache is a cache system in the block device layer of the Linux kernel. It uses one or more (>= 1) high-speed block devices (generally solid state disks) as caches for a plurality of low-speed block devices (generally mechanical hard disks). Here the SSD device is the high-speed hard disk (solid state disk), and sda to sdn are the low-speed hard disks (mechanical hard disks). The SSD serves as the cache disk and provides cache service for the low-speed hard disks at the back end. Bcache first writes IO data from the application layer to the cache device (high-speed block device) and can later write the data on the cache device back to the low-speed block device at the back end, thereby achieving higher write performance.

The cache disk in Bcache is generally a high-speed SSD, and the back-end storage disk is an HDD mechanical disk. The characteristics of an SSD mean that it cannot work like an ordinary HDD. When a file is deleted in the operating system, the system does not actually delete the file's data; it simply marks the addresses occupied by the data as 'empty' so that they can be overwritten later. This is purely a file-system-level operation: the hard disk itself does not know that the data at those addresses has been 'invalidated' until the system asks it to write new data at those addresses. This causes no problem for an HDD, because an HDD allows overwriting in place. An SSD, however, does not allow overwriting; it must erase before it writes. To keep 'free' flash space available for writing, the SSD must therefore perform garbage collection (GC). In normal operation the SSD cannot know in advance that 'deleted' data pages are 'invalid'; it only learns that the data can be erased when the system requests a write to the same location. As a result, it cannot optimize at the most appropriate time, which affects both GC efficiency (and thus, indirectly, performance) and SSD lifetime.
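The cost described above can be made concrete with a toy model. The following is a simplified sketch, not a model of any real SSD: it assumes a flash block whose pages must all be copied out before the block is erased unless the drive has been told they are invalid. The function name `pages_copied_by_gc` and its parameters are hypothetical.

```python
def pages_copied_by_gc(written_pages, deleted_by_fs, discarded):
    """Count pages GC must copy out of one flash block before erasing it.

    written_pages: set of page numbers holding data in the block
    deleted_by_fs: pages the file system has marked free (file deleted)
    discarded:     pages the drive has been told are invalid
    """
    # The drive only knows about invalidations it was told about.
    known_invalid = deleted_by_fs & discarded
    still_valid_for_drive = written_pages - known_invalid
    return len(still_valid_for_drive)


block = set(range(8))        # 8 pages, all written
deleted = {0, 1, 2, 3, 4}    # the file system deleted the files on 5 pages

# Without a notification the drive copies all 8 pages, 5 of which
# the operating system already considers invalid.
without_notice = pages_copied_by_gc(block, deleted, discarded=set())

# With a notification only the 3 live pages are copied.
with_notice = pages_copied_by_gc(block, deleted, discarded=deleted)
```

In this toy case the notification removes five of the eight page copies, which is the kind of saving the application is aiming for.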

Therefore, how to efficiently recycle the invalid storage space in the Cache disk in Bcache is a problem which needs to be solved at present.

Disclosure of Invention

The application provides a disk space recovery method, a disk space recovery device, electronic equipment and a storage medium, which can efficiently recover failed space in the cache disk in Bcache.

To achieve the above object, the present application provides the following aspects.

In a first aspect, the present application provides a disk space reclamation method, the method including the steps of:

acquiring a data block failure instruction issued by a file system layer; the data block failure instruction comprises a sector address and a sector length of a rear-end storage disk;

generating a corresponding failure key value based on the sector address and the sector length;

traversing the b+tree in the cache disk, and recovering the invalid data block corresponding to the invalid key value on the cache disk if the invalid key value exists on the node of the b+tree.

Further, before the data block failure instruction issued by the file system layer is obtained, the method comprises the following steps:

building a b+tree in the cache disk based on the data correspondence between the back-end storage disk and the cache disk;

acquiring a failure notification instruction issued by a file system layer;

and judging that the failure notification instruction is a data block failure instruction based on the attribute of the failure notification instruction.

Further, the determining that the invalidation notification instruction is a data block invalidation instruction based on the attribute of the invalidation notification instruction includes the following steps:

determining a target node on the b+tree, and setting the target node as a command attribute check point;

checking whether the attribute of the failure notification instruction is REQ_OP_DISCARD by using the command attribute check point;

if yes, judging that the failure notification instruction is a data block failure instruction.
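The attribute check in the steps above can be sketched as follows. This is a minimal illustration, assuming a request object that carries an operation code; `ReqOp`, `Notification` and `is_data_block_failure` are hypothetical names, and the numeric values are placeholders rather than values taken from the kernel.

```python
from dataclasses import dataclass
from enum import IntEnum


class ReqOp(IntEnum):
    # Illustrative stand-ins for request operation codes;
    # the numeric values are placeholders chosen for this sketch.
    READ = 0
    WRITE = 1
    DISCARD = 3


@dataclass(frozen=True)
class Notification:
    op: ReqOp
    sector: int       # start sector on the back-end storage disk
    nr_sectors: int   # length in sectors


def is_data_block_failure(req: Notification) -> bool:
    """Command attribute check point: only DISCARD requests qualify."""
    return req.op == ReqOp.DISCARD
```

Only requests that pass this check would go on to key generation; all other requests would be left to the normal IO path.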

Further, if the failure key value exists on a node of the b+tree, recovering the failed data block corresponding to the failure key value on the cache disk comprises the following steps:

acquiring a root node on the b+tree, and traversing all nodes on the b+tree from the root node;

if the failure key value exists in the node of the b+tree, acquiring a data block address corresponding to the failure key value on the cache disk;

and carrying out garbage collection on the data blocks existing on the data block addresses.

Further, the method further comprises:

and if the failure key value does not exist in the node of the b+tree, not performing garbage collection on the storage data block in the cache disk.

Further, the garbage collection of the data block existing at the data block address includes the following steps:

acquiring a second invalid data block address on the cache device based on a first invalid data block address which corresponds to the failure key value and is located on the back-end storage disk;

generating an updated invalidation key value based on the second invalidation data block address;

if the updated invalid key value exists on the b+tree, merging the invalid data block address corresponding to the invalid key value with the original data block address on the b+tree;

encapsulating the combined second data block address into a bio command;

and calling a garbage recycling mechanism to recycle the invalid data block corresponding to the second invalid data block address based on the bio command.

Further, before the combined second data block address is encapsulated into the bio command, the method comprises the following steps:

when the timer reaches its trigger time, judging whether the b+tree is empty or not;

and if the b+tree is empty, not performing garbage collection operation on the data on the cache disk.

In a second aspect, the present application provides a disk space reclamation apparatus, the apparatus comprising:

the failure instruction acquisition module is used for acquiring a data block failure instruction issued by the file system layer; the data block failure instruction comprises a sector address and a sector length of a rear-end storage disk;

the key value acquisition module is used for generating a corresponding failure key value based on the sector address and the sector length;

and the recovery module is used for traversing the b+tree in the cache disk, and if the invalid key value exists on a node of the b+tree, recovering the invalid data block corresponding to the invalid key value on the cache disk.
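One way to picture how the three modules fit together is the sketch below. It is an assumption-laden illustration, not the apparatus itself: a plain dictionary stands in for the b+tree, and the class name `DiskSpaceReclaimer`, its constructor arguments, and the removal of the key after collection are all hypothetical choices made for this example.

```python
class DiskSpaceReclaimer:
    """Wires the three modules of the apparatus together (illustrative only)."""

    def __init__(self, make_key, index, collect):
        self.make_key = make_key   # key value acquisition module
        self.index = index         # stand-in for the b+tree: key -> cache address
        self.collect = collect     # recovery module's garbage-collection hook

    def on_failure_instruction(self, sector, nr_sectors):
        """Entry point of the failure instruction acquisition module."""
        key = self.make_key(sector, nr_sectors)
        address = self.index.get(key)
        if address is None:
            return False           # key not on the tree: nothing to recover
        self.collect(address)
        del self.index[key]        # assumption: drop the mapping once reclaimed
        return True
```

Passing the key builder and the collection hook in as callables keeps each module replaceable, which mirrors the module and sub-module split used in the text.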

Further, the failure instruction obtaining module further includes:

the b+tree construction submodule is used for constructing the b+tree in the cache disk based on the data corresponding relation between the rear-end storage disk and the cache disk;

the failure notification instruction acquisition sub-module is used for acquiring a failure notification instruction issued by the file system layer;

and the judging sub-module is used for judging that the failure notification instruction is a data block failure instruction based on the attribute of the failure notification instruction.

Further, the judging submodule includes:

a checkpointing unit for determining a target node on the b+tree and setting it as a command attribute checkpoint;

an attribute judging unit for checking, using the command attribute check point, whether the attribute of the failure notification instruction is REQ_OP_DISCARD;

and the judging unit is used for judging that the failure notification instruction is a data block failure instruction if its attribute is REQ_OP_DISCARD.

Further, the recycling module includes:

the traversal submodule is used for acquiring a root node on the b+tree and traversing all nodes on the b+tree from the root node;

the address analysis sub-module is used for acquiring a data block address corresponding to the failure key value on the cache disk if the failure key value exists in the node of the b+tree;

and the garbage recycling sub-module is used for recycling garbage of the data blocks existing on the data block addresses.

Further, the garbage collection sub-module includes:

the address acquisition unit is used for acquiring a second invalid data block address on the cache device based on a first invalid data block address which corresponds to the failure key value and is located on the back-end storage disk;

a key value updating unit, configured to generate an updated failure key value based on the second failure data block address;

a merging unit, configured to merge, if the updated failure key value exists on the b+tree, a failure data block address corresponding to the failure key value with an original data block address on the b+tree;

the packaging unit is used for packaging the combined second data block address into a bio command;

and the calling unit is used for calling a garbage recycling mechanism to recycle the invalid data block corresponding to the second invalid data block address based on the bio command.

Further, the packaging unit further includes:

an empty set judging unit, configured to judge whether the b+tree is empty when the timer reaches its trigger time;

and the first operation unit is used for not performing garbage collection operation on the data on the cache disk if the b+tree is empty.

The beneficial effects of the technical solution provided by the present application include:

The cache disk controller acquires a data block failure instruction issued by a file system layer, the data block failure instruction comprising a sector address and a sector length of a back-end storage disk; generates a corresponding failure key value based on the sector address and the sector length; and traverses the b+tree in the cache disk and, if the failure key value exists on a node of the b+tree, recovers the failed data block corresponding to the failure key value on the cache disk. With this method, the front-end cache disk can be notified in advance, under normal working conditions, of which of its data blocks have failed, so that the garbage collection driver can recover the failed data blocks in the cache disk ahead of time. This avoids copying data that the operating system has already defined as invalid when free space in the cache disk is merged, thereby reducing the amount of data copied during garbage collection and improving the efficiency of recovering failed data on the disk.

Drawings

In order to illustrate the technical solutions of the embodiments of the present application more clearly, the drawings needed in the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application; a person skilled in the art may obtain other drawings from them without inventive effort.

FIG. 1 is a flowchart illustrating steps for disk space reclamation provided in an embodiment of the present application;

FIG. 2 is a flowchart illustrating the steps for disk space reclamation provided in another embodiment of the present application.

Detailed Description

To make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions of the embodiments are described clearly and completely below with reference to the drawings. The described embodiments are some, but not all, embodiments of the present application. All other embodiments obtained by one of ordinary skill in the art from the embodiments herein without inventive effort fall within the scope of the present application.

Embodiments of the present application are described in further detail below with reference to the accompanying drawings.

Referring to fig. 1, an embodiment of the present application provides a disk space recycling method, which includes the following steps:

S1, acquiring a data block failure instruction issued by a file system layer;

A file system is a mechanism for organizing and managing files on a storage device. It will be appreciated that different ways of organizing and managing files on a storage device result in different types of file systems.

Bcache is a block device cache system in the Linux kernel. Its basic function is to use a solid state disk (the cache device) as a cache for a mechanical hard disk (the back-end device); that is, Bcache acts as a cache at the kernel block device layer of the Linux operating system. By way of comparison, a cache memory is a small but high-speed memory located between the CPU and the main memory (DRAM), typically built from static memory, and is used to increase the speed at which the CPU reads and writes data.

The application scenario of the Bcache cache system is as follows: an SSD is used as a cache layer in front of HDDs with slower IO speed, thereby improving the IO speed of the HDDs. One cache disk (SSD) may provide caching for multiple back-end disks (HDDs) simultaneously. As a cache, Bcache needs a cache policy, and it supports three: writeback, the write-back policy, in which all data is first written to the cache disk and the system later writes it back to the back-end data disk; writethrough, the write-through policy (the default), in which data is written to the cache disk and the back-end data disk at the same time; and writearound, in which data is written directly to the back-end disk.
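The difference between the three policies comes down to which device receives a write and when. The sketch below is a simplified illustration under that reading; `route_write` and the keys of the returned dictionary are hypothetical names, not part of Bcache.

```python
def route_write(policy: str) -> dict:
    """Return which devices receive a write under each cache policy."""
    if policy == "writeback":
        # Cache first; the back-end disk is updated later by write-back.
        return {"cache": True, "backend": False, "deferred_backend": True}
    if policy == "writethrough":
        # Cache and back-end disk are written together.
        return {"cache": True, "backend": True, "deferred_backend": False}
    if policy == "writearound":
        # The write bypasses the cache and goes straight to the back-end disk.
        return {"cache": False, "backend": True, "deferred_backend": False}
    raise ValueError(f"unknown cache policy: {policy}")
```

Only the first two policies place new data on the cache disk at write time, so they are the ones under which failed blocks accumulate there directly from writes.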

Specifically, the back-end disk controller builds a b+tree in the cache disk based on the data correspondence between the back-end storage disk and the cache disk; acquiring a failure notification instruction issued by a file system layer; based on the attribute of the invalidation notification instruction, the invalidation notification instruction is determined to be a data block invalidation instruction.

S2, generating a corresponding failure key value based on the sector address and the sector length;

It will be appreciated that the disk drive reads and writes data on the back-end storage disk in units of sectors. On the back-end storage disk, the DOS operating system allocates disk space to files in units of "clusters". A cluster usually consists of several sectors; the exact number depends on the type of disk, the DOS version, and the size of the hard disk partition. Each cluster can be occupied by only one file, even if the file is only a few bytes long; two or more files are never allowed to share a cluster, since this would corrupt data. Using the cluster as the minimum allocation unit makes data management on the back-end storage disk relatively easy, but it also wastes disk space. With a large number of small files in particular, the waste on a gigabyte-scale hard disk can reach hundreds of megabytes.
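The scale of this waste can be checked with a short calculation. The sketch below assumes a hypothetical workload of 100,000 files of 100 bytes each on 4 KiB clusters; these figures and the name `slack_bytes` are illustrative and do not come from the application.

```python
def slack_bytes(file_size: int, cluster_size: int) -> int:
    """Bytes allocated to a file but not used by it."""
    if file_size == 0:
        return 0
    clusters = -(-file_size // cluster_size)   # ceiling division
    return clusters * cluster_size - file_size


# 100,000 files of 100 bytes each on 4 KiB clusters (hypothetical workload)
wasted = 100_000 * slack_bytes(100, 4096)
wasted_mib = wasted / (1024 * 1024)
```

Each small file leaves 3,996 bytes of its cluster unused, so the total comes to roughly 381 MiB, consistent with the "hundreds of megabytes" mentioned above.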

A disk buffer, or disk cache, stores downloaded data in a memory space that the system allocates to software (this memory space is called a memory pool). When the data held in the memory pool reaches a certain amount, it is written to the cache disk. This reduces the number of actual read and write operations on the disk and effectively protects the disk from damage caused by repeated reads and writes. The purpose of the disk cache is to reduce the number of times the CPU reads the back-end storage disk through I/O and to improve the read and write efficiency of the back-end storage disk.

Specifically, when the back-end disk controller determines in step S1 that the failure notification instruction issued by the file system layer is a data block failure instruction, it parses from that instruction the sector address and the sector length of the failed data block on the back-end storage disk, and generates the corresponding failure key value from the sector address and the sector length.
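Key generation can be sketched as below. The application does not specify the key layout, so this is an assumption: the key simply records the back-end disk, the start sector, and the sector count. `FailureKey`, `make_failure_key` and the `backend_id` field are hypothetical names.

```python
from typing import NamedTuple


class FailureKey(NamedTuple):
    backend_id: int   # which back-end storage disk (hypothetical field)
    start: int        # first failed sector on the back-end disk
    length: int       # number of failed sectors

    @property
    def end(self) -> int:
        """One past the last failed sector."""
        return self.start + self.length


def make_failure_key(backend_id: int, sector: int, nr_sectors: int) -> FailureKey:
    """Build a failure key from a sector address and a sector length."""
    if sector < 0 or nr_sectors <= 0:
        raise ValueError("sector must be >= 0 and nr_sectors must be > 0")
    return FailureKey(backend_id, sector, nr_sectors)


key = make_failure_key(backend_id=0, sector=2048, nr_sectors=16)
```

Because the key is an ordinary tuple it is hashable and ordered, so it can be stored in a dictionary or in a sorted structure standing in for the b+tree.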

S3, traversing the b+tree in the cache disk, and if the invalid key value exists on a node of the b+tree, recovering the invalid data block corresponding to the invalid key value on the cache disk.

The back-end disk controller obtains a second invalid data block address on the cache device based on a first invalid data block address, on the back-end storage disk, that corresponds to the failure key value; generates an updated failure key value based on the second invalid data block address; if the updated failure key value exists on the b+tree, merges the invalid data block address corresponding to the failure key value with the original data block address on the b+tree; encapsulates the merged second data block address into a bio command; and, based on the bio command, invokes the garbage collection mechanism to recover the invalid data block at the second invalid data block address.

In an embodiment of the application, step S1 includes:

b+tree is built in the cache disk based on the data corresponding relation between the rear-end storage disk and the cache disk; acquiring a failure notification instruction issued by a file system layer; based on the attribute of the invalidation notification instruction, the invalidation notification instruction is determined to be a data block invalidation instruction.

It can be appreciated that Bcache uses a b+tree, which is a multi-way search tree, to maintain the mapping between data on the cache device and data on the back-end device. There are four kinds of b+tree operations: searching, traversing, inserting and arranging. Writing data to the cache device inserts elements into the b+tree; reading data from the cache device looks up elements in the b+tree.
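The insert-on-write and lookup-on-read behavior can be illustrated with a much simpler structure than a real b+tree. In the sketch below a sorted list stands in for the tree, and extents are assumed not to overlap; `ExtentMap` and its methods are hypothetical names used only for illustration.

```python
import bisect


class ExtentMap:
    """Sorted map from back-end sector extents to cache-disk addresses."""

    def __init__(self):
        self._starts = []    # sorted back-end start sectors
        self._extents = {}   # start -> (length, cache_address)

    def insert(self, start, length, cache_address):
        """Called when data is written to the cache device."""
        if start not in self._extents:
            bisect.insort(self._starts, start)
        self._extents[start] = (length, cache_address)

    def lookup(self, sector):
        """Called on a read: return the cache address of `sector`, or None."""
        i = bisect.bisect_right(self._starts, sector) - 1
        if i < 0:
            return None
        start = self._starts[i]
        length, cache_address = self._extents[start]
        if sector < start + length:
            return cache_address + (sector - start)
        return None
```

A lookup that returns `None` corresponds to a cache miss: the sector has no copy on the cache device.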

Because the addresses of stored data in the cache device correspond to addresses in the back-end storage disk according to a fixed rule (generally, an address in the back-end storage disk is associated with the address of the stored data in the cache device as a mirror image), the b+tree is constructed in the cache disk from this relationship between addresses in the back-end storage disk and in the cache device.

The cache disk controller acquires an invalidation notification instruction issued by a file system layer, determines a target node on a b+tree, and sets the target node as a command attribute check point; checking whether the attribute of the failure notification instruction is REQ_OP_DISCARD by using a command attribute check point; if so, the invalidation notification instruction is judged to be a data block invalidation instruction.

In an embodiment of the application, step S3 includes:

S301, acquiring a root node on the b+tree, and traversing all nodes on the b+tree from the root node;

S302, if the failure key value exists in a node of the b+tree, acquiring the data block address corresponding to the failure key value on the cache disk;

In another embodiment, if the failure key value does not exist in any node of the b+tree, garbage collection is not performed on the stored data blocks in the cache disk.

S303, garbage collection is carried out on the data blocks existing on the data block addresses.

The cache disk controller obtains a second invalid data block address on the cache device based on a first invalid data block address, on the back-end storage disk, that corresponds to the failure key value; generates an updated failure key value based on the second invalid data block address; if the updated failure key value exists on the b+tree, merges the invalid data block address corresponding to the failure key value with the original data block address on the b+tree; encapsulates the merged second data block address into a bio command; and, based on the bio command, invokes the garbage collection mechanism to recover the invalid data block at the second invalid data block address.
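Steps S301 to S303 can be sketched on a toy tree as follows. This is a simplified illustration, not Bcache's on-disk b+tree: nodes are plain objects, keys are matched exactly, and the `collect` callable stands in for the garbage collection mechanism. `Node`, `find_cache_address` and `reclaim` are hypothetical names.

```python
class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or {}          # failure-key tuple -> cache address
        self.children = children or []  # child nodes (empty for a leaf)


def find_cache_address(root, failure_key):
    """Walk the nodes from the root; return the cache address on a hit."""
    stack = [root]
    while stack:
        node = stack.pop()
        if failure_key in node.keys:
            return node.keys[failure_key]   # hit (S302)
        stack.extend(node.children)
    return None                             # miss: nothing to reclaim


def reclaim(root, failure_key, collect):
    """S301 to S303: traverse, resolve the address, then garbage-collect it."""
    address = find_cache_address(root, failure_key)
    if address is None:
        return False
    collect(address)
    return True
```

A full traversal is used here because the text speaks of traversing all nodes; an ordered descent by key comparison would be the more usual way to search a b+tree.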

In an embodiment of the present application, as shown in fig. 2, a disk space recycling method is provided, and the method includes the following steps:

A1, constructing a new b+tree, and marking the new b+tree as SSD-b+tree;

A2, setting a timer;

A3, selecting the moment at which the SSD is notified of failed data blocks: the REQ_OP_DISCARD command issued by the file system layer reveals which blocks on the SSD the file system layer already regards as failed. A check point is therefore set in the Bcache data writing flow to check whether the attribute of the command issued by the file system is REQ_OP_DISCARD; if so, the method provided by the present application is entered, and otherwise it is not;

A4, copying the REQ_OP_DISCARD attribute command;

A5, initializing a tree operation lock; converting the sector address and the sector length of the back-end HDD device carried by the REQ_OP_DISCARD command into a KEY in Bcache; acquiring the root node of the b+tree in Bcache; searching from the root node of the b+tree; and checking whether the KEY corresponding to the REQ_OP_DISCARD command is on the b+tree. If it is not on the b+tree (a miss), the HDD failed block data corresponding to the REQ_OP_DISCARD command is not on the SSD. If it is on the b+tree (a hit), the HDD failed block data corresponding to the REQ_OP_DISCARD command is on the SSD, and the failed data block on the SSD needs to be reclaimed;

A6, mapping the HDD disk block address corresponding to the hit REQ_OP_DISCARD command to an address on the SSD through the updated KEY; converting the resulting SSD address into a new KEY; adding the updated KEY to the SSD-b+tree; and searching the SSD-b+tree for the new KEY: if the SSD address of the new KEY coincides with the SSD address of an original KEY on the SSD-b+tree, the two KEYs are merged so that the SSD data blocks are merged, and if it does not coincide, the new KEY is added to the SSD-b+tree;
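The merging in step A6 behaves like interval merging. The sketch below is one possible reading, with a sorted list of (start, length) pairs standing in for the SSD-b+tree; treating adjacent extents as coinciding is an assumption of this example, and `add_and_merge` is a hypothetical name.

```python
def add_and_merge(extents, new):
    """Add `new` (start, length) to a sorted list of disjoint SSD extents.

    Extents that overlap or touch the new one are merged into a single
    extent. The input list is assumed to hold disjoint, non-adjacent extents.
    """
    new_start, new_end = new[0], new[0] + new[1]
    merged = []
    for start, length in extents:
        end = start + length
        if end < new_start or start > new_end:
            merged.append((start, length))        # no contact: keep as is
        else:
            new_start = min(new_start, start)     # coincides: absorb it
            new_end = max(new_end, end)
    merged.append((new_start, new_end - new_start))
    merged.sort()
    return merged
```

Merging before submission means fewer, larger ranges are handed to the SSD when the timer fires.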

A7, when the timer is triggered, checking whether the SSD-b+tree is empty: if it is not empty, packaging the SSD addresses corresponding to the KEYs on the SSD-b+tree into a plurality of bio commands, and if it is empty, performing no operation; and issuing the packaged bio commands to the SSD driver through the block device layer, so that the SSD invokes GC to recover the failed data blocks according to the packaged bio commands.
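The timer-driven step A7 can be sketched as a flush routine. This is an illustrative sketch only: a list of (start, length) extents stands in for the SSD-b+tree, a dictionary stands in for a bio command, and splitting long extents by a `max_sectors_per_bio` limit is an assumption made here to show why several commands may be produced. All names are hypothetical.

```python
def flush_on_timer(pending_extents, max_sectors_per_bio, submit):
    """Run when the timer fires: package pending SSD extents and submit them.

    pending_extents: list of (start, length) extents awaiting reclamation
    submit:          callable standing in for issuing a command to the SSD driver
    Returns the number of commands issued.
    """
    if not pending_extents:
        return 0                       # tree is empty: do nothing
    issued = 0
    for start, length in pending_extents:
        # Split long extents so no single command exceeds the limit.
        offset = 0
        while offset < length:
            chunk = min(max_sectors_per_bio, length - offset)
            submit({"op": "discard", "sector": start + offset, "nr_sectors": chunk})
            issued += 1
            offset += chunk
    pending_extents.clear()            # everything pending has been handed off
    return issued
```

Batching on a timer rather than acting on every notification lets neighboring ranges be merged first, at the cost of a short delay before space is reclaimed.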

In this embodiment, the front-end cache disk can be "notified" in advance, under normal working conditions, of which of its data blocks have failed, so that the garbage collection driver recovers the failed data blocks in the cache disk ahead of time. This avoids copying data in the cache disk that the operating system has already defined as invalid into free blocks when free space is merged, thereby reducing the amount of data copied during garbage collection and improving the efficiency of recovering failed data on the disk.

It should be noted that, step numbers of each step in the embodiments of the present application do not limit the order of each operation in the technical solution of the present application.

In a second aspect, an embodiment of the present application provides a disk space recovery apparatus, where the apparatus includes:

The back-end disk controller builds a b+tree in the cache disk based on the data corresponding relation between the back-end storage disk and the cache disk; acquiring a failure notification instruction issued by a file system layer; based on the attribute of the invalidation notification instruction, the invalidation notification instruction is determined to be a data block invalidation instruction.

And when the rear-end disk controller judges that the attribute of the failure notification instruction issued by the file system layer is a data block failure instruction in the step S1, resolving the sector address and the sector length of the failure data block in the rear-end storage disk from the data block failure instruction, and generating a failure key value corresponding to the sector address according to the sector address and the sector length.

In an application implementation, the failure instruction obtaining module further includes:

In an application implementation, the judging sub-module includes:

In one application implementation, the recovery module includes:

In an application implementation, the garbage collection sub-module includes:

In an application implementation, the packaging unit further includes:

It should be noted that the disk space recovery device provided in the embodiments of the present application addresses the same technical problems, uses the same technical means, and achieves the same technical effects as the disk space recovery method, on the same principle.

In a third aspect, embodiments of the present application provide a storage medium having stored thereon a computer program which, when executed by a processor, implements the disk space reclamation method mentioned in the first aspect.

In a fourth aspect, an embodiment of the present application provides an electronic device, including a memory and a processor, where the memory stores a computer program running on the processor, and the processor implements the disk space reclamation method mentioned in the first aspect when executing the computer program.

It should be noted that in this application, relational terms such as "first" and "second" are used solely to distinguish one entity or action from another, without necessarily requiring or implying any actual relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.

The foregoing is merely a specific embodiment of the application to enable one skilled in the art to understand or practice the application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method for recovering disk space, the method comprising the steps of:

2. The method for recovering disk space according to claim 1, wherein before said obtaining the data block failure instruction issued by the file system layer, the method comprises the steps of:

acquiring a failure notification instruction issued by a file system layer;

3. The disk space reclamation method as recited in claim 2, wherein the determining that the invalidation notification instruction is a data block invalidation instruction based on an attribute of the invalidation notification instruction comprises the steps of:

4. The method for reclaiming disk space according to claim 2, wherein, if the invalidation key exists on a node of the b+tree, reclaiming the invalidation data block corresponding to the invalidation key on the cache disk comprises the following steps:

5. The disk space reclamation method as recited in claim 4, further comprising:

6. The disk space reclaiming method as claimed in claim 4, wherein the garbage reclaiming of the data blocks existing at the data block address comprises the steps of:

encapsulating the combined second data block address into a bio command;

7. The disk space reclamation method as recited in claim 6, wherein before said encapsulating the merged second data block address into a bio command, the method comprises the steps of:

8. A disk space reclamation apparatus, the apparatus comprising:

9. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method according to any of claims 1 to 7 when executing the computer program.

10. A computer readable storage medium storing a computer program, characterized in that the computer program when executed by a processor implements the method according to any one of claims 1 to 7.