CN111597147A - Space recovery method, device, storage medium and processor - Google Patents



Publication number
CN111597147A
Authority
CN
China
Prior art keywords
space
merged
objects
determining
statistical
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010366322.3A
Other languages
Chinese (zh)
Other versions
CN111597147B (en)
Inventor
张宏瑞
鲁加福
张旭明
王豪迈
胥昕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Xingchen Tianhe Technology Co ltd
Original Assignee
Xsky Beijing Data Technology Corp ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Xsky Beijing Data Technology Corp ltd
Priority to CN202010366322.3A
Publication of CN111597147A
Application granted
Publication of CN111597147B
Legal status: Active
Anticipated expiration



Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/13 File access structures, e.g. distributed indices
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/17 Details of further file system functions
    • G06F16/1727 Details of free space management performed by the file system
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0604 Improving or facilitating administration, e.g. storage management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0644 Management of space entities, e.g. partitions, extents, pools

Abstract

The invention discloses a space recovery method, a space recovery device, a storage medium and a processor. The method comprises the following steps:
- Traverse the statistical count objects of a predetermined index pool with a recycle scan, and determine the space usage data of the merged large objects recorded in those statistical count objects.
- Determine the reclaimable space of the data pool associated with the predetermined index pool according to the space usage data.
- Reclaim the reclaimable space of the data pool according to a preset space reclamation rule.

The invention solves a technical problem in the related art: when an object storage system reclaims space with a multithreaded model, resources are consumed on thread switching.

Description

Space recovery method, device, storage medium and processor
Technical Field
The invention relates to the technical field of space reclamation in object storage systems. In particular, it relates to a space recovery method, a space recovery device, a storage medium and a processor.
Background
Existing object storage systems configure a default void rate, generally 50%. The system periodically scans the statistical count objects in the index pool to obtain the space usage of each merged large object in the data pool.

After merged small files are deleted, the system uses the configured void rate to decide whether to add a merged large object to the space recycle list. If the void rate of the currently scanned merged large object is greater than or equal to the threshold, the object is added to the recycle list. Otherwise it is skipped and the next merged large object is processed.

A background space reclamation and compaction service module then takes merged large objects from the recycle list and reclaims their space. While running, multiple merge threads simultaneously occupy the process address space and resources of each space service module to process the reclamation tasks.
The current space reclamation mechanisms have several disadvantages:
1. The merge threads run continuously and consume system resources, which affects service. For example, multiple threads share the address space and resources of the object storage background gateway process to execute different tasks, and each thread consumes CPU resources. On a thread switch, the CPU must save the thread's full execution state, such as the thread number and execution position, before executing another thread. When two or more threads execute at once, contention for resources can cause deadlock, and task execution fails.
2. The processing time of merge-space reclamation is essentially fixed and cannot be adjusted dynamically.
3. Space is reclaimed only after small files have been deleted and the merged large object has reached the specified void rate, which is inflexible.
In view of the above problems, no effective solution has been proposed.
Disclosure of Invention
Embodiments of the invention provide a space recovery method, a space recovery device, a storage medium and a processor. They are intended at least to solve a technical problem in the related art: when an object storage system reclaims space with a multithreaded model, resources are consumed on thread switching.
According to one aspect of the embodiments of the invention, a space reclamation method is provided. The method comprises: traversing the statistical count objects of a predetermined index pool with a recycle scan to determine the space usage data of the merged large objects recorded in those statistical count objects; determining the reclaimable space of the data pool associated with the predetermined index pool according to the space usage data; and reclaiming the reclaimable space of the data pool according to a preset space reclamation rule.
Optionally, before the recycle scan determines the space usage data of the merged large objects, the method comprises: traversing the statistical count objects of the predetermined index pool with an estimate scan to determine the space usage data of the merged large objects recorded in those statistical count objects.
Optionally, determining the space usage data of the merged large objects by the recycle scan comprises: obtaining the invalid space and the total space of each merged large object recorded in the statistical count objects; determining the void rate of the merged large object from its invalid space and total space; and, when the void rate meets a predetermined void rate threshold, adding the merged large object to a merge-space recycle list and determining its occupied space and reclaimable space.
Optionally, adding the merged large object to the merge-space recycle list when the void rate meets the predetermined void rate threshold comprises: determining whether the void rate of the merged large object is greater than or equal to the predetermined void rate threshold; and, if it is, adding the merged large object to the merge-space recycle list.
Optionally, after the merged large object is added to the merge-space recycle list, the method further comprises: determining how many merged large objects have been added to the merge-space recycle list and the total invalid space counted for them; and updating the metadata of the statistical count object according to that number and that total invalid space.
Optionally, the metadata of a statistical count object is a set of key-value pairs organized as a tree. Each key is the name of a merged large object, and the corresponding value includes at least one of the following: the number of small objects, the occupied space, the invalid space, the number of deleted small objects, and the data pool ID.
Optionally, before the invalid space and total space of the merged large objects recorded in the statistical count objects are obtained, the method further comprises: establishing an association between the statistical count objects and the management objects of the predetermined index pool; and obtaining the metadata of the management objects according to that association.
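The metadata update after the recycle list is filled can be sketched as below. The sketch assumes that the statistical count object keeps two summary fields: how many merged large objects were added, and the total invalid space they carry. The field and class names (`StatCountSummary`, `RecycleEntry`) are hypothetical and do not come from the source.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RecycleEntry:
    # Hypothetical recycle-list entry: merged large object name and its invalid bytes.
    name: str
    invalid_bytes: int


@dataclass
class StatCountSummary:
    # Hypothetical summary fields stored in a statistical count object's metadata.
    added_count: int = 0
    invalid_total_bytes: int = 0


def update_stat_summary(summary: StatCountSummary,
                        recycle_list: List[RecycleEntry]) -> StatCountSummary:
    """Recompute the summary from the merged large objects now in the recycle list."""
    summary.added_count = len(recycle_list)
    summary.invalid_total_bytes = sum(e.invalid_bytes for e in recycle_list)
    return summary
```

For example, a list holding `sfm.a` with 300 invalid bytes and `sfm.b` with 700 yields a count of 2 and a total of 1000 bytes.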
Optionally, after updating the metadata of the statistical count object, the method further comprises: updating the metadata of the management object.
Optionally, the metadata of a management object is a set of key-value pairs organized as a tree. Each key is the name of a statistical count object, and the corresponding value includes at least one of the following: the task type, the total number of statistical count objects, the index pool ID, the predetermined void rate threshold, and space reclamation statistics.
Optionally, determining the space usage data of the merged large objects by the estimate scan comprises: obtaining the invalid space and the total space of each merged large object recorded in the statistical count objects; determining the void rate of the merged large object from its invalid space and total space; and, when the void rate meets the predetermined void rate threshold, determining the occupied space and reclaimable space of the merged large object.
Optionally, before the recycle scan determines the space usage data of the merged large objects, the method further comprises: determining the predetermined index pool and the predetermined void rate threshold.
According to another aspect of the embodiments of the invention, a space recovery apparatus is provided. It comprises three modules:
- A first determining module, configured to traverse the statistical count objects of a predetermined index pool with a recycle scan and determine the space usage data of the merged large objects recorded in those statistical count objects.
- A second determining module, configured to determine the reclaimable space of the data pool associated with the predetermined index pool according to the space usage data.
- A reclamation module, configured to reclaim the reclaimable space of the data pool according to a preset space reclamation rule.
According to another aspect of the embodiments of the invention, a storage medium is further provided. The storage medium includes a stored program. When the program runs, the device on which the storage medium resides is controlled to execute any one of the above space reclamation methods.
According to another aspect of the embodiments of the invention, a processor is further provided. The processor is configured to run a program which, when run, executes any one of the above space reclamation methods.
In the embodiments of the invention, a recycle scan traverses the statistical count objects of a predetermined index pool to determine the space usage data of the merged large objects. The reclaimable space of the data pool associated with the predetermined index pool is determined from that data. That reclaimable space is then reclaimed according to a preset space reclamation rule. Space can thus be reclaimed flexibly according to actual needs, which reduces wasted system space and improves space utilization. It also solves the problem in the related art that resources are consumed on thread switching when an object storage system reclaims space with a multithreaded model.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the invention and together with the description serve to explain the invention without limiting the invention. In the drawings:
FIG. 1 is a schematic diagram of a metadata structure of a management object according to an embodiment of the present invention;
FIG. 2 is a diagram of a metadata structure of a statistical count object according to an embodiment of the present invention;
FIG. 3 is a flow chart of a space reclamation method according to an embodiment of the present invention;
FIG. 4 is a flowchart of the estimate scan in a space reclamation method according to an alternative embodiment of the invention;
FIG. 5 is a flowchart of the recycle scan in a space reclamation method according to an alternative embodiment of the invention;
FIG. 6 is a schematic diagram of the logical transitions between scan states in a space reclamation method according to an alternative embodiment of the invention;
FIG. 7 is a schematic diagram of the reclamation flow in a space reclamation method according to an alternative embodiment of the invention;
FIG. 8 is a schematic view of a space reclamation apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used is interchangeable under appropriate circumstances such that the embodiments of the invention described herein are capable of operation in sequences other than those illustrated or described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For ease of description, some terms used in the invention are explained below.
Object storage system: a storage system built on object storage devices, in which the object is the basic unit of data storage. An object combines a file's data part and its metadata part. Each object storage device manages the distribution of object data automatically, and the system provides a flat data structure and concurrent data access.
Small-file merging: small files that meet a certain count or size criterion are merged into one large object. This addresses the high space overhead and poor read/write performance of small files.
Void rate: when some of the small files in a large object produced by small-file merging are deleted, they leave holes in the address space occupied by the large object. These holes are called invalid space. The ratio of the total size of all invalid space to the total space occupied by the large object is called the void rate.
Index pool: stores object metadata. For an object smaller than 1MB, both the data part and the metadata part are stored in the index pool. For an object larger than 1MB, only the metadata is stored there. The index pool is generally created on SSDs, which improves read/write performance for indexes and small objects.
Data pool: stores the data parts of objects larger than 1MB, as well as objects larger than 4MB.
Merged large object: a large object generated by merging the data of multiple small files. It is stored in the data pool under a name beginning with sfm.
Management object: an object in the system resource pool (31 by default) that records the metadata of merged large objects to be reclaimed. The 31 management objects generate 31 reclamation tasks. A coroutine traverses each task object in the 31 task lists, obtains its task type, and processes it according to that type. FIG. 1 is a schematic diagram of the metadata structure of a management object according to an embodiment of the present invention. As shown in FIG. 1, this metadata records a set of fine-grained, tree-structured key-value pairs. Each key is the name of a statistical count object and maps to the corresponding value information.
Statistical count object: an object in the system index pool (1024 by default) that records deletion counts for the small files in merged large objects. FIG. 2 is a schematic diagram of the metadata structure of a statistical count object according to an embodiment of the present invention. As shown in FIG. 2, this metadata also records a set of fine-grained, tree-structured key-value pairs. Each key is the name of a merged large object and maps to the corresponding value information.
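The void rate definition can be written as a formula. The symbols are introduced here only for illustration: $h_1, \dots, h_k$ are the sizes of the $k$ holes left by deleted small files, and $S_{\text{total}}$ is the total space occupied by the merged large object. The block also shows the "greater than or equal to" threshold test that the method applies later:

```latex
r = \frac{\sum_{i=1}^{k} h_i}{S_{\text{total}}},
\qquad
\text{add the merged large object to the recycle list} \iff r \ge r_{\text{th}}
```

As a worked example, take a 4MB merged large object from which two 1MB small files have been deleted. Its void rate is $r = 2\,\text{MB} / 4\,\text{MB} = 0.5$. It therefore just meets a 50% threshold, because the comparison includes equality.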
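The 1MB placement rule for the index pool and data pool can be sketched as a small function. This is a sketch under two assumptions. First, 1MB is taken as 1024 × 1024 bytes. Second, an object of exactly 1MB is treated as large, because the text leaves that boundary case open. The separate 4MB rule for the data pool is not modelled. The pool labels are hypothetical strings.

```python
from typing import Dict

ONE_MB = 1024 * 1024  # assumption: 1MB means 1 MiB


def place_object(size_bytes: int) -> Dict[str, str]:
    """Return which pool holds each part of an object under the 1MB rule.

    Metadata always lives in the index pool. Small objects keep their data there too;
    larger objects keep their data in the data pool. Exactly 1MB is treated as large
    here (an assumption; the text does not say).
    """
    if size_bytes < ONE_MB:
        return {"metadata": "index_pool", "data": "index_pool"}
    return {"metadata": "index_pool", "data": "data_pool"}
```

Routing small objects to SSD-backed index pools is what later makes merging them worthwhile. Merging packs many small data parts into one large object in the data pool.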
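A minimal sketch of small-file merging and of how a deletion leaves a hole follows. The `sfm` name prefix comes from the text. The `.N` numeric suffix, the extent table, and all function names are hypothetical.

```python
import itertools
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

_seq = itertools.count()  # hypothetical counter used to build unique sfm names


@dataclass
class MergedLargeObject:
    name: str
    extents: Dict[str, Tuple[int, int]] = field(default_factory=dict)  # file -> (offset, length)
    total_bytes: int = 0
    invalid_bytes: int = 0
    deleted_count: int = 0


def merge_small_files(files: List[Tuple[str, bytes]]) -> Tuple[MergedLargeObject, bytes]:
    """Concatenate small files into one large object and remember where each one lives."""
    obj = MergedLargeObject(name=f"sfm.{next(_seq)}")
    buf = bytearray()
    for fname, data in files:
        obj.extents[fname] = (len(buf), len(data))
        buf += data
    obj.total_bytes = len(buf)
    return obj, bytes(buf)


def delete_small_file(obj: MergedLargeObject, fname: str) -> None:
    """Deleting a small file leaves its bytes in place as invalid space (a hole)."""
    _, length = obj.extents.pop(fname)
    obj.invalid_bytes += length
    obj.deleted_count += 1
```

The merged buffer is not rewritten on deletion. This is why invalid space accumulates until a reclamation pass compacts the object.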
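The management object's key-value metadata, and the dispatch of tasks by type, can be sketched as follows. The sketch makes three assumptions. A Python dict iterated in sorted key order stands in for the tree structure. The task type labels are hypothetical. A plain loop replaces the coroutine, which is not shown here. Only the default of 31 management objects and the listed value fields come from the text.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

NUM_MANAGEMENT_OBJECTS = 31  # default number of management objects given in the text


@dataclass
class ManagementEntry:
    # Value fields listed for a management object's metadata.
    task_type: str              # hypothetical labels such as "estimate" or "recycle"
    stat_object_count: int      # total number of statistical count objects
    index_pool_id: int
    void_rate_threshold: float
    reclaimed_bytes: int = 0    # stand-in for the "space reclamation statistics"


# One management object's metadata: statistical count object name -> entry.
ManagementMeta = Dict[str, ManagementEntry]


def dispatch_tasks(mgmt_objects: List[ManagementMeta],
                   handlers: Dict[str, Callable[[str, ManagementEntry], None]]) -> int:
    """Visit every task entry in key order and pass it to the handler for its task type."""
    handled = 0
    for meta in mgmt_objects:
        for stat_name in sorted(meta):  # sorted keys approximate tree ordering
            entry = meta[stat_name]
            handlers[entry.task_type](stat_name, entry)
            handled += 1
    return handled
```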
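The statistical count object's per-merged-object record can be modelled as below. The value fields follow the list given for this metadata, and the count of 1024 is the stated default. The CRC32-based assignment of a merged large object to one of those objects, and the `stat.` name prefix, are hypothetical. The text does not say how merged large objects are distributed across statistical count objects.

```python
import zlib
from dataclasses import dataclass
from typing import Dict

NUM_STAT_COUNT_OBJECTS = 1024  # default number of statistical count objects given in the text


@dataclass
class MergedObjectStats:
    # Value fields listed for each merged large object key.
    small_object_count: int
    occupied_bytes: int
    invalid_bytes: int
    deleted_small_objects: int
    data_pool_id: int


# One statistical count object's metadata: merged large object name -> stats.
StatCountMeta = Dict[str, MergedObjectStats]


def stat_object_for(merged_name: str) -> str:
    """Hypothetical assignment of a merged large object to one statistical count object."""
    return f"stat.{zlib.crc32(merged_name.encode()) % NUM_STAT_COUNT_OBJECTS}"


def record_delete(meta: StatCountMeta, merged_name: str, length: int) -> None:
    """Count one deleted small file: its bytes become invalid space."""
    stats = meta[merged_name]
    stats.invalid_bytes += length
    stats.deleted_small_objects += 1
```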
Estimate scan module (Estimate): traverses all statistical count objects and computes statistics on the space usage of the merged large objects. It has four states:
init: the scan is initialized;
running: the scan is in progress;
error: the scan failed and an error is returned;
finish: the scan completed successfully and completion is returned.
Recycle scan module (Recycle): traverses all statistical count objects, obtains the space usage of each merged large object, and calculates its void rate. If the void rate is greater than or equal to the specified threshold, the merged large object is added to the merge-space reclamation task list, where it waits to be processed by a background merge thread. It has six states:
init: the scan is initialized. The global task state is init:
when all task states are init.
Estimate: the scan is in the estimate state. The global task state is estimate:
when all task states are estimate.
Running: the scan is in progress. The global state is running:
when any task is in the running state;
when some, but not all, tasks are in the init state.
error: the scan failed and an error is returned. The global state is error:
when all task states are error;
when every task is in either the finish or the error state, and more tasks are in error than in finish.
finish: the scan completed successfully and completion is returned. The global state is finish:
when all task states are finish;
when every task is in either the finish or the error state, and more tasks are in finish than in error;
when a command forcibly stops (stop) the current scan task, in which case finish is returned.
Pause: suspends the current scan task, returns the pause state, and records the current scan progress. Pause can be executed in any state. After the continue command is executed, scanning resumes from the recorded progress, and the state is the same as before the pause.
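The global-state rules can be expressed as one aggregation function over the per-task states. This is a sketch, not the system's actual logic. It assumes that pause takes precedence over stop, and it returns `None` for combinations the rules do not cover, such as equal numbers of finish and error tasks. The `paused` and `stopped` flags are hypothetical inputs.

```python
from collections import Counter
from typing import Iterable, Optional


def global_state(task_states: Iterable[str], paused: bool = False,
                 stopped: bool = False) -> Optional[str]:
    """Aggregate per-task scan states into the global state following the listed rules."""
    if paused:
        return "pause"       # pause can be executed in any state
    if stopped:
        return "finish"      # a forced stop reports finish
    states = list(task_states)
    if not states:
        return None
    c = Counter(states)
    n = len(states)
    if c["init"] == n:
        return "init"
    if c["estimate"] == n:
        return "estimate"
    if c["running"] > 0 or 0 < c["init"] < n:
        return "running"
    if set(c) <= {"finish", "error"}:  # only finish and error states remain
        if c["error"] > c["finish"]:
            return "error"
        if c["finish"] > c["error"]:
            return "finish"
    return None  # combination not covered by the stated rules
```

For example, 31 tasks all in `init` give `"init"`. The mix `["init", "finish"]` gives `"running"`, because some but not all tasks are still in init.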
Example 1
According to an embodiment of the present invention, an embodiment of a space reclamation method is provided. The steps shown in the flowcharts of the drawings may be executed in a computer system, for example as a set of computer-executable instructions. Although the flowcharts show a logical order, in some cases the steps shown or described may be executed in a different order.
FIG. 3 is a flowchart of a space reclamation method according to an embodiment of the present invention. As shown in FIG. 3, the method includes the following steps:
Step S302: traverse the statistical count objects of a predetermined index pool with a recycle scan, and determine the space usage data of the merged large objects recorded in those statistical count objects.
The space usage data of a merged large object includes, but is not limited to, its occupied space, reclaimable (releasable) space, invalid space, and total space.
The statistical count objects use a key-value metadata structure. Existing object storage systems organize their metadata at a coarse granularity, which prevents detailed statistics on reclaimable space from being obtained in time; the key-value structure avoids this problem.
The recycle scan traverses the statistical count objects of the predetermined index pool. While it determines the space usage data of the merged large objects, the whole scan can be controlled through the defined pause, continue and stop states, which makes the scan process easy to manage. This differs from the prior-art approach of scanning from beginning to end without interruption.
Step S304: determine the reclaimable space of the data pool associated with the predetermined index pool according to the space usage data.
Step S306: reclaim the reclaimable space of the data pool according to a preset space reclamation rule.
The preset space reclamation rule may specify a predetermined time period or time point, or a condition on the current load of the object storage system. Reclaiming space means releasing the reclaimable space.

Specifically, the reclaimable space of the data pool associated with the predetermined index pool may be reclaimed at predetermined intervals starting from the current time. The predetermined period may be expressed in months, days, hours, minutes and so on, for example 30 minutes, two days or 3 months.

If reclamation is not completed within the current predetermined time period, the remaining reclaimable space may be reclaimed in the next predetermined time period.
Optionally, the reclaimable space of the data pool associated with the predetermined index pool is reclaimed according to a predetermined time period and/or the current load of the object storage system. In existing object storage systems, the space reclamation time is single and fixed and cannot be set dynamically according to the actual system load; this option addresses that problem.
Steps S302 to S306 apply to space reclamation in object storage systems. Where large numbers of small files are repeatedly read, written and deleted, the coroutine model completes space reclamation without occupying system thread resources.
Specifically, steps S302 to S306 are executed in coroutines. Coroutines are lighter than threads for three reasons: switching between them has lower overhead, it is controlled entirely by the program, and it happens at user level without the operating system being aware of it. Concurrency can be achieved within a single thread, which makes full use of the CPU while avoiding the CPU overhead of thread switching.

Space scanning can be executed at any time with its state under control. The space freed by object deletion can therefore be released on demand, and new small files can be merged promptly. This reduces wasted system space and improves space utilization.
Through the above steps, a recycle scan traverses the statistical count objects of the predetermined index pool to determine the space usage data of the merged large objects. The reclaimable space of the associated data pool is determined from that data, and this space is reclaimed according to a preset space reclamation rule. Space can thus be reclaimed flexibly according to actual needs, which reduces wasted system space and improves space utilization. It also solves the problem in the related art that resources are consumed on thread switching when an object storage system reclaims space with a multithreaded model.
Optionally, before the recycle scan determines the space usage data of the merged large objects, the method includes: traversing the statistical count objects of the predetermined index pool with an estimate scan to determine the space usage data of the merged large objects recorded in those statistical count objects.
By traversing all statistical count objects, the estimate scan computes statistics on the space usage data of the merged large objects, so their space usage can be obtained promptly and accurately. A specific implementation may use any of three orders:
- run the estimate scan first and the recycle scan afterwards;
- run the recycle scan directly, without an estimate scan;
- run the estimate scan on its own.

The order can be chosen flexibly according to the requirements of the application scenario.
Optionally, determining the space usage data of the merged large objects by the recycle scan comprises: obtaining the invalid space and the total space of each merged large object recorded in the statistical count objects; determining the void rate of the merged large object from its invalid space and total space; and, when the void rate meets the predetermined void rate threshold, adding the merged large object to the merge-space recycle list and determining its occupied space and reclaimable space.
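A pausable scan can be sketched as a scanner that processes one statistical count object per step and records a cursor. The recorded cursor is the "scan progress", so continue resumes at the next object instead of restarting. The class and method names are hypothetical, and a real implementation would run each step inside a coroutine rather than being driven by an external loop.

```python
from typing import List


class ControlledScan:
    """Hypothetical scan over statistical count object names with pause/continue/stop."""

    def __init__(self, stat_objects: List[str]) -> None:
        self.stat_objects = stat_objects
        self.cursor = 0               # recorded scan progress
        self.paused = False
        self.stopped = False
        self.visited: List[str] = []

    def step(self) -> bool:
        """Scan one object; return False when paused, stopped, or finished."""
        if self.paused or self.stopped or self.cursor >= len(self.stat_objects):
            return False
        self.visited.append(self.stat_objects[self.cursor])
        self.cursor += 1
        return True

    def pause(self) -> None:
        self.paused = True

    def resume(self) -> None:  # the "continue" command
        self.paused = False

    def stop(self) -> None:
        self.stopped = True
```

After two steps and a pause, `cursor` stays at 2. Calling `resume()` continues from the third object, which mirrors the requirement that the state after continue matches the state before the pause.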
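One possible reclamation rule combining a time window with a load check is sketched below. The sketch assumes a fixed per-object reclamation cost and a load value normalized to 0..1; both are simplifications not found in the text. Objects not reclaimed in the current window remain queued for the next period, matching the carry-over behavior described above. All names are hypothetical.

```python
from collections import deque
from typing import Deque, List


def reclaim_in_window(pending: Deque[str], window_seconds: float, cost_seconds: float,
                      current_load: float, max_load: float) -> List[str]:
    """Reclaim queued merged large objects within one period's time budget.

    Nothing is reclaimed if the current load exceeds max_load. Anything left in
    `pending` waits for the next period.
    """
    if current_load > max_load:
        return []
    reclaimed: List[str] = []
    budget = window_seconds
    while pending and budget >= cost_seconds:
        reclaimed.append(pending.popleft())
        budget -= cost_seconds
    return reclaimed
```

With a 20-second window, a cost of 10 seconds per object, and three queued objects, two are reclaimed now and the third carries over. If the load is above the limit, the queue is left untouched.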
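The single-thread concurrency claim can be illustrated with Python's `asyncio`. Here `asyncio` stands in for the system's coroutine runtime, which the text does not name. The sketch assumes one coroutine per task, with 31 tasks to match the default number of management objects. Each coroutine yields with `await asyncio.sleep(0)` after every object, so the tasks interleave while all running on one OS thread.

```python
import asyncio
import threading
from typing import List, Set, Tuple


async def scan_task(task_id: int, n_objects: int,
                    trace: List[Tuple[int, int]], threads: Set[int]) -> None:
    """One scan task: process its objects, yielding to the event loop after each one."""
    for i in range(n_objects):
        threads.add(threading.get_ident())
        trace.append((task_id, i))
        await asyncio.sleep(0)  # cooperative, user-level switch point


async def run_all(num_tasks: int = 31,
                  n_objects: int = 3) -> Tuple[List[Tuple[int, int]], Set[int]]:
    trace: List[Tuple[int, int]] = []
    threads: Set[int] = set()
    await asyncio.gather(*(scan_task(t, n_objects, trace, threads)
                           for t in range(num_tasks)))
    return trace, threads


trace, threads = asyncio.run(run_all())
```

The trace shows every task processing its first object before any task processes its second, and `threads` contains a single thread ID. The interleaving therefore happens without any OS thread switch.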
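The estimate scan can be sketched as a read-only pass. It counts the merged large objects at or above the void rate threshold and totals their space, without touching any recycle list. The sketch makes two assumptions. "Occupied space" is taken to be the object's total space, and "reclaimable space" its invalid space; the text does not define either precisely. The `Usage` type is hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass
class Usage:
    invalid_bytes: int
    total_bytes: int


def estimate_scan(stat_objects: Iterable[Dict[str, Usage]],
                  threshold: float) -> Tuple[int, int, int]:
    """Return (candidate count, occupied bytes, reclaimable bytes) for qualifying objects."""
    candidates = occupied = reclaimable = 0
    for meta in stat_objects:
        for usage in meta.values():
            if usage.total_bytes and usage.invalid_bytes / usage.total_bytes >= threshold:
                candidates += 1
                occupied += usage.total_bytes
                reclaimable += usage.invalid_bytes
    return candidates, occupied, reclaimable
```

Because the estimate scan writes nothing, it can run on its own to report how much space a recycle scan would free, or just before a recycle scan.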
In the index pool, the metadata of the statistical count object includes not only the name of the merged large object, but also the number of small objects in the merged large object, the occupied space in the merged large object, the invalid space in the merged large object, the number of deleted small objects in the merged large object, the ID of the data pool in which the merged large object is located, and the like.
And determining the void rate of the merged large object according to the small object number, the invalid space and the total space of the merged large object in the acquired statistical counting objects, adding the merged large object to a merged space recycling list under the condition that the void rate accords with a preset void rate threshold value, and determining the occupied space and the recyclable space of the merged large object. Therefore, the merged large object which meets the preset voidage threshold value, the occupied space of the merged large object and the recoverable space can be recorded through the merged space recovery list, and subsequent space recovery is facilitated.
The predetermined hole rate threshold may be preset, or may be default.
Optionally, adding the merged large object to the merge space recycle list when the void rate meets the predetermined void rate threshold comprises: judging whether the void rate of the merged large object is greater than or equal to the predetermined void rate threshold; and, if it is, adding the merged large object to the merge space recycle list.
To add exactly the required merged large objects to the merge space recycle list quickly and accurately, the predetermined void rate threshold can be set according to the requirements of the application scenario; any merged large object whose void rate is greater than or equal to that threshold is then added to the list.
For example, a coroutine-based model can provide a function for manually triggering the space scan of merged large objects. Once triggered, all merged objects are scanned, and those that qualify for space reclamation under the currently configured void rate threshold are added to the space recycle list.
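The void-rate check described above can be sketched in Python as follows. This is an illustrative sketch, not the patented implementation: the class and function names are hypothetical, the void rate is assumed to be invalid space divided by total space, and the occupied and recyclable space are assumed to be the still-valid and invalid bytes respectively.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MergedObjectStats:
    """Counters for one merged large object, as read from a statistical count object."""
    name: str
    total_space: int    # bytes allocated to the merged large object
    invalid_space: int  # bytes still held by deleted small objects


@dataclass
class RecycleEntry:
    """One row of the merge space recycle list (hypothetical layout)."""
    name: str
    occupied_space: int
    recyclable_space: int


def void_rate(stats: MergedObjectStats) -> float:
    """Fraction of the merged large object that is invalid; 0.0 for an empty object."""
    if stats.total_space <= 0:
        return 0.0
    return stats.invalid_space / stats.total_space


def build_recycle_list(objects: List[MergedObjectStats], threshold: float) -> List[RecycleEntry]:
    """Add every merged large object whose void rate is >= threshold to the recycle list."""
    recycle_list: List[RecycleEntry] = []
    for obj in objects:
        if void_rate(obj) >= threshold:
            recycle_list.append(RecycleEntry(
                name=obj.name,
                # Assumption: occupied = still-valid bytes, recyclable = invalid bytes.
                occupied_space=obj.total_space - obj.invalid_space,
                recyclable_space=obj.invalid_space,
            ))
    return recycle_list
```

For example, with a threshold of 0.35, a merged object with 40 of 100 bytes invalid (void rate 0.40) is listed with 60 bytes occupied and 40 recyclable, while one with 10 of 100 bytes invalid is skipped. Treating an empty object as having a void rate of 0 avoids a division by zero.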
Optionally, after the merged large object is added to the merge space recycle list, the method further includes: determining the number of merged large objects that have been added to the merge space recycle list and the counted total invalid space capacity; and updating the metadata of the statistical count object according to that number and that total invalid space capacity.
In this way, the metadata of the statistical count object is kept up to date with the number of merged large objects added to the merge space recycle list and the counted total invalid space capacity.
In this embodiment of the invention, the space reclamation metadata of the object storage system maintains the space statistics of merged large objects through management objects and statistical count objects. Both use a tree-structured metadata organization that records the currently recyclable space in detail, giving a clearer view of how much of the system's space is currently occupied by fragments.
Optionally, the metadata of the statistical count object is a set of tree-structured key-value pairs comprising the name of each merged large object and the information associated with that name, where the information includes at least one of: the number of small objects, the occupied space, the invalid space, the number of deleted small objects, and the data pool ID.
Because the metadata of the statistical count object uses this optimized tree-structured organization, the currently recyclable space is recorded in detail and the system's fragment-space usage is easier to understand.
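The two-level key-value layout of the statistical count object can be pictured with a small Python sketch. The field names mirror the list above but are hypothetical, and flattening to `name/field` keys is only one assumed way to hold such a tree in an ordered key-value store; the text does not specify the key encoding.

```python
from typing import Dict, List, Tuple, TypedDict


class MergedObjectInfo(TypedDict):
    """Value stored under one merged-large-object name (field names are hypothetical)."""
    small_object_count: int
    occupied_space: int
    invalid_space: int
    deleted_small_object_count: int
    data_pool_id: int


# Statistical count object metadata: merged-large-object name -> its info.
StatCountMeta = Dict[str, MergedObjectInfo]


def flatten(meta: StatCountMeta, sep: str = "/") -> List[Tuple[str, int]]:
    """Flatten the two-level tree into sorted (key, value) pairs such as 'sfm_a/invalid_space'.

    Sorting keeps all fields of one merged object adjacent, which suits a store
    that iterates its keys in lexicographic order.
    """
    pairs = [
        (f"{name}{sep}{field}", value)
        for name, info in meta.items()
        for field, value in info.items()
    ]
    return sorted(pairs)


def total_invalid_space(meta: StatCountMeta) -> int:
    """Sum of invalid (deleted but not yet reclaimed) bytes across all merged objects."""
    return sum(info["invalid_space"] for info in meta.values())
```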
Optionally, before the invalid space and the total space of the merged large objects in the statistical count object are acquired, the method further includes: establishing an association between the statistical count objects and the management objects of the predetermined index pool; and acquiring the metadata of the management objects according to that association.
Before the association between the statistical count objects and the management objects of the predetermined index pool is established, the method further includes clearing the task information on the management objects, where the task information includes, but is not limited to, the task type and the task state.
The association is a hash relationship. In a specific implementation, the names of a first predetermined number of statistical count objects in the predetermined index pool are hashed onto a second predetermined number of management objects, yielding the second predetermined number of hash groups. Optionally, the first predetermined number is 1024 and the second predetermined number is 31. Each hash group corresponds to one generated reclamation task, so the second predetermined number of tasks is generated.
While the names of the statistical count objects are being hashed onto the management objects, the task information corresponding to each management object must be recorded.
There is at least one association, and the associations are stored in a task list. In a specific implementation, a management object is taken from the task list and a lock on it is requested. If the lock is acquired, the metadata of the management object is read; otherwise, the process waits 30 seconds and tries to lock it again. Whenever the metadata cannot be read or the lock cannot be acquired, the loop continues taking a management object from the task list and locking it, until the metadata of every management object in the task list has been read.
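A minimal sketch of the name-to-management-object hashing, assuming CRC-32 as the hash function (the text does not name one) and hypothetical statistical count object names of the form `stat_count.N`:

```python
import zlib
from collections import defaultdict
from typing import Dict, List

NUM_STAT_COUNT_OBJECTS = 1024  # first predetermined number in the text
NUM_MANAGEMENT_OBJECTS = 31    # second predetermined number in the text


def hash_to_management_objects(
    stat_names: List[str], num_management: int = NUM_MANAGEMENT_OBJECTS
) -> Dict[int, List[str]]:
    """Group statistical count object names by the management object they hash to.

    zlib.crc32 is an assumption: the text only says the names are hashed. A stable
    hash is used instead of Python's built-in hash(), which is salted per process
    for strings, so every scanner computes the same grouping.
    """
    groups: Dict[int, List[str]] = defaultdict(list)
    for name in stat_names:
        groups[zlib.crc32(name.encode("utf-8")) % num_management].append(name)
    return dict(groups)
```

Usage: `hash_to_management_objects([f"stat_count.{i}" for i in range(NUM_STAT_COUNT_OBJECTS)])` returns one group per management object index, and each group becomes one task whose information is recorded on that management object.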
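The lock-and-read loop can be sketched as below. `try_lock`, `read_metadata` and `unlock` are hypothetical callbacks standing in for the storage system's own operations, and the retry interval defaults to the 30 seconds given in the text. As in the description, the loop keeps retrying until every management object has been read, so a production version would likely also cap the number of retries.

```python
import time
from typing import Callable, Dict, List, Optional


def read_all_management_metadata(
    task_list: List[str],
    try_lock: Callable[[str], bool],
    read_metadata: Callable[[str], Optional[dict]],
    unlock: Callable[[str], None],
    retry_interval: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, dict]:
    """Lock each management object in the task list and read its metadata.

    On success the lock is kept, because the scan holds it until the task finishes.
    """
    pending = list(task_list)
    results: Dict[str, dict] = {}
    while pending:
        name = pending.pop(0)
        if not try_lock(name):
            sleep(retry_interval)  # lock held elsewhere: wait, then retry later
            pending.append(name)
            continue
        meta = read_metadata(name)
        if meta is None:
            unlock(name)           # read failed: release the lock and retry later
            pending.append(name)
            continue
        results[name] = meta
    return results
```

Injecting `sleep` keeps the sketch testable without real 30-second waits.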
Optionally, after the metadata of the statistical count object is updated, the method further includes updating the metadata of the management object.
Optionally, the recyclable space of every merged large object in the merge space recycle list can be summed to obtain the total recyclable space of the list, and the metadata of the management object is then updated accordingly.
Optionally, the metadata of the management object is a set of tree-structured key-value pairs comprising the name of each statistical count object and the information associated with that name, where the information includes at least one of: the task type, the total number of statistical count objects, the index pool ID, the predetermined void rate threshold, and space reclamation statistics.
Because the metadata of the management object also uses this optimized tree-structured organization, the currently recyclable space is recorded in detail and the system's fragment-space usage is easier to understand.
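A hypothetical Python view of the management object metadata, together with the accumulation of recyclable space described earlier, might look like this. The field names, the task-type labels and the `total_recyclable_space` statistics key are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class StatCountTaskInfo:
    """Information kept under one statistical count object name (hypothetical fields)."""
    task_type: str               # e.g. "estimate" or "recycle"; labels are assumptions
    stat_count_total: int        # total number of statistical count objects
    index_pool_id: int
    void_rate_threshold: float
    reclaim_stats: Dict[str, int] = field(default_factory=dict)


# Management object metadata: statistical count object name -> task information.
ManagementMeta = Dict[str, StatCountTaskInfo]


def record_recyclable_total(meta: ManagementMeta, stat_name: str, recyclable_spaces: Iterable[int]) -> int:
    """Sum the recyclable space of the listed merged objects and store it in the metadata."""
    total = sum(recyclable_spaces)
    meta[stat_name].reclaim_stats["total_recyclable_space"] = total
    return total
```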
Optionally, determining the space usage data of the merged large objects in the statistical count objects, based on a pre-estimation scan that traverses the statistical count objects of the predetermined index pool, comprises: acquiring the invalid space and the total space of each merged large object in the statistical count objects; determining the void rate of the merged large object from the invalid space and the total space; and, when the void rate meets the predetermined void rate threshold, determining the occupied space and the recyclable space of the merged large object.
The void rate is determined from the invalid space and total space recorded for the merged large object in the statistical count object, and the occupied and recyclable space are determined when the void rate meets the predetermined void rate threshold. The qualifying merged large objects, with their occupied and recyclable space, can therefore be recorded through the merge space recycle list, which makes it easy to see how merged-large-object space is being used.
Optionally, before the invalid space and the total space of the merged large objects in the statistical count objects are acquired, the method further includes: establishing an association between the statistical count objects and the management objects of the predetermined index pool; and acquiring the metadata of the management objects according to that association.
Optionally, after the occupied space and the recyclable space of the merged large object are determined, the method further includes updating the metadata of the management object.
Optionally, before the space usage data of the merged large objects is determined by the recycle scan of the statistical count objects of the predetermined index pool, the method further includes determining the predetermined index pool and the predetermined void rate threshold.
In a specific implementation, the predetermined index pool and the predetermined void rate threshold may be set manually or automatically. For example, the predetermined index pool may be determined by specifying an index pool ID, and the predetermined void rate threshold may be entered manually. The predetermined void rate threshold takes values in the range [0, x%], where x is any positive integer between 0 and 100; for example, it may be 35%, 45% or 82%, and can be set flexibly according to the requirements of the application scenario.
In a specific implementation, both the scan time and the void rate can be set flexibly and dynamically, and the scan state is controllable. Void space left by deleted merged small objects is therefore released promptly without affecting service performance, which makes room for merging new small files and reduces wasted system capacity.
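Turning a manually entered threshold into a usable value can be sketched as follows. This assumes the input is an integer percentage such as `35%`, consistent with the range above; the function name is hypothetical.

```python
def parse_void_rate_threshold(text: str) -> float:
    """Convert a user-entered threshold such as '35%' or '35' into a fraction.

    Assumption: the input is an integer percentage, matching the [0, x%] range
    described above; the result lies in [0.0, 1.0].
    """
    percent = int(text.strip().rstrip("%").strip())  # int() raises ValueError on bad input
    if not 0 <= percent <= 100:
        raise ValueError(f"void rate threshold must be between 0% and 100%, got {text!r}")
    return percent / 100
```

Returning a fraction lets the result be compared directly with a void rate computed as invalid space divided by total space.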
Fig. 4 is a flowchart of the pre-estimation scan in a space reclamation method according to an optional embodiment of the present invention. As shown in Fig. 4, when the running state is init, the background scanning coroutine first clears the task information on all 31 management objects, then traverses the predetermined index pool in the storage policy and hashes the names of its 1024 statistical count objects onto the 31 management objects. The scan then enters the running state.
After the background scanning coroutine obtains a management object task, it locks the management object, records the current task type and task state in the management object's metadata, and obtains the void rate of each merged large object from the statistical count objects.
For each merged large object that meets the configured void rate threshold, its releasable space is recorded. When all 31 tasks have finished, the releasable space recorded for every merged large object is summed to give the final total releasable space; the metadata of the management object is then updated, the running state is set to finish, and the management object is unlocked.
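The pre-estimation flow of Fig. 4 can be sketched as below, with locking and metadata persistence left out. The record format (dicts with `total_space` and `invalid_space`) and the state names are assumptions; the sketch only shows the init, running and finish progression and the summing of releasable space across tasks.

```python
from enum import Enum
from typing import Dict, List, Tuple


class RunState(Enum):
    INIT = "init"
    RUNNING = "running"
    FINISH = "finish"


def estimate_scan(
    tasks: Dict[int, List[dict]], threshold: float
) -> Tuple[int, Dict[int, int], List[RunState]]:
    """Sketch of the pre-estimation scan of Fig. 4 (locking and persistence omitted).

    tasks maps a management object index to the merged-object records hashed onto
    it; each record is a hypothetical dict with 'total_space' and 'invalid_space'.
    Returns the total releasable space, the per-task totals and the state history.
    """
    states = [RunState.INIT]         # task info cleared, names hashed onto management objects
    states.append(RunState.RUNNING)
    per_task: Dict[int, int] = {}
    for index, records in tasks.items():
        per_task[index] = sum(
            r["invalid_space"]
            for r in records
            if r["total_space"] > 0 and r["invalid_space"] / r["total_space"] >= threshold
        )
    states.append(RunState.FINISH)   # all tasks done: totals accumulated
    return sum(per_task.values()), per_task, states
```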
Fig. 5 is a flowchart of the recycle scan in a space reclamation method according to an optional embodiment of the present invention. As shown in Fig. 5, after the command is executed, if the running state is init, the background scanning coroutine likewise clears the task information on the management objects, traverses the index pool specified in the storage policy, hashes the names of its 1024 statistical count objects onto the 31 management objects, and generates 31 recycle tasks.
The recycle scan also has to perform a recycle pre-estimation pass. This pass differs from a standalone pre-estimation scan: the two processes run independently and do not depend on each other's scan results. After the estimation finishes, the scan enters the running state. The recycle pre-estimation pass adds every merged large object that meets the specified void rate to the merge space recycle list. If adding fails, an error state is returned; if it succeeds, the metadata of the current statistical count object is updated, including the number of sfm large objects that have been completed and the counted total invalid space capacity. This continues until all statistical count objects have been scanned and updated.
The background scanning coroutine reads the keys of 100 statistical count objects at a time for processing. At the granularity of the specified index pool, it accumulates the total releasable space that meets the requirement and records the count of all invalid spaces; it then removes the sfm merged large objects whose statistics are complete from the set of all sfm large objects, deletes the corresponding statistical count object keys from the management object, and updates the metadata of the management object.
It continues reading and processing the keys of the remaining statistical count objects until all of them have been processed and the number of remaining sfm objects drops to 0, then updates the running state to finish and unlocks the management object.
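The batch-of-100 processing loop can be sketched in simplified form. This collapses the statistical count keys and the sfm objects into one hypothetical `pending` map (sfm object name to invalid space already accepted for reclamation); the real key layout is not described in the text.

```python
from itertools import islice
from typing import Dict, Set, Tuple

BATCH_SIZE = 100  # statistical count keys read per round, per the text


def drain_in_batches(
    pending: Dict[str, int], management_keys: Set[str], batch_size: int = BATCH_SIZE
) -> Tuple[int, int, int]:
    """Process pending sfm merged objects batch_size keys at a time until none remain.

    pending and management_keys are mutated in place, mirroring how completed
    entries are deleted. Returns (total releasable space, objects processed, batches).
    """
    total_releasable = 0
    processed = 0
    batches = 0
    while pending:
        batch = list(islice(pending, batch_size))  # materialize before mutating pending
        for name in batch:
            total_releasable += pending.pop(name)
            processed += 1
            management_keys.discard(name)  # delete the key on the management object
        batches += 1
        # A real scanner would persist the management object's metadata here.
    return total_releasable, processed, batches
```

Reading a bounded batch per round keeps each metadata update small, so progress can be saved incrementally rather than only at the end.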
Fig. 6 is a schematic diagram of the logical transitions between scan states in a space reclamation method according to an optional embodiment of the present invention. As shown in Fig. 6, the recycle scan supports setting the suspend, resume and stop states. Once a state is set, each management object updates its running state. These states can only be set when the task type is recycle scan; the pre-estimation scan does not support them.
When the running state of the recycle scan reaches finish, the coroutine exits automatically. Fig. 7 is a schematic diagram of the reclamation process in a space reclamation method according to an optional embodiment of the present invention. As shown in Fig. 7, the execution time period is chosen according to system load, and the start and end times may cover the whole day or span two days. When the specified space reclamation and compaction time is reached, the counted space is released and the background merging coroutine merges new small files again. After the new objects are merged, the merge information of the small file objects in the index pool is updated, so the space is reused and less capacity is wasted by small-file merging. If execution does not complete within the specified time period, a new execution time period can be set, or the remaining reclamation can continue in the next execution cycle.
In the above embodiment, the scan time and void rate can be set flexibly and dynamically and the scan state is controllable, so void space left by deleted merged small objects is released promptly without affecting service performance, new small files can be merged, and wasted system capacity is reduced.
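A sketch of the state control described for Fig. 6 follows. The transition table is an assumption, since the figure's exact arrows are not reproduced in the text; only the rule that suspend, resume and stop apply to the recycle scan alone comes from the description.

```python
from enum import Enum


class ScanState(Enum):
    RUNNING = "running"
    SUSPENDED = "suspended"
    STOPPED = "stopped"
    FINISH = "finish"


# Assumed transition table; the exact arrows of Fig. 6 are not reproduced in the text.
_TRANSITIONS = {
    "suspend": ({ScanState.RUNNING}, ScanState.SUSPENDED),
    "resume": ({ScanState.SUSPENDED}, ScanState.RUNNING),
    "stop": ({ScanState.RUNNING, ScanState.SUSPENDED}, ScanState.STOPPED),
}


def apply_command(task_type: str, current: ScanState, command: str) -> ScanState:
    """Return the new running state for a management object after an operator command."""
    if task_type != "recycle":
        raise ValueError("suspend/resume/stop are only supported for the recycle scan")
    allowed_from, target = _TRANSITIONS[command]
    if current not in allowed_from:
        raise ValueError(f"cannot {command} a scan in state {current.value}")
    return target
```

In a full system the returned state would be written to every management object's metadata, matching the statement that each management object updates its running state.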
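Checking whether reclamation may run at a given time, including windows that span two days, can be sketched as follows. The function name and the convention that an equal start and end means "all day" are assumptions.

```python
from datetime import time


def in_reclaim_window(now: time, start: time, end: time) -> bool:
    """Return True if `now` falls inside the reclamation window [start, end).

    start == end is treated as an all-day window; start > end means the window
    spans midnight into the next day (for example 22:00 to 06:00).
    """
    if start == end:
        return True
    if start < end:
        return start <= now < end
    return now >= start or now < end
```

For a 22:00 to 06:00 window, 23:00 and 03:00 are inside and 12:00 is outside.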
Example 2
According to another aspect of the embodiments of the present invention, a space reclamation apparatus is further provided. Fig. 8 is a schematic diagram of the space reclamation apparatus according to an embodiment of the present invention. As shown in Fig. 8, the apparatus includes a first determining module 802, a second determining module 804 and a recycling module 806, which are described in detail below.
The first determining module 802 is configured to determine the space usage data of the merged large objects in the statistical count objects based on a recycle scan that traverses the statistical count objects of a predetermined index pool. The second determining module 804, connected to the first determining module 802, is configured to determine, from the space usage data, the recyclable space of the data pool associated with the predetermined index pool. The recycling module 806, connected to the second determining module 804, is configured to reclaim the recyclable space of the data pool according to a preset space reclamation rule.
It should be noted that the above modules may be implemented in software or hardware. In the latter case, for example, the modules may all be located in the same processor, or they may be distributed across different processors in any combination.
It should also be noted that the first determining module 802, the second determining module 804 and the recycling module 806 correspond to steps S302 to S306 in embodiment 1; the modules share the implementation examples and application scenarios of the corresponding steps but are not limited to what is disclosed in embodiment 1. As part of an apparatus, the modules may run in a computer system, for example as a set of computer-executable instructions.
As can be seen from the above, in the above embodiments of the present application, the first determining module 802 determines the space usage data of the merged large objects in the statistical count objects through a recycle scan that traverses the statistical count objects of the predetermined index pool; the second determining module 804 determines, from the space usage data, the recyclable space of the data pool associated with the predetermined index pool; and the recycling module 806 reclaims that recyclable space according to a preset space reclamation rule. Using the space usage data determined by the recycle scan, the apparatus obtains the recyclable space of the associated data pool and reclaims it according to the preset rule, so space can be reclaimed flexibly as actually needed. This reduces wasted system space, improves space utilization, and solves the technical problem in the related art that space reclamation under the multithreaded model of an object storage system ties up resources during thread switching.
Optionally, the apparatus further includes a third determining module configured to determine, before the recycle scan is performed, the space usage data of the merged large objects in the statistical count objects based on a pre-estimation scan that traverses the statistical count objects of the predetermined index pool.
Optionally, the first determining module includes: a first acquiring unit configured to acquire the invalid space and the total space of the merged large objects in the statistical count object; a first determining unit configured to determine the void rate of the merged large object from the invalid space and the total space; and an adding unit configured to add the merged large object to the merge space recycle list when the void rate meets the predetermined void rate threshold, and to determine the occupied space and the recyclable space of the merged large object.
Optionally, the adding unit includes: a judging subunit configured to judge whether the void rate of the merged large object is greater than or equal to the predetermined void rate threshold; and an adding subunit configured to add the merged large object to the merge space recycle list when the void rate is greater than or equal to that threshold.
Optionally, the apparatus further includes: a second determining unit configured to determine, after the merged large object is added to the merge space recycle list, the number of merged large objects that have been added to the list and the counted total invalid space capacity; and a first updating unit configured to update the metadata of the statistical count object according to that number and that total invalid space capacity.
Optionally, the metadata of the statistical count object is a set of tree-structured key-value pairs comprising the name of each merged large object and the information associated with that name, where the information includes at least one of: the number of small objects, the occupied space, the invalid space, the number of deleted small objects, and the data pool ID.
Optionally, the apparatus further includes: an establishing unit configured to establish, before the number of small objects, the invalid space and the total space of the merged large objects in the statistical count object are acquired, an association between the statistical count objects and the management objects of the predetermined index pool; and a second acquiring unit configured to acquire the metadata of the management objects according to that association.
Optionally, the apparatus further includes a second updating unit configured to update the metadata of the management object after the metadata of the statistical count object has been updated.
Optionally, the metadata of the management object is a set of tree-structured key-value pairs comprising the name of each statistical count object and the information associated with that name, where the information includes at least one of: the task type, the total number of statistical count objects, the index pool ID, the predetermined void rate threshold, and space reclamation statistics.
Optionally, the third determining module includes: a third acquiring unit configured to acquire the invalid space and the total space of the merged large objects in the statistical count object; a third determining unit configured to determine the void rate of the merged large object from the invalid space and the total space; and a fourth determining unit configured to determine the occupied space and the recyclable space of the merged large object when the void rate meets the predetermined void rate threshold.
Optionally, the apparatus further includes a fourth determining module configured to determine the predetermined index pool and the predetermined void rate threshold before the recycle scan determines the space usage data of the merged large objects in the statistical count objects.
Example 3
According to another aspect of the embodiments of the present invention, a storage medium is further provided. The storage medium includes a stored program, and when the program runs, the device on which the storage medium is located is controlled to execute any one of the space reclamation methods described above.
Example 4
According to another aspect of the embodiments of the present invention, a processor is further provided. The processor is configured to run a program, and the program, when running, executes any one of the space reclamation methods described above.
The serial numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
In the above embodiments of the present invention, each embodiment has its own emphasis; for parts not described in detail in one embodiment, refer to the related descriptions in the other embodiments.
In the embodiments provided in this application, it should be understood that the disclosed technical content may be implemented in other ways. The apparatus embodiments described above are merely illustrative; for example, the division into units is only a logical division, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual couplings, direct couplings or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, units or modules, and may be electrical or take other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
If the integrated unit is implemented as a software functional unit and sold or used as a stand-alone product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied as a software product stored in a storage medium, including instructions that cause a computer device (such as a personal computer, a server or a network device) to execute all or part of the steps of the methods according to the embodiments of the present invention. The storage medium includes a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, and other media capable of storing program code.
The foregoing is only a preferred embodiment of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and such improvements and refinements shall also fall within the protection scope of the present invention.

Claims (14)

1. A space reclamation method, comprising:
determining space usage data of merged large objects in statistical count objects based on a recycle scan that traverses the statistical count objects of a predetermined index pool;
determining, according to the space usage data, a recyclable space of a data pool associated with the predetermined index pool; and
reclaiming the recyclable space of the data pool according to a preset space reclamation rule.
2. The method of claim 1, wherein before determining the space usage data of the merged large objects in the statistical count objects based on the recycle scan that traverses the statistical count objects of the predetermined index pool, the method comprises:
determining the space usage data of the merged large objects in the statistical count objects based on a pre-estimation scan that traverses the statistical count objects of the predetermined index pool.
3. The method of claim 1, wherein determining the space usage data of the merged large objects in the statistical count objects based on the recycle scan that traverses the statistical count objects of the predetermined index pool comprises:
acquiring the invalid space and the total space of a merged large object in the statistical count objects;
determining a void rate of the merged large object according to the invalid space and the total space; and
adding the merged large object to a merge space recycle list when the void rate meets a predetermined void rate threshold, and determining the occupied space and the recyclable space of the merged large object.
4. The method of claim 3, wherein adding the merged large object to the merge space recycle list when the void rate meets the predetermined void rate threshold comprises:
judging whether the void rate of the merged large object is greater than or equal to the predetermined void rate threshold; and
adding the merged large object to the merge space recycle list when the void rate is greater than or equal to the predetermined void rate threshold.
5. The method of claim 3, wherein after the merged large object is added to the merge space recycle list, the method further comprises:
determining the number of merged large objects that have been added to the merge space recycle list and the counted total invalid space capacity; and
updating the metadata of the statistical count object according to the number and the total invalid space capacity.
6. The method of claim 5, wherein the metadata of the statistical count object is a set of tree-structured key-value pairs comprising the name of each merged large object and the information corresponding to that name, the information comprising at least one of: the number of small objects, the occupied space, the invalid space, the number of deleted small objects, and the data pool ID.
7. The method of claim 3, wherein before acquiring the invalid space and the total space of the merged large object in the statistical count objects, the method further comprises:
establishing an association between the statistical count objects and management objects of the predetermined index pool; and
acquiring metadata of the management objects according to the association.
8. The method of claim 7, wherein after updating the metadata of the statistical count object, the method further comprises:
updating the metadata of the management object.
9. The method of claim 7, wherein the metadata of the management object is a set of tree-structured key-value pairs comprising the name of each statistical count object and the information corresponding to that name, the information comprising at least one of: the task type, the total number of statistical count objects, the index pool ID, the predetermined void rate threshold, and space reclamation statistics.
10. The method of claim 2, wherein determining the space usage data of the merged large objects in the statistical count objects based on the pre-estimation scan that traverses the statistical count objects of the predetermined index pool comprises:
acquiring the invalid space and the total space of a merged large object in the statistical count objects;
determining a void rate of the merged large object according to the invalid space and the total space; and
determining the occupied space and the recyclable space of the merged large object when the void rate meets a predetermined void rate threshold.
11. The method of any one of claims 1 to 10, wherein before determining the space usage data of the merged large objects in the statistical count objects based on the recycle scan that traverses the statistical count objects of the predetermined index pool, the method further comprises:
determining the predetermined index pool and a predetermined void rate threshold.
12. A space reclamation apparatus, comprising:
a first determining module, configured to determine space usage data of a merged large object in the statistical count objects based on a reclamation scan traversing the statistical count objects of a predetermined index pool;
a second determining module, configured to determine, according to the space usage data, a recoverable space of the data pool associated with the predetermined index pool;
and a reclamation module, configured to perform space reclamation on the recoverable space of the data pool according to a predetermined space reclamation rule.
13. A storage medium comprising a stored program, wherein, when the program runs, the device on which the storage medium resides is controlled to perform the space reclamation method according to any one of claims 1 to 11.
14. A processor configured to run a program, wherein the program when running performs the space reclamation method of any one of claims 1 to 11.
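The reclamation scan recited in claims 6 and 9 to 12 can be illustrated with a short Python sketch. It is a minimal, hypothetical model, not the patentee's implementation: all class and function names, the nested dicts standing in for the tree-structured key-value metadata, byte units, the definition void rate = invalid space / total space, treating a merged large object's occupied space as its total space, reading "meets the threshold" as greater than or equal to, and counting the invalid space as the recoverable space are assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class MergedObjectInfo:
    """Metadata entry for one merged large object (fields follow claim 6).

    Assumptions: units are bytes, and occupied_space is treated as the
    merged large object's total space.
    """
    small_object_count: int
    occupied_space: int
    invalid_space: int
    deleted_small_object_count: int
    data_pool_id: str


@dataclass
class ManagementInfo:
    """Management-object metadata (loosely follows claim 9; names hypothetical)."""
    task_type: str
    stat_object_count: int
    index_pool_id: str
    void_rate_threshold: float
    reclaim_stats: dict = field(default_factory=dict)


def void_rate(invalid_space, total_space):
    """Assumed definition: share of the merged object's space that is invalid."""
    return invalid_space / total_space if total_space > 0 else 0.0


def scan_index_pool(stat_objects, mgmt):
    """Traverse all statistical count objects of one index pool.

    stat_objects maps a statistical count object name to
    {merged large object name: MergedObjectInfo}.
    Returns (recoverable bytes per data pool, list of candidate objects).
    """
    recoverable = defaultdict(int)
    candidates = []
    for stat_name, merged_objects in stat_objects.items():
        for obj_name, info in merged_objects.items():
            rate = void_rate(info.invalid_space, info.occupied_space)
            if rate < mgmt.void_rate_threshold:
                continue  # below the threshold: not reclaimed in this pass
            # Live bytes that must be kept when the object is compacted;
            # the invalid bytes are counted as recoverable (assumption).
            live_space = info.occupied_space - info.invalid_space
            recoverable[info.data_pool_id] += info.invalid_space
            candidates.append((stat_name, obj_name, live_space))
    # Record per-pool totals as space reclamation statistics on the
    # management object.
    mgmt.reclaim_stats = dict(recoverable)
    return dict(recoverable), candidates


if __name__ == "__main__":
    mgmt = ManagementInfo("reclaim", 1, "index-pool-1", 0.5)
    stats = {
        "stat.0": {
            # void rate 0.75: selected
            "merged.a": MergedObjectInfo(10, 4096, 3072, 7, "data-pool-1"),
            # void rate 0.25: skipped
            "merged.b": MergedObjectInfo(10, 4096, 1024, 2, "data-pool-1"),
        }
    }
    pools, candidates = scan_index_pool(stats, mgmt)
    print(pools)       # {'data-pool-1': 3072}
    print(candidates)  # [('stat.0', 'merged.a', 1024)]
```

In terms of claim 12, `scan_index_pool` plays the part of the first and second determining modules. The reclamation module itself (releasing the invalid space under a space reclamation rule) is deliberately left out, because the claims do not specify that rule.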
Application CN202010366322.3A (priority date 2020-04-30, filed 2020-04-30): Space recovery method, device, storage medium and processor. Status: Active (CN111597147B).

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202010366322.3A (CN111597147B) | 2020-04-30 | 2020-04-30 | Space recovery method, device, storage medium and processor

Publications (2)

Publication Number | Publication Date
CN111597147A | 2020-08-28
CN111597147B | 2021-12-17

Family ID: 72192064

Country Status (1)

Country | Link
CN (1) | CN111597147B

Citations (5)

* Cited by examiner, † Cited by third party
Publication Number | Priority Date | Publication Date | Assignee | Title
US6622226B1 * | 2000-07-31 | 2003-09-16 | Microsoft Corporation | Method and system for using a mark-list for garbage collection
CN105138282A * | 2015-08-06 | 2015-12-09 | 上海七牛信息技术有限公司 (Shanghai Qiniu Information Technology Co., Ltd.) | Storage space recycling method and storage system
CN110688504A * | 2019-09-27 | 2020-01-14 | 中国工商银行股份有限公司 (Industrial and Commercial Bank of China Ltd.) | Image data management method, apparatus, system, device and medium
CN110750495A * | 2019-10-14 | 2020-02-04 | Oppo(重庆)智能科技有限公司 (OPPO (Chongqing) Intelligent Technology Co., Ltd.) | File management method, file management device, storage medium and terminal
CN110888837A * | 2019-11-15 | 2020-03-17 | 星辰天合(北京)数据科技有限公司 (XSKY (Beijing) Data Technology Co., Ltd.) | Object storage small file merging method and device

Cited By (2)

Publication Number | Priority Date | Publication Date | Assignee | Title
CN112925643A * | 2021-02-26 | 2021-06-08 | 北京百度网讯科技有限公司 (Beijing Baidu Netcom Science and Technology Co., Ltd.) | Data processing method and device and storage engine device
CN112925643B * | 2021-02-26 | 2024-01-12 | 北京百度网讯科技有限公司 (Beijing Baidu Netcom Science and Technology Co., Ltd.) | Data processing method and device and storage engine device

Also Published As

Publication Number | Publication Date
CN111597147B | 2021-12-17

Similar Documents

Publication Number | Title
JP3763992B2 | Data processing apparatus and recording medium
US9430388B2 | Scheduler, multi-core processor system, and scheduling method
US6772155B1 | Looking data in a database system
CN108509462B | Method and device for synchronizing activity transaction table
US8626765B2 | Processing database operation requests
US7912821B2 | Apparatus and method for data management
CN104793988A | Cross-database distributed transaction implementation method and device
CN107545015B | Processing method and processing device for query fault
US11468011B2 | Database management system
CN112241400A | Method for realizing distributed lock based on database
CN111597147B | Space recovery method, device, storage medium and processor
CN115145697A | Database transaction processing method and device and electronic equipment
CN109783578A | Method for reading data, device, electronic equipment and storage medium
CN113448701A | Multi-process outbound control method, system, electronic equipment and storage medium
CN112463795A | Dynamic hash method, device, equipment and storage medium
CN115964176B | Cloud computing cluster scheduling method, electronic equipment and storage medium
CN111221468B | Storage block data deleting method and device, electronic equipment and cloud storage system
CN116467267A | Garbage recycling method, device, storage medium and system
CN115408342A | File processing method and device and electronic equipment
CN116450328A | Memory allocation method, memory allocation device, computer equipment and storage medium
CN112328539A | Data migration method based on big data
CN114791901A | Data processing method, device, equipment and storage medium
CN115827508B | Data processing method, system, equipment and storage medium
CN117312267B | Line-level garbage collection mechanism based on peloton database
CN112667652B | Block chain based simulated transaction method, device, equipment and readable storage medium

Legal Events

Code | Title
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant
CP03 | Change of name, title or address

Address after: 100094 101, Floors 1-5, Building 7, Courtyard 3, Fengxiu Middle Road, Haidian District, Beijing

Patentee after: Beijing Xingchen Tianhe Technology Co.,Ltd.

Address before: 100097 Room 806-1, Block B, Zone 2, Jinyuan Times Shopping Center, Landianchang, Haidian District, Beijing

Patentee before: XSKY BEIJING DATA TECHNOLOGY Corp.,Ltd.