CN114153392A - Object storage data storage management method, device and equipment - Google Patents


Info

Publication number
CN114153392A
Authority
CN
China
Prior art keywords
cluster
source data
writing
data file
storage
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111430777.8A
Other languages
Chinese (zh)
Inventor
罗心
江文龙
王志豪
周明伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Dahua Technology Co Ltd
Original Assignee
Zhejiang Dahua Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Dahua Technology Co Ltd filed Critical Zhejiang Dahua Technology Co Ltd
Priority to CN202111430777.8A
Publication of CN114153392A

Classifications

    • GPHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0602 Interfaces specially adapted for storage systems specifically adapted to achieve a particular effect
    • G06F3/0608 Saving storage space on storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0638 Organizing or formatting or addressing of data
    • G06F3/0643 Management of files
    • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0652 Erasing, e.g. deleting, data cleaning, moving of data to a wastebasket
    • G06F3/0668 Interfaces specially adapted for storage systems adopting a particular infrastructure
    • G06F3/0671 In-line storage system
    • G06F3/0673 Single storage device
    • G06F3/0674 Disk device
    • G06F3/0676 Magnetic disk device

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a method, device and equipment for object storage data storage management. When a multiplexing period is reached, a ZG cluster matching the estimated data volume of the source data files to be received in that period is allocated; in response to an object write instruction for a source data file, one ZG is selected from the ZG cluster allocated for the current multiplexing period, and the storage nodes holding the zones of that ZG are instructed to write the object's fragments; and any expired ZG or ZG cluster is deleted according to the life cycle of the source data file and the write time of its object fragments. This addresses the space waste and delayed space reclamation caused by SMR hard disks supporting only sequential zone writes and whole-zone deletion, and improves the space utilization of the object storage system as a whole.

Description

Object storage data storage management method, device and equipment
Technical Field
The application relates to the technical field of cloud storage, in particular to a method, a device and equipment for object storage data storage management.
Background
Shingled Magnetic Recording (SMR) is a leading next-generation disk technology in which adjacent tracks partially overlap, improving the storage density of the medium and reducing storage cost. Because of these physical characteristics, the read behavior of an SMR disk is no different from that of an ordinary HDD (Hard Disk Drive), but its write behavior changes greatly: random writes and in-place updates are not supported, since they would overwrite data on the overlapping tracks. An SMR disk supports only head-to-tail sequential writes.
An SMR disk divides its tracks into a number of bands, each band being a group of consecutive tracks that must be written sequentially, so the band is the basic unit of sequential writing. The band is the physical concept on an SMR disk; the corresponding logical concept is called a "zone", and one zone is 256 MB in size. That is, a zone supports only sequential writes, random reads, and deletion of the entire zone space at once, which wastes space and prevents timely space reclamation.
Given these drawbacks of SMR, current approaches mostly work inside the SMR hard disk itself, for example creating a local file system, building an internal index on the SMR hard disk, or pairing the SMR disk with a random-access medium. Such approaches do not address global management and space management at the distributed-system layer, nor data aggregation by type or data reliability; they do not effectively solve the space waste and delayed space reclamation caused by the SMR hard disk supporting only sequential zone writes and whole-zone deletion, and they do not perform global cloud-storage space management for security streaming data.
Disclosure of Invention
The application provides a method, a device and equipment for managing the storage space of object storage data, which manage that storage space and improve the space utilization of a cloud storage system built on SMR hard disks.
In a first aspect, an embodiment of the present application provides a method for managing an object storage data storage space, where the method includes:
when a multiplexing period is reached, allocating a ZG cluster matching the estimated data volume of the source data files to be received in that period, wherein the ZG cluster comprises a plurality of ZGs, and the number of zones in each ZG equals the number N + M of object fragments of an object of the source data file;
in response to an object write instruction for the source data file, selecting one ZG from the ZG cluster allocated for the current multiplexing period, and instructing the storage nodes where the zones of that ZG are located to write the object fragments of the object;
and deleting any expired ZG or ZG cluster according to the life cycle of the source data file and the write time of its object fragments.
In one possible embodiment, selecting one ZG from the ZG cluster allocated for the current multiplexing period includes:
when the current object is determined to be the first object in the multiplexing period, writing the current object into any one of the ZGs in the ZG cluster;
and when the current object is determined not to be the first object in the multiplexing period: if the remaining space of the ZG written by the previous object is sufficient, writing the current object into that ZG; otherwise, selecting an unused ZG from the ZG cluster for the write.
In a possible embodiment, the method further comprises:
and if the remaining space of the ZG written by the previous object is insufficient and no unused ZG remains in the ZG cluster, selecting N + M unused zones from the storage nodes, according to a storage-node load-balancing strategy, to form a new ZG for writing the object fragments.
In one possible embodiment, allocating a ZG cluster matching the estimated data volume of the source data files received in the multiplexing period comprises:
determining all unused zones, and the storage nodes where they are located, from the zone information of all storage nodes;
determining the number n of ZGs to be allocated according to the estimated data volume of the source data files received in the multiplexing period;
and, according to a load-balancing strategy among the storage nodes, selecting N + M unused zones from different storage nodes to complete the allocation of one ZG, repeating until all n ZGs are allocated.
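The allocation steps above can be sketched as follows. This is a minimal illustration only, assuming each storage node reports its count of unused zones; the node ids, the `nodes` map, and the "most free zones first" tie-breaking are hypothetical simplifications of the load-balancing strategy:

```python
import heapq

def allocate_zg(nodes, n_plus_m):
    """Form one ZG: reserve one unused zone on each of the N + M
    least-loaded storage nodes ("least loaded" = most unused zones).
    `nodes` maps node id -> number of unused zones on that node."""
    if sum(1 for c in nodes.values() if c > 0) < n_plus_m:
        raise RuntimeError("not enough storage nodes with unused zones")
    chosen = heapq.nlargest(n_plus_m, nodes, key=lambda nid: nodes[nid])
    for nid in chosen:
        nodes[nid] -= 1  # reserve one zone per chosen node
    return chosen

def allocate_cluster(nodes, n, n_plus_m):
    """Allocate n ZGs, forming the ZG cluster for one multiplexing period."""
    return [allocate_zg(nodes, n_plus_m) for _ in range(n)]
```

Because each ZG draws at most one zone per node, the N + M fragments of any object land on distinct storage nodes, which is what the reliability argument above requires.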
In one possible embodiment, allocating a ZG cluster matching the estimated data volume of the source data files received in the multiplexing period comprises:
determining all unused zones, and the storage nodes where they are located, from the zone information of all storage nodes;
determining the number n of ZGs to be allocated according to the estimated data volume of the source data files received in the multiplexing period;
and, for each ZG to be allocated, determining the corresponding N + M storage nodes according to a load-balancing strategy among the storage nodes;
and selecting one ZG from the ZG cluster allocated for the current multiplexing period, in response to an object write instruction for the source data file, comprises:
in response to the object write instruction for the source data file, selecting one ZG from the ZG cluster allocated for the current multiplexing period, determining the N + M storage nodes corresponding to the selected ZG, and selecting one unused zone from each of the N + M storage nodes according to a load-balancing strategy within each storage node.
In one possible embodiment, in response to the object write instruction for the source data file, the method further comprises:
determining that the current object is the first data written after a stream interruption; when the interruption is determined not to exceed a preset duration, obtaining the first ZG, namely the ZG written by the last object before the interruption, and writing the current object into the first ZG when its remaining space is determined to be sufficient;
otherwise, selecting an unused ZG from the ZG cluster for the write.
In a possible embodiment, after instructing the storage nodes where the zones of the ZG are located to write the object fragments of the object, the method further comprises:
establishing a bidirectional index between the object, the ZG cluster allocated for the current period, and the selected ZG; or
recording the time at which the object was written into the selected ZG, and ordering the ZGs in the ZG cluster by the write time of the first object in each ZG.
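A minimal sketch of this bookkeeping, in Python; the class and field names are hypothetical, not taken from the application:

```python
from collections import defaultdict

class ZgIndex:
    """Bidirectional object <-> ZG index, plus the ordering of ZGs by
    the write time of their first object, as described above."""
    def __init__(self):
        self.object_to_zg = {}                  # object id -> (cluster id, ZG id)
        self.zg_to_objects = defaultdict(list)  # ZG id -> object ids, in write order
        self.zg_first_write = {}                # ZG id -> write time of its first object

    def record_write(self, obj_id, cluster_id, zg_id, write_time):
        self.object_to_zg[obj_id] = (cluster_id, zg_id)
        self.zg_to_objects[zg_id].append(obj_id)
        # only the first object's write time defines the ZG's age
        self.zg_first_write.setdefault(zg_id, write_time)

    def zgs_by_first_write(self):
        """ZGs of the cluster ordered by first-object write time."""
        return sorted(self.zg_first_write, key=self.zg_first_write.get)
```

The first-write ordering is what later makes expiration a simple scan: the oldest ZG is always at the front.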
In a possible embodiment, the method further comprises:
receiving a file read request sent by a client, the request comprising the file identifier and the acquisition time of the source data file to be read;
looking up, by the acquisition time, the file information of all source data files corresponding to that time, and then looking up the file information of the source data file to be read by its file identifier among them;
determining, from the found file information, the objects comprised in the source data file to be read and the ZG cluster each object belongs to, where each object records its starting ZG in that cluster with the offset within it, and its ending ZG with the offset and length within the ending ZG;
and reading from the start position, determined by the object's starting ZG and the offset within it, to the end position, determined by the ending ZG and the offset and length within it.
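The read path above can be illustrated as follows. This Python sketch uses a deliberately simplified single-copy model: each ZG is a flat byte buffer and its list index stands in for its position in the cluster, whereas the real system would reassemble N data fragments per ZG:

```python
def read_object(zgs, start_idx, start_off, end_idx, end_off, end_len):
    """Read an object spanning ZGs start_idx..end_idx of one cluster.
    `zgs` holds the cluster's ZGs in write order, each as bytes; the end
    position is given as (ending ZG, offset, length), as described above."""
    if start_idx == end_idx:
        return zgs[start_idx][start_off:end_off + end_len]
    chunks = [zgs[start_idx][start_off:]]                          # tail of starting ZG
    chunks.extend(zgs[i] for i in range(start_idx + 1, end_idx))   # whole middle ZGs
    chunks.append(zgs[end_idx][end_off:end_off + end_len])         # head of ending ZG
    return b"".join(chunks)
```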
In a possible embodiment, deleting any expired ZG or ZG cluster according to the life cycle of the source data file and the write time of its object fragments comprises:
determining and deleting an expired ZG cluster according to its expiration time, wherein the expiration time of a ZG cluster is the write time of the first object written into the first-written ZG of the cluster (the first write time) plus the life cycle of the source data file plus the multiplexing period of the ZG cluster;
and determining and deleting an expired ZG according to its expiration time, wherein the expiration time of a ZG is the write time of the first object in that ZG (the second write time) plus the life cycle of the source data file plus the multiplexing period of the ZG.
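Expressed as a formula, expiration time = first-write time + data life cycle + multiplexing period. A minimal sketch, with hypothetical names and all times in the same unit:

```python
def is_expired(first_write_time, life_cycle, multiplexing_period, now):
    """A ZG (or ZG cluster) expires once its first object's write time
    plus the data life cycle plus the multiplexing period has passed."""
    return now > first_write_time + life_cycle + multiplexing_period

def sweep_expired(first_writes, life_cycle, multiplexing_period, now):
    """`first_writes` maps ZG id -> first-object write time; return the
    ids whose entire space can be reclaimed."""
    return [zid for zid, t in first_writes.items()
            if is_expired(t, life_cycle, multiplexing_period, now)]
```

With a 90-day life cycle and a 1-day multiplexing period, a ZG whose first object was written on day 0 is reclaimable from day 91 onward.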
In a possible embodiment, at least one of the following steps is also included:
when the life cycle of the source data file is determined to have changed, or the EC mode of its objects to have changed, releasing the unused ZGs in the ZG cluster allocated for the current multiplexing period and triggering reallocation of the ZG cluster;
when the allocated ZG-cluster space is judged unable to meet the storage requirement before the end of the current period, triggering the start of the next multiplexing period so that ZG-cluster allocation is performed again;
and when unused ZGs are determined to remain in the ZG cluster at the end of the current multiplexing period, releasing them from the ZG cluster.
In a possible embodiment, the method further comprises:
detecting a second ZG containing an abnormal zone in the ZG cluster of the current multiplexing period;
when the second ZG is determined not to be in use, selecting a new ZG to replace it;
and when the second ZG is determined to be in use, performing data recovery on the second ZG when, based on the number of zones in it that have not lost data, the recovery condition is determined to be met.
In one possible embodiment, in response to a ZG integrity check instruction, detecting a second ZG containing an abnormal zone among all allocated ZG clusters;
and performing data recovery on the second ZG when, based on the number of zones in it that have not lost data, the recovery condition is determined to be met.
In a second aspect, an embodiment of the present application provides an object storage data storage space management apparatus, where the apparatus includes:
a ZG-cluster pre-allocation module, configured to allocate a ZG cluster matching the estimated data volume of the source data files received in the multiplexing period, wherein the ZG cluster comprises a plurality of ZGs and the number of zones in each ZG equals the number N + M of object fragments of an object of the source data file;
a data write module, configured to select, in response to an object write instruction for the source data file, one ZG from the ZG cluster allocated for the current multiplexing period, and to instruct the storage nodes where the zones of that ZG are located to write the object fragments of the object;
and a ZG-cluster deletion module, configured to delete any expired ZG or ZG cluster according to the life cycle of the source data file and the write time of its object fragments.
In a third aspect, an embodiment of the present application provides an object storage data storage space management device, where the device includes:
at least one processor; and a memory communicatively coupled to the at least one processor, wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform any one of the above object storage data storage space management methods.
In a fourth aspect, an embodiment of the present application provides a computer storage medium storing a computer program for causing a computer to execute any one of the object storage data storage space management methods.
With the method, device and equipment for managing the storage space of object storage data provided by the application, space for data storage is applied for, managed and dynamically adjusted globally, and space freed by data deletion is reclaimed globally, so that cloud storage space is recycled efficiently. This solves the space waste and delayed space reclamation caused by SMR hard disks supporting only sequential zone writes and whole-zone deletion, and improves the space utilization of the object storage system as a whole.
Drawings
FIG. 1 is a schematic view of an exemplary ZG composition in accordance with an exemplary embodiment of the present invention;
FIG. 2 is a flowchart illustrating an exemplary method for managing storage space of object storage data according to an exemplary embodiment of the present invention;
FIG. 3 is a schematic diagram of a data writing process according to an example of an exemplary embodiment of the invention;
FIG. 4 is a schematic diagram illustrating an object write flow according to an exemplary embodiment of the present invention;
FIG. 5 is a schematic diagram of an object storage data storage space management apparatus according to an example embodiment of the present invention;
FIG. 6 is a diagram illustrating an object store data storage space management apparatus, according to an illustrative embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present application will be described in detail and clearly with reference to the accompanying drawings. It is to be understood that the embodiments described are only a few embodiments of the present invention, and not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
First, terms of art of the embodiments of the present invention will be described.
Shingled Magnetic Recording (SMR) disk: a leading next-generation disk technology in which adjacent tracks partially overlap, improving the storage density of the medium and reducing storage cost. Because of these physical characteristics, the read behavior of an SMR disk is no different from that of an ordinary mechanical Hard Disk Drive (HDD), but its write behavior changes greatly: random writes and in-place updates are not supported, since they would overwrite data on the overlapping tracks. An SMR disk supports only head-to-tail sequential writes.
An SMR disk divides its tracks into a number of bands, each band being a group of consecutive tracks that must be written sequentially, so the band is the basic unit of sequential writing. The band is the physical concept on an SMR disk; the corresponding logical concept is called a "zone", and one zone is 256 MB in size. That is, a zone supports only sequential writes, random reads, and deletion of the entire zone space at once.
Erasure Code (EC): commonly written N + M, meaning N data fragments from which M parity fragments are generated.
Data life cycle: the effective storage time of data. For example, in a security (surveillance) storage scenario, the retention period of video and pictures is usually measured at "hour" granularity: video may be kept for 90 days × 24 hours and pictures for 180 days × 24 hours. Once the data life cycle expires, the system automatically deletes the expired data and releases the space for writing new data, recycling the storage space.
Zone Group (ZG): a set of zones. N + M zones are selected from N + M SMR hard disks (one zone per disk) to form a ZG. In a distributed storage system, for the reliability of data storage, the zones composing a ZG generally come from different racks, different storage nodes and different SMR hard disks. The number N + M is the same as the N + M of the EC mode.
As shown in fig. 1, a 4 + 1 ZG, for example, is composed of zone1, zone2, zone3, zone4 and zone5, each zone coming from a different storage node.
The object storage data storage space management method in the embodiment of the present application will be described in detail below with reference to the accompanying drawings.
Fig. 2 is a schematic flow chart of a method for managing an object storage data storage space according to an embodiment of the present application, applied to a management node; the method includes:
s201: when a multiplexing period is reached, distributing a ZG cluster matched with the data volume according to the estimated data volume of the source data file received in the multiplexing period, wherein the ZG cluster comprises a plurality of ZGs, and the number of the zones in the ZG is consistent with the number N + M of the object fragments of the object of the source data file.
A user writes data into the cloud storage system through a client, and a bucket directory corresponding to that client is established in the storage system. The source data file in the embodiment of the application is written into the cloud storage system in EC mode: a data file is divided into a number of objects, each object is split into fragments of a preset fragment size, redundant (parity) fragments are then computed, and N + M object fragments are finally obtained.
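As an illustration of this fragmentation, the following Python sketch splits one object into N data fragments plus one XOR parity fragment. This is a stand-in toy code for the real N + M erasure code, which would use e.g. Reed–Solomon for M > 1:

```python
def make_fragments(obj: bytes, n: int):
    """Split an object into n equal data fragments plus one XOR parity
    fragment (an N + 1 toy erasure code). The object is zero-padded so
    it divides evenly into n fragments."""
    frag_len = -(-len(obj) // n)                 # ceiling division
    obj = obj.ljust(n * frag_len, b"\x00")
    data = [obj[i * frag_len:(i + 1) * frag_len] for i in range(n)]
    parity = bytearray(frag_len)
    for frag in data:                            # XOR all data fragments together
        for i, byte in enumerate(frag):
            parity[i] ^= byte
    return data + [bytes(parity)]
```

Any single lost fragment can then be rebuilt by XOR-ing the remaining n fragments, which is what makes a ZG recoverable when one of its zones fails.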
For a preset multiplexing period, the data volume of the source data files to be uploaded to the cloud storage system can be estimated, and a ZG cluster matching that estimate is allocated. The ZG cluster comprises a plurality of ZGs, each comprising N + M zones. The N + M zones could come from the same storage node, but for data reliability the N + M zones of a ZG in the embodiment of the application come from different storage nodes, and all of them are idle zones. The multiplexing period is the period at which ZG clusters are allocated and can be configured according to user requirements.
The management node needs the storage nodes to report the information of all zones on all storage nodes, including the numbers of used and unused zones (used zones comprising both partially written and fully written zones), as well as the number of allocated zones and capacity information on each storage node. This information is obtained by the management node instructing each storage node to report and/or by each storage node reporting at a preset time granularity.
S202: in response to the object write instruction for the source data file, selecting one ZG from the ZG cluster allocated for the current multiplexing period, and instructing the storage nodes where the zones of that ZG are located to write the object fragments of the object.
Selecting one ZG from the ZG cluster allocated for the current multiplexing period depends on the write order of the current object, the remaining space of the ZG written by the previous object, the capacity of the current ZG cluster, and the interruption time of the data stream.
It should be noted that all writes in the embodiment of the present application are aligned writes, that is, the N + M object fragments are written into the N + M zones of the current ZG simultaneously.
S203: deleting any expired ZG or ZG cluster according to the life cycle of the source data file and the write time of its object fragments.
Source data files of the same client belong to the same bucket, and their life cycle is managed at bucket granularity. All source data file information under the bucket owned by a client is enumerated, including the object-fragment write time of each ZG in the ZG cluster allocated for the source data file. If the current system time exceeds the object-fragment write time by more than the life cycle, the deletion condition for the ZG or ZG cluster is met: the ZG or ZG cluster is deleted, and the corresponding storage nodes are instructed to reclaim the idle zones and reset their write pointers, after which the idle zones are managed uniformly as available resources. This deletion scheme avoids the delayed zone release caused by data holes within a zone, greatly improving the space utilization of the system.
According to the object storage data storage space management method provided by the embodiment of the application, based on the SMR hard disks on the storage nodes, ZG space is logically allocated in advance, and data from the same source is written into the pre-allocated ZGs. After the data life cycle expires, the entire ZG space is reclaimed quickly: write space is applied for and released logically as whole ZG blocks. The method flexibly adapts to streaming data sources, supports frequent life-cycle changes, and dynamically adapts to scenarios such as EC write-mode changes, thereby improving the utilization of cloud storage space.
In step S201, the correspondence between data files and ZGs is as shown in fig. 3. A user writes data into the cloud storage system through the client in EC mode, and the ZG cluster is constructed according to a time-based ZG allocation policy: for example, the data volume of one day is estimated, 8 ZGs are allocated for that day's data, and the 8 ZGs are managed together as the ZG cluster allocated for the day; the period can be configured as the user wishes. The zones within one ZG of the cluster come from different storage nodes, while zones in different ZGs may come from the same storage node.
As a possible implementation manner, allocating a ZG cluster matching the estimated data volume of the source data files received in the multiplexing period includes at least one of the following cases:
1) The management node only records the number of zones the ZG cluster needs on each storage node; the specific SMR disk of the storage node and the offset address of the corresponding specific zone are obtained just before data writing.
Determining all unused zones, and the storage nodes where they are located, from the zone information of all storage nodes;
determining the number n of ZGs to be allocated according to the estimated data volume of the source data files received in the multiplexing period;
and, for each ZG to be allocated, determining the corresponding N + M storage nodes according to a load-balancing strategy among the storage nodes.
Selecting one ZG from the ZG cluster allocated for the current multiplexing period, in response to an object write instruction for the source data file, then includes:
in response to the object write instruction for the source data file, selecting one ZG from the ZG cluster allocated for the current multiplexing period, determining the N + M storage nodes corresponding to the selected ZG, and selecting one unused zone from each of the N + M storage nodes according to a load-balancing strategy within each storage node.
2) The management node records the number of zones the ZG cluster needs on each storage node, together with the specific SMR disk of the storage node and the offset address of the corresponding specific zone.
Determining all unused zones, and the storage nodes where they are located, from the zone information of all storage nodes;
determining the number n of ZGs to be allocated according to the estimated data volume of the source data files received in the multiplexing period;
and, according to a load-balancing strategy among the storage nodes, selecting N + M unused zones from different storage nodes to complete the allocation of one ZG, repeating until all n ZGs are allocated.
3) The allocation may be in accordance with a multiplexing period.
For example, with an estimated data volume of 8 GB per day and taking the 4 + 1 mode as an example, an 8 GB data file is first divided into eight 1 GB objects, and each 1 GB object is allocated ZG space in 4 + 1 mode, so that each object gets one ZG: the 8 GB of data is allocated 8 ZGs, each ZG comprising 5 zones, each zone coming from a different storage node.
If the data life cycle is 30 days, the data volume of 30 days can also be estimated and the required ZGs allocated for the 30 days of data at once. The allocated ZGs comprise a number of idle zones, and the multiplexing period is then the data life cycle.
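The arithmetic of the 8 GB example checks out as follows; a small Python sketch, with `ZONE_SIZE` taken from the 256 MB zone size above and the function name hypothetical:

```python
ZONE_SIZE = 256 * 1024 ** 2      # one zone is 256 MB

def zgs_needed(estimated_bytes, n_data):
    """Number of ZGs to pre-allocate for one multiplexing period: each
    ZG stores n_data zones of payload (the M parity zones add only
    redundancy, not capacity)."""
    payload_per_zg = n_data * ZONE_SIZE
    return -(-estimated_bytes // payload_per_zg)   # ceiling division
```

With N = 4, one ZG holds 4 × 256 MB = 1 GB of payload, so 8 GB per day needs 8 ZGs, matching the example.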
4) When the life cycle of the source data file is determined to have changed, or the EC mode of its objects to have changed, the unused ZGs in the ZG cluster allocated for the current multiplexing period are released, and reallocation of the ZG cluster is triggered.
An EC-mode change, i.e. a change of N + M, is for example a change from 4 + 1 mode to 4 + 2 mode. In this case the number of zones per ZG also changes, so the ZGs in the previously pre-allocated ZG cluster can no longer be used; the unused ZGs are released from the current ZG cluster, and the ZG cluster is rebuilt according to the changed 4 + 2 mode.
5) When it is determined, before the end of the current service period, that the allocated ZG cluster space cannot meet the storage requirement, ZG cluster allocation for the next multiplexing period is triggered early.
When the specification of the data code stream is raised, or the data volume was under-estimated, the remaining space of the ZG cluster allocated in the current multiplexing period may become insufficient (for example, below 5% of the total space, where the 5% threshold is configurable); in that case, reallocation of the ZG cluster for the next multiplexing period is triggered in advance.
Expansion with unused ZGs is real-time and dynamic and does not require stopping the current write activity; that is, the expansion is imperceptible to the user.
With this allocation scheme, the most suitable hard disk and zone are selected for every write, which maximizes the stability of the system as a whole.
The step S202 may specifically be implemented as follows:
during writing, data is written continuously: an object is first written into one ZG, and when that ZG is full or its remaining space is insufficient for the next object, another ZG is selected from the current ZG cluster and the pending object is written into it.
In one possible implementation, selecting one ZG from the ZG cluster allocated in the current multiplexing period includes:
when the current object is determined to be the first object in the multiplexing period, writing it into any one of the ZGs in the ZG cluster -- since it is the first object, all ZGs in the cluster are available, so any one may be chosen;
when the current object is determined not to be the first object in the multiplexing period, writing it into the ZG used by the previous object if that ZG has remaining space; if that ZG's remaining space is unavailable, selecting another unused ZG from the ZG cluster for the write. A new ZG is thus opened only when the current one is full or out of space, which ensures that objects close in time are written into the same ZG.
In a possible implementation, if the ZG used by the previous object has no remaining space and there is no unused ZG left in the cluster -- indicating that the ZGs allocated for the current multiplexing period are insufficient -- then, according to the storage node load balancing policy, N + M unused zones are reselected from the storage nodes to form a new ZG for writing the object fragments, and the new ZG is appended to the current ZG cluster.
An interruption may also occur during writing. In this embodiment, when the current object is determined to be the first data written after a stream interruption, and the interruption is determined not to exceed the preset time, the first ZG -- the one written by the last object before the interruption -- is obtained, and the current object is written into it if its remaining space is determined to be available;
otherwise, an unused ZG is selected from the ZG cluster for the write.
In this way, after data is interrupted for a period of time, new objects continue in the ZG that was left unfilled. In the embodiment of the present application, the interruption time is usually no more than half the data life cycle; for example, with a 180-day life cycle, the write times within one ZG differ by no more than 90 days. The multiplexing time above is a configurable item in this embodiment.
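The ZG selection rules described above can be sketched as a small state machine. All names here (`ZgWriter`, the `free`/`used` fields, `resume_window`) are illustrative assumptions, not the patent's API.

```python
class ZgWriter:
    """Sketch of the write-path ZG selection rules described above."""

    def __init__(self, cluster, resume_window):
        self.cluster = cluster              # list of ZGs: {"free": int, "used": bool}
        self.current = None                 # ZG the previous object went to
        self.resume_window = resume_window  # preset maximum interruption time

    def pick_zg(self, obj_size, gap_since_last_write=0):
        # After a short interruption, resume in the previously unfilled ZG;
        # after a long one, or when the current ZG is full, open a new ZG.
        resumable = gap_since_last_write <= self.resume_window
        if self.current is not None and resumable and self.current["free"] >= obj_size:
            zg = self.current
        else:
            zg = next((z for z in self.cluster if not z["used"]), None)
            if zg is None:
                # Cluster exhausted: the text above extends it with a new ZG.
                raise RuntimeError("cluster exhausted; extend with a new ZG")
        zg["used"] = True
        zg["free"] -= obj_size
        self.current = zg
        return zg
```

Objects written close together in time therefore land in the same ZG, matching the grouping behavior the deletion scheme later relies on.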
The specific data writing flow is shown in fig. 4:
data from the same data source is written into the same group of ZG clusters: for the same client, stream data that is close in time continues -- even if the stream is interrupted midway (there is an interruption period) -- to be written into the previously unfilled ZG once the stream recovers. That is, data close in time is written into the same ZG.
Step 1: an object fragment sent by the client is received, and it is determined whether the current object is the first data written after an interruption. If so, the first ZG -- the one written by the last object before the interruption -- is obtained; if not, step 3 is executed.
Step 2: it is determined whether the interruption time exceeds the preset time; if it does, step 3 is executed. If not, it is determined whether the first ZG's remaining space is available: if available, the ZG's writable offset address is obtained and the correspondence between the file and the ZG and ZG cluster is established so that data can be written; if unavailable, step 3 is executed.
Step 3: any unused ZG is obtained from the current ZG cluster, and the correspondence between the file and the ZG and ZG cluster is established to write data.
Step 4: after the object file grows to a certain size, writing switches to a new object file, and the correspondence between the new object file and the ZG and ZG cluster is established.
It should be noted that, to ensure the persistence of this information, after the storage nodes where the zones of the ZG reside are instructed to write the object fragments, a bidirectional index relationship is established between the object on one side and the ZG cluster allocated in the current period and the selected ZG on the other; or
the time at which the object is written into the selected ZG is recorded, and the ZGs in the ZG cluster are ordered by the write time of the first object in each ZG.
In a possible implementation, during the data writing process, there are also the following cases:
1) When unused ZGs are determined to remain in the ZG cluster at the end of the current multiplexing period, they are released from the cluster.
When the specification of the data code stream is lowered, or the data volume was over-estimated, unused ZGs may remain in the cluster allocated for the current multiplexing period; these unused ZGs are released from the ZG cluster.
2) When a second ZG containing an abnormal zone is detected in the current multiplexing period's ZG cluster, and the second ZG is determined not to have been used, a new ZG is reselected to replace it.
An abnormal zone may arise from various conditions, such as the failure of a rack, a storage node, or an SMR hard disk. In such cases, if the affected ZG has not been used, a new ZG is reselected to replace it; the selection process is as described above.
If the zone is abnormal due to disk removal, disk damage, sector damage, node deletion, node going offline, or timeout, the management node resets the cached zone information (the zone length and storage node location are reset to 0), and a zone in the first ZG whose object fragment length or write pointer location has been reset to 0 is determined to be an abnormal zone.
3) When a second ZG containing an abnormal zone is detected in the current multiplexing period's ZG cluster, and the second ZG is determined to have been used, data recovery is performed on it once the recovery condition is determined to be met.
In response to a ZG integrity detection instruction, a second ZG with an abnormal zone is detected among all allocated ZG clusters;
according to the number of zones in the second ZG whose data is not lost, data recovery is performed on the second ZG when the recovery condition is determined to be met.
Data length statistics are performed on each zone of the second ZG to obtain at least one object fragment length, and the number of zones corresponding to each fragment length is determined;
when the largest such zone count L meets the recovery condition, the ZG is determined to be recoverable, the recovery condition being N ≤ L < N + M.
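The recoverability test can be written out directly from the condition above. This is a sketch under stated assumptions: zones whose write pointer was reset report a fragment length of 0, and L is taken as the count of zones agreeing on the most common non-zero fragment length; the function name is illustrative.

```python
from collections import Counter

def is_recoverable(zone_fragment_lengths, N, M):
    """Decide whether a ZG with abnormal zones can be recovered.

    zone_fragment_lengths: object-fragment length read from each of the
    ZG's N+M zones (0 for a reset/abnormal zone).  The ZG is recoverable
    when the number L of consistent intact zones satisfies N <= L < N+M:
    fewer than N means data loss; N+M means nothing needs recovery.
    """
    counts = Counter(length for length in zone_fragment_lengths if length > 0)
    if not counts:
        return False
    L = counts.most_common(1)[0][1]
    return N <= L < N + M
```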
The step S203 may specifically be implemented as follows:
in one possible embodiment, deleting any ZG or ZG cluster that is expired specifically includes at least one of:
1) and deleting with ZG cluster as granularity.
According to the expiration time of the ZG cluster, expired ZG clusters are determined and deleted. The expiration time of a ZG cluster is the first write time of the first object in the first-written ZG of the cluster, plus the sum of the life cycle of the source data file and the multiplexing cycle of the ZG cluster.
For example, suppose a ZG cluster is managed at a granularity of 1 day (i.e., the cluster's multiplexing cycle is 1 day) and the life cycle of the source data file is 30 days. If the write time of the first object in the first-written ZG of the cluster is more than 31 (30+1) days ago, the cluster is determined to be expired: all ZGs in the cluster are released along with all their zones, the management node instructs the storage nodes to recycle the zones and reset their write pointers, and the zones are placed back in the available zone space pool for unified resource management. After the ZG cluster is successfully deleted, all associated object file records corresponding to it are deleted.
2) Deletion is performed with ZG as the granularity.
According to the expiration time of a ZG, expired ZGs are determined and deleted. The expiration time of a ZG is the second write time of the first object in the ZG, plus the sum of the life cycle of the source data file and the multiplexing cycle of the ZG.
For example: the life cycle of a source data file is 30 days and the multiplexing cycle of the ZG is 1 day. If the write time of the first object in the ZG is more than 31 (30+1) days ago, the ZG is determined to be expired, all its zones are released, and all file records related to the ZG are deleted.
In particular, if an object file spans multiple ZGs, the file is marked as "invalid, to be deleted".
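The expiration rule used in both deletion granularities reduces to one comparison. A minimal sketch, assuming times are `datetime` objects and the granularities are expressed in days; the function name is illustrative.

```python
from datetime import datetime, timedelta

def zg_expired(first_object_write_time, life_cycle_days, multiplexing_days, now):
    """Expiration rule from the text above: a ZG (or ZG cluster) expires
    once the write time of its first object is older than
    life cycle + multiplexing cycle."""
    deadline = first_object_write_time + timedelta(
        days=life_cycle_days + multiplexing_days)
    return now >= deadline

# 30-day life cycle, 1-day multiplexing cycle -> expires after 31 days.
assert not zg_expired(datetime(2021, 1, 1), 30, 1, datetime(2021, 1, 31))
assert zg_expired(datetime(2021, 1, 1), 30, 1, datetime(2021, 2, 1))
```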
In both modes, the physical granularity of the released space is the zone size of the SMR hard disk: whole zones are deleted and reset, and no zone release is delayed by data holes within a zone. This greatly improves the space utilization of the system.
In a possible implementation manner, the process of reading data from the cloud storage system specifically includes the following steps:
receiving a file reading request sent by a client, wherein the file reading request comprises a file identifier and acquisition time of a source data file to be read;
looking up, according to the acquisition time of the source data file, the file information of all source data files corresponding to that time, and then locating the file information of the source data file to be read by its file identifier among the found file information;
determining, from the located file information, the objects included in the source data file to be read and the ZG cluster each object belongs to, together with the object's starting ZG and offset within it and its ending ZG and the offset and length within that ending ZG;
reading from the start position, determined by the object's starting ZG in its ZG cluster and the offset within that ZG, to the end position, determined by the ending ZG and the offset and length within it.
For example, if a user wants to read the video from 9 o'clock to 10 o'clock on September 1st, the corresponding Bucket directory is found from the client information and all files in it are listed. The 9-to-10 period may span several files; those files are located among all the listed files, the objects they contain and the ZG clusters those objects belong to are determined, and for each object its starting ZG and offset within the cluster and its ending ZG with offset and length are obtained. Reading then proceeds from the start position to the end position, sequentially in the order in which the objects were written into the ZGs.
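The index lookup above yields, for each object, a start ZG with an offset and an end ZG with an offset and length; turning that into read ranges can be sketched as follows. The field names, the contiguous [0, zg_size) zone-group layout, and the `zg_order` list (ZGs ordered by first-object write time, as recorded earlier) are all assumptions for illustration.

```python
def object_read_plan(start_zg, start_off, end_zg, end_off, end_len, zg_order, zg_size):
    """Byte ranges covering one object that may span several ZGs.

    Returns (zg_id, start, end) triples in write order: the first ZG is
    read from start_off, intermediate ZGs in full, and the last ZG up to
    end_off + end_len.
    """
    i, j = zg_order.index(start_zg), zg_order.index(end_zg)
    plan = []
    for zg in zg_order[i:j + 1]:
        lo = start_off if zg == start_zg else 0
        hi = end_off + end_len if zg == end_zg else zg_size
        plan.append((zg, lo, hi))
    return plan
```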
Based on the same inventive concept, the present application provides an object storage data lifecycle management apparatus 500. As shown in fig. 5, the apparatus comprises:
a ZG cluster preallocation module 501, configured to allocate, according to an estimated data volume of a source data file received in a multiplexing period, a ZG cluster that matches the data volume, where the ZG cluster includes multiple ZGs, and a number of zones in the ZG is consistent with a number N + M of object fragments of an object of the source data file;
a data writing module 502, configured to select, in response to an object write instruction of the source data file, one ZG from the ZG cluster allocated in the current multiplexing period, and to instruct the storage nodes where the zones of that ZG reside to perform the writing of the object's fragments;
and a ZG cluster deleting module 503, configured to delete any expired ZG or ZG cluster according to the life cycle of the source data file and the write-in time of the object fragment of the source data file.
In one possible implementation, the ZG cluster pre-allocation module selects one ZG from the ZG clusters allocated in the current multiplexing period, and includes:
when determining that the current object is the first object in the multiplexing period, writing the current object into one of the ZGs in the ZG cluster;
and when the current object is determined not to be the first object in the multiplexing period, writing it into the ZG used by the previous object if that ZG's remaining space is available; if that ZG's remaining space is unavailable, selecting another unused ZG from the ZG cluster for the write.
In one possible implementation, the ZG cluster pre-allocation module is further configured to:
and if the ZG used by the previous object has no remaining space and there is no unused ZG in the ZG cluster, reselecting, according to the storage node load balancing policy, N + M unused zones from the storage nodes to form a new ZG for writing the object fragments.
In a possible implementation manner, the allocating, by the ZG cluster pre-allocation module, a ZG cluster matching the data size of the source data file received in the multiplexing period according to the estimated data size includes:
determining all unused zones and storage nodes where the unused zones are located according to the information of all zones of the storage nodes;
determining the number n of ZG to be distributed according to the estimated data volume of the source data file received in the multiplexing period;
and according to the load balancing policy among storage nodes, selecting N + M unused zones from different storage nodes to complete the allocation of one ZG, repeating this until all n ZGs are allocated.
In a possible implementation manner, the allocating, by the ZG cluster pre-allocation module, a ZG cluster matching the data size of the source data file received in the multiplexing period according to the estimated data size includes:
determining all unused zones and storage nodes where the unused zones are located according to the information of all zones of the storage nodes;
determining the number n of ZG to be distributed according to the estimated data volume of the source data file received in the multiplexing period;
for each ZG to be distributed, determining corresponding N + M storage nodes according to a load balancing strategy among the storage nodes;
in response to an object write instruction of the source data file, selecting one ZG from ZG clusters allocated in a current multiplexing cycle, including:
in response to an object write instruction for the source data file, one ZG is selected from the ZG cluster allocated in the current multiplexing period, the N + M storage nodes corresponding to the selected ZG are determined, and an unused zone is selected from each of those N + M storage nodes according to the load balancing policy within each storage node.
In one possible implementation, the data writing module, in response to an object writing instruction of the source data file, further includes:
determining that the current object is the first data written after an interruption; when the interruption is determined not to exceed the preset time, obtaining the first ZG written by the last object before the interruption, and writing the current object into the first ZG when its remaining space is determined to be available;
otherwise, one unused ZG is selected from the ZG cluster again for object writing.
In a possible implementation, after instructing the storage nodes where the zones of the ZG reside to perform the writing of the object's fragments, the data writing module is further configured for:
establishing a bidirectional index relationship between the object and the ZG cluster allocated in the current period and the selected ZG; or
recording the time at which the object is written into the selected ZG, and ordering the ZGs in the ZG cluster by the write time of the first object in each ZG.
In a possible implementation, the apparatus further comprises a data reading module, configured to receive a file read request sent by a client, the request including the file identifier and acquisition time of the source data file to be read;
look up, according to the acquisition time of the source data file, the file information of all source data files corresponding to that time, and then locate the file information of the source data file to be read by its file identifier among the found file information;
determine, from the located file information, the objects included in the source data file to be read and the ZG cluster each object belongs to, together with the object's starting ZG and offset within it and its ending ZG and the offset and length within that ending ZG;
and read from the start position, determined by the object's starting ZG in its ZG cluster and the offset within that ZG, to the end position, determined by the ending ZG and the offset and length within it.
In a possible implementation, the ZG cluster deleting module deletes any expired ZG or ZG cluster according to the life cycle of the source data file and the write time of the object fragments of the source data file, including:
determining an expired ZG cluster and deleting it according to the expiration time of the ZG cluster, wherein the expiration time of a ZG cluster is the first write time of the first object in the first-written ZG of the cluster, plus the sum of the life cycle of the source data file and the multiplexing cycle of the ZG cluster;
and determining an expired ZG and deleting it according to the expiration time of the ZG, wherein the expiration time of a ZG is the second write time of the first object in the ZG, plus the sum of the life cycle of the source data file and the multiplexing cycle of the ZG.
In one possible implementation, the ZG cluster pre-allocation module is further configured to perform at least one of the following:
when determining that the life cycle of the source data file has changed, or that the EC mode of the objects of the source data file has changed, releasing unused ZGs in the ZG cluster allocated in the current multiplexing period and triggering reallocation of the ZG cluster;
triggering early allocation of the ZG cluster for the next multiplexing period when it is determined, before the end of the current service period, that the allocated ZG cluster space cannot meet the storage requirement;
and when determining that the unused ZG exists in the ZG cluster at the end of the current multiplexing period, releasing the unused ZG from the ZG cluster.
In one possible embodiment, the ZG cluster preallocation module further includes:
detecting that a second ZG with abnormal zone exists in a current multiplexing period ZG cluster;
reselecting a new ZG to replace the second ZG when it is determined that the second ZG is not in use;
and when the second ZG is determined to have been used, performing data recovery on it once the recovery condition is determined to be met according to the number of zones in the second ZG whose data is not lost.
In one possible implementation, the ZG cluster pre-allocation module detects that a second ZG with an abnormal zone exists in all allocated ZG clusters in response to a ZG integrity detection instruction;
and according to the number of the zones of which the data are not lost in the second ZG, when the condition of recovering is determined to be met, performing data recovery on the second ZG.
Based on the same inventive concept, the present application further provides an object storage data storage space management device, the device comprising:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor, to enable the at least one processor to perform any one of the object storage data storage space management methods in the embodiments of the present application.
The electronic device 130 according to this embodiment of the present application is described below with reference to fig. 6. The electronic device 130 shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present application.
As shown in fig. 6, the electronic device 130 is represented in the form of a general electronic device. The components of the electronic device 130 may include, but are not limited to: the at least one processor 131, the at least one memory 132, and a bus 133 that connects the various system components (including the memory 132 and the processor 131).
The processor 131 is configured to read and execute the instructions in the memory 132, so that the at least one processor can execute the object storage data storage space management method provided in the foregoing embodiments.
Bus 133 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, a processor, or a local bus using any of a variety of bus architectures.
The memory 132 may include readable media in the form of volatile memory, such as Random Access Memory (RAM)1321 and/or cache memory 1322, and may further include Read Only Memory (ROM) 1323.
Memory 132 may also include a program/utility 1325 having a set (at least one) of program modules 1324, such program modules 1324 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
The electronic device 130 may also communicate with one or more external devices 134 (e.g., keyboard, pointing device, etc.), with one or more devices that enable a user to interact with the electronic device 130, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 130 to communicate with one or more other electronic devices. Such communication may occur via input/output (I/O) interfaces 135. Also, the electronic device 130 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network, such as the internet) via the network adapter 136. As shown, network adapter 136 communicates with other modules for electronic device 130 over bus 133. It should be understood that although not shown in the figures, other hardware and/or software modules may be used in conjunction with electronic device 130, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
In some possible embodiments, the aspects of an object storage data storage space management method provided by the present application may also be implemented in the form of a program product including program code for causing a computer device to perform the steps of an object storage data storage space management method according to various exemplary embodiments of the present application described above in this specification when the program product is run on the computer device.
In addition, the present application also provides a computer-readable storage medium storing a computer program for causing a computer to execute the method of any one of the above embodiments.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is intended to include such modifications and variations as well.

Claims (15)

1. A method for managing storage space of object storage data, the method comprising:
when a multiplexing period is reached, distributing a ZG cluster matched with the data volume according to the estimated data volume of a source data file received in the multiplexing period, wherein the ZG cluster comprises a plurality of ZGs, and the number of zones in the ZG is consistent with the number N + M of object fragments of an object of the source data file;
responding to an object writing instruction of the source data file, selecting a ZG from ZG clusters allocated in the current multiplexing period, and indicating a storage node where a zone in the ZG is located to execute the writing of object fragments of the object;
and deleting any overdue ZG or ZG cluster according to the life cycle of the source data file and the writing time of the object fragment of the source data file.
2. The method of claim 1, wherein selecting one ZG from the ZG cluster allocated for the current multiplexing period comprises:
when determining that the current object is the first object in the multiplexing period, writing the current object into one of the ZGs in the ZG cluster;
and when the current object is determined not to be the first object in the multiplexing period, writing it into the ZG used by the previous object if that ZG's remaining space is available; if that ZG's remaining space is unavailable, selecting another unused ZG from the ZG cluster for the write.
3. The method of claim 2, further comprising:
and if the ZG used by the previous object has no remaining space and there is no unused ZG in the ZG cluster, reselecting, according to the storage node load balancing policy, N + M unused zones from the storage nodes to form a new ZG for writing the object fragments.
4. The method according to any one of claims 1 to 3, wherein allocating ZG clusters matching the estimated data volume of the source data file received in the multiplexing period comprises:
determining all unused zones and storage nodes where the unused zones are located according to the information of all zones of the storage nodes;
determining the number n of ZG to be distributed according to the estimated data volume of the source data file received in the multiplexing period;
and according to the load balancing policy among storage nodes, selecting N + M unused zones from different storage nodes to complete the allocation of one ZG, repeating this until all n ZGs are allocated.
5. The method according to any one of claims 1 to 3, wherein allocating ZG clusters matching the estimated data volume of the source data file received in the multiplexing period comprises:
determining all unused zones and storage nodes where the unused zones are located according to the information of all zones of the storage nodes;
determining the number n of ZG to be distributed according to the estimated data volume of the source data file received in the multiplexing period;
for each ZG to be distributed, determining corresponding N + M storage nodes according to a load balancing strategy among the storage nodes;
in response to an object write instruction of the source data file, selecting one ZG from ZG clusters allocated in a current multiplexing cycle, including:
in response to an object write instruction for the source data file, one ZG is selected from the ZG cluster allocated in the current multiplexing period, the N + M storage nodes corresponding to the selected ZG are determined, and an unused zone is selected from each of those N + M storage nodes according to the load balancing policy within each storage node.
6. The method of claim 1, wherein in response to an object write instruction for the source data file, further comprising:
determining that the current object is the first data written after an interruption; when the interruption is determined not to exceed the preset time, obtaining the first ZG written by the last object before the interruption, and writing the current object into the first ZG when its remaining space is determined to be available;
otherwise, one unused ZG is selected from the ZG cluster again for object writing.
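The resume-after-interruption rule of this claim can be sketched as a small decision function; the threshold constant and the cluster representation are assumptions for illustration only.

```python
PRESET_INTERRUPT_LIMIT = 300.0  # seconds; hypothetical "preset duration"

def choose_zg_for_resume(cluster, last_zg, interrupt_seconds, obj_size):
    """First write after an interruption: reuse the ZG the previous object
    went to if the outage was short and that ZG still has room; otherwise
    fall back to an unused ZG from the current cycle's cluster.

    cluster: dict zg_id -> {"free_space": int, "used": bool}
    """
    if (interrupt_seconds <= PRESET_INTERRUPT_LIMIT
            and last_zg in cluster
            and cluster[last_zg]["free_space"] >= obj_size):
        return last_zg
    # interruption too long, or first ZG full: pick an unused ZG instead
    for zg_id, info in cluster.items():
        if not info["used"] and info["free_space"] >= obj_size:
            return zg_id
    raise RuntimeError("no ZG available in the current cluster")
```

Reusing the pre-interruption ZG keeps objects of one stream physically adjacent, which matters later when whole ZGs are reclaimed by write time.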
7. The method according to claim 1, wherein, after instructing the storage nodes where the zones in the ZG are located to write the object slices of the object, the method further comprises:
establishing a bidirectional index relationship between the object and both the ZG cluster allocated in the current cycle and the selected ZG; or
recording the time at which the object is written into the selected ZG, and ordering the ZGs in the ZG cluster according to the write time of the first object in each ZG.
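A minimal sketch of the two bookkeeping options in this claim, the bidirectional object-to-ZG index and the ordering of ZGs by first-object write time; class and field names are illustrative, not taken from the patent.

```python
class ZgIndex:
    """Bidirectional object <-> ZG index plus ZG ordering by first write."""

    def __init__(self):
        self.object_to_zg = {}    # object id -> (cluster id, zg id)
        self.zg_to_objects = {}   # (cluster id, zg id) -> [object ids]
        self.zg_first_write = {}  # (cluster id, zg id) -> first write time

    def record_write(self, obj_id, cluster_id, zg_id, write_time):
        key = (cluster_id, zg_id)
        self.object_to_zg[obj_id] = key
        self.zg_to_objects.setdefault(key, []).append(obj_id)
        # only the first write into a ZG fixes its position in the ordering
        self.zg_first_write.setdefault(key, write_time)

    def zgs_in_write_order(self):
        return sorted(self.zg_first_write, key=self.zg_first_write.get)
```

The forward index serves reads (object to location); the reverse index and the write-time ordering serve expiry, since the oldest first-written ZG is the first candidate for deletion.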
8. The method of claim 1, further comprising:
receiving a file read request sent by a client, wherein the file read request comprises a file identifier and an acquisition time of a source data file to be read;
searching the file information of all source data files corresponding to the acquisition time according to the acquisition time of the source data file, and searching the file information of the source data file to be read according to the file identifier among the found file information;
determining, according to the found file information of the source data file to be read, the objects included in the source data file and the ZG cluster to which each object belongs, as well as the object's starting ZG in that ZG cluster and its offset within the starting ZG, and its ending ZG and the offset and length within the ending ZG;
reading from a start position determined by the object's starting ZG in the ZG cluster and the offset within that ZG, to an end position determined by the ending ZG and the offset and length within the ending ZG.
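The read path above resolves an object's (starting ZG, offset) and (ending ZG, offset, length) records into concrete reads. A sketch, assuming objects occupy consecutive ZGs of a cluster and that each ZG has a fixed usable capacity (both assumptions are mine, not stated in the claims):

```python
def object_extent(zg_order, start_zg, start_offset, end_zg, end_offset, length, zg_size):
    """Translate an object's start/end ZG records into a list of
    (zg_id, offset, nbytes) reads covering the whole object.

    zg_order: ZG ids of the cluster in write order.
    zg_size: usable capacity of one ZG.
    """
    i, j = zg_order.index(start_zg), zg_order.index(end_zg)
    if i == j:  # object lives entirely inside one ZG
        return [(start_zg, start_offset, length)]
    reads = [(start_zg, start_offset, zg_size - start_offset)]  # head piece
    for zg in zg_order[i + 1:j]:                                # fully covered middle ZGs
        reads.append((zg, 0, zg_size))
    reads.append((end_zg, end_offset, length))                  # tail piece
    return reads
```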
9. The method of claim 1, wherein deleting any expired ZG or ZG cluster according to the life cycle of the source data file and the write time of the object slices of the source data file comprises:
determining and deleting an expired ZG cluster according to the expiration time of the ZG cluster, wherein the expiration time of a ZG cluster is the sum of a first write time, the life cycle of the source data file, and the multiplexing cycle of the ZG cluster, the first write time being the time at which the first object was written into the first-written ZG of the ZG cluster;
and determining and deleting an expired ZG according to the expiration time of the ZG, wherein the expiration time of a ZG is the sum of a second write time, the life cycle of the source data file, and the multiplexing cycle of the ZG, the second write time being the time at which the first object was written into the ZG.
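The two expiry rules can be written down directly; the data layout below (per-cluster and per-ZG multiplexing cycles, first-write timestamps) is a hypothetical representation chosen to make the arithmetic explicit.

```python
def expired_items(clusters, life_cycle, now):
    """Return (expired cluster ids, expired (cluster, zg) ids).

    Cluster expiry: first write into its first-written ZG
                    + life cycle + cluster multiplexing cycle.
    ZG expiry:      the ZG's own first write
                    + life cycle + ZG multiplexing cycle.

    clusters: dict cluster_id -> {"mux_cycle": s,
                                  "zgs": {zg_id: {"first_write": t, "mux_cycle": s}}}
    """
    dead_clusters, dead_zgs = [], []
    for cid, c in clusters.items():
        first = min(z["first_write"] for z in c["zgs"].values())
        if first + life_cycle + c["mux_cycle"] <= now:
            dead_clusters.append(cid)
            continue  # deleting the whole cluster covers its ZGs
        for zid, z in c["zgs"].items():
            if z["first_write"] + life_cycle + z["mux_cycle"] <= now:
                dead_zgs.append((cid, zid))
    return dead_clusters, dead_zgs
```

Adding the multiplexing cycle on top of the life cycle guarantees that even the last object written into the ZG during that cycle has fully outlived its retention period before the ZG is reclaimed.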
10. The method of claim 1, further comprising at least one of:
when determining that the life cycle of the source data file has changed or the EC mode of the objects of the source data file has changed, releasing the unused ZGs in the ZG cluster allocated in the current multiplexing cycle, and triggering reallocation of the ZG cluster;
when determining, before the end of the current multiplexing cycle, that the allocated ZG cluster space cannot meet the storage requirement, triggering early entry into the next multiplexing cycle to perform ZG cluster allocation again;
and when determining that unused ZGs remain in the ZG cluster at the end of the current multiplexing cycle, releasing the unused ZGs from the ZG cluster.
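The three maintenance rules are mutually exclusive triggers, which a small dispatch function makes explicit; the boolean inputs and the cluster representation are illustrative assumptions.

```python
def maintenance_actions(cluster, params_changed, space_short, cycle_ended):
    """Sketch of the three rules: (1) file life cycle or EC mode changed ->
    release unused ZGs and rebuild the cluster; (2) remaining space cannot
    cover the rest of the cycle -> enter the next cycle's allocation early;
    (3) cycle over -> just return leftover unused ZGs.

    cluster: dict zg_id -> {"used": bool}.
    Returns (zg ids to release, whether to reallocate the cluster).
    """
    unused = [zg for zg, info in cluster.items() if not info["used"]]
    if params_changed:
        return unused, True
    if space_short:
        return [], True
    if cycle_ended:
        return unused, False
    return [], False
```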
11. The method of claim 1, further comprising:
detecting that a second ZG with an abnormal zone exists in the ZG cluster of the current multiplexing cycle;
reselecting a new ZG to replace the second ZG when determining that the second ZG is not in use;
and when determining that the second ZG is in use, performing data recovery on the second ZG when determining, according to the number of zones in the second ZG that have not lost data, that a recovery condition is met.
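The recovery condition in claims 11 and 12 hinges on how many intact zones remain. With an N + M erasure code, any N of the N + M slices suffice to rebuild the rest, so a plausible reading of the condition is "at least N zones have not lost data" (this threshold is an inference from EC properties, not spelled out in the claims):

```python
def handle_abnormal_zg(zg_in_use, intact_zones, n_data):
    """Decide what to do with a ZG that has an abnormal zone:
    replace it outright if nothing was written yet; otherwise recover it,
    which an N+M erasure code permits while >= N zones survive.
    Returns "replace", "recover", or "unrecoverable".
    """
    if not zg_in_use:
        return "replace"        # no data at stake: swap in a fresh ZG
    if intact_zones >= n_data:  # EC can regenerate the missing slices
        return "recover"
    return "unrecoverable"
```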
12. The method of claim 1, further comprising:
in response to a ZG integrity detection instruction, detecting that a second ZG with an abnormal zone exists among all the allocated ZG clusters;
and performing data recovery on the second ZG when determining, according to the number of zones in the second ZG that have not lost data, that a recovery condition is met.
13. An object storage data storage space management apparatus, wherein the apparatus comprises:
a ZG cluster preallocation module, configured to allocate a ZG cluster matching the estimated data volume of the source data file received in the multiplexing cycle, wherein the ZG cluster comprises a plurality of ZGs, and the number of zones in each ZG is equal to the number N + M of object slices of an object of the source data file;
a data writing module, configured to select, in response to an object write instruction of the source data file, one ZG from the ZG cluster allocated in the current multiplexing cycle, and instruct the storage nodes where the zones in the ZG are located to write the object slices of the object;
and a ZG cluster deletion module, configured to delete any expired ZG or ZG cluster according to the life cycle of the source data file and the write time of the object slices of the source data file.
14. An object storage data storage management device, characterized in that the device comprises:
at least one processor; and a memory communicatively coupled to the at least one processor; wherein the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the method of any one of claims 1-12.
15. A computer storage medium, characterized in that the computer storage medium stores a computer program for causing a computer to perform the method according to any one of claims 1-12.
CN202111430777.8A 2021-11-29 2021-11-29 Object storage data storage management method, device and equipment Pending CN114153392A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111430777.8A CN114153392A (en) 2021-11-29 2021-11-29 Object storage data storage management method, device and equipment

Publications (1)

Publication Number Publication Date
CN114153392A true CN114153392A (en) 2022-03-08

Family

ID=80784150

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111430777.8A Pending CN114153392A (en) 2021-11-29 2021-11-29 Object storage data storage management method, device and equipment

Country Status (1)

Country Link
CN (1) CN114153392A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115509465A (en) * 2022-11-21 2022-12-23 杭州字节方舟科技有限公司 Sector management method and device, electronic equipment and storage medium
CN115509465B (en) * 2022-11-21 2023-03-28 杭州字节方舟科技有限公司 Sector management method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination