WO2023179569A1 - Data recycling method, system, device, computer-readable storage medium, and program product

Data recycling method, system, device, computer-readable storage medium, and program product

Info

Publication number
WO2023179569A1
Authority
WO
WIPO (PCT)
Prior art keywords
storage unit
recycling
data
target
status
Prior art date
Application number
PCT/CN2023/082611
Other languages
English (en)
French (fr)
Inventor
黄一天
陈焱山
刘鸿
李可飞
Original Assignee
中移(苏州)软件技术有限公司
中国移动通信集团有限公司
Application filed by 中移(苏州)软件技术有限公司 and 中国移动通信集团有限公司
Publication of WO2023179569A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers
    • G06F3/0601 Interfaces specially adapted for storage systems
    • G06F3/0628 Interfaces specially adapted for storage systems making use of a particular technique
    • G06F3/0646 Horizontal data movement in storage systems, i.e. moving data in between storage devices or systems
    • G06F3/0652 Erasing, e.g. deleting, data cleaning, moving of data to a wastebasket
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F16/162 Delete operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/06 Digital input from, or digital output to, record carriers, e.g. RAID, emulated record carriers or networked record carriers

Definitions

  • This application relates to the field of infrastructure, and in particular, to a data recycling method, system, device, computer-readable storage medium and program product.
  • Deletion is one of the most common operations.
  • In the actual engineering implementation of the deletion operation, the storage system generally does not immediately clear the files that the user wants to delete from the disk, because too many deletion operations would affect the performance of the entire system. A common way for storage systems to handle a user deletion is therefore to mark the file as "deleted", making it invisible to the user, so that the file is deleted from the user's perspective. The storage system also records the files whose status is "deleted" but which still exist on the storage medium.
  • After the grace period (grace time), the storage system deletes these files at a predetermined time (such as the early morning, when the access volume is small) or at a uniform time interval (such as every hour), thereby actually releasing the storage space. This is how garbage collection is implemented in distributed storage systems.
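The mark-then-purge behaviour described above can be illustrated with a minimal in-memory sketch. The class `LazyDeleteStore`, its method names, and the one-hour `GRACE_SECONDS` value are hypothetical and chosen only for illustration; the source does not specify a grace period or an API.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

GRACE_SECONDS = 3600.0  # assumed grace period; the source gives no value


@dataclass
class FileEntry:
    data: bytes
    deleted_at: Optional[float] = None  # None means the file is still visible


class LazyDeleteStore:
    """Toy store: delete() only marks a file; purge() frees the space later."""

    def __init__(self) -> None:
        self.files: Dict[str, FileEntry] = {}

    def put(self, name: str, data: bytes) -> None:
        self.files[name] = FileEntry(data)

    def delete(self, name: str, now: float) -> None:
        # Mark as "deleted": hidden from the user, still on the medium.
        self.files[name].deleted_at = now

    def visible(self) -> List[str]:
        return sorted(n for n, f in self.files.items() if f.deleted_at is None)

    def purge(self, now: float) -> List[str]:
        # Garbage collection: drop entries whose grace period has expired.
        expired = sorted(
            n for n, f in self.files.items()
            if f.deleted_at is not None and now - f.deleted_at >= GRACE_SECONDS
        )
        for n in expired:
            del self.files[n]
        return expired
```

A real system would call `purge` from a timer (for example hourly or during the early morning), which is exactly the spontaneous per-unit behaviour the application sets out to coordinate.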
  • However, the above garbage collection process is usually performed spontaneously by each storage logical unit, which makes unified progress management inconvenient; the garbage collection process also consumes system resources.
  • When a storage logical unit performs the garbage collection process, it can easily impact the data service performance of the entire storage cluster, thereby reducing both the performance of the storage cluster and the efficiency of garbage collection.
  • Embodiments of the present application provide a data recycling method, system, device, computer-readable storage medium and program product, which can improve garbage collection efficiency and storage cluster performance.
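  • An embodiment of the present application provides a data recycling method, which includes:
  • obtaining usage information of the storage units in a storage unit cluster within the current scheduling cycle, where the usage information includes recycling status and historical recycling time;
  • determining, based on the recycling status, at least one first storage unit in the storage unit cluster that is not in the data recycling state, and determining each storage unit whose recycling status indicates the data recycling state as a second storage unit;
  • determining a target storage unit from the at least one first storage unit based on the historical recycling time;
  • stopping the data processing service of the target storage unit, performing a data recycling operation through the target storage unit, and updating the recycling status of the target storage unit; and
  • stopping the data recycling operation of the second storage unit and resuming the data processing service of the second storage unit.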
  • Embodiments of the present application provide a data recycling system.
  • The data recycling system includes a control unit and a storage unit cluster; the control unit is connected to the storage units in the storage unit cluster through a preset interface; wherein,
  • the control unit is configured to: obtain the usage information of the storage units in the storage unit cluster within the current scheduling cycle, where the usage information includes recycling status and historical recycling time; determine, based on the recycling status, at least one first storage unit in the storage unit cluster that is not in the data recycling state; when the recycling status indicates the data recycling state, determine the corresponding storage unit as a second storage unit; determine a target storage unit from the at least one first storage unit based on the historical recycling time; send a data recycling instruction to the target storage unit through the preset interface, and update the recycling status of the target storage unit; and send a stop-recycling instruction to the second storage unit through the preset interface, and restore the data processing service of the second storage unit;
  • the storage unit is configured to: stop the data processing service and perform a data recycling operation when receiving the data recycling instruction; and stop the data recycling operation and start the data processing service when receiving the stop-recycling instruction.
  • Embodiments of the present application provide a data recycling device, which includes an acquisition unit, a determination unit, a data recycling unit and a recovery unit; wherein,
  • the acquisition unit is configured to acquire the usage information of the storage units in the storage unit cluster within the current scheduling cycle; the usage information includes recycling status and historical recycling time;
  • the determining unit is configured to determine, based on the recycling status, at least one first storage unit in the storage unit cluster that is not in the data recycling state; when the recycling status indicates the data recycling state, determine the corresponding storage unit as a second storage unit; and determine, based on the historical recycling time, a target storage unit from the at least one first storage unit;
  • the data recycling unit is configured to stop the data processing service of the target storage unit, perform data recycling operations through the target storage unit, and update the recycling status of the target storage unit;
  • the recovery unit is configured to stop the data recycling operation of the second storage unit and resume the data processing service of the second storage unit.
  • the determining unit is further configured to use a first storage unit whose historical recycling time is empty as the target storage unit, where the historical recycling time represents the time when the data recycling operation was last performed; and, when the historical recycling time of the at least one first storage unit is not empty, to determine the target storage unit from the at least one first storage unit based on at least one of the storage unit load, the amount of data to be recycled and the storage unit usage.
  • the determining unit is further configured to perform a weighted sum of at least one of the storage unit load, the amount of data to be recycled and the storage unit usage according to preset weights, to obtain a weighted value of each first storage unit in the at least one first storage unit; and to determine the target storage unit from the at least one first storage unit according to the weighted value of each first storage unit.
  • the determining unit is further configured to, when the number of first storage units with the highest weighted value among the at least one first storage unit is greater than a preset quantity threshold, use the first storage units with the highest weighted value as candidate storage units; and determine the target storage unit based on the historical recycling time of the candidate storage units.
  • the determining unit is further configured to calculate the difference between the historical recycling time of each candidate storage unit and the current time; and, based on the difference, determine the target storage unit among the candidate storage units.
  • the recovery unit is further configured to send a stop-recycling instruction to the second storage unit through the first preset interface, so that the second storage unit stops the data recycling operation; update the recycling status of the at least one second storage unit to not being in the data recycling state, and update the historical recycling time of the at least one second storage unit; and restore the data processing service of the second storage unit.
  • the determination unit is further configured to: when the recycling status indicates that a storage unit is not in the data recycling state, the working status indicates normal operation, and the amount of data to be recycled is greater than a preset data amount threshold, determine the storage unit as a first storage unit, thereby determining the at least one first storage unit.
  • the data recycling unit is further configured to send a data recycling instruction to the target storage unit through the second preset interface, so as to stop the data processing service of the target storage unit and perform a data recycling operation through the target storage unit; and update the recycling status of the target storage unit to being in the data recycling state.
  • the data recycling unit is further configured to enter the next scheduling cycle after a preset waiting time and perform data recycling processing on the storage unit cluster in the next scheduling cycle, thereby implementing data recycling scheduling for the storage unit cluster through at least one scheduling cycle.
  • The data recycling device includes:
  • a memory configured to store executable instructions; and
  • a processor configured to execute the executable instructions stored in the memory. When the executable instructions are executed, the processor executes the data recycling method applied to the terminal.
  • Embodiments of the present application provide a storage medium that stores executable instructions. When the executable instructions are executed, they are used to cause the processor to execute the data recovery method as described in the embodiments of the present application.
  • An embodiment of the present application provides a computer program product, which includes a computer program or instructions.
  • When the computer program or instructions are executed by a processor, the data recycling method provided by the embodiment of the present application is implemented.
  • Embodiments of the present application provide a data recycling method, system, device, computer-readable storage medium, and program product.
  • The method includes: obtaining usage information of the storage units in a storage unit cluster within the current scheduling cycle, where the usage information includes recycling status and historical recycling time; determining, based on the recycling status, at least one first storage unit in the storage unit cluster that is not in the data recycling state; when the recycling status indicates the data recycling state, determining the corresponding storage unit as a second storage unit; determining a target storage unit from the at least one first storage unit based on the historical recycling time; stopping the data processing service of the target storage unit, performing a data recycling operation through the target storage unit, and updating the recycling status of the target storage unit; and stopping the data recycling operation of the second storage unit and resuming the data processing service of the second storage unit.
  • In this way, the target storage unit and the second storage unit are determined according to the recycling status.
  • Because its data processing service is stopped, the target storage unit can use its full resources to perform data recycling operations; subsequently, the data service of the storage unit can be restored according to the recycling status. This realizes unified scheduling of the data recycling operations of the storage units, makes full use of the system resources of the target storage unit, and improves the efficiency of garbage collection. It also reduces the impact on the data service performance of the storage cluster while the target storage unit runs garbage collection operations, thereby improving the performance of the storage cluster.
  • Figure 1 is a schematic flow chart 1 of a data recovery method provided by an embodiment of the present application.
  • Figure 2 is a schematic flow chart 2 of a data recovery method provided by an embodiment of the present application.
  • Figure 3 is a schematic flow chart 3 of a data recovery method provided by an embodiment of the present application.
  • Figure 4 is a schematic flow chart 4 of a data recovery method provided by an embodiment of the present application.
  • Figure 5 is a schematic flowchart 5 of a data recovery method provided by an embodiment of the present application.
  • Figure 6 is a schematic structural diagram of a data recovery device provided by an embodiment of the present application.
  • Figure 7 is a schematic structural diagram of another data recovery device provided by an embodiment of the present application.
  • FIG. 1 is a schematic flowchart 1 of a data recovery method provided by an embodiment of the present application, which will be described in conjunction with the steps shown in FIG. 1 .
  • In S101, the usage information of the storage units in the storage unit cluster is obtained within the current scheduling cycle.
  • The usage information includes recycling status and historical recycling time.
  • The control unit collects the usage information of all storage units in the storage unit cluster during the current scheduling cycle.
  • The recycling status takes one of two values: a garbage collection operation is in progress, or no recycling operation is in progress. From the recycling status it can be determined whether the storage unit is currently performing a garbage collection (that is, data recycling) operation.
  • The historical recycling time is the time when the storage unit last performed a garbage collection operation.
  • The storage unit refers to a logical storage unit, which has the functions of storing data and reading and writing data.
  • S101 may be performed by a control unit that is connected to the storage units through an interface.
  • Through the interface, the control unit can direct a storage unit to perform certain operations, for example by sending a garbage collection instruction that makes the storage unit perform a garbage collection operation. By issuing instructions and collecting cluster status information, the control unit can work out the garbage collection status of the entire cluster.
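As a rough illustration of the usage information the control unit collects, the record below holds the two fields named in the text. The type names `RecyclingStatus` and `UsageInfo` and the helper `collect_usage` are hypothetical; the application does not prescribe a data layout.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Optional


class RecyclingStatus(Enum):
    NOT_RECYCLING = "not_recycling"
    RECYCLING = "recycling"


@dataclass
class UsageInfo:
    recycling_status: RecyclingStatus
    last_recycled_at: Optional[float]  # historical recycling time; None if never


def collect_usage(units: Dict[str, UsageInfo]) -> Dict[str, int]:
    """Summarise the cluster's garbage collection status per recycling state."""
    summary = {status.value: 0 for status in RecyclingStatus}
    for info in units.values():
        summary[info.recycling_status.value] += 1
    return summary
```

Using `None` for a unit that has never been recycled mirrors the "empty" historical recycling time that later steps test for.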
  • In S102, at least one first storage unit that is not in the data recycling state is determined in the storage unit cluster according to the collected recycling status of the storage units.
  • That is, the recycling status of each storage unit is used to decide whether that storage unit is performing a data recycling operation, and each storage unit that is not is determined to be a first storage unit.
  • In S103, each storage unit that is performing a data recycling operation is determined to be a second storage unit according to its recycling status.
  • In S104, the target storage unit is determined from the at least one first storage unit according to the historical recycling time of each first storage unit.
  • The target storage unit is a storage unit that needs to be recycled; there may be one target storage unit or several.
  • S105 Stop the data processing service of the target storage unit, perform the data recycling operation through the target storage unit, and update the recycling status of the target storage unit.
  • After the data processing service of the target storage unit is stopped, the data recycling operation is performed on the target storage unit and its recycling status is updated.
  • Specifically, the data processing service of the target storage unit is suspended, that is, the target storage unit is temporarily moved out of the cluster. The data recycling operation is then performed on the removed target storage unit, and its recycling status is updated to indicate that the target storage unit is undergoing a data recycling operation.
  • In S106, the data recycling operation of the second storage unit is stopped and the data processing service of the second storage unit is resumed.
  • That is, the second storage unit stops the data recycling operation and then rejoins the storage unit cluster to provide data processing services.
  • In summary, the usage information of the storage units in the cluster is collected; the first storage units and the second storage units are determined according to the recycling status in the usage information; the target storage unit is determined among the first storage units according to the historical recycling time; and finally the data processing service of the target storage unit is stopped (that is, the unit is moved out of the cluster) and a data recycling operation is performed on it.
  • Removing or adding storage units is a very common operation for distributed storage systems and has essentially no impact on the performance of the cluster.
  • The storage unit can then perform full data recycling, that is, occupy as many system resources as possible for data recycling, thereby freeing up more storage space.
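The flow of one scheduling cycle can be sketched as follows. This is a simplified illustration, not the application's implementation: `StorageUnit` and `run_cycle` are hypothetical names, a single target is chosen per cycle, and the target selection is reduced to "never recycled first, then oldest recycling time", whereas the text refines that selection with weighted values.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StorageUnit:
    name: str
    recycling: bool = False                   # recycling status
    serving: bool = True                      # data processing service running?
    last_recycled_at: Optional[float] = None  # historical recycling time


def run_cycle(cluster: List[StorageUnit], now: float) -> Optional[StorageUnit]:
    # Read the usage info and split the units by recycling status.
    first = [u for u in cluster if not u.recycling]
    second = [u for u in cluster if u.recycling]
    # Simplified target choice (assumption): never-recycled units sort first,
    # then the unit with the oldest historical recycling time.
    target = min(
        first,
        key=lambda u: float("-inf") if u.last_recycled_at is None else u.last_recycled_at,
        default=None,
    )
    # Take the target out of service and start recycling on it.
    if target is not None:
        target.serving = False
        target.recycling = True
    # Stop recycling on the second units and put them back in service.
    for u in second:
        u.recycling = False
        u.last_recycled_at = now
        u.serving = True
    return target
```

Because the units recycling in one cycle are restored in the next, at most the chosen targets are out of service at any time, which is what keeps the impact on the cluster's data service bounded.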
  • S104 can be implemented through S1041 to S1042, which will be described in conjunction with the following steps.
  • After the first storage units are determined, the historical recycling time of each first storage unit is checked.
  • The historical recycling time represents the time when the storage unit last performed a data recycling operation. A first storage unit whose historical recycling time is empty is therefore used as the target storage unit.
  • If the historical recycling time of a first storage unit is empty, that first storage unit has never undergone a garbage collection operation.
  • When the historical recycling time of the at least one first storage unit is not empty, the target storage unit is determined among the at least one first storage unit based on at least one of the storage unit load, the amount of data to be recycled, and the storage unit usage.
  • If only one first storage unit has an empty historical recycling time, it is directly determined as the target storage unit. If two or more first storage units have an empty historical recycling time, the target storage unit is determined among them based on at least one of the storage unit load, the amount of data to be recycled, and the storage unit usage.
  • The target storage unit among the two or more storage units can also be determined using at least one of the current load percentage of the storage unit, the amount of data to be recycled as a percentage of the total data amount, and the storage unit usage as a percentage of the total storage capacity.
  • In this way, a first storage unit whose historical recycling time is empty is determined as the target storage unit; if the historical recycling time is not empty, the target storage unit is determined among the at least one first storage unit based on at least one of the storage unit load, the amount of data to be recycled, and the storage unit usage. This allows storage units to occupy system resources for data recycling in an orderly and reasonable manner.
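The selection rule in S1041 to S1042 can be sketched as below. `Candidate` and `pick_target` are hypothetical names, and ranking by the amount of data to be recycled alone is an assumption made to keep the example short; the text allows any combination of load, data to be recycled, and usage.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    name: str
    last_recycled_at: Optional[float]  # None = never recycled
    pending_bytes: int                 # amount of data to be recycled


def pick_target(first_units: List[Candidate]) -> Optional[Candidate]:
    if not first_units:
        return None
    never = [u for u in first_units if u.last_recycled_at is None]
    if len(never) == 1:
        # Exactly one never-recycled unit: it is the target directly.
        return never[0]
    # Several never-recycled units, or none at all: rank the relevant pool.
    pool = never if never else first_units
    # Assumed ranking criterion: the unit with the most data to be recycled.
    return max(pool, key=lambda u: u.pending_bytes)
```

Note that a never-recycled unit wins even when another unit has far more pending data, which follows the priority the text gives to an empty historical recycling time.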
  • FIG. 2 is a schematic flowchart 2 of a data recovery method provided by an embodiment of the present application.
  • S1042 can be implemented through S201 to S202, which will be described in conjunction with the following steps.
  • The weighted value of each first storage unit in the at least one first storage unit is calculated from at least one of the storage unit load, the amount of data to be recycled, and the storage unit usage, combined with preset weights.
  • For example, the weight corresponding to the storage unit load can be set to 0.3, the weight corresponding to the amount of data to be recycled to 0.5, and the weight corresponding to the storage unit usage to 0.2.
  • A single factor can also be used: for example, the weight corresponding to the amount of data to be recycled can be set to 0.5 and the weights corresponding to the storage unit usage and the storage unit load set to 0, and the weighted values of the two or more first storage units calculated accordingly.
  • Any two of the three factors can also be used, with the weight of the remaining factor set to 0, to calculate the weighted value of each first storage unit. For example, the weight corresponding to the storage unit load can be set to 0.3, the weight corresponding to the amount of data to be recycled to 0.5, and the weight corresponding to the storage unit usage to 0.
  • The weighted value of the storage unit can also be calculated using the storage unit load, the amount of data to be recycled, and the storage unit usage at the same time, each combined with its corresponding weight.
  • The weighted value of the storage unit can also be calculated from at least one of the current load percentage of the storage unit, the amount of data to be recycled as a percentage of the total data amount, and the storage unit usage as a percentage of the total storage capacity, combined with the corresponding weights.
  • The target storage unit is determined from the at least one first storage unit according to the calculated weighted value of each first storage unit.
  • For example, the first storage unit with the highest weighted value is selected and determined as the target storage unit.
  • Alternatively, a threshold can be set: weighted values greater than the threshold are used as candidate values, the candidate values are sorted, and the first storage unit corresponding to the highest weighted value is determined as the target storage unit.
  • In this way, a weighted sum of at least one of the storage unit load, the amount of data to be recycled and the storage unit usage is computed according to the preset weights to obtain the weighted value of each first storage unit, and the final target storage unit is determined according to these weighted values. This further enables storage units to occupy system resources for data recycling in an orderly and reasonable manner.
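The weighted sum can be written out as a short sketch using the example weights quoted in the text (0.3, 0.5, 0.2). The function names and the choice to express each factor as a fraction between 0 and 1 are assumptions for illustration; the text only says percentages may be used.

```python
from typing import Dict, Tuple

# Example weights from the text: load 0.3, data to be recycled 0.5, usage 0.2.
WEIGHTS = {"load": 0.3, "pending": 0.5, "usage": 0.2}


def weighted_value(metrics: Dict[str, float],
                   weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of the factors; a factor with weight 0 is ignored."""
    return sum(weights[k] * metrics.get(k, 0.0) for k in weights)


def highest_weighted(units: Dict[str, Dict[str, float]]) -> Tuple[str, float]:
    """Return the name and weighted value of the top-scoring first storage unit."""
    scores = {name: weighted_value(m) for name, m in units.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]
```

For a unit with load 0.2, pending data 0.8 and usage 0.5, the weighted value is 0.3 × 0.2 + 0.5 × 0.8 + 0.2 × 0.5 = 0.56. Setting a weight to 0 reproduces the one-factor and two-factor variants described above.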
  • Figure 3 is a schematic flowchart 3 of a data recovery method provided by an embodiment of the present application.
  • S201 can be implemented through S2011 to S2012, which will be described in conjunction with the following steps.
  • Among the at least one first storage unit, if the number of first storage units with the highest weighted value is greater than the preset quantity threshold, the first storage units with the highest weighted value are used as candidate storage units.
  • That is, when several first storage units share the same, highest weighted value, these first storage units are used as the candidate storage units.
  • the preset quantity threshold can be 1, or can be set according to actual needs. The specific selection is made according to the actual situation, which is not limited in the embodiments of this application.
  • the target storage unit is determined based on the historical recycling times of the multiple candidate storage units.
  • Figure 4 is a schematic flowchart 4 of a data recovery method provided by an embodiment of the present application.
  • S2012 can be implemented through S301 to S302, which will be described in conjunction with the following steps.
  • The historical recycling time of each candidate storage unit is obtained, that is, the time when the candidate storage unit last performed a data recycling operation, and the difference between that time and the current scheduling time is calculated.
  • Each candidate storage unit corresponds to one difference value.
  • the differences are sorted, the maximum difference value is found, and the candidate storage unit corresponding to the maximum difference value is determined as the target storage unit.
  • one or more candidate storage units whose differences are greater than a preset difference threshold may also be determined as target storage units.
  • the number of target storage units can be set according to the actual situation of the storage cluster. For example, removing a corresponding number of target storage units will not have a serious impact on the data service performance of the storage cluster. The specific selection is made according to the actual situation, and is not limited by the embodiments of this application.
  • In this way, when multiple candidate storage units share the highest weighted value, the target storage unit that will perform the data recycling operation is determined by calculating the difference between the historical recycling time of each candidate storage unit and the current time. This accurately locates the storage unit that needs a data recycling operation, thereby improving recycling efficiency.
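The tie-breaking steps can be sketched as follows, assuming a quantity threshold of 1 (one of the values the text mentions). The function names `candidates` and `break_tie` are hypothetical, and exact float equality is used to detect ties purely for brevity.

```python
from typing import Dict, List

QUANTITY_THRESHOLD = 1  # the text notes the threshold can be 1


def candidates(scores: Dict[str, float]) -> List[str]:
    """Units sharing the highest weighted value, if there are more than the threshold."""
    top = max(scores.values())
    tied = [name for name, s in scores.items() if s == top]
    return tied if len(tied) > QUANTITY_THRESHOLD else []


def break_tie(tied: List[str], last_recycled_at: Dict[str, float], now: float) -> str:
    # S301: difference between the current time and each last recycling time.
    diffs = {name: now - last_recycled_at[name] for name in tied}
    # S302: the candidate with the largest difference becomes the target.
    return max(diffs, key=diffs.get)
```

Choosing the largest difference means the unit that has gone longest without recycling is served first, which is what gives the scheme its polling character.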
  • S106 can be implemented through S1061 to S1063, which will be described in conjunction with the following steps.
  • The control unit sends a stop-recycling instruction to the second storage unit through the first preset interface, so that the second storage unit stops the data recycling operation.
  • The second storage unit is a storage unit that is performing a data recycling operation.
  • The recycling status of the at least one second storage unit is updated to not being in the data recycling state, and the historical recycling time of the at least one second storage unit is updated.
  • Restoring the data processing service of the second storage unit means that the second storage unit, whose recycling status has been changed, rejoins the cluster.
  • the order of S1062 and S1063 is not limited.
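From the control unit's side, S1061 to S1063 amount to one instruction plus two bookkeeping updates. In this sketch the `inbox` list merely stands in for the first preset interface, and `Unit`, `restore`, and the instruction string `"STOP_RECYCLING"` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Unit:
    name: str
    recycling: bool = True
    serving: bool = False
    last_recycled_at: Optional[float] = None
    inbox: List[str] = field(default_factory=list)  # stand-in for the preset interface


def restore(unit: Unit, now: float) -> None:
    # S1061: tell the second storage unit to stop recycling.
    unit.inbox.append("STOP_RECYCLING")
    # S1062: record that it is no longer recycling and when it last recycled.
    unit.recycling = False
    unit.last_recycled_at = now
    # S1063: resume its data processing service (rejoin the cluster).
    unit.serving = True
```

Since the text leaves the order of S1062 and S1063 open, the last two steps could be swapped without changing the outcome.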
  • S102 can be implemented through S1021, which will be described in conjunction with the following steps.
  • When the recycling status indicates that the storage unit is not in the data recycling state, the working status indicates normal operation, and the amount of data to be recycled is greater than the preset data amount threshold, the storage unit is determined as a first storage unit, thereby determining the at least one first storage unit. Here the usage information also includes the working status.
  • That is, the storage units that are currently working normally and need a data recycling operation are identified first; their recycling status is then checked to find those that are not in the data recycling state, and each storage unit found is taken as a first storage unit, thereby determining the at least one first storage unit.
  • normal storage units that require garbage collection are determined from the storage unit cluster to perform data recycling operations, so that system resources can be effectively utilized.
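The three conditions of S1021 translate directly into a filter. The field names and the `DATA_THRESHOLD` value are assumptions; the application leaves the threshold to be preset.

```python
from dataclasses import dataclass
from typing import List

DATA_THRESHOLD = 1_000  # assumed preset data amount threshold, in bytes


@dataclass
class UnitInfo:
    name: str
    recycling: bool     # recycling status
    healthy: bool       # working status: True = normal operation
    pending_bytes: int  # amount of data to be recycled


def first_storage_units(units: List[UnitInfo],
                        threshold: int = DATA_THRESHOLD) -> List[str]:
    """Names of units that are idle, healthy, and have enough garbage to collect."""
    return [
        u.name for u in units
        if not u.recycling and u.healthy and u.pending_bytes > threshold
    ]
```

The comparison is strict, matching the text's "greater than" the threshold, so a unit with exactly the threshold amount of pending data is not selected.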
  • S105 can be implemented through S1051 to S1052, which will be described in conjunction with the following steps.
  • The control unit sends a data recycling instruction to the target storage unit through the second preset interface, so that the target storage unit stops the data processing service and performs the data recycling operation.
  • causing the target storage unit to stop data processing services is to temporarily remove the target storage unit from the storage unit cluster.
  • the first preset interface and the second preset interface may be the same or different.
  • updating the recycling status of the target storage unit to be in the data recycling status indicates that the target storage unit is performing a data recycling operation.
  • In this way, a data recycling instruction is sent to the target storage unit through the interface to stop its data processing service, and the data recycling operation is performed through the target storage unit. The target storage unit thus uses its full resources for data recycling, which improves garbage collection efficiency; it also reduces the impact on the data service performance of the storage cluster while the target storage unit runs garbage collection operations, thereby improving storage cluster performance.
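On the storage unit's side, the two instructions toggle between serving data and recycling. The class `StorageUnitAgent` and the instruction strings are hypothetical; the application does not define a wire format for the preset interfaces.

```python
class StorageUnitAgent:
    """Toy storage unit that reacts to the control unit's two instructions."""

    def __init__(self) -> None:
        self.serving = True     # data processing service is running
        self.recycling = False  # data recycling operation is running

    def handle(self, instruction: str) -> None:
        if instruction == "RECYCLE":
            # Stop serving first, so recycling can use the unit's full resources.
            self.serving = False
            self.recycling = True
        elif instruction == "STOP_RECYCLING":
            # Stop recycling, then start the data processing service again.
            self.recycling = False
            self.serving = True
        else:
            raise ValueError(f"unknown instruction: {instruction!r}")
```

In this sketch the two flags are never both true, which reflects the point of the design: a unit either serves data or recycles, never both at once.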
  • S401 is also included, as follows.
  • The next scheduling cycle is entered after a preset waiting time, and garbage collection is performed on the storage unit cluster in the next scheduling cycle, thereby implementing data recycling scheduling for the storage unit cluster through at least one scheduling cycle.
  • The algorithm for determining the target storage unit in this application is essentially a polling algorithm, which ensures that each storage unit completes at least one garbage collection within a certain period of time. Moreover, garbage collection scheduling is not restricted to a particular time of day. Business traffic is generally at a low point in the early morning, so the control unit can, for example, appropriately increase the number of storage units performing garbage collection between 00:00 and 02:00.
  • In this way, the next scheduling cycle is entered after the preset waiting time and data recycling processing is performed on the storage unit cluster in that cycle, so that data recycling scheduling for the storage unit cluster is implemented through at least one scheduling cycle and each storage unit is guaranteed to complete at least one garbage collection within a certain period of time.
  • the embodiment of this application provides a data recycling method, which can be applied to the actual scenario of garbage recycling in a storage cluster.
  • the storage cluster can contain multiple storage units, and each storage unit is connected to the control unit through an interface. The method is shown in Figure 5.
  • the control unit collects information from all storage units.
  • the control unit first collects the usage information of all storage units in the entire cluster. This information includes: whether the storage unit is working normally, whether the storage unit needs garbage collection, the time when the storage unit last performed a garbage collection operation (empty if it never has), and whether the storage unit is currently in the garbage collection state.
  • the control unit selects a storage unit from the units that are working normally and need garbage collection, temporarily removes it from the data cluster, and updates its status information to garbage collection in progress; at the same time, it controls this storage unit to perform a full garbage collection operation.
  • In some embodiments, if the time of a storage unit's last garbage collection operation is empty, the storage unit has never performed a garbage collection operation. This storage unit is selected as the storage unit that needs to perform a garbage collection operation, that is, the first storage unit.
  • control unit temporarily removes the target storage unit from the data cluster and updates its status information to garbage collection in progress.
  • the storage unit starts to collect all garbage, and then executes S10.
  • the storage unit can utilize full resources for garbage collection operations.
  • the storage unit stops garbage collection.
  • control unit analyzes and finds the storage unit that performed the garbage collection operation in the previous round through the information about whether the storage unit is in a garbage collection state, as the second storage unit.
  • the control unit controls the second storage unit to stop the garbage collection operation and adds it back to the data cluster. At the same time, it updates the time of the unit's last garbage collection operation, clears its garbage collection state, and returns it to use.
  • control unit waits for one scheduling cycle and returns to S1 to start again.
  • the usage information of all storage units in the cluster is collected, and then the storage unit that needs to be garbage collected is determined based on the usage information, and the storage unit is temporarily removed from the cluster, and Garbage collection occurs after removal.
  • the unit is removed from the cluster. At this time, no matter what operation is performed on this storage unit, it will not affect the entire cluster, and the garbage collection efficiency is improved.
  • the data recycling system includes: a control unit and a storage unit cluster; the control unit is interconnected with the storage units in the storage unit cluster through a preset first interface and a preset second interface; wherein,
  • the control unit is configured to obtain the usage information of the storage units in the storage unit cluster within the current scheduling cycle, the usage information including a recycling status and a historical recycling time; determine, based on the recycling status, at least one first storage unit in the storage unit cluster that is not in the data recycling state; when the recycling status indicates the data recycling state, determine the storage unit corresponding to that recycling status as the second storage unit; determine a target storage unit from the at least one first storage unit based on the historical recycling time; send a data recycling instruction to the target storage unit through the preset interface, and update the recycling status of the target storage unit; and send a stop-recycling instruction to the second storage unit through the preset interface, and resume the data processing service of the second storage unit;
  • the storage unit is configured to stop the data processing service and perform a data recycling operation when receiving the data recycling instruction, and to stop the data recycling operation and correspondingly start the data processing service when receiving the stop-recycling instruction.
  • FIG. 6 is a schematic structural diagram of a data recovery device provided by an embodiment of the present application.
  • the data recycling device includes: an acquisition unit 601, a determination unit 602, a data recycling unit 603, and a recovery unit 604; wherein,
  • the acquisition unit 601 is configured to acquire the usage information of the storage units in the storage unit cluster within the current scheduling cycle; the usage information includes recycling status and historical recycling time.
  • the determining unit 602 is configured to determine, based on the recycling state, at least one first storage unit in the storage unit cluster that is not in the data recycling state.
  • the determining unit 602 is further configured to determine the storage unit corresponding to the recycling status as the second storage unit when the recycling status indication is in the data recycling status.
  • the determining unit 602 is further configured to determine a target storage unit from the at least one first storage unit based on the historical recycling time.
  • the data recycling unit 603 is configured to stop the data processing service of the target storage unit, perform a data recycling operation through the target storage unit, and update the recycling status of the target storage unit.
  • the recovery unit 604 is configured to stop the data recovery operation of the second storage unit and resume the data processing service of the second storage unit.
  • the determining unit 602 is further configured to use the first storage unit with an empty historical recycling time as the target storage unit, the historical recycling time representing the time when the data recycling operation was last performed; and, when the historical recycling time of none of the at least one first storage unit is empty, to determine the target storage unit in the at least one first storage unit based on at least one of the storage unit load, the amount of data to be recycled, and the storage unit usage.
  • the usage information also includes: storage unit load, amount of data to be recycled, and storage unit usage.
  • the determining unit 602 is further configured to perform a weighted sum of at least one of the storage unit load, the amount of data to be recycled, and the storage unit usage according to preset weights, to obtain a weighted value for each first storage unit in the at least one first storage unit; and to determine the target storage unit from the at least one first storage unit according to the weighted value of each first storage unit.
  • the determining unit 602 is further configured to, when the number of first storage units with the highest weighted value in the at least one first storage unit is greater than a preset quantity threshold, take the first storage units with the highest weighted value as candidate storage units; and to determine the target storage unit based on the historical recycling times of the candidate storage units.
  • the determination unit 602 is further configured to calculate the difference between the historical recycling time of the candidate storage unit and the current time; and determine the target storage unit in the candidate storage unit based on the difference.
  • the recovery unit 604 is configured to send a stop-recycling instruction to the second storage unit through the first preset interface, so that the second storage unit stops the data recycling operation; to update the recycling status of the at least one second storage unit to not being in the data recycling state, and update the historical recycling time of the at least one second storage unit; and to resume the data processing service of the second storage unit.
  • the determining unit 602 is also configured to, when the recycling status indicates that the storage unit is not in the data recycling state, the working status indicates normal operation, and the amount of data to be recycled is greater than a preset data amount threshold, determine the storage unit as a first storage unit, thereby determining the at least one first storage unit.
  • the data recycling unit 603 is also configured to send a data recycling instruction to the target storage unit through the second preset interface to stop the data processing service of the target storage unit, and to perform the data recycling operation through the target storage unit; and to update the recycling status of the target storage unit to being in the data recycling state.
  • the data recycling unit 603 is also configured to enter the next scheduling cycle through the preset waiting time, and perform data recycling processing on the storage unit cluster through the next scheduling cycle, thereby achieving at least one scheduling cycle. Data recycling schedule of the storage unit cluster.
  • the usage information of all storage units in the cluster is collected, the storage units that need garbage collection are determined based on the usage information, and they are temporarily removed from the cluster and garbage collected after removal.
  • removing or adding storage units is a very common operation for distributed storage systems and will basically have no impact on the performance of the cluster.
  • because the storage unit has been removed from the cluster during garbage collection, there is no need to consider the performance impact of garbage collection on this storage unit. Therefore, during this period, the storage unit can perform full garbage collection, that is, occupy as many system resources as possible for garbage collection, so more storage space can be released.
  • FIG. 7 is a schematic structural diagram of a data recovery device provided by an embodiment of the present application, including: a processor 701 and a memory 702;
  • the memory 702 stores one or more programs that are executable by the processor 701.
  • the processor 701 executes a data recovery method corresponding to the aforementioned embodiment.
  • Embodiments of the present application provide a computer-readable storage medium that stores executable instructions for causing the processor to implement the data recovery method when executed.
  • embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, magnetic disk storage and optical storage, etc.) embodying computer-usable program code therein.
  • These computer program instructions may also be stored in a computer-readable memory that causes a computer or other programmable data processing apparatus to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in a process or processes of the flowchart and/or a block or blocks of the block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operating steps to be performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in a process or processes of the flowchart and/or a block or blocks of the block diagram.
  • the usage information of the storage units in the cluster is collected, the first storage unit and the second storage unit are determined according to the recycling status in the usage information, the target storage unit is determined among the first storage units according to the historical recycling time, and finally the data processing service of the target storage unit is stopped (that is, it is moved out of the cluster) and the data recycling operation is performed on the target storage unit.
  • on the one hand, removing or adding storage units does not affect the performance of the distributed storage cluster; on the other hand, because the storage unit has been moved out of the cluster during data (garbage) collection, full data recycling can be carried out on it to release more storage space, and the data service of the storage unit can subsequently be restored based on the recycling status.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Memory System (AREA)

Abstract

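Embodiments of this application disclose a data recycling method, system, apparatus, computer-readable storage medium, and program product. The method includes: within a current scheduling cycle, obtaining usage information of the storage units in a storage unit cluster, the usage information including a recycling status and a historical recycling time; determining, based on the recycling status, at least one first storage unit in the storage unit cluster that is not in the data recycling state; when the recycling status indicates the data recycling state, determining the storage unit corresponding to that recycling status as a second storage unit; determining a target storage unit from the at least one first storage unit based on the historical recycling time; stopping the data processing service of the target storage unit, performing a data recycling operation through the target storage unit, and updating the recycling status of the target storage unit; and stopping the data recycling operation of the second storage unit and resuming the data processing service of the second storage unit. This scheme can improve garbage collection efficiency and storage cluster performance.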

Description

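Data recycling method, system, apparatus, computer-readable storage medium, and program product
Cross-reference to related applications
This application is based on, and claims priority to, Chinese patent application No. 202210305712.9, filed on March 25, 2022 and entitled "Data recycling method, system, apparatus, and computer-readable storage medium", the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of infrastructure, and in particular to a data recycling method, system, apparatus, computer-readable storage medium, and program product.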
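Background
When users work with products built on distributed storage (such as network drives or object storage), deletion is one of the most common operations. In practice, a storage system usually does not erase a deleted file from disk immediately, because too many delete operations would degrade the performance of the whole system. Instead, the storage system typically marks the file as "deleted" so that it is no longer visible to the user, which removes it from the user's point of view. The storage system then keeps a record of files that are marked "deleted" but still exist on the storage medium. After a grace period (grace time), the storage system deletes these files at a scheduled time (for example in the early morning, when traffic is low) or at a fixed interval (for example every hour), which actually frees the storage space. This is garbage collection in a distributed storage system.
However, this garbage collection process is usually started spontaneously by each logical storage unit, which makes its progress hard to manage centrally. Garbage collection also consumes a large amount of system resources, so a logical storage unit that is collecting garbage easily affects the data service performance of the whole storage cluster. This reduces both the performance of the storage cluster and the efficiency of garbage collection.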
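Summary
Embodiments of this application provide a data recycling method, system, apparatus, computer-readable storage medium, and program product that can improve garbage collection efficiency and storage cluster performance.
The technical solution of this application is implemented as follows:
An embodiment of this application provides a data recycling method, the method including:
within a current scheduling cycle, obtaining usage information of the storage units in a storage unit cluster, the usage information including a recycling status and a historical recycling time;
determining, based on the recycling status, at least one first storage unit in the storage unit cluster that is not in the data recycling state;
when the recycling status indicates the data recycling state, determining the storage unit corresponding to that recycling status as a second storage unit;
determining a target storage unit from the at least one first storage unit based on the historical recycling time;
stopping the data processing service of the target storage unit, performing a data recycling operation through the target storage unit, and updating the recycling status of the target storage unit; and
stopping the data recycling operation of the second storage unit and resuming the data processing service of the second storage unit.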
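An embodiment of this application provides a data recycling system including a control unit and a storage unit cluster, the control unit being interconnected with the storage units in the storage unit cluster through a preset interface; wherein,
the control unit is configured to: within a current scheduling cycle, obtain usage information of the storage units in the storage unit cluster, the usage information including a recycling status and a historical recycling time; determine, based on the recycling status, at least one first storage unit in the storage unit cluster that is not in the data recycling state; when the recycling status indicates the data recycling state, determine the storage unit corresponding to that recycling status as a second storage unit; determine a target storage unit from the at least one first storage unit based on the historical recycling time; send a data recycling instruction to the target storage unit through the preset interface and update the recycling status of the target storage unit; and send a stop-recycling instruction to the second storage unit through the preset interface and resume the data processing service of the second storage unit;
the storage unit is configured to: upon receiving the data recycling instruction, stop the data processing service and perform the data recycling operation; and upon receiving the stop-recycling instruction, stop the data recycling operation and correspondingly start the data processing service.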
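An embodiment of this application provides a data recycling apparatus including an acquisition unit, a determination unit, a data recycling unit, and a recovery unit; wherein,
the acquisition unit is configured to obtain, within a current scheduling cycle, usage information of the storage units in a storage unit cluster, the usage information including a recycling status and a historical recycling time;
the determination unit is configured to determine, based on the recycling status, at least one first storage unit in the storage unit cluster that is not in the data recycling state; when the recycling status indicates the data recycling state, determine the storage unit corresponding to that recycling status as a second storage unit; and determine a target storage unit from the at least one first storage unit based on the historical recycling time;
the data recycling unit is configured to stop the data processing service of the target storage unit, perform a data recycling operation through the target storage unit, and update the recycling status of the target storage unit;
the recovery unit is configured to stop the data recycling operation of the second storage unit and resume the data processing service of the second storage unit.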
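In the above apparatus, the determination unit is further configured to take a first storage unit whose historical recycling time is empty as the target storage unit, the historical recycling time representing the time at which the data recycling operation was last performed; and, when the historical recycling time of none of the at least one first storage unit is empty, to determine the target storage unit in the at least one first storage unit based on at least one of the storage unit load, the amount of data to be recycled, and the storage unit usage.
In the above apparatus, the determination unit is further configured to compute, according to preset weights, a weighted sum of at least one of the storage unit load, the amount of data to be recycled, and the storage unit usage, to obtain a weighted value for each first storage unit in the at least one first storage unit; and to determine the target storage unit from the at least one first storage unit according to the weighted value of each first storage unit.
In the above apparatus, the determination unit is further configured to, when the number of first storage units with the highest weighted value in the at least one first storage unit is greater than a preset quantity threshold, take the first storage units with the highest weighted value as candidate storage units; and to determine the target storage unit based on the historical recycling times of the candidate storage units.
In the above apparatus, the determination unit is further configured to calculate the difference between the historical recycling time of each candidate storage unit and the current time; and to determine the target storage unit among the candidate storage units based on the difference.
In the above apparatus, the recovery unit is further configured to send a stop-recycling instruction to the second storage unit through a first preset interface, so that the second storage unit stops the data recycling operation; to update the recycling status of the at least one second storage unit to not being in the data recycling state and update the historical recycling time of the at least one second storage unit; and to resume the data processing service of the second storage unit.
In the above apparatus, the determination unit is further configured to, when the recycling status indicates that the storage unit is not in the data recycling state, the working status indicates normal operation, and the amount of data to be recycled is greater than a preset data amount threshold, determine the storage unit as a first storage unit, thereby determining the at least one first storage unit.
In the above apparatus, the data recycling unit is further configured to send a data recycling instruction to the target storage unit through a second preset interface, so as to stop the data processing service of the target storage unit and perform the data recycling operation through the target storage unit; and to update the recycling status of the target storage unit to being in the data recycling state.
In the above apparatus, the data recycling unit is further configured to enter the next scheduling cycle after a preset waiting time and perform data recycling on the storage unit cluster in that cycle, so that data recycling scheduling of the storage unit cluster is achieved through at least one scheduling cycle.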
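An embodiment of this application provides a data recycling apparatus, the data recycling apparatus including:
a memory for storing executable instructions;
a processor for executing the executable instructions stored in the memory; when the executable instructions are executed, the processor performs the data recycling method as applied to a terminal.
An embodiment of this application provides a storage medium storing executable instructions that, when executed, cause a processor to perform the data recycling method described in the embodiments of this application.
An embodiment of this application provides a computer program product including a computer program or instructions that, when executed by a processor, implement the data recycling method provided by the embodiments of this application.
Embodiments of this application provide a data recycling method, system, apparatus, computer-readable storage medium, and program product. The method includes: within a current scheduling cycle, obtaining usage information of the storage units in a storage unit cluster, the usage information including a recycling status and a historical recycling time; determining, based on the recycling status, at least one first storage unit in the storage unit cluster that is not in the data recycling state; when the recycling status indicates the data recycling state, determining the storage unit corresponding to that recycling status as a second storage unit; determining a target storage unit from the at least one first storage unit based on the historical recycling time; stopping the data processing service of the target storage unit, performing a data recycling operation through the target storage unit, and updating the recycling status of the target storage unit; and stopping the data recycling operation of the second storage unit and resuming the data processing service of the second storage unit. In this scheme, the target storage unit and the second storage unit are determined in the current scheduling cycle according to the recycling status. Once the data processing service of the target storage unit has been stopped, the target storage unit can use its full resources for data recycling, and the data service of a storage unit can later be restored according to its recycling status. This achieves unified scheduling of the storage units' data recycling operations, makes full use of the target storage unit's system resources, and improves garbage collection efficiency. It also reduces the impact on the data service performance of the storage cluster while the target storage unit runs garbage collection, which in turn improves storage cluster performance.
It should be understood that the general description above and the detailed description below are exemplary and explanatory only, and do not limit this application.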
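Brief description of the drawings
FIG. 1 is a first schematic flowchart of a data recycling method provided by an embodiment of this application;
FIG. 2 is a second schematic flowchart of a data recycling method provided by an embodiment of this application;
FIG. 3 is a third schematic flowchart of a data recycling method provided by an embodiment of this application;
FIG. 4 is a fourth schematic flowchart of a data recycling method provided by an embodiment of this application;
FIG. 5 is a fifth schematic flowchart of a data recycling method provided by an embodiment of this application;
FIG. 6 is a schematic structural diagram of a data recycling apparatus provided by an embodiment of this application;
FIG. 7 is a schematic structural diagram of another data recycling apparatus provided by an embodiment of this application.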
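Detailed description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of this application.
To help those skilled in the art better understand the solution, this application is described in further detail below with reference to the drawings and specific implementations. The method of the embodiments may be performed by a data recycling apparatus; in some embodiments, the data recycling apparatus may be a terminal or a server. FIG. 1 is a first schematic flowchart of a data recycling method provided by an embodiment of this application, and the description follows the steps shown in FIG. 1.
S101: Within the current scheduling cycle, obtain usage information of the storage units in the storage unit cluster; the usage information includes a recycling status and a historical recycling time.
In the embodiments of this application, the control unit collects the usage information of all storage units in the storage unit cluster within the current scheduling cycle. The usage information includes a recycling status and a historical recycling time. The recycling status is either "performing a garbage collection operation" or "not performing one"; from it, the garbage (that is, data) operation information of the storage unit can be determined. The historical recycling time is the time at which the storage unit last performed a garbage collection operation.
In the embodiments of this application, a storage unit is a logical storage unit that can store, read, and write data. In some embodiments, S101 may be performed by a control unit connected to the storage units through an interface. The control unit can direct a storage unit to perform certain operations through the interface, for example by sending a garbage collection instruction that makes the storage unit perform garbage collection. From the instructions it sends and the cluster status information it collects, the control unit can track the garbage collection state of the entire cluster.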
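S102: Based on the recycling status, determine at least one first storage unit in the storage unit cluster that is not in the data recycling state.
In the embodiments of this application, the collected recycling status of each storage unit is used to judge whether that storage unit is currently performing a data recycling operation; a storage unit that is not performing one is determined to be a first storage unit.
S103: When the recycling status indicates the data recycling state, determine the storage unit corresponding to that recycling status as a second storage unit.
In some embodiments of this application, a storage unit that is performing a data recycling operation, according to its recycling status, is determined to be a second storage unit.
S104: Based on the historical recycling time, determine a target storage unit from the at least one first storage unit.
In the embodiments of this application, the target storage unit is determined from the at least one first storage unit by examining the historical recycling times of the storage units.
In the embodiments of this application, the target storage unit is a storage unit on which data recycling needs to be performed; there may be one target storage unit or several.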
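S105: Stop the data processing service of the target storage unit, perform a data recycling operation through the target storage unit, and update the recycling status of the target storage unit.
In the embodiments of this application, the data processing service of the target storage unit is suspended, that is, the target storage unit is temporarily moved out of the cluster. The data recycling operation is then performed on the removed target storage unit, and its recycling status is updated to indicate that it is performing a data recycling operation.
S106: Stop the data recycling operation of the second storage unit and resume the data processing service of the second storage unit.
In the embodiments of this application, the second storage unit stops the data recycling operation and is then added back to the storage unit cluster, where it resumes its data processing service.
It can be understood that, in the embodiments of this application, the usage information of the storage units in the cluster is collected; the first storage units and the second storage unit are determined from the recycling status in the usage information; the target storage unit is determined among the first storage units according to the historical recycling time; and finally the data processing service of the target storage unit is stopped (that is, it is moved out of the cluster) and the data recycling operation is performed on it. On the one hand, removing or adding storage units is a very common operation for a distributed storage system and has essentially no impact on cluster performance. On the other hand, because the storage unit has already been moved out of the cluster while it performs data (garbage) recycling, the performance impact of data recycling on that storage unit no longer needs to be considered. During this period the storage unit can therefore perform full data recycling, that is, occupy as many system resources as possible for data recycling, and so release more storage space.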
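As a reading aid, one scheduling cycle (S101 to S106) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the `StorageUnit` record, its field names, and `run_cycle` are hypothetical, and target selection is simplified to "never recycled first, otherwise the oldest historical recycling time".

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class StorageUnit:
    """Hypothetical record of one storage unit's usage information."""
    name: str
    recycling: bool = False                    # recycling status
    last_recycled: Optional[datetime] = None   # historical recycling time
    serving: bool = True                       # True while in the cluster


def run_cycle(units: list[StorageUnit], now: datetime) -> Optional[StorageUnit]:
    """Run one scheduling cycle (S101 to S106) and return the chosen target."""
    # S102 / S103: partition the collected units by recycling status.
    first_units = [u for u in units if not u.recycling]
    second_units = [u for u in units if u.recycling]

    # S104 (simplified assumption): never-recycled units win, then the oldest.
    target = min(
        first_units,
        key=lambda u: (u.last_recycled is not None, u.last_recycled or now),
        default=None,
    )

    # S105: take the target out of service and mark it as recycling.
    if target is not None:
        target.serving = False
        target.recycling = True

    # S106: stop recycling on the second units and return them to service.
    for u in second_units:
        u.recycling = False
        u.last_recycled = now
        u.serving = True
    return target
```

Because the partition is computed before any state changes, a unit that stops recycling in this cycle is not eligible as a target until the next cycle, which matches the order of S102 to S106.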
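In some embodiments of this application, S104 can be implemented through S1041 to S1042, described in the following steps.
S1041: Take a first storage unit whose historical recycling time is empty as the target storage unit; the historical recycling time represents the time at which the data recycling operation was last performed.
In some embodiments of this application, after the first storage units are determined, the historical recycling time of each first storage unit is checked. The historical recycling time represents the time at which the storage unit last performed a data recycling operation, so a first storage unit whose historical recycling time is empty is taken as the target storage unit.
In some embodiments of this application, an empty historical recycling time means that the first storage unit has never performed a garbage collection operation.
S1042: When the historical recycling time of none of the at least one first storage unit is empty, determine the target storage unit in the at least one first storage unit based on at least one of the storage unit load, the amount of data to be recycled, and the storage unit usage.
In some embodiments of this application, if the historical recycling time of every first storage unit is non-empty, the target storage unit is determined in the at least one first storage unit based on at least one of the storage unit load, the amount of data to be recycled, and the storage unit usage.
In some embodiments of this application, if only one first storage unit has an empty historical recycling time, that first storage unit is directly determined as the target storage unit; if two or more first storage units have an empty historical recycling time, the target storage unit is determined among those two or more storage units according to at least one of the storage unit load, the amount of data to be recycled, and the storage unit usage.
In some embodiments of this application, the target storage unit may also be determined among those two or more storage units according to at least one of the current load percentage of the storage unit, the amount of data to be recycled as a percentage of the total data amount, and the storage unit usage as a percentage of the total storage capacity.
It can be understood that, in some embodiments of this application, a first storage unit whose historical recycling time is empty is determined as the target storage unit; if no historical recycling time is empty, the target storage unit is determined in the at least one first storage unit according to at least one of the storage unit load, the amount of data to be recycled, and the storage unit usage. This lets the storage units occupy system resources for data recycling in an orderly and reasonable way.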
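The two-stage choice in S1041 and S1042 can be sketched as below. This is an illustrative sketch only: the dictionary keys and the `score` callback are assumptions introduced here, standing in for whichever of load, data to be recycled, and usage an implementation chooses to rank by.

```python
from typing import Callable, Optional

# Hypothetical shape: {"name": str, "last_recycled": float | None, ...}
Unit = dict


def select_target(first_units: list[Unit],
                  score: Callable[[Unit], float]) -> Optional[Unit]:
    """Pick a target among first storage units per S1041 / S1042."""
    if not first_units:
        return None
    never = [u for u in first_units if u["last_recycled"] is None]
    # S1041: exactly one never-recycled unit is taken directly.
    if len(never) == 1:
        return never[0]
    # Several never-recycled units: rank only among them.
    # S1042: none are empty, so rank all first storage units.
    pool = never if never else first_units
    return max(pool, key=score)
```

For example, `select_target(units, lambda u: u["pending"])` ranks by the amount of data to be recycled only when the empty-time rule does not already decide the outcome.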
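In some embodiments of this application, as shown in FIG. 2, which is a second schematic flowchart of a data recycling method provided by an embodiment of this application, S1042 can be implemented through S201 to S202, described in the following steps.
S201: According to preset weights, compute a weighted sum of at least one of the storage unit load, the amount of data to be recycled, and the storage unit usage, to obtain a weighted value for each first storage unit in the at least one first storage unit.
In some embodiments of this application, the weighted value of each first storage unit is calculated from at least one of the storage unit load, the amount of data to be recycled, and the storage unit usage, combined with the preset weights.
In some embodiments of this application, when there are two or more first storage units, the weight for the storage unit load may be set to 0.3, the weight for the amount of data to be recycled to 0.5, and the weight for the storage unit usage to 0.2. The weighted value may be computed from only one of the three factors with its weight, that is, with the weights of the other two factors temporarily set to 0. For example, the weight for the amount of data to be recycled may be set to 0.5 and the weights for the storage unit usage and the storage unit load to 0, and the weighted values of the two or more first storage units are then calculated.
In some embodiments of this application, any two of the three factors may be used with their weights, that is, with the weight of the remaining factor set to 0. For example, the weight for the storage unit load may be set to 0.3, the weight for the amount of data to be recycled to 0.5, and the weight for the storage unit usage to 0, and the weighted value of the storage unit is then calculated.
In some embodiments of this application, all three factors, the storage unit load, the amount of data to be recycled, and the storage unit usage, may be used together with their weights to calculate the weighted value of the storage unit.
In some embodiments of this application, the weighted value may also be calculated from at least one of the current load percentage of the storage unit, the amount of data to be recycled as a percentage of the total data amount, and the storage unit usage as a percentage of the total storage capacity, combined with the corresponding weights.
S202: According to the weighted value of each first storage unit, determine the target storage unit from the at least one first storage unit.
In some embodiments of this application, after the weighted value of each first storage unit is calculated, the values are sorted and the first storage unit with the highest weighted value is determined as the target storage unit.
In some embodiments of this application, a threshold may be set: weighted values greater than the threshold are taken as candidate values, the candidate values are sorted, and the first storage unit with the highest weighted value is determined as the target storage unit.
It can be understood that, in some embodiments of this application, a weighted sum of at least one of the storage unit load, the amount of data to be recycled, and the storage unit usage is computed according to the preset weights to obtain the weighted value of each first storage unit, and the final target storage unit is determined from the weighted values. This further lets the storage units occupy system resources for data recycling in an orderly and reasonable way.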
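A small sketch of the weighted sum in S201 and the selection in S202, using the example weights 0.3, 0.5, and 0.2 from the text. The key names (`load_pct`, `pending_pct`, `usage_pct`) and the use of percentages as inputs are assumptions made for illustration; a factor that is left out of the weights behaves as if its weight were 0.

```python
# Example weights from the description; the key names are hypothetical.
WEIGHTS = {"load_pct": 0.3, "pending_pct": 0.5, "usage_pct": 0.2}


def weighted_value(unit: dict, weights: dict = WEIGHTS) -> float:
    """S201: weighted sum over whichever factors appear in `weights`."""
    return sum(w * unit.get(key, 0.0) for key, w in weights.items())


def pick_highest(units: list[dict], weights: dict = WEIGHTS) -> dict:
    """S202: the first storage unit with the highest weighted value."""
    return max(units, key=lambda u: weighted_value(u, weights))
```

With a unit at 40% load, 60% data to be recycled, and 50% usage, the sum is 0.3 × 40 + 0.5 × 60 + 0.2 × 50 = 52. Passing `{"pending_pct": 0.5}` as the weights reproduces the single-factor variant described above.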
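In some embodiments of this application, as shown in FIG. 3, which is a third schematic flowchart of a data recycling method provided by an embodiment of this application, S202 can be implemented through S2011 to S2012, described in the following steps.
S2011: When the number of first storage units with the highest weighted value in the at least one first storage unit is greater than a preset quantity threshold, take the first storage units with the highest weighted value as candidate storage units.
In some embodiments of this application, if the number of first storage units with the highest weighted value is greater than the preset quantity threshold, that is, several first storage units share the same highest weighted value, all of those first storage units are taken as candidate storage units.
In some embodiments, the preset quantity threshold may be 1, or may be set as needed; it is chosen according to the actual situation and is not limited by the embodiments of this application.
S2012: Determine the target storage unit based on the historical recycling times of the candidate storage units.
In some embodiments of this application, after multiple candidate storage units are obtained, the target storage unit is determined according to their historical recycling times.
In some embodiments of this application, as shown in FIG. 4, which is a fourth schematic flowchart of a data recycling method provided by an embodiment of this application, S2012 can be implemented through S301 to S302, described in the following steps.
S301: Calculate the difference between the historical recycling time of each candidate storage unit and the current time.
In some embodiments of this application, the historical recycling time of a candidate storage unit, that is, the time at which it last performed a data recycling operation, is obtained, and the difference between that time and the current scheduling time is calculated.
In some embodiments of this application, each candidate storage unit has one corresponding difference.
S302: Based on the differences, determine the target storage unit among the candidate storage units.
In some embodiments of this application, after the difference for each candidate storage unit is calculated, the differences are sorted to find the largest one, and the candidate storage unit with the largest difference is determined as the target storage unit.
In some embodiments of this application, one or more candidate storage units whose difference is greater than a preset difference threshold may instead be determined as target storage units. The number of target storage units can be set according to the actual state of the storage cluster, for example so that removing that many target storage units does not seriously affect the data service performance of the cluster. It is chosen according to the actual situation and is not limited by the embodiments of this application.
It can be understood that, in some embodiments of this application, when several candidate storage units share the same highest weighted value, the target storage unit is determined by calculating the difference between each candidate's historical recycling time and the current time. This accurately identifies the storage unit that needs data recycling and thereby improves recycling efficiency.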
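The tie-break in S2011 to S302 can be sketched as follows, assuming a preset quantity threshold of 1 and timestamps expressed as plain numbers. The tuple layout and the name `break_tie` are hypothetical, and ties are detected by exact equality of the weighted values, which a real implementation might relax to a tolerance.

```python
TIE_THRESHOLD = 1  # assumed preset quantity threshold


def break_tie(scored: list[tuple[str, float, float]], now: float) -> str:
    """scored holds (name, weighted_value, last_recycled_timestamp)."""
    top = max(value for _, value, _ in scored)
    # S2011: every unit sharing the highest weighted value is a candidate.
    candidates = [s for s in scored if s[1] == top]
    if len(candidates) <= TIE_THRESHOLD:
        return candidates[0][0]
    # S301: difference between the current time and the last recycling time.
    # S302: the candidate with the largest difference becomes the target.
    return max(candidates, key=lambda s: now - s[2])[0]
```

Choosing the largest difference favors the unit that has waited longest since its last recycling, which is what keeps the selection fair across cycles.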
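In some embodiments of this application, S106 can be implemented through S1061 to S1063, described in the following steps.
S1061: Send a stop-recycling instruction to the second storage unit through a first preset interface, so that the second storage unit stops the data recycling operation.
In some embodiments of this application, the control unit sends the stop-recycling instruction to the second storage unit through the first preset interface, so that the second storage unit stops the data recycling operation.
In some embodiments of this application, the second storage unit is a storage unit that is performing a data recycling operation.
S1062: Update the recycling status of the at least one second storage unit to not being in the data recycling state, and update the historical recycling time of the at least one second storage unit.
S1063: Resume the data processing service of the second storage unit.
In some embodiments of this application, resuming the data processing service of the second storage unit means adding the second storage unit, whose recycling status has been changed, back to the cluster.
In some embodiments of this application, the order of S1062 and S1063 is not restricted.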
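The three steps S1061 to S1063 can be sketched as below. The `UnitState` record and the `send_stop` callback are assumptions introduced for illustration; the callback stands in for the first preset interface, whose concrete form the text does not specify.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class UnitState:
    """Hypothetical control-unit record for one second storage unit."""
    name: str
    recycling: bool
    in_cluster: bool
    last_recycled: Optional[float] = None


def restore_unit(unit: UnitState, now: float,
                 send_stop: Callable[[str], None]) -> None:
    """Stop recycling on a second storage unit and return it to service."""
    # S1061: stop-recycling instruction over the (assumed) first interface.
    send_stop(unit.name)
    # S1062: clear the recycling status and record the recycling time.
    unit.recycling = False
    unit.last_recycled = now
    # S1063: the unit rejoins the cluster and serves data again.
    unit.in_cluster = True
```

Since the text leaves the order of S1062 and S1063 open, the last two steps could be swapped without changing the end state.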
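In some embodiments of this application, S102 can be implemented through S1021, described in the following step.
S1021: When the recycling status indicates that the storage unit is not in the data recycling state, the working status indicates normal operation, and the amount of data to be recycled is greater than a preset data amount threshold, determine the storage unit as a first storage unit, thereby determining the at least one first storage unit; the usage information also includes the working status.
In some embodiments of this application, the working status in the usage information is used to identify storage units that are currently working normally and need a data recycling operation. Their recycling status is then checked to find those that are not in the data recycling state, and the storage units found are called first storage units. The at least one first storage unit is determined in this way.
It can be understood that, in some embodiments of this application, storage units that are working normally and need garbage collection are selected from the storage unit cluster to perform data recycling, so that system resources are used effectively.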
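The three conditions in S1021 amount to a filter over the collected usage information. A minimal sketch, assuming hypothetical dictionary keys (`recycling`, `working`, `pending`) for the recycling status, the working status, and the amount of data to be recycled:

```python
def find_first_units(usage: list[dict], pending_threshold: int) -> list[dict]:
    """S1021: keep units that are idle, healthy, and have enough to recycle."""
    return [
        u for u in usage
        if not u["recycling"]                  # not in the data recycling state
        and u["working"]                       # working status is normal
        and u["pending"] > pending_threshold   # strictly above the threshold
    ]
```

The comparison is strict because the text requires the amount of data to be recycled to be greater than the threshold, so a unit sitting exactly at the threshold is skipped.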
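In some embodiments of this application, S105 can be implemented through S1051 to S1052, described in the following steps.
S1051: Send a data recycling instruction to the target storage unit through a second preset interface, so as to stop the data processing service of the target storage unit and perform the data recycling operation through the target storage unit.
In some embodiments of this application, the control unit sends the data recycling instruction to the target storage unit through the second preset interface, so that the target storage unit stops its data processing service and performs the data recycling operation.
In some embodiments of this application, making the target storage unit stop its data processing service means temporarily removing the target storage unit from the storage unit cluster.
In some embodiments of this application, the first preset interface and the second preset interface may be the same or different.
S1052: Update the recycling status of the target storage unit to being in the data recycling state.
In some embodiments of this application, updating the recycling status of the target storage unit to being in the data recycling state indicates that the target storage unit is performing a data recycling operation.
It can be understood that, in some embodiments of this application, a data recycling instruction is sent to the target storage unit through the interface, which stops the data processing service of the target storage unit and has it perform the data recycling operation. Performing the data recycling operation after the target storage unit has been moved out of the cluster lets the target storage unit use its full resources for data recycling, which improves garbage collection efficiency. It also reduces the impact on the data service performance of the storage cluster while the target storage unit runs garbage collection, which in turn improves storage cluster performance.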
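On the control unit side, S1051 and S1052 can be sketched as a dispatch followed by a bookkeeping update. The status table, the status strings, and the `send_recycle` callback are hypothetical; the callback stands in for the second preset interface, and the sketch accepts several targets because the text allows more than one.

```python
from typing import Callable, Iterable


def start_recycling(
    targets: Iterable[str],
    status: dict[str, str],
    send_recycle: Callable[[str], None],
) -> None:
    """Dispatch targets for recycling and record their new status."""
    for name in targets:
        # S1051: the instruction makes the unit stop serving and recycle.
        send_recycle(name)
        # S1052: the control unit's own record now says "recycling".
        status[name] = "recycling"
```

Keeping the status table in the control unit is what lets the next cycle recognize these units as second storage units without querying them first; whether an implementation does that or re-reads the status from each unit is a design choice the text leaves open.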
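In some embodiments of this application, the method further includes S401, as follows.
S401: Enter the next scheduling cycle after a preset waiting time and perform data recycling on the storage unit cluster in that cycle, so that data recycling scheduling of the storage unit cluster is achieved through at least one scheduling cycle.
In some embodiments of this application, the next scheduling cycle is entered after the preset waiting time, and garbage collection is performed on the storage unit cluster again in that cycle.
The algorithm for determining the target storage unit in this application is essentially a polling algorithm, which ensures that each storage unit completes at least one garbage collection within a certain period. Garbage collection scheduling is also not tied to a fixed time of day. Business traffic is generally lowest in the early morning, so the control unit can moderately increase the number of storage units performing garbage collection between 00:00 and 02:00.
It can be understood that, in the embodiments of this application, entering the next scheduling cycle after a preset waiting time and performing data recycling on the storage unit cluster in that cycle achieves data recycling scheduling of the cluster through at least one scheduling cycle, and ensures that each storage unit completes at least one garbage collection within a certain period.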
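The waiting loop in S401 and the optional early-morning boost can be sketched as below. The function names, the `base` and `night_extra` parameters, and the injected `sleep` and `clock` callables are all assumptions made so the sketch can be exercised without real delays; only the 00:00 to 02:00 window comes from the text.

```python
import time
from datetime import datetime, time as clock_time
from typing import Callable


def units_per_cycle(now: datetime, base: int = 1, night_extra: int = 1) -> int:
    """How many units to recycle this cycle; more in the low-traffic window."""
    in_low_traffic = clock_time(0, 0) <= now.time() < clock_time(2, 0)
    return base + night_extra if in_low_traffic else base


def run_scheduler(
    cycle: Callable[[int], None],
    wait_seconds: float,
    max_cycles: int,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], datetime] = datetime.now,
) -> None:
    """S401: run a cycle, wait the preset time, then enter the next cycle."""
    for _ in range(max_cycles):
        cycle(units_per_cycle(clock()))
        sleep(wait_seconds)
```

A production scheduler would typically loop indefinitely; `max_cycles` is here only to keep the sketch bounded.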
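An embodiment of this application provides a data recycling method that can be applied to garbage collection in a real storage cluster. The storage cluster may contain multiple storage units, each connected to the control unit through an interface. The method is shown in FIG. 5.
S1: The control unit collects the information of all storage units.
S2: Traverse the information of each storage unit in turn.
In S1 to S2, the control unit first collects the usage information of all storage units in the cluster, including: whether the storage unit is working normally, whether it needs garbage collection, the time of its last garbage collection operation (empty if it has never run one), and whether it is currently in the garbage collection state.
S3: Determine whether the storage unit is in the garbage collection state; if so, go to S8; if not, go to S4.
S4: Determine whether the time of the storage unit's last recycling operation is empty; if so, go to S6; if not, go to S5.
In S2 to S4, the control unit analyzes the information and selects one storage unit from the units that are working normally and need garbage collection, temporarily moves it out of the data cluster, and updates its status information to garbage collection in progress; at the same time, it directs this storage unit to perform a full garbage collection operation. In some embodiments, if the time of a storage unit's last garbage collection operation is empty, the storage unit has never performed garbage collection, and it is selected as the storage unit that needs to perform garbage collection, that is, the first storage unit.
S5: Determine whether the difference between the storage unit's last garbage collection time and the current time is the largest; if so, go to S6; if not, go to S3.
In S5, if every storage unit has performed garbage collection at least once, a weighted value is calculated from the storage unit's current load percentage and its weight (for example 0.3), the number of objects awaiting garbage collection as a percentage of the total number of objects and its weight (for example 0.5), and the storage unit's current usage as a percentage of total storage capacity and its weight (for example 0.2). The storage unit with the highest weighted value is selected as the target storage unit. If more than one unit shares the highest weighted value, the one with the largest difference between its last garbage collection time and the current time is selected as the target storage unit.
S6: Move the storage unit out of the cluster.
In S6, the control unit temporarily moves the target storage unit out of the data cluster and updates its status information to garbage collection in progress.
S7: The storage unit starts full garbage collection; then go to S10.
In S7, the storage unit can use its full resources for the garbage collection operation.
S8: The storage unit stops garbage collection.
S9: Add the storage unit back to the cluster; then go to S10.
In S9, the control unit uses the information about whether each storage unit is in the garbage collection state to find the storage unit that performed garbage collection in the previous round, which is the second storage unit. The control unit directs the second storage unit to stop garbage collection and adds it back to the data cluster; it also updates the time of the unit's last garbage collection operation, clears its garbage collection state, and returns it to use.
S10: Wait for one scheduling cycle.
In S10, the control unit waits for one scheduling cycle and then returns to S1 to start again.
It can be understood that, in the embodiments of this application, the usage information of all storage units in the cluster is collected, the storage unit that needs garbage collection is determined from that information, and the unit is temporarily moved out of the cluster and garbage collected after removal. From the moment a storage unit is selected for garbage collection until the collection ends, the unit is outside the cluster. During this time, whatever operation is performed on the storage unit has no effect on the cluster as a whole, and garbage collection efficiency is improved.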
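To illustrate the polling behavior of the S1 to S10 loop over several cycles, the following simulation tracks which unit is picked in each cycle. It is a simplified sketch: units are identified by integers, the cycle index serves as the clock, exactly one target is chosen per cycle, and the weighted value of S5 is omitted so that the choice reduces to "never collected first, otherwise longest since last collection". The name `simulate` and these simplifications are assumptions, not part of the described method.

```python
from typing import Optional


def simulate(n_units: int, n_cycles: int) -> list[Optional[int]]:
    """Return the unit picked in each cycle of a simplified S1-S10 loop."""
    last: dict[int, Optional[int]] = {i: None for i in range(n_units)}
    recycling: set[int] = set()
    history: list[Optional[int]] = []
    for cycle in range(n_cycles):
        # S3, S8, S9: units collecting since the previous round stop,
        # record their collection time, and rejoin the cluster.
        finished = set(recycling)
        for u in finished:
            last[u] = cycle
        recycling.clear()
        # S3-S5: only units that were idle when information was collected.
        idle = [u for u in last if u not in finished]
        if not idle:
            history.append(None)
            continue
        target = min(
            idle,
            key=lambda u: (last[u] is not None,
                           last[u] if last[u] is not None else 0),
        )
        # S6, S7: move the target out of the cluster and start collecting.
        recycling.add(target)
        history.append(target)
    return history
```

With three units, six cycles visit the units in a repeating order, which is the "every unit is collected at least once within a certain period" property the text describes.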
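An embodiment of this application provides a data recycling system including a control unit and a storage unit cluster, the control unit being interconnected with the storage units in the storage unit cluster through a first preset interface and a second preset interface; wherein,
the control unit is configured to: within a current scheduling cycle, obtain usage information of the storage units in the storage unit cluster, the usage information including a recycling status and a historical recycling time; determine, based on the recycling status, at least one first storage unit in the storage unit cluster that is not in the data recycling state; when the recycling status indicates the data recycling state, determine the storage unit corresponding to that recycling status as a second storage unit; determine a target storage unit from the at least one first storage unit based on the historical recycling time; send a data recycling instruction to the target storage unit through the preset interface and update the recycling status of the target storage unit; and send a stop-recycling instruction to the second storage unit through the preset interface and resume the data processing service of the second storage unit;
the storage unit is configured to: upon receiving the data recycling instruction, stop the data processing service and perform the data recycling operation; and upon receiving the stop-recycling instruction, stop the data recycling operation and correspondingly start the data processing service.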
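The storage unit's side of this exchange can be sketched as a two-instruction state machine. The `Instruction` enum, its member names, and the `handle` method are hypothetical; the text only states that a data recycling instruction and a stop-recycling instruction exist, not how they are encoded.

```python
from enum import Enum


class Instruction(Enum):
    """Hypothetical encoding of the two instructions named in the text."""
    RECYCLE = "recycle"
    STOP_RECYCLE = "stop_recycle"


class StorageUnit:
    """Minimal model of a storage unit reacting to control-unit instructions."""

    def __init__(self) -> None:
        self.serving = True      # data processing service is running
        self.recycling = False   # data recycling operation is running

    def handle(self, instruction: Instruction) -> None:
        if instruction is Instruction.RECYCLE:
            # Stop the data processing service, then recycle.
            self.serving = False
            self.recycling = True
        elif instruction is Instruction.STOP_RECYCLE:
            # Stop recycling, then start the data processing service again.
            self.recycling = False
            self.serving = True
        else:
            raise ValueError(f"unknown instruction: {instruction!r}")
```

In this model a unit is never serving and recycling at the same time, which is the property that lets recycling use the unit's full resources.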
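An embodiment of this application provides a data recycling apparatus, as shown in FIG. 6, which is a schematic structural diagram of a data recycling apparatus provided by an embodiment of this application. The data recycling apparatus includes an acquisition unit 601, a determination unit 602, a data recycling unit 603, and a recovery unit 604; wherein,
the acquisition unit 601 is configured to obtain, within a current scheduling cycle, usage information of the storage units in the storage unit cluster; the usage information includes a recycling status and a historical recycling time.
The determination unit 602 is configured to determine, based on the recycling status, at least one first storage unit in the storage unit cluster that is not in the data recycling state.
The determination unit 602 is further configured to, when the recycling status indicates the data recycling state, determine the storage unit corresponding to that recycling status as a second storage unit.
The determination unit 602 is further configured to determine a target storage unit from the at least one first storage unit based on the historical recycling time.
The data recycling unit 603 is configured to stop the data processing service of the target storage unit, perform a data recycling operation through the target storage unit, and update the recycling status of the target storage unit.
The recovery unit 604 is configured to stop the data recycling operation of the second storage unit and resume the data processing service of the second storage unit.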
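The determination unit 602 is further configured to take a first storage unit whose historical recycling time is empty as the target storage unit, the historical recycling time representing the time at which the data recycling operation was last performed; and, when the historical recycling time of none of the at least one first storage unit is empty, to determine the target storage unit in the at least one first storage unit based on at least one of the storage unit load, the amount of data to be recycled, and the storage unit usage. The usage information further includes the storage unit load, the amount of data to be recycled, and the storage unit usage.
The determination unit 602 is further configured to compute, according to preset weights, a weighted sum of at least one of the storage unit load, the amount of data to be recycled, and the storage unit usage, to obtain a weighted value for each first storage unit in the at least one first storage unit; and to determine the target storage unit from the at least one first storage unit according to the weighted value of each first storage unit.
The determination unit 602 is further configured to, when the number of first storage units with the highest weighted value in the at least one first storage unit is greater than a preset quantity threshold, take the first storage units with the highest weighted value as candidate storage units; and to determine the target storage unit based on the historical recycling times of the candidate storage units.
The determination unit 602 is further configured to calculate the difference between the historical recycling time of each candidate storage unit and the current time; and to determine the target storage unit among the candidate storage units based on the difference.
The recovery unit 604 is configured to send a stop-recycling instruction to the second storage unit through the first preset interface, so that the second storage unit stops the data recycling operation; to update the recycling status of the at least one second storage unit to not being in the data recycling state and update the historical recycling time of the at least one second storage unit; and to resume the data processing service of the second storage unit.
The determination unit 602 is further configured to, when the recycling status indicates that the storage unit is not in the data recycling state, the working status indicates normal operation, and the amount of data to be recycled is greater than a preset data amount threshold, determine the storage unit as a first storage unit, thereby determining the at least one first storage unit.
The data recycling unit 603 is further configured to send a data recycling instruction to the target storage unit through the second preset interface, so as to stop the data processing service of the target storage unit and perform the data recycling operation through the target storage unit; and to update the recycling status of the target storage unit to being in the data recycling state.
The data recycling unit 603 is further configured to enter the next scheduling cycle after a preset waiting time and perform data recycling on the storage unit cluster in that cycle, so that data recycling scheduling of the storage unit cluster is achieved through at least one scheduling cycle.
It can be understood that, in the above apparatus, the usage information of all storage units in the cluster is collected, the storage unit that needs garbage collection is determined from that information, and the unit is temporarily moved out of the cluster and garbage collected after removal. On the one hand, removing or adding storage units is a very common operation for a distributed storage system and has essentially no impact on cluster performance. On the other hand, because the storage unit has already been moved out of the cluster while it performs garbage collection, the performance impact of garbage collection on that storage unit no longer needs to be considered. During this period the storage unit can therefore perform full garbage collection, that is, occupy as many system resources as possible for garbage collection, and so release more storage space.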
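Based on the method of the above embodiments, FIG. 7 is a schematic structural diagram of a data recycling apparatus provided by an embodiment of this application. The apparatus includes a processor 701 and a memory 702; the memory 702 stores one or more programs executable by the processor 701, and when the one or more programs are executed, the processor 701 performs the data recycling method corresponding to the foregoing embodiments.
An embodiment of this application provides a computer-readable storage medium storing executable instructions that, when executed by a processor, implement the data recycling method.
Those skilled in the art should understand that embodiments of this application may be provided as a method, a system, or a computer program product. Accordingly, this application may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware. Furthermore, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including, but not limited to, magnetic disk storage and optical storage) containing computer-usable program code.
This application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of this application. It should be understood that each process and/or block in the flowcharts and/or block diagrams, and combinations of processes and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operating steps is performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowcharts and/or one or more blocks of the block diagrams.
The above are only preferred embodiments of this application and are not intended to limit its scope of protection.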
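Industrial applicability
In the embodiments of this application, the usage information of the storage units in the cluster is collected; the first storage units and the second storage unit are determined from the recycling status in the usage information; the target storage unit is determined among the first storage units according to the historical recycling time; and finally the data processing service of the target storage unit is stopped (that is, it is moved out of the cluster) and the data recycling operation is performed on it. On the one hand, removing or adding storage units does not affect the performance of the distributed storage cluster. On the other hand, because the storage unit has already been moved out of the cluster while it performs data (garbage) recycling, full data recycling can be performed on it to release more storage space, and its data service can later be restored according to the recycling status. This achieves unified scheduling of the storage units' data recycling operations, makes full use of the target storage unit's system resources, and improves garbage collection efficiency. It also reduces the impact on the data service performance of the storage cluster while the target storage unit runs garbage collection, which in turn improves storage cluster performance.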

Claims (14)

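  1. A data recycling method, comprising:
    within a current scheduling cycle, obtaining usage information of the storage units in a storage unit cluster, the usage information comprising a recycling status and a historical recycling time;
    determining, based on the recycling status, at least one first storage unit in the storage unit cluster that is not in the data recycling state;
    when the recycling status indicates the data recycling state, determining the storage unit corresponding to that recycling status as a second storage unit;
    determining a target storage unit from the at least one first storage unit based on the historical recycling time;
    stopping the data processing service of the target storage unit, performing a data recycling operation through the target storage unit, and updating the recycling status of the target storage unit; and
    stopping the data recycling operation of the second storage unit and resuming the data processing service of the second storage unit.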
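  2. The method according to claim 1, wherein the usage information further comprises a storage unit load, an amount of data to be recycled, and a storage unit usage; and determining the target storage unit from the at least one first storage unit based on the historical recycling time comprises:
    taking a first storage unit whose historical recycling time is empty as the target storage unit, the historical recycling time representing the time at which the data recycling operation was last performed;
    when the historical recycling time of none of the at least one first storage unit is empty, determining the target storage unit in the at least one first storage unit based on at least one of the storage unit load, the amount of data to be recycled, and the storage unit usage.
  3. The method according to claim 2, wherein determining the target storage unit in the at least one first storage unit based on at least one of the storage unit load, the amount of data to be recycled, and the storage unit usage comprises:
    computing, according to preset weights, a weighted sum of at least one of the storage unit load, the amount of data to be recycled, and the storage unit usage, to obtain a weighted value for each first storage unit in the at least one first storage unit;
    determining the target storage unit from the at least one first storage unit according to the weighted value of each first storage unit.
  4. The method according to claim 3, wherein determining the target storage unit from the at least one first storage unit according to the weighted value of each first storage unit comprises:
    when the number of first storage units with the highest weighted value in the at least one first storage unit is greater than a preset quantity threshold, taking the first storage units with the highest weighted value as candidate storage units;
    determining the target storage unit based on the historical recycling times of the candidate storage units.
  5. The method according to claim 4, wherein determining the target storage unit based on the historical recycling times of the candidate storage units comprises:
    calculating the difference between the historical recycling time of each candidate storage unit and the current time;
    determining the target storage unit among the candidate storage units based on the difference.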
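  6. The method according to any one of claims 1 to 5, wherein stopping the data recycling operation of the second storage unit and resuming the data processing service of the second storage unit comprises:
    sending a stop-recycling instruction to the second storage unit through a first preset interface, so that the second storage unit stops the data recycling operation;
    updating the recycling status of the at least one second storage unit to not being in the data recycling state, and updating the historical recycling time of the at least one second storage unit;
    resuming the data processing service of the second storage unit.
  7. The method according to any one of claims 2 to 5, wherein the usage information further comprises a working status, and determining, based on the recycling status, at least one first storage unit in the storage unit cluster that is not in the data recycling state comprises:
    when the recycling status indicates that the storage unit is not in the data recycling state, the working status indicates normal operation, and the amount of data to be recycled is greater than a preset data amount threshold, determining the storage unit as a first storage unit, thereby determining the at least one first storage unit.
  8. The method according to any one of claims 1 to 5, wherein stopping the data processing service of the target storage unit, performing a data recycling operation through the target storage unit, and updating the recycling status of the target storage unit comprises:
    sending a data recycling instruction to the target storage unit through a second preset interface, so as to stop the data processing service of the target storage unit and perform the data recycling operation through the target storage unit;
    updating the recycling status of the target storage unit to being in the data recycling state.
  9. The method according to any one of claims 1 to 5, wherein the method further comprises:
    entering the next scheduling cycle after a preset waiting time and performing data recycling on the storage unit cluster in that cycle, so that data recycling scheduling of the storage unit cluster is achieved through at least one scheduling cycle.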
  10. 一种数据回收系统,包括:控制单元与存储单元集群;所述控制单元通过预设接口与所述存储单元集群中的存储单元互相连接;其中,
    所述控制单元,被配置为在当前调度周期内,获取存储单元集群中存储单元的使用信息;所述使用信息包括回收状态与历史回收时间;基于所述回收状态,确定所述存储单元集群中不处于数据回收状态的至少一个第一存储单元;在所述回收状态表征处于数据回收状态的情况下,将所述回收状态对应的存储单元确定为第二存储单元;基于所述历史回收时间,从所述至少一个第一存储单元中确定目标存储单元;通过所述预设接口,向所述目标存储单元发送数据回收指令,并更新所述目标存储单元的回收状态;通过所述预设接口,向所述第二存储单元发送停止回收指令,并恢复所述第二存储单元的数据处理服务;
    所述存储单元,被配置为在接收到所述数据回收指令的情况下,停止数据处理服务,并进行数据回收操作;在接收到所述停止回收指令的情况下,停止数据回收操作,并相应启动数据处理服务。
  11. 一种数据回收装置,包括获取单元、确定单元、数据回收单元和恢复单元;其中,
    所述获取单元,被配置为在当前调度周期内,获取存储单元集群中存储单元的使用信息;所述使用信息包括回收状态与历史回收时间;
    所述确定单元,被配置为基于所述回收状态,确定所述存储单元集群中不处于数据回收状态的至少一个第一存储单元;在所述回收状态表征处于数据回收状态的情况下,将所述回收状态对应的存储单元确定为第二存储单元;基于所述历史回收时间,从所述至少一个第一存储单元中确定目标存储单元;
    所述数据回收单元,被配置为停止所述目标存储单元的数据处理服务,通过所述目标存储单元执行数据回收操作,并更新所述目标存储单元的回收状态;
    所述恢复单元,被配置为停止所述第二存储单元的数据回收操作,并恢复所述第二存储单元的数据处理服务。
  12. 一种数据回收装置,包括:
    存储器,用于存储可执行指令;
    处理器,用于执行所述存储器中存储的可执行指令时,实现权利要求1至9任一项所述的方法。
  13. 一种计算机可读存储介质,存储有可执行指令,用于引起处理器执行时,实现权 利要求1至9任一项所述的方法。
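  14. A computer program product, comprising a computer program or instructions which, when executed by a processor, implement the method according to any one of claims 1 to 9.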
PCT/CN2023/082611 2022-03-25 2023-03-20 Data recycling method, system, and apparatus, computer-readable storage medium, and program product WO2023179569A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210305712.9 2022-03-25
CN202210305712.9A CN116841453A (zh) 2022-03-25 2022-03-25 Data recycling method, system, and apparatus, and computer-readable storage medium

Publications (1)

Publication Number Publication Date
WO2023179569A1 (zh) 2023-09-28

Family

ID=88099968

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2023/082611 WO2023179569A1 (zh) 2022-03-25 2023-03-20 Data recycling method, system, and apparatus, computer-readable storage medium, and program product

Country Status (2)

Country Link
CN (1) CN116841453A (zh)
WO (1) WO2023179569A1 (zh)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
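CN103080911A (zh) * 2010-06-30 2013-05-01 SanDisk Technologies Inc. Pre-emptive garbage collection of memory blocks
CN105512049A (zh) * 2015-11-23 2016-04-20 Lenovo (Beijing) Co., Ltd. Memory data recycling method, apparatus, and system
CN109496300A (zh) * 2018-03-23 2019-03-19 Huawei Technologies Co., Ltd. Storage medium garbage collection method, storage medium, and program product
CN110716690A (zh) * 2018-07-12 2020-01-21 Alibaba Group Holding Limited Data recycling method and system
CN110968417A (zh) * 2018-09-30 2020-04-07 EMC IP Holding Company LLC Method, apparatus, system, and computer program product for managing storage units
CN111768297A (zh) * 2020-06-30 2020-10-13 Beijing Sankuai Online Technology Co., Ltd. State transition parameter acquisition method, electronic device, and storage medium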
US10803012B1 (en) * 2014-05-09 2020-10-13 Amazon Technologies, Inc. Variable data replication for storage systems implementing quorum-based durability schemes

Also Published As

Publication number Publication date
CN116841453A (zh) 2023-10-03

Similar Documents

Publication Publication Date Title
US9182923B2 (en) Controlling throughput of processing units associated with different load types in storage system
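JP2007249491A (ja) Program, apparatus, and method for distributing batch jobs in a multi-server environment
CN102473134A (zh) Management server, management method, and management program for virtual hard disks
CN111124254B (zh) Method, electronic device, and program product for scheduling storage space reclamation requests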
US20200042392A1 (en) Implementing Affinity And Anti-Affinity Constraints In A Bundled Application
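CN111813347B (zh) Garbage collection space management method and apparatus, and computer-readable storage medium
CN117971789A (zh) Cloud-computing-based distributed storage system and file backup method thereof
WO2023179569A1 (zh) Data recycling method, system, and apparatus, computer-readable storage medium, and program product
CN108959614A (zh) Snapshot management method, system, apparatus, device, and readable storage medium
CN115794315B (zh) Dirty page rate statistics method and apparatus, electronic device, and storage medium
CN115408342A (zh) File processing method and apparatus, and electronic device
CN111221468A (zh) Storage block data deletion method and apparatus, electronic device, and cloud storage system
CN111857988B (zh) Container task scheduling method and apparatus based on a task management system
JP5509921B2 (ja) Performance information collection apparatus, performance information collection method, and performance information collection program
JP2003223335A (ja) Outsourcing system, outsourcing method, and outsourcing program
CN112685334A (zh) Method, apparatus, and storage medium for caching data in blocks
CN114116317A (zh) Data processing method, apparatus, device, and medium
JP6051733B2 (ja) Control system, control method, and control program
CN111090627A (zh) Pooling-based log storage method and apparatus, computer device, and storage medium
CN115826886B (zh) Data garbage collection method, apparatus, and system for append-write mode, and storage medium
CN117332881B (zh) Distributed training method and electronic device
WO2012119290A1 (zh) Distributed computing method and distributed computing system
CN116795571A (zh) Repair method and apparatus, and computer-readable storage medium
CN117785056A (zh) Data writing method and apparatus for a distributed storage file system
CN114968064A (zh) Data balancing method and apparatus, server device, and readable storage medium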

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application

Ref document number: 23773816

Country of ref document: EP

Kind code of ref document: A1