CN115826886A - Write-pattern-added data garbage collection method, device, system and storage medium - Google Patents

Info

Publication number
CN115826886A
Authority
CN
China
Prior art keywords
space
data
spaces
length
physical
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310159671.1A
Other languages
Chinese (zh)
Other versions
CN115826886B (en
Inventor
王勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Electronic Information Industry Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Electronic Information Industry Co Ltd
Priority to CN202310159671.1A
Publication of CN115826886A
Application granted
Publication of CN115826886B
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Memory System (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a data garbage collection method, device, system and storage medium for an additional write (append-write) mode, and relates to the field of data garbage collection. The method takes only physical spaces with a small valid space as spaces to be recovered, which reduces the amount of valid data that must be read and speeds up garbage collection. In addition, when new data is written while valid data is being transferred, the valid data corresponding to the new data is invalidated, so that normal reading and writing of data is not affected.

Description

Write-pattern-added data garbage collection method, device, system and storage medium
Technical Field
The present invention relates to the field of data garbage collection, and in particular to a data garbage collection method, apparatus, system and storage medium for an additional write mode.
Background
Data can be read and written in several modes. In the additional write (append-write) mode, data newly written by a service request is stored in a small storage space within a new physical space, and the small storage space in the old physical space that originally stored the data is marked as an invalid space. To keep the physical space effectively utilized, the old physical space must be garbage collected: the remaining valid data in the old physical space is transferred to other physical spaces, and all data in the old physical space is then released. In the prior art, all physical spaces to be garbage collected are generally placed into a queue and recovered one by one. Because valid data must first be read from a physical space before it can be transferred, a physical space with a large amount of remaining valid data takes a long time to read. The physical space that can be recovered fastest cannot be selected in real time, so garbage collection is slowed down, and an overly long garbage collection process can also affect other services running on the device.
In addition, the garbage collection process and the service request writing process run simultaneously. If, while the remaining valid data in the old physical space is being transferred to another physical space, a new service request writes new data that corresponds to valid data being transferred, two valid copies of the data exist, which affects the normal reading and writing of the data.
Disclosure of Invention
The invention aims to provide a data garbage collection method, a device, a system and a storage medium for an additional write mode, which can accelerate the garbage collection efficiency and avoid influencing the normal reading and writing of data.
In order to solve the above technical problem, the present invention provides a data garbage collection method of an additional write mode, which comprises:
partitioning the entire memory space into a plurality of physical spaces of uniform length;
when the ratio of the total length of the invalid spaces in all the physical spaces to the total length of the storage spaces is detected to be larger than a preset ratio, determining the length of the valid space in each physical space;
all the physical spaces with the length of the effective space smaller than a first preset length are used as spaces to be recovered;
transferring the valid data stored in the valid space of all the spaces to be recycled to other physical spaces;
if it is detected that new data corresponding to any one of the valid data is written in the storage space during the process of transferring the valid data, taking the valid data corresponding to the new data as invalid data;
and releasing all the data in the space to be recovered.
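The trigger and selection steps above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the `PhysicalSpace` record, the function names, and the simplification that every byte of a physical space is either valid or invalid (no unwritten space) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List

SPACE_LEN = 4 * 1024 * 1024  # assumed uniform physical-space length (4 MB)


@dataclass
class PhysicalSpace:
    space_id: int
    valid_len: int  # bytes of the space that still hold valid data


def invalid_ratio(spaces: List[PhysicalSpace], space_len: int = SPACE_LEN) -> float:
    """Total invalid length divided by total storage length.

    Assumption: whatever is not valid inside a space counts as invalid.
    """
    total = len(spaces) * space_len
    if total == 0:
        return 0.0
    invalid = sum(space_len - s.valid_len for s in spaces)
    return invalid / total


def select_spaces_to_recover(spaces: List[PhysicalSpace],
                             preset_ratio: float,
                             first_preset_len: int,
                             space_len: int = SPACE_LEN) -> List[PhysicalSpace]:
    """Return the spaces to be recovered, or [] if collection is not triggered."""
    if invalid_ratio(spaces, space_len) <= preset_ratio:
        return []  # not enough garbage yet
    # Only spaces with little valid data are worth recovering.
    return [s for s in spaces if s.valid_len < first_preset_len]


MB = 1024 * 1024
spaces = [PhysicalSpace(0, 1 * MB), PhysicalSpace(1, 3 * MB), PhysicalSpace(2, MB // 2)]
picked = select_spaces_to_recover(spaces, preset_ratio=0.5, first_preset_len=2 * MB)
print([s.space_id for s in picked])
```

With these sample values the invalid ratio is 7.5 MB / 12 MB, which exceeds the 0.5 threshold, so the two spaces holding less than 2 MB of valid data are selected.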
Preferably, after determining the length of the effective space in each of the physical spaces, the method further includes:
for any one physical space, judging whether the length of an effective space in the physical space is greater than a second preset length;
if not, adding the physical space into a management queue;
all the physical spaces with the length of the effective space being smaller than a first preset length are used as spaces to be recovered, and the method comprises the following steps:
all the physical spaces of which the lengths of the effective spaces in the management queue are smaller than a first preset length are used as spaces to be recovered;
wherein the second preset length is greater than the first preset length.
Preferably, before all the physical spaces in the management queue whose length of the effective space is smaller than a first preset length are used as spaces to be recycled, the method further includes:
S21: sequencing each physical space in sequence according to a preset sequence;
S22: judging whether the number of the physical spaces in the management queue is within a preset number range or not; if the number of the physical spaces is within the preset number range, taking all the physical spaces of which the lengths are smaller than a first preset length as spaces to be recovered; if the number exceeds the preset number range, entering S23; if the number is less than the preset number range, entering S24;
S23: moving the first physical space in the management queue out of the management queue, and returning to S22;
S24: adding a preset value as a new second preset length on the basis of the second preset length, adding the physical space of which the length of the effective space is greater than the new second preset length into a management queue, and returning to the step S21.
Preferably, the sequencing of each physical space in sequence according to a preset sequence includes:
and sequencing the physical spaces in sequence according to the sequence of the lengths of the effective spaces from large to small.
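The S21 to S24 loop can be sketched as below. The sketch is illustrative only and rests on several assumptions: the function and parameter names are hypothetical, the preset number range is modelled as `[min_count, max_count]`, queue admission follows the rule given earlier for the management queue (valid length not greater than the second preset length), and the loop stops raising the threshold once it reaches the physical-space length so that it always terminates.

```python
from typing import Dict, List, Tuple


def size_management_queue(valid_lens: Dict[int, int],
                          second_preset_len: int,
                          step: int,
                          min_count: int,
                          max_count: int,
                          space_len: int) -> Tuple[List[int], int]:
    """Return (space ids in the management queue, final second preset length).

    valid_lens maps a physical-space id to the length of its valid space.
    """
    if step <= 0 or min_count > max_count:
        raise ValueError("step must be positive and min_count <= max_count")
    threshold = second_preset_len
    while True:
        # Admission, then S21: sort by valid length from large to small.
        queue = sorted((sid for sid, v in valid_lens.items() if v <= threshold),
                       key=lambda sid: valid_lens[sid], reverse=True)
        # S22/S23: too many entries, so drop the first (largest valid length).
        while len(queue) > max_count:
            queue.pop(0)
        # S22: count within range, or the threshold cannot grow any further.
        if len(queue) >= min_count or threshold >= space_len:
            return queue, threshold
        # S24: too few entries, so raise the second preset length and retry.
        threshold += step


lens = {0: 1, 1: 2, 2: 3, 3: 4, 4: 5}
print(size_management_queue(lens, second_preset_len=2, step=1,
                            min_count=3, max_count=3, space_len=8))
```

Because the queue is ordered from large to small, removing the first entry always discards the space that would be the most expensive to recover.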
Preferably, after determining the length of the effective space in each of the physical spaces in the management queue, the method further includes:
establishing N information queues, wherein different information queues correspond to different first length ranges of the effective space, and N is an integer not less than 2;
for all the physical spaces, recording the physical spaces into the corresponding information queues according to a first length range where the length of the effective space in the physical spaces is located;
after the first physical space in the management queue is moved out of the management queue, the method further comprises:
adding 1 to the number of shifted-out entries of the information queue in which the shifted-out physical space is located;
if the number of the physical spaces in the management queue exceeds a preset number range, after performing S22 to S23 multiple times to make the number of the physical spaces within the preset number range, the method further includes:
and in all the physical spaces of the information queues with the largest number of shifted-out entries, taking the length of the minimum effective space as the new second preset length, and clearing the number of shifted-out entries of each information queue.
Preferably, increasing a preset value as a new second preset length on the basis of the second preset length includes:
taking the maximum value of the first length range from small to large as a direction, and taking the next information queue of the information queue where the physical space with the maximum length of the effective space in the management queue is located as a target queue;
and taking the length of the largest effective space in all the physical spaces of the target queue as the new second preset length.
Preferably, when the maximum length of the effective space of the physical space is M, the establishing N information queues includes:
establishing N information queues whose first length ranges of the effective space each have a width of m, wherein m and M are positive integers and m is not greater than M;
the first length ranges of the effective space corresponding to the information queues are (0, m], (m, 2m], (2m, 3m], ..., (N × m, M].
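As an illustration of how a physical space could be assigned to an information queue under these ranges, the following sketch maps a valid length to the 0-based index k of the range (k·m, (k+1)·m]. The function names are hypothetical, lengths are assumed to be positive integers, and skipping spaces with no valid data is an assumption of the example, since the ranges start above 0.

```python
from collections import defaultdict
from typing import Dict, List


def info_queue_index(valid_len: int, m: int) -> int:
    """Index k such that valid_len lies in (k*m, (k+1)*m]."""
    if valid_len <= 0 or m <= 0:
        raise ValueError("valid_len and m must be positive")
    return (valid_len - 1) // m


def build_info_queues(valid_lens: Dict[int, int], m: int) -> Dict[int, List[int]]:
    """Group physical-space ids by the length range of their valid space."""
    queues: Dict[int, List[int]] = defaultdict(list)
    for space_id, valid_len in valid_lens.items():
        if valid_len > 0:  # assumption: spaces with no valid data are not recorded
            queues[info_queue_index(valid_len, m)].append(space_id)
    return dict(queues)


print(build_info_queues({10: 1, 11: 4, 12: 5, 13: 9, 14: 0}, m=4))
```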
Preferably, after the N information queues whose first length ranges each have a width of m are established, the method further includes:
establishing X grouped intervals, wherein different grouped intervals correspond to different second length ranges of the effective space, the second length range is larger than the first length range, and X is an integer not smaller than 2;
all the physical spaces with the length of the effective space being smaller than a first preset length are used as spaces to be recovered, and the method comprises the following steps:
determining the grouping interval needing to be recycled according to the ratio of the total length of the invalid spaces in all the physical spaces to the total length of the storage space;
and in all the grouping intervals which need to be recovered, taking all the physical spaces of which the lengths of the effective spaces are smaller than a first preset length as spaces to be recovered.
Preferably, determining the packet interval needing to be recycled according to a ratio of a total length of the invalid spaces in all the physical spaces to a total length of the storage space includes:
and selecting q grouping intervals from the X grouping intervals according to the ratio as the grouping intervals needing to be recovered, wherein q is a positive integer not greater than X, and the q and the ratio form positive correlation.
Preferably, determining the packet interval needing to be recycled according to a ratio of a total length of the invalid spaces in all the physical spaces to a total length of the storage space includes:
when the ratio is not greater than a first preset ratio threshold, selecting the first w packet intervals from the X packet intervals as the packet intervals needing to be recycled in the direction from small to large of the second preset length range, wherein w is a positive integer not greater than X;
when the ratio is greater than the first preset ratio threshold and not greater than a second preset ratio threshold, selecting the first w + e grouping intervals from the X grouping intervals as the grouping intervals needing to be recovered in the direction from small to large of the second preset length range, wherein e is a positive integer not greater than X and w + e is not greater than X;
when the ratio is greater than the second preset ratio threshold and not greater than a third preset ratio threshold, taking the second preset length range from small to large as a direction, selecting the first w + e + r grouping intervals from the X grouping intervals as the grouping intervals needing to be recovered, wherein r is a positive integer not greater than X and w + e + r is not greater than X;
and when the ratio is greater than the third preset ratio threshold, all the grouping intervals are taken as the grouping intervals needing to be recycled.
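The threshold ladder above can be sketched as a small lookup. The function name, the threshold parameter names `t1`, `t2`, `t3`, and the representation of grouping intervals as indices ordered from the smallest second length range to the largest are assumptions of this example.

```python
from typing import List


def intervals_to_recover(ratio: float, x: int, w: int, e: int, r: int,
                         t1: float, t2: float, t3: float) -> List[int]:
    """Return the indices of the grouping intervals that need recovery.

    Index 0 is the interval with the smallest second length range.
    """
    if not (t1 <= t2 <= t3):
        raise ValueError("thresholds must be non-decreasing")
    if ratio <= t1:
        q = w
    elif ratio <= t2:
        q = w + e
    elif ratio <= t3:
        q = w + e + r
    else:
        q = x  # garbage level is high: recover every grouping interval
    return list(range(min(q, x)))


for ratio in (0.2, 0.4, 0.6, 0.9):
    print(ratio, intervals_to_recover(ratio, x=5, w=1, e=1, r=2, t1=0.3, t2=0.5, t3=0.7))
```

The number of selected intervals never decreases as the ratio grows, which matches the positive correlation between q and the ratio stated earlier.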
Preferably, before recording the physical space into the corresponding information queue according to a first length range in which the length of the effective space in the physical space is located, the method further includes:
when detecting that new data is written into the storage space, determining the invalid space corresponding to the new data;
and based on the length of the invalid space corresponding to the new data, reducing the length of the effective space of the physical space where the invalid space corresponding to the new data is located by the length which is the same as the length of the invalid space corresponding to the new data.
Preferably, the transferring the valid data stored in the valid space of all the spaces to be reclaimed to other physical spaces includes:
respectively acquiring effective data stored in all effective spaces of the space to be recovered;
and aggregating all the valid data into a plurality of new physical spaces by taking the length of the physical spaces as a reference.
Preferably, aggregating all the valid data into a plurality of new physical spaces includes:
S31: judging whether the total length of all the effective data after aggregation is larger than the length of one physical space or not; if yes, entering S32; if not, the process goes to S33;
S32: aggregating the valid data with the same length as the physical space into a new physical space, and returning to S31;
S33: and aggregating all the valid data into a new physical space.
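A minimal sketch of the S31 to S33 loop is given below. It assumes the collected valid data has already been concatenated into one byte string and that a new physical space can be modelled as a `bytes` chunk; both are simplifications made for illustration.

```python
from typing import List


def aggregate(valid_data: bytes, space_len: int) -> List[bytes]:
    """Pack valid data into new physical spaces of at most space_len bytes."""
    if space_len <= 0:
        raise ValueError("space_len must be positive")
    new_spaces: List[bytes] = []
    remaining = valid_data
    # S31: is the remaining data longer than one physical space?
    while len(remaining) > space_len:
        # S32: fill exactly one new physical space, then check again.
        new_spaces.append(remaining[:space_len])
        remaining = remaining[space_len:]
    # S33: whatever is left fits into one last new physical space.
    if remaining:
        new_spaces.append(remaining)
    return new_spaces


print(aggregate(b"abcdefghij", 4))
```

Only the last new physical space can be partially filled, so the transferred data occupies as few physical spaces as possible.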
Preferably, aggregating all the valid data into a plurality of new physical spaces includes:
determining the time when each piece of valid data was written into the corresponding valid space, and the length of each piece of valid data;
sequentially aggregating the effective data in the order of the time from first to last;
when a plurality of effective data with the same time need to be aggregated, the effective data are sequentially aggregated in the descending order of the length of the effective data.
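The ordering rule above amounts to a two-level sort key: write time ascending, then length descending for ties. The `ValidData` record and its field names are hypothetical and used only for this example.

```python
from typing import List, NamedTuple


class ValidData(NamedTuple):
    name: str
    write_time: int  # e.g. a timestamp or sequence number
    length: int      # length of the data in bytes


def aggregation_order(items: List[ValidData]) -> List[ValidData]:
    """Earliest write first; among equal times, longest data first."""
    return sorted(items, key=lambda d: (d.write_time, -d.length))


items = [ValidData("a", 2, 5), ValidData("b", 1, 3),
         ValidData("c", 1, 7), ValidData("d", 2, 9)]
print([d.name for d in aggregation_order(items)])
```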
Preferably, after respectively acquiring valid data stored in all valid spaces of the space to be reclaimed, the method further includes:
judging whether a plurality of effective data belong to data written by the same user operation;
if yes, reserving any one valid data in the plurality of valid data written by the same user operation, and deleting all other valid data.
Preferably, the transferring the valid data stored in the valid space of all the spaces to be reclaimed to another physical space includes:
determining a first logical space corresponding to the physical space where each effective data is located;
determining the physical space corresponding to each first logic space at the current moment;
judging whether a coincidence space exists between the physical space corresponding to each first logic space at the current moment and the physical space where each effective data is located;
and if so, transferring the effective data in the coincident space to other physical spaces.
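The check described above can be pictured as filtering transfer candidates against the current logical-to-physical mapping: data is moved only if its logical space still points at the physical location that holds it. The sketch below assumes a dictionary from a logical address to a `(space_id, offset)` pair; this representation and the names used are illustrative assumptions and are not taken from the source.

```python
from typing import Dict, List, Tuple

Location = Tuple[int, int]          # (physical space id, offset inside it)
Candidate = Tuple[str, int, int]    # (logical address, space id, offset)


def still_referenced(candidates: List[Candidate],
                     current_map: Dict[str, Location]) -> List[Candidate]:
    """Keep only candidates whose logical space still maps to their location."""
    return [c for c in candidates if current_map.get(c[0]) == (c[1], c[2])]


current_map = {"L0": (7, 0), "L1": (9, 128)}   # L1 was rewritten elsewhere
candidates = [("L0", 7, 0), ("L1", 7, 64)]     # both were read from space 7
print(still_referenced(candidates, current_map))
```

In this example only the data for `L0` is transferred; the data once stored for `L1` in space 7 is already stale and is simply released with the space.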
Preferably, if it is detected that new data corresponding to any one of the valid data is written in the storage space during the transfer of the valid data, the setting, as invalid data, of the valid data corresponding to the new data includes:
after the valid data in the coincidence space are all transferred to other physical spaces, determining a corresponding relation Rdv0 between the coincidence space and the first logic space corresponding to the coincidence space;
determining a corresponding relation Rd1 between the first logic space and the corresponding physical space when new data is written;
taking the overlapping part between the corresponding relation Rdv0 and the corresponding relation Rd1 as the corresponding relation Rdv1;
determining a corresponding relation Rdn0 between the physical space to which each effective data is transferred and a second logic space corresponding to the physical space;
for valid data located in the coincidence space of the corresponding relation Rdv1, determining the physical space located in the corresponding relation Rdn0 to which the valid data is transferred, and determining a corresponding relation Rdn1 between the physical space and a third logical space corresponding to the physical space;
in the correspondence relation Rdn0, the valid data other than the correspondence relation Rdn1 is set as the invalid data.
The present invention also provides a data garbage collection apparatus for an additional write mode, comprising:
a memory for storing a computer program;
and a processor for implementing the steps of the data garbage collection method of the additional writing mode when the computer program is executed.
The invention also provides a data garbage recycling system of the additional writing mode, which comprises:
the dividing unit is used for dividing the whole storage space into a plurality of physical spaces with consistent lengths in advance;
a length determining unit, configured to determine lengths of the valid spaces in the respective physical spaces when it is detected that a ratio between a total length of the invalid spaces in all the physical spaces and a total length of the storage spaces is greater than a preset ratio;
the space to be recovered determining unit is used for taking all the physical spaces of which the lengths of the effective spaces are smaller than a first preset length as spaces to be recovered;
the data transfer unit is used for transferring the effective data stored in the effective space of all the spaces to be recovered to other physical spaces;
a mutex detection unit configured to, if it is detected that new data corresponding to any one of the valid data is written in the storage space during the transfer of the valid data, take the valid data corresponding to the new data as invalid data;
and the garbage recovery unit is used for releasing all the data in the space to be recovered.
The present invention also provides a storage medium, wherein the storage medium stores a computer program, and the computer program, when executed by a processor, implements the steps of the data garbage collection method of the above-mentioned additional write mode.
The invention provides a data garbage collection method, device, system and storage medium for an additional write mode, and relates to the field of data garbage collection. The method takes only physical spaces with a small valid space as spaces to be recovered, which reduces the amount of valid data that must be read and speeds up garbage collection. In addition, when new data is written, the valid data corresponding to the new data is invalidated, so that normal reading and writing of data is not affected.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings required in the prior art and the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings without creative efforts.
FIG. 1 is a flow chart of a data garbage collection method of an additional write mode according to the present invention;
FIG. 2 is a diagram illustrating a mapping relationship between a logical space and a physical space according to the present invention;
FIG. 3 is a diagram illustrating a mapping relationship between a logical space and a physical space when data is newly written according to the present invention;
FIG. 4 is a diagram illustrating a mapping relationship between a logical space and a physical space when data is newly written according to another embodiment of the present invention;
FIG. 5 is a diagram of a mapping relationship between a logical space and a physical space during data transfer according to the present invention;
FIG. 6 is a schematic structural diagram of a data garbage collection apparatus for an additional write mode according to the present invention.
Detailed Description
The core of the invention is to provide a data garbage collection method, device, system and storage medium for an additional write mode, which can speed up garbage collection and avoid affecting the normal reading and writing of data.
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
A physical space is a small segment of the entire storage space of the device, and a physical space can itself be subdivided into a plurality of small spaces. For a distributed block storage device, data can be written in the additional write mode. In this mode, all data newly written by service requests is stored in a small space of a new physical space, and the small space in the old physical space that originally stored the data is marked as an invalid space. When a physical space contains too much invalid space, it must be garbage collected to keep the physical space effectively utilized: the valid data in the remaining valid space of the physical space is transferred to other physical spaces so that the physical space becomes completely invalid, after which the physical space can be released and reused as available space. This is one complete garbage collection process.
In the garbage recycling process, a general method in the prior art is to put physical spaces meeting requirements into a queue and process the physical spaces in sequence, but because valid data transfer needs to read the valid data from the physical spaces first to transfer the valid data, when the valid data in a certain physical space is excessive (i.e., the garbage in the physical space is less), it takes a long time to read all the valid data in the physical space, which results in an overlong time spent in the whole data reading process, and thus an overlong time spent in the garbage recycling process. In the prior art, the spaces to be recovered are not finely adjusted and controlled, and the optimal spaces to be recovered are not selected in real time.
Moreover, the garbage collection process and the service processing process run simultaneously. If new data is written during garbage collection and the new data corresponds to data that is being transferred, then without special handling two valid copies of the data exist, and the device cannot tell which copy is the truly valid one. The prior art generally solves this problem by making the garbage collection process and the service processing process mutually exclusive, that is, service processing is not allowed during garbage collection and vice versa. Although this avoids the data anomaly, the garbage collection process takes a long time, so it can greatly affect the service processing process.
To solve the above technical problem, referring to fig. 1, fig. 1 is a flowchart of a data garbage collection method of an additional write mode according to the present invention, including:
S1: dividing the whole storage space into a plurality of physical spaces with consistent lengths in advance;
To manage the entire storage space conveniently, it is divided into a plurality of small physical spaces of consistent length. To keep stored data contiguous, the length of a physical space is chosen according to the maximum size of a single piece of data, so that one physical space can hold the largest single piece of data; for example, every 4 MB may be used as one physical space.
S2: when the ratio of the total length of the invalid spaces in all the physical spaces to the total length of the storage spaces is detected to be larger than a preset ratio, determining the length of the valid space in each physical space;
S3: all physical spaces of which the length of the effective space is smaller than a first preset length are used as spaces to be recovered;
In the additional write mode, newly written data is stored in another physical space, so the corresponding data in the old physical space that originally stored it becomes invalid data, that is, garbage data, and the space storing the garbage data is called an invalid space. When there is too much garbage in the storage space, that is, when the invalid space is too long, a garbage collection task must be executed to release the invalid space and the garbage data. Garbage collection has to transfer the valid data in the remaining valid space of each physical space being collected to another physical space that is not being collected, and reading that valid data takes time: the more valid data, the longer it takes. For this reason, physical spaces with a short valid space, which are also the physical spaces with a large amount of garbage data, are preferentially selected as spaces to be recovered. A physical space with a short valid space contains little valid data and much invalid data, so reading its valid data is fast and a large amount of invalid data is removed. Taking only the physical spaces whose valid space is shorter than the first preset length as spaces to be recovered therefore removes a significant amount of garbage data while improving garbage collection efficiency.
The first preset length may be set according to the user's tolerance for garbage data in the actual application, or according to the user's efficiency requirement for the garbage collection process. For example, when a physical space is 4 MB, the first preset length may be set to 2 MB: a physical space whose valid space is shorter than 2 MB is regarded as a space to be recovered, and a physical space whose valid space is 2 MB or longer is not processed. During garbage collection, data transfer and garbage clearing are performed only on the physical spaces whose valid space is shorter than 2 MB; the other physical spaces take part in neither step.
S4: transferring the effective data stored in the effective space of all the spaces to be recovered to other physical spaces;
S5: if it is detected that new data corresponding to any valid data is written into the storage space in the process of transferring the valid data, taking the valid data corresponding to the new data as invalid data;
To ensure the validity of the data newly written by the service process, the corresponding transferred data must be invalidated. The valid data stored in the space to be recovered is older data, whereas the data newly written by the service process is the latest data, so service processing must take the newly written data as authoritative. Suppose that, while valid data is being transferred, the service process writes new data that corresponds to a piece of valid data being transferred. Because the new data is written into another physical space rather than into the space where that valid data is located, the space holding that valid data becomes an invalid space. If that valid data (which is now actually invalid) has already been transferred to another physical space, the transferred copy would not be released by the subsequent garbage collection. Therefore, to ensure the validity of the newly written data, the transferred copy is directly defined as invalid data, and the space to which it was transferred also becomes an invalid space. For example, valid data X1 exists in a valid space A1 of a space A to be recovered and needs to be transferred to a valid space B1 in another physical space B that does not need garbage collection. During the transfer, the service process writes new data X2 corresponding to X1 into a valid space C1 of another physical space C. The older data X1 must then be invalidated, and both the valid space A1 and the valid space B1 become invalid spaces, which ensures the validity of the valid space C1 and the data X2.
S6: and releasing the data in all the spaces to be recovered.
After all valid data in all the spaces to be recovered are transferred, the spaces in the spaces to be recovered are all equivalent to invalid spaces, and all the data in the spaces to be recovered need to be released, that is, all the data are deleted, so that the storage resources of the spaces to be recovered are released, and subsequent services are used conveniently.
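The X1/X2 example from step S5 can be replayed with a small, single-threaded sketch. It is an illustration under stated assumptions rather than the patented mechanism: the `Store` class, its method names, and the use of string labels such as `"A1"` for spaces are hypothetical, and a real system would need proper synchronisation between the collector and the writer.

```python
from typing import Dict, Optional, Set


class Store:
    """Toy model: one mapping from a logical key to the space holding its data."""

    def __init__(self) -> None:
        self.mapping: Dict[str, str] = {}
        self.invalid: Set[str] = set()

    def write(self, key: str, new_loc: str) -> None:
        """Service write in additional write mode: new location, old one invalid."""
        old: Optional[str] = self.mapping.get(key)
        if old is not None:
            self.invalid.add(old)
        self.mapping[key] = new_loc

    def commit_transfer(self, key: str, old_loc: str, new_loc: str) -> bool:
        """Finish moving key's data from old_loc to new_loc during collection.

        Returns False if a newer write arrived meanwhile; the transferred
        copy is then marked invalid instead of being published.
        """
        self.invalid.add(old_loc)  # the source space is released either way
        if self.mapping.get(key) == old_loc:
            self.mapping[key] = new_loc
            return True
        self.invalid.add(new_loc)  # stale copy: the newer data wins
        return False


store = Store()
store.write("X", "A1")                               # X1 lives in A1
store.write("X", "C1")                               # X2 arrives while X1 is being copied to B1
published = store.commit_transfer("X", "A1", "B1")   # collector finishes the copy
print(published, store.mapping, sorted(store.invalid))
```

After the run, the key still maps to C1, and both A1 and B1 are marked invalid, which mirrors the outcome described for X1 and X2.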
Referring to FIG. 2 to FIG. 6, FIG. 2 is a diagram illustrating a correspondence relationship between a logical space and a physical space according to the present invention, FIG. 3 is a diagram illustrating a correspondence relationship between a logical space and a physical space when data is newly written according to the present invention, FIG. 4 is a diagram illustrating a correspondence relationship between a logical space and a physical space when data is newly written according to another embodiment of the present invention, FIG. 5 is a diagram illustrating a correspondence relationship between a logical space and a physical space when data is transferred according to the present invention, and FIG. 6 is a schematic structural diagram illustrating a data garbage collection apparatus for an additional write mode according to the present invention.
The physical space refers to the space where data is actually stored. The logical space refers to the location at which a certain user operation or a certain service acts: when data generated by a user operation is stored in a certain physical space, the location of that user operation is the logical space, and a link exists between the logical space and the physical space where the generated data is actually stored. However, a logical space does not always correspond to the same physical space, and the correspondence between logical space and physical space can change continuously. Before the additional write mode starts, the logical space corresponds to one physical space (the logical space corresponds to the valid interval 0 in FIG. 2). After the additional write mode starts, if new data is generated by user operation or service processing, the data is stored in other physical spaces (the valid intervals 3 and 4 in FIG. 3), and the physical space in which the data was originally stored (the invalid intervals 0 and 1 in FIG. 3) becomes an invalid space; the valid intervals 0, 1 and 2 and the invalid intervals 0 and 1 in FIG. 3 together constitute the valid interval 0 in FIG. 2. When data transfer is needed, the data in the valid intervals 0 to 2 in FIG. 3 needs to be transferred to the valid interval 2 in FIG. 4. If new data is written during the transfer, the new data should be written into the valid interval 4 in FIG. 3 (or the valid interval 1 in FIG. 4), but it is actually written into the valid interval 3 in FIG. 5, while the old data has already been transferred into the invalid interval 2 in FIG. 5. To ensure the validity of the data stored in the valid interval 3 in FIG. 5, the invalid interval 2 in FIG. 5 must be treated as an invalid space (that is, the transferred data is also defined as invalid data).
In summary, the entire storage space is divided in advance into a plurality of physical spaces of the same length. When too much invalid space is detected in the storage space, the physical spaces whose valid space is short are taken as spaces to be recovered, the valid data stored in their valid space is transferred to other physical spaces, and the data in the spaces to be recovered is released to complete garbage collection. If new data is detected to be written while valid data is being transferred, the valid data corresponding to the new data is invalidated. Because only physical spaces with little valid space are taken as spaces to be recovered, the amount of valid data to be read is reduced and garbage collection is faster. In addition, because the valid data corresponding to newly written data is invalidated, normal reading and writing of data is not affected.
On the basis of the above-described embodiment:
as a preferred embodiment, after determining the length of the effective space in each physical space, the method further includes:
for any physical space, judging whether the length of the effective space in the physical space is greater than a second preset length;
if not, adding the physical space into a management queue;
all the physical spaces with the length of the effective space smaller than the first preset length are used as spaces to be recovered, and the method comprises the following steps:
all physical spaces of which the length of the effective space in the management queue is smaller than a first preset length are used as spaces to be recovered;
wherein the second preset length is greater than the first preset length.
To further improve garbage collection efficiency, the present invention takes into account that valid data must first be read out of a space to be recovered before it can be transferred. When a physical space holds a large amount of valid data, reading all of it takes a long time, and a large amount of valid data also means that the physical space holds very little garbage (invalid) data. Bringing such a physical space under management therefore costs a long read for very little garbage cleaned, so garbage collection on a physical space with an overly long effective space is very inefficient. For this reason, when a garbage collection task is to be performed, it is determined for each physical space whether the length of its effective space is greater than a second preset length. If so, the physical space is not managed; if not, it is added to a management queue for the subsequent steps. The spaces to be recovered are then determined only from the physical spaces in the management queue.
The second preset length may be set according to the user's tolerance for garbage data in the actual application or the user's efficiency requirement for garbage collection. For example, when each physical space is 4 MB and the first preset length is 2 MB, the second preset length may be set to 3 MB: physical spaces whose effective space is not longer than 3 MB are added to the management queue, and physical spaces whose effective space is longer than 3 MB are not, which ensures that physical spaces with little garbage data do not take part in garbage collection. Based on this, the efficiency of garbage collection can be further improved.
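The two-threshold filtering described above can be sketched as follows. This is only an illustrative sketch under stated assumptions: the function names, the dictionary of per-space effective lengths and the space identifiers are hypothetical and do not appear in the source, and the 4 MB / 3 MB / 2 MB figures are taken from the example above.

```python
MB = 1024 * 1024

def build_management_queue(valid_lengths, second_preset):
    """Admit only physical spaces whose effective length is NOT greater
    than the second preset length (hypothetical helper)."""
    return {sid: v for sid, v in valid_lengths.items() if v <= second_preset}

def select_spaces_to_recover(queue, first_preset):
    """From the management queue, pick the spaces whose effective length
    is smaller than the first preset length (hypothetical helper)."""
    return sorted(sid for sid, v in queue.items() if v < first_preset)

# Effective-space length of each 4 MB physical space (made-up sample data).
valid_lengths = {
    "p0": 1 * MB,
    "p1": 2 * MB + 512 * 1024,   # 2.5 MB: managed, but not recovered yet
    "p2": 3 * MB + 512 * 1024,   # 3.5 MB: too little garbage, not managed
    "p3": 512 * 1024,
}
queue = build_management_queue(valid_lengths, second_preset=3 * MB)
to_recover = select_spaces_to_recover(queue, first_preset=2 * MB)
```

With this sample data, `p2` never enters the management queue, and only `p0` and `p3` are chosen as spaces to be recovered.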
As a preferred embodiment, before all physical spaces in the management queue whose effective space length is smaller than the first preset length are used as spaces to be reclaimed, the method further includes:
S21: sorting all the physical spaces according to a preset order;
S22: judging whether the number of physical spaces in the management queue is within a preset number range; if the number is within the preset number range, taking all the physical spaces whose effective space length is smaller than the first preset length as spaces to be recovered; if the number exceeds the preset number range, proceeding to S23; if the number is below the preset number range, proceeding to S24;
S23: moving the first physical space in the management queue out of the management queue, and returning to S22;
S24: adding a preset value to the second preset length to obtain a new second preset length, adding the physical spaces whose effective space length is not greater than the new second preset length into the management queue, and returning to S21.
To further improve garbage collection efficiency, the present invention takes into account that when the management queue contains too many physical spaces, more time is needed to read their valid data during transfer, so the number of physical spaces in the management queue must not be too large. Conversely, if the management queue contains too few physical spaces, invalid data cannot be cleaned effectively. The size range of the management queue is therefore set according to actual requirements, and the number of physical spaces in the queue is kept within that range. Specifically, the preset order may be descending order of effective space length. When there are too many physical spaces, the physical space with the largest effective space is moved out of the management queue: it needs the longest read time and holds the least garbage data, so removing it gives the greatest gain in garbage collection efficiency at the smallest cost in garbage data cleaned. When there are too few physical spaces, the current second preset length is increased and new physical spaces are admitted on the basis of the new second preset length, that is, physical spaces with a slightly longer effective space. Based on this, garbage collection efficiency can be further improved while ensuring that the amount of garbage data deleted is not too small.
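The S21 to S24 loop can be sketched roughly as below. The function name, the parameters `min_count`, `max_count`, `step` and `max_length`, and the stop condition at `max_length` are assumptions added so that the sketch terminates; the source only specifies the four steps themselves.

```python
def tune_management_queue(valid_lengths, second_preset,
                          min_count, max_count, step, max_length):
    """Keep the management queue size within [min_count, max_count].

    valid_lengths maps a hypothetical space id to its effective length.
    Returns the queue (largest effective length first) and the second
    preset length in force when the loop stops.
    """
    queue = [s for s, v in valid_lengths.items() if v <= second_preset]
    while True:
        # S21: sort by effective length, largest first.
        queue.sort(key=lambda s: valid_lengths[s], reverse=True)
        # S22 -> S23: too many spaces, drop the head (largest effective space).
        while len(queue) > max_count:
            queue.pop(0)
        # S22: within range (or nothing more can be admitted): done.
        if len(queue) >= min_count or second_preset >= max_length:
            return queue, second_preset
        # S24: too few spaces, raise the second preset length and re-admit.
        second_preset = min(second_preset + step, max_length)
        queue = [s for s, v in valid_lengths.items() if v <= second_preset]

lengths = {"a": 100, "b": 200, "c": 300, "d": 400, "e": 500}
grown = tune_management_queue(lengths, 150, min_count=2, max_count=3,
                              step=100, max_length=512)
trimmed = tune_management_queue(lengths, 500, min_count=2, max_count=3,
                                step=100, max_length=512)
```

In the first call the queue starts with one space, so the second preset length is raised once; in the second call the two spaces with the longest effective space are moved out.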
As a preferred embodiment, sequentially ordering the physical spaces according to a preset order includes:
sorting the physical spaces in descending order of effective space length.
As a preferred embodiment, after determining the length of the effective space in each physical space in the management queue, the method further includes:
establishing N information queues, wherein different information queues correspond to different first length ranges of effective spaces, and N is an integer not less than 2;
for all physical spaces, recording the physical spaces into corresponding information queues according to a first length range in which the length of the effective space in the physical spaces is located;
after moving the first physical space in the management queue out of the management queue, the method further comprises:
adding 1 to the number of shifted-out entries of an information queue where the shifted-out physical space is located;
if the number of the physical spaces in the management queue exceeds the preset number range, after executing S22 to S23 for multiple times to make the number of the physical spaces within the preset number range, the method further includes:
and in all physical spaces of the information queues with the largest number of shifted-out entries, taking the length of the minimum effective space as a new second preset length, and clearing the number of shifted-out entries of each information queue.
To manage the physical spaces simply, the present invention classifies them according to the length of the effective space in each physical space. N information queues are first established, each corresponding to a different effective space length range. For example, information queues Q1, Q2, Q3, …, Qn are established. If the effective space length range of information queue Q1 is 0 to 64 KB, physical spaces whose effective space length is not more than 64 KB are recorded in Q1; likewise, if the effective space length range of information queue Q2 is 65 to 128 KB, physical spaces whose effective space length falls within that range are recorded in Q2.
When the management queue holds too many physical spaces and some of them must be removed, the number of physical spaces removed from each information queue is recorded. Because physical spaces are removed in descending order of effective space length, the information queue with the most removals is usually the one whose length range contains the second preset length. The fact that most removals fall in this information queue indicates that the current second preset length is still set too large. To avoid later re-adding to the management queue the physical spaces that have just been moved out of it, the current second preset length is shortened: the physical space with the smallest effective space length is found in that information queue, and its effective space length is used as the new second preset length. Because the physical spaces removed from that information queue are the ones with the longest effective space in it, the physical spaces with a relatively short effective space remain in the information queue, and taking the new second preset length from them avoids shortening the second preset length too much. Based on this, the physical spaces can be managed simply.
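A rough sketch of this bookkeeping is given below. It assumes 64 KB wide information queues and takes the minimum over every physical space recorded in the busiest information queue; the helper names, the `Counter`-based removal counts and the sample lengths are all hypothetical.

```python
from collections import Counter

KB = 1024
WIDTH = 64 * KB  # assumed width of each information queue's length range

def bucket_of(valid_len):
    """Index of the information queue covering ((k)*WIDTH, (k+1)*WIDTH]."""
    return max(0, (valid_len - 1) // WIDTH)

def trim_and_shrink(valid_lengths, queue, max_count):
    """Drop the largest spaces until the queue fits, count removals per
    information queue, and derive a shorter second preset length."""
    ordered = sorted(queue, key=lambda s: valid_lengths[s], reverse=True)
    removed = Counter()
    while len(ordered) > max_count:
        sid = ordered.pop(0)                      # S23: drop the head
        removed[bucket_of(valid_lengths[sid])] += 1
    if not removed:
        return ordered, None                      # nothing removed, keep old value
    busiest = removed.most_common(1)[0][0]
    new_second = min(v for v in valid_lengths.values()
                     if bucket_of(v) == busiest)
    removed.clear()                               # reset the removal counts
    return ordered, new_second

lengths = {"a": 500 * KB, "b": 490 * KB, "c": 460 * KB,
           "d": 300 * KB, "e": 100 * KB, "f": 50 * KB}
kept, new_second_preset = trim_and_shrink(lengths, list(lengths), max_count=3)
```

Here the three removed spaces all fall in the (448 KB, 512 KB] information queue, so the new second preset length becomes 460 KB, the smallest effective length recorded in that queue.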
As a preferred embodiment, adding a preset value as a new second preset length on the basis of the second preset length includes:
ordering the information queues by the maximum value of their first length ranges from small to large, and taking as a target queue the information queue that follows the one containing the physical space with the longest effective space in the management queue;
and taking the length of the maximum effective space as a new second preset length in all physical spaces of the target queue.
To add physical spaces in a reasonable way, the present invention treats too few physical spaces in the management queue as a sign that the second preset length is set too small. The information queue containing the physical space with the longest effective space in the current management queue is determined first. Then, ordering the information queues by the maximum value of their first length ranges from small to large, the next information queue is taken as the target queue, the physical space with the smallest effective space length in the target queue is found, and its effective space length is used as the new second preset length. For example, suppose the longest effective space in the current management queue is 500 KB, so that the physical space lies in the information queue (448 to 512 KB). The next information queue in ascending order is (513 to 577 KB); if the smallest effective space length in that information queue is 522 KB, then 522 KB is taken as the new second preset length.
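The worked example above can be reproduced with a short sketch. It follows the description's example, which takes the smallest effective length in the target queue, and it assumes 64 KB wide information queues; the function names and the behaviour when the target queue is empty (returning `None`) are assumptions.

```python
KB = 1024

def bucket_of(valid_len, width):
    """Index of the information queue whose range contains valid_len."""
    return max(0, (valid_len - 1) // width)

def grow_second_preset(valid_lengths, queue, width):
    """Pick a new, larger second preset length from the information queue
    that follows the one holding the largest space in the management queue."""
    largest = max(valid_lengths[s] for s in queue)
    target = bucket_of(largest, width) + 1        # next queue, ascending order
    candidates = [v for v in valid_lengths.values()
                  if bucket_of(v, width) == target]
    return min(candidates) if candidates else None

lengths = {"a": 500 * KB, "b": 300 * KB,          # currently managed
           "c": 522 * KB, "d": 560 * KB, "e": 700 * KB}
new_second_preset = grow_second_preset(lengths, ["a", "b"], width=64 * KB)
```

With these sample values the largest managed space (500 KB) sits in the eighth information queue, the target queue holds the 522 KB and 560 KB spaces, and 522 KB becomes the new second preset length.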
As a preferred embodiment, when the maximum length of the effective space of the physical space is M, establishing N information queues includes:
establishing N information queues, each having a first length range of width m for the effective space, wherein M and m are positive integers and m is not more than M;
the first length ranges of the effective space corresponding to the information queues are (0, m], (m, 2m], (2m, 3m], …, (N × m, M], respectively.
To manage the information queues simply, the first length ranges of all information queues may be given the same width. For example, with a width of 64 KB per range, information queues Q1, Q2, Q3, …, Qn are established; the first length range corresponding to Q1 is 0 to 64 KB, the first length range corresponding to Q2 is 65 to 128 KB, and so on. Based on this, the information queues can be managed simply.
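With equal-width ranges, the information queue of a physical space can be computed directly from its effective length. The sketch below assumes half-open ranges of the form (k·m, (k+1)·m] with the last range capped at M, and assumes the number of queues is the ceiling of M/m; the function names are hypothetical.

```python
def queue_ranges(max_len, width):
    """List the (low, high] first length ranges for equal-width queues."""
    count = -(-max_len // width)                  # ceiling division
    return [(i * width, min((i + 1) * width, max_len)) for i in range(count)]

def queue_index(valid_len, width):
    """Zero-based index of the queue whose (low, high] range holds valid_len.
    A length of 0 is placed in the first queue (an assumption)."""
    return max(0, (valid_len - 1) // width)

KB = 1024
ranges = queue_ranges(4096 * KB, 64 * KB)         # 4 MB space, 64 KB per queue
```

For a 4 MB physical space and 64 KB ranges this yields 64 information queues, matching the count used later in the description.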
As a preferred embodiment, after the N information queues whose first length range for the effective space has width m are established, the method further includes:
establishing X grouped intervals, wherein different grouped intervals correspond to second length ranges of different effective spaces, the second length range is larger than the first length range, and X is an integer not smaller than 2;
all the physical spaces with the length of the effective space smaller than the first preset length are used as spaces to be recovered, and the method comprises the following steps:
determining a grouping interval needing to be recycled according to the ratio of the total length of the invalid spaces in all the physical spaces to the total length of the storage space;
and in all the grouping intervals needing to be recovered, taking all the physical spaces of which the lengths of the effective spaces are smaller than a first preset length as spaces to be recovered.
To perform garbage collection more effectively, the present invention may further divide all the information queues into several larger groups. The effective space length range of a group is usually much larger than that of a single information queue, but is still not larger than the total length of a single physical space. For example, suppose the total length of a single physical space is 4 MB and one information queue is established for every 64 KB, giving 64 information queues. Four groups G1, G2, G3 and G4 are then established. Their length ranges need not be equal and may be divided in a certain proportion, such as 4:3:3:1: 0 to 1490 KB forms group G1, 1491 to 2719 KB forms group G2, 2867 to 3947 KB forms group G3, and 3948 to 4096 KB forms group G4. Each information queue is assigned to the grouping interval that covers its effective space length range; for example, information queue Q1 (0 to 64 KB) is assigned to group G1. When garbage collection is performed, the groups to be collected are chosen according to the ratio of the total length of invalid space in all physical spaces to the total length of the storage space. The smaller the ratio, the fewer groups are collected, which reduces the amount of valid data that must be read and thereby improves garbage collection efficiency.
As a preferred embodiment, determining the grouping intervals to be recovered according to the ratio of the total length of the invalid spaces in all the physical spaces to the total length of the storage space includes:
selecting q grouping intervals from the X grouping intervals according to the ratio as the grouping intervals to be recovered, wherein q is a positive integer not greater than X and is positively correlated with the ratio.
To improve garbage collection efficiency, only some groups are selected for garbage collection according to the actual ratio; not every group needs to be collected. Specifically, the priority of groups G1 to G4 decreases in that order: the physical spaces in group G1 have the least effective space of all the groups, that is, the most garbage data, so garbage collection is performed on them first, starting from the physical space with the least effective space in G1. As the ratio increases, groups G2 to G4 are selected for garbage collection in turn. Based on this, garbage collection efficiency can be improved.
As a preferred embodiment, determining the grouping intervals to be recovered according to the ratio of the total length of the invalid spaces in all the physical spaces to the total length of the storage space includes:
when the ratio is not greater than a first preset ratio threshold, selecting the first w grouping intervals from the X grouping intervals, in ascending order of second length range, as the grouping intervals to be recovered, wherein w is a positive integer not greater than X;
when the ratio is greater than the first preset ratio threshold and not greater than a second preset ratio threshold, selecting the first w + e grouping intervals from the X grouping intervals, in ascending order of second length range, as the grouping intervals to be recovered, wherein e is a positive integer not greater than X and w + e is not greater than X;
when the ratio is greater than the second preset ratio threshold and not greater than a third preset ratio threshold, selecting the first w + e + r grouping intervals from the X grouping intervals, in ascending order of second length range, as the grouping intervals to be recovered, wherein r is a positive integer not greater than X and w + e + r is not greater than X;
when the ratio is greater than the third preset ratio threshold, taking all the grouping intervals as the grouping intervals to be recovered.
To select the groups to be collected simply, the present invention takes into account that different storage spaces differ in size and that users differ in their tolerance for garbage data and in their requirements for garbage collection efficiency. Three ratio thresholds can therefore be preset so that groups are selected reasonably according to actual requirements; the three ratio thresholds correspond to the groups G1, G2, G3 and G4 described above. Specifically, three ratio thresholds may be set: low L (the first preset ratio threshold), medium M (the second preset ratio threshold) and high H (the third preset ratio threshold). They divide the ratio of the total length of invalid space in all physical spaces to the total length of the storage space into four intervals: (0, L], (L, M], (M, H] and (H, 1]. When the ratio is not greater than the first preset ratio threshold (that is, it falls within (0, L]), only group G1 is garbage collected. When the ratio is greater than the first preset ratio threshold but not greater than the second preset ratio threshold (that is, it falls within (L, M]), groups G1 and G2 are garbage collected. When the ratio is greater than the second preset ratio threshold but not greater than the third preset ratio threshold (that is, it falls within (M, H]), groups G1, G2 and G3 are garbage collected. When the ratio is greater than the third preset ratio threshold (that is, it falls within (H, 1]), all four groups are garbage collected. The first preset ratio threshold may be set to a value equal to the ratio of group G1 to the entire storage space, namely 40%; similarly, the second preset ratio threshold may be set to a value equal to the ratio of groups G1 and G2 together to the entire storage space, namely 70%; the third preset ratio threshold is 90%. When the actual ratio is 30%, only the physical spaces in group G1 are garbage collected. Based on this, the groups that need garbage collection can be selected simply.
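The threshold lookup can be sketched with a binary search over the three thresholds. The 40%, 70% and 90% values come from the example above; the function name and the choice of w = e = r = 1 (one more group per interval) are assumptions made for illustration.

```python
from bisect import bisect_left

def groups_to_collect(ratio, thresholds=(0.40, 0.70, 0.90),
                      groups=("G1", "G2", "G3", "G4")):
    """Return the groups to garbage collect for a given invalid-space ratio.

    bisect_left counts the thresholds strictly below `ratio`, so a ratio
    equal to a threshold stays in the lower interval, matching (0, L],
    (L, M], (M, H] and (H, 1].
    """
    count = min(bisect_left(thresholds, ratio) + 1, len(groups))
    return list(groups[:count])

selected = groups_to_collect(0.30)
```

A ratio of 30% therefore selects only G1, while any ratio above 90% selects all four groups.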
As a preferred embodiment, before recording the physical space into the corresponding information queue according to the first length range in which the length of the effective space in the physical space is located, the method further includes:
when detecting that new data is written into the storage space, determining an invalid space corresponding to the new data;
based on the length of the invalid space corresponding to the new data, the length of the valid space of the physical space in which the invalid space corresponding to the new data is located is reduced by the same length as the length of the invalid space corresponding to the new data.
To keep garbage collection up to date, the present invention takes into account that writing new data into the storage space can turn valid data that has not yet been transferred into invalid data. To avoid transferring such invalid data, the space holding it is changed to invalid space and the length of the effective space in the corresponding physical space is shortened accordingly, so that each physical space is managed according to the actual length of its effective space. Based on this, the real-time performance of garbage collection can be ensured.
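A minimal sketch of this bookkeeping is shown below. The function name, the `(space_id, length)` pairs describing the superseded ranges and the range check are hypothetical; the source only states that the effective length is reduced by the length of the newly invalid space.

```python
def apply_overwrite(valid_lengths, superseded):
    """Shrink the effective length of every physical space that holds data
    made invalid by a new write.

    superseded is a list of (space_id, invalid_length) pairs.
    """
    for space_id, length in superseded:
        if length < 0 or length > valid_lengths[space_id]:
            raise ValueError("invalid length outside the effective space")
        valid_lengths[space_id] -= length
    return valid_lengths

KB = 1024
lengths = {"p0": 300 * KB, "p1": 128 * KB}
# A new write supersedes 100 KB in p0 and 28 KB in p1.
apply_overwrite(lengths, [("p0", 100 * KB), ("p1", 28 * KB)])
```

After the update, each physical space can be recorded into the information queue that matches its reduced effective length.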
As a preferred embodiment, transferring the valid data stored in the valid space of all the spaces to be reclaimed to other physical spaces includes:
respectively acquiring effective data stored in effective spaces of all spaces to be recovered;
and aggregating all valid data into a plurality of new physical spaces by taking the length of the physical spaces as a reference.
To manage the physical spaces more effectively, the present invention takes into account that the effective spaces in the spaces to be recovered differ in length and hold different amounts of valid data. When valid data is transferred, every space to be recovered that contains valid data has to be read, and when valid data is scattered across many physical spaces the number of physical spaces to be managed increases. For this reason, the valid data to be transferred is aggregated into new physical space units whose effective space length equals the maximum length of a physical space, that is, the physical space holds only valid data and no garbage data. Such a physical space does not need to be managed when garbage collection is performed again, which avoids having to manage many physical spaces because valid data is scattered among them, so the physical spaces can be managed more effectively.
As a preferred embodiment, aggregating all valid data into a plurality of new physical spaces comprises:
S31: judging whether the total length of all valid data still to be aggregated is greater than the length of one physical space; if yes, proceeding to S32; if not, proceeding to S33;
S32: aggregating valid data whose length equals the length of one physical space into a new physical space, and returning to S31;
S33: aggregating all remaining valid data into one new physical space.
To avoid aggregating data repeatedly, the present invention aggregates valid data one physical space at a time. While the remaining untransferred valid data is still longer than a single physical space, valid data equal in length to a single physical space is taken from it and aggregated, and the check is repeated. Once the remaining untransferred valid data is no longer than a single physical space, all of it is aggregated into one physical space; the space left over in that physical space can receive valid data transferred from other physical spaces or be defined as invalid space, which is not limited by the present invention. Based on this, repeated aggregation of data can be avoided.
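Steps S31 to S33 amount to cutting the collected valid data into chunks of one physical space each. The sketch below models valid data as byte strings and uses a 4-byte physical space purely for readability; both choices, and the function name, are assumptions.

```python
def aggregate(segments, space_len):
    """Pack valid data segments into new physical spaces of space_len bytes."""
    pending = b"".join(segments)
    spaces = []
    # S31 / S32: while more than one space's worth remains, fill a full space.
    while len(pending) > space_len:
        spaces.append(pending[:space_len])
        pending = pending[space_len:]
    # S33: whatever is left (at most one space's worth) goes into one space.
    if pending:
        spaces.append(pending)
    return spaces

new_spaces = aggregate([b"abc", b"defgh", b"ij"], space_len=4)
```

Only the last new physical space can be partly filled; every earlier one contains valid data over its whole length.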
As a preferred embodiment, aggregating all valid data into a plurality of new physical spaces comprises:
determining, for each valid data, the time at which it was written into the corresponding effective space and its length;
aggregating the valid data in order of time, from earliest to latest;
when several valid data with the same time need to be aggregated, aggregating them in descending order of length.
To preserve the continuity of data, the present invention aggregates related data together as far as possible. Specifically, valid data may be aggregated in time order: two pieces of valid data written into the storage space at close times may have been written by the same service, so valid data written at close times should be aggregated together. Among several pieces of valid data with the same time, the longest is aggregated first, so that a longer piece is not left unable to fit and forced into an additional physical space. Based on this, the continuity of data can be ensured and the number of aggregated physical spaces can be reduced.
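The ordering rule can be expressed as a two-part sort key: write time ascending, then length descending. The `Segment` record and its field names are hypothetical; the source only specifies the two ordering criteria.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Segment:
    written_at: int   # write time, e.g. a timestamp or sequence number
    data: bytes       # the valid data itself

def aggregation_order(segments):
    """Earliest write first; for equal times, longest segment first."""
    return sorted(segments, key=lambda s: (s.written_at, -len(s.data)))

segments = [
    Segment(20, b"xx"),
    Segment(10, b"a"),
    Segment(10, b"bbbb"),
    Segment(10, b"cc"),
]
ordered = aggregation_order(segments)
```

The three segments written at time 10 come first, longest to shortest, followed by the segment written at time 20.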
As a preferred embodiment, after respectively acquiring valid data stored in valid spaces of all the spaces to be reclaimed, the method further includes:
judging whether a plurality of valid data belong to data written by the same user operation;
if yes, any valid data is reserved in the plurality of valid data written by the same user operation, and all other valid data are deleted.
To avoid transferring duplicate data, the present invention takes into account that when a service process, that is, a user operation, writes data in non-additional write mode, the data may be written into several physical spaces, so that the user operation corresponds to several pieces of valid data. After the additional write mode is entered, if that user operation is not performed again, the processor cannot tell which of these pieces is the newest data of the user operation. In that case any one of them is retained and the others are deleted. Based on this, transfer of duplicate data can be avoided.
As a preferred embodiment, transferring all valid data stored in the valid space of the space to be reclaimed to another physical space includes:
determining a first logic space corresponding to a physical space in which each effective data is located;
determining a physical space corresponding to each first logic space at the current moment;
judging whether a coincidence space exists between the physical space corresponding to each first logic space at the current moment and the physical space where each effective data is located;
if a coincidence space exists, transferring the valid data in the coincidence space to other physical spaces.
To accurately determine the valid data to be transferred, the present invention may determine it from the correspondence between logical space and physical space. Specifically, the physical space is the space where data is actually stored, and the logical space is the location on which a user operation or a service acts: when a user operation generates data that is stored in some physical space, the location of that operation is the logical space, and a relationship exists between the logical space and the physical space where the data actually lands. A logical space does not correspond permanently to one physical space, and the correspondence between logical space and physical space changes constantly. During garbage collection, the logical space that corresponded to the physical space holding a piece of valid data when that data was initially stored is determined in reverse, and then the physical space to which that logical space currently corresponds is determined. If the currently corresponding physical space still contains the physical space holding the valid data, the valid data is still valid and can be transferred. If not, the user operation or service corresponding to the logical space now points to other physical spaces, the valid data is effectively invalid data, and it is not transferred. Based on this, the valid data that needs to be transferred can be accurately determined.
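The reverse-then-forward check can be sketched with two dictionaries at block granularity. The `reverse_map` (physical block to the logical block recorded when the data was stored), the `forward_map` (logical block to its current physical block) and the string identifiers are all hypothetical modelling choices, not structures named in the source.

```python
def still_valid_blocks(candidates, reverse_map, forward_map):
    """Keep only the physical blocks that their logical block still points to."""
    confirmed = []
    for physical in candidates:
        logical = reverse_map.get(physical)        # reverse lookup
        if logical is not None and forward_map.get(logical) == physical:
            confirmed.append(physical)             # mapping still coincides
    return confirmed

reverse_map = {"P0:0": "L0", "P0:1": "L1"}
forward_map = {"L0": "P0:0", "L1": "P7:3"}         # L1 was rewritten elsewhere
to_transfer = still_valid_blocks(["P0:0", "P0:1"], reverse_map, forward_map)
```

Block `P0:1` is skipped because its logical block `L1` now points to another physical location, so the data it holds is effectively invalid.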
As a preferred embodiment, if it is detected during the transfer of valid data that new data corresponding to any valid data has been written to the storage space, treating the valid data corresponding to the new data as invalid data includes:
after all the valid data in the coincidence space has been transferred to other physical spaces, determining the correspondence Rdv0 between the coincidence space and its corresponding first logical space;
determining the correspondence Rd1 between the first logical space and its corresponding physical space at the time the new data is written;
taking the overlapping part of the correspondence Rdv0 and the correspondence Rd1 as the correspondence Rdv1;
determining the correspondence Rdn0 between the physical space to which each piece of valid data is transferred and its corresponding second logical space;
for the valid data located in the coincidence space of the correspondence Rdv1, determining the physical space in the correspondence Rdn0 to which that valid data was transferred, and determining the correspondence Rdn1 between that physical space and its corresponding third logical space;
treating the valid data in the correspondence Rdn0 that falls outside the correspondence Rdn1 as invalid data.
In order to accurately determine the valid data to be transferred, in the present invention, the valid data needs to be determined according to the relationship between the logical space and the physical space. Specifically, Rdv0 is compared with the forward relationship Rd1 currently recorded in the system, and the part Rdv1 whose mapping is still consistent in the current state is determined; this consistent part is the forward relationship that is still valid. Because Rdv1 is a subset of Rdv0, Rdv1 is compared with Rdv0 and the part of Rdn0 that corresponds to Rdv1 is extracted, giving Rdn1, the part that is still valid. The content of Rdn1 is then written into the forward relationship recorded by the system, giving Rd2, the forward relationship after garbage collection. At this point, the valid data segments have been remapped to their new physical locations. In this way, the valid data that needs to be transferred is determined accurately.
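The comparison of Rdv0, Rd1, and Rdn0 can be illustrated with plain dictionaries. The sketch below assumes every relationship is keyed by logical address and maps to a single physical address, which is a simplification chosen for illustration; the function name `reconcile` is hypothetical.

```python
def reconcile(rdv0, rd1, rdn0):
    """Return (rdv1, rdn1, rd2, stale) after a transfer, under the assumptions above.

    rdv0: logical -> old physical address of the data that was copied.
    rd1:  logical -> physical address currently recorded by the system.
    rdn0: logical -> new physical address each copied piece was written to.
    """
    # Rdv1: the part of Rdv0 whose mapping is unchanged in Rd1 (still valid).
    rdv1 = {lg: ph for lg, ph in rdv0.items() if rd1.get(lg) == ph}
    # Rdn1: the part of Rdn0 that corresponds to Rdv1.
    rdn1 = {lg: rdn0[lg] for lg in rdv1 if lg in rdn0}
    # Rd2: forward relationship after garbage collection.
    rd2 = {**rd1, **rdn1}
    # Copies in Rdn0 that are not in Rdn1 were overwritten mid-transfer: invalid.
    stale = {lg: ph for lg, ph in rdn0.items() if lg not in rdn1}
    return rdv1, rdn1, rd2, stale


rdv0 = {"A": 1, "B": 2, "C": 3}
rd1 = {"A": 1, "B": 9, "C": 3, "D": 4}   # "B" was rewritten during the transfer
rdn0 = {"A": 11, "B": 12, "C": 13}
rdv1, rdn1, rd2, stale = reconcile(rdv0, rd1, rdn0)
print(rd2)
print(stale)
```

In this example the copy of "B" at physical address 12 is treated as invalid data, while "A" and "C" are remapped to their new locations.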
Referring to fig. 6, fig. 6 is a schematic structural diagram of a data garbage collection apparatus for an append write mode according to the present invention, including:
a memory 21 for storing a computer program;
a processor 22 for implementing the steps of the above data garbage collection method for the append write mode when executing the computer program.
For a detailed description of the data garbage collection apparatus for the append write mode provided by the present invention, refer to the above embodiments of the data garbage collection method for the append write mode; details are not repeated here.
The present invention also provides a data garbage collection system for an append write mode, including:
a dividing unit for dividing the whole storage space in advance into a plurality of physical spaces of equal length;
a length determining unit for determining the length of the valid space in each physical space when it is detected that the ratio of the total length of the invalid spaces in all the physical spaces to the total length of the storage space is greater than a preset ratio;
a space-to-be-recovered determining unit for taking all the physical spaces in which the length of the valid space is smaller than a first preset length as spaces to be recovered;
a data transfer unit for transferring the valid data stored in the valid spaces of all the spaces to be recovered to other physical spaces;
a mutex detection unit for, if it is detected during the transfer of the valid data that new data corresponding to any piece of the valid data has been written to the storage space, treating the valid data corresponding to the new data as invalid data;
a garbage collection unit for releasing all the data in the spaces to be recovered.
For a detailed description of the data garbage collection system for the append write mode provided by the present invention, refer to the above embodiments of the data garbage collection method for the append write mode; details are not repeated here.
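The trigger condition and the selection of spaces to be recovered can be sketched in a few lines. This is only an illustration under stated assumptions: every physical space has the same length, every byte of a space counts as either valid or invalid, and the function and parameter names are hypothetical.

```python
def select_spaces_to_recover(valid_len, space_len, preset_ratio, first_preset_len):
    """Return the ids of the physical spaces to recover, or [] if GC is not triggered.

    valid_len: physical space id -> length of its valid space.
    space_len: common length of every physical space (assumed fully written).
    """
    total_len = space_len * len(valid_len)
    if total_len == 0:
        return []
    invalid_len = sum(space_len - n for n in valid_len.values())
    # Trigger only when the invalid share of the storage space exceeds the preset ratio.
    if invalid_len / total_len <= preset_ratio:
        return []
    # Spaces with little valid data are cheap to move and free the most space.
    return [sid for sid, n in valid_len.items() if n < first_preset_len]


valid_len = {"p0": 10, "p1": 90, "p2": 20}
print(select_spaces_to_recover(valid_len, 100, 0.5, 30))
```

Here 180 of 300 units are invalid (a ratio of 0.6), so collection is triggered and the two spaces holding fewer than 30 valid units are chosen.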
The present invention also provides a storage medium storing a computer program which, when executed by a processor, implements the steps of the above data garbage collection method for the append write mode.
For a detailed description of the storage medium provided by the present invention, refer to the above embodiments of the data garbage collection method for the append write mode; details are not repeated here.
In the present specification, the embodiments are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be understood by reference to one another. Because the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is brief; for relevant details, refer to the description of the method.
It is further noted that, in the present specification, relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a … …" does not exclude the presence of another identical element in a process, method, article, or apparatus that comprises the element.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (20)

1. A data garbage collection method for an append write mode, characterized by comprising:
dividing the whole storage space in advance into a plurality of physical spaces of equal length;
when it is detected that the ratio of the total length of the invalid spaces in all the physical spaces to the total length of the storage space is greater than a preset ratio, determining the length of the valid space in each physical space;
taking all the physical spaces in which the length of the valid space is smaller than a first preset length as spaces to be recovered;
transferring the valid data stored in the valid spaces of all the spaces to be recovered to other physical spaces;
if it is detected during the transfer of the valid data that new data corresponding to any piece of the valid data has been written to the storage space, treating the valid data corresponding to the new data as invalid data;
and releasing all the data in the spaces to be recovered.
2. The data garbage collection method for the append write mode according to claim 1, further comprising, after determining the length of the valid space in each of the physical spaces:
for any one physical space, judging whether the length of the valid space in the physical space is greater than a second preset length;
if not, adding the physical space to a management queue;
wherein taking all the physical spaces in which the length of the valid space is smaller than a first preset length as spaces to be recovered comprises:
taking all the physical spaces in the management queue in which the length of the valid space is smaller than the first preset length as spaces to be recovered;
wherein the second preset length is greater than the first preset length.
3. The data garbage collection method for the append write mode according to claim 2, further comprising, before taking all the physical spaces in the management queue in which the length of the valid space is smaller than the first preset length as spaces to be recovered:
S21: sorting the physical spaces according to a preset order;
S22: judging whether the number of the physical spaces in the management queue is within a preset number range; if the number is within the preset number range, taking all the physical spaces in which the length of the valid space is smaller than the first preset length as spaces to be recovered; if the number exceeds the preset number range, proceeding to S23; if the number is below the preset number range, proceeding to S24;
S23: removing the first physical space in the management queue from the management queue, and returning to S22;
S24: adding a preset value to the second preset length to obtain a new second preset length, adding to the management queue the physical spaces in which the length of the valid space is greater than the new second preset length, and returning to S21.
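The loop formed by S21 to S24 can be sketched as below. This is an illustrative reading rather than the claimed implementation: the sort order is the largest-first order of claim 4, admission after the threshold is raised reuses the "not greater than the second preset length" test of claim 2 (an assumption about which spaces S24 brings in), and all names are hypothetical.

```python
def build_management_queue(valid_len, second_len, min_count, max_count, step):
    """Return (queue, second_len) with len(queue) inside [min_count, max_count] if possible.

    valid_len: physical space id -> length of its valid space.
    step: positive preset value added to the second preset length in S24.
    """
    longest = max(valid_len.values(), default=0)
    while True:
        # Admission test of claim 2: valid length not greater than the threshold.
        queue = [sid for sid, n in valid_len.items() if n <= second_len]
        # S21: sort by valid length, largest first.
        queue.sort(key=lambda sid: valid_len[sid], reverse=True)
        # S22/S23: while the queue is too long, remove its first entry.
        while len(queue) > max_count:
            queue.pop(0)
        # Stop when the count is in range, or when raising the threshold cannot help.
        if len(queue) >= min_count or second_len >= longest:
            return queue, second_len
        # S24: raise the second preset length and go back to S21.
        second_len += step


valid_len = {"a": 10, "b": 40, "c": 70, "d": 90}
print(build_management_queue(valid_len, 20, 2, 3, 30))
```

Because the queue is sorted largest-first, S23 discards the spaces that hold the most valid data, which are the least profitable to collect.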
4. The data garbage collection method for the append write mode according to claim 3, wherein sorting the physical spaces according to a preset order comprises:
sorting the physical spaces in descending order of the length of the valid space.
5. The data garbage collection method for the append write mode according to claim 3, further comprising, after determining the length of the valid space in each of the physical spaces in the management queue:
establishing N information queues, wherein different information queues correspond to different first length ranges of the valid space, and N is an integer not less than 2;
for each of the physical spaces, recording the physical space in the corresponding information queue according to the first length range in which the length of the valid space in the physical space falls;
after removing the first physical space in the management queue from the management queue, the method further comprises:
adding 1 to the removal count of the information queue in which the removed physical space is recorded;
if the number of the physical spaces in the management queue exceeds the preset number range, after S22 to S23 have been executed repeatedly until the number of the physical spaces is within the preset number range, the method further comprises:
among all the physical spaces of the information queue with the largest removal count, taking the smallest valid-space length as the new second preset length, and clearing the removal count of each information queue.
6. The data garbage collection method for the append write mode according to claim 5, wherein adding a preset value to the second preset length to obtain a new second preset length comprises:
in the direction of increasing maximum value of the first length range, taking as a target queue the information queue that follows the information queue containing the physical space with the largest valid-space length in the management queue;
and taking the largest valid-space length among all the physical spaces of the target queue as the new second preset length.
7. The data garbage collection method for the append write mode according to claim 5, wherein, when the maximum length of the valid space of the physical space is M, establishing N information queues comprises:
establishing N information queues whose first length ranges of the valid space each have a width of m, wherein M and m are positive integers and m is not more than M;
the first length ranges of the valid space corresponding to the information queues are (0, m], (m, 2m], (2m, 3m], ..., (N × m, M].
8. The data garbage collection method for the append write mode according to claim 7, further comprising, after establishing the N information queues whose first length ranges of the valid space have a width of m:
establishing X grouping intervals, wherein different grouping intervals correspond to different second length ranges of the valid space, the second length range is larger than the first length range, and X is an integer not less than 2;
wherein taking all the physical spaces in which the length of the valid space is smaller than a first preset length as spaces to be recovered comprises:
determining the grouping intervals that need to be recovered according to the ratio of the total length of the invalid spaces in all the physical spaces to the total length of the storage space;
and, in all the grouping intervals that need to be recovered, taking all the physical spaces in which the length of the valid space is smaller than the first preset length as spaces to be recovered.
9. The data garbage collection method for the append write mode according to claim 8, wherein determining the grouping intervals that need to be recovered according to the ratio of the total length of the invalid spaces in all the physical spaces to the total length of the storage space comprises:
selecting q grouping intervals from the X grouping intervals according to the ratio as the grouping intervals that need to be recovered, wherein q is a positive integer not greater than X, and q is positively correlated with the ratio.
10. The data garbage collection method for the append write mode according to claim 8, wherein determining the grouping intervals that need to be recovered according to the ratio of the total length of the invalid spaces in all the physical spaces to the total length of the storage space comprises:
when the ratio is not greater than a first preset ratio threshold, selecting, in ascending order of the second length range, the first w grouping intervals from the X grouping intervals as the grouping intervals that need to be recovered, wherein w is a positive integer not greater than X;
when the ratio is greater than the first preset ratio threshold and not greater than a second preset ratio threshold, selecting, in ascending order of the second length range, the first w + e grouping intervals from the X grouping intervals as the grouping intervals that need to be recovered, wherein e is a positive integer not greater than X and w + e is not greater than X;
when the ratio is greater than the second preset ratio threshold and not greater than a third preset ratio threshold, selecting, in ascending order of the second length range, the first w + e + r grouping intervals from the X grouping intervals as the grouping intervals that need to be recovered, wherein r is a positive integer not greater than X and w + e + r is not greater than X;
and when the ratio is greater than the third preset ratio threshold, taking all the grouping intervals as the grouping intervals that need to be recovered.
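The tiered selection in claim 10 amounts to a threshold lookup. A minimal sketch follows, assuming the grouping intervals are indexed 0 to X − 1 in ascending order of their second length range; the function name and the example threshold values are hypothetical.

```python
def grouping_intervals_to_recover(ratio, thresholds, w, e, r, x):
    """Return the indices of the grouping intervals to recover.

    thresholds: (first, second, third) preset ratio thresholds, ascending.
    w, e, r: positive integers with w + e + r <= x, as required by the claim.
    """
    first, second, third = thresholds
    if ratio <= first:
        count = w
    elif ratio <= second:
        count = w + e
    elif ratio <= third:
        count = w + e + r
    else:
        count = x  # above the third threshold every interval is recovered
    # Intervals are taken from the smallest second length range upward.
    return list(range(count))


print(grouping_intervals_to_recover(0.4, (0.3, 0.5, 0.7), w=2, e=2, r=2, x=8))
```

The higher the invalid ratio, the more intervals are swept, so collection becomes more aggressive as free space runs short.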
11. The data garbage collection method for the append write mode according to claim 5, further comprising, before recording the physical space in the corresponding information queue according to the first length range in which the length of the valid space in the physical space falls:
when it is detected that new data is written to the storage space, determining the invalid space corresponding to the new data;
and reducing the length of the valid space of the physical space containing the invalid space corresponding to the new data by the length of that invalid space.
12. The data garbage collection method for the append write mode according to claim 1, wherein transferring the valid data stored in the valid spaces of all the spaces to be recovered to other physical spaces comprises:
respectively acquiring the valid data stored in the valid spaces of all the spaces to be recovered;
and aggregating all the valid data into a plurality of new physical spaces, taking the length of a physical space as the unit.
13. The data garbage collection method for the append write mode according to claim 12, wherein aggregating all the valid data into a plurality of new physical spaces comprises:
S31: judging whether the total length of all the valid data not yet aggregated is greater than the length of one physical space; if yes, proceeding to S32; if not, proceeding to S33;
S32: aggregating valid data whose length equals the length of a physical space into a new physical space, and returning to S31;
S33: aggregating all the remaining valid data into a new physical space.
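Steps S31 to S33 can be sketched as a simple packing loop. The example assumes the valid data is a sequence of byte strings that may be split at any offset when filling a new physical space; that assumption, and the name `aggregate`, are illustrative only.

```python
def aggregate(valid_chunks, space_len):
    """Pack valid data into new physical spaces of at most space_len bytes each."""
    pending = b"".join(valid_chunks)
    new_spaces = []
    # S31: is more than one physical space worth of data still waiting?
    while len(pending) > space_len:
        # S32: fill one new physical space completely, then check again.
        new_spaces.append(pending[:space_len])
        pending = pending[space_len:]
    # S33: whatever remains fits in a single new physical space.
    if pending:
        new_spaces.append(pending)
    return new_spaces


print(aggregate([b"aaaa", b"bbb", b"cc"], 4))
```

Every new physical space except possibly the last is completely full, so the transfer itself leaves no fragmentation behind.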
14. The data garbage collection method for the append write mode according to claim 13, wherein aggregating all the valid data into a plurality of new physical spaces comprises:
determining the time at which each piece of valid data was written into its corresponding valid space and the length of each piece of valid data;
aggregating the pieces of valid data sequentially in order of the time, from earliest to latest;
when a plurality of pieces of valid data with the same time need to be aggregated, aggregating them sequentially in descending order of their length.
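The ordering rule of claim 14 maps directly onto a two-part sort key. The sketch below uses a hypothetical `ValidData` record with integer timestamps; both are assumptions made for illustration.

```python
from typing import NamedTuple


class ValidData(NamedTuple):
    name: str        # hypothetical label for the piece of valid data
    write_time: int  # time the data was written to its valid space
    length: int      # length of the valid data


def aggregation_order(items):
    """Earliest write time first; ties broken by longest data first."""
    return sorted(items, key=lambda d: (d.write_time, -d.length))


items = [ValidData("x", 5, 10), ValidData("y", 3, 4), ValidData("z", 5, 30)]
print([d.name for d in aggregation_order(items)])
```

Negating the length in the key gives descending length within equal timestamps while keeping the timestamps themselves ascending.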
15. The data garbage collection method for the append write mode according to claim 12, further comprising, after respectively acquiring the valid data stored in the valid spaces of all the spaces to be recovered:
judging whether a plurality of pieces of valid data belong to data written by the same user operation;
if yes, retaining any one of the plurality of pieces of valid data written by the same user operation, and deleting all the others.
16. The data garbage collection method for the append write mode according to any of claims 1 to 15, wherein transferring the valid data stored in the valid spaces of all the spaces to be recovered to other physical spaces comprises:
determining a first logical space corresponding to the physical space in which each piece of valid data is located;
determining the physical space corresponding to each first logical space at the current moment;
judging whether a coincidence space exists between the physical space corresponding to each first logical space at the current moment and the physical space in which each piece of valid data is located;
and if so, transferring the valid data in the coincidence space to other physical spaces.
17. The data garbage collection method for the append write mode according to claim 16, wherein, if it is detected during the transfer of the valid data that new data corresponding to any piece of the valid data has been written to the storage space, treating the valid data corresponding to the new data as invalid data comprises:
after all the valid data in the coincidence space has been transferred to other physical spaces, determining a correspondence Rdv0 between the coincidence space and the first logical space corresponding to the coincidence space;
determining a correspondence Rd1 between the first logical space and its corresponding physical space at the time the new data is written;
taking the overlapping part of the correspondence Rdv0 and the correspondence Rd1 as a correspondence Rdv1;
determining a correspondence Rdn0 between the physical space to which each piece of valid data is transferred and a second logical space corresponding to that physical space;
for the valid data located in the coincidence space of the correspondence Rdv1, determining the physical space in the correspondence Rdn0 to which that valid data was transferred, and determining a correspondence Rdn1 between that physical space and a third logical space corresponding to that physical space;
treating the valid data in the correspondence Rdn0 that falls outside the correspondence Rdn1 as the invalid data.
18. A data garbage collection apparatus for an append write mode, comprising:
a memory for storing a computer program;
a processor for implementing the steps of the data garbage collection method for the append write mode of any of claims 1 to 17 when executing the computer program.
19. A data garbage collection system for an append write mode, comprising:
a dividing unit for dividing the whole storage space in advance into a plurality of physical spaces of equal length;
a length determining unit for determining the length of the valid space in each physical space when it is detected that the ratio of the total length of the invalid spaces in all the physical spaces to the total length of the storage space is greater than a preset ratio;
a space-to-be-recovered determining unit for taking all the physical spaces in which the length of the valid space is smaller than a first preset length as spaces to be recovered;
a data transfer unit for transferring the valid data stored in the valid spaces of all the spaces to be recovered to other physical spaces;
a mutex detection unit for, if it is detected during the transfer of the valid data that new data corresponding to any piece of the valid data has been written to the storage space, treating the valid data corresponding to the new data as invalid data;
and a garbage collection unit for releasing all the data in the spaces to be recovered.
20. A storage medium, characterized in that the storage medium stores a computer program which, when executed by a processor, implements the steps of the data garbage collection method for the append write mode according to any of claims 1 to 17.
CN202310159671.1A 2023-02-24 2023-02-24 Data garbage collection method, device and system in additional write mode and storage medium Active CN115826886B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310159671.1A CN115826886B (en) 2023-02-24 2023-02-24 Data garbage collection method, device and system in additional write mode and storage medium

Publications (2)

Publication Number Publication Date
CN115826886A true CN115826886A (en) 2023-03-21
CN115826886B CN115826886B (en) 2023-05-12

Family

ID=85522218

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310159671.1A Active CN115826886B (en) 2023-02-24 2023-02-24 Data garbage collection method, device and system in additional write mode and storage medium

Country Status (1)

Country Link
CN (1) CN115826886B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101727402A (en) * 2009-10-23 2010-06-09 深圳市江波龙电子有限公司 Read and write control method and system of data of nonvolatile storage
JP2014132457A (en) * 2013-01-03 2014-07-17 Samsung Electronics Co Ltd Method for reconfiguring storage system, variable structure storage system and variable structure storage device thereof, and executable software product and host
CN110018794A (en) * 2019-04-11 2019-07-16 苏州浪潮智能科技有限公司 A kind of rubbish recovering method, device, storage system and readable storage medium storing program for executing
CN111930301A (en) * 2020-06-29 2020-11-13 深圳佰维存储科技股份有限公司 Garbage recycling optimization method and device, storage medium and electronic equipment
CN112269535A (en) * 2020-10-16 2021-01-26 苏州浪潮智能科技有限公司 Space resource allocation method and device of storage system and readable storage medium
CN112749102A (en) * 2021-01-15 2021-05-04 苏州浪潮智能科技有限公司 Memory space garbage recycling method, device, equipment and medium
WO2022017002A1 (en) * 2020-07-22 2022-01-27 华为技术有限公司 Garbage collection method and device
CN114442949A (en) * 2022-01-14 2022-05-06 苏州浪潮智能科技有限公司 Garbage data recovery method and system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
KEE-HOON JANG; TAE HEE HAN: "Efficient garbage collection policy and block management method for NAND flash memory" *
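陈游?;朱博弘;韩银俊;屠要峰;舒继武;: "A hybrid management mechanism for data pages in a persistent memory file system" *
黄滨;俞建新;: "A hierarchical hot data identification framework for large-capacity flash memory" *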

Also Published As

Publication number Publication date
CN115826886B (en) 2023-05-12

Similar Documents

Publication Publication Date Title
CN107943719B (en) Flash translation layer control method based on request classification
CN108776614B (en) Recovery method and device of storage block
CN110858162B (en) Memory management method and device and server
CN111061752B (en) Data processing method and device and electronic equipment
CN109918448A (en) A kind of cloud storage data classification method based on user behavior
US7870171B2 (en) Method and system for garbage collection in a multitasking environment
CN111880731A (en) Data processing method and device and related components
CN113867627A (en) Method and system for optimizing performance of storage system
CN106446044A (en) Storage space reclaiming method and device
CN112000281A (en) Caching method, system and device for deduplication metadata of storage system
CN116578409A (en) Method, system and medium for identifying and migrating memory hot page
CN113254270B (en) Self-recovery method, system and storage medium for storing cache hot spot data
CN106528703A (en) Deduplication mode switching method and apparatus
CN112799590B (en) Differentiated caching method for online main storage deduplication
CN110007860A (en) Method, solid state hard disk and the storage device of garbage disposal based on LSM database
US7155467B1 (en) Adaptive type-partitioned garbage collection
CN115826886A (en) Write-pattern-added data garbage collection method, device, system and storage medium
CN115408342A (en) File processing method and device and electronic equipment
CN111625506A (en) Distributed data deleting method, device and equipment based on deleting queue
CN111221468A (en) Storage block data deleting method and device, electronic equipment and cloud storage system
CN116643704A (en) Storage management method, storage management device, electronic equipment and storage medium
CN114115744A (en) Control method and device for data recovery task, electronic equipment and storage medium
CN113625959B (en) Data processing method and device
CN107329903B (en) Memory garbage recycling method and system
CN111143288A (en) Data storage method, system and related device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant