Disclosure of Invention
To address this problem, the invention provides a data recovery method that solves the prior-art problems that data pages must be read one by one and that recovery is costly.
The embodiment of the invention provides a data recovery method, which comprises the following steps:
the storage device reads a data page list of the data recovery unit;
the storage device determines, according to the data page list, N data pages with valid data in the data recovery unit corresponding to the data page list; the data page list records whether the data of each data page in the data recovery unit is valid data; N is an integer greater than or equal to 0;
and when N is larger than 0, the storage device backs up the data of the N data pages and erases all the data of the data recovery unit.
Optionally, the reading, by the storage device, of the data page list of the data recovery unit includes:
the storage device determines the data page list according to a look-up table; the lookup table records the mapping relation between the data recovery unit and the data page list and the physical storage position of the data page list;
the storage device reads the list of data pages from the physical storage location.
Optionally, the method includes:
the data recovery unit comprises M data pages; m is an integer greater than or equal to N;
the data page list comprises M elements, and for any element in the M elements, the element is uniquely corresponding to one data page in the M data pages and is used for indicating that the data of the data page which is uniquely corresponding to the element is valid or invalid;
when the data page list is created, for any data page in the M data pages, the storage device sets the value of the element corresponding to the data page in the data page list to a first preset value, and the first preset value is used for indicating that the data of the data page is invalid.
Optionally, the method includes:
and for any data page in the M data pages, when data is written into the data page for the first time after the data page list is created, the storage device sets the element value corresponding to the data page to a second preset value, and the second preset value is used for indicating that the data of the data page is valid.
Optionally, after the storage device takes the element value corresponding to the data page as the second preset value, the method further includes:
if the data stored in the data page needs to be updated, the storage device stores the updated data of the data page to a first data page; the first data page is any data page in the storage device;
the storage device takes the element value corresponding to the first data page in the data page list as the second preset value;
and the storage device takes the element value corresponding to the data page in the data page list as the first preset value.
Optionally, after the storage device erases all data of the data recovery unit, the method further includes:
the storage device recreates the list of data pages.
In the embodiment of the invention, the storage device stores in advance a data page list that records which N data pages in the data recovery unit hold valid data. Only the data pages storing valid data are read according to the data page list, and no read operations are wasted on data pages storing invalid data, which reduces the cost of data recovery.
Embodiments of the present invention further provide a storage medium including a program or instructions, where the program or instructions are executed to implement a data recovery method and any optional method provided by embodiments of the present invention.
Embodiments of the present invention further provide a computer including a program or instructions, where the program or instructions are executed to implement a data recovery method and any optional method provided by embodiments of the present invention.
Based on the same inventive concept, the embodiment of the present invention further provides a data recovery device, which comprises:
a processing unit that stores an acquisition program and a processing program and executes the acquisition program and the processing program, wherein:
the acquisition program is used for reading a data page list of the data recovery unit;
the processing program is used for determining, according to the data page list, N data pages with valid data in the data recovery unit corresponding to the data page list; the data page list records whether the data of each data page in the data recovery unit is valid data; N is an integer greater than or equal to 0;
and, when N is larger than 0, for backing up the data of the N data pages and erasing all the data of the data recovery unit.
Optionally, the obtaining program is specifically configured to:
determining the data page list according to a lookup table; the look-up table records the mapping relation between the data recovery unit and the data page list and the physical storage position of the data page list;
reading the list of data pages from the physical storage location.
Optionally, the processing program is specifically configured to:
the data recovery unit comprises M data pages; m is an integer greater than or equal to N;
the data page list comprises M elements, and for any element in the M elements, the element is uniquely corresponding to one data page in the M data pages and is used for indicating that the data of the data page which is uniquely corresponding to the element is valid or invalid;
when the data page list is created, for any data page in the M data pages, setting the value of the element corresponding to the data page in the data page list to a first preset value, wherein the first preset value is used for indicating that the data of the data page is invalid.
Optionally, the processing program is specifically configured to:
for any data page in the M data pages, when data is written into the data page for the first time after the data page list is created, setting the element value corresponding to the data page to a second preset value, wherein the second preset value is used for indicating that the data of the data page is valid.
Optionally, the processing program is further configured to:
if the data stored in the data page needs to be updated, storing the updated data of the data page into a first data page; the first data page is any data page in the device;
taking the element value corresponding to the first data page in the data page list as the second preset value;
and taking the element value corresponding to the data page in the data page list as the first preset value.
Optionally, the processing program is further configured to:
and recreating the data page list.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention will be described in further detail with reference to the accompanying drawings, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The embodiments of the present invention will be described in further detail with reference to the drawings.
The hard disk is one of the storage media of storage devices such as computers and servers, and a solid state disk is a hard disk built from a solid-state electronic storage array. The storage array is composed of a large number of memory cells, each of which can store 1 bit of binary data (0 or 1). Typically, the memory cells are arranged in a matrix of rows and columns. In an array, a plurality of disks are combined and used as a single disk, and data is stored across the disks in segments; when the data is accessed, the relevant disks in the array operate together, which greatly reduces data access time and improves space utilization.
The structure of the storage array of the solid state disk in the embodiment of the invention is shown in fig. 1.
In the horizontal direction, the storage array can be divided into L horizontal stripes, numbered from 1 to L; in the longitudinal direction, it can be divided into P longitudinal stripes, numbered from 1 to P. One longitudinal stripe consists of K sub-bands, and one sub-band consists of L data blocks, numbered from 1 to L. Each data block has a plurality of data pages; in the embodiment of the present invention, each data block has S data pages, so that one horizontal stripe includes P · K · S data pages. L, P, K, and S are integers greater than 0.
Each data page has a redundant area for storing the physical address of the data page and the logical address corresponding to that physical address. The computer also has an address mapping table that stores the physical address of each data page and the logical address mapped to it. A data page is accessed through its logical address, which indirectly resolves to the physical address of the data page. When the data of an original data page is updated, the logical address of that data page in the mapping table is remapped to the physical address of the data page storing the new data. From then on, the physical address of the original data page is no longer referenced and the page can no longer be accessed; the data stored in the old data page is called invalid data, and the data stored in the new data page is called valid data. A data page never stores both valid and invalid data. In the embodiment of the present invention, if the physical address recorded in a data page is the same as the physical address mapped to its logical address in the address mapping table, the data page is called a valid data page; otherwise, it is called an invalid data page. The data in an invalid data page cannot be accessed yet still occupies storage space, which wastes storage resources.
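The validity rule described above can be illustrated with a minimal sketch. It assumes the address mapping table is a plain dictionary from logical address to physical address; the function name `is_valid_page` and the address strings are hypothetical and are not part of the embodiment.

```python
def is_valid_page(recorded_physical, recorded_logical, address_map):
    """Return True when the mapping table still maps the page's logical
    address to the physical address recorded in the page's redundant area."""
    return address_map.get(recorded_logical) == recorded_physical


# Hypothetical mapping table: logical address -> physical address.
address_map = {"LAdr1": "PAdr1"}

# An update writes the new data to another page and remaps the logical address.
address_map["LAdr1"] = "PAdr2"

old_page_valid = is_valid_page("PAdr1", "LAdr1", address_map)  # original page
new_page_valid = is_valid_page("PAdr2", "LAdr1", address_map)  # page with new data
```

After the remapping, the original page no longer matches the mapping table and is treated as invalid, while the page holding the new data is treated as valid.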
Therefore, in order to make full use of storage resources, the valid data stored in the valid data pages of a storage area must be saved to another storage area, after which the original storage area is erased; this process is called data recovery. The smallest storage unit on which data recovery can be performed is called a data recovery unit; that is, data recovery cannot be performed on only part of a data recovery unit. In the embodiment of the present invention, the data recovery unit is a horizontal stripe.
In the embodiment of the present invention, a global table, called a data page list, for recording valid data pages is stored in the RAM of the storage device for each horizontal stripe. The data page list completely records the validity of the stored data of all data pages in one horizontal stripe by using bits, and if the stored data of the data page is valid, 1 is set; otherwise, it is set to 0.
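A bit-per-page table of this kind might be sketched as follows. The class name `DataPageList`, its method names, and the bit ordering inside each byte (least significant bit first) are assumptions made for illustration; the embodiment only specifies that each data page is represented by one bit, set to 1 for valid and 0 for invalid.

```python
class DataPageList:
    """One validity bit per data page of a horizontal stripe (hypothetical layout)."""

    def __init__(self, num_pages):
        self.num_pages = num_pages
        # A newly created list marks every page invalid (all bits 0).
        self.bits = bytearray((num_pages + 7) // 8)

    def set_valid(self, page):
        self.bits[page // 8] |= 1 << (page % 8)

    def set_invalid(self, page):
        self.bits[page // 8] &= ~(1 << (page % 8)) & 0xFF

    def is_valid(self, page):
        return bool((self.bits[page // 8] >> (page % 8)) & 1)

    def valid_pages(self):
        return [p for p in range(self.num_pages) if self.is_valid(p)]


page_list = DataPageList(64)   # 64 pages occupy 8 bytes
page_list.set_valid(33)
page_list.set_valid(40)
page_list.set_invalid(40)      # page 40 was later superseded by an update
```

With this layout, 32 data pages occupy 4 bytes, and finding the valid pages requires scanning only the bitmap rather than reading the pages themselves.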
Fig. 2 is a schematic structural diagram of a data page list corresponding to a data recovery method according to an embodiment of the present invention. Assume there are T + 1 data pages in total, numbered from data page 0 to data page T, where T is an integer greater than 0. Each bit represents the validity of one data page, so every 4 bytes represent the validity of 32 data pages. In the figure, the first row, 0 to 31, gives the serial number of a data page within each row, and the rows begin with data page 0, data page 32, and so on. Each data page corresponds to one valid bit, which is binary and takes the value 0 or 1; the valid-bit values shown in the figure are examples only. After validity records for a certain amount of data have accumulated, the data page list is stored in a single-level cell (SLC) area of the storage device, for example in units of 64 megabytes (MB) of data. Specifically, the data page list is written to the SLC once it is full, that is, in units of the size of the storage space of the data page list. It should be noted that, when an abnormal power failure of the solid state disk prevents the data page list from being stored in the SLC in time, the table entries can be rebuilt in the power-on reconstruction stage of the hard disk: the physical address recorded in the redundant area of each data page in the horizontal stripe is read and compared with the physical address of that data page in the address mapping table; if they are the same, the data page is valid, and if they differ, it is invalid. Because this comparison is moved forward to the power-on reconstruction stage, IO read-write performance is not affected, and at most the validity flags corresponding to 64 MB of data need to be recovered each time.
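The rebuild after an abnormal power failure can be sketched as below. This is only an illustration under stated assumptions: the redundant area of each data page is modeled as a (physical address, logical address) tuple, the address mapping table as a dictionary, and the function name `rebuild_page_list` is hypothetical.

```python
def rebuild_page_list(redundant_areas, address_map):
    """Rebuild validity flags for one horizontal stripe.

    redundant_areas: one (physical_addr, logical_addr) tuple per data page,
    in stripe order, as read from each page's redundant area.
    address_map: logical address -> physical address currently mapped.
    """
    return [
        1 if address_map.get(logical) == physical else 0
        for physical, logical in redundant_areas
    ]


# Hypothetical state: LAdr1 was rewritten from PAdr1 to PAdr2 before power loss.
address_map = {"LAdr1": "PAdr2", "LAdr2": "PAdr3"}
redundant_areas = [("PAdr1", "LAdr1"), ("PAdr2", "LAdr1"), ("PAdr3", "LAdr2")]
rebuilt = rebuild_page_list(redundant_areas, address_map)
```

Here `rebuilt` marks the stale page at PAdr1 as invalid and the two pages still referenced by the mapping table as valid.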
The storage device uses a table, called a lookup table, to manage the physical location of each data page list in the flash memory and the mapping relationship between the data page list and its corresponding horizontal stripe. One horizontal stripe contains P · K · S data pages, so if 1 bit represents 1 data page, the size of the data page list corresponding to one horizontal stripe is P · K · S / (8 × 1024) kilobytes (KB).
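As a worked instance of this size calculation, the short sketch below evaluates the expression for one set of parameters. The function name and the values of P, K, and S are hypothetical; they are chosen only so that the stripe contains 147456 data pages.

```python
def data_page_list_size_kb(p, k, s):
    """Size in KB of the data page list of one horizontal stripe,
    at 1 bit per data page: P * K * S bits / 8 bits per byte / 1024 bytes per KB."""
    return p * k * s / (8 * 1024)


# Hypothetical geometry: 16 * 4 * 2304 = 147456 data pages per horizontal stripe.
pages_per_stripe = 16 * 4 * 2304
size_kb = data_page_list_size_kb(16, 4, 2304)  # 18.0 KB
```

Under these assumed values, a stripe of 147456 pages needs a data page list of only 18 KB.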
The following describes the data recovery method provided in the embodiment of the present invention in detail. Referring to fig. 3, a flowchart illustrating steps of a data recovery method according to an embodiment of the present invention is shown.
Step 301: the storage device reads a data page list of the data recovery unit.
Step 302: the storage device determines, according to the data page list, N data pages with valid data in the data recovery unit corresponding to the data page list.
The data page list records whether the data of each data page in the data recovery unit is valid data; N is an integer greater than or equal to 0.
Step 303: when N is larger than 0, the storage device backs up the data of the N data pages and erases all the data of the data recovery unit.
In step 301, the storage device first loads a lookup table, determines a data page list of the data recovery unit according to a relationship between the data recovery unit and the data page list recorded in the lookup table, reads the data page list of the data recovery unit from a storage location of the data page list, and loads the data page list into the memory.
Before step 302, the values of the T valid bits in the data page list are preset. Of the T valid bits, each valid bit uniquely corresponds to one data page in the data recovery unit. For example, the valid bit is a binary value: if the valid bit of a data page is 1, the data page is a valid data page; if the valid bit of a data page is 0, the data page is an invalid data page. In the storage array of a solid state disk, for any horizontal stripe that has never stored data or whose data has been erased, a data page list is created for the horizontal stripe after it is powered on for the first time. When the data page list is created, the values of all M elements in the data page list are preset: for each data page, the storage device sets the value of the element corresponding to the data page in the data page list to a first preset value, which indicates that the data of the data page is invalid. For any data page in the M data pages, when data is written into the data page for the first time after the data page list is created, the storage device sets the element value corresponding to the data page to a second preset value, which indicates that the data of the data page is valid.
For example, when the data page list is created, for a binary valid bit, the storage device sets the valid bit of each data page to 0, where 0 indicates that the data of the data page is invalid. For each data page, when data is written into the data page for the first time after the data page list is created, the storage device sets the element value corresponding to the data page to 1, where 1 indicates that the data of the data page is valid.
After the storage device takes the value of the element corresponding to the data page as the second preset value, the value of the element may still be taken as the first preset value again, and one possible implementation manner is as follows:
when the data stored in the data page needs to be updated, the storage device stores the updated data of the data page to a first data page, where the first data page is any data page in the storage device; the storage device updates the physical address mapped by the logical address of the data page in the address mapping table to the physical address of the first data page; the storage device updates the value of the element corresponding to the data page in the data page list to the first preset value; and the storage device updates the value of the element corresponding to the first data page in the data page list to the second preset value.
For example, suppose a data page is named data page 1, its physical address is PAdr1, its logical address is LAdr1, and the data it stores is text document 1. When text document 1 is modified and updated, the updated file is referred to as text document 2. The storage device stores text document 2 to data page 2 and updates the physical address mapped by LAdr1 in the address mapping table to PAdr2, the physical address of data page 2. Thereafter, the storage device changes the value corresponding to data page 1 in the data page list to 0 and changes the value corresponding to data page 2 in the data page list to 1.
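The update in this example can be traced with a small sketch. It assumes the address mapping table and the data page list are both dictionaries (the latter keyed by physical address); the function name `update_data` is hypothetical, and the actual write of the new data to the target page is omitted.

```python
def update_data(logical_addr, new_physical_addr, address_map, page_list):
    """Remap a logical address to the page holding the updated data and
    flip the validity flags of the old and new pages."""
    old_physical_addr = address_map[logical_addr]
    address_map[logical_addr] = new_physical_addr  # remap LAdr -> new page
    page_list[old_physical_addr] = 0               # old page now holds invalid data
    page_list[new_physical_addr] = 1               # new page holds valid data


# State before the update: data page 1 (PAdr1) stores text document 1.
address_map = {"LAdr1": "PAdr1"}
page_list = {"PAdr1": 1, "PAdr2": 0}

# Text document 2 is stored to data page 2 (PAdr2).
update_data("LAdr1", "PAdr2", address_map, page_list)
```

After the call, LAdr1 maps to PAdr2, data page 1 is flagged 0, and data page 2 is flagged 1, matching the example above.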
In step 302, the data page list records whether the data stored in each data page is valid or invalid. The N valid data pages in the data recovery unit are selected from the data page list according to the preset condition that a valid bit of 1 indicates valid and a valid bit of 0 indicates invalid.
If the data recovery unit has no valid data page, that is, if N is 0, all data in the data recovery unit is directly erased.
In the step, the storage device directly and quickly determines the effective data pages, so that the garbage recovery time is greatly shortened, RAM resources can be timely released for IO use, and the garbage recovery efficiency and the read-write performance of the solid state disk are improved.
In step 303, when N is greater than 0, the storage device backs up the data of the N data pages, and erases all the data of the data recovery unit.
One possible implementation manner is that, after the storage device directly reads the data of the N data pages and stores the data in other locations of the storage device, all the data of the data recovery unit is erased, so that the valid data is not lost.
Optionally, after the storage device erases all data of the data recovery unit, the method further includes: the storage device recreates the list of data pages. The re-creation means that after the data of the data page list is erased, the value of each element is taken as a first preset value.
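Steps 302 and 303, together with the optional re-creation of the data page list, might be sketched as follows. The data recovery unit is modeled as a Python list of page contents and the data page list as a list of 0/1 flags; the function name `recover_unit` and the use of `None` to represent an erased page are assumptions for illustration only.

```python
def recover_unit(unit_pages, valid_bits, backup_area):
    """Back up the valid pages of one data recovery unit, erase the whole
    unit, and recreate its data page list. Returns N, the number of valid pages."""
    valid_indices = [i for i, bit in enumerate(valid_bits) if bit == 1]

    # When N > 0, only the N valid pages are read and backed up.
    for i in valid_indices:
        backup_area.append(unit_pages[i])

    # Erase all data of the unit (also done directly when N == 0).
    for i in range(len(unit_pages)):
        unit_pages[i] = None
        valid_bits[i] = 0  # recreated list: every element is the first preset value

    return len(valid_indices)


unit_pages = ["stale-a", "doc-2", "stale-b", "doc-7"]
valid_bits = [0, 1, 0, 1]
backup_area = []
n = recover_unit(unit_pages, valid_bits, backup_area)
```

In this run only the two pages flagged valid are copied to `backup_area` before the unit is erased, so no valid data is lost and no invalid page is read.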
In the embodiment of the invention, the storage device stores a global data page list in the RAM for each horizontal stripe, which records the validity of each data page in the horizontal stripe. When a data page list is full, the storage device sets aside an area of the flash memory to store it; once the data page list is stored in the flash memory, a power failure of the storage device affects neither the records of the data page list nor IO performance. In addition, the storage device provides a lookup table that holds the physical storage location of each data page list in the flash memory. The storage device can thus determine the number and physical locations of valid data pages from the data page list quickly and efficiently.
In the embodiment of the invention, the storage device stores in advance a data page list that records which N data pages in the data recovery unit hold valid data. Only the data pages storing valid data are read according to the data page list, and no read operations are wasted on data pages storing invalid data, which reduces the cost of data recovery.
Fig. 4 is a detailed flowchart of the steps of a data recovery method according to an embodiment of the present invention.
The method comprises the following specific steps:
step 401: the storage device determines a data reclamation unit.
Step 402: the storage device determines a data page list corresponding to the data recovery unit.
Step 403: the storage device judges whether a data page list corresponding to the data recovery unit is in the RAM.
If yes, go to step 406; if not, go to step 404.
Step 404: the storage device determines the physical storage location of the data page list from the look-up table.
Step 405: the storage device loads the list of data pages.
Step 406: the storage device determines valid data pages of the data recovery unit according to the data page list.
Step 407: the storage device reads the valid data pages in the data reclamation unit.
Step 408: the storage device backs up the data of the valid data pages.
Step 409: the storage device erases all data of the data recovery unit and recreates the data page list for the data recovery unit.
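Steps 402 to 406 above can be sketched as a lookup with a RAM check. This is an illustration under stated assumptions: the lookup table, the RAM, and the flash memory are modeled as dictionaries, and the names `get_page_list`, `stripe_b`, `list_x`, and `A1` are hypothetical labels.

```python
def get_page_list(unit_id, lookup_table, ram, flash):
    """Return the data page list of a data recovery unit, loading it from
    flash only when it is not already in RAM."""
    list_id, location = lookup_table[unit_id]  # step 402, and location for step 404
    if list_id not in ram:                     # step 403: is the list in RAM?
        ram[list_id] = list(flash[location])   # steps 404-405: locate and load
    return ram[list_id]


# Hypothetical lookup table: unit -> (data page list id, physical location).
lookup_table = {"stripe_b": ("list_x", "A1")}
flash = {"A1": [0, 1, 0, 0, 1]}
ram = {}

page_list = get_page_list("stripe_b", lookup_table, ram, flash)
valid_pages = [i for i, bit in enumerate(page_list) if bit == 1]  # step 406
```

A second call for the same unit would find the list in RAM and skip the flash read, which corresponds to the branch from step 403 directly to step 406.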
Fig. 5 is a schematic diagram of a specific implementation corresponding to the data recovery method provided in the embodiment of the present invention. Fig. 5 shows the validity states of all data pages in a horizontal stripe b containing T data pages, where a bit value of 1 indicates that the data page is valid. There are 5 valid data pages in the figure, and they are scattered and appear essentially random. Taking a triple-level cell (TLC) as an example, the time for one read is 80 microseconds (µs) and the size of one data page is 16 KB. Assuming that T = 147456 and the hard disk read rate is 500 MB/s, traversing and reading all data pages takes 4.608 seconds (147456 × 16 KB = 2304 MB, and 2304 MB / 500 MB/s = 4.608 s), without considering write IO.
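The traversal time quoted above follows from the page count, the page size, and the read rate. The short calculation below reproduces it; the variable names are illustrative, and 1 MB is taken as 1024 KB.

```python
T = 147456             # data pages in horizontal stripe b
PAGE_SIZE_KB = 16      # size of one data page
READ_RATE_MB_S = 500   # hard disk read rate

total_mb = T * PAGE_SIZE_KB / 1024             # 2304.0 MB to read in a full traversal
traversal_seconds = total_mb / READ_RATE_MB_S  # about 4.608 s
```

The result depends only on the throughput figure; the 80 µs per-read latency is not used in this estimate.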
In the prior-art method, the physical address of each data page is read in turn by traversal and compared with the physical address corresponding to that data page in the address mapping table. If the two are the same, the data page is valid; if not, the data has been updated and the data page is invalid. Horizontal stripe b has only 5 valid data pages, yet this method also spends the time needed to read the T-5 invalid data pages.
The method of the embodiment of the present invention comprises the following specific steps:
step (1): determining a data page list x corresponding to the horizontal stripe b according to the lookup table;
step (2): determining a physical position A1 corresponding to the data page list x according to the lookup table;
step (3): reading the data page list x from A1 into the RAM;
step (4): querying the data page list x to find that the bits corresponding to data page 33, data page 98, data page T-59, data page T-60 and data page T-38 are 1, giving 5 valid data pages in total;
step (5): directly reading the data of the 5 data pages into the memory;
step (6): storing the data of the 5 data pages into other storage areas, erasing the horizontal stripe b for recycling, and recreating the data page list x.
Compared with the traditional method, the method of the application only needs to read 5 data pages without traversing and reading all the data pages, and saves the time for reading T-5 data pages.
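The saving can be checked with a small sketch that builds the validity flags of horizontal stripe b and counts the pages that actually need to be read. The flags are modeled as a Python list, which is an assumption for illustration; the page positions are those given in step (4) with T = 147456.

```python
T = 147456

# Validity flags of horizontal stripe b: only 5 pages hold valid data.
valid_bits = [0] * T
for page in (33, 98, T - 60, T - 59, T - 38):
    valid_bits[page] = 1

# Only pages whose bit is 1 are read; the list itself is scanned in RAM.
pages_to_read = [i for i, bit in enumerate(valid_bits) if bit == 1]
reads_saved = T - len(pages_to_read)
```

With these values, 5 page reads replace a traversal of all 147456 pages, so 147451 page reads are avoided.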
Embodiments of the present invention further provide a storage medium, which includes a program or instructions, where the program or instructions are executed to implement a data recovery method and any optional method provided by embodiments of the present invention.
Embodiments of the present invention further provide a computer including a program or instructions, where the program or instructions are executed to implement a data recovery method and any optional method provided by embodiments of the present invention.
Fig. 6 is a schematic structural diagram of a data recovery device according to an embodiment of the present invention.
A data recovery device, comprising:
a processing unit 601 and a storage unit 602, the storage unit 602 being used for storing an acquisition program and a processing program, and the processing unit 601 being used for executing the acquisition program and the processing program, wherein:
the acquisition program is used for reading a data page list of the data recovery unit;
the processing program is used for determining, according to the data page list, N data pages with valid data in the data recovery unit corresponding to the data page list; the data page list records whether the data of each data page in the data recovery unit is valid data; N is an integer greater than or equal to 0;
and, when N is larger than 0, for backing up the data of the N data pages and erasing all the data of the data recovery unit.
Optionally, the obtaining program is specifically configured to:
determining the data page list according to a lookup table; the look-up table records the mapping relation between the data recovery unit and the data page list and the physical storage position of the data page list;
reading the list of data pages from the physical storage location.
Optionally, the processing program is specifically configured to:
the data recovery unit comprises M data pages; m is an integer greater than or equal to N;
the data page list comprises M elements, and for any one element in the M elements, the element uniquely corresponds to one data page in the M data pages and is used for indicating that the data of the data page uniquely corresponding to the element is valid or invalid;
when the data page list is created, for any data page in the M data pages, setting the value of the element corresponding to the data page in the data page list to a first preset value, wherein the first preset value is used for indicating that the data of the data page is invalid.
Optionally, the processing program is specifically configured to:
for any data page in the M data pages, when data is written into the data page for the first time after the data page list is created, setting the element value corresponding to the data page to a second preset value, wherein the second preset value is used for indicating that the data of the data page is valid.
Optionally, the processing program is further configured to:
if the data stored in the data page needs to be updated, storing the updated data of the data page into a first data page; the first data page is any data page in the device;
taking the element value corresponding to the first data page in the data page list as the second preset value;
and taking the element value corresponding to the data page in the data page list as the first preset value.
Optionally, the processing program is further configured to:
and recreating the data page list.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While the preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all alterations and modifications as fall within the scope of the application.